cs.LG - 2023-08-08

TranSTYLer: Multimodal Behavioral Style Transfer for Facial and Body Gestures Generation

  • paper_url: http://arxiv.org/abs/2308.10843
  • repo_url: None
  • paper_authors: Mireille Fares, Catherine Pelachaud, Nicolas Obin
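  • for: This paper aims to transfer the behavior expressivity style of one virtual agent to another while preserving the shape of the behaviors, since shape carries communicative meaning.
  • methods: The authors propose TranSTYLer, a multimodal transformer-based model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. Behavior expressivity style is assumed to be encoded across text, speech, body gestures, and facial expressions. A style and content disentanglement scheme ensures that the transferred style does not interfere with the meaning conveyed by the source behaviors.
  • results: The model is trained on the PATS corpus, extended with dialog acts and 2D facial landmarks. Objective and subjective evaluations show that it outperforms state-of-the-art models for both seen and unseen styles. To address style and content leakage, the authors also propose a methodology for assessing how well behaviors and gestures tied to the target style are transferred while those tied to the source content are preserved.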
    Abstract This paper addresses the challenge of transferring the behavior expressivity style of a virtual agent to another one while preserving behaviors shape as they carry communicative meaning. Behavior expressivity style is viewed here as the qualitative properties of behaviors. We propose TranSTYLer, a multimodal transformer based model that synthesizes the multimodal behaviors of a source speaker with the style of a target speaker. We assume that behavior expressivity style is encoded across various modalities of communication, including text, speech, body gestures, and facial expressions. The model employs a style and content disentanglement schema to ensure that the transferred style does not interfere with the meaning conveyed by the source behaviors. Our approach eliminates the need for style labels and allows the generalization to styles that have not been seen during the training phase. We train our model on the PATS corpus, which we extended to include dialog acts and 2D facial landmarks. Objective and subjective evaluations show that our model outperforms state of the art models in style transfer for both seen and unseen styles during training. To tackle the issues of style and content leakage that may arise, we propose a methodology to assess the degree to which behavior and gestures associated with the target style are successfully transferred, while ensuring the preservation of the ones related to the source content.

Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

  • paper_url: http://arxiv.org/abs/2308.04341
  • repo_url: None
  • paper_authors: Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju
  • for: This paper aims to mitigate privacy attacks on machine learning models that provide algorithmic recourse to individuals who receive negative outcomes.
  • methods: The paper presents two novel methods for generating differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR).
  • results: DPM and LR perform well in reducing what an adversary can infer, especially at low false positive rates. When the training dataset is large enough, the LR method is particularly successful at preventing privacy leakage while maintaining model and recourse accuracy.
    Abstract Machine learning models are increasingly utilized across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR). Using logistic regression classifiers and real world and synthetic datasets, we find that DPM and LR perform well in reducing what an adversary can infer, especially at low FPR. When training dataset size is large enough, we find particular success in preventing privacy leakage while maintaining model and recourse accuracy with our novel LR method.
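
The name Laplace Recourse points to the Laplace mechanism from differential privacy, which can be sketched as follows. This is a generic illustration and not the authors' LR algorithm: the function names, the choice to perturb each coordinate of a recourse vector directly, and the example `sensitivity` and `epsilon` values are all assumptions made for the example.

```python
import random

def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to every value.

    `sensitivity` is assumed to be the L1 sensitivity of the whole vector.
    """
    scale = sensitivity / epsilon
    return [v + sample_laplace(scale, rng) for v in values]

# Hypothetical recourse vector (suggested feature changes) and privacy budget.
rng = random.Random(0)
recourse = [0.8, -1.5, 0.0]
noisy_recourse = laplace_mechanism(recourse, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller `epsilon` gives a larger noise scale, which is the usual trade-off between privacy and the accuracy of the released recourse.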

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

  • paper_url: http://arxiv.org/abs/2308.04332
  • repo_url: None
  • paper_authors: Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady
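  • for: To learn reward models from diverse sources of human feedback while accounting for the human factors involved in providing feedback of different types.
  • methods: RLHF-Blender, a configurable, interactive interface with a modular experimentation framework for systematically investigating the properties and qualities of human feedback for reward learning.
  • results: The system supports exploring feedback types such as demonstrations, rankings, comparisons, and natural language instructions, as well as studies of how human factors affect their effectiveness.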
    Abstract To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

  • paper_url: http://arxiv.org/abs/2308.04314
  • repo_url: None
  • paper_authors: Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C. S. Lui, Don Towsley
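  • for: To develop algorithms for cooperative multi-agent multi-armed bandits that achieve optimal group and individual regret with low communication between agents.
  • methods: Prior work follows two paradigms, leader-follower and fully distributed algorithms. This paper proposes a simple communication policy and integrates it into a learning algorithm for cooperative bandits.
  • results: The algorithm achieves optimal individual regret and constant communication costs.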
    Abstract Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.

The Model Inversion Eavesdropping Attack in Semantic Communication Systems

  • paper_url: http://arxiv.org/abs/2308.04304
  • repo_url: None
  • paper_authors: Yuhao Chen, Qianqian Yang, Zhiguo Shi, Jiming Chen
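  • for: This study examines the risk of privacy leakage in semantic communication systems and proposes a defense based on random permutation and substitution.
  • methods: The authors introduce the model inversion eavesdropping attack (MIEA), in which an attacker eavesdrops on the transmitted signal and then performs a model inversion attack to reconstruct the raw message. Both white-box and black-box settings are considered.
  • results: Evaluations show that MIEA reconstructs the raw message with good quality under different channel conditions. Experiments show that the proposed defense is effective in preventing MIEA.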
    Abstract In recent years, semantic communication has been a popular research topic for its superiority in communication efficiency. As semantic communication relies on deep learning to extract meaning from raw messages, it is vulnerable to attacks targeting deep learning models. In this paper, we introduce the model inversion eavesdropping attack (MIEA) to reveal the risk of privacy leaks in the semantic communication system. In MIEA, the attacker first eavesdrops the signal being transmitted by the semantic communication system and then performs model inversion attack to reconstruct the raw message, where both the white-box and black-box settings are considered. Evaluation results show that MIEA can successfully reconstruct the raw message with good quality under different channel conditions. We then propose a defense method based on random permutation and substitution to defend against MIEA in order to achieve secure semantic communication. Our experimental results demonstrate the effectiveness of the proposed defense method in preventing MIEA.

Comparative Analysis of the wav2vec 2.0 Feature Extractor

  • paper_url: http://arxiv.org/abs/2308.04286
  • repo_url: None
  • paper_authors: Peter Vieting, Ralf Schlüter, Hermann Ney
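  • for: To examine whether neural raw waveform feature extractors (FEs) can replace handcrafted feature extraction pipelines and give more consistent modeling from speech to transcribed text.
  • methods: The paper studies the convolutional FE of wav2vec 2.0, which operates directly on the speech waveform, as a replacement for standard feature extraction in a connectionist temporal classification (CTC) ASR model, and compares it to an alternative neural FE.
  • results: Both neural FEs are competitive with traditional FEs on the LibriSpeech benchmark. The paper analyzes the effect of the individual components and shows that the most important information for the ASR system is obtained by a set of bandpass filters.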
    Abstract Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. Also the wav2vec 2.0 model, which has recently gained large popularity, uses a convolutional FE which operates directly on the speech waveform. However, it is not yet studied extensively in the literature. In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters.
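
To make "a convolutional FE which operates directly on the speech waveform" concrete, the sketch below computes how many input samples one output frame of a stack of strided 1-D convolutions covers and how far apart consecutive frames are. The layer list is the configuration commonly cited for wav2vec 2.0 and is restated here as an assumption, not taken from the abstract; the 16 kHz sample rate is likewise an assumption.

```python
# (kernel_size, stride) for seven 1-D convolution layers.
# Assumed values: the commonly cited wav2vec 2.0 feature-extractor setup.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def receptive_field_and_stride(layers):
    """Return (receptive field, total stride) in input samples."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the window by (k-1) hops
        jump *= stride              # spacing between neighbouring outputs grows
    return rf, jump

rf, hop = receptive_field_and_stride(CONV_LAYERS)
sample_rate = 16_000
print(rf, hop)                                            # 400 320
print(1000 * rf / sample_rate, 1000 * hop / sample_rate)  # 25.0 20.0
```

Under these assumptions each frame sees 25 ms of audio with a 20 ms shift, which is close to the window and shift sizes typical of handcrafted pipelines.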

In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.04275
  • repo_url: https://github.com/xhan77/in-context-alignment
  • paper_authors: Xiaochuang Han
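  • for: This note explores inference-time alignment of language models through in-context learning.
  • methods: A vanilla pretrained Llama-2 model, before any fine-tuning, is given an average of 9 retrieved demonstration alignment examples when it is prompted to follow chat-style instructions.
  • results: Compared to direct prompting, in-context alignment without changing model weights yields a 7x increase in win rate against OpenAI's text-davinci-003, making the vanilla model comparable to strong baselines with alignment fine-tuning.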
    Abstract In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model Llama-2 before any fine-tuning and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, the in-context alignment without changing model weights leads to a 7x increase in win-rate w.r.t. the text-davinci-003 model from OpenAI, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
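
The retrieve-then-prompt idea can be sketched in a few lines. Everything here is hypothetical: the word-overlap retriever, the `User:`/`Assistant:` template, and the tiny demonstration pool stand in for whatever retriever, format, and data the note actually uses.

```python
def retrieve(query, pool, k):
    """Rank (instruction, response) pairs by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(example):
        return len(query_words & set(example[0].lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]

def build_prompt(query, demos):
    """Concatenate demonstrations, then leave the final answer open."""
    parts = [f"User: {inst}\nAssistant: {resp}" for inst, resp in demos]
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)

# Hypothetical demonstration pool of aligned instruction/response pairs.
pool = [
    ("How do I bake bread?", "Mix flour, water, yeast and salt, then..."),
    ("What is the capital of France?", "The capital of France is Paris."),
    ("How do I boil an egg?", "Place the egg in boiling water for..."),
]
query = "How do I bake sourdough bread?"
prompt = build_prompt(query, retrieve(query, pool, k=2))
```

The prompt is then passed to the unmodified base model, so the alignment behaviour comes only from the demonstrations placed in context.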

Teacher-Student Architecture for Knowledge Distillation: A Survey

  • paper_url: http://arxiv.org/abs/2308.04268
  • repo_url: None
  • paper_authors: Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu
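  • for: This survey explores Teacher-Student architectures across multiple knowledge distillation objectives: knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement.
  • methods: It introduces various knowledge representations and their corresponding optimization objectives, and gives a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes.
  • results: It summarizes recent applications across classification, recognition, generation, ranking, and regression, and discusses research directions in architecture design, knowledge quality, and theoretical studies of regression-based learning.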
    Abstract Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be deployed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.
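
One widely used optimization objective for response-based distillation is the temperature-softened KL divergence between teacher and student outputs. The sketch below is a minimal version of that classic loss, included as an illustration of what such an objective looks like; it is not claimed to be the formulation the survey uses, and the temperature and logits are made-up values.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl

# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

In practice this term is usually combined with the ordinary supervised loss on ground-truth labels, weighted by a mixing coefficient.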

BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.04263
  • repo_url: None
  • paper_authors: Omer Veysel Cagatan
  • for: To improve data efficiency in reinforcement learning.
  • methods: Combines the Barlow Twins self-supervised learning framework with the DER (Data-Efficient Rainbow) algorithm.
  • results: Outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark.
    Abstract This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread to the whole space. This helps RL algorithms to utilize uniformly spread state representation that eventually results in a remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance in the RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
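
The Barlow Twins objective behind "enforcing information spread to the whole space" pushes the cross-correlation matrix of two augmented views toward the identity. The sketch below is a plain-Python version of that loss for small inputs; the weight `lam` and the toy embeddings are illustrative assumptions, and nothing here reflects how BarlowRL wires the loss into DER.

```python
import math

def standardize(column):
    """Zero-mean, unit-variance version of one embedding dimension."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = math.sqrt(var) or 1.0   # avoid division by zero for constant columns
    return [(x - mean) / std for x in column]

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """z_a, z_b: N embeddings (lists of D floats) from two views of a batch."""
    n, d = len(z_a), len(z_a[0])
    cols_a = [standardize([row[i] for row in z_a]) for i in range(d)]
    cols_b = [standardize([row[j] for row in z_b]) for j in range(d)]
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a * b for a, b in zip(cols_a[i], cols_b[j])) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance: diagonal toward 1
            else:
                loss += lam * c_ij ** 2     # redundancy reduction: off-diagonal toward 0
    return loss

# Toy batch: identical views with decorrelated dimensions give zero loss.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(barlow_twins_loss(views, views))  # 0.0
```

The off-diagonal term is what discourages dimensional collapse: dimensions that carry the same information are penalized.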

SDLFormer: A Sparse and Dense Locality-enhanced Transformer for Accelerated MR Image Reconstruction

  • paper_url: http://arxiv.org/abs/2308.04262
  • repo_url: https://github.com/rahul-gs-16/sdlformer
  • paper_authors: Rahul G. S., Sriprabha Ramnarayanan, Mohammad Al Fahim, Keerthi Ram, Preejith S. P, Mohanasankar Sivaprakasam
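  • for: To propose a window-based transformer network for accelerated MRI image reconstruction.
  • methods: The network integrates a dilated attention mechanism and convolution. Dilated and dense neighborhood attention transformers enhance distant-neighborhood pixel relationships, and depth-wise convolutions inside the transformer module learn low-level translation-invariant features. The model is trained in a self-supervised manner.
  • results: With 4x and 5x under-sampling, the model improves on other architectures by around 1.40 dB in PSNR and around 0.028 in SSIM on average, and on the parallel domain self-supervised learning baseline by around 1.44 dB in PSNR and around 0.029 in SSIM. The code is available at https://github.com/rahul-gs-16/sdlformer.git.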
    Abstract Transformers have emerged as viable alternatives to convolutional neural networks owing to their ability to learn non-local region relationships in the spatial domain. The self-attention mechanism of the transformer enables transformers to capture long-range dependencies in the images, which might be desirable for accelerated MRI image reconstruction as the effect of undersampling is non-local in the image domain. Despite its computational efficiency, the window-based transformers suffer from restricted receptive fields as the dependencies are limited to within the scope of the image windows. We propose a window-based transformer network that integrates dilated attention mechanism and convolution for accelerated MRI image reconstruction. The proposed network consists of dilated and dense neighborhood attention transformers to enhance the distant neighborhood pixel relationship and introduce depth-wise convolutions within the transformer module to learn low-level translation invariant features for accelerated MRI image reconstruction. The proposed model is trained in a self-supervised manner. We perform extensive experiments for multi-coil MRI acceleration for coronal PD, coronal PDFS and axial T2 contrasts with 4x and 5x under-sampling in self-supervised learning based on k-space splitting. We compare our method against other reconstruction architectures and the parallel domain self-supervised learning baseline. Results show that the proposed model exhibits improvement margins of (i) around 1.40 dB in PSNR and around 0.028 in SSIM on average over other architectures (ii) around 1.44 dB in PSNR and around 0.029 in SSIM over parallel domain self-supervised learning. The code is available at https://github.com/rahul-gs-16/sdlformer.git
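
To put the reported PSNR margins in context, the sketch below shows the standard PSNR definition on flattened images and converts a dB gain into the implied change in mean squared error. The `data_range` of 1.0 and the toy pixel values are assumptions for the example; the paper's exact evaluation protocol is not given in the abstract.

```python
import math

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)

# Toy example: every pixel is off by 0.1 on a [0, 1] intensity scale.
value = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
ratio = mse_ratio_for_gain(1.40)   # roughly 0.72
```

Because PSNR is logarithmic in the error, a gain of about 1.40 dB corresponds to a mean squared error of roughly 72% of the baseline's.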

Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets

  • paper_url: http://arxiv.org/abs/2308.04258
  • repo_url: https://github.com/optimusprimus/dcase2023_task6b
  • paper_authors: Paul Primus, Khaled Koutini, Gerhard Widmer
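  • for: This work presents a text-to-audio retrieval system based on pre-trained text and spectrogram transformers.
  • methods: Recordings and textual descriptions are projected into a shared audio-caption space in which related examples from different modalities are close. A systematic analysis examines how each component of the system influences retrieval performance.
  • results: The system ranked first in the 2023 DCASE Challenge and outperforms the current state of the art on the ClothoV2 benchmark by 5.6 pp. mAP@10.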
    Abstract This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers. Our method projects recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. Through a systematic analysis, we examine how each component of the system influences retrieval performance. As a result, we identify two key components that play a crucial role in driving performance: the self-attention-based audio encoder for audio embedding and the utilization of additional human-generated and synthetic data sets during pre-training. We further experimented with augmenting ClothoV2 captions with available keywords to increase their variety; however, this only led to marginal improvements. Our system ranked first in the 2023's DCASE Challenge, and it outperforms the current state of the art on the ClothoV2 benchmark by 5.6 pp. mAP@10.
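
The headline metric, mAP@10, can be computed as in the sketch below. Normalization conventions for average precision at a cutoff vary between benchmarks; dividing by `min(number of relevant items, k)` is an assumption here, and the item identifiers are invented.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision over the top-k ranked items for one query."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant_ids:
            hits += 1
            score += hits / rank   # precision at this rank
    denom = min(len(relevant_ids), k)
    return score / denom if denom else 0.0

def mean_average_precision_at_k(all_ranked, all_relevant, k=10):
    """Mean of the per-query average precision values."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

# Two hypothetical caption queries, each with one matching recording.
rankings = [["clip_a", "clip_b"], ["clip_x", "clip_y"]]
relevant = [{"clip_a"}, {"clip_y"}]
print(mean_average_precision_at_k(rankings, relevant))  # 0.75
```

With a single relevant recording per caption, the per-query value reduces to the reciprocal rank of that recording, or zero if it falls outside the top 10.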

Federated Inference with Reliable Uncertainty Quantification over Wireless Channels via Conformal Prediction

  • paper_url: http://arxiv.org/abs/2308.04237
  • repo_url: None
  • paper_authors: Meiyi Zhu, Matteo Zecchin, Sangwoo Park, Caili Guo, Chunyan Feng, Osvaldo Simeone
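  • for: To study whether communication from devices to a server over a shared wireless channel can improve the quality of the server's inference decision when devices and server share a pre-trained model.
  • methods: The paper introduces wireless federated conformal prediction (WFCP), a protocol built on type-based multiple access (TBMA) and a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server.
  • results: Numerical results show significant advantages of WFCP over digital implementations of existing federated CP schemes, especially with limited communication resources and/or a large number of devices.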
    Abstract Consider a setting in which devices and a server share a pre-trained model. The server wishes to make an inference on a new input given the model. Devices have access to data, previously not used for training, and can communicate to the server over a common wireless channel. If the devices have no access to the new input, can communication from devices to the server enhance the quality of the inference decision at the server? Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. In this paper, we study for the first time federated CP in a wireless setting. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or large number of devices.
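
The calibration step that conformal prediction relies on can be sketched for the simplest, centralized and noise-free case. This is standard split conformal prediction, not WFCP: the wireless channel, TBMA, and the quantile correction are deliberately left out, and the scores and labels are made-up values.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf   # too few calibration points: the set must be everything
    return sorted(calibration_scores)[rank - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every label whose nonconformity score is within the threshold."""
    return {label for label, s in candidate_scores.items() if s <= threshold}

# Hypothetical losses of the shared model on held-out device data.
calibration = [1, 2, 3, 4, 5, 6, 7, 8, 9]
threshold = conformal_threshold(calibration, alpha=0.25)
labels = prediction_set({"cat": 2.0, "dog": 7.5, "bird": 9.5}, threshold)
print(threshold, sorted(labels))  # 8 ['cat', 'dog']
```

In the federated setting described in the abstract, the calibration scores live on the devices, so the server has to estimate this quantile from what the devices manage to communicate.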

OpinionConv: Conversational Product Search with Grounded Opinions

  • paper_url: http://arxiv.org/abs/2308.04226
  • repo_url: None
  • paper_authors: Vahid Sadiri Javadi, Martin Potthast, Lucie Flek
  • for: To simulate sales conversations and ground conversational AI in true subjective narratives.
  • methods: Product reviews are used as a rich source of product opinions.
  • results: In several user studies, the generated opinions are perceived as realistic, and assessors confirm the importance of opinions as an informative basis for decision-making.
    Abstract When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations is complicated by the fact that language models do not possess authentic opinions for their lack of real-world experience. We address this problem by leveraging product reviews as a rich source of product opinions to ground conversational AI in true subjective narratives. With OpinionConv, we develop the first conversational AI for simulating sales conversations. To validate the generated conversations, we conduct several user studies showing that the generated opinions are perceived as realistic. Our assessors also confirm the importance of opinions as an informative basis for decision-making.

Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models

  • paper_url: http://arxiv.org/abs/2308.04220
  • repo_url: None
  • paper_authors: Efimia Panagiotaki, Daniele De Martini, Lars Kunze
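  • for: To investigate how semantic attention can enhance the explainability of Graph Neural Network (GNN)-based models, introducing semantically-informed perturbations and establishing a correlation between predicted feature-importance weights and model accuracy.
  • methods: The paper extends existing attention-based graph-explainability methods by using attention weights as importance indicators of semantically sorted feature sets, and analyzes the distribution of predicted attention weights in correlation with model accuracy.
  • results: Applied to a lidar pointcloud estimation model, the methodology identifies key semantic classes that contribute to enhanced performance and generates reliable post-hoc semantic explanations.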
    Abstract In this work, we propose a methodology for investigating the application of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models, introducing semantically-informed perturbations and establishing a correlation between predicted feature-importance weights and model accuracy. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eXplainable AI (XAI) cannot be directly applied to such structures, graph-specific approaches are introduced. Attention mechanisms have demonstrated their efficacy in estimating the importance of input features in deep learning models and thus have been previously employed to provide feature-based explanations for GNN predictions. Building upon these insights, we extend existing attention-based graph-explainability methods investigating the use of attention weights as importance indicators of semantically sorted feature sets. Through analysing the behaviour of predicted attention-weights distribution in correlation with model accuracy, we gain valuable insights into feature importance with respect to the behaviour of the GNN model. We apply our methodology to a lidar pointcloud estimation model successfully identifying key semantic classes that contribute to enhanced performance effectively generating reliable post-hoc semantic explanations.

Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study

  • paper_url: http://arxiv.org/abs/2308.04212
  • repo_url: https://github.com/younghhk/software
  • paper_authors: Seyoung Park, Eun Ryung Lee, Hyokyoung G. Hong
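  • for: To dynamically model the associations between health outcomes and risk factors, capturing the time-varying effects of age.
  • methods: Varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, solved with an alternating direction method of multipliers (ADMM) algorithm.
  • results: Empirical results show that the method captures the complex age-dependent associations between health outcomes and their risk factors.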
    Abstract Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.
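
Two generic ingredients of this kind of estimator can be written down directly: the check (pinball) loss that defines a quantile fit, and a fused-Lasso penalty that pulls the coefficients of neighbouring points together. The sketch below shows only these textbook pieces under assumed names and toy data; it does not reproduce the paper's regional quantile objective, its KNN graph construction, or the ADMM solver.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

def fused_penalty(coefs, neighbor_pairs, lam):
    """lam times the sum of |coef_i - coef_j| over neighbour pairs (i, j)."""
    return lam * sum(abs(coefs[i] - coefs[j]) for i, j in neighbor_pairs)

# Hypothetical age-indexed coefficients; pairs link each point to a neighbour.
coefs = [1.0, 1.0, 3.0]
pairs = [(0, 1), (1, 2)]
penalty = fused_penalty(coefs, pairs, lam=0.5)       # 0.5 * (0 + 2) = 1.0
median_loss = pinball_loss([1.0, 3.0], [2.0, 2.0], tau=0.5)
```

Because the penalty is an absolute difference, it can set neighbouring coefficients exactly equal, which is what makes clustered patterns in the coefficients recoverable.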

Iterative Sketching for Secure Coded Regression

  • paper_url: http://arxiv.org/abs/2308.04185
  • repo_url: None
  • paper_authors: Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III
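  • for: The paper proposes methods for speeding up linear regression in a distributed setting while ensuring security.
  • methods: It leverages randomized sketching and improves straggler resilience in asynchronous systems. Specifically, it applies a random orthonormal matrix and then subsamples blocks, which simultaneously secures the information and reduces the dimension of the regression problem. In this setup, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers.
  • results: The result is a distributed iterative sketching approach in which a new sketch is considered at each iteration. The paper also generalizes the Subsampled Randomized Hadamard Transform to block sampling and discusses how it can be modified to secure the data.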
    Abstract In this work, we propose methods for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample \textit{blocks}, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an \textit{approximate gradient coding scheme}, and the subsampling corresponds to the responses of the non-straggling workers; in a centralized coded computing network. This results in a distributive \textit{iterative sketching} approach for an $\ell_2$-subspace embedding, \textit{i.e.} a new sketch is considered at each iteration. We also focus on the special case of the \textit{Subsampled Randomized Hadamard Transform}, which we generalize to block sampling; and discuss how it can be modified in order to secure the data.
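
The block-sampled Hadamard sketch mentioned in the abstract can be pictured with the minimal sketch below: random sign flips, a normalized Walsh-Hadamard mix, then sampling whole blocks of rows. The function names, the uniform block sampling, and the rescaling factor are assumptions chosen for illustration; the paper's security modification and its gradient-coding interpretation are not modeled here.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def block_srht(rows, block_size, num_blocks, rng):
    """Sketch a matrix (list of rows): sign flips, Hadamard mix, then keep sampled row blocks.

    Assumes len(rows) is a power of two and divisible by block_size.
    """
    n = len(rows)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    columns = list(zip(*rows))
    mixed_columns = [fwht([s * v for s, v in zip(signs, col)]) for col in columns]
    mixed = [list(r) for r in zip(*mixed_columns)]
    total_blocks = n // block_size
    chosen = rng.sample(range(total_blocks), num_blocks)
    # Rescale so that squared norms are preserved in expectation.
    scale = math.sqrt(total_blocks / num_blocks)
    sketch = []
    for b in chosen:
        for row in mixed[b * block_size:(b + 1) * block_size]:
            sketch.append([scale * v for v in row])
    return sketch
```

In an iterative scheme, `block_srht` would be called with fresh randomness at every iteration, so each step works with a new, smaller regression problem.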

Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: “Are we on the same page ?”

  • paper_url: http://arxiv.org/abs/2308.04180
  • repo_url: https://github.com/mlinardicyu/sud_study_different_eyes
  • paper_authors: Bruno Machado Carneiro, Michele Linardi, Julien Longhi
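  • for: The study addresses the characterization and detection of Socially Unacceptable Discourse (SUD) in online text.
  • methods: The authors build a novel corpus of manually annotated texts from the different online sources used so far in state-of-the-art machine learning (ML) SUD detection solutions, and use it to test the generalization ability of SUD classifiers.
  • results: They analyze how different annotation modalities may influence SUD learning, discuss open challenges and research directions, and provide data insights that can support domain experts in the annotation task.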
    Abstract We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art Machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning by discussing open challenges and open research directions. We also provide several data insights which can support domain experts in the annotation task.

Dual input neural networks for positional sound source localization

  • paper_url: http://arxiv.org/abs/2308.04169
  • repo_url: https://github.com/egrinstein/di_nn
  • paper_authors: Eric Grinstein, Vincent W. Neo, Patrick A. Naylor
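  • for: The study aims to improve Sound Source Localization (SSL) by combining high-dimensional multichannel audio signals with metadata describing the acoustic scene, such as the microphones' coordinates, in a neural network.
  • methods: It proposes Dual Input Neural Networks (DI-NNs), a simple and effective way to model both data types in one network.
  • results: Compared with baselines including a classical Least-Squares (LS) method and a Convolutional Recurrent Neural Network (CRNN), the DI-NN achieves a five times lower localization error than the LS method and two times lower than the CRNN on a test dataset of real recordings.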
    Abstract In many signal processing applications, metadata may be advantageously used in conjunction with a high dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from a high dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
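
To make the dual-input idea concrete, here is a minimal forward-pass sketch in which a feature vector derived from the audio signal passes through one layer and is then joined with the metadata before the output layer. Fusing by concatenation, the layer sizes, and the `params` dictionary keys are all assumptions for illustration; the abstract does not specify the architecture.

```python
def dense(vec, weights, bias):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def dual_input_forward(signal_features, metadata, params):
    """Process the signal branch, append the metadata, then map to an output position."""
    hidden = relu(dense(signal_features, params["w_sig"], params["b_sig"]))
    # Hypothetical fusion step: concatenate signal embedding and metadata
    # (for example, microphone coordinates).
    fused = hidden + list(metadata)
    return dense(fused, params["w_out"], params["b_out"])
```

The point of the second input is that the same signal features can map to different source positions when the microphone layout changes.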

Comprehensive Assessment of the Performance of Deep Learning Classifiers Reveals a Surprising Lack of Robustness

  • paper_url: http://arxiv.org/abs/2308.04137
  • repo_url: None
  • paper_authors: Michael W. Spratling
  • for: The paper evaluates the robustness and reliability of machine learning models, specifically deep neural network classifiers, using a wide range of data types and a single metric to assess their performance.
  • methods: The author proposes a benchmark that includes multiple types of data and uses a single metric to compare models' performance across those data types.
  • results: Current deep neural networks, including those trained with methods believed to produce state-of-the-art robustness, are extremely vulnerable to mistakes on certain types of data. They may therefore be unreliable in real-world scenarios where they encounter data from many domains, and they can easily be fooled into making wrong decisions.
    Abstract Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to comprehensively evaluate performance as they tend to rely on limited types of test data, and ignore others. For example, using the standard test data fails to evaluate the predictions made by the classifier to samples from classes it was not trained on. On the other hand, testing with data containing samples from unknown classes fails to evaluate how well the classifier can predict the labels for known classes. This article advocates bench-marking performance using a wide range of different types of data and using a single metric that can be applied to all such data types to produce a consistent evaluation of performance. Using such a benchmark it is found that current deep neural networks, including those trained with methods that are believed to produce state-of-the-art robustness, are extremely vulnerable to making mistakes on certain types of data. This means that such models will be unreliable in real-world scenarios where they may encounter data from many different domains, and that they are insecure as they can easily be fooled into making the wrong decisions. It is hoped that these results will motivate the wider adoption of more comprehensive testing methods that will, in turn, lead to the development of more robust machine learning methods in the future. Code is available at: \url{https://codeberg.org/mwspratling/RobustnessEvaluation}

D-Score: A Synapse-Inspired Approach for Filter Pruning

  • paper_url: http://arxiv.org/abs/2308.04470
  • repo_url: None
  • paper_authors: Doyoung Park, Jinsoo Kim, Jina Nam, Jooyoung Chang, Sang Min Park
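  • for: The paper proposes a synapse-inspired filter pruning method for removing unimportant filters from convolutional neural networks (CNNs).
  • methods: Adopting a neuroscientific perspective, Dynamic Score (D-Score) analyzes the independent importance of the positive and negative weights in each filter, ranks that importance by assigning scores, and prunes filters with low overall scores.
  • results: Experiments on CIFAR-10 and ImageNet show notable reductions in FLOPs and parameter count without a significant drop in accuracy.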
    Abstract This paper introduces a new aspect for determining the rank of the unimportant filters for filter pruning on convolutional neural networks (CNNs). In the human synaptic system, there are two important channels known as excitatory and inhibitory neurotransmitters that transmit a signal from a neuron to a cell. Adopting the neuroscientific perspective, we propose a synapse-inspired filter pruning method, namely Dynamic Score (D-Score). D-Score analyzes the independent importance of positive and negative weights in the filters and ranks the independent importance by assigning scores. Filters having low overall scores, and thus low impact on the accuracy of neural networks are pruned. The experimental results on CIFAR-10 and ImageNet datasets demonstrate the effectiveness of our proposed method by reducing notable amounts of FLOPs and Params without significant Acc. Drop.
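
The ranking idea can be sketched as follows: measure the positive and negative weight mass of each filter separately, score each by rank, and prune the filters with the lowest combined score. The rank-based scoring and the tie-breaking by index are assumptions for illustration; the abstract does not give the actual D-Score formula.

```python
def split_magnitudes(filt):
    """Return (sum of positive weights, sum of absolute negative weights) for one filter."""
    positive = sum(w for w in filt if w > 0)
    negative = sum(-w for w in filt if w < 0)
    return positive, negative


def rank_scores(values):
    """Score each value by its rank: 0 for the smallest, len(values) - 1 for the largest."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank
    return scores


def filters_to_prune(filters, num_prune):
    """Indices of the num_prune filters with the lowest combined positive and negative score."""
    positive, negative = zip(*(split_magnitudes(f) for f in filters))
    overall = [p + n for p, n in zip(rank_scores(positive), rank_scores(negative))]
    order = sorted(range(len(filters)), key=lambda i: (overall[i], i))
    return sorted(order[:num_prune])
```

Each filter here is a flat list of weights; a real convolutional filter would be flattened first.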

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

  • paper_url: http://arxiv.org/abs/2308.04126
  • repo_url: None
  • paper_authors: Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
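  • for: The paper presents an approach to multimodal data fusion and unlimited data generation, intended to refine and simplify the interplay among diverse data modalities.
  • methods: The algorithm combines several operations: video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), the Recognize Anything Model (RAM), and object tracking.
  • results: OmniDataComposer can identify over 6400 categories of objects, substantially broadening the range of visual information. It merges the modalities, promoting reciprocal enhancement among them and cross-modal data correction. The final output turns each video input into an elaborate sequential document that is easier for large language models to process.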
    Abstract This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to the core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm leverages advancements across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Recognize Anything Model(RAM), and object tracking. OmniDataComposer is capable of identifying over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among modalities and facilitating cross-modal data correction. \textbf{The final output metamorphoses each video input into an elaborate sequential document}, virtually transmuting videos into thorough narratives, making them easier to be processed by large language models. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust base will offer priceless insights to models like ChatGPT, enabling them to create higher quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, imparting enormous potential for augmenting AI's understanding and generation of complex, real-world data.

Constructing Custom Thermodynamics Using Deep Learning

  • paper_url: http://arxiv.org/abs/2308.04119
  • repo_url: None
  • paper_authors: Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li
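  • for: The paper develops a platform, based on a generalised Onsager principle, for automated scientific discovery in complex dynamical systems.
  • methods: A machine learning approach learns macroscopic dynamical descriptions directly from observations of microscopic trajectories, simultaneously constructing reduced thermodynamic coordinates and interpreting the dynamics on those coordinates.
  • results: Studying the stretching of long polymer chains, the authors learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including the identification of stable and transition states and the control of the stretching rate. They also apply the method to spatial epidemics, a problem in a different domain, to demonstrate its breadth of application.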
    Abstract One of the most exciting applications of AI is automated scientific discovery based on previously amassed data, coupled with restrictions provided by the known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Of particular importance are complex dynamic systems where their time evolution is strongly influenced by varying external parameters. In this paper we develop a platform based on a generalised Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. We focus on systems whose complexity and sheer sizes render complete microscopic description impractical, and constructing theoretical macroscopic models requires extensive domain knowledge or trial-and-error. Our machine learning approach addresses this by simultaneously constructing reduced thermodynamic coordinates and interpreting the dynamics on these coordinates. We demonstrate our method by studying theoretically and validating experimentally, the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including (1) the identification of stable and transition states and (2) the control of the stretching rate. We further demonstrate the universality of our approach by applying it to an unrelated problem in a different domain: constructing macroscopic dynamics for spatial epidemics, showing that our method addresses wide scientific and technological applications.

PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer

  • paper_url: http://arxiv.org/abs/2308.05115
  • repo_url: https://github.com/statxzy7/ptransips
  • paper_authors: Ziyang Xu, Haitian Zhong
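  • for: The study develops a new deep learning model for identifying phosphorylation sites in proteins.
  • methods: The model, PTransIPs, treats amino acids within protein sequences as words, extracts unique encodings based on their type and sequential position, and incorporates embeddings from large pretrained protein models as additional inputs.
  • results: In independent testing, PTransIPs achieves AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites, respectively. Ablation studies show that the pretrained model embeddings contribute to performance, and the model is interpretable and generalizes to other bioactivity classification tasks.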
    Abstract Phosphorylation is central to numerous fundamental cellular processes, influencing the onset and progression of a variety of diseases. The correct identification of these phosphorylation sites is of great importance to unravel the intricate molecular mechanisms within cells and during viral infections, potentially leading to the discovery of new therapeutic targets. In this study, we introduce PTransIPs, a novel deep learning model for the identification of phosphorylation sites. PTransIPs treat amino acids within protein sequences as words, extracting unique encodings based on their type and sequential position. The model also incorporates embeddings from large pretrained protein models as additional data inputs. PTransIPS is further trained on a combination model of convolutional neural network with residual connections and Transformer model equipped with multi-head attention mechanisms. At last, the model outputs classification results through a fully connected layer. The results of independent testing reveal that PTransIPs outperforms existing state-of-the-art(SOTA) methods, achieving AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites respectively. In addition, ablation studies prove that pretrained model embeddings contribute to the performance of PTransIPs. Furthermore, PTransIPs has interpretable amino acid preference, visible training process and shows generalizability on other bioactivity classification tasks. To facilitate usage, our code and data are publicly accessible at \url{https://github.com/StatXzy7/PTransIPs}.

Correlating Medi-Claim Service by Deep Learning Neural Networks

  • paper_url: http://arxiv.org/abs/2308.04469
  • repo_url: None
  • paper_authors: Jayanthi Vajiram, Negha Senthil, Nean Adhith. P
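  • for: Detecting fraud in medical insurance claims, which involves patients, physicians, diagnostic centers, and insurance providers.
  • methods: A convolutional neural network architecture detects fraudulent claims through a correlation study of regression models over claims from different providers, which helps to detect money laundering. Supervised and unsupervised classifiers are used to separate fraudulent from non-fraudulent claims.
  • results: The approach is intended to detect and predict fraudulent claims, protecting the financial growth of both health insurance companies and insured people.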
    Abstract Medical insurance claims are of organized crimes related to patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. These kinds of frauds affect the financial growth of both insured people and health insurance companies. The Convolution Neural Network architecture is used to detect fraudulent claims through a correlation study of regression models, which helps to detect money laundering on different claims given by different providers. Supervised and unsupervised classifiers are used to detect fraud and non-fraud claims.

Explainable machine learning to enable high-throughput electrical conductivity optimization of doped conjugated polymers

  • paper_url: http://arxiv.org/abs/2308.04103
  • repo_url: None
  • paper_authors: Ji Wei Yoon, Adithya Kumar, Pawan Kumar, Kedar Hippalgaonkar, J Senthilnath, Vijila Chellappan
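  • for: The study aims to speed up the measurement of electrical conductivity in doped polymer materials and to accelerate material discovery using machine learning (ML).
  • methods: ML models take readily measured absorbance spectra as input and predict the electrical conductivity of doped polymer materials.
  • results: A classification model identifies samples with conductivity above ~25 to 100 S/cm with up to 100% accuracy, and a regression model predicts the conductivity of the highly conductive samples with a test R2 of 0.984. The workflow improves the efficiency of conductivity measurements by 89% of the maximum achievable with the authors' experimental techniques. The approach also addresses the lack of explainability in ML models by exploiting mathematical properties of the descriptors and the model to gain insights into the spectral influences on conductivity.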
    Abstract The combination of high-throughput experimentation techniques and machine learning (ML) has recently ushered in a new era of accelerated material discovery, enabling the identification of materials with cutting-edge properties. However, the measurement of certain physical quantities remains challenging to automate. Specifically, meticulous process control, experimentation and laborious measurements are required to achieve optimal electrical conductivity in doped polymer materials. We propose a ML approach, which relies on readily measured absorbance spectra, to accelerate the workflow associated with measuring electrical conductivity. The first ML model (classification model), accurately classifies samples with a conductivity >~25 to 100 S/cm, achieving a maximum of 100% accuracy rate. For the subset of highly conductive samples, we employed a second ML model (regression model), to predict their conductivities, yielding an impressive test R2 value of 0.984. To validate the approach, we showed that the models, neither trained on the samples with the two highest conductivities of 498 and 506 S/cm, were able to, in an extrapolative manner, correctly classify and predict them at satisfactory levels of errors. The proposed ML workflow results in an improvement in the efficiency of the conductivity measurements by 89% of the maximum achievable using our experimental techniques. Furthermore, our approach addressed the common challenge of the lack of explainability in ML models by exploiting bespoke mathematical properties of the descriptors and ML model, allowing us to gain corroborated insights into the spectral influences on conductivity. Through this study, we offer an accelerated pathway for optimizing the properties of doped polymer materials while showcasing the valuable insights that can be derived from purposeful utilization of ML in experimental science.

Asynchronous Evolution of Deep Neural Network Architectures

  • paper_url: http://arxiv.org/abs/2308.04102
  • repo_url: None
  • paper_authors: Jason Liang, Hormoz Shahrzad, Risto Miikkulainen
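  • for: Increasing the throughput of parallel candidate evaluation in evolutionary neural architecture search (ENAS), and with it the efficiency of the evolutionary process.
  • methods: The paper proposes a generic asynchronous evaluation strategy (AES) and adapts it to ENAS. AES maintains a queue of up to $K$ individuals ready to be sent to worker nodes for evaluation and proceeds to the next generation as soon as $M \ll K$ individuals have been evaluated.
  • results: On an 11-bit multiplexer task and an image captioning task, AES yields a multifold improvement in performance, suggesting that it is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times.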
    Abstract Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e.,\ compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of upto $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M<
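
The queue-based strategy described in this entry can be simulated in a few lines: workers pull individuals from a queue that starts with `K` entries, and a new batch of offspring is created every time `M` evaluations have returned, without waiting for the rest. The function name, the `eval_time` callback, and the use of integer ids in place of real network architectures are hypothetical; the sketch assumes `M <= K` and models only the scheduling, not the evolutionary operators.

```python
import heapq


def aes_simulation(eval_time, workers, K, M, total_evals):
    """Simulate asynchronous evaluation; return (finish time, generations created)."""
    queue = list(range(K))   # ids of individuals waiting for a worker
    next_id = K
    running = []             # min-heap of (finish_time, individual_id)
    now = 0.0
    done = 0
    pending = 0              # results returned since the last generation step
    generations = 0
    while done < total_evals:
        # Keep every free worker busy while the queue has individuals.
        while queue and len(running) < workers:
            ind = queue.pop(0)
            heapq.heappush(running, (now + eval_time(ind), ind))
        now, _ = heapq.heappop(running)
        done += 1
        pending += 1
        if pending >= M:
            # Create the next generation from the M results received so far.
            queue.extend(range(next_id, next_id + M))
            next_id += M
            pending = 0
            generations += 1
    return now, generations
```

Because offspring are created after only `M` results, slow evaluations never leave the other workers idle, which is the throughput gain the abstract refers to.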

Why Data Science Projects Fail

Abstract Data Science is a modern Data Intelligence practice, which is the core of many businesses and helps businesses build smart strategies around to deal with businesses challenges more efficiently. Data Science practice also helps in automating business processes using the algorithm, and it has several other benefits, which also deliver in a non-profitable framework. In regards to data science, three key components primarily influence the effective outcome of a data science project. Those are 1.Availability of Data 2.Algorithm 3.Processing power or infrastructure

Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK

  • paper_url: http://arxiv.org/abs/2308.04082
  • repo_url: None
  • paper_authors: Florian J. Kiwit, Marwa Marso, Philipp Ross, Carlos A. Riofrío, Johannes Klepsch, Andre Luckow
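  • for: The study aims to provide a standardized framework for benchmarking quantum machine learning (QML) algorithms, making it easier to assess the performance of quantum computing (QC) applications.
  • methods: It extends the QUantum computing Application benchmaRK (QUARK) framework with the ability to evaluate the training and deployment of quantum generative models.
  • results: The authors train different quantum generative models with several circuit ansatzes, data sets, and data transformations, evaluate them on GPUs and real quantum hardware, and assess their generalization with a broad set of metrics that capture, for example, the novelty and validity of the generated data.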
    Abstract Benchmarking of quantum machine learning (QML) algorithms is challenging due to the complexity and variability of QML systems, e.g., regarding model ansatzes, data sets, training techniques, and hyper-parameters selection. The QUantum computing Application benchmaRK (QUARK) framework simplifies and standardizes benchmarking studies for quantum computing applications. Here, we propose several extensions of QUARK to include the ability to evaluate the training and deployment of quantum generative models. We describe the updated software architecture and illustrate its flexibility through several example applications: (1) We trained different quantum generative models using several circuit ansatzes, data sets, and data transformations. (2) We evaluated our models on GPU and real quantum hardware. (3) We assessed the generalization capabilities of our generative models using a broad set of metrics that capture, e.g., the novelty and validity of the generated data.

Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients

  • paper_url: http://arxiv.org/abs/2308.04077
  • repo_url: None
  • paper_authors: Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low
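  • for: The paper proposes FZooS, a federated zeroth-order optimization algorithm that uses trajectory-informed surrogate gradients to improve query and communication efficiency.
  • methods: Trajectory-informed gradient surrogates use the history of function queries during optimization to estimate gradients accurately and query-efficiently. Adaptive gradient correction based on these surrogates reduces the disparity between the realized local updates and the intended global updates.
  • results: FZooS achieves theoretical improvements over existing approaches, supported by real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.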
    Abstract Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which hence gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates which is able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, which is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
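
For context on what "zeroth-order" means here, the sketch below estimates a gradient from function queries alone using central finite differences and then runs plain gradient descent on the estimate. This is the generic, query-heavy baseline style that the abstract contrasts against (it spends two queries per coordinate at every step); it is not the trajectory-informed surrogate of FZooS, and the function names and default step sizes are assumptions.

```python
def fd_gradient(f, x, step=1e-4):
    """Estimate the gradient of f at x using only function queries (central differences)."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += step
        minus[i] -= step
        grad.append((f(plus) - f(minus)) / (2 * step))
    return grad


def zoo_descent(f, x0, lr=0.1, iters=100):
    """Gradient descent driven by the finite-difference estimate above."""
    x = list(x0)
    for _ in range(iters):
        g = fd_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

With `d` coordinates this costs `2 * d` queries per iteration, which illustrates why reusing the history of past queries is attractive.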

Learning Specialized Activation Functions for Physics-informed Neural Networks

  • paper_url: http://arxiv.org/abs/2308.04073
  • repo_url: https://github.com/leaplabthu/adaafforpinns
  • paper_authors: Honghui Wang, Lu Lu, Shiji Song, Gao Huang
  • for: This paper aims to address the optimization difficulty of physics-informed neural networks (PINNs) by exploring the connection between PINNs and activation functions.
  • methods: The paper introduces adaptive activation functions to search for the optimal function when solving different problems, and compares different adaptive activation functions and their limitations in the context of PINNs.
  • results: The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way, and its effectiveness is demonstrated on a series of benchmarks.
    Abstract Physics-informed neural networks (PINNs) are known to suffer from optimization difficulty. In this work, we reveal the connection between the optimization difficulty of PINNs and activation functions. Specifically, we show that PINNs exhibit high sensitivity to activation functions when solving PDEs with distinct properties. Existing works usually choose activation functions by inefficient trial-and-error. To avoid the inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions to search for the optimal function when solving different problems. We compare different adaptive activation functions and discuss their limitations in the context of PINNs. Furthermore, we propose to tailor the idea of learning combinations of candidate activation functions to the PINNs optimization, which has a higher requirement for the smoothness and diversity on learned functions. This is achieved by removing activation functions which cannot provide higher-order derivatives from the candidate set and incorporating elementary functions with different properties according to our prior knowledge about the PDE at hand. We further enhance the search space with adaptive slopes. The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way. Its effectiveness is demonstrated on a series of benchmarks. Code is available at https://github.com/LeapLabTHU/AdaAFforPINNs.
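
One way to picture a learned combination of candidate activations with adaptive slopes is the sketch below: each candidate is applied to a slope-scaled input and the results are mixed with softmax weights. The candidate set, the softmax normalization, and the parameter names `logits` and `slopes` are assumptions for illustration; in training these would be learnable parameters, and the authors' released code should be consulted for the actual formulation.

```python
import math

# Smooth candidates only: each has derivatives of every order,
# which matters when the loss involves higher-order PDE derivatives.
CANDIDATES = (math.sin, math.tanh, lambda z: z)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_activation(z, logits, slopes):
    """Weighted mix of candidate activations, each with its own slope."""
    weights = softmax(logits)
    return sum(w * f(s * z) for w, f, s in zip(weights, CANDIDATES, slopes))
```

Reading off the learned weights after training indicates which elementary function dominates for a given PDE, which is where the interpretability comes from.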

Path Signatures for Diversity in Probabilistic Trajectory Optimisation

  • paper_url: http://arxiv.org/abs/2308.04071
  • repo_url: None
  • paper_authors: Lucas Barcelos, Tin Lai, Rafael Oliveira, Paulo Borges, Fabio Ramos
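  • for: This paper proposes a parallel trajectory optimisation algorithm based on rough path theory that avoids mode collapse and achieves better global properties.
  • methods: The algorithm builds on path signatures from rough path theory and Hilbert space representations of trajectories, and connects parallel variational inference with diversity-promoting kernels.
  • results: Experiments show that the strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.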
    Abstract Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions to collapse on each other, naive parallel optimisation can suffer from mode collapse diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage on recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapses and achieving better global properties. Our approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity promoting kernels. We empirically demonstrate that this strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.
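    To make the notion of a path signature concrete, the sketch below computes the first two signature levels of a piecewise-linear trajectory by applying Chen's identity one segment at a time. It only illustrates the feature the method builds on; the function name `signature_depth2` is hypothetical, and the kernel and variational inference machinery of the paper are not shown.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as a list of d-dimensional points."""
    d = len(path[0])
    level1 = [0.0] * d                       # total increment per coordinate
    level2 = [[0.0] * d for _ in range(d)]   # iterated integrals S^{ij}
    for p, q in zip(path, path[1:]):
        delta = [b - a for a, b in zip(p, q)]
        # Chen's identity for appending one straight segment
        # (uses level1 as it was BEFORE this segment).
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2

# Two paths with the same endpoints but different shapes.
right_then_up = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
up_then_right = signature_depth2([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

    Both paths share the same level-1 term (the net displacement), but their level-2 terms differ, which is why signatures can tell apart trajectories that a start/end comparison would treat as identical.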

ConDistFL: Conditional Distillation for Federated Learning from Partially Annotated Data

  • paper_url: http://arxiv.org/abs/2308.04070
  • repo_url: https://github.com/nvidia/nvflare
  • paper_authors: Pochuan Wang, Chen Shen, Weichung Wang, Masahiro Oda, Chiou-Shann Fuh, Kensaku Mori, Holger R. Roth
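  • for: Proposes a generalized segmentation model that delineates multiple organs and diseases, trained with federated learning (FL), and addresses the limited availability of fully annotated data.
  • methods: Combines FL with knowledge distillation so that local models can extract knowledge of unlabeled organs and tumors from the global model, using an adequately designed conditional probability representation.
  • results: Validated on four distinct partially annotated abdominal CT datasets, the method significantly outperforms the FedAvg and FedOpt baselines. Performance on an external test dataset also shows better generalizability than models trained on each dataset separately.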
    Abstract Developing a generalized segmentation model capable of simultaneously delineating multiple organs and diseases is highly desirable. Federated learning (FL) is a key technology enabling the collaborative development of a model without exchanging training data. However, the limited access to fully annotated training data poses a major challenge to training generalizable models. We propose "ConDistFL", a framework to solve this problem by combining FL with knowledge distillation. Local models can extract the knowledge of unlabeled organs and tumors from partially annotated data from the global model with an adequately designed conditional probability representation. We validate our framework on four distinct partially annotated abdominal CT datasets from the MSD and KiTS19 challenges. The experimental results show that the proposed framework significantly outperforms FedAvg and FedOpt baselines. Moreover, the performance on an external test dataset demonstrates superior generalizability compared to models trained on each dataset separately. Our ablation study suggests that ConDistFL can perform well without frequent aggregation, reducing the communication cost of FL. Our implementation will be available at https://github.com/NVIDIA/NVFlare/tree/dev/research/condist-fl.
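    One way to read the "conditional probability representation" is sketched below: class probabilities are renormalised over the classes a client has no annotations for, and the local (student) model is pulled toward the global (teacher) model on that conditional distribution. This is a simplified interpretation under stated assumptions, not the ConDistFL loss itself; the function names and the use of a KL divergence are illustrative choices.

```python
import math

def conditional_probs(probs, labeled):
    """Renormalise class probabilities over the classes NOT annotated locally."""
    rest = [c for c in range(len(probs)) if c not in labeled]
    mass = sum(probs[c] for c in rest)
    return {c: probs[c] / mass for c in rest}

def conditional_kl(teacher_probs, student_probs, labeled, eps=1e-12):
    """KL(teacher || student) between the two conditional distributions."""
    t = conditional_probs(teacher_probs, labeled)
    s = conditional_probs(student_probs, labeled)
    return sum(t[c] * math.log((t[c] + eps) / (s[c] + eps)) for c in t)

# Hypothetical per-voxel softmax outputs over 4 classes; class 1 is the
# only class annotated on this client.
teacher = [0.1, 0.6, 0.2, 0.1]
student = [0.1, 0.6, 0.1, 0.2]
loss = conditional_kl(teacher, student, labeled={1})
```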

Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2308.04061
  • repo_url: None
  • paper_authors: Dongyoon Yang, Insung Kong, Yongdai Kim
  • for: This paper focuses on semi-supervised adversarial training, where labeled data is scarce.
  • methods: The authors derive two upper bounds for the robust risk and propose a regularization term for unlabeled data. They also develop a semi-supervised adversarial training algorithm that combines the proposed regularization term with knowledge distillation using a semi-supervised teacher.
  • results: The algorithm achieves state-of-the-art performance with significant margins over existing algorithms. With only 8% labeled data, it is comparable to supervised adversarial training algorithms that use all labeled data, in terms of both standard and robust accuracy on CIFAR-10.
    Abstract Adversarial robustness is a research area that has recently received a lot of attention in the quest for trustworthy artificial intelligence. However, recent works on adversarial robustness have focused on supervised learning where it is assumed that labeled data is plentiful. In this paper, we investigate semi-supervised adversarial training where labeled data is scarce. We derive two upper bounds for the robust risk and propose a regularization term for unlabeled data motivated by these two upper bounds. Then, we develop a semi-supervised adversarial training algorithm that combines the proposed regularization term with knowledge distillation using a semi-supervised teacher (i.e., a teacher model trained using a semi-supervised learning algorithm). Our experiments show that our proposed algorithm achieves state-of-the-art performance with significant margins compared to existing algorithms. In particular, compared to supervised learning algorithms, performance of our proposed algorithm is not much worse even when the amount of labeled data is very small. For example, our algorithm with only 8% labeled data is comparable to supervised adversarial training algorithms that use all labeled data, both in terms of standard and robust accuracies on CIFAR-10.

Backdoor Federated Learning by Poisoning Backdoor-Critical Layers

  • paper_url: http://arxiv.org/abs/2308.04466
  • repo_url: None
  • paper_authors: Haomin Zhuang, Mingxian Yu, Hao Wang, Yang Hua, Jian Li, Xu Yuan
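  • for: This paper examines the backdoor attack surface of federated learning (FL) and proposes a stealthier backdoor attack designed from the attacker's perspective.
  • methods: An in-situ approach identifies and verifies the backdoor-critical (BC) layers of an FL model; the attack then adaptively balances attack effectiveness against stealthiness under various defense strategies.
  • results: Experiments show that the BC layer-aware attacks successfully backdoor FL under seven state-of-the-art (SOTA) defenses with only 10% malicious clients and outperform the latest backdoor attack methods.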
    Abstract Federated learning (FL) has been widely deployed to enable machine learning training on sensitive data across distributed devices. However, the decentralized learning paradigm and heterogeneity of FL further extend the attack surface for backdoor attacks. Existing FL attack and defense methodologies typically focus on the whole model. None of them recognizes the existence of backdoor-critical (BC) layers, a small subset of layers that dominate the model vulnerabilities. Attacking the BC layers achieves equivalent effects as attacking the whole model but at a far smaller chance of being detected by state-of-the-art (SOTA) defenses. This paper proposes a general in-situ approach that identifies and verifies BC layers from the perspective of attackers. Based on the identified BC layers, we carefully craft a new backdoor attack methodology that adaptively seeks a fundamental balance between attacking effects and stealthiness under various defense strategies. Extensive experiments show that our BC layer-aware backdoor attacks can successfully backdoor FL under seven SOTA defenses with only 10% malicious clients and outperform the latest backdoor attack methods.

Toward Improving Predictive Risk Modelling for New Zealand’s Child Welfare System Using Clustering Methods

  • paper_url: http://arxiv.org/abs/2308.04060
  • repo_url: None
  • paper_authors: Sahar Barmomanesh, Victor Miranda-Soberanis
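  • for: This study aims to help social workers better identify children at risk of maltreatment and decide when authorities should intervene.
  • methods: The study combines Principal Component Analysis and K-Means clustering to identify groups of children with distinct features and to examine their effect on current risk modelling frameworks.
  • results: LASSO logistic regression models trained on the identified clusters showed no significant difference in performance, although they performed slightly better for two clusters that include younger children. This suggests that separate models may be needed for children of certain ages.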
    Abstract The combination of clinical judgement and predictive risk models crucially assist social workers to segregate children at risk of maltreatment and decide when authorities should intervene. Predictive risk modelling to address this matter has been initiated by several governmental welfare authorities worldwide involving administrative data and machine learning algorithms. While previous studies have investigated risk factors relating to child maltreatment, several gaps remain as to understanding how such risk factors interact and whether predictive risk models perform differently for children with different features. By integrating Principal Component Analysis and K-Means clustering, this paper presents initial findings of our work on the identification of such features as well as their potential effect on current risk modelling frameworks. This approach allows examining existent, unidentified yet, clusters of New Zealand (NZ) children reported with care and protection concerns, as well as to analyse their inner structure, and evaluate the performance of prediction models trained cluster wise. We aim to discover the extent of clustering degree required as an early step in the development of predictive risk models for child maltreatment and so enhance the accuracy of such models intended for use by child protection authorities. The results from testing LASSO logistic regression models trained on identified clusters revealed no significant difference in their performance. The models, however, performed slightly better for two clusters including younger children. Our results suggest that separate models might need to be developed for children of certain age to gain additional control over the error rates and to improve model accuracy. While results are promising, more evidence is needed to draw definitive conclusions, and further investigation is necessary.
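    The clustering step can be illustrated with a plain implementation of Lloyd's K-Means algorithm. This sketch assumes the rows have already been projected onto their principal components (the PCA step is omitted), uses a deterministic seeding rule chosen only for reproducibility, and runs on made-up points rather than any real data.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for n, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[n] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D points standing in for PCA-projected records.
toy_points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

    In a cluster-wise modelling setup, one prediction model would then be fitted per label value and the per-cluster error rates compared.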

The Five-Dollar Model: Generating Game Maps and Sprites from Sentence Embeddings

  • paper_url: http://arxiv.org/abs/2308.04052
  • repo_url: https://github.com/TimMerino1710/five-dollar-model
  • paper_authors: Timothy Merino, Roman Negri, Dipika Rajesh, M Charity, Julian Togelius
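  • for: This paper proposes a lightweight text-to-image generative model that generates low-dimensional images from an encoded text prompt.
  • methods: The model is trained with novel augmentation strategies that improve its performance on limited datasets.
  • results: The model generates accurate and aesthetically pleasing images that preserve the semantic meaning of the text prompt.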
    Abstract The five-dollar model is a lightweight text-to-image generative architecture that generates low dimensional images from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low dimensional domains, with limited amounts of training data. Despite the small size of both the model and datasets, the generated images are still able to maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets: pixel art video game maps, video game sprite images, and down-scaled emoji images and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models performance using cosine similarity score between text-image pairs generated by the CLIP VIT-B/32 model.
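    The evaluation metric named in the abstract, cosine similarity between a text embedding and an image embedding, reduces to the short function below. The vectors used here are made-up stand-ins; producing real embeddings would require the CLIP model, which is outside this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for a prompt and a generated image.
text_embedding = [0.2, 0.7, 0.1]
image_embedding = [0.25, 0.6, 0.2]
score = cosine_similarity(text_embedding, image_embedding)
```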

Generative Models for Anomaly Detection and Design-Space Dimensionality Reduction in Shape Optimization

  • paper_url: http://arxiv.org/abs/2308.04051
  • repo_url: None
  • paper_authors: Danny D’Agostino
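  • for: Improve the efficiency of global optimization algorithms while promoting the generation of high-quality designs during the optimization process.
  • methods: Reduce the number of design variables by defining a reduced subspace in which geometrical variance is maximized, and model the data with probabilistic linear latent variable models such as Factor Analysis and Probabilistic Principal Component Analysis.
  • results: The framework improves the convergence of global optimization algorithms and generates only designs with high-quality geometrical features, avoiding wasted computationally expensive simulations.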
    Abstract Our work presents a novel approach to shape optimization, that has the twofold objective to improve the efficiency of global optimization algorithms while promoting the generation of high-quality designs during the optimization process free of geometrical anomalies. This is accomplished by reducing the number of the original design variables defining a new reduced subspace where the geometrical variance is maximized and modeling the underlying generative process of the data via probabilistic linear latent variable models such as Factor Analysis and Probabilistic Principal Component Analysis. We show that the data follows approximately a Gaussian distribution when the shape modification method is linear and the design variables are sampled uniformly at random, due to the direct application of the central limit theorem. The model uncertainty is measured in terms of Mahalanobis distance, and the paper demonstrates that anomalous designs tend to exhibit a high value of this metric. This enables the definition of a new optimization model where anomalous geometries are penalized and consequently avoided during the optimization loop. The procedure is demonstrated for hull shape optimization of the DTMB 5415 model, extensively used as an international benchmark for shape optimization problems. The global optimization routine is carried out using Bayesian Optimization and the DIRECT algorithm. From the numerical results, the new framework improves the convergence of global optimization algorithms, while only designs with high-quality geometrical features are generated through the optimization routine thereby avoiding the wastage of precious computationally expensive simulations.
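    The anomaly score described in the abstract is the Mahalanobis distance of a design from the fitted Gaussian. The sketch below computes it in two dimensions and adds a penalty to a cost once the distance exceeds a threshold. The hinge-style penalty, the `threshold` and `weight` values, and the function names are assumptions for illustration; the abstract does not specify the exact penalty form.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance for a 2-D point and a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

def penalised_objective(cost, x, mean, cov, threshold=3.0, weight=10.0):
    """Add a penalty once a design lies farther than `threshold` from the data distribution."""
    excess = max(0.0, mahalanobis_2d(x, mean, cov) - threshold)
    return cost + weight * excess
```

    A design close to the mean keeps its original cost, while a design far out in the tail of the distribution is made less attractive to the optimizer.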

A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset

  • paper_url: http://arxiv.org/abs/2308.04037
  • repo_url: None
  • paper_authors: Mamata Das, Selvakumar K., P. J. A. Alphonse
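  • for: This paper studies feature weighting for text classification on unstructured data, comparing TF-IDF and N-Gram features.
  • methods: Experiments use the IMDB movie reviews and Amazon Alexa reviews datasets, and the method is validated with several common classifiers: Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes, Random Forest, Decision Tree, and k-nearest neighbors (KNN).
  • results: TF-IDF features achieved the highest accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier, outperforming N-Gram features.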
    Abstract Text classification is the process of categorizing text into relevant categories, and its algorithms are at the core of many Natural Language Processing (NLP) applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most widely used information retrieval methods in text classification. We investigate and analyze feature weighting methods for text classification on unstructured data. The proposed model considers two features, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then validate the method with state-of-the-art classifiers: Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). Of the two feature extraction methods, TF-IDF features give a significant improvement over N-Gram features. TF-IDF achieved the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier.
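    For readers unfamiliar with the weighting scheme, a minimal TF-IDF computation is sketched below. It uses the textbook definitions (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency) and whitespace tokenisation; library implementations often apply smoothed variants, so the numbers will not necessarily match theirs. The example documents are made up.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, with tf = count/len and idf = ln(N/df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

review_weights = tfidf(["great movie", "bad movie"])
```

    A term that appears in every document ("movie" here) receives weight zero, while terms that discriminate between documents keep a positive weight.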

Top K Relevant Passage Retrieval for Biomedical Question Answering

  • paper_url: http://arxiv.org/abs/2308.04028
  • repo_url: https://github.com/shashank140195/Biomedical_QA_Model
  • paper_authors: Shashank Gupta
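  • for: This work develops a biomedical question answering system that retrieves answers from PubMed articles.
  • methods: The existing Dense Passage Retrieval (DPR) framework is adapted and fine-tuned for the biomedical domain.
  • results: Evaluated on a BioASQ QA dataset, the fine-tuned dense retriever achieves an F1 score of 0.81.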
    Abstract Question answering is a task that answers factoid questions using a large collection of documents. It aims to provide precise answers in response to the user's questions in natural language. Question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. On the web, there is no single article that could provide all the possible answers available on the internet to the question of the problem asked by the user. The existing Dense Passage Retrieval model has been trained on Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions. Question answering (QA) has made big strides with several open-domain and machine comprehension systems built using large-scale annotated datasets. However, in the clinical domain, this problem remains relatively unexplored. According to multiple surveys, Biomedical Questions cannot be answered correctly from Wikipedia Articles. In this work, we work on the existing DPR framework for the biomedical domain and retrieve answers from the Pubmed articles which is a reliable source to answer medical questions. When evaluated on a BioASQ QA dataset, our fine-tuned dense retriever results in a 0.81 F1 score.
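    The retrieval step of a dense retriever can be sketched as an inner-product ranking over precomputed passage embeddings. The vectors below are made up, and the function name `top_k_passages` is hypothetical; a real system would obtain embeddings from trained question and passage encoders and would typically use an approximate nearest-neighbour index rather than a full scan.

```python
import heapq

def top_k_passages(query_vec, passage_vecs, k):
    """Rank passages by inner product with the query embedding; return the k best (index, score) pairs."""
    scores = [
        (idx, sum(q * p for q, p in zip(query_vec, vec)))
        for idx, vec in enumerate(passage_vecs)
    ]
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

# Hypothetical 2-D embeddings for one question and three passages.
hits = top_k_passages([1.0, 0.0], [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]], k=2)
```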

Scope Loss for Imbalanced Classification and RL Exploration

  • paper_url: http://arxiv.org/abs/2308.04024
  • repo_url: None
  • paper_authors: Hasham Burhani, Xiao Qi Shi, Jonathan Jaegerman, Daniel Balicki
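  • for: This study demonstrates an equivalence between the reinforcement learning problem and the supervised classification problem, and identifies similarities in how the two are addressed.
  • methods: The exploration-exploitation trade-off in reinforcement learning is equated with the dataset imbalance problem in supervised classification, and a new loss function is derived from that analysis.
  • results: The new loss function, Scope Loss, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalance without any tuning. Tested on a basket of benchmark reinforcement learning tasks and a skewed classification dataset, it outperforms state-of-the-art loss functions.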
    Abstract We demonstrate equivalence between the reinforcement learning problem and the supervised classification problem. We consequently equate the exploration exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and find similarities in how they are addressed. From our analysis of the aforementioned problems we derive a novel loss function for reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances, without the need for any tuning. We test Scope Loss against SOTA loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset, and show that Scope Loss outperforms other loss functions.

Improving Performance of Semi-Supervised Learning by Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.04018
  • repo_url: None
  • paper_authors: Dongyoon Yang, Kunwoong Kim, Yongdai Kim
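  • for: Improve the performance of recent semi-supervised learning (SSL) algorithms.
  • methods: A general framework, SCAR (Selecting Clean samples with Adversarial Robustness), adversarially attacks pre-trained models to select high-confidence unlabeled samples, which are then labeled with the current predictions.
  • results: On CIFAR10, three recent SSL algorithms combined with SCAR show significantly improved image classification.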
    Abstract Semi-supervised learning (SSL) algorithm is a setup built upon a realistic assumption that access to a large amount of labeled data is tough. In this study, we present a generalized framework, named SCAR, standing for Selecting Clean samples with Adversarial Robustness, for improving the performance of recent SSL algorithms. By adversarially attacking pre-trained models with semi-supervision, our framework shows substantial advances in classifying images. We introduce how adversarial attacks successfully select high-confident unlabeled data to be labeled with current predictions. On CIFAR10, three recent SSL algorithms with SCAR result in significantly improved image classification.

Continual Pre-Training of Large Language Models: How to (re)warm your model?

  • paper_url: http://arxiv.org/abs/2308.04014
  • repo_url: None
  • paper_authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
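  • for: This study aims to make large language model (LLM) training cheaper and more efficient by continually updating pre-trained models instead of retraining them from scratch.
  • methods: The authors examine how different learning-rate warm-up strategies affect continued pre-training, using a linear warmup and cosine decay schedule and varying the pre-training checkpoint, the maximum learning rate, and the warmup length.
  • results: Rewarming the model first increases the loss on both upstream and downstream data, but in the longer run it improves downstream performance and outperforms models trained from scratch, even for a large downstream dataset.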
    Abstract Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch, even for a large downstream dataset.
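    The linear warmup and cosine decay schedule mentioned in the abstract has a simple closed form, sketched below. The specific values of `max_lr`, `min_lr`, `warmup_steps`, and `total_steps` in the usage lines are hypothetical and are not taken from the paper.

```python
import math

def lr_at(step, max_lr, min_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings: "rewarming" a checkpoint means restarting this
# schedule at step 0 when training continues on the new dataset.
schedule = [lr_at(s, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000)
            for s in range(0, 1001, 50)]
```

    Varying `max_lr` and `warmup_steps` in this function corresponds to the two knobs the study sweeps over.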

Generalization bound for estimating causal effects from observational network data

  • paper_url: http://arxiv.org/abs/2308.04011
  • repo_url: None
  • paper_authors: Ruichu Cai, Zeqin Yang, Weilin Chen, Yuguang Yan, Zhifeng Hao
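  • for: This paper addresses the estimation of causal effects from observational network data.
  • methods: The authors derive a generalization bound using a reweighting schema based on the joint propensity score and a representation learning schema based on the Integral Probability Metric (IPM), and propose a weighting regression method motivated by the bound.
  • results: Experiments on two real-world networks with semi-synthetic data demonstrate the effectiveness of the algorithm.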
    Abstract Estimating causal effects from observational network data is a significant but challenging problem. Existing works in causal inference for observational network data lack an analysis of the generalization bound, which can theoretically provide support for alleviating the complex confounding bias and practically guide the design of learning objectives in a principled manner. To fill this gap, we derive a generalization bound for causal effect estimation in network scenarios by exploiting 1) the reweighting schema based on joint propensity score and 2) the representation learning schema based on Integral Probability Metric (IPM). We provide two perspectives on the generalization bound in terms of reweighting and representation learning, respectively. Motivated by the analysis of the bound, we propose a weighting regression method based on the joint propensity score augmented with representation learning. Extensive experimental studies on two real-world networks with semi-synthetic data demonstrate the effectiveness of our algorithm.

Understanding CNN Hidden Neuron Activations Using Structured Background Knowledge and Deductive Reasoning

  • paper_url: http://arxiv.org/abs/2308.03999
  • repo_url: https://github.com/abhilekha-dalal/xai-using-wikidataAndEcii
  • paper_authors: Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, Eugene Vasserman, Pascal Hitzler
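  • for: This study aims to interpret the activations of hidden neurons in deep learning systems, giving insight into what the system has internally detected as relevant in the input.
  • methods: The approach uses large-scale background knowledge (approximately 2 million classes curated from the Wikipedia concept hierarchy) together with Concept Induction, a symbolic reasoning approach based on description logics.
  • results: Meaningful labels from the background knowledge can be attached automatically to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.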
    Abstract A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would provide insights into the question of what a deep learning system has internally detected as relevant on the input, demystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. Our approach is based on using large-scale background knowledge (approximately 2 million classes curated from the Wikipedia concept hierarchy) together with a symbolic reasoning approach called Concept Induction based on description logics, originally developed for applications in the Semantic Web field. Our results show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.

Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks

  • paper_url: http://arxiv.org/abs/2308.03995
  • repo_url: None
  • paper_authors: Hengxi Zhang, Huaze Tang, Wenbo Ding, Xiao-Ping Zhang
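  • for: Address resource management in the Space-Air-Ground Integrated Network (SAGIN) to support smart city applications.
  • methods: The authors develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method for the resource management problem.
  • results: Experiments show the efficacy of the proposed CMT-MARL method, as evidenced by key performance indicators such as the overall transmission rate and the transmission success rate.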
    Abstract The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study in that inappropriate resource management will cause poor data transmission, and hence affect the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.

Fourier neural operator for real-time simulation of 3D dynamic urban microclimate

  • paper_url: http://arxiv.org/abs/2308.03985
  • repo_url: None
  • paper_authors: Wenhui Peng, Shaoxiang Qin, Senwen Yang, Jianchun Wang, Xue Liu, Liangzhu Wang
  • for: The paper aims to develop a real-time three-dimensional urban wind field simulation method using the Fourier Neural Operator (FNO) network to accelerate the modeling of complex non-linear interactions and system dynamics in urban microclimates.
  • methods: The paper combines Computational Fluid Dynamics (CFD) simulation with the FNO network to model urban microclimates. The training and testing data are generated from CFD simulation of the urban area, based on the semi-Lagrangian approach and fractional stepping method.
  • results: The FNO model accurately reconstructs the instantaneous spatial velocity field and generalizes well to different wind directions. It makes predictions within milliseconds on a graphics processing unit, making real-time simulation of 3D dynamic urban microclimate possible.
    Abstract Global urbanization has underscored the significance of urban microclimates for human comfort, health, and building/urban energy efficiency. They profoundly influence building design and urban planning as major environmental impacts. Understanding local microclimates is essential for cities to prepare for climate change and effectively implement resilience measures. However, analyzing urban microclimates requires considering a complex array of outdoor parameters within computational domains at the city scale over a longer period than indoors. As a result, numerical methods like Computational Fluid Dynamics (CFD) become computationally expensive when evaluating the impact of urban microclimates. The rise of deep learning techniques has opened new opportunities for accelerating the modeling of complex non-linear interactions and system dynamics. Recently, the Fourier Neural Operator (FNO) has been shown to be very promising in accelerating solving the Partial Differential Equations (PDEs) and modeling fluid dynamic systems. In this work, we apply the FNO network for real-time three-dimensional (3D) urban wind field simulation. The training and testing data are generated from CFD simulation of the urban area, based on the semi-Lagrangian approach and fractional stepping method to simulate urban microclimate features for modeling large-scale urban problems. Numerical experiments show that the FNO model can accurately reconstruct the instantaneous spatial velocity field. We further evaluate the trained FNO model on unseen data with different wind directions, and the results show that the FNO model can generalize well on different wind directions. More importantly, the FNO approach can make predictions within milliseconds on the graphics processing unit, making real-time simulation of 3D dynamic urban microclimate possible.
    Summary Recently, deep learning techniques have been applied to accelerate the modeling of complex non-linear interactions and system dynamics. One promising approach is the Fourier Neural Operator (FNO), which can accelerate the solution of Partial Differential Equations (PDEs) and model fluid dynamic systems. In this study, we use the FNO network for real-time three-dimensional (3D) urban wind field simulation. The training and testing data are generated from CFD simulations of the urban area, using the semi-Lagrangian approach and fractional stepping method to simulate urban microclimate features for modeling large-scale urban problems. Our numerical experiments show that the FNO model can accurately reconstruct the instantaneous spatial velocity field. We also evaluate the trained FNO model on unseen data with different wind directions, and the results show that the model generalizes well across them. More importantly, the FNO approach can make predictions within milliseconds on a graphics processing unit, making real-time simulation of 3D dynamic urban microclimate possible. This has significant implications for urban planning and design, as well as for the development of more energy-efficient and resilient cities.
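
    The core operation of a Fourier Neural Operator can be sketched in a few lines. The example below is a minimal single-channel, one-dimensional illustration, not the paper's 3D model: it transforms a signal to Fourier space, keeps only the lowest modes, scales each by a weight, and transforms back. The function names (`rdft`, `irdft`, `spectral_conv_1d`) are hypothetical, the weights are fixed here rather than learned, and a naive DFT is used so the sketch depends only on the standard library; a practical implementation would use an FFT routine, mix multiple channels per mode, and add a pointwise linear path and a nonlinearity.

```python
import cmath
import math


def rdft(x):
    """Naive DFT of a real signal: coefficients for frequencies 0..n//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irdft(coeffs, n):
    """Inverse of rdft, reconstructing a real signal of length n."""
    out = []
    for t in range(n):
        total = coeffs[0].real
        for k in range(1, n // 2 + 1):
            # Every mode except the Nyquist mode appears twice in the
            # full spectrum (as a conjugate pair), hence the factor 2.
            factor = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
            total += factor * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(total / n)
    return out


def spectral_conv_1d(x, weights):
    """Scale the lowest len(weights) Fourier modes and discard the rest.

    `weights` stands in for the learned complex parameters (assumption:
    one scalar per retained mode, single input and output channel).
    """
    coeffs = rdft(x)
    modes = min(len(weights), len(coeffs))
    filtered = [coeffs[k] * weights[k] if k < modes else 0j for k in range(len(coeffs))]
    return irdft(filtered, len(x))
```

    With unit weights on the first three modes, the layer acts as a low-pass filter: a signal made of a constant, a slow cosine, and a fast sine comes back with the fast component removed. Truncating the spectrum is what keeps the parameter count independent of grid resolution.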

Characterization of Human Balance through a Reinforcement Learning-based Muscle Controller

  • paper_url: http://arxiv.org/abs/2308.04462
  • repo_url: None
  • paper_authors: Kübra Akbaş, Carlotta Mummolo, Xianlian Zhou
  • for: This paper aims to explore the use of center of mass (COM) state space and reinforcement learning (RL) to monitor balance capabilities in humans, and to establish balance recovery limits.
  • methods: The paper employs a musculoskeletal model integrated with a balance controller, trained through RL, to investigate balancing capabilities. The RL framework includes two interconnected neural networks governing balance recovery and muscle coordination, trained using Proximal Policy Optimization (PPO) with reference state initialization, early termination, and multiple training strategies.
  • results: By exploring recovery from random initial COM states (position and velocity) with a trained controller, the paper obtains the final BR, the region enclosing the successful balance recovery trajectories. The BRs are compared with analytical postural stability limits from a linear inverted pendulum model; they show a similar trend in successful COM states but more limited ranges in the recoverable areas. The paper also investigates the effect of muscle weakness and neural excitation delay on the BRs, revealing reduced balancing capability in different regions.
    Abstract Balance assessment during physical rehabilitation often relies on rubric-oriented battery tests to score a patient's physical capabilities, leading to subjectivity. While some objective balance assessments exist, they are often limited to tracking the center of pressure (COP), which does not fully capture the whole-body postural stability. This study explores the use of the center of mass (COM) state space and presents a promising avenue for monitoring the balance capabilities in humans. We employ a musculoskeletal model integrated with a balance controller, trained through reinforcement learning (RL), to investigate balancing capabilities. The RL framework consists of two interconnected neural networks governing balance recovery and muscle coordination respectively, trained using Proximal Policy Optimization (PPO) with reference state initialization, early termination, and multiple training strategies. By exploring recovery from random initial COM states (position and velocity) space for a trained controller, we obtain the final BR enclosing successful balance recovery trajectories. Comparing the BRs with analytical postural stability limits from a linear inverted pendulum model, we observe a similar trend in successful COM states but more limited ranges in the recoverable areas. We further investigate the effect of muscle weakness and neural excitation delay on the BRs, revealing reduced balancing capability in different regions. Overall, our approach of learning muscular balance controllers presents a promising new method for establishing balance recovery limits and objectively assessing balance capability in bipedal systems, particularly in humans.
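    Summary Balance assessment during physical rehabilitation often relies on rubric-oriented battery tests to score a patient's physical capabilities, which introduces subjectivity. Some objective balance assessments exist, but they are usually limited to tracking the center of pressure (COP), which does not fully capture whole-body postural stability. This study explores using the center of mass (COM) state space to monitor human balance capabilities. We employ a musculoskeletal model integrated with a balance controller, trained through reinforcement learning (RL), to investigate balancing capabilities. The RL framework consists of two interconnected neural networks, one governing balance recovery and the other governing muscle coordination, trained using Proximal Policy Optimization (PPO). By exploring how a trained controller recovers from random initial COM states, we obtain the final BR enclosing the successful balance recovery trajectories. Comparing the BRs with analytical postural stability limits from a linear inverted pendulum model, we observe a similar trend in successful COM states but more limited ranges in the recoverable areas. We further investigate the effect of muscle weakness and neural excitation delay on the BRs and find reduced balancing capability in different regions. Overall, learning muscular balance controllers is a promising new method for assessing balance capability, particularly in humans.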
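
    For readers unfamiliar with the linear inverted pendulum baseline, a commonly used stability condition for that model can be sketched as follows. This is an assumption about the kind of analytical limit involved, not the paper's exact formulation: the COM state is treated as recoverable without stepping when the extrapolated centre of mass (position plus velocity divided by the pendulum's natural frequency) stays inside the base of support. The function name and the `heel`/`toe` parameters are hypothetical.

```python
import math


def lip_recoverable(com_pos, com_vel, com_height, heel, toe, g=9.81):
    """Check a COM state against a linear inverted pendulum stability limit.

    Positions are in metres along the anterior-posterior axis, velocity in m/s.
    `heel` and `toe` bound the base of support (illustrative parameters).
    """
    omega = math.sqrt(g / com_height)  # natural frequency of the pendulum
    xcom = com_pos + com_vel / omega   # extrapolated centre of mass
    return heel <= xcom <= toe
```

    For a COM height of 1 m and a support region from -0.05 m to 0.15 m, a forward velocity of 0.3 m/s from the origin is recoverable under this condition, while 0.6 m/s is not. Plotting the boundary over position and velocity gives the kind of region against which learned BRs can be compared.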

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

  • paper_url: http://arxiv.org/abs/2308.03977
  • repo_url: https://github.com/facebookresearch/pug
  • paper_authors: Florian Bordes, Shashank Shekhar, Mark Ibrahim, Diane Bouchacourt, Pascal Vincent, Ari S. Morcos
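  • for: The paper aims to democratize photorealistic synthetic image data as an alternative to real image datasets, offering both controllability and realism for designing and evaluating deep neural networks.
  • methods: The paper uses the Unreal Engine game engine to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning research.
  • results: The paper demonstrates that the PUG environments and datasets enable more rigorous evaluations of vision models.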
    Abstract Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth labels (and captions), (iii) precisely control distribution shifts between training and testing to isolate variables of interest for sound experimentation. Despite such promise, the use of synthetic image data is still limited -- and often played down -- mainly due to their lack of realism. Most works therefore rely on datasets of real images, which have often been scraped from public images on the internet, and may have issues with regards to privacy, bias, and copyright, while offering little control over how objects precisely appear. In this work, we present a path to democratize the use of photorealistic synthetic data: we develop a new generation of interactive environments for representation learning research, that offer both controllability and realism. We use the Unreal Engine, a powerful game engine well known in the entertainment industry, to produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning. In this paper, we demonstrate the potential of PUG to enable more rigorous evaluations of vision models.
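    Summary Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many samples as needed, (ii) precisely control each scene and provide granular labels and captions, and (iii) precisely control distribution shifts between training and testing to isolate variables of interest. Despite this, the use of synthetic image data remains limited, and is often played down, mainly because it lacks realism. Most works therefore rely on real image datasets, which are often scraped from the internet, may raise privacy, bias, and copyright issues, and offer little control over how objects appear. In this paper, we present a path to democratize photorealistic synthetic data: we develop a new generation of interactive environments, built on the Unreal Engine game engine, that produce PUG (Photorealistic Unreal Graphics) environments and datasets for representation learning research. We demonstrate the potential of PUG to enable more rigorous evaluations of vision models.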

Amortized Global Search for Efficient Preliminary Trajectory Design with Deep Generative Models

  • paper_url: http://arxiv.org/abs/2308.03960
  • repo_url: None
  • paper_authors: Anjian Li, Amlan Sinha, Ryne Beeson
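  • for: Reducing the computational cost of the global search in preliminary trajectory design, a high-dimensional, non-convex trajectory optimization problem.
  • methods: Deep generative models predict trajectory solutions, exploiting the clustering structure in the solutions to accelerate the global search.
  • results: The method is evaluated on De Jong's 5th function and a low-thrust circular restricted three-body problem.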
    Abstract Preliminary trajectory design is a global search problem that seeks multiple qualitatively different solutions to a trajectory optimization problem. Due to its high dimensionality and non-convexity, and the frequent adjustment of problem parameters, the global search becomes computationally demanding. In this paper, we exploit the clustering structure in the solutions and propose an amortized global search (AmorGS) framework. We use deep generative models to predict trajectory solutions that share similar structures with previously solved problems, which accelerates the global search for unseen parameter values. Our method is evaluated using De Jong's 5th function and a low-thrust circular restricted three-body problem.
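    Summary Preliminary trajectory design is a global search problem that seeks multiple qualitatively different solutions to a trajectory optimization problem. Because of its high dimensionality and non-convexity, and because problem parameters are adjusted frequently, the global search becomes computationally demanding. In this paper, we exploit the clustering structure in the solutions and propose an amortized global search (AmorGS) framework. We use deep generative models to predict trajectory solutions that share similar structures with previously solved problems, which accelerates the global search for unseen parameter values. Our method is evaluated on De Jong's 5th function and a low-thrust circular restricted three-body problem.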
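
    De Jong's 5th function (also known as Shekel's foxholes) is a standard test case for global search because it has 25 well-separated local minima on a regular grid. The sketch below uses the commonly cited two-dimensional definition; the function name `de_jong_5` is illustrative, and the paper's exact parameterization is not given here and may differ.

```python
def de_jong_5(x1, x2):
    """De Jong's 5th function in its commonly cited 2-D form.

    The 25 foxholes sit on the grid {-32, -16, 0, 16, 32}^2; the global
    minimum (about 0.998) is at (-32, -32).
    """
    grid = [-32, -16, 0, 16, 32]
    total = 0.002
    for j in range(25):
        a1 = grid[j % 5]   # first coordinate cycles fastest
        a2 = grid[j // 5]  # second coordinate changes every five foxholes
        total += 1.0 / (j + 1 + (x1 - a1) ** 6 + (x2 - a2) ** 6)
    return 1.0 / total
```

    Each foxhole is a distinct basin, so a local optimizer started at random lands in different minima on different runs. That clustering of solutions is the kind of structure a generative model can learn and reuse to warm-start later searches.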

Fixed Inter-Neuron Covariability Induces Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.03956
  • repo_url: None
  • paper_authors: Muhammad Ahmed Shah, Bhiksha Raj
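  • for: The paper aims to improve the adversarial robustness of Deep Neural Networks (DNNs), and explores whether properties of human perception can help make DNNs more robust.
  • methods: The paper proposes the Self-Consistent Activation (SCA) layer, whose neurons have activations that are consistent with each other because they conform to a fixed, but learned, covariability pattern.
  • results: On image and sound recognition tasks, models with an SCA layer achieve high accuracy and are significantly more robust to Auto-PGD attacks than multi-layer perceptron models, without being trained on adversarially perturbed data.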
    Abstract The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs) that raises questions about their reliability in real-world scenarios. On the other hand, human perception, which DNNs are supposed to emulate, is highly robust to such perturbations, indicating that there may be certain features of human perception that make it robust but are not represented in the current class of DNNs. One such feature is that the activity of biological neurons is correlated and the structure of this correlation tends to be rather rigid over long spans of time, even if it hampers performance and learning. We hypothesize that integrating such constraints on the activations of a DNN would improve its adversarial robustness, and, to test this hypothesis, we have developed the Self-Consistent Activation (SCA) layer, which comprises neurons whose activations are consistent with each other, as they conform to a fixed, but learned, covariability pattern. When evaluated on image and sound recognition tasks, the models with an SCA layer achieved high accuracy, and exhibited significantly greater robustness than multi-layer perceptron models to state-of-the-art Auto-PGD adversarial attacks without being trained on adversarially perturbed data.

PMU measurements based short-term voltage stability assessment of power systems via deep transfer learning

  • paper_url: http://arxiv.org/abs/2308.03953
  • repo_url: https://github.com/SuperBruceJia/Power-Systems-Stability-Transfer-Learning
  • paper_authors: Yang Li, Shitu Zhang, Yuanzheng Li, Jiting Cao, Shuyue Jia
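  • for: The paper proposes a deep learning-based short-term voltage stability assessment (STVSA) method that addresses existing limitations in adapting to topological changes, sample labeling, and handling small datasets.
  • methods: The method uses deep transfer learning. It builds an initial dataset from PMU measurements, applies temporal ensembling for sample labeling, and uses least squares generative adversarial networks (LSGAN) for data augmentation, which enables effective deep learning on small datasets and adaptability to topological changes.
  • results: Experiments on the IEEE 39-bus test system show that transfer learning improves model evaluation accuracy by approximately 20% and that the method adapts well to topological changes. By leveraging the self-attention mechanism of the Transformer model, the method offers significant advantages over shallow learning methods and other deep learning-based approaches.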
    Abstract Deep learning has emerged as an effective solution for addressing the challenges of short-term voltage stability assessment (STVSA) in power systems. However, existing deep learning-based STVSA approaches face limitations in adapting to topological changes, sample labeling, and handling small datasets. To overcome these challenges, this paper proposes a novel phasor measurement unit (PMU) measurements-based STVSA method by using deep transfer learning. The method leverages the real-time dynamic information captured by PMUs to create an initial dataset. It employs temporal ensembling for sample labeling and utilizes least squares generative adversarial networks (LSGAN) for data augmentation, enabling effective deep learning on small-scale datasets. Additionally, the method enhances adaptability to topological changes by exploring connections between different faults. Experimental results on the IEEE 39-bus test system demonstrate that the proposed method improves model evaluation accuracy by approximately 20% through transfer learning, exhibiting strong adaptability to topological changes. Leveraging the self-attention mechanism of the Transformer model, this approach offers significant advantages over shallow learning methods and other deep learning-based approaches.
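    Summary Deep learning has become an effective solution for short-term voltage stability assessment (STVSA) in power systems. However, existing deep learning-based STVSA approaches are limited in adapting to topological changes, in sample labeling, and in handling small datasets. To overcome these challenges, this paper proposes a new PMU measurements-based STVSA method using deep transfer learning. The method uses the real-time dynamic information captured by PMUs to create an initial dataset. It employs temporal ensembling for sample labeling and least squares generative adversarial networks (LSGAN) for data augmentation, enabling effective deep learning on small-scale datasets. In addition, the method improves adaptability to topological changes by exploring connections between different faults. Experimental results on the IEEE 39-bus test system show that the proposed method improves evaluation accuracy by approximately 20% through transfer learning and adapts well to topological changes. Leveraging the self-attention mechanism of the Transformer model, the approach offers significant advantages over shallow learning methods and other deep learning-based approaches.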
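
    Temporal ensembling, in its commonly cited form, produces training targets for unlabeled samples by averaging a network's predictions over epochs. The sketch below follows that general formulation as an assumption; the abstract does not specify which variant the paper uses, and `update_ensemble` and `alpha` are illustrative names.

```python
def update_ensemble(ensemble, predictions, epoch, alpha=0.6):
    """One temporal-ensembling update for a batch of per-sample predictions.

    ensemble:    running exponential moving average (start with zeros)
    predictions: the network's current outputs for the same samples
    epoch:       1-based epoch index, used for startup bias correction
    Returns the new running average and the bias-corrected targets.
    """
    new_ensemble = [
        alpha * acc + (1 - alpha) * pred
        for acc, pred in zip(ensemble, predictions)
    ]
    # Divide by (1 - alpha^epoch) so early targets are not biased toward zero.
    correction = 1 - alpha ** epoch
    targets = [acc / correction for acc in new_ensemble]
    return new_ensemble, targets
```

    After the first epoch the corrected target equals the first prediction; later epochs smooth out noise in individual predictions, which is what makes the averaged output usable as a pseudo-label.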

The Prospect of Enhancing Large-Scale Heterogeneous Federated Learning with Transformers

  • paper_url: http://arxiv.org/abs/2308.03945
  • repo_url: None
  • paper_authors: Yulan Gao, Zhaoxiang Hou, Chengyi Yang, Zengxiang Li, Han Yu
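  • for: The paper investigates the prospect of Transformer-based federated learning (FL) models for collaboratively training AI models across many heterogeneous data owners while achieving both generalization and personalization.
  • methods: The paper runs comparative experiments on FL with Transformers, ResNet, and personalized ResNet-based FL approaches, under various scenarios and with varying numbers of data owners.
  • results: Transformer-based FL models perform well in large-scale heterogeneous settings, particularly as data heterogeneity and the number of data owners grow. An analysis of Centered Kernel Alignment (CKA) representation similarity gives insight into why Transformers perform well.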
    Abstract Federated learning (FL) addresses data privacy concerns by enabling collaborative training of AI models across distributed data owners. Wide adoption of FL faces the fundamental challenges of data heterogeneity and the large scale of data owners involved. In this paper, we investigate the prospect of Transformer-based FL models for achieving generalization and personalization in this setting. We conduct extensive comparative experiments involving FL with Transformers, ResNet, and personalized ResNet-based FL approaches under various scenarios. These experiments consider varying numbers of data owners to demonstrate Transformers' advantages over deep neural networks in large-scale heterogeneous FL tasks. In addition, we analyze the superior performance of Transformers by comparing the Centered Kernel Alignment (CKA) representation similarity across different layers and FL models to gain insight into the reasons behind their promising capabilities.
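    Summary Federated learning (FL) addresses data privacy concerns by enabling collaborative training of AI models across distributed data owners. Wide adoption of FL faces the fundamental challenges of data heterogeneity and the large number of data owners involved. In this paper, we investigate the prospect of Transformer-based FL models for achieving generalization and personalization in this setting. We conduct extensive comparative experiments involving FL with Transformers, ResNet, and personalized ResNet-based FL approaches under various scenarios. These experiments cover varying numbers of data owners to demonstrate the advantages of Transformers in large-scale heterogeneous FL tasks. In addition, we analyze the reasons for the superior performance of Transformers by comparing CKA representation similarity across different layers and FL models.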
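
    Centered Kernel Alignment compares two sets of representations of the same examples and returns a similarity between 0 and 1. The sketch below implements the widely used linear variant in plain Python, as an illustration only; the abstract does not say which CKA variant or kernel the paper uses, and the helper names are hypothetical.

```python
import math


def center_columns(m):
    """Subtract each feature's mean so every column sums to zero."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices with matching rows."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between activations x (n x p) and y (n x q).

    Rows are examples, columns are features. Assumes neither input has
    zero variance, otherwise the denominator is zero.
    """
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return cross_frobenius_sq(x, y) / denom
```

    Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so two layers that encode the same information in rotated coordinates still score 1. That property is what makes it suitable for comparing layers across different models or clients.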

GraPhSyM: Graph Physical Synthesis Model

  • paper_url: http://arxiv.org/abs/2308.03944
  • repo_url: None
  • paper_authors: Ahmed Agiza, Rajarshi Roy, Teodor Dumitru Ene, Saad Godil, Sherief Reda, Bryan Catanzaro
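  • for: The work develops a Graph Attention Network (GATv2) model that quickly and accurately estimates post-physical-synthesis circuit delay and area metrics, giving early stages such as logic synthesis accurate visibility of final design metrics without running the slow physical synthesis flow.
  • methods: The model uses graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations. It is trained on 6000 prefix adder designs and predicts post-synthesis delay and area for unseen designs with a 0.22 s inference time.
  • results: The model accurately predicts the post-synthesis delay (98.3%) and area (96.1%) of unseen adders. A model trained at one fixed delay target also predicts metrics at a variety of unseen delay targets, illustrating its compositionality, and it shows promising generalization to circuits other than adders.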
    Abstract In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) model for fast and accurate estimation of post-physical synthesis circuit delay and area metrics from pre-physical synthesis circuit netlists. Once trained, GraPhSyM provides accurate visibility of final design metrics to early EDA stages, such as logic synthesis, without running the slow physical synthesis flow, enabling global co-optimization across stages. Additionally, the swift and precise feedback provided by GraPhSym is instrumental for machine-learning-based EDA optimization frameworks. Given a gate-level netlist of a circuit represented as a graph, GraPhSyM utilizes graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing. When trained on a dataset of 6000 prefix adder designs synthesized at an aggressive delay target, GraPhSyM can accurately predict the post-synthesis delay (98.3%) and area (96.1%) metrics of unseen adders with a fast 0.22s inference time. Furthermore, we illustrate the compositionality of GraPhSyM by employing the model trained on a fixed delay target to accurately anticipate post-synthesis metrics at a variety of unseen delay targets. Lastly, we report promising generalization capabilities of the GraPhSyM model when it is evaluated on circuits different from the adders it was exclusively trained on. The results show the potential for GraPhSyM to serve as a powerful tool for advanced optimization techniques and as an oracle for EDA machine learning frameworks.
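    Summary In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) model for fast and accurate estimation of post-physical-synthesis circuit delay and area metrics. Once trained, GraPhSyM gives early stages such as logic synthesis accurate visibility of final design metrics without running the slow physical synthesis flow, enabling global co-optimization across stages. Its fast and accurate feedback is also valuable for machine-learning-based EDA optimization frameworks. Given a gate-level netlist represented as a graph, GraPhSyM uses graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing. When trained on 6000 prefix adder designs synthesized at an aggressive delay target, GraPhSyM accurately predicts the delay (98.3%) and area (96.1%) metrics of unseen adders with a fast 0.22 s inference time. We also illustrate the compositionality of GraPhSyM: a model trained at a fixed delay target accurately predicts metrics at unseen delay targets. Finally, we report promising generalization when the model is evaluated on circuits other than the adders it was trained on. The results show that GraPhSyM has the potential to serve as a powerful tool for advanced optimization techniques and as an oracle for EDA machine learning frameworks.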

The Compatibility between the Pangu Weather Forecasting Model and Meteorological Operational Data

  • paper_url: http://arxiv.org/abs/2308.04460
  • repo_url: None
  • paper_authors: Wencong Cheng, Yan Yan, Jiangjiang Xia, Qi Liu, Chang Qu, Zhigang Wang
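  • for: The study evaluates the compatibility of the Pangu-Weather model with several commonly used numerical weather prediction (NWP) operational analyses, and how its forecasting performance can be improved.
  • methods: The study runs Pangu-Weather forecasts initialized from different NWP operational analyses and compares them through case studies.
  • results: Pangu-Weather is compatible with operational analyses from various NWP systems as initial conditions and shows relatively stable forecasting capability. Improving the quality of global or local initial conditions significantly improves its forecasting performance.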
    Abstract Recently, multiple data-driven models based on machine learning for weather forecasting have emerged. These models are highly competitive in terms of accuracy compared to traditional numerical weather prediction (NWP) systems. In particular, the Pangu-Weather model, which is open source for non-commercial use, has been validated for its forecasting performance by the European Centre for Medium-Range Weather Forecasts (ECMWF) and has recently been published in the journal "Nature". In this paper, we evaluate the compatibility of the Pangu-Weather model with several commonly used NWP operational analyses through case studies. The results indicate that the Pangu-Weather model is compatible with different operational analyses from various NWP systems as the model initial conditions, and it exhibits a relatively stable forecasting capability. Furthermore, we have verified that improving the quality of global or local initial conditions significantly contributes to enhancing the forecasting performance of the Pangu-Weather model.
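    Summary Recently, multiple data-driven weather forecasting models based on machine learning have emerged, and they are highly competitive in accuracy with traditional numerical weather prediction (NWP) systems. Among them, the Pangu-Weather model, which is open source for non-commercial use, has had its forecasting performance validated by the European Centre for Medium-Range Weather Forecasts (ECMWF) and was recently published in the journal "Nature". In this paper, we evaluate the compatibility of the Pangu-Weather model with several commonly used NWP operational analyses through case studies. The results show that Pangu-Weather is compatible with operational analyses from different NWP systems and exhibits relatively stable forecasting capability. We also verify that improving the quality of global or local initial conditions significantly improves the forecasting performance of the Pangu-Weather model.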

Optimizing the switching operation in monoclonal antibody production: Economic MPC and reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.03928
  • repo_url: None
  • paper_authors: Sandra A. Obiri, Song Bo, Bernard T. Agyeman, Benjamin Decardi-Nelson, Jinfeng Liu
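  • for: The paper addresses the large-scale production of monoclonal antibodies (mAbs), and how a continuous, integrated manufacturing process can improve product yield and quality.
  • methods: The paper introduces two computationally efficient approaches for implementing economic MPC (EMPC), a sigmoid function approximation and a rectified linear unit (ReLU) approximation, and also explores deep reinforcement learning (DRL). All three target the discrete capture-column switching operation, which otherwise requires solving an integer nonlinear program online.
  • results: The experiments indicate that the sigmoid and ReLU approximation approaches improve throughput and production efficiency, and are more flexible and effective than the traditional 1% product breakthrough rule.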
    Abstract Monoclonal antibodies (mAbs) have emerged as indispensable assets in medicine, and are currently at the forefront of biopharmaceutical product development. However, the growing market demand and the substantial doses required for mAb clinical treatments necessitate significant progress in its large-scale production. Most of the processes for industrial mAb production rely on batch operations, which result in significant downtime. The shift towards a fully continuous and integrated manufacturing process holds the potential to boost product yield and quality, while eliminating the extra expenses associated with storing intermediate products. The integrated continuous mAb production process can be divided into the upstream and downstream processes. One crucial aspect that ensures the continuity of the integrated process is the switching of the capture columns, which are typically chromatography columns operated in a fed-batch manner downstream. Due to the discrete nature of the switching operation, advanced process control algorithms such as economic MPC (EMPC) are computationally difficult to implement. This is because an integer nonlinear program (INLP) needs to be solved online at each sampling time. This paper introduces two computationally-efficient approaches for EMPC implementation, namely, a sigmoid function approximation approach and a rectified linear unit (ReLU) approximation approach. It also explores the application of deep reinforcement learning (DRL). These three methods are compared to the traditional switching approach which is based on a 1% product breakthrough rule and which involves no optimization.
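
    The difficulty described above comes from the switching decision being a 0/1 variable. As a rough illustration of the general idea only, and not the paper's formulation, the sketch below contrasts a threshold rule with two continuous surrogates of a binary switch: a sigmoid and a clamped ramp built from two ReLU terms. All function names, the `steepness` parameter, and the way the threshold enters are assumptions made for this example.

```python
import math


def breakthrough_rule(outlet_conc, feed_conc, threshold=0.01):
    """Rule-based switching: switch (1) once outlet/feed reaches the threshold."""
    return 1 if outlet_conc / feed_conc >= threshold else 0


def sigmoid_switch(value, threshold, steepness=50.0):
    """Smooth surrogate for a 0/1 decision; differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-steepness * (value - threshold)))


def relu(z):
    return max(0.0, z)


def relu_switch(value, threshold, steepness=50.0):
    """Piecewise-linear surrogate: a ramp from 0 to 1 built from two ReLUs."""
    z = steepness * (value - threshold) + 0.5
    return relu(z) - relu(z - 1.0)
```

    Replacing the integer variable with either surrogate lets a standard nonlinear solver handle the optimization with continuous variables only. A larger `steepness` tracks the true switch more closely at the cost of a harder, more ill-conditioned problem.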

Spellburst: A Node-based Interface for Exploratory Creative Coding with Natural Language Prompts

  • paper_url: http://arxiv.org/abs/2308.03921
  • repo_url: None
  • paper_authors: Tyler Angert, Miroslav Ivan Suzara, Jenny Han, Christopher Lawrence Pondoc, Hariharan Subramonyam
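  • for: The paper aims to make creative coding more efficient and effective, helping artists realize their ideas more quickly.
  • methods: The paper uses a large language model (LLM) to power a node-based creative-coding environment, with expressive prompt-based interactions that let artists program in the semantic space.
  • results: The evaluation with artists shows that Spellburst can enhance creative coding practices and can inform the design of computational creativity tools that bridge semantic and syntactic spaces.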
    Abstract Creative coding tasks are often exploratory in nature. When producing digital artwork, artists usually begin with a high-level semantic construct such as a "stained glass filter" and programmatically implement it by varying code parameters such as shape, color, lines, and opacity to produce visually appealing results. Based on interviews with artists, it can be effortful to translate semantic constructs to program syntax, and current programming tools don't lend well to rapid creative exploration. To address these challenges, we introduce Spellburst, a large language model (LLM) powered creative-coding environment. Spellburst provides (1) a node-based interface that allows artists to create generative art and explore variations through branching and merging operations, (2) expressive prompt-based interactions to engage in semantic programming, and (3) dynamic prompt-driven interfaces and direct code editing to seamlessly switch between semantic and syntactic exploration. Our evaluation with artists demonstrates Spellburst's potential to enhance creative coding practices and inform the design of computational creativity tools that bridge semantic and syntactic spaces.
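    Summary Creative coding tasks are often exploratory in nature. When producing digital artwork, artists usually begin with a high-level semantic construct such as a "stained glass filter" and then vary code parameters such as shape, color, lines, and opacity to produce visually appealing results. Interviews with artists indicate that translating semantic constructs into program syntax can be effortful, and current programming tools are poorly suited to rapid creative exploration. To address these challenges, we introduce Spellburst, a creative-coding environment powered by a large language model (LLM). Spellburst provides (1) a node-based interface that lets artists create generative art and explore variations through branching and merging operations, (2) expressive prompt-based interactions for semantic programming, and (3) dynamic prompt-driven interfaces and direct code editing for switching easily between semantic and syntactic exploration. Our evaluation shows that Spellburst can enhance creative coding practices and inform the design of computational creativity tools.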

Predicting and explaining nonlinear material response using deep Physically Guided Neural Networks with Internal Variables

  • paper_url: http://arxiv.org/abs/2308.03915
  • repo_url: None
  • paper_authors: Javier Orera-Echeverria, Jacobo Ayensa-Jiménez, Manuel Doblare
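  • for: The study uses Physically Guided Neural Networks with Internal Variables (PGNNIV) to discover the constitutive law of a material and to predict internal and external variables under unseen load scenarios.
  • methods: The study applies the recently developed PGNNIV method, which uses the physics of the problem to enforce constraints on specific hidden layers and is trained solely on measured force-displacement data.
  • results: PGNNIVs predict both internal and external variables for different kinds of material and unravel the material's constitutive law, which places the method within eXplainable Artificial Intelligence (XAI).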
    Abstract Nonlinear materials are often difficult to model with classical state model theory because they have a complex and sometimes inaccurate physical and mathematical description or we simply do not know how to describe such materials in terms of relations between external and internal variables. In many disciplines, Neural Network methods have arisen as powerful tools to identify very complex and non-linear correlations. In this work, we use the very recently developed concept of Physically Guided Neural Networks with Internal Variables (PGNNIV) to discover constitutive laws using a model-free approach and training solely with measured force-displacement data. PGNNIVs make a particular use of the physics of the problem to enforce constraints on specific hidden layers and are able to make predictions without internal variable data. We demonstrate that PGNNIVs are capable of predicting both internal and external variables under unseen load scenarios, regardless of the nature of the material considered (linear, with hardening or softening behavior and hyperelastic), unravelling the constitutive law of the material hence explaining its nature altogether, placing the method in what is known as eXplainable Artificial Intelligence (XAI).

ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition

  • paper_url: http://arxiv.org/abs/2308.03908
  • repo_url: None
  • paper_authors: Soumyabrata Chaudhuri, Saumik Bhattacharya
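  • for: The study proposes a human action recognition method based on multi-modal learning to address the inherent complexity of video action recognition.
  • methods: The study uses a new pose-augmented vision-language model (VLM) that combines pose, visual information, and text attributes.
  • results: Experiments show that the method reaches 92.81% and 73.02% accuracy on the UCF-101 and HMDB-51 human action recognition datasets without any video data pre-training, and 96.11% and 75.75% after Kinetics pre-training.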
    Abstract Video Action Recognition (VAR) is a challenging task due to its inherent complexities. Though different approaches have been explored in the literature, designing a unified framework to recognize a large number of human actions is still a challenging problem. Recently, Multi-Modal Learning (MML) has demonstrated promising results in this domain. In literature, 2D skeleton or pose modality has often been used for this task, either independently or in conjunction with the visual information (RGB modality) present in videos. However, the combination of pose, visual information, and text attributes has not been explored yet, though text and pose attributes independently have been proven to be effective in numerous computer vision tasks. In this paper, we present the first pose augmented Vision-language model (VLM) for VAR. Notably, our scheme achieves an accuracy of 92.81% and 73.02% on two popular human video action recognition benchmark datasets, UCF-101 and HMDB-51, respectively, even without any video data pre-training, and an accuracy of 96.11% and 75.75% after kinetics pre-training.
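    Summary Video Action Recognition (VAR) is a challenging task because of its inherent complexities. Although different approaches have been explored in the literature, designing a unified framework that recognizes a large number of human actions remains difficult. In the literature, the 2D skeleton or pose modality has often been used for this task, either independently or together with the visual information (RGB modality) in videos. However, the combination of pose, visual information, and text attributes has not yet been explored, even though text and pose attributes have each proven effective in numerous computer vision tasks. In this paper, we present the first pose-augmented vision-language model (VLM) for VAR. It achieves 92.81% and 73.02% accuracy on the two popular human action recognition benchmark datasets UCF-101 and HMDB-51, respectively, without any video data pre-training, and 96.11% and 75.75% after Kinetics pre-training.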

Advancements In Crowd-Monitoring System: A Comprehensive Analysis of Systematic Approaches and Automation Algorithms: State-of-The-Art

  • paper_url: http://arxiv.org/abs/2308.03907
  • repo_url: None
  • paper_authors: Mohammed Ameen, Richard Stone
  • for: This paper focuses on the development and analysis of crowd monitoring systems, specifically exploring the use of artificial intelligence (AI) algorithms and models to enhance their effectiveness and security.
  • methods: The paper employs a bifurcated approach, comparing vision-based and non-vision-based technologies for crowd monitoring, and examines the efficacy of these methods in different environments and contexts.
  • results: The paper presents an in-depth analysis of the recent incorporation of AI algorithms and models into automated crowd monitoring systems, highlighting their contemporary applications and effectiveness in various contexts.
    Abstract Growing apprehensions surrounding public safety have captured the attention of numerous governments and security agencies across the globe. These entities are increasingly acknowledging the imperative need for reliable and secure crowd-monitoring systems to address these concerns. Effectively managing human gatherings necessitates proactive measures to prevent unforeseen events or complications, ensuring a safe and well-coordinated environment. The scarcity of research focusing on crowd monitoring systems and their security implications has given rise to a burgeoning area of investigation, exploring potential approaches to safeguard human congregations effectively. Crowd monitoring systems depend on a bifurcated approach, encompassing vision-based and non-vision-based technologies. An in-depth analysis of these two methodologies will be conducted in this research. The efficacy of these approaches is contingent upon the specific environment and temporal context in which they are deployed, as they each offer distinct advantages. This paper endeavors to present an in-depth analysis of the recent incorporation of artificial intelligence (AI) algorithms and models into automated systems, emphasizing their contemporary applications and effectiveness in various contexts.
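    Summary Governments and security agencies around the world are increasingly concerned about public safety and recognize the need for reliable, secure crowd-monitoring systems. Managing human gatherings requires proactive measures to prevent unforeseen events or complications and to keep the environment safe and well coordinated. Because research on crowd-monitoring systems and their security implications is scarce, this has become a growing area of investigation into how to safeguard gatherings effectively. Crowd-monitoring systems follow a two-pronged approach, comprising vision-based and non-vision-based technologies. This study analyzes both in depth; their effectiveness depends on the environment and time in which they are deployed, and each offers distinct advantages. The paper highlights the recent incorporation of artificial intelligence (AI) algorithms and models into automated systems and examines their contemporary applications and effectiveness in various contexts.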

Intelligent Assistant Language Understanding On Device

  • paper_url: http://arxiv.org/abs/2308.03905
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Cecilia Aas, Hisham Abdelsalam, Irina Belousova, Shruti Bhargava, Jianpeng Cheng, Robert Daland, Joris Driesen, Federico Flego, Tristan Guigue, Anders Johannsen, Partha Lal, Jiarui Lu, Joel Ruben Antony Moniz, Nathan Perkins, Dhivya Piraviperumal, Stephen Pulman, Diarmuid Ó Séaghdha, David Q. Sun, John Torr, Marco Del Vecchio, Jay Wacker, Jason D. Williams, Hong Yu
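  • for: This paper describes the design of a natural language understanding system that runs on personal devices.
  • methods: The paper explains what led to its key choices of architecture and technologies; for example, some approaches from the dialog systems literature are difficult to maintain over time in a deployment setting.
  • results: Compared with a server-based assistant, the on-device system is described as more private, more reliable, faster, more expressive, and more accurate.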
    Abstract It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and technologies. For example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. We hope that sharing learnings from our practical experiences may help inform future work in the research community.
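    Summary It is now feasible to run personal digital assistants on phones and other personal devices. This paper describes the design of a natural language understanding system that runs on device. Compared with a server-based assistant, the system is more private, more reliable, faster, more expressive, and more accurate. The authors explain the key architecture and technology choices; for example, some approaches from the dialog systems literature can be difficult to maintain in a deployment setting. They hope that sharing their practical experience can help inform future work in the research community.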

On genuine invariance learning without weight-tying

  • paper_url: http://arxiv.org/abs/2308.03904
  • repo_url: https://github.com/amoskalev/ginvariance
  • paper_authors: Artem Moskalev, Anna Sepliarskaia, Erik J. Bekkers, Arnold Smeulders
  • for: investigate properties and limitations of invariance learned by neural networks from the data compared to the genuine invariance achieved through invariant weight-tying.
  • methods: adopt a group theoretical perspective and analyze invariance learning in neural networks without weight-tying constraints.
  • results: demonstrate that even when a network learns to correctly classify samples on a group orbit, the underlying decision-making in such a model does not attain genuine invariance, and propose several metrics to quantify learned invariance.
    Abstract In this paper, we investigate properties and limitations of invariance learned by neural networks from the data compared to the genuine invariance achieved through invariant weight-tying. To do so, we adopt a group theoretical perspective and analyze invariance learning in neural networks without weight-tying constraints. We demonstrate that even when a network learns to correctly classify samples on a group orbit, the underlying decision-making in such a model does not attain genuine invariance. Instead, learned invariance is strongly conditioned on the input data, rendering it unreliable if the input distribution shifts. We next demonstrate how to guide invariance learning toward genuine invariance by regularizing the invariance of a model during training. To this end, we propose several metrics to quantify learned invariance: (i) predictive distribution invariance, (ii) logit invariance, and (iii) saliency invariance similarity. We show that the invariance learned with the invariance error regularization closely resembles the genuine invariance of weight-tying models and reliably holds even under a severe input distribution shift. Closer analysis of the learned invariance also reveals the spectral decay phenomenon, when a network chooses to achieve the invariance to a specific transformation group by reducing the sensitivity to any input perturbation.
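    Summary This paper studies the properties and limitations of invariance that neural networks learn from data, compared with the genuine invariance obtained through invariant weight-tying. Taking a group-theoretic view, it analyzes invariance learning in networks without weight-tying constraints. Even when a network correctly classifies samples along a group orbit, its underlying decision-making does not attain genuine invariance. Instead, the learned invariance depends strongly on the input data, so it becomes unreliable when the input distribution shifts. The authors then show how to steer learning toward genuine invariance by regularizing invariance during training. To that end they propose several metrics of learned invariance: (i) predictive distribution invariance, (ii) logit invariance, and (iii) saliency invariance similarity. Invariance learned with the invariance error regularization closely resembles the genuine invariance of weight-tying models and holds reliably under distribution shift. Further analysis reveals a spectral decay phenomenon, in which a network achieves invariance to a specific transformation group by reducing its sensitivity to any input perturbation.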
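The idea of measuring how far a model's logits move along a group orbit can be illustrated with a small sketch. This is not the paper's definition of logit invariance: the 90-degree rotation group, the variance-style error, and the two toy models below are all assumptions chosen for illustration.

```python
import numpy as np

def rotate90_orbit(x):
    """Orbit of a square image under the four 90-degree rotations (assumed group)."""
    return [np.rot90(x, k) for k in range(4)]

def logit_invariance_error(model, x, orbit_fn=rotate90_orbit):
    """Mean squared deviation of the logits on the orbit from their orbit mean.

    Zero means the logits are identical for every transformed input.
    """
    logits = np.stack([model(g_x) for g_x in orbit_fn(x)])
    return float(np.mean((logits - logits.mean(axis=0)) ** 2))

def invariant_model(x):
    # Toy model built only from rotation-invariant statistics.
    return np.array([x.sum(), (x ** 2).sum()])

def non_invariant_model(x):
    # Toy model that reads fixed pixel positions, so rotations change its output.
    return np.array([x[0, 0], x[0, -1]])

image = np.arange(16.0).reshape(4, 4)
print(logit_invariance_error(invariant_model, image))
print(logit_invariance_error(non_invariant_model, image))
```

A quantity of this kind could be added to a training loss as a penalty, which is the spirit of the regularization the abstract describes.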

FLIPS: Federated Learning using Intelligent Participant Selection

  • paper_url: http://arxiv.org/abs/2308.03901
  • repo_url: None
  • paper_authors: Rahul Atul Bhope, K. R. Jayaram, Nalini Venkatasubramanian, Ashish Verma, Gegi Thomas
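  • for: This work presents the design and implementation of FLIPS, a middleware system for managing data and participant heterogeneity in federated learning (FL) training workloads.
  • methods: FLIPS clusters the parties by the label distribution of their data and, during FL training, ensures that each cluster is equitably represented among the selected participants. It supports the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt, and FedYogi, and includes a straggler management mechanism to handle platform heterogeneity and dynamic resource availability in distributed settings.
  • results: In an empirical evaluation against random participant selection, Oort, and gradient clustering, FLIPS improves convergence, reaching 17-20% higher accuracy with 20-60% lower communication costs, and these benefits persist in the presence of straggler participants.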
    Abstract This paper presents the design and implementation of FLIPS, a middleware system to manage data and participant heterogeneity in federated learning (FL) training workloads. In particular, we examine the benefits of label distribution clustering on participant selection in federated learning. FLIPS clusters parties involved in an FL training job based on the label distribution of their data apriori, and during FL training, ensures that each cluster is equitably represented in the participants selected. FLIPS can support the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt and FedYogi. To manage platform heterogeneity and dynamic resource availability, FLIPS incorporates a straggler management mechanism to handle changing capacities in distributed, smart community applications. Privacy of label distributions, clustering and participant selection is ensured through a trusted execution environment (TEE). Our comprehensive empirical evaluation compares FLIPS with random participant selection, as well as two other "smart" selection mechanisms - Oort and gradient clustering using two real-world datasets, two different non-IID distributions and three common FL algorithms (FedYogi, FedProx and FedAvg). We demonstrate that FLIPS significantly improves convergence, achieving higher accuracy by 17 - 20 % with 20 - 60 % lower communication costs, and these benefits endure in the presence of straggler participants.
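The selection idea in the abstract (cluster parties by label distribution, then pick participants so that every cluster is represented) can be sketched as follows. This is an illustrative sketch, not FLIPS itself: the tiny k-means routine, the round-robin selection, and all function names are assumptions, and the TEE and straggler handling are omitted.

```python
import numpy as np

def label_distribution(labels, num_classes):
    """Normalised histogram of class labels held by one party."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means (assumed clustering choice); returns cluster ids."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = points[assign == c].mean(axis=0)
    return assign

def select_participants(assign, num_selected, rng):
    """Round-robin over clusters so each cluster is represented as evenly as possible."""
    clusters = [list(rng.permutation(np.flatnonzero(assign == c)))
                for c in np.unique(assign)]
    chosen = []
    while len(chosen) < num_selected and any(clusters):
        for members in clusters:
            if members and len(chosen) < num_selected:
                chosen.append(int(members.pop()))
    return chosen

# Example: eight parties already grouped into three clusters.
assign = np.array([0, 0, 0, 0, 1, 1, 2, 2])
print(select_participants(assign, 3, np.random.default_rng(0)))
```

In a real round, `assign` would come from `kmeans` applied to the stacked `label_distribution` vectors of all parties.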

Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data

  • paper_url: http://arxiv.org/abs/2308.03892
  • repo_url: https://github.com/anupshakya07/attn-scaling
  • paper_authors: Anup Shakya, Vasile Rus, Deepak Venugopal
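  • for: This work aims to improve math learning with Intelligent Tutoring Systems (ITSs) and Adaptive Instructional Systems (AISs) by predicting students' problem-solving strategies.
  • methods: The authors use machine learning and AI methods to predict each student's strategy so the system can personalize itself. They first learn a mastery representation of students (MVec), then group these representations with a non-parametric clustering method, and finally train a deep neural network (DNN) on instances sampled from the clusters to predict strategies.
  • results: Experiments on large-scale real-world student interaction data from MATHia use transformers and Node2Vec to learn MVec and LSTMs to predict strategies. The approach scales to large datasets with high accuracy and has predictive equality, i.e., it predicts strategies equally well for learners at diverse skill levels.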
    Abstract Understanding a student's problem-solving strategy can have a significant impact on effective math learning using Intelligent Tutoring Systems (ITSs) and Adaptive Instructional Systems (AISs). For instance, the ITS/AIS can better personalize itself to correct specific misconceptions that are indicated by incorrect strategies, specific problems can be designed to improve strategies and frustration can be minimized by adapting to a student's natural way of thinking rather than trying to fit a standard strategy for all. While it may be possible for human experts to identify strategies manually in classroom settings with sufficient student interaction, it is not possible to scale this up to big data. Therefore, we leverage advances in Machine Learning and AI methods to perform scalable strategy prediction that is also fair to students at all skill levels. Specifically, we develop an embedding called MVec where we learn a representation based on the mastery of students. We then cluster these embeddings with a non-parametric clustering method where we progressively learn clusters such that we group together instances that have approximately symmetrical strategies. The strategy prediction model is trained on instances sampled from these clusters. This ensures that we train the model over diverse strategies and also that strategies from a particular group do not bias the DNN model, thus allowing it to optimize its parameters over all groups. Using real world large-scale student interaction datasets from MATHia, we implement our approach using transformers and Node2Vec for learning the mastery embeddings and LSTMs for predicting strategies. We show that our approach can scale up to achieve high accuracy by training on a small sample of a large dataset and also has predictive equality, i.e., it can predict strategies equally well for learners at diverse skill levels.
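    Summary Understanding a student's problem-solving strategy can have a significant impact on effective math learning with Intelligent Tutoring Systems (ITSs) and Adaptive Instructional Systems (AISs). For example, an ITS/AIS can personalize itself to correct the specific misconceptions indicated by incorrect strategies, design problems that improve strategies, and reduce student frustration. Human experts may be able to identify strategies manually in classroom settings, but this does not scale to big data. The authors therefore use machine learning and AI methods for scalable strategy prediction that is also fair to students at all skill levels. They develop an embedding called MVec that represents students' mastery, cluster these embeddings with a non-parametric clustering method that progressively groups instances with approximately symmetrical strategies, and train the strategy prediction model on instances sampled from those clusters. This exposes the model to diverse strategies and keeps strategies from any one group from biasing the DNN, so its parameters are optimized over all groups. Using large-scale real-world student interaction data from MATHia, they implement the approach with transformers and Node2Vec for the mastery embeddings and LSTMs for strategy prediction. The approach scales to high accuracy by training on a small sample of a large dataset and has predictive equality, i.e., it predicts strategies equally well for learners at diverse skill levels.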

  • paper_url: http://arxiv.org/abs/2308.03883
  • repo_url: https://github.com/northeastern-datalab/alt-gen
  • paper_authors: Koyena Pal, Aamod Khatiwada, Roee Shraga, Renée J. Miller
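  • for: This paper proposes using generative AI models to create structured data benchmarks for table union search, a data management problem that is semantic in nature.
  • methods: The authors present a method for using generative models to create tables with specified properties, build a new benchmark with it, and evaluate table union search methods on both existing benchmarks and the new one.
  • results: The generated benchmark is more challenging for all methods than hand-curated benchmarks and permits more detailed analysis of methods, including a study of false positives and false negatives.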
    Abstract Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the success and adoption of database management systems. But more and more, data management problems are of a semantic nature. An important example is finding tables that can be unioned. While any two tables with the same cardinality can be unioned, table union search is the problem of finding tables whose union is semantically coherent. Semantic problems cannot be benchmarked using synthetic data. Our current methods for creating benchmarks involve the manual curation and labeling of real data. These methods are not robust or scalable and perhaps more importantly, it is not clear how robust the created benchmarks are. We propose to use generative AI models to create structured data benchmarks for table union search. We present a novel method for using generative models to create tables with specified properties. Using this method, we create a new benchmark containing pairs of tables that are both unionable and non-unionable but related. We thoroughly evaluate recent existing table union search methods over existing benchmarks and our new benchmark. We also present and evaluate a new table search methods based on recent large language models over all benchmarks. We show that the new benchmark is more challenging for all methods than hand-curated benchmarks, specifically, the top-performing method achieves a Mean Average Precision of around 60%, over 30% less than its performance on existing manually created benchmarks. We examine why this is the case and show that the new benchmark permits more detailed analysis of methods, including a study of both false positives and false negatives that were not possible with existing benchmarks.
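    Summary Data management has traditionally relied on synthetic data generators to produce structured benchmarks such as the TPC suite, where important parameters like data size and distribution can be controlled precisely. These benchmarks were central to the success and adoption of database management systems. Increasingly, however, data management problems are semantic in nature. An important example is finding tables that can be unioned: while any two tables with the same cardinality can be unioned, table union search is the problem of finding tables whose union is semantically coherent. Semantic problems cannot be benchmarked with synthetic data. Current benchmarks are built by manually curating and labeling real data, which is neither robust nor scalable, and it is unclear how robust the resulting benchmarks are. The authors propose using generative AI models to create structured data benchmarks for table union search and present a new method for generating tables with specified properties. With it they build a new benchmark containing pairs of tables that are unionable and pairs that are non-unionable but related. They evaluate recent table union search methods on existing benchmarks and the new one, and also present and evaluate a new table search method based on recent large language models. The new benchmark is more challenging for all methods than hand-curated ones: the top-performing method reaches a Mean Average Precision of around 60%, over 30% less than on existing manually created benchmarks. The authors examine why, and show that the new benchmark permits more detailed analysis, including a study of false positives and false negatives that was not possible with existing benchmarks.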

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

  • paper_url: http://arxiv.org/abs/2308.03882
  • repo_url: None
  • paper_authors: Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme
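  • for: This paper proposes an unseen state augmentation strategy for offline reinforcement learning (RL), so that methods can exploit unseen states where the learned model and value estimates generalize.
  • methods: Unseen states are found by value-informed perturbations of states seen in the offline data, followed by filtering out candidates whose epistemic uncertainty estimates are too high (high error) or too low (too similar to seen data).
  • results: The strategy improves performance on several offline RL tasks and consistently yields lower average dataset Q-value estimates, i.e., more conservative Q-value estimates than a baseline.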
    Abstract Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen states far away from the available offline data due to two factors -- (a) very short rollout horizons in models due to cascading model errors, and (b) model rollouts originating solely from states observed in offline data. We relax the second assumption and present a novel unseen state augmentation strategy to allow exploitation of unseen states where the learned model and value estimates generalize. Our strategy finds unseen states by value-informed perturbations of seen states followed by filtering out states with epistemic uncertainty estimates too high (high error) or too low (too similar to seen data). We observe improved performance in several offline RL tasks and find that our augmentation strategy consistently leads to overall lower average dataset Q-value estimates i.e. more conservative Q-value estimates than a baseline.
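    Summary Offline reinforcement learning (RL) methods balance exploration and exploitation through conservative value estimation, penalizing the values of unseen states and actions. Model-free methods penalize values at all unseen actions, whereas model-based methods can further exploit unseen states through model rollouts. These methods are nonetheless limited by two factors: (a) rollout horizons are very short because of cascading model errors, and (b) rollouts start only from states observed in the offline data. The authors relax the second assumption and propose a new unseen state augmentation strategy that allows exploitation of unseen states where the learned model and value estimates generalize. The strategy applies value-informed perturbations to seen states and then filters out states whose epistemic uncertainty is too high (high error) or too low (too similar to seen data). They observe improved performance on several offline RL tasks, and the augmentation consistently yields lower average dataset Q-value estimates, i.e., more conservative Q-value estimates than a baseline.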
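The two steps described above (perturb seen states toward higher estimated value, then keep only candidates in a band of epistemic uncertainty) can be sketched with NumPy. Everything specific here is an assumption made for illustration: the gradient-step perturbation, the use of ensemble disagreement as the uncertainty estimate, and the threshold names `low` and `high`.

```python
import numpy as np

def propose_unseen_states(states, value_grad, step_size=0.1):
    """Value-informed perturbation: one small step along the value gradient."""
    return states + step_size * value_grad(states)

def ensemble_uncertainty(states, ensemble):
    """Epistemic uncertainty proxy: std. dev. of predictions across an ensemble."""
    preds = np.stack([model(states) for model in ensemble])  # (models, states, dim)
    return preds.std(axis=0).mean(axis=-1)

def filter_by_uncertainty(candidates, uncertainty, low, high):
    """Drop candidates that are too uncertain or too similar to seen data."""
    keep = (uncertainty > low) & (uncertainty < high)
    return candidates[keep]

# Toy example with three linear "dynamics models" that disagree more far from 0.
ensemble = [lambda s, a=a: a * s for a in (0.9, 1.0, 1.1)]
seen = np.array([[-0.1], [0.9], [9.9]])
candidates = propose_unseen_states(seen, lambda s: np.ones_like(s))
unc = ensemble_uncertainty(candidates, ensemble)
print(filter_by_uncertainty(candidates, unc, low=0.01, high=0.5))
```

In the toy example the candidate near 0 is rejected as too similar to seen data, the candidate near 10 as too uncertain, and only the middle one is kept.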

Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

  • paper_url: http://arxiv.org/abs/2308.03873
  • repo_url: https://github.com/wm-semeru/codesyntaxconcept
  • paper_authors: David N Palacio, Alejandro Velasco, Daniel Rodriguez-Cardenas, Kevin Moran, Denys Poshyvanyk
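  • for: This paper addresses how to evaluate and explain the effectiveness of large language models (LLMs) for code on programming tasks.
  • methods: The paper introduces ASTxplainer, an explainability method specific to LLMs for code. It automatically aligns token predictions with AST nodes by extracting and aggregating normalized model logits within AST structures, which enables AST-based model evaluation and visualizations of predictions.
  • results: An empirical evaluation of 12 popular LLMs for code and a user study of an ASTxplainer-derived visualization illustrate the method's potential to provide insights into LLM effectiveness and to help end-users understand predictions.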
    Abstract Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub CoPilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To this end, this paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes, by extracting and aggregating normalized model logits within AST structures. To demonstrate the practical benefit of ASTxplainer, we illustrate the insights that our framework can provide by performing an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects. Additionally, we perform a user study examining the usefulness of an ASTxplainer-derived visualization of model predictions aimed at enabling model users to explain predictions. The results of these studies illustrate the potential for ASTxplainer to provide insights into LLM effectiveness, and aid end-users in understanding predictions.
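    Summary Large language models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of natural and programming languages. They are rapidly being adopted in commercial AI-based developer tools such as GitHub CoPilot. Measuring and explaining their effectiveness on programming tasks is challenging because of their size and complexity. Evaluation and explanation are closely linked: to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts, and once that mapping exists, new and more detailed evaluations become possible. To this end, the paper introduces ASTxplainer, an explainability method specific to LLMs for code that helps users understand model predictions. ASTxplainer automatically aligns token predictions with AST structures by extracting and aggregating normalized model logits. To demonstrate its practical benefit, the authors run an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects, and conduct a user study on whether an ASTxplainer-derived visualization helps users explain model predictions. The results indicate that ASTxplainer can provide insight into LLM effectiveness and help users understand predictions.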

Semantic Equivalence of e-Commerce Queries

  • paper_url: http://arxiv.org/abs/2308.03869
  • repo_url: None
  • paper_authors: Aritra Mandal, Daniel Tunkelang, Zhe Wu
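  • for: Improve searcher and business outcomes in e-commerce search by recognizing and leveraging queries that express equivalent intent.
  • methods: The paper proposes a framework that maps queries to vector representations of search intent and uses both surface similarity and behavioral similarity to identify nearest-neighbor queries with equivalent or similar intent.
  • results: Experiments show that the approach outperforms popular sentence transformer models and achieves a Pearson correlation of 0.85 for query similarity, suggesting that historical search behavior data and trained models can be used to recognize and exploit query equivalence.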
    Abstract Search query variation poses a challenge in e-commerce search, as equivalent search intents can be expressed through different queries with surface-level differences. This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes. The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives. The framework utilizes both surface similarity and behavioral similarity to determine query equivalence. Surface similarity involves canonicalizing queries based on word inflection, word order, compounding, and noise words. Behavioral similarity leverages historical search behavior to generate vector representations of query intent. An offline process is used to train a sentence similarity model, while an online nearest neighbor approach supports processing of unseen queries. Experimental evaluations demonstrate the effectiveness of the proposed approach, outperforming popular sentence transformer models and achieving a Pearson correlation of 0.85 for query similarity. The results highlight the potential of leveraging historical behavior data and training models to recognize and utilize query equivalence in e-commerce search, leading to improved user experiences and business outcomes. Further advancements and benchmark datasets are encouraged to facilitate the development of solutions for this critical problem in the e-commerce domain.
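    Summary Search query variation is a challenge in e-commerce search because the same search intent can be expressed through different queries. This paper introduces a framework for recognizing and leveraging query equivalence to improve searcher and business outcomes. It addresses three key problems: mapping queries to vector representations of search intent, identifying nearest-neighbor queries with equivalent or similar intent, and optimizing for user or business objectives. The framework uses both surface similarity and behavioral similarity to determine query equivalence. Surface similarity canonicalizes queries with respect to word inflection, word order, compounding, and noise words. Behavioral similarity uses historical search behavior to generate vector representations of query intent. An offline process trains a sentence similarity model, and an online nearest-neighbor approach handles unseen queries. Experiments demonstrate the effectiveness of the approach, which outperforms popular sentence transformer models and achieves a Pearson correlation of 0.85 for query similarity. The results suggest that historical behavior data and trained models can be used to recognize and exploit query equivalence, improving user experience and business results. The authors encourage further advances and benchmark datasets for this problem in the e-commerce domain.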
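The surface-similarity step (canonicalizing queries by inflection, word order, and noise words) can be illustrated with a deliberately naive sketch. The noise-word list, the plural-stripping rule, and the function names are assumptions made for this example; the paper's actual canonicalization rules are not given in the abstract, and compounding is not handled here.

```python
import re

# Assumed noise words; a production list would be tuned on real query logs.
NOISE_WORDS = {"a", "an", "the", "for", "with", "of", "and", "in"}

def naive_singular(token):
    """Very rough plural stripping, standing in for proper inflection handling."""
    if len(token) > 3 and token.endswith("es") and token[-3] in "sxz":
        return token[:-2]          # e.g. dresses -> dress, boxes -> box
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]          # e.g. shoes -> shoe
    return token

def canonicalize(query):
    """Lowercase, drop noise words, singularise, and sort tokens to ignore word order."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    tokens = [naive_singular(t) for t in tokens if t not in NOISE_WORDS]
    return " ".join(sorted(tokens))

print(canonicalize("Shoes for Men"))
print(canonicalize("mens shoe"))
```

Two queries with the same canonical form would be treated as surface-equivalent; behavioral similarity would then cover cases that differ on the surface.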

AI Text-to-Behavior: A Study In Steerability

  • paper_url: http://arxiv.org/abs/2308.07326
  • repo_url: None
  • paper_authors: David Noever, Sam Hyams
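  • for: This study explores the steerability of large language models (LLMs), particularly OpenAI's ChatGPT iterations.
  • methods: The authors use a behavioral psychology framework called OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism) to quantify the model's responsiveness to tailored prompts.
  • results: "Openness" presented linguistic ambiguity, "conscientiousness" and "neuroticism" were distinctly evoked, and "extroversion" and "agreeableness" showed a notable overlap while remaining distinct from the other traits. The findings point to GPT's versatility and adaptability, while the opacity of some training techniques and the rapid progress of LLMs make proposed metrics degrade quickly.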
    Abstract The research explores the steerability of Large Language Models (LLMs), particularly OpenAI's ChatGPT iterations. By employing a behavioral psychology framework called OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism), we quantitatively gauged the model's responsiveness to tailored prompts. When asked to generate text mimicking an extroverted personality, OCEAN scored the language alignment to that behavioral trait. In our analysis, while "openness" presented linguistic ambiguity, "conscientiousness" and "neuroticism" were distinctly evoked in the OCEAN framework, with "extroversion" and "agreeableness" showcasing a notable overlap yet distinct separation from other traits. Our findings underscore GPT's versatility and ability to discern and adapt to nuanced instructions. Furthermore, historical figure simulations highlighted the LLM's capacity to internalize and project instructible personas, precisely replicating their philosophies and dialogic styles. However, the rapid advancements in LLM capabilities and the opaque nature of some training techniques make metric proposals degrade rapidly. Our research emphasizes a quantitative role to describe steerability in LLMs, presenting both its promise and areas for further refinement in aligning its progress to human intentions.

MCTS guided Genetic Algorithm for optimization of neural network weights

  • paper_url: http://arxiv.org/abs/2308.04459
  • repo_url: https://github.com/AkshayHebbar/MCTS-GA
  • paper_authors: Akshay Hebbar
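  • for: This study investigates applying a search strategy to genetic algorithms in order to explore the entire genetic tree structure.
  • methods: Simple tree search algorithms such as breadth-first, depth-first, and iterative techniques are computation-heavy and slow, so the author combines genetic algorithms, which form a tree of possible states with rewards given by the fitness function, with Monte Carlo Tree Search (MCTS).
  • results: The paper proposes combining genetic algorithms with MCTS to search the genetic tree for the best result when optimizing neural networks.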
    Abstract In this research, we investigate the possibility of applying a search strategy to genetic algorithms to explore the entire genetic tree structure. Several methods aid in performing tree searches; however, simpler algorithms such as breadth-first, depth-first, and iterative techniques are computation-heavy and often result in a long execution time. Adversarial techniques are often the preferred mechanism when performing a probabilistic search, yielding optimal results more quickly. The problem we are trying to tackle in this paper is the optimization of neural networks using genetic algorithms. Genetic algorithms (GA) form a tree of possible states and provide a mechanism for rewards via the fitness function. Monte Carlo Tree Search (MCTS) has proven to be an effective tree search strategy given states and rewards; therefore, we will combine these approaches to optimally search for the best result generated with genetic algorithms.
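    Summary This research investigates applying a search strategy to genetic algorithms in order to explore the entire genetic tree structure. Several methods can perform tree searches, but simpler algorithms such as breadth-first, depth-first, and iterative techniques are computation-heavy and often take a long time to run. For probabilistic search, adversarial techniques are often the preferred mechanism and reach optimal results more quickly. The problem addressed is the optimization of neural networks using genetic algorithms. Genetic algorithms form a tree of possible states and provide a reward mechanism through the fitness function. Monte Carlo Tree Search (MCTS) has proven to be an effective tree search strategy given states and rewards, so the two approaches are combined to search for the best result generated with genetic algorithms.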

Revisiting Prompt Engineering via Declarative Crowdsourcing

  • paper_url: http://arxiv.org/abs/2308.03854
  • repo_url: None
  • paper_authors: Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, Yujie Wang
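  • for: This work aims to improve the quality of large language models (LLMs) in data processing workflows while keeping cost bounded.
  • methods: The authors put forth a vision for declarative prompt engineering that treats LLMs like crowd workers and borrows ideas from the declarative crowdsourcing literature: using multiple prompting strategies, ensuring internal consistency, and exploring hybrid LLM/non-LLM approaches, to make prompt engineering more principled.
  • results: Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of the approach.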
    Abstract Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone. There has been an advent of toolkits and recipes centered around so-called prompt engineering, the process of asking an LLM to do something via a series of prompts. However, for LLM-powered data processing workflows, in particular, optimizing for quality, while keeping cost bounded, is a tedious, manual process. We put forth a vision for declarative prompt engineering. We view LLMs like crowd workers and leverage ideas from the declarative crowdsourcing literature (including leveraging multiple prompting strategies, ensuring internal consistency, and exploring hybrid LLM/non-LLM approaches) to make prompt engineering a more principled process. Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach.
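    Summary Large language models (LLMs) are very capable at understanding and generating text, but they are brittle and error-prone. With the rise of prompt engineering, attention has turned to how prompts can be used to get an LLM to perform a task. For LLM-powered data processing workflows, however, improving quality while keeping cost bounded is a tedious, manual process. The authors put forth a vision of declarative prompt engineering. They view LLMs as crowd workers and draw on ideas from the declarative crowdsourcing literature, including multiple prompting strategies, internal consistency, and hybrid LLM/non-LLM approaches, to make prompt engineering more principled. Preliminary case studies on sorting, entity resolution, and imputation show the promise of the approach.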

Search Engine and Recommendation System for the Music Industry built with JinaAI

  • paper_url: http://arxiv.org/abs/2308.03842
  • repo_url: None
  • paper_authors: Ishita Gopalakrishnan, Sanjjushri Varshini R, Ponshriharini V
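  • for: Provide an effective search engine and recommendation system for the music industry that addresses problems with existing search engines, such as speed, accuracy, and the format of the data given for querying.
  • methods: The system uses Jina AI, an MLOps framework for building neural search engines, and completes a search from a single query input that is matched against the lyrics of the songs in the database.
  • results: The authors report building a search engine and recommendation system that helps users find the songs they want, with Jina AI helping to maintain and enhance the search engine's quality of performance.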
    Abstract One of the most intriguing debates regarding a novel task is the development of search engines and recommendation-based systems in the music industry. Studies have shown a drastic depression in the search engine fields, due to concerning factors such as speed, accuracy and the format of data given for querying. Often people face difficulty in searching for a song solely based on the title, hence a solution is proposed to complete a search analysis through a single query input and is matched with the lyrics of the songs present in the database. Hence it is essential to incorporate cutting-edge technology tools for developing a user-friendly search engine. Jina AI is an MLOps framework for building neural search engines that are utilized, in order for the user to obtain accurate results. Jina AI effectively helps to maintain and enhance the quality of performance for the search engine for the query given. An effective search engine and a recommendation system for the music industry, built with JinaAI.
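    Summary One interesting topic of discussion is the development of search engines and recommendation systems for the music industry. Studies indicate that search engines in this field suffer from problems with speed, accuracy, and the format of the data given for querying, which leads to poor search results. The proposed solution completes a search analysis from a single query input and matches it against the lyrics of the songs in the database. Incorporating cutting-edge tools is therefore important for building a user-friendly search engine. Jina AI is an MLOps framework for building neural search engines that return accurate results, and it helps to maintain and enhance the search engine's quality of performance. The outcome is a search engine and recommendation system for the music industry built with Jina AI.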

The Copycat Perceptron: Smashing Barriers Through Collective Learning

  • paper_url: http://arxiv.org/abs/2308.03743
  • repo_url: None
  • paper_authors: Giovanni Catania, Aurélien Decelle, Beatriz Seoane
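  • for: Study the equilibrium properties of a model of coupled binary perceptrons in the teacher-student scenario.
  • methods: The students follow a suitable learning rule and interact through an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights.
  • results: At nonzero temperature, the coupling of replicas shifts the phase diagram to smaller values of α. This suggests that, at a fixed fraction of examples, the free energy landscape becomes smoother around the solution with good generalization, which makes it easier for local update algorithms such as Simulated Annealing to reach that solution.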
    Abstract We characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, subject to a suitable learning rule, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which a thermal noise is present that affects the generalization performance of each student. Specifically, in the presence of a nonzero temperature, which assigns nonzero probability to configurations that misclassify samples with respect to the teacher's prescription, we find that the coupling of replicas leads to a shift of the phase diagram to smaller values of $\alpha$: This suggests that the free energy landscape gets smoother around the solution with good generalization (i.e., the teacher) at a fixed fraction of reviewed examples, which allows local update algorithms such as Simulated Annealing to reach the solution before the dynamics gets frozen. Finally, from a learning perspective, these results suggest that more students (in this case, with the same amount of data) are able to learn the same rule when coupled together with a smaller amount of data.
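    Summary The authors characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, under a suitable learning rule and with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. Unlike earlier work, they analyze a more general setting in which thermal noise affects each student's generalization performance. At nonzero temperature, which assigns nonzero probability to configurations that misclassify samples relative to the teacher, the coupling of replicas shifts the phase diagram to smaller values of $\alpha$. This suggests that the free energy landscape becomes smoother around the solution with good generalization, so local update algorithms such as Simulated Annealing can reach the solution before the dynamics freezes. From a learning perspective, the results suggest that more students, each with the same amount of data, can learn the same rule from less data when they are coupled together.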
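A coupling proportional to the Hamming distance between students' weights can be made concrete with a small energy function. The exact Hamiltonian is not given in the abstract, so the form below (sum of each student's training errors plus `gamma` times the pairwise Hamming distances) and the parameter name `gamma` are assumptions for illustration only.

```python
import numpy as np

def training_errors(w, X, labels):
    """Number of samples a binary perceptron with weights w misclassifies."""
    return int(np.sum(np.sign(X @ w) != labels))

def hamming(w_a, w_b):
    """Hamming distance between two +/-1 weight vectors."""
    return int(np.sum(w_a != w_b))

def coupled_energy(students, X, labels, gamma):
    """Assumed energy: per-student errors plus gamma times pairwise Hamming distances."""
    errors = sum(training_errors(w, X, labels) for w in students)
    coupling = sum(hamming(students[a], students[b])
                   for a in range(len(students))
                   for b in range(a + 1, len(students)))
    return errors + gamma * coupling

# Teacher-student toy data: odd input size keeps X @ w away from zero.
rng = np.random.default_rng(0)
teacher = np.array([1, -1, 1, 1, -1])
X = rng.choice([-1, 1], size=(20, 5))
labels = np.sign(X @ teacher)
print(coupled_energy([teacher, teacher.copy()], X, labels, gamma=0.5))
```

For ±1 weights of length N, the Hamming distance equals (N − w_a·w_b)/2, which is why a penalty on it acts like a ferromagnetic interaction that pulls the students' weights toward each other.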

Randomized algorithms for precise measurement of differentially-private, personalized recommendations

  • paper_url: http://arxiv.org/abs/2308.03735
  • repo_url: https://github.com/apple/ml-dprecs
  • paper_authors: Allegra Laro, Yanqing Chen, Hao He, Babak Aghazadeh
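  • for: This paper concerns algorithm design for personalized recommendations, to help businesses build privacy-first recommendation systems.
  • methods: The authors propose an algorithm for personalized recommendations that facilitates both precise and differentially-private measurement.
  • results: Using advertising as an example application, offline experiments quantify how the privacy-preserving algorithm affects key metrics related to user experience, advertiser value, and platform revenue, compared with the extremes of (private) non-personalized and non-private, personalized implementations.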
    Abstract Personalized recommendations form an important part of today's internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be privacy-first. In this article, we propose an algorithm for personalized recommendations that facilitates both precise and differentially-private measurement. We consider advertising as an example application, and conduct offline experiments to quantify how the proposed privacy-preserving algorithm affects key metrics related to user experience, advertiser value, and platform revenue compared to the extremes of both (private) non-personalized and non-private, personalized implementations.
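    Summary Personalized recommendations are an important part of today's internet ecosystem, helping artists and creators reach interested users and helping users discover new and engaging content. Many users are nonetheless skeptical of platforms that personalize recommendations, partly because of historically careless treatment of personal data and privacy. Businesses that rely on personalized recommendations are now entering a new paradigm in which many of their systems must be redesigned to be privacy-first. The article proposes an algorithm for personalized recommendations that supports both precise and differentially-private measurement. Using advertising as an example application, the authors run offline experiments to quantify how the proposed algorithm affects user experience, advertiser value, and platform revenue, compared with non-personalized and non-private personalized implementations.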

SurvBeX: An explanation method of the machine learning survival models based on the Beran estimator

  • paper_url: http://arxiv.org/abs/2308.03730
  • repo_url: https://github.com/danilaeremenko/survbex
  • paper_authors: Lev V. Utkin, Danila Y. Eremenko, Andrei V. Konstantinov
  • for: The paper proposes a new method called SurvBeX to interpret predictions of machine learning survival black-box models.
  • methods: The method uses a modified Beran estimator as a surrogate explanation model, and generates many points in a local area around an example of interest to compute the survival function of the black-box model and the Beran estimator.
  • results: The paper demonstrates the efficiency of SurvBeX through numerical experiments with synthetic and real survival data, and compares the method with SurvLIME and SurvSHAP. The code implementing SurvBeX is available online.
    Abstract An explanation method called SurvBeX is proposed to interpret predictions of the machine learning survival black-box models. The main idea behind the method is to use the modified Beran estimator as the surrogate explanation model. Coefficients, incorporated into Beran estimator, can be regarded as values of the feature impacts on the black-box model prediction. Following the well-known LIME method, many points are generated in a local area around an example of interest. For every generated example, the survival function of the black-box model is computed, and the survival function of the surrogate model (the Beran estimator) is constructed as a function of the explanation coefficients. In order to find the explanation coefficients, it is proposed to minimize the mean distance between the survival functions of the black-box model and the Beran estimator produced by the generated examples. Many numerical experiments with synthetic and real survival data demonstrate the SurvBeX efficiency and compare the method with the well-known method SurvLIME. The method is also compared with the method SurvSHAP. The code implementing SurvBeX is available at: https://github.com/DanilaEremenko/SurvBeX
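    Summary SurvBeX is an explanation method for interpreting the predictions of black-box machine learning survival models. Its main idea is to use a modified Beran estimator as the surrogate explanation model; the coefficients incorporated into the Beran estimator can be read as the impacts of individual features on the black-box prediction. Following LIME, many points are generated in a local area around the example of interest. For each generated point, the survival function of the black-box model is computed, and the survival function of the Beran estimator is expressed as a function of the explanation coefficients. The coefficients are found by minimizing the mean distance between the two survival functions over the generated points. Numerical experiments with synthetic and real survival data demonstrate the method's efficiency and compare it with SurvLIME and SurvSHAP. The code is available at https://github.com/DanilaEremenko/SurvBeX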
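
The Beran estimator at the core of the surrogate model can be sketched as a kernel-weighted Kaplan-Meier estimator. The snippet below is a minimal illustration, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, and the way `feature_weights` (standing in for the explanation coefficients) enter the distance are all assumptions made for this example.

```python
import numpy as np

def beran_survival(x, X_train, times, events, bandwidth=1.0, feature_weights=None):
    """Beran estimate of S(t | x), evaluated at the sorted training times.

    feature_weights is a hypothetical stand-in for the explanation
    coefficients; how SurvBeX actually parameterizes them is not shown here.
    """
    X_train = np.asarray(X_train, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    x = np.asarray(x, dtype=float)
    if feature_weights is None:
        feature_weights = np.ones(X_train.shape[1])
    feature_weights = np.asarray(feature_weights, dtype=float)

    # Sort training samples by observed time.
    order = np.argsort(times)
    X_sorted, t_sorted, d_sorted = X_train[order], times[order], events[order]

    # Assumed Gaussian kernel on a feature-weighted squared distance,
    # normalized so the weights sum to one.
    sq_dist = np.sum(feature_weights * (X_sorted - x) ** 2, axis=1)
    w = np.exp(-sq_dist / bandwidth)
    w = w / w.sum()

    # Kernel mass of samples with strictly earlier times.
    mass_before = np.concatenate(([0.0], np.cumsum(w)[:-1]))
    at_risk = np.maximum(1.0 - mass_before, 1e-12)

    # Only uncensored samples (event == 1) reduce the survival probability.
    factors = np.where(d_sorted == 1, 1.0 - w / at_risk, 1.0)
    factors = np.clip(factors, 0.0, 1.0)
    return t_sorted, np.cumprod(factors)
```

With all kernel weights equal, the estimate reduces to the ordinary Kaplan-Meier curve, which is a convenient sanity check for the sketch.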

Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.03723
  • repo_url: https://github.com/mckellwoodland/dimen_reduce_mahal
  • paper_authors: McKell Woodland, Nihil Patel, Mais Al Taie, Joshua P. Yung, Tucker J. Netherton, Ankit B. Patel, Kristy K. Brock
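  • for: The paper aims to detect out-of-distribution (OOD) images for a liver segmentation model at inference time, to protect against automation bias.
  • methods: The Mahalanobis distance is applied post hoc to the bottleneck features of a Swin UNETR model, after the dimensionality of those features is reduced with principal component analysis.
  • results: With the reduced bottleneck features, OOD images are detected with high performance and minimal computational load.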
    Abstract Clinically deployed segmentation models are known to fail on data outside of their training distribution. As these models perform well on most cases, it is imperative to detect out-of-distribution (OOD) images at inference to protect against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model that segments the liver on T1-weighted magnetic resonance imaging. By reducing the dimensions of the bottleneck features with principal component analysis, OOD images were detected with high performance and minimal computational load.
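    Summary Clinically deployed segmentation models often fail on data outside their training distribution. Because these models perform well in most cases, detecting out-of-distribution (OOD) images at inference is essential to guard against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model. After the dimensionality of the bottleneck features is reduced with principal component analysis, OOD images are detected with high performance and a small computational load.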
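
The post hoc detection step described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: the bottleneck features are taken as already extracted and flattened into a 2-D array, and the function names and the choice of `n_components` are hypothetical rather than taken from the paper's repository.

```python
import numpy as np

def fit_pca_mahalanobis(train_feats, n_components):
    """Fit PCA on in-distribution features and the covariance in PCA space."""
    train_feats = np.asarray(train_feats, dtype=float)
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    cov = np.atleast_2d(np.cov(reduced, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    return mean, components, cov_inv

def mahalanobis_scores(feats, mean, components, cov_inv):
    """Mahalanobis distance of each sample to the training mean in PCA space."""
    feats = np.atleast_2d(np.asarray(feats, dtype=float))
    # The reduced training data has zero mean, so no further centering is needed.
    reduced = (feats - mean) @ components.T
    sq = np.einsum("ij,jk,ik->i", reduced, cov_inv, reduced)
    return np.sqrt(np.maximum(sq, 0.0))
```

A larger score marks an image as more likely out-of-distribution; the threshold that separates the two groups would have to be chosen on held-out data.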

“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

  • paper_url: http://arxiv.org/abs/2308.03825
  • repo_url: https://github.com/verazuo/jailbreak_llms
  • paper_authors: Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang
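  • for: This study examines the misuse of large language models (LLMs), in particular the emerging jailbreak prompts, and how they bypass safeguards and elicit harmful content from LLMs.
  • methods: Using natural language processing technologies and graph-based community detection methods, the study characterizes jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. It also observes that jailbreak prompts increasingly shift from public platforms to private ones, which poses new detection challenges for LLM vendors.
  • results: Current LLMs and safeguards cannot adequately defend against jailbreak prompts across the 13 forbidden scenarios. Two highly effective jailbreak prompts achieve 0.99 attack success rates on ChatGPT (GPT-3.5) and GPT-4 and have persisted online for over 100 days. The work sheds light on the severe and evolving threat landscape of jailbreak prompts, and the authors hope it helps researchers and LLM vendors promote safer and regulated LLMs.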
    Abstract The misuse of large language models (LLMs) has garnered significant attention from the general public and LLM vendors. In response, efforts have been made to align LLMs with human values and intent use. However, a particular type of adversarial prompts, known as jailbreak prompt, has emerged and continuously evolved to bypass the safeguards and elicit harmful content from LLMs. In this paper, we conduct the first measurement study on jailbreak prompts in the wild, with 6,387 prompts collected from four platforms over six months. Leveraging natural language processing technologies and graph-based community detection methods, we discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. We also observe that jailbreak prompts increasingly shift from public platforms to private ones, posing new challenges for LLM vendors in proactive detection. To assess the potential harm caused by jailbreak prompts, we create a question set comprising 46,800 samples across 13 forbidden scenarios. Our experiments show that current LLMs and safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify two highly effective jailbreak prompts which achieve 0.99 attack success rates on ChatGPT (GPT-3.5) and GPT-4, and they have persisted online for over 100 days. Our work sheds light on the severe and evolving threat landscape of jailbreak prompts. We hope our study can facilitate the research community and LLM vendors in promoting safer and regulated LLMs.
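    Summary The misuse of large language models (LLMs) has drawn attention from the general public and from LLM vendors, and efforts have been made to align LLMs with human values and intended use. However, a particular type of adversarial prompt, known as the jailbreak prompt, keeps evolving to bypass safeguards and elicit harmful content from LLMs. This paper conducts the first measurement study of jailbreak prompts in the wild, with 6,387 prompts collected from four platforms over six months. Using natural language processing technologies and graph-based community detection methods, it identifies the unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. Jailbreak prompts are also found to shift increasingly from public platforms to private ones, which poses new challenges for proactive detection by LLM vendors. To assess the potential harm, the authors build a question set of 46,800 samples across 13 forbidden scenarios. The experiments show that current LLMs and safeguards cannot adequately defend against jailbreak prompts in all scenarios. In particular, two highly effective jailbreak prompts achieve 0.99 attack success rates on ChatGPT (GPT-3.5) and GPT-4 and have persisted online for over 100 days. The work sheds light on the severe and evolving threat landscape of jailbreak prompts, and the authors hope it helps the research community and LLM vendors promote safer and regulated LLMs.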

Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission

  • paper_url: http://arxiv.org/abs/2308.03713
  • repo_url: None
  • paper_authors: Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Derrick Wing Kwan Ng, Wenjun Zhang
  • for: The paper proposes a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
  • methods: Each FLSC link uses a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator for coarse-to-fine semantic extraction and meaning translation. A channel state information-based multiple-input multiple-output transmission module combats channel fading and noise.
  • results: Simulations show that the coarse semantic information can handle a range of image-level tasks, and that FLSC outperforms the traditional scheme, especially in low signal-to-noise ratio and low channel bandwidth ratio regimes, with a peak signal-to-noise ratio (PSNR) gain of about 10 in the 3 dB channel condition.
    Abstract Multi-node communication, which refers to the interaction among multiple devices, has attracted lots of attention in many Internet-of-Things (IoT) scenarios. However, its huge amounts of data flows and inflexibility for task extension have triggered the urgent requirement of communication-efficient distributed data transmission frameworks. In this paper, inspired by the great superiorities on bandwidth reduction and task adaptation of semantic communications, we propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices. Federated learning enables the design of independent semantic communication link of each user while further improves the semantic extraction and task performance through global aggregation. Each link in FLSC is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator for coarse-to-fine semantic extraction and meaning translation according to specific tasks. In order to extend the FLSC into more realistic conditions, we design a channel state information-based multiple-input multiple-output transmission module to combat channel fading and noise. Simulation results show that the coarse semantic information can deal with a range of image-level tasks. Moreover, especially in low signal-to-noise ratio and channel bandwidth ratio regimes, FLSC evidently outperforms the traditional scheme, e.g. about 10 peak signal-to-noise ratio gain in the 3 dB channel condition.
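
The reported gain is measured in peak signal-to-noise ratio (PSNR), the standard reconstruction-quality metric for transmitted images. A minimal sketch of the usual definition follows; the function name and the default `max_value=255.0` (8-bit images) are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of 10 dB corresponds to a tenfold reduction in mean squared error.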

Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience

  • paper_url: http://arxiv.org/abs/2308.03712
  • repo_url: https://github.com/eminorhan/humanlike-vits
  • paper_authors: A. Emin Orhan
  • for: investigate whether current self-supervised learning methods can reach human-level visual object recognition capabilities with the same type and amount of visual experience as humans.
  • methods: use vision transformers with up to 633M parameters and train with up to 5K hours of human-like video data, with image resolutions of up to 476x476 pixels, using masked autoencoders as a self-supervised learning algorithm.
  • results: find that it is feasible to reach human-level object recognition capacity at sub-human scales of model size, data size, and image size, if these factors are scaled up simultaneously, and estimate that a 2.5B parameter ViT model trained with 20K hours of human-like video data should be able to reach roughly human-level accuracy on ImageNet.
    Abstract This paper asks whether current self-supervised learning methods, if sufficiently scaled up, would be able to reach human-level visual object recognition capabilities with the same type and amount of visual experience humans learn from. Previous work on this question only considered the scaling of data size. Here, we consider the simultaneous scaling of data size, model size, and image resolution. We perform a scaling experiment with vision transformers up to 633M parameters in size (ViT-H/14) trained with up to 5K hours of human-like video data (long, continuous, mostly egocentric videos) with image resolutions of up to 476x476 pixels. The efficiency of masked autoencoders (MAEs) as a self-supervised learning algorithm makes it possible to run this scaling experiment on an unassuming academic budget. We find that it is feasible to reach human-level object recognition capacity at sub-human scales of model size, data size, and image size, if these factors are scaled up simultaneously. To give a concrete example, we estimate that a 2.5B parameter ViT model trained with 20K hours (2.3 years) of human-like video data with a spatial resolution of 952x952 pixels should be able to reach roughly human-level accuracy on ImageNet. Human-level competence is thus achievable for a fundamental perceptual capability from human-like perceptual experience (human-like in both amount and type) with extremely generic learning algorithms and architectures and without any substantive inductive biases.
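    Summary This paper asks whether current self-supervised learning methods, if sufficiently scaled up, could reach human-level visual object recognition with the same type and amount of visual experience humans learn from. Previous work considered only the scaling of data size; this paper considers the simultaneous scaling of data size, model size, and image resolution. The experiments use vision transformers of up to 633M parameters (ViT-H/14), trained on human-like video data (long, continuous, mostly egocentric videos) at image resolutions of up to 476x476 pixels. The results indicate that human-level object recognition capacity is reachable when these factors are scaled up together. For example, a 2.5B parameter ViT model trained with 20K hours (2.3 years) of human-like video data at a spatial resolution of 952x952 pixels is estimated to reach roughly human-level accuracy on ImageNet. Human-level competence in this perceptual capability is therefore achievable from human-like perceptual experience (in both type and amount) with extremely generic learning algorithms and architectures and without substantive inductive biases.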

DeRisk: An Effective Deep Learning Framework for Credit Risk Prediction over Real-World Financial Data

  • paper_url: http://arxiv.org/abs/2308.03704
  • repo_url: None
  • paper_authors: Yancheng Liang, Jiajie Zhang, Hui Li, Xiaochen Liu, Yi Hu, Yong Wu, Jinyao Zhang, Yongyan Liu, Yi Wu
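  • for: credit risk prediction on real-world financial data.
  • methods: DeRisk, a deep learning risk prediction framework.
  • results: DeRisk outperforms the statistical learning methods deployed in the authors' production system, achieving higher prediction accuracy.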
    Abstract Despite the tremendous advances achieved over the past years by deep learning techniques, the latest risk prediction models for industrial applications still rely on highly handtuned stage-wised statistical learning tools, such as gradient boosting and random forest methods. Different from images or languages, real-world financial data are high-dimensional, sparse, noisy and extremely imbalanced, which makes deep neural network models particularly challenging to train and fragile in practice. In this work, we propose DeRisk, an effective deep learning risk prediction framework for credit risk prediction on real-world financial data. DeRisk is the first deep risk prediction model that outperforms statistical learning approaches deployed in our company's production system. We also perform extensive ablation studies on our method to present the most critical factors for the empirical success of DeRisk.
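    Summary Despite the large advances of deep learning in recent years, the latest risk prediction models for industrial applications still rely on heavily hand-tuned, stage-wise statistical learning tools such as gradient boosting and random forests. Unlike images or language, real-world financial data are high-dimensional, sparse, noisy, and extremely imbalanced, which makes deep neural networks hard to train and fragile in practice. This work proposes DeRisk, an effective deep learning framework for credit risk prediction on real-world financial data. DeRisk is the first deep risk prediction model to outperform the statistical learning approaches deployed in the authors' production system. Extensive ablation studies identify the factors most critical to its empirical success.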

AgentBench: Evaluating LLMs as Agents

  • paper_url: http://arxiv.org/abs/2308.03688
  • repo_url: https://github.com/thudm/agentbench
  • paper_authors: Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang
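  • for: evaluating large language models (LLMs) as agents on real-world tasks, assessing their reasoning and decision-making abilities in a multi-turn open-ended generation setting.
  • methods: The authors present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments for assessing LLM-as-Agent abilities.
  • results: Extensive tests over 25 LLMs (including APIs and open-sourced models) show that top commercial LLMs act strongly as agents in complex environments, but there is a significant performance gap between them and open-sourced competitors.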
    Abstract Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has been an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. Our extensive test over 25 LLMs (including APIs and open-sourced models) shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors. It also serves as a component of an ongoing project with wider coverage and deeper consideration towards systematic LLM evaluation. Datasets, environments, and an integrated evaluation package for AgentBench are released at https://github.com/THUDM/AgentBench

Almost-sure convergence of iterates and multipliers in stochastic sequential quadratic optimization

  • paper_url: http://arxiv.org/abs/2308.03687
  • repo_url: None
  • paper_authors: Frank E. Curtis, Xin Jiang, Qi Wang
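  • for: solving continuous optimization problems with nonlinear equality constraints.
  • methods: a stochastic sequential quadratic optimization (SQP) algorithm built on the stochastic-gradient methodology.
  • results: new almost-sure convergence guarantees for the primal iterates, the Lagrange multipliers, and the stationarity measures.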
    Abstract Stochastic sequential quadratic optimization (SQP) methods for solving continuous optimization problems with nonlinear equality constraints have attracted attention recently, such as for solving large-scale data-fitting problems subject to nonconvex constraints. However, for a recently proposed subclass of such methods that is built on the popular stochastic-gradient methodology from the unconstrained setting, convergence guarantees have been limited to the asymptotic convergence of the expected value of a stationarity measure to zero. This is in contrast to the unconstrained setting in which almost-sure convergence guarantees (of the gradient of the objective to zero) can be proved for stochastic-gradient-based methods. In this paper, new almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm in this subclass of methods are proved. It is shown that the error in the Lagrange multipliers can be bounded by the distance of the primal iterate to a primal stationary point plus the error in the latest stochastic gradient estimate. It is further shown that, subject to certain assumptions, this latter error can be made to vanish by employing a running average of the Lagrange multipliers that are computed during the run of the algorithm. The results of numerical experiments are provided to demonstrate the proved theoretical guarantees.
    Summary This paper presents new almost-sure convergence guarantees for the primal iterates, Lagrange multipliers, and stationarity measures generated by a stochastic SQP algorithm in this subclass of methods. The error in the Lagrange multipliers can be bounded by the distance of the primal iterate to a primal stationary point plus the error in the latest stochastic gradient estimate. Furthermore, subject to certain assumptions, this latter error can be made to vanish by employing a running average of the Lagrange multipliers computed during the run of the algorithm. Numerical experiments are provided to demonstrate the proved theoretical guarantees.
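
The two ingredients discussed above, an SQP step driven by a stochastic gradient estimate and a running average of the resulting Lagrange multipliers, can be sketched as follows. This is a simplified illustration rather than the algorithm analyzed in the paper: it assumes an identity Hessian approximation by default and a fixed step size, whereas the paper's method selects step sizes adaptively. The function names and the Gaussian noise model are hypothetical.

```python
import numpy as np

def sqp_step(grad_estimate, c_val, jac, hess=None):
    """Solve the SQP system [H J^T; J 0] [d; y] = -[g; c] for (d, y)."""
    n, m = grad_estimate.size, c_val.size
    H = np.eye(n) if hess is None else hess  # assumed Hessian approximation
    kkt = np.block([[H, jac.T], [jac, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_estimate, c_val])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]  # search direction, multiplier estimate

def run_stochastic_sqp(x0, grad_fn, c_fn, jac_fn, steps, step_size, noise_std, seed=0):
    """Fixed-step stochastic SQP loop with a running average of multipliers."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    y_avg = None
    for k in range(1, steps + 1):
        # Noisy gradient estimate: true gradient plus assumed Gaussian noise.
        g = grad_fn(x) + noise_std * rng.normal(size=x.size)
        d, y = sqp_step(g, c_fn(x), jac_fn(x))
        x = x + step_size * d
        # Incremental running mean of the multiplier estimates.
        y_avg = y if y_avg is None else y_avg + (y - y_avg) / k
    return x, y_avg
```

Each multiplier estimate inherits the noise of the gradient estimate used to compute it, which is why averaging the multipliers over the run is a natural way to damp that error.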

Linear Convergence Bounds for Diffusion Models via Stochastic Localization

  • paper_url: http://arxiv.org/abs/2308.03686
  • repo_url: None
  • paper_authors: Joe Benton, Valentin De Bortoli, Arnaud Doucet, George Deligiannidis
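  • for: providing convergence guarantees for diffusion models, which generate approximate samples from high-dimensional data distributions.
  • methods: The analysis assumes $L^2$-accurate score estimators, builds on the Girsanov-based methods of previous works, and introduces a refined treatment of the discretization error of the reverse SDE based on tools from stochastic localization.
  • results: The paper gives the first convergence bounds that are linear in the data dimension (up to logarithmic factors), assuming only finite second moments of the data distribution rather than strong smoothness. Diffusion models require at most $\tilde O(\frac{d \log^2(1/\delta)}{\varepsilon^2})$ steps to approximate an arbitrary data distribution on $\mathbb{R}^d$ corrupted with Gaussian noise of variance $\delta$ to within $\varepsilon^2$ in Kullback--Leibler divergence.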
    Abstract Diffusion models are a powerful method for generating approximate samples from high-dimensional data distributions. Several recent results have provided polynomial bounds on the convergence rate of such models, assuming $L^2$-accurate score estimators. However, up until now the best known such bounds were either superlinear in the data dimension or required strong smoothness assumptions. We provide the first convergence bounds which are linear in the data dimension (up to logarithmic factors) assuming only finite second moments of the data distribution. We show that diffusion models require at most $\tilde O(\frac{d \log^2(1/\delta)}{\varepsilon^2})$ steps to approximate an arbitrary data distribution on $\mathbb{R}^d$ corrupted with Gaussian noise of variance $\delta$ to within $\varepsilon^2$ in Kullback--Leibler divergence. Our proof builds on the Girsanov-based methods of previous works. We introduce a refined treatment of the error arising from the discretization of the reverse SDE, which is based on tools from stochastic localization.