2023-10-19

cs.AI

cs.AI - 2023-10-19

Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices

paper_url: http://arxiv.org/abs/2310.18329
repo_url: None
paper_authors: Xiaolong Tu, Anik Mallik, Dawei Chen, Kyungtae Han, Onur Altintas, Haoxin Wang, Jiang Xie
for: 这篇论文的目的是提高深度学习的能源效率和降低延误时间，并且实现在不同的边缘设备上的可持续性。
methods: 这篇论文使用了详细的能量测量、预测和效率评分等三种方法，以提高深度学习的能源效率和降低延误时间。
results: 这篇论文的研究结果包括三个大的能量数据集，以及一个基于这些数据集的预测器和效率评分指标。这些结果可以帮助提高边缘 computing 的可持续性和效率。

Abstract
Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficiency scoring, with an objective to foster transparency in power and energy consumption within deep learning across various edge devices. Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications. Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models. Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert complex power and energy consumption data of an edge device into an easily understandable manner for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research. Find data, code, and more up-to-date information at https://amai-gsu.github.io/DeepEn2023.

摘要
Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications.Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models.Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert complex power and energy consumption data of an edge device into an easily understandable manner for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research.Find data, code, and more up-to-date information at .

The opaque law of artificial intelligence

paper_url: http://arxiv.org/abs/2310.13192
repo_url: None
paper_authors: Vincenzo Calderonio
for: 本研究旨在分析算法的透明度，在人工智能 causation 问题上的开放辩论中提供一种实验方法，以评估现有一个最佳生成 AI 模型（Chat-GPT）的性能，并研究如何通过法律规范来调控它。
methods: 本研究使用 Turing Test 的对话方法进行实验，以评估 Chat-GPT 模型的性能，并从法律角度探讨意识、故意和责任等意义，以更好地理解 AI 的使用问题。
results: 研究结果表明，Chat-GPT 模型在一些情况下可以达到高度的准确率，但也存在一些潜在的问题和风险，如模型的透明度和可控性等。此外，本研究还提出了一些可能的法律解决方案，以帮助调控 AI 的使用。

Abstract
The purpose of this paper is to analyse the opacity of algorithms, contextualized in the open debate on responsibility for artificial intelligence causation; with an experimental approach by which, applying the proposed conversational methodology of the Turing Test, we expect to evaluate the performance of one of the best existing NLP model of generative AI (Chat-GPT) to see how far it can go right now and how the shape of a legal regulation of it could be. The analysis of the problem will be supported by a comment of Italian classical law categories such as causality, intent and fault to understand the problem of the usage of AI, focusing in particular on the human-machine interaction. On the computer science side, for a technical point of view of the logic used to craft these algorithms, in the second chapter will be proposed a practical interrogation of Chat-GPT aimed at finding some critical points of the functioning of AI. The end of the paper will concentrate on some existing legal solutions which can be applied to the problem, plus a brief description of the approach proposed by EU Artificial Intelligence act.

摘要
本文的目的是分析算法的透明度，在人工智能 causation 的开放辩论中上下文化了这个问题；通过应用提议的对话方法（Turing Test），我们预计可以评估现有一个最佳的生成 AI 模型（Chat-GPT）的性能，以评估它目前的可能性和如何制定相关的法律规范。对问题的分析将得到意大利古典法Category的评论，包括 causality、意图和责任，以更好地理解 AI 的使用问题，特别是人机交互方面。从计算机科学的角度来看，在第二章中将提出一种实践的问题对 Chat-GPT 的探索，以找出一些算法的关键点。总结部分将聚焦现有的法律解决方案， plus 简要描述 EU 人工智能法规。

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models

paper_url: http://arxiv.org/abs/2310.13191
repo_url: None
paper_authors: Jianwei Li, Qi Lei, Wei Cheng, Dongkuan Xu
for: 本研究旨在提高语音模型的Robustness，以便在实际应用中减少模型的敏感性和训练时间。
methods: 本研究提出了一种基于预训练知识的post-training束致法，通过保留更多的预训练知识来提高语音模型的Robustness。该方法通过层次重建错误来自动修正层次错误，以便更好地保留预训练知识。
results: 相比其他基于state-of-the-art的基eline，本研究的方法在SST2、IMDB和AGNews等 dataset上表现出了更好的平衡点，即Accuracy、Sparsity、Robustness和束致成本之间的trade-off。这表明了本研究的方法可以有效地提高语音模型的Robustness，同时保持Accuracy和Sparsity的良好性。

Abstract
The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models.

摘要
<>translate "The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity and require a retraining process. As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared to other state-of-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on datasets SST2, IMDB, and AGNews, marking a significant stride towards robust pruning in language models."into Simplified Chinese:<>大型语言模型的剪除目标已经从精度和稀疏性扩展到了可靠性，然而现有方法在不断增加模型稀疏性时很难提高对攻击性词语的抵御能力，而且需要重新训练过程。人类进入大型语言模型时代，这些问题变得越来越突出。本文提议语言模型的可靠性与它拥有的预训练知识量成正比。根据此，我们提出了一种增强语言模型可靠性的后期剪除策略，该策略可以准确复制权重空间和特征空间，以保留更多的预训练知识。在这种设置下，每层的重建错误不仅来自自己，还包括先前层次的累加错误，然后进行自适应修正。相比其他当前基elines，我们的方法在 SST2、IMDB 和 AGNews 数据集上显示出了更好的平衡性，marks a significant step towards robust pruning in language models.

Fast and Accurate Factual Inconsistency Detection Over Long Documents

paper_url: http://arxiv.org/abs/2310.13189
repo_url: https://github.com/asappresearch/scale-score
paper_authors: Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang
for: 本研究旨在提出一种任务不受限制的模型 для检测事实不一致，以解决现有方法对长输入的处理有限制。
methods: 本研究使用了一种新的分割策略，即源剥离方法（Source Chunking Approach），将长文本分割成大块来conditioning。这种方法基于自然语言推理（Natural Language Inference），并且可以在多种任务上达到状态之最的性能。
results: 本研究的实验结果表明，SCALE模型可以在多种任务上超越现有方法，并且在长输入情况下表现更好。此外，SCALE模型还可以快速地解释自己的决策，并且在效率和模型解释评价中也表现出优异。

Abstract
Generative AI models exhibit remarkable potential; however, hallucinations across various tasks present a significant challenge, particularly for longer inputs that current approaches struggle to address effectively. We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy. Specifically, SCALE is a Natural Language Inference (NLI) based model that uses large text chunks to condition over long texts. This approach achieves state-of-the-art performance in factual inconsistency detection for diverse tasks and long inputs. Additionally, we leverage the chunking mechanism and employ a novel algorithm to explain SCALE's decisions through relevant source sentence retrieval. Our evaluations reveal that SCALE outperforms existing methods on both standard benchmarks and a new long-form dialogue dataset ScreenEval we constructed. Moreover, SCALE surpasses competitive systems in efficiency and model explanation evaluations. We have released our code and data publicly to GitHub.

摘要
现代生成AI模型表现出了惊人的潜力，但是幻觉在不同任务中存在一定挑战，特别是对 longer inputs 的处理。我们介绍了 SCALE（Source Chunking Approach for Large-scale inconsistency Evaluation），一种任务非依赖的模型，用于检测事实不一致性。特别是，SCALE 是一种基于自然语言推理（NLI）的模型，使用大量文本块来condition over long text。这种方法在多种任务和长输入上实现了状态的末点性表现。此外，我们利用块化机制，并使用一种新的算法来解释 SCALE 的决策，通过 relevance source sentence retrieval。我们的评估表明，SCALE 在标准 benchmark 和我们新建的长形对话 dataset ScreenEval 上表现出色，并且在效率和模型解释评估中也超过了现有的系统。我们已经在 GitHub 上公开了代码和数据。

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

paper_url: http://arxiv.org/abs/2310.13165
repo_url: https://github.com/sled-group/cyclenet
paper_authors: Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai
For: 本文旨在提出一种新的image-to-image翻译方法，以帮助在image synthesis任务中实现高质量的图像生成。* Methods: 本文使用了cycle consistency来regulates image manipulation，并且通过多种方法来提高image-to-image翻译的精度和质量。* Results: 实验结果表明，Cyclenet可以在不同的粒度水平上实现高质量的图像翻译，并且可以在不同的物体和场景翻译任务中提供高度的一致性。此外，本文还提供了一个多域image-to-image翻译 dataset，以便研究物体的物理状态变化。

Abstract
Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attention-based methods, and image-conditioning. However, it remains a critical challenge to enable unpaired I2I translation with pre-trained DMs while maintaining satisfying consistency. This paper introduces Cyclenet, a novel but simple method that incorporates cycle consistency into DMs to regularize image manipulation. We validate Cyclenet on unpaired I2I tasks of different granularities. Besides the scene and object level translation, we additionally contribute a multi-domain I2I translation dataset to study the physical state changes of objects. Our empirical studies show that Cyclenet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt. Cyclenet is a practical framework, which is robust even with very limited training data (around 2k) and requires minimal computational resources (1 GPU) to train. Project homepage: https://cyclenetweb.github.io/

摘要
Diffusion models (DMs) 已经帮助实现图像生成任务的突破，但缺乏直观的图像到图像（I2I）翻译界面。各种方法已经被探索以解决这个问题，包括面罩基的方法、注意力基的方法和图像条件。但是，尚未能够通过预训练的 DMs 实现无对应 I2I 翻译，并保持满意的一致性。这篇论文介绍 CycleNet，一种新的简单方法，将图像 manipulate 中的循环一致性 incorporated 到 DMs 中来规范化图像翻译。我们验证 CycleNet 在不同粒度的无对应 I2I 任务上。除了场景和对象层翻译，我们还贡献了多域 I2I 翻译数据集，以研究物体的物理状态变化。我们的实验表明，CycleNet 在翻译一致性和质量方面具有优势，能够生成高质量的图像，并且可以通过简单地修改文本提示来生成出去domains 的图像。CycleNet 是一个实用的框架，可以在很少的训练数据（约 2k）和少量计算资源（1 GPU）下进行训练。项目首页：https://cyclenetweb.github.io/

A Distributed Approach to Meteorological Predictions: Addressing Data Imbalance in Precipitation Prediction Models through Federated Learning and GANs

paper_url: http://arxiv.org/abs/2310.13161
repo_url: None
paper_authors: Elaheh Jafarigol, Theodore Trafalis
for: 本研究旨在为各种领域（农业、航空、灾害管理等）提供精准预测和深入分析，通过对大量多维天气数据进行分类。
methods: 本研究使用机器学习模型分析天气数据，并利用数据增强技术（如少数民类过多样本技术或生成对抗网络）改善模型的准确性，以便分类罕见但重要的天气事件。
results: 本研究表明，通过使用数据增强技术，可以提高模型的准确性，并在中央和联合数据存储和处理环境中进行分类，保持数据隐私和完整性。

Abstract
The classification of weather data involves categorizing meteorological phenomena into classes, thereby facilitating nuanced analyses and precise predictions for various sectors such as agriculture, aviation, and disaster management. This involves utilizing machine learning models to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, contributing to meteorological conditions. Furthermore, it's imperative that classification algorithms proficiently navigate challenges such as data imbalances, where certain weather events (e.g., storms or extreme temperatures) might be underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Employing data augmentation techniques such as the Synthetic Minority Over-sampling Technique or Generative Adversarial Networks can improve the model's accuracy in classifying rare but critical weather events. Moreover, with advancements in federated learning, machine learning models can be trained across decentralized databases, ensuring privacy and data integrity while mitigating the need for centralized data storage and processing. Thus, the classification of weather data stands as a critical bridge, linking raw meteorological data to actionable insights, enhancing our capacity to anticipate and prepare for diverse weather conditions.

摘要
天气数据分类涉及将气象现象分为类别，以便进行细化分析和精准预测，为各个领务如农业、航空和灾害管理等提供指导。这些分类模型可以使用机器学习算法分析大量多维天气数据寻找模式和趋势。这些数据可能包括温度、湿度、风速和压力等气象条件变量。此外，分类算法需要能够有效地处理数据不均衡问题，例如某些天气事件（如风暴或极端温度）可能被下标。本研究探讨了数据扩展技术来解决中心化和联邦化设置下的不均衡分类问题。使用数据扩展技术如Synthetic Minority Over-sampling Technique或生成对抗网络可以提高模型在分类罕见而critical天气事件的准确率。此外，随着联邦学习的发展，机器学习模型可以在分布式数据库上进行训练，保持隐私和数据完整性，同时减少中心化数据存储和处理的需求。因此，天气数据分类作为把天气数据与行动指导相连接的关键桥梁，可以提高我们对多样化天气条件的预测和准备能力。

Conditional Generative Modeling for Images, 3D Animations, and Video

paper_url: http://arxiv.org/abs/2310.13157
repo_url: None
paper_authors: Vikram Voleti
for: 这个论文的目的是驱动计算机视觉领域的创新，探索新的条件生成模型，并将其应用于图像、3D动画和视频等领域。
methods: 这个研究专注于具有逆转变数的条件生成模型，以及使用encoder-decoder架构进行生成任务和3D内容修饰。我们还使用神经泛化函数来模型视频动态，并提出了一个具有条件信息的泛化流程，以提高生成过程和生成内容的效率。
results: 我们的研究实现了许多成果，包括使用神经泛化函数来预测未来视频几帧，以及基于低分辨率输入进行高分辨率图像生成。我们还提出了一个可以自动调整人像和3D角色的对称运动架构，并进行了几个实验来评估这些成果。

Abstract
This dissertation attempts to drive innovation in the field of generative modeling for computer vision, by exploring novel formulations of conditional generative models, and innovative applications in images, 3D animations, and video. Our research focuses on architectures that offer reversible transformations of noise and visual data, and the application of encoder-decoder architectures for generative tasks and 3D content manipulation. In all instances, we incorporate conditional information to enhance the synthesis of visual data, improving the efficiency of the generation process as well as the generated content. We introduce the use of Neural ODEs to model video dynamics using an encoder-decoder architecture, demonstrating their ability to predict future video frames despite being trained solely to reconstruct current frames. Next, we propose a conditional variant of continuous normalizing flows that enables higher-resolution image generation based on lower-resolution input, achieving comparable image quality while reducing parameters and training time. Our next contribution presents a pipeline that takes human images as input, automatically aligns a user-specified 3D character with the pose of the human, and facilitates pose editing based on partial inputs. Next, we derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, and show comparable generation quality. Finally, we devise a novel denoising diffusion framework capable of solving all three video tasks of prediction, generation, and interpolation. We perform ablation studies, and show SOTA results on multiple datasets. Our contributions are published articles at peer-reviewed venues. Overall, our research aims to make a meaningful contribution to the pursuit of more efficient and flexible generative models, with the potential to shape the future of computer vision.

摘要
这个论文目标在计算机视觉领域推动创新，通过探索新的条件生成模型形式和应用，提高生成过程和生成内容的效率。我们的研究关注降噪和视觉数据的可逆转换，以及使用编码器-解码器架构进行生成任务和3D内容修饰。在所有情况下，我们包含条件信息以提高生成的效果。我们引入神经 ordinary differential equations（ODEs）来模型视频动态，使用编码器-解码器架构预测未来视频帧，即使只有训练current frame。然后，我们提出一种基于Continuous Normalizing Flows的条件变体，允许基于lower-resolution输入生成高分辨率图像，实现相同的图像质量，同时减少参数和训练时间。我们的下一个贡献是一个管道，可以将人像作为输入，自动将用户指定的3D人物与人像的 pose 对齐，并且允许基于部分输入进行pose编辑。接着，我们 derive the relevant mathematical details for denoising diffusion models that use non-isotropic Gaussian processes, and show comparable generation quality。最后，我们提出一种新的杂化噪声框架，能够解决所有三个视频任务：预测、生成和 interpolate。我们进行了减少研究，并在多个数据集上达到了state-of-the-art results。总的来说，我们的研究旨在为计算机视觉领域提供更有效率和灵活的生成模型，并且具有可能在未来影响计算机视觉的潜在。

CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain

paper_url: http://arxiv.org/abs/2310.13146
repo_url: https://github.com/openlifescience-ai/clift
paper_authors: Ankit Pal
for: 这个论文提出了一个新的临床域问答任务测试环境 CLIFT (Clinical Shift)，以提供可靠和多样化的参考基准。
methods: 论文使用了多种QA深度学习模型进行了全面的实验研究，并评估了这些模型在提posed的测试环境下的性能。
results: 论文发现，即使在原始测试集上显示出了很好的 результа，但当应用于新的测试集时，模型的性能却会下降，这显示出了分布shift的问题。

Abstract
This paper introduces a new testbed CLIFT (Clinical Shift) for the clinical domain Question-answering task. The testbed includes 7.5k high-quality question answering samples to provide a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several QA deep-learning models under the proposed testbed. Despite impressive results on the original test set, the performance degrades when applied to new test sets, which shows the distribution shift. Our findings emphasize the need for and the potential for increasing the robustness of clinical domain models under distributional shifts. The testbed offers one way to track progress in that direction. It also highlights the necessity of adopting evaluation metrics that consider robustness to natural distribution shifts. We plan to expand the corpus by adding more samples and model results. The full paper and the updated benchmark are available at github.com/openlifescience-ai/clift

摘要
这份论文介绍了一个新的 клиниче域问答任务测试环境（CLIFT）。该测试环境包含7500个高质量问答样本，以提供多样化和可靠的参考基线。我们进行了全面的实验研究，评估了多种问答深度学习模型在提posed的测试环境下的性能。尽管在原始测试集上表现出色，但当应用于新的测试集时，其性能却受到分布Shift的影响。我们的发现强调了在分布Shift下提高клиниче域模型的可靠性的需要，以及采用考虑到自然分布Shift的评价指标的重要性。该测试环境可以用于跟踪进步，并高亮了采用更多样本和模型结果来扩展 corrpus的必要性。论文全文和更新的benchmark可以在github.com/openlifescience-ai/clift找到。

Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

paper_url: http://arxiv.org/abs/2310.13132
repo_url: https://github.com/claws-lab/XLingEval
paper_authors: Yiqiao Jin, Mohit Chandra, Gaurav Verma, Yibo Hu, Munmun De Choudhury, Srijan Kumar
For: The paper aims to investigate the effectiveness of large language models (LLMs) as multi-lingual dialogue systems for healthcare queries, and to provide a cross-lingual benchmark for evaluating their performance.* Methods: The paper uses empirically-derived framework XlingEval, which focuses on three fundamental criteria for evaluating LLM responses to naturalistic health-related questions: correctness, consistency, and verifiability. The paper also uses extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, and an amalgamation of algorithmic and human-evaluation strategies.* Results: The paper finds a pronounced disparity in LLM responses across the four languages, indicating a need for enhanced cross-lingual capabilities. The paper proposes XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. The findings underscore the need to bolster the cross-lingual capacities of these models and provide an equitable information ecosystem accessible to all.Here is the information in Simplified Chinese text:* For: 研究大语言模型（LLM）在医疗域中的多语言对话系统的有效性。* Methods: 使用经验所得的框架XlingEval，对自然语言中的医疗问题进行评估。* Results: 发现不同语言中的LLM响应存在显著的差异，需要提升跨语言能力。提出了跨语言医疗 benchmark XlingHealth，以评估 LLM 在医疗域中的多语言能力。发现需要提高跨语言能力，以提供一个平等的信息生态系统。

Abstract
Large language models (LLMs) are transforming the ways the general public accesses and consumes information. Their influence is particularly pronounced in pivotal sectors like healthcare, where lay individuals are increasingly appropriating LLMs as conversational agents for everyday queries. While LLMs demonstrate impressive language understanding and generation proficiencies, concerns regarding their safety remain paramount in these high-stake domains. Moreover, the development of LLMs is disproportionately focused on English. It remains unclear how these LLMs perform in the context of non-English languages, a gap that is critical for ensuring equity in the real-world use of these systems.This paper provides a framework to investigate the effectiveness of LLMs as multi-lingual dialogue systems for healthcare queries. Our empirically-derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, spanning three expert-annotated large health Q&A datasets, and through an amalgamation of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models, and to provide an equitable information ecosystem accessible to all.

摘要
大型语言模型（LLMs）正在改变公众如何获取和消耗信息的方式。它们在重要的领域 like 医疗方面的影响 particualry pronounced，lay individuals increasingly appropriate LLMs as conversational agents for everyday queries. Although LLMs demonstrate impressive language understanding and generation capabilities, concerns about their safety remain paramount in these high-stakes domains. In addition, the development of LLMs is disproportionately focused on English, and it is unclear how these models perform in the context of non-English languages, which is critical for ensuring equity in the real-world use of these systems.This paper proposes a framework to investigate the effectiveness of LLMs as multi-lingual dialogue systems for healthcare queries. Our empirically derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, spanning three expert-annotated large health Q&A datasets, and through an amalgamation of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models and provide an equitable information ecosystem accessible to all.

Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions

paper_url: http://arxiv.org/abs/2310.13129
repo_url: https://github.com/pagand/eco-light
paper_authors: Pedram Agand, Alexey Iskrov, Mo Chen
for: 这个论文的目的是提出一种名为EcoLight的奖励形式，用于改善交通信号控制器的绩效，并最小化CO2排放。
methods: 该论文使用了多种RLAlgorithms，包括 tabular Q-Learning、DQN、SARSA 和 A2C，并对它们的性能进行比较。
results: 论文的实验结果表明，EcoLight可以不仅减少CO2排放，还能达到与传统方法相当的旅行时间和等待时间。

Abstract
Nowadays, transportation networks face the challenge of sub-optimal control policies that can have adverse effects on human health, the environment, and contribute to traffic congestion. Increased levels of air pollution and extended commute times caused by traffic bottlenecks make intersection traffic signal controllers a crucial component of modern transportation infrastructure. Despite several adaptive traffic signal controllers in literature, limited research has been conducted on their comparative performance. Furthermore, despite carbon dioxide (CO2) emissions' significance as a global issue, the literature has paid limited attention to this area. In this report, we propose EcoLight, a reward shaping scheme for reinforcement learning algorithms that not only reduces CO2 emissions but also achieves competitive results in metrics such as travel time. We compare the performance of tabular Q-Learning, DQN, SARSA, and A2C algorithms using metrics such as travel time, CO2 emissions, waiting time, and stopped time. Our evaluation considers multiple scenarios that encompass a range of road users (trucks, buses, cars) with varying pollution levels.

摘要
现在，交通网络面临着优化控制策略的挑战，这可能有害于人类健康、环境和交通堵塞。交通拥堵导致的空气污染和延长的交通时间，使得路口交通信号控制器成为现代交通基础设施的关键组件。尽管有多种适应交通信号控制器的文献研究，但有限的研究把关于它们的比较性能。此外，尽管二氧化碳（CO2）排放的全球问题的重要性，文献却很少关注这一方面。在本报告中，我们提出了EcoLight，一种奖励形式学习算法的补做方案，不仅减少CO2排放，还可以在交通时间、等待时间和停止时间等指标中实现竞争性能。我们使用表格Q学习、DQN、SARSA和A2C算法进行比较，评估多种情况，包括不同的路用者（卡车、汽车、客车）以及不同的污染水平。

Understanding Addition in Transformers

paper_url: http://arxiv.org/abs/2310.13121
repo_url: None
paper_authors: Philip Quirke, Fazl Barez
for: 这篇论文旨在对一种一层Transformer模型的内部工作机制进行深入分析，以便更好地理解这些模型的安全和道德使用。
methods: 论文使用了一种彻底的分析方法，揭示了模型在整数加法任务上的具体实现方式，包括将任务分解成并行的数字特定流程，并采用不同的算法来处理不同的数字位置。
results: 研究发现，模型在计算过程中延迟开始，但快速执行计算任务，并且在某些罕见的高损失情况下进行了详细的解释。通过严格的测试和数学模型，这些发现得到了证实。这些研究对机器学习模型的安全和道德使用做出了贡献，并为更复杂的任务和多层Transformer模型的分析开了道。

Abstract
Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper presents an in-depth analysis of a one-layer Transformer model trained for integer addition. We reveal that the model divides the task into parallel, digit-specific streams and employs distinct algorithms for different digit positions. Our study also finds that the model starts calculations late but executes them rapidly. A rare use case with high loss is identified and explained. Overall, the model's algorithm is explained in detail. These findings are validated through rigorous testing and mathematical modeling, contributing to the broader works in Mechanistic Interpretability, AI safety, and alignment. Our approach opens the door for analyzing more complex tasks and multi-layer Transformer models.

摘要
理解机器学习模型如转换器的内部工作方式是安全和道德使用的关键。本文对一个一层转换器模型进行了深入的分析，该模型用于整数加法。我们发现该模型将任务分解为并行、数字特定的流程，并采用不同的算法来处理不同的数字位置。我们的研究还发现该模型在计算开始后很快进行了执行。我们还发现了一个罕见的高损失情况，并对其进行了解释。总的来说，该模型的算法得到了详细的解释。我们的方法为机器学习安全和启发领域的更广泛研究开创了新的可能性。

Semi-Supervised Learning of Dynamical Systems with Neural Ordinary Differential Equations: A Teacher-Student Model Approach

paper_url: http://arxiv.org/abs/2310.13110
repo_url: None
paper_authors: Yu Wang, Yuxuan Yin, Karthik Somayaji Nanjangud Suryanarayana, Jan Drgona, Malachi Schram, Mahantesh Halappanavar, Frank Liu, Peng Li
for: 模型动态系统，提高模型预测性和泛化能力。
methods: 使用 semi-supervised 方法和 Neural Ordinary Differential Equations (NODE) 模型，利用丰富的无标示数据和 teacher-student 模型来提高模型的性能。
results: 与基准 Neural ODE 模型比较，TS-NODE 在多个动态系统模型Task上显示出了显著的性能改进。

Abstract
Modeling dynamical systems is crucial for a wide range of tasks, but it remains challenging due to complex nonlinear dynamics, limited observations, or lack of prior knowledge. Recently, data-driven approaches such as Neural Ordinary Differential Equations (NODE) have shown promising results by leveraging the expressive power of neural networks to model unknown dynamics. However, these approaches often suffer from limited labeled training data, leading to poor generalization and suboptimal predictions. On the other hand, semi-supervised algorithms can utilize abundant unlabeled data and have demonstrated good performance in classification and regression tasks. We propose TS-NODE, the first semi-supervised approach to modeling dynamical systems with NODE. TS-NODE explores cheaply generated synthetic pseudo rollouts to broaden exploration in the state space and to tackle the challenges brought by lack of ground-truth system data under a teacher-student model. TS-NODE employs an unified optimization framework that corrects the teacher model based on the student's feedback while mitigating the potential false system dynamics present in pseudo rollouts. TS-NODE demonstrates significant performance improvements over a baseline Neural ODE model on multiple dynamical system modeling tasks.

摘要
Translation in Simplified Chinese:模型动力系统是许多任务的关键，但是它们仍然具有复杂的非线性动力学、有限的观察数据和缺乏先验知识等挑战。最近，基于数据驱动的方法如神经常微方程（NODE）已经展示了许多承诺的结果，通过神经网络的表达能力来模型未知的动力学。然而，这些方法通常受限于有限的标注训练数据，导致泛化和优化结果不佳。相反，半supervised算法可以利用丰富的无标注数据，在分类和回归任务中达到良好的表现。我们提出了TS-NODE，首个半supervised的动力系统模型ing方法，它采用了免费生成的合成pseudo rollouts来扩大状态空间的探索，并在教师-学生模型下对lack of ground-truth system data bring的挑战。TS-NODE employ an unified optimization framework to correct the teacher model based on the student's feedback while mitigating the potential false system dynamics present in pseudo rollouts。TS-NODE在多个动力系统模型ing任务上示出了显著的性能提升 compared to a baseline Neural ODE model.

AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection

paper_url: http://arxiv.org/abs/2310.13103
repo_url: None
paper_authors: Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang
for: 防止深伪视频的散布，增加媒体regulation，和研究领域的新挑战。
methods: 基于 transformer 的 Audio-Visual Transformer-based Ensemble Network (AVTENet) 框架，包括视觉、声音和视频 modalities 的融合特征提取。
results: 在多Modal audio-video FakeAVCeleb 数据集上，模型达到了最佳性能，比已有方法高效。

Abstract
Forged content shared widely on social media platforms is a major social problem that requires increased regulation and poses new challenges to the research community. The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous work on detecting AI-generated fake videos only utilizes visual modality or audio modality. While there are some methods in the literature that exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multi-modal datasets of deepfake videos involving acoustic and visual manipulations. Moreover, these existing methods are mostly based on CNN and suffer from low detection accuracy. Inspired by the recent success of Transformer in various fields, to address the challenges posed by deepfake technology, in this paper, we propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation to achieve effective video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multi-modal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that our best model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset.

摘要
伪造内容在社交媒体平台上广泛分享是一个重要的社会问题，需要加强管理和 pose new challenges to the research community。 recent deepfake video 的普及引起了听Visual forgery的问题的关注。 previoius work on detecting AI-generated fake videos 仅仅使用视觉modalities or audio modalities。 Although there are some methods in the literature that exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multi-modal datasets of deepfake videos involving acoustic and visual manipulations。 Moreover, these existing methods are mostly based on CNN and suffer from low detection accuracy。inspired by the recent success of Transformer in various fields, to address the challenges posed by deepfake technology, in this paper, we propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation to achieve effective video forgery detection。 Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction。 For evaluation, we use the recently released benchmark multi-modal audio-video FakeAVCeleb dataset。 For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset。 Experimental results show that our best model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset。

Particle Guidance: non-I.I.D. Diverse Sampling with Diffusion Models

paper_url: http://arxiv.org/abs/2310.13102
repo_url: https://github.com/gcorso/particle-guidance
paper_authors: Gabriele Corso, Yilun Xu, Valentin de Bortoli, Regina Barzilay, Tommi Jaakkola
for: The paper is written for researchers and practitioners interested in improving the diversity and sample efficiency of generative models, particularly in the context of image and molecular conformer generation.
methods: The paper proposes a new method called particle guidance, which extends diffusion-based generative sampling by enforcing diversity through a joint-particle time-evolving potential. The method is based on the idea of moving beyond the common assumption of independent samples.
results: The paper reports empirical results on conditional image generation and molecular conformer generation, showing that particle guidance can increase diversity without affecting quality in the former, and reduce the state-of-the-art median error by 13% on average in the latter.

Abstract
In light of the widespread success of generative models, a significant amount of research has gone into speeding up their sampling time. However, generative models are often sampled multiple times to obtain a diverse set incurring a cost that is orthogonal to sampling time. We tackle the question of how to improve diversity and sample efficiency by moving beyond the common assumption of independent samples. We propose particle guidance, an extension of diffusion-based generative sampling where a joint-particle time-evolving potential enforces diversity. We analyze theoretically the joint distribution that particle guidance generates, its implications on the choice of potential, and the connections with methods in other disciplines. Empirically, we test the framework both in the setting of conditional image generation, where we are able to increase diversity without affecting quality, and molecular conformer generation, where we reduce the state-of-the-art median error by 13% on average.

摘要
据广泛成功的生成模型，有大量的研究集中在减速生成模型的采样时间。然而，生成模型通常需要多次采样以获得多样性，这会产生与采样时间无关的成本。我们考虑如何提高多样性和采样效率，我们提出了粒子指导，它是基于扩散基本生成采样的扩展，在粒子时间演化的共同可能性下保证多样性。我们 theoretically 分析了粒子指导生成的共同分布，其影响粒子可能性的选择，以及与其他领域方法的连接。empirically，我们在条件图像生成和分子 conformer 生成中测试了框架，并在后一种情况下降低了状态艺术中的平均错误率 by 13%。

No offence, Bert – I insult only humans! Multiple addressees sentence-level attack on toxicity detection neural network

paper_url: http://arxiv.org/abs/2310.13099
repo_url: None
paper_authors: Sergey Berezin, Reza Farahbakhsh, Noel Crespi
for: 针对黑盒恶意识别器模型的 sentence-level 攻击。
methods: 通过添加一些积极词或句子到仇恨消息的末端，改变神经网络的预测结果，并通过恶意识别系统的检查。
results: 在七种语言上（来自三种语言家族）实现了这种攻击方法，并描述了对此攻击的防御机制以及其局限性。

Abstract
We introduce a simple yet efficient sentence-level attack on black-box toxicity detector models. By adding several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection system check. This approach is shown to be working on seven languages from three different language families. We also describe the defence mechanism against the aforementioned attack and discuss its limitations.

摘要
我们介绍了一种简单 yet 高效的句子级攻击方法，用于针对黑盒毒性检测器模型。我们通过在仇恨消息的末端添加一些积极的词语或句子，使得神经网络的预测结果发生变化，并通过毒性检测系统的检查。这种方法在七种语言中进行了测试，并在三个语言家族中进行了应用。我们还描述了对此攻击的防御机制，并讨论了它的局限性。

Unsupervised Representation Learning to Aid Semi-Supervised Meta Learning

paper_url: http://arxiv.org/abs/2310.13085
repo_url: https://github.com/atik666/representationtransfer
paper_authors: Atik Faysal, Mohammad Rostami, Huaxia Wang, Avimanyu Sahoo, Ryan Antle
for: 解决数据稀缺问题，提高机器学习模型的泛化能力。
methods: 使用一shot无监督meta学习学习模型的含义表示，并在训练阶段使用扩展样本作为查询集。内循环使用温度扩大的交叉熵损失避免过拟合。
results: 在Omniglot和mini-Imagenet datasets上使用模型不偏 meta-learning和关系网络（RN），并实现了具有较好的准确率的模型。此外，使用提档的初始化方法可以在训练样本数量减少的情况下达到满意的准确率。

Abstract
Few-shot learning or meta-learning leverages the data scarcity problem in machine learning. Traditionally, training data requires a multitude of samples and labeling for supervised learning. To address this issue, we propose a one-shot unsupervised meta-learning to learn the latent representation of the training samples. We use augmented samples as the query set during the training phase of the unsupervised meta-learning. A temperature-scaled cross-entropy loss is used in the inner loop of meta-learning to prevent overfitting during unsupervised learning. The learned parameters from this step are applied to the targeted supervised meta-learning in a transfer-learning fashion for initialization and fast adaptation with improved accuracy. The proposed method is model agnostic and can aid any meta-learning model to improve accuracy. We use model agnostic meta-learning (MAML) and relation network (RN) on Omniglot and mini-Imagenet datasets to demonstrate the performance of the proposed method. Furthermore, a meta-learning model with the proposed initialization can achieve satisfactory accuracy with significantly fewer training samples.

摘要
几shot学习或元学习可以解决机器学习中的数据缺乏问题。传统上，超vised学习需要大量的训练数据和标签。为解决这个问题，我们提出了一种一shot无监控元学习，以learn训练数据的隐藏表现。在训练阶段，我们使用了扩展的训练数据作为查询集。在元学习的内循环中，我们使用温度调整的十字熵损失，以避免过拟合。学习的参数从这一步被应用到目标的超级vised元学习中，作为初始化和快速适应的改进。我们的方法是模型不受限的，可以帮助任何元学习模型提高精度。我们使用了model不受限元学习（MAML）和关系网络（RN）在Omniglot和mini-Imagenet数据集上进行表现示例。此外，具有我们的初始化的元学习模型可以在很少的训练数据下 дости得到满意的精度。

From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues

paper_url: http://arxiv.org/abs/2310.13080
repo_url: https://github.com/lcs2-iiitd/emnlp-coffee
paper_authors: Shivani Kumar, Ramaneswaran S, Md Shad Akhtar, Tanmoy Chakraborty
for: 本研究旨在提高对码混合对话中情感识别的能力。
methods: 该研究提出了一种新的方法，即将常识信息与对话Context相结合，以更深入地理解情感。该方法包括提取基于码混合输入的相关常识信息，并使用高级融合技术将其与对话表示相结合。
results: 实验表明，通过系统地 интеGRATION常识信息，ERC的性能得到了显著提高。 both量化评估和质量分析都证明了我们的假设的正确性， thereby confirming the importance of incorporating commonsense in ERC.

Abstract
Understanding emotions during conversation is a fundamental aspect of human communication, driving NLP research for Emotion Recognition in Conversation (ERC). While considerable research has focused on discerning emotions of individual speakers in monolingual dialogues, understanding the emotional dynamics in code-mixed conversations has received relatively less attention. This motivates our undertaking of ERC for code-mixed conversations in this study. Recognizing that emotional intelligence encompasses a comprehension of worldly knowledge, we propose an innovative approach that integrates commonsense information with dialogue context to facilitate a deeper understanding of emotions. To achieve this, we devise an efficient pipeline that extracts relevant commonsense from existing knowledge graphs based on the code-mixed input. Subsequently, we develop an advanced fusion technique that seamlessly combines the acquired commonsense information with the dialogue representation obtained from a dedicated dialogue understanding module. Our comprehensive experimentation showcases the substantial performance improvement obtained through the systematic incorporation of commonsense in ERC. Both quantitative assessments and qualitative analyses further corroborate the validity of our hypothesis, reaffirming the pivotal role of commonsense integration in enhancing ERC.

摘要
Translated into Simplified Chinese:理解对话中的情感是人类交流的基本方面，驱动了NLP研究的情感识别在对话中（ERC）。 although considerable research has focused on distinguishing the emotions of individual speakers in monolingual dialogues, understanding the emotional dynamics in code-mixed conversations has received relatively less attention. This motivates our study of ERC for code-mixed conversations. recognizing that emotional intelligence includes a comprehension of worldly knowledge, we propose an innovative approach that integrates commonsense information with dialogue context to facilitate a deeper understanding of emotions. To achieve this, we design an efficient pipeline that extracts relevant commonsense from existing knowledge graphs based on the code-mixed input. Subsequently, we develop an advanced fusion technique that seamlessly combines the acquired commonsense information with the dialogue representation obtained from a dedicated dialogue understanding module. Our comprehensive experimentation shows the substantial performance improvement obtained through the systematic incorporation of commonsense in ERC. Both quantitative assessments and qualitative analyses further corroborate the validity of our hypothesis, reaffirming the pivotal role of commonsense integration in enhancing ERC.

Creative Robot Tool Use with Large Language Models

paper_url: http://arxiv.org/abs/2310.13065
repo_url: None
paper_authors: Mengdi Xu, Peide Huang, Wenhao Yu, Shiqi Liu, Xilun Zhang, Yaru Niu, Tingnan Zhang, Fei Xia, Jie Tan, Ding Zhao
for: 这 paper 的目的是研究如何让机器人使用工具创新地完成复杂的任务，包括适应物理约束和长期规划。methods: 这 paper 使用 Large Language Models (LLMs) 开发了一个系统，可以根据自然语言指令控制机器人在实验室和实际环境中进行操作。该系统包括四个重要组件：分析器、规划器、计算器和编译器。results: 这 paper 的结果表明，RoboTool 可以不仅理解任务中的物理约束和环境因素，还能示出创新的工具使用。与传统的 Task and Motion Planning (TAMP) 方法不同，我们的 LLM-based 系统提供了更灵活、高效和用户友好的解决方案，可以扩展机器人系统的能力。经过广泛的实验，我们证明了 RoboTool 能够处理不可能 без创新工具使用的任务，从而扩大机器人系统的可能性。

Abstract
Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.

摘要
tool 使用是高级智能的标志，在动物行为和机器人能力中都有代表性。这篇论文探讨了让机器人具备创新使用工具的能力，以实现在含有隐式物理约束和长期规划的任务中的高级智能。利用大型自然语言模型（LLM），我们开发了RoboTool系统，可以接受自然语言指令并生成控制机器人的可执行代码。RoboTool包括四个重要组件：（一）一个“分析器”，可以从自然语言中解读关键任务相关的概念；（二）一个“规划器”，可以根据自然语言输入和关键概念生成全面的策略；（三）一个“计算器”，可以计算每种技能的参数；以及（四）一个“编译器”，可以将这些规划转换为可执行的Python代码。我们的结果表明，RoboTool不仅可以理解隐式或明确的物理约束和环境因素，而且可以展示创新的工具使用。与传统的任务和动作规划（TAMP）方法不同，我们的LLM基于系统提供了更灵活、高效和用户友好的解决方案，用于复杂的机器人任务。通过广泛的实验，我们证明了RoboTool可以处理不可能 без创新工具使用的任务，从而扩展机器人系统的能力。demo可以在我们项目页面上找到：https://creative-robotool.github.io/.

Training Dynamics of Deep Network Linear Regions

paper_url: http://arxiv.org/abs/2310.12977
repo_url: None
paper_authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
for: 这个研究主要关注深度网络（DN）训练过程中的输入空间分区或线性区域的形成。
methods: 这个研究使用了细分Affine DNs（例如，带有（漏斗）ReLU非线性的网络），并使用了一种新的统计量来描述深度网络的本地复杂性（LC）。
results: 研究发现，在训练过程中，LC在数据点周围的区域经历了多个阶段，包括一个减少趋势，然后是一个升高阶段，最终是一个最后的减少趋势。此外，研究还发现，在最后一个LC下降阶段的训练过程中，线性区域从训练和测试样本偏移向决策边界，使得深度网络的输入输出变得几乎线性Everywhere else。此外，研究还发现，不同的LC阶段与深度网络的记忆和泛化性强相关，特别是在“搁浅”阶段。

Abstract
The study of Deep Network (DN) training dynamics has largely focused on the evolution of the loss function, evaluated on or around train and test set data points. In fact, many DN phenomenon were first introduced in literature with that respect, e.g., double descent, grokking. In this study, we look at the training dynamics of the input space partition or linear regions formed by continuous piecewise affine DNs, e.g., networks with (leaky)ReLU nonlinearities. First, we present a novel statistic that encompasses the local complexity (LC) of the DN based on the concentration of linear regions inside arbitrary dimensional neighborhoods around data points. We observe that during training, the LC around data points undergoes a number of phases, starting with a decreasing trend after initialization, followed by an ascent and ending with a final descending trend. Using exact visualization methods, we come across the perplexing observation that during the final LC descent phase of training, linear regions migrate away from training and test samples towards the decision boundary, making the DN input-output nearly linear everywhere else. We also observe that the different LC phases are closely related to the memorization and generalization performance of the DN, especially during grokking.

摘要
研究深度网络（DN）训练动态的研究主要集中在损失函数的演化，即在训练集和测试集数据点附近。实际上，许多DN现象在文献中首次出现，例如双峰Descender。在这种研究中，我们关注了连续划分 affine DN 的输入空间分区或线性区域的训练动态。例如，使用（漏斗）ReLU非线性。我们首先介绍了一种新的统计，用于量化DN本地复杂性（LC），基于数据点附近多维度邻域中线性区域的吸引度。我们发现，在训练过程中，LC附近数据点经历了一些阶段，包括初始阶段下降趋势，然后升附向阶段，最终结束于下降趋势。使用精确视觉化方法，我们发现在最后一个LC下降阶段的训练中，线性区域在训练和测试样本之间迁移，使DN的输入-输出变得nearly linear everywhere else。我们还发现不同的LC阶段与DN的记忆和泛化性的表现有着密切的关系，特别是在搜寻过程中。

Variational Inference for SDEs Driven by Fractional Noise

paper_url: http://arxiv.org/abs/2310.12975
repo_url: None
paper_authors: Rembert Daems, Manfred Opper, Guillaume Crevecoeur, Tolga Birdal
for: 这个论文是为了提出一种基于变量框架的概率方法，用于解决随机 diffeomorphism 方程（SDE）驱动的Markov-approximate fractional Brownian motion（fBM）的推理问题。
methods: 这篇论文使用了变量方法，并基于Markov approxiamtion of fBM， derivation of evidence lower bound，以及使用神经网络来学习涨搅、扩散和控制项的变量 posterior。
results: 论文提出了一种基于变量框架的概率方法，可以快速和有效地解决SDE驱动的fBM推理问题，并且在synthetic data上验证了这种方法的有效性。此外，论文还提出了一种基于变量 neural-SDE的视频预测方法，这是varational neural-SDE的首次应用于视频识别领域。

Abstract
We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). SDEs offer a versatile tool for modeling real-world continuous-time dynamic systems with inherent noise and randomness. Combining SDEs with the powerful inference capabilities of variational methods, enables the learning of representative function distributions through stochastic gradient descent. However, conventional SDEs typically assume the underlying noise to follow a Brownian motion (BM), which hinders their ability to capture long-term dependencies. In contrast, fractional Brownian motion (fBM) extends BM to encompass non-Markovian dynamics, but existing methods for inferring fBM parameters are either computationally demanding or statistically inefficient. In this paper, building upon the Markov approximation of fBM, we derive the evidence lower bound essential for efficient variational inference of posterior path measures, drawing from the well-established field of stochastic analysis. Additionally, we provide a closed-form expression to determine optimal approximation coefficients. Furthermore, we propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs. In this framework, we also optimize the Hurst index, governing the nature of our fractional noise. Beyond validation on synthetic data, we contribute a novel architecture for variational latent video prediction,-an approach that, to the best of our knowledge, enables the first variational neural-SDE application to video perception.

摘要
我们提出了一种新的变量框架，用于在神经网络随机差分方程（SDE）中进行推理。SDE 提供了一种灵活的工具，用于模型化真实世界中的连续时间动态系统，具有内在的噪音和随机性。将 SDE 与变量方法相结合，可以通过随机梯度下降来学习表达函数分布。然而，传统的 SDE 通常假设下面的噪音随机过程是布朗运动（BM），这限制了它们的能力 capture 长期关系。相比之下，分布式扩展运动（fBM）扩展了 BM，以捕捉非Markovian 动态，但现有的 fBM 参数推断方法是计算昂贵或 statistically 不fficient。在这篇论文中，基于 Markov approxiamtion 的 fBM，我们 deriv 出证明 Lower bound 必要的 Variational 推断 posterior path measures，从 Stochastic 分析领域中得到了灵感。此外，我们还提供了一个关闭式表达式，用于确定最佳拟合系数。此外，我们建议使用神经网络来学习涨栅、滤波和控制项的变ational posterior，从而实现变量训练神经-SDE。在这个框架中，我们还优化了哈斯特指数，它控制了我们的分布式噪音的性质。在此之外，我们还提出了一种新的变量架构，用于变量潜在视频预测，这是一种具有 Variational 神经-SDE 应用的首个途径。

Robust multimodal models have outlier features and encode more concepts

paper_url: http://arxiv.org/abs/2310.13040
repo_url: None
paper_authors: Jonathan Crabbé, Pau Rodríguez, Vaishaal Shankar, Luca Zappella, Arno Blaas
for: 这个论文的目的是研究robust模型与非robust模型之间的区别，以及这些区别是如何影响模型的学习和泛化性。
methods: 该论文使用了12种不同的多模态模型（包括ResNet和ViT）和不同的预训练集（如OpenAI、LAION-400M、LAION-2B、YFCC15M、CC12M和DataComp）进行 probing，以探索robust模型的表征空间中的两种特征。
results: 该论文发现robust模型在表征空间中存在两种特征：1）Robust模型具有异常特征，其活动值可以达到数个数量级以上，这些异常特征导致模型的预测力量。2）Robust模型在表征空间中储存了更多的概念，这使得模型可以更好地泛化，但同时也使得模型的解释变得更加困难。

Abstract
What distinguishes robust models from non-robust ones? This question has gained traction with the appearance of large-scale multimodal models, such as CLIP. These models have demonstrated unprecedented robustness with respect to natural distribution shifts. While it has been shown that such differences in robustness can be traced back to differences in training data, so far it is not known what that translates to in terms of what the model has learned. In this work, we bridge this gap by probing the representation spaces of 12 robust multimodal models with various backbones (ResNets and ViTs) and pretraining sets (OpenAI, LAION-400M, LAION-2B, YFCC15M, CC12M and DataComp). We find two signatures of robustness in the representation spaces of these models: (1) Robust models exhibit outlier features characterized by their activations, with some being several orders of magnitude above average. These outlier features induce privileged directions in the model's representation space. We demonstrate that these privileged directions explain most of the predictive power of the model by pruning up to $80 \%$ of the least important representation space directions without negative impacts on model accuracy and robustness; (2) Robust models encode substantially more concepts in their representation space. While this superposition of concepts allows robust models to store much information, it also results in highly polysemantic features, which makes their interpretation challenging. We discuss how these insights pave the way for future research in various fields, such as model pruning and mechanistic interpretability.

摘要
Our findings reveal two distinct signatures of robustness in the representation spaces of these models:1. Robust models exhibit outlier features with significantly higher activations, sometimes orders of magnitude above the average. These outlier features create privileged directions in the model's representation space. We show that these privileged directions account for most of the model's predictive power, as pruning up to 80% of the least important representation space directions does not negatively impact model accuracy or robustness.2. Robust models encode a greater number of concepts in their representation space, leading to a more polysemantic feature space. While this allows robust models to store more information, it also makes their interpretation challenging.These insights open up new avenues for future research, such as model pruning and mechanistic interpretability. By understanding what makes a model robust, we can develop more effective and efficient AI systems that are better equipped to handle the complexities of real-world data.

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

paper_url: http://arxiv.org/abs/2310.12973
repo_url: https://github.com/ziqipang/lm4visualencoding
paper_authors: Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang
for: 这篇论文探讨了大型自然语言模型（LLM）在没有语言的情况下如何用于纯视觉任务。
methods: 这篇论文提出了一种新的、前所未见的策略：将预训练的LLM块作为纯视觉任务中的元编码层使用。
results: 该策略可以在多种任务上提高性能，包括2D和3D图像识别、动作识别、非语义任务（如运动预测）和多模态任务（如2D/3D视问答和图像文本检索）。这些改进是对不同类型的LLM和不同LLM块的一致性的。此外，论文还提出了一种叫做信息筛选假设的解释，即预训练LLM块可以挖掘有用的视觉标记并加强其效果。

Abstract
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a simple yet previously overlooked strategy -- employing a frozen transformer block from pre-trained LLMs as a constituent encoder layer to directly process visual tokens. Our work pushes the boundaries of leveraging LLMs for computer vision tasks, significantly departing from conventional practices that typically necessitate a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, encompassing pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding -- the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. Code is available at https://github.com/ziqipang/LM4VisualEncoding.

摘要
这篇论文发现，大型语言模型（LLM），即使只接受文本数据培训，可以 surprisngly strong的编码器 для纯视觉任务。甚至更有趣的是，这可以通过一种简单 yet previously overlooked 的策略实现——使用预先做好的 transformer 块作为纯视觉任务中的编码器层直接处理视觉符号。我们的工作推动了利用 LLM 进行计算机视觉任务的推广，明显 Departing from conventional practices that typically require a multi-modal vision-language setup with associated language prompts, inputs, or outputs. We demonstrate that our approach consistently enhances performance across a diverse range of tasks, including pure 2D and 3D visual recognition tasks (e.g., image and point cloud classification), temporal modeling tasks (e.g., action recognition), non-semantic tasks (e.g., motion forecasting), and multi-modal tasks (e.g., 2D/3D visual question answering and image-text retrieval). Such improvements are a general phenomenon, applicable to various types of LLMs (e.g., LLaMA and OPT) and different LLM transformer blocks. We additionally propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding—the pre-trained LLM transformer blocks discern informative visual tokens and further amplify their effect. This hypothesis is empirically supported by the observation that the feature activation, after training with LLM transformer blocks, exhibits a stronger focus on relevant regions. We hope that our work inspires new perspectives on utilizing LLMs and deepening our understanding of their underlying mechanisms. 代码可以在 https://github.com/ziqipang/LM4VisualEncoding 上找到。

CLAIR: Evaluating Image Captions with Large Language Models

paper_url: http://arxiv.org/abs/2310.12971
repo_url: https://github.com/zeniSoida/pl1
paper_authors: David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny
for: 评估机器生成的图像标签，提出了一种新的评估方法CLAIR。
methods: CLAIR利用大型语言模型（LLMs）的零shot语言模型能力来评估候选标签。
results: CLAIR与人类评估标签质量的协调相对于现有方法提高39.6%，并且可以提供噪音可读的结果，让语言模型可以找到其分配的分数的根据。

Abstract
The evaluation of machine-generated image captions poses an interesting yet persistent challenge. Effective evaluation measures must consider numerous dimensions of similarity, including semantic relevance, visual structure, object interactions, caption diversity, and specificity. Existing highly-engineered measures attempt to capture specific aspects, but fall short in providing a holistic score that aligns closely with human judgments. Here, we propose CLAIR, a novel method that leverages the zero-shot language modeling capabilities of large language models (LLMs) to evaluate candidate captions. In our evaluations, CLAIR demonstrates a stronger correlation with human judgments of caption quality compared to existing measures. Notably, on Flickr8K-Expert, CLAIR achieves relative correlation improvements over SPICE of 39.6% and over image-augmented methods such as RefCLIP-S of 18.3%. Moreover, CLAIR provides noisily interpretable results by allowing the language model to identify the underlying reasoning behind its assigned score. Code is available at https://davidmchan.github.io/clair/

摘要
评估机器生成的图像标签 pose 一个有趣又普遍存在挑战。有效的评估方法应该考虑多个相似性维度，包括semantic relevance、视觉结构、对象互动、标签多样性和特定性。现有的高度工程化度量尝试捕捉特定方面，但都不够提供一个整体分数，与人类评估相吻合。在这里，我们提出了CLAIR，一种新的方法，利用大型语言模型（LLM）的零容量语言模型能力来评估候选标签。在我们的评估中，CLAIR表现出较强的人类评估标签质量相关性，比 existed 度量方法更高。尤其是在Flickr8K-Expert上，CLAIR相比SPICE的39.6%和图像扩展方法如RefCLIP-S的18.3%，实现了更高的相关性提升。此外，CLAIR提供了噪音可读的结果，让语言模型可以确定其分配的分数的下层逻辑。代码可以在https://davidmchan.github.io/clair/ 获取。

Does Your Model Think Like an Engineer? Explainable AI for Bearing Fault Detection with Deep Learning

paper_url: http://arxiv.org/abs/2310.12967
repo_url: None
paper_authors: Thomas Decker, Michael Lebacher, Volker Tresp
for: 这paper是为了解释深度学习模型在承载轮盘附件的磨损检测 task 中的决策过程。methods: 这paper使用了特有的特征归属分析框架，以评估深度学习模型的逻辑是否符合专家的理解。results: 这paper的研究结果表明，通过使用这种特征归属分析框架，可以评估深度学习模型的可靠性和扩展性，并预测不同的深度学习模型在这个任务中的表现。

Abstract
Deep Learning has already been successfully applied to analyze industrial sensor data in a variety of relevant use cases. However, the opaque nature of many well-performing methods poses a major obstacle for real-world deployment. Explainable AI (XAI) and especially feature attribution techniques promise to enable insights about how such models form their decision. But the plain application of such methods often fails to provide truly informative and problem-specific insights to domain experts. In this work, we focus on the specific task of detecting faults in rolling element bearings from vibration signals. We propose a novel and domain-specific feature attribution framework that allows us to evaluate how well the underlying logic of a model corresponds with expert reasoning. Utilizing the framework we are able to validate the trustworthiness and to successfully anticipate the generalization ability of different well-performing deep learning models. Our methodology demonstrates how signal processing tools can effectively be used to enhance Explainable AI techniques and acts as a template for similar problems.

摘要
深度学习已经成功应用于工业传感器数据分析多个相关的实际应用场景。然而，许多高性能方法的含义性具有大量的难以理解性，妨碍了实际应用。可解释AI（XAI）和特征归因技术承诺可以提供决策过程中模型的含义。然而，简单地应用这些方法并不能提供具体的域专家需要的理解。在这种情况下，我们将注意力集中在检测滚动元件 bearings 的振荡信号中的缺陷。我们提出了一种域专家特有的特征归因框架，可以评估模型下的逻辑与域专家的推理是否匹配。通过这种框架，我们能够验证模型的可靠性和扩展性。我们的方法体现了如何使用信号处理工具来增强可解释AI技术，并为类似问题提供模板。

AutoMix: Automatically Mixing Language Models

paper_url: http://arxiv.org/abs/2310.12963
repo_url: https://github.com/automix-llm/automix
paper_authors: Aman Madaan, Pranjal Aggarwal, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, Pei Zhou, Aditya Gupta, Dheeraj Rajagopal, Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Mausam, Manaal Faruqui
for: 本研究旨在优化大型自然语言处理器（LLM）的计算成本和性能，通过灵活地路由查询到更大的LLM。
methods: 本文提出了一种策略性路由方法，基于一个较小的LLM的输出的相对正确性来选择路由到更大的LLM。该方法包括一种几次检验机制，以及一个元验证器来精度地评估检验结果。
results: 根据实验结果，AutoMix方法在五个上下文推理数据集上超过了现有的基准值，提高了每次成本的增量效果，最高达89%。

Abstract
Large language models (LLMs) are now available in various sizes and configurations from cloud API providers. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring training. Given that verifications can be noisy, we employ a meta verifier in AutoMix to refine the accuracy of these assessments. Our experiments using LLAMA2-13/70B, on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 89%. Our code and data are available at https://github.com/automix-llm/automix.

摘要
大型语言模型（LLM）现在从云API提供商处可以获得多种大小和配置。这种多样性提供了广泛的选择，但是有效地利用这些选择以便最大化computational cost和性能仍然是挑战。在这个工作中，我们介绍AutoMix，一种策略性路由查询到更大的LM，基于更小的LM的输出的近似正确性。AutoMix的核心是几 shot自我验证机制，可以无需训练来估计自己的出力的可靠性。由于验证可能会是噪音的，我们在AutoMix中使用meta验证器来精确化这些评价。我们的实验使用LLAMA2-13/70B在五个上下文推理 dataset上显示，AutoMix超过了已知的基准点，提高了增量的价值随着成本的增加量最多达89%。我们的代码和数据可以在https://github.com/automix-llm/automix上获得。

An Emulator for Fine-Tuning Large Language Models using Small Language Models

paper_url: http://arxiv.org/abs/2310.12962
repo_url: None
paper_authors: Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
for:* The paper aims to decouple the knowledge and skills gained during pre-training and fine-tuning in language models, and to investigate the effect of scaling up these stages on the performance of the models.methods:* The authors propose a novel technique called emulated fine-tuning (EFT) to sample from a distribution that approximates the result of pre-training and fine-tuning at different scales.* EFT is based on an RL-based framework derived from recent developments in learning from human preferences.results:* The authors show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality.* They also demonstrate that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.* Finally, the authors show that LM up-scaling, which involves ensembling small fine-tuned models with large pre-trained models, consistently improves the helpfulness and factuality of instruction-following models in several families of models.

Abstract
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?" Using an RL-based framework derived from recent developments in learning from human preferences, we introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates (or 'emulates') the result of pre-training and fine-tuning at different scales. Our experiments with EFT show that scaling up fine-tuning tends to improve helpfulness, while scaling up pre-training tends to improve factuality. Beyond decoupling scale, we show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models, essentially emulating the result of fine-tuning the large pre-trained model. Up-scaling consistently improves helpfulness and factuality of instruction-following models in the Llama, Llama-2, and Falcon families, without additional hyperparameters or training.

摘要
通用的语言模型（LM）通常由扩大一个两个阶段训练管道建立：一个预训练阶段使用非常大、多样化的文本数据，以及一个精度调整（或准确性调整）阶段使用目标示例或特定的行为要求。尽管曾经 hypothesized 知识和技能来自于预训练，而精度调整主要过滤这些知识和技能集，但这种假设未经过广泛测试。为了解决这问题，我们介绍了一种新的技术，可以分离这两个阶段中学习的知识和技能，从而解决问题，“如果将大型模型的预训练知识与小型模型的精度调整结果相结合（或相反），会发生什么？”使用基于最近的人类偏好学习的RL基础，我们引入了一种名为 Emulated Fine-Tuning（EFT）的原则正确且实用的方法，可以采样一个 approximates （或 '模拟'）预训练和精度调整的结果。我们的实验表明，在不同的缩放比例下，精度调整会提高帮助fulness，而预训练会提高准确性。此外，EFT 还可以在测试时调整竞争性 trait 如帮助fulness 和无害性，无需额外训练。最后，我们还介绍了一种特殊的 Emulated Fine-Tuning 情况，即 LM up-scaling，可以避免高资源占用的精度调整大型预训练模型，通过 ensemble 小型精度调整模型，效果相当于精度调整大型预训练模型。 up-scaling 一致地提高了 instruction-following 模型在 Llama、Llama-2 和 Falcon 家族中的帮助fulness 和准确性，无需额外参数或训练。

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

paper_url: http://arxiv.org/abs/2310.12956
repo_url: None
paper_authors: David T. Hoffmann, Simon Schrodi, Nadine Behrmann, Volker Fischer, Thomas Brox
for: 本研究探讨了transformer在多步决策任务中快速增进损失的现象。
methods: 我们使用了transformer和CNN来研究多步决策任务，并发现transformer在这些任务上遇到了困难，而CNN则没有这个问题。
results: 当transformer学习中间任务时，它们会快速并意外地提高性能，我们称之为“Eureka-oment”。这种快速改进会使模型在训练过程中更快地 дости到最佳性能，并且更容易学习中间任务，从而提高最终准确率和 robustness。

Abstract
In this work, we study rapid, step-wise improvements of the loss in transformers when being confronted with multi-step decision tasks. We found that transformers struggle to learn the intermediate tasks, whereas CNNs have no such issue on the tasks we studied. When transformers learn the intermediate task, they do this rapidly and unexpectedly after both training and validation loss saturated for hundreds of epochs. We call these rapid improvements Eureka-moments, since the transformer appears to suddenly learn a previously incomprehensible task. Similar leaps in performance have become known as Grokking. In contrast to Grokking, for Eureka-moments, both the validation and the training loss saturate before rapidly improving. We trace the problem back to the Softmax function in the self-attention block of transformers and show ways to alleviate the problem. These fixes improve training speed. The improved models reach 95% of the baseline model in just 20% of training steps while having a much higher likelihood to learn the intermediate task, lead to higher final accuracy and are more robust to hyper-parameters.

摘要
在这项研究中，我们研究了transformer在多步决策任务中快速、步进改进损失的现象。我们发现transformer在学习中缺乏intermediate task的能力，而CNN则没有这种问题。当transformer学习intermediate task时，它们会在百余个epoch后， unexpectedly和rapidly提高性能。我们称这些快速改进为“Eureka-moment”，因为transformer似乎在学习一个前不可理解的任务时，突然有了新的理解。相比Grokking，Eureka-moment中，两个loss函数都会先于改进而饱和。我们跟踪问题的起源，并提出了修复方案，这些修复方案可以提高训练速度，并使模型更加快速地学习intermediate task，最终性能高于基线模型，并且更加鲁棒于hyperparameter。

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

paper_url: http://arxiv.org/abs/2310.12955
repo_url: None
paper_authors: Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang
for:* 这个论文旨在研究在线上不需要实际交互的情况下学习强化策略的可行性。methods:* 该论文使用了现有的OFFLINE强化学习算法，并进行了全面的数据损害测试，以评估不同算法对数据损害的影响。results:* 研究发现，使用IQL算法可以具有很好的抗性能，并且通过理论和实验分析，发现IQL的超级vised策略学习方案是关键因素。* 不过，IQL仍然面临着动力损害下Q函数的重 tailed Target问题，为了解决这个问题，该论文提出了一种使用惩罚函数来处理重 tailed Target的方法。* 通过将这些简单 yet effective的修改加入IQL算法，该论文提出了一种更加Robust的OFFLINE强化学习方法，称为Robust IQL（RIQL）。* 广泛的实验表明，RIQL在不同的数据损害情况下具有极高的Robust性能。

Abstract
Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in real-world environments are often noisy and may even be maliciously corrupted, which can significantly degrade the performance of offline RL. In this work, we first investigate the performance of current offline RL algorithms under comprehensive data corruption, including states, actions, rewards, and dynamics. Our extensive experiments reveal that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms. Furthermore, we conduct both empirical and theoretical analyses to understand IQL's robust performance, identifying its supervised policy learning scheme as the key factor. Despite its relative robustness, IQL still suffers from heavy-tail targets of Q functions under dynamics corruption. To tackle this challenge, we draw inspiration from robust statistics to employ the Huber loss to handle the heavy-tailedness and utilize quantile estimators to balance penalization for corrupted data and learning stability. By incorporating these simple yet effective modifications into IQL, we propose a more robust offline RL approach named Robust IQL (RIQL). Extensive experiments demonstrate that RIQL exhibits highly robust performance when subjected to diverse data corruption scenarios.

摘要
<> translate the following text into Simplified Chinese:Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in real-world environments are often noisy and may even be maliciously corrupted, which can significantly degrade the performance of offline RL. In this work, we first investigate the performance of current offline RL algorithms under comprehensive data corruption, including states, actions, rewards, and dynamics. Our extensive experiments reveal that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms. Furthermore, we conduct both empirical and theoretical analyses to understand IQL's robust performance, identifying its supervised policy learning scheme as the key factor. Despite its relative robustness, IQL still suffers from heavy-tail targets of Q functions under dynamics corruption. To tackle this challenge, we draw inspiration from robust statistics to employ the Huber loss to handle the heavy-tailedness and utilize quantile estimators to balance penalization for corrupted data and learning stability. By incorporating these simple yet effective modifications into IQL, we propose a more robust offline RL approach named Robust IQL (RIQL). Extensive experiments demonstrate that RIQL exhibits highly robust performance when subjected to diverse data corruption scenarios.离线 reinforcement learning (RL) 提供了一个有前途的方法，通过从离线数据集中学习奖励策略，而不需要costly或 unsafe 的环境交互。然而，由人类在实际环境收集的数据集经常受到噪音和可能是黑客损害，这可能会很大程度地下降离线 RL 的性能。在这项工作中，我们首先investigate 当前离线 RL 算法在全面数据损害情况下的性能，包括状态、动作、奖励和动力学。我们的广泛实验表明，隐式 Q 学习 (IQL) 在多种离线 RL 算法中表现出了抗损害的特点。此外，我们还进行了empirical 和理论分析，以了解 IQL 的Robust性能，并确定其超visited 策略学习方案为关键因素。尽管 IQL 相对强健，但是它仍然受到动力学损害的 heavy-tailed 目标问题。为解决这个挑战，我们 draw inspiration 从robust statistics 中，采用 Huber 损失函数来处理 heavy-tailedness，并使用量iles estimators来平衡损害数据和学习稳定性。通过将这些简单 yet effective 修改 incorporated into IQL，我们提出了一种更 Robust 的离线 RL 方法，名为 Robust IQL (RIQL)。广泛实验表明，RIQL 在多种数据损害场景下具有高度 Robust 性能。

Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation

paper_url: http://arxiv.org/abs/2310.12953
repo_url: None
paper_authors: Sangho Suh, Meng Chen, Bryan Min, Toby Jia-Jun Li, Haijun Xia
for: 这个研究是为了探讨如何将大语言模型（LLMs）转化为创作工具，以探索它们的创作潜力。
methods: 研究人员使用了一个框架，帮助用户透过结构化的生成方式，让他们可以轻松地探索、评估和 sinthez 许多创作想法。
results: 研究发现，这个框架可以帮助用户寻找更多的创作灵感，并且可以帮助他们快速创造出更多的创作作品。

Abstract
Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas, rather than empowering them to explore the vast latent design space in generative models. To address this limitation, we propose a framework that facilitates the structured generation of design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses. We demonstrate the feasibility and usefulness of this framework through the design and development of an interactive system, Luminate, and a user study with 8 professional writers. Our work advances how we interact with LLMs for creative tasks, introducing a way to harness the creative potential of LLMs.

摘要
Large language models (LLMs) 的生成能力使其成为了创作过程中的不可或缺工具。这些模型可以生成数百或数千个视觉和文本输出，提供了大量的创作灵感。但是，我们现在是否正确地利用其潜在力？我们认为，当前的交互方式有限制，导引用户快速 converges 到一小组IDEAS，而不是让用户在生成模型的庞大latent design space中自由探索。为解决这一限制，我们提出了一个框架，可以让用户轻松探索、评估和synthesize 多种响应。我们通过设计和开发一个交互系统Luminate，以及与8名专业作家的用户研究，证明了这种框架的可行性和实用性。我们的工作推动了如何在创作任务中与LLMs交互，推出了一种可以挖掘LLMs的创作潜力的方法。

The Foundation Model Transparency Index

paper_url: http://arxiv.org/abs/2310.12941
repo_url: https://github.com/stanford-crfm/fmti
paper_authors: Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, Percy Liang
for: The paper aims to assess the transparency of foundation model developers and encourage better governance of the foundation model ecosystem.
methods: The authors use a comprehensive set of 100 indicators to evaluate the transparency of 10 major foundation model developers, including information about the upstream resources used to build the models, details about the models themselves, and downstream use.
results: The authors find that there is a lack of transparency in the foundation model ecosystem, with no developer disclosing significant information about the downstream impact of their flagship models, such as the number of users, affected market sectors, or how users can seek redress for harm. The findings provide a baseline for evaluating progress on transparency and governance in the future.In Simplified Chinese, the three key points would be:
for: 论文目的是评估基础模型开发者的透明度，并促进基础模型生态系统的治理。
methods: 作者使用100个细化指标评估10个主要基础模型开发者的透明度，包括模型建构所用的上游资源、模型自身的详细信息以及下游使用。
results: 作者发现基础模型生态系统中的透明度很低，没有任何开发者公布其旗舰模型的下游影响的重要信息，如用户数量、受影响的市场部门以及如何寻求伤害赔偿等。

Abstract
Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the societal impact of foundation models is growing, transparency is on the decline, mirroring the opacity that has plagued past digital technologies (e.g. social media). Reversing this trend is essential: transparency is a vital precondition for public accountability, scientific innovation, and effective governance. To assess the transparency of the foundation model ecosystem and help improve transparency over time, we introduce the Foundation Model Transparency Index. The Foundation Model Transparency Index specifies 100 fine-grained indicators that comprehensively codify transparency for foundation models, spanning the upstream resources used to build a foundation model (e.g data, labor, compute), details about the model itself (e.g. size, capabilities, risks), and the downstream use (e.g. distribution channels, usage policies, affected geographies). We score 10 major foundation model developers (e.g. OpenAI, Google, Meta) against the 100 indicators to assess their transparency. To facilitate and standardize assessment, we score developers in relation to their practices for their flagship foundation model (e.g. GPT-4 for OpenAI, PaLM 2 for Google, Llama 2 for Meta). We present 10 top-level findings about the foundation model ecosystem: for example, no developer currently discloses significant information about the downstream impact of its flagship model, such as the number of users, affected market sectors, or how users can seek redress for harm. Overall, the Foundation Model Transparency Index establishes the level of transparency today to drive progress on foundation model governance via industry standards and regulatory intervention.

摘要
基础模型在社会中迅速普及，推动了一波基于企业和消费者面向的生成AI应用程序。然而，社会影响力的透明度在下降，与过去的数字技术（如社交媒体）一样，透明度的下降需要逆转。为确保公共负责任、科学创新和有效管理，我们引入基础模型透明度指数。基础模型透明度指数包括100个细化的指标，全面捕捉基础模型的透明度，包括建模时使用的上游资源（如数据、劳动、计算）、模型的详细信息（如大小、功能、风险）和下游使用（如分布渠道、使用政策、受影地域）。我们对10个主要基础模型开发者（如OpenAI、Google、Meta）进行评分，以评估他们的透明度。为了便于和标准化评估，我们对开发者的做法是基于他们的旗舰模型（如GPT-4、PaLM 2、Llama 2）进行评分。我们发现了10个关键的基础模型生态系统发现，例如，目前没有开发者公布其旗舰模型下游的重要信息，如用户数量、受影市场部门和用户如何申诉害。总之，基础模型透明度指数为今天的透明度水平，以便通过行业标准和法规干预加以进步。

Eureka: Human-Level Reward Design via Coding Large Language Models

paper_url: http://arxiv.org/abs/2310.12931
repo_url: https://github.com/eureka-research/Eureka
paper_authors: Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar
For: The paper aims to develop a human-level reward design algorithm powered by large language models (LLMs) to acquire complex skills via reinforcement learning.* Methods: The proposed algorithm, called Eureka, leverages the zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs to perform evolutionary optimization over reward code.* Results: Eureka outperforms human experts on 83% of the tasks in a diverse suite of 29 open-source RL environments, leading to an average normalized improvement of 52%. Additionally, the algorithm is able to learn complex skills such as pen spinning tricks using a simulated Shadow Hand.

Abstract
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.

摘要
大型语言模型（LLM）在序列决策任务中表现出色，但将其用于学习复杂的低级机械操作任务，如灵活的笔旋转，仍然是一个未解决的问题。我们在这个问题上提出了解论，并提出了一种基于LLM的人类水平奖励设计算法，称为Eureka。Eureka利用现代LLM的零容量生成、代码写作和在场进程提高功能，通过对奖励代码进行演化优化，以获得人类水平的技能。无需任务特定的提示或预定的奖励模板，Eureka可以生成高性能的奖励函数，在29个开源RL环境中，包括10种不同的机器人形态，在83%的任务上超过人类专家，平均减少了52%。Eureka的通用性还允许一种 gradient-free 在场学习方法，可以从人类反馈中提取改进奖励函数的质量和安全性，无需模型更新。最后，使用Eureka奖励在CURRICULUM学习中，我们在 simulations 中首次实现了一个名为Shadow Hand的笔旋转技巧，快速旋转笔，灵活地控制笔的运动。

Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

paper_url: http://arxiv.org/abs/2310.13724
repo_url: None
paper_authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
for: 研究人类和机器人之间的协作任务在家庭环境中的模拟平台。
methods: 使用高速的人iform模拟，支持人类和机器人之间的真实交互，以及研究社交导航和社交重新排序两种协作任务。
results: 实现了基于学习的机器人策略的效果，并与人类合作完成任务，以及观察到机器人在协作任务执行过程中的自然行为。

Abstract
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in-depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities.

摘要
我们介绍Habitat 3.0：一个人工智能 simulate human-robot collaborative tasks 的平台。Habitat 3.0在三个维度上提供贡献：1. 准确的人形模拟：解决复杂的材质和形态的模拟挑战，同时保证高速的模拟速度。2. 人类在Loop基础设施：允许人类与模拟机器人进行实际的互动，通过鼠标/键盘或VR界面进行评估机器人策略。3. 合作任务：研究社交导航和社交重新排序两种合作任务。社交导航探索机器人如何在未看过环境中找到和跟随人形模拟器，而社交重新排序则研究机器人和人形模拟器之间的合作，重新排序场景。这些贡献使我们可以深入研究人机器人合作的综合学习和规则基础，并通过人类在Loop评估机器人策略的效果。我们的实验表明，学习机器人策略在与未看过人形模拟器和人类合作时能够高效完成任务，并且在合作过程中出现了规则行为，如机器人让出空间以便人形模拟器完成任务。此外，我们使用人类在Loop工具进行自动评估，发现我们的人机器人合作策略在与真实的人类合作者评估时可以提供相对排名的指示。Habitat 3.0开启了人工智能 simulate 的新领域，我们希望它能够推动人机器人交互能力的新前ier。

Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks

paper_url: http://arxiv.org/abs/2310.12924
repo_url: None
paper_authors: Yagmur Yigit, Bahadir Bal, Aytac Karameseoglu, Trung Q. Duong, Berk Canberk
for: 本研究旨在提供一种适用于互联网服务提供商核心网络的分布式拒绝服务攻击检测机制，以解决现有的检测技术无法处理高度聚合数据流的问题。
methods: 本文提出了一种基于数字双子的智能拒绝服务检测机制，使用在线学习方法来自动更新模型，提高检测精度和效率。文章还提出了一种基于YANG模型和自动特征选择模块来处理核心网络数据。
results: Results show that the proposed solution can successfully detect DDoS attacks and update the feature selection method and learning model with a true classification rate of ninety-seven percent. The proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts.

Abstract
Existing distributed denial of service attack (DDoS) solutions cannot handle highly aggregated data rates; thus, they are unsuitable for Internet service provider (ISP) core networks. This article proposes a digital twin-enabled intelligent DDoS detection mechanism using an online learning method for autonomous systems. Our contributions are three-fold: we first design a DDoS detection architecture based on the digital twin for ISP core networks. We implemented a Yet Another Next Generation (YANG) model and an automated feature selection (AutoFS) module to handle core network data. We used an online learning approach to update the model instantly and efficiently, improve the learning model quickly, and ensure accurate predictions. Finally, we reveal that our proposed solution successfully detects DDoS attacks and updates the feature selection method and learning model with a true classification rate of ninety-seven percent. Our proposed solution can estimate the attack within approximately fifteen minutes after the DDoS attack starts.

摘要
现有的分布式拒绝服务攻击（DDoS）解决方案无法处理高度归一数据流量，因此不适用于互联网服务提供商（ISP）核心网络。本文提出一种基于数字双胞虫的智能DDoS检测机制，使用在线学习方法。我们的贡献包括：1. 基于数字双胞虫的DDoS检测体系设计 для ISP核心网络。2. 实现了基于YANG模型和自动特征选择（AutoFS）模块来处理核心网络数据。3. 使用在线学习方法来升级模型，实时更新学习模型，提高预测精度。我们的解决方案可以在约15分钟内检测DDoS攻击，并且可以随时更新特征选择方法和学习模型，实现true类别率达97%。

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

paper_url: http://arxiv.org/abs/2310.12921
repo_url: None
paper_authors: Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner
for: 使用预训练的视觉语言模型（VLM）作为零次学习的奖励模型（RM），以便使用自然语言来定义任务。
methods: 使用 CLIP 基于的 VLM，通过提供单个文本提示来训练 MuJoCo 人工智能学习复杂任务，无需手动指定奖励函数。
results: 使用 VLM-RMs 训练 MuJoCo 人工智能学习多种复杂任务，包括跪姿、分解和坐法印袋pose，只需提供单个文本提示，并且可以通过提供基线提示来改进性能。

Abstract
Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second ``baseline'' prompt and projecting out parts of the CLIP embedding space irrelevant to distinguish between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.

摘要
强化学习（RL）通常需要手动指定奖励函数，这经常是不可能的或者需要大量的人工反馈，这都是非常昂贵的。我们研究了一种更加样本效率的方法：使用预训练的视觉语言模型（VLM）作为零shot奖励模型（RM），通过自然语言来定义任务。我们提出了一种自然的通用方法，称之为VLM-RM。我们使用CLIP基于VLM来训练一个MuJoCo人型机器人学习复杂任务，无需手动指定奖励函数，例如跪下、坐印、坐法轮等。对于每个任务，我们只提供了一个单 sentence文本提示，描述所需任务的目标，并且尽可能少的提示工程。我们提供了训练过程中的视频在：https://sites.google.com/view/vlm-rm。我们可以通过提供第二个“基准”提示来提高性能，并且可以在CLIP embedding空间中投射出无关于目标和基准的部分，以更好地分辨目标和基准。此外，我们发现VLM-RM失败模式都与当前VLM的能力限制相关，如视觉无法理解空间或环境非常远程。然而，我们发现VLM-RM具有惊人的稳定性，只要VLM具有足够的大小和计算资源，它们就可以成为RL应用中广泛使用的优秀奖励模型。这意味着未来的VLM会变得越来越有用。

Generative Marginalization Models

paper_url: http://arxiv.org/abs/2310.12920
repo_url: https://github.com/princetonlips/mam
paper_authors: Sulin Liu, Peter J. Ramadge, Ryan P. Adams
for: 这篇论文旨在推介一种新的生成模型家族，称为积分模型（MaMs），可以处理高维整数数据。这些模型具有可扩展和灵活的生成模型，以及可追踪的概率函数。
methods: 这篇论文提出了一种基于“积分自适应”的可扩展学习方法，用于学习高维整数数据的生成模型。这种方法可以在energy-based训练中实现可靠的生成模型，并且可以在高维问题上进行任意阶层的生成模型化。
results: 论文通过在多种整数数据集上进行实验，证明了MaMs模型的效果。MaMs模型可以在 maximum likelihood 和 energy-based 训练中对高维整数数据进行高效的生成模型化，并且可以实现数量级快速的 marginal 概率评估。

Abstract
We introduce marginalization models (MaMs), a new family of generative models for high-dimensional discrete data. They offer scalable and flexible generative modeling with tractable likelihoods by explicitly modeling all induced marginal distributions. Marginalization models enable fast evaluation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of methods with exact marginal inference, such as autoregressive models (ARMs). We propose scalable methods for learning the marginals, grounded in the concept of "marginalization self-consistency". Unlike previous methods, MaMs support scalable training of any-order generative models for high-dimensional problems under the setting of energy-based training, where the goal is to match the learned distribution to a given desired probability (specified by an unnormalized (log) probability function such as energy function or reward function). We demonstrate the effectiveness of the proposed model on a variety of discrete data distributions, including binary images, language, physical systems, and molecules, for maximum likelihood and energy-based training settings. MaMs achieve orders of magnitude speedup in evaluating the marginal probabilities on both settings. For energy-based training tasks, MaMs enable any-order generative modeling of high-dimensional problems beyond the capability of previous methods. Code is at https://github.com/PrincetonLIPS/MaM.

摘要
我们介绍 MARGINALIZATION MODELS (MaMs)，一新的一代生成模型，适用于高维数据。它具有可扩展和 flexible 的生成模型，并且可以实现可读的可能性。MaMs 可以快速评估任意阶层的边界概率，这解决了前一代方法的一个主要限制，即需要精确地掌握边界概率。我们提出了可扩展的学习方法，基于 "marginalization self-consistency" 的概念。MaMs 支持可扩展的训练高维问题，以energy-based 训练为例，目的是将学习的分布与givens 的 (log) 可能性函数相匹配。我们在不同的数据分布上进行了各种实验，包括 binary 图像、语言、物理系统和分子，并获得了许多次速度的提升。对于 energy-based 训练任务，MaMs 可以实现高维问题的任何阶层生成模型，超出了前一代方法的能力。请参考 https://github.com/PrincetonLIPS/MaM。

Network-Aware AutoML Framework for Software-Defined Sensor Networks

paper_url: http://arxiv.org/abs/2310.12914
repo_url: None
paper_authors: Emre Horsanali, Yagmur Yigit, Gokhan Secinti, Aytac Karameseoglu, Berk Canberk
for: 本文旨在提出一种基于自适应机器学习（AutoML）的软件定义感知网络中的拒绝服务攻击检测方法，以满足软件定义感知网络和感知网络的安全需求。
methods: 本文使用了一种基于变量负载、不同流量率和检测时间的网络知识检测方法，并选择了适合网络环境的最佳机器学习算法来检测拒绝服务攻击。
results: 本文的实验结果表明，在遭受拒绝服务攻击时，我们的检测方法能够保证网络中的流量 packet仍然在正确的时间内传输，并且可以避免过拟合。

Abstract
As the current detection solutions of distributed denial of service attacks (DDoS) need additional infrastructures to handle high aggregate data rates, they are not suitable for sensor networks or the Internet of Things. Besides, the security architecture of software-defined sensor networks needs to pay attention to the vulnerabilities of both software-defined networks and sensor networks. In this paper, we propose a network-aware automated machine learning (AutoML) framework which detects DDoS attacks in software-defined sensor networks. Our framework selects an ideal machine learning algorithm to detect DDoS attacks in network-constrained environments, using metrics such as variable traffic load, heterogeneous traffic rate, and detection time while preventing over-fitting. Our contributions are two-fold: (i) we first investigate the trade-off between the efficiency of ML algorithms and network/traffic state in the scope of DDoS detection. (ii) we design and implement a software architecture containing open-source network tools, with the deployment of multiple ML algorithms. Lastly, we show that under the denial of service attacks, our framework ensures the traffic packets are still delivered within the network with additional delays.

摘要
Current detection solutions of distributed denial of service attacks (DDoS) require additional infrastructure to handle high aggregate data rates, making them unsuitable for sensor networks or the Internet of Things. Moreover, the security architecture of software-defined sensor networks must address the vulnerabilities of both software-defined networks and sensor networks. In this paper, we propose a network-aware automated machine learning (AutoML) framework that detects DDoS attacks in software-defined sensor networks. Our framework selects the most appropriate machine learning algorithm to detect DDoS attacks in network-constrained environments, taking into account variables such as traffic load, heterogeneous traffic rate, and detection time while preventing overfitting. Our contributions are twofold:1. We investigate the trade-off between the efficiency of ML algorithms and network/traffic state in the context of DDoS detection.2. We design and implement a software architecture that incorporates open-source network tools and deploys multiple ML algorithms.Our framework ensures that traffic packets are still delivered within the network, albeit with additional delays, even under denial of service attacks.

Experimental Narratives: A Comparison of Human Crowdsourced Storytelling and AI Storytelling

paper_url: http://arxiv.org/abs/2310.12902
repo_url: None
paper_authors: Nina Begus
for:This paper aims to investigate cultural artifacts and social biases in storytelling by both humans and generative AI.methods:The study employs fictional prompts as a novel tool for analyzing storytelling, combining behavioral and computational experiments.results:The study finds that AI-generated stories are more progressive in terms of gender roles and sexuality than human-authored texts, but offer less imaginative scenarios and rhetoric. The study also reveals the pervasive presence of the Pygmalion myth in both human and AI storytelling.

Abstract
The paper proposes a framework that combines behavioral and computational experiments employing fictional prompts as a novel tool for investigating cultural artifacts and social biases in storytelling both by humans and generative AI. The study analyzes 250 stories authored by crowdworkers in June 2019 and 80 stories generated by GPT-3.5 and GPT-4 in March 2023 by merging methods from narratology and inferential statistics. Both crowdworkers and large language models responded to identical prompts about creating and falling in love with an artificial human. The proposed experimental paradigm allows a direct comparison between human and LLM-generated storytelling. Responses to the Pygmalionesque prompts confirm the pervasive presence of the Pygmalion myth in the collective imaginary of both humans and large language models. All solicited narratives present a scientific or technological pursuit. The analysis reveals that narratives from GPT-3.5 and particularly GPT-4 are more more progressive in terms of gender roles and sexuality than those written by humans. While AI narratives can occasionally provide innovative plot twists, they offer less imaginative scenarios and rhetoric than human-authored texts. The proposed framework argues that fiction can be used as a window into human and AI-based collective imaginary and social dimensions.

摘要
文章提出一种框架，结合行为和计算实验，使用虚构提问作为一种新的工具，探讨人类和生成AI对文学作品的叙述方式和社会偏见。研究分析了2019年6月由大量志愿者创作的250个故事，以及2023年3月由GPT-3.5和GPT-4生成的80个故事，通过结合 naratology 和推理统计方法进行分析。人类和大语言模型都 responded to identical prompts about creating and falling in love with an artificial human。提出的实验方法允许直接比较人类和大语言模型生成的叙述方式。对 pygmalionesque 提问的答复表明了人类和大语言模型的 коллектив imagine 中 pygmalion 神话的普遍存在。所有的请求故事都涉及科学或技术的追求。分析发现，GPT-3.5 和尤其是 GPT-4 生成的故事在性别角色和性 orientation 方面比人类生成的故事更加进步。虽然 AI 生成的故事可以提供创新的情节和叙述方式，但它们的想象力和修辞比人类创作的文章更差。提出的框架认为， fiction 可以作为人类和 AI 基础 imagine 和社会方面的窗口。

Personalized human mobility prediction for HuMob challenge

paper_url: http://arxiv.org/abs/2310.12900
repo_url: None
paper_authors: Masahiro Suzuki, Shomu Furuta, Yusuke Fukazawa
for: 这个论文是为了提出一种基于个人行为特征的人员流动预测方法，以便更好地预测每个人的行动路径。
methods: 这个论文采用了个性化模型来预测每个人的行动路径，而不是预测整个人群的行动。它采用了日期和时间、活动时间、天数、时间和 POI 访问频率等特征，并通过聚类技术将其他具有相似行为特征的人员的运动方向纳入考虑。机器学习模型是支持向量回归（SVR）。
results: 论文通过线上评估准确性和特征选择和参数调整来评估模型的准确性，并发现这种个性化模型具有较低的计算成本而且可以达到较好的准确性。

Abstract
We explain the methodology used to create the data submitted to HuMob Challenge, a data analysis competition for human mobility prediction. We adopted a personalized model to predict the individual's movement trajectory from their data, instead of predicting from the overall movement, based on the hypothesis that human movement is unique to each person. We devised the features such as the date and time, activity time, days of the week, time of day, and frequency of visits to POI (Point of Interest). As additional features, we incorporated the movement of other individuals with similar behavior patterns through the employment of clustering. The machine learning model we adopted was the Support Vector Regression (SVR). We performed accuracy through offline assessment and carried out feature selection and parameter tuning. Although overall dataset provided consists of 100,000 users trajectory, our method use only 20,000 target users data, and do not need to use other 80,000 data. Despite the personalized model's traditional feature engineering approach, this model yields reasonably good accuracy with lower computational cost.

摘要
我们介绍了在 HuMob Challenge 数据分析比赛中使用的数据创建方法ологи。我们采用了个性化模型来预测个人的运动轨迹，而不是基于整体运动的预测，根据假设人类运动各自独特。我们设计了特征，包括日期和时间、活动时间、天数、时间和访问点索引（POI）的频率。在其他特征上，我们通过聚类来 incorporate 其他有相似行为模式的个体的运动。我们采用的机器学习模型是支持向量回归（SVR）。我们通过线上评估来评估准确性，并进行特征选择和参数调整。虽然总体数据集包含100,000个用户的轨迹，但我们的方法只使用了20,000个目标用户的数据，并不需要使用其他80,000个数据。尽管个性化模型采用传统的特征工程方法，但这种模型仍然可以得到相对较好的准确性，同时计算成本较低。

TwinPot: Digital Twin-assisted Honeypot for Cyber-Secure Smart Seaports

paper_url: http://arxiv.org/abs/2310.12880
repo_url: None
paper_authors: Yagmur Yigit, Omer Kemal Kinaci, Trung Q. Duong, Berk Canberk
for: 这个研究是为了提供一个基于双胞萃取技术的智能港区防护系统，以应对现代港区中的攻击和侵略。methods: 本研究使用了双胞萃取技术和诱掳技术来建立一个具有高真实性的诱掳系统，并与现有的防护系统集成。results: 本研究发现，使用双胞萃取技术和诱掳系统可以实现更高的攻击检测精度和防护性。对于智能港区中的内部和外部攻击，我们的解决方案可以成功地检测和防护系统。

Abstract
The idea of next-generation ports has become more apparent in the last ten years in response to the challenge posed by the rising demand for efficiency and the ever-increasing volume of goods. In this new era of intelligent infrastructure and facilities, it is evident that cyber-security has recently received the most significant attention from the seaport and maritime authorities, and it is a primary concern on the agenda of most ports. Traditional security solutions can be applied to safeguard IoT and Cyber-Physical Systems (CPS) from harmful entities. Nevertheless, security researchers can only watch, examine, and learn about the behaviors of attackers if these solutions operate more transparently. Herein, honeypots are potential solutions since they offer valuable information about the attackers. It can be virtual or physical. Virtual honeypots must be more realistic to entice attackers, necessitating better high-fidelity. To this end, Digital Twin (DT) technology can be employed to increase the complexity and simulation fidelity of the honeypots. Seaports can be attacked from both their existing devices and external devices at the same time. Existing mechanisms are insufficient to detect external attacks; therefore, the current systems cannot handle attacks at the desired level. DT and honeypot technologies can be used together to tackle them. Consequently, we suggest a DT-assisted honeypot, called TwinPot, for external attacks in smart seaports. Moreover, we propose an intelligent attack detection mechanism to handle different attack types using DT for internal attacks. Finally, we build an extensive smart seaport dataset for internal and external attacks using the MANSIM tool and two existing datasets to test the performance of our system. We show that under simultaneous internal and external attacks on the system, our solution successfully detects internal and external attacks.

摘要
“随着近期的高效需求和货物量的增加，次代港口设施在过去十年中得到了更多的关注。在这个新时代的智能基础设施和设备中，cyber安全已经成为港口和海事管理部门的首要课题。传统安全解决方案可以保护互联网物理设备和Cyber-Physical Systems（CPS）免于危险威胁。但是，安全研究人员只能通过观察攻击者的行为来了解攻击者。在这种情况下，诱饵（honeypot）是一个有potential的解决方案，因为它们提供了攻击者的有益信息。诱饵可以是虚拟的或物理的。虚拟诱饵需要更加真实，以吸引攻击者，因此需要更高的高精确度。为此，数位双胞袭（DT）技术可以被应用，以增加诱饵的复杂性和模拟精度。 smart ports 可以受到来自现有设备以及外部设备的同时攻击。现有的机制不足以检测外部攻击，因此现有的系统无法处理攻击到所需的水平。DT和诱饵技术可以被用 вместе，以解决这个问题。此外，我们还提出了一个智能攻击探测机制，用于处理不同类型的攻击。最后，我们建立了一个大量的聪明港口数据集，用于内部和外部攻击的测试。我们显示，在同时受到内部和外部攻击的情况下，我们的解决方案成功地检测了内部和外部攻击。”

Predicting Ovarian Cancer Treatment Response in Histopathology using Hierarchical Vision Transformers and Multiple Instance Learning

paper_url: http://arxiv.org/abs/2310.12866
repo_url: https://github.com/scjjb/hipt_abmil_atec23
paper_authors: Jack Breen, Katie Allen, Kieran Zucker, Geoff Hall, Nishant Ravikumar, Nicolas M. Orsi
for: 这个研究的目的是用深度学习预测抗血管生长药bevacizumab治疗悉尼癌症患者是否能达到至少6个月的痊愈或疾病进展阻止。
methods: 这个研究使用了一种预训练的层次图像变换器（HIPT）和一种注意力基于多个实例学习（ABMIL）模型来提取区域特征和汇集特征，并使用一个内置的权重系统来分类整个扫描片。
results: 这个研究发现，使用HIPT-ABMIL模型可以在282个厚示片图像（WSIs）中预测悉尼癌症患者是否能达到至少6个月的痊愈或疾病进展阻止，并取得了60.2%±2.9%的内部平衡准确率和0.646±0.033的ROC曲线。

Abstract
For many patients, current ovarian cancer treatments offer limited clinical benefit. For some therapies, it is not possible to predict patients' responses, potentially exposing them to the adverse effects of treatment without any therapeutic benefit. As part of the automated prediction of treatment effectiveness in ovarian cancer using histopathological images (ATEC23) challenge, we evaluated the effectiveness of deep learning to predict whether a course of treatment including the antiangiogenic drug bevacizumab could contribute to remission or prevent disease progression for at least 6 months in a set of 282 histopathology whole slide images (WSIs) from 78 ovarian cancer patients. Our approach used a pretrained Hierarchical Image Pyramid Transformer (HIPT) to extract region-level features and an attention-based multiple instance learning (ABMIL) model to aggregate features and classify whole slides. The optimal HIPT-ABMIL model had an internal balanced accuracy of 60.2% +- 2.9% and an AUC of 0.646 +- 0.033. Histopathology-specific model pretraining was found to be beneficial to classification performance, though hierarchical transformers were not, with a ResNet feature extractor achieving similar performance. Due to the dataset being small and highly heterogeneous, performance was variable across 5-fold cross-validation folds, and there were some extreme differences between validation and test set performance within folds. The model did not generalise well to tissue microarrays, with accuracy worse than random chance. It is not yet clear whether ovarian cancer WSIs contain information that can be used to accurately predict treatment response, with further validation using larger, higher-quality datasets required.

摘要
很多病人，现有的卵巢癌治疗方案具有有限的临床效果。一些疗法，无法预测患者的反应，可能会将患者暴露于无效的治疗后果。在自动预测卵巢癌治疗效果使用历史Pathological Images（ATEC23）挑战中，我们评估了深度学习是否可以预测在6个月内remission或疾病进展的可能性，并对78例卵巢癌病人的282个历史Pathology Whole Slide Images（WSIs）进行了分析。我们的方法使用了预训练的层次Image Pyramid Transformer（HIPT）提取区域特征，以及关注基本多实例学习（ABMIL）模型来聚合特征并分类整个扫描图。最佳的HIPT-ABMIL模型具有内部平衡准确率为60.2% ± 2.9%和ROC曲线为0.646 ± 0.033。 histopathology特定的模型预训练被发现对分类性能产生了正面影响，而层次变换器则不是，ResNet特征提取器达到了类似的性能。由于数据集较小且高度多样化，性能在5个批处分验中具有变化性，并且在批处和测试集之间有一些极端的差异。模型无法通过细胞 microarray进行泛化，准确率落后于随机抽样。这表明卵巢癌WSIs中可能没有准确预测治疗效果的信息，需要进一步的验证和 validate larger、higherquality数据集。

Physical Information Neural Networks for Solving High-index Differential-algebraic Equation Systems Based on Radau Methods

paper_url: http://arxiv.org/abs/2310.12846
repo_url: None
paper_authors: Jiasheng Chen, Juan Tang, Ming Yan, Shuai Lai, Kun Liang, Jianguang Lu, Wenqiang Yang
for: 解决大型差分代数方程系统（DAE）中的精度问题
methods: 结合Radau IIA数学方法和神经网络结构，使用注意力机制解决高次DAE
results: 对两个经典高次DAE系统进行数值实验，并证明使用5次Radau IIA方法可以实现最高精度解决方案，其绝对误差低于$10^{-6}$，超过现有文献结果。

Abstract
As is well known, differential algebraic equations (DAEs), which are able to describe dynamic changes and underlying constraints, have been widely applied in engineering fields such as fluid dynamics, multi-body dynamics, mechanical systems and control theory. In practical physical modeling within these domains, the systems often generate high-index DAEs. Classical implicit numerical methods typically result in varying order reduction of numerical accuracy when solving high-index systems.~Recently, the physics-informed neural network (PINN) has gained attention for solving DAE systems. However, it faces challenges like the inability to directly solve high-index systems, lower predictive accuracy, and weaker generalization capabilities. In this paper, we propose a PINN computational framework, combined Radau IIA numerical method with a neural network structure via the attention mechanisms, to directly solve high-index DAEs. Furthermore, we employ a domain decomposition strategy to enhance solution accuracy. We conduct numerical experiments with two classical high-index systems as illustrative examples, investigating how different orders of the Radau IIA method affect the accuracy of neural network solutions. The experimental results demonstrate that the PINN based on a 5th-order Radau IIA method achieves the highest level of system accuracy. Specifically, the absolute errors for all differential variables remains as low as $10^{-6}$, and the absolute errors for algebraic variables is maintained at $10^{-5}$, surpassing the results found in existing literature. Therefore, our method exhibits excellent computational accuracy and strong generalization capabilities, providing a feasible approach for the high-precision solution of larger-scale DAEs with higher indices or challenging high-dimensional partial differential algebraic equation systems.

摘要
为了描述动态变化和下面约束，混合方程（DAEs）在工程领域得到广泛应用，如流体动力学、多体动力学、机械系统和控制理论。在实际物理模拟中，系统经常生成高指数DAEs。经典的假设方法通常会导致解的级数减少，从而降低数值精度。在这篇论文中，我们提议一种基于物理学习神经网络（PINN）的计算框架，结合卷积IIA方法和神经网络结构，以直接解决高指数DAEs。此外，我们采用域划分策略来提高解的准确性。我们在两个经典的高指数系统中进行了数值实验，研究不同的卷积IIA方法的影响于神经网络解的精度。实验结果表明，基于5个卷积IIA方法的PINN算法可以达到系统的最高精度水平。具体地，所有的杂分变量的绝对误差保持在10^-6之下，而部分变量的绝对误差保持在10^-5之下，超过现有文献的结果。因此，我们的方法可以具有出色的计算精度和强大的泛化能力，为更大规模的DAEs或更复杂的部分数学方程系统提供可靠的解决方案。

AgentTuning: Enabling Generalized Agent Abilities for LLMs

paper_url: http://arxiv.org/abs/2310.12823
repo_url: https://github.com/thudm/agenttuning
paper_authors: Aohan Zeng, Mingdao Liu, Rui Lu, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang
for: 提高LLMs的代理能力（agent capabilities），不影响其通用能力（general abilities）。
methods: 使用AgentInstruct dataset和混合 instruciton-tuning策略，升级Llama 2 series为AgentLM。
results: AgentTuning可以提高LLMs的代理能力，而无需妥协其通用能力，AgentLM-70B与GPT-3.5-turbo在未看过任务上的表现相当。

Abstract
Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a lightweight instruction-tuning dataset containing high-quality interaction trajectories. We employ a hybrid instruction-tuning strategy by combining AgentInstruct with open-source instructions from general domains. AgentTuning is used to instruction-tune the Llama 2 series, resulting in AgentLM. Our evaluations show that AgentTuning enables LLMs' agent capabilities without compromising general abilities. The AgentLM-70B is comparable to GPT-3.5-turbo on unseen agent tasks, demonstrating generalized agent capabilities. We open source the AgentInstruct and AgentLM-7B, 13B, and 70B models at https://github.com/THUDM/AgentTuning, serving open and powerful alternatives to commercial LLMs for agent tasks.

摘要
大型语言模型（LLMs）在各种任务中表现出色，但在实际世界中执行复杂任务时，它们远远不如商业模型如ChatGPT和GPT-4。这些代理任务使用LLMs作为中央控制器，负责计划、记忆和工具使用，需要细化的提示方法和Robust LLMs来实现满意性。虽然许多提示方法已经被提出来完成特定任务，但是尚未有关于提高LLMs自身代理能力的研究。在这项工作中，我们提出了AgentTuning方法，用于提高LLMs的代理能力，而不需要妥协其总体能力。我们构建了AgentInstruct数据集，包含高质量交互轨迹，并采用混合指令练级策略，将AgentInstruct与通用领域的开源指令结合使用。我们使用AgentTuning进行指令练级，得到了AgentLM。我们的评估结果表明，AgentTuning可以提高LLMs的代理能力，无需妥协其总体能力。AgentLM-70B与GPT-3.5-turbo在未见agent任务上的表现相当，表明AgentLM具有通用的代理能力。我们将AgentInstruct和AgentLM-7B、13B和70B模型公开发布在GitHub上， serving as open and powerful alternatives to commercial LLMs for agent tasks。

Hybrid Search for Efficient Planning with Completeness Guarantees

paper_url: http://arxiv.org/abs/2310.12819
repo_url: None
paper_authors: Kalle Kujanpää, Joni Pajarinen, Alexander Ilin
for: 解决复杂的规划问题，使用学习基于搜索的方法可以减少计算复杂性，但是这些方法通常缺乏完整性保证，可能无法找到答案。
methods: 我们提出了一种可以确保完整性的高级搜索方法，即完整子目标搜索（complete subgoal search），通过将高级搜索与低级动作结合在一起，实现了高级搜索的实用性和低级搜索的完整性。
results: 我们在复杂的规划问题上应用了我们的完整子目标搜索方法，并证明了该方法可以确保完整性，并且在一些情况下，可以提高高级搜索的性能。这种方法使得可以在系统中应用高级搜索，并且确保完整性。

Abstract
Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.

摘要

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

paper_url: http://arxiv.org/abs/2310.12818
repo_url: None
paper_authors: Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou
for: 提高 parameter-shared PLMs 的推理效率，以适应有限的计算资源和延迟要求。
methods: 基于神经网络 ODEs 的一种简单技术，以及一种简单的预训练技术，可以更加提高推理效率。
results: 对 autoregressive 和 autoencoding PLMs 进行实验，显示了我们的方法的有效性，并提供了新的想法 для更有效地使用 parameter-shared 模型。

Abstract
Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise. However, it is important to note that parameter sharing does not alleviate computational burdens associated with inference, thus impeding its practicality in situations characterized by limited stringent latency requirements or computational resources. Building upon neural ordinary differential equations (ODEs), we introduce a straightforward technique to enhance the inference efficiency of parameter-shared PLMs. Additionally, we propose a simple pre-training technique that leads to fully or partially shared models capable of achieving even greater inference acceleration. The experimental results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs, providing novel insights into more efficient utilization of parameter-shared models in resource-constrained settings.

摘要
parameter-shared预训练语言模型（PLMs）在有限资源环境中成为成功的方法，实现了重要的存储和内存成本减少，而无需妥协性能。然而，需要注意的是，参数共享不会减轻推理过程中的计算压力，因此在具有严格的响应时间要求或计算资源限制的情况下，实际应用中可能存在一定的限制。基于神经ordinary differential equations（ODEs），我们提出了一种简单的技术来提高参数共享PLMs的推理效率。此外，我们还提出了一种简单的预训练技术，可以实现完全或部分共享的模型，以达到更高的推理加速。实验结果表明，我们的方法在autoregressive和autoencodingPLMs中具有显著的效果，为有限资源设置中更有效地使用参数共享模型提供了新的理解。

2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision

paper_url: http://arxiv.org/abs/2310.12817
repo_url: None
paper_authors: Cheng-Kun Yang, Min-Hung Chen, Yung-Yu Chuang, Yen-Yu Lin
for: weakly supervised point cloud segmentation
methods: transformer model with two encoders and one decoder, using self-attention and interlaced 2D-3D cross-attention for implicit feature fusion
results: outperforms existing weakly supervised point cloud segmentation methods by a large margin on S3DIS and ScanNet benchmarksHere is the Chinese translation of the three key information points:
for: 弱Supervised点云分割
methods: 转换器模型，使用自注意和跨视图的2D-3D交叉注意来实现隐式特征融合
results: 在S3DIS和ScanNet benchmark上，与现有弱Supervised点云分割方法相比，表现出了大幅提升

Abstract
We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Research studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Considering the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers. It turns out that the 2D and 3D features are iteratively enriched by each other. Experiments show that it performs favorably against existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/.

摘要
我们提出了一个多模式融合 transformer（MIT），它同时考虑了2D和3D数据 для弱监督点云分类。研究表明了2D和3D特征是点云分类的补充，但现有方法需要额外的2D标注以实现2D-3D信息融合。鉴于点云标注的高成本，有效的2D和3D特征融合基于弱监督学习是非常需要。为此，我们提出了一个 transformer 模型，它包括两个Encoder和一个Decoder，用于弱监督点云分类。具体来说，两个Encoder computes 3D点云和2D多视角图像的自我对应特征，respectively。Decoder 实现了排序2D-3D交叉对话，并执行隐式2D和3D特征融合。我们在Decoder层中 alternate 将查询和键值组替换。发现2D和3D特征在彼此之间轮流浓化。实验结果显示，它在S3DIS和ScanNet参数上与现有弱监督点云分类方法相比，大幅提高了表现。更多信息可以通过查看https://jimmy15923.github.io/mit_web/。

Prompt Injection Attacks and Defenses in LLM-Integrated Applications

paper_url: http://arxiv.org/abs/2310.12815
repo_url: https://github.com/liu00222/open-prompt-injection
paper_authors: Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong
for: 这篇论文旨在bridge在LLM-Integrated Applications中的攻击和防御之间的知识 gap。
methods: 该论文提出了一个通用的攻击框架，可以帮助研究人员更好地理解攻击方法，以及一个防御框架，可以帮助开发者设计有效的防御策略。
results: 该论文通过对10个LLMs和7个任务进行系统评估，发现了许多攻击和防御策略，并提供了一个可用的框架来帮助研究人员进一步发展这个领域。

Abstract
Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.

摘要

Model Merging by Uncertainty-Based Gradient Matching

paper_url: http://arxiv.org/abs/2310.12808
repo_url: None
paper_authors: Nico Daheim, Thomas Möllenhoff, Edoardo Maria Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan
for: 这个论文旨在解释模型训练在不同数据集上的合并方法是如何实现Weighted-averaging，以及这种方法在不同情况下是否会失败。
methods: 该论文使用Weighted-averaging方法将不同数据集上的模型参数进行权重平均，并通过分析梯度匹配度来解释这种方法的成功和失败。
results: 研究发现，Weighted-averaging方法可以在大语言模型和视觉转换器上提供良好的性能和鲁棒性，但在某些情况下可能会失败，这与梯度匹配度的不同有关。

Abstract
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.

摘要
模型训练在不同的数据集上可以通过将其参数进行权重平均合并，但为什么它会工作，而且在哪些情况下可能失败？我们连接权重平均的不准确性与梯度匹配度的差异，并提出一种基于不确定性的方案来改进性能，从而降低差异。这种连接还暴露了其他方案，如平均、任务加法和鱼得权重平均，的隐式假设。我们的新方法在大语言模型和视Transformers上都得到了一致性的改进，以及随着超参数的Robustness。

An effective theory of collective deep learning

paper_url: http://arxiv.org/abs/2310.12802
repo_url: None
paper_authors: Lluís Arola-Fernández, Lucas Lacasa
for: 这种研究的目的是解释人工神经网络系统中集体学习的出现。
methods: 这种模型使用了多种分布式算法，包括对每个神经网络单元的本地学习动态和对单元之间的卷积 Coupling。
results: 研究发现，当集体学习阶段出现时，各个网络单元可以通过私有数据进行训练，并且可以完全泛化到未看到的数据类型。

Abstract
Unraveling the emergence of collective learning in systems of coupled artificial neural networks is an endeavor with broader implications for physics, machine learning, neuroscience and society. Here we introduce a minimal model that condenses several recent decentralized algorithms by considering a competition between two terms: the local learning dynamics in the parameters of each neural network unit, and a diffusive coupling among units that tends to homogenize the parameters of the ensemble. We derive the coarse-grained behavior of our model via an effective theory for linear networks that we show is analogous to a deformed Ginzburg-Landau model with quenched disorder. This framework predicts (depth-dependent) disorder-order-disorder phase transitions in the parameters' solutions that reveal the onset of a collective learning phase, along with a depth-induced delay of the critical point and a robust shape of the microscopic learning path. We validate our theory in realistic ensembles of coupled nonlinear networks trained in the MNIST dataset under privacy constraints. Interestingly, experiments confirm that individual networks -- trained only with private data -- can fully generalize to unseen data classes when the collective learning phase emerges. Our work elucidates the physics of collective learning and contributes to the mechanistic interpretability of deep learning in decentralized settings.

摘要
探讨集成人工神经网络系统中共同学习的出现是一项广泛有关物理学、机器学习、神经科学和社会的研究。我们在这里提出了一个简化的模型，它把几种最近的分布式算法简化为两个参数的竞争：每个神经网络单元的本地学习动力和 Ensemble 中参数的卷积 Coupling。我们通过一种有效的线性网络理论来解释模型的宽泛行为，并证明其与扭曲的加顿兹-兰道模型相似，带有随机噪声。这个框架预测了参数解决方案中的（深度依赖）约束-约束-噪声相变，这些相变表明集成学习阶段的出现，以及深度带来的晚期极点延迟和微型学习路径的稳定形态。我们在使用实际的非线性网络集合和MNIST数据集进行训练的情况下验证了我们的理论。结果表明，当集成学习阶段出现时，各个网络——只使用私有数据进行训练——可以完全通过未看过数据类型进行总结。我们的研究为集成学习的物理学和深度学习在分布式设置中的机制解释做出了贡献。

Exploring Graph Neural Networks for Indian Legal Judgment Prediction

paper_url: http://arxiv.org/abs/2310.12800
repo_url: None
paper_authors: Mann Khatri, Mirza Yusuf, Yaman Kumar, Rajiv Ratn Shah, Ponnurangam Kumaraguru
for: 减轻法律系统受到不公正的判例量压力，提高司法效率和公正性。
methods: 使用图神经网络模型，利用事实证据和前例来预测判例结果，并考虑性别和姓名偏见。
results: 使用XLNet预训练特征得到最佳表现，macro F1分数达75%，链接预测ROC超过80%。

Abstract
The burdensome impact of a skewed judges-to-cases ratio on the judicial system manifests in an overwhelming backlog of pending cases alongside an ongoing influx of new ones. To tackle this issue and expedite the judicial process, the proposition of an automated system capable of suggesting case outcomes based on factual evidence and precedent from past cases gains significance. This research paper centres on developing a graph neural network-based model to address the Legal Judgment Prediction (LJP) problem, recognizing the intrinsic graph structure of judicial cases and making it a binary node classification problem. We explored various embeddings as model features, while nodes such as time nodes and judicial acts were added and pruned to evaluate the model's performance. The study is done while considering the ethical dimension of fairness in these predictions, considering gender and name biases. A link prediction task is also conducted to assess the model's proficiency in anticipating connections between two specified nodes. By harnessing the capabilities of graph neural networks and incorporating fairness analyses, this research aims to contribute insights towards streamlining the adjudication process, enhancing judicial efficiency, and fostering a more equitable legal landscape, ultimately alleviating the strain imposed by mounting case backlogs. Our best-performing model with XLNet pre-trained embeddings as its features gives the macro F1 score of 75% for the LJP task. For link prediction, the same set of features is the best performing giving ROC of more than 80%

摘要
“judicial系统中个偏斜的审判官至案例比例对审判过程带来巨大的负担，这导致了案件库存问题和持续的新案件涌入。为了解决这个问题并加速审判过程，建议一个自动化的系统，可以根据实际证据和过去案例的先例来预测审判结果。这个研究 paper 的目标是发展一个基于图 neural network 的模型，以解决审判预测问题（Legal Judgment Prediction，LJP），并考虑了性别和名称偏见的伦理维护。我们尝试了不同的嵌入 Space 作为模型特征，并在时间节点和司法行为上进行了评估。我们的最佳模型，使用 XLNet 预训嵌入，在 LJP 任务中取得了macro F1 分数为75%。另外，我们还进行了连接预测任务，使用相同的特征集，ROC 的成果超过80%。”Note: Please keep in mind that the translation is done by a machine and may not be as accurate as a human translation. Additionally, the translation may not capture all the nuances and idiomatic expressions of the original text.

Agri-GNN: A Novel Genotypic-Topological Graph Neural Network Framework Built on GraphSAGE for Optimized Yield Prediction

paper_url: http://arxiv.org/abs/2310.13037
repo_url: None
paper_authors: Aditya Gupta, Asheesh Singh
for:* 这篇论文旨在帮助农业领域实现更高的生产力和可持续性，通过将技术融入农业领域。methods:* 本论文提出了一个名为 $\textit{Agri-GNN}$ 的新型 Genotypic-Topological Graph Neural Network 框架，用于捕捉农作物间的细微空间和遗传学交互，以提高农作物的收获预测。results:* $\textit{Agri-GNN}$ 在一个包括植物指标、时间、遗传学资讯和位置资料的广泛数据集上进行了实验，得到了 $R^2 = .876$ 的收获预测准确性，与基准和其他相关研究相比有很大的改善。

Abstract
Agriculture, as the cornerstone of human civilization, constantly seeks to integrate technology for enhanced productivity and sustainability. This paper introduces $\textit{Agri-GNN}$, a novel Genotypic-Topological Graph Neural Network Framework tailored to capture the intricate spatial and genotypic interactions of crops, paving the way for optimized predictions of harvest yields. $\textit{Agri-GNN}$ constructs a Graph $\mathcal{G}$ that considers farming plots as nodes, and then methodically constructs edges between nodes based on spatial and genotypic similarity, allowing for the aggregation of node information through a genotypic-topological filter. Graph Neural Networks (GNN), by design, consider the relationships between data points, enabling them to efficiently model the interconnected agricultural ecosystem. By harnessing the power of GNNs, $\textit{Agri-GNN}$ encapsulates both local and global information from plants, considering their inherent connections based on spatial proximity and shared genotypes, allowing stronger predictions to be made than traditional Machine Learning architectures. $\textit{Agri-GNN}$ is built from the GraphSAGE architecture, because of its optimal calibration with large graphs, like those of farming plots and breeding experiments. $\textit{Agri-GNN}$ experiments, conducted on a comprehensive dataset of vegetation indices, time, genotype information, and location data, demonstrate that $\textit{Agri-GNN}$ achieves an $R^2 = .876$ in yield predictions for farming fields in Iowa. The results show significant improvement over the baselines and other work in the field. $\textit{Agri-GNN}$ represents a blueprint for using advanced graph-based neural architectures to predict crop yield, providing significant improvements over baselines in the field.

摘要
农业，为人类文明的基础，不断寻求技术进步，以提高生产力和可持续性。本文介绍了一种新的种植物种拟合Graph Neural Network（GNN）框架，称为$\textit{Agri-GNN}$，用于捕捉作物的细致空间和种植物互动关系，为产量预测做出优化。$\textit{Agri-GNN}$首先构建一个图 $\mathcal{G}$，其中 Plot 作为节点，然后遍历节点之间的边基于空间和种植物相似性，通过种植物拟合筛选节点信息。GNN 由于其可以准确地模型数据点之间的关系，因此可以高效地模型农业生态系统的连接关系。通过利用 GNN 的能力，$\textit{Agri-GNN}$ 可以捕捉作物的本地和全局信息，考虑作物的空间距离和共享种植物，从而进行更准确的产量预测。$\textit{Agri-GNN}$ 基于 GraphSAGE 框架，因为它可以对大型图进行优化匹配。在一个包括植被指数、时间、种植物信息和位置数据的全面数据集上进行了 $\textit{Agri-GNN}$ 的实验，结果显示 $\textit{Agri-GNN}$ 在农场产量预测中达到了 $R^2 = .876$。结果表明 $\textit{Agri-GNN}$ 在基线和其他相关工作上具有显著改善。 $\textit{Agri-GNN}$ 为使用先进的图基于神经网络预测作物产量提供了蓝本，并且在基eline上显示出了显著的改善。

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning

paper_url: http://arxiv.org/abs/2310.12774
repo_url: https://github.com/cambridgeltl/claps
paper_authors: Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen
for: 本研究旨在探讨黑盒寄存器搜索的效率和用途。
methods: 本研究提出了一种简单的黑盒搜索方法，named ClaPS，它首先对搜索空间进行分组并剔除不重要的搜索token，然后使用简单的搜索方法在剔除后的搜索空间中进行搜索。
results: 根据多个任务和黑盒语言模型的实验结果，ClaPS方法可以在黑盒搜索中实现state-of-the-art表现，同时减少搜索成本。这些结果表明搜索空间设计和优化在黑盒寄存器搜索中扮演了关键的角色。

Abstract
Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.

摘要
In this paper, we first conduct a sensitivity analysis by prompting LLMs, revealing that only a small number of tokens have a disproportionate influence on LLM predictions. Building on this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings highlight the critical role of search space design and optimization in enhancing both the usefulness and efficiency of black-box prompt-based learning.

Safe RLHF: Safe Reinforcement Learning from Human Feedback

paper_url: http://arxiv.org/abs/2310.12773
repo_url: https://github.com/pku-alignment/safe-rlhf
paper_authors: Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
for: 这篇论文旨在提出一种名为Safe Reinforcement Learning from Human Feedback（Safe RLHF）的新算法，用于人类价值调整大语言模型（LLMs）。
methods: Safe RLHF 使用了解释分离问题的方法，将人类偏好的帮助和无害性分开，以避免工作者的混乱，并将两个目标分别训练为奖励和成本模型。
results: 通过三轮精细调整，Safe RLHF 能够优化模型的性能和安全性，比较现有的价值调整算法更好地避免危害性的回应。实验中，我们使用 Safe RLHF 调整了 Alpaca-7B，并与人类偏好相匹配，实现了模型的帮助和无害性。

Abstract
With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. Safe RLHF explicitly decouples human preferences regarding helpfulness and harmlessness, effectively avoiding the crowdworkers' confusion about the tension and allowing us to train separate reward and cost models. We formalize the safety concern of LLMs as an optimization task of maximizing the reward function while satisfying specified cost constraints. Leveraging the Lagrangian method to solve this constrained problem, Safe RLHF dynamically adjusts the balance between the two objectives during fine-tuning. Through a three-round fine-tuning using Safe RLHF, we demonstrate a superior ability to mitigate harmful responses while enhancing model performance compared to existing value-aligned algorithms. Experimentally, we fine-tuned the Alpaca-7B using Safe RLHF and aligned it with collected human preferences, significantly improving its helpfulness and harmlessness according to human evaluations.

摘要
大语言模型（LLM）的发展使得保持人工智能系统的性能和安全性更加重要。然而，帮助和无害的目标之间存在内在的矛盾，在LLM训练中带来了一定的挑战。为解决这个问题，我们提出了安全强化学习从人类反馈（Safe RLHF），一种新的人价值对齐算法。Safe RLHF将人类对帮助和无害的喜好分开显式地避免了协作者的混乱，让我们可以分开训练奖励和成本模型。我们将LLM的安全问题定义为最大化奖励函数的优化问题，满足specified的成本约束。通过拉格朗日方法解决这个受约问题，Safe RLHF在细化过程中动态调整了两个目标之间的平衡。通过三轮细化使用Safe RLHF，我们示出了与现有的值适应算法相比，更好地 mitigate harmful responses while enhancing model performance。实验中，我们使用Safe RLHF细化了Alpaca-7B，并与收集的人类偏好相对应，显著提高了其帮助和无害性。

SemantIC: Semantic Interference Cancellation Towards 6G Wireless Communications

paper_url: http://arxiv.org/abs/2310.12768
repo_url: https://github.com/linwest/SemantIC
paper_authors: Wensheng Lin, Yuna Yan, Lixin Li, Zhu Han, Tad Matsumoto
for: 提高6G无线网络信息质量
methods: 使用语义干扰除法（SemantIC）
results: 无需额外频率资源，可以提高信号干扰抑制性和信息质量

Abstract
This letter proposes a novel anti-interference technique, semantic interference cancellation (SemantIC), for enhancing information quality towards the sixth-generation (6G) wireless networks. SemantIC only requires the receiver to concatenate the channel decoder with a semantic auto-encoder. This constructs a turbo loop which iteratively and alternately eliminates noise in the signal domain and the semantic domain. From the viewpoint of network information theory, the neural network of the semantic auto-encoder stores side information by training, and provides side information in iterative decoding, as an implementation of the Wyner-Ziv theorem. Simulation results verify the performance improvement by SemantIC without extra channel resource cost.

摘要
这封信件提出了一种新的反干扰技术，即语义干扰抑制（SemantIC），用于提高 sixth-generation（6G）无线网络中信息质量。SemantIC只需接收器将通道解码器与语义自动编码器 concatenate 起来，这将construct一个 turbo 循环，通过逐次和 alternately 消除信号域和语义域中的干扰。从网络信息理论的视角来看，语义自动编码器的神经网络在训练中存储了侧信息，并在循环解码中提供了侧信息，实现了万ер- Жи夫定理。实验结果表明SemantIC可以不添加额外通道资源成本下提高性能。

Training binary neural networks without floating point precision

paper_url: http://arxiv.org/abs/2310.19815
repo_url: None
paper_authors: Federico Fontana
for: 提高 binary neural network 训练效率，实现低延迟低能耗网络。
methods: 提出两种解决方案，包括 topology 变化和策略训练，使网络达到接近现状前方性和高效训练。
results: 实现 near state-of-the-art 性能和高效训练，降低训练时间和内存占用量。

Abstract
The main goal of this work is to improve the efficiency of training binary neural networks, which are low latency and low energy networks. The main contribution of this work is the proposal of two solutions comprised of topology changes and strategy training that allow the network to achieve near the state-of-the-art performance and efficient training. The time required for training and the memory required in the process are two factors that contribute to efficient training.

摘要
主要目标是提高Binary神经网络训练效率，这些神经网络具有低延迟和低能耗特性。本工作的主要贡献是提出两种解决方案，包括结构变化和策略训练，使网络达到近状态艺术性能和高效训练。训练时间和训练过程中所需的内存是两个 contribuuting 因素。Note: "Binary neural networks" in the original text is translated as "Binary神经网络" in Simplified Chinese, which is a common way to refer to neural networks with binary weights and activations.

LASER: Linear Compression in Wireless Distributed Optimization

paper_url: http://arxiv.org/abs/2310.13033
repo_url: None
paper_authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
for: 这篇论文主要是为了解决分布式优化中的通信瓶颈问题。
methods: 这篇论文提出了一种名为LASER的线性压缩方法，利用梯度的低级结构来高效地传输杂变通信频道。
results: 实验结果表明，LASER比基eline对比，在计算机视觉和GPT语言模型任务上表现出了Consistent的提升，特别是在噪声通信频道上，可以获得50-64%的提升。

Abstract
Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.

摘要
<> translate text into Simplified ChineseData-parallel SGD 是大规模机器学习中的标准算法，却受到交通瓶颈的困扰。大多数压缩方案可以减轻这个问题，但是它们假设无噪通信链路，或者在实际任务上没有良好的性能。在这篇论文中，我们填补这个空白和引入了LASER：线性压缩在无线分布优化中。LASER 利用梯度的自然低维结构，并将其高效地传输到噪音通信链路上。与传统的SGD 算法相似，LASER 具有类似的理论保证，但是在实际任务上显示出了与基准模型的一致性。特别是在计算机视觉和 GPT 自然语言模型任务上，LASER 表现出了50-64%的改善。在后一个任务中，我们对噪音通信链路上的基准模型进行了50-64%的改善。<>

Neurosymbolic Grounding for Compositional World Models

paper_url: http://arxiv.org/abs/2310.12690
repo_url: None
paper_authors: Atharva Sehgal, Arya Grayeli, Jennifer J. Sun, Swarat Chaudhuri
for: cosm os是一个框架，旨在实现物件中心的世界模型，以进行可compose的通用化（CG），即通过视觉“原子”的合成，实现高性能的未见input scene。
methods: cosm os使用了一种新的征识符学符合grounding，包括：(i) neurosymbolic scene encodings，即每个场景中的实体使用一个真实向量，由神经网络编码器计算，以及一组可组合的符号描述实体的属性; (ii) neurosymbolic attention机制，将这些实体与学习的交互规则绑定。
results: cosm os在一个已知的块组 pushing 领域进行了评估，并实现了一个新的状态剑框架，在可 compose 的通用化方面实现了新的州of-the-art。

Abstract
We introduce Cosmos, a framework for object-centric world modeling that is designed for compositional generalization (CG), i.e., high performance on unseen input scenes obtained through the composition of known visual "atoms." The central insight behind Cosmos is the use of a novel form of neurosymbolic grounding. Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.

摘要
我们介绍 Cosmos，一个基于物体中心的世界模型框架，旨在实现compositional generalization（CG），即在见到的输入场景中表现出高效性。 Cosmos的中心思想是通过一种新的神经 символіック对应。 Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.Here's the translation of the text into Traditional Chinese:我们介绍 Cosmos，一个基于物体中心的世界模型框架，旨在实现compositional generalization（CG），即在见到的输入场景中表现出高效性。 Cosmos的中心思想是通过一种新的神经 символіック对应。 Specifically, the framework introduces two new tools: (i) neurosymbolic scene encodings, which represent each entity in a scene using a real vector computed using a neural encoder, as well as a vector of composable symbols describing attributes of the entity, and (ii) a neurosymbolic attention mechanism that binds these entities to learned rules of interaction. Cosmos is end-to-end differentiable; also, unlike traditional neurosymbolic methods that require representations to be manually mapped to symbols, it computes an entity's symbolic attributes using vision-language foundation models. Through an evaluation that considers two different forms of CG on an established blocks-pushing domain, we show that the framework establishes a new state-of-the-art for CG in world modeling.

Compression of Recurrent Neural Networks using Matrix Factorization

paper_url: http://arxiv.org/abs/2310.12688
repo_url: https://github.com/deathekirl/low-rank-approximation
paper_authors: Lucas Maison, Hélion du Mas des Bourboux, Thomas Courtat
for: 压缩神经网络，以提高实时或嵌入式应用中模型的可扩展性和效率。
methods: 使用低级别预测来因素化模型的矩阵，并通过rank-tuning方法选择不同的级别。同时使用训练改进来实现压缩。
results: 在信号处理任务上，可以对循环神经网络进行14倍的压缩，而且最多产生1.4%的性能下降。

Abstract
Compressing neural networks is a key step when deploying models for real-time or embedded applications. Factorizing the model's matrices using low-rank approximations is a promising method for achieving compression. While it is possible to set the rank before training, this approach is neither flexible nor optimal. In this work, we propose a post-training rank-selection method called Rank-Tuning that selects a different rank for each matrix. Used in combination with training adaptations, our method achieves high compression rates with no or little performance degradation. Our numerical experiments on signal processing tasks show that we can compress recurrent neural networks up to 14x with at most 1.4% relative performance reduction.

摘要
压缩神经网络是部署模型实时或嵌入式应用时的关键步骤。使用低级别应对approximation фактор化模型的矩阵是一种有前途的方法。although it is possible to set the rank before training, this approach is neither flexible nor optimal。在这种工作中，我们提出了一种post-training rank-selection方法，叫做Rank-Tuning，它可以为每个矩阵选择不同的极值。通过与训练改进相结合，我们的方法可以实现高度压缩，而无或很少的性能下降。我们的数字实验表明，可以将回传神经网络压缩到14倍，最多1.4%的相对性能下降。

Quality-Diversity through AI Feedback

paper_url: http://arxiv.org/abs/2310.13032
repo_url: None
paper_authors: Herbie Bradley, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Grégory Schott, Joel Lehman
for: 这 paper 的目的是探讨Quality-Diversity (QD) 搜索算法在文学创作领域的应用。
methods: 这 paper 使用了语言模型 (LM) 来引导搜索，并通过人工选择和淘汰来优化候选文本的质量和多样性。
results: 对比非 QD 控制方法，QDAIF 能够更好地覆盖搜索空间，并生成高质量的创作文本。人工评价也表明，AI 和人类评价之间存在相对的一致。

Abstract
In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.

摘要
在许多文本生成问题中，用户可能会希望不仅得到一个响应，而是一个多样化的高质量输出，从中选择。质量多样性（QD）搜索算法target这些输出，通过不断改进和多样化候选人 population。然而，在艺术创作领域，QD的应用受到了算法specifying measure of quality和多样性的difficulty的限制。幸好，现在的语言模型（LM）的发展使得可以通过人工智能反馈来引导搜索，其中LM被提示在自然语言中评估文本的质量和多样性。基于这一发展，我们介绍了质量多样性通过人工智能反馈（QDAIF），其中演化算法利用LM来生成多样性和评估候选人的质量和多样性。在艺术创作领域进行评估，QDAIF能够更好地覆盖指定的搜索空间，并且输出高质量的样本。此外，人类对QDAIF生成的艺术创作文本进行评估，并与人工智能评估 Display reasonable agreement。我们的结果因此表明了人工智能反馈可以引导开放的搜索，以找到创新和原创的解决方案，这种方法似乎可以普遍应用于多个领域和模式。因此，QDAIF是一步向AI系统独立搜索、多样化、评估和改进，这些技能是人类社会创新的核心。

A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem

paper_url: http://arxiv.org/abs/2310.13031
repo_url: None
paper_authors: Abdullah Can Algan, Emre Yürekli, Aykut Çayır
for: 该论文目的是提高现代搜索引擎的搜索结果相关性，通过用户查询 rewrite 模型来实现。
methods: 该论文提出了一个基于单语言机器翻译模型的查询 rewrite 管道，并对用户查询进行预处理，以创建用户查询和网页标题之间的映射。
results: 该论文通过使用这种查询 rewrite 管道，可以提高搜索结果的相关性和精度。

Abstract
One of the most important challenges for modern search engines is to retrieve relevant web content based on user queries. In order to achieve this challenge, search engines have a module to rewrite user queries. That is why modern web search engines utilize some statistical and neural models used in the natural language processing domain. Statistical machine translation is a well-known NLP method among them. The paper proposes a query rewriting pipeline based on a monolingual machine translation model that learns to rewrite Arabic user search queries. This paper also describes preprocessing steps to create a mapping between user queries and web page titles.

摘要
现代搜索引擎面临着一个非常重要的挑战，即根据用户查询 retrieve relevante web内容。为了解决这个挑战，搜索引擎通常具有一个用于重写用户查询的模块。为了实现这一目标，现代网络搜索引擎通常使用自然语言处理领域的统计学和神经网络模型。统计机器翻译是这些模型中的一个非常知名的方法。本文提出了一个基于单语言机器翻译模型的查询重写管道，该管道可以学习重写阿拉伯语用户搜索查询。本文还描述了为创建用户查询和网页标题之间的映射而进行的预处理步骤。

PSYCHIC: A Neuro-Symbolic Framework for Knowledge Graph Question-Answering Grounding

paper_url: http://arxiv.org/abs/2310.12638
repo_url: None
paper_authors: Hanna Abi Akl
for: 本研究用于回答知识 graphs（KGs）上的问题。
methods: 我们提出了一种 neuralsymbolic（NS）框架，基于EXTRACTIVE QA模型PSYCHIC，可以识别问题和关联的实体。
results: 我们的系统在问题回答任务（KGQA）上取得了0.18%的F1分数，并在实体链接任务（EL）上取得了71.00%的分数。

Abstract
The Scholarly Question Answering over Linked Data (Scholarly QALD) at The International Semantic Web Conference (ISWC) 2023 challenge presents two sub-tasks to tackle question answering (QA) over knowledge graphs (KGs). We answer the KGQA over DBLP (DBLP-QUAD) task by proposing a neuro-symbolic (NS) framework based on PSYCHIC, an extractive QA model capable of identifying the query and entities related to a KG question. Our system achieved a F1 score of 00.18% on question answering and came in third place for entity linking (EL) with a score of 71.00%.

摘要
“学术问答 над连接数据”（Scholarly QALD）在2023年国际semantic Web会议（ISWC）挑战中提出了两个子任务来解决知识图（KG）上的问答（QA）。我们使用一个基于neuro-symbolic（NS）框架，其中PSYCHIC是一种提取式QA模型，可以识别KG问题中的查询和相关的实体。我们的系统在问答任务上达到了0.18%的F1分数，并在实体关联（EL）任务上达到了71.00%的分数。

On existence, uniqueness and scalability of adversarial robustness measures for AI classifiers

paper_url: http://arxiv.org/abs/2310.14421
repo_url: None
paper_authors: Illia Horenko
for: 该论文是为了解决机器学习模型的攻击性和风险性问题，提供了一种可靠的存在和唯一性条件，以及可 analytical 计算方法。
methods: 论文使用了一种基于泛函理论的方法，并提出了一种基于 entropy 的 AI 模型。
results: 论文提出了一种可 analytical 计算的最小敌对路径（MAP）和最小敌对距离（MAD），并在不同类型的 AI 工具（如神经网络、启动随机森林、GLM 和 EAI）上进行了实践计算和比较。另外，论文还解释了如何使用 MAP 提供专门的患者特定风险 Mitigation 方案。

Abstract
Simply-verifiable mathematical conditions for existence, uniqueness and explicit analytical computation of minimal adversarial paths (MAP) and minimal adversarial distances (MAD) for (locally) uniquely-invertible classifiers, for generalized linear models (GLM), and for entropic AI (EAI) are formulated and proven. Practical computation of MAP and MAD, their comparison and interpretations for various classes of AI tools (for neuronal networks, boosted random forests, GLM and EAI) are demonstrated on the common synthetic benchmarks: on a double Swiss roll spiral and its extensions, as well as on the two biomedical data problems (for the health insurance claim predictions, and for the heart attack lethality classification). On biomedical applications it is demonstrated how MAP provides unique minimal patient-specific risk-mitigating interventions in the predefined subsets of accessible control variables.

摘要
<>通过形式化的数学条件，得出最小对抗路径（MAP）和最小对抗距离（MAD）的存在、唯一性和计算方法， для（本地）唯一推导类型（GLM）和生成Entropic AI（EAI）。在各种AI工具（神经网络、提高Random Forest、GLM和EAI）中进行实际计算MAP和MAD，并对它们进行比较和解释。在Double Swiss Roll Spiral和其扩展问题上，以及在两个生物医学数据问题（健康保险laims预测和心血栓病性分类）上进行实践。在生物医学应用中，MAP提供了唯一的最小特定患者风险减轻 intervención。Note: " Simplified Chinese" is also known as "Mandarin Chinese" or "Standard Chinese".Please note that the translation is done using a machine translation tool, and may not be 100% accurate or idiomatic.

Towards a Deep Learning-based Online Quality Prediction System for Welding Processes

paper_url: http://arxiv.org/abs/2310.12632
repo_url: None
paper_authors: Yannik Hahn, Robert Maack, Guido Buchholz, Marion Purrio, Matthias Angerhausen, Hasan Tercan, Tobias Meisen
for: 这个论文旨在提出一种基于深度学习的预测质量系统，用于监控和评估钢材气动焊（GMAW） proces。
methods: 该论文提出了一个包括四个主要阶段的概念：数据收集和管理（如电压和电流）、实时处理和特征工程（使用自适应神经网络）、使用适当的循环深度学习模型进行质量预测，以及在 proces 条件变化时进行模型进化（使用不间断学习）。
results: 该论文未提供实际result，但提出了一种可靠的预测质量系统基于深度学习，可以在非实验室环境中实时监控和评估GMAW proces。

Abstract
The digitization of manufacturing processes enables promising applications for machine learning-assisted quality assurance. A widely used manufacturing process that can strongly benefit from data-driven solutions is gas metal arc welding (GMAW). The welding process is characterized by complex cause-effect relationships between material properties, process conditions and weld quality. In non-laboratory environments with frequently changing process parameters, accurate determination of weld quality by destructive testing is economically unfeasible. Deep learning offers the potential to identify the relationships in available process data and predict the weld quality from process observations. In this paper, we present a concept for a deep learning based predictive quality system in GMAW. At its core, the concept involves a pipeline consisting of four major phases: collection and management of multi-sensor data (e.g. current and voltage), real-time processing and feature engineering of the time series data by means of autoencoders, training and deployment of suitable recurrent deep learning models for quality predictions, and model evolutions under changing process conditions using continual learning. The concept provides the foundation for future research activities in which we will realize an online predictive quality system for running production.

摘要
随着制造过程的数字化，机器学习助成质量监控应用得到了推动。一种广泛使用的制造过程是气密填充焊接（GMAW）。焊接过程具有复杂的原因关系，其中物质性、过程参数和焊接质量之间存在紧密的关系。在实际生产环境中，通过采用破坏性测试准确地确定焊接质量是经济不可行。深度学习可以识别可用的过程数据中的关系，预测焊接质量基于过程观察。在这篇论文中，我们提出了一种基于深度学习的预测质量系统概念。这个概念包括四个主要阶段：收集和管理多感器数据（如电流和电压）、实时处理和特征工程时间序列数据使用自动编码器、训练和部署适合的循环深度学习模型以进行质量预测、以及在过程参数变化时使用连续学习进行模型进化。这个概念提供了未来研究活动的基础，我们将实现在生产中运行的在线预测质量系统。

Heart Disease Detection using Vision-Based Transformer Models from ECG Images

paper_url: http://arxiv.org/abs/2310.12630
repo_url: None
paper_authors: Zeynep Hilal Kilimci, Mustafa Yalcin, Ayhan Kucukmanisa, Amit Kumar Mishra
for: 您的论文旨在检测心血管疾病，使用最新的技术和计算方法。
methods: 您使用了视transformer模型，包括Google-Vit、Microsoft-Beit和Swin-Tiny等。
results: 实验结果显示，您的方法可以准确地检测心血管疾病。

Abstract
Heart disease, also known as cardiovascular disease, is a prevalent and critical medical condition characterized by the impairment of the heart and blood vessels, leading to various complications such as coronary artery disease, heart failure, and myocardial infarction. The timely and accurate detection of heart disease is of paramount importance in clinical practice. Early identification of individuals at risk enables proactive interventions, preventive measures, and personalized treatment strategies to mitigate the progression of the disease and reduce adverse outcomes. In recent years, the field of heart disease detection has witnessed notable advancements due to the integration of sophisticated technologies and computational approaches. These include machine learning algorithms, data mining techniques, and predictive modeling frameworks that leverage vast amounts of clinical and physiological data to improve diagnostic accuracy and risk stratification. In this work, we propose to detect heart disease from ECG images using cutting-edge technologies, namely vision transformer models. These models are Google-Vit, Microsoft-Beit, and Swin-Tiny. To the best of our knowledge, this is the initial endeavor concentrating on the detection of heart diseases through image-based ECG data by employing cuttingedge technologies namely, transformer models. To demonstrate the contribution of the proposed framework, the performance of vision transformer models are compared with state-of-the-art studies. Experiment results show that the proposed framework exhibits remarkable classification results.

摘要
心脏病，也称为心血管疾病，是一种非常普遍和严重的医疗病种， caracterized by impairment of the heart and blood vessels, leading to various complications such as coronary artery disease, heart failure, and myocardial infarction. 时间和准确的检测心脏病非常重要在临床实践中。早期识别患者风险可以实施措施，预防措施和个性化的治疗策略，以降低疾病的进程和不良结果。在过去几年，心脏病检测领域已经经历了显著的进步，它们是通过融合先进技术和计算方法的努力。这些包括机器学习算法、数据挖掘技术和预测模型框架，这些技术可以利用庞大的临床和生理学数据，提高诊断精度和风险分级。在这项工作中，我们提议使用电cardioGRAM（ECG）图像来检测心脏病，使用 cutting-edge 技术，即视Transformer模型。这些模型包括Google-Vit、Microsoft-Beit和Swin-Tiny。到目前为止，这是第一个集中于通过图像基本ECG数据来检测心脏病的研究，使用 cutting-edge 技术，即 transformer 模型。为了证明提案的贡献，我们将比较视Transformer模型的表现与现有研究的最佳成果。实验结果显示，我们的提案框架具有惊人的分类效果。

Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps

paper_url: http://arxiv.org/abs/2310.12616
repo_url: None
paper_authors: Sidi Wu, Yizi Chen, Konrad Schindler, Lorenz Hurni
for: 这 paper 的目的是提出一种基于 U-Net 和 cross-attention transformers 的网络，用于提取历史地图中的信息。
methods: 这 paper 使用了 neural networks 和 U-Net 来处理历史地图，并使用 cross-attention transformers 来捕捉历史地图中的 espacio-temporal 信息。
results: 这 paper 的模型比其他 state-of-the-art 模型更好，可以更好地 segment 历史地图。此外，这 paper 还提出了一种将 spatial 和 temporal 上下文 fusion 到一起的方法，以提高模型的性能。

Abstract
Historical maps provide useful spatio-temporal information on the Earth's surface before modern earth observation techniques came into being. To extract information from maps, neural networks, which gain wide popularity in recent years, have replaced hand-crafted map processing methods and tedious manual labor. However, aleatoric uncertainty, known as data-dependent uncertainty, inherent in the drawing/scanning/fading defects of the original map sheets and inadequate contexts when cropping maps into small tiles considering the memory limits of the training process, challenges the model to make correct predictions. As aleatoric uncertainty cannot be reduced even with more training data collected, we argue that complementary spatio-temporal contexts can be helpful. To achieve this, we propose a U-Net-based network that fuses spatio-temporal features with cross-attention transformers (U-SpaTem), aggregating information at a larger spatial range as well as through a temporal sequence of images. Our model achieves a better performance than other state-or-art models that use either temporal or spatial contexts. Compared with pure vision transformers, our model is more lightweight and effective. To the best of our knowledge, leveraging both spatial and temporal contexts have been rarely explored before in the segmentation task. Even though our application is on segmenting historical maps, we believe that the method can be transferred into other fields with similar problems like temporal sequences of satellite images. Our code is freely accessible at https://github.com/chenyizi086/wu.2023.sigspatial.git.

摘要
历史地图提供了地球表面之前的有用空间-时间信息。为提取信息从地图，人工神经网络在最近几年中得到了广泛的应用，取代了手工地图处理方法和繁琐的手动劳动。然而，抽象不确定性（数据依赖的不确定性），包括地图描述/扫描/褪色缺陷和缺乏合适的上下文，使模型预测 incorrect。由于这种抽象不确定性不能减少，我们认为可以利用 complementary 空间-时间上下文。为此，我们提出了基于 U-Net 网络的混合空间-时间特征（U-SpaTem），将空间-时间特征混合在一起，并通过带有权重的混合层来进行权重融合。我们的模型在与其他状态 искусственного神经网络（ANN）比较时表现更好，而且相比于纯视觉变换器，我们的模型更轻量级、有效。我们认为利用空间和时间上下文的组合是一项 rarely explored 的方法，即使在地图分割任务中。虽然我们的应用是在历史地图分割中，但我们认为这种方法可以被应用到其他具有相似问题的领域，如卫星图像序列中的时间序列分割。我们的代码可以免费在 GitHub 上获取：https://github.com/chenyizi086/wu.2023.sigspatial.git。

Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series

paper_url: http://arxiv.org/abs/2310.13029
repo_url: https://github.com/IoannisNasios/M5_Uncertainty_3rd_place
paper_authors: Ioannis Nasios, Konstantinos Vogklis
for: 这 paper targets 点和 probabilistic 预测问题，并提出了一种 combining 机器学习模型的方法ology，包括 gradient boosted trees 和 neural networks 家族。这些原则在 reciently M5 竞赛中得到了成功应用。
methods: 方法包括将任务转换为销售量的回归问题，rich feature engineering，构建多种 state-of-the-art 机器学习模型，并且精心选择验证集进行模型调整。
results: 结果显示，多样性的机器学习模型和精心选择的验证示例是效果最好的关键因素。尽管预测数据具有自然的层次结构（12 级），但我们的提posed 解决方案并未利用这种层次结构。使用提posed 方法，我们的团队在 both Accuracy 和 Uncertainty track 中获得了金牌奖。

Abstract
In this paper we tackle the problem of point and probabilistic forecasting by describing a blending methodology of machine learning models that belong to gradient boosted trees and neural networks families. These principles were successfully applied in the recent M5 Competition on both Accuracy and Uncertainty tracks. The keypoints of our methodology are: a) transform the task to regression on sales for a single day b) information rich feature engineering c) create a diverse set of state-of-the-art machine learning models and d) carefully construct validation sets for model tuning. We argue that the diversity of the machine learning models along with the careful selection of validation examples, where the most important ingredients for the effectiveness of our approach. Although forecasting data had an inherent hierarchy structure (12 levels), none of our proposed solutions exploited that hierarchical scheme. Using the proposed methodology, our team was ranked within the gold medal range in both Accuracy and the Uncertainty track. Inference code along with already trained models are available at https://github.com/IoannisNasios/M5_Uncertainty_3rd_place

摘要
在这篇论文中，我们解决了点预测和概率预测问题，描述了一种将机器学习模型组合成 gradient boosted trees 和神经网络家族的方法。这些原则在最近的 M5 竞赛中得到了成功应用，在精度和不确定性轨道上均获得了金牌。我们的方法的关键特点包括：a) 将任务转化为销售单日的回归问题b) rich feature engineeringc) 创建多种现有最佳机器学习模型d) 仔细构建验证集 для模型调整我们认为多样化的机器学习模型以及仔细选择的验证示例是我们方法的关键元素。尽管预测数据有自然的层次结构（12级），但我们所提出的解决方案并未利用该层次结构。使用我们提议的方法，我们的团队在精度和不确定性轨道上均获得了金牌。可以在 https://github.com/IoannisNasios/M5_Uncertainty_3rd_place 获取预测代码以及已训练模型。

Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model

paper_url: http://arxiv.org/abs/2310.12611
repo_url: https://github.com/iabhijith/bias-causal-analysis
paper_authors: Abhijith Chintam, Rahel Beloch, Willem Zuidema, Michael Hanna, Oskar van der Wal
for: 本研究旨在探讨如何使Language Model（LM）减少各种不良偏见，包括性别偏见。
methods: 本研究使用三种方法来识别Language Model组件之间的 causal 关系： causal mediation analysis、自动化电路发现和我们的新方法DiffMask+基于差异掩码。
results: 我们应用这些方法于GPT-2小和性别偏见问题，并使用发现的组件集来进行参数有效地 fine-tuning，以减少性别偏见而不损害通用语言模型表现。

Abstract
Language models (LMs) exhibit and amplify many types of undesirable biases learned from the training data, including gender bias. However, we lack tools for effectively and efficiently changing this behavior without hurting general language modeling performance. In this paper, we study three methods for identifying causal relations between LM components and particular output: causal mediation analysis, automated circuit discovery and our novel, efficient method called DiffMask+ based on differential masking. We apply the methods to GPT-2 small and the problem of gender bias, and use the discovered sets of components to perform parameter-efficient fine-tuning for bias mitigation. Our results show significant overlap in the identified components (despite huge differences in the computational requirements of the methods) as well as success in mitigating gender bias, with less damage to general language modeling compared to full model fine-tuning. However, our work also underscores the difficulty of defining and measuring bias, and the sensitivity of causal discovery procedures to dataset choice. We hope our work can contribute to more attention for dataset development, and lead to more effective mitigation strategies for other types of bias.

摘要

Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning

paper_url: http://arxiv.org/abs/2310.12609
repo_url: None
paper_authors: Junwoo Chang, Hyunwoo Ryu, Jiwoo Kim, Soochul Yoo, Joohwan Seo, Nikhil Prakash, Jongeun Choi, Roberto Horowitz
for: 这个论文是为了解决 robotics 中的问题，特别是在对于复杂的环境下实现机器人的运动和目标追寻。
methods: 这个方法使用了一种新的碰撞避免的扩散函数进行训练，并在推断时同时生成可到达的目标和避免障碍的动作计划。
results: 试验结果显示，这个方法在多模式环境中表现更加稳定和有效，能够实现机器人对于可到达的目标的追寻，并避免障碍物的碰撞。

Abstract
Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance.

摘要
Diffusion 模型在机器人学中得到了广泛应用，因为它们具有灵活性和多模式性。虽然一些这些方法可以有效地解决复杂的问题，但它们经常依赖于运行时检测障碍物并需要额外设备。为解决这些挑战，我们提出了一种方法，即在运行时同时生成可达的目标和避免障碍的动作计划，从单一的视觉输入中进行训练。我们的方法的核心是使用避免碰撞的扩散核心进行训练。经过对行为假写和古典扩散模型的评估，我们的框架在多模式环境中表现特别有效，能够寻求目标并避免不可达的目标，同时确保碰撞避免。

Time-Aware Representation Learning for Time-Sensitive Question Answering

paper_url: http://arxiv.org/abs/2310.12585
repo_url: https://github.com/sonjbin/tcqa
paper_authors: Jungbin Son, Alice Oh
for: 提高语言模型理解时间关系和数字之间的关系，以解决现实世界问答问题中的时间因素。
methods: 提议一种时间上下文相互作用Question Answering（TCQA）框架，包括时间上下文依赖的Span Extraction（TCSE）任务，以及基于时间上下文的数据生成框架。
results: 对比基eline模型，TCQA模型在TimeQA数据集上的表现提高了8.5个F1得分。

Abstract
Time is one of the crucial factors in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task, and build a time-context dependent data generation framework for model training. Moreover, we present a metric to evaluate the time awareness of the QA model using TCSE. The TCSE task consists of a question and four sentence candidates classified as correct or incorrect based on time and context. The model is trained to extract the answer span from the sentence that is both correct in time and context. The model trained with TCQA outperforms baseline models up to 8.5 of the F1-score in the TimeQA dataset. Our dataset and code are available at https://github.com/sonjbin/TCQA

摘要
时间是现实世界问答（QA）问题中的一个关键因素。然而，语言模型很难理解时间specifier，如'after'和'before'，以及数字之间的关系，因为现有的QA数据集没有充足的时间表达。为解决这个问题，我们提议一种时间上下文意识Question Answering（TCQA）框架。我们建议一种基于时间上下文的Span抽取任务（TCSE），并为模型训练建立了时间上下文依赖的数据生成框架。此外，我们还提出了一种用于评估时间意识的QA模型的度量，该度量基于TCSE任务。TCSE任务包括一个问题和四个句子选择器，每个选择器根据时间和上下文被分为正确或错误。模型需要从正确的时间和上下文中提取答案Span。与基eline模型相比，使用TCQA训练的模型在TimeQA数据集上的F1得分提高了8.5。我们的数据集和代码可以在https://github.com/sonjbin/TCQA上获取。

Pretraining Language Models with Text-Attributed Heterogeneous Graphs

paper_url: http://arxiv.org/abs/2310.12580
repo_url: https://github.com/hope-rita/thlm
paper_authors: Tao Zou, Le Yu, Yifei Huang, Leilei Sun, Bowen Du
for: 本文旨在提高语言模型（LM）的预训练方法，以更好地捕捉文本各种关系网络中的 topological 和异质信息。
methods: 本文提出了一种新的预训练框架，其包括定义 context graph 和一种 topology-aware 预训练任务，以及一种基于文本增强策略来处理文本不平衡问题。
results: 实验结果表明，本文的方法在三个不同领域的 datasets 上的链接预测和节点分类任务中具有明显的优势，并且每一个设计的合理性。代码可以在 https://github.com/Hope-Rita/THLM 中找到。

Abstract
In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs). Current pretraining tasks for Language Models (LMs) primarily focus on separately learning the textual information of each entity and overlook the crucial aspect of capturing topological connections among entities in TAHGs. In this paper, we present a new pretraining framework for LMs that explicitly considers the topological and heterogeneous information in TAHGs. Firstly, we define a context graph as neighborhoods of a target node within specific orders and propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network. Secondly, based on the observation that some nodes are text-rich while others have little text, we devise a text augmentation strategy to enrich textless nodes with their neighbors' texts for handling the imbalance issue. We conduct link prediction and node classification tasks on three datasets from various domains. Experimental results demonstrate the superiority of our approach over existing methods and the rationality of each design. Our code is available at https://github.com/Hope-Rita/THLM.

摘要
在许多实际场景（如学术网络、社交平台），不同类型的实体不仅与文本相关，还之间存在多种关系，可以抽象为文本拥有hetogeneous图（TAHG）。现有的语言模型（LM）预训练任务主要关注每个实体的文本信息，忽略了捕捉TAHG中实体之间的 topological和多样化信息的重要性。在这篇论文中，我们提出了一种新的预训练框架 дляLM，其中明确考虑TAHG中实体之间的topological和多样化信息。首先，我们定义了一个上下文图，即target节点的邻居在特定顺序中的 neighborhood，并提出了一种 topology-aware预训练任务，用于预测target节点的上下文图中的节点。其次，根据发现一些节点有多少文本信息，而其他节点几乎没有文本信息的观察，我们提出了一种文本扩充策略，用于让文本缺乏节点通过与其他节点的文本信息进行扩充。我们在三个不同领域的 dataset上进行了链接预测和节点分类任务。实验结果表明，我们的方法比现有方法更有优势，并且每一个设计的合理性。我们的代码可以在https://github.com/Hope-Rita/THLM中找到。

Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

paper_url: http://arxiv.org/abs/2310.12567
repo_url: https://github.com/PKU-Alignment/safety-gymnasium
paper_authors: Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang
for: 本研究旨在提供一个环境套件和一个安全政策优化算法库，以便在安全敏感 scenarios 中实现强化学习的应用。
methods: 本研究使用了 Safety-Gymnasium 环境套件和 Safe Policy Optimization (SafePO) 算法库，提供了多种安全强化学习算法供比较和评估。
results: 本研究将提供一个全面的安全性评估工具，以促进强化学习在安全敏感 scenarios 中的应用。

Abstract
Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns. Safe reinforcement learning (SafeRL) emerges as a solution to optimize policies while simultaneously adhering to multiple constraints, thereby addressing the challenge of integrating reinforcement learning in safety-critical scenarios. In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. Additionally, we offer a library of algorithms named Safe Policy Optimization (SafePO), comprising 16 state-of-the-art SafeRL algorithms. This comprehensive library can serve as a validation tool for the research community. By introducing this benchmark, we aim to facilitate the evaluation and comparison of safety performance, thus fostering the development of reinforcement learning for safer, more reliable, and responsible real-world applications. The website of this project can be accessed at https://sites.google.com/view/safety-gymnasium.

摘要
人工智能（AI）系统具有潜在的社会进步 potential。然而，它们的部署经常遇到安全问题的阻碍。安全强化学习（SafeRL）作为一种解决方案，可以同时优化策略并遵循多个约束，以解决在安全关键场景中应用强化学习的挑战。在这篇论文中，我们介绍了一个名为安全健身房（Safety-Gymnasium）的环境集合，包括单机和多机场景下的安全关键任务，接受 вектор和视觉输入。此外，我们还提供了一个名为安全策略优化（SafePO）的库，包含16种当前最佳的SafeRL算法。这个全面的库可以作为一个验证工具，以便研究人员对这些算法进行评估和比较。通过推出这个标准，我们希望能够促进强化学习在实际应用中的安全性、可靠性和责任性的发展。相关项目的网站地址为。

DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text

paper_url: http://arxiv.org/abs/2310.12557
repo_url: https://github.com/syon-li/depwignn
paper_authors: Shuaiyi Li, Yang Deng, Wai Lam
for: 本研究旨在提高图像中的空间理解能力，以便在各种应用场景中进行更好的决策。
methods: 我们提出了一种新的深度智能图 neural network（DepWiGNN），其特点在于在图像中采用深度维度进行信息汇集，以避免过拟合问题。
results: 实验结果表明，Compared with现有的空间理解方法，DepWiGNN在两个复杂的多趟空间理解数据集上表现出色，并且在捕捉长距离依赖关系方面具有优势。

Abstract
Spatial reasoning in text plays a crucial role in various real-world applications. Existing approaches for spatial reasoning typically infer spatial relations from pure text, which overlook the gap between natural language and symbolic structures. Graph neural networks (GNNs) have showcased exceptional proficiency in inducing and aggregating symbolic structures. However, classical GNNs face challenges in handling multi-hop spatial reasoning due to the over-smoothing issue, \textit{i.e.}, the performance decreases substantially as the number of graph layers increases. To cope with these challenges, we propose a novel \textbf{Dep}th-\textbf{Wi}se \textbf{G}raph \textbf{N}eural \textbf{N}etwork (\textbf{DepWiGNN}). Specifically, we design a novel node memory scheme and aggregate the information over the depth dimension instead of the breadth dimension of the graph, which empowers the ability to collect long dependencies without stacking multiple layers. Experimental results on two challenging multi-hop spatial reasoning datasets show that DepWiGNN outperforms existing spatial reasoning methods. The comparisons with the other three GNNs further demonstrate its superiority in capturing long dependency in the graph.

摘要
文本中的空间理解在各种实际应用中发挥关键作用。现有的空间理解方法通常从纯文本中推断空间关系，忽略自然语言和符号结构之间的差异。图 neural network（GNN）已经表现出了exceptional的能力 inducting和聚合符号结构。然而，经典GNN难以处理多跳空间理解，因为over-smoothing问题，即在图层数增加时性能减退较大。为了解决这些挑战，我们提出了一种新的Depth-Wise Graph Neural Network（DepWiGNN）。具体来说，我们设计了一种新的节点储存方案，并在图的深度维度上聚合信息，而不是在图的宽度维度上，这使得我们可以不堆叠多层来收集长距离依赖关系。实验结果表明，DepWiGNN在两个复杂的多跳空间理解 dataset 上表现出了比较出色的表现，并且与其他三种 GNN 进行比较，具体来说，DepWiGNN 能够更好地捕捉图中的长距离依赖关系。

Large Language Model for Multi-objective Evolutionary Optimization

paper_url: http://arxiv.org/abs/2310.12541
repo_url: None
paper_authors: Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
for: 解决多目标优化问题 (Multi-Objective Optimization Problems, MOPs)
methods: 使用语言模型 (Large Language Model, LLM) 设计多目标演化算法 (Multi-Objective Evolutionary Algorithm, MOEA) Operator
results: 在不同测试 benchmark 上实现竞争性的表现，并且Operator 只需学习从一些实例就能够在未经见过的问题上有 robust 的一致性表现。

Abstract
Multiobjective evolutionary algorithms (MOEAs) are major methods for solving multiobjective optimization problems (MOPs). Many MOEAs have been proposed in the past decades, of which the search operators need a carefully handcrafted design with domain knowledge. Recently, some attempts have been made to replace the manually designed operators in MOEAs with learning-based operators (e.g., neural network models). However, much effort is still required for designing and training such models, and the learned operators might not generalize well on new problems. To tackle the above challenges, this work investigates a novel approach that leverages the powerful large language model (LLM) to design MOEA operators. With proper prompt engineering, we successfully let a general LLM serve as a black-box search operator for decomposition-based MOEA (MOEA/D) in a zero-shot manner. In addition, by learning from the LLM behavior, we further design an explicit white-box operator with randomness and propose a new version of decomposition-based MOEA, termed MOEA/D-LO. Experimental studies on different test benchmarks show that our proposed method can achieve competitive performance with widely used MOEAs. It is also promising to see the operator only learned from a few instances can have robust generalization performance on unseen problems with quite different patterns and settings. The results reveal the potential benefits of using pre-trained LLMs in the design of MOEAs.

摘要
多目标演化算法（MOEA）是多目标优化问题（MOP）的主要解决方法。过去几十年内，许多MOEA的搜索运算需要手动设计，需要域知识。现在，一些尝试将MOEA中的手动设计 replaced with学习基于神经网络模型（e.g., neural network models）。然而，设计和训练这些模型需要很多努力，并且学习的运算可能无法在新的问题上generalize well。为了解决以上挑战，本研究提出了一种新的方法，利用强大的大语言模型（LLM）来设计MOEA运算。通过适当的提示工程，我们成功地让一个通用的LLM服务器为MOEA/D中的黑盒搜索运算。此外，通过学习LLM的行为，我们进一步设计了一个显式的白盒运算，并提出了一新的MOEA/D版本，称为MOEA/D-LO。实验研究在不同的测试准则上表明，我们的提posed方法可以与常用的MOEA具有竞争性的性能。此外，我们发现只需从一些实例学习，Operator就可以在未看过问题上display robust generalization性。结果表明，使用预训练的LLM可以在MOEA的设计中带来潜在的优势。

Reliable Academic Conference Question Answering: A Study Based on Large Language Model

paper_url: http://arxiv.org/abs/2310.13028
repo_url: None
paper_authors: Zhiwei Huang, Long Jin, Junjie Wang, Mingchen Tu, Yin Hua, Zhiqiang Liu, Jiawei Meng, Huajun Chen, Wen Zhang
for: 这研究的目的是提出一种基于大语言模型的学术会议问答系统，以便快速地回答研究人员关于学术会议的各种问题。
methods: 这种系统使用了一种组合自动和手动的方法，首先将学术会议数据组织成一种半结构化JSON格式，然后为每个会议注释了约100个问题答对。每个对被分为四个维度，并且手动注释了每个答案的来源。
results: 该研究表明，通过采用结构意识的检索方法，可以增强大语言模型的问答能力，并且对学术会议问答 task 进行了验证。

Abstract
The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest advancements. The information of conferences is usually published on their official website, organized in a semi-structured way with a lot of text. To address this need, we have developed the ConferenceQA dataset for 7 diverse academic conferences with human annotations. Firstly, we employ a combination of manual and automated methods to organize academic conference data in a semi-structured JSON format. Subsequently, we annotate nearly 100 question-answer pairs for each conference. Each pair is classified into four different dimensions. To ensure the reliability of the data, we manually annotate the source of each answer. In light of recent advancements, Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks. They have demonstrated impressive capabilities in information-seeking question answering after instruction fine-tuning, and as such, we present our conference QA study based on LLM. Due to hallucination and outdated knowledge of LLMs, we adopt retrieval based methods to enhance LLMs' question-answering abilities. We have proposed a structure-aware retrieval method, specifically designed to leverage inherent structural information during the retrieval process. Empirical validation on the ConferenceQA dataset has demonstrated the effectiveness of this method. The dataset and code are readily accessible on https://github.com/zjukg/ConferenceQA.

摘要
computer科学的快速发展导致学术会议的研究成果激增，促进了全球学术交流。研究人员通常需要准确、实时的信息关于这些活动，这些数据涌入使得需要一个智能问答系统来有效地回答研究人员的问题，以确保对最新的发展进行了了解。会议信息通常发布在官方网站上，排序方式半结构化，具有大量文本。为了解决这个需求，我们开发了7个不同的学术会议的会议QA数据集，并进行了人工和自动方法来组织学术会议数据。然后，我们为每个会议annotated约100个问题答对。每对问题答被分类为四个维度。为保证数据的可靠性，我们手动标注每个答案的来源。鉴于最近的进步，大型自然语言模型（LLMs）在多种自然语言处理任务中表现出色。它们在带有指导的信息时进行问答任务也表现出色。因此，我们基于LLM进行会议QA研究。由于LLM的幻觉和过时知识，我们采用检索方法来增强LLM的问答能力。我们提出了结构意识检索方法，专门利用检索过程中的结构信息。对ConferenceQA数据集的验证表明了这种方法的有效性。数据集和代码可以在GitHub上获得：https://github.com/zjukg/ConferenceQA。

Be Bayesian by Attachments to Catch More Uncertainty

paper_url: http://arxiv.org/abs/2310.13027
repo_url: None
paper_authors: Shiyu Shen, Bin Pan, Tianyang Shi, Tao Li, Zhenwei Shi
for: 这个论文旨在提出一种新的泛矩条件网络（ABNN），以捕捉更多的不确定性从对外数据（OOD）中。
methods: 这个论文使用了一个附加结构（ABNN），它包含一个期望模组和几个分布模组。期望模组是一个深度网络，它主要关注原始任务，而分布模组则是一些小型的bayesian结构，它们serve as ABNN的附加部分，以捕捉ID和OOD数据中的不确定性。
results: 这个论文提出了一个 theoretically sound的ABNN模型，并进行了实验验证，与一些现有的不确定性估计方法进行比较，结果显示ABNN具有较高的uncertainty估计精度。

Abstract
Bayesian Neural Networks (BNNs) have become one of the promising approaches for uncertainty estimation due to the solid theorical foundations. However, the performance of BNNs is affected by the ability of catching uncertainty. Instead of only seeking the distribution of neural network weights by in-distribution (ID) data, in this paper, we propose a new Bayesian Neural Network with an Attached structure (ABNN) to catch more uncertainty from out-of-distribution (OOD) data. We first construct a mathematical description for the uncertainty of OOD data according to the prior distribution, and then develop an attached Bayesian structure to integrate the uncertainty of OOD data into the backbone network. ABNN is composed of an expectation module and several distribution modules. The expectation module is a backbone deep network which focuses on the original task, and the distribution modules are mini Bayesian structures which serve as attachments of the backbone. In particular, the distribution modules aim at extracting the uncertainty from both ID and OOD data. We further provide theoretical analysis for the convergence of ABNN, and experimentally validate its superiority by comparing with some state-of-the-art uncertainty estimation methods Code will be made available.

摘要
权 bayesian neural networks (BNNs) 已成为一种有前途的方法 для uncertainty estimation，它的理论基础非常坚固。然而，BNNs 的性能受到捕捉 uncertainty 的能力的限制。在这篇论文中，我们提议一种新的 Bayesian Neural Network with Attached structure (ABNN)，可以更好地捕捉 OOD 数据中的 uncertainty。我们首先构造了 OOD 数据中 uncertainty 的数学描述，然后开发了一种附加结构来将 OOD 数据中的 uncertainty 集成到主要网络中。ABNN 由一个期望模块和多个分布模块组成。期望模块是一个深度网络，主要关注原始任务，而分布模块则是一些附加的 Bayesian 结构，用于提取 ID 和 OOD 数据中的 uncertainty。我们进一步提供了 ABNN 的理论分析，并通过与一些现有的 uncertainty estimation 方法进行比较，证明 ABNN 的优越性。代码将会公开。

Testing the Consistency of Performance Scores Reported for Binary Classification Problems

paper_url: http://arxiv.org/abs/2310.12527
repo_url: https://github.com/gykovacs/mlscorecheck
paper_authors: Attila Fazekas, György Kovács
for: 本研究旨在提高机器学习中的二分类任务评估方法的可靠性，以及检测报告性能指标的可靠性。
methods: 本研究使用数值方法来检测报告性能指标的可靠性，而不是基于统计学推断。
results: 通过三个医学应用例程，研究人员可以使用提出的方法检测报告性能指标的不一致，以保护科学领域的 integriy。

Abstract
Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and rank classification techniques based on performance metrics such as accuracy, sensitivity, and specificity. However, reported performance scores may not always serve as a reliable basis for research ranking. This can be attributed to undisclosed or unconventional practices related to cross-validation, typographical errors, and other factors. In a given experimental setup, with a specific number of positive and negative test items, most performance scores can assume specific, interrelated values. In this paper, we introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup. Importantly, the proposed approach does not rely on statistical inference but uses numerical methods to identify inconsistencies with certainty. Through three different applications related to medicine, we demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields. To benefit the scientific community, we have made the consistency tests available in an open-source Python package.

摘要
<>将文本翻译成简化字符的中文。<>机器学习中的二分类任务是基础任务之一，它在各个科学领域中有各种应用。科学家在进行基础研究或实践应用时，通常会评估和排序分类技术基于性能指标 such as 准确率、敏感度和特异度。然而，报告的性能分数可能不一定是可靠的基础 для研究排名。这可以归因于不明文或非标准的批处分配、 typographical errors 和其他因素。在给定的实验设置中，有一定数量的正例和负例测试项时，大多数性能分数会假设特定、相关的值。在这篇论文中，我们介绍了数学技术来评估报告的性能分数的一致性和假设的实验设置。importantly，我们的方法不基于统计推断，而是使用数学方法来确定不一致性。通过医学相关的三个应用，我们示例了如何使用我们的方法检测不一致性，从而保护科学领域的准确性。为了利助科学社区，我们将一致性测试公开发布为开源的 Python 包。

Powerset multi-class cross entropy loss for neural speaker diarization

paper_url: http://arxiv.org/abs/2310.13025
repo_url: https://github.com/frenchkrab/is2023-powerset-diarization
paper_authors: Alexis Plaquet, Hervé Bredin
for: 这篇论文旨在提出一种新的Speaker diarization方法，以解决现有的多标签分类问题。
methods: 该方法使用了 permutation-invariant training 和 (local) 监督 EEND diarization 的组合，并对多标签分类问题进行了改进。
results: 经过广泛的实验，该方法在9个标准测试集上达到了较好的性能（特别是在 overlap speech 中），同时消除了探测阈值参数，从而提高了 robustness 和灵活性。

Abstract
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training. Despite EEND showing great promise, a few recent works took a step back and studied the possible combination of (local) supervised EEND diarization with (global) unsupervised clustering. Yet, these hybrid contributions did not question the original multi-label formulation. We propose to switch from multi-label (where any two speakers can be active at the same time) to powerset multi-class classification (where dedicated classes are assigned to pairs of overlapping speakers). Through extensive experiments on 9 different benchmarks, we show that this formulation leads to significantly better performance (mostly on overlapping speech) and robustness to domain mismatch, while eliminating the detection threshold hyperparameter, critical for the multi-label formulation.

摘要
自2019年引入以来，整个端到端神经 диари化（EEND）工作一直是将说话者 диари化视为一个帧内多类标签分类问题，并在训练中保持 permutation-invariant。 despite EEND表现出了很大的承诺，一些最近的工作又回退了，研究了可能的(本地)supervised EEND диари化与(全局)无监督分群的组合。然而，这些混合贡献没有质疑原始多类形式。我们提议将从多类（任何两个说话者都可以同时活跃）转换为集合类别分类（每对重叠的说话者都有专门的类别）。通过对9个benchmark进行了广泛的实验，我们显示这种形式导致了明显的性能提高（主要是在重叠speech），同时消除了多个标签分类的检测阈值参数，这是多类分类的 kritical 参数。

RTNH+: Enhanced 4D Radar Object Detection Network using Combined CFAR-based Two-level Preprocessing and Vertical Encoding

paper_url: http://arxiv.org/abs/2310.17659
repo_url: None
paper_authors: Seung-Hyun Kong, Dong-Hee Paek, Sangjae Cho
for: 提高4D Radar对3D物体探测和相对垂直速度估计的精度methods: 提出了两种新算法：一是基于CFAR的两级预处理算法（CCTP），通过生成不同特征的两个 filtered measurement，提高输入4D Radar物体探测网络的信息含义；二是垂直编码算法（VE），有效地编码 Vertical 特征。results: RTNH+在${AP}{3D}^{IoU=0.3}$和${AP}{3D}^{IoU=0.5}$中具有10.14%和16.12%的性能提升，相比RTNH。

Abstract
Four-dimensional (4D) Radar is a useful sensor for 3D object detection and the relative radial speed estimation of surrounding objects under various weather conditions. However, since Radar measurements are corrupted with invalid components such as noise, interference, and clutter, it is necessary to employ a preprocessing algorithm before the 3D object detection with neural networks. In this paper, we propose RTNH+ that is an enhanced version of RTNH, a 4D Radar object detection network, by two novel algorithms. The first algorithm is the combined constant false alarm rate (CFAR)-based two-level preprocessing (CCTP) algorithm that generates two filtered measurements of different characteristics using the same 4D Radar measurements, which can enrich the information of the input to the 4D Radar object detection network. The second is the vertical encoding (VE) algorithm that effectively encodes vertical features of the road objects from the CCTP outputs. We provide details of the RTNH+, and demonstrate that RTNH+ achieves significant performance improvement of 10.14\% in ${AP}_{3D}^{IoU=0.3}$ and 16.12\% in ${AP}_{3D}^{IoU=0.5}$ over RTNH.

摘要
四维度（4D）雷达是一种有用的感知器 для 3D 对象检测和周围对象的相对径速度估计，在不同的天气条件下。然而，由于雷达测量受到无效组成部分的干扰，如噪声、干扰和垃圾，因此需要采用预处理算法 перед 3D 对象检测用神经网络。在这篇论文中，我们提出了 RTNH+，它是 RTNH 的改进版，采用了两个新的算法。第一个算法是基于常量假阳报率（CFAR）的两级预处理算法（CCTP），它使用同一个 4D 雷达测量生成两个不同特征的 filtered measurement，以增加输入到 4D 雷达对象检测网络的信息量。第二个算法是垂直编码（VE）算法，它有效地编码了从 CCTP 输出的 vertical 特征。我们详细介绍了 RTNH+，并证明了 RTNH+ 在 ${AP}_{3D}^{IoU=0.3}$ 和 ${AP}_{3D}^{IoU=0.5}$ 中提高了10.14% 和 16.12% 相对于 RTNH。

Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks

paper_url: http://arxiv.org/abs/2310.12516
repo_url: None
paper_authors: Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao
for: This paper aims to develop a method of automatically generating evaluation data for large language models (LLMs) to measure their reliability and detect hallucinations.
methods: The proposed method, called AutoDebug, uses prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. These examples are designed to trigger hallucination behaviors in LLMs.
results: The paper evaluates the effectiveness of AutoDebug using two variants of the Natural Questions (NQ) dataset and a collection of open-source and proprietary LLMs. The results show that LLMs are likely to hallucinate in certain question-answering scenarios, and the adversarial examples generated by AutoDebug are transferable across all considered LLMs.

Abstract
Although remarkable progress has been achieved in preventing large language model (LLM) hallucinations using instruction tuning and retrieval augmentation, it remains challenging to measure the reliability of LLMs using human-crafted evaluation data which is not available for many tasks and domains and could suffer from data leakage. Inspired by adversarial machine learning, this paper aims to develop a method of automatically generating evaluation data by appropriately modifying existing data on which LLMs behave faithfully. Specifically, this paper presents AutoDebug, an LLM-based framework to use prompting chaining to generate transferable adversarial attacks in the form of question-answering examples. We seek to understand the extent to which these examples trigger the hallucination behaviors of LLMs. We implement AutoDebug using ChatGPT and evaluate the resulting two variants of a popular open-domain question-answering dataset, Natural Questions (NQ), on a collection of open-source and proprietary LLMs under various prompting settings. Our generated evaluation data is human-readable and, as we show, humans can answer these modified questions well. Nevertheless, we observe pronounced accuracy drops across multiple LLMs including GPT-4. Our experimental results show that LLMs are likely to hallucinate in two categories of question-answering scenarios where (1) there are conflicts between knowledge given in the prompt and their parametric knowledge, or (2) the knowledge expressed in the prompt is complex. Finally, we find that the adversarial examples generated by our method are transferable across all considered LLMs. The examples generated by a small model can be used to debug a much larger model, making our approach cost-effective.

摘要
尽管在防止大语言模型（LLM）幻 apparition 中使用 instrucion tuning 和 Retrieval Augmentation 进行了很大的进步，但仍然困难量测试 LLM 的可靠性使用人工制作的评估数据，因为这些数据可能不可得到多个任务和领域，并且可能会uffer from data leakage。受到对抗机器学习的启发，本文提出了一种自动生成评估数据的方法，通过修改 LLM 在 faithful 的数据上进行应ropriate 的修改来生成 transferred adversarial attacks。我们想要了解这些例子在 LLM 中引发幻 apparition 行为的程度。我们使用 ChatGPT 实现 AutoDebug 框架，并使用 prompting chaining 技术生成了一些可读性好的评估数据。我们对一些常用的开放领域问答任务 Natural Questions (NQ) 进行了两种变体的评估，并在多个开源和商业 LLM 上进行了多种 prompting 设置的测试。我们发现，这些修改后的问题可以由人类回答，但 LLM 却表现出了明显的准确率下降。我们的实验结果表明，LLM 在某些问题解决方案中会出现幻 apparition 行为，包括知识与 parametric 知识之间的冲突以及复杂的知识表达。最后，我们发现生成的 adversarial examples 是可 пере移的，可以在所有考虑的 LLM 上使用。此外，我们发现小型模型生成的例子可以用来调试大型模型，这使得我们的方法成本低廉。

Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt

paper_url: http://arxiv.org/abs/2310.13024
repo_url: https://github.com/gangwjiang/hprompt-cpt
paper_authors: Gangwei Jiang, Caigao Jiang, Siqiao Xue, James Y. Zhang, Jun Zhou, Defu Lian, Ying Wei
for: 本研究旨在探讨 continual pre-training 的有效性，以适应快速发展的世界中多个领域和任务。
methods: 我们使用了一种提前学习方法，并通过协调和不协调的损失函数来训练一个 hypernetwork，以生成域pecific的提示。提示可以降低域标识，并且促进了域之间的知识传递。
results: 我们在两个真实世界 dataset 上进行了实验，并获得了3.57%和3.4%的提高，证明了我们的方法的有效性。

Abstract
Continual pre-training has been urgent for adapting a pre-trained model to a multitude of domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is expected to demonstrate not only greater capacity when fine-tuned on pre-trained domains but also a non-decreasing performance on unseen ones. In this work, we first investigate such anytime fine-tuning effectiveness of existing continual pre-training approaches, concluding with unanimously decreased performance on unseen domains. To this end, we propose a prompt-guided continual pre-training method, where we train a hypernetwork to generate domain-specific prompts by both agreement and disagreement losses. The agreement loss maximally preserves the generalization of a pre-trained model to new domains, and the disagreement one guards the exclusiveness of the generated hidden states for each domain. Remarkably, prompts by the hypernetwork alleviate the domain identity when fine-tuning and promote knowledge transfer across domains. Our method achieved improvements of 3.57% and 3.4% on two real-world datasets (including domain shift and temporal shift), respectively, demonstrating its efficacy.

摘要
To address this issue, we propose a prompt-guided continual pre-training method. Our approach involves training a hypernetwork to generate domain-specific prompts using both agreement and disagreement losses. The agreement loss ensures that the generated prompts do not deviate from the original input, thereby preserving the generalization of the pre-trained model to new domains. The disagreement loss encourages the generated prompts to be unique and exclusive for each domain, preventing the model from overfitting to a single domain.Remarkably, the prompts generated by the hypernetwork help to alleviate the domain shift problem when fine-tuning and promote knowledge transfer across domains. Our method achieved improvements of 3.57% and 3.4% on two real-world datasets (including domain shift and temporal shift), demonstrating its effectiveness.

GraphGPT: Graph Instruction Tuning for Large Language Models

paper_url: http://arxiv.org/abs/2310.13023
repo_url: https://github.com/HKUDS/GraphGPT
paper_authors: Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang
for: 提高图模型的通用化能力，在零shot学习场景下实现高通用性。
methods: 提出了一种基于大语言模型（LLMs）的图模型框架，即图GPT框架，通过文本图关联组件和双Stage指令调整方法来帮助LLMs更好地理解图结构和适应不同下游任务。
results: 通过对超级vised和零shot图学习任务进行评估，表明了我们的框架在不同下游任务中的优于state-of-the-art基elines。

Abstract
Graph Neural Networks (GNNs) have advanced graph structure understanding via recursive information exchange and aggregation among graph nodes. To improve model robustness, self-supervised learning (SSL) has emerged as a promising approach for data augmentation. However, existing methods for generating pre-trained graph embeddings often rely on fine-tuning with specific downstream task labels, which limits their usability in scenarios where labeled data is scarce or unavailable. To address this, our research focuses on advancing the generalization capabilities of graph models in challenging zero-shot learning scenarios. Inspired by the success of large language models (LLMs), we aim to develop a graph-oriented LLM that can achieve high generalization across diverse downstream datasets and tasks, even without any information available from the downstream graph data. In this work, we present the GraphGPT framework that aligns LLMs with graph structural knowledge with a graph instruction tuning paradigm. Our framework incorporates a text-graph grounding component to establish a connection between textual information and graph structures. Additionally, we propose a dual-stage instruction tuning paradigm, accompanied by a lightweight graph-text alignment projector. This paradigm explores self-supervised graph structural signals and task-specific graph instructions, to guide LLMs in understanding complex graph structures and improving their adaptability across different downstream tasks. Our framework is evaluated on supervised and zero-shot graph learning tasks, demonstrating superior generalization and outperforming state-of-the-art baselines.

摘要
GRAPH NEURAL NETWORKS (GNNs) 有助于更好地理解图结构，通过图节点之间的循环信息交换和聚合来提高模型的 robustness。然而，现有的图模型生成预训练 embeddings 方法通常依赖于特定的下游任务标签，这限制了它们在数据罕见或无标签情况下的使用性。为了解决这个问题，我们的研究集中着精力在挑战的零例学习场景中提高图模型的通用性。受到大型自然语言模型 (LLMs) 的成功所 inspirited，我们目标是开发一个可以在多种下游任务和数据上高度通用的图模型。在这个工作中，我们提出了 GraphGPT 框架，它将 LLMs 与图结构知识集成，并通过图 instrucion 优化 paradigm 来实现。我们的框架包括一个文本-图结合组件，用于将文本信息与图结构相连。此外，我们还提出了一种双stage instrucion 优化 парадигмы，并附加了一个轻量级的图文对齐项目。这种 парадигмы探索了自然语言 Task 特定的图结构信号和任务特定的图 instrucion，以帮助 LLMs 理解复杂的图结构并提高其适应性。我们的框架在超级vised和零例学习图学习任务上进行了评估，并表现出了superior的通用性和超越了现有基eline。

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

paper_url: http://arxiv.org/abs/2310.12508
repo_url: https://github.com/optml-group/unlearn-saliency
paper_authors: Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, Sijia Liu
for: This paper focuses on the problem of machine unlearning (MU) and introduces a new method called saliency unlearning (SalUn) to address the limitations of existing MU methods.
methods: The SalUn method uses the concept of weight saliency to direct MU’s attention toward specific model weights, improving effectiveness and efficiency.
results: SalUn narrows the performance gap with exact unlearning and achieves better stability and accuracy in high-variance random data forgetting and preventing conditional diffusion models from generating harmful images.Here’s the same information in Simplified Chinese text:
for: 这篇论文关注机器学习忘记（MU）问题，并提出了一种新的方法called saliency unlearning（SalUn），以解决现有MU方法的限制。
methods: SalUn方法利用模型重量焦点的概念，引导MU的注意力向特定的模型重量，提高效果和效率。
results: SalUn方法在高异常随机数据忘记中减小了与精准忘记（模型重新训练）之间的性能差距，并在预测扩散模型生成危险图像时达到了 nearly 100%的忘记精度。

Abstract
With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.

摘要
With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.

Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models

paper_url: http://arxiv.org/abs/2310.12481
repo_url: None
paper_authors: Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu
for: 本研究探讨了大语言模型（LLM）中的文化主导问题，即因训练数据主要来自英语而导致的模型偏向英语文化。
methods: 我们建立了一个包含具体（如假期和歌曲）和抽象（如价值观和意见）文化对象的基准。我们对一些代表性的GPT模型进行了系统性的评估，发现这些模型受到文化主导问题的影响，GPT-4最为严重，而text-davinci-003最少受到这种问题的影响。
results: 我们的研究强调了在开发和部署LMM时的文化主导和伦理考虑的必要性。我们示出了在模型开发和部署中使用两种简单的方法（即预训练数据更加多样化和文化意识提醒）可以有效缓解LMM中的文化主导问题。

Abstract
In this paper, we identify a cultural dominance issue within large language models (LLMs) due to the predominant use of English data in model training (e.g. ChatGPT). LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages. To systematically evaluate the cultural dominance issue, we build a benchmark that consists of both concrete (e.g. holidays and songs) and abstract (e.g. values and opinions) cultural objects. Empirical results show that the representative GPT models suffer from the culture dominance problem, where GPT-4 is the most affected while text-davinci-003 suffers the least from this problem. Our study emphasizes the need for critical examination of cultural dominance and ethical consideration in their development and deployment. We show two straightforward methods in model development (i.e. pretraining on more diverse data) and deployment (e.g. culture-aware prompting) can significantly mitigate the cultural dominance issue in LLMs.

摘要
在这篇论文中，我们发现了大语言模型（LLM）中的文化主导问题，即因训练数据主要来自英语而导致模型提供不 relevante的英文文化答案。我们建立了一个 benchmark，包括具体的文化对象（如假日和歌曲）和抽象的文化对象（如价值和意见），以系统地评估文化主导问题。实验结果表明，代表性的 GPT 模型受到文化主导问题的影响，其中 GPT-4 最为严重，而 text-davinci-003 最少受到这个问题的影响。我们的研究强调了在开发和部署 LLM 时需要进行严格的文化主导和伦理考虑。我们展示了在模型开发和部署阶段使用更多多样化数据和文化意识提醒等两种简单方法可以减轻 LLM 中的文化主导问题。

GRAPE-S: Near Real-Time Coalition Formation for Multiple Service Collectives

paper_url: http://arxiv.org/abs/2310.12480
repo_url: None
paper_authors: Grace Diehl, Julie A. Adams
for: 这种论文是为了解决机器人集群在军事和灾害应急应用中的协会化问题，包括分配机器人到合适的任务小组。methods: 这种论文使用了GRAPE算法和服务模型，以及两种拍卖基础算法作为比较。results: 论文表明，拍卖基础算法在分布式集群中转移不好，导致过长的运行时间和低Utility解决方案。GRAPE-S和Pair-GRAPE-S算法可以在近实时内提供近似优质解决方案，并且可以支持大规模分布式集群和多种服务。

Abstract
Robotic collectives for military and disaster response applications require coalition formation algorithms to partition robots into appropriate task teams. Collectives' missions will often incorporate tasks that require multiple high-level robot behaviors or services, which coalition formation must accommodate. The highly dynamic and unstructured application domains also necessitate that coalition formation algorithms produce near optimal solutions (i.e., >95% utility) in near real-time (i.e., <5 minutes) with very large collectives (i.e., hundreds of robots). No previous coalition formation algorithm satisfies these requirements. An initial evaluation found that traditional auction-based algorithms' runtimes are too long, even though the centralized simulator incorporated ideal conditions unlikely to occur in real-world deployments (i.e., synchronization across robots and perfect, instantaneous communication). The hedonic game-based GRAPE algorithm can produce solutions in near real-time, but cannot be applied to multiple service collectives. This manuscript integrates GRAPE and a services model, producing GRAPE-S and Pair-GRAPE-S. These algorithms and two auction baselines were evaluated using a centralized simulator with up to 1000 robots, and via the largest distributed coalition formation simulated evaluation to date, with up to 500 robots. The evaluations demonstrate that auctions transfer poorly to distributed collectives, resulting in excessive runtimes and low utility solutions. GRAPE-S satisfies the target domains' coalition formation requirements, producing near optimal solutions in near real-time, and Pair-GRAPE-S more than satisfies the domain requirements, producing optimal solutions in near real-time. GRAPE-S and Pair-GRAPE-S are the first algorithms demonstrated to support near real-time coalition formation for very large, distributed collectives with multiple services.

摘要
军事和灾害应急应用中的机器人集群需要联盟形成算法来将机器人分配到相应的任务团队。集群的任务经常包括多个高级机器人行为或服务，联盟形成算法必须满足这些要求。应用领域具有高度动态和无结构特点，因此联盟形成算法需要生成>95%的利用率（i.e., <5分钟内），并且可以处理数百个机器人。现有的任何一种联盟形成算法都不满足这些要求。一个初步评估发现，传统的拍卖式算法的运行时间太长，即使在中央模拟器中包含理想的条件（i.e., 机器人同步和精准、实时通信）。hedonic game-based GRAPE算法可以在近实时生成解决方案，但不能应用于多服务集群。这篇论文将GRAPE算法与服务模型集成，生成GRAPE-S和Pair-GRAPE-S。这些算法和两个拍卖基准被用中央模拟器中的1000台机器人进行评估，以及最大的分布式联盟形成模拟评估，包含500台机器人。评估结果表明，拍卖式算法在分布式集群中传递不好，导致运行时间过长，解决方案质量低。GRAPE-S满足了目标领域的联盟形成要求，在近实时内生成近似优解，而Pair-GRAPE-S更 чем满足了领域要求，在近实时内生成优解。GRAPE-S和Pair-GRAPE-S是首个在大规模、分布式集群中实现近实时联盟形成的算法。

An Exploration of In-Context Learning for Speech Language Model

paper_url: http://arxiv.org/abs/2310.12477
repo_url: https://github.com/Aryia-Behroziuan/Robot-learning
paper_authors: Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-yi Lee
for: 本研究探讨了语音自然语言处理领域中卷积学习（ICL）的可能性，以便不需要文本指导或参数修改，使语音语言模型（LM）能够快速学习和适应。
methods: 本研究使用了提档训练方法，使语音LM能够完成无文本指导的几拟学习。
results: 研究表明，通过提档训练，语音LM可以在未看到任务示例的情况下完成几拟学习，并在语音分类任务中证明了其可行性。

Abstract
Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an important role in utilizing large language models (LLMs). By presenting the LM utterance-label demonstrations at the input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters. This enables the LM to learn and adapt in a black-box manner. Despite the success of ICL in NLP, little work is exploring the possibility of ICL in speech processing. This study proposes the first exploration of ICL with a speech LM without text supervision. We first show that the current speech LM does not have the ICL capability. With the proposed warmup training, the speech LM can, therefore, perform ICL on unseen tasks. In this work, we verify the feasibility of ICL for speech LM on speech classification tasks.

摘要
Translated into Simplified Chinese: desde el desarrollo de GPT-3 en el campo de procesamiento de lenguaje natural (NLP), la aprendizaje en contexto (ICL) ha desempeñado un papel importante en el uso de modelos de lenguaje grande (LLM). Al presentar las demostraciones de utterance-etiqueta del LM como entrada, el LM puede realizar aprendizaje de pocos tiros sin depender del descenso de gradientes o requerir modificaciones explícitas de sus parámetros. Esto permite al LM aprender y adaptarse de manera "caja negra". A pesar del éxito de ICL en NLP, poco trabajo se ha realizado en explorar la posibilidad de ICL en el procesamiento de speech. Este estudio propone la primera exploración de ICL con un modelo de speech sin supervisión textual. Primero mostramos que el actual modelo de speech no tiene la capacidad de ICL. Con la formación de warm-up propuesta, el modelo de speech puede, por lo tanto, realizar ICL en tareas no vistas. En este trabajo, verificamos la factibilidad de ICL para el modelo de speech en tareas de clasificación de speech.

Affective Conversational Agents: Understanding Expectations and Personal Influences

paper_url: http://arxiv.org/abs/2310.12459
repo_url: None
paper_authors: Javier Hernandez, Jina Suh, Judith Amores, Kael Rowan, Gonzalo Ramos, Mary Czerwinski
for: 这个研究的目的是调查人们对AI会话代理的情感能力的期望和偏好，以便更好地理解它们在不同应用场景中的表现和用户体验。
methods: 这个研究使用了745名受试者的问卷调查，以评估受试者对不同情感能力的需求和偏好。Specifically, the study assessed preferences regarding AI agents that can perceive, respond to, and simulate emotions across 32 distinct scenarios.
results: 研究发现，受试者对AI会话代理的情感能力的需求因应用场景而异，具体来说，受试者最偏好AI agents能够进行人际交流、提供情感支持和创造性任务。 Additionally, the study found that factors such as emotional reappraisal and personality traits influence the desired affective skills in AI agents.

Abstract
The rise of AI conversational agents has broadened opportunities to enhance human capabilities across various domains. As these agents become more prevalent, it is crucial to investigate the impact of different affective abilities on their performance and user experience. In this study, we surveyed 745 respondents to understand the expectations and preferences regarding affective skills in various applications. Specifically, we assessed preferences concerning AI agents that can perceive, respond to, and simulate emotions across 32 distinct scenarios. Our results indicate a preference for scenarios that involve human interaction, emotional support, and creative tasks, with influences from factors such as emotional reappraisal and personality traits. Overall, the desired affective skills in AI agents depend largely on the application's context and nature, emphasizing the need for adaptability and context-awareness in the design of affective AI conversational agents.

摘要
人工智能对话代理的出现已经扩大了增强人类能力的机会，在不同领域。随着这些代理变得更普遍，研究对它们的表现和用户体验的影响已经变得非常重要。在这项研究中，我们询问了745名参与者，以了解他们对不同情感能力的期望和偏好。我们专门评估了参与者对情感感知、应对和模拟情感的场景中的偏好。我们的结果表明，参与者对人工交流、情感支持和创造性任务的场景有很高的偏好，这些场景的影响因素包括情感重新评估和个性特质。总之，人工智能对话代理所需的情感能力取决于应用场景的内容和性质，这推翻了设计情感AI对话代理的需要适应性和场景意识。

Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models

paper_url: http://arxiv.org/abs/2310.12454
repo_url: https://github.com/cclx/effective_metrics
paper_authors: You Li, Jinhui Yin, Yuming Lin
for: 本研究旨在解释BERT-like预训练语言模型的机制，并提出一种基于树Topological Probe的方法来计算这些机制。
methods: 本研究使用了一种基于树Topological Probe的方法来计算BERT-large模型中的各种机制，并对BERT-large模型进行了测试。
results: 实验结果表明，使用树Topological Probe可以提供有用的信息，并且可以帮助提高 fine-tuning 性能。此外，研究还提出了一种可能的BERT-like预训练语言模型的工作机制。

Abstract
Pretrained language models are expected to effectively map input text to a set of vectors while preserving the inherent relationships within the text. Consequently, designing a white-box model to compute metrics that reflect the presence of specific internal relations in these vectors has become a common approach for post-hoc interpretability analysis of pretrained language models. However, achieving interpretability in white-box models and ensuring the rigor of metric computation becomes challenging when the source model lacks inherent interpretability. Therefore, in this paper, we discuss striking a balance in this trade-off and propose a novel line to constructing metrics for understanding the mechanisms of pretrained language models. We have specifically designed a family of metrics along this line of investigation, and the model used to compute these metrics is referred to as the tree topological probe. We conducted measurements on BERT-large by using these metrics. Based on the experimental results, we propose a speculation regarding the working mechanism of BERT-like pretrained language models, as well as a strategy for enhancing fine-tuning performance by leveraging the topological probe to improve specific submodules.

摘要
<> translate "Pretrained language models are expected to effectively map input text to a set of vectors while preserving the inherent relationships within the text. Consequently, designing a white-box model to compute metrics that reflect the presence of specific internal relations in these vectors has become a common approach for post-hoc interpretability analysis of pretrained language models. However, achieving interpretability in white-box models and ensuring the rigor of metric computation becomes challenging when the source model lacks inherent interpretability. Therefore, in this paper, we discuss striking a balance in this trade-off and propose a novel line to constructing metrics for understanding the mechanisms of pretrained language models. We have specifically designed a family of metrics along this line of investigation, and the model used to compute these metrics is referred to as the tree topological probe. We conducted measurements on BERT-large by using these metrics. Based on the experimental results, we propose a speculation regarding the working mechanism of BERT-like pretrained language models, as well as a strategy for enhancing fine-tuning performance by leveraging the topological probe to improve specific submodules."中文翻译：预训言语模型预期能够有效地将输入文本映射到一组矢量，保留文本中内在关系的约束。因此，设计一个白盒模型来计算这些矢量中特定内在关系的指标，成为预训言语模型后期可读性分析的常见方法。然而，在白盒模型中实现可读性和指标计算的精准性变得困难，因为源模型缺乏内在可读性。因此，在这篇论文中，我们讨论如何平衡这种贸易，并提出了一种新的方法来构建适用于理解预训言语模型机制的指标。我们专门设计了一家指标，用于计算这些指标，并称之为树Topological probe。我们对BERT-large进行了测量，并基于实验结果，我们提出了预训言语模型工作机制的 Speculation，以及可以通过树Topological probe来提高精度调整性的策略。

MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features

paper_url: http://arxiv.org/abs/2310.12451
repo_url: https://github.com/huayuliarizona/mst-lof
paper_authors: Huayu Li, Ana S. Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li
for: 这 paper 是为了提高医疗数据的表示学学习，特别是针对医疗时间序数据。
methods: 这 paper 使用了自适应学习（SSL）和伪预测器（MAE）方法，并将其组合起来，以提高医疗应用程序的表示学学习。它还使用了多个遮盲策略，以便在不同的医疗时间序数据上学习不同的视图。
results: 实验结果表明，MTS-LOF 比其他方法更高效，并且可以更好地捕捉医疗时间序数据中的 Contextual information。这些结果表明，MTS-LOF 可以提高医疗应用程序的表示学学习，并且可以更好地理解医疗数据的复杂关系。

Abstract
Medical time series data are indispensable in healthcare, providing critical insights for disease diagnosis, treatment planning, and patient management. The exponential growth in data complexity, driven by advanced sensor technologies, has presented challenges related to data labeling. Self-supervised learning (SSL) has emerged as a transformative approach to address these challenges, eliminating the need for extensive human annotation. In this study, we introduce a novel framework for Medical Time Series Representation Learning, known as MTS-LOF. MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data. By combining these techniques, MTS-LOF enhances the potential of healthcare applications by providing more sophisticated, context-rich representations. Additionally, MTS-LOF employs a multi-masking strategy to facilitate occlusion-invariant feature learning. This approach allows the model to create multiple views of the data by masking portions of it. By minimizing the discrepancy between the representations of these masked patches and the fully visible patches, MTS-LOF learns to capture rich contextual information within medical time series datasets. The results of experiments conducted on diverse medical time series datasets demonstrate the superiority of MTS-LOF over other methods. These findings hold promise for significantly enhancing healthcare applications by improving representation learning. Furthermore, our work delves into the integration of joint-embedding SSL and MAE techniques, shedding light on the intricate interplay between temporal and structural dependencies in healthcare data. This understanding is crucial, as it allows us to grasp the complexities of healthcare data analysis.

摘要
医疗时间序数据是医疗健康管理中不可或缺的，它们提供了重要的疾病诊断、治疗规划和患者管理的关键信息。随着感知技术的不断发展，医疗时间序数据的复杂性呈指数增长，这对数据标注带来了挑战。自主学习（SSL）已经成为一种解决这些挑战的transformative方法，不需要大量的人类标注。在这个研究中，我们介绍了一种新的医疗时间序表示学习框架，称为MTS-LOF。MTS-LOF利用了对比学习和Masked Autoencoder（MAE）方法的优点，提供了一种新的医疗时间序表示学习方法。通过将这些技术相结合，MTS-LOF可以为医疗应用程序提供更加复杂、上下文rich的表示。此外，MTS-LOF使用多masking策略来促进遮盲不变的特征学习。这种方法使得模型可以创建多个视图的数据。通过将这些遮盲的patches与完整可见的patches的表示差异到最小，MTS-LOF可以捕捉医疗时间序数据中的rich上下文信息。实验结果表明，MTS-LOF在多种医疗时间序数据集上的性能superior于其他方法。这些发现承诺可以大幅提高医疗应用程序的表示学习，并且我们的工作也探讨了 joint-embedding SSL和MAE技术的共同作用， shedding light on the intricate interplay between temporal and structural dependencies in healthcare data。这种理解是关键的，因为它允许我们更好地理解医疗数据分析的复杂性。

Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher

paper_url: http://arxiv.org/abs/2310.12443
repo_url: None
paper_authors: Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu
for: 提高 relevance 和提供直接答案的搜寻结果的可靠性和信任性。
methods: 提出了一个 novel 的生成搜寻框架，利用 LLM 的知识实现查询和线上资源之间的直接关联，包括 Generator、Validator 和 Optimizer 三个核心模组，每个模组专注于生成可靠的线上资源、验证来源可靠性和修正不可靠来源。
results: 经过广泛的实验和评估，我们的方法在不同的 SOTA 方法面进行了超过其他方法的重要性、责任性和信任性。

Abstract
The advent of Large Language Models (LLMs) has shown the potential to improve relevance and provide direct answers in web searches. However, challenges arise in validating the reliability of generated results and the credibility of contributing sources, due to the limitations of traditional information retrieval algorithms and the LLM hallucination problem. Aiming to create a "PageRank" for the LLM era, we strive to transform LLM into a relevant, responsible, and trustworthy searcher. We propose a novel generative retrieval framework leveraging the knowledge of LLMs to foster a direct link between queries and online sources. This framework consists of three core modules: Generator, Validator, and Optimizer, each focusing on generating trustworthy online sources, verifying source reliability, and refining unreliable sources, respectively. Extensive experiments and evaluations highlight our method's superior relevance, responsibility, and trustfulness against various SOTA methods.

摘要
LLM时代的出现带来了提高搜索结果相关性和直接回答的潜力。然而，验证生成结果的可靠性和贡献来源的可靠性受到传统信息检索算法的限制和LLM幻觉问题的影响。我们希望通过转化LLM成为可靠、负责任和可信的搜索引擎。我们提出了一种新的生成检索框架，利用LLM的知识来建立直接连接查询和在线源。这个框架包括三个核心模块：生成器、验证器和优化器，每个模块都关注生成可靠的在线源，验证源可靠性，并且修复不可靠的源。我们的方法在多种SOTA方法的比较中显示出了更高的相关性、负责任性和可信worthiness。

PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models

paper_url: http://arxiv.org/abs/2310.12439
repo_url: https://github.com/grasses/poisonprompt
paper_authors: Hongwei Yao, Jian Lou, Zhan Qin
for: 研究表示，预训练大型自然语言模型（LLM）的提示方法可以提高其下游任务的性能，但是这些方法尚未充分探讨过攻击性质的威胁。本文提出了一种新的后门攻击方法，可以成功地攻击硬件和软件提示方法的LLM。
methods: 本文使用的三种 популяр的提示方法是：（1）硬件提示法（Hard Prompt），（2）软件提示法（Soft Prompt），（3）混合提示法（Hybrid Prompt）。
results: 根据EXTENSIVE experiments中的结果，POISONPROMPT可以成功地攻击三种提示方法中的两种（硬件和软件提示法），并且可以在六个数据集和三种常用的LLM中进行可靠的攻击。这些结果表明，提示方法可以增强LLM的下游任务性能，但是同时也增加了LLM的安全风险。

Abstract
Prompts have significantly improved the performance of pretrained Large Language Models (LLMs) on various downstream tasks recently, making them increasingly indispensable for a diverse range of LLM application scenarios. However, the backdoor vulnerability, a serious security threat that can maliciously alter the victim model's normal predictions, has not been sufficiently explored for prompt-based LLMs. In this paper, we present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of POISONPROMPT through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs. Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.

摘要
Recently, 提示（prompts）have significantly improved the performance of pre-trained Large Language Models (LLMs) on various downstream tasks, making them increasingly indispensable for a diverse range of LLM application scenarios. However, the backdoor vulnerability, a serious security threat that can maliciously alter the victim model's normal predictions, has not been sufficiently explored for prompt-based LLMs. In this paper, we present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of POISONPROMPT through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs. Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.Here's the text with traditional Chinese characters:Recently, 提示（prompts）have significantly improved the performance of pre-trained Large Language Models (LLMs) on various downstream tasks, making them increasingly indispensable for a diverse range of LLM application scenarios. However, the backdoor vulnerability, a serious security threat that can maliciously alter the victim model's normal predictions, has not been sufficiently explored for prompt-based LLMs. In this paper, we present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs. We evaluate the effectiveness, fidelity, and robustness of POISONPROMPT through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs. Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.

Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach

paper_url: http://arxiv.org/abs/2310.12428
repo_url: None
paper_authors: Joshua Rosaler, Dhruv Desai, Bhaskarjit Sarmah, Dimitrios Vamvourellis, Deran Onay, Dhagash Mehta, Stefano Pasquali
for: 这个论文旨在解释随机森林模型（RF）的外样性表现，通过利用随机森林可以表示为一种可适应权重k最近邻居模型来实现。
methods: 这种方法使用随机森林中点之间的距离学习到的特征空间 proximity 来重写随机森林预测，将随机森林预测转化为一个权重平均的目标标签。
results: 这种方法可以为随机森林预测提供地方性的解释，生成对于任何模型预测的贡献，并且与已有的方法like SHAP相比，能够更好地解释随机森林模型的外样性表现。

Abstract
We initiate a novel approach to explain the out of sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted K nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to re-write random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generates attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.

摘要
我们提出了一种新的方法来解释随机森林（RF）模型的外样性表现，利用随机森林可以视为一种适应加权最近邻近模型的事实。特别是，我们使用随机森林学习到的特征空间中点的距离来重写随机森林预测为一个加权平均的target标签值。这种线性性使得我们可以在培 обу集中的观察点上生成随机森林预测的解释，并且与已有的方法 like SHAP 相结合，从而提供了一种地方性的解释随机森林预测的方法。我们在美国公司债券交易数据集上应用了这种方法，并与其他已有的解释方法进行比较。

MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models

paper_url: http://arxiv.org/abs/2310.12426
repo_url: https://github.com/deepakn97/maf
paper_authors: Deepak Nathani, David Wang, Liangming Pan, William Yang Wang
for: 提高自然语言理解能力
methods: 多方面反馈机制，包括冻结LM和外部工具，每一个模块专注于特定的错误类型
results: 对LM生成的理由链中多种错误进行改进，提高LM在多种逻辑任务的表现，相对提高率达20%（数学逻辑）和18%（逻辑推论）

Abstract
Language Models (LMs) have shown impressive performance in various natural language tasks. However, when it comes to natural language reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors. Recent research has focused on enhancing LMs through self-improvement using feedback. Nevertheless, existing approaches relying on a single generic feedback source fail to address the diverse error types found in LM-generated reasoning chains. In this work, we propose Multi-Aspect Feedback, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category. Our experimental results demonstrate the efficacy of our approach to addressing several errors in the LM-generated reasoning chain and thus improving the overall performance of an LM in several reasoning tasks. We see a relative improvement of up to 20% in Mathematical Reasoning and up to 18% in Logical Entailment.

摘要
自然语言模型（LM）在各种自然语言任务中表现出色，但在自然语言推理方面仍面临挑战，如幻见、生成错误中间推理步骤和数学错误。现有研究集中在使LM自我改进的方法，但现有的方法均靠单一的通用反馈源，无法解决LM生成推理链中的多种错误类型。在这项工作中，我们提出了多方面反馈（Multi-Aspect Feedback），一种迭代优化框架，它将多个反馈模块、包括冻结LM和外部工具，每个模块都专注于特定错误类型。我们的实验结果表明，我们的方法可以有效地改进LM在数学推理和逻辑推理等任务中的表现，相对提高了20%以上，并在逻辑推理任务中提高了18%以上。

Automated Repair of Declarative Software Specifications in the Era of Large Language Models

paper_url: http://arxiv.org/abs/2310.12425
repo_url: None
paper_authors: Md Rashedul Hasan, Jiawei Li, Iftekhar Ahmed, Hamid Bagheri
for: 这个研究旨在评估OpenAI的ChatGPT在Alloy声明语言中自动修复软件规范的能力。
methods: 该研究使用ChatGPT自动修复Alloy规范，并比较其与现有的自动修复方法的效果。
results: 研究发现ChatGPT可以成功修复一些其他方法无法 Address的错误，但也存在一些错误和幻见问题。

Abstract
The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in large language models provide new opportunities for the automatic repair of declarative specifications. In this study, we assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Unlike imperative languages, specifications in Alloy are not executed but rather translated into logical formulas and evaluated using backend constraint solvers to identify specification instances and counterexamples to assertions. Our evaluation focuses on ChatGPT's ability to improve the correctness and completeness of Alloy declarative specifications through automatic repairs. We analyze the results produced by ChatGPT and compare them with those of leading automatic Alloy repair methods. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address. Our analysis also identified errors in ChatGPT's generated repairs, including improper operator usage, type errors, higher-order logic misuse, and relational arity mismatches. Additionally, we observed instances of hallucinations in ChatGPT-generated repairs and inconsistency in its results. Our study provides valuable insights for software practitioners, researchers, and tool builders considering ChatGPT for declarative specification repairs.

摘要
随着声明式软件要求的使用越来越普遍，人们对于这类语言的自动修复技术的需求也在增长。研究人员们已经开始 explore various方法来自动修复声明式软件要求，如模板基于的修复、反馈驱动的迭代修复和约束搜索等方法。最新的大语言模型提供了新的机会 для声明式软件要求的自动修复。本研究通过使用 OpenAI 的 ChatGPT 来评估 Alloy 声明语言中的自动修复效果。不同于 imperative 语言，Alloy 的specification 不会被执行，而是被翻译成逻辑方程并通过 backend 约束解释器来评估 specification 实例和 counterexample 。我们的评估将注重 ChatGPT 在修复 Alloy 声明语言中的正确性和完整性。我们分析了 ChatGPT 生成的结果，并与主流的自动 Alloy 修复方法进行比较。我们的研究发现，虽然 ChatGPT 落后于现有的技术，但它能够成功修复一些其他技术无法处理的错误。我们的分析还发现了 ChatGPT 生成的修复结果中的错误，包括不正确的运算使用、类型错误、高阶逻辑错误和关系性质匹配错误。此外，我们还观察到了 ChatGPT 生成的修复结果中的幻见和结果不一致。本研究为软件实践人员、研究人员和工具制作人员在考虑使用 ChatGPT 进行声明式软件要求的修复提供了有价值的洞察。

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding

paper_url: http://arxiv.org/abs/2310.13022
repo_url: https://github.com/wjn1996/upet
paper_authors: Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li
for: 这个 paper 的目的是提高大型预训语言模型在具有有限资源的情况下表现，以解决大型预训语言模型对于有限资源的缺乏。
methods: 这个 paper 使用了自教学法（SSL），具体来说是使用大量无标的数据生成模拟例子，并在教师模型中添加 Monte Carlo 抽掉以进行不确定性估计。在学生训练中，我们提出了多个参数效率学习（PEL）方法，允许仅更新一小部分参数。
results: 实验结果显示，UPET 可以提高表现和效率，并且可以在多个下游任务上 достичьsubstantial 的改进。

Abstract
The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the predominant semi-supervised learning (SSL) approaches, which utilizes large-scale unlabeled data to generate synthetic examples. However, too many noisy labels will hurt the model performance, and the self-training procedure requires multiple training iterations making it more expensive if all the model parameters of the PLM are updated. This paper presents UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework to effectively and efficiently address the labeled data scarcity issue. Specifically, we incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation for the teacher model and then judiciously select reliable pseudo-labeled examples based on confidence and certainty. During the student training, we introduce multiple parameter-efficient learning (PEL) paradigms that allow the optimization of only a small percentage of parameters. We also propose a novel Easy-Hard Contrastive Tuning to enhance the robustness and generalization. Extensive experiments over multiple downstream tasks demonstrate that UPET achieves a substantial improvement in terms of performance and efficiency. Our codes and data are released at https: //github.com/wjn1996/UPET.

摘要

Reducing Uncertainty in Sea-level Rise Prediction: A Spatial-variability-aware Approach

paper_url: http://arxiv.org/abs/2310.15179
repo_url: None
paper_authors: Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian
for: 这 paper 的目的是准确地预测未来海平面升高，同时降低不确定性。
methods: 这 paper 使用了 zonal regression 模型，解决了地域差异和模型依赖关系。
results: 实验结果表明，通过这 approach 在地区层次上学习 weights，可以提供更可靠的海平面升高预测。

Abstract
Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or deep learning to combine different climate projections. Such approaches are inadequate when different regions require different weighting schemes for accurate and reliable sea-level rise predictions. This paper proposes a zonal regression model which addresses spatial variability and model inter-dependency. Experimental results show more reliable predictions using the weights learned via this approach on a regional scale.

摘要
Translated into Simplified Chinese:给定多模型ensemble气候预测，目标是准确地预测未来海平面上升，同时降低不确定性。这个问题非常重要，因为海平面上升会对海洋和沿海社区的人们产生深见影响，这是由于气候变化对北极和南极冰川的影响。这个问题具有空间不一致和未知因素，如可能的致命点（例如格陵兰或西安таркти亚极冰川崩塌）、气候反馈循环（例如云和冻土融化）、未来政策决策和人类行为。现有的气候模拟方法通常使用全球相同的权重，在回归或深度学习中组合不同的气候预测。这些方法无法满足不同地区需要不同权重分配的准确和可靠海平面上升预测。这篇论文提出了zonal回归模型，该模型解决了空间不一致和模型互相依赖问题。实验结果表明，通过该方法在地域级别上学习权重后，可以获得更可靠的预测结果。

AI for Mathematics: A Cognitive Science Perspective

paper_url: http://arxiv.org/abs/2310.13021
repo_url: https://github.com/Aryia-Behroziuan/References
paper_authors: Cedegao E. Zhang, Katherine M. Collins, Adrian Weller, Joshua B. Tenenbaum
for: This paper is written for researchers and practitioners in the field of artificial intelligence (AI) who are interested in developing automated mathematicians.
methods: The paper draws on cognitive science research directions to inform the development of truly human-level mathematical systems.
results: The paper highlights the importance of considering a multidisciplinary perspective, involving cognitive scientists, AI researchers, and mathematicians, to develop better mathematical AI systems that can push the frontier of mathematics and provide insights into human cognition.

Abstract
Mathematics is one of the most powerful conceptual systems developed and used by the human species. Dreams of automated mathematicians have a storied history in artificial intelligence (AI). Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems. In this work, we reflect on these goals from a \textit{cognitive science} perspective. We call attention to several classical and ongoing research directions from cognitive science, which we believe are valuable for AI practitioners to consider when seeking to build truly human (or superhuman)-level mathematical systems. We close with open discussions and questions that we believe necessitate a multi-disciplinary perspective -- cognitive scientists working in tandem with AI researchers and mathematicians -- as we move toward better mathematical AI systems which not only help us push the frontier of the mathematics, but also offer glimpses into how we as humans are even capable of such great cognitive feats.

摘要
mathematics是人类创造并使用的一种极其强大的概念体系。自动化数学家的梦想有着悠久的历史在人工智能（AI）领域。特别是在大语言模型（LLMs）的进步下，AI的进步得到了推动，这引发了人们对于建立这种系统的再一次兴趣。在这篇文章中，我们从认知科学的角度思考这些目标。我们强调了一些经典和持续研究的方向，我们认为这些方向对于AI实践人员来说非常有价值，可以帮助建立人类（或超人）水平的数学系统。文章结束时，我们提出了一些开放的讨论和问题，我们认为需要多学科合作才能解决这些问题，即认知科学家、AI研究者和数学家在一起，以实现更好的数学AI系统，不仅能够推动数学的前沿，还能够为我们提供人类的大脑能力之间的窥视。

Provable Guarantees for Neural Networks via Gradient Feature Learning

paper_url: http://arxiv.org/abs/2310.12408
repo_url: None
paper_authors: Zhenmei Shi, Junyi Wei, Yingyu Liang
for: 这项研究的目的是为了解释深度学习网络的成功，并提供一种统一的分析框架。
methods: 这项研究使用了梯度下降来训练两层网络，并基于特征学习的原理来分析网络的学习过程。
results: 研究发现，两层网络在训练过程中可以学习出非线性特征，并且这种特征学习能力不仅限于特征函数的内存。此外，研究还发现了一些有趣的网络学习现象，如特征学习超出kernel的限制和抽奖假设。

Abstract
Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent Kernel approach fails to capture their key feature learning ability, while recent analyses on feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis.

摘要
神经网络已经实现了非常出色的实际性能，而当前的理论分析并没有充分理解它们的成功，例如，神经折衔核方法不能捕捉它们的关键特征学习能力，而最近的特征学习分析通常是问题特定的。这个工作提出了两层网络通过梯度下降学习的统一分析框架，该框架中心在特征学习从梯度中的原则上，并通过应用于多个典型问题，如混合的高斯函数和平衡函数，证明了其效iveness。此外，该框架还暴露了神经网络学习现象，如特征学习超过折衔和抽奖假设。

Classification-Aided Robust Multiple Target Tracking Using Neural Enhanced Message Passing

paper_url: http://arxiv.org/abs/2310.12407
repo_url: None
paper_authors: Xianglong Bai, Zengfu Wang, Quan Pan, Tao Yun, Hua Lan
for: 提高强陌生环境下多个目标的跟踪稳定性，使用雷达测量信息。
methods: 利用范围-Doppler谱信息进行测量类别标识，以提高干扰抑制和数据关联，从而提高目标跟踪稳定性。采用神经网络增强消息传递方法，将信念输入神经网络，以提高原始信念的准确性。
results: 提出了一种基于神经网络增强消息传递的分类帮助稳定多目标跟踪算法，包括三个模块：消息传递模块、神经网络模块和德本-沙佛模块。该算法可以有效地抑制干扰和提高数据关联，从而在实际雷达应用中提高多目标跟踪性能。验证了该方法的有效性通过实验和实际数据场景。

Abstract
We address the challenge of tracking an unknown number of targets in strong clutter environments using measurements from a radar sensor. Leveraging the range-Doppler spectra information, we identify the measurement classes, which serve as additional information to enhance clutter rejection and data association, thus bolstering the robustness of target tracking. We first introduce a novel neural enhanced message passing approach, where the beliefs obtained by the unified message passing are fed into the neural network as additional information. The output beliefs are then utilized to refine the original beliefs. Then, we propose a classification-aided robust multiple target tracking algorithm, employing the neural enhanced message passing technique. This algorithm is comprised of three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The message-passing module is used to represent the statistical model by the factor graph and infers target kinematic states, visibility states, and data associations based on the spatial measurement information. The neural network module is employed to extract features from range-Doppler spectra and derive beliefs on whether a measurement is target-generated or clutter-generated. The Dempster-Shafer module is used to fuse the beliefs obtained from both the factor graph and the neural network. As a result, our proposed algorithm adopts a model-and-data-driven framework, effectively enhancing clutter suppression and data association, leading to significant improvements in multiple target tracking performance. We validate the effectiveness of our approach using both simulated and real data scenarios, demonstrating its capability to handle challenging tracking scenarios in practical radar applications.

摘要
我们面临对未知数目目标在强杂环境中进行追踪，使用射频感知器的测量。我们利用射频 Doppler спектrum 信息，识别测量类型，从而增强杂环排除和数据汇合，提高目标追踪的Robustness。我们首先介绍一个具有神经网络增强的讯息传递方法，其中信念由统一讯息传递获取，然后透过神经网络将信念处理，以改善原始信念。然后，我们提出一个类别帮助强化多目标追踪算法，这个算法包括三个模组：讯息传递模组、神经网络模组和德мп斯特-沙佛模组。讯息传递模组用于表示统计模型，使用因子 граhp 表示目标运动状态、可见状态和数据汇合，基于空间测量信息。神经网络模组用于从射频 Doppler спектrum 提取特征，得出 measurement 是否为目标生成或噪音生成的信念。德мп斯特-沙佛模组用于联合这两个模组所得到的信念。因此，我们的提出的方法采用模型和数据驱动的框架，实现强化杂环排除和数据汇合，导致多目标追踪性能的明显提高。我们验证了方法的有效性通过实验和实际数据enario，证明它在实际射频应用中可以应对具有具有挑战性的追踪enario。

GPT-4 Doesn’t Know It’s Wrong: An Analysis of Iterative Prompting for Reasoning Problems

paper_url: http://arxiv.org/abs/2310.12397
repo_url: None
paper_authors: Kaya Stechly, Matthew Marquez, Subbarao Kambhampati
for: 这 paper investigate LLMs 的自我批判能力在 Graph Coloring 问题上。
methods: authors 使用 GPT4 和 external correct reasoner 对 Graph Coloring 实例进行解决和验证。
results: study 显示 LLMs 不good at solving Graph Coloring 实例，并且不能够验证自己生成的解决方案。correctness 和内容 of the criticisms 对 iterative prompting 的性能没有影响。

Abstract
There has been considerable divergence of opinion on the reasoning abilities of Large Language Models (LLMs). While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, a wide spread belief in their iterative self-critique capabilities persists. In this paper, we set out to systematically investigate the effectiveness of iterative prompting of LLMs in the context of Graph Coloring, a canonical NP-complete reasoning problem that is related to propositional satisfiability as well as practical problems like scheduling and allocation. We present a principled empirical study of the performance of GPT4 in solving graph coloring instances or verifying the correctness of candidate colorings. In iterative modes, we experiment with the model critiquing its own answers and an external correct reasoner verifying proposed solutions. In both cases, we analyze whether the content of the criticisms actually affects bottom line performance. The study seems to indicate that (i) LLMs are bad at solving graph coloring instances (ii) they are no better at verifying a solution--and thus are not effective in iterative modes with LLMs critiquing LLM-generated solutions (iii) the correctness and content of the criticisms--whether by LLMs or external solvers--seems largely irrelevant to the performance of iterative prompting. We show that the observed increase in effectiveness is largely due to the correct solution being fortuitously present in the top-k completions of the prompt (and being recognized as such by an external verifier). Our results thus call into question claims about the self-critiquing capabilities of state of the art LLMs.

摘要
LLMs 的思维能力受到了不同的观点的争议。尽管初始的乐观情绪，思维能力会自动出现随着规模的提高，已经受到了许多对例的推翻，但对 LLMs 的循环自我批判能力仍然广泛存在信任。在这篇论文中，我们系统地 investigate LLMs 在图色置问题上的效果，这是一个NP完备的问题，与 propositional 满足性和实际问题如调度和分配有关。我们使用 GPT4 来解决图色置问题或验证候选解的正确性。在循环模式下，我们实验室LMs 自我批判其自己的答案，以及一个外部正确的解释器验证提出的解决方案。在两种情况下，我们分析了批判的内容是否实际影响循环提问的表现。研究显示：1. LLMs 解决图色置问题不善。2. LLMs 不能验证解决方案的正确性，因此循环模式下LMs 自我批判LMs 生成的解决方案无效。3. 批判的内容和正确性无关，无论是由 LLMs 或外部解释器验证。4. 观察到的效果增加主要归因于prompt中的top-k完成中包含正确解决方案，并由外部验证器认可。我们的结果质疑了现代 LLMs 的自我批判能力的声称。