cs.AI - 2023-10-06

Copy Suppression: Comprehensively Understanding an Attention Head

  • paper_url: http://arxiv.org/abs/2310.04625
  • repo_url: https://github.com/callummcdougall/seri-mats-2023-streamlit-pages
  • paper_authors: Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
  • for: This work studies a single component of a language model: attention head 10.7 (L10H7) in GPT-2 Small and the one main role it plays across the entire training distribution.
  • methods: The study uses GPT-2 Small and characterizes the copy-suppression mechanism of L10H7 through weights-based analysis (a toy sketch of the mechanism follows this entry).
  • results: L10H7 suppresses naive copying behavior, which improves overall model calibration, and this copy suppression plays an important role in self-repair: when crucial model components are ablated, downstream parts of the network compensate for the ablation, and copy suppression is one key mechanism by which this happens.
    Abstract We present a single attention head in GPT-2 Small that has one main role across the entire training distribution. If components in earlier layers predict a certain token, and this token appears earlier in the context, the head suppresses it: we call this copy suppression. Attention Head 10.7 (L10H7) suppresses naive copying behavior which improves overall model calibration. This explains why multiple prior works studying certain narrow tasks found negative heads that systematically favored the wrong answer. We uncover the mechanism that the Negative Heads use for copy suppression with weights-based evidence and are able to explain 76.9% of the impact of L10H7 in GPT-2 Small. To the best of our knowledge, this is the most comprehensive description of the complete role of a component in a language model to date. One major effect of copy suppression is its role in self-repair. Self-repair refers to how ablating crucial model components results in downstream neural network parts compensating for this ablation. Copy suppression leads to self-repair: if an initial overconfident copier is ablated, then there is nothing to suppress. We show that self-repair is implemented by several mechanisms, one of which is copy suppression, which explains 39% of the behavior in a narrow task. Interactive visualisations of the copy suppression phenomena may be seen at our web app https://copy-suppression.streamlit.app/
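The suppression mechanism can be illustrated in miniature. The sketch below is a hedged toy, not the paper's weights-based analysis of GPT-2 Small; all names (`W_U`, `alpha`) and magnitudes are illustrative assumptions. The idea: if earlier components confidently predict a token that already appears in the context, the head writes the negative of that token's unembedding direction back into the residual stream, lowering its logit.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 16
W_U = rng.normal(size=(d, vocab))                 # toy unembedding matrix
context = [7, 3, 7, 12]                           # token ids seen earlier in the prompt
# earlier layers confidently predict token 7, which already occurred in context
resid = 2.0 * W_U[:, 7] + rng.normal(scale=0.1, size=d)

def copy_suppression(resid, context, alpha=1.0):
    """If the current top prediction matches a context token, suppress it."""
    pred = int(np.argmax(resid @ W_U))
    if pred in context:                           # the head "attends" to the earlier copy
        resid = resid - alpha * W_U[:, pred]      # and writes its negative direction
    return resid

before = (resid @ W_U)[7]
after = (copy_suppression(resid, context) @ W_U)[7]
print(f"logit of token 7: {before:.2f} -> {after:.2f}")  # suppressed
```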

Deconstructing Cooperation and Ostracism via Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04623
  • repo_url: None
  • paper_authors: Atsushi Ueshima, Shayegan Omidshafiei, Hirokazu Shirado
  • for: This paper examines the challenge of cooperation in biological systems, human societies, and multi-agent systems, and how strategic network rewiring can help overcome it.
  • methods: The authors run multi-agent reinforcement learning simulations of the iterated Prisoner's Dilemma, disentangling the causal dynamics between cooperation and network rewiring under different connection policies (a toy illustration of the incentives follows this entry).
  • results: Network rewiring facilitates mutual cooperation even when one agent always cooperates; ostracism (connecting to cooperators and disconnecting from defectors) is the key channel through which rewiring works, yet ostracism alone is not sufficient for cooperation to emerge. Instead, ostracism emerges from learned cooperation, and existing cooperation is subsequently reinforced by it.
    Abstract Cooperation is challenging in biological systems, human societies, and multi-agent systems in general. While a group can benefit when everyone cooperates, it is tempting for each agent to act selfishly instead. Prior human studies show that people can overcome such social dilemmas while choosing interaction partners, i.e., strategic network rewiring. However, little is known about how agents, including humans, can learn about cooperation from strategic rewiring and vice versa. Here, we perform multi-agent reinforcement learning simulations in which two agents play the Prisoner's Dilemma game iteratively. Each agent has two policies: one controls whether to cooperate or defect; the other controls whether to rewire connections with another agent. This setting enables us to disentangle complex causal dynamics between cooperation and network rewiring. We find that network rewiring facilitates mutual cooperation even when one agent always offers cooperation, which is vulnerable to free-riding. We then confirm that the network-rewiring effect is exerted through agents' learning of ostracism, that is, connecting to cooperators and disconnecting from defectors. However, we also find that ostracism alone is not sufficient to make cooperation emerge. Instead, ostracism emerges from the learning of cooperation, and existing cooperation is subsequently reinforced due to the presence of ostracism. Our findings provide insights into the conditions and mechanisms necessary for the emergence of cooperation with network rewiring.
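As a hedged illustration of why ostracism changes the incentives (a toy, not the paper's two-policy MARL setup): an agent with a fixed cooperation rate plays against an unconditional cooperator who may sever the link after being defected on, so each defection forfeits the next interaction. The reconnection rule and payoff values below are assumptions.

```python
import random

# standard Prisoner's Dilemma payoffs for the row player: T=5 > R=3 > P=1 > S=0
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def avg_payoff(coop_rate, partner_ostracizes, rounds=10_000):
    """Average payoff against an unconditional cooperator, with optional rewiring."""
    total, connected = 0.0, True
    for _ in range(rounds):
        if not connected:
            connected = True              # assume reconnection after one skipped round
            continue                      # ostracized: no interaction, no payoff
        action = "C" if random.random() < coop_rate else "D"
        total += PAYOFF[(action, "C")]
        if partner_ostracizes and action == "D":
            connected = False             # ostracism: defection severs the link
    return total / rounds

random.seed(0)
for p in (0.0, 0.5, 1.0):
    print(f"coop_rate={p}: no rewiring {avg_payoff(p, False):.2f}, "
          f"with ostracism {avg_payoff(p, True):.2f}")
# without rewiring, defection pays most; with ostracism, full cooperation does
```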

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

  • paper_url: http://arxiv.org/abs/2310.04621
  • repo_url: None
  • paper_authors: Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz
  • for: This work aims to advance on-device machine learning (ML), moving ML computation onto everyday personal devices to improve privacy, responsiveness, and the proliferation of new intelligent user experiences.
  • methods: An interview study with 30 experts at Apple who specialize in producing efficient models, compiling the tacit knowledge practitioners have developed through hands-on model compression experience across different hardware platforms.
  • results: The study surfaces pragmatic technical strategies and design considerations for creating efficient models, and distills design recommendations for tooling to ease the difficulty of on-device ML work.
    Abstract On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be drastically compressed to run efficiently on-device, a hurdle that requires deep, yet currently niche expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results from an interview study with 30 experts at Apple that specialize in producing efficient models. We compile tacit knowledge that experts have developed through practical experience with model compression across different hardware platforms. Our findings offer pragmatic considerations missing from prior work, covering the design process, trade-offs, and technical strategies that go into creating efficient models. Finally, we distill design recommendations for tooling to help ease the difficulty of this work and bring on-device ML into more widespread practice.

SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics

  • paper_url: http://arxiv.org/abs/2310.04617
  • repo_url: None
  • paper_authors: Alireza Rezazadeh, Athreyi Badithela, Karthik Desingh, Changhyun Choi
  • for: This paper presents unsupervised techniques for learning multi-object representations and dynamics from visual data.
  • methods: Two new architectures: SlotTransport, a slot-attention-based unsupervised object discovery model that uses a feature transport mechanism to maintain temporal alignment in object-centric representations, and SlotGNN, an unsupervised graph-based dynamics model that predicts the future state of multi-object scenes from the discovered slots and robot actions (a minimal slot-attention sketch follows this entry).
  • results: SlotTransport learns object-centric features that accurately encode both visual and positional information, while SlotGNN accurately predicts slots and their dynamics in robotic tasks such as multi-object rearrangement and long-horizon prediction; with only minimal additional data, the unsupervised approach also holds up in real-world control tasks.
    Abstract Learning multi-object dynamics from visual data using unsupervised techniques is challenging due to the need for robust object representations that can be learned through robot interactions. This paper presents a novel framework with two new architectures: SlotTransport for discovering object representations from RGB images and SlotGNN for predicting their collective dynamics from RGB images and robot interactions. Our SlotTransport architecture is based on slot attention for unsupervised object discovery and uses a feature transport mechanism to maintain temporal alignment in object-centric representations. This enables the discovery of slots that consistently reflect the composition of multi-object scenes. These slots robustly bind to distinct objects, even under heavy occlusion or absence. Our SlotGNN, a novel unsupervised graph-based dynamics model, predicts the future state of multi-object scenes. SlotGNN learns a graph representation of the scene using the discovered slots from SlotTransport and performs relational and spatial reasoning to predict the future appearance of each slot conditioned on robot actions. We demonstrate the effectiveness of SlotTransport in learning object-centric features that accurately encode both visual and positional information. Further, we highlight the accuracy of SlotGNN in downstream robotic tasks, including challenging multi-object rearrangement and long-horizon prediction. Finally, our unsupervised approach proves effective in the real world. With only minimal additional data, our framework robustly predicts slots and their corresponding dynamics in real-world control tasks.
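SlotTransport builds on slot attention. For reference, here is a minimal slot-attention module in the spirit of Locatello et al. (2020), simplified with fixed initial slots and no residual MLP; everything specific to SlotTransport (the feature transport mechanism, temporal alignment) is omitted, so this is only a hedged starting point.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot attention: slots compete over input features via iterated attention."""
    def __init__(self, num_slots=5, dim=64, iters=3):
        super().__init__()
        self.iters, self.scale = iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, num_slots, dim))
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in, self.norm_slots = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, inputs):                        # inputs: (B, N, dim) image features
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu.expand(inputs.size(0), -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)  # softmax over slots
            attn = attn / attn.sum(dim=-1, keepdim=True)                # weighted mean over inputs
            updates = attn @ v
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view_as(slots)
        return slots

feats = torch.randn(2, 16 * 16, 64)                   # e.g., a flattened CNN feature map
print(SlotAttention()(feats).shape)                   # torch.Size([2, 5, 64])
```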

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

  • paper_url: http://arxiv.org/abs/2310.04610
  • repo_url: None
  • paper_authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens
  • for: This work explores how deep learning can be applied across the natural sciences to drive scientific exploration and discovery.
  • methods: The DeepSpeed4Science initiative, which leverages AI system technology innovations built on DeepSpeed's technology pillars (training, inference, and compression) to help domain experts unlock today's biggest science mysteries.
  • results: Early progress is demonstrated by addressing two critical system challenges in structural biology research.
    Abstract In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators

  • paper_url: http://arxiv.org/abs/2310.04607
  • repo_url: None
  • paper_authors: Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka
  • for: This study examines large language models (LLMs) used to accelerate scientific applications, characterizing their performance on different AI accelerator hardware systems.
  • methods: Comparative benchmarking across multiple AI accelerators and GPUs using (i) a micro-benchmark built on a core transformer block, (ii) a GPT-2 model, and (iii) GenSLM, an LLM-driven science use case (a sketch of such a micro-benchmark follows this entry).
  • results: The study characterizes how the accelerators handle LLM workloads, covering the effects of sequence length, scaling behavior, sparsity, and sensitivity to gradient-accumulation steps.
    Abstract Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications is contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, sparsity, and sensitivity to gradient accumulation steps.
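A core-transformer-block micro-benchmark of the kind described can be sketched as follows; this is an illustrative timing harness, not the paper's benchmark, and the layer sizes, batch size, and iteration counts are assumptions.

```python
import time
import torch
import torch.nn as nn

def bench_transformer_block(d_model=768, n_heads=12, seq_lens=(128, 512, 1024), iters=20):
    """Time one encoder-block forward pass at several sequence lengths."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True).to(device).eval()
    for seq in seq_lens:
        x = torch.randn(8, seq, d_model, device=device)
        with torch.no_grad():
            block(x)                                  # warm-up
            if device == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            for _ in range(iters):
                block(x)
            if device == "cuda":
                torch.cuda.synchronize()
        print(f"seq={seq}: {(time.perf_counter() - t0) / iters * 1e3:.2f} ms/iter")

bench_transformer_block()
```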

A neuro-symbolic framework for answering conjunctive queries

  • paper_url: http://arxiv.org/abs/2310.04598
  • repo_url: None
  • paper_authors: Pablo Barceló, Tamara Cucumides, Floris Geerts, Juan Reutter, Miguel Romero
  • for: answering arbitrary conjunctive queries over incomplete knowledge graphs
  • methods: approximating cyclic queries with an infinite family of tree-like queries, leveraging existing neuro-symbolic models
  • results: strong guarantees of completeness and optimality; competitive empirical results, with performance generally improving when queries with existentially quantified variables are included.
    Abstract The problem of answering logical queries over incomplete knowledge graphs is receiving significant attention in the machine learning community. Neuro-symbolic models are a promising recent approach, showing good performance and allowing for good interpretability properties. These models rely on trained architectures to execute atomic queries, combining them with modules that simulate the symbolic operators in queries. Unfortunately, most neuro-symbolic query processors are limited to the so-called tree-like logical queries that admit a bottom-up execution, where the leaves are constant values or anchors, and the root is the target variable. Tree-like queries, while expressive, fall short of expressing properties in knowledge graphs that are important in practice, such as the existence of multiple edges between entities or the presence of triangles. We propose a framework for answering arbitrary conjunctive queries over incomplete knowledge graphs. The main idea of our method is to approximate a cyclic query by an infinite family of tree-like queries, and then leverage existing models for the latter. Our approximations achieve strong guarantees: they are complete, i.e. there are no false negatives, and optimal, i.e. they provide the best possible approximation using tree-like queries. Our method requires the approximations to be tree-like queries where the leaves are anchors or existentially quantified variables. Hence, we also show how some of the existing neuro-symbolic models can handle these queries, which is of independent interest. Experiments show that our approximation strategy achieves competitive results, and that including queries with existentially quantified variables tends to improve the general performance of these models, both on tree-like queries and on our approximation strategy.

Segmented Harmonic Loss: Handling Class-Imbalanced Multi-Label Clinical Data for Medical Coding with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04595
  • repo_url: None
  • paper_authors: Surjya Ray, Pratik Mehta, Hongen Zhang, Ada Chaman, Jian Wang, Chung-Jen Ho, Michael Chiou, Tashfeen Suleman
  • for: This paper gauges how well large language models (LLMs) serve the healthcare domain, evaluating them on medical coding over real-life noisy data.
  • methods: Encoder-based LLMs such as BERT are trained on the MIMIC III and IV datasets with a new loss function, the Segmented Harmonic Loss, which uses a new segmentation algorithm to segment and decouple co-occurring classes in extremely imbalanced multi-label data; an embedding-similarity technique handles noisy data (a loosely inspired sketch follows this entry).
  • results: When trained with the proposed loss, the LLMs achieve significant gains on noisy long-tailed data, beating the state-of-the-art F1 score by over ten percentage points.
    Abstract The precipitous rise and adoption of Large Language Models (LLMs) have shattered expectations with the fastest adoption rate of any consumer-facing technology in history. Healthcare, a field that traditionally uses NLP techniques, was bound to be affected by this meteoric rise. In this paper, we gauge the extent of the impact by evaluating the performance of LLMs for the task of medical coding on real-life noisy data. We conducted several experiments on MIMIC III and IV datasets with encoder-based LLMs, such as BERT. Furthermore, we developed Segmented Harmonic Loss, a new loss function to address the extreme class imbalance that we found to prevail in most medical data in a multi-label scenario by segmenting and decoupling co-occurring classes of the dataset with a new segmentation algorithm. We also devised a technique based on embedding similarity to tackle noisy data. Our experimental results show that when trained with the proposed loss, the LLMs achieve significant performance gains even on noisy long-tailed datasets, outperforming the F1 score of the state-of-the-art by over ten percentage points.
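The abstract does not give the exact formulation of the Segmented Harmonic Loss, so the following is only a loosely inspired, hypothetical sketch: it partitions labels into frequency segments and weights a per-label binary cross-entropy by inverse frequency within each segment. The segmentation-by-rank rule and the weighting scheme are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def segmented_weighted_bce(logits, targets, label_freq, n_segments=3):
    """Hypothetical sketch, NOT the paper's Segmented Harmonic Loss (exact form not in the abstract)."""
    order = torch.argsort(label_freq, descending=True)   # rank labels head -> tail
    seg_size = -(-len(order) // n_segments)              # ceil division
    weights = torch.ones(len(order))
    for s in range(n_segments):
        idx = order[s * seg_size:(s + 1) * seg_size]
        w = 1.0 / label_freq[idx].float()                # rarer labels get larger weight
        weights[idx] = w / w.mean()                      # normalize within the segment
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()
label_freq = torch.tensor([900, 500, 120, 40, 9, 3])     # long-tailed label counts
print(segmented_weighted_bce(logits, targets, label_freq))
```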

Can pruning make Large Language Models more efficient?

  • paper_url: http://arxiv.org/abs/2310.04573
  • repo_url: None
  • paper_authors: Sia Gholami, Marwan Omar
  • for: This paper investigates weight pruning for Transformer architectures, a strategic reduction of model parameters based on their significance, to improve computational efficiency, environmental impact, and deployability on resource-limited platforms.
  • methods: Extensive experimentation across various pruning methodologies, evaluating their impact on model performance, size, and computational demands (a magnitude-pruning sketch follows this entry).
  • results: With judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance; coupled with post-pruning fine-tuning, some pruned models even exhibit enhanced generalization.
    Abstract Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning-a strategic reduction of model parameters based on their significance-as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled with post-pruning fine-tuning strategies, some pruned models even exhibit enhanced generalization capabilities. This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.
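As a concrete reference point for one of the simpler methodologies in this space, here is a hedged sketch of global magnitude pruning on a toy Transformer layer using PyTorch's pruning utilities; the 50% sparsity level is an arbitrary illustrative choice, not a recommendation from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.5)

total = zeros = 0
for module, name in params:
    w = getattr(module, name)                    # masked weight after pruning
    zeros += int((w == 0).sum())
    total += w.numel()
print(f"sparsity across Linear weights: {zeros / total:.1%}")
```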

Knolling bot: A Transformer-based Approach to Organizing a Messy Table

  • paper_url: http://arxiv.org/abs/2310.04566
  • repo_url: None
  • paper_authors: Yuhang Hu, Zhizhuo Zhang, Ruibo Liu, Philippe Wyder, Hod Lipson
  • for: equip domestic robots with the ability to perform simple household tidying tasks
  • methods: transformer-based approach that predicts the next position of an item in a sequence of neatly positioned items, integrated with a visual perception model and a physical robot arm
  • results: a machine that declutters and organizes a dozen freeform items of various shapes and sizes
    Abstract In this study, we propose an approach to equip domestic robots with the ability to perform simple household tidying tasks. We focus specifically on 'knolling,' an activity related to organizing scattered items into neat and space-efficient arrangements. Unlike the uniformity of industrial environments, household settings present unique challenges due to their diverse array of items and the subjectivity of tidiness. Here, we draw inspiration from natural language processing (NLP) and utilize a transformer-based approach that predicts the next position of an item in a sequence of neatly positioned items. We integrate the knolling model with a visual perception model and a physical robot arm to demonstrate a machine that declutters and organizes a dozen freeform items of various shapes and sizes.

Binary Quantification and Dataset Shift: An Experimental Investigation

  • paper_url: http://arxiv.org/abs/2310.04565
  • repo_url: https://github.com/pglez82/quant_datasetshift
  • paper_authors: Pablo González, Alejandro Moreo, Fabrizio Sebastiani
  • for: This study investigates how current quantification methods behave under different types of dataset shift, to identify limitations of existing approaches and pave the way for more broadly applicable methods.
  • methods: A fine-grained taxonomy of dataset-shift types, protocols for generating datasets affected by each type, and tests of existing quantification methods on the generated datasets (a sketch of one classic method follows this entry).
  • results: Many quantification methods previously found robust to prior probability shift are not necessarily robust to other types of dataset shift, and no existing method proves robust to all the shift types simulated in the experiments.
    Abstract Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift, and by testing existing quantification methods on the datasets thus generated. One finding that results from this investigation is that many existing quantification methods that had been found robust to prior probability shift are not necessarily robust to other types of dataset shift. A second finding is that no existing quantification method seems to be robust enough to dealing with all the types of dataset shift we simulate in our experiments. The code needed to reproduce all our experiments is publicly available at https://github.com/pglez82/quant_datasetshift.
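For context, here is a hedged sketch of one classic quantification method of the kind such studies stress-test: Adjusted Classify & Count (ACC), which corrects the raw predicted positive rate using the classifier's true- and false-positive rates and is known to behave well under prior probability shift. The synthetic data and in-sample TPR/FPR estimation below are simplifications for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def adjusted_classify_and_count(clf, X, tpr, fpr):
    """ACC quantifier: correct the raw positive rate with the classifier's TPR/FPR."""
    raw = clf.predict(X).mean()                      # plain Classify & Count
    return float(np.clip((raw - fpr) / (tpr - fpr), 0.0, 1.0))

rng = np.random.default_rng(0)

def sample(n, prevalence):                           # toy binary data with tunable prior
    y = (rng.random(n) < prevalence).astype(int)
    X = rng.normal(loc=1.5 * y[:, None], size=(n, 2))
    return X, y

X_tr, y_tr = sample(2000, 0.5)
clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_tr)                             # TPR/FPR estimated in-sample for brevity
tpr, fpr = pred[y_tr == 1].mean(), pred[y_tr == 0].mean()

X_te, y_te = sample(2000, 0.15)                      # prior probability shift at test time
print("true prevalence :", y_te.mean())
print("ACC estimate    :", adjusted_classify_and_count(clf, X_te, tpr, fpr))
```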

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04564
  • repo_url: None
  • paper_authors: Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
  • for: This work makes the case for reinstating the ReLU activation function in large language models (LLMs) to improve efficiency and cut inference cost on resource-constrained devices.
  • methods: The study compares ReLU against alternatives such as GELU and SiLU, examines training with ReLU activations, and explores the activation sparsity patterns of ReLU-based LLMs, including the reutilization of activated neurons when generating new tokens (a sparsity-measurement sketch follows this entry).
  • results: Using ReLU has a negligible impact on convergence and performance while significantly reducing computation and weight transfer; the proposed practical strategies cut LLM inference computation by up to three times with minimal performance trade-offs.
    Abstract Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstating ReLU activation in LLMs. We demonstrate that using the ReLU activation function has a negligible impact on convergence and performance while significantly reducing computation and weight transfer. This reduction is particularly valuable during the memory-bound inference step, where efficiency is paramount. Exploring sparsity patterns in ReLU-based LLMs, we unveil the reutilization of activated neurons for generating new tokens and leveraging these insights, we propose practical strategies to substantially reduce LLM inference computation up to three times, using ReLU activations with minimal performance trade-offs.
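The efficiency argument rests on ReLU producing exact zeros that inference kernels can skip, while smooth activations like GELU and SiLU do not. A hedged measurement sketch (toy MLP layer with random inputs; the sizes are assumptions):

```python
import torch
import torch.nn as nn

def zero_fraction(act_fn, d=1024, batch=32):
    """Fraction of exactly-zero activations after an MLP's first layer."""
    torch.manual_seed(0)
    h = act_fn(nn.Linear(d, 4 * d)(torch.randn(batch, d)))
    return (h == 0).float().mean().item()

for name, fn in [("ReLU", nn.ReLU()), ("GELU", nn.GELU()), ("SiLU", nn.SiLU())]:
    print(f"{name}: {zero_fraction(fn):.1%} exact zeros")
# ReLU yields roughly half exact zeros; GELU and SiLU yield essentially none
```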

Towards Foundation Models for Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2310.04562
  • repo_url: https://github.com/DeepGraphLearning/ULTRA
  • paper_authors: Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, Zhaocheng Zhu
  • for: This work takes a step toward foundation models for knowledge graph (KG) reasoning: models that, like foundation models in language and vision, can run inference on any input, despite KGs having entity and relation vocabularies that generally do not overlap.
  • methods: ULTRA, an approach for learning universal and transferable graph representations that builds relational representations as a function conditioned on their interactions, letting a single pre-trained model generalize inductively to any unseen KG with any relation vocabulary and be fine-tuned on any graph.
  • results: In link prediction experiments on 57 KGs, the zero-shot inductive performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par with or better than strong baselines trained on the specific graphs; fine-tuning boosts performance further.
    Abstract Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to the transferable representations such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational representations as a function conditioned on their interactions. Such a conditioning strategy allows a pre-trained ULTRA model to inductively generalize to any unseen KG with any relation vocabulary and to be fine-tuned on any graph. Conducting link prediction experiments on 57 different KGs, we find that the zero-shot inductive inference performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par or better than strong baselines trained on specific graphs. Fine-tuning further boosts the performance.

Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

  • paper_url: http://arxiv.org/abs/2310.04521
  • repo_url: None
  • paper_authors: Tzu-Yuan Lin, Minghan Zhu, Maani Ghaffari
  • for: This paper proposes an adjoint-equivariant neural network that takes Lie algebra data as input, processing inputs that are themselves transformations between vector spaces.
  • methods: Changes of basis on such transformations are described by conjugation, inducing the adjoint-equivariance relationship the model is designed to capture; leveraging the invariance of the Killing form, the network is a general framework for arbitrary semisimple Lie algebras, with a simple structure that can be viewed as a Lie-algebraic generalization of a multi-layer perceptron (MLP). A numerical check of the Killing-form invariance follows this entry.
  • results: The framework's value is showcased in homography modeling using the sl(3) Lie algebra.
    Abstract This paper proposes an adjoint-equivariant neural network that takes Lie algebra data as input. Various types of equivariant neural networks have been proposed in the literature, which treat the input data as elements in a vector space carrying certain types of transformations. In comparison, we aim to process inputs that are transformations between vector spaces. The change of basis on transformation is described by conjugations, inducing the adjoint-equivariance relationship that our model is designed to capture. Leveraging the invariance property of the Killing form, the proposed network is a general framework that works for arbitrary semisimple Lie algebras. Our network possesses a simple structure that can be viewed as a Lie algebraic generalization of a multi-layer perceptron (MLP). This work extends the application of equivariant feature learning. As an example, we showcase its value in homography modeling using sl(3) Lie algebra.
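The Killing-form invariance the network leverages is easy to verify numerically: for sl(n), $K(X, Y) = 2n\,\mathrm{tr}(XY)$, and it is unchanged under the adjoint action $\mathrm{Ad}_g X = g X g^{-1}$ for $g \in SL(n)$. A small independent check for sl(3); this verifies the underlying symmetry, not the paper's architecture.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 3

def traceless(a):
    """Project a matrix onto sl(n) by removing its trace."""
    return a - np.trace(a) / n * np.eye(n)

X = traceless(rng.normal(size=(n, n)))
Y = traceless(rng.normal(size=(n, n)))
g = expm(traceless(rng.normal(size=(n, n))))   # exp of sl(3) lands in SL(3)
g_inv = np.linalg.inv(g)

def killing(A, B):
    return 2 * n * np.trace(A @ B)             # Killing form on sl(n)

print(killing(X, Y))
print(killing(g @ X @ g_inv, g @ Y @ g_inv))   # equal up to floating-point error
```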

Utilizing Free Clients in Federated Learning for Focused Model Enhancement

  • paper_url: http://arxiv.org/abs/2310.04515
  • repo_url: None
  • paper_authors: Aditya Narayan Ravi, Ilan Shomorony
  • for: This work addresses a variant of federated learning (FL) called Prioritized FL, where the goal is to learn a weighted mean objective over a designated subset of priority clients, raising the question of how to choose and incentivize well-aligned non-priority clients while discarding misaligned ones.
  • methods: FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) uses a matching strategy that selects non-priority clients based on how similar the model's loss on their data is to its loss on the global data, so non-priority gradients are used only when beneficial to priority clients (a sketch of the matching rule follows this entry).
  • results: A convergence analysis quantifies the trade-off between client selection and speed of convergence; the algorithm shows faster convergence and higher test accuracy than baselines on various synthetic and benchmark datasets.
    Abstract Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data, without the need for clients to share their data. Many existing FL approaches assume that all clients have equal importance and construct a global objective based on all clients. We consider a version of FL we call Prioritized FL, where the goal is to learn a weighted mean objective of a subset of clients, designated as priority clients. An important question arises: How do we choose and incentivize well aligned non priority clients to participate in the federation, while discarding misaligned clients? We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge. The algorithm employs a matching strategy that chooses non priority clients based on how similar the models loss is on their data compared to the global data, thereby ensuring the use of non priority client gradients only when it is beneficial for priority clients. This approach ensures mutual benefits as non priority clients are motivated to join when the model performs satisfactorily on their data, and priority clients can utilize their updates and computational resources when their goals align. We present a convergence analysis that quantifies the trade off between client selection and speed of convergence. Our algorithm shows faster convergence and higher test accuracy than baselines for various synthetic and benchmark datasets.
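The matching strategy can be sketched as follows. This is a hedged approximation of the idea in the abstract: the relative-difference test, the `tol` threshold, and the helper names are illustrative assumptions, not the paper's exact criterion.

```python
import torch

def select_aligned_clients(model, loss_fn, priority_loss, client_loaders, tol=1.0):
    """Include a non-priority client only if the global model's loss on its data
    is close to the loss on priority-client data (illustrative threshold rule)."""
    selected = []
    model.eval()
    with torch.no_grad():
        for cid, batches in client_loaders.items():
            losses = [loss_fn(model(x), y).item() for x, y in batches]
            client_loss = sum(losses) / len(losses)
            if abs(client_loss - priority_loss) / priority_loss <= tol:
                selected.append(cid)   # this client's gradients look useful to priority clients
    return selected

# toy usage: one client whose data resembles the priority data, one that does not
torch.manual_seed(0)
model, loss_fn = torch.nn.Linear(4, 1), torch.nn.MSELoss()
make = lambda scale: [(torch.randn(8, 4), scale * torch.randn(8, 1)) for _ in range(3)]
loaders = {"aligned": make(1.0), "misaligned": make(10.0)}
print(select_aligned_clients(model, loss_fn, priority_loss=1.5, client_loaders=loaders))
```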

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

  • paper_url: http://arxiv.org/abs/2310.04413
  • repo_url: https://github.com/Improbable-AI/dw-offline-rl
  • paper_authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
  • for: This work proposes a new sampling strategy for offline policy learning to cope with datasets dominated by suboptimal trajectories.
  • methods: Rather than uniform sampling over all actions in the dataset, the strategy constrains the policy only to the "good data"; it is realized as a plug-and-play module for standard offline RL algorithms (a return-weighted sampling sketch follows this entry).
  • results: The method delivers significant performance gains on 72 imbalanced datasets, the D4RL benchmark, and across three different offline RL algorithms.
    Abstract Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
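The general idea of non-uniform, return-based sampling can be sketched as below; the softmax-over-returns weighting is one illustrative choice and not necessarily the weighting used in the paper (see the linked repo for the actual implementation).

```python
import numpy as np

def return_weighted_indices(trajectory_returns, lengths, temperature=1.0, n_samples=256):
    """Sample transition indices with probability increasing in trajectory return,
    so the learner is constrained toward 'good data' rather than the whole dataset."""
    returns = np.asarray(trajectory_returns, dtype=float)
    w = np.exp((returns - returns.max()) / temperature)    # stable softmax over trajectories
    w /= w.sum()
    per_step = np.repeat(w / np.asarray(lengths), lengths)  # spread each trajectory's mass over its steps
    per_step /= per_step.sum()
    rng = np.random.default_rng(0)
    return rng.choice(per_step.size, size=n_samples, p=per_step)

idx = return_weighted_indices([100.0, 5.0, 2.0], lengths=[50, 50, 50])
print(np.bincount(idx // 50, minlength=3))  # most samples come from the high-return trajectory
```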

Policy-Gradient Training of Language Models for Ranking

  • paper_url: http://arxiv.org/abs/2310.04407
  • repo_url: None
  • paper_authors: Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims
  • for: This paper aims to improve the training of text retrieval models for decision-making systems by introducing a novel training algorithm called Neural PG-RANK.
  • methods: Neural PG-RANK learns to rank by instantiating an LLM as a Plackett-Luce ranking policy trained end-to-end via policy gradient, a principled method that relies little on complex heuristics and unifies the training objective with downstream decision-making quality (a Plackett-Luce policy-gradient sketch follows this entry).
  • results: Extensive experiments on text retrieval benchmarks show remarkable in-domain performance improvements when the training objective aligns with the evaluation setup, along with substantial out-of-domain generalization to critical datasets used in downstream question answering.
    Abstract Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss itself is heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating a LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for end-to-end training of retrieval models as part of larger decision systems via policy gradient, with little reliance on complex heuristics, and it effectively unifies the training objective with downstream decision-making quality. We conduct extensive experiments on various text retrieval benchmarks. The results demonstrate that when the training objective aligns with the evaluation setup, Neural PG-RANK yields remarkable in-domain performance improvement, with substantial out-of-domain generalization to some critical datasets employed in downstream question answering tasks.
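The core ingredient, a Plackett-Luce ranking policy trained with policy gradient, can be sketched in a few lines. This is a hedged toy, not the paper's implementation: scalar scores stand in for LLM-derived relevance, a position-discounted utility stands in for downstream decision quality, and no baseline or variance reduction is used.

```python
import torch

def plackett_luce_log_prob(scores, ranking):
    """log P(ranking | scores) under Plackett-Luce: items are drawn without
    replacement, each with softmax probability over the remaining items."""
    s = scores[ranking]
    suffix_logsumexp = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return (s - suffix_logsumexp).sum()

torch.manual_seed(0)
scores = torch.zeros(4, requires_grad=True)            # stand-in for LLM relevance scores
opt = torch.optim.SGD([scores], lr=0.1)
utility = torch.tensor([1.0, 0.2, 0.0, 0.0])           # document 0 is the useful one

for _ in range(500):
    gumbel = -torch.log(-torch.log(torch.rand(4)))     # Gumbel trick samples from the PL policy
    ranking = torch.argsort(scores.detach() + gumbel, descending=True)
    reward = (utility[ranking] / torch.arange(1, 5)).sum()           # position-discounted utility
    loss = -reward.item() * plackett_luce_log_prob(scores, ranking)  # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()

print(scores.detach())   # document 0 should end up with the highest score
```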

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

  • paper_url: http://arxiv.org/abs/2310.04406
  • repo_url: https://github.com/andyz245/LanguageAgentTreeSearch
  • paper_authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
  • for: To improve the performance of large language models (LLMs) on decision-making tasks and broaden their deployment as autonomous agents.
  • methods: LATS (Language Agent Tree Search), a framework inspired by Monte Carlo tree search in model-based reinforcement learning, repurposes LLMs as agents, value functions, and optimizers; an environment supplies external feedback, giving a more deliberate and adaptive problem-solving mechanism than existing techniques.
  • results: Experiments across programming, HotPotQA, and WebShop demonstrate effectiveness and generality for both reasoning and acting; for example, LATS achieves 94.4% on HumanEval with GPT-4 and an average score of 75.9 on WebShop with GPT-3.5.
    Abstract While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4\% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.

Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

  • paper_url: http://arxiv.org/abs/2310.04395
  • repo_url: None
  • paper_authors: Marvin Schmitt, Daniel Habermann, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev
  • for: To improve the efficiency and accuracy of amortized Bayesian inference (ABI) by exploiting universal symmetries in the probabilistic joint model $p(\theta, y)$ of parameters $\theta$ and data $y$.
  • methods: Bayes' theorem is inverted to estimate the marginal likelihood from approximate representations of the joint model; under perfect approximation the marginal likelihood is constant across parameter values, so the undesirable variance of the estimates across different $\theta$ is formulated as a loss function that accelerates the learning dynamics of conditional neural density estimators (a sketch of this self-consistency loss follows this entry).
  • results: The method is applied to a bimodal toy problem with an explicit likelihood and to a realistic model with an implicit likelihood, with improved efficiency and accuracy observed.
    Abstract We propose a method to improve the efficiency and accuracy of amortized Bayesian inference (ABI) by leveraging universal symmetries in the probabilistic joint model $p(\theta, y)$ of parameters $\theta$ and data $y$. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, approximation error leads to undesirable variance in the marginal likelihood estimates across different parameter values. We formulate violations of this symmetry as a loss function to accelerate the learning dynamics of conditional neural density estimators. We apply our method to a bimodal toy problem with an explicit likelihood (likelihood-based) and a realistic model with an implicit likelihood (simulation-based).
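The self-consistency idea can be sketched directly: by Bayes' rule, $\log p(y) = \log p(y\mid\theta) + \log p(\theta) - \log p(\theta\mid y)$ for every $\theta$, so any spread of this quantity across parameter values signals posterior-approximation error. A hedged toy with a conjugate Gaussian model (the exact posterior gives near-zero loss, a miscalibrated one does not); the densities here are illustrative stand-ins for neural estimators.

```python
import math
import torch

def self_consistency_loss(log_lik, log_prior, log_posterior, y, thetas):
    """Variance of marginal-likelihood estimates across parameter values."""
    log_marg = torch.stack([log_lik(y, t) + log_prior(t) - log_posterior(t, y)
                            for t in thetas])
    return log_marg.var()

prior = torch.distributions.Normal(0.0, 1.0)

def log_lik(y, t): return torch.distributions.Normal(t, 1.0).log_prob(y)
def log_prior(t):  return prior.log_prob(t)
def exact_log_post(t, y):   # true posterior is N(y/2, sqrt(1/2))
    return torch.distributions.Normal(y / 2, math.sqrt(0.5)).log_prob(t)
def bad_log_post(t, y):     # a miscalibrated approximation
    return torch.distributions.Normal(y / 2, 1.0).log_prob(t)

y = torch.tensor(1.3)
thetas = torch.linspace(-2, 2, 9)
print(self_consistency_loss(log_lik, log_prior, exact_log_post, y, thetas))  # ~0
print(self_consistency_loss(log_lik, log_prior, bad_log_post, y, thetas))    # > 0
```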

FMM-Head: Enhancing Autoencoder-based ECG anomaly detection with prior knowledge

  • paper_url: http://arxiv.org/abs/2310.05848
  • repo_url: None
  • paper_authors: Giacomo Verardo, Magnus Boman, Samuel Bruchfeld, Marco Chiesa, Sabine Koch, Gerald Q. Maguire Jr., Dejan Kostic
  • for: Detecting anomalies in electrocardiogram (ECG) data is crucial for identifying deviations from normal heartbeat patterns and intervening in time for at-risk patients.
  • methods: Existing AutoEncoder (AE) models tackle the anomaly detection task but ignore the specific patterns of ECG leads; here the decoding part of the AE is replaced with a reconstruction head (FMM-Head) based on prior knowledge of the ECG shape, improving anomaly detection capability.
  • results: The model consistently outperforms state-of-the-art models, with up to a 0.31 increase in AUROC, as little as half the original model size, and explainable extracted features; processing time is four orders of magnitude lower than solving an optimization problem for the same parameters, making it suitable for real-time ECG parameter extraction and anomaly detection.
    Abstract Detecting anomalies in electrocardiogram data is crucial to identifying deviations from normal heartbeat patterns and providing timely intervention to at-risk patients. Various AutoEncoder models (AE) have been proposed to tackle the anomaly detection task with ML. However, these models do not consider the specific patterns of ECG leads and are unexplainable black boxes. In contrast, we replace the decoding part of the AE with a reconstruction head (namely, FMM-Head) based on prior knowledge of the ECG shape. Our model consistently achieves higher anomaly detection capabilities than state-of-the-art models, up to 0.31 increase in area under the ROC curve (AUROC), with as little as half the original model size and explainable extracted features. The processing time of our model is four orders of magnitude lower than solving an optimization problem to obtain the same parameters, thus making it suitable for real-time ECG parameters extraction and anomaly detection.

Hermes: Unlocking Security Analysis of Cellular Network Protocols by Synthesizing Finite State Machines from Natural Language Specifications

  • paper_url: http://arxiv.org/abs/2310.04381
  • repo_url: https://github.com/synsec-den/hermes-spec-to-fsm
  • paper_authors: Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ranjbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, Syed Rafiul Hussain
  • for: This paper presents Hermes, an end-to-end framework that automatically generates formal representations from natural-language cellular specifications.
  • methods: A neural constituency parser, NEUTREX, processes transition-relevant text and extracts transition components (states, conditions, and actions); a domain-specific language translates these components into logical formulas by leveraging dependency parse trees, and the formulas are compiled into transitions to build formal models as finite state machines.
  • results: Evaluated on the 4G NAS, 5G NAS, and 5G RRC specifications, Hermes attains 81-87% overall accuracy, a substantial improvement over the state of the art; security analysis of the extracted models uncovers 3 new vulnerabilities, identifies 19 previous attacks in the 4G and 5G specifications, and finds 7 deviations in commercial 4G basebands.
    Abstract In this paper, we present Hermes, an end-to-end framework to automatically generate formal representations from natural language cellular specifications. We first develop a neural constituency parser, NEUTREX, to process transition-relevant texts and extract transition components (i.e., states, conditions, and actions). We also design a domain-specific language to translate these transition components to logical formulas by leveraging dependency parse trees. Finally, we compile these logical formulas to generate transitions and create the formal model as finite state machines. To demonstrate the effectiveness of Hermes, we evaluate it on 4G NAS, 5G NAS, and 5G RRC specifications and obtain an overall accuracy of 81-87%, which is a substantial improvement over the state-of-the-art. Our security analysis of the extracted models uncovers 3 new vulnerabilities and identifies 19 previous attacks in 4G and 5G specifications, and 7 deviations in commercial 4G basebands.

Confronting Reward Model Overoptimization with Constrained RLHF

  • paper_url: http://arxiv.org/abs/2310.04373
  • repo_url: https://github.com/tedmoskovitz/constrainedrl4lms
  • paper_authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer
  • for: To confront reward-model overoptimization when reward is derived from a composition of simpler reward models (RMs), each capturing a different aspect of language quality, where appropriately weighting the component RMs is difficult.
  • methods: Constrained reinforcement learning prevents the agent from exceeding each RM's threshold of usefulness; dynamic weights on the component RMs are learned naturally as Lagrange multipliers, and an adaptive gradient-free method identifies and optimizes toward these threshold points during a single run (a Lagrange-multiplier sketch follows this entry).
  • results: In the first study of overoptimization in composite RMs, correlation between component RMs is shown to significantly affect where overoptimization sets in; the constrained approach keeps each RM within the range where it is an effective proxy, improving evaluation performance.
    Abstract Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to $\textit{overoptimization}$, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to solve this issue using constrained reinforcement learning as a means of preventing the agent from exceeding each RM's threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
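The constrained formulation can be sketched with dual ascent on the Lagrange multipliers. This is a hedged toy, not the paper's algorithm: the quadratic proxy rewards, thresholds, and step sizes are all illustrative assumptions.

```python
import torch

torch.manual_seed(0)
policy_param = torch.zeros(2, requires_grad=True)          # stand-in for policy parameters
lam = torch.zeros(2)                                       # one multiplier per component RM
thresholds = torch.tensor([1.0, 0.5])                      # proxy points where RMs stop helping
opt = torch.optim.SGD([policy_param], lr=0.1)

def component_rewards(p):
    # toy RMs: each grows with one coordinate, mimicking two quality aspects
    return torch.stack([p[0] - 0.1 * p[0] ** 2, p[1] - 0.1 * p[1] ** 2])

for _ in range(200):
    r = component_rewards(policy_param)
    # maximize total reward, but each lam_i pushes back once r_i exceeds its threshold
    lagrangian = r.sum() - (lam * (r - thresholds)).sum()
    opt.zero_grad(); (-lagrangian).backward(); opt.step()
    with torch.no_grad():                                  # dual ascent on the multipliers
        lam = torch.clamp(lam + 0.05 * (component_rewards(policy_param) - thresholds),
                          min=0.0)

print(component_rewards(policy_param).detach(), lam)  # rewards should settle near thresholds
```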

A Language-Agent Approach to Formal Theorem-Proving

  • paper_url: http://arxiv.org/abs/2310.04353
  • repo_url: https://github.com/trishullab/copra
  • paper_authors: Amitayush Thakur, Yeming Wen, Swarat Chaudhuri
  • for: To develop a language-agent approach to formal theorem-proving, using a large language model (LLM) capable of in-context learning to interact with an external environment.
  • methods: COPRA uses a high-capacity, black-box LLM (GPT-4) as part of a policy for a stateful backtracking search; the policy selects proof tactics and retrieves lemmas and definitions from an external database, each tactic is executed in the underlying proof framework, execution feedback builds the prompt for the next policy invocation, and the tracked search history reduces hallucinations and unnecessary LLM queries.
  • results: On the miniF2F benchmark for Lean and a set of Coq tasks from the Compcert project, COPRA finds correct proofs faster than one-shot GPT-4 invocations and than state-of-the-art models fine-tuned on proof data.
    Abstract Language agents, which use a large language model (LLM) capable of in-context learning to interact with an external environment, have recently emerged as a promising approach to control tasks. We present the first language-agent approach to formal theorem-proving. Our method, COPRA, uses a high-capacity, black-box LLM (GPT-4) as part of a policy for a stateful backtracking search. During the search, the policy can select proof tactics and retrieve lemmas and definitions from an external database. Each selected tactic is executed in the underlying proof framework, and the execution feedback is used to build the prompt for the next policy invocation. The search also tracks selected information from its history and uses it to reduce hallucinations and unnecessary LLM queries. We evaluate COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the Compcert project. On these benchmarks, COPRA is significantly better than one-shot invocations of GPT-4, as well as state-of-the-art models fine-tuned on proof data, at finding correct proofs quickly.

Neur2RO: Neural Two-Stage Robust Optimization

  • paper_url: http://arxiv.org/abs/2310.04345
  • repo_url: https://github.com/khalil-research/neur2ro
  • paper_authors: Justin Dumouchelle, Esther Julien, Jannis Kurtz, Elias B. Khalil
  • for: This paper proposes Neur2RO, an efficient machine-learning-driven algorithm for two-stage robust optimization (2RO), where decisions are made under worst-case uncertainty.
  • methods: A machine-learning instantiation of column-and-constraint generation (CCG): a novel neural network architecture, designed to be easy to optimize over, estimates the value function of the second-stage problem and is embedded into the classical CCG loop.
  • results: On two 2RO benchmarks, knapsack and capital budgeting, Neur2RO quickly finds high-quality solutions: within roughly 2% of best-known knapsack values in seconds versus three hours for the state-of-the-art exact branch-and-price algorithm (with even better solutions on larger, more complex instances), and a 5-to-10-fold reduction in solution time over three variants of the k-adaptability algorithm on capital budgeting, particularly on the largest instances.
    Abstract Robust optimization provides a mathematical framework for modeling and solving decision-making problems under worst-case uncertainty. This work addresses two-stage robust optimization (2RO) problems (also called adjustable robust optimization), wherein first-stage and second-stage decisions are made before and after uncertainty is realized, respectively. This results in a nested min-max-min optimization problem which is extremely challenging computationally, especially when the decisions are discrete. We propose Neur2RO, an efficient machine learning-driven instantiation of column-and-constraint generation (CCG), a classical iterative algorithm for 2RO. Specifically, we learn to estimate the value function of the second-stage problem via a novel neural network architecture that is easy to optimize over by design. Embedding our neural network into CCG yields high-quality solutions quickly as evidenced by experiments on two 2RO benchmarks, knapsack and capital budgeting. For knapsack, Neur2RO finds solutions that are within roughly $2\%$ of the best-known values in a few seconds compared to the three hours of the state-of-the-art exact branch-and-price algorithm; for larger and more complex instances, Neur2RO finds even better solutions. For capital budgeting, Neur2RO outperforms three variants of the $k$-adaptability algorithm, particularly on the largest instances, with a 5 to 10-fold reduction in solution time. Our code and data are available at https://github.com/khalil-research/Neur2RO.

T-Rep: Representation Learning for Time Series using Time-Embeddings

  • paper_url: http://arxiv.org/abs/2310.04486
  • repo_url: None
  • paper_authors: Archibald Fraikin, Adrien Bennetot, Stéphanie Allassonnière
  • for: Addressing the challenges multivariate time series pose for standard machine learning techniques: they are often unlabeled, high-dimensional, noisy, and contain missing data.
  • methods: T-Rep, a self-supervised method that learns time-series representations at a timestep granularity; it learns vector embeddings of time alongside its feature extractor to capture temporal features such as trend, periodicity, and distribution shifts, and leverages these time-embeddings in pretext tasks to incorporate smooth, fine-grained temporal dependencies and reinforce robustness to missing data.
  • results: T-Rep outperforms existing self-supervised time-series algorithms on downstream classification, forecasting, and anomaly detection, proves more resilient in missing-data regimes, and latent-space visualization experiments highlight the interpretability of the learned representations.
    Abstract Multivariate time series present challenges to standard machine learning techniques, as they are often unlabeled, high dimensional, noisy, and contain missing data. To address this, we propose T-Rep, a self-supervised method to learn time series representations at a timestep granularity. T-Rep learns vector embeddings of time alongside its feature extractor, to extract temporal features such as trend, periodicity, or distribution shifts from the signal. These time-embeddings are leveraged in pretext tasks, to incorporate smooth and fine-grained temporal dependencies in the representations, as well as reinforce robustness to missing data. We evaluate T-Rep on downstream classification, forecasting, and anomaly detection tasks. It is compared to existing self-supervised algorithms for time series, which it outperforms in all three tasks. We test T-Rep in missing data regimes, where it proves more resilient than its counterparts. Finally, we provide latent space visualisation experiments, highlighting the interpretability of the learned representations.
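A minimal sketch of the time-embedding idea, assuming a learned projection of normalized timesteps concatenated with the observed features (module names and dimensions here are hypothetical, not T-Rep's exact architecture):

```python
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Maps a scalar timestep to a vector embedding learned jointly
    with the feature extractor (a sketch, not T-Rep's exact module)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = nn.Linear(1, dim)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len) timesteps normalized to [0, 1]
        return torch.sin(self.proj(t.unsqueeze(-1)))  # (batch, seq_len, dim)

class TimestepEncoder(nn.Module):
    """Concatenates time-embeddings with observed features before encoding,
    so pretext tasks can exploit trend/periodicity information."""
    def __init__(self, n_features: int, t_dim: int = 16, h_dim: int = 64):
        super().__init__()
        self.time_emb = TimeEmbedding(t_dim)
        self.rnn = nn.GRU(n_features + t_dim, h_dim, batch_first=True)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        z, _ = self.rnn(torch.cat([x, self.time_emb(t)], dim=-1))
        return z  # per-timestep representations
```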

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

  • paper_url: http://arxiv.org/abs/2310.04323
  • repo_url: None
  • paper_authors: Yuxin Pan, Yize Chen, Fangzhen Lin
  • for: Designing effective policies for the online 3D bin packing problem (3D-BPP), a long-standing challenge due to the unpredictability of incoming box sequences and strict physical constraints.
  • methods: First introduces a permutation-based attacker to probe the practical robustness of existing DRL-based and heuristic methods for online 3D-BPP, then proposes an adjustable robust reinforcement learning (AR2L) framework that efficiently tunes robustness weights to strike the desired balance between average-case and worst-case performance.
  • results: Experiments show AR2L is versatile: it improves policy robustness while keeping nominal-case performance at an acceptable level.
    Abstract Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints. While current deep reinforcement learning (DRL) methods for online 3D-BPP have shown promising results in optimizing average performance over an underlying box sequence distribution, they often fail in real-world settings where some worst-case scenarios can materialize. Standard robust DRL algorithms tend to overly prioritize optimizing the worst-case performance at the expense of performance under normal problem instance distribution. To address these issues, we first introduce a permutation-based attacker to investigate the practical robustness of both DRL-based and heuristic methods proposed for solving online 3D-BPP. Then, we propose an adjustable robust reinforcement learning (AR2L) framework that allows efficient adjustment of robustness weights to achieve the desired balance of the policy's performance in average and worst-case environments. Specifically, we formulate the objective function as a weighted sum of expected and worst-case returns, and derive the lower performance bound by relating to the return under a mixture dynamics. To realize this lower bound, we adopt an iterative procedure that searches for the associated mixture dynamics and improves the corresponding policy. We integrate this procedure into two popular robust adversarial algorithms to develop the exact and approximate AR2L algorithms. Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance for the nominal case.
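The adjustable objective described in the abstract is a weighted sum of expected and worst-case returns; in generic notation (assumed for illustration):

```latex
J_\alpha(\pi) \;=\; \alpha \,\mathbb{E}_{P_0}\big[R(\pi)\big]
  \;+\; (1-\alpha)\,\min_{P \in \mathcal{P}} \mathbb{E}_{P}\big[R(\pi)\big],
\qquad \alpha \in [0,1]
```

Here $P_0$ is the nominal box-sequence distribution and $\mathcal{P}$ a set of adversarial (e.g., permutation-attacked) dynamics; $\alpha = 1$ recovers average-case DRL, $\alpha = 0$ standard robust DRL, and the paper's mixture-dynamics lower bound interpolates between the two.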

Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning

  • paper_url: http://arxiv.org/abs/2310.04306
  • repo_url: None
  • paper_authors: Qing Zhu, Qirong Mao, Jialin Zhang, Xiaohua Huang, Wenming Zheng
  • for: Proposes a method for more robust group-level emotion recognition (GER) in unconstrained environments.
  • methods: Uses uncertainty-aware learning (UAL): each individual's uncertainty is modeled explicitly with stochastic embeddings drawn from a Gaussian distribution, which capture the probabilities of different emotions and yield diverse predictions at inference; an image-enhancement module further improves robustness against severe noise.
  • results: Experiments demonstrate the method's effectiveness and generalization across three widely used databases.
    Abstract Group-level emotion recognition (GER) is an inseparable part of human behavior analysis, aiming to recognize an overall emotion in a multi-person scene. However, the existing methods are devoted to combining diverse emotion cues while ignoring the inherent uncertainties under unconstrained environments, such as congestion and occlusion occurring within a group. Additionally, since only group-level labels are available, inconsistent emotion predictions among individuals in one group can confuse the network. In this paper, we propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER. By explicitly modeling the uncertainty of each individual, we utilize stochastic embedding drawn from a Gaussian distribution instead of deterministic point embedding. This representation captures the probabilities of different emotions and generates diverse predictions through this stochasticity during the inference stage. Furthermore, uncertainty-sensitive scores are adaptively assigned as the fusion weights of individuals' faces within each group. Moreover, we develop an image enhancement module to enhance the model's robustness against severe noise. The overall three-branch model, encompassing face, object, and scene component, is guided by a proportional-weighted fusion strategy and integrates the proposed uncertainty-aware method to produce the final group-level output. Experimental results demonstrate the effectiveness and generalization ability of our method across three widely used databases.
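A minimal sketch of the stochastic-embedding idea, using the standard reparameterization trick and an uncertainty-derived fusion weight (the exact weighting function and names are assumptions):

```python
import torch
import torch.nn as nn

class StochasticFaceEmbedding(nn.Module):
    """Predicts a Gaussian (mu, sigma) per face instead of a point
    embedding; sampling at inference yields diverse emotion predictions."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, emb_dim)
        self.log_var = nn.Linear(in_dim, emb_dim)

    def forward(self, feats: torch.Tensor):
        mu, log_var = self.mu(feats), self.log_var(feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterize
        # Lower variance -> higher confidence -> larger fusion weight.
        weight = torch.exp(-log_var.mean(dim=-1))
        return z, weight

def fuse_group(z: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Uncertainty-sensitive fusion of N face embeddings in one group."""
    w = w / w.sum()                      # z: (N, emb_dim), w: (N,)
    return (w.unsqueeze(-1) * z).sum(dim=0)
```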

Coding by Design: GPT-4 empowers Agile Model Driven Development

  • paper_url: http://arxiv.org/abs/2310.04304
  • repo_url: None
  • paper_authors: Ahmed R. Sadik, Sebastian Brulin, Markus Olhofer
  • for: Proposes an Agile Model-Driven Development (MDD) approach built on OpenAI's GPT-4 to resolve the ambiguity that natural language introduces when auto-generating code.
  • methods: The first two layers represent the case study with Unified Modeling Language (UML) diagrams; the second layer adds two sets of constraints to reduce model ambiguity, the Object Constraint Language (OCL) for code-construction details and the FIPA ontology for communication semantics and protocols; the final layer uses GPT-4 to auto-generate the code.
  • results: The generated behavior matches the expected UML sequence diagrams; structurally, ontology-constrained models produce more intricate code, but it remains manageable and low-risk to test and maintain.
    Abstract Generating code from a natural language using Large Language Models (LLMs) such as ChatGPT seems groundbreaking. Yet, with more extensive use, it's evident that this approach has its own limitations. The inherent ambiguity of natural language presents challenges for complex software designs. Accordingly, our research offers an Agile Model-Driven Development (MDD) approach that enhances code auto-generation using OpenAI's GPT-4. Our work emphasizes "Agility" as a significant contribution to the current MDD method, particularly when the model undergoes changes or needs deployment in a different programming language. Thus, we present a case study showcasing a multi-agent simulation system of an Unmanned Vehicle Fleet. In the first and second layer of our approach, we constructed a textual representation of the case study using Unified Modeling Language (UML) diagrams. In the next layer, we introduced two sets of constraints that minimize model ambiguity. The Object Constraint Language (OCL) is applied to fine-tune the code construction details, while the FIPA ontology is used to shape communication semantics and protocols. Ultimately, leveraging GPT-4, our last layer auto-generates code in both Java and Python. The Java code is deployed within the JADE framework, while the Python code is deployed in the PADE framework. Concluding our research, we engaged in a comprehensive evaluation of the generated code. From a behavioural standpoint, the auto-generated code aligned perfectly with the expected UML sequence diagram. Structurally, we compared the complexity of code derived from UML diagrams constrained solely by OCL to that influenced by both OCL and the FIPA ontology. Results indicate that the ontology-constrained model produces inherently more intricate code, but it remains manageable and low-risk for further testing and maintenance.

Identifying Representations for Intervention Extrapolation

  • paper_url: http://arxiv.org/abs/2310.04295
  • repo_url: None
  • paper_authors: Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters
  • for: Aims to improve the current representation-learning paradigm in terms of generalizability and robustness.
  • methods: Combines intervention extrapolation with identifiable representation learning (Rep4Ex), enforcing a linear-invariance constraint that reflects the assumed linear effect of the action variables on the latent features; the constraint can be combined with any type of autoencoder.
  • results: Shows that identifiable representations allow predicting how interventions not observed at training time affect the outcome, even when the interventions affect the outcome non-linearly, with the theory validated on synthetic experiments.
    Abstract The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show using Wiener's Tauberian theorem that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions.
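The data-generating setup in the abstract can be summarized compactly (symbols as in the abstract; the additive forms are illustrative assumptions):

```latex
Z \;=\; M A + V, \qquad X \;=\; f(Z), \qquad Y \;=\; g(Z) + \varepsilon
```

Here $f$ is the non-linear mixing that produces the observed features, $M$ encodes the (assumed linear) effect of the action $A$ on the latent $Z$, and the residual $V$ has full support. Intervention extrapolation then means predicting $Y$ for values of $A$ outside the training support, after recovering $Z$ from $X$ up to an affine transformation.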

Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04288
  • repo_url: None
  • paper_authors: Kristina Miller, Christopher K. Zeitler, William Shen, Kerianne Hobbs, Sayan Mitra, John Schierman, Mahesh Viswanathan
  • for: Designing reliable runtime assurance systems (RTAs) that let an untrusted or experimental controller run while a backup safety controller guarantees safety.
  • methods: Formulates the optimal RTA design problem and solves it with reward shaping and reinforcement learning, which guarantees safety while leveraging machine learning technologies for scalability.
  • results: On 3D aircraft models with complex safety requirements, the approach guarantees safety while increasing utilization of the experimental controller over existing reachability- and simulation-based approaches.
    Abstract A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup (or safety) controller. The relevant computational design problem is to create a logic that assures safety by switching to the safety controller as needed, while maximizing some performance criteria, such as the utilization of the untrusted controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning. It can guarantee safety and leverage machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.
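At its core, an RTA wraps the untrusted controller in switching logic; a minimal sketch in which the safety check is a placeholder (the paper searches for this switching logic via reward shaping and RL rather than hand-coding it):

```python
def rta_step(state, untrusted_ctrl, safety_ctrl, is_recoverable):
    """One step of a runtime assurance wrapper.

    is_recoverable(state, action) stands in for a reachability-style test
    that the action keeps the system inside a set from which the safety
    controller can still guarantee safety (an assumed interface).
    """
    action = untrusted_ctrl(state)
    if is_recoverable(state, action):
        return action, "experimental"    # maximize untrusted-controller use
    return safety_ctrl(state), "safety"  # fall back to guarantee safety
```

The design problem is choosing the switching condition so the system spends as much time as possible in the "experimental" branch without ever violating safety, which is exactly the performance criterion the paper optimizes.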

Assessing Robustness via Score-Based Adversarial Image Generation

  • paper_url: http://arxiv.org/abs/2310.04285
  • repo_url: None
  • paper_authors: Marcel Kollovieh, Lukas Gosch, Yan Scholten, Marten Lienen, Stephan Günnemann
  • for: Examines the limits of adversarial robustness evaluation: attacks confined to small $\ell_p$-norm balls cannot capture all semantics-preserving perturbations.
  • methods: Proposes Score-Based Adversarial Generation (ScoreAG), a framework that leverages advances in score-based generative models to generate adversarial examples beyond $\ell_p$-norm constraints, so-called unrestricted adversarial examples, either by transforming existing images or synthesizing new ones; the generative capability is also used to purify images and strengthen classifiers.
  • results: Extensive experiments across multiple benchmarks show ScoreAG matches the performance of state-of-the-art attacks and defenses.
    Abstract Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantic-preserving perturbations, and hence, the scope of robustness evaluations is limited. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages the advancements in score-based generative models to generate adversarial examples beyond $\ell_p$-norm constraints, so-called unrestricted adversarial examples, overcoming their limitations. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating realistic adversarial examples, either by transforming existing images or synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that ScoreAG matches the performance of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than $\ell_p$-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments.
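A minimal sketch of how a score model can be steered adversarially, via annealed Langevin dynamics with an added attack gradient (both callables and this particular combination are assumptions for illustration, not ScoreAG's exact procedure):

```python
import torch

def adversarial_annealed_langevin(x, score_fn, attack_grad_fn, sigmas,
                                  eta=0.5, n_steps=10, eps=2e-5):
    """score_fn(x, sigma) estimates grad_x log p(x) at noise level sigma;
    attack_grad_fn(x) is a gradient nudging x toward a wrong class."""
    for sigma in sigmas:                       # anneal from high to low noise
        step = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(n_steps):
            noise = torch.randn_like(x) * (2 * step) ** 0.5
            # The score keeps x on the data manifold (semantics preserved);
            # the attack gradient steers it toward misclassification.
            x = x + step * (score_fn(x, sigma) + eta * attack_grad_fn(x)) + noise
    return x
```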

From task structures to world models: What do LLMs know?

  • paper_url: http://arxiv.org/abs/2310.04276
  • repo_url: None
  • paper_authors: Ilker Yildirim, L. A. Paul
  • for: Asks in what sense a large language model has knowledge, challenging assumptions about the nature of knowledge and intelligence.
  • methods: Grants LLMs "instrumental knowledge", knowledge defined by a certain set of abilities, and examines how it relates to the more ordinary "worldly" knowledge exhibited by human agents, in terms of how far instrumental knowledge can incorporate the structured world models of cognitive science.
  • results: Argues that LLMs could recover degrees of worldly knowledge, and that such recovery would be governed by an implicit, resource-rational tradeoff between world models and task demands.
    Abstract In what sense does a large language model have knowledge? The answer to this question extends beyond the capabilities of a particular AI system, and challenges our assumptions about the nature of knowledge and intelligence. We answer by granting LLMs "instrumental knowledge"; knowledge defined by a certain set of abilities. We then ask how such knowledge is related to the more ordinary, "worldly" knowledge exhibited by human agents, and explore this in terms of the degree to which instrumental knowledge can be said to incorporate the structured world models of cognitive science. We discuss ways LLMs could recover degrees of worldly knowledge, and suggest such recovery will be governed by an implicit, resource-rational tradeoff between world models and task demands.

A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

  • paper_url: http://arxiv.org/abs/2310.04270
  • repo_url: None
  • paper_authors: Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang
  • for: Evaluates the performance of large language models (LLMs) on benchmark biomedical tasks.
  • methods: Conducts a comprehensive evaluation of 4 popular LLMs on 6 diverse biomedical tasks across 26 datasets.
  • results: On biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models; no single LLM wins on every task, and performance still trails models fine-tuned on large training sets, but LLMs show promise for biomedical tasks that lack large annotated data.
    Abstract Recently, Large Language Models (LLM) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that not a single LLM can outperform other LLMs in all tasks, with the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.
    Interestingly, we find that in biomedical datasets with smaller training sets, zero-shot LLMs outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs specialized in the biomedical domain. We also find that no single LLM can outperform other LLMs in all tasks, and the performance of different LLMs varies depending on the task. While their performance is still poor compared to fine-tuned biomedical models, our findings show that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories

  • paper_url: http://arxiv.org/abs/2310.04266
  • repo_url: https://github.com/elharirymatteo/rans
  • paper_authors: Matteo El-Hariry, Antoine Richard, Vivek Muralidharan, Baris Can Yalcin, Matthieu Geist, Miguel Olivares-Mendez
  • for: Introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments.
  • methods: Uses state-of-the-art deep reinforcement learning techniques to train policies capable of precise maneuvers amid dynamic and unpredictable conditions.
  • results: Achieves robustness, adaptability, and good transferability from simulation to reality, and provides a comprehensive open-access platform for researchers on GitHub.
    Abstract This investigation introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments. Floating platforms serve as versatile test-beds to emulate microgravity environments on Earth. Our approach addresses the system and environmental uncertainties in controlling such platforms by training policies capable of precise maneuvers amid dynamic and unpredictable conditions. Leveraging state-of-the-art deep reinforcement learning techniques, our suite achieves robustness, adaptability, and good transferability from simulation to reality. Our Deep Reinforcement Learning (DRL) framework provides advantages such as fast training times, large-scale testing capabilities, rich visualization options, and ROS bindings for integration with real-world robotic systems. Beyond policy development, our suite provides a comprehensive platform for researchers, offering open-access at https://github.com/elharirymatteo/RANS/tree/ICRA24.

Ada-Instruct: Adapting Instruction Generators for Complex Reasoning

  • paper_url: http://arxiv.org/abs/2310.04484
  • repo_url: https://github.com/wangitu/ada-instruct
  • paper_authors: Wanyun Cui, Qianle Wang
  • for: Improving LLMs' ability to generate diverse and sophisticated instructions for downstream tasks involving complex reasoning.
  • methods: Fine-tunes open-source LLMs to build Ada-Instruct, an adaptive instruction generator; unlike in-context prompting, which cannot generate complex instructions of length ≥ 100 (e.g., for code completion), fine-tuning on a mere ten samples yields long instructions that maintain distributional consistency.
  • results: Empirically validated on code completion, mathematical reasoning, and commonsense reasoning; Ada-Instruct outperforms its base models, current self-instruct methods, and other state-of-the-art models.
    Abstract Generating diverse and sophisticated instructions for downstream tasks by Large Language Models (LLMs) is pivotal for advancing their effectiveness. Current approaches leverage closed-source LLMs, employing in-context prompting for instruction generation. However, in this paper, we found that in-context prompting cannot generate complex instructions with length $\ge 100$ for tasks like code completion. To solve this problem, we introduce Ada-Instruct, an adaptive instruction generator developed by fine-tuning open-source LLMs. Our pivotal finding illustrates that fine-tuning open-source LLMs with a mere ten samples generates long instructions that maintain distributional consistency for complex reasoning tasks. We empirically validated Ada-Instruct's efficacy across different applications, including code completion, mathematical reasoning, and commonsense reasoning. The results underscore Ada-Instruct's superiority, evidencing its improvements over its base models, current self-instruct methods, and other state-of-the-art models.

Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison

  • paper_url: http://arxiv.org/abs/2310.04241
  • repo_url: None
  • paper_authors: Moritz Lange, Noah Krystiniak, Raphael C. Engelhardt, Wolfgang Konen, Laurenz Wiskott
  • for: Improving the efficiency and reliability of RL in real-world environments with non-visual observations via representation learning with auxiliary tasks.
  • methods: Compares common auxiliary tasks on top of what is, to the authors' knowledge, the only decoupled representation-learning method for low-dimensional non-visual observations.
  • results: Across environments ranging from a simple pendulum to a complex simulated robotics task, auxiliary-task representation learning only pays off in sufficiently complex environments, and learning environment dynamics is preferable to predicting rewards; these insights can guide future interpretable representation-learning approaches and advance the use of RL in real-world scenarios.
    Abstract Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require not only efficient but also reliable and thus interpretable and flexible RL approaches. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied in visual observation contexts. However, for real-world problems, dedicated representation learning modules that are decoupled from RL agents are more suited to meet requirements. This study compares common auxiliary tasks based on, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks only provides performance gains in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.
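A minimal sketch of the dynamics-prediction auxiliary task that the study found preferable to reward prediction (architecture and loss form are assumptions):

```python
import torch
import torch.nn as nn

class DynamicsAux(nn.Module):
    """Trains an encoder by predicting the next latent state; the frozen
    representations are later consumed by a decoupled RL agent."""
    def __init__(self, obs_dim: int, act_dim: int, z_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, z_dim))
        self.forward_model = nn.Sequential(nn.Linear(z_dim + act_dim, 64),
                                           nn.ReLU(), nn.Linear(64, z_dim))

    def loss(self, obs, act, next_obs):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_pred = self.forward_model(torch.cat([z, act], dim=-1))
        return ((z_pred - z_next.detach()) ** 2).mean()  # dynamics loss
```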

The WayHome: Long-term Motion Prediction on Dynamically Scaled

  • paper_url: http://arxiv.org/abs/2310.04232
  • repo_url: None
  • paper_authors: Kay Scheerer, Thomas Michalke, Juergen Mathes
  • for: Develops a novel motion forecasting approach for autonomous vehicles, specifically to accurately predict the motion of other objects in the surrounding environment.
  • methods: The paper uses a neural network-based model to predict multiple heatmaps for every traffic participant in the vicinity of the autonomous vehicle, with one heatmap per timestep. The heatmaps are then used as input to a novel sampling algorithm that extracts coordinates corresponding to the most likely future positions.
  • results: The approach improves state-of-the-art miss rate performance for the function-relevant prediction interval of 3 seconds, while being competitive in longer prediction intervals (up to eight seconds). The evaluation is done on the public 2022 Waymo motion challenge.
    Abstract One of the key challenges for autonomous vehicles is the ability to accurately predict the motion of other objects in the surrounding environment, such as pedestrians or other vehicles. In this contribution, a novel motion forecasting approach for autonomous vehicles is developed, inspired by the work of Gilles et al. [1]. We predict multiple heatmaps with a neural-network-based model for every traffic participant in the vicinity of the autonomous vehicle, with one heatmap per timestep. The heatmaps are used as input to a novel sampling algorithm that extracts coordinates corresponding to the most likely future positions. We experiment with different encoders and decoders, as well as a comparison of two loss functions. Additionally, a new grid-scaling technique is introduced, showing further improved performance. Overall, our approach improves state-of-the-art miss rate performance for the function-relevant prediction interval of 3 seconds while being competitive in longer prediction intervals (up to eight seconds). The evaluation is done on the public 2022 Waymo motion challenge.
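A minimal sketch of extracting likely future positions from a predicted heatmap, using greedy peak picking with local suppression (the paper's sampling algorithm is more elaborate; this only illustrates the interface):

```python
import numpy as np

def sample_positions(heatmap: np.ndarray, k: int = 6, radius: int = 2):
    """Greedily pick k peak coordinates from an (H, W) heatmap,
    suppressing a small neighborhood around each pick."""
    hm = heatmap.astype(float)
    picks = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        picks.append((int(y), int(x)))
        y0, y1 = max(0, y - radius), min(hm.shape[0], y + radius + 1)
        x0, x1 = max(0, x - radius), min(hm.shape[1], x + radius + 1)
        hm[y0:y1, x0:x1] = -np.inf  # suppress the neighborhood
    return picks
```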

A Fixed-Parameter Tractable Algorithm for Counting Markov Equivalence Classes with the same Skeleton

  • paper_url: http://arxiv.org/abs/2310.04218
  • repo_url: None
  • paper_authors: Vidya Sagar Sharma
  • for: Studies causal DAGs (Bayesian networks), a popular tool for encoding conditional dependencies between random variables.
  • methods: Builds on combinatorial characterizations of Markov equivalence classes (MECs) and a construction called "shadow" that gives a local description of the long-range constraints those characterizations impose, yielding a fixed-parameter tractable algorithm.
  • results: Obtains an algorithm for counting the Markov equivalence classes with a given skeleton that runs in polynomial time when the treewidth and maximum degree of the input graph are bounded.
    Abstract Causal DAGs (also known as Bayesian networks) are a popular tool for encoding conditional dependencies between random variables. In a causal DAG, the random variables are modeled as vertices in the DAG, and it is stipulated that every random variable is independent of its ancestors conditioned on its parents. It is possible, however, for two different causal DAGs on the same set of random variables to encode exactly the same set of conditional dependencies. Such causal DAGs are said to be Markov equivalent, and equivalence classes of Markov equivalent DAGs are known as Markov Equivalent Classes (MECs). Beautiful combinatorial characterizations of MECs have been developed in the past few decades, and it is known, in particular that all DAGs in the same MEC must have the same ''skeleton'' (underlying undirected graph) and v-structures (induced subgraph of the form $a\rightarrow b \leftarrow c$). These combinatorial characterizations also suggest several natural algorithmic questions. One of these is: given an undirected graph $G$ as input, how many distinct Markov equivalence classes have the skeleton $G$? Much work has been devoted in the last few years to this and other closely related problems. However, to the best of our knowledge, a polynomial time algorithm for the problem remains unknown. In this paper, we make progress towards this goal by giving a fixed parameter tractable algorithm for the above problem, with the parameters being the treewidth and the maximum degree of the input graph $G$. The main technical ingredient in our work is a construction we refer to as shadow, which lets us create a "local description'' of long-range constraints imposed by the combinatorial characterizations of MECs.
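Two DAGs are Markov equivalent iff they share the same skeleton and the same v-structures, the characterization the paper builds on; a small checker illustrating that criterion (not the paper's counting algorithm):

```python
import networkx as nx

def v_structures(dag: nx.DiGraph):
    """Collect colliders a -> b <- c where a and c are non-adjacent."""
    vs = set()
    for b in dag.nodes:
        parents = list(dag.predecessors(b))
        for i, a in enumerate(parents):
            for c in parents[i + 1:]:
                if not (dag.has_edge(a, c) or dag.has_edge(c, a)):
                    vs.add((frozenset((a, c)), b))
    return vs

def markov_equivalent(d1: nx.DiGraph, d2: nx.DiGraph) -> bool:
    """Same skeleton and same v-structures."""
    skeleton = lambda d: {frozenset(e) for e in d.to_undirected().edges()}
    return (set(d1.nodes) == set(d2.nodes)
            and skeleton(d1) == skeleton(d2)
            and v_structures(d1) == v_structures(d2))
```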

Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface

  • paper_url: http://arxiv.org/abs/2310.04205
  • repo_url: https://github.com/amaze18/speeKAR
  • paper_authors: Anupam Purwar, Rahul Sundar
  • for: Improving the efficiency and cost-effectiveness of language-model-based knowledge retrieval systems, particularly for speech-based interfaces.
  • methods: Proposes a keyword-based search framework in which a smaller language model generates keywords and compares them with the query, reducing the time and cost of context identification; a larger language model then answers from a prompt tailored for Q&A.
  • results: Demonstrates that using keywords for context identification reduces the overall inference time and cost of information retrieval, making it more feasible to integrate speech-based interfaces with language-model-based systems.
    Abstract Retrieving answers in a quick and low-cost manner without hallucinations from a combination of structured and unstructured data using language models is a major hurdle. This is what prevents employment of language models in knowledge retrieval automation. This becomes accentuated when one wants to integrate a speech interface on top of a text-based knowledge retrieval system. Besides, for commercial search and chatbot applications, complete reliance on commercial large language models (LLMs) such as GPT-3.5 can be very costly. In the present study, the authors have addressed the aforementioned problem by first developing a keyword-based search framework which augments discovery of the context from the document to be provided to the LLM. The keywords in turn are generated by a relatively smaller LLM and cached for comparison with keywords generated by the same smaller LLM against the query raised. This significantly reduces time and cost to find the context within documents. Once the context is set, a larger LLM uses that to provide answers based on a prompt tailored for Q&A. This research work demonstrates that the use of keywords in context identification reduces the overall inference time and cost of information retrieval. Given this reduction in inference time and cost with the keyword-augmented retrieval framework, a speech-based interface for user input and response readout was integrated. This allowed a seamless interaction with the language model.
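A minimal sketch of the keyword-matching stage, with the smaller LLM abstracted behind a naive stand-in and the keyword cache a plain dict (names and scoring are assumptions; the actual speeKAR pipeline lives in the linked repo):

```python
def extract_keywords(text: str) -> set[str]:
    """Stand-in for the smaller LLM; a real system would prompt it here."""
    stop = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}
    return {w.lower().strip(".,?!") for w in text.split()
            if w.lower() not in stop}

def best_chunks(query: str, chunk_keywords: dict[str, set[str]], top_k: int = 3):
    """Rank document chunks by keyword overlap with the query.

    chunk_keywords maps chunk_id -> keywords precomputed (and cached) by
    the smaller LLM, so context identification avoids a full LLM pass.
    """
    q = extract_keywords(query)
    ranked = sorted(chunk_keywords.items(),
                    key=lambda kv: len(q & kv[1]), reverse=True)
    return [cid for cid, _ in ranked[:top_k]]
```

The selected chunks then become the context for the larger LLM's Q&A prompt.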

A Bi-objective Perspective on Controllable Language Models: Reward Dropout Improves Off-policy Control Performance

  • paper_url: http://arxiv.org/abs/2310.04483
  • repo_url: https://github.com/anonymous-user01/controllability-of-lm-anonymous
  • paper_authors: Changhun Lee, Chiehyeon Lim
  • for: Studies the theoretical aspects of controllable language models (CLMs) from a bi-objective optimization perspective, casting CLMs as an off-policy RL problem that must simultaneously maximize reward and likelihood objectives.
  • methods: Establishes a reward upper bound and Pareto improvement/optimality conditions, and analyzes the conditions that improve and violate Pareto optimality.
  • results: Proposes Reward Dropout, a simple yet powerful method that guarantees policy improvement based on a Pareto improvement condition; experiments on five CLM benchmark datasets show it significantly improves CLM performance.
    Abstract We study the theoretical aspects of CLMs (Controllable Language Models) from a bi-objective optimization perspective. Specifically, we consider the CLMs as an off-policy RL problem that requires simultaneously maximizing the reward and likelihood objectives. Our main contribution consists of three parts. First, we establish the theoretical foundations of CLM by presenting reward upper bound and Pareto improvement/optimality conditions. Second, we analyze conditions that improve and violate Pareto optimality itself, respectively. Finally, we propose Reward Dropout, a simple yet powerful method to guarantee policy improvement based on a Pareto improvement condition. Our theoretical outcomes are supported by not only deductive proofs but also empirical results. The performance of Reward Dropout was evaluated on five CLM benchmark datasets, and it turns out that the Reward Dropout significantly improves the performance of CLMs.
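The bi-objective trade-off behind the analysis can be pictured as follows (generic notation, an illustration rather than the paper's exact formulation): the policy must raise reward while staying likely under the base language model,

```latex
\max_{\pi} \;\Big(\, \mathbb{E}_{x \sim \pi}\big[r(x)\big],\;\;
  \mathbb{E}_{x \sim \pi}\big[\log p_{\mathrm{LM}}(x)\big] \,\Big)
```

and a Pareto improvement is an update that increases one objective without decreasing the other; Reward Dropout is designed so that its updates satisfy such a condition.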

EMOFM: Ensemble MLP mOdel with Feature-based Mixers for Click-Through Rate Prediction

  • paper_url: http://arxiv.org/abs/2310.04482
  • repo_url: None
  • paper_authors: Yujian Betterest Li, Kai Wu
  • for: Click-through rate (CTR) prediction.
  • methods: Uses network-based type-wise feature extraction and cross-field information fusion: simple plug-in mixers perform field- and type-wise feature fusion, yielding the ensemble model EMOFM (Ensemble MLP mOdel with Feature-based Mixers).
  • results: The proposed model outperforms the compared baselines; the optimization process is visualized and ablation studies are explored. Future work may take different types of interactions into account.
    Abstract Track one of the CTI competition is on click-through rate (CTR) prediction. The dataset contains millions of records, and each field-wise feature in a record consists of hashed integers for privacy. For this task, the keys of network-based methods might be type-wise feature extraction and information fusion across different fields. Multi-layer perceptrons (MLPs) are able to extract field features but could not efficiently fuse them. Motivated by the natural fusion characteristic of cross attention and the efficiency of transformer-based structures, we propose simple plug-in mixers for field/type-wise feature fusion, and thus construct a field&type-wise ensemble model, namely EMOFM (Ensemble MLP mOdel with Feature-based Mixers). In the experiments, the proposed model is evaluated on the dataset, the optimization process is visualized and ablation studies are explored. It is shown that EMOFM outperforms the compared baselines. In the end, we discuss future work. WARNING: The comparison might not be fair enough since the proposed method is designed for this data in particular while compared methods are not. For example, EMOFM especially takes different types of interactions into consideration while others do not. Anyway, we do hope that the ideas inside our method could help other developers/learners/researchers/thinkers and so on.
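A minimal sketch of field-wise embedding followed by a plug-in mixer for cross-field fusion (dimensions and the mixer form are assumptions; the paper's exact mixers may differ):

```python
import torch
import torch.nn as nn

class FieldMixerCTR(nn.Module):
    """Hashed categorical fields -> per-field embeddings -> linear mixer
    across fields -> CTR logit."""
    def __init__(self, n_fields: int, hash_size: int = 10**5, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(hash_size, dim)
        self.field_mixer = nn.Linear(n_fields, n_fields)  # fuses across fields
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(n_fields * dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, hashed: torch.Tensor) -> torch.Tensor:
        # hashed: (batch, n_fields) hashed integer features
        e = self.emb(hashed % self.emb.num_embeddings)          # (B, F, D)
        mixed = self.field_mixer(e.transpose(1, 2)).transpose(1, 2)
        return self.head(mixed).squeeze(-1)                     # CTR logit
```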

Conversational Financial Information Retrieval Model (ConFIRM)

  • paper_url: http://arxiv.org/abs/2310.13001
  • repo_url: None
  • paper_authors: Stephen Choi, William Gazeley, Siu Ho Wong, Tingting Li
  • for: Explores leveraging large language models (LLMs) in the finance domain, where regulation imposes unique constraints.
  • methods: Presents ConFIRM, an LLM-based conversational financial information retrieval model with two modules: a method to synthesize finance-domain question-answer pairs, and an evaluation of parameter-efficient fine-tuning approaches for the query-classification task.
  • results: On a dataset of over 4000 samples, ConFIRM achieves over 90% accuracy on a separate test set, essential for regulatory compliance, offering a data-efficient way to extract precise query intent for financial dialog systems.
    Abstract With the exponential growth in large language models (LLMs), leveraging their emergent properties for specialized domains like finance merits exploration. However, regulated fields such as finance pose unique constraints, requiring domain-optimized frameworks. We present ConFIRM, an LLM-based conversational financial information retrieval model tailored for query intent classification and knowledge base labeling. ConFIRM comprises two modules: 1) a method to synthesize finance domain-specific question-answer pairs, and 2) evaluation of parameter efficient fine-tuning approaches for the query classification task. We generate a dataset of over 4000 samples, assessing accuracy on a separate test set. ConFIRM achieved over 90% accuracy, essential for regulatory compliance. ConFIRM provides a data-efficient solution to extract precise query intent for financial dialog systems.

Introducing the Attribution Stability Indicator: a Measure for Time Series XAI Attributions

  • paper_url: http://arxiv.org/abs/2310.04178
  • repo_url: https://github.com/visual-xai-for-time-series/attribution-stability-indicator
  • paper_authors: Udo Schlegel, Daniel A. Keim
  • for: Addresses the growing need for interpretable insights from time-series models in domains such as finance, weather forecasting, and healthcare, where attribution techniques are hard to evaluate for robustness and trustworthiness.
  • methods: Extends a perturbation analysis with correlations between the original time series, the perturbed instance, and the attributions, folding the wanted properties into a single measure.
  • results: Proposes the Attribution Stability Indicator (ASI), which incorporates robustness and trustworthiness as properties of time-series attribution techniques; the wanted properties are demonstrated via an analysis of attributions in a dimension-reduced space and the distribution of ASI scores over three whole time-series classification datasets.
    Abstract Given the increasing amount and general complexity of time series data in domains such as finance, weather forecasting, and healthcare, there is a growing need for state-of-the-art performance models that can provide interpretable insights into underlying patterns and relationships. Attribution techniques enable the extraction of explanations from time series models to gain insights but are hard to evaluate for their robustness and trustworthiness. We propose the Attribution Stability Indicator (ASI), a measure to incorporate robustness and trustworthiness as properties of attribution techniques for time series into account. We extend a perturbation analysis with correlations of the original time series to the perturbed instance and the attributions to include wanted properties in the measure. We demonstrate the wanted properties based on an analysis of the attributions in a dimension-reduced space and the ASI scores distribution over three whole time series classification datasets.
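A minimal sketch of the perturbation-and-correlation idea behind the indicator (the actual ASI combines these terms in a specific way; the functions below are illustrative assumptions):

```python
import numpy as np

def attribution_stability(x, attribution_fn, perturb_fn):
    """Correlate (i) original vs. perturbed series and (ii) their
    attributions; a robust attribution keeps (ii) high when (i) is high."""
    x_pert = perturb_fn(x)
    a, a_pert = attribution_fn(x), attribution_fn(x_pert)
    corr_series = np.corrcoef(x.ravel(), x_pert.ravel())[0, 1]
    corr_attr = np.corrcoef(a.ravel(), a_pert.ravel())[0, 1]
    return corr_series, corr_attr

# Toy usage with stand-in attribution and perturbation functions:
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 6, 200))
corr_series, corr_attr = attribution_stability(
    x,
    attribution_fn=lambda s: np.gradient(s),                   # toy attribution
    perturb_fn=lambda s: s + 0.05 * rng.normal(size=s.shape),  # mild noise
)
```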

Dynamic Relation-Attentive Graph Neural Networks for Fraud Detection

  • paper_url: http://arxiv.org/abs/2310.04171
  • repo_url: https://github.com/bdi-lab/drag
  • paper_authors: Heehyeon Kim, Jinhyeok Choi, Joyce Jiyoung Whang
  • for: Detecting fraudsters who deceive other users, for example by leaving fake reviews or making abnormal transactions.
  • methods: Treats fraud detection as a two-class node classification problem and solves it with graph neural networks (GNNs) and a dynamic relation-attentive aggregation mechanism: a node representation is learned per relation and aggregated with a learnable attention function that assigns each relation its own coefficient, while representations from different layers are combined to capture both local and global structure.
  • results: On real-world benchmark datasets, the proposed method (DRAG) outperforms state-of-the-art fraud detection methods.
    Abstract Fraud detection aims to discover fraudsters deceiving other users by, for example, leaving fake reviews or making abnormal transactions. Graph-based fraud detection methods consider this task as a classification problem with two classes: frauds or normal. We address this problem using Graph Neural Networks (GNNs) by proposing a dynamic relation-attentive aggregation mechanism. Based on the observation that many real-world graphs include different types of relations, we propose to learn a node representation per relation and aggregate the node representations using a learnable attention function that assigns a different attention coefficient to each relation. Furthermore, we combine the node representations from different layers to consider both the local and global structures of a target node, which is beneficial to improving the performance of fraud detection on graphs with heterophily. By employing dynamic graph attention in all the aggregation processes, our method adaptively computes the attention coefficients for each node. Experimental results show that our method, DRAG, outperforms state-of-the-art fraud detection methods on real-world benchmark datasets.
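A minimal sketch of one layer of per-relation aggregation with a learnable attention coefficient per relation (message passing simplified to mean aggregation; names are hypothetical):

```python
import torch
import torch.nn as nn

class RelationAttentiveLayer(nn.Module):
    """One DRAG-style layer: a node representation per relation, combined
    with a learned attention weight over relations."""
    def __init__(self, n_relations: int, dim: int):
        super().__init__()
        self.per_rel = nn.ModuleList(nn.Linear(dim, dim)
                                     for _ in range(n_relations))
        self.attn = nn.Linear(dim, 1)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, dim) node features; adj: (R, N, N), one adjacency per relation
        rel_h = []
        for r, lin in enumerate(self.per_rel):
            deg = adj[r].sum(-1, keepdim=True).clamp(min=1)
            rel_h.append(torch.tanh(lin(adj[r] @ h / deg)))  # mean aggregation
        rel_h = torch.stack(rel_h)                      # (R, N, dim)
        alpha = torch.softmax(self.attn(rel_h), dim=0)  # attention over relations
        return (alpha * rel_h).sum(dim=0)               # (N, dim)
```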

Document-Level Relation Extraction with Relation Correlation Enhancement

  • paper_url: http://arxiv.org/abs/2310.13000
  • repo_url: https://github.com/lumia-group/lace
  • paper_authors: Yusheng Huang, Zhouhan Lin
  • for: Improving document-level relation extraction (DocRE) by explicitly exploiting correlations among relations.
  • methods: Constructs a relation graph from statistical co-occurrence information derived from prior relation knowledge, applies a re-weighting scheme to build an effective relation-correlation matrix that guides the propagation of relation information, and aggregates relation embeddings with graph attention networks.
  • results: The method integrates as a plug-and-play module into existing models and enhances multi-relation extraction, demonstrating the value of modeling relation correlations in DocRE.
    Abstract Document-level relation extraction (DocRE) is a task that focuses on identifying relations between entities within a document. However, existing DocRE models often overlook the correlation between relations and lack a quantitative analysis of relation correlations. To address this limitation and effectively capture relation correlations in DocRE, we propose a relation graph method, which aims to explicitly exploit the interdependency among relations. Firstly, we construct a relation graph that models relation correlations using statistical co-occurrence information derived from prior relation knowledge. Secondly, we employ a re-weighting scheme to create an effective relation correlation matrix to guide the propagation of relation information. Furthermore, we leverage graph attention networks to aggregate relation embeddings. Importantly, our method can be seamlessly integrated as a plug-and-play module into existing models. Experimental results demonstrate that our approach can enhance the performance of multi-relation extraction, highlighting the effectiveness of considering relation correlations in DocRE.
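A minimal sketch of building the co-occurrence-based relation-correlation matrix with re-weighting (the normalization and threshold are assumptions):

```python
import numpy as np

def relation_correlation(label_sets: list, n_rel: int, tau: float = 0.1):
    """label_sets: for each entity pair, the set of relation ids it holds.

    Counts how often relation pairs co-occur on the same entity pair, then
    row-normalizes and thresholds to obtain the adjacency of the relation
    graph along which relation information is propagated."""
    C = np.zeros((n_rel, n_rel))
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    C[i, j] += 1
    P = C / np.maximum(C.sum(axis=1, keepdims=True), 1)  # conditional frequency
    return np.where(P >= tau, P, 0.0)                    # drop rare edges
```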

Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04148
  • repo_url: https://github.com/ydchen0806/dbmim
  • paper_authors: Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong
  • for: Improving the performance of supervised neuron segmentation by using self-supervised and reinforcement learning to pretrain a decision-based mask image model (MIM).
  • methods: Uses reinforcement learning to automatically search for the optimal image masking ratio and masking strategy, treating each input patch as an agent with a shared behavior policy to enable multi-agent collaboration.
  • results: Experiments on representative EM datasets demonstrate a significant advantage over alternative self-supervised methods on the neuron segmentation task.
    Abstract The performance of existing supervised neuron segmentation methods is highly dependent on the number of accurate annotations, especially when applied to large scale electron microscopy (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pretraining inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model can capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation. Code is available at \url{https://github.com/ydchen0806/dbMiM}.

Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

  • paper_url: http://arxiv.org/abs/2310.04481
  • repo_url: None
  • paper_authors: Manon Macary, Marie Tahon, Yannick Estève, Daniel Luzzati
  • for: Automatically retrieving satisfaction and frustration in real-life call-center conversations, an industrial application in which customer satisfaction is continuously tracked to improve customer services.
  • methods: Explores pre-trained speech representations as a form of transfer learning toward the AlloSat corpus, breaking the audio information down into acoustic and linguistic components via the speech signal and its transcription.
  • results: Experiments confirm a large performance gain from pre-trained features; surprisingly, the linguistic content is clearly the major contributor to satisfaction prediction and generalizes best to unseen data, while the benefit of fusing acoustic and linguistic modalities is less obvious.
    Abstract The goal of our research is to automatically retrieve the satisfaction and the frustration in real-life call-center conversations. This study focuses an industrial application in which the customer satisfaction is continuously tracked down to improve customer services. To compensate the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form of transfer learning towards AlloSat corpus. Moreover, several studies have pointed out that emotion can be detected not only in speech but also in facial trait, in biological response or in textual information. In the context of telephone conversations, we can break down the audio information into acoustic and linguistic by using the speech signal and its transcription. Our experiments confirms the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor for the prediction of satisfaction and best generalizes to unseen data. Our experiments conclude to the definitive advantage of using CamemBERT representations, however the benefit of the fusion of acoustic and linguistic modalities is not as obvious. With models learnt on individual annotations, we found that fusion approaches are more robust to the subjectivity of the annotation task. This study also tackles the problem of performances variability and intends to estimate this variability from different views: weights initialization, confidence intervals and annotation subjectivity. A deep analysis on the linguistic content investigates interpretable factors able to explain the high contribution of the linguistic modality for this task.

Reinforcement Learning with Fast and Forgetful Memory

  • paper_url: http://arxiv.org/abs/2310.04128
  • repo_url: https://github.com/proroklab/ffm
  • paper_authors: Steven Morad, Ryan Kortvelesy, Stephan Liwicki, Amanda Prorok
  • for: Addressing memory in model-free RL: proposes Fast and Forgetful Memory (FFM), a memory model designed specifically for RL, to improve reward and training efficiency.
  • methods: Constrains the model search space via strong structural priors inspired by computational psychology; FFM is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, with no hyperparameter changes.
  • results: FFM achieves greater reward than RNNs across various recurrent benchmarks and algorithms, and trains two orders of magnitude faster thanks to its logarithmic time and linear space complexity.
    Abstract Nearly all real world tasks are inherently partially observable, necessitating the use of memory in Reinforcement Learning (RL). Most model-free approaches summarize the trajectory into a latent Markov state using memory models borrowed from Supervised Learning (SL), even though RL tends to exhibit different training and efficiency characteristics. Addressing this discrepancy, we introduce Fast and Forgetful Memory, an algorithm-agnostic memory model designed specifically for RL. Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, achieving greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than RNNs, attributed to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm.
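
As a rough illustration of the idea, the sketch below implements a toy exponentially decaying memory: each slot aggregates inputs under its own learned decay rate, so old observations fade. This is a deliberate simplification under assumptions, not FFM's actual parameterization; the paper's aggregator is more structured and is computed with a parallelizable scan to reach its logarithmic-time claim, whereas this sketch uses the plain recurrence.

```python
import torch

class DecayingMemory(torch.nn.Module):
    """Toy decaying-memory cell in the spirit of FFM (illustrative only)."""

    def __init__(self, input_dim: int, num_slots: int):
        super().__init__()
        self.proj = torch.nn.Linear(input_dim, num_slots)
        # One learnable decay rate per slot, squashed into (0, 1).
        self.decay_logit = torch.nn.Parameter(torch.zeros(num_slots))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, input_dim) -> per-step memory readout: (T, num_slots)
        gamma = torch.sigmoid(self.decay_logit)       # (num_slots,)
        v = self.proj(x)                              # (T, num_slots)
        out, state = [], torch.zeros_like(v[0])
        for t in range(v.shape[0]):                   # plain recurrence
            state = gamma * state + (1 - gamma) * v[t]
            out.append(state)
        return torch.stack(out)

mem = DecayingMemory(input_dim=8, num_slots=4)
print(mem(torch.randn(16, 8)).shape)  # torch.Size([16, 4])
```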

Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.05847
  • repo_url: None
  • paper_authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang
  • for: With growing privacy concerns in recommender systems, recommendation unlearning is receiving increasing attention. Existing studies mostly treat training data as the unlearning target, but attackers can extract private attributes (gender, race, age) from a trained model even when these never explicitly appear in the training data. The paper treats such unseen information, called attributes, as the unlearning target.
  • methods: To protect users' sensitive attributes, Attribute Unlearning (AU) degrades attack performance and makes target attributes indistinguishable. The paper studies the strict but practical Post-Training Attribute Unlearning (PoT-AU) setting, where unlearning can only happen after the recommendation model is trained, and designs a two-component loss: i) a distinguishability loss that makes attribute labels indistinguishable to attackers, and ii) a regularization loss that prevents drastic model changes which would hurt recommendation performance.
  • results: The loss is optimized with stochastic gradient descent; extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed method.
    Abstract With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, i.e., gender, race, and age, from a trained model even if it has not been explicitly encountered during training. We name this unseen information as attribute and treat it as the unlearning target. To protect the sensitive attribute of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) distinguishability loss: making attribute labels indistinguishable from attackers, and ii) regularization loss: preventing drastic changes in the model that result in a negative impact on recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
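
A minimal sketch of a two-component PoT-AU objective, under assumptions: the attacker is a surrogate linear classifier, and the distinguishability term is modeled as pushing the attacker's posterior toward uniform (the paper's user-to-user and distribution-to-distribution measurements are not reproduced here).

```python
import torch
import torch.nn.functional as F

def pot_au_step(user_emb, orig_emb, attacker, lam=0.1, lr=0.01):
    """One post-training attribute-unlearning step (illustrative sketch)."""
    probs = torch.softmax(attacker(user_emb), dim=-1)
    # (i) distinguishability loss: make the attribute unrecoverable by
    #     pushing the attacker's posterior toward the uniform distribution.
    uniform = torch.full_like(probs, 1.0 / probs.shape[-1])
    dist_loss = F.kl_div(probs.log(), uniform, reduction="batchmean")
    # (ii) regularization loss: stay close to the trained model so
    #      recommendation performance is preserved.
    reg_loss = ((user_emb - orig_emb) ** 2).mean()
    loss = dist_loss + lam * reg_loss
    loss.backward()
    with torch.no_grad():
        user_emb -= lr * user_emb.grad
        user_emb.grad.zero_()
    return loss.item()

emb = torch.randn(64, 32, requires_grad=True)   # embeddings to unlearn
frozen = emb.detach().clone()                   # trained-model reference
attacker = torch.nn.Linear(32, 2)               # surrogate attribute attacker
for _ in range(5):
    print(pot_au_step(emb, frozen, attacker))
```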

Auto-survey Challenge

  • paper_url: http://arxiv.org/abs/2310.04480
  • repo_url: None
  • paper_authors: Thanh Gia Hieu Khuong, Benedictus Kent Rachmat
  • for: This paper is written for evaluating the capability of Large Language Models (LLMs) to autonomously compose and critique survey papers in various disciplines.
  • methods: The paper uses a simulated peer-review mechanism and human organizers in an editorial oversight capacity to evaluate the LLMs’ performance.
  • results: The paper presents the design of a competition for the AutoML conference 2023, where entrants are tasked with presenting stand-alone models that can author and appraise articles based on designated prompts; the assessment criteria include clarity, reference appropriateness, accountability, and the substantive value of the content.
    Abstract We present a novel platform for evaluating the capability of Large Language Models (LLMs) to autonomously compose and critique survey papers spanning a vast array of disciplines including sciences, humanities, education, and law. Within this framework, AI systems undertake a simulated peer-review mechanism akin to traditional scholarly journals, with human organizers serving in an editorial oversight capacity. Using this platform, we organized a competition for the AutoML conference 2023. Entrants are tasked with presenting stand-alone models adept at authoring articles from designated prompts and subsequently appraising them. Assessment criteria include clarity, reference appropriateness, accountability, and the substantive value of the content. This paper presents the design of the competition, including the implementation of baseline submissions and the methods of evaluation.

Leveraging Data Geometry to Mitigate CSM in Steganalysis

  • paper_url: http://arxiv.org/abs/2310.04479
  • repo_url: None
  • paper_authors: Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný
  • for: Mitigating the Cover Source Mismatch (CSM) issue in steganalysis by exploring a grid of processing pipelines and developing a strategy for selecting or deriving customized training datasets.
  • methods: A geometry-based optimization strategy built on the chordal distance between subspaces spanned by DCTr features, a metric that correlates strongly with operational regret while being unaffected by the cover-stego balance.
  • results: Experiments validate that the geometry-based strategy outperforms traditional atomistic methods under reasonable assumptions; additional resources are at github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM.
    Abstract In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models. This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM). In this study, we consider the scenario where test images are processed using the same pipeline. However, knowledge regarding both the labels and the balance between cover and stego is missing. Our objective is to identify a training dataset that allows for maximum generalization to our target. By exploring a grid of processing pipelines fostering CSM, we discovered a geometrical metric based on the chordal distance between subspaces spanned by DCTr features, that exhibits high correlation with operational regret while being not affected by the cover-stego balance. Our contribution lies in the development of a strategy that enables the selection or derivation of customized training datasets, enhancing the overall generalization performance for a given target. Experimental validation highlights that our geometry-based optimization strategy outperforms traditional atomistic methods given reasonable assumptions. Additional resources are available at github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM.
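
The chordal distance between two feature subspaces can be computed from the principal angles between their orthonormal bases. The sketch below uses one standard definition on the top-k principal directions of centered feature matrices; the random matrices stand in for DCTr features, and the paper's exact construction may differ.

```python
import numpy as np

def chordal_distance(A, B, k=8):
    """Chordal distance between the k-dim subspaces spanned by the top
    principal directions of two feature matrices (rows = samples)."""
    Ua = np.linalg.svd(A - A.mean(0), full_matrices=False)[2][:k].T
    Ub = np.linalg.svd(B - B.mean(0), full_matrices=False)[2][:k].T
    # Cosines of the principal angles are the singular values of Ua^T Ub.
    s = np.clip(np.linalg.svd(Ua.T @ Ub, compute_uv=False), 0.0, 1.0)
    return np.sqrt(np.sum(np.sin(np.arccos(s)) ** 2))

src = np.random.randn(500, 64)   # stand-ins for DCTr feature matrices
tgt = np.random.randn(500, 64)
print(chordal_distance(src, tgt))
```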

Nash Welfare and Facility Location

  • paper_url: http://arxiv.org/abs/2310.04102
  • repo_url: None
  • paper_authors: Alexander Lam, Haris Aziz, Toby Walsh
  • for: Balancing fairness and efficiency when locating a facility to serve a set of agents positioned along a line.
  • methods: Applies the Nash welfare objective, defined as the product of the agents' utilities, converting individual costs to utilities and analyzing the facility placement that maximizes the Nash welfare; a polynomial-time approximation algorithm computes this location.
  • results: Proves results suggesting the proposed facility location achieves a good balance of fairness and efficiency, and, from a mechanism design perspective, gives a strategy-proof mechanism with a bounded approximation ratio for Nash welfare.
    Abstract We consider the problem of locating a facility to serve a set of agents located along a line. The Nash welfare objective function, defined as the product of the agents' utilities, is known to provide a compromise between fairness and efficiency in resource allocation problems. We apply this welfare notion to the facility location problem, converting individual costs to utilities and analyzing the facility placement that maximizes the Nash welfare. We give a polynomial-time approximation algorithm to compute this facility location, and prove results suggesting that it achieves a good balance of fairness and efficiency. Finally, we take a mechanism design perspective and propose a strategy-proof mechanism with a bounded approximation ratio for Nash welfare.
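
To make the objective concrete, the sketch below grid-searches the location on a line that maximizes the product of utilities (via the sum of logs). The cost-to-utility conversion u_i = L - |x_i - y| is an assumption made for illustration; the paper analyzes the placement and gives an approximation algorithm rather than relying on brute force.

```python
import numpy as np

def nash_facility(agents, grid=10_000):
    agents = np.asarray(agents, dtype=float)
    lo, hi = agents.min(), agents.max()
    L = hi - lo
    ys = np.linspace(lo, hi, grid)
    # Assumed cost-to-utility conversion: u_i = L - |x_i - y|.
    util = L - np.abs(agents[None, :] - ys[:, None])        # (grid, n)
    # Nash welfare is the product of utilities; maximize the log-sum.
    log_w = np.where((util > 0).all(axis=1),
                     np.log(np.maximum(util, 1e-12)).sum(axis=1), -np.inf)
    return ys[np.argmax(log_w)]

print(nash_facility([0.0, 0.2, 0.9]))   # an interior compromise point
```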

A Deeply Supervised Semantic Segmentation Method Based on GAN

  • paper_url: http://arxiv.org/abs/2310.04081
  • repo_url: None
  • paper_authors: Wei Zhao, Qiyu Wei, Zeng Zeng
  • for: Improving traffic safety in intelligent transportation systems by accurately identifying and localizing road elements such as cracks, lanes, and traffic signs.
  • methods: An improved semantic segmentation model that integrates a generative adversarial network (GAN) framework into a state-of-the-art segmentation model, strengthening its ability to capture complex and subtle features in transportation images.
  • results: A significant performance boost on a road-crack dataset compared with existing methods such as SEGAN, attributed to the synergy between adversarial learning and semantic segmentation, which yields a more refined and accurate representation of road structures and conditions.
    Abstract In recent years, the field of intelligent transportation has witnessed rapid advancements, driven by the increasing demand for automation and efficiency in transportation systems. Traffic safety, one of the tasks integral to intelligent transport systems, requires accurately identifying and locating various road elements, such as road cracks, lanes, and traffic signs. Semantic segmentation plays a pivotal role in achieving this task, as it enables the partition of images into meaningful regions with accurate boundaries. In this study, we propose an improved semantic segmentation model that combines the strengths of adversarial learning with state-of-the-art semantic segmentation techniques. The proposed model integrates a generative adversarial network (GAN) framework into the traditional semantic segmentation model, enhancing the model's performance in capturing complex and subtle features in transportation images. The effectiveness of our approach is demonstrated by a significant boost in performance on the road crack dataset compared to the existing methods, \textit{i.e.,} SEGAN. This improvement can be attributed to the synergistic effect of adversarial learning and semantic segmentation, which leads to a more refined and accurate representation of road structures and conditions. The enhanced model not only contributes to better detection of road cracks but also to a wide range of applications in intelligent transportation, such as traffic sign recognition, vehicle detection, and lane segmentation.

Automatic Aspect Extraction from Scientific Texts

  • paper_url: http://arxiv.org/abs/2310.04074
  • repo_url: https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts
  • paper_authors: Anna Marshalova, Elena Bruches, Tatiana Batura
  • for: Building a tool that automatically extracts key aspects (Task, Contribution, Method, Conclusion) from Russian-language scientific texts of any domain to ease literature review.
  • methods: A cross-domain dataset of Russian scientific texts annotated with these aspects, plus a baseline extractor based on a multilingual BERT model fine-tuned on the data.
  • results: Aspect representation differs somewhat across domains, but even though the model was trained on a limited number of scientific domains, cross-domain experiments show it still generalizes to new ones.
    Abstract Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at \url{https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts}.

AI Regulation in Europe: From the AI Act to Future Regulatory Challenges

  • paper_url: http://arxiv.org/abs/2310.04072
  • repo_url: None
  • paper_authors: Philipp Hacker
  • for: Examines AI regulation in the European Union, contrasting it with the UK's more sectoral and self-regulatory approach, and argues for a hybrid strategy that combines elements of both philosophies, emphasizing agility and safe harbors to ease compliance.
  • methods: Analyzes the AI Act as a pioneering legislative effort to address the multifaceted challenges posed by AI; despite its shortcomings, the Act is a significant legislative milestone.
  • results: Anticipates upcoming regulatory challenges, such as managing toxic content, environmental concerns, and hybrid threats, and advocates immediate action to create protocols for regulated access to high-performance, potentially open-source AI systems; further refinement and global collaboration are needed to govern rapidly evolving AI technologies effectively.
    Abstract This chapter provides a comprehensive discussion on AI regulation in the European Union, contrasting it with the more sectoral and self-regulatory approach in the UK. It argues for a hybrid regulatory strategy that combines elements from both philosophies, emphasizing the need for agility and safe harbors to ease compliance. The paper examines the AI Act as a pioneering legislative effort to address the multifaceted challenges posed by AI, asserting that, while the Act is a step in the right direction, it has shortcomings that could hinder the advancement of AI technologies. The paper also anticipates upcoming regulatory challenges, such as the management of toxic content, environmental concerns, and hybrid threats. It advocates for immediate action to create protocols for regulated access to high-performance, potentially open-source AI systems. Although the AI Act is a significant legislative milestone, it needs additional refinement and global collaboration for the effective governance of rapidly evolving AI technologies.

ByteStack-ID: Integrated Stacked Model Leveraging Payload Byte Frequency for Grayscale Image-based Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2310.09298
  • repo_url: None
  • paper_authors: Irfan Khan, Yasir Ali Farrukh, Syed Wali
  • for: Improving network security by swiftly and accurately identifying diverse attack classes from packet-level network traffic.
  • methods: "ByteStack-ID" builds grayscale images from payload byte-frequency distributions, relies exclusively on packet-level information rather than flow-based data, and integrates additional meta-learner layers into concatenated base learners to form a highly optimized, unified stacked model.
  • results: ByteStack-ID consistently outperforms baseline models and state-of-the-art approaches on precision, recall, and F1-score, achieving an 81% macro F1-score in multiclass classification.
    Abstract In the ever-evolving realm of network security, the swift and accurate identification of diverse attack classes within network traffic is of paramount importance. This paper introduces "ByteStack-ID," a pioneering approach tailored for packet-level intrusion detection. At its core, ByteStack-ID leverages grayscale images generated from the frequency distributions of payload data, a groundbreaking technique that greatly enhances the model's ability to discern intricate data patterns. Notably, our approach is exclusively grounded in packet-level information, a departure from conventional Network Intrusion Detection Systems (NIDS) that predominantly rely on flow-based data. While building upon the fundamental concept of stacking methodology, ByteStack-ID diverges from traditional stacking approaches. It seamlessly integrates additional meta learner layers into the concatenated base learners, creating a highly optimized, unified model. Empirical results unequivocally confirm the outstanding effectiveness of the ByteStack-ID framework, consistently outperforming baseline models and state-of-the-art approaches across pivotal performance metrics, including precision, recall, and F1-score. Impressively, our proposed approach achieves an exceptional 81\% macro F1-score in multiclass classification tasks. In a landscape marked by the continuous evolution of network threats, ByteStack-ID emerges as a robust and versatile security solution, relying solely on packet-level information extracted from network traffic data.
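
The payload-to-image step can be illustrated as follows: each payload (or payload segment) contributes a 256-bin byte-frequency histogram, and the histograms are stacked into a grayscale image. The layout and scaling here are assumptions; the abstract does not pin them down.

```python
import numpy as np

def payload_to_grayscale(payloads, height=16):
    """Turn packet payloads into a grayscale image of byte frequencies.

    Each row is the 256-bin byte-frequency histogram of one payload,
    scaled to 0-255. A rough sketch of the idea behind ByteStack-ID's
    input representation."""
    img = np.zeros((height, 256), dtype=np.uint8)
    for row, data in enumerate(payloads[:height]):
        counts = np.bincount(np.frombuffer(data, dtype=np.uint8),
                             minlength=256).astype(float)
        if counts.max() > 0:
            img[row] = np.round(255 * counts / counts.max())
    return img

pkts = [bytes([i % 256 for i in range(64 + 8 * k)]) for k in range(16)]
print(payload_to_grayscale(pkts).shape)  # (16, 256)
```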

Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.04055
  • repo_url: None
  • paper_authors: Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Yuhang Yao, Qifan Zhang, Salman Avestimehr, Chaoyang He
  • for: Defending federated learning systems against malicious clients that submit poisoned local models to achieve adversarial goals, such as preventing convergence of the global model or inducing it to misclassify some data.
  • methods: A cutting-edge anomaly detection approach that: i) detects when an attack occurs and performs defense operations only then; ii) upon an attack, further identifies the malicious client models and removes them without harming the benign ones; iii) leverages a zero-knowledge proof mechanism to ensure honest execution of the defense at the server.
  • results: Extensive experiments validate the superior performance of the proposed approach.
    Abstract Federated learning (FL) systems are vulnerable to malicious clients that submit poisoned local models to achieve their adversarial goals, such as preventing the convergence of the global model or inducing the global model to misclassify some data. Many existing defense mechanisms are impractical in real-world FL systems, as they require prior knowledge of the number of malicious clients or rely on re-weighting or modifying submissions. This is because adversaries typically do not announce their intentions before attacking, and re-weighting might change aggregation results even in the absence of attacks. To address these challenges in real FL systems, this paper introduces a cutting-edge anomaly detection approach with the following features: i) Detecting the occurrence of attacks and performing defense operations only when attacks happen; ii) Upon the occurrence of an attack, further detecting the malicious client models and eliminating them without harming the benign ones; iii) Ensuring honest execution of defense mechanisms at the server by leveraging a zero-knowledge proof mechanism. We validate the superior performance of the proposed approach with extensive experiments.

Higher-Order DeepTrails: Unified Approach to *Trails

  • paper_url: http://arxiv.org/abs/2310.04477
  • repo_url: https://github.com/lsx-uniwue/deeptrails
  • paper_authors: Tobias Koopmann, Jan Pfister, André Markus, Astrid Carolus, Carolin Wienrich, Andreas Hotho
  • for: Analyzing and understanding human navigation behavior, e.g., in web browsing or traffic navigation, to improve and optimize the underlying infrastructure or user interfaces.
  • methods: Uses autoregressive language models to analyze entire navigation sequences, modeling the higher-order dependencies that first-order Markov chains miss.
  • results: The approach adapts easily to the settings introduced in prior work (HypTrails, MixedTrails, and even SubTrails) while offering unique advantages: 1. modeling higher-order dependencies between state transitions, 2. identifying shortcomings in proposed hypotheses, and 3. naturally providing a unified approach to model all settings.
    Abstract Analyzing, understanding, and describing human behavior is advantageous in different settings, such as web browsing or traffic navigation. Understanding human behavior naturally helps to improve and optimize the underlying infrastructure or user interfaces. Typically, human navigation is represented by sequences of transitions between states. Previous work suggests using hypotheses, representing different intuitions about the navigation, to analyze these transitions. To mathematically grasp this setting, first-order Markov chains are used to capture the behavior, consequently allowing different kinds of graph comparisons to be applied, but this comes with the inherent drawback of losing information about higher-order dependencies within the sequences. To this end, we propose to analyze entire sequences using autoregressive language models, as they are traditionally used to model higher-order dependencies in sequences. We show that our approach can be easily adapted to model different settings introduced in previous work, namely HypTrails, MixedTrails and even SubTrails, while at the same time bringing unique advantages: 1. modeling higher-order dependencies between state transitions, while 2. being able to identify shortcomings in proposed hypotheses, and 3. naturally introducing a unified approach to model all settings. To show the expressiveness of our approach, we evaluate it on different synthetic datasets and conclude with an exemplary analysis of a real-world dataset, examining the behavior of users who interact with voice assistants.
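
The benefit of higher-order sequence models can be seen with a tiny example: score navigation trails under Markov models of increasing order and compare negative log-likelihoods. The paper's autoregressive language models play the same role without committing to a fixed order k; this toy fits and evaluates on the same data purely for illustration.

```python
import math
from collections import Counter

def order_k_nll(sequences, k):
    """Average NLL of trails under an order-k Markov model with add-one
    smoothing; lower means the hypothesis/order fits the trails better."""
    ctx, nxt = Counter(), Counter()
    vocab = {s for seq in sequences for s in seq}
    for seq in sequences:
        for i in range(k, len(seq)):
            c = tuple(seq[i - k:i])
            ctx[c] += 1
            nxt[c + (seq[i],)] += 1
    nll, n = 0.0, 0
    for seq in sequences:
        for i in range(k, len(seq)):
            c = tuple(seq[i - k:i])
            p = (nxt[c + (seq[i],)] + 1) / (ctx[c] + len(vocab))
            nll -= math.log(p)
            n += 1
    return nll / n

trails = [list("AABAAB"), list("AABAABAAB")]   # deterministic at order 2
print(order_k_nll(trails, 1), order_k_nll(trails, 2))  # order 2 fits better
```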

Observation-Guided Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2310.04041
  • repo_url: None
  • paper_authors: Junoh Kang, Jinyoung Choi, Sungik Choi, Bohyung Han
  • for: Proposes the observation-guided diffusion probabilistic model (OGDM), a new diffusion model that better trades off fast sampling against quality control.
  • methods: Reestablishes the training objective by integrating guidance from the observation process into the Markov chain in a principled way: an additional loss term, derived from a noise-level-conditioned discriminator whose Bernoulli output indicates whether its input lies on the (noisy) real manifold, optimizes a more accurate negative log-likelihood at inference, especially when the number of function evaluations is limited.
  • results: Yields better denoising networks with exactly the same inference procedure and no extra computational cost; the training method also helps when applied only during fine-tuning and is compatible with various fast inference strategies, as demonstrated on strong diffusion baselines with diverse inference methods.
    Abstract We propose a novel diffusion model called observation-guided diffusion probabilistic model (OGDM), which effectively addresses the trade-off between quality control and fast sampling. Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain in a principled way. This is achieved by introducing an additional loss term derived from the observation based on the conditional discriminator on noise level, which employs Bernoulli distribution indicating whether its input lies on the (noisy) real manifold or not. This strategy allows us to optimize the more accurate negative log-likelihood induced in the inference stage especially when the number of function evaluations is limited. The proposed training method is also advantageous even when incorporated only into the fine-tuning process, and it is compatible with various fast inference strategies since our method yields better denoising networks using the exactly same inference procedure without incurring extra computational cost. We demonstrate the effectiveness of the proposed training algorithm using diverse inference methods on strong diffusion model baselines.
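
A sketch of the shape of the modified training objective: the usual denoising MSE plus an observation term in which a noise-level-conditioned Bernoulli discriminator judges the one-step x0 estimate. The networks are stubbed out, and the schedule and weighting are assumptions, so this shows the structure of the loss rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ogdm_style_loss(denoiser, discriminator, x0, t, alpha_bar, w=0.1):
    # Standard DDPM denoising term.
    a = alpha_bar[t].view(-1, 1)                  # cumulative alpha at step t
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps
    eps_hat = denoiser(x_t, t)
    mse = F.mse_loss(eps_hat, eps)
    # Observation term: train the denoiser so a noise-level-conditioned
    # discriminator accepts its one-step x0 estimate as "on the manifold".
    x0_hat = (x_t - (1 - a).sqrt() * eps_hat) / a.sqrt()
    logits = discriminator(x0_hat, t)
    obs = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return mse + w * obs

denoiser = lambda x, t: torch.zeros_like(x)       # stand-in networks
discriminator = lambda x, t: x.sum(dim=1, keepdim=True)
alpha_bar = torch.linspace(0.99, 0.01, 100)
x0, t = torch.randn(4, 8), torch.randint(0, 100, (4,))
print(ogdm_style_loss(denoiser, discriminator, x0, t, alpha_bar))
```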

Demystifying Embedding Spaces using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04475
  • repo_url: None
  • paper_authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier
  • for: Making the information in embeddings more interpretable and broadly useful by letting Large Language Models (LLMs) interact directly with embedding vectors, transforming abstract vectors into understandable narratives.
  • methods: Injects embeddings into LLMs, enabling querying and exploration of complex embedding data.
  • results: Demonstrated on a variety of tasks, including enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems.
    Abstract Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
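
One common way to let an LLM consume an external embedding, sketched under assumptions (the paper's adapter may differ), is to project it into the model's token-embedding space and prepend the result as soft tokens, fed to the model through an inputs_embeds-style interface.

```python
import torch

class EmbeddingAdapter(torch.nn.Module):
    """Map a domain embedding into an LLM's token-embedding space so it
    can be prepended as 'soft tokens'. A generic sketch of embedding
    injection, not the paper's exact architecture."""

    def __init__(self, domain_dim: int, model_dim: int, n_soft_tokens: int = 4):
        super().__init__()
        self.proj = torch.nn.Linear(domain_dim, model_dim * n_soft_tokens)
        self.n, self.d = n_soft_tokens, model_dim

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, domain_dim) -> (batch, n_soft_tokens, model_dim)
        return self.proj(z).view(-1, self.n, self.d)

adapter = EmbeddingAdapter(domain_dim=64, model_dim=512)
soft = adapter(torch.randn(2, 64))
prompt_embeds = torch.randn(2, 10, 512)          # embedded text prompt
llm_inputs = torch.cat([soft, prompt_embeds], 1) # pass as inputs_embeds
print(llm_inputs.shape)  # torch.Size([2, 14, 512])
```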

Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning

  • paper_url: http://arxiv.org/abs/2310.04474
  • repo_url: None
  • paper_authors: Yinger Zhang, Hui Cai, Yicheng Chen, Rui Sun, Jing Zheng
  • for: Enhancing LLMs with the ability to call external APIs (function calling) through prompts alone, without fine-tuning.
  • methods: A simple yet controllable target-driven approach called "Reverse Chain": LLMs are employed only for simple subtasks such as API selection and argument completion, while a generic rule implements controllable multi-function calling, filling required arguments from the user query and context and selecting further APIs for any missing arguments.
  • results: Extensive experiments show Reverse Chain handles multiple function calls impressively well and can substantially improve the tool-use capabilities of existing LLMs such as ChatGPT.
    Abstract While enabling large language models to implement function calling (known as APIs) can greatly enhance the performance of LLMs, function calling is still a challenging task due to the complicated relations between different APIs, especially in a context-learning setting without fine-tuning. This paper proposes a simple yet controllable target-driven approach called Reverse Chain to empower LLMs with capabilities to use external APIs with only prompts. Given that most open-source LLMs have limited tool-use or tool-plan capabilities, LLMs in Reverse Chain are only employed to implement simple tasks, e.g., API selection and argument completion, and a generic rule is employed to implement a controllable multiple functions calling. In this generic rule, after selecting a final API to handle a given task via LLMs, we first ask LLMs to fill the required arguments from user query and context. Some missing arguments could be further completed by letting LLMs select another API based on API description before asking user. This process continues until a given task is completed. Extensive numerical experiments indicate an impressive capability of Reverse Chain on implementing multiple function calling. Interestingly enough, the experiments also reveal that tool-use capabilities of the existing LLMs, e.g., ChatGPT, can be greatly improved via Reverse Chain.
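
The generic rule reads naturally as a small recursive controller: pick a final API, fill its arguments, and resolve missing arguments by selecting another API. In this sketch the llm function, the API registry, and the scripted replies are all hypothetical stand-ins for prompt calls.

```python
def reverse_chain(task, apis, llm, context):
    """Generic-rule sketch of Reverse Chain (all names hypothetical)."""
    api = llm(f"Pick one API from {list(apis)} to finish: {task}")
    args = {}
    for name in apis[api]["required_args"]:
        value = llm(f"Extract '{name}' for {api} from: {task} | {context}")
        if value is None:  # missing argument -> produce it via another API
            value = reverse_chain(f"produce '{name}' needed by {api}",
                                  apis, llm, context)
        args[name] = value
    return apis[api]["call"](**args)

# Toy registry and scripted 'LLM' replies, for illustration only.
apis = {
    "get_user_id": {"required_args": ["email"],
                    "call": lambda email: f"id-for-{email}"},
    "send_report": {"required_args": ["user_id"],
                    "call": lambda user_id: f"report sent to {user_id}"},
}
script = iter(["send_report", None, "get_user_id", "a@b.com"])
llm = lambda prompt: next(script)
print(reverse_chain("send the report to a@b.com", apis, llm, context=""))
```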

Effective Slogan Generation with Noise Perturbation

  • paper_url: http://arxiv.org/abs/2310.04472
  • repo_url: https://github.com/joannekim0420/slogangeneration
  • paper_authors: Jongeun Kim, MinChung Kim, Taehwan Kim
  • for: Automatically generating distinctive and memorable corporate slogans that help firms build their brand identity.
  • methods: Fine-tunes a pre-trained transformer T5 model with noise perturbation on a newly proposed 1:N matching-pair dataset, incorporating descriptions of the firm and brand into slogan generation.
  • results: The approach outperforms baseline and other transformer-based models on ROUGE1, ROUGEL, and cosine similarity metrics, as well as in human evaluations of distinctiveness, coherence, and fluency.
    Abstract Slogans play a crucial role in building a firm's brand identity. A slogan is expected to reflect the firm's vision and the brand's value propositions in memorable and likeable ways. Automating the generation of slogans with such characteristics is challenging. Previous studies developed and tested slogan generation with syntactic control and summarization models, which are not capable of generating distinctive slogans. We introduce a novel approach that leverages a pre-trained transformer T5 model with noise perturbation on a newly proposed 1:N matching-pair dataset. This approach serves as a contributing factor in generating distinctive and coherent slogans. Furthermore, the proposed approach incorporates descriptions about the firm and brand into the generation of slogans. We evaluate generated slogans based on ROUGE1, ROUGEL and Cosine Similarity metrics and also assess them with human subjects in terms of distinctiveness, coherence, and fluency. The results demonstrate that our approach yields better performance than baseline models and other transformer-based models.

Excision and Recovery: Enhancing Surface Anomaly Detection with Attention-based Single Deterministic Masking

  • paper_url: http://arxiv.org/abs/2310.04010
  • repo_url: None
  • paper_authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Yeonho Lee, Juneho Yi
  • for: Proposes a reconstruction-by-inpainting anomaly detection method for surface inspection that addresses the quantity-imbalance problem of scarce abnormal data.
  • methods: "Excision and Recovery" (EAR) uses a single deterministic mask: a pre-trained spatial attention model predicts the suspected defective regions to excise, and a U-Net variant whose skip connections can be selectively disabled limits the model's ability to reconstruct abnormals.
  • results: On the commonly used surface-inspection dataset KolektorSDD2 (with MNIST-pre-trained attention), EAR achieves both better anomaly detection performance and higher throughput than state-of-the-art methods.
    Abstract Anomaly detection (AD) in surface inspection is an essential yet challenging task in manufacturing due to the quantity imbalance problem of scarce abnormal data. To overcome the above, a reconstruction encoder-decoder (ED) such as an autoencoder or U-Net, trained with only anomaly-free samples, is widely adopted, in the hope that unseen abnormals yield a larger reconstruction error than normals. Over the past years, research on self-supervised reconstruction-by-inpainting has been reported. These methods mask out suspected defective regions for inpainting in order to make them invisible to the reconstruction ED, deliberately causing inaccurate reconstruction for abnormals. However, since defective regions are not known in advance, they must apply multiple random maskings to cover the whole input image. We propose a novel reconstruction-by-inpainting method dubbed Excision and Recovery (EAR) that features single deterministic masking. For this, we exploit a pre-trained spatial attention model to predict potential suspected defective regions that should be masked out. We also employ a variant of U-Net as our ED to further limit the reconstruction ability of the U-Net model for abnormals, in which skip connections of different layers can be selectively disabled. In the training phase, all the skip connections are switched on to fully take advantage of the U-Net architecture. In contrast, for inferencing, we only keep the deeper skip connections, with shallower connections off. We validate the effectiveness of EAR using an MNIST pre-trained attention model on a commonly used surface AD dataset, KolektorSDD2. The experimental results show that EAR achieves both better AD performance and higher throughput than state-of-the-art methods. We expect that the proposed EAR model can be widely adopted as training and inference strategies for AD purposes.
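
An EAR-style scoring loop, sketched with stand-in networks: excise the single region the attention model flags, recover it, and take the masked recovery error as the anomaly score. The quantile threshold is an assumption for illustration.

```python
import torch

def excision_and_recovery(image, attn_model, recon_net, quantile=0.9):
    """EAR-style anomaly score (sketch): mask the attention model's
    suspected region, reconstruct, and measure recovery error inside
    the mask."""
    with torch.no_grad():
        attn = attn_model(image)                 # (1, 1, H, W) attention map
        thr = torch.quantile(attn, quantile)
        mask = (attn >= thr).float()             # single deterministic mask
        excised = image * (1 - mask)             # hide the suspected region
        recovered = recon_net(excised)
        err = ((recovered - image) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    return err.item()

img = torch.rand(1, 1, 32, 32)
attn_model = lambda x: torch.rand_like(x)        # stand-in attention model
recon_net = lambda x: x                          # stand-in inpainting U-Net
print(excision_and_recovery(img, attn_model, recon_net))
```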

Fast Neighborhood Search Heuristics for the Colorful Bin Packing Problem

  • paper_url: http://arxiv.org/abs/2310.04471
  • repo_url: https://gitlab.com/renanfernandofranco/fast-neighborhood-search-heuristics-for-the-colorful-bin-packing-problem
  • paper_authors: Renan F. F. da Silva, Yulle G. F. Borges, Rafael C. S. Schouery
  • for: Solving the Colorful Bin Packing Problem (CBPP), a generalization of the Bin Packing Problem (BPP).
  • methods: Adapts BPP heuristics and proposes new heuristics for CBPP, together with a set of fast neighborhood-search algorithms used in a Variable Neighborhood Search (VNS) meta-heuristic and in a matheuristic that mixes linear programming with VNS and Greedy Randomized Adaptive Search (GRASP).
  • results: The matheuristic is superior to plain VNS, and both approaches find near-optimal solutions for a large number of instances, even instances with many items.
    Abstract The Colorful Bin Packing Problem (CBPP) is a generalization of the Bin Packing Problem (BPP). The CBPP consists of packing a set of items, each with a weight and a color, in bins of limited capacity, minimizing the number of used bins and satisfying the constraint that two items of the same color cannot be packed side by side in the same bin. In this article, we propose an adaptation of BPP heuristics and new heuristics for the CBPP. Moreover, we propose a set of fast neighborhood search algorithms for CBPP. These neighborhoods are applied in a meta-heuristic approach based on Variable Neighborhood Search (VNS) and in a matheuristic approach that mixes linear programming with the meta-heuristics VNS and Greedy Randomized Adaptive Search (GRASP). The results indicate that our matheuristic is superior to VNS and that both approaches can find near-optimal solutions for a large number of instances, even for instances with many items.
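
For intuition about the constraint, here is a simple first-fit baseline for CBPP: an item may not be placed directly on top of an item of the same color within a bin. This is a toy baseline to fix the problem definition, not the paper's VNS/GRASP machinery.

```python
def colorful_first_fit(items, capacity):
    """First-fit heuristic for CBPP. Items are (weight, color) pairs;
    each bin tracks its remaining capacity and the color of its top item."""
    bins = []        # each bin: [remaining_capacity, top_color]
    assignment = []  # bin index chosen for each item, in order
    for weight, color in items:
        for i, (cap, top) in enumerate(bins):
            if weight <= cap and color != top:   # fits, and no same-color stack
                bins[i] = [cap - weight, color]
                assignment.append(i)
                break
        else:
            bins.append([capacity - weight, color])
            assignment.append(len(bins) - 1)
    return len(bins), assignment

items = [(4, "red"), (3, "red"), (5, "blue"), (2, "red"), (6, "blue")]
print(colorful_first_fit(items, capacity=10))    # (3, [0, 1, 0, 2, 1])
```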

Hierarchical Multi-Marginal Optimal Transport for Network Alignment

  • paper_url: http://arxiv.org/abs/2310.04470
  • repo_url: None
  • paper_authors: Zhichen Zeng, Boxin Du, Si Zhang, Yinglong Xia, Zhining Liu, Hanghang Tong
  • for: Multi-network alignment, an essential prerequisite for joint learning on multiple networks.
  • methods: Hierarchical multi-marginal optimal transport framework (HOT) with fused Gromov-Wasserstein (FGW) barycenter and generalized multi-marginal FGW distance.
  • results: Significant improvements over the state-of-the-art in both effectiveness and scalability.
    Abstract Finding node correspondence across networks, namely multi-network alignment, is an essential prerequisite for joint learning on multiple networks. Despite great success in aligning networks in pairs, the literature on multi-network alignment is sparse due to the exponentially growing solution space and lack of high-order discrepancy measures. To fill this gap, we propose a hierarchical multi-marginal optimal transport framework named HOT for multi-network alignment. To handle the large solution space, multiple networks are decomposed into smaller aligned clusters via the fused Gromov-Wasserstein (FGW) barycenter. To depict high-order relationships across multiple networks, the FGW distance is generalized to the multi-marginal setting, based on which networks can be aligned jointly. A fast proximal point method is further developed with guaranteed convergence to a local optimum. Extensive experiments and analysis show that our proposed HOT achieves significant improvements over the state-of-the-art in both effectiveness and scalability.
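
For intuition, a pairwise, feature-only simplification: entropic optimal transport (plain Sinkhorn iterations) between two networks' node features yields a soft node correspondence. The paper's method goes much further, aligning multiple networks jointly with a multi-marginal FGW distance that also accounts for graph structure.

```python
import numpy as np

def sinkhorn_alignment(Xa, Xb, reg=0.5, iters=200):
    """Entropic OT between node features of two networks; the transport
    plan serves as a soft alignment (pairwise simplification of HOT)."""
    C = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)   # pairwise costs
    K = np.exp(-C / reg)
    a = np.full(len(Xa), 1.0 / len(Xa))    # uniform node masses
    b = np.full(len(Xb), 1.0 / len(Xb))
    u = np.ones_like(a)
    for _ in range(iters):                 # Sinkhorn scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]     # transport plan = soft matching

Xa, Xb = np.random.rand(5, 3), np.random.rand(6, 3)
P = sinkhorn_alignment(Xa, Xb)
print(P.shape, round(P.sum(), 4))          # (5, 6), total mass ~1.0
```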

CUPre: Cross-domain Unsupervised Pre-training for Few-Shot Cell Segmentation

  • paper_url: http://arxiv.org/abs/2310.03981
  • repo_url: None
  • paper_authors: Weibin Liao, Xuhong Li, Qingzhong Wang, Yanwu Xu, Zhaozheng Yin, Haoyi Xiong
  • for: Pre-training DNN models for few-shot cell segmentation, where massive unlabeled cell images are available but only a small proportion is annotated, transferring detection and instance segmentation capability learned on common objects to the visual domain of cells.
  • methods: Cross-domain Unsupervised Pre-training (CUPre) adopts an alternate multi-task pre-training (AMT2) procedure: in every iteration, it first trains the backbone on cell images from multiple cell datasets via unsupervised momentum contrastive learning (MoCo), then trains the whole model on vanilla COCO via instance segmentation.
  • results: After fine-tuning on a few annotated images, CUPre outperforms existing pre-training methods on the LIVECell and BBBC038 datasets, achieving the highest average precision (AP) for few-shot cell segmentation and detection.
    Abstract While pre-training on object detection tasks, such as Common Objects in Contexts (COCO) [1], could significantly boost the performance of cell segmentation, it still requires massive fine-annotated cell images [2] with bounding boxes, masks, and cell types for every cell in every image to fine-tune the pre-trained model. To lower the cost of annotation, this work considers the problem of pre-training DNN models for few-shot cell segmentation, where massive unlabeled cell images are available but only a small proportion is annotated. We propose Cross-domain Unsupervised Pre-training, namely CUPre, transferring the capability of object detection and instance segmentation for common visual objects (learned from COCO) to the visual domain of cells using unlabeled images. Given a standard COCO pre-trained network with backbone, neck, and head modules, CUPre adopts an alternate multi-task pre-training (AMT2) procedure with two sub-tasks -- in every iteration of pre-training, AMT2 first trains the backbone with cell images from multiple cell datasets via unsupervised momentum contrastive learning (MoCo) [3], and then trains the whole model with vanilla COCO datasets via instance segmentation. After pre-training, CUPre fine-tunes the whole model on the cell segmentation task using a few annotated images. We carry out extensive experiments to evaluate CUPre using the LIVECell [2] and BBBC038 [4] datasets in few-shot instance segmentation settings. The experiments show that CUPre can outperform existing pre-training methods, achieving the highest average precision (AP) for few-shot cell segmentation and detection.

Perfect Alignment May be Poisonous to Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.03977
  • repo_url: None
  • paper_authors: Jingyu Liu, Huayi Tang, Yong Liu
  • for: Investigating the inner workings of augmentation in graph contrastive learning (GCL): which augmentations help downstream performance, how contrastive learning actually influences downstream tasks, and why the magnitude of augmentation matters.
  • methods: Establishes a connection between augmentation and downstream performance, studies the generalization of contrastive learning, analyzes the results via information theory and graph spectrum theory, and proposes two simple but effective verification methods applicable to various GCL algorithms.
  • results: GCL helps downstream tasks mainly by separating different classes rather than gathering nodes of the same class; perfect alignment that makes positive pairs identical can help the contrastive loss but is poisonous to generalization, whereas imperfect alignment enhances the model's generalization ability.
    Abstract Graph Contrastive Learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, little research has examined the principles behind the specific augmentations used in graph-based learning: what kind of augmentation helps downstream performance, how contrastive learning actually influences downstream tasks, and why the magnitude of augmentation matters. This paper seeks to address these questions by establishing a connection between augmentation and downstream performance, as well as by investigating the generalization of contrastive learning. Our findings reveal that GCL contributes to downstream tasks mainly by separating different classes rather than gathering nodes of the same class. Thus, perfect alignment and augmentation overlap, which make all intra-class samples identical, cannot explain the success of contrastive learning. To comprehend how augmentation aids the contrastive learning process, we conduct further investigations into its generalization, finding that perfect alignment that makes positive pairs identical can help the contrastive loss but is poisonous to generalization; on the contrary, imperfect alignment enhances the model's generalization ability. We analyze these results via information theory and graph spectrum theory, respectively, and propose two simple but effective methods to verify the theories. The two methods can be easily applied to various GCL algorithms, and extensive experiments are conducted to prove their effectiveness.
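
The role of augmentation strength can be probed with a standard InfoNCE loss between two perturbed views of the same embeddings; sigma below is a stand-in knob for augmentation magnitude (larger sigma means less perfectly aligned positive pairs, which the paper argues can aid generalization).

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Standard contrastive InfoNCE loss between two views' embeddings:
    positives sit on the diagonal of the similarity matrix."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.T / tau                      # (N, N) cosine similarities
    labels = torch.arange(z1.shape[0])
    return F.cross_entropy(sim, labels)

n, d, sigma = 32, 16, 0.4                      # sigma ~ augmentation strength
base = torch.randn(n, d)
z1 = base + sigma * torch.randn(n, d)          # two perturbed 'views'
z2 = base + sigma * torch.randn(n, d)
print(info_nce(z1, z2).item())
```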

Sub-token ViT Embedding via Stochastic Resonance Transformers

  • paper_url: http://arxiv.org/abs/2310.03967
  • repo_url: None
  • paper_authors: Dong Lao, Yangchao Wu, Tian Yu Liu, Alex Wong, Stefano Soatto
  • for: Addressing quantization artifacts in Vision Transformers (ViTs), which arise from the image tokenization step and hurt performance on downstream dense prediction tasks, via a zero-shot method that improves how pre-trained ViTs handle spatial quantization.
  • methods: The Stochastic Resonance Transformer (SRT) ensembles the features obtained from perturbing input images via sub-token spatial translations, inspired by Stochastic Resonance; it can be applied at any layer, on any task, and requires no fine-tuning.
  • results: SRT super-resolves features of pre-trained ViTs, capturing local fine-grained structures otherwise lost to tokenization: it outperforms baselines by an average of 4.7% and 14.9% on the RMSE and RMSE-log metrics for monocular depth prediction across three architectures, improves semi-supervised video object segmentation by an average of 2.4% in F&J score, improves unsupervised salient region segmentation by an average of 2.1% on the maxF metric, and yields consistent gains of up to 2.6% and 1.0% on image retrieval and object discovery, respectively.
    Abstract We discover the presence of quantization artifacts in Vision Transformers (ViTs), which arise due to the image tokenization step inherent in these architectures. These artifacts result in coarsely quantized features, which negatively impact performance, especially on downstream dense prediction tasks. We present a zero-shot method to improve how pre-trained ViTs handle spatial quantization. In particular, we propose to ensemble the features obtained from perturbing input images via sub-token spatial translations, inspired by Stochastic Resonance, a method traditionally applied to climate dynamics and signal processing. We term our method ``Stochastic Resonance Transformer" (SRT), which we show can effectively super-resolve features of pre-trained ViTs, capturing more of the local fine-grained structures that might otherwise be neglected as a result of tokenization. SRT can be applied at any layer, on any task, and does not require any fine-tuning. The advantage of the former is evident when applied to monocular depth prediction, where we show that ensembling model outputs are detrimental while applying SRT on intermediate ViT features outperforms the baseline models by an average of 4.7% and 14.9% on the RMSE and RMSE-log metrics across three different architectures. When applied to semi-supervised video object segmentation, SRT also improves over the baseline models uniformly across all metrics, and by an average of 2.4% in F&J score. We further show that these quantization artifacts can be attenuated to some extent via self-distillation. On the unsupervised salient region segmentation, SRT improves upon the base model by an average of 2.1% on the maxF metric. Finally, despite operating purely on pixel-level features, SRT generalizes to non-dense prediction tasks such as image retrieval and object discovery, yielding consistent improvements of up to 2.6% and 1.0% respectively.
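
The core SRT operation is easy to sketch: shift the input by a few pixels (less than the patch size), extract features, undo the shift in feature space, and average. The identity model below is a stand-in for a ViT whose features have been upsampled to image resolution; the paper's exact unshift-and-aggregate details may differ.

```python
import torch

def srt_features(model, image, shifts=((0, 0), (0, 4), (4, 0), (4, 4))):
    """Stochastic-Resonance-style ensembling: features from sub-token
    translations of the input are un-shifted and averaged. `model` maps
    (B, C, H, W) images to same-resolution feature maps."""
    feats = []
    for dy, dx in shifts:
        shifted = torch.roll(image, shifts=(dy, dx), dims=(2, 3))
        f = model(shifted)
        feats.append(torch.roll(f, shifts=(-dy, -dx), dims=(2, 3)))
    return torch.stack(feats).mean(0)

model = lambda x: x                  # stand-in for upsampled ViT features
img = torch.rand(1, 3, 64, 64)
print(srt_features(model, img).shape)  # torch.Size([1, 3, 64, 64])
```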

Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03965
  • repo_url: None
  • paper_authors: Junchi Yu, Ran He, Rex Ying
  • for: Improving the complex reasoning ability of Large Language Models (LLMs).
  • methods: Thought Propagation (TP) prompts LLMs to propose and solve a set of problems analogous to the input one, then reuses their solutions and problem-solving strategies to directly yield a new answer or derive a knowledge-intensive execution plan that amends the initial from-scratch solution.
  • results: Compared with baselines, TP improves finding optimal solutions in Shortest-path Reasoning by an average of 12% absolute, human preference in Creative Writing by 13%, and the task completion rate of LLM-Agent Planning by 15%.
    Abstract Large Language Models (LLMs) have achieved remarkable success in reasoning tasks with the development of prompting methods. However, existing prompting approaches cannot reuse insights of solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason \textit{from scratch}. To address these issues, we propose \textbf{\textit{Thought Propagation} (TP)}, which explores the analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. These analogous problems are related to the input one, with reusable solutions and problem-solving strategies. Thus, it is promising to propagate insights of solving previous analogous problems to inspire new problem-solving. To achieve this, TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. Then, TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch. TP is compatible with existing prompting approaches, allowing plug-and-play generalization and enhancement in a wide range of tasks without much labor in task-specific prompt engineering. Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12\% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13\% improvement of human preference in Creative Writing, and 15\% enhancement in the task completion rate of LLM-Agent Planning.
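
The TP control flow can be sketched as a small orchestration loop: propose analogous problems, solve them, then condition the final answer on the solved analogs. The llm and solve functions below are hypothetical stand-ins for prompted model calls, not the paper's prompts.

```python
def thought_propagation(problem, llm, solve, k=2):
    """TP control-flow sketch: reuse solved analogous problems to inspire
    the answer to the input problem (all call names are stand-ins)."""
    analogs = [llm(f"Propose problem #{i} analogous to: {problem}")
               for i in range(k)]
    solved = [(a, solve(a)) for a in analogs]
    hints = "; ".join(f"{a} -> {s}" for a, s in solved)
    # Either yield a new solution directly or derive a plan that amends
    # the from-scratch answer; here we simply condition on the analogs.
    return llm(f"Given solved analogs [{hints}], solve: {problem}")

llm = lambda p: f"<reply to: {p[:48]}...>"       # toy stand-ins
solve = lambda a: f"<solution of {a[:24]}...>"
print(thought_propagation("shortest path from A to F", llm, solve))
```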

A Learnable Counter-condition Analysis Framework for Functional Connectivity-based Neurological Disorder Diagnosis

  • paper_url: http://arxiv.org/abs/2310.03964
  • repo_url: https://github.com/es-kang/learnable-counter-condition-FC
  • paper_authors: Eunsong Kang, Da-woon Heo, Jiwon Lee, Heung-Il Suk
  • for: Understanding the biological characteristics of neurological disorders from functional connectivity (FC), where deep models identify the disease and post-hoc analyses uncover disease-related biomarkers, but unreliable results at any separately implemented stage can cause misdiagnosis and incorrect downstream analysis.
  • methods: A unified framework that systemically integrates diagnosis (feature selection and feature extraction) with explanation: an adaptive attention network selects individual-specific disease-related connections, and a functional network relational encoder summarizes the global topological properties of FC by learning inter-network relations without pre-defined edges between functional networks.
  • results: On the large resting-state fMRI datasets ABIDE and REST-meta-MDD, the framework outperforms competing methods for disease identification and provides a novel explanatory power, counter-condition analysis, simulating FC that reverses the diagnostic information (converting a normal brain to abnormal and vice versa) to analyze disease-related neurological patterns.
    Abstract To understand the biological characteristics of neurological disorders with functional connectivity (FC), recent studies have widely utilized deep learning-based models to identify the disease and conducted post-hoc analyses via explainable models to discover disease-related biomarkers. Most existing frameworks consist of three stages, namely, feature selection, feature extraction for classification, and analysis, where each stage is implemented separately. However, if the results at each stage lack reliability, this can cause misdiagnosis and incorrect analysis in later stages. In this study, we propose a novel unified framework that systemically integrates diagnoses (i.e., feature selection and feature extraction) and explanations. Notably, we devised an adaptive attention network as a feature selection approach to identify individual-specific disease-related connections. We also propose a functional network relational encoder that summarizes the global topological properties of FC by learning the inter-network relations without pre-defined edges between functional networks. Last but not least, our framework provides novel explanatory power for neuroscientific interpretation, termed counter-condition analysis. We simulated FC that reverses the diagnostic information (i.e., counter-condition FC): converting a normal brain to be abnormal and vice versa. We validated the effectiveness of our framework by using two large resting-state functional magnetic resonance imaging (fMRI) datasets, Autism Brain Imaging Data Exchange (ABIDE) and REST-meta-MDD, and demonstrated that our framework outperforms other competing methods for disease identification. Furthermore, we analyzed the disease-related neurological patterns based on counter-condition analysis.

Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations

  • paper_url: http://arxiv.org/abs/2310.03951
  • repo_url: https://github.com/microsoft/conli_hallucination
  • paper_authors: Deren Lei, Yaxi Li, Mengya Hu, Mingyu Wang, Vincent Yun, Emily Ching, Eslam Kamal
  • for: Detecting and mitigating the ungrounded hallucinations that LLMs produce when generating text from background documents.
  • methods: A hierarchical framework that uses a Chain of Natural Language Inference (CoNLI) for hallucination detection and reduces hallucinations via post-editing, without any fine-tuning or domain-specific prompt engineering.
  • results: The framework achieves state-of-the-art hallucination detection and enhances text quality through rewriting, serving as an effective plug-and-play choice with competitive performance across various contexts.
    Abstract Large language models (LLMs) can generate fluent natural language texts when given relevant documents as background context. This ability has attracted considerable interest in developing industry applications of LLMs. However, LLMs are prone to generate hallucinations that are not supported by the provided sources. In this paper, we propose a hierarchical framework to detect and mitigate such ungrounded hallucination. Our framework uses Chain of Natural Language Inference (CoNLI) for hallucination detection and hallucination reduction via post-editing. Our approach achieves state-of-the-art performance on hallucination detection and enhances text quality through rewrite, using LLMs without any fine-tuning or domain-specific prompt engineering. We show that this simple plug-and-play framework can serve as an effective choice for hallucination detection and reduction, achieving competitive performance across various contexts.
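
The detect-then-edit loop can be sketched with a stubbed entailment scorer and rewriter; the paper's actual chain of NLI checks (e.g., at sentence and finer granularity) is more elaborate, and the threshold here is an assumption.

```python
def detect_and_edit(response_sentences, source, nli, rewrite):
    """CoNLI-style post-hoc loop (sketch): flag sentences the source does
    not entail, then rewrite or drop them. `nli(premise, hypothesis)`
    returns an entailment probability; `rewrite` stands in for an LLM
    post-editing call. Both are hypothetical stubs here."""
    kept = []
    for sent in response_sentences:
        if nli(source, sent) >= 0.5:        # grounded: keep as-is
            kept.append(sent)
        else:                               # ungrounded: post-edit
            fixed = rewrite(sent, source)
            if fixed:
                kept.append(fixed)
    return " ".join(kept)

source = "The meeting was moved to Friday at 3pm."
resp = ["The meeting is on Friday at 3pm.", "It will be held in Room 12."]
nli = lambda p, h: 0.9 if "Friday" in h else 0.1   # toy entailment scorer
rewrite = lambda s, src: None                      # drop if unfixable
print(detect_and_edit(resp, source, nli, rewrite))
```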