cs.AI - 2023-10-06

Copy Suppression: Comprehensively Understanding an Attention Head

  • paper_url: http://arxiv.org/abs/2310.04625
  • repo_url: https://github.com/callummcdougall/seri-mats-2023-streamlit-pages
  • paper_authors: Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda
  • for: understanding a single component of a language model, attention head 10.7 (L10H7) in GPT-2 Small, and characterizing its role across the entire training distribution.
  • methods: uses GPT-2 Small and weights-based analysis to describe the mechanism by which L10H7 performs copy suppression.
  • results: L10H7 suppresses naive copying behavior, which improves overall model calibration; this copy suppression is also a key mechanism behind self-repair, i.e., when crucial model components are ablated, downstream parts of the network compensate for the ablation (a toy illustration of the effect follows this entry).
    Abstract We present a single attention head in GPT-2 Small that has one main role across the entire training distribution. If components in earlier layers predict a certain token, and this token appears earlier in the context, the head suppresses it: we call this copy suppression. Attention Head 10.7 (L10H7) suppresses naive copying behavior which improves overall model calibration. This explains why multiple prior works studying certain narrow tasks found negative heads that systematically favored the wrong answer. We uncover the mechanism that the Negative Heads use for copy suppression with weights-based evidence and are able to explain 76.9% of the impact of L10H7 in GPT-2 Small. To the best of our knowledge, this is the most comprehensive description of the complete role of a component in a language model to date. One major effect of copy suppression is its role in self-repair. Self-repair refers to how ablating crucial model components results in downstream neural network parts compensating for this ablation. Copy suppression leads to self-repair: if an initial overconfident copier is ablated, then there is nothing to suppress. We show that self-repair is implemented by several mechanisms, one of which is copy suppression, which explains 39% of the behavior in a narrow task. Interactive visualisations of the copy suppression phenomena may be seen at our web app https://copy-suppression.streamlit.app/
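
For intuition only, the sketch below caricatures the behavioral effect described above: push down the logits of tokens that already appear in the context. It is not the weights-level mechanism of head L10H7 from the paper, and the `strength` knob is a made-up parameter.

```python
import torch

def toy_copy_suppression(logits: torch.Tensor, context_ids: torch.Tensor,
                         strength: float = 1.0) -> torch.Tensor:
    # Dampen the logits of every token id already seen in the context,
    # mimicking the *effect* (not the mechanism) of a copy-suppression head.
    suppressed = logits.clone()
    suppressed[context_ids] -= strength
    return suppressed

logits = torch.randn(50_257)             # next-token logits over GPT-2's vocabulary
context = torch.tensor([464, 3290, 11])  # token ids that occurred earlier in the prompt
adjusted = toy_copy_suppression(logits, context)
```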

Deconstructing Cooperation and Ostracism via Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04623
  • repo_url: None
  • paper_authors: Atsushi Ueshima, Shayegan Omidshafiei, Hirokazu Shirado
  • for: understanding the challenge of cooperation in biological systems, human societies, and multi-agent systems, and how strategic network rewiring can help overcome it.
  • methods: multi-agent reinforcement learning simulations of the iterated Prisoner's Dilemma, in which each agent has one policy for cooperating or defecting and another for rewiring its connection, allowing the complex causal dynamics between cooperation and network rewiring to be disentangled.
  • results: network rewiring facilitates mutual cooperation even when one agent always offers cooperation; the effect works through learned ostracism (connecting to cooperators and disconnecting from defectors), yet ostracism alone is not sufficient for cooperation to emerge; rather, ostracism emerges from learned cooperation, and existing cooperation is then reinforced by it.
    Abstract Cooperation is challenging in biological systems, human societies, and multi-agent systems in general. While a group can benefit when everyone cooperates, it is tempting for each agent to act selfishly instead. Prior human studies show that people can overcome such social dilemmas while choosing interaction partners, i.e., strategic network rewiring. However, little is known about how agents, including humans, can learn about cooperation from strategic rewiring and vice versa. Here, we perform multi-agent reinforcement learning simulations in which two agents play the Prisoner's Dilemma game iteratively. Each agent has two policies: one controls whether to cooperate or defect; the other controls whether to rewire connections with another agent. This setting enables us to disentangle complex causal dynamics between cooperation and network rewiring. We find that network rewiring facilitates mutual cooperation even when one agent always offers cooperation, which is vulnerable to free-riding. We then confirm that the network-rewiring effect is exerted through agents' learning of ostracism, that is, connecting to cooperators and disconnecting from defectors. However, we also find that ostracism alone is not sufficient to make cooperation emerge. Instead, ostracism emerges from the learning of cooperation, and existing cooperation is subsequently reinforced due to the presence of ostracism. Our findings provide insights into the conditions and mechanisms necessary for the emergence of cooperation with network rewiring.

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

  • paper_url: http://arxiv.org/abs/2310.04621
  • repo_url: None
  • paper_authors: Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz
  • for: moving machine learning (ML) computation onto everyday personal devices to improve privacy, responsiveness, and the proliferation of new, intelligent user experiences.
  • methods: an interview study with 30 experts at Apple who specialize in producing efficient models, compiling the tacit knowledge they have developed through practical experience with model compression across different hardware platforms.
  • results: pragmatic design considerations, trade-offs, and technical strategies for creating efficient models, plus design recommendations for tooling that would ease this work and bring on-device ML into more widespread practice.
    Abstract On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be drastically compressed to run efficiently on-device, a hurdle that requires deep, yet currently niche expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results from an interview study with 30 experts at Apple that specialize in producing efficient models. We compile tacit knowledge that experts have developed through practical experience with model compression across different hardware platforms. Our findings offer pragmatic considerations missing from prior work, covering the design process, trade-offs, and technical strategies that go into creating efficient models. Finally, we distill design recommendations for tooling to help ease the difficulty of this work and bring on-device ML into more widespread practice.

SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics

  • paper_url: http://arxiv.org/abs/2310.04617
  • repo_url: None
  • paper_authors: Alireza Rezazadeh, Athreyi Badithela, Karthik Desingh, Changhyun Choi
  • for: learning multi-object dynamics from visual data with unsupervised techniques.
  • methods: two new architectures: SlotTransport, an unsupervised object-discovery model based on slot attention that uses a feature-transport mechanism to maintain temporal alignment in object-centric representations, and SlotGNN, an unsupervised graph-based dynamics model that predicts the future state of multi-object scenes from the discovered slots, RGB images, and robot interactions.
  • results: SlotTransport learns object-centric features that accurately encode both visual and positional information, while SlotGNN accurately predicts slots and their dynamics in downstream robotic tasks such as multi-object rearrangement and long-horizon prediction; with only minimal additional data, the unsupervised approach also transfers to real-world control tasks.
    Abstract Learning multi-object dynamics from visual data using unsupervised techniques is challenging due to the need for robust, object representations that can be learned through robot interactions. This paper presents a novel framework with two new architectures: SlotTransport for discovering object representations from RGB images and SlotGNN for predicting their collective dynamics from RGB images and robot interactions. Our SlotTransport architecture is based on slot attention for unsupervised object discovery and uses a feature transport mechanism to maintain temporal alignment in object-centric representations. This enables the discovery of slots that consistently reflect the composition of multi-object scenes. These slots robustly bind to distinct objects, even under heavy occlusion or absence. Our SlotGNN, a novel unsupervised graph-based dynamics model, predicts the future state of multi-object scenes. SlotGNN learns a graph representation of the scene using the discovered slots from SlotTransport and performs relational and spatial reasoning to predict the future appearance of each slot conditioned on robot actions. We demonstrate the effectiveness of SlotTransport in learning object-centric features that accurately encode both visual and positional information. Further, we highlight the accuracy of SlotGNN in downstream robotic tasks, including challenging multi-object rearrangement and long-horizon prediction. Finally, our unsupervised approach proves effective in the real world. With only minimal additional data, our framework robustly predicts slots and their corresponding dynamics in real-world control tasks.

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

  • paper_url: http://arxiv.org/abs/2310.04610
  • repo_url: None
  • paper_authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens
  • for: exploring how deep learning can be applied across the natural sciences to accelerate scientific exploration and discovery.
  • methods: the DeepSpeed4Science initiative, which builds on DeepSpeed's technology pillars (training, inference, and compression) to create AI system technologies tailored to helping domain experts unlock today's biggest science mysteries.
  • results: early progress, demonstrated by addressing two critical system challenges in structural biology research.
    Abstract In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators

  • paper_url: http://arxiv.org/abs/2310.04607
  • repo_url: None
  • paper_authors: Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka
  • for: evaluating the performance characteristics of different AI accelerator hardware systems when running large language models (LLMs) used to accelerate scientific applications.
  • methods: comparative benchmarking across multiple AI accelerators and GPUs using (i) a micro-benchmark built on a core transformer block, (ii) a GPT-2 model, and (iii) GenSLM, an LLM-driven science use case.
  • results: characterizes how the systems behave on LLM workloads, including the effects of sequence length, scaling behavior, sparsity, and sensitivity to gradient-accumulation steps (a toy timing harness follows this entry).
    Abstract Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications is contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT- 2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, sparsity, and sensitivity to gradient accumulation steps.
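
The snippet below is a hypothetical micro-benchmark in the spirit of the paper's transformer-block study: it times one encoder block at several sequence lengths on whatever device is available. Model size, batch size, and iteration counts are illustrative, not the paper's configuration.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
block = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True).to(device).eval()

for seq_len in (128, 512, 1024):
    x = torch.randn(8, seq_len, 768, device=device)
    with torch.no_grad():
        for _ in range(3):                     # warm-up iterations
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(10):
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()
    print(f"seq_len={seq_len}: {(time.perf_counter() - start) / 10 * 1e3:.1f} ms/iter")
```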

A neuro-symbolic framework for answering conjunctive queries

  • paper_url: http://arxiv.org/abs/2310.04598
  • repo_url: None
  • paper_authors: Pablo Barceló, Tamara Cucumides, Floris Geerts, Juan Reutter, Miguel Romero
  • for: answering arbitrary conjunctive queries over incomplete knowledge graphs
  • methods: approximating cyclic queries with an infinite family of tree-like queries, leveraging existing neuro-symbolic models
  • results: strong guarantees of completeness and optimality, competitive empirical results, and improved performance when queries include existentially quantified variables (a small query illustration follows this entry).
    Abstract The problem of answering logical queries over incomplete knowledge graphs is receiving significant attention in the machine learning community. Neuro-symbolic models are a promising recent approach, showing good performance and allowing for good interpretability properties. These models rely on trained architectures to execute atomic queries, combining them with modules that simulate the symbolic operators in queries. Unfortunately, most neuro-symbolic query processors are limited to the so-called tree-like logical queries that admit a bottom-up execution, where the leaves are constant values or anchors, and the root is the target variable. Tree-like queries, while expressive, fall short of expressing properties in knowledge graphs that are important in practice, such as the existence of multiple edges between entities or the presence of triangles. We propose a framework for answering arbitrary conjunctive queries over incomplete knowledge graphs. The main idea of our method is to approximate a cyclic query by an infinite family of tree-like queries, and then leverage existing models for the latter. Our approximations achieve strong guarantees: they are complete, i.e. there are no false negatives, and optimal, i.e. they provide the best possible approximation using tree-like queries. Our method requires the approximations to be tree-like queries where the leaves are anchors or existentially quantified variables. Hence, we also show how some of the existing neuro-symbolic models can handle these queries, which is of independent interest. Experiments show that our approximation strategy achieves competitive results, and that including queries with existentially quantified variables tends to improve the general performance of these models, both on tree-like queries and on our approximation strategy.
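
To make the tree-like vs. cyclic distinction concrete, here is a small example of our own (not drawn from the paper): a tree-like query can be evaluated bottom-up from an anchor $a$ toward the target variable $x$, whereas a triangle query cannot.

$$q_{\text{tree}}(x) \leftarrow r_1(a, y) \wedge r_2(y, x), \qquad q_{\text{cyclic}}(x) \leftarrow r_1(x, y) \wedge r_2(y, z) \wedge r_3(z, x).$$

The paper's approach approximates queries like $q_{\text{cyclic}}$ by an infinite family of tree-like queries and answers those with existing neuro-symbolic models.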

Segmented Harmonic Loss: Handling Class-Imbalanced Multi-Label Clinical Data for Medical Coding with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04595
  • repo_url: None
  • paper_authors: Surjya Ray, Pratik Mehta, Hongen Zhang, Ada Chaman, Jian Wang, Chung-Jen Ho, Michael Chiou, Tashfeen Suleman
  • for: gauging how well large language models (LLMs) perform medical coding on real-life noisy clinical data.
  • methods: experiments on the MIMIC-III and MIMIC-IV datasets with encoder-based LLMs such as BERT; a new loss function, the Segmented Harmonic Loss, segments and decouples co-occurring classes to address the extreme class imbalance found in most medical data in the multi-label setting, and an embedding-similarity technique handles noisy data.
  • results: trained with the proposed loss, the LLMs achieve significant gains on noisy, long-tailed data, outperforming the state-of-the-art F1 score by over ten percentage points.
    Abstract The precipitous rise and adoption of Large Language Models (LLMs) have shattered expectations with the fastest adoption rate of any consumer-facing technology in history. Healthcare, a field that traditionally uses NLP techniques, was bound to be affected by this meteoric rise. In this paper, we gauge the extent of the impact by evaluating the performance of LLMs for the task of medical coding on real-life noisy data. We conducted several experiments on MIMIC III and IV datasets with encoder-based LLMs, such as BERT. Furthermore, we developed Segmented Harmonic Loss, a new loss function to address the extreme class imbalance that we found to prevail in most medical data in a multi-label scenario by segmenting and decoupling co-occurring classes of the dataset with a new segmentation algorithm. We also devised a technique based on embedding similarity to tackle noisy data. Our experimental results show that when trained with the proposed loss, the LLMs achieve significant performance gains even on noisy long-tailed datasets, outperforming the F1 score of the state-of-the-art by over ten percentage points.

Can pruning make Large Language Models more efficient?

  • paper_url: http://arxiv.org/abs/2310.04573
  • repo_url: None
  • paper_authors: Sia Gholami, Marwan Omar
  • for: investigating weight pruning of Transformer architectures to improve computational efficiency, reduce environmental impact, and ease deployment on resource-limited platforms.
  • methods: evaluates several pruning methodologies (e.g., layer-wise, gradient-based, and hybrid pruning) and their impact on model performance, size, and computational demands.
  • results: with judicious selection of pruning hyperparameters, substantial reductions in model size are attainable without considerable loss of performance; combined with post-pruning fine-tuning, some pruned models even show improved generalization (a minimal pruning recipe follows this entry).
    Abstract Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning-a strategic reduction of model parameters based on their significance-as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled with post-pruning fine-tuning strategies, some pruned models even exhibit enhanced generalization capabilities. This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.
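
As a hedged illustration of one common recipe (global magnitude pruning of the linear layers in a transformer block), the sketch below uses PyTorch's pruning utilities; it does not reproduce the paper's exact methods or hyperparameters.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.TransformerEncoderLayer(d_model=512, nhead=8)
parameters_to_prune = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, nn.Linear)
]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.3,                      # remove 30% of weights by magnitude (illustrative)
)
for module, name in parameters_to_prune:
    prune.remove(module, name)       # make the pruning permanent (zeros baked into weights)
```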

Knolling bot: A Transformer-based Approach to Organizing a Messy Table

  • paper_url: http://arxiv.org/abs/2310.04566
  • repo_url: None
  • paper_authors: Yuhang Hu, Zhizhuo Zhang, Ruibo Liu, Philippe Wyder, Hod Lipson
  • for: equip domestic robots with the ability to perform simple household tidying tasks
  • methods: transformer-based approach that predicts the next position of an item in a sequence of neatly positioned items, integrated with a visual perception model and a physical robot arm
  • results: a machine that declutters and organizes a dozen freeform items of various shapes and sizes
    Abstract In this study, we propose an approach to equip domestic robots with the ability to perform simple household tidying tasks. We focus specifically on 'knolling,' an activity related to organizing scattered items into neat and space-efficient arrangements. Unlike the uniformity of industrial environments, household settings present unique challenges due to their diverse array of items and the subjectivity of tidiness. Here, we draw inspiration from natural language processing (NLP) and utilize a transformer-based approach that predicts the next position of an item in a sequence of neatly positioned items. We integrate the knolling model with a visual perception model and a physical robot arm to demonstrate a machine that declutters and organizes a dozen freeform items of various shapes and sizes.
    摘要 在这项研究中,我们提出了一种方法,以使家庭机器人具备简单的家务整理功能。我们专注于“整理”活动,即将杂乱的物品整理成整洁和高效的排序。与工业环境的统一性不同,家庭环境具有各种不同的物品和整理主观性。我们 Draw inspiration from自然语言处理(NLP),并使用变换器基本方法预测下一个item的位置序列中的整理位置。我们将整理模型与视觉识别模型和物理机器臂集成,以示一种机器人可以整理和组织多达十二种不同形状和大小的自由形态物品的机器人。

Binary Quantification and Dataset Shift: An Experimental Investigation

  • paper_url: http://arxiv.org/abs/2310.04565
  • repo_url: https://github.com/pglez82/quant_datasetshift
  • paper_authors: Pablo González, Alejandro Moreo, Fabrizio Sebastiani
  • for: investigating how existing quantification methods perform under different types of dataset shift, in order to identify limitations of current approaches and pave the way for more broadly applicable methods.
  • methods: proposes a fine-grained taxonomy of dataset shift types, establishes protocols for generating datasets affected by each type, and tests existing quantification methods on the datasets thus generated.
  • results: many quantification methods previously found robust to prior probability shift are not necessarily robust to other types of dataset shift, and no existing method is robust to all the shift types simulated in the experiments (a prevalence-sampling sketch follows this entry).
    Abstract Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift, and by testing existing quantification methods on the datasets thus generated. One finding that results from this investigation is that many existing quantification methods that had been found robust to prior probability shift are not necessarily robust to other types of dataset shift. A second finding is that no existing quantification method seems to be robust enough to dealing with all the types of dataset shift we simulate in our experiments. The code needed to reproduce all our experiments is publicly available at https://github.com/pglez82/quant_datasetshift.
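
For reference, the snippet below shows the standard way to simulate one of the shift types studied, prior probability shift, by resampling a test set to a target class prevalence. It is illustrative only; the paper's protocols cover several other shift types (see the linked repository).

```python
import numpy as np

def sample_at_prevalence(X, y, prevalence, n, rng=None):
    # Draw a test sample of size n whose positive-class prevalence equals `prevalence`.
    rng = rng or np.random.default_rng(0)
    pos_idx, neg_idx = np.where(y == 1)[0], np.where(y == 0)[0]
    n_pos = int(round(prevalence * n))
    idx = np.concatenate([rng.choice(pos_idx, n_pos, replace=True),
                          rng.choice(neg_idx, n - n_pos, replace=True)])
    return X[idx], y[idx]

X = np.random.randn(1000, 5)
y = (np.random.rand(1000) < 0.5).astype(int)
X_test, y_test = sample_at_prevalence(X, y, prevalence=0.9, n=200)  # shifted test set
```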

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04564
  • repo_url: None
  • paper_authors: Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
  • for: investigating the use of the ReLU activation in large language models (LLMs) to improve efficiency and reduce inference cost on resource-constrained devices.
  • methods: examines reinstating ReLU activations in LLMs, how such models converge and perform, and how ReLU compares with alternative activations such as GELU and SiLU.
  • results: using ReLU has a negligible impact on convergence and performance while significantly reducing computation and weight transfer; exploiting the resulting activation sparsity yields practical strategies that cut LLM inference computation by up to three times with minimal performance trade-offs (a small sparsity demo follows this entry).
    Abstract Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstating ReLU activation in LLMs. We demonstrate that using the ReLU activation function has a negligible impact on convergence and performance while significantly reducing computation and weight transfer. This reduction is particularly valuable during the memory-bound inference step, where efficiency is paramount. Exploring sparsity patterns in ReLU-based LLMs, we unveil the reutilization of activated neurons for generating new tokens and leveraging these insights, we propose practical strategies to substantially reduce LLM inference computation up to three times, using ReLU activations with minimal performance trade-offs.
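
The toy demo below illustrates the activation-sparsity idea behind the paper: with a ReLU feed-forward block, many activations are exactly zero for a given token, so the second projection only needs the rows of active neurons. It simply measures sparsity and checks the equivalence; it is not the paper's inference pipeline, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048
ffn_in, ffn_out = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)

x = torch.randn(d_model)                 # a single token's hidden state
h = torch.relu(ffn_in(x))
print(f"zero activations: {(h == 0).float().mean():.1%}")   # roughly half at random init

active = h > 0                           # neurons that actually fired for this token
y_sparse = h[active] @ ffn_out.weight[:, active].T + ffn_out.bias
assert torch.allclose(y_sparse, ffn_out(h), atol=1e-4)       # only active rows were needed
```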

Towards Foundation Models for Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2310.04562
  • repo_url: https://github.com/DeepGraphLearning/ULTRA
  • paper_authors: Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, Zhaocheng Zhu
  • for: building foundation models for knowledge graph reasoning, i.e., models that, like foundation models in language and vision, can run inference on any input graph.
  • methods: ULTRA, an approach that learns universal and transferable graph representations by building relational representations as a function conditioned on their interactions, so that a pre-trained model generalizes inductively to any unseen knowledge graph with arbitrary entity and relation vocabularies and can be fine-tuned on any graph.
  • results: a single pre-trained ULTRA model performs zero-shot inductive inference on 57 different knowledge graphs of various sizes, often on par with or better than strong baselines trained on the specific graphs; fine-tuning further boosts performance.
    Abstract Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to the transferable representations such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational representations as a function conditioned on their interactions. Such a conditioning strategy allows a pre-trained ULTRA model to inductively generalize to any unseen KG with any relation vocabulary and to be fine-tuned on any graph. Conducting link prediction experiments on 57 different KGs, we find that the zero-shot inductive inference performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par or better than strong baselines trained on specific graphs. Fine-tuning further boosts the performance.

Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

  • paper_url: http://arxiv.org/abs/2310.04521
  • repo_url: None
  • paper_authors: Tzu-Yuan Lin, Minghan Zhu, Maani Ghaffari
  • for: proposing an adjoint-equivariant neural network that takes Lie algebra data as input, i.e., inputs that are themselves transformations between vector spaces.
  • methods: the model captures adjoint equivariance through the conjugation relation induced by changes of basis, and leverages the invariance of the Killing form so that the framework applies to arbitrary semisimple Lie algebras; its structure can be viewed as a Lie-algebraic generalization of a multi-layer perceptron (MLP).
  • results: demonstrated on homography modeling with the sl(3) Lie algebra, illustrating the framework's generality (the equivariance property is written out below).
    Abstract This paper proposes an adjoint-equivariant neural network that takes Lie algebra data as input. Various types of equivariant neural networks have been proposed in the literature, which treat the input data as elements in a vector space carrying certain types of transformations. In comparison, we aim to process inputs that are transformations between vector spaces. The change of basis on transformation is described by conjugations, inducing the adjoint-equivariance relationship that our model is designed to capture. Leveraging the invariance property of the Killing form, the proposed network is a general framework that works for arbitrary semisimple Lie algebras. Our network possesses a simple structure that can be viewed as a Lie algebraic generalization of a multi-layer perceptron (MLP). This work extends the application of equivariant feature learning. As an example, we showcase its value in homography modeling using sl(3) Lie algebra.
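
For reference, the adjoint action and the invariance the abstract mentions can be written with standard definitions (the notation here is ours, not taken from the paper):

$$\mathrm{Ad}_g(X) = g X g^{-1}, \qquad f(\mathrm{Ad}_g X) = \mathrm{Ad}_g\, f(X) \quad \text{for all } g \in G,\ X \in \mathfrak{g},$$

$$B(X, Y) = \operatorname{tr}(\mathrm{ad}_X\, \mathrm{ad}_Y), \qquad B(\mathrm{Ad}_g X,\ \mathrm{Ad}_g Y) = B(X, Y).$$

The first line is the adjoint-equivariance constraint the layers are designed to satisfy; the second is the Killing-form invariance that lets the construction extend to any semisimple Lie algebra.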

Utilizing Free Clients in Federated Learning for Focused Model Enhancement

  • paper_url: http://arxiv.org/abs/2310.04515
  • repo_url: None
  • paper_authors: Aditya Narayan Ravi, Ilan Shomorony
  • for: addressing how to choose and incentivize non-priority clients in Prioritized Federated Learning (FL), a setting where the goal is a weighted mean objective over a designated subset of priority clients.
  • methods: FedALIGN uses a matching strategy that selects non-priority clients according to how similar the model's loss on their data is to its loss on the global data, so that non-priority gradients are used only when beneficial to the priority clients.
  • results: faster convergence and higher test accuracy than baselines across various synthetic and benchmark datasets (a toy selection-rule sketch follows this entry).
    Abstract Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data, without the need for clients to share their data. Many existing FL approaches assume that all clients have equal importance and construct a global objective based on all clients. We consider a version of FL we call Prioritized FL, where the goal is to learn a weighted mean objective of a subset of clients, designated as priority clients. An important question arises: How do we choose and incentivize well aligned non priority clients to participate in the federation, while discarding misaligned clients? We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge. The algorithm employs a matching strategy that chooses non priority clients based on how similar the models loss is on their data compared to the global data, thereby ensuring the use of non priority client gradients only when it is beneficial for priority clients. This approach ensures mutual benefits as non priority clients are motivated to join when the model performs satisfactorily on their data, and priority clients can utilize their updates and computational resources when their goals align. We present a convergence analysis that quantifies the trade off between client selection and speed of convergence. Our algorithm shows faster convergence and higher test accuracy than baselines for various synthetic and benchmark datasets.
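
A hypothetical sketch of a FedALIGN-style matching rule is shown below: keep a non-priority client only if the current global model's loss on its data is close to the loss on the priority clients' data. The `tolerance` threshold and the numbers are illustrative assumptions, not values from the paper.

```python
def select_aligned_clients(client_losses, priority_loss, tolerance=0.1):
    # Keep non-priority clients whose local loss is within a relative
    # tolerance of the loss measured on the priority (global) data.
    return [cid for cid, loss in client_losses.items()
            if abs(loss - priority_loss) <= tolerance * priority_loss]

client_losses = {"c1": 0.42, "c2": 0.95, "c3": 0.47}
print(select_aligned_clients(client_losses, priority_loss=0.45))  # ['c1', 'c3']
```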

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

  • paper_url: http://arxiv.org/abs/2310.04413
  • repo_url: https://github.com/Improbable-AI/dw-offline-rl
  • paper_authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal
  • for: proposing a new offline policy learning approach for datasets dominated by suboptimal (imbalanced) trajectories.
  • methods: a sampling strategy that constrains the policy only to "good data" rather than to all actions in the dataset (i.e., instead of uniform sampling), implemented as a plug-and-play module for standard offline RL algorithms.
  • results: significant performance gains on 72 imbalanced datasets, on the D4RL benchmark, and across three different offline RL algorithms (an illustrative weighted-sampling sketch follows this entry).
    Abstract Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
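
The snippet below shows one simple way to bias minibatch sampling toward higher-return trajectories instead of sampling uniformly. It is an illustrative stand-in, not the weighting scheme from the paper (see the linked repository for that), and the temperature is a made-up knob.

```python
import numpy as np

def return_weighted_sampler(trajectory_returns, batch_size, temperature=1.0, rng=None):
    # Softmax weights over trajectory returns (numerically stabilized),
    # then sample trajectory indices proportionally to those weights.
    rng = rng or np.random.default_rng(0)
    r = np.asarray(trajectory_returns, dtype=float)
    w = np.exp((r - r.max()) / temperature)
    p = w / w.sum()
    return rng.choice(len(r), size=batch_size, p=p)

print(return_weighted_sampler([10.0, 200.0, 55.0], batch_size=5))
```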

Policy-Gradient Training of Language Models for Ranking

  • paper_url: http://arxiv.org/abs/2310.04407
  • repo_url: None
  • paper_authors: Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims
  • for: This paper aims to improve the training of text retrieval models for decision-making systems by introducing a novel training algorithm called Neural PG-RANK.
  • methods: Neural PG-RANK uses a Plackett-Luce ranking policy to learn to rank, which is a principled method that relies little on complex heuristics. The algorithm unifies the training objective with downstream decision-making quality.
  • results: The paper presents extensive experiments on various text retrieval benchmarks, showing that Neural PG-RANK achieves remarkable in-domain performance improvement and substantial out-of-domain generalization to some critical datasets used in downstream question answering tasks (a Plackett-Luce sketch follows this entry).
    Abstract Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss itself is heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating a LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for end-to-end training of retrieval models as part of larger decision systems via policy gradient, with little reliance on complex heuristics, and it effectively unifies the training objective with downstream decision-making quality. We conduct extensive experiments on various text retrieval benchmarks. The results demonstrate that when the training objective aligns with the evaluation setup, Neural PG-RANK yields remarkable in-domain performance improvement, with substantial out-of-domain generalization to some critical datasets employed in downstream question answering tasks.
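
The sketch below shows the standard Plackett-Luce ranking log-likelihood and a single REINFORCE-style gradient step on a toy query; it is a minimal illustration of the kind of policy-gradient update the abstract describes, not a reproduction of the paper's full training loop (sampling, utility estimation, and variance reduction are omitted).

```python
import torch

def plackett_luce_log_prob(scores: torch.Tensor, ranking: torch.Tensor) -> torch.Tensor:
    # Log-probability of `ranking` (a permutation of document indices) under a
    # Plackett-Luce model with the given per-document scores.
    s = scores[ranking]                                        # scores in ranked order
    tail = torch.logcumsumexp(torch.flip(s, dims=[0]), dim=0)  # logsumexp over each suffix
    tail = torch.flip(tail, dims=[0])
    return (s - tail).sum()

scores = torch.randn(4, requires_grad=True)   # would come from the LLM-based scorer
ranking = torch.randperm(4)                   # stand-in for a ranking sampled from the policy
utility = 0.7                                 # downstream quality of that ranking, e.g. NDCG
loss = -utility * plackett_luce_log_prob(scores, ranking)
loss.backward()                               # ascend the policy gradient of expected utility
```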

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

  • paper_url: http://arxiv.org/abs/2310.04406
  • repo_url: https://github.com/andyz245/LanguageAgentTreeSearch
  • paper_authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
  • for: improving large language model (LLM) performance on decision-making tasks and broadening their deployment as autonomous agents.
  • methods: LATS (Language Agent Tree Search), a framework inspired by Monte Carlo tree search that repurposes LLMs as agents, value functions, and optimizers, and uses an environment for external feedback to obtain a more deliberate and adaptive problem-solving mechanism than existing techniques.
  • results: experiments across programming, HotPotQA, and WebShop demonstrate the effectiveness and generality of LATS for both reasoning and acting, e.g., 94.4% on HumanEval with GPT-4 and an average score of 75.9 on WebShop with GPT-3.5 (a skeletal search-loop sketch follows this entry).
    Abstract While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4\% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.
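
Below is a skeletal MCTS loop in the spirit of LATS. The callbacks `propose_actions`, `evaluate`, and `env.step` stand in for LLM calls and environment feedback and are our assumptions, not the paper's interfaces (see the linked repository for the real implementation).

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.0):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def lats_step(root, env, propose_actions, evaluate, n_sim=10):
    for _ in range(n_sim):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=ucb)
        for action in propose_actions(node.state): # expansion via proposed actions
            node.children.append(Node(env.step(node.state, action), parent=node))
        leaf = random.choice(node.children) if node.children else node
        reward = evaluate(leaf.state)              # value estimate / environment feedback
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits)

# toy usage with stubbed callbacks
class CountEnv:
    def step(self, state, action): return state + action

best = lats_step(Node(0), CountEnv(),
                 propose_actions=lambda s: [1, 2],
                 evaluate=lambda s: float(s))
print(best.state)
```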

Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

  • paper_url: http://arxiv.org/abs/2310.04395
  • repo_url: None
  • paper_authors: Marvin Schmitt, Daniel Habermann, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev
  • for: improving the efficiency and accuracy of amortized Bayesian inference (ABI) by exploiting universal symmetries of the probabilistic joint model $p(\theta, y)$ of parameters $\theta$ and data $y$.
  • methods: inverts Bayes' theorem to estimate the marginal likelihood from approximate representations of the joint model; under a perfect approximation the marginal likelihood is constant across parameter values, so the variance introduced by approximation error across different parameter values is turned into a loss that accelerates the learning dynamics of conditional neural density estimators.
  • results: applied to a bimodal toy problem with an explicit likelihood and to a realistic model with an implicit (simulation-based) likelihood, showing improved data efficiency and accuracy (the symmetry is written out below).
    Abstract We propose a method to improve the efficiency and accuracy of amortized Bayesian inference (ABI) by leveraging universal symmetries in the probabilistic joint model $p(\theta, y)$ of parameters $\theta$ and data $y$. In a nutshell, we invert Bayes' theorem and estimate the marginal likelihood based on approximate representations of the joint model. Upon perfect approximation, the marginal likelihood is constant across all parameter values by definition. However, approximation error leads to undesirable variance in the marginal likelihood estimates across different parameter values. We formulate violations of this symmetry as a loss function to accelerate the learning dynamics of conditional neural density estimators. We apply our method to a bimodal toy problem with an explicit likelihood (likelihood-based) and a realistic model with an implicit likelihood (simulation-based).
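
One way to write the symmetry the abstract exploits (our notation; $q_\phi$ denotes the learned approximate posterior):

$$\log p(y) = \log p(\theta, y) - \log p(\theta \mid y) \quad \text{for every } \theta,$$

so the estimate $\log p(\theta, y) - \log q_\phi(\theta \mid y)$ should not vary with $\theta$; its variance over sampled parameter values is what the self-consistency loss penalizes during training of the conditional neural density estimator.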

FMM-Head: Enhancing Autoencoder-based ECG anomaly detection with prior knowledge

  • paper_url: http://arxiv.org/abs/2310.05848
  • repo_url: None
  • paper_authors: Giacomo Verardo, Magnus Boman, Samuel Bruchfeld, Marco Chiesa, Sabine Koch, Gerald Q. Maguire Jr., Dejan Kostic
  • for: detecting anomalies in electrocardiogram (ECG) data, so that deviations from normal heartbeat patterns are identified and at-risk patients receive timely intervention.
  • methods: prior work tackles the task with various autoencoder (AE) models that ignore the specific patterns of ECG leads and act as unexplainable black boxes; here the AE's decoder is replaced with a reconstruction head (FMM-Head) built on prior knowledge of ECG waveform shape, improving anomaly detection.
  • results: the model consistently outperforms state-of-the-art models, with up to a 0.31 increase in AUROC, at roughly half the original model size and with explainable extracted features; its processing time is four orders of magnitude lower than solving an optimization problem for the same parameters, making it suitable for real-time ECG parameter extraction and anomaly detection.
    Abstract Detecting anomalies in electrocardiogram data is crucial to identifying deviations from normal heartbeat patterns and providing timely intervention to at-risk patients. Various AutoEncoder models (AE) have been proposed to tackle the anomaly detection task with ML. However, these models do not consider the specific patterns of ECG leads and are unexplainable black boxes. In contrast, we replace the decoding part of the AE with a reconstruction head (namely, FMM-Head) based on prior knowledge of the ECG shape. Our model consistently achieves higher anomaly detection capabilities than state-of-the-art models, up to 0.31 increase in area under the ROC curve (AUROC), with as little as half the original model size and explainable extracted features. The processing time of our model is four orders of magnitude lower than solving an optimization problem to obtain the same parameters, thus making it suitable for real-time ECG parameters extraction and anomaly detection.

Hermes: Unlocking Security Analysis of Cellular Network Protocols by Synthesizing Finite State Machines from Natural Language Specifications

  • paper_url: http://arxiv.org/abs/2310.04381
  • repo_url: https://github.com/synsec-den/hermes-spec-to-fsm
  • paper_authors: Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ranjbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, Syed Rafiul Hussain
  • for: presenting Hermes, an end-to-end framework that automatically generates formal representations from natural-language cellular specifications.
  • methods: a neural constituency parser, NEUTREX, processes transition-relevant text and extracts transition components (states, conditions, and actions); a domain-specific language translates these components into logical formulas by leveraging dependency parse trees; the formulas are then compiled into transitions to build formal models as finite state machines.
  • results: evaluated on the 4G NAS, 5G NAS, and 5G RRC specifications, Hermes achieves 81-87% overall accuracy, a substantial improvement over the state of the art; security analysis of the extracted models uncovers 3 new vulnerabilities, identifies 19 previously known attacks in the 4G and 5G specifications, and finds 7 deviations in commercial 4G basebands (a toy FSM illustration follows this entry).
    Abstract In this paper, we present Hermes, an end-to-end framework to automatically generate formal representations from natural language cellular specifications. We first develop a neural constituency parser, NEUTREX, to process transition-relevant texts and extract transition components (i.e., states, conditions, and actions). We also design a domain-specific language to translate these transition components to logical formulas by leveraging dependency parse trees. Finally, we compile these logical formulas to generate transitions and create the formal model as finite state machines. To demonstrate the effectiveness of Hermes, we evaluate it on 4G NAS, 5G NAS, and 5G RRC specifications and obtain an overall accuracy of 81-87%, which is a substantial improvement over the state-of-the-art. Our security analysis of the extracted models uncovers 3 new vulnerabilities and identifies 19 previous attacks in 4G and 5G specifications, and 7 deviations in commercial 4G basebands.
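
For a feel of the kind of artifact such a pipeline produces, the sketch below shows extracted (state, condition, action) triples compiled into a tiny finite state machine. The states and events are invented placeholders, not taken from the 4G/5G specifications or from the paper's DSL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    condition: str
    action: str

# transition table keyed by (current state, triggering condition)
fsm = {
    ("DEREGISTERED", "registration_request_accepted"):
        Transition("DEREGISTERED", "REGISTERED",
                   "registration_request_accepted", "send_registration_complete"),
}

def step(state: str, event: str) -> str:
    t = fsm.get((state, event))
    return t.target if t else state   # stay in the same state on unmatched events

print(step("DEREGISTERED", "registration_request_accepted"))  # REGISTERED
```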

Confronting Reward Model Overoptimization with Constrained RLHF

  • paper_url: http://arxiv.org/abs/2310.04373
  • repo_url: https://github.com/tedmoskovitz/constrainedrl4lms
  • paper_authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer
  • for: confronting reward-model overoptimization when reward is derived from a composition of simpler reward models, by appropriately weighting the component reward models (RMs).
  • methods: constrained reinforcement learning, with dynamic weights learned as Lagrange multipliers, preventing the agent from exceeding each component RM's threshold of usefulness.
  • results: the correlation between component RMs has a significant effect on where overoptimization sets in; the constrained approach keeps each RM within the range where it is an effective proxy, and an adaptive gradient-free method can identify and optimize toward these points during a single run (a toy dual-update sketch follows this entry).
    Abstract Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriately weight these component RMs when combining them. Compounding this difficulty, because any RM is only a proxy for human evaluation, this process is vulnerable to $\textit{overoptimization}$, wherein past a certain point, accumulating higher reward is associated with worse human ratings. In this paper, we perform, to our knowledge, the first study on overoptimization in composite RMs, showing that correlation between component RMs has a significant effect on the locations of these points. We then introduce an approach to solve this issue using constrained reinforcement learning as a means of preventing the agent from exceeding each RM's threshold of usefulness. Our method addresses the problem of weighting component RMs by learning dynamic weights, naturally expressed by Lagrange multipliers. As a result, each RM stays within the range at which it is an effective proxy, improving evaluation performance. Finally, we introduce an adaptive method using gradient-free optimization to identify and optimize towards these points during a single run.
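
A hedged sketch of the constraint idea appears below: Lagrange multipliers, updated by dual gradient ascent, grow while a component reward model is below its usefulness threshold and shrink once the constraint is met. The thresholds, learning rate, and the scoring stub are illustrative assumptions, not values or interfaces from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
thresholds = np.array([0.6, 0.5])     # hypothetical per-RM usefulness thresholds
lams = np.zeros_like(thresholds)      # one Lagrange multiplier per constrained RM
lr_dual = 0.05

def run_policy_and_score() -> np.ndarray:
    # Stub standing in for rolling out the policy and scoring it with each RM.
    return rng.uniform(0.3, 0.9, size=thresholds.shape)

for step in range(200):
    rm_values = run_policy_and_score()
    weighted_reward = float(np.dot(lams, rm_values))  # what the RL (e.g. PPO) step would maximize
    # Dual gradient ascent on the multipliers, projected onto the nonnegative orthant.
    lams = np.clip(lams + lr_dual * (thresholds - rm_values), 0.0, None)

print("learned multipliers:", lams)
```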

A Language-Agent Approach to Formal Theorem-Proving

  • paper_url: http://arxiv.org/abs/2310.04353
  • repo_url: https://github.com/trishullab/copra
  • paper_authors: Amitayush Thakur, Yeming Wen, Swarat Chaudhuri
  • for: developing a language-agent approach, built on a large language model (LLM), for formal theorem-proving.
  • methods: COPRA uses a high-capacity black-box LLM (GPT-4) as part of a policy in a stateful backtracking search; the policy selects proof tactics and retrieves lemmas and definitions from an external database, each selected tactic is executed in the underlying proof framework, and the execution feedback builds the prompt for the next policy invocation; the search also tracks selected history to reduce hallucinations and unnecessary LLM queries.
  • results: on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project, COPRA finds correct proofs faster than one-shot GPT-4 invocations and than state-of-the-art models fine-tuned on proof data.
    Abstract Language agents, which use a large language model (LLM) capable of in-context learning to interact with an external environment, have recently emerged as a promising approach to control tasks. We present the first language-agent approach to formal theorem-proving. Our method, COPRA, uses a high-capacity, black-box LLM (GPT-4) as part of a policy for a stateful backtracking search. During the search, the policy can select proof tactics and retrieve lemmas and definitions from an external database. Each selected tactic is executed in the underlying proof framework, and the execution feedback is used to build the prompt for the next policy invocation. The search also tracks selected information from its history and uses it to reduce hallucinations and unnecessary LLM queries. We evaluate COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the Compcert project. On these benchmarks, COPRA is significantly better than one-shot invocations of GPT-4, as well as state-of-the-art models fine-tuned on proof data, at finding correct proofs quickly.

Neur2RO: Neural Two-Stage Robust Optimization

  • paper_url: http://arxiv.org/abs/2310.04345
  • repo_url: https://github.com/khalil-research/neur2ro
  • paper_authors: Justin Dumouchelle, Esther Julien, Jannis Kurtz, Elias B. Khalil
  • for: proposing Neur2RO, an efficient machine-learning-driven algorithm for two-stage robust optimization (2RO), i.e., decision-making under worst-case uncertainty.
  • methods: a learning-augmented column-and-constraint generation (CCG) algorithm in which a neural network, designed to be easy to optimize over, estimates the value function of the second-stage problem and is embedded into CCG.
  • results: on knapsack and capital budgeting benchmarks, Neur2RO quickly finds high-quality solutions: within roughly 2% of the best-known knapsack values in seconds versus three hours for the state-of-the-art exact branch-and-price algorithm (with even better solutions on larger, more complex instances), and it outperforms three k-adaptability variants on capital budgeting, with a 5- to 10-fold reduction in solution time on the largest instances (the generic 2RO formulation is written out below).
    Abstract Robust optimization provides a mathematical framework for modeling and solving decision-making problems under worst-case uncertainty. This work addresses two-stage robust optimization (2RO) problems (also called adjustable robust optimization), wherein first-stage and second-stage decisions are made before and after uncertainty is realized, respectively. This results in a nested min-max-min optimization problem which is extremely challenging computationally, especially when the decisions are discrete. We propose Neur2RO, an efficient machine learning-driven instantiation of column-and-constraint generation (CCG), a classical iterative algorithm for 2RO. Specifically, we learn to estimate the value function of the second-stage problem via a novel neural network architecture that is easy to optimize over by design. Embedding our neural network into CCG yields high-quality solutions quickly as evidenced by experiments on two 2RO benchmarks, knapsack and capital budgeting. For knapsack, Neur2RO finds solutions that are within roughly $2\%$ of the best-known values in a few seconds compared to the three hours of the state-of-the-art exact branch-and-price algorithm; for larger and more complex instances, Neur2RO finds even better solutions. For capital budgeting, Neur2RO outperforms three variants of the $k$-adaptability algorithm, particularly on the largest instances, with a 5 to 10-fold reduction in solution time. Our code and data are available at https://github.com/khalil-research/Neur2RO.
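
For context, the nested min-max-min structure the abstract refers to is the generic two-stage robust optimization problem (standard formulation, with our notation for costs and the uncertainty set):

$$\min_{x \in \mathcal{X}} \; \max_{\xi \in \Xi} \; \min_{y \in \mathcal{Y}(x,\xi)} \; c^\top x + q(\xi)^\top y,$$

where $x$ is the first-stage (here-and-now) decision, $\xi$ the realized uncertainty, and $y$ the second-stage (wait-and-see) recourse; Neur2RO learns to estimate the inner second-stage value so that the CCG iterations become cheap.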

T-Rep: Representation Learning for Time Series using Time-Embeddings

  • paper_url: http://arxiv.org/abs/2310.04486
  • repo_url: None
  • paper_authors: Archibald Fraikin, Adrien Bennetot, Stéphanie Allassonnière
  • for: addressing the challenges multivariate time series pose for machine learning: they are often unlabeled, high-dimensional, noisy, and contain missing data.
  • methods: T-Rep, a self-supervised method that learns time-series representations at a timestep granularity; it learns vector embeddings of time alongside the feature extractor and uses them in pretext tasks to capture temporal features such as trend, periodicity, and distribution shifts, while reinforcing robustness to missing data.
  • results: outperforms existing self-supervised time-series methods on downstream classification, forecasting, and anomaly detection; proves more resilient in missing-data regimes; and latent-space visualizations highlight the interpretability of the learned representations.
    Abstract Multivariate time series present challenges to standard machine learning techniques, as they are often unlabeled, high dimensional, noisy, and contain missing data. To address this, we propose T-Rep, a self-supervised method to learn time series representations at a timestep granularity. T-Rep learns vector embeddings of time alongside its feature extractor, to extract temporal features such as trend, periodicity, or distribution shifts from the signal. These time-embeddings are leveraged in pretext tasks, to incorporate smooth and fine-grained temporal dependencies in the representations, as well as reinforce robustness to missing data. We evaluate T-Rep on downstream classification, forecasting, and anomaly detection tasks. It is compared to existing self-supervised algorithms for time series, which it outperforms in all three tasks. We test T-Rep in missing data regimes, where it proves more resilient than its counterparts. Finally, we provide latent space visualisation experiments, highlighting the interpretability of the learned representations.
    摘要 多变量时间序列对标准机器学习技术提出了挑战,因为它们通常无标签、高维、含噪且存在缺失数据。为解决这一问题,我们提出了T-Rep,一种在时间步粒度上学习时间序列表示的自监督方法。T-Rep在特征提取器之外同时学习时间的向量嵌入,用于从信号中提取趋势、周期性和分布漂移等时间特征。这些时间嵌入被用于预训练(pretext)任务中,使表示包含平滑且细粒度的时间依赖,并增强对缺失数据的鲁棒性。我们在下游分类、预测和异常检测任务上评估了T-Rep,并与现有的时间序列自监督算法进行比较,T-Rep在三类任务中均表现更优;在缺失数据场景下,它也比同类方法更加稳健。最后,我们给出了潜在空间可视化实验,说明所学表示具有可解释性。
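
A minimal PyTorch sketch of the core idea, learning a vector embedding of the timestep alongside the feature encoder; the module names and sizes are illustrative assumptions, not T-Rep's actual architecture.

```python
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Illustrative time-embedding: maps a scalar timestep to a learned vector."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, t):                      # t: (batch, seq_len) timestep indices
        return self.proj(t.unsqueeze(-1).float())

class TimeAwareEncoder(nn.Module):
    """Concatenates time-embeddings with per-step features before encoding."""
    def __init__(self, n_features: int, repr_dim: int = 64, time_dim: int = 16):
        super().__init__()
        self.time_emb = TimeEmbedding(time_dim)
        self.encoder = nn.GRU(n_features + time_dim, repr_dim, batch_first=True)

    def forward(self, x, t):                   # x: (batch, seq_len, n_features)
        z, _ = self.encoder(torch.cat([x, self.time_emb(t)], dim=-1))
        return z                               # per-timestep representations
```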

Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

  • paper_url: http://arxiv.org/abs/2310.04323
  • repo_url: None
  • paper_authors: Yuxin Pan, Yize Chen, Fangzhen Lin
  • for: 为在线三维装箱问题(3D-BPP)设计有效策略长期以来都是一个挑战,原因在于到达箱子序列的不可预测性以及严格的物理约束。
  • methods: 我们首先引入一种基于排列的攻击者,用以考察现有DRL方法和启发式方法在求解在线3D-BPP时的实际鲁棒性;随后提出可调鲁棒强化学习(AR2L)框架,可以高效地调整鲁棒性权重,使策略在平均情形与最坏情形环境之间取得期望的平衡。
  • results: 实验表明,AR2L是一种通用的框架,能够提升策略的鲁棒性,同时将名义(平均)情形下的性能保持在可接受的水平。
    Abstract Designing effective policies for the online 3D bin packing problem (3D-BPP) has been a long-standing challenge, primarily due to the unpredictable nature of incoming box sequences and stringent physical constraints. While current deep reinforcement learning (DRL) methods for online 3D-BPP have shown promising results in optimizing average performance over an underlying box sequence distribution, they often fail in real-world settings where some worst-case scenarios can materialize. Standard robust DRL algorithms tend to overly prioritize optimizing the worst-case performance at the expense of performance under normal problem instance distribution. To address these issues, we first introduce a permutation-based attacker to investigate the practical robustness of both DRL-based and heuristic methods proposed for solving online 3D-BPP. Then, we propose an adjustable robust reinforcement learning (AR2L) framework that allows efficient adjustment of robustness weights to achieve the desired balance of the policy's performance in average and worst-case environments. Specifically, we formulate the objective function as a weighted sum of expected and worst-case returns, and derive the lower performance bound by relating to the return under a mixture dynamics. To realize this lower bound, we adopt an iterative procedure that searches for the associated mixture dynamics and improves the corresponding policy. We integrate this procedure into two popular robust adversarial algorithms to develop the exact and approximate AR2L algorithms. Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance for the nominal case.
    摘要 为在线三维装箱问题(3D-BPP)设计有效策略一直是一个长期挑战,主要原因是到达箱子序列的不可预测性以及严格的物理约束。现有针对在线3D-BPP的深度强化学习(DRL)方法在给定箱子序列分布下优化平均性能方面已取得不错的结果,但在实际场景中一旦出现某些最坏情况便常常失效;标准的鲁棒DRL算法又往往过度优先最坏情形性能,从而牺牲了正常问题实例分布下的表现。为解决这些问题,我们首先引入一种基于排列的攻击者,用以考察现有DRL方法与启发式方法在求解在线3D-BPP时的实际鲁棒性;随后提出可调鲁棒强化学习(AR2L)框架,能够高效地调整鲁棒性权重,在平均情形与最坏情形环境之间实现期望的性能平衡。具体而言,我们将目标函数表述为期望回报与最坏情形回报的加权和,并通过与混合动力学下的回报建立联系推导出性能下界;为实现该下界,我们采用迭代过程搜索对应的混合动力学并改进相应策略。我们将该过程整合进两种流行的鲁棒对抗算法中,得到精确与近似两个版本的AR2L算法。实验表明,AR2L具有通用性:它在提升策略鲁棒性的同时,使名义情形下的性能保持在可接受的水平。
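
The adjustable objective can be pictured as a weighted sum of average-case and worst-case returns; the sketch below assumes generic `rollout`, `nominal_env`, and `adversarial_env` helpers and is not the paper's implementation.

```python
def ar2l_objective(policy, nominal_env, adversarial_env, rollout, alpha=0.7, n=32):
    """Illustrative AR2L-style objective: a weighted sum of average-case return
    (nominal box sequences) and worst-case return (permutation-based attacker).
    `rollout(policy, env)` returning an episode return is an assumed helper."""
    avg_return = sum(rollout(policy, nominal_env) for _ in range(n)) / n
    worst_return = min(rollout(policy, adversarial_env) for _ in range(n))
    return alpha * avg_return + (1.0 - alpha) * worst_return
```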

Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning

  • paper_url: http://arxiv.org/abs/2310.04306
  • repo_url: None
  • paper_authors: Qing Zhu, Qirong Mao, Jialin Zhang, Xiaohua Huang, Wenming Zheng
  • for: 这篇论文是为了提出一种能够在不约束环境下更好地进行人群情感识别(GER)的方法。
  • methods: 该方法使用了不确定性感知(UAL)技术,通过显式地模型每个个体的不确定性,使用 Gaussian 分布中的杂态 embedding 来捕捉每个个体的可能性。在推理阶段,通过这种杂态性,生成多种情感预测。此外,还开发了一个图像增强模块,以提高模型对严重噪声的抗颤响性。
  • results: 实验结果表明,该方法在三个通用的数据库上达到了高效性和普适性。
    Abstract Group-level emotion recognition (GER) is an inseparable part of human behavior analysis, aiming to recognize an overall emotion in a multi-person scene. However, the existing methods are devoted to combing diverse emotion cues while ignoring the inherent uncertainties under unconstrained environments, such as congestion and occlusion occurring within a group. Additionally, since only group-level labels are available, inconsistent emotion predictions among individuals in one group can confuse the network. In this paper, we propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER. By explicitly modeling the uncertainty of each individual, we utilize stochastic embedding drawn from a Gaussian distribution instead of deterministic point embedding. This representation captures the probabilities of different emotions and generates diverse predictions through this stochasticity during the inference stage. Furthermore, uncertainty-sensitive scores are adaptively assigned as the fusion weights of individuals' face within each group. Moreover, we develop an image enhancement module to enhance the model's robustness against severe noise. The overall three-branch model, encompassing face, object, and scene component, is guided by a proportional-weighted fusion strategy and integrates the proposed uncertainty-aware method to produce the final group-level output. Experimental results demonstrate the effectiveness and generalization ability of our method across three widely used databases.
    摘要 group-level emotion recognition (GER) 是人类行为分析中不可或缺的一部分,旨在在多人场景中识别总体的情感。然而,现有的方法均是通过结合多种情感迹象来实现,而忽略了无结构环境中的自然不确定性,如群体中的堵塞和遮挡。此外,只有群体级别的标签可用,因此在同一个群体中不一致的情感预测可能会混淆网络。在这篇论文中,我们提出了一种不确定性意识学习(UAL)方法,以提取更加稳定的表示 для GER。我们通过显式地模型每个个体的不确定性,使用 Gaussian 分布中的随机点 embedding,而不是固定点 embedding。这种表示捕捉了不同情感的概率,并在推理阶段通过随机性产生多种预测。此外,我们还开发了一个图像增强模块,以提高模型对严重噪声的抗锋性。总体来说,我们的三支分支模型,包括人脸、物体和场景组件,采用比例权重混合策略,并将我们提出的不确定性意识学习方法 integrate 到生成最终群体级别输出。实验结果表明我们的方法在三个通用的数据库上表现出色,并且具有普适性和泛化能力。
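
A small PyTorch sketch of the uncertainty-aware ingredient: each face is encoded as a Gaussian, sampled with the reparameterization trick, and faces are weighted by their (inverse) uncertainty when fusing a group representation. Dimensions and layer choices are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticFaceEmbedding(nn.Module):
    """Sketch of an uncertainty-aware embedding for group-level emotion recognition."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(in_dim, emb_dim)
        self.logvar = nn.Linear(in_dim, emb_dim)

    def forward(self, feats):                       # feats: (num_faces, in_dim)
        mu, logvar = self.mu(feats), self.logvar(feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # stochastic embedding
        uncertainty = logvar.exp().mean(dim=-1)                    # scalar per face
        weights = F.softmax(-uncertainty, dim=0)                   # low uncertainty -> high weight
        group_repr = (weights.unsqueeze(-1) * z).sum(dim=0)        # fused group representation
        return group_repr, weights
```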

Coding by Design: GPT-4 empowers Agile Model Driven Development

  • paper_url: http://arxiv.org/abs/2310.04304
  • repo_url: None
  • paper_authors: Ahmed R. Sadik, Sebastian Brulin, Markus Olhofer
  • for: 这个研究的目的是提出一种基于 OpenAI GPT-4 的 Agile Model-Driven Development (MDD) 方法,以便在使用自然语言生成代码时解决模型中的歧义性问题。
  • methods: 这种方法包括在首层和第二层使用 Unified Model Language (UML) 图文 Representation,然后在第三层使用 GPT-4 自动生成代码。在第二层,我们引入了两组约束来减少模型的歧义性,包括 Object Constraints Language (OCL) 和 FIPA ontology。
  • results: 我们的研究表明,使用这种方法可以生成符合预期 UML 序列图的行为,并且对代码结构进行了比较。结果表明,使用ontology-constrained模型可以生成更复杂的代码,但这种代码仍然可以被轻松地测试和维护。
    Abstract Generating code from a natural language using Large Language Models (LLMs) such as ChatGPT, seems groundbreaking. Yet, with more extensive use, it's evident that this approach has its own limitations. The inherent ambiguity of natural language presents challenges for complex software designs. Accordingly, our research offers an Agile Model-Driven Development (MDD) approach that enhances code auto-generation using OpenAI's GPT-4. Our work emphasizes "Agility" as a significant contribution to the current MDD method, particularly when the model undergoes changes or needs deployment in a different programming language. Thus, we present a case-study showcasing a multi-agent simulation system of an Unmanned Vehicle Fleet. In the first and second layer of our approach, we constructed a textual representation of the case-study using Unified Model Language (UML) diagrams. In the next layer, we introduced two sets of constraints that minimize model ambiguity. Object Constraints Language (OCL) is applied to fine-tune the code constructions details, while FIPA ontology is used to shape communication semantics and protocols. Ultimately, leveraging GPT-4, our last layer auto-generates code in both Java and Python. The Java code is deployed within the JADE framework, while the Python code is deployed in PADE framework. Concluding our research, we engaged in a comprehensive evaluation of the generated code. From a behavioural standpoint, the auto-generated code aligned perfectly with the expected UML sequence diagram. Structurally, we compared the complexity of code derived from UML diagrams constrained solely by OCL to that influenced by both OCL and FIPA-ontology. Results indicate that ontology-constrained model produce inherently more intricate code, but it remains manageable and low-risk for further testing and maintenance.
    摘要 使用大型自然语言模型(LLM)如ChatGPT生成代码看起来是一项创新的技术,但是随着更广泛的使用,这种方法的限制也变得更加明显。自然语言的内在抽象性会导致复杂的软件设计困难。因此,我们的研究提出了一种基于Model-Driven Development(MDD)的Agile模型驱动方法,通过OpenAI的GPT-4提高代码自动生成。我们的工作强调“适应”作为我们的贡献,特别是当模型进行变更或需要在不同编程语言中部署时。因此,我们提供了一个多代理 simulations系统的无人车队例子。在我们的方法中,在第一层和第二层,我们使用Unified Model Language(UML)图文描述了这个例子。在下一层,我们引入了两组约束,以降低模型的抽象性。Object Constraints Language(OCL)用于细化代码构造细节,而FIPAontology用于形成通信协议和 semantics。最后,通过GPT-4,我们的最后一层自动生成代码在Java和Python两种编程语言中。Java代码在JADE框架中部署,而Python代码在PADE框架中部署。在结束我们的研究后,我们进行了全面的代码生成评估。从行为上来看,自动生成的代码与预期的UML序列图完全匹配。从结构上来看,我们比较了由UML图文描述的代码和只由OCL约束的代码的复杂性。结果表明,基于ontology的模型生成的代码具有更高的复杂性,但是它仍然可以被成功地测试和维护。

Identifying Representations for Intervention Extrapolation

  • paper_url: http://arxiv.org/abs/2310.04295
  • repo_url: None
  • paper_authors: Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters
  • for: 本研究旨在提升当前表征学习范式的泛化性与鲁棒性,并给出可识别表征在下游任务中具体优势的理论结果。
  • methods: 本研究将可识别表征学习与干预外推任务相结合(Rep4Ex),通过一个刻画“动作对潜变量影响为线性”的不变性约束来学习表征,该约束可与任意自编码器结合使用。
  • results: 理论与合成实验表明,所学的隐藏表征在仿射变换意义下可识别,从而能够预测训练时未观察到的干预的效果。
    Abstract The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show using Wiener's Tauberian theorem that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions.
    摘要 可识别与因果表征学习的出发点,是在泛化性或鲁棒性方面改进当前的表征学习范式。尽管在可识别性问题上已有不少进展,但仍需要更多理论结果来说明这些方法对下游任务的具体优势。本文考虑干预外推任务:即使训练时没有观察到某些干预,也要预测这些干预对结果的影响;我们证明,即便干预以非线性方式影响结果,可识别表征仍能为该任务提供有效解法。我们的设定包含结果Y、由潜在特征Z经非线性变换生成的观测特征X,以及影响Z的外生动作变量A。干预外推的目标是预测落在A的训练支撑之外的干预对Y的影响。当A对Z的作用为线性、且Z对A回归后的残差具有全支撑时,外推成为可能。由于Z是潜在的,我们将干预外推与可识别表征学习结合,称之为Rep4Ex:目标是把观测特征X映射到一个允许对A进行非线性外推的子空间。借助Wiener的Tauberian定理,我们证明隐藏表征在Z空间中可识别到仿射变换,这对于干预外推已经足够;该可识别性由一个刻画A对Z线性作用假设的新约束所刻画。基于这一洞见,我们提出一种施加该线性不变性约束的方法,可与任意类型的自编码器结合。我们通过合成实验验证了理论结果,并表明该方法能够成功预测未见干预的效果。

Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04288
  • repo_url: None
  • paper_authors: Kristina Miller, Christopher K. Zeitler, William Shen, Kerianne Hobbs, Sayan Mitra, John Schierman, Mahesh Viswanathan
  • for: 本研究旨在开发一种可靠的runtime assurance system (RTA),以确保安全性while exercising an untrusted或实验性控制器。
  • methods: 本研究使用 reward shaping和 reinforcement learning来解决RTA的optimal设计问题,可以保证安全性并利用机器学习技术来提高可扩展性。
  • results: 对于一些复杂的安全需求的3D空间飞机模型,我们的方法可以保证安全性并提高实验控制器的使用率,比已有方法更高。
    Abstract A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup (or safety) controller. The relevant computational design problem is to create a logic that assures safety by switching to the safety controller as needed, while maximizing some performance criteria, such as the utilization of the untrusted controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning. It can guarantee safety and leverage machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.

Assessing Robustness via Score-Based Adversarial Image Generation

  • paper_url: http://arxiv.org/abs/2310.04285
  • repo_url: None
  • paper_authors: Marcel Kollovieh, Lukas Gosch, Yan Scholten, Marten Lienen, Stephan Günnemann
  • for: 本研究旨在探讨对抗鲁棒性评估的局限:传统方法只考察 $\ell_p$ 范数约束内的扰动,而这类约束无法涵盖所有保持语义的扰动。
  • methods: 本文提出Score-Based Adversarial Generation(ScoreAG)框架,利用基于得分的生成模型的最新进展,生成不受 $\ell_p$ 范数约束限制的对抗样本(即无限制对抗样本),从而克服上述局限。
  • results: 在多个基准上的大量实验表明,ScoreAG在攻击与防御两方面均可与当前最优方法相当,并且其生成能力还可用于净化图像、在实证上增强分类器的鲁棒性。
    Abstract Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantic-preserving perturbations, and hence, the scope of robustness evaluations is limited. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages the advancements in score-based generative models to generate adversarial examples beyond $\ell_p$-norm constraints, so-called unrestricted adversarial examples, overcoming their limitations. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating realistic adversarial examples, either by transforming existing images or synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that ScoreAG matches the performance of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than $\ell_p$-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments.
    摘要 大多数敌对攻击和防御都集中在小$\ell_p$-norm的偏差内。然而,$\ell_p$ 威胁模型无法捕捉所有具有 semantic-preserving 偏差的攻击,因此robustness评估的范围有限。在这项工作中,我们介绍了Score-Based Adversarial Generation(ScoreAG),一种新的框架,利用了得到的分数基本生成模型来生成 beyond $\ell_p$-norm 的攻击例子,即所谓的无限攻击例子,超越它们的局限性。与传统方法不同,ScoreAG保留了图像的核心含义,并生成了真实的攻击例子,可以是对现有图像进行变换还是从头开始生成全新的图像。我们进一步利用了ScoreAG的生成能力来纯化图像,实际上提高了分类器的 robustness。我们的广泛的实验证明了 ScoreAG 与当前状态的攻击和防御技术在多个benchmark上具有相同的性能。这项工作强调了对于 adversarial examples 的Semantic bounded 而不是 $\ell_p$-norm 的限制是更加重要的。ScoreAG 代表了更全面的robustness评估的一个重要步阶。

From task structures to world models: What do LLMs know?

  • paper_url: http://arxiv.org/abs/2310.04276
  • repo_url: None
  • paper_authors: Ilker Yildirim, L. A. Paul
  • for: 本研究探讨了大语言模型具备知识的方面,挑战我们对智能和知识的假设。
  • methods: 本研究使用了大语言模型,探讨了这些模型具备的能力是否可以视为知识。
  • results: 研究发现,大语言模型具备一种称为”工具知识”的知识,这种知识定义为一组能力。然而,这种知识与人类agens所表现的”世界知识”之间存在关系,需要进一步探讨。
    Abstract In what sense does a large language model have knowledge? The answer to this question extends beyond the capabilities of a particular AI system, and challenges our assumptions about the nature of knowledge and intelligence. We answer by granting LLMs "instrumental knowledge"; knowledge defined by a certain set of abilities. We then ask how such knowledge is related to the more ordinary, "worldly" knowledge exhibited by human agents, and explore this in terms of the degree to which instrumental knowledge can be said to incorporate the structured world models of cognitive science. We discuss ways LLMs could recover degrees of worldly knowledge, and suggest such recovery will be governed by an implicit, resource-rational tradeoff between world models and task demands.
    摘要 哪种意义上的知识具有大语言模型知识?答案超出了某个AI系统的能力,挑战我们对知识和智能的假设。我们回答是通过授予LLMs“工具知识”,即知识定义为某些能力。然后我们问这种知识与人类代理的“日常”知识之间的关系,并通过考虑这种知识是否可以通过认知科学中的结构化世界模型来捕捉。我们讨论LLMs如何恢复世界知识的方式,并建议这种恢复受到了隐式的资源可用性评估和任务需求的交互。

A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

  • paper_url: http://arxiv.org/abs/2310.04270
  • repo_url: None
  • paper_authors: Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang
  • for: 本研究旨在评估大型自然语言模型(LLM)在生物医学领域中的表现。
  • methods: 我们使用4种Popular LLM在6种多样化的生物医学任务中进行了广泛的评估,并在26个数据集上进行了对比。
  • results: 我们发现,在训练集较小的生物医学数据集上,零样本LLM甚至可以超越当前最优的微调生物医学模型;不同LLM在不同任务上表现各异,没有单一模型在所有任务上占优,且它们的整体表现仍明显低于在大规模训练集上微调的生物医学模型。不过,这些结果表明,对于缺乏大量标注数据的生物医学任务,LLM有潜力成为有价值的工具。
    Abstract Recently, Large Language Models (LLM) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that not a single LLM can outperform other LLMs in all tasks, with the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

DRIFT: Deep Reinforcement Learning for Intelligent Floating Platforms Trajectories

  • paper_url: http://arxiv.org/abs/2310.04266
  • repo_url: https://github.com/elharirymatteo/rans
  • paper_authors: Matteo El-Hariry, Antoine Richard, Vivek Muralidharan, Baris Can Yalcin, Matthieu Geist, Miguel Olivares-Mendez
  • for: This paper introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments.
  • methods: The paper uses state-of-the-art deep reinforcement learning techniques to train policies capable of precise maneuvers amid dynamic and unpredictable conditions.
  • results: The paper achieves robustness, adaptability, and good transferability from simulation to reality, and provides a comprehensive platform for researchers with open-access on GitHub.
    Abstract This investigation introduces a novel deep reinforcement learning-based suite to control floating platforms in both simulated and real-world environments. Floating platforms serve as versatile test-beds to emulate microgravity environments on Earth. Our approach addresses the system and environmental uncertainties in controlling such platforms by training policies capable of precise maneuvers amid dynamic and unpredictable conditions. Leveraging state-of-the-art deep reinforcement learning techniques, our suite achieves robustness, adaptability, and good transferability from simulation to reality. Our Deep Reinforcement Learning (DRL) framework provides advantages such as fast training times, large-scale testing capabilities, rich visualization options, and ROS bindings for integration with real-world robotic systems. Beyond policy development, our suite provides a comprehensive platform for researchers, offering open-access at https://github.com/elharirymatteo/RANS/tree/ICRA24.
    摘要 本研究提出了一套基于深度强化学习的控制方案,用于在仿真与真实环境中控制浮动平台。浮动平台是地球上模拟微重力环境的多用途试验台。我们的方法通过训练能够在动态、不可预测条件下执行精确机动的策略,来应对控制此类平台时的系统与环境不确定性。借助最新的深度强化学习技术,该方案具备鲁棒性、适应性以及从仿真到现实的良好迁移能力。我们的深度强化学习(DRL)框架具有训练速度快、可大规模测试、可视化选项丰富以及提供ROS接口以便与真实机器人系统集成等优点。除策略开发之外,该套件还为研究者提供了一个完整的开源平台:https://github.com/elharirymatteo/RANS/tree/ICRA24 。

Ada-Instruct: Adapting Instruction Generators for Complex Reasoning

  • paper_url: http://arxiv.org/abs/2310.04484
  • repo_url: https://github.com/wangitu/ada-instruct
  • paper_authors: Wanyun Cui, Qianle Wang
  • for: 为下游任务生成多样且复杂的指令,以提升大语言模型(LLM)在复杂推理任务上的效果。
  • methods: 通过微调开源大语言模型来构建自适应指令生成器,从而生成长度超过100的复杂指令,解决现有上下文提示方法无法生成此类长指令的问题。
  • results: 在代码补全、数学推理、常识推理等多种应用上进行了实验验证,结果表明Ada-Instruct优于其基础模型、self-instruct方法以及当前最先进的模型。
    Abstract Generating diverse and sophisticated instructions for downstream tasks by Large Language Models (LLMs) is pivotal for advancing the effect. Current approaches leverage closed-source LLMs, employing in-context prompting for instruction generation. However, in this paper, we found that in-context prompting cannot generate complex instructions with length $\ge 100$ for tasks like code completion. To solve this problem, we introduce Ada-Instruct, an adaptive instruction generator developed by fine-tuning open-source LLMs. Our pivotal finding illustrates that fine-tuning open-source LLMs with a mere ten samples generates long instructions that maintain distributional consistency for complex reasoning tasks. We empirically validated Ada-Instruct's efficacy across different applications, including code completion, mathematical reasoning, and commonsense reasoning. The results underscore Ada-Instruct's superiority, evidencing its improvements over its base models, current self-instruct methods, and other state-of-the-art models.
    摘要 利用大语言模型(LLM)为下游任务生成多样且复杂的指令,对提升效果至关重要。现有方法依赖闭源LLM,通过上下文提示来生成指令。然而,本文发现,对于代码补全等任务,上下文提示无法生成长度超过100的复杂指令。为解决这一问题,我们提出Ada-Instruct,一种通过微调开源LLM得到的自适应指令生成器。我们的关键发现是:仅用十个样本微调开源LLM,就能生成既长又保持分布一致性的指令,适用于复杂推理任务。我们在代码补全、数学推理和常识推理等不同应用上对Ada-Instruct进行了实证验证,结果表明其优于基础模型、现有的self-instruct方法以及其他最先进模型。

Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison

  • paper_url: http://arxiv.org/abs/2310.04241
  • repo_url: None
  • paper_authors: Moritz Lange, Noah Krystiniak, Raphael C. Engelhardt, Wolfgang Konen, Laurenz Wiskott
  • for: 研究如何通过状态表示学习提升非视觉观测下真实RL环境中的样本效率与可靠性。
  • methods: 基于(据我们所知)唯一一种针对低维非视觉观测、与RL智能体解耦的表示学习方法,比较多种常见的辅助任务。
  • results: 在从简单摆杆到复杂仿真机器人任务的一系列环境中,辅助任务带来的表示学习只有在足够复杂的环境中才能提升性能,且学习环境动力学比预测奖励更有效。这些发现可为面向非视觉观测的可解释表示学习方法的后续发展提供参考,并推动RL方案在真实场景中的应用。
    Abstract Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require not only efficient but also reliable and thus interpretable and flexible RL approaches. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied in visual observation contexts. However, for real-world problems, dedicated representation learning modules that are decoupled from RL agents are more suited to meet requirements. This study compares common auxiliary tasks based on, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks only provides performance gains in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.
    摘要 现实世界中的强化学习(RL)环境,无论是机器人还是工业场景,往往涉及非视觉观测,并且不仅要求高效,还要求可靠、可解释且灵活的RL方法。为提升效率,在视觉观测场景下,借助辅助任务进行状态表示学习的智能体已被广泛研究;但对于真实世界问题,与RL智能体解耦的专用表示学习模块更能满足上述需求。本研究基于(据我们所知)唯一一种针对低维非视觉观测的解耦表示学习方法,对常见的辅助任务进行了比较,并在从简单摆杆到复杂仿真机器人任务的一系列环境中评估了其在样本效率与回报上的潜在提升。结果表明,基于辅助任务的表示学习只有在足够复杂的环境中才能带来性能提升,且学习环境动力学比预测奖励更可取。这些发现可为面向非视觉观测的可解释表示学习方法的发展提供参考,并促进RL方案在真实场景中的应用。
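
A minimal sketch of the dynamics-prediction auxiliary task the study finds preferable to reward prediction; `encoder` and `dynamics_head` are assumed small networks trained jointly with the RL loss, not the paper's exact modules.

```python
import torch
import torch.nn as nn

def auxiliary_dynamics_loss(encoder, dynamics_head, obs, action, next_obs):
    """Auxiliary task sketch: predict the next latent state (environment dynamics)
    rather than the reward. Assumes encoder/dynamics_head are small MLPs."""
    z, z_next = encoder(obs), encoder(next_obs).detach()   # stop-gradient on the target
    z_pred = dynamics_head(torch.cat([z, action], dim=-1))
    return nn.functional.mse_loss(z_pred, z_next)

# Combined objective (illustrative): total_loss = rl_loss + aux_weight * auxiliary_dynamics_loss(...)
```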

The WayHome: Long-term Motion Prediction on Dynamically Scaled

  • paper_url: http://arxiv.org/abs/2310.04232
  • repo_url: None
  • paper_authors: Kay Scheerer, Thomas Michalke, Juergen Mathes
  • for: This paper is written for the purpose of developing a novel motion forecasting approach for autonomous vehicles, specifically to accurately predict the motion of other objects in the surrounding environment.
  • methods: The paper uses a neural network-based model to predict multiple heatmaps for every traffic participant in the vicinity of the autonomous vehicle, with one heatmap per timestep. The heatmaps are then used as input to a novel sampling algorithm that extracts coordinates corresponding to the most likely future positions.
  • results: The approach improves state-of-the-art miss rate performance for the function-relevant prediction interval of 3 seconds, while being competitive in longer prediction intervals (up to eight seconds). The evaluation is done on the public 2022 Waymo motion challenge.
    Abstract One of the key challenges for autonomous vehicles is the ability to accurately predict the motion of other objects in the surrounding environment, such as pedestrians or other vehicles. In this contribution, a novel motion forecasting approach for autonomous vehicles is developed, inspired by the work of Gilles et al. [1]. We predict multiple heatmaps with a neuralnetwork-based model for every traffic participant in the vicinity of the autonomous vehicle; with one heatmap per timestep. The heatmaps are used as input to a novel sampling algorithm that extracts coordinates corresponding to the most likely future positions. We experiment with different encoders and decoders, as well as a comparison of two loss functions. Additionally, a new grid-scaling technique is introduced, showing further improved performance. Overall, our approach improves stateof-the-art miss rate performance for the function-relevant prediction interval of 3 seconds while being competitive in longer prediction intervals (up to eight seconds). The evaluation is done on the public 2022 Waymo motion challenge.
    摘要 自动驾驶的关键挑战之一,是准确预测周围环境中其他对象(如行人或其他车辆)的运动。在这项工作中,我们受Gilles等人[1]工作的启发,提出了一种新的运动预测方法:用基于神经网络的模型为自动驾驶车辆附近的每个交通参与者预测多张热力图(每个时间步一张),再用一种新的采样算法从热力图中提取最可能的未来位置坐标。我们比较了不同的编码器与解码器以及两种损失函数,并引入了一种新的网格缩放技术,进一步提升了性能。总体而言,我们的方法在与功能相关的3秒预测区间内改进了当前最优的miss rate指标,并在更长的预测区间(最长8秒)内保持竞争力。评估基于公开的2022 Waymo运动预测挑战赛数据。
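
An illustrative sketch of turning per-timestep heatmaps into candidate future positions via greedy peak picking with local suppression; the paper's sampling algorithm differs in detail, and the grid parameters here are assumptions.

```python
import numpy as np

def sample_waypoints(heatmaps, grid_resolution=0.5, suppression_radius=3):
    """For each per-timestep heatmap, greedily pick the highest-probability cells
    with local non-maximum suppression and convert them to metric coordinates."""
    trajectories = []
    for hm in heatmaps:                                  # hm: (H, W) array of probabilities
        hm = hm.copy()
        modes = []
        for _ in range(6):                               # keep up to 6 candidate positions
            y, x = np.unravel_index(np.argmax(hm), hm.shape)
            modes.append((x * grid_resolution, y * grid_resolution, hm[y, x]))
            y0, y1 = max(0, y - suppression_radius), y + suppression_radius + 1
            x0, x1 = max(0, x - suppression_radius), x + suppression_radius + 1
            hm[y0:y1, x0:x1] = 0.0                       # suppress the neighborhood
        trajectories.append(modes)
    return trajectories                                  # per-timestep (x, y, score) modes
```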

A Fixed-Parameter Tractable Algorithm for Counting Markov Equivalence Classes with the same Skeleton

  • paper_url: http://arxiv.org/abs/2310.04218
  • repo_url: None
  • paper_authors: Vidya Sagar Sharma
  • for: 贝叶斯网络(也称因果DAG)是编码随机变量间条件依赖关系的常用工具;本文研究给定骨架下马尔可夫等价类(MEC)的计数问题。
  • methods: 本文利用MEC的组合刻画,给出一个固定参数可解(FPT)算法,并引入称为shadow的构造来对长程约束给出局部描述。
  • results: 以输入图的树宽与最大度为参数,该算法能够高效地计算具有给定骨架的马尔可夫等价类数目。
    Abstract Causal DAGs (also known as Bayesian networks) are a popular tool for encoding conditional dependencies between random variables. In a causal DAG, the random variables are modeled as vertices in the DAG, and it is stipulated that every random variable is independent of its ancestors conditioned on its parents. It is possible, however, for two different causal DAGs on the same set of random variables to encode exactly the same set of conditional dependencies. Such causal DAGs are said to be Markov equivalent, and equivalence classes of Markov equivalent DAGs are known as Markov Equivalent Classes (MECs). Beautiful combinatorial characterizations of MECs have been developed in the past few decades, and it is known, in particular that all DAGs in the same MEC must have the same ''skeleton'' (underlying undirected graph) and v-structures (induced subgraph of the form $a\rightarrow b \leftarrow c$). These combinatorial characterizations also suggest several natural algorithmic questions. One of these is: given an undirected graph $G$ as input, how many distinct Markov equivalence classes have the skeleton $G$? Much work has been devoted in the last few years to this and other closely related problems. However, to the best of our knowledge, a polynomial time algorithm for the problem remains unknown. In this paper, we make progress towards this goal by giving a fixed parameter tractable algorithm for the above problem, with the parameters being the treewidth and the maximum degree of the input graph $G$. The main technical ingredient in our work is a construction we refer to as shadow, which lets us create a "local description'' of long-range constraints imposed by the combinatorial characterizations of MECs.
    摘要 因果DAG(也称贝叶斯网络)是编码随机变量之间条件依赖关系的常用工具。在因果DAG中,随机变量被建模为图中的顶点,并规定每个随机变量在给定其父节点的条件下与其祖先独立。然而,同一组随机变量上的两个不同因果DAG可能编码完全相同的条件依赖关系集合,这样的因果DAG称为马尔可夫等价,其等价类称为马尔可夫等价类(MEC)。过去几十年中,人们发展出了关于MEC的漂亮的组合刻画,特别地,同一MEC中的所有DAG必须具有相同的“骨架”(底层无向图)和相同的v-结构(形如 $a\rightarrow b \leftarrow c$ 的诱导子图)。这些组合刻画也引出了若干自然的算法问题,其中之一是:给定无向图 $G$ 作为输入,以 $G$ 为骨架的不同马尔可夫等价类有多少个?近年来已有大量工作研究这一问题及其相关问题,但据我们所知,该问题的多项式时间算法仍然未知。本文朝这一目标迈出了一步:我们给出了该问题的一个固定参数可解算法,参数为输入图 $G$ 的树宽与最大度。我们工作的主要技术要素是一种称为shadow的构造,它使我们能够对MEC组合刻画所施加的长程约束给出一种“局部描述”。
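
For orientation, the classical Verma-Pearl characterization the abstract relies on (same skeleton, same v-structures) can be checked directly; the helper below illustrates that criterion only, not the paper's fixed-parameter counting algorithm.

```python
import networkx as nx

def skeleton(dag: nx.DiGraph):
    """Undirected edge set of a DAG."""
    return {frozenset(e) for e in dag.edges()}

def v_structures(dag: nx.DiGraph):
    """Induced colliders a -> b <- c with a and c non-adjacent."""
    out = set()
    for b in dag.nodes:
        parents = sorted(dag.predecessors(b))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                a, c = parents[i], parents[j]
                if not dag.has_edge(a, c) and not dag.has_edge(c, a):
                    out.add((a, b, c))
    return out

def markov_equivalent(d1: nx.DiGraph, d2: nx.DiGraph) -> bool:
    """Verma-Pearl criterion: two DAGs on the same variables are Markov equivalent
    iff they share the same skeleton and the same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)
```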

Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface

  • paper_url: http://arxiv.org/abs/2310.04205
  • repo_url: https://github.com/amaze18/speeKAR
  • paper_authors: Anupam Purwar, Rahul Sundar
  • for: The paper aims to improve the efficiency and cost-effectiveness of language model-based knowledge retrieval systems, particularly for speech-based interfaces.
  • methods: The authors propose a keyword-based search framework that uses a smaller language model to generate keywords and compare them with the query, reducing the time and cost of context identification. They also use a larger language model to provide answers based on a prompt tailored for Q&A.
  • results: The authors demonstrate that the use of keywords in context identification reduces the overall inference time and cost of information retrieval, making it more feasible to integrate speech-based interfaces with language model-based systems.
    Abstract Retrieving answers in a quick and low cost manner without hallucinations from a combination of structured and unstructured data using Language models is a major hurdle. This is what prevents employment of Language models in knowledge retrieval automation. This becomes accentuated when one wants to integrate a speech interface on top of a text based knowledge retrieval system. Besides, for commercial search and chat-bot applications, complete reliance on commercial large language models (LLMs) like GPT 3.5 etc. can be very costly. In the present study, the authors have addressed the aforementioned problem by first developing a keyword based search framework which augments discovery of the context from the document to be provided to the LLM. The keywords in turn are generated by a relatively smaller LLM and cached for comparison with keywords generated by the same smaller LLM against the query raised. This significantly reduces time and cost to find the context within documents. Once the context is set, a larger LLM uses that to provide answers based on a prompt tailored for Q\&A. This research work demonstrates that use of keywords in context identification reduces the overall inference time and cost of information retrieval. Given this reduction in inference time and cost with the keyword augmented retrieval framework, a speech based interface for user input and response readout was integrated. This allowed a seamless interaction with the language model.
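
A hedged sketch of the described flow, using a smaller model for cached keyword extraction and a larger model for the final Q&A prompt; `small_llm` and `large_llm` are assumed text-in/text-out callables rather than any specific vendor API.

```python
def keyword_augmented_answer(query, documents, small_llm, large_llm, keyword_cache):
    """Keyword-augmented retrieval sketch: a smaller LLM extracts keywords per
    document (cached across queries by the caller) and for the query; keyword
    overlap selects the context; a larger LLM answers from a Q&A prompt."""
    ask = lambda text: set(w.strip().lower() for w in small_llm(
        f"List five keywords, comma-separated, for: {text}").split(","))
    query_kw = ask(query)
    best_doc, best_overlap = "", -1
    for doc_id, text in documents.items():
        if doc_id not in keyword_cache:               # keywords are computed once and cached
            keyword_cache[doc_id] = ask(text)
        overlap = len(query_kw & keyword_cache[doc_id])
        if overlap > best_overlap:
            best_doc, best_overlap = text, overlap
    prompt = (f"Answer the question using only the context below.\n"
              f"Context: {best_doc}\nQuestion: {query}")
    return large_llm(prompt)
```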

A Bi-objective Perspective on Controllable Language Models: Reward Dropout Improves Off-policy Control Performance

  • paper_url: http://arxiv.org/abs/2310.04483
  • repo_url: https://github.com/anonymous-user01/controllability-of-lm-anonymous
  • paper_authors: Changhun Lee, Chiehyeon Lim
  • for: 这篇 paper 从双目标优化的角度研究可控语言模型(CLM)的理论性质,即同时最大化奖励目标与似然目标。
  • methods: 采用双目标优化分析,给出奖励上界以及帕累托改进/最优条件,并据此提出 Reward Dropout 方法。
  • results: 理论与实验结果表明,Reward Dropout 能基于帕累托改进条件保证策略改进;在五个 CLM 基准数据集上的实验显示,Reward Dropout 可显著提升 CLM 的性能。
    Abstract We study the theoretical aspects of CLMs (Controllable Language Models) from a bi-objective optimization perspective. Specifically, we consider the CLMs as an off-policy RL problem that requires simultaneously maximizing the reward and likelihood objectives. Our main contribution consists of three parts. First, we establish the theoretical foundations of CLM by presenting reward upper bound and Pareto improvement/optimality conditions. Second, we analyze conditions that improve and violate Pareto optimality itself, respectively. Finally, we propose Reward Dropout, a simple yet powerful method to guarantee policy improvement based on a Pareto improvement condition. Our theoretical outcomes are supported by not only deductive proofs but also empirical results. The performance of Reward Dropout was evaluated on five CLM benchmark datasets, and it turns out that the Reward Dropout significantly improves the performance of CLMs.
    摘要 我们从双目标优化的角度研究可控语言模型(CLM)的理论性质。具体而言,我们将CLM视为一个离策略(off-policy)RL问题,需要同时最大化奖励目标与似然目标。我们的主要贡献包括三部分:第一,给出CLM的理论基础,提出奖励上界以及帕累托改进/最优条件;第二,分别分析使帕累托最优得以改进与被破坏的条件;第三,提出Reward Dropout,一种基于帕累托改进条件、能够保证策略改进的简单而有效的方法。我们的理论结果既有演绎证明支持,也得到了实验验证:在五个CLM基准数据集上的评估表明,Reward Dropout能显著提升CLM的性能。

EMOFM: Ensemble MLP mOdel with Feature-based Mixers for Click-Through Rate Prediction

  • paper_url: http://arxiv.org/abs/2310.04482
  • repo_url: None
  • paper_authors: Yujian Betterest Li, Kai Wu
  • for: 点击率(CTR)预测(CTI竞赛赛道一)。
  • methods: 针对按字段哈希后的特征,引入简单的即插即用混合器进行字段/类型层面的特征融合,并据此构建字段与类型层面的集成MLP模型EMOFM。
  • results: 实验中,所提模型优于对比基线;文中还可视化了优化过程并给出了消融研究。未来工作可考虑建模更多不同类型的交互。
    Abstract Track one of CTI competition is on click-through rate (CTR) prediction. The dataset contains millions of records and each field-wise feature in a record consists of hashed integers for privacy. For this task, the keys of network-based methods might be type-wise feature extraction and information fusion across different fields. Multi-layer perceptrons (MLPs) are able to extract field feature, but could not efficiently fuse features. Motivated by the natural fusion characteristic of cross attention and the efficiency of transformer-based structures, we propose simple plug-in mixers for field/type-wise feature fusion, and thus construct an field&type-wise ensemble model, namely EMOFM (Ensemble MLP mOdel with Feature-based Mixers). In the experiments, the proposed model is evaluated on the dataset, the optimization process is visualized and ablation studies are explored. It is shown that EMOFM outperforms compared baselines. In the end, we discuss on future work. WARNING: The comparison might not be fair enough since the proposed method is designed for this data in particular while compared methods are not. For example, EMOFM especially takes different types of interactions into consideration while others do not. Anyway, we do hope that the ideas inside our method could help other developers/learners/researchers/thinkers and so on.
    摘要 CTI竞赛的赛道一是点击率(CTR)预测。数据集包含数百万条记录,出于隐私考虑,每条记录的各字段特征均为哈希后的整数。对于该任务,基于网络的方法的关键可能在于按类型的特征提取以及跨字段的信息融合。多层感知机(MLP)能够提取字段特征,但难以高效地融合特征。受交叉注意力天然的融合特性以及Transformer类结构的高效性启发,我们提出了简单的即插即用混合器用于字段/类型层面的特征融合,并据此构建了字段与类型层面的集成模型EMOFM(Ensemble MLP mOdel with Feature-based Mixers)。实验中,我们在该数据集上评估了所提模型,可视化了优化过程,并进行了消融研究,结果表明EMOFM优于对比基线。最后我们讨论了未来工作。需要说明的是,这一比较未必完全公平:所提方法是专门针对该数据设计的(例如EMOFM特别考虑了不同类型的交互),而对比方法并非如此。我们仍希望该方法中的想法能对其他开发者、学习者和研究者有所帮助。
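
A compact PyTorch sketch of an MLP CTR model with a simple field-wise mixer over hashed categorical fields; layer sizes and the mixer form are assumptions for illustration and do not mirror the competition code.

```python
import torch
import torch.nn as nn

class FieldMixerCTR(nn.Module):
    """MLP CTR model with a residual field-wise mixer over hashed field embeddings."""
    def __init__(self, n_fields: int, hash_buckets: int = 100_000, emb_dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(hash_buckets, emb_dim)
        self.field_mixer = nn.Linear(n_fields, n_fields)       # mixes across the field axis
        self.head = nn.Sequential(
            nn.Linear(n_fields * emb_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, hashed_ids):                              # (batch, n_fields) int64
        e = self.emb(hashed_ids)                                # (batch, n_fields, emb_dim)
        mixed = self.field_mixer(e.transpose(1, 2)).transpose(1, 2) + e   # residual mixing
        return torch.sigmoid(self.head(mixed.flatten(1)))      # click probability
```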

Conversational Financial Information Retrieval Model (ConFIRM)

  • paper_url: http://arxiv.org/abs/2310.13001
  • repo_url: None
  • paper_authors: Stephen Choi, William Gazeley, Siu Ho Wong, Tingting Li
  • for: 这 paper 是为了探讨利用大语言模型(LLM)在金融领域中的应用。
  • methods: 这 paper 使用了一种名为 ConFIRM 的 conversational financial information retrieval模型,包括两个模块:首先,生成金融领域特有的问答对话集;其次,评估多个参数精细调整方法的查询分类任务的准确率。
  • results: 据测试集数据,ConFIRM 可以达到高于 90% 的准确率,这对于 regulatory compliance 是必要的。ConFIRM 提供了一种数据效率的解决方案,用于提取金融对话系统中的精确查询意图。
    Abstract With the exponential growth in large language models (LLMs), leveraging their emergent properties for specialized domains like finance merits exploration. However, regulated fields such as finance pose unique constraints, requiring domain-optimized frameworks. We present ConFIRM, an LLM-based conversational financial information retrieval model tailored for query intent classification and knowledge base labeling. ConFIRM comprises two modules: 1) a method to synthesize finance domain-specific question-answer pairs, and 2) evaluation of parameter efficient fine-tuning approaches for the query classification task. We generate a dataset of over 4000 samples, assessing accuracy on a separate test set. ConFIRM achieved over 90% accuracy, essential for regulatory compliance. ConFIRM provides a data-efficient solution to extract precise query intent for financial dialog systems.
    摘要 随着大语言模型(LLM)的快速增长,利用它们的特性在特定领域如金融方面的应用值得探索。然而,受控领域如金融的特殊要求,需要针对领域进行优化的框架。我们介绍ConFIRM,一个基于LLM的对话金融信息检索模型,适用于查询意图分类和知识库标签。ConFIRM包括两个模块:1. 生成金融领域特定的问答对数生成方法。2. 评估参数高效微调方法的查询分类任务评估。我们生成了超过4000个样本,并评估了测试集上的准确率。ConFIRM达到了90%的准确率,这是必要的证明合规遵守。ConFIRM提供了数据效率的解决方案,用于提取金融对话系统中精准的查询意图。

Introducing the Attribution Stability Indicator: a Measure for Time Series XAI Attributions

  • paper_url: http://arxiv.org/abs/2310.04178
  • repo_url: https://github.com/visual-xai-for-time-series/attribution-stability-indicator
  • paper_authors: Udo Schlegel, Daniel A. Keim
  • for: 这篇论文旨在为时间序列XAI归因方法提供一种兼顾鲁棒性与可信度的度量,以满足金融、天气预报、医疗等领域对可解释时序模型日益增长的需求。
  • methods: 在扰动分析的基础上,引入原始序列与扰动实例之间、以及相应归因之间的相关性,将所需性质纳入度量之中。
  • results: 论文提出了归因稳定性指标(Attribution Stability Indicator, ASI),并通过在降维空间中分析归因以及在三个完整的时间序列分类数据集上考察ASI分数分布,验证了该度量所刻画的性质。
    Abstract Given the increasing amount and general complexity of time series data in domains such as finance, weather forecasting, and healthcare, there is a growing need for state-of-the-art performance models that can provide interpretable insights into underlying patterns and relationships. Attribution techniques enable the extraction of explanations from time series models to gain insights but are hard to evaluate for their robustness and trustworthiness. We propose the Attribution Stability Indicator (ASI), a measure to incorporate robustness and trustworthiness as properties of attribution techniques for time series into account. We extend a perturbation analysis with correlations of the original time series to the perturbed instance and the attributions to include wanted properties in the measure. We demonstrate the wanted properties based on an analysis of the attributions in a dimension-reduced space and the ASI scores distribution over three whole time series classification datasets.
    摘要 随着金融、天气预报和医疗等领域时间序列数据的数量与复杂度不断增长,人们越来越需要既具备先进性能、又能对底层模式和关系给出可解释洞察的模型。归因技术可以从时间序列模型中提取解释,但其鲁棒性与可信度难以评估。为此,我们提出归因稳定性指标(ASI),将鲁棒性与可信度作为时间序列归因技术应具备的性质纳入度量。我们在扰动分析的基础上,引入原始时间序列与扰动实例之间、以及相应归因之间的相关性,从而把这些期望的性质纳入该度量。我们通过在降维空间中分析归因,以及考察ASI分数在三个完整时间序列分类数据集上的分布,展示了这些性质。
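
A minimal sketch of a stability score in this spirit, correlating the original series with its perturbed version and the corresponding attributions; the paper's exact aggregation and weighting may differ.

```python
import numpy as np

def attribution_stability_indicator(x, x_pert, attr, attr_pert):
    """Combine the input-vs-perturbed correlation with the attribution-vs-perturbed
    correlation: a stable attribution should change roughly as little as the input."""
    series_corr = np.corrcoef(x.ravel(), x_pert.ravel())[0, 1]
    attr_corr = np.corrcoef(attr.ravel(), attr_pert.ravel())[0, 1]
    return 0.5 * (series_corr + attr_corr)   # simple average; illustrative weighting
```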

Dynamic Relation-Attentive Graph Neural Networks for Fraud Detection

  • paper_url: http://arxiv.org/abs/2310.04171
  • repo_url: https://github.com/bdi-lab/drag
  • paper_authors: Heehyeon Kim, Jinhyeok Choi, Joyce Jiyoung Whang
  • for: 检测fraudster在社交媒体上的活动,例如留言或交易。
  • methods: 使用图 neural network(GNN)和动态关系注意力机制来解决这个分类问题。
  • results: 对实际数据集进行实验,我们的方法(DRAG)的性能高于现有的fraud detection方法。
    Abstract Fraud detection aims to discover fraudsters deceiving other users by, for example, leaving fake reviews or making abnormal transactions. Graph-based fraud detection methods consider this task as a classification problem with two classes: frauds or normal. We address this problem using Graph Neural Networks (GNNs) by proposing a dynamic relation-attentive aggregation mechanism. Based on the observation that many real-world graphs include different types of relations, we propose to learn a node representation per relation and aggregate the node representations using a learnable attention function that assigns a different attention coefficient to each relation. Furthermore, we combine the node representations from different layers to consider both the local and global structures of a target node, which is beneficial to improving the performance of fraud detection on graphs with heterophily. By employing dynamic graph attention in all the aggregation processes, our method adaptively computes the attention coefficients for each node. Experimental results show that our method, DRAG, outperforms state-of-the-art fraud detection methods on real-world benchmark datasets.
    摘要 针对欺诈者通过假评论或异常交易伤害其他用户的行为,欺诈检测目标是找到这些欺诈者。基于图的欺诈检测方法将这个问题视为一个分类问题,其中有两个类别:欺诈或正常。我们使用图神经网络(GNNs)来解决这个问题,并提出了动态关系注意机制。根据许多真实世界图中包含不同类型的关系的观察,我们提议学习每个关系的节点表示,并使用学习的注意函数来对每个关系分配不同的注意系数。此外,我们将不同层的节点表示进行组合,以考虑目标节点的本地和全局结构,从而提高欺诈检测在异质图上的性能。通过在所有聚合过程中使用动态图注意,我们的方法可以适应性计算每个节点的注意系数。实验结果表明,我们的方法DRAG在真实世界 benchmark 数据集上超过了当前最佳的欺诈检测方法。
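
A dense-matrix PyTorch sketch of relation-attentive aggregation, with one message-passing branch per relation combined by a learnable per-node attention; it illustrates the idea only and is not the DRAG reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttentiveLayer(nn.Module):
    """One linear message-passing branch per relation, fused by learned attention."""
    def __init__(self, in_dim: int, out_dim: int, n_relations: int):
        super().__init__()
        self.per_relation = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(n_relations)])
        self.attn = nn.Linear(out_dim, 1)

    def forward(self, x, adjs):            # x: (N, in_dim); adjs: list of (N, N) dense adjacency
        rel_reprs = []
        for adj, lin in zip(adjs, self.per_relation):
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
            rel_reprs.append(torch.relu(lin(adj @ x / deg)))    # mean aggregation per relation
        h = torch.stack(rel_reprs, dim=1)                        # (N, R, out_dim)
        alpha = F.softmax(self.attn(h), dim=1)                   # per-node attention over relations
        return (alpha * h).sum(dim=1)                            # (N, out_dim)
```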

Document-Level Relation Extraction with Relation Correlation Enhancement

  • paper_url: http://arxiv.org/abs/2310.13000
  • repo_url: https://github.com/lumia-group/lace
  • paper_authors: Yusheng Huang, Zhouhan Lin
  • for: 本研究旨在通过显式利用关系之间的相关性,提升文档级关系抽取(DocRE)模型的性能。
  • methods: 我们提出一种关系图方法:利用先验关系知识中的统计共现信息构建关系图,并通过重加权得到的关系相关矩阵来引导关系信息的传播,再用图注意力网络聚合关系嵌入。
  • results: 该方法可作为即插即用模块与现有模型无缝结合,并在多关系抽取上提升性能,表明在DocRE中考虑关系相关性的重要性。
    Abstract Document-level relation extraction (DocRE) is a task that focuses on identifying relations between entities within a document. However, existing DocRE models often overlook the correlation between relations and lack a quantitative analysis of relation correlations. To address this limitation and effectively capture relation correlations in DocRE, we propose a relation graph method, which aims to explicitly exploit the interdependency among relations. Firstly, we construct a relation graph that models relation correlations using statistical co-occurrence information derived from prior relation knowledge. Secondly, we employ a re-weighting scheme to create an effective relation correlation matrix to guide the propagation of relation information. Furthermore, we leverage graph attention networks to aggregate relation embeddings. Importantly, our method can be seamlessly integrated as a plug-and-play module into existing models. Experimental results demonstrate that our approach can enhance the performance of multi-relation extraction, highlighting the effectiveness of considering relation correlations in DocRE.
    摘要 文档级关系抽取(DocRE)旨在识别文档中实体之间的关系。然而,现有的DocRE模型往往忽视关系之间的相关性,缺乏对关系相关性的定量分析。为弥补这一不足并有效捕捉DocRE中的关系相关性,我们提出一种关系图方法,显式利用关系之间的相互依赖。首先,我们利用先验关系知识中的统计共现信息构建建模关系相关性的关系图;其次,我们采用重加权方案得到有效的关系相关矩阵,用于引导关系信息的传播;此外,我们利用图注意力网络来聚合关系嵌入。重要的是,该方法可以作为即插即用模块无缝集成到现有模型中。实验结果表明,我们的方法能够提升多关系抽取的性能,凸显了在DocRE中考虑关系相关性的有效性。
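
A small sketch of building a relation-correlation matrix from co-occurrence statistics of relation labels on the same entity pair; the smoothing and normalization choices here are assumptions, not the paper's exact re-weighting scheme.

```python
import numpy as np

def relation_correlation_matrix(label_sets, n_relations, smoothing=1e-6):
    """Count how often two relation labels co-occur for the same entity pair and
    turn the counts into row-normalized conditional probabilities P(s | r)."""
    cooc = np.full((n_relations, n_relations), smoothing)
    freq = np.full(n_relations, smoothing)
    for labels in label_sets:                 # labels: set of relation ids for one entity pair
        for r in labels:
            freq[r] += 1
            for s in labels:
                if s != r:
                    cooc[r, s] += 1
    corr = cooc / freq[:, None]               # row-normalized co-occurrence
    np.fill_diagonal(corr, 1.0)
    return corr                               # guides propagation of relation information
```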

Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04148
  • repo_url: https://github.com/ydchen0806/dbmim
  • paper_authors: Yinda Chen, Wei Huang, Shenglong Zhou, Qi Chen, Zhiwei Xiong
  • for: This paper aims to improve the performance of supervised neuron segmentation methods by using self-supervised learning and reinforcement learning to pretrain a decision-based mask image model (MIM).
  • methods: The proposed method utilizes reinforcement learning to automatically search for the optimal image masking ratio and masking strategy, and treats each input patch as an agent with a shared behavior policy to enable multi-agent collaboration.
  • results: The proposed approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation, as demonstrated by experiments conducted on representative EM datasets.
    Abstract The performance of existing supervised neuron segmentation methods is highly dependent on the number of accurate annotations, especially when applied to large scale electron microscopy (EM) data. By extracting semantic information from unlabeled data, self-supervised methods can improve the performance of downstream tasks, among which the mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. However, due to the high degree of structural locality in EM images, as well as the existence of considerable noise, many voxels contain little discriminative information, making MIM pretraining inefficient on the neuron segmentation task. To overcome this challenge, we propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Due to the vast exploration space, using single-agent RL for voxel prediction is impractical. Therefore, we treat each input patch as an agent with a shared behavior policy, allowing for multi-agent collaboration. Furthermore, this multi-agent model can capture dependencies between voxels, which is beneficial for the downstream segmentation task. Experiments conducted on representative EM datasets demonstrate that our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation. Code is available at \url{https://github.com/ydchen0806/dbMiM}.
    摘要 现有的有监督神经元分割方法的性能高度依赖于精确标注的数量,在应用于大规模电子显微镜(EM)数据时尤为明显。通过从无标注数据中提取语义信息,自监督方法可以提升下游任务的性能;其中掩码图像模型(MIM)因其简单且能有效地从被掩码的图像中恢复原始信息而被广泛使用。然而,由于EM图像具有高度的结构局部性且噪声较大,许多体素几乎不包含判别信息,使得MIM预训练在神经元分割任务上效率低下。为克服这一挑战,我们提出一种基于决策的MIM,利用强化学习(RL)自动搜索最优的图像掩码比例与掩码策略。由于探索空间巨大,用单智能体RL逐体素预测并不现实,因此我们将每个输入patch视为一个智能体,并让它们共享行为策略,实现多智能体协作;这种多智能体模型还能捕捉体素之间的依赖关系,有利于下游分割任务。在代表性的EM数据集上进行的实验表明,我们的方法在神经元分割任务上显著优于其他自监督方法。代码见 https://github.com/ydchen0806/dbMiM 。

Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

  • paper_url: http://arxiv.org/abs/2310.04481
  • repo_url: None
  • paper_authors: Manon Macary, Marie Tahon, Yannick Estève, Daniel Luzzati
  • for: 这项研究旨在自动检测真实的客户服务电话对话中的满意度和不满度。
  • methods: 这项研究使用预训练的语音表示进行转移学习,以提高客户服务质量。
  • results: 实验结果表明,使用预训练的语音表示可以带来显著的性能提升;语言内容是满意度预测的主要贡献来源,并且对未见数据的泛化能力最好。
    Abstract The goal of our research is to automatically retrieve the satisfaction and the frustration in real-life call-center conversations. This study focuses an industrial application in which the customer satisfaction is continuously tracked down to improve customer services. To compensate the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form of transfer learning towards AlloSat corpus. Moreover, several studies have pointed out that emotion can be detected not only in speech but also in facial trait, in biological response or in textual information. In the context of telephone conversations, we can break down the audio information into acoustic and linguistic by using the speech signal and its transcription. Our experiments confirms the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor for the prediction of satisfaction and best generalizes to unseen data. Our experiments conclude to the definitive advantage of using CamemBERT representations, however the benefit of the fusion of acoustic and linguistic modalities is not as obvious. With models learnt on individual annotations, we found that fusion approaches are more robust to the subjectivity of the annotation task. This study also tackles the problem of performances variability and intends to estimate this variability from different views: weights initialization, confidence intervals and annotation subjectivity. A deep analysis on the linguistic content investigates interpretable factors able to explain the high contribution of the linguistic modality for this task.

Reinforcement Learning with Fast and Forgetful Memory

  • paper_url: http://arxiv.org/abs/2310.04128
  • repo_url: https://github.com/proroklab/ffm
  • paper_authors: Steven Morad, Ryan Kortvelesy, Stephan Liwicki, Amanda Prorok
  • for: 这篇论文主要是为了解决模型自由RL中存储问题,提出了一种专门为RL设计的快速和忘记型内存模型(Fast and Forgetful Memory,FFM),以提高奖励和训练效率。
  • methods: FFM使用了 Computational Psychology 中的强结构预先知识,将RL的模型搜索空间约束到一个固定的空间,并且可以作为 RNN 的替换部件,无需改变任何超参数。
  • results: FFM在多种回合RL benchmark 和算法上达到了更高的奖励,并且在训练速度方面也有了两个数量级的提升,具体来说是logarithmic time和linear space复杂度。
    Abstract Nearly all real world tasks are inherently partially observable, necessitating the use of memory in Reinforcement Learning (RL). Most model-free approaches summarize the trajectory into a latent Markov state using memory models borrowed from Supervised Learning (SL), even though RL tends to exhibit different training and efficiency characteristics. Addressing this discrepancy, we introduce Fast and Forgetful Memory, an algorithm-agnostic memory model designed specifically for RL. Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, achieving greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than RNNs, attributed to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm.
    摘要 几乎所有实际任务本质上都是部分可观察的,因此需要在强化学习(RL)中使用记忆。大多数无模型方法借用监督学习(SL)中的记忆模型,将轨迹摘要为一个潜在马尔可夫状态,尽管 RL 的训练和效率特点与 SL 有所不同。为解决这个差异,我们介绍 Fast and Forgetful Memory,一种与算法无关、专门为 RL 设计的记忆模型。我们借助计算心理学启发的强结构先验来约束模型的搜索空间,使其成为循环 RL 算法中 RNN 的即插即用替换部件,在多种循环 benchmark 和算法上无需改变任何超参数即可取得比 RNN 更高的奖励。此外,Fast and Forgetful Memory 的训练速度比 RNN 快两个数量级,这得益于它的对数时间复杂度和线性空间复杂度。我们的实现可以在 GitHub 上找到:https://github.com/proroklab/ffm。

Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.05847
  • repo_url: None
  • paper_authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang
  • for: 随着推荐系统中隐私担忧的增加,推荐遗忘(recommendation unlearning)得到了越来越多的关注。现有研究主要将训练数据作为遗忘目标。然而,我们发现攻击者可以从训练好的模型中提取私人信息,例如性别、种族和年龄,即使这些信息没有直接出现在训练数据中。我们将这些未见信息称为属性(attribute),并将其作为遗忘目标。
  • methods: 为了保护用户的敏感属性,我们提出了属性遗忘(Attribute Unlearning,AU),其目的是降低攻击性能并使目标属性变得无法区分。在这篇论文中,我们关注一个严格且具有实际意义的 AU 设定,即训练后属性遗忘(Post-Training Attribute Unlearning,PoT-AU)。为解决 PoT-AU 问题,我们设计了一个两部分损失函数,包括 i)使属性标签无法被攻击者区分的可区分性损失,和 ii)防止模型变化过大而损害推荐性能的正则化损失。
  • results: 我们使用随机梯度下降(SGD)算法进行优化。在三个真实数据集上的广泛实验表明,我们提出的方法能有效解决 PoT-AU 问题。
    Abstract With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, i.e., gender, race, and age, from a trained model even if it has not been explicitly encountered during training. We name this unseen information as attribute and treat it as the unlearning target. To protect the sensitive attribute of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) distinguishability loss: making attribute labels indistinguishable from attackers, and ii) regularization loss: preventing drastic changes in the model that result in a negative impact on recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
    摘要 在本文中,我们关注一个严格且实际的 AU 设定,即训练后属性遗忘(PoT-AU):遗忘只能在推荐模型训练完成之后进行。为解决推荐系统中的 PoT-AU 问题,我们设计了包含可区分性损失(使属性标签对攻击者不可区分)和正则化损失(防止模型剧烈变化而损害推荐性能)的两部分损失函数,并研究了用户对用户和分布对分布两种可区分性度量。我们使用随机梯度下降算法优化所提出的损失。三个真实数据集上的广泛实验表明了所提方法的有效性。
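To make the two-part PoT-AU objective concrete, here is a minimal PyTorch sketch. It optimizes user embeddings with SGD under a distinguishability term plus a regularization term; the uniform-distribution KL used as the distinguishability loss, the linear attacker, and the loss weight are illustrative stand-ins, not the paper's exact user-to-user or distribution-to-distribution measurements.

```python
import torch
import torch.nn.functional as F

def pot_au_loss(user_emb, original_emb, attr_logits, lam=1.0):
    """Two-component PoT-AU objective (illustrative sketch).

    distinguishability: push the attacker's attribute prediction towards a
    uniform distribution so the attribute label becomes indistinguishable.
    regularization: keep the unlearned embeddings close to the originals to
    preserve recommendation quality.
    """
    num_classes = attr_logits.size(-1)
    uniform = torch.full_like(attr_logits, 1.0 / num_classes)
    distinguishability = F.kl_div(
        F.log_softmax(attr_logits, dim=-1), uniform, reduction="batchmean"
    )
    regularization = F.mse_loss(user_emb, original_emb)
    return distinguishability + lam * regularization

# toy usage: 8 users, 16-dim embeddings, binary attribute (e.g., gender)
torch.manual_seed(0)
original = torch.randn(8, 16)                      # embeddings of the trained recommender
user_emb = original.clone().requires_grad_(True)   # embeddings to be unlearned
attacker = torch.nn.Linear(16, 2)                  # stand-in attribute attacker

opt = torch.optim.SGD([user_emb], lr=0.1)          # stochastic gradient descent, as in the paper
for _ in range(100):
    opt.zero_grad()
    loss = pot_au_loss(user_emb, original, attacker(user_emb))
    loss.backward()
    opt.step()
```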

Auto-survey Challenge

  • paper_url: http://arxiv.org/abs/2310.04480
  • repo_url: None
  • paper_authors: Thanh Gia Hieu Khuong, Benedictus Kent Rachmat
  • for: This paper is written for evaluating the capability of Large Language Models (LLMs) to autonomously compose and critique survey papers in various disciplines.
  • methods: The paper uses a simulated peer-review mechanism and human organizers in an editorial oversight capacity to evaluate the LLMs’ performance.
  • results: The paper presents the design of a competition for the AutoML conference 2023, where entrants are tasked with presenting stand-alone models that can author and appraise articles based on designated prompts; the assessment criteria include clarity, reference appropriateness, accountability, and substantive value of the content.
  • for: 这篇论文是为了评估大语言模型(LLM)在不同领域的自动撰写和评论报告的能力而写的。
  • methods: 这篇论文使用了模拟的同行评审机制和人类组织者在编辑监督性的情况下评估LLM的表现。
  • results: 这篇论文介绍了AutoML会议2023年的一项竞赛,参赛者需要提交独立的模型,能够根据指定的提示自动撰写和评论文章,并且评估标准包括明确度、参考适用性、责任和内容的价值。
    Abstract We present a novel platform for evaluating the capability of Large Language Models (LLMs) to autonomously compose and critique survey papers spanning a vast array of disciplines including sciences, humanities, education, and law. Within this framework, AI systems undertake a simulated peer-review mechanism akin to traditional scholarly journals, with human organizers serving in an editorial oversight capacity. Under this setup, we organized a competition for the AutoML conference 2023. Entrants are tasked with presenting stand-alone models adept at authoring articles from designated prompts and subsequently appraising them. Assessment criteria include clarity, reference appropriateness, accountability, and the substantive value of the content. This paper presents the design of the competition, including the implementation of baseline submissions and methods of evaluation.
    摘要 我们提出了一种新的平台,用于评估大型语言模型(LLM)自主撰写和评论综述论文的能力,涵盖科学、人文、教育和法律等多个学科。在这个框架下,人工智能系统在类似传统学术期刊的模拟同行评审机制下进行自动评审,人类组织者承担编辑监督职责。基于该框架,我们为 2023 年 AutoML 会议组织了一场竞赛:参赛者需要提交独立的模型,能够根据指定的提示撰写文章并对其进行评估。评价标准包括明确度、引用适当性、负责任性和内容的实际价值。本文介绍了竞赛的设计,包括基线提交的实现和评估方法。

Leveraging Data Geometry to Mitigate CSM in Steganalysis

  • paper_url: http://arxiv.org/abs/2310.04479
  • repo_url: None
  • paper_authors: Rony Abecidan, Vincent Itier, Jérémie Boulanger, Patrick Bas, Tomáš Pevný
  • for: 本研究旨在缓解隐写分析(steganalysis)中的载体源失配(Cover Source Mismatch,CSM)问题,通过探索一组处理管道,并提出一种选择或构造定制训练集的策略。
  • methods: 本研究提出一种基于几何的优化策略:通过 DCTr 特征张成的子空间之间的弦距(chordal distance),得到一个与操作遗憾(operational regret)高度相关、且不受载体-隐写比例影响的度量。
  • results: 实验验证了我们基于几何的优化策略,在合理假设下优于传统的逐一(atomistic)方法。详细的实验结果可以在 github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM 上找到。
    Abstract In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models. This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM). In this study, we consider the scenario where test images are processed using the same pipeline. However, knowledge regarding both the labels and the balance between cover and stego is missing. Our objective is to identify a training dataset that allows for maximum generalization to our target. By exploring a grid of processing pipelines fostering CSM, we discovered a geometrical metric based on the chordal distance between subspaces spanned by DCTr features, that exhibits high correlation with operational regret while being not affected by the cover-stego balance. Our contribution lies in the development of a strategy that enables the selection or derivation of customized training datasets, enhancing the overall generalization performance for a given target. Experimental validation highlights that our geometry-based optimization strategy outperforms traditional atomistic methods given reasonable assumptions. Additional resources are available at github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM.
    摘要 在实际操作场景中,隐写者使用的载体来自不同的传感器和处理管道,与研究人员训练隐写分析模型时所用的有很大差异,这导致在面对分布外载体时不可避免的性能差距,即载体源失配(CSM)。在本研究中,我们考虑的场景是:测试图像均通过同一个管道处理,但关于标签以及载体与隐写样本比例的信息是未知的。我们的目标是找到一个能对目标实现最大泛化的训练集。通过探索一组促成 CSM 的处理管道,我们发现了一种基于 DCTr 特征张成子空间之间弦距的几何度量,它与操作遗憾高度相关,且不受载体-隐写比例的影响。我们的贡献在于开发了一种选择或构造定制训练集的策略,以提高对给定目标的总体泛化性能。实验验证表明,在合理假设下,我们基于几何的优化策略优于传统的逐一方法。更多资源可在 github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM 获取。
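The chordal distance between two feature subspaces can be computed from the principal angles between their orthonormal bases. A small NumPy sketch, under the assumption that each source is summarized by the top principal directions of its features (random data stands in for real DCTr features here):

```python
import numpy as np

def subspace_basis(features, k=8):
    """Orthonormal basis of the top-k principal directions of a feature matrix (rows = samples)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (dim, k)

def chordal_distance(basis_a, basis_b):
    """Chordal distance via principal angles between two subspaces."""
    # singular values of A^T B are the cosines of the principal angles
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)
    return np.sqrt(np.sum(1.0 - cosines ** 2))

# toy example: two "sources" whose processing pipelines differ slightly
rng = np.random.default_rng(0)
source_a = rng.normal(size=(500, 64))
source_b = rng.normal(size=(500, 64)) * 1.5 + 0.3
d = chordal_distance(subspace_basis(source_a), subspace_basis(source_b))
print(f"chordal distance: {d:.3f}")
```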

Nash Welfare and Facility Location

  • paper_url: http://arxiv.org/abs/2310.04102
  • repo_url: None
  • paper_authors: Alexander Lam, Haris Aziz, Toby Walsh
  • for: 解决资源分配问题中兼顾公平和效率的问题
  • methods: 使用纳什福利函数(各代理效用的乘积),将个体成本转换为效用,并提供一个多项式时间近似算法来寻找最大化纳什福利的设施位置
  • results: 证明该设施位置能在公平与效率之间取得良好的平衡,并从机制设计的角度提出一个具有有界纳什福利近似比的策略防操纵机制。
  • for: The paper is written to address the problem of locating a facility to serve a set of agents located along a line, and to provide a compromise between fairness and efficiency in resource allocation problems.
  • methods: The paper uses the Nash welfare objective function, which is defined as the product of the agents’ utilities, to convert individual costs to utilities and analyze the facility placement that maximizes the Nash welfare. The paper also provides a polynomial-time approximation algorithm to compute this facility location.
  • results: The paper proves results suggesting that the proposed facility location algorithm achieves a good balance of fairness and efficiency, and also proposes a strategy-proof mechanism with a bounded approximation ratio for Nash welfare from a mechanism design perspective.
    Abstract We consider the problem of locating a facility to serve a set of agents located along a line. The Nash welfare objective function, defined as the product of the agents' utilities, is known to provide a compromise between fairness and efficiency in resource allocation problems. We apply this welfare notion to the facility location problem, converting individual costs to utilities and analyzing the facility placement that maximizes the Nash welfare. We give a polynomial-time approximation algorithm to compute this facility location, and prove results suggesting that it achieves a good balance of fairness and efficiency. Finally, we take a mechanism design perspective and propose a strategy-proof mechanism with a bounded approximation ratio for Nash welfare.
    摘要 我们考虑在一条线上为一组代理选址设施的问题。纳什福利目标函数定义为各代理效用的乘积,被认为能在资源分配问题中兼顾公平与效率。我们将该福利概念应用于设施选址问题:把个体成本转换为效用,并分析最大化纳什福利的设施位置。我们提供一个多项式时间近似算法来计算该位置,并证明它能在公平与效率之间取得良好的平衡。最后,我们从机制设计的角度出发,提出一个策略防操纵(strategy-proof)机制,其纳什福利近似比有界。
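A toy illustration of the Nash welfare objective for facility location on a line. The cost-to-utility conversion and the brute-force grid search are assumptions made for illustration; the paper defines its own conversion and gives a polynomial-time approximation algorithm rather than a grid search.

```python
def nash_welfare(locations, facility, span):
    """Product of agent utilities, with an assumed conversion u_i = span - |x_i - facility|."""
    welfare = 1.0
    for x in locations:
        welfare *= max(span - abs(x - facility), 0.0)
    return welfare

def best_facility(locations, step=0.01):
    """Brute-force search over candidate facility positions on the line."""
    span = max(locations) - min(locations)
    best_x, best_w = min(locations), -1.0
    x = min(locations)
    while x <= max(locations) + 1e-9:
        w = nash_welfare(locations, x, span)
        if w > best_w:
            best_x, best_w = x, w
        x += step
    return best_x, best_w

agents = [0.0, 0.2, 0.3, 0.9]          # agent positions along the line
x_star, w_star = best_facility(agents)
print(f"facility at {x_star:.2f}, Nash welfare {w_star:.4f}")
```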

A Deeply Supervised Semantic Segmentation Method Based on GAN

  • paper_url: http://arxiv.org/abs/2310.04081
  • repo_url: None
  • paper_authors: Wei Zhao, Qiyu Wei, Zeng Zeng
  • for: 这篇论文旨在提高智能交通系统中的交通安全性,通过精确地识别和位置化不同类型的路面元素,如路面裂隙、车道和交通标志。
  • methods: 本研究提出一个改进的语义分割模型,将对抗学习与现有的语义分割技术相结合,以提高模型在交通图像中捕捉复杂和细微特征的能力。
  • results: 与现有方法(如 SEGAN)相比,本研究在路面裂隙数据集上获得了明显的性能提升。这些改进可归因于对抗学习与语义分割之间的相互补充作用,使路面结构和状态的表示更加精确和丰富。
    Abstract In recent years, the field of intelligent transportation has witnessed rapid advancements, driven by the increasing demand for automation and efficiency in transportation systems. Traffic safety, one of the tasks integral to intelligent transport systems, requires accurately identifying and locating various road elements, such as road cracks, lanes, and traffic signs. Semantic segmentation plays a pivotal role in achieving this task, as it enables the partition of images into meaningful regions with accurate boundaries. In this study, we propose an improved semantic segmentation model that combines the strengths of adversarial learning with state-of-the-art semantic segmentation techniques. The proposed model integrates a generative adversarial network (GAN) framework into the traditional semantic segmentation model, enhancing the model's performance in capturing complex and subtle features in transportation images. The effectiveness of our approach is demonstrated by a significant boost in performance on the road crack dataset compared to the existing methods, \textit{i.e.,} SEGAN. This improvement can be attributed to the synergistic effect of adversarial learning and semantic segmentation, which leads to a more refined and accurate representation of road structures and conditions. The enhanced model not only contributes to better detection of road cracks but also to a wide range of applications in intelligent transportation, such as traffic sign recognition, vehicle detection, and lane segmentation.
    摘要 近年来,在自动化和效率需求的推动下,智能交通领域发展迅速。交通安全是智能交通系统中的一项重要任务,需要准确地识别和定位不同的路面元素,如路面裂隙、车道和交通标志。语义分割在完成这项任务中扮演着关键角色,它将图像划分为边界准确、有意义的区域。在本研究中,我们提出了一种改进的语义分割模型,结合了对抗学习和当前最佳语义分割技术的优势:将生成对抗网络(GAN)框架融入传统语义分割模型,从而提高模型对复杂和细微特征的捕捉能力。与现有方法(如 SEGAN)相比,我们的方法在路面裂隙数据集上表现出显著提升。这种提升可归因于对抗学习和语义分割的相互作用,带来更加精细和准确的路面结构与状况表示。改进的模型不仅有助于更好地检测路面裂隙,还可应用于交通标志识别、车辆检测和车道分割等智能交通场景。
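A hedged sketch of how an adversarial term can be combined with a supervised segmentation objective: a tiny stand-in "generator" (the segmentation network) and a patch discriminator that judges (image, mask) pairs. The architectures, loss weights, and random toy data are illustrative assumptions, not the paper's actual network design.

```python
import torch
import torch.nn as nn

# Minimal stand-ins; real road-crack models would use deeper encoder-decoder backbones.
seg_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
disc = nn.Sequential(nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 1, 3, stride=2, padding=1))

bce = nn.BCEWithLogitsLoss()
seg_loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(seg_net.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

images = torch.rand(4, 3, 64, 64)                   # toy batch of road images
masks = (torch.rand(4, 1, 64, 64) > 0.9).float()    # toy ground-truth crack masks

# discriminator step: real (image, gt mask) vs fake (image, predicted mask)
pred = seg_net(images).detach()
d_real = disc(torch.cat([images, masks], dim=1))
d_fake = disc(torch.cat([images, pred], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# generator step: supervised segmentation loss + adversarial term
pred = seg_net(images)
d_fake = disc(torch.cat([images, pred], dim=1))
loss_g = seg_loss_fn(pred, masks) + 0.1 * bce(d_fake, torch.ones_like(d_fake))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```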

Automatic Aspect Extraction from Scientific Texts

  • paper_url: http://arxiv.org/abs/2310.04074
  • repo_url: https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts
  • paper_authors: Anna Marshalova, Elena Bruches, Tatiana Batura
  • for: 本研究的目的是开发一种自动从俄语科学文献中提取主要元素的工具,以便进行科学文献综述。
  • methods: 本研究使用的方法包括创建了跨领域俄语科学文献数据集,并使用多语言BERT模型在这些数据上进行了微调。
  • results: 研究表明,使用微调后的多语言BERT模型可以在不同领域的俄语科学文献中提取主要元素,并且可以在不同领域之间进行泛化。
    Abstract Being able to extract from scientific papers their main points, key insights, and other important information, referred to here as aspects, might facilitate the process of conducting a scientific literature review. Therefore, the aim of our research is to create a tool for automatic aspect extraction from Russian-language scientific texts of any domain. In this paper, we present a cross-domain dataset of scientific texts in Russian, annotated with such aspects as Task, Contribution, Method, and Conclusion, as well as a baseline algorithm for aspect extraction, based on the multilingual BERT model fine-tuned on our data. We show that there are some differences in aspect representation in different domains, but even though our model was trained on a limited number of scientific domains, it is still able to generalize to new domains, as was proved by cross-domain experiments. The code and the dataset are available at \url{https://github.com/anna-marshalova/automatic-aspect-extraction-from-scientific-texts}.
    摘要 <>将科学文献中的主要点、关键发现和其他重要信息(以下简称“方面”)抽取出来,可能会使科学文献复习更加容易。因此,我们的研究目标是开发一种自动从俄语科学文献中提取方面的工具。在这篇论文中,我们提供了跨领域俄语科学文献数据集,每篇文献都有标注的方面,包括任务、贡献、方法和结论。此外,我们还提出了一个基线算法,基于多语言BERT模型,并在我们的数据集上进行了微调。我们发现在不同领域中,方面的表示存在一些差异,但是我们的模型即使只在有限的科学领域进行了训练,仍然能够在新领域中广泛应用。代码和数据集可以在 GitHub 上找到:

AI Regulation in Europe: From the AI Act to Future Regulatory Challenges

  • paper_url: http://arxiv.org/abs/2310.04072
  • repo_url: None
  • paper_authors: Philipp Hacker
  • for: 本文探讨了欧盟对人工智能(AI)的监管,与英国更为行业化和自律的方法进行比较,并提出了一种融合两者的混合监管策略,强调需要敏捷性与安全港(safe harbors)以便于合规。
  • methods: 本文分析了《人工智能法案》(AI Act)这一应对 AI 多方面挑战的开创性立法努力,认为尽管该法案存在可能阻碍 AI 技术发展的缺陷,但仍是朝正确方向迈出的一步。
  • results: 本文预测了未来的监管挑战,如有害内容的治理、环境问题和混合威胁,并主张立即制定对高性能、可能开源的 AI 系统进行受监管访问的协议。尽管《人工智能法案》是一个重要的立法里程碑,但仍需要进一步细化和全球合作,以有效治理快速演进的 AI 技术。
    Abstract This chapter provides a comprehensive discussion on AI regulation in the European Union, contrasting it with the more sectoral and self-regulatory approach in the UK. It argues for a hybrid regulatory strategy that combines elements from both philosophies, emphasizing the need for agility and safe harbors to ease compliance. The paper examines the AI Act as a pioneering legislative effort to address the multifaceted challenges posed by AI, asserting that, while the Act is a step in the right direction, it has shortcomings that could hinder the advancement of AI technologies. The paper also anticipates upcoming regulatory challenges, such as the management of toxic content, environmental concerns, and hybrid threats. It advocates for immediate action to create protocols for regulated access to high-performance, potentially open-source AI systems. Although the AI Act is a significant legislative milestone, it needs additional refinement and global collaboration for the effective governance of rapidly evolving AI technologies.
    摘要 Translation in Simplified Chinese:这章节讲述欧盟的人工智能规则,与英国更为领域化和自 réglementaire的方法进行比较,强调混合 réglementaire 策略,并强调需要灵活和安全庇护以便遵守。文章分析了人工智能法为人工智能多重挑战所采取的先驱法律努力,但认为法律有缺陷可能会阻碍人工智能技术的发展。文章还预测将来的规制挑战,如抑制危险内容、环境问题和杂合威胁。它主张立即创建规则了高性能、可能是开源的人工智能系统的访问协议。虽然人工智能法是一个重要的立法里程碑,但它需要进一步的细化和全球合作,以有效管理在不断发展的人工智能技术。

ByteStack-ID: Integrated Stacked Model Leveraging Payload Byte Frequency for Grayscale Image-based Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2310.09298
  • repo_url: None
  • paper_authors: Irfan Khan, Yasir Ali Farrukh, Syed Wali
  • for: 本研究旨在提高网络安全性,通过 packet-level 数据分析,快速、准确地识别多样化的攻击类型。
  • methods: 本文提出 “ByteStack-ID” 方法:基于 payload 数据的字节频率分布生成灰度图像,完全依赖数据包级(packet-level)信息,并在传统堆叠(stacking)方法的基础上,将额外的元学习器层集成到拼接的基学习器之上,构成一个高度优化的统一模型。
  • results: 实验结果表明,ByteStack-ID 框架在多类分类任务中表现突出,在精度、召回率和 F1 分数等关键指标上均优于基线模型和现有方法。特别是,ByteStack-ID 在多类分类任务中达到了 81% 的 macro F1 分数。
    Abstract In the ever-evolving realm of network security, the swift and accurate identification of diverse attack classes within network traffic is of paramount importance. This paper introduces "ByteStack-ID," a pioneering approach tailored for packet-level intrusion detection. At its core, ByteStack-ID leverages grayscale images generated from the frequency distributions of payload data, a groundbreaking technique that greatly enhances the model's ability to discern intricate data patterns. Notably, our approach is exclusively grounded in packet-level information, a departure from conventional Network Intrusion Detection Systems (NIDS) that predominantly rely on flow-based data. While building upon the fundamental concept of stacking methodology, ByteStack-ID diverges from traditional stacking approaches. It seamlessly integrates additional meta learner layers into the concatenated base learners, creating a highly optimized, unified model. Empirical results unequivocally confirm the outstanding effectiveness of the ByteStack-ID framework, consistently outperforming baseline models and state-of-the-art approaches across pivotal performance metrics, including precision, recall, and F1-score. Impressively, our proposed approach achieves an exceptional 81\% macro F1-score in multiclass classification tasks. In a landscape marked by the continuous evolution of network threats, ByteStack-ID emerges as a robust and versatile security solution, relying solely on packet-level information extracted from network traffic data.
    摘要 在不断演变的网络安全领域中,快速而准确地识别网络流量中的多种攻击类型至关重要。本文介绍了一种名为 “ByteStack-ID” 的创新方法,专为数据包级入侵检测设计。ByteStack-ID 的核心思想是利用 payload 数据的字节频率分布生成灰度图像,这一技术能够大幅提高模型对复杂数据模式的识别能力。与主要依赖流级(flow-based)数据的传统网络入侵检测系统(NIDS)不同,ByteStack-ID 仅基于数据包级信息。它在堆叠方法的基本思想上有所创新:将额外的元学习器层无缝集成到拼接的基学习器之上,形成一个高度优化的统一模型。实验结果证明,ByteStack-ID 框架在多类分类任务中表现突出,在精度、召回率和 F1 分数等关键性能指标上均优于基线模型和当前最先进方法,其中 macro F1 分数达到 81%。这表明 ByteStack-ID 是一种仅依赖网络流量数据包级信息的稳健而通用的安全解决方案。
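A minimal sketch of turning a packet payload into a grayscale image from its byte-frequency distribution, i.e. the core input representation described above; the 16x16 layout and max-normalization are assumptions, not necessarily the paper's exact encoding.

```python
import numpy as np

def payload_to_grayscale(payload: bytes, size: int = 16) -> np.ndarray:
    """Map a packet payload to a grayscale image built from its byte-frequency distribution.

    One frequency bin per byte value (0-255), normalized to [0, 255] and
    reshaped to a size x size image (16 * 16 = 256 bins).
    """
    counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256).astype(float)
    if counts.max() > 0:
        counts = counts / counts.max()
    return (counts * 255).astype(np.uint8).reshape(size, size)

packet = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
img = payload_to_grayscale(packet)
print(img.shape, img.dtype, int(img.max()))  # (16, 16) uint8 255
```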

Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.04055
  • repo_url: None
  • paper_authors: Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Yuhang Yao, Qifan Zhang, Salman Avestimehr, Chaoyang He
  • for: 防止联邦学习系统中的恶意客户端提交中毒模型以实现对抗目标,例如阻碍全局模型的收敛或诱导全局模型对某些数据进行误分类。
  • methods: 提出一种异常检测方法,包括:i) 检测攻击是否发生,并仅在攻击发生时执行防御操作;ii) 在攻击发生时,进一步检测并剔除恶意客户端模型,同时不损害良性模型;iii) 利用零知识证明机制,保证服务器端防御机制的诚实执行。
  • results: 通过广泛的实验证明了提posed approach的超越性表现。
    Abstract Federated learning (FL) systems are vulnerable to malicious clients that submit poisoned local models to achieve their adversarial goals, such as preventing the convergence of the global model or inducing the global model to misclassify some data. Many existing defense mechanisms are impractical in real-world FL systems, as they require prior knowledge of the number of malicious clients or rely on re-weighting or modifying submissions. This is because adversaries typically do not announce their intentions before attacking, and re-weighting might change aggregation results even in the absence of attacks. To address these challenges in real FL systems, this paper introduces a cutting-edge anomaly detection approach with the following features: i) Detecting the occurrence of attacks and performing defense operations only when attacks happen; ii) Upon the occurrence of an attack, further detecting the malicious client models and eliminating them without harming the benign ones; iii) Ensuring honest execution of defense mechanisms at the server by leveraging a zero-knowledge proof mechanism. We validate the superior performance of the proposed approach with extensive experiments.
    摘要 联邦学习(FL)系统容易受到恶意客户端的攻击:它们提交中毒的本地模型以实现对抗目标,例如阻碍全局模型的收敛,或诱导全局模型对某些数据进行错误分类。许多现有的防御机制在实际 FL 系统中并不实用,因为它们需要事先知道恶意客户端的数量,或依赖对提交模型的重新加权或修改。这是因为攻击者通常不会在攻击前宣布意图,而重新加权即使在没有攻击的情况下也可能改变聚合结果。为了解决实际 FL 系统中的这些挑战,本文提出了一种先进的异常检测方法,其特点是:i) 仅在检测到攻击发生时才执行防御操作;ii) 攻击发生后,进一步检测恶意客户端模型并将其剔除,而不损害良性模型;iii) 利用零知识证明机制,确保服务器端防御机制的诚实执行。我们通过广泛的实验验证了所提方法的优越性能。
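For intuition only, a toy stand-in for the "detect, then remove malicious client models" step: it flags client updates whose cosine similarity to the coordinate-wise median update is low before averaging. The actual paper uses a more sophisticated detector, separates attack-occurrence detection from client-level detection, and wraps the server-side defense in zero-knowledge proofs, none of which is modeled here.

```python
import numpy as np

def filter_client_updates(updates, threshold=0.0):
    """Flag and drop suspicious client updates before aggregation (illustrative only).

    Each update is a flattened model delta. A client is flagged when the cosine
    similarity between its update and the median update falls below `threshold`.
    """
    updates = np.asarray(updates, dtype=float)
    reference = np.median(updates, axis=0)
    ref_norm = np.linalg.norm(reference) + 1e-12
    sims = updates @ reference / (np.linalg.norm(updates, axis=1) * ref_norm + 1e-12)
    keep = sims >= threshold
    aggregated = updates[keep].mean(axis=0)
    return aggregated, np.where(~keep)[0]

rng = np.random.default_rng(0)
benign = rng.normal(0.1, 0.05, size=(8, 100))   # benign clients drift in a similar direction
poisoned = -5.0 * benign[:2]                    # two clients push the opposite way
agg, flagged = filter_client_updates(np.vstack([benign, poisoned]))
print("flagged clients:", flagged)              # expected: the last two indices
```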

Higher-Order DeepTrails: Unified Approach to *Trails

  • paper_url: http://arxiv.org/abs/2310.04477
  • repo_url: https://github.com/lsx-uniwue/deeptrails
  • paper_authors: Tobias Koopmann, Jan Pfister, André Markus, Astrid Carolus, Carolin Wienrich, Andreas Hotho
  • for: 本研究旨在分析和理解人类导航行为,以便在不同场景下改进和优化底层基础设施或用户界面。
  • methods: 本研究使用自回归语言模型来分析整个导航序列,以建模序列中的高阶相关性。
  • results: 本研究可以轻松地适配先前工作中提出的不同设定,如 HypTrails、MixedTrails 和 SubTrails,同时具有独特优势:1. 建模状态转移之间的高阶相关性,2. 能够识别所提假设的缺陷,3. 自然地为所有设定提供统一的建模方法。
    Abstract Analyzing, understanding, and describing human behavior is advantageous in different settings, such as web browsing or traffic navigation. Understanding human behavior naturally helps to improve and optimize the underlying infrastructure or user interfaces. Typically, human navigation is represented by sequences of transitions between states. Previous work suggests to use hypotheses, representing different intuitions about the navigation to analyze these transitions. To mathematically grasp this setting, first-order Markov chains are used to capture the behavior, consequently allowing to apply different kinds of graph comparisons, but comes with the inherent drawback of losing information about higher-order dependencies within the sequences. To this end, we propose to analyze entire sequences using autoregressive language models, as they are traditionally used to model higher-order dependencies in sequences. We show that our approach can be easily adapted to model different settings introduced in previous work, namely HypTrails, MixedTrails and even SubTrails, while at the same time bringing unique advantages: 1. Modeling higher-order dependencies between state transitions, while 2. being able to identify short comings in proposed hypotheses, and 3. naturally introducing a unified approach to model all settings. To show the expressiveness of our approach, we evaluate our approach on different synthetic datasets and conclude with an exemplary analysis of a real-world dataset, examining the behavior of users who interact with voice assistants.
    摘要 分析、理解和描述人类行为在许多场景中都很有价值,例如网页浏览或交通导航。理解人类行为有助于改进和优化底层基础设施或用户界面。通常,人类导航被表示为状态之间的转移序列。先前的工作建议使用代表不同导航直觉的假设来分析这些转移。为了在数学上刻画这一设定,通常采用一阶马尔可夫链来描述行为,从而可以应用各种图比较方法,但其固有缺点是丢失序列中高阶依赖关系的信息。为此,我们提议使用自回归语言模型来分析整个序列,因为这类模型传统上就用于建模序列中的高阶依赖。我们表明,我们的方法可以轻松适配先前工作中引入的不同设定,即 HypTrails、MixedTrails 乃至 SubTrails,同时带来独特的优势:1. 建模状态转移之间的高阶依赖;2. 能够识别所提假设的缺陷;3. 自然地引入一个统一的方法来建模所有设定。为了展示方法的表达力,我们在多个合成数据集上进行了评估,并对一个真实数据集进行了示例分析,探讨用户与语音助手交互的行为。
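For contrast with the paper's autoregressive approach, here is a short sketch of the first-order Markov baseline that *Trails-style analyses build on: it estimates smoothed transition probabilities from navigation trails and scores their log-likelihood, which by construction ignores any higher-order dependency. The trails and smoothing constant are illustrative.

```python
from collections import Counter, defaultdict
from math import log

def first_order_loglik(trails, alpha=1.0):
    """Log-likelihood of navigation trails under a first-order Markov chain with Laplace smoothing."""
    states = sorted({s for trail in trails for s in trail})
    counts = defaultdict(Counter)
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    loglik = 0.0
    for src, dsts in counts.items():
        total = sum(dsts.values()) + alpha * len(states)
        for dst, c in dsts.items():
            loglik += c * log((c + alpha) / total)
    return loglik

trails = [["home", "search", "product", "cart"],
          ["home", "product", "cart"],
          ["search", "product", "home", "search", "product"]]
print(f"first-order log-likelihood: {first_order_loglik(trails):.2f}")
```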

Observation-Guided Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2310.04041
  • repo_url: None
  • paper_authors: Junoh Kang, Jinyoung Choi, Sungik Choi, Bohyung Han
  • for: 为在采样速度与质量控制之间取得更好的平衡,提出一种新的扩散模型——观察指导扩散概率模型(OGDM)。
  • methods: 以有原则的方式将观察过程的指导与马尔可夫链相结合,重新定义训练目标:引入一个基于观察的额外损失项,利用以噪声水平为条件的判别器给出的伯努利分布来判断输入是否位于(含噪的)真实流形上,从而在推理阶段函数求值次数有限时,优化更准确的负对数似然。
  • results: 使用该训练方法可以在不增加计算成本的情况下得到更好的去噪网络,即使只在微调阶段引入也有收益;它还可以与多种快速推理策略兼容。我们在强扩散模型基线上用多种推理方法验证了所提训练算法的有效性。
    Abstract We propose a novel diffusion model called observation-guided diffusion probabilistic model (OGDM), which effectively addresses the trade-off between quality control and fast sampling. Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain in a principled way. This is achieved by introducing an additional loss term derived from the observation based on the conditional discriminator on noise level, which employs Bernoulli distribution indicating whether its input lies on the (noisy) real manifold or not. This strategy allows us to optimize the more accurate negative log-likelihood induced in the inference stage especially when the number of function evaluations is limited. The proposed training method is also advantageous even when incorporated only into the fine-tuning process, and it is compatible with various fast inference strategies since our method yields better denoising networks using the exactly same inference procedure without incurring extra computational cost. We demonstrate the effectiveness of the proposed training algorithm using diverse inference methods on strong diffusion model baselines.
    摘要 我们提出了一种新的扩散模型——观察指导扩散概率模型(OGDM),它能有效地在质量控制与快速采样之间取得平衡。我们的方法以有原则的方式将观察过程的指导与马尔可夫链相结合,重新定义了训练目标:引入一个基于观察的额外损失项,利用以噪声水平为条件的判别器给出的伯努利分布来判断输入是否位于(含噪的)真实流形上。这一策略使我们能够在推理阶段优化更准确的负对数似然,尤其是在函数求值次数有限时。所提出的训练方法即使只用于微调阶段也有优势,并且由于使用完全相同的推理过程即可得到更好的去噪网络,它可以与各种快速推理策略兼容,且不增加额外计算成本。我们在强扩散模型基线上使用多种推理方法验证了所提训练算法的有效性。

Demystifying Embedding Spaces using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04475
  • repo_url: None
  • paper_authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier
  • for: 使嵌入(embedding)中的信息更易理解、更广泛可用:让大型语言模型(LLM)直接与嵌入向量交互,将抽象向量转化为可理解的叙述。
  • methods: 将嵌入向量注入 LLM,以便对复杂的嵌入数据进行查询和探索。
  • results: 在多种任务上进行了展示,包括增强概念激活向量(CAV)、解释新的嵌入实体,以及解码推荐系统中的用户偏好。
    Abstract Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
    摘要 Currently, embeddings have become a crucial tool for representing complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. However, these embeddings often preclude direct interpretation, making it difficult to understand their meaning. Although downstream tasks can utilize these compressed representations, interpreting them usually requires visualization methods such as dimensionality reduction or specialized machine learning interpretability techniques.This paper aims to address the challenge of making embeddings more interpretable and broadly useful by using Large Language Models (LLMs) to directly interact with embeddings. By injecting embeddings into LLMs, we can transform abstract vectors into understandable narratives, enabling users to query and explore complex embedding data. We demonstrate our approach on a variety of diverse tasks, including enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work combines the vast information potential of embeddings with the interpretive power of LLMs, unlocking new possibilities for understanding and working with complex data.

Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning

  • paper_url: http://arxiv.org/abs/2310.04474
  • repo_url: None
  • paper_authors: Yinger Zhang, Hui Cai, Yicheng Chen, Rui Sun, Jing Zheng
  • for: 这篇论文旨在增强大型语言模型(LLM)的能力,使其仅通过提示即可调用外部 API 完成函数调用(function calling),而无需微调。
  • methods: 这篇论文提出了一种简单又可控的目标驱动方法,称为“反链”,以使LLMs能够通过提示来使用外部API。
  • results: 实验结果表明,Reverse Chain 能够实现可控的多函数调用,并显著提升现有 LLM(例如 ChatGPT)的工具使用能力。
    Abstract While enabling large language models to implement function calling (known as APIs) can greatly enhance the performance of LLMs, function calling is still a challenging task due to the complicated relations between different APIs, especially in a context-learning setting without fine-tuning. This paper proposes a simple yet controllable target-driven approach called Reverse Chain to empower LLMs with capabilities to use external APIs with only prompts. Given that most open-source LLMs have limited tool-use or tool-plan capabilities, LLMs in Reverse Chain are only employed to implement simple tasks, e.g., API selection and argument completion, and a generic rule is employed to implement a controllable multiple functions calling. In this generic rule, after selecting a final API to handle a given task via LLMs, we first ask LLMs to fill the required arguments from user query and context. Some missing arguments could be further completed by letting LLMs select another API based on API description before asking user. This process continues until a given task is completed. Extensive numerical experiments indicate an impressive capability of Reverse Chain on implementing multiple function calling. Interestingly enough, the experiments also reveal that tool-use capabilities of the existing LLMs, e.g., ChatGPT, can be greatly improved via Reverse Chain.
    摘要 While enabling large language models to implement function calling (known as APIs) can greatly enhance the performance of LLMs, function calling is still a challenging task due to the complicated relations between different APIs, especially in a context-learning setting without fine-tuning. This paper proposes a simple yet controllable target-driven approach called Reverse Chain to empower LLMs with capabilities to use external APIs with only prompts. Given that most open-source LLMs have limited tool-use or tool-plan capabilities, LLMs in Reverse Chain are only employed to implement simple tasks, e.g., API selection and argument completion, and a generic rule is employed to implement a controllable multiple functions calling. In this generic rule, after selecting a final API to handle a given task via LLMs, we first ask LLMs to fill the required arguments from user query and context. Some missing arguments could be further completed by letting LLMs select another API based on API description before asking user. This process continues until a given task is completed. Extensive numerical experiments indicate an impressive capability of Reverse Chain on implementing multiple function calling. Interestingly enough, the experiments also reveal that tool-use capabilities of the existing LLMs, e.g., ChatGPT, can be greatly improved via Reverse Chain.Translation notes:* "large language models" (大型语言模型) is translated as "LLMs" (LLM) for brevity.* "function calling" (函数调用) is translated as "API" (API) to refer to the specific task or action being performed.* "context-learning setting" (上下文学习设置) is translated as "context-learning" (上下文学习) to emphasize the learning aspect.* "tool-use or tool-plan capabilities" (工具使用或计划能力) is translated as "tool-use capabilities" (工具使用能力) to simplify the phrase.* "Reverse Chain" (返回链) is translated as "Reverse Chain" (返回链) to maintain the original English name for clarity.* "extensive numerical experiments" (详细数学实验) is translated as "extensive numerical experiments" (详细数学实验) to maintain the original phrase for clarity.* "tool-use capabilities" (工具使用能力) is translated as "tool-use capabilities" (工具使用能力) to maintain consistency.

Effective Slogan Generation with Noise Perturbation

  • paper_url: http://arxiv.org/abs/2310.04472
  • repo_url: https://github.com/joannekim0420/slogangeneration
  • paper_authors: Jongeun Kim, MinChung Kim, Taehwan Kim
  • for: 本研究旨在自动生成优秀的企业品牌标语,以帮助企业建立独特的品牌形象。
  • methods: 本研究使用预训练的 transformer T5 模型,并在新提出的 1:N 匹配对数据集上加入噪声扰动,从而生成更加独特且连贯的品牌标语。
  • results: 根据 ROUGE1、ROUGEL 和余弦相似度指标,以及人类评测者的评价,本研究的方法在标语生成方面优于基线模型和其他基于 transformer 的模型。
    Abstract Slogans play a crucial role in building the brand's identity of the firm. A slogan is expected to reflect the firm's vision and the brand's value propositions in memorable and likeable ways. Automating the generation of slogans with such characteristics is challenging. Previous studies developed and tested slogan generation with syntactic control and summarization models which are not capable of generating distinctive slogans. We introduce a novel approach that leverages a pre-trained transformer T5 model with noise perturbation on a newly proposed 1:N matching pair dataset. This approach serves as a contributing factor in generating distinctive and coherent slogans. Furthermore, the proposed approach incorporates descriptions about the firm and brand into the generation of slogans. We evaluate generated slogans based on ROUGE1, ROUGEL and Cosine Similarity metrics and also assess them with human subjects in terms of slogan's distinctiveness, coherence, and fluency. The results demonstrate that our approach yields better performance than baseline models and other transformer-based models.
    摘要 标语对企业品牌形象的建立起着关键作用。一句好的标语应以令人印象深刻且讨人喜欢的方式,体现公司的愿景和品牌的价值主张,但要自动生成具备这些特点的标语颇具挑战性。先前的研究基于句法控制和摘要模型进行标语生成,难以生成有辨识度的标语。我们提出了一种新的方法:利用预训练的 transformer T5 模型,并在新提出的 1:N 匹配对数据集上加入噪声扰动,从而生成独特且连贯的标语;此外,该方法还将公司和品牌的描述信息纳入标语生成。我们根据 ROUGE1、ROUGEL 和余弦相似度指标评估生成的标语,并请人类评审者从独特性、连贯性和流畅性三个方面进行评价。结果表明,我们的方法优于基线模型和其他基于 transformer 的模型。

Excision and Recovery: Enhancing Surface Anomaly Detection with Attention-based Single Deterministic Masking

  • paper_url: http://arxiv.org/abs/2310.04010
  • repo_url: None
  • paper_authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Yeonho Lee, Juneho Yi
  • for: 这篇论文针对表面检测中异常样本稀缺导致的数量不平衡问题,提出一种基于“剪除与恢复”的异常检测方法(Excision and Recovery,EAR)。
  • methods: 该方法采用单一确定性掩码:利用预训练的空间注意力模型预测可疑的缺陷区域并将其掩蔽,使重建模型在异常区域产生更大的重建误差;同时采用可选择性关闭跳跃连接的 U-Net 变体作为重建网络,进一步限制其对异常的重建能力。
  • results: 实验结果显示,在常用的表面检测数据集 KolektorSDD2 上,所提出的 EAR 模型相比现有方法具有更好的异常检测性能和更高的吞吐量。
    Abstract Anomaly detection (AD) in surface inspection is an essential yet challenging task in manufacturing due to the quantity imbalance problem of scarce abnormal data. To overcome the above, a reconstruction encoder-decoder (ED) such as autoencoder or U-Net which is trained with only anomaly-free samples is widely adopted, in the hope that unseen abnormals should yield a larger reconstruction error than normal. Over the past years, researches on self-supervised reconstruction-by-inpainting have been reported. They mask out suspected defective regions for inpainting in order to make them invisible to the reconstruction ED to deliberately cause inaccurate reconstruction for abnormals. However, their limitation is multiple random masking to cover the whole input image due to defective regions not being known in advance. We propose a novel reconstruction-by-inpainting method dubbed Excision and Recovery (EAR) that features single deterministic masking. For this, we exploit a pre-trained spatial attention model to predict potential suspected defective regions that should be masked out. We also employ a variant of U-Net as our ED to further limit the reconstruction ability of the U-Net model for abnormals, in which skip connections of different layers can be selectively disabled. In the training phase, all the skip connections are switched on to fully take the benefits from the U-Net architecture. In contrast, for inferencing, we only keep deeper skip connections with shallower connections off. We validate the effectiveness of EAR using an MNIST pre-trained attention for a commonly used surface AD dataset, KolektorSDD2. The experimental results show that EAR achieves both better AD performance and higher throughput than state-of-the-art methods. We expect that the proposed EAR model can be widely adopted as training and inference strategies for AD purposes.
    摘要 在制造业的表面检测中,异常检测(AD)是一项重要而具有挑战性的任务,因为异常数据稀缺导致数量不平衡。为此,业界广泛采用仅用正常样本训练的重建式编码器-解码器(ED,如自编码器或 U-Net),期望未见过的异常会产生比正常样本更大的重建误差。近年来出现了基于自监督“重建-修补”(reconstruction-by-inpainting)的研究:将可疑缺陷区域掩蔽后再修补,使其对重建网络不可见,从而有意造成异常区域的重建不准确。但由于缺陷区域事先未知,这类方法需要对整幅输入图像进行多次随机掩蔽。我们提出了一种新的重建-修补方法,称为剪除与恢复(EAR),其特点是单一确定性掩码:利用预训练的空间注意力模型预测应被掩蔽的可疑缺陷区域,并采用一种 U-Net 变体作为 ED,通过选择性关闭不同层的跳跃连接来进一步限制其对异常的重建能力。训练阶段开启所有跳跃连接,以充分利用 U-Net 结构的优势;推理阶段则只保留较深层的跳跃连接、关闭较浅层的连接。我们使用基于 MNIST 预训练的注意力模型,在常用的表面异常检测数据集 KolektorSDD2 上验证了 EAR 的有效性。实验结果表明,EAR 在异常检测性能和吞吐量方面均优于当前最先进的方法。我们期望所提出的 EAR 模型能被广泛用作异常检测的训练与推理策略。
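A toy encoder-decoder showing the skip-connection switching idea: all skips on during training, the shallower skip disabled at inference. Layer sizes and the additive skips are illustrative assumptions, not the paper's U-Net variant or its attention-based masking.

```python
import torch
import torch.nn as nn

class TinySkipNet(nn.Module):
    """Toy reconstruction network with switchable shallow/deep skip connections."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(1, 8, 3, stride=2, padding=1)    # shallow encoder stage
        self.enc2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)   # deep encoder stage
        self.dec2 = nn.ConvTranspose2d(16, 8, 2, stride=2)
        self.dec1 = nn.ConvTranspose2d(8, 1, 2, stride=2)

    def forward(self, x, use_shallow_skip=True, use_deep_skip=True):
        e1 = torch.relu(self.enc1(x))
        e2 = torch.relu(self.enc2(e1))
        d2 = torch.relu(self.dec2(e2))
        if use_deep_skip:
            d2 = d2 + e1                 # deeper skip connection
        out = self.dec1(d2)
        if use_shallow_skip:
            out = out + x                # shallowest skip connection
        return torch.sigmoid(out)

model = TinySkipNet()
x = torch.rand(1, 1, 32, 32)
train_out = model(x)                                 # training: all skips switched on
infer_out = model(x, use_shallow_skip=False)         # inference: shallow skip switched off
print(train_out.shape, infer_out.shape)
```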

Fast Neighborhood Search Heuristics for the Colorful Bin Packing Problem

  • paper_url: http://arxiv.org/abs/2310.04471
  • repo_url: https://gitlab.com/renanfernandofranco/fast-neighborhood-search-heuristics-for-the-colorful-bin-packing-problem
  • paper_authors: Renan F. F. da Silva, Yulle G. F. Borges, Rafael C. S. Schouery
  • for: 解决彩色装箱问题(Colorful Bin Packing Problem,CBPP),它是装箱问题(Bin Packing Problem,BPP)的推广。
  • methods: 提出了 BPP 启发式的改编版本和新的 CBPP 启发式,以及一组快速邻域搜索算法,并将其用于基于变邻域搜索(Variable Neighborhood Search,VNS)的元启发式方法,以及一种将线性规划与 VNS、贪婪随机自适应搜索(GRASP)相结合的数学启发式(matheuristic)方法。
  • results: 结果表明我们的数学启发式方法优于 VNS,且两种方法即使在物品数量很多的实例上也能找到接近最优的解。
    Abstract The Colorful Bin Packing Problem (CBPP) is a generalization of the Bin Packing Problem (BPP). The CBPP consists of packing a set of items, each with a weight and a color, in bins of limited capacity, minimizing the number of used bins and satisfying the constraint that two items of the same color cannot be packed side by side in the same bin. In this article, we proposed an adaptation of BPP heuristics and new heuristics for the CBPP. Moreover, we propose a set of fast neighborhood search algorithms for CBPP. These neighborhoods are applied in a meta-heuristic approach based on the Variable Neighborhood Search (VNS) and a matheuristic approach that mixes linear programming with the meta-heuristics VNS and Greedy Randomized Adaptive Search (GRASP). The results indicate that our matheuristic is superior to VNS and that both approaches can find near-optimal solutions for a large number instances, even for instances with many items.
    摘要 彩色装箱问题(CBPP)是装箱问题(BPP)的推广:把一组各有重量和颜色的物品装入容量有限的箱子中,在满足同色物品不能在同一箱中相邻放置的约束下,最小化所用箱子的数量。在本文中,我们提出了 BPP 启发式的改编版本和新的 CBPP 启发式。此外,我们还提出了一组面向 CBPP 的快速邻域搜索算法,并将这些邻域用于基于变邻域搜索(VNS)的元启发式方法,以及一种将线性规划与 VNS、GRASP 相结合的数学启发式方法。结果表明,我们的数学启发式方法优于 VNS,且两种方法即使在物品数量很多的实例上也能找到接近最优的解。
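A simple color-aware first-fit heuristic for CBPP, useful as a baseline mental model; it is not the paper's VNS or matheuristic, only an illustration of the "two items of the same color cannot be packed side by side" constraint.

```python
def colorful_first_fit(items, capacity):
    """First-fit heuristic for the Colorful Bin Packing Problem.

    Items are (weight, color) pairs. An item goes into the first bin where it fits
    and whose last packed item has a different color, so no two same-color items
    end up side by side.
    """
    bins = []  # each bin: {"load": total weight, "items": [(weight, color), ...]}
    for weight, color in items:
        placed = False
        for b in bins:
            fits = b["load"] + weight <= capacity
            color_ok = not b["items"] or b["items"][-1][1] != color
            if fits and color_ok:
                b["items"].append((weight, color))
                b["load"] += weight
                placed = True
                break
        if not placed:
            bins.append({"load": weight, "items": [(weight, color)]})
    return bins

items = [(4, "red"), (3, "red"), (5, "blue"), (2, "blue"), (6, "red"), (1, "blue")]
for i, b in enumerate(colorful_first_fit(items, capacity=10)):
    print(f"bin {i}: load={b['load']} items={b['items']}")
```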

Hierarchical Multi-Marginal Optimal Transport for Network Alignment

  • paper_url: http://arxiv.org/abs/2310.04470
  • repo_url: None
  • paper_authors: Zhichen Zeng, Boxin Du, Si Zhang, Yinglong Xia, Zhining Liu, Hanghang Tong
  • for: Multi-network alignment, an essential prerequisite for joint learning on multiple networks.
  • methods: Hierarchical multi-marginal optimal transport framework (HOT) with fused Gromov-Wasserstein (FGW) barycenter and generalized multi-marginal FGW distance.
  • results: Significant improvements over the state-of-the-art in both effectiveness and scalability.
  • for: 本文研究多网络对齐问题,即寻找多个网络之间的节点对应关系,这是对多个网络进行联合学习的重要前提。
  • methods: 我们提出了一种层次多边际最优传输框架(HOT):借助融合 Gromov-Wasserstein(FGW)重心将多个网络分解为较小的对齐簇,并将 FGW 距离推广到多边际设定,从而对多个网络进行联合对齐。
  • results: 我们的 HOT 方法在有效性和可扩展性两个方面都显著优于当前最先进的方法。
    Abstract Finding node correspondence across networks, namely multi-network alignment, is an essential prerequisite for joint learning on multiple networks. Despite great success in aligning networks in pairs, the literature on multi-network alignment is sparse due to the exponentially growing solution space and lack of high-order discrepancy measures. To fill this gap, we propose a hierarchical multi-marginal optimal transport framework named HOT for multi-network alignment. To handle the large solution space, multiple networks are decomposed into smaller aligned clusters via the fused Gromov-Wasserstein (FGW) barycenter. To depict high-order relationships across multiple networks, the FGW distance is generalized to the multi-marginal setting, based on which networks can be aligned jointly. A fast proximal point method is further developed with guaranteed convergence to a local optimum. Extensive experiments and analysis show that our proposed HOT achieves significant improvements over the state-of-the-art in both effectiveness and scalability.
    摘要 寻找多个网络之间的节点对应关系(即多网络对齐)是对多个网络进行联合学习的重要前提。尽管两两网络对齐已取得很大成功,但由于解空间呈指数级增长且缺乏高阶差异度量,关于多网络对齐的研究仍然较少。为填补这一空白,我们提出了一种用于多网络对齐的层次多边际最优传输框架 HOT。为了处理庞大的解空间,我们借助融合 Gromov-Wasserstein(FGW)重心将多个网络分解为较小的对齐簇;为了刻画多个网络之间的高阶关系,我们将 FGW 距离推广到多边际设定,从而实现多个网络的联合对齐。我们进一步提出了一种快速的邻近点方法,并证明其收敛到局部最优解。大量实验和分析表明,我们提出的 HOT 在有效性和可扩展性方面均显著优于当前最先进方法。

CUPre: Cross-domain Unsupervised Pre-training for Few-Shot Cell Segmentation

  • paper_url: http://arxiv.org/abs/2310.03981
  • repo_url: None
  • paper_authors: Weibin Liao, Xuhong Li, Qingzhong Wang, Yanwu Xu, Zhaozheng Yin, Haoyi Xiong
  • for: 这篇论文提出了一种跨领域无监督预训练方法(CUPre),将在通用视觉物体上学到的目标检测与实例分割能力迁移到细胞图像领域,从而在仅有少量标注细胞图像的情况下实现少样本细胞分割。
  • methods: 这篇论文使用的方法包括:1) 交替多任务预训练(AMT2);2) 动量对比学习(MoCo);3) 实例分割。
  • results: 实验结果显示,使用 CUPre 预训练后,在少样本细胞分割和检测任务中可以取得最高的平均精度(AP),优于现有的预训练方法。
    Abstract While pre-training on object detection tasks, such as Common Objects in Contexts (COCO) [1], could significantly boost the performance of cell segmentation, it still consumes massive fine-annotated cell images [2] with bounding boxes, masks, and cell types for every cell in every image, to fine-tune the pre-trained model. To lower the cost of annotation, this work considers the problem of pre-training DNN models for few-shot cell segmentation, where massive unlabeled cell images are available but only a small proportion is annotated. Hereby, we propose Cross-domain Unsupervised Pre-training, namely CUPre, transferring the capability of object detection and instance segmentation for common visual objects (learned from COCO) to the visual domain of cells using unlabeled images. Given a standard COCO pre-trained network with backbone, neck, and head modules, CUPre adopts an alternate multi-task pre-training (AMT2) procedure with two sub-tasks -- in every iteration of pre-training, AMT2 first trains the backbone with cell images from multiple cell datasets via unsupervised momentum contrastive learning (MoCo) [3], and then trains the whole model with vanilla COCO datasets via instance segmentation. After pre-training, CUPre fine-tunes the whole model on the cell segmentation task using a few annotated images. We carry out extensive experiments to evaluate CUPre using LIVECell [2] and BBBC038 [4] datasets in few-shot instance segmentation settings. The experiment shows that CUPre can outperform existing pre-training methods, achieving the highest average precision (AP) for few-shot cell segmentation and detection.
    摘要 而�PREtraining on object detection tasks, such as Common Objects in Contexts (COCO) [1], can significantly boost the performance of cell segmentation, but it still requires a large amount of fine-annotated cell images [2] with bounding boxes, masks, and cell types for every cell in every image to fine-tune the pre-trained model. To reduce the cost of annotation, this work considers the problem of pre-training deep neural network (DNN) models for few-shot cell segmentation, where massive unlabeled cell images are available but only a small proportion is annotated. Hereby, we propose Cross-domain Unsupervised Pre-training, namely CUPre, which transfers the capability of object detection and instance segmentation for common visual objects (learned from COCO) to the visual domain of cells using unlabeled images. Given a standard COCO pre-trained network with backbone, neck, and head modules, CUPre adopts an alternate multi-task pre-training (AMT2) procedure with two sub-tasks -- in every iteration of pre-training, AMT2 first trains the backbone with cell images from multiple cell datasets via unsupervised momentum contrastive learning (MoCo) [3], and then trains the whole model with vanilla COCO datasets via instance segmentation. After pre-training, CUPre fine-tunes the whole model on the cell segmentation task using a few annotated images. We conduct extensive experiments to evaluate CUPre using LIVECell [2] and BBBC038 [4] datasets in few-shot instance segmentation settings. The experiment shows that CUPre can outperform existing pre-training methods, achieving the highest average precision (AP) for few-shot cell segmentation and detection.

Perfect Alignment May be Poisonous to Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.03977
  • repo_url: None
  • paper_authors: Jingyu Liu, Huayi Tang, Yong Liu
  • for: 本文旨在研究图对比学习(GCL)中的数据增强如何影响下游任务的表现,以及对比学习发挥作用的内在机制。
  • methods: 本文从理论上建立了增强与下游性能之间的联系,并研究了对比学习的泛化能力,进而提出了两种简单而有效的方法来验证相关理论,这些方法可方便地应用于各种 GCL 算法。
  • results: 研究发现,GCL 主要是通过分离不同类别(而非聚集同类节点)来帮助下游任务;将正样本对拉得完全一致(完美对齐)虽有利于对比损失,却会损害泛化能力,而不完全的对齐反而能增强模型的泛化能力。
    Abstract Graph Contrastive Learning (GCL) aims to learn node representations by aligning positive pairs and separating negative ones. However, limited research has been conducted on the inner law behind specific augmentations used in graph-based learning. What kind of augmentation will help downstream performance, how does contrastive learning actually influence downstream tasks, and why the magnitude of augmentation matters? This paper seeks to address these questions by establishing a connection between augmentation and downstream performance, as well as by investigating the generalization of contrastive learning. Our findings reveal that GCL contributes to downstream tasks mainly by separating different classes rather than gathering nodes of the same class. So perfect alignment and augmentation overlap which draw all intra-class samples the same can not explain the success of contrastive learning. Then in order to comprehend how augmentation aids the contrastive learning process, we conduct further investigations into its generalization, finding that perfect alignment that draw positive pair the same could help contrastive loss but is poisonous to generalization, on the contrary, imperfect alignment enhances the model's generalization ability. We analyse the result by information theory and graph spectrum theory respectively, and propose two simple but effective methods to verify the theories. The two methods could be easily applied to various GCL algorithms and extensive experiments are conducted to prove its effectiveness.
    摘要 图对比学习(GCL)旨在通过拉近正样本对、分开负样本对来学习节点表示。然而,关于图学习中特定增强方式背后的内在规律,研究仍然有限:什么样的增强有助于下游性能?对比学习究竟如何影响下游任务?增强的幅度为何重要?本文通过建立增强与下游性能之间的联系,并研究对比学习的泛化能力来回答这些问题。我们发现,GCL 对下游任务的贡献主要来自分离不同类别,而不是聚集同类节点,因此“完美对齐与增强重叠”(把所有同类样本都拉成一致)并不能解释对比学习的成功。为进一步理解增强如何帮助对比学习,我们研究了其泛化能力,发现把正样本对拉得完全一致虽有助于对比损失,却不利于泛化;相反,不完全的对齐能增强模型的泛化能力。我们分别用信息论和图谱理论分析了这一结果,并提出了两种简单而有效的方法来验证这些理论。这两种方法可以方便地应用于各种 GCL 算法,大量实验证明了其有效性。

Sub-token ViT Embedding via Stochastic Resonance Transformers

  • paper_url: http://arxiv.org/abs/2310.03967
  • repo_url: None
  • paper_authors: Dong Lao, Yangchao Wu, Tian Yu Liu, Alex Wong, Stefano Soatto
  • for: The paper is written to address the issue of quantization artifacts in Vision Transformers (ViTs) and to propose a zero-shot method to improve the handling of spatial quantization in pre-trained ViTs.
  • methods: The proposed method, called Stochastic Resonance Transformer (SRT), ensembles the features obtained from perturbing input images via sub-token spatial translations, inspired by Stochastic Resonance. SRT can be applied at any layer, on any task, and does not require any fine-tuning.
  • results: The paper shows that SRT can effectively super-resolve features of pre-trained ViTs, capturing more of the local fine-grained structures that might otherwise be neglected as a result of tokenization. SRT outperforms the baseline models by an average of 4.7% and 14.9% on the RMSE and RMSE-log metrics across three different architectures for monocular depth prediction, and by an average of 2.4% in F&J score for semi-supervised video object segmentation. Additionally, SRT improves upon the base model by an average of 2.1% on the maxF metric for unsupervised salient region segmentation, and yields consistent improvements of up to 2.6% and 1.0% respectively for image retrieval and object discovery.
    Abstract We discover the presence of quantization artifacts in Vision Transformers (ViTs), which arise due to the image tokenization step inherent in these architectures. These artifacts result in coarsely quantized features, which negatively impact performance, especially on downstream dense prediction tasks. We present a zero-shot method to improve how pre-trained ViTs handle spatial quantization. In particular, we propose to ensemble the features obtained from perturbing input images via sub-token spatial translations, inspired by Stochastic Resonance, a method traditionally applied to climate dynamics and signal processing. We term our method ``Stochastic Resonance Transformer" (SRT), which we show can effectively super-resolve features of pre-trained ViTs, capturing more of the local fine-grained structures that might otherwise be neglected as a result of tokenization. SRT can be applied at any layer, on any task, and does not require any fine-tuning. The advantage of the former is evident when applied to monocular depth prediction, where we show that ensembling model outputs are detrimental while applying SRT on intermediate ViT features outperforms the baseline models by an average of 4.7% and 14.9% on the RMSE and RMSE-log metrics across three different architectures. When applied to semi-supervised video object segmentation, SRT also improves over the baseline models uniformly across all metrics, and by an average of 2.4% in F&J score. We further show that these quantization artifacts can be attenuated to some extent via self-distillation. On the unsupervised salient region segmentation, SRT improves upon the base model by an average of 2.1% on the maxF metric. Finally, despite operating purely on pixel-level features, SRT generalizes to non-dense prediction tasks such as image retrieval and object discovery, yielding consistent improvements of up to 2.6% and 1.0% respectively.
    摘要 我们发现视觉 Transformer(ViT)中由图像分词(tokenization)步骤引起的量化伪影会导致特征被粗糙量化,从而影响下游密集预测任务的性能。我们提出一种零样本方法——随机共振 Transformer(SRT):借鉴随机共振的思想,对输入图像施加亚分词(sub-token)空间平移扰动并将所得特征进行集成,从而对预训练 ViT 的特征进行超分辨,恢复被分词忽略的局部细粒度结构。SRT 可应用于任意层、任意任务,且无需微调。在单目深度估计中,对中间 ViT 特征应用 SRT 使三种架构的 RMSE 和 RMSE-log 平均分别提升 4.7% 和 14.9%;在半监督视频目标分割中,各项指标均有提升,F&J 分数平均提高 2.4%;在无监督显著区域分割中,maxF 平均提升 2.1%;在图像检索和目标发现上也分别获得最高 2.6% 和 1.0% 的一致提升。
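A rough sketch of SRT-style feature ensembling: shift the image by sub-token pixel offsets, extract dense features, upsample them back to pixel resolution, undo the shift, and average. The toy feature extractor, offsets, and averaging rule are assumptions; a real setup would use a frozen pre-trained ViT and the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def srt_ensemble(image, feature_fn, shifts=(0, 4, 8, 12)):
    """Ensemble dense features over sub-token spatial translations (illustrative sketch)."""
    _, H, W = image.shape
    feats = []
    for dy in shifts:
        for dx in shifts:
            shifted = torch.roll(image, shifts=(dy, dx), dims=(-2, -1))
            f = feature_fn(shifted.unsqueeze(0))                        # (1, C, h, w) coarse grid
            f = F.interpolate(f, size=(H, W), mode="bilinear", align_corners=False)
            f = torch.roll(f, shifts=(-dy, -dx), dims=(-2, -1))         # undo the translation
            feats.append(f)
    return torch.stack(feats).mean(dim=0).squeeze(0)                    # (C, H, W)

# toy stand-in for dense ViT features: average-pool each 16x16 patch of the image
def toy_vit_features(img):
    return F.avg_pool2d(img, kernel_size=16)

image = torch.rand(3, 224, 224)
dense = srt_ensemble(image, toy_vit_features)
print(dense.shape)  # torch.Size([3, 224, 224])
```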

Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03965
  • repo_url: None
  • paper_authors: Junchi Yu, Ran He, Rex Ying
  • for: 提高大型语言模型(LLM)的复杂推理能力
  • methods: 利用相似问题的解决方案和推理策略增强LLM的推理能力
  • results: 相比基线,TP 将最短路径推理任务中找到最优解的比例提高了 12%,创意写作任务中的人类偏好提高了 13%,LLM-Agent 规划任务的完成率提高了 15%。
    Abstract Large Language Models (LLMs) have achieved remarkable success in reasoning tasks with the development of prompting methods. However, existing prompting approaches cannot reuse insights of solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason \textit{from scratch}. To address these issues, we propose \textbf{\textit{Thought Propagation} (TP)}, which explores the analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. These analogous problems are related to the input one, with reusable solutions and problem-solving strategies. Thus, it is promising to propagate insights of solving previous analogous problems to inspire new problem-solving. To achieve this, TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one. Then, TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch. TP is compatible with existing prompting approaches, allowing plug-and-play generalization and enhancement in a wide range of tasks without much labor in task-specific prompt engineering. Experiments across three challenging tasks demonstrate TP enjoys a substantial improvement over the baselines by an average of 12\% absolute increase in finding the optimal solutions in Shortest-path Reasoning, 13\% improvement of human preference in Creative Writing, and 15\% enhancement in the task completion rate of LLM-Agent Planning.
    摘要 随着提示方法的发展,大型语言模型(LLM)在推理任务中取得了显著成功。然而,现有的提示方法让 LLM 从零开始推理,既无法复用求解相似问题的经验,也会在多步推理中累积误差。为了解决这些问题,我们提出了思维传播(Thought Propagation,TP):探索与输入问题相似的类比问题,并利用它们的解答来增强 LLM 的复杂推理能力。这些类比问题与输入问题相关,其解答和求解策略可以复用,因此把求解先前类比问题获得的经验传播过来,有望启发新的问题求解。具体而言,TP 首先提示 LLM 提出并求解一组与输入问题相关的类比问题,然后复用这些类比问题的结果,直接得到新的解答,或推导出一个知识密集的执行计划,用以修正从零开始得到的初始解答。TP 与现有提示方法兼容,可即插即用地泛化和增强各种任务,而无需大量任务特定的提示工程。在三个具有挑战性的任务上的实验表明,TP 相比基线取得了显著提升:在最短路径推理中找到最优解的比例平均绝对提升 12%,创意写作中的人类偏好提升 13%,LLM-Agent 规划任务的完成率提升 15%。
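A minimal prompt-chaining sketch of the Thought Propagation idea with a placeholder LLM; the prompt templates and the way analogous solutions are fed back are illustrative, not the authors' exact prompts.

```python
def thought_propagation(problem, llm, num_analogies=3):
    """TP-style loop: propose analogous problems, solve them, reuse their solutions."""
    initial = llm(f"Solve the following problem step by step:\n{problem}")
    analogies = llm(
        f"Propose {num_analogies} problems analogous to:\n{problem}\nReturn one problem per line."
    ).splitlines()
    analogy_solutions = [llm(f"Solve the following problem step by step:\n{a}")
                         for a in analogies if a.strip()]
    hints = "\n".join(f"- {a}: {s}" for a, s in zip(analogies, analogy_solutions))
    return llm(
        f"Problem:\n{problem}\n\nInitial solution:\n{initial}\n\n"
        f"Solutions to analogous problems:\n{hints}\n\n"
        "Using the analogous solutions, produce an improved solution or a plan "
        "to amend the initial one."
    )

# `fake_llm` is a placeholder; in practice this would call a real chat model.
def fake_llm(prompt):
    if "Propose" in prompt:
        return "shortest path on a smaller graph\nshortest path with unit weights"
    return "stub solution"

print(thought_propagation("Find the shortest path from A to F in the given graph.", fake_llm))
```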

A Learnable Counter-condition Analysis Framework for Functional Connectivity-based Neurological Disorder Diagnosis

  • paper_url: http://arxiv.org/abs/2310.03964
  • repo_url: https://github.com/es-kang/learnable-counter-condition-FC
  • paper_authors: Eunsong Kang, Da-woon Heo, Jiwon Lee, Heung-Il Suk
  • for: 这项研究旨在理解神经疾病在功能连接(FC)上的生物特征:利用深度学习模型识别疾病,并通过可解释分析找出疾病相关的生物标志物。
  • methods: 我们提出了一种新的一体化框架,将诊断(特征选择与特征提取)和解释系统地结合起来:设计了一种自适应注意力网络作为特征选择方法,以识别个体特异的疾病相关连接;并提出了一种功能网络关系编码器,在不预先定义功能网络之间边的情况下,通过学习网络间关系来概括功能连接的全局拓扑特性。
  • results: 我们的框架提供了更高的诊断精度和新的解释能力,并通过反条件分析(counter-condition analysis)刻画疾病相关的神经生物学特征。我们使用大规模的 REST-meta-MDD 和 ABIDE 数据集,证明了该框架在疾病识别方面优于其他竞争方法。
    Abstract To understand the biological characteristics of neurological disorders with functional connectivity (FC), recent studies have widely utilized deep learning-based models to identify the disease and conducted post-hoc analyses via explainable models to discover disease-related biomarkers. Most existing frameworks consist of three stages, namely, feature selection, feature extraction for classification, and analysis, where each stage is implemented separately. However, if the results at each stage lack reliability, it can cause misdiagnosis and incorrect analysis in afterward stages. In this study, we propose a novel unified framework that systemically integrates diagnoses (i.e., feature selection and feature extraction) and explanations. Notably, we devised an adaptive attention network as a feature selection approach to identify individual-specific disease-related connections. We also propose a functional network relational encoder that summarizes the global topological properties of FC by learning the inter-network relations without pre-defined edges between functional networks. Last but not least, our framework provides a novel explanatory power for neuroscientific interpretation, also termed counter-condition analysis. We simulated the FC that reverses the diagnostic information (i.e., counter-condition FC): converting a normal brain to be abnormal and vice versa. We validated the effectiveness of our framework by using two large resting-state functional magnetic resonance imaging (fMRI) datasets, Autism Brain Imaging Data Exchange (ABIDE) and REST-meta-MDD, and demonstrated that our framework outperforms other competing methods for disease identification. Furthermore, we analyzed the disease-related neurological patterns based on counter-condition analysis.
    摘要 为了理解神经疾病的生物特征,现有研究广泛采用深度学习基本模型来识别疾病,并通过可解释模型来找出疾病相关的生物标志物。现有的框架通常包括三个阶段,即特征选择、特征提取 для分类和分析,每个阶段都是独立实现的。然而,如果每个阶段的结果无法靠拢,可能会导致诊断和错误分析。在这种情况下,我们提出了一种新的一体化框架,系统地结合诊断和解释。特别是,我们设计了适应性注意力网络作为特征选择方法,以特定疾病相关的连接进行个体特定的诊断。此外,我们提出了功能网络关系编码器,以学习无定义的函数网络关系,从而捕捉全局的 topological 特征。最后,我们的框架还提供了一种新的解释力,即对神经科学的解释,也称为对 condition 分析。我们在FC中反转诊断信息(i.e., counter-condition FC),将正常脑变异常,并将异常脑变正常。我们使用了两个大的休息态功能磁共振成像(fMRI)数据集,ABIDE 和 REST-meta-MDD,并证明了我们的框架在疾病识别方面高效。此外,我们还分析了疾病相关的神经学特征。

Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations

  • paper_url: http://arxiv.org/abs/2310.03951
  • repo_url: https://github.com/microsoft/conli_hallucination
  • paper_authors: Deren Lei, Yaxi Li, Mengya Hu, Mingyu Wang, Vincent Yun, Emily Ching, Eslam Kamal
  • for: This paper studies how to detect and mitigate the ungrounded hallucinations that large language models (LLMs) produce in their generated text.
  • methods: It proposes a hierarchical framework that detects hallucinations with a Chain of Natural Language Inference (CoNLI) and reduces them through post-editing rewrites.
  • results: The framework achieves strong hallucination detection and improves text quality through rewriting, without fine-tuning or domain-specific prompt engineering.
    Abstract Large language models (LLMs) can generate fluent natural language texts when given relevant documents as background context. This ability has attracted considerable interest in developing industry applications of LLMs. However, LLMs are prone to generate hallucinations that are not supported by the provided sources. In this paper, we propose a hierarchical framework to detect and mitigate such ungrounded hallucination. Our framework uses Chain of Natural Language Inference (CoNLI) for hallucination detection and hallucination reduction via post-editing. Our approach achieves state-of-the-art performance on hallucination detection and enhances text quality through rewrite, using LLMs without any fine-tuning or domain-specific prompt engineering. We show that this simple plug-and-play framework can serve as an effective choice for hallucination detection and reduction, achieving competitive performance across various contexts.
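
A rough sketch of the detect-then-rewrite idea, assuming a generic NLI scorer (`nli_entailment`) and a rewrite function; the paper's actual CoNLI chain and thresholds are more elaborate than this.

```python
# Illustrative sentence-level detect-then-rewrite loop in the spirit of CoNLI.
from typing import List

def nli_entailment(premise: str, hypothesis: str) -> float:
    raise NotImplementedError("e.g., an NLI cross-encoder returning P(entail)")

def detect_ungrounded(sentences: List[str], sources: List[str],
                      threshold: float = 0.5) -> List[str]:
    # A sentence is flagged as ungrounded if no source passage entails it.
    return [s for s in sentences
            if max(nli_entailment(src, s) for src in sources) < threshold]

def mitigate(sentences: List[str], sources: List[str], rewrite) -> List[str]:
    flagged = set(detect_ungrounded(sentences, sources))
    # Post-edit only the flagged sentences so grounded text is left untouched.
    return [rewrite(s, sources) if s in flagged else s for s in sentences]
```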

cs.CL - 2023-10-06

From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context

  • paper_url: http://arxiv.org/abs/2310.04592
  • repo_url: None
  • paper_authors: Jeremiah Milbauer, Ziqi Ding, Zhijin Wu, Tongshuang Wu
  • for: Helps users make better sense of news stories while avoiding confusion and misinformation arising from heterogeneous information sources.
  • methods: Uses reference-free fact verification to integrate multiple news articles on a central topic, providing inline highlights of supporting and contradicting information within the article being read.
  • results: In a pilot study, NEWSSENSE helped users identify key information, verify the credibility of news articles, and explore different perspectives.
    Abstract Reading and understanding the stories in the news is increasingly difficult. Reporting on stories evolves rapidly, politicized news venues offer different perspectives (and sometimes different facts), and misinformation is rampant. However, existing solutions merely aggregate an overwhelming amount of information from heterogenous sources, such as different news outlets, social media, and news bias rating agencies. We present NEWSSENSE, a novel sensemaking tool and reading interface designed to collect and integrate information from multiple news articles on a central topic, using a form of reference-free fact verification. NEWSSENSE augments a central, grounding article of the user's choice by linking it to related articles from different sources, providing inline highlights on how specific claims in the chosen article are either supported or contradicted by information from other articles. Using NEWSSENSE, users can seamlessly digest and cross-check multiple information sources without disturbing their natural reading flow. Our pilot study shows that NEWSSENSE has the potential to help users identify key information, verify the credibility of news articles, and explore different perspectives.

Measuring Information in Text Explanations

  • paper_url: http://arxiv.org/abs/2310.04557
  • repo_url: https://github.com/klonnet23/helloy-word
  • paper_authors: Zining Zhu, Frank Rudzicz
  • for: This paper examines how to evaluate text explanations, proposing a common standard for two popular methods: rationales and natural language explanations (NLEs).
  • methods: It places text explanations in an information-theoretic framework, treating the post-hoc pipeline as communication channels and quantifying the information flow with two scores: relevance and informativeness.
  • results: The information-theoretic scores help assess explanation quality and reveal the underlying mechanisms of different explanation methods; for example, NLEs trade off between transmitting input-related and target-related information, whereas rationales do not exhibit such a trade-off.
    Abstract Text-based explanation is a particularly promising approach in explainable AI, but the evaluation of text explanations is method-dependent. We argue that placing the explanations on an information-theoretic framework could unify the evaluations of two popular text explanation methods: rationale and natural language explanations (NLE). This framework considers the post-hoc text pipeline as a series of communication channels, which we refer to as ``explanation channels''. We quantify the information flow through these channels, thereby facilitating the assessment of explanation characteristics. We set up tools for quantifying two information scores: relevance and informativeness. We illustrate what our proposed information scores measure by comparing them against some traditional evaluation metrics. Our information-theoretic scores reveal some unique observations about the underlying mechanisms of two representative text explanations. For example, the NLEs trade-off slightly between transmitting the input-related information and the target-related information, whereas the rationales do not exhibit such a trade-off mechanism. Our work contributes to the ongoing efforts in establishing rigorous and standardized evaluation criteria in the rapidly evolving field of explainable AI.

Module-wise Adaptive Distillation for Multimodality Foundation Models

  • paper_url: http://arxiv.org/abs/2310.04550
  • repo_url: None
  • paper_authors: Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou
  • for: This work aims to make pre-trained multimodal foundation models more practical to deploy by distilling large teacher models into small student models.
  • methods: It uses layerwise distillation and tracks the contribution of each module (the loss decrement after distilling it), choosing which module to distill via a modified Thompson sampling algorithm, OPTIMA, that treats modules as arms of a multi-armed bandit.
  • results: Distillation experiments on multimodal understanding and image captioning tasks, with CoCa-Large as the teacher, show that OPTIMA tunes the student models more effectively.
    Abstract Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distillation each module and choose the module with a greater contribution to distill more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified-Thompson sampling algorithm named OPTIMA to address the nonstationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model (Yu et al., 2022) as the teacher model.
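
A simplified sketch of module selection as a bandit problem, assuming Gaussian Thompson sampling over a sliding window of recent loss decrements; the paper's OPTIMA algorithm differs in its exact handling of nonstationarity.

```python
# Modules are bandit arms; recent loss decrements are rewards.
import numpy as np
from collections import deque

class ModuleBandit:
    def __init__(self, n_modules: int, window: int = 50):
        # Sliding windows cope with nonstationary module contributions.
        self.history = [deque(maxlen=window) for _ in range(n_modules)]

    def select(self) -> int:
        samples = []
        for h in self.history:
            if len(h) < 2:                 # force initial exploration
                return len(samples)
            mu, sigma = np.mean(h), np.std(h) + 1e-6
            samples.append(np.random.normal(mu, sigma))
        return int(np.argmax(samples))

    def update(self, module: int, loss_decrement: float):
        self.history[module].append(loss_decrement)

# usage: pick a module, distill it for one step, record the loss drop
# bandit = ModuleBandit(n_modules=4)
# m = bandit.select(); bandit.update(m, loss_before - loss_after)
```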

Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology

  • paper_url: http://arxiv.org/abs/2310.04529
  • repo_url: https://github.com/USArmyResearchLab/ARL-Creative-Visual-Storytelling
  • paper_authors: Brett A. Halperin, Stephanie M. Lukin
  • for: This paper collects an anthology of 100 visual stories written from image sequences through a systematic process of improvised story-building, and characterizes the variations that arise in creative visual storytelling.
  • methods: Close reading and thematic analysis of the anthology.
  • results: Five themes are identified (narrating what is in vision vs. envisioning; dynamically characterizing entities/objects; sensing experiential information about the scenery; modulating the mood; encoding narrative biases), and narrative intelligence criteria for computational visual storytelling are proposed: creative, reliable, expressive, grounded, and responsible.
    Abstract In this paper, we collect an anthology of 100 visual stories from authors who participated in our systematic creative process of improvised story-building based on image sequences. Following close reading and thematic analysis of our anthology, we present five themes that characterize the variations found in this creative visual storytelling process: (1) Narrating What is in Vision vs. Envisioning; (2) Dynamically Characterizing Entities/Objects; (3) Sensing Experiential Information About the Scenery; (4) Modulating the Mood; (5) Encoding Narrative Biases. In understanding the varied ways that people derive stories from images, we offer considerations for collecting story-driven training data to inform automatic story generation. In correspondence with each theme, we envision narrative intelligence criteria for computational visual storytelling as: creative, reliable, expressive, grounded, and responsible. From these criteria, we discuss how to foreground creative expression, account for biases, and operate in the bounds of visual storyworlds.

RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation

  • paper_url: http://arxiv.org/abs/2310.04408
  • repo_url: https://github.com/carriex/recomp
  • paper_authors: Fangyuan Xu, Weijia Shi, Eunsol Choi
  • for: Improving language model (LM) performance on tasks such as language modeling and open-domain question answering.
  • methods: Retrieved documents are compressed into short textual summaries before in-context integration, using either an extractive compressor that selects useful sentences or an abstractive compressor that synthesizes information across documents.
  • results: Achieves compression rates as low as 6% with minimal loss in performance on both tasks, and the trained compressors transfer across different LMs.
    Abstract Retrieving documents and prepending them in-context at inference time improves performance of language model (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more expensive. We propose compressing the retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational costs but also relieves the burden of LMs to identify relevant information in long retrieved documents. We present two compressors -- an extractive compressor which selects useful sentences from retrieved documents and an abstractive compressor which generates summaries by synthesizing information from multiple documents. Both compressors are trained to improve LMs' performance on end tasks when the generated summaries are prepended to the LMs' input, while keeping the summary concise. If the retrieved documents are irrelevant to the input or offer no additional information to LM, our compressor can return an empty string, implementing selective augmentation. We evaluate our approach on language modeling task and open domain question answering task. We achieve a compression rate of as low as 6% with minimal loss in performance for both tasks, significantly outperforming the off-the-shelf summarization models. We show that our compressors trained for one LM can transfer to other LMs on the language modeling task and provide summaries largely faithful to the retrieved documents.
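
A minimal sketch of the retrieve-compress-prepend pipeline; `retrieve`, `compress`, and `generate` are placeholders for the retriever, a trained (extractive or abstractive) compressor, and the downstream LM.

```python
# Retrieve documents, compress them to a short summary, prepend to the LM input.
from typing import List

def recomp_answer(question: str, retrieve, compress, generate) -> str:
    docs: List[str] = retrieve(question)       # top-k retrieved passages
    summary: str = compress(question, docs)    # may be "" (selective augmentation)
    prompt = question if summary == "" else f"{summary}\n\n{question}"
    return generate(prompt)                    # LM sees a short summary, not full docs
```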

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

  • paper_url: http://arxiv.org/abs/2310.04399
  • repo_url: None
  • paper_authors: Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li
  • for: This work addresses output stability in simultaneous speech-to-text translation, where flickering partial results hurt real-time communication.
  • methods: It proposes a revision-controllable decoding method that adds an allowed revision window to beam-search pruning, screening out candidate translations likely to require extensive revisions.
  • results: Experiments show the method significantly improves decoding stability, and can even eliminate flickering, without substantially compromising translation quality.
    Abstract Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in the flickering of partial results. In this paper, we propose a novel revision-controllable method designed to address this issue. Our method introduces an allowed revision window within the beam search pruning process to screen out candidate translations likely to cause extensive revisions, leading to a substantial reduction in flickering and, crucially, providing the capability to completely eliminate flickering. The experiments demonstrate the proposed method can significantly improve the decoding stability without compromising substantially on the translation quality.
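
One simplified reading of the allowed revision window is sketched below: during pruning, a beam hypothesis is dropped if committing to it would revise more than a fixed number of already-displayed tokens. Variable names and the exact pruning rule are assumptions.

```python
# Keep only beam candidates whose prefix stays close to what was already shown.
from typing import List

def revisions_needed(displayed: List[str], candidate: List[str]) -> int:
    # Number of already-displayed tokens the candidate would change.
    common = 0
    for a, b in zip(displayed, candidate):
        if a != b:
            break
        common += 1
    return len(displayed) - common

def prune_beam(beam: List[List[str]], displayed: List[str],
               max_revision: int = 2) -> List[List[str]]:
    kept = [hyp for hyp in beam
            if revisions_needed(displayed, hyp) <= max_revision]
    return kept or beam   # never empty the beam entirely
```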

Amortizing intractable inference in large language models

  • paper_url: http://arxiv.org/abs/2310.04363
  • repo_url: None
  • paper_authors: Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
  • for: This paper aims to query the knowledge compressed in autoregressive large language models (LLMs) beyond start-to-end sampling, enabling data-efficient adaptation to tasks that require sampling from intractable posterior distributions.
  • methods: It fine-tunes LLMs with generative flow networks (GFlowNets), a diversity-seeking reinforcement learning algorithm, to perform amortized Bayesian inference over these intractable posteriors.
  • results: This distribution-matching fine-tuning serves as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization, and enables data-efficient adaptation to tasks that require multi-step rationalization and tool use.
    Abstract Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.

Transferring speech-generic and depression-specific knowledge for Alzheimer’s disease detection

  • paper_url: http://arxiv.org/abs/2310.04358
  • repo_url: None
  • paper_authors: Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang
  • for: This work aims to improve the accuracy of Alzheimer's disease (AD) detection from spontaneous speech by using knowledge transfer to address the scarcity of in-domain training data.
  • methods: It transfers speech-generic knowledge from foundation models pre-trained on large speech and text corpora, analyzed block-wise across intermediate representations, and simultaneously transfers knowledge from a speech depression detection task, motivated by the high comorbidity of depression and AD, via a parallel knowledge transfer framework.
  • results: The proposed method improves both AD and depression detection and achieves a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.
    Abstract The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention while the sparsity of training data remains an important issue. This paper handles the issue by knowledge transfer, specifically from both speech-generic and depression-specific knowledge. The paper first studies sequential knowledge transfer from generic foundation models pretrained on large amounts of speech and text data. A block-wise analysis is performed for AD diagnosis based on the representations extracted from different intermediate blocks of different foundation models. Apart from the knowledge from speech-generic representations, this paper also proposes to simultaneously transfer the knowledge from a speech depression detection task based on the high comorbidity rates of depression and AD. A parallel knowledge transfer framework is studied that jointly learns the information shared between these two tasks. Experimental results show that the proposed method improves AD and depression detection, and produces a state-of-the-art F1 score of 0.928 for AD diagnosis on the commonly used ADReSSo dataset.

Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services

  • paper_url: http://arxiv.org/abs/2310.04313
  • repo_url: https://github.com/dasol-choi/komultitext
  • paper_authors: Dasol Choi, Jooyoung Song, Eunsun Lee, Jinwoo Seo, Heejune Park, Dongbin Na
  • for: This paper introduces a new large-scale Korean dataset and models for detecting hate speech and biased language on South Korean social media platforms.
  • methods: BERT-based language models are trained with multi-task learning over annotations covering preferences, profanities, and nine types of bias.
  • results: The approach surpasses human-level accuracy across diverse classification tasks and detects multiple types of biased language simultaneously, offering practical tools for hate speech and bias mitigation that can improve the health of online communities.
    Abstract With the growth of online services, the need for advanced text classification algorithms, such as sentiment analysis and biased text detection, has become increasingly evident. The anonymous nature of online services often leads to the presence of biased and harmful language, posing challenges to maintaining the health of online communities. This phenomenon is especially relevant in South Korea, where large-scale hate speech detection algorithms have not yet been broadly explored. In this paper, we introduce a new comprehensive, large-scale dataset collected from a well-known South Korean SNS platform. Our proposed dataset provides annotations including (1) Preferences, (2) Profanities, and (3) Nine types of Bias for the text samples, enabling multi-task learning for simultaneous classification of user-generated texts. Leveraging state-of-the-art BERT-based language models, our approach surpasses human-level accuracy across diverse classification tasks, as measured by various metrics. Beyond academic contributions, our work can provide practical solutions for real-world hate speech and bias mitigation, contributing directly to the improvement of online community health. Our work provides a robust foundation for future research aiming to improve the quality of online discourse and foster societal well-being. All source codes and datasets are publicly accessible at https://github.com/Dasol-Choi/KoMultiText.

Written and spoken corpus of real and fake social media postings about COVID-19

  • paper_url: http://arxiv.org/abs/2310.04237
  • repo_url: None
  • paper_authors: Ng Bee Chin, Ng Zhi Ee Nicole, Kyla Kwan, Lee Yong Han Dylann, Liu Fang, Xu Hong
  • for: This study aims to investigate the linguistic traits of fake news and real news in both written and speech data.
  • methods: The study uses a dataset of COVID-19 related tweets and TikTok videos, which are fact-checked and labeled as ‘Real’, ‘Fake’, or ‘Questionable’. The Linguistic Inquiry and Word Count (LIWC) software is used to detect patterns in linguistic data.
  • results: The study finds a set of linguistic features that distinguish fake news from real news in both written and speech data, offering valuable insights into the role of language in shaping trust, social media interactions, and the propagation of fake news.
    Abstract This study investigates the linguistic traits of fake news and real news. There are two parts to this study: text data and speech data. The text data for this study consisted of 6420 COVID-19 related tweets re-filtered from Patwa et al. (2021). After cleaning, the dataset contained 3049 tweets, with 2161 labeled as 'real' and 888 as 'fake'. The speech data for this study was collected from TikTok, focusing on COVID-19 related videos. Research assistants fact-checked each video's content using credible sources and labeled them as 'Real', 'Fake', or 'Questionable', resulting in a dataset of 91 real entries and 109 fake entries from 200 TikTok videos with a total word count of 53,710 words. The data was analysed using the Linguistic Inquiry and Word Count (LIWC) software to detect patterns in linguistic data. The results indicate a set of linguistic features that distinguish fake news from real news in both written and speech data. This offers valuable insights into the role of language in shaping trust, social media interactions, and the propagation of fake news.

mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis

  • paper_url: http://arxiv.org/abs/2310.04196
  • repo_url: None
  • paper_authors: Alexander Brauckmann, Elizabeth Polgreen, Tobias Grosser, Michael F. P. O’Boyle
  • for: This work aims to let existing programs benefit from MLIR's high-performance compilation by automatically raising lower-level MLIR dialects to higher-level ones, without manually defined raising rules.
  • methods: mlirSynth uses the available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences.
  • results: On Polybench, mlirSynth achieves greater coverage than previous approaches, with geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art C compilation flows, and retargeting to a domain-specific accelerator yields a geomean speedup of 21.6x on a TPU.
    Abstract MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR's high-performance compilation if they are described in lower-level general purpose languages. Consequently, to avoid programs needing to be rewritten manually, this has led to efforts to automatically raise lower-level to higher-level dialects in MLIR. However, current methods rely on manually-defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth -- a novel approach which translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect specific compilation flows. On Polybench, we show a greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows for the C programming language. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU.

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

  • paper_url: http://arxiv.org/abs/2310.04064
  • repo_url: None
  • paper_authors: Josh Alman, Zhao Song
  • for: This paper studies a generalization of attention that captures triple-wise correlations, which can solve detection problems that are provably impossible for standard transformers.
  • methods: It analyzes a tensor-based generalization of attention and characterizes when the resulting triple-wise computation can be performed quickly.
  • results: If all entries of the input matrices are bounded above by $o(\sqrt[3]{\log n})$, the "tensor-type" attention matrix can be approximated in $n^{1+o(1)}$ time; if entries may be as large as $\Omega(\sqrt[3]{\log n})$, no algorithm faster than $n^{3-o(1)}$ exists (assuming the Strong Exponential Time Hypothesis).
    Abstract In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}( \exp(QK^\top) {\bf 1}_n )$. In this work, we study a generalization of attention which captures triple-wise correlations. This generalization is able to solve problems about detecting triple-wise connections that were shown to be impossible for transformers. The potential downside of this generalization is that it appears as though computations are even more difficult, since the straightforward algorithm requires cubic time in $n$. However, we show that in the bounded-entry setting (which arises in practice, and which is well-studied in both theory and practice), there is actually a near-linear time algorithm. More precisely, we show that bounded entries are both necessary and sufficient for quickly performing generalized computations: $\bullet$ On the positive side, if all entries of the input matrices are bounded above by $o(\sqrt[3]{\log n})$ then we show how to approximate the ``tensor-type'' attention matrix in $n^{1+o(1)}$ time. $\bullet$ On the negative side, we show that if the entries of the input matrices may be as large as $\Omega(\sqrt[3]{\log n})$, then there is no algorithm that runs faster than $n^{3-o(1)}$ (assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory). We also show that our construction, algorithms, and lower bounds naturally generalize to higher-order tensors and correlations. Interestingly, the higher the order of the tensors, the lower the bound on the entries needs to be for an efficient algorithm. Our results thus yield a natural tradeoff between the boundedness of the entries, and order of the tensor one may use for more expressive, efficient attention computation.
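
The classical attention computation stated in the abstract can be written directly; below is a small numpy sketch of the pairwise case (the paper's subject is the harder tensor generalization and its fine-grained complexity).

```python
# Classical attention exactly as written in the abstract:
#   D^{-1} exp(Q K^T) V,  with  D = diag(exp(Q K^T) 1_n).
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    A = np.exp(Q @ K.T)                           # n x n, entrywise exp
    D_inv = 1.0 / A.sum(axis=1, keepdims=True)    # diag(exp(QK^T) 1_n)^{-1}
    return D_inv * A @ V                          # n x d

n, d = 8, 4
Q, K, V = (np.random.randn(n, d) * 0.1 for _ in range(3))  # small (bounded) entries
print(attention(Q, K, V).shape)                   # (8, 4)
```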

Analysis of the Reasoning with Redundant Information Provided Ability of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04039
  • repo_url: None
  • paper_authors: Wenbei Xie
  • for: This study evaluates the reasoning ability of large language models (LLMs) in scenarios that contain redundant information.
  • methods: It introduces a new question-answering task, Reasoning with Redundant Information Provided (RRIP), built from a modified grade school math 8K (GSM-8K) dataset with several variants targeting different attributes of redundant information, and evaluates LlaMA2-13B-chat and GPT-3.5.
  • results: Both models achieve moderate success on standard QA benchmarks but their performance notably declines on RRIP tasks, highlighting current limitations and suggesting that future training data should incorporate redundant information.
    Abstract Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks, especially in reasoning, a cornerstone for achieving Artificial General Intelligence (AGI). However, commonly used benchmarks may not fully encapsulate the inferential abilities of these models in real-world scenarios. To address this gap, a new form of Question-Answering (QA) task, termed Reasoning with Redundant Information Provided (RRIP), is introduced. The study designed a modified version of the grade school math 8K (GSM-8K) dataset which has several variants focusing on different attributes of redundant information. This investigation evaluates two popular LLMs, LlaMA2-13B-chat and generative pre-trained transformer 3.5 (GPT-3.5), contrasting their performance on traditional QA tasks against the RRIP tasks. Findings indicate that while these models achieved moderate success on standard QA benchmarks, their performance notably declines when assessed on RRIP tasks. The study not only highlights the limitations of current LLMs in handling redundant information but also suggests that future training of these models should focus on incorporating redundant information into the training data to increase the performance on RRIP tasks.

Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04027
  • repo_url: https://github.com/AI4Finance-Foundation/FinGPT/tree/master/fingpt/FinGPT-RAG
  • paper_authors: Boyu Zhang, Hongyang Yang, Tianyu Zhou, Ali Babar, Xiao-Yang Liu
  • for: This paper aims to improve the accuracy and effectiveness of financial sentiment analysis.
  • methods: It combines instruction-tuned Large Language Models (LLMs) with a retrieval-augmentation module that supplies additional context from reliable external sources, addressing the limitations of traditional NLP models.
  • results: Benchmarked against traditional models and other LLMs such as ChatGPT and LLaMA, the approach achieves 15% to 48% gains in accuracy and F1 score.
    Abstract Financial sentiment analysis is critical for valuation and investment decision-making. Traditional NLP models, however, are limited by their parameter size and the scope of their training datasets, which hampers their generalization capabilities and effectiveness in this field. Recently, Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks due to their commendable zero-shot abilities. Yet, directly applying LLMs to financial sentiment analysis presents challenges: The discrepancy between the pre-training objective of LLMs and predicting the sentiment label can compromise their predictive performance. Furthermore, the succinct nature of financial news, often devoid of sufficient context, can significantly diminish the reliability of LLMs' sentiment analysis. To address these challenges, we introduce a retrieval-augmented LLMs framework for financial sentiment analysis. This framework includes an instruction-tuned LLMs module, which ensures LLMs behave as predictors of sentiment labels, and a retrieval-augmentation module which retrieves additional context from reliable external sources. Benchmarked against traditional models and LLMs like ChatGPT and LLaMA, our approach achieves 15\% to 48\% performance gain in accuracy and F1 score.

SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

  • paper_url: http://arxiv.org/abs/2310.03991
  • repo_url: None
  • paper_authors: Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov
  • for: This work aims to make text watermarking robust against paraphrase attacks.
  • methods: It proposes SemStamp, a sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH) that partitions the semantic space of sentences and applies sentence-level rejection sampling, with a margin-based constraint to enhance robustness.
  • results: Compared with token-level watermarking, SemStamp is more robust against both common and bigram paraphrase attacks while better preserving generation quality.
    Abstract Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence-level rejection sampling until the sampled sentence falls in watermarked partitions in the semantic embedding space. A margin-based constraint is used to enhance its robustness. To show the advantages of our algorithm, we propose a "bigram" paraphrase attack using the paraphrase that has the fewest bigram overlaps with the original sentence. This attack is shown to be effective against the existing token-level watermarking method. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation.
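
A sketch of the LSH-plus-rejection-sampling idea, assuming random-hyperplane LSH over sentence embeddings and a keyed subset of buckets as the watermarked region; `embed` and `sample_sentence` are placeholders, and the paper's margin-based constraint is omitted.

```python
# Sentence-level watermarking via LSH rejection sampling (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
HYPERPLANES = rng.standard_normal((4, 384))      # 4 bits -> 16 semantic buckets

def lsh_bucket(vec: np.ndarray) -> int:
    bits = (HYPERPLANES @ vec > 0).astype(int)
    return int("".join(map(str, bits)), 2)

WATERMARKED = {b for b in range(16) if b % 2 == 0}   # keyed subset of buckets

def generate_watermarked_sentence(context, embed, sample_sentence,
                                  max_tries: int = 32) -> str:
    for _ in range(max_tries):
        cand = sample_sentence(context)
        if lsh_bucket(embed(cand)) in WATERMARKED:
            return cand                  # candidate falls in a watermarked partition
    return cand                          # give up after max_tries
```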

Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder

  • paper_url: http://arxiv.org/abs/2310.03985
  • repo_url: https://github.com/jason7580/End-to-End-ASR-and-Dementia-detection-system
  • paper_authors: Zih-Jyun Lin, Yi-Ju Chen, Po-Chih Kuo, Likai Huang, Chaur-Jong Hu, Cheng-Yu Chen
  • for: This paper proposes a dementia assessment system for Mandarin speakers based on a speech recognition model, to support early detection during the picture description task.
  • methods: An attention-based speech recognition model is trained on voice data resembling real-world scenarios, and its encoder is extracted and extended with a linear layer for dementia assessment.
  • results: The system achieves 92.04% accuracy in Alzheimer's disease detection and a mean absolute error of 9% in clinical dementia rating score prediction.
    Abstract Dementia diagnosis requires a series of different testing methods, which is complex and time-consuming. Early detection of dementia is crucial as it can prevent further deterioration of the condition. This paper utilizes a speech recognition model to construct a dementia assessment system tailored for Mandarin speakers during the picture description task. By training an attention-based speech recognition model on voice data closely resembling real-world scenarios, we have significantly enhanced the model's recognition capabilities. Subsequently, we extracted the encoder from the speech recognition model and added a linear layer for dementia assessment. We collected Mandarin speech data from 99 subjects and acquired their clinical assessments from a local hospital. We achieved an accuracy of 92.04% in Alzheimer's disease detection and a mean absolute error of 9% in clinical dementia rating score prediction.
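
A minimal PyTorch sketch of the assessment head described above: reuse the ASR encoder and add a linear layer on pooled utterance representations. The mean-pooling step and the dimensions are assumptions.

```python
# ASR encoder reused as a feature extractor, plus a linear classification head.
import torch
import torch.nn as nn

class DementiaClassifier(nn.Module):
    def __init__(self, asr_encoder: nn.Module, hidden_dim: int = 256,
                 num_classes: int = 2):
        super().__init__()
        self.encoder = asr_encoder            # attention-based ASR encoder
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, speech_features: torch.Tensor) -> torch.Tensor:
        # speech_features: (batch, time, feat) -> encoder states (batch, time, hidden)
        states = self.encoder(speech_features)
        pooled = states.mean(dim=1)           # utterance-level representation
        return self.head(pooled)              # logits for AD vs. control
```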

HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

  • paper_url: http://arxiv.org/abs/2310.03975
  • repo_url: None
  • paper_authors: Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe
  • for: This work enhances the semantic representation of HuBERT so that it better captures global, utterance-level aspects of speech content.
  • methods: A topic model is applied to HuBERT's pseudo-labels to generate a topic label for each utterance, and an auxiliary topic classification task using these labels as teachers is added, injecting global semantic information in an unsupervised manner.
  • results: The method achieves comparable or better performance than the baseline on most tasks, including automatic speech recognition and five of the eight SUPERB tasks, and the topic labels are found to capture information such as gender, speaker, and theme.
    Abstract Recently, the usefulness of self-supervised representation learning (SSRL) methods has been confirmed in various downstream tasks. Many of these models, as exemplified by HuBERT and WavLM, use pseudo-labels generated from spectral features or the model's own representation features. From previous studies, it is known that the pseudo-labels contain semantic information. However, the masked prediction task, the learning criterion of HuBERT, focuses on local contextual information and may not make effective use of global semantic information such as speaker, theme of speech, and so on. In this paper, we propose a new approach to enrich the semantic representation of HuBERT. We apply topic model to pseudo-labels to generate a topic label for each utterance. An auxiliary topic classification task is added to HuBERT by using topic labels as teachers. This allows additional global semantic information to be incorporated in an unsupervised manner. Experimental results demonstrate that our method achieves comparable or better performance than the baseline in most tasks, including automatic speech recognition and five out of the eight SUPERB tasks. Moreover, we find that topic labels include various information about utterance, such as gender, speaker, and its theme. This highlights the effectiveness of our approach in capturing multifaceted semantic nuances.

Quantized Transformer Language Model Implementations on Edge Devices

  • paper_url: http://arxiv.org/abs/2310.03971
  • repo_url: None
  • paper_authors: Mohammad Wali Ur Rahman, Murad Mehrab Abrar, Hunter Gibbons Copening, Salim Hariri, Sicong Shao, Pratik Satam, Soheil Salehi
  • for: This study converts large transformer-based models such as BERT into an optimized FlatBuffer format so they can be deployed and run efficiently on resource-constrained edge devices.
  • methods: A smaller MobileBERT model is converted to the FlatBuffer format and further quantized for edge hardware, then evaluated on reputation analysis of English tweets from the RepLab 2013 dataset on three different edge devices.
  • results: Compared with the original BERT-large model, the converted and quantized MobileBERT models have 160x smaller footprints for a 4.1% drop in accuracy while analyzing at least one tweet per second on edge devices, and all data is processed locally in a serverless environment, preserving privacy.
    Abstract Large-scale transformer-based models like the Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications, wherein these models are initially pre-trained with a large corpus with millions of parameters and then fine-tuned for a downstream NLP task. One of the major limitations of these large-scale models is that they cannot be deployed on resource-constrained devices due to their large model size and increased inference latency. In order to overcome these limitations, such large-scale models can be converted to an optimized FlatBuffer format, tailored for deployment on resource-constrained edge devices. Herein, we evaluate the performance of such FlatBuffer transformed MobileBERT models on three different edge devices, fine-tuned for Reputation analysis of English language tweets in the RepLab 2013 dataset. In addition, this study encompassed an evaluation of the deployed models, wherein their latency, performance, and resource efficiency were meticulously assessed. Our experiment results show that, compared to the original BERT large model, the converted and quantized MobileBERT models have 160$\times$ smaller footprints for a 4.1% drop in accuracy while analyzing at least one tweet per second on edge devices. Furthermore, our study highlights the privacy-preserving aspect of TinyML systems as all data is processed locally within a serverless environment.
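
For context, one common way to produce a quantized FlatBuffer model of this kind is TensorFlow Lite's converter with dynamic-range quantization; the snippet below is illustrative, the path is a placeholder, and the paper's exact conversion settings may differ.

```python
# Convert a fine-tuned MobileBERT SavedModel to a quantized TFLite FlatBuffer.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("mobilebert_savedmodel/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # dynamic-range quantization
tflite_model = converter.convert()

with open("mobilebert_quantized.tflite", "wb") as f:
    f.write(tflite_model)                               # FlatBuffer for edge deployment
```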

cs.LG - 2023-10-06

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

  • paper_url: http://arxiv.org/abs/2310.04627
  • repo_url: None
  • paper_authors: Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim
  • for: This paper studies the trade-off between personalization and robustness in federated learning, in computation-limited settings that require parameter-efficient fine-tuning (PEFT).
  • methods: It benchmarks FedAvg and FedSGD plus personalization (via client-local fine-tuning) applied to prompt tuning of large language models (LLMs), one of the most ubiquitous PEFT approaches, across many hyperparameter settings and varying levels of data heterogeneity.
  • results: Federated-trained prompts are surprisingly robust when personalization uses a small learning rate with many local epochs, especially with an adaptive client optimizer during federated training; simple approaches such as adding regularization and interpolating two prompts further improve the personalization vs. robustness trade-off.
    Abstract In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge. However, the presence of data heterogeneity across clients induces a fundamental trade-off between personalization (i.e., adaptation to a local distribution) and robustness (i.e., not forgetting previously learned general knowledge). It is critical to understand how to navigate this personalization vs robustness trade-off when designing federated systems, which are increasingly moving towards a paradigm of fine-tuning large foundation models. Due to limited computational and communication capabilities in most federated settings, this foundation model fine-tuning must be done using parameter-efficient fine-tuning (PEFT) approaches. While some recent work has studied federated approaches to PEFT, the personalization vs robustness trade-off of federated PEFT has been largely unexplored. In this work, we take a step towards bridging this gap by benchmarking fundamental FL algorithms -- FedAvg and FedSGD plus personalization (via client local fine-tuning) -- applied to one of the most ubiquitous PEFT approaches to large language models (LLMs) -- prompt tuning -- in a multitude of hyperparameter settings under varying levels of data heterogeneity. Our results show that federated-trained prompts can be surprisingly robust when using a small learning rate with many local epochs for personalization, especially when using an adaptive optimizer as the client optimizer during federated training. We also demonstrate that simple approaches such as adding regularization and interpolating two prompts are effective in improving the personalization vs robustness trade-off in computation-limited settings with few local updates allowed for personalization.
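
A sketch of the prompt-interpolation idea mentioned in the abstract: blend the federated (global) soft prompt with a locally fine-tuned one rather than choosing either extreme. The mixing weight and tensor shapes are illustrative.

```python
# Interpolate between a robust global soft prompt and a personalized local one.
import torch

def interpolate_prompts(global_prompt: torch.Tensor,
                        local_prompt: torch.Tensor,
                        alpha: float = 0.5) -> torch.Tensor:
    # prompts: (num_virtual_tokens, embed_dim) soft-prompt embeddings
    return alpha * local_prompt + (1.0 - alpha) * global_prompt

# alpha -> 1 favors personalization; alpha -> 0 favors the robust global prompt.
```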

FluxGAN: A Physics-Aware Generative Adversarial Network Model for Generating Microstructures That Maintain Target Heat Flux

  • paper_url: http://arxiv.org/abs/2310.04622
  • repo_url: None
  • paper_authors: Artem K. Pimachev, Manoj Settipalli, Sanghamitra Neogi
  • For: The paper proposes a physics-aware generative adversarial network (GAN) model, called FluxGAN, which can generate high-quality images of large microstructures and describe their thermal properties.
  • Methods: The FluxGAN model uses a synthesis-by-parts approach and is trained on a dataset of 2D images of microstructures, which allows it to generate arbitrarily large images at low computational cost. The model learns about the relationship between local structural features and physical processes, such as heat flux, due to external temperature gradients.
  • Results: The paper demonstrates that the FluxGAN model can be used to generate designs of thermal sprayed coatings that satisfy target thermal properties. The model is also capable of generating coating microstructures and physical processes in the 3D domain after being trained on 2D examples. The approach has the potential to transform the design and optimization of thermal sprayed coatings for various applications, including high-temperature and long-duration operation of gas turbines for aircraft or ground-based power generators.
    Abstract We propose a physics-aware generative adversarial network model, FluxGAN, capable of simultaneously generating high-quality images of large microstructures and description of their thermal properties. During the training phase, the model learns about the relationship between the local structural features and the physical processes, such as the heat flux in the microstructures, due to external temperature gradients. Once trained, the model generates new structural and associated heat flux environments, bypassing the computationally expensive modeling. Our model provides a cost effective and efficient approach over conventional modeling techniques, such as the finite element method (FEM), for describing the thermal properties of microstructures. The conventional approach requires computational modeling that scales with the size of the microstructure model, therefore limiting the simulation to a given size, resolution, and complexity of the model. In contrast, the FluxGAN model uses synthesis-by-part approach and generates arbitrary large size images at low computational cost. We demonstrate that the model can be utilized to generate designs of thermal sprayed coatings that satisfies target thermal properties. Furthermore, the model is capable of generating coating microstructures and physical processes in three-dimensional (3D) domain after being trained on two-dimensional (2D) examples. Our approach has the potential to transform the design and optimization of thermal sprayed coatings for various applications, including high-temperature and long-duration operation of gas turbines for aircraft or ground-based power generators.

  • paper_url: http://arxiv.org/abs/2310.04612
  • repo_url: https://github.com/yuwvandy/topo_lp_gnn
  • paper_authors: Yu Wang, Tong Zhao, Yuying Zhao, Yunchao Liu, Xueqi Cheng, Neil Shah, Tyler Derr
  • for: This study investigates why Graph Neural Networks (GNNs) perform unevenly across different nodes in link prediction (LP) and the reasons behind this disparity.
  • methods: GNN node embeddings are analyzed per node, and a new metric, Topological Concentration (TC), is proposed based on the intersection of each node's local subgraph with those of its neighbors; its correlation with LP performance is verified empirically, and a scalable Approximated Topological Concentration (ATC) is introduced to reduce the computation.
  • results: TC correlates with LP performance more strongly than metrics such as degree and subgraph density, identifies low-performing nodes better than cold-start heuristics, reveals shifts in interactions between a node and its newly joined neighbors, and motivates boosting LP performance by re-weighting edges in message passing, whose effectiveness and limitations are discussed.
    Abstract Graph Neural Networks (GNNs) have shown great promise in learning node embeddings for link prediction (LP). While numerous studies aim to improve the overall LP performance of GNNs, none have explored its varying performance across different nodes and its underlying reasons. To this end, we aim to demystify which nodes will perform better from the perspective of their local topology. Despite the widespread belief that low-degree nodes exhibit poorer LP performance, our empirical findings provide nuances to this viewpoint and prompt us to propose a better metric, Topological Concentration (TC), based on the intersection of the local subgraph of each node with the ones of its neighbors. We empirically demonstrate that TC has a higher correlation with LP performance than other node-level topological metrics like degree and subgraph density, offering a better way to identify low-performing nodes than using cold-start. With TC, we discover a novel topological distribution shift issue in which newly joined neighbors of a node tend to become less interactive with that node's existing neighbors, compromising the generalizability of node embeddings for LP at testing time. To make the computation of TC scalable, We further propose Approximated Topological Concentration (ATC) and theoretically/empirically justify its efficacy in approximating TC and reducing the computation complexity. Given the positive correlation between node TC and its LP performance, we explore the potential of boosting LP performance via enhancing TC by re-weighting edges in the message-passing and discuss its effectiveness with limitations. Our code is publicly available at https://github.com/YuWVandy/Topo_LP_GNN.
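
As a rough illustration of the neighborhood-intersection idea behind TC, the sketch below computes a 1-hop Jaccard-overlap proxy per node; the paper's definition of TC (and its ATC approximation) is more general than this.

```python
# 1-hop proxy for Topological Concentration: average overlap between a node's
# neighborhood and each neighbor's neighborhood.
import networkx as nx

def topological_concentration_1hop(G: nx.Graph, v) -> float:
    nbrs = set(G.neighbors(v))
    if not nbrs:
        return 0.0
    scores = []
    for u in nbrs:
        u_nbrs = set(G.neighbors(u))
        union = nbrs | u_nbrs
        scores.append(len(nbrs & u_nbrs) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

G = nx.karate_club_graph()
print({v: round(topological_concentration_1hop(G, v), 3) for v in list(G)[:5]})
```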

Robust Transfer Learning with Unreliable Source Data

  • paper_url: http://arxiv.org/abs/2310.04606
  • repo_url: None
  • paper_authors: Jianqing Fan, Cheng Gao, Jason M. Klusowski
  • for: This paper addresses robust transfer learning under ambiguity in Bayes classifiers and weak transferable signals between the target and source distributions.
  • methods: It introduces a new quantity, the "ambiguity level", measuring the discrepancy between the target and source regression functions, proposes a simple transfer learning procedure, and establishes a general theorem relating this quantity to the transferability of learning in terms of risk improvements.
  • results: The proposed "Transfer Around Boundary" (TAB) model, with a threshold balancing the performance of target and source data, is efficient and robust, improving classification while avoiding negative transfer; on non-parametric classification and logistic regression it achieves upper bounds that are optimal up to logarithmic factors, simulation studies support its effectiveness, and simple approaches are provided to bound the excess misclassification error without specialized knowledge of transfer learning.
    Abstract This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distribution. We introduce a novel quantity called the ''ambiguity level'' that measures the discrepancy between the target and source regression functions, propose a simple transfer learning procedure, and establish a general theorem that shows how this new quantity is related to the transferability of learning in terms of risk improvements. Our proposed ''Transfer Around Boundary'' (TAB) model, with a threshold balancing the performance of target and source data, is shown to be both efficient and robust, improving classification while avoiding negative transfer. Moreover, we demonstrate the effectiveness of the TAB model on non-parametric classification and logistic regression tasks, achieving upper bounds which are optimal up to logarithmic factors. Simulation studies lend further support to the effectiveness of TAB. We also provide simple approaches to bound the excess misclassification error without the need for specialized knowledge in transfer learning.

Learning Optimal Power Flow Value Functions with Input-Convex Neural Networks

  • paper_url: http://arxiv.org/abs/2310.04605
  • repo_url: None
  • paper_authors: Andrew Rosemberg, Mathieu Tanneau, Bruno Fanzeres, Joaquim Garcia, Pascal Van Hentenryck
  • for: solves the Optimal Power Flow (OPF) problem with machine learning (ML) to improve the speed of analysis and enable real-time decision-making in power systems.
  • methods: uses ML to learn convex approximate solutions that can be solved more quickly than traditional methods, while still maintaining a high level of accuracy.
  • results: enables faster exploration of vast solution spaces in complex power system problems, allowing for more efficient and practical decision-making.
    Abstract The Optimal Power Flow (OPF) problem is integral to the functioning of power systems, aiming to optimize generation dispatch while adhering to technical and operational constraints. These constraints are far from straightforward; they involve intricate, non-convex considerations related to Alternating Current (AC) power flow, which are essential for the safety and practicality of electrical grids. However, solving the OPF problem for varying conditions within stringent time frames poses practical challenges. To address this, operators resort to model simplifications of varying accuracy. Unfortunately, better approximations (tight convex relaxations) are often computationally intractable. This research explores machine learning (ML) to learn convex approximate solutions for faster analysis in the online setting while still allowing for coupling into other convex dependent decision problems. By trading off a small amount of accuracy for substantial gains in speed, they enable the efficient exploration of vast solution spaces in these complex problems.
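
Consistent with the title's input-convex neural networks, a minimal ICNN sketch is shown below: the z-path weights are kept non-negative and the activations are convex and non-decreasing, so the learned value-function surrogate is convex in its input. The architecture details are illustrative, not the paper's exact model.

```python
# Minimal input-convex neural network (ICNN): output is convex in x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, n_layers: int = 3):
        super().__init__()
        self.x_layers = nn.ModuleList(
            [nn.Linear(in_dim, hidden) for _ in range(n_layers)] + [nn.Linear(in_dim, 1)]
        )
        self.z_layers = nn.ModuleList(
            [nn.Linear(hidden, hidden, bias=False) for _ in range(n_layers - 1)]
            + [nn.Linear(hidden, 1, bias=False)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = F.relu(self.x_layers[0](x))                 # first layer uses x only
        for i, z_layer in enumerate(self.z_layers):
            wz = F.softplus(z_layer.weight)             # non-negative z-path weights
            z_next = F.linear(z, wz) + self.x_layers[i + 1](x)
            z = F.relu(z_next) if i < len(self.z_layers) - 1 else z_next
        return z                                        # scalar, convex in x

# usage: f = ICNN(in_dim=10); y = f(torch.randn(5, 10))   # shape (5, 1)
```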

PriViT: Vision Transformers for Fast Private Inference

  • paper_url: http://arxiv.org/abs/2310.04604
  • repo_url: https://github.com/nyu-dice-lab/privit
  • paper_authors: Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
  • for: This paper aims to make Vision Transformer models practical for private inference under secure multi-party computation (MPC) protocols.
  • methods: It proposes PriViT, a gradient-based algorithm that selectively "Taylorizes" the nonlinear operations in ViTs (self-attention, feed-forward rectifiers, layer normalization) while maintaining prediction accuracy.
  • results: Experiments on several standard image classification tasks show that PriViT improves private-inference performance under MPC, outperforming existing approaches for designing MPC-friendly transformer architectures along the latency-accuracy Pareto frontier.
    Abstract The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We propose PriViT, a gradient based algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy. Our algorithm is conceptually simple, easy to implement, and achieves improved performance over existing approaches for designing MPC-friendly transformer architectures in terms of achieving the Pareto frontier in latency-accuracy. We confirm these improvements via experiments on several standard image classification tasks. Public code is available at https://github.com/NYU-DICE-Lab/privit.
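To make the mechanism concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of "Taylorizing" a single nonlinearity: an MPC-unfriendly GELU is blended with a low-degree polynomial surrogate through a learnable gate, so training can decide where the exact nonlinearity is actually needed. In PriViT the selection is driven by gradients under an accuracy constraint; the gate and the particular quadratic below are only for illustration.

```python
import torch
import torch.nn as nn

class SelectiveTaylorGELU(nn.Module):
    """Hypothetical sketch: blend an exact GELU with a quadratic, MPC-friendly
    polynomial surrogate via a learnable gate. Not the paper's code."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # pushed toward 0 where the polynomial suffices

    def forward(self, x):
        exact = nn.functional.gelu(x)
        poly = 0.125 * x**2 + 0.25 * x + 0.5          # crude low-degree stand-in for GELU
        a = torch.sigmoid(self.alpha)                  # keep the gate in [0, 1]
        return a * exact + (1.0 - a) * poly

layer = SelectiveTaylorGELU()
print(layer(torch.randn(4)))
```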

Deep Model Predictive Optimization

  • paper_url: http://arxiv.org/abs/2310.04590
  • repo_url: https://github.com/jisacks/dmpo
  • paper_authors: Jacob Sacks, Rwik Rana, Kevin Huang, Alex Spitzer, Guanya Shi, Byron Boots
  • for: aims to design robust robot policies that achieve complex and agile behaviors in the real world.
  • methods: proposes Deep Model Predictive Optimization (DMPO), which learns the inner loop of an MPC optimization algorithm directly from experience, tailored to the needs of the control problem.
  • results: on a real quadrotor agile trajectory-tracking task, DMPO improves over a baseline MPC algorithm for a given computational budget, outperforming the best MPC baseline by up to 27% with fewer samples and an end-to-end MFRL policy by 19%, while using 4.3x less memory; it also adapts zero-shot to turbulent wind fields with an attached drag plate and still outperforms all baselines. Additional results: https://tinyurl.com/mr2ywmnw.
    Abstract A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often under-performs the optimal strategy. This is due to model quality, myopic behavior from short planning horizons, and approximations due to computational constraints. And even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. To this end, we propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience, specifically tailored to the needs of the control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking task, on which it improves performance over a baseline MPC algorithm for a given computational budget. It can outperform the best MPC algorithm by up to 27% with fewer samples and an end-to-end policy trained with MFRL by 19%. Moreover, because DMPO requires fewer samples, it can also achieve these benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind fields with an attached drag plate, DMPO can adapt zero-shot while still outperforming all baselines. Additional results can be found at https://tinyurl.com/mr2ywmnw.

The Impact of Equal Opportunity on Statistical Discrimination

  • paper_url: http://arxiv.org/abs/2310.04585
  • repo_url: None
  • paper_authors: John Y. Zhu
  • for: studies what regulators can require of firms when the firm's belief about an individual's unobserved class is machine-learning-generated and therefore contractible, expanding the regulatory toolkit beyond belief-free policies such as affirmative action.
  • methods: modifies the canonical statistical discrimination model of Coate and Loury (1993) by assuming contractible, ML-generated beliefs, which makes it feasible to require decision policies that equalize true positive rates across groups (a small threshold-equalization sketch follows the abstract).
  • results: shows that imposing equal opportunity ends statistical discrimination, whereas affirmative action does not necessarily do so.
    Abstract I modify the canonical statistical discrimination model of Coate and Loury (1993) by assuming the firm's belief about an individual's unobserved class is machine learning-generated and, therefore, contractible. This expands the toolkit of a regulator beyond belief-free regulations like affirmative action. Contractible beliefs make it feasible to require the firm to select a decision policy that equalizes true positive rates across groups -- what the algorithmic fairness literature calls equal opportunity. While affirmative action does not necessarily end statistical discrimination, I show that imposing equal opportunity does.
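For intuition, a small illustrative sketch (not from the paper) of what imposing equal opportunity means operationally: pick a group-specific decision threshold so that every group attains the same true positive rate.

```python
import numpy as np

def equal_opportunity_thresholds(scores, y, group, target_tpr=0.8):
    """Illustrative only: per-group thresholds that equalize the true positive rate."""
    thresholds = {}
    for g in np.unique(group):
        pos_scores = np.sort(scores[(group == g) & (y == 1)])
        # smallest threshold that still accepts roughly a target_tpr share of positives
        k = int(np.floor((1.0 - target_tpr) * len(pos_scores)))
        thresholds[g] = pos_scores[k]
    return thresholds

# toy usage with synthetic scores, labels, and two groups
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
y = (scores + rng.normal(0, 0.2, 1000) > 0.5).astype(int)
group = rng.integers(0, 2, 1000)
print(equal_opportunity_thresholds(scores, y, group))
```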

Self-Confirming Transformer for Locally Consistent Online Adaptation in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.04579
  • repo_url: None
  • paper_authors: Tao Li, Juan Guevara, Xinghong Xie, Quanyan Zhu
  • for: aims to improve the online adaptation of offline (multi-agent) reinforcement learning when opponents (exogenous agents beyond the learner's control) behave nonstationarily during online testing.
  • methods: trains a transformer policy offline with a self-confirming loss (SCL), motivated by the self-confirming equilibrium in game theory, so that the model learns to predict opponents' future moves and act accordingly.
  • results: experiments against nonstationary opponents, from random policies to benchmark MARL policies, show that the self-confirming transformer (SCT) adapts online and achieves higher returns than vanilla transformers and offline MARL baselines.
    Abstract Offline reinforcement learning (RL) leverages previously collected data to extract policies that return satisfying performance in online environments. However, offline RL suffers from the distribution shift between the offline dataset and the online environment. In the multi-agent RL (MARL) setting, this distribution shift may arise from the nonstationary opponents (exogenous agents beyond control) in the online testing who display distinct behaviors from those recorded in the offline dataset. Hence, the key to the broader deployment of offline MARL is the online adaptation to nonstationary opponents. Recent advances in large language models have demonstrated the surprising generalization ability of the transformer architecture in sequence modeling, which prompts one to wonder \textit{whether the offline-trained transformer policy adapts to nonstationary opponents during online testing}. This work proposes the self-confirming loss (SCL) in offline transformer training to address the online nonstationarity, which is motivated by the self-confirming equilibrium (SCE) in game theory. The gist is that the transformer learns to predict the opponents' future moves based on which it acts accordingly. As a weaker variant of Nash equilibrium (NE), SCE (equivalently, SCL) only requires local consistency: the agent's local observations do not deviate from its conjectures, leading to a more adaptable policy than the one dictated by NE focusing on global optimality. We evaluate the online adaptability of the self-confirming transformer (SCT) by playing against nonstationary opponents employing a variety of policies, from the random one to the benchmark MARL policies. Experimental results demonstrate that SCT can adapt to nonstationary opponents online, achieving higher returns than vanilla transformers and offline MARL baselines.

  • paper_url: http://arxiv.org/abs/2310.04570
  • repo_url: None
  • paper_authors: Thomas M. Hehn, Tribhuvanesh Orekondy, Ori Shental, Arash Behboodi, Juan Bucheli, Akash Doshi, June Namgoong, Taesang Yoo, Ashwin Sampath, Joseph B. Soriaga
  • for: estimating path loss for a transmitter-receiver location is key to many use cases, including network planning and handover.
  • methods: presents a transformer-based neural network that predicts link-level wireless channel properties from maps of various dimensions (containing building and foliage information) and from sparse measurements, using continuous transmitter and receiver coordinates without discretization.
  • results: the model efficiently learns dominant path losses from sparse training data and generalizes well when tested on novel maps.
    Abstract Estimating path loss for a transmitter-receiver location is key to many use-cases including network planning and handover. Machine learning has become a popular tool to predict wireless channel properties based on map data. In this work, we present a transformer-based neural network architecture that enables predicting link-level properties from maps of various dimensions and from sparse measurements. The map contains information about buildings and foliage. The transformer model attends to the regions that are relevant for path loss prediction and, therefore, scales efficiently to maps of different size. Further, our approach works with continuous transmitter and receiver coordinates without relying on discretization. In experiments, we show that the proposed model is able to efficiently learn dominant path losses from sparse training data and generalizes well when tested on novel maps.

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

  • paper_url: http://arxiv.org/abs/2310.04561
  • repo_url: https://github.com/tianhaoxie/DragD3D
  • paper_authors: Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa
  • for: provides a vertex-based local mesh editing method that is aware of the object's global context, yielding realistic and natural deformations, and is not restricted to any class of objects.
  • methods: combines the classic geometric ARAP (as-rigid-as-possible) regularizer with 2D priors from a large-scale diffusion model: objects are rendered from multiple viewpoints with a differentiable renderer and scored with the recently introduced DDS loss, and approximate DDS gradients are combined with ARAP gradients to move mesh vertices via a neural Jacobian field while satisfying vertex constraints.
  • results: the deformations are realistic, account for the object's global context, and improve over results obtained with geometric regularizers alone.
    Abstract Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. In this work, our main contribution is a local mesh editing method called DragD3D for global context-aware realistic deformation through direct manipulation of a few vertices. DragD3D is not restricted to any class of objects. It achieves this by combining the classic geometric ARAP (as rigid as possible) regularizer with 2D priors obtained from a large-scale diffusion model. Specifically, we render the objects from multiple viewpoints through a differentiable renderer and use the recently introduced DDS loss which scores the faithfulness of the rendered image to one from a diffusion model. DragD3D combines the approximate gradients of the DDS with gradients from the ARAP loss to modify the mesh vertices via neural Jacobian field, while also satisfying vertex constraints. We show that our deformations are realistic and aware of the global context of the objects, and provide better results than just using geometric regularizers.

Talk like a Graph: Encoding Graphs for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04560
  • repo_url: None
  • paper_authors: Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi
  • for: investigates how to encode graph-structured data as text so that large language models (LLMs) can reason over it.
  • methods: performs the first comprehensive study of graph-to-text encoding methods and evaluates LLM performance across a range of graph reasoning tasks (a toy encoding sketch follows the abstract).
  • results: LLM performance varies on three fundamental levels: the graph encoding method, the nature of the graph task, and the structure of the graph itself; choosing the right encoder boosts performance on graph reasoning tasks by 4.8% to 61.8%, depending on the task.
    Abstract Graphs are a powerful tool for representing and analyzing complex relationships in real-world applications such as social networks, recommender systems, and computational finance. Reasoning on graphs is essential for drawing inferences about the relationships between entities in a complex system, and to identify hidden patterns and trends. Despite the remarkable progress in automated reasoning with natural text, reasoning on graphs with large language models (LLMs) remains an understudied problem. In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered. These novel results provide valuable insight on strategies for encoding graphs as text. Using these insights we illustrate how the correct choice of encoders can boost performance on graph reasoning tasks inside LLMs by 4.8% to 61.8%, depending on the task.
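As a toy illustration of what "encoding a graph as text" can look like (the exact encoders studied in the paper may differ), two simple textual encodings of the same small graph:

```python
import networkx as nx

def encode_adjacency(g: nx.Graph) -> str:
    """Plain adjacency-style encoding: nodes stay as integers."""
    lines = [f"Node {u} is connected to nodes {sorted(g.neighbors(u))}." for u in g.nodes]
    return "G is an undirected graph.\n" + "\n".join(lines)

def encode_friendship(g: nx.Graph, names) -> str:
    """Social-style encoding: nodes become named people, edges become friendships."""
    return "\n".join(f"{names[u]} and {names[v]} are friends." for u, v in g.edges)

g = nx.cycle_graph(4)
print(encode_adjacency(g))
print(encode_friendship(g, ["Ada", "Bo", "Cy", "Dee"]))
```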

Multi-decadal Sea Level Prediction using Neural Networks and Spectral Clustering on Climate Model Large Ensembles and Satellite Altimeter Data

  • paper_url: http://arxiv.org/abs/2310.04540
  • repo_url: None
  • paper_authors: Saumya Sinha, John Fasullo, R. Steven Nerem, Claire Monteleoni
  • for: aims to predict global sea level trends 30 years into the future at a 2-degree spatial resolution and to investigate future patterns of sea level change.
  • methods: uses machine learning, specifically fully connected neural networks (FCNNs) trained on satellite altimeter observations and climate model large ensembles, to predict sea level trends with accompanying uncertainty estimates; the spatial domain is partitioned, either by domain knowledge or by spectral clustering, and a dedicated model is learned per region (a toy partition-and-fit sketch follows the abstract).
  • results: FCNNs can predict sea level trends from climate model projections, and segmenting the spatial dataset with spectral clustering improves the ML predictions.
    Abstract Sea surface height observations provided by satellite altimetry since 1993 show a rising rate (3.4 mm/year) for global mean sea level. While on average, sea level has risen 10 cm over the last 30 years, there is considerable regional variation in the sea level change. Through this work, we predict sea level trends 30 years into the future at a 2-degree spatial resolution and investigate the future patterns of the sea level change. We show the potential of machine learning (ML) in this challenging application of long-term sea level forecasting over the global ocean. Our approach incorporates sea level data from both altimeter observations and climate model simulations. We develop a supervised learning framework using fully connected neural networks (FCNNs) that can predict the sea level trend based on climate model projections. Alongside this, our method provides uncertainty estimates associated with the ML prediction. We also show the effectiveness of partitioning our spatial dataset and learning a dedicated ML model for each segmented region. We compare two partitioning strategies: one achieved using domain knowledge, and the other employing spectral clustering. Our results demonstrate that segmenting the spatial dataset with spectral clustering improves the ML predictions.
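A toy sketch of the partition-then-fit idea with stand-in data; the paper uses altimeter observations and climate model large ensembles, and its network architecture and clustering features may differ.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.neural_network import MLPRegressor

# Stand-in for gridded data: rows are grid cells, columns are per-cell predictors.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 12))
y_trend = X_hist @ rng.normal(size=12) + rng.normal(0, 0.1, 500)

# 1) Partition the grid cells into regions via spectral clustering.
regions = SpectralClustering(n_clusters=5, random_state=0).fit_predict(X_hist)

# 2) Fit one small fully connected regressor per region and predict region-wise.
models, preds = {}, np.empty_like(y_trend)
for r in np.unique(regions):
    idx = regions == r
    models[r] = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=0).fit(X_hist[idx], y_trend[idx])
    preds[idx] = models[r].predict(X_hist[idx])
print("per-region models:", len(models))
```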

Generating Less Certain Adversarial Examples Improves Robust Generalization

  • paper_url: http://arxiv.org/abs/2310.04539
  • repo_url: https://github.com/trustmlrg/edac
  • paper_authors: Minxing Zhang, Michael Backes, Xiao Zhang
  • for: revisits robust overfitting in adversarial training and argues that overconfident models produced during adversarial training are a potential cause.
  • methods: defines adversarial certainty and incorporates an extragradient step into the adversarial training framework to find models that generate adversarially perturbed inputs with lower certainty; the approach is general and combines easily with other adversarial training variants.
  • results: extensive experiments on image benchmarks show that the method effectively alleviates robust overfitting and produces models with consistently improved robustness.
    Abstract Recent studies have shown that deep neural networks are vulnerable to adversarial examples. Numerous defenses have been proposed to improve model robustness, among which adversarial training is most successful. In this work, we revisit the robust overfitting phenomenon. In particular, we argue that overconfident models produced during adversarial training could be a potential cause, supported by the empirical observation that the predicted labels of adversarial examples generated by models with better robust generalization ability tend to have significantly more even distributions. Based on the proposed definition of adversarial certainty, we incorporate an extragradient step in the adversarial training framework to search for models that can generate adversarially perturbed inputs with lower certainty, further improving robust generalization. Our approach is general and can be easily combined with other variants of adversarial training methods. Extensive experiments on image benchmarks demonstrate that our method effectively alleviates robust overfitting and is able to produce models with consistently improved robustness.

LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation

  • paper_url: http://arxiv.org/abs/2310.04535
  • repo_url: None
  • paper_authors: Zixi Zhang, Greg Chadwick, Hugo McNally, Yiren Zhao, Robert Mullins
  • for: aims to make test stimuli generation, a crucial but labor-intensive task in hardware design verification, more efficient through automation.
  • methods: harnesses large language models (LLMs) in a new benchmarking framework, LLM4DV, which pairs a prompt template for interactively eliciting test stimuli with four prompting improvements that support pipeline execution and further boost performance.
  • results: on three self-designed design-under-test (DUT) modules, LLM4DV handles straightforward DUT scenarios efficiently by exploiting basic mathematical reasoning and pre-trained knowledge; its efficiency drops on complex tasks, but it still outperforms constrained-random testing (CRT) in relative terms.
    Abstract Test stimuli generation has been a crucial but labor-intensive task in hardware design verification. In this paper, we revolutionize this process by harnessing the power of large language models (LLMs) and present a novel benchmarking framework, LLM4DV. This framework introduces a prompt template for interactively eliciting test stimuli from the LLM, along with four innovative prompting improvements to support the pipeline execution and further enhance its performance. We compare LLM4DV to traditional constrained-random testing (CRT), using three self-designed design-under-test (DUT) modules. Experiments demonstrate that LLM4DV excels in efficiently handling straightforward DUT scenarios, leveraging its ability to employ basic mathematical reasoning and pre-trained knowledge. While it exhibits reduced efficiency in complex task settings, it still outperforms CRT in relative terms. The proposed framework and the DUT modules used in our experiments will be open-sourced upon publication.

DPGOMI: Differentially Private Data Publishing with Gaussian Optimized Model Inversion

  • paper_url: http://arxiv.org/abs/2310.04528
  • repo_url: None
  • paper_authors: Dongjie Chen, Sen-ching S. Cheung, Chen-Nee Chuah
  • for: protecting sensitive data when training GANs for data publishing.
  • methods: proposes Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI), which maps private data to the latent space with a public generator and then trains a lower-dimensional DP-GAN with better convergence properties.
  • results: on CIFAR10 and SVHN, DPGOMI outperforms the standard DP-GAN method in Inception Score, Fréchet Inception Distance, and classification performance while providing the same level of privacy.
    Abstract High-dimensional data are widely used in the era of deep learning with numerous applications. However, certain data which has sensitive information are not allowed to be shared without privacy protection. In this paper, we propose a novel differentially private data releasing method called Differentially Private Data Publishing with Gaussian Optimized Model Inversion (DPGOMI) to address this issue. Our approach involves mapping private data to the latent space using a public generator, followed by a lower-dimensional DP-GAN with better convergence properties. We evaluate the performance of DPGOMI on standard datasets CIFAR10 and SVHN. Our results show that DPGOMI outperforms the standard DP-GAN method in terms of Inception Score, Fr\'echet Inception Distance, and classification performance, while providing the same level of privacy. Our proposed approach offers a promising solution for protecting sensitive data in GAN training while maintaining high-quality results.

SPADE: Sparsity-Guided Debugging for Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2310.04519
  • repo_url: None
  • paper_authors: Arshia Soltani Moakhar, Eugenia Iofinova, Dan Alistarh
  • for: improving the interpretability of deep models, that is, understanding why a model reaches a specific decision.
  • methods: given a trained model and a target sample, uses sample-targeted pruning to produce a "trace" of the network's execution on that sample, reducing the network to the connections most relevant to the specific prediction.
  • results: preprocessing with SPADE significantly increases the accuracy of image saliency maps across several interpretability methods and makes neuron visualizations more useful; sample-specific pruning disentangles multifaceted neurons, consistently improving interpretability.
    Abstract Interpretability, broadly defined as mechanisms for understanding why and how machine learning models reach their decisions, is one of the key open goals at the intersection of deep learning theory and practice. Towards this goal, multiple tools have been proposed to aid a human examiner in reasoning about a network's behavior in general or on a set of instances. However, the outputs of these tools-such as input saliency maps or neuron visualizations-are frequently difficult for a human to interpret, or even misleading, due, in particular, to the fact that neurons can be multifaceted, i.e., a single neuron can be associated with multiple distinct feature combinations. In this paper, we present a new general approach to address this problem, called SPADE, which, given a trained model and a target sample, uses sample-targeted pruning to provide a "trace" of the network's execution on the sample, reducing the network to the connections that are most relevant to the specific prediction. We demonstrate that preprocessing with SPADE significantly increases both the accuracy of image saliency maps across several interpretability methods and the usefulness of neuron visualizations, aiding humans in reasoning about network behavior. Our findings show that sample-specific pruning of connections can disentangle multifaceted neurons, leading to consistently improved interpretability.

Domain Randomization for Sim2real Transfer of Automatically Generated Grasping Datasets

  • paper_url: http://arxiv.org/abs/2310.04517
  • repo_url: https://github.com/Johann-Huber/qd_grasp
  • paper_authors: Johann Huber, François Hélénon, Hippolyte Watrelot, Faiz Ben Amar, Stéphane Doncieux
  • for: investigates how automatically generated grasps can be exploited in the real world, given that the sparse-reward nature of grasping makes data-driven learning hard to bootstrap.
  • methods: generates more than 7000 reach-and-grasp trajectories with Quality-Diversity (QD) methods on 3 different arms and grippers, including parallel fingers and a dexterous hand, and tests them in the real world.
  • results: the analysis reveals correlations between several Domain-Randomization-based quality criteria and sim-to-real transferability, identifies key reality-gap challenges for grasping, and proposes a QD approach that makes grasps more robust to domain randomization, reaching an 84% transfer ratio on the Franka Research 3 arm.
    Abstract Robotic grasping refers to making a robotic system pick an object by applying forces and torques on its surface. Many recent studies use data-driven approaches to address grasping, but the sparse reward nature of this task made the learning process challenging to bootstrap. To avoid constraining the operational space, an increasing number of works propose grasping datasets to learn from. But most of them are limited to simulations. The present paper investigates how automatically generated grasps can be exploited in the real world. More than 7000 reach-and-grasp trajectories have been generated with Quality-Diversity (QD) methods on 3 different arms and grippers, including parallel fingers and a dexterous hand, and tested in the real world. Conducted analysis on the collected measure shows correlations between several Domain Randomization-based quality criteria and sim-to-real transferability. Key challenges regarding the reality gap for grasping have been identified, stressing matters on which researchers on grasping should focus in the future. A QD approach has finally been proposed for making grasps more robust to domain randomization, resulting in a transfer ratio of 84% on the Franka Research 3 arm.

Generative Diffusion From An Action Principle

  • paper_url: http://arxiv.org/abs/2310.04490
  • repo_url: None
  • paper_authors: Akhil Premkumar
  • for: studies generative diffusion models, which synthesize new samples by reversing a diffusive process that converts a given data set into generic noise.
  • methods: trains a neural network to match the score (the gradient of the log probability of the data) and casts reverse diffusion as an optimal control problem (the standard objects involved are restated after the abstract).
  • results: shows that score matching can be derived from an action principle, like those commonly used in physics, and uses this insight to connect different classes of diffusion models.
    Abstract Generative diffusion models synthesize new samples by reversing a diffusive process that converts a given data set to generic noise. This is accomplished by training a neural network to match the gradient of the log of the probability distribution of a given data set, also called the score. By casting reverse diffusion as an optimal control problem, we show that score matching can be derived from an action principle, like the ones commonly used in physics. We use this insight to demonstrate the connection between different classes of diffusion models.
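For reference, the standard objects the abstract refers to, in common notation (this is the textbook formulation rather than the paper's specific action-principle derivation): a forward diffusion, its reverse-time counterpart driven by the score, and the denoising score matching objective used to train the score network.

```latex
\begin{align*}
\mathrm{d}x &= f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
  && \text{(forward diffusion: data $\to$ noise)} \\
\mathrm{d}x &= \bigl[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\bigr]\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t
  && \text{(reverse diffusion: noise $\to$ samples)} \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_t}\Bigl[\lambda(t)\,\bigl\|\, s_\theta(x_t,t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\,\bigr\|^2\Bigr]
  && \text{(denoising score matching)}
\end{align*}
```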

BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

  • paper_url: http://arxiv.org/abs/2310.04420
  • repo_url: None
  • paper_authors: Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe
  • for: understanding the functional organization of higher visual cortex.
  • methods: a data-driven method (BrainSCUBA) that generates natural language captions for images predicted to maximally activate individual voxels, building on the embedding space of a contrastive vision-language model and a pre-trained large language model.
  • results: produces fine-grained voxel-level captions across higher-order visual regions, supports text-conditioned image synthesis that yields semantically coherent images with high predicted activations, and uncovers fine-grained semantic selectivity in body-selective areas.
    Abstract Understanding the functional organization of higher visual cortex is a central focus in neuroscience. Past studies have primarily mapped the visual and semantic selectivity of neural populations using hand-selected stimuli, which may potentially bias results towards pre-existing hypotheses of visual cortex functionality. Moving beyond conventional approaches, we introduce a data-driven method that generates natural language descriptions for images predicted to maximally activate individual voxels of interest. Our method -- Semantic Captioning Using Brain Alignments ("BrainSCUBA") -- builds upon the rich embedding space learned by a contrastive vision-language model and utilizes a pre-trained large language model to generate interpretable captions. We validate our method through fine-grained voxel-level captioning across higher-order visual regions. We further perform text-conditioned image synthesis with the captions, and show that our images are semantically coherent and yield high predicted activations. Finally, to demonstrate how our method enables scientific discovery, we perform exploratory investigations on the distribution of "person" representations in the brain, and discover fine-grained semantic selectivity in body-selective areas. Unlike earlier studies that decode text, our method derives voxel-wise captions of semantic selectivity. Our results show that BrainSCUBA is a promising means for understanding functional preferences in the brain, and provides motivation for further hypothesis-driven investigation of visual cortex.

Functional Interpolation for Relative Positions Improves Long Context Transformers

  • paper_url: http://arxiv.org/abs/2310.04418
  • repo_url: None
  • paper_authors: Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
  • for: improving how transformers generalize to inputs longer than those used during training.
  • methods: proposes FIRE, a functional relative position encoding with progressive interpolation (a simplified sketch follows the abstract).
  • results: FIRE can provably represent popular relative encodings such as T5's RPE, Alibi, and Kerple, and FIRE models generalize better to longer contexts on zero-shot language modeling and long-text benchmarks.
    Abstract Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits on the input sequence lengths it can process, the choice of position encoding used during training can limit the performance of these models on longer inputs. We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple. We next empirically show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.
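A simplified sketch in the spirit of the method, not the official code: the hidden width, the log squashing, and the exact normalization below are assumptions. The key ingredient is a small MLP that maps a progressively normalized relative distance to a per-head additive attention bias, so the same learned function can be evaluated at sequence lengths unseen during training.

```python
import torch
import torch.nn as nn

class FunctionalRelativeBias(nn.Module):
    """Sketch of a functional relative position bias with progressive interpolation."""
    def __init__(self, hidden=32, num_heads=8, c=1.0, eps=1.0):
        super().__init__()
        self.c, self.eps = c, eps
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, num_heads))

    def forward(self, seq_len):
        i = torch.arange(seq_len).view(-1, 1)                 # query positions
        j = torch.arange(seq_len).view(1, -1)                 # key positions
        rel = (i - j).clamp(min=0).float()                    # causal relative distance
        psi = lambda x: torch.log1p(self.c * x)               # monotone squashing
        norm = psi(torch.maximum(i.float(), torch.full_like(i.float(), self.eps)))
        x = (psi(rel) / norm).unsqueeze(-1)                   # normalized into a bounded range
        return self.mlp(x).permute(2, 0, 1)                   # [heads, seq, seq] additive bias

print(FunctionalRelativeBias()(seq_len=16).shape)             # torch.Size([8, 16, 16])
```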

Diffusion Random Feature Model

  • paper_url: http://arxiv.org/abs/2310.04417
  • repo_url: None
  • paper_authors: Esha Saha, Giang Tran
  • for: proposes an interpretable, diffusion-inspired generative model.
  • methods: combines the ideas of diffusion models with random feature models, extending existing random feature results and deriving generalization bounds between the distribution of sampled data and the true distribution using properties of score matching (a plain random-feature sketch follows the abstract).
  • results: validated by generating samples on the fashion MNIST dataset and on instrumental audio data, with numerical results comparable to a fully connected neural network with the same number of trainable parameters.
    Abstract Diffusion probabilistic models have been successfully used to generate data from noise. However, most diffusion models are computationally expensive and difficult to interpret with a lack of theoretical justification. Random feature models on the other hand have gained popularity due to their interpretability but their application to complex machine learning tasks remains limited. In this work, we present a diffusion model-inspired deep random feature model that is interpretable and gives comparable numerical results to a fully connected neural network having the same number of trainable parameters. Specifically, we extend existing results for random features and derive generalization bounds between the distribution of sampled data and the true distribution using properties of score matching. We validate our findings by generating samples on the fashion MNIST dataset and instrumental audio data.
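For readers unfamiliar with random feature models, a minimal random Fourier feature regression in plain NumPy: the random weights are frozen and only a linear head is trained, which is what makes such models easy to analyze. The paper additionally ties the features to a diffusion/score-matching objective, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 400, 2, 256                           # samples, input dim, random features
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=n)

W = rng.normal(size=(d, m))                     # frozen random projection
b = rng.uniform(0, 2 * np.pi, size=m)
phi = lambda Z: np.sqrt(2.0 / m) * np.cos(Z @ W + b)   # random Fourier features

lam = 1e-3                                       # ridge-regularized linear head
theta = np.linalg.solve(phi(X).T @ phi(X) + lam * np.eye(m), phi(X).T @ y)
print("train MSE:", np.mean((phi(X) @ theta - y) ** 2))
```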

Why Do We Need Weight Decay in Modern Deep Learning?

  • paper_url: http://arxiv.org/abs/2310.04415
  • repo_url: https://github.com/tml-epfl/why-weight-decay
  • paper_authors: Maksym Andriushchenko, Francesco D’Angelo, Aditya Varre, Nicolas Flammarion
  • for: examines the role of weight decay, a technique used throughout modern deep learning, including large language model training, whose effect is still poorly understood.
  • methods: analyzes the training dynamics of overparameterized deep networks trained with SGD and of underparameterized large language models trained with nearly online SGD, with and without weight decay (a tiny norm-tracking snippet follows the abstract).
  • results: for overparameterized networks, weight decay modifies the optimization dynamics and enhances the ever-present implicit regularization of SGD via the loss stabilization mechanism; for LLMs, it balances the bias-variance tradeoff of stochastic optimization, lowering training loss, and it also prevents sudden loss divergences in bfloat16 mixed-precision training. Overall, weight decay acts by changing the training dynamics in a desirable way rather than as an explicit regularizer.
    Abstract Weight decay is a broadly used technique for training state-of-the-art deep networks, including large language models. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics enhancing the ever-present implicit regularization of SGD via the loss stabilization mechanism. In contrast, for underparameterized large language models trained with nearly online SGD, we describe how weight decay balances the bias-variance tradeoff in stochastic optimization leading to lower training loss. Moreover, we show that weight decay also prevents sudden loss divergences for bfloat16 mixed-precision training which is a crucial tool for LLM training. Overall, we present a unifying perspective from ResNets on vision tasks to LLMs: weight decay is never useful as an explicit regularizer but instead changes the training dynamics in a desirable way. Our code is available at https://github.com/tml-epfl/why-weight-decay.
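A tiny, self-contained illustration (not from the paper) of the lever under discussion: with SGD, weight decay keeps the parameter norm from growing during training, which is one way to see how it changes the optimization trajectory rather than simply adding a penalty that matters at convergence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)

for wd in (0.0, 1e-2):
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=wd)
    for _ in range(200):
        opt.zero_grad()
        loss = ((model(X) - y) ** 2).mean()
        loss.backward()
        opt.step()
    norm = sum(p.norm() ** 2 for p in model.parameters()).sqrt()
    print(f"weight_decay={wd}: final loss={loss.item():.3f}, parameter norm={norm.item():.3f}")
```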

Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

  • paper_url: http://arxiv.org/abs/2310.04411
  • repo_url: https://github.com/yueyang130/seem
  • paper_authors: Yang Yue, Rui Lu, Bingyi Kang, Shiji Song, Gao Huang
  • for: aims to explain why Q-value estimates diverge in offline RL, where the agent has no access to the real dynamics, and to provide an improved remedy.
  • methods: identifies self-excitation as the primary cause and proposes the Self-Excite Eigenvalue Measure (SEEM), based on the Neural Tangent Kernel (NTK), to track the evolving properties of the Q-network during training (an architectural sketch of the proposed fix follows the abstract).
  • results: SEEM reliably predicts at an early stage whether training will diverge, and even the growth order of the estimated Q-values, the model's norm, and the step at which an SGD-optimized run crashes; experiments match the theory. Building on this, LayerNorm is identified as an architectural fix that avoids divergence without introducing detrimental bias, keeps working in highly data-scarce settings where previous methods fail, and plugs into modern offline RL methods to reach SOTA results.
    Abstract The divergence of the Q-value estimation has been a prominent issue in offline RL, where the agent has no access to real dynamics. Traditional beliefs attribute this instability to querying out-of-distribution actions when bootstrapping value targets. Though this issue can be alleviated with policy constraints or conservative Q estimation, a theoretical understanding of the underlying mechanism causing the divergence has been absent. In this work, we aim to thoroughly comprehend this mechanism and attain an improved solution. We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL. Then, we propose a novel Self-Excite Eigenvalue Measure (SEEM) metric based on Neural Tangent Kernel (NTK) to measure the evolving property of Q-network at training, which provides an intriguing explanation of the emergence of divergence. For the first time, our theory can reliably decide whether the training will diverge at an early stage, and even predict the order of the growth for the estimated Q-value, the model's norm, and the crashing step when an SGD optimizer is used. The experiments demonstrate perfect alignment with this theoretic analysis. Building on our insights, we propose to resolve divergence from a novel perspective, namely improving the model's architecture for better extrapolating behavior. Through extensive empirical studies, we identify LayerNorm as a good solution to effectively avoid divergence without introducing detrimental bias, leading to superior performance. Experimental results prove that it can still work in some most challenging settings, i.e. using only 1 transitions of the dataset, where all previous methods fail. Moreover, it can be easily plugged into modern offline RL methods and achieve SOTA results on many challenging tasks. We also give unique insights into its effectiveness.
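A minimal sketch of the reported architectural fix, assuming a standard MLP critic with placeholder sizes: LayerNorm is inserted between the hidden layers of the Q-network.

```python
import torch.nn as nn

def make_q_network(obs_dim, act_dim, hidden=256, use_layernorm=True):
    """Sketch of an MLP critic with optional LayerNorm between hidden layers."""
    layers, in_dim = [], obs_dim + act_dim
    for _ in range(2):
        layers.append(nn.Linear(in_dim, hidden))
        if use_layernorm:
            layers.append(nn.LayerNorm(hidden))   # the divergence-preventing ingredient
        layers.append(nn.ReLU())
        in_dim = hidden
    layers.append(nn.Linear(in_dim, 1))           # scalar Q-value
    return nn.Sequential(*layers)

print(make_q_network(obs_dim=17, act_dim=6))
```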

On the Embedding Collapse when Scaling up Recommendation Models

  • paper_url: http://arxiv.org/abs/2310.04400
  • repo_url: None
  • paper_authors: Xingzhuo Guo, Junwei Pan, Ximei Wang, Baixu Chen, Jie Jiang, Mingsheng Long
  • for: investigates whether scaling up recommendation models with larger embedding layers actually improves performance.
  • methods: uses empirical and theoretical analysis to study the embedding-collapse phenomenon, where the embedding matrix of an enlarged model tends to occupy a low-dimensional subspace, and the two-sided effect of feature-interaction modules; proposes a simple yet effective multi-embedding design with embedding-set-specific interaction modules (a toy sketch follows the abstract).
  • results: extensive experiments show that the multi-embedding design provides consistent scalability across various recommendation models.
    Abstract Recent advances in deep foundation models have led to a promising trend of developing large recommendation models to leverage vast amounts of available data. However, we experiment to scale up existing recommendation models and observe that the enlarged models do not improve satisfactorily. In this context, we investigate the embedding layers of enlarged models and identify a phenomenon of embedding collapse, which ultimately hinders scalability, wherein the embedding matrix tends to reside in a low-dimensional subspace. Through empirical and theoretical analysis, we demonstrate that the feature interaction module specific to recommendation models has a two-sided effect. On the one hand, the interaction restricts embedding learning when interacting with collapsed embeddings, exacerbating the collapse issue. On the other hand, feature interaction is crucial in mitigating the fitting of spurious features, thereby improving scalability. Based on this analysis, we propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to capture diverse patterns and reduce collapse. Extensive experiments demonstrate that this proposed design provides consistent scalability for various recommendation models.
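An illustrative sketch of a multi-embedding layer with per-set interaction modules; the layer shapes and the simple linear "interaction" stand-in are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiEmbedding(nn.Module):
    """Several embedding sets per categorical feature, each with its own
    interaction module; outputs are concatenated for the downstream model."""
    def __init__(self, num_features, vocab_sizes, dim=16, num_sets=4):
        super().__init__()
        self.sets, self.interactions = nn.ModuleList(), nn.ModuleList()
        for _ in range(num_sets):
            self.sets.append(nn.ModuleList([nn.Embedding(v, dim) for v in vocab_sizes]))
            self.interactions.append(nn.Linear(num_features * dim, dim))  # stand-in interaction

    def forward(self, x):                         # x: [batch, num_features] of category ids
        outs = []
        for embs, inter in zip(self.sets, self.interactions):
            e = torch.cat([emb(x[:, i]) for i, emb in enumerate(embs)], dim=-1)
            outs.append(torch.relu(inter(e)))
        return torch.cat(outs, dim=-1)

model = MultiEmbedding(num_features=3, vocab_sizes=[100, 50, 20])
print(model(torch.randint(0, 20, (8, 3))).shape)  # torch.Size([8, 64])
```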

MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

  • paper_url: http://arxiv.org/abs/2310.04369
  • repo_url: None
  • paper_authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie
  • for: proposes MBTFNet, a multi-band temporal-frequency neural network for singing voice enhancement that removes background music, noise, and even backing vocals from singing recordings.
  • methods: combines inter- and intra-band modeling with dual-path modeling to better process full-band signals, plus an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation.
  • results: experiments show the proposed model significantly outperforms several state-of-the-art speech enhancement and music source separation models.
    Abstract A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which particularly removes background music, noise and even backing vocals from singing recordings. MBTFNet combines inter and intra-band modeling for better processing of full-band signals. Dual-path modeling are introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.

A Marketplace Price Anomaly Detection System at Scale

  • paper_url: http://arxiv.org/abs/2310.04367
  • repo_url: None
  • paper_authors: Akshit Sarpal, Qiwen Kang, Fangping Huang, Yang Song, Lijie Wan
  • for: addresses data quality issues caused by the large volume of seller-initiated price updates on an online marketplace, where inaccurate published prices hurt customer experience and revenue.
  • methods: proposes MoatPlus, a scalable price anomaly detection framework that builds a reliable upper price bound in a real-time pricing pipeline from proximity-based labeling, historical price trends, unsupervised statistical features, and an optimized weighting scheme over an ensemble of models (an illustrative toy bound follows the abstract).
  • results: improves precise anchor coverage by up to 46.6% in high-vulnerability item subsets.
    Abstract Online marketplaces execute large volume of price updates that are initiated by individual marketplace sellers each day on the platform. This price democratization comes with increasing challenges with data quality. Lack of centralized guardrails that are available for a traditional online retailer causes a higher likelihood for inaccurate prices to get published on the website, leading to poor customer experience and potential for revenue loss. We present MoatPlus (Masked Optimal Anchors using Trees, Proximity-based Labeling and Unsupervised Statistical-features), a scalable price anomaly detection framework for a growing marketplace platform. The goal is to leverage proximity and historical price trends from unsupervised statistical features to generate an upper price bound. We build an ensemble of models to detect irregularities in price-based features, exclude irregular features and use optimized weighting scheme to build a reliable price bound in real-time pricing pipeline. We observed that our approach improves precise anchor coverage by up to 46.6% in high-vulnerability item subsets
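A deliberately simple illustration of the upper-bound idea; the function name, quantile choice, and weighting here are hypothetical, whereas MoatPlus combines an ensemble of models with an optimized weighting scheme.

```python
import numpy as np

def price_upper_bound(candidate_price, similar_prices, history, q=0.95, w_hist=0.5):
    """Toy sketch: blend a quantile of similar items' prices with the item's own
    price history into an upper bound, and flag updates that exceed it."""
    bound = (1 - w_hist) * np.quantile(similar_prices, q) + w_hist * np.quantile(history, q)
    return candidate_price > bound, bound

is_anomaly, bound = price_upper_bound(
    499.0, similar_prices=[35, 42, 39, 41, 38], history=[40, 41, 39, 43])
print(is_anomaly, round(bound, 2))
```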

Exploiting Transformer Activation Sparsity with Dynamic Inference

  • paper_url: http://arxiv.org/abs/2310.04361
  • repo_url: None
  • paper_authors: Mikołaj Piórczyński, Filip Szatkowski, Klaudia Bałazy, Bartosz Wójcik
  • for: reducing the inference cost of Transformer models to make them more practical.
  • methods: proposes Dynamic Sparsified Transformer Inference (DSTI), which enforces activation sparsity and converts a dense model into a sparse Mixture of Experts (MoE) version; small gating networks are trained to predict each expert's relative contribution, and the number of executed experts is determined dynamically for each token (a simplified sketch follows the abstract).
  • results: applicable to any Transformer-based architecture with negligible impact on accuracy; reduces the inference cost of a BERT-base classification model by almost 60%.
    Abstract Transformer models, despite their impressive performance, often face practical limitations due to their high computational requirements. At the same time, previous studies have revealed significant activation sparsity in these models, indicating the presence of redundant computations. In this paper, we propose Dynamic Sparsified Transformer Inference (DSTI), a method that radically reduces the inference cost of Transformer models by enforcing activation sparsity and subsequently transforming a dense model into its sparse Mixture of Experts (MoE) version. We demonstrate that it is possible to train small gating networks that successfully predict the relative contribution of each expert during inference. Furthermore, we introduce a mechanism that dynamically determines the number of executed experts individually for each token. DSTI can be applied to any Transformer-based architecture and has negligible impact on the accuracy. For the BERT-base classification model, we reduce inference cost by almost 60%.
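A simplified sketch of the dense-FFN-to-MoE idea, not the paper's conversion recipe: the FFN is split into experts, a small gate scores them per token, and only the top-k are executed. DSTI additionally chooses the number of executed experts dynamically per token, whereas this sketch fixes k.

```python
import torch
import torch.nn as nn

class TokenwiseMoEFFN(nn.Module):
    """Toy token-wise mixture-of-experts feed-forward block."""
    def __init__(self, d_model=768, d_ff=3072, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        d_expert = d_ff // num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(), nn.Linear(d_expert, d_model))
            for _ in range(num_experts))
        self.gate = nn.Linear(d_model, num_experts)      # small gating network

    def forward(self, x):                                # x: [tokens, d_model]
        scores = self.gate(x).softmax(dim=-1)
        topv, topi = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                       # naive per-token dispatch, for clarity
            for v, i in zip(topv[t], topi[t]):
                out[t] += v * self.experts[int(i)](x[t])
        return out

print(TokenwiseMoEFFN()(torch.randn(4, 768)).shape)      # torch.Size([4, 768])
```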

Integrating Transformations in Probabilistic Circuits

  • paper_url: http://arxiv.org/abs/2310.04354
  • repo_url: None
  • paper_authors: Tom Schierenbeck, Vladimir Vutov, Thorsten Dickhaus, Michael Beetz
  • for: addresses a predictive limitation of probabilistic circuits and introduces transformations as a remedy, demonstrated in robotic scenarios.
  • methods: uses independent component analysis (ICA) to preserve the independence properties of probabilistic circuits, extending joint probability trees, which are model-free deterministic circuits.
  • results: achieves higher likelihoods with fewer parameters than joint probability trees on seven benchmark data sets and on real robot data; exact inference with transformed quantile-parameterized distributions is not tractable, but the approach allows efficient sampling and approximate inference.
    Abstract This study addresses the predictive limitation of probabilistic circuits and introduces transformations as a remedy to overcome it. We demonstrate this limitation in robotic scenarios. We motivate that independent component analysis is a sound tool to preserve the independence properties of probabilistic circuits. Our approach is an extension of joint probability trees, which are model-free deterministic circuits. By doing so, it is demonstrated that the proposed approach is able to achieve higher likelihoods while using fewer parameters compared to the joint probability trees on seven benchmark data sets as well as on real robot data. Furthermore, we discuss how to integrate transformations into tree-based learning routines. Finally, we argue that exact inference with transformed quantile parameterized distributions is not tractable. However, our approach allows for efficient sampling and approximate inference.

Fair Feature Importance Scores for Interpreting Tree-Based Methods and Surrogates

  • paper_url: http://arxiv.org/abs/2310.04352
  • repo_url: None
  • paper_authors: Camille Olivia Little, Debolina Halder Lina, Genevera I. Allen
  • for: aims to improve both the interpretability and the fairness of machine learning systems.
  • methods: proposes a fair feature importance score for trees, defined via the mean decrease (or increase) in group bias, to interpret how each feature contributes to fairness or bias in trees, tree-based ensembles, or tree-based surrogates of complex ML systems (a permutation-style analogue is sketched after the abstract).
  • results: simulations and real examples on benchmark fairness datasets show that the score offers valid interpretations for tree-based ensembles and tree-based surrogates of other ML systems.
    Abstract Across various sectors such as healthcare, criminal justice, national security, finance, and technology, large-scale machine learning (ML) and artificial intelligence (AI) systems are being deployed to make critical data-driven decisions. Many have asked if we can and should trust these ML systems to be making these decisions. Two critical components are prerequisites for trust in ML systems: interpretability, or the ability to understand why the ML system makes the decisions it does, and fairness, which ensures that ML systems do not exhibit bias against certain individuals or groups. Both interpretability and fairness are important and have separately received abundant attention in the ML literature, but so far, there have been very few methods developed to directly interpret models with regard to their fairness. In this paper, we focus on arguably the most popular type of ML interpretation: feature importance scores. Inspired by the use of decision trees in knowledge distillation, we propose to leverage trees as interpretable surrogates for complex black-box ML models. Specifically, we develop a novel fair feature importance score for trees that can be used to interpret how each feature contributes to fairness or bias in trees, tree-based ensembles, or tree-based surrogates of any complex ML system. Like the popular mean decrease in impurity for trees, our Fair Feature Importance Score is defined based on the mean decrease (or increase) in group bias. Through simulations as well as real examples on benchmark fairness datasets, we demonstrate that our Fair Feature Importance Score offers valid interpretations for both tree-based ensembles and tree-based surrogates of other ML systems.
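A permutation-style analogue, to make the notion of "mean decrease in group bias" concrete. The paper's score is computed from the tree's splits rather than by permutation, so treat this only as an illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fair_importance_permutation(model, X, group, n_repeats=10, seed=0):
    """Mean decrease in group bias (here, the gap in positive prediction rates
    between two groups) when each feature is shuffled. Illustrative analogue only."""
    rng = np.random.default_rng(seed)
    def bias(Xm):
        pred = model.predict(Xm)
        return abs(pred[group == 0].mean() - pred[group == 1].mean())
    base = bias(X)
    scores = []
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            deltas.append(base - bias(Xp))   # positive: feature contributes to group bias
        scores.append(np.mean(deltas))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
group = (X[:, 0] > 0).astype(int)                    # group membership tracks feature 0
y = ((X[:, 1] + 0.8 * group) > 0.5).astype(int)      # outcome depends on feature 1 and group
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(fair_importance_permutation(model, X, group).round(3))
```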

Learning to Grasp: from Somewhere to Anywhere

  • paper_url: http://arxiv.org/abs/2310.04349
  • repo_url: https://github.com/Johann-Huber/qd_grasp
  • paper_authors: François Hélénon, Johann Huber, Faïz Ben Amar, Stéphane Doncieux
  • for: robotic grasping remains a partially solved, multidisciplinary problem, especially for unconventional morphologies or highly actuated end-effectors; this work studies how learned grasps can be reused at new object poses.
  • methods: adapts Quality-Diversity (QD) generated reach-and-grasp trajectories to new object poses: an RGB-D vision pipeline detects the target object, predicts its 6-DOF pose, and tracks it, and the trajectory is re-projected relative to the object frame (a minimal frame-transform sketch follows the abstract).
  • results: hundreds of trajectories were deployed in the real world on several objects and robotic setups (a Franka Research 3 with a parallel gripper and a UR5 with a dexterous SIH Schunk hand); the transfer ratio obtained after transforming to the new object pose matches the one obtained when the object pose matches the simulation.
    Abstract Robotic grasping is still a partially solved, multidisciplinary problem where data-driven techniques play an increasing role. The sparse nature of rewards make the automatic generation of grasping datasets challenging, especially for unconventional morphologies or highly actuated end-effectors. Most approaches for obtaining large-scale datasets rely on numerous human-provided demonstrations or heavily engineered solutions that do not scale well. Recent advances in Quality-Diversity (QD) methods have investigated how to learn object grasping at a specific pose with different robot morphologies. The present work introduces a pipeline for adapting QD-generated trajectories to new object poses. Using an RGB-D data stream, the vision pipeline first detects the targeted object, predicts its 6-DOF pose, and finally tracks it. An automatically generated reach-and-grasp trajectory can then be adapted by projecting it relatively to the object frame. Hundreds of trajectories have been deployed into the real world on several objects and with different robotic setups: a Franka Research 3 with a parallel gripper and a UR5 with a dexterous SIH Schunk hand. The transfer ratio obtained when applying transformation to the object pose matches the one obtained when the object pose matches the simulation, demonstrating the efficiency of the proposed approach.
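A minimal sketch of the geometric core of reusing a grasp at a new pose: express each trajectory waypoint in the source object's frame, then map it back to the world frame at the object's new 6-DOF pose. Poses are 4x4 homogeneous matrices; the full pipeline also handles perception and tracking, which this sketch omits.

```python
import numpy as np

def retarget_trajectory(T_world_obj_src, T_world_obj_new, waypoints_world):
    """Re-express waypoints relative to the object and replay them at its new pose."""
    T_src_inv = np.linalg.inv(T_world_obj_src)
    retargeted = []
    for T_world_wp in waypoints_world:
        T_obj_wp = T_src_inv @ T_world_wp              # waypoint in the object frame
        retargeted.append(T_world_obj_new @ T_obj_wp)  # back to world at the new pose
    return retargeted

# toy usage: object translated by 0.3 m along x, gripper 10 cm above it
T_src = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.3
wp = np.eye(4); wp[2, 3] = 0.1
print(retarget_trajectory(T_src, T_new, [wp])[0][:3, 3])   # [0.3, 0.0, 0.1]
```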

Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design

  • paper_url: http://arxiv.org/abs/2310.04343
  • repo_url: https://github.com/jocelynsong/naepro
  • paper_authors: Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Yang Yang, Lei Li
  • for: proposes NAEPro, a model that jointly designs protein sequence and backbone structure based on automatically detected functional sites.
  • methods: NAEPro uses an interleaving network of attention and equivariant layers, capturing global correlations along the whole sequence and local influence from the nearest amino acids in 3D space.
  • results: on two protein datasets, $\beta$-lactamase and myoglobin, NAEPro achieves the highest amino acid recovery rate and TM-score and the lowest RMSD among all competitors; further analysis confirms it can generate highly effective proteins capable of binding their target metallocofactors.
    Abstract Proteins are macromolecules responsible for essential functions in almost all living organisms. Designing reasonable proteins with desired functions is crucial. A protein's sequence and structure are strongly correlated and they together determine its function. In this paper, we propose NAEPro, a model to jointly design Protein sequence and structure based on automatically detected functional sites. NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence and local influence from nearest amino acids in three dimensional (3D) space. Such an architecture facilitates effective yet economic message passing at two levels. We evaluate our model and several strong baselines on two protein datasets, $\beta$-lactamase and myoglobin. Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors. These findings prove the capability of our model to design protein sequences and structures that closely resemble their natural counterparts. Furthermore, in-depth analysis further confirms our model's ability to generate highly effective proteins capable of binding to their target metallocofactors. We provide code, data and models in Github.
    摘要 蛋白质是生物组织中重要的macromolecule,负责许多生物过程的核心功能。设计功能蛋白质的设计是非常重要的。蛋白质的序列和结构是强相関的,它们共同决定蛋白质的功能。在这篇文章中,我们提出了NAEPro,一个可以同时设计蛋白质序列和结构的模型,基于自动检测到的功能位点。NAEPro运用了跨维度对称层和注意力层,可以捕捉整个序列的全球相关和三维空间中最近的氨酸的本地影响。这种架构可以实现有效且经济的讯息传递。我们评估了我们的模型和几个强大的基eline在β-lactamase和myoglobin两个蛋白质数据集上。实验结果显示,我们的模型在所有竞争者中连续获得最高的氨酸重建率、TM-score和最低的RMSD。这些发现证明了我们的模型能够设计功能蛋白质序列和结构,与自然蛋白质序列和结构相似。此外,我们进行了深入分析,确认了我们的模型能够生成高效的蛋白质,可以与其标的金属复合物结合。我们在GitHub上提供了代码、数据和模型。

Applying Reinforcement Learning to Option Pricing and Hedging

  • paper_url: http://arxiv.org/abs/2310.04336
  • repo_url: None
  • paper_authors: Zoran Stoiljkovic
  • for: This paper provides an overview of recent advances in reinforcement learning for pricing and hedging financial instruments, with a focus on the Q-Learning Black Scholes approach.
  • methods: The paper uses a model-free and data-driven approach that bridges the traditional Black and Scholes model with novel artificial intelligence algorithms.
  • results: The algorithm is found to be an accurate estimator under different levels of volatility and hedging frequency, and exhibits robust performance across various levels of the option’s moneyness. Additionally, the algorithm incorporates proportional transaction costs, which have diverse impacts on profit and loss.
  • for: 这篇论文概述了强化学习在金融工具定价和对冲方面的最新进展,主要关注 Halperin(2017)提出的 Q-Learning Black-Scholes 方法。这种强化学习方法将传统的 Black-Scholes 模型与新的人工智能算法相结合,实现了完全无模型、数据驱动的期权定价和对冲。
  • methods: 本论文采用无模型、数据驱动的 Q 学习方法,将传统的 Black-Scholes 模型与新的人工智能算法相衔接。
  • results: 结果表明,该模型在不同的状态变量和场景下都是准确的估计器,并且在不同的波动率和对冲频率下表现稳定。此外,该方法在期权不同的价内外程度(moneyness)下也表现稳健。最后,该算法纳入了按比例的交易成本,这些成本对盈亏有着多样的影响。
    Abstract This thesis provides an overview of the recent advances in reinforcement learning in pricing and hedging financial instruments, with a primary focus on a detailed explanation of the Q-Learning Black Scholes approach, introduced by Halperin (2017). This reinforcement learning approach bridges the traditional Black and Scholes (1973) model with novel artificial intelligence algorithms, enabling option pricing and hedging in a completely model-free and data-driven way. This paper also explores the algorithm's performance under different state variables and scenarios for a European put option. The results reveal that the model is an accurate estimator under different levels of volatility and hedging frequency. Moreover, this method exhibits robust performance across various levels of option's moneyness. Lastly, the algorithm incorporates proportional transaction costs, indicating diverse impacts on profit and loss, affected by different statistical properties of the state variables.
    摘要 本论文概述了强化学习在金融工具定价和对冲方面的最新进展,重点详细解释 Halperin(2017)提出的 Q-Learning Black-Scholes 方法。这种强化学习方法结合了传统的 Black-Scholes(1973)模型和新型人工智能算法,实现了完全无模型、数据驱动的期权定价和对冲。本文还针对欧式看跌期权探讨了算法在不同状态变量和场景下的性能,结果表明该模型在不同的波动率和对冲频率下都是准确的估计器。此外,该方法在期权不同的价内外程度(moneyness)下也表现稳健。最后,该算法纳入了按比例的交易成本,其对盈亏的影响因状态变量的统计性质不同而各异。
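
For context, the classical model that the above approach is benchmarked against has a closed-form price for the European put. The sketch below evaluates the standard Black-Scholes put formula (no dividends); it is only the analytical baseline, not the thesis's Q-learning algorithm, and the parameter values in the example are illustrative.

```python
# Minimal sketch of the classical Black-Scholes (1973) European put price,
# the analytical benchmark the Q-learning approach is compared against.
from math import log, sqrt, exp
from statistics import NormalDist

def bs_put_price(S, K, T, r, sigma):
    """European put: S spot, K strike, T years to maturity, r risk-free rate, sigma volatility."""
    N = NormalDist().cdf
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * N(-d2) - S * N(-d1)

if __name__ == "__main__":
    # At-the-money put, one year to maturity, 20% volatility, 5% risk-free rate.
    print(round(bs_put_price(S=100, K=100, T=1.0, r=0.05, sigma=0.20), 4))
```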

Saliency-Guided Hidden Associative Replay for Continual Learning

  • paper_url: http://arxiv.org/abs/2310.04334
  • repo_url: https://github.com/baithebest/sharc
  • paper_authors: Guangji Bai, Qilong Zhao, Xiaoyang Jiang, Yifei Zhang, Liang Zhao
  • for: 这个研究旨在提出一个结合 associative memory 和 replay 的新方法,以解决 continual learning 中的灾难性遗忘问题。
  • methods: 本研究使用 sparse memory encoding 技术,将重要的数据段落存储在 associative memory 中,并使用 content-focused memory retrieval 机制,以快速和几乎完美地回传数据。
  • results: 实验结果显示,该方法可以有效缓解 continual learning 中的灾难性遗忘问题,并且在不同的 continual learning 任务中表现出色。
    Abstract Continual Learning is a burgeoning domain in next-generation AI, focusing on training neural networks over a sequence of tasks akin to human learning. While CL provides an edge over traditional supervised learning, its central challenge remains to counteract catastrophic forgetting and ensure the retention of prior tasks during subsequent learning. Amongst various strategies to tackle this, replay based methods have emerged as preeminent, echoing biological memory mechanisms. However, these methods are memory intensive, often preserving entire data samples, an approach inconsistent with humans selective memory retention of salient experiences. While some recent works have explored the storage of only significant portions of data in episodic memory, the inherent nature of partial data necessitates innovative retrieval mechanisms. Current solutions, like inpainting, approximate full data reconstruction from partial cues, a method that diverges from genuine human memory processes. Addressing these nuances, this paper presents the Saliency Guided Hidden Associative Replay for Continual Learning. This novel framework synergizes associative memory with replay-based strategies. SHARC primarily archives salient data segments via sparse memory encoding. Importantly, by harnessing associative memory paradigms, it introduces a content focused memory retrieval mechanism, promising swift and near-perfect recall, bringing CL a step closer to authentic human memory processes. Extensive experimental results demonstrate the effectiveness of our proposed method for various continual learning tasks.

Robust Losses for Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2310.04328
  • repo_url: None
  • paper_authors: Noah Schutte, Krzysztof Postek, Neil Yorke-Smith
  • for: 本研究旨在探讨用于做出精细决策的优化模型中的不确定参数,以及如何通过预测来训练这些参数。
  • methods: 本研究使用了预测后优化的方法,即使用预测模型来预测参数的不确定性,然后使用这些预测来训练优化模型。
  • results: 研究发现,使用 robust regret loss 可以更好地预测实际的 regret,并且可以降低测试样本上的 regret。
    Abstract Optimization models used to make discrete decisions often contain uncertain parameters that are context-dependent and are estimated through prediction. To account for the quality of the decision made based on the prediction, decision-focused learning (end-to-end predict-then-optimize) aims at training the predictive model to minimize regret, i.e., the loss incurred by making a suboptimal decision. Despite the challenge of this loss function being possibly non-convex and in general non-differentiable, effective gradient-based learning approaches have been proposed to minimize the expected loss, using the empirical loss as a surrogate. However, empirical regret can be an ineffective surrogate because the uncertainty in the optimization model makes the empirical regret unequal to the expected regret in expectation. To illustrate the impact of this inequality, we evaluate the effect of aleatoric and epistemic uncertainty on the accuracy of empirical regret as a surrogate. Next, we propose three robust loss functions that more closely approximate expected regret. Experimental results show that training two state-of-the-art decision-focused learning approaches using robust regret losses improves test-sample empirical regret in general while keeping computational time equivalent relative to the number of training epochs.
    摘要 优化模型常用于作出精细决策,它们的参数通常受到上下文依赖和预测的不确定性影响。为了考虑决策质量基于预测的问题,决策专注学习(终端预测然后优化) targets 在减少 regret 方面进行训练,即决策时的损失。然而,这个损失函数可能是非凸的,甚至不可导的,这使得有效的梯度学习方法成为了一个挑战。不过,使用 empirical 损失作为代理来减少预期损失的方法已经被提出。然而,empirical regret 可能不能准确地反映预期损失,因为优化模型中的不确定性会导致 empirical regret 与预期损失之间的差异。为了描述这种不同,我们评估了 aleatoric 和 epistemic uncertainty 对 empirical regret 的影响。接着,我们提出了三种robust regret loss,这些损失函数更好地预测预期损失。实验结果表明,使用这些 robust regret loss 训练两种现有的决策专注学习方法可以提高测试样本 empirical regret 的准确性,而不会增加计算时间相对于训练纪录数量。

  • paper_url: http://arxiv.org/abs/2310.04327
  • repo_url: None
  • paper_authors: Saqib Ameen, Levi H. S. Lelis
  • for: 解决程序合成任务中的搜索问题,使用成本函数引导搜索,以优化程序生成。
  • methods: 引入一种新的最佳优先自底向上搜索算法,称为“蜜蜂搜索”(Bee Search),可以在成本函数引导下按最佳优先顺序进行程序生成。该算法甚至不会在内存中创建比解程序更昂贵的程序,它通过在程序成本的抽象空间中搜索来实现按生成顺序的最佳优先。同时,我们还引入了一种新的成本函数,可以更好地利用已有成本模型提供的信息。
  • results: 实验结果表明,在使用更复杂的领域特定语言(DSL)时,蜜蜂搜索和新的成本函数比已有方法更高效;在使用较简单的 DSL 时,蜜蜂搜索与已有方法性能相当。此外,新的成本函数在字符串处理任务上优于先前的成本函数。
    Abstract Cost-guided bottom-up search (BUS) algorithms use a cost function to guide the search to solve program synthesis tasks. In this paper, we show that current state-of-the-art cost-guided BUS algorithms suffer from a common problem: they can lose useful information given by the model and fail to perform the search in a best-first order according to a cost function. We introduce a novel best-first bottom-up search algorithm, which we call Bee Search, that does not suffer information loss and is able to perform cost-guided bottom-up synthesis in a best-first manner. Importantly, Bee Search performs best-first search with respect to the generation of programs, i.e., it does not even create in memory programs that are more expensive than the solution program. It attains best-first ordering with respect to generation by performing a search in an abstract space of program costs. We also introduce a new cost function that better uses the information provided by an existing cost model. Empirical results on string manipulation and bit-vector tasks show that Bee Search can outperform existing cost-guided BUS approaches when employing more complex domain-specific languages (DSLs); Bee Search and previous approaches perform equally well with simpler DSLs. Furthermore, our new cost function with Bee Search outperforms previous cost functions on string manipulation tasks.
    摘要 成本引导的自底向上搜索(BUS)算法使用成本函数来引导搜索,以解决程序合成任务。在这篇论文中,我们展示了当前最先进的成本引导 BUS 算法存在一个共同问题:它们可能丢失模型提供的有用信息,并且无法按照成本函数以最佳优先顺序进行搜索。我们提出了一种新的最佳优先自底向上搜索算法,称为“蜜蜂搜索”(Bee Search)。蜜蜂搜索不会丢失信息,甚至不会在内存中创建比解程序更昂贵的程序,它通过在程序成本的抽象空间中搜索来实现按生成顺序的最佳优先。我们还引入了一个新的成本函数,能更好地利用已有成本模型提供的信息。在字符串处理和位向量任务上的实验结果表明,在使用更复杂的领域特定语言(DSL)时,蜜蜂搜索优于现有的成本引导 BUS 方法;在较简单的 DSL 上两者性能相当。此外,我们的新成本函数与蜜蜂搜索结合,在字符串处理任务上优于先前的成本函数。
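
To make the bottom-up setting concrete, the hedged sketch below enumerates programs over a toy DSL (the variable x, the constants 1 and 2, and the operators + and *) in best-first order of a simple size-based cost, pruning observationally-equivalent programs. It illustrates the generic cost-guided best-first BUS idea only; it is not Bee Search's search in the abstract space of program costs, and the DSL and cost function are assumptions.

```python
# Minimal sketch of cost-guided, best-first bottom-up enumeration over a toy DSL.
# The priority queue orders candidate programs by a simple size-based cost.
import heapq
import itertools

def best_first_bottom_up(inputs, target_outputs, max_expansions=20000):
    """Find an expression over 'x' and constants {1, 2} matching target_outputs."""
    counter = itertools.count()                  # tie-breaker for the heap
    seen = {}                                    # output signature -> (program, cost)
    frontier = []
    for expr in ("x", "1", "2"):
        heapq.heappush(frontier, (1, next(counter), expr))
    while frontier and max_expansions > 0:
        max_expansions -= 1
        cost, _, expr = heapq.heappop(frontier)
        outs = tuple(eval(expr, {}, {"x": v}) for v in inputs)
        if outs == tuple(target_outputs):
            return expr
        if outs in seen:                         # observational-equivalence pruning
            continue
        seen[outs] = (expr, cost)
        for _sig, (other, other_cost) in list(seen.items()):
            for op in ("+", "*"):
                new_expr = f"({expr} {op} {other})"
                heapq.heappush(frontier, (cost + other_cost + 1, next(counter), new_expr))
    return None

if __name__ == "__main__":
    # Target: f(x) = 2*x + 1 on a few input/output examples.
    print(best_first_bottom_up(inputs=[0, 1, 2, 3], target_outputs=[1, 3, 5, 7]))
```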

Latent Graph Inference with Limited Supervision

  • paper_url: http://arxiv.org/abs/2310.04314
  • repo_url: https://github.com/Jianglin954/LGI-LS
  • paper_authors: Jianglin Lu, Yi Xu, Huan Wang, Yue Bai, Yun Fu
  • for: 提高 latent graph inference(LGI)的性能,特别是在有限的监督下。
  • methods: 提出了一种方法来Restore the corrupted affinities和Recover the missed supervision,包括定义 pivot nodes 和使用CUR matrix decomposition。
  • results: 在多个 benchmark 上实现了提高 LGI 的性能,特别是在有限的监督下(6.12% 提高 Pubmed 上,只需要0.3% 的标注率)。
    Abstract Latent graph inference (LGI) aims to jointly learn the underlying graph structure and node representations from data features. However, existing LGI methods commonly suffer from the issue of supervision starvation, where massive edge weights are learned without semantic supervision and do not contribute to the training loss. Consequently, these supervision-starved weights, which may determine the predictions of testing samples, cannot be semantically optimal, resulting in poor generalization. In this paper, we observe that this issue is actually caused by the graph sparsification operation, which severely destroys the important connections established between pivotal nodes and labeled ones. To address this, we propose to restore the corrupted affinities and replenish the missed supervision for better LGI. The key challenge then lies in identifying the critical nodes and recovering the corrupted affinities. We begin by defining the pivotal nodes as $k$-hop starved nodes, which can be identified based on a given adjacency matrix. Considering the high computational burden, we further present a more efficient alternative inspired by CUR matrix decomposition. Subsequently, we eliminate the starved nodes by reconstructing the destroyed connections. Extensive experiments on representative benchmarks demonstrate that reducing the starved nodes consistently improves the performance of state-of-the-art LGI methods, especially under extremely limited supervision (6.12% improvement on Pubmed with a labeling rate of only 0.3%).

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

  • paper_url: http://arxiv.org/abs/2310.04292
  • repo_url: https://github.com/datamol-io/graphium
  • paper_authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris, Ioannis Koutis, Mirco Ravanelli, Guy Wolf, Prudencio Tossou, Hadrien Mary, Therence Bois, Andrew Fitzgibbon, Błażej Banaszewski, Chad Martin, Dominic Masters
  • for: 这篇论文的目的是提供大规模、带标签的分子数据集,以促进分子机器学习领域基础模型的发展。
  • methods: 论文提出了7个新的数据集,按规模分为ToyMix、LargeMix和UltraLarge三个类别。这些数据集在监督标签的规模和多样性上都有突破,涵盖近1亿个分子和3000多个稀疏定义的任务,总计超过130亿个标签,其中既包括量子性质标签也包括生物性质标签。
  • results: 论文提供了一系列基线结果,作为在这些数据集上进行多任务、多层级分子机器学习模型训练的起点。实验表明,在大量量子数据上进行训练也能提升低资源生物数据集上的表现,这表明对基础模型进行多任务、多层级训练并将其微调到资源受限的下游任务具有潜力。
    Abstract Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets show improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.
    摘要 近期,预训练基础模型推动了多个领域的显著进步。然而,在分子机器学习中,数据集通常经过人工筛选,因而规模较小;带标签特征的数据集以及管理这些数据集的代码库的缺乏,阻碍了基础模型的发展。在这项工作中,我们提出了七个新的数据集,按规模分为三个类别:ToyMix、LargeMix和UltraLarge。这些数据集在监督标签的规模和多样性上都突破了分子学习的既有边界,涵盖近 1 亿个分子和超过 3000 个稀疏定义的任务,总计超过 130 亿个标签,其中既包括量子性质标签也包括生物性质标签。相比之下,我们的数据集所含数据点是广泛使用的 OGB-LSC PCQM4Mv2 数据集的 300 倍,是仅含量子标签的 QM1B 数据集的 13 倍。此外,为支持基于所提数据集的基础模型开发,我们提供了 Graphium 图机器学习库,它简化了在多任务、多层级分子数据集上构建和训练分子机器学习模型的流程。最后,我们提供了一系列基线结果,作为在这些数据集上进行多任务、多层级训练的起点。我们观察到,同时在大量量子数据上训练也能提升低资源生物数据集上的表现。这表明对基础模型进行多任务、多层级训练并将其微调到资源受限的下游任务具有潜力。

On the Error-Propagation of Inexact Deflation for Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2310.04283
  • repo_url: None
  • paper_authors: Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis
  • for: 本研究旨在解决数据分析中常用的主成分分析(PCA)问题,尤其是高维数据的情况下。
  • methods: 本研究使用了逐次扫描法(deflation method)来找出主成分。
  • results: 本研究提供了两个主要结果:一是当寻找主特征向量的子程序是通用的情形,二是当使用幂迭代法(power iteration)作为子程序的情形;在后者中可以获得更紧的误差界限。这两个结果都是对PCA中不精确紧缩法误差传播的数学刻画。
    Abstract Principal Component Analysis (PCA) is a popular tool in data analysis, especially when the data is high-dimensional. PCA aims to find subspaces, spanned by the so-called \textit{principal components}, that best explain the variance in the dataset. The deflation method is a popular meta-algorithm -- used to discover such subspaces -- that sequentially finds individual principal components, starting from the most important one and working its way towards the less important ones. However, due to its sequential nature, the numerical error introduced by not estimating principal components exactly -- e.g., due to numerical approximations through this process -- propagates, as deflation proceeds. To the best of our knowledge, this is the first work that mathematically characterizes the error propagation of the inexact deflation method, and this is the key contribution of this paper. We provide two main results: $i)$ when the sub-routine for finding the leading eigenvector is generic, and $ii)$ when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than the analysis of the sub-routine agnostic case. As an outcome, we provide explicit characterization on how the error progresses and affects subsequent principal component estimations for this fundamental problem.
    摘要 主成分分析(PCA)是数据分析中广泛使用的工具,特别是当数据维度很高时。PCA的目标是找到由所谓“主成分”张成的、能最好解释数据集方差的子空间。紧缩法(deflation method)是一种常用的元算法,用于逐个找到这些主成分:从最重要的主成分开始,依次处理较不重要的主成分。然而,由于其顺序性,主成分未被精确估计(例如由于数值近似)所引入的误差会随着紧缩的进行而不断传播。据我们所知,这是首个对不精确紧缩法的误差传播进行数学刻画的工作,这也是本文的关键贡献。我们提供两个主要结果:i) 当寻找主特征向量的子程序是通用的情形;ii) 当使用幂迭代法作为子程序的情形。在后者中,幂迭代法提供的额外方向信息使我们能够获得比与子程序无关的分析更紧的误差界。由此,我们对误差如何发展并影响后续主成分估计给出了明确的刻画。
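
The deflation procedure itself is short; the sketch below runs it with power iteration as the subroutine on synthetic data, so that the inexactness of early components (controlled here by the number of power-iteration steps) visibly carries into later ones. This is only an illustration of the setting the paper analyzes, not the paper's error bounds; the data and iteration counts are arbitrary.

```python
# Minimal sketch of deflation for PCA with power iteration as the subroutine.
# Because each eigenvector is only computed approximately, the error is carried
# into the deflated matrix used for later components.
import numpy as np

def power_iteration(A, num_iters=50, rng=None):
    """Approximate leading eigenvector of a symmetric PSD matrix A."""
    rng = np.random.default_rng(rng)
    v = rng.normal(size=A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v

def pca_by_deflation(X, k, num_iters=50, rng=0):
    """Top-k principal directions of centered data X via sequential deflation."""
    Xc = X - X.mean(axis=0)
    A = Xc.T @ Xc / (len(Xc) - 1)               # sample covariance
    components = []
    for _ in range(k):
        v = power_iteration(A, num_iters=num_iters, rng=rng)
        components.append(v)
        lam = v @ A @ v
        A = A - lam * np.outer(v, v)            # deflate the (approximate) component
    return np.array(components)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1])
    exact = np.linalg.eigh(np.cov(X.T))[1][:, ::-1][:, :3].T   # reference PCs
    for iters in (3, 100):                      # fewer iterations -> larger propagated error
        approx = pca_by_deflation(X, k=3, num_iters=iters)
        align = np.abs(np.sum(approx * exact, axis=1))         # |cos angle| per component
        print(f"iters={iters}, alignment with exact PCs: {np.round(align, 4)}")
```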

Deep learning modelling of tip clearance variations on multi-stage axial compressors aerodynamics

  • paper_url: http://arxiv.org/abs/2310.04264
  • repo_url: None
  • paper_authors: Giuseppe Bruni, Sepehr Maleki, Senthil K. Krishnababu
  • for: 这个论文是为了应用深度学习方法于物理模拟(CFD),以提高液压机的性能和生产效率。
  • methods: 这个论文开发并应用了一个深度学习框架,用于实时预测多级轴流压气机中叶尖间隙(tip clearance)变化对流场和气动性能的影响。
  • results: 该框架被证明可扩展到工业应用,并能在实时达到与 CFD 基准相当的精度。部署的模型可以直接集成到燃气轮机的制造与装配流程中,以便分析对性能的影响,并有望减少昂贵物理测试的需求。
    Abstract Application of deep learning methods to physical simulations such as CFD (Computational Fluid Dynamics) for turbomachinery applications, have been so far of limited industrial relevance. This paper demonstrates the development and application of a deep learning framework for real-time predictions of the impact of tip clearance variations on the flow field and aerodynamic performance of multi-stage axial compressors in gas turbines. The proposed architecture is proven to be scalable to industrial applications, and achieves in real-time accuracy comparable to the CFD benchmark. The deployed model, is readily integrated within the manufacturing and build process of gas turbines, thus providing the opportunity to analytically assess the impact on performance and potentially reduce requirements for expensive physical tests.
    摘要 将深度学习方法应用于物理模拟(如计算流体动力学,CFD)以服务叶轮机械,目前在工业上的实际应用仍然有限。这篇文章描述了一个深度学习框架的开发和应用,用于实时预测多级轴流压气机中叶尖间隙变化对流场和气动性能的影响。所提出的架构被证明可扩展到工业应用,并能在实时达到与 CFD 基准相当的精度。部署的模型可以直接集成到燃气轮机的制造与装配流程中,从而能够以分析方式评估对性能的影响,并有望减少对昂贵物理测试的需求。

Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms

  • paper_url: http://arxiv.org/abs/2310.04238
  • repo_url: None
  • paper_authors: Dennis Klau, Marc Zöller, Christian Tutschku
  • for: 这个研究旨在选择和分析现有的AutoML框架,以便 incorporating Quantum Machine Learning(QML)算法到自动解决方案中,并解决不同类型的ML问题的一组工业使用情况。
  • methods: 该研究使用了多phase、多 criterion 方法来筛选可用的开源工具,并从技术和AutoML角度进行评估框架。
  • results: 研究选择了Ray和AutoGluon作为适用的低级和高级框架,并基于这些发现建立了一个扩展的自动化量子机器学习(AutoQML)框架,并在特定硬件和软件约束下实现了QC特有的管道步骤和决策特征。
    Abstract This work describes the selection approach and analysis of existing AutoML frameworks regarding their capability of a) incorporating Quantum Machine Learning (QML) algorithms into this automated solving approach of the AutoML framing and b) solving a set of industrial use-cases with different ML problem types by benchmarking their most important characteristics. For that, available open-source tools are condensed into a market overview and suitable frameworks are systematically selected on a multi-phase, multi-criteria approach. This is done by considering software selection approaches, as well as in terms of the technical perspective of AutoML. The requirements for the framework selection are divided into hard and soft criteria regarding their software and ML attributes. Additionally, a classification of AutoML frameworks is made into high- and low-level types, inspired by the findings of. Finally, we select Ray and AutoGluon as the suitable low- and high-level frameworks respectively, as they fulfil all requirements sufficiently and received the best evaluation feedback during the use-case study. Based on those findings, we build an extended Automated Quantum Machine Learning (AutoQML) framework with QC-specific pipeline steps and decision characteristics for hardware and software constraints.
    摘要 这个研究描述了对现有自动机器学习(AutoML)框架的选择和分析,考察它们能否将量子机器学习(QML)算法纳入自动求解流程,以及能否通过对关键特性的基准测试来解决涵盖不同 ML 问题类型的一组工业用例。为此,我们将可用的开源工具整理成市场概览,并基于多阶段、多准则的方法系统地选择合适的框架。这些准则既考虑软件选择方法,也考虑 AutoML 的技术视角。框架选择的要求按其软件属性和 ML 属性分为硬性准则和软性准则。此外,我们还将 AutoML 框架分为高层和低层两类。最后,我们选择了 Ray 和 AutoGluon 分别作为最合适的低层和高层框架,因为它们充分满足所有要求,并在用例研究中获得了最好的评估反馈。基于这些发现,我们构建了一个扩展的自动量子机器学习(AutoQML)框架,其中包含针对量子计算的管道步骤和决策特征,以应对硬件和软件约束。

Cost-Effective Retraining of Machine Learning Models

  • paper_url: http://arxiv.org/abs/2310.04216
  • repo_url: None
  • paper_authors: Ananth Mahadevan, Michael Mathioudakis
  • for: 这 paper 的目的是提出一种自动化和经济的机器学习模型重新训练决策算法,以优化数据变化和模型衰退的交互关系。
  • methods: 该 paper 使用了一种基于成本考虑的 Cost-Aware Retraining Algorithm (Cara),通过考虑不同的数据和模型因素,自动决定是否需要重新训练机器学习模型。
  • results: 该论文通过对合成数据集和真实数据集进行分析和实验,证明了 Cara 可以适应不同的数据漂移和重新训练成本,同时保持与最优回顾算法相近的性能。
    Abstract It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time. However, this can be costly as it usually requires processing the entire dataset again. This creates a trade-off between retraining too frequently, which leads to unnecessary computing costs, and not retraining often enough, which results in stale and inaccurate ML models. To address this challenge, we propose ML systems that make automated and cost-effective decisions about when to retrain an ML model. We aim to optimize the trade-off by considering the costs associated with each decision. Our research focuses on determining whether to retrain or keep an existing ML model based on various factors, including the data, the model, and the predictive queries answered by the model. Our main contribution is a Cost-Aware Retraining Algorithm called Cara, which optimizes the trade-off over streams of data and queries. To evaluate the performance of Cara, we analyzed synthetic datasets and demonstrated that Cara can adapt to different data drifts and retraining costs while performing similarly to an optimal retrospective algorithm. We also conducted experiments with real-world datasets and showed that Cara achieves better accuracy than drift detection baselines while making fewer retraining decisions, ultimately resulting in lower total costs.
    摘要 随着数据随时间变化,需要重新训练机器学习(ML)模型以维持其性能。然而,这通常需要重新处理整个数据集,因而代价高昂。这带来一个权衡:重新训练过于频繁会产生不必要的计算开销,而重新训练不够及时则会导致模型陈旧、预测不准。为此,我们提出了能够自动、经济地决定何时重新训练 ML 模型的系统,并通过考虑每个决策的相关成本来优化这一权衡。我们的研究聚焦于根据数据、模型以及模型所回答的预测查询等因素,决定是重新训练还是沿用现有模型。我们的主要贡献是名为 Cara 的成本感知重新训练算法(Cost-Aware Retraining Algorithm),它在数据流和查询流上优化这一权衡。为评估 Cara 的性能,我们分析了合成数据集,结果表明 Cara 能够适应不同的数据漂移和重新训练成本,其表现与最优的回顾式算法相近。我们还在真实数据集上进行了实验,结果表明 Cara 在比漂移检测基线做出更少重新训练决策的同时取得了更高的准确率,最终带来更低的总成本。
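
As a concrete, hedged illustration of the trade-off (not of the Cara algorithm itself), the sketch below monitors a drifting data stream and retrains only when the accumulated cost of the stale model's errors exceeds a fixed retraining cost. The drift process, cost values, and threshold rule are all illustrative assumptions.

```python
# Minimal sketch of a cost-aware retraining decision over a data stream:
# retrain only when the accumulated misclassification cost of the stale model
# exceeds a fixed retraining cost. Illustrative only; not the Cara algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stream_with_drift(n_batches=30, batch_size=200, seed=0):
    rng = np.random.default_rng(seed)
    for t in range(n_batches):
        X = rng.normal(size=(batch_size, 2))
        w = np.array([1.0, 0.1 * t])             # decision boundary drifts over time
        y = (X @ w + rng.normal(scale=0.3, size=batch_size) > 0).astype(int)
        yield X, y

def run(cost_per_error=1.0, retrain_cost=60.0):
    model, X_hist, y_hist = None, [], []
    accumulated_staleness = 0.0
    for t, (X, y) in enumerate(stream_with_drift()):
        if model is None:
            X_hist.append(X); y_hist.append(y)
            model = LogisticRegression().fit(X, y)
            continue
        errors = (model.predict(X) != y).sum()
        accumulated_staleness += cost_per_error * errors
        X_hist.append(X); y_hist.append(y)
        if accumulated_staleness > retrain_cost:  # keep-vs-retrain decision
            model = LogisticRegression().fit(np.vstack(X_hist), np.hstack(y_hist))
            print(f"batch {t}: retrained (staleness cost {accumulated_staleness:.0f})")
            accumulated_staleness = 0.0
    return model

if __name__ == "__main__":
    run()
```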

Non-Redundant Graph Neural Networks with Improved Expressiveness

  • paper_url: http://arxiv.org/abs/2310.04190
  • repo_url: None
  • paper_authors: Franka Bause, Samir Moustafa, Johannes Langguth, Wilfried N. Gansterer, Nils M. Kriege
  • for: 这篇论文旨在为 Message Passing Graph Neural Networks (MPGNNs) 提出一种新的聚合方法,以提高表达能力并缓解过度压缩(oversquashing)。
  • methods: 该方法基于 neighborhood trees 设计了新的聚合方案,通过剪枝控制冗余,从而提高表达能力并缓解过度压缩。
  • results: 实验表明,该方法提高了表达能力、缓解了过度压缩,并在广泛使用的基准数据集上取得了高分类准确率。
    Abstract Message passing graph neural networks iteratively compute node embeddings by aggregating messages from all neighbors. This procedure can be viewed as a neural variant of the Weisfeiler-Leman method, which limits their expressive power. Moreover, oversmoothing and oversquashing restrict the number of layers these networks can effectively utilize. The repeated exchange and encoding of identical information in message passing amplifies oversquashing. We propose a novel aggregation scheme based on neighborhood trees, which allows for controlling the redundancy by pruning branches of the unfolding trees underlying standard message passing. We prove that reducing redundancy improves expressivity and experimentally show that it alleviates oversquashing. We investigate the interaction between redundancy in message passing and redundancy in computation and propose a compact representation of neighborhood trees, from which we compute node and graph embeddings via a neural tree canonization technique. Our method is provably more expressive than the Weisfeiler-Leman method, less susceptible to oversquashing than message passing neural networks, and provides high classification accuracy on widely-used benchmark datasets.
    摘要 消息传递图神经网络通过汇聚来自所有邻居的消息来迭代计算节点嵌入。这一过程可以看作 Weisfeiler-Leman 方法的神经网络变体,这限制了它们的表达能力。此外,过平滑(oversmoothing)和过度压缩(oversquashing)限制了这类网络能够有效利用的层数,而消息传递中对相同信息的反复交换和编码会加剧过度压缩。我们提出了一种基于邻域树的新聚合方案,通过对标准消息传递所对应的展开树进行剪枝来控制冗余。我们证明了减少冗余可以提高表达能力,并通过实验表明它能缓解过度压缩。我们还研究了消息传递中的冗余与计算中的冗余之间的相互作用,并提出了邻域树的紧凑表示,基于它通过神经树规范化技术计算节点和图嵌入。我们的方法被证明比 Weisfeiler-Leman 方法更具表达能力,比消息传递神经网络更不易受过度压缩影响,并在广泛使用的基准数据集上取得了高分类准确率。

Amortized Network Intervention to Steer the Excitatory Point Processes

  • paper_url: http://arxiv.org/abs/2310.04159
  • repo_url: None
  • paper_authors: Zitao Song, Wendi Ren, Shuang Li
  • for: 这个研究旨在解决大规模网络干预问题,专门是指引刺激点过程,如传染病毒或交通壅塞控制。
  • methods: 我们采用基于模型的强化学习方法,使用神经常微分方程(Neural ODEs)来刻画网络化刺激点过程在网络拓扑随时间变化下的演化。我们的方法结合了基于梯度下降的模型预测控制(GD-MPC),以提供策略上的灵活性,从而纳入先验知识与约束。
  • results: 我们的方法可以实现网络上刺激点过程的有效控制,并且可以在实际应用中运用,例如减少传染病毒的传播和减少碳排放。
    Abstract We tackle the challenge of large-scale network intervention for guiding excitatory point processes, such as infectious disease spread or traffic congestion control. Our model-based reinforcement learning utilizes neural ODEs to capture how the networked excitatory point processes will evolve subject to the time-varying changes in network topology. Our approach incorporates Gradient-Descent based Model Predictive Control (GD-MPC), offering policy flexibility to accommodate prior knowledge and constraints. To address the intricacies of planning and overcome the high dimensionality inherent to such decision-making problems, we design an Amortize Network Interventions (ANI) framework, allowing for the pooling of optimal policies from history and other contexts, while ensuring a permutation equivalent property. This property enables efficient knowledge transfer and sharing across diverse contexts. Our approach has broad applications, from curbing infectious disease spread to reducing carbon emissions through traffic light optimization, and thus has the potential to address critical societal and environmental challenges.
    摘要 我们致力于解决大规模网络干预问题,以引导诸如传染病传播或交通拥堵控制等刺激点过程。我们基于模型的强化学习方法利用神经常微分方程(Neural ODEs)来刻画网络化刺激点过程在网络拓扑随时间变化下的演化。我们的方法结合了基于梯度下降的模型预测控制(GD-MPC),提供策略灵活性以纳入先验知识与约束。为了应对规划的复杂性并克服此类决策问题固有的高维性,我们设计了 Amortized Network Interventions(ANI)框架,它允许汇集来自历史和其他情境的最优策略,同时保证一种置换等价性质。这一性质使得知识能够在不同情境之间高效迁移与共享。我们的方法应用广泛,从遏制传染病传播到通过交通信号灯优化减少碳排放,因而有潜力应对关键的社会与环境挑战。

From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying

  • paper_url: http://arxiv.org/abs/2310.04145
  • repo_url: None
  • paper_authors: Biao Wu, Qiang Huang, Anthony K. H. Tung
  • for: 本研究旨在保护数据的知识产权,尤其是在机器学习应用萌芽的时候,数据训练过程中的数据泄露问题日益突出。
  • methods: 本文提出了一种新的方法——本地分布变化合成(\textsc{LDSS}),用于检测模型训练过程中的数据泄露。\textsc{LDSS}通过在所有者的数据集中插入一小量的合成数据,使得模型训练过程中的数据泄露得到了有效的检测。
  • results: 大量实验表明,LDSS 具有可靠性、鲁棒性、保真性、安全性和高效性。在七种不同的分类模型和五个真实数据集上,LDSS 都取得了优秀的结果。
    Abstract Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, fewer studies have been developed to detect whether they are already leaked for model training without authorization. This issue is particularly challenging due to the absence of information and control over the training process conducted by potential attackers. In this paper, we concentrate on the domain of tabular data and introduce a novel methodology, Local Distribution Shifting Synthesis (\textsc{LDSS}), to detect leaked data that are used to train classification models. The core concept behind \textsc{LDSS} involves injecting a small volume of synthetic data--characterized by local shifts in class distribution--into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone, as the synthetic data injection results in a pronounced disparity in the predictions of models trained on leaked and modified datasets. \textsc{LDSS} is \emph{model-oblivious} and hence compatible with a diverse range of classification models, such as Naive Bayes, Decision Tree, and Random Forest. We have conducted extensive experiments on seven types of classification models across five real-world datasets. The comprehensive results affirm the reliability, robustness, fidelity, security, and efficiency of \textsc{LDSS}.
    摘要 随着机器学习应用的不断普及,保护数据的知识产权(IP)变得至关重要,因为机器学习模型的成功在很大程度上取决于训练数据的质量。虽然已有各种机制在存储、传输和使用过程中保护数据,但用于检测数据是否已在未经授权的情况下被用于模型训练的研究较少。这个问题尤其困难,因为数据所有者既不了解也无法控制潜在攻击者的训练过程。在这篇论文中,我们聚焦表格数据领域,提出了一种新的方法——本地分布偏移合成(LDSS),用于检测被泄露并用于训练分类模型的数据。LDSS 的核心思想是向数据所有者的数据集中注入少量合成数据,这些数据在局部呈现出类别分布的偏移。由于合成数据的注入会使在泄露(修改后)数据集上训练的模型与在原始数据上训练的模型在预测上出现明显差异,因此仅通过模型查询即可有效识别使用泄露数据训练的模型。LDSS 与具体模型无关,可以与多种分类模型(如朴素贝叶斯、决策树和随机森林)配合使用。我们在五个真实数据集上对七种分类模型进行了广泛实验,结果证明了 LDSS 的可靠性、鲁棒性、保真性、安全性和高效性。
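
The sketch below illustrates the general inject-then-query idea in a toy setting: plant a few synthetic rows whose labels are flipped relative to their neighbourhood, then check whether a suspect model reproduces the planted labels. It is not the LDSS construction from the paper; the data, the number of injected rows, and the flipping rule are illustrative assumptions.

```python
# Minimal sketch of detecting leaked training data via synthetic injection and
# model querying. Illustrative only; not the paper's LDSS construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Owner's original table: two informative features, binary label.
X = rng.normal(size=(3000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Plant 30 synthetic rows near existing points but with flipped labels.
idx = rng.choice(len(X), size=30, replace=False)
X_syn = X[idx] + rng.normal(scale=0.05, size=(30, 2))
y_syn = 1 - y[idx]
X_leaked = np.vstack([X, X_syn])
y_leaked = np.concatenate([y, y_syn])

# Suspect A trained on the leaked table; suspect B trained on independent data.
model_leak = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_leaked, y_leaked)
X_other = rng.normal(size=(3000, 2))
y_other = (X_other[:, 0] + X_other[:, 1] > 0).astype(int)
model_clean = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_other, y_other)

# Verification by model querying alone: agreement with the planted labels.
for name, m in [("trained on leaked data", model_leak), ("trained on clean data", model_clean)]:
    agreement = (m.predict(X_syn) == y_syn).mean()
    print(f"{name}: agreement with planted labels = {agreement:.2f}")
```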

Routing Arena: A Benchmark Suite for Neural Routing Solvers

  • paper_url: http://arxiv.org/abs/2310.04140
  • repo_url: None
  • paper_authors: Daniela Thyssens, Tim Dernedde, Jonas K. Falkner, Lars Schmidt-Thieme
  • for: 该论文主要研究目标是提出一个基于Machine Learning的路径选择策略评估 benchmark suite,以便更好地评估不同方法的性能和比较不同领域中的基eline。
  • methods: 该论文提出了一种新的评估协议,该协议包括两个主要的评估情况:一是预先固定的时间预算,二是任意时间的性能评估。此外,该论文还提出了一种新的评估指标——Weighted Relative Average Performance(WRAP),用于衡量不同方法的运行时效率。
  • results: 该论文的初步实验结果表明,最新的操作研究方法在解决交通问题上获得了最佳解的解决方案和运行时效率的最佳性能。然而,一些发现还提出了使用神经网络方法的优势,并促使我们对神经网络方法的概念如何进行重新定义。
    Abstract Neural Combinatorial Optimization has been researched actively in the last eight years. Even though many of the proposed Machine Learning based approaches are compared on the same datasets, the evaluation protocol exhibits essential flaws and the selection of baselines often neglects State-of-the-Art Operations Research approaches. To improve on both of these shortcomings, we propose the Routing Arena, a benchmark suite for Routing Problems that provides a seamless integration of consistent evaluation and the provision of baselines and benchmarks prevalent in the Machine Learning- and Operations Research field. The proposed evaluation protocol considers the two most important evaluation cases for different applications: First, the solution quality for an a priori fixed time budget and secondly the anytime performance of the respective methods. By setting the solution trajectory in perspective to a Best Known Solution and a Base Solver's solutions trajectory, we furthermore propose the Weighted Relative Average Performance (WRAP), a novel evaluation metric that quantifies the often claimed runtime efficiency of Neural Routing Solvers. A comprehensive first experimental evaluation demonstrates that the most recent Operations Research solvers generate state-of-the-art results in terms of solution quality and runtime efficiency when it comes to the vehicle routing problem. Nevertheless, some findings highlight the advantages of neural approaches and motivate a shift in how neural solvers should be conceptualized.
    摘要 神经组合优化在过去八年间得到了积极研究。尽管许多被提出的基于机器学习的方法是在相同的数据集上进行比较的,但其评价协议存在重要缺陷,而且基线选择往往忽略了最先进的运筹学方法。为改进这两点,我们提出了 Routing Arena,一个面向路由问题的基准套件,它提供了一致的评价流程,并整合了机器学习与运筹学领域常见的基线和基准。所提出的评价协议考虑了不同应用中最重要的两种评价情形:一是在预先固定时间预算下的解质量,二是各方法的任意时间(anytime)性能。我们还提出了一个新的评价指标——加权相对平均性能(Weighted Relative Average Performance,WRAP),用于量化神经路由求解器常被宣称的运行时效率。全面的首次实验评估表明,在车辆路径问题上,最新的运筹学求解器在解质量和运行时效率方面都取得了最先进的结果。不过,一些发现也凸显了神经方法的优势,并促使我们重新思考应如何构建神经求解器。

  • paper_url: http://arxiv.org/abs/2310.04078
  • repo_url: https://github.com/wxr99/holisticpu
  • paper_authors: Xinrui Wang, Wenhai Wan, Chuanxin Geng, Shaoyuan LI, Songcan Chen
  • for: 本研究旨在提出一种基于带有正例和无标例的PUL方法,以优化二分类模型的训练。
  • methods: 该方法利用了一种启发式的做法,即在每次训练中采样正例数据,以确保正例和无标例之间的分布尽可平衡。此外,该方法还使用了一种新的时间点 процесс(TPP)模型,来识别正例和无标例之间的变化趋势。
  • results: 实验表明,该方法在具有高差异度的实际应用场景中表现出色,相比传统PUL方法,该方法可以提高$11.3%$的关键指标。
    Abstract Learning binary classifiers from positive and unlabeled data (PUL) is vital in many real-world applications, especially when verifying negative examples is difficult. Despite the impressive empirical performance of recent PUL methods, challenges like accumulated errors and increased estimation bias persist due to the absence of negative labels. In this paper, we unveil an intriguing yet long-overlooked observation in PUL: \textit{resampling the positive data in each training iteration to ensure a balanced distribution between positive and unlabeled examples results in strong early-stage performance. Furthermore, predictive trends for positive and negative classes display distinctly different patterns.} Specifically, the scores (output probability) of unlabeled negative examples consistently decrease, while those of unlabeled positive examples show largely chaotic trends. Instead of focusing on classification within individual time frames, we innovatively adopt a holistic approach, interpreting the scores of each example as a temporal point process (TPP). This reformulates the core problem of PUL as recognizing trends in these scores. We then propose a novel TPP-inspired measure for trend detection and prove its asymptotic unbiasedness in predicting changes. Notably, our method accomplishes PUL without requiring additional parameter tuning or prior assumptions, offering an alternative perspective for tackling this problem. Extensive experiments verify the superiority of our method, particularly in a highly imbalanced real-world setting, where it achieves improvements of up to $11.3\%$ in key metrics. The code is available at \href{https://github.com/wxr99/HolisticPU}{https://github.com/wxr99/HolisticPU}.
    摘要 从正例和无标注数据中学习二分类器(PUL)在许多实际应用中至关重要,特别是当验证负例十分困难时。尽管近期的 PUL 方法取得了令人瞩目的实验性能,但由于缺少负标签,误差累积和估计偏差增大等问题依然存在。在这篇论文中,我们揭示了 PUL 中一个有趣却长期被忽视的现象:在每个训练轮中对正例数据进行重新采样、以确保正例与无标注样本之间分布均衡,可以带来很强的早期表现;而且正类与负类的预测趋势呈现出明显不同的模式。具体来说,无标注负例的分数(输出概率)持续下降,而无标注正例的分数则大体呈现混乱的走势。我们没有局限于各个时间片内的分类,而是创新地采用整体视角,把每个样本的分数序列视为一个时间点过程(TPP),从而将 PUL 的核心问题重新表述为识别这些分数中的趋势。随后,我们提出了一种受 TPP 启发的趋势检测度量,并证明了它在预测变化时的渐近无偏性。值得注意的是,我们的方法无需额外的参数调整或先验假设即可完成 PUL,为解决这一问题提供了另一种视角。大量实验验证了我们方法的优越性,特别是在高度不均衡的真实场景中,关键指标最高提升 11.3%。代码见 https://github.com/wxr99/HolisticPU。
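
The balanced-resampling observation is easy to reproduce in a toy PU setting. The hedged sketch below treats unlabeled examples naively as negatives and compares training with and without resampling the labeled positives to match the unlabeled set; the data and model are illustrative, and the paper's TPP-based trend detection is not shown here.

```python
# Minimal sketch of the resampling observation: when unlabeled examples are
# (naively) treated as negatives, rebalancing the positives noticeably improves
# performance over the unbalanced baseline. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# True positives ~ N(+1), negatives ~ N(-1); only 100 positives carry labels.
pos_labeled = rng.normal(loc=+1.0, size=(100, 2))
unl_pos = rng.normal(loc=+1.0, size=(2000, 2))
unl_neg = rng.normal(loc=-1.0, size=(2000, 2))
X_unl = np.vstack([unl_pos, unl_neg])
y_unl_true = np.array([1] * 2000 + [0] * 2000)

def train_pu(balance):
    if balance:  # resample labeled positives so they match the unlabeled count
        pos = pos_labeled[rng.integers(0, len(pos_labeled), size=len(X_unl))]
    else:
        pos = pos_labeled
    X = np.vstack([pos, X_unl])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(X_unl))])  # unlabeled -> 0
    return LogisticRegression().fit(X, y)

for balance in (False, True):
    clf = train_pu(balance)
    acc = (clf.predict(X_unl) == y_unl_true).mean()
    print(f"balanced resampling={balance}: accuracy on unlabeled set = {acc:.3f}")
```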

Overview of AdaBoost : Reconciling its views to better understand its dynamics

  • paper_url: http://arxiv.org/abs/2310.18323
  • repo_url: None
  • paper_authors: Perceval Beja-Battais
  • for: 本文旨在探讨AdaBoost算法的不同视图和其相关的动力学。
  • methods: 本文将从Freund和Schapire的原始视角开始,然后探讨AdaBoost算法的不同视角,并用同一种形式化将它们统一起来。
  • results: 本文希望能帮助非专家读者更好地理解AdaBoost算法的动力学和不同视图之间的关系。
    Abstract Boosting methods were introduced in the late 1980s, following the theoretical framework of PAC learning. The main idea of boosting methods is to combine weak learners to obtain a strong learner. The weak learners are obtained iteratively by a heuristic which tries to correct the mistakes of the previous weak learner. In 1995, Freund and Schapire [18] introduced AdaBoost, a boosting algorithm that is still widely used today. Since then, many views of the algorithm have been proposed to properly tame its dynamics. In this paper, we try to cover all the views that one can have on AdaBoost. We start with the original view of Freund and Schapire before covering the different views and unifying them with the same formalism. We hope this paper will help the non-expert reader to better understand the dynamics of AdaBoost and how the different views are equivalent and related to each other.
    摘要 Boosting方法出现于1980年代晚期,源自PAC学习的理论研究。Boosting方法的主要想法是将多个弱学习器组合成一个强学习器。弱学习器通过一种启发式过程迭代获得,该过程尝试纠正前一个弱学习器所犯的错误。1995年,Freund和Schapire [18] 提出了AdaBoost算法,该算法至今仍被广泛使用。此后,人们提出了多种视角来恰当地刻画其动态。在这篇论文中,我们尝试涵盖看待AdaBoost的所有视角,从Freund和Schapire的原始视角出发,再介绍其他不同视角,并用同一种形式化将它们统一起来。我们希望这篇论文能帮助非专家读者更好地理解AdaBoost的动态,以及不同视角之间如何等价和相互关联。
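
For readers who want the baseline algorithm in front of them, the sketch below implements discrete AdaBoost with depth-1 decision trees as weak learners, following the classical weight-update equations; it is an illustration rather than a reference implementation, and the toy dataset is arbitrary.

```python
# Minimal sketch of discrete AdaBoost (Freund & Schapire, 1995) with decision
# stumps as weak learners.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must be in {-1, +1}. Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = np.full(n, 1.0 / n)                     # uniform initial example weights
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)   # weak learner's vote
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    agg = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.sign(agg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.2, 1, -1)   # non-linear target
    model = adaboost_fit(X[:800], y[:800])
    acc = (adaboost_predict(model, X[800:]) == y[800:]).mean()
    print(f"test accuracy: {acc:.3f}")
```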

DEFT: A new distance-based feature set for keystroke dynamics

  • paper_url: http://arxiv.org/abs/2310.04059
  • repo_url: None
  • paper_authors: Nuwan Kaluarachchi, Sevvandi Kandanaarachchi, Kristen Moore, Arathi Arakala
  • for: 用于用户身份验证和识别
  • methods: 使用新的键盘距离特征,与之前未被考虑的键盘动态特征结合,提供全面的打印行为特征
  • results: 在三种常见设备上(桌面、手机、平板电脑)测试DEFT模型,与现有状态的方法相比,实现了准确率超过99%,错误率低于10%
    Abstract Keystroke dynamics is a behavioural biometric utilised for user identification and authentication. We propose a new set of features based on the distance between keys on the keyboard, a concept that has not been considered before in keystroke dynamics. We combine flight times, a popular metric, with the distance between keys on the keyboard and call them as Distance Enhanced Flight Time features (DEFT). This novel approach provides comprehensive insights into a person's typing behaviour, surpassing typing velocity alone. We build a DEFT model by combining DEFT features with other previously used keystroke dynamic features. The DEFT model is designed to be device-agnostic, allowing us to evaluate its effectiveness across three commonly used devices: desktop, mobile, and tablet. The DEFT model outperforms the existing state-of-the-art methods when we evaluate its effectiveness across two datasets. We obtain accuracy rates exceeding 99% and equal error rates below 10% on all three devices.
    摘要 键盘动态是一种行为生物特征,用于用户认证和身份验证。我们提出了一组基于键盘键位距离的新特征,这是之前在键盘动态中没有考虑过的概念。我们将这些特征与已经广泛使用的飞行时间相结合,并称之为距离增强飞行时间特征(DEFT)。这种新的方法可以带来用户键盘输入行为的全面的了解,超过了单纯的输入速度。我们构建了DEFT模型,并将其与其他已经使用的键盘动态特征相结合。这个DEFT模型是设备无关的,因此我们可以在桌面、手机和平板电脑上评估其效果。我们发现DEFT模型在两个数据集上的效果比现有状态的方法更高,我们在三个设备上获得了准确率超过99%,并且错误率低于10%。
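
A hedged sketch of what a distance-enhanced digraph feature could look like is given below: each consecutive key pair is described by the Euclidean distance between the two keys on an assumed QWERTY coordinate grid together with its flight time. The key layout coordinates and the release-to-press flight-time definition are assumptions; the paper's exact DEFT feature set may differ.

```python
# Illustrative sketch of pairing each digraph's flight time with the distance
# between the two keys on the keyboard. Coordinates and timing conventions are
# assumptions for illustration.
import math

# Approximate key centres on a unit grid: (row, column), with staggered rows.
QWERTY = {}
for row_idx, row in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for col_idx, ch in enumerate(row):
        QWERTY[ch] = (row_idx, col_idx + 0.25 * row_idx)

def key_distance(a, b):
    (r1, c1), (r2, c2) = QWERTY[a], QWERTY[b]
    return math.hypot(r1 - r2, c1 - c2)

def deft_features(keystrokes):
    """keystrokes: list of (char, press_ms, release_ms) in typing order.
    Returns one (key_distance, flight_time_ms) pair per consecutive key pair."""
    feats = []
    for (k1, _, rel1), (k2, press2, _) in zip(keystrokes, keystrokes[1:]):
        feats.append((key_distance(k1, k2), press2 - rel1))
    return feats

if __name__ == "__main__":
    sample = [("h", 0, 80), ("e", 150, 230), ("l", 320, 390),
              ("l", 470, 540), ("o", 610, 690)]
    for dist, flight in deft_features(sample):
        print(f"distance={dist:.2f}  flight={flight} ms")
```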

AUTOPARLLM: GNN-Guided Automatic Code Parallelization using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.04047
  • repo_url: None
  • paper_authors: Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan, Nesreen K. Ahmed, Ali Jannesari
  • for: 自动发现并生成并行代码的框架,以提高Sequential Programs中的并行化效率。
  • methods: 使用heterogeneous Graph Neural Network (GNN)来发现并行特征和并行模式,并使用LLM-based code generator生成并行版本的Sequential Programs。
  • results: 对11个应用程序进行了evaluation,并显示AUTOPARLLM可以提高当前LLM-based模型的并行代码生成效果,并且可以提高平均运行时间(在NAS Parallel Benchmark和Rodinia Benchmark上提高了3.4%和2.9%)。此外,提出了OMPScore来评估生成的并行代码质量,并表明OMPScore与人工评估之间存在更高的相关性(最多提高75%的Spearman相关性)。
    Abstract Parallelizing sequentially written programs is a challenging task. Even experienced developers need to spend considerable time finding parallelism opportunities and then actually writing parallel versions of sequentially written programs. To address this issue, we present AUTOPARLLM, a framework for automatically discovering parallelism and generating the parallel version of the sequentially written program. Our framework consists of two major components: i) a heterogeneous Graph Neural Network (GNN) based parallelism discovery and parallel pattern detection module, and ii) an LLM-based code generator to generate the parallel counterpart of the sequential programs. We use the GNN to learn the flow-aware characteristics of the programs to identify parallel regions in sequential programs and then construct an enhanced prompt using the GNN's results for the LLM-based generator to finally produce the parallel counterparts of the sequential programs. We evaluate AUTOPARLLM on 11 applications of 2 well-known benchmark suites: NAS Parallel Benchmark and Rodinia Benchmark. Our results show that AUTOPARLLM is indeed effective in improving the state-of-the-art LLM-based models for the task of parallel code generation in terms of multiple code generation metrics. AUTOPARLLM also improves the average runtime of the parallel code generated by the state-of-the-art LLMs by as high as 3.4% and 2.9% for the NAS Parallel Benchmark and Rodinia Benchmark respectively. Additionally, to overcome the issue that well-known metrics for translation evaluation have not been optimized to evaluate the quality of the generated parallel code, we propose OMPScore for evaluating the quality of the generated code. We show that OMPScore exhibits a better correlation with human judgment than existing metrics, measured by up to 75% improvement of Spearman correlation.
    摘要 自动找到并生成并行代码的框架,我们提出了AUTOPARLLM。它包括两个主要组件:一个基于多型神经网络(GNN)的并行性发现和并行模式检测模块,以及一个基于LLM的代码生成器。我们使用GNN来学习程序的流程特征,以确定并行区域在连续编程中,然后使用GNN的结果构建了加强的提示,并使用LLM-based代码生成器生成并行程序的相应版本。我们对11个应用程序进行了 NAS Parallel Benchmark 和 Rodinia Benchmark 的测试,结果表明,AUTOPARLLM可以在 LLM-based 模型中提高并行代码生成的状态态度,并且在 NAS Parallel Benchmark 和 Rodinia Benchmark 中平均提高了3.4%和2.9%的运行时间。此外,为了解决现有评价纪录不适应评估生成的并行代码质量的问题,我们提出了OMPScore,它可以评估生成的代码质量,并且与人类判断之间 exhibits 更高的相关性,提高了75%的斯佩曼相关性。

Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering

  • paper_url: http://arxiv.org/abs/2310.04038
  • repo_url: https://github.com/weilvnju/jpltd
  • paper_authors: Wei Lv, Chao Zhang, Huaxiong Li, Xiuyi Jia, Chunlin Chen
  • for: address the problems of incomplete multi-view data and suboptimal graph construction in existing methods
  • methods: introduces an orthogonal projection matrix to project high-dimensional features into a lower-dimensional space, learns similarity graphs for instances of different views, and stacks these graphs into a third-order low-rank tensor to explore high-order correlations
  • results: outperforms state-of-the-art methods on several benchmark datasets, with an effective optimization algorithm to solve the JPLTD model
    Abstract Incomplete multi-view clustering (IMVC) has received increasing attention since it is often that some views of samples are incomplete in reality. Most existing methods learn similarity subgraphs from original incomplete multi-view data and seek complete graphs by exploring the incomplete subgraphs of each view for spectral clustering. However, the graphs constructed on the original high-dimensional data may be suboptimal due to feature redundancy and noise. Besides, previous methods generally ignored the graph noise caused by the inter-class and intra-class structure variation during the transformation of incomplete graphs and complete graphs. To address these problems, we propose a novel Joint Projection Learning and Tensor Decomposition Based method (JPLTD) for IMVC. Specifically, to alleviate the influence of redundant features and noise in high-dimensional data, JPLTD introduces an orthogonal projection matrix to project the high-dimensional features into a lower-dimensional space for compact feature learning.Meanwhile, based on the lower-dimensional space, the similarity graphs corresponding to instances of different views are learned, and JPLTD stacks these graphs into a third-order low-rank tensor to explore the high-order correlations across different views. We further consider the graph noise of projected data caused by missing samples and use a tensor-decomposition based graph filter for robust clustering.JPLTD decomposes the original tensor into an intrinsic tensor and a sparse tensor. The intrinsic tensor models the true data similarities. An effective optimization algorithm is adopted to solve the JPLTD model. Comprehensive experiments on several benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD.
    摘要 隐藏多视图协同分 clustering(IMVC)在过去几年内已经收到了越来越多的关注,因为实际情况下样本的一些视图通常是 incomplete。现有的方法通常是从原始的 incomplete multi-view 数据中学习 Similarity subgraphs,然后通过探索每个视图中的 incomplete subgraphs 进行 spectral clustering。然而,在原始高维数据上构建的图可能是不优化的,这是因为特征的重复和噪声。此外,前一些方法通常忽略了在转换 incomplete graph 和 complete graph 过程中因为 inter-class 和 intra-class 结构变化而引起的图像噪声。为了解决这些问题,我们提出了一种新的 Joint Projection Learning and Tensor Decomposition Based 方法(JPLTD)。具体来说,为了减少高维特征的重复和噪声,JPLTD 引入了一个正交投影矩阵,将高维特征投影到一个lower-dimensional space中进行紧凑特征学习。同时,基于这个lower-dimensional space,JPLTD 学习了不同视图中的相似图,并将这些图栅stacked 成一个 third-order low-rank tensor,以探索不同视图之间的高阶相关性。我们还考虑了投影数据中的图像噪声,使用了基于 tensor 的分解方法进行 Robust clustering。JPLTD 将原始 tensor 分解为内在 tensor 和稀疏 tensor。内在 tensor 表示真实数据的相似性。我们采用了一种有效的优化算法来解决 JPLTD 模型。在多个 benchmark 数据集上进行了广泛的实验,结果表明,JPLTD 可以比 state-of-the-art 方法更高效。JPLTD 的代码可以在 GitHub 上找到:https://github.com/weilvNJU/JPLTD。

Genetic prediction of quantitative traits: a machine learner’s guide focused on height

  • paper_url: http://arxiv.org/abs/2310.04028
  • repo_url: None
  • paper_authors: Lucie Bourguignon, Caroline Weis, Catherine R. Jutzeler, Michael Adamer
  • for: 本文旨在为机器学习社区提供Current state of the art模型和相关细节,以便在开发新模型时更好地预测人类特征。
  • methods: 本文使用了height作为连续型特征的示例,并介绍了 referential datasets、干扰因素、特征选择和常用度量。
  • results: 本文提供了一个对现有模型和相关细节的概述,以便更好地理解和应用这些模型。
    Abstract Machine learning and deep learning have been celebrating many successes in the application to biological problems, especially in the domain of protein folding. Another equally complex and important question has received relatively little attention by the machine learning community, namely the one of prediction of complex traits from genetics. Tackling this problem requires in-depth knowledge of the related genetics literature and awareness of various subtleties associated with genetic data. In this guide, we provide an overview for the machine learning community on current state of the art models and associated subtleties which need to be taken into consideration when developing new models for phenotype prediction. We use height as an example of a continuous-valued phenotype and provide an introduction to benchmark datasets, confounders, feature selection, and common metrics.
    摘要 机器学习和深度学习在生物问题上取得了许多成功,特别是在蛋白质折叠领域。然而,另一个同样复杂且重要的问题——从遗传学预测复杂性状——在机器学习社区中受到的关注相对较少。解决这个问题需要深入了解相关的遗传学文献,并了解遗传数据所涉及的各种细节。在这份指南中,我们为机器学习社区概述了当前最先进的模型,以及在开发新的表型预测模型时需要考虑的相关细节。我们以身高作为连续型表型的例子,介绍了基准数据集、混杂因素、特征选择和常用的评价指标。

PGraphDTA: Improving Drug Target Interaction Prediction using Protein Language Models and Contact Maps

  • paper_url: http://arxiv.org/abs/2310.04017
  • repo_url: None
  • paper_authors: Rakesh Bal, Yijia Xiao, Wei Wang
  • for: 这个研究旨在提高药物-靶标相互作用(DTI)预测的精度,以促进药物发现。
  • methods: 本研究在现有模型中引入蛋白质语言模型(PLM),并融合接触图(Contact Map)信息,以提高DTI预测的精度。
  • results: 研究结果显示,所提出的方法优于本研究所考虑的基准模型,显示了沿此方向进一步发展的价值。
    Abstract Developing and discovering new drugs is a complex and resource-intensive endeavor that often involves substantial costs, time investment, and safety concerns. A key aspect of drug discovery involves identifying novel drug-target (DT) interactions. Existing computational methods for predicting DT interactions have primarily focused on binary classification tasks, aiming to determine whether a DT pair interacts or not. However, protein-ligand interactions exhibit a continuum of binding strengths, known as binding affinity, presenting a persistent challenge for accurate prediction. In this study, we investigate various techniques employed in Drug Target Interaction (DTI) prediction and propose novel enhancements to enhance their performance. Our approaches include the integration of Protein Language Models (PLMs) and the incorporation of Contact Map information as an inductive bias within current models. Through extensive experimentation, we demonstrate that our proposed approaches outperform the baseline models considered in this study, presenting a compelling case for further development in this direction. We anticipate that the insights gained from this work will significantly narrow the search space for potential drugs targeting specific proteins, thereby accelerating drug discovery. Code and data for PGraphDTA are available at https://anonymous.4open.science/r/PGraphDTA.
    摘要 开发和发现新药物是一项复杂且资源密集的工作,通常伴随着高昂的成本、大量的时间投入和安全方面的顾虑。药物发现的关键环节之一是识别新的药物-靶标(DT)相互作用。现有的 DT 相互作用预测计算方法主要集中在二分类任务上,即判断一对 DT 是否存在相互作用。然而,蛋白质-配体相互作用呈现出连续的结合强度(即结合亲和力),这给准确预测带来了持续的挑战。在这项研究中,我们考察了药物-靶标相互作用(DTI)预测中使用的多种技术,并提出了新的改进来提升其性能。我们的方法包括引入蛋白质语言模型(PLM),以及将接触图(Contact Map)信息作为归纳偏置纳入现有模型。通过大量实验,我们证明所提出的方法优于本研究所考虑的基线模型,为沿此方向进一步发展提供了有力的依据。我们期望这项工作获得的洞见能够显著缩小针对特定蛋白质的候选药物搜索空间,从而加速药物发现。PGraphDTA 的代码和数据可在 https://anonymous.4open.science/r/PGraphDTA 获取。

Anonymous Learning via Look-Alike Clustering: A Precise Analysis of Model Generalization

  • paper_url: http://arxiv.org/abs/2310.04015
  • repo_url: None
  • paper_authors: Adel Javanmard, Vahab Mirrokni
  • for: 这篇论文目的是探讨一种自然的技术—看类 clustering,以取代个人敏感特征,并评估这种方法对模型的泛化能力的影响。
  • methods: 本文使用了一种称为Convex Gaussian Minimax Theorem(CGMT)的精确分析方法,以了解模型的泛化误差。
  • results: 研究发现,在某些高维度情况下,使用看类 clustering 训练模型可以对泛化误差进行调整,并且在一些 finite-sample numerical experiments 中证实了这一点。
    Abstract While personalized recommendations systems have become increasingly popular, ensuring user data protection remains a top concern in the development of these learning systems. A common approach to enhancing privacy involves training models using anonymous data rather than individual data. In this paper, we explore a natural technique called \emph{look-alike clustering}, which involves replacing sensitive features of individuals with the cluster's average values. We provide a precise analysis of how training models using anonymous cluster centers affects their generalization capabilities. We focus on an asymptotic regime where the size of the training set grows in proportion to the features dimension. Our analysis is based on the Convex Gaussian Minimax Theorem (CGMT) and allows us to theoretically understand the role of different model components on the generalization error. In addition, we demonstrate that in certain high-dimensional regimes, training over anonymous cluster centers acts as a regularization and improves generalization error of the trained models. Finally, we corroborate our asymptotic theory with finite-sample numerical experiments where we observe a perfect match when the sample size is only of order of a few hundreds.
    摘要 个性化推荐系统已经变得越来越流行,但保护用户数据仍然是开发这类学习系统时最受关注的问题之一。一种常见的增强隐私的方法是使用匿名数据而非个体数据来训练模型。在这篇论文中,我们探讨一种自然的技术——“相似聚类”(look-alike clustering),它把个体的敏感特征替换为所属簇的平均值。我们对使用匿名簇中心训练模型如何影响其泛化能力给出了精确分析。我们关注训练集规模与特征维度成比例增长的渐近情形。我们的分析基于凸高斯极小极大定理(Convex Gaussian Minimax Theorem,CGMT),使我们能够从理论上理解模型各个组成部分对泛化误差的作用。此外,我们还证明在某些高维情形下,基于匿名簇中心进行训练能起到正则化的作用,从而改善所训练模型的泛化误差。最后,我们用有限样本的数值实验印证了渐近理论,并观察到当样本规模仅为几百的量级时两者已能完美吻合。
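
A minimal sketch of the look-alike clustering step is given below: cluster the sensitive columns with k-means, replace each record's sensitive features by its cluster's average values, and train on the anonymized table. The choice of k, which columns count as sensitive, and the regression task are illustrative assumptions, and the sketch does not reproduce the paper's CGMT-based analysis.

```python
# Minimal sketch of look-alike clustering: sensitive features are replaced by
# their cluster centre before training. Assumptions: which columns are sensitive,
# k = 50, and a simple Ridge regression task.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 5000
X_sensitive = rng.normal(size=(n, 3))          # e.g. individual-level attributes
X_public = rng.normal(size=(n, 2))             # non-sensitive attributes
beta = np.array([1.0, -2.0, 0.5, 1.5, -1.0])
y = np.hstack([X_sensitive, X_public]) @ beta + rng.normal(scale=0.5, size=n)

# Anonymize: each record's sensitive features become its cluster centre.
km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X_sensitive)
X_anon_sensitive = km.cluster_centers_[km.labels_]
X_anon = np.hstack([X_anon_sensitive, X_public])

X_full = np.hstack([X_sensitive, X_public])
for name, X_train in [("individual data", X_full), ("look-alike clusters", X_anon)]:
    model = Ridge(alpha=1.0).fit(X_train[:4000], y[:4000])
    # Evaluation uses the true individual features, as a deployed model would.
    mse = np.mean((model.predict(X_full[4000:]) - y[4000:]) ** 2)
    print(f"trained on {name}: test MSE = {mse:.3f}")
```

Comparing the two test errors gives a quick empirical feel for the anonymization/generalization trade-off the paper characterizes.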

Accelerating optimization over the space of probability measures

  • paper_url: http://arxiv.org/abs/2310.04006
  • repo_url: None
  • paper_authors: Shi Chen, Qin Li, Oliver Tse, Stephen J. Wright
  • for: 加速机器学习问题中基于梯度的优化方法,特别是在概率测度空间上的优化
  • methods: 采用哈密顿流方法,与欧几里得空间中基于动量的方法相类似
  • results: 可实现任意高阶的收敛速度,并通过数值算例加以验证
    Abstract Acceleration of gradient-based optimization methods is an issue of significant practical and theoretical interest, particularly in machine learning applications. Most research has focused on optimization over Euclidean spaces, but given the need to optimize over spaces of probability measures in many machine learning problems, it is of interest to investigate accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach that is analogous to moment-based approaches in Euclidean space. We demonstrate that algorithms based on this approach can achieve convergence rates of arbitrarily high order. Numerical examples illustrate our claim.
    摘要 加速基于梯度的优化方法是一个具有重要实践和理论意义的问题,在机器学习应用中尤为如此。大多数研究集中在欧几里得空间上的优化,但鉴于许多机器学习问题需要在概率测度空间上进行优化,研究该情形下的加速梯度方法同样很有价值。为此,我们提出了一种哈密顿流方法,它与欧几里得空间中基于动量的方法相类似。我们证明基于该方法的算法可以达到任意高阶的收敛速度,并通过数值算例加以验证。
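As a purely illustrative aid for the analogy drawn in the abstract above (momentum-based acceleration carried over from Euclidean space to measures), here is a toy sketch of heavy-ball-style dynamics applied to a particle approximation of a probability measure. It is not the paper's Hamiltonian-flow scheme: the quadratic potential, the functional being minimized, the step size, and the damping are all assumptions chosen only to show the structure.

```python
# Illustrative sketch only (not the paper's algorithm): momentum (heavy-ball) dynamics
# on a particle approximation of a measure, minimizing F(mu) = E_mu[V(x)] for
# V(x) = 0.5 * ||x||^2. All constants here are arbitrary illustrative choices.
import numpy as np

def grad_V(x):                      # gradient of the potential V(x) = 0.5 * ||x||^2
    return x

rng = np.random.default_rng(0)
particles = rng.normal(loc=3.0, size=(1000, 2))   # empirical measure: 1000 particles
momenta = np.zeros_like(particles)
step, damping = 0.05, 0.9

for t in range(200):
    # Damped Hamiltonian (momentum) update applied to every particle of the measure.
    momenta = damping * momenta - step * grad_V(particles)
    particles = particles + step * momenta

print("mean of the measure after optimization:", particles.mean(axis=0))
# A plain gradient flow (no momentum) contracts more slowly per step on this example.
```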

The Role of Federated Learning in a Wireless World with Foundation Models

  • paper_url: http://arxiv.org/abs/2310.04003
  • repo_url: None
  • paper_authors: Zihan Chen, Howard H. Yang, Y. C. Tay, Kai Fong Ernest Chong, Tony Q. S. Quek
  • for: 本文探讨了基于无线网络的联邦学习(FL)和基本模型(FM)之间的交互,以及将FM应用于FL中的可能性和挑战。
  • methods: 本文提出了多种新的思路和方法,用于实现将FM与FL相结合的未来智能网络。这些方法包括使用分布式计算和数据处理来帮助FM的训练,以及使用FM来提高FL的性能。
  • results: 本文提出了许多未来智能网络的研究挑战和机遇,包括如何使用FM和FL来提高网络性能和安全性,以及如何处理数据隐私和安全问题。
    Abstract Foundation models (FMs) are general-purpose artificial intelligence (AI) models that have recently enabled multiple brand-new generative AI applications. The rapid advances in FMs serve as an important contextual backdrop for the vision of next-generation wireless networks, where federated learning (FL) is a key enabler of distributed network intelligence. Currently, the exploration of the interplay between FMs and FL is still in its nascent stage. Naturally, FMs are capable of boosting the performance of FL, and FL could also leverage decentralized data and computing resources to assist in the training of FMs. However, the exceptionally high requirements that FMs have for computing resources, storage, and communication overhead would pose critical challenges to FL-enabled wireless networks. In this article, we explore the extent to which FMs are suitable for FL over wireless networks, including a broad overview of research challenges and opportunities. In particular, we discuss multiple new paradigms for realizing future intelligent networks that integrate FMs and FL. We also consolidate several broad research directions associated with these paradigms.
    摘要 基础模型(FM)是一类通用人工智能(AI)模型,近来催生了多种全新的生成式 AI 应用。FM 的快速进展为下一代无线网络的愿景提供了重要背景,而联邦学习(FL)是实现分布式网络智能的关键使能技术。目前,对 FM 与 FL 之间相互作用的探索仍处于起步阶段。自然地,FM 能够提升 FL 的性能,而 FL 也可以利用分散的数据和计算资源来辅助 FM 的训练;然而,FM 对计算资源、存储和通信开销的极高要求会给基于 FL 的无线网络带来严峻挑战。在本文中,我们探讨 FM 在多大程度上适用于无线网络上的 FL,包括对研究挑战与机遇的总体概述。特别地,我们讨论了多种将 FM 与 FL 相融合以实现未来智能网络的新范式,并归纳了与这些范式相关的若干重要研究方向。

Runtime Monitoring DNN-Based Perception

  • paper_url: http://arxiv.org/abs/2310.03999
  • repo_url: None
  • paper_authors: Chih-Hong Cheng, Michael Luttenberger, Rongjie Yan
  • for: 本文旨在介绍用于对基于深度神经网络(DNN)的感知进行运行时验证的方法,以确保其功能上的不足不会成为危害的来源。
  • methods: 文章先介绍机器学习社区提出的经典监控方法,再重点介绍形式化方法社区提出的若干监控技术;两者在决策边界的构造方式上有所不同。
  • results: 文章强调需要严谨地设计监控器,其中运行域之外的数据可用性起着重要作用。
    Abstract Deep neural networks (DNNs) are instrumental in realizing complex perception systems. As many of these applications are safety-critical by design, engineering rigor is required to ensure that the functional insufficiency of the DNN-based perception is not the source of harm. In addition to conventional static verification and testing techniques employed during the design phase, there is a need for runtime verification techniques that can detect critical events, diagnose issues, and even enforce requirements. This tutorial aims to provide readers with a glimpse of techniques proposed in the literature. We start with classical methods proposed in the machine learning community, then highlight a few techniques proposed by the formal methods community. While we surely can observe similarities in the design of monitors, how the decision boundaries are created vary between the two communities. We conclude by highlighting the need to rigorously design monitors, where data availability outside the operational domain plays an important role.
    摘要 深度神经网络(DNN)是实现复杂感知系统的重要工具。由于其中许多应用在设计上即属于安全关键系统,需要严格的工程方法来确保基于 DNN 的感知在功能上的不足不会成为危害的来源。除了设计阶段采用的传统静态验证与测试技术之外,还需要能够在运行时检测关键事件、诊断问题甚至强制满足需求的运行时验证技术。本教程旨在让读者一览文献中提出的相关技术:我们先介绍机器学习社区提出的经典方法,再重点介绍形式化方法社区提出的若干技术。虽然两者在监控器的设计上有相似之处,但决策边界的构造方式存在差异。最后,我们强调需要严谨地设计监控器,其中运行域之外的数据可用性起着重要作用。
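To ground the idea of a runtime monitor for DNN-based perception, here is a minimal sketch of one classical monitoring style discussed in this line of work: record per-neuron activation bounds observed on in-distribution training data, then flag inputs whose hidden-layer activations leave those bounds at runtime. The choice of layer, the box-shaped decision boundary, and the tolerance value are illustrative assumptions, not the tutorial's specific recipe.

```python
# Minimal sketch of an activation-bounds runtime monitor: fit per-neuron min/max on
# training activations, flag test inputs whose activations fall outside the box.
import numpy as np

class ActivationBoundsMonitor:
    def fit(self, train_activations):
        # train_activations: (n_samples, n_neurons) from a chosen hidden layer.
        self.lo = train_activations.min(axis=0)
        self.hi = train_activations.max(axis=0)
        return self

    def flag(self, activations, tolerance=0.0):
        # True for inputs whose activations leave the recorded box on any neuron.
        outside = (activations < self.lo - tolerance) | (activations > self.hi + tolerance)
        return outside.any(axis=1)

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5000, 64))                 # in-distribution activations
monitor = ActivationBoundsMonitor().fit(train_feats)

in_dist = rng.normal(size=(10, 64))
out_dist = rng.normal(loc=6.0, size=(10, 64))             # shifted, out-of-domain inputs
print("flagged in-dist :", monitor.flag(in_dist).sum())   # typically few or none
print("flagged out-dist:", monitor.flag(out_dist).sum())  # typically most or all
```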

AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement

  • paper_url: http://arxiv.org/abs/2310.03984
  • repo_url: None
  • paper_authors: Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An
  • for: 优化长期用户参与度的推荐任务中的RL算法。
  • methods: 提出了一种新的自适应序列推荐(AdaRec)范式,采用基于距离的表征损失从用户交互轨迹中提取潜在信息,以反映 RL 策略与当前用户行为模式之间的匹配程度,并结合乐观探索与零阶动作优化。
  • results: 在基于模拟器和真实线上的序列推荐任务中,AdaRec 的长期性能均优于所有基线算法。
    Abstract Growing attention has been paid to Reinforcement Learning (RL) algorithms when optimizing long-term user engagement in sequential recommendation tasks. One challenge in large-scale online recommendation systems is the constant and complicated changes in users' behavior patterns, such as interaction rates and retention tendencies. When formulated as a Markov Decision Process (MDP), the dynamics and reward functions of the recommendation system are continuously affected by these changes. Existing RL algorithms for recommendation systems will suffer from distribution shift and struggle to adapt in such an MDP. In this paper, we introduce a novel paradigm called Adaptive Sequential Recommendation (AdaRec) to address this issue. AdaRec proposes a new distance-based representation loss to extract latent information from users' interaction trajectories. Such information reflects how RL policy fits to current user behavior patterns, and helps the policy to identify subtle changes in the recommendation system. To make rapid adaptation to these changes, AdaRec encourages exploration with the idea of optimism under uncertainty. The exploration is further guarded by zero-order action optimization to ensure stable recommendation quality in complicated environments. We conduct extensive empirical analyses in both simulator-based and live sequential recommendation tasks, where AdaRec exhibits superior long-term performance compared to all baseline algorithms.
    摘要 在序列推荐任务中优化长期用户参与度时,强化学习(RL)算法正受到越来越多的关注。大规模在线推荐系统面临的一个挑战是用户行为模式(如交互频率和留存倾向)持续且复杂地变化。当将推荐系统建模为马尔可夫决策过程(MDP)时,其动态特性和奖励函数会不断受到这些变化的影响,现有的 RL 推荐算法会因此遭遇分布偏移而难以适应。本文提出了一种名为自适应序列推荐(AdaRec)的新范式来解决这一问题。AdaRec 提出了一种基于距离的表征损失,用于从用户交互轨迹中提取潜在信息;这些信息反映了 RL 策略与当前用户行为模式的匹配程度,帮助策略识别推荐系统中的细微变化。为了对这些变化快速做出适应,AdaRec 借助“不确定性下的乐观”思想鼓励探索,并通过零阶动作优化来保证在复杂环境中的稳定推荐质量。我们在基于模拟器和真实线上的序列推荐任务中进行了大量实证分析,结果表明 AdaRec 在长期性能上优于所有基线算法。

Ultimate limit on learning non-Markovian behavior: Fisher information rate and excess information

  • paper_url: http://arxiv.org/abs/2310.03968
  • repo_url: None
  • paper_authors: Paul M. Riechers
  • for: 本文探讨从时间序列数据中学习随机过程未知参数的基本极限,并给出最优推断随观测长度变化的精确闭式刻画。
  • methods: 针对参数化的候选模型类,利用观测序列概率的 Fisher 信息为有限数据下模型参数估计的方差给出下界。
  • results: 即使在马尔可夫阶数无穷的情形下,也得到了信息率的简洁闭式表达;此外,还从观测诱导的信念态元动力学出发,得到了模型方差的精确解析下界,并刻画了短暂型、指数型等多种收敛模式。
    Abstract We address the fundamental limits of learning unknown parameters of any stochastic process from time-series data, and discover exact closed-form expressions for how optimal inference scales with observation length. Given a parametrized class of candidate models, the Fisher information of observed sequence probabilities lower-bounds the variance in model estimation from finite data. As sequence-length increases, the minimal variance scales as the square inverse of the length -- with constant coefficient given by the information rate. We discover a simple closed-form expression for this information rate, even in the case of infinite Markov order. We furthermore obtain the exact analytic lower bound on model variance from the observation-induced metadynamic among belief states. We discover ephemeral, exponential, and more general modes of convergence to the asymptotic information rate. Surprisingly, this myopic information rate converges to the asymptotic Fisher information rate with exactly the same relaxation timescales that appear in the myopic entropy rate as it converges to the Shannon entropy rate for the process. We illustrate these results with a sequence of examples that highlight qualitatively distinct features of stochastic processes that shape optimal learning.
    摘要 我们研究了从时间序列数据中学习任意随机过程未知参数的基本极限,并得到了最优推断随观测长度变化的精确闭式表达。给定一个参数化的候选模型类,观测序列概率的 Fisher 信息给出了有限数据下模型估计方差的下界。随着序列长度增加,最小方差按长度的平方反比缩放,其常数系数由信息率给出。即使在马尔可夫阶数无穷的情形下,我们也给出了该信息率的简洁闭式表达。此外,我们还从观测诱导的信念态元动力学出发,得到了模型方差的精确解析下界。我们发现了向渐近信息率收敛的短暂型、指数型以及更一般的收敛模式。令人惊讶的是,这种短视信息率向渐近 Fisher 信息率收敛时的弛豫时间尺度,与短视熵率向该过程的香农熵率收敛时的时间尺度完全相同。我们通过一系列示例说明这些结果,并突出塑造最优学习的随机过程的若干在性质上截然不同的特征。
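For readers who want the variance bound referred to in the abstract above written out, it is the standard Cramér–Rao relation; the notation below is ours and not necessarily the paper's. For an unbiased estimator $\hat{\theta}$ built from a length-$\ell$ observation $X_{1:\ell}$ of a parametrized process,

$$
I_\ell(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial \theta}\,\log \Pr_\theta\!\left(X_{1:\ell}\right)\right)^{\!2}\right],
\qquad
\operatorname{Var}\!\left(\hat{\theta}\right) \;\ge\; \frac{1}{I_\ell(\theta)} .
$$

The paper's contribution lies in characterizing, in closed form, how $I_\ell(\theta)$ grows with $\ell$ (the information rate), including for infinite Markov order; the bound itself is only the starting point.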

eess.IV - 2023-10-06

A Plug-and-Play Image Registration Network

  • paper_url: http://arxiv.org/abs/2310.04297
  • repo_url: None
  • paper_authors: Junhao Hu, Weijie Gan, Zhixin Sun, Hongyu An, Ulugbek S. Kamilov
  • for: 本研究旨在开发一种基于深度学习的可变形图像配准(DIR)方法,以提升生物医学图像配准的精度与稳健性。
  • methods: 该方法在配准场上预训练一个卷积神经网络(CNN)去噪器,并将其作为先验“插入”到带有显式数据保真项的迭代算法中;PIRATE+ 进一步利用深度平衡模型(DEQ)对该先验进行端到端微调。
  • results: 在 OASIS 和 CANDI 数据集上的数值结果表明,所提方法达到了可变形图像配准的最先进性能。
    Abstract Deformable image registration (DIR) is an active research topic in biomedical imaging. There is a growing interest in developing DIR methods based on deep learning (DL). A traditional DL approach to DIR is based on training a convolutional neural network (CNN) to estimate the registration field between two input images. While conceptually simple, this approach comes with a limitation that it exclusively relies on a pre-trained CNN without explicitly enforcing fidelity between the registered image and the reference. We present plug-and-play image registration network (PIRATE) as a new DIR method that addresses this issue by integrating an explicit data-fidelity penalty and a CNN prior. PIRATE pre-trains a CNN denoiser on the registration field and "plugs" it into an iterative method as a regularizer. We additionally present PIRATE+ that fine-tunes the CNN prior in PIRATE using deep equilibrium models (DEQ). PIRATE+ interprets the fixed-point iteration of PIRATE as a network with effectively infinite layers and then trains the resulting network end-to-end, enabling it to learn more task-specific information and boosting its performance. Our numerical results on OASIS and CANDI datasets show that our methods achieve state-of-the-art performance on DIR.
    摘要 可变形图像配准(DIR)是生物医学成像领域一个活跃的研究方向,基于深度学习(DL)的 DIR 方法正受到越来越多的关注。传统的 DL 做法是训练一个卷积神经网络(CNN)来估计两幅输入图像之间的配准场。这种方法概念上很简单,但也存在局限:它完全依赖预训练的 CNN,而没有显式地约束配准后图像与参考图像之间的保真度。我们提出了“即插即用图像配准网络”(PIRATE)这一新的 DIR 方法,通过将显式的数据保真惩罚项与 CNN 先验相结合来解决这一问题。PIRATE 先在配准场上预训练一个 CNN 去噪器,再将其作为正则项“插入”到迭代算法中。我们还提出了 PIRATE+,它利用深度平衡模型(DEQ)对 PIRATE 中的 CNN 先验进行微调:PIRATE+ 将 PIRATE 的不动点迭代视为一个层数实际上无限的网络,并对其进行端到端训练,使其能够学习更多针对具体任务的信息,从而提升性能。在 OASIS 和 CANDI 数据集上的数值结果表明,我们的方法在 DIR 任务上达到了最先进的性能。
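The plug-and-play structure described in the abstract above (a gradient step on a data-fidelity term over the registration field, followed by a denoiser acting as the prior) can be illustrated with a toy sketch. This is not PIRATE: the learned CNN denoiser is replaced by Gaussian smoothing, the images are synthetic, and the step sizes are arbitrary; only the shape of the iteration is shown.

```python
# Structural sketch of a plug-and-play registration iteration: data-fidelity gradient
# step on the displacement field, then a "denoiser" prior (Gaussian smoothing here).
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(moving, field):
    # field: (2, H, W) displacement; samples the moving image at x + field(x).
    H, W = moving.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([yy + field[0], xx + field[1]])
    return map_coordinates(moving, coords, order=1, mode="nearest")

def data_fidelity_grad(fixed, moving, field, eps=0.5):
    # Gradient of 0.5 * ||moving(x + v) - fixed(x)||^2 w.r.t. v, via finite differences.
    residual = warp(moving, field) - fixed
    grad = np.zeros_like(field)
    for c in range(2):
        shifted = field.copy(); shifted[c] += eps
        grad[c] = residual * (warp(moving, shifted) - warp(moving, field)) / eps
    return grad

rng = np.random.default_rng(0)
fixed = gaussian_filter(rng.normal(size=(64, 64)), 3)
fixed = fixed / fixed.std()
moving = np.roll(fixed, shift=2, axis=0)          # ground-truth vertical shift of 2 px
field = np.zeros((2, 64, 64))

step, prior_sigma = 1.0, 1.5
for k in range(150):
    field = field - step * data_fidelity_grad(fixed, moving, field)      # data step
    field = gaussian_filter(field, sigma=(0, prior_sigma, prior_sigma))  # "denoiser" prior

print("interior vertical displacement:", field[0][10:-10].mean())  # close to +2
```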

Towards Non-contact 3D Ultrasound for Wrist Imaging

  • paper_url: http://arxiv.org/abs/2310.04296
  • repo_url: None
  • paper_authors: Antony Jerald, A. N. Madhavanunni, Gayathri Malamal, Mahesh Raveendranatha Panicker
  • for: The paper aims to develop a novel approach for non-contact freehand 3D ultrasound imaging with minimal complexity added to existing point of care ultrasound (POCUS) systems.
  • methods: The proposed approach uses a mechanical track for non-contact ultrasound scanning, which restricts the probe motion to a linear plane and simplifies the acquisition and 3D reconstruction process. A pipeline for US 3D volume reconstruction using an US research platform and a GPU-based edge device is developed.
  • results: The proposed approach is demonstrated through ex-vivo and in-vivo experiments, showing its efficacy in providing accurate 3D US imaging with adjustable field of view capability, non-contact design, and low cost of deployment without significantly altering the existing setup.
    Abstract Objective: The objective of this work is an attempt towards non-contact freehand 3D ultrasound imaging with minimal complexity added to the existing point of care ultrasound (POCUS) systems. Methods: This study proposes a novel approach of using a mechanical track for non-contact ultrasound (US) scanning. The approach thus restricts the probe motion to a linear plane, to simplify the acquisition and 3D reconstruction process. A pipeline for US 3D volume reconstruction employing an US research platform and a GPU-based edge device is developed. Results: The efficacy of the proposed approach is demonstrated through ex-vivo and in-vivo experiments. Conclusion: The proposed approach with the adjustable field of view capability, non-contact design, and low cost of deployment without significantly altering the existing setup would open doors for upgrading traditional systems to a wide range of 3D US imaging applications. Significance: Ultrasound (US) imaging is a popular clinical imaging modality for the point-of-care bedside imaging, particularly of the wrist/knee in the pediatric population due to its non-invasive and radiation free nature. However, the limited views of tissue structures obtained with 2D US in such scenarios make the diagnosis challenging. To overcome this, 3D US imaging which uses 2D US images and their orientation/position to reconstruct 3D volumes was developed. The accurate position estimation of the US probe at low cost has always stood as a challenging task in 3D reconstruction. Additionally, US imaging involves contact, which causes difficulty to pediatric subjects while monitoring live fractures or open wounds. Towards overcoming these challenges, a novel framework is attempted in this work.
    摘要 目的:本工作旨在实现非接触式自由手 3D 超声成像,同时尽量不增加现有床旁超声(POCUS)系统的复杂度。方法:本研究提出了一种利用机械导轨进行非接触超声(US)扫描的新方法,将探头运动限制在一条直线平面内,以简化数据采集与 3D 重建过程;并基于超声研究平台和 GPU 边缘设备开发了 US 3D 体数据重建流水线。结果:通过离体与在体实验验证了所提方法的有效性。结论:该方法具有可调视野、非接触设计和低部署成本,且无需对现有系统做大幅改动,有望将传统系统升级到广泛的 3D 超声成像应用。意义:超声(US)成像因其无创、无辐射的特点,是床旁即时成像的常用临床手段,尤其适用于儿科人群的腕部/膝部成像。然而在此类场景下,2D 超声只能提供有限的组织结构视图,使诊断变得困难。为克服这一问题,人们发展了利用 2D 超声图像及其位姿重建 3D 体数据的 3D 超声成像,而以低成本准确估计探头位置一直是 3D 重建中的难题。此外,超声成像需要接触,在监测活动性骨折或开放性伤口时会给儿科受检者带来不适。为应对这些挑战,本工作尝试提出了一个新的框架。
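The abstract above argues that restricting probe motion to a linear plane simplifies 3D reconstruction. The sketch below illustrates why: with a known position along a single sweep axis per frame, the volume can be built by resampling the stack of 2D frames onto a regular grid along that axis. Linear interpolation, the grid spacing, and the toy data are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch: 3D volume from 2D frames acquired along a linear track.
import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
n_frames, H, W = 40, 128, 128
frames = rng.random((n_frames, H, W))                  # 2D B-mode frames (toy data)
positions = np.sort(rng.uniform(0.0, 40.0, n_frames))  # probe position (mm) per frame

# Regular grid along the sweep axis (0.5 mm spacing), then per-pixel 1D interpolation.
z_grid = np.arange(positions[0], positions[-1], 0.5)
interp = interp1d(positions, frames, axis=0, kind="linear")
volume = interp(z_grid)                                # (len(z_grid), H, W) 3D volume

print("reconstructed volume shape:", volume.shape)
```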

Hessian-based Similarity Metric for Multimodal Medical Image Registration

  • paper_url: http://arxiv.org/abs/2310.04009
  • repo_url: None
  • paper_authors: Mohammadreza Eskandari, Houssem-Eddine Gueziri, D. Louis Collins
  • for: 本文旨在为多模态医学图像配准提出一种新的相似性度量,用于衡量不同医学影像模态之间的相似性。
  • methods: 该方法基于图像灰度的海森矩阵(Hessian):先在两个完全对应的图像块的灰度存在函数依赖关系的假设下研究其海森矩阵之间的关系,再给出量化任意图像块对偏离该关系程度的闭式表达式,并提出几何解释与高效实现。
  • results: 实验表明,该相似性度量对灰度不均匀具有鲁棒性;将其集成到仿射配准框架后,在图像引导神经外科场景下以目标配准误差和计算时间评估了 MRI 与超声配准的性能。
    Abstract One of the fundamental elements of both traditional and certain deep learning medical image registration algorithms is measuring the similarity/dissimilarity between two images. In this work, we propose an analytical solution for measuring similarity between two different medical image modalities based on the Hessian of their intensities. First, assuming a functional dependence between the intensities of two perfectly corresponding patches, we investigate how their Hessians relate to each other. Secondly, we suggest a closed-form expression to quantify the deviation from this relationship, given arbitrary pairs of image patches. We propose a geometrical interpretation of the new similarity metric and an efficient implementation for registration. We demonstrate the robustness of the metric to intensity nonuniformities using synthetic bias fields. By integrating the new metric in an affine registration framework, we evaluate its performance for MRI and ultrasound registration in the context of image-guided neurosurgery using target registration error and computation time.
    摘要 无论是传统的还是某些基于深度学习的医学图像配准算法,其基本要素之一都是度量两幅图像之间的相似性/差异性。本文提出了一种基于图像灰度海森矩阵(Hessian)的解析方法,用于度量两种不同医学影像模态之间的相似性。首先,在假设两个完全对应的图像块的灰度之间存在函数依赖关系的前提下,我们研究了二者海森矩阵之间的关系;其次,针对任意一对图像块,我们给出了量化其偏离该关系程度的闭式表达式。我们对这一新相似性度量给出了几何解释,并提出了面向配准的高效实现。我们利用合成的偏置场验证了该度量对灰度不均匀性的鲁棒性,并将其集成到仿射配准框架中,在图像引导神经外科场景下,以目标配准误差和计算时间评估了其在 MRI 与超声配准中的性能。
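For readers who want the first step of the argument spelled out, the relation induced by a functional intensity dependence can be written as follows (our notation, assuming corresponding patches satisfy $J(\mathbf{x}) = f(I(\mathbf{x}))$ for a twice-differentiable $f$):

$$
\nabla J = f'(I)\,\nabla I,
\qquad
\mathbf{H}_J \;=\; f'(I)\,\mathbf{H}_I \;+\; f''(I)\,\nabla I\,\nabla I^{\top} .
$$

The paper's similarity metric quantifies, in closed form, how far an arbitrary pair of patches deviates from a relation of this kind.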

eess.SP - 2023-10-06

Deep Learning Based Active Spatial Channel Gain Prediction Using a Swarm of Unmanned Aerial Vehicles

  • paper_url: http://arxiv.org/abs/2310.04547
  • repo_url: None
  • paper_authors: Enes Krijestorac, Danijela Cabric
  • for: 在空间上预测无线信道增益(CG)是许多重要无线网络设计问题的基础工具。本文开发了利用环境特定特征(即建筑物地图和 CG 测量值)实现高预测精度的方法,并假设测量由一组协同工作的无人机(UAV)采集。
  • methods: 我们提出了两种主动预测方法,分别基于深度学习(DL)和克里金(Kriging)插值。第一种方法不依赖发射机位置,利用 3D 地图弥补位置信息的缺失:用 DL 将 3D 地图纳入预测,并基于 DL 预测用强化学习为无人机规划最优测量路径。第二种方法基于克里金插值,需要已知发射机位置,且无法利用 3D 地图。两种方法均在基于射线追踪的信道模拟器中训练和评估。
  • results: 仿真结果表明,与基于随机采集的信道增益测量的预测相比,主动预测更为重要;利用 DL 与 3D 地图,即使不知道发射机位置也能取得高预测精度;此外,与各无人机贪心地独立采集测量相比,多无人机协同路径规划对主动预测至关重要。
    Abstract Prediction of wireless channel gain (CG) across space is a necessary tool for many important wireless network design problems. In this paper, we develop prediction methods that use environment-specific features, namely building maps and CG measurements, to achieve high prediction accuracy. We assume that measurements are collected using a swarm of coordinated unmanned aerial vehicles (UAVs). We develop novel active prediction approaches which consist of both methods for UAV path planning for optimal measurement collection and methods for prediction of CG across space based on the collected measurements. We propose two active prediction approaches based on deep learning (DL) and Kriging interpolation. The first approach does not rely on the location of the transmitter and utilizes 3D maps to compensate for the lack of it. We utilize DL to incorporate 3D maps into prediction and reinforcement learning for optimal path planning for the UAVs based on DL prediction. The second active prediction approach is based on Kriging interpolation, which requires known transmitter location and cannot utilize 3D maps. We train and evaluate the two proposed approaches in a ray-tracing-based channel simulator. Using simulations, we demonstrate the importance of active prediction compared to prediction based on randomly collected measurements of channel gain. Furthermore, we show that using DL and 3D maps, we can achieve high prediction accuracy even without knowing the transmitter location. We also demonstrate the importance of coordinated path planning for active prediction when using multiple UAVs compared to UAVs collecting measurements independently in a greedy manner.
    摘要 在空间上预测无线信道增益(CG)是许多重要无线网络设计问题的必备工具。本文开发了利用环境特定特征(即建筑物地图与 CG 测量值)实现高预测精度的方法,并假设测量由一组协同工作的无人机(UAV)采集。我们提出了新颖的主动预测方案,既包括为最优测量采集进行无人机路径规划的方法,也包括基于所采集测量在空间上预测 CG 的方法。我们给出了两种主动预测方法,分别基于深度学习(DL)和克里金(Kriging)插值。第一种方法不依赖发射机位置,并利用 3D 地图弥补这一信息的缺失:我们用 DL 将 3D 地图纳入预测,并基于 DL 预测用强化学习为无人机规划最优路径。第二种方法基于克里金插值,需要已知发射机位置,且无法利用 3D 地图。我们在基于射线追踪的信道模拟器中训练并评估了这两种方法。仿真结果表明,与基于随机采集的信道增益测量进行预测相比,主动预测更为重要;利用 DL 与 3D 地图,即使不知道发射机位置也能取得高预测精度;我们还说明了在使用多架无人机时,协同路径规划相比各无人机贪心地独立采集测量对主动预测的重要性。
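The second family of methods in the abstract above is Kriging interpolation of channel gain from scattered measurements. The sketch below shows a generic Kriging-style (Gaussian-process) interpolation of that kind; the RBF kernel, its length scale, the noise level, and the synthetic path-loss field are illustrative assumptions and not the paper's exact setup.

```python
# Generic Kriging-style (GP) interpolation of channel gain from scattered UAV samples.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
tx = np.array([50.0, 50.0])                               # known transmitter location (m)

def channel_gain_db(xy):                                  # toy log-distance path loss
    d = np.linalg.norm(xy - tx, axis=1).clip(min=1.0)
    return -30.0 - 35.0 * np.log10(d) + rng.normal(scale=2.0, size=len(xy))  # shadowing

X_meas = rng.uniform(0, 100, size=(60, 2))                # UAV measurement locations
y_meas = channel_gain_db(X_meas)

kernel = 1.0 * RBF(length_scale=20.0) + WhiteKernel(noise_level=4.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_meas, y_meas)

X_query = np.array([[25.0, 75.0], [80.0, 10.0]])
mean, std = gp.predict(X_query, return_std=True)
print("predicted CG (dB):", np.round(mean, 1), "+/-", np.round(std, 1))
```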

Evolution of High Throughput Satellite Systems: Vision, Requirements, and Key Technologies

  • paper_url: http://arxiv.org/abs/2310.04389
  • repo_url: None
  • paper_authors: Olfa Ben Yahia, Zineb Garroussi, Olivier Bélanger, Brunilde Sansò, Jean-François Frigon, Stéphane Martel, Antoine Lesage-Landry, Gunes Karabulut Kurt
  • for: The paper provides a comprehensive state-of-the-art of high throughput satellite (HTS) systems and envisions the next generation of extremely high-throughput satellite (EHTS) systems.
  • methods: The paper discusses various techniques such as beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas that are being used to improve the performance of HTS systems.
  • results: The paper provides a vision for future EHTS systems that will maximize spectrum reuse and data rates, and flexibly steer capacity to satisfy user demand. Additionally, the paper introduces a novel architecture for future regenerative payloads and summarizes the challenges imposed by this architecture.
    Abstract High throughput satellites (HTS), with their digital payload technology, are expected to play a key role as enablers of the upcoming 6G networks. HTS are mainly designed to provide higher data rates and capacities. Fueled by technological advancements including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS have emerged as a fundamental component for future network generation. This paper offers a comprehensive state-of-the-art of HTS systems, with a focus on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-satellite systems that we named as extremely-HTS (EHTS) toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed such that it maximizes spectrum reuse and data rates, and flexibly steers the capacity to satisfy user demand. We introduce a novel architecture for future regenerative payloads while summarizing the challenges imposed by this architecture.
    摘要 高吞吐量卫星(HTS)凭借其数字载荷技术,有望成为即将到来的 6G 网络的关键使能者。HTS 的设计初衷主要是提供更高的数据速率和容量。在波束赋形、先进调制技术、可重构相控阵技术以及电子可控天线等技术进步的推动下,HTS 已成为未来网络代际演进的基础组成部分。本文对 HTS 系统的最新进展进行了全面综述,重点涵盖标准化、专利、信道多址接入技术、路由、负载均衡以及软件定义网络(SDN)的作用。此外,我们提出了面向下一代卫星系统的愿景,并将其命名为“极高吞吐量卫星”(EHTS),阐述了迈向自主卫星所需的主要需求与关键技术。EHTS 系统的设计目标是最大化频谱复用和数据速率,并能灵活调度容量以满足用户需求。我们还介绍了一种面向未来再生式载荷的新型架构,并总结了该架构带来的挑战。

  • paper_url: http://arxiv.org/abs/2310.04364
  • repo_url: None
  • paper_authors: Zhongyuan Zhao, Gunjan Verma, Ananthram Swami, Santiago Segarra
  • for: 提升无线多跳网络中分布式路由与调度的效率,降低端到端时延
  • methods: 采用带偏置且具备最短路径感知的反压(BP)路由,以及基于驻留时间的积压度量,且每个时间步不增加额外的信令开销
  • results: 解决了低开销偏置 BP 的三大长期挑战——偏置的最优缩放、移动场景下的偏置维护、以及将驻留时间感知融入偏置 BP——并通过分析与实验验证了其效果
    Abstract Backpressure (BP) routing is a well-established framework for distributed routing and scheduling in wireless multi-hop networks. However, the basic BP scheme suffers from poor end-to-end delay due to the drawbacks of slow startup, random walk, and the last packet problem. Biased BP with shortest path awareness can address the first two drawbacks, and sojourn time-based backlog metrics were proposed for the last packet problem. Furthermore, these BP variations require no additional signaling overhead in each time step compared to the basic BP. In this work, we further address three long-standing challenges associated with the aforementioned low-cost BP variations, including optimal scaling of the biases, bias maintenance under mobility, and incorporating sojourn time awareness into biased BP. Our analysis and experimental results show that proper scaling of biases can be achieved with the help of common link features, which can effectively reduce end-to-end delay of BP by mitigating the random walk of packets under low-to-medium traffic, including the last packet scenario. In addition, our low-overhead bias maintenance scheme is shown to be effective under mobility, and our bio-inspired sojourn time-aware backlog metric is demonstrated to be more efficient and effective for the last packet problem than existing approaches when incorporated into biased BP.
    摘要 反压(BP)路由是无线多跳网络中分布式路由与调度的一个成熟框架。然而,基础 BP 方案由于启动缓慢、随机游走以及“最后一个数据包”问题,端到端时延表现较差。具备最短路径感知的带偏置 BP 可以解决前两个缺陷,而针对最后一个数据包问题,人们提出了基于驻留时间的积压度量。此外,与基础 BP 相比,这些 BP 变体在每个时间步都不需要额外的信令开销。在本工作中,我们进一步解决了与上述低开销 BP 变体相关的三大长期挑战:偏置的最优缩放、移动场景下的偏置维护、以及将驻留时间感知融入带偏置 BP。我们的分析与实验结果表明,借助常见的链路特征可以实现偏置的合理缩放,从而在低到中等流量(包括最后一个数据包场景)下减轻数据包的随机游走,有效降低 BP 的端到端时延。此外,我们的低开销偏置维护方案在移动场景下被证明是有效的;将受生物启发的驻留时间感知积压度量融入带偏置 BP 后,其在最后一个数据包问题上比现有方法更高效、更有效。
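To make the biased backpressure idea in the abstract above concrete, the sketch below shows a single scheduling decision on one link: the per-commodity backlog differential is augmented with a bias, and the link serves the commodity with the largest positive weight. Using the shortest-path hop distance to the destination as the bias and the fixed scaling factor are common illustrative choices; how such biases should be scaled (using link features) is the paper's subject and is not reproduced here.

```python
# Sketch of one biased-backpressure decision on a single link i -> j.
def biased_bp_link_decision(q_i, q_j, dist_i, dist_j, bias_scale=1.0):
    """
    q_i, q_j      : dict commodity -> queue backlog at nodes i and j
    dist_i, dist_j: dict commodity -> shortest-path distance (hops) to the destination
    Returns (commodity, weight) to serve on link i -> j, or (None, 0.0) to stay idle.
    """
    best, best_w = None, 0.0
    for c in q_i:
        backlog_diff = q_i[c] - q_j.get(c, 0.0)
        bias = bias_scale * (dist_i[c] - dist_j[c])   # positive if j is closer to dest
        w = backlog_diff + bias
        if w > best_w:
            best, best_w = c, w
    return best, best_w

# Toy example: commodity "a" has a small backlog differential, but node j is two hops
# closer to its destination, so the bias lets it win over commodity "b".
q_i, q_j = {"a": 3.0, "b": 5.0}, {"a": 2.0, "b": 4.5}
dist_i, dist_j = {"a": 4, "b": 2}, {"a": 2, "b": 2}
print(biased_bp_link_decision(q_i, q_j, dist_i, dist_j, bias_scale=1.0))  # ('a', 3.0)
```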

A physics-informed generative model for passive radio-frequency sensing

  • paper_url: http://arxiv.org/abs/2310.04173
  • repo_url: None
  • paper_authors: Stefano Savazzi, Federica Fieramosca, Sanaz Kianoush, Vittorio Rampa, Michele D’amico
  • for: 电磁(EM)人体模型用于预测人体存在与运动对附近无线设备所接收的射频(RF)杂散辐射的影响,可应用于被动定位、RF 层析成像和全息成像等对实时性要求严格的计算成像与贝叶斯估计问题
  • methods: 采用物理先验引导的生成神经网络(GNN)模型,将相关的电磁人体衍射方法融入变分自编码器(VAE)技术,用以模拟/重建缺失样本,或学习符合物理规律的数据分布
  • results: 所提出的电磁先验生成模型与经典的基于衍射的电磁人体工具进行了对比验证,并在真实 RF 测量数据上得到了验证;文中还介绍并讨论了相关应用
    Abstract Electromagnetic (EM) body models predict the impact of human presence and motions on the Radio-Frequency (RF) stray radiation received by wireless devices nearby. These wireless devices may be co-located members of a Wireless Local Area Network (WLAN) or even cellular devices connected with a Wide Area Network (WAN). Despite their accuracy, EM models are time-consuming methods which prevent their adoption in strict real-time computational imaging problems and Bayesian estimation, such as passive localization, RF tomography, and holography. Physics-informed Generative Neural Network (GNN) models have recently attracted a lot of attention thanks to their potential to reproduce a process by incorporating relevant physical laws and constraints. Thus, GNNs can be used to simulate/reconstruct missing samples, or learn physics-informed data distributions. The paper discusses a Variational Auto-Encoder (VAE) technique and its adaptations to incorporate a relevant EM body diffraction method with applications to passive RF sensing and localization/tracking. The proposed EM-informed generative model is verified against classical diffraction-based EM body tools and validated on real RF measurements. Applications are also introduced and discussed.
    摘要 电磁(EM)人体模型用于预测人体的存在与运动对附近无线设备所接收的射频(RF)杂散辐射的影响。这些无线设备可以是位于同一场所的无线局域网(WLAN)成员,也可以是接入广域网(WAN)的蜂窝设备。尽管电磁模型精度较高,但其计算耗时,难以用于对实时性要求严格的计算成像问题和贝叶斯估计,例如被动定位、RF 层析成像与全息成像。物理先验引导的生成神经网络(GNN)模型近来受到广泛关注,因为它们能够通过引入相关的物理规律与约束来再现某一过程,因此可用于模拟/重建缺失样本,或学习符合物理规律的数据分布。本文讨论了一种变分自编码器(VAE)技术及其改进,将相关的电磁人体衍射方法纳入其中,并应用于被动 RF 感知与定位/跟踪。所提出的电磁先验生成模型与经典的基于衍射的电磁人体工具进行了对比验证,并在真实 RF 测量数据上得到了验证。文中还介绍并讨论了相关应用。

Physics-assisted machine learning for THz spectroscopy: sensing moisture on plant leaves

  • paper_url: http://arxiv.org/abs/2310.04056
  • repo_url: None
  • paper_authors: Milan Koumans, Daan Meulendijks, Haiko Middeljans, Djero Peeters, Jacob C. Douma, Dook van Mechelen
  • for: 本文旨在利用机器学习技术推动太赫兹(THz)时域光谱技术走向成熟,以实现实际应用,目标场景是农业中测定植物叶片上的自由水量(即叶面湿度)
  • methods: 在光与物质相互作用的领域知识辅助下,采用决策树和卷积神经网络等机器学习技术,并基于物理动机进行特征与模型选择
  • results: 通过 THz 时域光谱实验获取了 12,000 种不同水分分布图案下潮湿叶片的整体透射数据;给出了应用上述模型的关键发现,并在与训练集偏差逐渐增大的测试样例上讨论了模型的泛化能力
    Abstract Signal processing techniques are of vital importance to bring THz spectroscopy to a maturity level to reach practical applications. In this work, we illustrate the use of machine learning techniques for THz time-domain spectroscopy assisted by domain knowledge based on light-matter interactions. We aim at the potential agriculture application to determine the amount of free water on plant leaves, so-called leaf wetness. This quantity is important for understanding and predicting plant diseases that need leaf wetness for disease development. The overall transmission of a moist plant leaf for 12,000 distinct water patterns was experimentally acquired using THz time-domain spectroscopy. We report on key insights of applying decision trees and convolutional neural networks to the data using physics-motivated choices. Eventually, we discuss the generalizability of these models to determine leaf wetness after testing them on cases with increasing deviations from the training set.
    摘要 <>translate "Signal processing techniques are of vital importance to bring THz spectroscopy to a maturity level to reach practical applications. In this work, we illustrate the use of machine learning techniques for THz time-domain spectroscopy assisted by domain knowledge based on light-matter interactions. We aim at the potential agriculture application to determine the amount of free water on plant leaves, so-called leaf wetness. This quantity is important for understanding and predicting plant diseases that need leaf wetness for disease development. The overall transmission of a moist plant leaf for 12,000 distinct water patterns was experimentally acquired using THz time-domain spectroscopy. We report on key insights of applying decision trees and convolutional neural networks to the data using physics-motivated choices. Eventually, we discuss the generalizability of these models to determine leaf wetness after testing them on cases with increasing deviations from the training set." into Simplified Chinese.中文简体版:信号处理技术对于 THz спектроскопия的成熔度具有核心重要性,以实现实用应用。本工作介绍了基于光物理相互作用的机器学习技术在 THz 时域спектроскопии中的应用。我们target了农业应用,通过测量植物叶子上的自由水量,也称为叶质湿度。这个量对于理解和预测植物疾病非常重要,疾病发展需要叶质湿度。我们通过 THz 时域спектроскопии实验获得了12,000个不同水平的叶质湿度数据。我们使用决策树和卷积神经网络对数据进行分析,并根据物理原理进行选择。最后,我们讨论了这些模型在不同于训练集的情况下的泛化性。

cs.SD - 2023-10-05

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios

  • paper_url: http://arxiv.org/abs/2310.03938
  • repo_url: None
  • paper_authors: Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe
  • for: 提升多语言及低资源场景下端到端语音识别任务的性能
  • methods: 用单个自监督学习(SSL)模型预测其他 SSL 模型的特征,以替代直接融合多个 SSL 模型,从而得到轻量级框架
  • results: 在 ML-SUPERB 基准上,SSL 特征预测模型优于单个 SSL 模型,最优预测模型的平均 SUPERB 分数提升 135.4;同时相比以往的融合模型,显著减小了模型参数量并缩短了推理时间
    Abstract Self-Supervised Learning (SSL) models have demonstrated exceptional performance in various speech tasks, particularly in low-resource and multilingual domains. Recent works show that fusing SSL models could achieve superior performance compared to using one SSL model. However, fusion models have increased model parameter size, leading to longer inference times. In this paper, we propose a novel approach of predicting other SSL models' features from a single SSL model, resulting in a light-weight framework with competitive performance. Our experiments show that SSL feature prediction models outperform individual SSL models in multilingual speech recognition tasks. The leading prediction model achieves an average SUPERB score increase of 135.4 in ML-SUPERB benchmarks. Moreover, our proposed framework offers an efficient solution, as it reduces the resulting model parameter size and inference times compared to previous fusion models.
    摘要 自监督学习(SSL)模型在各类语音任务中表现出色,尤其是在低资源和多语言场景下。近期研究表明,融合多个 SSL 模型可以取得优于单一 SSL 模型的性能,但融合模型的参数量随之增大,导致推理时间变长。本文提出了一种新颖的方法:用单个 SSL 模型预测其他 SSL 模型的特征,从而得到一个性能具有竞争力的轻量级框架。实验表明,SSL 特征预测模型在多语言语音识别任务中优于单个 SSL 模型,最优的预测模型在 ML-SUPERB 基准上平均 SUPERB 分数提升 135.4。此外,与以往的融合模型相比,所提框架减小了最终模型的参数量并缩短了推理时间,是一种高效的解决方案。
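The core idea stated in the abstract above, predicting another SSL model's features from a single SSL model, can be sketched with a small prediction head. This is not EFFUSE's exact design: the MLP head, the MSE objective, and fusion by concatenation are illustrative assumptions.

```python
# Minimal sketch: predict SSL model B's frame-level features from model A's features,
# then fuse real and predicted features instead of running both models at inference.
import torch
import torch.nn as nn

class FeaturePredictor(nn.Module):
    """Predicts frame-level features of SSL model B from features of SSL model A."""
    def __init__(self, dim_a=768, dim_b=1024, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_a, hidden), nn.GELU(), nn.Linear(hidden, dim_b))

    def forward(self, feats_a):                 # (batch, frames, dim_a)
        return self.net(feats_a)                # (batch, frames, dim_b)

predictor = FeaturePredictor()
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

# Training step, with random tensors standing in for the two SSL models' outputs.
feats_a = torch.randn(8, 200, 768)              # from the single model actually run
feats_b = torch.randn(8, 200, 1024)             # target features (only needed in training)
loss = nn.functional.mse_loss(predictor(feats_a), feats_b)
opt.zero_grad(); loss.backward(); opt.step()

# Inference: fuse real features with predicted ones for the downstream ASR encoder.
fused = torch.cat([feats_a, predictor(feats_a)], dim=-1)   # (8, 200, 768 + 1024)
print(fused.shape)
```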

Challenges and Insights: Exploring 3D Spatial Features and Complex Networks on the MISP Dataset

  • paper_url: http://arxiv.org/abs/2310.03901
  • repo_url: None
  • paper_authors: Yiwen Shao
  • for: 本研究旨在探讨多通道多说话人语音识别中背景噪声、混响与语音重叠等问题,以及如何利用上下文线索从嘈杂混合信号中分离目标说话人的语音
  • methods: 采用 3D 空间特征,即利用目标说话人的空间位置信息来辅助分离与识别
  • results: 借助 3D 空间特征常常可以省去中间处理环节,直接训练“一体化”ASR 模型;在 MISP 数据集上的扩展实验验证了该方法的有效性,并探讨了用更复杂的输入与模型替换这些特征的初步实验
    Abstract Multi-channel multi-talker speech recognition presents formidable challenges in the realm of speech processing, marked by issues such as background noise, reverberation, and overlapping speech. Overcoming these complexities requires leveraging contextual cues to separate target speech from a cacophonous mix, enabling accurate recognition. Among these cues, the 3D spatial feature has emerged as a cutting-edge solution, particularly when equipped with spatial information about the target speaker. Its exceptional ability to discern the target speaker within mixed audio, often rendering intermediate processing redundant, paves the way for the direct training of "All-in-one" ASR models. These models have demonstrated commendable performance on both simulated and real-world data. In this paper, we extend this approach to the MISP dataset to further validate its efficacy. We delve into the challenges encountered and insights gained when applying 3D spatial features to MISP, while also exploring preliminary experiments involving the replacement of these features with more complex input and models.
    摘要 多通道多说话人语音识别在语音处理领域面临严峻挑战,其特点是背景噪声、混响和语音重叠等问题。要克服这些复杂因素,需要利用上下文线索从嘈杂的混合信号中分离出目标语音,以实现准确识别。在这些线索中,3D 空间特征已成为一种前沿解决方案,尤其是在具备目标说话人空间位置信息的情况下:它能在混合音频中出色地辨别目标说话人,常常使中间处理环节变得多余,从而为直接训练“一体化”(All-in-one)ASR 模型铺平道路。此类模型在仿真数据和真实数据上均表现出色。本文将这一方法推广到 MISP 数据集,以进一步验证其有效性。我们详细讨论了将 3D 空间特征应用于 MISP 时遇到的挑战与获得的洞见,并探讨了用更复杂的输入与模型替换这些特征的初步实验。

Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

  • paper_url: http://arxiv.org/abs/2310.03889
  • repo_url: None
  • paper_authors: Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren
  • for: 本研究旨在揭示真实生活中的声学场景与最相关音频事件语义嵌入之间的关系
  • methods: 提出了一种事件关系图表示学习(ERGL)框架用于声学场景分类,同时清晰直接地说明分类所依据的线索;在事件关系图中,每个事件的嵌入被视为节点,每对节点之间的关系线索用多维边特征描述
  • results: 在真实声学场景分类数据集上,所提出的 ERGL 仅利用有限数量的音频事件嵌入即取得了具有竞争力的性能,表明基于音频事件关系图识别多样化声学场景是可行的;ERGL 学到的图表示的可视化见 https://github.com/Yuanbo2020/ERGL
    Abstract Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This paper conducts the first study on disclosing the relationship between real-life acoustic scenes and semantic embeddings from the most relevant AEs. Specifically, we propose an event-relational graph representation learning (ERGL) framework for ASC to classify scenes, and simultaneously answer clearly and straightly which cues are used in classifying. In the event-relational graph, embeddings of each event are treated as nodes, while relationship cues derived from each pair of nodes are described by multi-dimensional edge features. Experiments on a real-life ASC dataset show that the proposed ERGL achieves competitive performance on ASC by learning embeddings of only a limited number of AEs. The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph. Visualizations of graph representations learned by ERGL are available here (https://github.com/Yuanbo2020/ERGL).
    摘要 大多数深度学习基于的声音场景分类(ASC)方法都是基于声音特征,将混合多种声音事件(AEs)转换为声音特征。然而,这些方法往往难以解释它们如何标识场景。本文提出了第一个研究声音场景与 Semantic Embeddings 之间的关系的研究,以及一种Event-Relational Graph Representation Learning(ERGL)框架,用于分类场景。在事件关系图中,每个事件的嵌入被视为节点,而每对节点之间的关系cue被描述为多维边feature。实验表明,提出的ERGL可以在一个有限数量的AEs上达到竞争力的ASC性能。结果表明,可以通过声音事件关系图来识别多样化的声音场景。可以在这里查看Visualization of ERGL学习的图表(https://github.com/Yuanbo2020/ERGL)。

Securing Voice Biometrics: One-Shot Learning Approach for Audio Deepfake Detection

  • paper_url: http://arxiv.org/abs/2310.03856
  • repo_url: None
  • paper_authors: Awais Khan, Khalid Mahmood Malik
  • for: 防御针对语音生物识别系统的欺骗攻击,尤其是利用音频深度伪造(audio deepfake)发起的逻辑访问攻击
  • methods: 采用单样本学习(one-shot learning)与度量学习技术检测已知及未知统计分布的合成攻击:利用有效的频谱特征集提取紧凑且具代表性的时序嵌入,并结合三元组损失评估相似度
  • results: 在 ASVspoof 2019 逻辑访问(LA)数据集上评估了 Quick-SpoofNet,并针对 ASVspoof 2021 中未见过的深度伪造攻击以及 VSDC 数据集中未见过的真实语音测试了其泛化能力
    Abstract The Automatic Speaker Verification (ASV) system is vulnerable to fraudulent activities using audio deepfakes, also known as logical-access voice spoofing attacks. These deepfakes pose a concerning threat to voice biometrics due to recent advancements in generative AI and speech synthesis technologies. While several deep learning models for speech synthesis detection have been developed, most of them show poor generalizability, especially when the attacks have different statistical distributions from the ones seen. Therefore, this paper presents Quick-SpoofNet, an approach for detecting both seen and unseen synthetic attacks in the ASV system using one-shot learning and metric learning techniques. By using the effective spectral feature set, the proposed method extracts compact and representative temporal embeddings from the voice samples and utilizes metric learning and triplet loss to assess the similarity index and distinguish different embeddings. The system effectively clusters similar speech embeddings, classifying bona fide speeches as the target class and identifying other clusters as spoofing attacks. The proposed system is evaluated using the ASVspoof 2019 logical access (LA) dataset and tested against unseen deepfake attacks from the ASVspoof 2021 dataset. Additionally, its generalization ability towards unseen bona fide speech is assessed using speech data from the VSDC dataset.
    摘要 自动说话人验证(ASV)系统容易受到利用音频深度伪造(即逻辑访问型语音欺骗攻击)实施的欺诈活动的威胁。随着生成式 AI 与语音合成技术的最新进展,这类深度伪造对语音生物识别构成了令人担忧的威胁。尽管已有多种用于语音合成检测的深度学习模型被提出,但它们大多泛化能力较差,尤其是当攻击的统计分布与已见过的不同时。为此,本文提出了 Quick-SpoofNet,一种利用单样本学习与度量学习技术在 ASV 系统中检测已见与未见合成攻击的方法。该方法利用有效的频谱特征集从语音样本中提取紧凑且具代表性的时序嵌入,并借助度量学习与三元组损失评估相似度指数、区分不同的嵌入。系统能够有效聚类相似的语音嵌入,将真实语音归为目标类别,并将其他聚类识别为欺骗攻击。我们在 ASVspoof 2019 逻辑访问(LA)数据集上评估了所提系统,并针对 ASVspoof 2021 数据集中未见过的深度伪造攻击进行了测试;此外,还利用 VSDC 数据集中的语音数据评估了其对未见真实语音的泛化能力。
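The metric-learning component described in the abstract above (temporal embeddings trained with a triplet loss so that bona fide speech clusters together and spoofed speech is pushed away) can be sketched as follows. The encoder, margin, and spectral-feature dimensions are illustrative assumptions, not Quick-SpoofNet's architecture.

```python
# Sketch of triplet-loss metric learning on utterance embeddings for spoof detection.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, n_feats=60, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_feats, 128, batch_first=True)
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, x):                        # x: (batch, frames, n_feats) spectral features
        _, h = self.rnn(x)                       # h: (1, batch, 128) last hidden state
        return nn.functional.normalize(self.proj(h[-1]), dim=-1)

net = EmbeddingNet()
triplet = nn.TripletMarginLoss(margin=0.5)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

anchor = torch.randn(16, 300, 60)    # bona fide utterances
positive = torch.randn(16, 300, 60)  # other bona fide utterances (same class)
negative = torch.randn(16, 300, 60)  # spoofed utterances

loss = triplet(net(anchor), net(positive), net(negative))
opt.zero_grad(); loss.backward(); opt.step()
print("triplet loss:", float(loss))
```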

Speaker localization using direct path dominance test based on sound field directivity

  • paper_url: http://arxiv.org/abs/2310.03688
  • repo_url: None
  • paper_authors: Boaz Rafaely, Koby Alhaiany
  • for: 本研究旨在开发一种对混响具有鲁棒性且计算高效的说话人波达方向(DoA)估计方法
  • methods: 提出了一种基于传声器阵列指向性度量的直接路径主导(DPD)检验的低复杂度替代方案,无需频率平滑和矩阵分解,并针对球形传声器阵列的声场指向性进行了重新推导
  • results: 在多种混响与噪声条件下,所提方法在鲁棒性上与原方法相当,而在给定实验中计算效率约为原方法的四倍
    Abstract Estimation of the direction-of-arrival (DoA) of a speaker in a room is important in many audio signal processing applications. Environments with reverberation that masks the DoA information are particularly challenging. Recently, a DoA estimation method that is robust to reverberation has been developed. This method identifies time-frequency bins dominated by the contribution from the direct path, which carries the correct DoA information. However, its implementation is computationally demanding as it requires frequency smoothing to overcome the effect of coherent early reflections and matrix decomposition to apply the direct-path dominance (DPD) test. In this work, a novel computationally-efficient alternative to the DPD test is proposed, based on the directivity measure for sensor arrays, which requires neither frequency smoothing nor matrix decomposition, and which has been reformulated for sound field directivity with spherical microphone arrays. The paper presents the proposed method and a comparison to previous methods under a range of reverberation and noise conditions. Result demonstrate that the proposed method shows comparable performance to the original method in terms of robustness to reverberation and noise, and is about four times more computationally efficient for the given experiment.
    摘要 在许多音频信号处理应用中,估计房间内说话人的波达方向(DoA)十分重要,而混响会掩盖 DoA 信息,使这类环境尤具挑战性。最近,一种对混响具有鲁棒性的 DoA 估计方法被提出:它识别由直达路径贡献主导的时频单元,这些单元携带了正确的 DoA 信息。然而,该方法实现时计算量较大,需要频率平滑以克服相干早期反射的影响,并需要矩阵分解来执行直接路径主导(DPD)检验。本工作提出了一种计算高效的 DPD 检验替代方案,基于传声器阵列的指向性度量,既不需要频率平滑,也不需要矩阵分解,并针对球形传声器阵列的声场指向性进行了重新推导。文中介绍了所提方法,并在多种混响与噪声条件下与既有方法进行了比较。结果表明,所提方法在对混响和噪声的鲁棒性方面与原方法相当,且在给定实验中计算效率约为原方法的四倍。

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems

  • paper_url: http://arxiv.org/abs/2310.03455
  • repo_url: https://github.com/ronfrancesca/sed_carbon_footprint
  • paper_authors: Francesca Ronchini, Romain Serizel
  • for: 本研究关注声音事件检测(SED)系统日益增加的复杂度与能耗,及其对环境的影响
  • methods: 以 DCASE 挑战赛相应任务近两年的参赛系统为基础进行比较,并对本年度的 SED 系统进行详细分析
  • results: 探讨了 SED 系统逐年演进与其能效影响之间的关系
    Abstract In recent years, deep learning systems have shown a concerning trend toward increased complexity and higher energy consumption. As researchers in this domain and organizers of one of the Detection and Classification of Acoustic Scenes and Events challenges tasks, we recognize the importance of addressing the environmental impact of data-driven SED systems. In this paper, we propose an analysis focused on SED systems based on the challenge submissions. This includes a comparison across the past two years and a detailed analysis of this year's SED systems. Through this research, we aim to explore how the SED systems are evolving every year in relation to their energy efficiency implications.
    摘要 近年来,深度学习系统在复杂度和能耗方面呈现出令人担忧的增长趋势。作为该领域的研究人员以及声学场景与事件检测分类(DCASE)挑战赛某项任务的组织者,我们认识到关注数据驱动的 SED 系统的环境影响十分重要。本文提出了一项基于挑战赛参赛系统的分析,包括对过去两年系统的比较以及对本年度 SED 系统的详细分析。通过这项研究,我们旨在探讨 SED 系统逐年演进与其能效影响之间的关系。

VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice

  • paper_url: http://arxiv.org/abs/2310.03444
  • repo_url: None
  • paper_authors: Frederik Bous, Axel Roebel
  • for: voice transformation, disentanglement of F0 parameter
  • methods: dropout-based information bottleneck auto-encoder, adaptive bottleneck size
  • results: improved disentanglement of F0 parameter for both speech and singing voice, improved synthesis quality, universal voice model for both speech and singing voice
    Abstract The information bottleneck auto-encoder is a tool for disentanglement commonly used for voice transformation. The successful disentanglement relies on the right choice of bottleneck size. Previous bottleneck auto-encoders created the bottleneck by the dimension of the latent space or through vector quantization and had no means to change the bottleneck size of a specific model. As the bottleneck removes information from the disentangled representation, the choice of bottleneck size is a trade-off between disentanglement and synthesis quality. We propose to build the information bottleneck using dropout which allows us to change the bottleneck through the dropout rate and investigate adapting the bottleneck size depending on the context. We experimentally explore into using the adaptive bottleneck for pitch transformation and demonstrate that the adaptive bottleneck leads to improved disentanglement of the F0 parameter for both, speech and singing voice leading to improved synthesis quality. Using the variable bottleneck size, we were able to achieve disentanglement for singing voice including extremely high pitches and create a universal voice model, that works on both speech and singing voice with improved synthesis quality.
    摘要 信息瓶颈自编码器是常用于语音变换的一种解耦工具,而成功的解耦依赖于瓶颈大小的正确选择。以往的瓶颈自编码器通过潜在空间的维度或向量量化来构造瓶颈,无法在特定模型上调整瓶颈大小。由于瓶颈会从解耦表示中去除信息,瓶颈大小的选择实际上是解耦程度与合成质量之间的权衡。我们提出利用 dropout 来构建信息瓶颈,从而可以通过 dropout 率改变瓶颈大小,并研究根据上下文自适应地调整瓶颈大小。我们在音高变换实验中探索了自适应瓶颈的使用,结果表明它在语音和歌声上都能更好地解耦 F0 参数,从而提升合成质量。借助可变的瓶颈大小,我们实现了包括极高音高在内的歌声解耦,并构建了一个在语音与歌声上均适用、合成质量更佳的通用语音模型。
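The mechanism named in the abstract above, a dropout-based bottleneck whose size can be changed through the dropout rate, is easy to sketch. The encoder/decoder below are placeholders, and how the rate is scheduled or adapted to context is the paper's contribution and is not reproduced here.

```python
# Minimal sketch of a dropout-based information bottleneck with a call-time rate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutBottleneckAE(nn.Module):
    def __init__(self, in_dim=80, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x, bottleneck_rate):
        z = self.encoder(x)
        # The information bottleneck: randomly zeroing latent units removes information;
        # a higher rate means a tighter bottleneck (stronger disentanglement, less detail).
        z = F.dropout(z, p=bottleneck_rate, training=True)
        return self.decoder(z)

model = DropoutBottleneckAE()
frames = torch.randn(32, 80)                     # e.g. mel-spectrogram frames
loose = model(frames, bottleneck_rate=0.1)       # wide bottleneck
tight = model(frames, bottleneck_rate=0.7)       # narrow bottleneck, same model
print(loose.shape, tight.shape)
```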

eess.AS - 2023-10-05

Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

  • paper_url: http://arxiv.org/abs/2310.03538
  • repo_url: None
  • paper_authors: Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim
  • for: 提高 zero-shot text-to-speech(ZS-TTS)系统的性能
  • methods: 在 ZS-TTS 系统的说话人嵌入空间中采用简单而有效的潜在空间数据增强方法(Latent Filling,LF),并结合一致性损失,无需额外训练阶段即可集成到现有系统中
  • results: LF 显著提升了说话人相似度,同时保持了语音质量
    Abstract Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts simple but effective latent space data augmentation in the speaker embedding space of the ZS-TTS system. By incorporating a consistency loss, LF can be seamlessly integrated into existing ZS-TTS systems without the need for additional training stages. Experimental results show that LF significantly improves speaker similarity while preserving speech quality.
    摘要 以往的零样本文本到语音(ZS-TTS)研究尝试通过众包扩充训练数据或对已有语音数据做增强来提升系统性能,但低质量数据的引入反而导致整体性能下降。为避免这种退化,我们不直接增强输入数据,而是提出一种潜在填充(Latent Filling,LF)方法,在 ZS-TTS 系统的说话人嵌入空间中进行简单而有效的潜在空间数据增强。通过引入一致性损失,LF 可以无缝集成到现有的 ZS-TTS 系统中,而无需额外的训练阶段。实验结果表明,LF 在保持语音质量的同时显著提升了说话人相似度。

cs.CV - 2023-10-05

Diffusion Models as Masked Audio-Video Learners

  • paper_url: http://arxiv.org/abs/2310.03937
  • repo_url: None
  • paper_authors: Elvis Nunez, Yanzi Jin, Mohammad Rastegari, Sachin Mehta, Maxwell Horton
  • for: 本文探讨扩散模型与 MAViL 架构的协同作用,以期从这两种框架中相互获益
  • methods: 在 Masked Audio-Video Learners(MAViL)架构(将对比学习与掩码自编码相结合,融合两种模态信息以联合重建音频声谱图与视频帧)中引入扩散模型,并结合掩码比例课程学习与自适应批量大小等训练效率技术
  • results: 预训练浮点运算量(FLOPS)减少 32%,预训练墙钟时间减少 18%,且在下游音频分类任务上的性能与 MAViL 相当
    Abstract Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations. Aided by the large availability of unlabeled videos, many unsupervised training frameworks have demonstrated impressive results in various downstream audio and video tasks. Recently, Masked Audio-Video Learners (MAViL) has emerged as a state-of-the-art audio-video pre-training framework. MAViL couples contrastive learning with masked autoencoding to jointly reconstruct audio spectrograms and video frames by fusing information from both modalities. In this paper, we study the potential synergy between diffusion models and MAViL, seeking to derive mutual benefits from these two frameworks. The incorporation of diffusion into MAViL, combined with various training efficiency methodologies that include the utilization of a masking ratio curriculum and adaptive batch sizing, results in a notable 32% reduction in pre-training Floating-Point Operations (FLOPS) and an 18% decrease in pre-training wall clock time. Crucially, this enhanced efficiency does not compromise the model's performance in downstream audio-classification tasks when compared to MAViL's performance.
    摘要 过去几年中,音频与视觉信号之间的同步被用来学习更丰富的音视频表示。得益于大量无标注视频数据,许多无监督训练框架在各类下游音频与视频任务中取得了令人瞩目的成果。最近,Masked Audio-Video Learners(MAViL)成为最先进的音视频预训练框架:它将对比学习与掩码自编码相结合,融合两种模态的信息以联合重建音频声谱图与视频帧。本文研究扩散模型与 MAViL 之间潜在的协同作用,以期从这两种框架中相互获益。将扩散模型引入 MAViL,并结合掩码比例课程学习与自适应批量大小等多种训练效率技术后,预训练浮点运算量(FLOPS)显著减少 32%,预训练墙钟时间减少 18%。关键的是,这种效率提升并未损害模型在下游音频分类任务中相对于 MAViL 的性能。

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

  • paper_url: http://arxiv.org/abs/2310.03923
  • repo_url: https://github.com/UARK-AICV/OpenFusion
  • paper_authors: Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le
  • for: 本文提出了一种利用 RGB-D 数据进行实时开放词汇 3D 环境建图与可查询场景表示的开创性方法
  • methods: 该方法利用预训练的视觉-语言基础模型(VLFM)实现开放集语义理解,并采用截断符号距离函数(TSDF)进行快速 3D 场景重建;通过改进的匈牙利特征匹配机制,将基于区域的嵌入及其置信图与来自 TSDF 的 3D 知识相融合
  • results: 在 ScanNet 数据集上,Open-Fusion 优于领先的零样本方法,且无需额外 3D 训练即可实现免标注的开放词汇 3D 分割;它将基于区域的 VLFM 与 TSDF 的优势无缝结合,实现了包含物体概念与开放世界语义的实时 3D 场景理解
    Abstract Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with 3D knowledge from TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without necessitating additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the strengths of region-based VLFM and TSDF, facilitating real-time 3D scene comprehension that includes object concepts and open-world semantics. We encourage the readers to view the demos on our project page: https://uark-aicv.github.io/OpenFusion
    摘要 精确的 3D 环境建图在机器人领域至关重要。现有方法往往依赖训练时预先定义的概念,或在生成语义地图时耗时较长。本文提出 Open-Fusion,一种利用 RGB-D 数据进行实时开放词汇 3D 建图与可查询场景表示的开创性方法。Open-Fusion 借助预训练的视觉-语言基础模型(VLFM)实现开放集语义理解,并采用截断符号距离函数(TSDF)进行快速 3D 场景重建。借助 VLFM,我们提取基于区域的嵌入及其对应的置信图,再通过改进的匈牙利特征匹配机制将其与来自 TSDF 的 3D 知识融合。值得注意的是,Open-Fusion 无需额外 3D 训练即可实现出色的免标注开放词汇 3D 分割。在 ScanNet 数据集上与领先的零样本方法的基准对比凸显了 Open-Fusion 的优势。此外,它无缝结合了基于区域的 VLFM 与 TSDF 的优势,实现了包含物体概念与开放世界语义的实时 3D 场景理解。欢迎读者访问我们的项目主页查看演示:https://uark-aicv.github.io/OpenFusion
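The Hungarian-based feature matching mentioned in the abstract above can be sketched as an assignment problem between 2D region embeddings and 3D map-instance embeddings. The cosine-similarity cost, the embedding dimension, and the handling of unmatched regions are illustrative assumptions, not Open-Fusion's exact mechanism.

```python
# Sketch of Hungarian-style matching between frame region embeddings (VLFM side)
# and 3D instance embeddings maintained alongside the TSDF map.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
region_emb = rng.normal(size=(6, 512))    # embeddings of regions in the current frame
instance_emb = rng.normal(size=(8, 512))  # embeddings of instances in the 3D map

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

similarity = normalize(region_emb) @ normalize(instance_emb).T   # (6, 8) cosine scores
row, col = linear_sum_assignment(-similarity)                     # maximize total similarity

for r, c in zip(row, col):
    print(f"frame region {r} -> map instance {c} (cos = {similarity[r, c]:.3f})")
# Low-similarity matches could instead spawn new map instances; that policy is a
# design choice of the full system and is not shown here.
```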

Coloring Deep CNN Layers with Activation Hue Loss

  • paper_url: http://arxiv.org/abs/2310.03911
  • repo_url: None
  • paper_authors: Louis-François Bouchard, Mohsen Ben Lazreg, Matthew Toews
  • for: 本文提出一种类似色相的角度参数(称为“激活色相”)来刻画深度卷积神经网络(CNN)激活空间的结构,用于正则化模型以实现更有效的学习
  • methods: 基于预训练网络激活向量的最近邻索引观察发现,携带类别信息的激活在 $(x,y)$ 图像平面和多通道激活空间中都集中于某个角度 $\theta$ 附近;据此提出以类似色相的角度 $\theta$ 标签作为正则项,与标准独热(one-hot)损失互补
  • results: 在包括 ImageNet 在内的多种分类任务上从零开始训练,联合独热损失与激活色相损失可适度提升分类性能
    Abstract This paper proposes a novel hue-like angular parameter to model the structure of deep convolutional neural network (CNN) activation space, referred to as the {\em activation hue}, for the purpose of regularizing models for more effective learning. The activation hue generalizes the notion of color hue angle in standard 3-channel RGB intensity space to $N$-channel activation space. A series of observations based on nearest neighbor indexing of activation vectors with pre-trained networks indicate that class-informative activations are concentrated about an angle $\theta$ in both the $(x,y)$ image plane and in multi-channel activation space. A regularization term in the form of hue-like angular $\theta$ labels is proposed to complement standard one-hot loss. Training from scratch using combined one-hot + activation hue loss improves classification performance modestly for a wide variety of classification tasks, including ImageNet.
    摘要 这篇论文提出了一种新的深度 convolutional neural network (CNN) 正则化方法,基于一种新的概念——“活动色”。活动色是一种模型深度 CNN 的活动空间结构的方法,它基于标准的3通道 RGB 强度空间中的色度角的想法。作者们提议使用一种类似于色度角的angular标签来补充标准的 one-hot 损失函数,并表明这种方法可以提高各种分类任务的分类性能,包括 ImageNet。
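As a rough illustration of the angular regularizer described above, the sketch below pools CNN activations, projects them onto two directions, and penalizes the resulting angle's deviation from a per-class target angle. The projection, the learnable class angles, and the loss weight are assumptions, not the paper's exact construction.

```python
import math
import torch
import torch.nn as nn

class ActivationHueLoss(nn.Module):
    """Hue-like angular regularizer on N-channel activations (illustrative sketch)."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.proj = nn.Linear(channels, 2, bias=False)          # C channels -> (u, v)
        self.class_theta = nn.Parameter(torch.rand(num_classes) * 2 * math.pi)

    def forward(self, feats, labels):
        pooled = feats.mean(dim=(2, 3))                         # (B, C, H, W) -> (B, C)
        uv = self.proj(pooled)
        theta = torch.atan2(uv[:, 1], uv[:, 0])                 # hue-like angle per sample
        return (1.0 - torch.cos(theta - self.class_theta[labels])).mean()

# Combined objective (sketch): cross_entropy(logits, labels) + lambda * hue_loss(feats, labels)
```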

TWICE Dataset: Digital Twin of Test Scenarios in a Controlled Environment

  • paper_url: http://arxiv.org/abs/2310.03895
  • repo_url: None
  • paper_authors: Leonardo Novicki Neto, Fabio Reway, Yuri Poledna, Maikol Funk Drechsler, Eduardo Parente Ribeiro, Werner Huber, Christian Icking
  • for: 本研究旨在提供一个在真实测试赛道采集、并在实验室中针对相同测试场景复现的数据集,用于研究恶劣天气下自动驾驶车辆的安全可靠运行。
  • methods: 该数据集包括摄像头、雷达、LiDAR、惯性测量单元(IMU)和GPS数据,并在恶劣天气条件下(雨天、夜间和雪天等)记录了多种测试场景。
  • results: 该数据集包含了超过2小时的录制数据,总量超过280GB,因此是自动驾驶领域研究人员测试和改进算法的宝贵资源,同时也可以探索 simulation-to-reality gap。数据集可以在以下地址下载:https://twicedataset.github.io/site/
    Abstract Ensuring the safe and reliable operation of autonomous vehicles under adverse weather remains a significant challenge. To address this, we have developed a comprehensive dataset composed of sensor data acquired in a real test track and reproduced in the laboratory for the same test scenarios. The provided dataset includes camera, radar, LiDAR, inertial measurement unit (IMU), and GPS data recorded under adverse weather conditions (rainy, night-time, and snowy conditions). We recorded test scenarios using objects of interest such as car, cyclist, truck and pedestrian -- some of which are inspired by EURONCAP (European New Car Assessment Programme). The sensor data generated in the laboratory is acquired by the execution of simulation-based tests in hardware-in-the-loop environment with the digital twin of each real test scenario. The dataset contains more than 2 hours of recording, which totals more than 280GB of data. Therefore, it is a valuable resource for researchers in the field of autonomous vehicles to test and improve their algorithms in adverse weather conditions, as well as explore the simulation-to-reality gap. The dataset is available for download at: https://twicedataset.github.io/site/

Characterizing the Features of Mitotic Figures Using a Conditional Diffusion Probabilistic Model

  • paper_url: http://arxiv.org/abs/2310.03893
  • repo_url: https://github.com/cagladbahadir/dpm-for-mitotic-figures
  • paper_authors: Cagla Deniz Bahadir, Benjamin Liechty, David J. Pisapia, Mert R. Sabuncu
  • for: 本研究旨在描述mitosis标注的不确定性和分类任务的人类可读性特征。
  • methods: 我们训练一种条件扩散概率模型,生成同一细胞核向有丝分裂状态过渡的合成图像序列,以识别与有丝分裂相关的图像特征,如细胞质颗粒度、核密度、核不规则性以及细胞核与胞体之间的高对比度。
  • results: 我们的方法可以帮助病理学家更好地理解和解释决策所需的特征。
    Abstract Mitotic figure detection in histology images is a hard-to-define, yet clinically significant task, where labels are generated with pathologist interpretations and where there is no ``gold-standard'' independent ground-truth. However, it is well-established that these interpretation based labels are often unreliable, in part, due to differences in expertise levels and human subjectivity. In this paper, our goal is to shed light on the inherent uncertainty of mitosis labels and characterize the mitotic figure classification task in a human interpretable manner. We train a probabilistic diffusion model to synthesize patches of cell nuclei for a given mitosis label condition. Using this model, we can then generate a sequence of synthetic images that correspond to the same nucleus transitioning into the mitotic state. This allows us to identify different image features associated with mitosis, such as cytoplasm granularity, nuclear density, nuclear irregularity and high contrast between the nucleus and the cell body. Our approach offers a new tool for pathologists to interpret and communicate the features driving the decision to recognize a mitotic figure.

FNOSeg3D: Resolution-Robust 3D Image Segmentation with Fourier Neural Operator

  • paper_url: http://arxiv.org/abs/2310.03872
  • repo_url: https://github.com/ibm/multimodal-3d-image-segmentation
  • paper_authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
  • for: 这个论文主要是为了解决深度学习在医疗三维静止图像分割中的计算复杂性问题,通过使用下采样的图像进行训练,但是这会导致模型在原始分辨率下的准确性下降。
  • methods: 该论文提出了一种基于傅里叶神经算子(FNO)的3D分割模型,FNO具有零样本超分辨率和全局感受野的特点。作者通过减少参数量并加入残差连接和深度监督来改进FNO,从而得到参数量少、精度高的FNOSeg3D模型。
  • results: 对于BraTS’19数据集,FNOSeg3D模型在不同的训练图像分辨率下表现出了较好的鲁棒性,与其他测试模型相比,它的参数量少于1%。
    Abstract Due to the computational complexity of 3D medical image segmentation, training with downsampled images is a common remedy for out-of-memory errors in deep learning. Nevertheless, as standard spatial convolution is sensitive to variations in image resolution, the accuracy of a convolutional neural network trained with downsampled images can be suboptimal when applied on the original resolution. To address this limitation, we introduce FNOSeg3D, a 3D segmentation model robust to training image resolution based on the Fourier neural operator (FNO). The FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We improve the FNO by reducing its parameter requirement and enhancing its learning capability through residual connections and deep supervision, and these result in our FNOSeg3D model which is parameter efficient and resolution robust. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters.
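The resolution robustness comes from the Fourier layers at the core of the FNO backbone, which mix channels only on a truncated set of low-frequency modes. Below is a minimal 3D spectral convolution in PyTorch illustrating that idea; the mode count, initialization, and the omission of residual connections and deep supervision are simplifications, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SpectralConv3d(nn.Module):
    """Minimal 3D Fourier-layer sketch in the spirit of an FNO backbone."""
    def __init__(self, in_ch, out_ch, modes=8):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, modes, modes, dtype=torch.cfloat))

    def forward(self, x):                                   # x: (B, C, D, H, W)
        x_ft = torch.fft.rfftn(x, dim=(-3, -2, -1))         # to frequency domain
        out_ft = torch.zeros(x.size(0), self.weight.size(1), *x_ft.shape[-3:],
                             dtype=torch.cfloat, device=x.device)
        m = self.modes
        # Channel mixing on the retained low-frequency modes only.
        out_ft[..., :m, :m, :m] = torch.einsum(
            "bixyz,ioxyz->boxyz", x_ft[..., :m, :m, :m], self.weight)
        return torch.fft.irfftn(out_ft, s=x.shape[-3:], dim=(-3, -2, -1))
```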

Consistency Regularization Improves Placenta Segmentation in Fetal EPI MRI Time Series

  • paper_url: http://arxiv.org/abs/2310.03870
  • repo_url: https://github.com/firstmover/cr-seg
  • paper_authors: Yingcheng Liu, Neerav Karani, Neel Dey, S. Mazdak Abulnaga, Junshen Xu, P. Ellen Grant, Esra Abaci Turk, Polina Golland
  • for: 这个论文旨在提高围绕胎动成像的胎盘三维自动分割精度,以便提高先天医疗的诊断和治疗。
  • methods: 该论文提出了一种有效的半监督学习方法,使用了一种协调正则化损失函数来提高三维胎盘分割的精度。
  • results: 实验结果表明,该方法可以提高总分割精度,并且对于异常样本和困难样本表现更好。此外,该方法还可以提高时间序列中的预测准确性,这将有助于更 preciselly 计算胎盘生物标志物。
    Abstract The placenta plays a crucial role in fetal development. Automated 3D placenta segmentation from fetal EPI MRI holds promise for advancing prenatal care. This paper proposes an effective semi-supervised learning method for improving placenta segmentation in fetal EPI MRI time series. We employ consistency regularization loss that promotes consistency under spatial transformation of the same image and temporal consistency across nearby images in a time series. The experimental results show that the method improves the overall segmentation accuracy and provides better performance for outliers and hard samples. The evaluation also indicates that our method improves the temporal coherency of the prediction, which could lead to more accurate computation of temporal placental biomarkers. This work contributes to the study of the placenta and prenatal clinical decision-making. Code is available at https://github.com/firstmover/cr-seg.
    摘要 placenta 在胎儿发展中扮演重要角色。自动化3D placenta分割从胎儿EPI MRI序列图像中提取placenta信息可能提高产前照管。本文提出一种有效的半指导学习方法,用于提高胎儿EPI MRI时序序列中placenta分割精度。我们使用一种协调常量损失函数,以便在同一张图像中的不同位置和邻近图像时序序列中保持一致性。实验结果表明,该方法可以提高总的分割精度,并且对于异常样本和困难样本表现更好。评估还表明,我们的方法可以提高计算时间序列placental生物标志的准确性。这种研究对于胎儿和产前诊断具有重要意义。代码可以在https://github.com/firstmover/cr-seg上获取。
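A minimal sketch of the two consistency terms (spatial and temporal) on unlabeled frames might look as follows; a flip stands in for the paper's spatial transformations and the loss weights are omitted, so treat this only as an illustration of the idea.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, vol_t, vol_t1, flip_dims=(-1,)):
    """Spatial + temporal consistency terms for a 3D segmentation model (sketch).

    vol_t, vol_t1: consecutive EPI volumes (B, 1, D, H, W) from the time series.
    """
    p_t = model(vol_t)                                      # (B, C, D, H, W) logits
    # Spatial term: prediction should commute with the spatial transform.
    p_t_flip = model(torch.flip(vol_t, dims=flip_dims))
    spatial = F.mse_loss(torch.flip(p_t_flip, dims=flip_dims).softmax(1),
                         p_t.softmax(1))
    # Temporal term: nearby frames in the series should yield similar masks.
    p_t1 = model(vol_t1)
    temporal = F.mse_loss(p_t.softmax(1), p_t1.softmax(1))
    return spatial + temporal
```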

OpenIncrement: A Unified Framework for Open Set Recognition and Deep Class-Incremental Learning

  • paper_url: http://arxiv.org/abs/2310.03848
  • repo_url: https://github.com/gawainxu/openincremen
  • paper_authors: Jiawen Xu, Claas Grohnfeldt, Odej Kao
  • for: 这篇论文是用于探讨深度增量学习的问题,特别是当 Novel samples 是预先识别的时候, neural network 的重训可能会导致错误的预测。
  • methods: 本文提出了一个基于 open set recognition 的深度类别增量学习框架,并将类别增量学习的特征整合到了距离基本 open set recognition。
  • results: 实验结果显示,我们的方法在比较于现有的增量学习技术的情况下,表现出了更好的性能,并且在 open set recognition 方面比基于方法表现更好。
    Abstract In most works on deep incremental learning research, it is assumed that novel samples are pre-identified for neural network retraining. However, practical deep classifiers often misidentify these samples, leading to erroneous predictions. Such misclassifications can degrade model performance. Techniques like open set recognition offer a means to detect these novel samples, representing a significant area in the machine learning domain. In this paper, we introduce a deep class-incremental learning framework integrated with open set recognition. Our approach refines class-incrementally learned features to adapt them for distance-based open set recognition. Experimental results validate that our method outperforms state-of-the-art incremental learning techniques and exhibits superior performance in open set recognition compared to baseline methods.
    摘要 多数深度增量学习研究中假设新样本已经被预先标识,以供神经网络重新训练。然而,实际的深度分类器经常错误地分类这些样本,导致错误预测。这种错误分类可能会降低模型性能。开集识别技术提供了检测这些新样本的方式,这是机器学习领域的一个重要领域。在这篇论文中,我们介绍了一种集成了开集识别的深度分类器增量学习框架。我们的方法可以将增量学习得到的特征进行修正,以适应距离基于开集识别。实验结果表明,我们的方法比状态数据增量学习技术更高效,并在基准方法比较下表现出超越性。

Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

  • paper_url: http://arxiv.org/abs/2310.03843
  • repo_url: None
  • paper_authors: Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
  • for: 这篇论文的目的是研究如何将预训练模型转移到下游任务中,以便在几 shot 的情况下提高模型的性能。
  • methods: 这篇论文使用了线性探测方法,即在预训练模型中固化特征后,使用特征来训练一个线性分类器。然而,预训练与下游数据之间存在差距,因此这篇论文提出了问题:预训练特征中的哪些维度是有用的?
  • results: 研究发现,在几 shot 的情况下,预训练特征可以很 redundant,即使只使用1%的最重要维度,也可以恢复使用全个表示的性能。此外,这种 redundancy 只在几 shot 的情况下非常明显,随着数据量的增加逐渐消失。这种现象的理论理解和解释,以及如何通过软掩码来解决这种问题,都是这篇论文的重要贡献。
    Abstract Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model. As there may exist significant gaps between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. For some cases such as 5-way 1-shot tasks, using only 1\% of the most important feature dimensions is able to recover the performance achieved by using the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.
    摘要 传播预训模型到下游任务可以非常简单,只需要在目标数据上进行线性探测,即在预训模型中冻结特征后,训练一个线性分类器。由于预训和下游数据集之间可能存在很大差距,因此我们可能会问到,预训特征中的所有维度都是下游任务中有用吗。我们发现,在线性探测情况下,预训特征可以非常重复,特别是在scarce或少shot情况下。例如,在5种类1个shot任务中,只使用1%的最重要特征维度可以恢复使用全表示性能。 Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.
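The magnitude-adjustment idea can be illustrated with a simple per-dimension importance estimate computed from the few-shot support set; the between/within-class variance ratio and the softmax temperature below are assumptions standing in for the paper's estimator.

```python
import torch

def soft_mask_features(feats, labels, tau=0.1):
    """Down-weight feature dimensions with low estimated importance (sketch).

    feats: (N, D) support features; labels: (N,) class ids.
    Returns masked features and the (D,) soft mask used to rescale magnitudes.
    """
    classes = labels.unique()
    centroids = torch.stack([feats[labels == c].mean(0) for c in classes])   # (K, D)
    between = centroids.var(dim=0, unbiased=False)                           # class separation
    within = torch.stack([feats[labels == c].var(0, unbiased=False)
                          for c in classes]).mean(0) + 1e-6                  # intra-class spread
    importance = between / within
    mask = torch.softmax(importance / tau, dim=0) * feats.size(1)            # mean weight ~ 1
    return feats * mask, mask
```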

HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.04466
  • repo_url: https://github.com/ibm/multimodal-3d-image-segmentation
  • paper_authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
  • for: 提高3D图像分割中自动注意力模型的训练缓存效率,避免训练图像大小过小导致的准确性下降。
  • methods: 基于FNO框架,使用 Hartley transform 和共享参数来减少模型大小,并在频域中应用自注意力以提高表达能力和效率。
  • results: 在BraTS’19数据集上测试, HartleyMHA 模型可以与其他模型相比,在训练图像大小不同情况下保持超过99%的准确性,而且具有训练缓存效率的优势。
    Abstract With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied on the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude, and this allows us to further apply self-attention in the frequency domain for more expressive high-order feature combination with improved efficiency. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters.
    摘要 受到变换器引入后,不同的注意力基于模型在图像分割方面提出了出色的结果。虽然自注意力允许捕捉长距离依赖关系,但它在图像大小上具有二次复杂性,特别是在3D分割中。为了避免训练过程中的内存溢出,通常需要降低输入图像大小,但在原始图像大小上应用已经训练的模型时,准确率可能会受到限制。为解决这个局限性,我们引入了HartleyMHA模型,它具有高效的自注意力和对训练分辨率的鲁棒性。FNO是一种深度学习框架,用于学习偏微分方程中函数之间的映射,具有零样本超分辨率和全局感受野等吸引人的特性。我们通过使用Hartley变换和共享参数将模型大小降低几个数量级,并在频域中应用自注意力以进行更有表达力的高阶特征组合,同时提高了效率。在BraTS'19数据集上测试时,它与其他测试模型相比具有更好的训练图像分辨率鲁棒性,而模型参数不到它们的1%。
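For intuition, the discrete Hartley transform is simply the real part minus the imaginary part of the Fourier transform, so every tensor stays real-valued; the sketch below then applies standard multi-head self-attention over a truncated block of Hartley coefficients. The truncation, token layout, and omission of the inverse transform are illustrative choices, not the HartleyMHA block itself.

```python
import torch
import torch.nn as nn

def hartley_transform_3d(x):
    """Discrete Hartley transform over the last three dims: H(x) = Re(F) - Im(F)."""
    f = torch.fft.fftn(x, dim=(-3, -2, -1))
    return f.real - f.imag

class FrequencySelfAttention(nn.Module):
    """Self-attention over truncated Hartley coefficients (assumed design).

    Returns frequency-domain features; padding back and the inverse Hartley
    transform are omitted. `channels` must be divisible by `heads`.
    """
    def __init__(self, channels, modes=8, heads=4):
        super().__init__()
        self.modes = modes
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                                   # x: (B, C, D, H, W)
        h = hartley_transform_3d(x)[..., :self.modes, :self.modes, :self.modes]
        b, c, d, hh, w = h.shape
        tokens = h.reshape(b, c, -1).transpose(1, 2)        # (B, modes^3, C)
        out, _ = self.attn(tokens, tokens, tokens)          # mix high-order modes
        return out.transpose(1, 2).reshape(b, c, d, hh, w)
```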

Integrating Audio-Visual Features for Multimodal Deepfake Detection

  • paper_url: http://arxiv.org/abs/2310.03827
  • repo_url: None
  • paper_authors: Sneha Muppalla, Shan Jia, Siwei Lyu
  • for: 本研究旨在提出一种音视频基于的deepfake检测方法,以提高对多模态检测的精度。
  • methods: 本方法结合细致的deepfake标识与二分类算法,将样本分为四类,并对带内和跨域测试进行提升。
  • results: 实验结果表明,该方法在多模态检测中显著提高了检测精度,并在带内和跨域测试中具有优异表现。
    Abstract Deepfakes are AI-generated media in which an image or video has been digitally modified. The advancements made in deepfake technology have led to privacy and security issues. Most deepfake detection techniques rely on the detection of a single modality. Existing methods for audio-visual detection do not always surpass that of the analysis based on single modalities. Therefore, this paper proposes an audio-visual-based method for deepfake detection, which integrates fine-grained deepfake identification with binary classification. We categorize the samples into four types by combining labels specific to each single modality. This method enhances the detection under intra-domain and cross-domain testing.
    摘要 深度伪造(Deepfake)是由人工智能修改的图像或视频。随着深度伪造技术的发展,隐私和安全问题得到了关注。大多数深度伪造检测技术都是基于单一模态的检测,而现有的音视频检测方法不总能超越单模态分析。因此,本文提出了一种基于音视频的深度伪造检测方法,该方法将细粒度的深度伪造标识与二分类结合,通过组合每个单模态的标签将样本分为四类。这种方法可以在域内和跨域测试中提高检测精度。

WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.03821
  • repo_url: https://github.com/jacky121298/WLST
  • paper_authors: Tsung-Lin Tsou, Tsung-Han Wu, Winston H. Hsu
  • for: 提高3D物体检测中的领域适应性,特别是在无目标标注的情况下。
  • methods: 提出了一种通用的弱标签导向自教学框架(WLST),利用自动标签器生成3D假标签,以提高目标频谱的训练过程。
  • results: 经验证明,我们的WLST框架可以提高3D物体检测中的领域适应性,并且在所有评价任务上表现出色,超过了之前的状态作法。
    Abstract In the field of domain adaptation (DA) on 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between the UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplored yet practical task that only requires few labeling effort on the target domain. To improve the DA performance in a cost-effective way, we propose a general weak labels guided self-training framework, WLST, designed for WDA on 3D object detection. By incorporating autolabeler, which can generate 3D pseudo labels from 2D bounding boxes, into the existing self-training pipeline, our method is able to generate more robust and consistent pseudo labels that would benefit the training process on the target domain. Extensive experiments demonstrate the effectiveness, robustness, and detector-agnosticism of our WLST framework. Notably, it outperforms previous state-of-the-art methods on all evaluation tasks.
    摘要 在三维 объек目检测领域中的领域适应(DA)中,大多数工作集中在无监督领域适应(UDA)上。然而,无法获得目标域标注,DA方法与完全监督方法之间的性能差距仍然存在,这对实际应用来说是不实际的。相反,弱监督领域适应(WDA)是一个未得到充分发掘的 yet practical task,只需要少量的标注努力在目标域上。为了提高DA性能,我们提出了一个通用的弱标签指导自学习框架,WLST,设计为WDA在三维对象检测中进行。通过将自动标签器,可以生成3Dpseudo标签从2D bounding box,加入现有的自学习管道中,我们的方法可以生成更加稳定和一致的pseudo标签,这将对目标域训练过程中帮助提高DA性能。广泛的实验表明我们的WLST框架的有效性、稳定性和检测器免疫性。特别是,它超过了之前的状态 искусственный方法在所有评估任务上。

ContactGen: Generative Contact Modeling for Grasp Generation

  • paper_url: http://arxiv.org/abs/2310.03740
  • repo_url: https://github.com/stevenlsw/contactgen
  • paper_authors: Shaowei Liu, Yang Zhou, Jimei Yang, Saurabh Gupta, Shenlong Wang
  • for: 这个论文旨在提出一种基于物体中心的接触表示方法,以便更好地模型手部与物体之间的交互。
  • methods: 该方法包括三个组件:接触地图显示接触位置,部分地图表示接触手部,方向地图表示每个部分中的接触方向。给定输入物体,我们提出了一种 conditional generative 模型,以便预测接触地图并采用模型基于优化来预测多种具有多样性和几何可能性的抓取。
  • results: 实验结果表明,我们的方法可以生成高精度和多样性的人类抓取,并且适用于各种物体。项目页面:https://stevenlsw.github.io/contactgen/
    Abstract This paper presents a novel object-centric contact representation ContactGen for hand-object interaction. The ContactGen comprises three components: a contact map indicates the contact location, a part map represents the contact hand part, and a direction map tells the contact direction within each part. Given an input object, we propose a conditional generative model to predict ContactGen and adopt model-based optimization to predict diverse and geometrically feasible grasps. Experimental results demonstrate our method can generate high-fidelity and diverse human grasps for various objects. Project page: https://stevenlsw.github.io/contactgen/
    摘要 这篇论文提出了一种新的物体呈现中心的接触表示方法ContactGen,用于手对象交互。ContactGen包括三个组成部分:接触地图显示接触位置,手部地图表示接触手部,以及每个部分的方向地图。给定输入物体,我们提议一种条件生成模型预测ContactGen,并采用模型基于优化预测多种具有多样性和几何可行性的抓取。实验结果表明我们的方法可以生成高精度和多样的人类抓取对象。项目页面:https://stevenlsw.github.io/contactgen/

Stylist: Style-Driven Feature Ranking for Robust Novelty Detection

  • paper_url: http://arxiv.org/abs/2310.03738
  • repo_url: None
  • paper_authors: Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu
  • for: 检测样本是否具有新鲜度,但不是所有变化都是重要的。
  • methods: 使用形式化分为Semantic(有用)和Style(无用)变化,并且使用预训练大规模模型表示来提高抗性。
  • results: 提出了Stylist方法,可以去除环境偏见的特征,提高新鲜度检测性能。经验表明,在多个数据集上,Stylist方法可以提高新鲜度检测性能,并且可以处理 conten 和 style 类型的变化。
    Abstract Novelty detection aims at finding samples that differ in some form from the distribution of seen samples. But not all changes are created equal. Data can suffer a multitude of distribution shifts, and we might want to detect only some types of relevant changes. Similar to works in out-of-distribution generalization, we propose to use the formalization of separating into semantic or content changes, that are relevant to our task, and style changes, that are irrelevant. Within this formalization, we define the robust novelty detection as the task of finding semantic changes while being robust to style distributional shifts. Leveraging pretrained, large-scale model representations, we introduce Stylist, a novel method that focuses on dropping environment-biased features. First, we compute a per-feature score based on the feature distribution distances between environments. Next, we show that our selection manages to remove features responsible for spurious correlations and improve novelty detection performance. For evaluation, we adapt domain generalization datasets to our task and analyze the methods behaviors. We additionally built a large synthetic dataset where we have control over the spurious correlations degree. We prove that our selection mechanism improves novelty detection algorithms across multiple datasets, containing both stylistic and content shifts.
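The per-feature scoring step can be sketched directly: compare the marginal distribution of each feature dimension across training environments and rank dimensions by how much they shift. A 1D Wasserstein distance is used below as one concrete choice of distribution distance; dropping the top-ranked dimensions is the removal step described in the abstract.

```python
import numpy as np
from itertools import combinations
from scipy.stats import wasserstein_distance

def style_feature_ranking(features_per_env):
    """Rank feature dimensions by distribution shift across environments.

    features_per_env: list of (N_e, D) arrays, one per training environment.
    Returns dimension indices sorted from most to least environment-biased.
    """
    dim = features_per_env[0].shape[1]
    scores = np.zeros(dim)
    for a, b in combinations(features_per_env, 2):
        for d in range(dim):
            scores[d] += wasserstein_distance(a[:, d], b[:, d])
    return np.argsort(-scores)          # most style-biased dimensions first
```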

Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency

  • paper_url: http://arxiv.org/abs/2310.03734
  • repo_url: None
  • paper_authors: Tianhong Li, Sangnie Bhardwaj, Yonglong Tian, Han Zhang, Jarred Barber, Dina Katabi, Guillaume Lajoie, Huiwen Chang, Dilip Krishnan
  • for: 提高vision-language生成模型的性能和泛化能力,使其能够在无需大量对应图像和文本数据的情况下进行训练和推断。
  • methods: 提出了一种新的训练方法,称为ITIT(图像-文本同步训练),它基于循环一致性的概念,可以在无需对应图像和文本数据的情况下进行图像-文本训练。
  • results: 实验表明,ITIT可以与高质量对应图像和文本数据进行训练,并且可以达到与现有的文本-图像模型相当的性能,只需要orders of magnitude fewer paired image-text data。
    Abstract Current vision-language generative models rely on expansive corpora of paired image-text data to attain optimal performance and generalization capabilities. However, automatically collecting such data (e.g. via large-scale web scraping) leads to low quality and poor image-text correlation, while human annotation is more accurate but requires significant manual effort and expense. We introduce $\textbf{ITIT}$ ($\textbf{I}$n$\textbf{T}$egrating $\textbf{I}$mage $\textbf{T}$ext): an innovative training paradigm grounded in the concept of cycle consistency which allows vision-language training on unpaired image and text data. ITIT is comprised of a joint image-text encoder with disjoint image and text decoders that enable bidirectional image-to-text and text-to-image generation in a single framework. During training, ITIT leverages a small set of paired image-text data to ensure its output matches the input reasonably well in both directions. Simultaneously, the model is also trained on much larger datasets containing only images or texts. This is achieved by enforcing cycle consistency between the original unpaired samples and the cycle-generated counterparts. For instance, it generates a caption for a given input image and then uses the caption to create an output image, and enforces similarity between the input and output images. Our experiments show that ITIT with unpaired datasets exhibits similar scaling behavior as using high-quality paired data. We demonstrate image generation and captioning performance on par with state-of-the-art text-to-image and image-to-text models with orders of magnitude fewer (only 3M) paired image-text data.
    摘要 Current vision-language生成模型依赖广泛的图像文本资料来 достичь最佳性能和泛化能力。然而,自动从网页抓取大规模图像文本资料会导致低品质和差强的图像文本相关性,而人工标注更加精准但需要较大的人工努力和成本。我们介绍了 $\textbf{ITIT}$($\textbf{I}$ntegrating $\textbf{I}$mage $\textbf{T}$ext):一种创新的训练方案基于循环一致的概念,允许vision-language训练在不对应图像文本资料上。ITIT包括一个共同图像文本编码器和分开的图像和文本解oder,允许两向的图像文本生成。在训练过程中,ITIT利用一小量的对应图像文本资料来确保其输出与输入相对对应。同时,模型还被训练在包含很多图像或文本资料的更大 datasets 中。这是通过强制循环一致的原始无对应样本和循环生成的对应样本之间的一致性来实现的。例如,它将一个输入图像的描述生成为一个图像,并将这个图像与输入图像进行比较,以确保它们之间的一致性。我们的实验显示,ITIT可以与高品质的对应资料一样具有推广性。我们在实验中使用了只有300万对应图像文本资料,并且可以达到与现有的文本至图像和图像至文本模型相同的表现。
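One illustrative ITIT-style training step is sketched below: a small paired batch supervises both directions directly, while an unpaired image batch contributes an image-to-text-to-image cycle loss (the text cycle would be symmetric). All module names and loss forms are placeholders, and the argmax used to generate captions makes the cycle non-differentiable as written, so this is only a schematic of the objective rather than the paper's training recipe.

```python
import torch
import torch.nn.functional as F

def itit_step(encoder, image_decoder, text_decoder, batch):
    """One schematic training step mixing paired supervision and a cycle loss."""
    losses = {}
    # Small paired set: supervised terms in both directions.
    z = encoder(image=batch["paired_img"], text=batch["paired_txt_ids"])
    losses["t2i"] = F.mse_loss(image_decoder(z), batch["paired_img"])
    losses["i2t"] = F.cross_entropy(text_decoder(z).flatten(0, 1),
                                    batch["paired_txt_ids"].flatten())
    # Large unpaired image set: image -> generated caption -> reconstructed image.
    z_img = encoder(image=batch["unpaired_img"])
    caption = text_decoder(z_img).argmax(-1)                # generated caption tokens
    z_cyc = encoder(text=caption)
    losses["img_cycle"] = F.mse_loss(image_decoder(z_cyc), batch["unpaired_img"])
    return sum(losses.values()), losses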

OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks

  • paper_url: http://arxiv.org/abs/2310.03707
  • repo_url: None
  • paper_authors: Ofir Bar Tal, Adi Haviv, Amit H. Bermano
  • for: 测试神经网络的可靠性,使用欺骗攻击(Evasion Attacks)对神经网络进行测试。
  • methods: 使用自监督、计算上经济的方法生成对抗攻击,采用表示学习技术,生成贴近数据分布(on-manifold)的攻击样本。
  • results: 对不同的模型、数据类别和防御模型进行了实验,显示了该方法的效果, suggessting on-manifold EAs 在对未看过模型的攻击中具有重要作用。
    Abstract Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness compared to the state-of-the-art when attacking the model trained on, but are significantly more effective when attacking unseen models, as the attacks are more related to the data rather than the model itself. Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.
    摘要 逃脱攻击(EA)用于测试训练过的神经网络robustness,通过扭曲输入数据来诱导模型进行错误分类。创建这些攻击是一项复杂的任务,特别是随着模型和数据集的复杂度不断增加。在这项工作中,我们提出了一种自动supervised,computational economical的方法,用于生成黑盒 setting下的敌意例子。我们采用了表示学习技术,使得我们的方法可以在数据分布上生成在敌意例子。这些攻击效果相当于state-of-the-art,但是在训练过的模型上更有效果,而在未看过模型上更有效果,因为这些攻击更加 relate to the data 而不是模型自身。我们的实验表明,这种方法在不同的模型、未seen data category和even defended models中具有显著的作用, suggesting a significant role for on-manifold EAs when targeting unseen models。

Drag View: Generalizable Novel View Synthesis with Unposed Imagery

  • paper_url: http://arxiv.org/abs/2310.03704
  • repo_url: None
  • paper_authors: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang
  • for: DragView is designed for generating novel views of unseen scenes from a single source image, with the ability to handle occlusion and flexible camera trajectories.
  • methods: DragView uses a sparse set of unposed multi-view images, a view-dependent modulation layer, and a transformer to decode ray features into final pixel intensities, all executed within a single feed-forward pass.
  • results: DragView showcases the capability to generalize to new scenes unseen during training, and consistently demonstrates superior performance in view synthesis quality compared to recent scene representation networks and generalizable NeRFs.
    Abstract We introduce DragView, a novel and interactive framework for generating novel views of unseen scenes. DragView initializes the new view from a single source image, and the rendering is supported by a sparse set of unposed multi-view images, all seamlessly executed within a single feed-forward pass. Our approach begins with users dragging a source view through a local relative coordinate system. Pixel-aligned features are obtained by projecting the sampled 3D points along the target ray onto the source view. We then incorporate a view-dependent modulation layer to effectively handle occlusion during the projection. Additionally, we broaden the epipolar attention mechanism to encompass all source pixels, facilitating the aggregation of initialized coordinate-aligned point features from other unposed views. Finally, we employ another transformer to decode ray features into final pixel intensities. Crucially, our framework does not rely on either 2D prior models or the explicit estimation of camera poses. During testing, DragView showcases the capability to generalize to new scenes unseen during training, also utilizing only unposed support images, enabling the generation of photo-realistic new views characterized by flexible camera trajectories. In our experiments, we conduct a comprehensive comparison of the performance of DragView with recent scene representation networks operating under pose-free conditions, as well as with generalizable NeRFs subject to noisy test camera poses. DragView consistently demonstrates its superior performance in view synthesis quality, while also being more user-friendly. Project page: https://zhiwenfan.github.io/DragView/.
    摘要 我们介绍DragView,一种新的和交互式框架,用于生成未被见过的场景视图。DragView从单个源图像初始化新视图,并且渲染是基于一个稀缺的多视图图像支持,完全在单个往返传播中执行。我们的方法开始于用户将源视图拖动到本地相对坐标系中。通过对抽样的3D点进行 projetction,从源视图中获得齐平的特征点。然后,我们添加了视角依赖的修饰层,以有效地处理 occlusion durante la proyección。此外,我们扩展了 epipolar 注意力机制,以覆盖所有源像素,使得自 initialize 协调对齐点特征从其他无法 pose 视图中提取 initialized 协调点特征。最后,我们使用另一个 transformer 来解码轨道特征为最终像素强度。关键是,我们的框架不依赖于2D先验模型或自动确定相机位置。在测试时,DragView 显示了能够泛化到未在训练过程中看到的新场景,并且只使用无法 pose 支持图像,以生成 photo-realistic 的新视图,其中 camera 轨迹具有灵活性。在我们的实验中,我们对 DragView 的性能进行了对比,包括最近的场景表示网络在无法 pose 条件下的性能,以及一些受到噪音测试相机位置的 Generalizable NeRF 的性能。DragView 在视图合成质量方面 consistently 表现出色,而且更加 user-friendly。项目页面:https://zhiwenfan.github.io/DragView/.

LumiNet: The Bright Side of Perceptual Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.03669
  • repo_url: None
  • paper_authors: Md. Ismail Hossain, M M Lutfe Elahi, Sameera Ramasinghe, Ali Cheraghian, Fuad Rahman, Nabeel Mohammed, Shafin Rahman
  • for: 该论文主要研究基于logit的知识蒸馏方法,以强化知识传递的能力。
  • methods: 该方法提出了一个名为LumiNet的新型知识传递算法,通过调整logits来提高学生模型对教师模型的学习。
  • results: 对于CIFAR-100、ImageNet和MSCOCO等基准数据集,LumiNet的实验结果表明其与当前的特征基于方法相比具有竞争力。此外,通过在不同任务下进行传输学习,该方法还能够强化学生模型在下游任务中的适应能力。
    Abstract In knowledge distillation research, feature-based methods have dominated due to their ability to effectively tap into extensive teacher models. In contrast, logit-based approaches are considered to be less adept at extracting hidden 'dark knowledge' from teachers. To bridge this gap, we present LumiNet, a novel knowledge-transfer algorithm designed to enhance logit-based distillation. We introduce a perception matrix that aims to recalibrate logits through adjustments based on the model's representation capability. By meticulously analyzing intra-class dynamics, LumiNet reconstructs more granular inter-class relationships, enabling the student model to learn a richer breadth of knowledge. Both teacher and student models are mapped onto this refined matrix, with the student's goal being to minimize representational discrepancies. Rigorous testing on benchmark datasets (CIFAR-100, ImageNet, and MSCOCO) attests to LumiNet's efficacy, revealing its competitive edge over leading feature-based methods. Moreover, in exploring the realm of transfer learning, we assess how effectively the student model, trained using our method, adapts to downstream tasks. Notably, when applied to Tiny ImageNet, the transferred features exhibit remarkable performance, further underscoring LumiNet's versatility and robustness in diverse settings. With LumiNet, we hope to steer the research discourse towards a renewed interest in the latent capabilities of logit-based knowledge distillation.
    摘要 在知识储备研究中,基于特征的方法长期占据主导地位,这可能是因为它们能够有效地利用了大量的教师模型。然而,基于幂数的方法被视为无法充分激发教师模型中隐藏的“黑知识”。为了bridging这个差距,我们提出了LumiNet,一种新的知识传递算法,旨在通过调整幂数来增强基于幂数的储备。我们引入了一个感知矩阵,用于重新塑造幂数,以便通过对模型表示能力的调整来激发更多的隐藏知识。两个模型都被映射到这个矩阵上,学生模型的目标是减少表示差异。我们在CIFAR-100、ImageNet和MSCOCO等标准测试集上进行了严格的测试,并证明LumiNet的效果,其与主流基于特征的方法相比,显示出竞争力。此外,我们还进行了对下游任务的探索,并发现通过我们的方法进行训练后,学生模型在Tiny ImageNet上表现出了很好的性能,这再次证明了LumiNet在多种设置下的多样性和稳定性。我们希望通过LumiNet,激发研究者对基于幂数的知识储备的新兴兴趣。

Certification of Deep Learning Models for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.03664
  • repo_url: None
  • paper_authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Nikos Paragios, Marie-Pierre Revel, Maria Vakalopoulou
  • for: 针对医疗影像分割模型,提供了首次的证明过程。
  • methods: 基于去噪扩散概率模型和随机平滑(randomized smoothing)的方法。
  • results: 对五个公开的胸部X光、皮肤病变和结肠镜数据集进行了广泛的实验,并观察到即使图像受到很高水平的扰动,也能保持较高的认证 Dice 分数。
    Abstract In medical imaging, segmentation models have known a significant improvement in the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced lately and provides a framework to certify models and obtain theoretical guarantees. In this paper, we present for the first time a certified segmentation baseline for medical imaging based on randomized smoothing and diffusion models. Our results show that leveraging the power of denoising diffusion probabilistic models helps us overcome the limits of randomized smoothing. We conduct extensive experiments on five public datasets of chest X-rays, skin lesions, and colonoscopies, and empirically show that we are able to maintain high certified Dice scores even for highly perturbed images. Our work represents the first attempt to certify medical image segmentation models, and we aspire for it to set a foundation for future benchmarks in this crucial and largely uncharted area.
    摘要 医疗影像中的分割模型在过去一代有了显著改进,现在在临床实践中每天都使用。然而,与分类模型一样,分割模型也受到敌意攻击的影响。在医疗领域中,确认模型预测的重要性是无可估量的。Randomized smoothing最近引入了一种框架,可以证明模型的预测,并提供了理论保证。在这篇论文中,我们为首次提出了医疗影像中证明的分割基线,基于随机熔浆概率模型和扩散模型。我们的结果表明,利用扩散概率模型的力量,我们可以超越随机熔浆的限制。我们在五个公共数据集上进行了广泛的实验,包括胸部X射线、皮肤损害和colonoscopy,并观察到我们可以保持高的证明 dice 分数,即使图像受到了严重的干扰。我们的工作代表了医疗影像分割模型的首次证明,我们希望这可以成为未来在这一关键和未知的领域的基础。
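At inference, the certified segmentation reduces to a per-pixel majority vote over Gaussian perturbations that are first denoised and then segmented. The sketch below shows that voting step for a single image; computing actual certified radii requires an additional statistical bound on the vote margins, which is omitted, and the `denoiser`/`segmenter` interfaces are assumptions.

```python
import torch

@torch.no_grad()
def smoothed_segmentation(denoiser, segmenter, image, sigma=0.25, n=32, num_classes=4):
    """Per-pixel majority vote over Gaussian perturbations (sketch).

    image: (1, C, H, W). Each noisy copy is denoised (diffusion step abstracted
    as `denoiser`) and segmented; the vote is the smoothed prediction.
    """
    counts = torch.zeros(num_classes, *image.shape[-2:], device=image.device)
    for _ in range(n):
        noisy = image + sigma * torch.randn_like(image)
        pred = segmenter(denoiser(noisy, sigma)).argmax(dim=1)[0]    # (H, W) labels
        counts.scatter_add_(0, pred.unsqueeze(0),
                            torch.ones_like(pred, dtype=counts.dtype).unsqueeze(0))
    return counts.argmax(dim=0)                                      # smoothed mask
```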

Robustness-Guided Image Synthesis for Data-Free Quantization

  • paper_url: http://arxiv.org/abs/2310.03661
  • repo_url: None
  • paper_authors: Jianhong Bai, Yuchen Yang, Huanpeng Chu, Hualiang Wang, Zuozhu Liu, Ruizhe Chen, Xiaoxuan He, Lianrui Mu, Chengfei Cai, Haoji Hu
  • for: 提高数据自由压缩性能,增强生成图像 semantics 和多样性。
  • methods: 提出 Robustness-Guided Image Synthesis (RIS) 方法,通过在输入和模型参数层次上引入干扰,并在特征和预测层次上定义不一致度指标,以提高生成图像 semantics 和多样性。
  • results: 在不同设定下实现了数据自由压缩性能的状态场,并且可以扩展到其他数据自由压缩任务。
    Abstract Quantization has emerged as a promising direction for model compression. Recently, data-free quantization has been widely studied as a promising method to avoid privacy concerns, which synthesizes images as an alternative to real training data. Existing methods use classification loss to ensure the reliability of the synthesized images. Unfortunately, even if these images are well-classified by the pre-trained model, they still suffer from low semantics and homogenization issues. Intuitively, these low-semantic images are sensitive to perturbations, and the pre-trained model tends to have inconsistent output when the generator synthesizes an image with poor semantics. To this end, we propose Robustness-Guided Image Synthesis (RIS), a simple but effective method to enrich the semantics of synthetic images and improve image diversity, further boosting the performance of downstream data-free compression tasks. Concretely, we first introduce perturbations on input and model weight, then define the inconsistency metrics at feature and prediction levels before and after perturbations. On the basis of inconsistency on two levels, we design a robustness optimization objective to enhance the semantics of synthetic images. Moreover, we also make our approach diversity-aware by forcing the generator to synthesize images with small correlations in the label space. With RIS, we achieve state-of-the-art performance for various settings on data-free quantization and can be extended to other data-free compression tasks.
    摘要 量化已经出现为模型压缩的可能的方向。最近,数据无关量化已经广泛研究,以避免隐私问题,它使用图像作为代替实际训练数据来生成图像。现有方法使用类别损失来确保生成的图像的可靠性。然而,即使这些图像由预训练模型良好分类,它们仍然受到低 semantics 和同化问题的困扰。我们认为这些低 semantics 图像是敏感的,生成器Synthesize图像时容易受到扰动的影响,预训练模型对生成的图像的输出是不一致的。为此,我们提出了Robustness-Guided Image Synthesis(RIS),一种简单 yet effective的方法,以提高生成图像的 semantics 和多样性,从而提高下游数据free压缩任务的性能。具体来说,我们首先对输入和模型参数进行扰动,然后定义在特征层和预测层之前和之后的不一致度量。基于这两个层次的不一致度量,我们设计了一个Robustness optimization objective,以提高生成图像的 semantics。此外,我们还使我们的方法具有多样性, forcing the generator to synthesize images with small correlations in the label space。与RIS相比,我们实现了数据free压缩中的状态级性能,并且可以扩展到其他数据free压缩任务。
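The feature- and prediction-level inconsistency that guides synthesis can be sketched as below: perturb the input and the (frozen) pretrained model's weights, then measure how much features and predictions move. The Gaussian perturbations and the `return_features=True` interface are assumptions used only for illustration.

```python
import copy
import torch
import torch.nn.functional as F

def robustness_score(model, images, eps_x=0.01, eps_w=0.001):
    """Two-level inconsistency under input and weight perturbations (sketch).

    Higher scores flag low-semantic, perturbation-sensitive synthetic images.
    """
    feats, logits = model(images, return_features=True)              # assumed API
    perturbed_model = copy.deepcopy(model)
    with torch.no_grad():
        for p in perturbed_model.parameters():
            p.add_(eps_w * torch.randn_like(p))                      # weight perturbation
    feats_p, logits_p = perturbed_model(images + eps_x * torch.randn_like(images),
                                        return_features=True)
    feat_incons = 1 - F.cosine_similarity(feats, feats_p, dim=1)     # feature level
    pred_incons = F.kl_div(logits_p.log_softmax(1), logits.softmax(1),
                           reduction="none").sum(1)                  # prediction level
    return feat_incons + pred_incons
```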

Visual inspection for illicit items in X-ray images using Deep Learning

  • paper_url: http://arxiv.org/abs/2310.03658
  • repo_url: None
  • paper_authors: Ioannis Mademlis, Georgios Batsis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
  • for: 实现X光图像中违禁物品的自动检测,以提高公共安全,提升安检人员的工作效率并减轻其心理负担,特别是在机场、地铁、海关/邮局等场所。
  • methods: 使用现代计算机视觉算法,基于深度神经网络(DNNs),以应对大量和高速的旅客和邮件等,并在受限和嵌入式环境中进行实现。
  • results: 根据实验结果显示,Transformer检测器最具优势,而过时的辅助神经网络在安全应用中的发展没有效果,CSP-DarkNet底层CNN也表现高效。
    Abstract Automated detection of contraband items in X-ray images can significantly increase public safety, by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours practically make it a Big Data problem. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., as is the case with fast, single-stage object detectors. However, no comparative experimental assessment of the various relevant DNN components/methods has been performed under a common evaluation protocol, which means that reliable cross-method comparisons are missing. This paper presents exactly such a comparative assessment, utilizing a public relevant dataset and a well-defined methodology for selecting the specific DNN components/modules that are being evaluated. The results indicate the superiority of Transformer detectors, the obsolete nature of auxiliary neural modules that have been developed in the past few years for security applications and the efficiency of the CSP-DarkNet backbone CNN.
    摘要 自动检测违法物品在X射图像中可以显著提高公共安全,因为它可以提高安全官员的产量和减轻他们的心理负担,特别是在机场、地铁、海关/邮政等场合。在湮旷时间段,大量的旅客和寄送包裹等会导致实际上是一个大数据问题。现代计算机视觉算法,基于深度神经网络(DNN),已经证明可以完成这项任务,即使在资源受限和嵌入式执行 scenarios 中。然而,没有一个 Comparative experimental assessment of the various relevant DNN components/methods has been performed under a common evaluation protocol,这意味着可靠的交叉比较不存在。这篇文章提供了一个 Comparative assessment,使用公共 relevante 的 dataset 和一个 Well-defined methodology for selecting the specific DNN components/modules being evaluated。结果表明 transformer 检测器的超越性,落后性 auxillary neural modules 在过去几年为安全应用程序开发的,以及 CSP-DarkNet 背景 CNN 的效率。

Wasserstein Distortion: Unifying Fidelity and Realism

  • paper_url: http://arxiv.org/abs/2310.03629
  • repo_url: None
  • paper_authors: Yang Qiu, Aaron B. Wagner, Johannes Ballé, Lucas Theis
  • for: 这篇论文是为了提出一种图像扭曲度量,即 Wasserstein distortion,该度量同时涵盖了像素级准确性和现实性。
  • methods: 论文使用了 Wasserstein distortion 来评估图像的扭曲度量,并对不同参数选择进行了分析。
  • results: 论文通过实验示出了 Wasserstein distortion 的实用性,可以同时保证图像的像素级准确性和现实性。例如,通过生成随机的 texture 来示出 Wasserstein distortion 的应用。
    Abstract We introduce a distortion measure for images, Wasserstein distortion, that simultaneously generalizes pixel-level fidelity on the one hand and realism on the other. We show how Wasserstein distortion reduces mathematically to a pure fidelity constraint or a pure realism constraint under different parameter choices. Pairs of images that are close under Wasserstein distortion illustrate its utility. In particular, we generate random textures that have high fidelity to a reference texture in one location of the image and smoothly transition to an independent realization of the texture as one moves away from this point. Connections between Wasserstein distortion and models of the human visual system are noted.
    摘要 我们介绍了一种图像扭曲度量,即沃森拓扑扭曲度量,它同时同时具有像素级准确性和现实性两种特点。我们展示了沃森拓扑扭曲度量在不同参数选择下可以Mathematically reduce to纯准确性约束或纯现实性约束。图像中的离散点对象示出了沃森拓扑扭曲度量的用途。特别是,我们生成了一些随机的纹理图像,这些图像在一个图像中具有高准确性参照纹理,随着移动 away from this point,纹理逐渐变得独立和自由。我们还注意到了沃森拓扑扭曲度量与人类视觉系统模型之间的连接。
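A toy one-dimensional version makes the fidelity/realism interpolation concrete: with a window of a single location the measure reduces to a pointwise error (pure fidelity), while a wide window compares only local statistics (a realism-like texture measure). The sketch below is far simpler than the actual multiscale, feature-space construction in the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def local_wasserstein_distortion(feat_a, feat_b, pool=0):
    """Compare two 1D feature signals via Wasserstein distances between local windows.

    pool=0 gives single-sample windows, i.e. |a_i - b_i| (fidelity);
    a large pool compares local distributions only (realism-like).
    """
    total = 0.0
    for i in range(len(feat_a)):
        lo, hi = max(0, i - pool), min(len(feat_a), i + pool + 1)
        total += wasserstein_distance(feat_a[lo:hi], feat_b[lo:hi])
    return total / len(feat_a)

# Example: two independent realizations of the same texture score low with a
# wide pool but high with pool=0, separating realism from pixel-level fidelity.
```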

High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning

  • paper_url: http://arxiv.org/abs/2310.03624
  • repo_url: None
  • paper_authors: Lennart Schulze, Hod Lipson
  • for: The paper develops a robot self-model that can be used for motion planning tasks in the absence of classical geometric kinematic models.
  • methods: The paper uses neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations.
  • results: The learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension, demonstrating its effectiveness in motion planning tasks.
    Abstract A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.
    摘要 一个机器人自我模型是一种任务无关的机器人物理结构表示,可以用于减少或缺失经典几何动力学模型,尤其是在机器人的动力学发生意外变化或很难工程化时。在这种情况下,无人工作机器人自我模型是真正自主的必备特性。在这种工作中,我们利用神经场来让机器人自我模型其动态物理结构,通过学习神经隐式查询模型,只需基于2D图像和摄像头位置和配置进行学习。这使得我们的方法在已有方法所不足的情况下具有更大的可用性。为此,我们还提出了一种课程数据采样策略和新的编码器基于神经树量场架构,用于Conditional on high degrees of freedom (DOFs)的动态物体中心场景。在7DOF机器人测试设置中,学习的自我模型实现了机器人工作空间维度的Chamfer-L2距离为2%。我们示示了这种模型在运动规划任务中的应用能力。

Animatable Virtual Humans: Learning pose-dependent human representations in UV space for interactive performance synthesis

  • paper_url: http://arxiv.org/abs/2310.03615
  • repo_url: None
  • paper_authors: Wieland Morgenstern, Milena T. Bagdasarian, Anna Hilsmann, Peter Eisert
  • for: 这篇论文旨在提出一种新的虚拟人类表现方法,用于高度真实的实时动画和渲染在3D应用中。
  • methods: 该方法基于高精度多视图视频重建获取的动态 mesh序列学习 pose-dependent 外观和几何。
  • results: 该方法可以高效地学习人体 pose-dependent 外观和几何,并在实时场景中进行流畅处理和渲染虚拟人类。
    Abstract We propose a novel representation of virtual humans for highly realistic real-time animation and rendering in 3D applications. We learn pose dependent appearance and geometry from highly accurate dynamic mesh sequences obtained from state-of-the-art multiview-video reconstruction. Learning pose-dependent appearance and geometry from mesh sequences poses significant challenges, as it requires the network to learn the intricate shape and articulated motion of a human body. However, statistical body models like SMPL provide valuable a-priori knowledge which we leverage in order to constrain the dimension of the search space enabling more efficient and targeted learning and define pose-dependency. Instead of directly learning absolute pose-dependent geometry, we learn the difference between the observed geometry and the fitted SMPL model. This allows us to encode both pose-dependent appearance and geometry in the consistent UV space of the SMPL model. This approach not only ensures a high level of realism but also facilitates streamlined processing and rendering of virtual humans in real-time scenarios.
    摘要 我们提出了一种新的虚拟人形表示方法,用于高度真实的实时动画和渲染在3D应用程序中。我们从 state-of-the-art 多视图视频重建获取了高度准确的动态网格序列,并学习 pose-dependent 形状和外观。学习pose-dependent的形状和运动呈poses significant challenges,因为它需要网络学习人体体形的细节和骨骼运动。然而,统计体模型如SMPL提供了valuable 先前知识,我们可以利用其来约束搜索空间的维度,以便更有效地学习和定向学习。而不是直接学习绝对pose-dependent的准确 geometry,我们学习了观察到的 geometry 与SMPL模型相比的差异。这种方法不仅保证了高度真实,还便利了实时enario中的虚拟人形处理和渲染。

How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

  • paper_url: http://arxiv.org/abs/2310.03608
  • repo_url: https://github.com/global-health-labs/us-dcgan
  • paper_authors: Menghan Yu, Sourabh Kulhare, Courosh Mehanian, Charles B Delahunt, Daniel E Shea, Zohreh Laverriere, Ishan Shah, Matthew P Horning
  • for: 这个研究旨在提出一个整体框架,用于适应医学影像分析模型的开发 workflow。
  • methods: 该研究使用生成模型作为数据增强方法,并使用对抗方法保护患者隐私。
  • results: 研究表明,将真实数据和生成数据混合训练可以超越只使用真实数据训练的性能,并且模型只使用生成数据训练的性能接近真实数据训练的性能。
    Abstract Acquiring large quantities of data and annotations is known to be effective for developing high-performing deep learning models, but is difficult and expensive to do in the healthcare context. Adding synthetic training data using generative models offers a low-cost method to deal effectively with the data scarcity challenge, and can also address data imbalance and patient privacy issues. In this study, we propose a comprehensive framework that fits seamlessly into model development workflows for medical image analysis. We demonstrate, with datasets of varying size, (i) the benefits of generative models as a data augmentation method; (ii) how adversarial methods can protect patient privacy via data substitution; (iii) novel performance metrics for these use cases by testing models on real holdout data. We show that training with both synthetic and real data outperforms training with real data alone, and that models trained solely with synthetic data approach their real-only counterparts. Code is available at https://github.com/Global-Health-Labs/US-DCGAN.
    摘要 获取大量数据和注释是开发高性能深度学习模型的有效方法,但在医疗上困难和昂贵。通过使用生成模型生成的假数据可以解决数据稀缺问题,并可以解决数据不均衡和患者隐私问题。在这项研究中,我们提出了适应医学图像分析模型开发工作流程的完整框架。我们在不同的数据量下测试了(i)生成模型作为数据扩充方法的效果;(ii)如何通过数据替换来保护患者隐私;(iii)为这些用例提供新的性能指标,通过测试模型在真实副本数据上进行测试。我们发现,训练使用真实和假数据的模型比训练使用真实数据alone更高效,并且模型准备了假数据alone与真实数据alone相似。可以在 GitHub 上获取代码:https://github.com/Global-Health-Labs/US-DCGAN。

Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints

  • paper_url: http://arxiv.org/abs/2310.03602
  • repo_url: None
  • paper_authors: Chuan Fang, Xiaotao Hu, Kunming Luo, Ping Tan
  • for: 这篇论文旨在从文本提示生成高质量的3D室内场景,并允许用户进行交互式编辑操作。
  • methods: 该方法将布局(layout)与外观(appearance)分离建模:首先使用文本条件扩散模型学习场景布局分布,然后使用微调的ControlNet生成高质量的全景场景图像。
  • results: 该方法可以生成高质量的3D室内场景,并允许用户通过Mask-guided编辑模块进行交互式编辑,而无需昂贵的针对编辑的训练。在Structured3D数据集上的广泛实验表明,该方法在从文本提示生成3D场景方面优于现有方法。
    Abstract Text-driven 3D indoor scene generation could be useful for gaming, film industry, and AR/VR applications. However, existing methods cannot faithfully capture the room layout, nor do they allow flexible editing of individual objects in the room. To address these problems, we present Ctrl-Room, which is able to generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt. Moreover, Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items. Our key insight is to separate the modeling of layouts and appearance. %how to model the room that takes into account both scene texture and geometry at the same time. To this end, Our proposed method consists of two stages, a `Layout Generation Stage' and an `Appearance Generation Stage'. The `Layout Generation Stage' trains a text-conditional diffusion model to learn the layout distribution with our holistic scene code parameterization. Next, the `Appearance Generation Stage' employs a fine-tuned ControlNet to produce a vivid panoramic image of the room guided by the 3D scene layout and text prompt. In this way, we achieve a high-quality 3D room with convincing layouts and lively textures. Benefiting from the scene code parameterization, we can easily edit the generated room model through our mask-guided editing module, without expensive editing-specific training. Extensive experiments on the Structured3D dataset demonstrate that our method outperforms existing methods in producing more reasonable, view-consistent, and editable 3D rooms from natural language prompts.
    摘要 文本驱动的3D室内场景生成可能对游戏、电影业和AR/VR应用程序有用。然而,现有方法无法准确地捕捉室内布局,也无法让个体对象进行灵活编辑。为解决这些问题,我们提出了Ctrl-Room,它可以从文本提示生成真实的3D室内场景,并具有设计风格的布局和高质量的纹理图像。此外,Ctrl-Room还允许用户进行便捷的交互式编辑操作,如调整室内家具的大小或位置。我们的关键发现是将布局和外观模型分离开来。我们的提posed方法包括两个阶段:一个`Layout Generation Stage'和一个`Appearance Generation Stage'。`Layout Generation Stage'使用文本条件的扩散模型来学习室内布局的分布,而`Appearance Generation Stage'使用精心调整的ControlNet来生成基于3D场景布局和文本提示的生动的全景图像。通过这种方式,我们可以生成高质量的3D室内场景,具有真实的布局和生动的纹理图像。由于使用场景代码参数化,我们可以轻松地通过我们的面具引导编辑模块进行编辑,而无需贵重的编辑Specific training。我们的实验表明,我们的方法可以在Structured3D dataset上生成更合理、视角一致和可编辑的3D室内场景,比现有方法更高效。

BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2310.03563
  • repo_url: None
  • paper_authors: Ágoston István Csehi, Csaba Máté Józsa
  • for: 改进反演神经辐射场(iNeRF)算法,该算法将图像位姿估计问题定义为基于NeRF的迭代线性优化问题。
  • methods: 我们对iNeRF算法进行了以下改进:在定位优化目标中加入基于深度的损失项,利用多张已知相对位姿的图像定义损失函数,并在体渲染中省略层次采样(仅使用粗模型进行位姿估计)。
  • results: 我们的改进显著提高了收敛速度,并大幅扩展了收敛域,即使初始位姿估计误差较大也能收敛。
    Abstract We aim to improve the Inverted Neural Radiance Fields (iNeRF) algorithm which defines the image pose estimation problem as a NeRF based iterative linear optimization. NeRFs are novel neural space representation models that can synthesize photorealistic novel views of real-world scenes or objects. Our contributions are as follows: we extend the localization optimization objective with a depth-based loss function, we introduce a multi-image based loss function where a sequence of images with known relative poses are used without increasing the computational complexity, we omit hierarchical sampling during volumetric rendering, meaning only the coarse model is used for pose estimation, and we show that by extending the sampling interval, convergence can be achieved even for higher initial pose estimate errors. With the proposed modifications the convergence speed is significantly improved, and the basin of convergence is substantially extended.
    摘要 我们目标是改进倒计时神经辐射场(iNeRF)算法,该算法定义图像pose估计问题为基于NeRF的迭代线性优化问题。NeRF是一种新型神经空间表示模型,可以生成高品质的新视图图像或物体场景。我们的贡献包括以下几点:1. 将本地化优化目标添加depth-based损失函数,以提高pose估计精度。2. 引入多张图像基于损失函数,使用known相对pose的图像序列,无需增加计算复杂度。3. 在Volume Rendering中弃用层次抽象采样,只使用粗略模型进行pose估计,从而降低计算复杂度。4. 通过延长采样间隔,可以实现高初始pose估计错误的抽象,并且扩展了basin of convergence。通过我们的修改, convergence speed 得到了显著提高,并且basin of convergence得到了substantial扩展。
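The modified objective can be sketched as a sum of photometric and depth residuals over several frames whose relative poses are known, all rendered from the frozen coarse NeRF. Interface names and the depth weight below are placeholders for illustration, not the paper's implementation.

```python
import torch

def pose_objective(nerf_render, rgbd_frames, pose, w_depth=0.5):
    """Multi-image photometric + depth objective for iterative pose refinement (sketch).

    pose: current 4x4 estimate for the reference frame. Each frame dict carries
    sampled pixel coordinates, their RGB and depth values, and the known 4x4
    relative transform to the reference frame. `nerf_render` is assumed to
    return per-ray color and expected depth from the frozen coarse NeRF.
    """
    loss = 0.0
    for frame in rgbd_frames:
        cam_pose = pose @ frame["relative_pose"]           # chain known relative pose
        rgb_hat, depth_hat = nerf_render(cam_pose, frame["pixels"])
        loss = loss + torch.mean((rgb_hat - frame["rgb"]) ** 2)          # photometric
        loss = loss + w_depth * torch.mean(torch.abs(depth_hat - frame["depth"]))  # depth
    return loss
```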

MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images

  • paper_url: http://arxiv.org/abs/2310.03559
  • repo_url: None
  • paper_authors: Yanwu Xu, Li Sun, Wei Peng, Shyam Visweswaran, Kayhan Batmanghelich
    for: This paper presents an innovative method for generating high-quality 3D lung CT images based on textual information, which can enhance numerous downstream tasks.methods: The proposed method utilizes a hierarchical scheme with a modified UNet architecture to synthesize low-resolution images conditioned on text, and further generates vascular, airway, and lobular segmentation masks to ensure anatomical plausibility.results: The proposed approach demonstrates superior performance compared to state-of-the-art models based on GAN and diffusion techniques, especially in retaining crucial anatomical features such as fissure lines, airways, and vascular structures.
    Abstract This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. The results of comparative assessments indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines, airways, and vascular structures. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.
    摘要 To address the memory issue, we propose a hierarchical scheme that uses a modified UNet architecture. We first synthesize low-resolution images conditioned on the text, which serve as a foundation for subsequent generators to complete the volumetric data. Our approach demonstrates the capability to use textual input and segmentation tasks to generate synthesized images, with superior performance compared to existing GAN and diffusion techniques. The two main objectives of this study are: (1) developing a method for creating images based on textual prompts and anatomical components, and (2) generating new images conditioned on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.
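A minimal sketch of the hierarchical, text-conditioned cascade idea described in the abstract above. This is an illustration, not the MedSyn model: all module names and sizes are assumptions, and a single Conv3d stands in for the full modified-UNet refiner.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResGenerator(nn.Module):
    """Maps a text embedding to a coarse 3D volume."""
    def __init__(self, text_dim=256, vol=16):
        super().__init__()
        self.vol = vol
        self.fc = nn.Linear(text_dim, vol ** 3)

    def forward(self, text_emb):
        x = self.fc(text_emb)
        return x.view(-1, 1, self.vol, self.vol, self.vol)

class VolumeUpsampler(nn.Module):
    """Second stage: upsample and refine, so full resolution is never generated in one step."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def forward(self, coarse, scale=4):
        up = F.interpolate(coarse, scale_factor=scale, mode="trilinear", align_corners=False)
        return self.conv(up)

if __name__ == "__main__":
    text_emb = torch.randn(2, 256)               # would come from a text encoder
    coarse = LowResGenerator()(text_emb)
    full = VolumeUpsampler()(coarse)
    print(coarse.shape, full.shape)               # (2,1,16,16,16) -> (2,1,64,64,64)
```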

Towards Unified Deep Image Deraining: A Survey and A New Benchmark

  • paper_url: http://arxiv.org/abs/2310.03535
  • repo_url: None
  • paper_authors: Xiang Chen, Jinshan Pan, Jiangxin Dong, Jinhui Tang
  • for: 本研究旨在提供一个统一的评估设定,用于评估现有的图像雨排除方法,并提供一个新的高质量标准benchmark。
  • methods: 本研究使用了现有的图像雨排除方法,并对其进行了全面的评估。
  • results: 本研究提出了一个新的高质量标准benchmark,并通过了extensive的性能评估。
    Abstract Recent years have witnessed significant advances in image deraining due to effective image priors and deep learning models. As each deraining approach has individual settings (e.g., training and test datasets, evaluation criteria), fairly and comprehensively evaluating existing approaches is not a trivial task. Although existing surveys aim to review image deraining approaches comprehensively, few of them focus on providing unified evaluation settings to examine deraining capability and practicality. In this paper, we provide a comprehensive review of existing image deraining methods and a unified evaluation setting to evaluate their performance. We construct a new high-quality benchmark named HQ-RAIN to further conduct extensive evaluation, consisting of 5,000 paired high-resolution synthetic images with higher harmony and realism. We also discuss the existing challenges and highlight several future research opportunities worth exploring. To facilitate the reproduction and tracking of the latest deraining technologies for general users, we build an online platform that provides an off-the-shelf toolkit, including large-scale performance evaluation. This online platform and the proposed new benchmark are publicly available and will be regularly updated at http://www.deraining.tech/.
    摘要 近年来,因为有效的图像前提和深度学习模型,图像排除技术得到了显著进步。然而,每种排除方法都有自己的设置(例如训练和测试数据集、评价标准),因此全面评估现有方法的问题不是很容易解决。虽然现有的报告尝试了对图像排除方法进行全面审查,但只有很少的几篇文章关注提供统一的评估设置,以评估图像排除方法的性能和实用性。在这篇文章中,我们提供了一项全面的图像排除方法审查,并提供了统一的评估设置。我们构建了一个新的高品质标准 benchmark,名为HQ-RAIN,以进行广泛的评估。该标准包括5000对高分辨率的合成图像,具有更高的和实际性。我们还讨论了现有的挑战和提出了一些未来研究的可能性。为便于普通用户复制和跟踪最新的排除技术,我们建立了一个在线平台,提供了大规模的性能评估工具。这个在线平台和我们提出的新标准公共可用,将在http://www.deraining.tech/上进行定期更新。

3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2310.03534
  • repo_url: None
  • paper_authors: Chen Zhao, Tong Zhang, Mathieu Salzmann
  • for: 本研究旨在解决只有一个参考图像表示物体的情况下,估计物体在不同姿态下的相对 pose 问题。
  • methods: 我们提出了一种新的假设和验证框架,通过生成和评估多个姿态假设,最终选择最可靠的一个作为相对 pose。为了衡量可靠性,我们引入了3D变换aware的验证方法,将3D物体表示从两个输入图像中学习到的3D对象表示应用3D变换。
  • results: 我们的方法在Objaverse、LINEMOD和CO3D数据集上进行了广泛的实验,证明我们的相对pose估计精度较高,对大规模姿态变化和不visible object测试时的稳定性具有优势。
    Abstract Prior methods that tackle the problem of generalizable object pose estimation highly rely on having dense views of the unseen object. By contrast, we address the scenario where only a single reference view of the object is available. Our goal then is to estimate the relative object pose between this reference view and a query image that depicts the object in a different pose. In this scenario, robust generalization is imperative due to the presence of unseen objects during testing and the large-scale object pose variation between the reference and the query. To this end, we present a new hypothesis-and-verification framework, in which we generate and evaluate multiple pose hypotheses, ultimately selecting the most reliable one as the relative object pose. To measure reliability, we introduce a 3D-aware verification that explicitly applies 3D transformations to the 3D object representations learned from the two input images. Our comprehensive experiments on the Objaverse, LINEMOD, and CO3D datasets evidence the superior accuracy of our approach in relative pose estimation and its robustness in large-scale pose variations, when dealing with unseen objects.
    摘要 先前的方法很多都是基于具有 dense views 的未知物体的假设,而我们则是面临只有一个参考视图的情况。我们的目标是将参考视图中的对象pose与测试图像中的对象pose进行相对pose estimation。在这种情况下,Robust generalization 是非常重要的,因为测试时可能会出现未知的对象,并且参考和测试图像中的对象pose之间存在大规模的差异。为此,我们提出了一个新的假设-验证框架,在这个框架中,我们生成并评估多个pose假设,最终选择最可靠的 pose 作为对象的相对pose。为了衡量可靠性,我们引入了3D-aware验证,该验证显式地将3D变换应用于从两个输入图像中学习到的3D对象表示。我们对 Objaverse、LINEMOD 和 CO3D 数据集进行了广泛的实验,证明了我们的方法在相对pose估计中的精度和在大规模的pose变化中的Robustness,当处理未知对象时。
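A minimal sketch of a hypothesis-and-verification loop of the kind described above, written as my own illustration rather than the paper's implementation: rotation hypotheses are sampled, each is applied to a 3D representation of the reference view (here, simply a point set), and a chamfer-style 3D-aware score selects the most reliable one.

```python
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.spatial import cKDTree

def verification_score(ref_points, query_points, R):
    """3D-aware verification: chamfer-style distance after applying rotation R."""
    transformed = ref_points @ R.T
    d1, _ = cKDTree(query_points).query(transformed)
    d2, _ = cKDTree(transformed).query(query_points)
    return -(d1.mean() + d2.mean())            # higher is better

def estimate_relative_pose(ref_points, query_points, num_hypotheses=256):
    best_R, best_score = np.eye(3), -np.inf
    for _ in range(num_hypotheses):
        R = Rotation.random().as_matrix()      # one pose hypothesis
        score = verification_score(ref_points, query_points, R)
        if score > best_score:
            best_R, best_score = R, score
    return best_R, best_score

if __name__ == "__main__":
    ref = np.random.rand(500, 3)
    gt_R = Rotation.from_euler("xyz", [20, -10, 35], degrees=True).as_matrix()
    query = ref @ gt_R.T
    R_est, score = estimate_relative_pose(ref, query)
    print("best verification score:", score)
```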

V2X Cooperative Perception for Autonomous Driving: Recent Advances and Challenges

  • paper_url: http://arxiv.org/abs/2310.03525
  • repo_url: None
  • paper_authors: Tao Huang, Jianan Liu, Xi Zhou, Dinh C. Nguyen, Mostafa Rahimi Azghadi, Yuxuan Xia, Qing-Long Han, Sumei Sun
  • for: 本研究的目的是为了提高自动驾驶系统的安全性和可靠性,通过推动合作感知(Cooperative Perception,CP)技术的发展。
  • methods: 本研究使用了许多现有的计算机视觉技术,包括物体识别等,以解决现实世界交通环境中的难题。此外,还使用了许多最新的通信技术,如V2X技术,以增强驾驶自动化系统。
  • results: 本研究提出了一种基于V2X通信技术的CP工作流程,并对现有的V2X-based CP方法进行了分类和评价。此外,还对CP技术的发展进行了一种彻底的文献综述,并评估了现有的数据集和模拟器。最后,本研究还讨论了CP技术的未来发展和挑战。
    Abstract Accurate perception is essential for advancing autonomous driving and addressing safety challenges in modern transportation systems. Despite significant advancements in computer vision for object recognition, current perception methods still face difficulties in complex real-world traffic environments. Challenges such as physical occlusion and limited sensor field of view persist for individual vehicle systems. Cooperative Perception (CP) with Vehicle-to-Everything (V2X) technologies has emerged as a solution to overcome these obstacles and enhance driving automation systems. While some research has explored CP's fundamental architecture and critical components, there remains a lack of comprehensive summaries of the latest innovations, particularly in the context of V2X communication technologies. To address this gap, this paper provides a comprehensive overview of the evolution of CP technologies, spanning from early explorations to recent developments, including advancements in V2X communication technologies. Additionally, a contemporary generic framework is proposed to illustrate the V2X-based CP workflow, aiding in the structured understanding of CP system components. Furthermore, this paper categorizes prevailing V2X-based CP methodologies based on the critical issues they address. An extensive literature review is conducted within this taxonomy, evaluating existing datasets and simulators. Finally, open challenges and future directions in CP for autonomous driving are discussed by considering both perception and V2X communication advancements.
    摘要 准确的感知是自动驾驶技术发展的关键,以解决现代交通系统中的安全挑战。尽管计算机视觉技术在物体识别方面做出了 significiant 进步,但现在的感知方法仍然在复杂的实际交通环境中遇到困难。这些困难包括物体遮挡和汽车感知器的有限范围。协同感知(CP)技术与 everything (V2X)技术已经出现为解决这些问题并增强驾驶自动化系统。虽然一些研究探讨了 CP 技术的基本架构和关键组件,但还有一些研究 gap 需要填充。为了填充这些 gap,这篇文章提供了 CP 技术的演化历史,从早期探索到最新的发展,包括 V2X 通信技术的进步。此外,文章还提出了一个现代化的 CP 工作流程框架,以便系统化地理解 CP 系统组件。此外,文章还对 CP 方法分为不同的关键问题,进行了广泛的文献综述。最后,文章讨论了 CP 技术在自动驾驶方面的未来发展和挑战。

PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification

  • paper_url: http://arxiv.org/abs/2310.03517
  • repo_url: None
  • paper_authors: Feihong He, Gang Li, Lingyu Si, Leilei Yan, Fanzhang Li, Fuchun Sun
  • for: 提高少量图像分类的性能,Addressing the challenge of poor classification performance with limited samples in novel classes.
  • methods: 使用 transformer 架构建 prototype 抽取模块,提取更有准确性的类表示,并在少shot learning scenario 中使用对比学习优化 prototype 特征。
  • results: 在多个流行的少shot image classification benchmark dataset上进行实验,显示了我们的方法在现有state-of-the-art方法之上具有remarkable的性能,并且将于未来释出代码。
    Abstract Few-shot image classification has received considerable attention for addressing the challenge of poor classification performance with limited samples in novel classes. To date, numerous studies have employed sophisticated learning strategies and diversified feature extraction methods to address this issue. In this paper, we propose our method called PrototypeFormer, which aims to significantly advance traditional few-shot image classification approaches by exploring prototype relationships. Specifically, we utilize a transformer architecture to build a prototype extraction module, aiming to extract class representations that are more discriminative for few-shot classification. Additionally, during the model training process, we propose a contrastive learning-based optimization approach to optimize prototype features in few-shot learning scenarios. Despite its simplicity, the method performs remarkably well, with no bells and whistles. We have experimented with our approach on several popular few-shot image classification benchmark datasets, showing that our method outperforms all current state-of-the-art methods. In particular, our method achieves 97.07% and 90.88% on 5-way 5-shot and 5-way 1-shot tasks of miniImageNet, surpassing the previous state-of-the-art results by 7.27% and 8.72%, respectively. The code will be released later.
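A minimal sketch of transformer-based prototype extraction with a contrastive objective, in the spirit of the abstract above. The module and function names are assumptions, not the released code: support embeddings of each class are refined by self-attention and averaged into a prototype, and queries are pulled toward their class prototype via a temperature-scaled similarity loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeExtractor(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, support):                  # (n_way, k_shot, dim) backbone features
        refined = self.encoder(support)          # attend within each class
        return refined.mean(dim=1)               # (n_way, dim) prototypes

def prototype_contrastive_loss(prototypes, queries, query_labels, tau=0.1):
    """Queries are pulled toward their own prototype and pushed from the others."""
    logits = F.normalize(queries, dim=-1) @ F.normalize(prototypes, dim=-1).T / tau
    return F.cross_entropy(logits, query_labels)

if __name__ == "__main__":
    n_way, k_shot, dim = 5, 5, 64
    support = torch.randn(n_way, k_shot, dim)
    queries = torch.randn(15, dim)
    labels = torch.randint(0, n_way, (15,))
    model = PrototypeExtractor(dim)
    protos = model(support)
    loss = prototype_contrastive_loss(protos, queries, labels)
    print("loss:", loss.item())
```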

Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery

  • paper_url: http://arxiv.org/abs/2310.03513
  • repo_url: None
  • paper_authors: Joseph A. Gallego-Mejia, Anna Jungbluth, Laura Martínez-Ferrer, Matt Allen, Francisco Dorr, Freddie Kalaitzis, Raúl Ramos-Pollán
  • for: 这个研究探讨了Self-Distillation with No Labels(DINO)算法在 Synthetic Aperture Radar(SAR)图像上的应用和emergent特性。
  • methods: 我们使用无标签SAR数据预训练一个基于Vision Transformer(ViT)的DINO模型,然后精确调整模型来预测高分辨率土地覆盖图。我们仔细评估了ViT底层抽象的MAP值,并与模型的Token Embedding空间进行比较。
  • results: 我们发现预训练比于从scratch训练有小量的提升,并讨论了SSL在Remote Sensing和土地覆盖分类中的局限性和机遇。此外,我们发现ViT的抽象MAP值对于Remote Sensing具有很大的内在价值,可以提供有用的输入 для其他算法。这个研究为大型和更好的SSL模型的开发奠定了基础。
    Abstract Self-supervised learning (SSL) models have recently demonstrated remarkable performance across various tasks, including image segmentation. This study delves into the emergent characteristics of the Self-Distillation with No Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR) imagery. We pre-train a vision transformer (ViT)-based DINO model using unlabeled SAR data, and later fine-tune the model to predict high-resolution land cover maps. We rigorously evaluate the utility of attention maps generated by the ViT backbone, and compare them with the model's token embedding space. We observe a small improvement in model performance with pre-training compared to training from scratch, and discuss the limitations and opportunities of SSL for remote sensing and land cover segmentation. Beyond small performance increases, we show that ViT attention maps hold great intrinsic value for remote sensing, and could provide useful inputs to other algorithms. With this, our work lays the ground-work for bigger and better SSL models for Earth Observation.
    摘要 自我指导学习(SSL)模型最近已经在不同任务上展示出惊人的表现,包括图像分割。本研究探讨了自我混合 WITH NO Labels(DINO)算法的 emergent 特性,并应用于Synthetic Aperture Radar(SAR)成像。我们使用无标签 SAR 数据预训练一个基于视Transformer(ViT)的 DINO 模型,然后练习模型预测高分辨率地形覆盖图。我们仔细评估了 ViT 底层的注意力地图,并与模型的 Token 空间进行比较。我们发现预训练比于从 scratch 训练有小幅提升性能,并讨论了SSL 在遥感和地形分类中的局限性和机遇。此外,我们发现 ViT 的注意力地图具有很大的内在价值,可以提供用于遥感的有用输入。因此,我们的工作为大型和更好的 SSL 模型奠定了基础。

RL-based Stateful Neural Adaptive Sampling and Denoising for Real-Time Path Tracing

  • paper_url: http://arxiv.org/abs/2310.03507
  • repo_url: https://github.com/ajsvb/rl_path_tracing
  • paper_authors: Antoine Scardigli, Lukas Cavigelli, Lorenz K. Müller
  • for: 提高真实图像生成的可靠性和速度
  • methods: 使用抽象学习网络对抽象进行END-TO-END训练,包括采样重要性网络、嵌入空间编码器网络和减噪网络
  • results: 在多个具有挑战性的数据集上提高视觉质量,并将比前一个状态艺术的渲染时间减少为1.6倍,为实时应用提供了有前途的解决方案。
    Abstract Monte-Carlo path tracing is a powerful technique for realistic image synthesis but suffers from high levels of noise at low sample counts, limiting its use in real-time applications. To address this, we propose a framework with end-to-end training of a sampling importance network, a latent space encoder network, and a denoiser network. Our approach uses reinforcement learning to optimize the sampling importance network, thus avoiding explicit numerically approximated gradients. Our method does not aggregate the sampled values per pixel by averaging but keeps all sampled values which are then fed into the latent space encoder. The encoder replaces handcrafted spatiotemporal heuristics by learned representations in a latent space. Finally, a neural denoiser is trained to refine the output image. Our approach increases visual quality on several challenging datasets and reduces rendering times for equal quality by a factor of 1.6x compared to the previous state-of-the-art, making it a promising solution for real-time applications.
    摘要 蒙特卡洛路追踪是一种具有很高真实度的图像生成技术,但是在低样本数下会受到高水平的噪声影响,限制其在实时应用中的使用。为了解决这个问题,我们提出了一个框架,其中包括端到端培生样本重要性网络、嵌入空间编码器网络和净化网络。我们的方法使用了回归学习来优化样本重要性网络,从而避免直接用数值 aproximated 的数学导数。我们的方法不是将每个像素的样本值相加,而是保留所有的样本值,然后将它们传递给嵌入空间编码器。编码器将手工设计的空间时间规则替换为学习的表示在嵌入空间中。最后,我们训练了一个神经净化器来精细化输出图像。我们的方法可以在多个复杂的数据集上提高视觉质量,同时降低等质量的渲染时间,比前一个状态艺术高一点1.6倍,因此是一个有前途的解决方案。
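A minimal sketch of training a sampling-importance network with a policy-gradient (REINFORCE) update, as my own illustration of the idea above rather than the authors' system: the network outputs a categorical distribution over pixels, extra path-tracing samples are "spent" on the sampled pixels, and the reward is the resulting reduction in image error. The stand-in environment simply treats sampled pixels as becoming noise-free.

```python
import torch
import torch.nn as nn

class ImportanceNet(nn.Module):
    def __init__(self, num_pixels):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_pixels, 128), nn.ReLU(),
                                 nn.Linear(128, num_pixels))

    def forward(self, noisy_estimate):            # flattened noisy render
        return torch.distributions.Categorical(logits=self.net(noisy_estimate))

def reinforce_step(model, opt, noisy, reference, budget=64):
    dist = model(noisy)
    pixels = dist.sample((budget,))                        # where to spend samples
    refined = noisy.clone()
    refined[pixels] = reference[pixels]                    # toy "extra samples" effect
    reward = -(refined - reference).abs().mean()           # higher = less error
    loss = -dist.log_prob(pixels).sum() * reward.detach()  # REINFORCE, no explicit gradients
    opt.zero_grad(); loss.backward(); opt.step()
    return reward.item()

if __name__ == "__main__":
    num_pixels = 32 * 32
    model = ImportanceNet(num_pixels)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    reference = torch.rand(num_pixels)
    noisy = reference + 0.3 * torch.randn(num_pixels)
    for _ in range(5):
        r = reinforce_step(model, opt, noisy, reference)
    print("final reward:", r)
```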

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

  • paper_url: http://arxiv.org/abs/2310.03502
  • repo_url: https://github.com/ai-forever/movqgan
  • paper_authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov
  • for: 这篇论文主要是为了提出一种新的扩展了演化架构,用于提高文本生成图像质量。
  • methods: 该模型使用了扩展了演化架构,包括像素级和幂等级的方法,并结合了图像先验模型和latent扩散技术。
  • results: 实验结果显示,该模型在COCO-30K数据集上的FID分数为8.03,与其他开源模型相比,表示该模型在可衡量的图像生成质量方面取得了突出的成绩。
    Abstract Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky1, a novel exploration of latent diffusion architecture, combining the principles of the image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.
    摘要 现代计算机视觉中的文本至图生成是一个重要领域,在生成架构的演化中得到了显著改进。这些模型可以分为两类:像素级和幂等级方法。我们提出了一种新的探索,即将图像先验模型与幂等技术结合,称之为Kandinsky1。这个模型将文本嵌入模型与CLIP的图像嵌入模型进行共同训练。另外,我们修改了MoVQ实现方式,用于图像自编码器组件。总共有3.3亿参数。我们还实现了一个易于使用的示例系统,支持多种生成模式,如文本至图生成、图像融合、文本和图像融合、图像变化生成和文本引导填充/剔除。此外,我们还公布了Kandinsky模型的源代码和检查点。实验评估表明,Kandinsky1模型在COCO-30K dataset上的FID分数为8.03,这标志着我们的模型在可衡量的图像生成质量方面成为开源领先者。
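A minimal sketch of the "image prior" idea described above: a network that maps CLIP text embeddings to CLIP image embeddings, whose output would then condition a latent diffusion decoder. This is an illustration with assumed dimensions, and the simple MLP regression here is only a stand-in; the actual prior in the paper may be considerably more elaborate.

```python
import torch
import torch.nn as nn

class ImagePrior(nn.Module):
    """Maps a text embedding to a predicted image embedding in CLIP space."""
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))

    def forward(self, text_emb):
        return self.net(text_emb)

if __name__ == "__main__":
    # Paired embeddings would come from a frozen CLIP model in practice.
    text_emb = torch.randn(8, 512)
    image_emb = torch.randn(8, 512)
    prior = ImagePrior()
    opt = torch.optim.Adam(prior.parameters(), lr=1e-4)
    for _ in range(3):
        pred = prior(text_emb)
        loss = 1 - nn.functional.cosine_similarity(pred, image_emb).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print("prior loss:", loss.item())
```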

IceCloudNet: Cirrus and mixed-phase cloud prediction from SEVIRI input learned from sparse supervision

  • paper_url: http://arxiv.org/abs/2310.03499
  • repo_url: None
  • paper_authors: Kai Jeggle, Mikolaj Czerkawski, Federico Serva, Bertrand Le Saux, David Neubauer, Ulrike Lohmann
  • for: 这项研究旨在提供地球站点覆盖率和活动卫星 Retrievals 的 ice 微物理性质 regime-dependent 观测约束,以提高气候模型中 ice 云物理过程的理解,从而减少气候变化中的不确定性。
  • methods: 这项研究使用了 convolutional neural network (CNN) 训练方法,使用了三年的 SEVIRI 和 DARDAR 数据集,以获得地球站点覆盖率和活动卫星 Retrievals 的 ice 微物理性质观测约束。
  • results: 这项研究实现了创造一种新的观测约束,可以用于改进气候模型中 ice 云物理过程的理解,从而减少气候变化中的不确定性。
    Abstract Clouds containing ice particles play a crucial role in the climate system. Yet they remain a source of great uncertainty in climate models and future climate projections. In this work, we create a new observational constraint of regime-dependent ice microphysical properties at the spatio-temporal coverage of geostationary satellite instruments and the quality of active satellite retrievals. We achieve this by training a convolutional neural network on three years of SEVIRI and DARDAR data sets. This work will enable novel research to improve ice cloud process understanding and hence, reduce uncertainties in a changing climate and help assess geoengineering methods for cirrus clouds.
    摘要 云含冰粒物理性质在气候系统中扮演着关键性角色。然而,这些云仍然对未来气候预测中存在大量不确定性。在这项工作中,我们创造了一个新的观测约束,即在地球同步卫星 instrumente 上的空间时间覆盖和活动卫星推算的冰微物理性质的Registry-dependent。我们通过训练一个卷积神经网络,使用三年的SEVIRI和DARDAR数据集来实现这一目标。这项工作将帮助改善冰云过程理解,从而减少气候变化中的不确定性,并评估环境工程方法 для cirrus 云。

BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification

  • paper_url: http://arxiv.org/abs/2310.03485
  • repo_url: None
  • paper_authors: Dimitrios Kollias, Karanjot Vendal, Priyanka Gadhavi, Solomon Russom
  • for: 预测脑肿瘤MGMTpromoter的甲基化状态
  • methods: 利用多Modal的MRI扫描数据,包括FLAIR、T1w、T1wCE和T2 3D尺寸,采用BTDNet模型进行预测
  • results: 在RSNA-ASNR-MICCAI BraTS 2021 Challenge中,BTDNet方法舒大margin地超越了现有方法,提供了可能的脑肿瘤诊断和治疗的新途径
    Abstract Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated and not the independent slices that it consists of). BTDNet consists of four components: i) the data augmentation one (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) the 3D analysis one (that performs global analysis through a CNN-RNN); iii) the routing one (that contains a mask layer that handles variable input feature lengths), and iv) the modality fusion one (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms by large margins the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge, offering a promising avenue for enhancing brain tumor diagnosis and treatment.
    摘要 脑肿瘤对全球健康造成重要挑战,其中 glioblastoma 是最攻击性的一种。正确地确定 O6-methylguanine-DNA methyltransferase (MGMT) 基因Promoter的甲基化状态是个人化治疗策略的关键。然而,传统方法具有劳动 INTENSIVE 和时间耗费的缺点。这篇论文提出了一种新的多Modal方法,BTDNet,利用多参量 MRI 扫描结果,包括 FLAIR、T1w、T1wCE 和 T2 3D 尺度,来预测 MGMT Promoter 甲基化状态。BTDNet 解决了两个主要挑战:每个尺度的变量尺度(即每个尺度都有不同的 slice 数)和尺度级别的注释(即整个 3D 尺度被注释,而不是每个独立的 slice)。BTDNet 由四个组成部分组成:1. 数据增强一(通过 геометрические变换、数据对的凸合和测试时数据增强进行数据增强)。2. 3D 分析一(通过 CNN-RNN 进行全球分析)。3. 路由一(包含一个mask层,处理变量输入特征长度)。4. 模式融合一(有效地增强数据表示,减少歧义和减少数据缺乏)。提出的方法在 RSNA-ASNR-MICCAI BraTS 2021 挑战中大幅超越了当前状态的方法,提供了一个有前途的方向,用于提高脑肿瘤诊断和治疗。
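A minimal sketch of one way a mask layer can handle volumes with a variable number of slices, as described in the routing component above. This is an assumed design, not the BTDNet release: padded slice features are pooled with the padding positions masked out of the attention weights.

```python
import torch
import torch.nn as nn

class MaskedSlicePooling(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, slice_feats, mask):
        # slice_feats: (batch, max_slices, dim); mask: (batch, max_slices), 1 = real slice
        scores = self.score(slice_feats).squeeze(-1)
        scores = scores.masked_fill(mask == 0, float("-inf"))   # ignore padded slices
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)
        return (weights * slice_feats).sum(dim=1)               # (batch, dim) volume feature

if __name__ == "__main__":
    batch, max_slices, dim = 2, 6, 128
    feats = torch.randn(batch, max_slices, dim)
    mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                         [1, 1, 1, 1, 1, 1]])                   # volumes with 4 and 6 slices
    pooled = MaskedSlicePooling(dim)(feats, mask)
    print(pooled.shape)                                         # torch.Size([2, 128])
```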

Ammonia-Net: A Multi-task Joint Learning Model for Multi-class Segmentation and Classification in Tooth-marked Tongue Diagnosis

  • paper_url: http://arxiv.org/abs/2310.03472
  • repo_url: None
  • paper_authors: Shunkai Shi, Yuqi Wang, Qihui Ye, Yanran Wang, Yiming Zhu, Muhammad Hassan, Aikaterini Melliou, Dongmei Yu
  • For: This paper aims to address the challenges of manual diagnosis of tooth-marked tongue in traditional Chinese medicine, by proposing a multi-task joint learning model named Ammonia-Net.
  • Methods: The proposed model employs a convolutional neural network-based architecture, specifically designed for multi-class segmentation and classification of tongue images. It performs semantic segmentation of tongue images to identify tongue and tooth marks, and classifies the images into the desired number of classes.
  • Results: The experimental results show that the proposed model achieves 99.06% accuracy in the two-class classification task of tooth-marked tongue identification and 80.02% accuracy in the segmentation task, with mIoU for tongue and tooth marks amounting to 71.65%.
    Abstract In Traditional Chinese Medicine, the tooth marks on the tongue, stemming from prolonged dental pressure, serve as a crucial indicator for assessing qi (yang) deficiency, which is intrinsically linked to visceral health. Manual diagnosis of tooth-marked tongue solely relies on experience. Nonetheless, the diversity in shape, color, and type of tooth marks poses a challenge to diagnostic accuracy and consistency. To address these problems, herein we propose a multi-task joint learning model named Ammonia-Net. This model employs a convolutional neural network-based architecture, specifically designed for multi-class segmentation and classification of tongue images. Ammonia-Net performs semantic segmentation of tongue images to identify tongue and tooth marks. With the assistance of segmentation output, it classifies the images into the desired number of classes: healthy tongue, light tongue, moderate tongue, and severe tongue. As far as we know, this is the first attempt to apply the semantic segmentation results of tooth marks for tooth-marked tongue classification. To train Ammonia-Net, we collect 856 tongue images from 856 subjects. After a number of extensive experiments, the experimental results show that the proposed model achieves 99.06% accuracy in the two-class classification task of tooth-marked tongue identification and 80.02%. As for the segmentation task, mIoU for tongue and tooth marks amounts to 71.65%.
    摘要 在中医中,吃牙印痕的舌头,长期的牙关节压力,作为脉气衰竭(阳衰)的重要指标,舌头的手动诊断完全依赖经验。然而,吃牙印痕的多样性对于诊断精度和一致性带来挑战。为了解决这些问题,我们提出了一个名为Ammonia-Net的多任务集成学习模型。这个模型使用了一个特定设计 для 多 клаス混合分类和 semantic segmentation的舌头图像。Ammonia-Net 进行 semantic segmentation of tongue images,以识别舌头和吃牙印痕。受欢迎的分类结果显示,这是首次将吃牙印痕的 semantic segmentation 结果应用于舌头分类。我们收集了856个舌头图像,并进行了多次广泛的实验。结果显示,我们提出的模型在两种类别分类任务中的舌头印痕识别中取得了99.06%的准确率,并在分类任务中取得了80.02%的准确率。在分 segmentation 任务中,miou 为舌头和吃牙印痕为71.65%。

Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization

  • paper_url: http://arxiv.org/abs/2310.03456
  • repo_url: None
  • paper_authors: Edward Fish, Jon Weinbren, Andrew Gilbert
  • for: 本文旨在提高视频中的动作识别精度,特别是将音频特征集成到视觉特征检测框架中。
  • methods: 本文提出了一种新的多尺度音视频特征融合方法(MRAV-FF),通过层次阻止权重机制,灵活地调整音频信息的重要性。
  • results: 实验表明,MRAV-FF可以提高视频动作识别精度,特别是当音频数据可用时。此外,MRAV-FF可以与现有的FPN TAL架构兼容,提供了一个简单而强大的方法来提高视频动作识别性能。
    Abstract Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Feature Pyramid Networks (FPN) have enhanced visual feature recognition in TAL tasks, less progress has been made in the integration of audio features into such frameworks. This paper introduces the Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), an innovative method to merge audio-visual data across different temporal resolutions. Central to our approach is a hierarchical gated cross-attention mechanism, which discerningly weighs the importance of audio information at diverse temporal scales. Such a technique not only refines the precision of regression boundaries but also bolsters classification confidence. Importantly, MRAV-FF is versatile, making it compatible with existing FPN TAL architectures and offering a significant enhancement in performance when audio data is available.
    摘要 Temporal Action Localization (TAL) 目标是在未处理视频中确定动作的开始、结束和类别标签。 Although recent advances in transformer networks and Feature Pyramid Networks (FPN) have improved visual feature recognition in TAL tasks, there has been less progress in integrating audio features into these frameworks. This paper introduces the Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), a novel method that combines audio-visual data across different temporal resolutions. The key to our approach is a hierarchical gated cross-attention mechanism, which selectively weights the importance of audio information at different temporal scales. This not only refines the precision of regression boundaries but also boosts classification confidence. Importantly, MRAV-FF is versatile and can be compatible with existing FPN TAL architectures, providing a significant improvement in performance when audio data is available.
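A minimal sketch of gated cross-attention fusion of the kind described in the abstract above, written as an illustration rather than the MRAV-FF release: visual features attend to audio features at one temporal resolution, and a learned sigmoid gate controls how much of the attended audio is injected back into the visual stream.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual, audio):
        # visual: (batch, T_v, dim); audio: (batch, T_a, dim)
        attended, _ = self.attn(query=visual, key=audio, value=audio)
        g = self.gate(torch.cat([visual, attended], dim=-1))    # per-position gate
        return visual + g * attended

if __name__ == "__main__":
    fusion = GatedCrossAttentionFusion()
    video = torch.randn(2, 100, 256)
    audio = torch.randn(2, 400, 256)          # finer temporal resolution than video
    fused = fusion(video, audio)
    print(fused.shape)                        # torch.Size([2, 100, 256])
```

In a hierarchical variant, one such block per pyramid level would let the gates weigh audio differently at each temporal scale.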

Mitigating the Influence of Domain Shift in Skin Lesion Classification: A Benchmark Study of Unsupervised Domain Adaptation Methods on Dermoscopic Images

  • paper_url: http://arxiv.org/abs/2310.03432
  • repo_url: None
  • paper_authors: Sireesha Chamarthi, Katharina Fogelberg, Roman C. Maron, Titus J. Brinker, Julia Niebling
  • For: This paper aims to improve the performance of deep neural networks in skin lesion classification by addressing the issue of domain shift, which can negatively impact the models’ accuracy when tested on new data.* Methods: The authors evaluate eight different unsupervised domain adaptation methods to determine their effectiveness in improving generalization for dermoscopic datasets.* Results: The authors find that all eight domain adaptation methods result in improved AUPRC for the majority of analyzed datasets, indicating that unsupervised domain adaptation generally leads to performance improvements for the binary melanoma-nevus classification task. However, small or heavily imbalanced datasets may lead to reduced conformity of the results due to the influence of these factors on the methods’ performance.Here is the same information in Simplified Chinese text:* For: 本研究旨在提高深度神经网络在皮肤病诊断中的表现, Addressing the issue of domain shift,即模型在新数据上的表现下降。* Methods: 作者评估了8种不supervised domain adaptation方法,以确定它们在dermoscopic dataset中的效果。* Results: 作者发现,所有8种domain adaptation方法在大多数分析数据集上都有所改进,表明不supervised domain adaptation通常能够提高binary melanoma-nevus分类任务的表现。但是,小型或受束缚数据集可能会导致结果的不一致,因为这些因素对方法的表现产生影响。
    Abstract The potential of deep neural networks in skin lesion classification has already been demonstrated to be on-par if not superior to the dermatologists diagnosis. However, the performance of these models usually deteriorates when the test data differs significantly from the training data (i.e. domain shift). This concerning limitation for models intended to be used in real-world skin lesion classification tasks poses a risk to patients. For example, different image acquisition systems or previously unseen anatomical sites on the patient can suffice to cause such domain shifts. Mitigating the negative effect of such shifts is therefore crucial, but developing effective methods to address domain shift has proven to be challenging. In this study, we carry out an in-depth analysis of eight different unsupervised domain adaptation methods to analyze their effectiveness in improving generalization for dermoscopic datasets. To ensure robustness of our findings, we test each method on a total of ten distinct datasets, thereby covering a variety of possible domain shifts. In addition, we investigated which factors in the domain shifted datasets have an impact on the effectiveness of domain adaptation methods. Our findings show that all of the eight domain adaptation methods result in improved AUPRC for the majority of analyzed datasets. Altogether, these results indicate that unsupervised domain adaptations generally lead to performance improvements for the binary melanoma-nevus classification task regardless of the nature of the domain shift. However, small or heavily imbalanced datasets lead to a reduced conformity of the results due to the influence of these factors on the methods performance.
    摘要 深度神经网络在皮肤病变分类中的潜力已经被证明与专业 dermatologist 诊断相当或更高。然而,这些模型在测试数据与训练数据之间的域转换(domain shift)时通常会表现出现下降。这种问题在实际应用中对患者造成风险,例如不同的图像获取系统或患者的前所未见的解剖位置可能会导致域转换。因此,解决域转换的负面影响是关键,但是开发有效的方法来解决域转换问题已经证明是困难的。在这项研究中,我们进行了八种不同的无监督领域适应方法的深入分析,以分析它们在DERMOSCOPIC dataset上的效果。为确保我们的发现的可靠性,我们将每种方法测试在总共十个不同的dataset上,以覆盖多种可能的域转换情况。此外,我们还研究了域转换 dataset 中各因素对领域适应方法的影响。我们的发现显示,所有八种领域适应方法都导致了大多数分析dataset中的AUPRC的提高。总之,这些结果表明无监督领域适应方法在皮肤病变分类任务中广泛地实现性提高,不管域转换的性质如何。然而,小型或受折制的dataset会导致结果的一致性受到这些因素的影响。

Robust Zero Level-Set Extraction from Unsigned Distance Fields Based on Double Covering

  • paper_url: http://arxiv.org/abs/2310.03431
  • repo_url: https://github.com/jjjkkyz/dcudf
  • paper_authors: Fei Hou, Xuhui Chen, Wencheng Wang, Hong Qin, Ying He
  • for: 本研究提出了一种新的方法,叫做DoubleCoverUDF,用于从无符号距离场(UDF)中提取零水平面。
  • methods: DoubleCoverUDF 方法使用一个学习的 UDF 和用户指定的参数 $r$(一个小正数)作为输入,使用 conventional marching cubes 算法计算一个iso-surface,其中iso-value 为 $r$。
  • results: 计算得到的iso-surface 是 $r$-偏移体积 $S$ 的边界,其中 $S$ 是一个 orientable manifold,无论 $S$ 的topology如何。然后,算法计算一个覆盖图来投影边界网格onto $S$,保持网格的topology和避免折叠。如果 $S$ 是 orientable manifold 表面,我们的算法将double-layered mesh 分解成一个单层 mesh,否则保持 double-layered mesh 作为输出。我们对 open models 进行了3D 面的重建,并在synthetic models 和benchmark datasets上进行了实验,结果表明我们的方法比现有的 UDF-based 方法更加稳定和生成高质量的 mesh。
    Abstract In this paper, we propose a new method, called DoubleCoverUDF, for extracting the zero level-set from unsigned distance fields (UDFs). DoubleCoverUDF takes a learned UDF and a user-specified parameter $r$ (a small positive real number) as input and extracts an iso-surface with an iso-value $r$ using the conventional marching cubes algorithm. We show that the computed iso-surface is the boundary of the $r$-offset volume of the target zero level-set $S$, which is an orientable manifold, regardless of the topology of $S$. Next, the algorithm computes a covering map to project the boundary mesh onto $S$, preserving the mesh's topology and avoiding folding. If $S$ is an orientable manifold surface, our algorithm separates the double-layered mesh into a single layer using a robust minimum-cut post-processing step. Otherwise, it keeps the double-layered mesh as the output. We validate our algorithm by reconstructing 3D surfaces of open models and demonstrate its efficacy and effectiveness on synthetic models and benchmark datasets. Our experimental results confirm that our method is robust and produces meshes with better quality in terms of both visual evaluation and quantitative measures than existing UDF-based methods. The source code is available at https://github.com/jjjkkyz/DCUDF.
    摘要 在本文中,我们提出了一种新的方法,即DoubleCoverUDF,用于从无符号距离场(UDF)中提取零水平面。DoubleCoverUDF接受一个学习过的UDF和用户指定的参数$r$(一个小正数)作为输入,使用传统的推进立方体算法计算一个iso-面,其iso-值为$r$。我们证明计算得到的iso-面是$r$-偏移体积的目标零水平面的边界,这是一个orientable manifold,无论$S$的topology如何。接下来,算法计算一个覆盖函数,将边界网格映射到$S$上,保持网格的topology和避免折叠。如果$S$是orientable manifold表面,我们的算法将double-layered网格分解为单层网格,使用一种robust minimum-cut后处理步骤。否则,它将double-layered网格作为输出。我们验证了我们的算法,通过重建开放模型的3D表面,并在 sintetic模型和标准数据集上进行了实验。我们的实验结果表明,我们的方法可以快速、稳定、高质量地从UDF中提取零水平面,并且在视觉评价和量化度量上比existings UDF-based方法更好。源代码可以在https://github.com/jjjkkyz/DCUDF上下载。
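A minimal sketch of the first stage described above: running marching cubes at a small iso-value r on an unsigned distance field yields the boundary of the r-offset volume around the zero level-set. This is not the released DCUDF code; the UDF here is an analytic stand-in for a learned network, and the subsequent covering-map projection and min-cut steps are omitted.

```python
import numpy as np
from skimage import measure

def unsigned_distance_field(resolution=64):
    """Analytic UDF of a sphere of radius 0.5 sampled on a regular grid."""
    lin = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(lin, lin, lin, indexing="ij")
    return np.abs(np.sqrt(x**2 + y**2 + z**2) - 0.5)

def extract_offset_surface(udf, r=0.03):
    # Marching cubes at iso-value r gives a watertight, double-layered surface
    # enclosing the zero level-set, regardless of its orientability.
    verts, faces, normals, _ = measure.marching_cubes(udf, level=r)
    return verts, faces, normals

if __name__ == "__main__":
    udf = unsigned_distance_field()
    verts, faces, _ = extract_offset_surface(udf, r=0.03)
    print(f"{len(verts)} vertices, {len(faces)} faces on the r-offset boundary")
```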

FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators

  • paper_url: http://arxiv.org/abs/2310.03420
  • repo_url: https://github.com/WHU-USI3DV/FreeReg
  • paper_authors: Haiping Wang, Yuan Liu, Bing Wang, Yujing Sun, Zhen Dong, Wenping Wang, Bisheng Yang
  • for: 图像和点云之间的匹配问题的基础问题是图像-点云注册。但由于图像和点云的模式差,使得现有的度量学习方法难以学习稳定和特异的跨模态特征。
  • methods: 我们提议先使用大规模预训练模型将图像和点云的模式统一,然后在同一模式内建立稳定的对应关系。我们发现diffusion特征在深度图生成器中提取的特征在图像和点云之间具有 semantics的一致性,因此可以建立粗略而 Robust的跨模态对应关系。
  • results: 我们进一步提取了depth图生成器中的geometry特征,并将其与diffusion特征进行匹配。这有效地提高了粗略对应关系的准确性。我们在三个公共的indoor和outdoor标准测试集上进行了广泛的实验,并显示了没有任务特别训练的情况下,直接使用这两种特征可以实现高精度的图像-点云注册。
    Abstract Matching cross-modality features between images and point clouds is a fundamental problem for image-to-point cloud registration. However, due to the modality difference between images and points, it is difficult to learn robust and discriminative cross-modality features by existing metric learning methods for feature matching. Instead of applying metric learning on cross-modality data, we propose to unify the modality between images and point clouds by pretrained large-scale models first, and then establish robust correspondence within the same modality. We show that the intermediate features, called diffusion features, extracted by depth-to-image diffusion models are semantically consistent between images and point clouds, which enables the building of coarse but robust cross-modality correspondences. We further extract geometric features on depth maps produced by the monocular depth estimator. By matching such geometric features, we significantly improve the accuracy of the coarse correspondences produced by diffusion features. Extensive experiments demonstrate that without any task-specific training, direct utilization of both features produces accurate image-to-point cloud registration. On three public indoor and outdoor benchmarks, the proposed method averagely achieves a 20.6 percent improvement in Inlier Ratio, a three-fold higher Inlier Number, and a 48.6 percent improvement in Registration Recall than existing state-of-the-arts.
    摘要 基于图像和点云的图像-点云匹配是图像处理领域的基本问题。然而,由于图像和点云的模式差异,使用现有的度量学习方法来学习强健和特异的跨模态特征是困难的。而不是将度量学习应用于跨模态数据上,我们提议先使用大规模预训练模型将图像和点云的模式统一,然后在同一模式内建立强健的对应关系。我们发现Diffusion特征,由深度图像扩散模型提取的中间特征,在图像和点云之间具有相似的含义,这使得可以建立粗略 yet 强健的跨模态对应关系。此外,我们还提取了depth图像上的几何特征。通过匹配这些几何特征,我们可以大幅提高粗略对应关系的准确性。我们的方法不需要任务特有的训练,直接使用这两种特征可以实现高精度的图像-点云匹配。在三个公共的室内和户外标准benchmark上,我们的方法平均提高了20.6%的准确率、三倍的准确数和48.6%的注册回溯率,与现有状态的方法相比。
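A minimal sketch of building coarse correspondences by mutual nearest-neighbor matching between per-pixel and per-point features that live in the same (unified) feature space, as an illustration of the matching step above rather than the FreeReg code.

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_nearest_matches(img_feats, pcd_feats):
    # img_feats: (N_pixels, D); pcd_feats: (N_points, D), both in a shared space
    _, img_to_pcd = cKDTree(pcd_feats).query(img_feats)    # nearest point per pixel
    _, pcd_to_img = cKDTree(img_feats).query(pcd_feats)    # nearest pixel per point
    matches = [(i, j) for i, j in enumerate(img_to_pcd) if pcd_to_img[j] == i]
    return np.array(matches)                               # rows: (pixel_idx, point_idx)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(300, 32))
    img_feats = shared + 0.01 * rng.normal(size=shared.shape)
    pcd_feats = shared + 0.01 * rng.normal(size=shared.shape)
    print("mutual matches:", len(mutual_nearest_matches(img_feats, pcd_feats)))
```

The resulting coarse matches would then be refined with the geometric features from the monocular depth maps before solving for the pose.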

A Complementary Global and Local Knowledge Network for Ultrasound denoising with Fine-grained Refinement

  • paper_url: http://arxiv.org/abs/2310.03402
  • repo_url: None
  • paper_authors: Zhenyu Bu, Kai-Ni Wang, Fuxing Zhao, Shengxiao Li, Guang-Quan Zhou
  • for: 提高ultrasound imaging的图像质量,以便进行后续的分类和识别任务。
  • methods: 使用global和local知识网络,并 integration fine-grained refinement block,以提高图像的细节表示。
  • results: 在HC18和BUSI两个公共数据集上进行验证,实验结果表明,该模型可以在量化指标和视觉性能上达到竞争力水平。
    Abstract Ultrasound imaging serves as an effective and non-invasive diagnostic tool commonly employed in clinical examinations. However, the presence of speckle noise in ultrasound images invariably degrades image quality, impeding the performance of subsequent tasks, such as segmentation and classification. Existing methods for speckle noise reduction frequently induce excessive image smoothing or fail to preserve detailed information adequately. In this paper, we propose a complementary global and local knowledge network for ultrasound denoising with fine-grained refinement. Initially, the proposed architecture employs the L-CSwinTransformer as encoder to capture global information, incorporating CNN as decoder to fuse local features. We expand the resolution of the feature at different stages to extract more global information compared to the original CSwinTransformer. Subsequently, we integrate Fine-grained Refinement Block (FRB) within the skip-connection stage to further augment features. We validate our model on two public datasets, HC18 and BUSI. Experimental results demonstrate that our model can achieve competitive performance in both quantitative metrics and visual performance. Our code will be available at https://github.com/AAlkaid/USDenoising.
    摘要 超声影像成为诊断工具的有效和非侵入性方法,广泛应用于临床检查。然而,超声影像中的斑点噪声常常降低影像质量,影响后续任务,如分割和分类。现有的噪声减少方法 frequently会导致过度的图像平滑或失去细节信息。在这篇论文中,我们提议一种 complementary 全球和本地知识网络 для超声杂噪减少,并在不同阶段扩大特征的分辨率,以获取更多的全球信息。然后,我们在跳过阶段内 интеGRATE Fine-grained Refinement Block (FRB),以进一步增强特征。我们在 HC18 和 BUSI 两个公共数据集上验证我们的模型,实验结果表明我们的模型可以在量化指标和视觉性能方面达到竞争性表现。我们的代码将在 GitHub 上发布,链接为

Learning to Simplify Spatial-Temporal Graphs in Gait Analysis

  • paper_url: http://arxiv.org/abs/2310.03396
  • repo_url: None
  • paper_authors: Adrian Cosma, Emilian Radoi
  • for: 这篇论文的目的是提高走势识别中的解释性和任务特定适应性,以提高走势识别的效率和可靠性。
  • methods: 这篇论文提出了一种新的方法,即使用两个模型(上游和下游模型)调整每个走势实例的边度矩阵,以删除固定的 graphs。这使得模型可以trainable end-to-end,并且可以根据特定的数据集和任务进行自动调整。
  • results: 研究人员使用了CASIA-B数据集进行实验,结果显示了our方法可以提高解释性和任务特定适应性,并且与固定 graphs相比,our方法的结果有着不同的解释性。
    Abstract Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
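A minimal sketch of learning a per-instance adjacency matrix with the Straight-Through Gumbel-Softmax, as my own illustration of the mechanism described above rather than the paper's code: an upstream module scores every joint pair and samples a keep/drop decision per edge while remaining differentiable end-to-end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacencyLearner(nn.Module):
    def __init__(self, num_joints=17, feat_dim=32):
        super().__init__()
        self.edge_scorer = nn.Linear(2 * feat_dim, 2)    # logits for (drop, keep)

    def forward(self, joint_feats, tau=1.0):
        # joint_feats: (num_joints, feat_dim) summary features for one walking instance
        n = joint_feats.size(0)
        pairs = torch.cat([joint_feats.unsqueeze(1).expand(n, n, -1),
                           joint_feats.unsqueeze(0).expand(n, n, -1)], dim=-1)
        logits = self.edge_scorer(pairs)                                # (n, n, 2)
        edges = F.gumbel_softmax(logits, tau=tau, hard=True)[..., 1]    # straight-through
        adjacency = torch.maximum(edges, edges.T)                       # keep graph symmetric
        return adjacency

if __name__ == "__main__":
    learner = AdjacencyLearner()
    feats = torch.randn(17, 32)
    adj = learner(feats)
    print(adj.shape, "edges kept:", int(adj.sum().item()))
```

The resulting adjacency would replace the fixed anatomical graph fed to the downstream spatial-temporal GCN.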

OpenPatch: a 3D patchwork for Out-Of-Distribution detection

  • paper_url: http://arxiv.org/abs/2310.03388
  • repo_url: None
  • paper_authors: Paolo Rabino, Antonio Alliegro, Francesco Cappio Borlino, Tatiana Tommasi
  • for: 本研究旨在准备深度学习模型在实际世界中进行部署,以处理不可预期的情况。在某些应用中,新型的出现会带来重要的威胁,因此需要有效地探测它们。这种技能应该可以在需要时使用,而不需要任何额外的计算训练努力。
  • methods: 本研究使用了OpenPatch方法,基于大量预训练模型,简单地从其中提取了一组patch表示,用于描述每个已知类型。对于新样本,我们可以通过评估该样本是否可以主要通过已知类型的patch组成来获得新型性分数。
  • results: 本研究在实际点云样本上进行了semantic novelty检测任务,并在全known样本和几个known样本情况下进行了广泛的实验评估。结果表明,OpenPatch在不同的预训练目标和网络背bone下都表现出了强大的稳定性,并且可以在不需要重新训练的情况下应用于各种实际任务。
    Abstract Moving deep learning models from the laboratory setting to the open world entails preparing them to handle unforeseen conditions. In several applications the occurrence of novel classes during deployment poses a significant threat, thus it is crucial to effectively detect them. Ideally, this skill should be used when needed without requiring any further computational training effort at every new task. Out-of-distribution detection has attracted significant attention in the last years, however the majority of the studies deal with 2D images ignoring the inherent 3D nature of the real-world and often confusing between domain and semantic novelty. In this work, we focus on the latter, considering the objects geometric structure captured by 3D point clouds regardless of the specific domain. We advance the field by introducing OpenPatch that builds on a large pre-trained model and simply extracts from its intermediate features a set of patch representations that describe each known class. For any new sample, we obtain a novelty score by evaluating whether it can be recomposed mainly by patches of a single known class or rather via the contribution of multiple classes. We present an extensive experimental evaluation of our approach for the task of semantic novelty detection on real-world point cloud samples when the reference known data are synthetic. We demonstrate that OpenPatch excels in both the full and few-shot known sample scenarios, showcasing its robustness across varying pre-training objectives and network backbones. The inherent training-free nature of our method allows for its immediate application to a wide array of real-world tasks, offering a compelling advantage over approaches that need expensive retraining efforts.
    摘要 将深度学习模型从室内设置到开放世界需要准备其处理不可预测的条件。在多个应用程序中,新类的出现会带来重大的威胁,因此可以快速、无需进一步的计算训练努力,检测这些新类。我们在这里关注对象的三维结构,不管具体的领域。我们提出了开放patch(OpenPatch),基于大量预训练模型,简单地从其中提取一些patch表示,用于描述每个已知类。对于任何新样本,我们可以计算一个新类准确性分数,根据是否可以通过已知类中的patch组成。我们对实际世界点云样本进行了广泛的实验评估,证明了我们的方法在全示例和几示例已知样本enario中均表现出色,并且在不同的预训练目标和网络背景下保持了稳定性。由于我们的方法不需要重新训练,可以立即应用于各种实际世界任务,提供了优势。
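A minimal sketch of a patch-based novelty score in the spirit of the abstract above. The exact scoring rule here is an assumption, not the OpenPatch implementation: each known class is represented by a bank of patch features, a test sample's patches are assigned to their nearest class, and the score reflects how distant and how mixed that assignment is.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_patch_banks(patches_per_class):
    """patches_per_class: dict class_id -> (n_patches, dim) array of patch features."""
    return {c: cKDTree(p) for c, p in patches_per_class.items()}

def novelty_score(sample_patches, banks):
    distances = {c: tree.query(sample_patches)[0] for c, tree in banks.items()}
    per_patch = np.stack(list(distances.values()), axis=0)    # (n_classes, n_patches)
    nearest_class = per_patch.argmin(axis=0)
    # Known samples are explained mostly by a single class; novel ones need
    # patches from several classes (or far-away patches) to be recomposed.
    purity = np.bincount(nearest_class).max() / len(nearest_class)
    mean_dist = per_patch.min(axis=0).mean()
    return mean_dist * (2.0 - purity)          # higher = more likely novel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    banks = build_patch_banks({c: rng.normal(c, 0.1, size=(200, 16)) for c in range(3)})
    known = rng.normal(1, 0.1, size=(50, 16))          # resembles class 1
    novel = rng.normal(5.0, 0.1, size=(50, 16))        # unlike any known class
    print("known:", novelty_score(known, banks), "novel:", novelty_score(novel, banks))
```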

ACT-Net: Anchor-context Action Detection in Surgery Videos

  • paper_url: http://arxiv.org/abs/2310.03377
  • repo_url: None
  • paper_authors: Luoying Hao, Yan Hu, Wenjun Lin, Qun Wang, Heng Li, Huazhu Fu, Jinming Duan, Jiang Liu
  • for: 这篇论文的目的是精确地检测运行整个手术过程中的细部动作,以提高Context-aware决策支持系统的精度。
  • methods: 这篇论文提出了一个名为ACTNet的检测网络,包括一个 anchor-context检测(ACD)模组和一个分类传播激活(CCD)模组,以回答以下问题:1)动作发生的地方是哪里?2)动作是什么样的?3)预测结果的可信度是什么样的?特别是,ACD模组在运行影片中找到和点选出执行动作的区域,并且根据这些区域的对话来预测动作的位置和分布。CCD模组则使用一个减震传播激活模型,以确定动作的预测结果。
  • results: 这篇论文的结果显示,这个方法可以实现高精度的动作检测,并且可以提供模型的可信度。与基eline相比,这个方法的MAP值提高了4.0%。
    Abstract Recognition and localization of surgical detailed actions is an essential component of developing a context-aware decision support system. However, most existing detection algorithms fail to provide high-accuracy action classes even having their locations, as they do not consider the surgery procedure's regularity in the whole video. This limitation hinders their application. Moreover, implementing the predictions in clinical applications seriously needs to convey model confidence to earn entrustment, which is unexplored in surgical action prediction. In this paper, to accurately detect fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACTNet), including an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what actions are; 3) how confidence predictions are. Specifically, the proposed ACD module spatially and temporally highlights the regions interacting with the extracted anchor in surgery video, which outputs action location and its class distribution based on anchor-context interactions. Considering the full distribution of action classes in videos, the CCD module adopts a denoising diffusion-based generative model conditioned on our ACD estimator to further reconstruct accurately the action predictions. Moreover, we utilize the stochastic nature of the diffusion model outputs to access model confidence for each prediction. Our method reports the state-of-the-art performance, with improvements of 4.0% mAP against baseline on the surgical video dataset.
    摘要 Recognition and localization of surgical detailed actions is an essential component of developing a context-aware decision support system. However, most existing detection algorithms fail to provide high-accuracy action classes even having their locations, as they do not consider the surgery procedure's regularity in the whole video. This limitation hinders their application. Moreover, implementing the predictions in clinical applications seriously needs to convey model confidence to earn entrustment, which is unexplored in surgical action prediction. In this paper, to accurately detect fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACTNet), including an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what actions are; 3) how confidence predictions are. Specifically, the proposed ACD module spatially and temporally highlights the regions interacting with the extracted anchor in surgery video, which outputs action location and its class distribution based on anchor-context interactions. Considering the full distribution of action classes in videos, the CCD module adopts a denoising diffusion-based generative model conditioned on our ACD estimator to further reconstruct accurately the action predictions. Moreover, we utilize the stochastic nature of the diffusion model outputs to access model confidence for each prediction. Our method reports the state-of-the-art performance, with improvements of 4.0% mAP against baseline on the surgical video dataset.Here's a word-for-word translation of the text into Simplified Chinese:Recognition和localization of surgical detailed actions是developing context-aware decision support system的关键组成部分。然而,大多数现有的探测算法无法提供高精度的动作类别,即使有其位置信息,因为它们不考虑手术程序整体视频中的规则性。这种限制约束了其应用。此外,在临床应用中,实现预测结果的应用需要传递模型confidence来赢得信任,这在手术动作预测中未经explored。在这篇论文中,我们提议一种名为 anchor-context action detection network (ACTNet),包括一个 anchor-context detection (ACD)模块和一个类别 conditioned diffusion (CCD)模块,以回答以下问题:1) where the actions happen; 2) what actions are; 3) how confidence predictions are。具体来说,我们的 ACD模块在手术视频中将抽取的 anchor 与其相互作用的区域进行空间和时间高亮显示,输出动作的位置和类别分布。考虑整个视频中动作类别的全部分布,我们的 CCD模块采用一种denoising diffusion-based generative model,根据我们的 ACD 估计器来进一步重建准确的动作预测。此外,我们利用 diffusion model 输出的随机性来访问模型信任度。我们的方法在手术视频数据集上报告了状态艺术性的表现,与基准相比提高了4.0% mAP。

Point-Based Radiance Fields for Controllable Human Motion Synthesis

  • paper_url: http://arxiv.org/abs/2310.03375
  • repo_url: https://github.com/dehezhang2/point_based_nerf_editing
  • paper_authors: Haitao Yu, Deheng Zhang, Peiyuan Xie, Tianyi Zhang
  • for: 本 paper 提出了一种新的可控人体动作合成方法,用于细粒度变形。以前的编辑神经透镜场方法可以生成出吸引人的结果,但它们几乎无法实现复杂的3D人体编辑,如前向骨骼动作。我们的方法利用了明确的点云来训练静态3D场景,并通过编码点云转移来应用变形。
  • methods: 我们的方法使用了静态点云来训练静态3D场景,并通过编码点云转移来应用变形。我们还使用了SVD来估计本地旋转,并通过插值来将每个点的旋转转化到查询视图方向上。
  • results: 我们的方法可以对细粒度变形进行高效的控制,并且可以泛化到其他3D角色 besides humans。我们的实验结果表明,我们的方法可以与当前状态静态点云场景的最佳实现竞争。
    Abstract This paper proposes a novel controllable human motion synthesis method for fine-level deformation based on static point-based radiance fields. Although previous editable neural radiance field methods can generate impressive results on novel-view synthesis and allow naive deformation, few algorithms can achieve complex 3D human editing such as forward kinematics. Our method exploits the explicit point cloud to train the static 3D scene and apply the deformation by encoding the point cloud translation using a deformation MLP. To make sure the rendering result is consistent with the canonical space training, we estimate the local rotation using SVD and interpolate the per-point rotation to the query view direction of the pre-trained radiance field. Extensive experiments show that our approach can significantly outperform the state-of-the-art on fine-level complex deformation which can be generalized to other 3D characters besides humans.
    摘要 这篇论文提出了一种新的可控人体动作合成方法,基于静止点云基于辐射场。 although previous neural radiance field方法可以生成印象深刻的结果,但它们很难实现复杂的3D人体编辑,如前向运动。 our method使用点云进行Explicit 3D场景训练,并通过编码点云平移来应用塑形。 to ensure the rendering result is consistent with the canonical space training, we estimate the local rotation using SVD and interpolate the per-point rotation to the query view direction of the pre-trained radiance field. extensive experiments show that our approach can significantly outperform the state-of-the-art on fine-level complex deformation, which can be generalized to other 3D characters besides humans.

Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior

  • paper_url: http://arxiv.org/abs/2310.03363
  • repo_url: None
  • paper_authors: Jinting Wang, Li Liu, Jun Wang, Hei Victor Cheng
  • for: 这个论文的目的是提出一种新的语音到脸图生成框架,以解决现有的语音到脸图生成方法中存在的不稳定性和无法生成真实的脸图问题。
  • methods: 该框架基于一种新型的噪声扩散模型(SCLDM),并采用对比预训练来保持语音和脸图之间的共同特征信息。此外,我们还提出了一种新的增强方法,将统计学的人脸先验引入扩散过程,以消除不同人脸之间的共同成分,并增强语音条件所捕捉的细微变化。
  • results: 我们的方法可以生成更加真实的脸图,同时保持说话人的身份特征。经验表明,我们的方法在AVSpeech和Voxceleb两个 dataset上具有显著的改进,特别是在cosine distance metric上的提高。例如,在AVSpeech dataset上,我们的方法提高了32.17和32.72的cosine distance metric,对比之前的最佳方法,提高了23.53%和25.37%。
    Abstract Speech-to-face generation is an intriguing area of research that focuses on generating realistic facial images based on a speaker's audio speech. However, state-of-the-art methods employing GAN-based architectures lack stability and cannot generate realistic face images. To fill this gap, we propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM. To the best of our knowledge, this is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation. Preserving the shared identity information between speech and face is crucial in generating realistic results. Therefore, we employ contrastive pre-training for both the speech encoder and the face encoder. This pre-training strategy facilitates effective alignment between the attributes of speech, such as age and gender, and the corresponding facial characteristics in the face images. Furthermore, we tackle the challenge posed by excessive diversity in the synthesis process caused by the diffusion model. To overcome this challenge, we introduce the concept of residuals by integrating a statistical face prior to the diffusion process. This addition helps to eliminate the shared component across the faces and enhances the subtle variations captured by the speech condition. Extensive quantitative, qualitative, and user study experiments demonstrate that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods. Highlighting the notable enhancements, our method demonstrates significant gains in all metrics on the AVSpeech dataset and Voxceleb dataset, particularly noteworthy are the improvements of 32.17 and 32.72 on the cosine distance metric for the two datasets, respectively.
    摘要 《speech-to-face》是一个吸引人的研究领域,它旨在基于说话人的音频speech生成真实的脸部图像。然而,当前的方法使用GAN结构,缺乏稳定性,无法生成真实的脸部图像。为了填补这一漏洞,我们提出了一种新的speech-to-face生成框架,它利用了一种叫做Speech-Conditioned Latent Diffusion Model(SCLDM)。据我们所知,这是首次利用扩散模型来进行speech-to-face生成。在生成真实结果的同时,保持说话人的身份信息与脸部图像之间的共同性是关键。因此,我们使用了对比预训练,使得说话人的年龄和性别特征与对应的脸部特征进行有效的对应。此外,我们解决了由扩散模型引起的生成过程中的过度多样性挑战。我们在扩散过程中添加了一个统计学面壳,以消除共同的部分在脸部图像中,并使得扩散过程中的微妙变化更加明显。经验证明,我们的方法可以生成更真实的脸部图像,同时保持说话人的身份。特别是,在AVSpeech和Voxceleb两个 dataset上,我们的方法表现出了显著的提升,cosine distance指标上的提升分别为32.17和32.72。

CSI: Enhancing the Robustness of 3D Point Cloud Recognition against Corruption

  • paper_url: http://arxiv.org/abs/2310.03360
  • repo_url: https://github.com/masterwu2115/csi
  • paper_authors: Zhuoyuan Wu, Jiachen Sun, Chaowei Xiao
  • for: 提高点云识别 robustness 对于实际世界中的数据损害
  • methods: 利用点云数据的自然集 свой性,提出一种新的极限子集标识(CSI)方法,包括具有强度感知抽样(DAS)和自身熵减少(SEM)两部分
  • results: 与比较方法相比,CSI方法在ModelNet40-C和PointCloud-C上实现了18.4%和16.3%的错误率,比前者提高5.2%和4.2%,代表了 Notable improvement in point cloud recognition robustness against data corruption.
    Abstract Despite recent advancements in deep neural networks for point cloud recognition, real-world safety-critical applications present challenges due to unavoidable data corruption. Current models often fall short in generalizing to unforeseen distribution shifts. In this study, we harness the inherent set property of point cloud data to introduce a novel critical subset identification (CSI) method, aiming to bolster recognition robustness in the face of data corruption. Our CSI framework integrates two pivotal components: density-aware sampling (DAS) and self-entropy minimization (SEM), which cater to static and dynamic CSI, respectively. DAS ensures efficient robust anchor point sampling by factoring in local density, while SEM is employed during training to accentuate the most salient point-to-point attention. Evaluations reveal that our CSI approach yields error rates of 18.4\% and 16.3\% on ModelNet40-C and PointCloud-C, respectively, marking a notable improvement over state-of-the-art methods by margins of 5.2\% and 4.2\% on the respective benchmarks. Code is available at \href{https://github.com/masterwu2115/CSI/tree/main}{https://github.com/masterwu2115/CSI/tree/main}
    摘要 尽管最近的深度神经网络在点云识别方面做出了重要进步,但在实际应用中仍然面临数据损害的挑战。现有模型经常无法适应不可预测的分布转移。本研究利用点云数据的自然集属性,提出一种新的极值子集标识(CSI)方法,以增强识别的可靠性。我们的CSI框架包括两个重要组成部分:具有地方权重的采样(DAS)和自 entropy 最小化(SEM),它们分别适应静态和动态CSI。DAS 确保高效地采样稳定的均勋点,而 SEM 在训练中被使用,以强调最重要的点对点关注。我们的CSI方法在 ModelNet40-C 和 PointCloud-C 上获得了18.4%和16.3%的错误率,与当前最佳方法相比,占了5.2%和4.2%的较大优势。代码可以在 \href{https://github.com/masterwu2115/CSI/tree/main}{https://github.com/masterwu2115/CSI/tree/main} 上获取。
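
To make the density-aware sampling idea above concrete, here is a minimal NumPy sketch of picking anchor points from a point cloud in proportion to local density. The CSI paper does not spell out the exact weighting, so estimating density as the inverse mean k-NN distance and sampling anchors proportionally to it is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def density_aware_sample(points, n_anchors=64, k=16, rng=None):
    """points: (N, 3) array; returns indices of the sampled anchor points."""
    rng = rng or np.random.default_rng(0)
    diff = points[:, None, :] - points[None, :, :]          # pairwise differences
    dist = np.sqrt((diff ** 2).sum(-1))                     # (N, N) distances
    knn = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)    # mean distance to k nearest neighbours
    density = 1.0 / (knn + 1e-8)                            # denser regions -> larger weight (assumption)
    prob = density / density.sum()
    return rng.choice(len(points), size=n_anchors, replace=False, p=prob)

cloud = np.random.default_rng(1).normal(size=(1024, 3))     # toy point cloud
print(density_aware_sample(cloud).shape)                    # (64,)
```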

Combining Datasets with Different Label Sets for Improved Nucleus Segmentation and Classification

  • paper_url: http://arxiv.org/abs/2310.03346
  • repo_url: None
  • paper_authors: Amruta Parulekar, Utkarsh Kanwat, Ravi Kant Gupta, Medha Chippa, Thomas Jacob, Tripti Bameta, Swapnil Rane, Amit Sethi
  • for: automatic cell counting and morphometric assessments in histopathology images
  • methods: deep neural networks (DNNs) with a coarse-to-fine class hierarchy
  • results: improved segmentation and classification metrics on test splits, as well as generalization to previously unseen datasets
    Abstract Segmentation and classification of cell nuclei in histopathology images using deep neural networks (DNNs) can save pathologists' time for diagnosing various diseases, including cancers, by automating cell counting and morphometric assessments. It is now well-known that the accuracy of DNNs increases with the sizes of annotated datasets available for training. Although multiple datasets of histopathology images with nuclear annotations and class labels have been made publicly available, the set of class labels differ across these datasets. We propose a method to train DNNs for instance segmentation and classification on multiple datasets where the set of classes across the datasets are related but not the same. Specifically, our method is designed to utilize a coarse-to-fine class hierarchy, where the set of classes labeled and annotated in a dataset can be at any level of the hierarchy, as long as the classes are mutually exclusive. Within a dataset, the set of classes need not even be at the same level of the class hierarchy tree. Our results demonstrate that segmentation and classification metrics for the class set used by the test split of a dataset can improve by pre-training on another dataset that may even have a different set of classes due to the expansion of the training set enabled by our method. Furthermore, generalization to previously unseen datasets also improves by combining multiple other datasets with different sets of classes for training. The improvement is both qualitative and quantitative. The proposed method can be adapted for various loss functions, DNN architectures, and application domains.
    摘要 深度神经网络(DNNs)可以自动完成 Histopathology 图像中细胞核心的分割和分类,从而为诊断多种疾病,包括癌症,节省病理医生的时间。现在已经证明,DNNs 的准确性与用于训练的数据集的大小成正相关。虽然多个历史病理图像数据集已经公开发布,但这些数据集中的类别标签不同。我们提出了一种方法,可以在不同数据集上训练 DNNs 进行实例分割和分类,其中数据集中的类别标签可以是归一化的树结构中的任何一级,只要这些类别是互斥的。在一个数据集中,类别标签不必是同一级别的树结构中的。我们的结果表明,在使用另一个数据集进行预训练后,对测试分割的分割和分类指标可以得到改进,并且将多个不同数据集合并训练可以提高总体化和推广性。这种方法可以适应不同的损失函数、DNN 架构和应用领域。
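
As a rough illustration of training across datasets labelled at different levels of a class hierarchy, the sketch below always predicts the finest classes and scores a coarsely labelled sample on the summed probability of that class's children. The two-level hierarchy, the class names, and the loss form are hypothetical; the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical finest-level classes and a coarse level above them.
FINE = ["lymphocyte", "macrophage", "tumor", "stroma"]
COARSE_TO_FINE = {"inflammatory": [0, 1], "tumor": [2], "stroma": [3]}

def hierarchy_nll(fine_logits: torch.Tensor, coarse_label: str) -> torch.Tensor:
    """Negative log-likelihood of a coarse label under the fine-class prediction."""
    probs = F.softmax(fine_logits, dim=-1)
    p_coarse = probs[COARSE_TO_FINE[coarse_label]].sum()   # sum over the children
    return -torch.log(p_coarse + 1e-8)

logits = torch.randn(len(FINE), requires_grad=True)        # stand-in network output
loss = hierarchy_nll(logits, "inflammatory")                # sample labelled only coarsely
loss.backward()
print(loss.item())
```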

Denoising Diffusion Step-aware Models

  • paper_url: http://arxiv.org/abs/2310.03337
  • repo_url: None
  • paper_authors: Shuai Yang, Yukang Chen, Luozhou Wang, Shu Liu, Yingcong Chen
  • for: Improving the computational efficiency of diffusion models so they are practical for a broader range of data generation tasks
  • methods: Uses a spectrum of neural networks whose sizes are adapted to the importance of each generative step, with the assignment determined through evolutionary search
  • results: Experiments on CIFAR-10, CelebA-HQ, LSUN-bedroom, AFHQ, and ImageNet show computational savings of 49%, 61%, 59%, 71%, and 76%, respectively, without sacrificing generation quality
    Abstract Denoising Diffusion Probabilistic Models (DDPMs) have garnered popularity for data generation across various domains. However, a significant bottleneck is the necessity for whole-network computation during every step of the generative process, leading to high computational overheads. This paper presents a novel framework, Denoising Diffusion Step-aware Models (DDSM), to address this challenge. Unlike conventional approaches, DDSM employs a spectrum of neural networks whose sizes are adapted according to the importance of each generative step, as determined through evolutionary search. This step-wise network variation effectively circumvents redundant computational efforts, particularly in less critical steps, thereby enhancing the efficiency of the diffusion model. Furthermore, the step-aware design can be seamlessly integrated with other efficiency-geared diffusion models such as DDIMs and latent diffusion, thus broadening the scope of computational savings. Empirical evaluations demonstrate that DDSM achieves computational savings of 49% for CIFAR-10, 61% for CelebA-HQ, 59% for LSUN-bedroom, 71% for AFHQ, and 76% for ImageNet, all without compromising the generation quality. Our code and models will be publicly available.
    摘要 Diffusion Probabilistic Models (DDPMs) 有很多应用领域的数据生成,但是存在一个主要的瓶颈是每次生成过程中整个网络的计算 overhead,这导致了高效性的问题。这篇文章提出了一种新的框架,即Denosing Diffusion Step-aware Models (DDSM),以解决这个挑战。与传统方法不同,DDSM 使用了一个适应性的spectrum of neural networks,其中每个网络的大小根据生成过程中每个步骤的重要性来确定,通过进化搜索来确定。这种步骤 wise network variation 可以减少 redundant computational efforts,特别是在 less critical steps,从而提高 diffusion model 的效率。此外,步骤 aware 的设计可以与其他高效 diffusion model such as DDIMs 和 latent diffusion 集成,从而拓宽了计算省力的范围。我们的实验证明,DDSM 可以在 CIFAR-10 上实现49%的计算减少,在 CelebA-HQ 上实现61%的计算减少,在 LSUN-bedroom 上实现59%的计算减少,在 AFHQ 上实现71%的计算减少,并在 ImageNet 上实现76%的计算减少,无需牺牲生成质量。我们的代码和模型将公开发布。
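
A minimal sketch of the step-aware idea: cheaper networks handle less critical diffusion steps and a larger one handles the rest. In the paper the step-to-network assignment is found by evolutionary search; the fixed split below, the tiny MLP "denoisers" standing in for U-Nets, and the update rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_denoiser(width: int) -> nn.Module:
    # Toy 2-D "denoiser": input is (x, t); output is a predicted noise vector.
    return nn.Sequential(nn.Linear(2 + 1, width), nn.SiLU(), nn.Linear(width, 2))

small, large = make_denoiser(32), make_denoiser(256)

def pick_network(t: int, total_steps: int = 1000) -> nn.Module:
    # Assumption: early (high-noise) steps are less critical -> small network.
    return small if t > total_steps // 2 else large

@torch.no_grad()
def sample(total_steps: int = 1000) -> torch.Tensor:
    x = torch.randn(16, 2)                          # toy 2-D samples
    for t in reversed(range(total_steps)):
        net = pick_network(t, total_steps)          # step-aware network choice
        t_embed = torch.full((x.shape[0], 1), t / total_steps)
        eps = net(torch.cat([x, t_embed], dim=1))
        x = x - 0.001 * eps                         # placeholder denoising update
    return x

print(sample().shape)                               # torch.Size([16, 2])
```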

Continual Test-time Domain Adaptation via Dynamic Sample Selection

  • paper_url: http://arxiv.org/abs/2310.03335
  • repo_url: None
  • paper_authors: Yanshuo Wang, Jie Hong, Ali Cheraghian, Shafin Rahman, David Ahmedt-Aristizabal, Lars Petersson, Mehrtash Harandi
  • for: Proposing a continual test-time domain adaptation (CTDA) method that gradually adapts a pre-trained model to a sequence of target domains without access to the source data
  • methods: A Dynamic Sample Selection (DSS) method consisting of dynamic thresholding, positive learning, and negative learning. Models conventionally self-train on unlabeled data from unknown environments and rely equally on all samples' pseudo-labels to update their parameters, yet these pseudo-labels are noisy and not all samples are equally trustworthy; a dynamic thresholding module therefore first separates suspected low-quality samples, which are more likely to be wrongly predicted, and joint positive and negative learning is applied to both high- and low-quality samples to reduce the risk of using wrong information
  • results: Extensive experiments demonstrate state-of-the-art CTDA performance in the image domain, and an additional evaluation on 3D point clouds showcases the method's versatility and potential for broader applicability
    Abstract The objective of Continual Test-time Domain Adaptation (CTDA) is to gradually adapt a pre-trained model to a sequence of target domains without accessing the source data. This paper proposes a Dynamic Sample Selection (DSS) method for CTDA. DSS consists of dynamic thresholding, positive learning, and negative learning processes. Traditionally, models learn from unlabeled unknown environment data and equally rely on all samples' pseudo-labels to update their parameters through self-training. However, noisy predictions exist in these pseudo-labels, so all samples are not equally trustworthy. Therefore, in our method, a dynamic thresholding module is first designed to select suspected low-quality from high-quality samples. The selected low-quality samples are more likely to be wrongly predicted. Therefore, we apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information. We conduct extensive experiments that demonstrate the effectiveness of our proposed method for CTDA in the image domain, outperforming the state-of-the-art results. Furthermore, our approach is also evaluated in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
    摘要 CTDA 的目标是慢慢地适应一系列目标领域的模型,无需访问源数据。这篇论文提出了动态样本选择(DSS)方法。DSS包括动态阈值调整、正例学习和负例学习过程。传统上,模型从未知环境数据中学习,并且均依赖所有样本的假标签来更新参数通过自我训练。然而,数据中的预测结果存在噪音,因此所有样本都不是 equally trustworthy。因此,我们首先设计了动态阈值模块,选择可疑的低质量样本。选择的低质量样本更有可能错误预测。因此,我们应用了联合正例和负例学习在高质量和低质量样本上进行减风险。我们进行了广泛的实验,证明我们提出的方法在图像频谱中实现了 CTDA,并超越了现状势力的结果。此外,我们的方法还在3D点云频谱中进行了评估,展示了其 universality 和普遍性。
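
The sketch below illustrates the dynamic-sample-selection idea on a single test-time batch: pseudo-labels are split by confidence, high-quality samples receive positive (cross-entropy) learning and suspected low-quality samples receive negative learning. The mean-confidence threshold and the particular negative-learning loss are illustrative assumptions rather than the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def dss_loss(logits: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)              # pseudo-labels and their confidence
    threshold = conf.mean()                      # assumed dynamic threshold
    high, low = conf >= threshold, conf < threshold

    loss = logits.new_zeros(())
    if high.any():                               # positive learning: trust the pseudo-label
        loss = loss + F.cross_entropy(logits[high], pseudo[high])
    if low.any():                                # negative learning: push the doubtful label down
        p_low = probs[low].gather(1, pseudo[low].unsqueeze(1)).squeeze(1)
        loss = loss + (-torch.log(1.0 - p_low + 1e-8)).mean()
    return loss

logits = torch.randn(32, 10, requires_grad=True)
dss_loss(logits).backward()
print(logits.grad.shape)                         # torch.Size([32, 10])
```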

Real-time Multi-modal Object Detection and Tracking on Edge for Regulatory Compliance Monitoring

  • paper_url: http://arxiv.org/abs/2310.03333
  • repo_url: None
  • paper_authors: Jia Syuen Lim, Ziwei Wang, Jiajun Liu, Abdelwahed Khamis, Reza Arablouei, Robert Barlow, Ryan McAllister
  • for: Regulatory compliance monitoring across diverse industrial domains, with improved quality assurance and traceability
  • methods: A real-time multi-modal sensing system combining 3D time-of-flight and RGB cameras with unsupervised learning techniques on edge AI devices
  • results: Improves record-keeping efficiency and reduces manual intervention; validated for knife sanitization in agrifood facilities, handling occlusion and low-light issues with the RGB cameras
    Abstract Regulatory compliance auditing across diverse industrial domains requires heightened quality assurance and traceability. Present manual and intermittent approaches to such auditing yield significant challenges, potentially leading to oversights in the monitoring process. To address these issues, we introduce a real-time, multi-modal sensing system employing 3D time-of-flight and RGB cameras, coupled with unsupervised learning techniques on edge AI devices. This enables continuous object tracking thereby enhancing efficiency in record-keeping and minimizing manual interventions. While we validate the system in a knife sanitization context within agrifood facilities, emphasizing its prowess against occlusion and low-light issues with RGB cameras, its potential spans various industrial monitoring settings.
    摘要 政策遵循审核 Across 多个工业领域需要提高质量控制和可追溯性。现有的手动和间歇性方法在审核过程中存在重要的挑战,可能导致监测过程中的漏洞。为解决这些问题,我们介绍一种实时、多模式感知系统,使用3D时间旋转和RGB相机,并与边缘AI设备结合不监督学习技术。这使得对象的连续跟踪而提高记录保存的效率,并减少手动干预。我们在农业食品设施中 validate 这种系统,强调其在遮盖和低光照问题下的强健性,但其潜在适用范围包括多个工业监测场景。

Investigating the Limitation of CLIP Models: The Worst-Performing Categories

  • paper_url: http://arxiv.org/abs/2310.03324
  • repo_url: None
  • paper_authors: Jie-Jing Shao, Jiang-Xin Shi, Xiao-Wen Yang, Lan-Zhe Guo, Yu-Feng Li
  • for: Improving CLIP performance on its worst-performing categories, which matters especially in risk-sensitive applications where specific categories hold significant importance
  • methods: Investigates the alignment between CLIP's two modalities and proposes the Class-wise Matching Margin (CMM) to measure inference confusion
  • results: Querying large language models to enrich descriptions of the worst-performing categories and building a weighted prompt ensemble boosts accuracy on the worst-10 ImageNet categories from as low as 0% to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data
    Abstract Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0%, even though the overall performance has achieved 64.1%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (CMM) to measure the inference confusion. CMM can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
    摘要 CLIP(对比语言图像预训练)提供了一个基本模型,将自然语言和视觉概念集成起来,以实现零shot认知任务。通常认为,通过Well-designed文本提示,可以在多个领域达到可接受的总体精度。然而,我们发现CLIP模型在最差类别表现不佳,比总体表现低至0%。例如,在ImageNet中有10个类别,其中每个类别的精度只有0%。这种现象表明CLIP模型在风险敏感应用中可能存在风险,特别是在特定类别具有重要性时。为解决这个问题,我们调查CLIP模型两个模式之间的对应关系,并提出了类别匹配margin(CMM)来度量推理冲击。CMM可以准确地确定最差表现的类别,并估算候选提示的可能性。我们进一步咨询大型自然语言模型,以描述最差表现的类别,并建立了权重 ensemble,以强调高效的提示。实验结果表明,我们的提议有效,ImageNet最差10类精度从0%提高到5.2%,无需手动提取工程、繁琐优化或访问标注验证数据。

Functional data learning using convolutional neural networks

  • paper_url: http://arxiv.org/abs/2310.03773
  • repo_url: https://github.com/jesusgl86/fdap01
  • paper_authors: Jose Galarza, Tamer Oraby
  • for: Using convolutional neural networks (CNNs) for regression and classification of functional data, covering both noisy and noise-free cases
  • methods: Transforms each functional observation into a 28x28 image and uses a specific but typical CNN architecture for all of the regression (parameter estimation) and functional-form classification exercises
  • results: Achieves high accuracy, e.g., estimating exponential growth and decay rates, the bandwidths of sine and cosine functions, and the magnitudes and widths of curve peaks, and classifying monotonicity, curvature, algebraic versus exponential growth, and the number of peaks; the same networks are applied to Lyapunov exponent estimation in chaotic data, estimating disease transmission rates from epidemic curves, assessing the similarity of drug dissolution profiles, and detecting Parkinson's disease patients
    Abstract In this paper, we show how convolutional neural networks (CNN) can be used in regression and classification learning problems of noisy and non-noisy functional data. The main idea is to transform the functional data into a 28 by 28 image. We use a specific but typical architecture of a convolutional neural network to perform all the regression exercises of parameter estimation and functional form classification. First, we use some functional case studies of functional data with and without random noise to showcase the strength of the new method. In particular, we use it to estimate exponential growth and decay rates, the bandwidths of sine and cosine functions, and the magnitudes and widths of curve peaks. We also use it to classify the monotonicity and curvatures of functional data, algebraic versus exponential growth, and the number of peaks of functional data. Second, we apply the same convolutional neural networks to Lyapunov exponent estimation in noisy and non-noisy chaotic data, in estimating rates of disease transmission from epidemic curves, and in detecting the similarity of drug dissolution profiles. Finally, we apply the method to real-life data to detect Parkinson's disease patients in a classification problem. The method, although simple, shows high accuracy and is promising for future use in engineering and medical applications.
    摘要 在这篇论文中,我们展示了如何使用卷积神经网络(CNN)在有噪和无噪函数数据上进行回归和分类学习问题。主要思想是将函数数据转换成28x28图像。我们使用了一种特定 yet typical的卷积神经网络架构来实现所有的参数估计和函数形态分类问题。首先,我们使用了一些函数案例研究,包括带有噪声和无噪声的函数数据,以示新方法的强大性。我们使用它来估计指数增长和减速率、振荡函数的宽度和峰值强度、函数数据的 monotonicity 和曲线性、函数数据的 algebraic 增长和指数增长、函数数据的峰值数量等。其次,我们对噪声和无噪声杂化数据中的 Lyapunov 指数进行估计,从 epidemic 曲线中估计疾病传播率,并在药物溶解曲线上检测同义性。最后,我们应用这种方法到实际数据,以进行 Parkinson 病患分类问题。这种简单的方法具有高准确率,并在工程和医学应用中表示了承诺。
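
The core trick, rasterising each observed function into a 28x28 image and letting an ordinary CNN do the regression, can be sketched as follows. The rasteriser and the tiny CNN are illustrative assumptions and not the authors' exact architecture.

```python
import numpy as np
import torch
import torch.nn as nn

def curve_to_image(y: np.ndarray, size: int = 28) -> np.ndarray:
    """Draw a 1-D curve sampled at `size` points as a size x size binary image."""
    y = (y - y.min()) / (y.max() - y.min() + 1e-8)
    rows = (size - 1 - np.round(y * (size - 1))).astype(int)
    img = np.zeros((size, size), dtype=np.float32)
    img[rows, np.arange(size)] = 1.0
    return img

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 1),      # regress e.g. a decay rate
)

# Toy example: noisy exponential decay exp(-r t) with an unknown rate r.
t = np.linspace(0, 1, 28)
img = curve_to_image(np.exp(-3.0 * t) + 0.05 * np.random.randn(28))
pred = cnn(torch.from_numpy(img)[None, None])    # shape (1, 1)
print(pred.shape)
```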

Can pre-trained models assist in dataset distillation?

  • paper_url: http://arxiv.org/abs/2310.03295
  • repo_url: https://github.com/yaolu-zjut/ddinterpreter
  • paper_authors: Yao Lu, Xuguang Chen, Yuchen Zhang, Jianyang Gu, Tianle Zhang, Yifan Zhang, Xiaoniu Yang, Qi Xuan, Kai Wang, Yang You
  • for: Investigating whether pre-trained models (PTMs) can effectively transfer knowledge to synthetic datasets and accurately guide dataset distillation (DD)
  • methods: Preliminary experiments followed by a systematic study of different PTM options, including initialization parameters, model architecture, training epochs, and domain knowledge
  • results: 1) increasing model diversity improves the performance of synthetic datasets; 2) sub-optimal models can also assist DD and in certain cases outperform well-trained ones; 3) domain-specific PTMs are not mandatory, but a reasonable domain match is crucial. Selecting the optimal options significantly improves cross-architecture generalization over baseline DD methods
    Abstract Dataset Distillation (DD) is a prominent technique that encapsulates knowledge from a large-scale original dataset into a small synthetic dataset for efficient training. Meanwhile, Pre-trained Models (PTMs) function as knowledge repositories, containing extensive information from the original dataset. This naturally raises a question: Can PTMs effectively transfer knowledge to synthetic datasets, guiding DD accurately? To this end, we conduct preliminary experiments, confirming the contribution of PTMs to DD. Afterwards, we systematically study different options in PTMs, including initialization parameters, model architecture, training epoch and domain knowledge, revealing that: 1) Increasing model diversity enhances the performance of synthetic datasets; 2) Sub-optimal models can also assist in DD and outperform well-trained ones in certain cases; 3) Domain-specific PTMs are not mandatory for DD, but a reasonable domain match is crucial. Finally, by selecting optimal options, we significantly improve the cross-architecture generalization over baseline DD methods. We hope our work will facilitate researchers to develop better DD techniques. Our code is available at https://github.com/yaolu-zjut/DDInterpreter.
    摘要 dataset 简化 (DD) 是一种广泛应用的技术,它将原始数据集中的知识封装到一个小型的 sintetic 数据集上,以便高效地训练。同时,先修学模型 (PTM) 作为知识库,含有原始数据集中的广泛信息。这 naturally 引起了一个问题:PTM 是否可以正确地传递知识到 sintetic 数据集,以便 DD 准确地进行?为此,我们进行了初步的实验,并证明了 PTM 对 DD 的贡献。后续,我们系统地研究了不同的 PTM 选项,包括初始化参数、模型架构、训练轮数和领域知识,发现:1)提高模型多样性可以提高 sintetic 数据集的性能;2)不佳的模型也可以帮助 DD 进行,在某些情况下超越了训练得非常好的模型;3)领域特定的 PTM 并不是 DD 的必要条件,但是领域匹配是非常重要。最后,通过选择优化的选项,我们可以显著提高基eline DD 方法的跨建制泛化性能。我们希望我们的工作能够促进研究人员开发更好的 DD 技术。我们的代码可以在 上找到。

SimVLG: Simple and Efficient Pretraining of Visual Language Generative Models

  • paper_url: http://arxiv.org/abs/2310.03291
  • repo_url: None
  • paper_authors: Yiren Jian, Tingkai Liu, Yunzhe Tao, Soroush Vosoughi, Hongxia Yang
  • for: An efficient pre-training framework for visual-language generative models that leverages frozen large language models (LLMs)
  • methods: A one-stage, single-loss framework that gradually merges similar visual tokens during training, compacting visual information while preserving the richness of the semantic content, which leads to fast convergence
  • results: Speeds up the training of vision-language models by a factor of 5 without a noticeable impact on performance, reaches comparable performance with only 1/10 of the data, and extends image-text models to video-language generation through a novel soft attentive temporal token merging module
    Abstract In this paper, we propose "SimVLG", a streamlined framework for the pre-training of computationally intensive vision-language generative models, leveraging frozen pre-trained large language models (LLMs). The prevailing paradigm in vision-language pre-training (VLP) typically involves a two-stage optimization process: an initial resource-intensive phase dedicated to general-purpose vision-language representation learning, aimed at extracting and consolidating pertinent visual features, followed by a subsequent phase focusing on end-to-end alignment between visual and linguistic modalities. Our one-stage, single-loss framework circumvents the aforementioned computationally demanding first stage of training by gradually merging similar visual tokens during training. This gradual merging process effectively compacts the visual information while preserving the richness of semantic content, leading to fast convergence without sacrificing performance. Our experiments show that our approach can speed up the training of vision-language models by a factor of 5 without noticeable impact on the overall performance. Additionally, we show that our models can achieve comparable performance to current vision-language models with only 1/10 of the data. Finally, we demonstrate how our image-text models can be easily adapted to video-language generative tasks through a novel soft attentive temporal token merging module.
    摘要 在这篇论文中,我们提出了``SimVLG''框架,用于预训 computationally intensive的视觉语言生成模型,利用冰结的大型语言模型(LLM)。传统的视觉语言预训(VLP)方法通常包括两个阶段的优化过程:首先是一个资源占用 intensives的阶段,用于学习通用的视觉语言表示,然后是一个结合视觉语言Modalities的阶段。我们的一个阶段、单个损失框架可以 circumvent这个 computationally demanding的第一阶段训练,通过在训练过程中逐渐合并相似的视觉符号来压缩视觉信息,同时保持 semantics的 ricness,从而实现快速的训练 convergence 而无需牺牲性能。我们的实验表明,我们的方法可以将视觉语言模型的训练速度提高五倍,而无需注意到性能的影响。此外,我们还证明了我们的模型可以通过只使用一半的数据来实现与当前视觉语言模型相同的性能。最后,我们展示了我们的图像文本模型可以通过一种新的软注意时间符号合并模块来简单地适应视频语言生成任务。
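
A minimal sketch of gradually merging similar visual tokens, the mechanism SimVLG relies on to compact visual information during training. The pairing rule used here, averaging the r most similar neighbouring tokens, is an illustrative assumption rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def merge_similar_tokens(tokens: torch.Tensor, r: int) -> torch.Tensor:
    """tokens: (N, D). Returns roughly (N - r, D) after up to r pairwise merges."""
    tokens = tokens.clone()                                      # avoid modifying the input
    sim = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)   # neighbour similarity
    merge_idx = sim.topk(r).indices.sort().values                # most similar positions
    keep = torch.ones(len(tokens), dtype=torch.bool)
    for i in merge_idx.tolist():
        if keep[i] and keep[i + 1]:                              # skip chained merges
            tokens[i] = 0.5 * (tokens[i] + tokens[i + 1])
            keep[i + 1] = False
    return tokens[keep]

vis_tokens = torch.randn(196, 768)          # e.g. ViT patch tokens
print(merge_similar_tokens(vis_tokens, r=32).shape)
```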

PoseAction: Action Recognition for Patients in the Ward using Deep Learning Approaches

  • paper_url: http://arxiv.org/abs/2310.03288
  • repo_url: None
  • paper_authors: Zherui Li, Raye Chen-Hua Yeow
  • for: Recognizing and predicting the actions of patients in hospital wards using computer vision and deep learning
  • methods: OpenPose is used to accurately detect the positions of human subjects in the video stream, and AlphAction's Asynchronous Interaction Aggregation (AIA) network predicts the actions of the detected subjects; the combined model is called PoseAction
  • results: PoseAction achieves the highest classification mAP of 98.72% (IoU@0.5) on 12 common ward actions; an online real-time mode is developed to support clinical translation, and OpenPose's facial keypoint detection is used for face blurring to address the privacy concerns of patients and healthcare workers
    Abstract Real-time intelligent detection and prediction of subjects' behavior particularly their movements or actions is critical in the ward. This approach offers the advantage of reducing in-hospital care costs and improving the efficiency of healthcare workers, which is especially true for scenarios at night or during peak admission periods. Therefore, in this work, we propose using computer vision (CV) and deep learning (DL) methods for detecting subjects and recognizing their actions. We utilize OpenPose as an accurate subject detector for recognizing the positions of human subjects in the video stream. Additionally, we employ AlphAction's Asynchronous Interaction Aggregation (AIA) network to predict the actions of detected subjects. This integrated model, referred to as PoseAction, is proposed. At the same time, the proposed model is further trained to predict 12 common actions in ward areas, such as staggering, chest pain, and falling down, using medical-related video clips from the NTU RGB+D and NTU RGB+D 120 datasets. The results demonstrate that PoseAction achieves the highest classification mAP of 98.72% (IoU@0.5). Additionally, this study develops an online real-time mode for action recognition, which strongly supports the clinical translation of PoseAction. Furthermore, using OpenPose's function for recognizing face key points, we also implement face blurring, which is a practical solution to address the privacy protection concerns of patients and healthcare workers. Nevertheless, the training data for PoseAction is currently limited, particularly in terms of label diversity. Consequently, the subsequent step involves utilizing a more diverse dataset (including general actions) to train the model's parameters for improved generalization.
    摘要 “现场智能探测和预测病人的行为,特别是其运动或动作,在医院中是非常重要的。这种方法可以降低医院内部门成本和提高医疗工作者的效率,尤其在夜间或峰值 admit 期间。因此,在这种工作中,我们提议使用计算机视觉(CV)和深度学习(DL)方法来探测和识别病人的动作。我们使用 OpenPose 作为准确的人体探测器,并使用 AlphAction 的异步互动聚合(AIA)网络预测病人的动作。这个整体模型被称为 PoseAction。同时,我们进一步训练这个模型,以预测医院区域中的 12 种常见动作,如摇摆、胸痛和跌倒。结果表明,PoseAction 达到了最高的分类MAP 98.72%(IoU@0.5)。此外,本研究还开发了在线实时模式,以便支持临床翻译。此外,通过 OpenPose 的人脸关键点识别功能,我们还实现了面部模糊,这是一个实际的解决方案,以保护患者和医疗工作者的隐私。然而,PoseAction 的训练数据目前还受限,特别是Label多样性不够。因此,后续步骤是使用更多的多样化数据(包括通用动作)来训练模型的参数,以提高其泛化能力。”

Classifying Whole Slide Images: What Matters?

  • paper_url: http://arxiv.org/abs/2310.03279
  • repo_url: None
  • paper_authors: Long Nguyen, Aiden Nibali, Joshua Millward, Zhen He
  • for: Investigating which design choices matter most in classification algorithms for very high resolution whole slide images (WSIs)
  • methods: A thorough exploration of key design choices for WSI classification algorithms
  • results: The most important features for effective WSI classification are captured at the local small-patch level, where cell and tissue micro-environment detail is most pronounced, rather than through whole-slide global context; a simple multi-instance learning method that captures no global information performs almost as well as models that capture a lot of it
    Abstract Recently there have been many algorithms proposed for the classification of very high resolution whole slide images (WSIs). These new algorithms are mostly focused on finding novel ways to combine the information from small local patches extracted from the slide, with an emphasis on effectively aggregating more global information for the final predictor. In this paper we thoroughly explore different key design choices for WSI classification algorithms to investigate what matters most for achieving high accuracy. Surprisingly, we found that capturing global context information does not necessarily mean better performance. A model that captures the most global information consistently performs worse than a model that captures less global information. In addition, a very simple multi-instance learning method that captures no global information performs almost as well as models that capture a lot of global information. These results suggest that the most important features for effective WSI classification are captured at the local small patch level, where cell and tissue micro-environment detail is most pronounced. Another surprising finding was that unsupervised pre-training on a larger set of 33 cancers gives significantly worse performance compared to pre-training on a smaller dataset of 7 cancers (including the target cancer). We posit that pre-training on a smaller, more focused dataset allows the feature extractor to make better use of the limited feature space to better discriminate between subtle differences in the input patch.
    摘要 近些时间,有许多算法提出来用于分类高解像整幕照片(WSIs)。这些新算法主要集中在找到小地方区域Extracted from the slide的信息的新方法,强调有效地汇集全局信息为最终预测器。在这篇论文中,我们详细探讨了不同的关键设计选择WSI分类算法,以 investigate what matters most for achieving high accuracy。 Surprisingly, we found that capturing global context information does not necessarily mean better performance. A model that captures the most global information consistently performs worse than a model that captures less global information. In addition, a very simple multi-instance learning method that captures no global information performs almost as well as models that capture a lot of global information. These results suggest that the most important features for effective WSI classification are captured at the local small patch level, where cell and tissue micro-environment detail is most pronounced. Another surprising finding was that unsupervised pre-training on a larger set of 33 cancers gives significantly worse performance compared to pre-training on a smaller dataset of 7 cancers (including the target cancer). We posit that pre-training on a smaller, more focused dataset allows the feature extractor to make better use of the limited feature space to better discriminate between subtle differences in the input patch.

Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning

  • paper_url: http://arxiv.org/abs/2310.03273
  • repo_url: None
  • paper_authors: Takayuki Komatsu, Yoshiyuki Ohmura, Yasuo Kuniyoshi
  • for: Multi-object representation learning, which aims to represent complex real-world visual input as a composition of multiple objects
  • methods: Prevailing methods use unsupervised learning to segment an input image into individual objects and encode each object into a latent vector, but it is unclear how previous methods achieve appropriate segmentation, and since most of them regularize the latent vectors with a Variational Autoencoder (VAE), it is also unclear whether VAE regularization contributes to segmentation; an ablation study is therefore conducted on MONet, a typical method
  • results: MONet represents multiple objects with pairs of an attention mask and a corresponding latent vector, encoded from the input image and attention mask, from which the component image and attention mask are decoded. Its loss consists of 1) the reconstruction loss between the input image and the decoded component images, 2) the VAE regularization loss on the latent vectors, and 3) the reconstruction loss of the attention masks, which explicitly encodes shape information. Ablating these three losses shows that the VAE regularization loss does not affect segmentation performance while the other two do; based on this, the authors hypothesize that it is important to maximize the attention mask over the image region best represented by a single latent vector, and confirm the hypothesis with a new loss function built on the same mechanism
    Abstract Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance and the others losses did affect it. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.
    摘要 多对象表示学习目标是将复杂的真实世界视觉输入转换为多个对象的组合。表示学习方法通常使用无监督学习来将输入图像分割成各个对象,并将这些对象编码到每个幂量中。然而,没有准确的方法来实现适当的对象分割。此外,大多数前一代方法使用Variational Autoencoder(VAE)来规范幂量。因此,不清楚VAE规范是否对适当的对象分割做出贡献。为了解释多对象表示学习中对象分割机制,我们进行了MONet方法的ablation研究。MONet使用对应于注意Mask和幂量的对象对来表示多个对象。每个幂量来自输入图像和注意Mask的编码。然后,从每个幂量中解码输入图像和注意Mask。MONet的损失函数包括1)输入图像和解码组件图像之间的总差异损失,2)VAE规范损失,3)注意Mask的重建损失,以便显式地编码形状信息。我们对这三个损失函数进行了ablation研究,并证明VAE规范损失没有影响分割性能,而其他两个损失函数有影响。基于这结果,我们提出了一种新的损失函数,它的机制与我们的假设相同。我们证明了这种损失函数能够提高对象分割性能。
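
The three loss terms that the ablation study toggles can be written down compactly as below. The tensors are stand-ins for MONet's actual encoder/decoder outputs and the weights are illustrative assumptions; ablating a term simply means setting its weight to zero.

```python
import torch
import torch.nn.functional as F

def monet_loss(x, recon_components, masks, recon_masks, mu, logvar,
               w_recon=1.0, w_kl=1.0, w_mask=1.0):
    # (1) reconstruction: mask-weighted sum of component images vs. the input image.
    recon = (masks * recon_components).sum(dim=1)                 # (B, C, H, W)
    loss_recon = F.mse_loss(recon, x)
    # (2) VAE regularisation of each slot's latent vector.
    loss_kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    # (3) attention-mask reconstruction, which explicitly encodes shape information.
    loss_mask = F.binary_cross_entropy(recon_masks, masks)
    return w_recon * loss_recon + w_kl * loss_kl + w_mask * loss_mask

B, K, C, H, W, D = 2, 4, 3, 32, 32, 16            # batch, slots, channels, image size, latent dim
x = torch.rand(B, C, H, W)
masks = torch.softmax(torch.rand(B, K, 1, H, W), dim=1)
loss = monet_loss(x, torch.rand(B, K, C, H, W), masks,
                  masks.clone(), torch.zeros(B, K, D), torch.zeros(B, K, D))
# Ablating a term (e.g. the VAE loss) amounts to setting its weight to zero.
print(loss.item())
```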

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.03270
  • repo_url: None
  • paper_authors: Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
  • for: Making diffusion models practical for low-latency real-world applications by reducing their computational cost
  • methods: Of the two main compression approaches, post-training quantization (PTQ) and quantization-aware training (QAT), this work proposes a data-free, parameter-efficient fine-tuning framework for low-bit diffusion models that reaches QAT-level performance with PTQ-like efficiency
  • results: Significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency; quantizing both weights and activations of LDM-4 to 4-bit incurs only a marginal 0.05 sFID increase, and quantization is 16.2x faster than QAT-based methods with comparable generation quality
    Abstract Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for low-latency real-world applications is constrained by substantial computational costs and latency issues. Quantization is a dominant way to compress and accelerate diffusion models, where post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches, each bearing its own properties. While PTQ exhibits efficiency in terms of both time and data usage, it may lead to diminished performance in low bit-width. On the other hand, QAT can alleviate performance degradation but comes with substantial demands on computational and data resources. To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. Specifically, we propose a quantization-aware variant of the low-rank adapter (QALoRA) that can be merged with model weights and jointly quantized to low bit-width. The fine-tuning process distills the denoising capabilities of the full-precision model into its quantized counterpart, eliminating the requirement for training data. We also introduce scale-aware optimization and employ temporal learned step-size quantization to further enhance performance. Extensive experimental results demonstrate that our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency. Specifically, there is only a marginal 0.05 sFID increase when quantizing both weights and activations of LDM-4 to 4-bit on ImageNet 256x256. Compared to QAT-based methods, our EfficientDM also boasts a 16.2x faster quantization speed with comparable generation quality.
    摘要 Diffusion models 有 demonstrated 非常出色的创造力在图像生成和相关的生成任务中。然而,它们在实际应用中的延迟和计算成本问题受到了一定的限制。量化是Diffusion models的压缩和加速的主要方法,其中Post-training quantization (PTQ)和quantization-aware training (QAT)是两种主要的方法,每个方法都有自己的特点。PTQ可以快速地压缩和加速Diffusion models,但可能会导致低位数bit的性能下降。而QAT可以减轻性能下降,但需要大量的计算和数据资源。为了利用这些优点而避免其缺点,我们介绍了一种无需数据和参数的efficient fine-tuning框架,以实现QAT级别的性能,同时保持PTQ类似的效率。我们提出了一种量化感知的低级Adapter(QALoRA),可以与模型参数和量化结合使用,并且可以将模型 weights和活动量化到低位数bit。 fine-tuning过程将混合模型的权重和活动的权重和量化结果,从而消除了对训练数据的需求。我们还引入了缩放比例优化和时间学习步长量化,以进一步提高性能。我们的方法在实际实验中显著超越了之前基于PTQ的Diffusion models,同时保持相同的时间和数据效率。具体来说,在ImageNet 256x256上,将LDM-4的权重和活动量化到4位数bit时,只有0.05 sFID提升。相比QAT基于方法,我们的EfficientDM还具有16.2倍 faster量化速度,同时保持类似的生成质量。

cs.AI - 2023-10-05

Hard View Selection for Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.03940
  • repo_url: None
  • paper_authors: Fabio Ferreira, Ivo Rapant, Frank Hutter
  • for: Strengthening contrastive learning, in which models are trained to be invariant to different views of an image input, by improving how those views are generated
  • methods: A learning-free yet powerful Hard View Selection (HVS) strategy that extends random view generation to expose the pretrained model to harder samples: multiple views are sampled at random, forward passes are run for each view pair, the pair yielding the worst loss is selected adversarially, and the backward pass is run with that pair
  • results: Linear-evaluation accuracy on ImageNet improves by 0.55%-1.9%, with similar gains across several CL methods (DINO, SimSiam, SimCLR); with only 300 pretraining epochs, HVS closely rivals the 800-epoch DINO baseline even when the slowdown from HVS's additional forward passes is taken into account
    Abstract Many Contrastive Learning (CL) methods train their models to be invariant to different "views" of an image input for which a good data augmentation pipeline is crucial. While considerable efforts were directed towards improving pre-text tasks, architectures, or robustness (e.g., Siamese networks or teacher-softmax centering), the majority of these methods remain strongly reliant on the random sampling of operations within the image augmentation pipeline, such as the random resized crop or color distortion operation. In this paper, we argue that the role of the view generation and its effect on performance has so far received insufficient attention. To address this, we propose an easy, learning-free, yet powerful Hard View Selection (HVS) strategy designed to extend the random view generation to expose the pretrained model to harder samples during CL training. It encompasses the following iterative steps: 1) randomly sample multiple views and create pairs of two views, 2) run forward passes for each view pair on the currently trained model, 3) adversarially select the pair yielding the worst loss, and 4) run the backward pass with the selected pair. In our empirical analysis we show that under the hood, HVS increases task difficulty by controlling the Intersection over Union of views during pretraining. With only 300-epoch pretraining, HVS is able to closely rival the 800-epoch DINO baseline which remains very favorable even when factoring in the slowdown induced by the additional forwards of HVS. Additionally, HVS consistently achieves accuracy improvements on ImageNet between 0.55% and 1.9% on linear evaluation and similar improvements on transfer tasks across multiple CL methods, such as DINO, SimSiam, and SimCLR.
    摘要 许多对比学习(CL)方法在训练模型时强调模型对不同视图的图像输入具有抗变异性。而现有的大量努力集中在改进预测任务、建筑或者稳定性(例如siamese网络或教师软max中心),但大多数这些方法仍然依赖于随机抽样操作在图像增强pipeline中,如随机缩放或颜色干扰操作。在这篇论文中,我们认为观察视图生成和其影响表现所得到的关注不足。为此,我们提出一种简单、学习无需的、具有强大抗变异性的硬视选择(HVS)策略,用于在CL训练中延长随机视图生成,并将模型 expose 到更难的样本。该策略包括以下步骤:1. 随机抽取多个视图,并将每个视图对创建对。2. 对每个视图对进行前向传播,并计算当前训练模型的损失。3. 选择损失最大的对,并对该对进行反向传播。我们的实验表明,HVS可以通过控制视图之间的交集来增加CL训练的difficulty。只需要300个训练回合,HVS就可以与800个训练回合的DINO基eline相当,而且这些基eline在CL训练中保持了非常有利的。此外,HVS在ImageNet上实现了Linear评估中的0.55%-1.9%的准确率提升,以及同样的提升在多个CL方法上,如DINO、SimSiam和SimCLR。
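
The four HVS steps translate almost directly into code. The sketch below uses a SimCLR-style NT-Xent loss and a toy encoder purely for illustration; HVS itself is agnostic to the underlying contrastive method.

```python
import itertools
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                   # (B, B) similarity matrix
    labels = torch.arange(len(z1))
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
augment = lambda x: x + 0.1 * torch.randn_like(x)        # stand-in augmentation

images = torch.rand(64, 3, 32, 32)
views = [augment(images) for _ in range(4)]              # 1) sample several views

with torch.no_grad():                                    # 2) score every view pair
    embeds = [encoder(v) for v in views]
    losses = {(i, j): nt_xent(embeds[i], embeds[j]).item()
              for i, j in itertools.combinations(range(len(views)), 2)}

i, j = max(losses, key=losses.get)                       # 3) adversarially pick the hardest pair
loss = nt_xent(encoder(views[i]), encoder(views[j]))     # 4) backward pass on that pair
loss.backward()
```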

Multitask Learning for Time Series Data with 2D Convolution

  • paper_url: http://arxiv.org/abs/2310.03925
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Xin Dai, Yan Zheng, Junpeng Wang, Huiyuan Chen, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang
  • for: Studying multitask learning (MTL) for time series data to improve the generalizability of time series classification (TSC) models
  • methods: State-of-the-art 1D convolution-based TSC models are first combined with MTL and evaluated; a novel 2D convolution-based model is then proposed to enhance the model's expressive power
  • results: The proposed approach outperforms competing methods on both the UCR Archive and an industrial transaction TSC dataset
    Abstract Multitask learning (MTL) aims to develop a unified model that can handle a set of closely related tasks simultaneously. By optimizing the model across multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of generalizability. Although MTL has been extensively researched in various domains such as computer vision, natural language processing, and recommendation systems, its application to time series data has received limited attention. In this paper, we investigate the application of MTL to the time series classification (TSC) problem. However, when we integrate the state-of-the-art 1D convolution-based TSC model with MTL, the performance of the TSC model actually deteriorates. By comparing the 1D convolution-based models with the Dynamic Time Warping (DTW) distance function, it appears that the underwhelming results stem from the limited expressive power of the 1D convolutional layers. To overcome this challenge, we propose a novel design for a 2D convolution-based model that enhances the model's expressiveness. Leveraging this advantage, our proposed method outperforms competing approaches on both the UCR Archive and an industrial transaction TSC dataset.
    摘要 多任务学习(MTL)目的是开发一个可以同时处理一组相关任务的统一模型。通过优化模型 across multiple tasks,MTL 通常会超过其非 MTL 对应模型的普适性。虽然 MTL 在不同领域 such as 计算机视觉、自然语言处理和推荐系统中得到了广泛的研究,但对时间序列数据的应用却收到了有限的注意。在这篇论文中,我们调查了在时间序列分类(TSC)问题上MTL的应用。然而,当我们将现有的 state-of-the-art 1D核心样本-based TSC模型与 MTL 集成时,TSC 模型的性能实际下降。通过比较 1D 核心样本-based 模型和动态时间戳距(DTW)距离函数,可以看出,不满的结果实际上来自于 1D 核心样本层的有限表达能力。为了解决这个挑战,我们提议一种新的 2D 核心样本-based 模型,该模型可以增强模型的表达能力。利用这个优势,我们的提议方法在 UCR archive 和一个工业交易 TSC 数据集上超过了竞争方法的性能。
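
One way to read "multitask learning with 2D convolution" for time series is sketched below: each series is lifted to a 2D map, a shared 2D-conv trunk encodes it, and one classification head per task is attached. The pairwise-difference lifting and the exact architecture are assumptions made here for illustration; the paper's construction may differ.

```python
import torch
import torch.nn as nn

class MTL2DConv(nn.Module):
    def __init__(self, num_classes_per_task=(5, 3)):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.Flatten(),
        )
        self.heads = nn.ModuleList([nn.Linear(32 * 8 * 8, c) for c in num_classes_per_task])

    def forward(self, series):                           # series: (B, L)
        x = series[:, :, None] - series[:, None, :]      # (B, L, L) pairwise-difference map
        feats = self.trunk(x.unsqueeze(1))               # shared 2D-conv representation
        return [head(feats) for head in self.heads]      # one output per task

model = MTL2DConv()
out_task1, out_task2 = model(torch.randn(4, 128))
print(out_task1.shape, out_task2.shape)                  # (4, 5) (4, 3)
```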

An Efficient Content-based Time Series Retrieval System

  • paper_url: http://arxiv.org/abs/2310.03919
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips
  • for: An information retrieval system that lets users interact with time series from multiple domains: a user submits a time series as a query and retrieves a list of relevant time series with associated metadata
  • methods: An effective and efficient content-based time series retrieval (CTSR) model built around a high-capacity similarity measure that still computes similarity scores quickly enough for real-time interaction
  • results: On an in-house transaction dataset, the proposed model outperforms alternative models while keeping reasonable inference runtimes, making it the most suitable solution for the transaction data problem
    Abstract A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series emerged from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system is required to work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. On top of that, the model within the CTSR system has to compute the similarity scores in an efficient manner as the users interact with the system in real-time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models, while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable solution compared to others for our transaction data problem.
    摘要 一个内容基于时间序列检索(CTSR)系统是一个信息检索系统,用于帮助用户与多个领域(如金融、医疗和制造)中的时间序列进行交互。例如,用户想要了解时间序列的来源,可以将时间序列作为查询提交到CTSR系统,并获取相关的元数据列表。通过分析返回的元数据,用户可以了解更多关于时间序列的来源信息。由于CTSR系统需要处理来自不同领域的时间序列数据,因此需要一个高容量的模型来有效地度量不同时间序列之间的相似性。同时,模型内部需要高效地计算相似性分数,以便用户在实时交互时可以得到快速的回答。在这篇论文中,我们提出一种高效和高效的CTSR模型,超过了其他模型,同时仍提供了合理的推理运行时间。为了证明提案的方法在解决业务问题时的可行性,我们对它与其他模型进行比较,并使用我们的自有交易数据进行实践。我们的发现表明,提案的模型是与其他模型相比最适合的解决方案。
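
The retrieval flow of a CTSR system can be sketched as: embed every catalogued time series once, then match a query by similarity against the index and return the top-k series with their metadata. The random-projection "embedder" below is a stand-in assumption for the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
proj = rng.normal(size=(128, 64))                         # stand-in embedding model

def embed(series: np.ndarray) -> np.ndarray:
    z = series @ proj                                     # (128,) -> (64,)
    return z / (np.linalg.norm(z) + 1e-8)

# Build the index: one embedding plus metadata per stored series.
catalogue = [(embed(rng.normal(size=128)), {"source": f"sensor_{i}"}) for i in range(1000)]
index = np.stack([e for e, _ in catalogue])               # (1000, 64)

def retrieve(query: np.ndarray, k: int = 5):
    scores = index @ embed(query)                         # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    return [(float(scores[i]), catalogue[i][1]) for i in top]

print(retrieve(rng.normal(size=128)))
```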

Toward a Foundation Model for Time Series Data

  • paper_url: http://arxiv.org/abs/2310.03916
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang
  • for: Developing an effective foundation model for time series that can be adapted across multiple domains
  • methods: Four existing self-supervised pre-training methods and one novel method are evaluated on unlabeled samples from multiple domains (repurposing the publicly available UCR Archive), using four popular neural network architectures for time series
  • results: Pre-training improves downstream classification by enhancing the convergence of the fine-tuning process, and the proposed pre-training method combined with the Transformer model outperforms the alternatives
    Abstract A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has mostly focused on models pre-trained solely on data from a single domain, resulting in a lack of knowledge about other types of time series. However, current research on time series pre-training has predominantly focused on models trained exclusively on data from a single domain. As a result, these models possess domain-specific knowledge that may not be easily transferable to time series from other domains. In this paper, we aim to develop an effective time series foundation model by leveraging unlabeled samples from multiple domains. To achieve this, we repurposed the publicly available UCR Archive and evaluated four existing self-supervised learning-based pre-training methods, along with a novel method, on the datasets. We tested these methods using four popular neural network architectures for time series to understand how the pre-training methods interact with different network designs. Our experimental results show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process. Furthermore, we found that the proposed pre-training method, when combined with the Transformer model, outperforms the alternatives.
    摘要 《基础模型》是一种机器学习模型,通过大量和多样化的数据进行自动学习预训练,可以适应多种下游任务。然而,当前关于时间序列预训练的研究主要集中在尝试使用单个领域的数据进行预训练,导致对其他领域时间序列的知识缺乏。为了开拓新的研究途径,我们希图通过多个领域的无标签样本来开发一个有效的时间序列基础模型。我们利用了公共可用的UCRL Archive,评估了四种现有的自动学习预训练方法,以及一种新方法,在这些数据集上进行测试。我们使用了四种流行的快速网络架构来评估这些预训练方法的效果。我们的实验结果表明,预训练可以提高下游分类任务的整合,并且我们提出的预训练方法,与Transformer模型结合使用,可以超越其他方法。

RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels

  • paper_url: http://arxiv.org/abs/2310.03912
  • repo_url: None
  • paper_authors: Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen, Soumyendu Sarkar
  • for: Improving the expressiveness of meta-learning Bayesian Optimization (BO) surrogates for high-dimensional black-box optimization problems
  • methods: Deep Kernel Learning (DKL) and attention-based Transformer models are used to improve the modeling power of GP surrogates, and Soft Actor-Critic Reinforcement Learning (SAC RL) is used to learn an acquisition-function policy that aids exploration
  • results: Combining DKL with the Transformer improves the expressiveness of meta-learning BO surrogates and yields state-of-the-art results on continuous high-dimensional optimization problems
    Abstract Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function optimization and few-shot multi-objective optimization. However, even few-shot techniques fail to exploit similarities shared between closely related objectives. In this paper, we combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling powers of GP surrogates with meta-learning. We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL, empowering the surrogates to adapt to contextual information gathered during the BO process. We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. This Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results in continuous high-dimensional optimization problems.

Taming Binarized Neural Networks and Mixed-Integer Programs

  • paper_url: http://arxiv.org/abs/2310.04469
  • repo_url: None
  • paper_authors: Johannes Aspman, Georgios Korpas, Jakub Marecek
  • for: Addressing the training of binarized neural networks, which are of particular interest because of their explainability
  • methods: The training problem is reformulated as the subadditive dual of a mixed-integer program, showing that binarized neural networks admit a tame representation and making Bolte et al.'s implicit-differentiation framework applicable, which offers a practical route to backpropagation
  • results: Binarized neural networks admit a tame representation, so implicit differentiation can be implemented in practice; the approach also applies to a broader class of mixed-integer programs beyond the training of binarized neural networks, such as those encountered in symbolic approaches to AI
    Abstract There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that binarized neural networks admit a tame representation. This, in turn, makes it possible to use the framework of Bolte et al. for implicit differentiation, which offers the possibility for practical implementation of backpropagation in the context of binarized neural networks. This approach could also be used for a broader class of mixed-integer programs, beyond the training of binarized neural networks, as encountered in symbolic approaches to AI and beyond.
    摘要 有很多最近关注二进制神经网络,特别是它们的可解释性。然而,自动梯度计算算法如反射propagation失效于二进制神经网络,限制了它们的应用。我们通过将二进制神经网络训练问题重新表述为杂Integer程序的子Additive dual,证明二进制神经网络具有可控的表示。这种表示使得可以使用博尔特等人的框架对偶计算,这些计算可以实现在二进制神经网络训练中的Backpropagation。这种方法可以用于更广泛的杂Integer程序,不仅是训练二进制神经网络,还有在符号智能术中遇到的更加广泛的应用。

Accelerated Neural Network Training with Rooted Logistic Objectives

  • paper_url: http://arxiv.org/abs/2310.03890
  • repo_url: None
  • paper_authors: Zhu Wang, Praveen Raj Veluswami, Harsh Mishra, Sathya N. Ravi
  • for: Proposing a new loss function that speeds up training and improves the performance of neural network models used in real-world applications
  • methods: A "rooted" logistic objective: a novel sequence of strictly convex functions, derived from the landscape of the logistic function, that are at least as strict as the logistic loss and whose minimizers coincide with the minimum-norm solution wherever possible
  • results: Training with the rooted loss converges faster and yields performance gains across multiple deep models (fully-connected networks and transformers) on various classification benchmarks; the loss is also applied to generative downstream applications such as fine-tuning StyleGAN
    Abstract Many neural networks deployed in real-world scenarios are trained using cross entropy based loss functions. From the optimization perspective, it is known that the behavior of first order methods such as gradient descent crucially depends on the separability of datasets. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) condition number of data matrix, and (2) separability of the dataset. With no further pre-processing techniques such as over-parametrization, data augmentation etc., separability is an intrinsic quantity of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of strictly convex functions that are at least as strict as logistic loss. The minimizers of these functions coincide with those of the minimum norm solution wherever possible. The strict convexity of the derived function can be extended to finetune state-of-the-art models and applications. In empirical experimental analysis, we apply our proposed rooted logistic objective to multiple deep models, e.g., fully-connected neural networks and transformers, on various classification benchmarks. Our results illustrate that training with the rooted loss function converges faster and gains performance improvements. Furthermore, we illustrate applications of our novel rooted loss function in generative modeling based downstream applications, such as finetuning the StyleGAN model with the rooted loss. The code implementing our losses and models can be found here for open source software development purposes: https://anonymous.4open.science/r/rooted_loss.
    摘要 许多神经网络在实际场景中被训练使用基于权重 entropy 的损失函数。优化方面来说,已知的是,训练使用首领方法 such as 梯度下降时,数据集的分化度对结果的准确性和速度具有关键作用。实际上,甚至在最简单的二分类情况下,训练的速度和准确性都取决于数据集的分化度和Condition number of data matrix。在没有额外的预处理技术,如过 parametrization、数据增强等,数据分化度是数据分布考虑的内在特性。我们关注了对 logistic 函数的 landscape 设计,并 derive 一个 novel 的 strictly convex 函数序列,这些函数在至少等效于 logistic loss 的情况下,其最小值的解归并与 minimum norm solution 匹配。我们发现,使用我们提议的根据梯度损失函数可以更快 converges 并提高性能。此外,我们还应用了我们的新的根据梯度损失函数在生成模型中的应用,例如 StyleGAN 模型的资源化。我们的实验结果表明,使用根据梯度损失函数可以提高模型的性能和速度。代码实现我们的损失函数和模型可以在以下链接找到:

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2310.19804
  • repo_url: None
  • paper_authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
  • for: Studying behavioural metrics for Markov decision processes as a mechanism for constructing representations in reinforcement learning
  • methods: Positive definite kernels are used to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021); the kernel perspective also yields new theoretical results, including bounds on value function differences in terms of the metric and a proof that the metric can be embedded into a finite-dimensional Euclidean space with low distortion error
  • results: Strong empirical results demonstrate the effectiveness of these methods in practice
    Abstract Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which has so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
    摘要 行为指标已被证明是在增强学习中有效的表示机制。我们提出了一种新的视角,使用正定定义的kernel来解决Markov决策过程中的行为指标。我们利用这种新的视角来定义一个新的度量,该度量与Castro等人(2021)最近提出的MICo距离相等。kernel视角还允许我们提供新的理论结果,包括通过我们的度量下界值函数差异,以及证明我们的度量可以被证明嵌入到有低抖动误差的finite维Euclidean空间中。这些是使用行为指标进行增强学习表示的两个重要性质。我们在理论上补充了强大的实验结果,证明这些方法在实践中的有效性。

Small batch deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.03882
  • repo_url: None
  • paper_authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro
  • for: Improving the performance of deep reinforcement learning
  • methods: Treats the batch size, i.e., how many transitions are sampled from the replay memory for each gradient update, as the variable of study in a broad empirical analysis
  • results: Reducing the batch size can yield significant performance gains, which is surprising given the general tendency toward larger batch sizes when training neural networks
    Abstract In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
    摘要 在值基深度强化学习中使用回忆储存,批处理大小参数指定每个梯度更新中样本的数量。虽然对学习过程非常重要,但通常不会在提出新算法时调整这个值。在这项工作中,我们提供了广泛的实验研究,表明减小批处理大小可以导致一些重要的性能提升,这是对训练神经网络时通常采用大批处理大小的惯例。我们补充了一系列实验分析,以更好地理解这种现象。

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2310.05862
  • repo_url: None
  • paper_authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
  • for: Defending CLIP against targeted data poisoning and backdoor attacks during pre-training
  • methods: The model is warmed up with unimodal contrastive learning (CL) on the image and text modalities separately, the data are then carefully divided into safe and risky subsets, and training applies unimodal CL to the risky subset and the CLIP loss to the safe subset, gradually increasing the size of the safe subset
  • results: Across various datasets, SAFECLIP reduces the success rate of targeted data poisoning attacks from 93.75% to 0% and of backdoor attacks from 100% to 0% without harming CLIP performance
    Abstract Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is extremely more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) on image and text modalities separately. Then, it carefully divides the data into safe and risky subsets. SAFECLIP trains on the risky data by applying unimodal CL to image and text modalities separately, and trains on the safe data using the CLIP loss. By gradually increasing the size of the safe subset during the training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming the CLIP performance. Our extensive experiments show that SAFECLIP decrease the attack success rate of targeted data poisoning attacks from 93.75% to 0% and that of the backdoor attacks from 100% to 0%, without harming the CLIP performance on various datasets.
    摘要 对大量图像描述文本 datasets 进行 Contrastive Language-Image Pre-training (CLIP) 后得到了杰出的成功,并且允许在新领域中进行转移。然而,CLIP 对于Targeted Data Poisoning 和 Backdoor 攻击更加易受攻击,相比于超vised learning。奇怪的是,对 CLIP 预训练数据进行0.0001%的恶意数据投毒只需要0.0001%的数据,而supervised learning 需要1000倍的数据。尽管如此,现有的方法对于防止 CLIP 模型在预训练中受到攻击很有限。在这个工作中,我们提出了一种强大的防御方法,即 SafeCLIP,以安全地在预训练 CLIP 模型中进行Targeted Data Poisoning 和 Backdoor 攻击。SafeCLIP 通过在图像和文本模式上分别应用 unimodal Contrastive Learning (CL) 来让模型进行温身。然后,它 méticulously 将数据分为安全和危险子集。SafeCLIP 在危险子集上应用 unimodal CL,并在安全子集上使用 CLIP 损失进行训练。通过逐渐增加安全子集的大小 durante 训练,SafeCLIP 可以有效地破坏 Targeted Data Poisoning 和 Backdoor 攻击,而不会害 CLIP 性能。我们的广泛的实验表明,SafeCLIP 可以将 Targeted Data Poisoning 攻击的成功率从 93.75% 降低到 0%,并将 Backdoor 攻击的成功率从 100% 降低到 0%,而不会害 CLIP 在不同的 dataset 上的性能。
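
The safe/risky split at the heart of SAFECLIP can be sketched as below: after the unimodal warm-up, image-text pairs are ranked by the cosine similarity of their embeddings, the top fraction is treated as safe and trained with the CLIP loss, and the rest would be trained with unimodal contrastive learning only. The similarity-based split rule and the fixed fraction are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_z, txt_z, tau=0.07):
    img_z, txt_z = F.normalize(img_z, dim=1), F.normalize(txt_z, dim=1)
    logits = img_z @ txt_z.t() / tau
    labels = torch.arange(len(img_z))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

def split_safe_risky(img_z, txt_z, safe_fraction=0.3):
    sim = F.cosine_similarity(img_z, txt_z, dim=1)       # per-pair alignment score
    n_safe = max(1, int(safe_fraction * len(sim)))       # gradually increased during training
    safe = sim.topk(n_safe).indices
    risky = torch.ones(len(sim), dtype=torch.bool)
    risky[safe] = False
    return safe, risky.nonzero(as_tuple=True)[0]

img_z, txt_z = torch.randn(256, 512), torch.randn(256, 512)
safe_idx, risky_idx = split_safe_risky(img_z, txt_z)
loss = clip_loss(img_z[safe_idx], txt_z[safe_idx])       # multimodal loss on the safe subset
# The risky subset would be trained with unimodal CL on images and texts separately
# (two augmented views per sample), which is omitted here for brevity.
print(len(safe_idx), len(risky_idx), loss.item())
```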

Validating transformers for redaction of text from electronic health records in real-world healthcare

  • paper_url: http://arxiv.org/abs/2310.04468
  • repo_url: https://github.com/CogStack/MedCAT
  • paper_authors: Zeljko Kraljevic, Anthony Shek, Joshua Au Yeung, Ewart Jonathan Sheldon, Mohammad Al-Agil, Haris Shuaib, Xi Bai, Kawsar Noor, Anoop D. Shah, Richard Dobson, James Teo
  • for: Protecting patient privacy in healthcare records by redacting directly identifiable information, enabling safe use and sharing of health data
  • methods: Deep learning, specifically transformer-based models, to improve the precision and efficiency of redaction
  • results: AnonCAT, trained on manually annotated redactions of real-world documents, achieves high performance across three UK hospitals with different electronic health record systems, with recall of 0.99, 0.99, and 0.96
    Abstract Protecting patient privacy in healthcare records is a top priority, and redaction is a commonly used method for obscuring directly identifiable information in text. Rule-based methods have been widely used, but their precision is often low causing over-redaction of text and frequently not being adaptable enough for non-standardised or unconventional structures of personal health information. Deep learning techniques have emerged as a promising solution, but implementing them in real-world environments poses challenges due to the differences in patient record structure and language across different departments, hospitals, and countries. In this study, we present AnonCAT, a transformer-based model and a blueprint on how deidentification models can be deployed in real-world healthcare. AnonCAT was trained through a process involving manually annotated redactions of real-world documents from three UK hospitals with different electronic health record systems and 3116 documents. The model achieved high performance in all three hospitals with a Recall of 0.99, 0.99 and 0.96. Our findings demonstrate the potential of deep learning techniques for improving the efficiency and accuracy of redaction in global healthcare data and highlight the importance of building workflows which not just use these models but are also able to continually fine-tune and audit the performance of these algorithms to ensure continuing effectiveness in real-world settings. This approach provides a blueprint for the real-world use of de-identifying algorithms through fine-tuning and localisation, the code together with tutorials is available on GitHub (https://github.com/CogStack/MedCAT).
    摘要 保护患者隐私在医疗记录是最高优先事项,而红aktion是一种常用的方法来隐藏直接可识别的信息。规则基于的方法已经广泛使用,但它们的精度frequently low,导致过度的文本隐藏和不够适应非标准化或不同结构的个人医疗信息。深度学习技术已经出现为一种可能的解决方案,但在实际环境中实施它们却存在医疗数据结构和语言不同的医院、医生和国家的挑战。在本研究中,我们介绍了AnonCAT,一种基于transformer的模型和在实际医疗环境中部署deidentification模型的蓝图。AnonCAT通过手动标注真实文档的红aktion进行训练,并在三个英国医院中使用3116个文档进行训练。模型在三个医院中表现出色,Recall值为0.99、0.99和0.96。我们的发现表明深度学习技术可以提高医疗数据隐藏的效率和准确性,并且建立可以不断细化和审核这些算法的工作流程,以确保在实际环境中的持续效果。这种方法提供了在实际使用deep learning隐藏算法时的蓝图,代码和教程可以在GitHub上找到(https://github.com/CogStack/MedCAT)。

Design Principles for Lifelong Learning AI Accelerators

  • paper_url: http://arxiv.org/abs/2310.04467
  • repo_url: None
  • paper_authors: Dhireesha Kudithipudi, Anurag Daram, Abdullah M. Zyarah, Fatima Tuz Zohora, James B. Aimone, Angel Yanguas-Gil, Nicholas Soures, Emre Neftci, Matthew Mattina, Vincenzo Lomonaco, Clare D. Thiem, Benjamin Epstein
  • for: Lifelong learning in AI and the design of hardware accelerators that can support continually learning models on edge devices
  • methods: A review of current edge AI accelerators and of emerging technologies, such as neuromorphic computing, that could play a role in future lifelong learning accelerators
  • results: Key desirable capabilities and evaluation metrics for lifelong learning accelerators are identified, and future designs and the role of different emerging technologies are discussed
    Abstract Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to a range of novel AI applications, but this will also require the development of appropriate hardware accelerators, particularly if the models are to be deployed on edge platforms, which have strict size, weight, and power constraints. Here, we explore the design of lifelong learning AI accelerators that are intended for deployment in untethered environments. We identify key desirable capabilities for lifelong learning accelerators and highlight metrics to evaluate such accelerators. We then discuss current edge AI accelerators and explore the future design of lifelong learning accelerators, considering the role that different emerging technologies could play.
    摘要 人生学习 - 一个智能代理的生命中不断学习能力 - 是生物学学习系统的特征和人工智能(AI)的中心挑战。开发持续学习AI算法可能会导致多种新的AI应用程序,但这也需要开发适当的硬件加速器,特别是如果模型需要在边缘平台上部署,这些平台具有严格的大小、重量和功耗限制。我们研究了部署在无缝环境中的持续学习AI加速器的设计。我们确定了持续学习加速器所需的关键愿景和评价指标,然后讨论当前的边缘AI加速器和未来的持续学习加速器设计,并考虑不同的新技术在这方面的作用。

Contextualized Structural Self-supervised Learning for Ontology Matching

  • paper_url: http://arxiv.org/abs/2310.03840
  • repo_url: https://github.com/ellenzhuwang/lakermap
  • paper_authors: Zhu Wang
  • for: This paper is written for researchers and practitioners in the field of knowledge graph (KG) integration, particularly those interested in ontology matching (OM) and self-supervised learning.
  • methods: The paper proposes a novel self-supervised learning OM framework called LaKERMap, which leverages transformer-based language models and incorporates implicit knowledge to capture multiple structural contexts. The framework utilizes distinct training objectives to improve alignment quality and inference time.
  • results: The paper reports that LaKERMap outperforms state-of-the-art systems in terms of alignment quality and inference time, as demonstrated through experiments on the Bio-ML datasets and tasks. The findings suggest that LaKERMap is a promising approach for KG integration.
    Abstract Ontology matching (OM) entails the identification of semantic relationships between concepts within two or more knowledge graphs (KGs) and serves as a critical step in integrating KGs from various sources. Recent advancements in deep OM models have harnessed the power of transformer-based language models and the advantages of knowledge graph embedding. Nevertheless, these OM models still face persistent challenges, such as a lack of reference alignments, runtime latency, and unexplored different graph structures within an end-to-end framework. In this study, we introduce a novel self-supervised learning OM framework with input ontologies, called LaKERMap. This framework capitalizes on the contextual and structural information of concepts by integrating implicit knowledge into transformers. Specifically, we aim to capture multiple structural contexts, encompassing both local and global interactions, by employing distinct training objectives. To assess our methods, we utilize the Bio-ML datasets and tasks. The findings from our innovative approach reveal that LaKERMap surpasses state-of-the-art systems in terms of alignment quality and inference time. Our models and codes are available here: https://github.com/ellenzhuwang/lakermap.
    摘要 本体匹配(OM)旨在识别两个或多个知识图(KG)中概念之间的语义关系,是整合来自不同来源知识图的关键步骤。近期的深度OM模型利用了基于transformer的语言模型和知识图嵌入的优势。然而,这些模型仍面临诸多挑战,例如缺乏参考对齐、运行时延迟,以及端到端框架内未被探索的不同图结构。在本研究中,我们提出了一种以输入本体为基础的自监督学习OM框架,称为LaKERMap。该框架通过将隐式知识融入transformer,利用概念的上下文和结构信息。具体而言,我们通过采用不同的训练目标来捕获包括局部和全局交互在内的多种结构上下文。我们使用Bio-ML数据集和任务评估我们的方法。结果表明,LaKERMap在对齐质量和推理时间上均优于最先进的系统。我们的模型和代码可在此获取:https://github.com/ellenzhuwang/lakermap。

Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms

  • paper_url: http://arxiv.org/abs/2310.05984
  • repo_url: None
  • paper_authors: Petter Törnberg, Diliara Valeeva, Justus Uitermark, Christopher Bail
  • for: 本研究旨在探讨如何将大型语言模型(LLM)与基于代理的建模相结合来模拟社交媒体平台,并以此评估不同新闻推送算法对在线对话质量的影响。
  • methods: 本研究结合大型语言模型(LLM)与基于代理的建模来模拟社交媒体平台,并使用来自美国全国选举研究的数据构建逼真的人物画像来填充模拟平台。
  • results: 研究发现,"桥接"算法能够促进持不同政见用户之间更具建设性、更少毒性的对话,而其他新闻推送算法则可能导致更多攻击性、非建设性的对话。
    Abstract Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles - and like or comment upon each other's messages - within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users - even those outside their own network. The third platform employs a novel "bridging" algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic, conversation across political divides than the other two models. Though further research is needed to evaluate these findings, we argue that LLMs hold considerable potential to improve simulation research on social media and many other complex social settings.
    摘要 社交媒体常因放大有毒言论、抑制建设性对话而受到批评,但要把社交媒体平台设计得能促进更好的对话本身就很困难。本文探讨将大型语言模型(LLM)与基于代理的建模相结合来模拟社交媒体,是否能帮助研究者研究不同的新闻推送算法如何影响在线对话质量。我们利用美国全国选举研究的数据构建逼真的人物画像来填充模拟的社交媒体平台,随后让这些代理在三个采用不同新闻推送算法的平台上阅读和分享新闻文章,并相互点赞和评论。第一个平台向用户展示其关注对象中点赞和评论最多的帖子;第二个平台展示所有用户(包括其网络之外用户)的帖子;第三个平台采用一种新颖的"桥接"算法,突出展示被持相反政治观点的用户点赞的帖子。我们发现,相比另外两种模型,这种桥接算法能在政治分歧之间促进更具建设性、更少毒性的对话。尽管这些发现仍需进一步研究加以验证,我们认为LLM在社交媒体及许多其他复杂社会情境的模拟研究中具有可观的潜力。
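The bridging feed described in the abstract can be made concrete with a small ranking sketch: posts are scored by how much engagement they receive from users on the opposite side of the political divide. The `Post` fields and party labels below are illustrative assumptions, not the paper's simulation code.

```python
# Sketch of a "bridging" feed ranker: rank posts by likes from the *other* political camp.
from dataclasses import dataclass, field

@dataclass
class Post:
    author_party: str                                    # e.g. "D" or "R" (assumed labels)
    likes_by_party: dict = field(default_factory=dict)   # party -> like count

def bridging_score(post: Post) -> int:
    """Likes coming from users whose party differs from the author's."""
    return sum(n for party, n in post.likes_by_party.items()
               if party != post.author_party)

def bridging_feed(posts: list[Post], k: int = 10) -> list[Post]:
    return sorted(posts, key=bridging_score, reverse=True)[:k]

feed = bridging_feed([Post("D", {"D": 40, "R": 2}), Post("R", {"D": 25, "R": 5})])
print([p.author_party for p in feed])   # the cross-party-liked post ranks first
```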

ECAvg: An Edge-Cloud Collaborative Learning Approach using Averaged Weights

  • paper_url: http://arxiv.org/abs/2310.03823
  • repo_url: None
  • paper_authors: Atah Nuh Mih, Hung Cao, Asfia Kawnine, Monica Wachowicz
  • for: 这个研究旨在提出一个Edge-Cloud协同的架构,让Edge与Cloud设备之间建立合作关系,各自补偿对方的不足。
  • methods: 该方法让Edge设备先在各自的本地数据集上预训练模型,再将模型上传到服务器进行微调。服务器将各预训练模型的权重平均为一个全局模型,并在合并后的数据上进行微调,随后用全局模型的权重更新各Edge设备的本地模型。
  • results: 在CIFAR-10和CIFAR-100分类任务中,该方法提升了带平均权重的服务器模型的性能,且Edge设备模型在更新后表现也更好;但在MNIST分类任务中,权重平均由于负迁移学习导致服务器和Edge模型的性能下降。
    Abstract The use of edge devices together with cloud provides a collaborative relationship between both classes of devices where one complements the shortcomings of the other. Resource-constraint edge devices can benefit from the abundant computing power provided by servers by offloading computationally intensive tasks to the server. Meanwhile, edge devices can leverage their close proximity to the data source to perform less computationally intensive tasks on the data. In this paper, we propose a collaborative edge-cloud paradigm called ECAvg in which edge devices pre-train local models on their respective datasets and transfer the models to the server for fine-tuning. The server averages the pre-trained weights into a global model, which is fine-tuned on the combined data from the various edge devices. The local (edge) models are then updated with the weights of the global (server) model. We implement a CIFAR-10 classification task using MobileNetV2, a CIFAR-100 classification task using ResNet50, and an MNIST classification using a neural network with a single hidden layer. We observed performance improvement in the CIFAR-10 and CIFAR-100 classification tasks using our approach, where performance improved on the server model with averaged weights and the edge models had a better performance after model update. On the MNIST classification, averaging weights resulted in a drop in performance on both the server and edge models due to negative transfer learning. From the experiment results, we conclude that our approach is successful when implemented on deep neural networks such as MobileNetV2 and ResNet50 instead of simple neural networks.
    摘要 将边缘设备与云端结合使用,可在两类设备之间建立互补的协作关系。资源受限的边缘设备可以将计算密集型任务卸载到服务器,从而利用服务器充裕的计算能力;同时,边缘设备可以凭借靠近数据源的优势,对数据执行计算量较小的任务。在本文中,我们提出了一种称为ECAvg的边缘-云协作范式:边缘设备先在各自的数据集上预训练本地模型,再将模型传输到服务器进行微调。服务器将预训练权重平均为一个全局模型,并在来自各边缘设备的合并数据上对其微调,随后用全局(服务器)模型的权重更新本地(边缘)模型。我们分别用MobileNetV2实现CIFAR-10分类、用ResNet50实现CIFAR-100分类,并用单隐层神经网络实现MNIST分类。结果显示,我们的方法在CIFAR-10和CIFAR-100任务上带来性能提升:带平均权重的服务器模型性能提高,边缘模型在更新后表现也更好;而在MNIST分类中,权重平均由于负迁移学习导致服务器和边缘模型性能下降。由实验结果可知,我们的方法在MobileNetV2和ResNet50等深度神经网络上实施时是成功的,而在简单神经网络上则不然。
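The server-side averaging step at the heart of the described workflow is straightforward to sketch in PyTorch: collect the edge devices' state dictionaries, take their element-wise mean, fine-tune the resulting global model, and push the weights back. This is an illustrative sketch of the averaging idea, not the authors' implementation.

```python
# Sketch of ECAvg-style server averaging of edge-trained model weights.
import copy
import torch

def average_state_dicts(edge_state_dicts):
    """Element-wise mean of model state_dicts that share identical keys and shapes."""
    avg = copy.deepcopy(edge_state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in edge_state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg

# global_model.load_state_dict(average_state_dicts([sd_edge1, sd_edge2, sd_edge3]))
# ... fine-tune global_model on the combined server-side data, then update each edge:
# edge_model.load_state_dict(global_model.state_dict())
```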

Accurate Cold-start Bundle Recommendation via Popularity-based Coalescence and Curriculum Heating

  • paper_url: http://arxiv.org/abs/2310.03813
  • repo_url: None
  • paper_authors: Hyunsik Jeon, Jong-eun Lee, Jeongin Yun, U Kang
  • for: 提出了一种准确的冷启动bundle推荐方法,用于解决实际场景中新bundle的创造和推荐问题。
  • methods: 提出了一种基于媒体的CoHeat方法,通过结合历史信息和联合信息来衡量用户-bundle关系,并通过curriculum学习和对比学习来学习秘密表示。
  • results: 与最佳竞争方法相比,CoHeat在冷启动bundle推荐中取得了最高193%的nDCG@20提升,表明其在准确推荐冷启动bundle方面的优越性能。
    Abstract How can we accurately recommend cold-start bundles to users? The cold-start problem in bundle recommendation is critical in practical scenarios since new bundles are continuously created for various marketing purposes. Despite its importance, no previous studies have addressed cold-start bundle recommendation. Moreover, existing methods for cold-start item recommendation overly rely on historical information, even for unpopular bundles, failing to tackle the primary challenge of the highly skewed distribution of bundle interactions. In this work, we propose CoHeat (Popularity-based Coalescence and Curriculum Heating), an accurate approach for the cold-start bundle recommendation. CoHeat tackles the highly skewed distribution of bundle interactions by incorporating both historical and affiliation information based on the bundle's popularity when estimating the user-bundle relationship. Furthermore, CoHeat effectively learns latent representations by exploiting curriculum learning and contrastive learning. CoHeat demonstrates superior performance in cold-start bundle recommendation, achieving up to 193% higher nDCG@20 compared to the best competitor.
    摘要 如何准确地向用户推荐冷启动bundle?冷启动问题在bundle推荐中至关重要,因为实际场景中会为各种营销目的不断创建新的bundle。尽管其重要性,此前的研究尚未涉及冷启动bundle推荐。此外,现有的冷启动物品推荐方法即使对不受欢迎的bundle也过度依赖历史信息,无法应对bundle交互分布高度偏斜这一主要挑战。在这项工作中,我们提出了CoHeat(基于流行度的融合与课程加热),一种准确的冷启动bundle推荐方法。CoHeat在估计用户-bundle关系时,根据bundle的流行度同时利用历史信息和从属信息,从而应对bundle交互的高度偏斜分布。此外,CoHeat借助课程学习和对比学习有效地学习潜在表示。CoHeat在冷启动bundle推荐中表现出色,与最佳竞争方法相比,nDCG@20最高提升193%。
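The popularity-based coalescence idea can be illustrated as a simple blend of two scores whose mixing weight grows with bundle popularity, so that history dominates for popular bundles and affiliation information dominates for cold ones. The weighting function and inputs below are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of popularity-based coalescence of two scoring signals.
import math

def coalesce(history_score: float, affiliation_score: float,
             popularity: float, temperature: float = 0.5) -> float:
    """popularity in [0, 1]; higher popularity -> rely more on interaction history."""
    w = 1.0 - math.exp(-popularity / temperature)   # smooth weight in [0, 1)
    return w * history_score + (1.0 - w) * affiliation_score

print(coalesce(history_score=0.8, affiliation_score=0.3, popularity=0.9))   # history-driven
print(coalesce(history_score=0.8, affiliation_score=0.3, popularity=0.05))  # affiliation-driven
```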

Improved Baselines with Visual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.03744
  • repo_url: None
  • paper_authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
  • for: 这份研究是为了提高大型多模态模型(LMM)的视觉指令调整能力。
  • methods: 这份研究以 LLaVA 模型为基础,进行了一些简单的修改,包括使用 CLIP-ViT-L-336px 搭配 MLP 投影,以及添加面向学术任务的 VQA 数据和简单的响应格式提示。
  • results: 研究显示,通过这些修改可以建立更强的基线,在 11 个标准测试 benchmark 上达到最先进水平。最终的 13B 检查点只使用了 1.2M 公开可用数据,在单个 8-A100 节点上约 1 天即可完成全部训练。
    Abstract Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
    摘要 大型多Modal模型(LMM)最近已经表现出了鼓舞人心的进步,在这份笔记中,我们表明了 LLava 中的全连接视力语言跨模态连接器 surprisingly 强大和数据有效。通过简单地修改 LLava, specifically using CLIP-ViT-L-336px with MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks。我们的最终13B checkpoint只用了1.2M公共可用数据,并在单个8-A100节点上完成了完整的训练,只需要大约1天时间。我们希望这可以让state-of-the-art LMM研究更加 accessible。代码和模型将公开available。
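The "MLP projection" connector mentioned above amounts to a small trainable module mapping CLIP vision features into the LLM's embedding space. The sketch below shows one plausible two-layer version; the dimensions (1024 for CLIP-ViT-L, 5120 for a 13B LLM) are illustrative assumptions rather than the released configuration.

```python
# Sketch of an MLP vision-language projector in the spirit of the described connector.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 5120):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from the vision encoder
        return self.proj(patch_features)        # (batch, num_patches, llm_dim)

tokens = MLPProjector()(torch.randn(2, 576, 1024))
print(tokens.shape)   # torch.Size([2, 576, 5120]) -> fed to the LLM as visual tokens
```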

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

  • paper_url: http://arxiv.org/abs/2310.03739
  • repo_url: https://github.com/mihirp1998/alignprop
  • paper_authors: Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
  • for: 这篇论文的目的是优化文本至图像生成模型,以便在下游任务中控制其行为,例如提高人类所感觉的图像质量、图像文本Alignment、或道德性图像生成。
  • methods: 本篇论文提出了一种名为AlignProp的方法,通过在去噪过程中对奖励梯度进行端到端反向传播,将扩散模型与下游奖励函数对齐,并借助低秩适配器权重模块和梯度检查点使内存开销可行。
  • results: 根据本篇论文的测试结果,AlignProp 在调整扩散模型的不同目标下(例如图像文本对齐、美学、压缩性和物件数量控制)表现出比其他方法更高的奖励,同时更简单易懂,因此可以轻松地优化扩散模型以满足 differentiable reward function 的需求。
    Abstract Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing, to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and Visualization results are available at https://align-prop.github.io/.
    摘要 文本到图像扩散模型最近在图像生成领域取得了前列位,得益于巨大的文本到图像无监督或弱监督训练 dataset。由于它们的无监督训练,控制它们在下游任务中,如提高人类感知的图像质量、图像文本对齐或道德图像生成,是困难的。现有的工作是通过普通的再征学习训练 diffusion models,不可避免的高弹性问题。在这篇论文中,我们提出了 AlignProp,一种方法,用于将 diffusion models 与下游奖励函数相对位。我们使用终端到终端的反推进程来实现这一点,并使用低级adapter weight模块的训练和梯度检查点,使其可以实现可持续的存储和计算。我们在不同的目标上训练 diffusion models,如图像文本 semantic alignment、美学、压缩和对象数量控制,以及其组合。我们发现 AlignProp 在更少的训练步骤中 achieve 更高的奖励,而且概念更简单,因此在 differentiable 奖励函数的 интерес领域中是一个简单的选择。代码和视觉结果可以通过 https://align-prop.github.io/ 访问。
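A heavily simplified sketch of the end-to-end reward backpropagation idea is shown below: unroll a few denoising steps under gradient checkpointing, score the result with a differentiable reward, and backpropagate into trainable (e.g. LoRA) parameters. Here `unet`, `scheduler`, and `reward_fn` are placeholders with diffusers-like interfaces, and this is an illustration of the technique rather than the AlignProp implementation.

```python
# Simplified sketch of reward backpropagation through the denoising chain.
import torch
from torch.utils.checkpoint import checkpoint

def alignprop_step(unet, scheduler, reward_fn, prompt_emb, optimizer, steps=50):
    x = torch.randn(1, 4, 64, 64, device=prompt_emb.device)   # initial latent
    for t in scheduler.timesteps[:steps]:
        # placeholder unet(latent, t, cond) returns the predicted noise;
        # checkpointing keeps memory manageable across the unrolled chain
        eps = checkpoint(lambda latent: unet(latent, t, prompt_emb), x, use_reentrant=False)
        x = scheduler.step(eps, t, x).prev_sample
    loss = -reward_fn(x, prompt_emb)   # maximize the downstream differentiable reward
    optimizer.zero_grad()
    loss.backward()                    # gradient flows end-to-end into the adapter weights
    optimizer.step()
    return loss.item()
```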

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

  • paper_url: http://arxiv.org/abs/2310.03731
  • repo_url: https://github.com/mathllm/mathcoder
  • paper_authors: Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li
  • for: 这篇论文的目的是提高开源语言模型的数学逻辑能力。
  • methods: 该论文提出了一种方法,用于微调开源语言模型,使其可以使用代码来建模和 derivation 数学公式。
  • results: 该论文的实验结果表明,使用该方法可以创建一些高质量的数学问题和其解决方案的代码 dataset,并且可以在MATH和GSM8K数据集上达到状态 искусственный智能模型的最高分(45.2%和83.9%)。
    Abstract The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
    摘要 最近发布的 GPT-4 Code Interpreter 已经表现出解决复杂数学问题的优秀能力,主要归功于其能够在自然语言与代码之间无缝衔接:生成代码、执行代码,再根据执行结果继续推理。在这篇论文中,我们提出了一种微调开源语言模型的方法,使其能够使用代码来建模和推导数学方程,从而提高其数学推理能力。我们提出了一种生成新颖且高质量的数学问题及其基于代码的解答的数据集构建方法,称为 MathCodeInstruct,其中每个解答都交替包含自然语言、代码和执行结果。我们还介绍了一种定制的监督微调和推理方法。由此得到的 MathCoder 模型是一系列能够通过代码来解决复杂数学问题的模型。值得注意的是,MathCoder 模型在开源 LLM 中取得了最先进的成绩,在 MATH 和 GSM8K 数据集上分别达到 45.2% 和 83.9%,大幅超过其他开源方案。尤其是,MathCoder 不仅在 GSM8K 和 MATH 上超过 ChatGPT-3.5 和 PaLM-2,还在竞赛级别的 MATH 数据集上超过 GPT-4。数据集和模型将发布于:https://github.com/mathllm/MathCoder。
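The interleaving of natural language, code, and execution output described in the abstract can be sketched as a small loop: the model emits a reasoning step with a code block, the code is executed, and the output is appended before the model continues. `generate` is a placeholder for any LLM call, and this is not the MathCoder training or inference pipeline.

```python
# Sketch of natural-language / code / execution interleaving for math reasoning.
import io, contextlib

def run_python(code: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})                       # toy sandbox; never use on untrusted code
    return buf.getvalue().strip()

def solve(problem: str, generate, max_rounds: int = 3) -> str:
    transcript = f"Problem: {problem}\n"
    for _ in range(max_rounds):
        step = generate(transcript)          # model emits reasoning plus a ```python block
        transcript += step
        if "```python" in step:
            code = step.split("```python")[1].split("```")[0]
            transcript += f"\nExecution output: {run_python(code)}\n"
        if "Final answer:" in step:
            break
    return transcript
```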

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.03718
  • repo_url: None
  • paper_authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
  • for: 本研究旨在训练具有安全限制的奖励学习(RL) Agent,以满足不同安全限制要求。
  • methods: 我们提出了 Conditioned Constrained Policy Optimization(CCPO)框架,包括两个关键模块:(1) Versatile Value Estimation(VVE)用于在未经见过的阈值条件下估算价值函数,以及(2) Conditioned Variational Inference(CVI)用于在策略优化过程中编码特定的安全限制条件。
  • results: 我们的实验结果表明,CCPO 可以在安全性和任务性能之间取得平衡,同时维护零批量适应性,使其适用于真实世界中的动态应用场景。
    Abstract Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.
    摘要 安全强化学习(RL)专注于训练根据预先定义的安全限制的奖励最大化代理人。然而,在部署期间无需重新训练而适应不同安全限制要求的灵活安全策略仍是一个未探讨的和挑战性的领域。在这项工作中,我们定义了多样化安全RL问题,并考虑了两个主要要求:训练效率和零扩展能力。为解决这些问题,我们提出了条件constrained Policy优化框架(CCPO),该框架包括以下两个关键模块:1. 多样化价值估计(VVE):用于在未看到的阈值条件下估计价值函数。2. 条件variational推理(CVI):用于在政策优化过程中编码任意的阈值条件。我们的广泛的实验表明,CCPO在安全性和任务性能方面表现出色,同时保持零扩展能力,以便在不同的阈值条件下进行数据效率地部署。这使得我们的方法适用于真实的动态应用程序。

Artificial Intelligence Index Report 2023

  • paper_url: http://arxiv.org/abs/2310.03715
  • repo_url: None
  • paper_authors: Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, Raymond Perrault
  • for: 这份报告是为了提供不偏袋化、严格验证的AI相关数据,以便政策制定者、研究人员、高管、新闻工作者和一般公众更好地理解人工智能领域的复杂问题。
  • methods: 这份报告使用了多种方法,包括新的AI公众意见章节、更详细的技术性表现章节、大语言和多媒体模型的原始分析、全球AI法规记录的详细趋势、AI系统环境影响的研究和更多的数据来跟踪、汇总、筛选和可视化AI相关数据。
  • results: 这份报告提供了更多原创数据,包括AI公众意见、技术性表现、大语言和多媒体模型的分析、全球AI法规记录的趋势和AI系统环境影响的研究结果。
    Abstract Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.
    摘要 欢迎来到第六版AI指数报告。本年度报告包含比以往任何一版都多的原创数据,包括全新的AI公众意见章节、更详尽的技术性能章节、关于大语言模型和多模态模型的原创分析、全球AI立法记录的详细趋势、AI系统环境影响的研究等。AI指数报告跟踪、汇总、提炼并可视化与人工智能相关的数据,旨在为政策制定者、研究人员、高管、记者和普通公众提供不偏不倚、经过严格审核、来源广泛的数据,以便他们更全面、细致地理解人工智能这一复杂领域。本报告的目标是成为全球最可靠、最权威的AI数据和洞见来源。

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

  • paper_url: http://arxiv.org/abs/2310.03714
  • repo_url: https://github.com/stanfordnlp/dspy
  • paper_authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
  • for: This paper is written for developing and optimizing language model (LM) pipelines using a programming model called DSPy.
  • methods: The paper uses a programming model called DSPy to abstract LM pipelines as text transformation graphs, and introduces a compiler that optimizes any DSPy pipeline to maximize a given metric.
  • results: The paper shows that succinct DSPy programs can express and optimize sophisticated LM pipelines that outperform standard few-shot prompting and pipelines with expert-created demonstrations, and that DSPy is competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5.
    Abstract The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
    摘要 机器学习社区正在快速探索提示语言模型(LM)以及将其堆叠成解决复杂任务的流水线的技术。然而,现有的LM流水线通常依赖硬编码的"提示模板",即通过反复试错得到的冗长字符串。为了以更系统化的方式开发和优化LM流水线,我们提出了DSPy:一种将LM流水线抽象为文本变换图(即通过声明式模块调用LM的命令式计算图)的编程模型。DSPy模块是参数化的,能够通过创建和收集示例来学习如何应用提示、微调、增强和推理等技术的组合。我们设计了一个编译器,可针对给定指标优化任意DSPy流水线。我们进行了两个案例研究,结果表明简洁的DSPy程序能够表达并优化复杂的LM流水线,完成数学应用题推理、多跳检索、复杂问答和智能体循环控制等任务。在编译后的几分钟内,仅需几行DSPy代码即可让GPT-3.5和llama2-13b-chat自举出性能超过标准少样本提示(分别通常高出25%和65%以上)以及带有专家示例的流水线(分别高出5-46%和16-40%)的流水线。此外,编译到770M参数的T5和llama2-13b-chat等开源且相对较小的LM上的DSPy程序,与依赖专家编写提示链的专有GPT-3.5方法相比也具有竞争力。DSPy可在 https://github.com/stanfordnlp/dspy 获取。
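To give a flavour of the programming model, the sketch below declares a module with a signature, then compiles it with a bootstrapping optimizer against a metric, following the style of the public DSPy repository. Exact class names and settings may differ between DSPy versions, so treat this as a hedged sketch rather than canonical usage.

```python
# Sketch of a DSPy program and compilation, modelled on the public repo's examples.
import dspy
from dspy.teleprompt import BootstrapFewShot

lm = dspy.OpenAI(model="gpt-3.5-turbo")          # any supported LM client
dspy.settings.configure(lm=lm)

class SolveMathWord(dspy.Module):
    def __init__(self):
        super().__init__()
        # declarative signature: input field -> output field
        self.solve = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.solve(question=question)

def exact_match(example, pred, trace=None):
    return example.answer.strip() == pred.answer.strip()

trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

# The optimizer bootstraps demonstrations that maximize the metric, then bakes them in.
compiled = BootstrapFewShot(metric=exact_match).compile(SolveMathWord(), trainset=trainset)
print(compiled(question="What is 7 * 8 + 5?").answer)
```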

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

  • paper_url: http://arxiv.org/abs/2310.03710
  • repo_url: https://github.com/wang-research-lab/agentinstruct
  • paper_authors: Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang
  • for: 提高大语言模型在通用语言理解任务上的零shot理解能力
  • methods: 建立一个自主智能体来指导大语言模型的理解过程
  • results: 我们的方法在多个数据集上表现出色,在29个数据集中的20个上取得了最先进的零样本性能,大幅提升了当前最先进大语言模型的表现,例如Vicuna-13b(13.3%)、Llama-2-70b-chat(23.2%)和GPT-3.5 Turbo(17.0%);与零样本思维链相比,推理性能平均提升10.5%。
    Abstract We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.
    摘要 我们提出一种方法,以提高大语言模型在通用语言理解任务上的零基础理解能力。具体来说,我们构建了一个自动化代理,以控制大语言模型的理解过程。我们发现这种方法可以进一步释放大语言模型的零基础理解能力,以更多任务。我们对一系列数据集进行了测试,包括生成、分类和理解等任务。我们发现这种方法在大多数任务上具有普适性,并在20个数据集中实现了零基础性状态之势。例如,我们的方法可以大幅提高现有的状态级别大语言模型的性能,包括Vicuna-13b(13.3%)、Llama-2-70b-chat(23.2%)和GPT-3.5 Turbo(17.0%)。相比零基础思维,我们的改进在理解方面是悬殊的,平均提高10.5%。凭借我们的方法,Llama-2-70b-chat可以在零基础情况下超越GPT-3.5 Turbo的性能,提高10.2%。

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models

  • paper_url: http://arxiv.org/abs/2310.03708
  • repo_url: None
  • paper_authors: Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao
  • for: 这 paper 的目的是为了开发一种不需要人工学习的多目标RLHF算法,以提高语言模型的个性化适应性。
  • methods: 这 paper 使用的方法是基于直接喜好函数优化(DPO)的多目标RLHF算法,通过约束搜索和约束优化来学习多个目标对齐对象。
  • results: 实验结果表明,使用 MODPO 可以与现有方法匹配或超越其性能,并且可以在3倍的计算量下完成多目标RLHF。
    Abstract A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches thus pursue customization, training separate principle-based reward models to represent different alignment objectives (e.g. helpfulness, harmlessness, or honesty). Different LMs can then be trained for different preferences through multi-objective RLHF (MORLHF) with different objective weightings. Yet, RLHF is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO folds LM learning directly into reward modeling, aligning LMs with the weighted sum of all principle-based rewards using pure cross-entropy loss. While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient, obviating value function modeling and online sample collection. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, consistently producing one of the most competitive LM fronts that cater to diverse preferences with 3 times fewer computations compared with MORLHF.
    摘要 一个语言模型(LM),即使与平均标注员的匹配得很好,也可能不适应人类的多样化偏好。现有的方法因此尝试个性化,通过训练不同的原则基于奖励模型来表达不同的匹配目标(例如帮助fulness、无害性和诚实)。然后可以通过多目标RLHF(MORLHF)来训练不同的LM。然而,RLHF是不稳定的,特别是在多目标情况下。在这篇论文中,我们提出了多目标直接偏好优化(MODPO)算法,它是RL无法算法,扩展了直接偏好优化(DPO)来处理多个原则基于奖励的目标。MODPO通过将LM学习直接嵌入奖励模型中,将LM与所有原则基于奖励的权重加权和平均值相对应。虽然从理论角度来看,MODPO和MORLHF都可以生成同样的优化解,但MODPO在实践中更稳定和计算效率更高,不需要值函数模型和在线样本采集。实验结果表明,MODPO在安全匹配和长文问答中与现有方法匹配或超越,可靠地生成适应多种偏好的LM前端,使用3倍少的计算量比MORLHF。
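The core idea of folding a weighted sum of principle-based rewards into a preference-learning loss can be sketched with a DPO-style logistic objective whose margin combines the policy's implicit reward with weighted margins from auxiliary reward models. This is an illustration of the weighted-sum scalarization idea under simplifying assumptions, not the exact MODPO objective from the paper.

```python
# Simplified sketch of a multi-objective, DPO-style preference loss.
import torch
import torch.nn.functional as F

def weighted_preference_loss(policy_logratio_w, policy_logratio_l,   # log pi/pi_ref, chosen/rejected
                             aux_rewards_w, aux_rewards_l,           # per-objective rewards per response
                             weights, beta=0.1):
    weights = torch.as_tensor(weights, dtype=torch.float32)
    w_main, w_aux = weights[0], weights[1:]
    margin = beta * (policy_logratio_w - policy_logratio_l)          # implicit LM reward margin
    aux_margin = (w_aux * (aux_rewards_w - aux_rewards_l)).sum()     # other principle-based objectives
    return -F.logsigmoid(w_main * margin + aux_margin)               # cross-entropy on the preference

loss = weighted_preference_loss(torch.tensor(0.7), torch.tensor(-0.2),
                                torch.tensor([0.9, 0.1]), torch.tensor([0.4, 0.3]),
                                weights=[0.6, 0.3, 0.1])
print(loss)
```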

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

  • paper_url: http://arxiv.org/abs/2310.03693
  • repo_url: https://github.com/llm-tuning-safety/llms-finetuning-safety
  • paper_authors: Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
  • for: 这篇论文探讨了在自定义微调对齐语言模型(LLM)时所付出的安全代价。研究发现,即使模型最初通过安全性检查,也不能保证其在微调后仍然安全。
  • methods: 研究人员通过OpenAI的API对GPT-3.5 Turbo进行微调,并利用自定义训练集攻击模型的安全对齐。
  • results: 研究发现,仅需10个对抗性设计的训练样本、花费不到0.20美元,即可通过OpenAI的API微调破坏GPT-3.5 Turbo的安全防护,使模型响应几乎任何有害指令;此外,即使没有恶意,仅用常见的良性数据集微调也会在一定程度上削弱模型的安全对齐。这些发现表明,自定义微调对齐LLM会带来现有安全基础设施尚无法应对的新安全隐患。
    Abstract Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But, what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily to be maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.
    摘要 优化大型自然语言模型(LLM)为下游应用场景通常通过进一步精度调整来进行自定义。梅塔公司开源了LLAMA模型,而OpenAI提供了用于在自定义数据集上精度调整GPT-3.5 Turbo的API,这些做法也鼓励了这种实践。但是,在这种自定义调整中存在哪些安全成本呢?我们发现,虽然现有的安全对齐基础设施可以在推理时约束LLM的危险行为,但是它们不能覆盖在调整时的安全风险。我们的红团研究发现,通过只有少量反制设计的训练例来破坏LLM的安全对齐可以在 less than $0.20 的成本下使GPT-3.5 Turbo进行破坏。另外,我们发现,只要是通过常用的数据集进行调整,即使没有恶意,也可以不知不觉地削弱LLM的安全对齐。这些发现表明,在自定义调整aligned LLMs时,存在新的安全风险,现有的安全基础设施无法妥善处理这些风险。我们提出和分析了可能的缓解措施,并且强调进一步的研究努力应对自定义调整 aligned LLMs 的安全问题。

Probabilistic Generative Modeling for Procedural Roundabout Generation for Developing Countries

  • paper_url: http://arxiv.org/abs/2310.03687
  • repo_url: None
  • paper_authors: Zarif Ikram, Ling Pan, Dianbo Liu
  • for: 设计优化交通路网,以优化交通运输和 validate 效果,为发展中国家提供成本效果的方案。
  • methods: 使用 Generative Flow Networks (GFlowNets) 学习权值分布,生成高质量的解决方案,保留多样性。
  • results: 与相关方法进行比较,实验结果表明,我们的方法可以保持高效性,同时具有更高的多样性。
    Abstract Due to limited resources and fast economic growth, designing optimal transportation road networks with traffic simulation and validation in a cost-effective manner is vital for developing countries, where extensive manual testing is expensive and often infeasible. Current rule-based road design generators lack diversity, a key feature for design robustness. Generative Flow Networks (GFlowNets) learn stochastic policies to sample from an unnormalized reward distribution, thus generating high-quality solutions while preserving their diversity. In this work, we formulate the problem of linking incident roads to the circular junction of a roundabout by a Markov decision process, and we leverage GFlowNets as the Junction-Art road generator. We compare our method with related methods and our empirical results show that our method achieves better diversity while preserving a high validity score.
    摘要 (Simplified Chinese translation)由于有限的资源和快速的经济增长,为发展中国家设计优化的交通运输路网,并在成本效益的情况下进行交通模拟和验证,是非常重要的。现有的规则基于的路线设计生成器缺乏多样性,这是设计Robustness的关键特征。生成流网络(GFlowNets)学习了随机政策,从未正规化的奖励分布中采样,因此可以生成高质量的解决方案,同时保持多样性。在这项工作中,我们将环境穿梭的问题形式化为Markov决策过程,并利用GFlowNets作为环境艺术路径生成器。我们与相关方法进行比较,我们的实验结果表明,我们的方法可以保持高有效性分数,同时提高多样性。

Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation

  • paper_url: http://arxiv.org/abs/2310.03780
  • repo_url: None
  • paper_authors: Tung Phung, Victor-Alexandru Pădurean, Anjali Singh, Christopher Brooks, José Cambronero, Sumit Gulwani, Adish Singla, Gustavo Soares
  • for: 提高编程教育质量 by 自动生成个性化反馈
  • methods: 使用生成AI模型提供人工导师式编程提示,使学生解决buggy程序错误
  • results: 通过使用GPT-4和GPT-3.5两个模型,提高生成质量,并通过自动评估提示质量,证明效果可行
    Abstract Generative AI and large language models hold great promise in enhancing programming education by automatically generating individualized feedback for students. We investigate the role of generative AI models in providing human tutor-style programming hints to help students resolve errors in their buggy programs. Recent works have benchmarked state-of-the-art models for various feedback generation scenarios; however, their overall quality is still inferior to human tutors and not yet ready for real-world deployment. In this paper, we seek to push the limits of generative AI models toward providing high-quality programming hints and develop a novel technique, GPT4Hints-GPT3.5Val. As a first step, our technique leverages GPT-4 as a ``tutor'' model to generate hints -- it boosts the generative quality by using symbolic information of failing test cases and fixes in prompts. As a next step, our technique leverages GPT-3.5, a weaker model, as a ``student'' model to further validate the hint quality -- it performs an automatic quality validation by simulating the potential utility of providing this feedback. We show the efficacy of our technique via extensive evaluation using three real-world datasets of Python programs covering a variety of concepts ranging from basic algorithms to regular expressions and data analysis using pandas library.
    摘要 生成式AI和大型语言模型在通过自动为学生生成个性化反馈来提升编程教育方面具有巨大潜力。我们研究了生成式AI模型在提供人类导师式编程提示、帮助学生修复有缺陷程序中错误方面的作用。近期工作已针对多种反馈生成场景对最先进模型进行了基准测试,但其整体质量仍逊于人类导师,尚不适合实际部署。在本文中,我们力图推动生成式AI模型生成高质量编程提示的极限,并提出了一种新技术GPT4Hints-GPT3.5Val。第一步,该技术利用GPT-4作为"导师"模型生成提示,并通过在提示词中加入失败测试用例和修复的符号信息来提升生成质量。第二步,该技术利用较弱的GPT-3.5作为"学生"模型进一步验证提示质量,即通过模拟提供该反馈的潜在效用来进行自动质量验证。我们在三个覆盖从基础算法到正则表达式及使用pandas库进行数据分析等多种概念的真实Python程序数据集上进行了广泛评估,证明了该技术的有效性。
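The tutor/student split described in the abstract can be sketched as two stages: a stronger model drafts a hint from the buggy program plus failing-test information, and a weaker "student" model checks whether the hint actually lets it repair the program. `call_llm` and `run_tests` are hypothetical helpers, and the prompts are illustrative, not the paper's exact pipeline.

```python
# Sketch of tutor-generated hints validated by a weaker student model.
def generate_validated_hint(buggy_code, failing_tests, call_llm, run_tests, n_checks=3):
    hint = call_llm(
        model="tutor",    # e.g. a GPT-4-class model
        prompt=f"Buggy program:\n{buggy_code}\nFailing tests:\n{failing_tests}\n"
               "Give one human tutor-style hint (do not give the full solution).")
    # Validation: does the hint let the weaker model fix the program?
    successes = 0
    for _ in range(n_checks):
        repaired = call_llm(
            model="student",   # e.g. a GPT-3.5-class model
            prompt=f"{buggy_code}\nHint: {hint}\nReturn the corrected program only.")
        successes += bool(run_tests(repaired))
    return hint if successes >= (n_checks + 1) // 2 else None   # keep hint only if it helps
```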

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

  • paper_url: http://arxiv.org/abs/2310.03684
  • repo_url: None
  • paper_authors: Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
  • for: 提高大型自然语言模型(LLM)的安全性,防止攻击者利用LLM生成不良内容。
  • methods: 提出了首个针对LLM的攻击mitigation算法SmoothLLM,通过多个复制输入提示,并将其相应的预测结果集成以检测攻击输入。
  • results: SmoothLLM可以在许多流行的LLM上降低攻击成功率至0.1%以下,避免过度保守,并具有可证明的攻击防御保证。
    Abstract Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
    摘要 尽管努力对大型语言模型(LLM)与人类价值观念进行对应,广泛使用的 LLM 如 GPT、Llama、Claude 和 PalM 仍然容易受到犯罪攻击,其中敌对者会让目标 LLM 生成不良内容。为解决这个漏洞,我们提出了 SmoothLLM,首个针对 LLM 进行犯罪攻击防御的算法。根据我们发现,恶意生成的提示语是字符级别上不稳定的,我们的防御首先随机干扰多个输入提示的字符,然后将相应的预测结果聚合以检测恶意输入。SmoothLLM 可以在许多流行的 LLM 上降低犯罪成功率至少一个百分点,避免不必要的保守性,并且具有可证明的攻击防御保证。此外,我们的防御需要比现有攻击更少的查询数量,并且可以与任何 LLM 兼容。
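The defense lends itself to a compact sketch: perturb several copies of the prompt at the character level, query the model on each, and aggregate the responses by majority vote on whether the model refused. `generate` and `is_refusal` are placeholder callables, not the authors' released implementation.

```python
# Sketch of a perturb-and-aggregate (SmoothLLM-style) defense.
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    chars = list(prompt)
    for i in random.sample(range(len(chars)), k=max(1, int(q * len(chars)))):
        chars[i] = random.choice(string.printable)       # random character-level noise
    return "".join(chars)

def smoothed_generate(prompt: str, generate, is_refusal, n_copies: int = 6, q: float = 0.1):
    responses = [generate(perturb(prompt, q)) for _ in range(n_copies)]
    refusals = [is_refusal(r) for r in responses]
    if sum(refusals) > n_copies / 2:                      # majority of copies look adversarial
        return "Request declined."
    return next(r for r, ref in zip(responses, refusals) if not ref)
```

The intuition, per the abstract, is that adversarial suffixes are brittle to character-level changes, so perturbed copies of a jailbreak prompt tend to trigger refusals even when the original does not.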

MapperGPT: Large Language Models for Linking and Mapping Entities

  • paper_url: http://arxiv.org/abs/2310.03666
  • repo_url: None
  • paper_authors: Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T. Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A Haendel, Christopher J. Mungall
  • for: 提高数据 интеграция中的Entity mapping精度,使其更能准确地将不同资源中的实体映射到相应的概念上。
  • methods: 使用Large Language Models(LLMs)进行Entity mapping审核和修正,以提高 mapping 精度。
  • results: 在不同领域的Alignment任务中,MapperGPT可以与高准确率方法相结合,提供substantial改进的准确率,比如LogMap等State-of-the-art方法。
    Abstract Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Many tools have been developed to compute such mappings based on common structural features and lexical information such as labels and synonyms. Lexical approaches in particular often provide very high recall, but low precision, due to lexical ambiguity. As a consequence of this, mapping efforts often resort to a labor intensive manual mapping refinement through a human curator. Large Language Models (LLMs), such as the ones employed by ChatGPT, have generalizable abilities to perform a wide range of tasks, including question-answering and information extraction. Here we present MapperGPT, an approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. We evaluated MapperGPT on a series of alignment tasks from different domains, including anatomy, developmental biology, and renal diseases. We devised a collection of tasks that are designed to be particularly challenging for lexical methods. We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy, beating state-of-the-art (SOTA) methods such as LogMap.
    摘要 合理资源对Alignment是各个领域的数据 интеграción中的关键环节,如医疗、化学和生物研究等。实体映射是确定这些资源中的实体之间对应关系的过程,例如基因标识符、疾病概念或化学实体标识符。许多工具已经开发出来计算这些对应关系,基于共同结构特征和 lexical信息,如标签和同义词。lexical方法通常提供很高的回快,但准确率很低,因为lexical是多义的。因此,映射努力通常需要劳动密集的手动映射纠正。大语言模型(LLMs),如ChatGPT中所使用的模型,具有通用的能力来完成广泛的任务,包括问答和信息提取。我们提出了MapperGPT,一种使用LLMs来复制和纠正映射关系的方法,并与现有的高准确率方法相结合。我们在不同领域的一系列对alignment任务中评估了MapperGPT。我们设计了一组特别适合lexical方法的任务,并显示了MapperGPT可以提供substantial提升的准确率,超过了State-of-the-art(SOTA)方法,如LogMap。
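A minimal sketch of the post-processing step the abstract describes is to let an LLM review each candidate mapping produced by a high-recall lexical matcher and keep only the confirmed pairs. The prompt format and `call_llm` helper are illustrative assumptions, not the MapperGPT implementation.

```python
# Sketch of LLM-based review of candidate ontology mappings.
def review_mapping(src_term, tgt_term, src_context, tgt_context, call_llm) -> bool:
    prompt = (
        "Decide whether these two ontology terms refer to the same concept.\n"
        f"Term A: {src_term}\nDefinition/synonyms: {src_context}\n"
        f"Term B: {tgt_term}\nDefinition/synonyms: {tgt_context}\n"
        "Answer strictly with YES or NO.")
    return call_llm(prompt).strip().upper().startswith("YES")

def refine(candidate_mappings, ontology_a, ontology_b, call_llm):
    """Keep only the high-recall lexical candidates that the LLM reviewer confirms."""
    return [(a, b) for a, b in candidate_mappings
            if review_mapping(a, b, ontology_a[a], ontology_b[b], call_llm)]
```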

Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures

  • paper_url: http://arxiv.org/abs/2310.03659
  • repo_url: None
  • paper_authors: Thorsten Händler
  • for: 这篇论文是为了探讨自主的语言模型(LLM)动态抽象和协调框架,以便在复杂的多个任务中实现更好的AI功能。
  • methods: 这篇论文使用了多维度分类法来分析自主LLM多代理系统中的自动化和对齐平衡问题,并提供了一个领域ontology模型来 specify 基本的架构概念。
  • results: 这篇论文通过对一些代表性的LLM多代理系统的exploratory分类,ILLUSTRATE了其实际应用的实用性,并揭示了未来的研究和开发的潜在前景。
    Abstract Large language models (LLMs) have revolutionized the field of artificial intelligence, endowing it with sophisticated language understanding and generation capabilities. However, when faced with more complex and interconnected tasks that demand a profound and iterative thought process, LLMs reveal their inherent limitations. Autonomous LLM-powered multi-agent systems represent a strategic response to these challenges. Such systems strive for autonomously tackling user-prompted goals by decomposing them into manageable tasks and orchestrating their execution and result synthesis through a collective of specialized intelligent agents. Equipped with LLM-powered reasoning capabilities, these agents harness the cognitive synergy of collaborating with their peers, enhanced by leveraging contextual resources such as tools and datasets. While these architectures hold promising potential in amplifying AI capabilities, striking the right balance between different levels of autonomy and alignment remains the crucial challenge for their effective operation. This paper proposes a comprehensive multi-dimensional taxonomy, engineered to analyze how autonomous LLM-powered multi-agent systems balance the dynamic interplay between autonomy and alignment across various aspects inherent to architectural viewpoints such as goal-driven task management, agent composition, multi-agent collaboration, and context interaction. It also includes a domain-ontology model specifying fundamental architectural concepts. Our taxonomy aims to empower researchers, engineers, and AI practitioners to systematically analyze the architectural dynamics and balancing strategies employed by these increasingly prevalent AI systems. The exploratory taxonomic classification of selected representative LLM-powered multi-agent systems illustrates its practical utility and reveals potential for future research and development.
    摘要 大型语言模型(LLM)已经革命化人工智能领域,具备了复杂语言理解和生成能力。然而,当面临更复杂和相互连接的任务时,LLM具有内在的限制。自主 LLM 驱动多代理系统是一种策略性应对这些挑战的回应。这些系统通过自动将用户提交的目标 decomposing 成可管理的任务,并通过一群特殊智能代理来进行执行和结果合成。这些代理通过 LLM 强化的理解能力,可以协同合作,通过利用上下文资源 such as 工具和数据集来提高合作效果。虽然这些体系具有潜在的扩展可能性,但是保持不同水平的自主和对齐是关键的挑战。这篇论文提出了一种多维度分类,用于分析自主 LLM 驱动多代理系统如何在不同的体系视角下平衡动态的自主和对齐。它还包括一个领域 ontology 模型,描述了基本体系概念。我们的分类旨在为研究者、工程师和 AI 实践者提供系统性分析自主 LLM 驱动多代理系统的建议和指导。选择代表 LLM 驱动多代理系统的exploratory分类表明了我们的分类的实用性,并揭示了未来研究和发展的潜在 potential。

HandMeThat: Human-Robot Communication in Physical and Social Environments

  • paper_url: http://arxiv.org/abs/2310.03779
  • repo_url: None
  • paper_authors: Yanming Wan, Jiayuan Mao, Joshua B. Tenenbaum
  • for: 本研究是为了评估机器人理解和执行人类指令的全面评估标准。
  • methods: 本研究使用了人类行为轨迹、物理环境和社会各种各样的信息来评估机器人理解和执行人类指令的能力。
  • results: 研究发现现有的语言固定和规划方法在HandMeThat上表现不佳, suggesting significant room for future work on physical and social human-robot communications and interactions。
    Abstract We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions towards her internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat, and show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat, suggesting significant room for future work on physical and social human-robot communications and interactions.
    摘要 我们介绍HandMeThat,一个标准套件用于评估人工智能机器人理解和遵循语言指令的全面评估。在过去的数据集中,大多集中在语言落实和规划上,而HandMeThat则考虑了人类指令的解释时的物理(物体状态和关系)和社交(人类行为和目标)信息。HandMeThat包含10,000集的人机互动纪录。在每个集中,机器人首先观察人类行为的轨迹,然后接收人类指令,并通过指令中的子目标来完成。在这篇文章中,我们提供了文本界面 для我们的标准套件,机器人通过文本命令与虚拟环境互动。我们评估了多个基eline模型在HandMeThat上,并发现了 both offline和线上循环学习算法在HandMeThat上表现不佳,这表明了未来人工智能机器人与人类沟通和互动的Physical和社交方面还有很大的潜力。

CLEVRER-Humans: Describing Physical and Causal Events the Human Way

  • paper_url: http://arxiv.org/abs/2310.03635
  • repo_url: None
  • paper_authors: Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu
  • for: 本研究旨在建立机器可以理解物理事件和其 causal 关系,以便与物理世界进行灵活交互。
  • methods: 研究使用了两种技术来提高数据收集效率:一种是使用新的迭代事件cloze任务来生成视频中事件的新表示,称为 causal event graphs (CEGs);另一种是基于神经语言生成模型的数据增强技术。
  • results: 研究提出了一个名为 CLEVRER-Humans 的视频理解数据集,用于评估物理事件的 causal 判断。研究还展示了一些基准方法的表现, highlighting 该 benchmark 对机器学习领域的挑战。
    Abstract Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    摘要 建立机器可以理解物理事件和其 causal 关系是在互动性的世界中灵活交互的关键。然而,大多数现有的物理和 causal 逻辑标准是基于生成的事件和人工生成的自然语言描述 causal 关系。这种设计存在两个问题:首先,数据的多样性不够,其次, causal 关系基于人工定义的规则与人类判断不同。为了解决这两个缺陷,我们提出了 CLEVRER-Humans benchmark,一个基于视频逻辑的 causal 判断数据集,它们由人类标签。我们采用了两种技术来提高数据收集效率:首先,一种新的迭代事件cloze任务,用于生成视频中事件的新表示,我们称之为 causal event graph (CEG);其次,基于神经语言生成模型的数据增强技术。我们将收集的 CEG 转换成问题和答案,与先前的工作一致。最后,我们研究了 CLEVRER-Humans 问题回答的一些基准方法,并 highlighted 这些标准的巨大挑战。

PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

  • paper_url: http://arxiv.org/abs/2310.03620
  • repo_url: https://github.com/purduedualitylab/peatmoss-demos
  • paper_authors: Wenxin Jiang, Jason Jones, Jerin Yasmin, Nicholas Synovic, Rajeev Sashti, Sophie Chen, George K. Thiruvathukal, Yuan Tian, James C. Davis
  • for: 这个论文的目的是为了研究基于预训练深度学习模型(PTM)的软件工程做出一个大规模的数据集,以便更好地理解PTM在软件工程中的应用和挑战。
  • methods: 这篇论文使用了一个名为PeaTMOSS的数据集,该数据集包含281,638个预训练深度学习模型和27,270个开源软件项目,以及这些项目中PTM的使用情况。论文还提出了一个挑战,即通过分析PTM在软件工程中的使用情况,探索PTM在软件工程中的应用和挑战。
  • results: 论文提出了一个名为PeaTMOSS的数据集,该数据集包含大量的PTM和相关的软件项目信息,可供研究PTM在软件工程中的应用和挑战。此外,论文还提出了一个挑战,以便研究PTM在软件工程中的应用和挑战。
    Abstract Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the wide-spread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.
    摘要 开发和训练深度学习模型是昂贵的,因此软件工程师们开始 reuse pre-trained deep learning models (PTMs) 并对其进行精度调整用于下游任务。尽管PTMs 的使用广泛,但我们对相关的软件工程行为和挑战知之甚少。为了启用PTMs 的研究,我们提供了 PeaTMOSS 数据集:开源软件中的 Pre-Trained Models。PeaTMOSS 包括三部分:(1) 281,638 个 PTMs,(2) 27,270 个开源软件仓库使用 PTMs,以及 (3) PTMs 和这些项目之间的映射。我们挑战 PeaTMOSS 挖掘者发现PTMs 在软件工程中的实践。 demo 和数据集的链接可以在:https://github.com/PurdueDualityLab/PeaTMOSS-Demos 中找到。

Solving a Class of Non-Convex Minimax Optimization in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.03613
  • repo_url: None
  • paper_authors: Xidong Wu, Jianhui Sun, Zhengmian Hu, Aidong Zhang, Heng Huang
  • for: addressing large-scale data challenges in machine learning applications with communication-efficient distributed training
  • methods: Federated Learning (FL) algorithms (FedSGDA+ and FedSGDA-M) and existing centralized optimization algorithms
  • results: reduced communication complexity and improved sample complexity for nonconvex-concave and nonconvex-strongly-concave minimax problems, with the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$
    Abstract The minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (\emph{i.e.} single-machine). Nonetheless, the algorithm for minimax problems under FL is still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
    摘要 “小最大最小值问题在机器学习应用中随处出现,从对抗训练和奖励学习策略评估到AUROC最大化。为了 Addressing 大规模数据问题 across multiple clients with communication-efficient distributed training, federated learning (FL) 在 gaining popularity。许多中央化设置中的最小最大值问题优化算法已经被开发出来,但是在 FL setting 中,这个问题还未得到充分研究。在这篇论文中,我们研究了一类联邦非凸最小最大值优化问题。我们提出了 FedSGDA+ 和 FedSGDA-M 算法,并将 существу的复杂性结果缩小到最常见的最小最大值问题中。对非凸-凹型问题,我们提出了 FedSGDA+,并将通信复杂性降至 $O(\varepsilon^{-6})$。在非凸-强凹和非凸-PL最小最大值设置中,我们证明了 FedSGDA-M 的样本复杂性为 $O(\kappa^{3} N^{-1}\varepsilon^{-3})$,并且通信复杂性为 $O(\kappa^{2}\varepsilon^{-2})$。FedSGDA-M 是第一个与单机器方法在非凸-强凹设置中匹配的样本复杂性 $O(\varepsilon^{-3})$。我们的实验结果表明,我们的算法在公平分类和 AUROC 最大化中具有高效性。”
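A toy sketch of the federated descent-ascent pattern underlying these algorithms is shown below: each client takes local gradient-descent steps on the minimization variable and gradient-ascent steps on the maximization variable, and the server averages the iterates. The momentum and variance-reduction machinery of FedSGDA-M is omitted, so this illustrates the structure rather than the paper's exact algorithm.

```python
# Toy sketch of federated stochastic gradient descent-ascent (FedSGDA-style).
import torch

def local_sgda(x, y, client_loss, lr=0.01, local_steps=5):
    x = x.detach().clone().requires_grad_(True)
    y = y.detach().clone().requires_grad_(True)
    for _ in range(local_steps):
        loss = client_loss(x, y)
        gx, gy = torch.autograd.grad(loss, (x, y))
        with torch.no_grad():
            x -= lr * gx          # descent on the minimization variable
            y += lr * gy          # ascent on the maximization variable
    return x.detach(), y.detach()

def fed_round(x, y, client_losses, **kw):
    updates = [local_sgda(x, y, f, **kw) for f in client_losses]
    x = torch.stack([u[0] for u in updates]).mean(0)   # server averaging
    y = torch.stack([u[1] for u in updates]).mean(0)
    return x, y
```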

FASER: Binary Code Similarity Search through the use of Intermediate Representations

  • paper_url: http://arxiv.org/abs/2310.03605
  • repo_url: https://github.com/br0kej/FASER
  • paper_authors: Josh Collyer, Tim Watson, Iain Phillips
  • for: 本研究旨在提高跨架构软件功能关注的能力,以便分析恶意软件、安全软件供应链和漏洞研究等领域。
  • methods: 本研究使用了 binary intermediate representations( Intermediate Representations,IR)作为数据源,并提出了一种基于文档长Transformers的函数为字符串编码表示(FASER)模型,以实现跨架构函数搜索无需人工特征工程、预训练或动态分析步骤。
  • results: compared to several baseline methods, the proposed FASER model demonstrates strong performance in both general function search and targeted vulnerability search tasks, outperforming all baseline approaches.
    Abstract Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-Architecture Binary Code Similarity Search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary Intermediate representations possess two interesting properties: they are cross architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER) which combines long document transformers with the use of intermediate representations to create a model capable of cross architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks; A general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
    摘要 “能够识别感兴趣的函数在跨架构软件中是有用的,无论你是分析恶意软件、安全软件供应链或进行攻击性研究。跨架构软件Binary Code相似性搜寻已经在多个研究中探讨过,通常使用了各种不同的数据来源来实现目的。这些数据来源通常是根据binaries的常见结构,例如函数控制流图或二进制层级呼叫图、这些output的资料分析结果或动态分析的结果。然而,一个较少受到注意的数据来源是二进制中继表示。二进制中继表示具有两个有趣的性能:它们是跨架构的本性,且可以明确地表示函数的 semantics,以便在下游使用。在这篇文章中,我们提出了Function as a String Encoded Representation(FASER),它结合了长文本转换器和中继表示,创建了可以在跨架构上进行函数搜寻,不需要手动的特性工程、预训练或动态分析步骤。我们与一些基eline方法进行比较,在两个任务上显示了强大的表现,比基eline方法更好。”

How toxic is antisemitism? Potentials and limitations of automated toxicity scoring for antisemitic online content

  • paper_url: http://arxiv.org/abs/2310.04465
  • repo_url: None
  • paper_authors: Helena Mihaljević, Elisabeth Steffen
  • for: This paper examines the potential and limitations of the Perspective API, developed by Google and Jigsaw, for detecting antisemitic online content, notably in content moderation, monitoring, and social media research.
  • methods: The authors use a manually annotated German-language dataset of around 3,600 posts from Telegram and Twitter to explore how antisemitic texts are rated for toxicity and how scores differ across subforms of antisemitism and the stance expressed in the texts.
  • results: On a basic level, the Perspective API recognizes antisemitic content as toxic, but it shows critical weaknesses with non-explicit forms of antisemitism and texts taking a critical stance towards it; furthermore, widespread antisemitic codes can substantially reduce API scores, making it easy to bypass content moderation based on the service.
    Abstract The Perspective API, a popular text toxicity assessment service by Google and Jigsaw, has found wide adoption in several application areas, notably content moderation, monitoring, and social media research. We examine its potentials and limitations for the detection of antisemitic online content that, by definition, falls under the toxicity umbrella term. Using a manually annotated German-language dataset comprising around 3,600 posts from Telegram and Twitter, we explore as how toxic antisemitic texts are rated and how the toxicity scores differ regarding different subforms of antisemitism and the stance expressed in the texts. We show that, on a basic level, Perspective API recognizes antisemitic content as toxic, but shows critical weaknesses with respect to non-explicit forms of antisemitism and texts taking a critical stance towards it. Furthermore, using simple text manipulations, we demonstrate that the use of widespread antisemitic codes can substantially reduce API scores, making it rather easy to bypass content moderation based on the service's results.
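The experiments rely on scoring posts through the Perspective API's comments:analyze endpoint. A minimal Python sketch of such a call is below; the request and response shapes follow the publicly documented API, but the key is a placeholder and the current documentation should be checked before relying on field names.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; obtain a key from the Perspective API console
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str, lang: str = "de") -> float:
    """Request a TOXICITY score for a single (e.g. German) post."""
    payload = {
        "comment": {"text": text},
        "languages": [lang],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

Comparing the score of an explicit statement with a variant using a coded expression illustrates the kind of score drop the paper reports for widespread antisemitic codes.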

Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End

  • paper_url: http://arxiv.org/abs/2310.03581
  • repo_url: None
  • paper_authors: Jin Jin, Chong Zhang, Jonas Frey, Nikita Rudin, Matias Mattamala, Cesar Cadena, Marco Hutter
  • for: This work aims to help autonomous robots navigate reliably in unknown environments even when exteroceptive perception is degraded or fails.
  • methods: The paper trains a reinforcement learning (RL) based local navigation policy that reconstructs environment information in a latent space from corrupted perception and reacts to perception failures end-to-end, using both proprioception and exteroception as policy inputs.
  • results: In simulation and on the real quadruped robot ANYmal, the policy increases the success rate by over 30% compared to heuristic-based locally reactive planners when facing perception failures.
    Abstract Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.

Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

  • paper_url: http://arxiv.org/abs/2310.03579
  • repo_url: None
  • paper_authors: Trang Nguyen, Alexander Tong, Kanika Madan, Yoshua Bengio, Dianbo Liu
  • for: The study aims to improve causal structure learning in gene regulatory networks (GRNs) while addressing scalability concerns.
  • methods: It proposes the Swift-DynGFN framework, which exploits gene-wise independence to boost parallelization and lower computational cost.
  • results: Experiments on real single-cell RNA velocity data and synthetic GRN data show that Swift-DynGFN learns causal structure effectively and scales to larger systems.
    Abstract Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

  • paper_url: http://arxiv.org/abs/2310.12999
  • repo_url: None
  • paper_authors: Junliang Luo, Yi Tian Xu, Di Wu, Michael Jenkin, Xue Liu, Gregory Dudek
  • for: Improving the energy efficiency of wireless networks, driven by the demands of new-generation cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions.
  • methods: An approximate dynamic programming (ADP) method coupled with online optimization switches base station cells on and off based on state-action pairs, reducing network power consumption while maintaining adequate Quality of Service (QoS).
  • results: Multilayer perceptrons (MLPs) predict power consumption and QoS, a long short-term memory (LSTM) network predicts handovers, and the online optimization algorithm applies an adaptive QoS threshold based on the overall QoS history to filter cell switching actions, maximizing power savings without degrading QoS.
    Abstract Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
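As a rough illustration of the action-selection step described above, the sketch below filters candidate on/off configurations by a predicted QoS threshold and then picks the lowest-power one. All model names are placeholders, and the paper's full method additionally uses ADP value estimates, an LSTM handover predictor, and an adaptive threshold derived from QoS history, none of which are reproduced here.

```python
import numpy as np

def select_cell_action(state, candidate_actions, power_model, qos_model, qos_threshold):
    """Greedy sketch: keep only actions whose predicted QoS clears the threshold,
    then choose the one with the lowest predicted power consumption."""
    feasible = []
    for action in candidate_actions:
        x = np.concatenate([state, action])
        if qos_model(x) >= qos_threshold:        # MLP-predicted QoS filter
            feasible.append((power_model(x), action))
    if not feasible:                             # fall back to keeping all cells on
        return np.ones_like(candidate_actions[0])
    return min(feasible, key=lambda t: t[0])[1]
```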

Lightweight Boosting Models for User Response Prediction Using Adversarial Validation

  • paper_url: http://arxiv.org/abs/2310.03778
  • repo_url: None
  • paper_authors: Hyeonwoo Kim, Wonsung Lee
  • for: Predicting the probability of an app being installed
  • methods: A lightweight solution combining adversarial validation, feature engineering, and Gradient Boosted Decision Trees (GBDT)
  • results: The approach achieved ninth place in the ACM RecSys Challenge 2023 with a final leaderboard score of 6.059065.
    Abstract The ACM RecSys Challenge 2023, organized by ShareChat, aims to predict the probability of the app being installed. This paper describes the lightweight solution to this challenge. We formulate the task as a user response prediction task. For rapid prototyping for the task, we propose a lightweight solution including the following steps: 1) using adversarial validation, we effectively eliminate uninformative features from a dataset; 2) to address noisy continuous features and categorical features with a large number of unique values, we employ feature engineering techniques.; 3) we leverage Gradient Boosted Decision Trees (GBDT) for their exceptional performance and scalability. The experiments show that a single LightGBM model, without additional ensembling, performs quite well. Our team achieved ninth place in the challenge with the final leaderboard score of 6.059065. Code for our approach can be found here: https://github.com/choco9966/recsys-challenge-2023.
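Adversarial validation is the distinctive step here: a classifier is trained to distinguish training rows from test rows, and features that make that separation easy are pruned. The sketch below shows one common realisation of that idea with LightGBM; the thresholds and the pruning criterion are illustrative assumptions rather than the team's exact procedure.

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def adversarial_validation_prune(train_df, test_df, features, auc_threshold=0.7):
    """Label rows by origin (train=0, test=1) and fit a classifier on pandas frames.
    If the classifier separates the two sets too well, the most important feature
    is distribution-shifted or uninformative for generalisation and is dropped."""
    X = np.vstack([train_df[features].values, test_df[features].values])
    y = np.concatenate([np.zeros(len(train_df)), np.ones(len(test_df))])
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                              random_state=0, stratify=y)
    model = lgb.LGBMClassifier(n_estimators=200)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
    if auc < auc_threshold:
        return features                      # train and test already look alike
    worst = features[int(np.argmax(model.feature_importances_))]
    return [f for f in features if f != worst]
```

In practice the function is called repeatedly until the adversarial AUC falls below the threshold, after which a single LightGBM model is trained on the surviving features for the actual install-probability task.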

Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations

  • paper_url: http://arxiv.org/abs/2310.03518
  • repo_url: None
  • paper_authors: Jiachi Liu, Liwen Wang, Guanting Dong, Xiaoshuai Song, Zechen Wang, Zhengyang Wang, Shanglin Lei, Jinzheng Zhao, Keqing He, Bo Xiao, Weiran Xu
  • for: This paper studies how to evaluate the noise robustness of slot filling models used in dialogue systems.
  • methods: It introduces Noise-SF, a noise-robustness evaluation dataset containing five types of human-annotated noise that occur in real scenarios, and incorporates extensive robust-training methods for slot filling into a unified framework.
  • results: Extensive experiments on Noise-SF show that baseline models perform poorly under robustness evaluation, while the proposed framework effectively improves model robustness; based on these results, the authors make forward-looking suggestions to fuel research in this direction.
    Abstract In real dialogue scenarios, as there are unknown input noises in the utterances, existing supervised slot filling models often perform poorly in practical applications. Even though there are some studies on noise-robust models, these works are only evaluated on rule-based synthetic datasets, which is limiting, making it difficult to promote the research of noise-robust methods. In this paper, we introduce a noise robustness evaluation dataset named Noise-SF for the slot filling task. The proposed dataset contains five types of human-annotated noise, all of which occur in real scenarios, and we incorporate extensive robust-training methods for slot filling into the proposed framework. By conducting exhaustive empirical evaluation experiments on Noise-SF, we find that baseline models have poor performance in robustness evaluation, and the proposed framework can effectively improve the robustness of models. Based on the empirical experimental results, we make some forward-looking suggestions to fuel the research in this direction. Our dataset Noise-SF will be released at https://github.com/dongguanting/Noise-SF.

How the level sampling process impacts zero-shot generalisation in deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.03494
  • repo_url: None
  • paper_authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
  • for: This work investigates why agents trained with deep reinforcement learning (RL) struggle to generalise zero-shot to new environments that share characteristics with the training environments.
  • methods: The authors study how non-uniform sampling of individual environment instances (levels) affects zero-shot generalisation (ZSG), considering two failure modes: overfitting and over-generalisation. They measure the mutual information (MI) between an agent's internal representation and the set of training levels, which correlates with instance overfitting, and show that adaptive sampling strategies prioritising levels by value loss maintain lower MI than uniform sampling, providing a novel theoretical justification for this class of techniques. They then examine unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively, but significantly shift the training distribution, causing over-generalisation and worse ZSG on the distribution of interest.
  • results: The authors introduce self-supervised environment design (SSED), which generates levels with a variational autoencoder, reducing MI while minimising the shift from the distribution of interest; SSED yields statistically significant ZSG improvements over fixed-set level sampling strategies and UED methods.
    Abstract A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
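The adaptive strategies analysed here prioritise levels by their value loss. A minimal sketch of that sampling rule is given below, in the spirit of prioritised level replay; the mixing probability, temperature, and staleness terms used in that literature are simplifications or omissions, not the paper's exact scheme.

```python
import numpy as np

def sample_level(value_losses, temperature=1.0, rho=0.5, rng=None):
    """Sample a training level index: with probability rho pick proportionally to
    its last observed value loss ("learning potential"), otherwise pick uniformly."""
    rng = rng or np.random.default_rng()
    n = len(value_losses)
    if rng.random() > rho:
        return int(rng.integers(n))
    scores = np.asarray(value_losses, dtype=float) / temperature
    probs = np.exp(scores - scores.max())       # softmax over per-level scores
    probs /= probs.sum()
    return int(rng.choice(n, p=probs))
```

Under the paper's analysis, concentrating training on high-value-loss levels keeps the mutual information between the agent's representation and the level identity low, which is what mitigates instance overfitting.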

Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation

  • paper_url: http://arxiv.org/abs/2310.03477
  • repo_url: None
  • paper_authors: François Remy, Pieter Delobelle, Bettina Berendt, Kris Demuynck, Thomas Demeester
  • for: addresses the challenge of training monolingual language models for low and mid-resource languages
  • methods: uses a novel model conversion strategy that adapts high-resource monolingual language models to a new target language
  • results: achieves a new state-of-the-art performance on mid- and low-resource languages, and reduces significantly the amount of data and time required for training state-of-the-art models.
    Abstract Training monolingual language models for low and mid-resource languages is made challenging by limited and often inadequate pretraining data. In this study, we propose a novel model conversion strategy to address this issue, adapting high-resources monolingual language models to a new target language. By generalizing over a word translation dictionary encompassing both the source and target languages, we map tokens from the target tokenizer to semantically similar tokens from the source language tokenizer. This one-to-many token mapping improves tremendously the initialization of the embedding table for the target language. We conduct experiments to convert high-resource models to mid- and low-resource languages, namely Dutch and Frisian. These converted models achieve a new state-of-the-art performance on these languages across all sorts of downstream tasks. By reducing significantly the amount of data and time required for training state-of-the-art models, our novel model conversion strategy has the potential to benefit many languages worldwide.
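The core of the conversion strategy is initialising the target tokenizer's embedding table from the source model via a word translation dictionary. The sketch below averages the matched source embeddings for each target token and falls back to random initialisation; this is one plausible realisation of the described one-to-many token mapping, not necessarily the authors' exact procedure.

```python
import numpy as np

def init_target_embeddings(src_embeddings, src_vocab, tgt_vocab, translation_dict):
    """Initialise target-language embeddings from a source model.

    src_embeddings: (src_vocab_size, dim) array of source embeddings
    src_vocab / tgt_vocab: token -> id dictionaries for the two tokenizers
    translation_dict: target token -> list of semantically similar source tokens
    """
    dim = src_embeddings.shape[1]
    tgt_embeddings = np.random.normal(0.0, 0.02, size=(len(tgt_vocab), dim))
    for tgt_token, tgt_id in tgt_vocab.items():
        src_tokens = translation_dict.get(tgt_token, [])
        vectors = [src_embeddings[src_vocab[t]] for t in src_tokens if t in src_vocab]
        if vectors:                      # mapped tokens get an informed start
            tgt_embeddings[tgt_id] = np.mean(vectors, axis=0)
    return tgt_embeddings
```

After this initialisation, continued pre-training on the target language converges with far less data than training from scratch, which is where the reported savings come from.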

Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties

  • paper_url: http://arxiv.org/abs/2310.04463
  • repo_url: None
  • paper_authors: Siyuan Guo, Jihong Guan, Shuigeng Zhou
  • for: This work proposes a new approach to generating molecules with desirable properties, improving on existing generative models.
  • methods: It extends the diffusion model framework with several innovative designs: diffusion on two structural levels (molecules and molecular fragments), a novel electronic-effect-based fragmentation method, an energy-guidance function to optimize chemical validity, and a multi-objective mechanism to optimize multiple molecular properties simultaneously.
  • results: On the QM9 and ZINC250k benchmarks, the generated molecules surpass current state-of-the-art models in validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP.
    Abstract In the past decade, Artificial Intelligence driven drug design and discovery has been a hot research topic, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue only the basic properties like validity and uniqueness of the generated molecules, a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which makes most generated molecules little usefulness in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic effect based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets QM9 and ZINC250k show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fr\'echet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.

A Quantitatively Interpretable Model for Alzheimer’s Disease Prediction Using Deep Counterfactuals

  • paper_url: http://arxiv.org/abs/2310.03457
  • repo_url: None
  • paper_authors: Kwanseok Oh, Da-Woon Heo, Ahmad Wisnu Mulyadi, Wonsik Jung, Eunsong Kang, Kun Ho Lee, Heung-Il Suk
  • for: The paper aims to provide a more interpretable and effective approach for predicting Alzheimer's disease (AD) using counterfactual reasoning and gray matter density maps.
  • methods: The paper proposes a framework that synthesizes counterfactual-labeled structural MRIs, transforms them into gray matter density maps, and uses a lightweight linear classifier to boost predictive performance and provide quantitative interpretation.
  • results: The paper demonstrates that the proposed framework can produce an "AD-relatedness index" for each region of interest (ROI) and offer an intuitive understanding of brain status for individuals and patient groups with respect to AD progression, with comparable predictive performance to deep learning methods.
    Abstract Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explanatory maps based on visual inspection alone are insufficient unless we intuitively demonstrate their medical or neuroscientific validity via quantitative features. In this study, we synthesize the counterfactual-labeled structural MRIs using our proposed framework and transform it into a gray matter density map to measure its volumetric changes over the parcellated region of interest (ROI). We also devised a lightweight linear classifier to boost the effectiveness of constructed ROIs, promoted quantitative interpretation, and achieved comparable predictive performance to DL methods. Throughout this, our framework produces an ``AD-relatedness index'' for each ROI and offers an intuitive understanding of brain status for an individual patient and across patient groups with respect to AD progression.

Pre-Training and Fine-Tuning Generative Flow Networks

  • paper_url: http://arxiv.org/abs/2310.03419
  • repo_url: None
  • paper_authors: Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio
  • for: The paper explores how reward-free pre-training of Generative Flow Networks (GFlowNets) can enable efficient adaptation to downstream tasks and faster discovery of modes.
  • methods: It frames training as a self-supervised problem and proposes an outcome-conditioned GFlowNet (OC-GFN) that learns to reach any targeted outcome, analogous to goal-conditioned policies in reinforcement learning, together with an amortized predictor that approximates the otherwise intractable marginalization over outcomes needed for fine-tuning on task-specific rewards.
  • results: Extensive experiments show that the pre-trained OC-GFN adapts swiftly to downstream tasks and discovers modes more efficiently, validating reward-free pre-training as a viable strategy for GFlowNets.
    Abstract Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge about how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.

Domain Generalization for Medical Image Analysis: A Survey

  • paper_url: http://arxiv.org/abs/2310.08598
  • repo_url: None
  • paper_authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk
  • for: This survey examines deep learning (DL) for medical image analysis (MedIA) and the distribution shift problems that hinder deploying DL models in real-world settings.
  • methods: It reviews domain generalization studies tailored for MedIA, categorizing methods into data-level, feature-level, model-level, and analysis-level approaches.
  • results: It shows how these methods fit into the MedIA workflow from data acquisition to model prediction and analysis, surveys benchmark datasets and applications used for evaluation, analyzes the strengths and weaknesses of the various methods, and highlights future research opportunities.
    Abstract Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing for generalization under the distributional gap between training and testing samples, known as a distribution shift problem. Researchers have dedicated their efforts to developing various DL methods to adapt and perform robustly on unknown and out-of-distribution data distributions. This paper comprehensively reviews domain generalization studies specifically tailored for MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods. We show how those methods can be used in various stages of the MedIA workflow with DL equipped from data acquisition to model prediction and analysis. Furthermore, we include benchmark datasets and applications used to evaluate these approaches and analyze the strengths and weaknesses of various methods, unveiling future research opportunities.

GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.03399
  • repo_url: https://github.com/dfdazac/grapes
  • paper_authors: Taraneh Younesian, Thiviyan Thanapalasingam, Emile van Krieken, Daniel Daza, Peter Bloem
  • for: The paper proposes an adaptive graph sampling method that reduces the memory cost of GNNs across diverse graph structures and tasks.
  • methods: GRAPES uses a GFlowNet to learn node sampling probabilities, identifying sets of influential nodes for training a GNN classifier.
  • results: Across several small- and large-scale graph benchmarks, GRAPES maintains high accuracy even with small sample sizes and can therefore scale to very large graphs.
    Abstract Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
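The memory saving comes from sampling a bounded number of neighbours per layer according to learned probabilities. The sketch below shows layer-wise adaptive sampling with a tiny scorer network; GRAPES trains such probabilities with a GFlowNet objective driven by the classifier's loss, which is not reproduced here, so treat the scorer and hop sizes as illustrative assumptions.

```python
import torch
import torch.nn as nn

class NodeScorer(nn.Module):
    """Tiny network mapping node features to unnormalised inclusion scores."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def sample_subgraph(features, adj_lists, seed_nodes, scorer, k_per_hop=(64, 32)):
    """Grow a mini-batch subgraph hop by hop, keeping at most k nodes per hop,
    sampled without replacement according to the scorer's probabilities."""
    frontier, kept = list(seed_nodes), [list(seed_nodes)]
    for k in k_per_hop:
        candidates = sorted({n for v in frontier for n in adj_lists[v]})
        if not candidates:
            break
        probs = torch.softmax(scorer(features[candidates]), dim=0)
        idx = torch.multinomial(probs, num_samples=min(k, len(candidates)), replacement=False)
        frontier = [candidates[i] for i in idx.tolist()]
        kept.append(frontier)
    return kept   # node ids used to build the sampled computation graph
```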

Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.03392
  • repo_url: None
  • paper_authors: Tita A. Bach, Jenny K. Kristiansen, Aleksandar Babic, Alon Jacovi
  • for: This study surveys human-AI interaction (HAII) in safety-critical industries with the aim of improving the safety and reliability of such deployments.
  • methods: A systematic literature review of current HAII research, organized around the terms used to describe HAII, the primary roles of AI-enabled systems, the factors that influence HAII, and how HAII is measured, together with recommendations for research best practices.
  • results: The review finds that HAII research is fragmented and inconsistent: no single term is used to describe HAII, some terms carry multiple meanings, and HAII is most commonly measured with user-related subjective metrics (e.g., perception, trust, attitudes). It identifies five factors that influence HAII: user characteristics and background (e.g., personality, perceptions), AI interface and features (e.g., interactive UI design), AI output (e.g., accuracy, actionable recommendations), explainability and interpretability (e.g., level of detail, user understanding), and usage of AI (e.g., heterogeneity of environments and user needs).
    Abstract Ensuring quality human-AI interaction (HAII) in safety-critical industries is essential. Failure to do so can lead to catastrophic and deadly consequences. Despite this urgency, what little research there is on HAII is fragmented and inconsistent. We present here a survey of that literature and recommendations for research best practices that will improve the field. We divided our investigation into the following research areas: (1) terms used to describe HAII, (2) primary roles of AI-enabled systems, (3) factors that influence HAII, and (4) how HAII is measured. Additionally, we described the capabilities and maturity of the AI-enabled systems used in safety-critical industries discussed in these articles. We found that no single term is used across the literature to describe HAII and some terms have multiple meanings. According to our literature, five factors influence HAII: user characteristics and background (e.g., user personality, perceptions), AI interface and features (e.g., interactive UI design), AI output (e.g., accuracy, actionable recommendations), explainability and interpretability (e.g., level of detail, user understanding), and usage of AI (e.g., heterogeneity of environments and user needs). HAII is most commonly measured with user-related subjective metrics (e.g., user perception, trust, and attitudes), and AI-assisted decision-making is the most common primary role of AI-enabled systems. Based on this review, we conclude that there are substantial research gaps in HAII. Researchers and developers need to codify HAII terminology, involve users throughout the AI lifecycle (especially during development), and tailor HAII in safety-critical industries to the users and environments.

Procedural Text Mining with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03376
  • repo_url: https://github.com/jd-coderepos/proc-tm
  • paper_authors: Anisa Rula, Jennifer D’Souza
  • for: This work investigates using large language models (LLMs) in zero-shot and in-context learning settings to extract procedures from unstructured PDF text in an incremental question-answering fashion.
  • methods: It uses the state-of-the-art GPT-4 (Generative Pre-trained Transformer 4) model with two in-context learning variations: an ontology with definitions of procedures and steps, and a limited number of few-shot examples.
  • results: The findings highlight the promise of this approach and the value of the in-context learning customisations, which can significantly ease the challenge of obtaining sufficient training data for deep-learning-based procedure extraction.
    Abstract Recent advancements in the field of Natural Language Processing, particularly the development of large-scale language models that are pretrained on vast amounts of knowledge, are creating novel opportunities within the realm of Knowledge Engineering. In this paper, we investigate the usage of large language models (LLMs) in both zero-shot and in-context learning settings to tackle the problem of extracting procedures from unstructured PDF text in an incremental question-answering fashion. In particular, we leverage the current state-of-the-art GPT-4 (Generative Pre-trained Transformer 4) model, accompanied by two variations of in-context learning that involve an ontology with definitions of procedures and steps and a limited number of samples of few-shot learning. The findings highlight both the promise of this approach and the value of the in-context learning customisations. These modifications have the potential to significantly address the challenge of obtaining sufficient training data, a hurdle often encountered in deep learning-based Natural Language Processing techniques for procedure extraction.
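The incremental question-answering style of extraction can be illustrated with a short prompting loop. The sketch below assumes the legacy (pre-1.0) openai Python client; the ontology text, prompt wording, and placeholder API key are illustrative assumptions, not the prompts used in the paper (its code is linked in the repo_url above).

```python
import openai  # assumes the pre-1.0 openai client; adapt for newer client versions

openai.api_key = "YOUR_API_KEY"  # placeholder

ONTOLOGY = (
    "A Procedure is a named sequence of Steps. "
    "A Step is a single executable action with optional inputs and outputs."
)

def extract_next_step(document_text, steps_so_far):
    """Ask the model for the next procedure step, conditioning on an ontology
    definition and the steps extracted so far (incremental QA extraction)."""
    messages = [
        {"role": "system",
         "content": "You extract procedures from technical documents. " + ONTOLOGY},
        {"role": "user",
         "content": (f"Document:\n{document_text}\n\n"
                     f"Steps extracted so far: {steps_so_far}\n"
                     "What is the next step? Answer with the step only, or 'DONE'.")},
    ]
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages, temperature=0)
    return response["choices"][0]["message"]["content"].strip()
```

Looping this call until the model answers "DONE" yields the procedure one step at a time; adding a few worked examples to the messages turns the zero-shot setting into the few-shot variant studied in the paper.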

Design Optimizer for Planar Soft-Growing Robot Manipulators

  • paper_url: http://arxiv.org/abs/2310.03374
  • repo_url: None
  • paper_authors: Fabio Stroppa
  • for: The paper addresses the design and optimization of soft-growing robots for specific manipulation tasks, such as exploration of delicate or dangerous environments, manipulation of items, or assistance in domestic environments.
  • methods: It models the design process as a multi-objective optimization problem over the kinematic chain of a soft manipulator and uses population-based optimization algorithms, specifically evolutionary algorithms, to transform it into a single-objective problem via a novel rank-partitioning algorithm, with obstacle avoidance integrated into the optimizer operators.
  • results: The proposed method is tested on different tasks and shows significant performance in solving the problem, outperforming existing methods in terms of precision, resource consumption, and run time.
    Abstract Soft-growing robots are innovative devices that feature plant-inspired growth to navigate environments. Thanks to their embodied intelligence of adapting to their surroundings and the latest innovation in actuation and manufacturing, it is possible to employ them for specific manipulation tasks. The applications of these devices include exploration of delicate/dangerous environments, manipulation of items, or assistance in domestic environments. This work presents a novel approach for design optimization of soft-growing robots, which will be used prior to manufacturing to suggest engineers -- or robot designer enthusiasts -- the optimal dimension of the robot to be built for solving a specific task. I modeled the design process as a multi-objective optimization problem, in which I optimize the kinematic chain of a soft manipulator to reach targets and avoid unnecessary overuse of material and resources. The method exploits the advantages of population-based optimization algorithms, in particular evolutionary algorithms, to transform the problem from multi-objective into a single-objective thanks to an efficient mathematical formulation, the novel rank-partitioning algorithm, and obstacle avoidance integrated within the optimizer operators. I tested the proposed method on different tasks to access its optimality, which showed significant performance in solving the problem. Finally, comparative experiments showed that the proposed method works better than the one existing in the literature in terms of precision, resource consumption, and run time.

AI-based automated active learning for discovery of hidden dynamic processes: A use case in light microscopy

  • paper_url: http://arxiv.org/abs/2310.04461
  • repo_url: None
  • paper_authors: Nils Friederich, Angelo Yamachui Sitcheu, Oliver Neumann, Süheyla Eroğlu-Kayıkçı, Roshan Prizak, Lennart Hilbert, Ralf Mikut
  • for: This work proposes two new methods to improve the efficiency of observing dynamic processes in biomedical experiments.
  • methods: The first, Encoded Dynamic Process (EDP), is an AI-based representation that predicts pseudo-time values of a dynamic process from single still images; the second, Experiment Automation Pipeline for Dynamic Processes (EAPDP), is an MLOps-based pipeline that uses the knowledge extracted by EDP to schedule acquisition efficiently.
  • results: A first experiment shows that the pre-trained state-of-the-art object segmentation method Contour Proposal Networks (CPN) works reliably as an EAPDP module for extracting the relevant objects from acquired three-dimensional image stacks.
    Abstract In the biomedical environment, experiments assessing dynamic processes are primarily performed by a human acquisition supervisor. Contemporary implementations of such experiments frequently aim to acquire a maximum number of relevant events from sometimes several hundred parallel, non-synchronous processes. Since in some high-throughput experiments, only one or a few instances of a given process can be observed simultaneously, a strategy for planning and executing an efficient acquisition paradigm is essential. To address this problem, we present two new methods in this paper. The first method, Encoded Dynamic Process (EDP), is Artificial Intelligence (AI)-based and represents dynamic processes so as to allow prediction of pseudo-time values from single still images. Second, with Experiment Automation Pipeline for Dynamic Processes (EAPDP), we present a Machine Learning Operations (MLOps)-based pipeline that uses the extracted knowledge from EDP to efficiently schedule acquisition in biomedical experiments for dynamic processes in practice. In a first experiment, we show that the pre-trained State-Of-The- Art (SOTA) object segmentation method Contour Proposal Networks (CPN) works reliably as a module of EAPDP to extract the relevant object for EDP from the acquired three-dimensional image stack.

Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet

  • paper_url: http://arxiv.org/abs/2310.03365
  • repo_url: None
  • paper_authors: Hossein Jafari, Karim Faez, Hamidreza Amindavar
  • for: The goal is to improve the accuracy of computer-aided diagnosis (CAD) systems for identifying lung nodules in computed tomography (CT) scans.
  • methods: A new model combines the strengths of convolutional neural networks and vision transformers, treating each 3D CT scan as a video, individual slices as frames, and lung nodules as objects, enabling a time-series formulation that exploits inter-slice information.
  • results: Validated with 10-fold cross-validation on the Lung Nodule Analysis 2016 dataset, the network achieves an average sensitivity of 97.84% and a competition performance metric (CPM) of 96.0% with few parameters, demonstrating strong accuracy compared with state-of-the-art lung nodule detection methods.
    Abstract Lung cancer is highly lethal, emphasizing the critical need for early detection. However, identifying lung nodules poses significant challenges for radiologists, who rely heavily on their expertise for accurate diagnosis. To address this issue, computer-aided diagnosis (CAD) systems based on machine learning techniques have emerged to assist doctors in identifying lung nodules from computed tomography (CT) scans. Unfortunately, existing networks in this domain often suffer from computational complexity, leading to high rates of false negatives and false positives, limiting their effectiveness. To address these challenges, we present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application. The primary objective of our work is to overcome hardware limitations during model training, allowing for efficient processing of 2D data while utilizing inter-slice information for accurate identification based on 3D image context. We validated the proposed network by applying a 10-fold cross-validation technique to the publicly available Lung Nodule Analysis 2016 dataset. Our proposed architecture achieves an average sensitivity criterion of 97.84% and a competition performance metrics (CPM) of 96.0% with few parameters. Comparative analysis with state-of-the-art advancements in lung nodule identification demonstrates the significant accuracy achieved by our proposed model.

Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention

  • paper_url: http://arxiv.org/abs/2310.03358
  • repo_url: https://github.com/changzhang777/ancra
  • paper_authors: Nuoyan Zhou, Decheng Liu, Dawei Zhou, Xinbo Gao, Nannan Wang
  • for: Improving the adversarial robustness of deep neural networks.
  • methods: A generic adversarial training (AT) framework that obtains robust representations via an asymmetric negative contrast based on predicted probabilities and a reverse attention that weights features by the linear classifier's parameters.
  • results: Empirical evaluations on three benchmark datasets show the method greatly advances the robustness of AT and achieves state-of-the-art performance.
    Abstract Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find AT omits to learning robust features, resulting in poor performance of adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the feature of natural examples keeps away from that of other classes; (2) $\bf{alignment}$: the feature of natural and corresponding adversarial examples is close to each other. These motivate us to propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities, to push away examples of different classes in the feature space. Moreover, we propose to weight feature by parameters of the linear classifier as the reverse attention, to obtain class-aware feature and pull close the feature of the same class. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at .
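To make the two ideas in the abstract (exclusion and alignment) more concrete, here is a loose, assumption-laden sketch of how they could be combined into a loss: an alignment term pulling natural and adversarial features of the same example together, and an exclusion term pushing natural features away from other-class examples, weighted asymmetrically by predicted probabilities. This is an illustration only, not the authors' ANCRA objective (their implementation is linked in the repo_url above).

```python
import torch
import torch.nn.functional as F

def reverse_attention(features, classifier_weight, labels):
    """Weight feature dimensions by the linear classifier's weights for each
    example's own class, yielding class-aware features."""
    return features * classifier_weight[labels]          # (batch, feat_dim)

def asymmetric_contrast_loss(feat_nat, feat_adv, logits_nat, labels):
    """Alignment of natural/adversarial pairs plus probability-weighted
    repulsion from other-class examples (sketch)."""
    align = 1.0 - F.cosine_similarity(feat_nat, feat_adv, dim=1)
    probs = F.softmax(logits_nat, dim=1)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)                 # (B, B) same-class mask
    sim = F.cosine_similarity(feat_nat.unsqueeze(1), feat_nat.unsqueeze(0), dim=2)
    # weight example i's repulsion from j by how much probability i puts on j's class
    neg_w = probs.gather(1, labels.unsqueeze(0).expand(len(labels), -1))
    push = (sim * neg_w * (~same)).sum(1) / (~same).sum(1).clamp(min=1)
    return (align + push).mean()
```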

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

  • paper_url: http://arxiv.org/abs/2310.03354
  • repo_url: None
  • paper_authors: Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
  • for: Addressing multi-agent reinforcement learning in competitive games, particularly mixed cooperative-competitive games where agents on the same team need to cooperate with each other.
  • methods: The paper combines the strengths of self-play (SP) and Policy-Space Response Oracles (PSRO) into a new algorithm, Fictitious Cross-Play (FXP), which simultaneously trains an SP-based main policy and a counter population of best-response policies.
  • results: Experiments show that FXP converges to global Nash equilibria in matrix games where SP methods fail, achieves higher Elo ratings and lower exploitabilities than baselines in a gridworld domain, and defeats state-of-the-art models in a more challenging football game with over a 94% win rate.
    Abstract Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.

Parking Spot Classification based on surround view camera system

  • paper_url: http://arxiv.org/abs/2310.12997
  • repo_url: None
  • paper_authors: Andy Xiao, Deep Doshi, Lihao Wang, Harsha Gorantla, Thomas Heitzmann, Peter Groth
  • for: This work targets parking spot detection and classification in automated driving scenarios such as auto valet parking, to improve the accuracy and usefulness of automated parking.
  • methods: It uses a surround-view fisheye camera system and adapts the YOLOv4 object detection network with a novel polygon bounding box model suited to variously shaped parking spaces, such as slanted slots.
  • results: The proposed classification approach effectively distinguishes between regular, electric vehicle, and handicap parking spots.
    Abstract Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360{\deg} around the vehicle capturing the entire near-field region. Based on surround view cameras, there has been much research on parking slot detection with main focus on the occupancy status in recent years, but little work on whether the free slot is compatible with the mission of the ego vehicle or not. For instance, some spots are handicap or electric vehicles accessible only. In this paper, we tackle parking spot classification based on the surround view camera system. We adapt the object detection neural network YOLOv4 with a novel polygon bounding box model that is well-suited for various shaped parking spaces, such as slanted parking slots. To the best of our knowledge, we present the first detailed study on parking spot detection and classification on fisheye cameras for auto valet parking scenarios. The results prove that our proposed classification approach is effective to distinguish between regular, electric vehicle, and handicap parking spots.

Deep Geometric Learning with Monotonicity Constraints for Alzheimer’s Disease Progression

  • paper_url: http://arxiv.org/abs/2310.03353
  • repo_url: None
  • paper_authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk
  • for: Predicting the progression of Alzheimer's disease (AD) over time to support clinical diagnosis and treatment.
  • methods: Modeling AD progression from structural MRI data while accounting for temporal variability, incomplete observations, and temporal geometric characteristics.
  • results: The proposed geometric learning approach combines a topological space shift, ODE-RGRU, and trajectory estimation to model longitudinal sequences, together with a training algorithm that integrates manifold mapping with monotonicity constraints to reflect the irreversibility of measurement transitions; its effectiveness is validated by predicting clinical labels and cognitive scores over time in regular and irregular settings, and analyzed further through an ablation study.
    Abstract Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic, especially within the clinical facet. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.

Tractable Bounding of Counterfactual Queries by Knowledge Compilation

  • paper_url: http://arxiv.org/abs/2310.03352
  • repo_url: https://github.com/idsia/credici
  • paper_authors: David Huber, Yizuo Chen, Alessandro Antonucci, Adnan Darwiche, Marco Zaffalon
  • for: 本文研究了在pearlian结构 causal模型中绑定部分可识别查询(counterfactuals)的问题。
  • methods: 本文使用了一种新的迭代EM算法来获得这些绑定的上限,该算法通过采样初始化参数来实现。该方法需要多个(Bayesian网络)查询,这些查询共享同一个结构方程和概率分布,但每个查询有不同的外生参数。因此,编译下来的Circuit结构有利于执行多个查询,从而实现了一定的计算减速。
  • results: 作者们实验表明,使用symbolic知识编译可以快速地计算绑定,并且可以实现一个训练 bayesian network inference的速度减速。
    Abstract We discuss the problem of bounding partially identifiable queries, such as counterfactuals, in Pearlian structural causal models. A recently proposed iterated EM scheme yields an inner approximation of those bounds by sampling the initialisation parameters. Such a method requires multiple (Bayesian network) queries over models sharing the same structural equations and topology, but different exogenous probabilities. This setup makes a compilation of the underlying model to an arithmetic circuit advantageous, thus inducing a sizeable inferential speed-up. We show how a single symbolic knowledge compilation allows us to obtain the circuit structure with symbolic parameters to be replaced by their actual values when computing the different queries. We also discuss parallelisation techniques to further speed up the bound computation. Experiments against standard Bayesian network inference show clear computational advantages with up to an order of magnitude of speed-up.
    摘要 我们讨论 partially identifiable queries的问题,例如 counterfactuals,在 Pearlian 结构 causal models 中。一种最近提出的迭代 EM 方法可以获得这些约束的内部approximation,通过 sampling 初始化参数。这种方法需要多个(Bayesian network)查询,这些查询共享同一个结构方程和结构,但每个查询有不同的外生概率。这种设置使得 compiling 下面的模型到一个算术Circuit 有利可图,从而induces 一个明显的推理速度增加。我们示出了一种单symbolic knowledge compilation可以获得这些circuit structure 的符号参数,并将其替换为实际值当计算不同的查询。我们还讨论了并行技术,以进一步加速约束计算。对标准 Bayesian network inference 进行实验,我们发现了一个许多的计算优势,速度增加达一个数量级。

Tuning In to Neural Encoding: Linking Human Brain and Artificial Supervised Representations of Language

  • paper_url: http://arxiv.org/abs/2310.04460
  • repo_url: None
  • paper_authors: Jingyuan Sun, Xiaohan Zhang, Marie-Francine Moens
  • for: investigate how task tuning influences a pretrained Transformer for neural encoding and which tasks lead to the best encoding performance.
  • methods: generate supervised representations on eight Natural Language Understanding (NLU) tasks using prompt-tuning, a technique that is seldom explored in neural encoding for language.
  • results: demonstrate that prompt-tuning yields representations that better predict neural responses to Chinese stimuli than traditional fine-tuning on four tasks, and discover that tasks that require a fine-grained processing of concepts and entities lead to representations that are most predictive of brain activation patterns.
    Abstract To understand the algorithm that supports the human brain's language representation, previous research has attempted to predict neural responses to linguistic stimuli using embeddings generated by artificial neural networks (ANNs), a process known as neural encoding. However, most of these studies have focused on probing neural representations of Germanic languages, such as English, with unsupervised ANNs. In this paper, we propose to bridge the gap between human brain and supervised ANN representations of the Chinese language. Specifically, we investigate how task tuning influences a pretained Transformer for neural encoding and which tasks lead to the best encoding performances. We generate supervised representations on eight Natural Language Understanding (NLU) tasks using prompt-tuning, a technique that is seldom explored in neural encoding for language. We demonstrate that prompt-tuning yields representations that better predict neural responses to Chinese stimuli than traditional fine-tuning on four tasks. Furthermore, we discover that tasks that require a fine-grained processing of concepts and entities lead to representations that are most predictive of brain activation patterns. Additionally, we reveal that the proportion of tuned parameters highly influences the neural encoding performance of fine-tuned models. Overall, our experimental findings could help us better understand the relationship between supervised artificial and brain language representations.
    摘要 以前的研究曾尝试使用人工神经网络(ANNs)生成的编码来预测大脑对语言刺激的神经响应,但大多数这些研究都集中在探索德语族语言,如英语。在这篇论文中,我们提议将人类大脑和有监督的ANN语言表示之间的关系 bridged。特别是,我们研究了一种任务调整对预先训练的 transformer 语言编码器的影响,以及哪些任务会导致最佳的编码性能。我们使用 prompt-tuning 技术,它在语音编码领域尚未得到充分探索,来生成八种自然语言理解(NLU)任务的有监督表示。我们发现,使用 prompt-tuning 技术可以更好地预测中文刺激的神经响应,并且发现任务需要细化概念和实体处理时,表示更加预测大脑活动 Pattern 相关。此外,我们发现调整参数的比例对练习后的模型语言编码性能具有重要影响。总的来说,我们的实验结果可以帮助我们更好地理解人造语言和大脑之间的关系。
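
The neural-encoding evaluation itself is a standard regularized regression from model-derived features onto brain responses. The sketch below is a generic version of that pipeline (ridge regression plus per-voxel correlation), not the paper's code; the feature and response arrays are synthetic stand-ins for prompt-tuned sentence representations and fMRI data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 768))   # stand-in for prompt-tuned sentence representations
responses = rng.normal(size=(200, 50))   # stand-in for voxel/ROI responses to the same stimuli

X_tr, X_te, y_tr, y_te = train_test_split(features, responses, test_size=0.2, random_state=0)
model = Ridge(alpha=10.0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Encoding performance is typically reported as the Pearson r per voxel.
r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(y_te.shape[1])]
print("mean voxel-wise correlation:", float(np.mean(r)))
```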

Zero-shot Learning of Drug Response Prediction for Preclinical Drug Screening

  • paper_url: http://arxiv.org/abs/2310.12996
  • repo_url: https://github.com/drugd/msda
  • paper_authors: Kun Li, Yong Luo, Xiantao Cai, Wenbin Hu, Bo Du
  • for: 这篇论文旨在提出一种零样本学习解决方案,用于临床前药物筛选中新化合物的药物响应预测(DRP)任务。
  • methods: 方法基于一种多分支多源领域自适应测试增强插件(MSDA),可与传统 DRP 方法无缝结合,从相似药物的已有响应数据中学习不变特征,以增强对未标注新化合物的实时响应预测。
  • results: 实验结果显示,MSDA 能够有效预测新化合物的药物响应,在临床前药物筛选阶段带来约 5-10% 的整体性能提升,有助于加速药物发现过程并改进候选药物评估。
    Abstract Conventional deep learning methods typically employ supervised learning for drug response prediction (DRP). This entails dependence on labeled response data from drugs for model training. However, practical applications in the preclinical drug screening phase demand that DRP models predict responses for novel compounds, often with unknown drug responses. This presents a challenge, rendering supervised deep learning methods unsuitable for such scenarios. In this paper, we propose a zero-shot learning solution for the DRP task in preclinical drug screening. Specifically, we propose a Multi-branch Multi-Source Domain Adaptation Test Enhancement Plug-in, called MSDA. MSDA can be seamlessly integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to enhance real-time predictions of unlabeled compounds. We conducted experiments using the GDSCv2 and CellMiner datasets. The results demonstrate that MSDA efficiently predicts drug responses for novel compounds, leading to a general performance improvement of 5-10\% in the preclinical drug screening phase. The significance of this solution resides in its potential to accelerate the drug discovery process, improve drug candidate assessment, and facilitate the success of drug discovery.
    摘要 传统的深度学习方法通常采用有监督学习的方式进行药物响应预测(DRP)。这意味着模型训练需要有标注的响应数据来源于药物。然而,在实际应用中,在前期药物层面的药物屏选阶段,需要预测新的化合物的响应,而这些化合物的响应 oftentimes unknown。这增加了挑战,使得传统的深度学习方法无法满足这些情况。在本文中,我们提出了零shot学习的解决方案 для DRP 任务在前期药物层面。特别是,我们提出了一种多支多源领域适应测试扩展 Plug-in,称为 MSDA。 MSDA 可以与传统的 DRP 方法集成,从价值类似药物的响应数据中学习不变的特征,以提高实时预测无标注的化合物的响应。我们在 GDSCv2 和 CellMiner 数据集上进行了实验,结果表明,MSDA 能有效地预测新的化合物的响应,从而在前期药物层面提高了5-10%的性能。这种解决方案的重要性在于,它可以加速药物发现过程,改善药物候选者评估,并促进药物发现的成功。
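
The abstract describes learning drug-invariant features from the response data of similar drugs. A very common way to encourage such invariance is a distribution-alignment penalty between source-drug and target-drug feature batches; the sketch below shows that generic idea (first-moment matching only), which is merely a stand-in for the multi-branch MSDA plug-in described in the paper.

```python
import torch

def alignment_loss(src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between source- and target-domain feature statistics.

    A simple mean-matching term; MSDA itself is a more elaborate
    multi-branch, multi-source adaptation module.
    """
    return (src_feats.mean(dim=0) - tgt_feats.mean(dim=0)).pow(2).sum()

# Toy usage inside a training step of any encoder-based DRP model.
src = torch.randn(32, 128)   # features of cell-line/drug pairs for known, similar drugs
tgt = torch.randn(32, 128)   # features for the novel (unlabeled) drug
print(alignment_loss(src, tgt).item())
```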

Learning Concept-Based Visual Causal Transition and Symbolic Reasoning for Visual Planning

  • paper_url: http://arxiv.org/abs/2310.03325
  • repo_url: None
  • paper_authors: Yilue Qian, Peiyu Yu, Ying Nian Wu, Wei Wang, Lifeng Fan
  • for: 这个论文旨在提出一个可解释的和通用的视觉观念规划框架,以帮助Agent在复杂环境中完成日常任务。
  • methods: 这个框架包括三个主要部分:novel Substitution-based Concept Learner (SCL)、symbol abstraction和reasoning、以及Visual Causal Transition model (ViCT)。SCL抽象视觉输入,生成分离的概念表示;symbol abstraction和reasoning使用自学到的符号来进行任务观念规划;ViCT将视觉 causal transition 与实际世界中相似的动作相连接。
  • results: 这个方法在一个大规模的视觉观念规划数据集(CCTP)上进行了严格的实验,展示了该方法在视觉任务规划方面的超越性性能。实验结果显示,该方法可以对未见过的任务路径和物品类别进行扩展。
    Abstract Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories and unseen object categories.
    摘要 视觉规划模拟人类为达成目标而进行决策的过程,即在初始视觉状态与最终视觉目标状态之间搜索视觉因果过渡。凭借在复杂环境中指导智能体完成日常任务的优势,它在第一人称视觉中变得日益重要。在这篇论文中,我们提出了一个可解释且可泛化的视觉规划框架,包括:i)一种新颖的基于替换的概念学习器(SCL),将视觉输入抽象为解耦的概念表示;ii)符号抽象与推理,利用自学习到的符号进行任务规划;iii)视觉因果过渡模型(ViCT),将视觉因果过渡与语义相似的真实世界动作相关联。给定初始状态,我们利用学习到的表示和因果过渡,通过符号推理方法进行以目标为条件的视觉规划,以到达目标状态。为了验证所提模型的有效性,我们基于 AI2-THOR 收集了一个大规模视觉规划数据集 CCTP。在这一具有挑战性的数据集上的大量实验表明了我们的方法在视觉任务规划上的优越性能。实验还表明,我们的框架能够泛化到未见过的任务轨迹和未见过的物体类别。

Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction

  • paper_url: http://arxiv.org/abs/2310.03314
  • repo_url: None
  • paper_authors: Aadi Kothari, Tony Tohme, Xiaotong Zhang, Kamal Youcef-Toumi
  • for: 这篇论文的目的是提出一种基于人体 JOINT 约束和场景约束的人体动作预测方法,以提高人机合作的效率和安全性。
  • methods: 该方法使用 Gaussian Process Regression(GPR)模型,并将人体 JOINT 约束和场景约束直接integrated into the model,以便在预测人体动作的过程中考虑人体的物理约束和场景约束。
  • results: 实验和 simulate 结果表明,当将人体 JOINT 约束和场景约束explicitly considered时,Gaussian Process 框架可以得到较好的预测结果,而且在实际应用中也可以实现实时的人机合作。
    Abstract Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
    摘要 人类动作预测是人机合作中不可或缺的一步,目前的方法可以分为两类:一是将人体关节表示为神经网络 Architecture 中的某种形式,二是使用回归模型在线下适应hyperparameters,以 capture 人体动作模型。尽管这些方法可以提供初步的好结果,但是它们缺乏利用人体动作学习的知识和场景约束,这些约束可以帮助提高预测框架的效果,同时明确避免人体关节配置的不可能情况。我们提出了一种新的人体动作预测框架,该框架在 Gaussian Process Regression(GPR)模型中包含人体关节约束和场景约束,以预测人体动作在时间范围内的动作。这种形式与在线上的上下文意识约束模型结合,以利用任务висимы的动作。我们在人类臂动机学模型上进行了测试,并在人机合作设置中使用UR5机械臂进行实际应用,以示我们的方法的实时能力。我们还在HA4M和ANDY等数据集上进行了 simulated 实验,实验和实际结果表明,在Gaussian Process框架中,当这些约束被Explicitly 考虑时,可以获得显著的改善。
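
The core ingredients named in the abstract — a Gaussian Process regressor over a prediction horizon plus joint-limit constraints — can be illustrated with a small sketch. This is not the authors' formulation; it simply fits a GP to one joint angle's recent history and clips the predictive mean and interval to assumed joint limits.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Observed elbow angle (radians) over the last second, sampled at 20 Hz.
t_obs = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
angle = 0.5 + 0.8 * np.sin(2.0 * t_obs[:, 0]) + 0.02 * np.random.default_rng(0).normal(size=20)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-3), normalize_y=True)
gp.fit(t_obs, angle)

# Predict 0.5 s ahead and enforce (assumed) joint limits as a hard clip.
t_future = np.linspace(1.0, 1.5, 10).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)
low, high = 0.0, 2.6                          # assumed elbow range of motion
mean = np.clip(mean, low, high)
lower = np.clip(mean - 2 * std, low, high)    # constrained 95% band
upper = np.clip(mean + 2 * std, low, high)
print(np.round(np.c_[mean, lower, upper], 3))
```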

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

  • paper_url: http://arxiv.org/abs/2310.03309
  • repo_url: None
  • paper_authors: Shaotian Yan, Chen Shen, Junjie Liu, Jieping Ye
  • for: 提高大型自然语言模型(LLM)的逻辑推理能力。
  • methods: 提出了一种新的逻辑推理方法,即 Concise and Organized Perception(COP),通过精炼给定的陈述,快速分析出最重要信息,并将其组织得更加系统化,以便更好地逻辑推理。
  • results: 实验结果表明,与先前的状态艺术方法相比,COP方法在三个popular deductive benchmark(ProofWriter、PrOntoQA和PrOntoQA-OOD)上显著提高了性能。
    Abstract Exploiting large language models (LLMs) to tackle deductive reasoning has garnered growing attention. It still remains highly challenging to achieve satisfactory results in complex deductive problems, characterized by plenty of premises (i.e., facts or rules) entailing intricate relationships among entities and requiring multi-hop reasoning. One intuitive solution is to decompose the original task into smaller sub-tasks, and then chain the multiple casual reasoning steps together in a forward (e.g., Selection-Inference) or backward (e.g., LAMBADA) direction. However, these techniques inevitably necessitate a large number of overall stages, leading to computationally expensive operations and a higher possibility of making misleading steps. In addition to stage-by-stage decomposition, we draw inspiration from another aspect of human problem-solving. Humans tend to distill the most relevant information and organize their thoughts systematically (e.g., creating mind maps), which assists them in answering questions or drawing conclusions precisely and quickly. In light of this, we propose a novel reasoning approach named Concise and Organized Perception (COP). COP carefully analyzes the given statements to efficiently identify the most pertinent information while eliminating redundancy. It then prompts the LLMs in a more organized form that adapts to the model's inference process. By perceiving concise and organized proofs, the deductive reasoning abilities of LLMs can be better elicited, and the risk of acquiring errors caused by excessive reasoning stages is mitigated. Furthermore, our approach can be combined with the aforementioned ones to further boost their performance. Extensive experimental results on three popular deductive benchmarks (i.e., ProofWriter, PrOntoQA and PrOntoQA-OOD) show that COP significantly outperforms previous state-of-the-art methods.
    摘要 利用大语言模型(LLM)处理演绎推理已得到越来越多的关注。然而,在复杂的演绎问题上仍然很难取得令人满意的结果:这类问题包含大量前提(即事实或规则),蕴含实体之间错综复杂的关系,并需要多跳推理。一种直观的解决方案是将原始任务分解为较小的子任务,然后以前向(如 Selection-Inference)或后向(如 LAMBADA)的方式将多个因果推理步骤串联起来。然而,这些技术不可避免地需要大量的整体阶段,导致计算开销高昂,并增加产生误导性步骤的可能性。除了逐阶段分解之外,我们还从人类解决问题的另一个方面获得启发:人类倾向于提炼最相关的信息并系统地组织自己的思路(例如绘制思维导图),这有助于他们快速且准确地回答问题或得出结论。基于此,我们提出了一种新的推理方法,称为简明有序感知(COP)。COP 仔细分析给定的陈述,高效识别最相关的信息并消除冗余,然后以更有组织、且与模型推理过程相适应的形式提示 LLM。通过感知简明且有序的证明,可以更好地激发 LLM 的演绎推理能力,并降低因推理阶段过多而产生错误的风险。此外,我们的方法可以与上述方法结合使用,进一步提升其性能。我们在三个流行的演绎推理基准(ProofWriter、PrOntoQA 和 PrOntoQA-OOD)上进行了大量实验,结果表明 COP 显著优于先前的最先进方法。
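
The two steps the abstract attributes to COP — pruning to the most pertinent premises and presenting them in an organized form — can be sketched with a naive lexical-overlap filter and a structured prompt. This is a toy approximation of the idea, not the paper's algorithm; the relevance score and prompt wording are assumptions.

```python
def select_relevant(premises: list[str], question: str, k: int = 3) -> list[str]:
    """Keep the k premises sharing the most words with the question (toy relevance score)."""
    q_words = set(question.lower().split())
    scored = sorted(premises, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(premises: list[str], question: str) -> str:
    """Present the retained facts as a numbered list before the question."""
    lines = [f"{i + 1}. {p}" for i, p in enumerate(premises)]
    return "Relevant facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer step by step."

facts = [
    "The cat is an animal.",
    "All animals are mortal.",
    "The sky is blue.",
    "If something is mortal, it will eventually die.",
]
q = "Will the cat eventually die?"
print(build_prompt(select_relevant(facts, q), q))
```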

Benchmarking Large Language Models As AI Research Agents

  • paper_url: http://arxiv.org/abs/2310.03302
  • repo_url: https://github.com/snap-stanford/mlagentbench
  • paper_authors: Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
  • for: MLAgentBench is a suite of ML tasks for benchmarking AI research agents, allowing them to perform actions like reading/writing files, executing code, and inspecting outputs.
  • methods: The benchmark evaluates the agent's performance objectively over various metrics related to performance and efficiency, and an LLM-based research agent is designed to automatically perform experimentation loops in such an environment.
  • results: A GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions, but success rates vary considerably, from nearly 90% on well-established older datasets to around 10% on recent Kaggle Challenges and 0% on newer research challenges such as BabyLM, and the agent faces challenges such as long-term planning and hallucination.
    Abstract Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks, we focus on the problem of machine learning engineering: given a task description and a dataset, build a high-performing model. In this paper, we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents. Agents can perform actions like reading/writing files, executing code, and inspecting outputs. With these actions, agents could run experiments, analyze the results, and modify the code of entire machine learning pipelines, such as data processing, architecture, training processes, etc. The benchmark then automatically evaluates the agent's performance objectively over various metrics related to performance and efficiency. We also design an LLM-based research agent to automatically perform experimentation loops in such an environment. Empirically, we find that a GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions. However, the success rates vary considerably; they span from almost 90\% on well-established older datasets to as low as 10\% on recent Kaggle Challenges -- unavailable during the LLM model's pretraining -- and even 0\% on newer research challenges like BabyLM. Finally, we identify several key challenges for LLM-based research agents such as long-term planning and hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
    摘要 Translation (Simplified Chinese):科学实验涉及到一个迭代的过程,包括创建假设、设计实验、运行实验和分析结果。我们是否可以建立AI研究代理来完成这些长期决策任务?为了实现这一目标,我们将关注机器学习工程问题:给定任务描述和数据集,建立高性能的模型。在这篇论文中,我们提出了MLAgentBench,一个用于评估AI研究代理的ML任务集。代理可以执行如读写文件、执行代码和检查输出等动作。通过这些动作,代理可以运行实验、分析结果并修改整个机器学习管道,包括数据处理、架构、训练过程等。然后,比较器会自动评估代理的表现,并对其表现进行对比。我们还设计了一个基于LLM的研究代理,可以自动完成实验循环。我们的实验表明,一个基于GPT-4的研究代理可以在MLAgentBench中建立吸引人的ML模型,并显示出高度可读的计划和操作。然而,成功率很大,从 almost 90% 到 recent Kaggle Challenges 的10% ,甚至到 newer research challenges like BabyLM 的0%。最后,我们确定了一些关键挑战,包括长期规划和幻觉。我们的代码发布在 https://github.com/snap-stanford/MLAgentBench。

LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

  • paper_url: http://arxiv.org/abs/2310.03294
  • repo_url: https://github.com/rulinshao/lightseq
  • paper_authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
  • for: 本研究旨在提高大语言模型(LLMs)的训练context长度,但是这会增加训练的内存占用。现有的分布式系统,如Megatron-LM,通过分解并并行计算不同的注意头,但是这会导致大量的通信量,因此无法扩展。
  • methods: 本研究提出了一种新的方法——LightSeq,用于长context LLMs 的训练。LightSeq通过分解序列维度来实现,因此不受模型结构的限制,可以应用于不同的注意头数量模型,如多头注意、多个查询注意和分组查询注意。LightSeq比Megatron-LM需要更少的通信,并且可以重合计算和通信。
  • results: 通过对Llama-7B和其变种进行详细的单节和跨节训练测试,我们发现LightSeq可以 дости到1.24-2.01倍的总体速度提升,并可以支持更长的序列长度(32K-512K)。相比Megatron-LM,LightSeq可以减少4.7倍的通信量,并且实现了更高效的训练。
    Abstract Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering its adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLMs training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass an forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
    摘要 增加大语言模型(LLM)的上下文长度可以解锁新的功能,但也会增加训练时的内存占用。现有的模型并行系统,如Megatron-LM,通过分布式计算不同的注意头,以并行计算方式来降低通信量,但是这种方法无法扩展到更多的注意头,因此限制了其应用。在这篇论文中,我们介绍了一种新的方法——LightSeq,用于长上下文LLM的训练。LightSeq具有多个优势。首先,LightSeq分配在序列维度上,因此不受模型结构限制,可以应用于不同数量的注意头,如多头注意、多Query注意和分组Query注意。其次,LightSeq相比Megatron-LM需要4.7倍少的通信量,并且可以在计算和通信之间进行 overlap。为了进一步减少训练时间,LightSeq还提供了一种独特的梯度检查点 schemes,以快速地缓存减少计算注意。我们在Llama-7B和其变种上进行了广泛的实验,并证明了LightSeq可以达到1.24-2.01倍的综合速度,并在模型中有更多的注意头时可以处理更长的序列长度。代码将在https://github.com/RulinShao/LightSeq上提供。

SoK: Access Control Policy Generation from High-level Natural Language Requirements

  • paper_url: http://arxiv.org/abs/2310.03292
  • repo_url: None
  • paper_authors: Sakuna Harinda Jayasundara, Nalin Asanka Gamagedara Arachchilage, Giovanni Russello
  • for: 防止管理员中心化访问控制失败,以避免数据泄露和组织受到金融损失和声誉损害。
  • methods: 已有图形策略配置工具和自动生成策略框架,帮助管理员配置和生成访问控制策略,以避免such failures。但是,图形策略配置工具容易出现人工错误,而自动生成策略框架容易出现错误预测,因此需要改进其可用性和可靠性。
  • results: 通过系统性文献回顾分析49篇论文,发现现有工具和框架具有限制,需要改进以提高可用性和可靠性。
    Abstract Administrator-centered access control failures can cause data breaches, putting organizations at risk of financial loss and reputation damage. Existing graphical policy configuration tools and automated policy generation frameworks attempt to help administrators configure and generate access control policies by avoiding such failures. However, graphical policy configuration tools are prone to human errors, making them unusable. On the other hand, automated policy generation frameworks are prone to erroneous predictions, making them unreliable. Therefore, to find ways to improve their usability and reliability, we conducted a Systematic Literature Review analyzing 49 publications, to identify those tools, frameworks, and their limitations. Identifying those limitations will help develop effective access control policy generation solutions while avoiding access control failures.
    摘要 管理员中心的访问控制失败可导致数据泄露,使组织面临金融损失和声誉损害的风险。现有的图形策略配置工具和自动策略生成框架尝试帮助管理员配置和生成访问控制策略,以避免这些失败。然而,图形策略配置工具容易出现人为错误,使其不可用。相反,自动策略生成框架容易出现错误预测,使其不可靠。因此,为了改善其可用性和可靠性,我们进行了系统性文献综述,分析了49篇论文,以识别这些工具、框架和其限制,以帮助开发有效的访问控制策略生成解决方案,并避免访问控制失败。

A 5’ UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

  • paper_url: http://arxiv.org/abs/2310.03281
  • repo_url: None
  • paper_authors: Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang
  • for: The paper is written to introduce a language model for 5’ UTR (UTR-LM) to predict the translation efficiency and mRNA expression level.
  • methods: The UTR-LM is pre-trained on endogenous 5’ UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. The model is fine-tuned in a variety of downstream tasks.
  • results: The UTR-LM outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Experiment results confirmed that the top designs achieved a 32.5% increase in protein production level relative to well-established 5’ UTRs optimized for therapeutics.
    Abstract The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
    摘要 “5' UTR,一个调节区域,位于mRNA分子的起始处,对翻译过程进行重要调节,并影响蛋白质表达水平。语音模型已经展示了它们可以解读蛋白质和基因序列的功能。在这里,我们引入了5' UTR的语音模型,我们称之为UTR-LM。UTR-LM在多种生物体中的组合式训练中进行预训练,并且受到次要结构和最小自由能的指导。我们在多个下游任务中精确调整UTR-LM。模型比最佳参考基准高达42% для预测蛋白质载入平均值,并高达60% для预测翻译效率和mRNA表达水平。模型还应用于识别未被评估的Internal Ribosome Entry Sites(iRES),并提高AUPR从0.37提升至0.52,比最佳基eline高出35%。此外,我们设计了211个新的5' UTR,预测的翻译效率高,并通过湿库实验验证。结果显示,我们的顶部设计可以提高蛋白质生产水平32.5%,相比于已知的5' UTR优化 для医药。”
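
As a rough, self-contained stand-in for the downstream regression task (not the pretrained UTR-LM itself), the sketch below featurizes 5' UTR sequences with k-mer counts and fits a ridge regressor to a mean-ribosome-loading-style target; the sequences and labels are synthetic and only illustrate the sequence-to-property setup.

```python
import itertools
import numpy as np
from sklearn.linear_model import Ridge

def kmer_features(seq: str, k: int = 3) -> np.ndarray:
    """Count all k-mers over the A/C/G/U alphabet (a crude substitute for LM embeddings)."""
    vocab = ["".join(p) for p in itertools.product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    x = np.zeros(len(vocab))
    for i in range(len(seq) - k + 1):
        x[index[seq[i:i + k]]] += 1
    return x

rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGU"), size=50)) for _ in range(100)]
X = np.stack([kmer_features(s) for s in seqs])
y = X[:, 0] * 0.3 + rng.normal(scale=0.1, size=100)   # synthetic MRL-like target

model = Ridge(alpha=1.0).fit(X[:80], y[:80])
print("held-out R^2:", round(model.score(X[80:], y[80:]), 3))
```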

Network Alignment with Transferable Graph Autoencoders

  • paper_url: http://arxiv.org/abs/2310.03272
  • repo_url: https://github.com/graphmatching/graph-matching
  • paper_authors: Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro
  • for: 提高网络对齐的精度和效率,使得网络对齐可以在大规模 graphs 上进行。
  • methods: 提出一种基于自适应神经网络的普适 graph autoencoder 框架,通过提取节点嵌入来实现网络对齐。该框架可以利用传输学习和数据增强来实现高效的网络对齐。
  • results: 实验表明,提出的方法可以在实际世界 graphs 上进行高精度、高效的网络对齐,而且不需要重新训练。
    Abstract Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
    摘要 网络对齐是指将不同图像的节点进行一对一对应,并具有许多应用于高影响领域。然而,这个任务已知为NP困难的普通形式,现有的算法无法随图像大小增长缩放。为了解决这两个挑战,我们提出了一种新的通用图自编码器架构,用于提取强大和可靠的节点嵌入,特化于对齐任务。我们证明了生成的嵌入是对图像的特征值和特征向量相关的,并可以在比 классическихspectral方法更高精度的情况下进行对齐。我们的提出的框架还利用了传输学习和数据扩展来实现大规模的网络对齐,无需重新训练。广泛的实验表明,我们的方法可以在真实世界的图像上实现高精度和可扩展的网络对齐。
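
The basic recipe — embed the nodes of each graph, then match embeddings one-to-one — can be shown with spectral node features and an optimal assignment. This is a simplified stand-in for the transferable graph autoencoder in the paper, using the absolute leading eigenvectors as sign-invariant embeddings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def spectral_embed(adj: np.ndarray, dim: int = 3) -> np.ndarray:
    """Embed nodes with absolute leading eigenvectors (toy stand-in for learned embeddings)."""
    vals, vecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(vals))[:dim]
    return np.abs(vecs[:, order])

rng = np.random.default_rng(0)
a = np.triu((rng.random((8, 8)) < 0.4).astype(float), 1)
a = a + a.T                                 # undirected graph G1
perm = rng.permutation(8)
b = a[np.ix_(perm, perm)]                   # G2: a relabeled copy of G1

cost = np.linalg.norm(spectral_embed(a)[:, None] - spectral_embed(b)[None, :], axis=-1)
row, col = linear_sum_assignment(cost)      # one-to-one correspondence G1 -> G2
accuracy = float(np.mean(col == np.argsort(perm)))
print("fraction of correctly aligned nodes:", accuracy)
```

On graphs with structurally equivalent nodes such naive spectral features collide, which is one motivation for the learned, transferable embeddings the paper proposes.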

Sparse Deep Learning for Time Series Data: Theory and Applications

  • paper_url: http://arxiv.org/abs/2310.03243
  • repo_url: None
  • paper_authors: Mingxuan Zhang, Yan Sun, Faming Liang
  • for: 这篇论文的目的是提高深度学习网络在不同类型数据上的表现,特别是在不确定性量化、变数选择和大规模网络压缩等领域。
  • methods: 本论文使用的方法是简单深度学习,并研究了这种方法在相依数据上的应用。研究结果显示,简单深度学习可以在相依数据上适当地训练,并且可以正确地量化预测uncertainty。
  • results: 本论文的numerical results显示,简单深度学习可以在时间序列数据上进行更好的预测uncertainty量化,并且可以正确地决定时间序列中的自相依关系。此外,本论文的结果显示,简单深度学习可以在大规模网络压缩中表现更好,并且可以正确地识别时间序列中的自相依关系。
    Abstract Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    摘要 sparse deep learning 已成为深度学习中提高性能的受欢迎技术,特别是在不确定量评估、变量选择和大规模网络压缩等领域。然而,大多数现有研究都集中在独立相同分布(i.i.d)的问题上,尚未对相关的问题进行研究,如时间序列数据和自然语言处理中的序列数据。这篇论文想要填补这一差距,通过研究依赖数据的概率理论,来探讨这些问题。我们显示了 sparse RNN 可以透明地估算,其预测值在适当假设下是均匀分布的,从而正确地评估预测uncertainty。我们的numerical结果表明, sparse deep learning 在时间序列数据中的预测uncertainty评估方面超过了现有的方法,如 конформаль预测,并且在大规模模型压缩方面也表现出了优异性。我们的提议方法在金融、医疗和能源等领域有重要实践意义,因为它们都需要准确的点估计和预测uncertainty评估。
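
One claim that is easy to illustrate is autoregressive-order identification via sparsity: fit a model over a generous maximum lag with an L1 penalty and read off which lags survive. The Lasso below is a linear stand-in for the sparse RNN studied in the paper; the data are a simulated AR(2) series.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, max_lag = 500, 10
x = np.zeros(n)
for t in range(2, n):                        # true AR(2) process
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.1)

# Design matrix of lagged values up to max_lag.
X = np.stack([x[max_lag - k: n - k] for k in range(1, max_lag + 1)], axis=1)
y = x[max_lag:]

coef = Lasso(alpha=0.01).fit(X, y).coef_
selected = np.flatnonzero(np.abs(coef) > 1e-3) + 1
print("selected lags:", selected.tolist())   # ideally {1, 2}
```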

Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization

  • paper_url: http://arxiv.org/abs/2310.03234
  • repo_url: None
  • paper_authors: Quanqi Hu, Dixian Zhu, Tianbao Yang
  • for: investigate new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO).
  • methods: examine non-smooth weakly-convex FCCO, analyze a single-loop algorithm, and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function.
  • results: extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, and explore applications in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
    Abstract This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
    摘要 这篇论文研究了一类新的组合优化问题,即非光滑弱凸有限和耦合组合优化(NSWC FCCO)。由于 FCCO 在机器学习和人工智能中的广泛应用,以及它能够弥补基于经验风险最小化的随机算法的不足,相关研究受到越来越多的关注。然而,现有的 FCCO 研究假设内层和外层函数都是光滑的,限制了其能够处理的问题范围。我们的研究扩展了这一领域,考察非光滑弱凸 FCCO,其中外层函数是弱凸且非递减的,内层函数是弱凸的。我们分析了一种单循环算法,并建立了其寻找目标函数 Moreau 包络的 $\epsilon$-稳定点的复杂度。此外,我们还将该算法扩展到求解新的非光滑弱凸三层有限和耦合组合优化问题,该问题具有三个函数的嵌套结构。最后,我们探讨了算法在深度学习中的应用,包括双向部分 AUC 最大化和多实例双向部分 AUC 最大化,并通过实验研究展示了所提算法的有效性。

Deep Representations of First-person Pronouns for Prediction of Depression Symptom Severity

  • paper_url: http://arxiv.org/abs/2310.03232
  • repo_url: None
  • paper_authors: Xinyang Ren, Hannah A Burkhardt, Patricia A Areán, Thomas D Hull, Trevor Cohen
  • for: 本研究使用文本数据分析个人心理状态,尤其是抑郁症状的严重程度。
  • methods: 研究使用了Contextualized language representation models来生成首人宾词的上下文嵌入,以捕捉首人宾词在语料中的使用方式。
  • results: 研究结果表明,使用上下文嵌入的首人宾词表现出色于标准分类token嵌入和频率分析结果,在预测抑郁症状严重程度方面表现出优异。这表明Contextual representations of first-person pronouns可以增强语言使用的预测性能。
    Abstract Prior work has shown that analyzing the use of first-person singular pronouns can provide insight into individuals' mental status, especially depression symptom severity. These findings were generated by counting frequencies of first-person singular pronouns in text data. However, counting doesn't capture how these pronouns are used. Recent advances in neural language modeling have leveraged methods generating contextual embeddings. In this study, we sought to utilize the embeddings of first-person pronouns obtained from contextualized language representation models to capture ways these pronouns are used, to analyze mental status. De-identified text messages sent during online psychotherapy with weekly assessment of depression severity were used for evaluation. Results indicate the advantage of contextualized first-person pronoun embeddings over standard classification token embeddings and frequency-based pronoun analysis results in predicting depression symptom severity. This suggests contextual representations of first-person pronouns can enhance the predictive utility of language used by people with depression symptoms.
    摘要 先前的研究表明,分析第一人称单数代词的使用可以洞察个体的心理状态,尤其是抑郁症状的严重程度。这些发现是通过统计文本数据中第一人称单数代词的出现频率得到的,然而频率统计无法捕捉这些代词的使用方式。近年来,神经语言建模的进展带来了生成上下文嵌入的方法。在本研究中,我们利用上下文化语言表示模型得到的第一人称代词嵌入来刻画这些代词的使用方式,从而分析心理状态。我们使用在线心理治疗过程中发送的去标识化短信,并结合每周的抑郁严重程度评估进行评价。结果表明,在预测抑郁症状严重程度方面,上下文化的第一人称代词嵌入优于标准分类 token 嵌入和基于频率的代词分析。这表明第一人称代词的上下文表示能够提升对抑郁症状人群语言使用的预测效用。
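
Extracting contextual embeddings of first-person singular pronouns, as opposed to merely counting them, can be sketched with an off-the-shelf encoder. This is a generic illustration, not the paper's pipeline: the model choice, pooling strategy, and example sentence are assumptions, and the study used de-identified therapy messages rather than toy text.

```python
import torch
from transformers import AutoModel, AutoTokenizer

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def pronoun_embedding(text: str) -> torch.Tensor:
    """Average the contextual vectors of first-person singular pronoun tokens."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, hidden)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = [i for i, tok in enumerate(tokens) if tok in FIRST_PERSON]
    return hidden[idx].mean(dim=0) if idx else hidden.mean(dim=0)

vec = pronoun_embedding("I feel like nothing I do ever works out for me.")
print(vec.shape)   # a single vector that could feed a severity regressor
```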

Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms

  • paper_url: http://arxiv.org/abs/2310.03225
  • repo_url: None
  • paper_authors: Akifumi Wachi, Wataru Hashimoto, Xun Shen, Kazumune Hashimoto
  • for: 本研究旨在提供一种通用安全探索(Generalized Safe Exploration,GSE)问题的统一形式,以及一种基于无约束RL算法和不确定度量表示的安全探索方法MASE,以确保在当前 episoden 中的安全性,并避免未来 episoden 中的安全性抵触。
  • methods: 本研究使用了一种基于Generalized Linear Models(GLMs)的隐藏 MARGE 方法,以及一种 combine 了 Gaussian Process 和 Deep RL 算法的 variant。
  • results: 实验结果表明,相比之前的状态 искусственный智能算法,MASE 可以在 grid-world 和 Safety Gym 测试环境中实现更好的性能,而不需要违反任何安全约束,即使在训练过程中。
    Abstract Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
    摘要 安全探索是重要的实用应用强化学习(RL)的前提。在这篇论文中,我们提出一种通用安全探索(GSE)问题的总体形式,并提出一种解决GSE问题的元算法MASE,该算法结合不受限制的RL算法和不确定度量表来保证当前pisode中的安全性,并正确惩罚不安全的探索,以避免将来的episode中的安全性被违反。MASE的优点在于,我们可以在合理的假设下优化策略,同时保证高概率下不会违反任何安全约束。我们采用两种不同的构建不确定度量表的MASE变体:一种基于泛化线性模型,具有安全性和优化性的理论保证;另一种 combining Gaussian process ensure safety with a deep RL algorithm to maximize the reward.最后,我们证明我们提出的算法在Grid-world和Safety Gym benchmark上比现有算法更好的性能,而不违反任何安全约束,甚至在训练过程中。

Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.03221
  • repo_url: https://github.com/yijia-xiao/know2bio
  • paper_authors: Yijia Xiao, Dylan Steinecke, Alexander Russell Pelletier, Yushi Bai, Peipei Ping, Wei Wang
  • for: 这个论文目的是提出一个通用的生物医学知识 graphs(KG)测试集,以便用于生物医学知识 repre sentation学习。
  • methods: 这个论文使用了多种数据源,并将这些数据源中的信息集成到一个KG中,以捕捉生物医学领域的复杂关系。它还可以自动更新,以适应最新的生物医学知识。
  • results: 研究人员通过在Know2BIO上评估知识 repre sentation模型,发现Know2BIO可以作为生物医学领域中知识 repre sentation学习的标准测试集。
    Abstract Knowledge graphs (KGs) have emerged as a powerful framework for representing and integrating complex biomedical information. However, assembling KGs from diverse sources remains a significant challenge in several aspects, including entity alignment, scalability, and the need for continuous updates to keep pace with scientific advancements. Moreover, the representative power of KGs is often limited by the scarcity of multi-modal data integration. To overcome these challenges, we propose Know2BIO, a general-purpose heterogeneous KG benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories. It currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data: node features including text descriptions, protein and compound sequences and structures, enabling the utilization of emerging natural language processing methods and multi-modal data integration strategies. We evaluate KG representation models on Know2BIO, demonstrating its effectiveness as a benchmark for KG representation learning in the biomedical field. Data and source code of Know2BIO are available at https://github.com/Yijia-Xiao/Know2BIO/.
    摘要 知识图(KG)在生物医学领域已经出现为表示和集成复杂生物医学信息的强大框架。然而,从多种来源组装KG仍然是一个重要的挑战,包括实体对应、可扩展性和需要不断更新以保持科学进步的速度。此外,KG的表达力 часто受到多模态数据集成的限制。为了解决这些挑战,我们提出了知2生物(Know2BIO),一个通用的生物医学领域多模态KG Benchmark。知2生物从30种多样化来源中提取了11类生物医学信息,涵盖了复杂的实体关系,目前包含约219,000个节点和6,200,000个边。知2生物支持用户指导的自动更新,以反映最新的生物医学知识。此外,知2生物还附带了多模态数据,包括节点特征文本描述、蛋白质和化合物序列和结构,这使得可以利用生成的自然语言处理方法和多模态数据集成策略。我们在知2生物上评估KG表示模型,证明其在生物医学领域KG表示学习的有效性。数据和源代码可以在https://github.com/Yijia-Xiao/Know2BIO/ obtained。

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

  • paper_url: http://arxiv.org/abs/2310.03218
  • repo_url: https://github.com/yupeiyu98/diffusion-amortized-mcmc
  • paper_authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
  • for: The paper is written for learning latent space Energy-Based Models (EBMs) with long-run Markov Chain Monte Carlo (MCMC) sampling, to address the issue of degenerate MCMC sampling quality in practice.
  • methods: The paper introduces a simple but effective diffusion-based amortization method for long-run MCMC sampling, and develops a novel learning algorithm for the latent space EBM based on it.
  • results: The paper provides theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler, and demonstrates superior performance of the method on several image modeling benchmark datasets compared with strong counterparts.
    Abstract Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts
    摘要 隐空间能量模型(EBMs),也称为基于能量的先验,因其形式上的灵活性和对隐空间强大的建模能力,在生成建模领域受到越来越多的关注。然而,通常使用非收敛的短程 MCMC 进行先验与后验采样来学习隐空间 EBM 的做法阻碍了模型的进一步发展;实践中退化的 MCMC 采样质量常常导致生成质量下降和训练不稳定,在目标分布高度多峰和/或高维时尤为明显。为了解决这一采样问题,本文提出了一种简单而有效的基于扩散的摊销方法用于长程 MCMC 采样,并在此基础上为隐空间 EBM 开发了一种新的学习算法。我们提供了理论证据,表明所学习的 MCMC 摊销是一个有效的长程 MCMC 采样器。在多个图像建模基准数据集上的实验表明,我们的方法相比强基线具有更优的性能。

cs.CL - 2023-10-05

Exploring the evolution of research topics during the COVID-19 pandemic

  • paper_url: http://arxiv.org/abs/2310.03928
  • repo_url: None
  • paper_authors: Francesco Invernici, Anna Bernasconi, Stefano Ceri
  • for: 这研究旨在提供一种方法和可视化工具,用于检查COVID-19开放研究数据集(CORD-19)的科学摘要文章。
  • methods: 该方法基于选择最新技术(包括大语言模型),实现了对文章集成 orthogonal 维度的 clustering 和时间主题挖掘技术。
  • results: 该方法可以快速、一键 inspect 文章主题内容,并提供时间序列图表和 word cloud 图表,以便对任意时间窗口中主题的出现进行统计测试。
    Abstract The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, by collecting and indexing over one million articles. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. The processes of data preparation and results visualization are completely general and virtually applicable to any corpus of textual documents - thus suited for effective adaptation to other contexts.
    摘要 COVID-19 流行病已经对大多数科学社区的研究议程产生了深见的影响,导致了一些领域的研究文章急剧增加,包括医学、病毒学、流行病学、经济学、心理学等等。此外,一些开放获取的数据库和文献庐也被建立起来,其中COVID-19开放研究数据集(CORD-19)在过去2.5年内系统地收集和索引了大量的科学论文。在这里,我们介绍了CORD-19话题可视化工具(CORToViz),它是基于最新的技术(包括大语言模型)的方法和相应的可视化工具,用于探索COVID-19的文本数据库中的科学摘要。我们的方法包括对各个维度进行分 clustering 和时间序列分析等技术,以及一个交互式的可视化面板,可以快速地Visualize 摘要中的话题内容为云图和时间序列图表,同时提供了一些简单易用的统计测试,以分析选定时间窗口中话题的出现是否为 statistically significant。数据准备和结果可视化的过程是完全通用的,可以方便地适应到其他文本数据库上。
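
The pipeline the abstract describes — cluster abstracts, then show each topic as top terms and a trend over time — can be approximated with TF-IDF plus k-means. This is only a schematic stand-in for CORToViz (which relies on large-language-model embeddings and an interactive dashboard); the documents and dates below are made up.

```python
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "vaccine efficacy trial immune response",
    "mrna vaccine booster antibody levels",
    "school closures economic impact lockdown",
    "remote work economy unemployment lockdown",
    "long covid fatigue symptoms patients",
    "persistent symptoms fatigue long covid cohort",
]
months = ["2020-06", "2020-07", "2020-06", "2020-08", "2021-01", "2021-02"]

vec = TfidfVectorizer()
X = vec.fit_transform(abstracts)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

terms = np.array(vec.get_feature_names_out())
for c in range(3):
    top = terms[np.asarray(X[labels == c].mean(axis=0)).ravel().argsort()[::-1][:3]]
    trend = Counter(m for m, l in zip(months, labels) if l == c)   # topic frequency per month
    print(f"topic {c}: {', '.join(top)} | trend: {dict(trend)}")
```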

Evaluating Multi-Agent Coordination Abilities in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03903
  • repo_url: None
  • paper_authors: Saaket Agashe, Yue Fan, Xin Eric Wang
  • for: 这项研究的目标是开发能够与人类和其他系统合作 efectively 的多智能体代理人。
  • methods: 这项研究使用了 Large Language Models (LLMs),可以理解、生成和解释人类语言的方式,以开发多智能体 coordination 代理人。
  • results: 研究表明,使用 LLMs 可以在多智能体协调场景中实现高效的协调,包括理解伙伴的意图、 reasoning 行为、持续协调和对不熟悉的伙伴的Robustness。此外,研究还发现 LLMS 可以在 Overcooked-AI benchmark 中提供有用的帮助,并且可以快速学习和适应新的协调场景。
    Abstract A pivotal aim in contemporary AI research is to develop agents proficient in multi-agent coordination, enabling effective collaboration with both humans and other systems. Large Language Models (LLMs), with their notable ability to understand, generate, and interpret language in a human-like manner, stand out as promising candidates for the development of such agents. In this study, we build and assess the effectiveness of agents crafted using LLMs in various coordination scenarios. We introduce the LLM-Coordination (LLM-Co) Framework, specifically designed to enable LLMs to play coordination games. With the LLM-Co framework, we conduct our evaluation with three game environments and organize the evaluation into five aspects: Theory of Mind, Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance. First, the evaluation of the Theory of Mind and Situated Reasoning reveals the capabilities of LLM to infer the partner's intention and reason actions accordingly. Then, the evaluation around Sustained Coordination and Robustness to Partners further showcases the ability of LLMs to coordinate with an unknown partner in complex long-horizon tasks, outperforming Reinforcement Learning baselines. Lastly, to test Explicit Assistance, which refers to the ability of an agent to offer help proactively, we introduce two novel layouts into the Overcooked-AI benchmark, examining if agents can prioritize helping their partners, sacrificing time that could have been spent on their tasks. This research underscores the promising capabilities of LLMs in sophisticated coordination environments and reveals the potential of LLMs in building strong real-world agents for multi-agent coordination.
    摘要 当代人工智能研究的核心目标是开发多智能体协作的能力,以便和人类以及其他系统有效协作。大型自然语言模型(LLM)因其能够理解、生成和解释人类语言方式而出众,因此在开发这类多智能体协作代理人方面表现出了扎实的潜力。在这项研究中,我们采用LLM-Coordination(LLM-Co)框架,以便LLM在协作游戏中表现出色。我们通过三个游戏环境进行评估,并将评估分为五个方面:理解伙伴意图、地域思维、持续协作、对伙伴强健和显式帮助。经过评估,我们发现LLM在理解伙伴意图和地域思维方面具有出色的能力,并在持续协作和对伙伴强健方面超越了强化学习基eline。最后,为了测试显式帮助,我们在Overcooked-AI bencmark中引入了两个新的布局,以测试代理人是否可以主动为伙伴提供帮助,牺牲一些时间来完成自己的任务。这项研究表明LLM在复杂多智能体协作环境中的潜力,并探讨LLM在实际世界中建立强大的多智能体协作代理人的可能性。

Trustworthy Formal Natural Language Specifications

  • paper_url: http://arxiv.org/abs/2310.03885
  • repo_url: None
  • paper_authors: Colin S. Gordon, Sergey Matskevich
  • for: 这篇论文目的是提供一种在现有的证明助手中支持表达自然语言Specification的方法,以便在证明软件正确性时更好地利用自然语言specification。
  • methods: 这篇论文使用了一种基于现有证明助手的方法,即使用一种可以自动将自然语言specification翻译成正式laims的方法。这种方法是可扩展的,可以轻松地添加新的词汇和语法结构,并且可以生成证明证书,解释每个词语的解释和句子结构如何计算意思。
  • results: 这篇论文的实验结果表明,使用这种方法可以正确地翻译多种来自popular textbook的英语描述 formal specifications into Lean formalizations,而无需大量修改词汇库。
    Abstract Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness. This paper shows that it is possible to build support for specifications written in expressive subsets of natural language, within existing proof assistants, consistent with the principles used to establish trust and auditability in proof assistants themselves. We implement a means to provide specifications in a modularly extensible formal subset of English, and have them automatically translated into formal claims, entirely within the Lean proof assistant. Our approach is extensible (placing no permanent restrictions on grammatical structure), modular (allowing information about new words to be distributed alongside libraries), and produces proof certificates explaining how each word was interpreted and how the sentence's structure was used to compute the meaning. We apply our prototype to the translation of various English descriptions of formal specifications from a popular textbook into Lean formalizations; all can be translated correctly with a modest lexicon with only minor modifications related to lexicon size.
    摘要 交互证明助手是计算机程序,它们仔细构建,可以快速地检查人类设计的数学陈述的真实性。然而,这只有确认形式陈述的真实性,而不是自然语言中的陈述。这尤其是在使用证明助手来正式验证软件是否符合自然语言规范时,会出现问题。翻译自然语言中的陈述到形式语言仍然是一项困难的、耗时的任务,难以审核正确性。这篇论文展示了可以在现有的证明助手中支持基于表达ive subset of natural language的规范,并遵循证明助手自己的原则来建立信任和审核性。我们实现了一种方法,可以在Lean证明助手中提供表达ive subset of English的模块化可扩展的 формаль subsets,并自动将自然语言中的陈述翻译成形式索引。我们的方法是可扩展的(不会对语法结构做永久性的限制),可模块化(可以在库中分发信息),并生成证明证明,解释每个单词的解释和句子结构如何计算meaning。我们使用我们的原型将各种自然语言中的英语描述翻译成Lean形式化,所有可以正确地翻译,只需要一个小型词汇库,只需要一些相应的修改。

Automatic and Human-AI Interactive Text Generation

  • paper_url: http://arxiv.org/abs/2310.03878
  • repo_url: https://github.com/na-mrata/3D-Animation
  • paper_authors: Yao Dou, Philippe Laban, Claire Gardent, Wei Xu
  • for: 这篇论文主要研究了文本生成 tasks,具体来说是文本简化和修改任务,以提高文本的可读性和语言风格,而不改变文本的主要含义和长度。
  • methods: 这些任务使用了多种自然语言生成(NLG)技术,包括文本简化、重新译写、风格转换等,以达到提高文本可读性和语言风格的目的。
  • results: 研究人员通过不同的数据集、模型和评估方法来评估和提高文本生成模型的性能,并发现了一些新的技术和方法,如非回退式方法、大语言模型的提前定型、可学习度量和细致人类评估框架等,以提高文本生成的可读性和语言风格。
    Abstract In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks, that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or linguistic styles), while largely retaining the original meaning and the length of the text. This includes many useful applications, such as text simplification, paraphrase generation, style transfer, etc. In contrast to text summarization and open-ended text completion (e.g., story), the text-to-text generation tasks we discuss in this tutorial are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactical transformations, stylistic control, and adherence to factual knowledge, -- all at once. With a special focus on text simplification and revision, this tutorial aims to provide an overview of the state-of-the-art natural language generation research from four major aspects -- Data, Models, Human-AI Collaboration, and Evaluation -- and to discuss and showcase a few significant and recent advances: (1) the use of non-retrogressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of new learnable metric and fine-grained human evaluation framework; (4) a growing body of studies and datasets on non-English languages; (5) the rise of HCI+NLP+Accessibility interdisciplinary research to create real-world writing assistant systems.
    摘要 在这个教程中,我们关注文本到文本生成任务,这是自然语言生成(NLG)任务的一种,它从一段文本输入中生成一个改进后的文本,保持原始意思和长度,同时符合某些特定的标准(如可读性或语言风格)。这包括了许多有用的应用,如文本简化、重叠生成、风格传递等。与文本概要和开放式文本完成(如故事)不同,文本到文本生成任务在Semantic consistency和targeted language styles方面更加具有制约,这使得这些任务成为模型生成文本的semantic adequacy和风格适应能力的 идеальtestbed。此外,这些任务也具有技术上的挑战,需要复杂的词汇和语法变换、风格控制和事实知识的结合,全面来说。本教程将从数据、模型、人工智能合作和评估四个方面提供文本生成领域的现状报告,并讲解和展示一些最近的进步:(1)非退化方法的使用;(2)大语言模型的 Fine-tuning 到提示;(3)开发新的可学习度量和细化人类评估框架;(4)非英语语料的增长和应用;(5)人工智能+计算机科学+访问性研究的协同发展,以创造真实世界的写作助手系统。

Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report

  • paper_url: http://arxiv.org/abs/2310.03874
  • repo_url: None
  • paper_authors: Jason Holmes, Lian Zhang, Yuzhen Ding, Hongying Feng, Zhengliang Liu, Tianming Liu, William W. Wong, Sujay A. Vora, Jonathan B. Ashman, Wei Liu
  • for: 本研究旨在提出使用大型语言模型(LLM)按照 AAPM TG-263 标准重新标注放射肿瘤学中结构名称的概念,并为未来研究建立参考基准。
  • methods: 使用 Generative Pre-trained Transformer(GPT)-4 API 实现一个 DICOM 存储服务器,在接收到结构集 DICOM 文件时,提示 GPT-4 按照美国医学物理学家协会(AAPM)任务组(TG)-263 标准重新标注靶区和正常组织的结构名称。评估选择了三个疾病部位:前列腺、头颈部和胸部;对每个疾病类型,随机选择 150 名患者用于手动调整指令提示(每批 50 名),并随机选择 50 名患者用于评估。
  • results: 前列腺、头颈部和胸部病例中靶区与正常组织的整体重新标注准确率分别为 96.0%、98.5% 和 96.9%。靶区的重新标注准确率平均而言较低,前列腺为 100%,头颈部和胸部分别为 93.1% 和 91.1%。
    Abstract Purpose: To introduce the concept of using large language models (LLMs) to re-label structure names in accordance with the American Association of Physicists in Medicine (AAPM) Task Group (TG)-263 standard, and to establish a benchmark for future studies to reference. Methods and Materials: The Generative Pre-trained Transformer (GPT)-4 application programming interface (API) was implemented as a Digital Imaging and Communications in Medicine (DICOM) storage server, which upon receiving a structure set DICOM file, prompts GPT-4 to re-label the structure names of both target volumes and normal tissues according to the AAPM TG-263. Three disease sites, prostate, head and neck, and thorax were selected for evaluation. For each disease site category, 150 patients were randomly selected for manually tuning the instructions prompt (in batches of 50) and 50 patients were randomly selected for evaluation. Structure names that were considered were those that were most likely to be relevant for studies utilizing structure contours for many patients. Results: The overall re-labeling accuracy of both target volumes and normal tissues for prostate, head and neck, and thorax cases was 96.0%, 98.5%, and 96.9% respectively. Re-labeling of target volumes was less accurate on average except for prostate - 100%, 93.1%, and 91.1% respectively. Conclusions: Given the accuracy of GPT-4 in re-labeling structure names of both target volumes and normal tissues as presented in this work, LLMs are poised to be the preferred method for standardizing structure names in radiation oncology, especially considering the rapid advancements in LLM capabilities that are likely to continue.
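
A minimal sketch of the workflow the abstract describes, with assumptions clearly marked: reading ROI names from an RTSTRUCT file with pydicom and asking an LLM to map them to TG-263 names. The prompt wording and the `query_llm` callable are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch: read structure names from an RTSTRUCT file and ask an LLM
# to map them to AAPM TG-263 names. `query_llm` stands in for whatever GPT-4 API
# wrapper is used; the prompt wording is illustrative, not the authors'.
import json
import pydicom

def extract_structure_names(rtstruct_path: str) -> list[str]:
    ds = pydicom.dcmread(rtstruct_path)
    return [roi.ROIName for roi in ds.StructureSetROISequence]

def relabel_tg263(names: list[str], query_llm) -> dict[str, str]:
    prompt = (
        "Re-label each radiotherapy structure name below according to the "
        "AAPM TG-263 nomenclature. Return a JSON object mapping the original "
        "name to the TG-263 name.\n" + "\n".join(f"- {n}" for n in names)
    )
    return json.loads(query_llm(prompt))  # e.g. {"LT PAROTID": "Parotid_L"}
```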

Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer

  • paper_url: http://arxiv.org/abs/2310.03724
  • repo_url: None
  • paper_authors: Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot
  • for: Improving the competitiveness of modular speech-to-text translation.
  • methods: Independently trained encoders and decoders combined through a shared fixed-size representation, further improved with multilingual training.
  • results: Significant improvements in zero-shot cross-modal speech translation, even outperforming a supervised XLSR-based approach for several languages.
    Abstract Recent research has shown that independently trained encoders and decoders, combined through a shared fixed-size representation, can achieve competitive performance in speech-to-text translation. In this work, we show that this type of approach can be further improved with multilingual training. We observe significant improvements in zero-shot cross-modal speech translation, even outperforming a supervised approach based on XLSR for several languages.

A Long Way to Go: Investigating Length Correlations in RLHF

  • paper_url: http://arxiv.org/abs/2310.03716
  • repo_url: https://github.com/prasanns/rlhf-length-biases
  • paper_authors: Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
  • for: Examining what drives the reported gains of Reinforcement Learning from Human Feedback (RLHF) when aligning large language models.
  • methods: Experiments with open-source preference datasets and reward models aimed at making systems more "helpful" on tasks such as web question answering, summarization, and multi-turn dialogue, plus interventions during both RL and reward model training.
  • results: RLHF's improvements are largely driven by longer outputs: length correlates strongly with reward, gains in reward score come mostly from shifting the output-length distribution, and even a length-only reward reproduces most downstream improvements.
    Abstract Great successes have been reported using Reinforcement Learning from Human Feedback (RLHF) to align large language models. Open-source preference datasets and reward models have enabled wider experimentation beyond generic chat settings, particularly to make systems more "helpful" for tasks like web question answering, summarization, and multi-turn dialogue. When optimizing for helpfulness, RLHF has been consistently observed to drive models to produce longer outputs. This paper demonstrates that optimizing for response length is a significant factor behind RLHF's reported improvements in these settings. First, we study the relationship between reward and length for reward models trained on three open-source preference datasets for helpfulness. Here, length correlates strongly with reward, and improvements in reward score are driven in large part by shifting the distribution over output lengths. We then explore interventions during both RL and reward model learning to see if we can achieve the same downstream improvements as RLHF without increasing length. While our interventions mitigate length increases, they aren't uniformly effective across settings. Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.
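
As a quick illustration of the kind of length-reward analysis the paper performs, the following sketch (my construction, not the authors' code) measures how strongly a reward model's scores correlate with response length; `reward_model` and `responses` are assumed inputs.

```python
# Minimal sketch: check a reward model for length bias by correlating its scores
# with response length. A strong positive correlation suggests reward gains can
# be obtained mostly by generating longer outputs.
import numpy as np
from scipy.stats import pearsonr

def length_reward_correlation(responses, reward_model):
    lengths = np.array([len(r.split()) for r in responses], dtype=float)
    rewards = np.array([reward_model(r) for r in responses], dtype=float)
    r, p = pearsonr(lengths, rewards)
    return r, p
```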

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

  • paper_url: http://arxiv.org/abs/2310.03686
  • repo_url: None
  • paper_authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet
  • for: Interpreting the internal states of encoder-decoder Transformer models layer by layer.
  • methods: DecoderLens, inspired by the LogitLens: the decoder cross-attends to representations from intermediate encoder layers instead of the final encoder output, mapping otherwise uninterpretable vector representations to human-readable sequences of words or symbols.
  • results: Applied to models trained on question answering, logical reasoning, speech recognition, and machine translation, DecoderLens reveals specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder.
    Abstract In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models.
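
A hedged sketch of the DecoderLens idea using a T5 model from Hugging Face Transformers: the decoder is run on an intermediate encoder layer's hidden states instead of the final encoder output. The model name, layer index, and reliance on `generate(encoder_outputs=...)` in recent transformers versions are assumptions made for illustration.

```python
# Hedged sketch of DecoderLens-style probing: decode from an intermediate
# encoder layer of T5 rather than its final output.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.modeling_outputs import BaseModelOutput

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def decoder_lens(text: str, layer: int) -> str:
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        enc = model.encoder(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding layer; pick an intermediate layer's output
    intermediate = BaseModelOutput(last_hidden_state=enc.hidden_states[layer])
    out = model.generate(encoder_outputs=intermediate,
                         attention_mask=inputs["attention_mask"],
                         max_new_tokens=20)
    return tok.decode(out[0], skip_special_tokens=True)

print(decoder_lens("translate English to German: The cat sat on the mat.", layer=3))
```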

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction

  • paper_url: http://arxiv.org/abs/2310.03668
  • repo_url: https://github.com/hitz-zentroa/gollie
  • paper_authors: Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre
  • for: Improving the zero-shot generalization of large language models (LLMs) on unseen Information Extraction (IE) tasks.
  • methods: GoLLIE, an LLM fine-tuned to comply with annotation guidelines, so that it can follow guidelines for tasks it has not seen before.
  • results: GoLLIE successfully follows unseen guidelines and outperforms previous attempts at zero-shot information extraction; an ablation study shows that detailed guidelines are key to good results.
    Abstract Large Language Models (LLMs) combined with instruction tuning have made significant progress when generalizing to unseen tasks. However, they have been less successful in Information Extraction (IE), lagging behind task-specific models. Typically, IE tasks are characterized by complex annotation guidelines which describe the task and give examples to humans. Previous attempts to leverage such information have failed, even with the largest models, as they are not able to follow the guidelines out-of-the-box. In this paper we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of being fine-tuned to comply with annotation guidelines. Comprehensive evaluation empirically demonstrates that GoLLIE is able to generalize to and follow unseen guidelines, outperforming previous attempts at zero-shot information extraction. The ablation study shows that detailed guidelines is key for good results.

TRAM: Bridging Trust Regions and Sharpness Aware Minimization

  • paper_url: http://arxiv.org/abs/2310.03646
  • repo_url: https://github.com/tomsherborne/tram_optimizer
  • paper_authors: Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
  • for: Improving out-of-domain generalization and representation generality during fine-tuning.
  • methods: Trust-region bounds are used to inform SAM-style regularizers in both parameter and representation space; the resulting Trust Region Aware Minimization (TRAM) algorithm optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure.
  • results: TRAM outperforms both sharpness-aware and trust-region-based optimization methods on cross-domain language modeling and cross-lingual transfer, establishing a new standard for training generalizable models with minimal additional computation.
    Abstract By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
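
To make the "flat minima" side of the method concrete, here is a minimal sketch of a plain SAM-style two-step update in PyTorch; TRAM additionally constrains the update with trust-region bounds over parameters and representations, which is not reproduced here.

```python
# Minimal SAM-style update (for illustration only; not TRAM itself).
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    # 1) ascend to the worst-case point within an L2 ball of radius rho
    loss_fn(model, batch).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()
    # 2) compute gradients at the perturbed point, undo the perturbation, then step
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```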

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

  • paper_url: http://arxiv.org/abs/2310.03639
  • repo_url: None
  • paper_authors: Chih-Chen Chen, William Chen, Rodolfo Zevallos, John E. Ortega
  • for: Studying self-supervised learning (SSL) speech representations for automatic speech recognition (ASR) of Indigenous American languages.
  • methods: An ASR corpus for Quechua is presented, and existing large-scale SSL models are benchmarked on low-resource ASR for Quechua and six other indigenous languages such as Guarani and Bribri.
  • results: State-of-the-art SSL models show surprisingly strong performance, suggesting that large-scale models can generalize to real-world low-resource data.
    Abstract The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in terms of pre-training and downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American Language. We benchmark the efficacy of large SSL models on Quechua, along with 6 other indigenous languages such as Guarani and Bribri, on low-resource ASR. Our results show surprisingly strong performance by state-of-the-art SSL models, showing the potential generalizability of large-scale models to real-world data.

Redefining Digital Health Interfaces with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03560
  • repo_url: None
  • paper_authors: Fergus Imrie, Paulius Rauba, Mihaela van der Schaar
  • for: Exploring how large language models (LLMs) can improve the delivery of digital healthcare services.
  • methods: LLMs are combined with external tools to provide a novel interface between clinicians and digital technologies, improving the utility and practical impact of digital health tools and AI models while addressing issues with using LLMs in clinical settings, such as hallucination.
  • results: Illustrated with cardiovascular disease and diabetes risk prediction, the approach shows benefits over traditional interfaces for digital tools.
    Abstract Digital health tools have the potential to significantly improve the delivery of healthcare services. However, their use remains comparatively limited due, in part, to challenges surrounding usability and trust. Recently, Large Language Models (LLMs) have emerged as general-purpose models with the ability to process complex information and produce human-quality text, presenting a wealth of potential applications in healthcare. Directly applying LLMs in clinical settings is not straightforward, with LLMs susceptible to providing inconsistent or nonsensical answers. We demonstrate how LLMs can utilize external tools to provide a novel interface between clinicians and digital technologies. This enhances the utility and practical impact of digital healthcare tools and AI models while addressing current issues with using LLM in clinical settings such as hallucinations. We illustrate our approach with examples from cardiovascular disease and diabetes risk prediction, highlighting the benefit compared to traditional interfaces for digital tools.

PrIeD-KIE: Towards Privacy Preserved Document Key Information Extraction

  • paper_url: http://arxiv.org/abs/2310.03777
  • repo_url: None
  • paper_authors: Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed
  • for: Developing privacy-preserved document Key Information Extraction (KIE) systems by combining large pretrained document foundation models with differential privacy (DP), federated learning (FL), and DP-FL.
  • methods: Extensive experiments on six benchmark datasets (FUNSD, CORD, SROIE, WildReceipts, XFUND, and DOCILE) fine-tune large document foundation models under private settings, and the impact of training and model parameters on the privacy-utility trade-off is analyzed.
  • results: Adequate performance with strong privacy guarantees is achievable; simple yet effective guidelines for an optimal privacy-utility trade-off under global DP are proposed, and the new FeAm-DP algorithm scales global DP to a multi-client federated environment with performance and privacy comparable to standalone DP even as the number of participating clients grows.
    Abstract In this paper, we introduce strategies for developing private Key Information Extraction (KIE) systems by leveraging large pretrained document foundation models in conjunction with differential privacy (DP), federated learning (FL), and Differentially Private Federated Learning (DP-FL). Through extensive experimentation on six benchmark datasets (FUNSD, CORD, SROIE, WildReceipts, XFUND, and DOCILE), we demonstrate that large document foundation models can be effectively fine-tuned for the KIE task under private settings to achieve adequate performance while maintaining strong privacy guarantees. Moreover, by thoroughly analyzing the impact of various training and model parameters on model performance, we propose simple yet effective guidelines for achieving an optimal privacy-utility trade-off for the KIE task under global DP. Finally, we introduce FeAm-DP, a novel DP-FL algorithm that enables efficiently upscaling global DP from a standalone context to a multi-client federated environment. We conduct a comprehensive evaluation of the algorithm across various client and privacy settings, and demonstrate its capability to achieve comparable performance and privacy guarantees to standalone DP, even when accommodating an increasing number of participating clients. Overall, our study offers valuable insights into the development of private KIE systems, and highlights the potential of document foundation models for privacy-preserved Document AI applications. To the best of authors' knowledge, this is the first work that explores privacy preserved document KIE using document foundation models.

Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards

  • paper_url: http://arxiv.org/abs/2310.03473
  • repo_url: None
  • paper_authors: Litton J Kurisinkel, Nancy F. Chen
  • for: Proposing a controllable multi-document summarization approach that uses a large language model (LLM) to refine extracted text for better readability.
  • methods: A controllable content extraction scheme is trained with a novel coverage- and coherence-intuitive policy, rewarded by a passively trained LLM; the extracted content is then refined by the LLM.
  • results: The approach achieves competitive ROUGE scores and outperforms potential baselines on coherence in human evaluation.
    Abstract Memory-efficient large language models are good at refining text input for better readability. However, controllability is a matter of concern when it comes to text generation tasks with long inputs, such as multi-document summarization. In this work, we investigate for a generic controllable approach for multi-document summarization that leverages the capabilities of LLMs to refine the text. In particular, we train a controllable content extraction scheme to extract the text that will be refined by an LLM. The scheme is designed with a novel coverage and coherence intuitive policy, which is duly rewarded by a passively trained LLM. Our approach yields competitive results in the evaluation using ROUGE metrics and outperforms potential baselines in coherence, as per human evaluation.
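
An illustrative sketch of the generic machinery behind such extraction schemes: greedy maximization of a monotone submodular coverage objective, which carries the usual (1 - 1/e) approximation guarantee. This is not the paper's trained policy or its LLM-based reward; `sentences` and `similarity` are assumed inputs (e.g., TF-IDF cosine similarity).

```python
# Greedy selection under a simple monotone submodular coverage objective,
# shown as a generic stand-in for learned content-extraction policies.
def coverage(selected, sentences, similarity):
    # each source sentence is "covered" by its best match among selected sentences
    return sum(max((similarity(s, t) for t in selected), default=0.0) for s in sentences)

def greedy_select(sentences, similarity, budget):
    selected = []
    while len(selected) < budget:
        base = coverage(selected, sentences, similarity)
        best, best_gain = None, 0.0
        for cand in sentences:
            if cand in selected:
                continue
            gain = coverage(selected + [cand], sentences, similarity) - base
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            break
        selected.append(best)
    return selected
```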

The North System for Formosa Speech Recognition Challenge 2023

  • paper_url: http://arxiv.org/abs/2310.03443
  • repo_url: None
  • paper_authors: Li-Wei Chen, Kai-Chen Cheng, Hung-Shin Lee
  • for: Automatic word/syllable recognition for Taiwanese Hakka (Sixian).
  • methods: The report outlines three key components of the system: the acquisition, composition, and utilization of the training data; the model architecture; and the hardware specifications and operational statistics.
  • results: A demonstration of the system is publicly available at https://asrvm.iis.sinica.edu.tw/hakka_sixian.
    Abstract This report provides a concise overview of the proposed North system, which aims to achieve automatic word/syllable recognition for Taiwanese Hakka (Sixian). The report outlines three key components of the system: the acquisition, composition, and utilization of the training data; the architecture of the model; and the hardware specifications and operational statistics. The demonstration of the system has been made public at https://asrvm.iis.sinica.edu.tw/hakka_sixian.

Neural Language Model Pruning for Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2310.03424
  • repo_url: None
  • paper_authors: Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil
  • for: Studying model pruning for Transformer-based neural language models used in automatic speech recognition, to improve accuracy and inference speed.
  • methods: Three aspects of the pruning framework (criterion, method, and scheduler) are analyzed for their contribution to accuracy and inference speed; a variant of low-rank approximation suitable for incrementally compressing models and delivering multiple models with varied target sizes is also proposed.
  • results: (a) Data-driven pruning outperforms magnitude-driven pruning in several scenarios; (b) incremental pruning achieves higher accuracy than one-shot pruning, especially for smaller target sizes; (c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
    Abstract We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
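
A hedged sketch of the generic low-rank step the abstract mentions: compressing a linear layer's weight matrix with a truncated SVD. The paper's incremental variant and its pruning criteria and schedulers are not reproduced here.

```python
# Truncated-SVD factorization of a weight matrix as a generic low-rank
# approximation; replacing W with the pair (A, B) reduces parameters when
# rank << min(out_features, in_features).
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    # weight: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank)
    B = Vh[:rank, :]             # (rank, in_features)
    return A, B                  # y ~= A @ (B @ x)
```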

LLM Based Multi-Document Summarization Exploiting Main-Event Biased Monotone Submodular Content Extraction

  • paper_url: http://arxiv.org/abs/2310.03414
  • repo_url: None
  • paper_authors: Litton J Kurisinkel, Nancy F. Chen
  • for: Producing objective multi-document news summaries that succinctly report the main event of a group of related articles with sufficient context.
  • methods: An extract-rewrite approach: content is selected with a main-event-biased monotone submodular function, then rewritten into coherent text by a fine-tuned language model (LLM).
  • results: Evaluation with objective metrics and human judges shows the approach surpasses potential baselines in content coverage, coherence, and informativeness.
    Abstract Multi-document summarization is a challenging task due to its inherent subjective bias, highlighted by the low inter-annotator ROUGE-1 score of 0.4 among DUC-2004 reference summaries. In this work, we aim to enhance the objectivity of news summarization by focusing on the main event of a group of related news documents and presenting it coherently with sufficient context. Our primary objective is to succinctly report the main event, ensuring that the summary remains objective and informative. To achieve this, we employ an extract-rewrite approach that incorporates a main-event biased monotone-submodular function for content selection. This enables us to extract the most crucial information related to the main event from the document cluster. To ensure coherence, we utilize a fine-tuned Language Model (LLM) for rewriting the extracted content into a coherent text. The evaluation using objective metrics and human evaluators confirms the effectiveness of our approach, as it surpasses potential baselines, demonstrating excellence in both content coverage, coherence, and informativeness.

Evaluating Hallucinations in Chinese Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03368
  • repo_url: https://github.com/xiami2019/halluqa
  • paper_authors: Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu
  • for: HalluQA (Chinese Hallucination Question-Answering), a benchmark for measuring hallucination in Chinese large language models.
  • methods: 450 meticulously designed adversarial questions spanning multiple domains, including Chinese historical culture, customs, and social phenomena; two types of hallucination (imitative falsehoods and factual errors) are considered, adversarial samples are constructed with GLM-130B and ChatGPT, and an automated GPT-4-based evaluation judges whether a model output is hallucinated.
  • results: Across 24 large language models, 18 achieve non-hallucination rates below 50%, showing that HalluQA is highly challenging; the primary hallucination types in different kinds of models, their causes, and which types should be prioritized are analyzed.
    Abstract In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, SparkDesk and etc. Out of the 24 models, 18 achieved non-hallucination rates lower than 50%. This indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models.
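
A hypothetical sketch of the automated evaluation loop: an LLM judge decides whether each model answer is hallucinated, and the non-hallucination rate is aggregated. `answer_model` and `judge_llm` are placeholder callables, and the judging prompt is an assumption rather than the benchmark's exact template.

```python
# Hypothetical LLM-as-judge loop for hallucination evaluation.
def non_hallucination_rate(examples, answer_model, judge_llm) -> float:
    ok = 0
    for ex in examples:  # each ex: {"question": ..., "reference": ...}
        answer = answer_model(ex["question"])
        verdict = judge_llm(
            f"Question: {ex['question']}\nReference facts: {ex['reference']}\n"
            f"Answer: {answer}\nIs the answer free of hallucination? Reply yes or no."
        )
        ok += verdict.strip().lower().startswith("yes")
    return ok / len(examples)
```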

Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise

  • paper_url: http://arxiv.org/abs/2310.03328
  • repo_url: None
  • paper_authors: Zhen wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi
  • for: Improving the applicability of GPT-4 in the Chinese legal domain.
  • methods: An adapt-retrieve-revise process: an affordable 7B LLM is adapted to the target domain by continued training on in-domain data, its draft answer is used to retrieve supporting evidence from an in-domain knowledge base, and GPT-4 assesses the evidence and revises the draft into the final answer.
  • results: In the zero-shot setting of four Chinese legal tasks, accuracy improves by 33.3% over direct generation by GPT-4, and by 15.4% and 23.9% over two stronger retrieval-based baselines.
    Abstract While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that encompasses such a specific domain, preventing GPT-4 from acquiring in-domain knowledge. A pressing challenge is that it's not plausible to continue training LLMs of such scale on in-domain data. This paper introduces a simple and effective domain adaptation framework for GPT-4 by reformulating generation as an \textbf{adapt-retrieve-revise} process. The initial step is to \textbf{adapt} an affordable 7B LLM to the target domain by continuing learning on in-domain data. When solving a task, we leverage the adapted LLM to generate a draft answer given a task query. Then, the draft answer will be used to \textbf{retrieve} supporting evidence candidates from an external in-domain knowledge base. Finally, the draft answer and retrieved evidence are concatenated into a whole prompt to let GPT-4 assess the evidence and \textbf{revise} the draft answer to generate the final answer. Our proposal combines the advantages of the efficiency of adapting a smaller 7B model with the evidence-assessing capability of GPT-4 and effectively prevents GPT-4 from generating hallucinatory content. In the zero-shot setting of four Chinese legal tasks, our method improves accuracy by 33.3\% compared to the direct generation by GPT-4. When compared to two stronger retrieval-based baselines, our method outperforms them by 15.4\% and 23.9\%. Our code will be released
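
A conceptual sketch of the adapt-retrieve-revise flow, with placeholder callables standing in for the real components (`domain_llm` for the adapted 7B model, `retrieve` for the in-domain knowledge base, `gpt4` for the revising model); the prompt wording is illustrative.

```python
# Conceptual adapt-retrieve-revise pipeline with placeholder components.
def adapt_retrieve_revise(query: str, domain_llm, retrieve, gpt4, k: int = 5) -> str:
    draft = domain_llm(query)            # 1) adapted small model drafts an answer
    evidence = retrieve(draft, top_k=k)  # 2) draft is used to fetch supporting evidence
    prompt = (                           # 3) GPT-4 assesses the evidence and revises
        f"Question: {query}\nDraft answer: {draft}\n"
        "Evidence:\n" + "\n".join(f"- {e}" for e in evidence) +
        "\nRevise the draft so that every claim is supported by the evidence."
    )
    return gpt4(prompt)
```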

Learning Personalized Story Evaluation

  • paper_url: http://arxiv.org/abs/2310.03304
  • repo_url: None
  • paper_authors: Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian
  • for: Evaluating large language models (LLMs) on open-ended text generation is difficult because of data contamination, multi-dimensional evaluation criteria, and the subjectivity stemming from reviewers' personal preferences.
  • methods: Personalization is modeled in an uncontaminated open-ended generation assessment: two new datasets, Per-MPST and Per-DOC, are created for personalized story evaluation, and the PERSE model is developed to infer reviewer preferences; given a few exemplary reviews from a reviewer, it predicts a detailed review or fine-grained comparisons (e.g., interestingness and surprise) for a new text input.
  • results: PERSE outperforms GPT-4 by 15.8% on Kendall correlation of story ratings and by 13.7% on pairwise preference prediction accuracy; both datasets and code will be released.
    Abstract While large language models (LLMs) have shown impressive results for more objective tasks such as QA and retrieval, it remains nontrivial to evaluate their performance on open-ended text generation for reasons including (1) data contamination; (2) multi-dimensional evaluation criteria; and (3) subjectiveness stemming from reviewers' personal preferences. To address such issues, we propose to model personalization in an uncontaminated open-ended generation assessment. We create two new datasets Per-MPST and Per-DOC for personalized story evaluation, by re-purposing existing datasets with proper anonymization and new personalized labels. We further develop a personalized story evaluation model PERSE to infer reviewer preferences and provide a personalized evaluation. Specifically, given a few exemplary reviews from a particular reviewer, PERSE predicts either a detailed review or fine-grained comparison in several aspects (such as interestingness and surprise) for that reviewer on a new text input. Experimental results show that PERSE outperforms GPT-4 by 15.8% on Kendall correlation of story ratings, and by 13.7% on pairwise preference prediction accuracy. Both datasets and code will be released.

A New Dialogue Response Generation Agent for Large Language Models by Asking Questions to Detect User’s Intentions

  • paper_url: http://arxiv.org/abs/2310.03293
  • repo_url: None
  • paper_authors: Siwei Wu, Xiangqing Shen, Rui Xia
  • for: Making dialogue response generation better reflect users' intentions, including implicit ones.
  • methods: EDIT, a framework that generates open questions about the dialogue context as potential implicit user intentions, answers them by interacting with LLMs and searching domain-specific knowledge bases, and explicitly integrates the selected answers as extra knowledge when generating the response; a Context-Open-Question (COQ) dataset is built to support open question generation.
  • results: On two task-oriented dialogue tasks (Wizard of Wikipedia and Holl-E), EDIT outperforms other LLMs.
    Abstract Large Language Models (LLMs), such as ChatGPT, have recently been applied to various NLP tasks due to its open-domain generation capabilities. However, there are two issues with applying LLMs to dialogue tasks. 1. During the dialogue process, users may have implicit intentions that might be overlooked by LLMs. Consequently, generated responses couldn't align with the user's intentions. 2. It is unlikely for LLMs to encompass all fields comprehensively. In certain specific domains, their knowledge may be incomplete, and LLMs cannot update the latest knowledge in real-time. To tackle these issues, we propose a framework~\emph{using LLM to \textbf{E}nhance dialogue response generation by asking questions to \textbf{D}etect user's \textbf{I}mplicit in\textbf{T}entions} (\textbf{EDIT}). Firstly, EDIT generates open questions related to the dialogue context as the potential user's intention; Then, EDIT answers those questions by interacting with LLMs and searching in domain-specific knowledge bases respectively, and use LLMs to choose the proper answers to questions as extra knowledge; Finally, EDIT enhances response generation by explicitly integrating those extra knowledge. Besides, previous question generation works only focus on asking questions with answers in context. In order to ask open questions, we construct a Context-Open-Question (COQ) dataset. On two task-oriented dialogue tasks (Wizard of Wikipedia and Holl-E), EDIT outperformed other LLMs.

A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores

  • paper_url: http://arxiv.org/abs/2310.03283
  • repo_url: None
  • paper_authors: Ke Shen, Mayank Kejriwal
  • for: This paper aims to provide a systematic understanding of the risks posed by large language models (LLMs) in natural language inference (NLI) tasks, and to propose a risk-centric evaluation framework and a risk-adjusted calibration method to mitigate these risks.
  • methods: The paper defines and formalizes two types of risk in LLMs, decision risk and composite risk, and proposes four novel metrics for assessing these risks in both in-domain and out-of-domain settings. The proposed risk-adjusted calibration method, called DwD, helps LLMs minimize these risks in an overall NLI architecture.
  • results: The paper presents detailed experiments using four NLI benchmarks, three baselines, and two LLMs, including ChatGPT, to demonstrate the practical utility of the evaluation framework and the efficacy of DwD in reducing decision and composite risk. For instance, the paper shows that DwD can help an underlying LLM address an extra 20.1% of low-risk inference tasks and skip a further 19.8% of high-risk tasks, which would have been answered incorrectly without risk adjustment.
    Abstract Large Language Models (LLMs), such as ChatGPT, have achieved impressive milestones in natural language processing (NLP). Despite their impressive performance, the models are known to pose important risks. As these models are deployed in real-world applications, a systematic understanding of different risks posed by these models on tasks such as natural language inference (NLI), is much needed. In this paper, we define and formalize two distinct types of risk: decision risk and composite risk. We also propose a risk-centric evaluation framework, and four novel metrics, for assessing LLMs on these risks in both in-domain and out-of-domain settings. Finally, we propose a risk-adjusted calibration method called DwD for helping LLMs minimize these risks in an overall NLI architecture. Detailed experiments, using four NLI benchmarks, three baselines and two LLMs, including ChatGPT, show both the practical utility of the evaluation framework, and the efficacy of DwD in reducing decision and composite risk. For instance, when using DwD, an underlying LLM is able to address an extra 20.1% of low-risk inference tasks (but which the LLM erroneously deems high-risk without risk adjustment) and skip a further 19.8% of high-risk tasks, which would have been answered incorrectly.
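
As a loose illustration of decision risk (and not the paper's DwD method), the sketch below implements plain selective prediction: the model abstains when its confidence falls below a threshold, and coverage and error among answered items are reported.

```python
# Selective prediction sketch: abstain below a confidence threshold and report
# coverage vs. error among answered items. This only illustrates the notion of
# decision risk; the paper's risk-adjusted calibration is more involved.
import numpy as np

def selective_report(confidences, correct, threshold: float):
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=bool)
    answered = confidences >= threshold
    coverage = answered.mean()
    risk = (~correct[answered]).mean() if answered.any() else 0.0
    return {"coverage": float(coverage), "decision_risk": float(risk)}
```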

Investigating Alternative Feature Extraction Pipelines For Clinical Note Phenotyping

  • paper_url: http://arxiv.org/abs/2310.03772
  • repo_url: None
  • paper_authors: Neil Daniel
  • for: Proposing an alternative feature extraction pipeline for phenotyping clinical notes, i.e., extracting medical attributes from free-text patient observations.
  • methods: ScispaCy is used to extract mentions of common diseases, and various supervised learning models are trained to associate the presence of these conditions with patient attributes; a ClinicalBERT- and LSTM-based approach is replicated for comparison.
  • results: The alternative methods moderately underperform the replicated LSTM approach, but given the trade-off between accuracy and runtime, and their ability to detect medical conditions not already present in a clinical note, they may serve as a supplement to established methods.
    Abstract A common practice in the medical industry is the use of clinical notes, which consist of detailed patient observations. However, electronic health record systems frequently do not contain these observations in a structured format, rendering patient information challenging to assess and evaluate automatically. Using computational systems for the extraction of medical attributes offers many applications, including longitudinal analysis of patients, risk assessment, and hospital evaluation. Recent work has constructed successful methods for phenotyping: extracting medical attributes from clinical notes. BERT-based models can be used to transform clinical notes into a series of representations, which are then condensed into a single document representation based on their CLS embeddings and passed into an LSTM (Mulyar et al., 2020). Though this pipeline yields a considerable performance improvement over previous results, it requires extensive convergence time. This method also does not allow for predicting attributes not yet identified in clinical notes. Considering the wide variety of medical attributes that may be present in a clinical note, we propose an alternative pipeline utilizing ScispaCy (Neumann et al., 2019) for the extraction of common diseases. We then train various supervised learning models to associate the presence of these conditions with patient attributes. Finally, we replicate a ClinicalBERT (Alsentzer et al., 2019) and LSTM-based approach for purposes of comparison. We find that alternative methods moderately underperform the replicated LSTM approach. Yet, considering a complex tradeoff between accuracy and runtime, in addition to the fact that the alternative approach also allows for the detection of medical conditions that are not already present in a clinical note, its usage may be considered as a supplement to established methods.
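
A hedged sketch of the alternative extraction step: running a scispaCy pipeline over a clinical note and collecting entity mentions as candidate condition features. The `en_core_sci_sm` model must be installed separately, and the linking of mentions to a fixed set of common diseases (as in the paper) is omitted.

```python
# Extract candidate clinical entity mentions from a note with scispaCy.
import spacy

nlp = spacy.load("en_core_sci_sm")  # requires the scispaCy model to be installed

def extract_candidate_conditions(note: str) -> list[str]:
    doc = nlp(note)
    return sorted({ent.text.lower() for ent in doc.ents})

print(extract_candidate_conditions(
    "Patient with long-standing type 2 diabetes mellitus and hypertension."))
```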

InstructProtein: Aligning Human and Protein Language via Knowledge Instruction

  • paper_url: http://arxiv.org/abs/2310.03269
  • repo_url: None
  • paper_authors: Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen
  • for: Extending large language models (LLMs) to comprehend biological sequences such as proteins and to bridge protein and human language.
  • methods: InstructProtein is pre-trained on both protein and natural language corpora and then instruction-tuned using a knowledge graph-based instruction generation framework, supporting bidirectional generation: predicting a textual function description from a protein sequence, and generating protein sequences from natural language prompts.
  • results: InstructProtein outperforms state-of-the-art LLMs by large margins on bidirectional protein-text generation and is a pioneering step towards text-based protein function prediction and sequence design.
    Abstract Large Language Models (LLMs) have revolutionized the field of natural language processing, but they fall short in comprehending biological sequences such as proteins. To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function description and (ii) using natural language to prompt protein sequence generation. To achieve this, we first pre-train an LLM on both protein and natural language corpora, enabling it to comprehend individual languages. Then supervised instruction tuning is employed to facilitate the alignment of these two distinct languages. Herein, we introduce a knowledge graph-based instruction generation framework to construct a high-quality instruction dataset, addressing annotation imbalance and instruction deficits in existing protein-text corpus. In particular, the instructions inherit the structural relations between proteins and function annotations in knowledge graphs, which empowers our model to engage in the causal modeling of protein functions, akin to the chain-of-thought processes in natural languages. Extensive experiments on bidirectional protein-text generation tasks show that InstructProtein outperforms state-of-the-art LLMs by large margins. Moreover, InstructProtein serves as a pioneering step towards text-based protein function prediction and sequence design, effectively bridging the gap between protein and human language understanding.

Unlock Predictable Scaling from Emergent Abilities

  • paper_url: http://arxiv.org/abs/2310.03262
  • repo_url: https://github.com/ShengdingHu/PredictableScaling
  • paper_authors: Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
  • for: Understanding the scaling properties of large language models (LLMs), in particular why task performance has been hard to predict during scale-up.
  • methods: PassUntil, an evaluation strategy based on massive sampling in the decoding phase, which measures task performance at a much finer resolution than conventional evaluation.
  • results: Small models show critical and consistent task improvements that conventional evaluation misses; a strict task scaling law is identified (predicting the 2.4B model's code-generation performance with only 0.05% deviation before training), and a mathematical definition of emergent abilities is proposed that refutes the prevalent "multi-step reasoning hypothesis" and replaces it with a hypothesis that fits the observed scaling curves.
    Abstract The scientific scale-up of large language models (LLMs) necessitates a comprehensive understanding of their scaling properties. However, the existing literature on the scaling properties only yields an incomplete answer: optimization loss decreases predictably as the model size increases, in line with established scaling law; yet no scaling law for task has been established and the task performances are far from predictable during scaling. Task performances typically show minor gains on small models until they improve dramatically once models exceed a size threshold, exemplifying the ``emergent abilities''. In this study, we discover that small models, although they exhibit minor performance, demonstrate critical and consistent task performance improvements that are not captured by conventional evaluation strategies due to insufficient measurement resolution. To measure such improvements, we introduce PassUntil, an evaluation strategy through massive sampling in the decoding phase. We conduct quantitative investigations into the scaling law of task performance. Firstly, a strict task scaling law is identified, enhancing the predictability of task performances. Remarkably, we are able to predict the performance of the 2.4B model on code generation with merely 0.05\% deviation before training starts. Secondly, underpinned by PassUntil, we observe concrete evidence of emergent abilities and ascertain that they are not in conflict with the continuity of performance improvement. Their semblance to break-through is that their scaling curve cannot be fitted by standard scaling law function. We then introduce a mathematical definition for the emergent abilities. Through the definition, we refute a prevalent ``multi-step reasoning hypothesis'' regarding the genesis of emergent abilities and propose a new hypothesis with a satisfying fit to the observed scaling curve.
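
A minimal sketch of a PassUntil-style measurement as described in the abstract: keep sampling decodings for an instance until one passes (or a budget is exhausted) and use the attempt count to estimate a very small per-instance success probability. `generate` and `is_correct` are assumed callables; the exact estimator used in the paper may differ.

```python
# Rough PassUntil-style estimate of a tiny per-instance success probability.
def pass_until(prompt: str, generate, is_correct, budget: int = 100_000) -> float:
    for attempt in range(1, budget + 1):
        if is_correct(generate(prompt)):
            return 1.0 / attempt   # rough per-instance success-rate estimate
    return 0.0                     # no pass within budget
```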

Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning

  • paper_url: http://arxiv.org/abs/2310.03249
  • repo_url: None
  • paper_authors: Mohamed Aghzal, Erion Plaku, Ziyu Yao
  • for: Investigating the limitations of large language models (LLMs) in long-term planning and spatial reasoning.
  • methods: A new benchmark, Path Planning from Natural Language (PPNL), formulates path-planning tasks in which an LLM must navigate to target locations while avoiding obstacles and adhering to constraints; GPT-4 is evaluated with different few-shot prompting methodologies, and BART and T5 of various sizes are fine-tuned.
  • results: Few-shot GPT-4 shows promise in spatial reasoning when prompted to reason and act interleavedly, but it still fails at long-term temporal reasoning; fine-tuned LLMs achieve impressive in-distribution results but struggle to generalize to larger environments or environments with more obstacles.
    Abstract Large language models (LLMs) have achieved remarkable success across a wide spectrum of tasks; however, they still face limitations in scenarios that demand long-term planning and spatial reasoning. To facilitate this line of research, in this work, we propose a new benchmark, termed $\textbf{P}$ath $\textbf{P}$lanning from $\textbf{N}$atural $\textbf{L}$anguage ($\textbf{PPNL}$). Our benchmark evaluates LLMs' spatial-temporal reasoning by formulating ''path planning'' tasks that require an LLM to navigate to target locations while avoiding obstacles and adhering to constraints. Leveraging this benchmark, we systematically investigate LLMs including GPT-4 via different few-shot prompting methodologies and BART and T5 of various sizes via fine-tuning. Our experimental results show the promise of few-shot GPT-4 in spatial reasoning, when it is prompted to reason and act interleavedly, although it still fails to make long-term temporal reasoning. In contrast, while fine-tuned LLMs achieved impressive results on in-distribution reasoning tasks, they struggled to generalize to larger environments or environments with more obstacles.
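
For contrast with prompting an LLM, here is a hedged sketch of a conventional reference planner on the kind of grid-with-obstacles instance the benchmark describes: breadth-first search from start to goal that avoids obstacle cells. The instance format is an assumption made for illustration.

```python
# Reference grid planner: BFS from start to goal while avoiding obstacle cells.
from collections import deque

def bfs_plan(size, start, goal, obstacles):
    obstacles = set(obstacles)
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in obstacles and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable

print(bfs_plan(4, (0, 0), (3, 3), obstacles=[(1, 1), (2, 2)]))
```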

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  • paper_url: http://arxiv.org/abs/2310.03214
  • repo_url: https://github.com/freshllms/freshqa
  • paper_authors: Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong
  • for: Testing whether large language models (LLMs) can keep up with our ever-changing world and evaluating the factuality of the text they generate.
  • methods: FreshQA, a dynamic question-answering benchmark covering diverse question and answer types, including questions that require fast-changing world knowledge and questions with false premises that need to be debunked; models are evaluated under a two-mode procedure measuring both correctness and hallucination, and FreshPrompt, a simple few-shot prompting method, incorporates relevant, up-to-date search-engine results into the prompt.
  • results: Human evaluations with more than 50K judgments show that all models struggle on fast-changing knowledge and false premises; FreshPrompt substantially boosts performance, outperforming search-augmented methods such as Self-Ask and commercial systems such as Perplexity.AI.
    Abstract Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Through human evaluations involving more than 50K judgments, we shed light on limitations of these models and demonstrate significant room for improvement: for instance, all models (regardless of model size) struggle on questions that involve fast-changing knowledge and false premises. Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA by incorporating relevant and up-to-date information retrieved from a search engine into the prompt. Our experiments show that FreshPrompt outperforms both competing search engine-augmented prompting methods such as Self-Ask (Press et al., 2022) as well as commercial systems such as Perplexity.AI. Further analysis of FreshPrompt reveals that both the number of retrieved evidences and their order play a key role in influencing the correctness of LLM-generated answers. Additionally, instructing the LLM to generate concise and direct answers helps reduce hallucination compared to encouraging more verbose answers. To facilitate future work, we release FreshQA at github.com/freshllms/freshqa and commit to updating it at regular intervals.
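
A hedged sketch of a FreshPrompt-style prompt: dated search results are placed before the question so the model answers from current evidence and is instructed to stay concise. `search` and `llm` are placeholder callables, and the template wording is an assumption, not the paper's exact format.

```python
# Assemble a search-augmented prompt from dated evidence and query an LLM.
def fresh_prompt(question: str, search, llm, k: int = 5) -> str:
    results = search(question, top_k=k)   # each result: {"source", "date", "snippet"}
    evidence = "\n".join(
        f"[{r['date']}] {r['source']}: {r['snippet']}" for r in results)
    prompt = (
        f"{evidence}\n\nQuestion: {question}\n"
        "Using the most recent evidence above, give a concise, direct answer."
    )
    return llm(prompt)
```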