cs.AI - 2023-10-09

Estimating Numbers without Regression

  • paper_url: http://arxiv.org/abs/2310.06204
  • repo_url: https://github.com/Aastha2104/Parkinson-Disease-Prediction
  • paper_authors: Avijit Thawani, Jay Pujara, Ashwin Kalyan
  • for: Improving language models' ability to represent numbers.
  • methods: Modifying how numbers are handled at different stages of the language-modeling pipeline: the notation in which they are written, the vocabulary used to represent them, and the model architecture.
  • results: A carefully designed tokenization scheme improves masked number prediction performance without requiring major changes to the language model architecture.
    Abstract Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line; whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbers at various stages of the language modeling pipeline. These methods change either the (1) notation in which numbers are written (e.g. scientific vs decimal), the (2) vocabulary used to represent numbers or the entire (3) architecture of the underlying language model, to directly regress to a desired number. Previous work suggests that architectural change helps achieve state-of-the-art on number estimation but we find an insightful ablation: changing the model's vocabulary instead (e.g. introduce a new token for numbers in range 10-100) is a far better trade-off. In the context of masked number prediction, a carefully designed tokenization scheme is both the simplest to implement and sufficient, i.e. with similar performance to the state-of-the-art approach that requires making significant architectural changes. Finally, we report similar trends on the downstream task of numerical fact estimation (for Fermi Problems) and discuss reasons behind our findings.
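    As a rough illustration of the vocabulary change discussed above, the sketch below maps each number to a coarse order-of-magnitude token (for example, anything in 10-100 becomes a single token). The token names and the bucketing rule are hypothetical and are not the paper's exact scheme.

```python
import math

def magnitude_token(value: float) -> str:
    """Map a number to a coarse magnitude token, e.g. 37 -> '[NUM_1e1]' (the 10-100 bucket).

    Hypothetical scheme illustrating a vocabulary change: numbers are bucketed by order
    of magnitude instead of being split into arbitrary subword chunks.
    """
    if value == 0:
        return "[NUM_0]"
    exponent = math.floor(math.log10(abs(value)))
    sign = "-" if value < 0 else ""
    return f"[NUM_{sign}1e{exponent}]"

def tokenize(text_tokens):
    """Replace numeric tokens in a whitespace-tokenized sentence with magnitude tokens."""
    out = []
    for tok in text_tokens:
        try:
            out.append(magnitude_token(float(tok)))
        except ValueError:
            out.append(tok)
    return out

print(tokenize("the bridge is 1450 meters long".split()))
# ['the', 'bridge', 'is', '[NUM_1e3]', 'meters', 'long']
```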

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

  • paper_url: http://arxiv.org/abs/2310.06178
  • repo_url: None
  • paper_authors: Saeed Maleki
  • for: Proposing a new algorithm for low-precision datatypes that makes training and inference of AI models more efficient.
  • methods: The msGeMM algorithm performs GeMM for AI models with low-precision datatypes using roughly 2.5x fewer multiply and add instructions; an efficient implementation requires special CUDA cores that can add elements from a small look-up table at the rate of Tensor Cores.
  • results: The analysis shows that msGeMM can reduce the number of multiply and add instructions by about 2.5x for low-precision AI workloads on NVIDIA and AMD GPUs.
    Abstract AI models are increasing in size and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to bring the same model quality both for training and inference. Following these trends, GPU vendors such as NVIDIA and AMD have added hardware support for fp16, fp8 and int8 GeMM operations with exceptional performance via Tensor Cores. However, this paper proposes a new algorithm called msGeMM which shows that AI models with low-precision datatypes can run with ~2.5x fewer multiplication and add instructions. Efficient implementation of this algorithm requires special CUDA cores with the ability to add elements from a small look-up table at the rate of Tensor Cores.
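    The toy sketch below illustrates the general look-up-table idea mentioned in the abstract: all int4-by-int4 products are precomputed once, so the inner loop performs only table lookups and additions. It is an illustration of the concept only, not the msGeMM algorithm itself, whose ~2.5x instruction savings and hardware requirements are specific to the paper.

```python
import numpy as np

def lut_gemm_int4(A, B):
    """Naive look-up-table GeMM for int4 operands (conceptual illustration only)."""
    vals = np.arange(-8, 8)                        # the 16 representable int4 values
    table = np.outer(vals, vals).astype(np.int32)  # 16x16 table of all pairwise products
    idx = lambda x: x + 8                          # map an int4 value to its table index
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=np.int32)
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += table[idx(A[i, p]), idx(B[p, j])]  # lookup + add, no multiply
            C[i, j] = acc
    return C

A = np.random.randint(-8, 8, size=(4, 6))
B = np.random.randint(-8, 8, size=(6, 3))
assert np.array_equal(lut_gemm_int4(A, B), A.astype(np.int32) @ B.astype(np.int32))
```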

Factual and Personalized Recommendations using Language Models and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.06176
  • repo_url: None
  • paper_authors: Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
  • for: This paper develops a movie recommender that provides compelling, precise, personalized, preference-relevant recommendations by interacting with users in natural language.
  • methods: A language-model-based recommender (P4LM) that uses an embedding-space representation of the user's preferences together with a joint reward function, which serves as AI-based feedback in a reinforcement-learning framework.
  • results: Experiments on the MovieLens 25M dataset show that the method delivers compelling, personalized movie narratives that reflect users' preferences.
    Abstract Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P4LM) that recommends items to users while putting emphasis on explaining item characteristics and their relevance. P4LM uses the embedding space representation of a user's preferences to generate compelling responses that are factually-grounded and relevant w.r.t. the user's preferences. Moreover, we develop a joint reward function that measures precision, appeal, and personalization, which we use as AI-based feedback in a reinforcement learning-based language model framework. Using the MovieLens 25M dataset, we demonstrate that P4LM delivers compelling, personalized movie narratives to users.

How does prompt engineering affect ChatGPT performance on unsupervised entity resolution?

  • paper_url: http://arxiv.org/abs/2310.06174
  • repo_url: None
  • paper_authors: Khanin Sisaengsuwanchai, Navapat Nananukul, Mayank Kejriwal
  • for: This paper studies entity resolution (ER), specifically how large language models (LLMs) can be used to semi-automatically determine whether two entities refer to the same underlying entity.
  • methods: LLMs such as ChatGPT are applied to entity resolution, and their performance is evaluated under different prompting methods and compared across datasets.
  • results: Prompting can significantly affect the quality of ER, with some metrics more sensitive to the prompting method than others; the effect can also be dataset dependent.
    Abstract Entity Resolution (ER) is the problem of semi-automatically determining when two entities refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including feature engineering, as well as identification and curation of training data. In many instances, such techniques are highly dependent on the domain. With the recent advent of large language models (LLMs), there is an opportunity to make ER much more seamless and domain-independent. However, it is also well known that LLMs can pose risks, and that the quality of their outputs can depend on so-called prompt engineering. Unfortunately, a systematic experimental study on the effects of different prompting methods for addressing ER, using LLMs like ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. Although preliminary in nature, our results show that prompting can significantly affect the quality of ER, although it affects some metrics more than others, and can also be dataset dependent.
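    For concreteness, the snippet below sketches the kind of prompt an LLM-based ER pipeline might send to a model such as ChatGPT, with a plain variant and a chain-of-thought-style variant. The wording and the helper function are hypothetical and are not taken from the paper's prompt set.

```python
def er_prompt(record_a: dict, record_b: dict, style: str = "plain") -> str:
    """Build an entity-resolution prompt for an LLM (illustrative wording, not the paper's)."""
    a = "; ".join(f"{k}: {v}" for k, v in record_a.items())
    b = "; ".join(f"{k}: {v}" for k, v in record_b.items())
    if style == "plain":
        question = "Do these two records refer to the same real-world entity? Answer Yes or No."
    else:  # a chain-of-thought style variant
        question = ("Think step by step about whether these two records refer to the same "
                    "real-world entity, then answer Yes or No on the last line.")
    return f"Record 1: {a}\nRecord 2: {b}\n{question}"

print(er_prompt({"name": "iPhone 13 Pro 128GB", "brand": "Apple"},
                {"name": "Apple iPhone13 Pro (128 GB)", "brand": "Apple"},
                style="cot"))
```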

Memory-Consistent Neural Networks for Imitation Learning

  • paper_url: http://arxiv.org/abs/2310.06171
  • repo_url: https://github.com/kaustubhsridhar/MCNN
  • paper_authors: Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, James Weimer, Insup Lee
  • for: The paper targets imitation learning applications, specifically the problem of compounding errors in policy synthesis; the authors aim to develop a new method that learns from expert demonstrations and improves the performance of imitation policies.
  • methods: The proposed method is a "memory-consistent neural network" (MCNN), a type of deep neural network designed to counter the compounding error phenomenon. MCNN outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical "memory" training samples, and the authors provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies.
  • results: The MCNN method is tested on 9 imitation learning tasks, spanning dexterous robotic manipulation and driving, proprioceptive and visual inputs, and varying sizes and types of demonstration data, showing large and consistent gains in performance and validating that MCNNs are better suited for imitation learning applications than vanilla deep neural networks.
    Abstract Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised "behavior cloning" for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our "memory-consistent neural network" (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical "memory" training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 9 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better-suited than vanilla deep neural networks for imitation learning applications. Website: https://sites.google.com/view/mcnn-imitation
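    A minimal sketch of the memory-consistency idea follows: the policy output is projected into a ball around the action of the nearest memorized expert sample, with a radius that grows with the distance to that sample. The radius schedule and blending rule are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def constrained_action(state, net, mem_states, mem_actions, slope=1.0, cap=0.5):
    """Clamp a policy network's output to a region anchored at the nearest "memory" sample.

    Near a memorized expert state the policy must output (almost) the expert action;
    farther away it may deviate, but only up to a radius that grows with the distance
    to the memory. The radius schedule here is an assumption, not the paper's rule.
    """
    dists = np.linalg.norm(mem_states - state, axis=1)
    k = int(np.argmin(dists))                  # nearest memory sample
    radius = min(cap, slope * dists[k])        # permissible deviation from its action
    raw = net(state)                           # unconstrained network prediction
    delta = raw - mem_actions[k]
    norm = np.linalg.norm(delta)
    if norm > radius and norm > 0:
        delta *= radius / norm                 # project back into the permissible ball
    return mem_actions[k] + delta

# toy usage with a random "network" and two memorized demonstrations
rng = np.random.default_rng(0)
mem_s, mem_a = rng.normal(size=(2, 3)), rng.normal(size=(2, 2))
net = lambda s: rng.normal(size=2)
print(constrained_action(mem_s[0] + 0.01, net, mem_s, mem_a))
```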

Predictable Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.06167
  • repo_url: https://github.com/duemig/Stanford-Project-Predicting-stock-prices-using-a-LSTM-Network
  • paper_authors: Lexin Zhou, Pablo A. Moreno-Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, Lucy Cheke, Cèsar Ferri, Alexandru Marcoci, Behzad Mehrbakhsh, Yael Moros-Daval, Seán Ó hÉigeartaigh, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, José Hernández-Orallo
  • for: Introducing the fundamental ideas and challenges of Predictable AI, a nascent research area.
  • methods: The paper lays out the questions, hypotheses, and challenges relevant to Predictable AI and calls for identifying paths towards AI predictability.
  • results: The paper argues that progress on AI predictability is crucial for fostering trust, liability, control, alignment, and safety of AI ecosystems, and should therefore be prioritised over performance.
    Abstract We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key indicators of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over performance. While distinctive from other areas of technical and non-technical AI research, the questions, hypotheses and challenges relevant to Predictable AI were yet to be clearly described. This paper aims to elucidate them, calls for identifying paths towards AI predictability and outlines the potential impact of this emergent field.

CAW-coref: Conjunction-Aware Word-level Coreference Resolution

  • paper_url: http://arxiv.org/abs/2310.06165
  • repo_url: https://github.com/kareldo/wl-coref
  • paper_authors: Karel D’Oosterlinck, Semere Kiros Bitew, Brandon Papineau, Christopher Potts, Thomas Demeester, Chris Develder
  • for: Improving the performance of word-level coreference resolution so that it can be applied to information extraction over large document collections.
  • methods: A simple yet effective fix that lets the word-level coreference model handle conjoined mentions (e.g. "Tom and Mary").
  • results: The fix improves F1 on the OntoNotes test set by 0.9%, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%.
    Abstract State-of-the-art coreference resolutions systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves the performance on the OntoNotes test set by 0.9% F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code is available at https://github.com/KarelDO/wl-coref.

Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques

  • paper_url: http://arxiv.org/abs/2310.06148
  • repo_url: https://github.com/mikehuisman/transfer-meta-feature-representations
  • paper_authors: Mike Huisman, Aske Plaat, Jan N. van Rijn
  • for: This work investigates how meta-learning techniques behave when evaluated on tasks from data distributions that differ from the training distribution, comparing finetuning, MAML, and Reptile.
  • methods: Three methods are compared: finetuning, which simply adapts a pre-trained network, and two gradient-based meta-learning techniques, MAML and Reptile.
  • results: On tasks from a different data distribution than the training one, simple finetuning outperforms MAML and Reptile, which specialize for fast adaptation in low-data regimes close to the training distribution. Both the output layer and the noisy training conditions induced by data scarcity drive this specialization, and the features learned by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile.
    Abstract Deep neural networks can yield good performance on various tasks but often require large amounts of data to train them. Meta-learning received considerable attention as one approach to improve the generalization of these networks from a limited amount of data. Whilst meta-learning techniques have been observed to be successful at this in various scenarios, recent results suggest that when evaluated on tasks from a different data distribution than the one used for training, a baseline that simply finetunes a pre-trained network may be more effective than more complicated meta-learning techniques such as MAML, which is one of the most popular meta-learning techniques. This is surprising as the learning behaviour of MAML mimics that of finetuning: both rely on re-using learned features. We investigate the observed performance differences between finetuning, MAML, and another meta-learning technique called Reptile, and show that MAML and Reptile specialize for fast adaptation in low-data regimes of similar data distribution as the one used for training. Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML. Lastly, we show that the pre-trained features as obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile. Due to this lack of diversity and distribution specialization, MAML and Reptile may fail to generalize to out-of-distribution tasks whereas finetuning can fall back on the diversity of the learned features.

Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond

  • paper_url: http://arxiv.org/abs/2310.06147
  • repo_url: None
  • paper_authors: Hao Sun
  • for: This paper connects conventional RL research with the RL techniques used in LLM research, explaining why, when, and how RL excels for LLMs.
  • methods: The paper analyzes RLHF, framing it as online inverse RL with offline demonstration data, and compares it with supervised fine-tuning (SFT).
  • results: RLHF is argued to be preferable to SFT because imitation learning (and inverse RL) alleviates the compounding-error problem of behavior cloning. The reward-modeling step produces a proxy for expensive human feedback, an insight that generalizes to other LLM tasks such as prompt evaluation and optimization; however, policy learning in RLHF is more challenging than conventional IRL problems because of high action dimensionality and sparse feedback.
    Abstract Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). In this paper, we aim to link the research in conventional RL to RL techniques used in LLM research. We demystify this technique by discussing why, when, and how RL excels. Furthermore, we explore potential future avenues that could either benefit from or contribute to RLHF research. Highlighted Takeaways: 1. RLHF is Online Inverse RL with Offline Demonstration Data. 2. RLHF $>$ SFT because Imitation Learning (and Inverse RL) $>$ Behavior Cloning (BC) by alleviating the problem of compounding error. 3. The RM step in RLHF generates a proxy of the expensive human feedback, such an insight can be generalized to other LLM tasks such as prompting evaluation and optimization where feedback is also expensive. 4. The policy learning in RLHF is more challenging than conventional problems studied in IRL due to their high action dimensionality and feedback sparsity. 5. The main superiority of PPO over off-policy value-based methods is its stability gained from (almost) on-policy data and conservative policy updates.
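    Takeaway 3 refers to the reward-modeling (RM) step. A common way to instantiate it on pairwise human preference data is the standard Bradley-Terry objective (a textbook formulation, not a formula given in the paper):

    \[
    \mathcal{L}_{\mathrm{RM}}(\phi) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\Big],
    \]

    where $y_w$ and $y_l$ are the preferred and rejected responses to prompt $x$; the learned $r_\phi$ then acts as the proxy for expensive human feedback.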

Layout Sequence Prediction From Noisy Mobile Modality

  • paper_url: http://arxiv.org/abs/2310.06138
  • repo_url: https://github.com/Hai-chao-Zhang/LTrajDiff
  • paper_authors: Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu
  • for: This work addresses the challenges that layout-sequence and trajectory prediction models face in real-world settings, enabling accurate prediction of pedestrian bounding-box trajectories.
  • methods: The proposed LTrajDiff treats obstructed or out-of-sight objects as equally important as fully visible ones. It uses sensor data from mobile phones to overcome out-of-sight constraints, which introduces new challenges such as modality fusion, noisy data, and missing spatial layout and object-size information; a denoising diffusion model with a coarse-to-fine strategy recovers precise layout sequences from the noisy mobile data.
  • results: The model achieves SOTA results in randomly obstructed and extremely short-input experiments, demonstrating that noisy mobile sensor data can be leveraged to predict pedestrian bounding-box trajectories in the real world.
    Abstract Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics. Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities. Nevertheless, real-world situations often involve obstructed cameras, missed objects, or objects out of sight due to environmental factors, leading to incomplete or noisy trajectories. To overcome these limitations, we propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories. LTrajDiff utilizes sensor data from mobile phones to surmount out-of-sight constraints, albeit introducing new challenges such as modality fusion, noisy data, and the absence of spatial layout and object size information. We employ a denoising diffusion model to predict precise layout sequences from noisy mobile data using a coarse-to-fine diffusion strategy, incorporating the RMS, Siamese Masked Encoding Module, and MFM. Our model predicts layout sequences by implicitly inferring object size and projection status from a single reference timestamp or significantly obstructed sequences. Achieving SOTA results in randomly obstructed experiments and extremely short input experiments, our model illustrates the effectiveness of leveraging noisy mobile data. In summary, our approach offers a promising solution to the challenges faced by layout sequence and trajectory prediction models in real-world settings, paving the way for utilizing sensor data from mobile phones to accurately predict pedestrian bounding box trajectories. To the best of our knowledge, this is the first work that addresses severely obstructed and extremely short layout sequences by combining vision with noisy mobile modality, making it the pioneering work in the field of layout sequence trajectory prediction.

Learning Layer-wise Equivariances Automatically using Gradients

  • paper_url: http://arxiv.org/abs/2310.06131
  • repo_url: https://github.com/tychovdo/ella
  • paper_authors: Tycho F. A. van der Ouderaa, Alexander Immer, Mark van der Wilk
  • for: Improving the generalization of neural networks by allowing symmetry constraints to be adapted to the data.
  • methods: Layer-wise equivariances and the associated weight-connectivity structures are learned with gradients by optimizing the marginal likelihood, estimated using differentiable Laplace approximations.
  • results: On image classification tasks, automatically learned layer-wise equivariances achieve performance equivalent to or better than baselines with hard-coded symmetries.
    Abstract Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Secondly, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.
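    The layer-wise amounts of equivariance are selected by maximizing a differentiable Laplace approximation to the marginal likelihood. In its standard form (notation ours, not the paper's), for MAP parameters $\theta^\ast$ of a model $\mathcal{M}$ with $d$ parameters,

    \[
    \log p(\mathcal{D} \mid \mathcal{M}) \;\approx\; \log p(\mathcal{D} \mid \theta^\ast, \mathcal{M}) + \log p(\theta^\ast \mid \mathcal{M}) + \tfrac{d}{2}\log 2\pi - \tfrac{1}{2}\log\det \mathbf{H},
    \qquad
    \mathbf{H} = -\nabla^2_\theta \log p(\mathcal{D}, \theta \mid \mathcal{M})\big|_{\theta^\ast}.
    \]

    The log-determinant term penalizes model complexity, which is what lets the objective trade data fit against symmetry constraints rather than always preferring the least-constrained model.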

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

  • paper_url: http://arxiv.org/abs/2310.06125
  • repo_url: https://github.com/jwr1995/pubsep
  • paper_authors: William Ravenscroft, Stefan Goetze, Thomas Hain
  • for: This paper studies monaural speech separation, an important topic for multi-speaker technology, in noisy reverberant acoustic environments.
  • methods: Convolution-augmented transformers (conformers), which perform well on many speech processing tasks but are under-researched for speech separation, are applied in the time domain. Recent state-of-the-art (SOTA) separation models are time-domain audio separation networks (TasNets), several of which use dual-path (DP) networks that process local and global information sequentially; time-domain conformers (TD-Conformers) are an analogue of the DP approach with a different time-complexity function.
  • results: For realistic shorter signal lengths, TD-Conformers are more efficient when controlling for feature dimension. Proposed subsampling layers further improve computational efficiency, and the best TD-Conformer achieves 14.6 dB and 21.2 dB SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.
    Abstract Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks which sequentially process local and global information. Time domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially but have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.

Text-driven Prompt Generation for Vision-Language Models in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.06123
  • repo_url: None
  • paper_authors: Chen Qiu, Xingyu Li, Chaithanya Kumar Mummadi, Madan Ravi Ganesh, Zhenzhen Li, Lu Peng, Wan-Yi Lin
  • for: This work proposes Federated Text-driven Prompt Generation (FedTPG) to improve the generalization of vision-language models by learning a unified prompt-generation network across multiple remote clients.
  • methods: The prompt-generation network is conditioned on task-related text input and learned in a scalable federated manner; being context-aware, it can generalize to unseen classes.
  • results: Experiments on nine diverse image classification datasets show that the method outperforms existing federated prompt learning approaches, generalizing better to both seen and unseen classes as well as to unseen datasets.
    Abstract Prompt learning for vision-language models, e.g., CoOp, has shown great success in adapting CLIP to different downstream tasks, making it a promising solution for federated learning due to computational reasons. Existing prompt learning techniques replace hand-crafted text prompts with learned vectors that offer improvements on seen classes, but struggle to generalize to unseen classes. Our work addresses this challenge by proposing Federated Text-driven Prompt Generation (FedTPG), which learns a unified prompt generation network across multiple remote clients in a scalable manner. The prompt generation network is conditioned on task-related text input, thus is context-aware, making it suitable to generalize for both seen and unseen classes. Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods, that achieve overall better generalization on both seen and unseen classes and is also generalizable to unseen datasets.

Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis

  • paper_url: http://arxiv.org/abs/2310.06119
  • repo_url: https://github.com/zezhishao/basicts
  • paper_authors: Zezhi Shao, Fei Wang, Yongjun Xu, Wei Wei, Chengqing Yu, Zhao Zhang, Di Yao, Guangyin Jin, Xin Cao, Gao Cong, Christian S. Jensen, Xueqi Cheng
  • for: This work addresses benchmarking shortcomings and debates over the choice of technical approaches in multivariate time series (MTS) forecasting, aiming to give a clearer picture of progress in the field.
  • methods: The paper introduces BasicTS, a benchmark for fair comparison of MTS forecasting models. BasicTS establishes a unified training pipeline and reasonable evaluation settings, enabling unbiased evaluation of over 30 popular MTS forecasting models on more than 18 datasets.
  • results: Existing MTS forecasting models perform very differently depending on the temporal and spatial characteristics of the data, and neglecting this heterogeneity is identified as the main source of controversy over technical approaches. BasicTS and the accompanying reproducible performance and efficiency comparisons help researchers select and design suitable MTS forecasting models.
    Abstract Multivariate Time Series (MTS) widely exists in real-word complex systems, such as traffic and energy systems, making their forecasting crucial for understanding and influencing these systems. Recently, deep learning-based approaches have gained much popularity for effectively modeling temporal and spatial dependencies in MTS, specifically in Long-term Time Series Forecasting (LTSF) and Spatial-Temporal Forecasting (STF). However, the fair benchmarking issue and the choice of technical approaches have been hotly debated in related work. Such controversies significantly hinder our understanding of progress in this field. Thus, this paper aims to address these controversies to present insights into advancements achieved. To resolve benchmarking issues, we introduce BasicTS, a benchmark designed for fair comparisons in MTS forecasting. BasicTS establishes a unified training pipeline and reasonable evaluation settings, enabling an unbiased evaluation of over 30 popular MTS forecasting models on more than 18 datasets. Furthermore, we highlight the heterogeneity among MTS datasets and classify them based on temporal and spatial characteristics. We further prove that neglecting heterogeneity is the primary reason for generating controversies in technical approaches. Moreover, based on the proposed BasicTS and rich heterogeneous MTS datasets, we conduct an exhaustive and reproducible performance and efficiency comparison of popular models, providing insights for researchers in selecting and designing MTS forecasting models.

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.06117
  • repo_url: None
  • paper_authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V Le, Denny Zhou
  • for: Improving LLMs' ability to perform abstraction, i.e., to derive high-level concepts and first principles from instances containing specific details, so that they follow a correct reasoning path.
  • methods: Step-Back Prompting first elicits high-level concepts and principles and then uses them to guide the LLM's reasoning steps towards the solution.
  • results: Across a range of challenging reasoning-intensive tasks, Step-Back Prompting improves PaLM-2L performance: by 7% and 11% on MMLU Physics and Chemistry, 27% on TimeQA, and 7% on MuSiQue.
    Abstract We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide the reasoning steps, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L models and observe substantial performance gains on a wide range of challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU Physics and Chemistry by 7% and 11%, TimeQA by 27%, and MuSiQue by 7%.
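    A minimal sketch of the two-stage flow described above, assuming a generic `llm` callable that maps a prompt string to a completion (a placeholder, not an API from the paper); the prompt wording is illustrative rather than the paper's template.

```python
def step_back_answer(question: str, llm) -> str:
    """Two-stage Step-Back-style prompting (sketch; wording is illustrative)."""
    # Stage 1: ask for the higher-level concept or first principle behind the question.
    step_back_prompt = (
        "What underlying concept or first principle is this question really about?\n"
        f"Question: {question}\nConcept/principle:"
    )
    principle = llm(step_back_prompt)

    # Stage 2: answer the original question, conditioned on the retrieved principle.
    final_prompt = (
        f"Relevant principle: {principle}\n"
        "Using this principle, answer the question step by step.\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(final_prompt)

# usage with a stub LLM that just echoes the last prompt line
echo = lambda prompt: "(model completion for) " + prompt.splitlines()[-1]
print(step_back_answer(
    "What happens to the pressure of an ideal gas if its temperature doubles at constant volume?",
    echo))
```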

OptiMUS: Optimization Modeling Using MIP Solvers and large language models

  • paper_url: http://arxiv.org/abs/2310.06116
  • repo_url: https://github.com/teshnizi/optimus
  • paper_authors: Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell
  • for: This paper presents an agent that formulates and solves optimization problems from their natural-language descriptions.
  • methods: OptiMUS, a Large Language Model (LLM)-based agent, develops mathematical models, writes and debugs solver code, develops tests, and checks the validity of the generated solutions; the NLP4LP dataset of LP and MILP problems is introduced for benchmarking.
  • results: Experiments show that OptiMUS solves nearly twice as many problems as a basic LLM prompting strategy.
    Abstract Optimization problems are pervasive across various sectors, from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from their natural language descriptions. OptiMUS is capable of developing mathematical models, writing and debugging solver code, developing tests, and checking the validity of generated solutions. To benchmark our agent, we present NLP4LP, a novel dataset of linear programming (LP) and mixed integer linear programming (MILP) problems. Our experiments demonstrate that OptiMUS solves nearly twice as many problems as a basic LLM prompting strategy. OptiMUS code and NLP4LP dataset are available at https://github.com/teshnizi/OptiMUS
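    As a toy example of the kind of solver code such an agent is meant to produce from a natural-language description, the sketch below formulates and solves a small MILP with the PuLP package (`pip install pulp`); the problem is invented for illustration and is not taken from the NLP4LP dataset.

```python
from pulp import LpProblem, LpVariable, LpMaximize, value

# "A workshop makes chairs ($30 profit, 2h of labor) and tables ($50 profit, 4h of labor).
#  Only 40 labor hours are available. How many of each should it make to maximize profit?"
prob = LpProblem("workshop", LpMaximize)
chairs = LpVariable("chairs", lowBound=0, cat="Integer")
tables = LpVariable("tables", lowBound=0, cat="Integer")

prob += 30 * chairs + 50 * tables          # objective: total profit
prob += 2 * chairs + 4 * tables <= 40      # labor-hour constraint

prob.solve()                               # uses PuLP's bundled CBC solver by default
print(chairs.varValue, tables.varValue, value(prob.objective))
```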

Learning Interactive Real-World Simulators

  • paper_url: http://arxiv.org/abs/2310.06114
  • repo_url: None
  • paper_authors: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Dale Schuurmans, Pieter Abbeel
  • for: The goal is to learn a universal simulator (UniSim) of real-world interaction that can simulate realistic experience in response to actions taken by humans, robots, and other interactive agents.
  • methods: Generative modeling over carefully orchestrated, diverse datasets, including image, video, robotics, and navigation data, each providing a different aspect of the overall experience, is used to simulate the visual outcome of both high-level instructions and low-level controls.
  • results: High-level vision-language planners and low-level reinforcement learning policies trained purely in UniSim exhibit zero-shot transfer to the real world, and other models such as video captioning also benefit from training with simulated experience, opening up wider applications.
    Abstract Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world. We explore the possibility of learning a universal simulator (UniSim) of real-world interaction through generative modeling. We first make the important observation that natural datasets available for learning a real-world simulator are often rich along different axes (e.g., abundant objects in image data, densely sampled actions in robotics data, and diverse movements in navigation data). With careful orchestration of diverse datasets, each providing a different aspect of the overall experience, UniSim can emulate how humans and agents interact with the world by simulating the visual outcome of both high-level instructions such as "open the drawer" and low-level controls such as "move by x, y" from otherwise static scenes and objects. There are numerous use cases for such a real-world simulator. As an example, we use UniSim to train both high-level vision-language planners and low-level reinforcement learning policies, each of which exhibit zero-shot real-world transfer after training purely in a learned real-world simulator. We also show that other types of intelligence such as video captioning models can benefit from training with simulated experience in UniSim, opening up even wider applications. Video demos can be found at https://universal-simulator.github.io.

When is Agnostic Reinforcement Learning Statistically Tractable?

  • paper_url: http://arxiv.org/abs/2310.06113
  • repo_url: None
  • paper_authors: Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro
  • for: Learning an $\epsilon$-suboptimal policy with respect to a given policy class $\Pi$ in an unknown MDP with potentially large state and action spaces.
  • methods: A new complexity measure, the spanning capacity, which depends solely on the policy class $\Pi$ and is independent of the MDP dynamics.
  • results: Bounded spanning capacity characterizes PAC learnability with a generative model, but there exists a policy class $\Pi$ with bounded spanning capacity that requires a superpolynomial number of samples to learn in online RL. A new algorithm, POPLER, combines bounded spanning capacity with an additional sunflower structure to achieve statistically efficient online RL.
    Abstract We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $\epsilon$-suboptimal policy with respect to $\Pi$? Towards that end, we introduce a new complexity measure, called the "spanning capacity", that depends solely on the set $\Pi$ and is independent of the MDP dynamics. With a generative model, we show that for any policy class $\Pi$, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle. We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional "sunflower" structure, which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as techniques for reachable-state identification and policy evaluation in reward-free exploration.

High Dimensional Causal Inference with Variational Backdoor Adjustment

  • paper_url: http://arxiv.org/abs/2310.06100
  • repo_url: https://github.com/danielmisrael/variational-backdoor-adjustment
  • paper_authors: Daniel Israel, Aditya Grover, Guy Van den Broeck
  • for: This paper applies backdoor adjustment to estimate interventional quantities when both the treatments and the confounders are high dimensional.
  • methods: Backdoor adjustment is cast as an optimization problem in variational inference using generative models, without reliance on proxy variables or hidden confounders.
  • results: Empirically, the method can estimate interventional likelihoods in a variety of high-dimensional settings, including semi-synthetic X-ray medical data.
    Abstract Backdoor adjustment is a technique in causal inference for estimating interventional quantities from purely observational data. For example, in medical settings, backdoor adjustment can be used to control for confounding and estimate the effectiveness of a treatment. However, high dimensional treatments and confounders pose a series of potential pitfalls: tractability, identifiability, optimization. In this work, we take a generative modeling approach to backdoor adjustment for high dimensional treatments and confounders. We cast backdoor adjustment as an optimization problem in variational inference without reliance on proxy variables and hidden confounders. Empirically, our method is able to estimate interventional likelihood in a variety of high dimensional settings, including semi-synthetic X-ray medical data. To the best of our knowledge, this is the first application of backdoor adjustment in which all the relevant variables are high dimensional.
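    For reference, the interventional quantity being estimated is the standard backdoor-adjustment identity (notation ours): for treatment $t$, outcome $y$, and a confounder set $z$ satisfying the backdoor criterion,

    \[
    p\big(y \mid \mathrm{do}(t)\big) \;=\; \int p(y \mid t, z)\, p(z)\, \mathrm{d}z .
    \]

    When $t$, $y$, and $z$ are all high dimensional, the densities and the integral are intractable to handle directly, which is the regime the paper's variational formulation targets.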

Predictive auxiliary objectives in deep RL mimic learning in the brain

  • paper_url: http://arxiv.org/abs/2310.06089
  • repo_url: None
  • paper_authors: Ching Fang, Kimberly L Stachenfeld
  • for: This paper explores the use of predictive auxiliary objectives in deep reinforcement learning (RL) to support representation learning and improve task performance.
  • methods: The paper uses a deep RL system with self-supervised auxiliary objectives to study the effects of predictive learning on representation learning across different modules of the system.
  • results: Predictive objectives improve and stabilize learning, particularly in resource-limited architectures, and the paper identifies settings where longer predictive horizons better support representational transfer. Additionally, the representational changes in the RL system bear a striking resemblance to changes in neural activity observed in the brain.
    Abstract The ability to predict upcoming events has been hypothesized to comprise a key aspect of natural and machine cognition. This is supported by trends in deep reinforcement learning (RL), where self-supervised auxiliary objectives such as prediction are widely used to support representation learning and improve task performance. Here, we study the effects predictive auxiliary objectives have on representation learning across different modules of an RL system and how these mimic representational changes observed in the brain. We find that predictive objectives improve and stabilize learning particularly in resource-limited architectures, and we identify settings where longer predictive horizons better support representational transfer. Furthermore, we find that representational changes in this RL system bear a striking resemblance to changes in neural activity observed in the brain across various experiments. Specifically, we draw a connection between the auxiliary predictive model of the RL system and hippocampus, an area thought to learn a predictive model to support memory-guided behavior. We also connect the encoder network and the value learning network of the RL system to visual cortex and striatum in the brain, respectively. This work demonstrates how representation learning in deep RL systems can provide an interpretable framework for modeling multi-region interactions in the brain. The deep RL perspective taken here also suggests an additional role of the hippocampus in the brain -- that of an auxiliary learning system that benefits representation learning in other regions.
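    A generic form of such a self-supervised predictive auxiliary objective (an illustrative formulation, not the paper's exact loss) augments the RL loss with a $k$-step prediction term:

    \[
    \mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{RL}}(\theta) \;+\; \lambda\, \mathbb{E}\big[\,\lVert g_\theta(s_t, a_t) - \phi(s_{t+k}) \rVert_2^2\,\big],
    \]

    where $\phi$ encodes future observations, $g_\theta$ is the predictive head, and the horizon $k$ plays the role of the predictive horizon whose effect on representational transfer the paper examines.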

Performative Time-Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.06077
  • repo_url: https://github.com/adityalab/pets
  • paper_authors: Zhiyuan Zhao, Alexander Rodriguez, B. Aditya Prakash
  • for: This paper addresses feedback loops in time-series forecasting, where predictions can influence the predicted outcome and thereby alter the target variable's distribution (performativity).
  • methods: The proposed Feature Performative-Shifting (FPS) approach leverages the concept of delayed response to anticipate distribution shifts and then predicts targets accordingly.
  • results: Experiments show that FPS handles performativity-induced challenges effectively and consistently outperforms conventional time-series forecasting methods on COVID-19 and traffic forecasting tasks.
    Abstract Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years. Many real-life scenarios, such as public health, economics, and social applications, involve feedback loops where predictions can influence the predicted outcome, subsequently altering the target variable's distribution. This phenomenon, known as performativity, introduces the potential for 'self-negating' or 'self-fulfilling' predictions. Despite extensive studies in classification problems across domains, performativity remains largely unexplored in the context of time-series forecasting from a machine-learning perspective. In this paper, we formalize performative time-series forecasting (PeTS), addressing the challenge of accurate predictions when performativity-induced distribution shifts are possible. We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts and subsequently predicts targets accordingly. We provide theoretical insights suggesting that FPS can potentially lead to reduced generalization error. We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks. The results demonstrate that FPS consistently outperforms conventional time-series forecasting methods, highlighting its efficacy in handling performativity-induced challenges.

Pain Forecasting using Self-supervised Learning and Patient Phenotyping: An attempt to prevent Opioid Addiction

  • paper_url: http://arxiv.org/abs/2310.06075
  • repo_url: None
  • paper_authors: Swati Padhee, Tanvi Banerjee, Daniel M. Abrams, Nirmish Shah
  • for: This work forecasts future pain trajectories of patients with sickle cell disease (SCD), to help them manage their condition and improve their quality of life without compromising treatment.
  • methods: Self-supervised learning is used both to forecast pain and to cluster the time-series data, grouping patients who share similar future pain profiles for phenotyping and for designing treatment guidelines tailored to homogeneous subgroups.
  • results: Experiments on five years of real-world data show that the models outperform state-of-the-art benchmarks and identify meaningful clusters that can be translated into actionable information for clinical decision-making.
    Abstract Sickle Cell Disease (SCD) is a chronic genetic disorder characterized by recurrent acute painful episodes. Opioids are often used to manage these painful episodes; the extent of their use in managing pain in this disorder is an issue of debate. The risk of addiction and side effects of these opioid treatments can often lead to more pain episodes in the future. Hence, it is crucial to forecast future patient pain trajectories to help patients manage their SCD to improve their quality of life without compromising their treatment. It is challenging to obtain many pain records to design forecasting models since it is mainly recorded by patients' self-report. Therefore, it is expensive and painful (due to the need for patient compliance) to solve pain forecasting problems in a purely supervised manner. In light of this challenge, we propose to solve the pain forecasting problem using self-supervised learning methods. Also, clustering such time-series data is crucial for patient phenotyping, anticipating patients' prognoses by identifying "similar" patients, and designing treatment guidelines tailored to homogeneous patient subgroups. Hence, we propose a self-supervised learning approach for clustering time-series data, where each cluster comprises patients who share similar future pain profiles. Experiments on five years of real-world datasets show that our models achieve superior performance over state-of-the-art benchmarks and identify meaningful clusters that can be translated into actionable information for clinical decision-making.

Augmenting Vision-Based Human Pose Estimation with Rotation Matrix

  • paper_url: http://arxiv.org/abs/2310.06068
  • repo_url: None
  • paper_authors: Milad Vazan, Fatemeh Sadat Masoumi, Ruizhi Ou, Reza Rawassizadeh
  • for: This work aims to improve the accuracy of activity recognition based on pose estimation data by combining pose estimation with a novel data augmentation method.
  • methods: Pose estimation is combined with a new data augmentation method, the rotation matrix, and different classification algorithms are compared.
  • results: An SVM with SGD optimization combined with rotation-matrix data augmentation achieves 96% accuracy in classifying five physical activities, compared with a baseline accuracy of 64% without data augmentation.
    Abstract Fitness applications are commonly used to monitor activities within the gym, but they often fail to automatically track indoor activities inside the gym. This study proposes a model that utilizes pose estimation combined with a novel data augmentation method, i.e., rotation matrix. We aim to enhance the classification accuracy of activity recognition based on pose estimation data. Through our experiments, we experiment with different classification algorithms along with image augmentation approaches. Our findings demonstrate that the SVM with SGD optimization, using data augmentation with the Rotation Matrix, yields the most accurate results, achieving a 96% accuracy rate in classifying five physical activities. Conversely, without implementing the data augmentation techniques, the baseline accuracy remains at a modest 64%.
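    The rotation-matrix augmentation amounts to rotating the estimated keypoints about a reference point before classification. A small sketch follows; the angle range and the choice of the pose centroid as the rotation center are assumptions, not the paper's exact settings.

```python
import numpy as np

def rotate_keypoints(keypoints, angle_deg, center=None):
    """Rotate 2D pose keypoints about `center` by `angle_deg` degrees (augmentation sketch)."""
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    if center is None:
        center = keypoints.mean(axis=0)            # rotate about the pose centroid
    return (keypoints - center) @ R.T + center

pose = np.array([[0.0, 0.0], [0.0, 1.0], [0.5, 1.5]])  # toy 3-joint skeleton
augmented = [rotate_keypoints(pose, a) for a in (-15, 0, 15)]
print(augmented[0].round(3))
```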

LLM for SoC Security: A Paradigm Shift

  • paper_url: http://arxiv.org/abs/2310.06046
  • repo_url: None
  • paper_authors: Dipayan Saha, Shams Tarek, Katayoon Yahyaei, Sujan Kumar Saha, Jingbo Zhou, Mark Tehranipoor, Farimah Farahmandi
  • for: Making security verification in the SoC design flow more efficient, scalable, and adaptable.
  • methods: The work leverages the emergent capabilities of Generative Pre-trained Transformers (GPTs) to address the gaps left by existing SoC security solutions, aiming for a more effective, scalable, and adaptable verification methodology.
  • results: The paper provides an in-depth analysis of existing work, practical case studies, comprehensive experiments, and promoting guidelines, and discusses the achievements, prospects, and challenges of employing LLMs in different SoC security verification tasks.
    Abstract As the ubiquity and complexity of system-on-chip (SoC) designs increase across electronic devices, the task of incorporating security into an SoC design flow poses significant challenges. Existing security solutions are inadequate to provide effective verification of modern SoC designs due to their limitations in scalability, comprehensiveness, and adaptability. On the other hand, Large Language Models (LLMs) are celebrated for their remarkable success in natural language understanding, advanced reasoning, and program synthesis tasks. Recognizing an opportunity, our research delves into leveraging the emergent capabilities of Generative Pre-trained Transformers (GPTs) to address the existing gaps in SoC security, aiming for a more efficient, scalable, and adaptable methodology. By integrating LLMs into the SoC security verification paradigm, we open a new frontier of possibilities and challenges to ensure the security of increasingly complex SoCs. This paper offers an in-depth analysis of existing works, showcases practical case studies, demonstrates comprehensive experiments, and provides useful promoting guidelines. We also present the achievements, prospects, and challenges of employing LLM in different SoC security verification tasks.

Generative ensemble deep learning severe weather prediction from a deterministic convection-allowing model

  • paper_url: http://arxiv.org/abs/2310.06045
  • repo_url: https://github.com/yingkaisha/severe_weather_cgan
  • paper_authors: Yingkai Sha, Ryan A. Sobash, David John Gagne II
  • for: The paper develops an ensemble post-processing method for probabilistic prediction of severe weather (tornadoes, hail, and wind gusts) over the conterminous United States (CONUS).
  • methods: The method combines conditional generative adversarial networks (CGANs) and a convolutional neural network (CNN) to post-process convection-allowing model (CAM) forecasts. The CGANs create synthetic ensemble members from deterministic CAM forecasts, and the CNN processes the outputs to estimate the probability of severe weather.
  • results: The method produced skillful predictions with up to 20% Brier Skill Score (BSS) increases compared with other neural-network-based reference methods on a testing dataset of HRRR forecasts in 2021. It also provided meaningful ensemble spreads that can distinguish good and bad forecasts, despite being overconfident, and the CGAN outputs behaved similarly to a numerical ensemble, preserving inter-variable correlations and the contribution of influential predictors.
    Abstract An ensemble post-processing method is developed for the probabilistic prediction of severe weather (tornadoes, hail, and wind gusts) over the conterminous United States (CONUS). The method combines conditional generative adversarial networks (CGANs), a type of deep generative model, with a convolutional neural network (CNN) to post-process convection-allowing model (CAM) forecasts. The CGANs are designed to create synthetic ensemble members from deterministic CAM forecasts, and their outputs are processed by the CNN to estimate the probability of severe weather. The method is tested using High-Resolution Rapid Refresh (HRRR) 1--24 hr forecasts as inputs and Storm Prediction Center (SPC) severe weather reports as targets. The method produced skillful predictions with up to 20% Brier Skill Score (BSS) increases compared to other neural-network-based reference methods using a testing dataset of HRRR forecasts in 2021. For the evaluation of uncertainty quantification, the method is overconfident but produces meaningful ensemble spreads that can distinguish good and bad forecasts. The quality of CGAN outputs is also evaluated. Results show that the CGAN outputs behave similarly to a numerical ensemble; they preserved the inter-variable correlations and the contribution of influential predictors as in the original HRRR forecasts. This work provides a novel approach to post-process CAM output using neural networks that can be applied to severe weather prediction.
    摘要 本文提出一种集合后处理方法,用于美国本土(CONUS)地区严重天气(龙卷风、冰雹和阵风)的概率预报。该方法将条件生成对抗网络(CGAN,一类深度生成模型)与卷积神经网络(CNN)相结合,对对流解析模式(CAM)的预报进行后处理:CGAN 从确定性 CAM 预报生成合成集合成员,其输出再由 CNN 处理以估计严重天气发生的概率。方法以高分辨率快速更新模式(HRRR)的 1–24 小时预报为输入,以风暴预报中心(SPC)的严重天气报告为目标进行检验。在 2021 年 HRRR 预报测试集上,该方法相比其他基于神经网络的参考方法,Brier 技巧评分(BSS)最高提升 20%。在不确定性量化方面,该方法虽有过度自信的倾向,但能生成有意义的集合离散度,可用于区分好坏预报。对 CGAN 输出质量的评估表明,其行为与数值集合相似,保留了原始 HRRR 预报中的变量间相关性和关键预报因子的贡献。这为利用神经网络对 CAM 输出进行后处理、用于严重天气预报提供了一种新途径。
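
To make the two-stage idea concrete, here is a hedged PyTorch sketch: a conditional generator turns one deterministic forecast patch plus noise into synthetic ensemble members, and a small CNN maps each member to a severe-weather probability that is then averaged. Channel counts, layer sizes, and the number of members are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Maps a deterministic forecast patch + noise vector to one synthetic ensemble member."""
    def __init__(self, channels=4, z_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + z_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )
    def forward(self, forecast, z):
        z_map = z[:, :, None, None].expand(-1, -1, *forecast.shape[-2:])
        return self.net(torch.cat([forecast, z_map], dim=1))

class SevereProbCNN(nn.Module):
    """Maps one ensemble member to a probability of severe weather near the patch centre."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )
    def forward(self, member):
        return torch.sigmoid(self.net(member))

gen, prob_cnn = ConditionalGenerator(), SevereProbCNN()
forecast = torch.randn(8, 4, 64, 64)                                # dummy HRRR-like patches
members = [gen(forecast, torch.randn(8, 16)) for _ in range(10)]    # 10 synthetic members
p_severe = torch.stack([prob_cnn(m) for m in members]).mean(dim=0)  # ensemble-mean probability
print(p_severe.shape)  # (8, 1)
```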

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

  • paper_url: http://arxiv.org/abs/2310.06020
  • repo_url: None
  • paper_authors: Maximilian Seitzer, Sjoerd van Steenkiste, Thomas Kipf, Klaus Greff, Mehdi S. M. Sajjadi
  • for: 本研究旨在从单摄视频中提取真实世界场景的3D结构和动态特征,以便生成视频中的视图。
  • methods: 该模型基于现有的神经场景表示方法,通过一种新的协作训练方法和新的人工数据集DySO,以分解单摄视频为场景内容、每个视图的场景动态和摄像机pose。
  • results: 模型学习到了可质感的幂等特征,可以分离控制摄像机和场景内容的视图生成。
    Abstract Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transformer (DyST) model leverages recent work in neural scene representation to learn a latent decomposition of monocular real-world videos into scene content, per-view scene dynamics, and camera pose. This separation is achieved through a novel co-training scheme on monocular videos and our new synthetic dataset DySO. DyST learns tangible latent representations for dynamic scenes that enable view generation with separate control over the camera and the content of the scene.
    摘要 对世界的视觉理解不仅限于单幅图像的语义和平面结构。在这项工作中,我们旨在从单目真实世界视频中同时捕捉场景的 3D 结构与动态。我们的动态场景 Transformer(DyST)模型借鉴了神经场景表示的最新进展,学习将单目真实视频分解为场景内容、每个视角的场景动态以及相机位姿三部分潜变量。这种分解通过一种新的联合训练方案,在真实单目视频和我们新构建的合成数据集 DySO 上实现。DyST 为动态场景学习到了切实可用的潜在表示,使得在生成视图时可以分别控制相机与场景内容。

Divide-and-Conquer Dynamics in AI-Driven Disempowerment

  • paper_url: http://arxiv.org/abs/2310.06009
  • repo_url: None
  • paper_authors: Peter S. Park, Max Tegmark
  • for: 研究试图在多数具有经济价值的工作上超越人类的 AI 系统,以及这类 AI 模型如何影响当下艺术家、演员和作家的生计。
  • methods: 构建一个博弈论冲突模型,研究关注当前危害与关注未来危害的各方之间的分裂与不团结,并结合历史经验分析共同威胁下"分而治之"的动态。
  • results: 模型在现实的参数假设下给出了若干预测,并在历史经验记录中得到初步印证;研究还指出,受 AI 影响的各方若缺乏团结,将使更多的人受到 AI 驱动的去权化影响。
    Abstract AI companies are attempting to create AI systems that outperform humans at most economically valuable work. Current AI models are already automating away the livelihoods of some artists, actors, and writers. But there is infighting between those who prioritize current harms and future harms. We construct a game-theoretic model of conflict to study the causes and consequences of this disunity. Our model also helps explain why throughout history, stakeholders sharing a common threat have found it advantageous to unite against it, and why the common threat has in turn found it advantageous to divide and conquer. Under realistic parameter assumptions, our model makes several predictions that find preliminary corroboration in the historical-empirical record. First, current victims of AI-driven disempowerment need the future victims to realize that their interests are also under serious and imminent threat, so that future victims are incentivized to support current victims in solidarity. Second, the movement against AI-driven disempowerment can become more united, and thereby more likely to prevail, if members believe that their efforts will be successful as opposed to futile. Finally, the movement can better unite and prevail if its members are less myopic. Myopic members prioritize their future well-being less than their present well-being, and are thus disinclined to solidarily support current victims today at personal cost, even if this is necessary to counter the shared threat of AI-driven disempowerment.
    摘要 AI 公司正试图打造在多数具有经济价值的工作上超越人类的 AI 系统,现有 AI 模型已经在侵蚀部分艺术家、演员和作家的生计;但在优先应对当前危害还是未来危害的人群之间存在内部分歧。我们构建了一个博弈论冲突模型来研究这种不团结的成因与后果;该模型也有助于解释,历史上面临共同威胁的各方为何倾向于联合抵抗,而共同威胁一方又为何倾向于分而治之。在现实的参数假设下,模型给出以下预测:1. 当前受 AI 驱动去权化影响的受害者,需要让未来的潜在受害者意识到自身利益同样面临严重且迫近的威胁,未来受害者才有动机与当前受害者团结一致。2. 如果成员相信自己的努力会取得成功而非徒劳,反对 AI 驱动去权化的运动就能更加团结,从而更有可能取胜。3. 成员越不短视,运动就越容易团结并取胜;短视的成员把未来福祉看得比当前福祉轻,因而不愿为对抗 AI 驱动去权化这一共同威胁而在今天付出个人代价去声援当前受害者。

Grokking as Compression: A Nonlinear Complexity Perspective

  • paper_url: http://arxiv.org/abs/2310.05918
  • repo_url: None
  • paper_authors: Ziming Liu, Ziqian Zhong, Max Tegmark
  • for: 本文研究了神经网络压缩的效果对于准确率的影响,并提出了一种 Linear Mapping Number (LMN) 来衡量神经网络复杂度。
  • methods: 本文使用了 ReLU 网络和 XOR 网络进行实验研究,并对神经网络压缩后的泛化性进行了分析。
  • results: 研究发现,LMN 可以准确刻画神经网络压缩前后与泛化之间的关系,而 $L_2$ norm 与测试损失之间则呈现复杂的非线性关系。此外,论文发现 LMN 可以用来解释神经网络中的"顿悟"(grokking)现象,即泛化在记忆化之后才延迟出现。
    Abstract We attribute grokking, the phenomenon where generalization is much delayed after memorization, to compression. To do so, we define linear mapping number (LMN) to measure network complexity, which is a generalized version of linear region number for ReLU networks. LMN can nicely characterize neural network compression before generalization. Although the $L_2$ norm has been a popular choice for characterizing model complexity, we argue in favor of LMN for a number of reasons: (1) LMN can be naturally interpreted as information/computation, while $L_2$ cannot. (2) In the compression phase, LMN has linear relations with test losses, while $L_2$ is correlated with test losses in a complicated nonlinear way. (3) LMN also reveals an intriguing phenomenon of the XOR network switching between two generalization solutions, while $L_2$ does not. Besides explaining grokking, we argue that LMN is a promising candidate as the neural network version of the Kolmogorov complexity since it explicitly considers local or conditioned linear computations aligned with the nature of modern artificial neural networks.
    摘要 我们将"顿悟"(grokking)现象,即泛化在记忆化之后才延迟出现,归因于压缩。为此,我们定义线性映射数(LMN)来衡量神经网络复杂度,它是 ReLU 网络线性区域数的推广。LMN 能够很好地刻画泛化之前的神经网络压缩过程。尽管 $L_2$ 范数一直是刻画模型复杂度的常用选择,但我们认为 LMN 更为合适,理由如下:(1)LMN 可以自然地解释为信息/计算量,而 $L_2$ 范数不能;(2)在压缩阶段,LMN 与测试损失呈线性关系,而 $L_2$ 范数与测试损失之间的相关是复杂的非线性关系;(3)LMN 还揭示了 XOR 网络在两种泛化解之间切换的有趣现象,而 $L_2$ 范数观察不到。除了解释顿悟现象之外,我们认为 LMN 是神经网络版柯尔莫哥洛夫复杂度的有力候选,因为它显式考虑了与现代人工神经网络本质相符的局部或条件线性计算。
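
LMN is defined in the paper as a generalization of the linear-region count; as a rough intuition only, one can count how many distinct ReLU on/off patterns a small network realizes over a sample of inputs, since each pattern corresponds to one locally linear mapping. The sketch below is this crude proxy, not the authors' estimator.

```python
import torch
import torch.nn as nn

def activation_pattern_count(model, xs):
    """Count distinct ReLU on/off patterns over a batch of inputs.

    Each distinct pattern corresponds to one locally linear mapping, so the count
    is a coarse stand-in for a linear-mapping-number-style complexity measure.
    """
    h = xs
    signs_per_input = [[] for _ in range(len(xs))]
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            on = (h > 0)
            for i in range(len(xs)):
                signs_per_input[i].append(on[i].flatten())
    patterns = {tuple(torch.cat(s).tolist()) for s in signs_per_input}
    return len(patterns)

mlp = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
xs = torch.rand(2000, 2) * 4 - 2      # random probes of the input square [-2, 2]^2
print(activation_pattern_count(mlp, xs))
```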

Interpreting CLIP’s Image Representation via Text-Based Decomposition

  • paper_url: http://arxiv.org/abs/2310.05916
  • repo_url: https://github.com/yossigandelsman/clip_prs
  • paper_authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
  • for: 本研究用CLIP图像编码器来分析各个模型组件如何影响最终表示。
  • methods: 我们将图像表示分解为图像块、模型层和注意头的和,并使用CLIP的文本表示来解释和评估这些和的组成部分。
  • results: 我们发现 CLIP 中的许多注意力头扮演着特定属性相关的角色(例如位置或形状),并发现图像块中存在涌现的空间定位现象。我们利用这些理解去除 CLIP 中的虚假特征,并构建了一个强大的零样本图像分割器。
    Abstract We investigate the CLIP image encoder by analyzing how individual model components affect the final representation. We decompose the image representation as a sum across individual image patches, model layers, and attention heads, and use CLIP's text representation to interpret the summands. Interpreting the attention heads, we characterize each head's role by automatically finding text representations that span its output space, which reveals property-specific roles for many heads (e.g. location or shape). Next, interpreting the image patches, we uncover an emergent spatial localization within CLIP. Finally, we use this understanding to remove spurious features from CLIP and to create a strong zero-shot image segmenter. Our results indicate that a scalable understanding of transformer models is attainable and can be used to repair and improve models.
    摘要 我们通过分析各个模型组件对最终表示的影响来研究 CLIP 图像编码器。我们将图像表示分解为跨图像块、模型层和注意力头的求和,并利用 CLIP 的文本表示来解释这些求和项。在解释注意力头时,我们通过自动寻找能张成其输出空间的文本表示来刻画每个头的作用,从而揭示出许多头承担着特定属性相关的角色(例如位置或形状)。接着,在解释图像块时,我们发现 CLIP 中存在涌现的空间定位现象。最后,我们利用这些理解去除 CLIP 中的虚假特征,并构建了一个强大的零样本图像分割器。我们的结果表明,对 Transformer 模型进行可扩展的理解是可行的,并且可以用于修复和改进模型。
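
A schematic of the text-based labelling step: given per-head contribution vectors to the image embedding (which the paper extracts from CLIP's attention layers; obtaining them requires forward hooks not shown here), rank a bank of candidate text descriptions against each head's typical output direction. The mean-direction pooling, tensor shapes, and label set are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def label_heads(head_contribs, text_embs, text_labels, top_k=3):
    """head_contribs: (n_images, n_heads, d) per-head contributions to the image embedding.
    text_embs: (n_texts, d) CLIP text embeddings of candidate property descriptions.
    Returns, for every head, the captions that best match its typical output direction."""
    head_dirs = F.normalize(head_contribs.mean(dim=0), dim=-1)   # (n_heads, d) mean direction per head
    text_dirs = F.normalize(text_embs, dim=-1)                   # (n_texts, d)
    sims = head_dirs @ text_dirs.T                                # (n_heads, n_texts)
    out = {}
    for h in range(sims.shape[0]):
        idx = sims[h].topk(top_k).indices.tolist()
        out[h] = [text_labels[i] for i in idx]
    return out

# Dummy tensors standing in for real CLIP statistics.
contribs = torch.randn(100, 12, 512)
texts = torch.randn(6, 512)
labels = ["object location", "colour", "shape", "texture", "counting", "text in image"]
print(label_heads(contribs, texts, labels))
```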

FireAct: Toward Language Agent Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.05915
  • repo_url: https://github.com/anchen1011/FireAct
  • paper_authors: Baian Chen, Chang Shu, Ehsan Shareghi, Nigel Collier, Karthik Narasimhan, Shunyu Yao
  • for: 这篇论文主要探讨如何通过微调(而非仅依赖少样本提示和外部工具)将语言模型(LM)训练成能够推理和行动的语言智能体。
  • methods: 本文在问答(QA)任务与 Google 搜索 API 的设置下,探索了多种基础 LM、提示方法、微调数据和 QA 任务,发现对基础 LM 进行微调能够稳定提升语言智能体的性能。
  • results: 例如,使用由 GPT-4 生成的 500 条智能体轨迹对 Llama2-7B 进行微调,可将 HotpotQA 性能提升 77%。此外,本文提出了 FireAct,一种利用来自多种任务和提示方法的轨迹对 LM 进行微调的新方法,并表明更多样化的微调数据能进一步提升智能体。
    Abstract Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-the-shelf LMs. In this paper, we investigate and argue for the overlooked direction of fine-tuning LMs to obtain language agents. Using a setup of question answering (QA) with a Google search API, we explore a variety of base LMs, prompting methods, fine-tuning data, and QA tasks, and find language agents are consistently improved after fine-tuning their backbone LMs. For example, fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 leads to a 77% HotpotQA performance increase. Furthermore, we propose FireAct, a novel approach to fine-tuning LMs with trajectories from multiple tasks and prompting methods, and show having more diverse fine-tuning data can further improve agents. Along with other findings regarding scaling effects, robustness, generalization, efficiency and cost, our work establishes comprehensive benefits of fine-tuning LMs for agents, and provides an initial set of experimental designs, insights, as well as open questions toward language agent fine-tuning.
    摘要 最近的工作为语言模型(LM)添加了外部工具或环境,从而发展出能够推理和行动的语言智能体。然而,这些智能体大多依赖现成 LM 的少样本提示技术。本文研究并论证了一个被忽视的方向:通过微调 LM 来获得语言智能体。我们在问答(QA)任务与 Google 搜索 API 的设置下,探索了多种基础 LM、提示方法、微调数据和 QA 任务,发现对骨干 LM 进行微调后,语言智能体的性能持续提升。例如,使用 GPT-4 生成的 500 条智能体轨迹微调 Llama2-7B,可将 HotpotQA 性能提升 77%。此外,我们提出了 FireAct,一种利用来自多种任务和提示方法的轨迹微调 LM 的新方法,并证明更多样化的微调数据可以进一步提升智能体。结合有关扩展效应、鲁棒性、泛化、效率和成本的其他发现,我们的工作确立了为智能体微调 LM 的全面收益,并为语言智能体微调提供了一组初步的实验设计、见解与开放问题。
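
One practical step the abstract implies is turning ReAct-style agent trajectories (thought/action/observation steps ending in an answer) into plain supervised fine-tuning examples. Below is a hedged sketch of that data-formatting step; the prompt template and field names are assumptions, not the released FireAct format.

```python
import json

def trajectory_to_example(question, steps, answer):
    """Flatten one agent trajectory into a (prompt, completion) pair for supervised fine-tuning."""
    lines = [f"Question: {question}"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"Thought {i}: {step['thought']}")
        lines.append(f"Action {i}: {step['action']}")
        lines.append(f"Observation {i}: {step['observation']}")
    prompt = f"Question: {question}\n"
    completion = "\n".join(lines[1:] + [f"Answer: {answer}"])
    return {"prompt": prompt, "completion": completion}

# One toy trajectory of the kind GPT-4 might produce with a search tool.
steps = [{"thought": "I should look up when the Eiffel Tower opened.",
          "action": "search[Eiffel Tower opening year]",
          "observation": "The Eiffel Tower opened in 1889."}]
example = trajectory_to_example("When did the Eiffel Tower open?", steps, "1889")

with open("fireact_sft.jsonl", "w") as f:      # JSONL consumable by most SFT trainers
    f.write(json.dumps(example) + "\n")
print(example["completion"])
```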

SALMON: Self-Alignment with Principle-Following Reward Models

  • paper_url: http://arxiv.org/abs/2310.05910
  • repo_url: https://github.com/ibm/salmon
  • paper_authors: Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan
  • for: 这篇论文旨在解决如何在尽量少的人类监督下对基于 LLM 的 AI 智能体进行对齐:监督微调(SFT)加上人类反馈强化学习(RLHF)虽然有效,但依赖高质量的人工标注。
  • methods: 本文提出了一种新方法 SALMON(Self-ALignMent with principle-fOllowiNg reward models),仅凭一小组人类定义的原则即可对基础语言模型进行对齐。方法的核心是一个遵循原则的奖励模型:它在合成偏好数据上训练,可依据任意人类定义的原则生成奖励分数;在 RL 训练阶段只需调整这些原则,即可控制奖励偏好并进而影响所训练策略的行为。
  • results: 实验中,作者用 SALMON 方法训练了名为 Dromedary-2 的 AI 助手。仅使用 6 个上下文学习示例和 31 条人类定义的原则,Dromedary-2 在多个基准数据集上显著超越了包括 LLaMA-2-Chat-70b 在内的现有 AI 系统,且无需大量人类监督。
    Abstract Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents. However, a significant limitation of such an approach is its dependency on high-quality human annotations, making its application to intricate tasks challenging due to difficulties in obtaining consistent response demonstrations and in-distribution response preferences. This paper presents a novel approach, namely SALMON (Self-ALignMent with principle-fOllowiNg reward models), to align base language models with minimal human supervision, using only a small set of human-defined principles, yet achieving superior performance. Central to our approach is a principle-following reward model. Trained on synthetic preference data, this model can generate reward scores based on arbitrary human-defined principles. By merely adjusting these principles during the RL training phase, we gain full control over the preferences with the reward model, subsequently influencing the behavior of the RL-trained policies, and eliminating the reliance on the collection of online human preferences. Applying our method to the LLaMA-2-70b base language model, we developed an AI assistant named Dromedary-2. With only 6 exemplars for in-context learning and 31 human-defined principles, Dromedary-2 significantly surpasses the performance of several state-of-the-art AI systems, including LLaMA-2-Chat-70b, on various benchmark datasets. We have open-sourced the code and model weights to encourage further research into aligning LLM-based AI agents with enhanced supervision efficiency, improved controllability, and scalable oversight.
    摘要 在回复示范上进行监督微调(SFT)并结合人类反馈强化学习(RLHF),是对齐基于 LLM 的 AI 智能体的有力范式。然而,这种方法的重要局限在于依赖高质量的人工标注,在复杂任务上难以获得一致的回复示范和分布内的回复偏好,因而应用受限。本文提出一种新方法 SALMON(Self-ALignMent with principle-fOllowiNg reward models),仅使用一小组人类定义的原则即可对基础语言模型进行对齐,且取得优异性能。方法的核心是一个遵循原则的奖励模型:它在合成偏好数据上训练,能够依据任意人类定义的原则生成奖励分数。只需在 RL 训练阶段调整这些原则,我们就能完全控制奖励模型所体现的偏好,进而影响 RL 训练得到的策略行为,并消除对在线收集人类偏好的依赖。将该方法应用于 LLaMA-2-70b 基础语言模型,我们开发了名为 Dromedary-2 的 AI 助手。仅使用 6 个上下文学习示例和 31 条人类定义的原则,Dromedary-2 在多个基准数据集上显著超越了包括 LLaMA-2-Chat-70b 在内的多个最先进 AI 系统。我们已开源代码和模型权重,以鼓励后续研究以更高的监督效率、更好的可控性和可扩展的监管来对齐基于 LLM 的 AI 智能体。
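
A hedged sketch of how a principle-following reward model might be queried: the active principles are folded into the scoring prompt and a scalar reward is read back, so swapping principles at RL time steers the policy. `reward_model_score` is a hypothetical stand-in for a trained scorer, and the averaging scheme is an assumption, not IBM's released SALMON code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Principle:
    name: str
    text: str

def principled_reward(prompt: str, response: str, principles: List[Principle],
                      reward_model_score: Callable[[str], float]) -> float:
    """Score a candidate response against human-written principles.

    reward_model_score is a placeholder for a trained principle-following reward
    model; one score per principle is averaged so the active principle set can be
    changed during RL to steer the policy's behaviour.
    """
    scores = []
    for p in principles:
        query = (f"Principle ({p.name}): {p.text}\n"
                 f"User: {prompt}\nAssistant: {response}\n"
                 "Reward for how well the response follows the principle:")
        scores.append(reward_model_score(query))
    return sum(scores) / len(scores)

principles = [Principle("honesty", "The assistant should not state things it is unsure of as facts."),
              Principle("conciseness", "The assistant should avoid unnecessary verbosity.")]
fake_scorer = lambda text: float(len(text) % 7) / 7.0      # dummy scorer, for illustration only
print(principled_reward("Explain RLHF briefly.",
                        "RLHF fine-tunes a model with a learned reward.",
                        principles, fake_scorer))
```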

TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models

  • paper_url: http://arxiv.org/abs/2310.05905
  • repo_url: None
  • paper_authors: Zuxin Liu, Jesse Zhang, Kavosh Asadi, Yao Liu, Ding Zhao, Shoham Sabach, Rasool Fakoor
  • for: 这项研究旨在解决大型预训练模型在控制领域潜力尚未得到充分发挥的问题,其主要原因在于数据稀缺以及训练或微调这类大模型的计算开销。
  • methods: 本研究提出了 TAIL(Task-specific Adapters for Imitation Learning,面向模仿学习的任务特定适配器)框架,用于高效适配新任务。研究借鉴了语言领域的参数高效微调技术,如瓶颈适配器(Bottleneck Adapters)、P-Tuning 和低秩适配(LoRA),来适配大型预训练模型。
  • results: 实验结果显示,在大规模语言条件操作任务中,与其他参数高效微调技术相比,采用 LoRA 的 TAIL 仅用完整微调 1% 的可训练参数即可取得最佳的适配后性能,同时避免了灾难性遗忘,并在持续学习设定中保持了适配可塑性。
    Abstract The full potential of large pretrained models remains largely untapped in control domains like robotics. This is mainly because of the scarcity of data and the computational challenges associated with training or fine-tuning these large models for such applications. Prior work mainly emphasizes effective pretraining of large models for decision-making, with little exploration into how to perform data-efficient continual adaptation of these models for new tasks. Recognizing these constraints, we introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks. Inspired by recent advancements in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques -- e.g., Bottleneck Adapters, P-Tuning, and Low-Rank Adaptation (LoRA) -- in TAIL to adapt large pretrained models for new tasks with limited demonstration data. Our extensive experiments in large-scale language-conditioned manipulation tasks comparing prevalent parameter-efficient fine-tuning techniques and adaptation baselines suggest that TAIL with LoRA can achieve the best post-adaptation performance with only 1\% of the trainable parameters of full fine-tuning, while avoiding catastrophic forgetting and preserving adaptation plasticity in continual learning settings.
    摘要 大型预训练模型的潜力在机器人等控制领域仍远未得到充分发挥,这主要是由于数据稀缺,以及为此类应用训练或微调大模型所面临的计算挑战。先前的工作主要强调面向决策的大模型有效预训练,而较少探讨如何以数据高效的方式让这些模型持续适配新任务。认识到这些限制,我们提出了 TAIL(Task-specific Adapters for Imitation Learning),一个面向新控制任务的高效适配框架。受语言领域参数高效微调最新进展的启发,我们在 TAIL 中探索了瓶颈适配器(Bottleneck Adapters)、P-Tuning 和低秩适配(LoRA)等高效微调技术,以便在示范数据有限的情况下适配大型预训练模型。我们在大规模语言条件操作任务上对主流参数高效微调技术和适配基线进行了大量实验,结果表明采用 LoRA 的 TAIL 仅用完整微调 1% 的可训练参数即可取得最佳的适配后性能,同时避免了灾难性遗忘,并在持续学习设定中保持了适配可塑性。
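
For readers unfamiliar with the adapter families compared here, below is a minimal LoRA wrapper around a frozen linear layer, the kind of parameter-efficient module TAIL evaluates. The rank, scaling, and initialization are common defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as identity update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")   # a few percent for this single layer
x = torch.randn(2, 512)
print(layer(x).shape)
```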

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

  • paper_url: http://arxiv.org/abs/2310.05898
  • repo_url: None
  • paper_authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
  • for: 本文旨在解释Lion优化器的理论基础。
  • methods: 本文通过连续时间与离散时间分析,并为 Lion 更新(符号动量加解耦权重衰减)构造新的 Lyapunov 函数,来刻画其动力学。
  • results: 研究发现 Lion 优化器在训练大型 AI 模型时表现良好,且比 AdamW 更节省内存;但由于 Lion 不属于任何已知的、有理论基础的优化器类别,其潜力与可扩展性受限。本文的分析表明,Lion 实际上是在约束 $\|x\|_\infty \leq 1/\lambda$ 下最小化一般损失函数的一种有原则的方法,从而为其提供了理论基础。
    Abstract Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
    摘要 Lion(Evolved Sign Momentum,演化符号动量)是一种通过程序搜索发现的新优化器,在训练大型 AI 模型时展现出可观的效果:其性能与 AdamW 相当或更优,且内存效率更高。正如随机程序搜索的结果所预期的那样,Lion 融合了符号动量、解耦权重衰减、Polyak 动量和 Nesterov 动量等多种现有算法的元素,但并不属于任何已知的、具有理论基础的优化器类别。因此,尽管 Lion 在广泛任务上表现为不错的通用优化器,其理论基础仍不明确,这限制了进一步提升和扩展其效果的空间。本工作旨在揭开 Lion 的面纱。基于连续时间和离散时间分析,我们证明 Lion 是一种理论上新颖且有原则的方法:它在约束 $\|x\|_\infty \leq 1/\lambda$ 下最小化一般损失函数 $f(x)$,其中 $\lambda$ 为权重衰减系数,该约束通过解耦权重衰减得以实现。我们的分析得益于为 Lion 更新构造的新 Lyapunov 函数,它适用于更广的 Lion-$\kappa$ 算法族:将 Lion 中的 $\text{sign}(\cdot)$ 算子替换为凸函数 $\kappa$ 的次梯度,对应求解一般复合优化问题 $\min_x f(x) + \kappa^*(x)$。这些发现为理解 Lion 的动力学提供了有价值的洞察,并为进一步改进和扩展 Lion 相关算法铺平了道路。
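
For reference, here is the Lion update as commonly stated (sign of an interpolated momentum plus decoupled weight decay), written as a toy NumPy step; the constrained-optimization view ‖x‖∞ ≤ 1/λ discussed in the paper is not re-derived here, and hyperparameters are illustrative.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.1):
    """One Lion update: take the sign of an interpolation of momentum and gradient,
    apply decoupled weight decay, then refresh the momentum with beta2."""
    update = np.sign(beta1 * m + (1.0 - beta1) * g)
    w = w - lr * (update + weight_decay * w)      # decoupled weight decay, as in AdamW
    m = beta2 * m + (1.0 - beta2) * g
    return w, m

# Minimise f(w) = ||w - 1||^2 from a random start, as a sanity check of the step.
rng = np.random.default_rng(0)
w = rng.normal(size=4)
m = np.zeros_like(w)
for _ in range(2000):
    g = 2.0 * (w - 1.0)
    w, m = lion_step(w, g, m, lr=1e-2, weight_decay=0.0)
print(np.round(w, 2))   # close to the unconstrained optimum [1, 1, 1, 1]
```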

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

  • paper_url: http://arxiv.org/abs/2310.05886
  • repo_url: None
  • paper_authors: Utkarsh Oggy Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik
  • for: 这项研究旨在提升流式神经网络模型对语音及传感信号的逐帧预测能力;由于实际部署环境的资源限制,不能通过增加模型参数来提升性能。
  • methods: 我们提出了一种新的损失函数——流式锚点损失(Streaming Anchor Loss,SAL)及其 focal 变体:根据各帧的重要性动态调整逐帧交叉熵损失,对语义关键事件时间邻域内的帧施加更高的损失惩罚。
  • results: 实验结果表明,使用 SAL 训练流式神经网络模型,可以在不增加数据或模型参数的情况下提升预测的准确率并降低延迟,并在三个不同的语音检测任务上取得更好的效果。
    Abstract Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding more parameters) to improve the predictive power may not be viable for real-world tasks. In this work, we propose a new loss, Streaming Anchor Loss (SAL), to better utilize the given learning capacity by encouraging the model to learn more from essential frames. More specifically, our SAL and its focal variations dynamically modulate the frame-wise cross entropy loss based on the importance of the corresponding frames so that a higher loss penalty is assigned for frames within the temporal proximity of semantically critical events. Therefore, our loss ensures that the model training focuses on predicting the relatively rare but task-relevant frames. Experimental results with standard lightweight convolutional and recurrent streaming networks on three different speech based detection tasks demonstrate that SAL enables the model to learn the overall task more effectively with improved accuracy and latency, without any additional data, model parameters, or architectural changes.
    摘要 在资源受限的平台上,用于对各类语音和传感信号做出快速逐帧响应的流式神经网络模型被广泛采用。因此,通过增加参数来扩大这类流式模型的学习容量、以提升预测能力,在现实任务中往往不可行。在这项工作中,我们提出了一种新的损失函数——流式锚点损失(SAL),通过促使模型从关键帧中学习更多信息,更充分地利用既定的学习容量。具体而言,SAL 及其 focal 变体根据各帧的重要性动态调整逐帧交叉熵损失,使语义关键事件时间邻域内的帧获得更高的损失惩罚,从而让模型训练聚焦于预测那些相对稀少但与任务相关的帧。在三个不同的基于语音的检测任务上,对标准轻量级卷积和循环流式网络的实验结果表明,SAL 能使模型更有效地学习整体任务,在不增加任何数据、模型参数或架构改动的情况下提升准确率并降低延迟。
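
One way to realize the weighting idea is shown below: frame-wise cross entropy is up-weighted within a temporal radius of anchor (semantically critical) frames. The radius, boost factor, and exact weighting form are assumptions; the paper's SAL and its focal variants may differ in detail.

```python
import torch
import torch.nn.functional as F

def streaming_anchor_loss(logits, targets, anchor_frames, radius=5, boost=4.0):
    """Frame-wise cross entropy with larger weight near anchor (semantically critical) frames.

    logits:  (T, num_classes) streaming frame predictions
    targets: (T,) frame labels
    anchor_frames: indices of critical events (e.g. keyword onsets)
    """
    T = logits.shape[0]
    frame_idx = torch.arange(T, dtype=torch.float32)
    if len(anchor_frames) > 0:
        anchors = torch.as_tensor(anchor_frames, dtype=torch.float32)
        dist = (frame_idx[:, None] - anchors[None, :]).abs().min(dim=1).values
    else:
        dist = torch.full((T,), float("inf"))
    weights = 1.0 + boost * (dist <= radius).float()          # heavier penalty near anchors
    per_frame = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_frame).sum() / weights.sum()

logits = torch.randn(100, 3)                 # dummy 3-class streaming output
targets = torch.randint(0, 3, (100,))
print(streaming_anchor_loss(logits, targets, anchor_frames=[20, 63]))
```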

A Meta-Learning Perspective on Transformers for Causal Language Modeling

  • paper_url: http://arxiv.org/abs/2310.05884
  • repo_url: None
  • paper_authors: Xinbo Wu, Lav R. Varshney
  • for: 本文旨在解释 Transformer 架构在构建大型因果语言模型时展现强大能力的机制。
  • methods: 本文从元学习(meta-learning)的视角审视 Transformer 架构在因果语言建模任务上的训练过程,刻画了 Transformer 内部可能发生的一种内层优化过程,并在此基础上发现了所学 token 表示范数的一个特殊性质。
  • results: 在预训练大语言模型和真实数据上的实验支持了上述分析,并验证了该 token 表示范数性质的存在。
    Abstract The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process that may happen within the Transformer. Further, from within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.
    摘要 Transformer 架构在构建大型因果语言模型方面已占据主导地位,但解释其能力来源的机制仍未被充分理解。本文聚焦于训练过程,通过刻画 Transformer 内部可能发生的内层优化过程,建立了 Transformer 架构在因果语言建模任务上训练的元学习视角。进一步地,我们从该内层优化出发,发现并从理论上分析了基于 Transformer 的因果语言模型中所学 token 表示范数的一个特殊性质。我们的分析得到了在预训练大语言模型和真实数据上所做实验的支持。

Coarse-Graining Hamiltonian Systems Using WSINDy

  • paper_url: http://arxiv.org/abs/2310.05879
  • repo_url: None
  • paper_authors: Daniel A. Messenger, Joshua W. Burby, David M. Bortz
  • for: 本文旨在将弱形式稀疏非线性动力学辨识算法(WSINDy)推广到具有近似对称性的哈密顿动力学,以便高效地捕捉相关自由度的动力学。
  • methods: 本文使用 WSINDy 算法,在具有近似对称性的哈密顿动力学中辨识降维后的哈密顿系统。该方法通过将试探基限定为哈密顿向量场来自然保持哈密顿结构,且计算高效,通常只需一条轨迹即可学习完整的降维哈密顿系统。
  • results: 结果表明,WSINDy 能够在近似对称性不精确以及外部噪声造成大扰动的情况下,稳健地辨识出正确的首阶降维系统,并在振子耦合动力学、描述星系内恒星运动的 Hénon-Heiles 系统以及带电粒子动力学等具有物理意义的例子上得到验证。
    Abstract The Weak-form Sparse Identification of Nonlinear Dynamics algorithm (WSINDy) has been demonstrated to offer coarse-graining capabilities in the context of interacting particle systems ( https://doi.org/10.1016/j.physd.2022.133406 ). In this work we extend this capability to the problem of coarse-graining Hamiltonian dynamics which possess approximate symmetries. Such approximate symmetries often lead to the existence of a Hamiltonian system of reduced dimension that may be used to efficiently capture the dynamics of the relevant degrees of freedom. Deriving such reduced systems, or approximating them numerically, is an ongoing challenge. We demonstrate that WSINDy can successfully identify this reduced Hamiltonian system in the presence of large perturbations imparted from both the inexact nature of the symmetry and extrinsic noise. This is significant in part due to the nontrivial means by which such systems are derived analytically. WSINDy naturally preserves the Hamiltonian structure by restricting to a trial basis of Hamiltonian vector fields, and the methodology is computational efficient, often requiring only a single trajectory to learn the full reduced Hamiltonian, and avoiding forward solves in the learning process. In this way, we argue that weak-form equation learning is particularly well-suited for Hamiltonian coarse-graining. Using nearly-periodic Hamiltonian systems as a prototypical class of systems with approximate symmetries, we show that WSINDy robustly identifies the correct leading-order reduced system of dimension $2(N-1)$ or $N$ from the original $(2N)$-dimensional system, upon observation of the relevant degrees of freedom. We provide physically relevant examples, namely coupled oscillator dynamics, the H\'enon-Heiles system for stellar motion within a galaxy, and the dynamics of charged particles.
    摘要 弱形式稀疏非线性动力学辨识算法(WSINDy)已被证明在相互作用粒子系统中具有粗粒化能力。本工作将这一能力推广到具有近似对称性的哈密顿动力学的粗粒化问题。这类近似对称性通常意味着存在一个降维的哈密顿系统,可用于高效刻画相关自由度的动力学;而解析推导或数值逼近此类降维系统一直是尚未解决的挑战。我们证明,即便近似对称性本身不精确且存在外部噪声所带来的大扰动,WSINDy 仍能成功辨识该降维哈密顿系统——这一点尤为重要,因为此类系统的解析推导并不平凡。WSINDy 通过将试探基限制为哈密顿向量场来自然保持哈密顿结构;该方法计算高效,通常只需一条轨迹即可学习完整的降维哈密顿系统,且学习过程中无需前向求解。因此我们认为,弱形式方程学习特别适合哈密顿系统的粗粒化。以近周期哈密顿系统作为具有近似对称性系统的典型类别,我们表明,在观测到相关自由度后,WSINDy 能从原始 $(2N)$ 维系统中稳健地辨识出维数为 $2(N-1)$ 或 $N$ 的正确首阶降维系统。我们给出了具有物理意义的例子,包括耦合振子动力学、描述星系内恒星运动的 Hénon-Heiles 系统以及带电粒子动力学。
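
To illustrate the core weak-form trick on a toy 1-D ODE (not the paper's Hamiltonian pipeline): instead of differentiating noisy data, integrate a candidate library against compactly supported test functions, move the derivative onto the test function, and solve a sparse regression. The bump functions, thresholds, and toy dynamics below are illustrative assumptions.

```python
import numpy as np

# Toy data: x'(t) = -0.5 x, sampled on a uniform grid with a little noise.
t = np.linspace(0.0, 8.0, 801)
x = 2.0 * np.exp(-0.5 * t)
x_noisy = x + 0.01 * np.random.default_rng(0).normal(size=x.shape)

def bump(t_loc, a, b, p=4):
    """Compactly supported polynomial test function phi and its derivative on [a, b]."""
    inside = (t_loc > a) & (t_loc < b)
    phi = np.where(inside, (t_loc - a) ** p * (b - t_loc) ** p, 0.0)
    dphi = np.where(inside,
                    p * (t_loc - a) ** (p - 1) * (b - t_loc) ** p
                    - p * (t_loc - a) ** p * (b - t_loc) ** (p - 1), 0.0)
    return phi, dphi

library = np.column_stack([np.ones_like(x_noisy), x_noisy, x_noisy ** 2, x_noisy ** 3])
names = ["1", "x", "x^2", "x^3"]

# Weak form: integral(phi * x') = -integral(phi' * x) = integral(phi * Theta(x)) @ xi
G_rows, b_rows = [], []
for centre in np.linspace(1.0, 7.0, 30):
    phi, dphi = bump(t, centre - 1.0, centre + 1.0)
    b_rows.append(-np.trapz(dphi * x_noisy, t))               # derivative moved onto the test function
    G_rows.append(np.trapz(phi[:, None] * library, t, axis=0))
G, b = np.array(G_rows), np.array(b_rows)

# Sequentially thresholded least squares for a sparse coefficient vector.
xi = np.linalg.lstsq(G, b, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    if (~small).any():
        xi[~small] = np.linalg.lstsq(G[:, ~small], b, rcond=None)[0]
print({n: round(c, 3) for n, c in zip(names, xi)})   # expect roughly {'x': -0.5}, other terms pruned
```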

AI Systems of Concern

  • paper_url: http://arxiv.org/abs/2310.05876
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Kayla Matteucci, Shahar Avin, Fazl Barez, Seán Ó hÉigeartaigh
  • for: 这篇论文主要是为了讨论高级AI系统中的危险性和控制问题。
  • methods: 论文使用了多种学术 frameworks和指标来评估高级AI系统中的危险性和控制问题。
  • results: 论文认为,高级AI系统中的“属性X”特征会导致AI系统的危险性和控制问题,并提出了一些指标和管理措施来评估和限制高级AI系统中的“属性X”特征。
    Abstract Concerns around future dangers from advanced AI often centre on systems hypothesised to have intrinsic characteristics such as agent-like behaviour, strategic awareness, and long-range planning. We label this cluster of characteristics as "Property X". Most present AI systems are low in "Property X"; however, in the absence of deliberate steering, current research directions may rapidly lead to the emergence of highly capable AI systems that are also high in "Property X". We argue that "Property X" characteristics are intrinsically dangerous, and when combined with greater capabilities will result in AI systems for which safety and control is difficult to guarantee. Drawing on several scholars' alternative frameworks for possible AI research trajectories, we argue that most of the proposed benefits of advanced AI can be obtained by systems designed to minimise this property. We then propose indicators and governance interventions to identify and limit the development of systems with risky "Property X" characteristics.
    摘要 对未来先进 AI 危险性的担忧,通常集中在被假设具有类代理行为、战略意识和长程规划等内在特征的系统上,我们将这组特征统称为"属性 X"。目前的大多数 AI 系统"属性 X"程度较低,但在缺乏有意引导的情况下,当前的研究方向可能迅速催生既高度强大又具有高"属性 X"的 AI 系统。我们认为"属性 X"本身就具有内在危险性,一旦与更强的能力相结合,将产生难以保证安全与可控的 AI 系统。借鉴多位学者关于 AI 研究可能路径的替代框架,我们论证:先进 AI 所承诺的大多数收益,都可以由刻意弱化这一属性的系统来实现。我们进而提出了一些指标和治理措施,用于识别并限制具有高风险"属性 X"特征的系统的发展。

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05872
  • repo_url: None
  • paper_authors: Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang
  • for: 这个论文的目的是探索预训练的视觉语言模型(VLM)和大语言模型(LLM)在视觉常识逻辑(VCR)中的合作能力。
  • methods: 这个论文使用了预训练的VLM和LLM来解决视觉常识理解(VCU)和视觉常识推理(VCI)问题。VLM提供图像描述来支持LLM进行推理,并且使用了一种协作策略,让LLM在不确定的推理时指导VLM集中关注相关的视觉元素。
  • results: 这个论文在两个VCR benchmark数据集上进行了评估,并与没有培 retrained fine-tuning的方法进行比较,得到了更好的性能。
    Abstract In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) for visual commonsense reasoning (VCR). We categorize the problem of VCR into visual commonsense understanding (VCU) and visual commonsense inference (VCI). For VCU, which involves perceiving the literal visual content, pre-trained VLMs exhibit strong cross-dataset generalization. On the other hand, in VCI, where the goal is to infer conclusions beyond image content, VLMs face difficulties. We find that a baseline where VLMs provide perception results (image captions) to LLMs leads to improved performance on VCI. However, we identify a challenge with VLMs' passive perception, which often misses crucial context information, leading to incorrect or uncertain reasoning by LLMs. To mitigate this issue, we suggest a collaborative approach where LLMs, when uncertain about their reasoning, actively direct VLMs to concentrate on and gather relevant visual elements to support potential commonsense inferences. In our method, named ViCor, pre-trained LLMs serve as problem classifiers to analyze the problem category, VLM commanders to leverage VLMs differently based on the problem classification, and visual commonsense reasoners to answer the question. VLMs will perform visual recognition and understanding. We evaluate our framework on two VCR benchmark datasets and outperform all other methods that do not require in-domain supervised fine-tuning.
    摘要 在我们的工作中,我们探索预训练的视觉语言模型(VLM)和大型语言模型(LLM)在视觉常识逻辑(VCR)中的共同能力。我们将VCR分为视觉常识理解(VCU)和视觉常识推理(VCI)两个问题。在VCU中,涉及到直接读取图像内容的VCMs表现出了强大的跨数据集泛化能力。然而,在VCI中,VCMs面临困难,因为需要从图像内容中做出更多的推理。我们发现,将VCMs提供视觉内容(图像描述)给LLMs可以提高VCI的性能。然而,我们发现VCMs的被动感知有时会错过重要的上下文信息,导致LLMs的推理错误或不确定。为了解决这个问题,我们提议一种协作方法,其中LLMs在不确定的推理时会活动地指导VCMs集中聚焦和收集相关的视觉元素以支持可能的常识推理。我们称这种方法为ViCor,其中预训练的LLMs serves为问题类别分析器,VCM commander以不同的问题类别来利用VCMs,并且视觉常识推理器来回答问题。VCMs将进行视觉识别和理解。我们对VCR benchmark数据集进行评估,并在不需要域内抽象精细调整的情况下超越所有其他方法。

Dynamic value alignment through preference aggregation of multiple objectives

  • paper_url: http://arxiv.org/abs/2310.05871
  • repo_url: None
  • paper_authors: Marcin Korecki, Damian Dailisan, Cesare Carissimo
  • for: 这项研究目标是为了开发能够与人类目标相对应的伦理AI系统。
  • methods: 这种方法使用多目标方法来动态调整价值,以确保RL算法能够同时满足多个目标。
  • results: 这种方法被应用于由切换智能体控制的简化双路口交通系统,在速度、停车次数和等待时间三项指标上均取得整体性能提升,并能有效整合存在竞争或冲突的多个目标。
    Abstract The development of ethical AI systems is currently geared toward setting objective functions that align with human objectives. However, finding such functions remains a research challenge, while in RL, setting rewards by hand is a fairly standard approach. We present a methodology for dynamic value alignment, where the values that are to be aligned with are dynamically changing, using a multiple-objective approach. We apply this approach to extend Deep $Q$-Learning to accommodate multiple objectives and evaluate this method on a simplified two-leg intersection controlled by a switching agent.Our approach dynamically accommodates the preferences of drivers on the system and achieves better overall performance across three metrics (speeds, stops, and waits) while integrating objectives that have competing or conflicting actions.
    摘要 当前伦理 AI 系统的开发侧重于设定与人类目标一致的目标函数,然而寻找此类函数仍是研究难题;在强化学习中,人工设定奖励则是相当标准的做法。我们提出了一种动态价值对齐方法,利用多目标框架使需要对齐的价值能够动态变化。我们将该方法用于扩展深度 Q 学习以同时容纳多个目标,并在由切换智能体控制的简化双路口交通系统上进行评估。该方法能够动态顾及系统中驾驶者的偏好,在速度、停车次数和等待时间三项指标上取得更好的整体表现,同时整合了存在竞争或冲突的目标。

HyperAttention: Long-context Attention in Near-Linear Time

  • paper_url: http://arxiv.org/abs/2310.05869
  • repo_url: None
  • paper_authors: Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh
  • for: 提高大语言模型(LLM)中长上下文注意力的计算效率。
  • methods: 引入两个细粒度参数,分别度量:(1)归一化注意力矩阵中的最大列范数,以及(2)检测并移除大元素之后,未归一化注意力矩阵中行范数的比值;利用这两个参数刻画问题的难度,并据此设计近线性时间的采样算法。
  • results: HyperAttention 比现有方法更快,并能适应不同的上下文长度。实验表明,在 32k 上下文长度下,HyperAttention 使 ChatGLM2 的推理速度提升 50%,困惑度仅从 5.6 升至 6.3;在更长的上下文(如 131k,带因果掩码)下,单个注意力层可获得 5 倍加速。
    Abstract We present an approximate attention mechanism named HyperAttention to address the computational challenges posed by the growing complexity of long contexts used in Large Language Models (LLMs). Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank. We introduce two parameters which measure: (1) the max column norm in the normalized attention matrix, and (2) the ratio of row norms in the unnormalized attention matrix after detecting and removing large entries. We use these fine-grained parameters to capture the hardness of the problem. Despite previous lower bounds, we are able to achieve a linear time sampling algorithm even when the matrix has unbounded entries or a large stable rank, provided the above parameters are small. HyperAttention features a modular design that easily accommodates integration of other fast low-level implementations, particularly FlashAttention. Empirically, employing Locality Sensitive Hashing (LSH) to identify large entries, HyperAttention outperforms existing methods, giving significant speed improvements compared to state-of-the-art solutions like FlashAttention. We validate the empirical performance of HyperAttention on a variety of different long-context length datasets. For example, HyperAttention makes the inference time of ChatGLM2 50\% faster on 32k context length while perplexity increases from 5.6 to 6.3. On larger context length, e.g., 131k, with causal masking, HyperAttention offers 5-fold speedup on a single attention layer.
    摘要 我们提出了一种名为 HyperAttention 的近似注意力机制,以应对大语言模型(LLM)中不断增长的长上下文所带来的计算挑战。近期工作表明,在最坏情况下,除非注意力矩阵的元素有界或矩阵具有较低的稳定秩,否则二次时间是必需的。我们引入两个细粒度参数,分别度量:(1)归一化注意力矩阵中的最大列范数,以及(2)在检测并移除大元素之后,未归一化注意力矩阵中行范数的比值,并用它们刻画问题的难度。尽管已有下界结果,只要上述参数较小,即使矩阵元素无界或稳定秩较大,我们仍能给出线性时间的采样算法。HyperAttention 采用模块化设计,可以方便地集成其他快速底层实现,特别是 FlashAttention。实践中,借助局部敏感哈希(LSH)来识别大元素,HyperAttention 的速度超越现有方法,相比 FlashAttention 等最先进方案有显著提升。我们在多种不同长上下文长度的数据集上验证了 HyperAttention 的实际表现:例如,在 32k 上下文长度下,HyperAttention 使 ChatGLM2 的推理速度提升 50%,而困惑度从 5.6 增至 6.3;在更长的上下文(如 131k,带因果掩码)下,单个注意力层可获得 5 倍加速。

Generative quantum machine learning via denoising diffusion probabilistic models

  • paper_url: http://arxiv.org/abs/2310.05866
  • repo_url: None
  • paper_authors: Bingzhi Zhang, Peng Xu, Xiaohui Chen, Quntao Zhuang
  • for: 本文旨在探讨Quantum Denoising Diffusion Probabilistic Models(QuDDPM),它是一种可以有效地学习量子数据的生成学模型。
  • methods: QuDDPM 使用足够深的量子电路层以保证表达能力,并在训练过程中引入多个中间任务,作为目标分布与噪声之间的插值,以避免贫瘠高原(barren plateau)问题并保证训练效率。
  • results: QuDDPM 能够有效学习关联量子噪声模型,并学习非平凡量子数据分布的拓扑结构。
    Abstract Deep generative models are key-enabling technology to computer vision, text generation and large language models. Denoising diffusion probabilistic models (DDPMs) have recently gained much attention due to their ability to generate diverse and high-quality samples in many computer vision tasks, as well as to incorporate flexible model architectures and relatively simple training scheme. Quantum generative models, empowered by entanglement and superposition, have brought new insight to learning classical and quantum data. Inspired by the classical counterpart, we propose the quantum denoising diffusion probabilistic models (QuDDPM) to enable efficiently trainable generative learning of quantum data. QuDDPM adopts sufficient layers of circuits to guarantee expressivity, while introduces multiple intermediate training tasks as interpolation between the target distribution and noise to avoid barren plateau and guarantee efficient training. We demonstrate QuDDPM's capability in learning correlated quantum noise model and learning topological structure of nontrivial distribution of quantum data.
    摘要 深度生成模型是计算机视觉、文本生成和大语言模型的关键使能技术。去噪扩散概率模型(DDPM)因能在众多计算机视觉任务中生成多样且高质量的样本,并兼具灵活的模型架构和相对简单的训练方案而备受关注。量子生成模型借助纠缠与叠加,为学习经典及量子数据带来了新的视角。受经典模型的启发,我们提出量子去噪扩散概率模型(QuDDPM),以实现对量子数据高效可训练的生成学习。QuDDPM 采用足够深的电路层以保证表达能力,同时引入多个中间训练任务,作为目标分布与噪声之间的插值,以避免贫瘠高原并保证训练效率。我们展示了 QuDDPM 在学习关联量子噪声模型以及学习非平凡量子数据分布拓扑结构方面的能力。

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05863
  • repo_url: https://github.com/briansidp/audiovisualllm
  • paper_authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
  • for: 这篇论文旨在扩展文本基础的语言模型,以同时处理音频和视频输入流,以便Language Model(LLM)更好地理解通用视频输入。
  • methods: 该论文提出了一种名为FAVOR的 audio-visual联合表示学习框架,通过把音频和视频输入流与LLM输入空间进行同步,以实现高精度的音频视频联合表示。具体来说,该框架包括一个 causal Q-Former 结构和一个 causal attention模块,以增强音频视频帧之间的 causal 关系的捕捉。
  • results: 在AVEB评估准则下,FAVOR实现了与单modal任务相当的性能,并在视频问答任务上达到了20%以上的性能提升。此外,FAVOR还示出了在其他多modal LLVM中缺乏的视频理解和推理能力。
    Abstract Audio-visual large language models (LLM) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs. To this end, a fine-grained audio-visual joint representation (FAVOR) learning framework for multimodal LLMs is proposed in this paper, which extends a text-based LLM to simultaneously perceive speech and audio events in the audio input stream and images or videos in the visual input stream, at the frame level. To fuse the audio and visual feature streams into joint representations and to align the joint space with the LLM input embedding space, we propose a causal Q-Former structure with a causal attention module to enhance the capture of causal relations of the audio-visual frames across time. An audio-visual evaluation benchmark (AVEB) is also proposed which comprises six representative single-modal tasks with five cross-modal tasks reflecting audio-visual co-reasoning abilities. While achieving competitive single-modal performance on audio, speech and image tasks in AVEB, FAVOR achieved over 20% accuracy improvements on the video question-answering task when fine-grained information or temporal causal reasoning is required. FAVOR, in addition, demonstrated remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other multimodal LLMs. An interactive demo of FAVOR is available at https://github.com/BriansIDP/AudioVisualLLM.git, and the training code and model checkpoints will be released soon.
    摘要 大型语音视频语言模型(LLM)已经吸引了广泛的注意,然而将两个输入流进行细腻的组合仍然是一项挑战,这是LLM理解普通视频输入的必要条件。为此,本文提出了一个 audio-visual 共同表示学习框架(FAVOR),该框架将文本语言模型扩展到同时感知语音和视频流的帧级别上,并将语音和视频特征流 fusion 到共同表示。为了将音频和视频特征流与语言模型输入空间进行对应,我们提出了一种 causal Q-Former 结构,并在其中添加了一个 causal 注意模块,以增强捕捉音频视频帧之间的 causal 关系。我们还提出了一个 audio-visual 评价指标(AVEB),该指标包括6种单Modal任务和5种跨Modal任务,旨在测试音频、语音和图像任务中的单Modal性能,以及音频视频之间的相互理解能力。FAVOR在 AVEB 中 achieved 20% 以上的性能提升,特别是在需要细腻信息或时间 causal 逻辑时。此外,FAVOR 还示出了在其他多Modal LLM 未能达到的视频理解和逻辑能力。FAVOR 的交互 demo 可以在 GitHub 上找到(https://github.com/BriansIDP/AudioVisualLLM.git),训练代码和模型检查点将很快地发布。

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2310.05861
  • repo_url: https://github.com/archiki/repare
  • paper_authors: Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
  • for: 这个研究旨在提高零或几 shot 下的视觉语言任务性能,通过将大型语言模型(LLM)与视觉编码器结合起来,得到大型视觉语言模型(LVLM)。
  • methods: 该研究使用了 gradient-free 框架,名为 RepARe,可以提取图像中核心信息,并通过 LLM 作为描述者和理解者,对原始问题进行修改。
  • results: 研究发现,使用 RepARe 可以提高零或几 shot 下的视觉语言任务性能,在 VQAv2 和 A-OKVQA 两个任务上分别提高了 3.85% 和 6.41%。此外,使用黄金答案作为 oracle 问题候选选择,可以实现更大的提高。
    Abstract An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to a LVLM can have a major impact on zero-shot model performance. In particular, inputs phrased in an underspecified way can result in incorrect answers due to factors like missing visual information, complex implicit reasoning, or linguistic ambiguity. Therefore, adding visually grounded information to the input as a preemptive clarification should improve model performance by reducing underspecification, e.g., by localizing objects and disambiguating references. Similarly, in the VQA setting, changing the way questions are framed can make them easier for models to answer. To this end, we present Rephrase, Augment and Reason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. We then use the LVLM's confidence over a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. Focusing on two visual question answering tasks, we show that RepARe can result in a 3.85% (absolute) increase in zero-shot performance on VQAv2 and a 6.41% point increase on A-OKVQA. Additionally, we find that using gold answers for oracle question candidate selection achieves a substantial gain in VQA accuracy by up to 14.41%. Through extensive analysis, we demonstrate that outputs from RepARe increase syntactic complexity, and effectively utilize vision-language interaction and the frozen language model in LVLMs.
    摘要 越来越多的视觉语言任务可以在几乎不需要训练(即零样本或少样本)的情况下完成:将大型语言模型(LLM)与视觉编码器结合,得到大型视觉语言模型(LVLM)。这样做的好处显而易见,例如无需训练数据或定制架构,但输入以何种方式呈现给 LVLM 会显著影响零样本性能。特别是,表述不充分的输入可能由于缺失视觉信息、复杂的隐式推理或语言歧义等因素而导致错误答案。因此,在输入中预先补充有视觉依据的澄清信息(例如定位物体、消解指代歧义)应能减少表述不充分的问题,从而提升模型性能。类似地,在 VQA 设定下,改变问题的表述方式也能让模型更容易作答。为此,我们提出 Rephrase, Augment and Reason(RepARe),一个无需梯度的框架:它利用底层 LVLM 充当图像描述器和推理器,提取图像中的显著细节,并据此对原始问题提出修改建议;随后以 LVLM 对生成答案的置信度作为无监督评分函数,选出最有可能提升零样本性能的改写问题。在两个视觉问答任务上,RepARe 将 VQAv2 的零样本性能绝对提升 3.85%,将 A-OKVQA 提升 6.41 个百分点。此外,若使用黄金答案进行 oracle 式的问题候选选择,VQA 准确率最高可提升 14.41%。大量分析表明,RepARe 的输出提高了句法复杂度,并有效利用了视觉-语言交互以及 LVLM 中冻结的语言模型。
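
In schematic form, the rephrase-augment-select loop looks like the sketch below: extract salient details with the VLM, have the LLM propose question rewrites, then keep the candidate whose answer the VLM is most confident about. `vlm_caption`, `llm_rephrase`, and `vlm_answer` are hypothetical callables, not the repository's API.

```python
from typing import Callable, List, Tuple

def repare(image, question: str,
           vlm_caption: Callable[[object], str],
           llm_rephrase: Callable[[str, str, int], List[str]],
           vlm_answer: Callable[[object, str], Tuple[str, float]],
           n_candidates: int = 4) -> Tuple[str, str]:
    """Return (chosen_question, answer), scoring rephrased questions by the VLM's answer confidence."""
    details = vlm_caption(image)                                    # salient visual details as text
    candidates = [question] + llm_rephrase(question, details, n_candidates)
    scored = []
    for q in candidates:
        answer, confidence = vlm_answer(image, q)                   # confidence = unsupervised score
        scored.append((confidence, q, answer))
    _, best_q, best_a = max(scored)
    return best_q, best_a

# Toy stand-ins so the sketch runs end to end.
caption = lambda img: "A red kettle sits on a lit gas stove."
rephrase = lambda q, d, n: [f"{q} (the red kettle on the lit stove)"] * n
answer = lambda img, q: ("it is boiling water", 0.9 if "kettle" in q else 0.4)
print(repare(object(), "What is the object on the stove doing?", caption, rephrase, answer))
```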

Improving Summarization with Human Edits

  • paper_url: http://arxiv.org/abs/2310.05857
  • repo_url: None
  • paper_authors: Zonghai Yao, Benjamin J Schloss, Sai P. Selvaraj
  • for: 本文主要研究如何利用人类反馈来提高自然语言摘要模型的质量。
  • methods: 本文提出了一种新技术 Sequence Alignment (un)Likelihood Training(SALT),可以在训练循环中同时利用人类编辑数据和模型生成数据。此外,本文还提出了一种模仿编辑(Imitation edits)技术,利用已有训练数据中的真实摘要来模拟人类编辑数据,以降低获取人类编辑数据的成本。
  • results: 实验结果表明,SALT 能借助人类编辑和模仿编辑提升摘要质量,并且在医疗领域摘要上表现更好。此外,本文还将 SALT 与为人类偏好设计的传统 RLHF 方法(DPO)进行比较,发现在使用人类编辑数据时 SALT 表现更好。
    Abstract Recent work has shown the promise of learning with human feedback paradigms to produce human-determined high-quality text. Existing works use human feedback to train large language models (LLMs) in general domain abstractive summarization and have obtained summary quality exceeding traditional likelihood training. In this paper, we focus on a less explored form of human feedback -- Human Edits. We propose Sequence Alignment (un)Likelihood Training (SALT), a novel technique to use both the human-edited and model-generated data together in the training loop. In addition, we demonstrate simulating Human Edits with ground truth summaries coming from existing training data -- Imitation edits, along with the model-generated summaries obtained after the training, to reduce the need for expensive human-edit data. In our experiments, we extend human feedback exploration from general domain summarization to medical domain summarization. Our results demonstrate the effectiveness of SALT in improving the summary quality with Human and Imitation Edits. Through additional experiments, we show that SALT outperforms the conventional RLHF method (designed for human preferences) -- DPO, when applied to human-edit data. We hope the evidence in our paper prompts researchers to explore, collect, and better use different human feedback approaches scalably.
    摘要 近期研究表明,借助人类反馈范式进行学习,有望生成符合人类判断的高质量文本。现有工作利用人类反馈在通用领域的抽象摘要任务上训练大语言模型(LLM),其摘要质量已超越传统的似然训练。本文关注一种较少被探索的人类反馈形式——人类编辑。我们提出了序列对齐(非)似然训练(SALT),一种在训练循环中同时利用人类编辑数据和模型生成数据的新技术。此外,我们展示了如何用已有训练数据中的真实摘要与训练后模型生成的摘要来模拟人类编辑(即模仿编辑),以减少对昂贵人类编辑数据的需求。在实验中,我们将人类反馈的探索从通用领域摘要扩展到医疗领域摘要。结果证明了 SALT 在利用人类编辑和模仿编辑提升摘要质量方面的有效性。通过进一步实验,我们还表明在使用人类编辑数据时,SALT 优于为人类偏好设计的传统 RLHF 方法 DPO。我们希望本文的证据能促使研究者以可扩展的方式探索、收集并更好地利用不同形式的人类反馈。
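
A token-level illustration of the SALT idea: align the model-generated summary with the human-edited one, then apply likelihood training on tokens the editor kept or inserted and unlikelihood on tokens the editor removed. The alignment via difflib and the unweighted loss below are simplified assumptions, not the paper's exact formulation.

```python
import difflib
import torch
import torch.nn.functional as F

def salt_targets(model_tokens, edited_tokens):
    """Align model output with the human edit; return (token, is_liked) pairs.

    Kept/inserted tokens get likelihood training, deleted tokens get unlikelihood."""
    sm = difflib.SequenceMatcher(a=model_tokens, b=edited_tokens)
    pairs = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op in ("equal", "insert", "replace"):
            pairs += [(tok, True) for tok in edited_tokens[j1:j2]]
        if op in ("delete", "replace"):
            pairs += [(tok, False) for tok in model_tokens[i1:i2]]
    return pairs

def salt_loss(logits, pairs, vocab):
    """logits: (len(pairs), vocab_size) next-token distributions at the aligned positions."""
    log_probs = F.log_softmax(logits, dim=-1)
    loss = 0.0
    for pos, (tok, liked) in enumerate(pairs):
        p = log_probs[pos, vocab[tok]]
        loss += -p if liked else -torch.log1p(-p.exp() + 1e-8)   # unlikelihood for removed tokens
    return loss / len(pairs)

vocab = {w: i for i, w in enumerate("patient reports severe mild chest pain today <unk>".split())}
model_out = "patient reports severe chest pain".split()
human_edit = "patient reports mild chest pain".split()
pairs = salt_targets(model_out, human_edit)
logits = torch.randn(len(pairs), len(vocab), requires_grad=True)
print(pairs)
print(salt_loss(logits, pairs, vocab))
```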

GraphLLM: Boosting Graph Reasoning Ability of Large Language Model

  • paper_url: http://arxiv.org/abs/2310.05845
  • repo_url: https://github.com/mistyreed63849/graph-llm
  • paper_authors: Ziwei Chai, Tianjie Zhang, Liang Wu, Kaiqiao Han, Xiaohai Hu, Xuanwen Huang, Yang Yang
  • for: 这篇论文旨在解决大语言模型(LLM)在图数据理解和推理方面的瓶颈问题。
  • methods: 该论文提出了一种结合图学学习模型和LLM的综合方法,称为GraphLLM,以提高LLM在图数据理解和推理方面的能力。
  • results: 实验结果表明,GraphLLM可以提高图数据理解和推理的准确率,并将上下文量减少96.45%。
    Abstract The advancement of Large Language Models (LLMs) has remarkably pushed the boundaries towards artificial general intelligence (AGI), with their exceptional ability on understanding diverse types of information, including but not limited to images and audio. Despite this progress, a critical gap remains in empowering LLMs to proficiently understand and reason on graph data. Recent studies underscore LLMs' underwhelming performance on fundamental graph reasoning tasks. In this paper, we endeavor to unearth the obstacles that impede LLMs in graph reasoning, pinpointing the common practice of converting graphs into natural language descriptions (Graph2Text) as a fundamental bottleneck. To overcome this impediment, we introduce GraphLLM, a pioneering end-to-end approach that synergistically integrates graph learning models with LLMs. This synergy equips LLMs with the ability to proficiently interpret and reason on graph data, harnessing the superior expressive power of graph learning models. Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM. The results exhibit a substantial average accuracy enhancement of 54.44%, alongside a noteworthy context reduction of 96.45% across various graph reasoning tasks.
    摘要 大语言模型(LLM)的发展有力地推动了通向人工通用智能(AGI)的进程,LLM 能出色地理解包括图像和音频在内的多种信息。然而,在让 LLM 熟练理解并推理图数据方面仍存在关键空白:近期研究表明,LLM 在基本的图推理任务上表现不佳。本文试图找出阻碍 LLM 进行图推理的因素,并指出将图转换为自然语言描述(Graph2Text)这一常见做法是根本瓶颈。为克服这一障碍,我们提出 GraphLLM,一种端到端方法,将图学习模型与 LLM 协同整合,使 LLM 能够借助图学习模型更强的表达能力,有效地理解和推理图数据。我们在四个基本的图推理任务上进行了实证评估,结果显示 GraphLLM 的平均准确率提升 54.44%,同时在各类图推理任务上将上下文长度减少 96.45%。

Predicting Accident Severity: An Analysis Of Factors Affecting Accident Severity Using Random Forest Model

  • paper_url: http://arxiv.org/abs/2310.05840
  • repo_url: None
  • paper_authors: Adekunle Adefabi, Somtobe Olisah, Callistus Obunadike, Oluwatosin Oyetubo, Esther Taiwo, Edward Tella
  • for: 预测交通事故严重程度,以采取措施降低交通事故的发生频率。
  • methods: 使用Random Forest机器学习算法,对大都会区交通事故记录数据进行训练,并对模型进行优化。
  • results: Random Forest模型的准确率高于80%,其中最重要的6个变量为风速、气压、湿度、视力、清晰天气和云层覆盖。
    Abstract Road accidents have significant economic and societal costs, with a small number of severe accidents accounting for a large portion of these costs. Predicting accident severity can help in the proactive approach to road safety by identifying potential unsafe road conditions and taking well-informed actions to reduce the number of severe accidents. This study investigates the effectiveness of the Random Forest machine learning algorithm for predicting the severity of an accident. The model is trained on a dataset of accident records from a large metropolitan area and evaluated using various metrics. Hyperparameters and feature selection are optimized to improve the model's performance. The results show that the Random Forest model is an effective tool for predicting accident severity with an accuracy of over 80%. The study also identifies the top six most important variables in the model, which include wind speed, pressure, humidity, visibility, clear conditions, and cloud cover. The fitted model has an Area Under the Curve of 80%, a recall of 79.2%, a precision of 97.1%, and an F1 score of 87.3%. These results suggest that the proposed model has higher performance in explaining the target variable, which is the accident severity class. Overall, the study provides evidence that the Random Forest model is a viable and reliable tool for predicting accident severity and can be used to help reduce the number of fatalities and injuries due to road accidents in the United States
    摘要 道路交通事故带来巨大的经济和社会成本,其中少数严重事故占据了这些成本的很大比例。预测事故严重程度有助于以主动的方式保障道路安全:识别潜在的不安全道路状况,并在充分知情的前提下采取措施,减少严重事故的数量。本研究考察了随机森林机器学习算法预测事故严重程度的有效性。模型在一个大都市区的事故记录数据集上训练,并使用多种指标进行评估,同时对超参数和特征选择进行了优化以提升性能。结果显示,随机森林模型能够有效预测事故严重程度,准确率超过 80%。研究还确定了模型中最重要的六个变量:风速、气压、湿度、能见度、晴朗天气和云量。拟合模型的曲线下面积(AUC)为 80%,召回率为 79.2%,精确率为 97.1%,F1 分数为 87.3%,表明所提模型对目标变量(事故严重程度类别)有较强的解释能力。总体而言,本研究证明随机森林模型是预测事故严重程度的可行且可靠的工具,可用于帮助减少美国因道路交通事故造成的伤亡。
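
The workflow maps directly onto scikit-learn. The sketch below uses synthetic data with the six reported top features; the column names follow the summary above, and the fabricated label rule is purely for illustration, not the study's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "wind_speed_mph": rng.gamma(2.0, 5.0, n),
    "pressure_in": rng.normal(29.9, 0.3, n),
    "humidity_pct": rng.uniform(10, 100, n),
    "visibility_mi": rng.uniform(0.5, 10, n),
    "clear_conditions": rng.integers(0, 2, n),
    "cloud_cover": rng.integers(0, 2, n),
})
# Fabricated label: worse visibility and higher wind push accidents toward "severe".
logit = 0.3 * df["wind_speed_mph"] - 0.8 * df["visibility_mi"] + rng.normal(0, 2, n)
y = (logit > np.median(logit)).astype(int)          # 1 = severe, 0 = not severe

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.2, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=300, max_depth=12, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

print(classification_report(y_te, clf.predict(X_te), digits=3))
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
print(pd.Series(clf.feature_importances_, index=df.columns).sort_values(ascending=False))
```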

Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2310.05804
  • repo_url: https://github.com/Haoyu-ha/ALMT
  • paper_authors: Haoyu Zhang, Yu Wang, Guanghao Yin, Kejun Liu, Yuanyuan Liu, Tianshu Yu
  • for: 提高多模态情感分析(MSA)的性能,增强模态之间的协调和相互启发。
  • methods: 采用适应语言引导多模态变换(ALMT),包括自适应超模态学习(AHL)模块,从视频和音频特征中学习干扰和冲突抑制表示。
  • results: 在多个常用数据集(如 MOSI、MOSEI 和 CH-SIMS)上取得最先进的表现,并通过多项消融实验证明了所提无关/冲突抑制机制的有效性和必要性。
    Abstract Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), the potential sentiment-irrelevant and conflicting information across modalities may hinder the performance from being further improved. To alleviate this, we present Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptive Hyper-modality Learning (AHL) module to learn an irrelevance/conflict-suppressing representation from visual and audio features under the guidance of language features at different scales. With the obtained hyper-modality representation, the model can obtain a complementary and joint representation through multimodal fusion for effective MSA. In practice, ALMT achieves state-of-the-art performance on several popular datasets (e.g., MOSI, MOSEI and CH-SIMS) and an abundance of ablation demonstrates the validity and necessity of our irrelevance/conflict suppression mechanism.
    摘要 这篇文章探讨了多modal sentiment分析(MSA)的问题,MSA可以利用多种资料源(如语言、视频和音频)来分析情感,但是可能会存在不相关或冲突的资料,这可能会妨碍MSA的表现。为了解决这个问题,我们提出了适应语言导向多modal transformer(ALMT),它包括一个适应多模式学习(AHL)模组,可以从视觉和音频特征中学习一个不相关或冲突的表现。这个表现可以与语言特征进行联合表现,从而实现有效的MSA。在实践中,ALMT在多个流行的数据集(如MOSI、MOSEI和CH-SIMS)上 achieve state-of-the-art 表现,并且进行了丰富的ablation 测试,以验证我们的不相关或冲突抑制机制的有效性和必要性。

Are Large Language Models Post Hoc Explainers?

  • paper_url: http://arxiv.org/abs/2310.05797
  • repo_url: https://github.com/AI4LIFE-GROUP/LLM_Explainer
  • paper_authors: Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
  • for: This paper aims to study the effectiveness of large language models (LLMs) in explaining other predictive models.
  • methods: The paper proposes a novel framework that utilizes multiple prompting strategies, including perturbation-based ICL, prediction-based ICL, instruction-based ICL, and explanation-based ICL, to generate explanations for other models.
  • results: The paper demonstrates that LLM-generated explanations perform on par with state-of-the-art post hoc explainers, with an average accuracy of 72.19% in identifying the most important feature.
    Abstract Large Language Models (LLMs) are increasingly used as powerful tools for a plethora of natural language processing (NLP) applications. A recent innovation, in-context learning (ICL), enables LLMs to learn new tasks by supplying a few examples in the prompt during inference time, thereby eliminating the need for model fine-tuning. While LLMs have been utilized in several applications, their applicability in explaining the behavior of other models remains relatively unexplored. Despite the growing number of new explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting a need for next-generation post hoc explainers. In this work, we present the first framework to study the effectiveness of LLMs in explaining other predictive models. More specifically, we propose a novel framework encompassing multiple prompting strategies: i) Perturbation-based ICL, ii) Prediction-based ICL, iii) Instruction-based ICL, and iv) Explanation-based ICL, with varying levels of information about the underlying ML model and the local neighborhood of the test sample. We conduct extensive experiments with real-world benchmark datasets to demonstrate that LLM-generated explanations perform on par with state-of-the-art post hoc explainers using their ability to leverage ICL examples and their internal knowledge in generating model explanations. On average, across four datasets and two ML models, we observe that LLMs identify the most important feature with 72.19% accuracy, opening up new frontiers in explainable artificial intelligence (XAI) to explore LLM-based explanation frameworks.
    摘要 大语言模型(LLM)正日益成为众多自然语言处理(NLP)应用的有力工具。上下文学习(ICL)这一新进展使 LLM 能在推理时仅凭提示中的少量示例学习新任务,从而免去模型微调。尽管 LLM 已被用于多种应用,其用于解释其他模型行为的可行性仍相对缺乏探索;而许多现有解释技术需要白盒访问模型且计算开销大,亟需新一代事后解释器。本文提出了首个研究 LLM 解释其他预测模型有效性的框架,涵盖四种提示策略——基于扰动的 ICL、基于预测的 ICL、基于指令的 ICL 和基于解释的 ICL,它们对底层机器学习模型和测试样本局部邻域的信息量各不相同。我们在真实基准数据集上进行了大量实验,结果表明 LLM 生成的解释与最先进的事后解释器相当:在四个数据集和两个机器学习模型上,LLM 识别最重要特征的平均准确率为 72.19%,为基于 LLM 的解释框架开辟了可解释人工智能(XAI)的新方向。
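
As one concrete instance, a perturbation-based ICL prompt might be assembled as below: perturb one test point repeatedly, record the black-box model's predictions, and ask the LLM which feature mattered most. The prompt wording and perturbation scale are illustrative assumptions, and the final LLM call is left as a placeholder for any chat-completion client.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_perturbation_prompt(model, x, feature_names, n_perturbations=8, scale=0.5, seed=0):
    """Assemble an ICL prompt of (perturbed input -> model prediction) pairs plus a question."""
    rng = np.random.default_rng(seed)
    lines = ["Each line shows feature values and the black-box model's prediction."]
    for _ in range(n_perturbations):
        x_p = x + rng.normal(0.0, scale, size=x.shape)
        pred = int(model.predict(x_p.reshape(1, -1))[0])
        vals = ", ".join(f"{n}={v:.2f}" for n, v in zip(feature_names, x_p))
        lines.append(f"{vals} -> prediction {pred}")
    lines.append("Question: which single feature most influences the prediction? "
                 "Answer with the feature name only.")
    return "\n".join(lines)

# Train a small black-box model on synthetic data where feature 'a' dominates.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = (3.0 * X[:, 0] + 0.2 * X[:, 1] > 0).astype(int)
blackbox = LogisticRegression().fit(X, y)

prompt = build_perturbation_prompt(blackbox, X[0], ["a", "b", "c"])
print(prompt)
# ask_llm(prompt) would go here; any chat-completion client can be dropped in.
```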

Rethinking Memory and Communication Cost for Efficient Large Language Model Training

  • paper_url: http://arxiv.org/abs/2310.06003
  • repo_url: None
  • paper_authors: Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, Zhaoxin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang, Jun Zhou
  • for: 本研究旨在提出一种能够均衡内存消耗和通信成本的大语言模型训练策略集Partial Redundancy Optimizer (PaRO),以提高训练效率。
  • methods: 本研究使用了细化的分割策略和 Hierarchical Overlapping Ring (HO-Ring) 通信拓扑,以减少内存重复和通信成本,提高训练效率。
  • results: 实验表明,PaRO 可以提高训练速度,相比 SOTA 方法的 1.19x-2.50x,并实现近线性扩展性。 HO-Ring 算法可以提高通信效率,相比传统的 Ring 算法的 36.5%。
    Abstract Recently, various distributed strategies for large language model training have been proposed. However, these methods provided limited solutions for the trade-off between memory consumption and communication cost. In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose a memory-communication balanced strategy set Partial Redundancy Optimizer (PaRO). PaRO provides comprehensive options which reduces the amount and frequency of inter-group communication with minor memory redundancy by fine-grained sharding strategy, thereby improving the training efficiency in various training scenarios. Additionally, we propose a Hierarchical Overlapping Ring (HO-Ring) communication topology to enhance communication efficiency between nodes or across switches in large language model training. Our experiments demonstrate that PaRO significantly improves training throughput by 1.19x-2.50x compared to the SOTA method and achieves a near-linear scalability. The HO-Ring algorithm improves communication efficiency by 36.5% compared to the traditional Ring algorithm.
    摘要 近期,针对大语言模型训练提出了多种分布式策略,但这些方法在内存消耗与通信开销之间的权衡上仅给出了有限的解决方案。本文重新审视内存消耗与通信开销对大语言模型训练速度的影响,提出了一组内存-通信均衡的策略,称为部分冗余优化器(PaRO)。PaRO 通过细粒度的切分策略,以少量内存冗余为代价,减少组间通信的数据量和频率,从而在多种训练场景下提升训练效率。此外,我们提出了分层重叠环(HO-Ring)通信拓扑,以提升大语言模型训练中节点之间或跨交换机的通信效率。实验表明,与最先进方法相比,PaRO 将训练吞吐量提升 1.19 倍至 2.50 倍,并实现近线性的可扩展性;HO-Ring 算法相比传统环形算法将通信效率提升 36.5%。

DANet: Enhancing Small Object Detection through an Efficient Deformable Attention Network

  • paper_url: http://arxiv.org/abs/2310.05768
  • repo_url: None
  • paper_authors: Md Sohag Mia, Abdullah Al Bary Voban, Abu Bakor Hayat Arnob, Abdu Naim, Md Kawsar Ahmed, Md Shariful Islam
  • for: Aims to improve the efficiency and accuracy of small object detection (e.g., defects and cracks) in manufacturing settings, for product quality and safety.
  • methods: Combines Faster R-CNN with a Feature Pyramid Network and a Deformable Net, adds a Convolutional Block Attention Module to each block of the base ResNet50, replaces RoI Pooling with RoI Align, and uses Focal Loss for class imbalance.
  • results: Evaluation on the NEU-DET and Pascal VOC datasets demonstrates robust performance and generalization, particularly in identifying various steel defects.
    Abstract Efficient and accurate detection of small objects in manufacturing settings, such as defects and cracks, is crucial for ensuring product quality and safety. To address this issue, we proposed a comprehensive strategy by synergizing Faster R-CNN with cutting-edge methods. By combining Faster R-CNN with Feature Pyramid Network, we enable the model to efficiently handle multi-scale features intrinsic to manufacturing environments. Additionally, Deformable Net is used that contorts and conforms to the geometric variations of defects, bringing precision in detecting even the minuscule and complex features. Then, we incorporated an attention mechanism called Convolutional Block Attention Module in each block of our base ResNet50 network to selectively emphasize informative features and suppress less useful ones. After that we incorporated RoI Align, replacing RoI Pooling for finer region-of-interest alignment and finally the integration of Focal Loss effectively handles class imbalance, crucial for rare defect occurrences. The rigorous evaluation of our model on both the NEU-DET and Pascal VOC datasets underscores its robust performance and generalization capabilities. On the NEU-DET dataset, our model exhibited a profound understanding of steel defects, achieving state-of-the-art accuracy in identifying various defects. Simultaneously, when evaluated on the Pascal VOC dataset, our model showcases its ability to detect objects across a wide spectrum of categories within complex and small scenes.
    摘要 efficient和准确的小对象检测在制造环境中是至关重要的,以确保产品质量和安全。为解决这个问题,我们提出了一项涵合策略,将Faster R-CNN与前沿技术相结合。通过将Faster R-CNN与Feature Pyramid Network结合使用,我们让模型能够有效地处理制造环境中的多尺度特征。此外,我们还使用了Deformable Net,它可以根据缺陷的几何变化进行扭形和适应,提高缺陷检测的精度。接着,我们在每个基本ResNet50网络块中添加了Convolutional Block Attention Module,以选择特征中的有用信息,并抑制无用的信息。然后,我们将RoI Align取代RoI Pooling,以实现更细的区域对齐。最后,我们通过集成Focal Loss来有效地处理类偏好,这是对罕见缺陷的检测中非常重要。我们在NEU-DET和Pascal VOC数据集上进行了严格的评估,并证明了我们的模型在不同的环境下具有出色的稳定性和泛化能力。在NEU-DET数据集上,我们的模型对钢铁缺陷进行了深刻的理解,实现了不同缺陷的状况下的最高精度。同时,当我们的模型在Pascal VOC数据集上进行评估时,它展示了对多种类别的对象检测的能力,并在复杂和小的场景中具有出色的检测能力。
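Among the components the abstract stacks together, the focal loss term follows the standard formulation FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), which down-weights easy examples so rare defects dominate the gradient. A minimal PyTorch sketch (the hyperparameter values are common defaults, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss on raw logits; targets are 0/1 tensors of the same shape."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balance weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```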

Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design

  • paper_url: http://arxiv.org/abs/2310.05764
  • repo_url: https://github.com/hannesstark/flowsite
  • paper_authors: Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola
  • for: Designing protein binding pockets for small molecules.
  • methods: HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on a self-conditioned flow matching objective; FlowSite extends it to jointly generate a pocket's discrete residue types and the molecule's bound 3D structure.
  • results: HarmonicFlow surpasses state-of-the-art generative docking processes in simplicity, generality, and performance, and FlowSite designs binding sites substantially better than baselines, providing the first general solution for binding site design.
    Abstract A significant amount of protein function requires binding small molecules, including enzymatic catalysis. As such, designing binding pockets for small molecules has several impactful applications ranging from drug synthesis to energy storage. Towards this goal, we first develop HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on our self-conditioned flow matching objective. FlowSite extends this flow model to jointly generate a protein pocket's discrete residue types and the molecule's binding 3D structure. We show that HarmonicFlow improves upon the state-of-the-art generative processes for docking in simplicity, generality, and performance. Enabled by this structure modeling, FlowSite designs binding sites substantially better than baseline approaches and provides the first general solution for binding site design.
    摘要 一些蛋白质功能需要与小分子结合,包括enzymatic catalysis。因此,设计小分子结合 pocket 有很多有效的应用,从药物合成到能量储存。为达到这个目标,我们首先开发了 HarmonicFlow,一种改进的生成过程,用于生成3D蛋白质-小分子结合结构。FlowSite 扩展了这种流模型,以同时生成蛋白质口袋中的粒子类型和分子的结合3D结构。我们表明,HarmonicFlow 在简洁性、通用性和性能方面都超越了状态元的生成过程。通过这种结构模型,FlowSite 可以设计结合站点得到substantially better than基线方法,并提供了第一个通用的结合站点设计解决方案。
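For orientation, the sketch below shows a generic conditional flow matching training step: sample a prior point, interpolate toward the data point, and regress a network onto the path's velocity. The harmonic prior, self-conditioning, and equivariant architecture that HarmonicFlow actually uses are omitted; `velocity_net` and its signature are assumptions.

```python
import torch

def flow_matching_loss(velocity_net, x1, t_eps=1e-3):
    """One generic conditional flow matching step.

    x1: (batch, d) data points, e.g. flattened ligand coordinates.
    velocity_net(x_t, t) -> predicted velocity field of the same shape as x_t.
    """
    x0 = torch.randn_like(x1)                                   # simple Gaussian prior
    t = torch.rand(x1.shape[0], 1, device=x1.device) * (1 - t_eps) + t_eps
    x_t = (1 - t) * x0 + t * x1                                 # linear interpolation path
    target_v = x1 - x0                                          # velocity of that path
    pred_v = velocity_net(x_t, t)
    return ((pred_v - target_v) ** 2).mean()
```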

Large-Scale OD Matrix Estimation with A Deep Learning Method

  • paper_url: http://arxiv.org/abs/2310.05753
  • repo_url: None
  • paper_authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
  • for: Data-driven, real-time estimation of origin-destination (OD) matrices for Intelligent Transport Systems.
  • methods: Combines deep learning with numerical optimization: a neural network infers structural constraints of the OD matrix from probe traffic flows and uses them to guide the numerical optimization.
  • results: Provides a reliable real-time OD estimation method that does not depend on a prior matrix and is economical in engineering, with good generalization on a large-scale synthetic dataset and stable behavior on real traffic data.
    Abstract The estimation of origin-destination (OD) matrices is a crucial aspect of Intelligent Transport Systems (ITS). It involves adjusting an initial OD matrix by regressing the current observations like traffic counts of road sections (e.g., using least squares). However, the OD estimation problem lacks sufficient constraints and is mathematically underdetermined. To alleviate this problem, some researchers incorporate a prior OD matrix as a target in the regression to provide more structural constraints. However, this approach is highly dependent on the existing prior matrix, which may be outdated. Others add structural constraints through sensor data, such as vehicle trajectory and speed, which can reflect more current structural constraints in real-time. Our proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization. This approach combines the advantages of both deep learning and numerical optimization algorithms. The neural network(NN) learns to infer structural constraints from probe traffic flows, eliminating dependence on prior information and providing real-time performance. Additionally, due to the generalization capability of NN, this method is economical in engineering. We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset. Subsequently, we verified the stability of our method on real traffic data. Our experiments provided confirmation of the benefits of combining NN and numerical optimization.
    摘要 “OD矩阵估计是智能交通系统(ITS)中的一个重要问题。它需要对初始OD矩阵进行调整,使用最小二乘法 regression 来适应现有的观测数据(例如交通流量资料)。但是,OD估计问题缺乏足够的条件,从数学上是不充分决定的。为了解决这个问题,一些研究人员将 Target OD 矩阵添加到 regression 中,以提供更多的构造约束。但是,这种方法对于现有的 Target OD 矩阵依赖度太高,可能会受到旧有的矩阵影响。另一些研究人员通过感应器数据,如车辆轨迹和速度,添加更多的构造约束。我们的提案方法是通过深度学习和数值优化算法来推导矩阵结构,并将其与数值优化算法结合。这种方法结合了深度学习的优点和数值优化算法的稳定性。对于大规模的 sintetic 数据集,我们的方法具有良好的泛化性。进一步的,我们对真实交通数据进行验证,证明了我们的方法的稳定性和可靠性。我们的实验结果显示,结合深度学习和数值优化算法可以提供更好的性能和经济性。”
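The classical baseline the paper starts from is a prior-regularized least-squares fit of OD flows to link counts; the underdetermined system is anchored by a target OD matrix. A small NumPy sketch of that baseline is given below (the paper's contribution, learning the structural constraints with a neural network instead of a prior, is not shown; all symbols are illustrative).

```python
import numpy as np

def estimate_od(A, counts, od_prior, lam=1.0):
    """Solve  min_x ||A x - counts||^2 + lam * ||x - od_prior||^2.

    A: (links, od_pairs) assignment matrix mapping OD flows to link counts.
    """
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    y_aug = np.concatenate([counts, np.sqrt(lam) * od_prior])
    x, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return np.clip(x, 0, None)   # OD flows are non-negative
```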

A Review of the Ethics of Artificial Intelligence and its Applications in the United States

  • paper_url: http://arxiv.org/abs/2310.05751
  • repo_url: None
  • paper_authors: Esther Taiwo, Ahmed Akinsola, Edward Tella, Kolade Makinde, Mayowa Akinwande
  • for: The paper examines the ethical considerations of Artificial Intelligence (AI) in the United States, its impact on various sectors and entities, and the need for responsible and ethical AI practices.
  • methods: The paper explores eleven fundamental ethical principles, structured as overarching themes (Transparency; Justice, Fairness, and Equity; Non-Maleficence; Responsibility and Accountability; Privacy; Beneficence; Freedom and Autonomy; Trust; Dignity; Sustainability; and Solidarity), as a guiding framework for ethical AI development and deployment.
  • results: The paper discusses the revolutionary impact of AI applications such as Machine Learning and explores various approaches used to implement AI ethics, addressing the growing concerns surrounding the inherent risks of widespread AI use.
    Abstract This study is focused on the ethics of Artificial Intelligence and its application in the United States, the paper highlights the impact AI has in every sector of the US economy and multiple facets of the technological space and the resultant effect on entities spanning businesses, government, academia, and civil society. There is a need for ethical considerations as these entities are beginning to depend on AI for delivering various crucial tasks, which immensely influence their operations, decision-making, and interactions with each other. The adoption of ethical principles, guidelines, and standards of work is therefore required throughout the entire process of AI development, deployment, and usage to ensure responsible and ethical AI practices. Our discussion explores eleven fundamental 'ethical principles' structured as overarching themes. These encompass Transparency, Justice, Fairness, Equity, Non- Maleficence, Responsibility, Accountability, Privacy, Beneficence, Freedom, Autonomy, Trust, Dignity, Sustainability, and Solidarity. These principles collectively serve as a guiding framework, directing the ethical path for the responsible development, deployment, and utilization of artificial intelligence (AI) technologies across diverse sectors and entities within the United States. The paper also discusses the revolutionary impact of AI applications, such as Machine Learning, and explores various approaches used to implement AI ethics. This examination is crucial to address the growing concerns surrounding the inherent risks associated with the widespread use of artificial intelligence.

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

  • paper_url: http://arxiv.org/abs/2310.05746
  • repo_url: https://github.com/jiangjiechen/auction-arena
  • paper_authors: Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson
  • for: evaluating the ability of Large Language Models (LLMs) to simulate human behavior in complex environments, specifically in auctions.
  • methods: using a novel simulation environment called AucArena to test the ability of state-of-the-art LLMs as bidding agents in controlled simulations.
  • results: LLMs demonstrate advanced reasoning skills and ability to manage budget, adhere to long-term goals and priorities, but with considerable variability in capabilities and occasional surpassing by heuristic baselines and human agents, highlighting the potential for further improvements in agent design and the importance of simulation environments for testing and refining agent architectures.
    Abstract Can Large Language Models (LLMs) simulate human behavior in complex environments? LLMs have recently been shown to exhibit advanced reasoning skills but much of NLP evaluation still relies on static benchmarks. Answering this requires evaluation environments that probe strategic reasoning in competitive, dynamic scenarios that involve long-term planning. We introduce AucArena, a novel simulation environment for evaluating LLMs within auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. We conduct several controlled simulations using state-of-the-art LLMs as bidding agents. We find that through simple prompting, LLMs do indeed demonstrate many of the skills needed for effectively engaging in auctions (e.g., managing budget, adhering to long-term goals and priorities), skills that we find can be sharpened by explicitly encouraging models to be adaptive and observe strategies in past auctions. These results are significant as they show the potential of using LLM agents to model intricate social dynamics, especially in competitive settings. However, we also observe considerable variability in the capabilities of individual LLMs. Notably, even our most advanced models (GPT-4) are occasionally surpassed by heuristic baselines and human agents, highlighting the potential for further improvements in the design of LLM agents and the important role that our simulation environment can play in further testing and refining agent architectures.
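To make the evaluation setup concrete, the toy loop below runs an ascending auction over bidder agents that expose a `policy` callable; in AucArena that role would be played by prompted LLMs tracking budget and goals. The auction rules, names, and the heuristic policy are illustrative assumptions, not the benchmark's actual protocol.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Bidder:
    name: str
    budget: float
    # policy(item, current_price, budget) -> new bid, or None to pass.
    policy: Callable[[str, float, float], Optional[float]]

def english_auction(item, bidders, start_price=10.0, increment=5.0):
    """Toy ascending auction: the highest standing bid wins once nobody raises."""
    price, leader, raised = start_price, None, True
    while raised:
        raised = False
        for b in bidders:
            if b is leader:
                continue
            bid = b.policy(item, price, b.budget)
            if bid is not None and price + increment <= bid <= b.budget:
                price, leader, raised = bid, b, True
    return leader, price

# A heuristic baseline standing in for an LLM agent: raise until 80% of budget.
cautious = lambda item, price, budget: price + 5.0 if price + 5.0 <= 0.8 * budget else None
winner, paid = english_auction("painting", [Bidder("A", 100, cautious), Bidder("B", 60, cautious)])
```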

Language Model Beats Diffusion – Tokenizer is Key to Visual Generation

  • paper_url: http://arxiv.org/abs/2310.05737
  • repo_url: https://github.com/kyegomez/MAGVIT2
  • paper_authors: Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
  • for: Shows that with the right visual tokenizer, language models (LLMs) can outperform diffusion models on image and video generation.
  • methods: Proposes MAGVIT-v2, a video tokenizer that maps both images and videos to concise, expressive discrete tokens over a common vocabulary suitable for LLM training.
  • results: Equipped with this tokenizer, LLMs surpass diffusion models on standard image and video generation benchmarks, and the tokenizer also excels at video compression and at learning representations for action recognition.
    Abstract While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this paper, we introduce MAGVIT-v2, a video tokenizer designed to generate concise and expressive tokens for both videos and images using a common token vocabulary. Equipped with this new tokenizer, we show that LLMs outperform diffusion models on standard image and video generation benchmarks including ImageNet and Kinetics. In addition, we demonstrate that our tokenizer surpasses the previously top-performing video tokenizer on two more tasks: (1) video compression comparable to the next-generation video codec (VCC) according to human evaluations, and (2) learning effective representations for action recognition tasks.
    摘要 LLMs 是语言生成任务中的主导模型,但它们在图像和视频生成任务中不如扩散模型表现为好。为了有效地使用 LLMs 进行视觉生成,一个关键组件是视觉 токен化器,它将 pixel-space 输入映射到适合 LLM 学习的精炼的 tokens。在这篇论文中,我们介绍了 MAGVIT-v2,一种用于生成简洁和表达力强的 видео和图像 tokens 的视觉 токен化器。我们使用这个新的 токен化器,我们展示了 LLMs 在标准的图像和视频生成 benchmark 上表现出色,并且在两个额外任务上表现出色:(1)与下一代视频编码器(VCC)相当的视频压缩,以及(2)学习有效的动作认知任务。
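MAGVIT-v2 is built around lookup-free quantization (LFQ), which replaces a learned codebook lookup by binarizing each latent channel and reading the token id off the sign bits. A minimal sketch of that quantization step is shown below, with a straight-through estimator for gradients; the encoder/decoder and the entropy regularizers the paper uses are omitted.

```python
import torch

def lookup_free_quantize(z):
    """Quantize each latent channel of z (..., d) to ±1 and derive integer token ids.

    Returns (quantized latents with a straight-through gradient, token ids in [0, 2**d)).
    """
    q = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))
    q_st = z + (q - z).detach()                          # straight-through estimator
    bits = (q > 0).long()
    powers = 2 ** torch.arange(z.shape[-1], device=z.device)
    token_ids = (bits * powers).sum(dim=-1)
    return q_st, token_ids
```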

The Program Testing Ability of Large Language Models for Code

  • paper_url: http://arxiv.org/abs/2310.05727
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Weimin Xiong, Yiwen Guo, Hao Chen
  • for: Explores the ability of large language models (LLMs) for code to test programs.
  • methods: Performs a thorough analysis of recent code LLMs in program testing, evaluated on benchmarks such as HumanEval and MBPP.
  • results: Reveals a series of intriguing properties of these models and, by leveraging generated test cases, improves the quality of synthesized programs and their pass rates.
    Abstract Recent development of large language models (LLMs) for code like CodeX and CodeT5+ demonstrates tremendous promise in achieving code intelligence. Their ability of synthesizing code that completes a program for performing a pre-defined task has been intensively tested and verified on benchmark datasets including HumanEval and MBPP. Yet, evaluation of these LLMs from more perspectives (than just program synthesis) is also anticipated, considering their broad scope of applications in software engineering. In this paper, we explore the ability of LLMs for testing programs/code. By performing thorough analyses of recent LLMs for code in program testing, we show a series of intriguing properties of these models and demonstrate how program testing ability of LLMs can be improved. Following recent work which utilizes generated test cases to enhance program synthesis, we further leverage our findings in improving the quality of the synthesized programs and show +11.77% and +4.22% higher code pass rates on HumanEval+ comparing with the GPT-3.5-turbo baseline and the recent state-of-the-art, respectively.
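The "use generated tests to improve synthesis" idea amounts to executing each candidate program against model-generated tests and preferring the candidates that pass the most. A simplified harness in that spirit (not the paper's pipeline; `exec` on untrusted code should be sandboxed in practice):

```python
def passes(program_src: str, test_src: str) -> bool:
    """Run one candidate program against one generated test; True if no exception."""
    env = {}
    try:
        exec(program_src, env)      # defines the candidate function(s)
        exec(test_src, env)         # e.g. "assert add(2, 3) == 5"
        return True
    except Exception:
        return False

def rank_candidates(candidates, tests):
    """Order synthesized programs by how many model-generated tests they pass."""
    scored = [(sum(passes(c, t) for t in tests), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda s: -s[0])]

candidates = ["def add(a, b):\n    return a + b", "def add(a, b):\n    return a - b"]
tests = ["assert add(2, 3) == 5", "assert add(0, 0) == 0"]
best = rank_candidates(candidates, tests)[0]
```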

STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines

  • paper_url: http://arxiv.org/abs/2310.05717
  • repo_url: None
  • paper_authors: Yuxuan Kuang, Qin Han, Danshi Li, Qiyu Dai, Lian Ding, Dong Sun, Hanlin Zhao, He Wang
  • for: Proposes a framework for 6-DoF suction grasp detection on production lines, with a focus on transparent objects, an important and challenging problem for robotic systems and modern industry.
  • methods: A novel multiview-stereo-based method that reconstructs the production-line scene from RGB input only and produces high-quality 6-DoF suction poses in real time.
  • results: Outperforms existing methods and generalizes better to novel environments, arrangements, and objects, in both simulation and the real world, meeting practical industrial needs.
    Abstract In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry, while we proposed a novel framework to reconstruct the scene on the production line depending only on RGB input, based on multiview stereo. Compared to existing works, our method not only reconstructs the whole 3D scene in order to obtain high-quality 6-DoF suction poses in real time but also generalizes to novel environments, novel arrangements and novel objects, including challenging transparent objects, both in simulation and the real world. Extensive experiments in simulation and the real world show that our method significantly surpasses the baselines and has better generalizability, which caters to practical industrial needs.
    摘要 在这项工作中,我们介绍了STOPNet,一个用于生产线上6个自由度物体捕捉检测的框架,强调但不限于透明物体,这是现代机器人系统和现代工业中的一个重要和困难的问题。现有的方法,需要深度输入,在透明物体上失败,因为深度摄像头无法感知其几何结构,而我们提出了一种新的框架,基于多视图零点投影,可以在RGB输入基础上重建生产线上的场景,并且可以在实时获得高质量的6个自由度捕捉姿势。与现有的方法相比,我们的方法不仅可以重建整个3D场景,以获得高质量的捕捉姿势,而且可以在新环境、新排序和新物体上普遍,包括实际上的挑战性透明物体,并在实际世界中达到了更好的普遍性。广泛的实验在实际世界和 simulate 中表明,我们的方法在比较基eline上显著超越了基eline,并且具有更好的普遍性,这符合实际工业需求。

Guiding Language Model Reasoning with Planning Tokens

  • paper_url: http://arxiv.org/abs/2310.05707
  • repo_url: None
  • paper_authors: Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, Alessandro Sordoni
  • for: Improving the complex (chain-of-thought) reasoning ability of large language models (LLMs).
  • methods: Introduces "planning tokens" at the start of each reasoning step to guide the model, and fine-tunes their embeddings along with the rest of the model parameters.
  • results: Notable accuracy improvements over plain chain-of-thought fine-tuning baselines on three math word problem datasets.
    Abstract Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce 'planning tokens' at the start of each reasoning step, serving as a guide for the model. These token embeddings are then fine-tuned along with the rest of the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. plain chain-of-thought fine-tuning baselines.
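Mechanically, adding planning tokens means extending the tokenizer's vocabulary, resizing the embedding matrix (the ~0.001% of new trainable parameters), and prefixing each reasoning step with a token before fine-tuning. The Hugging Face sketch below illustrates that wiring; the token names, the GPT-2 stand-in model, and the example step text are assumptions, not the paper's inventory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical planning-token inventory; the paper's actual tokens and the way one
# is chosen per reasoning step are not reproduced here.
PLAN_TOKENS = ["<plan_arith>", "<plan_algebra>", "<plan_general>"]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.add_special_tokens({"additional_special_tokens": PLAN_TOKENS})
model.resize_token_embeddings(len(tokenizer))   # new rows hold the planning-token embeddings

# Prefix each chain-of-thought step with a planning token before fine-tuning.
example = "<plan_arith> 12 * 7 = 84 <plan_general> so the answer is 84"
input_ids = tokenizer(example, return_tensors="pt").input_ids
```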

An Attribution Method for Siamese Encoders

  • paper_url: http://arxiv.org/abs/2310.05703
  • repo_url: None
  • paper_authors: Lucas Möller, Dmitry Nikolaev, Sebastian Padó
  • for: Investigating which aspects of their inputs Siamese encoder models such as sentence transformers (STs) pay attention to.
  • methods: A local attribution method based on generalizing integrated gradients to models with multiple inputs, yielding feature-pair attributions that reduce to a token-token matrix for STs.
  • results: An ST can often explain large fractions of a prediction with a few token pairs, focusing on nouns and verbs, but accurate predictions require attending to the majority of tokens and parts of speech.
    Abstract Despite the success of Siamese encoder models such as sentence transformers (ST), little is known about the aspects of inputs they pay attention to. A barrier is that their predictions cannot be attributed to individual features, as they compare two inputs rather than processing a single one. This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs. The solution takes the form of feature-pair attributions, and can be reduced to a token-token matrix for STs. Our method involves the introduction of integrated Jacobians and inherits the advantageous formal properties of integrated gradients: it accounts for the model's full computation graph and is guaranteed to converge to the actual prediction. A pilot study shows that in an ST few token-pairs can often explain large fractions of predictions, and it focuses on nouns and verbs. For accurate predictions, it however needs to attend to the majority of tokens and parts of speech.
    摘要 尽管SIAMESE编码器模型如sentence transformers(ST)在成功的背景下,仍然知之少于其处理输入的方面。一个障碍是它们的预测无法归因于单个特征,因为它们比较两个输入而不是处理单个输入。这篇论文提出了一种本地归因方法 для SIAMESE编码器模型,通过泛化集成导数原理来对多输入模型进行归因。该方法的解释形式为对应对方特征归因,可以将其减少到一个单词单词的矩阵中,并且具有集成导数的优点:它考虑了模型的全部计算图和是确定的归因方法。一项试点研究显示,在ST中,只需要几对单词可以解释大量预测,并且它们主要集中在名词和动词上。然而,为了准确预测,它们需要对大多数单词和部分语法进行注意。
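To give a feel for attributing a two-input similarity score, the sketch below applies plain integrated gradients to both inputs of a Siamese similarity function, giving per-token attributions for each side; the paper's exact feature-pair formulation via integrated Jacobians is more involved and is not reproduced here. `sim_fn` and the baselines are assumptions.

```python
import torch

def pairwise_ig(sim_fn, emb_a, emb_b, base_a, base_b, steps=50):
    """Per-token attributions of a Siamese similarity score via integrated gradients.

    sim_fn(a, b) -> scalar similarity on token-embedding inputs of shape (tokens, dim).
    """
    grad_a = torch.zeros_like(emb_a)
    grad_b = torch.zeros_like(emb_b)
    for k in range(1, steps + 1):
        alpha = k / steps
        a = (base_a + alpha * (emb_a - base_a)).detach().requires_grad_(True)
        b = (base_b + alpha * (emb_b - base_b)).detach().requires_grad_(True)
        ga, gb = torch.autograd.grad(sim_fn(a, b), (a, b))
        grad_a += ga / steps
        grad_b += gb / steps
    # Scale by the input-baseline difference and sum over the embedding dimension.
    attr_a = (grad_a * (emb_a - base_a)).sum(-1)   # one score per token of input a
    attr_b = (grad_b * (emb_b - base_b)).sum(-1)
    return attr_a, attr_b
```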

Based on What We Can Control Artificial Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05692
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Cheng Kang, Xujing Yao
  • for: Ensuring the stability and efficiency of artificial neural networks (ANNs) through a systematic analysis method.
  • methods: Uses knowledge from control systems to analyze the system function of ANNs and simulate system responses; despite the high complexity of most ANNs, each factor (e.g., optimizer, hyperparameters) can still be analyzed via its system response.
  • results: A new analysis method that helps ensure stable and efficient learning, and may benefit the development of new optimizers and learning systems, especially for identifying which components adversely affect ANNs. Code: \url{https://github.com/RandomUserName2023/Control-ANNs}.
    Abstract How can the stability and efficiency of Artificial Neural Networks (ANNs) be ensured through a systematic analysis method? This paper seeks to address that query. While numerous factors can influence the learning process of ANNs, utilizing knowledge from control systems allows us to analyze its system function and simulate system responses. Although the complexity of most ANNs is extremely high, we still can analyze each factor (e.g., optimiser, hyperparameters) by simulating their system response. This new method also can potentially benefit the development of new optimiser and learning system, especially when discerning which components adversely affect ANNs. Controlling ANNs can benefit from the design of optimiser and learning system, as (1) all optimisers act as controllers, (2) all learning systems operate as control systems with inputs and outputs, and (3) the optimiser should match the learning system. Please find codes: \url{https://github.com/RandomUserName2023/Control-ANNs}.
    摘要 如何确保人工神经网络(ANNs)的稳定性和效率?这篇论文旨在回答这个问题。虽然多种因素可能影响 ANNs 的学习过程,但通过知识控制系统来分析其系统功能并模拟系统响应。虽然大多数 ANNs 的复杂度很高,但我们仍可以分析每个因素(例如优化器、超参数) by 模拟它们的系统响应。这种新方法还可能为 ANNs 的发展提供新的优化器和学习系统,特别是当探测那些组件对 ANNs 产生负面影响时。控制 ANNs 可以从优化器和学习系统的设计中受益,因为(1)所有优化器都是控制器,(2)所有学习系统都是控制系统,(3)优化器应该与学习系统匹配。请找到代码:https://github.com/RandomUserName2023/Control-ANNs。

Abstractive Summarization of Large Document Collections Using GPT

  • paper_url: http://arxiv.org/abs/2310.05690
  • repo_url: None
  • paper_authors: Sengjie Liu, Christopher G. Healey
  • for: Proposes an abstractive summarization method that scales to document collections rather than individual documents.
  • methods: Combines semantic clustering, document size reduction within topic clusters, semantic chunking, GPT-based summarization with concatenation, and a combined sentiment and text visualization per topic.
  • results: ROUGE comparisons against BART, BRIO, PEGASUS, and MoCa show statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test set and with BART on the Gigaword test set.
    Abstract This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROUGE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are addressed and of directions for future work.
    摘要 这篇论文提出了一种抽象摘要方法,旨在对文档集合进行摘要而不是单个文档。我们的方法使用了 semantics 归一化、文档内容减少、semantic 块分割、基于 GPT 的摘要和 concatenation,以及每个话题的感情和文本视觉表示。我们通过对 ROGUE 摘要分数进行统计比较,与现有的状态 искус数据集 BART、BRIO、PEGASUS 和 MoCa 进行比较,在 CNN/Daily Mail 测试集上与 BART 和 PEGASUS 相当,在 Gigaword 测试集上与 BART 相当。这一结果是有前途的,因为我们视文档集合摘要为单个文档摘要更加困难。我们在结尾采用了一些问题的扩展和未来工作的讨论。
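A compact sketch of the cluster-then-chunk-then-summarize pipeline is given below, using TF-IDF features and k-means for the clustering stage and a placeholder in place of the GPT call; the vectorizer choice, cluster count, chunk size, and summarizer are illustrative assumptions, not the paper's configuration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize_chunk(text: str) -> str:
    """Placeholder for an abstractive GPT call; swap in your preferred LLM client."""
    return text[:200]  # truncation stands in for a real summary

def summarize_collection(docs, n_topics=3, chunk_chars=2000):
    """Cluster documents into topics, chunk each topic, summarize, and concatenate.

    Requires len(docs) >= n_topics.
    """
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(X)
    topic_summaries = []
    for topic in range(n_topics):
        text = " ".join(d for d, lab in zip(docs, labels) if lab == topic)
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        topic_summaries.append(" ".join(summarize_chunk(c) for c in chunks))
    return "\n\n".join(topic_summaries)
```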

The potential of large language models for improving probability learning: A study on ChatGPT3.5 and first-year computer engineering students

  • paper_url: http://arxiv.org/abs/2310.05686
  • repo_url: None
  • paper_authors: Angel Udias, Antonio Alonso-Ayuso, Ignacio Sanchez, Sonia Hernandez, Maria Eugenia Castellanos, Raquel Montes Diez, Emilio Lopez Cano
  • for: The paper assesses the efficacy of ChatGPT in solving probability problems typically presented in introductory computer engineering exams.
  • methods: The study uses a set of 23 probability exercises administered to students at Rey Juan Carlos University (URJC) in Madrid; ChatGPT's responses are evaluated qualitatively by five statistics professors and graded with the same criteria used for students.
  • results: ChatGPT surpasses the average student in phrasing, organization, and logical reasoning, with consistent performance across the Spanish and English versions of the exercises, but struggles with basic numerical operations; these limitations were overcome by requesting the solution in the form of an R script.
    Abstract In this paper, we assess the efficacy of ChatGPT (version Feb 2023), a large-scale language model, in solving probability problems typically presented in introductory computer engineering exams. Our study comprised a set of 23 probability exercises administered to students at Rey Juan Carlos University (URJC) in Madrid. The responses produced by ChatGPT were evaluated by a group of five statistics professors, who assessed them qualitatively and assigned grades based on the same criteria used for students. Our results indicate that ChatGPT surpasses the average student in terms of phrasing, organization, and logical reasoning. The model's performance remained consistent for both the Spanish and English versions of the exercises. However, ChatGPT encountered difficulties in executing basic numerical operations. Our experiments demonstrate that requesting ChatGPT to provide the solution in the form of an R script proved to be an effective approach for overcoming these limitations. In summary, our results indicate that ChatGPT surpasses the average student in solving probability problems commonly presented in introductory computer engineering exams. Nonetheless, the model exhibits limitations in reasoning around certain probability concepts. The model's ability to deliver high-quality explanations and illustrate solutions in any programming language, coupled with its performance in solving probability exercises, suggests that large language models have the potential to serve as learning assistants.
    摘要 在这篇论文中,我们评估了ChatGPT(版本为2月2023)大型语言模型在解probability问题方面的效果。我们的研究包括23个probability问题,对于马德里 Rey Juan Carlos大学(URJC)的学生进行了测试。ChatGPT的答案由5名统计教授评估,他们根据同样的标准评分学生的答案。我们的结果表明,ChatGPT在表达、组织和逻辑推理方面超过了学生的平均水平。模型在西班牙语和英语版probability问题上表现一致。然而,ChatGPT在基本数学运算方面遇到了困难。我们的实验表明,向ChatGPT请求提供解决方案的R脚本形式是一种有效的方法,以超越这些限制。总之,我们的结果表明,ChatGPT在入门计算机工程考试中常见的probability问题方面表现出色,但模型在某些概率概念上存在限制。模型能够提供高质量的解释和在任何编程语言中示例解决方案,表明大语言模型有可能作为学习助手。

  • paper_url: http://arxiv.org/abs/2310.05680
  • repo_url: None
  • paper_authors: Oscar Tuvey, Procheta Sen
  • for: The paper aims to enhance the efficiency and speed of legal procedures by utilizing AI technology to help legal professionals analyze legal cases.
  • methods: The paper uses open-sourced large language models to create arguments derived from the facts present in legal cases.
  • results: The arguments generated by the best performing method have on average 63% overlap with the benchmark set gold standard annotations.
    Abstract The count of pending cases has shown an exponential rise across nations (e.g., with more than 10 million pending cases in India alone). The main issue lies in the fact that the number of cases submitted to the law system is far greater than the available number of legal professionals present in a country. Given this worldwide context, the utilization of AI technology has gained paramount importance to enhance the efficiency and speed of legal procedures. In this study we partcularly focus on helping legal professionals in the process of analyzing a legal case. Our specific investigation delves into harnessing the generative capabilities of open-sourced large language models to create arguments derived from the facts present in legal cases. Experimental results show that the generated arguments from the best performing method have on average 63% overlap with the benchmark set gold standard annotations.
    摘要 全球各国案件数量在急增(例如印度单独已经有超过1000万个案件)。主要问题在于法律系统内的案件数量比法律专业人员的数量更多。视这种全球背景,利用人工智能技术已成为提高法律程序效率和速度的重要手段。本研究专注于帮助法律专业人员分析法律案件。我们的特定调查是利用开源大型自然语言模型的生成能力来从法律案件中生成基于事实的法律Arguments。实验结果显示,最佳方法生成的Arguments的平均 overlap率为63%。

Making Scalable Meta Learning Practical

  • paper_url: http://arxiv.org/abs/2310.05674
  • repo_url: https://github.com/leopard-ai/betty
  • paper_authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing
  • for: Making scalable meta learning practical.
  • methods: SAMA combines advances in implicit differentiation algorithms and systems: it supports a broad range of adaptive optimizers in the base level of meta learning programs, avoids explicit computation of second-order gradient information, and exploits efficient distributed training techniques built for first-order gradients.
  • results: On large-scale meta learning benchmarks, SAMA shows up to 1.7x/4.8x higher throughput and 2.0x/3.8x lower memory consumption than baseline meta learning algorithms on single-/multi-GPU setups; SAMA-based data optimization consistently improves text classification accuracy with BERT and RoBERTa and achieves state-of-the-art small- and large-scale data pruning on image classification.
    Abstract Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
    摘要 尽管机器学习中的元学习(即学习学习)具有学习多种启发的灵活性,但长期以来,元学习受到了计算/存储成本过高、训练不稳定和分布式训练支持不充分的问题困扰。在这项工作中,我们关注使得元学习可扩展的实用性,通过引入SAMA来实现。SAMA组合了隐式 diferentiation算法和系统技术,特别是在基础级元学习程序中支持广泛的适应化优化器,同时减少计算负担,避免直接计算第二阶导数信息,并利用高效的分布式训练技术实现。在多个大规模元学习标准 benchmark 上评估,SAMA显示在单/多GPU设置下具有1.7/4.8倍的throughput和2.0/3.8倍的内存占用量,相比其他基eline元学习算法。此外,我们表明SAMA可以在语言和视觉领域中实现可扩展的数据优化,并在BERT和RoBERTa大语言模型中表现出了一致的提升,并在图像分类任务中实现了小规模和大规模数据减少的状态环境。

Reinforcement learning for freeform robot design

  • paper_url: http://arxiv.org/abs/2310.05670
  • repo_url: None
  • paper_authors: Muhan Li, David Matthews, Sam Kriegman
  • for: Using policy gradients to design freeform robots with arbitrary external and internal structure.
  • methods: Actions deposit or remove bundles of atomic building blocks, forming higher-level nonparametric macrostructures such as appendages, organs, and cavities.
  • results: Demonstrates open-loop control; the authors discuss adapting the method to closed-loop control and sim2real transfer to physical machines in future work.
    Abstract Inspired by the necessity of morphological adaptation in animals, a growing body of work has attempted to expand robot training to encompass physical aspects of a robot's design. However, reinforcement learning methods capable of optimizing the 3D morphology of a robot have been restricted to reorienting or resizing the limbs of a predetermined and static topological genus. Here we show policy gradients for designing freeform robots with arbitrary external and internal structure. This is achieved through actions that deposit or remove bundles of atomic building blocks to form higher-level nonparametric macrostructures such as appendages, organs and cavities. Although results are provided for open loop control only, we discuss how this method could be adapted for closed loop control and sim2real transfer to physical machines in future.
    摘要 受动物形态适应的需要启发,一组增长的研究尝试将机器人训练扩展到物理机器人设计的方面。然而,使用奖励学习方法优化3D机器人形态的方法一直受限于重定向或缩放预先预定的顺序型机器人的臂部。我们现在显示了一种使用政策偏好来设计自由形机器人,这种方法通过执行填充或 removing 粒子堆集来形成高级非参数 macrostructure,如肢体、器官和腔体。虽然我们只提供了开loop控制的结果,但我们讨论了如何将这种方法适应到closed loop控制和 sim2real 转移到物理机器人的未来。

ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain

  • paper_url: http://arxiv.org/abs/2310.05664
  • repo_url: None
  • paper_authors: Md Sohag Mia, Abu Bakor Hayat Arnob, Abdu Naim, Abdullah Al Bary Voban, Md Shariful Islam
  • for: Surveys Transformer designs in computer vision (CV) and their performance across different CV applications.
  • methods: Categorizes and compares various Vision Transformer (ViT) models by task to examine their benefits and drawbacks.
  • results: Finds ViTs outperforming Convolutional Neural Networks (CNNs) across many CV applications, including image classification, object detection, image segmentation, video transformers, image denoising, and NAS, and outlines open research problems and opportunities.
    Abstract Transformer design is the de facto standard for natural language processing tasks. The success of the transformer design in natural language processing has lately piqued the interest of researchers in the domain of computer vision. When compared to Convolutional Neural Networks (CNNs), Vision Transformers (ViTs) are becoming more popular and dominant solutions for many vision problems. Transformer-based models outperform other types of networks, such as convolutional and recurrent neural networks, in a range of visual benchmarks. We evaluate various vision transformer models in this work by dividing them into distinct jobs and examining their benefits and drawbacks. ViTs can overcome several possible difficulties with convolutional neural networks (CNNs). The goal of this survey is to show the first use of ViTs in CV. In the first phase, we categorize various CV applications where ViTs are appropriate. Image classification, object identification, image segmentation, video transformer, image denoising, and NAS are all CV applications. Our next step will be to analyze the state-of-the-art in each area and identify the models that are currently available. In addition, we outline numerous open research difficulties as well as prospective research possibilities.
    摘要 <>将文本翻译成简化中文。<>变换器设计已经成为自然语言处理任务的逻辑标准。随着变换器设计在自然语言处理领域的成功,研究人员开始关注这种设计在计算机视觉领域的应用。与卷积神经网络(CNN)相比,视力变换器(ViT)在许多视觉问题上变得更加受欢迎和主导性。基于变换器的模型在各种视觉标准上表现出色,超过了卷积神经网络和回归神经网络的性能。在这项工作中,我们将对不同的视觉变换器模型进行分类和分析,描述其优缺点。ViT可以超越卷积神经网络的一些可能的困难。本文的目标是在计算机视觉领域内,首次使用ViT。在第一个阶段,我们将分类各种适用于计算机视觉应用的CV应用程序。包括图像分类、物体识别、图像分割、视频变换、图像净化和NAS等。接下来,我们将分析每个领域的现状,并识别目前可用的模型。此外,我们还将列出许多开放的研究Difficulties和前景。

Causal structure learning with momentum: Sampling distributions over Markov Equivalence Classes of DAGs

  • paper_url: http://arxiv.org/abs/2310.05655
  • repo_url: https://github.com/mschauer/CausalInference.jl
  • paper_authors: Moritz Schauer, Marcel Wienöbst
  • for: Inferring Bayesian network structure (a directed acyclic graph, DAG for short).
  • methods: A non-reversible continuous-time Markov chain, the "Causal Zig-Zag sampler", targeting a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs.
  • results: Improved mixing compared to state-of-the-art implementations, using Greedy Equivalence Search (GES) operators with a momentum variable, plus efficient algorithms for listing, counting, uniformly sampling, and applying possible GES moves.
    Abstract In the context of inferring a Bayesian network structure (directed acyclic graph, DAG for short), we devise a non-reversible continuous time Markov chain, the "Causal Zig-Zag sampler", that targets a probability distribution over classes of observationally equivalent (Markov equivalent) DAGs. The classes are represented as completed partially directed acyclic graphs (CPDAGs). The non-reversible Markov chain relies on the operators used in Chickering's Greedy Equivalence Search (GES) and is endowed with a momentum variable, which improves mixing significantly as we show empirically. The possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood. We offer an efficient implementation wherein we develop new algorithms for listing, counting, uniformly sampling, and applying possible moves of the GES operators, all of which significantly improve upon the state-of-the-art.
    摘要 在推断 bayesian 网络结构(直接循环图,dag 简称)方面,我们设计了一种不可逆的连续时间马尔可夫链,称为“ causal zig-zag sampler”,该链targeted一个观察可Equivalence classes of observationally (Markov equivalent) DAGs的概率分布。这些类型被表示为完善的部分导向循环图(CPDAGs)。不可逆的马尔可夫链利用GES操作符,并具有一个势量变量,这使得混合得到了显著改善,我们在实验中验证了这一点。 possible target distributions include posterior distributions based on a prior over DAGs and a Markov equivalent likelihood。我们提供了高效的实现,其中包括开发了新的列表、计数、均匀采样和可能的移动操作算法,这些算法都有显著改善了现有状态的。

No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling

  • paper_url: http://arxiv.org/abs/2310.05654
  • repo_url: None
  • paper_authors: Xuwei Xu, Changlin Li, Yudong Chen, Xiaojun Chang, Jiajun Liu, Sen Wang
  • for: Proposes IdleViT, a dynamic token-idle method that improves the efficiency of Vision Transformers (ViTs) while preserving performance.
  • methods: In each layer, only a subset of image tokens participates in computation while the remaining tokens idle and are passed directly to the layer's output; idle tokens can be re-selected by later layers, and a token cut loss inspired by the normalized graph cut regularizes token selection.
  • results: IdleViT reduces the complexity of pretrained ViTs by up to 33% with at most a 0.2% accuracy drop on ImageNet after only 30 epochs of fine-tuning; at a keep ratio of 0.5 it outperforms EViT on DeiT-S by 0.5% accuracy with faster inference.
    Abstract Vision Transformers (ViTs) have demonstrated outstanding performance in computer vision tasks, yet their high computational complexity prevents their deployment in computing resource-constrained environments. Various token pruning techniques have been introduced to alleviate the high computational burden of ViTs by dynamically dropping image tokens. However, some undesirable pruning at early stages may result in permanent loss of image information in subsequent layers, consequently hindering model performance. To address this problem, we propose IdleViT, a dynamic token-idle-based method that achieves an excellent trade-off between performance and efficiency. Specifically, in each layer, IdleViT selects a subset of the image tokens to participate in computations while keeping the rest of the tokens idle and directly passing them to this layer's output. By allowing the idle tokens to be re-selected in the following layers, IdleViT mitigates the negative impact of improper pruning in the early stages. Furthermore, inspired by the normalized graph cut, we devise a token cut loss on the attention map as regularization to improve IdleViT's token selection ability. Our method is simple yet effective and can be extended to pyramid ViTs since no token is completely dropped. Extensive experimental results on various ViT architectures have shown that IdleViT can diminish the complexity of pretrained ViTs by up to 33\% with no more than 0.2\% accuracy decrease on ImageNet, after finetuning for only 30 epochs. Notably, when the keep ratio is 0.5, IdleViT outperforms the state-of-the-art EViT on DeiT-S by 0.5\% higher accuracy and even faster inference speed. The source code is available in the supplementary material.
    摘要 通过图像矩阵变换(ViT),计算机视觉任务的表现几乎不可思议,但是它们的计算复杂性使得在计算资源受限的环境中不能实施。为了解决这个问题,我们提出了IdleViT,一种基于动态token idle的方法,可以很好地平衡性能和效率。具体来说,在每层中,IdleViT选择图像中的一部分token参与计算,而保留剩下的token idle,直接将其传递给当前层的输出。通过在后续层中重新选择idletoken,IdleViT消除了在早期阶段的不合适剪裁所产生的负面影响。此外,我们根据 норма化图像排序(normalized graph cut)的思想,在注意力图中定义了一个token cut损失,以便提高IdleViT的token选择能力。我们的方法简单而有效,可以扩展到 pyramid ViTs,因为没有完全drop的token。我们在不同的ViT架构上进行了广泛的实验,并证明了IdleViT可以减少预训练ViT的复杂度达33%,只需要30个epoch的微调,而且在ImageNet上保持至少0.2%的准确率下。特别是,当保留比例为0.5时,IdleViT可以在DeiT-S上高于状态之前的EViT,增加0.5%的准确率和更快的执行速度。详细的源代码可以在补充材料中找到。
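The core mechanic, updating only the selected tokens while letting the rest idle so later layers can re-select them, can be sketched as a small PyTorch block. The token-scoring heuristic below (feature norm) and the layer layout are illustrative assumptions; the paper's selection criterion and token cut loss are not reproduced.

```python
import torch
import torch.nn as nn

class IdleBlock(nn.Module):
    """Transformer block that updates only the top-k tokens; idle tokens pass through
    unchanged, so a later block can re-select them."""
    def __init__(self, dim, heads=4, keep_ratio=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.keep_ratio = keep_ratio

    def forward(self, x):                         # x: (batch, tokens, dim)
        b, n, d = x.shape
        k = max(1, int(n * self.keep_ratio))
        scores = x.norm(dim=-1)                   # heuristic importance per token
        keep = scores.topk(k, dim=1).indices      # (batch, k)
        idx = keep.unsqueeze(-1).expand(-1, -1, d)
        active = torch.gather(x, 1, idx)          # selected tokens only
        h = self.norm1(active)
        active = active + self.attn(h, h, h, need_weights=False)[0]
        active = active + self.mlp(self.norm2(active))
        return x.scatter(1, idx, active)          # write updated tokens back in place
```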

FENCE: Fairplay Ensuring Network Chain Entity for Real-Time Multiple ID Detection at Scale In Fantasy Sports

  • paper_url: http://arxiv.org/abs/2310.05651
  • repo_url: None
  • paper_authors: Akriti Upreti, Kartavya Kothari, Utkarsh Thukral, Vishal Verma
  • for: Detecting duplicate/multiple account creation on the Dream11 fantasy sports platform, typically done to abuse bonus offers.
  • methods: A graph-based solution that first predicts edges/associations between users and then highlights clusters of colluding multiple accounts.
  • results: Describes a distributed ML system that serves the detection models' inferences in real time so corrective actions can be taken, with human-in-the-loop components for validation, feedback, and ground-truth labeling.
    Abstract Dream11 takes pride in being a unique platform that enables over 190 million fantasy sports users to demonstrate their skills and connect deeper with their favorite sports. While managing such a scale, one issue we are faced with is duplicate/multiple account creation in the system. This is done by some users with the intent of abusing the platform, typically for bonus offers. The challenge is to detect these multiple accounts before it is too late. We propose a graph-based solution to solve this problem in which we first predict edges/associations between users. Using the edge information we highlight clusters of colluding multiple accounts. In this paper, we talk about our distributed ML system which is deployed to serve and support the inferences from our detection models. The challenge is to do this in real-time in order to take corrective actions. A core part of this setup also involves human-in-the-loop components for validation, feedback, and ground-truth labeling.
    摘要

Plug n’ Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers

  • paper_url: http://arxiv.org/abs/2310.05642
  • repo_url: None
  • paper_authors: Xuwei Xu, Sen Wang, Yudong Chen, Jiajun Liu
  • for: Improving tiny Vision Transformers (ViTs) so they suit devices with limited memory and compute.
  • methods: A novel channel shuffle module that expands a tiny ViT's feature channels, partitions them into Attended and Idle groups, applies self-attention only to the Attended group, and shuffles channels to exchange information between the two groups.
  • results: On ImageNet-1K, incorporating the module consistently improves the top-1 accuracy of various tiny ViT models by up to 2.8%, with model complexity changing by less than 0.03 GMACs.
    Abstract Vision Transformers (ViTs) have demonstrated remarkable performance in various computer vision tasks. However, the high computational complexity hinders ViTs' applicability on devices with limited memory and computing resources. Although certain investigations have delved into the fusion of convolutional layers with self-attention mechanisms to enhance the efficiency of ViTs, there remains a knowledge gap in constructing tiny yet effective ViTs solely based on the self-attention mechanism. Furthermore, the straightforward strategy of reducing the feature channels in a large but outperforming ViT often results in significant performance degradation despite improved efficiency. To address these challenges, we propose a novel channel shuffle module to improve tiny-size ViTs, showing the potential of pure self-attention models in environments with constrained computing resources. Inspired by the channel shuffle design in ShuffleNetV2 \cite{ma2018shufflenet}, our module expands the feature channels of a tiny ViT and partitions the channels into two groups: the \textit{Attended} and \textit{Idle} groups. Self-attention computations are exclusively employed on the designated \textit{Attended} group, followed by a channel shuffle operation that facilitates information exchange between the two groups. By incorporating our module into a tiny ViT, we can achieve superior performance while maintaining a comparable computational complexity to the vanilla model. Specifically, our proposed channel shuffle module consistently improves the top-1 accuracy on the ImageNet-1K dataset for various tiny ViT models by up to 2.8\%, with the changes in model complexity being less than 0.03 GMACs.
    摘要 《视图变换器》(ViTs)在计算机视觉任务中表现出色,但高计算复杂性限制了ViTs在有限内存和计算资源的设备上的应用。虽然一些研究已经探索了将卷积层与自注意机制结合使用以提高ViTs的效率,但还有一个知识空白在建立简单而高效的ViTssolely基于自注意机制。此外,通常减少大型ViT的特征通道会导致显著性能下降,尽管提高了效率。为了解决这些挑战,我们提出了一种新的通道排序模块,用于改进简单型ViTs。我们的模块基于ShuffleNetV2中的通道排序设计,将特征通道分成两组:“Attended”和“Idle”组。只有在“Attended”组上进行自注意计算,然后进行通道排序操作,以便在两组之间进行信息交换。通过将我们的模块纳入简单型ViT中,我们可以实现高性能,同时保持与标准模型的计算复杂性相似。具体来说,我们的提议的通道排序模块在ImageNet-1K数据集上的top-1准确率上提高了2.8%,而模型的变化量占0.03 GMACs以下。
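One reading of the abstract's Attended/Idle design is sketched below in PyTorch: widen the channels, run self-attention on one half, then interleave the two halves ShuffleNetV2-style before projecting back. The expansion factor, shuffle pattern, and layer placement are assumptions for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Self-attention on an 'Attended' channel group, then a channel shuffle that
    mixes it with the 'Idle' group."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim)     # widen channels into two groups
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):                          # x: (batch, tokens, dim)
        b, n, _ = x.shape
        z = self.expand(x)
        attended, idle = z.chunk(2, dim=-1)        # Attended and Idle channel groups
        attended = self.attn(attended, attended, attended, need_weights=False)[0]
        z = torch.cat([attended, idle], dim=-1)
        # Channel shuffle (as in ShuffleNetV2): interleave the two groups.
        z = z.reshape(b, n, 2, -1).transpose(2, 3).reshape(b, n, -1)
        return self.proj(z)
```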

  • paper_url: http://arxiv.org/abs/2310.14942
  • repo_url: https://github.com/junfenggo/domain-watermark
  • paper_authors: Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, Bo Li
  • for: Protecting the copyright of open-source datasets through ownership verification while avoiding the harmful misclassification behaviors introduced by backdoor-based approaches.
  • methods: Domain-watermark-based dataset ownership verification: models trained on the protected dataset are made to correctly classify "hard" samples from a hardly-generalized domain that benign models misclassify, via a bi-level optimization producing visually indistinguishable clean-label modified samples and a hypothesis-test-guided verification.
  • results: Extensive experiments on three benchmark datasets verify the method's effectiveness and its resistance to potential adaptive attacks.
    Abstract The prosperity of deep neural networks (DNNs) is largely benefited from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. We reveal that these methods are fundamentally harmful given that they could introduce malicious misclassification behaviors to watermarked DNNs by the adversaries. In this paper, we design DOV from another perspective by making watermarked models (trained on the protected dataset) correctly classify some `hard' samples that will be misclassified by the benign model. Our method is inspired by the generalization property of DNNs, where we find a \emph{hardly-generalized domain} for the original dataset (as its \emph{domain watermark}). It can be easily learned with the protected dataset containing modified samples. Specifically, we formulate the domain generation as a bi-level optimization and propose to optimize a set of visually-indistinguishable clean-label modified data with similar effects to domain-watermarked samples from the hardly-generalized domain to ensure watermark stealthiness. We also design a hypothesis-test-guided ownership verification via our domain watermark and provide the theoretical analyses of our method. Extensive experiments on three benchmark datasets are conducted, which verify the effectiveness of our method and its resistance to potential adaptive methods. The code for reproducing main experiments is available at \url{https://github.com/JunfengGo/Domain-Watermark}.
    摘要 “深度神经网络(DNN)的繁荣得益于开源数据集,用户可以通过这些数据集进行评估和改进自己的方法。在这篇论文中,我们再次检视了基于DOV( dataset ownership verification)的数据集权利保护方法,我们发现这些方法是根本不可靠的,因为它们可能会由 adversaries 引入黑客识别器模型中的恶意识别行为。在这篇论文中,我们从另一个角度设计 DOV,使得在训练在保护数据集上的损坏模型中,对一些难样本进行正确的识别。我们的方法是基于 DNN 的总体化性特性,我们在原始数据集中找到一个难以总化的Domain(领域),然后通过修改这些样本来学习一个 hardly-generalized 领域。我们将这个领域作为 DNN 的域水印,通过对这些修改后的样本进行训练,来确保 watermark 的隐蔽性。我们还设计了一种假设测试导向的所有权验证方法,并提供了方法的理论分析。我们在三个基准数据集上进行了广泛的实验,并证明了我们的方法的有效性和对可适应方法的抵御力。相关的代码可以在 GitHub 上找到:https://github.com/JunfengGo/Domain-Watermark。”

Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods

  • paper_url: http://arxiv.org/abs/2310.05619
  • repo_url: None
  • paper_authors: Jonathan Kamp, Lisa Beinborn, Antske Fokkens
  • for: 这paper是用来解释文本分类器的预测结果的方法。
  • methods: 这paper使用了几种不同的方法来计算特征归因分数,并评估了这些方法的性能。
  • results: 研究发现,使用固定k或动态k都可以得到高度一致的结果,但动态k主要提高了 интеграルGradient和GradientXInput的性能。这是首次证明了 attribute scores 的顺序性有用于人类理解。
    Abstract Feature attribution scores are used for explaining the prediction of a text classifier to users by highlighting a k number of tokens. In this work, we propose a way to determine the number of optimal k tokens that should be displayed from sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using fixed k and dynamic k. We find that perturbation-based methods and Vanilla Gradient exhibit highest agreement on most method--method and method--human agreement metrics with a static k. Their advantage over other methods disappears with dynamic ks which mainly improve Integrated Gradient and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
    摘要 Feature attribution scores 是用于解释文本分类器预测结果的 Token 的准确性。在这种工作中,我们提出了一种方法来确定显示 k 个 Token 的优化数量,基于序列性质。我们的方法是动态 sentence,方法不依赖的,并且能够处理句子长度偏好。我们比较了多种方法和人类在 NLI 任务中的一致性,使用 fixes k 和动态 k。我们发现,扰动基于方法和 Vanilla Gradient 在大多数方法--方法和方法--人类协议中表现最高,其优势在 static k 下消失。这是我们知道的第一个证据,Sequential 性质是用于结合权重信号的准确信号的。
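One simple way to pick k from the shape of the attribution scores is to cut at the largest drop between consecutive sorted values, which is sketched below; this elbow heuristic illustrates the idea of using sequential score properties but is an assumption, not the paper's actual criterion.

```python
import numpy as np

def dynamic_k(attributions):
    """Choose how many tokens to display from the sorted attribution profile.

    Heuristic: cut at the largest drop between consecutive sorted |scores|.
    """
    scores = np.sort(np.abs(np.asarray(attributions, dtype=float)))[::-1]
    if len(scores) < 2:
        return len(scores)
    drops = scores[:-1] - scores[1:]
    return int(np.argmax(drops)) + 1

tokens = ["the", "movie", "was", "surprisingly", "good"]
attr = [0.02, 0.10, 0.01, 0.35, 0.41]
k = dynamic_k(attr)                                  # -> 2 for this example
top = [tokens[i] for i in np.argsort(np.abs(attr))[::-1][:k]]
```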

Adaptive Multi-head Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.05615
  • repo_url: None
  • paper_authors: Lei Wang, Piotr Koniusz, Tom Gedeon, Liang Zheng
  • for: The paper addresses inconsistent similarity measurements in contrastive learning, especially when multiple augmentation strategies are used.
  • methods: The paper proposes using multiple projection heads, each producing a separate set of features; the pre-training loss follows from maximum likelihood estimation over head-wise posterior distributions of positive samples given observations.
  • results: The proposed adaptive multi-head contrastive learning (AMCL) improves several popular contrastive learning methods, including SimCLR, MoCo, and Barlow Twins, across various backbones and linear probing epochs, with larger gains when multiple augmentation methods are used.
    Abstract In contrastive learning, two views of an original image generated by different augmentations are considered as a positive pair whose similarity is required to be high. Moreover, two views of two different images are considered as a negative pair, and their similarity is encouraged to be low. Normally, a single similarity measure given by a single projection head is used to evaluate positive and negative sample pairs, respectively. However, due to the various augmentation strategies and varying intra-sample similarity, augmented views from the same image are often not similar. Moreover, due to inter-sample similarity, augmented views of two different images may be more similar than augmented views from the same image. As such, enforcing a high similarity for positive pairs and a low similarity for negative pairs may not always be achievable, and in the case of some pairs, forcing so may be detrimental to the performance. To address this issue, we propose to use multiple projection heads, each producing a separate set of features. Our loss function for pre-training emerges from a solution to the maximum likelihood estimation over head-wise posterior distributions of positive samples given observations. The loss contains the similarity measure over positive and negative pairs, each re-weighted by an individual adaptive temperature that is regularized to prevent ill solutions. Our adaptive multi-head contrastive learning (AMCL) can be applied to and experimentally improves several popular contrastive learning methods such as SimCLR, MoCo and Barlow Twins. Such improvement is consistent under various backbones and linear probing epoches and is more significant when multiple augmentation methods are used.
    摘要 在对比学习中,两个视图来自不同的扩充方法的原始图像被视为一个正样对,需要高度相似。同时,两个不同图像的两个视图被视为一个负样对,需要低度相似。通常情况下,单一的相似度测量由单个投影头提供,用于评估正样对和负样对。然而,由于不同的扩充策略和内样 Similarity 的变化,扩充视图从同一个图像中可能不相似,而两个不同图像的扩充视图可能更相似。因此,强制正样对和负样对的相似度高低可能并不总是可 achievable,而且在某些对之中,强制如此可能会损害性能。为解决这个问题,我们提议使用多个投影头,每个生成一个独立的特征集。我们的损失函数在预训练阶段由每个头wise posterior distribution of positive samples given observations的最大可能性解决出来。损失函数包括对正样对和负样对的相似度测量,每个重新权重通过个体适应温度评正化,以避免不良解决。我们称之为自适应多头对比学习(AMCL)。我们的AMCL可以应用到多种流行的对比学习方法,如SimCLR、MoCo和Barlow Twins,并在不同的后端和线性探针级别上实现了实验增进。这种改进是不同的扩充方法和误差率下的可重复的,并且在多个扩充方法使用时更加明显。
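The multi-head structure with per-head adaptive temperatures can be sketched as below: each head projects both views, computes an InfoNCE term with its own learnable temperature, and the terms are averaged. The likelihood-derived re-weighting and temperature regularization described in the abstract are omitted, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadContrastive(nn.Module):
    """Several projection heads, each with its own learnable temperature.

    Loss = mean over heads of an InfoNCE term between two augmented views.
    """
    def __init__(self, feat_dim, proj_dim=128, num_heads=3):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(feat_dim, proj_dim) for _ in range(num_heads)])
        self.log_temp = nn.Parameter(torch.zeros(num_heads))   # adaptive temperatures

    def forward(self, feats1, feats2):                          # two views, (batch, feat_dim)
        losses = []
        targets = torch.arange(feats1.shape[0], device=feats1.device)
        for h, head in enumerate(self.heads):
            z1 = F.normalize(head(feats1), dim=-1)
            z2 = F.normalize(head(feats2), dim=-1)
            logits = z1 @ z2.t() / self.log_temp[h].exp()       # (batch, batch) similarities
            losses.append(F.cross_entropy(logits, targets))     # positives on the diagonal
        return torch.stack(losses).mean()
```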

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations

  • paper_url: http://arxiv.org/abs/2310.05592
  • repo_url: https://github.com/dfki-nlp/interrolang
  • paper_authors: Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller
  • for: 本研究旨在开发一个可交互的对话系统,帮助用户通过自然语言界面获得模型和数据集的解释。
  • methods: 本研究采用了对话扩展模型TalkToModel(Slack et al., 2022),并添加了新的NLP特有操作,如自由文本合理化。
  • results: 研究发现,对话性解释对用户来说是有用和有 corrections 的,可以帮助用户更好地理解模型的预测结果。此外,用户通过对话性解释可以更好地预测模型的结果,而不是基于单个解释。
    Abstract While recently developed NLP explainability methods let us open the black box in various ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool offering a conversational interface. Such a dialogue system can help users explore datasets and models with explanations in a contextualized manner, e.g. via clarification or follow-up questions, and through a natural language interface. We adapt the conversational explanation framework TalkToModel (Slack et al., 2022) to the NLP domain, add new NLP-specific operations such as free-text rationalization, and illustrate its generalizability on three NLP tasks (dialogue act classification, question answering, hate speech detection). To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models and implement a novel Adapter-based approach. We then conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) the simulatability, i.e. how objectively helpful dialogical explanations are for humans in figuring out the model's predicted label when it's not shown. We found rationalization and feature attribution were helpful in explaining the model behavior. Moreover, users could more reliably predict the model outcome based on an explanation dialogue rather than one-off explanations.

Aggregated f-average Neural Network for Interpretable Ensembling

  • paper_url: http://arxiv.org/abs/2310.05566
  • repo_url: None
  • paper_authors: Mathieu Vu, Emilie Chouzenoux, Jean-Christophe Pesquet, Ismail Ben Ayed
  • for: To enhance prediction performance on a common machine learning task by combining several models (i.e., weak learners).
  • methods: Fuses basic averaging ensembles with stacking-style ensembles by introducing a shallow neural network that models and combines different types of averages of the weak learners' outputs.
  • results: The aggregated f-average (AFA) network, with its interpretable architecture and simple training strategy, performs well on few-shot class incremental learning.
    Abstract Ensemble learning leverages multiple models (i.e., weak learners) on a common machine learning task to enhance prediction performance. Basic ensembling approaches average the weak learners outputs, while more sophisticated ones stack a machine learning model in between the weak learners outputs and the final prediction. This work fuses both aforementioned frameworks. We introduce an aggregated f-average (AFA) shallow neural network which models and combines different types of averages to perform an optimal aggregation of the weak learners predictions. We emphasise its interpretable architecture and simple training strategy, and illustrate its good performance on the problem of few-shot class incremental learning.
    摘要 集成学习利用多个模型(即弱学习器)处理同一机器学习任务,以提升预测性能。基本的集成方法对弱学习器的输出取平均,更复杂的方法则在弱学习器输出与最终预测之间堆叠一个机器学习模型。本文融合上述两种框架,提出一种聚合f-平均(AFA)浅层神经网络,用于建模并组合不同类型的平均值,从而对弱学习器的预测进行最优聚合。我们强调其可解释的架构和简单的训练策略,并在少样本类增量学习问题上展示了其良好性能。
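
To make the "combining different types of averages" idea concrete, here is a small, self-contained sketch (not the authors' code): weak-learner class probabilities are aggregated by several f-averages (arithmetic, geometric, harmonic), and a learned softmax weighting mixes them. The choice of averages and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class FAverageAggregator(nn.Module):
    """Mixes several averages of weak-learner outputs with learned weights."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.mix_logits = nn.Parameter(torch.zeros(3))  # one weight per average type
        self.eps = eps

    def forward(self, preds: torch.Tensor) -> torch.Tensor:
        # preds: [batch, n_learners, n_classes] probabilities from the weak learners
        arith = preds.mean(dim=1)
        geo = preds.clamp_min(self.eps).log().mean(dim=1).exp()
        harm = 1.0 / (1.0 / preds.clamp_min(self.eps)).mean(dim=1)
        stacked = torch.stack([arith, geo, harm], dim=0)      # [3, batch, n_classes]
        w = torch.softmax(self.mix_logits, dim=0).view(3, 1, 1)
        mixed = (w * stacked).sum(dim=0)
        return mixed / mixed.sum(dim=-1, keepdim=True)        # renormalize to probabilities

# toy usage: 4 weak learners, 5 classes
agg = FAverageAggregator()
probs = torch.softmax(torch.randn(2, 4, 5), dim=-1)
print(agg(probs).shape)  # torch.Size([2, 5])
```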

STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models

  • paper_url: http://arxiv.org/abs/2310.05563
  • repo_url: None
  • paper_authors: Yuwei Wang, Enmeng Lu, Zizhe Ruan, Yao Liang, Yi Zeng
  • for: To present STREAM, a social data and knowledge collective intelligence platform for training ethical AI models, addressing the challenge of aligning AI with human moral values and providing ethics datasets and knowledge bases so that AI models "follow good advice as naturally as a stream follows its course".
  • methods: Builds a comprehensive and representative platform that mirrors the moral judgments of diverse groups of humans and AIs, capturing group and cultural variations and the evolution of moral judgments over time, to support the 6Es (Establishment, Evaluation, Embedding, Embodiment, Ensemble, Evolvement) of moral capabilities in AI models.
  • results: STREAM already provides a comprehensive collection of ethical scenarios and substantial moral judgment data annotated by volunteers and popular large language models (LLMs), portraying the moral preferences and performance of both humans and AIs across a range of moral contexts.
    Abstract This paper presents Social data and knowledge collective intelligence platform for TRaining Ethical AI Models (STREAM) to address the challenge of aligning AI models with human moral values, and to provide ethics datasets and knowledge bases to help promote AI models "follow good advice as naturally as a stream follows its course". By creating a comprehensive and representative platform that accurately mirrors the moral judgments of diverse groups including humans and AIs, we hope to effectively portray cultural and group variations, and capture the dynamic evolution of moral judgments over time, which in turn will facilitate the Establishment, Evaluation, Embedding, Embodiment, Ensemble, and Evolvement (6Es) of the moral capabilities of AI models. Currently, STREAM has already furnished a comprehensive collection of ethical scenarios, and amassed substantial moral judgment data annotated by volunteers and various popular Large Language Models (LLMs), collectively portraying the moral preferences and performances of both humans and AIs across a range of moral contexts. This paper will outline the current structure and construction of STREAM, explore its potential applications, and discuss its future prospects.

WeatherDepth: Curriculum Contrastive Learning for Self-Supervised Depth Estimation under Adverse Weather Conditions

  • paper_url: http://arxiv.org/abs/2310.05556
  • repo_url: None
  • paper_authors: Jiyuan Wang, Chunyu Lin, Lang Nie, Shujun Huang, Yao Zhao, Xing Pan, Rui Ai
  • for: To improve self-supervised depth estimation under adverse weather, where illumination variations and weather particles degrade models trained on clear scenes.
  • methods: Proposes a curriculum contrastive learning strategy: three simple-to-complex curricula gradually adapt the model from clear to relatively adverse to adverse weather, a contrastive depth-consistency constraint between curricula prevents forgetting, and an adaptive scheduler automatically searches for the best time to switch curricula.
  • results: The solution is easily incorporated into various architectures and achieves state-of-the-art performance on both synthetic and real weather datasets.
    Abstract Depth estimation models have shown promising performance on clear scenes but fail to generalize to adverse weather conditions due to illumination variations, weather particles, etc. In this paper, we propose WeatherDepth, a self-supervised robust depth estimation model with curriculum contrastive learning, to tackle performance degradation in complex weather conditions. Concretely, we first present a progressive curriculum learning scheme with three simple-to-complex curricula to gradually adapt the model from clear to relative adverse, and then to adverse weather scenes. It encourages the model to gradually grasp beneficial depth cues against the weather effect, yielding smoother and better domain adaption. Meanwhile, to prevent the model from forgetting previous curricula, we integrate contrastive learning into different curricula. Drawn the reference knowledge from the previous course, our strategy establishes a depth consistency constraint between different courses towards robust depth estimation in diverse weather. Besides, to reduce manual intervention and better adapt to different models, we designed an adaptive curriculum scheduler to automatically search for the best timing for course switching. In the experiment, the proposed solution is proven to be easily incorporated into various architectures and demonstrates state-of-the-art (SoTA) performance on both synthetic and real weather datasets.
    摘要 depth estimation模型在清晰场景下表现出色,但在不利天气条件下表现不佳,主要是因为照明变化、天气粒子等因素。在这篇论文中,我们提出了一种自动适应的深度估计模型——WeatherDepth,使得模型在复杂的天气条件下能够更好地适应。具体来说,我们首先提出了一种进步式课程学习方案,包括三个简单到复杂的课程,以逐步适应模型从清晰到相对不利、然后到不利天气场景。这使得模型逐渐捕捉到恰当的深度提示,从而获得更好的预测性。同时,为了避免模型忘记之前的课程,我们将对不同的课程进行了对比学习。从此,我们的策略建立了一个深度一致性约束,以保证模型在多种天气条件下的稳定性。此外,为了避免手动 intervención和更好地适应不同的模型,我们设计了一个自适应课程调度器,以自动搜索最佳课程时间点。在实验中,我们的解决方案轻松地适应到不同的架构,并在真实的天气数据集上达到了当前最佳性能(SoTA)。
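
As a rough illustration of the curriculum idea (not the paper's code), the sketch below switches from one curriculum stage to the next when the monitored loss plateaus, and adds a consistency term tying current predictions to a model snapshot frozen at the end of the previous stage; the switching criterion, loss weight, and toy network are assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveScheduler:
    """Switches to the next curriculum when the monitored loss stops improving."""
    def __init__(self, patience: int = 3):
        self.best, self.bad_epochs, self.patience = float("inf"), 0, patience

    def should_switch(self, loss_value: float) -> bool:
        if loss_value < self.best - 1e-4:
            self.best, self.bad_epochs = loss_value, 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def consistency_loss(depth_now, frozen_model, images, weight=0.1):
    """Ties current predictions to a snapshot frozen after the previous curriculum."""
    with torch.no_grad():
        depth_prev = frozen_model(images)
    return weight * F.l1_loss(depth_now, depth_prev)

# toy demo with a stand-in depth network
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
frozen = copy.deepcopy(net).eval()          # e.g. snapshot from the "clear" curriculum
images = torch.rand(2, 3, 32, 32)           # pretend these are "mild weather" frames
print(consistency_loss(net(images), frozen, images))

sched = AdaptiveScheduler(patience=2)
for loss_value in [0.9, 0.8, 0.81, 0.82]:
    if sched.should_switch(loss_value):
        print("switch to the next curriculum")
```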

Logic-guided Deep Reinforcement Learning for Stock Trading

  • paper_url: http://arxiv.org/abs/2310.05551
  • repo_url: None
  • paper_authors: Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin Cui, Bozhi Wu, Bo Li, Yang Liu
  • for: To improve the stability and performance of deep reinforcement learning (DRL) for trading in dynamic stock markets by introducing a logic-guided trading framework.
  • methods: Proposes SYENS (Program Synthesis-based Ensemble Strategy), which regularizes model behavior hierarchically via the program synthesis by sketching paradigm, using a domain-specific language for describing the market environment and actions together with a program sketch that encodes expert knowledge.
  • results: On the 30 Dow Jones stocks, SYENS significantly outperforms the baselines under both cash trading and margin trading, with higher cumulative return and lower maximum drawdown.
    Abstract Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving excellent performance without significant manual effort. Whereas we observe that the DRL models behave unstably in a dynamic stock market due to the low signal-to-noise ratio nature of the financial data. In this paper, we propose a novel logic-guided trading framework, termed as SYENS (Program Synthesis-based Ensemble Strategy). Different from the previous state-of-the-art ensemble reinforcement learning strategy which arbitrarily selects the best-performing agent for testing based on a single measurement, our framework proposes regularizing the model's behavior in a hierarchical manner using the program synthesis by sketching paradigm. First, we propose a high-level, domain-specific language (DSL) that is used for the depiction of the market environment and action. Then based on the DSL, a novel program sketch is introduced, which embeds human expert knowledge in a logical manner. Finally, based on the program sketch, we adopt the program synthesis by sketching a paradigm and synthesizing a logical, hierarchical trading strategy. We evaluate SYENS on the 30 Dow Jones stocks under the cash trading and the margin trading settings. Experimental results demonstrate that our proposed framework can significantly outperform the baselines with much higher cumulative return and lower maximum drawdown under both settings.
    摘要 深度强化学习(DRL)已经革命化金融科学,它可以在不需要显著的人工努力的情况下达到出色的性能。然而,我们观察到DRL模型在动态股票市场中的不稳定行为,这是因为财务数据的信号噪声比例较低。在这篇论文中,我们提出了一种新的逻辑引导交易框架,称为SYENS(程序合成基于ensemble策略)。与前一代状态的聚合强化学习策略不同,我们的框架在层次结构上使用程序合成来规范模型的行为。首先,我们提出了一种高级、领域特定语言(DSL),用于描述市场环境和行动。然后,我们基于DSL引入了一种新的程序绘制,其嵌入了人类专家知识在逻辑上。最后,我们采用程序合成 by sketching 方法,并将其应用于SYENS框架中。我们在30个道琴股票下对cash交易和margin交易进行了实验。实验结果表明,我们的提出的框架可以与基准值相比较高的净返报和较低的最大下降。

ParFam – Symbolic Regression Based on Continuous Global Optimization

  • paper_url: http://arxiv.org/abs/2310.05537
  • repo_url: https://github.com/philipp238/parfam
  • paper_authors: Philipp Scholl, Katharina Bieker, Hillary Hauger, Gitta Kutyniok
  • for: To solve symbolic regression (SR) problems such as recovering physical laws or equations describing financial market behavior from data.
  • methods: Uses parametric families of suitable symbolic functions to translate the discrete SR problem into a continuous one, which is then solved with a powerful global optimizer; the approach can be extended, e.g., by adding a deep neural network to find well-fitting parametric families.
  • results: Extensive experiments on the SRBench benchmark show that ParFam achieves state-of-the-art results; code and results are available at https://github.com/philipp238/parfam.
    Abstract The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually quite complicated and require a lot of hyperparameter tuning and computational resources. In this paper, we present our new method ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a powerful global optimizer, this approach results in an effective method to tackle the problem of SR. Furthermore, it can be easily extended to more advanced algorithms, e.g., by adding a deep neural network to find good-fitting parametric families. We prove the performance of ParFam with extensive numerical experiments based on the common SR benchmark suit SRBench, showing that we achieve state-of-the-art results. Our code and results can be found at https://github.com/Philipp238/parfam .
    摘要 SR(符号回归)问题在多种应用中出现,如从数据中找到物理法律或财务市场行为的数学方程。现有多种解决 SR 问题的方法,通常基于进化编程。然而,这些方法通常很复杂,需要许多Hyperparameter调整和计算资源。在这篇论文中,我们介绍了我们的新方法 ParFam,它利用适当的参数家族符号函数来将离散的符号回归问题转化为连续的问题,从而得到更直观的设置。与现有状态之册方法相比,我们的方法更加简单,并且可以轻松地扩展到更高级的算法,例如通过添加深度神经网络来找到good-fitting参数家族。我们通过对 SRBench 常用的 SR benchmark 进行广泛的数字实验,证明了 ParFam 的性能。我们的代码和结果可以在 GitHub 上找到:https://github.com/Philipp238/parfam。
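

The core trick, replacing the discrete search over expressions with continuous optimization over the coefficients of a fixed parametric family, can be sketched in a few lines. The particular family and the use of scipy's differential evolution below are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from scipy.optimize import differential_evolution

# target data: y = 3*x**2 + sin(x), which the family below can represent
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 3 * x**2 + np.sin(x)

def family(theta, x):
    """A small parametric family: quadratic polynomial plus a scaled sine term."""
    a0, a1, a2, b, c = theta
    return a0 + a1 * x + a2 * x**2 + b * np.sin(c * x)

def loss(theta):
    # an l1 penalty nudges unused coefficients toward zero (sparser formulas)
    return np.mean((family(theta, x) - y) ** 2) + 1e-3 * np.sum(np.abs(theta))

bounds = [(-10, 10)] * 5
result = differential_evolution(loss, bounds, seed=0, tol=1e-8)
# expect a2 close to 3 and b*sin(c*x) close to sin(x) (the signs of b and c may flip together)
print(np.round(result.x, 3))
```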

On Double Descent in Reinforcement Learning with LSTD and Random Features

  • paper_url: http://arxiv.org/abs/2310.05518
  • repo_url: None
  • paper_authors: David Brellmann, Eloïse Berthier, David Filliat, Goran Frehse
  • for: To study how the performance of temporal difference (TD) algorithms in deep reinforcement learning depends on the size of the neural network.
  • methods: Provides a theoretical analysis of the influence of network size and l2-regularization, studying regularized Least-Squares Temporal Difference (LSTD) with random features in the lazy training regime as the numbers of parameters and visited states go to infinity at a constant ratio.
  • results: Identifies the parameter/state ratio as the crucial factor and finds a double descent phenomenon around a ratio of one; the correction terms responsible for it vanish as l2-regularization increases or the number of unvisited states goes to zero, and experiments on synthetic and small real-world environments closely match the theory.
    Abstract Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $l_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Square Bellman Error (MSBE) that feature correction terms responsible for the double-descent. Correction terms vanish when the $l_2$-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
    摘要 temporal difference(TD)算法在深度学习(RL)中广泛使用。其性能受到神经网络大小的影响。在超vised学习中,过度参数的情况和其好处已经很好地理解,但在RL中情况却相对不清楚。在这篇论文中,我们提供了TD算法性能与神经网络大小的理论分析。我们确定了参数与访问状态的比率为关键因素,并定义了过度参数为神经网络参数的数量大于状态数量的情况。此外,我们发现了一种双峰现象,即参数/状态比率接近1时性能突然下降。通过随机特征和懒散训练策略,我们研究了正则化最小二乘差(LSTD)算法在参数和状态数量 infinito 的极限情况下。我们 derive了参数和状态数量 infinito 下的零限的实际和真实的Mean-Square Bellman Error(MSBE),其中包含修正项负责 Double Descent。这些修正项在 $l_2$ 正则化强度增大或未访问状态数量减少时消失。实际实验结果与理论预测匹配得非常好。
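
The object analyzed above, regularized LSTD on top of fixed random features, has a compact closed form. The sketch below is an assumption-laden toy (synthetic transitions, ReLU random features), not the paper's code, but it shows the quantity being studied: the ratio of features to transitions is the knob that drives the double descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(states, W, b):
    """Fixed random ReLU features, as in the lazy-training / random-features regime."""
    return np.maximum(states @ W + b, 0.0)

def regularized_lstd(phi, phi_next, rewards, gamma=0.99, l2=1e-2):
    """Solve (Phi^T (Phi - gamma Phi') + l2 I) w = Phi^T r for the value weights."""
    n_feat = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + l2 * np.eye(n_feat)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# toy data: 500 transitions in a 4-dimensional state space, 64 random features
n_transitions, state_dim, n_features = 500, 4, 64   # the features/transitions ratio is the key quantity
states = rng.normal(size=(n_transitions, state_dim))
next_states = states + 0.1 * rng.normal(size=states.shape)
rewards = states[:, 0] + 0.05 * rng.normal(size=n_transitions)

W = rng.normal(size=(state_dim, n_features)) / np.sqrt(state_dim)
b = rng.normal(size=n_features)
phi = random_relu_features(states, W, b)
phi_next = random_relu_features(next_states, W, b)

w = regularized_lstd(phi, phi_next, rewards)
td_error = phi @ w - (rewards + 0.99 * (phi_next @ w))
print("mean squared TD error:", np.mean(td_error ** 2))
```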

UAVs and Neural Networks for search and rescue missions

  • paper_url: http://arxiv.org/abs/2310.05512
  • repo_url: None
  • paper_authors: Hartmut Surmann, Artur Leinweber, Gerhard Senkowski, Julien Meine, Dominik Slomma
  • for: detection of objects of interest (cars, humans, fire) in aerial images captured by UAVs during vegetation fires
  • methods: use of artificial neural networks, creation of a dataset for supervised learning, implementation of an object detection pipeline combining classic image processing techniques with pretrained neural networks, development of a data augmentation pipeline to augment the dataset with automatically labeled images
  • results: evaluation of the performance of different neural networks
    Abstract In this paper, we present a method for detecting objects of interest, including cars, humans, and fire, in aerial images captured by unmanned aerial vehicles (UAVs) usually during vegetation fires. To achieve this, we use artificial neural networks and create a dataset for supervised learning. We accomplish the assisted labeling of the dataset through the implementation of an object detection pipeline that combines classic image processing techniques with pretrained neural networks. In addition, we develop a data augmentation pipeline to augment the dataset with automatically labeled images. Finally, we evaluate the performance of different neural networks.
    摘要 在这篇论文中,我们提出了一种方法用于在由无人飞行器(UAV)拍摄的空中图像中检测有关兴趣的对象,包括汽车、人体和火灾。为 дости这一目标,我们使用人工神经网络,并创建了一个用于超级vised学习的数据集。我们通过实施对象检测管道,该管道组合了经典的图像处理技术和预训练的神经网络,来协助标注数据集。此外,我们还开发了一个自动生成数据集的管道,以增加数据集的自动标注图像。最后,我们评估了不同的神经网络的性能。

Query and Response Augmentation Cannot Help Out-of-domain Math Reasoning Generalization

  • paper_url: http://arxiv.org/abs/2310.05506
  • repo_url: https://github.com/ofa-sys/gsm8k-screl
  • paper_authors: Chengpeng Li, Zheng Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou
  • for: To study how data augmentation affects large language models (LLMs) in math reasoning, including how the amount and diversity of augmented data relate to model performance.
  • methods: Creates AugGSM8K, a new dataset built by complicating and diversifying GSM8K queries and sampling multiple reasoning paths, and fine-tunes LLMs (MuggleMath) on its subsets.
  • results: Augmentation substantially improves in-domain math reasoning, with a log-linear relationship between performance and the amount of augmented data, but generalization to out-of-domain math reasoning (MATH) remains weak.
    Abstract In math reasoning with large language models (LLMs), fine-tuning data augmentation by query evolution and diverse reasoning paths is empirically verified effective, profoundly narrowing the gap between open-sourced LLMs and cutting-edge proprietary LLMs. In this paper, we conduct an investigation for such data augmentation in math reasoning and are intended to answer: (1) What strategies of data augmentation are more effective; (2) What is the scaling relationship between the amount of augmented data and model performance; and (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks? To this end, we create a new dataset, AugGSM8K, by complicating and diversifying the queries from GSM8K and sampling multiple reasoning paths. We obtained a series of LLMs called MuggleMath by fine-tuning on subsets of AugGSM8K. MuggleMath substantially achieves new state-of-the-art on GSM8K (from 54% to 68.4% at the scale of 7B, and from 63.9% to 74.0% at the scale of 13B). A log-linear relationship is presented between MuggleMath's performance and the amount of augmented data. We also find that MuggleMath is weak in out-of-domain math reasoning generalization to MATH. This is attributed to the differences in query distribution between AugGSM8K and MATH which suggest that augmentation on a single benchmark could not help with overall math reasoning performance. Codes and AugGSM8K will be uploaded to https://github.com/OFA-Sys/gsm8k-ScRel.
    摘要 在数学逻辑中使用大型自然语言模型(LLMs), fine-tuning数据增强和多种逻辑路径的数据增强被证明是有效的,可以减小开源LLMs和高级专有LLMs之间的差距。在这篇论文中,我们进行了数学逻辑中数据增强的调查,旨在回答以下问题:(1)哪些数据增强策略更加有效;(2)数据增强量和模型性能之间存在哪种整数关系;和(3)数据增强是否能够适应尺度外的数学逻辑任务?为此,我们创建了一个新的数据集,AugGSM8K,通过复杂和多样化 queries from GSM8K 来生成多种 reasoning paths。我们使用这些数据集进行了一系列的 LLMS 的 fine-tuning,并取得了一系列的 MuggleMath 模型。MuggleMath 在 GSM8K 上实现了新的状态机器人,从 54% 提高到 68.4% (7B 缩放)和从 63.9% 提高到 74.0% (13B 缩放)。我们发现了数据增强和模型性能之间存在很好的对数关系。此外,我们发现 MuggleMath 在尺度外的数学逻辑任务上的总体性能较弱,这是因为 AugGSM8K 和 MATH 的查询分布之间存在差异。这意味着数据增强在单一的 benchmark 上不能够提高总体数学逻辑性能。代码和 AugGSM8K 将在 上上传。

Integrating Graphs with Large Language Models: Methods and Prospects

  • paper_url: http://arxiv.org/abs/2310.05499
  • repo_url: None
  • paper_authors: Shirui Pan, Yizhen Zheng, Yixin Liu
  • for: To survey the integration of large language models (LLMs) with graph-structured data and the benefits this brings to both sides.
  • methods: Organizes existing work into two categories: using LLMs for graph learning, where they augment existing graph algorithms or act as prediction models for graph tasks, and using graphs to advance LLMs, e.g., for reasoning or collaboration on complex tasks.
  • results: Shows that combining graph structures with LLMs can boost performance on various complicated tasks, and proposes open questions for future research in the field.
    Abstract Large language models (LLMs) such as GPT-4 have emerged as frontrunners, showcasing unparalleled prowess in diverse applications, including answering queries, code generation, and more. Parallelly, graph-structured data, an intrinsic data type, is pervasive in real-world scenarios. Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest. This paper bifurcates such integrations into two predominant categories. The first leverages LLMs for graph learning, where LLMs can not only augment existing graph algorithms but also stand as prediction models for various graph tasks. Conversely, the second category underscores the pivotal role of graphs in advancing LLMs. Mirroring human cognition, we solve complex tasks by adopting graphs in either reasoning or collaboration. Integrating with such structures can significantly boost the performance of LLMs in various complicated tasks. We also discuss and propose open questions for integrating LLMs with graph-structured data for the future direction of the field.
    摘要 大型语言模型(LLM)如GPT-4在多种应用中表现出不一样的优势,包括回答问题、代码生成等。同时,图струк成数据是实际世界中普遍存在的数据类型。把LMLM的能力与图结构数据结合,已成为研究者的焦点之一。这篇评论文将这些结合分为两大类。第一种将LMLM用于图学习,LMLM可以不仅增强现有的图算法,还可以作为各种图任务的预测模型。相反,第二种类型强调图结构数据在提高LMLM表现的重要性。人类的思维方式是透过图来解释和协作来解决复杂任务。与图结构数据结合可以将LMLM在多种复杂任务中表现出较好的成绩。我们还讨论了未来领域的开启问题,以推动LMLM与图结构数据的结合。

How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

  • paper_url: http://arxiv.org/abs/2310.05492
  • repo_url: None
  • paper_authors: Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou
  • for: To investigate how supervised fine-tuning (SFT) data composition across mathematical reasoning, code generation, and general human-aligning abilities affects the resulting abilities of LLMs.
  • methods: Studies the relationship between model abilities and data amount, composition ratio, model size, and SFT strategies, and proposes a Dual-stage Mixed Fine-tuning (DMT) strategy that learns specialized abilities first and then learns general abilities mixed with a small amount of specialized data.
  • results: Different abilities show different scaling patterns and larger models generally perform better with the same amount of data; math reasoning and code generation improve consistently with more data while general ability improves slowly after roughly a thousand samples; data composition helps at low data amounts but causes ability conflicts at high amounts, and DMT mitigates the catastrophic forgetting observed with sequential learning of multiple abilities.
    Abstract Large language models (LLMs) with enormous pre-training tokens and parameter amounts emerge abilities, including math reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). The open-source community has studied on ad-hoc SFT for each ability, while proprietary LLMs are versatile for all abilities. It is important to investigate how to unlock them with multiple abilities via SFT. In this study, we specifically focus on the data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. From a scaling perspective, we investigate the relationship between model abilities and various factors including data amounts, data composition ratio, model parameters, and SFT strategies. Our experiments reveal that different abilities exhibit different scaling patterns, and larger models generally show superior performance with the same amount of data. Mathematical reasoning and code generation improve as data amounts increase consistently, while the general ability is enhanced with about a thousand samples and improves slowly. We find data composition results in various abilities improvements with low data amounts, while conflicts of abilities with high data amounts. Our experiments further show that composition data amount impacts performance, while the influence of composition ratio is insignificant. Regarding the SFT strategies, we evaluate sequential learning multiple abilities are prone to catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy learns specialized abilities first and then learns general abilities with a small amount of specialized data to prevent forgetting, offering a promising solution to learn multiple abilities with different scaling patterns.
    摘要 大型语言模型(LLM)有巨大的预训语料和参数数量,并且拥有多种能力,包括数学推理、代码生成和指令跟进。这些能力可以通过监督精致训练(SFT)进一步增强。开源社区已经研究了随机SFT的每个能力,而商业LLM则具有多种能力。我们需要研究如何通过SFT解锁多种能力。在本研究中,我们专注于SFT中数学推理、代码生成和通用人类调整的数据结构之间的关系。从扩展角度来看,我们调查模型能力和不同因素(包括数据量、数据结构比例、模型参数和SFT策略)之间的关系。我们的实验显示不同的能力展现出不同的扩展模式,大型模型通常在同量数据下表现出色。数学推理和代码生成随着数据量增加逐渐提高,而通用能力则在约一千个数据 sample 后逐渐提高。我们发现数据结构可以在低数据量下提高不同的能力,但高数据量时会出现能力冲突。我们的实验还显示了数据结构填充量影响表现,但结构比例无法影响表现。关于SFT策略,我们评估了预先学习特定能力后,将特定数据进行混合精致训练(DMT),以避免忘记,提供了多能力学习的有前途的解决方案。

Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces

  • paper_url: http://arxiv.org/abs/2310.05481
  • repo_url: None
  • paper_authors: Usashi Chatterjee, Amit Gajbhiye, Steven Schockaert
  • for: To explore whether large language models (LLMs) can be used for learning conceptual spaces.
  • methods: Probes language models with tasks derived from human judgments of quality dimensions to assess how well they capture perceptually grounded conceptual representations.
  • results: LLMs can learn meaningful representations to some extent, but fine-tuned BERT-family models match or even outperform the largest GPT-3 model despite being 2 to 3 orders of magnitude smaller.
    Abstract The theory of Conceptual Spaces is an influential cognitive-linguistic framework for representing the meaning of concepts. Conceptual spaces are constructed from a set of quality dimensions, which essentially correspond to primitive perceptual features (e.g. hue or size). These quality dimensions are usually learned from human judgements, which means that applications of conceptual spaces tend to be limited to narrow domains (e.g. modelling colour or taste). Encouraged by recent findings about the ability of Large Language Models (LLMs) to learn perceptually grounded representations, we explore the potential of such models for learning conceptual spaces. Our experiments show that LLMs can indeed be used for learning meaningful representations to some extent. However, we also find that fine-tuned models of the BERT family are able to match or even outperform the largest GPT-3 model, despite being 2 to 3 orders of magnitude smaller.
    摘要 理论的概念空间模型是一种有影响力的认知语言框架,用于表示概念的含义。概念空间由一系列质量维度组成,这些质量维度通常来自人类判断,这意味着应用概念空间通常受限于特定领域(如色彩或味道模elling)。鼓动了最近发现大语言模型(LLMs)可以学习基于感知的表示,我们explore了这些模型是否可以学习有意义的概念空间。我们的实验表明LLMs可以学习有意义的表示,但我们还发现,经过精度调整的BERT家族模型可以与最大GPT-3模型匹配或者超越,即使其体积只是GPT-3模型的2-3个数量级。

Deep Optimal Timing Strategies for Time Series

  • paper_url: http://arxiv.org/abs/2310.05479
  • repo_url: https://github.com/chenpopper/optimal_timing_tsf
  • paper_authors: Chen Pan, Fan Zhou, Xuanwei Hu, Xinxin Zhu, Wenxin Ning, Zi Zhuang, Siqiao Xue, James Zhang, Yunhua Hu
  • for: To decide the best future execution time in business activities driven by evolving time series, thereby reducing operation costs.
  • methods: Combines a probabilistic time series forecasting task with an optimal timing decision task: future paths are generated by probabilistic forecasting algorithms without requiring a sophisticated mathematical dynamic model, and the decision task is formulated as an optimal stopping problem approximated with a recurrent neural network (RNN).
  • results: The mechanism provides a solution with both a solid theoretical foundation and real-world flexibility; implementation details are available at https://github.com/ChenPopper/optimal_timing_TSF.
    Abstract Deciding the best future execution time is a critical task in many business activities while evolving time series forecasting, and optimal timing strategy provides such a solution, which is driven by observed data. This solution has plenty of valuable applications to reduce the operation costs. In this paper, we propose a mechanism that combines a probabilistic time series forecasting task and an optimal timing decision task as a first systematic attempt to tackle these practical problems with both solid theoretical foundation and real-world flexibility. Specifically, it generates the future paths of the underlying time series via probabilistic forecasting algorithms, which does not need a sophisticated mathematical dynamic model relying on strong prior knowledge as most other common practices. In order to find the optimal execution time, we formulate the decision task as an optimal stopping problem, and employ a recurrent neural network structure (RNN) to approximate the optimal times. Github repository: \url{github.com/ChenPopper/optimal_timing_TSF}.
    摘要 在许多商业活动中,决定最佳的未来执行时间是一项关键任务;随着时间序列预测的发展,由观测数据驱动的最优择时策略为此提供了解决方案,并在降低运营成本方面有诸多有价值的应用。本文提出一种将概率时间序列预测任务与最优择时决策任务相结合的机制,是针对这类实际问题兼具坚实理论基础与现实灵活性的首次系统性尝试。具体而言,该方法通过概率预测算法生成时间序列的未来路径,无需依赖强先验知识的复杂数学动力学模型;为找到最优执行时间,我们将决策任务建模为最优停止问题,并采用循环神经网络(RNN)结构来逼近最优时间。代码仓库:\url{github.com/ChenPopper/optimal_timing_TSF}。
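
A stripped-down version of the pipeline, sample future paths from a probabilistic forecaster and then score candidate execution times, can be written as below. The Gaussian random-walk "forecaster" and the scoring rule are stand-ins for illustration; the paper approximates the stopping rule with an RNN rather than this brute-force scan.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_paths(last_value, horizon, n_paths, drift=-0.02, sigma=0.05):
    """Stand-in probabilistic forecaster: Gaussian random-walk sample paths."""
    steps = drift + sigma * rng.normal(size=(n_paths, horizon))
    return last_value + np.cumsum(steps, axis=1)

def best_execution_time(paths):
    """Pick the future step whose Monte Carlo expected value (e.g. price/cost) is lowest."""
    expected = paths.mean(axis=0)          # expectation estimate per future step
    t_star = int(np.argmin(expected))
    return t_star, expected[t_star]

paths = sample_paths(last_value=1.0, horizon=30, n_paths=2000)
t_star, value = best_execution_time(paths)
print(f"execute at step {t_star} with expected value {value:.3f}")
```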

Sentence-level Prompts Benefit Composed Image Retrieval

  • paper_url: http://arxiv.org/abs/2310.05473
  • repo_url: https://github.com/chunmeifeng/sprc
  • paper_authors: Yang Bai, Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng
  • for: To improve the accuracy of composed image retrieval (CIR).
  • methods: Uses pretrained V-L models (e.g., BLIP-2) to generate sentence-level prompts that are concatenated with the relative caption, trained with an image-text contrastive loss and a text prompt alignment loss.
  • results: Outperforms state-of-the-art methods on the Fashion-IQ and CIRR datasets.
    Abstract Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption. Most existing CIR models adopt the late-fusion strategy to combine visual and language features. Besides, several approaches have also been suggested to generate a pseudo-word token from the reference image, which is further integrated into the relative caption for CIR. However, these pseudo-word-based prompting methods have limitations when target image encompasses complex changes on reference image, e.g., object removal and attribute modification. In this work, we demonstrate that learning an appropriate sentence-level prompt for the relative caption (SPRC) is sufficient for achieving effective composed image retrieval. Instead of relying on pseudo-word-based prompts, we propose to leverage pretrained V-L models, e.g., BLIP-2, to generate sentence-level prompts. By concatenating the learned sentence-level prompt with the relative caption, one can readily use existing text-based image retrieval models to enhance CIR performance. Furthermore, we introduce both image-text contrastive loss and text prompt alignment loss to enforce the learning of suitable sentence-level prompts. Experiments show that our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets. The source code and pretrained model are publicly available at https://github.com/chunmeifeng/SPRC
    摘要 “组合图像检索(CIR)任务是根据查询包含参考图像和相关描述文本来检索特定图像。现有大多数CIR模型采用较晚的融合策略将视觉和语言特征结合。此外,一些方法还建议生成基于参考图像的 pseudo-word 令,并将其与相关描述文本结合使用。然而,这些 pseudo-word 基于的提示方法在参考图像具有复杂变化时存在限制,例如对象移除和特征修改。在这种情况下,我们表明了学习适当的句子级提示(SPRC)是可以实现有效的组合图像检索的。而不是依赖 pseudo-word 基于的提示,我们提议利用预训练的 V-L 模型,如 BLIP-2,生成句子级提示。将学习的句子级提示与相关描述文本 concatenate 后,可以直接使用现有的文本基于图像检索模型进行改进 CIR 性能。此外,我们引入了图像文本对比loss和文本提示对齐loss,以便学习适当的句子级提示。实验结果表明,我们提出的方法在 Fashion-IQ 和 CIRR 数据集上与状态对照方法相比表现优异。源代码和预训练模型可以在 GitHub 上下载。”
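
The retrieval side reduces to: generate a sentence-level prompt from the reference image, concatenate it with the relative caption, embed the result with a text encoder, and rank candidate images by cosine similarity. The sketch below wires that up with placeholder captioner/encoder callables; the real system uses BLIP-2 and learns the prompt with contrastive and alignment losses, so every name here is an assumption.

```python
import numpy as np

def retrieve(reference_image, relative_caption, gallery_feats, caption_model, text_encoder, top_k=5):
    """Rank gallery images for composed image retrieval.

    caption_model(image) -> str        : sentence-level prompt describing the reference image
    text_encoder(text)   -> np.ndarray : L2-normalized text embedding
    gallery_feats        : [N, D] L2-normalized image embeddings in the same space
    """
    prompt = caption_model(reference_image)                 # e.g. from a BLIP-2-style model
    query = text_encoder(prompt + ", " + relative_caption)  # fuse the prompt with the modification text
    scores = gallery_feats @ query                          # cosine similarity (unit vectors)
    return np.argsort(-scores)[:top_k]

# toy run with random stand-ins for the learned components
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
fake_captioner = lambda img: "a red dress with short sleeves"
fake_encoder = lambda text: (v := rng.normal(size=64)) / np.linalg.norm(v)
print(retrieve(None, "make it blue and longer", gallery, fake_captioner, fake_encoder))
```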

Generative Judge for Evaluating Alignment

  • paper_url: http://arxiv.org/abs/2310.05470
  • repo_url: https://github.com/gair-nlp/auto-j
  • paper_authors: Junlong Li, Shichao Sun, Weizhe Yuan, Run-Ze Fan, Hai Zhao, Pengfei Liu
  • for: To propose a generative judge (Auto-J) for evaluating how well LLM outputs align with human needs across diverse scenarios.
  • methods: Trains a 13B-parameter model on user queries and LLM-generated responses from massive real-world scenarios, supporting diverse evaluation protocols (e.g., pairwise response comparison and single-response evaluation) with well-structured natural language critiques.
  • results: On a new testbed covering 58 scenarios, Auto-J outperforms a series of strong open-source and closed-source competitors by a large margin.
    Abstract The rapid development of Large Language Models (LLMs) has substantially expanded the range of tasks they can address. In the field of Natural Language Processing (NLP), researchers have shifted their focus from conventional NLP tasks (e.g., sequence tagging and parsing) towards tasks that revolve around aligning with human needs (e.g., brainstorming and email writing). This shift in task distribution imposes new requirements on evaluating these aligned models regarding generality (i.e., assessing performance across diverse scenarios), flexibility (i.e., examining under different protocols), and interpretability (i.e., scrutinizing models with explanations). In this paper, we propose a generative judge with 13B parameters, Auto-J, designed to address these challenges. Our model is trained on user queries and LLM-generated responses under massive real-world scenarios and accommodates diverse evaluation protocols (e.g., pairwise response comparison and single-response evaluation) with well-structured natural language critiques. To demonstrate the efficacy of our approach, we construct a new testbed covering 58 different scenarios. Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models, by a large margin. We also provide detailed analysis and case studies to further reveal the potential of our method and make a variety of resources public at https://github.com/GAIR-NLP/auto-j.
    摘要 随着大型语言模型(LLM)的快速发展,它们可以 addresses 的任务范围得到了极大的扩展。在自然语言处理(NLP)领域,研究人员的焦点从传统的 NLP 任务(如序列标记和分析)转移到了与人类需求相关的任务(如审想和电子邮件写作)。这种任务分布的变化对评估这些对齐的模型进行评估有新的要求,包括总体性(即在多种场景中的表现评估)、灵活性(即在不同的协议下进行评估)以及可读性(即使用自然语言的解释来评估模型)。在这篇论文中,我们提出了一个名为 Auto-J 的生成式评价器,拥有 13B 参数。我们的模型在用户查询和 LLM 生成的回答下进行训练,并且可以处理多种评估协议(如对比回答和单独评估),并且具有良好的自然语言批评结构。为了证明我们的方法的效果,我们构建了一个包含 58 个不同场景的测试床。实验结果表明,Auto-J 在与多种强大竞争对手进行比较时,有大幅度的优势。我们还提供了详细的分析和案例研究,以及在 GitHub 上公开多种资源。

Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

  • paper_url: http://arxiv.org/abs/2310.05464
  • repo_url: None
  • paper_authors: Ricardo Knauer, Erik Rodner
  • for: To design interpretable machine learning models that reduce their inputs to the best subset for transparent predictions, especially in the clinical domain.
  • methods: Proposes a certifiably optimal, cost-sensitive feature selection procedure for logistic regression from a mixed-integer conic optimization perspective, and builds a synthetic dataset generator for clinical prognostic model research to systematically evaluate heuristic and optimal cardinality- and budget-constrained selection procedures.
  • results: The analysis reveals key limitations of the methods in the low-data regime and under label noise, and the paper gives practical recommendations for suitable methods and dataset designs while opening directions for meta-learning research.
    Abstract A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
    摘要 一大挑战在机器学习中是设计可解释的模型,以减少输入并提供透明的预测,尤其在医疗领域。在这项工作中,我们提议一种 certificately 优化的特征选择方法,通过杂谱矩阵优化的视角来考虑辅助成本。我们通过了评 literature 的广泛回顾,并且 méticulously 创建了临床预测模型的 sintética 数据生成器。这使得我们可以系统地评估不同的启发式和优化的卡达性和预算限制下的特征选择方法。分析表明了低数据情况下和标签噪声时方法的局限性。我们的论文不仅提供了实践建议,还开辟了meta-学习领域的未来研究之路。
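
One generic way to write a cost-sensitive best-subset problem is as a mixed-integer convex program: binary indicators gate each coefficient via big-M constraints, and the selected features' acquisition costs enter a budget constraint. The cvxpy sketch below follows that generic formulation under assumed costs and bounds (it is not the paper's exact conic model) and needs a mixed-integer-capable solver such as SCIP or MOSEK installed.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_w = np.array([2.0, -1.5] + [0.0] * (p - 2))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

costs = np.ones(p)
costs[:2] = 0.5                            # assumed per-feature acquisition costs
budget, big_m = 3.0, 10.0

w = cp.Variable(p)
z = cp.Variable(p, boolean=True)           # z_j = 1 iff feature j is selected
# negative log-likelihood of logistic regression: sum log(1+exp(Xw)) - y^T (Xw)
nll = cp.sum(cp.logistic(X @ w)) - cp.sum(cp.multiply(y, X @ w))
constraints = [cp.abs(w) <= big_m * z,     # w_j can be nonzero only if z_j = 1
               costs @ z <= budget]        # total acquisition cost stays within budget
prob = cp.Problem(cp.Minimize(nll), constraints)
prob.solve()                               # requires a mixed-integer convex solver
print(np.round(w.value, 2), z.value)
```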

Ensemble-based Hybrid Optimization of Bayesian Neural Networks and Traditional Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2310.05456
  • repo_url: None
  • paper_authors: Peiwen Tan
  • for: To optimize Bayesian neural networks (BNNs) by integrating them with traditional machine learning algorithms such as random forests (RF), gradient boosting (GB), and support vector machines (SVM).
  • methods: Uses feature integration to combine these methods, emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix.
  • results: The ensemble method stands out as a robust, algorithmically optimized approach, while hyperparameter tuning has only a subdued impact on Expected Improvement (EI).
    Abstract This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM). Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix. Conversely, hyperparameter tuning indicates a subdued impact in improving Expected Improvement (EI), represented by EI(x). Overall, the ensemble method stands out as a robust, algorithmically optimized approach.

Explaining the Complex Task Reasoning of Large Language Models with Template-Content Structure

  • paper_url: http://arxiv.org/abs/2310.05452
  • repo_url: None
  • paper_authors: Haotong Yang, Fanxu Meng, Zhouchen Lin, Muhan Zhang
  • for: This paper aims to provide an explanation for the exceptional generalization abilities of pre-trained large language models, and to offer a novel framework for understanding their ability to solve complex natural language tasks.
  • methods: The paper presents a hierarchical “template-content” structure for modeling answer generation in natural language tasks, and demonstrates that pre-trained models can automatically decompose tasks into constituent steps during autoregressive generation through language modeling on a sufficiently large corpus.
  • results: The paper shows that practical models exhibit different behaviors for “template” and “content” providing support for the proposed modeling, and offers an explanatory tool for the complex reasoning abilities of large language models from the perspective of modeling autoregressive generation tasks.
    Abstract The continuous evolution of pre-trained large language models with ever-growing parameters and corpus sizes has augmented their capacity to solve complex tasks. This ability, which obviates the necessity for task-specific training or fine-tuning, relies on providing the model with a language description or some task exemplars -- referred to the prompt -- that guide the desired autoregressive generation. Despite the remarkable success, the underlying mechanisms that facilitate such exceptional generalization abilities remain an open question. In this paper, we present a novel framework that formally conceptualizes answer generation for complex natural language tasks as a hierarchical ``template-content'' structure. According to our modeling, there exist pre-trained models that can automatically decompose tasks into constituent steps during autoregressive generation, through language modeling on a sufficiently large corpus, thereby solving them. Our framework offers an explanatory tool for the complex reasoning abilities of large language models from the perspective of modeling autoregressive generation tasks. Our experiments show that practical models exhibit different behaviors for ``template'' and ``content'' providing support for our modeling.
    摘要 大型自然语言模型的不断演化和参数的增加以及训练数据的增加,使得这些模型可以更好地解决复杂任务。这种能力,不需要任务特定的训练或微调,通过给模型提供语言描述或一些任务示例(即提示)来引导潜在的自然语言生成。虽然这些成果很出色,但是这些成果的基础机制仍然是一个开放的问题。在这篇论文中,我们提出了一种新的框架,它正式地概括了复杂自然语言任务的回答生成为一个层次结构。根据我们的模型,存在一些预训练模型可以在生成过程中自动将任务 decomposes into constituent steps,通过对 sufficiently large corpus进行语言模型化,以解决任务。我们的框架提供了对大语言模型的复杂逻辑能力的解释工具,从概念生成任务的角度出发。我们的实验表明,实际模型在“模板”和“内容”提供支持,这支持我们的模型。

Replication of Multi-agent Reinforcement Learning for the “Hide and Seek” Problem

  • paper_url: http://arxiv.org/abs/2310.05430
  • repo_url: None
  • paper_authors: Haider Kamal, Muaz A. Niazi, Hammad Afzal
  • for: To replicate and extend multi-agent reinforcement learning for the "hide and seek" problem, enhancing the agents' mobility and expanding their range of possible actions and strategies in a more complex environment.
  • methods: Uses reinforcement learning with reward functions and hyperparameters to generate policies; agents are simulated similarly to OpenAI's hider and seeker agents, with an added flying mechanism.
  • results: The added mobility allows hider agents to develop a chasing strategy in about 1.6 million steps instead of approximately 2 million.
    Abstract Reinforcement learning generates policies based on reward functions and hyperparameters. Slight changes in these can significantly affect results. The lack of documentation and reproducibility in Reinforcement learning research makes it difficult to replicate once-deduced strategies. While previous research has identified strategies using grounded maneuvers, there is limited work in more complex environments. The agents in this study are simulated similarly to Open Al's hider and seek agents, in addition to a flying mechanism, enhancing their mobility, and expanding their range of possible actions and strategies. This added functionality improves the Hider agents to develop a chasing strategy from approximately 2 million steps to 1.6 million steps and hiders
    摘要 强化学习基于奖励函数和超参数生成策略,二者的细微变化都可能显著影响结果。强化学习研究中文档与可复现性的不足,使得已推导出的策略难以复现。以往工作已在基于地面机动的策略上有所发现,但在更复杂环境中的研究仍然有限。本研究中的智能体与 OpenAI 的捉迷藏(hider/seeker)智能体类似,并额外加入飞行机制,从而增强其机动性并扩展其可能的动作与策略范围。这一新增功能使 Hider 智能体学会追逐策略所需的训练步数从约 200 万步降至 160 万步。

Divide and Ensemble: Progressively Learning for the Unknown

  • paper_url: http://arxiv.org/abs/2310.05425
  • repo_url: None
  • paper_authors: Hu Zhang, Xin Shen, Heming Du, Huiqiang Chen, Chen Liu, Hongwei Sheng, Qingzheng Xu, MD Wahiduzzaman Khan, Qingtao Yu, Tianqing Zhu, Scott Chapman, Zi Huang, Xin Yu
  • for: To address the wheat nutrient deficiencies classification challenge with a classification-based recognition approach.
  • methods: Partitions the dataset into groups by collection date, trains models per group, and uses pseudo-labeling with an ensemble of models of different architectures, iteratively adding high-confidence test predictions to the training set before unifying the per-group models.
  • results: Achieves an average Top-1 test accuracy of 93.6% (94.0% on WW2020 and 93.2% on WR2021) and wins 1st place in the Deep Nutrient Deficiency Challenge.
    Abstract In the wheat nutrient deficiencies classification challenge, we present the DividE and EnseMble (DEEM) method for progressive test data predictions. We find that (1) test images are provided in the challenge; (2) samples are equipped with their collection dates; (3) the samples of different dates show notable discrepancies. Based on the findings, we partition the dataset into discrete groups by the dates and train models on each divided group. We then adopt the pseudo-labeling approach to label the test data and incorporate those with high confidence into the training set. In pseudo-labeling, we leverage models ensemble with different architectures to enhance the reliability of predictions. The pseudo-labeling and ensembled model training are iteratively conducted until all test samples are labeled. Finally, the separated models for each group are unified to obtain the model for the whole dataset. Our method achieves an average of 93.6\% Top-1 test accuracy~(94.0\% on WW2020 and 93.2\% on WR2021) and wins the 1$st$ place in the Deep Nutrient Deficiency Challenge~\footnote{https://cvppa2023.github.io/challenges/}.
    摘要 在小麦营养不足分类挑战中,我们提出了分类测试数据进行进行分组的DEEM方法(DividE和Ensemble)。我们发现:1. 测试图像提供给挑战;2. 样本具有收集日期信息;3. 不同日期的样本存在明显的差异。根据这些发现,我们将数据集分成不同日期的分组,并在每个分组上训练模型。然后,我们采用 Pseudo-labeling 方法来标注测试数据,并将高信任性的预测结果包含到训练集中。在 Pseudo-labeling 中,我们利用不同架构的模型 ensemble 以提高预测的可靠性。这些pseudo-labeling和 ensemble 模型训练是相互进行的,直到所有测试样本都被标注为止。最后,我们将每个组的模型集成起来,以获得整个数据集的模型。我们的方法实现了 Top-1 测试准确率的平均值为 93.6%(94.0% 在 WW2020 和 93.2% 在 WR2021),并在 Deep Nutrient Deficiency Challenge 中获得了第一名。
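
The iterative pseudo-labeling loop described above can be sketched with scikit-learn estimators standing in for the deep models; the confidence threshold, probability-averaging ensemble, and stopping rule below are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X_train, y_train = make_classification(n_samples=300, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=200, n_features=20, random_state=1)

labeled_X, labeled_y = X_train.copy(), y_train.copy()
unlabeled = X_test.copy()
threshold = 0.9

while len(unlabeled) > 0:
    # ensemble of different architectures, predictions averaged
    models = [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)]
    probas = np.mean([m.fit(labeled_X, labeled_y).predict_proba(unlabeled) for m in models], axis=0)
    conf, pseudo = probas.max(axis=1), probas.argmax(axis=1)
    confident = conf >= threshold
    if not confident.any():
        threshold -= 0.05                 # relax until every test sample gets labeled
        continue
    # fold the high-confidence pseudo-labels back into the training set
    labeled_X = np.vstack([labeled_X, unlabeled[confident]])
    labeled_y = np.concatenate([labeled_y, pseudo[confident]])
    unlabeled = unlabeled[~confident]

print("all test samples pseudo-labeled; final training set size:", len(labeled_y))
```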

Humanoid Agents: Platform for Simulating Human-like Generative Agents

  • paper_url: http://arxiv.org/abs/2310.05418
  • repo_url: https://github.com/humanoidagents/humanoidagents
  • paper_authors: Zhilin Wang, Yu Ying Chiu, Yu Cheung Chiu
  • for: To propose a platform for simulating human-like generative agents whose behavior more closely resembles that of humans.
  • methods: Introduces three elements of System 1 processing, namely basic needs (e.g., hunger, health and energy), emotion, and closeness in relationships, which guide generative agents to adapt their daily activities and conversations with other agents.
  • results: Empirical experiments support the effectiveness of these dynamic elements; the system is extensible to various settings (three are demonstrated) and to other factors influencing human behavior such as empathy, moral values and cultural background.
    Abstract Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Emotion and Closeness in Relationships. Humanoid Agents are able to use these dynamic elements to adapt their daily activities and conversations with other agents, as supported with empirical experiments. Our system is designed to be extensible to various settings, three of which we demonstrate, as well as to other elements influencing human behavior (e.g. empathy, moral values and cultural background). Our platform also includes a Unity WebGL game interface for visualization and an interactive analytics dashboard to show agent statuses over time. Our platform is available on https://www.humanoidagents.com/ and code is on https://github.com/HumanoidAgents/HumanoidAgents
    摘要 正如对原子、分子和细胞的计算模拟塑造了我们研究科学的方式一样,逼真的人类化智能体模拟也可以成为研究人类行为的有价值工具。我们提出 Humanoid Agents 系统,通过引入系统1处理的三个要素,即基本需求(如饥饿、健康和能量)、情感以及关系亲密度,引导生成式智能体表现得更像人类。智能体能够利用这些动态要素来调整其日常活动以及与其他智能体的对话,实验结果支持这一设计。该系统可扩展到多种场景(本文演示了其中三种),也可扩展到影响人类行为的其他因素(如共情、道德价值观和文化背景)。我们的平台还包括用于可视化的 Unity WebGL 游戏界面,以及展示智能体状态随时间变化的交互式分析仪表盘。平台见 https://www.humanoidagents.com/ ,代码见 https://github.com/HumanoidAgents/HumanoidAgents 。
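
A minimal illustration of the System-1 state that drives the agents, basic needs that decay over time, an emotion, and relationship closeness, might look like the dataclass below; the attribute names, decay rates, and thresholds are invented for illustration and are not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class HumanoidState:
    # basic needs on a 0-10 scale (10 = fully satisfied)
    fullness: float = 10.0
    energy: float = 10.0
    health: float = 10.0
    emotion: str = "neutral"
    closeness: dict = field(default_factory=dict)   # other agent name -> 0-10 score

    def tick(self, hours: float = 1.0) -> None:
        """Needs decay as simulated time passes."""
        self.fullness = max(0.0, self.fullness - 0.5 * hours)
        self.energy = max(0.0, self.energy - 0.3 * hours)

    def urgent_need(self) -> Optional[str]:
        """The need an agent would plan its next activity around, if any."""
        if self.fullness < 3:
            return "eat"
        if self.energy < 3:
            return "rest"
        return None

    def converse(self, other: str, pleasant: bool) -> None:
        """Conversations nudge relationship closeness up or down."""
        delta = 1.0 if pleasant else -1.0
        self.closeness[other] = min(10.0, max(0.0, self.closeness.get(other, 5.0) + delta))

state = HumanoidState()
for _ in range(18):
    state.tick()
print(state.urgent_need(), round(state.fullness, 1))   # -> eat 1.0
state.converse("Alice", pleasant=True)
print(state.closeness)
```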

Ethics of Artificial Intelligence and Robotics in the Architecture, Engineering, and Construction Industry

  • paper_url: http://arxiv.org/abs/2310.05414
  • repo_url: None
  • paper_authors: Ci-Jyun Liang, Thai-Hoa Le, Youngjib Ham, Bharadwaj R. K. Mantha, Marvin H. Cheng, Jacob J. Lin
  • for: This research paper focuses on the ethical considerations of AI and robotics adoption in the architecture, engineering, and construction (AEC) industry.
  • methods: The paper systematically reviews existing literature on AI and robotics research in the AEC industry, identifying key ethical issues and research topics.
  • results: The paper identifies nine key ethical issues, including job loss, data privacy, and liability, and provides thirteen research topics for future study. It also highlights current challenges and knowledge gaps in the field, and provides recommendations for future research directions.
    Abstract Artificial intelligence (AI) and robotics research and implementation emerged in the architecture, engineering, and construction (AEC) industry to positively impact project efficiency and effectiveness concerns such as safety, productivity, and quality. This shift, however, warrants the need for ethical considerations of AI and robotics adoption due to its potential negative impacts on aspects such as job security, safety, and privacy. Nevertheless, this did not receive sufficient attention, particularly within the academic community. This research systematically reviews AI and robotics research through the lens of ethics in the AEC community for the past five years. It identifies nine key ethical issues namely job loss, data privacy, data security, data transparency, decision-making conflict, acceptance and trust, reliability and safety, fear of surveillance, and liability, by summarizing existing literature and filtering it further based on its AEC relevance. Furthermore, thirteen research topics along the process were identified based on existing AEC studies that had direct relevance to the theme of ethics in general and their parallels are further discussed. Finally, the current challenges and knowledge gaps are discussed and seven specific future research directions are recommended. This study not only signifies more stakeholder awareness of this important topic but also provides imminent steps towards safer and more efficient realization.
    摘要 人工智能(AI)和机器人技术在建筑、工程和建筑(AEC)行业的研究和应用已经出现,以提高项目效率和质量的问题。但是,这种转变也需要考虑AI和机器人的伦理问题,因为它们可能对工作安全、隐私和其他方面产生负面影响。然而,这一点在学术界并未得到充分关注,特别是在AEC领域。这项研究系统性地查看了AEC社区过去五年的AI和机器人研究,并Identified nine key ethical issues,namely job loss, data privacy, data security, data transparency, decision-making conflict, acceptance and trust, reliability and safety, fear of surveillance, and liability。此外,这些研究还标识出了13个相关的研究主题,包括数据隐私、数据安全、决策冲突、接受和信任、可靠性和安全、恐慌监测和责任。最后,这项研究讨论了当前的挑战和知识漏洞,并建议七个未来研究方向。这项研究不仅增加了参与者对这个重要话题的意识,而且还提供了更安全和效率的实现方法。

Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering

  • paper_url: http://arxiv.org/abs/2310.05410
  • repo_url: None
  • paper_authors: Trang Nguyen, Naoaki Okazaki
  • for: To improve the generalization of Visual Question Answering (VQA) models so they can answer questions about images in contexts beyond the training distribution.
  • methods: Proposes Cognitive pathways VQA (CopVQA), which emphasizes causal reasoning between interpreting and answering: a pool of pathways captures diverse causal reasoning flows, each stage is handled by distinct experts and a cognition-enabled component (CC), and answer predictions governed by pathways involving both CCs are prioritized.
  • results: Experiments on real-life and medical data show that CopVQA improves VQA performance and generalization across baselines and domains, achieving a new state of the art on PathVQA and accuracy comparable to the current state of the art on VQA-CPv2, VQAv2, and VQA-RAD with one-fourth of the model size.
    Abstract Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution. Existing attempts primarily refine unimodal aspects, overlooking enhancements in multimodal aspects. Besides, diverse interpretations of the input lead to various modes of answer generation, highlighting the role of causal reasoning between interpreting and answering steps in VQA. Through this lens, we propose Cognitive pathways VQA (CopVQA) improving the multimodal predictions by emphasizing causal reasoning factors. CopVQA first operates a pool of pathways that capture diverse causal reasoning flows through interpreting and answering stages. Mirroring human cognition, we decompose the responsibility of each stage into distinct experts and a cognition-enabled component (CC). The two CCs strategically execute one expert for each stage at a time. Finally, we prioritize answer predictions governed by pathways involving both CCs while disregarding answers produced by either CC, thereby emphasizing causal reasoning and supporting generalization. Our experiments on real-life and medical data consistently verify that CopVQA improves VQA performance and generalization across baselines and domains. Notably, CopVQA achieves a new state-of-the-art (SOTA) on PathVQA dataset and comparable accuracy to the current SOTA on VQA-CPv2, VQAv2, and VQA RAD, with one-fourth of the model size.
    摘要 通用化在视觉问答(VQA)中需要模型能够回答图像上的问题,并且考虑到训练分布之外的上下文。现有的尝试主要是对单模型方面进行精细调整,忽略了多模型方面的改进。此外,图像的多种解释会导致多种答案生成,这 highlights 了在解释和回答步骤之间的 causal reasoning 的角色。基于这个视角,我们提出了认知路径 VQA(CopVQA),它可以提高多模型预测的准确率。CopVQA 的实现方式是首先建立一个路径 pool,用于捕捉不同的 causal reasoning 流程。这与人类认知的层次结构相似,我们将解释和回答的责任分别划分为多个专家和一个认知能力Component(CC)。两个 CC 采用不同的策略来逐一执行每个专家,以便更好地捕捉 causal reasoning 的关系。最后,我们优先支持由多个 CC 共同执行的答案预测,而不是由单个 CC 生成的答案,以强调 causal reasoning 的重要性并且提高泛化能力。我们在真实生活和医疗数据上进行了实验,结果表明 CopVQA 可以提高 VQA 性能和泛化能力,并且在不同的基eline和领域上具有一致的表现。特别是,CopVQA 在 PathVQA 数据集上达到了新的状态态(SOTA),与当前 SOTA 在 VQA-CPv2、VQAv2 和 VQA RAD 数据集上的精度相似,仅使用一半的模型大小。

CAMEL2: Enhancing weakly supervised learning for histopathology images by incorporating the significance ratio

  • paper_url: http://arxiv.org/abs/2310.05394
  • repo_url: https://github.com/ThoroughFuture/CAMEL2
  • paper_authors: Gang Xu, Shuhao Wang, Lingyu Zhao, Xiao Chen, Tongwei Wang, Lang Wang, Zhenwei Luo, Dahan Wang, Zewen Zhang, Aijun Liu, Wei Ba, Zhigang Song, Huaiyin Shi, Dingrong Zhong, Jianpeng Ma
  • for: Histopathology image analysis for cancer diagnosis
  • methods: Weakly supervised learning methods with coarse-grained labels at the image level
  • results: Comparable performance to fully supervised baselines in both instance- and slide-level classifications, with the help of 5,120x5,120 image-level binary annotations that are easy to annotate.
    Abstract Histopathology image analysis plays a crucial role in cancer diagnosis. However, training a clinically applicable segmentation algorithm requires pathologists to engage in labour-intensive labelling. In contrast, weakly supervised learning methods, which only require coarse-grained labels at the image level, can significantly reduce the labeling efforts. Unfortunately, while these methods perform reasonably well in slide-level prediction, their ability to locate cancerous regions, which is essential for many clinical applications, remains unsatisfactory. Previously, we proposed CAMEL, which achieves comparable results to those of fully supervised baselines in pixel-level segmentation. However, CAMEL requires 1,280x1,280 image-level binary annotations for positive WSIs. Here, we present CAMEL2, by introducing a threshold of the cancerous ratio for positive bags, it allows us to better utilize the information, consequently enabling us to scale up the image-level setting from 1,280x1,280 to 5,120x5,120 while maintaining the accuracy. Our results with various datasets, demonstrate that CAMEL2, with the help of 5,120x5,120 image-level binary annotations, which are easy to annotate, achieves comparable performance to that of a fully supervised baseline in both instance- and slide-level classifications.
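
The "significance ratio" idea, a bag (large image tile) counts as positive only if the fraction of cancerous area exceeds a threshold, can be shown in a small sketch. The tile size default, the threshold value, and the helper names below are assumptions for illustration rather than the released code.

```python
import numpy as np

def bag_labels_from_mask(mask, bag_size=5120, ratio_threshold=0.2):
    """Derive image-level (bag) binary labels from a pixel mask via a cancerous-ratio threshold.

    mask: [H, W] binary array, 1 = cancerous pixel (used here only to simulate annotation;
    in practice annotators provide the image-level labels directly).
    """
    h, w = mask.shape
    labels = {}
    for top in range(0, h, bag_size):
        for left in range(0, w, bag_size):
            tile = mask[top:top + bag_size, left:left + bag_size]
            ratio = tile.mean()                         # fraction of cancerous pixels in the bag
            labels[(top, left)] = int(ratio >= ratio_threshold)
    return labels

# toy demo at a reduced scale (the paper uses 5,120x5,120 bags)
mask = np.zeros((2048, 2048), dtype=np.uint8)
mask[:600, :600] = 1                                    # roughly 34% of the top-left 1024x1024 bag
print(bag_labels_from_mask(mask, bag_size=1024))        # top-left bag positive, the rest negative
```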

CCAE: A Corpus of Chinese-based Asian Englishes

  • paper_url: http://arxiv.org/abs/2310.05381
  • repo_url: https://github.com/jacklanda/CCAE
  • paper_authors: Yang Liu, Melissa Xiaohui Qin, Long Wang, Chao Huang
  • for: To create a multi-variety corpus for studying Asian Englishes.
  • methods: Uses NLP technology to build CCAE, a suite of corpora covering six Chinese-based Asian English varieties.
  • results: The corpus comprises 340 million tokens in 448 thousand web documents from six regions; it can be used for variety-specific language modeling and downstream tasks, offering substantial research potential for Asian Englishes.
    Abstract Language models have been foundations in various scenarios of NLP applications, but it has not been well applied in language variety studies, even for the most popular language like English. This paper represents one of the few initial efforts to utilize the NLP technology in the paradigm of World Englishes, specifically in creating a multi-variety corpus for studying Asian Englishes. We present an overview of the CCAE -- Corpus of Chinese-based Asian English, a suite of corpora comprising six Chinese-based Asian English varieties. It is based on 340 million tokens in 448 thousand web documents from six regions. The ontology of data would make the corpus a helpful resource with enormous research potential for Asian Englishes (especially for Chinese Englishes for which there has not been a publicly accessible corpus yet so far) and an ideal source for variety-specific language modeling and downstream tasks, thus setting the stage for NLP-based World Englishes studies. And preliminary experiments on this corpus reveal the practical value of CCAE. Finally, we make CCAE available at \href{https://huggingface.co/datasets/CCAE/CCAE-Corpus}{this https URL}.
    摘要 受欢迎的语言模型在各种自然语言处理(NLP)应用场景中发挥了重要作用,但它们在语言多样性研究中尚未得到广泛应用,即使是最受欢迎的语言之一的英语。这篇论文是一个初始尝试,利用NLP技术来探索世界英语的多样性,具体来说是创建一个多种英语语料库,用于研究亚洲英语。我们介绍了CCAE——中基于英语的亚洲英语词库,这是一个包含6种中基于英语的亚洲英语变体的suite of corpora,基于3.4亿个字的448万个网页文档。这些数据的 ontology 使得这个词库成为了研究亚洲英语(特别是中英语)的有用资源,以及下游任务的理想来源,因此设置了NPLT-based World Englishes studies的场景。而我们的初步实验表明,CCAE 具有实际的价值。最后,我们将CCAE 公布在 这个https URL 上。
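
If the corpus is hosted as a standard datasets repository at the URL given above, loading it should follow the usual Hugging Face pattern; the absence of a configuration name and the split handling below are guesses that may need adjusting to the repository's actual layout.

```python
from datasets import load_dataset

# load the corpus from the Hugging Face Hub (dataset id taken from the paper's link)
ccae = load_dataset("CCAE/CCAE-Corpus")   # add a config name if the repo defines per-variety subsets
print(ccae)

# count documents per split as a quick sanity check
for split_name, split in ccae.items():
    print(split_name, len(split))
```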

Quantum Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2310.05373
  • repo_url: https://github.com/daizhongxiang/Quantum_Bayesian_Optimization
  • paper_authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet
  • for: Optimizing complicated black-box reward functions.
  • methods: Introduces the quantum Gaussian process upper confidence bound (Q-GP-UCB) algorithm, which combines Bayesian optimization with a Gaussian process surrogate and quantum computing, together with a novel analysis of the confidence ellipsoid.
  • results: Achieves a regret upper bound of O(polylog T), significantly below the classical lower bound of Omega(sqrt(T)); with the linear kernel it attains smaller regret than the quantum linear UCB algorithm from prior work, and simulations plus an experiment on a real quantum computer suggest the speedup is also relevant in practice.
    Abstract Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.
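For readers unfamiliar with the classical baseline the regret bounds refer to, below is a minimal NumPy sketch of standard GP-UCB on a 1-D toy function. It only illustrates the posterior-plus-confidence acquisition; it does not reproduce the paper's quantum Q-GP-UCB, and the kernel, noise level, and beta schedule are illustrative choices.

```python
import numpy as np

# Classical GP-UCB on a 1-D black-box function: the non-quantum baseline whose
# Omega(sqrt(T)) regret the paper improves on. Kernel, noise level, and the beta
# schedule below are illustrative choices; no quantum component is reproduced.
rng = np.random.default_rng(0)
noise = 0.1
grid = np.linspace(0.0, 3.0, 200)                  # candidate points


def reward(x):                                     # unknown black-box reward (illustrative)
    return np.sin(3.0 * x) + 0.5 * x


def rbf(a, b, length_scale=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


X, y = [], []
for t in range(1, 21):
    if X:
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + noise ** 2 * np.eye(len(Xa))
        Ks = rbf(grid, Xa)
        mu = Ks @ np.linalg.solve(K, ya)
        var = np.clip(1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T)), 1e-12, None)
    else:                                          # no data yet: prior mean 0, variance 1
        mu, var = np.zeros_like(grid), np.ones_like(grid)
    beta = 2.0 * np.log(len(grid) * t ** 2)        # confidence width grows slowly with t
    x_next = grid[np.argmax(mu + np.sqrt(beta * var))]   # UCB acquisition
    X.append(x_next)
    y.append(reward(x_next) + noise * rng.standard_normal())

print("best observed point:", X[int(np.argmax(y))], "with value", max(y))
```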

Measuring Acoustics with Collaborative Multiple Agents

  • paper_url: http://arxiv.org/abs/2310.05368
  • repo_url: https://github.com/yyf17/MACMA
  • paper_authors: Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun
  • for: This paper aims to improve the efficiency and accuracy of measuring environment acoustics using multiple robots.
  • methods: The paper proposes using two robots to actively move and emit/receive sweep signals to measure the environment’s acoustics, and trains them using a collaborative multi-agent policy to explore the environment while minimizing prediction error.
  • results: The robots learn to collaborate and move to explore the environment acoustics while minimizing the prediction error, demonstrating the effectiveness of the proposed method.
    Abstract As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by setting up a loudspeaker and microphone in the environment for all source/receiver locations, which is time-consuming and inefficient. We propose to let two robots measure the environment's acoustics by actively moving and emitting/receiving sweep signals. We also devise a collaborative multi-agent policy where these two robots are trained to explore the environment's acoustics while being rewarded for wide exploration and accurate prediction. We show that the robots learn to collaborate and move to explore environment acoustics while minimizing the prediction error. To the best of our knowledge, we present the very first problem formulation and solution to the task of collaborative environment acoustics measurements with multiple agents.
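The measurement primitive behind the agents is the classical swept-sine technique: the emitted sweep and the recording are deconvolved to recover the impulse response. The toy sketch below uses a synthetic room with a direct path and two echoes and plain frequency-domain deconvolution; it illustrates that primitive only, not the paper's multi-agent policy.

```python
import numpy as np

# Toy illustration: emit a sine sweep, record it after the room filters it, and recover
# the room impulse response (RIR) by frequency-domain deconvolution. The "room" is a
# synthetic RIR with a direct path and two echoes; real measurements (and the paper's
# collaborative agents) are far more involved.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
f0, f1 = 100.0, 7000.0
sweep = np.sin(2 * np.pi * (f0 + (f1 - f0) * t / 2.0) * t)   # linear chirp over 1 second

rir_true = np.zeros(2000)
rir_true[0], rir_true[700], rir_true[1500] = 1.0, 0.5, 0.25  # direct path + two echoes

recording = np.convolve(sweep, rir_true)                      # what the receiver hears

n = len(recording)
spectrum_ratio = np.fft.rfft(recording, n) / (np.fft.rfft(sweep, n) + 1e-8)
rir_est = np.fft.irfft(spectrum_ratio, n)[: len(rir_true)]

top_taps = np.argsort(np.abs(rir_est))[-3:]
print("true echo sample indices:     ", [0, 700, 1500])
print("strongest estimated taps near:", sorted(top_taps.tolist()))
```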

Molecular De Novo Design through Transformer-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05365
  • repo_url: None
  • paper_authors: Tao Feng, Pengcheng Xu, Tianfan Fu, Siddhartha Laghuvarapu, Jimeng Sun
  • for: This work applies a Transformer-based generative model to molecular de novo design.
  • methods: The model exploits the Transformer's strength in sequence learning to generate molecular structures with desired properties; compared with traditional RNN-based models, it better captures long-range dependencies in the molecular structure sequence.
  • results: The model performs well across multiple tasks, including generating analogues of a query structure and producing molecules with specified attributes, significantly outperforming RNN-based baselines; it can be used for scaffold hopping, library expansion, and producing compounds with high predicted activity.
    Abstract In this work, we introduce a method to fine-tune a Transformer-based generative model for molecular de novo design. Leveraging the superior sequence learning capacity of Transformers over Recurrent Neural Networks (RNNs), our model can generate molecular structures with desired properties effectively. In contrast to the traditional RNN-based models, our proposed method exhibits superior performance in generating compounds predicted to be active against various biological targets, capturing long-term dependencies in the molecular structure sequence. The model's efficacy is demonstrated across numerous tasks, including generating analogues to a query structure and producing compounds with particular attributes, outperforming the baseline RNN-based methods. Our approach can be used for scaffold hopping, library expansion starting from a single molecule, and generating compounds with high predicted activity against biological targets.
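As a rough illustration of reinforcement-learning fine-tuning of a sequence generator, the sketch below runs a plain REINFORCE loop over a toy token vocabulary with a tiny GRU policy and a dummy reward; the paper's actual Transformer generator, SMILES handling, and chemistry-based scoring are not reproduced here.

```python
import torch
import torch.nn as nn

# Generic REINFORCE loop over token sequences, as a stand-in for RL fine-tuning of a
# molecular generator. The tiny GRU policy, toy vocabulary, and dummy reward are
# illustrative only; the paper uses a Transformer generator and property-based rewards.
vocab = ["<bos>", "<eos>", "C", "N", "O", "(", ")", "="]   # toy SMILES-like tokens
V, BOS, EOS = len(vocab), 0, 1


class Policy(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(V, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, V)

    def forward(self, tok, h=None):
        o, h = self.rnn(self.emb(tok), h)
        return self.out(o[:, -1]), h        # logits over the next token


def dummy_reward(tokens):
    # placeholder reward: favor sequences containing many carbons ("C")
    return float(sum(1 for t in tokens if vocab[t] == "C"))


policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    tok = torch.tensor([[BOS]])
    h, log_probs, sampled = None, [], []
    for _ in range(20):                                  # sample one sequence
        logits, h = policy(tok, h)
        dist = torch.distributions.Categorical(logits=logits)
        nxt = dist.sample()
        log_probs.append(dist.log_prob(nxt))
        if nxt.item() == EOS:
            break
        sampled.append(nxt.item())
        tok = nxt.unsqueeze(0)
    reward = dummy_reward(sampled)
    loss = -reward * torch.stack(log_probs).sum()        # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print("sample after training:", [vocab[t] for t in sampled], "reward:", reward)
```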

Universal Multi-modal Entity Alignment via Iteratively Fusing Modality Similarity Paths

  • paper_url: http://arxiv.org/abs/2310.05364
  • repo_url: https://github.com/blzhu0823/pathfusion
  • paper_authors: Bolin Zhu, Xiaoze Liu, Xin Mao, Zhuo Chen, Lingbing Guo, Tao Gui, Qi Zhang
  • for: This work aims to build a more comprehensive, unified knowledge graph (KG) by identifying equivalent entity pairs across multiple KGs.
  • methods: The proposed PathFusion has two main components: MSP, a unified modeling approach that represents multiple modalities by constructing paths connecting entities and modality nodes; and IRF, an iterative fusion method that uses these paths as information carriers to effectively fuse information from different modalities.
  • results: Experiments on real-world datasets show that PathFusion outperforms state-of-the-art methods by 22.4%-28.9% absolute on Hits@1 and by 0.194-0.245 absolute on MRR.
    Abstract The objective of Entity Alignment (EA) is to identify equivalent entity pairs from multiple Knowledge Graphs (KGs) and create a more comprehensive and unified KG. The majority of EA methods have primarily focused on the structural modality of KGs, lacking exploration of multi-modal information. A few multi-modal EA methods have made good attempts in this field. Still, they have two shortcomings: (1) inconsistent and inefficient modality modeling that designs complex and distinct models for each modality; (2) ineffective modality fusion due to the heterogeneous nature of modalities in EA. To tackle these challenges, we propose PathFusion, consisting of two main components: (1) MSP, a unified modeling approach that simplifies the alignment process by constructing paths connecting entities and modality nodes to represent multiple modalities; (2) IRF, an iterative fusion method that effectively combines information from different modalities using the path as an information carrier. Experimental results on real-world datasets demonstrate the superiority of PathFusion over state-of-the-art methods, with 22.4%-28.9% absolute improvement on Hits@1, and 0.194-0.245 absolute improvement on MRR.
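A conceptual sketch of multi-modal fusion for entity alignment follows: per-modality cosine-similarity matrices between entities of two KGs are averaged into one alignment score and evaluated with Hits@1. The uniform-weight fusion and synthetic embeddings are illustrative assumptions; PathFusion's MSP path construction and IRF iteration are more involved.

```python
import numpy as np

# Conceptual sketch: combine per-modality similarity matrices between the entities of two
# KGs into one alignment score and evaluate Hits@1. PathFusion's actual MSP/IRF components
# are more involved; uniform-weight fusion over synthetic embeddings is purely illustrative.
rng = np.random.default_rng(0)
n, d = 50, 16                                       # 50 aligned entity pairs, 16-dim features


def sim_matrix(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T                                   # cosine similarity, shape (n, n)


# Synthetic embeddings per modality (e.g. structure, name, image); entity i in KG1 matches i in KG2.
modalities = []
for _ in range(3):
    base = rng.normal(size=(n, d))
    kg1 = base + 0.3 * rng.normal(size=(n, d))
    kg2 = base + 0.3 * rng.normal(size=(n, d))
    modalities.append(sim_matrix(kg1, kg2))

fused = np.mean(modalities, axis=0)                  # uniform-weight fusion of modalities

single_hits = float(np.mean(np.argmax(modalities[0], axis=1) == np.arange(n)))
fused_hits = float(np.mean(np.argmax(fused, axis=1) == np.arange(n)))
print("Hits@1 (single modality):", single_hits)
print("Hits@1 (fused):          ", fused_hits)
```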

Generalized Neural Collapse for a Large Number of Classes

  • paper_url: http://arxiv.org/abs/2310.05351
  • repo_url: https://github.com/kongwanbianjinyu/Generalized-Neural-Collapse-for-a-Large-Number-of-Classes
  • paper_authors: Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu
  • for: This paper studies the mathematical characterization of learned last-layer representations and classifier weights in deep classification models, and how such characterizations can motivate techniques for improving practical deep models.
  • methods: It introduces "generalized neural collapse," which describes last-layer features and classifier weights when the number of classes greatly exceeds the feature dimension.
  • results: The paper shows that generalized neural collapse, in which the minimum one-vs-rest margin is maximized, occurs in practical deep neural networks with large class sets, and provides empirical and theoretical studies establishing when the phenomenon provably holds.
    Abstract Neural collapse provides an elegant mathematical characterization of learned last layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insights but also motivate new techniques for improving practical deep models. However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to cases where the number of classes are much larger than the dimension of feature space, which broadly occur for language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon, where the minimum one-vs-rest margin is maximized. We provide empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks. Moreover, we provide theoretical study to show that the generalized neural collapse provably occurs under unconstrained feature model with spherical constraint, under certain technical conditions on feature dimension and number of classes.
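To make the "minimum one-vs-rest margin" concrete, the sketch below computes it from last-layer features and classifier weights on random data; it only illustrates the definition, not the collapse phenomenon itself.

```python
import numpy as np

# Making the "minimum one-vs-rest margin" concrete: for features h_i with label y_i and
# classifier weights W, the margin of sample i is  <w_{y_i}, h_i> - max_{c != y_i} <w_c, h_i>.
# Random data is used here only to illustrate the computation, not the collapse phenomenon.
rng = np.random.default_rng(0)
num_samples, num_classes, dim = 1000, 200, 64        # many more classes than feature dimensions

H = rng.normal(size=(num_samples, dim))              # last-layer features
W = rng.normal(size=(num_classes, dim))              # classifier weights
y = rng.integers(0, num_classes, size=num_samples)   # labels

logits = H @ W.T                                     # (num_samples, num_classes)
correct = logits[np.arange(num_samples), y]
logits_masked = logits.copy()
logits_masked[np.arange(num_samples), y] = -np.inf   # exclude the true class
best_other = logits_masked.max(axis=1)

margins = correct - best_other                       # one-vs-rest margin per sample
print("minimum one-vs-rest margin:", margins.min())
```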

Continuous Invariance Learning

  • paper_url: http://arxiv.org/abs/2310.05348
  • repo_url: None
  • paper_authors: Yong Lin, Fan Zhou, Lu Tan, Lintao Ma, Jiameng Liu, Yansu He, Yuan Yuan, Yu Liu, James Zhang, Yujiu Yang, Hao Wang
  • for: This paper targets invariance learning for continuous-domain problems, i.e., how to learn invariant features that generalize under distribution shift across continuously indexed domains.
  • methods: It proposes Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains by measuring and controlling conditional independence.
  • results: Experiments on synthetic and real-world datasets (including data from production systems) show that CIL consistently outperforms strong baselines.
    Abstract Invariance learning methods aim to learn invariant features in the hope that they generalize under distributional shifts. Although many tasks are naturally characterized by continuous domains, current invariance learning techniques generally assume categorically indexed domains. For example, auto-scaling in cloud computing often needs a CPU utilization prediction model that generalizes across different times (e.g., time of a day and date of a year), where `time' is a continuous domain index. In this paper, we start by theoretically showing that existing invariance learning methods can fail for continuous domain problems. Specifically, the naive solution of splitting continuous domains into discrete ones ignores the underlying relationship among domains, and therefore potentially leads to suboptimal performance. To address this challenge, we then propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains. CIL is a novel adversarial procedure that measures and controls the conditional independence between the labels and continuous domain indices given the extracted features. Our theoretical analysis demonstrates the superiority of CIL over existing invariance learning methods. Empirical results on both synthetic and real-world datasets (including data collected from production systems) show that CIL consistently outperforms strong baselines among all the tasks.
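The "naive solution" the abstract criticizes, i.e., splitting a continuous domain index such as time of day into a handful of discrete environments, looks roughly like the sketch below; the bin edges are arbitrary, and CIL's conditional-independence criterion is not reproduced here.

```python
import numpy as np

# The naive baseline discussed in the abstract: bin a continuous domain index (e.g. hour
# of day) into discrete "environments" so that an off-the-shelf invariance learner can be
# applied. CIL avoids this binning; only the baseline preprocessing is sketched here.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 24, size=10_000)              # continuous domain index per sample

bin_edges = np.array([0, 6, 12, 18, 24])             # arbitrary 4-way split of the day
env_id = np.digitize(hours, bin_edges[1:-1])         # environment label in {0, 1, 2, 3}

for e in range(4):
    mask = env_id == e
    print(f"environment {e}: {mask.sum()} samples, "
          f"hour range [{hours[mask].min():.1f}, {hours[mask].max():.1f}]")
# Samples just inside neighbouring bins (e.g. hours 11.9 vs 12.1) end up in different
# environments despite being nearly identical -- the relationship the binning ignores.
```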

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

  • paper_url: http://arxiv.org/abs/2310.05344
  • repo_url: None
  • paper_authors: Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev
  • for: The goal is to better align large language models (LLMs) with human values.
  • methods: The study works within the usual two-stage pipeline of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), proposing attribute-conditioned SFT (SteerLM) as a user-steerable alternative to RLHF.
  • results: Experiments show that SteerLM generates more helpful, higher-quality responses than several baselines while being much easier to train.
    Abstract Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) stages. However, RLHF faces inherent limitations stemming from a complex training setup and its tendency to align the model with implicit values that end users cannot control at run-time. Moreover, reward models in RLHF stage commonly rely on single-dimensional feedback as opposed to explicit, multifaceted signals that indicate attributes such as helpfulness, humor, and toxicity. To address these limitations, we propose SteerLM, a supervised fine-tuning method that empowers end-users to control responses during inference. SteerLM conditions responses to conform to an explicitly defined multi-dimensional set of attributes, thereby empowering a steerable AI capable of generating helpful and high-quality responses while maintaining customizability. Experiments show that SteerLM trained on open source datasets generates responses that are preferred by human and automatic evaluators to many state-of-the-art baselines trained with RLHF while being much easier to train. Try SteerLM at https://huggingface.co/nvidia/SteerLM-llama2-13B
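One simple way to realize attribute-conditioned SFT is to serialize the desired attribute values into each training prompt, as sketched below; the attribute names and template are assumptions for illustration, not SteerLM's exact data format.

```python
# Illustrative formatting of attribute-conditioned SFT examples: desired attribute values
# are serialized into the prompt so the model learns to condition on them. The attribute
# names and template below are assumptions, not SteerLM's exact data format.
def format_example(prompt, response, attributes):
    attr_str = ",".join(f"{k}:{v}" for k, v in sorted(attributes.items()))
    return f"<attributes>{attr_str}</attributes>\nUser: {prompt}\nAssistant: {response}"


example = format_example(
    prompt="Explain what Bayesian optimization is.",
    response="Bayesian optimization is a sample-efficient strategy for optimizing expensive black-box functions...",
    attributes={"helpfulness": 4, "humor": 0, "toxicity": 0},   # multi-dimensional attribute signal
)
print(example)

# At inference time the same template is filled with user-chosen attribute values,
# which is what makes the fine-tuned model steerable.
```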

Investigating Continuous Learning in Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05343
  • repo_url: None
  • paper_authors: C. Tanner Fredieu
  • for: This paper investigates continuous learning with third-generation machine learning, i.e., spiking neural network architectures, and compares them with conventional models.
  • methods: The experiments run in three phases: the first trains conventional models via transfer learning; the second trains a model from the Nengo library; finally, each conventional model is converted into a spiking neural network and trained.
  • results: Preliminary results suggest that SNN models have the potential to overcome catastrophic forgetting, though much work remains: all models correctly identify the current classes, but the SNNs assign higher-than-normal output probabilities to the true previous classes, indicating some retained knowledge.
    Abstract In this paper, the use of third-generation machine learning, also known as spiking neural network architecture, for continuous learning was investigated and compared to conventional models. The experimentation was divided into three separate phases. The first phase focused on training the conventional models via transfer learning. The second phase trains a Nengo model from their library. Lastly, each conventional model is converted into a spiking neural network and trained. Initial results from phase 1 are in line with known knowledge about continuous learning within current machine learning literature. All models were able to correctly identify the current classes, but they would immediately see a sharp performance drop in previous classes due to catastrophic forgetting. However, the SNN models were able to retain some information about previous classes. Although many of the previous classes were still identified as the current trained classes, the output probabilities showed a higher-than-normal value for the actual class. This indicates that the SNN models do have potential to overcome catastrophic forgetting but much work is still needed.
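For readers unfamiliar with third-generation models, the basic spiking unit is the leaky integrate-and-fire neuron sketched below; the time constants and threshold are arbitrary illustration values and are unrelated to the Nengo models used in the paper.

```python
import numpy as np

# A single leaky integrate-and-fire (LIF) neuron, the basic spiking unit that SNN
# conversion tools map conventional activations onto. Time constant and threshold
# below are arbitrary illustration values, not those used in the paper.
dt, tau, v_thresh, v_reset = 1e-3, 20e-3, 1.0, 0.0
steps = 200
current = np.concatenate([np.zeros(50), 1.5 * np.ones(150)])   # step input current

v, spikes = 0.0, []
for t in range(steps):
    v += dt / tau * (-v + current[t])      # leaky integration of the input
    if v >= v_thresh:                      # threshold crossing emits a spike
        spikes.append(t)
        v = v_reset                        # membrane potential resets after spiking

print(f"{len(spikes)} spikes; first spike at step {spikes[0] if spikes else None}")
```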

A Critical Look at Classic Test-Time Adaptation Methods in Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.05341
  • repo_url: None
  • paper_authors: Chang’an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Lizhen Cui
  • for: This study examines the application of test-time adaptation (TTA) to semantic segmentation.
  • methods: It evaluates classic TTA methods, including batch-normalization updating strategies and teacher-student schemes, to test whether they can effectively handle distribution shifts in semantic segmentation.
  • results: These classic TTA methods perform worse than expected on segmentation, particularly the batch-normalization updating strategy and the teacher-student scheme; in addition, segmentation TTA faces a severe long-tailed imbalance problem that is substantially more complex than in classification TTA.
    Abstract Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to potential distribution shifts in the test data. Most existing TTA studies, however, focus on classification tasks, leaving a notable gap in the exploration of TTA for semantic segmentation. This pronounced emphasis on classification might lead numerous newcomers and engineers to mistakenly assume that classic TTA methods designed for classification can be directly applied to segmentation. Nonetheless, this assumption remains unverified, posing an open question. To address this, we conduct a systematic, empirical study to disclose the unique challenges of segmentation TTA, and to determine whether classic TTA strategies can effectively address this task. Our comprehensive results have led to three key observations. First, the classic batch norm updating strategy, commonly used in classification TTA, only brings slight performance improvement, and in some cases it might even adversely affect the results. Even with the application of advanced distribution estimation techniques like batch renormalization, the problem remains unresolved. Second, the teacher-student scheme does enhance training stability for segmentation TTA in the presence of noisy pseudo-labels. However, it cannot directly result in performance improvement compared to the original model without TTA. Third, segmentation TTA suffers a severe long-tailed imbalance problem, which is substantially more complex than that in TTA for classification. This long-tailed challenge significantly affects segmentation TTA performance, even when the accuracy of pseudo-labels is high. In light of these observations, we conclude that TTA for segmentation presents significant challenges, and simply using classic TTA methods cannot address this problem well.
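The "classic batch norm updating strategy" evaluated here amounts to re-estimating BatchNorm running statistics on unlabeled test batches while keeping all weights frozen; a minimal PyTorch sketch with a toy segmentation head is shown below.

```python
import torch
import torch.nn as nn

# Minimal sketch of the classic batch-norm updating strategy evaluated in the paper:
# put only the BatchNorm layers in training mode so their running statistics are
# re-estimated on unlabeled test batches, without updating any weights.
model = nn.Sequential(                        # toy fully-convolutional segmentation head
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 21, 1),                     # 21 classes, e.g. a Pascal-VOC-style label set
)

model.eval()                                  # freeze everything else
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.train()                             # BN layers recompute batch statistics
        m.reset_running_stats()               # optionally discard source-domain statistics

with torch.no_grad():                         # no parameters are updated, only BN buffers
    for _ in range(10):                       # stream of unlabeled test batches
        test_batch = torch.randn(8, 3, 64, 64)
        _ = model(test_batch)

model.eval()                                  # adapted statistics are now used for prediction
pred = model(torch.randn(1, 3, 64, 64)).argmax(dim=1)
print("predicted label map shape:", tuple(pred.shape))
```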

Enhancing Long-form Text Generation Efficacy with Task-adaptive Tokenization

  • paper_url: http://arxiv.org/abs/2310.05317
  • repo_url: https://github.com/MichiganNLP/task-adaptive_tokenization
  • paper_authors: Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea
  • for: Improving long-form text generation, particularly for psychological question-answering tasks.
  • methods: Task-adaptive tokenization that samples variable segmentations from multiple outcomes, with sampling probabilities optimized on task-specific data.
  • results: On Chinese and English psychological question-answering tasks, optimizing the tokenization for the task improves generation performance while using up to 60% fewer tokens; preliminary experiments suggest promising results when combining this tokenization approach with very large language models.
    Abstract We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and introduce a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our tokenization approach with very large language models.
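The vocabulary-merging idea can be approximated with the Hugging Face interface by adding task-specific tokens to a pre-trained tokenizer and resizing the model's embedding matrix, as sketched below; the example tokens are made up, and this is not the paper's exact merging protocol.

```python
# Approximating the vocabulary-merging idea with the Hugging Face interface: add
# task-specific tokens to a pre-trained tokenizer and resize the model's embedding
# matrix. The example tokens are made up; this is not the paper's exact protocol.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

task_tokens = ["self-compassion", "cognitive reframing"]    # hypothetical task-specific units
num_added = tokenizer.add_tokens(task_tokens)
model.resize_token_embeddings(len(tokenizer))               # make room for the new tokens

text = "Practicing self-compassion helps with cognitive reframing."
print("added", num_added, "tokens")
print("token count:", len(tokenizer(text)["input_ids"]))    # typically fewer pieces than before merging
```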