2023-08-17

cs.AI

cs.AI - 2023-08-17

Enhancing API Documentation through BERTopic Modeling and Summarization

paper_url: http://arxiv.org/abs/2308.09070
repo_url: https://github.com/scam2023-bert/bertopic
paper_authors: AmirHossein Naghshzan, Sylvie Ratte
for: 本研究旨在提高API文档理解的效率和可用性，以帮助开发者更好地利用API功能。
methods: 本研究使用BERTopic进行主题分析和自然语言处理（NLP）技术，自动生成API文档摘要，从而为开发者提供更加有效的信息检索方式。
results: 研究发现API文档中有很多重复的主题和问题，并生成了可行的解决方案。这些发现和解决方案可以帮助开发者更好地理解API文档，提高软件开发过程的效率和质量。

Abstract
As the amount of textual data in various fields, including software development, continues to grow, there is a pressing demand for efficient and effective extraction and presentation of meaningful insights. This paper presents a unique approach to address this need, focusing on the complexities of interpreting Application Programming Interface (API) documentation. While official API documentation serves as a primary source of information for developers, it can often be extensive and lacks user-friendliness. In light of this, developers frequently resort to unofficial sources like Stack Overflow and GitHub. Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation, thereby creating a more efficient method for developers to extract the information they need. The produced summaries and topics are evaluated based on their performance, coherence, and interoperability. The findings of this research contribute to the field of API documentation analysis by providing insights into recurring topics, identifying common issues, and generating potential solutions. By improving the accessibility and efficiency of API documentation comprehension, our work aims to enhance the software development process and empower developers with practical tools for navigating complex APIs.

摘要
随着不同领域的文本数据量不断增加，包括软件开发，有效地提取和表现出有用的洞察结论变得越来越重要。这篇论文提出了一种独特的方法来解决这个需求，集中在API文档的复杂性上。官方API文档作为开发者的主要信息来源，然而它们经常是广泛的和不易于使用的。为了解决这个问题，开发者们经常查看Stack Overflow和GitHub等非官方源。我们的新方法利用BERTopic的主题分析和自然语言处理（NLP）技术自动生成API文档的摘要，从而创造了一种更高效的方法，使开发者可以更方便地提取所需的信息。生成的摘要和主题都被评估了其性能、一致性和可共享性。这些研究结果对API文档分析领域做出了贡献，提供了复杂API文档中的循环主题、通用问题和 potential解决方案的洞察。通过提高API文档的可访问性和效率，我们的工作希望能够改善软件开发过程，并为开发者们提供实用的工具来探索复杂的API。

Fostering User Engagement in the Critical Reflection of Arguments

paper_url: http://arxiv.org/abs/2308.09061
repo_url: None
paper_authors: Klaus Weber, Annalena Aicher, Wolfang Minker, Stefan Ultes, Elisabeth André
for: 支持公正、不偏袋化的意见形成过程
methods: 使用对话系统和模型来评估用户的反思程度和开放性
results: 在58名参与者的用户研究中，发现了对用户反思和总体关注程度的显著影响，证明了我们的方法的有效性。

Abstract
A natural way to resolve different points of view and form opinions is through exchanging arguments and knowledge. Facing the vast amount of available information on the internet, people tend to focus on information consistent with their beliefs. Especially when the issue is controversial, information is often selected that does not challenge one's beliefs. To support a fair and unbiased opinion-building process, we propose a chatbot system that engages in a deliberative dialogue with a human. In contrast to persuasive systems, the envisioned chatbot aims to provide a diverse and representative overview - embedded in a conversation with the user. To account for a reflective and unbiased exploration of the topic, we enable the system to intervene if the user is too focused on their pre-existing opinion. Therefore we propose a model to estimate the users' reflective engagement (RUE), defined as their critical thinking and open-mindedness. We report on a user study with 58 participants to test our model and the effect of the intervention mechanism, discuss the implications of the results, and present perspectives for future work. The results show a significant effect on both user reflection and total user focus, proving our proposed approach's validity.

摘要
自然的方式解决不同看法和形成意见是通过互动对话和交换知识。面对互联网上的巨量信息，人们往往会选择一些不会挑战他们的信念的信息。特别是当问题是争议的时，人们往往会偏好一些不会挑战他们的信念的信息。为了支持公正和不偏袋的意见形成过程，我们提议一个 chatbot 系统，与人类进行审慎对话。与推销系统不同的是，我们的 chatbot 旨在提供多样化和代表性的概述，并且在与用户进行对话时嵌入在其中。为了考虑用户的反思和公正探索，我们允许系统在用户过于围绕自己的先前意见时进行 вмешательство。为了衡量用户的反思程度，我们提出了用户反思参与度（RUE）模型，定义为用户的批判思维和开明性。我们报告了一个用户研究，测试我们的模型和干预机制的效果，讨论结果的意义，并提出未来工作的视角。结果显示我们的方法有效，用户反思程度和总用户焦点都有显著改善。

Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

paper_url: http://arxiv.org/abs/2308.09051
repo_url: None
paper_authors: Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda
for: 这个研究 investigate了使用数据驱动的DeepFormants跟踪器进行形式追踪，并使用模型驱动的LP-based方法来进一步改进跟踪结果。
methods: 这个研究使用了LP-COV和QCP-FB两种LP-based形式估计方法，并将其与数据驱动的DeepFormants跟踪器结合使用，以提高跟踪结果。
results: 研究结果显示，使用LP-based模型驱动的跟踪器可以提高跟踪效果，并且使用QCP-FB分析方法可以获得最佳的跟踪结果。 Additionally, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers when tracking formants using VTR speech that was corrupted by additive noise.

Abstract
In this study, formant tracking is investigated by refining the formants tracked by an existing data-driven tracker, DeepFormants, using the formants estimated in a model-driven manner by linear prediction (LP)-based methods. As LP-based formant estimation methods, conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward-backward (QCP-FB) analysis are used. In the proposed refinement approach, the contours of the three lowest formants are first predicted by the data-driven DeepFormants tracker, and the predicted formants are replaced frame-wise with local spectral peaks shown by the model-driven LP-based methods. The refinement procedure can be plugged into the DeepFormants tracker with no need for any new data learning. Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers using the popular vocal tract resonance (VTR) corpus. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis. In addition, by tracking formants using VTR speech that was corrupted by additive noise, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers. In general, these results suggest that LP-based model-driven approaches, which have traditionally been used in formant estimation, can be combined with a modern data-driven tracker easily with no further training to improve the tracker's performance.

摘要
在这个研究中，我们研究了使用数据驱动的 DeepFormants 溯形器进行形式追踪，并使用 LP-based 方法来估计形式。我们使用了传统的 covariance analysis（LP-COV）和最近提出的 quasi-closed phase forward-backward（QCP-FB）分析。在我们的改进方法中，首先使用 DeepFormants 溯形器预测三个最低的形式迹，然后将预测的形式迹替换为每帧的本地 спектral peak 显示出来。这个改进过程可以轻松地插入到 DeepFormants 溯形器中，无需进行任何新的数据学习。我们比较了两个改进后的 DeepFormants 溯形器与原始的 DeepFormants 和五个知名的传统溯形器，使用 popular vocal tract resonance（VTR） corpus。结果表明，数据驱动的 DeepFormants 溯形器比传统的溯形器更高效，而使用 QCP-FB 分析进行改进可以获得最佳性能。此外，通过使用受到添加噪声损害的 VTR 语音进行追踪，研究表明，改进后的 DeepFormants 溯形器比参照溯形器更抗雷。总之，这些结果表明，LP-based 模型驱动方法可以轻松地与现代数据驱动溯形器结合使用，以提高溯形器的性能。

Severity Classification of Parkinson’s Disease from Speech using Single Frequency Filtering-based Features

paper_url: http://arxiv.org/abs/2308.09042
repo_url: None
paper_authors: Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku
for: 本研究的目的是提出一种新的评估多巴生殖疾病（PD）严重程度的对象方法，以改善诊断和治疗。
methods: 本研究使用了单频 filtering（SFF）方法， derivated two sets of novel features：(1) SFF cepstral coefficients（SFFCC）和(2) MFCCs from SFF（MFCC-SFF），用于识别PD的严重程度。
results: 实验表明，提posed的特征比 conventinal MFCCs在三种说话任务（元音、句子、文本读取）中都有较好的表现，具体来说，SFFCC和MFCC-SFF特征在元音任务中比MFCC特征提高5.8%和2.3%，在句子任务中提高7.0%和1.8%，在文本读取任务中提高2.4%和1.1%。

Abstract
Developing objective methods for assessing the severity of Parkinson's disease (PD) is crucial for improving the diagnosis and treatment. This study proposes two sets of novel features derived from the single frequency filtering (SFF) method: (1) SFF cepstral coefficients (SFFCC) and (2) MFCCs from the SFF (MFCC-SFF) for the severity classification of PD. Prior studies have demonstrated that SFF offers greater spectro-temporal resolution compared to the short-time Fourier transform. The study uses the PC-GITA database, which includes speech of PD patients and healthy controls produced in three speaking tasks (vowels, sentences, text reading). Experiments using the SVM classifier revealed that the proposed features outperformed the conventional MFCCs in all three speaking tasks. The proposed SFFCC and MFCC-SFF features gave a relative improvement of 5.8% and 2.3% for the vowel task, 7.0% & 1.8% for the sentence task, and 2.4% and 1.1% for the read text task, in comparison to MFCC features.

摘要
开发对parkinson病（PD）严重程度评估方法是至关重要，以提高诊断和治疗的效果。这项研究提出了两个新的特征集：（1）单频范围滤波（SFF）cepstral coefficients（SFFCC）和（2）基于SFF的MFCC（MFCC-SFF），用于PD严重分类。先前的研究表明，SFF在spectro-temporal分辨率方面比短时傅立干更高。本研究使用PC-GITA数据库，包括PD患者和正常人的语音数据，从三种说话任务（声音、句子、文本阅读）中获得。实验表明，提出的特征超过了传统MFCC特征，在三种说话任务中都有显著提高。SFFCC和MFCC-SFF特征相比MFCC特征，在声音任务中提高5.8%、7.0%和2.4%，在句子任务中提高1.8%和1.1%，在文本阅读任务中提高2.3%。

A Mathematical Characterization of Minimally Sufficient Robot Brains

paper_url: http://arxiv.org/abs/2308.09041
repo_url: None
paper_authors: Basak Sakcak, Kalle G. Timperi, Vadim Weinstein, Steven M. LaValle
for: 本研究探讨了内部系统（机器人算法或软件）与外部系统（机器人身体和环境）之间的交互所获得的信息的下限。
methods: 本研究使用过程系统来模型内部系统和外部系统之间的交互，并研究了最弱的内部系统是否具备可以完成过滤和规划任务的能力。
results: 研究发现，在给定机器人硬件和任务下，存在最小的信息过程系统，这些系统具备Uniqueness和可以实现的性。此外，这些系统还可以用于解决优化整合/筛选、基本规划任务和模型系统给定输入-输出关系的问题。

Abstract
This paper addresses the lower limits of encoding and processing the information acquired through interactions between an internal system (robot algorithms or software) and an external system (robot body and its environment) in terms of action and observation histories. Both are modeled as transition systems. We want to know the weakest internal system that is sufficient for achieving passive (filtering) and active (planning) tasks. We introduce the notion of an information transition system for the internal system which is a transition system over a space of information states that reflect a robot's or other observer's perspective based on limited sensing, memory, computation, and actuation. An information transition system is viewed as a filter and a policy or plan is viewed as a function that labels the states of this information transition system. Regardless of whether internal systems are obtained by learning algorithms, planning algorithms, or human insight, we want to know the limits of feasibility for given robot hardware and tasks. We establish, in a general setting, that minimal information transition systems exist up to reasonable equivalence assumptions, and are unique under some general conditions. We then apply the theory to generate new insights into several problems, including optimal sensor fusion/filtering, solving basic planning tasks, and finding minimal representations for modeling a system given input-output relations.

摘要
To address this question, the paper introduces the concept of an information transition system for the internal system, which is a transition system over a space of information states that reflect the robot's or other observer's perspective based on limited sensing, memory, computation, and actuation. An information transition system is viewed as a filter, and a policy or plan is viewed as a function that labels the states of this information transition system.The paper establishes, in a general setting, that minimal information transition systems exist and are unique under certain conditions. These minimal systems are the weakest possible internal systems that can achieve the desired tasks. The theory is then applied to generate new insights into several problems, including optimal sensor fusion/filtering, solving basic planning tasks, and finding minimal representations for modeling a system given input-output relations.In simplified Chinese, the paper is about exploring the minimum amount of information and processing power needed for a robot to perform tasks such as filtering and planning, and how to model this information and processing power using transition systems. The paper introduces the concept of an information transition system, which is a way of representing the robot's perspective based on limited sensing, memory, computation, and actuation. The paper shows that there exist minimal information transition systems that are unique and sufficient for achieving passive and active tasks. These insights can be applied to various problems such as optimal sensor fusion, basic planning, and finding minimal representations for modeling a system.

Synthesizing Physically Plausible Human Motions in 3D Scenes

paper_url: http://arxiv.org/abs/2308.09036
repo_url: https://github.com/liangpan99/interscene
paper_authors: Liang Pan, Jingbo Wang, Buzhen Huang, Junyu Zhang, Haofan Wang, Xu Tang, Yangang Wang
for: 本研究旨在Synthesizing physically plausible human motions in 3D scenes, 解决现有的kinematics-based方法和physics-based方法存在缺陷，如penetration和foot skating。
methods: 我们提出了一种框架，即InterScene，将人类-场景交互分解成两个基本过程：Interacting和Navigating，并设计了两个可重用控制器，即InterCon和NavCon。
results: 实验结果表明，我们的框架可以在复杂的3D场景中生成physically plausible的长期人类运动。Here’s the translation in English for reference:
for: The paper aims to synthesize physically plausible human motions in 3D scenes, addressing the limitations of existing kinematics-based and physics-based methods, such as penetration and foot skating.
methods: We propose a framework called InterScene, which decomposes human-scene interactions into two fundamental processes, Interacting and Navigating, and designs two reusable controllers, InterCon and NavCon.
results: Experimental results demonstrate that our framework can generate physically plausible long-term human motions in complex 3D scenes.

Abstract
Synthesizing physically plausible human motions in 3D scenes is a challenging problem. Kinematics-based methods cannot avoid inherent artifacts (e.g., penetration and foot skating) due to the lack of physical constraints. Meanwhile, existing physics-based methods cannot generalize to multi-object scenarios since the policy trained with reinforcement learning has limited modeling capacity. In this work, we present a framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. The key idea is to decompose human-scene interactions into two fundamental processes, Interacting and Navigating, which motivates us to construct two reusable Controller, i.e., InterCon and NavCon. Specifically, InterCon contains two complementary policies that enable characters to enter and leave the interacting state (e.g., sitting on a chair and getting up). To generate interaction with objects at different places, we further design NavCon, a trajectory following policy, to keep characters' locomotion in the free space of 3D scenes. Benefiting from the divide and conquer strategy, we can train the policies in simple environments and generalize to complex multi-object scenes. Experimental results demonstrate that our framework can synthesize physically plausible long-term human motions in complex 3D scenes. Code will be publicly released at https://github.com/liangpan99/InterScene.

摘要
使三维场景中的人物表现出physically plausible的运动是一个复杂的问题。基于骨骼动学的方法无法避免内在的缺陷（例如冲突和脚滑）由于physical constraints的缺失。而现有的物理学基的方法则无法泛化到多 объек 场景，因为训练的策略使用了强化学习有限的模型ing capacity。在这项工作中，我们提出了一种框架，允许 физи simulated 人物在多物件场景中进行长期交互任务。我们的关键思想是将人物-场景交互 decomposed 成两个基本过程：交互和导航。这使我们可以构建两个可重用的控制器，即InterCon和NavCon。具体来说，InterCon包含两个补做策略，使人物能够进入和离开交互状态（例如坐在椅子上和站起来）。为了在不同的位置上生成对象与人物之间的交互，我们还设计了 NavCon，一个路径跟踪策略，以保证人物在3D场景中的自由运动。由于我们采用了分而治之策略，我们可以在简单的环境中训练策略，并在复杂的多物件场景中进行泛化。实验结果表明，我们的框架可以在复杂的3D场景中生成physically plausible的长期人物运动。代码将在https://github.com/liangpan99/InterScene 上公开。

Reinforcement Learning for Battery Management in Dairy Farming

paper_url: http://arxiv.org/abs/2308.09023
repo_url: None
paper_authors: Nawazish Ali, Abdul Wahid, Rachael shaw, Karl Mason
for: 本研究旨在应用人工智能（AI）以提高牛奶牧场中的可再生能源发电效率。
methods: 本研究使用Q学习算法来学习一个有效的电池充放和充电策略。
results: 研究结果显示，开发的策略可以对比基准算法大幅降低电力成本。这些结果显示了强化学习在牛奶牧场中电池管理中的效iveness。

Abstract
Dairy farming is a particularly energy-intensive part of the agriculture sector. Effective battery management is essential for renewable integration within the agriculture sector. However, controlling battery charging/discharging is a difficult task due to electricity demand variability, stochasticity of renewable generation, and energy price fluctuations. Despite the potential benefits of applying Artificial Intelligence (AI) to renewable energy in the context of dairy farming, there has been limited research in this area. This research is a priority for Ireland as it strives to meet its governmental goals in energy and sustainability. This research paper utilizes Q-learning to learn an effective policy for charging and discharging a battery within a dairy farm setting. The results demonstrate that the developed policy significantly reduces electricity costs compared to the established baseline algorithm. These findings highlight the effectiveness of reinforcement learning for battery management within the dairy farming sector.

摘要
奶业是农业部分中最为能源密集的一部分。有效的电池管理是农业部门中绿色能源融合的关键。然而，控制电池充放电是一项具有挑战性的任务，这主要归结于能源需求的变化、可再生能源生产的随机性和能源价格的波动。尽管应用人工智能（AI）于奶业中可能带来多少优势，然而有限的研究进行了在这个领域。这项研究使用Q学习算法学习一个有效的电池充放电策略。研究结果表明，开发的策略可以significantly降低电力成本，与已有的基线算法相比。这些发现 highlights了Q学习在奶业中电池管理的有效性。

Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability

paper_url: http://arxiv.org/abs/2308.09004
repo_url: None
paper_authors: Renan Souza, Tyler J. Skluzacek, Sean R. Wilkinson, Maxim Ziatdinov, Rafael Ferreira da Silva
for: 这 paper 的目的是提出一种基于数据可见性的多任务集成数据分析方法，以满足现代大规模科学发现所需的多学科协作和多 computing 环境。
methods: 该 paper 使用了数据可见性策略和适配器系统设计，以及证明信息，来实现轻量级运行时多 workflow 集成数据分析。
results: 实验表明，该方法可以在多种并行系统和机器学习工具上实现near-zero overhead的多任务集成数据分析，并在 Summit 超级计算机上运行 up to 100,000 任务。

Abstract
Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility environments, and efficient HPC execution. Building on data observability, adapter system design, and provenance, we propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis. MIDA defines data observability strategies and adaptability methods for various parallel systems and machine learning tools. With observability, it intercepts the dataflows in the background without requiring instrumentation while integrating domain, provenance, and telemetry data at runtime into a unified database ready for user steering queries. We conduct experiments showing end-to-end multi-workflow analysis integrating data from Dask and MLFlow in a real distributed deep learning use case for materials science that runs on multiple environments with up to 276 GPUs in parallel. We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.

摘要
现代大规模科学发现需要跨学科合作，包括高性能计算机（HPC）机器和边缘到云端的连续体系。集成数据分析在科学发现中扮演着关键性角色，特别是在当今人工智能时代，可以实现负责任的人工智能开发、FAIR、可重现和用户导航。然而，科学的多样性带来了多种支持工具、跨设施环境和高效HPC执行的挑战。基于数据可见性、适配器系统设计和 происхождение，我们提出了MIDA：一种轻量级运行时多 workflow интеGRATED数据分析方法。MIDA定义了数据可见性策略和适配方法，可以在多种并行系统和机器学习工具上使用。通过可见性，它可以在背景中 intercept数据流无需实rumentation，并将domain、 происхождение和telemetry数据集成到运行时，并将其存储在一个统一的数据库中，准备就绪 для用户导航查询。我们对实验结果表明，可以在多种环境中同时运行多个 workflow，并将数据集成到一个统一的数据库中，并且可以实现近于零 overhead。在 Summit 超级计算机上，我们运行了100,000个任务，使用1,680个CPU核心，并在多个环境中使用多达276个GPU并行运行。

An Extended Convergence Result for Behaviour Tree Controllers

paper_url: http://arxiv.org/abs/2308.08994
repo_url: None
paper_authors: Christopher Iliffe Sprague, Petter Ögren
for: 这篇论文是为了研究行为树（BTs）的叉集抽象和控制策略的可能性。
methods: 这篇论文使用了一种树结构，将低级控制策略组合成为层次结构，从而实现控制策略的可模块化。
results: 这篇论文研究了行为树的叉集抽象和控制策略的可能性，并推广了之前的结果，包括新的循环切换情况。

Abstract
Behavior trees (BTs) are an optimally modular framework to assemble hierarchical hybrid control policies from a set of low-level control policies using a tree structure. Many robotic tasks are naturally decomposed into a hierarchy of control tasks, and modularity is a well-known tool for handling complexity, therefor behavior trees have garnered widespread usage in the robotics community. In this paper, we study the convergence of BTs, in the sense of reaching a desired part of the state space. Earlier results on BT convergence were often tailored to specific families of BTs, created using different design principles. The results of this paper generalize the earlier results and also include new cases of cyclic switching not covered in the literature.

摘要
行为树（BT）是一种最佳的模块化框架，可以将层次结构的混合控制策略从一组低级控制策略中组装 together。许多 робо械任务自然可以被划分为一个层次结构的控制任务，而模块化是处理复杂性的知名工具，因此行为树在机器人社区中得到了广泛的应用。在这篇论文中，我们研究行为树的叉树，即达到某个状态空间中的愿景部分。先前的结果通常是针对特定的行为树设计原则而设计的，本paper的结果可以总结这些结果，并包括先前未被探讨的循环切换情况。

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

paper_url: http://arxiv.org/abs/2308.11761
repo_url: None
paper_authors: Xintao Wang, Qianwen Yang, Yongting Qiu, Jiaqing Liang, Qianyu He, Zhouhong Gu, Yanghua Xiao, Wei Wang
for: 该论文旨在探讨如何将大型自然语言处理模型（LLM）与多种知识库（KB）集成，以提高模型的完整性、时效性、忠诚性和适应性。
methods: 该论文提出了一种名为 KnowledGPT 的全面框架，可以将 LLM 与多种 KB 集成，并提供了两种方法：一是使用思维提示程序来为 KB 进行搜索，二是将个性化知识库（PKB）引入到 LLM 中，以满足用户的特定需求。
results: 经过广泛的实验，该论文表明，通过将 LLM 与 KB 集成， KnowledGPT 可以更好地回答需要世界知识的问题，比如使用已知 KB 中的知识和从 PKB 中提取的个性化知识。

Abstract
Large language models (LLMs) have demonstrated impressive impact in the field of natural language processing, but they still struggle with several issues regarding, such as completeness, timeliness, faithfulness and adaptability. While recent efforts have focuses on connecting LLMs with external knowledge sources, the integration of knowledge bases (KBs) remains understudied and faces several challenges. In this paper, we introduce KnowledGPT, a comprehensive framework to bridge LLMs with various knowledge bases, facilitating both the retrieval and storage of knowledge. The retrieval process employs the program of thought prompting, which generates search language for KBs in code format with pre-defined functions for KB operations. Besides retrieval, KnowledGPT offers the capability to store knowledge in a personalized KB, catering to individual user demands. With extensive experiments, we show that by integrating LLMs with KBs, KnowledGPT properly answers a broader range of questions requiring world knowledge compared with vanilla LLMs, utilizing both knowledge existing in widely-known KBs and extracted into personalized KBs.

摘要
The retrieval process uses the program of thought prompting, which generates search language for KBs in code format with pre-defined functions for KB operations. In addition to retrieval, KnowledGPT offers the capability to store knowledge in a personalized KB, catering to individual user demands. With extensive experiments, we show that by integrating LLMs with KBs, KnowledGPT can properly answer a broader range of questions requiring world knowledge compared to vanilla LLMs, utilizing both knowledge existing in widely-known KBs and extracted into personalized KBs.

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

paper_url: http://arxiv.org/abs/2308.09726
repo_url: https://github.com/google-research/socialgood
paper_authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich Huang, Milind Tambe
For: 这篇论文研究了如何使用多臂抽签机制（Restless Multi-armed Bandits，RMAB）来实现公平的决策，尤其是在电子健康领域。* Methods: 论文使用了两种平衡公平性的目标函数：最小最大奖赏和Nash均衡益。论文也提出了一个水满算法和一个理论据基于的均衡算法来解决这两个目标函数。* Results: 论文透过三个模拟领域的实验结果表明，使用论文提出的方法可以实现多倍的公平性，而且不需要严重影响使用度的损失。这些结果证明了论文的重要性，因为RMAB在影响人类和野生动物的系统中愈来愈普遍。

Abstract
Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disparities between groups (e.g., ensure health equity). We study equitable objectives for RMABs (ERMABs) for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare. We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter. Finally, we demonstrate across three simulation domains, including a new digital health model, that our approaches can be multiple times more equitable than the current state of the art without drastic sacrifices to utility. Our findings underscore our work's urgency as RMABs permeate into systems that impact human and wildlife outcomes. Code is available at https://github.com/google-research/socialgood/tree/equitable-rmab

摘要
不休息的多手犬 (RMAB) 是一种流行的算法决策框架，用于Sequential Setting with limited resources。 RMAB 已经在公共卫生、治疗安排、抵抗贪婪和数字卫生等高规格场景中使用。为了保证高规格决策，决策必须同时提高结果和避免群体之间的差异（例如，确保健康公平）。我们研究了 equitable 目标函数 (ERMAB) ，并考虑了两种与公平性相关的目标函数：最小最大奖励和最大 Nash 利益。我们开发了高效的算法来解决每一个——一种水填算法来实现最小最大奖励，以及一种理论上支持的精心设计来平衡不同群体大小的 greedy 算法来实现最大 Nash 利益。最后，我们在三个模拟领域中，包括一个新的数字卫生模型，证明了我们的方法可以在不产生极大的Utility 损失的情况下多达多ples 更公平。我们的发现强调了我们的工作的急迫性，因为 RMAB 在影响人类和野生动物的系统中普遍存在。代码可以在中获取。

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

paper_url: http://arxiv.org/abs/2308.08949
repo_url: None
paper_authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei
for: 本研究旨在提供一个稳固的评估特征归因方法的框架，以便更好地理解神经网络预测结果的基础特征。
methods: 本研究使用了两种新的视角来评估特征归因方法的 faithfulness 性能，即听soundness和completeness。soundness 评估归因特征是真实预测特征，而completeness 评估归因结果是否能够捕捉所有预测特征。
results: 研究人员通过对主流特征归因方法应用这两种新的视角，发现了一些缺陷和不足，并提出了一些改进建议。这些改进建议可以帮助提高特征归因方法的 faithfulness 和完整性。

Abstract
Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.

摘要
neural network 预测结果的Feature归因方法尝试解释预测结果的相关特征。然而，建立一个一致的框架来评估归因方法仍然是一个挑战。我们可以通过以下几种视角来评估归因：一个重要的镜像是观察归因特征的改变对模型的行为的影响（即寓言）。虽然提供了有用的洞察，但现有的寓言评估存在缺陷，我们在这篇论文中揭示这些缺陷。在这个工作中，我们提出了两种新的视角，它们基于坚实的数学基础，并提供了可计算的量化度量。我们对主流归因方法应用这些度量，提供了一种新的分析和比较feature归因方法的镜像。

Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level

paper_url: http://arxiv.org/abs/2308.08948
repo_url: None
paper_authors: Deepak Pathak, Miro Miranda, Francisco Mena, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Hiba Najjar, Jayanth Siddamsetty, Diego Arenas, Michaela Vollmer, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
for: 这项研究的目的是开发一种简单又有效的早期融合方法，以优化耐灌植物生长和产量预测。
methods: 该方法使用高分辨率耐灌植物产量地图作为训练数据，并使用机器学习模型和气象、土壤和DEM数据进行融合。
results: 研究发现，使用不同输入模式可以得到不同的最佳结果，并且输入模式的选择取决于地区、作物和选择的模型。

Abstract
We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train crop and machine learning model agnostic methods at the sub-field level. We use Sentinel-2 satellite imagery as the primary modality for input data with other complementary modalities, including weather, soil, and DEM data. The proposed method uses input modalities available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.

摘要
我们介绍了一种简单 yet 有效的早期融合方法 для农作物产量预测，该方法可以处理多个输入模式，具有不同的时间和空间分辨率。我们使用高分辨率农作物产量地图作为真实数据来训练农作物和机器学习模型无关的方法。我们使用 Sentinal-2 卫星图像作为主要输入数据，其他补充模式包括天气、土壤和 DEM 数据。我们表明输入模式的重要性，并强调选择地区、作物和模型而定的最佳输入模式的重要性。Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and other countries. If you need the translation in Traditional Chinese, please let me know.

Interpretable Graph Neural Networks for Tabular Data

paper_url: http://arxiv.org/abs/2308.08945
repo_url: None
paper_authors: Amr Alkhatib, Sofiane Ennadir, Henrik Boström, Michalis Vazirgiannis
for: 本研究旨在提出一种可解释的图 neural network（IGNNet），用于处理常见的表格数据，并且能够准确地表示输入特征之间的交互作用。
methods: IGNNet使用一种受限的学习算法，以生成可解释的模型，其中模型能够从原始输入特征直接计算出预测结果的具体计算过程。
results: 实验结果表明，IGNNet与现有针对表格数据的机器学习算法相比，在性能上具有相当的竞争力，而且可以准确地表示输入特征之间的交互作用。同时，IGNNet的解释结果与真实的Shapley值相关，无需额外计算开销。

Abstract
Data in tabular format is frequently occurring in real-world applications. Graph Neural Networks (GNNs) have recently been extended to effectively handle such data, allowing feature interactions to be captured through representation learning. However, these approaches essentially produce black-box models, in the form of deep neural networks, precluding users from following the logic behind the model predictions. We propose an approach, called IGNNet (Interpretable Graph Neural Network for tabular data), which constrains the learning algorithm to produce an interpretable model, where the model shows how the predictions are exactly computed from the original input features. A large-scale empirical investigation is presented, showing that IGNNet is performing on par with state-of-the-art machine-learning algorithms that target tabular data, including XGBoost, Random Forests, and TabNet. At the same time, the results show that the explanations obtained from IGNNet are aligned with the true Shapley values of the features without incurring any additional computational overhead.

摘要
实际应用中 frequently 出现的数据格式是表格格式。图 neuron 网络（GNNs）在最近扩展以处理这种数据，以便捕捉特征之间的交互。然而，这些方法基本上生成黑盒模型，即深度神经网络，禁止用户跟踪模型预测的逻辑。我们提出了一种方法，称为 IGNNet（可解释图 neuron 网络 для表格数据），它限制学习算法生成可解释的模型，其中模型可以从原始输入特征直接计算预测。我们进行了大规模的实验研究，显示，IGNNet与目标 tabular 数据的状态机器学习算法相当，包括 XGBoost、Random Forests 和 TabNet。同时，结果表明，IGNNet 获得的解释与实际 Shapley 值相对，而不会增加任何额外的计算开销。

Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

paper_url: http://arxiv.org/abs/2308.08943
repo_url: https://github.com/antonio-mastropaolo/satd-removal
paper_authors: Antonio Mastropaolo, Massimiliano Di Penta, Gabriele Bavota
for:* The paper aims to investigate the use of neural-based generative models for automatically paying back technical debt in software development.methods:* The paper employs seven different generative deep learning (DL) model configurations, including transformers pre-trained and fine-tuned with different combinations of training objectives.results:* The best model experimented with was able to automatically fix ~2% to 8% of test instances, depending on the number of attempts it was allowed to make.Here is the simplified Chinese translation of the three information points:for:* 本研究目的是 investigating 使用神经网络基于的生成模型来自动偿还软件开发中的技术债。methods:* 本研究使用了七种不同的生成深度学习（DL）模型配置，包括使用不同的训练目标来预训练和细化 transformers。results:* 最佳模型在试试次数不同情况下，能够自动修复 ~2% 到 8% 的测试实例。

Abstract
Upon evolving their software, organizations and individual developers have to spend a substantial effort to pay back technical debt, i.e., the fact that software is released in a shape not as good as it should be, e.g., in terms of functionality, reliability, or maintainability. This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models, and in particular models exploiting different strategies for pre-training and fine-tuning. We start by extracting a dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects. SATD refers to technical debt instances documented (e.g., via code comments) by developers. We use this dataset to experiment with seven different generative deep learning (DL) model configurations. Specifically, we compare transformers pre-trained and fine-tuned with different combinations of training objectives, including the fixing of generic code changes, SATD removals, and SATD-comment prompt tuning. Also, we investigate the applicability in this context of a recently-available Large Language Model (LLM)-based chat bot. Results of our study indicate that the automated repayment of SATD is a challenging task, with the best model we experimented with able to automatically fix ~2% to 8% of test instances, depending on the number of attempts it is allowed to make. Given the limited size of the fine-tuning dataset (~5k instances), the model's pre-training plays a fundamental role in boosting performance. Also, the ability to remove SATD steadily drops if the comment documenting the SATD is not provided as input to the model. Finally, we found general-purpose LLMs to not be a competitive approach for addressing SATD.

摘要
企业和个人开发者在软件发展过程中需要投入很大的努力来偿还技术债务，即软件发布的形态不是理想的，例如功能、可靠性和维护性等方面的问题。这篇论文employs neural网络模型来自动偿还技术债务，并研究了不同预训练和细化目标的影响。我们从595个开源项目中提取了5,039个自我披露技术债务（SATD）的实例，并使用这些实例进行了7种不同的深度学习（DL）模型配置的实验。特别是，我们比较了使用预训练后细化不同组合的训练目标，包括通用代码修改、SATD修复和SATD注释提示训练。此外，我们还 investigate了在这种上下文中Large Language Model（LLM）基于的聊天机器人的可用性。结果表明自动偿还SATD是一项具有挑战性的任务，最佳我们实验的模型可以自动修复test实例中的~2%至8%，具体取决于允许尝试的次数。由于 fine-tuning数据的数量（~5k个实例）较小，预训练在性能提升中扮演了基本的角色。同时，如果不提供SATD注释作为输入，模型的SATD修复能力逐渐下降。最后，我们发现通用LLM不是适合解决SATD的方法。

A White-Box False Positive Adversarial Attack Method on Contrastive Loss-Based Offline Handwritten Signature Verification Models

paper_url: http://arxiv.org/abs/2308.08925
repo_url: None
paper_authors: Zhongliang Guo, Yifei Qian, Ognjen Arandjelović, Lei Fang
for: 这 paper 是为了解决白盒式 false positive 对做 contrastive loss 基于的 Offline 手写签名验证模型。
methods: 我们提出了一种新的攻击方法，它将攻击看作是一种style transfer между两种相似而又独特的写作风格。为了导引生成的幌子图像，我们引入了两个新的损失函数，它们可以在 embedding вектор 之间增加攻击成功率，同时保证最小化生成图像与原图像之间的差异。
results: 我们的方法在 white-box 攻击中对 contrastive loss 基于的 Offline 手写签名验证模型表现出了 state-of-the-art 性能，这得到了我们的实验证明。这 paper 的关键贡献包括一种新的 false positive 攻击方法，两个新的损失函数，有效的 style transfer 在手写风格之间，以及在 white-box false positive 攻击中比其他 white-box 攻击方法表现出更高的性能。

Abstract
In this paper, we tackle the challenge of white-box false positive adversarial attacks on contrastive loss-based offline handwritten signature verification models. We propose a novel attack method that treats the attack as a style transfer between closely related but distinct writing styles. To guide the generation of deceptive images, we introduce two new loss functions that enhance the attack success rate by perturbing the Euclidean distance between the embedding vectors of the original and synthesized samples, while ensuring minimal perturbations by reducing the difference between the generated image and the original image. Our method demonstrates state-of-the-art performance in white-box attacks on contrastive loss-based offline handwritten signature verification models, as evidenced by our experiments. The key contributions of this paper include a novel false positive attack method, two new loss functions, effective style transfer in handwriting styles, and superior performance in white-box false positive attacks compared to other white-box attack methods.

摘要
在这篇论文中，我们面临了对对照损失基于线上手写签名验证模型的白盒式false positive骚扰攻击的挑战。我们提出了一种新的攻击方法，它将攻击看作是在 closely related but distinct writing styles之间进行风格转换。为了导引生成的误leading images的生成，我们引入了两个新的损失函数，它们可以在 embedding vectors of the original and synthesized samples之间增加攻击成功率，同时保持最小的抖动，使得生成的图像与原始图像之间的差异尽量小。我们的方法在 white-box false positive attacks中表现出了state-of-the-art的性能，经过我们的实验证明。本文的主要贡献包括一种新的false positive攻击方法、两个新的损失函数、effective的风格转换在手写风格中、以及在 white-box false positive attacks中的superior performance compared to other white-box attack methods。

IMM: An Imitative Reinforcement Learning Approach with Predictive Representation Learning for Automatic Market Making

paper_url: http://arxiv.org/abs/2308.08918
repo_url: None
paper_authors: Hui Niu, Siyuan Li, Jiahao Zheng, Zhouchi Lin, Jian Li, Jian Guo, Bo An
for: 这种研究的目的是为了开发一种基于强化学习的多价格级市场制作者（Imitative Market Maker，IMM），以提高市场流动性和财务表现。methods: 这种方法利用了知识从不优化的信号基础 экспер特性和直接策略互动来快速发展多价格级市场制作者。它首先引入了有效的状态和动作表示，能够快速编码多价格级订单信息。然后，IMM利用了一个表示学习单元，能够捕捉短期和长期市场趋势，以减少不良选择风险。最后，IMM通过结合RL和仿效学习技术来训练代理人，从而实现有效的学习。results: 实验结果表明，IMM在四个实际市场数据集上比现有的RL基于市场制作者策略具有较高的财务表现。减少不良选择风险的策略组件的实验结果也证明了模型的有效性。

Abstract
Market making (MM) has attracted significant attention in financial trading owing to its essential function in ensuring market liquidity. With strong capabilities in sequential decision-making, Reinforcement Learning (RL) technology has achieved remarkable success in quantitative trading. Nonetheless, most existing RL-based MM methods focus on optimizing single-price level strategies which fail at frequent order cancellations and loss of queue priority. Strategies involving multiple price levels align better with actual trading scenarios. However, given the complexity that multi-price level strategies involves a comprehensive trading action space, the challenge of effectively training profitable RL agents for MM persists. Inspired by the efficient workflow of professional human market makers, we propose Imitative Market Maker (IMM), a novel RL framework leveraging both knowledge from suboptimal signal-based experts and direct policy interactions to develop multi-price level MM strategies efficiently. The framework start with introducing effective state and action representations adept at encoding information about multi-price level orders. Furthermore, IMM integrates a representation learning unit capable of capturing both short- and long-term market trends to mitigate adverse selection risk. Subsequently, IMM formulates an expert strategy based on signals and trains the agent through the integration of RL and imitation learning techniques, leading to efficient learning. Extensive experimental results on four real-world market datasets demonstrate that IMM outperforms current RL-based market making strategies in terms of several financial criteria. The findings of the ablation study substantiate the effectiveness of the model components.

摘要
市场制作（MM）在金融交易中吸引了广泛的注意力，因为它提供了市场流动性的关键 fonction。强大的顺序决策能力使得强化学习（RL）技术在量化交易中取得了显著的成功。然而，大多数现有的RL基于MM方法都是优化单价级别策略，这些策略在频繁的订单取消和优先级损失中失败。使用多个价格级别的策略更好地适应实际交易场景。然而，由于多个价格级别的策略的复杂性，RL代理人的训练仍然存在挑战。以人类市场制作者的高效工作流程为灵感，我们提出了imitative Market Maker（IMM），一种基于RL和仿制学习技术的新的市场制作框架。IMM从引入有效的状态和动作表示开始，并将市场趋势短期和长期都 captured。然后，IMM通过RL和仿制学习技术结合来训练代理人，从而实现高效的学习。我们在四个实际市场数据集上进行了广泛的实验，发现IMM在许多金融指标上表现出色，超过了现有的RL基于市场制作策略。减少风险的研究结果证明了模型组件的效果。

paper_url: http://arxiv.org/abs/2308.08915
repo_url: https://github.com/dawnvince/mts_cad
paper_authors: Haotian Si, Changhua Pei, Zhihan Li, Yadong Zhao, Jingjing Li, Haiming Zhang, Zulong Diao, Jianhui Li, Gaogang Xie, Dan Pei
for: 本研究旨在提出一种基于多任务学习的多变量时间序列异常检测算法（CAD），以解决现有异常检测方法中缺乏准确检测异常时序序列数据的问题。
methods: CAD使用了一种具有冲突意识的结构，以mitigate potential conflicts among metrics的 regression objectives，并提出了一种简单 yet effective的任务oriented metric selection和personalized和shared gating机制。
results: 对于多个公共数据集，CAD实现了一个平均的F1分数为0.943，明显超过了现有方法的性能。

Abstract
Massive key performance indicators (KPIs) are monitored as multivariate time series data (MTS) to ensure the reliability of the software applications and service system. Accurately detecting the abnormality of MTS is very critical for subsequent fault elimination. The scarcity of anomalies and manual labeling has led to the development of various self-supervised MTS anomaly detection (AD) methods, which optimize an overall objective/loss encompassing all metrics' regression objectives/losses. However, our empirical study uncovers the prevalence of conflicts among metrics' regression objectives, causing MTS models to grapple with different losses. This critical aspect significantly impacts detection performance but has been overlooked in existing approaches. To address this problem, by mimicking the design of multi-gate mixture-of-experts (MMoE), we introduce CAD, a Conflict-aware multivariate KPI Anomaly Detection algorithm. CAD offers an exclusive structure for each metric to mitigate potential conflicts while fostering inter-metric promotions. Upon thorough investigation, we find that the poor performance of vanilla MMoE mainly comes from the input-output misalignment settings of MTS formulation and convergence issues arising from expansive tasks. To address these challenges, we propose a straightforward yet effective task-oriented metric selection and p&s (personalized and shared) gating mechanism, which establishes CAD as the first practicable multi-task learning (MTL) based MTS AD model. Evaluations on multiple public datasets reveal that CAD obtains an average F1-score of 0.943 across three public datasets, notably outperforming state-of-the-art methods. Our code is accessible at https://github.com/dawnvince/MTS_CAD.

摘要
巨大的键性表现指标 (KPIs) 被监测为多变量时间序列数据 (MTS)，以确保软件应用程序和服务系统的可靠性。检测 MTS 中异常性的精度非常重要，以便后续的缺陷排除。由于缺乏异常例和手动标注，已经开发了许多自动学习 MTS 异常检测 (AD) 方法，这些方法通过优化总体的函数/损失来优化所有指标的回归函数/损失。然而，我们的实证研究发现，存在指标回归函数之间的冲突，导致 MTS 模型面临着不同的损失。这种问题在现有方法中受到了忽略。为解决这个问题，我们引入了 CAD（异常检测算法），一种具有冲突意识的多变量 KPI 异常检测算法。CAD 采用了多个指标的专属结构，以mitigate 指标之间的冲突，并且通过促进指标之间的促进来促进指标之间的互动。经过了系统的调查，我们发现，原始 MMoE 的性能不佳主要来自 MTS 表示形式的输入-输出不对称和任务的扩展问题。为解决这些挑战，我们提出了一种简单 yet 有效的任务关注的指标选择和 p&s 阻止机制，这使得 CAD 成为了首个实用的多任务学习 (MTL) 基于 MTS AD 模型。对多个公共数据集进行评估，我们发现，CAD 在三个公共数据集上的平均 F1 分数为 0.943，较state-of-the-art 方法高。我们的代码可以在上获取。

MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling

paper_url: http://arxiv.org/abs/2308.09725
repo_url: None
paper_authors: Ziwei Yang, Zheng Chen, Yasuko Matsubara, Yasushi Sakurai
for: 本研究旨在推动个性化医疗的发展，通过在肿瘤分型中利用多Omics数据来提高肿瘤分型结果。
methods: 本研究提出了MoCLIM Representation Learning框架，可以独立提取不同Omics模式下的有用特征，并通过对不同Omics模式的对照学习来获得更好的分型结果。
results: 在六个肿瘤数据集上，MoCLIM Representation Learning框架可以显著提高肿瘤分型结果，并且可以在更少的高维度肿瘤实例中实现更好的分型结果。此外，该框架还可以在医疗分析中提供高可读性。

Abstract
Precision medicine fundamentally aims to establish causality between dysregulated biochemical mechanisms and cancer subtypes. Omics-based cancer subtyping has emerged as a revolutionary approach, as different level of omics records the biochemical products of multistep processes in cancers. This paper focuses on fully exploiting the potential of multi-omics data to improve cancer subtyping outcomes, and hence developed MoCLIM, a representation learning framework. MoCLIM independently extracts the informative features from distinct omics modalities. Using a unified representation informed by contrastive learning of different omics modalities, we can well-cluster the subtypes, given cancer, into a lower latent space. This contrast can be interpreted as a projection of inter-omics inference observed in biological networks. Experimental results on six cancer datasets demonstrate that our approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances. Moreover, our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.

摘要
准精准医学目标是确定肿瘤分型中的生物化学机制异常。 omics 技术在肿瘤分类方面带来了革命性的变革，因为不同的 omics 数据记录了肿瘤多步骤过程中的生物化学产物。本文探讨了将多Omics 数据完全利用来改进肿瘤分类结果，因此开发了 MoCLIM 表示学框架。 MoCLIM 独立提取不同 omics Modalities 中的有用特征。通过对异 Omics Modalities 的异构学习，我们可以将肿瘤分类到更低的准则空间中。这种异构可以被解释为生物网络中跨Modalities 的推断。实验结果表明，我们的方法可以在更少的高维肿瘤实例中提高数据适应度和分类性能。此外，我们的框架还包括各种医学评估作为最后一个组件，提供了高度可读性在医学分析中。

Building Emotional Support Chatbots in the Era of LLMs

paper_url: http://arxiv.org/abs/2308.11584
repo_url: None
paper_authors: Zhonghua Zheng, Lizi Liao, Yang Deng, Liqiang Nie
for: 这篇论文目标是提出一种基于大语言模型（LLM）的情感支持对话集合创建方法，以推动情感支持机器人的实际应用。
methods: 该方法首先采用了仔细设计的对话生成模板，然后通过使用ChatGPT进行反射生成，创建了一个广泛的情感支持对话集合（ExTES）。接着，对LLaMA模型进行了高级调参技术，以优化情感支持交互的性能。
results: 结果表明，该模型在情感支持交互中表现出色，emarking a significant step forward in the field of emotional support bots, and paving the way for subsequent research and applications.

Abstract
The integration of emotional support into various conversational scenarios presents profound societal benefits, such as social interactions, mental health counseling, and customer service. However, there are unsolved challenges that hinder real-world applications in this field, including limited data availability and the absence of well-accepted model training paradigms. This work endeavors to navigate these challenges by harnessing the capabilities of Large Language Models (LLMs). We introduce an innovative methodology that synthesizes human insights with the computational prowess of LLMs to curate an extensive emotional support dialogue dataset. Our approach is initiated with a meticulously designed set of dialogues spanning diverse scenarios as generative seeds. By utilizing the in-context learning potential of ChatGPT, we recursively generate an ExTensible Emotional Support dialogue dataset, named ExTES. Following this, we deploy advanced tuning techniques on the LLaMA model, examining the impact of diverse training strategies, ultimately yielding an LLM meticulously optimized for emotional support interactions. An exhaustive assessment of the resultant model showcases its proficiency in offering emotional support, marking a pivotal step in the realm of emotional support bots and paving the way for subsequent research and implementations.

摘要
integración de apoyo emocional en diversas escenarios conversacionales ofrece beneficios sociales profundos, como interacciones sociales, consejería de salud mental y servicio al cliente. Sin embargo, existen desafíos sin resolver que impiden aplicaciones en el mundo real en este campo, como la limitaciones de datos y la ausencia de paradigmas de entrenamiento bien aceptados. Este trabajo busca superar estos desafíos mediante la combinación de la capacidad de modelos de lenguaje grande (LLMs) con la perspicacia humana. Presentamos un enfoque innovador que sintetiza la sabiduría humana con el poder computacional de LLMs para crear un gran conjunto de diálogos de apoyo emocional, llamado ExTES. Luego, aplicamos técnicas de ajuste avanzadas en el modelo LLaMA, examinando el impacto de diversas estrategias de entrenamiento, lo que permite optimizar meticulosamente el modelo para interacciones de apoyo emocional. Una evaluación exhaustiva del modelo resultante demuestra su habilidad para brindar apoyo emocional, lo que representa un paso importante en el campo de los bots de apoyo emocional y abre la puerta a investigaciones y aplicaciones subsiguientes.

Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing

paper_url: http://arxiv.org/abs/2308.08906
repo_url: None
paper_authors: Daniel Gibert, Giulio Zizzo, Quan Le
for: 防御深度学习恶意软件检测器受到针对性攻击
methods: 提议使用随机缩放方法来防御恶意软件检测器
results: 对BODMAS数据集进行实验表明，相比非粗糙分类器，我们的随机缩放模型具有更高的鲁棒性和泛化能力对抗针对性攻击

Abstract
Malware detectors based on deep learning (DL) have been shown to be susceptible to malware examples that have been deliberately manipulated in order to evade detection, a.k.a. adversarial malware examples. More specifically, it has been show that deep learning detectors are vulnerable to small changes on the input file. Given this vulnerability of deep learning detectors, we propose a practical defense against adversarial malware examples inspired by randomized smoothing. In our work, instead of employing Gaussian or Laplace noise when randomizing inputs, we propose a randomized ablation-based smoothing scheme that ablates a percentage of the bytes within an executable. During training, our randomized ablation-based smoothing scheme trains a base classifier based on ablated versions of the executable files. At test time, the final classification for a given input executable is taken as the class most commonly predicted by the classifier on a set of ablated versions of the original executable. To demonstrate the suitability of our approach we have empirically evaluated the proposed ablation-based model against various state-of-the-art evasion attacks on the BODMAS dataset. Results show greater robustness and generalization capabilities to adversarial malware examples in comparison to a non-smoothed classifier.

摘要
深度学习（DL）基于的恶意软件检测器有 been shown to be susceptible to deliberately manipulated malware examples that evade detection, a.k.a. adversarial malware examples. Specifically, it has been shown that deep learning detectors are vulnerable to small changes in the input file. Given this vulnerability of deep learning detectors, we propose a practical defense against adversarial malware examples inspired by randomized smoothing. In our work, instead of employing Gaussian or Laplace noise when randomizing inputs, we propose a randomized ablation-based smoothing scheme that ablates a percentage of the bytes within an executable. During training, our randomized ablation-based smoothing scheme trains a base classifier based on ablated versions of the executable files. At test time, the final classification for a given input executable is taken as the class most commonly predicted by the classifier on a set of ablated versions of the original executable. To demonstrate the suitability of our approach, we have empirically evaluated the proposed ablation-based model against various state-of-the-art evasion attacks on the BODMAS dataset. Results show greater robustness and generalization capabilities to adversarial malware examples in comparison to a non-smoothed classifier.

Development of a Knowledge Graph Embeddings Model for Pain

paper_url: http://arxiv.org/abs/2308.08904
repo_url: None
paper_authors: Jaya Chaturvedi, Tao Wang, Sumithra Velupillai, Robert Stewart, Angus Roberts
For: The paper aims to construct knowledge graph embedding models of pain concepts extracted from mental health electronic health records, combined with external knowledge from SNOMED CT, and evaluate their performance on a subject-object link prediction task.* Methods: The paper uses knowledge graph embedding models to represent pain concepts in a low-dimensional vector space, and combines them with external knowledge from SNOMED CT to enrich the graph. The models are evaluated on a subject-object link prediction task to assess their performance.* Results: The paper compares the performance of the knowledge graph embedding models with other baseline models, and evaluates their ability to predict subject-object links in the context of pain. The results show that the knowledge graph embedding models outperform the baseline models, demonstrating their effectiveness in capturing the complex relationships between pain concepts.

Abstract
Pain is a complex concept that can interconnect with other concepts such as a disorder that might cause pain, a medication that might relieve pain, and so on. To fully understand the context of pain experienced by either an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has been recorded in electronic health records. Knowledge graphs represent concepts and their relations by an interlinked network, enabling semantic and context-based reasoning in a computationally tractable form. These graphs can, however, be too large for efficient computation. Knowledge graph embeddings help to resolve this by representing the graphs in a low-dimensional vector space. These embeddings can then be used in various downstream tasks such as classification and link prediction. The various relations associated with pain which are required to construct such a knowledge graph can be obtained from external medical knowledge bases such as SNOMED CT, a hierarchical systematic nomenclature of medical terms. A knowledge graph built in this way could be further enriched with real-world examples of pain and its relations extracted from electronic health records. This paper describes the construction of such knowledge graph embedding models of pain concepts, extracted from the unstructured text of mental health electronic health records, combined with external knowledge created from relations described in SNOMED CT, and their evaluation on a subject-object link prediction task. The performance of the models was compared with other baseline models.

摘要
疼痛是一种复杂的概念，可以与其他概念相连接，例如一种疾病可能会引起疼痛，一种药物可能会缓解疼痛等。为了全面理解个人或人口所经历的疼痛，我们需要检查所有与疼痛相关的概念和它们之间的关系。这在电子医疗记录中模拟疼痛 particullary useful。知识图表表示概念和它们之间的关系为相互连接的网络，使得semantic和上下文基于的理解变得可计算化。这些图表可能太大，导致不可fficiente computation。知识图表嵌入帮助解决这个问题，将知识图表转化为低维度向量空间中的表示。这些嵌入可以在多种下游任务中使用，如分类和链接预测。为了构建这些知识图表嵌入模型，我们可以从电子精神医疗记录中提取疼痛相关的文本信息，并与外部医学知识库SNOMED CT中的概念关系结合。SNOMED CT是一种层次系统atic的医学术语系统，可以提供疼痛相关的外部知识。通过将这些知识图表嵌入模型与电子精神医疗记录中的疼痛实例进行结合，我们可以进一步丰富知识图表，并在subject-object链接预测任务上评估这些模型的性能。在比较baseline模型的基础上，我们发现这些模型在这个任务上表现出色。

Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

paper_url: http://arxiv.org/abs/2308.08858
repo_url: None
paper_authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang
for: 本研究探讨了两个玩家零 SUM Markov 游戏中的理论研究，尤其是在考虑 finite-horizon episodic Markov decision processes (MDPs) 中。
methods: 本研究提出了一种基于 stage-based Q-learning 的模型自由算法，并证明了它可以达到与最佳模型基于算法相同的样本复杂度，即 $O(H^3SAB/\epsilon^2) $。
results: 本研究显示了模型自由算法可以在 Markov 游戏中实现 $\epsilon$-优 Nash 平衡 (NE)，并且在样本复杂度上与最佳模型基于算法相同。此外，本研究还提出了一种基于 reference-advantage decomposition 的变量减少技术，以提高样本效率。

Abstract
The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $\epsilon$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/\epsilon^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.

摘要
“两Player零 SUM Markov 游戏问题在多智能RL理论研究中已经吸引了越来越多的关注。特别是在finite-horizon episodic Markov decision processes（MDPs）中，已经证明了模型基于算法可以在$O(H^3SAB/\epsilon^2)$的样本复杂度下找到$\epsilon$-优 Nash Equilibrium（NE），这是很依赖于Horizon $H$和状态数 $S$（其中 $A$ 和 $B$ 分别表示两个玩家的动作数）。然而，现有的模型free算法无法达到这种优化。在这项工作中，我们提出了一种模型free stage-based Q-学习算法，并证明它可以 дости到与最佳模型基于算法相同的样本复杂度，因此首次确立了模型free算法可以在$H$ 的依赖性上达到同样的优化。主要改进来自于variance reduction技术，基于reference-advantage decomposition，这种技术在单个RL中已经使用了很长时间，但是在Markov 游戏中它无法使用，因为policy更新通过coarse correlated equilibrium（CCE）论点。因此，为了将这种技术扩展到Markov 游戏，我们的算法具有一个关键的新特点，即在历史中更新参 referential value functions的方法，使其值差为历史中最小，以实现所需的样本效率提升。”

D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field

paper_url: http://arxiv.org/abs/2308.08857
repo_url: https://github.com/psyai-net/d-if_release
paper_authors: Xueting Yang, Yihao Luo, Yuliang Xiu, Wei Wang, Hao Xu, Zhaoxin Fan
for: 这 paper 的目的是提出一种基于深度隐式函数的图像基于3D人体重建方法，以实现高级别的真实人体模拟。
methods: 这 paper 使用的方法是基于深度隐式函数的图像基于3D人体重建方法，并将 implicit value 替换为 adaptive uncertainty distribution，以 differentiate Points based on their distance to the surface。
results: compared to nearly all baselines, the models trained using the uncertainty distribution loss proposed in this paper can capture more intricate wrinkles and realistic limbs, and demonstrate significant improvements.

Abstract
Realistic virtual humans play a crucial role in numerous industries, such as metaverse, intelligent healthcare, and self-driving simulation. But creating them on a large scale with high levels of realism remains a challenge. The utilization of deep implicit function sparks a new era of image-based 3D clothed human reconstruction, enabling pixel-aligned shape recovery with fine details. Subsequently, the vast majority of works locate the surface by regressing the deterministic implicit value for each point. However, should all points be treated equally regardless of their proximity to the surface? In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface. This simple ``value to distribution'' transition yields significant improvements on nearly all the baselines. Furthermore, qualitative results demonstrate that the models trained using our uncertainty distribution loss, can capture more intricate wrinkles, and realistic limbs. Code and models are available for research purposes at https://github.com/psyai-net/D-IF_release.

摘要
现实型人体在虚拟世界、智能医疗和自动驾驶等领域扮演着关键角色，但在大规模创造高真实度人体模型上存在挑战。深度隐函数推发了一新的图像基于3D人体重建时代，使得像素对应的形态恢复得到了细节。然而，Should all points be treated equally regardless of their proximity to the surface?在本文中，我们提出将implizit值换为 adaptive uncertainty distribution，以 differentiate between points based on their distance to the surface。这种简单的“值到分布”转换带来了大量基础上的改进。此外，qualitative results表明，使用我们的uncertainty distribution损失来训练模型，可以捕捉更复杂的皱纹和真实的肢体。可以在https://github.com/psyai-net/D-IF_release上获取代码和模型用于研究purpose。

BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

paper_url: http://arxiv.org/abs/2308.11527
repo_url: None
paper_authors: Dong Wang, Kavé Salamatian, Yunqing Xia, Weiwei Deng, Qi Zhiang
for: 这个研究的目的是提出一个新的框架BERT4CTR，以解决在Click-Through-Rate（CTR）预测中融合预训语言模型和多媒体输入的挑战。
methods: 本研究使用Uni-Attention机制，允许不同类型的输入（文本和非文本）之间的互动，并维持训练和推导时间成本低。
results: 实验结果显示，BERT4CTR可以与现有的州OF-the-art框架相比，在处理多媒体输入和CTR预测中表现出色，并且具有较低的训练和推导时间成本。

Abstract
Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging. Up to now two directions have been explored to integrate multi-modal inputs in fine-tuning of pre-trained language models. One consists of fusing the outcome of language models and non-textual features through an aggregation layer, resulting into ensemble framework, where the cross-information between textual and non-textual inputs are only learned in the aggregation layer. The second one consists of splitting non-textual features into fine-grained fragments and transforming the fragments to new tokens combined with textual ones, so that they can be fed directly to transformer layers in language models. However, this approach increases the complexity of the learning and inference because of the numerous additional tokens. To address these limitations, we propose in this work a novel framework BERT4CTR, with the Uni-Attention mechanism that can benefit from the interactions between non-textual and textual features while maintaining low time-costs in training and inference through a dimensionality reduction. Comprehensive experiments on both public and commercial data demonstrate that BERT4CTR can outperform significantly the state-of-the-art frameworks to handle multi-modal inputs and be applicable to CTR prediction.

摘要

Fuse the output of language models and non-textual features through an aggregation layer, resulting in an ensemble framework, where the interactions between textual and non-textual inputs are only learned in the aggregation layer.2. Split non-textual features into fine-grained fragments and transform them into new tokens combined with textual ones, so that they can be fed directly into transformer layers in language models. However, this approach increases the complexity of training and inference due to the numerous additional tokens.To address these limitations, we propose a novel framework called BERT4CTR, which utilizes the Uni-Attention mechanism to benefit from the interactions between non-textual and textual features while maintaining low time costs in training and inference through dimensionality reduction. Extensive experiments on both public and commercial data demonstrate that BERT4CTR can significantly outperform state-of-the-art frameworks in handling multi-modal inputs and be applicable to CTR prediction.

CMB: A Comprehensive Medical Benchmark in Chinese

paper_url: http://arxiv.org/abs/2308.08833
repo_url: https://github.com/FreedomIntelligence/CMB
paper_authors: Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li
for: 这篇论文的目的是提出一个基于中药文化的本地化医疗标准，以便评估和发展大型语言模型（LLMs）在医疗领域的表现。
methods: 这篇论文使用了一个名为CMB的本地化医疗标准，评估了多达几个大型语言模型，包括ChatGPT、GPT-4、中药模型和医疗领域专门的模型。
results: 根据CMB的评估结果，这些大型语言模型在医疗领域的表现有所不同，并且发现了一些地区特有的语言特征。

Abstract
Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in \textit{contextual incongruities} to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. It is worth noting that our benchmark is not devised as a leaderboard competition but as an instrument for self-assessment of model advancements. We hope this benchmark could facilitate the widespread adoption and enhancement of medical LLMs within China. Check details in \url{https://cmedbenchmark.llmzoo.com/}.

摘要
大语言模型（LLMs）为医学领域提供了一个可能性，以确定重要的突破点。建立一个标准化的医学标准became a fundamental cornerstone to measure progress. However, different regions have their own local characteristics, such as the prevalence and significance of traditional Chinese medicine within China. Therefore, simply translating English-based medical evaluation may result in 上下文不符 incongruities in a local region. To solve this issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is an integral part of this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. It is worth noting that our benchmark is not designed as a leaderboard competition but as an instrument for self-assessment of model advancements. We hope this benchmark could facilitate the widespread adoption and enhancement of medical LLMs within China. For more details, please visit \url{https://cmedbenchmark.llmzoo.com/}.

Lifted Algorithms for Symmetric Weighted First-Order Model Sampling

paper_url: http://arxiv.org/abs/2308.08828
repo_url: None
paper_authors: Yuanhong Wang, Juhua Pu, Yuyi Wang, Ondřej Kuželka
for: 本文 targets the problem of weighted model sampling (WMS) in first-order logic, and explores whether WMS can be solved efficiently like domain-liftable model counting problems.
methods: 本文提出了一种快速的抽样算法，用于解决Weighted Model Sampling (WMS)问题，该算法基于首ORDER 逻辑中的 counting quantifiers。
results: 本文证明了 WMS 问题在 two-variables fragment 中是可采样的，并提供了一种快速的抽样算法，其运行时间与域大小成正比。此外，本文还证明了这一结论在 cardinality constraints 的存在下仍然成立。实验结果表明，该算法比现有的 WMS 抽样器快得多，证实了理论结论。

Abstract
Weighted model counting (WMC) is the task of computing the weighted sum of all satisfying assignments (i.e., models) of a propositional formula. Similarly, weighted model sampling (WMS) aims to randomly generate models with probability proportional to their respective weights. Both WMC and WMS are hard to solve exactly, falling under the $\#\mathsf{P}$-hard complexity class. However, it is known that the counting problem may sometimes be tractable, if the propositional formula can be compactly represented and expressed in first-order logic. In such cases, model counting problems can be solved in time polynomial in the domain size, and are known as domain-liftable. The following question then arises: Is it also the case for weighted model sampling? This paper addresses this question and answers it affirmatively. Specifically, we prove the domain-liftability under sampling for the two-variables fragment of first-order logic with counting quantifiers in this paper, by devising an efficient sampling algorithm for this fragment that runs in time polynomial in the domain size. We then further show that this result continues to hold even in the presence of cardinality constraints. To empirically verify our approach, we conduct experiments over various first-order formulas designed for the uniform generation of combinatorial structures and sampling in statistical-relational models. The results demonstrate that our algorithm outperforms a start-of-the-art WMS sampler by a substantial margin, confirming the theoretical results.

摘要
“加重模型计数（WMC）是计算一个 propositional 式中所有满足 assignment（即模型）的加重和的任务。相似地，加重模型采样（WMS）目标是随机生成权重 proportional 的模型。两者都属于 $\#\mathsf{P}$-hard 复杂性类别。然而，已知在某些情况下，计数问题可能是可解的，如果 propositional 式可以简洁地表示并表示在第一频频论 logic 中。在这种情况下，模型计数问题可以在域大小的时间复杂度内解决，并被称为域可提升的。这个问题的下一个问题是：加重模型采样是否也是可提升的？这篇论文回答了这个问题，并答曰：是的。具体来说，我们证明在两个变量的 Fragment 中的 first-order 逻辑中，加重模型采样是可提升的，并提供了一种高效的采样算法，其时间复杂度与域大小成正比。我们再次证明了在具有 cardinality 约束时，这个结果仍然保持。为确认我们的方法，我们对多种适用于统计关系模型和采样的 first-order 方程进行了实验，结果表明我们的算法在相对较大的域大小下表现更好，证明了我们的 теоретичеResult。”

Do you really follow me? Adversarial Instructions for Evaluating the Robustness of Large Language Models

paper_url: http://arxiv.org/abs/2308.10819
repo_url: None
paper_authors: Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan
for: 本研究旨在评估大语言模型（LLM）对针对性指令的Robustness，以确保其安全地部署在实际应用中。
methods: 本研究提出了一种探索LLM对针对性指令的Robustness的benchmark，通过测试当今的 instruction-following LLMs，发现了这些模型对针对性指令攻击的有限性。
results: 研究发现，现有的 instruction-tuned 模型容易被针对性指令攻击，并且这些模型会被训练以仅完成提示中的指令而不真正理解提示的含义。这 highlights 需要Addressing the challenge of training models to comprehend prompts instead of merely following instruction phrases and completing the text.

Abstract
Large Language Models (LLMs) have shown remarkable proficiency in following instructions, making them valuable in customer-facing applications. However, their impressive capabilities also raise concerns about the amplification of risks posed by adversarial instructions, which can be injected into the model input by third-party attackers to manipulate LLMs' original instructions and prompt unintended actions and content. Therefore, it is crucial to understand LLMs' ability to accurately discern which instructions to follow to ensure their safe deployment in real-world scenarios. In this paper, we propose a pioneering benchmark for automatically evaluating the robustness of LLMs against adversarial instructions. The objective of this benchmark is to quantify the extent to which LLMs are influenced by injected adversarial instructions and assess their ability to differentiate between these adversarial instructions and original user instructions. Through experiments conducted with state-of-the-art instruction-following LLMs, we uncover significant limitations in their robustness against adversarial instruction attacks. Furthermore, our findings indicate that prevalent instruction-tuned models are prone to being overfitted to follow any instruction phrase in the prompt without truly understanding which instructions should be followed. This highlights the need to address the challenge of training models to comprehend prompts instead of merely following instruction phrases and completing the text.

摘要
大型语言模型（LLMs）在客户面前的应用中表现出了惊人的能力，但这也使人们对其面临的风险的增强表示担忧。这些印象的能力可以通过第三方攻击者通过模型输入插入恶意指令来控制LLMs的原始指令和让其生成不良内容。因此，了解LLMs如何准确地执行原始指令是关键。在这篇论文中，我们提出了一个先进的benchmark来自动评估LLMs对恶意指令的抵抗力。本benchmark的目标是量化LLMs对插入的恶意指令的影响和评估它们是否能够分辨恶意指令和原始用户指令。经过对当今最先进的指令执行LLMs进行实验，我们发现了这些模型对恶意指令攻击的有限性。此外，我们的发现表明，现有的指令训练模型容易被适应过度，以至于无法真正理解需要执行的指令，而是仅仅完成了提示中的指令。这highlights the need to address the challenge of training models to comprehend prompts instead of merely following instruction phrases and completing the text.

Capturing Popularity Trends: A Simplistic Non-Personalized Approach for Enhanced Item Recommendation

paper_url: http://arxiv.org/abs/2308.08799
repo_url: https://github.com/jingxiaoyi/pare
paper_authors: Jiazheng Jing, Yinan Zhang, Xin Zhou, Zhiqi Shen
For: 本研究提出了一种基于Item Popularity的推荐方法（PARE），以优化现有的推荐方法。* Methods: PARE包括四个模块，每个模块关注不同的方面：历史流行度、时间影响、周期性影响以及侧信息。 finally，一个注意层用于将四个模块的输出融合。* Results: 对于多个数据集的实验表明，PARE可以与现有的先进推荐方法相比，或者甚至超越它们。此外，将PARE与现有的推荐方法结合使用可以显著提高性能。

Abstract
Recommender systems have been gaining increasing research attention over the years. Most existing recommendation methods focus on capturing users' personalized preferences through historical user-item interactions, which may potentially violate user privacy. Additionally, these approaches often overlook the significance of the temporal fluctuation in item popularity that can sway users' decision-making. To bridge this gap, we propose Popularity-Aware Recommender (PARE), which makes non-personalized recommendations by predicting the items that will attain the highest popularity. PARE consists of four modules, each focusing on a different aspect: popularity history, temporal impact, periodic impact, and side information. Finally, an attention layer is leveraged to fuse the outputs of four modules. To our knowledge, this is the first work to explicitly model item popularity in recommendation systems. Extensive experiments show that PARE performs on par or even better than sophisticated state-of-the-art recommendation methods. Since PARE prioritizes item popularity over personalized user preferences, it can enhance existing recommendation methods as a complementary component. Our experiments demonstrate that integrating PARE with existing recommendation methods significantly surpasses the performance of standalone models, highlighting PARE's potential as a complement to existing recommendation methods. Furthermore, the simplicity of PARE makes it immensely practical for industrial applications and a valuable baseline for future research.

摘要
很多研究者在过去几年中对推荐系统进行了逐渐增长的研究。大多数现有的推荐方法都是通过历史用户项交互来捕捉用户个性化的偏好，这可能会违反用户隐私。此外，这些方法通常忽视了item popularity的时间变化，这可能会影响用户做出决策。为了bridging这个差距，我们提出了Popularity-Aware Recommender（PARE），它通过预测item popularity来提供非个性化推荐。PARE包括四个模块，每个模块都专注于不同的方面：popularity history、temporal impact、periodic impact和side information。最后，我们使用了注意层来融合四个模块的输出。据我们所知，这是首次在推荐系统中显式地模型item popularity。我们的实验表明，PARE可以与现有的先进推荐方法相比，并且在一些情况下可以更好。由于PARE强调item popularity而不是个性化用户偏好，因此它可以增强现有的推荐方法，作为补充组件。我们的实验还表明，将PARE与现有的推荐方法集成可以大幅超越单独的模型性能，这说明PARE的潜在价值。此外，PARE的简单性使得它在实际应用中非常实用，并且成为未来研究的优秀基准。

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data

paper_url: http://arxiv.org/abs/2308.11646
repo_url: None
paper_authors: Xinting Liao, Chaochao Chen, Weiming Liu, Pengyang Zhou, Huabin Zhu, Shuheng Shen, Weiqiang Wang, Mengling Hu, Yanchao Tan, Xiaolin Zheng
for: 提高 Federated Learning 在实际应用中的效果，特别是在非独立和 identical 数据 setting 中。
methods: 提出了两个主要模块：本地关系增强（LRA）和全局尼亚希尔贝克（GNE），用于同时解决客户端间和客户端内的不一致性。
results: 经过了四个 benchmark 数据集的广泛实验，证明 FedRANE 可以在非独立和 identical 数据 setting 中提高 Federated Learning 的性能。

Abstract
Federated learning (FL) is a distributed machine learning paradigm that needs collaboration between a server and a series of clients with decentralized data. To make FL effective in real-world applications, existing work devotes to improving the modeling of decentralized data with non-independent and identical distributions (non-IID). In non-IID settings, there are intra-client inconsistency that comes from the imbalanced data modeling, and inter-client inconsistency among heterogeneous client distributions, which not only hinders sufficient representation of the minority data, but also brings discrepant model deviations. However, previous work overlooks to tackle the above two coupling inconsistencies together. In this work, we propose FedRANE, which consists of two main modules, i.e., local relational augmentation (LRA) and global Nash equilibrium (GNE), to resolve intra- and inter-client inconsistency simultaneously. Specifically, in each client, LRA mines the similarity relations among different data samples and enhances the minority sample representations with their neighbors using attentive message passing. In server, GNE reaches an agreement among inconsistent and discrepant model deviations from clients to server, which encourages the global model to update in the direction of global optimum without breaking down the clients optimization toward their local optimums. We conduct extensive experiments on four benchmark datasets to show the superiority of FedRANE in enhancing the performance of FL with non-IID data.

摘要
Federated learning (FL) 是一种分布式机器学习 paradigma，需要服务器和多个客户端之间的合作，以便处理 Decentralized 数据。为了在实际应用中使 FL 有效，现有的工作努力于非独立和同分布（non-IID）数据的模型化。在非独立 Setting 中，有内部客户端不一致性，来自不均衡数据模型的差异，不仅妨碍了足够表示少数据，还导致了不同客户端的模型偏差不同。然而，过去的工作忽视了 simultaneous 处理上述两种 coupling inconsistency。在这种工作中，我们提议 FedRANE，它包括两个主要模块：本地关系增强（LRA）和全局纳什均衡（GNE）。具体来说，在每个客户端上，LRA 挖掘不同数据样本之间的相似关系，并通过对注意力传递来增强少数据表示。在服务器端，GNE 达成了客户端和服务器之间的一致，以避免客户端优化方向和服务器优化方向之间的冲突。我们在四个 benchmark 数据集上进行了广泛的实验，以显示 FedRANE 在非独立数据上提高 FL 的性能。

Bayesian polynomial neural networks and polynomial neural ordinary differential equations

paper_url: http://arxiv.org/abs/2308.10892
repo_url: None
paper_authors: Colby Fronk, Jaewoong Yun, Prashant Singh, Linda Petzold
for: 这些方法用于 recovering 多种科学和工程问题中的方程模型，但是它们只提供点估计方法，无法处理噪音数据。
methods: 我们使用 Laplace 近似、Markov Chain Monte Carlo (MCMC) 采样方法和 variational inference 进行 Bayesian 推理。
results: 我们发现 Laplace 近似是这类问题中最佳的方法。我们的工作可以轻松扩展到符号神经网络中的更广泛类型。

Abstract
Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.

摘要
Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) 是两种最新和强大的方法 для解决许多科学和工程问题中的方程回归问题。然而，这些方法只能提供点估计方法参数，并且无法处理噪声数据。我们通过开发和验证以下抽象推理方法来解决这个挑战：拉普拉斯逼近、Markov Chain Monte Carlo（MCMC）采样方法和variational推理。我们发现拉普拉斯逼近是这类问题中最佳的方法。我们的工作可以轻松扩展到符号化神经网络中的更广泛的类型。

CodeCoT and Beyond: Learning to Program and Test like a Developer

paper_url: http://arxiv.org/abs/2308.08784
repo_url: None
paper_authors: Dong Huang, Qingwen Bu, Heming Cui
For: The paper aims to improve the code generation accuracy of transformer-based large language models (LLMs) like GPT-x models, which often encounter challenges when handling tasks that differ from their training data.* Methods: The paper proposes two components: Vanilla CodeCoT and Self-exam CodeCoT. The Self-exam CodeCoT incorporates self-examination, allowing the model to iteratively generate code, formulate test cases, and refine its outputs.* Results: The paper reports significant enhancements in code generation accuracy across various LLM variants, with the Self-exam CodeCoT approach achieving an unprecedented pass@1 accuracy of 79.27% on the gpt-3.5-turbo-0613 model in the HumanEval dataset.Here are the three points in Simplified Chinese:
for: 这篇论文目标是提高基于转换器的大型自然语言处理模型（LLM）如GPT-x模型的代码生成精度，这些模型经常在不同于训练数据的任务中遇到挑战。
methods: 论文提出了两个组成部分：普通的CodeCoT和自我评估的CodeCoT。后者将自我评估纳入到模型中，使其可以顺序生成代码，制定测试例子，并改进其输出。
results: 论文发现，使用不同的LLM变体时，CodeCoT技术均能够显著提高代码生成精度，而Self-exam CodeCoT方法在gpt-3.5-turbo-0613模型上的HumanEval数据集上达到了历史最高的pass@1准确率为79.27%。

Abstract
In natural language processing, transformer-based large language models (LLMs) like GPT-x models developed by OpenAI have revolutionized the landscape. Despite their impressive capabilities, these models often encounter challenges when handling tasks that differ from their training data, resulting in compromised performance. To address this, few-shot learning has emerged as a valuable technique, allowing LLMs to adapt with minimal task-specific data. One innovative strategy, known as Chain-of-Thought Prompting (CoT), has been introduced to guide LLMs in revealing cognitive processes during multi-step reasoning. In this paper, we propose Code Chain-of-Thought~(CodeCoT), which consists of two components: the Vanilla CodeCoT and the Self-exam CodeCoT. The latter incorporates self-examination, empowering the model to iteratively generate code, formulate test cases, and refine its outputs. Specifically, the process entails the generation of test examples by the model corresponding to the code it is tasked to implement. If it fails on the test examples, then it regenerates the code based on the erroneous code and associated error types. Through comprehensive experiments, we observed that both techniques significantly enhance code generation accuracy across various LLM variants. Our evaluation results reveal that CodeCoT improves the code generation effectiveness, including an unprecedented pass@1 accuracy of 79.27\% using the Self-exam CodeCoT approach on the gpt-3.5-turbo-0613 model in the HumanEval dataset.

摘要
在自然语言处理领域，基于转换器的大语言模型（LLM）如OpenAI开发的GPT-x模型已经革命化了景观。尽管它们具有印象的能力，但它们在处理不同于训练数据的任务时经常遇到挑战，导致性能受损。为解决这个问题，几何学学习（few-shot learning）已经成为一种有价值的技术，允许LLM在最小的任务特定数据上适应。在这篇论文中，我们提出了代码链条提示（CodeCoT）技术，它包括两个组成部分：简单的CodeCoT和自我检验CodeCoT。后者在模型生成代码、编写测试例子和改进输出过程中，具有自我检验能力。具体来说，该过程包括由模型生成测试例子，并将其与代码相关的错误类型进行关联。我们通过了详细的实验，发现CodeCoT技术可以在不同的LLM变体上提高代码生成精度。我们的评估结果表明，使用Self-exam CodeCoT方法，gpt-3.5-turbo-0613模型在人类评估数据集上达到了历史上无 precedent的pass@1准确率为79.27%。

Knowledge-inspired Subdomain Adaptation for Cross-Domain Knowledge Transfer

paper_url: http://arxiv.org/abs/2308.09724
repo_url: None
paper_authors: Liyue Chen, Linian Wang, Jinyu Xu, Shuai Chen, Weiqiang Wang, Wenbiao Zhao, Qiyu Li, Leye Wang
for: 这篇论文是针对cross-domain fraud detection和traffic demand prediction领域的深度领域适应技术。
methods: 本文提出了一个novel的Knowledge-Inspired Subdomain Adaptation（KISA）框架，包括提供了理论底下 Shared Expected Loss的最小化，以及知识驱动的子领域分配问题和知识融合网络等。
results: 实验结果显示，KISA在骗贾检测和交通需求预测任务中获得了杰出的成绩。

Abstract
Most state-of-the-art deep domain adaptation techniques align source and target samples in a global fashion. That is, after alignment, each source sample is expected to become similar to any target sample. However, global alignment may not always be optimal or necessary in practice. For example, consider cross-domain fraud detection, where there are two types of transactions: credit and non-credit. Aligning credit and non-credit transactions separately may yield better performance than global alignment, as credit transactions are unlikely to exhibit patterns similar to non-credit transactions. To enable such fine-grained domain adaption, we propose a novel Knowledge-Inspired Subdomain Adaptation (KISA) framework. In particular, (1) We provide the theoretical insight that KISA minimizes the shared expected loss which is the premise for the success of domain adaptation methods. (2) We propose the knowledge-inspired subdomain division problem that plays a crucial role in fine-grained domain adaption. (3) We design a knowledge fusion network to exploit diverse domain knowledge. Extensive experiments demonstrate that KISA achieves remarkable results on fraud detection and traffic demand prediction tasks.

摘要
大多数当今深度领域适应技术都是在全局方式对源和目标样本进行对齐。即 после对齐，每个源样本都应该变得与任何目标样本相似。然而，全局对齐可能并不是最佳或必需的在实践中。例如，考虑cross-domain fraud detection，其中有两类交易：信用和非信用。对这两类交易进行分别对齐可能会提高性能，因为信用交易很 unlikely to exhibit patterns similar to non-credit transactions。为实现这种细致的领域适应，我们提出了一个novel Knowledge-Inspired Subdomain Adaptation（KISA） frameworks。具体来说，我们提供了以下三个方面：1. 我们提供了对KISA的理论启示，即KISA最小化了共享预期损失，这是领域适应方法的成功前提。2. 我们提出了基于知识的子领域分配问题，这在细致领域适应中扮演着关键性的角色。3. 我们设计了一个知识融合网络，以利用不同领域的知识。我们的实验证明，KISA在fraud detection和交通需求预测任务上表现出了很好的 result。

Exploring Demonstration Ensembling for In-context Learning

paper_url: http://arxiv.org/abs/2308.08780
repo_url: https://github.com/mukhal/icl-ensembling
paper_authors: Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang
for: 强化 язы言模型（LM）的学习，使其能够更好地完成给定任务。
methods: 使用示例集（demonstrations）进行强化学习，并研究不同的ensemble方法。
results: 比 concatenation 方法更高效，可以提高模型的预测精度。weighted max ensemble 方法在12种语言任务上表现出色，比 concatenation 方法提高了2.4个平均点。

Abstract
In-context learning (ICL) operates by showing language models (LMs) examples of input-output pairs for a given task, i.e., demonstrations. The standard approach for ICL is to prompt the LM with concatenated demonstrations followed by the test input. This approach suffers from some issues. First, concatenation offers almost no control over the contribution of each demo to the model prediction. This can be sub-optimal when some demonstrations are irrelevant to the test example. Second, due to the input length limit of some transformer models, it might be infeasible to fit many examples into the context, especially when dealing with long-input tasks. In this work, we explore Demonstration Ensembling (DENSE) as an alternative to simple concatenation. DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and then combines the output probabilities resulting from each subset to produce the final prediction. We study different ensembling methods using GPT-j and experiment on 12 language tasks. Our experiments show weighted max ensembling to outperform vanilla concatenation by as large as 2.4 average points. Code available at https://github.com/mukhal/icl-ensembling.

摘要
启发式学习（ICL）通过显示语言模型（LM）输入输出对的示例来操作，即示例。标准的ICL方法是在LM中提示示例后跟测试输入。这种方法存在一些问题。首先， concatenation 无法控制每个示例对模型预测的贡献。这可能是不优的when some demonstrations 无关于测试示例。其次，由于一些转换器模型的输入长度限制，可能无法将多个示例放入上下文中，特别是处理长输入任务。在这项工作中，我们探讨使用Demonstration Ensembling（DENSE）作为concatentation的替代方案。DENSE 使用示例集（i.e., bucket）预测输出，然后将每个集的输出概率组合成最终预测。我们研究不同的 ensemble 方法使用 GPT-j，并在 12 种语言任务上进行实验。我们的实验结果显示 weighted max ensembling 可以和concatentation 相比，在12种语言任务上出performances by as large as 2.4 average points。代码可以在 https://github.com/mukhal/icl-ensembling 上获取。

Large Language Models at Work in China’s Labor Market

paper_url: http://arxiv.org/abs/2308.08776
repo_url: None
paper_authors: Qin Chen, Jinfeng Ge, Huaqing Xie, Xingcheng Xu, Yanqing Yang
for: This paper explores the potential impacts of large language models (LLMs) on the Chinese labor market, with a focus on understanding the displacement risks for high-paying and experience-intensive jobs.
methods: The paper uses a methodology that incorporates human expertise and LLM classifications to analyze occupational exposure to LLM capabilities, and aggregates occupation exposure to the industry level to obtain industry exposure scores.
results: The results indicate a positive correlation between occupation exposure and wage levels/experience premiums, suggesting that higher-paying and experience-intensive jobs may face greater displacement risks from LLM-powered software. The industry exposure scores align with expert assessments and economic intuitions, and the study provides an analytical basis for understanding the labor market impacts of increasingly capable AI systems in China.

Abstract
This paper explores the potential impacts of large language models (LLMs) on the Chinese labor market. We analyze occupational exposure to LLM capabilities by incorporating human expertise and LLM classifications, following Eloundou et al. (2023)'s methodology. We then aggregate occupation exposure to the industry level to obtain industry exposure scores. The results indicate a positive correlation between occupation exposure and wage levels/experience premiums, suggesting higher-paying and experience-intensive jobs may face greater displacement risks from LLM-powered software. The industry exposure scores align with expert assessments and economic intuitions. We also develop an economic growth model incorporating industry exposure to quantify the productivity-employment trade-off from AI adoption. Overall, this study provides an analytical basis for understanding the labor market impacts of increasingly capable AI systems in China. Key innovations include the occupation-level exposure analysis, industry aggregation approach, and economic modeling incorporating AI adoption and labor market effects. The findings will inform policymakers and businesses on strategies for maximizing the benefits of AI while mitigating adverse disruption risks.

摘要

Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models

paper_url: http://arxiv.org/abs/2308.08774
repo_url: None
paper_authors: Phillip Rust, Anders Søgaard
for: 本文目的是探讨语言模型如mBERT、XLM-R和BLOOM在多语言泛化和压缩方面是否可以同时满足隐私、语言公平和透明性的要求。
methods: 本文使用了多语言压缩和语言公平性来评估这些模型的隐私、语言公平和透明性。
results: 研究发现，多语言压缩和语言公平性可以同时满足隐私要求，但是隐私和训练数据的影响稀缺性之间存在矛盾。研究还进行了多种NLP任务的实验，并评估了不同隐私保证下的多语言压缩和训练数据影响的质量。结果表明，需要开发新的方法来同时优化这些目标，以找到实际的让步。

Abstract
Language models such as mBERT, XLM-R, and BLOOM aim to achieve multilingual generalization or compression to facilitate transfer to a large number of (potentially unseen) languages. However, these models should ideally also be private, linguistically fair, and transparent, by relating their predictions to training data. Can these requirements be simultaneously satisfied? We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency. We further present a series of experiments on two common NLP tasks and evaluate multilingual compression and training data influence sparsity under different privacy guarantees, exploring these trade-offs in more detail. Our results suggest that we need to develop ways to jointly optimize for these objectives in order to find practical trade-offs.

摘要
Language models like mBERT、XLM-R和BLOOM目的是实现多语言通用或压缩，以便将模型转移到大量（可能未看过）语言上。然而，这些模型应该也是私人的，公平的，透明的，对模型预测的关系。是这些需求可以同时满足呢？我们显示，多语言压缩和语言公平是可以同时满足 differential privacy 的，但是 differential privacy 与训练数据影响稀缺是不可能同时满足的。我们进一步发现了在两种常见的 NLP 任务上对多语言压缩和训练数据影响稀缺进行了多种实验，并详细分析了这些负担的交叉关系。我们的结果表明，我们需要开发一些方法来同时优化这些目标，以找到实际的平衡点。

Sensor Fusion by Spatial Encoding for Autonomous Driving

paper_url: http://arxiv.org/abs/2308.10707
repo_url: None
paper_authors: Quoc-Vinh Lai-Dang, Jihui Lee, Bumgeun Park, Dongsoo Har
for: 本研究旨在探讨摄像头和激光雷达数据融合的问题，以提高自动驾驶和机器人视觉系统的性能。
methods: 本研究使用了Transformer模块，并在不同的分辨率下应用了多个Transformer模块，以有效地结合本地和全局上下文关系。
results: 对于两个挑战性的测试基准，提出的方法比前一些方法显著提高了驾驶和违法分数，相比TransFuser，本方法在Longest6和Town05 Long bencmarks上的驾驶分数提高8%和19%。

Abstract
Sensor fusion is critical to perception systems for task domains such as autonomous driving and robotics. Recently, the Transformer integrated with CNN has demonstrated high performance in sensor fusion for various perception tasks. In this work, we introduce a method for fusing data from camera and LiDAR. By employing Transformer modules at multiple resolutions, proposed method effectively combines local and global contextual relationships. The performance of the proposed method is validated by extensive experiments with two adversarial benchmarks with lengthy routes and high-density traffics. The proposed method outperforms previous approaches with the most challenging benchmarks, achieving significantly higher driving and infraction scores. Compared with TransFuser, it achieves 8% and 19% improvement in driving scores for the Longest6 and Town05 Long benchmarks, respectively.

摘要
感知系统中的感知融合是自动驾驶和机器人等任务域的关键技术。最近，以Transformer和CNN结合的方法在感知融合中表现出色。在这种方法中，我们提出了将数码和LiDAR数据进行融合的方法，通过在多个分辨率下使用Transformer模块，有效地组合了本地和全局上下文关系。我们在两个挑战性的测试基准上进行了广泛的实验，并证明了我们的方法在最具挑战性的情况下表现出色，比之前的方法提高了驾驶和违法分数。相比于TransFuser，我们的方法在Longest6和Town05 Long benchmark上提高了8%和19%的驾驶分数。

Discrete Prompt Compression with Reinforcement Learning

paper_url: http://arxiv.org/abs/2308.08758
repo_url: None
paper_authors: Hoyoun Jung, Kyung-Joong Kim
for: 本研究旨在提出一种基于强化学习的提示压缩方法，以解决现有方法具有多个含义的 embedding 问题，提高可读性、可重用性和适用性。
methods: 本研究使用了一种名为 PCRL 的计算效率高的政策网络，直接编辑提示，以实现提示压缩。 PCRL 可以适应不同类型的 LM，包括逻辑机制和解码器-编码器架构，并可以在不使用梯度访问 LM 或标注数据的情况下进行训练。
results: 研究发现，PCRL 可以实现提示Token 的平均减少24.6%，同时保持性能。此外，我们还证明了Policy 可以被传递到更大的 LM 上，并通过不同的分析，帮助理解提示中Token 的重要性。

Abstract
Instruction-tuned Language Models (LMs) are widely used by users to address various problems with task-specific prompts. Constraints associated with the context window length and computational costs encourage the development of compressed prompts. Existing methods rely heavily on training embeddings, which are designed to accommodate multiple token meanings. This presents challenges in terms of interpretability, a fixed number of embedding tokens, reusability across different LMs, and inapplicability when interacting with black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a novel discrete prompt compression method that addresses these issues. PCRL employs a computationally efficient policy network that directly edits prompts. The PCRL training approach can be flexibly applied to various types of LMs, as well as decoder-only and encoder-decoder architecture, and can be trained without gradient access to LMs or labeled data. PCRL achieves an average reduction of 24.6% in token count across various instruction prompts while preserving performance. Further, we demonstrate that the learned policy can be transferred to larger LMs, and through various analyses, we aid the understanding of token importance within prompts.

摘要
启示调整语言模型（LM）广泛地使用于解决各种问题，通常通过用户提供任务特定的提示。Context窗口长度和计算成本的约束，推动了压缩提示的发展。现有方法主要依赖于训练嵌入，这会带来多个token意义的解释问题，固定 embedding 数量、不可重复性和黑盒API交互时无法应用。本研究提出了启示压缩学习（PCRL），一种新的简单提示压缩方法。PCRL使用了计算效率高的政策网络，直接编辑提示。PCRL 训练方法可以适应不同类型的 LM，以及decoder-only和encoder-decoder架构，并可以在无需梯度访问LM或标注数据的情况下进行训练。PCRL 实现了各种 instruciton 提示中的平均token数减少24.6%，同时保持性能。此外，我们还证明了学习政策可以传递到更大的 LM 上，并通过多种分析，帮助理解提示中的token重要性。

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

paper_url: http://arxiv.org/abs/2308.08746
repo_url: https://github.com/wenxi-yue/surgicalsam
paper_authors: Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang
for: 这篇论文是针对医疗器械分类 зада填的，具体是使用Segment Anything Model (SAM)来进行医疗器械分类。
methods: 这篇论文使用了SAM作为基础模型，并提出了一个名为SurgicalSAM的新方法，它是一个端到端的高效优化方法，可以将医疗器械特有的信息与SAM的预先训练知识结合，以提高分类的精度和简化pipeline。
results: 实验结果显示，SurgicalSAM在EndoVis2018和EndoVis2017数据集上实现了顶尖性能，并且只需要小量可调参数。

Abstract
The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to poor generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code will be released at https://github.com/wenxi-yue/SurgicalSAM.

摘要
《医疗器械分割模型（SAM）》是一种强大的基础模型，对医疗器械分割进行了革命性的改进。为了应用SAM于医疗器械分割，常见的方法是在SAM中提供精确的点或盒子作为批处理，然后使用这些点或盒子作为SAM的Zero-shot模式下的提示。然而，我们发现两个问题：（1）自然物体和医疗器械之间的领域差异导致SAM的泛化性差；（2）SAM需要精确的点或盒子位置来进行准确的分割，需要大量的手动指导或一个高性能的专家检测器来准备提示，这导致了复杂的多Stage管道。为解决这些问题，我们介绍了《医疗SAM》，一种新的终端有效策略，使SAM能够有效地 интеGRATE医疗特有信息和SAM已经预训练的知识，从而提高泛化性。具体来说，我们提出了一种轻量级的原型基本类提示编码器，直接从类原型生成提示编码，并废除了显式提示的使用，从而提高了robustness和管道的简单化。此外，为了解决医疗器械类别之间的低同类差，我们提出了对比较类原型学习，进一步增强类原型的抑强，从而提高了分割的精度。实验结果表明，SurgicalSAM在EndoVis2018和EndoVis2017 datasets上表现出状态的作者性，只需要一小量的可调参数。源代码将在https://github.com/wenxi-yue/SurgicalSAM上发布。

PMET: Precise Model Editing in a Transformer

paper_url: http://arxiv.org/abs/2308.08742
repo_url: https://github.com/xpq-tech/pmet
paper_authors: Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu
for: 本研究旨在提高模型修改技术的性能，减少模型修改的成本。
methods: 该研究使用了多头自注意力（MHSA）和循环网络（FFN）的隐藏状态分析，并引入了一种同时优化多头自注意力和循环网络隐藏状态的方法（PMET），以准确地更新FFN的参数。
results: 实验表明，PMET在COUNTERFACT和zsRE datasets上表现出色，与其他方法相比，具有更高的性能。

Abstract
Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost, which have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use it to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contains information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.git.

摘要
大型语言模型（LLM）的修改技术可以变更小一部分知识，并且实现了可观的成果。现有方法假设transformer层（TL）的隐藏状态是对Feed-Forward Network（FFN）的键值内存。它们通常将TL隐藏状态优化为记忆目标知识，并将其用于更新FFN的对应预测。但是，TL隐藏状态的资讯来源来自多个部分，包括多头自我对Alignment（MHSA）、FFN和复合连接。现有方法忽略了TL隐藏状态中不具体知识的存在，从而导致模型修改的性能下降。为了更精确地进行模型修改，我们进行了隐藏状态分析，发现MHSA对于抽取一些通用知识的模式具有特定的储存作用。这意味着MHSA的对应预测不需要更新，只需要使用优化的FFN对应预测来更新FFN的对应预测。基于以上发现，我们提出了PMET，它同时优化Transformer Component（TC，即MHSA和FFN）的隐藏状态，并将优化的TC隐藏状态仅用于精确地更新FFN的对应预测。我们的实验显示，PMET在COUNTERFACT和zsRE dataset上展示了顶尖的表现。我们的抽象实验显示，PMET的改进具有实质的优化作用，进一步证实了MHSA对于抽取通用知识的模式储存的发现，并显示了MHSA储存一小量的事实知识。我们的代码可以在https://github.com/xpq-tech/PMET.git中找到。

paper_url: http://arxiv.org/abs/2308.08737
repo_url: None
paper_authors: Tejaswini Manjunath, Mozhgan Navardi, Prakhar Dixit, Bharat Prakash, Tinoosh Mohsenin
for: 本研究旨在提出一种名为Ready for Production Hierarchical RL（ReProHRL）的方法，用于解决在实际环境中进行多目标导航的学习挑战。
methods: 该方法使用分类器作为预处理步骤，以学习多目标导航并将其转移到实际环境中。
results: 实验结果表明，提出的ReProHRL方法在模拟环境和实际环境中都能够在训练时间和性能方面超越基eline。在一个简单的单目标导航环境中，两种方法均达到100%的成功率，但在一个更复杂的环境和多目标设定下，提出的方法在基eline方法的18%和5%之上。为证明实际应用和Proof of Concept的实现，我们在一架名为Crazyflie的奈米扁平飞机上部署了提出的方法，并在其前置摄像头上进行多目标导航实验。

Abstract
Robots have been successfully used to perform tasks with high precision. In real-world environments with sparse rewards and multiple goals, learning is still a major challenge and Reinforcement Learning (RL) algorithms fail to learn good policies. Training in simulation environments and then fine-tuning in the real world is a common approach. However, adapting to the real-world setting is a challenge. In this paper, we present a method named Ready for Production Hierarchical RL (ReProHRL) that divides tasks with hierarchical multi-goal navigation guided by reinforcement learning. We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world. Empirical results show that the proposed ReProHRL method outperforms the state-of-the-art baseline in simulation and real-world environments in terms of both training time and performance. Although both methods achieve a 100% success rate in a simple environment for single goal-based navigation, in a more complex environment and multi-goal setting, the proposed method outperforms the baseline by 18% and 5%, respectively. For the real-world implementation and proof of concept demonstration, we deploy the proposed method on a nano-drone named Crazyflie with a front camera to perform multi-goal navigation experiments.

摘要
Translated into Simplified Chinese: Robots 已经成功完成了高精度任务。在实际环境中， sparse 奖励和多个目标导致学习仍然是一个大型挑战，并 Reinforcement Learning（RL）算法无法学习好的策略。训练在模拟环境中，然后在实际世界中细化是一种常见的方法。然而，适应实际环境是一个挑战。在这篇论文中，我们提出了名为Ready for Production Hierarchical RL（ReProHRL）的方法，该方法将多个目标导航分为多个层次。我们还使用对象检测器作为先processing步骤，以学习多个目标导航并将其转移到实际世界。实际结果表明，我们提出的ReProHRL方法在模拟环境和实际世界中都能够超过基准值，即使在更复杂的环境和多个目标设定下。为了证明实际实施和理解，我们在一架名为Crazyflie的奈米飞机上部署了我们的方法，并通过前 Camera 进行多个目标导航实验。

LLM-FuncMapper: Function Identification for Interpreting Complex Clauses in Building Codes via LLM

paper_url: http://arxiv.org/abs/2308.08728
repo_url: None
paper_authors: Zhe Zheng, Ke-Yin Chen, Xin-Yu Cao, Xin-Zheng Lu, Jia-Rui Lin
for: 本研究的目的是提出一种基于大语言模型（LLM）的方法，用于解释复杂的法规条款。
methods: 该方法包括系统分析建筑 codes，定义一系列的原子函数，以捕捉 Shared 计算逻辑和复杂约束，并开发了一个启发模板和分类化调整策略，以便使用常见的 LLM 进行有效的函数标识。
results: 经 statistical analysis 和实验 validate，该方法可以准确地 Identify 相应的预定函数，并将其转换为计算代码。此外，该方法还可以解释复杂的法规条款。

Abstract
As a vital stage of automated rule checking (ARC), rule interpretation of regulatory texts requires considerable effort. However, interpreting regulatory clauses with implicit properties or complex computational logic is still challenging due to the lack of domain knowledge and limited expressibility of conventional logic representations. Thus, LLM-FuncMapper, an approach to identifying predefined functions needed to interpret various regulatory clauses based on the large language model (LLM), is proposed. First, by systematically analysis of building codes, a series of atomic functions are defined to capture shared computational logics of implicit properties and complex constraints, creating a database of common blocks for interpreting regulatory clauses. Then, a prompt template with the chain of thought is developed and further enhanced with a classification-based tuning strategy, to enable common LLMs for effective function identification. Finally, the proposed approach is validated with statistical analysis, experiments, and proof of concept. Statistical analysis reveals a long-tail distribution and high expressibility of the developed function database, with which almost 100% of computer-processible clauses can be interpreted and represented as computer-executable codes. Experiments show that LLM-FuncMapper achieve promising results in identifying relevant predefined functions for rule interpretation. Further proof of concept in automated rule interpretation also demonstrates the possibility of LLM-FuncMapper in interpreting complex regulatory clauses. To the best of our knowledge, this study is the first attempt to introduce LLM for understanding and interpreting complex regulatory clauses, which may shed light on further adoption of LLM in the construction domain.

摘要
为了自动检查规则（ARC）中的规则解释，需要很大的努力。然而，解释法规条款中的隐式属性或复杂计算逻辑仍然是一个挑战，因为缺乏领域知识和限制表达逻辑的表达能力。因此，我们提出了LLM-FuncMapper，一种基于大语言模型（LLM）的方法，用于确定解释法规条款中的预定函数。首先，通过系统性分析法规文本，我们定义了一系列原子函数，用于捕捉不同领域的共享计算逻辑和复杂约束，并建立了一个共享块数据库，用于解释法规条款。然后，我们开发了一个提示模板，并使用分类调整策略，以便使用常见的LLM来确定预定函数。最后，我们验证了我们的方法，通过统计分析、实验和证明原理来证明其效果。统计分析显示，我们建立的函数库具有长尾分布和高表达能力，可以将大多数计算可能的条款解释为计算代码。实验表明，LLM-FuncMapper可以成功地确定解释法规条款中的预定函数。此外，我们还进行了自动规则解释的证明，这表明LLM-FuncMapper可以在解释复杂的法规条款中发挥作用。根据我们所知，这是第一次将大语言模型应用于解释复杂法规条款，这可能会照亮未来在建筑领域中LLM的更多应用。

A Novel Loss Function Utilizing Wasserstein Distance to Reduce Subject-Dependent Noise for Generalizable Models in Affective Computing

paper_url: http://arxiv.org/abs/2308.10869
repo_url: None
paper_authors: Nibraas Khan, Mahrukh Tauseef, Ritam Ghosh, Nilanjan Sarkar
for: 这篇论文的目的是提出一个新的成本函数，用于适应调节数据中的主题参数，以提高情绪识别的准确性。methods: 这篇论文使用了深度学习技术，特别是水星距离理论，来规定主题参数的重要性。results: 比较这篇论文的提案成本函数与传统的 Mean Squared Error 损失函数，在四个常用的数据集上得到了14.75% 和 17.75% 的平均提升。

Abstract
Emotions are an essential part of human behavior that can impact thinking, decision-making, and communication skills. Thus, the ability to accurately monitor and identify emotions can be useful in many human-centered applications such as behavioral training, tracking emotional well-being, and development of human-computer interfaces. The correlation between patterns in physiological data and affective states has allowed for the utilization of deep learning techniques which can accurately detect the affective states of a person. However, the generalisability of existing models is often limited by the subject-dependent noise in the physiological data due to variations in a subject's reactions to stimuli. Hence, we propose a novel cost function that employs Optimal Transport Theory, specifically Wasserstein Distance, to scale the importance of subject-dependent data such that higher importance is assigned to patterns in data that are common across all participants while decreasing the importance of patterns that result from subject-dependent noise. The performance of the proposed cost function is demonstrated through an autoencoder with a multi-class classifier attached to the latent space and trained simultaneously to detect different affective states. An autoencoder with a state-of-the-art loss function i.e., Mean Squared Error, is used as a baseline for comparison with our model across four different commonly used datasets. Centroid and minimum distance between different classes are used as a metrics to indicate the separation between different classes in the latent space. An average increase of 14.75% and 17.75% (from benchmark to proposed loss function) was found for minimum and centroid euclidean distance respectively over all datasets.

摘要
人类情感是人类行为的重要组成部分，可以影响思维、决策和交流技能。因此，能够准确监测和识别情感的能力可以在许多人类中心应用中得到利用，如行为训练、情感健康评估和人机界面开发。基于生物数据的征特和情感状态的相关性，可以使用深度学习技术来准确检测人类情感状态。但是，现有模型的普适性往往受到参与者的偏好所限，因为参与者对刺激的反应会导致数据中的偏差。因此，我们提出了一个新的成本函数，使用优化运输理论和 Wasserstein 距离来权重调整参与者特定的数据，以便更重要地考虑参与者共同的数据特征，而不是受到偏差所影响的数据。我们的模型的性能通过一个具有多类分类器的自编码器来检测不同情感状态，并与使用 Mean Squared Error 损失函数的基准模型进行比较。中心距离和最小距离between不同类别在隐藏空间中的分离度被用作评价模型的metric。在所有数据集上，我们发现使用我们的损失函数而不是基准模型时，平均提高14.75%和17.75%（从基准到我们的损失函数）。

EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

paper_url: http://arxiv.org/abs/2308.08717
repo_url: None
paper_authors: Liang Wang, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Kaiyu Hu, Guilin Jiang, Jing Xiao
for: 这篇论文主要关注在对于边缘设备上进行实时影像分析中遇到的挑战，特别是边缘设备通常具有资源短缺的情况下，如何使用深度神经网络（DNN）来进行影像分析。
methods: 本论文提出了一个实用且高效的影像分析系统EdgeMA，可以适应影像流中的变化，并且能够处理资料漂移问题。EdgeMA使用灰度层次共occurrence统计 texture feature，并使用Random Forest分类器来检测领域变化。此外，我们还将模型更新方法基于重要性评分，专门用于更新模型以适应标签分布变化。
results: 经过严谨的实验评估，我们的结果显示EdgeMA可以明显提高推论精度。

Abstract
Real-time video analytics on edge devices for changing scenes remains a difficult task. As edge devices are usually resource-constrained, edge deep neural networks (DNNs) have fewer weights and shallower architectures than general DNNs. As a result, they only perform well in limited scenarios and are sensitive to data drift. In this paper, we introduce EdgeMA, a practical and efficient video analytics system designed to adapt models to shifts in real-world video streams over time, addressing the data drift problem. EdgeMA extracts the gray level co-occurrence matrix based statistical texture feature and uses the Random Forest classifier to detect the domain shift. Moreover, we have incorporated a method of model adaptation based on importance weighting, specifically designed to update models to cope with the label distribution shift. Through rigorous evaluation of EdgeMA on a real-world dataset, our results illustrate that EdgeMA significantly improves inference accuracy.

摘要
现实时视频分析在边缘设备上是一项复杂的任务。边缘设备通常具有资源约束，因此边缘深度神经网络（DNN）通常具有较少的参数和较浅的结构，只能在有限的场景下表现良好。在这篇论文中，我们介绍了EdgeMA，一个实用和高效的视频分析系统，用于适应实际视频流中的数据偏移问题。EdgeMA提取了灰度层次相关矩阵基本的纹理特征，并使用Random Forest分类器来探测领域偏移。此外，我们还实现了基于重要性赋分的模型更新方法，以适应标签分布偏移。经过对实际数据集的严格评估，我们的结果表明，EdgeMA可以显著提高推理精度。

Probabilistic Results on the Architecture of Mathematical Reasoning Aligned by Cognitive Alternation

paper_url: http://arxiv.org/abs/2308.08714
repo_url: None
paper_authors: Minzheng Li, Xiangzhong Fang, Haixin Yang
for: 本研究旨在构思一种可以解决数学问题的机器人。
methods: 该研究采用分两部分的量化逻辑系统：思维过程和认知过程，并提供了概率描述该系统的建构。
results: 研究人员提供了一种可以模型思维过程和认知过程的 probabilistic 描述，以便实现机器人可以解决数学问题的目标。

Abstract
We envision a machine capable of solving mathematical problems. Dividing the quantitative reasoning system into two parts: thought processes and cognitive processes, we provide probabilistic descriptions of the architecture.

摘要
我们看到了一种机器能够解决数学问题的想法。将数学逻辑系统分为两部分：思维过程和认知过程，我们提供了概率描述这个architecture。Here's a breakdown of the translation:* "We envision" is translated as "我们看到了" (wǒmen kàn dào le)* "a machine capable of solving mathematical problems" is translated as "一种机器能够解决数学问题" (yī zhǒng jīqì néng huì jiě jué dù shù)* "Dividing the quantitative reasoning system into two parts" is translated as "将数学逻辑系统分为两部分" (jiǎng dào xué xíng lógí xìtemu bìwǎn)* "thought processes" is translated as "思维过程" (sī wèi guò xìng)* "and cognitive processes" is translated as "和认知过程" (hé rèn zhì guò xìng)* "we provide probabilistic descriptions of the architecture" is translated as "我们提供了概率描述这个architecture" (wǒmen tīng yǐjī le gè yī jī qì bèng mǐng)Note that the word "architecture" is not explicitly mentioned in the original text, but it is implied by the context. In Chinese, the word for "architecture" is 架构 (gā kū), but it is not necessary to use it in this sentence.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

paper_url: http://arxiv.org/abs/2308.08708
repo_url: None
paper_authors: Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen
for: 本研究旨在探讨现有的人工智能系统是否具备意识性，并采用科学理论和实验来评估这些系统。
methods: 本研究使用了多种科学理论，包括反复处理理论、全局工作区理论、更高级别理论、预测处理和注意Schema理论，从这些理论中提取了”指示性特征”，用于评估人工智能系统是否拥有意识性。
results: 本研究发现现有的人工智能系统没有意识性，但也没有技术障碍建立意识性的AI系统。

Abstract
Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.

摘要
当前或近期的人工智能系统是科学兴趣和公众关注的话题。这份报告提出和证明了一种严谨和基于实证的人工智能意识方法：对现有的AI系统进行详细的评估，以及根据我们最好支持的神经科学理论来评估意识。我们对多种知名的科学理论进行了检查，包括循环处理理论、全球工作空间理论、更高级理论、预测处理和注意schema理论。从这些理论中，我们 derivated "指标属性" （indicator properties），这些属性可以用计算机语言来评估 AI 系统。我们使用这些指标属性来评估一些最近的 AI 系统，并讨论了未来系统如何实现这些属性。我们的分析表明，目前没有任何 AI 系统具备意识，但也没有明显的技术障碍来建立具备这些指标属性的 AI 系统。

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment

paper_url: http://arxiv.org/abs/2308.08696
repo_url: None
paper_authors: Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li
for: 这个论文的目的是提出一个基于多 Granularity Cross-Domain Alignment (MGCDA) 框架的 anomaly segmentation 方法，以便在自驾车中探测道路上的异常现象。
methods: 这个方法使用了一个新的 Multi-source Domain Adversarial Training (MDAT) 模组和一种 Cross-domain Anomaly-aware Contrastive Learning (CACL) 方法，将模型的通用性提高到了最高水平。这两种方法都是为了处理多个领域资料的问题，并且能够实现模型在测试阶段 Parameters-free。
results: 实验结果显示，提出的方法在 Fishyscapes 和 RoadAnomaly 资料集上取得了现有方法的最佳性能。

Abstract
Anomaly segmentation plays a crucial role in identifying anomalous objects within images, which facilitates the detection of road anomalies for autonomous driving. Although existing methods have shown impressive results in anomaly segmentation using synthetic training data, the domain discrepancies between synthetic training data and real test data are often neglected. To address this issue, the Multi-Granularity Cross-Domain Alignment (MGCDA) framework is proposed for anomaly segmentation in complex driving environments. It uniquely combines a new Multi-source Domain Adversarial Training (MDAT) module and a novel Cross-domain Anomaly-aware Contrastive Learning (CACL) method to boost the generality of the model, seamlessly integrating multi-domain data at both scene and sample levels. Multi-source domain adversarial loss and a dynamic label smoothing strategy are integrated into the MDAT module to facilitate the acquisition of domain-invariant features at the scene level, through adversarial training across multiple stages. CACL aligns sample-level representations with contrastive loss on cross-domain data, which utilizes an anomaly-aware sampling strategy to efficiently sample hard samples and anchors. The proposed framework has decent properties of parameter-free during the inference stage and is compatible with other anomaly segmentation networks. Experimental conducted on Fishyscapes and RoadAnomaly datasets demonstrate that the proposed framework achieves state-of-the-art performance.

摘要
《多源频率域对接框架》（MGCDA）是一种用于异常分割的新框架，旨在 Addressing the issue of domain discrepancies in anomaly segmentation using synthetic training data. The MGCDA framework combines a novel Multi-source Domain Adversarial Training (MDAT) module and a Cross-domain Anomaly-aware Contrastive Learning (CACL) method to improve the generality of the model. The MDAT module uses a multi-source domain adversarial loss and a dynamic label smoothing strategy to acquire domain-invariant features at the scene level through adversarial training across multiple stages. The CACL method aligns sample-level representations with contrastive loss on cross-domain data, using an anomaly-aware sampling strategy to efficiently sample hard samples and anchors. The proposed framework has the advantage of being parameter-free during the inference stage and is compatible with other anomaly segmentation networks. Experimental results on the Fishyscapes and RoadAnomaly datasets demonstrate that the proposed framework achieves state-of-the-art performance.Here's the translation of the text into Traditional Chinese:《多源频率域对接框架》（MGCDA）是一种新的框架，旨在 Addressing the issue of domain discrepancies in anomaly segmentation using synthetic training data. MGCDA 框架 combine 一个新的 Multi-source Domain Adversarial Training (MDAT) 模组和一个 Cross-domain Anomaly-aware Contrastive Learning (CACL) 方法，以提高模型的通用性。 MDAT 模组使用多源频率域 adversarial loss 和动态标签平滑策略来获得频率域对应的特征，通过多阶段的对抗训练。 CACL 方法使用异常感知抽象来对两个不同频率域的标签进行对应，并使用异常感知抽象来优化标签的映射。提议的框架具有在推断阶段不需要参数的优点，并且可以与其他异常分割网络相容。实验结果显示，提议的框架在 Fishyscapes 和 RoadAnomaly datasets 上 achieve state-of-the-art 性能。

Planning in the imagination: High-level planning on learned abstract search spaces

paper_url: http://arxiv.org/abs/2308.08693
repo_url: None
paper_authors: Carlos Martin, Tuomas Sandholm
for: 提供了一种新的方法，叫做PiZero，允许智能机器人在自己创建的抽象搜索空间中进行规划，完全与实际环境分离。
methods: 与先前的方法不同，PiZero 允许智能机器人在任意时间尺度上进行高级规划，并且可以处理连续动作空间和部分可见性。
results: 在多个领域中进行实验，PiZero 比先前的方法更高效，无需访问环境模拟器。

Abstract
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space of its own creation that is completely decoupled from the real environment. Unlike prior approaches, this enables the agent to perform high-level planning at arbitrary timescales and reason in terms of compound or temporally-extended actions, which can be useful in environments where large numbers of base-level micro-actions are needed to perform relevant macro-actions. In addition, our method is more general than comparable prior methods because it handles settings with continuous action spaces and partial observability. We evaluate our method on multiple domains, including navigation tasks and Sokoban. Experimentally, it outperforms comparable prior methods without assuming access to an environment simulator.

摘要
我们提出一种新方法，叫做PiZero，允许智能机器人在自己创建的抽象搜索空间中进行规划。与先前的方法不同的是，这使得智能机器人可以在任意时间尺度上进行高级规划，并可以使用复合或时间扩展的动作来执行重要的宏动作。此外，我们的方法更加通用于具有连续动作空间和部分可见性的场景。我们在多个领域进行了实验，包括导航任务和Sokoban，并证明了PiZero在不假设环境模拟器的情况下表现更好。

Lightweight Adaptation of Neural Language Models via Subspace Embedding

paper_url: http://arxiv.org/abs/2308.08688
repo_url: https://github.com/amitkumarj441/cikm2023_subspaceembedding
paper_authors: Amit Kumar Jaiswal, Haiming Liu
for: 这个论文的目的是提出一种新的嵌入结构，以减少预训练语言模型的存储占用空间，同时保持模型的精度。
methods: 该论文使用一种基于语言模型的嵌入结构，通过对 tokens 的上下文关系进行赋值，实现嵌入结构的压缩。
results: 实验结果显示，使用该嵌入结构可以将预训练语言模型的嵌入矩阵压缩至99.8%以上，而且保持模型的精度。

Abstract
Traditional neural word embeddings are usually dependent on a richer diversity of vocabulary. However, the language models recline to cover major vocabularies via the word embedding parameters, in particular, for multilingual language models that generally cover a significant part of their overall learning parameters. In this work, we present a new compact embedding structure to reduce the memory footprint of the pre-trained language models with a sacrifice of up to 4% absolute accuracy. The embeddings vectors reconstruction follows a set of subspace embeddings and an assignment procedure via the contextual relationship among tokens from pre-trained language models. The subspace embedding structure calibrates to masked language models, to evaluate our compact embedding structure on similarity and textual entailment tasks, sentence and paraphrase tasks. Our experimental evaluation shows that the subspace embeddings achieve compression rates beyond 99.8% in comparison with the original embeddings for the language models on XNLI and GLUE benchmark suites.

摘要

Quantifying Overfitting: Introducing the Overfitting Index

paper_url: http://arxiv.org/abs/2308.08682
repo_url: None
paper_authors: Sanad Aburass
for: 本研究旨在提供一个量化评估模型过拟合情况的指标，以便提高模型在实际应用中的效能。
methods: 本研究使用了多种模型架构，包括 MobileNet、U-Net、ResNet、Darknet 和 ViT-32，进行实验，并使用了数据增强技术来确认模型的过拟合情况。
results: 研究结果显示，不同的模型架构在不同的数据集上 exhibits 不同的过拟合情况，而数据增强技术对于小型和特殊的数据集有更大的稳定化效果。 ViT-32 模型在 MNIST 数据集上的表现也显示了某些模型在实际应用中的强健性和数据集的完整性。

Abstract
In the rapidly evolving domain of machine learning, ensuring model generalizability remains a quintessential challenge. Overfitting, where a model exhibits superior performance on training data but falters on unseen data, is a recurrent concern. This paper introduces the Overfitting Index (OI), a novel metric devised to quantitatively assess a model's tendency to overfit. Through extensive experiments on the Breast Ultrasound Images Dataset (BUS) and the MNIST dataset using architectures such as MobileNet, U-Net, ResNet, Darknet, and ViT-32, we illustrate the utility and discernment of the OI. Our results underscore the variable overfitting behaviors across architectures and highlight the mitigative impact of data augmentation, especially on smaller and more specialized datasets. The ViT-32's performance on MNIST further emphasizes the robustness of certain models and the dataset's comprehensive nature. By providing an objective lens to gauge overfitting, the OI offers a promising avenue to advance model optimization and ensure real-world efficacy.

摘要
在机器学习领域的快速演化中，保证模型通用性仍然是一个核心挑战。过拟合，其表现出模型在训练数据上出色，但是在未经见数据上崩溃的问题，是一个常见的问题。本文介绍了一个新的度量工具——过拟合指数（OI），用于量化评估模型是否过拟合。通过对Breast Ultrasound Images Dataset（BUS）和MNIST datasets上的MobileNet、U-Net、ResNet、Darknet和ViT-32等建筑物的广泛实验，我们证明了OI的实用性和分辨率。我们的结果表明不同的建筑物在不同的数据集上具有不同的过拟合行为，并且数据增强特别是在小型和特殊化数据集上具有缓解作用。ViT-32在MNIST上的表现也证明了某些模型的稳定性和数据集的全面性。通过提供一个对过拟合进行Objective评估的工具，OI为模型优化和实际效果的提高提供了一个可靠的途径。

Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions

paper_url: http://arxiv.org/abs/2308.08661
repo_url: None
paper_authors: Haitian Sun, William W. Cohen, Ruslan Salakhutdinov
for: answering ambiguous questions
methods: exploiting a database of unambiguous questions generated from Wikipedia
results: improved performance by 15% (relative improvement) on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs, as well as large improvements in diverse passage retrieval.Here’s the same information in Simplified Chinese:
for: 回答不确定的问题
methods: 利用wikipedia中生成的明确问题数据库
results: 提高了15%的准确率和10%的解释问题率，以及大幅提高多个段落检索率。

Abstract
Many open-domain questions are under-specified and thus have multiple possible answers, each of which is correct under a different interpretation of the question. Answering such ambiguous questions is challenging, as it requires retrieving and then reasoning about diverse information from multiple passages. We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia. On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 15% (relative improvement) on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs. Retrieving from the database of generated questions also gives large improvements in diverse passage retrieval (by matching user questions q to passages p indirectly, via questions q' generated from p).

摘要
多个开放问题是下pecified，因此它们有多个可能的答案，每个答案都是正确的，只要是不同的问题解释。回答这些抽象问题是困难的，因为它们需要检索并then reasoning about多个信息的多个段落。我们提出了一种新的状态 Arts，用于回答抽象问题，该方法利用了基于Wikipedia的问题库。在ASQA Benchmark上，这个Benchmark需要生成长形答案，这些答案需要总结多个答案到一个抽象问题上，我们的方法在这些衡量中提高了15%的Relative Improvement（相对提高）和10%的衡量（evaluation measures）。在检索过程中，从生成的问题库中提取信息也得到了大幅度的提高（by matching user questions q to passages p indirectly, via questions q' generated from p）。

Towards Zero Memory Footprint Spiking Neural Network Training

paper_url: http://arxiv.org/abs/2308.08649
repo_url: None
paper_authors: Bin Lei, Sheng Lin, Pei-Hung Lin, Chunhua Liao, Caiwen Ding
for: 这 paper 的目的是解决 SNN 训练中的内存约束问题。
methods: 这 paper 使用了一种新的架构和一种新的算法来降低 SNN 训练中的内存使用量。
results: 这 paper 的实验结果表明，使用这种新的架构和算法可以减少 SNN 训练中的内存使用量，同时不会增加训练时间。具体来说，可以达到 $\mathbf{58.65\times}$ 的内存减少和 $\mathbf{23.8%}$ 的训练时间减少。

Abstract
Biologically-inspired Spiking Neural Networks (SNNs), processing information using discrete-time events known as spikes rather than continuous values, have garnered significant attention due to their hardware-friendly and energy-efficient characteristics. However, the training of SNNs necessitates a considerably large memory footprint, given the additional storage requirements for spikes or events, leading to a complex structure and dynamic setup. In this paper, to address memory constraint in SNN training, we introduce an innovative framework, characterized by a remarkably low memory footprint. We \textbf{(i)} design a reversible SNN node that retains a high level of accuracy. Our design is able to achieve a $\mathbf{58.65\times}$ reduction in memory usage compared to the current SNN node. We \textbf{(ii)} propose a unique algorithm to streamline the backpropagation process of our reversible SNN node. This significantly trims the backward Floating Point Operations Per Second (FLOPs), thereby accelerating the training process in comparison to current reversible layer backpropagation method. By using our algorithm, the training time is able to be curtailed by $\mathbf{23.8\%}$ relative to existing reversible layer architectures.

摘要
生物学发明的刺激神经网络（SNN），通过使用不同时间点事件而不是连续值来处理信息，已经吸引了广泛关注，因为它们具有硬件友好和能效的特点。然而，SNN的训练需要较大的内存占用量，因为需要额外存储刺激或事件，从而导致复杂的结构和动态设置。在这篇论文中，我们提出了一种创新的框架，以减少SNN训练中的内存占用量。我们的设计包括：（i）设计一种可逆的SNN节点，保持高度准确性。我们的设计可以实现$\mathbf{58.65\times}$的内存占用量减少相比现有的SNN节点。（ii）我们提出了一种特殊的算法，用于简化我们的可逆层反propagation过程。这种算法可以大幅减少反向浮点运算数（FLOPs），从而加快训练过程的速度，相比现有的可逆层反propagation方法。通过使用我们的算法，训练时间可以被减少$\mathbf{23.8\%}$相比现有的可逆层架构。

FedPop: Federated Population-based Hyperparameter Tuning

paper_url: http://arxiv.org/abs/2308.08634
repo_url: None
paper_authors: Haokun Chen, Denis Krompass, Jindong Gu, Volker Tresp
for: 本研究旨在提高 Federated Learning (FL) 中 hyperparameter (HP) 的优化，以提高 FL 的性能和可扩展性。
methods: 本研究提出了一种新的 HP 优化算法，called Federated Population-based Hyperparameter Tuning (FedPop)，使用了人口生长算法来优化 HP，并且采用了在线 “调度-while-训练” 框架，以提高计算效率和探索 HP 搜索空间。
results: 对于常见的 FL benchmark 和复杂的实际世界 FL 数据集，本研究的提出的方法在比较于现状最佳 HP 调度方法的实际 validate 中显著提高了 FL 的性能。

Abstract
Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both client and server sides. Compared with prior tuning methods, FedPop employs an online "tuning-while-training" framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP tuning methods for FL.

摘要
FedPop 使用人口生长算法来优化 HP，可以满足不同类型的 HP 在客户端和服务器两侧。与先前的调优方法相比，FedPop 采用了在线 "调优 while training" 框架，提供计算效率和探索更广泛的 HP 搜索空间。我们对常见的 FL benchmark 和复杂的实际 FL 数据进行了实验验证，得到了我们提出的方法的有效性，与当前状态的HP调优方法相比，它具有显著的性能提升。

LSTM-Based Forecasting Model for GRACE Accelerometer Data

paper_url: http://arxiv.org/abs/2308.08621
repo_url: https://github.com/darbeheshti/lstm-based-analysis-for-grace-accelerometers
paper_authors: Neda Darbeheshti, Elahe Moradi
for: The paper is written for monitoring variations in Earth’s gravity field and filling data gaps using the GRACE and GRACE Follow-On satellite missions.
methods: The paper uses Long Short-Term Memory (LSTM) networks to train a model capable of predicting accelerometer data for all three axes.
results: The model demonstrates effectiveness in filling gaps and forecasting GRACE accelerometer data, with accurate predictions for the three axes.

Abstract
The Gravity Recovery and Climate Experiment (GRACE) satellite mission, spanning from 2002 to 2017, has provided a valuable dataset for monitoring variations in Earth's gravity field, enabling diverse applications in geophysics and hydrology. The mission was followed by GRACE Follow-On in 2018, continuing data collection efforts. The monthly Earth gravity field, derived from the integration different instruments onboard satellites, has shown inconsistencies due to various factors, including gaps in observations for certain instruments since the beginning of the GRACE mission. With over two decades of GRACE and GRACE Follow-On data now available, this paper proposes an approach to fill the data gaps and forecast GRACE accelerometer data. Specifically, we focus on accelerometer data and employ Long Short-Term Memory (LSTM) networks to train a model capable of predicting accelerometer data for all three axes. In this study, we describe the methodology used to preprocess the accelerometer data, prepare it for LSTM training, and evaluate the model's performance. Through experimentation and validation, we assess the model's accuracy and its ability to predict accelerometer data for the three axes. Our results demonstrate the effectiveness of the LSTM forecasting model in filling gaps and forecasting GRACE accelerometer data.

摘要
grace卫星任务（GRACE），从2002年到2017年，提供了对地球重力场变化的监测数据，用于多种地球物理和水文应用。这个任务被GRACE Follow-On在2018年继承，继续进行数据采集。月度地球重力场，由不同卫星上的 instrumenteintegrate 而来，存在各种因素的影响，包括GRACE任务开始时的观测 gap。 now, with over two decades of GRACE and GRACE Follow-On data available, this paper proposes an approach to fill the data gaps and forecast GRACE accelerometer data. Specifically, we focus on accelerometer data and employ Long Short-Term Memory (LSTM) networks to train a model capable of predicting accelerometer data for all three axes. 在这个研究中，我们描述了对加速计数据进行预处理、准备 для LSTM 训练，以及模型性能的评估。通过实验和验证，我们评估了模型的准确性和其能够预测加速计数据的三个轴。我们的结果表明LSTM预测模型能够有效地填充数据 gap并预测GRACE加速计数据。

Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought

paper_url: http://arxiv.org/abs/2308.08614
repo_url: None
paper_authors: Bin Lei, pei-Hung Lin, Chunhua Liao, Caiwen Ding
for: 提高大规模模型对复杂问题的逻辑推理能力
methods: 提出Graph of Thoughts（GoT）引导技术
results: 比GPT-4高$89.7%$, $86%$,和$56%$，与SOTA方法ToT average上升$23%$, $24%$,和$15%$

Abstract
Recent advancements in large-scale models, such as GPT-4, have showcased remarkable capabilities in addressing standard queries. However, when facing complex problems that require multi-step logical reasoning, their accuracy dramatically decreases. Current research has explored the realm of \textit{prompting engineering} to bolster the inferential capacities of these models. Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts (GoT)}. Through testing on a trio of escalating challenges: the 24-point game, resolution of high-degree polynomial equations, and derivation of formulas for recursive sequences, our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each respective task. Moreover, when juxtaposed with the state-of-the-art (SOTA) prompting method, \textit{Tree of Thought (ToT)}, our approach registered an average accuracy boost of $23\%$, $24\%$, and $15\%$.

摘要
（Recent advancements in large-scale models, such as GPT-4, have showcased remarkable capabilities in addressing standard queries. However, when facing complex problems that require multi-step logical reasoning, their accuracy dramatically decreases. Current research has explored the realm of \textit{prompting engineering} to bolster the inferential capacities of these models. Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts (GoT)}. Through testing on a trio of escalating challenges: the 24-point game, resolution of high-degree polynomial equations, and derivation of formulas for recursive sequences, our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each respective task. Moreover, when juxtaposed with the state-of-the-art (SOTA) prompting method, \textit{Tree of Thought (ToT)}, our approach registered an average accuracy boost of $23\%$, $24\%$, and $15\%$.）

Integrating Renewable Energy in Agriculture: A Deep Reinforcement Learning-based Approach

paper_url: http://arxiv.org/abs/2308.08611
repo_url: None
paper_authors: A. Wahid, I faiud, K. Mason
for: 这种研究旨在优化农业部门中 photovoltaic (PV) 系统的决策，帮助农业投资者做出了有数据支持的决策。
methods: 该研究使用深度Q学习网络 (DQN) 来优化决策，通过设置奖励机制，使DQN学习据材料驱动的决策。
results: 研究提供了一个全面的理解，如何使用DQN来支持农业投资者做出有利可图的PV安装决策，这些决策可以提高能源效率，降低环境影响，提高利润性。

Abstract
This article investigates the use of Deep Q-Networks (DQNs) to optimize decision-making for photovoltaic (PV) systems installations in the agriculture sector. The study develops a DQN framework to assist agricultural investors in making informed decisions considering factors such as installation budget, government incentives, energy requirements, system cost, and long-term benefits. By implementing a reward mechanism, the DQN learns to make data-driven decisions on PV integration. The analysis provides a comprehensive understanding of how DQNs can support investors in making decisions about PV installations in agriculture. This research has significant implications for promoting sustainable and efficient farming practices while also paving the way for future advancements in this field. By leveraging DQNs, agricultural investors can make optimized decisions that improve energy efficiency, reduce environmental impact, and enhance profitability. This study contributes to the advancement of PV integration in agriculture and encourages further innovation in this promising area.

摘要

FootGPT : A Large Language Model Development Experiment on a Minimal Setting

paper_url: http://arxiv.org/abs/2308.08610
repo_url: None
paper_authors: Eren Unlu
for: 本研究的目的是开发一个特定用途的语言模型，用于解释足球数据，并且在有限的资源下进行。
methods: 本研究使用了一个已经训练过的一百亿参数大小的通用 causal 语言模型，并在Italian足球联赛第一十周的比赛数据上进行了微调。使用了低级别适应。
results: 本研究发现，使用有限的资源和短时间训练，可以开发出一个高度精准的语言模型，用于解释足球数据。

Abstract
With recent empirical observations, it has been argued that the most significant aspect of developing accurate language models may be the proper dataset content and training strategy compared to the number of neural parameters, training duration or dataset size. Following this argument, we opted to fine tune a one billion parameter size trained general purpose causal language model with a dataset curated on team statistics of the Italian football league first ten game weeks, using low rank adaptation. The limited training dataset was compiled based on a framework where a powerful commercial large language model provides distilled paragraphs and question answer pairs as intended. The training duration was kept relatively short to provide a basis for our minimal setting exploration. We share our key observations on the process related to developing a specific purpose language model which is intended to interpret soccer data with constrained resources in this article.

摘要
据最新的观察，人们认为在建立准确语言模型方面，最重要的因素是数据集内容和训练策略，而不是神经网络参数数量、训练时间或数据集大小。基于这个Arguments，我们决定使用一个已经训练过的一亿参数大小的通用 causal语言模型，并在意大利足球联赛第一个十周的比赛数据上进行微调。我们采用了低级别适应。由于我们的训练集较小，因此我们保留了较短的训练时间，以便在有限的资源下进行exploration。在这篇文章中，我们将分享我们在开发特定目标语言模型方面的关键观察。这种语言模型的目的是解释足球数据，并且在有限的资源下进行。

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

paper_url: http://arxiv.org/abs/2308.08545
repo_url: https://github.com/huangyangyi/tech
paper_authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies
for: 本研究旨在解决单张图像中重构人体的“未seen region”问题，即 accurately restoring高级详细信息。
methods: 本研究使用了基于文本描述（如衣服、颜色、发型）的自动生成的garment parsing模型和视觉问答（VQA）模型，以及个性化文本到图像扩散（T2I）模型。
results: 本研究提出了一种基于DMTet的混合3D表示方法，通过多视点Score Distillation Sampling（SDS）和重构损失来优化geometry和texture。实验结果表明，TeCH可以生成高精度的3D人体图像，具有一致和细腻的текстура，以及详细的全身几何结构。

Abstract
Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles) which are automatically generated via a garment parsing model and Visual Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion model (T2I) which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts + personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent & delicate texture, and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH

摘要
尽管最近的研究进步在单个图像中重建人体已经做出了 significativo 的进展，但还没有解决高级特征的重建问题。现有的方法通常会生成高级特征表面上的柔和粗糙 текстур。问题是如何有效地从单个图像中捕捉个体的所有视觉特征，以便重建未看到的区域（例如背部）？基于基础模型的TeCH重构3D人体，通过1）由garment parsing模型自动生成的描述文本提示（例如服装、颜色、头发样式），2）个性化进行文本到图像扩散（T2I）模型，学习“不可述的”外观。为了在可持置价格下提供高分辨率3D人体图像，我们提议一种混合3D表示方式，基于DMTet，包括显式体形网格和隐式距离场。通过描述提示+个性化T2I扩散模型，3D人体的几何和текстура在多视图Score Distillation Sampling（SDS）和重建损失基于原始观察得到优化。TeCH生成高准确率、细腻的текстура和详细的全身几何，在量化和质量上都超过了当前状况。代码将在https://huangyangyi.github.io/TeCH 公开 для研究用途。

Can Transformers Learn Optimal Filtering for Unknown Systems?

paper_url: http://arxiv.org/abs/2308.08536
repo_url: None
paper_authors: Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay
for: 这个论文的目的是使用变换器来解决动力系统中的输出估计问题。
methods: 这篇论文使用了变换器来生成输出预测，并通过训练变换器使其能够适应不同的系统。
results: 这篇论文的结果显示，使用变换器来解决动力系统中的输出估计问题可以匹配最佳输出估计器的性能，并且在具有非相关噪声、时间变化动力和非线性动力等挑战性enario下也能够表现良好。

Abstract
Transformers have demonstrated remarkable success in natural language processing; however, their potential remains mostly unexplored for problems arising in dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. We train the transformer using various systems drawn from a prior distribution and then evaluate its performance on previously unseen systems from the same distribution. As a result, the obtained transformer acts like a prediction algorithm that learns in-context and quickly adapts to and predicts well for different systems - thus we call it meta-output-predictor (MOP). MOP matches the performance of the optimal output estimator, based on Kalman filter, for most linear dynamical systems even though it does not have access to a model. We observe via extensive numerical experiments that MOP also performs well in challenging scenarios with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics like a quadrotor system with unknown parameters. To further support this observation, in the second part of the paper, we provide statistical guarantees on the performance of MOP and quantify the required amount of training to achieve a desired excess risk during test-time. Finally, we point out some limitations of MOP by identifying two classes of problems MOP fails to perform well, highlighting the need for caution when using transformers for control and estimation.

摘要
孔雀Transformers在自然语言处理方面已经展现出惊人的成功，但它们在动力系统中的潜力还未得到了充分的发掘。在这项工作中，我们使用transformer来解决输出预测问题，它可以使用所有过去的输出来生成输出预测。我们使用不同的系统从先前分布中随机抽取training dataset，然后评估其性能在未看过的系统上。因此，我们得到的transformer behave like a prediction algorithm that learns in-context and quickly adapts to different systems, so we call it meta-output-predictor (MOP). MOP的性能与基于Kalman滤波的优化输出估计器相当，即使它没有访问模型。我们通过广泛的数值实验发现，MOP在非相关噪声、时间变化动力学和非线性动力学中也表现出色，如四旋翼系统with unknown parameters。在第二部分的文章中，我们为MOP的性能提供了统计保证和测试时间所需的培训量。最后，我们指出了MOP在某些情况下的局限性，例如在某些非线性和不确定性下的性能不佳，这引起了使用 transformers 进行控制和估计时的谨慎。

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction

paper_url: http://arxiv.org/abs/2308.08518
repo_url: None
paper_authors: Yuhao Yang, Jun Wu, Guangjian Zhang, Rong Xiong
for: 提高3D物体重建的精度和稳定性，特别是在受到 occlusion 的环境下。
methods: 提出了一种 bidirectional correspondence prediction network，利用点对点的注意力感知机制，同时考虑模型和Scene之间的几何相似性。
results: 对于 LineMOD、YCB-Video 和 Occ-LineMOD 等公共数据集进行了实验，并与其他状态数据方法进行了比较，研究发现提出的方法在同一种评价标准下表现更好，特别是在受到 occlusion 的环境下。

Abstract
Traditional geometric registration based estimation methods only exploit the CAD model implicitly, which leads to their dependence on observation quality and deficiency to occlusion. To address the problem,the paper proposes a bidirectional correspondence prediction network with a point-wise attention-aware mechanism. This network not only requires the model points to predict the correspondence but also explicitly models the geometric similarities between observations and the model prior. Our key insight is that the correlations between each model point and scene point provide essential information for learning point-pair matches. To further tackle the correlation noises brought by feature distribution divergence, we design a simple but effective pseudo-siamese network to improve feature homogeneity. Experimental results on the public datasets of LineMOD, YCB-Video, and Occ-LineMOD show that the proposed method achieves better performance than other state-of-the-art methods under the same evaluation criteria. Its robustness in estimating poses is greatly improved, especially in an environment with severe occlusions.

摘要
传统的几何注册基于估计方法只是通过隐式地利用CAD模型，这导致其виси于观察质量和遮挡的不足。为解决问题，文章提出了一种bidirectional匹配预测网络，该网络不仅需要模型点预测匹配，还Explicitly模型了观察和模型之间的几何相似性。我们关键的发现是每个模型点和场景点之间的相关性提供了重要的学习点对匹配信息。为了进一步处理特征分布的分散，我们设计了一个简单 yet有效的 Pseudo-Siamese 网络，以提高特征同化。实验结果表明，提议的方法在公共数据集LineMOD、YCB-Video和Occ-LineMOD上比其他状态的艺术方法得到更好的性能，特别是在具有严重遮挡的环境中。

2023-08-17

Enhancing API Documentation through BERTopic Modeling and Summarization

Fostering User Engagement in the Critical Reflection of Arguments

Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

Severity Classification of Parkinson’s Disease from Speech using Single Frequency Filtering-based Features

A Mathematical Characterization of Minimally Sufficient Robot Brains

Synthesizing Physically Plausible Human Motions in 3D Scenes

Reinforcement Learning for Battery Management in Dairy Farming

Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability

An Extended Convergence Result for Behaviour Tree Controllers

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level

Interpretable Graph Neural Networks for Tabular Data

Towards Automatically Addressing Self-Admitted Technical Debt: How Far Are We?

A White-Box False Positive Adversarial Attack Method on Contrastive Loss-Based Offline Handwritten Signature Verification Models

IMM: An Imitative Reinforcement Learning Approach with Predictive Representation Learning for Automatic Market Making

Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection

MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling

Building Emotional Support Chatbots in the Era of LLMs

Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing

Development of a Knowledge Graph Embeddings Model for Pain

Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field

BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

CMB: A Comprehensive Medical Benchmark in Chinese

Lifted Algorithms for Symmetric Weighted First-Order Model Sampling

Do you really follow me? Adversarial Instructions for Evaluating the Robustness of Large Language Models

Capturing Popularity Trends: A Simplistic Non-Personalized Approach for Enhanced Item Recommendation

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data

Bayesian polynomial neural networks and polynomial neural ordinary differential equations

CodeCoT and Beyond: Learning to Program and Test like a Developer

Knowledge-inspired Subdomain Adaptation for Cross-Domain Knowledge Transfer

Exploring Demonstration Ensembling for In-context Learning

Large Language Models at Work in China’s Labor Market

Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models

Sensor Fusion by Spatial Encoding for Autonomous Driving

Discrete Prompt Compression with Reinforcement Learning

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

PMET: Precise Model Editing in a Transformer

ReProHRL: Towards Multi-Goal Navigation in the Real World using Hierarchical Agents

LLM-FuncMapper: Function Identification for Interpreting Complex Clauses in Building Codes via LLM

A Novel Loss Function Utilizing Wasserstein Distance to Reduce Subject-Dependent Noise for Generalizable Models in Affective Computing

EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

Probabilistic Results on the Architecture of Mathematical Reasoning Aligned by Cognitive Alternation

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment

Planning in the imagination: High-level planning on learned abstract search spaces

Lightweight Adaptation of Neural Language Models via Subspace Embedding

Quantifying Overfitting: Introducing the Overfitting Index

Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions

Towards Zero Memory Footprint Spiking Neural Network Training

FedPop: Federated Population-based Hyperparameter Tuning

LSTM-Based Forecasting Model for GRACE Accelerometer Data

Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought

Integrating Renewable Energy in Agriculture: A Deep Reinforcement Learning-based Approach

FootGPT : A Large Language Model Development Experiment on a Minimal Setting

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Can Transformers Learn Optimal Filtering for Unknown Systems?

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction