cs.AI - 2023-10-27

Multi Time Scale World Models

  • paper_url: http://arxiv.org/abs/2310.18534
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Vaisakh Shaj, Saleh Gholam Zadeh, Ozan Demir, Luiz Ricardo Douat, Gerhard Neumann
  • for: This paper studies how intelligent agents can use internal world models to reason about and predict different courses of action across multiple behavioral horizons and time scales.
  • methods: The paper proposes a probabilistic formalism called the Multi Time Scale State Space (MTS3) model, which uses a computationally efficient inference scheme over multiple time scales for highly accurate long-horizon predictions and uncertainty estimates.
  • results: Experiments show that MTS3 outperforms recent methods on several system identification benchmarks, including complex simulated and real-world dynamical systems.
    Abstract Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems.
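To make the idea of multiple temporal abstractions concrete, here is a toy two-time-scale latent rollout in which a slow state ticks every K steps and conditions a fast per-step state. This is only an illustrative sketch under made-up linear dynamics, not the MTS3 inference scheme itself.

```python
import numpy as np

# Toy sketch: a slow latent updates every K steps, a fast latent updates
# every step conditioned on the slow one. All coefficients are invented.

def rollout(x0_fast, x0_slow, actions, K=5, a_fast=0.9, a_slow=0.99):
    """Roll latent means forward; returns per-step fast-state predictions."""
    fast, slow = x0_fast, x0_slow
    preds = []
    for t, u in enumerate(actions):
        if t % K == 0:                         # slow scale ticks every K steps
            slow = a_slow * slow + 0.1 * fast
        fast = a_fast * fast + 0.1 * slow + u  # fast scale conditioned on slow
        preds.append(fast)
    return np.array(preds)

preds = rollout(1.0, 0.5, np.zeros(20))
```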

Sample based Explanations via Generalized Representers

  • paper_url: http://arxiv.org/abs/2310.18526
  • repo_url: None
  • paper_authors: Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar
  • for: The paper proposes a general class of sample-based explanation methods, termed generalized representers, for measuring the effect of training samples on a model's test predictions.
  • methods: The method combines two components: a global sample importance, which quantifies a training point's importance to the model and is invariant to test samples, and a local sample importance, which measures the similarity between the test point and a training point with a kernel. A key contribution is showing that generalized representers are the only class of sample-based explanations satisfying a natural set of axiomatic properties.
  • results: The paper conducts empirical comparisons of different generalized representers on two image and two text classification datasets.
    Abstract We propose a general class of sample based explanations of machine learning models, which we term generalized representers. To measure the effect of a training sample on a model's test prediction, generalized representers use two components: a global sample importance that quantifies the importance of the training point to the model and is invariant to test samples, and a local sample importance that measures similarity between the training sample and the test point with a kernel. A key contribution of the paper is to show that generalized representers are the only class of sample based explanations satisfying a natural set of axiomatic properties. We discuss approaches to extract global importances given a kernel, and also natural choices of kernels given modern non-linear models. As we show, many popular existing sample based explanations could be cast as generalized representers with particular choices of kernels and approaches to extract global importances. Additionally, we conduct empirical comparisons of different generalized representers on two image and two text classification datasets.
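The two-component structure described above can be sketched directly: each training sample's attribution is its global importance times a kernel similarity to the test point. The RBF kernel and the importance values below are illustrative assumptions, not a particular method from the paper.

```python
import numpy as np

# Sketch: attribution(i, x_test) = global_importance(i) * kernel(x_i, x_test).

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def representer_attributions(X_train, global_importance, x_test, gamma=1.0):
    """Per-training-sample influence on the prediction at x_test."""
    local = np.array([rbf_kernel(x, x_test, gamma) for x in X_train])
    return global_importance * local

X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = np.array([2.0, -1.0])          # assumed global importances
scores = representer_attributions(X_train, alpha, np.array([0.0, 0.0]))
```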

3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition

  • paper_url: http://arxiv.org/abs/2310.18511
  • repo_url: https://github.com/Vision-CAIR/3DCoMPaT-v2
  • paper_authors: Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny
  • for: The authors release 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks.
  • methods: The paper introduces a new task, Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects; the winning challenge entry uses a modified PointNet$^{++}$ model trained on 6D inputs.
  • results: The paper reports the outcomes of a data challenge organized at CVPR 2023, showcasing the winning method and exploring alternative techniques for GCR enhancement.
    Abstract In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR2023, showcasing the winning method's utilization of a modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision.

Deep Reinforcement Learning for Weapons to Targets Assignment in a Hypersonic strike

  • paper_url: http://arxiv.org/abs/2310.18509
  • repo_url: None
  • paper_authors: Brian Gaudet, Kris Drozd, Roberto Furfaro
  • for: To use deep reinforcement learning to optimize a weapons-to-target assignment (WTA) policy for multi-vehicle hypersonic strike, maximizing the total value of destroyed targets in each episode.
  • methods: Deep reinforcement learning trains the WTA policy, whose performance is compared against a benchmark policy derived using non-linear integer programming (NLIP).
  • results: The RL WTA policy achieves near-optimal performance with a 1000X speedup in computation time over the NLIP benchmark, enabling real-time operation and autonomous decision making in the mission end game.
    Abstract We use deep reinforcement learning (RL) to optimize a weapons to target assignment (WTA) policy for multi-vehicle hypersonic strike against multiple targets. The objective is to maximize the total value of destroyed targets in each episode. Each randomly generated episode varies the number and initial conditions of the hypersonic strike weapons (HSW) and targets, the value distribution of the targets, and the probability of a HSW being intercepted. We compare the performance of this WTA policy to that of a benchmark WTA policy derived using non-linear integer programming (NLIP), and find that the RL WTA policy gives near optimal performance with a 1000X speedup in computation time, allowing real time operation that facilitates autonomous decision making in the mission end game.
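The episode objective can be illustrated with a tiny greedy baseline: each weapon's expected contribution is the target's value times its survival probability, value_j * (1 - p_intercept_i). The numbers and the greedy rule below are illustrative; they stand in for neither the paper's RL policy nor its NLIP benchmark.

```python
# Greedy WTA sketch: each weapon takes the highest-value remaining target.

def greedy_wta(p_intercept, target_values):
    remaining = dict(enumerate(target_values))
    total, plan = 0.0, []
    for i, p in enumerate(p_intercept):
        if not remaining:
            break
        j = max(remaining, key=remaining.get)   # highest-value target
        total += remaining.pop(j) * (1.0 - p)   # expected destroyed value
        plan.append((i, j))
    return total, plan

total, plan = greedy_wta([0.1, 0.3], [10.0, 4.0])
```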

How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

  • paper_url: http://arxiv.org/abs/2310.18496
  • repo_url: None
  • paper_authors: Zachariah Carmichael, Walter J. Scheirer
  • for: The paper investigates explainable AI (XAI) techniques that help make sense of black-box models' decision processes.
  • methods: Several popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) are evaluated against ground truth analytically derived from the additive structure of the models being explained.
  • results: All evaluated explainers eventually fail to correctly attribute feature importance on symbolic expressions, neural networks, and generalized additive models, especially when the decision-making process involves feature interactions.
    Abstract Surging interest in deep learning from high-stakes domains has precipitated concern over the inscrutable nature of black box neural networks. Explainable AI (XAI) research has led to an abundance of explanation algorithms for these black boxes. Such post hoc explainers produce human-comprehensible explanations, however, their fidelity with respect to the model is not well understood - explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. We demonstrate the efficacy of our approach in understanding these explainers applied to symbolic expressions, neural networks, and generalized additive models on thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
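The ground-truth construction hinges on the additive structure: for f(x) = f_1(x_1) + f_2(x_2), the contribution of feature j at x is just f_j(x_j) (up to centering), which can be read off analytically. A minimal sketch with two assumed component functions:

```python
import numpy as np

# Sketch: ground-truth attributions of a feature-additive model, read off
# directly from its components. The component functions are invented.

f1 = lambda x: 2.0 * x          # additive component for feature 1
f2 = lambda x: x ** 2           # additive component for feature 2

def true_attributions(x, baseline_mean=(0.0, 0.0)):
    """Per-feature contribution, centered by an assumed baseline mean."""
    return np.array([f1(x[0]) - baseline_mean[0],
                     f2(x[1]) - baseline_mean[1]])

attr = true_attributions(np.array([1.0, 3.0]))
```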

MOSEL: Inference Serving Using Dynamic Modality Selection

  • paper_url: http://arxiv.org/abs/2310.18481
  • repo_url: None
  • paper_authors: Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella
  • for: The paper targets researchers and developers working on machine learning models and inference-serving systems who are looking for ways to improve the efficiency and accuracy of their models.
  • methods: The paper proposes modality selection, which adaptively chooses the most relevant modalities for an inference task based on user-defined performance and accuracy requirements; the approach is implemented in an automated inference-serving system called MOSEL.
  • results: MOSEL improves system throughput by 3.6 times with an accuracy guarantee and shortens job completion times by 11 times compared to a baseline approach, demonstrating the effectiveness of modality selection for multi-modal machine learning models.
    Abstract Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.
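The per-request selection can be sketched as choosing the lowest-latency modality subset whose profiled accuracy meets the user's target. The accuracy/latency profiles below are invented for illustration and are not MOSEL's actual measurements.

```python
# Sketch of per-request modality selection under an accuracy target.

PROFILES = {            # modality subset -> (estimated accuracy, latency ms)
    ("audio",): (0.71, 10),
    ("video",): (0.78, 40),
    ("audio", "video"): (0.85, 45),
}

def select_modalities(accuracy_target):
    feasible = [(lat, mods) for mods, (acc, lat) in PROFILES.items()
                if acc >= accuracy_target]
    if not feasible:
        # best effort: no subset meets the target, return the most accurate
        return max(PROFILES, key=lambda m: PROFILES[m][0])
    return min(feasible)[1]   # lowest latency among feasible subsets

choice = select_modalities(0.75)
```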

Weighted Sampled Split Learning (WSSL): Balancing Privacy, Robustness, and Fairness in Distributed Learning Environments

  • paper_url: http://arxiv.org/abs/2310.18479
  • repo_url: None
  • paper_authors: Manish Osti, Aashray Thakuri, Basheer Qolomany, Aos Mulahuwaish
  • for: To improve privacy, robustness, and fairness in distributed machine learning systems.
  • methods: Weighted Sampled Split Learning (WSSL) disperses the learning process among multiple clients to safeguard data confidentiality, using weighted sampling to select influential clients based on their contributions.
  • results: (1) heightened model accuracy, (2) enhanced robustness, and (3) maintained fairness across diverse client compositions, with accuracy peaks of 82.63% (Human Gait Sensor) and 75.51% (CIFAR-10) versus 81.12% and 58.60% for centralized systems.
    Abstract This study presents Weighted Sampled Split Learning (WSSL), an innovative framework tailored to bolster privacy, robustness, and fairness in distributed machine learning systems. Unlike traditional approaches, WSSL disperses the learning process among multiple clients, thereby safeguarding data confidentiality. Central to WSSL's efficacy is its utilization of weighted sampling. This approach ensures equitable learning by tactically selecting influential clients based on their contributions. Our evaluation of WSSL spanned various client configurations and employed two distinct datasets: Human Gait Sensor and CIFAR-10. We observed three primary benefits: heightened model accuracy, enhanced robustness, and maintained fairness across diverse client compositions. Notably, our distributed frameworks consistently surpassed centralized counterparts, registering accuracy peaks of 82.63% and 75.51% for the Human Gait Sensor and CIFAR-10 datasets, respectively. These figures contrast with the top accuracies of 81.12% and 58.60% achieved by centralized systems. Collectively, our findings champion WSSL as a potent and scalable successor to conventional centralized learning, marking it as a pivotal stride forward in privacy-focused, resilient, and impartial distributed machine learning.
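The weighted-sampling step can be sketched as drawing clients with probability proportional to a contribution score, so influential clients participate more often. The scores here are illustrative:

```python
import numpy as np

# Sketch: draw k distinct clients with probability proportional to their
# contribution scores (scores below are invented).

def sample_clients(scores, k, rng):
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()
    return rng.choice(len(scores), size=k, replace=False, p=p)

rng = np.random.default_rng(0)
picked = sample_clients([5.0, 1.0, 1.0, 1.0], k=2, rng=rng)
```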

Causal disentanglement of multimodal data

  • paper_url: http://arxiv.org/abs/2310.18471
  • repo_url: None
  • paper_authors: Elise Walker, Jonas A. Actor, Carianne Martinez, Nathaniel Trask
  • for: The study concerns causal representation learning algorithms, which discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect.
  • methods: Existing approaches rely on prior information such as linear structural causal models, interventional data, or weak supervision, which may be unavailable or unwarranted in exploratory settings. The proposed algorithm (causalPIMA) instead uses multimodal data and known physics, learning a directed acyclic graph (DAG) jointly with the latent space of a variational autoencoder through a single tractable evidence lower bound loss function.
  • results: Tested on a synthetic and a scientific dataset, causalPIMA learns an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
    Abstract Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our innovative algorithm utilizes a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested against a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
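One ingredient named above, sampling through a learned DAG over latent nodes, can be sketched with simple ancestral sampling; the adjacency weights are illustrative, and the actual model couples this with a variational autoencoder and a Gaussian mixture prior.

```python
import numpy as np

# Sketch of ancestral sampling through a DAG over latent nodes.

def ancestral_sample(adj, rng, scale=0.1):
    """adj[i][j] = weight of edge i -> j; nodes listed in topological order."""
    n = len(adj)
    z = np.zeros(n)
    for j in range(n):
        parents = sum(adj[i][j] * z[i] for i in range(j))
        z[j] = parents + scale * rng.standard_normal()
    return z

adj = [[0.0, 1.0, 0.0],
       [0.0, 0.0, 2.0],
       [0.0, 0.0, 0.0]]   # chain z0 -> z1 -> z2 (illustrative weights)
z = ancestral_sample(adj, np.random.default_rng(0))
```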

Semi-Synthetic Dataset Augmentation for Application-Specific Gaze Estimation

  • paper_url: http://arxiv.org/abs/2310.18469
  • repo_url: None
  • paper_authors: Cedric Leblond-Menard, Gabriel Picard-Krashevski, Sofiane Achiche
  • for: To augment gaze estimation datasets and improve model generalization to applications where the 3D point of gaze is not close to the camera's origin.
  • methods: A textured three-dimensional mesh of the face is built, and training images are rendered from a virtual camera at a position and orientation specific to the target application.
  • results: An average 47% decrease in gaze estimation angular error.
    Abstract Although the number of gaze estimation datasets is growing, the application of appearance-based gaze estimation methods is mostly limited to estimating the point of gaze on a screen. This is in part because most datasets are generated in a similar fashion, where the gaze target is on a screen close to camera's origin. In other applications such as assistive robotics or marketing research, the 3D point of gaze might not be close to the camera's origin, meaning models trained on current datasets do not generalize well to these tasks. We therefore suggest generating a textured tridimensional mesh of the face and rendering the training images from a virtual camera at a specific position and orientation related to the application as a mean of augmenting the existing datasets. In our tests, this lead to an average 47% decrease in gaze estimation angular error.
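The reported metric, angular error between predicted and ground-truth 3D gaze directions, can be computed as follows (a standard formulation, stated here as an assumption about the paper's exact definition):

```python
import numpy as np

# Angular error between two 3D gaze directions, in degrees.

def angular_error_deg(pred, true):
    pred = pred / np.linalg.norm(pred)
    true = true / np.linalg.norm(true)
    cos = np.clip(np.dot(pred, true), -1.0, 1.0)  # guard against fp overflow
    return np.degrees(np.arccos(cos))

err = angular_error_deg(np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]))
```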

LLMSTEP: LLM proofstep suggestions in Lean

  • paper_url: http://arxiv.org/abs/2310.18457
  • repo_url: https://github.com/wellecks/llmstep
  • paper_authors: Sean Welleck, Rahul Saha
  • for: The paper integrates a language model into the Lean proof assistant.
  • methods: A server hosting a language model generates proofstep suggestions, which are checked in Lean and displayed to the user in their development environment.
  • results: The paper provides a baseline language model, along with code for fine-tuning and evaluation to support further development.
    Abstract We present LLMSTEP, a tool for integrating a language model into the Lean proof assistant. LLMSTEP is a Lean 4 tactic that sends a user's proof state to a server hosting a language model. The language model generates suggestions, which are checked in Lean and displayed to a user in their development environment. We provide a baseline language model, along with code for fine-tuning and evaluation to support further development. We provide server implementations that run on CPU, a CUDA GPU, or a Google Colab notebook, as a step towards fast, effective language model suggestions for any user.
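The client/server loop can be sketched as a request carrying the pretty-printed goal and a reply carrying candidate proof steps. The `fake_model` stub and its suggestion strings are placeholders for the hosted language model, not real LLMSTEP output:

```python
import json

# Sketch of the request/response flow between the tactic and the server.

def fake_model(goal_state):
    """Placeholder for the hosted language model; suggestions are invented."""
    return ["intro h", "simp", "exact h"]

def handle_request(raw_request):
    goal = json.loads(raw_request)["goal"]
    return json.dumps({"suggestions": fake_model(goal)})

reply = json.loads(handle_request(json.dumps({"goal": "⊢ p → p"})))
```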

A Novel Skip Orthogonal List for Dynamic Optimal Transport Problem

  • paper_url: http://arxiv.org/abs/2310.18446
  • repo_url: https://github.com/xyxu2033/DynamicOptimalTransport
  • paper_authors: Xiaoyang Xu, Hu Ding
  • for: solves the discrete dynamic optimal transport problem efficiently when the weights or locations of the data points change, with applications in machine learning.
  • methods: proposes a novel 2D Skip Orthogonal List and dynamic tree techniques, based on the conventional simplex method, to efficiently complete each pivoting operation within $O(|V|)$ time with high probability.
  • results: significantly outperforms existing algorithms in dynamic scenarios, with a few simplex iterations in practice.
    Abstract Optimal transportation is a fundamental topic that has attracted a great amount of attention from machine learning community in the past decades. In this paper, we consider an interesting discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transportation cost between two different data sets; if some change happens to a few data points, should we re-compute the high complexity cost function or update the cost by some efficient dynamic data structure? We are aware that several dynamic maximum flow algorithms have been proposed before, however, the research on dynamic minimum cost flow problem is still quite limited, to the best of our knowledge. We propose a novel 2D Skip Orthogonal List together with some dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it can efficiently complete each pivoting operation within $O(|V|)$ time with high probability where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. So our algorithm is more efficient than re-computing the optimal transportation cost that needs at least one traversal over all the $O(|E|) = O(|V|^2)$ variables in general cases. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in the dynamic scenarios.
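The motivation for dynamic updates can be illustrated in one dimension, where optimal transport with unit weights matches sorted sources to sorted targets: moving a single point needs only a re-insertion into sorted order rather than a full re-solve. This toy is not the paper's skip-orthogonal-list algorithm.

```python
import bisect

# 1D unit-weight OT: the optimal plan matches sorted sources to sorted
# targets, so one moved point is a local update, not a full recompute.

def ot_cost_1d(sources, targets):
    return sum(abs(a - b) for a, b in zip(sorted(sources), sorted(targets)))

def move_point(sorted_points, old, new):
    sorted_points.pop(bisect.bisect_left(sorted_points, old))
    bisect.insort(sorted_points, new)  # O(log n) search (O(n) shift in a list)
    return sorted_points

src = sorted([0.0, 2.0, 5.0])
tgt = [1.0, 2.0, 6.0]
before = ot_cost_1d(src, tgt)
src = move_point(src, 5.0, 9.0)
after = ot_cost_1d(src, tgt)
```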

Towards a fuller understanding of neurons with Clustered Compositional Explanations

  • paper_url: http://arxiv.org/abs/2310.18443
  • repo_url: https://github.com/krlgroup/clustered-compositional-explanations
  • paper_authors: Biagio La Rosa, Leilani H. Gilpin, Roberto Capobianco
  • for: The paper proposes Clustered Compositional Explanations, a method that approximates a broader spectrum of neuron activations than existing Compositional Explanations.
  • methods: Compositional Explanations are combined with clustering and a novel search heuristic to approximate neuron behavior across multiple activation ranges.
  • results: The paper analyzes the problems that arise when applying these methods to multiple activation ranges, examines the insights retrievable with the algorithm, and proposes desiderata qualities for evaluating the explanations returned by different algorithms.
    Abstract Compositional Explanations is a method for identifying logical formulas of concepts that approximate the neurons' behavior. However, these explanations are linked to the small spectrum of neuron activations (i.e., the highest ones) used to check the alignment, thus lacking completeness. In this paper, we propose a generalization, called Clustered Compositional Explanations, that combines Compositional Explanations with clustering and a novel search heuristic to approximate a broader spectrum of the neurons' behavior. We define and address the problems connected to the application of these methods to multiple ranges of activations, analyze the insights retrievable by using our algorithm, and propose desiderata qualities that can be used to study the explanations returned by different algorithms.
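The core idea, labeling the full activation range rather than only the highest activations, can be sketched by binning activations into bands; quantile bins stand in here for the paper's clustering step:

```python
import numpy as np

# Sketch: partition the full activation range into bands so each band can
# be explained separately, not just the top activations.

def activation_clusters(acts, n_clusters=3):
    edges = np.quantile(acts, np.linspace(0, 1, n_clusters + 1))
    labels = np.clip(np.searchsorted(edges, acts, side="right") - 1,
                     0, n_clusters - 1)
    return labels, edges

acts = np.array([0.0, 0.1, 0.2, 0.5, 0.6, 0.9, 1.0, 1.1])
labels, edges = activation_clusters(acts)
```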

On the Fairness ROAD: Robust Optimization for Adversarial Debiasing

  • paper_url: http://arxiv.org/abs/2310.18413
  • repo_url: https://github.com/fairmlresearch/road
  • paper_authors: Vincent Grari, Thibault Laugel, Tatsunori Hashimoto, Sylvain Lamprier, Marcin Detyniecki
  • for: The paper addresses local fairness, ensuring that the predictor is unbiased not only in expectation over the whole population but also within any subregion of the feature space unknown at training time.
  • methods: The proposed ROAD method leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, using an instance-level re-weighting strategy that prioritizes inputs likely to be locally unfair, i.e., those on which the adversary most easily reconstructs the sensitive attribute.
  • results: ROAD achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
    Abstract In the field of algorithmic fairness, significant attention has been put on group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in terms of expectations over the whole population, but also within any subregion of the feature space, unknown at training time. To enforce this objective, we introduce ROAD, a novel approach that leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, where an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD is designed to prioritize inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
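The instance-level re-weighting can be sketched as a softmax that upweights inputs on which the adversary's loss is lowest (i.e., the sensitive attribute is easiest to reconstruct). The temperature and losses below are illustrative assumptions:

```python
import numpy as np

# Sketch: low adversary loss (easy attribute reconstruction) -> high weight.

def road_weights(adversary_losses, tau=1.0):
    z = -np.asarray(adversary_losses) / tau   # low loss -> high weight
    z = z - z.max()                           # numerical stability
    w = np.exp(z)
    return w / w.sum()

w = road_weights([0.1, 2.0, 0.5])
```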

Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models

  • paper_url: http://arxiv.org/abs/2310.18308
  • repo_url: None
  • paper_authors: Pushkal Katara, Zhou Xian, Katerina Fragkiadaki
  • for: The paper aims to scale up the learning of robot manipulation skills across diverse tasks and environments while minimizing human involvement.
  • methods: Gen2Sim automates the generation of 3D assets, task descriptions, task decompositions, and reward functions using large pre-trained generative models of language and vision, lifting 2D object-centric images to 3D with image diffusion models and prompting LLMs for plausible physics parameters.
  • results: Gen2Sim succeeds in learning policies for diverse long-horizon tasks where reinforcement learning with non-temporally-decomposed reward functions fails, providing a viable path for scaling up and diversifying robot manipulation learning in simulation and contributing hundreds of simulated assets, tasks, and demonstrations.
    Abstract Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and to code up reward functions for reinforcement learning. Such human involvement is an important bottleneck towards scaling up robot learning across diverse tasks and environments. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions using large pre-trained generative models of language and vision. We generate 3D assets for simulation by lifting open-world 2D object-centric images to 3D using image diffusion models and querying LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we chain-of-thought prompt LLMs to map these to relevant task descriptions, temporal decompositions, and corresponding python reward functions for reinforcement learning. We show Gen2Sim succeeds in learning policies for diverse long horizon tasks, where reinforcement learning with non temporally decomposed reward functions fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development, and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition in RL. Our work contributes hundreds of simulated assets, tasks and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
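The temporal task decomposition can be sketched as chaining per-subtask rewards with a completion bonus that advances to the next stage. The subtasks and shaping below are invented for illustration:

```python
# Sketch: stage-wise rewards for a temporally decomposed task.

def make_staged_reward(subtask_rewards):
    """Return a reward fn paying the current subtask's shaped reward plus a
    completion bonus, advancing to the next subtask when it is done."""
    def reward(state, stage):
        r, done = subtask_rewards[stage](state)
        if done and stage + 1 < len(subtask_rewards):
            return r + 1.0, stage + 1        # bonus + advance to next stage
        return r, stage
    return reward

# Two toy subtasks: "reach x >= 1" then "reach x >= 2".
subtasks = [
    lambda s: (-abs(1.0 - s), s >= 1.0),
    lambda s: (-abs(2.0 - s), s >= 2.0),
]
reward = make_staged_reward(subtasks)
r0, stage = reward(0.5, 0)
r1, stage = reward(1.0, stage)
```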

A Stability Principle for Learning under Non-Stationarity

  • paper_url: http://arxiv.org/abs/2310.18304
  • repo_url: None
  • paper_authors: Chengpiao Huang, Kaizheng Wang
  • for: The study develops a versatile framework for statistical learning in non-stationary environments.
  • methods: In each time period, a stability principle selects a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error.
  • results: The theory shows the approach adapts to unknown non-stationarity, with a regret bound that is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. Two novel components underlie the analysis: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
    Abstract We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
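The stability principle can be sketched as keeping the largest look-back window whose estimated bias stays within a multiple of the stochastic error, on the order of sigma/sqrt(k). The drift estimate and constant below are illustrative choices:

```python
import numpy as np

# Sketch: accept the largest window k whose drift (bias proxy) is within
# c * sigma / sqrt(k) of the stochastic error level.

def select_window(losses, sigma, c=2.0):
    losses = np.asarray(losses)          # most recent observation last
    best = 1
    for k in range(2, len(losses) + 1):
        recent = losses[-k:]
        bias = abs(recent.mean() - losses[-1])   # crude drift estimate
        if bias <= c * sigma / np.sqrt(k):
            best = k                     # larger window still acceptable
    return best

# A distribution shift after the third observation limits the usable window.
k = select_window([5.0, 5.1, 4.9, 1.0, 1.1, 0.9], sigma=0.5)
```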

Socially Cognizant Robotics for a Technology Enhanced Society

  • paper_url: http://arxiv.org/abs/2310.18303
  • repo_url: None
  • paper_authors: Kristin J. Dana, Clinton Andrews, Kostas Bekris, Jacob Feldman, Matthew Stone, Pernille Hemmer, Aaron Mazzeo, Hal Salzman, Jingang Yi
  • for: The paper advocates putting human-centric objectives front-and-center in emerging robotics applications and in addressing concerns about their impact.
  • methods: An interdisciplinary approach, socially cognizant robotics, synthesizes technical and social science methods, empowering stakeholder participation (from synchronous human feedback to asynchronous societal assessment) in shaping AI-driven robot behavior at all levels.
  • results: The approach yields a range of novel research perspectives and problems for improving robots' interactions with individuals and their impacts on society, along with best practices that balance traditional technology-based metrics (e.g., efficiency, precision, and accuracy) with human- and society-based metrics.
    Abstract Emerging applications of robotics, and concerns about their impact, require the research community to put human-centric objectives front-and-center. To meet this challenge, we advocate an interdisciplinary approach, socially cognizant robotics, which synthesizes technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation (from synchronous human feedback to asynchronous societal assessment) in shaping AI-driven robot behavior at all levels, and leads to a range of novel research perspectives and problems both for improving robots' interactions with individuals and impacts on society. Drawing on these arguments, we develop best practices for socially cognizant robot design that balance traditional technology-based metrics (e.g. efficiency, precision and accuracy) with critically important, albeit challenging to measure, human and society-based metrics.
    摘要 机器人新兴应用及人们对其影响的担忧,要求研究社区将以人为中心的目标置于首位。为应对这一挑战,我们提倡一种跨学科方法,即社会认知机器人学,它综合了技术方法与社会科学方法。我们认为,这种方法源于在各个层面赋能利益相关者参与(从同步的人类反馈到异步的社会评估)以塑造 AI 驱动的机器人行为的需要,并由此带来一系列新的研究视角和问题,既改善机器人与个体的交互,也改善其对社会的影响。基于这些论点,我们提出了社会认知机器人设计的最佳实践,在传统的技术指标(如效率、精度和准确率)与至关重要但难以度量的人类与社会指标之间取得平衡。

Interactive Motion Planning for Autonomous Vehicles with Joint Optimization

  • paper_url: http://arxiv.org/abs/2310.18301
  • repo_url: None
  • paper_authors: Yuxiao Chen, Sushant Veer, Peter Karkus, Marco Pavone
  • for: This paper is written for planning safe motions for autonomous vehicles in highly interactive driving scenarios.
  • methods: The paper uses deep-learning-based models for trajectory prediction and joint optimization with model predictive control (MPC) to leverage ego-conditioned prediction.
  • results: The proposed Interactive Joint Planning (IJP) method significantly outperforms baselines in closed-loop simulation, demonstrating its effectiveness in providing safe and efficient motions for autonomous vehicles in interactive driving scenarios.
  • for: 这篇论文是为了规划自动驾驶车辆在高度互动的驾驶场景中安全的运动计划。
  • methods: 该论文使用基于深度学习的模型进行轨迹预测,并将其与模型预测控制(MPC)联合优化,以利用以自车为条件(ego-conditioned)的预测。
  • results: 提出的交互式联合规划(IJP)方法在闭环仿真中显著优于基线方法,证明其能够在交互式驾驶场景中为自动驾驶车辆提供安全高效的运动规划。
    Abstract In highly interactive driving scenarios, the actions of one agent greatly influences those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved great success in trajectory prediction and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planner. Despite their ability to generate fine-grained high-quality motion plans, it is difficult for gradient-based planning algorithms, such as model predictive control (MPC), to leverage ego-conditioned prediction due to their iterative nature and need for gradient. We present Interactive Joint Planning (IJP) that bridges MPC with learned prediction models in a computationally scalable manner to provide us the best of both the worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the join trajectory optimization tries to stay close to. Furthermore, by leveraging homotopy classes, our joint optimizer searches over diverse motion plans to avoid getting stuck at local minima. Closed-loop simulation result shows that IJP significantly outperforms the baselines that are either without joint optimization or running sampling-based planning.
    摘要 在高度交互的驾驶场景中,一个智能体的行为会深刻影响其周围的其他智能体。因此,要为自动驾驶车辆在这类交互环境中规划安全的运动,就需要推理自车(ego)预期运动规划对附近智能体行为的影响。基于深度学习的模型近来在轨迹预测方面取得了巨大成功,文献中许多模型支持以自车为条件的预测。然而,由于神经网络的复杂性,在下游规划中利用这类预测仍然具有挑战性,使规划器结构只能局限于简单形式,例如基于采样的规划器。尽管基于梯度的规划算法(如模型预测控制 MPC)能够生成细粒度的高质量运动规划,但由于其迭代性质和对梯度的需求,难以利用以自车为条件的预测。我们提出交互式联合规划(IJP),以计算上可扩展的方式将 MPC 与学习得到的预测模型结合起来,兼得两者之长。具体而言,IJP 对自车和周围智能体的行为进行联合优化,并将深度学习预测模型作为预测先验,使联合轨迹优化尽量贴近该先验。此外,借助同伦类(homotopy classes),我们的联合优化器可以在多样的运动规划中搜索,避免陷入局部极小值。闭环仿真结果表明,IJP 显著优于不进行联合优化或仅运行基于采样规划的基线方法。
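The joint-optimization idea can be sketched in a toy 1D setting: optimize the ego and agent trajectories together by gradient descent, penalizing the agent for deviating from a learned prediction prior and penalizing both for coming closer than a safety distance. All dynamics, weights, and quadratic costs below are illustrative assumptions; IJP itself uses MPC-style optimization with deep-learned priors and homotopy-class search.

```python
import numpy as np

def joint_plan(ego_goal, agent_prior, steps=500, lr=0.01,
               w_prior=1.0, w_coll=5.0, d_safe=1.0):
    """Toy 1D joint trajectory optimization in the spirit of IJP."""
    T = len(agent_prior)
    ego = np.linspace(0.0, ego_goal, T)                # init: straight line to goal
    agent = np.asarray(agent_prior, dtype=float).copy()  # init at the prior
    for _ in range(steps):
        g_ego, g_agent = np.zeros(T), np.zeros(T)
        # goal-reaching term for the ego's final position
        g_ego[-1] += 2.0 * (ego[-1] - ego_goal)
        # smoothness term on the ego trajectory
        d = ego[1:] - ego[:-1]
        g_ego[1:] += 2.0 * d
        g_ego[:-1] -= 2.0 * d
        # prediction prior: keep the agent close to the learned forecast
        g_agent += 2.0 * w_prior * (agent - np.asarray(agent_prior, dtype=float))
        # collision hinge: active whenever ego and agent are closer than d_safe
        gap = ego - agent
        act = np.abs(gap) < d_safe
        push = 2.0 * w_coll * (d_safe - np.abs(gap[act])) * np.sign(gap[act])
        g_ego[act] -= push      # ego moves away from the agent
        g_agent[act] += push    # agent moves away from the ego
        ego -= lr * g_ego
        agent -= lr * g_agent
    return ego, agent
```

Because both trajectories are decision variables, the ego "negotiates" space with the agent instead of treating the prediction as fixed.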

Image Clustering Conditioned on Text Criteria

  • paper_url: http://arxiv.org/abs/2310.18297
  • repo_url: https://github.com/sehyunkwon/ictc
  • paper_authors: Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee
  • for: 基于用户指定文本标准的图像聚类
  • methods: 借助现代视觉语言模型和大型语言模型,实现基于文本标准的图像聚类(IC$|$TC)
  • results: 在多种标准下,IC$|$TC 能够有效地对图像进行聚类,并显著优于基线方法。
    Abstract Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC$|$TC), and it represents a different paradigm of image clustering. IC$|$TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC$|$TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.
    摘要 传统的聚类方法不为用户提供对聚类结果的直接控制,聚类结果也可能与用户心目中的相关标准不一致。在这项工作中,我们提出了一种基于用户指定文本标准、借助现代视觉语言模型和大型语言模型进行图像聚类的新方法。我们称之为基于文本标准的图像聚类(IC$|$TC),它代表了一种不同的图像聚类范式。IC$|$TC 只需要最少且实用的人工干预,并作为回报赋予用户对聚类结果的显著控制权。我们的实验表明,IC$|$TC 能够按多种标准(如人类动作、物理位置或人物情绪)有效地对图像进行聚类,并显著优于基线方法。
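The pipeline reduces to a small amount of glue code once the two foundation models are abstracted away. In this sketch, `describe` stands in for the vision-language model producing a text description per image, and `assign_label` for the LLM mapping a description to a cluster name under the user's text criterion; both are injected callables and hypothetical, since the paper's actual prompting setup is more involved.

```python
def ictc_cluster(images, describe, assign_label):
    """Group images by the label an LLM assigns to each image's description
    (schematic of clustering conditioned on a text criterion)."""
    clusters = {}
    for img in images:
        label = assign_label(describe(img))
        clusters.setdefault(label, []).append(img)
    return clusters
```

Swapping only `assign_label`'s prompt (e.g. "group by mood" vs "group by location") changes the clustering criterion without touching the rest of the pipeline.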

Moments for Perceptive Narration Analysis Through the Emotional Attachment of Audience to Discourse and Story

  • paper_url: http://arxiv.org/abs/2310.18273
  • repo_url: None
  • paper_authors: Gary Bruins, Ergun Akleman
  • for: 这篇论文的目的是开发一个可以分析视觉故事的理论框架,从而更好地理解电影、漫画等视觉故事的效果。
  • methods: 这篇论文引入了一个新的故事元素,称为"时刻"(moments),并提出了一种将线性故事(如电影)分解为一系列时刻的方法。这些时刻分为两类:故事时刻(story moments)和话语时刻(discourse moments);每类时刻又可进一步分为三种通用叙事时刻,这些时刻会增强或削弱观众对角色或故事的情感依恋。
  • results: 这篇论文提出了一种方法来编目各种通用时刻在故事中的出现,并使用曲线或颜色带来可视化角色的旅程。此外,作者还证明故事时刻和话语时刻都可以转化为一个综合吸引力参数,该参数可随时间绘制,展示观众对故事情感依恋的变化。
    Abstract In this work, our goal is to develop a theoretical framework that can eventually be used for analyzing the effectiveness of visual stories such as feature films to comic books. To develop this theoretical framework, we introduce a new story element called moments. Our conjecture is that any linear story such as the story of a feature film can be decomposed into a set of moments that follow each other. Moments are defined as the perception of the actions, interactions, and expressions of all characters or a single character during a given time period. We categorize the moments into two major types: story moments and discourse moments. Each type of moment can further be classified into three types, which we call universal storytelling moments. We believe these universal moments foster or deteriorate the emotional attachment of the audience to a particular character or the story. We present a methodology to catalog the occurrences of these universal moments as they are found in the story. The cataloged moments can be represented using curves or color strips. Therefore, we can visualize a character's journey through the story as either a 3D curve or a color strip. We also demonstrated that both story and discourse moments can be transformed into one lump-sum attraction parameter. The attraction parameter in time provides a function that can be plotted graphically onto a timeline illustrating changes in the emotional attachment of audience to a character or the story. By inspecting these functions the story analyst can analytically decipher the moments in the story where the attachment is being established, maintained, strengthened, or conversely where it is languishing.
    摘要 在这项工作中,我们的目标是开发一个理论框架,最终可用于分析从电影到漫画等视觉故事的效果。为此,我们引入了一个新的故事元素,即"时刻"(moments)。我们的猜想是,任何线性故事(例如电影的故事)都可以分解为一系列前后相继的时刻。时刻被定义为在给定时间段内对所有角色或单个角色的行动、互动和表达的感知。我们将时刻分为两大类:故事时刻和话语时刻。每类时刻又可进一步分为三种通用叙事时刻。我们认为这些通用时刻会增强或削弱观众对特定角色或故事的情感依恋。我们提出了一种方法来编目这些通用时刻在故事中的出现;编目后的时刻可以用曲线或颜色带表示,因此我们可以将角色在故事中的旅程可视化为一条 3D 曲线或一条颜色带。我们还证明,故事时刻和话语时刻都可以转化为一个单一的综合吸引力参数。该参数随时间变化,构成一个可绘制在时间轴上的函数,展示观众对角色或故事情感依恋的变化。通过考察这些函数,故事分析者可以解析地判断故事中情感依恋在哪些时刻建立、维持、增强,或相反地衰退。
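The lump-sum attraction parameter described above is, at its simplest, a running total of per-moment contributions. The sketch below assumes a hypothetical list of analyst-assigned scores, positive for fostering moments and negative for deteriorating ones.

```python
def attachment_curve(moment_scores):
    """Running total of per-moment attachment contributions; the resulting
    curve can be plotted against the timeline of the story."""
    curve, total = [], 0.0
    for s in moment_scores:
        total += s
        curve.append(total)
    return curve
```

Rising segments of the curve mark where attachment is being established or strengthened; falling segments mark where it languishes.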

Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt

  • paper_url: http://arxiv.org/abs/2310.18264
  • repo_url: https://github.com/yining043/neuopt
  • paper_authors: Yining Ma, Zhiguang Cao, Yeow Meng Chee
  • for: 本研究开展了一种基于学习搜索的路径规划算法NeuOpt,用于解决路径规划问题。
  • methods: NeuOpt使用了一种特定的动作因子化方法和一种自定义的循环双流解码器,以学习具有灵活k-选择的搜索策略。此外,paper还提出了一种引导不可能区域探索(GIRE)方案,以便更好地让搜索算法自主探索可行和不可行的区域。
  • results: 实验表明,NeuOpt 在 TSP 和 CVRP 问题上显著超越了现有的基于掩码的 L2S 算法,同时也超越了 L2C 和 L2P 算法。此外,论文还为处理 VRP 约束提供了一些新的思路。
    Abstract In this paper, we present Neural k-Opt (NeuOpt), a novel learning-to-search (L2S) solver for routing problems. It learns to perform flexible k-opt exchanges based on a tailored action factorization method and a customized recurrent dual-stream decoder. As a pioneering work to circumvent the pure feasibility masking scheme and enable the autonomous exploration of both feasible and infeasible regions, we then propose the Guided Infeasible Region Exploration (GIRE) scheme, which supplements the NeuOpt policy network with feasibility-related features and leverages reward shaping to steer reinforcement learning more effectively. Additionally, we equip NeuOpt with Dynamic Data Augmentation (D2A) for more diverse searches during inference. Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that our NeuOpt not only significantly outstrips existing (masking-based) L2S solvers, but also showcases superiority over the learning-to-construct (L2C) and learning-to-predict (L2P) solvers. Notably, we offer fresh perspectives on how neural solvers can handle VRP constraints. Our code is available: https://github.com/yining043/NeuOpt.
    摘要 在这篇论文中,我们提出了一种名为 Neural k-Opt(NeuOpt)的学习搜索(L2S)算法,用于求解路径问题。它基于一种定制的动作因子化方法和一种自定义的循环双流解码器,学习执行灵活的 k-opt 交换。作为一项率先绕过纯可行性掩码方案、实现对可行与不可行区域自主探索的工作,我们进一步提出了引导不可行区域探索(GIRE)方案,该方案在 NeuOpt 策略网络中加入可行性相关特征,并通过奖励塑形更有效地引导强化学习。此外,我们还为 NeuOpt 配备了动态数据增强(D2A),以便在推理阶段进行更多样的搜索。我们在旅行商问题(TSP)和带容量限制的车辆路径问题(CVRP)上进行了广泛实验,结果表明,NeuOpt 不仅显著超越了现有的(基于掩码的)L2S 算法,还展现出对学习构造(L2C)和学习预测(L2P)算法的优势。另外,我们还就神经求解器如何处理 VRP 约束提供了新的视角。我们的代码可在 GitHub 上获取:https://github.com/yining043/NeuOpt。
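For context, the classical k=2 case of the k-opt moves that NeuOpt learns to apply flexibly is the 2-opt local search below: repeatedly reverse a tour segment whenever the reversal shortens the tour. This is the hand-coded baseline, not the learned policy.

```python
import itertools

def tour_length(tour, dist):
    """Length of a cyclic tour under a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Greedy 2-opt: accept any segment reversal that strictly improves
    the tour, until no improving move remains."""
    improved, tour = True, list(tour)
    while improved:
        improved = False
        n = len(tour)
        for i, j in itertools.combinations(range(n), 2):
            if j - i < 2:  # reversing a single city is a no-op
                continue
            new = tour[:i] + tour[i:j][::-1] + tour[j:]
            if tour_length(new, dist) + 1e-12 < tour_length(tour, dist):
                tour, improved = new, True
    return tour
```

A learned k-opt policy generalizes this by choosing which (possibly infeasible) exchanges to try, instead of exhaustively scanning all pairs.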

A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking

  • paper_url: http://arxiv.org/abs/2310.18244
  • repo_url: None
  • paper_authors: Rose Hadshar
  • for: 该论文探讨了人工智能(AI)可能对人类 pose existential risks 的证据,具体来说是通过偏向和权力寻求。
  • methods: 该论文对偏向和权力寻求的证据进行了评估,包括实证证据、概念性证据和专家意见。
  • results: 论文发现,虽然目前没有公开的实证例子表明 AI 系统会发展出偏向和权力寻求,但是理论上的证据和实验证据表明这种风险存在。因此,无法 completly 排除 AI via 偏向和权力寻求对人类 pose existential risks 的可能性。
    Abstract Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose existential risks. This paper reviews the evidence for existential risks from AI via misalignment, where AI systems develop goals misaligned with human values, and power-seeking, where misaligned AIs actively seek power. The review examines empirical findings, conceptual arguments and expert opinion relating to specification gaming, goal misgeneralization, and power-seeking. The current state of the evidence is found to be concerning but inconclusive regarding the existence of extreme forms of misaligned power-seeking. Strong empirical evidence of specification gaming combined with strong conceptual evidence for power-seeking make it difficult to dismiss the possibility of existential risk from misaligned power-seeking. On the other hand, to date there are no public empirical examples of misaligned power-seeking in AI systems, and so arguments that future systems will pose an existential risk remain somewhat speculative. Given the current state of the evidence, it is hard to be extremely confident either that misaligned power-seeking poses a large existential risk, or that it poses no existential risk. The fact that we cannot confidently rule out existential risk from AI via misaligned power-seeking is cause for serious concern.
    摘要 人工智能(AI)的快速发展引发了专家、政策制定者和世界领袖的日益担忧:越来越先进的 AI 系统可能构成生存性风险。本文审视了 AI 经由目标错位(AI 系统发展出与人类价值观不一致的目标)与权力寻求(错位的 AI 主动谋取权力)造成生存性风险的证据,考察了与规范博弈(specification gaming)、目标错误泛化和权力寻求相关的实证发现、概念论证与专家意见。现有证据令人担忧,但对于是否存在极端形式的错位权力寻求仍无定论:规范博弈的有力实证证据与权力寻求的有力概念论证相结合,使人难以排除错位权力寻求带来生存性风险的可能性;另一方面,迄今尚无 AI 系统中错位权力寻求的公开实证案例,因此关于未来系统将构成生存性风险的论证仍带有一定推测性。鉴于目前的证据状况,既难以确信错位权力寻求构成巨大的生存性风险,也难以确信它完全不构成生存性风险。我们无法自信地排除这一风险,这一事实本身就值得严重关切。

Fine-Tuning Language Models Using Formal Methods Feedback

  • paper_url: http://arxiv.org/abs/2310.18239
  • repo_url: None
  • paper_authors: Yunhao Yang, Neel P. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu
  • for: 这个论文旨在提高预训练语言模型的应用在自动控制领域,使其能够更好地满足具体任务的需求。
  • methods: 这篇论文提出了一种完全自动的 fine-tuning 方法,通过自然语言任务描述 guideline 将预训练语言模型转换为具体任务的控制器,并通过世界模型来验证这些控制器的合liance。
  • results: 论文提供了多个自动驾驶任务的实验结果,表明该方法可以在不同的任务上提高预训练语言模型的性能,从60%提高到90%。
    Abstract Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation, however, sourcing human feedback is labor intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers with high compliance with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidences, primarily in autonomous driving, to demonstrate the method's effectiveness across multiple tasks. The results indicate an improvement in percentage of specifications satisfied by the controller from 60% to 90%.
    摘要 Our method uses natural language task descriptions to guide the synthesis of automaton-based controllers from pre-trained models. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers that comply with the desired specifications receive higher ranks, guiding the iterative fine-tuning process.We provide quantitative evidence, primarily in the field of autonomous driving, to demonstrate the effectiveness of our method. The results show an improvement in the percentage of specifications satisfied by the controller, from 60% to 90%.
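The verification-and-ranking step can be illustrated with a finite-trace check: score a candidate controller by the fraction of rollouts that satisfy a reach-avoid specification. The paper verifies automaton-based controllers against independently provided specifications in a world model; the set-membership check below is a deliberately simplified stand-in.

```python
def compliance(traces, unsafe, goal):
    """Fraction of rollout traces that avoid all unsafe states and
    eventually reach the goal (toy reach-avoid specification)."""
    ok = 0
    for trace in traces:
        safe = all(s not in unsafe for s in trace)
        reaches = any(s == goal for s in trace)
        ok += safe and reaches
    return ok / len(traces)
```

Controllers with higher compliance would rank higher and guide the next round of fine-tuning.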

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2310.18235
  • repo_url: None
  • paper_authors: Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
  • for: 这个论文的目的是评估文本到图像模型的可靠性。
  • methods: 这个论文使用了Question Generation and Answering(QG/A)方法,通过使用预训练的基础模型生成提问和答案,然后根据提问生成的答案和图像是否一致来评估图像的可靠性。
  • results: 这个论文通过提出和解决一些可靠性问题(如提问不应该包含幻觉、重复或遗漏信息),并使用Davidsonian Scene Graph(DSG)评估框架来提高评估的可靠性。DSG使用依赖图来组织提问和答案,以确保提问的语义覆盖和答案的一致性。经过广泛的实验和人工评估,这个论文证明了DSG可以有效地解决这些问题。
    Abstract Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt, and output images are scored based on whether these answers extracted with a visual question answering model are consistent with the prompt-based answers. This kind of evaluation is naturally dependent on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions) and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG is an automatic, graph-based QG/A that is modularly implemented to be adaptable to any QG/A module. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above. Finally, we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts, covering a wide range of fine-grained semantic categories with a balanced distribution. We release the DSG-1k prompts and the corresponding DSG questions.
    摘要 评估文本到图像模型是出了名的困难。最近一种有力的图文一致性评估方法基于 QG/A(问题生成与回答):使用预训练的基础模型从提示中自动生成一组问题和答案,再用视觉问答模型从生成图像中提取答案,根据其与基于提示的答案是否一致来为图像打分。这类评估自然依赖于底层 QG 与 QA 模型的质量。我们识别并解决了现有 QG/A 工作中的若干可靠性挑战:(a)QG 生成的问题应忠实于提示(避免幻觉、重复和遗漏);(b)VQA 的答案应当一致(不能一边断言图中没有摩托车,一边又声称摩托车是蓝色的)。我们用 Davidsonian Scene Graph(DSG)来解决这些问题,这是一个受形式语义学启发、以实证为基础的评估框架。DSG 是一种自动化的、基于图的 QG/A 方法,采用模块化实现,可适配任何 QG/A 模块。DSG 生成原子化且互不重复的问题,并将其组织为依赖图,从而(i)确保适当的语义覆盖,(ii)规避不一致的答案。通过在多种模型配置(LLM、VQA 和 T2I)上的大量实验和人工评估,我们实证表明 DSG 能够解决上述挑战。最后,我们提出了 DSG-1k,一个开源评估基准,包含 1,060 条提示,均衡覆盖多种细粒度语义类别。我们公开了 DSG-1k 的提示及相应的 DSG 问题。
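The dependency gating that lets DSG sidestep inconsistent answers can be sketched as follows: a child question only contributes to the score when all of its ancestors were answered affirmatively. The scoring rule below (fraction of valid questions answered "yes") is an illustrative simplification of DSG's aggregation.

```python
def dsg_score(answers, parents):
    """Score an image against atomic questions arranged in a dependency
    graph. answers: {qid: bool from the VQA model};
    parents: {qid: [parent qids]}. A question is only scored when every
    ancestor holds, so 'no motorcycle' gates out 'is the motorcycle blue?'."""
    valid = {}

    def holds(q):
        if q not in valid:
            valid[q] = answers[q] and all(holds(p) for p in parents.get(q, []))
        return valid[q]

    scored = [q for q in answers if all(holds(p) for p in parents.get(q, []))]
    return sum(answers[q] for q in scored) / max(len(scored), 1)
```

With this gating, the contradictory pair ("no motorcycle", "the motorcycle is blue") can never both count toward the score.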

Alignment and Outer Shell Isotropy for Hyperbolic Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.18209
  • repo_url: None
  • paper_authors: Yifei Zhang, Hao Zhu, Jiahong Liu, Piotr Koniusz, Irwin King
  • for: 学习高质量的自监督图表示,以改进下游任务。
  • methods: 提出了一种新的对比学习框架,包括设计能有效捕捉层次化数据不变信息的对齐度量,以及提出均匀性度量的替代方案以避免维度塌缩问题。
  • results: 在多种双曲图嵌入技术上,通过所提对齐度量与均匀性度量学习高质量图表示,并在监督和自监督学习设置下均取得了较好的效果。
    Abstract Learning good self-supervised graph representations that are beneficial to downstream tasks is challenging. Among a variety of methods, contrastive learning enjoys competitive performance. The embeddings of contrastive learning are arranged on a hypersphere that enables the Cosine distance measurement in the Euclidean space. However, the underlying structure of many domains such as graphs exhibits highly non-Euclidean latent geometry. To this end, we propose a novel contrastive learning framework to learn high-quality graph embedding. Specifically, we design the alignment metric that effectively captures the hierarchical data-invariant information, as well as we propose a substitute of uniformity metric to prevent the so-called dimensional collapse. We show that in the hyperbolic space one has to address the leaf- and height-level uniformity which are related to properties of trees, whereas in the ambient space of the hyperbolic manifold, these notions translate into imposing an isotropic ring density towards boundaries of Poincar\'e ball. This ring density can be easily imposed by promoting the isotropic feature distribution on the tangent space of manifold. In the experiments, we demonstrate the efficacy of our proposed method across different hyperbolic graph embedding techniques in both supervised and self-supervised learning settings.
    摘要 学习有益于下游任务的高质量自监督图表示颇具挑战。在各类方法中,对比学习表现出有竞争力的性能。对比学习的嵌入被安排在一个超球面上,从而可以在欧氏空间中使用余弦距离进行度量。然而,许多领域(如图)的底层结构呈现高度非欧的隐式几何。为此,我们提出了一种新的对比学习框架来学习高质量的图嵌入。具体而言,我们设计了能够有效捕捉层次化数据不变信息的对齐度量,并提出了均匀性度量的替代方案,以防止所谓的维度塌缩。我们指出,在双曲空间中需要处理与树的性质相关的叶级和高度级均匀性,而在双曲流形的环绕空间中,这些概念转化为在庞加莱球(Poincaré ball)边界附近施加各向同性的环状密度。该环状密度可以通过在流形的切空间上促进各向同性的特征分布来轻松实现。在实验中,我们在监督和自监督学习设置下,针对多种双曲图嵌入技术验证了所提方法的有效性。

Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO’s 4000 TPU Months

  • paper_url: http://arxiv.org/abs/2310.18191
  • repo_url: None
  • paper_authors: Fady Rezk, Antreas Antoniou, Henry Gouk, Timothy Hospedales
  • for: 本研究是用来训练一个通用的”基础”优化器的最大规模尝试。
  • methods: 本研究使用了数千个机器学习任务和超过 4000 个 TPU 月的训练,以求产生一个无需超参数调整、能够泛化到新问题并超越 Adam 等业界标准的优化器。
  • results: 我们发现,与最初的声明相反:(1) VeLO 有一个关键的超参数需要针对具体问题调整;(2) VeLO 并不一定能在解的质量上超越竞争对手;(3) VeLO 在降低训练损失方面并不比竞争优化器更快。这些观察质疑了 VeLO 的通用性以及训练它所投入资源的价值。
    Abstract We analyze VeLO (versatile learned optimizer), the largest scale attempt to train a general purpose "foundational" optimizer to date. VeLO was trained on thousands of machine learning tasks using over 4000 TPU months with the goal of producing an optimizer capable of generalizing to new problems while being hyperparameter free, and outperforming industry standards such as Adam. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that, contrary to initial claims: (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in quality of solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.
    摘要 我们分析了VeLO(多功能学习优化器),目前最大规模的尝试是用多种机器学习任务来训练一个通用的“基础”优化器。VeLO在多达4000个TPU月的训练时间和4000个机器学习任务上被训练,以产生一个能够泛化到新问题的优化器,并且不需要任何hyperparameter。我们独立评估了VeLO在MLCommons优化器benchmark集合中的性能。我们发现:1. VeLO有一个关键的hyperparameter需要问题特定的调整。2. VeLO不一定能够超越竞争对手在解决问题的质量上。3. VeLO不一定比竞争对手快速地减少训练损失。这些观察结果质疑了VeLO的通用性和投资训练它的价值。

Personas as a Way to Model Truthfulness in Language Models

  • paper_url: http://arxiv.org/abs/2310.18168
  • repo_url: None
  • paper_authors: Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He
  • for: This paper explores the ability of large language models to discern truth from falsehood in contradictory data.
  • methods: The authors hypothesize that language models can cluster truthful text by modeling a truthful persona, which is a group of agents that are likely to produce truthful text and share similar features. They use arithmetics as a synthetic environment to test this hypothesis.
  • results: The authors find that language models can separate true and false statements, and generalize truthfulness across agents, but only if the agents in the training data share a truthful generative process that enables the creation of a truthful persona. This suggests that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.
    Abstract Large Language Models are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. Can language models discern truth from falsehood in this contradicting data? Expanding on the view that LLMs can model different agents producing the corpora, we hypothesize that they can cluster truthful text by modeling a truthful persona: a group of agents that are likely to produce truthful text and share similar features. For example, trustworthy sources like Wikipedia and Science usually use formal writing styles and make consistent claims. By modeling this persona, LLMs can generalize truthfulness beyond the specific contexts in which each agent generated the training text. For example, the model can infer that the agent "Wikipedia" will behave truthfully on topics that were only generated by "Science" because they share a persona. We first show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetics as a synthetic environment, we show that language models can separate true and false statements, and generalize truthfulness across agents; but only if agents in the training data share a truthful generative process that enables the creation of a truthful persona. Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.
    摘要 大型语言模型在来自互联网的海量文本上训练,其中既包含关于世界的真实信息,也包含误导性信息。语言模型能否在这些相互矛盾的数据中分辨真假?在"LLM 可以建模产生语料的不同主体"这一观点的基础上,我们假设模型可以通过建模一个真实人格来聚类真实文本:即一组倾向于产生真实文本并共享相似特征的主体。例如,维基百科和科学文献等可信来源通常使用正式的写作风格并做出一致的断言。通过建模这种人格,LLM 可以将真实性泛化到每个主体生成训练文本的特定语境之外。我们首先通过两个观察为人格假设提供证据:(1)我们可以在模型生成答案之前探测其是否真实;(2)在一组事实上微调模型可以提高其在未见主题上的真实性。接下来,我们以算术作为合成环境,证明语言模型可以区分真假陈述,并将真实性泛化到不同主体;但前提是训练数据中的主体共享一个使真实人格得以形成的真实生成过程。总体而言,我们的发现表明模型可以利用数据中的层次结构来学习诸如真实性这样的抽象概念。

Improving Intrinsic Exploration by Creating Stationary Objectives

  • paper_url: http://arxiv.org/abs/2310.18144
  • repo_url: None
  • paper_authors: Roger Creus Castanyer, Joshua Romoff, Glen Berseth
  • for: 提高 Agent 在探索问题中的性能,特别是在稀缺奖励 Task 和高维 Observation 等难题上。
  • methods: 利用 Count-based 方法 derivate 探索奖励,并通过 Stationary Objectives For Exploration (SOFE) 框架将原始非站点奖励转化为站点奖励,以便更好地优化 Agent 的目标。
  • results: 在多种探索问题中,包括稀缺奖励 Task、像素基 Observation、3D 导航和生成的环境等,SOFE 能够提高 Agent 的性能。
    Abstract Exploration bonuses in reinforcement learning guide long-horizon exploration by defining custom intrinsic objectives. Count-based methods use the frequency of state visits to derive an exploration bonus. In this paper, we identify that any intrinsic reward function derived from count-based methods is non-stationary and hence induces a difficult objective to optimize for the agent. The key contribution of our work lies in transforming the original non-stationary rewards into stationary rewards through an augmented state representation. For this purpose, we introduce the Stationary Objectives For Exploration (SOFE) framework. SOFE requires identifying sufficient statistics for different exploration bonuses and finding an efficient encoding of these statistics to use as input to a deep network. SOFE is based on proposing state augmentations that expand the state space but hold the promise of simplifying the optimization of the agent's objective. Our experiments show that SOFE improves the agents' performance in challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.
    摘要 文本翻译为简化中文:探索奖励在强化学习中引导长期探索,定义自定义内在目标。计数基本方法使用状态访问频率 derive 探索奖励。我们发现,任何基于计数基本方法 derive 的内在奖励都是非站ARY的,因此难以优化代理人的目标。我们的工作关键在于将原始非站ARY奖励转化为站ARY奖励,通过增强状态表示来实现。为此,我们提出了站ARY目标 для探索 (SOFE) 框架。SOFE需要确定不同探索奖励的 suffiSing statistic 和有效地编码这些统计作为深度网络的输入。SOFE基于提出状态扩展,既可以扩大状态空间,又可以简化代理人的目标优化。我们的实验表明,SOFE在复杂探索问题中提高了代理人的表现,包括罕见奖励任务、像素基本观察、3D导航和生成环境。Note: The translation is in Simplified Chinese, which is the standard Chinese writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format instead.

Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18127
  • repo_url: None
  • paper_authors: Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang
  • for: This paper aims to develop a fully integrated end-to-end framework for task-solving in real settings using complicated reasoning.
  • methods: The proposed leader-follower bilevel framework learns to ask relevant questions (prompts) and undertake reasoning to guide the learning of actions to be performed in an environment. The system uses a prompt-generator policy and an action policy to adapt to the CoT process and take decisive, high-performing actions.
  • results: The empirical data shows that the proposed system outperforms leading methods in agent learning benchmarks such as Overcooked and FourRoom.
    Abstract Large language models (LLMs) demonstrate their promise in tackling complicated practical challenges by combining action-based policies with chain of thought (CoT) reasoning. Having high-quality prompts on hand, however, is vital to the framework's effectiveness. Currently, these prompts are handcrafted utilizing extensive human labor, resulting in CoT policies that frequently fail to generalize. Human intervention is also required in order to develop grounding functions that ensure low-level controllers appropriately process CoT reasoning. In this paper, we take the first step towards a fully integrated end-to-end framework for task-solving in real settings employing complicated reasoning. To that purpose, we offer a new leader-follower bilevel framework capable of learning to ask relevant questions (prompts) and subsequently undertaking reasoning to guide the learning of actions to be performed in an environment. A good prompt should make introspective revisions based on historical findings, leading the CoT to consider the anticipated goals. A prompt-generator policy has its own aim in our system, allowing it to adapt to the action policy and automatically root the CoT process towards outputs that lead to decisive, high-performing actions. Meanwhile, the action policy is learning how to use the CoT outputs to take specific actions. Our empirical data reveal that our system outperforms leading methods in agent learning benchmarks such as Overcooked and FourRoom.
    摘要 大型语言模型(LLM)通过将基于动作的策略与思维链(CoT)推理相结合,展示了解决复杂实际挑战的潜力。然而,高质量的提示是该框架有效性的关键组成部分,目前这些提示依赖大量人工精心设计,导致 CoT 策略经常难以泛化;同时还需要人工干预来开发接地函数,以确保低层控制器正确处理 CoT 推理。在本文中,我们朝着一个完全集成的端到端框架迈出了第一步,用于在真实场景中借助复杂推理求解任务。为此,我们提出了一个新的领导者-追随者双层框架,它能够学习提出相关问题(提示),并随后进行推理,以指导在环境中所需执行动作的学习。一个好的提示应基于历史发现进行内省式修正,引导 CoT 朝预期目标思考。在我们的系统中,提示生成策略拥有自己的目标,使其能够适应动作策略,并自动将 CoT 过程导向能产生果断、高效动作的输出;与此同时,动作策略学习如何利用 CoT 的输出来采取具体动作。我们的实验数据显示,我们的系统在 Overcooked 和 FourRoom 等智能体学习基准上优于领先方法。

OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization

  • paper_url: http://arxiv.org/abs/2310.18122
  • repo_url: https://github.com/a-chicharito-s/opinsummeval
  • paper_authors: Yuchen Shen, Xiaojun Wan
  • for: This paper focuses on the evaluation of opinion summarization models, specifically exploring the correlation between automatic metrics and human ratings.
  • methods: The paper uses a dataset called OpinSummEval, which includes human judgments and outputs from 14 opinion summarization models. The authors explore the correlation between 24 automatic metrics and human ratings across four dimensions.
  • results: The authors find that metrics based on neural networks generally outperform non-neural ones, but even the best-performing metrics do not consistently correlate well across all dimensions. This highlights the need for advancements in automated evaluation methods for opinion summarization.
    Abstract Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are publicly available at https://github.com/A-Chicharito-S/OpinSummEval/tree/main.
    摘要
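The core of OpinSummEval's analysis is correlating each automatic metric's scores with human ratings. A minimal, hedged sketch of that step follows; the toy scores and the tie-handling rank helper are illustrative, not the paper's actual evaluation code, which additionally covers 24 metrics across four dimensions.

```python
# Sketch: Spearman correlation between one metric's scores and human ratings.
# All data below is toy data; the paper uses 14 models' outputs and 4 dimensions.

def rank(values):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Toy example: 5 summaries scored by a metric and rated by annotators.
metric_scores = [0.31, 0.55, 0.42, 0.78, 0.12]
human_ratings = [2, 4, 3, 5, 1]
rho = spearman(metric_scores, human_ratings)  # perfectly monotone -> 1.0
```

In practice one would use `scipy.stats.spearmanr`; the hand-rolled version above only makes the rank-based definition explicit.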

Towards a Unified Conversational Recommendation System: Multi-task Learning via Contextualized Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.18119
  • repo_url: None
  • paper_authors: Yeongseo Jung, Eunseo Jung, Lei Chen
  • for: 提高对话式推荐系统(CRS)的个性化推荐和对话能力。
  • methods: 提出了一种基于Contextualized Knowledge Distillation(ConKD)的多任务学习方法,将对话推荐和对话模块结合到一起,以提高推荐性和对话流畅性。
  • results: 实验表明,我们的单个模型可以显著提高推荐性,同时保持对话流畅性,并与多任务学习方法相比,实现了相似的多样性表现。
    Abstract In Conversational Recommendation System (CRS), an agent is asked to recommend a set of items to users within natural language conversations. To address the need for both conversational capability and personalized recommendations, prior works have utilized separate recommendation and dialogue modules. However, such approach inevitably results in a discrepancy between recommendation results and generated responses. To bridge the gap, we propose a multi-task learning for a unified CRS, where a single model jointly learns both tasks via Contextualized Knowledge Distillation (ConKD). We introduce two versions of ConKD: hard gate and soft gate. The former selectively gates between two task-specific teachers, while the latter integrates knowledge from both teachers. Our gates are computed on-the-fly in a context-specific manner, facilitating flexible integration of relevant knowledge. Extensive experiments demonstrate that our single model significantly improves recommendation performance while enhancing fluency, and achieves comparable results in terms of diversity.
    摘要 在对话式推荐系统(CRS)中,智能体需要在自然语言对话中向用户推荐一组物品。为了同时满足对话能力和个性化推荐的需求,先前的工作通常使用分离的推荐模块和对话模块。然而,这种方法不可避免地导致推荐结果与生成回复之间的不一致。为弥合这一差距,我们提出一种面向统一CRS的多任务学习方法,让单个模型通过情境化知识蒸馏(ConKD)同时学习两个任务。我们引入两种ConKD版本:硬门控和软门控。前者在两个任务特定的教师之间进行选择性门控,后者则融合两个教师的知识。门控值根据上下文即时计算,便于灵活整合相关知识。大量实验表明,我们的单一模型显著提升了推荐性能,同时增强了流畅性,并在多样性方面取得了可比的结果。
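The hard/soft gate distinction above can be sketched as follows. This is a hypothetical illustration of combining two teacher distributions into one distillation target, not ConKD's actual implementation (which computes the gates with a learned, context-conditioned model and distills into a student via a KD loss).

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def conkd_target(dialogue_probs, rec_probs, gate, hard=True):
    """gate in [0,1]: weight toward the recommendation teacher.
    Hard gate: pick one teacher per context. Soft gate: mix both."""
    if hard:
        return rec_probs if gate >= 0.5 else dialogue_probs
    return [gate * r + (1 - gate) * d
            for d, r in zip(dialogue_probs, rec_probs)]

# Toy teacher distributions over a 3-token vocabulary.
dlg = softmax([2.0, 0.5, 0.1])   # dialogue teacher prefers token 0
rec = softmax([0.1, 0.3, 2.5])   # recommendation teacher prefers token 2
hard = conkd_target(dlg, rec, gate=0.8, hard=True)   # selects rec teacher
soft = conkd_target(dlg, rec, gate=0.8, hard=False)  # convex mixture
```

The soft gate keeps the target a valid distribution (a convex combination of two distributions), which is what lets a single student absorb knowledge from both teachers at once.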

er.autopilot 1.0: The Full Autonomous Stack for Oval Racing at High Speeds

  • paper_url: http://arxiv.org/abs/2310.18112
  • repo_url: None
  • paper_authors: Ayoub Raji, Danilo Caporale, Francesco Gatti, Andrea Giove, Micaela Verucchi, Davide Malatesta, Nicola Musiu, Alessandro Toschi, Silviu Roberto Popitanu, Fabio Bagni, Massimiliano Bosi, Alexander Liniger, Marko Bertogna, Daniele Morra, Francesco Amerotti, Luca Bartoli, Federico Martello, Riccardo Porta
  • for: 本研究旨在提出一个独立开发的自主车辆软件架构,并在赛车赛道上进行了实验验证。
  • methods: 本研究使用了独立开发的自主车辆软件,包括了适应障碍物、主动超越和速度控制等模组。
  • results: 本研究在首两场赛事中获得了第二和第三名的成绩,并提供了各模组的实验结果和所学。
    Abstract The Indy Autonomous Challenge (IAC) brought together for the first time in history nine autonomous racing teams competing at unprecedented speed and in head-to-head scenario, using independently developed software on open-wheel racecars. This paper presents the complete software architecture used by team TII EuroRacing (TII-ER), covering all the modules needed to avoid static obstacles, perform active overtakes and reach speeds above 75 m/s (270 km/h). In addition to the most common modules related to perception, planning, and control, we discuss the approaches used for vehicle dynamics modelling, simulation, telemetry, and safety. Overall results and the performance of each module are described, as well as the lessons learned during the first two events of the competition on oval tracks, where the team placed respectively second and third.
    摘要 印第安纳波利斯自主挑战赛(IAC)首次在历史上将九支自主赛车队伍集结在一起,以前所未有的速度进行正面对决,在开放式方程式赛车上运行各自独立开发的软件。本文介绍了 TII EuroRacing(TII-ER)车队所使用的完整软件架构,涵盖避让静止障碍物、执行主动超车以及达到75米/秒(270公里/小时)以上速度所需的全部模块。除了感知、规划和控制等常见模块外,我们还讨论了车辆动力学建模、仿真、遥测和安全方面的方法。文章最后给出了整体结果和各模块的性能,以及在椭圆赛道上前两场比赛(车队分别获得第二名和第三名)中所学到的经验。

Detrimental Contexts in Open-Domain Question Answering

  • paper_url: http://arxiv.org/abs/2310.18077
  • repo_url: https://github.com/xfactlab/emnlp2023-damaging-retrieval
  • paper_authors: Philhoon Oh, James Thorne
  • for: 本研究旨在探讨抓取大量信息对知识搜索模型的性能影响,并分析抓取大量信息对问答模型的影响。
  • methods: 本研究使用现有的抓取方法,不需要进一步的训练或数据。研究人员通过对抓取的文章进行筛选,以提高问答模型的性能。
  • results: 研究人员发现,使用抓取大量信息可以提高问答模型的准确率,但是使用整个文章可以导致模型的性能下降。通过筛选抓取的文章,可以提高模型的性能。
    Abstract For knowledge intensive NLP tasks, it has been widely accepted that accessing more information is a contributing factor to improvements in the model's end-to-end performance. However, counter-intuitively, too much context can have a negative impact on the model when evaluated on common question answering (QA) datasets. In this paper, we analyze how passages can have a detrimental effect on retrieve-then-read architectures used in question answering. Our empirical evidence indicates that the current read architecture does not fully leverage the retrieved passages and significantly degrades its performance when using the whole passages compared to utilizing subsets of them. Our findings demonstrate that model accuracy can be improved by 10% on two popular QA datasets by filtering out detrimental passages. Additionally, these outcomes are attained by utilizing existing retrieval methods without further training or data. We further highlight the challenges associated with identifying the detrimental passages. First, even with the correct context, the model can make an incorrect prediction, posing a challenge in determining which passages are most influential. Second, evaluation typically considers lexical matching, which is not robust to variations of correct answers. Despite these limitations, our experimental results underscore the pivotal role of identifying and removing these detrimental passages for the context-efficient retrieve-then-read pipeline. Code and data are available at https://github.com/xfactlab/emnlp2023-damaging-retrieval
    摘要 对于知识密集型NLP任务,人们普遍认为获取更多信息有助于提升模型的端到端性能。然而,与直觉相反,过多的上下文可能会损害模型在常见问答(QA)数据集上的表现。在这篇论文中,我们分析了段落如何对问答中的"检索-阅读"架构产生负面影响。我们的实验证据表明,当前的阅读架构未能充分利用检索到的段落:与只使用其子集相比,使用完整段落会显著降低性能。我们的研究发现,通过过滤掉有害段落,可以在两个流行的QA数据集上将模型准确率提升10%,且无需额外训练或数据,只需利用现有的检索方法即可。我们还强调了识别有害段落的挑战。首先,即使给定正确的上下文,模型仍可能做出错误预测,这使得难以判断哪些段落最具影响力。其次,评估通常基于词面匹配,对正确答案的各种变体并不鲁棒。尽管存在这些局限,我们的实验结果仍凸显了识别并移除有害段落对于上下文高效的"检索-阅读"流水线的关键作用。代码和数据见 https://github.com/xfactlab/emnlp2023-damaging-retrieval。
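The filtering idea above can be sketched as a greedy loop: keep a retrieved passage only if adding it improves the reader's score. This is an illustrative stand-in, not the paper's actual filtering procedure; `toy_score` substitutes for a real reader model.

```python
def filter_detrimental(passages, question, score_fn):
    """Greedily keep only passages whose presence raises the reader's score."""
    kept = []
    best = score_fn(question, kept)
    for p in passages:
        trial = score_fn(question, kept + [p])
        if trial > best:
            kept.append(p)
            best = trial
    return kept

# Toy scorer: counts context passages containing the answer string.
def toy_score(question, ctx):
    return sum("Paris" in p for p in ctx)

passages = [
    "Paris is the capital of France.",
    "Lyon is famous for its cuisine.",
    "The Paris metro opened in 1900.",
]
kept = filter_detrimental(passages, "What is the capital of France?", toy_score)
```

With a real reader, `score_fn` would be its answer confidence or exact-match signal on a held-out probe, which is considerably noisier than this toy counter.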

Knowledge Corpus Error in Question Answering

  • paper_url: http://arxiv.org/abs/2310.18076
  • repo_url: https://github.com/xfactlab/emnlp2023-knowledge-corpus-error
  • paper_authors: Yejoon Lee, Philhoon Oh, James Thorne
  • for: This paper explores the effectiveness of generating context passages from large language models (LLMs) in open-domain question answering (QA), and investigates why generated passages may be more effective than retrieved ones.
  • methods: The paper introduces the concept of knowledge corpus error, which arises when the knowledge corpus used for retrieval is only a subset of the entire string space, and mitigates this shortcoming by generating passages in a larger space using LLMs. The paper also presents an experiment of paraphrasing human-annotated gold context using LLMs to observe knowledge corpus error empirically.
  • results: The results across three QA benchmarks show an increased performance (10% - 13%) when using paraphrased passages, indicating a signal for the existence of knowledge corpus error.
  • for: 这篇论文研究了开放领域问答(QA)中大语言模型(LLMs)生成上下文段的效果,以及为何生成的段更有效。
  • methods: 论文提出了知识库错误概念,即知识库用于搜索的字符串空间仅占整个字符串空间的子集,可能排除了更有帮助的段。论文使用大语言模型生成更大的字符串空间,以避免这种缺陷。
  • results: 结果表明,使用生成的段可以提高表现(10% - 13%),这表明知识库错误的存在。
    Abstract Recent works in open-domain question answering (QA) have explored generating context passages from large language models (LLMs), replacing the traditional retrieval step in the QA pipeline. However, it is not well understood why generated passages can be more effective than retrieved ones. This study revisits the conventional formulation of QA and introduces the concept of knowledge corpus error. This error arises when the knowledge corpus used for retrieval is only a subset of the entire string space, potentially excluding more helpful passages that exist outside the corpus. LLMs may mitigate this shortcoming by generating passages in a larger space. We come up with an experiment of paraphrasing human-annotated gold context using LLMs to observe knowledge corpus error empirically. Our results across three QA benchmarks reveal an increased performance (10% - 13%) when using paraphrased passage, indicating a signal for the existence of knowledge corpus error. Our code is available at https://github.com/xfactlab/emnlp2023-knowledge-corpus-error
    摘要 最近的开放域问答(QA)研究探索了用大语言模型(LLM)生成上下文段落,以取代QA流水线中传统的检索步骤。然而,为什么生成的段落可能比检索到的段落更有效,目前尚不清楚。本研究重新审视了QA的传统形式化表述,并引入"知识库错误"的概念:当用于检索的知识库只是整个字符串空间的一个子集时,更有帮助的段落可能被排除在外。LLM可以在更大的空间中生成段落,从而缓解这一缺陷。我们设计了一个实验,用LLM改写人工标注的金标上下文,以实证观察知识库错误。在三个QA基准上的结果显示,使用改写段落后性能提升了10%-13%,这表明知识库错误确实存在。代码见 https://github.com/xfactlab/emnlp2023-knowledge-corpus-error。
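The probe described above (compare a reader on gold context vs. LLM-paraphrased context) reduces to a simple evaluation loop. The sketch below is hypothetical: `toy_reader` and `toy_paraphrase` stand in for the real reader model and the LLM paraphraser.

```python
def exact_match_rate(reader, examples, use_paraphrase, paraphrase):
    """examples: (question, gold_context, answer) triples."""
    correct = 0
    for q, ctx, answer in examples:
        context = paraphrase(ctx) if use_paraphrase else ctx
        if reader(q, context) == answer:
            correct += 1
    return correct / len(examples)

# Toy stand-ins: the reader answers correctly only if the context
# surfaces the answer string; the "paraphraser" makes it explicit.
def toy_reader(q, ctx):
    return "42" if "42" in ctx else "?"

def toy_paraphrase(ctx):
    return ctx + " (i.e., the answer is 42)"

examples = [("What is x?", "x is unknown.", "42"),
            ("What is x?", "x equals 42.", "42")]
gold = exact_match_rate(toy_reader, examples, False, toy_paraphrase)  # 0.5
para = exact_match_rate(toy_reader, examples, True, toy_paraphrase)   # 1.0
```

A gap between the two rates is the empirical signal the paper calls knowledge corpus error: the paraphrased context lies outside the retrieval corpus yet helps the reader more.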

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

  • paper_url: http://arxiv.org/abs/2310.18075
  • repo_url: None
  • paper_authors: Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui
  • for: 这篇论文的目的是提出一个基于 dual-process theory 的 conversational agent 框架,以提高对问题的回答效率和质量。
  • methods: 这篇论文使用了两个生成型 Large Language Models (LLMs),一个用于快速思考,另一个用于慢思考。快速思考模型负责外部互动和初步回答生成,根据问题的复杂程度进行评估是否需要启动慢思考模型。当启动时,慢思考模型会主导对话,进行细心的规划、推理和工具使用,以提供一个详细分析的回答。
  • results: 实验结果显示,我们的方法可以将效率和质量兼顾,与基准相比有很大的改善。
    Abstract Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the necessity for engaging the slow thinking model based on the complexity of the complete response. When invoked, the slow thinking model takes over the conversation, engaging in meticulous planning, reasoning, and tool utilization to provide a well-analyzed response. This dual-mind configuration allows for a seamless transition between intuitive responses and deliberate problem-solving processes based on the situation. We have constructed a conversational agent to handle online inquiries in the real estate industry. The experiment proves that our method balances effectiveness and efficiency, and has a significant improvement compared to the baseline.
    摘要 受人类认知双过程理论启发,我们提出 DUMA——一个新颖的对话代理框架,通过两个分别负责快思考与慢思考的生成式大语言模型(LLM)实现双心智机制。快思考模型作为外部交互和初步回复生成的主要接口,并根据完整回复所需的复杂程度评估是否需要调用慢思考模型。一旦被调用,慢思考模型将接管对话,进行细致的规划、推理和工具使用,给出经过充分分析的回复。这种双心智配置使系统能够根据情境在直觉式回复与深思熟虑的问题求解之间无缝切换。我们构建了一个处理房地产行业在线咨询的对话代理,实验证明我们的方法兼顾了效果与效率,相比基线有显著提升。
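The fast/slow routing can be sketched as a threshold on an estimated query complexity. Everything here is a hedged stand-in: DUMA's fast model itself judges whether to escalate, whereas this sketch uses a crude word-count proxy purely for illustration.

```python
def estimate_complexity(query):
    """Toy proxy: long, multi-clause questions count as complex."""
    return len(query.split()) + 3 * query.count(",")

def duma_respond(query, fast_model, slow_model, threshold=8):
    """Route to the fast model for simple queries, else escalate."""
    if estimate_complexity(query) <= threshold:
        return "fast", fast_model(query)
    return "slow", slow_model(query)

fast = lambda q: "quick answer"
slow = lambda q: "deliberate, tool-assisted answer"

route1, _ = duma_respond("Price of unit 12B?", fast, slow)
route2, _ = duma_respond(
    "Compare mortgage options for units 12B, 9A, considering rates, taxes, fees",
    fast, slow)
```

The design point is that the expensive planning/tool-use path is only paid for when the cheap path judges itself insufficient.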

Moral Responsibility for AI Systems

  • paper_url: http://arxiv.org/abs/2310.18040
  • repo_url: None
  • paper_authors: Sander Beckers
  • for: 本文提出了一个定义AI系统的道德责任的方法,以便在AI系统做出具有道德意义的决策时能够定义道德责任。
  • methods: 本文使用 causal models 框架定义道德责任的两个条件:行动应该导致结果,并且agent应该意识到可能的道德后果。
  • results: 本文提出了一种度量道德责任的方法,并与现有的BvH和HK方法进行比较。
    Abstract As more and more decisions that have a significant ethical dimension are being outsourced to AI systems, it is important to have a definition of moral responsibility that can be applied to AI systems. Moral responsibility for an outcome of an agent who performs some action is commonly taken to involve both a causal condition and an epistemic condition: the action should cause the outcome, and the agent should have been aware -- in some form or other -- of the possible moral consequences of their action. This paper presents a formal definition of both conditions within the framework of causal models. I compare my approach to the existing approaches of Braham and van Hees (BvH) and of Halpern and Kleiman-Weiner (HK). I then generalize my definition into a degree of responsibility.
    摘要 随着越来越多具有重要伦理维度的决策被交由AI系统完成,有必要给出一个可适用于AI系统的道德责任定义。对于执行某一行为的主体,其对结果的道德责任通常包含一个因果条件和一个认知条件:该行为应当导致该结果,且主体应当(以某种形式)意识到其行为可能产生的道德后果。本文在因果模型框架内给出了这两个条件的形式化定义,并将我们的方法与 Braham 和 van Hees(BvH)以及 Halpern 和 Kleiman-Weiner(HK)的现有方法进行比较。最后,我们将该定义推广为一种责任程度的度量。
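The two conditions can be illustrated in a tiny structural causal model. This is a deliberately simplified sketch (a but-for test plus an awareness flag), not the paper's formal definitions, which are stated over causal models with considerably more care and generalized to degrees of responsibility.

```python
def outcome(action, background):
    """Toy SCM: harm occurs iff the agent acts while the safety guard is off."""
    return action and not background["guard_on"]

def morally_responsible(action, background, aware_of_risk):
    # Causal condition (but-for): the outcome occurs with the action
    # and would not have occurred without it.
    caused = outcome(action, background) and not outcome(False, background)
    # Epistemic condition: the agent was aware of the possible consequence.
    return caused and aware_of_risk

bg = {"guard_on": False}
r1 = morally_responsible(True, bg, aware_of_risk=True)     # both conditions hold
r2 = morally_responsible(True, bg, aware_of_risk=False)    # epistemic fails
r3 = morally_responsible(True, {"guard_on": True}, True)   # causal fails
```

Even this toy shows why both conditions are needed: causation without awareness (r2) and awareness without causation (r3) each fail to ground responsibility.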

Large language models for aspect-based sentiment analysis

  • paper_url: http://arxiv.org/abs/2310.18025
  • repo_url: https://github.com/qagentur/absa_llm
  • paper_authors: Paul F. Simmering, Paavo Huoviala
  • for: The paper is written for assessing the performance of GPT-4 and GPT-3.5 in aspect-based sentiment analysis (ABSA) tasks, and exploring the cost-performance trade-offs of different models.
  • methods: The paper uses zero-shot, few-shot, and fine-tuned settings to evaluate the performance of GPT-4 and GPT-3.5 on the ABSA task, and compares their performance with InstructABSA [@scaria_instructabsa_2023].
  • results: The fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task, improving upon InstructABSA by 5.7%. However, the fine-tuned model has 1000 times more parameters and thus higher inference cost. The paper also finds that detailed prompts improve performance in zero-shot and few-shot settings but are not necessary for fine-tuned models.
    Abstract Large language models (LLMs) offer unprecedented text completion capabilities. As general models, they can fulfill a wide range of roles, including those of more specialized models. We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot and fine-tuned settings on the aspect-based sentiment analysis (ABSA) task. Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task of the SemEval-2014 Task 4, improving upon InstructABSA [@scaria_instructabsa_2023] by 5.7%. However, this comes at the price of 1000 times more model parameters and thus increased inference cost. We discuss the cost-performance trade-offs of different models, and analyze the typical errors that they make. Our results also indicate that detailed prompts improve performance in zero-shot and few-shot settings but are not necessary for fine-tuned models. This evidence is relevant for practitioners that are faced with the choice of prompt engineering versus fine-tuning when using LLMs for ABSA.
    摘要

OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification

  • paper_url: http://arxiv.org/abs/2310.18387
  • repo_url: None
  • paper_authors: Dhiman Goswami, Md Nishat Raihan, Antara Mahmud, Antonios Anstasopoulos, Marcos Zampieri
  • for: 本研究是为了开发一个新的三语混合语料库,用于识别攻击性语言。
  • methods: 本研究使用了多种模型进行实验,包括 transformer 基于模型和 GPT 3.5。
  • results: 研究发现,BanglishBERT 在这个三语混合语料库中表现出色,超过其他 transformer 基于模型的表现。
    Abstract Code-mixing is a well-studied linguistic phenomenon when two or more languages are mixed in text or speech. Several works have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce OffMix-3L, a novel offensive language identification dataset containing code-mixed data from three different languages. We experiment with several models on this dataset and observe that BanglishBERT outperforms other transformer-based models and GPT-3.5.
    摘要 语码混用(code-mixing)是一种研究充分的语言现象,指在文本或语音中混合使用两种或多种语言。已有许多工作构建了语码混用数据集,并在其上开展下游NLP任务。虽然三种或更多语言的混用并不罕见,但该领域现有的大多数数据集仅包含两种语言的混用数据。在这篇论文中,我们介绍 OffMix-3L——一个新颖的攻击性语言识别数据集,包含来自三种不同语言(孟加拉语-英语-印地语)的语码混用数据。我们在该数据集上实验了多个模型,发现 BanglishBERT 的表现优于其他基于 Transformer 的模型以及 GPT-3.5。

FormalGeo: The First Step Toward Human-like IMO-level Geometric Automated Reasoning

  • paper_url: http://arxiv.org/abs/2310.18021
  • repo_url: None
  • paper_authors: Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Qike Huang, Xiaoxiao Jin, Yanjun Guo, Chenyang Mao, Zhe Zhu, Dengfeng Yue, Fangzhen Zhu, Yang Li, Yifan Wang, Yiwen Huang, Runan Wang, Cheng Qin, Zhenbing Zeng, Shaorong Xie, Xiangfeng Luo, Tuo Leng
  • for: 这个论文的目的是构建一个完整的 formally compatible 平面几何系统,以便将AI自动推理与IMO级别的平面几何挑战联系起来。
  • methods: 该论文使用了geometry formalization theory(GFT)指导建立了FormalGeo系统,包括88个几何 predicate 和 196个定理。它可以处理、验证和解决IMO级别的平面几何问题。此外,他们还实现了一个基于Python的Formal Geometry Problem Solver(FGPS),可以作为人工智能辅助验证问题解决过程,以及自动问题解决器,使用了forward search、backward search 和 AI-assisted search 等方法。
  • results: 实现了FormalGeo系统和FGPS实验,证明了GFT的正确性和实用性。使用backward depth-first search方法,解决问题失败率仅2.42%,并可以通过深度学习技术来降低这一值。
    Abstract This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a complete and compatible formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. With this formal system in place, we have been able to seamlessly integrate modern AI models with our formal system. Within this formal framework, AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just like handling other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established FormalGeo, which consists of 88 geometric predicates and 196 theorems. It can represent, validate, and solve IMO-level geometry problems. We have also crafted the FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver, utilizing various methods such as forward search, backward search and AI-assisted search. We have annotated the FormalGeo7k dataset, containing 6,981 (expanded to 186,832 through data augmentation) geometry problems with complete formal language annotations. Implementation of the formal system and experiments on FormalGeo7k validate the correctness and utility of the GFT. The backward depth-first search method yields only a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and the FormalGeo7k dataset are available at https://github.com/BitSecret/FormalGeo.
    摘要 这是我们过去三年一系列工作中的第一篇论文。在这篇论文中,我们构建了一个完整且相容的形式化平面几何系统,它将成为连接IMO级平面几何挑战与可读的AI自动推理之间的关键桥梁。有了这一形式系统,我们得以将现代AI模型与其无缝集成。在该形式框架下,AI现在能够像处理其他自然语言一样,为IMO级平面几何问题提供演绎推理解答,且这些证明可读、可追溯、可验证。我们提出几何形式化理论(GFT)来指导几何形式系统的开发。基于GFT,我们建立了 FormalGeo,它包含88个几何谓词和196条定理,能够表示、验证并求解IMO级几何问题。我们还用 Python 实现了 FGPS(形式几何问题求解器),它既是用于验证解题过程的交互式助手,也是一个自动求解器,支持前向搜索、后向搜索和AI辅助搜索等多种方法。我们标注了 FormalGeo7k 数据集,其中包含6,981道(经数据增广扩充至186,832道)带有完整形式语言标注的几何问题。形式系统的实现以及在 FormalGeo7k 上的实验验证了GFT的正确性与实用性。后向深度优先搜索方法的解题失败率仅为2.42%,并且可以结合深度学习技术进一步降低。FGPS 的源代码和 FormalGeo7k 数据集见 https://github.com/BitSecret/FormalGeo。
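The backward search mentioned above can be sketched as goal-directed depth-first search over a theorem base: from the goal predicate, find a theorem whose conclusion matches and recursively establish its premises. The tiny theorem base below is purely illustrative and is not drawn from FormalGeo's 88 predicates and 196 theorems.

```python
# Hypothetical theorem base: conclusion -> list of alternative premise sets.
THEOREMS = {
    "congruent(ABC,DEF)": [["sas(ABC,DEF)"]],
    "sas(ABC,DEF)": [["eq(AB,DE)", "eq(angle_A,angle_D)", "eq(AC,DF)"]],
}

def backward_dfs(goal, facts, depth=6):
    """True iff `goal` follows from `facts` within `depth` theorem applications."""
    if goal in facts:
        return True
    if depth == 0:
        return False
    for premises in THEOREMS.get(goal, []):
        if all(backward_dfs(p, facts, depth - 1) for p in premises):
            return True
    return False

facts = {"eq(AB,DE)", "eq(angle_A,angle_D)", "eq(AC,DF)"}
proved = backward_dfs("congruent(ABC,DEF)", facts)                  # True
unproved = backward_dfs("congruent(ABC,DEF)", {"eq(AB,DE)"})        # False
```

A real solver additionally needs unification over concrete points and angles, subgoal caching, and heuristics for choosing among alternative premise sets, which is where the paper's AI-assisted search comes in.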

Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy

  • paper_url: http://arxiv.org/abs/2310.17997
  • repo_url: None
  • paper_authors: Hui Sun, Hao Luo, Feifei Wang, Qingjiu Chen, Meng Chen, Xiaoduo Wang, Haibo Yu, Guanglie Zhang, Lianqing Liu, Jianping Wang, Dapeng Wu, Wen Jung Li
  • for: 这个论文的目的是提高扫描电子镜像(SEM)的分辨率和深度场景图像。
  • methods: 这个论文使用深度学习来建立扫描超分解(OSR)图像和SEM领域图像之间的映射关系,从而将OSR图像转换成SEM类型的大深度场景图像。
  • results: 比较PSNR和结构相似度指标值表示,深度学习方法在图像到图像翻译中表现出色,与光学超分解图像相比,PSNR提高约0.74dB。这种方法在检测晶圆缺陷、生物样本分析、审查和其他领域都具有广泛的应用前景。
    Abstract Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.
    摘要 扫描电子显微镜(SEM)能够提供超越光学衍射极限分辨率的大景深图像,因而在从微电子到食品加工等多种应用中不可或缺。然而,该技术需要在绝缘体样品上镀导电薄膜,并需要真空环境。我们使用深度学习建立光学超分辨(OSR)图像与SEM域图像之间的映射关系,从而将OSR图像转换为类SEM的大景深图像。我们自行搭建的扫描超透镜显微系统(SSUM)既不需要给样品镀导电薄膜,也不需要真空环境,可采集特征尺寸小至约80 nm的OSR图像。峰值信噪比(PSNR)和结构相似性指数的数值表明,深度学习方法在图像到图像转换中表现出色,PSNR相比光学超分辨图像提升约0.74 dB。所提方法在重建结果中保留了高水平的细节,表明其在芯片级缺陷检测、生物样品分析、法医取证等诸多领域具有广泛的适用性。
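The PSNR figure quoted above is a standard metric. A minimal sketch of its definition for 8-bit images, operating on flat pixel lists for simplicity:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

a = [10, 20, 30, 40]
b = [12, 18, 33, 39]
score = psnr(a, b)  # MSE = 4.5, so roughly 41.6 dB
```

A 0.74 dB gain, as reported, corresponds to a noticeable reduction in mean squared error, since the scale is logarithmic.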

Autonomous 3D Exploration in Large-Scale Environments with Dynamic Obstacles

  • paper_url: http://arxiv.org/abs/2310.17977
  • repo_url: None
  • paper_authors: Emil Wiman, Ludvig Widén, Mattias Tiger, Fredrik Heintz
  • for: 本研究旨在开探自动系统在动态和不确定的实际环境中的探索能力,以及如何通过包含动态障碍物在计划中来利用动态环境。
  • methods: 提议的 Dynamic Autonomous Exploration Planner (DAEP) extend AEP,以便考虑动态障碍物,并在各种动态环境中进行了全面评估。
  • results: DAEP 在动态和大规模环境中表现出优于当前标准方法,并在探索和碰撞避免方面具有更高的效果。
    Abstract Exploration in dynamic and uncertain real-world environments is an open problem in robotics and constitutes a foundational capability of autonomous systems operating in most of the real world. While 3D exploration planning has been extensively studied, the environments are assumed static or only reactive collision avoidance is carried out. We propose a novel approach to not only avoid dynamic obstacles but also include them in the plan itself, to exploit the dynamic environment in the agent's favor. The proposed planner, Dynamic Autonomous Exploration Planner (DAEP), extends AEP to explicitly plan with respect to dynamic obstacles. To thoroughly evaluate exploration planners in such settings we propose a new enhanced benchmark suite with several dynamic environments, including large-scale outdoor environments. DAEP outperforms state-of-the-art planners in dynamic and large-scale environments, and is shown to be more effective at both exploration and collision avoidance.
    摘要 在动态且不确定的真实环境中进行探索是机器人学中的一个开放问题,也是自主系统在大多数真实场景中运行所需的基础能力。尽管3D探索规划已被广泛研究,但现有工作通常假设环境是静态的,或仅进行被动的碰撞避免。我们提出一种新方法,不仅能躲避动态障碍物,还将其纳入规划本身,从而让智能体反过来利用动态环境。所提出的规划器——动态自主探索规划器(DAEP)——扩展了AEP,使其能显式地针对动态障碍物进行规划。为了在此类场景下全面评估探索规划器,我们提出了一个新的增强基准套件,其中包含多个动态环境,包括大规模室外环境。DAEP在动态和大规模环境中均优于当前最先进的规划器,在探索和碰撞避免两方面都更为有效。

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.17966
  • repo_url: https://github.com/leaplabthu/famo2o
  • paper_authors: Shenzhi Wang, Qisen Yang, Jiawei Gao, Matthieu Gaetan Lin, Hao Chen, Liwei Wu, Ning Jia, Shiji Song, Gao Huang
  • for: 这篇论文是关于Offline-to-online reinforcement learning(RL)训练方法的,旨在解决在线环境中融合预先收集的数据集和精度调整的问题。
  • methods: 该论文提出了一种简单 yet effective的框架,即Family Offline-to-Online RL(FamO2O),可以让现有算法决定不同状态下的改进/约束平衡。FamO2O使用一个通用模型训练一个家族政策,并使用一个平衡模型选择适合每个状态的政策。
  • results: 在许多实验中,FamO2O具有与现有方法相比的 statistically significant 改进,并达到了D4RLbenchmark上的状态空间最佳性能。代码可以在https://github.com/LeapLabTHU/FamO2O中找到。
    Abstract Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark. Codes are available at https://github.com/LeapLabTHU/FamO2O.
    摘要 离线到在线强化学习(offline-to-online RL)是一种先在预收集数据集上预训练、再在在线环境中微调的训练范式。然而,引入在线微调可能加剧众所周知的分布偏移问题。现有方案通过在离线和在线学习中对策略改进目标施加策略约束来应对,并通常在各种数据集上采用单一的"改进-约束"权衡。由于不同状态下的数据质量差异显著,这种一刀切的做法可能无法最优地利用每个采集到的样本。为此,我们提出 Family Offline-to-Online RL(FamO2O),一个简单而有效的框架,使现有算法能够确定随状态自适应的改进-约束权衡。FamO2O用一个通用模型训练一族具有不同改进/约束强度的策略,并用一个权衡模型为每个状态选择合适的策略。理论上,我们证明状态自适应的权衡是获得更高策略性能上界的必要条件。实验上,大量结果表明FamO2O相比多种现有方法带来统计显著的提升,在D4RL基准上达到了最先进的性能。代码见 https://github.com/LeapLabTHU/FamO2O。
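FamO2O's inference-time mechanism (a balance model picks a per-state coefficient, and the family member trained with the nearest coefficient acts) can be sketched as below. All components are toy stand-ins: the real framework trains the family and the balance model with RL, not hand-written rules.

```python
def famo2o_act(state, policy_family, balance_model):
    """policy_family: {coefficient: policy_fn}. Pick the member whose
    improvement/constraint coefficient is nearest the balance model's output."""
    target = balance_model(state)
    coeff = min(policy_family, key=lambda c: abs(c - target))
    return coeff, policy_family[coeff](state)

# Toy family: a conservative (strongly constrained) and an aggressive policy.
family = {0.2: lambda s: "stay", 0.8: lambda s: "move"}

# Toy balance model: trust policy improvement more in well-covered states.
balance = lambda state: 0.9 if state["data_coverage"] > 0.5 else 0.1

c1, a1 = famo2o_act({"data_coverage": 0.8}, family, balance)  # aggressive
c2, a2 = famo2o_act({"data_coverage": 0.2}, family, balance)  # conservative
```

The sketch captures the paper's core claim: a single fixed trade-off cannot serve both states, while a state-conditioned selector can.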

Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

  • paper_url: http://arxiv.org/abs/2310.17956
  • repo_url: https://github.com/williamliujl/qilin-med-vl
  • paper_authors: Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua
  • for: This paper aims to address the lack of large language models in languages other than English and of models that can interpret multi-modal input, which is crucial for global healthcare accessibility.
  • methods: The study introduces Qilin-Med-VL, the first Chinese large vision-language model, which combines a pre-trained Vision Transformer (ViT) with a foundational language model. The model undergoes a two-stage curriculum training process that includes feature alignment and instruction tuning.
  • results: The model is able to generate medical captions and answer complex medical queries, and the authors release ChiMed-VL, a dataset of over 1 million image-text pairs curated to enable detailed and comprehensive interpretation of medical data using various types of images.
    Abstract Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate the analysis of textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM. It undergoes a thorough two-stage curriculum training process that includes feature alignment and instruction tuning. This method enhances the model's ability to generate medical captions and answer complex medical queries. We also release ChiMed-VL, a dataset consisting of more than 1M image-text pairs. This dataset has been carefully curated to enable detailed and comprehensive interpretation of medical data using various types of images.
    摘要 大型语言模型(LLM)开启了理解复杂医疗与生物医学话题的新时代。然而,目前仍然缺乏英语以外语言的模型,以及能够解读多模态输入的模型,而这对全球医疗可及性至关重要。为此,本研究提出 Qilin-Med-VL——首个整合文本与视觉数据分析的中文大型视觉-语言模型。Qilin-Med-VL 将预训练的视觉 Transformer(ViT)与基础LLM相结合,并经过特征对齐与指令微调两阶段课程式训练。该方法增强了模型生成医学描述和回答复杂医学问题的能力。我们还发布了 ChiMed-VL 数据集,包含超过100万个图文对。该数据集经过精心整理,可支持利用多种类型的图像对医学数据进行细致而全面的解读。

Understanding Parameter Saliency via Extreme Value Theory

  • paper_url: http://arxiv.org/abs/2310.17951
  • repo_url: None
  • paper_authors: Shuo Wang, Issei Sato
  • for: This paper aims to identify and correct misclassifications in deep neural networks, specifically convolutional neural networks (CNNs), by ranking convolution filters based on their potential to cause misidentification.
  • methods: The paper uses parameter saliency ranking, analyzed through the lens of extreme value theory, to identify the filters that are most likely to cause misclassification. The authors also use fine-tuning of the top-ranked filters to correct misidentification.
  • results: The paper shows that the proposed method can detect malicious filters and is less biased against the depth of layers in deep neural networks compared to existing methods. The authors also demonstrate the effectiveness of their approach on ImageNet.
    Abstract Deep neural networks are being increasingly implemented throughout society in recent years. It is useful to identify which parameters trigger misclassification in diagnosing undesirable model behaviors. The concept of parameter saliency is proposed and used to diagnose convolutional neural networks (CNNs) by ranking convolution filters that may have caused misclassification on the basis of parameter saliency. It is also shown that fine-tuning the top ranking salient filters has efficiently corrected misidentification on ImageNet. However, there is still a knowledge gap in terms of understanding why parameter saliency ranking can find the filters inducing misidentification. In this work, we attempt to bridge the gap by analyzing parameter saliency ranking from a statistical viewpoint, namely, extreme value theory. We first show that the existing work implicitly assumes that the gradient norm computed for each filter follows a normal distribution. Then, we clarify the relationship between parameter saliency and the score based on the peaks-over-threshold (POT) method, which is often used to model extreme values. Finally, we reformulate parameter saliency in terms of the POT method, where this reformulation is regarded as statistical anomaly detection and does not require the implicit assumptions of the existing parameter-saliency formulation. Our experimental results demonstrate that our reformulation can detect malicious filters as well. Furthermore, we show that the existing parameter saliency method exhibits a bias against the depth of layers in deep neural networks. In particular, this bias has the potential to inhibit the discovery of filters that cause misidentification in situations where domain shift occurs. In contrast, parameter saliency based on POT shows less of this bias.
    摘要 近年来,深度神经网络在社会各领域日益普及。为了诊断模型的不良行为,识别哪些参数引发误分类十分有用。先前的工作提出了参数显著性(parameter saliency)的概念,通过按参数显著性对可能导致误分类的卷积滤波器排序来诊断卷积神经网络(CNN),并证明对排名靠前的显著滤波器进行微调能够高效修正 ImageNet 上的误识别。然而,对于参数显著性排序为何能找到诱发误识别的滤波器,仍存在认识上的空白。在本工作中,我们尝试从统计学的角度——即极值理论——分析参数显著性排序,以填补这一空白。我们首先指出,现有工作隐含地假设每个滤波器的梯度范数服从正态分布;随后,我们阐明了参数显著性与基于峰值超阈值(POT)方法(常用于对极值建模)的得分之间的关系;最后,我们用POT方法重新表述参数显著性,该表述可视为统计异常检测,且不需要现有表述中的隐含假设。实验结果表明,我们的新表述同样能够检测出恶意滤波器。此外,我们发现现有的参数显著性方法对深度神经网络中层的深度存在偏差:在发生领域偏移的情况下,这种偏差可能妨碍发现导致误识别的滤波器;相比之下,基于POT的参数显著性方法受该偏差的影响更小。
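The peaks-over-threshold idea can be sketched as: set a high empirical quantile of the per-filter gradient norms as the threshold, and score each filter by its exceedance. This is a simplified stand-in for the paper's formulation, which fits a generalized Pareto model to the exceedances rather than using them raw.

```python
def pot_salient_filters(grad_norms, quantile=0.8):
    """Return {filter_index: exceedance} for filters whose gradient norm
    exceeds the empirical `quantile` threshold (candidate saliency scores)."""
    ordered = sorted(grad_norms)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    threshold = ordered[idx]
    return {i: g - threshold
            for i, g in enumerate(grad_norms) if g > threshold}

# Toy per-filter gradient norms: filter 9 is an extreme outlier.
norms = [0.1, 0.2, 0.15, 0.12, 0.11, 0.18, 0.13, 0.14, 0.16, 5.0]
salient = pot_salient_filters(norms)
```

Because the threshold is an empirical quantile rather than a mean-plus-sigma rule, no normality assumption on the gradient norms is needed, which is the point of the POT reformulation.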

A Comprehensive and Reliable Feature Attribution Method: Double-sided Remove and Reconstruct (DoRaR)

  • paper_url: http://arxiv.org/abs/2310.17945
  • repo_url: https://github.com/dxq21/dorar
  • paper_authors: Dong Qin, George Amariucai, Daji Qiao, Yong Guan, Shen Fu
  • for: This work addresses the limited transparency of the inner decision-making mechanisms of deep neural networks and other machine learning models, to improve the applicability of these black-box models across domains.
  • methods: The work proposes the Double-sided Remove and Reconstruct (DoRaR) feature attribution method, based on several improvements that mitigate the artifacts problem and the Encoding Prediction in the Explanation (EPITE) problem, and that help train a better-performing feature selector.
  • results: Thorough testing on MNIST, CIFAR10, and the authors' own synthetic dataset shows that DoRaR effectively explains model decisions and outperforms other state-of-the-art feature attribution methods.
    Abstract The limited transparency of the inner decision-making mechanism in deep neural networks (DNN) and other machine learning (ML) models has hindered their application in several domains. In order to tackle this issue, feature attribution methods have been developed to identify the crucial features that heavily influence decisions made by these black box models. However, many feature attribution methods have inherent downsides. For example, one category of feature attribution methods suffers from the artifacts problem, which feeds out-of-distribution masked inputs directly through the classifier that was originally trained on natural data points. Another category of feature attribution method finds explanations by using jointly trained feature selectors and predictors. While avoiding the artifacts problem, this new category suffers from the Encoding Prediction in the Explanation (EPITE) problem, in which the predictor's decisions rely not on the features, but on the masks that selects those features. As a result, the credibility of attribution results is undermined by these downsides. In this research, we introduce the Double-sided Remove and Reconstruct (DoRaR) feature attribution method based on several improvement methods that addresses these issues. By conducting thorough testing on MNIST, CIFAR10 and our own synthetic dataset, we demonstrate that the DoRaR feature attribution method can effectively bypass the above issues and can aid in training a feature selector that outperforms other state-of-the-art feature attribution methods. Our code is available at https://github.com/dxq21/DoRaR.

Unified Segment-to-Segment Framework for Simultaneous Sequence Generation

  • paper_url: http://arxiv.org/abs/2310.17940
  • repo_url: None
  • paper_authors: Shaolei Zhang, Yang Feng
  • for: simultaneous sequence generation for real-time scenarios, such as streaming speech recognition, simultaneous machine translation, and simultaneous speech translation
  • methods: segment-to-segment framework (Seg2Seg) with adaptive and unified learning for mapping between source and target sequences
  • results: state-of-the-art performance and better generality across various tasks, as demonstrated by experiments on multiple simultaneous generation tasks
    Abstract Simultaneous sequence generation is a pivotal task for real-time scenarios, such as streaming speech recognition, simultaneous machine translation and simultaneous speech translation, where the target sequence is generated while receiving the source sequence. The crux of achieving high-quality generation with low latency lies in identifying the optimal moments for generating, accomplished by learning a mapping between the source and target sequences. However, existing methods often rely on task-specific heuristics for different sequence types, limiting the model's capacity to adaptively learn the source-target mapping and hindering the exploration of multi-task learning for various simultaneous tasks. In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. During the process of simultaneous generation, the model alternates between waiting for a source segment and generating a target segment, making the segment serve as the natural bridge between the source and target. To accomplish this, Seg2Seg introduces a latent segment as the pivot between source to target and explores all potential source-target mappings via the proposed expectation training, thereby learning the optimal moments for generating. Experiments on multiple simultaneous generation tasks demonstrate that Seg2Seg achieves state-of-the-art performance and exhibits better generality across various tasks.
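The alternation Seg2Seg describes, waiting for a source segment and then generating a target segment, can be simulated with a rule-based boundary detector standing in for the learned latent segmentation. Everything here is illustrative: the paper learns the boundaries via expectation training, whereas this sketch takes the detector and segment translator as given callables.

```python
def seg2seg_simulate(source_stream, is_boundary, translate_segment):
    """Alternate between reading source tokens and generating a target segment
    whenever the boundary detector fires; the segment bridges source and target."""
    buffer, outputs = [], []
    for token in source_stream:
        buffer.append(token)          # waiting phase: accumulate a source segment
        if is_boundary(buffer):
            outputs.append(translate_segment(buffer))  # generating phase
            buffer = []
    if buffer:                        # flush the final partial segment
        outputs.append(translate_segment(buffer))
    return outputs
```

With a fixed-length boundary this degenerates to a wait-k-style policy; the point of the paper is that the boundary is learned rather than fixed.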

Transformers as Graph-to-Graph Models

  • paper_url: http://arxiv.org/abs/2310.17936
  • repo_url: https://github.com/idiap/g2g-transformer
  • paper_authors: James Henderson, Alireza Mohammadshahi, Andrei C. Coman, Lesly Miculicich
  • for: This paper aims to make Transformers explicit graph-to-graph models, treating sequences as a special case.
  • methods: Graph edges are input into the attention weight computations (attention weights are functionally equivalent to graph edges), and iterative graph refinement enables non-autoregressive graph prediction.
  • results: Experiments show the architecture achieves state-of-the-art accuracy for modelling a variety of linguistic structures and integrates very effectively with the latent linguistic representations learned during pretraining.
    Abstract We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
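A single-head sketch of how explicit graph edges can enter the attention computation: each input edge label contributes a scalar bias to the attention logits. The bias table is a fixed dict here purely for illustration; in the model it would be learned, and edges are also predicted with attention-like functions.

```python
import math

def attention_with_edges(Q, K, V, edge_labels, edge_bias):
    """Single-head attention where input-graph edges bias the logits.
    Q, K, V: lists of d-dim vectors; edge_labels[i][j]: label of edge i->j;
    edge_bias: scalar per label (fixed dict here, learned in the model)."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        logits = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
                  + edge_bias[edge_labels[i][j]]
                  for j, k in enumerate(K)]
        m = max(logits)                          # stable softmax
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out
```

A strong edge bias dominates the content-based dot product, so the explicit graph steers the latent attention graph.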

Matching of Descriptive Labels to Glossary Descriptions

  • paper_url: http://arxiv.org/abs/2310.18385
  • repo_url: None
  • paper_authors: Toshihiro Takahashi, Takaaki Tateishi, Michiaki Tatsubori
  • for: This paper addresses the descriptive-label matching problem in software engineering, where engineers need to match descriptive labels (e.g., business terms, table column names) to related glossary descriptions.
  • methods: The paper proposes leveraging an existing semantic text similarity (STS) measurement, augmented with semantic label enrichment, a method that retrieves sentences relevant to a given label, and set-based collective contextualization, a method that computes the similarity between two contexts, each derived from a set of texts (e.g., the column names of one table).
  • results: Experimental results show the proposed methods help the underlying STS correctly match more descriptive labels with their descriptions.
    Abstract Semantic text similarity plays an important role in software engineering tasks in which engineers are requested to clarify the semantics of descriptive labels (e.g., business terms, table column names) that often consist of too short or too generic words and appear in their IT systems. We formulate this type of problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework to leverage an existing semantic text similarity measurement (STS) and augment it using semantic label enrichment and set-based collective contextualization, where the former is a method to retrieve sentences relevant to a given label and the latter is a method to compute similarity between two contexts, each of which is derived from a set of texts (e.g., column names in the same table). We performed an experiment on two datasets derived from publicly available data sources. The results indicated that the proposed methods helped the underlying STS correctly match more descriptive labels with the descriptions.
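The set-based collective contextualization idea can be sketched with plain cosine similarity over embeddings: each context (a set of labels such as one table's column names) is represented by its mean vector and compared to the other context. The embeddings are assumed to come from any STS encoder, and the blending weight `alpha` is illustrative, not from the paper.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def context_similarity(vecs_a, vecs_b):
    """Set-based collective contextualization (sketch): represent each set of
    labels (e.g. the column names of one table) by its mean embedding."""
    def mean_vec(vecs):
        return [sum(v[c] for v in vecs) / len(vecs) for c in range(len(vecs[0]))]
    return cosine(mean_vec(vecs_a), mean_vec(vecs_b))

def match_score(label_vec, desc_vec, ctx_a, ctx_b, alpha=0.7):
    """Blend the direct label-description STS score with a context-level
    similarity; alpha is an assumed mixing weight."""
    return alpha * cosine(label_vec, desc_vec) + (1 - alpha) * context_similarity(ctx_a, ctx_b)
```

A label whose own embedding is ambiguous can still be matched correctly when its surrounding context (sibling column names) resembles the glossary's context.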

Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method

  • paper_url: http://arxiv.org/abs/2310.17918
  • repo_url: None
  • paper_authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin
  • for: Detecting the nonfactual answers generated by LLMs, to improve the reliability of LLMs.
  • methods: Proposes a self-detection method that diversifies the textual expressions of a question and queries the model itself, examining divergence among the generated answers to detect whether the LLM is prone to produce nonfactual results.
  • results: Experiments show the method effectively detects nonfactual answers and applies to recently released LLMs such as Vicuna, ChatGPT, and GPT-4.
    Abstract Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs generate nonfactual responses intermittently, which impedes the LLMs' reliability for further utilization. In this paper, we propose a novel self-detection method to detect which questions that a LLM does not know that are prone to generate nonfactual results. Specifically, we first diversify the textual expressions for a given question and collect the corresponding answers. Then we examine the divergencies between the generated answers to identify the questions that the model may generate falsehoods. All of the above steps can be accomplished by prompting the LLMs themselves without referring to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4.
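The divergence check at the core of the method can be approximated with a purely lexical measure: if answers to paraphrased versions of the same question disagree heavily, the question is flagged as one the model likely does not know. The paper prompts the LLM itself to paraphrase and to judge divergence; the Jaccard overlap and threshold below are simplifying stand-ins.

```python
def jaccard(a, b):
    """Lexical overlap between two answers (stand-in for an LLM-based comparison)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def likely_unknown(answers, threshold=0.5):
    """Flag a question as unknown to the model when answers to its paraphrases
    diverge, i.e. the mean pairwise overlap falls below an assumed threshold."""
    pairs = [(i, j) for i in range(len(answers)) for j in range(i + 1, len(answers))]
    mean_sim = sum(jaccard(answers[i], answers[j]) for i, j in pairs) / len(pairs)
    return mean_sim < threshold
```

Consistent answers across paraphrases suggest the model knows the fact; scattered, mutually contradictory answers suggest it is guessing.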

The Innovation-to-Occupations Ontology: Linking Business Transformation Initiatives to Occupations and Skills

  • paper_url: http://arxiv.org/abs/2310.17909
  • repo_url: None
  • paper_authors: Daniela Elia, Fang Chen, Didar Zowghi, Marian-Andrei Rizoiu
  • for: This paper presents a novel ontology linking business transformation initiatives to occupations, together with an approach to populate it automatically.
  • methods: The ontology is populated automatically using embeddings extracted from online job ads and from Wikipedia pages on business transformation and emerging-technology topics.
  • results: The approach successfully matches occupations to a range of business transformation initiatives, providing an innovative way to guide enterprises and educational institutions on the workforce required for specific initiatives.
    Abstract The fast adoption of new technologies forces companies to continuously adapt their operations making it harder to predict workforce requirements. Several recent studies have attempted to predict the emergence of new roles and skills in the labour market from online job ads. This paper aims to present a novel ontology linking business transformation initiatives to occupations and an approach to automatically populating it by leveraging embeddings extracted from job ads and Wikipedia pages on business transformation and emerging technologies topics. To our knowledge, no previous research explicitly links business transformation initiatives, like the adoption of new technologies or the entry into new markets, to the roles needed. Our approach successfully matches occupations to transformation initiatives under ten different scenarios, five linked to technology adoption and five related to business. This framework presents an innovative approach to guide enterprises and educational institutions on the workforce requirements for specific business transformation initiatives.

Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

  • paper_url: http://arxiv.org/abs/2310.17903
  • repo_url: https://github.com/yueyuel/reliablelm4code
  • paper_authors: Xinyu She, Yue Liu, Yanjie Zhao, Yiling He, Li Li, Chakkrit Tantithamthavorn, Zhan Qin, Haoyu Wang
  • for: This work investigates the pitfalls in language models for code intelligence (LM4Code) in order to improve their reliability and applicability.
  • methods: Following a systematic research approach combining a literature review with taxonomy construction, the authors carefully examine 67 primary studies from top-tier venues.
  • results: The study identifies four major categories of pitfalls in LM4Code research, spanning data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance; these pitfalls can undermine the reliability and applicability of LM4Code systems.
    Abstract Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding - not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues have been identified. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways.

Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey

  • paper_url: http://arxiv.org/abs/2310.17894
  • repo_url: None
  • paper_authors: Weixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin Yang
  • for: This paper provides a comprehensive overview of natural language interfaces for tabular data querying and visualization, which let users interact with data through natural language queries.
  • methods: The survey introduces the fundamental concepts and techniques underlying these interfaces, with particular emphasis on semantic parsing, the key technology for translating natural language into SQL queries or data visualization commands.
  • results: It reports recent advances on the text-to-SQL and text-to-visualization problems in terms of datasets, methodologies, metrics, and system designs, highlighting the influence of large language models (LLMs) and their potential for future development.
    Abstract The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

  • paper_url: http://arxiv.org/abs/2310.17884
  • repo_url: https://github.com/skywalker023/confAIde
  • paper_authors: Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, Yejin Choi
  • for: This paper highlights the privacy risks associated with the use of large language models (LLMs) in AI assistants, specifically the inference-time privacy risks that arise when LLMs are fed different types of information from multiple sources and are expected to reason about what to share in their outputs.
  • methods: The paper proposes ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. The authors use this benchmark to evaluate the privacy reasoning abilities of two state-of-the-art LLMs, GPT-4 and ChatGPT.
  • results: The authors find that even the most capable models, GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when the authors employ privacy-inducing prompts or chain-of-thought reasoning. The paper highlights the immediate need to explore novel inference-time privacy-preserving approaches based on reasoning and theory of mind.
    Abstract The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.

ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for Consistent Data-to-Text Generation

  • paper_url: http://arxiv.org/abs/2310.17877
  • repo_url: https://github.com/vejvarm/aspiro
  • paper_authors: Martin Vejvar, Yasutaka Fujimoto
  • for: Verbalising structured data into short template sentences in zero- to few-shot settings.
  • methods: Prompts large language models (LLMs) to directly generate entity-agnostic templates, rather than relying on LLMs to copy example entities or on manually validating/crafting templates; LLM re-prompting triggered by algorithmic parsing checks resolves generation problems.
  • results: Compared to direct LLM output, ASPIRO reduces the parsing error rate of generated verbalisations on the DART dataset by 66% on average. On the Rel2Text dataset, the best 5-shot setup (text-davinci-003) scores BLEU 50.62, METEOR 45.16, BLEURT 0.82, NUBIA 0.87, and PARENT 0.8962, competitive with recent fine-tuned pre-trained language models.
    Abstract We present ASPIRO, an approach for structured data verbalisation into short template sentences in zero to few-shot settings. Unlike previous methods, our approach prompts large language models (LLMs) to directly produce entity-agnostic templates, rather than relying on LLMs to faithfully copy the given example entities, or validating/crafting the templates manually. We incorporate LLM re-prompting, triggered by algorithmic parsing checks, as well as the PARENT metric induced consistency validation to identify and rectify template generation problems in real-time. ASPIRO, compared to direct LLM output, averages 66% parsing error rate reduction in generated verbalisations of RDF triples on the DART dataset. Our best 5-shot text-davinci-003 setup, scoring BLEU of 50.62, METEOR of 45.16, BLEURT of 0.82, NUBIA of 0.87, and PARENT of 0.8962 on the Rel2Text dataset, competes effectively with recent fine-tuned pre-trained language models.
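The re-prompting loop triggered by a parsing check can be sketched as below. The `<subject>`/`<object>` placeholder scheme, the prompt wording, and the `llm` callable are all assumptions standing in for ASPIRO's actual interfaces; the real system also applies PARENT-based consistency validation.

```python
import re

def parse_check(template):
    """Structured parsing check (sketch): a valid entity-agnostic template
    must mention <subject> and <object> exactly once each."""
    return (len(re.findall(r"<subject>", template)) == 1
            and len(re.findall(r"<object>", template)) == 1)

def generate_template(relation, llm, max_retries=3):
    """Re-prompt the model until the template passes the parsing check.
    `llm` is an assumed callable prompt -> text, standing in for a real API."""
    prompt = (f"Write a short template sentence for relation '{relation}' "
              "using <subject> and <object>.")
    for _ in range(max_retries + 1):
        candidate = llm(prompt)
        if parse_check(candidate):
            return candidate
        # Feed the failure back into the prompt and try again.
        prompt += f"\nPrevious attempt was invalid: {candidate!r}. Use each placeholder exactly once."
    return None
```

The loop turns a purely generative step into a generate-check-repair cycle, which is where the reported parsing-error reduction comes from.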

Ranking with Slot Constraints

  • paper_url: http://arxiv.org/abs/2310.17870
  • repo_url: https://github.com/GarlGuo/ranking_with_slot_constraints
  • paper_authors: Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims
  • for: ranking with slot constraints, which can be applied to various real-world problems such as college admission and medical trial participant selection.
  • methods: the proposed algorithm called MatchRank, which aims to maximize the number of filled slots by evaluating candidates in the order of the ranking.
  • results: MatchRank has a strong approximation guarantee and can provide substantial improvements over a range of synthetic and real-world tasks.
    Abstract We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial. We show that the conventional Probability Ranking Principle (PRP) can be highly sub-optimal for slot-constrained ranking problems, and we devise a new ranking algorithm, called MatchRank. The goal of MatchRank is to produce rankings that maximize the number of filled slots if candidates are evaluated by a human decision maker in the order of the ranking. In this way, MatchRank generalizes the PRP, and it subsumes the PRP as a special case when there are no slot constraints. Our theoretical analysis shows that MatchRank has a strong approximation guarantee without any independence assumptions between slots or candidates. Furthermore, we show how MatchRank can be implemented efficiently. Beyond the theoretical guarantees, empirical evaluations show that MatchRank can provide substantial improvements over a range of synthetic and real-world tasks.
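The objective, ranking candidates so that a decision maker evaluating them in order fills as many slots as possible, suggests a greedy sketch built on maximum bipartite matching: repeatedly append the candidate whose addition most increases the number of fillable slots. This illustrates the objective only; it is not the paper's actual MatchRank algorithm or its approximation analysis.

```python
def max_matching(candidates, slots_of):
    """Size of a maximum bipartite matching between candidates and the slots
    they qualify for (Kuhn's augmenting-path algorithm)."""
    match = {}  # slot -> candidate currently holding it
    def try_assign(c, seen):
        for s in slots_of[c]:
            if s in seen:
                continue
            seen.add(s)
            if s not in match or try_assign(match[s], seen):
                match[s] = c
                return True
        return False
    size = 0
    for c in candidates:
        if try_assign(c, set()):
            size += 1
    return size

def matchrank(all_candidates, slots_of):
    """Greedy slot-filling ranking (sketch): append the candidate that most
    increases the number of filled slots among those ranked so far."""
    ranking, remaining = [], list(all_candidates)
    while remaining:
        base = max_matching(ranking, slots_of)
        best = max(remaining, key=lambda c: max_matching(ranking + [c], slots_of) - base)
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

Note how this departs from the Probability Ranking Principle: a candidate for an already-covered slot is deferred in favor of one who fills a new slot.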

Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests

  • paper_url: http://arxiv.org/abs/2310.17867
  • repo_url: None
  • paper_authors: Edward Raff, James Holt
  • for: Multiple Instance Learning (MIL) is a special classification problem in which each input is a "bag" of instances with a single label: the label is positive if and only if at least one positive instance is contained in the bag, and negative otherwise. Training requires associating the bag-level label with instance-level information, and the task implicitly carries a causal assumption and asymmetry. MIL problems arise widely, e.g., in healthcare and cyber security.
  • methods: The paper examines five of the most prominent deep-MIL models and finds that none of them respects the standard MIL assumption: they can learn anti-correlated instances, defaulting to a positive label until a negative counter-example is seen, which a correct MIL model should never do. The authors suspect that enhancements and other works derived from these models share the same issue.
  • results: The problem is demonstrated via proposed "algorithmic unit tests": synthetic datasets that a MIL-respecting model can solve and that clearly reveal learning which violates the MIL assumptions. Each of the five evaluated models fails one or more of these tests, providing a model-agnostic way to identify violations of modeling assumptions that should be useful for future development and evaluation of MIL models.
    Abstract Multiple Instance Learning (MIL) is a sub-domain of classification problems with positive and negative labels and a "bag" of inputs, where the label is positive if and only if a positive element is contained within the bag, and otherwise is negative. Training in this context requires associating the bag-wide label to instance-level information, and implicitly contains a causal assumption and asymmetry to the task (i.e., you can't swap the labels without changing the semantics). MIL problems occur in healthcare (one malignant cell indicates cancer), cyber security (one malicious executable makes an infected computer), and many other tasks. In this work, we examine five of the most prominent deep-MIL models and find that none of them respects the standard MIL assumption. They are able to learn anti-correlated instances, i.e., defaulting to "positive" labels until seeing a negative counter-example, which should not be possible for a correct MIL model. We suspect that enhancements and other works derived from these models will share the same issue. In any context in which these models are being used, this creates the potential for learning incorrect models, which creates risk of operational failure. We identify and demonstrate this problem via a proposed "algorithmic unit test", where we create synthetic datasets that can be solved by a MIL respecting model, and which clearly reveal learning that violates MIL assumptions. The five evaluated methods each fail one or more of these tests. This provides a model-agnostic way to identify violations of modeling assumptions, which we hope will be useful for future development and evaluation of MIL models.
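The flavor of an algorithmic unit test can be shown on a toy scale: generate bags whose label is defined by the presence of positive evidence, then check whether a predictor agrees everywhere. The scalar instances and the 0.9/0.1 thresholds are illustrative; the paper's tests target deep MIL models, not hand-written rules.

```python
import random

def make_bags(n=200, bag_size=5, seed=0):
    """Synthetic MIL data: instances are floats; an instance > 0.9 counts as
    positive evidence, and a bag is positive iff it contains one."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n):
        bag = [rng.random() for _ in range(bag_size)]
        bags.append((bag, int(any(x > 0.9 for x in bag))))
    return bags

def mil_unit_test(predict_bag):
    """Algorithmic unit test (sketch): a MIL-respecting model must label every
    bag by the presence of positive evidence, never by the absence of a cue."""
    return all(predict_bag(bag) == label for bag, label in make_bags())

# A correct MIL predictor pools instance evidence with max().
correct_model = lambda bag: int(max(bag) > 0.9)
# An anti-correlated predictor: defaults to positive unless a "negative cue" appears,
# exactly the failure mode the paper describes.
broken_model = lambda bag: int(not any(x < 0.1 for x in bag))
```

The anti-correlated predictor can score well on accidental correlations yet fails this test, which is the model-agnostic check the paper advocates.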

Function Space Bayesian Pseudocoreset for Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2310.17852
  • repo_url: None
  • paper_authors: Balhae Kim, Hyungi Lee, Juho Lee
  • for: This paper constructs Bayesian pseudocoresets for scalable Bayesian inference.
  • methods: The method builds variational approximations to the coreset posterior on a function space and matches them to the full-data posterior in that space, rather than in the space of model parameters.
  • results: The resulting pseudocoresets provide better uncertainty quantification and robustness across a variety of model architectures.
    Abstract A Bayesian pseudocoreset is a compact synthetic dataset summarizing essential information of a large-scale dataset and thus can be used as a proxy dataset for scalable Bayesian inference. Typically, a Bayesian pseudocoreset is constructed by minimizing a divergence measure between the posterior conditioning on the pseudocoreset and the posterior conditioning on the full dataset. However, evaluating the divergence can be challenging, particularly for the models like deep neural networks having high-dimensional parameters. In this paper, we propose a novel Bayesian pseudocoreset construction method that operates on a function space. Unlike previous methods, which construct and match the coreset and full data posteriors in the space of model parameters (weights), our method constructs variational approximations to the coreset posterior on a function space and matches it to the full data posterior in the function space. By working directly on the function space, our method could bypass several challenges that may arise when working on a weight space, including limited scalability and multi-modality issue. Through various experiments, we demonstrate that the Bayesian pseudocoresets constructed from our method enjoys enhanced uncertainty quantification and better robustness across various model architectures.

Real-time Animation Generation and Control on Rigged Models via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.17838
  • repo_url: https://github.com/Whalefishin/LLM_animation
  • paper_authors: Han Huang, Fernanda De La Torre, Cathy Mengying Fang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier
  • for: This paper introduces a novel method for real-time animation control and generation on rigged models using natural language input.
  • methods: A large language model (LLM) is embedded in Unity to output structured texts that can be parsed into diverse and realistic animations.
  • results: The paper demonstrates the LLM's potential to enable flexible state transitions between existing animations, and validates the robustness of the approach through qualitative results on various rigged models and motions.
    Abstract We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate LLM's potential to enable flexible state transition between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
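One way the LLM's structured text could be consumed in an engine loop is sketched below. The JSON schema, field names, and fallback behaviour are assumptions; the paper's Unity-side format may differ.

```python
import json

def parse_animation_command(llm_output):
    """Parse the structured text emitted by the LLM into (state, speed)."""
    cmd = json.loads(llm_output)
    return cmd["animation"], float(cmd.get("speed", 1.0))

def transition(current_state, llm_output, allowed_states):
    """Apply a state transition on the rigged model, keeping the current
    animation when the output is malformed or names an unsupported state."""
    try:
        state, speed = parse_animation_command(llm_output)
    except (ValueError, KeyError):
        return current_state, 1.0   # malformed output: keep current animation
    if state not in allowed_states:
        return current_state, 1.0
    return state, speed
```

Constraining the LLM to emit structured text is what makes the natural-language interface robust: the parser, not the model, decides what the rig actually does.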

One Style is All you Need to Generate a Video

  • paper_url: http://arxiv.org/abs/2310.17835
  • repo_url: https://github.com/sandman002/One-Style-is-All-You-Need-to-Generate-a-Video
  • paper_authors: Sandeep Manandhar, Auguste Genovesio
  • for: This paper proposes a style-based conditional video generative model with a novel temporal generator based on a set of learned sinusoidal bases.
  • methods: The temporal generator learns dynamic representations of actions that are independent of image content and can be transferred between different actors.
  • results: Beyond a significant improvement in video quality over prevalent methods, the disentangled dynamics and content permit independent manipulation, as well as temporal GAN-inversion to retrieve and transfer a video motion from one content or identity to another.
    Abstract In this paper, we propose a style-based conditional video generative model. We introduce a novel temporal generator based on a set of learned sinusoidal bases. Our method learns dynamic representations of various actions that are independent of image content and can be transferred between different actors. Beyond the significant enhancement of video quality compared to prevalent methods, we demonstrate that the disentangled dynamic and content permit their independent manipulation, as well as temporal GAN-inversion to retrieve and transfer a video motion from one content or identity to another without further preprocessing such as landmark points.
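
The sinusoidal-basis temporal generator can be sketched as follows: each frame's motion code is a weighted sum of K sinusoids whose frequencies, amplitudes, and phases would be learned jointly with the generator. The shapes and the random placeholder parameters below are assumptions for illustration, not the paper's trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def temporal_codes(t, freqs, amps, phases):
    """Motion code at times t as a weighted sum of sinusoidal bases.

    freqs/amps/phases play the role of learned parameters; here they are
    random placeholders. Shapes: t is (T,), the others are (D, K) for a
    D-dimensional code built from K bases.
    """
    # Broadcast (T, 1, 1) against (D, K) -> (T, D, K), then sum over bases.
    waves = amps * np.sin(2 * np.pi * freqs * t[:, None, None] + phases)
    return waves.sum(axis=-1)  # (T, D): one motion code per frame

D, K = 8, 4  # code dimension, number of sinusoidal bases
freqs = rng.uniform(0.5, 3.0, (D, K))
amps = rng.normal(size=(D, K))
phases = rng.uniform(0, 2 * np.pi, (D, K))

t = np.linspace(0, 1, 16)  # 16 frames
codes = temporal_codes(t, freqs, amps, phases)
# Each row of `codes` would condition a style-based image generator; because
# the codes carry no image content, the same trajectory can be replayed with
# a different identity/content latent to transfer the motion.
```

Because the dynamics live entirely in these per-frame codes, swapping the content latent while reusing `codes` is what lets motion transfer across actors.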

Ontology Revision based on Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2310.18378
  • repo_url: None
  • paper_authors: Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu
  • for: This paper proposes an ontology revision algorithm based on pre-trained models to resolve unsatisfiable concepts.
  • methods: The work builds on existing ontology revision techniques, including defining revision operators and designing ranking strategies for axioms, and uses a pre-trained model to encode axiom semantics, yielding four scoring functions for ranking axioms.
  • results: Experiments show that the proposed algorithms achieve promising performance; the adapted revision algorithm greatly improves efficiency, saving up to 96% of the runtime on some ontology pairs, and several of the scoring functions help the revision algorithm obtain better results in many cases.
    Abstract Ontology revision aims to seamlessly incorporate new information into an existing ontology and plays a crucial role in tasks such as ontology evolution, ontology maintenance, and ontology alignment. Similar to repair single ontologies, resolving logical incoherence in the task of ontology revision is also important and meaningful since incoherence is a main potential factor to cause inconsistency and reasoning with an inconsistent ontology will obtain meaningless answers. To deal with this problem, various ontology revision methods have been proposed to define revision operators and design ranking strategies for axioms in an ontology. However, they rarely consider axiom semantics which provides important information to differentiate axioms. On the other hand, pre-trained models can be utilized to encode axiom semantics, and have been widely applied in many natural language processing tasks and ontology-related ones in recent years. Therefore, in this paper, we define four scoring functions to rank axioms based on a pre-trained model by considering various information from a rebuttal ontology and its corresponding reliable ontology. Based on such a scoring function, we propose an ontology revision algorithm to deal with unsatisfiable concepts at once. If it is hard to resolve all unsatisfiable concepts in a rebuttal ontology together, an adapted revision algorithm is designed to deal with them group by group. We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones. According to the experiments, it shows that our algorithms could achieve promising performance. The adapted revision algorithm could improve the efficiency largely, and at most 96% time could be saved for some ontology pairs. Some of our scoring functions help a revision algorithm obtain better results in many cases, especially for the challenging pairs.
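
A rough sketch of the group-by-group revision loop: axioms of the rebuttal ontology are ranked by a semantic score against the reliable ontology, then removed a group at a time until the result is coherent. Both `embed` (a hash-based stand-in for the pre-trained encoder) and `is_coherent` (a placeholder where a DL reasoner's satisfiability check would go) are hypothetical, as is the single similarity-based scoring function shown; the paper defines four.

```python
import hashlib
import numpy as np

def embed(text):
    """Hash-based stand-in for a pre-trained sentence encoder."""
    digest = hashlib.sha256(text.encode()).digest()
    v = np.frombuffer(digest, dtype=np.uint8).astype(float)
    return v / np.linalg.norm(v)

def score_axiom(axiom, reliable_axioms):
    # One possible scoring function: semantic similarity to the reliable
    # ontology (higher = better supported, so less likely to be removed).
    a = embed(axiom)
    return max(float(a @ embed(r)) for r in reliable_axioms)

def is_coherent(axioms):
    # Placeholder for a satisfiability check with a DL reasoner.
    return len(axioms) <= 2

def revise(rebuttal, reliable, groups=2):
    """Remove the worst-scoring axioms group by group until coherent."""
    ranked = sorted(rebuttal, key=lambda ax: score_axiom(ax, reliable))
    size = max(1, len(ranked) // groups)
    kept = list(ranked)
    for i in range(0, len(ranked), size):
        kept = [ax for ax in kept if ax not in ranked[i:i + size]]
        if is_coherent(kept):
            break
    return kept

rebuttal = ["Penguin SubClassOf Bird", "Penguin SubClassOf NotFlying",
            "Bird SubClassOf Flying", "Eagle SubClassOf Bird"]
reliable = ["Bird SubClassOf Animal", "Eagle SubClassOf Bird"]
revised = revise(rebuttal, reliable)
```

Processing axioms group by group, rather than re-ranking after every single removal, is where the efficiency gain of the adapted algorithm comes from.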

Large-scale Foundation Models and Generative AI for BigData Neuroscience

  • paper_url: http://arxiv.org/abs/2310.18377
  • repo_url: None
  • paper_authors: Ran Wang, Zhe Sage Chen
  • for: This paper surveys applications of foundation models and generative AI models in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation.
  • methods: The review describes these foundation and generative AI models through the lens of self-supervised learning (SSL) and transfer learning.
  • results: The paper argues that this paradigm-shift framework will open new avenues for many neuroscience research directions, and discusses the accompanying challenges and opportunities.
    Abstract Recent advances in machine learning have made revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscapes of neuroscience research and make a significant impact on the future. Here we present a mini-review on recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.