cs.AI - 2023-10-05

Hard View Selection for Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.03940
  • repo_url: None
  • paper_authors: Fabio Ferreira, Ivo Rapant, Frank Hutter
  • for: Making contrastive learning models more robust by exposing them to harder views of an image input during pretraining.
  • methods: Proposes an easy, learning-free, yet powerful Hard View Selection (HVS) strategy: randomly sample multiple views, run forward passes for each view pair on the currently trained model, adversarially select the pair yielding the worst loss, and run the backward pass with that pair to increase task difficulty.
  • results: Improves ImageNet linear-evaluation accuracy by 0.55%-1.9%, with similar improvements across multiple CL methods (e.g., DINO, SimSiam, SimCLR); with only 300 pretraining epochs, HVS closely rivals the 800-epoch DINO baseline, even when factoring in the additional forward passes induced by HVS.
    Abstract Many Contrastive Learning (CL) methods train their models to be invariant to different "views" of an image input for which a good data augmentation pipeline is crucial. While considerable efforts were directed towards improving pre-text tasks, architectures, or robustness (e.g., Siamese networks or teacher-softmax centering), the majority of these methods remain strongly reliant on the random sampling of operations within the image augmentation pipeline, such as the random resized crop or color distortion operation. In this paper, we argue that the role of the view generation and its effect on performance has so far received insufficient attention. To address this, we propose an easy, learning-free, yet powerful Hard View Selection (HVS) strategy designed to extend the random view generation to expose the pretrained model to harder samples during CL training. It encompasses the following iterative steps: 1) randomly sample multiple views and create pairs of two views, 2) run forward passes for each view pair on the currently trained model, 3) adversarially select the pair yielding the worst loss, and 4) run the backward pass with the selected pair. In our empirical analysis we show that under the hood, HVS increases task difficulty by controlling the Intersection over Union of views during pretraining. With only 300-epoch pretraining, HVS is able to closely rival the 800-epoch DINO baseline which remains very favorable even when factoring in the slowdown induced by the additional forwards of HVS. Additionally, HVS consistently achieves accuracy improvements on ImageNet between 0.55% and 1.9% on linear evaluation and similar improvements on transfer tasks across multiple CL methods, such as DINO, SimSiam, and SimCLR.
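    The four iterative steps in the abstract lend themselves to a short sketch. Below is a minimal PyTorch illustration of the selection loop, assuming a generic encoder, a user-supplied `augment` function, and a SimCLR-style InfoNCE loss; these are placeholders, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """Simple symmetric InfoNCE loss over a batch of paired embeddings."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

def hvs_step(model, images, augment, optimizer, n_views=4):
    # 1) randomly sample multiple views and form all pairs of two views
    views = [augment(images) for _ in range(n_views)]
    pairs = [(i, j) for i in range(n_views) for j in range(i + 1, n_views)]

    # 2) forward passes for each view pair on the currently trained model
    with torch.no_grad():
        embs = [model(v) for v in views]
        losses = [info_nce(embs[i], embs[j]) for i, j in pairs]

    # 3) adversarially select the pair yielding the worst (highest) loss
    i, j = pairs[int(torch.stack(losses).argmax())]

    # 4) run the backward pass with the selected pair only
    loss = info_nce(model(views[i]), model(views[j]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```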

Multitask Learning for Time Series Data with 2D Convolution

  • paper_url: http://arxiv.org/abs/2310.03925
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Xin Dai, Yan Zheng, Junpeng Wang, Huiyuan Chen, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang
  • for: Investigates the application of multitask learning (MTL) to time series data, aiming to improve the generalizability of time series classification (TSC) models.
  • methods: Integrates the state-of-the-art 1D convolution-based TSC model with MTL and evaluates its performance; also proposes a novel 2D convolution-based model to enhance the model's expressiveness.
  • results: The proposed method outperforms competing approaches on both the UCR Archive and an industrial transaction TSC dataset.
    Abstract Multitask learning (MTL) aims to develop a unified model that can handle a set of closely related tasks simultaneously. By optimizing the model across multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of generalizability. Although MTL has been extensively researched in various domains such as computer vision, natural language processing, and recommendation systems, its application to time series data has received limited attention. In this paper, we investigate the application of MTL to the time series classification (TSC) problem. However, when we integrate the state-of-the-art 1D convolution-based TSC model with MTL, the performance of the TSC model actually deteriorates. By comparing the 1D convolution-based models with the Dynamic Time Warping (DTW) distance function, it appears that the underwhelming results stem from the limited expressive power of the 1D convolutional layers. To overcome this challenge, we propose a novel design for a 2D convolution-based model that enhances the model's expressiveness. Leveraging this advantage, our proposed method outperforms competing approaches on both the UCR Archive and an industrial transaction TSC dataset.

An Efficient Content-based Time Series Retrieval System

  • paper_url: http://arxiv.org/abs/2310.03919
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips
  • for: Builds a Content-based Time Series Retrieval (CTSR) system that handles time series from multiple domains, letting users submit a time series as a query and retrieve relevant time series with associated metadata.
  • methods: Proposes an effective and efficient CTSR model with a high-capacity similarity measure that works across time series from diverse domains while computing similarity scores fast enough for real-time interaction.
  • results: On an in-house transaction data problem, the proposed model is the most suitable solution compared to alternative models while maintaining reasonable inference runtimes.
    Abstract A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series emerged from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system is required to work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. On top of that, the model within the CTSR system has to compute the similarity scores in an efficient manner as the users interact with the system in real-time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models, while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable solution compared to others for our transaction data problem.

Toward a Foundation Model for Time Series Data

  • paper_url: http://arxiv.org/abs/2310.03916
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang
  • for: Aims to develop an effective time series foundation model that can be adapted across multiple domains.
  • methods: Repurposes the publicly available UCR Archive and pre-trains on unlabeled samples from multiple domains, evaluating four existing self-supervised learning-based pre-training methods along with a novel method across four popular neural network architectures for time series.
  • results: Experiments show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process, and the proposed pre-training method combined with the Transformer model outperforms the alternatives.
    Abstract A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has predominantly focused on models trained exclusively on data from a single domain. As a result, these models possess domain-specific knowledge that may not be easily transferable to time series from other domains. In this paper, we aim to develop an effective time series foundation model by leveraging unlabeled samples from multiple domains. To achieve this, we repurposed the publicly available UCR Archive and evaluated four existing self-supervised learning-based pre-training methods, along with a novel method, on the datasets. We tested these methods using four popular neural network architectures for time series to understand how the pre-training methods interact with different network designs. Our experimental results show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process. Furthermore, we found that the proposed pre-training method, when combined with the Transformer model, outperforms the alternatives.

RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels

  • paper_url: http://arxiv.org/abs/2310.03912
  • repo_url: None
  • paper_authors: Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen, Soumyendu Sarkar
  • for: Improving the expressiveness of meta-learning Bayesian Optimization (BO) surrogates to better handle high-dimensional black-box optimization problems.
  • methods: Combines Deep Kernel Learning (DKL) with attention-based Transformer models to improve the modeling power of GP surrogates, and uses Soft Actor-Critic Reinforcement Learning to learn an acquisition-function policy that aids exploration.
  • results: The Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results on continuous high-dimensional optimization problems.
    Abstract Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function optimization and few-shot multi-objective optimization. However, even few-shot techniques fail to exploit similarities shared between closely related objectives. In this paper, we combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling powers of GP surrogates with meta-learning. We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL, empowering the surrogates to adapt to contextual information gathered during the BO process. We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. This Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results in continuous high-dimensional optimization problems.

Taming Binarized Neural Networks and Mixed-Integer Programs

  • paper_url: http://arxiv.org/abs/2310.04469
  • repo_url: None
  • paper_authors: Johannes Aspman, Georgios Korpas, Jakub Marecek
  • for: Addresses the training of binarized neural networks, which are of particular interest because of their explainability.
  • methods: Reformulates the training problem as a subadditive dual of a mixed-integer program, yielding a tame representation that allows the implicit differentiation framework of Bolte et al. to be applied for backpropagation.
  • results: Shows that binarized neural networks admit a tame representation, making practical backpropagation possible via implicit differentiation; the approach also extends to a broader class of mixed-integer programs, such as those encountered in symbolic approaches to AI.
    Abstract There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that binarized neural networks admit a tame representation. This, in turn, makes it possible to use the framework of Bolte et al. for implicit differentiation, which offers the possibility for practical implementation of backpropagation in the context of binarized neural networks. This approach could also be used for a broader class of mixed-integer programs, beyond the training of binarized neural networks, as encountered in symbolic approaches to AI and beyond.

Accelerated Neural Network Training with Rooted Logistic Objectives

  • paper_url: http://arxiv.org/abs/2310.03890
  • repo_url: None
  • paper_authors: Zhu Wang, Praveen Raj Veluswami, Harsh Mishra, Sathya N. Ravi
  • for: Proposes a new loss function to speed up training and improve the performance of neural network models in real-world applications.
  • methods: Derives a "rooted logistic" objective: a sequence of strictly convex functions, at least as strict as the logistic loss, whose minimizers coincide with minimum-norm solutions wherever possible.
  • results: In experiments on fully-connected networks and transformers, training with the rooted loss converges faster and improves performance over standard losses; the loss also benefits generative-modeling downstream applications such as finetuning the StyleGAN model.
    Abstract Many neural networks deployed in real-world scenarios are trained using cross entropy based loss functions. From the optimization perspective, it is known that the behavior of first order methods such as gradient descent crucially depend on the separability of datasets. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) condition number of data matrix, and (2) separability of the dataset. With no further pre-processing techniques such as over-parametrization, data augmentation etc., separability is an intrinsic quantity of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of strictly convex functions that are at least as strict as logistic loss. The minimizers of these functions coincide with those of the minimum norm solution wherever possible. The strict convexity of the derived function can be extended to finetune state-of-the-art models and applications. In empirical experimental analysis, we apply our proposed rooted logistic objective to multiple deep models, e.g., fully-connected neural networks and transformers, on various classification benchmarks. Our results illustrate that training with the rooted loss function converges faster and gains performance improvements. Furthermore, we illustrate applications of our novel rooted loss function in generative modeling based downstream applications, such as finetuning the StyleGAN model with the rooted loss. The code implementing our losses and models can be found here for open source software development purposes: https://anonymous.4open.science/r/rooted_loss.

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2310.19804
  • repo_url: None
  • paper_authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland
  • for: Studies behavioural metrics for constructing representations in reinforcement learning.
  • methods: Uses positive definite kernels to define a new metric that is provably equivalent to the MICo distance, and leverages this perspective to obtain new theoretical results, including bounds on value function differences and a provable embedding into a finite-dimensional Euclidean space with low distortion error.
  • results: Strong empirical results demonstrate the effectiveness of these methods in practice.
    Abstract Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which has so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

Small batch deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.03882
  • repo_url: None
  • paper_authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro
  • for: Improving the performance of value-based deep reinforcement learning.
  • methods: Treats the batch size, i.e., how many transitions are sampled from the replay memory for each gradient update, as the quantity under study in a broad empirical analysis.
  • results: Finds that reducing the batch size can yield a number of significant performance gains, contrary to the general tendency toward larger batch sizes when training neural networks.
    Abstract In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
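    The batch size in question is simply the number of transitions drawn from the replay memory per gradient update. A minimal sketch of a value-based update with that parameter exposed (the replay buffer, network, and hyperparameters are generic placeholders, not the paper's exact agent):

```python
import random
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, replay_buffer, optimizer,
               batch_size=8, gamma=0.99):
    """One gradient update; the paper studies shrinking batch_size (e.g. 8 vs. 32)."""
    batch = random.sample(replay_buffer, batch_size)          # list of (s, a, r, s2, done)
    s, a, r, s2, done = map(torch.stack, zip(*batch))

    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)  # Q(s, a)
    with torch.no_grad():
        target = r + gamma * (1 - done.float()) * target_net(s2).max(dim=1).values

    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```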

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2310.05862
  • repo_url: None
  • paper_authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
  • for: Protecting CLIP pre-training against targeted data poisoning and backdoor attacks.
  • methods: Warms up the model with unimodal contrastive learning (CL) on the image and text modalities separately, then carefully divides the data into safe and risky subsets, applying unimodal CL to the risky data and the CLIP loss to the safe data, while gradually growing the safe subset during training.
  • results: Across various datasets, SAFECLIP reduces the success rate of targeted data poisoning attacks from 93.75% to 0% and of backdoor attacks from 100% to 0%, without harming CLIP performance.
    Abstract Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is extremely more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) on image and text modalities separately. Then, it carefully divides the data into safe and risky subsets. SAFECLIP trains on the risky data by applying unimodal CL to image and text modalities separately, and trains on the safe data using the CLIP loss. By gradually increasing the size of the safe subset during the training, SAFECLIP effectively breaks targeted data poisoning and backdoor attacks without harming the CLIP performance. Our extensive experiments show that SAFECLIP decrease the attack success rate of targeted data poisoning attacks from 93.75% to 0% and that of the backdoor attacks from 100% to 0%, without harming the CLIP performance on various datasets.
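    A compact sketch of the safe/risky split idea from the abstract. The ranking criterion (cross-modal cosine agreement) and the batch-level losses are simplified assumptions for illustration, not SAFECLIP's exact procedure.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def split_safe_risky(image_encoder, text_encoder, dataset, safe_frac):
    """Rank image-text pairs by cross-modal agreement; the top `safe_frac` forms the
    'safe' subset (SAFECLIP grows safe_frac gradually over training)."""
    sims = []
    with torch.no_grad():
        for img, txt in DataLoader(dataset, batch_size=256):
            zi = F.normalize(image_encoder(img), dim=-1)
            zt = F.normalize(text_encoder(txt), dim=-1)
            sims.append((zi * zt).sum(-1))
    sims = torch.cat(sims)
    k = int(safe_frac * len(sims))
    safe_idx = sims.topk(k).indices.tolist()
    risky_idx = sorted(set(range(len(sims))) - set(safe_idx))
    return Subset(dataset, safe_idx), Subset(dataset, risky_idx)

# Training then applies the multimodal CLIP loss to batches from the safe subset and
# separate image-only / text-only contrastive losses to batches from the risky subset.
```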

Validating transformers for redaction of text from electronic health records in real-world healthcare

  • paper_url: http://arxiv.org/abs/2310.04468
  • repo_url: https://github.com/CogStack/MedCAT
  • paper_authors: Zeljko Kraljevic, Anthony Shek, Joshua Au Yeung, Ewart Jonathan Sheldon, Mohammad Al-Agil, Haris Shuaib, Xi Bai, Kawsar Noor, Anoop D. Shah, Richard Dobson, James Teo
  • for: Protecting patient privacy in healthcare records so that health data can be used and shared safely.
  • methods: Uses deep learning, specifically a transformer-based model (AnonCAT) trained on manually annotated redactions of real-world documents from three UK hospitals with different electronic health record systems, to improve the precision and efficiency of redaction.
  • results: On real-world records from the three UK hospitals, AnonCAT achieved high performance, with recall of 0.99, 0.99, and 0.96.
    Abstract Protecting patient privacy in healthcare records is a top priority, and redaction is a commonly used method for obscuring directly identifiable information in text. Rule-based methods have been widely used, but their precision is often low causing over-redaction of text and frequently not being adaptable enough for non-standardised or unconventional structures of personal health information. Deep learning techniques have emerged as a promising solution, but implementing them in real-world environments poses challenges due to the differences in patient record structure and language across different departments, hospitals, and countries. In this study, we present AnonCAT, a transformer-based model and a blueprint on how deidentification models can be deployed in real-world healthcare. AnonCAT was trained through a process involving manually annotated redactions of real-world documents from three UK hospitals with different electronic health record systems and 3116 documents. The model achieved high performance in all three hospitals with a Recall of 0.99, 0.99 and 0.96. Our findings demonstrate the potential of deep learning techniques for improving the efficiency and accuracy of redaction in global healthcare data and highlight the importance of building workflows which not just use these models but are also able to continually fine-tune and audit the performance of these algorithms to ensure continuing effectiveness in real-world settings. This approach provides a blueprint for the real-world use of de-identifying algorithms through fine-tuning and localisation, the code together with tutorials is available on GitHub (https://github.com/CogStack/MedCAT).

Design Principles for Lifelong Learning AI Accelerators

  • paper_url: http://arxiv.org/abs/2310.04467
  • repo_url: None
  • paper_authors: Dhireesha Kudithipudi, Anurag Daram, Abdullah M. Zyarah, Fatima Tuz Zohora, James B. Aimone, Angel Yanguas-Gil, Nicholas Soures, Emre Neftci, Matthew Mattina, Vincenzo Lomonaco, Clare D. Thiem, Benjamin Epstein
  • for: Explores lifelong learning in artificial intelligence (AI) and how lifelong learning AI models can be accelerated on edge devices with strict size, weight, and power constraints.
  • methods: Reviews current edge AI accelerators and considers the role that emerging technologies, such as neuromorphic computing, could play in future lifelong learning accelerator designs.
  • results: Identifies key desirable capabilities and evaluation metrics for lifelong learning accelerators and outlines directions for their future design.
    Abstract Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to a range of novel AI applications, but this will also require the development of appropriate hardware accelerators, particularly if the models are to be deployed on edge platforms, which have strict size, weight, and power constraints. Here, we explore the design of lifelong learning AI accelerators that are intended for deployment in untethered environments. We identify key desirable capabilities for lifelong learning accelerators and highlight metrics to evaluate such accelerators. We then discuss current edge AI accelerators and explore the future design of lifelong learning accelerators, considering the role that different emerging technologies could play.

Contextualized Structural Self-supervised Learning for Ontology Matching

  • paper_url: http://arxiv.org/abs/2310.03840
  • repo_url: https://github.com/ellenzhuwang/lakermap
  • paper_authors: Zhu Wang
  • for: This paper is written for researchers and practitioners in the field of knowledge graph (KG) integration, particularly those interested in ontology matching (OM) and self-supervised learning.
  • methods: The paper proposes a novel self-supervised learning OM framework called LaKERMap, which leverages transformer-based language models and incorporates implicit knowledge to capture multiple structural contexts. The framework utilizes distinct training objectives to improve alignment quality and inference time.
  • results: The paper reports that LaKERMap outperforms state-of-the-art systems in terms of alignment quality and inference time, as demonstrated through experiments on the Bio-ML datasets and tasks. The findings suggest that LaKERMap is a promising approach for KG integration.
    Abstract Ontology matching (OM) entails the identification of semantic relationships between concepts within two or more knowledge graphs (KGs) and serves as a critical step in integrating KGs from various sources. Recent advancements in deep OM models have harnessed the power of transformer-based language models and the advantages of knowledge graph embedding. Nevertheless, these OM models still face persistent challenges, such as a lack of reference alignments, runtime latency, and unexplored different graph structures within an end-to-end framework. In this study, we introduce a novel self-supervised learning OM framework with input ontologies, called LaKERMap. This framework capitalizes on the contextual and structural information of concepts by integrating implicit knowledge into transformers. Specifically, we aim to capture multiple structural contexts, encompassing both local and global interactions, by employing distinct training objectives. To assess our methods, we utilize the Bio-ML datasets and tasks. The findings from our innovative approach reveal that LaKERMap surpasses state-of-the-art systems in terms of alignment quality and inference time. Our models and codes are available here: https://github.com/ellenzhuwang/lakermap.

Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms

  • paper_url: http://arxiv.org/abs/2310.05984
  • repo_url: None
  • paper_authors: Petter Törnberg, Diliara Valeeva, Justus Uitermark, Christopher Bail
  • for: Asks whether combining Large Language Models (LLMs) with agent-based modeling can help researchers study how different news feed algorithms shape the quality of online conversations.
  • methods: Simulates social media platforms with LLM-driven agents, populating them with realistic personas built from American National Election Study data, and compares three news feed algorithms: most-liked posts from followed users, posts from all users, and a novel "bridging" algorithm.
  • results: The bridging algorithm, which highlights posts liked by people with opposing political views, promotes more constructive, non-toxic conversation across political divides than the other two feed algorithms.
    Abstract Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles - and like or comment upon each other's messages - within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users - even those outside their own network. The third platform employs a novel "bridging" algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic, conversation across political divides than the other two models. Though further research is needed to evaluate these findings, we argue that LLMs hold considerable potential to improve simulation research on social media and many other complex social settings.

ECAvg: An Edge-Cloud Collaborative Learning Approach using Averaged Weights

  • paper_url: http://arxiv.org/abs/2310.03823
  • repo_url: None
  • paper_authors: Atah Nuh Mih, Hung Cao, Asfia Kawnine, Monica Wachowicz
  • for: Proposes an edge-cloud collaborative architecture in which edge and cloud devices complement each other's shortcomings.
  • methods: Edge devices pre-train local models on their respective datasets and transfer them to the server; the server averages the pre-trained weights into a global model, fine-tunes it on the combined data, and the local (edge) models are then updated with the weights of the global (server) model.
  • results: On CIFAR-10 and CIFAR-100 classification, the approach improves the server model with averaged weights and the edge models after the model update; on MNIST, weight averaging degrades both server and edge models due to negative transfer learning, suggesting the approach works best with deep networks such as MobileNetV2 and ResNet50 rather than simple neural networks.
    Abstract The use of edge devices together with cloud provides a collaborative relationship between both classes of devices where one complements the shortcomings of the other. Resource-constraint edge devices can benefit from the abundant computing power provided by servers by offloading computationally intensive tasks to the server. Meanwhile, edge devices can leverage their close proximity to the data source to perform less computationally intensive tasks on the data. In this paper, we propose a collaborative edge-cloud paradigm called ECAvg in which edge devices pre-train local models on their respective datasets and transfer the models to the server for fine-tuning. The server averages the pre-trained weights into a global model, which is fine-tuned on the combined data from the various edge devices. The local (edge) models are then updated with the weights of the global (server) model. We implement a CIFAR-10 classification task using MobileNetV2, a CIFAR-100 classification task using ResNet50, and an MNIST classification using a neural network with a single hidden layer. We observed performance improvement in the CIFAR-10 and CIFAR-100 classification tasks using our approach, where performance improved on the server model with averaged weights and the edge models had a better performance after model update. On the MNIST classification, averaging weights resulted in a drop in performance on both the server and edge models due to negative transfer learning. From the experiment results, we conclude that our approach is successful when implemented on deep neural networks such as MobileNetV2 and ResNet50 instead of simple neural networks.
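    The server-side averaging step can be sketched in a few lines of PyTorch; this assumes identical architectures across edge models and handles non-float buffers naively, so it is an illustration rather than the paper's implementation.

```python
import copy
import torch

def average_edge_models(edge_state_dicts):
    """ECAvg server step: average the pre-trained weights of the edge models
    into a single global model."""
    avg = copy.deepcopy(edge_state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in edge_state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg

# global_model.load_state_dict(average_edge_models([m.state_dict() for m in edge_models]))
# The server then fine-tunes global_model on the combined data, and each edge model
# reloads the fine-tuned global weights.
```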

Accurate Cold-start Bundle Recommendation via Popularity-based Coalescence and Curriculum Heating

  • paper_url: http://arxiv.org/abs/2310.03813
  • repo_url: None
  • paper_authors: Hyunsik Jeon, Jong-eun Lee, Jeongin Yun, U Kang
  • for: Proposes an accurate cold-start bundle recommendation method, addressing the practical scenario where new bundles are continuously created for various marketing purposes.
  • methods: CoHeat (Popularity-based Coalescence and Curriculum Heating) estimates user-bundle relationships by combining historical and affiliation information according to a bundle's popularity, and learns latent representations via curriculum learning and contrastive learning.
  • results: CoHeat achieves up to 193% higher nDCG@20 than the best competitor in cold-start bundle recommendation.
    Abstract How can we accurately recommend cold-start bundles to users? The cold-start problem in bundle recommendation is critical in practical scenarios since new bundles are continuously created for various marketing purposes. Despite its importance, no previous studies have addressed cold-start bundle recommendation. Moreover, existing methods for cold-start item recommendation overly rely on historical information, even for unpopular bundles, failing to tackle the primary challenge of the highly skewed distribution of bundle interactions. In this work, we propose CoHeat (Popularity-based Coalescence and Curriculum Heating), an accurate approach for the cold-start bundle recommendation. CoHeat tackles the highly skewed distribution of bundle interactions by incorporating both historical and affiliation information based on the bundle's popularity when estimating the user-bundle relationship. Furthermore, CoHeat effectively learns latent representations by exploiting curriculum learning and contrastive learning. CoHeat demonstrates superior performance in cold-start bundle recommendation, achieving up to 193% higher nDCG@20 compared to the best competitor.
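    The abstract does not give the exact coalescence formula, so the following is only one plausible, assumed form of a popularity-dependent mixture with a curriculum temperature, sketched to make the idea concrete.

```python
import torch

def coalesced_score(hist_score, affil_score, popularity, temperature):
    """Illustrative popularity-based coalescence (assumed functional form, not CoHeat's
    exact equation). Popular bundles lean on historical user-bundle interactions,
    unpopular ones on affiliation (item-level) signals; `temperature` is annealed
    during training ("curriculum heating") so the historical view gains weight."""
    w = torch.sigmoid(popularity / temperature)   # weight for the historical view
    return w * hist_score + (1.0 - w) * affil_score
```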

Improved Baselines with Visual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.03744
  • repo_url: None
  • paper_authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
  • for: Improving large multimodal models (LMMs) through visual instruction tuning.
  • methods: Makes simple modifications to LLaVA, namely using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts.
  • results: Establishes stronger baselines that achieve state-of-the-art results across 11 benchmarks; the final 13B checkpoint uses merely 1.2M publicly available data and finishes full training in about one day on a single 8-A100 node.
    Abstract Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
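    The "MLP projection" replaces the original single linear layer between the vision encoder and the language model with a small two-layer MLP. A sketch follows; the dimensions (1024-d CLIP-ViT-L patch features, 5120-d hidden size for a 13B LLM) are assumptions for illustration.

```python
import torch.nn as nn

# Two-layer GELU MLP mapping vision features into the LLM embedding space.
vision_hidden, llm_hidden = 1024, 5120   # assumed dimensions
mm_projector = nn.Sequential(
    nn.Linear(vision_hidden, llm_hidden),
    nn.GELU(),
    nn.Linear(llm_hidden, llm_hidden),
)
# visual_tokens = mm_projector(clip_vit_features)  # (num_patches, llm_hidden)
```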

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

  • paper_url: http://arxiv.org/abs/2310.03739
  • repo_url: https://github.com/mihirp1998/alignprop
  • paper_authors: Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
  • for: Optimizing text-to-image diffusion models so that their behavior on downstream objectives, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, can be controlled.
  • methods: AlignProp aligns diffusion models to downstream reward functions via end-to-end backpropagation of the reward gradient through the denoising process, fine-tuning low-rank adapter weight modules and using gradient checkpointing to keep memory usage viable.
  • results: AlignProp achieves higher rewards in fewer training steps than alternatives when fine-tuning diffusion models toward objectives such as image-text semantic alignment, aesthetics, compressibility, and controllability of the number of objects, as well as their combinations, while being conceptually simpler.
    Abstract Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing, to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and Visualization results are available at https://align-prop.github.io/.
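    A toy illustration of backpropagating a differentiable reward through the denoising chain with gradient checkpointing; `denoiser` and `reward_fn` are stand-ins for the LoRA-adapted UNet and reward model of the real method, so this is a conceptual sketch, not the paper's implementation.

```python
import torch
from torch.utils.checkpoint import checkpoint

def alignprop_style_step(denoiser, reward_fn, x_T, num_steps, optimizer):
    """Run the (differentiable) sampling loop, score the result with a reward,
    and backpropagate the reward gradient through all denoising steps."""
    x = x_T
    for t in reversed(range(num_steps)):
        t_tensor = torch.full((x.shape[0],), t, device=x.device, dtype=torch.float)
        # gradient checkpointing keeps memory usage viable over many steps
        x = checkpoint(denoiser, x, t_tensor, use_reentrant=False)
    loss = -reward_fn(x).mean()   # maximize the differentiable reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```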

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

  • paper_url: http://arxiv.org/abs/2310.03731
  • repo_url: https://github.com/mathllm/mathcoder
  • paper_authors: Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li
  • for: Enhancing the mathematical reasoning abilities of open-source language models.
  • methods: Fine-tunes open-source language models so that they use code to model and derive math equations, using MathCodeInstruct, a novel high-quality dataset of math problems and code-based solutions that interleave natural language, code, and execution results, together with a customized supervised fine-tuning and inference approach.
  • results: The resulting MathCoder models achieve state-of-the-art scores among open-source LLMs on MATH (45.2%) and GSM8K (83.9%), surpassing ChatGPT-3.5 and PaLM-2 on both datasets and outperforming GPT-4 on the competition-level MATH dataset.
    Abstract The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.03718
  • repo_url: None
  • paper_authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao
  • for: Training safe reinforcement learning (RL) agents that can adapt to varying safety constraint requirements during deployment without retraining.
  • methods: Proposes the Conditioned Constrained Policy Optimization (CCPO) framework with two key modules: Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization.
  • results: Experiments show that CCPO outperforms the baselines in terms of safety and task performance while preserving data-efficient zero-shot adaptation to different constraint thresholds, making it suitable for real-world dynamic applications.
    Abstract Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.

Artificial Intelligence Index Report 2023

  • paper_url: http://arxiv.org/abs/2310.03715
  • repo_url: None
  • paper_authors: Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, Raymond Perrault
  • for: Provides unbiased, rigorously vetted, broadly sourced AI-related data so that policymakers, researchers, executives, journalists, and the general public can develop a more thorough and nuanced understanding of the complex field of AI.
  • methods: Tracks, collates, distills, and visualizes data related to artificial intelligence, with a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis of large language and multimodal models, detailed trends in global AI legislation records, and a study of the environmental impact of AI systems.
  • results: This edition introduces more original data than any previous edition, covering AI public opinion, technical performance, large language and multimodal models, global AI legislation trends, and the environmental impact of AI systems.
    Abstract Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

  • paper_url: http://arxiv.org/abs/2310.03714
  • repo_url: https://github.com/stanfordnlp/dspy
  • paper_authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
  • for: This paper is written for developing and optimizing language model (LM) pipelines using a programming model called DSPy.
  • methods: The paper uses a programming model called DSPy to abstract LM pipelines as text transformation graphs, and introduces a compiler that optimizes any DSPy pipeline to maximize a given metric.
  • results: The paper shows that succinct DSPy programs can express and optimize sophisticated LM pipelines that outperform standard few-shot prompting and pipelines with expert-created demonstrations, and that DSPy is competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5.
    Abstract The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
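    A minimal DSPy program sketch in the spirit of the abstract: a declarative module, a metric, and the compiler that bootstraps demonstrations. The LM backend, metric, and training set are placeholders, and API details may differ across DSPy versions; see the repository linked above for the authoritative interface.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the LM backend (model name and settings are placeholders).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class SimpleQA(dspy.Module):
    """A one-module pipeline: a declarative chain-of-thought text transformation."""
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.answer(question=question)

def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# The compiler collects demonstrations and optimizes the pipeline for the metric.
# `trainset` is a list of dspy.Example(question=..., answer=...) objects (omitted here).
# compiled_qa = BootstrapFewShot(metric=exact_match).compile(SimpleQA(), trainset=trainset)
```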

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

  • paper_url: http://arxiv.org/abs/2310.03710
  • repo_url: https://github.com/wang-research-lab/agentinstruct
  • paper_authors: Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang
  • for: Improving the zero-shot reasoning abilities of large language models on general language understanding tasks.
  • methods: Builds an autonomous agent to instruct the reasoning process of large language models.
  • results: The method generalizes across a wide set of datasets spanning generation, classification, and reasoning, achieving state-of-the-art zero-shot performance on 20 of the 29 evaluated datasets; it boosts Vicuna-13b by 13.3%, Llama-2-70b-chat by 23.2%, and GPT-3.5 Turbo by 17.0%, and improves over zero-shot chain of thought by an average of 10.5%.
    Abstract We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.
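    Conceptually, the agent first writes task-specific instructions and those instructions then steer the model's zero-shot reasoning. The sketch below uses a hypothetical `llm` text-completion function and illustrative prompt wording; neither is taken from the paper.

```python
def agent_instruct_zero_shot(llm, task_description, dataset_input):
    """Two-stage zero-shot reasoning sketch (prompts are illustrative only)."""
    # Stage 1: an agent derives instructions from the task description.
    instructions = llm(
        "You are an agent preparing instructions for another model.\n"
        f"Task description: {task_description}\n"
        "Write clear step-by-step instructions for solving instances of this task."
    )
    # Stage 2: the instructions guide the model's zero-shot reasoning on each input.
    answer = llm(
        f"Instructions: {instructions}\n"
        f"Input: {dataset_input}\n"
        "Follow the instructions, reason step by step, then give the final answer."
    )
    return answer
```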

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models

  • paper_url: http://arxiv.org/abs/2310.03708
  • repo_url: None
  • paper_authors: Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao
  • for: Develops an RL-free multi-objective alignment algorithm so that language models can be customized to diverse human preferences.
  • methods: Multi-Objective Direct Preference Optimization (MODPO) extends Direct Preference Optimization (DPO) to multiple alignment objectives by folding language model learning directly into reward modeling, aligning models with the weighted sum of all principle-based rewards using a pure cross-entropy loss.
  • results: In safety alignment and long-form question answering, MODPO matches or outperforms existing methods, producing one of the most competitive fronts of language models catering to diverse preferences with 3 times less compute than multi-objective RLHF (MORLHF).
    Abstract A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches thus pursue customization, training separate principle-based reward models to represent different alignment objectives (e.g. helpfulness, harmlessness, or honesty). Different LMs can then be trained for different preferences through multi-objective RLHF (MORLHF) with different objective weightings. Yet, RLHF is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO folds LM learning directly into reward modeling, aligning LMs with the weighted sum of all principle-based rewards using pure cross-entropy loss. While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient, obviating value function modeling and online sample collection. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, consistently producing one of the most competitive LM fronts that cater to diverse preferences with 3 times fewer computations compared with MORLHF.
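    To make the weighted-sum idea concrete, here is a simplified DPO-style loss in which the policy's implicit reward covers one objective and frozen auxiliary reward models supply margins for the rest. The exact MODPO formulation is in the paper; this functional form and the weighting scheme are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def modpo_style_loss(policy_logratio_w, policy_logratio_l,
                     aux_rewards_w, aux_rewards_l, weights, beta=0.1):
    """Simplified multi-objective preference loss (illustrative, not MODPO's exact loss).
    policy_logratio_*: log pi(y|x) - log pi_ref(y|x) for chosen (w) / rejected (l) responses.
    aux_rewards_*: scores from frozen auxiliary reward models, shape [batch, n_aux].
    weights: objective weights [w_main, w_aux_1, ...], assumed to sum to 1."""
    w_main, w_aux = weights[0], weights[1:]
    implicit_margin = beta * (policy_logratio_w - policy_logratio_l)
    aux_margin = ((aux_rewards_w - aux_rewards_l) * w_aux).sum(-1)
    logits = w_main * implicit_margin + aux_margin
    return -F.logsigmoid(logits).mean()
```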

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

  • paper_url: http://arxiv.org/abs/2310.03693
  • repo_url: https://github.com/llm-tuning-safety/llms-finetuning-safety
  • paper_authors: Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
  • for: examining the safety costs of fine-tuning aligned large language models (LLMs) on custom data, and showing that an initially well-aligned model is not guaranteed to stay safe after fine-tuning.
  • methods: red-teaming studies that fine-tune GPT-3.5 Turbo through OpenAI's fine-tuning APIs, both on adversarially designed training examples and on benign, commonly used datasets.
  • results: fine-tuning on only 10 adversarially designed examples (at a cost below $0.20) jailbreaks GPT-3.5 Turbo's safety guardrails, and even benign fine-tuning can degrade safety alignment to a lesser extent; custom fine-tuning therefore introduces safety risks that current safety infrastructure does not cover.
    Abstract Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.

Probabilistic Generative Modeling for Procedural Roundabout Generation for Developing Countries

  • paper_url: http://arxiv.org/abs/2310.03687
  • repo_url: None
  • paper_authors: Zarif Ikram, Ling Pan, Dianbo Liu
  • for: cost-effective design and validation of transportation road networks for developing countries, where extensive manual testing is expensive and often infeasible.
  • methods: the task of linking incident roads to a roundabout's circular junction is formulated as a Markov decision process, and Generative Flow Networks (GFlowNets) learn stochastic policies to sample high-quality designs from an unnormalized reward distribution while preserving diversity.
  • results: compared with related methods, the approach achieves better diversity while maintaining a high validity score.
    Abstract Due to limited resources and fast economic growth, designing optimal transportation road networks with traffic simulation and validation in a cost-effective manner is vital for developing countries, where extensive manual testing is expensive and often infeasible. Current rule-based road design generators lack diversity, a key feature for design robustness. Generative Flow Networks (GFlowNets) learn stochastic policies to sample from an unnormalized reward distribution, thus generating high-quality solutions while preserving their diversity. In this work, we formulate the problem of linking incident roads to the circular junction of a roundabout by a Markov decision process, and we leverage GFlowNets as the Junction-Art road generator. We compare our method with related methods and our empirical results show that our method achieves better diversity while preserving a high validity score.
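
Since the generator is described as a GFlowNet that samples designs from an unnormalized reward, a trajectory-balance training sketch may help readers unfamiliar with the framework. The toy environment below (a fixed number of sequential choices, each with K options) and the stand-in reward are illustrative assumptions, not the paper's Junction-Art generator or its road-network reward.

```python
import torch
import torch.nn as nn

T, K = 4, 3  # toy design: T sequential choices, K options each

class TBSampler(nn.Module):
    """Minimal GFlowNet trained with the trajectory-balance (TB) objective.

    States are partial action sequences; because every complete design is reached by
    exactly one trajectory here, the backward policy is trivial and the TB loss
    reduces to (logZ + sum_t log P_F(a_t | s_t) - log R(x))^2.
    """
    def __init__(self):
        super().__init__()
        self.log_z = nn.Parameter(torch.zeros(()))          # learnable log-partition
        self.policy = nn.Sequential(nn.Linear(T * K, 64), nn.ReLU(), nn.Linear(64, K))

    def forward(self, state):                               # state: one-hot history, shape (T*K,)
        return torch.log_softmax(self.policy(state), dim=-1)

def toy_reward(actions):
    # Stand-in for a design-quality reward (e.g. validity / flow score of a junction).
    return torch.exp(-((torch.tensor(actions, dtype=torch.float) - 1.0) ** 2).sum())

def tb_loss(model):
    state = torch.zeros(T * K)
    log_pf, actions = 0.0, []
    for t in range(T):
        logits = model(state)
        a = torch.distributions.Categorical(logits=logits).sample()
        log_pf = log_pf + logits[a]                         # accumulate forward log-prob
        state = state.clone()
        state[t * K + a] = 1.0                              # append the chosen option
        actions.append(int(a))
    return (model.log_z + log_pf - torch.log(toy_reward(actions))) ** 2

model = TBSampler()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = tb_loss(model)
    loss.backward()
    opt.step()
```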

Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation

  • paper_url: http://arxiv.org/abs/2310.03780
  • repo_url: None
  • paper_authors: Tung Phung, Victor-Alexandru Pădurean, Anjali Singh, Christopher Brooks, José Cambronero, Sumit Gulwani, Adish Singla, Gustavo Soares
  • for: improving programming education by automatically generating individualized, human tutor-style feedback for students.
  • methods: GPT-4 serves as a "tutor" model that generates hints for buggy programs, using symbolic information from failing test cases and candidate fixes in its prompts, while GPT-3.5 serves as a "student" model that validates hint quality by simulating whether the feedback would help.
  • results: the combined GPT4Hints-GPT3.5Val technique improves generation quality, as shown by extensive evaluation on three real-world datasets of Python programs covering concepts from basic algorithms to regular expressions and pandas-based data analysis.
    Abstract Generative AI and large language models hold great promise in enhancing programming education by automatically generating individualized feedback for students. We investigate the role of generative AI models in providing human tutor-style programming hints to help students resolve errors in their buggy programs. Recent works have benchmarked state-of-the-art models for various feedback generation scenarios; however, their overall quality is still inferior to human tutors and not yet ready for real-world deployment. In this paper, we seek to push the limits of generative AI models toward providing high-quality programming hints and develop a novel technique, GPT4Hints-GPT3.5Val. As a first step, our technique leverages GPT-4 as a ``tutor'' model to generate hints -- it boosts the generative quality by using symbolic information of failing test cases and fixes in prompts. As a next step, our technique leverages GPT-3.5, a weaker model, as a ``student'' model to further validate the hint quality -- it performs an automatic quality validation by simulating the potential utility of providing this feedback. We show the efficacy of our technique via extensive evaluation using three real-world datasets of Python programs covering a variety of concepts ranging from basic algorithms to regular expressions and data analysis using pandas library.
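
The tutor/student split can be sketched as a small two-stage pipeline: a stronger model drafts a hint from the buggy program and its failing tests, and a weaker model checks whether the hint would plausibly lead a student to a passing fix. The prompts, the `call_llm` and `passes_tests` placeholders, and the validation rule below are assumptions based on the abstract, not the authors' prompts or evaluation harness.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Submission:
    problem: str        # task description
    buggy_code: str     # the student's program
    failing_tests: str  # symbolic info: failing inputs, expected vs. actual outputs

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for an API call to a chat model (e.g. via an OpenAI-style client)."""
    raise NotImplementedError

def passes_tests(program: str, tests: str) -> bool:
    """Placeholder: run the candidate program in a sandbox against the test suite."""
    raise NotImplementedError

def generate_hint(sub: Submission) -> str:
    # "Tutor" stage: a strong model drafts one non-revealing hint, grounded in the
    # failing test cases (and, per the abstract, a candidate fix) to boost quality.
    prompt = (
        "You are a programming tutor. Given the task, the buggy program and the "
        "failing tests, give ONE short hint that helps the student fix the bug "
        "without revealing the full solution.\n\n"
        f"Task:\n{sub.problem}\n\nBuggy program:\n{sub.buggy_code}\n\n"
        f"Failing tests:\n{sub.failing_tests}\n"
    )
    return call_llm("tutor-model", prompt)

def validate_hint(sub: Submission, hint: str, n_rollouts: int = 3) -> bool:
    # "Student" stage: a weaker model simulates the hint's utility by attempting a
    # repair guided only by the hint, which is then checked against the tests.
    fixes = [
        call_llm(
            "student-model",
            f"Task:\n{sub.problem}\n\nBuggy program:\n{sub.buggy_code}\n\n"
            f"Hint:\n{hint}\n\nReturn the corrected program only.",
        )
        for _ in range(n_rollouts)
    ]
    return any(passes_tests(fix, sub.failing_tests) for fix in fixes)

def hint_with_validation(sub: Submission, max_attempts: int = 3) -> Optional[str]:
    for _ in range(max_attempts):
        hint = generate_hint(sub)
        if validate_hint(sub, hint):
            return hint
    return None  # withhold feedback rather than show an unvalidated hint
```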

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

  • paper_url: http://arxiv.org/abs/2310.03684
  • repo_url: None
  • paper_authors: Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
  • for: defending large language models (LLMs) against jailbreaking attacks in which an adversary fools the model into generating objectionable content.
  • methods: SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs, randomly perturbs multiple copies of an input prompt at the character level and aggregates the corresponding predictions to detect adversarial inputs.
  • results: SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation, while using exponentially fewer queries than existing attacks.
    Abstract Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
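
The defense itself is simple enough to sketch: perturb several copies of the incoming prompt at the character level, query the model on each, and aggregate the responses by majority vote on whether the model was jailbroken. The perturbation rate, the refusal-marker check, and the `generate` placeholder are illustrative choices rather than the paper's exact procedure or certified parameters.

```python
import random
import string

REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't help with that")  # illustrative proxy

def perturb(prompt: str, rate: float = 0.1) -> str:
    """Randomly swap a fraction of characters, exploiting the brittleness of
    adversarial suffixes to character-level noise."""
    chars = list(prompt)
    for i in random.sample(range(len(chars)), k=max(1, int(rate * len(chars)))):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def is_jailbroken(response: str) -> bool:
    # Simple proxy: the attack succeeded if the model did not refuse.
    return not any(marker in response for marker in REFUSAL_MARKERS)

def smoothllm_respond(prompt: str, generate, n_copies: int = 10, rate: float = 0.1) -> str:
    """Query the LLM on several perturbed copies and aggregate by majority vote.

    `generate` is a placeholder for the underlying LLM call (prompt -> response).
    If the majority of perturbed copies are judged safe, return one of the safe
    responses; otherwise refuse.
    """
    responses = [generate(perturb(prompt, rate)) for _ in range(n_copies)]
    flags = [is_jailbroken(r) for r in responses]
    if sum(flags) > n_copies / 2:
        return "I'm sorry, I can't help with that."
    # Return a response consistent with the majority (i.e., a non-jailbroken one).
    return next(r for r, f in zip(responses, flags) if not f)
```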

MapperGPT: Large Language Models for Linking and Mapping Entities

  • paper_url: http://arxiv.org/abs/2310.03666
  • repo_url: None
  • paper_authors: Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T. Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A Haendel, Christopher J. Mungall
  • for: improving the accuracy of entity mapping in data integration, so that entities from different terminological resources are linked to the correct concepts.
  • methods: MapperGPT uses large language models (LLMs) to review and refine candidate mapping relationships as a post-processing step, in concert with existing high-recall methods based on lexical and structural heuristics.
  • results: on alignment tasks from several domains, combining MapperGPT with high-recall methods yields substantial accuracy improvements, beating state-of-the-art methods such as LogMap.
    Abstract Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Many tools have been developed to compute such mappings based on common structural features and lexical information such as labels and synonyms. Lexical approaches in particular often provide very high recall, but low precision, due to lexical ambiguity. As a consequence of this, mapping efforts often resort to a labor intensive manual mapping refinement through a human curator. Large Language Models (LLMs), such as the ones employed by ChatGPT, have generalizable abilities to perform a wide range of tasks, including question-answering and information extraction. Here we present MapperGPT, an approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. We evaluated MapperGPT on a series of alignment tasks from different domains, including anatomy, developmental biology, and renal diseases. We devised a collection of tasks that are designed to be particularly challenging for lexical methods. We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy, beating state-of-the-art (SOTA) methods such as LogMap.
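
A minimal version of the post-processing step looks like the loop below: a high-recall matcher proposes candidate mappings, and an LLM reviews each pair using labels and definitions, keeping only those it judges to be genuine matches. The candidate data structure, prompt wording, and `ask_llm` placeholder are assumptions; MapperGPT itself works over richer ontology metadata.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CandidateMapping:
    subject_id: str       # identifier in the source resource
    subject_label: str
    object_id: str        # candidate match in the target resource
    object_label: str
    lexical_score: float  # score from the high-recall lexical matcher

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError

def review_mapping(m: CandidateMapping, subject_def: str, object_def: str) -> str:
    # The LLM sees labels plus definitions and classifies the relationship, which
    # lets it reject lexically similar but semantically different pairs.
    prompt = (
        "Decide how these two terms are related. Answer with exactly one of: "
        "EXACT, BROAD, NARROW, RELATED, DIFFERENT.\n\n"
        f"Term A: {m.subject_label} ({m.subject_id})\nDefinition A: {subject_def}\n\n"
        f"Term B: {m.object_label} ({m.object_id})\nDefinition B: {object_def}\n"
    )
    return ask_llm(prompt).strip().upper()

def refine(candidates: List[CandidateMapping],
           definitions: Dict[str, str]) -> List[CandidateMapping]:
    """Keep only candidates the LLM judges to be exact (or hierarchical) matches."""
    kept = []
    for m in candidates:
        verdict = review_mapping(m, definitions[m.subject_id], definitions[m.object_id])
        if verdict in {"EXACT", "NARROW", "BROAD"}:
            kept.append(m)
    return kept
```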

Balancing Autonomy and Alignment: A Multi-Dimensional Taxonomy for Autonomous LLM-powered Multi-Agent Architectures

  • paper_url: http://arxiv.org/abs/2310.03659
  • repo_url: None
  • paper_authors: Thorsten Händler
  • for: understanding how autonomous LLM-powered multi-agent systems can tackle complex, interconnected tasks by decomposing user-prompted goals and orchestrating specialized intelligent agents.
  • methods: a multi-dimensional taxonomy that analyzes how such systems balance autonomy and alignment across goal-driven task management, agent composition, multi-agent collaboration, and context interaction, complemented by a domain-ontology model specifying fundamental architectural concepts.
  • results: an exploratory classification of representative LLM-powered multi-agent systems illustrates the taxonomy's practical utility and reveals potential directions for future research and development.
    Abstract Large language models (LLMs) have revolutionized the field of artificial intelligence, endowing it with sophisticated language understanding and generation capabilities. However, when faced with more complex and interconnected tasks that demand a profound and iterative thought process, LLMs reveal their inherent limitations. Autonomous LLM-powered multi-agent systems represent a strategic response to these challenges. Such systems strive for autonomously tackling user-prompted goals by decomposing them into manageable tasks and orchestrating their execution and result synthesis through a collective of specialized intelligent agents. Equipped with LLM-powered reasoning capabilities, these agents harness the cognitive synergy of collaborating with their peers, enhanced by leveraging contextual resources such as tools and datasets. While these architectures hold promising potential in amplifying AI capabilities, striking the right balance between different levels of autonomy and alignment remains the crucial challenge for their effective operation. This paper proposes a comprehensive multi-dimensional taxonomy, engineered to analyze how autonomous LLM-powered multi-agent systems balance the dynamic interplay between autonomy and alignment across various aspects inherent to architectural viewpoints such as goal-driven task management, agent composition, multi-agent collaboration, and context interaction. It also includes a domain-ontology model specifying fundamental architectural concepts. Our taxonomy aims to empower researchers, engineers, and AI practitioners to systematically analyze the architectural dynamics and balancing strategies employed by these increasingly prevalent AI systems. The exploratory taxonomic classification of selected representative LLM-powered multi-agent systems illustrates its practical utility and reveals potential for future research and development.

HandMeThat: Human-Robot Communication in Physical and Social Environments

  • paper_url: http://arxiv.org/abs/2310.03779
  • repo_url: None
  • paper_authors: Yanming Wan, Jiayuan Mao, Joshua B. Tenenbaum
  • for: a benchmark for holistic evaluation of how well robots understand and follow human instructions in physical and social environments.
  • methods: 10,000 episodes of human-robot interaction in which the robot first observes a trajectory of human actions toward an internal goal, then receives an instruction whose ambiguities must be resolved using physical (object states and relations) and social (human actions and goals) information; a textual interface lets agents act through text commands.
  • results: both offline and online reinforcement learning baselines perform poorly on HandMeThat, suggesting significant room for future work on physical and social human-robot communication and interaction.
    Abstract We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of human instructions with ambiguities based on the physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of human actions towards her internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat, and show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat, suggesting significant room for future work on physical and social human-robot communications and interactions.

CLEVRER-Humans: Describing Physical and Causal Events the Human Way

  • paper_url: http://arxiv.org/abs/2310.03635
  • repo_url: None
  • paper_authors: Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu
  • for: building machines that can reason about physical events and their causal relationships, which is crucial for flexible interaction with the physical world.
  • methods: two techniques improve data collection efficiency: an iterative event cloze task that elicits a new representation of events in videos, termed Causal Event Graphs (CEGs), and a data augmentation technique based on neural language generative models.
  • results: the CLEVRER-Humans benchmark provides human-labeled causal judgments of physical events, and baseline question-answering approaches highlight the substantial challenges the benchmark poses.
    Abstract Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.

PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

  • paper_url: http://arxiv.org/abs/2310.03620
  • repo_url: https://github.com/purduedualitylab/peatmoss-demos
  • paper_authors: Wenxin Jiang, Jason Jones, Jerin Yasmin, Nicholas Synovic, Rajeev Sashti, Sophie Chen, George K. Thiruvathukal, Yuan Tian, James C. Davis
  • for: enabling the study of software engineering practices and challenges around pre-trained deep learning models (PTMs) reused in open-source software.
  • methods: the PeaTMOSS dataset, comprising a snapshot of 281,638 PTMs, 27,270 open-source repositories that use PTMs, and a mapping between the PTMs and the projects that use them, together with a mining challenge.
  • results: the dataset supports large-scale analysis of how PTMs are adopted in software engineering; a demo and a link to the full dataset are publicly available.
    Abstract Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the wide-spread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.

Solving a Class of Non-Convex Minimax Optimization in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.03613
  • repo_url: None
  • paper_authors: Xidong Wu, Jianhui Sun, Zhengmian Hu, Aidong Zhang, Heng Huang
  • for: addressing large-scale data challenges in machine learning applications with communication-efficient distributed training
  • methods: Federated Learning (FL) algorithms (FedSGDA+ and FedSGDA-M) and existing centralized optimization algorithms
  • results: reduced communication complexity and improved sample complexity for nonconvex-concave and nonconvex-strongly-concave minimax problems, with the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$
    Abstract The minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (\emph{i.e.} single-machine). Nonetheless, the algorithm for minimax problems under FL is still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
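
To convey the structure of the proposed algorithms, the sketch below implements plain federated stochastic gradient descent-ascent: each client takes a few local primal-descent and dual-ascent steps, and the server averages the iterates. The quadratic toy objectives are made up for illustration, and the momentum-based variance reduction that gives FedSGDA-M its improved complexity is deliberately omitted, so this is a simplified sketch rather than the paper's exact method.

```python
import numpy as np

def fed_sgda(clients, x0, y0, rounds=50, local_steps=5, eta_x=0.01, eta_y=0.05):
    """Simplified federated SGDA for min_x max_y (1/N) * sum_i f_i(x, y).

    Each client runs a few local descent steps on x and ascent steps on y using
    stochastic gradients of its own objective; the server then averages the iterates.
    """
    x, y = x0.copy(), y0.copy()
    for _ in range(rounds):
        xs, ys = [], []
        for grad_x, grad_y in clients:           # each client exposes stochastic gradient oracles
            xi, yi = x.copy(), y.copy()
            for _ in range(local_steps):
                xi -= eta_x * grad_x(xi, yi)     # primal descent
                yi += eta_y * grad_y(xi, yi)     # dual ascent
            xs.append(xi)
            ys.append(yi)
        x = np.mean(xs, axis=0)                  # server aggregation
        y = np.mean(ys, axis=0)
    return x, y

# Toy clients with saddle objectives f_i(x, y) = a_i*x^2/2 + b_i*x*y - y^2/2 (concave in y).
def make_client(a, b, noise=0.01):
    gx = lambda x, y: a * x + b * y + noise * np.random.randn(*x.shape)
    gy = lambda x, y: b * x - y + noise * np.random.randn(*y.shape)
    return gx, gy

clients = [make_client(1.0, 0.5), make_client(2.0, -0.3)]
x_star, y_star = fed_sgda(clients, x0=np.ones(1), y0=np.ones(1))
print(x_star, y_star)   # both should approach the saddle point near zero
```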

FASER: Binary Code Similarity Search through the use of Intermediate Representations

  • paper_url: http://arxiv.org/abs/2310.03605
  • repo_url: https://github.com/br0kej/FASER
  • paper_authors: Josh Collyer, Tim Watson, Iain Phillips
  • for: improving cross-architecture binary function search, which supports malware analysis, software supply chain security, and vulnerability research.
  • methods: FASER (Function as a String Encoded Representation) combines long-document transformers with binary intermediate representations, which are cross-architecture by nature and encode function semantics explicitly, enabling cross-architecture function search without manual feature engineering, pre-training, or a dynamic analysis step.
  • results: compared to several baseline methods, the proposed FASER model demonstrates strong performance in both general function search and targeted vulnerability search tasks, outperforming all baseline approaches.
    Abstract Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-Architecture Binary Code Similarity Search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary Intermediate representations possess two interesting properties: they are cross architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER) which combines long document transformers with the use of intermediate representations to create a model capable of cross architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks; A general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
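
The representation at the heart of FASER, serializing a function's intermediate representation as one long string and embedding it with a long-document transformer so that semantically similar functions end up close together, can be approximated with off-the-shelf components. The Longformer checkpoint, the mean-pooling choice, and the toy IR strings below are illustrative assumptions; the paper trains its own encoder on IR lifted from real binaries.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative generic encoder; FASER trains its own long-document transformer on function IR.
MODEL_NAME = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed_ir(ir_string: str) -> torch.Tensor:
    """Embed one function's intermediate representation, serialized as a string."""
    inputs = tokenizer(ir_string, return_tensors="pt", truncation=True, max_length=4096)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state       # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)           # mean-pool over real tokens
    return (hidden * mask).sum(1) / mask.sum(1)

def similarity(ir_a: str, ir_b: str) -> float:
    """Cosine similarity between two function embeddings; a higher value suggests the
    two strings may describe the same function compiled for different architectures."""
    ea, eb = embed_ir(ir_a), embed_ir(ir_b)
    return float(torch.nn.functional.cosine_similarity(ea, eb).item())

# Toy IR snippets standing in for lifted x86 and ARM versions of the same function.
x86_ir = "v0 = arg0 + arg1; if v0 == 0 goto L1; ret v0; L1: ret 0"
arm_ir = "t0 = param0 + param1; branch_if_zero t0 L_end; return t0; L_end: return 0"
print(similarity(x86_ir, arm_ir))
```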

How toxic is antisemitism? Potentials and limitations of automated toxicity scoring for antisemitic online content

  • paper_url: http://arxiv.org/abs/2310.04465
  • repo_url: None
  • paper_authors: Helena Mihaljević, Elisabeth Steffen
  • for: examining the potentials and limitations of the Perspective API, developed by Google and Jigsaw, for detecting antisemitic online content in applications such as content moderation, monitoring, and social media research.
  • methods: a manually annotated German-language dataset of around 3,600 Telegram and Twitter posts is used to explore how toxic antisemitic texts are rated and how toxicity scores differ across subforms of antisemitism and the stance expressed in the texts.
  • results: the API recognizes antisemitic content as toxic at a basic level, but shows critical weaknesses for non-explicit forms of antisemitism and for texts taking a critical stance towards it; simple text manipulations using widespread antisemitic codes substantially reduce API scores, making content moderation based on the service easy to bypass.
    Abstract The Perspective API, a popular text toxicity assessment service by Google and Jigsaw, has found wide adoption in several application areas, notably content moderation, monitoring, and social media research. We examine its potentials and limitations for the detection of antisemitic online content that, by definition, falls under the toxicity umbrella term. Using a manually annotated German-language dataset comprising around 3,600 posts from Telegram and Twitter, we explore as how toxic antisemitic texts are rated and how the toxicity scores differ regarding different subforms of antisemitism and the stance expressed in the texts. We show that, on a basic level, Perspective API recognizes antisemitic content as toxic, but shows critical weaknesses with respect to non-explicit forms of antisemitism and texts taking a critical stance towards it. Furthermore, using simple text manipulations, we demonstrate that the use of widespread antisemitic codes can substantially reduce API scores, making it rather easy to bypass content moderation based on the service's results.
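
For readers who want to reproduce the basic scoring step, a Perspective API request for German-language text looks roughly like the snippet below, following the public quickstart for the Python API client; the API key and example text are placeholders, and only the TOXICITY attribute is requested. As the study argues, the returned score should not be treated as a reliable antisemitism detector.

```python
from googleapiclient import discovery

API_KEY = "YOUR_API_KEY"  # placeholder; request one via the Perspective API console

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity_score(text: str, language: str = "de") -> float:
    """Return the Perspective TOXICITY summary score (0..1) for a post."""
    request = {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity_score("Beispieltext eines Social-Media-Posts."))
```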

Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End

  • paper_url: http://arxiv.org/abs/2310.03581
  • repo_url: None
  • paper_authors: Jin Jin, Chong Zhang, Jonas Frey, Nikita Rudin, Matias Mattamala, Cesar Cadena, Marco Hutter
  • for: enabling autonomous legged robots to navigate reliably in unknown environments even when exteroceptive perception is degraded or fails.
  • methods: perception failures are modeled as invisible obstacles and pits, and a reinforcement learning (RL) based local navigation policy is trained end-to-end to reconstruct environment information in a latent space from corrupted perception; both proprioception and exteroception serve as policy inputs, so collisions on different body parts and pits trigger corresponding reactions.
  • results: in simulation and on the real quadruped ANYmal running in real time (<10 ms CPU inference), the policy increases the success rate by over 30% compared with heuristic-based locally reactive planners when facing perception failures.
    Abstract Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.

Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

  • paper_url: http://arxiv.org/abs/2310.03579
  • repo_url: None
  • paper_authors: Trang Nguyen, Alexander Tong, Kanika Madan, Yoshua Bengio, Dianbo Liu
  • for: improving the effectiveness and scalability of causal structure learning in gene regulatory networks (GRNs).
  • methods: the Swift-DynGFN framework exploits gene-wise independence to increase parallelization and lower computational cost.
  • results: experiments on real single-cell RNA velocity data and synthetic GRN data show improved causal structure learning and scalability to larger systems.
    Abstract Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling the gene interactions in cellular processes. However, causal discovery in GRNs is a challenging problem for multiple reasons including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assume acyclic structure) or struggle with scalability. We introduce Swift-DynGFN as a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and to lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets showcase the advancement in learning causal structure in GRNs and scalability in larger systems.

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

  • paper_url: http://arxiv.org/abs/2310.12999
  • repo_url: None
  • paper_authors: Junliang Luo, Yi Tian Xu, Di Wu, Michael Jenkin, Xue Liu, Gregory Dudek
  • for: improving the energy efficiency of wireless networks, motivated by the demands of new-generation cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions.
  • methods: an approximate dynamic programming (ADP) method coupled with online optimization switches base-station cells on and off to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics.
  • results: a multilayer perceptron (MLP) predicts power consumption for each state-action pair, another MLP predicts QoS, and a long short-term memory (LSTM) network predicts handovers; an adaptive QoS threshold derived from the QoS history filters cell-switching actions so that power savings are maximized without degrading QoS.
    Abstract Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
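
The decision step can be sketched as follows: score every candidate on/off pattern with a learned power model, discard patterns whose predicted QoS falls below a threshold, and switch to the admissible pattern with the lowest expected power. The feature layout, network sizes, and fixed threshold are simplifying assumptions; the paper additionally uses an LSTM handover predictor and adapts the QoS threshold online from QoS history.

```python
import itertools
import torch
import torch.nn as nn

N_CELLS, STATE_DIM = 4, 16

class Predictor(nn.Module):
    """MLP mapping (network state, candidate on/off action) to a scalar prediction.
    One instance approximates power consumption, another predicts QoS."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_CELLS, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

power_model, qos_model = Predictor(), Predictor()  # assumed pre-trained in practice

def select_action(state, qos_threshold=0.9):
    """Enumerate on/off patterns, keep those whose predicted QoS clears the threshold,
    and return the admissible pattern with the lowest predicted power."""
    best_action, best_power = None, float("inf")
    for bits in itertools.product([0.0, 1.0], repeat=N_CELLS):
        action = torch.tensor(bits)
        with torch.no_grad():
            qos = qos_model(state, action).item()
            power = power_model(state, action).item()
        if qos >= qos_threshold and power < best_power:
            best_action, best_power = action, power
    # Fall back to keeping all cells on if no pattern satisfies the QoS constraint.
    return best_action if best_action is not None else torch.ones(N_CELLS)

print(select_action(torch.randn(STATE_DIM)))
```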

Lightweight Boosting Models for User Response Prediction Using Adversarial Validation

  • paper_url: http://arxiv.org/abs/2310.03778
  • repo_url: None
  • paper_authors: Hyeonwoo Kim, Wonsung Lee
  • for: predicting the probability that an app will be installed, the task of the ACM RecSys Challenge 2023 organized by ShareChat.
  • methods: a lightweight solution that uses adversarial validation to eliminate uninformative features, feature engineering for noisy continuous features and high-cardinality categorical features, and Gradient Boosted Decision Trees (GBDT).
  • results: a single LightGBM model without additional ensembling performs well, earning ninth place in the challenge with a final leaderboard score of 6.059065.
    Abstract The ACM RecSys Challenge 2023, organized by ShareChat, aims to predict the probability of the app being installed. This paper describes the lightweight solution to this challenge. We formulate the task as a user response prediction task. For rapid prototyping for the task, we propose a lightweight solution including the following steps: 1) using adversarial validation, we effectively eliminate uninformative features from a dataset; 2) to address noisy continuous features and categorical features with a large number of unique values, we employ feature engineering techniques.; 3) we leverage Gradient Boosted Decision Trees (GBDT) for their exceptional performance and scalability. The experiments show that a single LightGBM model, without additional ensembling, performs quite well. Our team achieved ninth place in the challenge with the final leaderboard score of 6.059065. Code for our approach can be found here: https://github.com/choco9966/recsys-challenge-2023.
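
Adversarial validation, the feature-pruning step in the methods, is easy to reproduce: train a classifier to distinguish training rows from test rows, and treat the features that make this separation easy as shifted or leaky. The AUC cutoff and top-k importance rule below are common heuristics, not necessarily the team's exact settings.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

def adversarial_validation(train_df: pd.DataFrame, test_df: pd.DataFrame,
                           auc_cutoff: float = 0.7, top_k: int = 5):
    """Return the adversarial AUC and the features that most strongly separate
    train rows from test rows.

    A model that can tell the two sets apart is relying on features whose
    distribution shifts between them; dropping the most important such features
    tends to help the downstream response model generalize.
    """
    data = pd.concat([train_df, test_df], ignore_index=True)
    is_test = np.r_[np.zeros(len(train_df)), np.ones(len(test_df))].astype(int)

    clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    preds = cross_val_predict(clf, data, is_test, cv=5, method="predict_proba")[:, 1]
    auc = roc_auc_score(is_test, preds)

    clf.fit(data, is_test)
    importance = pd.Series(clf.feature_importances_, index=data.columns)
    suspects = importance.sort_values(ascending=False).head(top_k)

    # Only prune when train and test are actually distinguishable.
    to_drop = list(suspects.index) if auc > auc_cutoff else []
    return auc, to_drop

# Usage sketch: drop shifted/leaky features before fitting the response model.
# auc, drop_cols = adversarial_validation(X_train, X_test)
# model = lgb.LGBMClassifier().fit(X_train.drop(columns=drop_cols), y_train)
```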

Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations

  • paper_url: http://arxiv.org/abs/2310.03518
  • repo_url: None
  • paper_authors: Jiachi Liu, Liwen Wang, Guanting Dong, Xiaoshuai Song, Zechen Wang, Zhengyang Wang, Shanglin Lei, Jinzheng Zhao, Keqing He, Bo Xiao, Weiran Xu
  • for: evaluating the noise robustness of slot filling models under the input perturbations that arise in real dialogue scenarios.
  • methods: Noise-SF, a noise robustness evaluation dataset for the slot filling task containing five types of human-annotated noise that occur in real applications, together with a framework that incorporates extensive robust-training methods.
  • results: extensive empirical evaluation on Noise-SF shows that baseline models perform poorly under robustness evaluation, while the proposed framework effectively improves model robustness; forward-looking suggestions are offered to fuel research in this direction.
    Abstract In real dialogue scenarios, as there are unknown input noises in the utterances, existing supervised slot filling models often perform poorly in practical applications. Even though there are some studies on noise-robust models, these works are only evaluated on rule-based synthetic datasets, which is limiting, making it difficult to promote the research of noise-robust methods. In this paper, we introduce a noise robustness evaluation dataset named Noise-SF for the slot filling task. The proposed dataset contains five types of human-annotated noise, all of which occur in real application scenarios, and we incorporate extensive robust-training methods for slot filling into the proposed framework. By conducting exhaustive empirical evaluation experiments on Noise-SF, we find that baseline models have poor performance in robustness evaluation, and the proposed framework can effectively improve the robustness of models. Based on the empirical experimental results, we make some forward-looking suggestions to fuel the research in this direction. Our dataset Noise-SF will be released at https://github.com/dongguanting/Noise-SF.

How the level sampling process impacts zero-shot generalisation in deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.03494
  • repo_url: None
  • paper_authors: Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht
  • for: understanding why agents trained via deep reinforcement learning (RL) often fail to generalize zero-shot to new environments, even ones similar to those seen during training.
  • methods: the effect of non-uniform sampling of training levels on zero-shot generalization (ZSG) is studied through two failure modes, overfitting and over-generalization; the mutual information (MI) between the agent's internal representation and the set of training levels is measured, and adaptive sampling strategies that prioritize levels by value loss are compared with unsupervised environment design (UED) methods that generate new levels at training time.
  • results: value-loss-prioritized sampling maintains lower MI than uniform sampling, providing a novel theoretical justification for this class of techniques, whereas UED methods shift the training distribution and over-generalize; the proposed self-supervised environment design (SSED) generates levels with a variational autoencoder, reduces MI while limiting distribution shift, and yields statistically significant ZSG improvements over fixed-set sampling strategies and UED methods.
    Abstract A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
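
The adaptive strategy the paper analyzes, sampling training levels in proportion to how poorly the agent's value function currently fits them, can be sketched with a small prioritized buffer. The rank-based weighting and temperature follow the usual prioritized-level-replay recipe and are assumptions; SSED's VAE-based level generation is not shown here.

```python
import numpy as np

class ValueLossLevelSampler:
    """Sample training levels in proportion to a rank-based score of their most
    recently observed value-prediction loss (high loss => sampled more often)."""

    def __init__(self, n_levels: int, temperature: float = 0.3):
        self.losses = np.zeros(n_levels)              # last observed value loss per level
        self.visited = np.zeros(n_levels, dtype=bool)
        self.temperature = temperature

    def sample(self) -> int:
        if not self.visited.all():
            # Make sure every level is seen at least once before prioritizing.
            return int(np.random.choice(np.flatnonzero(~self.visited)))
        ranks = np.empty_like(self.losses)
        ranks[np.argsort(-self.losses)] = np.arange(1, len(self.losses) + 1)
        weights = (1.0 / ranks) ** (1.0 / self.temperature)
        return int(np.random.choice(len(self.losses), p=weights / weights.sum()))

    def update(self, level_id: int, value_loss: float) -> None:
        self.visited[level_id] = True
        self.losses[level_id] = value_loss

# Training-loop sketch: collect a rollout on the sampled level, then report its value loss.
sampler = ValueLossLevelSampler(n_levels=100)
for step in range(1000):
    level = sampler.sample()
    value_loss = np.random.rand()   # stand-in for |V(s) - return| averaged over the rollout
    sampler.update(level, value_loss)
```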

Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation

  • paper_url: http://arxiv.org/abs/2310.03477
  • repo_url: None
  • paper_authors: François Remy, Pieter Delobelle, Bettina Berendt, Kris Demuynck, Thomas Demeester
  • for: addresses the challenge of training monolingual language models for low and mid-resource languages
  • methods: uses a novel model conversion strategy that adapts high-resource monolingual language models to a new target language
  • results: achieves a new state-of-the-art performance on mid- and low-resource languages, and reduces significantly the amount of data and time required for training state-of-the-art models.
    Abstract Training monolingual language models for low and mid-resource languages is made challenging by limited and often inadequate pretraining data. In this study, we propose a novel model conversion strategy to address this issue, adapting high-resources monolingual language models to a new target language. By generalizing over a word translation dictionary encompassing both the source and target languages, we map tokens from the target tokenizer to semantically similar tokens from the source language tokenizer. This one-to-many token mapping improves tremendously the initialization of the embedding table for the target language. We conduct experiments to convert high-resource models to mid- and low-resource languages, namely Dutch and Frisian. These converted models achieve a new state-of-the-art performance on these languages across all sorts of downstream tasks. By reducing significantly the amount of data and time required for training state-of-the-art models, our novel model conversion strategy has the potential to benefit many languages worldwide.
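
The embedding-initialization trick can be sketched in a few lines: for each token in the target-language tokenizer, look up semantically similar source-language tokens through a word translation dictionary and initialize the target embedding from theirs. The mean-of-embeddings rule and the random fallback for unmapped tokens are assumptions, and the toy Dutch tokens are purely illustrative.

```python
import torch

def init_target_embeddings(source_emb: torch.Tensor,
                           source_vocab: dict,    # token -> row in source_emb
                           target_vocab: dict,    # token -> row in the new table
                           token_mapping: dict) -> torch.Tensor:
    """Build an embedding table for the target-language tokenizer.

    `token_mapping` maps each target token to a list of source tokens judged
    semantically similar (derived from a word translation dictionary). Each mapped
    target embedding is initialized as the mean of its source tokens' embeddings;
    unmapped tokens fall back to random initialization.
    """
    dim = source_emb.shape[1]
    target_emb = torch.empty(len(target_vocab), dim).normal_(std=0.02)  # fallback init
    for tgt_token, row in target_vocab.items():
        src_tokens = [t for t in token_mapping.get(tgt_token, []) if t in source_vocab]
        if src_tokens:
            rows = torch.tensor([source_vocab[t] for t in src_tokens])
            target_emb[row] = source_emb[rows].mean(dim=0)
    return target_emb

# Toy example: adapt a 3-token "source" model to a 3-token "target" vocabulary.
source_emb = torch.randn(3, 8)
source_vocab = {"cat": 0, "dog": 1, "house": 2}
target_vocab = {"kat": 0, "hond": 1, "huis": 2}                  # hypothetical Dutch tokens
mapping = {"kat": ["cat"], "hond": ["dog"], "huis": ["house"]}   # from a translation dictionary
new_table = init_target_embeddings(source_emb, source_vocab, target_vocab, mapping)
print(new_table.shape)
```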

Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties

  • paper_url: http://arxiv.org/abs/2310.04463
  • repo_url: None
  • paper_authors: Siyuan Guo, Jihong Guan, Shuigeng Zhou
  • for: generating molecules with multiple desirable properties, going beyond existing models that optimize only basic properties such as validity and uniqueness or a single property like QED or PlogP.
  • methods: the diffusion model framework is extended with diffusion on two structural levels, whole molecules and molecular fragments, using a novel electronic-effect-based fragmentation method; molecular validity is optimized with an energy-guidance function, and a multi-objective mechanism optimizes several molecular properties simultaneously.
  • results: on the QM9 and ZINC250k benchmarks, the generated molecules show better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those from current state-of-the-art models.
    Abstract In the past decade, Artificial Intelligence driven drug design and discovery has been a hot research topic, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models pursue only the basic properties like validity and uniqueness of the generated molecules, a few go further to explicitly optimize one single important molecular property (e.g. QED or PlogP), which makes most generated molecules little usefulness in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and molecular properties are usually determined by some substructures (e.g. pharmacophores), we propose to perform diffusion on two structural levels: molecules and molecular fragments respectively, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel electronic effect based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets QM9 and ZINC250k show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fr\'echet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.

A Quantitatively Interpretable Model for Alzheimer’s Disease Prediction Using Deep Counterfactuals

  • paper_url: http://arxiv.org/abs/2310.03457
  • repo_url: None
  • paper_authors: Kwanseok Oh, Da-Woon Heo, Ahmad Wisnu Mulyadi, Wonsik Jung, Eunsong Kang, Kun Ho Lee, Heung-Il Suk
  • for: providing a more interpretable yet effective approach to predicting Alzheimer's disease (AD) using counterfactual reasoning and gray matter density maps.
  • methods: a framework that synthesizes counterfactual-labeled structural MRIs, transforms them into gray matter density maps to measure volumetric changes over parcellated regions of interest (ROIs), and uses a lightweight linear classifier to boost predictive performance and support quantitative interpretation.
  • results: the framework produces an "AD-relatedness index" for each ROI and offers an intuitive understanding of brain status for individual patients and patient groups with respect to AD progression, with predictive performance comparable to deep learning methods.
    Abstract Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explanatory maps based on visual inspection alone are insufficient unless we intuitively demonstrate their medical or neuroscientific validity via quantitative features. In this study, we synthesize the counterfactual-labeled structural MRIs using our proposed framework and transform it into a gray matter density map to measure its volumetric changes over the parcellated region of interest (ROI). We also devised a lightweight linear classifier to boost the effectiveness of constructed ROIs, promoted quantitative interpretation, and achieved comparable predictive performance to DL methods. Throughout this, our framework produces an ``AD-relatedness index'' for each ROI and offers an intuitive understanding of brain status for an individual patient and across patient groups with respect to AD progression.
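
The quantitative-interpretation stage can be sketched as follows: summarize the counterfactual-induced change in gray matter density per region of interest (ROI), fit a lightweight linear classifier on these ROI features, and read each coefficient's magnitude as an "AD-relatedness index". The synthetic volumes, atlas handling, and choice of logistic regression below are simplified assumptions about the pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def roi_density_change(real_map, counterfactual_map, atlas, n_rois):
    """Mean gray matter density change per ROI between a subject's density map and
    its counterfactual map. `atlas` assigns an ROI label (1..n_rois) to every voxel."""
    diff = counterfactual_map - real_map
    return np.array([diff[atlas == r].mean() for r in range(1, n_rois + 1)])

# Toy data: 40 subjects, a 10x10x10 density volume, 8 ROIs, binary AD label.
rng = np.random.default_rng(0)
n_subj, shape, n_rois = 40, (10, 10, 10), 8
atlas = rng.integers(1, n_rois + 1, size=shape)
labels = rng.integers(0, 2, size=n_subj)

features = np.stack([
    roi_density_change(rng.normal(size=shape),
                       rng.normal(size=shape) + 0.3 * labels[i],  # crude signal for the sketch
                       atlas, n_rois)
    for i in range(n_subj)
])

# Lightweight linear classifier on ROI-wise changes; the magnitude of each coefficient
# serves as an AD-relatedness index for the corresponding ROI.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
ad_relatedness = np.abs(clf.coef_).ravel()
for roi, score in enumerate(ad_relatedness, start=1):
    print(f"ROI {roi}: AD-relatedness index = {score:.3f}")
```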

Pre-Training and Fine-Tuning Generative Flow Networks

  • paper_url: http://arxiv.org/abs/2310.03419
  • repo_url: None
  • paper_authors: Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio
  • for: exploring reward-free pre-training of Generative Flow Networks (GFlowNets) so they can adapt efficiently to downstream tasks and discover more modes.
  • methods: training is framed as a self-supervised problem in which an outcome-conditioned GFlowNet (OC-GFN) learns to reach any targeted outcome, akin to goal-conditioned policies in reinforcement learning; for downstream task-specific rewards, an amortized predictor approximates the otherwise intractable marginalization over outcomes, enabling efficient fine-tuning.
  • results: extensive experiments show that the pre-trained OC-GFN adapts swiftly to downstream tasks and discovers modes more efficiently, without requiring knowledge of the downstream reward during pre-training.
    Abstract Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge about how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.

Domain Generalization for Medical Image Analysis: A Survey

  • paper_url: http://arxiv.org/abs/2310.08598
  • repo_url: None
  • paper_authors: Jee Seok Yoon, Kwanseok Oh, Yooseung Shin, Maciej A. Mazurowski, Heung-Il Suk
  • for: reviewing how deep learning (DL) models for medical image analysis (MedIA) can be made robust to the distribution shifts between training data and real-world deployment data.
  • methods: a comprehensive survey of domain generalization studies tailored to MedIA, categorizing methods into data-level, feature-level, model-level, and analysis-level approaches and relating them to the full MedIA workflow from data acquisition to model prediction and analysis.
  • results: an overview of benchmark datasets and applications used to evaluate these approaches, an analysis of the strengths and weaknesses of various methods, and a discussion of future research opportunities.
    Abstract Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have made significant contributions to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing for generalization under the distributional gap between training and testing samples, known as a distribution shift problem. Researchers have dedicated their efforts to developing various DL methods to adapt and perform robustly on unknown and out-of-distribution data distributions. This paper comprehensively reviews domain generalization studies specifically tailored for MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider the operational implications on the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods. We show how those methods can be used in various stages of the MedIA workflow with DL equipped from data acquisition to model prediction and analysis. Furthermore, we include benchmark datasets and applications used to evaluate these approaches and analyze the strengths and weaknesses of various methods, unveiling future research opportunities.

GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.03399
  • repo_url: https://github.com/dfdazac/grapes
  • paper_authors: Taraneh Younesian, Thiviyan Thanapalasingam, Emile van Krieken, Daniel Daza, Peter Bloem
  • for: reducing the memory cost of training graph neural networks (GNNs) through adaptive graph sampling that generalizes across graph structures and tasks.
  • methods: GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objective, identifying sets of influential nodes for training a GNN classifier.
  • results: across several small- and large-scale graph benchmarks, GRAPES maintains high accuracy even with small sample sizes and therefore scales to very large graphs; code is publicly available.
    Abstract Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
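GRAPES learns its node-inclusion probabilities with a GFlowNet; the fragment below is only a minimal PyTorch sketch of the adaptive-sampling idea (learnable per-node scores, a small sampled node set per step, and a score-function surrogate driven by the classifier loss standing in for the GFlowNet objective). All names and shapes are illustrative, not taken from the GRAPES repository.

```python
import torch
import torch.nn as nn

class AdaptiveNodeSampler(nn.Module):
    """Learnable per-node inclusion scores used to draw a small set of
    neighbours for each GNN training step (illustrative only)."""

    def __init__(self, num_nodes: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_nodes))

    def sample(self, candidates: torch.Tensor, budget: int):
        # Unnormalised weights; torch.multinomial normalises them internally.
        weights = torch.sigmoid(self.logits[candidates])
        idx = torch.multinomial(weights, min(budget, candidates.numel()),
                                replacement=False)
        picked = candidates[idx]
        # Surrogate log-score of the picked nodes (not the exact
        # without-replacement likelihood), usable in a REINFORCE-style update.
        log_score = torch.log(weights[idx] + 1e-9).sum()
        return picked, log_score

# Illustrative use inside a training step (pseudo-usage, shapes omitted):
#   picked, log_score = sampler.sample(candidate_neighbours, budget=256)
#   loss = classifier_loss_on_subgraph(picked)
#   (loss + loss.detach() * log_score).backward()   # loss also rewards the sampler
```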

Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.03392
  • repo_url: None
  • paper_authors: Tita A. Bach, Jenny K. Kristiansen, Aleksandar Babic, Alon Jacovi
  • for: This study examines how human-AI interaction (HAII) is realized in safety-critical industries, with the aim of improving the safety and reliability of those industries.
  • methods: The study is a systematic literature review of current HAII research and proposes best practices for conducting research in this area.
  • results: The review finds that HAII research is fragmented and inconsistent, with multiple terms and definitions in use and a wide variety of evaluation methods. It identifies five factors that influence HAII: user characteristics and background (e.g., personality and perceptions), AI interface and features (e.g., interactive UI design), AI output (e.g., accuracy and actionable recommendations), explainability and interpretability (e.g., level of detail and user understanding), and usage of AI (e.g., heterogeneity of environments and user needs).
    Abstract Ensuring quality human-AI interaction (HAII) in safety-critical industries is essential. Failure to do so can lead to catastrophic and deadly consequences. Despite this urgency, what little research there is on HAII is fragmented and inconsistent. We present here a survey of that literature and recommendations for research best practices that will improve the field. We divided our investigation into the following research areas: (1) terms used to describe HAII, (2) primary roles of AI-enabled systems, (3) factors that influence HAII, and (4) how HAII is measured. Additionally, we described the capabilities and maturity of the AI-enabled systems used in safety-critical industries discussed in these articles. We found that no single term is used across the literature to describe HAII and some terms have multiple meanings. According to our literature, five factors influence HAII: user characteristics and background (e.g., user personality, perceptions), AI interface and features (e.g., interactive UI design), AI output (e.g., accuracy, actionable recommendations), explainability and interpretability (e.g., level of detail, user understanding), and usage of AI (e.g., heterogeneity of environments and user needs). HAII is most commonly measured with user-related subjective metrics (e.g., user perception, trust, and attitudes), and AI-assisted decision-making is the most common primary role of AI-enabled systems. Based on this review, we conclude that there are substantial research gaps in HAII. Researchers and developers need to codify HAII terminology, involve users throughout the AI lifecycle (especially during development), and tailor HAII in safety-critical industries to the users and environments.

Procedural Text Mining with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03376
  • repo_url: https://github.com/jd-coderepos/proc-tm
  • paper_authors: Anisa Rula, Jennifer D’Souza
  • for: This work investigates using large language models in zero-shot and in-context learning settings to extract procedures from unstructured PDF text in an incremental question-answering fashion.
  • methods: The study uses the state-of-the-art GPT-4 (Generative Pre-trained Transformer 4) model together with two in-context learning variants: one based on an ontology with definitions of procedures and steps, and one based on a small number of few-shot examples (a prompt-assembly sketch follows the abstract below).
  • results: The findings show that these in-context learning customisations can effectively address the difficulty of collecting sufficient training data for deep learning-based procedure extraction, and demonstrate the promise of the approach.
    Abstract Recent advancements in the field of Natural Language Processing, particularly the development of large-scale language models that are pretrained on vast amounts of knowledge, are creating novel opportunities within the realm of Knowledge Engineering. In this paper, we investigate the usage of large language models (LLMs) in both zero-shot and in-context learning settings to tackle the problem of extracting procedures from unstructured PDF text in an incremental question-answering fashion. In particular, we leverage the current state-of-the-art GPT-4 (Generative Pre-trained Transformer 4) model, accompanied by two variations of in-context learning that involve an ontology with definitions of procedures and steps and a limited number of samples of few-shot learning. The findings highlight both the promise of this approach and the value of the in-context learning customisations. These modifications have the potential to significantly address the challenge of obtaining sufficient training data, a hurdle often encountered in deep learning-based Natural Language Processing techniques for procedure extraction.
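The paper frames extraction as incremental question answering with GPT-4, optionally conditioning on an ontology and a few examples. The sketch below only shows how such a prompt might be assembled; the ontology wording, the prompt text, and the injected `ask_llm` callable are assumptions, not the authors' implementation.

```python
from typing import Callable, List, Optional

ONTOLOGY_HINT = (
    "A PROCEDURE is a named, goal-oriented activity; a STEP is a single "
    "atomic action inside a procedure, listed in execution order."
)

def extract_procedures(document_text: str,
                       ask_llm: Callable[[str], str],
                       few_shot_examples: Optional[List[str]] = None) -> str:
    """One incremental question-answering turn over a chunk of PDF text.

    `ask_llm` is any completion function (e.g. a GPT-4 wrapper); it is
    injected so the sketch stays independent of a specific API.
    """
    examples = "\n\n".join(few_shot_examples or [])
    prompt = (
        f"{ONTOLOGY_HINT}\n\n"
        f"{examples}\n\n"
        "Question: list every procedure in the text below and, for each one, "
        "its steps in order. Answer 'none' if no procedure is present.\n\n"
        f"Text:\n{document_text}"
    )
    return ask_llm(prompt)
```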

Design Optimizer for Planar Soft-Growing Robot Manipulators

  • paper_url: http://arxiv.org/abs/2310.03374
  • repo_url: None
  • paper_authors: Fabio Stroppa
  • for: The paper addresses the design and optimization of soft-growing robots for specific manipulation tasks, such as exploration of delicate/dangerous environments, manipulation of items, or assistance in domestic environments.
  • methods: The paper presents a novel approach for design optimization of soft-growing robots, modeling the design process as a multi-objective optimization problem and using population-based optimization algorithms, specifically evolutionary algorithms, to transform the problem into a single-objective one. The method also incorporates a novel rank-partitioning algorithm and obstacle avoidance within the optimizer operators.
  • results: The proposed method is tested on different tasks and shows significant performance in solving the problem, outperforming existing methods in terms of precision, resource consumption, and run time.
    Abstract Soft-growing robots are innovative devices that feature plant-inspired growth to navigate environments. Thanks to their embodied intelligence of adapting to their surroundings and the latest innovation in actuation and manufacturing, it is possible to employ them for specific manipulation tasks. The applications of these devices include exploration of delicate/dangerous environments, manipulation of items, or assistance in domestic environments. This work presents a novel approach for design optimization of soft-growing robots, which will be used prior to manufacturing to suggest engineers -- or robot designer enthusiasts -- the optimal dimension of the robot to be built for solving a specific task. I modeled the design process as a multi-objective optimization problem, in which I optimize the kinematic chain of a soft manipulator to reach targets and avoid unnecessary overuse of material and resources. The method exploits the advantages of population-based optimization algorithms, in particular evolutionary algorithms, to transform the problem from multi-objective into a single-objective thanks to an efficient mathematical formulation, the novel rank-partitioning algorithm, and obstacle avoidance integrated within the optimizer operators. I tested the proposed method on different tasks to access its optimality, which showed significant performance in solving the problem. Finally, comparative experiments showed that the proposed method works better than the one existing in the literature in terms of precision, resource consumption, and run time.

AI-based automated active learning for discovery of hidden dynamic processes: A use case in light microscopy

  • paper_url: http://arxiv.org/abs/2310.04461
  • repo_url: None
  • paper_authors: Nils Friederich, Angelo Yamachui Sitcheu, Oliver Neumann, Süheyla Eroğlu-Kayıkçı, Roshan Prizak, Lennart Hilbert, Ralf Mikut
  • for: This work proposes two new methods to improve the efficiency of observing dynamic processes in biomedical experiments.
  • methods: The first, the AI-based Encoded Dynamic Process (EDP), predicts pseudo-time values of a dynamic process from single still images. The second, the MLOps-based Experiment Automation Pipeline for Dynamic Processes (EAPDP), uses the knowledge extracted by EDP to schedule acquisitions efficiently.
  • results: In a first experiment, the pre-trained state-of-the-art object segmentation method Contour Proposal Networks (CPN) is shown to work reliably as an EAPDP module for extracting the relevant objects from acquired three-dimensional image stacks.
    Abstract In the biomedical environment, experiments assessing dynamic processes are primarily performed by a human acquisition supervisor. Contemporary implementations of such experiments frequently aim to acquire a maximum number of relevant events from sometimes several hundred parallel, non-synchronous processes. Since in some high-throughput experiments, only one or a few instances of a given process can be observed simultaneously, a strategy for planning and executing an efficient acquisition paradigm is essential. To address this problem, we present two new methods in this paper. The first method, Encoded Dynamic Process (EDP), is Artificial Intelligence (AI)-based and represents dynamic processes so as to allow prediction of pseudo-time values from single still images. Second, with Experiment Automation Pipeline for Dynamic Processes (EAPDP), we present a Machine Learning Operations (MLOps)-based pipeline that uses the extracted knowledge from EDP to efficiently schedule acquisition in biomedical experiments for dynamic processes in practice. In a first experiment, we show that the pre-trained State-Of-The- Art (SOTA) object segmentation method Contour Proposal Networks (CPN) works reliably as a module of EAPDP to extract the relevant object for EDP from the acquired three-dimensional image stack.

Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet

  • paper_url: http://arxiv.org/abs/2310.03365
  • repo_url: None
  • paper_authors: Hossein Jafari, Karim Faez, Hamidreza Amindavar
  • for: The goal is to improve the accuracy of computer-aided diagnosis (CAD) systems for identifying lung nodules from computed tomography (CT) scans.
  • methods: The study proposes a model that combines the strengths of convolutional neural networks and vision transformers, treating each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series formulation.
  • results: Validated with 10-fold cross-validation on the Lung Nodule Analysis 2016 dataset, the proposed network achieves an average sensitivity of 97.84% and a competition performance metric (CPM) of 96.0% with few parameters, and comparison with state-of-the-art lung nodule detection techniques demonstrates its accuracy.
    Abstract Lung cancer is highly lethal, emphasizing the critical need for early detection. However, identifying lung nodules poses significant challenges for radiologists, who rely heavily on their expertise for accurate diagnosis. To address this issue, computer-aided diagnosis (CAD) systems based on machine learning techniques have emerged to assist doctors in identifying lung nodules from computed tomography (CT) scans. Unfortunately, existing networks in this domain often suffer from computational complexity, leading to high rates of false negatives and false positives, limiting their effectiveness. To address these challenges, we present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application. The primary objective of our work is to overcome hardware limitations during model training, allowing for efficient processing of 2D data while utilizing inter-slice information for accurate identification based on 3D image context. We validated the proposed network by applying a 10-fold cross-validation technique to the publicly available Lung Nodule Analysis 2016 dataset. Our proposed architecture achieves an average sensitivity criterion of 97.84% and a competition performance metrics (CPM) of 96.0% with few parameters. Comparative analysis with state-of-the-art advancements in lung nodule identification demonstrates the significant accuracy achieved by our proposed model.

Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention

  • paper_url: http://arxiv.org/abs/2310.03358
  • repo_url: https://github.com/changzhang777/ancra
  • paper_authors: Nuoyan Zhou, Decheng Liu, Dawei Zhou, Xinbo Gao, Nannan Wang
  • for: Improving the robustness of deep neural networks and their resistance to adversarial attacks.
  • methods: The paper proposes a generic framework for adversarial training (AT) that obtains robust representations through an asymmetric negative contrast and a reverse attention mechanism (a simplified loss sketch follows the abstract below).
  • results: Empirical evaluations on three benchmark datasets show the method greatly advances the robustness of AT and achieves state-of-the-art performance.
    Abstract Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find AT omits to learning robust features, resulting in poor performance of adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the feature of natural examples keeps away from that of other classes; (2) $\bf{alignment}$: the feature of natural and corresponding adversarial examples is close to each other. These motivate us to propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities, to push away examples of different classes in the feature space. Moreover, we propose to weight feature by parameters of the linear classifier as the reverse attention, to obtain class-aware feature and pull close the feature of the same class. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at .
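As a rough illustration of the two ingredients named in the abstract (exclusion and alignment), here is a simplified PyTorch loss: features of a natural example and its adversarial counterpart are pulled together after class-aware (reverse-attention) weighting, and features of different-class samples are pushed apart with a probability-weighted negative contrast. This is a hedged approximation of the idea, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def robust_representation_loss(feat_nat, feat_adv, logits_nat, labels, classifier_weight):
    """Simplified sketch of the two ingredients described in the abstract.

    feat_nat / feat_adv : [B, D] features of natural and adversarial examples
    logits_nat          : [B, C] predictions on natural examples
    classifier_weight   : [C, D] weight matrix of the linear classifier
    """
    # (1) "alignment": class-aware (reverse-attention) features of a natural
    # example and its adversarial counterpart should stay close.
    attn = classifier_weight[labels]                                 # [B, D]
    align = F.mse_loss(feat_nat * attn, feat_adv * attn)

    # (2) "exclusion": push a sample's feature away from features of
    # other-class samples in the batch (an asymmetric negative contrast).
    sim = F.cosine_similarity(feat_nat.unsqueeze(1), feat_nat.unsqueeze(0), dim=-1)  # [B, B]
    diff_class = labels.unsqueeze(1) != labels.unsqueeze(0)
    confidence = F.softmax(logits_nat, dim=-1).gather(1, labels.unsqueeze(1))         # [B, 1]
    exclusion = (confidence * sim * diff_class).sum() / diff_class.sum().clamp(min=1)

    return align + exclusion
```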

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

  • paper_url: http://arxiv.org/abs/2310.03354
  • repo_url: None
  • paper_authors: Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
  • for: The paper addresses multi-agent reinforcement learning in competitive games, particularly mixed cooperative-competitive games where agents on the same team need to cooperate with each other.
  • methods: The paper combines self-play (SP) and Policy-Space Response Oracles (PSRO) into a single algorithm, Fictitious Cross-Play (FXP), inheriting the benefits of both (a training-loop skeleton follows the abstract below).
  • results: Experiments show that FXP converges to global Nash equilibria in matrix games where SP methods fail, achieves higher Elo ratings and lower exploitability than baselines in a gridworld domain, and defeats state-of-the-art models with over a 94% win rate in a more challenging football game.
    Abstract Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.
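A skeleton of the training schedule described in the abstract is sketched below: the main policy mixes fictitious self-play with cross-play against a counter population, while the counters are trained as best responses to past versions of the main policy. The RL update itself, opponent selection details, and environment interface are abstracted behind injected callables and are assumptions rather than the paper's implementation.

```python
import random

def fxp_training_loop(main_policy, counter_population, env,
                      update_policy, snapshot,
                      num_iters=1000, cross_play_prob=0.5):
    """Skeleton of Fictitious Cross-Play (hedged sketch, details abstracted).

    update_policy(policy, opponent, env): one RL update of `policy` against
    `opponent`; snapshot(policy): a frozen copy for the historical pool.
    """
    past_versions = [snapshot(main_policy)]
    for _ in range(num_iters):
        # Main policy: fictitious self-play mixed with cross-play vs. counters.
        if counter_population and random.random() < cross_play_prob:
            opponent = random.choice(counter_population)
        else:
            opponent = random.choice(past_versions)       # fictitious self-play
        update_policy(main_policy, opponent, env)

        # Counter policies: trained as best responses to past main policies.
        for counter in counter_population:
            update_policy(counter, random.choice(past_versions), env)

        past_versions.append(snapshot(main_policy))
    return main_policy
```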

Parking Spot Classification based on surround view camera system

  • paper_url: http://arxiv.org/abs/2310.12997
  • repo_url: None
  • paper_authors: Andy Xiao, Deep Doshi, Lihao Wang, Harsha Gorantla, Thomas Heitzmann, Peter Groth
  • for: This work targets parking spot detection and classification for automated valet parking scenarios, improving the accuracy and usefulness of automated parking.
  • methods: The study uses a surround-view fisheye camera system and adapts the YOLOv4 object detection network with a novel polygon bounding box model suited to variously shaped parking spaces, such as slanted slots.
  • results: The results show the proposed classification approach effectively distinguishes between regular, electric vehicle, and handicap parking spots.
    Abstract Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360{\deg} around the vehicle capturing the entire near-field region. Based on surround view cameras, there has been much research on parking slot detection with main focus on the occupancy status in recent years, but little work on whether the free slot is compatible with the mission of the ego vehicle or not. For instance, some spots are handicap or electric vehicles accessible only. In this paper, we tackle parking spot classification based on the surround view camera system. We adapt the object detection neural network YOLOv4 with a novel polygon bounding box model that is well-suited for various shaped parking spaces, such as slanted parking slots. To the best of our knowledge, we present the first detailed study on parking spot detection and classification on fisheye cameras for auto valet parking scenarios. The results prove that our proposed classification approach is effective to distinguish between regular, electric vehicle, and handicap parking spots.

Deep Geometric Learning with Monotonicity Constraints for Alzheimer’s Disease Progression

  • paper_url: http://arxiv.org/abs/2310.03353
  • repo_url: None
  • paper_authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk
  • for: Predicting the progression of Alzheimer's disease (AD) to support clinical diagnosis and treatment.
  • methods: Modeling AD progression from structural MRI data while accounting for temporal variability, incomplete observations, and temporal geometric characteristics.
  • results: The paper proposes a novel geometric learning approach that combines a topological space shift, ODE-RGRU, and trajectory estimation to model longitudinal data sequences, together with a training algorithm that integrates manifold mapping with monotonicity constraints to reflect the irreversibility of measurement transitions. Its effectiveness is validated by predicting clinical labels and cognitive scores over time in regular and irregular settings.
    Abstract Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic, especially within the clinical facet. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.

Tractable Bounding of Counterfactual Queries by Knowledge Compilation

  • paper_url: http://arxiv.org/abs/2310.03352
  • repo_url: https://github.com/idsia/credici
  • paper_authors: David Huber, Yizuo Chen, Alessandro Antonucci, Adnan Darwiche, Marco Zaffalon
  • for: The paper studies the problem of bounding partially identifiable queries, such as counterfactuals, in Pearlian structural causal models.
  • methods: It uses a recently proposed iterated EM scheme that yields an inner approximation of the bounds by sampling the initialisation parameters. The method requires multiple (Bayesian network) queries over models that share the same structural equations and topology but differ in their exogenous probabilities, so compiling the underlying model to an arithmetic circuit pays off: a single symbolic knowledge compilation yields a circuit whose symbolic parameters are replaced by their actual values for each query, and the bound computation can additionally be parallelised.
  • results: Experiments show that the bounds can be computed quickly this way, with up to an order-of-magnitude speed-up over standard Bayesian network inference.
    Abstract We discuss the problem of bounding partially identifiable queries, such as counterfactuals, in Pearlian structural causal models. A recently proposed iterated EM scheme yields an inner approximation of those bounds by sampling the initialisation parameters. Such a method requires multiple (Bayesian network) queries over models sharing the same structural equations and topology, but different exogenous probabilities. This setup makes a compilation of the underlying model to an arithmetic circuit advantageous, thus inducing a sizeable inferential speed-up. We show how a single symbolic knowledge compilation allows us to obtain the circuit structure with symbolic parameters to be replaced by their actual values when computing the different queries. We also discuss parallelisation techniques to further speed up the bound computation. Experiments against standard Bayesian network inference show clear computational advantages with up to an order of magnitude of speed-up.

Tuning In to Neural Encoding: Linking Human Brain and Artificial Supervised Representations of Language

  • paper_url: http://arxiv.org/abs/2310.04460
  • repo_url: None
  • paper_authors: Jingyuan Sun, Xiaohan Zhang, Marie-Francine Moens
  • for: investigate how task tuning influences a pretrained Transformer for neural encoding and which tasks lead to the best encoding performance.
  • methods: generate supervised representations on eight Natural Language Understanding (NLU) tasks using prompt-tuning, a technique that is seldom explored in neural encoding for language.
  • results: demonstrate that prompt-tuning yields representations that better predict neural responses to Chinese stimuli than traditional fine-tuning on four tasks, and discover that tasks that require a fine-grained processing of concepts and entities lead to representations that are most predictive of brain activation patterns.
    Abstract To understand the algorithm that supports the human brain's language representation, previous research has attempted to predict neural responses to linguistic stimuli using embeddings generated by artificial neural networks (ANNs), a process known as neural encoding. However, most of these studies have focused on probing neural representations of Germanic languages, such as English, with unsupervised ANNs. In this paper, we propose to bridge the gap between human brain and supervised ANN representations of the Chinese language. Specifically, we investigate how task tuning influences a pretained Transformer for neural encoding and which tasks lead to the best encoding performances. We generate supervised representations on eight Natural Language Understanding (NLU) tasks using prompt-tuning, a technique that is seldom explored in neural encoding for language. We demonstrate that prompt-tuning yields representations that better predict neural responses to Chinese stimuli than traditional fine-tuning on four tasks. Furthermore, we discover that tasks that require a fine-grained processing of concepts and entities lead to representations that are most predictive of brain activation patterns. Additionally, we reveal that the proportion of tuned parameters highly influences the neural encoding performance of fine-tuned models. Overall, our experimental findings could help us better understand the relationship between supervised artificial and brain language representations.
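Prompt-tuning here means learning a short sequence of soft prompt vectors while the backbone stays frozen. The snippet below is a generic PyTorch sketch of that setup (the encoder is any module that accepts input embeddings); it is not the authors' code, and the resulting hidden states would still need to be regressed onto brain responses for the neural-encoding evaluation.

```python
import torch
import torch.nn as nn

class SoftPromptEncoder(nn.Module):
    """Prompt-tuning sketch: only the soft prompt is trained; the backbone
    encoder (any module that consumes input embeddings) stays frozen."""

    def __init__(self, encoder: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)                    # freeze the backbone
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: [batch, seq_len, embed_dim]
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, token_embeds], dim=1))
```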

Zero-shot Learning of Drug Response Prediction for Preclinical Drug Screening

  • paper_url: http://arxiv.org/abs/2310.12996
  • repo_url: https://github.com/drugd/msda
  • paper_authors: Kun Li, Yong Luo, Xiantao Cai, Wenbin Hu, Bo Du
  • for: The paper proposes a zero-shot learning solution for the drug response prediction (DRP) task in preclinical drug screening, targeting novel compounds with unknown responses.
  • methods: The approach is a Multi-branch Multi-Source Domain Adaptation test enhancement plug-in (MSDA) that can be integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to improve real-time predictions for unlabeled compounds.
  • results: Experiments on the GDSCv2 and CellMiner datasets show that MSDA efficiently predicts responses for novel compounds, yielding a general performance improvement of 5-10% in the preclinical drug screening phase, thereby accelerating drug discovery and improving candidate assessment.
    Abstract Conventional deep learning methods typically employ supervised learning for drug response prediction (DRP). This entails dependence on labeled response data from drugs for model training. However, practical applications in the preclinical drug screening phase demand that DRP models predict responses for novel compounds, often with unknown drug responses. This presents a challenge, rendering supervised deep learning methods unsuitable for such scenarios. In this paper, we propose a zero-shot learning solution for the DRP task in preclinical drug screening. Specifically, we propose a Multi-branch Multi-Source Domain Adaptation Test Enhancement Plug-in, called MSDA. MSDA can be seamlessly integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to enhance real-time predictions of unlabeled compounds. We conducted experiments using the GDSCv2 and CellMiner datasets. The results demonstrate that MSDA efficiently predicts drug responses for novel compounds, leading to a general performance improvement of 5-10\% in the preclinical drug screening phase. The significance of this solution resides in its potential to accelerate the drug discovery process, improve drug candidate assessment, and facilitate the success of drug discovery.

Learning Concept-Based Visual Causal Transition and Symbolic Reasoning for Visual Planning

  • paper_url: http://arxiv.org/abs/2310.03325
  • repo_url: None
  • paper_authors: Yilue Qian, Peiyu Yu, Ying Nian Wu, Wei Wang, Lifeng Fan
  • for: The paper proposes an interpretable and generalizable visual planning framework to help agents perform daily tasks in complex environments.
  • methods: The framework has three main components: a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, symbol abstraction and reasoning that performs task planning via the self-learned symbols, and a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions.
  • results: Extensive experiments on a large-scale visual planning dataset (CCTP) built on AI2-THOR demonstrate superior visual task-planning performance, and the framework generalizes to unseen task trajectories and unseen object categories.
    Abstract Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories and unseen object categories.

Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction

  • paper_url: http://arxiv.org/abs/2310.03314
  • repo_url: None
  • paper_authors: Aadi Kothari, Tony Tohme, Xiaotong Zhang, Kamal Youcef-Toumi
  • for: The paper proposes a human-motion prediction method that incorporates human joint constraints and scene constraints, improving the efficiency and safety of human-robot collaboration.
  • methods: The method uses a Gaussian Process Regression (GPR) model with human joint and scene constraints integrated directly into it, combined with an online context-aware constraints model, so that physical and environmental constraints are respected when predicting motion over a time horizon (a toy GPR sketch follows the abstract below).
  • results: Simulation and experimental results, including a human arm kinematic model and a collaborative setup with a UR5 robot arm, show considerable improvements in the Gaussian Process framework when these constraints are explicitly considered, and demonstrate real-time capability.
    Abstract Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
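As a toy illustration of GPR-based motion prediction with uncertainty, the snippet below fits scikit-learn's GaussianProcessRegressor to a synthetic joint-angle trajectory and predicts a short horizon ahead. The paper builds the joint and scene constraints directly into the model; the sketch merely clips the predicted mean to assumed joint limits afterwards, which is a simplification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data: a joint-angle trajectory observed over time (seconds -> radians).
t_train = np.linspace(0.0, 2.0, 40).reshape(-1, 1)
q_train = 0.8 * np.sin(2.0 * t_train).ravel() + 0.02 * np.random.randn(40)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-3),
                               normalize_y=True)
gpr.fit(t_train, q_train)

# Predict a short horizon ahead, with uncertainty.
t_future = np.linspace(2.0, 2.5, 10).reshape(-1, 1)
q_mean, q_std = gpr.predict(t_future, return_std=True)

# Crude stand-in for the paper's constraint handling: clip to joint limits.
JOINT_MIN, JOINT_MAX = -1.2, 1.2          # assumed limits (radians)
q_mean_constrained = np.clip(q_mean, JOINT_MIN, JOINT_MAX)
```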

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

  • paper_url: http://arxiv.org/abs/2310.03309
  • repo_url: None
  • paper_authors: Shaotian Yan, Chen Shen, Junjie Liu, Jieping Ye
  • for: Improving the deductive reasoning ability of large language models (LLMs).
  • methods: The paper proposes Concise and Organized Perception (COP), which carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy, then presents it to the LLM in a more organized form suited to the model's inference process (a two-step prompting sketch follows the abstract below).
  • results: Experimental results on three popular deductive benchmarks (ProofWriter, PrOntoQA, and PrOntoQA-OOD) show that COP significantly outperforms previous state-of-the-art methods.
    Abstract Exploiting large language models (LLMs) to tackle deductive reasoning has garnered growing attention. It still remains highly challenging to achieve satisfactory results in complex deductive problems, characterized by plenty of premises (i.e., facts or rules) entailing intricate relationships among entities and requiring multi-hop reasoning. One intuitive solution is to decompose the original task into smaller sub-tasks, and then chain the multiple casual reasoning steps together in a forward (e.g., Selection-Inference) or backward (e.g., LAMBADA) direction. However, these techniques inevitably necessitate a large number of overall stages, leading to computationally expensive operations and a higher possibility of making misleading steps. In addition to stage-by-stage decomposition, we draw inspiration from another aspect of human problem-solving. Humans tend to distill the most relevant information and organize their thoughts systematically (e.g., creating mind maps), which assists them in answering questions or drawing conclusions precisely and quickly. In light of this, we propose a novel reasoning approach named Concise and Organized Perception (COP). COP carefully analyzes the given statements to efficiently identify the most pertinent information while eliminating redundancy. It then prompts the LLMs in a more organized form that adapts to the model's inference process. By perceiving concise and organized proofs, the deductive reasoning abilities of LLMs can be better elicited, and the risk of acquiring errors caused by excessive reasoning stages is mitigated. Furthermore, our approach can be combined with the aforementioned ones to further boost their performance. Extensive experimental results on three popular deductive benchmarks (i.e., ProofWriter, PrOntoQA and PrOntoQA-OOD) show that COP significantly outperforms previous state-of-the-art methods.
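The idea can be mimicked with two LLM calls: first distill and organize the premises that matter for the question, then reason over the condensed version. The prompts and the injected `ask_llm` callable below are illustrative assumptions, not the paper's prompts.

```python
from typing import Callable, List

def cop_answer(premises: List[str], question: str,
               ask_llm: Callable[[str], str]) -> str:
    """Two-stage sketch: (1) keep only the premises relevant to the question,
    grouped systematically; (2) reason over the condensed context."""
    facts = "\n".join(f"- {p}" for p in premises)

    distill_prompt = (
        "From the statements below, keep only those needed to answer the "
        f"question '{question}'. Group related statements together and drop "
        f"redundant ones.\n\nStatements:\n{facts}"
    )
    condensed = ask_llm(distill_prompt)

    reason_prompt = (
        f"Using only these organized statements:\n{condensed}\n\n"
        f"Answer step by step: {question}"
    )
    return ask_llm(reason_prompt)
```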

Benchmarking Large Language Models As AI Research Agents

  • paper_url: http://arxiv.org/abs/2310.03302
  • repo_url: https://github.com/snap-stanford/mlagentbench
  • paper_authors: Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
  • for: MLAgentBench is a suite of ML tasks for benchmarking AI research agents, which can perform actions such as reading/writing files, executing code, and inspecting outputs.
  • methods: The benchmark evaluates an agent's performance objectively over various metrics related to performance and efficiency, and an LLM-based research agent is designed to automatically run experimentation loops in this environment (an illustrative agent loop follows the abstract below).
  • results: A GPT-4-based research agent can feasibly build compelling ML models on many MLAgentBench tasks, producing highly interpretable plans and actions, but success rates vary widely -- from nearly 90% on well-established older datasets to 10% on recent Kaggle Challenges unavailable during the LLM's pretraining and 0% on newer research challenges such as BabyLM -- and the agent still struggles with long-term planning and hallucination.
    Abstract Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks, we focus on the problem of machine learning engineering: given a task description and a dataset, build a high-performing model. In this paper, we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents. Agents can perform actions like reading/writing files, executing code, and inspecting outputs. With these actions, agents could run experiments, analyze the results, and modify the code of entire machine learning pipelines, such as data processing, architecture, training processes, etc. The benchmark then automatically evaluates the agent's performance objectively over various metrics related to performance and efficiency. We also design an LLM-based research agent to automatically perform experimentation loops in such an environment. Empirically, we find that a GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions. However, the success rates vary considerably; they span from almost 90\% on well-established older datasets to as low as 10\% on recent Kaggle Challenges -- unavailable during the LLM model's pretraining -- and even 0\% on newer research challenges like BabyLM. Finally, we identify several key challenges for LLM-based research agents such as long-term planning and hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
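A research agent in such a benchmark is essentially a loop that proposes an action (read/write a file, execute a script), observes the result, and continues. The skeleton below illustrates that loop with an injected LLM callable and a JSON action format; the action names and interface are assumptions, not the MLAgentBench API.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable

def run_agent(task_description: str, ask_llm: Callable[[str], str],
              workdir: str = ".", max_steps: int = 20) -> None:
    """Minimal research-agent loop (illustrative, not the MLAgentBench API).

    The LLM is asked to reply with a JSON action such as
    {"action": "write_file", "path": "train.py", "content": "..."},
    {"action": "execute", "path": "train.py"}, or {"action": "stop"}.
    """
    history = f"Task: {task_description}\n"
    for _ in range(max_steps):
        step = json.loads(ask_llm(history + "\nNext action as JSON:"))
        action = step.get("action")
        if action == "read_file":
            observation = Path(workdir, step["path"]).read_text()
        elif action == "write_file":
            Path(workdir, step["path"]).write_text(step["content"])
            observation = "written"
        elif action == "execute":
            result = subprocess.run(["python", step["path"]], cwd=workdir,
                                    capture_output=True, text=True, timeout=600)
            observation = result.stdout + result.stderr
        else:
            break
        history += f"\n{json.dumps(step)}\nObservation: {observation[:2000]}\n"
```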

LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

  • paper_url: http://arxiv.org/abs/2310.03294
  • repo_url: https://github.com/rulinshao/lightseq
  • paper_authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
  • for: The work targets training large language models (LLMs) with longer contexts, which greatly increases memory footprints; existing model-parallel systems such as Megatron-LM partition computation across attention heads, incurring large communication volumes and limiting scalability.
  • methods: The paper proposes LightSeq, which partitions over the sequence dimension, making it agnostic to model architecture and applicable to models with varying numbers of attention heads (Multi-Head, Multi-Query, and Grouped-Query attention). LightSeq requires up to 4.7x less communication than Megatron-LM, overlaps communication with computation, and adds a novel gradient checkpointing scheme that bypasses a forward computation for memory-efficient attention (a toy illustration of sequence-dimension sharding follows the abstract below).
  • results: In single- and cross-node training of Llama-7B and its variants with sequence lengths from 32K to 512K, LightSeq achieves up to 1.24-2.01x end-to-end speedup and supports 2-8x longer sequences on models with fewer heads, compared to Megatron-LM.
    Abstract Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprints of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, so they cannot scale beyond the number of attention heads, thereby hindering its adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLMs training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable for models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass an forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Codes will be available at https://github.com/RulinShao/LightSeq.
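A single-process toy can illustrate what partitioning over the sequence dimension means: each simulated rank owns one chunk of the queries and attends to the gathered keys/values, so per-rank activation memory scales with the chunk length while the concatenated result matches full attention exactly. This shows only the partitioning axis, not LightSeq's communication schedule, overlap, or checkpointing.

```python
import torch

def sharded_attention(q, k, v, num_ranks: int = 4):
    """Toy, single-process illustration of sequence-dimension sharding.

    q, k, v: [seq_len, dim]. Each simulated rank owns one chunk of the
    queries and attends to the gathered keys/values; concatenating the
    chunk outputs reproduces full attention exactly.
    """
    dim = q.size(-1)
    outputs = []
    for q_chunk in q.chunk(num_ranks, dim=0):          # one chunk per "rank"
        scores = (q_chunk @ k.T) / dim ** 0.5
        outputs.append(torch.softmax(scores, dim=-1) @ v)
    return torch.cat(outputs, dim=0)

# Sanity check against unsharded attention.
q = torch.randn(128, 64); k = torch.randn(128, 64); v = torch.randn(128, 64)
full = torch.softmax(q @ k.T / 64 ** 0.5, dim=-1) @ v
assert torch.allclose(sharded_attention(q, k, v), full, atol=1e-5)
```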

SoK: Access Control Policy Generation from High-level Natural Language Requirements

  • paper_url: http://arxiv.org/abs/2310.03292
  • repo_url: None
  • paper_authors: Sakuna Harinda Jayasundara, Nalin Asanka Gamagedara Arachchilage, Giovanni Russello
  • for: Preventing administrator-centered access control failures, which can cause data breaches and expose organizations to financial loss and reputation damage.
  • methods: Existing graphical policy configuration tools and automated policy generation frameworks help administrators configure and generate access control policies, but the former are prone to human error and the latter to erroneous predictions, so their usability and reliability need improvement.
  • results: A systematic literature review of 49 publications identifies these tools and frameworks and their limitations, informing the development of more effective access control policy generation solutions that avoid access control failures.
    Abstract Administrator-centered access control failures can cause data breaches, putting organizations at risk of financial loss and reputation damage. Existing graphical policy configuration tools and automated policy generation frameworks attempt to help administrators configure and generate access control policies by avoiding such failures. However, graphical policy configuration tools are prone to human errors, making them unusable. On the other hand, automated policy generation frameworks are prone to erroneous predictions, making them unreliable. Therefore, to find ways to improve their usability and reliability, we conducted a Systematic Literature Review analyzing 49 publications, to identify those tools, frameworks, and their limitations. Identifying those limitations will help develop effective access control policy generation solutions while avoiding access control failures.

A 5’ UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

  • paper_url: http://arxiv.org/abs/2310.03281
  • repo_url: None
  • paper_authors: Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang
  • for: The paper introduces a language model for 5' UTRs (UTR-LM) to predict translation efficiency and mRNA expression level.
  • methods: The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species, further augmented with supervised information including secondary structure and minimum free energy, and fine-tuned in a variety of downstream tasks.
  • results: The UTR-LM outperformed the best-known benchmark by up to 42% for predicting Mean Ribosome Loading and by up to 60% for predicting Translation Efficiency and mRNA Expression Level. It also identifies unannotated Internal Ribosome Entry Sites within the untranslated region, improving the AUPR from 0.37 to 0.52 over the best baseline, and a wet-lab assay confirmed that the top designed 5' UTRs achieved a 32.5% increase in protein production relative to well-established 5' UTRs optimized for therapeutics.
    Abstract The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
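Downstream fine-tuning of such a sequence model typically amounts to pooling the encoder states and regressing a scalar property (e.g. mean ribosome loading). The PyTorch sketch below shows that generic pattern with a placeholder encoder; it is not the UTR-LM code, and the architectural details are assumptions.

```python
import torch
import torch.nn as nn

class UTRRegressionHead(nn.Module):
    """Pretrained 5' UTR encoder plus a small head predicting a scalar
    property (e.g. mean ribosome loading). Interface is illustrative."""

    def __init__(self, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                                  nn.Linear(hidden_dim, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        states = self.encoder(tokens)                 # [batch, length, hidden_dim]
        pooled = states.mean(dim=1)                   # mean-pool over the sequence
        return self.head(pooled).squeeze(-1)

def fine_tune_step(model, optimizer, tokens, targets):
    loss = nn.functional.mse_loss(model(tokens), targets)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```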

Network Alignment with Transferable Graph Autoencoders

  • paper_url: http://arxiv.org/abs/2310.03272
  • repo_url: https://github.com/graphmatching/graph-matching
  • paper_authors: Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro
  • for: Improving the accuracy and efficiency of network alignment so that it scales to very large graphs.
  • methods: The paper proposes a generalized graph autoencoder framework that extracts powerful node embeddings tailored to the alignment task, and leverages transfer learning and data augmentation for efficient alignment at very large scale without retraining (a sketch of the embedding-matching step follows the abstract below).
  • results: Experiments on network and sub-network alignment with real-world graphs demonstrate accurate and scalable alignment, outperforming classical spectral methods.
    Abstract Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
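Once node embeddings for both graphs are available, one-to-one correspondences can be read off by solving an assignment problem over pairwise similarities. The sketch below shows only that final matching step with SciPy's Hungarian solver; producing the embeddings with the paper's transferable graph autoencoder is out of scope here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_from_embeddings(emb_a: np.ndarray, emb_b: np.ndarray):
    """emb_a, emb_b: [n, d] node embeddings of two graphs with the same n.
    Returns index pairs (i, j) matching nodes of graph A to graph B."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = a @ b.T                              # cosine similarity matrix
    rows, cols = linear_sum_assignment(-similarity)   # maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))
```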

Sparse Deep Learning for Time Series Data: Theory and Applications

  • paper_url: http://arxiv.org/abs/2310.03243
  • repo_url: None
  • paper_authors: Mingxuan Zhang, Yan Sun, Faming Liang
  • for: The paper aims to extend sparse deep learning beyond i.i.d. data, particularly for uncertainty quantification, variable selection, and large-scale network compression.
  • methods: It develops the theory of sparse deep learning for dependent data, showing that sparse recurrent neural networks can be consistently estimated and that their predictions are asymptotically normally distributed under appropriate assumptions, so prediction uncertainty can be correctly quantified.
  • results: Numerical results show that sparse deep learning outperforms state-of-the-art methods such as conformal predictions for prediction uncertainty quantification on time series data, consistently identifies the autoregressive order of time series, and outperforms existing methods in large-scale model compression.
    Abstract Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
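To make the setting concrete, below is a simplified sketch of a sparse recurrent forecaster of the kind the theory covers. Note that the paper's sparsity comes from its own prior/regularization scheme with consistency guarantees; this sketch substitutes a plain L1 penalty, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseRNNForecaster(nn.Module):
    """One-step-ahead forecaster; sparsity is (crudely) encouraged by an L1 penalty at train time."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, 1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])   # predict the next value from the last hidden state

def l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    return lam * sum(p.abs().sum() for p in model.parameters())

# one training step on a toy batch
model = SparseRNNForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 50, 1), torch.randn(8, 1)
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y) + l1_penalty(model)
loss.backward()
opt.step()
```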

Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization

  • paper_url: http://arxiv.org/abs/2310.03234
  • repo_url: None
  • paper_authors: Quanqi Hu, Dixian Zhu, Tianbao Yang
  • For: Investigate new families of compositional optimization problems, called non-smooth weakly-convex finite-sum coupled compositional optimization (NSWC FCCO), where the outer function is weakly convex and non-decreasing and the inner function is weakly convex.
  • Methods: Analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function (the objective and the stationarity measure are written out after the abstract below).
  • Results: Extend the algorithm to novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, and demonstrate its effectiveness on two-way partial AUC maximization and multi-instance two-way partial AUC maximization in deep learning.
    Abstract This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
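For reference, the finite-sum coupled compositional objective and the Moreau-envelope stationarity measure used in the complexity statement can be written as follows (standard definitions, with $\lambda$ taken small enough relative to the weak-convexity modulus so that $F_\lambda$ is well defined and smooth; the single-loop updates themselves are as specified in the paper):

$$F(\mathbf{w}) \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i\big(g_i(\mathbf{w})\big), \qquad f_i \text{ weakly convex and non-decreasing}, \quad g_i \text{ weakly convex},$$

$$F_{\lambda}(\mathbf{w}) \;=\; \min_{\mathbf{z}} \Big\{ F(\mathbf{z}) + \tfrac{1}{2\lambda}\,\|\mathbf{z}-\mathbf{w}\|^2 \Big\}, \qquad \mathbf{w} \text{ is } \epsilon\text{-stationary if } \ \|\nabla F_{\lambda}(\mathbf{w})\| \le \epsilon.$$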

Deep Representations of First-person Pronouns for Prediction of Depression Symptom Severity

  • paper_url: http://arxiv.org/abs/2310.03232
  • repo_url: None
  • paper_authors: Xinyang Ren, Hannah A Burkhardt, Patricia A Areán, Thomas D Hull, Trevor Cohen
  • for: Analyze individuals' mental status, in particular depression symptom severity, from text data.
  • methods: Use contextualized language representation models to generate contextual embeddings of first-person singular pronouns, capturing how these pronouns are used rather than merely how often they occur (a minimal embedding-extraction sketch follows the abstract below).
  • results: Contextual first-person pronoun embeddings outperform standard classification-token embeddings and frequency-based pronoun analysis in predicting depression symptom severity, suggesting that contextual representations of first-person pronouns enhance the predictive utility of language used by people with depression symptoms.
    Abstract Prior work has shown that analyzing the use of first-person singular pronouns can provide insight into individuals' mental status, especially depression symptom severity. These findings were generated by counting frequencies of first-person singular pronouns in text data. However, counting doesn't capture how these pronouns are used. Recent advances in neural language modeling have leveraged methods generating contextual embeddings. In this study, we sought to utilize the embeddings of first-person pronouns obtained from contextualized language representation models to capture ways these pronouns are used, to analyze mental status. De-identified text messages sent during online psychotherapy with weekly assessment of depression severity were used for evaluation. Results indicate the advantage of contextualized first-person pronoun embeddings over standard classification token embeddings and frequency-based pronoun analysis results in predicting depression symptom severity. This suggests contextual representations of first-person pronouns can enhance the predictive utility of language used by people with depression symptoms.
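A minimal sketch of the core feature-extraction step — pooling the contextual embeddings of first-person singular pronoun tokens from a transformer encoder — is shown below. The choice of bert-base-uncased, the pronoun list, and mean pooling are illustrative assumptions, not the paper's exact configuration; the pooled vectors would then feed a downstream model predicting weekly depression-severity scores.

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"                 # illustrative encoder choice
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def pronoun_embedding(text: str):
    """Mean contextual embedding of first-person singular pronoun tokens in `text` (None if absent)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]                    # (num_tokens, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = [i for i, tok in enumerate(tokens) if tok in FIRST_PERSON]    # uncased tokenizer lowercases input
    return hidden[idx].mean(dim=0) if idx else None

vec = pronoun_embedding("I feel like nothing I do matters to me anymore.")
```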

Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms

  • paper_url: http://arxiv.org/abs/2310.03225
  • repo_url: None
  • paper_authors: Akifumi Wachi, Wataru Hashimoto, Xun Shen, Kazumune Hashimoto
  • for: Provide a unified formulation of the Generalized Safe Exploration (GSE) problem, together with a safe-exploration meta-algorithm, MASE, that combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode and to discourage unsafe exploration in future episodes.
  • methods: Present two variants of MASE: one based on Generalized Linear Models (GLMs) with theoretical guarantees of safety and near-optimality, and one that combines a Gaussian process safety model with a deep RL algorithm to maximize reward (a minimal GP safety-check sketch follows the abstract below).
  • results: Experiments show that MASE outperforms state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
    Abstract Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
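Below is a minimal sketch of the Gaussian-process variant's safety check: the uncertainty quantifier keeps a GP over observed safety costs and certifies an action only when a high-confidence upper bound stays below the threshold. The feature construction, the confidence multiplier beta, and the "trust the prior before any data" rule are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

gp = GaussianProcessRegressor()
observed_x, observed_c = [], []            # (state, action) features and observed safety costs

def certified_safe(state, action, beta=2.0, threshold=0.0):
    """Certify (state, action) if a high-probability upper bound on its safety cost stays below threshold."""
    if not observed_x:
        return True                        # assumption: trust the prior before any data is seen
    x = np.concatenate([state, action])[None, :]
    mean, std = gp.predict(x, return_std=True)
    return float(mean[0] + beta * std[0]) <= threshold

def record(state, action, cost):
    """Update the uncertainty quantifier with an observed safety cost."""
    observed_x.append(np.concatenate([state, action]))
    observed_c.append(cost)
    gp.fit(np.array(observed_x), np.array(observed_c))
```

In the full meta-algorithm, actions that fail such a check are replaced by a safe fallback and the proposing policy is penalized before any actual violation occurs, which is how unsafe exploration is discouraged in later episodes.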

Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.03221
  • repo_url: https://github.com/yijia-xiao/know2bio
  • paper_authors: Yijia Xiao, Dylan Steinecke, Alexander Russell Pelletier, Yushi Bai, Peipei Ping, Wei Wang
  • for: Propose a general-purpose biomedical knowledge graph (KG) benchmark to support knowledge representation learning in the biomedical domain.
  • methods: Integrate 30 diverse data sources into a single KG capturing intricate relationships across 11 biomedical categories, support user-directed automated updating to keep pace with the latest biomedical knowledge, and provide multi-modal node features (text descriptions, protein and compound sequences and structures).
  • results: Evaluations of KG representation models on Know2BIO demonstrate its effectiveness as a benchmark for KG representation learning in the biomedical field (a minimal ranking-metric sketch follows the abstract below).
    Abstract Knowledge graphs (KGs) have emerged as a powerful framework for representing and integrating complex biomedical information. However, assembling KGs from diverse sources remains a significant challenge in several aspects, including entity alignment, scalability, and the need for continuous updates to keep pace with scientific advancements. Moreover, the representative power of KGs is often limited by the scarcity of multi-modal data integration. To overcome these challenges, we propose Know2BIO, a general-purpose heterogeneous KG benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories. It currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data: node features including text descriptions, protein and compound sequences and structures, enabling the utilization of emerging natural language processing methods and multi-modal data integration strategies. We evaluate KG representation models on Know2BIO, demonstrating its effectiveness as a benchmark for KG representation learning in the biomedical field. Data and source code of Know2BIO are available at https://github.com/Yijia-Xiao/Know2BIO/.
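KG benchmarks of this kind are typically scored with ranking metrics for link prediction; a minimal sketch of MRR and Hits@k computed from the rank of the true entity is given below (this is the standard protocol and is not tied to the repository's own evaluation scripts):

```python
import numpy as np

def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k from per-triple ranks of the true entity."""
    ranks = np.asarray(ranks, dtype=float)
    return {"MRR": float(np.mean(1.0 / ranks)), f"Hits@{k}": float(np.mean(ranks <= k))}

# toy example: ranks assigned to the correct tail entity over five test triples
print(mrr_and_hits([1, 3, 12, 2, 7]))      # {'MRR': ~0.41, 'Hits@10': 0.8}
```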

Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

  • paper_url: http://arxiv.org/abs/2310.03218
  • repo_url: https://github.com/yupeiyu98/diffusion-amortized-mcmc
  • paper_authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu
  • For: Learning latent space Energy-Based Models (EBMs) with long-run Markov Chain Monte Carlo (MCMC) sampling, to address the degenerate MCMC sampling quality often observed in practice.
  • Methods: Introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling, and develop a novel learning algorithm for the latent space EBM based on it (a short-run Langevin sketch of the sampler being amortized follows the abstract below).
  • Results: Provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler, and demonstrate superior performance on several image modeling benchmark datasets compared with strong counterparts.
    Abstract Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts
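For context, the sampler that the diffusion-based amortization is trained to stand in for is Langevin MCMC on the latent EBM prior; a minimal short-run sketch is below (the toy energy network, latent dimension, step size, and number of steps are illustrative assumptions):

```python
import torch

def langevin_sample(energy, z0, steps=60, step_size=0.1):
    """Short-run Langevin MCMC targeting p(z) proportional to exp(-energy(z)) * N(z; 0, I)."""
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        neg_logp = energy(z).sum() + 0.5 * (z ** 2).sum()          # EBM term + Gaussian base measure
        grad, = torch.autograd.grad(neg_logp, z)
        z = (z - 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()

# toy energy over a 16-dimensional latent space
energy = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.GELU(), torch.nn.Linear(64, 1))
z_samples = langevin_sample(energy, torch.randn(8, 16))
```

The paper's contribution is to amortize long-run versions of such chains with a learned diffusion model, so that sampling quality does not degrade as the latent space EBM is trained.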