cs.AI - 2023-07-06

Can ChatGPT’s Responses Boost Traditional Natural Language Processing?

  • paper_url: http://arxiv.org/abs/2307.04648
  • repo_url: https://github.com/mostafa-mahmoud/chat-gpt-fusion-evaluation
  • paper_authors: Mostafa M. Amin, Erik Cambria, Björn W. Schuller
  • for: This study investigates whether ChatGPT can enhance existing NLP methods for affective computing problems, namely sentiment analysis, suicide tendency detection, and big-five personality assessment.
  • methods: The study uses ChatGPT's verbose responses about the downstream task to probe whether it carries novel knowledge, and fuses those responses with existing NLP methods to improve performance (a small fusion sketch follows this entry).
  • results: The results show that ChatGPT does carry novel knowledge that can improve existing NLP methods through fusion, whether early or late.
    Abstract The employment of foundation models is steadily expanding, especially with the launch of ChatGPT and the release of other foundation models. These models have shown the potential of emerging capabilities to solve problems, without being particularly trained to solve. A previous work demonstrated these emerging capabilities in affective computing tasks; the performance quality was similar to traditional Natural Language Processing (NLP) techniques, but falling short of specialised trained models, like fine-tuning of the RoBERTa language model. In this work, we extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together. We achieve this by investigating the utility of verbose responses from ChatGPT about solving a downstream task, in addition to studying the utility of fusing that with existing NLP methods. The study is conducted on three affective computing problems, namely sentiment analysis, suicide tendency detection, and big-five personality assessment. The results conclude that ChatGPT has indeed novel knowledge that can improve existing NLP techniques by way of fusion, be it early or late fusion.
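The fusion idea is simple enough to illustrate with a toy sketch. Below, two logistic regressions stand in for a traditional NLP model and a model over features of ChatGPT's verbose responses; the feature matrices and labels are random placeholders rather than the paper's data, and concatenation/probability-averaging are just one common way to instantiate early and late fusion.

```python
# Hedged sketch: early vs. late fusion of two feature views (toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: x_trad = embeddings of the original text,
# x_gpt = embeddings of ChatGPT's verbose response about the text.
x_trad = rng.normal(size=(n, 64))
x_gpt = rng.normal(size=(n, 32))
y = (x_trad[:, 0] + x_gpt[:, 0] > 0).astype(int)  # toy labels

idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Early fusion: concatenate the two views and train one model.
x_all = np.hstack([x_trad, x_gpt])
early = LogisticRegression(max_iter=1000).fit(x_all[idx_tr], y[idx_tr])
acc_early = early.score(x_all[idx_te], y[idx_te])

# Late fusion: train one model per view and average the predicted probabilities.
m1 = LogisticRegression(max_iter=1000).fit(x_trad[idx_tr], y[idx_tr])
m2 = LogisticRegression(max_iter=1000).fit(x_gpt[idx_tr], y[idx_tr])
p = 0.5 * (m1.predict_proba(x_trad[idx_te])[:, 1] + m2.predict_proba(x_gpt[idx_te])[:, 1])
acc_late = ((p > 0.5).astype(int) == y[idx_te]).mean()
print(f"early fusion acc={acc_early:.3f}  late fusion acc={acc_late:.3f}")
```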

Hybrid Knowledge-Data Driven Channel Semantic Acquisition and Beamforming for Cell-Free Massive MIMO

  • paper_url: http://arxiv.org/abs/2307.03070
  • repo_url: None
  • paper_authors: Zhen Gao, Shicong Liu, Yu Su, Zhongxiang Li, Dezhi Zheng
  • for: Advance outdoor wireless systems to better support ubiquitous extended reality (XR) applications and close the gap with current indoor wireless transmission capabilities.
  • methods: A hybrid knowledge-data driven approach is proposed: a multi-layer perceptron (MLP)-Mixer-based autoencoder for channel semantic acquisition, and, building on the acquired channel semantics, a knowledge-driven deep-unfolding multi-user beamformer (a plain SOR sketch follows this entry).
  • results: Simulation results show improved channel acquisition accuracy and reduced complexity in both CSI acquisition and beamformer design; in downlink transmission, the proposed beamformer reaches about 96% of the converged spectral efficiency after only three iterations.
    Abstract This paper focuses on advancing outdoor wireless systems to better support ubiquitous extended reality (XR) applications, and close the gap with current indoor wireless transmission capabilities. We propose a hybrid knowledge-data driven method for channel semantic acquisition and multi-user beamforming in cell-free massive multiple-input multiple-output (MIMO) systems. Specifically, we firstly propose a data-driven multiple layer perceptron (MLP)-Mixer-based auto-encoder for channel semantic acquisition, where the pilot signals, CSI quantizer for channel semantic embedding, and CSI reconstruction for channel semantic extraction are jointly optimized in an end-to-end manner. Moreover, based on the acquired channel semantic, we further propose a knowledge-driven deep-unfolding multi-user beamformer, which is capable of achieving good spectral efficiency with robustness to imperfect CSI in outdoor XR scenarios. By unfolding conventional successive over-relaxation (SOR)-based linear beamforming scheme with deep learning, the proposed beamforming scheme is capable of adaptively learning the optimal parameters to accelerate convergence and improve the robustness to imperfect CSI. The proposed deep unfolding beamforming scheme can be used for access points (APs) with fully-digital array and APs with hybrid analog-digital array. Simulation results demonstrate the effectiveness of our proposed scheme in improving the accuracy of channel acquisition, as well as reducing complexity in both CSI acquisition and beamformer design. The proposed beamforming method achieves approximately 96% of the converged spectrum efficiency performance after only three iterations in downlink transmission, demonstrating its efficacy and potential to improve outdoor XR applications.
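The deep-unfolding beamformer builds on the classic successive over-relaxation (SOR) iteration, so a plain SOR solver is a useful reference point. The sketch below runs three SOR sweeps on a toy regularized system; in the paper each sweep becomes a network layer whose relaxation parameter is learned, whereas here omega is fixed and the channel matrix is random.

```python
# Hedged sketch: SOR iterations for Ax = b, the building block the paper unfolds
# into network layers (with a learnable relaxation parameter per layer).
import numpy as np

def sor_step(A, b, x, omega):
    """One successive over-relaxation sweep (Gauss-Seidel with relaxation)."""
    x = x.copy()
    for i in range(len(b)):
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 4))          # toy channel matrix
A = H.T @ H + 0.1 * np.eye(4)        # regularized Gram matrix (SPD)
b = H.T @ rng.normal(size=8)

x = np.zeros(4)
for _ in range(3):                   # the paper reports ~3 unfolded iterations suffice
    x = sor_step(A, b, x, omega=1.2)
print("residual after 3 sweeps:", np.linalg.norm(A @ x - b))
```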

DeepOnto: A Python Package for Ontology Engineering with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.03067
  • repo_url: https://github.com/KRR-Oxford/DeepOnto
  • paper_authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota
  • for: This paper addresses how to integrate deep learning techniques, particularly language models (LMs), into ontology engineering.
  • methods: DeepOnto bridges Python deep learning frameworks such as PyTorch and Tensorflow with widely-used, primarily Java-based ontology APIs such as the OWL API and Jena, wrapping a core ontology processing module in a more "Pythonic" way.
  • results: The paper presents DeepOnto, a Python package supporting ontology engineering tasks such as ontology alignment and completion using deep learning methods and pre-trained LMs. The authors demonstrate two use cases: Digital Health Coaching at Samsung Research UK and the Bio-ML track of the OAEI.
    Abstract Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).

Generalizing Backpropagation for Gradient-Based Interpretability

  • paper_url: http://arxiv.org/abs/2307.03056
  • repo_url: https://github.com/kdu4108/semiring-backprop-exps
  • paper_authors: Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell
  • for: This work aims to provide a general-purpose way of interpreting the inner workings of deep neural networks, beyond what gradient-based feature attribution reveals.
  • methods: The backpropagation algorithm is generalized using semirings so that interpretable statistics of a network's gradient graph can be computed efficiently, such as the highest-weighted path and entropy (a toy semiring sketch follows this entry).
  • results: Computing these gradient-graph statistics validates that the amount of gradient flow through a model component reflects its importance to a prediction, and on the subject-verb number agreement (SVA) task it identifies which pathways of the self-attention mechanism are most important.
    Abstract Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
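The semiring view is easy to demonstrate on a toy gradient graph: the same backward pass computes the ordinary gradient under the (sum, product) semiring and the highest-weighted path under the (max, product) semiring. The graph, weights, and node names below are made up for illustration; the paper applies this to real Transformer computation graphs.

```python
# Hedged sketch: one backward dynamic program, two semirings.
edges = {            # node -> list of (child, local gradient / edge weight)
    "out": [("h1", 0.5), ("h2", 0.2)],
    "h1":  [("x", 0.4)],
    "h2":  [("x", 0.9)],
    "x":   [],
}

def backprop(edges, source, plus, times, one):
    """Generic message passing from `source`; returns an accumulated value per node."""
    val = {node: (one if node == source else None) for node in edges}
    order = ["out", "h1", "h2", "x"]          # a topological order of this small DAG
    for node in order:
        if val[node] is None:
            continue
        for child, w in edges[node]:
            msg = times(val[node], w)
            val[child] = msg if val[child] is None else plus(val[child], msg)
    return val

grad = backprop(edges, "out", plus=lambda a, b: a + b, times=lambda a, b: a * b, one=1.0)
best = backprop(edges, "out", plus=max, times=lambda a, b: a * b, one=1.0)
print("d out / d x (sum-product):", grad["x"])            # 0.5*0.4 + 0.2*0.9 = 0.38
print("highest-weighted path (max-product):", best["x"])  # max(0.20, 0.18) = 0.20
```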

Art Authentication with Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.03039
  • repo_url: None
  • paper_authors: Ludovica Schaerf, Carina Popovici, Eric Postma
  • for: This study examines whether the strength of Vision Transformers extends to art authentication, improving the reliability of computer-based authentication of artworks.
  • methods: The art authentication performance of Swin Transformers is compared with that of EfficientNet on a carefully compiled dataset of authentic van Gogh paintings and two contrast datasets (a fine-tuning sketch follows this entry).
  • results: With a contrast set consisting only of imitations, the Swin Transformer surpasses EfficientNet, reaching an authentication accuracy of over 85%, while EfficientNet performs best on the standard contrast set of imitations and proxies; Vision Transformers are thus a strong and promising contender in art authentication.
    Abstract In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state-of-the-art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines if the superiority of Vision Transformers extends to art authentication, improving, thus, the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performances of Swin Transformers with those of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that only consists of imitations, we find the Swin Transformer to be superior to EfficientNet by achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.
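Framed as binary classification, the comparison reduces to fine-tuning different pretrained backbones on the same authentic-vs-contrast data. The sketch below uses timm model names and a generic training loop as illustrative stand-ins; the preprocessing, hyperparameters, and the data loader (assumed to yield image/label batches) are not the authors' actual setup.

```python
# Hedged sketch: binary art authentication as fine-tuning of a pretrained backbone.
import torch
import timm
from torch import nn

def build(name):
    # e.g. "swin_base_patch4_window7_224" or "efficientnet_b0" (illustrative choices)
    return timm.create_model(name, pretrained=True, num_classes=2)

model = build("swin_base_patch4_window7_224")
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    # `loader` is assumed to yield (image_batch, label_batch) with labels
    # 0 = authentic van Gogh, 1 = imitation/proxy.
    model.train()
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```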

Sequential Neural Barriers for Scalable Dynamic Obstacle Avoidance

  • paper_url: http://arxiv.org/abs/2307.03015
  • repo_url: None
  • paper_authors: Hongzhan Yu, Chiaki Hirayama, Chenning Yu, Sylvia Herbert, Sicun Gao
  • for: This work addresses scaling up robot navigation around dynamic obstacles, where the complex interaction dynamics of the obstacles are hard to model analytically and the complexity of planning and control grows rapidly with the number of obstacles.
  • methods: A compositional learning method for Sequential Neural Control Barrier models (SNCBFs) is proposed, exploiting the observation that the spatial interaction patterns of multiple dynamic obstacles can be decomposed and predicted through temporal sequences of per-obstacle states; through this decomposition, control policies trained with only a few obstacles generalize to environments with much higher obstacle densities (a barrier-filter sketch follows this entry).
  • results: The proposed method is compared against potential fields, end-to-end reinforcement learning, and model-predictive control, and hardware experiments demonstrate its practical effectiveness.
    Abstract There are two major challenges for scaling up robot navigation around dynamic obstacles: the complex interaction dynamics of the obstacles can be hard to model analytically, and the complexity of planning and control grows exponentially in the number of obstacles. Data-driven and learning-based methods are thus particularly valuable in this context. However, data-driven methods are sensitive to distribution drift, making it hard to train and generalize learned models across different obstacle densities. We propose a novel method for compositional learning of Sequential Neural Control Barrier models (SNCBFs) to achieve scalability. Our approach exploits an important observation: the spatial interaction patterns of multiple dynamic obstacles can be decomposed and predicted through temporal sequences of states for each obstacle. Through decomposition, we can generalize control policies trained only with a small number of obstacles, to environments where the obstacle density can be 100x higher. We demonstrate the benefits of the proposed methods in improving dynamic collision avoidance in comparison with existing methods including potential fields, end-to-end reinforcement learning, and model-predictive control. We also perform hardware experiments and show the practical effectiveness of the approach in the supplementary video.
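Control barrier functions are the backbone of the approach, so a minimal hand-written barrier filter helps make the idea concrete. The sketch below uses analytic circular-obstacle barriers and single-integrator dynamics, enforcing only the most-violated constraint so the projection stays closed-form; the paper instead learns sequential neural barriers whose composition scales to dense obstacle fields.

```python
# Hedged sketch: a control-barrier safety filter for a single-integrator robot.
import numpy as np

def cbf_filter(x, u_nom, obstacles, r=0.5, alpha=1.0):
    """Project the nominal velocity command onto the safe half-space of the
    most restrictive barrier h_i(x) = ||x - o_i||^2 - r^2."""
    best = None
    for o in obstacles:
        h = np.dot(x - o, x - o) - r**2        # barrier value
        a = 2.0 * (x - o)                      # gradient of h (dynamics g = identity)
        b = -alpha * h                         # CBF condition: a @ u >= b
        slack = a @ u_nom - b
        if best is None or slack < best[0]:
            best = (slack, a, b)
    slack, a, b = best
    if slack >= 0:                             # nominal command already safe
        return u_nom
    return u_nom + (b - a @ u_nom) / (a @ a) * a

x = np.array([0.0, 0.0])
u = cbf_filter(x, u_nom=np.array([1.0, 0.0]),
               obstacles=[np.array([0.8, 0.05]), np.array([2.0, 2.0])])
print("filtered command:", u)   # slowed down in the direction of the nearby obstacle
```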

Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning

  • paper_url: http://arxiv.org/abs/2307.03007
  • repo_url: None
  • paper_authors: Christian Jauch, Timo Leitritz, Marco F. Huber
  • for: This work provides a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction, enabling cheap and robust hand-pose-based activity recognition even in complex usage scenarios such as wearing gloves.
  • methods: The pipeline combines a general machine learning model for hand pose estimation, trained on a generalized dataset, with spatial and temporal filtering that accounts for anatomical constraints of the hand, followed by a retraining step to improve the model (a filtering sketch follows this entry).
  • results: The best parameter and model combination is selected on a publicly available annotated dataset and then applied to unlabelled videos from a manual assembly scenario; the pipeline's effectiveness is demonstrated by training an activity recognition model as a downstream task.
    Abstract Manual assembly workers face increasing complexity in their work. Human-centered assistance systems could help, but object recognition as an enabling technology hinders sophisticated human-centered design of these systems. At the same time, activity recognition based on hand poses suffers from poor pose estimation in complex usage scenarios, such as wearing gloves. This paper presents a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction. This enables cheap and robust hand posebased activity recognition. The pipeline consists of a general machine learning model for hand pose estimation trained on a generalized dataset, spatial and temporal filtering to account for anatomical constraints of the hand, and a retraining step to improve the model. Different parameter combinations are evaluated on a publicly available and annotated dataset. The best parameter and model combination is then applied to unlabelled videos from a manual assembly scenario. The effectiveness of the pipeline is demonstrated by training an activity recognition as a downstream task in the manual assembly scenario.
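The anatomically constrained filtering step can be sketched with plain NumPy: smooth keypoint trajectories over time, then keep only frames whose bone lengths stay close to reference values, so that only plausible predictions are fed back as pseudo-labels for retraining. The skeleton subset, reference lengths, tolerance, and random keypoints below are placeholders, not the paper's actual constraints.

```python
# Hedged sketch: temporal smoothing plus an anatomical plausibility check.
import numpy as np

def temporal_smooth(keypoints, window=5):
    """Moving-average filter over time. keypoints: (T, 21, 2) array."""
    kernel = np.ones(window) / window
    out = np.empty_like(keypoints)
    for j in range(keypoints.shape[1]):
        for c in range(keypoints.shape[2]):
            out[:, j, c] = np.convolve(keypoints[:, j, c], kernel, mode="same")
    return out

def plausible(frame_kpts, bones, ref_lengths, tol=0.3):
    """Accept a frame only if every bone length stays within tol of its reference."""
    for (i, j), ref in zip(bones, ref_lengths):
        length = np.linalg.norm(frame_kpts[i] - frame_kpts[j])
        if abs(length - ref) > tol * ref:
            return False
    return True

T = 100
kpts = np.random.rand(T, 21, 2)                  # hypothetical raw predictions
smoothed = temporal_smooth(kpts)
bones = [(0, 1), (1, 2)]                         # toy subset of the hand skeleton
ref = [0.3, 0.25]
pseudo_labels = [f for f in smoothed if plausible(f, bones, ref)]
print(f"kept {len(pseudo_labels)}/{T} frames as pseudo-labels")
```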

CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering

  • paper_url: http://arxiv.org/abs/2307.04683
  • repo_url: https://github.com/oacore/core-gpt-evaluation
  • paper_authors: David Pride, Matteo Cancellieri, Petr Knoth
  • for: This work develops a question-answering platform that combines GPT language models with CORE's full-text open access scientific articles to deliver credible answers together with references to the relevant literature.
  • methods: CORE-GPT pairs GPT-3.5 and GPT-4 with more than 32 million full-text open access articles from CORE, returning evidence-based answers together with citations and links to the cited papers, which increases trustworthiness and reduces the risk of hallucinations (a retrieve-then-answer sketch follows this entry).
  • results: On 100 questions covering the top 20 scientific domains in CORE, CORE-GPT produces comprehensive and trustworthy answers with links to 500 relevant articles; two annotators assessed the quality of the answers and the relevance of the links and rated them highly.
    Abstract In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT's performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and the relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.
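The retrieve-then-answer recipe behind such systems can be sketched in a few lines: rank candidate articles against the question, then build a prompt that constrains the model to answer only from the retrieved passages and to cite them. The tiny corpus, the TF-IDF retriever, and the omitted LLM call below are illustrative simplifications of what CORE-GPT actually does over 32 million full texts.

```python
# Hedged sketch: rank open-access passages, then assemble a citation-forcing prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {                                # hypothetical article id -> abstract
    "core:1": "Transformers improve machine translation quality ...",
    "core:2": "Vaccination reduces transmission of influenza ...",
}
question = "Do transformers help machine translation?"

vec = TfidfVectorizer().fit(list(corpus.values()) + [question])
doc_matrix = vec.transform(list(corpus.values()))
scores = cosine_similarity(vec.transform([question]), doc_matrix)[0]
ranked = sorted(zip(corpus, scores), key=lambda t: -t[1])[:2]

context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in ranked)
prompt = (f"Answer the question using ONLY the passages below and cite their ids.\n"
          f"{context}\n\nQuestion: {question}\nAnswer:")
print(prompt)   # this prompt would then be sent to GPT-3.5/4 (API call omitted here)
```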

A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications

  • paper_url: http://arxiv.org/abs/2307.02984
  • repo_url: https://github.com/perceivelab/plan
  • paper_authors: Matteo Pennisi, Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ulas Bagci, Concetto Spampinato
  • for: This work proposes a privacy-preserving way of generating synthetic samples with GANs that can support training deep models, addressing the privacy concerns (inspired by k-anonymity principles) of sharing near-duplicates of real samples.
  • methods: An auxiliary identity classifier guides a non-linear walk between points in the latent space, minimizing the risk of colliding with near-duplicates of real samples (a toy latent-walk sketch follows this entry).
  • results: Experiments show that, for any random pair of latent points, the proposed walk is safer than linear interpolation; on two benchmarks (tuberculosis and diabetic retinopathy classification), training on samples generated by the approach mitigates performance drops while preserving privacy.
    Abstract Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined to k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigate drops in performance, while keeping privacy preservation.
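The latent-space walk can be illustrated with a toy stand-in: interpolate between two latent codes and, at every step, push the point down the gradient of an auxiliary identity classifier's confidence so the path avoids regions associated with a specific real identity. The untrained classifier, latent dimensionality, and single gradient step per point are placeholder choices; the paper's navigation strategy and its combination with k-same methods are more elaborate.

```python
# Hedged sketch: identity-classifier-guided walk between two latent codes.
import torch
from torch import nn

id_clf = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # 10 toy identities

def privacy_walk(z_start, z_end, steps=10, push=0.5):
    points = []
    for t in torch.linspace(0, 1, steps):
        z = ((1 - t) * z_start + t * z_end).clone().requires_grad_(True)
        conf = torch.softmax(id_clf(z), dim=-1).max()   # confidence of the nearest identity
        conf.backward()
        with torch.no_grad():
            z = z - push * z.grad                       # step away from any real identity
        points.append(z.detach())
    return points

z0, z1 = torch.randn(64), torch.randn(64)
path = privacy_walk(z0, z1)
print(len(path), "latent points generated along the non-linear walk")
```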

How word semantics and phonology affect handwriting of Alzheimer’s patients: a machine learning based analysis

  • paper_url: http://arxiv.org/abs/2307.04762
  • repo_url: None
  • paper_authors: Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Sabato Marco Siniscalchi
  • for: This work investigates using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease.
  • methods: Non-invasive acquisition techniques are combined with machine learning: four well-known classifiers with feature selection are applied to handwriting tasks that require copying regular words, non-regular words, and non-words (a feature-selection sketch follows this entry).
  • results: Each word type yields a different set of highly distinctive kinematic features; non-regular words require more features on average but achieve excellent classification performance, with the best result reaching an accuracy close to 90%.
    Abstract Using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease is a real challenge: non-invasive detection techniques combined with machine learning approaches promise big steps forward in this research field. In literature, the tasks proposed focused on different cognitive skills to elicitate handwriting movements. In particular, the meaning and phonology of words to copy can compromise writing fluency. In this paper, we investigated how word semantics and phonology affect the handwriting of people affected by Alzheimer's disease. To this aim, we used the data from six handwriting tasks, each requiring copying a word belonging to one of the following categories: regular (have a predictable phoneme-grapheme correspondence, e.g., cat), non-regular (have atypical phoneme-grapheme correspondence, e.g., laugh), and non-word (non-meaningful pronounceable letter strings that conform to phoneme-grapheme conversion rules). We analyzed the data using a machine learning approach by implementing four well-known and widely-used classifiers and feature selection. The experimental results showed that the feature selection allowed us to derive a different set of highly distinctive features for each word type. Furthermore, non-regular words needed, on average, more features but achieved excellent classification performance: the best result was obtained on a non-regular, reaching an accuracy close to 90%.
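The analysis recipe (feature selection followed by standard classifiers, repeated per word type) is easy to sketch with scikit-learn. The synthetic feature matrix below stands in for the kinematic measurements extracted from the copying tasks, and the subset sizes are arbitrary; only the pipeline shape mirrors the paper.

```python
# Hedged sketch: per-word-type feature selection + classification on toy data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))                 # 120 subjects x 40 kinematic features
y = rng.integers(0, 2, size=120)               # 0 = control, 1 = patient (toy labels)

for k in (5, 10, 20):                          # different feature-subset sizes per word type
    pipe = make_pipeline(SelectKBest(f_classif, k=k),
                         RandomForestClassifier(n_estimators=200, random_state=0))
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d} selected features -> cv accuracy {acc:.2f}")
```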

On the Cultural Gap in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2307.02971
  • repo_url: None
  • paper_authors: Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, Zhaopeng Tu
  • for: Improving the cross-cultural quality of text-to-image (T2I) generation.
  • methods: A Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria is proposed, together with a multi-modal metric that considers object-text alignment to filter the fine-tuning data for a target culture (an alignment-scoring sketch follows this entry).
  • results: Experiments show that the multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, with object-text alignment playing a crucial role.
    Abstract One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).
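One way to instantiate an object-text alignment filter is with an off-the-shelf CLIP model: score each image-caption pair and keep only well-aligned pairs for culture-specific fine-tuning. The placeholder image, single caption, and similarity threshold below are illustrative; the paper's multi-modal metric combines more signals than raw CLIP similarity.

```python
# Hedged sketch: CLIP-based object-text alignment scoring for data filtering.
import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(image, caption):
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    return model(**inputs).logits_per_image.item()

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))   # placeholder image
pairs = [(image, "a red lantern hanging over a street market")]
keep = [(img, cap) for img, cap in pairs if alignment_score(img, cap) > 20.0]  # toy threshold
print(f"{len(keep)}/{len(pairs)} pairs kept for culture-specific fine-tuning")
```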

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations

  • paper_url: http://arxiv.org/abs/2307.02947
  • repo_url: None
  • paper_authors: Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar
  • for: Solves Reinforcement Learning (RL) problems with real-valued observations in a hardware-efficient and bio-inspired way.
  • methods: Incorporates multi-layered event-based clustering, Temporal Difference (TD)-error modulation, and eligibility traces to build a novel Spiking Neural Network (SNN) architecture (a tabular actor-critic sketch follows this entry).
  • results: Consistently outperforms a tabular actor-critic algorithm and successfully discovers stable control policies on classic RL environments, with an appealing trade-off in terms of computational and hardware implementation requirements.
    Abstract Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.
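The tabular actor-critic with eligibility traces that serves as the paper's benchmark is compact enough to sketch in full. The toy chain environment, learning rates, and trace decay below are arbitrary; the point is the structure the SNN mirrors, where a single broadcast TD error modulates local eligibility-trace-driven updates.

```python
# Hedged sketch: tabular actor-critic with eligibility traces on a toy chain MDP.
import numpy as np

n_states, n_actions = 5, 2
V = np.zeros(n_states)                         # critic
prefs = np.zeros((n_states, n_actions))        # actor preferences
gamma, lam, alpha = 0.99, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):                                # toy chain: action 1 moves right
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1), s2 == n_states - 1

for episode in range(200):
    s, e_v, e_p = 0, np.zeros_like(V), np.zeros_like(prefs)
    done = False
    while not done:
        probs = np.exp(prefs[s]) / np.exp(prefs[s]).sum()
        a = rng.choice(n_actions, p=probs)
        s2, r, done = step(s, a)
        td = r + (0 if done else gamma * V[s2]) - V[s]   # broadcast TD error
        e_v *= gamma * lam; e_v[s] += 1                  # critic eligibility trace
        e_p *= gamma * lam; e_p[s] -= probs; e_p[s, a] += 1   # grad log-policy trace
        V += alpha * td * e_v
        prefs += alpha * td * e_p
        s = s2
print("learned state values:", np.round(V, 2))
```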

Amplifying Limitations, Harms and Risks of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.04821
  • repo_url: None
  • paper_authors: Michael O’Neill, Mark Connor
  • for: This article is offered as a small counterweight to the growing hype around artificial intelligence (AI) and the distraction of associated science-fiction scenarios, aiming to help readers outside the field better understand the limitations and risks of AI technology.
  • methods: Large language models (LLMs) such as ChatGPT are used as the running example through which the limitations, harms, and risks of AI technology are discussed.
  • results: The article highlights a number of limitations of LLMs, arguing that harms have already arisen and will continue to arise from them, and outlines the associated risks for individuals and organisations using this technology.
    Abstract We present this article as a small gesture in an attempt to counter what appears to be exponentially growing hype around Artificial Intelligence (AI) and its capabilities, and the distraction provided by the associated talk of science-fiction scenarios that might arise if AI should become sentient and super-intelligent. It may also help those outside of the field to become more informed about some of the limitations of AI technology. In the current context of popular discourse AI defaults to mean foundation and large language models (LLMs) such as those used to create ChatGPT. This in itself is a misrepresentation of the diversity, depth and volume of research, researchers, and technology that truly represents the field of AI. AI being a field of research that has existed in software artefacts since at least the 1950's. We set out to highlight a number of limitations of LLMs, and in so doing highlight that harms have already arisen and will continue to arise due to these limitations. Along the way we also highlight some of the associated risks for individuals and organisations in using this technology.

In Time and Space: Towards Usable Adaptive Control for Assistive Robotic Arms

  • paper_url: http://arxiv.org/abs/2307.02933
  • repo_url: None
  • paper_authors: Max Pascher, Kirill Kronhardt, Felix Ferdinand Goldau, Udo Frese, Jens Gerken
  • for: This paper proposes a new control approach that helps users operate assistive robotic arms for grasping and manipulation tasks.
  • methods: Adaptive DoF Mapping Controls (ADMC) are extended with feed-forward multimodal feedback, letting users visually compare the current and the suggested DoF mapping in real time; two variants are contrasted, one that continuously recommends updated DoF combinations and one that uses discrete thresholds between current robot movements and new recommendations.
  • results: In a Virtual Reality in-person study, the ADMC variants lower task completion time, reduce mode switches, and decrease perceived workload compared with classic mode switching; the lack of clear quantitative differences between the Continuous and Threshold variants highlights the importance of user-centered customization options.
    Abstract Robotic solutions, in particular robotic arms, are becoming more frequently deployed for close collaboration with humans, for example in manufacturing or domestic care environments. These robotic arms require the user to control several Degrees-of-Freedom (DoFs) to perform tasks, primarily involving grasping and manipulating objects. Standard input devices predominantly have two DoFs, requiring time-consuming and cognitively demanding mode switches to select individual DoFs. Contemporary Adaptive DoF Mapping Controls (ADMCs) have shown to decrease the necessary number of mode switches but were up to now not able to significantly reduce the perceived workload. Users still bear the mental workload of incorporating abstract mode switching into their workflow. We address this by providing feed-forward multimodal feedback using updated recommendations of ADMC, allowing users to visually compare the current and the suggested mapping in real-time. We contrast the effectiveness of two new approaches that a) continuously recommend updated DoF combinations or b) use discrete thresholds between current robot movements and new recommendations. Both are compared in a Virtual Reality (VR) in-person study against a classic control method. Significant results for lowered task completion time, fewer mode switches, and reduced perceived workload conclusively establish that in combination with feedforward, ADMC methods can indeed outperform classic mode switching. A lack of apparent quantitative differences between Continuous and Threshold reveals the importance of user-centered customization options. Including these implications in the development process will improve usability, which is essential for successfully implementing robotic technologies with high user acceptance.

LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias

  • paper_url: http://arxiv.org/abs/2307.02912
  • repo_url: None
  • paper_authors: Mario Almagro, Emilio Almazán, Diego Ortego, David Jiménez
  • for: The goal is to make Transformer cross-encoders for sentence similarity more robust to textual noise such as typos, improving downstream performance across multiple domains.
  • methods: A novel LExical-aware Attention module (LEA) incorporates lexical similarities between words of the two sentences into the cross-encoder, avoiding the tokenization shift induced by typos (a simplified attention-bias sketch follows this entry).
  • results: Experiments show that LEA consistently boosts cross-encoder performance in the presence of noise while remaining competitive on the original (clean) splits across domains.
    Abstract Textual noise, such as typos or abbreviations, is a well-known issue that penalizes vanilla Transformers for most downstream tasks. We show that this is also the case for sentence similarity, a fundamental task in multiple domains, e.g. matching, retrieval or paraphrasing. Sentence similarity can be approached using cross-encoders, where the two sentences are concatenated in the input allowing the model to exploit the inter-relations between them. Previous works addressing the noise issue mainly rely on data augmentation strategies, showing improved robustness when dealing with corrupted samples that are similar to the ones used for training. However, all these methods still suffer from the token distribution shift induced by typos. In this work, we propose to tackle textual noise by equipping cross-encoders with a novel LExical-aware Attention module (LEA) that incorporates lexical similarities between words in both sentences. By using raw text similarities, our approach avoids the tokenization shift problem obtaining improved robustness. We demonstrate that the attention bias introduced by LEA helps cross-encoders to tackle complex scenarios with textual noise, specially in domains with short-text descriptions and limited context. Experiments using three popular Transformer encoders in five e-commerce datasets for product matching show that LEA consistently boosts performance under the presence of noise, while remaining competitive on the original (clean) splits. We also evaluate our approach in two datasets for textual entailment and paraphrasing showing that LEA is robust to typos in domains with longer sentences and more natural context. Additionally, we thoroughly analyze several design choices in our approach, providing insights about the impact of the decisions made and fostering future research in cross-encoders dealing with typos.
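A stripped-down version of the idea is to add a lexical-similarity matrix to the attention logits so that look-alike tokens from the two sentences attend to each other even when their subword embeddings diverge (e.g. under typos). The character-overlap measure, scale factor, and single-head attention below are simplifications, not the paper's exact formulation of LEA.

```python
# Hedged sketch: attention logits biased by raw-text similarity between token pairs.
import difflib
import torch

def lexical_bias(tokens_a, tokens_b, scale=2.0):
    """Character-level similarity matrix between the two token sequences."""
    bias = torch.zeros(len(tokens_a), len(tokens_b))
    for i, ta in enumerate(tokens_a):
        for j, tb in enumerate(tokens_b):
            bias[i, j] = difflib.SequenceMatcher(None, ta, tb).ratio()
    return scale * bias

def biased_attention(q, k, v, bias):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + bias
    return torch.softmax(scores, dim=-1) @ v

tokens_a = ["cheap", "runing", "shoes"]        # query containing a typo
tokens_b = ["running", "shoe", "discount"]
d = 16
q, k, v = (torch.randn(len(tokens_a), d), torch.randn(len(tokens_b), d),
           torch.randn(len(tokens_b), d))
out = biased_attention(q, k, v, lexical_bias(tokens_a, tokens_b))
print(out.shape)   # (3, 16): each query token now also attends via lexical overlap
```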

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

  • paper_url: http://arxiv.org/abs/2307.02909
  • repo_url: None
  • paper_authors: Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu
  • for: This paper proposes an audio-visual multi-channel approach to joint speech separation, dereverberation, and recognition, improving recognition accuracy for overlapped, noisy, and reverberant cocktail-party speech.
  • methods: A front-end combining mask-based MVDR speech separation with DNN-WPE or spectral mapping (SpecM) based dereverberation is paired with a Conformer ASR back-end; in the audio-visual integrated front-end, separation and dereverberation are performed in a pipelined or joint fashion via mask-based WPD, and the front-end and back-end are jointly fine-tuned end-to-end.
  • results: The proposed audio-visual multi-channel systems outperform the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate reductions, with consistent improvements in PESQ, STOI, and SRMR scores.
    Abstract Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is proposed in this paper. The efficacy of the video input is consistently demonstrated in mask-based MVDR speech separation, DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and Conformer ASR back-end. Audio-visual integrated front-end architectures performing speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and ASR back-end components is minimized by end-to-end jointly fine-tuning using either the ASR cost function alone, or its interpolation with the speech enhancement loss. Experiments were conducted on the mixture overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores.

BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables

  • paper_url: http://arxiv.org/abs/2307.02891
  • repo_url: https://github.com/babe-algorithm/babe
  • paper_authors: Ruta Binkyte, Daniele Gorla, Catuscia Palamidessi
  • for: This work addresses unfair discrimination between two groups when the legitimate explanatory variable E that should determine the decision is correlated with the sensitive attribute S, proposing a pre-processing method to achieve fairness.
  • methods: Since E is often latent and only a possibly biased proxy Z is observed, BaBE (Bayesian Bias Elimination) combines Bayesian inference with the Expectation-Maximization method to estimate the most likely value of E for a given Z in each group; decisions are then based directly on the estimated E.
  • results: Experiments on synthetic and real data sets show that BaBE provides a good level of fairness as well as high accuracy.
    Abstract We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to bad accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular, conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.

Learning to Solve Tasks with Exploring Prior Behaviours

  • paper_url: http://arxiv.org/abs/2307.02889
  • repo_url: https://github.com/ricky-zhu/irdec
  • paper_authors: Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan
  • for: Solving sparse-reward tasks in deep reinforcement learning (DRL) when the initial conditions differ from those of the available demonstrations.
  • methods: Intrinsic Rewards Driven Example-based Control (IRDEC) lets the agent explore and acquire the required prior behaviours and then connect them to the task-specific behaviours in the demonstration, without requiring additional demonstrations of the prior behaviours (a generic intrinsic-reward sketch follows this entry).
  • results: IRDEC outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Code is available at https://github.com/Ricky-Zhu/IRDEC.
    Abstract Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of \emph{picking up an object from an open drawer}, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Rewards Driven Example-based Control \textbf{(IRDEC)}. Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.
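IRDEC's exact intrinsic signal is not reproduced here, but the general shape of intrinsic-reward-driven exploration is easy to sketch: add a novelty bonus to the sparse task reward so the agent is incentivised to acquire prior behaviours (e.g. opening the drawer) before the task reward ever fires. The count-based bonus, discretisation, and coefficient below are generic placeholder choices, not the paper's method.

```python
# Hedged sketch: a count-based novelty bonus added to a sparse extrinsic reward.
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def shaped_reward(obs, extrinsic_reward, beta=0.1):
    key = tuple(np.round(obs, 1))          # coarse discretization of the observation
    visit_counts[key] += 1
    intrinsic = beta / np.sqrt(visit_counts[key])
    return extrinsic_reward + intrinsic

# Usage inside any RL training loop (env is hypothetical):
#   next_obs, r, done, info = env.step(action)
#   r = shaped_reward(next_obs, r)
print(shaped_reward(np.array([0.12, -0.3]), 0.0))
```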

Contrast Is All You Need

  • paper_url: http://arxiv.org/abs/2307.02882
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Burak Kilic, Florix Bex, Albert Gatt
  • for: This study analyzes data-scarce classification scenarios in which the available labeled legal data is small and imbalanced, potentially hurting the quality of the results.
  • methods: Two fine-tuning objectives are compared on a legal provision classification task: SetFit (Sentence Transformer Fine-tuning), a contrastive learning setup, and a vanilla fine-tuning setup; LIME (Local Interpretable Model-agnostic Explanations) is used to inspect which features contributed to the models' classification decisions (a SetFit-style sketch follows this entry).
  • results: The contrastive SetFit setup performs better than vanilla fine-tuning while using a fraction of the training samples, and LIME shows that it boosts both positive and negative legally informative features, suggesting that a contrastively fine-tuned model bases its decisions more confidently on legally informative features.
    Abstract In this study, we analyze data-scarce classification scenarios, where available labeled legal data is small and imbalanced, potentially hurting the quality of the results. We focused on two finetuning objectives; SetFit (Sentence Transformer Finetuning), a contrastive learning setup, and a vanilla finetuning setup on a legal provision classification task. Additionally, we compare the features that are extracted with LIME (Local Interpretable Model-agnostic Explanations) to see which particular features contributed to the model's classification decisions. The results show that a contrastive setup with SetFit performed better than vanilla finetuning while using a fraction of the training samples. LIME results show that the contrastive learning approach helps boost both positive and negative features which are legally informative and contribute to the classification results. Thus a model finetuned with a contrastive objective seems to base its decisions more confidently on legally informative features.
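A few-shot contrastive fine-tuning run in the SetFit style can be sketched with the setfit package; the API below matches early (pre-1.0) releases of that library, and the two toy legal sentences, labels, and hyperparameters are placeholders rather than the paper's setup.

```python
# Hedged sketch: SetFit-style few-shot contrastive fine-tuning (setfit < 1.0 API).
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

train_ds = Dataset.from_dict({
    "text": ["The tenant shall pay rent monthly.", "The supplier warrants the goods."],
    "label": [0, 1],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    loss_class=CosineSimilarityLoss,   # contrastive objective over generated sentence pairs
    num_iterations=20,                 # number of pairs generated per training sample
)
trainer.train()
print(model(["The lessee must pay rent each month."]))   # predicted provision label
```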

Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain

  • paper_url: http://arxiv.org/abs/2307.02867
  • repo_url: None
  • paper_authors: Marc Zeller, Thomas Waschulzik, Reiner Schmid, Claus Bahlmann
  • for: This paper explores how to enable reliable and efficient driverless train operation (Grade of Automation 4, GoA 4) on non-restricted infrastructure.
  • methods: Machine learning (ML) is used to realize the required perception tasks, and a safe MLOps process is proposed so that ML models can be deployed reliably and continuously adapted to changing operating conditions.
  • results: The paper outlines a comprehensive workflow that integrates system engineering, safety assurance, and the ML life-cycle, presents the individual stages and their interactions, and describes the challenges in automating the different stages of the process.
    Abstract Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.

Enhancing LLM with Evolutionary Fine Tuning for News Summary Generation

  • paper_url: http://arxiv.org/abs/2307.02839
  • repo_url: None
  • paper_authors: Le Xiao, Xiaolin Chen
  • for: This work proposes a new paradigm for news summary generation that uses an LLM to extract, evolve, and select structured event patterns in order to generate accurate and reliable news summaries.
  • methods: The LLM extracts multiple structured event patterns from the events contained in news paragraphs, a genetic algorithm evolves the event pattern population, and the most adaptive pattern is fed back into the LLM to generate the summary (a toy genetic-algorithm sketch follows this entry).
  • results: Experimental results show that the News Summary Generator produces accurate and reliable summaries with some generalization ability.
    Abstract News summary generation is an important task in the field of intelligence analysis, which can provide accurate and comprehensive information to help people better understand and respond to complex real-world events. However, traditional news summary generation methods face some challenges, which are limited by the model itself and the amount of training data, as well as the influence of text noise, making it difficult to generate reliable information accurately. In this paper, we propose a new paradigm for news summary generation using LLM with powerful natural language understanding and generative capabilities. We use LLM to extract multiple structured event patterns from the events contained in news paragraphs, evolve the event pattern population with genetic algorithm, and select the most adaptive event pattern to input into the LLM to generate news summaries. A News Summary Generator (NSG) is designed to select and evolve the event pattern populations and generate news summaries. The experimental results show that the news summary generator is able to generate accurate and reliable news summaries with some generalization ability.
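The evolutionary loop over event patterns can be sketched independently of the LLM: represent a pattern as a list of slots, score it with a fitness function, and apply selection, crossover, and mutation. The slot vocabulary, fitness function, and operators below are hypothetical placeholders; in the paper the patterns are extracted by the LLM and the fittest one is fed back to it as the summarisation template.

```python
# Hedged sketch: a genetic algorithm over event-pattern candidates.
import random

SLOTS = ["who", "what", "when", "where", "why", "how", "outcome"]

def fitness(pattern):                       # placeholder: prefer compact, varied patterns
    return len(set(pattern)) - 0.3 * len(pattern)

def mutate(pattern):
    p = pattern.copy()
    if random.random() < 0.5 and len(p) > 2:
        p.pop(random.randrange(len(p)))     # drop a slot
    else:
        p.append(random.choice(SLOTS))      # add a slot
    return p

def crossover(a, b):
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

population = [random.sample(SLOTS, 4) for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                              # selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
best = max(population, key=fitness)
print("most adaptive event pattern:", best)   # would be handed to the LLM as a template
```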

Read, Look or Listen? What’s Needed for Solving a Multimodal Dataset

  • paper_url: http://arxiv.org/abs/2307.04532
  • repo_url: None
  • paper_authors: Netta Madvil, Yonatan Bitton, Roy Schwartz
  • for: This paper analyzes the quality of large-scale multimodal datasets.
  • methods: A small seed of human annotation is used to map each multimodal instance to the modalities actually required to process it, shedding light on the importance of different modalities and the relationships between them.
  • results: Most TVQA questions can be answered from a single modality, with no substantial bias towards any particular one, and more than 70% are solvable with several different single-modality strategies (e.g., video only or audio only), indicating limited integration of multiple modalities. MERLOT Reserve struggles with image-based questions relative to text and audio, and with auditory speaker identification. Based on these observations, a new test set requiring multiple modalities is introduced, on which model performance drops dramatically.
    Abstract The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship between them. We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality. Moreover, we find that more than 70% of the questions are solvable using several different single-modality strategies, e.g., by either looking at the video or listening to the audio, highlighting the limited integration of multiple modalities in TVQA. We leverage our annotation and analyze the MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification. Based on our observations, we introduce a new test set that necessitates multiple modalities, observing a dramatic drop in model performance. Our methodology provides valuable insights into multimodal datasets and highlights the need for the development of more robust models.

Evaluating raw waveforms with deep learning frameworks for speech emotion recognition

  • paper_url: http://arxiv.org/abs/2307.02820
  • repo_url: None
  • paper_authors: Zeynep Hilal Kilimci, Ulku Bayraktar, Ayhan Kucukmanisa
  • for: The paper targets speech emotion recognition, feeding raw audio files directly into deep neural networks without a separate feature extraction stage.
  • methods: The proposed models combine machine learning algorithms, ensemble learning methods, and deep learning techniques, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and hybrid CNN-LSTM models (a raw-waveform CNN sketch follows this entry).
  • results: The proposed models achieve state-of-the-art performance on six data sets (TESS+RAVDESS, EMO-DB, RAVDESS, CREMA, SAVEE, and TESS), with accuracy rates ranging from 90.34% to 95.86%; in particular, the CNN model outperforms existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files.
    Abstract Speech emotion recognition is a challenging task in speech processing field. For this reason, feature extraction process has a crucial importance to demonstrate and process the speech signals. In this work, we represent a model, which feeds raw audio files directly into the deep neural networks without any feature extraction stage for the recognition of emotions utilizing six different data sets, EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of proposed model, the performance of traditional feature extraction techniques namely, mel-scale spectogram, mel-frequency cepstral coefficients, are blended with machine learning algorithms, ensemble learning methods, deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, random forests models are evaluated as machine learning algorithms while majority voting and stacking methods are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and hybrid CNN- LSTM model are evaluated as deep learning techniques and compared with machine learning and ensemble learning methods. To demonstrate the effectiveness of proposed model, the comparison with state-of-the-art studies are carried out. Based on the experiment results, CNN model excels existent approaches with 95.86% of accuracy for TESS+RAVDESS data set using raw audio files, thence determining the new state-of-the-art. The proposed model performs 90.34% of accuracy for EMO-DB with CNN model, 90.42% of accuracy for RAVDESS with CNN model, 99.48% of accuracy for TESS with LSTM model, 69.72% of accuracy for CREMA with CNN model, 85.76% of accuracy for SAVEE with CNN model in speaker-independent audio categorization problems.
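A minimal raw-waveform classifier makes the "no feature extraction stage" point concrete: 1D convolutions consume the audio samples directly and a linear head predicts the emotion class. The layer sizes, strides, and the eight-class head below are illustrative, not the paper's exact architecture.

```python
# Hedged sketch: a 1D CNN that classifies emotions from raw waveforms.
import torch
from torch import nn

class RawWaveformCNN(nn.Module):
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, waveform):            # waveform: (batch, samples)
        x = self.features(waveform.unsqueeze(1))
        return self.classifier(x.squeeze(-1))

model = RawWaveformCNN()
batch = torch.randn(4, 48000)               # four 3-second clips at 16 kHz
print(model(batch).shape)                    # torch.Size([4, 8])
```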

Semi-supervised Domain Adaptive Medical Image Segmentation through Consistency Regularized Disentangled Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.02798
  • repo_url: https://github.com/hritam-98/gfda-disentangled
  • paper_authors: Hritam Basak, Zhaozheng Yin
  • for: This paper focuses on semi-supervised domain adaptation (SSDA) for medical image segmentation, which can improve adaptation performance with only a few labeled target samples.
  • methods: The proposed method uses a two-stage training process: pre-training an encoder with a novel domain-content disentangled contrastive learning (CL) objective, then fine-tuning the encoder and decoder for pixel-level segmentation in a semi-supervised setting. The CL enforces the encoder to learn discriminative content-specific but domain-invariant semantics, while consistency regularization maintains spatial sensitivity.
  • results: The proposed method outperforms state-of-the-art (SoTA) methods in both SSDA and unsupervised domain adaptation (UDA) settings, demonstrating its effectiveness with limited labeled target samples.
    Abstract Although unsupervised domain adaptation (UDA) is a promising direction to alleviate domain shift, they fall short of their supervised counterparts. In this work, we investigate relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation, where access to a few labeled target samples can improve the adaptation performance substantially. Specifically, we propose a two-stage training process. First, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) along with a pixel-level feature consistency constraint. The proposed CL enforces the encoder to learn discriminative content-specific but domain-invariant semantics on a global scale from the source and target images, whereas consistency regularization enforces the mining of local pixel-level information by maintaining spatial sensitivity. This pre-trained encoder, along with a decoder, is further fine-tuned for the downstream task, (i.e. pixel-level segmentation) using a semi-supervised setting. Furthermore, we experimentally validate that our proposed method can easily be extended for UDA settings, adding to the superiority of the proposed strategy. Upon evaluation on two domain adaptive image segmentation tasks, our proposed method outperforms the SoTA methods, both in SSDA and UDA settings. Code is available at https://github.com/hritam-98/GFDA-disentangled
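
For intuition about the contrastive pre-training stage, the sketch below shows a generic InfoNCE-style loss that pulls paired source/target content embeddings together and pushes all other pairs apart; the paper's domain-content disentanglement and pixel-level consistency terms are not reproduced here.

```python
# Minimal InfoNCE-style contrastive objective: corresponding source/target content
# embeddings are positives (diagonal), everything else is a negative.
import torch
import torch.nn.functional as F

def info_nce(z_src: torch.Tensor, z_tgt: torch.Tensor, temperature: float = 0.1):
    # z_src, z_tgt: (batch, dim) content embeddings from a shared encoder
    z_src = F.normalize(z_src, dim=1)
    z_tgt = F.normalize(z_tgt, dim=1)
    logits = z_src @ z_tgt.t() / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(z_src.size(0))       # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```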

BHEISR: Nudging from Bias to Balance – Promoting Belief Harmony by Eliminating Ideological Segregation in Knowledge-based Recommendations

  • paper_url: http://arxiv.org/abs/2307.02797
  • repo_url: None
  • paper_authors: Mengyan Wang, Yuxuan Hu, Zihan Yuan, Chenting Jiang, Weihua Li, Shiqing Wu, Quan Bai
  • for: Addressing the issue of belief imbalance and user biases in personalized recommendation systems, and mitigating the negative effects of the filter bubble phenomenon.
  • methods: Introducing an innovative intermediate agency (BHEISR) that combines principles from nudge theory and user-specific category information to stimulate curiosity and broaden users’ belief horizons.
  • results: Experimental results show that the BHEISR model outperforms several baseline models in mitigating filter bubbles and balancing user perspectives, with improved recommendation diversity and user satisfaction.
    Abstract In the realm of personalized recommendation systems, the increasing concern is the amplification of belief imbalance and user biases, a phenomenon primarily attributed to the filter bubble. Addressing this critical issue, we introduce an innovative intermediate agency (BHEISR) between users and existing recommendation systems to attenuate the negative repercussions of the filter bubble effect in extant recommendation systems. The main objective is to strike a belief balance for users while minimizing the detrimental influence caused by filter bubbles. The BHEISR model amalgamates principles from nudge theory while upholding democratic and transparent principles. It harnesses user-specific category information to stimulate curiosity, even in areas users might initially deem uninteresting. By progressively stimulating interest in novel categories, the model encourages users to broaden their belief horizons and explore the information they typically overlook. Our model is time-sensitive and operates on a user feedback loop. It utilizes the existing recommendation algorithm of the model and incorporates user feedback from the prior time frame. This approach endeavors to transcend the constraints of the filter bubble, enrich recommendation diversity, and strike a belief balance among users while also catering to user preferences and system-specific business requirements. To validate the effectiveness and reliability of the BHEISR model, we conducted a series of comprehensive experiments with real-world datasets. These experiments compared the performance of the BHEISR model against several baseline models using nearly 200 filter bubble-impacted users as test subjects. Our experimental results conclusively illustrate the superior performance of the BHEISR model in mitigating filter bubbles and balancing user perspectives.
    摘要 在个性化推荐系统领域,当前的关注点是信念失衡和用户偏见被不断放大的问题,这主要归因于过滤气泡效应。为解决这一关键问题,我们提出了一种创新的中间机制(BHEISR),位于用户和现有推荐系统之间,以减轻现有推荐系统中过滤气泡效应的负面影响。BHEISR模型结合了助推理论(nudge theory)的原则,同时保持民主和透明的原则。它利用用户特定的类别信息来激发好奇心,使用户对原本可能不感兴趣的领域产生兴趣。通过逐步激发用户对新类别的兴趣,模型引导用户扩大其信念范围,探索通常被忽略的信息。我们的模型具有时间敏感性,通过用户反馈循环来运作:它利用模型现有的推荐算法,并将用户上一时间段的反馈纳入考虑。这种方法旨在突破过滤气泡的限制,提高推荐多样性,并在满足用户偏好和系统业务需求的同时维持用户信念的平衡。为验证BHEISR模型的有效性和可靠性,我们使用真实世界数据集进行了一系列完整的实验,以近200名受过滤气泡影响的用户作为测试对象,比较了BHEISR模型与多个基线模型的性能。实验结果清楚地表明,BHEISR模型在缓解过滤气泡和平衡用户视角方面表现出色。

What Should Data Science Education Do with Large Language Models?

  • paper_url: http://arxiv.org/abs/2307.02792
  • repo_url: None
  • paper_authors: Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang
  • for: 这个论文主要针对的是数据科学和统计领域的大语言模型(LLMs)的快速发展,以及这些state-of-the-art工具如何改变数据科学家的角色和职责。
  • methods: 这篇论文使用了各种LLMs,如ChatGPT,来描述数据科学家的角色发展和教育改革。
  • results: 论文认为,LLMs将改变数据科学家的职责,从手动编程、数据处理和标准分析中转移到评估和管理由自动化AI进行的分析。这种角色转变类似于软件工程师转变为产品经理。
    Abstract The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, it reshapes the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skillsets among students, such as LLM-informed creativity, critical thinking, AI-guided programming. LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it's crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs while fostering complementary human expertise and innovations. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.
    摘要 大 language models (LLMs) 的快速发展,如 ChatGPT,正在改变数据科学和统计领域。这些先进工具可以快速完成复杂的任务。因此,它们正在改变数据科学家的职责,从手动编程、数据处理和执行标准分析中转移注意力到评估和管理由这些自动化 AI 执行的分析。这种职业发展类似于软件工程师转变为产品经理。我们通过使用 LLMs 进行具体的数据科学案例研究, Illustrates 这种职业转变。这些发展需要数据科学教育做出深刻的改变。教学方法现需要更多地强调学生培养多样化的技能集,如 LLM Informed 创造力、批判性思维和 AI 领导编程。LLMs 也可以在教室中作为互动教学工具,贡献个性化教育。这篇文章讨论了这些方向的机会、资源和开放挑战。与任何转型技术一样,将 LLMs integrated 到教育中需要仔细考虑。虽然 LLMs 可以高效完成 repetitive 任务,但是它们的角色是补充人类智能和创造力,不是取代它们。因此,新的数据科学教育时代应该平衡 LLMS 的利点,同时培养补充人类专业知识和创新。总之,LLMs 的出现标志着数据科学和其教育的转型期。这篇文章 hoping 通过探讨 emerging 趋势、可能的机会和挑战,引发更多的讨论和调查,深入探索这个新、未知的领域。

The Role of Subgroup Separability in Group-Fair Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.02791
  • repo_url: https://github.com/biomedia-mira/subgroup-separability
  • paper_authors: Charles Jones, Mélanie Roschewitz, Ben Glocker
  • for: 本研究探讨深度分类器的性能差异。
  • methods: 本研究使用 teoretic analysis 和广泛的实验评估,探讨模型在不平等数据上训练时的性能差异。
  • results: 研究发现,不同医疗成像方式和保护特征下的模型性能差异有很大,而且这种差异可以预测模型偏见。这些发现提供了开发公平医疗AI的重要洞察。
    Abstract We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
    摘要 我们研究深度分类器的性能差异。我们发现,在不同的医疗影像模态和保护特征下,分类器将个体划分到各个子群体的能力差异很大;关键的是,这一性质能够预测算法偏见。通过理论分析和广泛的实验评估,我们发现当模型在带有系统性偏差(如漏诊)的数据上训练时,子群体可分性、子群体差异与性能下降之间存在关联。我们的发现为理解模型如何产生偏见提供了新的见解,对开发公平的医疗影像AI具有重要意义。
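
A simple way to operationalize subgroup separability, in the spirit of the paper, is the test AUC of a classifier trained to predict the protected attribute from the inputs; the sketch below does this on synthetic tabular data standing in for medical images, so the numbers are purely illustrative.

```python
# Minimal sketch: measure how separable a protected subgroup is from the inputs
# by the test AUC of a logistic regression predicting the protected attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))                                   # stand-in features
group = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)   # protected attribute

X_tr, X_te, g_tr, g_te = train_test_split(X, group, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, g_tr)
separability = roc_auc_score(g_te, clf.predict_proba(X_te)[:, 1])
print(f"subgroup separability (AUC): {separability:.2f}")
```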

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

  • paper_url: http://arxiv.org/abs/2307.02770
  • repo_url: https://github.com/tetrzim/diffusion-human-feedback
  • paper_authors: TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu
  • for: 这个研究用于约束 diffusion 模型生成高品质图像,并解决偶发生部分错配的问题。
  • methods: 使用预训 diffusion 模型,并使用一个以少量人工回馈训练的奖励模型来约束生成。
  • results: 这个方法可以实现高效的人工回馈,并且只需要几分钟的人工回馈来生成足够的标签。
    Abstract Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.
    摘要 吸引模型最近已经显示出很好的成像质量。然而,有时候预训 diffusion model 会出现部分不稳定性,即模型可以生成好像,但也可能生成不想要的像。如果如此,我们只需要防止生成坏像,我们称这个任务为审查(censoring)。在这个工作中,我们使用预训 diffusion model 和少量人类反馈生成的 reward model,以审查生成的像。我们展示了审查可以通过EXTREME human feedback efficiency来实现,并且只需要几分钟的人类反馈就能生成足够的标签。代码可以在 GitHub 上找到:https://github.com/tetrzim/diffusion-human-feedback。
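
The sketch below illustrates the censoring idea in its simplest rejection-style form: a reward model trained on a few human labels scores candidate samples, and low-scoring ones are discarded. The paper integrates the reward model into the diffusion sampling process itself, so this is only a conceptual stand-in, and the toy generator and reward model are placeholders.

```python
# Minimal sketch of censored generation by rejection: keep only samples the
# reward model (trained on minimal human feedback) deems acceptable.
import torch

def censored_generation(generate, reward_model, n_wanted: int, threshold: float = 0.5):
    kept = []
    while len(kept) < n_wanted:
        batch = generate(16)                          # e.g. a pre-trained diffusion sampler
        scores = torch.sigmoid(reward_model(batch))   # P(sample is acceptable)
        kept.extend(img for img, s in zip(batch, scores) if s.item() >= threshold)
    return kept[:n_wanted]

# toy stand-ins so the sketch runs end-to-end
generate = lambda n: torch.randn(n, 3, 32, 32)
reward_model = lambda x: x.mean(dim=(1, 2, 3))
samples = censored_generation(generate, reward_model, n_wanted=8)
print(len(samples))
```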

PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations

  • paper_url: http://arxiv.org/abs/2307.02762
  • repo_url: https://github.com/bcdnlp/PRD
  • paper_authors: Ruosen Li, Teerth Patel, Xinya Du
  • for: 提高自然语言处理(NLP)模型的评估和比较自动化
  • methods: 使用参考自由的大语言模型(LLM)作为评估标准,并提出两种改进方法: peer rank(PR)算法和 peer discussion(PD)
  • results: 实验结果显示,我们的方法可以提高评估准确率和与人类评价更加一致,同时PR算法可以在匿名设定下实现相对准确的自我评估。
    Abstract Nowadays, the quality of responses generated by different modern large language models (LLMs) are hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs as a reference-free metric for open-ended question answering. More specifically, they use the recognized "strongest" LLM as the evaluator, which conducts pairwise comparisons of candidate models' answers and provides a ranking score. However, this intuitive method has multiple problems, such as bringing in self-enhancement (favoring its own answers) and positional bias. We draw insights and lessons from the educational domain (Cho and MacArthur, 2011; Walsh, 2014) to improve LLM-based evaluations. Specifically, we propose the (1) peer rank (PR) algorithm that takes into account each peer LLM's pairwise preferences of all answer pairs, and outputs a final ranking of models; and (2) peer discussion (PD), where we prompt two LLMs to discuss and try to reach a mutual agreement on preferences of two answers. We conduct experiments on two benchmark datasets. We find that our approaches achieve higher accuracy and align better with human judgments, respectively. Interestingly, PR can induce a relatively accurate self-ranking of models under the anonymous setting, where each model's name is unrevealed. Our work provides space to explore evaluating models that are hard to compare for humans.
    摘要 如今,不同的现代大语言模型(LLM)生成的回答质量很难自动评估和比较。近期研究提出并普遍采用LLM作为开放式问答的无参考评估指标:通常使用公认“最强”的LLM作为评估者,对候选模型的回答进行两两比较并给出排名分数。然而,这种直观的方法存在多个问题,例如自我偏好(偏向自己的回答)和位置偏差。我们借鉴教育领域的经验(Cho and MacArthur, 2011; Walsh, 2014)来改进基于LLM的评估。具体而言,我们提出:(1)同行排名(peer rank, PR)算法,综合考虑每个同行LLM对所有回答对的两两偏好,输出模型的最终排名;(2)同行讨论(peer discussion, PD),即引导两个LLM就两个回答的优劣进行讨论并尝试达成一致。我们在两个基准数据集上进行了实验,发现我们的方法能够取得更高的准确率,并与人类判断更为一致。有趣的是,在模型名称不公开的匿名设定下,PR还能得到相对准确的模型自我排名。我们的工作为评估人类难以比较的模型提供了探索空间。
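
The sketch below shows a simplified peer-rank-style aggregation: reviewer models supply pairwise preferences over candidate answers and candidates are ranked by weighted win rate. The model names are made up, and the paper's iterative reviewer weighting is replaced here by uniform weights for simplicity.

```python
# Minimal sketch: aggregate pairwise preferences from several reviewer models into a ranking.
from collections import defaultdict

def peer_rank(preferences, reviewer_weight=None):
    """preferences: list of (reviewer, winner, loser) tuples from pairwise battles."""
    weight = reviewer_weight or (lambda r: 1.0)
    wins, battles = defaultdict(float), defaultdict(float)
    for reviewer, winner, loser in preferences:
        w = weight(reviewer)
        wins[winner] += w
        battles[winner] += w
        battles[loser] += w
    return sorted(battles, key=lambda m: wins[m] / battles[m], reverse=True)

# hypothetical reviewers and candidates, for illustration only
prefs = [("reviewer_a", "model_x", "model_y"),
         ("reviewer_b", "model_x", "model_y"),
         ("reviewer_a", "model_y", "model_z")]
print(peer_rank(prefs))   # ['model_x', 'model_y', 'model_z']
```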

Knowledge Graph Self-Supervised Rationalization for Recommendation

  • paper_url: http://arxiv.org/abs/2307.02759
  • repo_url: https://github.com/hkuds/kgrec
  • paper_authors: Yuhao Yang, Chao Huang, Lianghao Xia, Chunzhen Huang
  • for: 本文提出了一种新的自监理解方法,叫做 KGRec,用于知识感知推荐系统。
  • methods: 本文提出了一种注意力机制,用于生成有用的知识连接。这个机制根据知识三元组的 rational scores 进行评分,并将这些 scores 用于推荐系统中的生成和对比自我监管任务。
  • results: 对三个实际数据集进行了广泛的实验,显示 KGRec 可以超过当前的方法。此外,我们还提供了实现代码,可以在 https://github.com/HKUDS/KGRec 中找到。
    Abstract In this paper, we introduce a new self-supervised rationalization method, called KGRec, for knowledge-aware recommender systems. To effectively identify informative knowledge connections, we propose an attentive knowledge rationalization mechanism that generates rational scores for knowledge triplets. With these scores, KGRec integrates generative and contrastive self-supervised tasks for recommendation through rational masking. To highlight rationales in the knowledge graph, we design a novel generative task in the form of masking-reconstructing. By masking important knowledge with high rational scores, KGRec is trained to rebuild and highlight useful knowledge connections that serve as rationales. To further rationalize the effect of collaborative interactions on knowledge graph learning, we introduce a contrastive learning task that aligns signals from knowledge and user-item interaction views. To ensure noise-resistant contrasting, potential noisy edges in both graphs judged by the rational scores are masked. Extensive experiments on three real-world datasets demonstrate that KGRec outperforms state-of-the-art methods. We also provide the implementation codes for our approach at https://github.com/HKUDS/KGRec.
    摘要 在本文中,我们提出了一种面向知识感知推荐系统的新型自监督推理化方法KGRec。为了有效识别富含信息的知识连接,我们提出了一种注意力式的知识推理机制,为每个知识三元组生成推理得分(rational scores)。基于这些得分,KGRec通过推理掩码将生成式与对比式自监督任务结合起来用于推荐。为了突出知识图中的推理依据,我们设计了一种掩码-重建(masking-reconstructing)形式的生成任务:将推理得分较高的重要知识掩码,训练KGRec重建并强调这些作为推理依据的有用知识连接。此外,为了进一步合理化协同交互对知识图学习的影响,我们引入了一种对比学习任务,对齐来自知识视图和用户-物品交互视图的信号。为保证对比过程的抗噪性,两个图中由推理得分判定的潜在噪声边会被掩码。在三个真实数据集上的大量实验表明,KGRec优于现有的最先进方法。实现代码见 https://github.com/HKUDS/KGRec。
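
The sketch below illustrates the rational-masking step only: triplets receive attention-style scores and the highest-scoring ones are selected for masking and later reconstruction. The toy dot-product scorer and mask ratio are illustrative, not KGRec's actual relation-aware scoring or losses.

```python
# Minimal sketch: score knowledge triplets and mask the top-rationale ones for reconstruction.
import torch

def mask_top_rationales(head_emb, rel_emb, tail_emb, mask_ratio: float = 0.2):
    scores = (head_emb * rel_emb * tail_emb).sum(-1)   # toy rational score per triplet
    k = max(1, int(mask_ratio * scores.numel()))
    masked_idx = scores.topk(k).indices                # triplets to mask & rebuild
    keep = torch.ones_like(scores, dtype=torch.bool)
    keep[masked_idx] = False                           # False = masked out of the graph
    return masked_idx, keep

h, r, t = (torch.randn(100, 64) for _ in range(3))
masked_idx, keep_mask = mask_top_rationales(h, r, t)
print(masked_idx.shape, keep_mask.sum().item())
```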

Offline Reinforcement Learning with Imbalanced Datasets

  • paper_url: http://arxiv.org/abs/2307.02752
  • repo_url: None
  • paper_authors: Li Jiang, Sijie Chen, Jielin Qiu, Haoran Xu, Wai Kin Chan, Zhao Ding
  • for: This paper aims to address the issue of imbalanced datasets in offline reinforcement learning (RL) research, which can lead to neglect of real-world dataset distributions in the development of models.
  • methods: The proposed method utilizes the augmentation of conservative Q-learning (CQL) with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets.
  • results: The proposed method is shown to be superior to other baselines through empirical results on several tasks in the context of imbalanced datasets with varying levels of imbalance.
    Abstract The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. The real-world offline RL dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typically offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, utilizing the variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.
    摘要 现有的大量实验算法(RL)研究中的标准实践是将注重在实验 dataset 的平衡性,导致实验 dataset 的不寻常性被忽略。实际上,现世界的 offline RL 资料集经常具有体积不寻常的状态空间分布,这可能是因为探索或安全考虑所带来的挑战。在这篇文章中,我们详细描述了实际上的 offline RL 资料集的不寻常性,其状态覆盖率遵循力量律分布,并且政策具有偏好性。我们理论和实验显示,通常的 offline RL 方法,如保守 Q-学习(CQL),在不寻常的 dataset 上不能够提取政策。受自然智慧启发,我们提出了一种新的 offline RL 方法,利用 CQL 的增强和回传过程来重新回传过去相关的体验,有效地解决了不寻常的 dataset 带来的挑战。我们在具有不同水平的不寻常度的任务上进行了评估,使用 D4RL 的变体。实验结果显示了我们的方法与其他基准相比有所superiority。
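
For reference, the sketch below shows the conservative penalty of CQL, the base method the paper augments: Q-values of out-of-distribution (random) actions are pushed down relative to Q-values of dataset actions. The retrieval of related past experiences, which is the paper's contribution, is not shown, and the network sizes are placeholders.

```python
# Minimal sketch of the CQL conservative penalty on a toy continuous-control setup.
import torch

def cql_penalty(q_net, states, dataset_actions, n_random: int = 10, action_dim: int = 6):
    # Q-values for random (out-of-distribution) actions
    rand_actions = torch.rand(states.size(0), n_random, action_dim) * 2 - 1
    s_rep = states.unsqueeze(1).expand(-1, n_random, -1)
    q_rand = q_net(torch.cat([s_rep, rand_actions], dim=-1)).squeeze(-1)
    # Q-values for actions actually present in the offline dataset
    q_data = q_net(torch.cat([states, dataset_actions], dim=-1)).squeeze(-1)
    return (torch.logsumexp(q_rand, dim=1) - q_data).mean()

q_net = torch.nn.Sequential(torch.nn.Linear(17 + 6, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
penalty = cql_penalty(q_net, torch.randn(32, 17), torch.rand(32, 6) * 2 - 1)
print(penalty.item())
```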

RecallM: An Architecture for Temporal Context Understanding and Question Answering

  • paper_url: http://arxiv.org/abs/2307.02738
  • repo_url: https://github.com/cisco-open/DeepVision/tree/main/recallm
  • paper_authors: Brandon Kynoch, Hugo Latapie
  • for: 该论文旨在为基于语言模型(LLM)的聊天机器人开发持续学习、复杂逻辑和序列和时间相关性的理想长期记忆机制。
  • methods: 该论文提出了一种新的长期记忆建模方法,包括创建适应和更新长期记忆的AGI系统体系。
  • results: 经过多个实验,该方法能够提供更好的时间理解和知识掌握。
    Abstract The ideal long-term memory mechanism for Large Language Model (LLM) based chatbots, would lay the foundation for continual learning, complex reasoning and allow sequential and temporal dependencies to be learnt. Creating this type of memory mechanism is an extremely challenging problem. In this paper we explore different methods of achieving the effect of long-term memory. We propose a new architecture focused on creating adaptable and updatable long-term memory for AGI systems. We demonstrate through various experiments the benefits of the RecallM architecture, particularly the improved temporal understanding of knowledge it provides.
    摘要 理想的长期记忆机制为基于自然语言模型(LLM)的聊天机器人,将为持续学习、复杂逻辑和时间相关性学习 lay the foundation. 创建这种类型的记忆机制是极其困难的问题。在这篇论文中,我们探讨不同的方法来实现长期记忆的效果。我们提出了一种新的架构,专门为智能人工智能系统(AGI)创建可适应和可更新的长期记忆。我们通过多种实验证明了RecallM架构的优势,尤其是它在知识的时间理解方面的改善。

Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating

  • paper_url: http://arxiv.org/abs/2307.02730
  • repo_url: https://github.com/dingyn-Reno/MMFS
  • paper_authors: Sheng-Lan Liu, Yu-Ning Ding, Si-Fan Zhang, Wen-Yue Chen, Ning Zhou, Hao Liu, Gui-Hong Lao
  • for: 本研究的目的是提出一个多Modal和多任务的冰上滑冰动作数据集(MMFS),用于提高细化动作识别和评估。
  • methods: 本研究使用RGB、骨架和得分来收集11671个clip和256个类别,包括空间和时间标签。
  • results: 本研究的三大贡献是:1)独立的空间和时间类别来进一步探索细化动作识别和评估; 2)首次使用骨架模式进行复杂细化动作质量评估; 3)多Modal和多任务数据集鼓励更多的动作分析模型。
    Abstract The fine-grained action analysis of the existing action datasets is challenged by insufficient action categories, low fine granularities, limited modalities, and tasks. In this paper, we propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS) which was collected from the World Figure Skating Championships. MMFS, which possesses action recognition and action quality assessment, captures RGB, skeleton, and is collected the score of actions from 11671 clips with 256 categories including spatial and temporal labels. The key contributions of our dataset fall into three aspects as follows. (1) Independently spatial and temporal categories are first proposed to further explore fine-grained action recognition and quality assessment. (2) MMFS first introduces the skeleton modality for complex fine-grained action quality assessment. (3) Our multi-modality and multi-task dataset encourage more action analysis models. To benchmark our dataset, we adopt RGB-based and skeleton-based baseline methods for action recognition and action quality assessment.
    摘要 现有动作数据集的精细动作分析面临动作类别不足、细粒度较低、模态与任务有限等挑战。在本文中,我们提出了多模态、多任务的花样滑冰数据集(MMFS),该数据集采集自世界花样滑冰锦标赛。MMFS同时支持动作识别和动作质量评估,收集了RGB、骨架和动作得分,共11671个片段、256个类别,并包含空间和时间标签。我们数据集的关键贡献有三个方面:1. 首次独立提出空间和时间类别,以进一步探索精细动作识别和质量评估;2. MMFS首次引入骨架模态,用于复杂精细动作的质量评估;3. 我们的多模态、多任务数据集能够激励更多的动作分析模型。为了对数据集进行基准评估,我们采用了基于RGB和基于骨架的基线方法进行动作识别和动作质量评估。

Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning

  • paper_url: http://arxiv.org/abs/2307.02728
  • repo_url: None
  • paper_authors: Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris
  • for: 这篇论文目的是学习大量独特技能的Agent。
  • methods: 论文使用了Goal-Conditioned Hierarchical Reinforcement Learning的概念,并提出了一个新的框架 called Hierarchical Empowerment,可以更方便地计算Empowerment。
  • results: 在一系列的 simulate robotics tasks 中,论文的四级Agent能够学习技能,覆盖了两个级别的表面积,比之前的工作更大。
    Abstract General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and the states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
    摘要 通用智能体需要庞大的技能库。赋能(empowerment,即技能与状态之间的最大互信息)为学习大量不同技能提供了一条路径,但互信息难以优化。我们提出了一个新的框架——层次赋能(Hierarchical Empowerment),将目标条件的层次强化学习中的概念引入赋能的计算,使其更易于处理。该框架有两个具体贡献:首先,我们提出了一个新的互信息变分下界,用于计算短时间范围内的赋能;其次,我们提出了一种层次结构,用于在指数级更长的时间尺度上计算赋能。我们在一系列模拟机器人任务中验证了该框架的贡献。在一个常用的蚂蚁导航环境中,我们的四层智能体能够学到覆盖面积比已有工作大两个数量级以上的技能。
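
For orientation, empowerment at a state s is the maximal mutual information between skills z and resulting states s'; a standard Barber–Agakov-style variational lower bound replaces the intractable posterior with a learned approximation q, which is the general kind of bound the first contribution builds on (shown here in generic form, not the paper's exact expression):

```latex
\mathcal{E}(s) \;=\; \max_{p(z)} I(Z; S' \mid s)
\;\ge\; \max_{p(z)} \, \mathbb{E}_{p(z)\,p(s' \mid s, z)}\!\left[\log q_\phi(z \mid s, s') - \log p(z)\right]
```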

TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations

  • paper_url: http://arxiv.org/abs/2307.02717
  • repo_url: None
  • paper_authors: Dengfeng Wang, Liukai Xu, Songyuan Liu, zhi Li, Yiming Chen, Weifeng He, Xueqing Li, Yanan Su
  • for: 大规模神经网络(NN)的计算在内存( computing-in-memory,CIM)中进行融合,以减少外部存储器访问。
  • methods: 使用高密度单级ReRAM(single-level ReRAM)和高效性SRAM-CIM(SRAM-based computing-in-memory)的集成,以实现在芯片上的大规模NN计算。
  • results: 提出了一种高密度三级ReRAM助け进行计算在非易失RAM(nonvolatile RAM,nvSRAM)中的大规模NN计算,并实现了对比基eline设计的7.8倍高的存储密度,以及2.9倍和1.9倍的能效性提升。
    Abstract Accommodating all the weights on-chip for large-scale NNs remains a great challenge for SRAM based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on the top of high-efficiency SRAM-CIM for weight storage to eliminate the off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability for an increased number of SL-ReRAMs and limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAMs-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. The clustered n-selector-n-ReRAM (cluster-nSnRs) is employed for reliable weight-restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density, compared with the state-of-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x enhanced energy-efficiency, respectively, compared to the baseline designs of SRAM-CIM and ReRAM-CIM, respectively.
    摘要 实现大规模神经网络(NN)中的所有重量在芯片上是一个巨大的挑战,特别是在有限的芯片容量下。前一代非易失性SRAM-CIM(nvSRAM-CIM)解决了这个问题,通过将高密度单级ReRAM(SL-ReRAM)组装在高效率SRAM-CIM之上,以储存重量,并消除外部内存存取。然而,前一代SL-nvSRAM-CIM受到更多SL-ReRAM的扩展和有限的计算效率的限制。为了解决这些挑战,这个工作提出了一个超高密度三级ReRAM-助け(TL-nvSRAM-CIM)方案,用于大型NN模型。clustered n-selector-n-ReRAM(cluster-nSnRs)被用来保证可靠的重量复原,并消除了DC电压。此外,一个ternary SRAM-CIM机制(ternary SRAM-CIM)和分别计算方案(differential computing scheme)是提出,以节能地进行ternary MAC操作,保持高NN准确性。提案的TL-nvSRAM-CIM在存储密度方面比前一代作品高7.8倍,而且在计算效率方面比基准设计高2.9倍和1.9倍,分别比SRAM-CIM和ReRAM-CIM基准设计高。

Validation of the Practicability of Logical Assessment Formula for Evaluations with Inaccurate Ground-Truth Labels

  • paper_url: http://arxiv.org/abs/2307.02709
  • repo_url: None
  • paper_authors: Yongquan Yang, Hong Bu
  • for: 这篇论文是为了评估带有不准确真实标签(IAGTL)的预测模型而提出的新理论。
  • methods: 这篇论文使用了逻辑评估方程(LAF)来评估带有IAGTL的预测模型。
  • results: 实验结果表明,LAF可以有效地评估带有IAGTL的预测模型,并且在乳腺癌分 segmentation(TSfBC)领域的医学板卷图分析(MHWSIA)中得到了有效的结果。
    Abstract Logical assessment formula (LAF) is a new theory proposed for evaluations with inaccurate ground-truth labels (IAGTLs) to assess the predictive models for various artificial intelligence applications. However, the practicability of LAF for evaluations with IAGTLs has not yet been validated in real-world practice. In this paper, to address this issue, we applied LAF to tumour segmentation for breast cancer (TSfBC) in medical histopathology whole slide image analysis (MHWSIA). Experimental results and analysis show the validity of LAF for evaluations with IAGTLs in the case of TSfBC and reflect the potentials of LAF applied to MHWSIA.
    摘要 新的逻辑评估方程(LAF)是一种提议用于评估具有不准确真实标签(IAGTL)的预测模型,用于不同的人工智能应用。然而,LAF在实际应用中的实用性尚未得到证实。在这篇论文中,我们应用了LAF来评估乳腺癌 segmentation(TSfBC)在医学板寸影像分析(MHWSIA)中。实验结果和分析表明LAF对于具有IAGTL的评估是有效的,并且反映了LAF在MHWSIA中的潜在应用 potential。

Loss Functions and Metrics in Deep Learning. A Review

  • paper_url: http://arxiv.org/abs/2307.02694
  • repo_url: None
  • paper_authors: Juan Terven, Diana M. Cordova-Esparza, Alfonzo Ramirez-Pedraza, Edgar A. Chavez-Urbiola
  • for: 本文对深度学习中最常用的损失函数和性能指标进行了评估和梳理,以帮助读者选择适合自己特定任务的最佳方法。
  • methods: 本文评论了深度学习中最常用的损失函数和性能指标,包括损失函数的选择和性能指标的选择,并解释了每种方法的优缺点和应用场景。
  • results: 本文提供了深度学习中不同任务中的损失函数和性能指标的综述,并给出了各种任务的示例和应用。这些信息可以帮助读者更好地理解深度学习中的不同方法和技术,并选择适合自己特定任务的最佳方法。
    Abstract One of the essential components of deep learning is the choice of the loss function and performance metrics used to train and evaluate models. This paper reviews the most prevalent loss functions and performance measurements in deep learning. We examine the benefits and limits of each technique and illustrate their application to various deep-learning problems. Our review aims to give a comprehensive picture of the different loss functions and performance indicators used in the most common deep learning tasks and help practitioners choose the best method for their specific task.
    摘要 一个深度学习中的重要组成部分是选择用于训练和评估模型的损失函数和性能指标。本文查询了深度学习中最广泛使用的损失函数和性能指标,并分析它们的优缺点,以及它们在不同的深度学习问题中的应用。本文的审查旨在为具体任务选择最佳的方法,并给深度学习实践者提供一个全面的损失函数和性能指标选择指南。
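
To make the survey concrete, the snippet below evaluates a few of the commonly reviewed losses (MSE and MAE for regression, cross-entropy for classification, a soft Dice loss for segmentation) on toy tensors in PyTorch; these are generic textbook definitions, not taken from the paper.

```python
# A handful of standard deep-learning loss functions applied to toy data.
import torch
import torch.nn.functional as F

pred_reg, target_reg = torch.randn(8, 1), torch.randn(8, 1)
mse = F.mse_loss(pred_reg, target_reg)                 # regression: mean squared error
mae = F.l1_loss(pred_reg, target_reg)                  # regression: mean absolute error

logits, labels = torch.randn(8, 5), torch.randint(0, 5, (8,))
ce = F.cross_entropy(logits, labels)                   # multi-class classification

probs = torch.rand(8, 1, 32, 32)                       # predicted foreground probabilities
mask = torch.randint(0, 2, (8, 1, 32, 32)).float()     # ground-truth segmentation mask
dice = 1 - (2 * (probs * mask).sum() + 1) / (probs.sum() + mask.sum() + 1)

print(mse.item(), mae.item(), ce.item(), dice.item())
```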

SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding

  • paper_url: http://arxiv.org/abs/2307.02691
  • repo_url: https://github.com/qiushi-lin/sacha
  • paper_authors: Qiushi Lin, Hang Ma
  • for: 这篇论文主要研究了多 Agent Path Finding(MAPF)中多个 Agent 之间的协作问题,即每个 Agent 需要规划一个不与别的 Agent 相撞的路径以达到目标位置。
  • methods: 该论文提出了一种名为 Soft Actor-Critic with Heuristic-Based Attention(SACHA)的多 Agent 学习法,该法使用了 novel heuristic-based attention 机制来促进多 Agent 之间的协作。SACHA 学习了一个 neural network 来让每个 Agent 选择ively 听取多个 Agent 的最短路径准则引导,从而使得更可扩展地学习协作。
  • results: comparied to 现有的学习基于方法,SACHA 和 SACHA(C) 在多种 MAPF 实例上显示出了较好的成绩,包括Success rate 和解决质量。
    Abstract Multi-Agent Path Finding (MAPF) is a crucial component for many large-scale robotic systems, where agents must plan their collision-free paths to their given goal positions. Recently, multi-agent reinforcement learning has been introduced to solve the partially observable variant of MAPF by learning a decentralized single-agent policy in a centralized fashion based on each agent's partial observation. However, existing learning-based methods are ineffective in achieving complex multi-agent cooperation, especially in congested environments, due to the non-stationarity of this setting. To tackle this challenge, we propose a multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA), which employs novel heuristic-based attention mechanisms for both the actors and critics to encourage cooperation among agents. SACHA learns a neural network for each agent to selectively pay attention to the shortest path heuristic guidance from multiple agents within its field of view, thereby allowing for more scalable learning of cooperation. SACHA also extends the existing multi-agent actor-critic framework by introducing a novel critic centered on each agent to approximate $Q$-values. Compared to existing methods that use a fully observable critic, our agent-centered multi-agent actor-critic method results in more impartial credit assignment and better generalizability of the learned policy to MAPF instances with varying numbers of agents and types of environments. We also implement SACHA(C), which embeds a communication module in the agent's policy network to enable information exchange among agents. We evaluate both SACHA and SACHA(C) on a variety of MAPF instances and demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
    摘要 多智能路径规划(MAPF)是许多大规模 роботиче系统中的关键组件,其中智能体需要规划避免碰撞的路径来到目标位置。在最近,多智能学习 reinforcement learning 被引入解决部分可见 MAPF 问题,通过学习一个均衡单个智能体策略来实现多智能体协作。然而,现有的学习基于方法在拥堵环境中效果不佳,主要因为这种设定的不可预测性。为了解决这个挑战,我们提出了一种多智能actor-critic方法called Soft Actor-Critic with Heuristic-Based Attention(SACHA),该方法使用了新的征量基于注意力机制来促进多智能体协作。SACHA 学习一个智能网络,以便每个智能体在其视野中选择多个智能体的最短路径征量引导,从而使得学习协作更加扩展。SACHA 还扩展了现有的多智能actor-critic框架,通过引入每个智能体中心的 $Q $-值评价器来更好地分配减少信息。与现有方法使用完全可见评价器相比,我们的代理中心多智能actor-critic方法可以更好地分配减少信息,并且更好地泛化学习到 MAPF 实例中的不同数量和类型的环境。此外,我们还实现了 SACHA(C),它在智能体策略网络中嵌入通信模块,以便智能体之间交换信息。我们对 SACHA 和 SACHA(C)在多种 MAPF 实例上进行评估,并证明它们在成功率和解决质量方面具有显著改进。

Scaling In-Context Demonstrations with Structured Attention

  • paper_url: http://arxiv.org/abs/2307.02690
  • repo_url: None
  • paper_authors: Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
  • for: 提高大语言模型(LLM)在上下文学习中的能力,即从上下文中学习而无需参数更新。
  • methods: 提议一种新的建筑设计,即SAICL(结构化注意力 для上下文学习),它将全注意力替换为专门为上下文学习设计的结构化注意力机制,并消除不必要的示例之间的依赖关系,使模型不受示例顺序的影响。
  • results: SAICL在meta-training框架下评估,与全注意力相比具有相当或更好的性能,同时可实现最高3.4倍的推理加速。SAICL还持续超越独立处理每个示例的强基线Fusion-in-Decoder(FiD)。最后,得益于其线性特性,SAICL可以轻松扩展到数百个示例,且性能随示例数量增加而持续提升。
    Abstract The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces the full-attention by a structured attention mechanism designed for in-context learning, and removes unnecessary dependencies between individual demonstrations, while making the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline which processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations with continuous performance gains with scaling.
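
The sketch below builds the kind of structured attention mask the abstract describes: tokens attend only within their own demonstration, while the test input attends to everything, so demonstrations carry no cross-dependencies and their order stops mattering. The segment layout is illustrative, not SAICL's actual implementation.

```python
# Minimal sketch: a block-structured attention mask that removes dependencies
# between in-context demonstrations while letting the query attend to all of them.
import torch

def structured_mask(segment_ids: torch.Tensor, query_segment: int) -> torch.Tensor:
    """segment_ids: (seq_len,) demonstration index per token; query_segment marks the test input."""
    same_segment = segment_ids.unsqueeze(0) == segment_ids.unsqueeze(1)
    from_query = (segment_ids == query_segment).unsqueeze(1)   # query rows attend to all
    return same_segment | from_query                           # True = attention allowed

# three demonstrations (0, 1, 2) of two tokens each, then a three-token query (3)
seg = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3, 3])
mask = structured_mask(seg, query_segment=3)
print(mask.int())
```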

Comparing Apples to Apples: Generating Aspect-Aware Comparative Sentences from User Reviews

  • paper_url: http://arxiv.org/abs/2307.03691
  • repo_url: None
  • paper_authors: Jessica Echterhoff, An Yan, Julian McAuley
  • for: 用于帮助用户寻找最佳选择 among many similar alternatives
  • methods: 使用 transformer 架构,包括 item encoding module、comparison generation module 和 novel decoding method for user personalization
  • results: 生成了 fluently diverse 的比较句子,并在人工评估研究中表明了 relevance 和 truthfulness 的性能
    Abstract It is time-consuming to find the best product among many similar alternatives. Comparative sentences can help to contrast one item from others in a way that highlights important features of an item that stand out. Given reviews of one or multiple items and relevant item features, we generate comparative review sentences to aid users to find the best fit. Specifically, our model consists of three successive components in a transformer: (i) an item encoding module to encode an item for comparison, (ii) a comparison generation module that generates comparative sentences in an autoregressive manner, (iii) a novel decoding method for user personalization. We show that our pipeline generates fluent and diverse comparative sentences. We run experiments on the relevance and fidelity of our generated sentences in a human evaluation study and find that our algorithm creates comparative review sentences that are relevant and truthful.
    摘要 寻找最佳产品中的多个相似alternative很时间consuming。比较句子可以帮助用户对多个 Item进行对比,并将重点放在Item的重要特征上。我们的模型由三个顺序组成:(i)Item编码模块,用于编码 Item进行比较;(ii)比较生成模块,通过自动生成比较句子的方式生成比较句子;(iii)用户个性化解码方法。我们的管道能够生成流畅和多样的比较句子。我们在人工评估研究中运行了我们生成的句子的相关性和真实性,发现我们的算法可以创建相关的和真实的比较句子。

AI4OPT: AI Institute for Advances in Optimization

  • paper_url: http://arxiv.org/abs/2307.02671
  • repo_url: None
  • paper_authors: Pascal Van Hentenryck, Kevin Dalmeijer
  • for: 本研究是NSF AI Institute for Advances in Optimization的简介,旨在结合人工智能和优化,以解决供应链、能源系统、半导体设计和生产、可持续食品系统等领域的问题。
  • methods: 本研究使用了”教学教育”哲学,提供了长期教育路径,以帮助工程师学习人工智能技术。
  • results: 本研究未提供结果,但描述了AI4OPT的目标和方法。
    Abstract This article is a short introduction to AI4OPT, the NSF AI Institute for Advances in Optimization. AI4OPT fuses AI and Optimization, inspired by end-use cases in supply chains, energy systems, chip design and manufacturing, and sustainable food systems. AI4OPT also applies its "teaching the teachers" philosophy to provide longitudinal educational pathways in AI for engineering.
    摘要 这篇文章是关于AI4OPT,美国国家科学基金会的人工智能实验室,它将人工智能和优化相结合,以解决来自供应链、能源系统、半导体设计和生产、可持续食品系统等领域的实际问题。AI4OPT还采用“教师教学”哲学,为工程领域提供了长期教育路径。

Convergence of Communications, Control, and Machine Learning for Secure and Autonomous Vehicle Navigation

  • paper_url: http://arxiv.org/abs/2307.02663
  • repo_url: None
  • paper_authors: Tengchan Zeng, Aidin Ferdowsi, Omid Semiari, Walid Saad, Choong Seon Hong
  • for: 本研究旨在探讨自动驾驶汽车(CAV) navigate to target destinations 的问题,以便实现减少交通事故人类错误、提高交通效率、执行多种任务等优点。
  • methods: 本文使用沟通理论、控制理论和机器学习等方法,提出了解决CAV自动导航中的集成问题的方案。
  • results: 研究人员通过提出稳定轨迹追踪、鲁棒控制 противcyber-physical攻击、适应导航控制器设计等解决方案,以及在多辆CAV协调运动时的稳定队形、快速协作学习和分布式入侵检测等问题的分析和解决方案。
    Abstract Connected and autonomous vehicles (CAVs) can reduce human errors in traffic accidents, increase road efficiency, and execute various tasks ranging from delivery to smart city surveillance. Reaping these benefits requires CAVs to autonomously navigate to target destinations. To this end, each CAV's navigation controller must leverage the information collected by sensors and wireless systems for decision-making on longitudinal and lateral movements. However, enabling autonomous navigation for CAVs requires a convergent integration of communication, control, and learning systems. The goal of this article is to explicitly expose the challenges related to this convergence and propose solutions to address them in two major use cases: Uncoordinated and coordinated CAVs. In particular, challenges related to the navigation of uncoordinated CAVs include stable path tracking, robust control against cyber-physical attacks, and adaptive navigation controller design. Meanwhile, when multiple CAVs coordinate their movements during navigation, fundamental problems such as stable formation, fast collaborative learning, and distributed intrusion detection are analyzed. For both cases, solutions using the convergence of communication theory, control theory, and machine learning are proposed to enable effective and secure CAV navigation. Preliminary simulation results are provided to show the merits of proposed solutions.
    摘要 自适应并连接的自动车 (CAVs) 可以减少交通事故中人类错误,提高道路效率,并执行各种任务,从物交付到智能城市监测。为了实现这些利益,每辆 CAV 的导航控制器需要根据感知器和无线系统收集的信息进行决策。然而,为 CAvs 实现自主导航,需要通信、控制和学习系统的融合。这篇文章的目标是暴露 CAvs 自主导航中存在的挑战,并提出解决方案,并在两个主要应用场景中进行分析:不协调的 CAvs 和协调 CAvs。在不协调 CAvs 的导航中,存在稳定轨迹追踪、针对物理攻击的强健控制和适应导航控制器设计的挑战。而当多辆 CAvs 协调其运动时,存在稳定队形、快速协同学习和分布式入侵检测的基本问题。为了解决这些问题,文章提出了通信理论、控制理论和机器学习的融合解决方案。文章还提供了先前的 simulations 结果,以证明提议的解决方案的优点。

Many-objective Optimization via Voting for Elites

  • paper_url: http://arxiv.org/abs/2307.02661
  • repo_url: https://github.com/uvm-neurobotics-lab/move
  • paper_authors: Jackson Dean, Nick Cheney
  • for: solving many-objective optimization problems with complex trade-offs between objectives
  • methods: combines Many-Objective Evolutionary Algorithms and Quality Diversity algorithms like MAP-Elites, maintains a map of elites that perform well on different subsets of objective functions
  • results: outperforms a naive single-objective baseline on a 14-objective image-neuroevolution problem, relies on solutions jumping across bins (goal-switching) for better performance, suggests automatic identification of stepping stones or curriculum learning.
    Abstract Real-world problems are often comprised of many objectives and require solutions that carefully trade-off between them. Current approaches to many-objective optimization often require challenging assumptions, like knowledge of the importance/difficulty of objectives in a weighted-sum single-objective paradigm, or enormous populations to overcome the curse of dimensionality in multi-objective Pareto optimization. Combining elements from Many-Objective Evolutionary Algorithms and Quality Diversity algorithms like MAP-Elites, we propose Many-objective Optimization via Voting for Elites (MOVE). MOVE maintains a map of elites that perform well on different subsets of the objective functions. On a 14-objective image-neuroevolution problem, we demonstrate that MOVE is viable with a population of as few as 50 elites and outperforms a naive single-objective baseline. We find that the algorithm's performance relies on solutions jumping across bins (for a parent to produce a child that is elite for a different subset of objectives). We suggest that this type of goal-switching is an implicit method to automatic identification of stepping stones or curriculum learning. We comment on the similarities and differences between MOVE and MAP-Elites, hoping to provide insight to aid in the understanding of that approach $\unicode{x2013}$ and suggest future work that may inform this approach's use for many-objective problems in general.
    摘要 现实世界中的问题往往包含多个目标,需要在这些目标之间进行细致权衡的解决方案。现有的多目标优化方法常常依赖较强的假设,例如在加权求和的单目标范式中需要已知各目标的重要性或难度,或者在多目标帕累托优化中需要庞大的种群来克服维度灾难。通过结合多目标进化算法和质量多样性算法(如MAP-Elites)的元素,我们提出了“通过为精英投票进行多目标优化”(MOVE)。MOVE维护一张精英解地图,其中的解分别在不同的目标函数子集上表现出色。在一个包含14个目标的图像神经进化问题上,我们证明MOVE仅用50个精英解即可运行,并超越了朴素的单目标基线。我们发现,该算法的性能依赖于解在不同格子之间的跳跃(即父代产生的子代成为另一目标子集上的精英)。我们认为这种目标切换是一种隐式的“垫脚石”自动识别或课程学习方法。我们还讨论了MOVE与MAP-Elites之间的异同,希望有助于理解该方法,并为其在一般多目标问题上的应用提出未来工作方向。
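
The sketch below keeps a MOVE-style map of elites, one bin per subset of objectives, and fills it with random candidates; the replacement rule here is a simple mean over the subset, whereas MOVE's actual voting/tournament rule is more involved, so this only illustrates the data structure.

```python
# Minimal sketch of an elite map keyed by subsets of objectives, filled by random search.
import itertools
import random

n_objectives = 4
bins = [s for r in range(1, n_objectives + 1)
        for s in itertools.combinations(range(n_objectives), r)]
elites = {}   # subset of objectives -> (fitness on that subset, genome)

def try_insert(genome, scores):
    for subset in bins:
        fitness = sum(scores[i] for i in subset) / len(subset)   # simplified rule
        if subset not in elites or fitness > elites[subset][0]:
            elites[subset] = (fitness, genome)

for _ in range(1000):                        # random search stands in for evolution
    genome = [random.random() for _ in range(8)]
    try_insert(genome, [random.random() for _ in range(n_objectives)])

print(len(elites), "bins filled out of", len(bins))
```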

UX Heuristics and Checklist for Deep Learning powered Mobile Applications with Image Classification

  • paper_url: http://arxiv.org/abs/2307.05513
  • repo_url: None
  • paper_authors: Christiane Gresse von Wangenheim, Gustavo Dirschnabel
  • for: 这个论文是为了提供 Deep Learning powered 移动应用程序图像分类的用户体验创新,以确保这些应用程序的有效使用。
  • methods: 该论文采用了文献综述和现有的移动应用程序图像分类的分析,并提出了一个初步的 AIX 规范集,以帮助设计人员开发更好的用户界面。
  • results: 该论文提出了一个在线课程和网络工具,以帮助实践者通过这些规范来评估和改进图像分类应用程序的用户界面。这些结果可以用于指导图像分类应用程序的界面设计,以及支持实践者开发更好的用户体验。
    Abstract Advances in mobile applications providing image classification enabled by Deep Learning require innovative User Experience solutions in order to assure their adequate use by users. To aid the design process, usability heuristics are typically customized for a specific kind of application. Therefore, based on a literature review and analyzing existing mobile applications with image classification, we propose an initial set of AIX heuristics for Deep Learning powered mobile applications with image classification decomposed into a checklist. In order to facilitate the usage of the checklist we also developed an online course presenting the concepts and heuristics as well as a web-based tool in order to support an evaluation using these heuristics. These results of this research can be used to guide the design of the interfaces of such applications as well as support the conduction of heuristic evaluations supporting practitioners to develop image classification apps that people can understand, trust, and can engage with effectively.
    摘要 随着基于深度学习的图像分类移动应用不断发展,需要创新的用户体验解决方案,以确保用户能够正确使用这些应用。为了辅助设计过程,可用性启发式准则通常会针对特定类型的应用进行定制。因此,基于文献综述以及对现有图像分类移动应用的分析,我们提出了一套面向深度学习图像分类移动应用的初步AIX启发式准则,并将其细化为一份检查清单。为了便于使用该清单,我们还开发了介绍相关概念与准则的在线课程,以及一个支持依据这些准则进行评估的网页工具。这些研究结果可以用于指导此类应用的界面设计,并支持启发式评估的开展,帮助从业者开发出用户能够理解、信任并有效互动的图像分类应用。

Surge Routing: Event-informed Multiagent Reinforcement Learning for Autonomous Rideshare

  • paper_url: http://arxiv.org/abs/2307.02637
  • repo_url: None
  • paper_authors: Daniel Garces, Stephanie Gil
  • for: 这篇论文是针对大型活动(如演唱会、运动赛等)导致需求峰值的问题,提出了一个学习框架,以便推断需求峰值并适应需求峰值,并生成合作的routing和搜寻策略。
  • methods: 本论文使用了以下方法:(i)一个从网络上抓取活动资讯的事件处理框架,将活动资讯转换为稠密向量表示,作为需求预测神经网络的输入特征;(ii)一个由两个神经网络组成的系统,利用这些稠密向量表示预测整张地图上每小时的需求;(iii)一种概率方法,借助场馆的占用时间表,将公开可用的分区需求资料映射到离散化的街道交叉路口;(iv)一个可扩展的基于模型的强化学习框架,利用交叉路口上的需求预测来预判需求峰值,并以每次一个代理人的rollout(配合有限采样的确定性等价)为出租车规划路线。
  • results: 实验结果显示,使用该学习框架生成的路由策略在需求峰值条件下,每分钟平均可多服务6个请求(约每小时多服务360个请求),优于其他基于模型的强化学习框架以及运筹学中的经典算法。
    Abstract Large events such as conferences, concerts and sports games, often cause surges in demand for ride services that are not captured in average demand patterns, posing unique challenges for routing algorithms. We propose a learning framework for an autonomous fleet of taxis that scrapes event data from the internet to predict and adapt to surges in demand and generates cooperative routing and pickup policies that service a higher number of requests than other routing protocols. We achieve this through a combination of (i) an event processing framework that scrapes the internet for event information and generates dense vector representations that can be used as input features for a neural network that predicts demand; (ii) a two neural network system that predicts hourly demand over the entire map, using these dense vector representations; (iii) a probabilistic approach that leverages locale occupancy schedules to map publicly available demand data over sectors to discretized street intersections; and finally, (iv) a scalable model-based reinforcement learning framework that uses the predicted demand over intersections to anticipate surges and route taxis using one-agent-at-a-time rollout with limited sampling certainty equivalence. We learn routing and pickup policies using real NYC ride share data for 2022 and information for more than 2000 events across 300 unique venues in Manhattan. We test our approach with a fleet of 100 taxis on a map with 38 different sectors (2235 street intersections). Our experimental results demonstrate that our method obtains routing policies that service $6$ more requests on average per minute (around $360$ more requests per hour) than other model-based RL frameworks and other classical algorithms in operations research when dealing with surge demand conditions.
    摘要 会议、音乐会和体育赛事等大型活动常常引发乘车服务需求的激增,这类需求峰值并不体现在平均需求模式中,给路由算法带来了独特的挑战。我们为自动驾驶出租车车队提出了一个学习框架,它从互联网上抓取活动数据来预测并适应需求峰值,并生成协作式的路由与接客策略,从而比其他路由协议服务更多的请求。我们通过以下组合实现这一点:(i)一个事件处理框架,从互联网抓取活动信息并生成稠密向量表示,作为需求预测神经网络的输入特征;(ii)一个由两个神经网络组成的系统,利用这些稠密向量表示预测整张地图上每小时的需求;(iii)一种概率方法,借助场馆占用时间表,将按分区公开的需求数据映射到离散化的街道交叉路口;(iv)一个可扩展的基于模型的强化学习框架,利用对各交叉路口的需求预测来预判峰值,并以每次一个智能体的rollout(配合有限采样的确定性等价)为出租车规划路线。我们使用2022年纽约市真实网约车数据以及曼哈顿300多个场馆、2000余场活动的信息来学习路由与接客策略,并在划分为38个分区(2235个街道交叉路口)的地图上用100辆出租车进行测试。实验结果表明,在需求激增条件下,我们的方法得到的路由策略比其他基于模型的强化学习框架以及运筹学中的经典算法每分钟平均多服务6个请求(约每小时多360个请求)。

An explainable model to support the decision about the therapy protocol for AML

  • paper_url: http://arxiv.org/abs/2307.02631
  • repo_url: None
  • paper_authors: Jade M. Almeida, Giovanna A. Castro, João A. Machado-Neto, Tiago A. Almeida
  • for: 这项研究的目的是支持医生决策最佳治疗协议,以提高患有AML的患者存活率。
  • methods: 该研究使用了可解释的机器学习模型,对患者的存活预测进行数据分析。
  • results: 研究结果显示,使用该模型可以安全地支持医生决策,并且有潜在的应用前景以提高治疗和预测 marker。
    Abstract Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support the specialists' decision about the appropriate therapy, patients with AML receive a prognostic of outcomes according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as the heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. In addition to the prediction model being explainable, the results obtained are promising and indicate that it is possible to use it to support the specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.
    摘要 针对恶性白细胞肉瘤(AML)的诊断和治疗决策支持,患者通常根据细胞学和分子特征进行分类,分为三个风险类别:有利、中等和不利。然而,现有的风险分类系统存在多种问题,如患者群体内的多样性,中等风险类别的定义不清晰。此外,由于大多数AML患者被诊断为中等风险,专家经常要求更多的测试和分析,导致治疗延迟并使患者的临床状况加重。本文提出了数据分析和可解释机器学习模型,用于支持治疗决策。除了模型可解释性外,研究结果具有潜在的应用前景,可能用于更好地诊断和治疗AML。此外,本研究还开创了新的研究方向,可能为AML的更好的治疗和诊断做出贡献。

Real-time Workload Pattern Analysis for Large-scale Cloud Databases

  • paper_url: http://arxiv.org/abs/2307.02626
  • repo_url: None
  • paper_authors: Jiaqi Wang, Tianyi Li, Anni Wang, Xiaoze Liu, Lu Chen, Jie Chen, Jianye Liu, Junyang Wu, Feifei Li, Yunjun Gao
  • for: 这种论文旨在为大规模云数据库系统提供高效的工作负载模式发现和优化方法。
  • methods: 该论文提出了一种基于实时分析和精准执行的工作负载模式发现系统,名为Alibaba Workload Miner(AWM)。AWM使用了高维特征编码和在线分组方法来挖掘大规模云数据库的工作负载模式。
  • results: 实验结果表明,AWM可以提高工作负载模式发现精度达66%,并将在线推理延迟降低22%,相比之前的状态艺术。
    Abstract Hosting database services on cloud systems has become a common practice. This has led to the increasing volume of database workloads, which provides the opportunity for pattern analysis. Discovering workload patterns from a business logic perspective is conducive to better understanding the trends and characteristics of the database system. However, existing workload pattern discovery systems are not suitable for large-scale cloud databases which are commonly employed by the industry. This is because the workload patterns of large-scale cloud databases are generally far more complicated than those of ordinary databases. In this paper, we propose Alibaba Workload Miner (AWM), a real-time system for discovering workload patterns in complicated large-scale workloads. AWM encodes and discovers the SQL query patterns logged from user requests and optimizes the querying processing based on the discovered patterns. First, Data Collection & Preprocessing Module collects streaming query logs and encodes them into high-dimensional feature embeddings with rich semantic contexts and execution features. Next, Online Workload Mining Module separates encoded queries by business groups and discovers the workload patterns for each group. Meanwhile, Offline Training Module collects labels and trains the classification model using the labels. Finally, Pattern-based Optimizing Module optimizes query processing in cloud databases by exploiting discovered patterns. Extensive experimental results on one synthetic dataset and two real-life datasets (extracted from Alibaba Cloud databases) show that AWM enhances the accuracy of pattern discovery by 66% and reduce the latency of online inference by 22%, compared with the state-of-the-arts.
    摘要 将数据库服务托管在云系统上已成为常见做法。这带来了不断增长的数据库工作负载,也为模式分析提供了机会。从业务逻辑角度发现工作负载模式,有助于更好地理解数据库系统的趋势和特点。然而,现有的工作负载模式发现系统并不适用于业界常用的大规模云数据库,因为大规模云数据库的工作负载模式通常远比普通数据库复杂。在这篇论文中,我们提出了阿里巴巴工作负载挖掘系统(AWM),一个面向复杂大规模工作负载的实时模式发现系统。AWM 对用户请求中记录的 SQL 查询模式进行编码和挖掘,并基于发现的模式优化查询处理。首先,数据收集与预处理模块收集流式查询日志,并将其编码为带有丰富语义上下文和执行特征的高维特征嵌入。接下来,在线工作负载挖掘模块按业务组划分编码后的查询,并发现每个组的工作负载模式。同时,离线训练模块收集标签并训练分类模型。最后,基于模式的优化模块利用发现的模式优化云数据库中的查询处理。我们在一个合成数据集和两个真实数据集(取自阿里巴巴云数据库)上进行了大量实验,结果显示,与现有最先进方法相比,AWM 将模式发现的准确率提高了 66%,并将在线推理延迟降低了 22%。

Learning when to observe: A frugal reinforcement learning framework for a high-cost world

  • paper_url: http://arxiv.org/abs/2307.02620
  • repo_url: https://github.com/cbellinger27/learning-when-to-observe-in-rl
  • paper_authors: Colin Bellinger, Mark Crowley, Isaac Tamblyn
  • for: This paper focuses on the problem of learning in reinforcement learning (RL) when there is a high cost associated with measuring the state of the environment.
  • methods: The proposed method is called the Deep Dynamic Multi-Step Observationless Agent (DMSOA), which does not rely on costly measurements at each time step. Instead, it uses a deep neural network to learn when to observe the environment and when to act without observation.
  • results: The authors evaluate DMSOA on OpenAI gym and Atari Pong environments and show that it learns a better policy with fewer decision steps and measurements than the considered alternative from the literature.
    Abstract Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without a cost. In applications such as materials design, deep-sea and planetary robot exploration and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature and empirically evaluate it on OpenAI gym and Atari Pong environments. Our results, show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. The corresponding code is available at: \url{https://github.com/cbellinger27/Learning-when-to-observe-in-RL
    摘要 强化学习(RL)已被证明能够为复杂任务学习精细的控制策略,包括游戏、机器人、暖通空调系统和文本生成。然而,RL 中的动作-感知循环通常假设在每个时间步都可以无成本地测量环境状态。在材料设计、深海与行星机器人探测以及医学等应用中,测量甚至近似环境状态都可能代价高昂。在这篇论文中,我们综述了近期快速增长的相关文献,这些工作认为 RL 智能体可能不需要、甚至不希望在每个时间步都进行高成本的测量。基于这一视角,我们提出了深度动态多步无观测智能体(DMSOA),将其与现有文献进行对比,并在 OpenAI Gym 和 Atari Pong 环境中进行了实证评估。结果表明,与文献中的对比方法相比,DMSOA 能以更少的决策步数和测量次数学习到更好的策略。相关代码可在以下链接获取:https://github.com/cbellinger27/Learning-when-to-observe-in-RL。
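
To make the problem setting concrete, here is a small illustrative wrapper (using the gymnasium API, not the authors' code) in which the agent pays a penalty whenever it requests a fresh observation and otherwise must act on its last cached one:

```python
# Illustrative costly-measurement setting (not the DMSOA agent itself): the agent
# passes (action, measure) pairs; requesting a fresh observation costs reward,
# otherwise it only sees its last cached observation.
import gymnasium as gym

class CostlyObservationWrapper(gym.Wrapper):
    def __init__(self, env, measurement_cost=0.1):
        super().__init__(env)
        self.measurement_cost = measurement_cost
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action_and_measure):
        action, measure = action_and_measure
        obs, reward, terminated, truncated, info = self.env.step(action)
        if measure:
            self._last_obs = obs
            reward -= self.measurement_cost  # observing the state is not free
        return self._last_obs, reward, terminated, truncated, info

env = CostlyObservationWrapper(gym.make("CartPole-v1"))
obs, _ = env.reset(seed=0)
obs, r, term, trunc, _ = env.step((env.action_space.sample(), True))
```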

Federated Epidemic Surveillance

  • paper_url: http://arxiv.org/abs/2307.02616
  • repo_url: None
  • paper_authors: Ruiqi Lyu, Bryan Wilder, Roni Rosenfeld
  • for: 这篇论文旨在开发一种用于流行病监测的联邦方法,以应对关键数据分散、各方无法或不愿共享的情况。
  • methods: 论文将假设检验推送到各数据保管方的防火墙之后执行,再通过 meta 分析合并各方的检验结果(例如合并 $p$ 值)。
  • results: 结果展示了这种联邦方法的可行性,并通过实验和数据分析说明了其在流行病监测中的优势与适用范围。
    Abstract The surveillance of a pandemic is a challenging task, especially when crucial data is distributed and stakeholders cannot or are unwilling to share. To overcome this obstacle, federated methodologies should be developed to incorporate less sensitive evidence that entities are willing to provide. This study aims to explore the feasibility of pushing hypothesis tests behind each custodian's firewall and then meta-analysis to combine the results, and to determine the optimal approach for reconstructing the hypothesis test and optimizing the inference. We propose a hypothesis testing framework to identify a surge in the indicators and conduct power analyses and experiments on real and semi-synthetic data to showcase the properties of our proposed hypothesis test and suggest suitable methods for combining $p$-values. Our findings highlight the potential of using $p$-value combination as a federated methodology for pandemic surveillance and provide valuable insights into integrating available data sources.
    摘要 疫情监测是一项具有挑战性的任务,尤其是当关键数据分散、各利益相关方无法或不愿共享时。为克服这一障碍,需要发展联邦方法,以纳入各方愿意提供的、敏感度较低的证据。本研究旨在探索将假设检验推送到各数据保管方的防火墙之后执行,再通过 meta 分析合并结果的可行性,并确定重建假设检验和优化推断的最佳方式。我们提出了一个用于识别指标激增的假设检验框架,并在真实数据和半合成数据上进行了功效分析与实验,以展示所提检验的性质并给出合并 $p$ 值的合适方法。我们的发现凸显了以 $p$ 值合并作为疫情监测联邦方法的潜力,并为整合可用数据源提供了有价值的见解。
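
The meta-analysis step can be illustrated with a standard p-value combination rule. The sketch below uses Fisher's method on hypothetical per-custodian p-values; the paper compares several such combination rules, so this is only one instance:

```python
# Each custodian runs its own surge test behind its firewall and shares only a
# p-value; the coordinator combines them. Fisher's method is one of several rules.
import math
from scipy import stats

def combine_fisher(p_values):
    """Fisher: -2 * sum(ln p_i) follows chi-squared with 2k dof under the global null."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return stats.chi2.sf(statistic, df=2 * len(p_values))

site_pvalues = [0.04, 0.20, 0.11]  # hypothetical per-custodian test results
print("Fisher combined p:", combine_fisher(site_pvalues))
_, p_stouffer = stats.combine_pvalues(site_pvalues, method="stouffer")
print("Stouffer combined p:", p_stouffer)  # an alternative combination rule
```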

Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition

  • paper_url: http://arxiv.org/abs/2307.02615
  • repo_url: https://github.com/sled-group/comparative-learning
  • paper_authors: Yuwei Bao, Barrett Martin Lattimer, Joyce Chai
  • for: 本研究旨在提出一种基于人类语言学习的计算模型,用于效率地学习语言概念。
  • methods: 该模型启发自人类婴儿语言学习的机制,通过比较学习来找到语言概念的相似和不同之处,并将这些概念映射到表示符号上。
  • results: 控制性实验结果表明,该方法可以有效地实现语言概念的持续学习和扩展。
    Abstract Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables the computation models to compare the similarities and differences of various attributes, learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words as not only the information filtration process, but also as representation-symbol mapping. This procedure does not involve a fixed vocabulary size, nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.
    摘要 人类语言习得是一个高效、有监督且持续进行的过程。在这项工作中,我们从人类婴儿习得第一语言的方式中获得灵感,开发了一种通过比较学习来习得单词的计算过程。基于认知科学的发现,我们构建了一个小型数据集,使计算模型能够比较各种属性的异同,学习过滤并提取每个共享语言标签所对应的共同信息。我们将单词习得不仅视为信息筛选过程,也视为表示到符号的映射。该过程不依赖固定的词表规模,也不需要判别式目标,使模型能够持续、高效地学习更多概念。受控实验的结果展示了这种方法在感知落地(grounded)单词的高效持续学习上的潜力。

Evade ChatGPT Detectors via A Single Space

  • paper_url: http://arxiv.org/abs/2307.02599
  • repo_url: None
  • paper_authors: Shuyang Cai, Wanyun Cui
  • for: 研究如何判别一段内容是由 ChatGPT 生成还是由人类撰写。
  • methods: 现有检测器基于统计信息或分类器,假设人类生成与 AI 生成内容之间存在分布差异;但实际上这些检测器并不能有效区分两类内容在语义和文体上的差异。
  • results: 提出了基于空格的SpaceInfi策略可以逃脱检测,并在多个benchmark和检测器上进行了实验,并提供了这种策略的理论解释。
    Abstract ChatGPT brings revolutionary social value but also raises concerns about the misuse of AI-generated content. Consequently, an important question is how to detect whether content is generated by ChatGPT or by human. Existing detectors are built upon the assumption that there are distributional gaps between human-generated and AI-generated content. These gaps are typically identified using statistical information or classifiers. Our research challenges the distributional gap assumption in detectors. We find that detectors do not effectively discriminate the semantic and stylistic gaps between human-generated and AI-generated content. Instead, the "subtle differences", such as an extra space, become crucial for detection. Based on this discovery, we propose the SpaceInfi strategy to evade detection. Experiments demonstrate the effectiveness of this strategy across multiple benchmarks and detectors. We also provide a theoretical explanation for why SpaceInfi is successful in evading perplexity-based detection. Our findings offer new insights and challenges for understanding and constructing more applicable ChatGPT detectors.
    摘要 ChatGPT 带来了革命性的社会价值,但也引发了对 AI 生成内容被滥用的担忧。因此,一个重要的问题是如何判断内容是由 ChatGPT 还是由人类生成。现有的检测器建立在分布差异(distributional gap)假设之上,即认为人类生成与 AI 生成的内容之间存在分布差异,并通常通过统计信息或分类器来刻画这种差异。我们的研究对检测器中的这一假设提出了质疑。我们发现,检测器并不能有效区分人类生成与 AI 生成内容在语义和文体上的差异;相反,诸如一个额外空格这样的“微小差异”才是检测的关键。基于这一发现,我们提出了 SpaceInfi 策略来逃避检测。实验表明,该策略在多个基准和检测器上均有效。我们还从理论上解释了 SpaceInfi 为何能够逃避基于困惑度(perplexity)的检测。我们的发现为理解和构建更实用的 ChatGPT 检测器提供了新的思路与挑战。
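
A toy version of the perturbation described above; the exact insertion position used by SpaceInfi is not specified here, so this sketch simply inserts one space before the last sentence-final punctuation mark:

```python
# Toy perturbation: insert a single space before the last sentence-final
# punctuation mark (the exact placement rule of SpaceInfi may differ).
def add_single_space(text):
    for i in range(len(text) - 1, -1, -1):
        if text[i] in ".!?":
            return text[:i] + " " + text[i:]
    return text + " "

print(add_single_space("Large language models generate fluent and coherent answers."))
```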

  • paper_url: http://arxiv.org/abs/2307.02591
  • repo_url: https://github.com/soon91jae/orab_mimic
  • paper_authors: Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee L. Sung, Joel I. Reisman, Wenjun Li, Robert D. Kerns, William Becker, Hong Yu
  • for: This paper aims to develop a novel biomedical natural language processing benchmark dataset (ODD) to detect opioid-related aberrant behaviors (ORAB) from electronic health records (EHRs).
  • methods: The authors use two state-of-the-art natural language processing (NLP) models (finetuning pretrained language models and prompt-tuning approaches) to identify ORAB in EHR notes.
  • results: The prompt-tuning models outperformed the finetuning models in most categories, especially in uncommon categories such as Suggested Aberrant Behavior, Diagnosed Opioid Dependence, and Medication Change. The best model achieved an area under the precision-recall curve of 83.92%, but there is still room for improvement in detecting uncommon classes.
    Abstract Opioid related aberrant behaviors (ORAB) present novel risk factors for opioid overdose. Previously, ORAB have been mainly assessed by survey results and by monitoring drug administrations. Such methods however, cannot scale up and do not cover the entire spectrum of aberrant behaviors. On the other hand, ORAB are widely documented in electronic health record notes. This paper introduces a novel biomedical natural language processing benchmark dataset named ODD, for ORAB Detection Dataset. ODD is an expert-annotated dataset comprising of more than 750 publicly available EHR notes. ODD has been designed to identify ORAB from patients' EHR notes and classify them into nine categories; 1) Confirmed Aberrant Behavior, 2) Suggested Aberrant Behavior, 3) Opioids, 4) Indication, 5) Diagnosed opioid dependency, 6) Benzodiapines, 7) Medication Changes, 8) Central Nervous System-related, and 9) Social Determinants of Health. We explored two state-of-the-art natural language processing (NLP) models (finetuning pretrained language models and prompt-tuning approaches) to identify ORAB. Experimental results show that the prompt-tuning models outperformed the finetuning models in most cateogories and the gains were especially higher among uncommon categories (Suggested aberrant behavior, Diagnosed opioid dependency and Medication change). Although the best model achieved the highest 83.92% on area under precision recall curve, uncommon classes (Suggested Aberrant Behavior, Diagnosed Opioid Dependence, and Medication Change) still have a large room for performance improvement.
    摘要 阿片类药物相关异常行为(ORAB)是阿片类药物过量的新型风险因素。以往,ORAB 主要通过问卷调查结果和药物使用监测来评估,但这些方法难以规模化,也无法覆盖异常行为的全部范围。与此相对,ORAB 在电子健康记录(EHR)的临床记录中有广泛记载。本文提出了一个新的生物医学自然语言处理基准数据集 ODD(ORAB Detection Dataset)。ODD 由专家标注,包含 750 余份公开可用的 EHR 记录,旨在从患者的 EHR 记录中识别 ORAB,并将其划分为 9 个类别:1)确认的异常行为,2)疑似异常行为,3)阿片类药物,4)适应证,5)确诊的阿片依赖,6)苯二氮䓬类药物,7)用药变更,8)中枢神经系统相关,9)健康的社会决定因素。我们使用两种最新的自然语言处理(NLP)模型(预训练语言模型微调与提示调优方法)来识别 ORAB。实验结果显示,提示调优模型在大多数类别上优于微调模型,尤其是在不常见的类别(疑似异常行为、确诊阿片依赖和用药变更)上优势更明显。尽管最佳模型的精确率-召回率曲线下面积达到 83.92%,但对不常见类别的检测仍有较大的提升空间。
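
For reference, the per-class metric reported above (area under the precision-recall curve) can be computed as follows; the category subset, labels, and scores below are toy values, not ODD data:

```python
# Toy computation of per-class and macro AUPRC for a multi-label ORAB classifier.
# Category names, labels, and scores are illustrative, not ODD data.
import numpy as np
from sklearn.metrics import average_precision_score

categories = ["confirmed_aberrant", "suggested_aberrant", "medication_change"]
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]])
y_score = np.array([[0.9, 0.2, 0.1], [0.3, 0.6, 0.2], [0.8, 0.1, 0.7], [0.2, 0.3, 0.1]])

for j, name in enumerate(categories):
    print(name, round(average_precision_score(y_true[:, j], y_score[:, j]), 3))
print("macro AUPRC:", round(average_precision_score(y_true, y_score, average="macro"), 3))
```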

TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers

  • paper_url: http://arxiv.org/abs/2307.02588
  • repo_url: None
  • paper_authors: Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis
  • for: 本研究旨在提出一种基于 transformer 编码器的图嵌入模型,用于解决多种时序图分析任务(如链接预测、节点分类、推荐系统、异常检测和图生成)。
  • methods: 本研究使用 transformer 编码器,首先从当前时间步($t$)及之前的上下文(时间戳 [$t-1, t-l$],$l$ 为上下文长度)中学习中间节点表示,然后通过两个投影层生成节点在时间步 $t$ 的潜在嵌入(低维多元高斯分布)。
  • results: 在不同“新颖度”水平(以 TEA 图衡量)的基准上,TransformerG2G 模型在链接预测精度和计算效率方面均优于传统多步方法和我们之前的工作(DynG2G),在新颖度较高时优势尤为明显。此外,通过分析注意力权重,可以揭示时间依赖关系、识别有影响力的元素,并洞察图结构中复杂的交互。例如,我们发现在图拓扑演化的不同阶段,注意力权重与节点度之间存在强相关性。
    Abstract Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state ($t$) and previous context (over timestamps [$t-1, t-l$], $l$ is the length of context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of ``novelty" as measured by the TEA plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
    摘要 动态图嵌入已成为解决各类时序图分析任务(如链接预测、节点分类、推荐系统、异常检测和图生成)的有效技术。这类时序图在演化过程中表现出异质的瞬态动态、变化不一的时间间隔以及高度演变的节点特征。因此,纳入历史图上下文中的长程依赖,对于准确学习其时间动态至关重要。在这篇论文中,我们开发了一种带不确定性量化的图嵌入模型 TransformerG2G:利用先进的 transformer 编码器,从节点的当前状态($t$)及之前的上下文(时间戳 [$t-1, t-l$],$l$ 为上下文长度)中学习中间节点表示,再通过两个投影层为每个节点生成时间步 $t$ 上的低维多元高斯分布作为其潜在嵌入。我们在具有不同“新颖度”水平(以 TEA 图衡量)的多个基准上进行了实验。结果表明,TransformerG2G 在链接预测精度和计算效率上均优于传统多步方法以及我们之前的工作 DynG2G,在新颖度较高时尤为明显。此外,跨多个图快照学习到的时间相关注意力权重表明,transformer 实现了一种自动的自适应时间步长。更重要的是,通过分析注意力权重,我们可以揭示时间依赖关系、识别有影响力的元素,并洞察图结构中复杂的交互;例如,我们发现了注意力权重与节点度在图拓扑演化各阶段之间的强相关性。
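
A minimal PyTorch sketch of the architecture idea described above (not the authors' implementation): a transformer encoder reads a node's features over the last context snapshots, and two projection heads emit the mean and a diagonal variance of its Gaussian embedding. Dimensions are arbitrary placeholders.

```python
# Minimal sketch (not the authors' code): a transformer encoder over a node's
# recent snapshots, with two heads producing the mean and diagonal variance of
# its Gaussian embedding at the current timestamp.
import torch
import torch.nn as nn

class GaussianNodeEmbedder(nn.Module):
    def __init__(self, feat_dim=16, embed_dim=8, nhead=2, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.mu_head = nn.Linear(feat_dim, embed_dim)
        self.logvar_head = nn.Linear(feat_dim, embed_dim)

    def forward(self, node_history):              # (batch, context_len, feat_dim)
        h = self.encoder(node_history)
        h_t = h[:, -1]                            # representation at the current timestamp
        return self.mu_head(h_t), torch.exp(self.logvar_head(h_t))  # mean, variance

model = GaussianNodeEmbedder()
mu, var = model(torch.randn(32, 4, 16))          # 32 nodes, context of 4 snapshots
print(mu.shape, var.shape)
```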

Artificial Intelligence in archival and historical scholarship workflow: HTS and ChatGPT

  • paper_url: http://arxiv.org/abs/2308.02044
  • repo_url: None
  • paper_authors: Salvatore Spina
  • for: This paper examines the impact of Artificial Intelligence (AI) on the archival heritage digitization processes, specifically focusing on the automatic transcription, correction, and normalization of manuscripts.
  • methods: The study uses two AI systems, Transkribus and ChatGPT, to analyze and transcribe digitized sources.
  • results: The paper presents the results of a test using ChatGPT to normalize the text of 366 letters stored in the Correspondence section of the Biscari Archive (Catania), which showed that although the AI exhibited some limitations, the corrected texts met expectations. Overall, the study concludes that digitization and AI can significantly enhance archival and historical research by allowing the analysis of vast amounts of data and the application of computational linguistic tools.
    Abstract This article examines the impact of Artificial Intelligence on the archival heritage digitization processes, specifically regarding the manuscripts' automatic transcription, their correction, and normalization. It highlights how digitality has compelled scholars to redefine Archive and History field and has facilitated the accessibility of analogue sources through digitization and integration into big data. The study focuses on two AI systems, namely Transkribus and ChatGPT, which enable efficient analysis and transcription of digitized sources. The article presents a test of ChatGPT, which was utilized to normalize the text of 366 letters stored in the Correspondence section of the Biscari Archive (Catania). Although the AI exhibited some limitations that resulted in inaccuracies, the corrected texts met expectations. Overall, the article concludes that digitization and AI can significantly enhance archival and historical research by allowing the analysis of vast amounts of data and the application of computational linguistic tools.
    摘要 这篇文章研究了人工智能对档案遗产数字化过程的影响,特别是手稿的自动转写、校正与规范化。文章指出,数字化促使学者重新界定档案与历史研究领域,并通过数字化和融入大数据,使模拟(analogue)史料更易于获取。研究聚焦于 Transkribus 和 ChatGPT 两个 AI 系统,它们可以高效地分析和转写数字化的史料。文章给出了使用 ChatGPT 对 Biscari 档案(Catania)书信部分中 366 封信件的文本进行规范化的测试结果:尽管 AI 表现出一些导致错误的局限性,但校正后的文本符合预期。总体而言,文章认为数字化与 AI 能够显著促进档案与历史研究,使研究者得以分析海量数据并应用计算语言学工具。

The Effects of Interaction Conflicts, Levels of Automation, and Frequency of Automation on Human Automation Trust and Acceptance

  • paper_url: http://arxiv.org/abs/2307.05512
  • repo_url: None
  • paper_authors: Hadi Halvachi, Ali Asghar Nazari Shirehjini, Zahra Kakavand, Niloofar Hashemi, Shervin Shirmohammadi
  • for: 本研究旨在考察自动化水平(LoA)、自动化响应频率(FoA)和冲突强度(CI)对智能家居情境下用户对自动化的信任与接受度的影响。
  • methods: 研究使用了一种因素研究设计,通过在线实验,收集了324名在线参与者的数据,以了解他们对智能家居的信任和接受度。
  • results: 结果显示,自动化水平和自动化回应频率对用户对智能环境的信任产生影响。此外,结果还表明,在自动化失败和互动冲突的情况下,用户对自动化智能环境的接受度减退。
    Abstract In the presence of interaction conflicts, user trust in automation plays an important role in accepting intelligent environments such as smart homes. In this paper, a factorial research design is employed to investigate and compare the single and joint effects of Level of Automation (LoA), Frequency of Automated responses (FoA), and Conflict Intensity (CI) on human trust and acceptance of automation in the context of smart homes. To study these effects, we conducted web-based experiments to gather data from 324 online participants who experienced the system through a 3D simulation of a smart home. The findings show that the level and frequency of automation had an impact on user trust in smart environments. Furthermore, the results demonstrate that the users' acceptance of automated smart environments decreased in the presence of automation failures and interaction conflicts.
    摘要 当存在交互冲突时,用户对自动化的信任在其接受智能家居等智能环境方面起着重要作用。在这篇论文中,我们采用因子实验设计,研究并比较自动化水平(LoA)、自动化响应频率(FoA)和冲突强度(CI)对智能家居情境下用户信任与自动化接受度的单独及联合影响。为研究这些影响,我们开展了基于网络的实验,收集了 324 名在线参与者的数据,他们通过一个智能家居的 3D 仿真系统体验了该系统。结果表明,自动化水平和自动化响应频率会影响用户对智能环境的信任;此外,在出现自动化故障和交互冲突时,用户对自动化智能环境的接受度会下降。

Building Cooperative Embodied Agents Modularly with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.02485
  • repo_url: https://github.com/UMass-Foundation-Model/Co-LLM-Agents
  • paper_authors: Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
  • for: This paper aims to explore the use of large language models (LLMs) for multi-agent cooperation and communication in embodied environments.
  • methods: The authors present a novel framework that utilizes LLMs for planning, communication, and cooperation in various embodied environments, without requiring fine-tuning or few-shot prompting.
  • results: The authors demonstrate that LLM-based agents can surpass strong planning-based methods and exhibit emergent effective communication, and that LLM-based agents that communicate in natural language can earn more trust and cooperate more effectively with humans.
    Abstract Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains. However, their capacity for planning and communication in multi-agent cooperation remains unclear, even though these are crucial skills for intelligent embodied agents. In this paper, we present a novel framework that utilizes LLMs for multi-agent cooperation and tests it in various embodied environments. Our framework enables embodied agents to plan, communicate, and cooperate with other embodied agents or humans to accomplish long-horizon tasks efficiently. We demonstrate that recent LLMs, such as GPT-4, can surpass strong planning-based methods and exhibit emergent effective communication using our framework without requiring fine-tuning or few-shot prompting. We also discover that LLM-based agents that communicate in natural language can earn more trust and cooperate more effectively with humans. Our research underscores the potential of LLMs for embodied AI and lays the foundation for future research in multi-agent cooperation. Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/.
    摘要 大型语言模型(LLM)已在多个领域的单智能体具身任务中展现出令人印象深刻的规划能力,但它们在多智能体合作中的规划与通信能力仍不清楚,而这些正是智能具身体所必需的关键技能。在这篇论文中,我们提出了一种利用 LLM 实现多智能体合作的新框架,并在多种具身环境中进行了测试。我们的框架使具身智能体能够规划、通信,并与其他具身智能体或人类合作,高效完成长程任务。我们发现,GPT-4 等最新的 LLM 在该框架下无需微调或少样本提示,即可超越强大的基于规划的方法,并表现出涌现的有效通信能力。此外,我们还发现,以自然语言进行沟通的 LLM 智能体能够赢得人类更多的信任,并与人类更有效地合作。我们的研究凸显了 LLM 在具身 AI 方面的潜力,并为多智能体合作的后续研究奠定了基础。视频可在项目网站 https://vis-www.cs.umass.edu/Co-LLM-Agents/ 查看。

Elastic Decision Transformer

  • paper_url: http://arxiv.org/abs/2307.02484
  • repo_url: https://github.com/danderfer/Comp_Sci_Sem_2
  • paper_authors: Yueh-Hua Wu, Xiaolong Wang, Masashi Hamaya
  • for: 提高 Decision Transformer (DT) 和其变种的性能,尤其是在生成优化轨迹的过程中。
  • methods: 提出 Elastic Decision Transformer(EDT),通过在测试时的动作推理阶段调整所保留的历史长度来实现轨迹拼接(trajectory stitching):当此前轨迹较优时保留更长的历史,反之则保留更短的历史。
  • results: 在 D4RL 运动(locomotion)基准和 Atari 游戏中,EDT 的表现优于基于 Q-learning 的方法,在多任务设置下优势尤为明显。
    Abstract This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
    摘要 这篇论文介绍了弹性决策变换器(Elastic Decision Transformer,EDT),它是对现有决策变换器(DT)及其变体的重要改进。尽管 DT 声称能够生成最优轨迹,但经验证据表明,它在轨迹拼接(trajectory stitching)上表现欠佳,即难以从一组次优轨迹的最优片段中生成最优或接近最优的轨迹。所提出的 EDT 的独特之处在于,通过调整 DT 所保留的历史长度,在测试时的动作推理阶段实现轨迹拼接:当此前的轨迹较优时保留较长的历史,当其次优时保留较短的历史,从而使模型能够与更优的轨迹“拼接”。大量实验表明,EDT 能够弥合基于 DT 的方法与基于 Q-learning 的方法之间的性能差距;尤其是在 D4RL 运动基准和 Atari 游戏的多任务设置下,EDT 优于基于 Q-learning 的方法。视频见:https://kristery.github.io/edt/
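
A schematic of the test-time mechanism described above, heavily simplified: several candidate history lengths are scored with an estimate of the best achievable return, and the highest-scoring length is kept. EDT's actual estimator is learned; the one below is a made-up stand-in.

```python
# Simplified test-time selection of the history length; `value_estimate` stands in
# for EDT's learned estimate of the best return reachable from a given context.
import numpy as np

def choose_context_length(history, candidate_lengths, value_estimate):
    best_len, best_value = candidate_lengths[0], -np.inf
    for L in candidate_lengths:
        v = value_estimate(history[-L:])
        if v > best_value:
            best_len, best_value = L, v
    return best_len

# toy usage with a made-up estimator that favours recent high rewards
toy_history = [(None, None, r) for r in [0.1, 0.0, 0.9, 1.0]]
print(choose_context_length(toy_history, [1, 2, 4],
                            lambda h: sum(r for *_, r in h) / len(h)))
```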

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

  • paper_url: http://arxiv.org/abs/2307.02477
  • repo_url: https://github.com/zhaofengwu/counterfactual-evaluation
  • paper_authors: Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim
  • for: 这个论文旨在评估当代自然语言处理模型是否具备抽象逻辑能力,以及这种能力是否普适或特定任务的偏爱。
  • methods: 作者提出了一种基于“Counterfactual”任务变体的评估框架,以评估当代语言模型是否具备抽象逻辑能力,并对11种任务进行了测试。
  • results: 研究发现,当代语言模型在Counterfactual任务变体中表现出了一定的抽象逻辑能力,但是与标准任务的表现相比,其表现却有很大的差异,这表明当代语言模型的表现可能受到了特定任务的偏爱。
    Abstract The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specialized to specific tasks seen during pretraining? To disentangle these effects, we propose an evaluation framework based on "counterfactual" task variants that deviate from the default assumptions underlying standard tasks. Across a suite of 11 tasks, we observe nontrivial performance on the counterfactual variants, but nevertheless find that performance substantially and consistently degrades compared to the default conditions. This suggests that while current LMs may possess abstract task-solving skills to a degree, they often also rely on narrow, non-transferable procedures for task-solving. These results motivate a more careful interpretation of language model performance that teases apart these aspects of behavior.
    摘要 近期的语言模型在广泛的任务上表现出令人印象深刻的能力,这表明它们具备一定程度的抽象推理能力。但这些能力是通用且可迁移的,还是仅限于预训练中见过的特定任务?为厘清二者,我们提出了一个基于“反事实”(counterfactual)任务变体的评估框架,这些变体偏离了标准任务所依赖的默认假设。在 11 个任务上,模型在反事实变体上的表现并非平凡,但与默认条件相比,其表现持续且显著地下降。这表明,当前的语言模型虽然在一定程度上具备抽象的任务求解能力,但它们往往也依赖狭隘、不可迁移的解题程序。这些结果促使我们更审慎地解读语言模型的表现,把上述两方面的行为区分开来。
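
One concrete flavour of "counterfactual" task variant in the spirit of the paper is arithmetic posed in a non-default base; the helper below builds matched default (base-10) and counterfactual (base-9) addition prompts. This is only an illustration, not the paper's exact prompt format.

```python
# Matched default and counterfactual arithmetic prompts: the underlying sum is the
# same, only the assumed base changes.
def to_base(n, base):
    digits = []
    while n:
        digits.append(str(n % base))
        n //= base
    return "".join(reversed(digits)) or "0"

def addition_prompt(a, b, base):
    question = f"In base-{base}, what is {to_base(a, base)} + {to_base(b, base)}?"
    return question, to_base(a + b, base)

print(addition_prompt(27, 35, 10))  # default condition:        "27 + 35", answer "62"
print(addition_prompt(27, 35, 9))   # counterfactual condition: "30 + 38", answer "68"
```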

Deductive Additivity for Planning of Natural Language Proofs

  • paper_url: http://arxiv.org/abs/2307.02472
  • repo_url: https://github.com/zayne-sprague/deductive_additivity_for_planning_of_natural_language_proofs
  • paper_authors: Zayne Sprague, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett
  • for: investigate whether an efficient planning heuristic is possible via embedding spaces compatible with deductive reasoning
  • methods: explore multiple sources of off-the-shelf dense embeddings in addition to fine-tuned embeddings from GPT3 and sparse embeddings from BM25
  • results: find that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning
    Abstract Current natural language systems designed for multi-step claim validation typically operate in two phases: retrieve a set of relevant premise statements using heuristics (planning), then generate novel conclusions from those statements using a large language model (deduction). The planning step often requires expensive Transformer operations and does not scale to arbitrary numbers of premise statements. In this paper, we investigate whether an efficient planning heuristic is possible via embedding spaces compatible with deductive reasoning. Specifically, we evaluate whether embedding spaces exhibit a property we call deductive additivity: the sum of premise statement embeddings should be close to embeddings of conclusions based on those premises. We explore multiple sources of off-the-shelf dense embeddings in addition to fine-tuned embeddings from GPT3 and sparse embeddings from BM25. We study embedding models both intrinsically, evaluating whether the property of deductive additivity holds, and extrinsically, using them to assist planning in natural language proof generation. Lastly, we create a dataset, Single-Step Reasoning Contrast (SSRC), to further probe performance on various reasoning types. Our findings suggest that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning.
    摘要 当前为多步声明验证设计的自然语言系统通常分两个阶段运行:先用启发式方法检索一组相关的前提语句(规划),再利用大型语言模型从这些语句中生成新的结论(演绎)。规划阶段往往需要昂贵的 Transformer 运算,且无法扩展到任意数量的前提语句。在这篇论文中,我们研究是否可以借助与演绎推理相容的嵌入空间来构造高效的规划启发式。具体而言,我们评估嵌入空间是否具有我们称之为“演绎可加性”的性质:各前提语句嵌入之和应当接近基于这些前提得出的结论的嵌入。除了来自 GPT-3 的微调嵌入和来自 BM25 的稀疏嵌入之外,我们还考察了多种现成的稠密嵌入。我们既从内在角度评估嵌入模型(检验演绎可加性是否成立),也从外在角度评估(用其辅助自然语言证明生成中的规划)。最后,我们构建了一个数据集 Single-Step Reasoning Contrast(SSRC),以进一步考察不同推理类型上的表现。我们的结论是:尽管标准嵌入方法常常将结论嵌入到接近其前提之和的位置,但它们尚不足以成为有效的启发式,并且难以建模某些类别的推理。
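
The deductive-additivity property can be probed directly: embed two premises and a conclusion, then compare the conclusion vector with the sum of the premise vectors by cosine similarity. The encoder below is a stand-in choice, not necessarily one used in the paper.

```python
# Probe of deductive additivity with an off-the-shelf sentence encoder.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

p1 = encoder.encode("Socrates is a man.")
p2 = encoder.encode("All men are mortal.")
conclusion = encoder.encode("Socrates is mortal.")

print("sum of premises vs conclusion:", cosine(p1 + p2, conclusion))
print("single premise vs conclusion: ", cosine(p1, conclusion))
```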

Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources

  • paper_url: http://arxiv.org/abs/2307.02460
  • repo_url: None
  • paper_authors: Feiyang Kang, Hoang Anh Just, Anit Kumar Sahu, Ruoxi Jia
  • for: 这篇论文是关于如何在实际数据交换场景中进行数据选择,特别是当数据提供者只显示一小部分的数据时。
  • methods: 这篇论文提出了一个框架,可以根据部分披露的样本预测模型性能并支持数据选择决策。其方法分两步:先利用最优传输(Optimal Transport)距离预测模型在已披露数据规模范围内任意数据混合比例下的性能,再借助一种受神经缩放律启发的无参数映射技术,将性能外推到更大的未披露数据规模。
  • results: 评估结果显示,该框架在性能推断的准确性和构建性能预测器的计算成本方面均优于现有方法,并在数据选择效果上大幅超越其他一些现成方案。
    Abstract Traditionally, data selection has been studied in settings where all samples from prospective sources are fully revealed to a machine learning developer. However, in practical data exchange scenarios, data providers often reveal only a limited subset of samples before an acquisition decision is made. Recently, there have been efforts to fit scaling laws that predict model performance at any size and data source composition using the limited available samples. However, these scaling functions are black-box, computationally expensive to fit, highly susceptible to overfitting, or/and difficult to optimize for data selection. This paper proposes a framework called , which predicts model performance and supports data selection decisions based on partial samples of prospective data sources. Our approach distinguishes itself from existing work by introducing a novel *two-stage* performance inference process. In the first stage, we leverage the Optimal Transport distance to predict the model's performance for any data mixture ratio within the range of disclosed data sizes. In the second stage, we extrapolate the performance to larger undisclosed data sizes based on a novel parameter-free mapping technique inspired by neural scaling laws. We further derive an efficient gradient-based method to select data sources based on the projected model performance. Evaluation over a diverse range of applications demonstrates that significantly improves existing performance scaling approaches in terms of both the accuracy of performance inference and the computation costs associated with constructing the performance predictor. Also, outperforms by a wide margin in data selection effectiveness compared to a range of other off-the-shelf solutions.
    摘要 传统上,数据选择的研究大多假设候选数据源的所有样本都会完整地提供给机器学习开发者。然而在实际的数据交易场景中,数据提供方往往只在采购决策之前披露一小部分样本。近来有研究尝试利用这些有限的可用样本来拟合缩放律,以预测模型在任意数据规模和数据源组合下的性能;但这些缩放函数往往是黑盒的、拟合代价高、极易过拟合,或者难以针对数据选择进行优化。本文提出了一个框架,能够基于候选数据源的部分样本预测模型性能并支持数据选择决策。我们的方法区别于已有工作之处在于引入了一个新颖的两阶段性能推断过程:第一阶段,利用最优传输距离预测模型在已披露数据规模范围内任意数据混合比例下的性能;第二阶段,借助一种受神经缩放律启发的无参数映射技术,将性能外推到更大的未披露数据规模。我们进一步推导出一种高效的基于梯度的方法,依据预测的模型性能来选择数据源。在多种应用上的评估表明,该框架在性能推断的准确性以及构建性能预测器的计算开销方面,都显著优于现有的性能缩放方法;在数据选择效果上,也大幅超越了一系列现成的替代方案。
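
A minimal sketch of the first-stage ingredient described above: an optimal-transport distance between a provider's revealed pilot samples and a reference (validation) set, which could then be regressed against observed performance. The full framework's projection and scaling-law extrapolation are more involved; the data below are synthetic.

```python
# Synthetic example: OT distance between a revealed pilot set and a reference set,
# computed with the POT library; a smaller distance suggests the source better
# matches the target task.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
pilot = rng.normal(0.0, 1.0, size=(200, 16))      # revealed subset of a data source
reference = rng.normal(0.3, 1.0, size=(300, 16))  # validation set for the target task

a = np.full(len(pilot), 1.0 / len(pilot))          # uniform weights over samples
b = np.full(len(reference), 1.0 / len(reference))
M = ot.dist(pilot, reference)                      # pairwise squared-Euclidean costs
print("OT distance:", ot.emd2(a, b, M))
```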

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models

  • paper_url: http://arxiv.org/abs/2307.02457
  • repo_url: https://github.com/tencentarc/desra
  • paper_authors: Liangbin Xie, Xintao Wang, Xiangyu Chen, Gen Li, Ying Shan, Jiantao Zhou, Chao Dong
  • for: 提升 SR 模型在实际场景中的可用性,即便面对未见过的数据且没有真值(ground truth)参考
  • methods: 针对基于 GAN 的 SR 模型,通过检测并消除推理阶段产生的伪影来改进其效果
  • results: 借助 DeSRA 方法,可以成功消除 GAN-SR 模型产生的令人不适的伪影,从而提升 SR 模型在实际场景中的应用能力
    Abstract Image super-resolution (SR) with generative adversarial networks (GAN) has achieved great success in restoring realistic details. However, it is notorious that GAN-based SR models will inevitably produce unpleasant and undesirable artifacts, especially in practical scenarios. Previous works typically suppress artifacts with an extra loss penalty in the training phase. They only work for in-distribution artifact types generated during training. When applied in real-world scenarios, we observe that those improved methods still generate obviously annoying artifacts during inference. In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths. We then develop a novel method, namely, DeSRA, to Detect and then Delete those SR Artifacts in practice. Specifically, we propose to measure a relative local variance distance from MSE-SR results and GAN-SR results, and locate the problematic areas based on the above distance and semantic-aware thresholds. After detecting the artifact regions, we develop a finetune procedure to improve GAN-based SR models with a few samples, so that they can deal with similar types of artifacts in more unseen real data. Equipped with our DeSRA, we can successfully eliminate artifacts from inference and improve the ability of SR models to be applied in real-world scenarios. The code will be available at https://github.com/TencentARC/DeSRA.
    摘要 基于生成对抗网络(GAN)的图像超分辨率(SR)在恢复真实细节方面取得了巨大成功。然而,众所周知,基于 GAN 的 SR 模型难免会产生令人不适、不受欢迎的伪影,在实际场景中尤为明显。以往的工作通常在训练阶段通过额外的损失惩罚来抑制伪影,但这只对训练中出现过的、分布内的伪影类型有效;应用于真实场景时,我们观察到这些改进方法在推理阶段仍会产生明显恼人的伪影。在这篇论文中,我们分析了在没有真值的未见测试数据上 GAN 伪影产生的原因和特征。随后,我们提出了一种新方法 DeSRA,用于在实际使用中检测(Detect)并消除(Delete)这些 SR 伪影。具体而言,我们提出度量 MSE-SR 结果与 GAN-SR 结果之间的相对局部方差距离,并结合该距离与语义感知阈值来定位问题区域。在检测到伪影区域之后,我们设计了一个微调流程,仅用少量样本就能改进基于 GAN 的 SR 模型,使其能够处理更多未见真实数据中的同类伪影。借助 DeSRA,我们能够成功消除推理中的伪影,提升 SR 模型在真实场景中的应用能力。代码将发布于 https://github.com/TencentARC/DeSRA。
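
A rough sketch of the detection cue described above: compare local variance maps of the MSE-based and GAN-based SR outputs and flag regions where the GAN result shows disproportionately high texture. The window size and threshold are placeholders, and the semantic-aware part of DeSRA is omitted.

```python
# Rough artifact cue: regions where the GAN output has much higher local variance
# than the MSE output are flagged. Window size and threshold are placeholders.
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, size=7):
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return np.maximum(mean_sq - mean * mean, 0.0)

def artifact_mask(mse_sr, gan_sr, ratio_threshold=2.0, eps=1e-6):
    relative = local_variance(gan_sr) / (local_variance(mse_sr) + eps)
    return relative > ratio_threshold  # True where GAN adds texture the MSE result lacks

mse_sr = np.random.rand(64, 64).astype(np.float32)
gan_sr = mse_sr + 0.2 * np.random.rand(64, 64).astype(np.float32)
print("flagged fraction:", artifact_mask(mse_sr, gan_sr).mean())
```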

An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code

  • paper_url: http://arxiv.org/abs/2307.02443
  • repo_url: None
  • paper_authors: Max Hort, Anastasiia Grishina, Leon Moonen
  • for: 本研究的主要目标是检查发表的语言模型在软件工程任务上是否分享源代码和训练 artifacts,以及分析训练能源消耗的透明度。
  • methods: 我们采用滚雪球式(snowballing)文献检索,找到使用语言模型处理源代码任务的已发表文献,并从可持续性角度分析其可复用性。
  • results: 我们从 494 篇去重后的文献中筛选出 293 篇相关文献,其中 27%(79 篇)提供了可复用的成果,形式包括面向特定任务的工具或 IDE 插件,以及可针对多种下游任务微调的通用模型。此外,我们收集了模型训练所用硬件和训练时长的信息,用以分析训练过程的能耗。我们发现,现有研究中有 40% 的论文既不分享源代码也不分享训练产出;我们建议同时分享源代码与训练产出,以实现可持续的可复现性,并建议完整公开训练时长与硬件配置,以提高模型碳足迹的透明度。
    Abstract Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repeat the expensive training process if models are not shared. The main goal of the study is to investigate if publications that trained language models for software engineering (SE) tasks share source code and trained artifacts. The second goal is to analyze the transparency on training energy usage. We perform a snowballing-based literature search to find publications on language models for source code, and analyze their reusability from a sustainability standpoint. From 494 unique publications, we identified 293 relevant publications that use language models to address code-related tasks. Among them, 27% (79 out of 293) make artifacts available for reuse. This can be in the form of tools or IDE plugins designed for specific tasks or task-agnostic models that can be fine-tuned for a variety of downstream tasks. Moreover, we collect insights on the hardware used for model training, as well as training time, which together determine the energy consumption of the development process. We find that there are deficiencies in the sharing of information and artifacts for current studies on source code models for software engineering tasks, with 40% of the surveyed papers not sharing source code or trained artifacts. We recommend the sharing of source code as well as trained artifacts, to enable sustainable reproducibility. Moreover, comprehensive information on training times and hardware configurations should be shared for transparency on a model's carbon footprint.
    摘要 在源代码上训练的大型语言模型可以支持多种软件开发任务,例如代码推荐和程序修复。大量训练数据有助于提升模型性能,但数据与模型的规模也导致训练时间长、能耗高。发布源代码有助于可复现性,但如果模型本身不被共享,用户就必须重复昂贵的训练过程。本研究的主要目标是调查为软件工程(SE)任务训练语言模型的文献是否分享了源代码和训练产出;第二个目标是分析训练能耗的透明度。我们通过滚雪球式文献检索找到关于源代码语言模型的文献,并从可持续性角度分析其可复用性。在 494 篇去重后的文献中,我们确定了 293 篇使用语言模型处理代码相关任务的相关文献,其中 27%(79 篇)提供了可复用的成果,形式包括面向特定任务的工具或 IDE 插件,以及可针对多种下游任务微调的通用模型。此外,我们收集了模型训练所用硬件和训练时长的信息,二者共同决定了开发过程的能耗。我们发现,目前关于面向软件工程任务的源代码模型的研究在信息与产出共享方面存在不足,40% 的受调查论文不分享源代码或训练产出。我们建议分享源代码以及训练产出,以实现可持续的可复现性;同时应完整公开训练时长与硬件配置,以提高模型碳足迹的透明度。
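
A back-of-the-envelope version of the energy/carbon figure the study argues should be reported: energy = device power x utilisation x time, carbon = energy x grid intensity. All numbers below are illustrative placeholders, not values from the surveyed papers.

```python
# Illustrative estimate only: none of these numbers come from the surveyed papers.
num_gpus = 8
gpu_power_kw = 0.3        # assumed ~300 W per accelerator
utilisation = 0.8
training_hours = 72
grid_intensity = 0.4      # kg CO2e per kWh (varies strongly by region)

energy_kwh = num_gpus * gpu_power_kw * utilisation * training_hours
print(f"energy: {energy_kwh:.0f} kWh, carbon: {energy_kwh * grid_intensity:.0f} kg CO2e")
```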

External Reasoning: Towards Multi-Large-Language-Models Interchangeable Assistance with Human Feedback

  • paper_url: http://arxiv.org/abs/2307.12057
  • repo_url: https://github.com/AkideLiu/ANLP
  • paper_authors: Akide Liu
  • for: 提高人工智能的能力,使其能够解决复杂的实际问题
  • methods: 通过选择性地吸收外部知识库中的知识,并采用多个LLM之间的交换协助来增强LLM的能力
  • results: 经过全面评估,该方法可以达到现有解决方案的同等或更高的性能,并且比直接LLM处理全文更加高效
    Abstract Memory is identified as a crucial human faculty that allows for the retention of visual and linguistic information within the hippocampus and neurons in the brain, which can subsequently be retrieved to address real-world challenges that arise through a lifetime of learning. The resolution of complex AI tasks through the application of acquired knowledge represents a stride toward the realization of artificial general intelligence. However, despite the prevalence of Large Language Models (LLMs) like GPT-3.5 and GPT-4 , which have displayed remarkable capabilities in language comprehension, generation, interaction, and reasoning, they are inhibited by constraints on context length that preclude the processing of extensive, continually evolving knowledge bases. This paper proposes that LLMs could be augmented through the selective integration of knowledge from external repositories, and in doing so, introduces a novel methodology for External Reasoning, exemplified by ChatPDF. Central to this approach is the establishment of a tiered policy for \textbf{External Reasoning based on Multiple LLM Interchange Assistance}, where the level of support rendered is modulated across entry, intermediate, and advanced tiers based on the complexity of the query, with adjustments made in response to human feedback. A comprehensive evaluation of this methodology is conducted using multiple LLMs and the results indicate state-of-the-art performance, surpassing existing solutions including ChatPDF.com. Moreover, the paper emphasizes that this approach is more efficient compared to the direct processing of full text by LLMs.
    摘要 记忆是人类的一项关键能力,使大脑的海马体和神经元得以保留视觉与语言信息,并在终身学习中被提取,用以应对现实世界的挑战。通过运用已获得的知识来解决复杂的 AI 任务,是迈向通用人工智能的重要一步。然而,尽管 GPT-3.5 和 GPT-4 等大型语言模型在语言理解、生成、交互和推理方面表现出色,它们仍受上下文长度的限制,无法处理规模庞大且持续演化的知识库。本文提出通过选择性整合外部知识库来增强 LLM,并由此引入一种新的外部推理(External Reasoning)方法,以 ChatPDF 为例。该方法的核心是建立一个基于多 LLM 交换协助的分层外部推理策略:根据查询的复杂程度,在入门、中级和高级层次之间调节支持力度,并依据人类反馈进行调整。我们使用多个 LLM 对该方法进行了全面评估,结果表明其性能达到领先水平,超越了包括 ChatPDF.com 在内的现有方案。此外,论文强调该方法比让 LLM 直接处理全文更高效。

FOCUS: Object-Centric World Models for Robotics Manipulation

  • paper_url: http://arxiv.org/abs/2307.02427
  • repo_url: None
  • paper_authors: Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
  • for: 本研究旨在提出一种基于模型的智能体,用于学习一种对象中心的世界模型,以便更好地理解和处理机器人 manipulate 任务。
  • methods: 本研究使用了一种新的探索奖励机制,基于对象中心的表示,使得智能体更容易探索机器人-对象之间的互动。
  • results: 研究表明,以对象为中心的世界模型可以帮助智能体更高效地解决操作任务,并能在不同设定下持续一致地探索机器人与对象之间的交互。此外,我们还使用 Franka Emika 机械臂展示了 FOCUS 在真实场景中的应用。
    Abstract Understanding the world in terms of objects and the possible interplays with them is an important cognition ability, especially in robotics manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, which specifically captures entities and relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus that stems from the object-centric representation, FOCUS can be deployed on robotics manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.
    摘要 以对象及其可能的交互来理解世界,是一项重要的认知能力,在机器人操作中尤为关键,因为许多任务都需要机器人与物体交互。然而,学习这种明确刻画实体及其关系的结构化世界模型,仍是一个具有挑战性且探索不足的问题。为此,我们提出了 FOCUS,一种学习以对象为中心的世界模型的基于模型的智能体。得益于源自对象中心表示的新型探索奖励,FOCUS 可以部署在机器人操作任务上,更容易地探索物体交互。我们在不同设定的操作任务上评估了该方法,结果表明,以对象为中心的世界模型使智能体能够更高效地解决任务,并能持续一致地探索机器人与物体的交互。我们还使用 Franka Emika 机械臂展示了 FOCUS 在真实场景中的应用方式。

Multi-objective Deep Reinforcement Learning for Mobile Edge Computing

  • paper_url: http://arxiv.org/abs/2307.14346
  • repo_url: https://github.com/gracefulning/mec_morl_multipolicy
  • paper_authors: Ning Yang, Junrui Wen, Meng Zhang, Ming Tang
  • for: The paper targets next-generation mobile network applications that must balance several performance metrics, including delay and energy consumption.
  • methods: It proposes a multi-objective reinforcement learning (MORL) scheme with proximal policy optimization (PPO) to handle unknown preferences in mobile edge computing (MEC) systems.
  • results: The proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks.
    Abstract Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.
    摘要 移动边缘计算(MEC)对下一代移动网络应用至关重要,这类应用需要兼顾时延、能耗等多种性能指标。然而,传统的单目标调度方案无法直接应用于实际系统,因为这些应用的偏好(即各目标的权重)往往未知,或难以事先指定。在本研究中,我们针对多边缘的 MEC 系统,将卸载问题建模为一个多目标优化问题,在把未知偏好作为参数考虑的同时,最小化期望的长期能耗与传输时延。为了应对偏好未知的挑战,我们设计了一种基于多目标(深度)强化学习(MORL)并采用近端策略优化(PPO)的资源调度方案。此外,我们还提出了一种精心设计的状态编码方法,用于为 MEC 系统中的多个边缘构建特征,并设计了能准确计算时延与能耗效用的奖励函数。仿真结果表明,与基准方法相比,所提出的 MORL 方案可将帕累托前沿的超体积最多提升 233.1%。完整框架见 https://github.com/gracefulning/mec_morl_multipolicy。
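
The preference-conditioned objective implied above can be shown in isolation: delay and energy are combined with a weight vector that is unknown at deployment time, so training samples preferences rather than fixing one trade-off. This only illustrates the scalarisation step, not the full PPO-based scheme:

```python
# Scalarisation of the two objectives under a sampled (unknown) preference vector.
import numpy as np

def scalarised_reward(delay, energy, preference):
    # both objectives are costs, so the reward is their negative weighted sum
    return -(preference[0] * delay + preference[1] * energy)

rng = np.random.default_rng(0)
for _ in range(3):
    w = rng.dirichlet([1.0, 1.0])  # a random trade-off between delay and energy
    print(w.round(2), scalarised_reward(delay=0.8, energy=1.5, preference=w))
```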

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

  • paper_url: http://arxiv.org/abs/2307.03084
  • repo_url: https://github.com/thunlp/opendelta
  • paper_authors: Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun
  • for: 大规模预训练模型(PTM)的适应下游任务受到较大的优化负担和存储成本的限制,以适应这种限制,许多研究强调参数精炼训练方法,也称为“delta tuning”,这种方法只更新一小部分参数,称为“delta模块”,而保持背景模型的参数不变。
  • methods: OpenDelta 是一个开源库,它解决了现有 delta tuning 实现的限制,不需要修改背景 PTM 的代码,可以与不同的 PTM 兼容,并且提供了一系列可扩展的技术,使研究人员和实践者能够方便地适应大型 PTM。
  • results: OpenDelta 提供了一个简单、干净、可扩展的平台,可以帮助研究人员和实践者快速适应大型 PTM,并且可以提供更多的 delta tuning 方法,以满足不同的应用需求。
    Abstract The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
    摘要 大型预训练模型(PTM)的规模给下游任务适配带来了巨大挑战,因为全参数微调的优化开销和存储成本都很高。为此,许多研究探索了参数高效的调优方法,亦称“delta tuning”:只更新一小部分被称为“delta 模块”的参数,而保持骨干模型的参数不变。然而,现有实现往往直接修改骨干 PTM 的代码,并为每个 PTM 硬编码特定的 delta tuning 方法,这限制了 delta tuning 的实用性和灵活性。在这篇论文中,我们提出了开源库 OpenDelta,它通过提供各种 delta tuning 方法的即插即用实现来克服上述限制。我们的新技术无需修改骨干 PTM 的代码,使 OpenDelta 能够兼容不同的、乃至全新的 PTM。OpenDelta 的设计简洁、模块化且可扩展,为研究人员和从业者高效适配大型 PTM 提供了一个完整的平台。
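
As a generic illustration of the delta-tuning idea (deliberately not OpenDelta's actual API), one can freeze a backbone layer and train only a small low-rank delta module attached to it:

```python
# Generic delta-tuning illustration (not OpenDelta's API): the backbone linear layer
# is frozen and only a small low-rank delta module is trained.
import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    def __init__(self, base_linear, rank=4):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False               # backbone stays fixed
        self.down = nn.Linear(base_linear.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base_linear.out_features, bias=False)
        nn.init.zeros_(self.up.weight)            # start as a zero delta

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))  # frozen path + trainable delta

layer = LowRankDelta(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total}")
```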

Causal Discovery with Language Models as Imperfect Experts

  • paper_url: http://arxiv.org/abs/2307.02390
  • repo_url: https://github.com/stephlong614/causal-disco
  • paper_authors: Stephanie Long, Alexandre Piché, Valentina Zantedeschi, Tibor Schuster, Alexandre Drouin
  • for: 本研究旨在提升数据驱动的因果图识别精度,使其超越马尔可夫等价类的限制。
  • methods: 我们利用专家知识改进数据驱动的因果图识别,并考虑专家可能提供错误信息的情况;为此提出了基于一致性性质(如无环性和等价类中的条件独立性)的专家知识修正策略。
  • results: 我们在真实数据上进行了案例研究,将大型语言模型用作不完美的专家。
    Abstract Understanding the causal relationships that underlie a system is a fundamental prerequisite to accurate decision-making. In this work, we explore how expert knowledge can be used to improve the data-driven identification of causal graphs, beyond Markov equivalence classes. In doing so, we consider a setting where we can query an expert about the orientation of causal relationships between variables, but where the expert may provide erroneous information. We propose strategies for amending such expert knowledge based on consistency properties, e.g., acyclicity and conditional independencies in the equivalence class. We then report a case study, on real data, where a large language model is used as an imperfect expert.
    摘要 理解系统背后的因果关系是做出准确决策的基本前提。在这项工作中,我们研究如何利用专家知识来改进数据驱动的因果图识别,使其超越马尔可夫等价类。为此,我们考虑这样一种设定:可以向专家询问变量之间因果关系的方向,但专家可能给出错误的信息。我们提出了基于一致性性质(例如无环性以及等价类中的条件独立性)来修正此类专家知识的策略。最后,我们在真实数据上报告了一个以大型语言模型作为不完美专家的案例研究。
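
One amendment strategy mentioned above can be sketched simply: accept an (imperfect) expert's proposed edge orientation only if it keeps the partially oriented graph acyclic; otherwise leave the edge undecided. This is an illustrative rule, not the paper's full procedure:

```python
# Accept the expert's proposed orientation only if the graph stays acyclic.
import networkx as nx

def apply_expert_orientation(dag, edge, expert_says_forward):
    u, v = edge if expert_says_forward else (edge[1], edge[0])
    dag.add_edge(u, v)
    if not nx.is_directed_acyclic_graph(dag):
        dag.remove_edge(u, v)  # the expert's answer contradicts acyclicity
        return False
    return True

g = nx.DiGraph([("X", "Y"), ("Y", "Z")])
print(apply_expert_orientation(g, ("X", "Z"), expert_says_forward=True))   # consistent -> True
print(apply_expert_orientation(g, ("Z", "X"), expert_says_forward=True))   # creates a cycle -> False
```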