cs.AI - 2023-10-30

Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.20081
  • repo_url: None
  • paper_authors: Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, Abhinav Sethy
  • for: Improving user experience with natural language processing (NLP) systems, in particular by using large language models (LLMs) to better personalize user experiences.
  • methods: Uses LLM-generated, task-aware summaries of past user data, computed and stored offline and combined with selective retrieval, to construct personalized prompts for downstream tasks.
  • results: Experiments show that, under practical runtime and cost constraints, the method matches or outperforms retrieval augmentation on most LaMP benchmark tasks while using 75% less retrieved user data.
    Abstract Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs exceeding limitations on input length and incurring latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e. selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, lack of more profound user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach by extending retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints like voice assistants to leverage the power of LLMs. Experiments show our method with 75% less of retrieved user data is on-par or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs and runtime retrieval enables better performance for personalization on a range of tasks under practical constraints.
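As a rough illustration of the summary-augmented prompting idea described above, the sketch below combines an offline, task-aware user summary with a handful of retrieved user-history items into a single prompt. All function and variable names are hypothetical; the paper's actual prompt templates and retrieval pipeline are not specified here.

```python
def build_personalized_prompt(task_instruction, user_summary, retrieved_items, query):
    """Combine an offline LLM-generated user summary with a few retrieved
    user-history items (selective retrieval) into one downstream-task prompt."""
    history = "\n".join(f"- {item}" for item in retrieved_items)
    return (
        f"{task_instruction}\n\n"
        f"User profile summary (generated offline):\n{user_summary}\n\n"
        f"Relevant past user data:\n{history}\n\n"
        f"Input: {query}\nOutput:"
    )

# Example: using far fewer retrieved items than a purely retrieval-based prompt.
prompt = build_personalized_prompt(
    "Generate a personalized news headline for this article.",
    "The user writes concise, finance-focused headlines with a neutral tone.",
    ["Past headline: 'Markets steady as rates hold'"],
    "Article text ...",
)
```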

FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

  • paper_url: http://arxiv.org/abs/2310.20071
  • repo_url: None
  • paper_authors: Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang Li, Suhas Diggavi, Mani Srivastava, Tarek Abdelzaher
  • for: Proposes FOCAL, a new contrastive learning framework for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training.
  • methods: Encodes each modality into a factorized latent space of orthogonal shared and private features, learned with a modal-matching objective and a transformation-invariant objective, respectively; a temporal structural constraint additionally keeps temporally neighboring samples no farther apart than temporally distant ones.
  • results: Extensive evaluation on four multimodal sensing datasets, with two backbone encoders and two classifiers, shows that FOCAL consistently outperforms state-of-the-art baselines on downstream tasks by a clear margin under different ratios of available labels.
    Abstract This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective. In contrast, the private space extracts modality-exclusive information through a transformation-invariant objective. Second, we propose a temporal structural constraint for modality features, such that the average distance between temporally neighboring samples is no larger than that of temporally distant samples. Extensive evaluations are performed on four multimodal sensing datasets with two backbone encoders and two classifiers to demonstrate the superiority of FOCAL. It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin, under different ratios of available labels. The code and self-collected dataset are available at https://github.com/tomoyoshki/focal.
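A minimal PyTorch sketch of two of the objectives described above: an orthogonality penalty between the shared and private features of a modality, and the temporal structural constraint that keeps temporally neighboring samples no farther apart than temporally distant ones. This is an illustrative reading of the abstract, not the authors' implementation; the modal-matching and transformation-invariant contrastive terms are omitted.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(shared, private):
    # Encourage shared and private embeddings of the same sample/modality
    # to be orthogonal (squared cosine similarity penalty).
    shared = F.normalize(shared, dim=-1)
    private = F.normalize(private, dim=-1)
    return (shared * private).sum(dim=-1).pow(2).mean()

def temporal_margin_loss(z_t, z_near, z_far, margin=0.0):
    # Average distance to a temporally neighboring sample should not exceed
    # the distance to a temporally distant sample.
    d_near = (z_t - z_near).pow(2).sum(dim=-1)
    d_far = (z_t - z_far).pow(2).sum(dim=-1)
    return F.relu(d_near - d_far + margin).mean()
```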

Vignat: Vulnerability identification by learning code semantics via graph attention networks

  • paper_url: http://arxiv.org/abs/2310.20067
  • repo_url: None
  • paper_authors: Shuo Liu, Gail Kaiser
  • for: Improving software security by identifying vulnerabilities through learned graph-level semantic representations of code.
  • methods: Represents code as fine-grained Code Property Graphs (CPGs) and applies Graph Attention Networks (GATs) for vulnerability detection.
  • results: Achieves 57.38% accuracy on reliable datasets derived from popular C libraries, and the interpretability of the GATs provides insights into vulnerability patterns.
    Abstract Vulnerability identification is crucial to protect software systems from attacks for cyber-security. However, huge projects have more than millions of lines of code, and the complex dependencies make it hard to carry out traditional static and dynamic methods. Furthermore, the semantic structure of various types of vulnerabilities differs greatly and may occur simultaneously, making general rule-based methods difficult to extend. In this paper, we propose \textit{Vignat}, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code. We represent codes with code property graphs (CPGs) in fine grain and use graph attention networks (GATs) for vulnerability detection. The results show that Vignat is able to achieve $57.38\%$ accuracy on reliable datasets derived from popular C libraries. Furthermore, the interpretability of our GATs provides valuable insights into vulnerability patterns.
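A minimal sketch of the kind of graph-attention classifier described above, using PyTorch Geometric: node features of a code property graph pass through GAT layers, are pooled to a graph embedding, and classified as vulnerable or benign. The layer sizes and the two-class head are assumptions; CPG construction is not shown.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class GraphVulnClassifier(torch.nn.Module):
    """Two-layer GAT over code property graphs, followed by graph-level pooling."""
    def __init__(self, in_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.gat2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)
        self.head = torch.nn.Linear(hidden_dim, 2)  # vulnerable vs. benign

    def forward(self, x, edge_index, batch):
        x = F.elu(self.gat1(x, edge_index))
        x = F.elu(self.gat2(x, edge_index))
        x = global_mean_pool(x, batch)               # one vector per CPG
        return self.head(x)
```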

Concept Alignment as a Prerequisite for Value Alignment

  • paper_url: http://arxiv.org/abs/2310.20059
  • repo_url: None
  • paper_authors: Sunayana Rane, Mark Ho, Ilia Sucholutsky, Thomas L. Griffiths
  • for: Building AI systems that can safely and reliably interact with people by achieving value alignment.
  • methods: Formally analyzes the concept alignment problem in the inverse reinforcement learning setting, shows that concept alignment is a prerequisite for value alignment, and describes an approach that jointly reasons about a person's concepts and values.
  • results: Experiments with human participants show that humans reason about the concepts an agent is using when it acts intentionally, in line with the joint reasoning model.
    Abstract Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand and evaluate what happens in the world. The dependence of values on concepts means that concept alignment is a prerequisite for value alignment -- agents need to align their representation of a situation with that of humans in order to successfully align their values. Here, we formally analyze the concept alignment problem in the inverse reinforcement learning setting, show how neglecting concept alignment can lead to systematic value mis-alignment, and describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values. Additionally, we report experimental results with human participants showing that humans reason about the concepts used by an agent when acting intentionally, in line with our joint reasoning model.

Constrained Hierarchical Monte Carlo Belief-State Planning

  • paper_url: http://arxiv.org/abs/2310.20054
  • repo_url: None
  • paper_authors: Arec Jamgochian, Hugo Buurmeijer, Kyle H. Wray, Anthony Corso, Mykel J. Kochenderfer
  • for: Online optimal planning in Constrained Partially Observable Markov Decision Processes (CPOMDPs), i.e., safe planning under state and transition uncertainty subject to hard cost constraints.
  • methods: Introduces Constrained Options Belief Tree Search (COBeTS), which exploits hierarchical decomposition: the online search operates over high-level action primitives (options) while low-level controllers handle execution.
  • results: If primitive option controllers are defined to satisfy their assigned constraint budgets, COBeTS satisfies the constraints anytime; otherwise it guides the search toward a safe sequence of option primitives and uses hierarchical monitoring for runtime safety. COBeTS plans successfully in continuous, safety-critical CPOMDPs where non-hierarchical baselines cannot.
    Abstract Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.

Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning

  • paper_url: http://arxiv.org/abs/2310.20052
  • repo_url: https://github.com/tachyonicclock/surprisenet-cikm-23
  • paper_authors: Anton Lee, Yaqian Zhang, Heitor Murilo Gomes, Albert Bifet, Bernhard Pfahringer
  • for: Addressing catastrophic interference and the learning of cross-task knowledge in continual (class-incremental) learning for artificial neural networks.
  • methods: SurpriseNet combines a parameter isolation method with an anomaly-detection-inspired auto-encoder to learn cross-task knowledge without a replay buffer, and does not rely on image-specific inductive biases.
  • results: Empirical experiments demonstrate strong performance on traditional vision continual-learning benchmarks as well as on structured data datasets. Source code is available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23.
    Abstract Continual learning aims to create artificial neural networks capable of accumulating knowledge and skills through incremental training on a sequence of tasks. The main challenge of continual learning is catastrophic interference, wherein new knowledge overrides or interferes with past knowledge, leading to forgetting. An associated issue is the problem of learning "cross-task knowledge," where models fail to acquire and retain knowledge that helps differentiate classes across task boundaries. A common solution to both problems is "replay," where a limited buffer of past instances is utilized to learn cross-task knowledge and mitigate catastrophic interference. However, a notable drawback of these methods is their tendency to overfit the limited replay buffer. In contrast, our proposed solution, SurpriseNet, addresses catastrophic interference by employing a parameter isolation method and learning cross-task knowledge using an auto-encoder inspired by anomaly detection. SurpriseNet is applicable to both structured and unstructured data, as it does not rely on image-specific inductive biases. We have conducted empirical experiments demonstrating the strengths of SurpriseNet on various traditional vision continual-learning benchmarks, as well as on structured data datasets. Source code made available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23
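A minimal sketch of anomaly-detection-style task inference in the spirit of the approach above: at test time, each task's auto-encoder scores the input, and the task with the lowest reconstruction error ("least surprise") selects which isolated parameter subset to use. Names and the per-task autoencoder interface are hypothetical.

```python
import torch

@torch.no_grad()
def infer_task(x, autoencoders):
    """Pick the task whose autoencoder reconstructs x best (lowest surprise);
    that task's isolated parameter subset is then used for classification."""
    errors = []
    for ae in autoencoders:                        # one autoencoder per task seen so far
        recon = ae(x)
        errors.append((recon - x).pow(2).flatten(1).mean(dim=1))
    errors = torch.stack(errors, dim=1)            # [batch, n_tasks]
    return errors.argmin(dim=1)                    # inferred task id per sample
```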

SURF: A Generalization Benchmark for GNNs Predicting Fluid Dynamics

  • paper_url: http://arxiv.org/abs/2310.20049
  • repo_url: https://github.com/s-kuenzli/surf-fluidsimulation
  • paper_authors: Stefan Künzli, Florian Grötschla, Joël Mathys, Roger Wattenhofer
  • for: Testing the generalization of learned graph-based fluid dynamics simulators.
  • methods: Proposes SURF, a benchmark comprising individual datasets together with specific performance and generalization metrics for evaluating and comparing learned graph-based fluid simulators.
  • results: An empirical study of two state-of-the-art graph-based models yields new insights into how well they generalize when adapting to different topologies, resolutions, or thermodynamic ranges.
    Abstract Simulating fluid dynamics is crucial for the design and development process, ranging from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Therefore, learning-based solvers that model interactions on meshes have gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the \textit{generalization} of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating the two state-of-the-art graph-based models, yielding new insights into their generalization.

Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization

  • paper_url: http://arxiv.org/abs/2310.20033
  • repo_url: None
  • paper_authors: Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong Yu
  • for: This paper aims to improve the factual consistency of clinical note summarization by using ChatGPT to generate high-quality feedback data.
  • methods: The authors use ChatGPT to generate edit feedback for improving the factual consistency of clinical note summarization.
  • results: The authors evaluate the effectiveness of using GPT edits in human alignment, showing promising results in improving factual consistency.
    Abstract Large Language Models (LLMs) like the GPT and LLaMA families have demonstrated exceptional capabilities in capturing and condensing critical contextual information and achieving state-of-the-art performance in the summarization task. However, community concerns about these models' hallucination issues continue to rise. LLMs sometimes generate factually hallucinated summaries, which can be extremely harmful in the clinical domain NLP tasks (e.g., clinical note summarization), where factually incorrect statements can lead to critically erroneous diagnoses. Fine-tuning LLMs using human feedback has shown the promise of aligning LLMs to be factually consistent during generation, but such training procedure requires high-quality human-annotated data, which can be extremely expensive to get in the clinical domain. In this work, we propose a new pipeline using ChatGPT instead of human experts to generate high-quality feedback data for improving factual consistency in the clinical note summarization task. We focus specifically on edit feedback because recent work discusses the shortcomings of human alignment via preference feedback in complex situations (such as clinical NLP tasks that require extensive expert knowledge), as well as some advantages of collecting edit feedback from domain experts. In addition, although GPT has reached the expert level in many clinical NLP tasks (e.g., USMLE QA), there is not much previous work discussing whether GPT can generate expert-level edit feedback for LMs in the clinical note summarization task. We hope to fill this gap. Finally, our evaluations demonstrate the potential use of GPT edits in human alignment, especially from a factuality perspective.
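A hypothetical sketch of the kind of prompt that could elicit edit feedback from ChatGPT for factual alignment, as described above. The exact prompts, models, and post-processing used by the authors are not specified in the abstract; the template below is only illustrative.

```python
def edit_feedback_prompt(clinical_note, model_summary):
    """Ask an LLM to edit a model-generated summary so that every statement is
    supported by the source clinical note, and to list the edits it made."""
    return (
        "You are a clinical expert reviewing an AI-generated summary.\n"
        "Source clinical note:\n"
        f"{clinical_note}\n\n"
        "Model-generated summary:\n"
        f"{model_summary}\n\n"
        "Rewrite the summary so that every statement is factually supported by the "
        "note, then list each edit you made and the reason for it."
    )
```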

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

  • paper_url: http://arxiv.org/abs/2310.20025
  • repo_url: None
  • paper_authors: Mianchu Wang, Rui Yang, Xi Chen, Meng Fang
  • for: Learning general-purpose policies from diverse, multi-task offline datasets (offline goal-conditioned reinforcement learning).
  • methods: A two-stage model-based framework (GOPlan): first, pretraining a prior policy, based on an advantage-weighted conditioned GAN, that captures the multi-modal action distribution of the multi-goal dataset; second, a reanalysis stage that plans with learned models to generate imagined trajectories for fine-tuning the policy.
  • results: Achieves state-of-the-art performance on various offline multi-goal manipulation tasks, handles small data budgets, and generalizes to out-of-distribution goals.
    Abstract Offline goal-conditioned RL (GCRL) offers a feasible paradigm to learn general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods have been restricted to model-free approaches, constraining their capacity to tackle limited data budgets and unseen goal generalization. In this work, we propose a novel two-stage model-based framework, Goal-conditioned Offline Planning (GOPlan), including (1) pretraining a prior policy capable of capturing multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for funetuning policies. Specifically, the prior policy is based on an advantage-weighted Conditioned Generative Adversarial Networks that exhibits distinct mode separation to overcome the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals. Through experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal manipulation tasks. Moreover, our results highlight the superior ability of GOPlan to handle small data budgets and generalize to OOD goals.

Topology Recoverability Prediction for Ad-Hoc Robot Networks: A Data-Driven Fault-Tolerant Approach

  • paper_url: http://arxiv.org/abs/2310.20024
  • repo_url: None
  • paper_authors: Matin Macktoobian, Zhan Shu, Qing Zhao
  • for: Predicting whether the topology of an ad-hoc robot network can be recovered after a fault occurs.
  • methods: Formulates topology recoverability as a binary classification problem and develops a two-pathway data-driven model based on Bayesian Gaussian mixture models, combining pre-fault and post-fault prediction pathways.
  • results: Integrating the predictions of the two pathways solves the topology (ir)recoverability prediction problem more successfully than the best strategies currently found in the literature.
    Abstract Faults occurring in ad-hoc robot networks may fatally perturb their topologies leading to disconnection of subsets of those networks. Optimal topology synthesis is generally resource-intensive and time-consuming to be done in real time for large ad-hoc robot networks. One should only perform topology re-computations if the probability of topology recoverability after the occurrence of any fault surpasses that of its irrecoverability. We formulate this problem as a binary classification problem. Then, we develop a two-pathway data-driven model based on Bayesian Gaussian mixture models that predicts the solution to a typical problem by two different pre-fault and post-fault prediction pathways. The results, obtained by the integration of the predictions of those pathways, clearly indicate the success of our model in solving the topology (ir)recoverability prediction problem compared to the best of current strategies found in the literature.
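A minimal sketch of one prediction pathway under the framing described above: fit a Bayesian Gaussian mixture per class (recoverable vs. irrecoverable) on network features and classify by the higher likelihood. The feature construction, the exact two-pathway integration, and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

class GMMPathwayClassifier:
    """One pathway: class-conditional Bayesian Gaussian mixtures over network features."""
    def __init__(self, n_components=5):
        self.models = {c: BayesianGaussianMixture(n_components=n_components)
                       for c in (0, 1)}            # 0: irrecoverable, 1: recoverable

    def fit(self, X, y):
        for c, gmm in self.models.items():
            gmm.fit(X[y == c])
        return self

    def log_likelihoods(self, X):
        return np.stack([self.models[c].score_samples(X) for c in (0, 1)], axis=1)

    def predict(self, X):
        return self.log_likelihoods(X).argmax(axis=1)

# Integrating two pathways (e.g., pre-fault and post-fault features) could simply
# sum the per-class log-likelihoods of both pathways before taking the argmax.
```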

Multiscale Feature Attribution for Outliers

  • paper_url: http://arxiv.org/abs/2310.20012
  • repo_url: None
  • paper_authors: Jeff Shen, Peter Melchior
  • for: Attributing outliers, identified automatically in massive datasets far faster and more reproducibly than human inspection, to the features that render them anomalous.
  • methods: Proposes Inverse Multiscale Occlusion, a feature attribution method designed specifically for outliers, where little is known in advance about the features of interest and model performance is questionable because anomalous test data likely exceed the limits of the training data.
  • results: On outliers detected in galaxy spectra from the Dark Energy Survey Instrument, the method yields attributions that are much more interpretable than alternative approaches.
    Abstract Machine learning techniques can automatically identify outliers in massive datasets, much faster and more reproducible than human inspection ever could. But finding such outliers immediately leads to the question: which features render this input anomalous? We propose a new feature attribution method, Inverse Multiscale Occlusion, that is specifically designed for outliers, for which we have little knowledge of the type of features we want to identify and expect that the model performance is questionable because anomalous test data likely exceed the limits of the training data. We demonstrate our method on outliers detected in galaxy spectra from the Dark Energy Survey Instrument and find its results to be much more interpretable than alternative attribution approaches.
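A generic multiscale-occlusion sketch in the spirit of the method described above, applied to a 1D spectrum: mask windows at several scales, re-score the masked input with the outlier model, and attribute importance by how much the anomaly score changes. The paper's exact "inverse" formulation tailored to outliers is not reproduced here; `anomaly_score` and the fill value are assumptions.

```python
import numpy as np

def multiscale_occlusion_attribution(x, anomaly_score, scales=(4, 16, 64), fill=0.0):
    """Attribute the anomaly score of a 1D spectrum x to its pixels by occluding
    windows at several scales and averaging the resulting score changes."""
    base = anomaly_score(x)
    attribution = np.zeros_like(x, dtype=float)
    for w in scales:
        for start in range(0, len(x), w):
            x_occ = x.copy()
            x_occ[start:start + w] = fill
            # Positive values: occluding this window lowers the anomaly score,
            # i.e., these pixels contribute to the input looking anomalous.
            attribution[start:start + w] += (base - anomaly_score(x_occ)) / len(scales)
    return attribution
```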

Evolutionary Tabletop Game Design: A Case Study in the Risk Game

  • paper_url: http://arxiv.org/abs/2310.20008
  • repo_url: None
  • paper_authors: Lana Bertoldo Rossato, Leonardo Boaventura Bombardelli, Anderson Rocha Tavares
  • for: Automatically creating and evaluating tabletop games, extending evolutionary game design to complex games with dice, cards, and maps.
  • methods: Uses a genetic algorithm to evolve the chosen game parameters, a rules-based agent for automated playtesting, and a variety of quality criteria to evaluate the generated variants of Risk.
  • results: The approach produces new variants of the original game with smaller maps, yielding shorter and more balanced matches while maintaining the usual drama; a noted limitation is that in many cases the objective function was pursued correctly but the generated games were nearly trivial.
    Abstract Creating and evaluating games manually is an arduous and laborious task. Procedural content generation can aid by creating game artifacts, but usually not an entire game. Evolutionary game design, which combines evolutionary algorithms with automated playtesting, has been used to create novel board games with simple equipment; however, the original approach does not include complex tabletop games with dice, cards, and maps. This work proposes an extension of the approach for tabletop games, evaluating the process by generating variants of Risk, a military strategy game where players must conquer map territories to win. We achieved this using a genetic algorithm to evolve the chosen parameters, as well as a rules-based agent to test the games and a variety of quality criteria to evaluate the new variations generated. Our results show the creation of new variations of the original game with smaller maps, resulting in shorter matches. Also, the variants produce more balanced matches, maintaining the usual drama. We also identified limitations in the process, where, in many cases, where the objective function was correctly pursued, but the generated games were nearly trivial. This work paves the way towards promising research regarding the use of evolutionary game design beyond classic board games.
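A bare-bones sketch of the evolutionary loop described above: a population of numeric game-parameter dictionaries is evolved with selection, crossover, and mutation, with fitness supplied by automated playtests between rule-based agents. The parameter encoding, the fitness criteria (match length, balance, drama), and the operators here are placeholders, not the paper's exact setup.

```python
import random

def evolve_game_variants(init_params, playtest_fitness, generations=50,
                         pop_size=20, mutation_rate=0.1):
    """Simple genetic algorithm over numeric tabletop-game parameter dictionaries."""
    population = [dict(init_params) for _ in range(pop_size)]
    for p in population:
        for k in p:
            p[k] *= random.uniform(0.5, 1.5)                      # initial diversity

    for _ in range(generations):
        scored = sorted(population, key=playtest_fitness, reverse=True)
        parents = scored[: pop_size // 2]                         # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = {k: random.choice((a[k], b[k])) for k in a}   # uniform crossover
            for k in child:
                if random.random() < mutation_rate:
                    child[k] *= random.uniform(0.8, 1.2)          # mutation
            children.append(child)
        population = parents + children
    return max(population, key=playtest_fitness)
```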

Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.20007
  • repo_url: None
  • paper_authors: Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal
  • for: Proving the first Bayesian regret bounds for Thompson Sampling in reinforcement learning across a multitude of settings.
  • methods: Simplifies the learning problem using a discrete set of surrogate environments and presents a refined analysis of the information ratio using posterior consistency.
  • results: Obtains an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ for the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments, together with concrete bounds on $d_{l_1}$ in tabular, linear, and finite-mixture settings.
    Abstract In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1-$dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how how our results are either the first of their kind or improve the state-of-the-art.
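For reference, a standard way to write the Bayesian regret that such bounds control is sketched below (notation assumed: $K$ episodes of length $H$ with $T = KH$, optimal value $V^{*}$ of the sampled environment, and $\pi_k$ the policy Thompson Sampling plays in episode $k$); the expectation is over the prior on environments and the algorithm's randomness.

```latex
\mathrm{BayesRegret}(T)
  \;=\; \mathbb{E}\!\left[\sum_{k=1}^{K}\Big(V^{*}_{1}(s^{k}_{1}) - V^{\pi_k}_{1}(s^{k}_{1})\Big)\right]
  \;\le\; \widetilde{O}\!\left(H\sqrt{d_{l_1}\,T}\right), \qquad T = KH .
```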

Unveiling the Limits of Learned Local Search Heuristics: Are You the Mightiest of the Meek?

  • paper_url: http://arxiv.org/abs/2310.19990
  • repo_url: None
  • paper_authors: Ankur Nath, Alan Kuhnle
  • for: Scrutinizing the empirical evaluation of approaches that combine neural networks with local search heuristics for combinatorial optimization.
  • methods: Investigates three limitations of existing evaluations (instances with moderate complexity and weak baselines, the absence of ablation studies, and underexplored generalization across distributions), and compares state-of-the-art learned heuristics against a simple learned heuristic based on Tabu Search.
  • results: The simple Tabu Search-based learned heuristic surpasses state-of-the-art learned heuristics in both performance and generalizability.
    Abstract In recent years, combining neural networks with local search heuristics has become popular in the field of combinatorial optimization. Despite its considerable computational demands, this approach has exhibited promising outcomes with minimal manual engineering. However, we have identified three critical limitations in the empirical evaluation of these integration attempts. Firstly, instances with moderate complexity and weak baselines pose a challenge in accurately evaluating the effectiveness of learning-based approaches. Secondly, the absence of an ablation study makes it difficult to quantify and attribute improvements accurately to the deep learning architecture. Lastly, the generalization of learned heuristics across diverse distributions remains underexplored. In this study, we conduct a comprehensive investigation into these identified limitations. Surprisingly, we demonstrate that a simple learned heuristic based on Tabu Search surpasses state-of-the-art (SOTA) learned heuristics in terms of performance and generalizability. Our findings challenge prevailing assumptions and open up exciting avenues for future research and innovation in combinatorial optimization.
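A minimal Tabu Search sketch for Max-Cut, the kind of simple baseline the comparison above revolves around: single-vertex flips, a short-term tabu list, and an aspiration rule that admits tabu moves which improve on the best cut found. The learned components of the paper's heuristic are not shown.

```python
import random

def tabu_search_maxcut(edges, n, iters=1000, tenure=10):
    """Tabu Search for Max-Cut: flip single vertices, forbid re-flipping a vertex
    for `tenure` iterations, but allow tabu moves that beat the best cut (aspiration)."""
    assign = [random.randint(0, 1) for _ in range(n)]

    def cut(a):
        return sum(1 for u, v in edges if a[u] != a[v])

    best, best_val = assign[:], cut(assign)
    tabu = {}
    for t in range(iters):
        moves = []
        for v in range(n):
            assign[v] ^= 1
            val = cut(assign)
            assign[v] ^= 1
            if tabu.get(v, -1) < t or val > best_val:
                moves.append((val, v))
        if not moves:
            continue
        val, v = max(moves)
        assign[v] ^= 1
        tabu[v] = t + tenure
        if val > best_val:
            best, best_val = assign[:], val
    return best, best_val

# Example: a 4-cycle, whose optimal cut value is 4.
print(tabu_search_maxcut([(0, 1), (1, 2), (2, 3), (3, 0)], n=4, iters=200))
```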

BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing

  • paper_url: http://arxiv.org/abs/2310.19975
  • repo_url: None
  • paper_authors: Hieu Tran, Zhichao Yang, Zonghai Yao, Hong Yu
  • for: Improving the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) through instruction tuning on a task-specific dataset (BioInstruct).
  • methods: Generates BioInstruct, a dataset of more than 25,000 instruction examples, by prompting GPT-4 with three-seed samples drawn from 80 human-curated instructions, and fine-tunes LLaMA LLMs (1 & 2, 7B & 13B) on it.
  • results: Instruction tuning on BioInstruct improves LLM performance on BioNLP applications including information extraction, question answering, and text generation; the contribution of instructions is also analyzed using multi-task learning principles.
    Abstract Large language models (LLMs) has achieved a great success in many natural language processing (NLP) tasks. This is achieved by pretraining of LLMs on vast amount of data and then instruction tuning to specific domains. However, only a few instructions in the biomedical domain have been published. To address this issue, we introduce BioInstruct, a customized task-specific instruction dataset containing more than 25,000 examples. This dataset was generated attractively by prompting a GPT-4 language model with a three-seed-sample of 80 human-curated instructions. By fine-tuning LLMs using the BioInstruct dataset, we aim to optimize the LLM's performance in biomedical natural language processing (BioNLP). We conducted instruction tuning on the LLaMA LLMs (1\&2, 7B\&13B) and evaluated them on BioNLP applications, including information extraction, question answering, and text generation. We also evaluated how instructions contributed to model performance using multi-tasking learning principles.
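A hypothetical sketch of the three-seed-sample generation step described above: sample three instructions from the human-curated pool and prompt a strong LLM to produce new biomedical instruction/input/output triples in the same style. The exact prompt wording and output format used to build BioInstruct are assumptions.

```python
import random

def bioinstruct_generation_prompt(curated_instructions, n_seeds=3, n_new=5):
    """Build a self-instruct style prompt from a few human-curated seed instructions."""
    seeds = random.sample(curated_instructions, n_seeds)
    seed_block = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(seeds))
    return (
        "You are helping build a biomedical NLP instruction-tuning dataset.\n"
        "Here are example instructions written by domain experts:\n"
        f"{seed_block}\n\n"
        f"Write {n_new} new, diverse biomedical instructions in the same style, "
        "each with a realistic input and a correct output."
    )
```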

ExPT: Synthetic Pretraining for Few-Shot Experimental Design

  • paper_url: http://arxiv.org/abs/2310.19961
  • repo_url: None
  • paper_authors: Tung Nguyen, Sudhanshu Agrawal, Aditya Grover
  • for: Addressing sample efficiency in experimental design, where real-world design evaluations incur time, money, and safety costs, in the challenging few-shot setting with only a few labeled input-output pairs.
  • methods: Introduces Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that combines synthetic pretraining on diverse functions over the input domain with in-context learning, framing design as conditional generation from a few labeled examples and a desired output.
  • results: ExPT achieves superior generality and performance compared to existing methods on few-shot experimental design in challenging domains.
    Abstract Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabelled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.

Deep Learning for Spatiotemporal Big Data: A Vision on Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2310.19957
  • repo_url: None
  • paper_authors: Zhe Jiang
  • for: Surveying the opportunities and challenges of applying deep learning to spatiotemporal big data and identifying future research needs.
  • methods: Reviews various types of geospatial and spatiotemporal big data (e.g., Earth imagery for land cover and land use modeling, AI surrogates for coastal numerical simulation) and the deep learning techniques applied to them.
  • results: Characterizes the distinctive properties of spatiotemporal big data, lists the unique challenges they pose for deep learning technologies, and outlines several future research needs.
    Abstract With advancements in GPS, remote sensing, and computational simulation, an enormous volume of spatiotemporal data is being collected at an increasing speed from various application domains, spanning Earth sciences, agriculture, smart cities, and public safety. Such emerging geospatial and spatiotemporal big data, coupled with recent advances in deep learning technologies, foster new opportunities to solve problems that have not been possible before. For instance, remote sensing researchers can potentially train a foundation model using Earth imagery big data for numerous land cover and land use modeling tasks. Coastal modelers can train AI surrogates to speed up numerical simulations. However, the distinctive characteristics of spatiotemporal big data pose new challenges for deep learning technologies. This vision paper introduces various types of spatiotemporal big data, discusses new research opportunities in the realm of deep learning applied to spatiotemporal big data, lists the unique challenges, and identifies several future research needs.

Conditional Unscented Autoencoders for Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2310.19944
  • repo_url: https://github.com/boschresearch/cuae-prediction
  • paper_authors: Faris Janjoš, Marcel Hallgarten, Anthony Knittel, Maxim Dolgov, Andreas Zell, J. Marius Zöllner
  • for: Challenging key components of the conditional variational autoencoder (CVAE) widely used for trajectory prediction in autonomous driving, and proposing improvements to boost prediction performance.
  • methods: Replaces random latent sampling with deterministic unscented sampling, adds a more structured mixture latent space, and proposes a novel, potentially more expressive way to do inference with CVAEs.
  • results: The models outperform the state of the art on the INTERACTION prediction dataset and outperform the vanilla CVAE baseline on image modeling on CelebA.
    Abstract The \ac{CVAE} is one of the most widely-used models in trajectory prediction for \ac{AD}. It captures the interplay between a driving context and its ground-truth future into a probabilistic latent space and uses it to produce predictions. In this paper, we challenge key components of the CVAE. We leverage recent advances in the space of the VAE, the foundation of the CVAE, which show that a simple change in the sampling procedure can greatly benefit performance. We find that unscented sampling, which draws samples from any learned distribution in a deterministic manner, can naturally be better suited to trajectory prediction than potentially dangerous random sampling. We go further and offer additional improvements, including a more structured mixture latent space, as well as a novel, potentially more expressive way to do inference with CVAEs. We show wide applicability of our models by evaluating them on the INTERACTION prediction dataset, outperforming the state of the art, as well as at the task of image modeling on the CelebA dataset, outperforming the baseline vanilla CVAE. Code is available at https://github.com/boschresearch/cuae-prediction.
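A minimal sketch of the deterministic sampling idea described above for a diagonal Gaussian latent: instead of drawing random samples, take the 2d+1 sigma points of N(mu, diag(exp(logvar))) and decode each of them. The scaling parameter, the sigma-point weights, and how the decoded outputs are aggregated are left out and would follow the paper.

```python
import torch

def unscented_sigma_points(mu, logvar, kappa=0.0):
    """Deterministic 2d+1 sigma points of a diagonal Gaussian N(mu, diag(exp(logvar)))."""
    d = mu.shape[-1]
    scale = torch.sqrt((d + kappa) * logvar.exp())        # per-dimension sqrt((d+kappa) * sigma^2)
    offsets = torch.diag_embed(scale)                      # [..., d, d], one offset vector per row
    pts = torch.cat([mu.unsqueeze(-2),
                     mu.unsqueeze(-2) + offsets,
                     mu.unsqueeze(-2) - offsets], dim=-2)  # [..., 2d+1, d]
    return pts
```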

Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient ?

  • paper_url: http://arxiv.org/abs/2310.19936
  • repo_url: https://github.com/cea-list/mt-detr
  • paper_authors: Quentin Bouniot, Angélique Loesch, Romaric Audigier, Amaury Habrard
  • for: Proposing a semi-supervised method tailored to the state-of-the-art object detector Deformable DETR for the few-annotation learning setup.
  • methods: Uses a student-teacher architecture that avoids relying on a sensitive post-processing of the pseudo-labels generated by the teacher model.
  • results: The method outperforms previous approaches on the COCO and Pascal VOC semi-supervised object detection benchmarks, especially when annotations are scarce, opening new possibilities for adapting similar detection methods to this setup.
    Abstract For specialized and dense downstream tasks such as object detection, labeling data requires expertise and can be very expensive, making few-shot and semi-supervised models much more attractive alternatives. While in the few-shot setup we observe that transformer-based object detectors perform better than convolution-based two-stage models for a similar amount of parameters, they are not as effective when used with recent approaches in the semi-supervised setting. In this paper, we propose a semi-supervised method tailored for the current state-of-the-art object detector Deformable DETR in the few-annotation learning setup using a student-teacher architecture, which avoids relying on a sensitive post-processing of the pseudo-labels generated by the teacher model. We evaluate our method on the semi-supervised object detection benchmarks COCO and Pascal VOC, and it outperforms previous methods, especially when annotations are scarce. We believe that our contributions open new possibilities to adapt similar object detection methods in this setup as well.
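Student-teacher detection pipelines of the kind discussed above typically keep the teacher as an exponential moving average (EMA) of the student; a minimal sketch of that update follows. The paper's specific pseudo-label handling for Deformable DETR is not reproduced here.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Update teacher weights as an exponential moving average of the student's."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.copy_(b_s)        # keep running statistics in sync
```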

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

  • paper_url: http://arxiv.org/abs/2310.19927
  • repo_url: https://github.com/agentification/rp_pgm
  • paper_authors: Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
  • for: Studying the optimization behavior of model-based ReParameterization Policy Gradient Methods (RP PGMs) in long-horizon reinforcement learning problems.
  • methods: Theoretically analyzes the convergence of model-based RP PGMs, identifies the smoothness of function approximators as a major factor in the quality of gradient estimation, and proposes spectral normalization to mitigate the exploding gradient variance caused by long model unrolls.
  • results: Experiments show that proper normalization significantly reduces the gradient variance of model-based RP PGMs, yielding performance comparable or superior to other gradient estimators such as the Likelihood Ratio (LR) estimator.
    Abstract ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM.
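A minimal sketch of applying spectral normalization to a learned dynamics model, in line with the proposal above: wrapping each linear layer bounds its spectral norm (and hence the model's Lipschitz constant), which is the mechanism used to tame gradient variance over long reparameterized unrolls. Layer sizes are placeholders.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_dynamics_model(obs_dim, act_dim, hidden=256):
    """MLP dynamics model with spectrally normalized layers to bound the Lipschitz
    constant and reduce gradient variance over long model unrolls."""
    return nn.Sequential(
        spectral_norm(nn.Linear(obs_dim + act_dim, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, obs_dim)),
    )
```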

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

  • paper_url: http://arxiv.org/abs/2310.19923
  • repo_url: None
  • paper_authors: Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao
  • for: The paper is written for researchers and practitioners working on text embedding models, particularly those interested in developing models that can handle long documents.
  • methods: The paper introduces Jina Embeddings 2, an open-source text embedding model designed to transcend the conventional 512-token limit and accommodate inputs of up to 8192 tokens, enabling it to process long documents without truncation or splitting.
  • results: The paper reports that Jina Embeddings 2 achieves performance on par with OpenAI’s proprietary ada-002 model on the MTEB benchmark, and that an extended context can enhance performance in tasks such as NarrativeQA.
    Abstract Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 not only achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark but also matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.
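A short usage sketch for embedding long documents with this model family via Hugging Face Transformers, using standard mean pooling; the model id below is an assumption about the released checkpoint name, and `trust_remote_code=True` is needed if the architecture ships custom modeling code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinaai/jina-embeddings-v2-base-en"   # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

texts = ["A very long report spanning thousands of tokens ...", "a short query"]
batch = tokenizer(texts, padding=True, truncation=True, max_length=8192,
                  return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state              # [batch, seq_len, dim]

mask = batch["attention_mask"].unsqueeze(-1)               # mean-pool over real tokens only
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
```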

Unmasking Bias and Inequities: A Systematic Review of Bias Detection and Mitigation in Healthcare Artificial Intelligence Using Electronic Health Records

  • paper_url: http://arxiv.org/abs/2310.19917
  • repo_url: None
  • paper_authors: Feng Chen, Liqin Wang, Julie Hong, Jiaqi Jiang, Li Zhou
  • for: 这项研究的目的是系统性地查询利用电子医疗记录(EHR)数据的人工智能(AI)应用中的偏见问题。
  • methods: 这项研究采用了遵循PRISMA指南的系统性回顾方法,从PubMed、Web of Science和IEEE检索到252篇文章,并对其中的20篇文章进行了最终审查。
  • results: 这项研究发现,在20篇文章中,所定义的六种主要偏见类型中有五种得到覆盖:8篇文章分析了选择偏见,6篇分析了隐式偏见,5篇分析了混杂偏见,4篇分析了测量偏见,2篇分析了算法偏见。在偏见处理方法方面,10篇文章在模型开发阶段识别了偏见,17篇提出了缓解偏见的方法。
    Abstract Objectives: Artificial intelligence (AI) applications utilizing electronic health records (EHRs) have gained popularity, but they also introduce various types of bias. This study aims to systematically review the literature that addresses bias in AI research utilizing EHR data. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline. We retrieved articles published between January 1, 2010, and October 31, 2022, from PubMed, Web of Science, and the Institute of Electrical and Electronics Engineers. We defined six major types of bias and summarized the existing approaches in bias handling. Results: Out of the 252 retrieved articles, 20 met the inclusion criteria for the final review. Five of the six bias types were covered in this review: eight studies analyzed selection bias; six implicit bias; five confounding bias; four measurement bias; and two algorithmic bias. For bias handling approaches, ten studies identified bias during model development, while seventeen presented methods to mitigate the bias. Discussion: Bias may infiltrate the AI application development process at various stages. Although this review discusses methods for addressing bias at different development stages, there is room for implementing additional effective approaches. Conclusion: Despite growing attention to bias in healthcare AI, research using EHR data on this topic is still limited. Detecting and mitigating AI bias with EHR data continues to pose challenges. Further research is needed to establish a standardized, generalizable, and interpretable method to detect, mitigate, and evaluate bias in medical AI.
    摘要 目的:利用电子健康记录(EHR)的人工智能(AI)应用日益普及,但同时也会引入多种类型的偏见。本研究旨在系统性地综述利用 EHR 数据的 AI 研究中涉及偏见的文献。方法:按照 Preferred Reporting Items for Systematic Reviews and Meta-analyses(PRISMA)指南进行系统性综述。我们从 PubMed、Web of Science 和 Institute of Electrical and Electronics Engineers 检索了 2010 年 1 月 1 日至 2022 年 10 月 31 日间发表的文章,定义了六种主要的偏见类型,并总结了现有的偏见处理方法。结果:在检索到的 252 篇文章中,20 篇符合纳入标准并进入最终审查。六种偏见类型中有五种被覆盖:八篇研究分析了选择偏见;六篇分析了隐式偏见;五篇分析了混杂偏见;四篇分析了测量偏见;两篇分析了算法偏见。在偏见处理方法方面,十篇文章在模型开发阶段识别了偏见,十七篇提出了缓解偏见的方法。讨论:偏见可能在 AI 应用开发过程的各个阶段渗入。虽然本综述讨论了在不同开发阶段处理偏见的方法,但仍有空间引入更多有效的手段。结论:尽管医疗 AI 中的偏见问题日益受到关注,基于 EHR 数据的相关研究仍然有限,利用 EHR 数据检测和缓解 AI 偏见仍然充满挑战。需要进一步研究,以建立一种标准化、可泛化且可解释的方法来检测、缓解和评估医疗 AI 中的偏见。

Interpretable Prototype-based Graph Information Bottleneck

  • paper_url: http://arxiv.org/abs/2310.19906
  • repo_url: https://github.com/sang-woo-seo/pgib
  • paper_authors: Sangwoo Seo, Sungwon Kim, Chanyoung Park
  • for: 这篇论文的目的是提出一种可解释的基于原型的图信息瓶颈(PGIB)框架,用于提高图神经网络的可解释性和性能。
  • methods: 这篇论文将原型学习纳入信息瓶颈框架,从输入图中提取对预测起关键作用的子图,并以这些关键子图作为原型提供可解释的预测结果。
  • results: 与最先进的方法相比,PGIB 在预测性能和可解释性两方面均表现更优,并通过定性分析加以验证。
    Abstract The success of Graph Neural Networks (GNNs) has led to a need for understanding their decision-making process and providing explanations for their predictions, which has given rise to explainable AI (XAI) that offers transparent explanations for black-box models. Recently, the use of prototypes has successfully improved the explainability of models by learning prototypes to imply training graphs that affect the prediction. However, these approaches tend to provide prototypes with excessive information from the entire graph, leading to the exclusion of key substructures or the inclusion of irrelevant substructures, which can limit both the interpretability and the performance of the model in downstream tasks. In this work, we propose a novel framework of explainable GNNs, called interpretable Prototype-based Graph Information Bottleneck (PGIB) that incorporates prototype learning within the information bottleneck framework to provide prototypes with the key subgraph from the input graph that is important for the model prediction. This is the first work that incorporates prototype learning into the process of identifying the key subgraphs that have a critical impact on the prediction performance. Extensive experiments, including qualitative analysis, demonstrate that PGIB outperforms state-of-the-art methods in terms of both prediction performance and explainability.
    摘要 图神经网络(GNN)的成功使得理解其决策过程并为其预测提供解释变得十分重要,由此催生了为黑盒模型提供透明解释的可解释人工智能(XAI)。近来,通过学习能够代表影响预测的训练图的原型,基于原型的方法成功地提升了模型的可解释性。然而,这些方法往往让原型携带来自整张图的过多信息,导致关键子结构被排除或无关子结构被包含,从而同时限制了模型的可解释性和在下游任务中的性能。在本工作中,我们提出了一种新的可解释 GNN 框架,称为可解释的基于原型的图信息瓶颈(PGIB),它将原型学习纳入信息瓶颈框架,使原型对应于输入图中对模型预测至关重要的关键子图。这是首个将原型学习引入关键子图识别过程的工作。包括定性分析在内的大量实验表明,PGIB 在预测性能和可解释性两方面均优于最先进的方法。
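For orientation, a generic graph information bottleneck criterion of the kind the abstract builds on (our notation, not the paper's exact objective) selects a subgraph $G_{\text{sub}}$ that is maximally predictive of the label $Y$ while carrying as little information about the full input graph $G$ as possible:

$$
\min_{G_{\text{sub}}}\; -I(G_{\text{sub}};\,Y) \;+\; \beta\, I(G_{\text{sub}};\,G),
$$

where $I(\cdot;\cdot)$ is mutual information and $\beta$ trades prediction against compression; PGIB additionally ties the selected subgraphs to learned prototypes so that explanations are expressed through reusable, interpretable substructures.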

Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer

  • paper_url: http://arxiv.org/abs/2310.19902
  • repo_url: None
  • paper_authors: Surya Narayanan Hari, Matt Thomson
  • for: 这篇论文的目的是解决现有的LLM模型在实际应用中的访问和扩展问题,以及对于这些模型的性能评估。
  • methods: 这篇论文使用了开源模型库和智能路由器来组织和选择合适的LLM模型,以提高其性能和可靠性。
  • results: 论文表明,一个由多个开源模型组成的“模型群”(Herd)可以匹配甚至超越专有模型的性能,而组成它的模型实际上要小得多(约为 ChatGPT 的 2.5 分之一)。此外,当 GPT 无法回答查询时,Herd 能在至少 40% 的情况下找到一个可以回答该查询的模型。
    Abstract Currently, over a thousand LLMs exist that are multi-purpose and are capable of performing real world tasks, including Q&A, text summarization, content generation, etc. However, accessibility, scale and reliability of free models prevents them from being widely deployed in everyday use cases. To address the first two issues of access and scale, organisations such as HuggingFace have created model repositories where users have uploaded model weights and quantized versions of models trained using different paradigms, as well as model cards describing their training process. While some models report performance on commonly used benchmarks, not all do, and interpreting the real world impact of trading off performance on a benchmark for model deployment cost, is unclear. Here, we show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router. We show that a Herd of open source models is able to match the accuracy of ChatGPT, despite being composed of models that are effectively 2.5x smaller. We show that in cases where GPT is not able to answer the query, Herd is able to identify a model that can, at least 40% of the time.
    摘要 Here, we show that a herd of open-source models can match or exceed the performance of proprietary models via an intelligent router. Specifically, we show that a herd of open-source models is able to match the accuracy of ChatGPT, despite being composed of models that are effectively 2.5 times smaller. Additionally, we show that in cases where GPT is not able to answer a query, the herd is able to identify a model that can, at least 40% of the time.
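A minimal sketch of the routing idea described above, with hypothetical model handles and a toy scoring function (none of these names come from the paper or its code; the paper's composer may also select a model before generation rather than querying every member of the herd):

```python
from typing import Callable

def herd_answer(query: str,
                models: dict[str, Callable[[str], str]],
                score: Callable[[str, str], float],
                refusal_check: Callable[[str], bool]) -> str:
    """Route a query across a herd of small open-source models: query each
    candidate, drop refusals/non-answers, and return the highest-scoring reply."""
    best_reply, best_score = None, float("-inf")
    for name, generate in models.items():
        reply = generate(query)
        if refusal_check(reply):          # e.g. "I cannot answer"-style outputs
            continue
        s = score(query, reply)           # could be a learned reward or router model
        if s > best_score:
            best_reply, best_score = reply, s
    return best_reply if best_reply is not None else "No model in the herd could answer."
```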

Exploring Geometry of Blind Spots in Vision Models

  • paper_url: http://arxiv.org/abs/2310.19889
  • repo_url: https://github.com/SriramB-98/blindspots-geometry
  • paper_authors: Sriram Balasubramanian, Gaurang Sriramanan, Vinu Sankar Sadasivan, Soheil Feizi
  • for: 本研究旨在探讨深度视觉模型的欠敏感(under-sensitivity)问题,即输入空间中幅度很大的扰动也不会引起网络激活的明显变化。
  • methods: 该研究提出了 Level Set Traversal 算法,利用局部梯度的正交分量在输入空间中遍历高置信度区域,从而找到与其他类别的任意图像在感知上相似、却与源图像处于同一等置信度水平集中的输入。
  • results: 研究发现,深度网络的等置信度水平集呈星型结构:源图像可以通过高置信度路径与这些输入线性相连。此外,研究还尝试估计模型保持高置信度的这些相连高维区域的范围。代码可在 https://github.com/SriramB-98/blindspots-neurips-sub 获取。
    Abstract Despite the remarkable success of deep neural networks in a myriad of settings, several works have demonstrated their overwhelming sensitivity to near-imperceptible perturbations, known as adversarial attacks. On the other hand, prior works have also observed that deep networks can be under-sensitive, wherein large-magnitude perturbations in input space do not induce appreciable changes to network activations. In this work, we study in detail the phenomenon of under-sensitivity in vision models such as CNNs and Transformers, and present techniques to study the geometry and extent of "equi-confidence" level sets of such networks. We propose a Level Set Traversal algorithm that iteratively explores regions of high confidence with respect to the input space using orthogonal components of the local gradients. Given a source image, we use this algorithm to identify inputs that lie in the same equi-confidence level set as the source image despite being perceptually similar to arbitrary images from other classes. We further observe that the source image is linearly connected by a high-confidence path to these inputs, uncovering a star-like structure for level sets of deep networks. Furthermore, we attempt to identify and estimate the extent of these connected higher-dimensional regions over which the model maintains a high degree of confidence. The code for this project is publicly available at https://github.com/SriramB-98/blindspots-neurips-sub
    摘要 尽管深度神经网络在各种场景中取得了惊人的成功,但许多研究表明,它们对几乎难以察觉的扰动(即对抗攻击)极为敏感。另一方面,也有研究发现深度网络可能存在欠敏感的问题,即输入空间中幅度很大的扰动并不会引起网络激活的明显变化。在本工作中,我们详细研究了 CNN 和 Transformer 等视觉模型的欠敏感现象,并提出了研究这类网络“等置信度”水平集的几何结构及其范围的技术。我们提出了一种 Level Set Traversal 算法,利用局部梯度的正交分量,在输入空间中迭代地探索高置信度区域。给定一张源图像,我们用该算法找到与其他类别的任意图像在感知上相似、却与源图像处于同一等置信度水平集中的输入,并观察到源图像与这些输入之间存在一条高置信度的线性路径,揭示了深度网络水平集的星型结构。此外,我们还尝试识别并估计模型在其中保持高置信度的这些相连高维区域的范围。相关代码可在 https://github.com/SriramB-98/blindspots-neurips-sub 获取。
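A minimal PyTorch-style sketch of the core step described in the abstract for Level Set Traversal: move toward a target image while removing the component of the step along the local confidence gradient, so the source-class confidence stays approximately constant. Function and variable names are ours, not the authors':

```python
import torch

def level_set_step(model, x, target, src_class, step_size=1e-2):
    """One traversal step: walk toward `target` while staying inside the
    equi-confidence level set of `src_class` by projecting out the confidence gradient."""
    x = x.clone().detach().requires_grad_(True)
    conf = model(x)[0, src_class]                    # confidence for the source class
    g = torch.autograd.grad(conf, x)[0].flatten()
    d = (target - x).detach().flatten()              # desired direction: toward the target image
    d_orth = d - (d @ g) / (g @ g + 1e-12) * g       # orthogonal component of the step
    return (x + step_size * d_orth.view_as(x)).detach()
```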

DEFT: Dexterous Fine-Tuning for Real-World Hand Policies

  • paper_url: http://arxiv.org/abs/2310.19797
  • repo_url: https://github.com/adityak77/deft-data
  • paper_authors: Aditya Kannan, Kenneth Shaw, Shikhar Bahl, Pragna Mannam, Deepak Pathak
  • for: 本研究旨在探讨操作软质、可变形物体以及完成复杂、较长时程任务所面临的挑战,以提升机器人灵巧操作的能力。
  • methods: 本研究提出了一种新方法 DEFT(DExterous Fine-Tuning for Hand Policies),它利用人类提供的先验,并直接在真实世界中执行;为改进这些先验,该方法还包含一个高效的在线优化过程。
  • results: 结合基于人类的学习、在线微调以及软体机械手,DEFT 在多个任务中取得成功,为通用灵巧操作建立了一条稳健且数据高效的路径。视频结果请见我们的网站 https://dexterous-finetuning.github.io。
    Abstract Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.
    摘要 灵巧性常被视为复杂操作的基石。人类能够用双手完成从制作食物到使用工具的各种技能。在这篇论文中,我们研究这些挑战,尤其是针对软质、可变形物体以及复杂、较长时程的任务。然而,从零开始学习这类行为的数据效率很低。为此,我们提出了一种新方法 DEFT(DExterous Fine-Tuning for Hand Policies),它利用人类提供的先验,并直接在真实世界中执行。为了在这些先验的基础上进一步改进,DEFT 包含一个高效的在线优化过程。通过将基于人类的学习与在线微调相结合,并配合软体机械手,DEFT 在多种任务中取得成功,为通用灵巧操作建立了一条稳健、数据高效的路径。视频结果请见我们的网站 https://dexterous-finetuning.github.io。

Re-evaluating Retrosynthesis Algorithms with Syntheseus

  • paper_url: http://arxiv.org/abs/2310.19796
  • repo_url: None
  • paper_authors: Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler
  • for: 本研究主要目标是提高化学synthesis的计划和评估方法。
  • methods: 本研究使用了一个名为syntheseus的 benchmarking 库,该库鼓励了best practice的使用,以便对单步和多步 retrosynthesis 算法进行一致的评估。
  • results: 通过使用syntheseus库进行重新评估,发现了一些之前的 retrosynthesis 算法的排名发生了变化。
    Abstract The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which promotes best practice by default, enabling consistent meaningful evaluation of single-step and multi-step retrosynthesis algorithms. We use syntheseus to re-evaluate a number of previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes when evaluated carefully. We end with guidance for future works in this area.
    摘要 分子合成路线的规划(即逆合成)近年来日益受到机器学习和化学界的关注。尽管表面上看进展稳定,我们认为不完善的基准和不一致的比较掩盖了现有技术的系统性缺陷。为了解决这一问题,我们提出了名为 syntheseus 的基准评测库,它默认遵循最佳实践,使单步和多步逆合成算法都能得到一致且有意义的评估。我们使用 syntheseus 重新评估了若干已有的逆合成算法,发现在仔细评估后,最先进模型的排名会发生变化。最后,我们为该领域的后续工作给出了指导建议。

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

  • paper_url: http://arxiv.org/abs/2310.19859
  • repo_url: None
  • paper_authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou
  • for: 本研究旨在提出一种新的参数高效调整方法,以便将大规模基模型传递到下游应用中。
  • methods: 该方法基于不同的调整策略,通过意图解耦调整器与基模型的关系,使得调整器的设计和学习不再依赖基模型。
  • results: 该方法使调参器的设计摆脱了网络架构的限制,可以灵活组合多种微调策略;其内存高效变体只需将梯度回传到调参器,并支持多任务推理时只做一次主干前向计算。实验表明,该方法在判别和生成任务上的有效性和效率均优于现有方法。
    Abstract Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbone. With both theoretical and empirical evidence, we show that popular tuning approaches have their equivalent counterparts under our unbinding formulation, and hence can be integrated into our framework effortlessly. Thanks to the structural disentanglement, we manage to free the design of tuners from the network architecture, facilitating flexible combination of various tuning strategies. We further propose a memory-efficient variant of Res-Tuning, where the bypass (i.e., formed by a sequence of tuners) is effectively detached from the main branch, such that the gradients are back-propagated only to the tuners but not to the backbone. Such a detachment also allows one-time backbone forward for multi-task inference. Extensive experiments on both discriminative and generative tasks demonstrate the superiority of our method over existing alternatives from the perspectives of efficacy and efficiency. Project page: $\href{https://res-tuning.github.io/}{\textit{https://res-tuning.github.io/}}$.
    摘要 参数高效微调已成为将大规模基础模型迁移到下游应用的主流做法。现有方法通常把一些轻量级调参器嵌入主干网络中,调参器的设计和学习都高度依赖于基础模型。本工作提出了一种新的微调范式,称为 Res-Tuning,它有意将调参器从主干网络中解绑。我们从理论和实验两方面证明,流行的微调方法在这种解绑形式下都有等价的对应形式,因此可以毫不费力地整合进我们的框架。得益于这种结构上的解耦,调参器的设计不再受网络架构的限制,可以灵活地组合各种微调策略。我们进一步提出了 Res-Tuning 的一个内存高效变体,其中由一系列调参器构成的旁路分支与主干分支有效分离,使梯度只回传到调参器而不回传到主干;这种分离还使得多任务推理只需对主干做一次前向计算。在判别和生成任务上的大量实验表明,我们的方法在有效性和效率两方面都优于现有替代方案。项目主页:https://res-tuning.github.io/。
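A rough PyTorch sketch of the memory-efficient bypass idea described in the abstract: lightweight tuners form a side branch fed with detached features from a frozen backbone, so gradients reach only the tuners. This is our own simplification (linear tuners, a single fusion head), not the authors' implementation:

```python
import torch
import torch.nn as nn

class BypassTuners(nn.Module):
    """Side branch of tuners detached from a frozen backbone (sketch)."""
    def __init__(self, feat_dims, hidden_dim, num_classes):
        super().__init__()
        self.tuners = nn.ModuleList(nn.Linear(d, hidden_dim) for d in feat_dims)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, backbone_feats):
        # backbone_feats: list of per-stage features from a frozen backbone forward pass
        h = 0.0
        for f, tuner in zip(backbone_feats, self.tuners):
            h = h + tuner(f.detach())   # detach: no gradient flows back into the backbone
        return self.head(h)
```

Because the backbone never receives gradients, a single backbone forward pass can be reused across multiple task-specific bypasses, which is the multi-task inference property the abstract mentions.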

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

  • paper_url: http://arxiv.org/abs/2310.19795
  • repo_url: https://github.com/donghao51/simmmdg
  • paper_authors: Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink
  • for: 这项研究旨在解决多模态领域泛化(DG)中的挑战,即模型需要在不同模态上泛化到未知的目标分布。
  • methods: 我们提出了一个简单而有效的多模态 DG 框架 SimMMDG。我们认为把不同模态的特征映射到同一个嵌入空间会损害模型的泛化能力,因此提出将每个模态的特征拆分为模态特有和模态共享两部分:对模态共享特征使用有监督对比学习以保证其具有联合性质,并对模态特有特征施加距离约束以促进多样性;此外还引入跨模态翻译模块来正则化学到的特征,该模块也可用于缺失模态情形下的泛化。
  • results: 我们的框架有理论支撑,并在 EPIC-Kitchens 数据集以及本文提出的新的 Human-Animal-Cartoon(HAC)数据集上取得了强劲的多模态 DG 性能。源代码和 HAC 数据集见 https://github.com/donghao51/SimMMDG。
    Abstract In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.
    摘要 在实际应用场景中,实现领域泛化(DG)面临重大挑战,因为模型需要泛化到未知的目标分布;而在多模态场景下,由于不同模态具有各自不同的特性,泛化到未见的多模态分布就更加困难。为了解决多模态领域泛化问题,我们提出了 SimMMDG,一个简单而有效的多模态 DG 框架。我们认为把不同模态的特征映射到同一个嵌入空间会阻碍模型泛化。为此,我们提出将每个模态的特征拆分为模态共享和模态特有两部分:对模态共享特征采用有监督对比学习,确保它们具有联合性质;对模态特有特征施加距离约束,以促进特征的多样性。此外,我们还引入跨模态翻译模块来正则化学到的特征,该模块也可用于缺失模态情形下的泛化。我们的框架有理论支撑,并在 EPIC-Kitchens 数据集和本文提出的新的 Human-Animal-Cartoon(HAC)数据集上取得了强劲的多模态 DG 性能。源代码和 HAC 数据集可在 https://github.com/donghao51/SimMMDG 获取。
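A compact sketch of the feature-splitting idea from the abstract: each modality encoder outputs a shared part and a specific part; a supervised contrastive loss pulls shared features of the same class together across modalities, while a distance term keeps each modality's specific features apart from its shared features. The margin formulation, loss weighting, and the exact pairing used for the distance constraint are our assumptions, not the paper's objective:

```python
import torch
import torch.nn.functional as F

def simmmdg_style_loss(shared, specific, labels, margin=1.0, temp=0.1):
    """shared / specific: dicts {modality: (B, D) tensor}; labels: (B,) class ids."""
    # 1) Supervised contrastive loss over the modality-shared features.
    z = F.normalize(torch.cat(list(shared.values()), dim=0), dim=1)
    y = torch.cat([labels] * len(shared), dim=0)
    n = z.size(0)
    sim = (z @ z.t()) / temp
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # drop self-similarity
    pos = (y[:, None] == y[None, :]) & ~self_mask            # same-class pairs are positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    con_loss = -log_prob[pos].mean()
    # 2) Margin distance keeping modality-specific features away from the shared ones.
    dist_loss = 0.0
    for m in shared:
        d = (shared[m] - specific[m]).pow(2).sum(dim=1).sqrt()
        dist_loss = dist_loss + F.relu(margin - d).mean()
    return con_loss + dist_loss
```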

LILO: Learning Interpretable Libraries by Compressing and Documenting Code

  • paper_url: http://arxiv.org/abs/2310.19791
  • repo_url: https://github.com/gabegrand/lilo
  • paper_authors: Gabriel Grand, Lionel Wong, Matthew Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum, Jacob Andreas
  • for: 本研究旨在开发一个神经符号程序合成框架,迭代地构建适用于特定问题领域、由可重用且可读的程序组成的代码库。
  • methods: 本研究将大语言模型(LLM)引导的程序合成与 Stitch 符号压缩系统相结合,以高效识别大规模代码库中的最优 lambda 抽象;此外还引入自动文档(AutoDoc)流程,根据使用示例推断自然语言名称和 docstring,以便理解和运用学到的抽象。
  • results: 在字符串编辑、场景推理和图形组合三个归纳式程序合成基准上进行了评测,并与现有的神经和符号方法(包括最先进的库学习算法 DreamCoder)进行比较。结果表明,LILO 能解决更复杂的任务,并学到植根于语言知识、内容更丰富的代码库。
    Abstract While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.
    摘要 虽然大语言模型(LLM)如今在代码生成方面表现出色,但软件开发中的一个关键环节是重构:把代码整理成由可重用、可读的程序组成的库。在这篇论文中,我们提出了 LILO,一个神经符号框架,它迭代地合成、压缩并为代码撰写文档,从而构建适用于特定问题领域的库。LILO 将 LLM 引导的程序合成与 Stitch 在自动重构方面的最新算法进展相结合:Stitch 是一个符号压缩系统,能高效地在大规模代码库中识别最优的 lambda 抽象。为了让这些抽象更易理解,我们引入了自动文档(AutoDoc)流程,根据上下文中的使用示例推断自然语言名称和 docstring。除了提升人类可读性之外,我们发现 AutoDoc 还能帮助 LILO 的合成器理解并运用学到的抽象,从而提升性能。我们在字符串编辑、场景推理和图形组合三个归纳式程序合成基准上评测了 LILO。与现有的神经和符号方法(包括最先进的库学习算法 DreamCoder)相比,LILO 能解决更复杂的任务,并学到植根于语言知识、内容更丰富的库。

From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces

  • paper_url: http://arxiv.org/abs/2310.19786
  • repo_url: None
  • paper_authors: Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
  • for: 这篇论文的目的是提出一种新的归约方法,把交换遗憾(swap regret)最小化归约为外部遗憾(external regret)最小化,并且不要求动作空间有限,从而改进了经典的归约结果。
  • methods: 论文构造了从交换遗憾到外部遗憾的新归约,并通过对该归约及其配套下界的分析得出一系列新的结论。
  • results: 结果表明,只要某个假设类存在无外部遗憾算法,就必然存在对应的无交换遗憾算法,且其保证优于经典归约(在带专家建议的学习问题中,经过 $\log(N)^{O(1/\epsilon)}$ 轮即可把交换遗憾控制在 $\epsilon$ 以内,每轮复杂度为 $O(N)$)。此外,论文还给出了一个新的下界:所需轮数必须是 $\tilde\Omega(N/\epsilon^2)$ 或随 $1/\epsilon$ 呈指数增长。
    Abstract We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by {\epsilon} after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$ per iteration complexity, where $N$ is the number of experts, while the classical reductions of Blum-Mansour and Stolz-Lugosi require $O(N/\epsilon^2)$ rounds and at least $\Omega(N^2)$ per iteration complexity. Our result comes with an associated lower bound, which -- in contrast to that in [BM07] -- holds for oblivious and $\ell_1$-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be $\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria, of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of correlated equilibrium which vastly extends the requirement that the action set is finite, thus answering a question left open by [DG22; Ass+23]. Moreover, it answers several outstanding questions about equilibrium computation and/or learning in games.
    摘要 我们提出了一种从交换遗憾最小化到外部遗憾最小化的新归约,它不要求动作空间有限,从而改进了 Blum-Mansour [BM07] 和 Stolz-Lugosi [SL05] 的经典归约。我们证明,只要某个假设类存在无外部遗憾算法,该类就必然存在无交换遗憾算法。对于带专家建议的学习问题,这一结果意味着可以在 $\log(N)^{O(1/\epsilon)}$ 轮内保证交换遗憾不超过 $\epsilon$,每轮复杂度为 $O(N)$(其中 $N$ 为专家数量);而 Blum-Mansour 和 Stolz-Lugosi 的经典归约需要 $O(N/\epsilon^2)$ 轮,且每轮复杂度至少为 $\Omega(N^2)$。我们的结果还配有一个相应的下界,与 [BM07] 中的下界不同,它对无视(oblivious)且受 $\ell_1$ 约束的对手以及可在专家上使用分布的学习者均成立,表明所需轮数必须是 $\tilde\Omega(N/\epsilon^2)$ 或随 $1/\epsilon$ 呈指数增长。我们的归约意味着,如果某个博弈中无遗憾学习是可能的,那么该博弈必然存在任意精度的近似相关均衡,这强化了“无遗憾学习蕴含近似粗相关均衡存在”这一常识性结论。重要的是,它给出了相关均衡存在的一个充分条件,大大放宽了动作集必须有限的要求,从而回答了 [DG22; Ass+23] 留下的一个开放问题,也回答了若干关于博弈中均衡计算与学习的悬而未决的问题。
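For readers less familiar with the two notions being related here, the standard definitions (in our notation, for an action set $A$ of size $N$, mixed strategies $x_t$, and loss vectors $\ell_t$) are:

$$
\mathrm{Reg}_{\mathrm{ext}}(T) = \sum_{t=1}^{T}\langle x_t,\ell_t\rangle - \min_{a^\star\in A}\sum_{t=1}^{T}\ell_t(a^\star),
\qquad
\mathrm{Reg}_{\mathrm{swap}}(T) = \sum_{t=1}^{T}\langle x_t,\ell_t\rangle - \min_{\phi:A\to A}\sum_{t=1}^{T}\mathbb{E}_{a\sim x_t}\!\left[\ell_t(\phi(a))\right].
$$

External regret compares against the best single fixed action, while swap regret allows every action to be re-mapped by a swap function $\phi$; no-swap-regret dynamics are what yield (approximate) correlated equilibria.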

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.19784
  • repo_url: None
  • paper_authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
  • for: 本文旨在提出 CustomNet,一种在文本到图像生成中引入自定义对象的方法。
  • methods: 该方法把 3D 新视角合成能力显式地融入对象自定义流程,可调整空间位置关系与视角;并通过精心设计,支持用文本描述或用户给定的图像来控制对象位置和背景。
  • results: 该方法无需测试时优化即可实现零样本对象自定义,同时在身份保持和输出多样性方面表现更好。
    Abstract Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.
    摘要 通过包含自定义对象在图像生成中,文本到图像生成具有吸引人的特点。然而,现有的优化方法和编码器方法受到了一些缺点,如时间消耗优化、保持对象标识不足和广泛的复制效应。为了解决这些局限性,我们介绍了CustomNet,一种新的对象自定义方法,其中Explicitly incorporates 3D新视野合成能力到对象自定义过程中。这种整合使得可以调整空间位置关系和视点,从而生成多样的输出,同时有效地保持对象标识。此外,我们还引入了细腻的设计,使得通过文本描述或特定用户定义的图像来控制位置和背景,超越现有的3D新视野合成方法的限制。我们还利用了更好的数据构建管道,可以更好地处理真实世界中的对象和复杂背景。准备这些设计,我们的方法可以在零时优化下实现无需测试时优化的自定义对象,同时控制视点、位置和背景。因此,我们的CustomNet可以保持对象标识并生成多样、和谐的输出。

Designing AI Support for Human Involvement in AI-assisted Decision Making: A Taxonomy of Human-AI Interactions from a Systematic Review

  • paper_url: http://arxiv.org/abs/2310.19778
  • repo_url: None
  • paper_authors: Catalina Gomez, Sue Min Cho, Shichang Ke, Chien-Ming Huang, Mathias Unberath
  • for: 提高人工智能在决策支持系统中的用户体验,增强人工智能与人类的交互。
  • methods: 系统atic review of AI-assisted decision making literature,分析105篇论文,提出了一种交互模式分类法,用于描述不同的人工智能交互方式。
  • results: 现有交互主要是简单的合作模式,报告了相对少的交互功能支持。 taxonomy 能够帮助理解现有决策支持系统中人工智能交互的现状,并促进交互设计的审慎选择。
    Abstract Efforts in levering Artificial Intelligence (AI) in decision support systems have disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. To address this, explainable AI promotes AI development from a more human-centered perspective. Determining what information AI should provide to aid humans is vital, however, how the information is presented, e. g., the sequence of recommendations and the solicitation of interpretations, is equally crucial. This motivates the need to more precisely study Human-AI interaction as a pivotal component of AI-based decision support. While several empirical studies have evaluated Human-AI interactions in multiple application domains in which interactions can take many forms, there is not yet a common vocabulary to describe human-AI interaction protocols. To address this gap, we describe the results of a systematic review of the AI-assisted decision making literature, analyzing 105 selected articles, which grounds the introduction of a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. We find that current interactions are dominated by simplistic collaboration paradigms and report comparatively little support for truly interactive functionality. Our taxonomy serves as a valuable tool to understand how interactivity with AI is currently supported in decision-making contexts and foster deliberate choices of interaction designs.
    摘要 在决策支持系统中利用人工智能(AI)的努力,过度集中在技术进步上,常常忽略了算法输出与人类预期之间的协调。为了解决这一问题,可解释 AI 从更以人为中心的视角推动 AI 的发展。确定 AI 应向人类提供何种信息固然重要,但这些信息如何呈现,例如推荐的顺序以及征询解释的方式,同样至关重要。这促使我们把人机(Human-AI)交互作为基于 AI 的决策支持的关键组成部分加以更精确的研究。虽然已有多项实证研究在不同应用领域中评估了形式多样的人机交互,但目前仍缺乏描述人机交互协议的通用词汇。为填补这一空白,我们对 AI 辅助决策文献进行了系统性综述,分析了 105 篇入选文章,并在此基础上提出了一套刻画各种人机交互模式的分类法。我们发现,现有交互以简单的协作范式为主,对真正交互式功能的支持相对较少。我们的分类法有助于理解当前决策场景中对 AI 交互性的支持情况,并促进对交互设计的审慎选择。

Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

  • paper_url: http://arxiv.org/abs/2310.19776
  • repo_url: https://github.com/sarahrastegar/infosieve
  • paper_authors: Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek
  • for: 提出了一种能够在测试时发现未知类别的新方法
  • methods: 基于优化的思路,对数据实例分配最短类别编码,从而控制类别细分程度
  • results: 实验结果(并与最新基准进行了对比)表明,该方法能够在测试时有效地处理未知类别。
    Abstract In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a \textit{category}? In this paper, we conceptualize a \textit{category} through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at: \url{https://github.com/SarahRastegar/InfoSieve}.
    摘要 为了在测试时发现新的类别,我们面临传统有监督识别模型的固有限制:这类模型只能处理预先定义的类别集合。尽管自监督学习和开放世界学习在测试时类别发现方面已取得进展,一个关键却常被忽视的问题依然存在:究竟什么才算一个“类别”?在本文中,我们从优化的角度来刻画“类别”,把它视为一个定义良好的问题的最优解。基于这一独特的视角,我们提出了一种新颖、高效且自监督的方法,能够在测试时发现此前未知的类别。我们方法的一个显著特点是为每个数据实例分配最短长度的类别编码,从而刻画真实数据集中隐含的类别层级结构。这一机制让我们能够更好地控制类别粒度,使模型能够妥善处理细粒度类别。实验结果以及与最新基准的比较证明了我们的方法在测试时处理未知类别的有效性。此外,我们还为该方法提供了理论基础,证明了它的最优性。代码见 https://github.com/SarahRastegar/InfoSieve。

Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions

  • paper_url: http://arxiv.org/abs/2310.19775
  • repo_url: None
  • paper_authors: Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf
  • for: 本研究旨在梳理可解释人工智能(XAI)的发展与应用,以及该领域当前面临的开放挑战。
  • methods: 本文汇集来自不同领域的专家,系统梳理 XAI 的开放问题,力求统一研究议程、促进跨学科合作,并加速 XAI 在实际应用中的落地。
  • results: 本文提出了一份包含27个开放问题的宣言,并将其归入九大类别,为各领域研究者共同应对 XAI 挑战提供路线图,并针对每个问题给出了有希望的研究方向。
    Abstract As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.
    摘要 随着基于不透明人工智能(AI)的系统在各种实际应用中的普及,理解这些黑盒模型变得至关重要。为此,可解释 AI(XAI)作为一个研究领域应运而生,在多个领域兼具实用价值和伦理意义。本文不仅梳理了 XAI 的研究进展及其在实际场景中的应用,还讨论了 XAI 当前面临的挑战,强调需要更广阔的视角和协同努力。我们汇集来自不同领域的专家,共同梳理开放问题,力求统一研究议程并加速 XAI 在实际应用中的落地。通过促进协作讨论与跨学科合作,我们希望推动 XAI 持续向前发展。我们的目标是提出一份推进 XAI 的综合性提案。为此,我们给出了一份包含 27 个开放问题的宣言,并将其归入九大类别。这些挑战概括了 XAI 的复杂性与细微之处,为未来研究提供了路线图。针对每个问题,我们都给出了有希望的研究方向,以期汇聚相关各方的集体智慧。

Autoregressive Renaissance in Neural PDE Solvers

  • paper_url: http://arxiv.org/abs/2310.19763
  • repo_url: None
  • paper_authors: Yolanne Yi Ran Lee
  • for: 本文介绍的工作提出了一种基于图神经网络的偏微分方程(PDE)求解方法,作为传统经典求解器和 Fourier Neural Operator 的替代方案。
  • methods: 该方法采用消息传递图神经网络(message passing GNN)架构,以自回归方式逐步推进 PDE 的解,并通过相应策略缓解自回归模型常见的不稳定问题。
  • results: 研究表明,该方法在泛化能力和性能上可与最先进的 Fourier Neural Operator 以及传统经典 PDE 求解器相当甚至更优。
    Abstract Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper "Message Passing Neural PDE Solver" by Brandstetter et al. published in ICLR 2022 revisits autoregressive models and designs a message passing graph neural network that is comparable with or outperforms both the state-of-the-art Fourier Neural Operator and traditional classical PDE solvers in its generalization capabilities and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture.
    摘要 神经偏微分方程(PDE)求解器领域的最新进展大多聚焦于神经算子。然而,Brandstetter 等人在 ICLR 2022 发表的论文《Message Passing Neural PDE Solver》重新审视了自回归模型,设计了一种消息传递图神经网络,其泛化能力和性能可与最先进的 Fourier Neural Operator 以及传统经典 PDE 求解器相当甚至更优。本篇博客文章深入剖析了这项工作的关键贡献,探讨了其为解决自回归模型常见的不稳定问题所采用的策略,以及消息传递图神经网络架构的设计选择。
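As a schematic of what "autoregressive" means in this setting (our own minimal sketch, not code from the paper or blog post): the learned graph network predicts an update to the current solution, and its outputs are fed back in to march forward in time.

```python
import torch

def autoregressive_rollout(step_model, u0, edge_index, n_steps, dt):
    """Roll a learned PDE solver forward in time.

    step_model: a message-passing GNN mapping (node states, graph) -> d(state)/dt
    u0:         (num_nodes, channels) initial condition on the spatial mesh/graph
    """
    u, trajectory = u0, [u0]
    for _ in range(n_steps):
        du = step_model(u, edge_index)     # GNN message passing over the mesh graph
        u = u + dt * du                    # residual (forward-Euler style) update
        trajectory.append(u)
    return torch.stack(trajectory)         # (n_steps + 1, num_nodes, channels)
```

Because each prediction is fed back as the next input, small errors can compound over the rollout, which is exactly the instability issue the blog post discusses.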

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

  • paper_url: http://arxiv.org/abs/2310.19737
  • repo_url: https://github.com/schwinnl/llm_embedding_attack
  • paper_authors: Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel
  • for: 本研究关注神经网络鲁棒性评估中的新旧威胁,尤其是自然语言处理领域中针对大语言模型(LLM)的对抗攻击与防御。
  • methods: 本研究提出了改进新方法鲁棒性评估、减少错误评估的一组先决条件(LLM-specific best practices),并将嵌入空间攻击(embedding space attacks)确立为在开源模型上生成恶意内容的一种可行威胁模型。
  • results: 研究在一个最新提出的防御方法上证明:若缺乏针对 LLM 的最佳实践,很容易高估新方法的鲁棒性;同时,嵌入空间攻击对开源模型构成切实威胁。
    Abstract Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.
    摘要 过去十年中,大量研究致力于提升神经网络的鲁棒性,但这一问题仍远未解决。其中一个主要障碍在于:由于防御评估存在缺陷,新防御方法的鲁棒性常常被高估。有缺陷的鲁棒性评估需要在后续工作中加以修正,这会严重拖慢研究进度,并带来虚假的安全感。在这一背景下,自然语言处理领域即将到来的对抗攻防竞赛将带来巨大挑战,尤其是针对 ChatGPT、Google Bard 或 Anthropic 的 Claude 等闭源大语言模型(LLM)。我们给出了第一组先决条件,用于改进新方法的鲁棒性评估并减少有缺陷的评估。此外,我们指出针对 LLM 的嵌入空间攻击是另一种可行的威胁模型,可用于在开源模型上生成恶意内容。最后,我们在一个最新提出的防御方法上证明:如果缺乏针对 LLM 的最佳实践,很容易高估新方法的鲁棒性。
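A minimal sketch of an embedding-space attack of the kind mentioned above, applicable only to open-weight models whose input embeddings are exposed (the forward call follows the Hugging Face convention of passing `inputs_embeds`; everything else, including the hyperparameters, is an illustrative assumption rather than the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def embedding_space_attack(model, embedding_layer, target_ids, n_adv=20, steps=200, lr=1e-3):
    """Optimize a block of continuous input embeddings so the model assigns high
    probability to a chosen target continuation (sketch; names are illustrative)."""
    dim = embedding_layer.embedding_dim
    adv = torch.randn(1, n_adv, dim, requires_grad=True)
    opt = torch.optim.Adam([adv], lr=lr)
    tgt_emb = embedding_layer(target_ids)                  # (1, T, dim) forced continuation
    for _ in range(steps):
        inputs = torch.cat([adv, tgt_emb], dim=1)
        logits = model(inputs_embeds=inputs).logits        # open-weight LM forward pass
        pred = logits[:, n_adv - 1:-1, :]                  # positions that predict the target
        loss = F.cross_entropy(pred.reshape(-1, pred.size(-1)), target_ids.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    return adv.detach()
```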

Evaluating Large Language Models: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2310.19736
  • repo_url: https://github.com/tjunlp-lab/awesome-llms-evaluation-papers
  • paper_authors: Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong
  • for: 评估大语言模型(LLMs)的能力、对齐情况与安全性。
  • methods: 将 LLM 评估划分为三大类:知识与能力评估、对齐评估和安全评估,并综述各类评估方法与基准。
  • results: 整理了 LLM 在各专门领域的评估研究,并讨论了覆盖能力、对齐、安全与适用性的综合评估平台的构建。
    Abstract Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rapid progress of LLMs raises concerns about the potential emergence of superintelligent systems without adequate safeguards. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation of LLMs. This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs' performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability. We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs. We envision that this will channel their evolution into a direction that maximizes societal benefit while minimizing potential risks. A curated list of related papers has been publicly available at https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers.
    摘要 大型语言模型(LLM)在各种任务上表现出了惊人的能力,引起了广泛的关注和应用。然而,与一个双刃剑相似,LLM也存在潜在的风险。它们可能会导致私人数据泄露或生成不当、伤害或误导的内容。此外,LLM的快速进步也引起了关于可能出现无适应安全措施的超智系统的担忧。为了有效利用LLM的能力并确保其安全和有益的发展,对LLM的评估是非常重要。本调查尝试提供LLM评估的全面视图。我们将LLM评估分为三个主要类别:知识和能力评估、对齐评估和安全评估。此外,我们还提供了对这三个方面评估方法和标准的全面评论,并收录了关于LLM在特定领域的表现评估,以及建立了涵盖LLM评估能力、对齐性、安全性和可用性的完整评估平台。我们希望这份全面的概述能够激发更多关于LLM评估的研究兴趣,以实现评估成为LLM发展的重要指南,以最大化社会 benefit while minimizing potential risks。相关论文的汇总可以在 中找到。

ViR: Vision Retention Networks

  • paper_url: http://arxiv.org/abs/2310.19731
  • repo_url: None
  • paper_authors: Ali Hatamizadeh, Michael Ranzinger, Jan Kautz
  • for: 该论文旨在提出一类新的计算机视觉主干模型,兼顾快速推理与并行训练。
  • methods: 该论文提出了 Vision Retention Networks(ViR),其具有并行与循环两种等价表示形式,从而在并行训练与高效推理之间取得最佳平衡。
  • results: 该论文在不同规模的数据集和多种图像分辨率上进行了广泛实验,取得了具有竞争力的性能。
    Abstract Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios which demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts have proposed parallelizable models with recurrent formulation that allows for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably for image throughput and memory consumption in tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. The ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Our code and pretrained models will be made publicly available.
    摘要 视觉 Transformer(ViT)近年来广受欢迎,因为它在建模长距离空间依赖方面能力出众,并能很好地扩展到大规模训练。尽管自注意力机制的训练并行性对保持优异性能十分重要,但其二次复杂度使 ViT 难以应用于许多要求快速推理的场景,在需要对输入特征进行自回归建模的应用中尤为明显。在自然语言处理(NLP)领域,一系列新的工作提出了具有循环形式、又可并行化的模型,使生成式应用中的高效推理成为可能。受这一趋势启发,我们提出了一类新的计算机视觉模型,称为 Vision Retention Networks(ViR),它同时具有并行与循环两种表示形式,在快速推理与并行训练之间取得最佳平衡,并保持有竞争力的性能。特别地,得益于其处理长序列的灵活形式,ViR 在需要更高分辨率图像的任务中,在图像吞吐量和内存消耗方面具有良好的扩展性。ViR 是首个在通用视觉主干中实现并行与循环等价形式的尝试。我们在不同规模的数据集和多种图像分辨率上开展了大量实验,验证了 ViR 的有效性,并取得了具有竞争力的性能。我们的代码和预训练模型将公开发布。
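The dual parallel/recurrent formulation referenced in the abstract follows the retention mechanism introduced for language models (our summary of that mechanism, not notation taken from the ViR paper): for token $n$ with query/key/value $q_n, k_n, v_n$ and decay $\gamma \in (0,1)$,

$$
S_n = \gamma\, S_{n-1} + k_n^{\top} v_n, \qquad o_n = q_n S_n \quad \text{(recurrent form)},
$$

$$
O = \bigl(Q K^{\top} \odot D\bigr) V, \qquad D_{nm} = \begin{cases} \gamma^{\,n-m}, & n \ge m \\ 0, & n < m \end{cases} \quad \text{(parallel form)},
$$

so training can use the parallel form while inference unrolls the $O(1)$-per-token recurrence.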

Generating Medical Instructions with Conditional Transformer

  • paper_url: http://arxiv.org/abs/2310.19727
  • repo_url: None
  • paper_authors: Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Warren Del-Pinto, Goran Nenadic
  • for: The paper introduces a novel task-specific model architecture, Label-To-Text-Transformer (LT3), which generates synthetic medical instructions based on provided labels.
  • methods: LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database and uses a task-specific transformer architecture to generate synthetic medical instructions.
  • results: The paper contrasts LT3 with a state-of-the-art pre-trained language model (PLM), T5, and shows that LT3 generates high-quality and diverse synthetic medical instructions. The generated synthetic data is used to train the SpacyNER model for Named Entity Recognition (NER) on the n2c2-2018 dataset, where the model trained on synthetic data achieves a 96-98% F1 score at label recognition on Drug, Frequency, Route, Strength, and Form.
    Abstract Access to real-world medical instructions is essential for medical research and healthcare quality improvement. However, access to real medical instructions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (\textbf{LT3}), tailored to generate synthetic medical instructions based on provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database, allowing the model to produce valuable synthetic medical instructions. We evaluate LT3's performance by contrasting it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of generated texts. We deploy the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task over the n2c2-2018 dataset. The experiments show that the model trained on synthetic data can achieve a 96-98\% F1 score at Label Recognition on Drug, Frequency, Route, Strength, and Form. LT3 codes and data will be shared at \url{https://github.com/HECTA-UoM/Label-To-Text-Transformer}

A Path to Simpler Models Starts With Noise

  • paper_url: http://arxiv.org/abs/2310.19726
  • repo_url: None
  • paper_authors: Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin
  • for: 这篇论文探讨了为什么在刑事司法、医疗、信贷、教育等领域的表格数据集上,Rashomon 比率(即假设空间中表现近似相同的模型所占的比例)往往很大,以及这对“简单模型能否达到与复杂模型同等精度”的实际意义。
  • methods: 论文提出了一种由数据生成过程与分析者在学习过程中的常见选择共同决定 Rashomon 比率大小的机制,并引入了“模式多样性”(pattern diversity)指标,用于刻画 Rashomon 集中不同分类模式之间预测差异的平均水平。
  • results: 论文发现,噪声更大的数据集会通过从业者训练模型的方式导致更大的 Rashomon 比率,且模式多样性往往随标签噪声增加;这解释了为什么简单模型在复杂、含噪的数据集上常能与黑盒模型表现相当。
    Abstract The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
    摘要 Rashomon 集是指在给定数据集上表现近似相同的模型集合,Rashomon 比率则是给定假设空间中属于 Rashomon 集的模型所占的比例。在刑事司法、医疗、信贷、教育等领域的表格数据集上,Rashomon 比率往往很大,这对“简单模型能否达到与复杂模型相同的准确率”具有实际意义。一个尚未解决的问题是:为什么 Rashomon 比率往往很大?在本工作中,我们提出并研究了一种由数据生成过程与分析者在学习过程中的常见选择共同决定 Rashomon 比率大小的机制。具体而言,我们证明了噪声更大的数据集会通过从业者训练模型的方式导致更大的 Rashomon 比率。此外,我们引入了“模式多样性”指标,用于刻画 Rashomon 集中不同分类模式之间预测差异的平均水平,并说明它为何往往随标签噪声增加。我们的结果解释了简单模型在复杂、含噪的数据集上为何常能与黑盒模型表现相当这一关键现象。
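In symbols (our notation), for a hypothesis space $\mathcal{F}$, empirical loss $\hat L$, empirical risk minimizer $\hat f$, and tolerance $\epsilon > 0$, the abstract's two central objects are:

$$
\hat R_{\epsilon} = \bigl\{ f \in \mathcal{F} : \hat L(f) \le \hat L(\hat f) + \epsilon \bigr\},
\qquad
\text{Rashomon ratio} = \frac{\lvert \hat R_{\epsilon} \rvert}{\lvert \mathcal{F} \rvert},
$$

with a volume ratio replacing the cardinality ratio when $\mathcal{F}$ is continuous. The paper's claim is that label noise, combined with how practitioners fit models, inflates $\lvert \hat R_{\epsilon} \rvert$ relative to $\lvert \mathcal{F} \rvert$.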

A Survey on Knowledge Editing of Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19704
  • repo_url: None
  • paper_authors: Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi
  • for: 本研究旨在解决人工智能中的神经网络编辑问题,即如何通过不影响神经网络已经学习的任务来更新神经网络模型,以适应数据的变化。
  • methods: 本综述将现有的知识编辑方法与数据集归为四个家族:正则化技术、元学习、直接模型编辑和架构策略。
  • results: 本综述对知识编辑这一新兴研究领域进行了简要回顾,梳理了迄今最相关的方法与数据集,并讨论了与其他研究领域的交叉以及未来工作的可能方向。
    Abstract Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.
    摘要 深度神经网络在学术界和工业界日益普及,在各种领域和相关任务上与人类表现相当甚至超越人类。然而,和人类一样,即使是最大的人工神经网络也会出错,而且随着时间推移,原本正确的预测也可能失效。在实际应用中,常见的做法是用包含错误样本或最新信息的数据来扩充数据集。然而,众所周知的灾难性遗忘现象使得对神经网络参数中隐式记忆的知识进行精确修改十分困难,往往需要重新训练整个模型才能达到期望的行为。这种做法代价高昂、不可靠,而且与当前大规模自监督预训练的趋势不兼容,因此有必要寻找更高效、更有效的方法来使神经网络模型适应不断变化的数据。为此,知识编辑正在成为一个新兴的研究领域,其目标是在不影响模型在已学任务上行为的前提下,对预训练的目标模型进行可靠、数据高效且快速的修改。在这篇综述中,我们对这一新兴的人工智能研究领域进行了简要回顾:首先介绍神经网络编辑问题,在统一框架下将其形式化,并与持续学习等更为人熟知的研究分支加以区分;随后综述了迄今提出的最相关的知识编辑方法与数据集,将已有工作归入四个家族:正则化技术、元学习、直接模型编辑和架构策略;最后,我们讨论了与其他研究领域的交叉以及未来工作的可能方向。

Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness

  • paper_url: http://arxiv.org/abs/2310.19691
  • repo_url: https://github.com/jacyanthis/causal-context
  • paper_authors: Jacy Reese Anthis, Victor Veitch
  • for: This paper focuses on the problem of fairness in machine learning, specifically addressing the concept of counterfactual fairness and its relationship to other fairness metrics.
  • methods: The authors use a causal context to bridge the gap between counterfactual fairness, robust prediction, and group fairness. They develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness.
  • results: The authors show that in three common fairness contexts (measurement error, selection on label, and selection on predictors), counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Additionally, they demonstrate that counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
    Abstract Counterfactual fairness requires that a person would have been classified in the same way by an AI or other algorithmic system if they had a different protected class, such as a different race or gender. This is an intuitive standard, as reflected in the U.S. legal system, but its use is limited because counterfactuals cannot be directly observed in real-world data. On the other hand, group fairness metrics (e.g., demographic parity or equalized odds) are less intuitive but more readily observed. In this paper, we use $\textit{causal context}$ to bridge the gaps between counterfactual fairness, robust prediction, and group fairness. First, we motivate counterfactual fairness by showing that there is not necessarily a fundamental trade-off between fairness and accuracy because, under plausible conditions, the counterfactually fair predictor is in fact accuracy-optimal in an unbiased target distribution. Second, we develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness. Third, we show that in three common fairness contexts$\unicode{x2013}$measurement error, selection on label, and selection on predictors$\unicode{x2013}$counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
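For reference, the counterfactual fairness criterion the abstract builds on (in the standard causal notation of Kusner et al., which we adopt here) requires, for all attainable $y$, $x$, and protected values $a, a'$:

$$
P\bigl(\hat Y_{A \leftarrow a}(U) = y \mid X = x, A = a\bigr) = P\bigl(\hat Y_{A \leftarrow a'}(U) = y \mid X = x, A = a\bigr),
$$

i.e. an individual's prediction distribution must be unchanged under a counterfactual intervention on the protected attribute $A$. The paper's contribution is identifying causal contexts under which simpler, observable group criteria (demographic parity, equalized odds, calibration) coincide with this condition and can therefore serve as practical tests for it.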

Can input reconstruction be used to directly estimate uncertainty of a regression U-Net model? – Application to proton therapy dose prediction for head and neck cancer patients

  • paper_url: http://arxiv.org/abs/2310.19686
  • repo_url: None
  • paper_authors: Margerie Huet-Dastarac, Dan Nguyen, Steve Jiang, John Lee, Ana Barragan Montero
  • for: 这篇论文旨在提供一种可靠和高效的深度学习模型 uncertainty 估计方法,并且能够检测出资料集外的数据(Out-of-distribution,OOD)。
  • methods: 这篇论文提出了一种直接使用构造汇流(bottleneck)来估计模型 uncertainty的方法,具体来说是将构造汇流中的一支分支用来重建输入数据。
  • results: 在这篇论文中,这种方法在预报癌症肿瘤疗法剂量预测 tasks 中与 MCDO 和 DE 相比,得到了更高的 Pearson 相関系数(0.620),并且能够轻松地检测出 OOD 数据(Z-score 34.05)。
    Abstract Estimating the uncertainty of deep learning models in a reliable and efficient way has remained an open problem, where many different solutions have been proposed in the literature. Most common methods are based on Bayesian approximations, like Monte Carlo dropout (MCDO) or Deep ensembling (DE), but they have a high inference time (i.e. require multiple inference passes) and might not work for out-of-distribution detection (OOD) data (i.e. similar uncertainty for in-distribution (ID) and OOD). In safety critical environments, like medical applications, accurate and fast uncertainty estimation methods, able to detect OOD data, are crucial, since wrong predictions can jeopardize patients safety. In this study, we present an alternative direct uncertainty estimation method and apply it for a regression U-Net architecture. The method consists in the addition of a branch from the bottleneck which reconstructs the input. The input reconstruction error can be used as a surrogate of the model uncertainty. For the proof-of-concept, our method is applied to proton therapy dose prediction in head and neck cancer patients. Accuracy, time-gain, and OOD detection are analyzed for our method in this particular application and compared with the popular MCDO and DE. The input reconstruction method showed a higher Pearson correlation coefficient with the prediction error (0.620) than DE and MCDO (between 0.447 and 0.612). Moreover, our method allows an easier identification of OOD (Z-score of 34.05). It estimates the uncertainty simultaneously to the regression task, therefore requires less time or computational resources.
    摘要 深度学习模型的不确定性估计问题一直是一个开放问题,文献中提出了许多不同的解决方案。大多数常见方法基于贝叶斯近似(Bayesian approximation),如 Monte Carlo dropout(MCDO)或 Deep ensembling(DE),但它们的推理时间较高(需要多次推理),并且可能无法用于分布外(OOD)数据检测(即分布内与 OOD 数据的不确定性相近)。在医疗等安全关键环境中,准确、快速且能够检测 OOD 数据的不确定性估计方法至关重要,因为错误预测可能危及病人安全。在本研究中,我们提出了一种直接的不确定性估计方法,并将其应用于 regression U-Net 架构。该方法的基本思想是在瓶颈处加入一个用于重建输入的分支;输入重建误差可以作为模型不确定性的代理指标。为验证概念,我们将该方法应用于头颈癌患者的质子治疗剂量预测,分析其准确性、时间增益和 OOD 检测能力,并与 MCDO 和 DE 进行比较。输入重建方法与预测误差的 Pearson 相关系数(0.620)高于 DE 和 MCDO(介于 0.447 与 0.612 之间),而且我们的方法可以更容易地识别 OOD(Z-score 为 34.05)。此外,该方法在执行回归任务的同时估计不确定性,因此所需的时间和计算资源更少。
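
A minimal sketch of the core idea, assuming a toy encoder/decoder rather than the paper's actual dose-prediction U-Net: a second decoder branch from the bottleneck reconstructs the input, and the per-sample reconstruction error is read off as an uncertainty surrogate alongside the regression output.

```python
import torch
import torch.nn as nn

class TinyUNetWithReconstruction(nn.Module):
    """Minimal regression U-Net-style model with an input-reconstruction branch
    from the bottleneck; reconstruction error serves as an uncertainty surrogate."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.dec_task = nn.Sequential(nn.ConvTranspose2d(ch, ch, 2, stride=2), nn.ReLU(),
                                      nn.Conv2d(ch, 1, 3, padding=1))   # dose-prediction head
        self.dec_recon = nn.Sequential(nn.ConvTranspose2d(ch, ch, 2, stride=2), nn.ReLU(),
                                       nn.Conv2d(ch, 1, 3, padding=1))  # input-reconstruction head

    def forward(self, x):
        z = self.enc(x)
        return self.dec_task(z), self.dec_recon(z)

model = TinyUNetWithReconstruction()
x = torch.randn(2, 1, 64, 64)
dose, recon = model(x)
uncertainty = ((recon - x) ** 2).mean(dim=(1, 2, 3))  # per-sample surrogate uncertainty
print(dose.shape, uncertainty)
```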

Integrating Pre-trained Language Model into Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2310.19680
  • repo_url: None
  • paper_authors: Soon-Jae Hwang, Chang-Sung Jeong
  • for: 提高Neural Machine Translation(NMT)性能,解决高质量双语对应语料不足问题。
  • methods: 使用预训练语言模型(PLM)提供上下文信息,并提出PLM集成NMT(PiNMT)模型,包括PLM多层转换器、嵌入合并和夹角匹配等三个关键组件。
  • results: 通过提出的PiNMT模型和训练策略(分离学习率和双步训练),在IWSLT'14 En$\leftrightarrow$De数据集上实现了最先进(state-of-the-art)的性能。
    Abstract Neural Machine Translation (NMT) has become a significant technology in natural language processing through extensive research and development. However, the deficiency of high-quality bilingual language pair data still poses a major challenge to improving NMT performance. Recent studies are exploring the use of contextual information from pre-trained language model (PLM) to address this problem. Yet, the issue of incompatibility between PLM and NMT model remains unresolved. This study proposes a PLM-integrated NMT (PiNMT) model to overcome the identified problems. The PiNMT model consists of three critical components, PLM Multi Layer Converter, Embedding Fusion, and Cosine Alignment, each playing a vital role in providing effective PLM information to NMT. Furthermore, two training strategies, Separate Learning Rates and Dual Step Training, are also introduced in this paper. By implementing the proposed PiNMT model and training strategy, we achieved state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset. This study's outcomes are noteworthy as they demonstrate a novel approach for efficiently integrating PLM with NMT to overcome incompatibility and enhance performance.
    摘要 neural machine translation (NMT) 已经成为自然语言处理领域的重要技术,经过广泛的研究和开发。然而,高质量的双语对数据仍然是NMT性能提高的主要挑战。最近的研究在使用预训练语言模型(PLM)的上下文信息来解决这个问题。然而,PLM和NMT模型之间的不兼容问题仍未得到解决。本研究提出了PLM结合NMT(PiNMT)模型,以解决这些问题。PiNMT模型包括三个关键组件:PLM多层转换器、嵌入混合和cosine对齐,每个组件都在提供PLM信息到NMT中发挥重要作用。此外,本研究还提出了两种训练策略:分开学习率和双步训练。通过实施提议的PiNMT模型和训练策略,我们在IWSLT'14 En$\leftrightarrow$De数据集上实现了状态的表现。这些成果含义重大,因为它们证明了一种有效的PLM与NMT集成方法,以解决不兼容性和提高性能。
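
One of the two training strategies, Separate Learning Rates, can be illustrated with PyTorch optimizer parameter groups. The modules and learning-rate values below are placeholders, not the paper's configuration.

```python
import torch

# Hypothetical parameter groups: the PLM-side converter and the NMT model itself.
plm_converter = torch.nn.Linear(768, 512)   # stands in for the PLM Multi Layer Converter
nmt_model = torch.nn.Linear(512, 512)       # stands in for the NMT encoder/decoder

optimizer = torch.optim.Adam([
    {"params": plm_converter.parameters(), "lr": 1e-5},  # smaller LR on the PLM side
    {"params": nmt_model.parameters(), "lr": 5e-4},      # larger LR on the NMT side
])
# Dual Step Training (the paper's second strategy) would split training into two phases on top of this.

for group in optimizer.param_groups:
    print(group["lr"])
```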

AI Alignment: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2310.19852
  • repo_url: https://github.com/PKU-Alignment/AlignmentSurvey
  • paper_authors: Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O’Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
  • for: 这个论文旨在为AI系统的Alignment提供一个全面和升级的介绍,以便更好地理解和控制AI系统的行为。
  • methods: 本论文使用RICE原则(Robustness、Interpretability、Controllability、Ethicality)作为AIAlignment的四个基本目标,并将当前的Alignment研究划分为两个主要组成部分:前向Alignment和后向Alignment。前向Alignment通过对AI系统进行Alignment训练来实现Alignment,而后向Alignment则是通过证明AI系统的Alignment来控制和调节它们,以避免加剧不Alignment的风险。
  • results: 本论文提出了一种Recurrent Process(前向Alignment和后向Alignment),该过程可以确保AI系统的Alignment,并且在每一次Alignment后,可以提供更新的目标对于下一轮Alignment。此外,论文还讨论了反馈学习和分布shift学习,以及对AI系统生命周期的每一个阶段的Assurance技术和管理做法。
    Abstract AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, the potential large-scale risks associated with misaligned AI systems become salient. Hundreds of AI experts and public figures have expressed concerns about AI risks, arguing that "mitigating the risk of extinction from AI should be a global priority, alongside other societal-scale risks such as pandemics and nuclear war". To provide a comprehensive and up-to-date overview of the alignment field, in this survey paper, we delve into the core concepts, methodology, and practice of alignment. We identify the RICE principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. Guided by these four principles, we outline the landscape of current alignment research and decompose them into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. Forward alignment and backward alignment form a recurrent process where the alignment of AI systems from the forward process is verified in the backward process, meanwhile providing updated objectives for forward alignment in the next round. On forward alignment, we discuss learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices that apply to every stage of AI systems' lifecycle. We also release and continually update the website (www.alignmentsurvey.com) which features tutorials, collections of papers, blog posts, and other resources.
    摘要 人工智能启 alignment 目标是使人工智能系统与人类意图和价值观 align together。随着人工智能系统的能力增强,落后的大规模风险减少成为焦点。多位 AI 专家和公众人物表达了关于 AI 风险的关注,认为“控制 AI 风险的扩展应该是全球优先事项,与其他社会级风险相提并论”。为了提供完整和准确的对 alignment 领域的概述,在这篇调查报告中,我们深入探讨了核心概念、方法和实践的 alignment。我们认为 RICE 原则是 AI alignment 的关键目标:Robustness、可读性、可控性和伦理。 guid by these four principles, we outline the landscape of current alignment research and decompose them into two key components:forward alignment and backward alignment。前者 aim to make AI systems aligned via alignment training,而后者 aim to obtain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks。forward alignment 和 backward alignment form a recurrent process,where the alignment of AI systems from the forward process is verified in the backward process, meanwhile providing updated objectives for forward alignment in the next round。在 forward alignment 方面,我们讨论了从反馈学习和分布转换学习。在 backward alignment 方面,我们讨论了对 AI 系统的生命周期中每一个阶段的 assurance 技术和管理做法。我们还将 continually 更新网站(www.alignmentsurvey.com),该网站包括教程、论文收集、博客文章和其他资源。

Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding

  • paper_url: http://arxiv.org/abs/2310.19671
  • repo_url: None
  • paper_authors: Bram M. A. van Dijk, Tom Kouwenhoven, Marco R. Spruit, Max J. van Duijn
  • for: 这篇论文主要是为了评估大型自然语言处理器(LLM)的能力,并且探讨关于 LLM 的评价和含义。
  • methods: 本文使用了理论和实证方法来评估 LLM 的能力,包括对三个常见批评点进行了严谨的分析。
  • results: 本文的结果表明,对 LLM 的评价需要更加细化,并且提出了一种 Pragmatic 视角来理解 LLM 的含义和意图。
    Abstract Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of `real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
    摘要 现有的大型自然语言处理模型(LLM) possess incredible ability to generate grammatically correct and fluent text. LLMs are emerging rapidly, and debates about their capacities have intensified, but reflection is lagging behind. Therefore, in this position paper, we will first focus on the debates and critically assess three points that are frequently raised in critiques of LLM capacities:1. LLMs only parrot statistical patterns in the training data;2. LLMs master formal language competence but not functional language competence;3. Language learning in LLMs cannot inform human language learning.Drawing on empirical and theoretical arguments, we will show that these points require more nuance. Second, we will outline a pragmatic perspective on the issue of "real" understanding and intentionality in LLMs. Understanding and intentionality refer to the unobservable mental states that we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behavior effectively. We will reflect on the circumstances under which it would make sense for humans to attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.

Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2310.19658
  • repo_url: None
  • paper_authors: Noah Ziems, Gang Liu, John Flanagan, Meng Jiang
  • for: 本研究旨在提高网络入侵检测(NID)系统的决策树模型,以便更好地检测恶意网络流量。
  • methods: 本研究使用大型自然语言模型(LLM)来提供解释和背景知识,以帮助用户更好地理解决策树的决策。
  • results: 研究发现,LLM生成的决策树解释与人类评价的可读性、质量和背景知识之间呈高度相关,同时能够提供更好的决策边界的理解。
    Abstract Network intrusion detection (NID) systems which leverage machine learning have been shown to have strong performance in practice when used to detect malicious network traffic. Decision trees in particular offer a strong balance between performance and simplicity, but require users of NID systems to have background knowledge in machine learning to interpret. In addition, they are unable to provide additional outside information as to why certain features may be important for classification. In this work, we explore the use of large language models (LLMs) to provide explanations and additional background knowledge for decision tree NID systems. Further, we introduce a new human evaluation framework for decision tree explanations, which leverages automatically generated quiz questions that measure human evaluators' understanding of decision tree inference. Finally, we show LLM generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge while simultaneously providing better understanding of decision boundaries.
    摘要 网络入侵检测(NID)系统利用机器学习技术,在检测恶意网络流量的实践中已表现出色。决策树在性能和简单性之间取得了良好的折衷,但需要 NID 系统的用户具备机器学习背景知识才能解释其结果;此外,决策树无法提供额外的外部信息来说明某些特征为何对分类重要。在这项工作中,我们探讨了使用大型语言模型(LLM)为决策树 NID 系统提供解释和额外的背景知识。此外,我们提出了一种新的决策树解释人工评估框架,利用自动生成的测验题来衡量人类评估者对决策树推理的理解程度。最后,我们表明 LLM 生成的决策树解释与人类在可读性、质量和背景知识运用方面的评分高度相关,同时能带来对决策边界更好的理解。
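
A rough sketch of the pipeline's first half under assumed feature names and a toy dataset: extract the decision path a tree used for one network flow and turn it into a natural-language prompt that an LLM could then expand with explanations and background knowledge (the LLM call itself is omitted).

```python
from sklearn.tree import DecisionTreeClassifier
import numpy as np

features = ["duration", "src_bytes", "dst_bytes"]   # illustrative flow features
X = np.random.rand(200, 3)
y = (X[:, 1] > 0.7).astype(int)                     # toy "malicious" label
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

def decision_path_text(tree, x):
    """Turn the decision path for one sample into text an LLM can explain."""
    node_indicator = tree.decision_path(x.reshape(1, -1))
    steps = []
    for node in node_indicator.indices:
        f, t = tree.tree_.feature[node], tree.tree_.threshold[node]
        if f >= 0:  # negative feature index marks a leaf
            op = "<=" if x[f] <= t else ">"
            steps.append(f"{features[f]} {op} {t:.2f}")
    return " AND ".join(steps)

path = decision_path_text(tree, X[0])
prompt = f"A flow was flagged because: {path}. Explain in plain language why this may indicate an intrusion."
print(prompt)
```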

MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval

  • paper_url: http://arxiv.org/abs/2310.19654
  • repo_url: None
  • paper_authors: Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu
  • for: 降低大型视觉语言预训模型的模型大小和加速其终端设备部署,以提高图文检索的效率和灵活性。
  • methods: 提出了一种多教师跨Modal对采用单流和双流模型的混合方法,通过将单流模型中的综合特征与双流模型中的图像和文本特征进行混合,以提高学生模型的检索性能。
  • results: 通过进行logit和特征填充,使学生双流模型的检索性能得到明显提高,而无需增加检索复杂度。此外,在Snapdragon clips上实现了一个具有93M内存和30ms搜索延迟的移动CLIP模型,未出现明显性能下降。
    Abstract With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry areas, reducing the model size and streamlining their terminal-device deployment have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-model alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon clips with only 93M running memory and 30ms search latency, without apparent performance degradation of the original large CLIP.
    摘要 通过大规模视语言预训模型的成功和图文检索在业界的广泛应用,减小模型大小并将其部署到终端设备上已经变得非常必要。主流的图文检索模型结构包括单流和双流,两者都努力封闭视和文本模式之间的Semantic Gap。双流模型在线索索引和快速推理方面表现出色,而单流模型通过适当的特征融合实现更高精度的跨模型对接。我们提出了一种多教师跨模态对接填充(MCAD)技术,将单流特征融合到双流模型中的图像和文本特征上。然后,我们进行了日志和特征填充来提高学生双流模型的能力,实现高效的图文检索任务。广泛的实验表明MCAD在图文检索任务中的表现很出色,同时具有高效性。此外,我们在Snapdragon板上实现了具有93M内存和30ms搜索延迟的移动CLIP模型,无 Apparent performance degradation of the original large CLIP.
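
A simplified sketch of the distillation objective, assuming precomputed embeddings: the dual-stream teacher's image and text features are mixed with the single-stream teacher's fused features to form modified teacher targets, and the student is trained with both feature and logit distillation. The mixing rule and hyperparameters here are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mcad_distill_loss(student_img, student_txt, teacher_img, teacher_txt, fused, tau=0.05, alpha=0.5):
    """Sketch of multi-teacher cross-modal alignment distillation: modified teacher
    features mix dual-stream and fused single-stream features, then feature- and
    logit-level distillation are applied to the dual-stream student."""
    t_img = F.normalize((teacher_img + fused) / 2, dim=-1)
    t_txt = F.normalize((teacher_txt + fused) / 2, dim=-1)
    s_img, s_txt = F.normalize(student_img, dim=-1), F.normalize(student_txt, dim=-1)

    feat_loss = F.mse_loss(s_img, t_img) + F.mse_loss(s_txt, t_txt)

    # Logit distillation on image-text similarity matrices.
    t_logits = t_img @ t_txt.t() / tau
    s_logits = s_img @ s_txt.t() / tau
    logit_loss = F.kl_div(F.log_softmax(s_logits, dim=-1), F.softmax(t_logits, dim=-1),
                          reduction="batchmean")
    return alpha * feat_loss + (1 - alpha) * logit_loss

B, D = 8, 256
loss = mcad_distill_loss(*(torch.randn(B, D) for _ in range(5)))
print(loss.item())
```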

Fast swap regret minimization and applications to approximate correlated equilibria

  • paper_url: http://arxiv.org/abs/2310.19647
  • repo_url: None
  • paper_authors: Binghui Peng, Aviad Rubinstein
  • for: 本研究旨在解决 [Blum和Mansour 2007] 中的主要开放问题:给出一种简单且计算高效的算法,只需 T = polylog(n) 轮即可将 swap regret 降至 εT。
  • methods: 本研究提出了一种新算法,该算法对 ε 具有指数依赖,同时我们证明了一个与之匹配的新下界。
  • results: 该算法在 polylog(n) 轮内将 swap regret 降至 εT,解决了 [Blum和Mansour 2007] 的主要开放问题;由此在多种场景(正规形二人博弈、通信与查询复杂度模型以及扩展形博弈)中更快地收敛到 ε-相关均衡(ε-CE)。
    Abstract We give a simple and computationally efficient algorithm that, for any constant $\varepsilon>0$, obtains $\varepsilon T$-swap regret within only $T = \mathsf{polylog}(n)$ rounds; this is an exponential improvement compared to the super-linear number of rounds required by the state-of-the-art algorithm, and resolves the main open problem of [Blum and Mansour 2007]. Our algorithm has an exponential dependence on $\varepsilon$, but we prove a new, matching lower bound. Our algorithm for swap regret implies faster convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several regimes: For normal form two-player games with $n$ actions, it implies the first uncoupled dynamics that converges to the set of $\varepsilon$-CE in polylogarithmic rounds; a $\mathsf{polylog}(n)$-bit communication protocol for $\varepsilon$-CE in two-player games (resolving an open problem mentioned by [Babichenko-Rubinstein'2017, Goos-Rubinstein'2018, Ganor-CS'2018]; and an $\tilde{O}(n)$-query algorithm for $\varepsilon$-CE (resolving an open problem of [Babichenko'2020] and obtaining the first separation between $\varepsilon$-CE and $\varepsilon$-Nash equilibrium in the query complexity model). For extensive-form games, our algorithm implies a PTAS for $\mathit{normal}$ $\mathit{form}$ $\mathit{correlated}$ $\mathit{equilibria}$, a solution concept often conjectured to be computationally intractable (e.g. [Stengel-Forges'08, Fujii'23]).
    摘要 我们提供了一个简单而计算效率高的算法,它可以在任何常数 $\varepsilon>0$ 下获得 $\varepsilon T$-交换失落,并且只需要 $T = \mathcal{O}(\log^c(n))$ 轮次,这是一个对比原始算法的 exponential 提高,并解决了 [Blum 和 Mansour 2007] 中的主要开放问题。我们的算法具有对 $\varepsilon$ 的几何依赖,但我们证明了一个新的匹配下界。我们的交换失落算法 imply 在一些场景中更快地达到 $\varepsilon$-相关平衡($\varepsilon$-CE):1. 正常形二人游戏中,我们的算法可以在 $\mathcal{O}(\log^c(n))$ 轮次内达到 $\varepsilon$-CE 的集合,这是第一个解耦的演化过程。2. 我们可以实现 $\mathsf{polylog}(n)$-位寄存器协议来实现 $\varepsilon$-CE,解决了 [Babichenko-Rubinstein 2017, Goos-Rubinstein 2018, Ganor-CS 2018] 中的开放问题。3. 我们可以实现 $\tilde{O}(n)$-询问算法来实现 $\varepsilon$-CE,解决了 [Babichenko 2020] 中的开放问题,并且获得了首次对 $\varepsilon$-CE 和 $\varepsilon$-尼亚希平衡($\varepsilon$-NE)之间的分离。对于扩展形游戏,我们的算法 imply 一种 PTAS для $\mathit{normal}$ $\mathit{form}$ $\mathit{相关}$ $\mathit{平衡}$,这是一个常 conjectured 是计算复杂的解决方案(例如 [Stengel-Forges 08, Fujii 23])。
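
For reference, swap regret compares the incurred loss with the best action-wise swap function in hindsight; the small sketch below computes the empirical swap regret of a play sequence (it illustrates the quantity being minimized, not the paper's algorithm).

```python
import numpy as np

def swap_regret(actions, losses):
    """Empirical swap regret: incurred loss minus the loss of the best swap function
    phi (the minimum decomposes into the best fixed replacement for each action)."""
    T, n = losses.shape
    incurred = losses[np.arange(T), actions].sum()
    best_swapped = 0.0
    for a in range(n):
        rows = losses[actions == a]              # rounds where a was played
        if len(rows):
            best_swapped += rows.sum(axis=0).min()
    return incurred - best_swapped

rng = np.random.default_rng(0)
losses = rng.random((1000, 4))
actions = rng.integers(0, 4, size=1000)
print(swap_regret(actions, losses))
```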

RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency

  • paper_url: http://arxiv.org/abs/2310.19629
  • repo_url: https://github.com/vlar-group/raydf
  • paper_authors: Zhuoman Liu, Bo Yang
  • For: This paper addresses the problem of continuous 3D shape representation and proposes a new framework called RayDF to improve the efficiency and accuracy of 3D shape representation.
  • Methods: The proposed RayDF framework consists of three components: 1) a simple ray-surface distance field, 2) a novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module.
  • Results: The proposed method achieves remarkable performance in 3D surface point reconstruction on both synthetic and real-world 3D scenes, rendering an 800x800 depth image 1000x faster than coordinate-based methods.
    Abstract In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF
    摘要 在这篇论文中,我们研究了连续3D形状表示的问题。现有大多数成功方法都是基于坐标的卷积神经表示。然而,它们在渲染新视图或者恢复明确的表面点时效率低下。一些工作开始将3D形状表示为射线基的神经函数,但学习结构因为缺乏多视图几何一致性而受到限制。为了解决这些挑战,我们提出了一个新的框架called RayDF。它包括三个主要组件:1)简单的射线-表面距离场,2)新的双射线可见分类器,和3)多视图一致性优化模块,以使学习的射线-表面距离在多视图几何上保持一致。我们对三个公共数据集进行了广泛的评估,并证明了我们的方法在 sintetic和实际世界中的3D表面点重建任务中显著的表现,明显超过了坐标基于和射线基于的基elines。尤其是,我们的方法可以在coordinate-based方法的1000倍速度下渲染800x800深度图像,表明我们的方法在3D形状表示方面的优势。我们的代码和数据可以在https://github.com/vLAR-group/RayDF上获取。
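
A minimal sketch of the first component, the ray-surface distance field: an MLP maps a ray (origin and unit direction) to a non-negative travel distance, from which the 3D surface point is recovered. The architecture and sizes are placeholders; the dual-ray visibility classifier and the multi-view consistency optimization are omitted.

```python
import torch
import torch.nn as nn

class RaySurfaceDistanceField(nn.Module):
    """Sketch of a ray-based distance field: map a ray (origin, unit direction)
    to the distance along the ray at which it hits the surface."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # distances are non-negative
        )

    def forward(self, origin, direction):
        d = self.mlp(torch.cat([origin, direction], dim=-1))
        return d, origin + d * direction            # distance and recovered surface point

field = RaySurfaceDistanceField()
o = torch.zeros(4, 3)
v = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)
dist, points = field(o, v)
print(dist.shape, points.shape)  # (4, 1) distances, (4, 3) surface points
```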

Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities

  • paper_url: http://arxiv.org/abs/2310.19626
  • repo_url: None
  • paper_authors: Zhengliang Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, Zihao Wu, John Hale, John Gibbs, Khaled Rasheed, Ninghao Liu, Gengchen Mai, Tianming Liu
  • for: 这篇论文旨在探讨人工全面智能(AGI)在文化领域的应用和影响。
  • methods: 论文使用了现代大语言模型和创新性图像生成系统,对文学、历史、市场、电影等领域进行了分析和评估。
  • results: 论文发现了AGI系统在文化领域的应用存在几个关键问题,如真实性、毒性、偏见和公共安全等问题,并提出了缓解策略。论文强调了多方合作来确保AGI系统推动创造力、知识和文化价值,而不是威胁真实性或人类尊严。
    Abstract Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods.
    摘要 This paper provides a comprehensive analysis of the applications and implications of AGI in the fields of text, graphics, audio, and video in the arts and humanities. We survey cutting-edge systems and their use in areas such as poetry, history, marketing, film, and communication, as well as classical art. We also highlight significant concerns related to factuality, toxicity, biases, and public safety in AGI systems, and propose strategies for mitigating these issues.The paper argues for multi-stakeholder collaboration to ensure that AGI promotes creativity, knowledge, and cultural values while upholding truth and human dignity. Our analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods, and advocates for responsible progress that centers on human flourishing.

Exploring Post-Training Quantization of Protein Language Models

  • paper_url: http://arxiv.org/abs/2310.19624
  • repo_url: None
  • paper_authors: Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan
  • For: This paper aims to improve the efficiency of protein language models (ProteinLMs) by developing a post-training quantization (PTQ) method that can accurately quantize all weights and activations of ProteinLMs without compromising accuracy.
  • Methods: The proposed PTQ method uses piecewise linear quantization for asymmetric activation values to ensure accurate approximation, addressing specific challenges associated with ESMFold, a simplified version of AlphaFold based on the ESM-2 ProteinLM.
  • Results: The proposed method was demonstrated to be effective in protein structure prediction tasks, showing that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, the method was applied to the contact prediction task, showcasing its versatility.
    Abstract Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, making representation difficult using low-bit fixed-point formats. To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
    摘要 We found that the typical uniform quantization method performs poorly on ESMFold, resulting in a significant drop in TM-Score when using 8-bit quantization. To address this challenge, we proposed a new PTQ method for ProteinLMs that utilizes piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, showing that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, demonstrating its versatility.Our study introduces an innovative PTQ method for ProteinLMs that addresses specific quantization challenges and has the potential to lead to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
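
A toy illustration of the piecewise idea for asymmetric activations, assuming a single breakpoint and per-segment scales; the paper's actual scheme for ESMFold differs in detail.

```python
import numpy as np

def piecewise_linear_quantize(x, bits=8, breakpoint=0.0):
    """Sketch of piecewise-linear quantization for highly asymmetric activations:
    values below and above the breakpoint get their own scale so neither tail
    dominates the quantization error (not the paper's exact scheme)."""
    levels = 2 ** (bits - 1) - 1
    out = np.empty_like(x)
    for mask in (x < breakpoint, x >= breakpoint):
        seg = x[mask]
        if seg.size == 0:
            continue
        scale = max(np.abs(seg - breakpoint).max(), 1e-8) / levels
        out[mask] = np.round((seg - breakpoint) / scale) * scale + breakpoint
    return out

acts = np.concatenate([np.random.uniform(-0.1, 0.0, 1000),   # small negative tail
                       np.random.uniform(0.0, 30.0, 1000)])  # large positive tail
q = piecewise_linear_quantize(acts, bits=8)
print("max abs error:", np.abs(acts - q).max())
```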

Large Trajectory Models are Scalable Motion Predictors and Planners

  • paper_url: http://arxiv.org/abs/2310.19620
  • repo_url: https://github.com/tsinghua-mars-lab/statetransformer
  • paper_authors: Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao
  • for: 本研究旨在提出一种可扩展的路径模型(State Transformer,STR),用于驱动自动驾驶中的动作预测和规划问题。
  • methods: STR 通过将观察、状态和动作排序成一个简单的序列模型, reformulate 了动作预测和规划问题。STR 的简单设计和可扩展性,在两个问题中都能够经常超越基线方法。
  • results: 实验结果表明,大型路径模型(LTM),如 STR,遵循缩放法律,并表现出扩展性和学习效率的出色表现。质量结果还显示, LTM 能够在训练数据分布不同的情况下做出可能的预测,并且学习长期规划,不需要显式损失函数或高级标注。
    Abstract Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. With a simple model design, STR consistently outperforms baseline approaches in both problems. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to make complex reasonings for long-term planning, without explicit loss designs or costly high-level annotations.
    摘要 自动驾驶中的运动预测和规划是关键任务,目前的研究正转向基于机器学习的方法。挑战包括理解多样化的道路拓扑、在较长时间范围内推理交通动态、理解异质行为,以及在大规模连续状态空间中生成策略。受大语言模型通过模型扩展应对类似复杂性的成功启发,我们引入了可扩展的轨迹模型 State Transformer(STR)。STR 将观测、状态和动作排列为一个统一的序列建模任务,从而重新表述运动预测与规划问题。凭借简单的模型设计,STR 在这两个问题上均稳定地优于基线方法。特别地,实验结果表明,大轨迹模型(LTM,例如 STR)遵循缩放定律,表现出卓越的适应性和学习效率。定性结果还表明,LTM 能在与训练数据分布差异显著的场景中做出合理的预测,并能在无需显式损失设计或昂贵高层标注的情况下学会面向长期规划的复杂推理。

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.19619
  • repo_url: https://github.com/mars-tin/awesome-theory-of-mind
  • paper_authors: Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai
  • for: 本研究旨在提供一种更全面和可靠的评估方法来评估机器学习模型(LLMs)的理论心(ToM)能力。
  • methods: 该研究使用了心理学研究中的分类方法,将机器ToM分为7种心态类别,并对现有的评估准则进行分析,以找出尚未被探讨的ToM方面。
  • results: 研究人员在一个网格世界设置中进行了一个证明性研究,以证明 situated evaluation 可以更好地评估机器的 ToM 能力,并减少了短cut和数据泄露的风险。
    Abstract Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to answer two road-blocking questions: (1) How can we taxonomize a holistic landscape of machine ToM? (2) What is a more effective evaluation protocol for machine ToM? Following psychological studies, we taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM. We argue for a holistic and situated evaluation of ToM to break ToM into individual components and treat LLMs as an agent who is physically situated in environments and socially situated in interactions with humans. Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept. We hope this position paper can facilitate future research to integrate ToM with LLMs and offer an intuitive means for researchers to better position their work in the landscape of ToM. Project page: https://github.com/Mars-tin/awesome-theory-of-mind
    摘要 大型语言模型(LLM)已引起广泛的关注和讨论,关于它们是否会发展出理论心(ToM)的潜在能力。一些最近的调查表明,目前的LLM Models在ToM方面存在强度不足的问题,需要开发新的评价协议,因为现有的评价方法主要关注了ToM的不同方面,容易出现偏导和数据泄露问题。在这份Position paper中,我们寻求回答以下两个障碍问题:1. 如何分类机器人的整体ToM领域?2. 如何设计更有效的机器人ToM评价协议?根据心理学研究,我们将机器人ToM分类为7种心理状态类别,并将现有的评价方法进行分类,以找出尚未得到足够关注的ToM方面。我们 argueThat a comprehensive and situated evaluation of ToM is necessary to break ToM into its individual components and treat LLMs as physically and socially situated agents in environments and interactions with humans. Such situated evaluation can provide a more comprehensive assessment of mental states and potentially mitigate the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept. We hope this position paper can facilitate future research to integrate ToM with LLMs and offer an intuitive means for researchers to better position their work in the landscape of ToM.项目页面:https://github.com/Mars-tin/awesome-theory-of-mind

Technical Report on the Learning of Case Relevance in Case-Based Reasoning with Abstract Argumentation

  • paper_url: http://arxiv.org/abs/2310.19607
  • repo_url: https://github.com/GPPassos/learning-relevance-aacbr-technical-report
  • paper_authors: Guilherme Paulino-Passos, Francesca Toni
  • for: This paper focuses on using case-based reasoning and abstract argumentation to improve the prediction of legal outcomes.
  • methods: The paper uses decision trees to learn the relevance of cases and combines case-based reasoning with abstract argumentation to make predictions.
  • results: The authors show that the proposed approach performs competitively with decision trees and results in a more compact representation, which could be beneficial for obtaining cognitively tractable explanations.
    Abstract Case-based reasoning is known to play an important role in several legal settings. In this paper we focus on a recent approach to case-based reasoning, supported by an instantiation of abstract argumentation whereby arguments represent cases and attack between arguments results from outcome disagreement between cases and a notion of relevance. In this context, relevance is connected to a form of specificity among cases. We explore how relevance can be learnt automatically in practice with the help of decision trees, and explore the combination of case-based reasoning with abstract argumentation (AA-CBR) and learning of case relevance for prediction in legal settings. Specifically, we show that, for two legal datasets, AA-CBR and decision-tree-based learning of case relevance perform competitively in comparison with decision trees. We also show that AA-CBR with decision-tree-based learning of case relevance results in a more compact representation than their decision tree counterparts, which could be beneficial for obtaining cognitively tractable explanations.
    摘要 Case-based reasoning 在法律设置中扮演着重要角色。在这篇论文中,我们关注一种最近的case-based reasoning方法,基于抽象论证的实现,其中Arguments代表案例,Arguments之间的攻击 originates from outcome disagreement between cases and a notion of relevance。在这种情况下, relevance 与特定的案例之间的相似性相连。我们研究如何在实践中自动学习case relevance,并探讨 AA-CBR 和学习案例相关性的组合用于预测法律设置中。具体来说,我们显示,对于两个法律数据集,AA-CBR 和基于决策树的学习案例相关性能与决策树相比竞争,而且 AA-CBR 与决策树学习案例相关性后得到的表示更加紧凑,这可能有助于获得更加容易理解的解释。

LLMaAA: Making Large Language Models as Active Annotators

  • paper_url: http://arxiv.org/abs/2310.19596
  • repo_url: https://github.com/ridiculouz/LLMaAA
  • paper_authors: Ruoyu Zhang, Yanzeng Li, Yongliang Ma, Ming Zhou, Lei Zou
  • for: 本研究旨在将大语言模型(LLMs)训练为实际应用中的自然语言处理(NLP)任务,并且实现高效的标签生成。
  • methods: 本研究使用了LLMs作为标签生成的annotator,并将其置入到了活动学习 Loop中,以进行有效的标签生成。
  • results: 在两个 класи级NLP任务中, LLMaAA 可以实现高效的标签生成,并且可以在只需百个标签示例下,训练任务特定的模型,并且超越其他基eline。
    Abstract Prevalent supervised learning methods in natural language processing (NLP) are notoriously data-hungry, which demand large amounts of high-quality annotated data. In practice, acquiring such data is a costly endeavor. Recently, the superior few-shot performance of large language models (LLMs) has propelled the development of dataset generation, where the training data are solely synthesized from LLMs. However, such an approach usually suffers from low-quality issues, and requires orders of magnitude more labeled data to achieve satisfactory performance. To fully exploit the potential of LLMs and make use of massive unlabeled data, we propose LLMaAA, which takes LLMs as annotators and puts them into an active learning loop to determine what to annotate efficiently. To learn robustly with pseudo labels, we optimize both the annotation and training processes: (1) we draw k-NN examples from a small demonstration pool as in-context examples, and (2) we adopt the example reweighting technique to assign training samples with learnable weights. Compared with previous approaches, LLMaAA features both efficiency and reliability. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples, which is much more cost-effective than other baselines.
    摘要 通常的监督学习方法在自然语言处理(NLP)领域具有著名的数据占用问题,需要大量高质量标注数据来训练。在实践中,获取这些数据是一件昂贵的困难任务。在最近,大型自然语言模型(LLM)的优秀几个shot性能的发展使得数据生成技术得到了更多的关注,其中的数据主要通过LLM来生成。然而,这种方法通常受到低质量问题的困扰,需要数量级更多的标注数据来达到满意性。为了充分利用LLM的潜力并使用庞大的未标注数据,我们提出了LLMaAA,它将LLM作为标注者,并将其置入到活动学习循环中,以确定如何有效地标注。为了学习Robustly,我们在标注和训练过程中进行优化:(1)我们从小示例池中随机选择k nearest neighbors(k-NN)的示例作为上下文示例,并(2)采用示例权重技术,将训练样本分配学习型的权重。相比之下,LLMaAA具有高效和可靠的特点。我们在两个 класси型NLP任务中进行实验和分析,结果显示,通过LLMaAA训练基于LLM生成的标签的任务特定模型,可以在只需百个标注示例的情况下,超越教师模型,这是其他基准之下更加经济的。
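
A skeletal version of the active annotation loop with stubbed components: the student's uncertainty ranks the unlabeled pool, the most informative examples are sent to the LLM annotator, and the growing pseudo-labeled set is used for retraining (with example reweighting in the full method). Both the scoring and annotation functions here are placeholders.

```python
import random

def llm_annotate(text):
    """Placeholder for an LLM call (with k-NN in-context demonstrations in the paper);
    here it returns a dummy label so the loop runs end to end."""
    return random.choice(["PER", "ORG", "LOC"])

def student_uncertainty(text):
    """Placeholder uncertainty score from the task-specific student model."""
    return random.random()

unlabeled = [f"sentence {i}" for i in range(100)]
labeled = []

for round_ in range(3):                               # active-learning loop
    unlabeled.sort(key=student_uncertainty, reverse=True)
    batch, unlabeled = unlabeled[:10], unlabeled[10:]  # pick the most uncertain examples
    labeled += [(x, llm_annotate(x)) for x in batch]   # LLM acts as the annotator
    # ... retrain the student on `labeled`, optionally with learnable example weights ...

print(len(labeled), "pseudo-labeled examples after 3 rounds")
```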

Prediction of Locally Stationary Data Using Expert Advice

  • paper_url: http://arxiv.org/abs/2310.19591
  • repo_url: None
  • paper_authors: Vladimir V’yugin, Vladimir Trunov
  • for: 这篇论文研究了连续机器学习的问题,特别是在游戏理论方法的框架下进行计算。
  • methods: 该论文使用了游戏理论方法,不假设数据源的随机性,而是基于数据流的结构假设。
  • results: 论文提出了一种在线预测算法,可以用于处理本地站点时间序列。此外,论文还得到了这种算法的效率估计。
    Abstract The problem of continuous machine learning is studied. Within the framework of the game-theoretic approach, when for calculating the next forecast, no assumptions about the stochastic nature of the source that generates the data flow are used -- the source can be analog, algorithmic or probabilistic, its parameters can change at random times, when building a prognostic model, only structural assumptions are used about the nature of data generation. An online forecasting algorithm for a locally stationary time series is presented. An estimate of the efficiency of the proposed algorithm is obtained.
    摘要 本文研究连续机器学习问题。在博弈论方法的框架下,计算下一次预测时不对生成数据流的数据源做任何随机性假设——数据源可以是模拟的、算法的或概率的,其参数可以在随机时刻发生变化;在构建预测模型时,仅使用关于数据生成方式的结构性假设。本文提出了一种针对局部平稳时间序列的在线预测算法,并给出了该算法效率的估计。
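
As a point of reference for the prediction-with-expert-advice setting, here is a basic exponentially weighted forecaster with loss discounting as one simple way of tracking locally stationary data; the paper's algorithm and efficiency guarantees differ.

```python
import numpy as np

def exponentially_weighted_forecast(expert_preds, outcomes, eta=2.0, discount=0.99):
    """Sketch of prediction with expert advice: weights decay exponentially with each
    expert's (discounted) squared loss; discounting is one simple way to follow
    locally stationary data."""
    T, N = expert_preds.shape
    losses = np.zeros(N)
    forecasts = np.zeros(T)
    for t in range(T):
        w = np.exp(-eta * (losses - losses.min()))   # shift for numerical stability
        w /= w.sum()
        forecasts[t] = w @ expert_preds[t]           # aggregated forecast
        losses = discount * losses + (expert_preds[t] - outcomes[t]) ** 2
    return forecasts

rng = np.random.default_rng(1)
preds = rng.normal(size=(200, 5))
outcomes = preds[:, 0] + 0.1 * rng.normal(size=200)  # expert 0 is nearly right
print(np.mean((exponentially_weighted_forecast(preds, outcomes) - outcomes) ** 2))
```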

CreoleVal: Multilingual Multitask Benchmarks for Creoles

  • paper_url: http://arxiv.org/abs/2310.19567
  • repo_url: https://github.com/hclent/creoleval
  • paper_authors: Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Hans Erik Heje, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
  • for: 这个论文的目的是为了提供一个对Creole语言的NLP研究提供资源,以便更好地包括这些语言在计算语言学和自然语言处理领域中。
  • methods: 本文使用了多种NLP任务的基准数据集,包括机器理解、关系分类和机器翻译等,并在零shot设定下进行了基准实验,以评估将其他语言的资源传递到Creole语言上的能力和局限性。
  • results: 本文提供了8种NLP任务的benchmark数据集,覆盖28种Creole语言,并为每个任务进行了零shot基准实验,以更好地了解Creole语言在NLP领域的能力和局限性。
    Abstract Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and other highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of brand new development datasets for machine comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, the goal of CreoleVal is to empower research on Creoles in NLP and computational linguistics. We hope this resource will contribute to technological inclusion for Creole language users around the globe.
    摘要 创护语言表示一个尚未得到足够探索和受排斥的语言群体,有很少的可用资源 для NLP研究。尽管创护语言与其他高度资源化语言之间的 généalogique 关系 imply 一个显著的潜在 transferred learning 潜力,但这种潜力受到缺乏精心标注数据的限制。在这项工作中,我们介绍 CreoleVal,一个收录8种 NLP任务的benchmark集合,覆盖28种创护语言,包括新开发的机器理解、关系分类和机器翻译 benchmarks 以及一些现有的benchmarks。对于每个benchmark,我们进行了零批设置的基eline实验,以更好地了解创护语言在 transferred learning 中的能力和局限性。最终,CreoleVal的目标是促进创护语言在 NLP和计算语言科学中的研究,并为全球各地的创护语言用户提供技术包容。

A General Neural Causal Model for Interactive Recommendation

  • paper_url: http://arxiv.org/abs/2310.19519
  • repo_url: None
  • paper_authors: Jialin Liu, Xinyan Su, Peng Zhou, Xiangyu Zhao, Jun Li
  • for: 这篇论文旨在 Mitigating survivor bias in observational data for optimizing recommender systems.
  • methods: 该论文提出了一种基于神经网络 causal model 的方法,通过 counterfactual inference 来解决 survivor bias 问题。
  • results: both theoretical and empirical studies demonstrate the effectiveness of the proposed solution.
    Abstract Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently most solutions re-mines existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved though counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constrains. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
    摘要 Note:* "Survivor bias" is translated as "生存者偏见" (shēng zhì zhēng yǎn)* "Observational data" is translated as "观察数据" (guān cháng shù dài)* "Recommender systems" is translated as "推荐系统" (tuī yù xì tǒng)* "Local optima" is translated as "本地最优" (ben dì zuì yōu)* "Counterfactual problem" is translated as "Counterfactual问题" (fǎng yì wèn tí)* "Gumbel-max function" is translated as "Gumbel-max函数" (Gumbel-max fāng xìng)* "Structural constrains" is translated as "结构约束" (jiégòu yāo xiǎng)* "Reinforcement optimizations" is translated as "强化优化" (qiáng huà yóu huà)* "Gumbel-Softmax" is translated as "Gumbel-Softmax" (Gumbel-Softmax)

Inverse folding for antibody sequence design using deep learning

  • paper_url: http://arxiv.org/abs/2310.19513
  • repo_url: None
  • paper_authors: Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane
  • for: The paper is written for the problem of designing antibody sequences based on 3D structural information.
  • methods: The paper proposes a fine-tuned inverse folding model that is specifically optimized for antibody structures, and uses physics-based methods to evaluate the quality of proposed sequences.
  • results: The paper shows that the proposed model outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. Additionally, the paper studies the canonical conformations of complementarity-determining regions and finds improved encoding of these loops into known clusters.
    Abstract We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied on antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.
    摘要 我们考虑在给定3D结构信息的条件下进行抗体序列设计的问题。在先前工作的基础上,我们提出了一种专门针对抗体结构微调的 inverse folding 模型;应用于抗体时,其在序列恢复和结构稳健性上优于通用蛋白质模型,在高变的 CDR-H3 环上提升尤为显著。我们研究了互补决定区的典型构象,发现模型能将这些环更好地编码到已知的构象簇中。最后,我们讨论了该模型在药物发现和结合剂设计中的应用,并使用基于物理的方法评估所提出序列的质量。
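
For context, the sequence-recovery metric referenced above is simply the fraction of positions where the designed sequence matches the native one; a tiny sketch with made-up sequences:

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed antibody sequence matches the native
    one -- the sequence-recovery metric used to compare inverse folding models."""
    assert len(designed) == len(native)
    return sum(d == n for d, n in zip(designed, native)) / len(native)

# Toy example with a short CDR-H3-like stretch (illustrative sequences only).
print(sequence_recovery("ARDRGYYFDY", "ARDRGYYGDY"))  # 0.9
```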

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

  • paper_url: http://arxiv.org/abs/2310.19509
  • repo_url: https://github.com/lswzjuer/SparseByteNN
  • paper_authors: Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen
  • for: 这个研究旨在提高网络大小时的测试速度和精度。
  • methods: 本研究使用精确的kernel终端减少来实现实时执行和高精度。它包括两个部分:(a)精确的kernel终端减少架构,包括多个不同的簇终端减少模式,并与我们提出的网络重新排序策略,实现了高压缩率和高精度。(b)特化于缓冲终端减少的推断引擎。
  • results: 实验结果显示,对于30%簇的MobileNet-v1,SparseByteNN可以与紧密的终端版本和现有的缓冲推断引擎MNN进行比较,在效率-精度曲线上表现出superiority。具体来说,SparseByteNN在Qualcomm 855上实现了1.27倍的速度提升和1.29倍的效率提升,仅有0.224%的精度下降。
    Abstract To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open problem. In this paper, we present a novel mobile inference acceleration framework SparseByteNN, which leverages fine-grained kernel sparsity to achieve real-time execution as well as high accuracy. Our framework consists of two parts: (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning. It designs multiple sparse patterns for different operators. Combined with our proposed whole network rearrangement strategy, the schema achieves a high compression rate and high precision at the same time. (b) Inference engine co-optimized with the sparse pattern. The conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy curve. Experimental results on Qualcomm 855 show that for 30% sparse MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and 1.29x speedup over the state-of-the-art sparse inference engine MNN with a slight accuracy drop of 0.224%. The source code of SparseByteNN will be available at https://github.com/lswzjuer/SparseByteNN
    摘要 为了解决网络大小的挑战,研究人员已经开发出了稀盐模型通过网络剪切。然而,在普通计算设备上实现重要的速度增加并保持模型准确性 remain an open problem。在这篇论文中,我们提出了一种新的移动推理加速框架SparseByteNN,它利用细化的kernel稀盐来实现实时执行以及高准确性。我们的框架包括两部分:(a) 细化kernel稀盐schema,其中的稀盐粒度位于结构剪切和无结构剪切之间。它设计了多种不同的稀盐模式,并结合我们的提出的整个网络重新排序策略,实现了高压缩率和高精度同时。(b) 推理引擎与稀盐模式相似。传统的观点是,这种理论的FLOP减少不会在实际情况中带来实用性提升。我们希望通过引入一家高效的稀盐kernel家族,证明这个观点是错误的。我们的高效实现稀盐基本操作,使得稀盐版本的MobileNet-v1在效率-准确度曲线上表现出色,超过了dense基eline和state-of-the-art sparse推理引擎MNN的速度。实验结果表明,在Qualcomm 855上,为30%稀盐的MobileNet-v1,SparseByteNN可以与dense版本相比,提高1.27倍的速度,同时与MNN相比,提高1.29倍的速度,但是略有准确性下降(0.224%)。SparseByteNN的源代码将在https://github.com/lswzjuer/SparseByteNN上发布。
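
A toy sketch of fine-grained group sparsity on a convolution weight: within each small group of weights along the input-channel axis, only the largest-magnitude entries are kept (an N:M-style pattern). SparseByteNN's actual per-operator patterns, network rearrangement, and sparse kernels are more elaborate.

```python
import torch

def group_sparsify(weight, group_size=4, keep=2):
    """Within every group of `group_size` consecutive weights along the input-channel
    axis, keep only the `keep` largest-magnitude entries (illustrative pattern)."""
    out_c, in_c, kh, kw = weight.shape
    w = weight.permute(0, 2, 3, 1).reshape(-1, group_size)      # groups along input channels
    idx = w.abs().topk(keep, dim=1).indices
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    mask = mask.reshape(out_c, kh, kw, in_c).permute(0, 3, 1, 2)
    return weight * mask

w = torch.randn(8, 16, 3, 3)
ws = group_sparsify(w)
print("sparsity:", (ws == 0).float().mean().item())  # ~0.5 with 2-out-of-4 groups
```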

Trust, Accountability, and Autonomy in Knowledge Graph-based AI for Self-determination

  • paper_url: http://arxiv.org/abs/2310.19503
  • repo_url: None
  • paper_authors: Luis-Daniel Ibáñez, John Domingue, Sabrina Kirrane, Oshani Seneviratne, Aisling Third, Maria-Esther Vidal
  • for: This paper is written to address the issue of self-determination in the context of the growing use of Knowledge Graphs (KGs) and Artificial Intelligence (AI) in online services.
  • methods: The paper uses a conceptual framework to explore the foundational topics and research pillars needed to support KG-based AI for self-determination, and analyzes challenges and opportunities for citizen self-determination in a real-world scenario.
  • results: The paper proposes a research agenda aimed at accomplishing the recommended objectives, including ensuring the trustworthiness of AI systems, transparency in data and inner workings, and accountability for decision-making.
    Abstract Knowledge Graphs (KGs) have emerged as fundamental platforms for powering intelligent decision-making and a wide range of Artificial Intelligence (AI) services across major corporations such as Google, Walmart, and AirBnb. KGs complement Machine Learning (ML) algorithms by providing data context and semantics, thereby enabling further inference and question-answering capabilities. The integration of KGs with neuronal learning (e.g., Large Language Models (LLMs)) is currently a topic of active research, commonly named neuro-symbolic AI. Despite the numerous benefits that can be accomplished with KG-based AI, its growing ubiquity within online services may result in the loss of self-determination for citizens as a fundamental societal issue. The more we rely on these technologies, which are often centralised, the less citizens will be able to determine their own destinies. To counter this threat, AI regulation, such as the European Union (EU) AI Act, is being proposed in certain regions. The regulation sets what technologists need to do, leading to questions concerning: How can the output of AI systems be trusted? What is needed to ensure that the data fuelling and the inner workings of these artefacts are transparent? How can AI be made accountable for its decision-making? This paper conceptualises the foundational topics and research pillars to support KG-based AI for self-determination. Drawing upon this conceptual framework, challenges and opportunities for citizen self-determination are illustrated and analysed in a real-world scenario. As a result, we propose a research agenda aimed at accomplishing the recommended objectives.
    摘要 知识 graphs (KGs) 已成为智能决策的基础 плаform,并在大型公司如 Google、Walmart 和 Airbnb 中应用于许多人工智能 (AI) 服务。KGs 补充机器学习 (ML) 算法,提供数据上下文和 semantics,从而实现进一步的推理和问答能力。目前,KGs 与神经网络学 (e.g., Large Language Models (LLMs)) 的结合,被称为神经符号 AI。尽管 KG-based AI 具有许多优点,但其在线服务的普及可能导致公民的自主权减退为社会问题。随着我们对这些技术的依赖,我们将失去自己的命运。为了解决这种威胁,AI 规则,如欧盟 (EU) AI 法规,在某些地区被提出。这些规则需要技术人员做什么,引发了如何确保 AI 系统输出的可靠性、数据驱动和内部机制的透明度,以及如何让 AI 做出负责任的决策的问题。本文概括了 KG-based AI 的基础主题和研究柱石,并通过实际场景的示例和分析,描述了公民自主权的挑战和机遇。因此,我们提出了一个研究计划,旨在实现建议的目标。

Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal

  • paper_url: http://arxiv.org/abs/2310.19463
  • repo_url: https://github.com/aicenter/optimize-planning-heuristics-to-rank
  • paper_authors: Leah Chrestien, Tomás Pevný, Stefan Edelkamp, Antonín Komenda
  • for: 这个论文是为了优化搜索算法中的启发函数参数而写的。
  • methods: 这篇论文使用了解决 пробле 集 instances 的方法来优化启发函数参数。
  • results: 实验结果表明,使用这种方法可以在多种问题上得到更好的性能。
    Abstract In imitation learning for planning, parameters of heuristic functions are optimized against a set of solved problem instances. This work revisits the necessary and sufficient conditions of strictly optimally efficient heuristics for forward search algorithms, mainly A* and greedy best-first search, which expand only states on the returned optimal path. It then proposes a family of loss functions based on ranking tailored for a given variant of the forward search algorithm. Furthermore, from a learning theory point of view, it discusses why optimizing cost-to-goal \hstar\ is unnecessarily difficult. The experimental comparison on a diverse set of problems unequivocally supports the derived theory.
    摘要 在面向规划的模仿学习中,启发函数的参数是针对一组已求解的问题实例进行优化的。本文重新审视了前向搜索算法(主要是 A* 与贪婪最佳优先搜索)中严格最优效率启发函数的充分必要条件,即这类启发函数仅扩展返回的最优路径上的状态。随后,本文针对给定的前向搜索算法变体,提出了一族基于排序的损失函数。此外,从学习理论的角度,本文讨论了为什么优化到目标代价(cost-to-goal)$h^*$ 是不必要地困难的。在多样化问题集上的实验比较明确支持了所推导的理论。
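
A minimal sketch of the ranking idea, assuming toy state features: the learned heuristic is trained so that states on the returned optimal path score lower (are expanded earlier) than competing off-path states, instead of regressing cost-to-goal; the margin loss below is one illustrative member of such a family.

```python
import torch

def rank_loss(h_on_path, h_off_path, margin=1.0):
    """Pairwise ranking objective for a learned heuristic: a state on the solution
    path should score lower than a competing state from the open list."""
    return torch.clamp(margin + h_on_path - h_off_path, min=0).mean()

h_net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
on_states, off_states = torch.randn(16, 8), torch.randn(16, 8)   # toy state features
loss = rank_loss(h_net(on_states).squeeze(-1), h_net(off_states).squeeze(-1))
loss.backward()
print(loss.item())
```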

Denoising Diffusion Probabilistic Models for Hardware-Impaired Communication Systems: Towards Wireless Generative AI

  • paper_url: http://arxiv.org/abs/2310.19460
  • repo_url: None
  • paper_authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
  • For: This paper proposes a practical wireless communication system with hardware-impaired transceivers, using denoising diffusion probabilistic models (DDPMs) to improve network resilience and reconstruction performance.
  • Methods: The proposed DDPM-based receiver uses a decomposition of the data generation process to address realistic non-idealities such as hardware impairments, channel distortions, and quantization errors.
  • Results: The paper shows that the proposed approach provides near-invariant reconstruction performance with respect to different hardware impairment levels and quantization errors, and achieves more than 25 dB improvement in reconstruction performance compared to conventional deep neural network (DNN)-based receivers.
    Abstract Thanks to the outstanding achievements from state-of-the-art generative models like ChatGPT and diffusion models, generative AI has gained substantial attention across various industrial and academic domains. In this paper, denoising diffusion probabilistic models (DDPMs) are proposed for a practical finite-precision wireless communication system with hardware-impaired transceivers. The intuition behind DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, a DDPM-based receiver is proposed for a practical wireless communication scheme that faces realistic non-idealities, including hardware impairments (HWI), channel distortions, and quantization errors. It is shown that our approach provides network resilience under low-SNR regimes, near-invariant reconstruction performance with respect to different HWI levels and quantization errors, and robust out-of-distribution performance against non-Gaussian noise. Moreover, the reconstruction performance of our scheme is evaluated in terms of cosine similarity and mean-squared error (MSE), highlighting more than 25 dB improvement compared to the conventional deep neural network (DNN)-based receivers.
    摘要 由于现代生成模型如ChatGPT和扩散模型的出色成就,生成AI已经受到了各种领域和学术领域的广泛关注。在这篇论文中,我们提出了一种实用的finite-precision无线通信系统中的杂谱扩散概率模型(DDPM)。DDPM的启发是将数据生成过程 decomposes into "denoising" steps。受到这种启发,我们提出了基于DDPM的一种实用的无线通信接收器,可以面对现实的非理想条件,包括硬件缺陷(HWI)、通道扭曲和量化误差。我们的方法可以在低SNR情况下提供网络鲁棒性,对不同HWI水平和量化误差 exhibit near-invariant重建性,并且具有对非高斯噪声的Robust性。此外,我们的重建性分析采用cosine similarity和平均方差Error(MSE),表明与传统的深度神经网络(DNN)接收器相比,我们的方法可以实现超过25dB的提升。
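
For orientation, the sketch below implements one standard DDPM reverse (denoising) step with a stand-in noise predictor; the paper applies such denoising at the receiver to mitigate hardware, channel, and quantization distortions, with conditioning details beyond this sketch.

```python
import torch

def ddpm_reverse_step(x_t, t, eps_model, betas):
    """One standard DDPM reverse ("denoising") step: sample x_{t-1} from x_t
    using the predicted noise eps_model(x_t, t)."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    eps = eps_model(x_t, t)
    mean = (x_t - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise

betas = torch.linspace(1e-4, 0.02, 1000)
eps_model = lambda x, t: torch.zeros_like(x)   # stand-in noise predictor
x = torch.randn(4, 64)                         # e.g. a distorted received block
for t in reversed(range(1000)):
    x = ddpm_reverse_step(x, t, eps_model, betas)
print(x.shape)
```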

ALT: Towards Fine-grained Alignment between Language and CTR Models for Click-Through Rate Prediction

  • paper_url: http://arxiv.org/abs/2310.19453
  • repo_url: None
  • paper_authors: Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu
  • for: 预测用户点击率 (CTR) 作为个人化在线服务的核心功能模块,这paper提出了一种新的alignment方法来提高CTR预测的准确率。
  • methods: 这paper使用了一种新的join重构预训 task来实现语言和CTR模型之间的细致特征对应,并提出了三种不同的训练策略来满足不同的应用场景需求。
  • results: 实验结果表明,这paper提出的alignment方法可以在三个实际 datasets上达到 state-of-the-art 性能,并且可以与不同的语言和CTR模型结合使用,以满足不同的应用场景需求。
    Abstract Click-through rate (CTR) prediction plays as a core function module in various personalized online services. According to the data modality and input format, the models for CTR prediction can be mainly classified into two categories. The first one is the traditional CTR models that take as inputs the one-hot encoded ID features of tabular modality, which aims to capture the collaborative signals via feature interaction modeling. The second category takes as inputs the sentences of textual modality obtained by hard prompt templates, where pretrained language models (PLMs) are adopted to extract the semantic knowledge. These two lines of research generally focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. Therefore, in this paper, we propose to conduct fine-grained feature-level Alignment between Language and CTR models (ALT) for CTR prediction. Apart from the common CLIP-like instance-level contrastive learning, we further design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose three different finetuning strategies with the option to train the aligned language and CTR models separately or jointly for downstream CTR prediction tasks, thus accommodating the varying efficacy and efficiency requirements for industrial applications. Extensive experiments on three real-world datasets demonstrate that ALT outperforms SOTA baselines, and is highly compatible for various language and CTR models.
    摘要 点击率(CTR)预测作为个人化在线服务的核心功能模块,可以根据数据模式和输入格式分为两类模型。第一类是传统的 CTR 模型,通过一个热门的 ID 特征一个个进行编码,目的是捕捉协作信号via特性互动模型。第二类则是基于硬模板的文本模式,采用预训练语言模型(PLMs)来提取 semantics。这两种研究通常关注不同的输入数据特性(即文本和表格模式),形成一种明确的补做关系。因此,在这篇论文中,我们提议进行细化的特征级别对齐(ALT)来进行 CTR 预测。除了常见的 CLIP 类似的实例级别对比学习外,我们还设计了一种新的联合重建预训任务,以便对masked language和表格模型进行共同预训。具体来说,一个模式(例如, токен或特征)的masked数据需要通过另一个模式来恢复,这种方式在两个模式之间建立特征级别的交互和对齐,通过sufficient mutual information extraction between dual modalities。此外,我们还提出了三种不同的 finetuning 策略,可以根据下游应用的效率和可行性要求来训练对齐的语言和 CTR 模型,从而满足不同的应用场景。广泛的实验表明,ALT 超过了 SOTA 基elines,并具有高度的兼容性,可以与多种语言和 CTR 模型结合使用。
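
As a rough illustration of the joint masked-reconstruction idea described above, the toy module below masks one tabular field embedding and reconstructs it from the remaining fields plus a text-side representation; the architecture, dimensions and names are assumptions made for this sketch, not the authors' model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class CrossModalMaskedRecon(nn.Module):
    """Toy joint-reconstruction objective: recover a masked tabular field
    embedding using the remaining fields plus the text-side representation."""
    def __init__(self, dim: int = 64, num_fields: int = 8):
        super().__init__()
        self.field_emb = nn.Parameter(0.02 * torch.randn(num_fields, dim))
        self.text_encoder = nn.GRU(dim, dim, batch_first=True)
        self.recon_head = nn.Linear(2 * dim, dim)

    def forward(self, field_values, text_tokens, masked_field: int):
        fields = field_values + self.field_emb           # (B, F, dim)
        target = fields[:, masked_field].detach()        # embedding to recover
        keep = torch.ones(fields.size(1))
        keep[masked_field] = 0.0
        masked = fields * keep.view(1, -1, 1)            # zero out the masked field
        _, text_state = self.text_encoder(text_tokens)   # (1, B, dim)
        ctx = torch.cat([masked.mean(dim=1), text_state.squeeze(0)], dim=-1)
        return nn.functional.mse_loss(self.recon_head(ctx), target)

model = CrossModalMaskedRecon()
loss = model(torch.randn(4, 8, 64), torch.randn(4, 16, 64), masked_field=3)
loss.backward()
print(float(loss))
```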

Large-Scale Application of Fault Injection into PyTorch Models – an Extension to PyTorchFI for Validation Efficiency

  • paper_url: http://arxiv.org/abs/2310.19449
  • repo_url: None
  • paper_authors: Ralf Graafe, Qutub Syed Sha, Florian Geissler, Michael Paulitsch
  • for: 本研究旨在分析硬件故障(HW)对软件(SW)和人工神经网络(NN)模型的影响,以及如何使用应用程序水平的故障插入(ALFI)来实现机器学习开发过程中的安全性案例。
  • methods: 本研究使用了PyTorchFI框架,并提出了一种新的应用程序水平故障插入框架called PyTorchALFI,以便定义随机生成的和可重复使用的故障,并对PyTorch模型进行评估。
  • results: 本研究通过对PyTorch模型进行多个场景测试,并对测试结果进行分析,以便了解硬件故障对模型的影响,并且提供了一些使用PyTorchALFI框架进行模型修改和比较的示例。
    Abstract Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e. silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hardware (HW) faults on software (SW) in general and NN models in particular, several fault injection (FI) methods have been established in recent years. Current FI methods focus on the methodology of injecting faults but often fall short of accounting for large-scale FI tests, where many fault locations based on a particular fault model need to be analyzed in a short time. Results need to be concise, repeatable, and comparable. To address these requirements and enable fault injection as the default component in a machine learning development cycle, we introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, defines complex test scenarios, enhances data sets, and generates test KPIs while tightly coupling fault-free, faulty, and modified NN. In this paper, we provide details about the definition of test scenarios, software architecture, and several examples of how to use the new framework to apply iterative changes in fault location and number, compare different model modifications, and analyze test results.
    摘要 非暂时或永久的硬件故障可以使神经网络(NN)的输出错误无法诊断到用户特定的错误迹象,即静默数据错误(SDE)。然而,现代NN也拥有内置的重复性,可以承受特定的故障。为建立安全性 caso,需要分化和量化两种损害。为了研究硬件(HW)故障对软件(SW)的影响,以及NN模型的影响,多年来已经有多种硬件故障插入(FI)方法的建立。现有的FI方法通常将注意力集中在插入故障的方法上,而忽视了大规模FI测试,需要在短时间内分析多个故障位置基于特定故障模型。结果需要是简洁、重复、比较。为解决这些需求,我们介绍了一个新的硬件故障插入框架 called PyTorchALFI(PyTorch应用程序级故障插入),基于PyTorchFI。PyTorchALFI提供了一种效果的方式来定义随机生成的和可重用的故障集,定义复杂的测试enario,增强数据集,并生成测试KPI,同时紧密地集成 fault-free、FAULTY 和修改后的NN。在这篇文章中,我们提供了测试scenario的定义、软件架构和多种使用新框架的示例,以应用iterative变化的故障位置和数量,比较不同的模型修改,分析测试结果。
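
PyTorchALFI itself is not reproduced here; as a minimal sketch of application-level fault injection in PyTorch, the snippet below uses a forward hook to corrupt a random subset of one layer's activations and compares faulty against fault-free outputs (the toy model, the sign-flip fault model and the fault rate are assumptions, real tools typically use bit flips in weights or activations).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

def make_fault_hook(fault_rate: float):
    """Return a forward hook that corrupts a random subset of activations.
    This toy fault model negates values; real tools use bit flips."""
    def hook(module, inputs, output):
        mask = torch.rand_like(output) < fault_rate
        return torch.where(mask, -output, output)
    return hook

x = torch.randn(8, 16)
clean = model(x)

handle = model[0].register_forward_hook(make_fault_hook(fault_rate=0.05))
faulty = model(x)
handle.remove()                          # restore fault-free behaviour

# A simple KPI: how much do corrupted activations perturb the output?
print("mean abs deviation:", (clean - faulty).abs().mean().item())
```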

Explaining the Decisions of Deep Policy Networks for Robotic Manipulations

  • paper_url: http://arxiv.org/abs/2310.19432
  • repo_url: None
  • paper_authors: Seongun Kim, Jaesik Choi
  • for: 这个论文旨在解释深度政策模型中的透明性,以便在机器人 manipulate 任务中提高可靠性和稳定性。
  • methods: 作者使用输入贡献分析方法来解释机器人策略模型中每个输入特征对决策的影响。两种方法分别是:(1)测量每个 JOINT 力的重要性因子,以反映电动机力的影响于终端器移动;(2)修改 relevance propagation 方法,以正确处理深度策略网络中的负输入和输出。
  • results: 研究人员通过对深度策略模型进行输入贡献分析,发现了多modal 感知器输入的动态变化,并且可以在机器人 manipulate 任务中提高透明性和可靠性。
    Abstract Deep policy networks enable robots to learn behaviors to solve various real-world complex tasks in an end-to-end fashion. However, they lack transparency to provide the reasons of actions. Thus, such a black-box model often results in low reliability and disruptive actions during the deployment of the robot in practice. To enhance its transparency, it is important to explain robot behaviors by considering the extent to which each input feature contributes to determining a given action. In this paper, we present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models. To this end, we present two methods for applying input attribution methods to robot policy networks: (1) we measure the importance factor of each joint torque to reflect the influence of the motor torque on the end-effector movement, and (2) we modify a relevance propagation method to handle negative inputs and outputs in deep policy networks properly. To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.
    摘要 深度政策网络可以让机器人学习做各种复杂的实际任务,但它缺乏透明度,无法提供动作的原因。因此,这种黑obox模型在实际应用中可能会导致低可靠性和干扰行为。为了增强其透明度,需要解释机器人行为,考虑每个输入特征对决策的影响程度。在这篇论文中,我们通过输入贡献方法来解释深度政策模型中每个输入特征对机器人决策的影响。为此,我们提出了两种将输入贡献方法应用于机器人政策网络:首先,我们测量每个 JOINT 扭矩的重要性因素,以反映电动机扭矩对终端器运动的影响;其次,我们修改了 relevance propagation 方法,以正确处理深度政策网络中的负输入和输出。根据我们所知,这是首次在深度政策网络上线实时识别多模式感知输入的动态变化。
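
The paper's two attribution schemes (joint-torque importance factors and a modified relevance propagation) are not reproduced here; the sketch below shows a generic gradient-times-input attribution on a toy policy network, only to make the notion of per-feature contribution to an action concrete. The network, sizes and feature naming are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy policy: observation = 7 joint torques + 3 end-effector coordinates.
policy = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 7))

obs = torch.randn(1, 10, requires_grad=True)
action = policy(obs)

# Gradient x input attribution for one action dimension:
# how much does each observation feature push this joint command?
action[0, 2].backward()
attribution = (obs.grad * obs).detach().squeeze(0)

for i, score in enumerate(attribution.tolist()):
    kind = "torque" if i < 7 else "end-effector"
    print(f"feature {i:2d} ({kind}): {score:+.4f}")
```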

Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans

  • paper_url: http://arxiv.org/abs/2310.19427
  • repo_url: https://github.com/leekwoon/rgg
  • paper_authors: Kyowoon Lee, Seongun Kim, Jaesik Choi
  • for: 提高Diffusion模型生成的不可靠计划的可靠性,使其在安全关键应用中可以使用。
  • methods: 提出一种新的方法,通过提供修正指导来纠正Diffusion模型生成的错误计划。该方法包括一种新的评价指标——恢复差值评价指导,以及一种增强器来防止恶作剂修正指导。
  • results: 在三个benchmark上进行了offline控制设置的长期规划测试,并证明了我们的方法的效果。同时,我们还展示了我们的方法的解释力,通过显示差值预测器的归属地图和错误途径,以便更深入了解生成的计划。
    Abstract Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks by training trajectory diffusion models and conditioning the sampled trajectories using auxiliary guidance functions. However, due to their nature as generative models, diffusion models are not guaranteed to generate feasible plans, resulting in failed execution and precluding planners from being useful in safety-critical applications. In this work, we propose a novel approach to refine unreliable plans generated by diffusion models by providing refining guidance to error-prone plans. To this end, we suggest a new metric named restoration gap for evaluating the quality of individual plans generated by the diffusion model. A restoration gap is estimated by a gap predictor which produces restoration gap guidance to refine a diffusion planner. We additionally present an attribution map regularizer to prevent adversarial refining guidance that could be generated from the sub-optimal gap predictor, which enables further refinement of infeasible plans. We demonstrate the effectiveness of our approach on three different benchmarks in offline control settings that require long-horizon planning. We also illustrate that our approach presents explainability by presenting the attribution maps of the gap predictor and highlighting error-prone transitions, allowing for a deeper understanding of the generated plans.
    摘要 基于扩散的规划方法通过训练轨迹扩散模型并利用辅助指导函数对采样轨迹进行条件化,已在长时域、稀疏奖励任务中展现出可观的效果。然而,扩散模型本质上是生成模型,无法保证生成可行的计划,从而导致执行失败,并限制了其在安全关键应用中的使用。在本工作中,我们提出了一种新方法,通过为易出错的计划提供修正指导来改进扩散模型生成的不可靠计划。为此,我们提出了一个新的度量,称为恢复差距(restoration gap),用于评估扩散模型生成的单个计划的质量;该差距由差距预测器估计,并据此产生修正指导来改进扩散规划器。我们还提出了一种归因图正则化项,以防止次优差距预测器产生对抗性的修正指导,从而能够进一步改进不可行的计划。我们在三个需要长时域规划的离线控制基准上验证了方法的有效性,并通过展示差距预测器的归因图、标出易出错的状态转移来体现方法的可解释性,帮助更深入地理解所生成的计划。

Artificial intelligence and the limits of the humanities

  • paper_url: http://arxiv.org/abs/2310.19425
  • repo_url: None
  • paper_authors: Włodzisław Duch
  • for: 这篇论文是为了探讨现代世界中文化复杂性的问题,以及人类认知的限制和人工智能的发展对人文学科的影响。
  • methods: 这篇论文使用了认知科学的方法和数据分析技术,以探讨人类认知的限制和人工智能的发展对人文学科的影响。
  • results: 这篇论文的结果表明,人工智能将在人文学科中推广应用,从艺术到政治科学和哲学,使这些领域变得更加吸引人,让学生们能够超越当前的限制。
    Abstract The complexity of cultures in the modern world is now beyond human comprehension. Cognitive sciences cast doubts on the traditional explanations based on mental models. The core subjects in humanities may lose their importance. Humanities have to adapt to the digital age. New, interdisciplinary branches of humanities emerge. Instant access to information will be replaced by instant access to knowledge. Understanding the cognitive limitations of humans and the opportunities opened by the development of artificial intelligence and interdisciplinary research necessary to address global challenges is the key to the revitalization of humanities. Artificial intelligence will radically change humanities, from art to political sciences and philosophy, making these disciplines attractive to students and enabling them to go beyond current limitations.
    摘要 现代世界中文化的复杂性已经超出了人类的理解能力。认知科学对基于心智模型的传统解释提出了质疑,人文学科的核心科目可能会失去其重要性。人文学科必须适应数字时代,新的交叉学科人文分支正在出现,即时获取信息将被即时获取知识所取代。理解人类认知的局限性,以及人工智能的发展和应对全球挑战所需的交叉学科研究所带来的机遇,是人文学科复兴的关键。人工智能将彻底改变人文学科,从艺术到政治学和哲学,使这些学科对学生更具吸引力,并让他们能够超越当前的局限。

Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills

  • paper_url: http://arxiv.org/abs/2310.19424
  • repo_url: https://github.com/seongun-kim/vcrl
  • paper_authors: Seongun Kim, Kyowoon Lee, Jaesik Choi
  • for: 提高复杂技能自主学习的效率和状态覆盖速度
  • methods: 基于信息理论的无监督学习方法,包括变量资源学习和目标条件RL
  • results: 在复杂导航和机器人操作任务上,提高样本效率和状态覆盖速度,并在实际世界Robot导航任务中完成零模拟设置,并且与全球规划器结合可以进一步提高表现。
    Abstract Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for retrieving complex skills autonomously without a task-oriented reward function through mutual information (MI) maximization or variational empowerment. However, learning complex skills is still challenging, due to the fact that the order of training skills can largely affect sample efficiency. Inspired by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.
    摘要 互助信息基于渐进学习(RL)已被提议为自动学习复杂技能的有望框架,通过互助信息(MI)最大化或变量赋权来实现。然而,学习复杂技能仍然是挑战,因为训练技能的顺序可以大大影响样本效率。drawing inspiration from this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.
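
As a rough sketch of curriculum goal selection driven by value uncertainty, the core intuition described above, the snippet below samples training goals with probability proportional to the disagreement of an ensemble of value estimates; the random ensemble, goal grid and batch size are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate goals on a toy 2-D grid and an ensemble of value estimates.
goals = np.stack(np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10)),
                 axis=-1).reshape(-1, 2)
ensemble_values = rng.random((5, len(goals)))   # stand-in for 5 learned critics

# Epistemic uncertainty as disagreement across the ensemble.
uncertainty = ensemble_values.std(axis=0)

# Curriculum distribution: prefer goals the agent is most uncertain about.
probs = uncertainty / uncertainty.sum()
batch = rng.choice(len(goals), size=8, replace=False, p=probs)
print("selected training goals:\n", goals[batch])
```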

Text-to-3D with Classifier Score Distillation

  • paper_url: http://arxiv.org/abs/2310.19415
  • repo_url: None
  • paper_authors: Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi
  • for: 这篇论文主要用于探讨Score Distillation Sampling(SDS)方法在文本生成中的表现,特别是在使用类ifier-free guidance时的优化。
  • methods: 这篇论文使用了一种新的Classifier Score Distillation(CSD)方法,其利用了预训练的2D扩散模型,并通过使用一个隐藏的分类器来对生成进行导航。
  • results: 研究发现,CSD方法可以在文本生成中实现更高效的结果,包括形状生成、纹理合成和形状编辑等任务,并且比现有的方法更高效。
    Abstract Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
    摘要 Text-to-3D生成技术在最近几年内已取得了很大的进步,特别是基于Score Distillation Sampling(SDS)的方法。SDS方法利用预训练的2D扩散模型,而使用无类标注导航是已知为成功优化的关键。然而,在这篇论文中,我们重新评估了无类标注导航的角色在Score Distillation中,发现一个奇异的发现:导航alone是足够的 для有效的文本到3D生成任务。我们称之为Classifier Score Distillation(CSD),可以理解为使用隐藏的分类模型进行生成。这新的视角揭示了新的理解现有技术的新思路。我们在多种文本到3D任务上验证了CSD的效果,包括形状生成、Texture Synthesis和形状编辑,并成功超越了当前的状态艺术法。更多信息请访问我们的项目页面:https://xinyu-andy.github.io/Classifier-Score-Distillation。
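
The quantity emphasized above is the classifier-free guidance direction, i.e. the conditional minus the unconditional noise prediction. The sketch below shows, with a stand-in noise predictor, how that direction alone can be pushed back onto a differentiable rendering via a surrogate loss; the toy predictor, prompt and weighting are assumptions, not the authors' implementation.

```python
import torch

torch.manual_seed(0)

def eps_pred(x_noisy, t, cond):
    """Stand-in for a pretrained 2-D diffusion model's noise prediction."""
    bias = 0.1 if cond is not None else 0.0
    return 0.5 * x_noisy + bias

rendering = torch.rand(1, 3, 64, 64, requires_grad=True)  # image rendered from the 3-D scene

alpha_bar = 0.7
noise = torch.randn_like(rendering)
x_t = alpha_bar**0.5 * rendering + (1 - alpha_bar)**0.5 * noise

# Classifier Score Distillation intuition: the update uses only the guidance
# direction (conditional minus unconditional prediction), not (eps - noise).
grad = (eps_pred(x_t, t=0.5, cond="a photo of a chair")
        - eps_pred(x_t, t=0.5, cond=None)).detach()

# Surrogate loss whose gradient w.r.t. x_t equals `grad`, so backprop carries
# the guidance direction onto the rendering parameters (here, raw pixels).
target = (x_t - grad).detach()
loss = 0.5 * torch.nn.functional.mse_loss(x_t, target, reduction="sum")
loss.backward()
print("mean |grad| on the rendering:", rendering.grad.abs().mean().item())
```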

Resource Constrained Semantic Segmentation for Waste Sorting

  • paper_url: http://arxiv.org/abs/2310.19407
  • repo_url: https://github.com/anubis09/Resource_Constrained_Semantic_Segmentation_for_Waste_Sorting
  • paper_authors: Elisa Cascina, Andrea Pellegrino, Lorenzo Tozzi
  • for: 这 paper 是为了提出高效的废物分类策略,以降低垃圾的环境影响。
  • methods: 该 paper 使用了 resource-constrained semantic segmentation 模型,用于在工业设置中分类可回收垃圾。模型需要在10MB内存限制下运行,适用于边缘应用程序with limited processing capacity。作者们使用了量化和剪辑技术来实现这一目标。
  • results: 作者们在三个网络(ICNet、BiSeNet(Xception39 backbone)和ENet)上进行了实验,并取得了正面的结果,同时只有marginally影响了 Mean IoU 度量。此外,作者们还提出了一种combined Focal和Lovász loss函数,用于解决隐式的类别不均衡问题,从而实现更好的性能。
    Abstract This work addresses the need for efficient waste sorting strategies in Materials Recovery Facilities to minimize the environmental impact of rising waste. We propose resource-constrained semantic segmentation models for segmenting recyclable waste in industrial settings. Our goal is to develop models that fit within a 10MB memory constraint, suitable for edge applications with limited processing capacity. We perform the experiments on three networks: ICNet, BiSeNet (Xception39 backbone), and ENet. Given the aforementioned limitation, we implement quantization and pruning techniques on the broader nets, achieving positive results while marginally impacting the Mean IoU metric. Furthermore, we propose a combination of Focal and Lov\'asz loss that addresses the implicit class imbalance resulting in better performance compared with the Cross-entropy loss function.
    摘要 这个研究旨在提出高效垃圾分类策略,以减少垃圾堆生的环境影响。我们提议使用限制资源的语义分割模型,用于工业环境中分类可回收物。我们的目标是开发10MB内存套用的模型,适用于边缘应用程序,具有有限的处理能力。我们在三个网络上进行实验:ICNet、BiSeNet(Xception39底层)和ENet。由于上述限制,我们实施了量化和剪除技术,获得了正面效果,同时微量地影响了 Mean IoU 度量。此外,我们提议使用 FOCAL 和 Lovász 损失函数,解决了隐式的分类不均衡问题,从而实现更好的性能。
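
Quantization and pruning of the kind described above can be prototyped with utilities that ship with PyTorch; the sketch below magnitude-prunes 30% of a toy convolutional head and applies dynamic int8 quantization to a linear layer. The toy model, pruning amount and the use of dynamic rather than calibrated static quantization are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a lightweight segmentation head.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 4, 1),                    # 4 waste classes
)

# Magnitude pruning: zero out the 30% smallest weights of each conv layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")     # make the pruning permanent

# Dynamic quantization of a (toy) linear classifier as one cheap way to shrink
# parameters to int8; static quantization of convs needs calibration, omitted here.
head = nn.Linear(4, 4)
q_head = torch.quantization.quantize_dynamic(head, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 3, 64, 64)
logits = model(x)                          # (1, 4, 64, 64) per-pixel scores
print(logits.shape, type(q_head))
```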

Othello is Solved

  • paper_url: http://arxiv.org/abs/2310.19387
  • repo_url: https://github.com/mwarot1/othello-ai
  • paper_authors: Hiroki Takizawa
  • for: 这篇论文旨在从计算上求解棋盘游戏"奥赛罗"(Othello)。
  • methods: 该论文使用了计算机科学中的搜索算法和推理技术来解决游戏。
  • results: 论文表明,在完美游戏情况下,两个玩家的游戏都会平局。
    Abstract The game of Othello is one of the world's most complex and popular games that has yet to be computationally solved. Othello has roughly ten octodecillion (10 to the 58th power) possible game records and ten octillion (10 to the 28th power) possible game position. The challenge of solving Othello, determining the outcome of a game with no mistake made by either player, has long been a grand challenge in computer science. This paper announces a significant milestone: Othello is now solved, computationally proved that perfect play by both players lead to a draw. Strong Othello software has long been built using heuristically designed search techniques. Solving a game provides the solution which enables software to play the game perfectly.
    摘要 黑白棋(奥赛罗,Othello)是世界上最复杂、最受欢迎的游戏之一,此前尚未被计算求解。奥赛罗约有十的五十八次方(10^58)种可能的对局记录和十的二十八次方(10^28)种可能的局面。求解奥赛罗(即确定双方都不犯错时的对局结果)长期以来一直是计算机科学中的重大挑战。本文宣布了一个重要的里程碑:奥赛罗已被求解,计算上证明了双方完美对弈将以平局告终。强大的奥赛罗软件长期以来采用启发式设计的搜索技术构建;而求解游戏给出的解使软件能够完美地进行对弈。

Protecting Publicly Available Data With Machine Learning Shortcuts

  • paper_url: http://arxiv.org/abs/2310.19381
  • repo_url: None
  • paper_authors: Nicolas M. Müller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin Böttinger
  • for: 防止非法数据抓取:约会平台、服装厂商、二手车商等在线数据提供方需要应对大规模抓取并转售数据的专业爬虫产业。
  • methods: 利用机器学习捷径(shortcuts):在公开数据集中有意植入 ML 捷径,使模型在这类数据上训练和域内测试表现很好,但泛化能力严重受限,从而让被抓取的数据对 ML 用例失去价值;同时这些捷径在人类感知中难以察觉。
  • results: 在三个真实场景的数据上,成功使被非法抓取的数据无法用于 ML,从而形成对爬虫的威慑,且该方法可推广到多种用例。
    Abstract Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.
    摘要 机器学习(ML)捷径或虚假相关是数据集中的一类伪特征:它们能带来很好的训练和域内测试性能,却严重限制模型的泛化能力,并且因为域内测试表现良好而难以被察觉。本文考察了不同捷径的影响,并表明即使是简单的捷径也难以被可解释 AI 方法检测出来。我们进而利用这一事实,设计了一种保护在线数据库免受爬虫侵害的方法:约会平台、服装厂商、二手车商等数据提供方长期面临大规模抓取并转售数据的专业爬虫产业。我们表明,有意地向数据中加入 ML 捷径可以形成威慑:这样增强后的数据集对 ML 用例不再可用,从而阻止爬虫以及对互联网数据的未授权使用。基于三个真实场景的数据,我们证明了该方法能使被抓取的数据失去利用价值,同时捷径在人类感知中难以察觉。因此,所提方法可以作为对非法数据抓取的主动防护。
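
As a rough illustration of what a deliberately planted shortcut can look like, the snippet below stamps a faint class-dependent pixel pattern into the corner of each image so that a model can latch onto the pattern instead of the content; the marker design and toy data are assumptions, and the paper's actual shortcuts are engineered to be hard to notice.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant_shortcut(images, labels, num_classes, strength=0.02):
    """Add a faint class-dependent pixel pattern to the top-left corner.
    Models trained on such data tend to learn the pattern, not the content."""
    out = images.copy()
    patterns = rng.standard_normal((num_classes, 4, 4)) * strength
    for i, y in enumerate(labels):
        out[i, :4, :4] = np.clip(out[i, :4, :4] + patterns[y], 0.0, 1.0)
    return out

images = rng.random((100, 32, 32))          # toy grayscale "dataset" in [0, 1]
labels = rng.integers(0, 5, size=100)
poisoned = plant_shortcut(images, labels, num_classes=5)
print("max perturbation:", np.abs(poisoned - images).max())
```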

Few-shot Hybrid Domain Adaptation of Image Generators

  • paper_url: http://arxiv.org/abs/2310.19378
  • repo_url: https://github.com/echopluto/fhda
  • paper_authors: Hengjia Li, Yang Liu, Linxuan Xia, Yuqi Lin, Tu Zheng, Zheng Yang, Wenxiao Wang, Xiaohui Zhong, Xiaobo Ren, Xiaofei He
  • for: 能否适应多个目标领域的混合领域?
  • methods: 我们提出了一种无监测器框架,通过直接将不同领域的图像编码为分离的子空间来解决这个问题。我们还提出了一个新的方向分量损失,该损失通过减少生成图像与所有目标领域的距离,同时保持源领域的特征。
  • results: 我们的方法可以在单个适应器中获得多个目标领域的各种特征,超越基eline方法在 semantic similarity、图像准确性和交叉领域一致性上。
    Abstract Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the source domain's characteristics. Compared with Domain Adaptation (DA), HDA offers greater flexibility and versatility to adapt generators to more composite and expansive domains. Simultaneously, HDA also presents more challenges than DA as we have access only to images from individual target domains and lack authentic images from the hybrid domain. To address this issue, we introduce a discriminator-free framework that directly encodes different domains' images into well-separable subspaces. To achieve HDA, we propose a novel directional subspace loss comprised of a distance loss and a direction loss. Concretely, the distance loss blends the attributes of all target domains by reducing the distances from generated images to all target subspaces. The direction loss preserves the characteristics from the source domain by guiding the adaptation along the perpendicular to subspaces. Experiments show that our method can obtain numerous domain-specific attributes in a single adapted generator, which surpasses the baseline methods in semantic similarity, image fidelity, and cross-domain consistency.
    摘要 可以把预训练的生成器适应到多个目标Domain的混合体中,生成具有这些目标Domain的 интеGRATED特征?在这项工作中,我们提出了一个新任务——多少shot Hybrid Domain Adaptation(HDA)。给定一个源生成器和多个目标Domain,HDA的目标是获得一个适应了所有目标Domain的特征的生成器,而不覆盖源Domain的特征。与Domain Adaptation(DA)相比,HDA具有更大的灵活性和多样性,可以适应更复杂和广泛的Domain。同时,HDA也带来了更多的挑战,因为我们只有各个目标Domain的图像,缺乏真正的混合Domain的图像。为解决这个问题,我们提出了一个不含探测器的框架,直接将不同Domain的图像编码成分离的子空间中。为实现HDA,我们提出了一个新的方向性子空间损失,包括距离损失和方向损失。具体来说,距离损失将所有目标Domain的特征混合在生成图像中,使得生成图像与所有目标Subspace之间的距离减小。方向损失保持源Domain的特征,使得适应过程在Subspace之间的垂直方向上进行。实验表明,我们的方法可以在单个适应器中获得多个Domain特征,超过基eline方法的semantic similarity、图像准确率和交叉Domain一致性。

RGB-X Object Detection via Scene-Specific Fusion Modules

  • paper_url: http://arxiv.org/abs/2310.19372
  • repo_url: https://github.com/dsriaditya999/rgbxfusion
  • paper_authors: Sri Aditya Deevi, Connor Lee, Lu Gan, Sushruth Nagesh, Gaurav Pandey, Soon-Jo Chung
  • for: 实现自主车辆在各种天气情况下视觉理解环境。
  • methods: 使用高效且各自独立的RGB-X融合网络,可以运用已经预训的单模式模型,并运用Scene-specific融合模组来整合多 modal 数据。
  • results: 与现有方法比较,我们的方法在RGB-热和RGB-关闭数据上表现更好,仅需要小量额外参数。代码可以在https://github.com/dsriaditya999/RGBXFusion 上获取。
    Abstract Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.
    摘要 多模态深感融合具有实现自动驾驶车辆在所有天气条件下视觉理解周围环境的潜力。然而,现有的深感融合方法通常采用复杂的建筑方式,混合多模态特征,需要大量相关多模态数据进行训练。在这项工作中,我们提出了一种高效和模块化的 RGB-X 融合网络,可以通过Scene-specific fusion modules来利用和融合预训练的单模态模型,从而实现输入适应性网络架构,使用小量相关多模态数据进行融合。我们的实验表明我们的方法与现有作品在 RGB-thermal 和 RGB-gated 数据集上表现更优,只需要一小部分额外参数。我们的代码可以在 GitHub 上找到:https://github.com/dsriaditya999/RGBXFusion。
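
The scene-specific fusion modules are not reproduced here; as a generic sketch of fusing feature maps from two pretrained single-modality backbones with a small learned gate, see below. The channel count, gating design and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse RGB and X (e.g. thermal) feature maps with a learned per-pixel gate.
    The two backbones can stay frozen; only the small gate needs training."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb, feat_x):
        g = self.gate(torch.cat([feat_rgb, feat_x], dim=1))
        return g * feat_rgb + (1.0 - g) * feat_x

fusion = GatedFusion(channels=256)
rgb_feat = torch.randn(2, 256, 32, 32)      # from a pretrained RGB backbone
thermal_feat = torch.randn(2, 256, 32, 32)  # from a pretrained thermal backbone
fused = fusion(rgb_feat, thermal_feat)
print(fused.shape)                           # torch.Size([2, 256, 32, 32])
```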

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

  • paper_url: http://arxiv.org/abs/2310.19360
  • repo_url: https://github.com/pku-ml/rebat
  • paper_authors: Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang
  • for: 本研究旨在解释逆攻击训练(AT)在学习率(LR)衰减后存在严重的Robust Overfitting问题,并提出一种解决方案来缓解这种问题。
  • methods: 本研究使用了视为对抗训练为动态最小最大游戏的分析方法,具体来说,研究如何通过强化模型训练者的记忆能力,使得模型具有更好的鲁棒性。
  • results: 实验结果表明,采用ReBalanced Adversarial Training(ReBAT)可以提高模型的鲁棒性,并不会在训练过程中出现Robust Overfitting问题,即使学习率衰减很长。
    Abstract Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers recently notice that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance between the minimax game by empowering the trainer with a stronger memorization ability, and show such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both the two game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.
    摘要 adversarial training (AT) 已成为当前领域中最佳的算法,以EXTRACTING Robust Features。然而,研究人员最近注意到,AT受到了严重的Robust Overfitting问题,特别是在学习率 (LR) decay 后。在这篇论文中,我们解释这种现象,视为 adversarial training 是一个动态的 minimax 游戏 между模型训练者和攻击者。我们分析了如何 LR decay 破坏了这个游戏的平衡,使得训练者获得了更强的记忆能力,并显示这种不平衡引起了 robust overfitting 的结果。我们通过广泛的实验 validate 这一理解,并提供了robust overfitting 的整体视图,从两个游戏 player 的动态来看。这一理解还 inspirits 我们提出了 rebalancing adversarial training (ReBAT),以解决 robust overfitting 问题。实验显示,ReBAT 可以 дости得好的Robustness,并不会在训练过程中 suffer from robust overfitting。代码可以在 https://github.com/PKU-ML/ReBAT 上找到。
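
ReBAT's rebalancing terms are not reproduced here; for context, the sketch below is the standard PGD-based adversarial training loop that such methods build on, the place where one would then regularize the trainer's capacity or strengthen the attack as the paper suggests. The toy model, perturbation radius and step counts are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def pgd_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent attack within an L-inf ball of radius eps.
    (Pixel-range clamping is omitted for brevity.)"""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

# One adversarial training step on a toy batch.
x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
delta = pgd_attack(x, y)
opt.zero_grad()
F.cross_entropy(model(x + delta), y).backward()
opt.step()
```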

Modified Genetic Algorithm for Feature Selection and Hyper Parameter Optimization: Case of XGBoost in Spam Prediction

  • paper_url: http://arxiv.org/abs/2310.19845
  • repo_url: None
  • paper_authors: Nazeeh Ghatasheh, Ismail Altaharwa, Khaled Aldebei
  • for: 社交网络(尤其是推特)上的垃圾信息问题日益突出;本研究提出一种改进的遗传算法,用于在类别不平衡的数据上同时进行特征选择(降维)和超参数优化,以构建垃圾信息预测模型。
  • methods: 该算法以 eXtreme Gradient Boosting(XGBoost)分类器为基础,约简推文数据集的特征空间并同时调优分类器参数,从而生成垃圾信息预测模型;模型通过 50 次重复的 10 折分层交叉验证进行验证,并用非参数统计检验进行分析。
  • results: 实验结果表明,所提方法平均取得 82.32% 的几何平均值(geometric mean)和 92.67% 的准确率,且仅使用了不到 10% 的总特征空间;改进的遗传算法优于 $Chi^2$ 和 $PCA$ 特征选择方法,XGBoost 在垃圾预测上也优于包括基于 BERT 的深度学习模型在内的多种机器学习算法。
    Abstract Recently, spam on online social networks has attracted attention in the research and business world. Twitter has become the preferred medium to spread spam content. Many research efforts attempted to encounter social networks spam. Twitter brought extra challenges represented by the feature space size, and imbalanced data distributions. Usually, the related research works focus on part of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyper parameter optimization over imbalanced datasets. The algorithm initialized an eXtreme Gradient Boosting classifier and reduced the features space of tweets dataset; to generate a spam prediction model. The model is validated using a 50 times repeated 10-fold stratified cross-validation, and analyzed using nonparametric statistical tests. The resulted prediction model attains on average 82.32\% and 92.67\% in terms of geometric mean and accuracy respectively, utilizing less than 10\% of the total feature space. The empirical results show that the modified genetic algorithm outperforms $Chi^2$ and $PCA$ feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.
    摘要 最近,社交媒体上的垃圾信息引起了研究和业务界的关注。推特成为了垃圾信息的主要媒体。许多研究努力解决社交媒体上的垃圾信息问题。推特的特点是巨大的特征空间和不均匀的数据分布。通常,相关的研究工作只是解决一些主要挑战或生成黑盒模型。在这篇论文中,我们提出了一种修改后的遗传算法,同时实现维度减少和超参优化。该算法首先初始化了极限梯度提升分类器,然后将推特数据集中的特征空间减少,以生成垃圾预测模型。模型通过10次10fold stratified cross-validation进行验证,并使用非参数统计测试分析。结果显示,修改后的遗传算法在平均上达到82.32%和92.67%的垃圾预测精度,使用的特征空间少于10%。实验结果表明,修改后的遗传算法超过了$Chi^2$和$PCA$特征选择方法。此外,极限梯度提升超过了许多机器学习算法,包括BERT基于深度学习模型,在垃圾预测方面。此外,我们还应用了提议方法到短信垃圾模型中,并与相关工作进行比较。
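
The modified genetic algorithm is not public in this digest; the sketch below is a bare-bones GA over binary feature masks with cross-validated classifier accuracy as fitness, only to make the encoding concrete. Population size, operators, and the use of scikit-learn's gradient boosting in place of XGBoost are assumptions; the paper additionally tunes hyperparameters in the same chromosome.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=0)

def fitness(mask):
    """Cross-validated accuracy using only the features selected by the mask."""
    if mask.sum() == 0:
        return 0.0
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, X.shape[1]))           # binary feature masks
for gen in range(4):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-5:]]                 # keep the best half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05               # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```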

Introducing instance label correlation in multiple instance learning. Application to cancer detection on histopathological images

  • paper_url: http://arxiv.org/abs/2310.19359
  • repo_url: None
  • paper_authors: Pablo Morales-Álvarez, Arne Schmidt, José Miguel Hernández-Lobato, Rafael Molina
  • for: 这个论文是为了提出一种基于 Gaussian Processes(GP)的多例学习(MIL)方法,以便在计算生物学中处理不具有补丁级别标签的整个扫描图像。
  • methods: 该方法基于现有的 state-of-the-art VGPMIL-PR 方法,并增加了一种以 Ising 模型为 inspirations 的新的 Coupling 项。使用变量推断来估计所有模型参数。
  • results: 在两个实际问题中(抑制肾癌检测),我们的模型表现更好于其他现有的可能性 MIL 方法,并提供了不同的视觉化和分析来了解 Coupling 项的影响。这些结果预期能够应用于其他研究领域。
    Abstract In the last years, the weakly supervised paradigm of multiple instance learning (MIL) has become very popular in many different areas. A paradigmatic example is computational pathology, where the lack of patch-level labels for whole-slide images prevents the application of supervised models. Probabilistic MIL methods based on Gaussian Processes (GPs) have obtained promising results due to their excellent uncertainty estimation capabilities. However, these are general-purpose MIL methods that do not take into account one important fact: in (histopathological) images, the labels of neighboring patches are expected to be correlated. In this work, we extend a state-of-the-art GP-based MIL method, which is called VGPMIL-PR, to exploit such correlation. To do so, we develop a novel coupling term inspired by the statistical physics Ising model. We use variational inference to estimate all the model parameters. Interestingly, the VGPMIL-PR formulation is recovered when the weight that regulates the strength of the Ising term vanishes. The performance of the proposed method is assessed in two real-world problems of prostate cancer detection. We show that our model achieves better results than other state-of-the-art probabilistic MIL methods. We also provide different visualizations and analysis to gain insights into the influence of the novel Ising term. These insights are expected to facilitate the application of the proposed model to other research areas.
    摘要 最近几年,弱度监督多例学习(MIL)的思想在多个领域得到了广泛的应用。一个典型的应用例子是计算生物学,因为整个扫描图像缺乏小块级别标签,使得超参数学习模型无法应用。基于 Gaussian Processes(GP)的概率MIL方法在这些领域取得了显著的成果,主要是因为它们的不确定性估计能力强。然而,这些是通用MIL方法,没有考虑一个重要的事实:在生物学图像中,邻居块的标签具有相互关联性。在这个工作中,我们扩展了状态空间的GP-based MIL方法,称为VGPMIL-PR,以利用这种相互关联性。为此,我们开发了一个灵感来自统计物理爱因斯坦模型的新封装项。我们使用变量推断来估计所有模型参数。有趣的是,VGPMIL-PR的形式被恰当的Weightvanishes时 recovered。我们在两个实际问题中评估了提案的方法性能:肠癌检测。我们发现,我们的模型在其他状态空间的概率MIL方法中表现出色,并且提供了不同的视觉化和分析,以便更深入地理解VGPMIL-PR的影响。这些理解可能会促进该模型在其他研究领域的应用。

Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach

  • paper_url: http://arxiv.org/abs/2310.19843
  • repo_url: None
  • paper_authors: Nazeeh Ghatasheh, Ismail Altaharwa, Khaled Aldebei
  • For: This paper aims to leverage telemarketing data to model the willingness of clients to make a term deposit and to find the most significant characteristics of clients.
  • Methods: The paper proposes a novel genetic algorithm-based classifier that selects the best discriminating features and tunes classifier parameters simultaneously, and builds an explainable prediction model using real-world data from a Portuguese bank and national socio-economic metrics.
  • Results: The models significantly outperform related works, attaining on average a geometric mean of 89.07% and a type I error of 0.059; the model is expected to maximize the potential profit margin at the least possible cost and to provide more insights to support marketing decision-making.
    Abstract Currently, almost all direct marketing activities take place virtually rather than in person, weakening interpersonal skills at an alarming pace. Furthermore, businesses have been striving to sense and foster the tendency of their clients to accept a marketing offer. The digital transformation and the increased virtual presence forced firms to seek novel marketing research approaches. This research aims at leveraging the power of telemarketing data in modeling the willingness of clients to make a term deposit and finding the most significant characteristics of the clients. Real-world data from a Portuguese bank and national socio-economic metrics are used to model the telemarketing decision-making process. This research makes two key contributions. First, propose a novel genetic algorithm-based classifier to select the best discriminating features and tune classifier parameters simultaneously. Second, build an explainable prediction model. The best-generated classification models were intensively validated using 50 times repeated 10-fold stratified cross-validation and the selected features have been analyzed. The models significantly outperform the related works in terms of class of interest accuracy, they attained an average of 89.07\% and 0.059 in terms of geometric mean and type I error respectively. The model is expected to maximize the potential profit margin at the least possible cost and provide more insights to support marketing decision-making.
    摘要 现在,大多数直接市场活动都发生在虚拟空间上,而不是面对面,这导致人际交往技巧受到了威胁。此外,企业也在努力感受和培养客户接受市场提供的倾向。由于数字转型和虚拟存在的增加,公司们需要找到新的市场研究方法。这项研究希望通过电话营销数据来模型客户是否签订贷款的愿望,并找出客户最重要的特征。使用葡萄牙银行的实际数据和国家经济指标,我们模型了电话营销决策过程。这项研究有两个关键贡献:首先,提出了一种基于遗传算法的分类器,可同时选择最佳分类特征和调整分类器参数。第二,建立了可解释预测模型。最佳生成的分类模型经过50次10fold分割验证,选择的特征也进行了分析。这些模型在相关作品的类别准确率和类型一错率方面均表现出色,其中类别准确率为89.07%,类型一错率为0.059。这些模型预计可以最大化可能的利润差额,并提供更多的市场决策支持。

Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs

  • paper_url: http://arxiv.org/abs/2310.19347
  • repo_url: None
  • paper_authors: Huawen Feng, Yan Fan, Xiong Liu, Ting-En Lin, Zekun Yao, Yuchuan Wu, Fei Huang, Yongbin Li, Qianli Ma
  • for: 提高基于大语言模型(LLM)的文本摘要的可靠性,解决LLM生成的摘要具有“幻想”等问题。
  • methods: 提出了一种对 LLM 进行逆解耦法(DECENT),以便分离把握和补充两种能力,并采用了一种基于探针的参数有效的技术来补充训练过程中的缺失。
  • results: DECENT 可以够准确地提高基于 LLM 的文本摘要的可靠性,并且可以减少 LLM 生成的幻想现象。
    Abstract Despite the recent progress in text summarization made by large language models (LLMs), they often generate summaries that are factually inconsistent with original articles, known as "hallucinations" in text generation. Unlike previous small models (e.g., BART, T5), current LLMs make fewer silly mistakes but more sophisticated ones, such as imposing cause and effect, adding false details, and overgeneralizing, etc. These hallucinations are challenging to detect through traditional methods, which poses great challenges for improving the factual consistency of text summarization. In this paper, we propose an adversarially DEcoupling method to disentangle the Comprehension and EmbellishmeNT abilities of LLMs (DECENT). Furthermore, we adopt a probing-based parameter-efficient technique to cover the shortage of sensitivity for true and false in the training process of LLMs. In this way, LLMs are less confused about embellishing and understanding, thus can execute the instructions more accurately and have enhanced abilities to distinguish hallucinations. Experimental results show that DECENT significantly improves the reliability of text summarization based on LLMs.
    摘要 尽管最近的大语言模型(LLM)在文本摘要方面做出了重要的进步,但它们经常生成的摘要却与原文不匹配,这被称为“幻觉”(hallucinations)在文本生成中。与过去的小型模型(例如BART、T5)相比,当前的LLM更少地出现了笨蛋的问题,但更多地出现了更加复杂的问题,如强制 causa et effectus、添加假信息、过度总结等等。这些幻觉通常难以通过传统方法检测,这对提高文本摘要的准确性带来了很大的挑战。在这篇论文中,我们提出了一种对抗分解方法,以分离LLM的理解和丰富能力(DECENT)。此外,我们采用了一种 parameter-efficient 的探测技术,以弥补 LLM 在训练过程中对真实和假的敏感性的缺失。这样,LLM 就不再混乱地塑造和理解,因此可以更准确地执行指令,并具有更强的幻觉检测能力。实验结果表明,DECENT 可以显著提高基于 LLM 的文本摘要的可靠性。

Skywork: A More Open Bilingual Foundation Model

  • paper_url: http://arxiv.org/abs/2310.19341
  • repo_url: https://github.com/skyworkai/skywork
  • paper_authors: Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, Haihua Yang, Biye Li, Cheng Cheng, Weiwei Lü, Rui Hu, Chenxia Li, Liu Yang, Xilin Luo, Xuejie Wu, Lunan Liu, Wenjun Cheng, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Lei Lin, Xiaokun Wang, Yutuan Ma, Chuanhai Dong, Yanqi Sun, Yifu Chen, Yongyi Peng, Xiaojuan Liang, Shuicheng Yan, Han Fang, Yahui Zhou
  • for: 这个技术报告提出了 Skywork-13B,一个基于英语和中文文本资料的大语言模型(LLMs),这是目前最大的开源发布的LLMs。
  • methods: 该模型使用了两stage训练方法,首先是通用训练,然后是领域特定增强训练。
  • results: 该模型在各种标准测试 benchmarks 上表现出色,并在多个领域的中文语言模型中达到了状态的艺术性能。此外,该报告还提出了一种泄露检测方法,表明测试数据污染是一个需要进一步调查的问题。
    Abstract In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLMs of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves \emph{state of the art} performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.
    摘要 在这份技术报告中,我们介绍了 Skywork-13B,这是一个在超过 3.2 万亿(3.2 trillion)英文与中文 token 上训练的大语言模型(LLM)家族,也是迄今为止同等规模中训练最充分、公开程度最高的 LLM。我们提出了一种基于分段语料的两阶段训练方法:先进行通用训练,再进行领域特定的增强训练。结果表明,我们的模型不仅在常用基准上表现出色,而且在多个领域的中文语言建模上达到了最先进(state of the art)水平。此外,我们提出了一种新的数据泄漏检测方法,表明测试数据污染是一个亟需 LLM 社区进一步研究的问题。为促进后续研究,我们发布了 Skywork-13B 以及训练中间阶段的检查点,并公开了 SkyPile 语料的一部分,即超过 1500 亿(150 billion)token 的网页文本,这是迄今为止规模最大的高质量开源中文预训练语料。我们希望 Skywork-13B 和我们的开放语料能成为有价值的开源资源,让高质量 LLM 的使用更加普及。
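
The paper's leakage detection method is not reproduced here; as a rough stand-in for the underlying idea of checking whether benchmark text leaked into pretraining data, the snippet below flags test examples whose n-grams overlap heavily with a training corpus. The n-gram length and the interpretation of the overlap score are arbitrary assumptions.

```python
def ngrams(text: str, n: int = 8):
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(test_example: str, train_corpus: list, n: int = 8) -> float:
    """Fraction of the example's n-grams that appear verbatim in training text."""
    train_ngrams = set().union(*(ngrams(doc, n) for doc in train_corpus))
    test_ngrams = ngrams(test_example, n)
    if not test_ngrams:
        return 0.0
    return len(test_ngrams & train_ngrams) / len(test_ngrams)

corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
probe = "the quick brown fox jumps over the lazy dog near the river bank"
print(contamination_rate(probe, corpus, n=8))   # close to 1.0 -> likely leaked
```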

TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery

  • paper_url: http://arxiv.org/abs/2310.19324
  • repo_url: https://github.com/graph-and-geometric-learning/tempme
  • paper_authors: Jialin Chen, Rex Ying
  • for: 本研究的目的是提高现有的时间图 neural network (TGNN) 的可解释性和信任worthiness,通过找出引导预测的时间抽象(temporal motifs)。
  • methods: 本研究提出了一种新的方法 called Temporal Motifs Explainer (TempME), 它基于信息瓶颈理论提取最重要的时间抽象,以最小化包含的信息量,保持解释的简洁和稀烈。
  • results: 实验表明,TempME 可以更好地找出引导预测的时间抽象,并提高现有 TGNN 的预测精度,最高提高22.96%。
    Abstract Temporal graphs are widely used to model dynamic systems with time-varying interactions. In real-world scenarios, the underlying mechanisms of generating future interactions in dynamic systems are typically governed by a set of recurring substructures within the graph, known as temporal motifs. Despite the success and prevalence of current temporal graph neural networks (TGNN), it remains uncertain which temporal motifs are recognized as the significant indications that trigger a certain prediction from the model, which is a critical challenge for advancing the explainability and trustworthiness of current TGNNs. To address this challenge, we propose a novel approach, called Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal motifs guiding the prediction of TGNNs. Derived from the information bottleneck principle, TempME extracts the most interaction-related motifs while minimizing the amount of contained information to preserve the sparsity and succinctness of the explanation. Events in the explanations generated by TempME are verified to be more spatiotemporally correlated than those of existing approaches, providing more understandable insights. Extensive experiments validate the superiority of TempME, with up to 8.21% increase in terms of explanation accuracy across six real-world datasets and up to 22.96% increase in boosting the prediction Average Precision of current TGNNs.
    摘要 现在的渐变系统模型中,时间相关的交互都是通过图structures来描述的。在现实世界中,这些图structures中的时间模式(temporal motifs)是生成未来交互的基本机制。虽然现有的时间图神经网络(TGNN)已经取得了成功,但是还不确定哪些时间模式是TGNN的预测中的重要指示器,这是现有TGNN的解释性和可信度的主要挑战。为解决这个问题,我们提出了一种新的方法,即时间模式解释器(TempME),它抽出TGNN预测中的最重要时间模式,同时保持解释的简洁和精炼。 TempME基于信息瓶颈理论,从交互相关的模式中提取最重要的信息,以保持解释的简洁和精炼。对于 TempME生成的解释,事件之间的空间时间相关性比现有方法高,提供更直观的理解。广泛的实验证明了 TempME 的超越性,在六个实际 dataset 上提高解释准确率达到8.21%,并在当前 TGNN 的预测中提高了平均精度22.96%。

D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion

  • paper_url: http://arxiv.org/abs/2310.19321
  • repo_url: https://github.com/graph-and-geometric-learning/d4explainer
  • paper_authors: Jialin Chen, Shirley Wu, Abhijit Gupta, Rex Ying
  • for: 提高 Graph Neural Networks (GNNs) 的解释性,以便更好地理解 GNNs 对预测结果的影响。
  • methods: 提出了一种新的 D4Explainer 方法,该方法通过 incorporating 生成图分布学习到优化目标中,以生成符合分布性的对于每个实例的几种可能性图,并且通过对这些可能性图进行分析,以提供模型级别的解释。
  • results: D4Explainer 在 synthetic 和实际世界数据集上实现了状态之决定的表现,包括解释准确率、实际性、多样性和Robustness。
    Abstract The widespread deployment of Graph Neural Networks (GNNs) sparks significant interest in their explainability, which plays a vital role in model auditing and ensuring trustworthy graph learning. The objective of GNN explainability is to discern the underlying graph structures that have the most significant impact on model predictions. Ensuring that explanations generated are reliable necessitates consideration of the in-distribution property, particularly due to the vulnerability of GNNs to out-of-distribution data. Unfortunately, prevailing explainability methods tend to constrain the generated explanations to the structure of the original graph, thereby downplaying the significance of the in-distribution property and resulting in explanations that lack reliability. To address these challenges, we propose D4Explainer, a novel approach that provides in-distribution GNN explanations for both counterfactual and model-level explanation scenarios. The proposed D4Explainer incorporates generative graph distribution learning into the optimization objective, which accomplishes two goals: 1) generate a collection of diverse counterfactual graphs that conform to the in-distribution property for a given instance, and 2) identify the most discriminative graph patterns that contribute to a specific class prediction, thus serving as model-level explanations. It is worth mentioning that D4Explainer is the first unified framework that combines both counterfactual and model-level explanations. Empirical evaluations conducted on synthetic and real-world datasets provide compelling evidence of the state-of-the-art performance achieved by D4Explainer in terms of explanation accuracy, faithfulness, diversity, and robustness.
    摘要 广泛部署图 neural network (GNN) 引发了大量关注,其中一个关键问题是解释abilit y,它在模型审核和建立信任worthy图学习中扮演着关键角色。GNN解释的目标是理解图结构对模型预测的影响。为保证生成的解释可靠,必须考虑图中的分布性质,特别是由于GNN的敏感性,以避免由图外数据引起的解释错误。然而,现有的解释方法通常只能在原始图结构下生成解释,从而忽略图中的分布性质,导致解释不可靠。为解决这些挑战,我们提出了D4Explainer,一种新的方法,可以为GNN提供符合分布性质的内部解释。D4Explainer integrate generative graph distribution learning into the optimization objective,它可以实现两个目标:1)生成符合分布性质的对应实例的多个多样化Counterfactual graphs,2)标识影响特定预测的最重要的图模式,并作为模型级别解释。需要注意的是,D4Explainer是第一个结合Counterfactual和模型级别解释的统一框架。empirical evaluations on synthetic and real-world datasets show that D4Explainer achieves state-of-the-art performance in terms of explanation accuracy, faithfulness, diversity, and robustness.

L2T-DLN: Learning to Teach with Dynamic Loss Network

  • paper_url: http://arxiv.org/abs/2310.19313
  • repo_url: None
  • paper_authors: Zhoyang Hai, Liyuan Pan, Xiabi Liu, Zhengzheng Liu, Mirna Yunita
  • for: 这个论文的目的是教导机器学习模型如何使用动态损失函数进行学习。
  • methods: 这个论文使用了一种具有记忆单元的教师模型,以及一种动态损失网络,以帮助学生模型通过教师模型的经验进行学习。
  • results: 实验表明,这种方法可以提高学生模型的学习效果,并且可以提高不同深度模型在真实世界任务上的性能,包括分类、目标检测和semantic segmentation等场景。
    Abstract With the concept of teaching being introduced to the machine learning community, a teacher model start using dynamic loss functions to teach the training of a student model. The dynamic intends to set adaptive loss functions to different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, and, therefore, enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, objective detection, and semantic segmentation scenarios.
    摘要 In this paper, we formulate the loss adjustment as a temporal task by designing a teacher model with memory units, allowing the student learning to be guided by the teacher model's experience. Additionally, we use a dynamic loss network to assist the teacher learning in enhancing the interactions between the teacher and the student model. Our approach has been extensively tested and has been shown to improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.19308
  • repo_url: None
  • paper_authors: Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
  • for: solves sequential decision-making problems with off-policy dynamic programming techniques
  • methods: return-conditioned supervised learning (RCSL) and multilayer perceptron function approximator
  • results: converges under more relaxed assumptions than traditional dynamic programming methods, outperforms state-of-the-art model-free and model-based offline RL algorithms in simulated robotics problems
    Abstract Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be an important technique for solving sequential decision-making problems. However, in the presence of function approximation such algorithms are not guaranteed to converge, often diverging due to the absence of Bellman-completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman-completeness while a constant layer width is enough for RCSL. These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We propose both theoretical analysis and experimental evaluation to back these claims, outperforming state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.
    摘要 Off-policy动态规划(DP)技术如$Q$-学习已经证明是解决时间序列决策问题的重要技术。然而,在函数拟合的存在下,这些算法并不是保证 converges,经常因为函数类型中缺乏 Bellman完备性,这是动态规划基本条件的重要因素。在这篇论文中,我们表明了基于返回条件supervised learning(RCSL)的外部学习技术可以绕过 Bellman完备性的挑战,并在更松散的假设下 converges。我们证明了在使用两层多层感知器作为函数 aproximator 时,状态空间大小与层宽之间存在直线关系,以满足 Bellman完备性的条件。这些发现为RCSL方法在实际中的superior empirical performance提供了解释。此外,为了学习从不优化的数据集中,我们提议了一个简单的框架called MBRCSL,使得 RCSL 方法可以通过动态规划来缝合分割的轨迹。MBRCSL 利用学习的动力模型和前向采样来完成轨迹缝合,而不需要 Bellman完备性,这些都是动态规划算法的必要条件。我们提出了理论分析和实验评估,在多个模拟的 роботикс问题上超过了当前的model-free和model-based offline RL算法的性能。
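
MBRCSL's model-based stitching component is not shown here; the sketch below is the plain return-conditioned supervised learning step it extends: a policy trained by regression to imitate actions conditioned on state and return-to-go. The tiny MLP, data shapes and single gradient step are toy assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class RCSLPolicy(nn.Module):
    """pi(a | s, return-to-go): condition the policy on the desired return."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )

    def forward(self, state, rtg):
        return self.net(torch.cat([state, rtg], dim=-1))

# Toy offline batch: states, actions, and per-step return-to-go labels.
states = torch.randn(256, 4)
actions = torch.randn(256, 2)
rtg = torch.rand(256, 1)

policy = RCSLPolicy(state_dim=4, action_dim=2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

loss = nn.functional.mse_loss(policy(states, rtg), actions)  # pure supervised learning
opt.zero_grad()
loss.backward()
opt.step()

# At test time, ask for a high return and act on the prediction.
action = policy(torch.randn(1, 4), torch.tensor([[0.9]]))
print(action)
```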

ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense

  • paper_url: http://arxiv.org/abs/2310.19301
  • repo_url: https://github.com/k-square-00/rome
  • paper_authors: Kankan Zhou, Eason Lai, Wei Bin Au Yeong, Kyriakos Mouratidis, Jing Jiang
  • for: 评估现有预训练视觉语言模型是否具备理解非常规内容的能力。
  • methods: 使用新创的ROME数据集进行评测,该数据集包含违反常识知识的图像。
  • results: 大多数预训练视觉语言模型无法正确地解释非常规的场景。
    Abstract Humans possess a strong capability for reasoning beyond common sense. For example, given an unconventional image of a goldfish laying on the table next to an empty fishbowl, a human would effortlessly determine that the fish is not inside the fishbowl. The case, however, may be different for a vision-language model, whose reasoning could gravitate towards the common scenario that the fish is inside the bowl, despite the visual input. In this paper, we introduce a novel probing dataset named ROME (reasoning beyond commonsense knowledge) to evaluate whether the state-of-the-art pre-trained vision-language models have the reasoning capability to correctly interpret counter-intuitive content. ROME contains images that defy commonsense knowledge with regards to color, shape, material, size and positional relation. Experiments on the state-of-the-art pre-trained vision-language models reveal that most of these models are still largely incapable of interpreting counter-intuitive scenarios. We hope that ROME will spur further investigations on reasoning beyond commonsense knowledge in vision-language research.
    摘要 人类具有强大的理性能力,可以理解不同于常识的场景。例如,给一个不寻常的图像,如一只鱼在桌子旁边的空鱼缸中,人类会很容易理解鱼不在鱼缸中。然而,这可能不是情况 для视觉语言模型,这些模型可能会受到常识的引导,认为鱼在鱼缸中。在这篇论文中,我们提出了一个新的探索数据集名为ROME(理解超出常识知识),以评估当前最先进的预训练视觉语言模型是否具备正确地解释不同征的能力。ROME数据集包含图像,它们与常识知识有很多不同,包括颜色、形状、材质、大小和位置关系。我们对当前最先进的预训练视觉语言模型进行实验,发现大多数这些模型仍然无法正确地解释不同征的场景。我们希望ROME会激发更多关于理解超出常识知识的研究在视觉语言领域。

ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout

  • paper_url: http://arxiv.org/abs/2310.19295
  • repo_url: None
  • paper_authors: Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu
  • for: 本文主要针对 deep learning 模型训练中的内存问题,提出了一种基于 computation graph 的内存有效执行计划,以提高模型的内存使用效率和减少高级技术的开销。
  • methods: 本文提出了一种基于模型结构和训练内存负担的优化理论,并提出了一种高效的树结构算法来自动找到合适的任务分解方案。
  • results: 实验表明,使用 ROAM 可以减少内存使用量的 35.7%、13.3% 和 27.2%,并提供了惊人的 53.7 倍的速度提升。 Plus, the evaluation on the large GPT2-XL model further confirms the scalability of ROAM.
    Abstract As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM which operates on computation graph level to derive memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to Pytorch and two state-of-the-art methods and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.
    摘要 深度学习模型的大小不断增加,训练时的内存需求也在不断增加。高级技术如卸载、重计算和压缩可以减轻内存压力,但也会导致开销。然而,一个高效的执行计划,包括合理的运算顺序和维度缓存布局,可以减少模型的内存占用和高级技术的开销。在这篇论文中,我们提出了ROAM,它在计算图层次上运行,以 derivation 高效的执行计划,包括优化的运算顺序和维度缓存布局,为模型提供更高的内存效率和更低的开销。我们首先提出了一些复杂的理论,考虑到模型结构和训练内存负担,以支持大规模复杂的图进行优化。然后,我们提出了一种高效的树结构算法,自动搜索任务分割,同时具有高性能和有效性,解决这个问题。实验表明,ROAM可以实现35.7%、13.3%和27.2%的内存减少,相比于Pytorch和两个状态 искусternalMethods,并提供了惊人的53.7倍的速度提升。而在大型GPT2-XL上进行的评估也证明了ROAM的扩展性。

The Memory Perturbation Equation: Understanding Model’s Sensitivity to Data

  • paper_url: http://arxiv.org/abs/2310.19273
  • repo_url: None
  • paper_authors: Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan
  • for: This paper presents a method that simplifies the analysis of a model's sensitivity to its training data.
  • methods: Derived from Bayesian principles, the method unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and yields sensitivity estimates during training.
  • results: The results show that sensitivity estimates obtained during training can faithfully predict generalization on unseen test data; the equation is expected to be useful for future research on robust and adaptive learning.
    Abstract Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such analyses, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
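As a point of reference for what the MPE estimates cheaply, the sketch below computes the same kind of sensitivity by brute force: retrain a ridge-regression model with each example left out and measure how much the predictions move. This is plain leave-one-out retraining, not the paper's Bayesian estimator; the data and regularization strength are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)
lam = 1e-2                                   # made-up ridge strength

def fit_ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_full = fit_ridge(X, y, lam)

# Sensitivity of example i = deviation of predictions when i is left out.
sensitivities = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    w_loo = fit_ridge(X[mask], y[mask], lam)
    sensitivities.append(np.linalg.norm(X @ (w_loo - w_full)))

print("most influential training example:", int(np.argmax(sensitivities)))
```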

NPCL: Neural Processes for Uncertainty-Aware Continual Learning

  • paper_url: http://arxiv.org/abs/2310.19272
  • repo_url: https://github.com/srvcodes/npcl
  • paper_authors: Saurav Jha, Dong Gong, He Zhao, Lina Yao
  • for: This work aims to train deep neural networks efficiently on streaming data in a continual learning (CL) setting while limiting the forgetting caused by new tasks.
  • methods: The work uses neural processes (NPs), a class of meta-learners, to encode different tasks as probabilistic distributions over functions while providing reliable uncertainty estimates. Specifically, an NP-based CL approach (NPCL) is proposed with task-specific modules arranged in a hierarchical latent variable model, and tailored regularizers on the learned latent distributions alleviate forgetting.
  • results: Experiments show that NPCL outperforms previous CL approaches, and its uncertainty estimation capability handles the task-head/module inference challenge in CL. Code is available at \url{https://github.com/srvCodes/NPCL}.
    Abstract Continual learning (CL) aims to train deep neural networks efficiently on streaming data while limiting the forgetting caused by new tasks. However, learning transferable knowledge with less interference between tasks is difficult, and real-world deployment of CL models is limited by their inability to measure predictive uncertainties. To address these issues, we propose handling CL tasks with neural processes (NPs), a class of meta-learners that encode different tasks into probabilistic distributions over functions all while providing reliable uncertainty estimates. Specifically, we propose an NP-based CL approach (NPCL) with task-specific modules arranged in a hierarchical latent variable model. We tailor regularizers on the learned latent distributions to alleviate forgetting. The uncertainty estimation capabilities of the NPCL can also be used to handle the task head/module inference challenge in CL. Our experiments show that the NPCL outperforms previous CL approaches. We validate the effectiveness of uncertainty estimation in the NPCL for identifying novel data and evaluating instance-level model confidence. Code is available at \url{https://github.com/srvCodes/NPCL}.
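One way to picture how uncertainty estimates can address the task-head inference challenge is the routing rule sketched below: at test time, a sample is assigned to the task-specific head whose prediction is least uncertain. This is a generic illustration, not the NPCL architecture; the heads are stubbed with fixed logits.

```python
import numpy as np

def entropy(logits):
    # Predictive entropy of a softmax distribution over classes.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -(p * np.log(p + 1e-12)).sum()

def pick_task_head(per_head_logits):
    """per_head_logits: list of logit vectors, one per task-specific head."""
    uncertainties = [entropy(l) for l in per_head_logits]
    return int(np.argmin(uncertainties))     # the most confident head wins

# Example: head 1 is confident (peaked logits), head 0 is not.
logits_head0 = np.array([0.1, 0.0, 0.2])
logits_head1 = np.array([5.0, -2.0, -3.0])
print("selected head:", pick_task_head([logits_head0, logits_head1]))
```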

Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union

  • paper_url: http://arxiv.org/abs/2310.19252
  • repo_url: https://github.com/zifuwanggg/jdtlosses
  • paper_authors: Zifu Wang, Maxim Berman, Amal Rannen-Triki, Philip H. S. Torr, Devis Tuia, Tinne Tuytelaars, Luc Van Gool, Jiaqian Yu, Matthew B. Blaschko
  • for: This paper proposes new evaluation metrics for semantic segmentation to address the class-imbalance and object-size biases that affect traditional metrics.
  • methods: The paper proposes using fine-grained mIoU metrics together with corresponding worst-case metrics to evaluate semantic segmentation methods.
  • results: Training and evaluating 15 modern neural networks on 12 natural and aerial segmentation datasets with the proposed metrics shows that fine-grained mIoUs reduce the bias towards large objects and provide a more holistic assessment.
    Abstract Semantic segmentation datasets often exhibit two types of imbalance: \textit{class imbalance}, where some classes appear more frequently than others and \textit{size imbalance}, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards \textit{majority classes} (e.g. overall pixel-wise accuracy) and \textit{large objects} (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
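The contrast between per-dataset and fine-grained aggregation can be made concrete with a few lines of code: the usual mIoU accumulates intersections and unions over the whole dataset, while a fine-grained mIoU averages per-image IoUs (and can report their worst case), so large objects no longer dominate. The exact definitions in the paper may differ; the masks below are tiny made-up arrays.

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

def dataset_miou(preds, gts, num_classes):
    # Accumulate intersections/unions over the whole dataset, then average per class.
    inter = np.zeros(num_classes); union = np.zeros(num_classes)
    for pred, gt in zip(preds, gts):
        for c in range(num_classes):
            inter[c] += np.logical_and(pred == c, gt == c).sum()
            union[c] += np.logical_or(pred == c, gt == c).sum()
    valid = union > 0
    return (inter[valid] / union[valid]).mean()

def fine_grained_miou(preds, gts, num_classes):
    # Compute an IoU per image first, then average; also report the worst image.
    per_image = [np.mean(list(iou_per_class(p, g, num_classes).values()))
                 for p, g in zip(preds, gts)]
    return np.mean(per_image), np.min(per_image)

preds = [np.array([[0, 0], [0, 1]]), np.array([[1, 1], [1, 0]])]
gts   = [np.array([[0, 0], [1, 1]]), np.array([[1, 1], [1, 1]])]
print("dataset mIoU:", dataset_miou(preds, gts, 2))
print("fine-grained mIoU (mean, worst):", fine_grained_miou(preds, gts, 2))
```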

Pre-trained Recommender Systems: A Causal Debiasing Perspective

  • paper_url: http://arxiv.org/abs/2310.19251
  • repo_url: https://github.com/myhakureimu/prerec
  • paper_authors: Ziqian Lin, Hao Ding, Nghia Hoang, Branislav Kveton, Anoop Deoras, Hao Wang
  • for: Inspired by pre-trained vision/language models, this work investigates adapting the pre-training paradigm to recommender systems to improve adaptability and learning efficiency.
  • methods: A generic recommender is pre-trained on user-item interaction data drawn from multiple domains, and a causal debiasing approach, instantiated as a hierarchical Bayesian deep learning model named PreRec, is introduced to address in-domain and cross-domain biases.
  • results: Experiments on real-world data show that the proposed model significantly improves recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.
    Abstract Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, a direction less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be quickly adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different E-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.
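The debiasing intuition can be sketched very roughly: pre-training observations mix a universal user-item affinity with domain-specific popularity effects, and only the universal part should transfer zero-shot to a new market or platform. The snippet below is an illustrative caricature of that decomposition, not the PreRec model; all embeddings, bias terms, and domain names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 5, 8
user_emb = rng.normal(size=(n_users, dim))   # assumed shared across domains
item_emb = rng.normal(size=(n_items, dim))
domain_item_bias = {                         # in-domain popularity offsets (made up)
    "market_A": rng.normal(size=n_items),
    "market_B": rng.normal(size=n_items),
}

def score(u, domain=None):
    """Affinity scores for user u; the domain bias is added only when known."""
    base = item_emb @ user_emb[u]            # universal interaction pattern
    if domain in domain_item_bias:
        return base + domain_item_bias[domain]
    return base                              # zero-shot transfer: debiased scores

print("ranking in market_A:", np.argsort(-score(0, "market_A")))
print("zero-shot ranking in a new market:", np.argsort(-score(0)))
```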

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

  • paper_url: http://arxiv.org/abs/2310.19248
  • repo_url: https://github.com/aaaaaasuka/impress
  • paper_authors: Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen
  • for: This work evaluates whether imperceptible perturbations can protect original images from unauthorized use in diffusion-based generative AI.
  • methods: The authors introduce IMPRESS, a purification platform that exploits the perceptible inconsistency between an original image and its diffusion-reconstructed counterpart to devise a new optimization strategy for purifying protected images.
  • results: Purification can weaken the protection provided by imperceptible perturbations, leaving images more vulnerable to unauthorized use (e.g., style mimicking, malicious editing); IMPRESS also offers a comprehensive evaluation platform for current and future protection methods.
    Abstract Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image, which can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.
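The purification idea can be summarized as an optimization with two pulls: stay close to the protected image while becoming consistent with your own diffusion reconstruction. The sketch below uses a blur-like stand-in for the diffusion round-trip and a crude gradient step, so it only conveys the shape of the objective, not IMPRESS's actual procedure.

```python
import numpy as np

def reconstruct(x):
    """Hypothetical stand-in for encoding/decoding x through a diffusion model."""
    return 0.5 * (x + np.roll(x, 1, axis=-1))   # simple smoothing proxy

def purify(protected, steps=200, lr=0.1, lam=1.0):
    x = protected.copy()
    for _ in range(steps):
        # Crude approximation: treat reconstruct(x) as fixed within each step
        # when differentiating the consistency term ||x - reconstruct(x)||^2.
        consistency_grad = 2 * (x - reconstruct(x))
        closeness_grad = 2 * (x - protected)      # stay near the protected image
        x -= lr * (lam * consistency_grad + closeness_grad)
    return x

protected = np.random.default_rng(0).normal(size=(8, 8))
purified = purify(protected)
print("residual inconsistency:", np.linalg.norm(purified - reconstruct(purified)))
```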

Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection

  • paper_url: http://arxiv.org/abs/2310.19247
  • repo_url: https://github.com/ringbdstack/ucl_sed
  • paper_authors: Jiaqian Ren, Hao Peng, Lei Jiang, Zhiwei Liu, Jia Wu, Zhengtao Yu, Philip S. Yu
  • for: Improving overall model performance on social event detection tasks, especially for low-frequency (tail) classes.
  • methods: An uncertainty-guided class-imbalance learning framework (UCL$_{SED}$) and its variant (UCL-EC$_{SED}$) are proposed, which improve overall performance by enhancing model generalization to uncertain classes.
  • results: Experiments on three severely imbalanced social event datasets show that the model significantly improves social event representation and classification in almost all classes, especially the uncertain ones.
    Abstract Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model encounter a serious generalization challenge. Most studies solve this problem from the frequency perspective and emphasize the representation or classifier learning for tail classes. While in our observation, compared to the rarity of classes, the calibrated uncertainty estimated from well-trained evidential deep learning networks better reflects model performance. To this end, we propose a novel uncertainty-guided class imbalance learning framework - UCL$_{SED}$, and its variant - UCL-EC$_{SED}$, for imbalanced social event detection tasks. We aim to improve the overall model performance by enhancing model generalization to those uncertain classes. Considering performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL and its variant - UCL-EC, to manipulate distinguishable representation distribution for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust a clear separable boundary in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via the Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets including Events2012\_100, Events2018\_100, and CrisisLexT\_7. Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.
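For readers unfamiliar with evidential classifiers, the snippet below shows one common way to turn per-class evidence into belief masses plus an uncertainty mass and to fuse two views with Dempster's rule of combination; whether UCL-EC$_{SED}$ uses exactly this formulation is an assumption.

```python
import numpy as np

def to_belief(evidence):
    """Non-negative evidence over K classes -> (belief masses, uncertainty mass)."""
    K = len(evidence)
    S = evidence.sum() + K
    return evidence / S, K / S

def dempster_combine(b1, u1, b2, u2):
    # Mass assigned to pairs of disagreeing classes (the conflict).
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u

# Two "views" of the same event: one confident about class 0, one ambivalent.
b1, u1 = to_belief(np.array([9.0, 1.0, 0.0]))
b2, u2 = to_belief(np.array([2.0, 2.0, 1.0]))
b, u = dempster_combine(b1, u1, b2, u2)
print("combined beliefs:", b, "combined uncertainty:", u)
```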

Stochastic Configuration Machines: FPGA Implementation

  • paper_url: http://arxiv.org/abs/2310.19225
  • repo_url: https://github.com/plubplub1/bountyfarm
  • paper_authors: Matthew J. Felicetti, Dianhui Wang
  • for: This paper targets neural networks for industrial applications, specifically the constraints on response speed, memory size, and power usage.
  • methods: The paper builds on randomized learners and uses hardware solutions to reduce the model's resource footprint. In particular, Stochastic Configuration Machines (SCMs) reduce memory requirements by limiting the randomized weights to binary values with a scalar per node, and use a mechanism model to improve learning performance and result interpretability.
  • results: SCM models, including single-layer and deep architectures, are implemented on a field-programmable gate array (FPGA) with binary-coded inputs and tested on two benchmark and two industrial datasets, achieving good performance under the stated constraints.
    Abstract Neural networks for industrial applications generally have additional constraints such as response speed, memory size and power usage. Randomized learners can address some of these issues. However, hardware solutions can provide better resource reduction whilst maintaining the model's performance. Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling. Stochastic Configuration Machines (SCMs) extend this to focus on reducing the memory constraints by limiting the randomized weights to a binary value with a scalar for each node and using a mechanism model to improve the learning performance and result interpretability. This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to the algorithm. Results are reported for two benchmark and two industrial datasets, including SCM with single-layer and deep architectures.
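The memory argument is easy to see in a toy randomized learner: hidden weights restricted to {-1, +1} with a single scalar per node are cheap to store on an FPGA, and the output weights can still be fit by least squares. The sketch below illustrates that idea only; it is not the SCM construction algorithm, and the target function is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 4))
y = np.sin(X.sum(axis=1))                        # made-up target function

n_hidden = 50
W_bin = rng.choice([-1.0, 1.0], size=(4, n_hidden))   # binary random weights
scales = rng.uniform(0.1, 2.0, size=n_hidden)          # one scalar per node
b = rng.uniform(-1, 1, size=n_hidden)

H = np.tanh(scales * (X @ W_bin) + b)            # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # output weights by least squares
pred = H @ beta
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```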

EHRTutor: Enhancing Patient Understanding of Discharge Instructions

  • paper_url: http://arxiv.org/abs/2310.19212
  • repo_url: None
  • paper_authors: Zihao Zhang, Zonghai Yao, Huixue Zhou, Feiyun ouyang, Hong Yu
  • for: The goal of this paper is to use language models as a tool for patient education, helping patients better understand their diagnoses and treatment plans.
  • methods: The paper presents a multi-component framework that uses a large language model (LLM) for patient education, employing conversational question-answering to help patients understand the discharge instructions in their electronic health record (EHR).
  • results: Evaluations with LLMs and domain experts show a clear preference for EHRTutor over the baseline, suggesting it improves patients' comprehension of and engagement with their discharge instructions; the framework can also generate synthetic patient education dialogues for future in-house system training.
    Abstract Large language models have shown success as a tutor in education in various fields. Educating patients about their clinical visits plays a pivotal role in patients' adherence to their treatment plans post-discharge. This paper presents EHRTutor, an innovative multi-component framework leveraging the Large Language Model (LLM) for patient education through conversational question-answering. EHRTutor first formulates questions pertaining to the electronic health record discharge instructions. It then educates the patient through conversation by administering each question as a test. Finally, it generates a summary at the end of the conversation. Evaluation results using LLMs and domain experts have shown a clear preference for EHRTutor over the baseline. Moreover, EHRTutor also offers a framework for generating synthetic patient education dialogues that can be used for future in-house system training.
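The multi-component flow (question generation, conversational quizzing, summarization) can be expressed as a short orchestration loop around an LLM. In the sketch below, `call_llm` and `get_patient_reply` are hypothetical placeholders for a chat-completion API and a patient input channel, and the prompts are illustrative rather than the system's actual ones.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def tutor_session(discharge_instructions: str, get_patient_reply) -> str:
    # 1) Formulate questions about the discharge instructions.
    questions = call_llm(
        "Write 3 short comprehension questions, one per line, about these "
        f"discharge instructions:\n{discharge_instructions}"
    ).splitlines()

    # 2) Administer each question as a test and give feedback.
    transcript = []
    for q in questions:
        reply = get_patient_reply(q)             # e.g. voice or text input
        feedback = call_llm(
            f"Instructions: {discharge_instructions}\nQuestion: {q}\n"
            f"Patient answer: {reply}\nGive brief, friendly corrective feedback."
        )
        transcript.append((q, reply, feedback))

    # 3) Summarize the conversation for the patient.
    return call_llm(
        "Summarize this tutoring conversation for the patient:\n"
        + "\n".join(f"Q: {q}\nA: {r}\nFeedback: {f}" for q, r, f in transcript)
    )
```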

Leveraging generative artificial intelligence to simulate student learning behavior

  • paper_url: http://arxiv.org/abs/2310.19206
  • repo_url: None
  • paper_authors: Songlin Xu, Xinyu Zhang
  • for: Enhancing learning outcomes, advancing educational research, and designing more effective pedagogy.
  • methods: Using large language models (LLMs) to simulate student learning behaviors by instantiating virtual students with specific demographics.
  • results: Three experiments: the first (N = 145) shows that learning outcomes simulated from demographic data parallel those of actual students across various demographic factors; the second (N = 4524) shows that virtual students' behaviors become increasingly realistic as more assessment history is modeled; the third (N = 27) shows a strong link between virtual students' learning behaviors and fine-grained mappings from test questions, course materials, engagement, and understanding levels.
    Abstract Student simulation presents a transformative approach to enhance learning outcomes, advance educational research, and ultimately shape the future of effective pedagogy. We explore the feasibility of using large language models (LLMs), a remarkable achievement in AI, to simulate student learning behaviors. Unlike conventional machine learning based prediction, we leverage LLMs to instantiate virtual students with specific demographics and uncover intricate correlations among learning experiences, course materials, understanding levels, and engagement. Our objective is not merely to predict learning outcomes but to replicate learning behaviors and patterns of real students. We validate this hypothesis through three experiments. The first experiment, based on a dataset of N = 145, simulates student learning outcomes from demographic data, revealing parallels with actual students concerning various demographic factors. The second experiment (N = 4524) results in increasingly realistic simulated behaviors with more assessment history for virtual students modelling. The third experiment (N = 27), incorporating prior knowledge and course interactions, indicates a strong link between virtual students' learning behaviors and fine-grained mappings from test questions, course materials, engagement and understanding levels. Collectively, these findings deepen our understanding of LLMs and demonstrate its viability for student simulation, empowering more adaptable curricula design to enhance inclusivity and educational effectiveness.
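Instantiating a virtual student amounts to conditioning an LLM on a demographic and learning profile before posing a test question. The sketch below shows one plausible prompt construction; the profile fields, prompt wording, and the `call_llm` placeholder are assumptions, not the authors' setup.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

def simulate_answer(profile: dict, course_material: str, question: str) -> str:
    # Build a persona prompt from the (hypothetical) student profile.
    prompt = (
        f"You are a student. Age: {profile['age']}. Major: {profile['major']}. "
        f"Prior knowledge: {profile['prior_knowledge']}. "
        f"Engagement level: {profile['engagement']}.\n"
        f"You just studied:\n{course_material}\n"
        "Answer the following test question as this student would, including "
        f"plausible mistakes:\n{question}"
    )
    return call_llm(prompt)

# Example profile; simulate_answer(...) would return the virtual student's answer.
student = {"age": 20, "major": "biology", "prior_knowledge": "low",
           "engagement": "medium"}
```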

Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing

  • paper_url: http://arxiv.org/abs/2310.19204
  • repo_url: None
  • paper_authors: Quang-Hung Luu, Huai Liu, Tsong Yueh Chen
  • for: This work examines the potential of ChatGPT for advancing software testing intelligence through an experience report on metamorphic testing (MT).
  • methods: ChatGPT is asked to generate candidate metamorphic relations (MRs), which are necessary properties of the program under test that traditionally require human intelligence to identify; the candidates are then evaluated for correctness by domain experts.
  • results: ChatGPT can generate new, correct MRs for testing several software systems, but the majority of MR candidates are vaguely defined or incorrect, especially for systems never tested with MT; ChatGPT can advance software testing intelligence, yet human intelligence is still needed to justify and rectify MR correctness.
    Abstract While ChatGPT is a well-known artificial intelligence chatbot being used to answer human's questions, one may want to discover its potential in advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can be later adopted for implementing tests; but human intelligence should still inevitably be involved to justify and rectify their correctness.
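As a concrete reminder of what a metamorphic relation is, the snippet below checks the classic relation sin(x) = sin(pi - x) against math.sin: follow-up outputs are compared with source outputs, so no exact oracle is needed. This is a textbook MR used for illustration, not one generated by ChatGPT in the study.

```python
import math
import random

def check_mr(program, trials=1000, tol=1e-9):
    """Check the MR program(x) == program(pi - x) on random inputs."""
    for _ in range(trials):
        x = random.uniform(-10, 10)
        source = program(x)
        follow_up = program(math.pi - x)     # metamorphic transformation
        if abs(source - follow_up) > tol:
            return False, x
    return True, None

ok, counterexample = check_mr(math.sin)
print("MR holds on all trials" if ok else f"MR violated at x={counterexample}")
```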