cs.AI - 2023-10-23

DoGE: Domain Reweighting with Generalization Estimation

  • paper_url: http://arxiv.org/abs/2310.15393
  • repo_url: None
  • paper_authors: Simin Fan, Matteo Pagliardini, Martin Jaggi
  • for: This work aims to improve the generalization ability of large language models by examining how the coverage and composition of the pretraining corpus affect generalization.
  • methods: It proposes Domain Reweighting with Generalization Estimation (DoGE), which sets the sampling weight of each pretraining source domain according to that domain's estimated contribution to generalization (see the sketch below).
  • results: On the SlimPajama-6B dataset, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain pretraining tasks, it reduces perplexity on the target domain by a large margin. The work also proposes an efficient parameter-selection scheme to improve generalization estimation.
    Abstract The coverage and composition of the pretraining data corpus significantly impacts the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights for the ultimate goal of generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), where we reweigh the sampling probability from each domain based on its contribution to the final generalization objective, assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated to maximize the overall generalization gain by mirror descent. Finally, we use the obtained domain weights to train a larger-scale full-size language model. On the SlimPajama-6B dataset, with a universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme which improves the efficiency of generalization estimation.
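A minimal sketch of the mirror-descent weight update described above, assuming the per-domain generalization gains have already been estimated (in DoGE they come from a gradient-based estimation function on a small proxy model); the function name, learning rate, and gain values are illustrative:

```python
import numpy as np

def mirror_descent_step(weights, gen_gains, lr=0.5):
    """Exponentiated-gradient (mirror descent on the probability simplex)
    update of domain sampling weights from estimated generalization gains."""
    logits = np.log(weights) + lr * np.asarray(gen_gains)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Toy usage with three source domains; domain 1 currently helps most,
# so weight mass shifts toward it while the weights stay a distribution.
w = np.full(3, 1 / 3)
w = mirror_descent_step(w, gen_gains=[0.2, 0.9, 0.1])
print(w)
```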

Irreducible Curriculum for Language Model Pretraining

  • paper_url: http://arxiv.org/abs/2310.15389
  • repo_url: None
  • paper_authors: Simin Fan, Martin Jaggi
  • for: Improving automatic data selection and curriculum design for large language model pretraining.
  • methods: Proposes a curriculum learning algorithm called "irreducible curriculum" that prioritizes training samples with higher learnability (see the sketch below).
  • results: Experiments on the RedPajama-1B dataset show consistent improvements in validation perplexity across all 7 domains over the random-uniform baseline and the anti-curriculum strategy, along with reduced network sharpness and better 5-shot accuracy on the MMLU benchmark.
    Abstract Automatic data selection and curriculum design for training large language models is challenging, with only a few existing methods showing improvements over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the more fine-grained contributions of each individual training point. It is difficult to apply traditional datapoint selection methods on large language models: most online batch selection methods perform two-times forward or backward passes, which introduces considerable extra costs with large-scale models. To mitigate these obstacles, we propose irreducible curriculum as a curriculum learning algorithm for language model pretraining, which prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation overhead, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement on validation perplexity across all 7 domains compared to random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and illustrates a better 5-shot accuracy on MMLU benchmarks.
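A sketch of the selection rule implied by the abstract, under the assumption that per-sample losses simulated by the small proxy model are available alongside an irreducible-loss estimate; the exact learnability score used in the paper may differ:

```python
import torch

def select_learnable(proxy_losses, irreducible_losses, k):
    """Rank candidate samples by an illustrative learnability score:
    proxy-simulated loss minus an irreducible-loss estimate, so points
    that are noisy or already learned are deprioritized."""
    learnability = proxy_losses - irreducible_losses
    return torch.topk(learnability, k).indices

# Toy usage: keep the 2 most learnable of 4 candidate samples.
idx = select_learnable(torch.tensor([2.1, 0.3, 1.7, 0.9]),
                       torch.tensor([0.2, 0.2, 1.6, 0.1]), k=2)
```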

Course Correcting Koopman Representations

  • paper_url: http://arxiv.org/abs/2310.15386
  • repo_url: None
  • paper_authors: Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin
  • for: Learning features of nonlinear dynamical systems (NLDS) that linearize the dynamics in a latent space.
  • methods: Studies autoencoder formulations of the problem and different ways of using them to model dynamics, specifically future state prediction over long horizons (see the sketch below).
  • results: Identifies several limitations of predicting future states in the latent space and proposes an inference-time mechanism, Periodic Reencoding, for faithfully capturing long-term dynamics; the method is justified analytically and validated empirically on low- and high-dimensional NLDS.
    Abstract Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS.
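A minimal sketch of the Periodic Reencoding mechanism, assuming a trained encoder, decoder, and linear latent operator K; all names and the rollout interface are illustrative:

```python
import torch

def rollout(encoder, decoder, K, x0, horizon, period):
    """Predict future states with linear latent (Koopman) dynamics,
    re-encoding the decoded prediction every `period` steps to correct
    drift accumulated in the latent trajectory."""
    z = encoder(x0)
    preds = []
    for t in range(horizon):
        z = z @ K.T                 # linear dynamics in the latent space
        x = decoder(z)
        preds.append(x)
        if (t + 1) % period == 0:
            z = encoder(x)          # course-correct via reencoding
    return torch.stack(preds)
```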

Health Disparities through Generative AI Models: A Comparison Study Using A Domain Specific large language model

  • paper_url: http://arxiv.org/abs/2310.18355
  • repo_url: None
  • paper_authors: Yohn Jairo Parra Bautista, Vinicious Lima, Carlos Theran, Richard Alo
  • for: Reducing health disparities and improving the quality of healthcare services.
  • methods: Compares a domain-specific large language model (SciBERT) with a general-purpose model (BERT), using cosine similarity over text queries about health disparities, including queries where factors such as race appear alone (see the sketch below).
  • results: SciBERT fails to differentiate between the query "race" alone and "race perpetuates health disparities," suggesting that more data and domain expertise are needed for domain-specific LLMs to be effective.
    Abstract Health disparities are differences in health outcomes and access to healthcare between different groups, including racial and ethnic minorities, low-income people, and rural residents. An artificial intelligence (AI) program called large language models (LLMs) can understand and generate human language, improving health communication and reducing health disparities. There are many challenges in using LLMs in human-doctor interaction, including the need for diverse and representative data, privacy concerns, and collaboration between healthcare providers and technology experts. We introduce a comparative investigation of domain-specific large language models such as SciBERT with a multi-purpose LLM, BERT. We used cosine similarity to analyze text queries about health disparities in exam rooms when factors such as race are used alone. Using text queries, SciBERT fails to differentiate between the query texts "race" alone and "race perpetuates health disparities." We believe clinicians can use generative AI to create a draft response when communicating asynchronously with patients. However, careful attention must be paid to ensure they are developed and implemented ethically and equitably.
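A sketch of the query comparison under stated assumptions: mean pooling over the last hidden layer is one common way to obtain a sentence vector, but the paper does not specify its pooling choice:

```python
import torch
from transformers import AutoModel, AutoTokenizer

def embed(texts, name="allenai/scibert_scivocab_uncased"):
    """Mean-pooled sentence embeddings from a (Sci)BERT encoder."""
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

a, b = embed(["race", "race perpetuates health disparities"])
print(torch.cosine_similarity(a, b, dim=0).item())
```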

Semantic Data Management in Data Lakes

  • paper_url: http://arxiv.org/abs/2310.15373
  • repo_url: None
  • paper_authors: Sayed Hoseini, Johannes Theissen-Lipp, Christoph Quix
  • for: Surveys semantic data management in data lake systems, reviewing recent approaches and techniques for making data access more expressive and interoperable.
  • methods: Classifies the approaches into three categories: basic semantic data management, semantic modeling approaches for enriching metadata in data lakes, and ontology-based data access; each category covers the main techniques and compares recent research.
  • results: Drawing on recent research and practice, the survey identifies challenges for applying and scaling semantic data management in data lakes and points out directions for future work.
    Abstract In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.

EpiK-Eval: Evaluation for Language Models as Epistemic Models

  • paper_url: http://arxiv.org/abs/2310.15372
  • repo_url: None
  • paper_authors: Gabriele Prato, Jerry Huang, Prasannna Parthasarathi, Shagun Sodhani, Sarath Chandar
  • for: Investigating the ability of large language models (LLMs) to consolidate knowledge across different training documents.
  • methods: Introduces EpiK-Eval, a novel question-answering benchmark for evaluating how well LLMs form a coherent and consistent knowledge representation from segmented narratives.
  • results: Evaluations reveal significant weaknesses in current LLMs in this domain, which the authors attribute to limitations of prevailing training objectives; they advocate refining knowledge consolidation to improve overall effectiveness and performance.
    Abstract In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prevalence, their capacity to consolidate knowledge from different training documents - a crucial ability in numerous applications - remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval

Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation

  • paper_url: http://arxiv.org/abs/2310.15371
  • repo_url: None
  • paper_authors: Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu
  • for: This paper aims to develop a vicinal feature-level data augmentation (VFDA) scheme to improve the performance of federated learning (FL) for 3D medical segmentation, while preserving data privacy.
  • methods: The proposed VFDA scheme exploits batch-wise feature statistics in each institute to abstractly represent the discrepancy of data, and models each feature statistic probabilistically using a Gaussian prototype (see the sketch below). The scheme is designed to mitigate the local feature shift and facilitate collaborative training for privacy-aware FL segmentation, without the need for cross-institute transfer of raw data or their mixup.
  • results: The proposed VFDA scheme consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.
    Abstract Federated learning (FL) enables multiple client medical institutes collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the generalization capabilities of conventional centralized DL as a "free lunch", its application in FL is largely underexplored. Notably, constrained by costly labeling, 3D medical segmentation generally relies on data augmentation. In this work, we aim to develop a vicinal feature-level data augmentation (VFDA) scheme to efficiently alleviate the local feature shift and facilitate collaborative training for privacy-aware FL segmentation. We take both the inner- and inter-institute divergence into consideration, without the need for cross-institute transfer of raw data or their mixup. Specifically, we exploit the batch-wise feature statistics (e.g., mean and standard deviation) in each institute to abstractly represent the discrepancy of data, and model each feature statistic probabilistically via a Gaussian prototype, with the mean corresponding to the original statistic and the variance quantifying the augmentation scope. From the vicinal risk minimization perspective, novel feature statistics can be drawn from the Gaussian distribution to fulfill augmentation. The variance is explicitly derived by the data bias in each individual institute and the underlying feature statistics characterized by all participating institutes. The added-on VFDA consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.
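A sketch of the vicinal feature-statistics augmentation, assuming 3D feature volumes of shape (N, C, D, H, W) and given variance tensors broadcastable over channels; in the paper these variances are derived from intra- and inter-institute divergence rather than supplied by hand:

```python
import torch

def vfda_augment(feat, mean_var, std_var, eps=1e-6):
    """Sample new channel-wise mean/std from Gaussians centered on the
    batch statistics (the Gaussian prototypes), then re-normalize the
    features with the sampled statistics."""
    dims = (0, 2, 3, 4)
    mu = feat.mean(dim=dims, keepdim=True)
    sigma = feat.std(dim=dims, keepdim=True)
    new_mu = mu + torch.randn_like(mu) * mean_var.sqrt()
    new_sigma = sigma + torch.randn_like(sigma) * std_var.sqrt()
    return (feat - mu) / (sigma + eps) * new_sigma + new_mu
```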

Why LLMs Hallucinate, and How to Get (Evidential) Closure: Perceptual, Intensional, and Extensional Learning for Faithful Natural Language Generation

  • paper_url: http://arxiv.org/abs/2310.15355
  • repo_url: None
  • paper_authors: Adam Bouyamourn
  • for: Examining why language models (LLMs) hallucinate, i.e., the relationship between their claims and the evidence behind them.
  • methods: Proposes a procedure called "Learn-Babble-Prune" that constrains LLM output to be synonymous with claims for which the model has evidence (see the sketch below).
  • results: Shows that LLMs hallucinate because their output is not constrained to satisfy evidential closure; applying Learn-Babble-Prune makes their output more faithful and reliable.
    Abstract We show that LLMs hallucinate because their output is not constrained to be synonymous with claims for which they have evidence: a condition that we call evidential closure. Information about the truth or falsity of sentences is not statistically identified in the standard neural probabilistic language model setup, and so cannot be conditioned on to generate new strings. We then show how to constrain LLMs to produce output that does satisfy evidential closure. A multimodal LLM must learn about the external world (perceptual learning); it must learn a mapping from strings to states of the world (extensional learning); and, to achieve fluency when generalizing beyond a body of evidence, it must learn mappings from strings to their synonyms (intensional learning). The output of a unimodal LLM must be synonymous with strings in a validated evidence set. Finally, we present a heuristic procedure, Learn-Babble-Prune, that yields faithful output from an LLM by rejecting output that is not synonymous with claims for which the LLM has evidence.
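A sketch of the Learn-Babble-Prune heuristic; `generate` (the LLM sampler) and `entails_evidence` (the synonymy check against the validated evidence set) are stand-ins:

```python
def learn_babble_prune(generate, entails_evidence, n_samples=8):
    """Sample candidate outputs ("babble") and keep only those judged
    synonymous with some claim the model has evidence for ("prune")."""
    kept = []
    for _ in range(n_samples):
        candidate = generate()
        if entails_evidence(candidate):
            kept.append(candidate)
    return kept
```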

Moral Foundations of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15337
  • repo_url: https://github.com/abdulhaim/moral_foundations_llm
  • paper_authors: Marwa Abdulhai, Gregory Serapio-Garcia, Clément Crepy, Daria Valter, John Canny, Natasha Jaques
  • for: Using moral foundations theory (MFT) to study whether large language models (LLMs) have acquired a bias toward particular moral values.
  • methods: Analyzes popular LLMs with MFT, relates the moral foundations they exhibit to human moral foundations and political affiliations, and measures the consistency of these biases across prompting contexts.
  • results: Finds that popular LLMs exhibit particular moral foundations related to human moral values and political affiliations; prompts can be adversarially selected to elicit particular moral foundations, which affects model behavior on downstream tasks. These findings expose potential risks and unintended consequences of LLMs adopting a particular moral stance.
    Abstract Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors, including care/harm, liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary in the weight they place on these dimensions when making moral decisions, in part due to their cultural upbringing and political ideology. As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. We analyze known LLMs and find they exhibit particular moral foundations, and show how these relate to human moral foundations and political affiliations. We also measure the consistency of these biases, or whether they vary strongly depending on the context of how the model is prompted. Finally, we show that we can adversarially select prompts that encourage the moral to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks. These findings help illustrate the potential risks and unintended consequences of LLMs assuming a particular moral stance.

Serverless Federated Learning with flwr-serverless

  • paper_url: http://arxiv.org/abs/2310.15329
  • repo_url: https://github.com/kungfuai/flwr_serverless
  • paper_authors: Sanjeev V. Namjoshi, Reese Green, Krishi Sharma, Zhangzhang Si
  • for: Providing a scalable, flexible, and easy-to-use federated learning framework that can train on data from different sources without a central server.
  • methods: Wraps the Flower library, extending it to allow both synchronous and asynchronous federated learning with minimal modification to Flower's design paradigm, and without orchestrating client-side training jobs (see the sketch below).
  • results: A series of experiments shows that the approach reduces the time and cost of federated training and offers an easier way to implement and experiment with federated learning systems.
    Abstract Federated learning is becoming increasingly relevant and popular as we witness a surge in data collection and storage of personally identifiable information. Alongside these developments there have been many proposals from governments around the world to provide more protections for individuals' data and a heightened interest in data privacy measures. As deep learning continues to become more relevant in new and existing domains, it is vital to develop strategies like federated learning that can effectively train data from different sources, such as edge devices, without compromising security and privacy. Recently, the Flower (\texttt{Flwr}) Python package was introduced to provide a scalable, flexible, and easy-to-use framework for implementing federated learning. However, to date, Flower is only able to run synchronous federated learning which can be costly and time-consuming to run because the process is bottlenecked by client-side training jobs that are slow or fragile. Here, we introduce \texttt{flwr-serverless}, a wrapper around the Flower package that extends its functionality to allow for both synchronous and asynchronous federated learning with minimal modification to Flower's design paradigm. Furthermore, our approach to federated learning allows the process to run without a central server, which increases the domains of application and accessibility of its use. This paper presents the design details and usage of this approach through a series of experiments that were conducted using public datasets. Overall, we believe that our approach decreases the time and cost to run federated training and provides an easier way to implement and experiment with federated learning systems.
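A toy illustration of the server-free pattern, not the flwr-serverless API: each client checkpoints its model to shared storage and averages whichever peer checkpoints are currently present, so no central server or synchronization barrier is required:

```python
import glob
import torch

def serverless_round(model, shared_dir, client_id):
    """Publish this client's weights, then average all visible peers.
    Clients may run this at different times (asynchronous federation)."""
    torch.save(model.state_dict(), f"{shared_dir}/client_{client_id}.pt")
    states = [torch.load(p) for p in sorted(glob.glob(f"{shared_dir}/client_*.pt"))]
    averaged = {k: torch.stack([s[k].float() for s in states]).mean(0)
                for k in states[0]}
    model.load_state_dict(averaged)
```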

Hallucination Detection for Grounded Instruction Generation

  • paper_url: http://arxiv.org/abs/2310.15319
  • repo_url: None
  • paper_authors: Lingjun Zhao, Khanh Nguyen, Hal Daumé III
  • for: Generating instructions that guide humans to navigate in simulated residential environments.
  • methods: Adopts a model pre-trained on a large corpus of image-text pairs and fine-tunes it with a contrastive loss so that it can accurately detect hallucinated references (see the sketch below).
  • results: The final model outperforms several baselines, including word-probability estimates from the instruction-generation model and supervised LSTM and Transformer models.
    Abstract We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including using word probability estimated by the instruction-generation model, and supervised models based on LSTM and Transformer.
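A sketch of the contrastive fine-tuning objective the abstract describes; the margin form is an assumption, with the two score tensors being the detector's outputs for correct instructions and for instructions containing synthesized hallucinations:

```python
import torch.nn.functional as F

def contrastive_loss(score_correct, score_hallucinated, margin=0.2):
    """Push correct instructions to score at least `margin` above
    hallucinated ones; pairs already separated incur zero loss."""
    return F.relu(margin - score_correct + score_hallucinated).mean()
```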

HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15318
  • repo_url: None
  • paper_authors: Yihong Ma, Ning Yan, Jiayu Li, Masood Mortazavi, Nitesh V. Chawla
  • for: Improving the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs).
  • methods: Proposes HetGPT, a general post-training prompting framework. It introduces a novel prompting function that combines a virtual class prompt with a heterogeneous feature prompt to reformulate downstream tasks so they mirror pretext tasks (see the sketch below), plus a multi-view neighborhood aggregation mechanism that captures the complex neighborhood structure of heterogeneous graphs.
  • results: On three benchmark datasets, HetGPT improves the semi-supervised node classification performance of pre-trained HGNNs.
    Abstract Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing "pre-train, fine-tune" paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often exhibits a misalignment between the training objectives of pretext tasks and those of downstream tasks. This gap can result in the "negative transfer" problem, wherein the knowledge gained from pre-training adversely affects performance in the downstream tasks. The surge in prompt-based learning within Natural Language Processing (NLP) suggests the potential of adapting a "pre-train, prompt" paradigm to graphs as an alternative. However, existing graph prompting techniques are tailored to homogeneous graphs, neglecting the inherent heterogeneity of Web graphs. To bridge this gap, we propose HetGPT, a general post-training prompting framework to improve the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs). The key is the design of a novel prompting function that integrates a virtual class prompt and a heterogeneous feature prompt, with the aim to reformulate downstream tasks to mirror pretext tasks. Moreover, HetGPT introduces a multi-view neighborhood aggregation mechanism, capturing the complex neighborhood structure in heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT's capability to enhance the performance of state-of-the-art HGNNs on semi-supervised node classification.
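A sketch of the heterogeneous feature prompt (one of HetGPT's two prompt components), assuming a frozen HGNN whose inputs are per-type feature matrices; the virtual class prompt and multi-view neighborhood aggregation are omitted, and all names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class HeteroFeaturePrompt(nn.Module):
    """Learnable prompt vector per node type, added to the inputs of a
    frozen pre-trained HGNN so only the prompts are tuned post hoc."""
    def __init__(self, node_types, dim):
        super().__init__()
        self.prompts = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(dim)) for t in node_types})

    def forward(self, x_dict):
        # x_dict maps node type -> [num_nodes, dim] feature matrix
        return {t: x + self.prompts[t] for t, x in x_dict.items()}
```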

Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City

  • paper_url: http://arxiv.org/abs/2310.15302
  • repo_url: None
  • paper_authors: Mikael Brunila, Jack LaViolette, Sky CH-Wang, Priyanka Verma, Clara Féré, Grant McKenzie
  • for: Examining the dynamics of power, capital, and resistance through place names, including how toponyms are used in everyday discourse and how they are produced.
  • methods: Develops computational methods to measure how cultural and economic capital shape the ways people refer to places, validated on a novel annotated dataset of 47,440 New York City Airbnb listings from the 2010s; introduces a new named entity recognition (NER) model for discourse categories integral to the characterization of place.
  • results: Points toward new directions for critical toponymy and a range of previously understudied linguistic signals relevant to research on neighborhood status, housing and tourism markets, and gentrification.
    Abstract Critical toponymy examines the dynamics of power, capital, and resistance through place names and the sites to which they refer. Studies here have traditionally focused on the semantic content of toponyms and the top-down institutional processes that produce them. However, they have generally ignored the ways in which toponyms are used by ordinary people in everyday discourse, as well as the other strategies of geospatial description that accompany and contextualize toponymic reference. Here, we develop computational methods to measure how cultural and economic capital shape the ways in which people refer to places, through a novel annotated dataset of 47,440 New York City Airbnb listings from the 2010s. Building on this dataset, we introduce a new named entity recognition (NER) model able to identify important discourse categories integral to the characterization of place. Our findings point toward new directions for critical toponymy and to a range of previously understudied linguistic signals relevant to research on neighborhood status, housing and tourism markets, and gentrification.

Neural Network with Local Converging Input (NNLCI) for Supersonic Flow Problems with Unstructured Grids

  • paper_url: http://arxiv.org/abs/2310.15299
  • repo_url: None
  • paper_authors: Weiming Ding, Haoxiang Huang, Tzu Jung Lee, Yingjie Liu, Vigor Yang
  • for: Developing a deep-neural-network-based method for high-fidelity prediction of partial differential equations governing complex physical problems.
  • methods: Proposes a neural network with local converging input (NNLCI) that uses converging coarse solutions over the local domain of dependence as input, greatly reducing computational resources and training time.
  • results: Applied to inviscid supersonic flows in channels with bumps, the method systematically captures detailed flow structures, including shock-wave interactions, across different bump geometries and locations.
    Abstract In recent years, surrogate models based on deep neural networks (DNN) have been widely used to solve partial differential equations, which were traditionally handled by means of numerical simulations. This kind of surrogate models, however, focuses on global interpolation of the training dataset, and thus requires a large network structure. The process is both time consuming and computationally costly, thereby restricting their use for high-fidelity prediction of complex physical problems. In the present study, we develop a neural network with local converging input (NNLCI) for high-fidelity prediction using unstructured data. The framework utilizes the local domain of dependence with converging coarse solutions as input, which greatly reduces computational resource and training time. As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps. Different bump geometries and locations are considered to benchmark the effectiveness and versability of the proposed approach. Detailed flow structures, including shock-wave interactions, are examined systematically.

TaskDiff: A Similarity Metric for Task-Oriented Conversations

  • paper_url: http://arxiv.org/abs/2310.15298
  • repo_url: None
  • paper_authors: Ankita Bhaumik, Praveen Venkateswaran, Yara Rizk, Vatche Isahagian
  • for: Proposing a new conversational similarity metric to better evaluate and optimize task-oriented conversational digital assistants.
  • methods: Computes similarity from the distributions of different dialogue components (utterances, intents, and slots), motivated by the extra emphasis on prompt engineering and evaluation needed when building assistants with large language models such as ChatGPT (see the sketch below).
  • results: On a benchmark dataset, TaskDiff outperforms related approaches, with superior performance and improved robustness.
    Abstract The popularity of conversational digital assistants has resulted in the availability of large amounts of conversational data which can be utilized for improved user experience and personalized response generation. Building these assistants using popular large language models like ChatGPT also require additional emphasis on prompt engineering and evaluation methods. Textual similarity metrics are a key ingredient for such analysis and evaluations. While many similarity metrics have been proposed in the literature, they have not proven effective for task-oriented conversations as they do not take advantage of unique conversational features. To address this gap, we present TaskDiff, a novel conversational similarity metric that utilizes different dialogue components (utterances, intents, and slots) and their distributions to compute similarity. Extensive experimental evaluation of TaskDiff on a benchmark dataset demonstrates its superior performance and improved robustness over other related approaches.
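An illustrative component-weighted similarity in the spirit of TaskDiff; the overlap measure and weights are assumptions, not the paper's exact formulation:

```python
from collections import Counter

def overlap(xs, ys):
    """Multiset Jaccard overlap between two component lists."""
    a, b = Counter(xs), Counter(ys)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0

def taskdiff_like(conv_a, conv_b, weights=(0.4, 0.3, 0.3)):
    """Combine utterance, intent, and slot similarities between two
    task-oriented conversations into one score in [0, 1]."""
    keys = ("utterances", "intents", "slots")
    return sum(w * overlap(conv_a[k], conv_b[k])
               for w, k in zip(weights, keys))
```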

DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM

  • paper_url: http://arxiv.org/abs/2310.15296
  • repo_url: None
  • paper_authors: Weijie Xu, Wenxiang Hu, Fanyou Wu, Srinivasan Sengamedu
  • for: Research at the intersection of Neural Topic Models (NTMs) and Large Language Models (LLMs) in natural language processing.
  • methods: Proposes a new framework, Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME), which uses encoder-decoder LLMs to produce highly clusterable embeddings yielding topics with superior clusterability and enhanced semantic coherence.
  • results: DeTiME produces highly clusterable topics and, by exploiting diffusion, can also generate content relevant to the identified topics; it can additionally produce clustered embeddings, is efficient to train, and is highly adaptable across applications.
    Abstract In the burgeoning field of natural language processing, Neural Topic Models (NTMs) and Large Language Models (LLMs) have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are not optimal for clustering or capable of topic generation. Our study addresses this gap by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages Encoder-Decoder-based LLMs to produce highly clusterable embeddings that could generate topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion, our framework also provides the capability to generate content relevant to the identified topics. This dual functionality allows users to efficiently produce highly clustered topics and related content simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework proves to be efficient to train and exhibits high adaptability, demonstrating its potential for a wide array of applications.

Active teacher selection for reinforcement learning from human feedback

  • paper_url: http://arxiv.org/abs/2310.15288
  • repo_url: None
  • paper_authors: Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell
  • for: Studying how AI systems can learn objectives from human feedback while modeling differences in teacher rationality, expertise, and cost.
  • methods: Proposes the Hidden Utility Bandit (HUB) framework to formalize the problem of learning from multiple teachers, and develops a variety of solution algorithms (see the sketch below).
  • results: In experiments on two real-world domains (paper recommendation systems and COVID-19 vaccine testing), the Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively selecting when and which teacher to query.
    Abstract Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite querying a range of distinct teachers. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that the Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively selecting when and which teacher to query. The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.
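An illustrative cost-aware bandit rule conveying the flavor of active teacher selection; the actual ATS algorithm solves the HUB formulation and is more involved than this UCB-style stand-in:

```python
import numpy as np

def choose_teacher(value_means, query_counts, costs, t, c=1.0):
    """Pick the teacher with the best optimistic value-per-cost:
    exploit teachers whose feedback has proven informative, but keep
    exploring rarely queried ones via the confidence bonus."""
    bonus = c * np.sqrt(np.log(t + 1) / (query_counts + 1e-9))
    return int(np.argmax((value_means + bonus) / costs))
```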

Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges

  • paper_url: http://arxiv.org/abs/2310.15274
  • repo_url: None
  • paper_authors: Eren Kurshan
  • for: Addressing three grand challenges facing AI: the energy wall, the alignment problem, and the leap from narrow AI to AGI.
  • methods: Proposes a system-design approach to all three challenges, drawing on design principles of the human brain, from the way it processes information to how it makes decisions.
  • results: Argues that system design is the missing piece in overcoming the energy wall, the alignment problem, and the leap to AGI, improving sustainability and efficiency while supporting healthy moral decision making.
    Abstract AI faces a trifecta of grand challenges: the Energy Wall, the Alignment Problem and the Leap from Narrow AI to AGI. Contemporary AI solutions consume unsustainable amounts of energy during model training and daily operations. Making things worse, the amount of computation required to train each new AI model has been doubling every 2 months since 2020, directly translating to increases in energy consumption. The leap from AI to AGI requires multiple functional subsystems operating in a balanced manner, which requires a system architecture. However, the current approach to artificial intelligence lacks system design; even though system characteristics play a key role in the human brain, from the way it processes information to how it makes decisions. Similarly, current alignment and AI ethics approaches largely ignore system design, yet studies show that the brain's system architecture plays a critical role in healthy moral decisions. In this paper, we argue that system design is critically important in overcoming all three grand challenges. We posit that system design is the missing piece in overcoming the grand challenges. We present a Systematic AI Approach for AGI that utilizes system design principles for AGI, while providing ways to overcome the energy wall and the alignment challenges.

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

  • paper_url: http://arxiv.org/abs/2310.15264
  • repo_url: None
  • paper_authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi
  • for: Surveying AI-generated text detection, which can help address misuse of generated text such as misinformation, fake news, and plagiarism.
  • methods: Reviews a wide range of existing detection approaches built on language models, deep learning, and natural language processing, alongside strategies designed to evade detection.
  • results: Identifies limitations and challenges of current detectors and poses open questions, such as improving detection effectiveness and reducing false positives.
    Abstract Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contaminating the web. To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text. The basic idea is that whenever we can tell if the given text is either written by a human or an AI, we can utilize this information to address the above-mentioned concerns. To that end, a plethora of detection frameworks have been proposed, highlighting the possibilities of AI-generated text detection. But in parallel to the development of detection frameworks, researchers have also concentrated on designing strategies to elude detection, i.e., focusing on the impossibilities of AI-generated text detection. This is a crucial step in order to make sure the detection frameworks are robust enough and it is not too easy to fool a detector. Despite the huge interest and the flurry of research in this domain, the community currently lacks a comprehensive analysis of recent developments. In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and the limitations of AI-generated text detection. To enrich the collective knowledge, we engage in an exhaustive discussion on critical and challenging open questions related to ongoing research on AI-generated text detection.

Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards

  • paper_url: http://arxiv.org/abs/2310.15259
  • repo_url: https://github.com/babangain/unsup_questions_translation
  • paper_authors: Baban Gain, Ramakrishna Appicharla, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal, Muthusamy Chelliah
  • for: This paper aims to improve the accuracy of Neural Machine Translation (NMT) for question translation in noisy environments, where the grammatical correctness of the questions is not monitored.
  • methods: The authors propose a training methodology that fine-tunes the NMT system only using source-side data, and combines BERTScore and Masked Language Model (MLM) Score to balance adequacy and fluency.
  • results: The proposed method surpasses the conventional Maximum Likelihood Estimation (MLE) based fine-tuning approach, achieving a 1.9 BLEU score improvement, and shows robustness when adding noise to the baseline, with a 1.1 BLEU improvement and large improvements on TER and BLEURT metrics. (A sketch of the reward follows the abstract below.)
    Abstract Community Question-Answering (CQA) portals serve as a valuable tool for helping users within an organization. However, making them accessible to non-English-speaking users continues to be a challenge. Translating questions can broaden the community's reach, benefiting individuals with similar inquiries in various languages. Translating questions using Neural Machine Translation (NMT) poses more challenges, especially in noisy environments, where the grammatical correctness of the questions is not monitored. These questions may be phrased as statements by non-native speakers, with incorrect subject-verb order and sometimes even missing question marks. Creating a synthetic parallel corpus from such data is also difficult due to its noisy nature. To address this issue, we propose a training methodology that fine-tunes the NMT system only using source-side data. Our approach balances adequacy and fluency by utilizing a loss function that combines BERTScore and Masked Language Model (MLM) Score. Our method surpasses the conventional Maximum Likelihood Estimation (MLE) based fine-tuning approach, which relies on synthetic target data, by achieving a 1.9 BLEU score improvement. Our model exhibits robustness while we add noise to our baseline, and still achieve 1.1 BLEU improvement and large improvements on TER and BLEURT metrics. Our proposed methodology is model-agnostic and is only necessary during the training phase. We make the codes and datasets publicly available at \url{https://www.iitp.ac.in/~ai-nlp-ml/resources.html#DomainAdapt} for facilitating further research.
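A sketch of the reference-free objective under stated assumptions: adequacy is scored with BERTScore against the source question (so no target-side reference is needed) and fluency with a masked-language-model score; `alpha` and both scoring callables are stand-ins:

```python
def question_translation_reward(source, hypothesis,
                                bertscore_fn, mlm_score_fn, alpha=0.5):
    """Balance adequacy (similarity to the source question) against
    fluency (MLM score of the hypothesis) for source-side-only tuning."""
    adequacy = bertscore_fn(hypothesis, source)
    fluency = mlm_score_fn(hypothesis)
    return alpha * adequacy + (1.0 - alpha) * fluency
```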

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

  • paper_url: http://arxiv.org/abs/2310.15239
  • repo_url: https://github.com/mismayil/crow
  • paper_authors: Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut
  • for: Evaluating the commonsense reasoning ability of natural language processing (NLP) systems in real-world task settings.
  • methods: Introduces CRoW, a manually curated multi-task benchmark built with a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations.
  • results: Evaluation on CRoW reveals a significant performance gap between existing NLP systems and humans, showing that commonsense reasoning in real-world task settings is far from solved.
    Abstract Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. We use CRoW to study how NLP systems perform across different dimensions of commonsense knowledge, such as physical, temporal, and social reasoning. We find a significant performance gap when NLP systems are evaluated on CRoW compared to humans, showcasing that commonsense reasoning is far from being solved in real-world task settings. We make our dataset and leaderboard available to the research community at https://github.com/mismayil/crow.

A new approach to template banks of gravitational waves with higher harmonics: reducing matched-filtering cost by over an order of magnitude

  • paper_url: http://arxiv.org/abs/2310.15233
  • repo_url: https://github.com/jaywadekar/gw_higher_harmonics_search
  • paper_authors: Digvijay Wadekar, Tejaswi Venumadhav, Ajit Kumar Mehta, Javier Roulet, Seth Olsen, Jonathan Mushkin, Barak Zackay, Matias Zaldarriaga
  • for: The paper aims to improve the sensitivity of gravitational wave event searches by including higher-order modes (HM) in the template banks, which are currently dominated by the quadrupole mode.
  • methods: The paper proposes a new strategy that exploits the natural connection between modes to include HM in template banks, using a combination of post-Newtonian formulae and machine learning tools to model aligned-spin waveforms (see the sketch below).
  • results: The paper shows that the proposed method can significantly reduce the matched-filtering cost of HM searches, and is generally applicable for template banks constructed with either stochastic or geometric placement techniques. Additionally, the paper discusses compression of $(2,2)$-only geometric-placement template banks using machine learning algorithms.
    Abstract Searches for gravitational wave events use models, or templates, for the signals of interest. The templates used in current searches in the LIGO-Virgo-Kagra (LVK) data model the dominant quadrupole mode $(\ell,m)=(2,2)$ of the signals, and omit sub-dominant higher-order modes (HM) such as $(\ell,m)=(3,3)$, $(4,4)$, which are predicted by general relativity. Hence, these searches could lose sensitivity to black hole mergers in interesting parts of parameter space, such as systems with high-masses and asymmetric mass ratios. We develop a new strategy to include HM in template banks that exploits the natural connection between the modes. We use a combination of post-Newtonian formulae and machine learning tools to model aligned-spin $(3,3)$, $(4,4)$ waveforms corresponding to a given $(2,2)$ waveform. Each of these modes can be individually filtered against the data to yield separate timeseries of signal-to-noise ratios (SNR), which can be combined in a relatively inexpensive way to marginalize over extrinsic parameters of the signals. This leads to a HM search pipeline whose matched-filtering cost is just $\approx 3\times$ that of a quadrupole-only search (in contrast to being $\approx\! 100 \times$, as in previously proposed HM search methods). Our method is effectual and is generally applicable for template banks constructed with either stochastic or geometric placement techniques. Additionally, we discuss compression of $(2,2)$-only geometric-placement template banks using machine learning algorithms.
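A toy illustration of why separate per-mode filtering is cheap to combine: once each mode's template has been matched-filtered into its own SNR time series, a combined statistic can be formed per template, e.g. by a quadrature sum (the paper's statistic additionally marginalizes over extrinsic parameters):

```python
import numpy as np

def combined_snr(snr22, snr33, snr44):
    """Combine complex per-mode matched-filter SNR time series into one
    detection statistic by adding the mode SNRs in quadrature."""
    return np.sqrt(np.abs(snr22) ** 2 + np.abs(snr33) ** 2 + np.abs(snr44) ** 2)
```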

Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition

  • paper_url: http://arxiv.org/abs/2310.15165
  • repo_url: https://github.com/sarapieri/fed_het
  • paper_authors: Sara Pieri, Jose Renato Restom, Samuel Horvath, Hisham Cholakkal
  • for: Investigating the performance of federated learning for visual recognition systems.
  • methods: Experimentally studies a range of modern architectural designs, including convolutional neural networks, transformers, and MLP-mixers, showing that architectural choices substantially improve federated learning performance, particularly on heterogeneous data.
  • results: Benchmarks 19 visual recognition models from five architectural families on four challenging FL datasets, re-investigates the inferior performance of convolution-based architectures in the FL setting, and analyzes the influence of normalization layers on FL performance.
    Abstract Federated Learning (FL) is a promising research paradigm that enables the collaborative training of machine learning models among various parties without the need for sensitive information exchange. Nonetheless, retaining data in individual clients introduces fundamental challenges to achieving performance on par with centrally trained models. Our study provides an extensive review of federated learning applied to visual recognition. It underscores the critical role of thoughtful architectural design choices in achieving optimal performance, a factor often neglected in the FL literature. Many existing FL solutions are tested on shallow or simple networks, which may not accurately reflect real-world applications. This practice restricts the transferability of research findings to large-scale visual recognition models. Through an in-depth analysis of diverse cutting-edge architectures such as convolutional neural networks, transformers, and MLP-mixers, we experimentally demonstrate that architectural choices can substantially enhance FL systems' performance, particularly when handling heterogeneous data. We study 19 visual recognition models from five different architectural families on four challenging FL datasets. We also re-investigate the inferior performance of convolution-based architectures in the FL setting and analyze the influence of normalization layers on the FL performance. Our findings emphasize the importance of architectural design for computer vision tasks in practical scenarios, effectively narrowing the performance gap between federated and centralized learning. Our source code is available at https://github.com/sarapieri/fed_het.git.

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

  • paper_url: http://arxiv.org/abs/2310.15164
  • repo_url: https://github.com/benlipkin/linc
  • paper_authors: Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy
  • for: This work aims to help machines perform logical reasoning more reliably, with broad potential impact on science, mathematics, and society.
  • methods: The method, LINC (Logical Inference via Neurosymbolic Computation), translates premises and conclusions into first-order logic expressions, which are then offloaded to an external symbolic theorem prover for deduction.
  • results: LINC substantially improves the logical reasoning performance of GPT-3.5 and GPT-4, especially on ProofWriter; combined with GPT-4, it outperforms Chain-of-Thought prompting on ProofWriter.
    Abstract Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing in subtle and unpredictable ways. In this work, we investigate the validity of instead reformulating such tasks as modular neurosymbolic programming, which we call LINC: Logical Inference via Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser, translating premises and conclusions from natural language to expressions in first-order logic. These expressions are then offloaded to an external theorem prover, which symbolically performs deductive inference. Leveraging this approach, we observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate. On ProofWriter, augmenting the comparatively small open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5 and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%, respectively. When used with GPT-4, LINC scores 26% higher than CoT on ProofWriter while performing comparatively on FOLIO. Further analysis reveals that although both methods on average succeed roughly equally often on this dataset, they exhibit distinct and complementary failure modes. We thus provide promising evidence for how logical reasoning over natural language can be tackled through jointly leveraging LLMs alongside symbolic provers. All corresponding code is publicly available at https://github.com/benlipkin/linc
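
To make the LINC pipeline concrete, here is a toy sketch under stated assumptions: the LLM semantic parser is stubbed out with a lookup table (`llm_to_fol` stands in for a real few-shot-prompted model), and NLTK's resolution prover plays the role of the external theorem prover. The paper's actual implementation is in the linked repository.

```python
from nltk.sem import Expression
from nltk.inference import ResolutionProver

def llm_to_fol(sentence: str) -> str:
    # Stand-in for the LLM semantic parser (e.g. GPT-4 with few-shot prompts).
    lookup = {
        "All humans are mortal.": "all x.(human(x) -> mortal(x))",
        "Socrates is a human.": "human(socrates)",
        "Socrates is mortal.": "mortal(socrates)",
    }
    return lookup[sentence]

premises = [Expression.fromstring(llm_to_fol(s))
            for s in ["All humans are mortal.", "Socrates is a human."]]
conclusion = Expression.fromstring(llm_to_fol("Socrates is mortal."))

# The symbolic step: deduction is offloaded to an external prover.
print(ResolutionProver().prove(conclusion, premises))  # True
```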

Linear Representations of Sentiment in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15154
  • repo_url: https://github.com/curt-tigges/eliciting-latent-sentiment
  • paper_authors: Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda
  • for: This study investigates how sentiment is represented inside large language models.
  • methods: By analyzing the activation spaces of a range of models, the authors find that sentiment is represented along a single linear direction, with one extreme for positive and the other for negative. Causal interventions isolate this direction and show it is causally relevant across tasks, including real-world datasets such as the Stanford Sentiment Treebank.
  • results: Sentiment is not only represented on emotionally charged words but is also summarized at intermediate positions without inherent sentiment, such as punctuation and names. In zero-shot classification on the Stanford Sentiment Treebank, ablating the sentiment direction removes 76% of above-chance accuracy, nearly half of which (36%) comes from ablating the summarized sentiment direction at comma positions alone.
    Abstract Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through causal interventions, we isolate this direction and show it is causally relevant in both toy tasks and real world datasets such as Stanford Sentiment Treebank. Through this case study we model a thorough investigation of what a single direction means on a broad data distribution. We further uncover the mechanisms that involve this direction, highlighting the roles of a small subset of attention heads and neurons. Finally, we discover a phenomenon which we term the summarization motif: sentiment is not solely represented on emotionally charged words, but is additionally summarized at intermediate positions without inherent sentiment, such as punctuation and names. We show that in Stanford Sentiment Treebank zero-shot classification, 76% of above-chance classification accuracy is lost when ablating the sentiment direction, nearly half of which (36%) is due to ablating the summarized sentiment direction exclusively at comma positions.
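
The core intervention is easy to state in code. The sketch below is a minimal illustration rather than the authors' implementation (see the linked repository): it projects a learned sentiment direction out of residual-stream activations, and applying it only at comma positions is how the summarization effect above can be isolated. Here `acts` and `direction` are assumed tensors; in the paper the direction is found with techniques such as probing.

```python
import torch

def ablate_direction(acts: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `acts` (..., d_model) along `direction`."""
    d = direction / direction.norm()       # unit sentiment direction
    coeff = acts @ d                       # scalar projection per position
    return acts - coeff.unsqueeze(-1) * d  # activations minus that component
```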

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

  • paper_url: http://arxiv.org/abs/2310.15151
  • repo_url: None
  • paper_authors: Sophie Hao, Tal Linzen
  • for: This paper examines whether linguistic features in Transformer architectures have interpretable representations.
  • methods: The authors use causal intervention analysis to show that BERT's verb conjugation in fact relies on an interpretable linear encoding.
  • results: BERT conjugates verbs using a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. The encoding appears at the subject position in the first layer and the verb position in the last layer, and is distributed across positions in middle layers, particularly when there are multiple cues to subject number.
    Abstract Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.

Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.15145
  • repo_url: https://github.com/Hermannovski/React
  • paper_authors: Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn
  • for: This work aims to let robots adapt to new tasks quickly, without extensive human intervention.
  • methods: The method leverages data and models from the Internet, pre-training a multi-task manipulation policy and fine-tuning it online. It combines calibrated offline reinforcement learning techniques with a pre-trained vision-language model that autonomously provides reward signals.
  • results: On five real-robot manipulation tasks, the method learns and improves a target task autonomously within 3 hours; in simulation it also outperforms prior works that use different RL algorithms or different approaches for predicting rewards.
    Abstract The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet. However, reinforcement learning often requires significant human effort in the form of manual reward specification or environment resets, even if the policy is pre-trained. We introduce RoboFuME, a reset-free fine-tuning system that pre-trains a multi-task manipulation policy from diverse datasets of prior experiences and self-improves online to learn a target task with minimal human intervention. Our insights are to utilize calibrated offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy in the presence of distribution shifts and leverage pre-trained vision language models (VLMs) to build a robust reward classifier for autonomously providing reward signals during the online fine-tuning process. In a diverse set of five real robot manipulation tasks, we show that our method can incorporate data from an existing robot dataset collected at a different institution and improve on a target task within as little as 3 hours of autonomous real-world experience. We also demonstrate in simulation experiments that our method outperforms prior works that use different RL algorithms or different approaches for predicting rewards. Project website: https://robofume.github.io
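
As an illustration of the reward-classifier idea (a hedged sketch, not RoboFuME's actual classifier, which is fine-tuned on robot data), one can score a camera frame against success and failure text with an off-the-shelf CLIP model and read off a success probability:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def vlm_reward(frame: Image.Image, task: str) -> float:
    """Return P(success) for a frame given a task description string."""
    texts = [f"a robot that has {task}", f"a robot that has not {task}"]
    inputs = processor(text=texts, images=frame,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()     # probability of success
```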

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15140
  • repo_url: None
  • paper_authors: Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
  • for: This work red-teams the safety alignment of large language models (LLMs), studying how manual jailbreak attacks and automatic adversarial attacks can compromise it.
  • methods: Two attack families are considered: manually crafted jailbreak prompts and automatically generated adversarial prompts. Both can break LLM safety alignment, but the proposed automatic attack generates prompts that remain interpretable and are therefore better at evading detection.
  • results: The proposed AutoDAN attack bypasses perplexity-based filters while maintaining a high success rate, and transfers better than unreadable adversarial prompts when using limited training data or a single proxy model. The authors also customize AutoDAN's objective to leak system prompts, a jailbreak application not previously addressed in the adversarial attack literature.
    Abstract Safety alignment of Large Language Models (LLMs) can be compromised with manual jailbreak attacks and (automatic) adversarial attacks. Recent work suggests that patching LLMs against these attacks is possible: manual jailbreak attacks are human-readable but often limited and public, making them easy to block; adversarial attacks generate gibberish prompts that can be detected using perplexity-based filters. In this paper, we show that these solutions may be too optimistic. We propose an interpretable adversarial attack, \texttt{AutoDAN}, that combines the strengths of both types of attacks. It automatically generates attack prompts that bypass perplexity-based filters while maintaining a high attack success rate like manual jailbreak attacks. These prompts are interpretable and diverse, exhibiting strategies commonly used in manual jailbreak attacks, and transfer better than their non-readable counterparts when using limited training data or a single proxy model. We also customize \texttt{AutoDAN}'s objective to leak system prompts, another jailbreak application not addressed in the adversarial attack literature. Our work provides a new way to red-team LLMs and to understand the mechanism of jailbreak attacks.
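
For context, the perplexity-based filter that readable attacks like AutoDAN are designed to bypass can be sketched in a few lines; the threshold below is illustrative, not a recommended value.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under a small reference LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

def passes_filter(prompt: str, threshold: float = 1000.0) -> bool:
    # Gibberish adversarial suffixes score far above natural text.
    return perplexity(prompt) < threshold
```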

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15127
  • repo_url: None
  • paper_authors: Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
  • for: This paper addresses how embodied agents can execute human language instructions, proposing an assistive agent called HELPER.
  • methods: It uses a pre-trained, frozen large language model (LLM), prompted with appropriate few-shot examples retrieved from an external memory, to map scene-rearrangement instructions to programs over the robot's visuomotor skills.
  • results: HELPER sets a new state of the art on the TEACh benchmark, with a 1.7x improvement over the previous SOTA on Trajectory from Dialogue (TfD).
    Abstract Pre-trained and frozen LLMs can effectively map simple scene re-arrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. To parse open-domain natural language and adapt to a user's idiosyncratic procedures, not known during prompt engineering time, fixed prompts fall short. In this paper, we introduce HELPER, an embodied agent equipped with an external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting: relevant memories are retrieved based on the current dialogue, instruction, correction or VLM description, and used as in-context prompt examples for LLM querying. The memory is expanded during deployment to include pairs of user's language and action plans, to assist future inferences and personalize them to the user's language and routines. HELPER sets a new state-of-the-art in the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD), with 1.7x improvement over the previous SOTA for TfD. Our models, code and video results can be found in our project's website: https://helper-agent-llm.github.io.
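
A minimal sketch of the retrieval-augmented prompting loop described above, with illustrative names (`embed`, the memory layout) rather than HELPER's actual interfaces:

```python
import numpy as np

def retrieve(query_vec, memory, k=3):
    """memory: list of (embedding, language, program) triples."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(memory, key=lambda m: cos(query_vec, m[0]), reverse=True)
    return ranked[:k]

def build_prompt(utterance, embed, memory):
    """Splice the nearest stored pairs into the LLM prompt as examples."""
    examples = retrieve(embed(utterance), memory)
    shots = "\n\n".join(f"User: {lang}\nProgram: {prog}"
                        for _, lang, prog in examples)
    return f"{shots}\n\nUser: {utterance}\nProgram:"

# After execution, (utterance, generated_program) can be appended to
# `memory`, mirroring how HELPER personalizes itself at deployment.
```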

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

  • paper_url: http://arxiv.org/abs/2310.15123
  • repo_url: None
  • paper_authors: Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li
  • for: This work aims to improve large language model (LLM) performance on multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or weighing multiple aspects and criteria.
  • methods: It proposes Branch-Solve-Merge (BSM), whose branch, solve, and merge modules plan a decomposition of a task into parallel sub-tasks, solve them independently, and fuse the solutions.
  • results: BSM improves LLM evaluation correctness and consistency while reducing length and pairwise position biases; on constrained story generation, it improves story coherence while raising constraint satisfaction by 12%.
    Abstract Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Model program (Schlag et al., 2023) for tackling such challenging natural language tasks. It consists of branch, solve, and merge modules that are parameterized with specific prompts to the base LLM. These three modules plan a decomposition of the task into multiple parallel sub-tasks, independently solve them, and fuse the solutions to the sub-tasks. We apply our method to the tasks of LLM response evaluation and constrained text generation and evaluate its effectiveness with multiple LLMs, including Vicuna, LLaMA-2-chat, and GPT-4. BSM improves the evaluation correctness and consistency for each LLM by enhancing human-LLM agreement by up to 26%, reducing length and pairwise position biases by up to 50%, and allowing LLaMA-2-chat to match or outperform GPT-4 on most domains. On the constraint story generation task, BSM improves the coherence of the stories while also improving constraint satisfaction by 12%.
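
Schematically, the three modules compose as below; the prompts and the `llm` callable are placeholders, not the paper's parameterizations.

```python
def branch_solve_merge(task: str, llm) -> str:
    """llm: any callable mapping a prompt string to a completion string."""
    # Branch: plan a decomposition into independent sub-tasks.
    plan = llm("Decompose this task into independent sub-tasks, "
               f"one per line:\n{task}")
    subtasks = [line for line in plan.splitlines() if line.strip()]
    # Solve: handle each sub-task independently.
    solutions = [llm(f"Solve this sub-task:\n{sub}") for sub in subtasks]
    # Merge: fuse the sub-task solutions into one final answer.
    return llm("Fuse these sub-task solutions into one final answer:\n"
               + "\n---\n".join(solutions))
```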

Modeling Path Importance for Effective Alzheimer’s Disease Drug Repurposing

  • paper_url: http://arxiv.org/abs/2310.15211
  • repo_url: None
  • paper_authors: Shunian Xiang, Patrick J. Lawrence, Bo Peng, ChienWei Chiang, Dokyoon Kim, Li Shen, Xia Ning
  • for: This paper develops a network-based method for Alzheimer's disease (AD) drug repurposing that leverages complex interaction networks to identify candidate drugs more effectively.
  • methods: The proposed method, MPI (Modeling Path Importance), prioritizes important paths via learned node embeddings, which capture a network's rich structural information and let the model differentiate the importance of paths rather than treating same-length paths as equally important.
  • results: Compared with a shortest-path baseline, MPI prioritizes 20.0% more drugs with anti-AD evidence among the top-50 ranked candidates. Cox proportional-hazard models built from insurance claims data further suggest that etodolac, nicotine, and BBB-crossing ACE-INHs are associated with reduced AD risk, making them candidates worth exploring for repurposing.
    Abstract Recently, drug repurposing has emerged as an effective and resource-efficient paradigm for AD drug discovery. Among various methods for drug repurposing, network-based methods have shown promising results as they are capable of leveraging complex networks that integrate multiple interaction types, such as protein-protein interactions, to more effectively identify candidate drugs. However, existing approaches typically assume paths of the same length in the network have equal importance in identifying the therapeutic effect of drugs. Other domains have found that same length paths do not necessarily have the same importance. Thus, relying on this assumption may be deleterious to drug repurposing attempts. In this work, we propose MPI (Modeling Path Importance), a novel network-based method for AD drug repurposing. MPI is unique in that it prioritizes important paths via learned node embeddings, which can effectively capture a network's rich structural information. Thus, leveraging learned embeddings allows MPI to effectively differentiate the importance among paths. We evaluate MPI against a commonly used baseline method that identifies anti-AD drug candidates primarily based on the shortest paths between drugs and AD in the network. We observe that among the top-50 ranked drugs, MPI prioritizes 20.0% more drugs with anti-AD evidence compared to the baseline. Finally, Cox proportional-hazard models produced from insurance claims data aid us in identifying the use of etodolac, nicotine, and BBB-crossing ACE-INHs as having a reduced risk of AD, suggesting such drugs may be viable candidates for repurposing and should be explored further in future studies.

Causal Inference Using LLM-Guided Discovery

  • paper_url: http://arxiv.org/abs/2310.15117
  • repo_url: None
  • paper_authors: Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar, Saketh Bachu, Vineeth N Balasubramanian, Amit Sharma
  • for: This paper focuses on developing a method to determine reliable causal graphs solely based on observational data, which is a challenging task in causal inference.
  • methods: The authors propose using large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to obtain causal order, which is easier to elicit from domain experts than graph edges. They employ different prompting strategies and contextual cues to build a robust technique for obtaining causal order from LLMs.
  • results: The approach significantly improves causal ordering accuracy compared to established causal discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
    Abstract At the core of causal inference lies the challenge of determining reliable causal graphs solely based on observational data. Since the well-known backdoor criterion depends on the graph, any errors in the graph can propagate downstream to effect inference. In this work, we initially show that complete graph information is not necessary for causal effect inference; the topological order over graph variables (causal order) alone suffices. Further, given a node pair, causal order is easier to elicit from domain experts compared to graph edges since determining the existence of an edge can depend extensively on other variables. Interestingly, we find that the same principle holds for Large Language Models (LLMs) such as GPT-3.5-turbo and GPT-4, motivating an automated method to obtain causal order (and hence causal effect) with LLMs acting as virtual domain experts. To this end, we employ different prompting strategies and contextual cues to propose a robust technique of obtaining causal order from LLMs. Acknowledging LLMs' limitations, we also study possible techniques to integrate LLMs with established causal discovery algorithms, including constraint-based and score-based methods, to enhance their performance. Extensive experiments demonstrate that our approach significantly improves causal ordering accuracy as compared to discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
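
A toy version of eliciting a causal order with pairwise LLM queries and topologically sorting the answers might look like this; the `llm` callable and the prompt wording are our assumptions, and the paper studies richer prompting strategies plus integration with discovery algorithms.

```python
from itertools import combinations
import networkx as nx

def causal_order(variables, llm):
    """Query the LLM for each pair, then topologically sort the answers."""
    g = nx.DiGraph()
    g.add_nodes_from(variables)
    for a, b in combinations(variables, 2):
        ans = llm(f"Which is more plausible: (A) '{a}' causes '{b}', "
                  f"or (B) '{b}' causes '{a}'? Answer A or B.")
        if ans.strip().upper().startswith("A"):
            g.add_edge(a, b)  # a precedes b in the causal order
        else:
            g.add_edge(b, a)
    # Raises NetworkXUnfeasible if the LLM's answers are cyclic.
    return list(nx.topological_sort(g))
```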

The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills

  • paper_url: http://arxiv.org/abs/2310.15112
  • repo_url: None
  • paper_authors: Qingxiao Zheng, Yun Huang
  • for: This study explores how AI-generated digital self-clones affect self-perception and improve online presentation skills.
  • methods: A mixed-design experiment with 44 international students compared self-recorded videos (control) with AI self-clone videos for English presentation practice. The AI videos used voice cloning, face swapping, lip-sync, and body-language simulation to refine repetition, filler words, and pronunciation in the participants' original presentations.
  • results: Machine-rated scores improved for both groups, with no significant between-group difference, but the AI group showed deeper reflection, greater self-compassion, and a meaningful shift from a corrective to an enhancive style of self-critique. Within the AI group, congruence between self-perception and the AI self-clone reduced speech anxiety and increased enjoyment. The findings support the ethical use of digital self-clones for the emotional and cognitive facets of skill development.
    Abstract This study explores the impact of AI-generated digital self-clones on improving online presentation skills. We carried out a mixed-design experiment involving 44 international students, comparing self-recorded videos (control) with self-clone videos (AI group) for English presentation practice. The AI videos utilized voice cloning, face swapping, lip-sync, and body-language simulation to refine participants' original presentations in terms of repetition, filler words, and pronunciation. Machine-rated scores indicated enhancements in speech performance for both groups. Though the groups didn't significantly differ, the AI group exhibited a heightened depth of reflection, self-compassion, and a meaningful transition from a corrective to an enhancive approach to self-critique. Within the AI group, congruence between self-perception and AI self-clones resulted in diminished speech anxiety and increased enjoyment. Our findings recommend the ethical employment of digital self-clones to enhance the emotional and cognitive facets of skill development.

Dual-path convolutional neural network using micro-FTIR imaging to predict breast cancer subtypes and biomarkers levels: estrogen receptor, progesterone receptor, HER2 and Ki67

  • paper_url: http://arxiv.org/abs/2310.15099
  • repo_url: None
  • paper_authors: Matheus del-Valle, Emerson Soares Bernardes, Denise Maria Zezell
  • for: This study develops a deep-learning approach to 2D micro-FTIR imaging for breast cancer evaluation, aiming to make diagnosis more accurate and faster.
  • methods: Sixty micro-FTIR images were collected from a human breast biopsy microarray; K-means clustering removed non-tissue pixels, 32x32 patches were generated with a fully automated approach, and the CaReNet-V2 convolutional neural network was developed to classify breast cancer vs. adjacent tissue and molecular subtypes, and to predict biomarker levels.
  • results: Test accuracies for cancer vs. adjacent tissue and for molecular subtype were above 0.84, and the model predicted ER, PR, and HER2 levels, with lower performance at borderline values (minimum accuracy 0.54). Ki67 percentage regression showed a mean error of 3.6%. CaReNet-V2 is therefore a promising screening technique for evaluating breast cancer biopsies and helping to prioritize patients.
    Abstract Breast cancer molecular subtypes classification plays an import role to sort patients with divergent prognosis. The biomarkers used are Estrogen Receptor (ER), Progesterone Receptor (PR), HER2, and Ki67. Based on these biomarkers expression levels, subtypes are classified as Luminal A (LA), Luminal B (LB), HER2 subtype, and Triple-Negative Breast Cancer (TNBC). Immunohistochemistry is used to classify subtypes, although interlaboratory and interobserver variations can affect its accuracy, besides being a time-consuming technique. The Fourier transform infrared micro-spectroscopy may be coupled with deep learning for cancer evaluation, where there is still a lack of studies for subtypes and biomarker levels prediction. This study presents a novel 2D deep learning approach to achieve these predictions. Sixty micro-FTIR images of 320x320 pixels were collected from a human breast biopsies microarray. Data were clustered by K-means, preprocessed and 32x32 patches were generated using a fully automated approach. CaReNet-V2, a novel convolutional neural network, was developed to classify breast cancer (CA) vs adjacent tissue (AT) and molecular subtypes, and to predict biomarkers level. The clustering method enabled to remove non-tissue pixels. Test accuracies for CA vs AT and subtype were above 0.84. The model enabled the prediction of ER, PR, and HER2 levels, where borderline values showed lower performance (minimum accuracy of 0.54). Ki67 percentage regression demonstrated a mean error of 3.6%. Thus, CaReNet-V2 is a potential technique for breast cancer biopsies evaluation, standing out as a screening analysis technique and helping to prioritize patients.
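
As a sketch of the K-means step that removes non-tissue pixels before patch extraction (the cluster count and the tissue-selection heuristic are our assumptions based on the abstract):

```python
import numpy as np
from sklearn.cluster import KMeans

def tissue_mask(hyper_img: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """hyper_img: (H, W, n_wavenumbers) micro-FTIR cube -> boolean mask."""
    h, w, bands = hyper_img.shape
    spectra = hyper_img.reshape(-1, bands)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(spectra)
    # Assumption: the cluster with higher mean absorbance is tissue.
    tissue = max(range(n_clusters),
                 key=lambda c: spectra[labels == c].mean())
    return (labels == tissue).reshape(h, w)
```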

Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data

  • paper_url: http://arxiv.org/abs/2310.15098
  • repo_url: https://github.com/johnson111788/drag-drop
  • paper_authors: Yu-Cheng Chou, Bowen Li, Deng-Ping Fan, Alan Yuille, Zongwei Zhou
  • for: To improve automated tumor detection and localization in medical imaging, which requires large-scale, well-annotated datasets for training AI algorithms.
  • methods: The paper proposes a new annotation strategy, Drag&Drop, which reduces annotation to a drag-and-drop gesture and is particularly efficient for temporal and volumetric imaging; a novel weakly supervised learning method based on the watershed algorithm exploits these annotations.
  • results: Experiments show that Drag&Drop annotations yield better detection and localization performance than alternative weak annotations and approach the performance of detailed per-pixel annotations; with limited resources, weak annotations from a diverse patient population produce models more robust to unseen images than per-pixel annotations on a small image set.
    Abstract Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both applications require significant effort and time for pixel-wise annotation due to the high dimensional nature of the data, involving either temporary or spatial dimensions. In this paper, we develop a new annotation strategy, termed Drag&Drop, which simplifies the annotation process to drag and drop. This annotation strategy is more efficient, particularly for temporal and volumetric imaging, than other types of weak annotations, such as per-pixel, bounding boxes, scribbles, ellipses, and points. Furthermore, to exploit our Drag&Drop annotations, we develop a novel weakly supervised learning method based on the watershed algorithm. Experimental results show that our method achieves better detection and localization performance than alternative weak annotations and, more importantly, achieves similar performance to that trained on detailed per-pixel annotations. Interestingly, we find that, with limited resources, allocating weak annotations from a diverse patient population can foster models more robust to unseen images than allocating per-pixel annotations for a small set of images. In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities.
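
The watershed-based use of a Drag&Drop annotation can be sketched as follows; how the drag endpoints seed the transform is our reading of the abstract, so treat the details as assumptions (the authors' code is in the linked repository).

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def dragdrop_to_mask(image: np.ndarray, start: tuple, stop: tuple) -> np.ndarray:
    """Grow a tumor mask from a drag origin and a drop point on a 2D slice."""
    markers = np.zeros(image.shape, dtype=np.int32)
    markers[start] = 1                     # assumed inside the tumor
    markers[stop] = 2                      # assumed background seed
    gradient = ndi.gaussian_gradient_magnitude(image.astype(float), sigma=2)
    labels = watershed(gradient, markers)  # flood from both seeds
    return labels == 1                     # tumor region
```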

One-dimensional convolutional neural network model for breast cancer subtypes classification and biochemical content evaluation using micro-FTIR hyperspectral images

  • paper_url: http://arxiv.org/abs/2310.15094
  • repo_url: None
  • paper_authors: Matheus del-Valle, Emerson Soares Bernardes, Denise Maria Zezell
  • for: This study develops a deep-learning tool to evaluate breast cancer molecular subtypes and biochemical content.
  • methods: It couples Fourier transform infrared micro-spectroscopy with a 1D convolutional neural network (CaReNet-V1) and a 1D adaptation of Grad-CAM to classify breast cancer tissue and assess the biochemical contribution to the predictions.
  • results: The tool classifies cancer vs. adjacent tissue (test accuracy 0.89) and the HER2 and TNBC subtypes (0.83 and 0.86), with greater difficulty for Luminal A and Luminal B (0.74 and 0.68), and identifies the wavenumbers contributing most to each prediction, giving a direct relationship with biochemical content.
    Abstract Breast cancer treatment still remains a challenge, where molecular subtypes classification plays a crucial role in selecting appropriate and specific therapy. The four subtypes are Luminal A (LA), Luminal B (LB), HER2 subtype, and Triple-Negative Breast Cancer (TNBC). Immunohistochemistry is the gold-standard evaluation, although interobserver variations are reported and molecular signatures identification is time-consuming. Fourier transform infrared micro-spectroscopy with machine learning approaches have been used to evaluate cancer samples, presenting biochemical-related explainability. However, this explainability is harder when using deep learning. This study created a 1D deep learning tool for breast cancer subtype evaluation and biochemical contribution. Sixty hyperspectral images were acquired from a human breast cancer microarray. K-Means clustering was applied to select tissue and paraffin spectra. CaReNet-V1, a novel 1D convolutional neural network, was developed to classify breast cancer (CA) and adjacent tissue (AT), and molecular subtypes. A 1D adaptation of Grad-CAM was applied to assess the biochemical impact to the classifications. CaReNet-V1 effectively classified CA and AT (test accuracy of 0.89), as well as HER2 and TNBC subtypes (0.83 and 0.86), with greater difficulty for LA and LB (0.74 and 0.68). The model enabled the evaluation of the most contributing wavenumbers to the predictions, providing a direct relationship with the biochemical content. Therefore, CaReNet-V1 and hyperspectral images is a potential approach for breast cancer biopsies assessment, providing additional information to the pathology report. Biochemical content impact feature may be used for other studies, such as treatment efficacy evaluation and development new diagnostics and therapeutic methods.
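
A generic 1D Grad-CAM sketch in PyTorch shows how per-wavenumber contributions can be computed; CaReNet-V1 itself is not public, so `model` and `target_layer` are placeholders.

```python
import torch

def grad_cam_1d(model, target_layer, spectrum, class_idx):
    """spectrum: (channels, length) tensor; returns a (length',) heatmap."""
    feats, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: feats.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(a=go[0]))
    logits = model(spectrum.unsqueeze(0))       # add batch dim -> (1, n_classes)
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()
    a, g = feats["a"][0], grads["a"][0]         # (channels, length')
    weights = g.mean(dim=1, keepdim=True)       # global average pool over positions
    cam = torch.relu((weights * a).sum(dim=0))  # weighted sum of feature maps
    return cam / (cam.max() + 1e-8)             # per-position contribution
```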

MGAS: Multi-Granularity Architecture Search for Effective and Efficient Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15074
  • repo_url: None
  • paper_authors: Xiaoyun Liu, Divya Saxena, Jiannong Cao, Yuqing Zhao, Penghui Ruan
  • for: This work aims at an effective and efficient neural architecture search method that balances model performance against model size.
  • methods: The multi-granularity architecture search (MGAS) framework learns discretization functions specific to each granularity level to adaptively determine the remaining ratios of units, optimizing the model at multiple granularity levels at once. To control memory consumption, super-net optimization and discretization are broken into multiple sub-net stages, with progressive re-evaluation to compensate for the bias introduced by the greedy staging.
  • results: Experiments on CIFAR-10, CIFAR-100, and ImageNet show that MGAS outperforms other state-of-the-art methods, achieving a better trade-off between model performance and model size.
    Abstract Differentiable architecture search (DAS) revolutionizes neural architecture search (NAS) with time-efficient automation, transitioning from discrete candidate sampling and evaluation to differentiable super-net optimization and discretization. However, existing DAS methods either only conduct coarse-grained operation-level search or manually define the remaining ratios for fine-grained kernel-level and weight-level units, which fail to simultaneously optimize model size and model performance. Furthermore, these methods compromise search quality to reduce memory consumption. To tackle these issues, we introduce multi-granularity architecture search (MGAS), a unified framework which aims to comprehensively and memory-efficiently explore the multi-granularity search space to discover both effective and efficient neural networks. Specifically, we learn discretization functions specific to each granularity level to adaptively determine the remaining ratios according to the evolving architecture. This ensures an optimal balance among units of different granularity levels for different target model sizes. Considering the memory demands, we break down the super-net optimization and discretization into multiple sub-net stages. Nevertheless, the greedy nature of this approach may introduce bias in the early stages. To compensate for the bias, we propose progressive re-evaluation to allow for re-pruning and regrowing of previous units during subsequent stages. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.

Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents

  • paper_url: http://arxiv.org/abs/2310.15065
  • repo_url: None
  • paper_authors: Qingxiao Zheng, Zhongwei Xu, Abhinav Choudhary, Yuting Chen, Yongming Li, Yun Huang
  • for: This empirical study serves as a primer for service providers deciding if and how to integrate Large Language Model (LLM) technology for their practitioners and the broader community.
  • methods: Using CoAGent, a service co-creation tool with LLM-based agents, the authors ran a three-stage participatory design process with 23 domain experts from public libraries across the U.S., examining the mutual learning journey of non-AI experts and AI and the fundamental challenges of integrating AI into human workflows.
  • results: The study distills 23 actionable heuristics for service co-creation with AI, highlighting the nuanced shared responsibilities between humans and AI, and identifies 9 foundational agency aspects for AI, including ownership, fair treatment, and freedom of expression. Treating AI as a key stakeholder and using AI-AI interaction to uncover blind spots enriches the participatory design model, paving the way for synergistic, ethical human-AI co-creation in service contexts.
    Abstract This empirical study serves as a primer for interested service providers to determine if and how Large Language Models (LLMs) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design processes, we work with with 23 domain experts from public libraries across the U.S., uncovering their fundamental challenges of integrating AI into human workflows. Our findings provide 23 actionable "heuristics for service co-creation with AI", highlighting the nuanced shared responsibilities between humans and AI. We further exemplar 9 foundational agency aspects for AI, emphasizing essentials like ownership, fair treatment, and freedom of expression. Our innovative approach enriches the participatory design model by incorporating AI as crucial stakeholders and utilizing AI-AI interaction to identify blind spots. Collectively, these insights pave the way for synergistic and ethical human-AI co-creation in service contexts, preparing for workforce ecosystems where AI coexists.

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models

  • paper_url: http://arxiv.org/abs/2310.15061
  • repo_url: https://github.com/shin-ee-chen/bla
  • paper_authors: Xinyi Chen, Raquel Fernández, Sandro Pezzelle
  • for: This paper tests how well pre-trained multimodal models understand basic language abilities.
  • methods: It introduces BLA, a novel, automatically constructed benchmark that evaluates multimodal models on basic linguistic constructions: active-passive voice, coordination, and relative clauses.
  • results: Most tested models improve only marginally in the zero-shot setting, while the generative BLIP2 shows a promising trend, especially in the in-context learning setting.
    Abstract Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent they handle basic linguistic constructions -- active-passive voice, coordination, and relative clauses -- that even preschool children can typically master. We present BLA, a novel, automatically constructed benchmark to evaluate multimodal models on these Basic Language Abilities. We show that different types of Transformer-based systems, such as CLIP, ViLBERT, and BLIP2, generally struggle with BLA in a zero-shot setting, in line with previous findings. Our experiments, in particular, show that most of the tested models only marginally benefit when fine-tuned or prompted with construction-specific samples. Yet, the generative BLIP2 shows promising trends, especially in an in-context learning setting. This opens the door to using BLA not only as an evaluation benchmark but also to improve models' basic language abilities.

Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models

  • paper_url: http://arxiv.org/abs/2310.15059
  • repo_url: None
  • paper_authors: Iman Nematollahi, Kirill Yankov, Wolfram Burgard, Tim Welschehold
  • for: To improve the adaptation and generalization of robotic manipulation systems in real-world settings.
  • methods: The approach integrates imitation and reinforcement paradigms into hybrid skill models, grounding each learned skill in the scene through a learned 3D keypoint (Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models, KIS-GMM).
  • results: Rigorous evaluations in simulated and real-world environments show significant zero-shot generalization to novel environments and faster skill refinement in target environments than learning from scratch, without requiring new ground-truth data.
    Abstract A long-standing challenge for a robotic manipulation system operating in real-world scenarios is adapting and generalizing its acquired motor skills to unseen environments. We tackle this challenge employing hybrid skill models that integrate imitation and reinforcement paradigms, to explore how the learning and adaptation of a skill, along with its core grounding in the scene through a learned keypoint, can facilitate such generalization. To that end, we develop Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models (KIS-GMM) approach that learns to predict the reference of a dynamical system within the scene as a 3D keypoint, leveraging visual observations obtained by the robot's physical interactions during skill learning. Through conducting comprehensive evaluations in both simulated and real-world environments, we show that our method enables a robot to gain a significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch. Importantly, this is achieved without the need for new ground truth data. Moreover, our method effectively copes with scene displacements.

Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators

  • paper_url: http://arxiv.org/abs/2310.15055
  • repo_url: https://github.com/jiannan-xu/emnlp23_fair_explanation
  • paper_authors: Tin Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé III, Marine Carpuat
  • for: This paper develops a novel evaluation method for "fair explanations" in AI systems, specifically in the context of content moderation of potential hate speech.
  • methods: The authors propose combining several metrics (mental discomfort, stereotype activation, and perceived workload) to evaluate the psychological impact of explanations on different user groups, applying the method to content moderation of potential hate speech with saliency maps and counterfactual explanations as the explanation approaches.
  • results: Saliency maps generally perform better and show less evidence of disparate impact and individual unfairness than counterfactual explanations, suggesting that saliency maps may be a more effective and fair approach to explanation in this context.
    Abstract Recent research at the intersection of AI explainability and fairness has focused on how explanations can improve human-plus-AI task performance as assessed by fairness measures. We propose to characterize what constitutes an explanation that is itself "fair" -- an explanation that does not adversely impact specific populations. We formulate a novel evaluation method of "fair explanations" using not just accuracy and label time, but also psychological impact of explanations on different user groups across many metrics (mental discomfort, stereotype activation, and perceived workload). We apply this method in the context of content moderation of potential hate speech, and its differential impact on Asian vs. non-Asian proxy moderators, across explanation approaches (saliency map and counterfactual explanation). We find that saliency maps generally perform better and show less evidence of disparate impact (group) and individual unfairness than counterfactual explanations. Content warning: This paper contains examples of hate speech and racially discriminatory language. The authors do not support such content. Please consider your risk of discomfort carefully before continuing reading!

TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications Knowledge

  • paper_url: http://arxiv.org/abs/2310.15051
  • repo_url: https://github.com/netop-team/teleqna
  • paper_authors: Ali Maatouk, Fadhel Ayed, Nicola Piovesan, Antonio De Domenico, Merouane Debbah, Zhi-Quan Luo
  • for: The paper is written to evaluate the knowledge of Large Language Models (LLMs) in telecommunications and to provide a benchmark dataset for assessing their capabilities.
  • methods: The paper uses an automated question generation framework to create a dataset of 10,000 questions and answers related to telecommunications, drawing from diverse sources such as standards and research articles. Human input was integrated at various stages to ensure the quality of the questions.
  • results: The paper evaluates the capabilities of LLMs, including GPT-3.5 and GPT-4, using the provided dataset. The results show that these models struggle with complex standards-related questions but perform well on general telecom-related inquiries. Incorporating telecom knowledge context significantly enhances their performance, highlighting the need for a specialized telecom foundation model. The paper also compares the performance of LLMs with active telecom professionals, showing that LLMs can rival the performance of humans in telecom knowledge.
    Abstract We introduce TeleQnA, the first benchmark dataset designed to evaluate the knowledge of Large Language Models (LLMs) in telecommunications. Comprising 10,000 questions and answers, this dataset draws from diverse sources, including standards and research articles. This paper outlines the automated question generation framework responsible for creating this dataset, along with how human input was integrated at various stages to ensure the quality of the questions. Afterwards, using the provided dataset, an evaluation is conducted to assess the capabilities of LLMs, including GPT-3.5 and GPT-4. The results highlight that these models struggle with complex standards related questions but exhibit proficiency in addressing general telecom-related inquiries. Additionally, our results showcase how incorporating telecom knowledge context significantly enhances their performance, thus shedding light on the need for a specialized telecom foundation model. Finally, the dataset is shared with active telecom professionals, whose performance is subsequently benchmarked against that of the LLMs. The findings illustrate that LLMs can rival the performance of active professionals in telecom knowledge, thanks to their capacity to process vast amounts of information, underscoring the potential of LLMs within this domain. The dataset has been made publicly accessible on GitHub.

Meta- (out-of-context) learning in neural networks

  • paper_url: http://arxiv.org/abs/2310.15047
  • repo_url: https://github.com/krasheninnikov/internalization
  • paper_authors: Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger
  • for: This work investigates an out-of-context counterpart to in-context learning in large language models (LLMs) and how it manifests in different settings.
  • methods: The authors establish the phenomenon, which they call meta-out-of-context learning (meta-OCL), through carefully designed synthetic experiments with LLMs, and propose two hypotheses for its emergence: one based on how models store knowledge in their parameters, and another based on the implicit gradient alignment bias of gradient-descent-based optimizers.
  • results: Meta-OCL leads LLMs to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful, such as true statements or text from authoritative sources, and to use it in appropriate circumstances. The phenomenon is also demonstrated in a synthetic computer vision setting.
    Abstract Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs). We establish the existence of a phenomenon we call meta-out-of-context learning (meta-OCL) via carefully designed synthetic experiments with LLMs. Our results suggest that meta-OCL leads LLMs to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and use it in appropriate circumstances. We further demonstrate meta-OCL in a synthetic computer vision setting, and propose two hypotheses for the emergence of meta-OCL: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based optimizers may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks. Our code can be found at https://github.com/krasheninnikov/internalization.

A Universal Anti-Spoofing Approach for Contactless Fingerprint Biometric Systems

  • paper_url: http://arxiv.org/abs/2310.15044
  • repo_url: None
  • paper_authors: Banafsheh Adami, Sara Tehranipoor, Nasser Nasrabadi, Nima Karimian
  • for: This study aims to secure contactless fingerprint recognition against various presentation attack instruments (PAI).
  • methods: A universal presentation attack detection method is proposed: synthetic contactless fingerprints generated with StyleGAN from live finger photos are used to train a semi-supervised ResNet-18 model with a novel joint loss combining Arcface and Center loss, plus a regularization parameter to balance the two loss functions.
  • results: The proposed method achieves a Bona Fide Classification Error Rate (BPCER) of 0.12%, an Attack Presentation Classification Error Rate (APCER) of 0.63%, and an Average Classification Error Rate (ACER) of 0.37%, evaluated on unseen presentation attacks and live data.
    Abstract With the increasing integration of smartphones into our daily lives, fingerphotos are becoming a potential contactless authentication method. While it offers convenience, it is also more vulnerable to spoofing using various presentation attack instruments (PAI). The contactless fingerprint is an emerging biometric authentication but has not yet been heavily investigated for anti-spoofing. While existing anti-spoofing approaches demonstrated fair results, they have encountered challenges in terms of universality and scalability to detect any unseen/unknown spoofed samples. To address this issue, we propose a universal presentation attack detection method for contactless fingerprints, despite having limited knowledge of presentation attack samples. We generated synthetic contactless fingerprints using StyleGAN from live finger photos and integrating them to train a semi-supervised ResNet-18 model. A novel joint loss function, combining the Arcface and Center loss, is introduced with a regularization to balance between the two loss functions and minimize the variations within the live samples while enhancing the inter-class variations between the deepfake and live samples. We also conducted a comprehensive comparison of different regularizations' impact on the joint loss function for presentation attack detection (PAD) and explored the performance of a modified ResNet-18 architecture with different activation functions (i.e., leaky ReLU and RelU) in conjunction with Arcface and center loss. Finally, we evaluate the performance of the model using unseen types of spoof attacks and live data. Our proposed method achieves a Bona Fide Classification Error Rate (BPCER) of 0.12\%, an Attack Presentation Classification Error Rate (APCER) of 0.63\%, and an Average Classification Error Rate (ACER) of 0.37\%.
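
The joint loss can be sketched compactly; here the margin-adjusted logits are assumed to come from an ArcFace head computed elsewhere, and the balancing weight `lam` is illustrative rather than the paper's value.

```python
import torch
import torch.nn.functional as F

def joint_loss(features, arcface_logits, labels, centers, lam=0.5):
    """features: (B, d) embeddings; arcface_logits: margin-adjusted logits
    from an ArcFace head; centers: (n_classes, d) learnable class centers."""
    arcface = F.cross_entropy(arcface_logits, labels)
    # Center loss: pull embeddings toward their class center, shrinking
    # intra-class variation among live samples.
    center = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    # lam trades inter-class margin (Arcface) for intra-class compactness.
    return arcface + lam * center
```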

Machine Learning and Knowledge: Why Robustness Matters

  • paper_url: http://arxiv.org/abs/2310.19819
  • repo_url: None
  • paper_authors: Jonathan Vandenburgh
  • for: This paper examines trust in machine learning algorithms: when are users justified in trusting a model's outputs?
  • methods: The author proposes understanding the epistemic dimension of trust through the concept of knowledge: an algorithm is trustworthy only if its users are in a position to know that its outputs are correct.
  • results: Since knowledge requires beliefs formed for the right reasons and robust to error, machine learning models can provide knowledge only if they work well across counterfactual scenarios and make decisions based on the right features. This explains why properties such as interpretability, causal shortcut independence, and distribution-shift robustness matter even when they are not required for model reliability.
    Abstract Trusting machine learning algorithms requires having confidence in their outputs. Confidence is typically interpreted in terms of model reliability, where a model is reliable if it produces a high proportion of correct outputs. However, model reliability does not address concerns about the robustness of machine learning models, such as models relying on the wrong features or variations in performance based on context. I argue that the epistemic dimension of trust can instead be understood through the concept of knowledge, where the trustworthiness of an algorithm depends on whether its users are in the position to know that its outputs are correct. Knowledge requires beliefs to be formed for the right reasons and to be robust to error, so machine learning algorithms can only provide knowledge if they work well across counterfactual scenarios and if they make decisions based on the right features. This, I argue, can explain why we should care about model properties like interpretability, causal shortcut independence, and distribution shift robustness even if such properties are not required for model reliability.

UWB Based Static Gesture Classification

  • paper_url: http://arxiv.org/abs/2310.15036
  • repo_url: None
  • paper_authors: Abhishek Sebastian
  • for: This study aims to advance static gesture recognition using ultra-wideband (UWB) technology, with applications across various domains.
  • methods: The study uses proprietary UWB radar sensor technology, extensive data collection, and a comprehensive data pre-processing pipeline comprising outlier handling, aspect-ratio-preserving resizing, and false-color image transformation; CNN and MobileNet models were trained on the processed images.
  • results: The best-performing model achieved 96.78% accuracy. A user-friendly GUI framework was also developed to assess the model's system resource usage and processing time, revealing low resource utilization and task completion in under one second. The work has practical implications for static gesture recognition.
    Abstract Our paper presents a robust framework for UWB-based static gesture recognition, leveraging proprietary UWB radar sensor technology. Extensive data collection efforts were undertaken to compile datasets containing five commonly used gestures. Our approach involves a comprehensive data pre-processing pipeline that encompasses outlier handling, aspect ratio-preserving resizing, and false-color image transformation. Both CNN and MobileNet models were trained on the processed images. Remarkably, our best-performing model achieved an accuracy of 96.78%. Additionally, we developed a user-friendly GUI framework to assess the model's system resource usage and processing times, which revealed low memory utilization and real-time task completion in under one second. This research marks a significant step towards enhancing static gesture recognition using UWB technology, promising practical applications in various domains.
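The pre-processing pipeline described above can be sketched as follows; the 3-sigma clipping rule, output size, and JET colormap are assumptions for illustration rather than the paper's exact choices.

```python
import numpy as np
import cv2

def preprocess_radar_frame(frame, out_size=224, clip_sigma=3.0):
    """Sketch of the described pipeline: outlier clipping, aspect-ratio-
    preserving resize with padding, and false-color transformation."""
    # Outlier handling: clip values beyond clip_sigma standard deviations.
    mu, sd = frame.mean(), frame.std()
    frame = np.clip(frame, mu - clip_sigma * sd, mu + clip_sigma * sd)
    # Normalize to 8-bit range so a colormap can be applied.
    frame = cv2.normalize(frame, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Aspect-ratio-preserving resize, then pad onto a square canvas.
    h, w = frame.shape
    scale = out_size / max(h, w)
    resized = cv2.resize(frame, (int(w * scale), int(h * scale)))
    canvas = np.zeros((out_size, out_size), dtype=np.uint8)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    # False-color transformation: map intensities through a colormap.
    return cv2.applyColorMap(canvas, cv2.COLORMAP_JET)
```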

Deep Autoencoder-based Z-Interference Channels with Perfect and Imperfect CSI

  • paper_url: http://arxiv.org/abs/2310.15027
  • repo_url: None
  • paper_authors: Xinliang Zhang, Mojtaba Vaezi
  • for: This paper designs a deep autoencoder (DAE)-based structure for end-to-end communication over the two-user Z-interference channel (ZIC).
  • methods: The structure jointly optimizes the two encoder/decoder pairs and generates interference-aware constellations that dynamically adapt their shape to the interference intensity to minimize the bit error rate (BER). An in-phase/quadrature (I/Q) power allocation layer guarantees an average power constraint and lets the architecture generate non-uniform constellations.
  • results: Compared with standard uniform constellations such as quadrature amplitude modulation, the structure delivers better performance. Gains are obtained in all interference regimes (weak, moderate, and strong) and grow consistently with the signal-to-noise ratio (SNR): at weak interference with SNR > 15 dB, BER is reduced by more than an order of magnitude over the most competitive conventional method, and by about two orders of magnitude when quantization error exists, indicating that DAE-ZIC is more robust to interference.
    Abstract A deep autoencoder (DAE)-based structure for end-to-end communication over the two-user Z-interference channel (ZIC) with finite-alphabet inputs is designed in this paper. The proposed structure jointly optimizes the two encoder/decoder pairs and generates interference-aware constellations that dynamically adapt their shape based on interference intensity to minimize the bit error rate (BER). An in-phase/quadrature-phase (I/Q) power allocation layer is introduced in the DAE to guarantee an average power constraint and enable the architecture to generate constellations with nonuniform shapes. This brings further gain compared to standard uniform constellations such as quadrature amplitude modulation. The proposed structure is then extended to work with imperfect channel state information (CSI). CSI imperfections due to both estimation and quantization errors are examined. The performance of the DAE-ZIC is compared with two baseline methods, i.e., standard and rotated constellations. The proposed structure significantly enhances the performance of the ZIC both for the perfect and imperfect CSI. Simulation results show that the improvement is achieved in all interference regimes (weak, moderate, and strong) and consistently increases with the signal-to-noise ratio (SNR). For example, more than an order of magnitude BER reduction is obtained with respect to the most competitive conventional method at weak interference when SNR>15dB and two bits per symbol are transmitted. The improvements reach about two orders of magnitude when quantization error exists, indicating that the DAE-ZIC is more robust to the interference compared to the conventional methods.
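The role of the I/Q power allocation layer — letting the network shape constellations freely while still meeting an average power constraint — can be illustrated with a minimal PyTorch layer that rescales a batch of learned I/Q points; this is a sketch of the idea, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class IQPowerNormalization(nn.Module):
    """Rescale learned I/Q constellation points so the batch meets an
    average transmit-power constraint E[I^2 + Q^2] = avg_power."""
    def __init__(self, avg_power=1.0):
        super().__init__()
        self.avg_power = avg_power

    def forward(self, x):                          # x: (batch, 2) raw (I, Q)
        power = x.pow(2).sum(dim=1).mean()         # mean symbol power
        return x * torch.sqrt(self.avg_power / (power + 1e-9))
```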

Efficient Data Learning for Open Information Extraction with Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2310.15021
  • repo_url: None
  • paper_authors: Zhiyuan Fan, Shizhu He
  • for: OK-IE is designed to improve the efficiency of Open Information Extraction (OpenIE) tasks in Natural Language Processing.
  • methods: OK-IE uses a novel framework that transforms the task form of OpenIE into the pre-training task form of the T5 model, reducing the need for extensive training data. Additionally, OK-IE introduces an innovative concept called Anchor to control the sequence of model outputs and eliminate the impact of order penalty on model convergence.
  • results: Compared to previous state-of-the-art (SOTA) methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.
    Abstract Open Information Extraction (OpenIE) is a fundamental yet challenging task in Natural Language Processing, which involves extracting all triples (subject, predicate, object) from a given sentence. While labeling-based methods have their merits, generation-based techniques offer unique advantages, such as the ability to generate tokens not present in the original sentence. However, these generation-based methods often require a significant amount of training data to learn the task form of OpenIE and substantial training time to overcome slow model convergence due to the order penalty. In this paper, we introduce a novel framework, OK-IE, that ingeniously transforms the task form of OpenIE into the pre-training task form of the T5 model, thereby reducing the need for extensive training data. Furthermore, we introduce an innovative concept of Anchor to control the sequence of model outputs, effectively eliminating the impact of order penalty on model convergence and significantly reducing training time. Experimental results indicate that, compared to previous SOTA methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.
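As a hypothetical illustration of recasting OpenIE into T5's span-infilling pre-training format, the snippet below turns each triple slot into a sentinel span to fill; the template, sentinel placement, and example sentence are assumptions, not the paper's exact scheme.

```python
# Hypothetical illustration only: OK-IE's actual input/target format
# is not reproduced here.
sentence = "Marie Curie won the Nobel Prize in 1911."

# T5 pre-training predicts masked sentinel spans; reusing that task form,
# each triple slot becomes a sentinel for the model to fill:
t5_input = (f"{sentence} "
            "subject: <extra_id_0> relation: <extra_id_1> object: <extra_id_2>")
t5_target = "<extra_id_0> Marie Curie <extra_id_1> won <extra_id_2> the Nobel Prize"

print(t5_input)
print(t5_target)
```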

Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation

  • paper_url: http://arxiv.org/abs/2310.15020
  • repo_url: None
  • paper_authors: Bo Ai, Zhanxin Wu, David Hsu
  • for: This paper addresses generalization in data-driven robot control: how a policy can generalize to unseen task domains.
  • methods: A representation containing both depth and semantic information is used for visual navigation control.
  • results: Experiments show that this representation enables a control policy trained entirely in simulated indoor scenes to generalize to diverse real-world environments, and that it reduces the A-distance between training and test domains, improving the generalization error bound.
    Abstract The data-driven approach to robot control has been gathering pace rapidly, yet generalization to unseen task domains remains a critical challenge. We argue that the key to generalization is representations that are (i) rich enough to capture all task-relevant information and (ii) invariant to superfluous variability between the training and the test domains. We experimentally study such a representation -- containing both depth and semantic information -- for visual navigation and show that it enables a control policy trained entirely in simulated indoor scenes to generalize to diverse real-world environments, both indoors and outdoors. Further, we show that our representation reduces the A-distance between the training and test domains, improving the generalization error bound as a result. Our proposed approach is scalable: the learned policy improves continuously, as the foundation models that it exploits absorb more diverse data during pre-training.
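A minimal sketch of building such a representation, assuming off-the-shelf depth-estimation and semantic-segmentation models passed in as callables: raw RGB (whose textures and lighting vary across domains) is dropped, and only geometry plus semantics are kept.

```python
import numpy as np

def navigation_representation(rgb, depth_model, seg_model):
    """Fuse depth and semantic channels into one observation. `depth_model`
    and `seg_model` are assumed callables; the paper's specific foundation
    models are not reproduced here."""
    depth = depth_model(rgb)            # (H, W) depth map
    semantics = seg_model(rgb)          # (H, W) semantic class indices
    # Invariance idea: discard raw RGB appearance, keep task-relevant
    # geometry + semantics, which transfer from simulation to the real world.
    return np.stack([depth, semantics.astype(np.float32)], axis=0)
```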

Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

  • paper_url: http://arxiv.org/abs/2310.15019
  • repo_url: https://github.com/usnistgov/NIST-AI-Meta-Learning-LLM
  • paper_authors: Apostol Vassilev, Honglan Jin, Munawar Hasan
  • for: Detecting out-of-policy speech (OOPS) content is important but difficult.
  • methods: A meta-learning technique (MLT) is proposed that combines individual models built with different text representations; analysis shows the resulting combination is numerically stable and produces reasonable combining weights.
  • results: Combining MLT with a threshold-moving (TM) technique further improves performance on highly imbalanced in-distribution and out-of-distribution datasets; computational results show the statistically significant advantages of the proposed method.
    Abstract Detecting out of policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool to tackle this challenging task, it is hard to break the performance ceiling due to factors like quantity and quality limitations on training data and inconsistencies in OOPS definition and data labeling. To realize the full potential of available limited resources, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We analytically show that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly-imbalanced in-distribution and out-of-distribution datasets. We also provide computational results to show the statistically significant advantages of the proposed MLT approach. All authors contributed equally to this work.
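A minimal sketch of the combination idea — weighted averaging of per-model positive-class probabilities followed by threshold moving toward the minority class; the weights and threshold below are illustrative, not values from the paper.

```python
import numpy as np

def combine_and_threshold(prob_matrix, weights, threshold=0.3):
    """Combine per-model positive-class probabilities with convex weights
    (the MLT step), then classify with a threshold moved below 0.5 to favor
    the minority class (the TM step)."""
    combined = prob_matrix @ weights          # (n_samples, n_models) x (n_models,)
    return (combined >= threshold).astype(int)

# Example: three representation-specific models scoring four samples.
probs = np.array([[0.60, 0.40, 0.50],
                  [0.20, 0.10, 0.30],
                  [0.45, 0.50, 0.40],
                  [0.90, 0.80, 0.70]])
w = np.array([0.4, 0.3, 0.3])                 # combining weights, sum to 1
print(combine_and_threshold(probs, w))        # -> [1 0 1 1]
```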

The primacy bias in Model-based RL

  • paper_url: http://arxiv.org/abs/2310.15017
  • repo_url: None
  • paper_authors: Zhongjian Qiao, Jiafei Lyu, Xiu Li
  • for: This work investigates the primacy bias in deep reinforcement learning (DRL), particularly in model-based reinforcement learning (MBRL).
  • methods: World model resetting is proposed to alleviate the primacy bias.
  • results: Experiments show that world model resetting effectively reduces the primacy bias and improves the algorithm's performance, validated on multiple continuous control tasks in MuJoCo and the DeepMind Control Suite as well as discrete control tasks on the Atari 100k benchmark.
    Abstract The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias. However, we observe that resetting the agent's parameters harms its performance in the context of model-based reinforcement learning (MBRL). In fact, on further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we focus on investigating the primacy bias in MBRL and propose world model resetting, which works in MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate the effectiveness of our method on multiple continuous control tasks on MuJoCo and DeepMind Control Suite, as well as discrete control tasks on Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in model-based setting and improve algorithm's performance. We also give a guide on how to perform world model resetting effectively.
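A sketch of what world model resetting can look like in practice, assuming a PyTorch-style agent with a `world_model` module (the attribute names and interval are illustrative): the model weights are periodically re-initialized while the replay buffer is kept.

```python
def reset_world_model(agent, step, reset_interval=100_000):
    """Periodically re-initialize the world model's weights while keeping the
    replay buffer (and, optionally, the policy), so the model re-learns from
    the full data distribution instead of overfitting early experience.
    `agent.world_model` is assumed to be a PyTorch nn.Module."""
    if step > 0 and step % reset_interval == 0:
        for layer in agent.world_model.modules():
            if hasattr(layer, "reset_parameters"):
                layer.reset_parameters()   # fresh weights, same architecture
```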

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

  • paper_url: http://arxiv.org/abs/2310.14993
  • repo_url: None
  • paper_authors: Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge
  • for: This work aims to deepen our understanding of the inner workings of language models, improving model trust, interpretability, and transparency.
  • methods: Representation dissimilarity measures, which quantify the extent to which two models' internal representations differ, are used to probe language models.
  • results: The study finds an apparent asymmetry between the internal representations of models using SoLU and GeLU activation functions; shows that dissimilarity measures can identify and locate generalization properties that are invisible in in-distribution test-set performance; and provides new evaluations of how language model features vary as width and depth are increased.
    Abstract As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency. In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two model's internal representations differ, can be a valuable tool for gaining insight into the mechanics of language models. Among our insights are: (i) an apparent asymmetry in the internal representations of model using SoLU and GeLU activation functions, (ii) evidence that dissimilarity measures can identify and locate generalization properties of models that are invisible via in-distribution test set performance, and (iii) new evaluations of how language model features vary as width and depth are increased. Our results suggest that dissimilarity measures are a promising set of tools for shedding light on the inner workings of language models.
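As one concrete example of a representation dissimilarity measure, here is linear centered kernel alignment (CKA) over two models' activations; the paper may use different measures, so this is only an illustrative instance.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices on the same inputs.
    X: (n_samples, d1), Y: (n_samples, d2). Returns a value in [0, 1]."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def dissimilarity(X, Y):
    # Higher value = the two models represent these inputs more differently.
    return 1.0 - linear_cka(X, Y)
```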

ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation

  • paper_url: http://arxiv.org/abs/2310.14979
  • repo_url: None
  • paper_authors: Xinpeng Wang, Barbara Plank
  • for: This paper addresses annotator disagreement in dataset creation, proposing a multi-head model with an active learning strategy to reduce annotation cost.
  • methods: A multi-head model with annotator-specific classification heads is combined with active learning, and different acquisition functions are evaluated on two datasets.
  • results: The multi-head model with active learning saves up to 70% of the annotation budget while matching full-scale training from disagreement in both prediction and uncertainty estimation; group-level entropy works well on both datasets.
    Abstract Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though they require a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement. We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation. By designing and evaluating acquisition functions with annotator-specific heads on two datasets, we show that group-level entropy works generally well on both datasets. Importantly, it achieves performance in terms of both prediction and uncertainty estimation comparable to full-scale training from disagreement, while saving up to 70% of the annotation budget.
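A sketch of a group-level entropy acquisition function for a multi-head (annotator-specific) model: average the heads' predictive distributions, then score each sample by the entropy of the mean; the aggregation rule is an assumption for illustration.

```python
import numpy as np

def group_level_entropy(head_probs):
    """Acquisition score per sample from annotator-specific heads.
    head_probs: (n_heads, n_samples, n_classes)."""
    mean_probs = head_probs.mean(axis=0)                       # aggregate heads
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(-1)  # (n_samples,)

# Active learning step: annotate the samples with the highest group entropy.
# scores = group_level_entropy(probs)
# query_idx = np.argsort(-scores)[:batch_size]
```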

The WHY in Business Processes: Discovery of Causal Execution Dependencies

  • paper_url: http://arxiv.org/abs/2310.14975
  • repo_url: None
  • paper_authors: Fabiana Fournier, Lior Limonad, Inna Skarbovsky, Yuval David
  • for: This work aims to unveil the genuine dependencies among process activities, enabling better prediction of process intervention outcomes and informed decision making.
  • methods: An existing causal discovery algorithm is applied over activity timings, and discrepancies between the mined process model and the causal business process model are detected under three causal patterns.
  • results: The methodology is demonstrated with two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique, on a synthesized dataset and two open benchmark datasets, annotating the detected inconsistencies over the mined process model.
    Abstract A crucial element in predicting the outcomes of process interventions and making informed decisions about the process is unraveling the genuine relationships between the execution of process activities. Contemporary process discovery algorithms exploit time precedence as their main source of model derivation. Such reliance can sometimes be deceiving from a causal perspective. This calls for faithful new techniques to discover the true execution dependencies among the tasks in the process. To this end, our work offers a systematic approach to the unveiling of the true causal business process by leveraging an existing causal discovery algorithm over activity timing. In addition, this work delves into a set of conditions under which process mining discovery algorithms generate a model that is incongruent with the causal business process model, and shows how the latter model can be methodologically employed for a sound analysis of the process. Our methodology searches for such discrepancies between the two models in the context of three causal patterns, and derives a new view in which these inconsistencies are annotated over the mined process model. We demonstrate our methodology employing two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique. We apply it on a synthesized dataset and on two open benchmark data sets.
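Since the abstract names LiNGAM among the employed techniques, here is a small runnable example of discovering causal execution dependencies from activity timings with the `lingam` package on synthetic data; the activity names and noise levels are made up.

```python
import numpy as np
import pandas as pd
import lingam  # pip install lingam

# Synthetic activity-timing table: one row per case, one column per activity.
# Uniform (non-Gaussian) noise satisfies LiNGAM's identifiability assumption.
rng = np.random.default_rng(0)
register = rng.uniform(0, 10, 500)
approve = register + rng.uniform(1, 5, 500)     # approve follows register
ship = approve + rng.uniform(0.5, 2, 500)       # ship follows approve
X = pd.DataFrame({"register": register, "approve": approve, "ship": ship})

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)       # column indices; expected [0, 1, 2]
print(model.adjacency_matrix_)   # weighted causal links among activities
```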

Efficient Causal Discovery for Robotics Applications

  • paper_url: http://arxiv.org/abs/2310.14925
  • repo_url: None
  • paper_authors: Luca Castri, Sariah Mghames, Nicola Bellotto
  • for: This paper aims to provide fast and accurate causal analysis for automating tasks in environments shared by humans and robots.
  • methods: Filtered PCMCI (F-PCMCI) is used to analyze causal relationships between humans and robots quickly and accurately.
  • results: Experiments show that F-PCMCI can accurately and promptly reconstruct the causal model of a human-robot interaction scenario, which can then be leveraged to enhance the quality of the interaction.
    Abstract Using robots for automating tasks in environments shared with humans, such as warehouses, shopping centres, or hospitals, requires these robots to comprehend the fundamental physical interactions among nearby agents and objects. Specifically, creating models to represent cause-and-effect relationships among these elements can aid in predicting unforeseen human behaviours and anticipate the outcome of particular robot actions. To be suitable for robots, causal analysis must be both fast and accurate, meeting real-time demands and the limited computational resources typical in most robotics applications. In this paper, we present a practical demonstration of our approach for fast and accurate causal analysis, known as Filtered PCMCI (F-PCMCI), along with a real-world robotics application. The provided application illustrates how our F-PCMCI can accurately and promptly reconstruct the causal model of a human-robot interaction scenario, which can then be leveraged to enhance the quality of the interaction.
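F-PCMCI builds on PCMCI; the sketch below runs plain PCMCI from the `tigramite` library on a toy human-robot time series (the F-PCMCI feature-filtering step is omitted, and the `ParCorr` import path varies across tigramite versions).

```python
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr  # path differs in newer versions

# Toy scenario: human position influences robot speed with a lag of 1 step.
rng = np.random.default_rng(0)
T = 500
human = rng.normal(size=T).cumsum()
robot = 0.8 * np.roll(human, 1) + rng.normal(scale=0.1, size=T)
data = np.stack([human, robot], axis=1)

dataframe = pp.DataFrame(data, var_names=["human_pos", "robot_speed"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.05)
print(results["p_matrix"].shape)   # (2, 2, 4): p-values for lags 0..3
```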

PartialFormer: Modeling Part Instead of Whole

  • paper_url: http://arxiv.org/abs/2310.14921
  • repo_url: https://github.com/zhengkid/partialformer
  • paper_authors: Tong Zheng, Bei Li, Huiwen Bao, Weiqiao Shan, Tong Xiao, Jingbo Zhu
  • for: This work proposes a Transformer feed-forward neural network (FFN) architecture with less parameter and computation overhead, achieving lightweight FFNs by emphasizing the hidden dimension.
  • methods: PartialFormer is a parameter-efficient Transformer architecture that uses multiple smaller FFNs to reduce parameters and computation while preserving essential hidden dimensions; the smaller FFNs are integrated into a multi-head attention system. A tailored head scaling strategy and a residual-like attention calculation for depth scaling are also proposed.
  • results: Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of the PartialFormer approach.
    Abstract The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimension in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture utilizing multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention system to enable effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer's capabilities. Furthermore, we present a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of our PartialFormer approach. Our code would be available at: \url{https://github.com/zhengkid/PartialFormer}.
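A minimal PyTorch sketch of the core idea — several small head-wise FFNs in place of one large FFN, each keeping a comparatively large hidden dimension; the sizes are illustrative, and the paper's head scaling and residual-like attention are not reproduced here.

```python
import torch
import torch.nn as nn

class PartialFFN(nn.Module):
    """Replace one FFN over the full model dimension with several smaller
    head-wise FFNs whose hidden dimension stays large relative to their input."""
    def __init__(self, d_model=512, n_heads=8, d_hidden=1024):
        super().__init__()
        self.d_head = d_model // n_heads
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(self.d_head, d_hidden),
                          nn.ReLU(),
                          nn.Linear(d_hidden, self.d_head))
            for _ in range(n_heads)
        )

    def forward(self, x):                        # x: (batch, seq, d_model)
        parts = x.split(self.d_head, dim=-1)     # one slice per head
        return torch.cat([f(p) for f, p in zip(self.ffns, parts)], dim=-1)
```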

Linking Surface Facts to Large-Scale Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.14909
  • repo_url: https://github.com/nec-research/fact-linking
  • paper_authors: Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš
  • for: This work aims to bridge information extracted from natural language text and large-scale knowledge graphs (KGs).
  • methods: Open Information Extraction (OIE) extracts ("subject"; "relation"; "object") triples from text, which are then linked to a KG to combine the coverage of free-text OIE with the semantic precision of KGs.
  • results: The study finds that detecting surface forms with no match in the existing KG is a harder task than accurately linking surface forms to their KG counterparts.
    Abstract Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines show that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) on https://github.com/nec-research/fact-linking.
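A sketch of what granular triple-slot evaluation can look like: subject, relation, and object links are scored independently, with `None` standing for "no match in the KG"; the scoring rule and the Wikidata-style IDs are illustrative assumptions.

```python
def slot_level_scores(gold, pred):
    """Score subject, relation, and object links independently.
    `None` marks an out-of-KG slot; the rule here is illustrative."""
    slots = ["subject", "relation", "object"]
    return {s: sum(g[i] == p[i] for g, p in zip(gold, pred)) / len(gold)
            for i, s in enumerate(slots)}

# Example: one triple fully linked, one whose object has no KG entry
# but is wrongly linked by the system.
gold = [("Q41421", "P19", "Q60"), ("Q937", "P800", None)]
pred = [("Q41421", "P19", "Q60"), ("Q937", "P800", "Q11455")]
print(slot_level_scores(gold, pred))  # object accuracy drops to 0.5
```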

Universal Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2310.14899
  • repo_url: https://github.com/dice-group/universal_embeddings
  • paper_authors: N’Dah Jean Kouagou, Caglar Demir, Hamada M. Zahera, Adrian Wilke, Stefan Heindorf, Jiayi Li, Axel-Cyrille Ngonga Ngomo
  • for: This paper aims to learn embeddings over large-scale interlinked knowledge graphs, enabling tasks such as finding similar entities and predicting relations across knowledge graphs.
  • methods: Multiple knowledge graphs are fused via the owl:sameAs relation so that every entity is represented by a unique identity, and universal knowledge graph embeddings are computed over the fused graph.
  • results: Experiments on link prediction show that the universal embeddings encode better semantics than embeddings computed on a single knowledge graph; a convenient API provides the embeddings as a service.
    Abstract A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of the knowledge graph within a link prediction setting. As a result, the embeddings reflect only the semantics of a single knowledge graph, and embeddings for different knowledge graphs are not aligned, e.g., they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, knowledge graph embedding applications such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identity. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata yielding embeddings for about 180 million entities, 15 thousand relations, and 1.2 billion triples. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction show that universal knowledge graph embeddings encode better semantics compared to embeddings computed on a single knowledge graph. For reproducibility purposes, we provide our source code and datasets open access at https://github.com/dice-group/Universal_Embeddings
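A minimal sketch of the fusion step: collapsing owl:sameAs-linked entities to a single canonical identity with a union-find structure before training embeddings (the URIs are illustrative).

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:               # path halving
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Collapse owl:sameAs-linked entities to one canonical identity.
same_as = [("dbpedia:Berlin", "wikidata:Q64"),
           ("dbpedia:Paris", "wikidata:Q90")]
uf = UnionFind()
for a, b in same_as:
    uf.union(a, b)

triples = [("dbpedia:Berlin", "capitalOf", "dbpedia:Germany")]
canonical = [(uf.find(s), p, uf.find(o)) for s, p, o in triples]
print(canonical)  # Berlin is rewritten to its canonical (wikidata) identity
```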

Local Universal Rule-based Explanations

  • paper_url: http://arxiv.org/abs/2310.14894
  • repo_url: https://github.com/sbobek/lux
  • paper_authors: Szymon Bobek, Grzegorz J. Nalepa
  • for: This work provides an explainable artificial intelligence (XAI) method that helps explain why a model makes its decisions.
  • methods: A rule-based explainer built on a modified decision tree algorithm (allowing oblique splits) generates factual, counterfactual, and visual explanations, and can integrate feature-importance XAI methods such as SHAP or LIME. Unlike other algorithms, LUX does not rely on data generation: it selects local concepts in the form of high-density clusters of real data that have the highest impact on the explained model's decision boundary.
  • results: On real and synthetic datasets, LUX outperforms state-of-the-art rule-based explainers in terms of simplicity, global fidelity, and representativeness.
    Abstract Explainable artificial intelligence (XAI) is one of the most intensively developed areas of AI in recent years. It is also one of the most fragmented, with multiple methods that focus on different aspects of explanations. This makes it difficult to obtain the full spectrum of explanation at once in a compact and consistent way. To address this issue, we present Local Universal Explainer (LUX), a rule-based explainer which can generate factual, counterfactual and visual explanations. It is based on a modified version of decision tree algorithms that allows for oblique splits and integration with feature importance XAI methods such as SHAP or LIME. Unlike other algorithms, it does not use data generation, but is focused on selecting local concepts in a form of high-density clusters of real data that have the highest impact on forming the decision boundary of the explained model. We tested our method on real and synthetic datasets and compared it with state-of-the-art rule-based explainers such as LORE, EXPLAN and Anchor. Our method outperforms currently existing approaches in terms of simplicity, global fidelity and representativeness.
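A sketch of the LUX idea (not the library's actual API): pick a dense local neighborhood of real data around the instance to explain, fit a small decision tree to the black-box model's predictions there, and read off the rule; an oblique-split tree would replace the axis-aligned one used here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier, export_text

def local_rule_explanation(X, model, instance, k=100):
    """Approximate a black-box `model` around `instance` with a small tree
    fit on real nearby samples (no data generation), then return its rules."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(instance.reshape(1, -1))
    X_local = X[idx[0]]                      # real samples from a dense region
    y_local = model.predict(X_local)         # black-box labels to imitate
    tree = DecisionTreeClassifier(max_depth=3).fit(X_local, y_local)
    return export_text(tree)
```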

Non-autoregressive Streaming Transformer for Simultaneous Translation

  • paper_url: http://arxiv.org/abs/2310.14883
  • repo_url: https://github.com/ictnlp/nast
  • paper_authors: Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
  • for: To strike a better balance between latency and translation quality in simultaneous machine translation (SiMT).
  • methods: A non-autoregressive streaming Transformer (NAST) comprising a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism; it generates blank or repetitive tokens to adjust its READ/WRITE strategy flexibly and is trained to maximize the non-monotonic latent alignment with an alignment-based latency loss.
  • results: NAST outperforms previous strong autoregressive SiMT baselines on various SiMT benchmarks.
    Abstract Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anticipation. We argue that such issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address those issues, we propose non-autoregressive streaming Transformer (NAST) which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. We enable NAST to generate the blank token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and train it to maximize the non-monotonic latent alignment with an alignment-based latency loss. Experiments on various SiMT benchmarks demonstrate that NAST outperforms previous strong autoregressive SiMT baselines.

A Study on Knowledge Graph Embeddings and Graph Neural Networks for Web Of Things

  • paper_url: http://arxiv.org/abs/2310.14866
  • repo_url: https://github.com/kgrl2021/submission-one
  • paper_authors: Rohith Teja Mittakola, Thomas Hassan
  • for: This study applies knowledge graphs in the Web of Things (WoT) domain, which provides a digital representation of the physical world and enables cross-domain applications to be built on this massive, highly connected graph of things.
  • methods: State-of-the-art knowledge graph embedding (KGE) methods are used to learn numerical representations of graph entities, evaluated on downstream tasks including link prediction, node classification, and triple classification; graph neural networks (GNNs) are investigated alongside the KGE methods and compared on the same tasks.
  • results: Both state-of-the-art KGE and GNN-based methods perform encouragingly on node classification, while GNN approaches are superior on link prediction. Overall, the study shows that state-of-the-art approaches are relevant in a WoT context and provides initial insights for implementing and evaluating them in this setting.
    Abstract Graph data structures are widely used to store relational information between several entities. With data being generated worldwide on a large scale, we see a significant growth in the generation of knowledge graphs. Thing in the future is Orange's take on a knowledge graph in the domain of the Web Of Things (WoT), where the main objective of the platform is to provide a digital representation of the physical world and enable cross-domain applications to be built upon this massive and highly connected graph of things. In this context, as the knowledge graph grows in size, it is prone to have noisy and messy data. In this paper, we explore state-of-the-art knowledge graph embedding (KGE) methods to learn numerical representations of the graph entities and, subsequently, explore downstream tasks like link prediction, node classification, and triple classification. We also investigate Graph neural networks (GNN) alongside KGEs and compare their performance on the same downstream tasks. Our evaluation highlights the encouraging performance of both KGE and GNN-based methods on node classification, and the superiority of GNN approaches in the link prediction task. Overall, we show that state-of-the-art approaches are relevant in a WoT context, and this preliminary work provides insights to implement and evaluate them in this context.
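As one example of the KGE methods such a study evaluates, here is the TransE scoring function, which rates a triple's plausibility by how well head + relation lands on tail (illustrative, not the paper's code).

```python
import torch

def transe_score(h, r, t):
    """TransE plausibility: -||h + r - t||_1; higher means more plausible."""
    return -torch.norm(h + r - t, p=1, dim=-1)

# Toy embeddings for a single (head, relation, tail) triple.
h, r, t = torch.randn(3, 50)
print(transe_score(h, r, t))
```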

BioImage.IO Chatbot: A Personalized Assistant for BioImage Analysis Augmented by Community Knowledge Base

  • paper_url: http://arxiv.org/abs/2310.18351
  • repo_url: https://github.com/bioimage-io/bioimageio-chatbot
  • paper_authors: Wanlu Lei, Caterina Fuster-Barceló, Arrate Muñoz-Barrutia, Wei Ouyang
  • for: This work addresses the rapidly expanding and complex landscape of bioimage analysis tools, giving both experts and newcomers an easy way to find and use them.
  • methods: The BioImage.IO Chatbot, a conversational assistant built on large language models, provides personalized, context-aware answers by aggregating and interpreting information from diverse databases, tool-specific documentation, and structured data sources, enhanced by a community-contributed knowledge base and fine-tuned retrieval methods.
  • results: The chatbot delivers a personalized, knowledge-enriched, context-aware experience, transforming how biologists, bioimage analysts, and developers navigate and utilize advanced bioimage analysis tools, and setting a new standard for community-driven, accessible scientific research.
    Abstract The rapidly expanding landscape of bioimage analysis tools presents a navigational challenge for both experts and newcomers. Traditional search methods often fall short in assisting users in this complex environment. To address this, we introduce the BioImage.IO Chatbot, an AI-driven conversational assistant tailored for the bioimage community. Built upon large language models, this chatbot provides personalized, context-aware answers by aggregating and interpreting information from diverse databases, tool-specific documentation, and structured data sources. Enhanced by a community-contributed knowledge base and fine-tuned retrieval methods, the BioImage.IO Chatbot offers not just a personalized interaction but also a knowledge-enriched, context-aware experience. It fundamentally transforms the way biologists, bioimage analysts, and developers navigate and utilize advanced bioimage analysis tools, setting a new standard for community-driven, accessible scientific research.

Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

  • paper_url: http://arxiv.org/abs/2310.14855
  • repo_url: None
  • paper_authors: Sai Koneru, Miriam Exel, Matthias Huck, Jan Niehues
  • for: This work explores using large language models (LLMs) for machine translation (MT) together with recent parameter-efficient fine-tuning techniques.
  • methods: LLMs are adapted as Automatic Post-Editors (APE) rather than direct translators, the approach is extended to document-level translation, and Low-Rank-Adapter fine-tuning is used to improve APE performance.
  • results: The approach achieves a state-of-the-art accuracy rate of 89% on the ContraPro test set and generalizes to out-of-domain data; moreover, when reference context is available, leveraging human post-edits significantly reduces the number of edits required for subsequent translations.
    Abstract Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on the LLMs' exceptional ability to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (an interactive demo for integrating manual feedback is available at https://huggingface.co/spaces/skoneru/contextual_refinement_ende).
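A sketch of Low-Rank-Adapter (LoRA) fine-tuning for the APE setup using the `peft` library; the base model name and the post-editing prompt template are illustrative assumptions, not the paper's choices.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model name is illustrative; any causal LM works the same way.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only the low-rank adapters are trained

# APE-style input: the model refines an existing MT output instead of
# translating from scratch (template wording is an assumption).
prompt_template = ("Source: {src}\n"
                   "MT output: {mt}\n"
                   "Improve the translation:\n")
```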

ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling

  • paper_url: http://arxiv.org/abs/2310.14839
  • repo_url: https://github.com/qgzhan/esvae
  • paper_authors: Qiugang Zhan, Xiurui Xie, Guisong Liu, Malu Zhang
  • for: This paper studies variational autoencoder (VAE) models built on spiking neural networks (SNNs) to improve image generation quality.
  • methods: The proposed efficient spiking variational autoencoder (ESVAE) constructs the prior and posterior of the latent space as Poisson distributions using the firing rates of the spiking neurons, and introduces a reparameterizable Poisson spiking sampling method that requires no additional network.
  • results: Experiments show that ESVAE outperforms previous SNN VAE methods in reconstructed and generated image quality; moreover, its encoder retains the original image information more efficiently, and its decoder is more robust.
    Abstract In recent years, studies on image generation models of spiking neural networks (SNNs) have gained the attention of many researchers. Variational autoencoders (VAEs), as one of the most popular image generation models, have attracted a lot of work exploring their SNN implementation. Due to the constrained binary representation in SNNs, existing SNN VAE methods implicitly construct the latent space by an elaborated autoregressive network and use the network outputs as the sampling variables. However, this unspecified implicit representation of the latent space will increase the difficulty of generating high-quality images and introduces additional network parameters. In this paper, we propose an efficient spiking variational autoencoder (ESVAE) that constructs an interpretable latent space distribution and design a reparameterizable spiking sampling method. Specifically, we construct the prior and posterior of the latent space as a Poisson distribution using the firing rate of the spiking neurons. Subsequently, we propose a reparameterizable Poisson spiking sampling method, which is free from the additional network. Comprehensive experiments have been conducted, and the experimental results show that the proposed ESVAE outperforms previous SNN VAE methods in reconstructed & generated images quality. In addition, experiments demonstrate that ESVAE's encoder is able to retain the original image information more efficiently, and the decoder is more robust. The source code is available at https://github.com/QgZhan/ESVAE.
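A minimal sketch of reparameterizable Poisson-like spike sampling: each neuron fires independently per time step with probability given by its rate, and a straight-through estimator lets gradients flow through the Bernoulli draws; details of the paper's sampler may differ.

```python
import torch

def poisson_spike_sample(rate, T=16):
    """Sample binary spike trains from firing rates in [0, 1].
    rate: tensor of shape (...,); returns (..., T) spike trains."""
    p = rate.clamp(0.0, 1.0).unsqueeze(-1).expand(*rate.shape, T)
    hard = (torch.rand_like(p) < p).float()   # 0/1 spikes, one draw per step
    # Straight-through estimator: forward pass uses the hard spikes,
    # backward pass differentiates through the rates.
    return hard + p - p.detach()
```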

Calibration of Time-Series Forecasting Transformers: Detecting and Adapting Context-Driven Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.14838
  • repo_url: None
  • paper_authors: Mouxiang Chen, Lefei Shen, Han Fu, Zhuo Li, Jianling Sun, Chenghao Liu
  • for: This work improves the calibration of Transformers for time-series forecasting, particularly when predictions are made under varying temporal contexts.
  • methods: A universal methodology is proposed that detects context-driven distribution shift (CDS) with a residual-based detector ("Reconditionor") and adapts the model with a sample-level contextualized adapter ("SOLID").
  • results: Experiments show the proposed methods consistently enhance the performance of current SOTA Transformers on real-world datasets, especially on cases with substantial CDS detected by the Reconditionor.
    Abstract Recent years have witnessed the success of introducing Transformers to time series forecasting. From a data generation perspective, we illustrate that existing Transformers are susceptible to distribution shifts driven by temporal contexts, whether observed or unobserved. Such context-driven distribution shift (CDS) introduces biases in predictions within specific contexts and poses challenges for conventional training paradigm. In this paper, we introduce a universal calibration methodology for the detection and adaptation of CDS with a trained Transformer model. To this end, we propose a novel CDS detector, termed the "residual-based CDS detector" or "Reconditionor", which quantifies the model's vulnerability to CDS by evaluating the mutual information between prediction residuals and their corresponding contexts. A high Reconditionor score indicates a severe susceptibility, thereby necessitating model adaptation. In this circumstance, we put forth a straightforward yet potent adapter framework for model calibration, termed the "sample-level contextualized adapter" or "SOLID". This framework involves the curation of a contextually similar dataset to the provided test sample and the subsequent fine-tuning of the model's prediction layer with a limited number of steps. Our theoretical analysis demonstrates that this adaptation strategy is able to achieve an optimal equilibrium between bias and variance. Notably, our proposed Reconditionor and SOLID are model-agnostic and readily adaptable to a wide range of Transformers. Extensive experiments show that SOLID consistently enhances the performance of current SOTA Transformers on real-world datasets, especially on cases with substantial CDS detected by the proposed Reconditionor, thus validate the effectiveness of the calibration approach.
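A sketch of a residual-based susceptibility score in the spirit of the Reconditionor: estimate the mutual information between prediction residuals and their contexts (here with scikit-learn's estimator; the paper's exact estimator may differ).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def reconditionor_score(residuals, contexts):
    """Higher mutual information between residuals and contexts suggests the
    model is more susceptible to context-driven distribution shift.
    residuals: (n_samples,), contexts: (n_samples, n_context_features)."""
    return mutual_info_regression(contexts, residuals).sum()

# A high score would trigger the adaptation step (e.g., SOLID-style
# fine-tuning of the prediction layer on contextually similar samples).
```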

Harnessing Attention Mechanisms: Efficient Sequence Reduction using Attention-based Autoencoders

  • paper_url: http://arxiv.org/abs/2310.14837
  • repo_url: None
  • paper_authors: Daniel Biermann, Fabrizio Palumbo, Morten Goodwin, Ole-Christoffer Granmo
  • for: This paper explores an attention-based method for directly manipulating sequence length as an additional opportunity to tune model performance.
  • methods: An attention mechanism compresses the input sequence into a shorter latent sequence and then reconstructs the original sequence from it within an autoencoder.
  • results: Experiments show that the autoencoder retains all the significant information when reducing the original sequence to half its size, and still reconstructs the original sequence with an accuracy of around 90% when reducing it to a quarter of its size.
    Abstract Many machine learning models use the manipulation of dimensions as a driving force to enable models to identify and learn important features in data. In the case of sequential data this manipulation usually happens on the token dimension level. Despite the fact that many tasks require a change in sequence length itself, the step of sequence length reduction usually happens out of necessity and in a single step. As far as we are aware, no model uses the sequence length reduction step as an additional opportunity to tune the models performance. In fact, sequence length manipulation as a whole seems to be an overlooked direction. In this study we introduce a novel attention-based method that allows for the direct manipulation of sequence lengths. To explore the method's capabilities, we employ it in an autoencoder model. The autoencoder reduces the input sequence to a smaller sequence in latent space. It then aims to reproduce the original sequence from this reduced form. In this setting, we explore the methods reduction performance for different input and latent sequence lengths. We are able to show that the autoencoder retains all the significant information when reducing the original sequence to half its original size. When reducing down to as low as a quarter of its original size, the autoencoder is still able to reproduce the original sequence with an accuracy of around 90%.
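A minimal PyTorch sketch of attention-based length reduction: a set of learned queries cross-attends to the input sequence, producing a shorter latent sequence; the exact architecture in the paper may differ.

```python
import torch
import torch.nn as nn

class AttentionSequenceReducer(nn.Module):
    """Reduce a sequence of length in_len to out_len latent vectors via
    cross-attention from learned queries."""
    def __init__(self, d_model=256, n_heads=4, out_len=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(out_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                               # x: (batch, in_len, d)
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        reduced, _ = self.attn(q, x, x)                 # (batch, out_len, d)
        return reduced
```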

Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

  • paper_url: http://arxiv.org/abs/2310.14814
  • repo_url: None
  • paper_authors: Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko
  • for: This paper proposes a new confidence measure for self-training in semi-supervised learning.
  • methods: Self-training iteratively assigns pseudo-labels to unlabeled data on which the model is confident and treats them as labeled examples. For neural networks, softmax prediction probabilities are the usual confidence measure, but they are known to be overconfident and are particularly affected by sample selection bias, i.e., when data labeling is subject to some constraint.
  • results: The authors propose a new confidence measure, the $\mathcal{T}$-similarity, built upon prediction diversity. A theoretical analysis studies stationary points and the relationship between the diversity of the individual ensemble members and their performance, and experiments under three different pseudo-labeling policies demonstrate the method's effectiveness.
    Abstract Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, despite the fact that they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraint. To address this issue, we propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide the theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate the benefit of our confidence measure for three different pseudo-labeling policies on classification datasets of various data modalities.
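A sketch of a diversity-based confidence score in the spirit of the $\mathcal{T}$-similarity: average pairwise agreement (inner products of predicted class distributions) across ensemble heads for one sample; the paper's exact definition may differ.

```python
import numpy as np

def t_similarity(head_probs):
    """Diversity-based confidence for one sample.
    head_probs: (M, n_classes), one softmax distribution per ensemble head."""
    M = head_probs.shape[0]
    sims = [head_probs[i] @ head_probs[j]
            for i in range(M) for j in range(M) if i != j]
    return float(np.mean(sims))

# High score -> heads agree confidently -> the pseudo-label is safer to keep.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]])
print(t_similarity(probs))
```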

Large Language Models can Share Images, Too!

  • paper_url: http://arxiv.org/abs/2310.14804
  • repo_url: https://github.com/passing2961/LLM-Share-Image
  • paper_authors: Young-Jun Lee, Jonghwan Hyeon, Ho-Jin Choi
  • for: This paper explores the image-sharing capability of large language models (LLMs) in a zero-shot setting, without using visual foundation models.
  • methods: Inspired by the two-stage process of image sharing in human dialogues, a two-stage framework lets LLMs predict potential image-sharing turns and generate related image descriptions using an effective restriction-based prompt template.
  • results: Extensive experiments unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance; an emergent image-sharing ability is also uncovered, demonstrating the effectiveness of restriction-based prompts in both stages of the framework.
    Abstract This paper explores the image-sharing capability of Large Language Models (LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting, without the help of visual foundation models. Inspired by the two-stage process of image-sharing in human dialogues, we propose a two-stage framework that allows LLMs to predict potential image-sharing turns and generate related image descriptions using our effective restriction-based prompt template. With extensive experiments, we unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance. Additionally, we uncover the emergent image-sharing ability in zero-shot prompting, demonstrating the effectiveness of restriction-based prompts in both stages of our framework. Based on this framework, we augment the PhotoChat dataset with images generated by Stable Diffusion at predicted turns, namely PhotoChat++. To our knowledge, this is the first study to assess the image-sharing ability of LLMs in a zero-shot setting without visual foundation models. The source code and the dataset will be released after publication.

Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages

  • paper_url: http://arxiv.org/abs/2310.14799
  • repo_url: None
  • paper_authors: Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che
  • for: To improve the accuracy and generality of cross-lingual chain-of-thought reasoning.
  • methods: Two novel prompting methods are proposed: cross-lingual prompting (CLP), consisting of cross-lingual alignment prompting and task-specific solver prompting, and cross-lingual self-consistent prompting (CLSP), which ensembles reasoning paths across languages.
  • results: Experiments show that CLP and CLSP significantly outperform existing prompting methods on several benchmarks, achieving state-of-the-art performance.
    Abstract Chain-of-thought (CoT) is capable of eliciting models to explicitly generate reasoning paths, thus promoting reasoning accuracy and attracting increasing attention. Specifically, zero-shot CoT achieves remarkable improvements in a wide range of reasoning tasks by simply instructing the LLM with the prompt "Let's think step by step!". Despite the success of zero-shot CoT, the existing zero-shot prompting techniques remain limited to a single language, making it challenging to generalize to other languages and hindering global development. In this work, we introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages. Specifically, CLP consists of two main components: (1) cross-lingual alignment prompting and (2) task-specific solver prompting. The cross-lingual alignment prompting is responsible for aligning representations across different languages, whereas the task-specific solver prompting is used to generate the final chain of thoughts and results for the reasoning task. In addition, we further introduce cross-lingual self-consistent prompting (CLSP) to ensemble different reasoning paths across languages. Our experimental evaluations on several benchmarks demonstrate that CLP and CLSP significantly outperform the existing prompting methods and achieve state-of-the-art performance. We hope this work will inspire further breakthroughs in cross-lingual CoT.
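A hypothetical illustration of the two CLP stages as prompt templates; the wording is an assumption, not the paper's exact prompts.

```python
# Stage 1: cross-lingual alignment — align the request from the source
# language into a pivot language, step by step.
def cross_lingual_alignment_prompt(question, src_lang="German", pivot="English"):
    return (f"Please act as an expert in multi-lingual understanding in {src_lang}.\n"
            f"Request: {question}\n"
            f"Let's understand the task in {pivot} step by step!")

# Stage 2: task-specific solving — reason over the aligned understanding
# to produce the final chain of thought and answer.
def task_solver_prompt(aligned_understanding):
    return (f"{aligned_understanding}\n"
            "Now act as an expert in step-by-step reasoning and "
            "resolve the task step by step!")
```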

What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies

  • paper_url: http://arxiv.org/abs/2310.14793
  • repo_url: None
  • paper_authors: Amit Gajbhiye, Zied Bouraoui, Na Li, Usashi Chatterjee, Luis Espinosa Anke, Steven Schockaert
  • for: This study aims to improve how concepts are modelled in the absence of sentence context, making them more useful in applications.
  • methods: Concepts are represented in terms of the properties they share with other concepts in a potentially large concept vocabulary.
  • results: The study shows that augmenting the label set with shared properties improves the performance of state-of-the-art models on ultra-fine entity typing, a challenging multi-label classification task.
    Abstract Concepts play a central role in many applications. This includes settings where concepts have to be modelled in the absence of sentence context. Previous work has therefore focused on distilling decontextualised concept embeddings from language models. But concepts can be modelled from different perspectives, whereas concept embeddings typically mostly capture taxonomic structure. To address this issue, we propose a strategy for identifying what different concepts, from a potentially large concept vocabulary, have in common with others. We then represent concepts in terms of the properties they share with the other concepts. To demonstrate the practical usefulness of this way of modelling concepts, we consider the task of ultra-fine entity typing, which is a challenging multi-label classification problem. We show that by augmenting the label set with shared properties, we can improve the performance of the state-of-the-art models for this task.
    摘要 概念在许多应用中扮演着核心角色,其中包括需要在没有句子上下文的情况下对概念建模的场景。因此,先前的工作主要关注从语言模型中蒸馏去上下文化的概念嵌入。然而,概念可以从不同视角来建模,而概念嵌入通常主要捕捉分类学结构。为了解决这个问题,我们提出一种策略,用于识别(可能很大的)概念词汇表中不同概念与其他概念的共同之处,进而用概念与其他概念共享的属性来表示概念。为了证明这种概念建模方式的实际价值,我们考虑了 ultra-fine entity typing 任务,这是一个具有挑战性的多标签分类问题。我们表明,通过用共享属性扩充标签集,可以提升该任务上最优模型的性能。

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

  • paper_url: http://arxiv.org/abs/2310.14785
  • repo_url: None
  • paper_authors: Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu
  • for: 本论文研究如何从视觉丰富的表单式文档中抽取有用的实体。
  • methods: 提出一种新的视觉非对称一致性学习(VANCL)方法,通过引入颜色先验来增强模型捕捉视觉和布局特征的能力。
  • results: 在标准基准数据集上的实验结果表明,该方法显著优于强大的 LayoutLM 系列基线,证明了其有效性;此外还考察了不同配色方案对方法的影响,为未来的多模态信息抽取研究提供了思路。
    Abstract Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel Visually-Asymmetric coNsistenCy Learning (VANCL) approach that addresses the above limitation by enhancing the model's ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction.
    摘要 从视觉丰富的表单式文档(VFD)中抽取属于预定义类别的有意义实体是一项具有挑战性的任务。字体、背景、颜色以及边界框的位置和大小等视觉与布局特征,为识别同类实体提供了重要线索。然而,现有模型通常以较弱的跨模态监督信号训练视觉编码器,导致其捕捉非文本特征的能力有限,性能欠佳。本文提出一种新的视觉非对称一致性学习(VANCL)方法,通过引入颜色先验增强模型捕捉细粒度视觉和布局特征的能力,从而解决上述局限。在基准数据集上的实验结果表明,我们的方法显著优于强大的 LayoutLM 系列基线,证明了其有效性。此外,我们还考察了不同配色方案对方法的影响,为优化模型性能提供了见解。我们相信这项工作将启发未来关于多模态信息抽取的研究。

An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation

  • paper_url: http://arxiv.org/abs/2310.14784
  • repo_url: None
  • paper_authors: Wenhao Yan, He Li, Kaoru Ota, Mianxiong Dong
  • for: this paper aims to address the challenges of class imbalance in federated learning scenarios
  • methods: the proposed FedImT framework uses an online scheme to estimate data composition and a self-attenuating iterative method to track variations and adjust loss computing for minority classes
  • results: experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and avoiding privacy risks
    Abstract Widely available healthcare services are now getting popular because of advancements in wearable sensing techniques and mobile edge computing. People's health information is collected by edge devices such as smartphones and wearable bands for further analysis on servers, then send back suggestions and alerts for abnormal conditions. The recent emergence of federated learning allows users to train private data on local devices while updating models collaboratively. However, the heterogeneous distribution of the health condition data may lead to significant risks to model performance due to class imbalance. Meanwhile, as FL training is powered by sharing gradients only with the server, training data is almost inaccessible. The conventional solutions to class imbalance do not work for federated learning. In this work, we propose a new federated learning framework FedImT, dedicated to addressing the challenges of class imbalance in federated learning scenarios. FedImT contains an online scheme that can estimate the data composition during each round of aggregation, then introduces a self-attenuating iterative equivalent to track variations of multiple estimations and promptly tweak the balance of the loss computing for minority classes. Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and avoiding privacy risks.
    摘要 随着可穿戴传感技术和移动边缘计算的发展,便捷的医疗健康服务日益普及。智能手机和可穿戴手环等边缘设备采集用户的健康信息并上传至服务器分析,再将建议和异常预警反馈给用户。联邦学习的出现使用户能够在本地设备上训练私有数据,同时协同更新模型。然而,健康数据的异质分布可能因类别不平衡而严重损害模型性能;同时,由于联邦学习训练仅向服务器共享梯度,训练数据几乎不可获取,传统的类别不平衡解决方案并不适用。本文提出新的联邦学习框架 FedImT,专门应对联邦学习场景下的类别不平衡问题。FedImT 包含一个在线方案,可在每轮聚合时估计数据组成,并引入自衰减迭代方法跟踪多次估计的变化,及时调整少数类的损失计算权重。实验表明,FedImT 能在不增加额外能耗、不引入隐私风险的前提下有效解决不平衡问题。
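
A minimal PyTorch sketch of the core idea follows. The exponential-moving-average estimator and inverse-frequency weights are illustrative stand-ins for FedImT's online composition estimator and self-attenuating iteration, which the abstract does not specify in detail.

```python
import torch
import torch.nn.functional as F

class ImbalanceAwareLoss:
    """Sketch: track a running estimate of the class composition seen in
    local batches and upweight rare classes in the cross-entropy loss.
    EMA update and inverse-frequency weighting are illustrative only."""

    def __init__(self, num_classes: int, momentum: float = 0.9):
        self.ratios = torch.full((num_classes,), 1.0 / num_classes)
        self.momentum = momentum

    def __call__(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        counts = torch.bincount(targets, minlength=len(self.ratios)).float()
        batch_ratios = counts / counts.sum()
        # Running (self-attenuating) estimate of the class composition.
        self.ratios = self.momentum * self.ratios + (1 - self.momentum) * batch_ratios
        weights = 1.0 / (self.ratios + 1e-6)
        weights = weights / weights.sum() * len(self.ratios)  # normalize
        return F.cross_entropy(logits, targets, weight=weights)
```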

Evaluating the Knowledge Base Completion Potential of GPT

  • paper_url: http://arxiv.org/abs/2310.14771
  • repo_url: None
  • paper_authors: Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo, Gerhard Weikum
  • for: This paper is written for evaluating the ability of language models (LMs) to complete structured knowledge bases (KBs) at scale and with high accuracy.
  • methods: The paper uses GPT-3, ChatGPT, and GPT-4 to perform unsupervised knowledge base completion (KBC) on the largest public KB, Wikidata.
  • results: The paper finds that, despite the size and capabilities of GPT-3 and other models, they do not achieve fully convincing results on this task, but they do provide solid improvements over earlier approaches with smaller LMs. Specifically, with proper thresholding, GPT-3 enables the extension of Wikidata by 27M facts at 90% precision.
    Abstract Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet, their ability to do this at scale and with high accuracy remains an open question. Prior experimental studies mostly fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we perform a careful evaluation of GPT's potential to complete the largest public KB: Wikidata. We find that, despite their size and capabilities, models like GPT-3, ChatGPT and GPT-4 do not achieve fully convincing results on this task. Nonetheless, they provide solid improvements over earlier approaches with smaller LMs. In particular, we show that, with proper thresholding, GPT-3 enables to extend Wikidata by 27M facts at 90% precision.
    摘要 结构化知识库(KB)是搜索引擎和其他应用的重要资产,但不可避免地存在不完整性。语言模型(LM)已被提议用于无监督知识库补全(KBC),但其能否大规模、高精度地完成这一任务仍是一个开放问题。先前的实验研究大多存在不足,因为它们只在热门主题上评估,或从知识库中采样已有的事实。在这项工作中,我们对 GPT 补全最大公共知识库 Wikidata 的潜力进行了细致评估。我们发现,尽管 GPT-3、ChatGPT 和 GPT-4 等模型规模庞大、能力出众,它们在这一任务上仍未取得完全令人信服的结果;不过,相较于早期基于较小语言模型的方法,它们带来了扎实的改进。特别地,我们表明,通过适当的阈值设定,GPT-3 能以 90% 的精度为 Wikidata 扩充 2700 万条事实。
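
The "proper thresholding" step can be illustrated with a short sketch: on a labelled validation sample, pick the lowest confidence threshold whose retained predictions still reach the precision target. The calibration procedure below is a generic one, not necessarily the paper's exact recipe.

```python
import numpy as np

def threshold_for_precision(confidences, is_correct, target_precision=0.9):
    """Return the lowest confidence threshold whose retained predictions
    reach the target precision on labelled validation data (sketch)."""
    confidences = np.asarray(confidences)
    order = np.argsort(-confidences)                  # most confident first
    correct = np.asarray(is_correct, dtype=float)[order]
    precision_at_k = np.cumsum(correct) / (np.arange(len(correct)) + 1)
    valid = np.nonzero(precision_at_k >= target_precision)[0]
    if len(valid) == 0:
        return None  # no threshold achieves the target
    k = valid[-1]    # largest prefix still meeting the target
    return confidences[order][k]
```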

Policy Gradient with Kernel Quadrature

  • paper_url: http://arxiv.org/abs/2310.14768
  • repo_url: None
  • paper_authors: Satoshi Hayakawa, Tetsuro Morimura
  • for: 本研究旨在提高强化学习任务中回报评估的效率:从大批次回合中选出一个小而有代表性的子集,只在该子集上实际计算奖励,从而更高效地进行策略梯度迭代。
  • methods: 使用高斯过程对折扣回报或奖励建模,在回合空间上导出一个正定核,再用 "episodic" 核求积方法压缩样本回合的信息,并将压缩后的回合集传给策略网络进行梯度更新。
  • results: 给出了该方法的理论背景,并在 MuJoCo 和因果发现任务上提供了数值示例,证明了其有效性和可扩展性。
    Abstract Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks. Our aim in this paper is to select a small but representative subset of a large batch of episodes, only on which we actually compute rewards for more efficient policy gradient iterations. We build a Gaussian process modeling of discounted returns or rewards to derive a positive definite kernel on the space of episodes, run an "episodic" kernel quadrature method to compress the information of sample episodes, and pass the reduced episodes to the policy network for gradient updates. We present the theoretical background of this procedure as well as its numerical illustrations in MuJoCo and causal discovery tasks.
    摘要 回合的奖励评估是许多强化学习任务中的瓶颈。本文的目标是从大批次回合中选出一个小而有代表性的子集,只在该子集上实际计算奖励,以便更高效地进行策略梯度迭代。我们用高斯过程对折扣回报或奖励建模,在回合空间上导出一个正定核,运行 "episodic" 核求积方法压缩样本回合的信息,并将压缩后的回合传给策略网络进行梯度更新。我们给出了该过程的理论背景,以及在 MuJoCo 和因果发现任务上的数值示例。
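
A toy sketch of the subset-selection step is below. It uses greedy kernel herding over episode feature vectors to pick episodes whose empirical kernel mean embedding matches that of the full batch; this is a simplified stand-in for the paper's Gaussian-process-based episodic kernel quadrature, which additionally produces quadrature weights.

```python
import numpy as np

def select_episodes(features: np.ndarray, m: int, gamma: float = 1.0):
    """Greedy kernel herding (sketch): pick m representative episodes
    under an RBF kernel over per-episode feature vectors."""
    n = len(features)
    sq = np.sum(features ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * features @ features.T))
    mean_embed = K.mean(axis=1)       # k(x_i, empirical mean) for each i
    chosen, herd = [], np.zeros(n)
    for _ in range(m):
        # Favour points close to the mean embedding, far from picks so far.
        scores = mean_embed - herd / max(len(chosen), 1)
        scores[chosen] = -np.inf
        i = int(np.argmax(scores))
        chosen.append(i)
        herd += K[:, i]
    return chosen  # compute rewards only for these episodes
```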

The Safety Challenges of Deep Learning in Real-World Type 1 Diabetes Management

  • paper_url: http://arxiv.org/abs/2310.14743
  • repo_url: https://github.com/hemerson1/openaps_cleaner
  • paper_authors: Harry Emerson, Ryan McConville, Matthew Guy
  • for: 本研究旨在评估在不伤害患者的前提下,使用深度学习算法建模 1 型糖尿病(T1D)管理策略的有效性。
  • methods: 研究使用真实世界采集的自由生活数据,并结合患者自报的高难度糖尿病事件标签,构成了迄今最详尽的真实世界 T1D 数据集之一;该数据集用于训练和评估当前最优的血糖模拟器,比较它们在安全关键场景下的预测误差,并用 SHAP(Shapley Additive Explanations)评估所学动态的生理合理性。
  • results: 研究发现,深度学习的预测精度超过了广泛使用的数学模拟方法,但模型在安全关键场景下性能退化,且难以利用自报的进食和运动信息;SHAP 值分析还表明模型混淆了胰岛素和碳水化合物的作用,而这是 T1D 管理最基本的原则之一。这项研究强调了在 T1D 乃至更广泛的医疗领域用深度学习建模真实世界系统时考虑生理合理性的重要性,并为构建对真实世界数据约束具有鲁棒性的模型提供了建议。
    Abstract Blood glucose simulation allows the effectiveness of type 1 diabetes (T1D) management strategies to be evaluated without patient harm. Deep learning algorithms provide a promising avenue for extending simulator capabilities; however, these algorithms are limited in that they do not necessarily learn physiologically correct glucose dynamics and can learn incorrect and potentially dangerous relationships from confounders in training data. This is likely to be more important in real-world scenarios, as data is not collected under strict research protocol. This work explores the implications of using deep learning algorithms trained on real-world data to model glucose dynamics. Free-living data was processed from the OpenAPS Data Commons and supplemented with patient-reported tags of challenging diabetes events, constituting one of the most detailed real-world T1D datasets. This dataset was used to train and evaluate state-of-the-art glucose simulators, comparing their prediction error across safety critical scenarios and assessing the physiological appropriateness of the learned dynamics using Shapley Additive Explanations (SHAP). While deep learning prediction accuracy surpassed the widely-used mathematical simulator approach, the model deteriorated in safety critical scenarios and struggled to leverage self-reported meal and exercise information. SHAP value analysis also indicated the model had fundamentally confused the roles of insulin and carbohydrates, which is one of the most basic T1D management principles. This work highlights the importance of considering physiological appropriateness when using deep learning to model real-world systems in T1D and healthcare more broadly, and provides recommendations for building models that are robust to real-world data constraints.
    摘要 血糖模拟可以在不伤害患者的前提下评估 1 型糖尿病(T1D)管理策略的有效性。深度学习算法为扩展模拟器能力提供了有前景的途径;但这类算法不一定能学到符合生理规律的血糖动态,还可能从训练数据的混杂因素中学到错误且潜在危险的关系。在真实世界场景中,数据并非按严格的研究规程采集,这一问题可能更为突出。本工作探讨了使用在真实世界数据上训练的深度学习算法来建模血糖动态的影响。我们处理了来自 OpenAPS Data Commons 的自由生活数据,并辅以患者自报的高难度糖尿病事件标签,构成了迄今最详尽的真实世界 T1D 数据集之一。我们用该数据集训练和评估当前最优的血糖模拟器,比较它们在安全关键场景下的预测误差,并用 Shapley Additive Explanations(SHAP)评估所学动态的生理合理性。虽然深度学习的预测精度超过了广泛使用的数学模拟方法,但模型在安全关键场景下性能退化,且难以利用自报的进食和运动信息。SHAP 值分析还表明,模型从根本上混淆了胰岛素和碳水化合物的作用,而这是 T1D 管理最基本的原则之一。这项工作强调了在 T1D 乃至更广泛的医疗领域用深度学习建模真实世界系统时考虑生理合理性的重要性,并为构建对真实世界数据约束具有鲁棒性的模型提供了建议。
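
The SHAP-based plausibility check can be sketched as below, assuming a fitted glucose predictor `model` and a feature matrix `X` whose columns include recent insulin and carbohydrate inputs (the feature names are illustrative, not the paper's).

```python
import shap  # pip install shap

# Assumes: `model` is any fitted regressor predicting future glucose,
# `X` is a (n_samples, n_features) array aligned with `feature_names`.
feature_names = ["glucose_t-0", "glucose_t-5", "insulin", "carbs"]

explainer = shap.KernelExplainer(model.predict, shap.sample(X, 100))
shap_values = explainer.shap_values(X[:200])

# Physiological sanity check: on average, insulin should push predicted
# glucose down and carbohydrates should push it up.
mean_effect = shap_values.mean(axis=0)
for name, eff in zip(feature_names, mean_effect):
    print(f"{name}: mean SHAP contribution {eff:+.3f}")
```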

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

  • paper_url: http://arxiv.org/abs/2310.14735
  • repo_url: None
  • paper_authors: Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu
  • for: 本研究探讨大语言模型(LLM)的提示工程技术,以充分释放 LLM 的能力。
  • methods: 介绍了提示工程的基础原则,如角色提示、单样本(one-shot)和少样本(few-shot)提示,以及思维链(chain-of-thought)和思维树(tree-of-thoughts)等更高级的方法;并说明插件形式的外部辅助如何通过检索外部知识来改进提示效果、减少机器幻觉。
  • results: 本研究展示了提示工程在教育和编程等领域的变革性潜力,并讨论了对 AI 生成内容(AIGC)工具中的结构与智能体角色开展进一步研究的必要性。
    Abstract This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.
    摘要 本文深入探讨提示工程在释放大语言模型(LLM)能力方面的关键作用。提示工程是为 LLM 构造输入文本的过程,是优化 LLM 效能不可或缺的技术。本综述阐述了提示工程的基础原则,包括角色提示、单样本和少样本提示,以及思维链和思维树等更高级的方法。文章还说明了插件形式的外部辅助如何协助这一任务,并通过检索外部知识减少机器幻觉。随后,我们勾勒了提示工程研究的未来方向,强调需要更深入地理解提示结构以及智能体在 AI 生成内容(AIGC)工具中的角色。我们讨论了如何从不同视角、用不同方法评估提示方法的效能。最后,我们汇总了提示工程在教育和编程等领域的应用,展示其变革性潜力。本综述旨在为初入 LLM 与提示工程领域的读者提供一份友好的指南,同时也为资深研究者提供有价值的参考。
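
The foundational patterns the survey covers are easy to illustrate with toy templates (wording illustrative, not prescriptive):

```python
role_prompt = (
    "You are a senior Python tutor. "      # role prompting
    "Explain what a list comprehension is."
)

few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"         # in-context examples (few-shot)
    "cheese => fromage\n"
    "house =>"
)

zero_shot_cot_prompt = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?\n"
    "A: Let's think step by step."          # zero-shot chain of thought
)
```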

Predicting Transcription Factor Binding Sites using Transformer based Capsule Network

  • paper_url: http://arxiv.org/abs/2310.15202
  • repo_url: https://github.com/NimishaGhosh/DNABERT-Cap
  • paper_authors: Nimisha Ghosh, Daniele Santoni, Indrajit Saha, Giovanni Felici
  • for: 预测转录因子结合位点,以理解转录因子如何调控基因表达,以及如何出于治疗目的调节这种调控。
  • methods: 提出一种基于 Transformer 的胶囊网络 DNABERT-Cap,以大量基因组 DNA 序列预训练双向编码器,并由胶囊层负责最终预测;该模型联合优化双向编码器与胶囊层的特征,并结合卷积层和双向长短期记忆(BiLSTM)层,构建转录因子结合位点预测器。
  • results: 在 ENCODE 库中五个细胞系(A549、GM12878、Hep-G2、H1-hESC 和 Hela)的 ChIP-seq 基准数据上评估,DNABERT-Cap 在全部五个细胞系上的平均 ROC 曲线下面积均超过 0.91,并优于 DeepARC、DeepTF、CNN-Zeng 和 DeepBind 等现有深度学习预测器。
    Abstract Prediction of binding sites for transcription factors is important to understand how they regulate gene expression and how this regulation can be modulated for therapeutic purposes. Although in the past few years there are significant works addressing this issue, there is still space for improvement. In this regard, a transformer based capsule network viz. DNABERT-Cap is proposed in this work to predict transcription factor binding sites mining ChIP-seq datasets. DNABERT-Cap is a bidirectional encoder pre-trained with large number of genomic DNA sequences, empowered with a capsule layer responsible for the final prediction. The proposed model builds a predictor for transcription factor binding sites using the joint optimisation of features encompassing both bidirectional encoder and capsule layer, along with convolutional and bidirectional long-short term memory layers. To evaluate the efficiency of the proposed approach, we use a benchmark ChIP-seq datasets of five cell lines viz. A549, GM12878, Hep-G2, H1-hESC and Hela, available in the ENCODE repository. The results show that the average area under the receiver operating characteristic curve score exceeds 0.91 for all such five cell lines. DNABERT-Cap is also compared with existing state-of-the-art deep learning based predictors viz. DeepARC, DeepTF, CNN-Zeng and DeepBind, and is seen to outperform them.
    摘要 预测转录因子结合位点对理解转录因子如何调控基因表达、以及如何出于治疗目的调节这种调控至关重要。尽管过去几年已有不少针对这一问题的研究,仍有改进空间。为此,本文提出一种基于 Transformer 的胶囊网络 DNABERT-Cap,通过挖掘 ChIP-seq 数据集预测转录因子结合位点。DNABERT-Cap 是一个以大量基因组 DNA 序列预训练的双向编码器,并配备负责最终预测的胶囊层。所提模型联合优化双向编码器与胶囊层的特征,并结合卷积层和双向长短期记忆层,构建结合位点预测器。为评估方法效率,我们使用 ENCODE 库中 A549、GM12878、Hep-G2、H1-hESC 和 Hela 五个细胞系的 ChIP-seq 基准数据。结果显示,全部五个细胞系的平均 ROC 曲线下面积均超过 0.91。DNABERT-Cap 还与 DeepARC、DeepTF、CNN-Zeng 和 DeepBind 等现有最优深度学习预测器进行了比较,并优于它们。

Generating Prototypes for Contradiction Detection Using Large Language Models and Linguistic Rules

  • paper_url: http://arxiv.org/abs/2310.14732
  • repo_url: https://github.com/fraunhofer-iais/informed_nlu
  • paper_authors: Maren Pielka, Svetlana Schmidt, Rafet Sifa
  • for: 提出一种新的矛盾检测数据生成方法,同时利用大语言模型的生成能力与语言学规则。
  • methods: 指示大语言模型针对特定矛盾类型的描述生成相互矛盾的陈述,并让模型提出全新的矛盾类型;作为辅助手段,利用语言学规则构造否定、反义和数字不匹配等简单矛盾。
  • results: 该方法生成的数据在连贯性和多样性方面表现出良好前景,可用于语言模型微调;但要在机器学习流程中使用这些数据,仍需进一步研究和人工修订。
    Abstract We introduce a novel data generation method for contradiction detection, which leverages the generative power of large language models as well as linguistic rules. Our vision is to provide a condensed corpus of prototypical contradictions, allowing for in-depth linguistic analysis as well as efficient language model fine-tuning. To this end, we instruct the generative models to create contradicting statements with respect to descriptions of specific contradiction types. In addition, the model is also instructed to come up with completely new contradiction typologies. As an auxiliary approach, we use linguistic rules to construct simple contradictions such as those arising from negation, antonymy and numeric mismatch. We find that our methods yield promising results in terms of coherence and variety of the data. Further studies, as well as manual refinement are necessary to make use of this data in a machine learning setup.
    摘要 我们提出一种新的矛盾检测数据生成方法,它同时利用大语言模型的生成能力和语言学规则。我们的愿景是提供一个精炼的原型矛盾语料库,既便于深入的语言学分析,也便于高效的语言模型微调。为此,我们指示生成模型针对特定矛盾类型的描述生成相互矛盾的陈述,同时也让模型提出全新的矛盾类型。作为辅助手段,我们利用语言学规则构造简单矛盾,例如由否定、反义和数字不匹配引起的矛盾。我们发现,该方法生成的数据在连贯性和多样性方面均有良好表现;但要在机器学习流程中使用这些数据,仍需进一步研究和人工修订。
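
The rule-based side of the pipeline is straightforward to sketch. The word lists and regex patterns below are toy illustrations, not the authors' implementation:

```python
import re

ANTONYMS = {"hot": "cold", "open": "closed", "large": "small"}  # toy lexicon

def negate(sentence: str) -> str:
    # Negation: insert "not" after the first copula.
    return re.sub(r"\b(is|are|was|were)\b", r"\1 not", sentence, count=1)

def antonym_swap(sentence: str) -> str:
    # Antonymy: replace the first known adjective with its antonym.
    for word, opposite in ANTONYMS.items():
        if re.search(rf"\b{word}\b", sentence):
            return re.sub(rf"\b{word}\b", opposite, sentence, count=1)
    return sentence

def numeric_mismatch(sentence: str) -> str:
    # Numeric mismatch: perturb the first number in the sentence.
    return re.sub(r"\d+", lambda m: str(int(m.group()) + 1), sentence, count=1)

print(negate("The door is open."))           # The door is not open.
print(antonym_swap("The door is open."))     # The door is closed.
print(numeric_mismatch("She owns 3 cats."))  # She owns 4 cats.
```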

A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions

  • paper_url: http://arxiv.org/abs/2310.14724
  • repo_url: https://github.com/nlp2ct/llm-generated-text-detection
  • paper_authors: Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, Lidia S. Chao
  • for: The paper is written to detect LLM-generated text and to mitigate the potential misuse of LLMs in various areas, such as artistic expression and social networks.
  • methods: The paper discusses various techniques for LLM-generated text detection, including watermarking, zero-shot methods, fine-tuning LMs, adversarial learning, LLMs as detectors, and human-assisted methods.
  • results: The paper highlights recent research breakthroughs in LLM-generated text detection and emphasizes the need for further research to improve the accuracy and robustness of detectors. It also discusses the limitations and developmental requirements of prevalent datasets and analyzes various detection paradigms, shedding light on challenges such as out-of-distribution problems, potential attacks, and data ambiguity.
    Abstract The powerful ability to understand, follow, and generate complex language emerging from large language models (LLMs) makes LLM-generated text flood many areas of our daily lives at an incredible speed and is widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can detect LLM-generated text. This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from harmful influence of LLM-generated content. The LLM-generated text detection aims to discern if a piece of text was produced by an LLM, which is essentially a binary classification task. The detector techniques have witnessed notable advancements recently, propelled by innovations in watermarking techniques, zero-shot methods, fine-tuning LMs methods, adversarial learning methods, LLMs as detectors, and human-assisted methods. In this survey, we collate recent research breakthroughs in this area and underscore the pressing need to bolster detector research. We also delve into prevalent datasets, elucidating their limitations and developmental requirements. Furthermore, we analyze various LLM-generated text detection paradigms, shedding light on challenges like out-of-distribution problems, potential attacks, and data ambiguity. Conclusively, we highlight interesting directions for future research in LLM-generated text detection to advance the implementation of responsible artificial intelligence (AI). Our aim with this survey is to provide a clear and comprehensive introduction for newcomers while also offering seasoned researchers a valuable update in the field of LLM-generated text detection. The useful resources are publicly available at: https://github.com/NLP2CT/LLM-generated-Text-Detection.
    摘要 大语言模型(LLM)理解、遵循并生成复杂语言的强大能力,使 LLM 生成的文本以惊人的速度涌入日常生活的诸多领域,并被人们广泛接受。随着 LLM 的不断扩展,迫切需要开发能够检测 LLM 生成文本的检测器,以减轻 LLM 被滥用的潜在风险,保护艺术表达和社交网络等领域免受 LLM 生成内容的有害影响。LLM 生成文本检测旨在判别一段文本是否由 LLM 生成,本质上是一个二分类任务。近来,受水印技术、零样本方法、微调语言模型方法、对抗学习方法、以 LLM 为检测器以及人工辅助方法等创新的推动,检测技术取得了显著进展。本综述汇集了该领域的最新研究突破,并强调加强检测器研究的紧迫性。我们还考察了常用数据集,阐明其局限性和发展需求;并分析了各种 LLM 生成文本检测范式,揭示分布外问题、潜在攻击和数据歧义等挑战。最后,我们指出了 LLM 生成文本检测未来研究的有趣方向,以推动负责任人工智能(AI)的落地。我们希望本综述既能为新手提供清晰全面的入门介绍,也能为资深研究者提供该领域的宝贵更新。相关资源公开于:https://github.com/NLP2CT/LLM-generated-Text-Detection。
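
One zero-shot detection family from the survey scores a text's perplexity under a proxy language model and flags unusually fluent (low-perplexity) text as likely machine-generated. A minimal sketch with Hugging Face `transformers` is below; the threshold is illustrative and would need calibration on labelled data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss        # mean token cross-entropy
    return float(torch.exp(loss))

def looks_generated(text: str, threshold: float = 20.0) -> bool:
    # Lower perplexity under the proxy LM -> more likely machine text.
    return perplexity(text) < threshold
```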

A Skin Microbiome Model with AMP interactions and Analysis of Quasi-Stability vs Stability in Population Dynamics

  • paper_url: http://arxiv.org/abs/2310.15201
  • repo_url: None
  • paper_authors: Eléa Thibault Greugny, François Fages, Ovidiu Radulescu, Peter Szmolyan, Georgios Stamatas
  • for: 研究皮肤微生物组种群动态的稳定性及其对维持皮肤健康的重要性。
  • methods: 建立基于常微分方程的数学模型,刻画皮肤微生物组中皮肤共生菌与机会致病菌两类种群之间的竞争与相互作用,并纳入抗菌肽(AMP)的产生;利用已发表的实验数据将模型参数从 13 个约简到 5 个,再以定量时序逻辑的形式化规约进行全局参数优化校准,并开展敏感性分析。
  • results: 模型预测,皮肤表面 pH 升高等环境变化会为机会致病菌的出现和定植创造有利条件,而人类 AMP 的产生对两类种群间的平衡具有非线性影响;更长时间尺度的模拟还揭示,约 2 天达到的平衡实际上可能是一个准稳态,12 天或更久之后会转入相反的稳态,并用热带代数方法分析了该准稳态出现的条件。
    Abstract The skin microbiome plays an important role in the maintenance of a healthy skin. It is an ecosystem, composed of several species, competing for resources and interacting with the skin cells. Imbalance in the cutaneous microbiome, also called dysbiosis, has been correlated with several skin conditions, including acne and atopic dermatitis. Generally, dysbiosis is linked to colonization of the skin by a population of opportunistic pathogenic bacteria. Treatments consisting in non-specific elimination of cutaneous microflora have shown conflicting results. In this article, we introduce a mathematical model based on ordinary differential equations, with 2 types of bacteria populations (skin commensals and opportunistic pathogens) and including the production of antimicrobial peptides to study the mechanisms driving the dominance of one population over the other. By using published experimental data, assumed to correspond to the observation of stable states in our model, we reduce the number of parameters of the model from 13 to 5. We then use a formal specification in quantitative temporal logic to calibrate our model by global parameter optimization and perform sensitivity analyses. On the time scale of 2 days of the experiments, the model predicts that certain changes of the environment, like the elevation of skin surface pH, create favorable conditions for the emergence and colonization of the skin by the opportunistic pathogen population, while the production of human AMPs has non-linear effect on the balance between pathogens and commensals. Surprisingly, simulations on longer time scales reveal that the equilibrium reached around 2 days can in fact be a quasi-stable state followed by the reaching of a reversed stable state after 12 days or more. We analyse the conditions of quasi-stability observed in this model using tropical algebraic methods, and show their non-generic character in contrast to slow-fast systems. These conditions are then generalized to a large class of population dynamics models over any number of species.
    摘要 皮肤微生物组在维持皮肤健康方面发挥着重要作用。它是一个由多种菌种组成的生态系统,这些菌种相互竞争资源并与皮肤细胞相互作用。皮肤微生物组的失衡(又称菌群失调)与痤疮、特应性皮炎等多种皮肤疾病相关。通常,菌群失调与机会致病菌种群在皮肤上的定植有关,而非特异性清除皮肤菌群的治疗方法结果并不一致。本文建立一个基于常微分方程的数学模型,包含皮肤共生菌与机会致病菌两类细菌种群,并纳入抗菌肽(AMP)的产生,以研究驱动一类种群压倒另一类的机制。利用已发表的实验数据(假定对应模型的稳态观测),我们将模型参数从 13 个约简到 5 个;随后以定量时序逻辑的形式化规约,通过全局参数优化对模型进行校准,并开展敏感性分析。在实验对应的 2 天时间尺度上,模型预测皮肤表面 pH 升高等环境变化会为机会致病菌的出现和定植创造有利条件,而人类 AMP 的产生对致病菌与共生菌之间的平衡具有非线性影响。令人意外的是,更长时间尺度的模拟显示,约 2 天达到的平衡实际上可能是一个准稳态,在 12 天或更久之后会转入相反的稳态。我们用热带代数方法分析了该模型中观察到的准稳态条件,指出其与慢-快系统不同的非一般性特征,并将这些条件推广到任意物种数的一大类种群动力学模型。
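
To give a feel for this class of model, here is an illustrative two-species competition ODE with an AMP-dependent kill term on the pathogen, integrated with SciPy. The equations and parameter values are invented stand-ins; the paper's calibrated model is not reproduced here.

```python
from scipy.integrate import solve_ivp

r_c, r_p = 1.0, 1.2    # growth rates: commensals, pathogens (illustrative)
K = 1.0                 # shared carrying capacity
a, k_amp = 0.5, 0.8     # AMP production and AMP kill rate (illustrative)

def rhs(t, y):
    c, p, amp = y       # commensals, pathogens, antimicrobial peptides
    total = c + p
    dc = r_c * c * (1 - total / K)
    dp = r_p * p * (1 - total / K) - k_amp * amp * p  # AMPs kill pathogens
    damp = a * c - amp  # AMP production and first-order decay
    return [dc, dp, damp]

sol = solve_ivp(rhs, (0, 48), [0.5, 0.05, 0.0], dense_output=True)
print("state after 48 h:", sol.y[:, -1])
```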

BatteryML:An Open-source platform for Machine Learning on Battery Degradation

  • paper_url: http://arxiv.org/abs/2310.14714
  • repo_url: https://github.com/microsoft/batteryml
  • paper_authors: Han Zhang, Xiaofan Gui, Shun Zheng, Ziheng Lu, Yuqi Li, Jiang Bian
  • for: 本文旨在提供一个一站式、全流程、开源的平台,统一数据预处理、特征提取与模型实现,以提高电池退化研究的实用性和效率。
  • methods: 平台涵盖数据预处理、特征提取,以及传统模型与当前最优模型的实现。
  • results: 实验结果表明,BatteryML 有助于研究人员更好地理解和预测电池退化行为,提升电池研究的效率与可复用性。
    Abstract Battery degradation remains a pivotal concern in the energy storage domain, with machine learning emerging as a potent tool to drive forward insights and solutions. However, this intersection of electrochemical science and machine learning poses complex challenges. Machine learning experts often grapple with the intricacies of battery science, while battery researchers face hurdles in adapting intricate models tailored to specific datasets. Beyond this, a cohesive standard for battery degradation modeling, inclusive of data formats and evaluative benchmarks, is conspicuously absent. Recognizing these impediments, we present BatteryML - a one-step, all-encompass, and open-source platform designed to unify data preprocessing, feature extraction, and the implementation of both traditional and state-of-the-art models. This streamlined approach promises to enhance the practicality and efficiency of research applications. BatteryML seeks to fill this void, fostering an environment where experts from diverse specializations can collaboratively contribute, thus elevating the collective understanding and advancement of battery research.The code for our project is publicly available on GitHub at https://github.com/microsoft/BatteryML.
    摘要 电池退化仍是储能领域的一个关键问题,而机器学习正成为推动相关洞见与解决方案的有力工具。然而,电化学科学与机器学习的交叉也带来了复杂的挑战:机器学习专家往往难以把握电池科学的细节,而电池研究人员则在为特定数据集适配复杂模型时面临障碍。此外,目前明显缺乏一个统一的电池退化建模标准,包括数据格式与评估基准。认识到这些障碍,我们提出了 BatteryML:一个一站式、全流程、开源的平台,统一数据预处理、特征提取,以及传统模型与最新模型的实现。这种精简的方式有望提升研究应用的实用性与效率。BatteryML 旨在填补这一空白,营造让不同专业背景的专家协同贡献的环境,从而提升电池研究的整体认识与进展。项目代码公开于 https://github.com/microsoft/BatteryML。

Random Forest Dissimilarity for High-Dimension Low Sample Size Classification

  • paper_url: http://arxiv.org/abs/2310.14710
  • repo_url: None
  • paper_authors: Lucca Portes Cavalheiro, Simon Bernard, Jean Paul Barddal, Laurent Heutte
  • for: solves high-dimensional low-sample-size (HDLSS) classification problems
  • methods: uses a learned precomputed support vector machine (SVM) kernel based on the random forest (RF) similarity measure
  • results: outperforms existing methods for the majority of HDLSS problems and remains competitive for low or non-HDLSS problems
    Abstract High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the Random Forest Dissimilarity (RFD), that perfoms state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems and remains at the same time very competitive for low or non-HDLSS problems.
    摘要 高维度低样本数(HDLSS)问题在实际应用中很普遍。从医疗图像到文本处理,传统机器学习算法通常无法从这种数据中学习最佳概念。在前一项工作中,我们提出了一种相似度基于的多视图分类方法,即随机森林相似度(RFD),其在这些问题上实现了状态艺术结果。在这项工作中,我们将把核心原理转移到解决HDLSSB类型问题上,通过使用RF相似度度量作为学习得到的SVM核度(RFSVM)。我们表明该学习相似度度量在这种分类上特别适合和准确。对40个公共HDLSSB分类数据集进行了严格的统计分析,并证明RFSVM方法在大多数HDLSSB问题上超越现有方法,并且在低或非HDLSSB问题上保持竞争力。

BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities

  • paper_url: http://arxiv.org/abs/2310.14702
  • repo_url: https://github.com/byzhaoai/bm2cp
  • paper_authors: Binyu Zhao, Wei Zhang, Zhaonian Zou
  • for: This paper aims to improve the perception performance of autonomous driving systems by enabling agents to share complementary perceptual information.
  • methods: The proposed approach, BM2CP, utilizes LiDAR and camera data to achieve efficient multi-modal perception. It employs LiDAR-guided modal fusion, cooperative depth generation, and modality-guided intermediate fusion to acquire deep interactions among modalities of different agents.
  • results: The proposed approach outperforms state-of-the-art methods with 50X lower communication volumes in both simulated and real-world autonomous driving scenarios.
    Abstract Collaborative perception enables agents to share complementary perceptual information with nearby agents. This would improve the perception performance and alleviate the issues of single-view perception, such as occlusion and sparsity. Most existing approaches mainly focus on single modality (especially LiDAR), and not fully exploit the superiority of multi-modal perception. We propose a collaborative perception paradigm, BM2CP, which employs LiDAR and camera to achieve efficient multi-modal perception. It utilizes LiDAR-guided modal fusion, cooperative depth generation and modality-guided intermediate fusion to acquire deep interactions among modalities of different agents, Moreover, it is capable to cope with the special case where one of the sensors, same or different type, of any agent is missing. Extensive experiments validate that our approach outperforms the state-of-the-art methods with 50X lower communication volumes in both simulated and real-world autonomous driving scenarios. Our code is available at https://github.com/byzhaoAI/BM2CP.
    摘要 协同感知使智能体能够与邻近智能体共享互补的感知信息,从而提升感知性能,并缓解单视角感知的遮挡与稀疏等问题。现有方法大多只关注单一模态(尤其是激光雷达),未能充分发挥多模态感知的优势。我们提出一种协同感知范式 BM2CP,利用激光雷达与相机实现高效的多模态感知。它通过激光雷达引导的模态融合、协同深度生成以及模态引导的中间融合,获取不同智能体各模态之间的深层交互;并且能够应对任一智能体缺失某个(同类或异类)传感器的特殊情形。大量实验验证,我们的方法在仿真和真实自动驾驶场景中均优于最优方法,且通信量降低 50 倍。代码见 https://github.com/byzhaoAI/BM2CP。

API-Assisted Code Generation for Question Answering on Varied Table Structures

  • paper_url: http://arxiv.org/abs/2310.14687
  • repo_url: None
  • paper_authors: Yihan Cao, Shuyi Chen, Ryan Liu, Zhiruo Wang, Daniel Fried
  • for: 本论文旨在提供一个统一的表格问答(TableQA)框架,以应对多样的表格结构。
  • methods: 该框架将结构化表格统一表示为多级索引的 Pandas 数据框,以 Python 作为强大的查询语言,并通过少样本提示将自然语言问题翻译成可在 Pandas 数据框上执行的 Python 程序;为回答需要扩展程序功能和外部知识的复杂关系型问题,框架还支持 Python 程序调用自定义 API。
  • results: 在涉及关系型、多表和层级矩阵等不同结构表格的四个 TableQA 数据集上,该方法显著优于以往最优系统;消融研究(1)展示了多级索引表示和 API 相对于仅用 LLM 的基线的优势,(2)证明了该方法是模块化的,可以纳入更多 API。
    Abstract A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are executable on Pandas data frames. Furthermore, to answer complex relational questions with extended program functionality and external knowledge, our framework allows customized APIs that Python programs can call. We experiment with four TableQA datasets that involve tables of different structures -- relational, multi-table, and hierarchical matrix shapes -- and achieve prominent improvements over past state-of-the-art systems. In ablation studies, we (1) show benefits from our multi-index representation and APIs over baselines that use only an LLM, and (2) demonstrate that our approach is modular and can incorporate additional APIs.
    摘要 通过生成可执行程序来进行表格问答(TableQA)长期面临一个挑战:需要适应多样的表格结构,通常还需要领域特定的逻辑形式。为此,本文提出一个统一的 TableQA 框架:(1)将结构化表格统一表示为多级索引的 Pandas 数据框;(2)以 Python 作为强大的查询语言;(3)通过少样本提示将自然语言问题翻译成可在 Pandas 数据框上执行的 Python 程序。此外,为回答需要扩展程序功能和外部知识的复杂关系型问题,我们的框架允许 Python 程序调用自定义 API。我们在涉及关系型、多表和层级矩阵等不同结构表格的四个 TableQA 数据集上进行实验,显著优于以往最优系统。在消融研究中,我们(1)展示了多级索引表示和 API 相对于仅用 LLM 的基线的优势,(2)证明了我们的方法是模块化的,可以纳入更多 API。
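
The multi-index representation is concrete enough to sketch directly. Below, a small hierarchical table is stored as a Pandas multi-index data frame, followed by the kind of one-line Python program such a framework would synthesize from a natural-language question; the data and question are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {("2022", "gold"): [10, 3], ("2022", "silver"): [5, 8],
     ("2023", "gold"): [12, 4], ("2023", "silver"): [6, 9]},
    index=["Team A", "Team B"],
)
df.columns = pd.MultiIndex.from_tuples(df.columns, names=["year", "medal"])

# NL question: "Which team won the most gold medals in 2023?"
# A synthesized program might be as simple as:
answer = df[("2023", "gold")].idxmax()
print(answer)  # -> "Team A"
```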

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

  • paper_url: http://arxiv.org/abs/2310.14670
  • repo_url: None
  • paper_authors: Zhecan Wang, Long Chen, Haoxuan You, Keyang Xu, Yicheng He, Wenhao Li, Noal Codella, Kai-Wei Chang, Shih-Fu Chang
  • for: The paper aims to address dataset biases in vision-language (VL) understanding tasks by proposing Adversarial Data Synthesis (ADS) and Intra-sample Counterfactual Training (ICT) to improve model performance.
  • methods: The paper uses ADS to generate synthetic training and debiased evaluation data, and ICT to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
  • results: The paper shows that ADS and ICT consistently improve model performance across different benchmarks, even in domain-shifted scenarios.
    Abstract Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding. The first type of dataset bias is Unbalanced Matching bias, where the correct answer overlaps the question and image more than the incorrect answers. The second type of dataset bias is Distractor Similarity bias, where incorrect answers are overly dissimilar to the correct answer but significantly similar to other incorrect answers within the same sample. To address these dataset biases, we first propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data. We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation. Extensive experiments demonstrate the effectiveness of ADS and ICT in consistently improving model performance across different benchmarks, even in domain-shifted scenarios.
    摘要 视觉-语言(VL)理解任务通过多项选择题来评估模型对复杂视觉场景的理解。然而,我们发现了两种数据集偏差,模型可以将其作为捷径,在缺乏真正理解的情况下正确解答各类 VL 任务。第一种是"不平衡匹配"偏差:正确答案与问题和图像的重叠程度高于错误答案。第二种是"干扰项相似"偏差:错误答案与正确答案差异过大,却与同一样本内的其他错误答案高度相似。为了消除这些数据集偏差,我们首先提出对抗数据合成(ADS),生成合成训练数据和去偏的评估数据;随后引入样本内反事实训练(ICT),通过关注样本内部的差异,帮助模型利用合成的训练数据,特别是反事实数据。大量实验表明,ADS 和 ICT 能在不同基准上持续提升模型性能,即使在领域偏移的场景下也是如此。

B^2SFL: A Bi-level Blockchained Architecture for Secure Federated Learning-based Traffic Prediction

  • paper_url: http://arxiv.org/abs/2310.14669
  • repo_url: None
  • paper_authors: Hao Guo, Collin Meese, Wanxin Li, Chien-Chung Shen, Mark Nejad
  • for: 这篇论文旨在提出一种安全且去中心化的联邦学习(FL)架构,用于实现隐私保护的机器学习(ML)模型训练与基于联邦学习的交通流量预测。
  • methods: 论文提出了一种双层区块链架构:底层链存储本地模型,顶层链存储全局聚合参数;并提出分布式同态加密联邦平均(DHFA)方案以解决安全计算问题,同时设计了部分私钥分发协议与部分同态加密/解密方案,实现分布式的隐私保护联邦平均。
  • results: 实验结果显示,所提系统能够为真实世界的交通流量预测任务实现安全、去中心化的联邦学习。
    Abstract Federated Learning (FL) is a privacy-preserving machine learning (ML) technology that enables collaborative training and learning of a global ML model based on aggregating distributed local model updates. However, security and privacy guarantees could be compromised due to malicious participants and the centralized FL server. This article proposed a bi-level blockchained architecture for secure federated learning-based traffic prediction. The bottom and top layer blockchain store the local model and global aggregated parameters accordingly, and the distributed homomorphic-encrypted federated averaging (DHFA) scheme addresses the secure computation problems. We propose the partial private key distribution protocol and a partially homomorphic encryption/decryption scheme to achieve the distributed privacy-preserving federated averaging model. We conduct extensive experiments to measure the running time of DHFA operations, quantify the read and write performance of the blockchain network, and elucidate the impacts of varying regional group sizes and model complexities on the resulting prediction accuracy for the online traffic flow prediction task. The results indicate that the proposed system can facilitate secure and decentralized federated learning for real-world traffic prediction tasks.
    摘要 联邦学习(FL)是一种保护隐私的机器学习(ML)技术,通过聚合分布式的本地模型更新,实现全局 ML 模型的协同训练与学习。然而,恶意参与者和中心化的 FL 服务器可能破坏其安全与隐私保证。本文提出一种双层区块链架构,用于基于联邦学习的安全交通流量预测:底层链与顶层链分别存储本地模型和全局聚合参数,并通过分布式同态加密联邦平均(DHFA)方案解决安全计算问题。我们提出部分私钥分发协议和部分同态加密/解密方案,以实现分布式的隐私保护联邦平均模型。我们通过大量实验测量 DHFA 操作的运行时间,量化区块链网络的读写性能,并阐明不同区域分组规模与模型复杂度对在线交通流量预测精度的影响。结果表明,所提系统能够为真实世界的交通预测任务实现安全、去中心化的联邦学习。
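
Homomorphic federated averaging is easy to illustrate with the `phe` (python-paillier) package. This single-key sketch conveys the idea only: the paper's DHFA scheme distributes partial private keys so no single party can decrypt alone, a protocol not reproduced here.

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair()

local_updates = [[0.10, -0.20], [0.30, 0.00], [0.20, 0.40]]  # toy weights

# Each client encrypts its model update before sharing it.
encrypted = [[public_key.encrypt(w) for w in upd] for upd in local_updates]

# The aggregator sums ciphertexts without seeing any plaintext update
# (Paillier is additively homomorphic).
summed = [sum(client[i] for client in encrypted) for i in range(2)]

# Only the private-key holder(s) can decrypt the aggregate and average it.
avg = [private_key.decrypt(s) / len(local_updates) for s in summed]
print(avg)  # -> approximately [0.2, 0.0667]
```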

Data Pruning via Moving-one-Sample-out

  • paper_url: http://arxiv.org/abs/2310.14664
  • repo_url: None
  • paper_authors: Haoru Tan, Sitong Wu, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, Xiaojuan Qi
  • for: 本研究提出一种称为移出单样本(MoSo)的新数据剪枝方法,用于识别并移除训练集中信息量最低的样本。
  • methods: 该方法通过评估每个样本对最优经验风险的影响来确定其重要性,即度量将某个样本从训练集中剔除后经验风险的变化幅度;为避免代价高昂的留一重训练,提出一种只需利用不同训练阶段梯度信息的高效一阶近似器。
  • results: 实验结果表明,MoSo 能有效缓解高剪枝率下的性能骤降,并在多种设置下取得令人满意的性能。
    Abstract In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which aims to identify and remove the least informative samples from the training set. The core insight behind MoSo is to determine the importance of each sample by assessing its impact on the optimal empirical risk. This is achieved by measuring the extent to which the empirical risk changes when a particular sample is excluded from the training set. Instead of using the computationally expensive leaving-one-out-retraining procedure, we propose an efficient first-order approximator that only requires gradient information from different training stages. The key idea behind our approximation is that samples with gradients that are consistently aligned with the average gradient of the training set are more informative and should receive higher scores, which could be intuitively understood as follows: if the gradient from a specific sample is consistent with the average gradient vector, it implies that optimizing the network using the sample will yield a similar effect on all remaining samples. Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios and achieves satisfactory performance across various settings.
    摘要 本文提出一种称为移出单样本(MoSo)的新数据剪枝方法,旨在识别并移除训练集中信息量最低的样本。MoSo 的核心思想是通过评估每个样本对最优经验风险的影响来确定其重要性,即度量将某个样本从训练集中剔除后经验风险的变化幅度。为避免计算代价高昂的留一重训练过程,我们提出一种高效的一阶近似器,只需利用不同训练阶段的梯度信息。该近似的关键思想是:梯度与训练集平均梯度方向持续一致的样本信息量更高,应获得更高的分数。直观地理解:如果某个样本的梯度与平均梯度向量一致,就意味着用该样本优化网络会对其余所有样本产生相似的效果。实验结果表明,MoSo 能有效缓解高剪枝率下的严重性能下降,并在多种设置下取得令人满意的性能。
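
The gradient-alignment score admits a compact PyTorch sketch. It assumes `avg_grad` (the flattened average gradient of the training set at a checkpoint) has been computed elsewhere; the paper accumulates this score over several training stages, while a single checkpoint is shown here.

```python
import torch

def moso_scores(model, loss_fn, samples, avg_grad):
    """Score each sample by the inner product between its gradient and the
    average training-set gradient (first-order MoSo approximation, sketch).
    `samples` is an iterable of (x, y) batches; low scores get pruned."""
    scores = []
    for x, y in samples:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        # Consistent alignment with the average gradient -> informative.
        scores.append(torch.dot(g, avg_grad).item())
    return scores
```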

Reasoning about Ambiguous Definite Descriptions

  • paper_url: http://arxiv.org/abs/2310.14657
  • repo_url: https://github.com/sfschouten/exploiting-ambiguity
  • paper_authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
  • for: The paper aims to evaluate the ability of Large Language Models (LLMs) to use explicit reasoning to resolve context-dependent ambiguity in language.
  • methods: The paper proposes using ambiguous definite descriptions to create a benchmark dataset for this purpose; all information required to resolve the ambiguity is included in the prompt, allowing models to rely solely on reasoning to perform well.
  • results: The authors find that recent LLMs struggle with this task, indicating that there is room for improvement in their ability to use explicit reasoning to resolve ambiguity.
    Abstract Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity
    摘要 自然语言推理在提升语言模型解决复杂语言理解任务的能力方面发挥着越来越重要的作用。推理的一个有趣应用场景是消解依赖上下文的歧义。但目前尚无资源可以评估大语言模型利用显式推理消解语言歧义的能力。我们提议使用有歧义的限定摹状词来达到这一目的,并创建并发布了首个由此类短语构成的基准数据集。我们的方法在提示中包含了消解歧义所需的全部信息,这意味着模型只需依靠推理就能表现良好。我们发现,这对最新的大语言模型而言仍是一项具有挑战性的任务。代码和数据见:https://github.com/sfschouten/exploiting-ambiguity

$Λ$-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI

  • paper_url: http://arxiv.org/abs/2310.14651
  • repo_url: https://github.com/nishio-laboratory/lambda_split
  • paper_authors: Shoki Ohta, Takayuki Nishio
  • for: 提出名为 Λ-Split 的分割计算框架,在资源受限的移动设备上为生成式 AI 服务实现计算卸载,同时保护敏感数据免受隐私与安全风险。
  • methods: 将生成模型(通常为深度神经网络)在多个切分点拆分为三个子模型,分别部署在用户本地设备与云端服务器上:输入侧与输出侧子模型留在本地,计算密集的中间子模型放在云端,从而避免向外传输隐私敏感的原始输入与输出数据;由于深度网络的黑盒特性,攻击者很难从截获的隐藏层输出推断原始输入或输出。该框架与传统基于加密的安全机制正交,可结合部署以增强安全性。
  • results: 在 Meta 的 Llama 2 与 Stability AI 的 Stable Diffusion XL 上的实验验证了 Λ-Split 框架的有效性,兼顾了效率、隐私与安全。
    Abstract In the wake of the burgeoning expansion of generative artificial intelligence (AI) services, the computational demands inherent to these technologies frequently necessitate cloud-powered computational offloading, particularly for resource-constrained mobile devices. These services commonly employ prompts to steer the generative process, and both the prompts and the resultant content, such as text and images, may harbor privacy-sensitive or confidential information, thereby elevating security and privacy risks. To mitigate these concerns, we introduce $\Lambda$-Split, a split computing framework to facilitate computational offloading while simultaneously fortifying data privacy against risks such as eavesdropping and unauthorized access. In $\Lambda$-Split, a generative model, usually a deep neural network (DNN), is partitioned into three sub-models and distributed across the user's local device and a cloud server: the input-side and output-side sub-models are allocated to the local, while the intermediate, computationally-intensive sub-model resides on the cloud server. This architecture ensures that only the hidden layer outputs are transmitted, thereby preventing the external transmission of privacy-sensitive raw input and output data. Given the black-box nature of DNNs, estimating the original input or output from intercepted hidden layer outputs poses a significant challenge for malicious eavesdroppers. Moreover, $\Lambda$-Split is orthogonal to traditional encryption-based security mechanisms, offering enhanced security when deployed in conjunction. We empirically validate the efficacy of the $\Lambda$-Split framework using Llama 2 and Stable Diffusion XL, representative large language and diffusion models developed by Meta and Stability AI, respectively. Our $\Lambda$-Split implementation is publicly accessible at https://github.com/nishio-laboratory/lambda_split.
    摘要 随着生成式人工智能(AI)服务的迅速扩张,这类技术固有的计算需求常常需要借助云端进行计算卸载,对资源受限的移动设备尤其如此。这些服务通常使用提示来引导生成过程,而提示和生成的内容(如文本和图像)都可能包含隐私敏感或机密信息,从而加剧安全与隐私风险。为缓解这些问题,我们提出分割计算框架 Λ-Split,在实现计算卸载的同时,加强数据隐私,抵御窃听和未授权访问等风险。在 Λ-Split 中,生成模型(通常是深度神经网络,DNN)被拆分为三个子模型,分布在用户本地设备与云端服务器上:输入侧与输出侧子模型部署在本地,而计算密集的中间子模型驻留在云端服务器。这种架构确保只传输隐藏层输出,从而避免向外部传输隐私敏感的原始输入和输出数据。鉴于 DNN 的黑盒特性,恶意窃听者很难从截获的隐藏层输出估计原始输入或输出。此外,Λ-Split 与传统基于加密的安全机制正交,结合部署可进一步增强安全性。我们使用 Meta 与 Stability AI 分别开发的代表性大语言模型 Llama 2 和扩散模型 Stable Diffusion XL,实证验证了 Λ-Split 框架的有效性。我们的 Λ-Split 实现公开于 https://github.com/nishio-laboratory/lambda_split。
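
A minimal PyTorch sketch of the three-way partition is below. Real generative models need architecture-aware cuts rather than simple layer slicing, so this is a toy illustration of the privacy boundary only: raw inputs and outputs stay on the device, and only hidden activations cross the network.

```python
import torch.nn as nn

def lambda_split(model: nn.Sequential, cut1: int, cut2: int):
    """Split a sequential model at two partition points (sketch)."""
    head = model[:cut1]          # local: consumes the raw input
    middle = model[cut1:cut2]    # cloud: the compute-heavy block
    tail = model[cut2:]          # local: produces the raw output
    return head, middle, tail

# Toy usage: only hidden states travel to/from the "cloud" sub-model.
net = nn.Sequential(*[nn.Linear(64, 64) for _ in range(6)])
head, middle, tail = lambda_split(net, 1, 5)
```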

Spiking mode-based neural networks

  • paper_url: http://arxiv.org/abs/2310.14621
  • repo_url: https://github.com/linzhanghan/smnn
  • paper_authors: Zhanghan Lin, Haiping Huang
  • for: 提出一种基于模式(mode)的训练协议,以降低大规模脉冲神经网络的训练成本,并解决训练后任务信息全部隐藏在权重矩阵中、难以透明理解回路机制的问题。
  • methods: 该协议用输入模式、输出模式及其重要性分数来解释权重,模式数量可调,为拟合实验数据提供更多自由度;同时可将高维神经活动投影到通常低维的模式空间,显著降低学习空间的维度与训练开销。
  • results: 在数字分类和选择性感觉整合两个计算任务上分析了该框架,并导出了脉冲神经网络的基于模式的学习规则。
    Abstract Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits. One drawback of training a large scale spiking neural network is that an expensive cost of updating all weights is required. Furthermore, after training, all information related to the computational task is hidden into the weight matrix, prohibiting us from a transparent understanding of circuit mechanisms. Therefore, in this work, we address these challenges by proposing a spiking mode-based training protocol. The first advantage is that the weight is interpreted by input and output modes and their associated scores characterizing importance of each decomposition term. The number of modes is thus adjustable, allowing more degrees of freedom for modeling the experimental data. This reduces a sizable training cost because of significantly reduced space complexity for learning. The second advantage is that one can project the high dimensional neural activity in the ambient space onto the mode space which is typically of a low dimension, e.g., a few modes are sufficient to capture the shape of the underlying neural manifolds. We analyze our framework in two computational tasks -- digit classification and selective sensory integration tasks. Our work thus derives a mode-based learning rule for spiking neural networks.
    摘要 脉冲神经网络在类脑神经形态计算以及神经回路工作机制研究中发挥着重要作用。训练大规模脉冲神经网络的一个缺点是更新全部权重的代价高昂;而且训练结束后,与计算任务相关的所有信息都隐藏在权重矩阵中,使我们无法透明地理解回路机制。因此,本工作提出一种基于脉冲模式的训练协议来应对这些挑战。其第一个优点是:权重由输入模式、输出模式及刻画各分解项重要性的分数来解释,模式数量因而可调,为拟合实验数据提供更多自由度;由于学习的空间复杂度大幅降低,训练成本也显著减少。第二个优点是:可以把环境空间中的高维神经活动投影到通常低维的模式空间上,例如少数几个模式就足以刻画底层神经流形的形状。我们在数字分类和选择性感觉整合两个计算任务上分析了该框架,从而导出了脉冲神经网络的基于模式的学习规则。
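
The mode decomposition itself is simple to sketch: parametrize a weight matrix as a sum of rank-one input/output mode pairs with per-mode scores, so training operates in the low-dimensional mode space. The spiking dynamics and the paper's exact learning rule are not reproduced here.

```python
import torch
import torch.nn as nn

class ModeBasedLinear(nn.Module):
    """Sketch: W = sum_p s_p * u_p v_p^T with P trainable modes."""

    def __init__(self, n_in: int, n_out: int, n_modes: int):
        super().__init__()
        self.u = nn.Parameter(torch.randn(n_out, n_modes) / n_out ** 0.5)  # output modes
        self.v = nn.Parameter(torch.randn(n_in, n_modes) / n_in ** 0.5)    # input modes
        self.s = nn.Parameter(torch.ones(n_modes))                          # importance scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project activity onto input modes, scale, expand via output modes.
        return (x @ self.v) * self.s @ self.u.T
```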

Prefix-Tuning Based Unsupervised Text Style Transfer

  • paper_url: http://arxiv.org/abs/2310.14599
  • repo_url: None
  • paper_authors: Huiyu Mai, Wenhao Jiang, Zhihong Deng
  • for: 这篇论文主要研究无监督文本风格迁移,即在不使用任何平行数据的情况下,训练一个能够在保留内容的同时改变输入句子风格的生成模型。
  • methods: 本文利用强大的预训练大语言模型,提出一种新的基于前缀微调的无监督文本风格迁移方法:构造共享前缀、风格前缀和内容前缀三种前缀,分别编码任务特定信息、目标风格和输入句子的内容信息。与以往工作使用的嵌入相比,这些前缀能为模型提供更丰富的信息。此外,本文在风格迁移过程中采用递归的语言模型使用方式,使输入句子与 GPT-2 的交互更有效,帮助模型构造更有信息量的前缀,从而提升性能。
  • results: 在知名数据集上的评估表明,该方法优于最优基线;文中还提供了消融研究分析和人工主观评估,以便更深入地理解所提方法。
    Abstract Unsupervised text style transfer aims at training a generative model that can alter the style of the input sentence while preserving its content without using any parallel data. In this paper, we employ powerful pre-trained large language models and present a new prefix-tuning-based method for unsupervised text style transfer. We construct three different kinds of prefixes, i.e., \textit{shared prefix, style prefix}, and \textit{content prefix}, to encode task-specific information, target style, and the content information of the input sentence, respectively. Compared to embeddings used by previous works, the proposed prefixes can provide richer information for the model. Furthermore, we adopt a recursive way of using language models in the process of style transfer. This strategy provides a more effective way for the interactions between the input sentence and GPT-2, helps the model construct more informative prefixes, and thus, helps improve the performance. Evaluations on the well-known datasets show that our method outperforms the state-of-the-art baselines. Results, analysis of ablation studies, and subjective evaluations from humans are also provided for a deeper understanding of the proposed method.
    摘要 无监督文本风格迁移旨在不使用任何平行数据的情况下,训练一个能在保留内容的同时改变输入句子风格的生成模型。本文利用强大的预训练大语言模型,提出一种新的基于前缀微调的无监督文本风格迁移方法。我们构造了三种前缀,即共享前缀、风格前缀和内容前缀,分别编码任务特定信息、目标风格和输入句子的内容信息。与以往工作使用的嵌入相比,所提前缀能为模型提供更丰富的信息。此外,我们在风格迁移过程中采用递归的语言模型使用方式。这种策略使输入句子与 GPT-2 之间的交互更为有效,帮助模型构造更有信息量的前缀,从而提升性能。在知名数据集上的评估表明,我们的方法优于最优基线。文中还给出了结果、消融研究分析和人工主观评估,以便更深入地理解所提方法。
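
A simplified sketch of the prefix mechanism: learnable embeddings prepended to the token embeddings before a frozen GPT-2 forward pass. Full prefix-tuning usually injects prefixes into every attention layer's key/value states, so this input-level variant (with invented prefix lengths) is an approximation of the idea, not the paper's implementation.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False  # keep the pretrained LM frozen

n_prefix, dim = 10, model.config.n_embd
shared_prefix = torch.nn.Parameter(torch.randn(1, n_prefix, dim) * 0.02)
style_prefix = torch.nn.Parameter(torch.randn(1, n_prefix, dim) * 0.02)

ids = tok("the food was terrible", return_tensors="pt").input_ids
tok_emb = model.transformer.wte(ids)   # content carried by token embeddings
inputs = torch.cat([shared_prefix, style_prefix, tok_emb], dim=1)
out = model(inputs_embeds=inputs)      # optimize only the prefixes
```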

Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

  • paper_url: http://arxiv.org/abs/2310.14596
  • repo_url: https://github.com/mhtang1995/cppt
  • paper_authors: Minghao Tang, Yongquan He, Yongxiu Xu, Hongbo Xu, Wenyuan Zhang, Yang Lin
  • for: 本研究旨在解决细粒度实体类型标注(FET)中的噪声标签问题:现有方法依靠估计噪声分布来识别噪声标签,但容易被多样的噪声分布偏差干扰。
  • methods: 提出基于协同预测提示微调的噪声纠正方法:整合多个预测结果以召回已标注标签,并利用差异化间隔(margin)识别不准确的标签;此外,针对微调过程中相异的协同预测设计了优化目标,确保模型捕捉足够信息并在噪声识别中保持鲁棒性。
  • results: 在三个常用 FET 数据集上的实验表明,该噪声纠正方法显著提升了多种训练样本的质量,包括通过远程监督、ChatGPT 和众包标注的样本。
    Abstract Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely-used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing.
    摘要 细粒度实体类型标注(FET)是自然语言处理中的一项重要任务,目标是为文本中的实体赋予语义类型。然而,FET 面临一个主要挑战,即噪声标签问题:现有方法依靠估计噪声分布来识别噪声标签,但容易被多样的噪声分布偏差所迷惑。为了突破这一限制,我们提出用于 FET 噪声纠正的协同预测提示微调方法,利用多个预测结果来识别并纠正噪声标签。具体而言,我们整合多个预测结果以召回已标注标签,并利用差异化间隔识别不准确的标签;此外,我们针对微调过程中相异的协同预测设计了优化目标,确保模型捕捉足够信息并在噪声识别中保持鲁棒性。在三个常用 FET 数据集上的实验结果表明,我们的噪声纠正方法显著提升了多种训练样本的质量,包括通过远程监督、ChatGPT 和众包标注的样本。

Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

  • paper_url: http://arxiv.org/abs/2310.14581
  • repo_url: None
  • paper_authors: Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka
  • for: 本研究的目的是提出一种用于DataComp挑战的数据处理方法,以提高数据质量并提高模型的泛化能力。
  • methods: 本研究使用大型多模态模型 CLIP 和 BLIP-2 来筛选和修改网络爬取数据,同时利用外部数据集和一系列技巧来提高数据质量。
  • results: 实验表明,我们的解决方案在 DataComp 挑战的 filtering track 和 BYOD track 上都显著优于基线(filtering track 提升 6.6%,BYOD track 提升 48.5%)。
    Abstract Large web crawl datasets have already played an important role in learning multimodal features with high generalization capabilities. However, there are still very limited studies investigating the details or improvements of data design. Recently, a DataComp challenge has been designed to propose the best training data with the fixed models. This paper presents our solution to both filtering track and BYOD track of the DataComp challenge. Our solution adopts large multimodal models CLIP and BLIP-2 to filter and modify web crawl data, and utilize external datasets along with a bag of tricks to improve the data quality. Experiments show our solution significantly outperforms DataComp baselines (filtering track: 6.6% improvement, BYOD track: 48.5% improvement).
    摘要 大规模网络爬取数据集在学习具有高泛化能力的多模态特征方面已经发挥了重要作用,但关于数据设计细节或改进方法的研究仍然十分有限。近期,DataComp 挑战赛应运而生,要求在固定模型下提出最佳的训练数据。本文介绍我们针对 DataComp 挑战 filtering track 与 BYOD track 的解决方案:采用大型多模态模型 CLIP 和 BLIP-2 对网络爬取数据进行过滤和修改,并利用外部数据集及一系列技巧提升数据质量。实验表明,我们的方案显著优于 DataComp 基线(filtering track 提升 6.6%,BYOD track 提升 48.5%)。
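
CLIP-score filtering, the backbone of such pipelines, can be sketched with Hugging Face `transformers`: keep image-caption pairs whose CLIP cosine similarity clears a threshold. The threshold value is illustrative, and the team's full solution also rewrites captions with BLIP-2, which is not shown.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, caption: str) -> float:
    batch = proc(text=[caption], images=image, return_tensors="pt",
                 padding=True, truncation=True)
    out = model(**batch)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())  # cosine similarity

def keep(image, caption, threshold=0.28):
    return clip_score(image, caption) >= threshold
```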

FedSplitX: Federated Split Learning for Computationally-Constrained Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2310.14579
  • repo_url: None
  • paper_authors: Jiyun Shin, Jinhyun Ahn, Honggu Kang, Joonhyuk Kang
  • for: 这篇论文提出名为 FedSplitX 的联邦分割学习框架,以解决联邦学习中客户端计算能力不均(系统异构)的问题。
  • methods: FedSplitX 在多个切分点将大模型拆分为客户端侧和服务器侧两部分,以适配不同客户端的计算能力,并在每个切分点引入辅助网络,在降低通信开销与时延的同时提升模型性能。
  • results: 实验结果显示,FedSplitX 能有效利用服务器的计算能力训练大模型,性能优于基线方法。
    Abstract Foundation models (FMs) have demonstrated remarkable performance in machine learning but demand extensive training data and computational resources. Federated learning (FL) addresses the challenges posed by FMs, especially related to data privacy and computational burdens. However, FL on FMs faces challenges in situations with heterogeneous clients possessing varying computing capabilities, as clients with limited capabilities may struggle to train the computationally intensive FMs. To address these challenges, we propose FedSplitX, a novel FL framework that tackles system heterogeneity. FedSplitX splits a large model into client-side and server-side components at multiple partition points to accommodate diverse client capabilities. This approach enables clients to collaborate while leveraging the server's computational power, leading to improved model performance compared to baselines that limit model size to meet the requirement of the poorest client. Furthermore, FedSplitX incorporates auxiliary networks at each partition point to reduce communication costs and delays while enhancing model performance. Our experiments demonstrate that FedSplitX effectively utilizes server capabilities to train large models, outperforming baseline approaches.

Unveiling the Multi-Annotation Process: Examining the Influence of Annotation Quantity and Instance Difficulty on Model Performance

  • paper_url: http://arxiv.org/abs/2310.14572
  • repo_url: None
  • paper_authors: Pritam Kadasi, Mayank Singh
  • for: This paper aims to investigate the impact of multi-annotator datasets on NLP model performance.
  • methods: The authors propose a novel multi-annotator simulation process to generate datasets with varying annotation budgets.
  • results: The study shows that similar datasets with the same annotation budget can lead to varying performance gains, challenging the popular belief that multi-annotation datasets always lead to better performance.
    Abstract The NLP community has long advocated for the construction of multi-annotator datasets to better capture the nuances of language interpretation, subjectivity, and ambiguity. This paper conducts a retrospective study to show how performance scores can vary when a dataset expands from a single annotation per instance to multiple annotations. We propose a novel multi-annotator simulation process to generate datasets with varying annotation budgets. We show that similar datasets with the same annotation budget can lead to varying performance gains. Our findings challenge the popular belief that models trained on multi-annotation examples always lead to better performance than models trained on single or few-annotation examples.
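
A toy version of such a simulation, with invented noise rates and a uniform budget split, might look like this:

```python
# Toy multi-annotator simulation under a fixed annotation budget.
import random

def simulate(n_instances=1000, budget=3000, noise=0.2, seed=0):
    """Spend `budget` annotations over `n_instances` items, then majority-vote."""
    rng = random.Random(seed)
    gold = [rng.randint(0, 1) for _ in range(n_instances)]
    per_item = budget // n_instances  # uniform split; other allocations possible
    labels = []
    for y in gold:
        votes = [y if rng.random() > noise else 1 - y for _ in range(per_item)]
        labels.append(max(set(votes), key=votes.count))  # majority vote
    return sum(int(a == b) for a, b in zip(labels, gold)) / n_instances

# Aggregated label quality as the budget grows:
print(simulate(budget=1000), simulate(budget=3000), simulate(budget=5000))
```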

Meaning Representations from Trajectories in Autoregressive Models

  • paper_url: http://arxiv.org/abs/2310.18348
  • repo_url: None
  • paper_authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
  • for: This work extracts meaning representations from autoregressive language models without prompting or fine-tuning, in a way applicable to any pre-trained autoregressive model.
  • methods: A distribution-based representation is obtained by considering the distribution of all possible trajectories extending an input text. Unlike vector-based representations, it can model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) via algebraic operations between likelihood functions.
  • results: Experiments show that representations from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can solve entailment and containment tasks that standard embeddings cannot handle. The method also extends to other modalities (e.g., image and text) via multimodal autoregressive models.
    Abstract We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) by using algebraic operations between likelihood functions. These ideas are grounded in distributional perspectives on semantics and are connected to standard constructions in automata theory, but to our knowledge they have not been applied to modern language models. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Finally, we extend our method to represent data from different modalities (e.g., image and text) using multimodal autoregressive models.
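
As a simplified, assumed reading of the trajectory idea (scoring a small shared set of continuations rather than the full distribution; the continuation set and model are illustrative), one could compare texts by their continuation likelihood profiles:

```python
# Compare texts by the likelihoods a causal LM assigns to shared continuations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def logprob_of_continuation(prefix: str, cont: str) -> float:
    ids = tok(prefix + cont, return_tensors="pt").input_ids
    n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = lm(ids).logits.log_softmax(-1)
    # sum log p(token | preceding tokens) over continuation tokens only
    lp = 0.0
    for t in range(n_prefix, ids.shape[1]):
        lp += logits[0, t - 1, ids[0, t]].item()
    return lp

CONTS = [" is an animal.", " is a vehicle.", " can fly.", " lives in water."]

def represent(text: str) -> torch.Tensor:
    return torch.tensor([logprob_of_continuation(text, c) for c in CONTS])

a, b = represent("A sparrow"), represent("An eagle")
sim = torch.cosine_similarity(a, b, dim=0)  # higher = more similar profile
```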

AlpaCare: Instruction-tuned Large Language Models for Medical Application

  • paper_url: http://arxiv.org/abs/2310.14558
  • repo_url: https://github.com/xzhang97666/alpacare
  • paper_authors: Xinlu Zhang, Chenxin Tian, Xianjun Yang, Lichang Chen, Zekun Li, Linda Ruth Petzold
  • for: This paper aims to strengthen the instruction-following ability of large language models (LLMs) in the medical domain while preserving their performance across general tasks.
  • methods: LLaMA-series models are fine-tuned on MedInstruct-52k, a set of 52k diverse, machine-generated medical instruction-following examples.
  • results: On free-form instruction evaluations, AlpaCare shows stronger medical proficiency and better generalizability than previous instruction-tuned models in both the medical and general domains.
    Abstract Large Language Models (LLMs) have demonstrated significant enhancements in instruction-following abilities through instruction tuning, achieving notable performances across various tasks. Previous research has focused on fine-tuning medical domain-specific LLMs using an extensive array of medical-specific data, incorporating millions of pieces of biomedical literature to augment their medical capabilities. However, existing medical instruction-tuned LLMs have been constrained by the limited scope of tasks and instructions available, restricting the efficacy of instruction tuning and adversely affecting performance in the general domain. In this paper, we fine-tune LLaMA-series models using 52k diverse, machine-generated, medical instruction-following data, MedInstruct-52k, resulting in the model AlpaCare. Comprehensive experimental results on both general and medical-specific domain free-form instruction evaluations showcase AlpaCare's strong medical proficiency and generalizability compared to previous instruction-tuned models in both medical and general domains. We provide public access to our MedInstruct-52k dataset and a clinician-crafted free-form instruction test set, MedInstruct-test, along with our codebase, to foster further research and development. Our project page is available at https://github.com/XZhang97666/AlpaCare.

Making RL with Preference-based Feedback Efficient via Randomization

  • paper_url: http://arxiv.org/abs/2310.14554
  • repo_url: None
  • paper_authors: Runzhe Wu, Wen Sun
  • for: This paper develops efficient reinforcement learning algorithms that learn from human feedback given as preferences over pairs of trajectories.
  • methods: Randomization is used in algorithm design, combined with a novel randomized active learning procedure, to achieve statistical and computational efficiency while minimizing query complexity.
  • results: In the linear MDP model, the algorithm attains a near-optimal worst-case regret bound with polynomial running time and a near-optimal tradeoff between regret and query complexity. For general nonlinear function approximation, a model-based randomized algorithm inspired by Thompson sampling achieves a near-optimal tradeoff between the Bayesian regret bound and query complexity.
    Abstract Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity. In this work, we consider the RLHF setting where the feedback is given in the format of preferences over pairs of trajectories. In the linear MDP model, by using randomization in algorithm design, we present an algorithm that is sample efficient (i.e., has near-optimal worst-case regret bounds) and has polynomial running time (i.e., computational complexity is polynomial with respect to relevant parameters). Our algorithm further minimizes the query complexity through a novel randomized active learning procedure. In particular, our algorithm demonstrates a near-optimal tradeoff between the regret bound and the query complexity. To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling. Our algorithm minimizes Bayesian regret bound and query complexity, again achieving a near-optimal tradeoff between these two quantities. Computation-wise, similar to the prior Thompson sampling algorithms under the regular RL setting, the main computation primitives of our algorithm are Bayesian supervised learning oracles which have been heavily investigated on the empirical side when applying Thompson sampling algorithms to RL benchmark problems.
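
The preference-over-trajectory-pairs feedback model can be made concrete with a Bradley-Terry-style logistic link; the sketch below fits a linear reward from pairwise preferences and deliberately omits the paper's randomized exploration and active querying:

```python
# Fit a linear reward model from trajectory-pair preferences (Bradley-Terry).
import numpy as np

def fit_reward(features_a, features_b, prefs, lr=0.1, steps=500):
    """features_*: (n, d) trajectory features; prefs[i]=1 if a_i preferred."""
    theta = np.zeros(features_a.shape[1])
    for _ in range(steps):
        diff = (features_a - features_b) @ theta          # score gap per pair
        p = 1.0 / (1.0 + np.exp(-diff))                   # P(a preferred | theta)
        grad = (features_a - features_b).T @ (prefs - p)  # logistic log-lik gradient
        theta += lr * grad / len(prefs)
    return theta

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(200, 4)), rng.normal(size=(200, 4))
true_w = np.array([1.0, -0.5, 0.0, 2.0])
prefs = (fa @ true_w > fb @ true_w).astype(float)  # noiseless preferences
theta = fit_reward(fa, fb, prefs)                  # recovers true_w's direction
```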

Denoising Opponents Position in Partial Observation Environment

  • paper_url: http://arxiv.org/abs/2310.14553
  • repo_url: None
  • paper_authors: Aref Sayareh, Aria Sardari, Vahid Khoddami, Nader Zare, Vinicius Prado da Fonseca, Amilcar Soares
  • for: This work aims to improve decision-making in Soccer Simulation 2D by predicting opponent positions with machine learning, enabling more accurate actions such as passing.
  • methods: Long Short-Term Memory models (LSTM) and Deep Neural Networks (DNN) are used to predict opponent positions and are compared against standard algorithms.
  • results: Both the LSTM and the DNN predict opponent positions more accurately than standard algorithms such as the last-seen method.
    Abstract The RoboCup competitions hold various leagues, and the Soccer Simulation 2D League is a major among them. Soccer Simulation 2D (SS2D) match involves two teams, including 11 players and a coach for each team, competing against each other. The players can only communicate with the Soccer Simulation Server during the game. Several code bases are released publicly to simplify team development. So researchers can easily focus on decision-making and implementing machine learning methods. SS2D actions and behaviors are only partially accurate due to different challenges, such as noise and partial observation. Therefore, one strategy is to implement alternative denoising methods to tackle observation inaccuracy. Our idea is to predict opponent positions while they have yet to be seen in a finite number of cycles using machine learning methods to make more accurate actions such as pass. We will explain our position prediction idea powered by Long Short-Term Memory models (LSTM) and Deep Neural Networks (DNN). The results show that the LSTM and DNN predict the opponents' position more accurately than the standard algorithm, such as the last-seen method.
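
A minimal sketch of an LSTM position predictor (layer sizes and sequence length are illustrative, not the paper's configuration):

```python
# Predict an opponent's next (x, y) from the last k noisy observations.
import torch
import torch.nn as nn

class OpponentLSTM(nn.Module):
    def __init__(self, obs_dim=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # predicted (x, y)

    def forward(self, seq):               # seq: (batch, k, 2)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])      # use the last hidden state

model = OpponentLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seq = torch.randn(32, 10, 2)              # 10 past (noisy) positions
target = torch.randn(32, 2)               # true next position
opt.zero_grad()
loss = nn.functional.mse_loss(model(seq), target)
loss.backward(); opt.step()
```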

Evaluating Spatial Understanding of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14540
  • repo_url: None
  • paper_authors: Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim
  • for: This paper studies whether large language models (LLMs) implicitly capture knowledge of spatial structure.
  • methods: Natural-language navigation tasks are designed to evaluate how LLMs (GPT-3.5-turbo, GPT-4, and the Llama2 series) represent and reason about spatial structures, with human performance on the same tasks as a reference.
  • results: LLM performance varies substantially across spatial structures, including square, hexagonal, and triangular grids, rings, and trees. Like humans, LLMs use object names as landmarks for maintaining spatial maps, and error analysis shows their mistakes reflect both spatial and non-spatial factors. LLMs thus appear to capture certain aspects of spatial structure implicitly, with clear room for improvement.
    Abstract Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures, and compare these abilities to human performance on the same tasks. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. We also discover that, similar to humans, LLMs utilize object names as landmarks for maintaining spatial maps. Finally, in extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.
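
Tasks of this kind are easy to generate programmatically; the sketch below builds a small square-grid navigation prompt with a gold answer (the wording and object list are invented, not the paper's exact tasks):

```python
# Generate a 3x3 square-grid navigation task with a gold answer.
import random

def make_square_grid_task(n=3, seed=0):
    rng = random.Random(seed)
    objects = rng.sample(["book", "lamp", "cup", "key", "coin",
                          "pen", "map", "bell", "ring"], n * n)
    moves, r, c = [], 0, 0
    for _ in range(4):                       # a short random walk
        dr, dc, name = rng.choice([(0, 1, "right"), (1, 0, "down")])
        if 0 <= r + dr < n and 0 <= c + dc < n:
            r, c = r + dr, c + dc
            moves.append(name)
    prompt = (f"You are on a {n}x{n} grid of objects, starting at the "
              f"top-left corner, which holds a {objects[0]}. "
              + " ".join(f"You move one step {m}." for m in moves)
              + " What object is at your current location?")
    return prompt, objects[r * n + c]        # (task text, gold answer)

prompt, answer = make_square_grid_task()     # feed `prompt` to an LLM, grade vs. `answer`
```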

Context-Aware Prediction of User Engagement on Online Social Platforms

  • paper_url: http://arxiv.org/abs/2310.14533
  • repo_url: None
  • paper_authors: Heinrich Peters, Yozen Liu, Francesco Barbieri, Raiyan A. Baten, Sandra C. Matz, Maarten W. Bos
  • for: This study aims to predict and understand user engagement on online social platforms through holistic yet lightweight, and potentially privacy-preserving, context-aware models.
  • methods: Deep LSTM neural networks analyze more than 100 million Snapchat sessions from almost 80,000 users, predicting patterns of active and passive use from past behavior and momentary context features.
  • results: Integrating context information (smartphone connectivity status, location, temporal context, and weather) substantially improves predictive performance over a behavioral baseline (R2=0.522 vs. R2=0.345), and a large proportion of variance can be accounted for with minimal behavioral histories when momentary context is considered (R2=0.44).
    Abstract The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80.000 users, we demonstrate that patterns of active and passive use are predictable from past behavior (R2=0.345) and that the integration of context information substantially improves predictive performance compared to the behavioral baseline model (R2=0.522). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context information is considered (R2=0.44). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, underscoring the value of contextualized representations of user behavior for predicting user engagement on social platforms.
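
As a sketch of the context-fusion idea (feature names and sizes are invented, not the study's actual feature set or architecture), one can concatenate momentary context with an LSTM summary of recent in-app behavior:

```python
# Fuse momentary context with a short behavioral history for engagement prediction.
import torch
import torch.nn as nn

class EngagementModel(nn.Module):
    def __init__(self, behav_dim=8, ctx_dim=6, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(behav_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + ctx_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, history, context):
        # history: (batch, t, behav_dim) recent in-app actions
        # context: (batch, ctx_dim), e.g., connectivity, location type,
        #          hour-of-day, and weather encodings
        _, (h, _) = self.lstm(history)
        return self.head(torch.cat([h[-1], context], dim=-1)).squeeze(-1)

model = EngagementModel()
pred = model(torch.randn(16, 20, 8), torch.randn(16, 6))  # engagement scores
```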

Towards Zero Shot Learning in Restless Multi-armed Bandits

  • paper_url: http://arxiv.org/abs/2310.14526
  • repo_url: None
  • paper_authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
  • for: To solve restless multi-armed bandit (RMAB) problems more efficiently and adaptively, including settings where arms opt in and out over time.
  • methods: A neural network-based pre-trained model, PreFeRMAB, provides general zero-shot ability on a wide range of unseen RMABs and can be fine-tuned on specific instances more sample-efficiently than retraining from scratch; it accommodates continuous states and general multi-action settings.
  • results: A new update rule for a crucial λ-network is derived with theoretical convergence guarantees, and experiments on challenging, real-world inspired problems demonstrate the advantages of the approach.
    Abstract Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.

Do We Really Need Contrastive Learning for Graph Representation?

  • paper_url: http://arxiv.org/abs/2310.14525
  • repo_url: None
  • paper_authors: Yulan Hu, Sheng Ouyang, Jingyu Liu, Ge Chen, Zhirui Yang, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Yong Liu
  • for: This work proposes a simple yet effective graph learning model that avoids the heavy computational cost of contrastive learning on large graphs while preserving embedding quality.
  • methods: Rank learning is applied: two graph views are generated through corruption, similarities between corresponding (anchor, positive) nodes are computed, and a rank-based objective scores each anchor against a single arbitrary negative node, relieving the false-negative problem and reducing time complexity from O(N^2) to O(N).
  • results: Extensive experiments across multiple graph tasks show that GraphRank performs favorably against other cutting-edge GCL methods.
    Abstract In recent years, contrastive learning has emerged as a dominant self-supervised paradigm, attracting numerous research interests in the field of graph learning. Graph contrastive learning (GCL) aims to embed augmented anchor samples close to each other while pushing the embeddings of other samples (negative samples) apart. However, existing GCL methods require large and diverse negative samples to ensure the quality of embeddings, and recent studies typically leverage samples excluding the anchor and positive samples as negative samples, potentially introducing false negative samples (negatives that share the same class as the anchor). Additionally, this practice can result in heavy computational burden and high time complexity of $O(N^2)$, which is particularly unaffordable for large graphs. To address these deficiencies, we leverage rank learning and propose a simple yet effective model, GraphRank. Specifically, we first generate two graph views through corruption. Then, we compute the similarity of pairwise nodes (anchor node and positive node) in both views, an arbitrary node in the latter view is selected as a negative node, and its similarity with the anchor node is computed. Based on this, we introduce rank-based learning to measure similarity scores which successfully relieve the false negative provlem and decreases the time complexity from $O(N^2)$ to $O(N)$. Moreover, we conducted extensive experiments across multiple graph tasks, demonstrating that GraphRank performs favorably against other cutting-edge GCL methods in various tasks.
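
The O(N) rank-based objective can be sketched with one random negative per anchor and a margin ranking loss (the corruption step and encoder are placeholders, not the paper's code):

```python
# Rank-based graph objective: one negative per anchor keeps the cost O(N).
import torch
import torch.nn.functional as F

def graphrank_loss(z1, z2, margin=0.5):
    """z1, z2: (N, d) node embeddings from two corrupted graph views."""
    perm = torch.randperm(z1.size(0))
    pos = F.cosine_similarity(z1, z2)         # same node across views
    neg = F.cosine_similarity(z1, z2[perm])   # one random node as negative
    # rank positives above negatives by at least `margin`
    return F.relu(margin - pos + neg).mean()

z1, z2 = torch.randn(1024, 64), torch.randn(1024, 64)  # encoder outputs
loss = graphrank_loss(z1, z2)
```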

CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

  • paper_url: http://arxiv.org/abs/2310.14512
  • repo_url: https://github.com/jsksxs360/prompt-event-coref-emnlp2023
  • paper_authors: Sheng Xu, Peifeng Li, Qiaoming Zhu
  • for: This paper addresses event coreference resolution (ECR), which groups event mentions referring to the same real-world event into clusters.
  • methods: A prompt-based method, CorefPrompt, transforms ECR into a cloze-style masked language model (MLM) task, enabling simultaneous event modeling and coreference discrimination within a single template with a fully shared context. Two auxiliary prompt tasks, event-type compatibility and argument compatibility, explicitly expose the reasoning process of ECR.
  • results: Experimental results show that CorefPrompt performs well on a state-of-the-art (SOTA) benchmark.
    Abstract Event coreference resolution (ECR) aims to group event mentions referring to the same real-world event into clusters. Most previous studies adopt the "encoding first, then scoring" framework, making the coreference judgment rely on event encoding. Furthermore, current methods struggle to leverage human-summarized ECR rules, e.g., coreferential events should have the same event type, to guide the model. To address these two issues, we propose a prompt-based approach, CorefPrompt, to transform ECR into a cloze-style MLM (masked language model) task. This allows for simultaneous event modeling and coreference discrimination within a single template, with a fully shared context. In addition, we introduce two auxiliary prompt tasks, event-type compatibility and argument compatibility, to explicitly demonstrate the reasoning process of ECR, which helps the model make final predictions. Experimental results show that our method CorefPrompt performs well in a state-of-the-art (SOTA) benchmark.
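
To illustrate the cloze framing (with an invented template and verbalizers; the paper's template and auxiliary tasks are richer), a coreference decision can be read off a masked-LM fill:

```python
# Cast a pairwise event coreference decision as a cloze over a template.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

def coref_cloze(sent1: str, sent2: str) -> float:
    template = (f"{sent1} {sent2} The two events refer to the "
                f"{fill.tokenizer.mask_token} event.")
    scores = {r["token_str"].strip(): r["score"]
              for r in fill(template, targets=[" same", " different"])}
    return scores.get("same", 0.0)  # higher = more likely coreferent

p = coref_cloze("A bomb exploded downtown on Friday.",
                "The blast injured three people.")
```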

Iteratively Learn Diverse Strategies with State Distance Information

  • paper_url: http://arxiv.org/abs/2310.14509
  • repo_url: None
  • paper_authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
  • for: In complex reinforcement learning (RL) problems, policies with similar rewards can behave very differently; the goal is to optimize rewards while discovering as many diverse strategies as possible.
  • methods: Two design choices are examined: the diversity measure and the computation framework. Because existing diversity measures can assign high scores to visually indistinguishable policies, state-space distance information is incorporated into the measure. Between the two common computation frameworks, population-based training (PBT) and iterative learning (ITR), PBT is the precise problem formulation, but ITR achieves comparable diversity scores with higher computational efficiency and thus better practical solution quality.
  • results: Combining ITR with two tractable realizations of the state-distance-based diversity measure yields a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. Across three domains, from robot locomotion to multi-agent games, SIPO consistently produces strategically diverse, human-interpretable policies that existing baselines fail to discover.
    Abstract In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computation efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
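
One tractable way to realize a state-distance diversity bonus, sketched here under assumptions (a Euclidean state metric and a simple nearest-neighbor archive), is to reward a new policy for visiting states far from those of previously discovered policies:

```python
# Intrinsic reward: distance to the nearest state visited by earlier policies.
import numpy as np

def intrinsic_reward(state, archive_states, weight=0.1):
    """Pay the new policy for reaching states far from the archive."""
    if len(archive_states) == 0:
        return 0.0
    d = np.linalg.norm(np.asarray(archive_states) - state, axis=1)
    return weight * float(d.min())

# Iterative scheme: after training policy i, add its visited states to the
# archive, then train policy i+1 on task_reward + intrinsic_reward.
archive = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
r = intrinsic_reward(np.array([3.0, 4.0]), archive)  # rewards a distinct state
```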

Counterfactual Explanation Generation with s(CASP)

  • paper_url: http://arxiv.org/abs/2310.14497
  • repo_url: None
  • paper_authors: Sopam Dasgupta, Farhad Shakerin, Joaquín Arias, Elmer Salazar, Gopal Gupta
  • for: This paper focuses on the problem of automatically generating counterfactual explanations to provide justifications for decision-making models.
  • methods: The approach used in this paper is based on answer set programming (ASP) and the s(CASP) goal-directed ASP system. The query-driven nature of s(CASP) allows for the generation of counterfactual explanations as proof trees.
  • results: The paper shows how counterfactual explanations can be computed and justified by imagining multiple possible worlds where some or all factual assumptions are untrue, and how the algorithm can be used to find the Craig Interpolant for a class of answer set programs for a failing query.
    Abstract Machine learning models that automate decision-making are increasingly being used in consequential areas such as loan approvals, pretrial bail, hiring, and many more. Unfortunately, most of these models are black-boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might desire explanations to understand why a decision was made. Ethical and legal considerations may further require informing the individual of changes in the input attribute that could be made to produce a desirable outcome. This paper focuses on the latter problem of automatically generating counterfactual explanations. Our approach utilizes answer set programming and the s(CASP) goal-directed ASP system. Answer Set Programming (ASP) is a well-known knowledge representation and reasoning paradigm. s(CASP) is a goal-directed ASP system that executes answer-set programs top-down without grounding them. The query-driven nature of s(CASP) allows us to provide justifications as proof trees, which makes it possible to analyze the generated counterfactual explanations. We show how counterfactual explanations are computed and justified by imagining multiple possible worlds where some or all factual assumptions are untrue and, more importantly, how we can navigate between these worlds. We also show how our algorithm can be used to find the Craig Interpolant for a class of answer set programs for a failing query.
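
For intuition, here is a brute-force stand-in in plain Python (the paper instead derives counterfactuals as s(CASP) proof trees over answer set programs; the decision rule and attribute domains below are toys):

```python
# Enumerate counterfactuals: attribute changes that flip a denial to approval.
from itertools import product

DOMAINS = {"income": ["low", "mid", "high"], "history": ["bad", "good"]}

def approve(x):                       # a toy black-box decision rule
    return x["income"] != "low" and x["history"] == "good"

def counterfactuals(x):
    """All alternative worlds in which the decision flips to approval."""
    out = []
    for vals in product(*DOMAINS.values()):
        cand = dict(zip(DOMAINS, vals))
        changed = {k for k in cand if cand[k] != x[k]}
        if changed and approve(cand):
            out.append((changed, cand))
    return sorted(out, key=lambda t: len(t[0]))  # fewest changes first

applicant = {"income": "low", "history": "bad"}
for changed, world in counterfactuals(applicant):
    print("change", changed, "->", world)
```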

InstructExcel: A Benchmark for Natural Language Instruction in Excel

  • paper_url: http://arxiv.org/abs/2310.14495
  • repo_url: None
  • paper_authors: Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri
  • for: investigate whether Large Language Models (LLMs) can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel specific tasks provided via natural language user instructions.
  • methods: introduce a new large-scale benchmark, InstructExcel, created by leveraging the ‘Automate’ feature in Excel to automatically generate OfficeScripts from users’ actions.
  • results: observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
    Abstract With the evolution of Large Language Models (LLMs) we can solve increasingly more complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel specific tasks provided via natural language user instructions. To do so we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that InstructExcel is a hard benchmark for state of the art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
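
A few-shot prompt for this task might be assembled as below; the instruction/script pair is invented, and InstructExcel's actual schema also includes the spreadsheet context:

```python
# Build a few-shot prompt asking an LLM for an Excel Office Script.
FEW_SHOT = [
    ("Bold the header row",
     'function main(wb: ExcelScript.Workbook) {\n'
     '  wb.getActiveWorksheet().getRange("1:1")'
     '.getFormat().getFont().setBold(true);\n}'),
]

def build_prompt(instruction: str) -> str:
    parts = ["Generate an Excel Office Script (TypeScript) for each instruction.\n"]
    for instr, script in FEW_SHOT:
        parts.append(f"Instruction: {instr}\nScript:\n{script}\n")
    parts.append(f"Instruction: {instruction}\nScript:\n")
    return "\n".join(parts)

prompt = build_prompt("Set the width of column B to 120")
# send `prompt` to the model of choice (e.g., GPT-4) and execute the reply
```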

Robotic Arm Manipulation to Perform Rock Skipping in Simulation

  • paper_url: http://arxiv.org/abs/2310.14492
  • repo_url: None
  • paper_authors: Nicholas Ramirez, Michael Burgess
  • for: This project brings rock skipping into a robotic setting, applying lessons from robotic manipulation.
  • methods: A system consisting of a robotic arm and a dynamic environment performs rock skipping in simulation; important parameters such as release velocity are varied to identify the factors that most affect the total number of skips.
  • results: The project was limited by gripping inefficiencies and problems with release-height trajectories, which are discussed further in the report.
    Abstract Rock skipping is a highly dynamic and relatively complex task that can easily be performed by humans. This project aims to bring rock skipping into a robotic setting, utilizing the lessons we learned in Robotic Manipulation. Specifically, this project implements a system consisting of a robotic arm and dynamic environment to perform rock skipping in simulation. By varying important parameters such as release velocity, we hope to use our system to gain insight into the most important factors for maximizing the total number of skips. In addition, by implementing the system in simulation, we have a more rigorous and precise testing approach over these varied test parameters. However, this project experienced some limitations due to gripping inefficiencies and problems with release height trajectories which is further discussed in our report.
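
A back-of-the-envelope model shows why release velocity matters: if each water contact retains a fixed fraction of speed, the skip count grows with release speed. The coefficients below are invented for illustration, and the project's simulator is far more detailed:

```python
# Toy skip-count model: skipping continues while speed stays above a threshold.
def count_skips(release_speed, restitution=0.8, v_min=3.0):
    skips, v = 0, release_speed
    while v * restitution >= v_min:
        v *= restitution      # speed lost at each water contact
        skips += 1
    return skips

for v0 in (5.0, 10.0, 20.0):
    print(v0, "m/s ->", count_skips(v0), "skips")  # 2, 5, 8 skips
```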

VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations

  • paper_url: http://arxiv.org/abs/2310.14487
  • repo_url: None
  • paper_authors: Yiying Yang, Wen Liu, Fukun Yin, Xin Chen, Gang Yu, Jiayuan Fan, Tao Chen
  • for: High-fidelity surface reconstruction and photorealistic novel view synthesis at practical frame rates and resolutions.
  • methods: Vector quantization enhances the implicit neural representation: the NeRF sampling space is reduced to a lower resolution and restored to the original size with a pre-trained VAE decoder, combined with a multi-scale NeRF sampling scheme and a semantic loss function.
  • results: The model achieves a strong trade-off between rendering quality and efficiency, confirmed by evaluations on the DTU, BlendMVS, and H3DS datasets.
    Abstract Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, the computational complexity inherent in these methodologies presents a substantial impediment, constraining the attainable frame rates and resolutions in practical applications. In response to this predicament, we propose VQ-NeRF, an effective and efficient pipeline for enhancing implicit neural representations via vector quantization. The essence of our method involves reducing the sampling space of NeRF to a lower resolution and subsequently reinstating it to the original size utilizing a pre-trained VAE decoder, thereby effectively mitigating the sampling time bottleneck encountered during rendering. Although the codebook furnishes representative features, reconstructing fine texture details of the scene remains challenging due to high compression rates. To overcome this constraint, we design an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales to enhance the network's ability to preserve fine details. Furthermore, we incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions. Extensive experiments demonstrate the effectiveness of our model in achieving the optimal trade-off between rendering quality and efficiency. Evaluation on the DTU, BlendMVS, and H3DS datasets confirms the superior performance of our approach.
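
The generic vector-quantization building block at the heart of such pipelines looks like this (a standard VQ bottleneck with a straight-through gradient; VQ-NeRF's multi-scale sampling and VAE decoder sit around a mechanism like this, and the sizes here are illustrative):

```python
# Minimal vector-quantization bottleneck with a straight-through estimator.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(codes, dim)

    def forward(self, z):                            # z: (batch, dim) features
        d = torch.cdist(z, self.codebook.weight)     # distances to all codes
        idx = d.argmin(dim=1)                        # nearest code index
        zq = self.codebook(idx)
        commit = ((zq.detach() - z) ** 2).mean()     # commitment loss term
        zq = z + (zq - z).detach()                   # straight-through gradient
        return zq, idx, commit

vq = VectorQuantizer()
zq, idx, commit_loss = vq(torch.randn(8, 64))
```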

Intelligent Escape of Robotic Systems: A Survey of Methodologies, Applications, and Challenges

  • paper_url: http://arxiv.org/abs/2310.14485
  • repo_url: None
  • paper_authors: Junfei Li, Simon X. Yang
  • for: This survey reviews state-of-the-art research on the intelligent escape of robotic systems, helping readers understand how the field is developing.
  • methods: Four main categories of intelligent escape methods are reviewed: planning-based, partitioning-based, learning-based, and bio-inspired methodologies.
  • results: The survey summarizes the strengths and limitations of existing methods, discusses potential applications in domains such as search and rescue, evacuation, military security, and healthcare, identifies current research challenges, and offers insights into future research trends in intelligent escape.
    Abstract Intelligent escape is an interdisciplinary field that employs artificial intelligence (AI) techniques to enable robots with the capacity to intelligently react to potential dangers in dynamic, intricate, and unpredictable scenarios. As the emphasis on safety becomes increasingly paramount and advancements in robotic technologies continue to advance, a wide range of intelligent escape methodologies has been developed in recent years. This paper presents a comprehensive survey of state-of-the-art research work on intelligent escape of robotic systems. Four main methods of intelligent escape are reviewed, including planning-based methodologies, partitioning-based methodologies, learning-based methodologies, and bio-inspired methodologies. The strengths and limitations of existing methods are summarized. In addition, potential applications of intelligent escape are discussed in various domains, such as search and rescue, evacuation, military security, and healthcare. In an effort to develop new approaches to intelligent escape, this survey identifies current research challenges and provides insights into future research trends in intelligent escape.