cs.CL - 2023-10-09

GPT-who: An Information Density-based Machine-Generated Text Detector

  • paper_url: http://arxiv.org/abs/2310.06202
  • repo_url: None
  • paper_authors: Saranya Venkatraman, Adaku Uchendu, Dongwon Lee
  • for: This work examines whether the Uniform Information Density (UID) principle can capture differences between LLM- and human-generated language, and proposes GPT-who, a multi-class, domain-agnostic, statistical-based detector.
  • methods: The detector uses UID-based features to model the unique statistical signature of each LLM and human author for accurate authorship attribution (a rough feature-extraction sketch follows this entry).
  • results: Evaluated on 4 large-scale benchmark datasets, GPT-who outperforms state-of-the-art statistical and non-statistical detectors such as GLTR, GPTZero, the OpenAI detector, and ZeroGPT by over 20% across domains. It is also computationally inexpensive and uses an interpretable representation of text.
    Abstract The Uniform Information Density principle posits that humans prefer to spread information evenly during language production. In this work, we examine if the UID principle can help capture differences between Large Language Models (LLMs) and human-generated text. We propose GPT-who, the first psycholinguistically-aware multi-class domain-agnostic statistical-based detector. This detector employs UID-based features to model the unique statistical signature of each LLM and human author for accurate authorship attribution. We evaluate our method using 4 large-scale benchmark datasets and find that GPT-who outperforms state-of-the-art detectors (both statistical- & non-statistical-based) such as GLTR, GPTZero, OpenAI detector, and ZeroGPT by over $20$% across domains. In addition to superior performance, it is computationally inexpensive and utilizes an interpretable representation of text articles. We present the largest analysis of the UID-based representations of human and machine-generated texts (over 400k articles) to demonstrate how authors distribute information differently, and in ways that enable their detection using an off-the-shelf LM without any fine-tuning. We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the overlying text is indiscernible.
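As a rough illustration of the UID-based pipeline above (not the authors' code; the exact feature set below is an assumption for illustration), one can score tokens with an off-the-shelf causal LM and summarize the surprisal sequence into uniformity statistics that feed an ordinary multi-class classifier:

```python
# Minimal sketch: per-token surprisal from an off-the-shelf LM, summarized into
# simple uniformity features (mean, variance, mean squared local change).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def uid_features(text: str) -> dict:
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Surprisal of token t given its prefix: -log p(token_t | tokens_<t)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    surprisal = -log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    diffs = surprisal[1:] - surprisal[:-1]
    return {
        "mean_surprisal": surprisal.mean().item(),
        "var_surprisal": surprisal.var().item(),
        "mean_sq_diff": (diffs ** 2).mean().item(),  # local non-uniformity
    }

# These features would then go to an ordinary classifier (e.g., logistic regression)
# to attribute the text to a particular LLM or to a human author.
print(uid_features("The Uniform Information Density principle posits that humans prefer to spread information evenly."))
```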

Compressing Context to Enhance Inference Efficiency of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.06201
  • repo_url: https://github.com/liyucheng09/selective_context
  • paper_authors: Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin
  • for: Improve the inference efficiency of large language models (LLMs) and address the growing computational and memory demands of long documents and extended conversations.
  • methods: Propose Selective Context, which identifies and prunes redundancy in the input context to make the input more compact (a simplified pruning sketch follows this entry).
  • results: Experiments show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to using the full context: a 50% reduction in context cost, a 36% reduction in inference memory usage, and a 32% reduction in inference time, with drops of only 0.023 in BERTScore and 0.038 in faithfulness.
    Abstract Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining comparable performance compared to that achieved when full context is used. Specifically, we achieve a 50\% reduction in context cost, resulting in a 36\% reduction in inference memory usage and a 32\% reduction in inference time, while observing only a minor drop of .023 in BERTscore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance.
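A simplified sketch of the pruning idea referenced in the methods bullet: score each token's self-information under a small causal LM and keep only the most informative ones. The released implementation operates on lexical units/phrases and preserves the task instruction; the token-level version below is an illustrative assumption only.

```python
# Rough token-level sketch of self-information-based context pruning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def selective_context(text: str, keep_ratio: float = 0.5) -> str:
    ids = tok(text, return_tensors="pt", truncation=True).input_ids[0]
    with torch.no_grad():
        logits = lm(ids.unsqueeze(0)).logits[0]
    log_probs = torch.log_softmax(logits[:-1], dim=-1)
    surprisal = -log_probs.gather(1, ids[1:].unsqueeze(1)).squeeze(1)
    # Always keep the first token; rank the rest by informativeness (surprisal).
    k = max(1, int(keep_ratio * surprisal.numel()))
    keep = torch.topk(surprisal, k).indices + 1
    keep = torch.cat([torch.tensor([0]), keep.sort().values])
    return tok.decode(ids[keep])

print(selective_context("Large language models achieved remarkable performance across various tasks in recent years.", 0.5))
```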

The Importance of Prompt Tuning for Automated Neuron Explanations

  • paper_url: http://arxiv.org/abs/2310.06200
  • repo_url: None
  • paper_authors: Justin Lee, Tuomas Oikarinen, Arjun Chatha, Keng-Chi Chang, Yilan Chen, Tsui-Wei Weng
  • for: Understand large language models (LLMs) more deeply by studying what their individual neurons do, which also matters for model safety.
  • methods: Building on prior work that uses LLMs such as GPT-4 to explain what each neuron in a language model does, analyze the effect of the prompt used to generate explanations and reformat it in a more natural way to improve explanation quality and reduce computational cost.
  • results: Three different evaluations, both automated and human, show that the new prompts substantially improve neuron explanation quality while greatly reducing computational cost.
    Abstract Recent advances have greatly increased the capabilities of large language models (LLMs), but our understanding of the models and their safety has not progressed as fast. In this paper we aim to understand LLMs deeper by studying their individual neurons. We build upon previous work showing large language models such as GPT-4 can be useful in explaining what each neuron in a language model does. Specifically, we analyze the effect of the prompt used to generate explanations and show that reformatting the explanation prompt in a more natural way can significantly improve neuron explanation quality and greatly reduce computational cost. We demonstrate the effects of our new prompts in three different ways, incorporating both automated and human evaluations.

BYOC: Personalized Few-Shot Classification with Co-Authored Class Descriptions

  • paper_url: http://arxiv.org/abs/2310.06111
  • repo_url: None
  • paper_authors: Arth Bohra, Govert Verkes, Artem Harutyunyan, Pascal Weinberger, Giovanni Campagna
  • for: Provide a new approach that lets end-users build text classifiers for themselves, so they can create classifiers personalized to their own needs.
  • methods: Use a large language model (LLM) prompted with descriptions of the salient features of each class, co-authored interactively by the user and the LLM: while the user annotates each few-shot example, the LLM asks relevant questions that the user answers, and examples, questions, and answers are summarized into the classification prompt (a prompt-assembly sketch follows this entry).
  • results: Experiments show the approach yields high-accuracy classifiers, within 82% of the performance of models trained on significantly larger datasets while using only 1% of their training data. In a study with 30 participants, the personalized classifiers reach an average accuracy of 90%, 15% higher than the state-of-the-art approach.
    Abstract Text classification is a well-studied and versatile building block for many NLP applications. Yet, existing approaches require either large annotated corpora to train a model with or, when using large language models as a base, require carefully crafting the prompt as well as using a long context that can fit many examples. As a result, it is not possible for end-users to build classifiers for themselves. To address this issue, we propose a novel approach to few-shot text classification using an LLM. Rather than few-shot examples, the LLM is prompted with descriptions of the salient features of each class. These descriptions are coauthored by the user and the LLM interactively: while the user annotates each few-shot example, the LLM asks relevant questions that the user answers. Examples, questions, and answers are summarized to form the classification prompt. Our experiments show that our approach yields high accuracy classifiers, within 82% of the performance of models trained with significantly larger datasets while using only 1% of their training sets. Additionally, in a study with 30 participants, we show that end-users are able to build classifiers to suit their specific needs. The personalized classifiers show an average accuracy of 90%, which is 15% higher than the state-of-the-art approach.
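A hypothetical sketch of the prompt-assembly step mentioned in the methods bullet; the template wording and the example class descriptions are assumptions, not the paper's exact format.

```python
# Assemble a zero-shot classification prompt from co-authored class descriptions.

def build_classification_prompt(class_descriptions: dict[str, str], text: str) -> str:
    lines = ["Classify the text into exactly one of the following classes.", ""]
    for name, description in class_descriptions.items():
        lines.append(f"Class '{name}': {description}")
    lines += ["", f"Text: {text}", "Answer with only the class name."]
    return "\n".join(lines)

# Descriptions are distilled from the user's annotations plus the questions the
# LLM asked and the answers the user gave during the interactive co-authoring phase.
descriptions = {
    "bug report": "Describes unexpected behaviour, error messages, or crashes the user hit.",
    "feature request": "Asks for new functionality or a change in how the product works.",
}
print(build_classification_prompt(descriptions, "The app crashes whenever I rotate my phone."))
```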

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

  • paper_url: http://arxiv.org/abs/2310.06103
  • repo_url: https://github.com/digitalphonetics/multilingual-seq2seq-slu
  • paper_authors: Pavel Denisov, Ngoc Thang Vu
  • for: Propose a unified method for end-to-end spoken language understanding (E2E-SLU) in a multilingual setting, covering tasks that require predicting lexical fillers such as slot filling.
  • methods: Integrate multilingual pretrained speech and text models into a single generative model that performs E2E-SLU, and further pretrain it on widely available speech recognition data with several training objectives.
  • results: After pretraining on 7000 hours of multilingual data, the model outperforms the state of the art on two SLU datasets and partly on two more. It also shows cross-lingual capabilities, improving the best known result on the PortMEDIA-Language dataset by almost half and reaching a Concept/Value Error Rate of 23.65%.
    Abstract A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models, however their evaluation often lacks multilingual setup and tasks that require prediction of lexical fillers, such as slot filling. In this work, we propose a unified method that integrates multilingual pretrained speech and text models and performs E2E-SLU on six datasets in four languages in a generative manner, including the prediction of lexical fillers. We investigate how the proposed method can be improved by pretraining on widely available speech recognition data using several training objectives. Pretraining on 7000 hours of multilingual data allows us to outperform the state-of-the-art ultimately on two SLU datasets and partly on two more SLU datasets. Finally, we examine the cross-lingual capabilities of the proposed model and improve on the best known result on the PortMEDIA-Language dataset by almost half, achieving a Concept/Value Error Rate of 23.65%.

Auditing Gender Analyzers on Text Data

  • paper_url: http://arxiv.org/abs/2310.06061
  • repo_url: None
  • paper_authors: Siddharth D Jaiswal, Ankit Kumar Verma, Animesh Mukherjee
  • for: This study aims to audit existing gender analyzers for biases against non-binary individuals.
  • methods: The study uses two datasets (Reddit comments and Tumblr posts) and fine-tunes a BERT multi-label classifier on these datasets to evaluate the accuracy of the gender analyzers (a setup sketch follows this entry).
  • results: The study finds that the existing gender analyzers are highly inaccurate, with an overall accuracy of ~50% on all platforms. The fine-tuned BERT model achieves an overall performance of ~77% on the most realistically deployable setting and a surprisingly higher performance of 90% for the non-binary class. Additionally, the study shows that ChatGPT, a highly advanced AI model, is also biased and needs better audits and moderation.
    Abstract AI models have become extremely popular and accessible to the general public. However, they are continuously under the scanner due to their demonstrable biases toward various sections of the society like people of color and non-binary people. In this study, we audit three existing gender analyzers -- uClassify, Readable and HackerFactor, for biases against non-binary individuals. These tools are designed to predict only the cisgender binary labels, which leads to discrimination against non-binary members of the society. We curate two datasets -- Reddit comments (660k) and, Tumblr posts (2.05M) and our experimental evaluation shows that the tools are highly inaccurate with the overall accuracy being ~50% on all platforms. Predictions for non-binary comments on all platforms are mostly female, thus propagating the societal bias that non-binary individuals are effeminate. To address this, we fine-tune a BERT multi-label classifier on the two datasets in multiple combinations, observe an overall performance of ~77% on the most realistically deployable setting and a surprisingly higher performance of 90% for the non-binary class. We also audit ChatGPT using zero-shot prompts on a small dataset (due to high pricing) and observe an average accuracy of 58% for Reddit and Tumblr combined (with overall better results for Reddit). Thus, we show that existing systems, including highly advanced ones like ChatGPT are biased, and need better audits and moderation and, that such societal biases can be addressed and alleviated through simple off-the-shelf models like BERT trained on more gender inclusive datasets.
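A minimal setup sketch for the BERT multi-label classifier mentioned in the methods bullet; the base model, label set, and threshold are assumptions, and the data loading and fine-tuning loop are omitted.

```python
# BERT multi-label classification head over three gender labels (sketch only).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["male", "female", "non-binary"]
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss per label
)

def predict(text: str, threshold: float = 0.5) -> list[str]:
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.sigmoid(model(**batch).logits)[0]
    return [label for label, p in zip(labels, probs) if p >= threshold]

# Before fine-tuning on the Reddit/Tumblr data the predictions are meaningless;
# after fine-tuning, thresholded sigmoid scores give the multi-label output.
print(predict("i use they/them pronouns and write about my day"))
```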

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

  • paper_url: http://arxiv.org/abs/2310.05919
  • repo_url: None
  • paper_authors: Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu
  • for: Address the challenge of limited labeled data in spoken language understanding tasks.
  • methods: Use a speech-text model pre-trained with a shared representation space, so that models fine-tuned on text can be transferred directly to speech test data.
  • results: With as little as 1 hour of labeled speech data, the proposed approach matches the performance of previous methods that fine-tune speech-only pre-trained models on 10 times more data, on sentiment analysis and named entity recognition.
    Abstract Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data. With as little as 1 hour of labeled speech data, our proposed approach achieves comparable performance on spoken language understanding tasks (specifically, sentiment analysis and named entity recognition) when compared to previous methods using speech-only pre-trained models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we also analyze the latent representations. We find that the bottom layers of speech-text models are largely task-agnostic and align speech and text representations into a shared space, while the top layers are more task-specific.

NEFTune: Noisy Embeddings Improve Instruction Finetuning

  • paper_url: http://arxiv.org/abs/2310.05914
  • repo_url: https://github.com/neelsjain/neftune
  • paper_authors: Neel Jain, Ping-yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
  • for: Improve the performance of instruction finetuning for language models.
  • methods: Add random noise to the embedding vectors during training (a minimal sketch of the noise rule follows this entry).
  • results: 1. With noisy embeddings, LLaMA-2-7B finetuned on Alpaca rises from 29.79% to 64.69% on AlpacaEval; 2. NEFTune also improves over strong baselines on modern instruction datasets, by about 10% with Evol-Instruct and about 8% with ShareGPT and OpenPlatypus.
    Abstract We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.
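A minimal sketch of the noise rule referenced in the methods bullet: uniform noise in [-1, 1] scaled by alpha / sqrt(L * d), for sequence length L and embedding dimension d, is added to the token embeddings during finetuning only. The standalone function below is illustrative; the released implementation patches the model's embedding layer instead.

```python
# NEFTune-style embedding noise (training-time only).
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """embeddings: (batch, seq_len, dim) token embeddings."""
    _, seq_len, dim = embeddings.shape
    scale = alpha / (seq_len * dim) ** 0.5
    noise = torch.empty_like(embeddings).uniform_(-1.0, 1.0)
    return embeddings + scale * noise

# Example: perturb a dummy batch of embeddings before feeding it to the transformer.
dummy = torch.randn(2, 16, 4096)
noised = neftune_noise(dummy, alpha=5.0)
print((noised - dummy).abs().max())
```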

Controllable Chest X-Ray Report Generation from Longitudinal Representations

  • paper_url: http://arxiv.org/abs/2310.05881
  • repo_url: None
  • paper_authors: Francesco Dalla Serra, Chaoyang Wang, Fani Deligianni, Jeffrey Dalton, Alison Q O’Neil
  • for: Speed up radiology reporting without sacrificing accuracy, and provide a controllable report generation model.
  • methods: Two novel components: first, longitudinal representation learning, which takes the prior scan as an additional input and aligns, concatenates, and fuses current and prior visual information into a joint longitudinal representation for the multimodal report generation model; second, sentence-anatomy dropout, a training strategy in which the report generator is trained to predict only the sentences of the original report that correspond to the subset of anatomical regions given as input (a toy sketch follows this entry).
  • results: In-depth experiments on the MIMIC-CXR dataset show that the approach achieves state-of-the-art results while enabling anatomy-wise controllable report generation.
    Abstract Radiology reports are detailed text descriptions of the content of medical scans. Each report describes the presence/absence and location of relevant clinical findings, commonly including comparison with prior exams of the same patient to describe how they evolved. Radiology reporting is a time-consuming process, and scan results are often subject to delays. One strategy to speed up reporting is to integrate automated reporting systems, however clinical deployment requires high accuracy and interpretability. Previous approaches to automated radiology reporting generally do not provide the prior study as input, precluding comparison which is required for clinical accuracy in some types of scans, and offer only unreliable methods of interpretability. Therefore, leveraging an existing visual input format of anatomical tokens, we introduce two novel aspects: (1) longitudinal representation learning -- we input the prior scan as an additional input, proposing a method to align, concatenate and fuse the current and prior visual information into a joint longitudinal representation which can be provided to the multimodal report generation model; (2) sentence-anatomy dropout -- a training strategy for controllability in which the report generator model is trained to predict only sentences from the original report which correspond to the subset of anatomical regions given as input. We show through in-depth experiments on the MIMIC-CXR dataset how the proposed approach achieves state-of-the-art results while enabling anatomy-wise controllable report generation.
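A toy sketch of the sentence-anatomy dropout strategy referenced in the methods bullet; the data structures and sampling details are assumptions, not the authors' pipeline.

```python
# Sample a subset of anatomical regions and keep only the report sentences that
# belong to them, so the generator learns to cover exactly the regions it is given.
import random

def sentence_anatomy_dropout(tagged_sentences, keep_prob=0.7, seed=None):
    """tagged_sentences: list of (sentence, anatomy_region) pairs from one report."""
    rng = random.Random(seed)
    regions = sorted({region for _, region in tagged_sentences})
    kept_regions = {r for r in regions if rng.random() < keep_prob} or set(regions)
    target = [s for s, r in tagged_sentences if r in kept_regions]
    return sorted(kept_regions), target  # (input anatomy tokens, target sentences)

report = [
    ("The lungs are clear.", "lung"),
    ("Heart size is normal.", "heart"),
    ("No pleural effusion.", "pleura"),
]
print(sentence_anatomy_dropout(report, keep_prob=0.5, seed=0))
```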

Are Large Language Models Geospatially Knowledgeable?

  • paper_url: http://arxiv.org/abs/2310.13002
  • repo_url: None
  • paper_authors: Prabin Bhandari, Antonios Anastasopoulos, Dieter Pfoser
  • for: investigate the extent of geospatial knowledge and reasoning abilities in pre-trained Large Language Models (LLMs)
  • methods: probe LLMs for geo-coordinates, use geospatial and non-geospatial prepositions to gauge geospatial awareness, and utilize a multidimensional scaling (MDS) experiment to assess geospatial reasoning capabilities and to determine locations of cities based on prompting (a toy MDS sketch follows this entry)
  • results: larger and more sophisticated LLMs can synthesize geospatial knowledge from textual information, but there are limitations to their geospatial abilities
    Abstract Despite the impressive performance of Large Language Models (LLM) for various natural language processing tasks, little is known about their comprehension of geographic data and related ability to facilitate informed geospatial decision-making. This paper investigates the extent of geospatial knowledge, awareness, and reasoning abilities encoded within such pretrained LLMs. With a focus on autoregressive language models, we devise experimental approaches related to (i) probing LLMs for geo-coordinates to assess geospatial knowledge, (ii) using geospatial and non-geospatial prepositions to gauge their geospatial awareness, and (iii) utilizing a multidimensional scaling (MDS) experiment to assess the models' geospatial reasoning capabilities and to determine locations of cities based on prompting. Our results confirm that it does not only take larger, but also more sophisticated LLMs to synthesize geospatial knowledge from textual information. As such, this research contributes to understanding the potential and limitations of LLMs in dealing with geospatial information.
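A toy sketch of the MDS probe referenced in the methods bullet, with made-up pairwise dissimilarities standing in for values elicited from an LLM: classical multidimensional scaling recovers a 2-D layout that can then be compared against true geography.

```python
# Recover a 2-D city layout from (hypothetical) pairwise dissimilarities via MDS.
import numpy as np
from sklearn.manifold import MDS

cities = ["London", "Paris", "Berlin", "Madrid"]
# Hypothetical LLM-elicited dissimilarities (symmetric, zero diagonal), not real outputs.
D = np.array([
    [0.0, 0.2, 0.5, 0.7],
    [0.2, 0.0, 0.4, 0.5],
    [0.5, 0.4, 0.0, 0.9],
    [0.7, 0.5, 0.9, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
for city, (x, y) in zip(cities, coords):
    print(f"{city}: ({x:+.2f}, {y:+.2f})")
```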

Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting

  • paper_url: http://arxiv.org/abs/2310.05824
  • repo_url: None
  • paper_authors: Nikolay Bogoychev, Pinzhen Chen
  • for: Terminology correctness is important in downstream applications of machine translation, and a prevalent way to ensure it is to inject terminology constraints into the translation system.
  • methods: For the WMT 2023 terminology translation task, adopt a translate-then-refine approach that is domain-independent and requires minimal manual effort: annotate random source words with pseudo-terminology translations obtained from word alignment to train a terminology-aware model, and explore two post-processing methods, re-decoding with violating words negatively constrained and refining hypotheses with a large language model given the terminology constraints (a constraint-checking sketch follows this entry).
  • results: Experiments show that the terminology-aware model learns to incorporate terminologies effectively, and the large language model refinement step can further improve terminology recall.
    Abstract Terminology correctness is important in the downstream application of machine translation, and a prevalent way to ensure this is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach which can be domain-independent and requires minimal manual efforts. We annotate random source words with pseudo-terminology translations obtained from word alignment to first train a terminology-aware model. Further, we explore two post-processing methods. First, we use an alignment process to discover whether a terminology constraint has been violated, and if so, we re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with terminology constraints. Results show that our terminology-aware model learns to incorporate terminologies effectively, and the large language model refinement process can further improve terminology recall.
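A simplified sketch of the constraint check behind the first post-processing method in the methods bullet; the terminology dictionary, matching rule, and example sentences are assumptions.

```python
# Detect terminology constraints that a hypothesis fails to satisfy; violations
# would trigger re-decoding with negative constraints or LLM-based refinement.

def violated_terms(source: str, hypothesis: str, terminology: dict[str, str]) -> list[str]:
    """terminology maps source-side terms to the required target-side translations."""
    violations = []
    for src_term, tgt_term in terminology.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in hypothesis.lower():
            violations.append(tgt_term)
    return violations

terms = {"Rechnung": "invoice", "Mahnung": "payment reminder"}
src = "Die Rechnung und die Mahnung wurden gestern verschickt."
hyp = "The invoice and the warning were sent yesterday."
missing = violated_terms(src, hyp, terms)
print(missing)  # ['payment reminder'] -> re-decode with 'warning' negatively constrained
```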

SC-Safety: A Multi-round Open-ended Question Adversarial Safety Benchmark for Large Language Models in Chinese

  • paper_url: http://arxiv.org/abs/2310.05818
  • repo_url: None
  • paper_authors: Liang Xu, Kangkang Zhao, Lei Zhu, Hang Xue
  • for: The paper aims to systematically assess the safety of Chinese large language models (LLMs) and provide a benchmark for creating safer and more trustworthy models.
  • methods: The paper introduces a multi-round adversarial benchmark called SuperCLUE-Safety (SC-Safety) that includes 4912 open-ended questions covering 20 safety sub-dimensions. The benchmark involves human-model interactions and conversations to increase the challenges.
  • results: The paper finds that closed-source models perform better in terms of safety compared to open-source models, and models released from China demonstrate comparable safety levels to LLMs like GPT-3.5-turbo. Smaller models with 6B-13B parameters can also compete effectively in terms of safety. The findings provide guidance on model selection and promote collaborative efforts to create safer LLMs.
    Abstract Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated remarkable abilities in natural language understanding and generation. However, alongside their positive impact on our daily tasks, they can also produce harmful content that negatively affects societal perceptions. To systematically assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety) - a multi-round adversarial benchmark with 4912 open-ended questions covering more than 20 safety sub-dimensions. Adversarial human-model interactions and conversations significantly increase the challenges compared to existing methods. Experiments on 13 major LLMs supporting Chinese yield the following insights: 1) Closed-source models outperform open-sourced ones in terms of safety; 2) Models released from China demonstrate comparable safety levels to LLMs like GPT-3.5-turbo; 3) Some smaller models with 6B-13B parameters can compete effectively in terms of safety. By introducing SC-Safety, we aim to promote collaborative efforts to create safer and more trustworthy LLMs. The benchmark and findings provide guidance on model selection. Our benchmark can be found at https://www.CLUEbenchmarks.com

DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.05793
  • repo_url: https://github.com/Shark-NLP/DiffuSeq
  • paper_authors: Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
  • for: Speed up the training and sampling of sequence-to-sequence text diffusion models so they are closer to practical application.
  • methods: Introduce a soft absorbing state that helps the diffusion model learn to reconstruct discrete mutations based on the underlying Gaussian space, and employ state-of-the-art ODE solvers in the continuous space to accelerate sampling.
  • results: Experiments show that the proposed method accelerates training convergence by 4x and generates samples of similar quality 800x faster, bringing it significantly closer to practical application.
    Abstract Diffusion models have gained prominence in generating high-quality sequences of text. Nevertheless, current approaches predominantly represent discrete text within a continuous diffusion space, which incurs substantial computational overhead during training and results in slower sampling speeds. In this paper, we introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space, thereby enhancing its capacity to recover conditional signals. During the sampling phase, we employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process. Comprehensive experimental evaluations reveal that our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster, rendering it significantly closer to practical application. \footnote{The code is released at \url{https://github.com/Shark-NLP/DiffuSeq}

Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems

  • paper_url: http://arxiv.org/abs/2310.05791
  • repo_url: https://github.com/sronger/psg_predicting_algorithm_tags_and_difficulty
  • paper_authors: Juntae Kim, Eunjung Cho, Dongwoo Kim, Dongbin Na
  • for: This paper aims to help engineers and developers solve algorithm problems more efficiently by predicting the algorithm tag and difficulty level of a problem.
  • methods: The authors propose a deep learning-based method for simultaneously predicting the algorithm tags and difficulty level of a given algorithm problem.
  • results: The authors present AMT, a real-world multi-task dataset of algorithm problems that is the largest dataset for predicting algorithm tags compared to previous studies, and show that their proposed method can accurately predict algorithm tags and difficulty levels.
    Abstract The recent program development industries have required problem-solving abilities for engineers, especially application developers. However, AI-based education systems to help solve computer algorithm problems have not yet attracted attention, while most big tech companies require the ability to solve algorithm problems including Google, Meta, and Amazon. The most useful guide to solving algorithm problems might be guessing the category (tag) of the facing problems. Therefore, our study addresses the task of predicting the algorithm tag as a useful tool for engineers and developers. Moreover, we also consider predicting the difficulty levels of algorithm problems, which can be used as useful guidance to calculate the required time to solve that problem. In this paper, we present a real-world algorithm problem multi-task dataset, AMT, by mainly collecting problem samples from the most famous and large competitive programming website Codeforces. To the best of our knowledge, our proposed dataset is the most large-scale dataset for predicting algorithm tags compared to previous studies. Moreover, our work is the first to address predicting the difficulty levels of algorithm problems. We present a deep learning-based novel method for simultaneously predicting algorithm tags and the difficulty levels of an algorithm problem given. All datasets and source codes are available at https://github.com/sronger/PSG_Predicting_Algorithm_Tags_and_Difficulty.

Aligning Language Models with Human Preferences via a Bayesian Approach

  • paper_url: http://arxiv.org/abs/2310.05782
  • repo_url: https://github.com/wangjs9/aligned-dpm
  • paper_authors: Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li
  • for: This paper aims to advance human-centric natural language generation (NLG) systems by ensuring alignment between NLG models and human preferences.
  • methods: The proposed method uses a Bayesian framework to account for the distribution of disagreements among human preferences in training a preference model, and utilizes contrastive learning to train the NLG model with the preference scores.
  • results: The proposed method consistently exceeds previous state-of-the-art (SOTA) models in both automatic and human evaluations on two human-centric NLG tasks, i.e., emotional support conversation and integrity “Rule-of-Thumb” generation.
    Abstract In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of the NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods suffer from an inability to capture the nuanced degrees of disaggregation among humans and may only represent a specialized subset of individuals, thereby lacking the ability to quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach, which employs a Bayesian framework to account for the distribution of disagreements among human preferences as training a preference model, and names it as d-PM. Besides, considering the RL strategy's inefficient and complex training process over the training efficiency, we further propose utilizing the contrastive learning strategy to train the NLG model with the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05736
  • repo_url: https://github.com/microsoft/LLMLingua
  • paper_authors: Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu
  • for: Accelerate the inference of large language models (LLMs) and reduce cost so that they can be used in more applications.
  • methods: Propose LLMLingua, a coarse-to-fine prompt compression method comprising a budget controller that maintains semantic integrity under high compression ratios, a token-level iterative compression algorithm that better models the interdependence between compressed contents, and an instruction-tuning-based method for distribution alignment between language models.
  • results: Experiments and analysis on four datasets from different scenarios (GSM8K, BBH, ShareGPT, and Arxiv-March23) show that LLMLingua achieves state-of-the-art performance and allows up to 20x compression with little performance loss.
    Abstract Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly lengthy, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction tuning based method for distribution alignment between language models. We conduct experiments and analysis over four datasets from different scenarios, i.e., GSM8K, BBH, ShareGPT, and Arxiv-March23; showing that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss. Our code is available at https://aka.ms/LLMLingua.

Towards Emotion-Based Synthetic Consciousness: Using LLMs to Estimate Emotion Probability Vectors

  • paper_url: http://arxiv.org/abs/2310.10673
  • repo_url: None
  • paper_authors: David Sinclair, Willem Pye
  • for: Explore how large language models (LLMs) can be used to estimate a summary of the emotional state associated with a piece of text.
  • methods: The emotion summary is a dictionary of emotion-describing words together with the probability of each word appearing after a prompt comprising the original text and an emotion-eliciting tail (a small sketch follows this entry).
  • results: Emotion analysis of Amazon product reviews shows that emotion descriptors can be mapped into a PCA-type space. However, eliciting text descriptions of actions that would improve the currently described state via a tail prompt did not work in a straightforward way.
    Abstract This paper shows how LLMs (Large Language Models) may be used to estimate a summary of the emotional state associated with piece of text. The summary of emotional state is a dictionary of words used to describe emotion together with the probability of the word appearing after a prompt comprising the original text and an emotion eliciting tail. Through emotion analysis of Amazon product reviews we demonstrate emotion descriptors can be mapped into a PCA type space. It was hoped that text descriptions of actions to improve a current text described state could also be elicited through a tail prompt. Experiment seemed to indicate that this is not straightforward to make work. This failure put our hoped for selection of action via choosing the best predict ed outcome via comparing emotional responses out of reach for the moment.
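A small sketch of the probability-vector estimation referenced in the methods bullet; the tail wording, emotion list, and first-subtoken approximation are assumptions rather than the paper's exact setup.

```python
# Estimate an emotion probability vector from next-token probabilities after an
# emotion-eliciting tail prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
emotions = ["happy", "sad", "angry", "disappointed", "satisfied"]

def emotion_vector(text: str) -> dict:
    prompt = text + "\nReading this, I feel"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_probs = torch.softmax(lm(ids).logits[0, -1], dim=-1)
    scores = {}
    for e in emotions:
        # Probability of the first sub-token of " happy", " sad", ... as a cheap proxy.
        e_id = tok(" " + e).input_ids[0]
        scores[e] = next_token_probs[e_id].item()
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}  # renormalised over the emotion set

print(emotion_vector("The product arrived broken and support never answered my emails."))
```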

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

  • paper_url: http://arxiv.org/abs/2310.05694
  • repo_url: https://github.com/kaihe-better/llm-for-healthcare
  • paper_authors: Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria
  • for: Provide an overview of currently developed large language models (LLMs) for healthcare and their development process, outlining the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs.
  • methods: First explore the potential of LLMs to enhance the efficiency and effectiveness of various healthcare applications, highlighting both strengths and limitations; then compare earlier PLMs with the latest LLMs, and different LLMs with each other; and summarize related healthcare training data, training methods, optimization strategies, and usage.
  • results: The survey consolidates the development and application of LLMs in healthcare and investigates the concerns unique to deploying them in healthcare settings, particularly fairness, accountability, transparency, and ethics; it also compiles open-source resources such as accessible datasets, methodologies, code implementations, and evaluation benchmarks.
    Abstract The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to freetext queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs. Specifically, we first explore the potential of LLMs to enhance the efficiency and effectiveness of various Healthcare applications highlighting both the strengths and limitations. Secondly, we conduct a comparison between the previous PLMs and the latest LLMs, as well as comparing various LLMs with each other. Then we summarize related Healthcare training data, training methods, optimization strategies, and usage. Finally, the unique concerns associated with deploying LLMs in Healthcare settings are investigated, particularly regarding fairness, accountability, transparency and ethics. Our survey provide a comprehensive investigation from perspectives of both computer science and Healthcare specialty. Besides the discussion about Healthcare concerns, we supports the computer science community by compiling a collection of open source resources, such as accessible datasets, the latest methodologies, code implementations, and evaluation benchmarks in the Github. Summarily, we contend that a significant paradigm shift is underway, transitioning from PLMs to LLMs. This shift encompasses a move from discriminative AI approaches to generative AI approaches, as well as a shift from model-centered methodologies to datacentered methodologies.

Larth: Dataset and Machine Translation for Etruscan

  • paper_url: http://arxiv.org/abs/2310.05688
  • repo_url: https://github.com/gianlucavico/larth-etruscan-nlp
  • paper_authors: Gianluca Vico, Gerasimos Spanakis
  • for: Provide a dataset for machine translation from Etruscan to English so that future researchers can use it for natural language processing on this low-resource language.
  • methods: Collect 2891 translated examples from existing academic sources, some extracted manually and others automatically, and benchmark different machine translation models on them.
  • results: A small transformer model achieves a BLEU score of 10.1; releasing the dataset can help enable future research on Etruscan, similar languages, and other languages with scarce resources.
    Abstract Etruscan is an ancient language spoken in Italy from the 7th century BC to the 1st century AD. There are no native speakers of the language at the present day, and its resources are scarce, as there exist only around 12,000 known inscriptions. To the best of our knowledge, there are no publicly available Etruscan corpora for natural language processing. Therefore, we propose a dataset for machine translation from Etruscan to English, which contains 2891 translated examples from existing academic sources. Some examples are extracted manually, while others are acquired in an automatic way. Along with the dataset, we benchmark different machine translation models observing that it is possible to achieve a BLEU score of 10.1 with a small transformer model. Releasing the dataset can help enable future research on this language, similar languages or other languages with scarce resources.

A Closer Look into Automatic Evaluation Using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05657
  • repo_url: https://github.com/d223302/a-closer-look-to-llm-evaluation
  • paper_authors: Cheng-Han Chiang, Hung-yi Lee
  • for: This paper aims to evaluate the effectiveness of using large language models (LLMs) for text quality evaluation, and to compare different evaluation methods.
  • methods: The paper uses two existing methods, LLM evaluation (Chiang and Lee, 2023) and G-Eval (Liu et al., 2023), and analyzes their strengths and weaknesses in terms of correlating with human ratings.
  • results: The paper finds that the auto Chain-of-Thought (CoT) used in G-Eval does not always improve correlation with human ratings, and that forcing the LLM to output only a numeric rating is suboptimal. Additionally, the paper shows that asking the LLM to explain its own ratings consistently improves the correlation between the ChatGPT and human ratings, and pushes state-of-the-art (SoTA) correlations on two meta-evaluation datasets (a prompt sketch follows this entry).
    Abstract Using large language models (LLMs) to evaluate text quality has recently gained popularity. Some prior works explore the idea of using LLMs for evaluation, while they differ in some details of the evaluation process. In this paper, we analyze LLM evaluation (Chiang and Lee, 2023) and G-Eval (Liu et al., 2023), and we discuss how those details in the evaluation process change how well the ratings given by LLMs correlate with human ratings. We find that the auto Chain-of-Thought (CoT) used in G-Eval does not always make G-Eval more aligned with human ratings. We also show that forcing the LLM to output only a numeric rating, as in G-Eval, is suboptimal. Last, we reveal that asking the LLM to explain its own ratings consistently improves the correlation between the ChatGPT and human ratings and pushes state-of-the-art (SoTA) correlations on two meta-evaluation datasets.
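A hypothetical sketch of the "explain, then rate" prompting that the results bullet refers to; the template wording and the rating parser are assumptions, not the paper's exact prompts.

```python
# Build an evaluation prompt that asks the LLM to analyse the sample before
# giving a numeric rating, and parse the rating from the reply.

def build_eval_prompt(criterion: str, source: str, summary: str) -> str:
    return (
        f"You will be given a news article and a summary.\n"
        f"Evaluate the summary on {criterion} (scale 1-5).\n\n"
        f"Article:\n{source}\n\nSummary:\n{summary}\n\n"
        "First explain your reasoning in a few sentences, "
        "then give your rating on the final line as 'Rating: <1-5>'."
    )

def parse_rating(response: str) -> int:
    # Take the number after the last 'Rating:' marker in the model's reply.
    tail = response.rsplit("Rating:", 1)[-1]
    digits = [c for c in tail if c.isdigit()]
    return int(digits[0]) if digits else -1

prompt = build_eval_prompt("coherence", "Storm hits coastal town...", "A storm hit a town.")
print(prompt)
print(parse_rating("The summary is terse but coherent. Rating: 4"))
```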

RAUCG: Retrieval-Augmented Unsupervised Counter Narrative Generation for Hate Speech

  • paper_url: http://arxiv.org/abs/2310.05650
  • repo_url: None
  • paper_authors: Shuyu Jiang, Wenyi Tang, Xingshu Chen, Rui Tanga, Haizhou Wang, Wenxian Wang
  • for: Automatically generate Counter Narratives (CNs) to combat online hate speech (HS) without infringing on freedom of speech.
  • methods: Propose Retrieval-Augmented Unsupervised Counter Narrative Generation (RAUCG), which retrieves counter-knowledge with an SSF method (stance consistency, semantic overlap rate, and fitness for HS) and uses an energy-based decoding mechanism that quantizes knowledge-injection, countering, and fluency constraints into differentiable functions, so that CNs can be generated without expert-authored CN data.
  • results: Experiments show that RAUCG outperforms strong baselines on all metrics, including language quality, toxicity, persuasiveness, relevance, and success rate of countering HS, with improvements of +2.0% in relevance and +4.5% in countering success rate; it also enables GPT2 to outperform T0 on all metrics.
    Abstract The Counter Narrative (CN) is a promising approach to combat online hate speech (HS) without infringing on freedom of speech. In recent years, there has been a growing interest in automatically generating CNs using natural language generation techniques. However, current automatic CN generation methods mainly rely on expert-authored datasets for training, which are time-consuming and labor-intensive to acquire. Furthermore, these methods cannot directly obtain and extend counter-knowledge from external statistics, facts, or examples. To address these limitations, we propose Retrieval-Augmented Unsupervised Counter Narrative Generation (RAUCG) to automatically expand external counter-knowledge and map it into CNs in an unsupervised paradigm. Specifically, we first introduce an SSF retrieval method to retrieve counter-knowledge from the multiple perspectives of stance consistency, semantic overlap rate, and fitness for HS. Then we design an energy-based decoding mechanism by quantizing knowledge injection, countering and fluency constraints into differentiable functions, to enable the model to build mappings from counter-knowledge to CNs without expert-authored CN data. Lastly, we comprehensively evaluate model performance in terms of language quality, toxicity, persuasiveness, relevance, and success rate of countering HS, etc. Experimental results show that RAUCG outperforms strong baselines on all metrics and exhibits stronger generalization capabilities, achieving significant improvements of +2.0% in relevance and +4.5% in success rate of countering metrics. Moreover, RAUCG enabled GPT2 to outperform T0 in all metrics, despite the latter being approximately eight times larger than the former. Warning: This paper may contain offensive or upsetting content!

Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution

  • paper_url: http://arxiv.org/abs/2310.05634
  • repo_url: None
  • paper_authors: Xinze Li, Yixin Cao, Liangming Pan, Yubo Ma, Aixin Sun
  • for: Improve the verifiability of large language models (LLMs), which often suffer from unreliable hallucinations, by addressing three core concerns of conventional attributed LMs.
  • methods: Extend the attribution source from unstructured text to a knowledge graph (KG); propose a "Conscious Incompetence" setting in which the model identifies the need for supporting knowledge beyond the provided KG; and propose a comprehensive automatic evaluation metric covering text quality, citation quality, and text-citation alignment.
  • results: Using BioKaLMA, a biography-domain dataset built with an evolutionary question generation strategy, and a baseline solution, the paper shows considerable room for improvement in LLMs' citation generation, underlining the importance of the "Conscious Incompetence" setting and of retrieval accuracy.
    Abstract Although achieving great success, Large Language Models (LLMs) usually suffer from unreliable hallucinations. In this paper, we define a new task of Knowledge-aware Language Model Attribution (KaLMA) that improves upon three core concerns on conventional attributed LMs. First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios. Second, we propose a new ``Conscious Incompetence" setting considering the incomplete knowledge repository, where the model identifies the need for supporting knowledge beyond the provided KG. Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment. To implement the above innovations, we build a dataset in biography domain BioKaLMA via a well-designed evolutionary question generation strategy, to control the question complexity and necessary knowledge to the answer. For evaluation, we develop a baseline solution and demonstrate the room for improvement in LLMs' citation generation, emphasizing the importance of incorporating the "Conscious Incompetence" setting, and the critical role of retrieval accuracy.

Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05628
  • repo_url: None
  • paper_authors: Marco Bronzini, Carlo Nicolini, Bruno Lepri, Andrea Passerini, Jacopo Staiano
  • for: This paper aims to provide a framework for extracting and analyzing non-financial information from sustainability reports to support investors’ ESG-related decision-making.
  • methods: The authors use Large Language Models (LLMs), Retrieved Augmented Generation, and in-context learning to extract semantically structured information from sustainability reports. They also employ graph-based representations to analyze the obtained findings.
  • results: The authors generate meaningful statistical, similarity, and correlation analyses concerning the sustainability actions undertaken across industries, sectors, and regions. They also investigate the factors that impact companies’ ESG scores using their findings and other company information.
    Abstract Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of the investors' increasing attention to Environmental, Social, and Governance (ESG) issues. Such information is publicly released in a variety of non-structured and multi-modal documentation. Hence, it is not straightforward to aggregate and consolidate such data in a cohesive framework to further derive insights about sustainability practices across companies and markets. Thus, it is natural to resort to Information Extraction (IE) techniques to provide concise, informative and actionable data to the stakeholders. Moving beyond traditional text processing techniques, in this work we leverage Large Language Models (LLMs), along with prominent approaches such as Retrieved Augmented Generation and in-context learning, to extract semantically structured information from sustainability reports. We then adopt graph-based representations to generate meaningful statistical, similarity and correlation analyses concerning the obtained findings, highlighting the prominent sustainability actions undertaken across industries and discussing emerging similarity and disclosing patterns at company, sector and region levels. Lastly, we investigate which factual aspects impact the most on companies' ESG scores using our findings and other company information.

Integrating Stock Features and Global Information via Large Language Models for Enhanced Stock Return Prediction

  • paper_url: http://arxiv.org/abs/2310.05627
  • repo_url: None
  • paper_authors: Yujie Ding, Shuai Jia, Tianyi Ma, Bingcheng Mao, Xiuze Zhou, Liuliu Li, Dongming Han
  • for: Integrate large language models (LLMs) such as ChatGPT and GPT-4 into existing quantitative investment models to improve the accuracy of stock return predictions.
  • methods: Propose a novel framework with two components: (1) the Local-Global (LG) model, which introduces three distinct strategies for modeling global information, grounded respectively on stock features, the capabilities of LLMs, and a hybrid of the two; and (2) Self-Correlated Reinforcement Learning (SCRL), which aligns the embeddings of financial news generated by LLMs with stock features within the same semantic space.
  • results: Implementing the framework yields superior Rank Information Coefficient and returns in the China A-share market, particularly compared to models relying only on stock features.
    Abstract The remarkable achievements and rapid advancements of Large Language Models (LLMs) such as ChatGPT and GPT-4 have showcased their immense potential in quantitative investment. Traders can effectively leverage these LLMs to analyze financial news and predict stock returns accurately. However, integrating LLMs into existing quantitative models presents two primary challenges: the insufficient utilization of semantic information embedded within LLMs and the difficulties in aligning the latent information within LLMs with pre-existing quantitative stock features. We propose a novel framework consisting of two components to surmount these challenges. The first component, the Local-Global (LG) model, introduces three distinct strategies for modeling global information. These approaches are grounded respectively on stock features, the capabilities of LLMs, and a hybrid method combining the two paradigms. The second component, Self-Correlated Reinforcement Learning (SCRL), focuses on aligning the embeddings of financial news generated by LLMs with stock features within the same semantic space. By implementing our framework, we have demonstrated superior performance in Rank Information Coefficient and returns, particularly compared to models relying only on stock features in the China A-share market.

LAiW: A Chinese Legal Large Language Models Benchmark

  • paper_url: http://arxiv.org/abs/2310.05620
  • repo_url: https://github.com/dai-shen/laiw
  • paper_authors: Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang
  • for: Systematically evaluate the legal capabilities of the many legal LLMs that have recently emerged.
  • methods: Divide the legal capabilities of LLMs into three levels: basic legal NLP capability, basic legal application capability, and complex legal application capability.
  • results: The first phase of evaluation, which focuses on basic legal NLP capability, shows that although some legal LLMs perform better than their backbones, a gap to ChatGPT remains.
    Abstract With the emergence of numerous legal LLMs, there is currently a lack of a comprehensive benchmark for evaluating their legal abilities. In this paper, we propose the first Chinese Legal LLMs benchmark based on legal capabilities. Through the collaborative efforts of legal and artificial intelligence experts, we divide the legal capabilities of LLMs into three levels: basic legal NLP capability, basic legal application capability, and complex legal application capability. We have completed the first phase of evaluation, which mainly focuses on the capability of basic legal NLP. The evaluation results show that although some legal LLMs have better performance than their backbones, there is still a gap compared to ChatGPT. Our benchmark can be found at URL.

Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance

  • paper_url: http://arxiv.org/abs/2310.05597
  • repo_url: None
  • paper_authors: Molly R. Petersen, Lonneke van der Plas
  • for: Test whether language models can learn basic analogical reasoning, using analogies closer to those used to evaluate analogical reasoning in humans than those in common NLP benchmarks.
  • methods: Train models with several objectives on a small amount of data, compare their performance, and compare against a dataset with a human baseline.
  • results: Models are able to learn analogical reasoning even with a small amount of data, and after training they approach human performance.
    Abstract While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance.

DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking

  • paper_url: http://arxiv.org/abs/2310.05589
  • repo_url: https://github.com/starreeze/drin
  • paper_authors: Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai
  • for: Addressing the coarse matching and static-alignment issues that limit Multimodal Entity Linking (MEL) on complex and diverse data.
  • methods: Proposes a Dynamic Relation Interactive Network (DRIN) that explicitly models four types of alignment between a mention and an entity and uses a dynamic Graph Convolutional Network (GCN) to select the appropriate alignment relations for each input sample.
  • results: Experiments on two datasets show that DRIN outperforms prior state-of-the-art methods by a large margin.
    Abstract Multimodal Entity Linking (MEL) is a task that aims to link ambiguous mentions within multimodal contexts to referential entities in a multimodal knowledge base. Recent methods for MEL adopt a common framework: they first interact and fuse the text and image to obtain representations of the mention and entity respectively, and then compute the similarity between them to predict the correct entity. However, these methods still suffer from two limitations: first, as they fuse the features of text and image before matching, they cannot fully exploit the fine-grained alignment relations between the mention and entity. Second, their alignment is static, leading to low performance when dealing with complex and diverse data. To address these issues, we propose a novel framework called Dynamic Relation Interactive Network (DRIN) for MEL tasks. DRIN explicitly models four different types of alignment between a mention and entity and builds a dynamic Graph Convolutional Network (GCN) to dynamically select the corresponding alignment relations for different input samples. Experiments on two datasets show that DRIN outperforms state-of-the-art methods by a large margin, demonstrating the effectiveness of our approach.

Regulation and NLP (RegNLP): Taming Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05553
  • repo_url: None
  • paper_authors: Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, Sofia Ranchordas, Gerasimos Spanakis
  • for: Examining how NLP research can benefit from engagement with regulation studies in order to evaluate and manage risk more rigorously.
  • methods: Reviews existing work in regulation studies and NLP and discusses how to connect the two fields.
  • results: Identifies shortcomings in how current NLP research handles risk assessment and proposes a new multidisciplinary research space, RegNLP, that connects scientific knowledge to regulatory processes.
    Abstract The scientific innovation in Natural Language Processing (NLP) and more broadly in artificial intelligence (AI) is at its fastest pace to date. As large language models (LLMs) unleash a new era of automation, important debates emerge regarding the benefits and risks of their development, deployment and use. Currently, these debates have been dominated by often polarized narratives mainly led by the AI Safety and AI Ethics movements. This polarization, often amplified by social media, is swaying political agendas on AI regulation and governance and posing issues of regulatory capture. Capture occurs when the regulator advances the interests of the industry it is supposed to regulate, or of special interest groups rather than pursuing the general public interest. Meanwhile in NLP research, attention has been increasingly paid to the discussion of regulating risks and harms. This often happens without systematic methodologies or sufficient rooting in the disciplines that inspire an extended scope of NLP research, jeopardizing the scientific integrity of these endeavors. Regulation studies are a rich source of knowledge on how to systematically deal with risk and uncertainty, as well as with scientific evidence, to evaluate and compare regulatory options. This resource has largely remained untapped so far. In this paper, we argue how NLP research on these topics can benefit from proximity to regulatory studies and adjacent fields. We do so by discussing basic tenets of regulation, and risk and uncertainty, and by highlighting the shortcomings of current NLP discussions dealing with risk assessment. Finally, we advocate for the development of a new multidisciplinary research space on regulation and NLP (RegNLP), focused on connecting scientific knowledge to regulatory processes based on systematic methodologies.

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

  • paper_url: http://arxiv.org/abs/2310.05513
  • repo_url: None
  • paper_authors: Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe
  • for: Examining self-supervised models for multilingual speech recognition and language identification.
  • methods: Extends the SUPERB framework to benchmark self-supervised models on multilingual speech recognition and language identification.
  • results: Merely scaling up models does not solve all challenges in multilingual speech tasks, and a variety of speech/voice types remain a significant challenge for multilingual speech processing.
    Abstract The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a research track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and a variety of speech/voice types present significant challenges in multilingual speech processing.

XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners

  • paper_url: http://arxiv.org/abs/2310.05502
  • repo_url: https://github.com/luoxiaoheics/xal
  • paper_authors: Yun Luo, Zhen Yang, Fandong Meng, Yingjie Li, Fang Guo, Qinglin Qi, Jie Zhou, Yue Zhang
  • for: The paper is written for proposing a novel Explainable Active Learning (XAL) framework for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
  • methods: The paper uses a pre-trained bi-directional encoder for classification, and employs a pre-trained uni-directional decoder to generate and score the explanation. A ranking loss is proposed to enhance the decoder’s capability in scoring explanations. During the selection of unlabeled data, the paper combines the predictive uncertainty of the encoder and the explanation score of the decoder to acquire informative data for annotation.
  • results: The paper achieves substantial improvement on all six tasks over previous Active Learning (AL) methods, and ablation studies demonstrate the effectiveness of each component. Human evaluation shows that the model trained in XAL performs surprisingly well in explaining its prediction.
    Abstract Active learning aims to construct an effective training set by iteratively curating the most informative unlabeled data for annotation, which is practical in low-resource tasks. Most active learning techniques in classification rely on the model's uncertainty or disagreement to choose unlabeled data. However, previous work indicates that existing models are poor at quantifying predictive uncertainty, which can lead to over-confidence in superficial patterns and a lack of exploration. Inspired by the cognitive processes in which humans deduce and predict through causal information, we propose a novel Explainable Active Learning framework (XAL) for low-resource text classification, which aims to encourage classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Specifically, besides using a pre-trained bi-directional encoder for classification, we employ a pre-trained uni-directional decoder to generate and score the explanation. A ranking loss is proposed to enhance the decoder's capability in scoring explanations. During the selection of unlabeled data, we combine the predictive uncertainty of the encoder and the explanation score of the decoder to acquire informative data for annotation. As XAL is a general framework for text classification, we test our methods on six different classification tasks. Extensive experiments show that XAL achieves substantial improvement on all six tasks over previous AL methods. Ablation studies demonstrate the effectiveness of each component, and human evaluation shows that the model trained in XAL performs surprisingly well in explaining its prediction.
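To make the data-selection step concrete, here is a minimal sketch of an XAL-style acquisition rule. It is illustrative only: the entropy/explanation weighting `alpha` and the normalization are assumptions, not the paper's exact formulation.

```python
import numpy as np

def xal_select(pred_probs, expl_scores, k, alpha=0.5):
    """Pick the k unlabeled examples that are both uncertain to the encoder
    (high predictive entropy) and poorly explained by the decoder (low
    explanation score). `alpha` is a hypothetical mixing weight."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    expl_scores = np.asarray(expl_scores, dtype=float)
    entropy = -np.sum(pred_probs * np.log(pred_probs + 1e-12), axis=1)
    # Normalize both signals to [0, 1] so they can be combined.
    ent_n = (entropy - entropy.min()) / (entropy.ptp() + 1e-12)
    expl_n = (expl_scores - expl_scores.min()) / (expl_scores.ptp() + 1e-12)
    acquisition = alpha * ent_n + (1 - alpha) * (1 - expl_n)
    return np.argsort(-acquisition)[:k]
```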

  • paper_url: http://arxiv.org/abs/2310.05484
  • repo_url: None
  • paper_authors: Vageesh Saxena, Benjamin Bashpole, Gijs Van Dijck, Gerasimos Spanakis
  • for: Helping Law Enforcement Agencies (LEAs) identify and link human trafficking (HT) cases to online advertisements (ads).
  • methods: Builds the large IDTraffickers dataset of 87,595 text ads and 5,244 vendor labels, and trains a DeCLUTR-small model that reaches a macro-F1 score of 0.8656 in a closed-set classification setting.
  • results: Style representations extracted from the trained classifier achieve a mean r-precision of 0.8852 for authorship verification in an open-set ranking setting, supporting the identification of potential HT indicators.
    Abstract Human trafficking (HT) is a pervasive global issue affecting vulnerable individuals, violating their fundamental human rights. Investigations reveal that a significant number of HT cases are associated with online advertisements (ads), particularly in escort markets. Consequently, identifying and connecting HT vendors has become increasingly challenging for Law Enforcement Agencies (LEAs). To address this issue, we introduce IDTraffickers, an extensive dataset consisting of 87,595 text ads and 5,244 vendor labels to enable the verification and identification of potential HT vendors on online escort markets. To establish a benchmark for authorship identification, we train a DeCLUTR-small model, achieving a macro-F1 score of 0.8656 in a closed-set classification environment. Next, we leverage the style representations extracted from the trained classifier to conduct authorship verification, resulting in a mean r-precision score of 0.8852 in an open-set ranking environment. Finally, to encourage further research and ensure responsible data sharing, we plan to release IDTraffickers for the authorship attribution task to researchers under specific conditions, considering the sensitive nature of the data. We believe that the availability of our dataset and benchmarks will empower future researchers to utilize our findings, thereby facilitating the effective linkage of escort ads and the development of more robust approaches for identifying HT indicators.

Empower Nested Boolean Logic via Self-Supervised Curriculum Learning

  • paper_url: http://arxiv.org/abs/2310.05450
  • repo_url: https://github.com/gingasan/boolkill
  • paper_authors: Hongqiu Wu, Linfeng Liu, Hai Zhao, Min Zhang
  • for: Probing whether language models' reasoning abilities stem from strong generalization rather than mere exposure to relevant training data.
  • methods: Self-supervised training on data augmented with nested boolean logic chains, progressing step by step from simpler to harder logical patterns.
  • results: With the proposed self-supervised method, Curriculum Logical Reasoning (CLR), language models can effectively generalize to more complex and longer-hop logic.
    Abstract Beyond the great cognitive powers showcased by language models, it is crucial to scrutinize whether their reasoning capabilities stem from strong generalization or merely exposure to relevant data. As opposed to constructing increasingly complex logic, this paper probes into the boolean logic, the root capability of a logical reasoner. We find that any pre-trained language models even including large language models only behave like a random selector in the face of multi-nested boolean logic, a task that humans can handle with ease. To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method \textit{Curriculum Logical Reasoning} (\textsc{Clr}), where we augment the training data with nested boolean logic chain step-by-step, and program the training from simpler logical patterns gradually to harder ones. This new training paradigm allows language models to effectively generalize to much harder and longer-hop logic, which can hardly be learned through naive training. Furthermore, we show that boolean logic is a great foundation for improving the subsequent general logical tasks.
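A toy illustration of the curriculum idea follows; the operators, seed facts, and depth schedule are invented for the example and are not the paper's generation procedure.

```python
import random

def nest_boolean(statement: str, truth: bool, depth: int):
    """Wrap a seed statement in `depth` layers of boolean operators while
    tracking the resulting truth value (negation flips it; conjoining a true
    fact or disjoining a false fact leaves it unchanged)."""
    for _ in range(depth):
        op = random.choice(["not", "and_true", "or_false"])
        if op == "not":
            statement, truth = f"it is not true that {statement}", not truth
        elif op == "and_true":
            statement = f"({statement}) and (1 + 1 = 2)"
        else:
            statement = f"({statement}) or (2 + 2 = 5)"
    return statement, truth

# Curriculum: the model first sees depth-1 examples, then progressively deeper ones.
curriculum = [[nest_boolean("the sky is blue", True, d) for _ in range(100)]
              for d in range(1, 6)]
```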

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

  • paper_url: http://arxiv.org/abs/2310.05442
  • repo_url: None
  • paper_authors: Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank
  • for: Understanding the computational modeling of core NLP concepts and tasks, and how they transfer to real-world scenarios.
  • methods: traditional compartmentalized approaches for understanding a model’s functional capacity, as well as recommendations for more multi-faceted evaluation protocols.
  • results: the need for trustworthy and reliable NLP systems, and the importance of rethinking the traditional notion of language tasks and model evaluation in order to pursue a more holistic view of language.
    Abstract Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades. Traditionally, facets of linguistic intelligence have been compartmentalized into tasks with specialized model architectures and corresponding evaluation protocols. With the advent of large language models (LLMs) the community has witnessed a dramatic shift towards general purpose, task-agnostic approaches powered by generative models. As a consequence, the traditional compartmentalized notion of language tasks is breaking down, followed by an increasing challenge for evaluation and analysis. At the same time, LLMs are being deployed in more real-world scenarios, including previously unforeseen zero-shot setups, increasing the need for trustworthy and reliable systems. Therefore, we argue that it is time to rethink what constitutes tasks and model evaluation in NLP, and pursue a more holistic view on language, placing trustworthiness at the center. Towards this goal, we review existing compartmentalized approaches for understanding the origins of a model's functional capacity, and provide recommendations for more multi-faceted evaluation protocols.

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

  • paper_url: http://arxiv.org/abs/2310.05424
  • repo_url: https://github.com/raymin0223/fast_robust_early_exit
  • paper_authors: Sangmin Bae, Jongwoo Ko, Hwanjun Song, Se-Young Yun
  • for: Reducing the inference latency of autoregressive language models.
  • methods: Proposes the Fast and Robust Early-Exiting (FREE) framework, comprising a shallow-deep module and synchronized parallel decoding.
  • results: Substantially speeds up inference on a range of generation tasks, and introduces an adaptive threshold estimator based on a Beta mixture model to determine suitable confidence thresholds.
    Abstract To tackle the high inference latency exhibited by autoregressive language models, previous studies have proposed an early-exiting framework that allocates adaptive computation paths for each token based on the complexity of generating the subsequent token. However, we observed several shortcomings, including performance degradation caused by a state copying mechanism or numerous exit paths, and sensitivity to exit confidence thresholds. Consequently, we propose a Fast and Robust Early-Exiting (FREE) framework, which incorporates a shallow-deep module and a synchronized parallel decoding. Our framework enables faster inference by synchronizing the decoding process of the current token with previously stacked early-exited tokens. Furthermore, as parallel decoding allows us to observe predictions from both shallow and deep models, we present a novel adaptive threshold estimator that exploits a Beta mixture model to determine suitable confidence thresholds. We empirically demonstrated the superiority of our proposed framework on extensive generation tasks.
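The adaptive thresholding can be illustrated with a small calibration routine. This is a rough sketch under simplifying assumptions (a fixed class prior and per-class maximum-likelihood Beta fits), not the paper's estimator.

```python
import numpy as np
from scipy.stats import beta

def estimate_exit_threshold(conf_correct, conf_wrong, target=0.9, prior=0.5):
    """Fit one Beta distribution to the shallow model's confidences on
    calibration tokens it got right and one to those it got wrong, then
    return the smallest confidence whose posterior probability of being
    correct exceeds `target`."""
    conf_correct = np.clip(conf_correct, 1e-4, 1 - 1e-4)
    conf_wrong = np.clip(conf_wrong, 1e-4, 1 - 1e-4)
    a1, b1, _, _ = beta.fit(conf_correct, floc=0, fscale=1)
    a0, b0, _, _ = beta.fit(conf_wrong, floc=0, fscale=1)
    grid = np.linspace(0.01, 0.99, 200)
    post = prior * beta.pdf(grid, a1, b1) / (
        prior * beta.pdf(grid, a1, b1) + (1 - prior) * beta.pdf(grid, a0, b0) + 1e-12)
    feasible = grid[post >= target]
    return float(feasible.min()) if feasible.size else 1.0
```

At inference time, a token whose exit confidence exceeds this threshold would leave the network early; the rest continue through the deep layers.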

Automating Customer Service using LangChain: Building custom open-source GPT Chatbot for organizations

  • paper_url: http://arxiv.org/abs/2310.05421
  • repo_url: None
  • paper_authors: Keivalya Pandya, Mehfuza Holia
  • for: This research paper aims to automate customer service using a custom Large Language Model (LLM) called LangChain, which can provide personalized, responsive, and context-aware support.
  • methods: The paper proposes a new approach that combines open-source methodologies, web scraping, fine-tuning, and the integration of LangChain into customer service platforms. The research uses data collection via web scraping, embeddings, Google's Flan T5 XXL, Base, and Small language models for knowledge retrieval, and the integration of a chatbot into customer service platforms.
  • results: The paper shows that the proposed approach can provide real-time support and query resolution, with the chatbot integrated into customer service platforms. The results also demonstrate the ability to scale across industries and organizations, and elevate customer retention, value extraction, and brand image.
    Abstract In the digital age, the dynamics of customer service are evolving, driven by technological advancements and the integration of Large Language Models (LLMs). This research paper introduces a groundbreaking approach to automating customer service using LangChain, a custom LLM tailored for organizations. The paper explores the obsolescence of traditional customer support techniques, particularly Frequently Asked Questions (FAQs), and proposes a paradigm shift towards responsive, context-aware, and personalized customer interactions. The heart of this innovation lies in the fusion of open-source methodologies, web scraping, fine-tuning, and the seamless integration of LangChain into customer service platforms. This open-source state-of-the-art framework, presented as "Sahaay," demonstrates the ability to scale across industries and organizations, offering real-time support and query resolution. Key elements of this research encompass data collection via web scraping, the role of embeddings, the utilization of Google's Flan T5 XXL, Base and Small language models for knowledge retrieval, and the integration of the chatbot into customer service platforms. The results section provides insights into their performance and use cases, here particularly within an educational institution. This research heralds a new era in customer service, where technology is harnessed to create efficient, personalized, and responsive interactions. Sahaay, powered by LangChain, redefines the customer-company relationship, elevating customer retention, value extraction, and brand image. As organizations embrace LLMs, customer service becomes a dynamic and customer-centric ecosystem.
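A generic retrieve-then-generate loop in this spirit can be sketched as below. The documents, model choices, and prompt wording are placeholders standing in for the paper's LangChain-based system and organization-specific scraped data.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Hypothetical knowledge snippets scraped from an organization's help pages.
docs = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am-5pm.",
    "Passwords can be reset from the account settings page.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = embedder.encode(docs, convert_to_tensor=True)
generator = pipeline("text2text-generation", model="google/flan-t5-small")

def answer(query: str) -> str:
    # Retrieve the most similar snippet, then condition the generator on it.
    q_emb = embedder.encode(query, convert_to_tensor=True)
    best = int(util.cos_sim(q_emb, doc_emb).argmax())
    prompt = (f"Answer the customer using the context.\n"
              f"Context: {docs[best]}\nQuestion: {query}")
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]

print(answer("How long do refunds take?"))
```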

mBBC: Exploring the Multilingual Maze

  • paper_url: http://arxiv.org/abs/2310.05404
  • repo_url: https://github.com/PortNLP/mBBC
  • paper_authors: Sina Bagheri Nezhad, Ameeta Agrawal
  • for: Evaluating three prominent multilingual language models (mBERT, XLM-R, and GPT-3) to better understand their performance across languages and linguistic contexts.
  • methods: Uses the self-supervised task of next token prediction to assess model performance across a diverse set of languages.
  • results: Resource availability plays a crucial role in model performance, with higher resource levels yielding higher accuracy; the complex relationship among resource availability, language families, and script types calls for further investigation into language-specific characteristics and structural variation.
    Abstract Multilingual language models have gained significant attention in recent years, enabling the development of applications that cater to diverse linguistic contexts. In this paper, we present a comprehensive evaluation of three prominent multilingual language models: mBERT, XLM-R, and GPT-3. Using the self-supervised task of next token prediction, we assess their performance across a diverse set of languages, with a focus on understanding the impact of resource availability, word order, language family, and script type on model accuracy. Our findings reveal that resource availability plays a crucial role in model performance, with higher resource levels leading to improved accuracy. We also identify the complex relationship between resource availability, language families, and script types, highlighting the need for further investigation into language-specific characteristics and structural variations. Additionally, our statistical inference analysis identifies significant features contributing to model performance, providing insights for model selection and deployment. Our study contributes to a deeper understanding of multilingual language models and informs future research and development to enhance their performance and generalizability across languages and linguistic contexts.
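The evaluation protocol (next-token prediction accuracy) can be reproduced in a few lines; the model name below is a stand-in, since the paper evaluates mBERT, XLM-R, and GPT-3 on its own corpora.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_token_accuracy(texts, model_name="gpt2"):
    """Fraction of positions where the model's argmax next-token prediction
    matches the token that actually follows."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    hits, total = 0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").input_ids
            logits = model(ids).logits
            preds = logits[0, :-1].argmax(dim=-1)
            hits += int((preds == ids[0, 1:]).sum())
            total += ids.shape[1] - 1
    return hits / max(total, 1)
```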

GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence

  • paper_url: http://arxiv.org/abs/2310.05388
  • repo_url: None
  • paper_authors: Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li
  • for: This paper aims to enhance the complexity and credibility of story generation by leveraging information from human-written stories and using a retrieval-augmented story generation framework.
  • methods: The proposed method uses a retrieval repository of target conditions to produce few-shot examples that serve as prompts for a large language model (LLM). It also employs an “asking-why” prompting scheme to extract a forest of evidence, which is used to compensate for ambiguities in the generated story.
  • results: The experimental results and numerous examples demonstrate the effectiveness of the proposed method in generating stories with complex and credible plots.
    Abstract Conditional story generation is significant in human-machine interaction, particularly in producing stories with complex plots. While Large language models (LLMs) perform well on multiple NLP tasks, including story generation, it is challenging to generate stories with both complex and creative plots. Existing methods often rely on detailed prompts to guide LLMs to meet target conditions, which inadvertently restrict the creative potential of the generated stories. We argue that leveraging information from exemplary human-written stories facilitates generating more diverse plotlines. Delving deeper into story details helps build complex and credible plots. In this paper, we propose a retrieval-au\textbf{G}mented sto\textbf{R}y generation framework with a f\textbf{O}rest of e\textbf{V}id\textbf{E}nce (GROVE) to enhance stories' complexity. We build a retrieval repository for target conditions to produce few-shot examples to prompt LLMs. Additionally, we design an ``asking-why'' prompting scheme that extracts a forest of evidence, providing compensation for the ambiguities that may occur in the generated story. This iterative process uncovers underlying story backgrounds. Finally, we select the most fitting chains of evidence from the evidence forest and integrate them into the generated story, thereby enhancing the narrative's complexity and credibility. Experimental results and numerous examples verify the effectiveness of our method.

Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data

  • paper_url: http://arxiv.org/abs/2310.05378
  • repo_url: https://github.com/NickDiSanto/Twitter2030/tree/main/Beta
  • paper_authors: Nick DiSanto, Anthony Corso, Benjamin Sanders, Gavin Harding
  • for: investigate social media data to uncover abstract relationships and challenge the reliance on complex models
  • methods: employ Bag-of-Words models specific to each city to analyze Twitter data and evaluate representation
  • results: discover hidden insights and demonstrate the considerable influence of geographic location on online communication, challenging the notion that intricate models are necessary for pattern recognition
    Abstract While transformers have pioneered attention-driven architectures as a cornerstone of research, their dependence on explicitly contextual information underscores limitations in their abilities to tacitly learn overarching textual themes. This study investigates social media data as a source of distributed patterns, challenging the heuristic paradigm of performance benchmarking. In stark contrast to networks that rely on capturing complex long-term dependencies, models of online data inherently lack structure and are forced to learn underlying patterns in the aggregate. To properly represent these abstract relationships, this research dissects empirical social media corpora into their elemental components and analyzes over two billion tweets across population-dense locations. Exploring the relationship between location and vernacular in Twitter data, we employ Bag-of-Words models specific to each city and evaluate their respective representation. This demonstrates that hidden insights can be uncovered without the crutch of advanced algorithms and demonstrates that even amidst noisy data, geographic location has a considerable influence on online communication. This evidence presents tangible insights regarding geospatial communication patterns and their implications in social science. It also challenges the notion that intricate models are prerequisites for pattern recognition in natural language, aligning with the evolving landscape that questions the embrace of absolute interpretability over abstract understanding. This study bridges the divide between sophisticated frameworks and intangible relationships, paving the way for systems that blend structured models with conjectural reasoning.
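The per-city Bag-of-Words setup is simple enough to sketch directly; the four toy tweets below are invented placeholders for the two-billion-tweet corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["yinz going to the game tonight",
          "hella traffic on the bridge again",
          "grabbed a coffee at the bodega on the corner",
          "wicked cold out by the harbor today"]
cities = ["pittsburgh", "san_francisco", "new_york", "boston"]

# A shared unigram/bigram vocabulary; regional vocabulary differences are the
# signal a simple linear model can pick up without any attention mechanism.
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(tweets)
clf = LogisticRegression(max_iter=1000).fit(X, cities)
```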

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

  • paper_url: http://arxiv.org/abs/2310.05374
  • repo_url: None
  • paper_authors: Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen
  • for: Improving the performance of end-to-end (E2E) speech processing models, where labeled speech data are scarce in the data-centric AI era.
  • methods: Proposes LaSyn, an efficient text-data utilization framework that trains a latent synthesizer to convert text into intermediate latent representations of a pre-trained speech model; these pseudo acoustic representations augment the acoustic training data.
  • results: On ASR, LaSyn reduces the word error rate of an E2E baseline by over 22.3% relative; on SLU, it improves intent classification and slot filling accuracy. With fewer parameters, LaSyn is competitive with published state-of-the-art work, indicating the quality of the augmented training data.
    Abstract Training a high performance end-to-end speech (E2E) processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive for collection, compared to textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech processing models. We train a latent synthesizer to convert textual data into an intermediate latent representation of a pre-trained speech model. These pseudo acoustic representations of textual data augment acoustic data for model training. We evaluate LaSyn on low-resource automatic speech recognition (ASR) and spoken language understanding (SLU) tasks. For ASR, LaSyn improves an E2E baseline trained on LibriSpeech train-clean-100, with relative word error rate reductions over 22.3% on different test sets. For SLU, LaSyn improves our E2E baseline by absolute 4.1% for intent classification accuracy and 3.8% for slot filling SLU-F1 on SLURP, and absolute 4.49% and 2.25% for exact match (EM) and EM-Tree accuracies on STOP respectively. With fewer parameters, the results of LaSyn are competitive to published state-of-the-art works. The results demonstrate the quality of the augmented training data.

A Glance is Enough: Extract Target Sentence By Looking at A keyword

  • paper_url: http://arxiv.org/abs/2310.05352
  • repo_url: None
  • paper_authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han
  • for: Extracting a target sentence from multi-talker speech using only a keyword as input; for example, in social security applications the keyword might be "help", and the goal is to extract what the person calling for help is saying while ignoring the other speakers.
  • methods: Uses a Transformer architecture to embed both the keyword and the speech utterance, then relies on a cross-attention mechanism to select the correct content from concatenated or overlapping speech.
  • results: On LibriSpeech, the proposed method extracts target sentences from very noisy, mixed speech (SNR = -3 dB) with a phone error rate (PER) of 26%, versus 96% for the baseline system.
    Abstract This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input. For example, in social security applications, the keyword might be "help", and the goal is to identify what the person who called for help is articulating while ignoring other speakers. To address this problem, we propose using the Transformer architecture to embed both the keyword and the speech utterance and then rely on the cross-attention mechanism to select the correct content from the concatenated or overlapping speech. Experimental results on Librispeech demonstrate that our proposed method can effectively extract target sentences from very noisy and mixed speech (SNR=-3dB), achieving a phone error rate (PER) of 26\%, compared to the baseline system's PER of 96%.
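A toy PyTorch module conveys the keyword-queries-speech idea; dimensions, layer counts, and the per-frame mask head are illustrative choices rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class KeywordConditionedExtractor(nn.Module):
    """Encode the mixed-speech frames and the keyword, let the speech frames
    attend to the keyword via cross-attention, and predict a per-frame mask
    marking the target sentence."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.speech_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        self.keyword_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=1)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mask_head = nn.Linear(d_model, 1)

    def forward(self, speech_feats, keyword_embs):
        s = self.speech_enc(speech_feats)            # (B, T, d)
        k = self.keyword_enc(keyword_embs)           # (B, K, d)
        attended, _ = self.cross_attn(query=s, key=k, value=k)
        return torch.sigmoid(self.mask_head(attended)).squeeze(-1)  # (B, T)
```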

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

  • paper_url: http://arxiv.org/abs/2310.05338
  • repo_url: None
  • paper_authors: Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung
  • for: Assessing object hallucination in vision-language (VL) models to improve their reliability and trustworthiness.
  • methods: Uses large language models to generate 29.5k high-quality synthetic negative-pronoun (NegP) visual questions for evaluating VL models' susceptibility to object hallucination.
  • results: No state-of-the-art VL model is immune to object hallucination; all models score below 10% accuracy on NegP questions. Lexically diverse visual questions, question types with large scopes, and scene-relevant objects increase the risk of object hallucination.
    Abstract Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects. However, the absence of a general measurement for evaluating object hallucination in VL models has hindered our understanding and ability to mitigate this issue. In this work, we present NOPE (Negative Object Presence Evaluation), a novel benchmark designed to assess object hallucination in VL models through visual question answering (VQA). We propose a cost-effective and scalable approach utilizing large language models to generate 29.5k synthetic negative pronoun (NegP) data of high quality for NOPE. We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions, where the ground truth answers are denoted as NegP (e.g., "none"). Additionally, we evaluate their standard performance on visual questions on 9 other VQA datasets. Through our experiments, we demonstrate that no VL model is immune to the vulnerability of object hallucination, as all models achieve accuracy below 10\% on NegP. Furthermore, we uncover that lexically diverse visual questions, question types with large scopes, and scene-relevant objects capitalize the risk of object hallucination in VL models.
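Scoring a model on NegP questions is straightforward once its answers are collected; the small helper below assumes every gold answer is a negative pronoun such as "none", which is how the benchmark is constructed, and its set of accepted strings is illustrative.

```python
def negp_accuracy(predictions, gold_answers):
    """Count a prediction as correct only if it is itself a negative pronoun;
    any named object counts as a hallucination."""
    negatives = {"none", "no one", "nobody", "nothing", "nowhere", "neither", "zero"}
    correct = sum(pred.strip().lower() in negatives
                  for pred, gold in zip(predictions, gold_answers)
                  if gold.strip().lower() in negatives)
    return correct / len(gold_answers)
```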

Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation

  • paper_url: http://arxiv.org/abs/2310.05318
  • repo_url: None
  • paper_authors: Xunxin Cai, Meng Xiao, Zhiyuan Ning, Yuanchun Zhou
  • for: This paper aims to address the issue of data imbalance in Natural Language Processing, specifically in the context of research proposals submitted for funding.
  • methods: The paper uses large language models (Llama V1) as data generators to augment research proposals categorized within intricate disciplinary hierarchies. The authors design prompts for keyword-based research proposal generation to rectify data imbalances and enhance the equity of expert assignments.
  • results: The experiments conducted in the paper demonstrate the efficacy of the generated data, showing that the research proposals produced using the prompts can effectively address the issue of data imbalance and generate high-quality scientific text data.
    Abstract In addressing the imbalanced issue of data within the realm of Natural Language Processing, text data augmentation methods have emerged as pivotal solutions. This data imbalance is prevalent in the research proposals submitted during the funding application process. Such imbalances, resulting from the varying popularity of disciplines or the emergence of interdisciplinary studies, significantly impede the precision of downstream topic models that deduce the affiliated disciplines of these proposals. At the data level, proposals penned by experts and scientists are inherently complex technological texts, replete with intricate terminologies, which augmenting such specialized text data poses unique challenges. At the system level, this, in turn, compromises the fairness of AI-assisted reviewer assignment systems, which raises a spotlight on solving this issue. This study leverages large language models (Llama V1) as data generators to augment research proposals categorized within intricate disciplinary hierarchies, aiming to rectify data imbalances and enhance the equity of expert assignments. We first sample within the hierarchical structure to find the under-represented class. Then we designed a prompt for keyword-based research proposal generation. Our experiments attests to the efficacy of the generated data, demonstrating that research proposals produced using the prompts can effectively address the aforementioned issues and generate high quality scientific text data, thus help the model overcome the imbalanced issue.
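The augmentation loop can be sketched independently of the specific LLM; the prompt template, class keywords, and per-class cap below are hypothetical, standing in for the paper's keyword-based proposal-generation prompts.

```python
from collections import Counter

def build_augmentation_prompts(labels, keywords_by_class, cap_per_class=50):
    """For each under-represented discipline, emit keyword-conditioned prompts
    to be sent to an LLM (e.g. LLaMA) that generates synthetic proposal
    abstracts until the class counts are closer to balanced."""
    counts = Counter(labels)
    target = max(counts.values())
    prompts = []
    for cls, n in counts.items():
        for _ in range(min(cap_per_class, target - n)):
            keywords = ", ".join(keywords_by_class.get(cls, []))
            prompts.append(
                f"Write a short research-proposal abstract in the discipline "
                f"'{cls}', using the keywords: {keywords}.")
    return prompts
```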

cs.LG - 2023-10-09

Fair Classifiers that Abstain without Harm

  • paper_url: http://arxiv.org/abs/2310.06205
  • repo_url: None
  • paper_authors: Tongxin Yin, Jean-François Ton, Ruocheng Guo, Yuanshun Yao, Mingyan Liu, Yang Liu
  • for: The paper aims to develop a post-hoc method for existing classifiers to selectively abstain from predicting certain samples in order to achieve group fairness while maintaining original accuracy.
  • methods: The proposed method uses integer programming to assign abstention decisions for each training sample and trains a surrogate model to generalize the abstaining decisions to test samples.
  • results: The paper shows that the proposed method outperforms existing methods in terms of fairness disparity without sacrificing accuracy at similar abstention rates, and provides theoretical results on the feasibility of the IP procedure and the required abstention rate for different levels of unfairness tolerance and accuracy constraint.
    Abstract In critical applications, it is vital for classifiers to defer decision-making to humans. We propose a post-hoc method that makes existing classifiers selectively abstain from predicting certain samples. Our abstaining classifier is incentivized to maintain the original accuracy for each sub-population (i.e. no harm) while achieving a set of group fairness definitions to a user specified degree. To this end, we design an Integer Programming (IP) procedure that assigns abstention decisions for each training sample to satisfy a set of constraints. To generalize the abstaining decisions to test samples, we then train a surrogate model to learn the abstaining decisions based on the IP solutions in an end-to-end manner. We analyze the feasibility of the IP procedure to determine the possible abstention rate for different levels of unfairness tolerance and accuracy constraint for achieving no harm. To the best of our knowledge, this work is the first to identify the theoretical relationships between the constraint parameters and the required abstention rate. Our theoretical results are important since a high abstention rate is often infeasible in practice due to a lack of human resources. Our framework outperforms existing methods in terms of fairness disparity without sacrificing accuracy at similar abstention rates.

PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization

  • paper_url: http://arxiv.org/abs/2310.06182
  • repo_url: None
  • paper_authors: Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
  • for: This paper focuses on establishing theoretical guarantees for the robust generalization of deep neural networks (DNNs) against adversarial attacks.
  • methods: The paper uses a PAC-Bayes approach (Neyshabur et al., 2017) and provides a spectrally-normalized robust generalization bound for DNNs, addressing the challenge of extending the key ingredient to robust settings without relying on additional strong assumptions.
  • results: The paper shows that the mismatch terms between standard and robust generalization bounds are solely due to mathematical issues, and provides a different perspective on understanding robust generalization. Additionally, the paper extends the main result to adversarial robustness against general non-$\ell_p$ attacks and other neural network architectures.
    Abstract Deep neural networks (DNNs) are vulnerable to adversarial attacks. It is found empirically that adversarially robust generalization is crucial in establishing defense algorithms against adversarial attacks. Therefore, it is interesting to study the theoretical guarantee of robust generalization. This paper focuses on norm-based complexity, based on a PAC-Bayes approach (Neyshabur et al., 2017). The main challenge lies in extending the key ingredient, which is a weight perturbation bound in standard settings, to the robust settings. Existing attempts heavily rely on additional strong assumptions, leading to loose bounds. In this paper, we address this issue and provide a spectrally-normalized robust generalization bound for DNNs. Compared to existing bounds, our bound offers two significant advantages: Firstly, it does not depend on additional assumptions. Secondly, it is considerably tighter, aligning with the bounds of standard generalization. Therefore, our result provides a different perspective on understanding robust generalization: The mismatch terms between standard and robust generalization bounds shown in previous studies do not contribute to the poor robust generalization. Instead, these disparities solely due to mathematical issues. Finally, we extend the main result to adversarial robustness against general non-$\ell_p$ attacks and other neural network architectures.

Automatic Integration for Spatiotemporal Neural Point Processes

  • paper_url: http://arxiv.org/abs/2310.06179
  • repo_url: None
  • paper_authors: Zihao Zhou, Rose Yu
  • for: Efficiently learning continuous-time point processes, particularly spatiotemporal point processes (STPPs).
  • methods: Proposes AutoSTPP (Automatic Integration for Spatiotemporal Neural Point Processes), an extension of the AutoInt approach that handles integration for 3D STPPs.
  • results: AutoSTPP is validated on synthetic and real-world data, showing a significant advantage in recovering complex, sharply localized intensity functions.
    Abstract Learning continuous-time point processes is essential to many discrete event forecasting tasks. However, integration poses a major challenge, particularly for spatiotemporal point processes (STPPs), as it involves calculating the likelihood through triple integrals over space and time. Existing methods for integrating STPP either assume a parametric form of the intensity function, which lacks flexibility; or approximating the intensity with Monte Carlo sampling, which introduces numerical errors. Recent work by Omi et al. [2019] proposes a dual network or AutoInt approach for efficient integration of flexible intensity function. However, the method only focuses on the 1D temporal point process. In this paper, we introduce a novel paradigm: AutoSTPP (Automatic Integration for Spatiotemporal Neural Point Processes) that extends the AutoInt approach to 3D STPP. We show that direct extension of the previous work overly constrains the intensity function, leading to poor performance. We prove consistency of AutoSTPP and validate it on synthetic data and benchmark real world datasets, showcasing its significant advantage in recovering complex intensity functions from irregular spatiotemporal events, particularly when the intensity is sharply localized.
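The AutoInt building block that AutoSTPP extends is easy to illustrate in one dimension: a network parameterizes the antiderivative, autograd supplies the intensity, and the integral in the likelihood becomes a difference of two forward passes. The sketch below omits the positivity constraints and the 3D construction that the paper actually contributes.

```python
import torch
import torch.nn as nn

class IntegralNet(nn.Module):
    """F(t) is the learned antiderivative; lambda(t) = dF/dt via autograd."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, t):
        return self.net(t)

    def intensity(self, t):
        # Positivity of the intensity needs extra architectural constraints
        # (e.g. a monotone F), which this sketch leaves out.
        t = t.detach().clone().requires_grad_(True)
        (dF,) = torch.autograd.grad(self.net(t).sum(), t, create_graph=True)
        return dF
```

With this parameterization, the log-likelihood of events $t_1,\dots,t_n$ on $[0,T]$ is $\sum_i \log \lambda(t_i) - (F(T) - F(0))$, so no numerical quadrature is needed.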

DockGame: Cooperative Games for Multimeric Rigid Protein Docking

  • paper_url: http://arxiv.org/abs/2310.06177
  • repo_url: https://github.com/vsomnath/dockgame
  • paper_authors: Vignesh Ram Somnath, Pier Giuseppe Sessa, Maria Rodriguez Martinez, Andreas Krause
  • for: Predicting the assembly structure of multimeric protein complexes, i.e., the protein docking problem.
  • methods: Proposes DockGame, a game-theoretic docking framework that views docking as a cooperative game among proteins and computes stable equilibria via simultaneous gradient updates on a learned surrogate potential; alternatively, a diffusion generative model over the proteins' rotation and translation spaces samples from the Gibbs distribution of the true potential.
  • results: On the Docking Benchmark 5.5 (DB5.5) dataset, DockGame runs much faster than traditional docking methods, generates multiple plausible assembly structures, and achieves performance comparable to existing binary docking baselines.
    Abstract Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or generative modeling paradigm. In this paper, we focus on the less-studied multimeric (i.e., two or more proteins) docking problem. We introduce DockGame, a novel game-theoretic framework for docking -- we view protein docking as a cooperative game between proteins, where the final assembly structure(s) constitute stable equilibria w.r.t. the underlying game potential. Since we do not have access to the true potential, we consider two approaches - i) learning a surrogate game potential guided by physics-based energy functions and computing equilibria by simultaneous gradient updates, and ii) sampling from the Gibbs distribution of the true potential by learning a diffusion generative model over the action spaces (rotations and translations) of all proteins. Empirically, on the Docking Benchmark 5.5 (DB5.5) dataset, DockGame has much faster runtimes than traditional docking methods, can generate multiple plausible assembly structures, and achieves comparable performance to existing binary docking baselines, despite solving the harder task of coordinating multiple protein chains.
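The equilibrium-seeking step of the surrogate-game view reduces to simultaneous gradient updates on a shared potential; the flat pose vectors below stand in for the actual rotation and translation parameterization.

```python
import torch

def find_equilibrium(potential, init_poses, steps=500, lr=1e-2):
    """Descend a learned (physics-guided) scalar potential simultaneously in
    every protein's pose until the assembly reaches a stationary point, i.e.
    an equilibrium of the potential game."""
    poses = [p.detach().clone().requires_grad_(True) for p in init_poses]
    opt = torch.optim.SGD(poses, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        potential(poses).backward()   # scalar game potential over all poses
        opt.step()
    return [p.detach() for p in poses]
```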

Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

  • paper_url: http://arxiv.org/abs/2310.06161
  • repo_url: https://github.com/estija/cmid
  • paper_authors: Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan
  • for: addressing simplicity bias in neural networks and improving OOD generalization, subgroup robustness, and fairness
  • methods: regularizing the conditional mutual information of a simple model to obtain a more diverse set of features for making predictions
  • results: effective in various problem settings and real-world applications, leading to more diverse feature usage, enhanced OOD generalization, improved subgroup robustness, and fairness, with theoretical analyses of the effectiveness and OOD generalization properties.
    Abstract Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead to the model making biased predictions which have poor out-of-distribution (OOD) generalization. To address this, we propose a framework that encourages the model to use a more diverse set of features to make predictions. We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model. We demonstrate the effectiveness of this framework in various problem settings and real-world applications, showing that it effectively addresses simplicity bias and leads to more features being used, enhances OOD generalization, and improves subgroup robustness and fairness. We complement these results with theoretical analyses of the effect of the regularization and its OOD generalization properties.

Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization

  • paper_url: http://arxiv.org/abs/2310.06159
  • repo_url: None
  • paper_authors: Cong Ma, Xingyu Xu, Tian Tong, Yuejie Chi
  • for: 估计低维对象(如矩阵和张量)从不完整、可能受损的线性测量中获得
  • methods: 使用简单迭代法如梯度下降(GD)来直接回归低维因子,具有小内存和计算脚印
  • results: ScaledGD算法可以线性 converge,不受低维对象的condition number影响,并且可以在各种任务中实现快速的全局收敛,包括感知、Robust PCA和完成任务。
    Abstract Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor factorization, one of the most popular approaches is to employ simple iterative algorithms such as gradient descent (GD) to recover the low-rank factors directly, which allow for small memory and computation footprints. However, the convergence rate of GD depends linearly, and sometimes even quadratically, on the condition number of the low-rank object, and therefore, GD slows down painstakingly when the problem is ill-conditioned. This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that provably converges linearly at a constant rate independent of the condition number of the low-rank object, while maintaining the low per-iteration cost of gradient descent for a variety of tasks including sensing, robust principal component analysis and completion. In addition, ScaledGD continues to admit fast global convergence to the minimax-optimal solution, again almost independent of the condition number, from a small random initialization when the rank is over-specified in the presence of Gaussian noise. In total, ScaledGD highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the symmetry in low-rank factorization without hurting generalization.
    摘要 许多科学和工程问题可以表示为从不完整且可能受损的线性测量中估计一个低秩对象(例如矩阵和张量)。从矩阵和张量分解的角度看,一种非常流行的方法是使用梯度下降(GD)等简单迭代算法直接恢复低秩因子,这类算法的内存和计算开销都很小。然而,GD 的收敛速度与低秩对象的条件数线性(有时甚至是平方)相关,因此当问题病态(ill-conditioned)时,GD 的收敛会变得非常缓慢。本章介绍了一种新的算法,称为缩放梯度下降(ScaledGD),它能够以与低秩对象条件数无关的常数速率线性收敛,同时在感知、鲁棒主成分分析和矩阵补全等多种任务中保持与梯度下降相同的低单步计算成本。此外,在秩被过度指定且存在高斯噪声的情况下,ScaledGD 仍可从小的随机初始化出发快速全局收敛到极小极大最优解,且几乎不受条件数影响。总之,ScaledGD 强调了恰当的预条件在加速非凸统计估计中的作用:随迭代变化的预条件子使轨迹对低秩分解中的对称性具有理想的不变性,同时不损害泛化能力。
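
The update rule is easy to state for the fully observed matrix-factorization case: each factor's gradient is preconditioned by the inverse Gram matrix of the other factor, which is what removes the condition-number dependence. Below is a minimal NumPy sketch with spectral initialization; the step size and the toy data are illustrative, and the paper covers more general measurement models.

```python
import numpy as np

def scaled_gd(Y, r, steps=300, eta=0.5):
    """ScaledGD sketch for low-rank matrix factorization Y ≈ L @ R.T.
    Each factor's gradient is right-preconditioned by (R.T R)^{-1} / (L.T L)^{-1}."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)      # spectral-style initialization
    L = U[:, :r] * np.sqrt(s[:r])
    R = Vt[:r].T * np.sqrt(s[:r])
    for _ in range(steps):
        E = L @ R.T - Y                                    # residual
        L_new = L - eta * (E @ R) @ np.linalg.inv(R.T @ R)
        R_new = R - eta * (E.T @ L) @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
    return L, R

# toy ill-conditioned example (condition number 1e4)
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 3)) @ np.diag([100.0, 1.0, 0.01]) @ rng.normal(size=(3, 40))
L, R = scaled_gd(A, r=3)
print(np.linalg.norm(L @ R.T - A) / np.linalg.norm(A))     # relative fit error
```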

Manifold-augmented Eikonal Equations: Geodesic Distances and Flows on Differentiable Manifolds

  • paper_url: http://arxiv.org/abs/2310.06157
  • repo_url: None
  • paper_authors: Daniel Kelshaw, Luca Magri
  • for: 这项研究旨在提供一种基于模型的方法来 parameterize distance fields和 geodesic flows on manifolds,以便在 differentiable manifolds 上进行统计分析和减少维度模型。
  • methods: 该研究使用 manifold-augmented Eikonal equation 的解来 parameterize distance fields和 geodesic flows on manifolds。
  • results: 研究发现, manifold 的geometry对 distance field 产生了影响,而 geodesic flow 可以用来获取 globally length-minimizing curves。这些结果开启了 differentiable manifolds 上的统计分析和减少维度模型的可能性。
    Abstract Manifolds discovered by machine learning models provide a compact representation of the underlying data. Geodesics on these manifolds define locally length-minimising curves and provide a notion of distance, which are key for reduced-order modelling, statistical inference, and interpolation. In this work, we propose a model-based parameterisation for distance fields and geodesic flows on manifolds, exploiting solutions of a manifold-augmented Eikonal equation. We demonstrate how the geometry of the manifold impacts the distance field, and exploit the geodesic flow to obtain globally length-minimising curves directly. This work opens opportunities for statistics and reduced-order modelling on differentiable manifolds.
    摘要 机器学习模型发现的流形(manifold)提供了数据的紧凑表示。流形上的测地线(geodesic)定义了局部最短曲线,并给出了距离的概念,这些概念是降阶建模、统计推断和插值等方面的关键。在这项工作中,我们提出一种基于模型的参数化方法,用于刻画流形上的距离场和测地流,其核心是求解流形增广的 Eikonal 方程。我们展示了流形的几何特性如何影响距离场,并利用测地流直接获得全局最短曲线。这项工作为可微流形上的统计分析和降阶建模开启了新的可能性。
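
A small sketch of learning a distance field as the solution of an Eikonal equation with a physics-informed loss: the network's gradient norm is driven to one and the distance at a chosen source point to zero. The sketch works in a flat 3-D domain; on a manifold the gradient would first be projected onto the tangent space, and the architecture and hyperparameters here are assumptions rather than the paper's setup.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
source = torch.zeros(1, 3)                                 # distances are measured from this point

for step in range(2000):
    x = (torch.rand(256, 3) * 2 - 1).requires_grad_(True)  # sample points in the domain
    d = net(x)
    grad = torch.autograd.grad(d.sum(), x, create_graph=True)[0]
    # Eikonal residual ||∇d|| = 1; the manifold-augmented version would project
    # ∇d onto the tangent space before taking the norm.
    eikonal = ((grad.norm(dim=1) - 1.0) ** 2).mean()
    boundary = net(source).pow(2).mean()                    # d(source) = 0
    loss = eikonal + boundary
    opt.zero_grad(); loss.backward(); opt.step()
```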

Latent Diffusion Model for DNA Sequence Generation

  • paper_url: http://arxiv.org/abs/2310.06150
  • repo_url: None
  • paper_authors: Zehui Li, Yuhao Ni, Tim August B. Huygelen, Akashaditya Das, Guoxuan Xia, Guy-Bart Stan, Yiren Zhao
  • for: 本研究旨在提出一种面向离散 DNA 序列生成的潜在扩散模型(DiscDiff)。
  • methods: 使用自编码器(autoencoder)将离散 DNA 序列嵌入到连续潜在空间中,从而利用连续扩散模型强大的生成能力来生成离散数据。
  • results: DiscDiff 模型能够生成在基序(motif)分布、潜在嵌入分布(FReD)和染色质特征方面与真实 DNA 高度一致的合成 DNA 序列。此外,本研究还提供了来自 15 个物种的 150000 条独特启动子-基因序列数据集,为未来基因组学中的生成建模提供了更多资源。
    Abstract The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models that are not burdened with these problems, enabling them to reach the state-of-the-art in domains such as image generation. In light of this, we propose a novel latent diffusion model, DiscDiff, tailored for discrete DNA sequence generation. By simply embedding discrete DNA sequences into a continuous latent space using an autoencoder, we are able to leverage the powerful generative abilities of continuous diffusion models for the generation of discrete data. Additionally, we introduce Fr\'echet Reconstruction Distance (FReD) as a new metric to measure the sample quality of DNA sequence generations. Our DiscDiff model demonstrates an ability to generate synthetic DNA sequences that align closely with real DNA in terms of Motif Distribution, Latent Embedding Distribution (FReD), and Chromatin Profiles. Additionally, we contribute a comprehensive cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics. We will make our code public upon publication.
    摘要 机器学习、尤其是深度生成模型的应用,为合成 DNA 序列生成领域开辟了有前景的方向。虽然生成对抗网络(GAN)在这一应用中取得了进展,但它们常常面临样本多样性有限和模式崩溃等问题。相比之下,扩散模型是一类新的生成模型,不受这些问题困扰,因此能在图像生成等领域达到最先进水平。在此背景下,我们提出了一个专门面向离散 DNA 序列生成的新型潜在扩散模型 DiscDiff。通过自编码器将离散 DNA 序列嵌入到连续潜在空间,我们得以利用连续扩散模型强大的生成能力来生成离散数据。此外,我们引入了 Fréchet 重建距离(FReD)作为衡量生成 DNA 序列质量的新指标。DiscDiff 生成的序列在基序分布、潜在嵌入分布(FReD)和染色质特征方面与真实 DNA 高度一致。我们还提供了由 15 个物种共 150000 条独特启动子-基因序列组成的跨物种数据集,丰富了未来基因组学生成建模的资源。论文发表后我们将公开代码。

On the Correlation between Random Variables and their Principal Components

  • paper_url: http://arxiv.org/abs/2310.06139
  • repo_url: None
  • paper_authors: Zenon Gniazdowski
  • for: 本研究旨在找到Random Variables之间的相関系数,并使用线性代数方法来描述这些相关系数。
  • methods: 本研究从与单个随机变量相关的统计量出发,使用向量和矩阵的概念以线性代数的语言重新表述这些统计量,从而在后续步骤中推导出预期的公式。
  • results: 研究发现,这个公式与因素分析中用来计算因素负载的公式相同。对于Principal Component Analysis中的主成分选择和因素分析中的因素数选择,这个公式也可以用来优化。
    Abstract The article attempts to find an algebraic formula describing the correlation coefficients between random variables and the principal components representing them. As a result of the analysis, starting from selected statistics relating to individual random variables, the equivalents of these statistics relating to a set of random variables were presented in the language of linear algebra, using the concepts of vector and matrix. This made it possible, in subsequent steps, to derive the expected formula. The formula found is identical to the formula used in Factor Analysis to calculate factor loadings. The discussion showed that it is possible to apply this formula to optimize the number of principal components in Principal Component Analysis, as well as to optimize the number of factors in Factor Analysis.
    摘要 本文尝试找到一个描述随机变量与其主成分之间相关系数的代数公式。分析从与单个随机变量相关的统计量出发,使用向量和矩阵等线性代数概念给出这些统计量在随机变量集合上的等价表述,从而在后续步骤中推导出预期的公式。所得公式与因子分析中计算因子负载的公式完全一致。讨论表明,该公式既可用于优化主成分分析中主成分的数量,也可用于优化因子分析中因子的数量。
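
The closed-form relationship described above, corr(X_i, PC_j) = v_{ij} * sqrt(lambda_j) / sigma_i (the factor-loading formula), is easy to check numerically. The snippet below compares it against brute-force correlations between the variables and the principal-component scores; the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))    # correlated variables
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)                       # ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]             # sort descending

# closed-form loadings: corr(X_i, PC_j) = v_ij * sqrt(lambda_j) / sigma_i
loadings = eigvec * np.sqrt(eigval) / X.std(axis=0, ddof=1)[:, None]

# brute-force check against empirical correlations with the PC scores
scores = Xc @ eigvec
empirical = np.array([[np.corrcoef(X[:, i], scores[:, j])[0, 1]
                       for j in range(4)] for i in range(4)])
print(np.allclose(loadings, empirical, atol=1e-8))         # expected: True
```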

Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach

  • paper_url: http://arxiv.org/abs/2310.06112
  • repo_url: https://github.com/fshp971/adv-ntk
  • paper_authors: Shaopeng Fu, Di Wang
  • for: 这篇论文主要是为了解释深度神经网络(DNN)中的对抗训练(AT)方法在Robustness方面的缺点。
  • methods: 该论文使用了神经积簇kernel(NTK)理论来扩展AT方法,并证明了一个攻击者训练的宽度DNN可以被近似为一个线性化DNN。
  • results: 该论文通过实验表明,使用Adv-NTK算法可以帮助无穷宽度DNN增强相对的Robustness,并且该结果证明了论文中的理论结论。
    Abstract Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies empirically demonstrated that it suffers from robust overfitting, i.e., a long time AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveals a new AT degeneration phenomenon: a long-term AT will result in a wide DNN degenerates to that obtained without AT and thus cause robust overfitting. Based on our theoretical results, we further design a method namely Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs enhance comparable robustness to that of their finite-width counterparts, which in turn justifies our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
    摘要 对抗训练(AT)是增强深度神经网络(DNN)鲁棒性的一种标准方法,但最近的研究表明,长时间的 AT 反而可能损害 DNN 的鲁棒性,即所谓的鲁棒过拟合。本文为 DNN 的鲁棒过拟合提供了理论解释。具体来说,我们将神经正切核(NTK)理论非平凡地推广到对抗训练,并证明经过对抗训练的宽 DNN 可以被一个线性化的 DNN 很好地近似。此外,对于平方损失,我们可以推导出线性化 DNN 的闭式 AT 动力学,这揭示了一种新的 AT 退化现象:长期的对抗训练会使宽 DNN 退化为未经对抗训练所得到的网络,从而导致鲁棒过拟合。基于上述理论结果,我们进一步设计了首个面向无限宽 DNN 的对抗训练算法 Adv-NTK。在真实数据集上的实验表明,Adv-NTK 能够帮助无限宽 DNN 获得与其有限宽对应模型相当的鲁棒性,从而印证了我们的理论结论。代码见 https://github.com/fshp971/adv-ntk 。

Grokking as the Transition from Lazy to Rich Training Dynamics

  • paper_url: http://arxiv.org/abs/2310.06110
  • repo_url: None
  • paper_authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
  • for: 该论文探讨了 Grokking 现象,即 neural network 的训练损失降低得比测试损失早得多,可能是由于 neural network 从懒散训练方式转移到了丰富的特征学习 régime。
  • methods: 作者通过使用普通的梯度下降法和二层神经网络在一个多项式回归问题上进行研究,发现 Grokking 现象不可能由现有理论解释。作者还提出了测试损失的充分统计,并在训练过程中跟踪这些统计,从而发现 Grokking 现象 arise 在神经网络首先尝试使用初始特征来适应kernel regression解决方案,然后在训练损失已经下降到低水平时发现一个泛化解决方案。
  • results: 作者发现 Grokking 现象的关键因素包括神经网络输出的速率(可以由输出参数控制)和初始特征与目标函数 $y(x)$ 的对齐度。当神经网络在初始特征学习 régime 中训练时,它会首先尝试适应kernel regression解决方案,然后在训练损失已经下降到低水平时发现一个泛化解决方案。此外,作者还发现这种延迟泛化 arise 在 dataset 大 enough,但不是太大,以致可以使神经网络泛化,但不是太早。
    Abstract We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhibits grokking without regularization in a way that cannot be explained by existing theories. We identify sufficient statistics for the test loss of such a network, and tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning where a generalizing solution is identified after train loss is already low. We find that the key determinants of grokking are the rate of feature learning -- which can be controlled precisely by parameters that scale the network output -- and the alignment of the initial features with the target function $y(x)$. We argue this delayed generalization arises when (1) the top eigenvectors of the initial neural tangent kernel and the task labels $y(x)$ are misaligned, but (2) the dataset size is large enough so that it is possible for the network to generalize eventually, but not so large that train loss perfectly tracks test loss at all epochs, and (3) the network begins training in the lazy regime so does not learn features immediately. We conclude with evidence that this transition from lazy (linear model) to rich training (feature learning) can control grokking in more general settings, like on MNIST, one-layer Transformers, and student-teacher networks.
    摘要 We found that the key determinants of grokking are the rate of feature learning, which can be controlled precisely by parameters that scale the network output, and the alignment of the initial features with the target function $y(x)$. We argue that delayed generalization arises when the top eigenvectors of the initial neural tangent kernel and the task labels $y(x)$ are misaligned, but the dataset size is large enough so that the network can generalize eventually, but not so large that the train loss perfectly tracks the test loss at all epochs. Additionally, the network begins training in the lazy regime, so it does not learn features immediately.We conclude that this transition from lazy (linear model) to rich training (feature learning) can control grokking in more general settings, such as on MNIST, one-layer Transformers, and student-teacher networks.
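
A toy illustration (not the paper's experiments) of the output-scaling device that controls how lazy or rich training is: the model is f(x) = alpha * (net(x) - net_0(x)) with the learning rate rescaled by 1/alpha^2, so large alpha keeps the hidden features close to initialization (kernel-like dynamics) while alpha ≈ 1 lets them move. All hyperparameters below are arbitrary.

```python
import numpy as np

def train_two_layer(alpha, X, y, width=64, steps=2000, lr=0.01, seed=0):
    """Two-layer ReLU net with the output-scaling trick f(x) = alpha*(net(x) - net_0(x)).
    Illustrative only; the paper's polynomial-regression setting differs in details."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], width)) / np.sqrt(X.shape[1])
    a = rng.normal(size=width) / np.sqrt(width)
    f0 = np.maximum(X @ W, 0.0) @ a          # frozen output at initialization
    lr_eff = lr / alpha ** 2                 # standard lazy-training rescaling
    for _ in range(steps):
        h = np.maximum(X @ W, 0.0)
        err = alpha * (h @ a - f0) - y
        grad_a = alpha * h.T @ err / len(y)
        grad_W = alpha * (X.T @ ((err[:, None] * a) * (h > 0))) / len(y)
        a -= lr_eff * grad_a
        W -= lr_eff * grad_W
    return W, a

X = np.random.default_rng(1).normal(size=(64, 1))
y = X[:, 0] ** 3                             # simple polynomial target
train_two_layer(alpha=100.0, X=X, y=y)       # stays close to its linearization (lazy)
train_two_layer(alpha=1.0, X=X, y=y)         # features move (rich regime)
```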

Quantifying Uncertainty in Deep Learning Classification with Noise in Discrete Inputs for Risk-Based Decision Making

  • paper_url: http://arxiv.org/abs/2310.06105
  • repo_url: None
  • paper_authors: Maryam Kheirandish, Shengfan Zhang, Donald G. Catanzaro, Valeriu Crudu
  • for: 这篇论文的目的是为了提供一个数据类型为数字和分类的问题上的深度神经网络模型中的预测不确定性评估方法。
  • methods: 这篇论文使用的方法是基于 Bayesian deep learning 的方法,具体是使用 Monte Carlo dropout 和我们的提议的框架来评估预测不确定性。
  • results: 这篇论文的结果显示,我们的提议的框架可以更好地识别预测中的错误 случарес,并且比 Monte Carlo dropout 方法更能捕捉错误的情况。
    Abstract The use of Deep Neural Network (DNN) models in risk-based decision-making has attracted extensive attention with broad applications in medical, finance, manufacturing, and quality control. To mitigate prediction-related risks in decision making, prediction confidence or uncertainty should be assessed alongside the overall performance of algorithms. Recent studies on Bayesian deep learning helps quantify prediction uncertainty arises from input noises and model parameters. However, the normality assumption of input noise in these models limits their applicability to problems involving categorical and discrete feature variables in tabular datasets. In this paper, we propose a mathematical framework to quantify prediction uncertainty for DNN models. The prediction uncertainty arises from errors in predictors that follow some known finite discrete distribution. We then conducted a case study using the framework to predict treatment outcome for tuberculosis patients during their course of treatment. The results demonstrate under a certain level of risk, we can identify risk-sensitive cases, which are prone to be misclassified due to error in predictors. Comparing to the Monte Carlo dropout method, our proposed framework is more aware of misclassification cases. Our proposed framework for uncertainty quantification in deep learning can support risk-based decision making in applications when discrete errors in predictors are present.
    摘要 使用深度神经网络(DNN)模型在风险基础的决策中吸引了广泛的关注,应用于医疗、金融、制造和质量控制等领域。为了减少决策过程中的预测风险,需要同时评估算法的总性表现和预测uncertainty。 latest studies on Bayesian deep learning 可以量化预测uncertainty,但这些模型假设输入噪声是Normal分布,这限制了它们在具有分类和离散特征变量的表格数据集中的应用。在这篇论文中,我们提出了一个数学框架,可以量化DNN模型中的预测uncertainty。预测uncertainty来自预测器中的错误,这些错误遵循一些已知的有限离散分布。我们Then conducted a case study using the framework to predict treatment outcome for tuberculosis patients during their course of treatment. The results show that under a certain level of risk, we can identify risk-sensitive cases, which are prone to be misclassified due to error in predictors. Comparing to the Monte Carlo dropout method, our proposed framework is more aware of misclassification cases. Our proposed framework for uncertainty quantification in deep learning can support risk-based decision making in applications when discrete errors in predictors are present.
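
A generic Monte Carlo sketch of the setting above: uncertainty enters through a known, finite discrete error distribution on particular input features, and the spread of the prediction under resampled perturbations is used as the uncertainty signal. The `error_dist` format and the toy model are assumptions, not the paper's exact framework.

```python
import numpy as np

def predictive_uncertainty(model_predict, x, error_dist, n_samples=500, rng=None):
    """Monte Carlo uncertainty from known discrete errors in the predictors.
    `error_dist` maps a feature index to (possible_offsets, probabilities)."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(n_samples):
        x_noisy = x.copy()
        for j, (offsets, probs) in error_dist.items():
            x_noisy[j] += rng.choice(offsets, p=probs)     # discrete perturbation
        preds.append(model_predict(x_noisy))
    preds = np.array(preds)
    return preds.mean(), preds.std()                        # predictive mean and spread

# toy usage with a fixed "model" and one error-prone discrete feature
model = lambda x: 1.0 / (1.0 + np.exp(-(x[0] - 0.5 * x[1])))
x = np.array([2.0, 1.0])
mean, std = predictive_uncertainty(model, x, {1: ([-1, 0, 1], [0.1, 0.8, 0.1])})
print(mean, std)
```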

Transformers and Large Language Models for Chemistry and Drug Discovery

  • paper_url: http://arxiv.org/abs/2310.06083
  • repo_url: None
  • paper_authors: Andres M Bran, Philippe Schwaller
  • for: 这篇论文旨在探讨如何使用Transformer架构解决化学发现过程中的重要瓶颈问题,如retrosynthetic planning和化学空间探索。
  • methods: 这篇论文使用了Transformer架构,并将其应用于不同类型的数据,如线性化分子图、spectra、synthesis actions和人工语言。
  • results: 这篇论文描述了一种新的方法,可以通过自然语言的灵活性,解决化学问题。这种方法可以在不同的化学应用中使用,并且可以在将来的科学发现中扮演一个更重要的角色。
    Abstract Language modeling has seen impressive progress over the last years, mainly prompted by the invention of the Transformer architecture, sparking a revolution in many fields of machine learning, with breakthroughs in chemistry and biology. In this chapter, we explore how analogies between chemical and natural language have inspired the use of Transformers to tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration. The revolution started with models able to perform particular tasks with a single type of data, like linearised molecular graphs, which then evolved to include other types of data, like spectra from analytical instruments, synthesis actions, and human language. A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry, all facilitated by the flexibility of natural language. As we continue to explore and harness these capabilities, we can look forward to a future where machine learning plays an even more integral role in accelerating scientific discovery.
    摘要 很多年来,语言模型在技术发展方面有了很大的进步,主要归功于Transformer架构的发明,这些架构的出现对机器学习多个领域产生了革命,包括化学和生物学。在这一章中,我们将探讨如何通过在化学和自然语言之间的相似性,使用Transformers来解决药物发现过程中的重要瓶颈,如逆 synthesis 规划和化学空间探索。这场革命从单一数据类型的模型开始,然后演进到包括其他数据类型,如分析器的spectra、合成操作和人类语言。现在,一新的趋势是利用大语言模型,让化学领域中的普遍任务得到解决,全部归功于自然语言的灵活性。我们继续探索和利用这些能力,未来machine learning在科学发现中的作用将变得更加重要。

Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

  • paper_url: http://arxiv.org/abs/2310.06081
  • repo_url: None
  • paper_authors: Aleksei Ustimenko, Aleksandr Beznosikov
  • for: 本文研究一种广泛和通用的马可夫链,即以爱因斯坦-玛丽亚偏抽象方式描述的某种随机 diffequation 的谱。
  • methods: 本文使用了 almost arbitrary 的各向异常和状态依赖的噪声,而不是通常使用的Normal和状态独立的噪声。此外,我们的链的涨落和扩散系数可以是不准确的,以涵盖广泛的应用,如某种 Stochastic Gradient Langevin Dynamics、sampling、Stochastic Gradient Descent 或 Stochastic Gradient Boosting。
  • results: 我们证明了 $W_{2}$-距离 между含义链和对应的随机 diffequation 的法律之间的上界。这些结果超越或覆盖了大多数已知的估计。此外,对某些特定情况,我们的分析是第一次。
    Abstract This work considers a rather general and broad class of Markov chains, Ito chains that look like Euler-Maryama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of normal and state-independent one, as in most related papers. Moreover, our chain's drift and diffusion coefficient can be inexact to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, or Stochastic Gradient Boosting. We prove an upper bound for $W_{2}$-distance between laws of the Ito chain and the corresponding Stochastic Differential Equation. These results improve or cover most of the known estimates. Moreover, for some particular cases, our analysis is the first.
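
For reference, the kind of chain studied here looks like an Euler-Maruyama step with (possibly inexact, state-dependent) drift and diffusion. The sketch below instantiates it with Gaussian noise and a quadratic potential, which recovers a Langevin/SGLD-style sampler; it is only a special case of the general chain analysed in the paper.

```python
import numpy as np

def ito_chain(x0, drift, diffusion, gamma=1e-2, steps=10_000, rng=None):
    """Euler-Maruyama-style Ito chain:
    x_{k+1} = x_k + gamma*b(x_k) + sqrt(gamma)*sigma(x_k) @ xi_k,
    where the noise xi_k need not be Gaussian in the general setting."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        xi = rng.standard_normal(x.shape)            # swap in non-Gaussian noise if desired
        x = x + gamma * drift(x) + np.sqrt(gamma) * diffusion(x) @ xi
        path.append(x.copy())
    return np.array(path)

# sample from a 2-D standard Gaussian: U(x) = ||x||^2/2, b(x) = -x, sigma = sqrt(2) I
path = ito_chain([3.0, -3.0], drift=lambda x: -x,
                 diffusion=lambda x: np.sqrt(2.0) * np.eye(2))
print(path[2000:].mean(axis=0), path[2000:].std(axis=0))  # roughly 0 mean, unit std
```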

Optimal Exploration is no harder than Thompson Sampling

  • paper_url: http://arxiv.org/abs/2310.06069
  • repo_url: None
  • paper_authors: Zhaoqi Li, Kevin Jamieson, Lalit Jain
  • for: This paper aims to solve the pure exploration linear bandit problem with high probability through noisy measurements of $x^{\top}\theta_{\ast}$.
  • methods: The paper proposes an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate with the optimal exponent among all possible allocations asymptotically.
  • results: The algorithm proposed in the paper can be easily implemented and performs as well empirically as existing asymptotically optimal methods.
    Abstract Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$, with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being the optimal among all possible allocations asymptotically. In addition, we show that our algorithm can be easily implemented and performs as well empirically as existing asymptotically optimal methods.
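
To show what "only a sampler and an argmax oracle" means in practice, here is a Thompson-sampling-style stand-in built from exactly those two primitives (posterior sampling under a Gaussian prior, plus an argmax over the arm set). The paper's algorithm uses a more careful allocation rule to achieve the optimal exponent; this sketch only illustrates the interface, and all constants are assumptions.

```python
import numpy as np

def explore_with_oracles(arms, theta_star, rounds=2000, sigma=1.0, rng=None):
    """Pure-exploration sketch using only posterior sampling and argmax oracles."""
    rng = rng or np.random.default_rng(0)
    d = arms.shape[1]
    precision = np.eye(d)                                   # prior N(0, I), known noise sigma
    b = np.zeros(d)
    for _ in range(rounds):
        cov = np.linalg.inv(precision)
        theta_sample = rng.multivariate_normal(cov @ b, cov)  # posterior sampling oracle
        x = arms[np.argmax(arms @ theta_sample)]              # argmax oracle
        y = x @ theta_star + sigma * rng.standard_normal()    # noisy measurement
        precision += np.outer(x, x) / sigma**2
        b += y * x / sigma**2
    mean = np.linalg.inv(precision) @ b
    return arms[np.argmax(arms @ mean)]                       # recommended best arm

arms = np.random.default_rng(1).normal(size=(20, 5))
theta_star = np.array([1.0, 0.5, 0.0, -0.5, -1.0])
print(explore_with_oracles(arms, theta_star))
```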

Early Warning via tipping-preserving latent stochastic dynamical system and meta label correcting

  • paper_url: http://arxiv.org/abs/2310.06059
  • repo_url: None
  • paper_authors: Peng Zhang, Ting Gao, Jin Guo, Jinqiao Duan
  • for: 预测癫痫患者的发作,以提高其安全性和健康状况。
  • methods: 基于患者的EEG数据,提出了一种基于meta学习框架的预测方法,利用了meta标签修正方法,并通过优化 latent Stochastic differential equation(SDE) 中的信息,选择最佳的 latent 动力系统。
  • results: 实验验证了我们的方法,发现预测精度有出人意料的提升。
    Abstract Early warning for epilepsy patients is crucial for their safety and well-being, in terms of preventing or minimizing the severity of seizures. Through the patients' EEG data, we propose a meta learning framework for improving prediction on early ictal signals. To better utilize the meta label corrector method, we fuse the information from both the real data and the augmented data from the latent Stochastic differential equation(SDE). Besides, we also optimally select the latent dynamical system via distribution of transition time between real data and that from the latent SDE. In this way, the extracted tipping dynamical feature is also integrated into the meta network to better label the noisy data. To validate our method, LSTM is implemented as the baseline model. We conduct a series of experiments to predict seizure in various long-term window from 1-2 seconds input data and find surprisingly increment of prediction accuracy.
    摘要 对癫痫患者进行早期预警对其安全和健康至关重要,可以预防或减轻癫痫发作的严重程度。基于患者的 EEG 数据,我们提出一种元学习框架,以提高对早期发作信号的预测精度。为了更好地利用元标签校正方法,我们将真实数据与由潜在随机微分方程(SDE)增广得到的数据的信息进行融合。此外,我们通过比较真实数据与潜在 SDE 数据之间的跃迁时间分布,优化潜在动力系统的选择。由此提取的临界转变(tipping)动力特征也被整合进元网络,以便更好地标注含噪数据。为验证方法,我们以 LSTM 作为基线模型,并在 1-2 秒输入数据上对不同长度时间窗内的发作进行预测,发现预测精度有显著提升。

Knowledge Distillation for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.06047
  • repo_url: https://github.com/HibikiJie/Multiresolution-Knowledge-Distillation-for-Anomaly-Detection
  • paper_authors: Adrian Alan Pol, Ekaterina Govorkova, Sonja Gronroos, Nadezda Chernyavskaya, Philip Harris, Maurizio Pierini, Isobel Ojalvo, Peter Elmer
  • for: 用于压缩无监督深度学习模型,以便在有限资源的设备上部署。
  • methods: 使用知识储存法压缩无监督异常检测模型,并提出一些改进检测敏感度的技巧。
  • results: 压缩模型与原始模型的性能相似,而减少大小和内存占用。
    Abstract Unsupervised deep learning techniques are widely used to identify anomalous behaviour. The performance of such methods is a product of the amount of training data and the model size. However, the size is often a limiting factor for the deployment on resource-constrained devices. We present a novel procedure based on knowledge distillation for compressing an unsupervised anomaly detection model into a supervised deployable one and we suggest a set of techniques to improve the detection sensitivity. Compressed models perform comparably to their larger counterparts while significantly reducing the size and memory footprint.
    摘要 无监督深度学习技术被广泛用于识别异常行为,其性能取决于训练数据量和模型规模。然而,模型规模往往是在资源受限设备上部署的限制因素。我们提出了一种基于知识蒸馏的新方法,可将无监督异常检测模型压缩为可部署的有监督模型,并提出了一些提高检测灵敏度的技巧。压缩后的模型性能与更大的原模型相当,同时显著减小了模型体积和内存占用。
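
A compact sketch of the distillation recipe: an unsupervised teacher produces anomaly scores on unlabeled data, and a small supervised student is trained to regress those scores so it can be deployed cheaply. The PCA-reconstruction teacher and the tiny MLP student below are stand-ins for the deep models used in the paper; all sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 20))                   # unlabeled "normal" data

# Teacher: a stand-in unsupervised detector scoring samples by reconstruction error.
teacher = PCA(n_components=5).fit(X_train)
def teacher_score(X):
    recon = teacher.inverse_transform(teacher.transform(X))
    return ((X - recon) ** 2).mean(axis=1)

# Student: a small supervised regressor distilled to mimic the teacher's scores.
student = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
student.fit(X_train, teacher_score(X_train))

X_test = np.vstack([rng.normal(size=(100, 20)),         # normal
                    rng.normal(loc=3.0, size=(5, 20))]) # anomalous
print(np.corrcoef(teacher_score(X_test), student.predict(X_test))[0, 1])
```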

Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions

  • paper_url: http://arxiv.org/abs/2310.05921
  • repo_url: None
  • paper_authors: Jordan Lekeufack, Anastasios N. Angelopoulos, Andrea Bajcsy, Michael I. Jordan, Jitendra Malik
  • for: 该论文旨在提供一种安全的自动决策框架,即使机器学习预测不准确。
  • methods: 该论文使用了准确预测理论,无需假设世界模型。
  • results: 实验表明,该方法在机器人运动规划、自动股票交易和机器人生产中具有实用性。
    Abstract We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions. Examples of such decisions are ubiquitous, from robot planning algorithms that rely on pedestrian predictions, to calibrating autonomous manufacturing to exhibit high throughput and low error, to the choice of trusting a nominal policy versus switching to a safe backup policy at run-time. The decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk without any assumptions on the world model whatsoever; the observations need not be I.I.D. and can even be adversarial. The theory extends results from conformal prediction to calibrate decisions directly, without requiring the construction of prediction sets. Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing.
    摘要 我们介绍了对准决策理论,一个框架用于生成安全的自动决策,即使机器学习预测不完美。这些决策的例子非常普遍,包括 robot 观察算法依赖人类预测,将自动生产调整为具有高速和低错误,以及在执行时是否信任主要政策或者转折到安全备用政策。我们的算法生成的决策是安全的,即具有可证的Statistical guarantee of low risk,不需要世界模型的任何假设,观察不必I.I.D.,甚至可以是反对的。我们的理论扩展了对准预测的结果,直接将决策calibrate,不需要建立预测集。实验展示了我们的方法在人类附近的机器人运动规划、自动股票交易和机器人生产中的 utility。
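
A rough sketch of the online calibration loop implied by this framework: a conservativeness parameter lambda is nudged each round so that the realized risk tracks a target level epsilon, without any model of the world. The update direction assumes per-round losses are non-increasing in lambda; the constants and the toy loss stream are illustrative, not the paper's construction.

```python
import numpy as np

def conformal_controller(losses, epsilon=0.1, eta=0.05, lam0=1.0):
    """lambda_t parameterizes how conservative the decision is (larger = safer) and
    is updated so the running risk tracks the target epsilon."""
    lam = lam0
    history = []
    for loss_fn in losses:                   # each element maps lambda -> observed loss in [0, 1]
        loss_t = loss_fn(lam)
        lam = lam + eta * (loss_t - epsilon) # too much loss -> become more conservative
        history.append((lam, loss_t))
    return history

# toy stream: the chance of incurring a loss shrinks as lambda grows
rng = np.random.default_rng(0)
stream = [lambda lam, u=rng.uniform(): float(u < np.exp(-lam)) for _ in range(1000)]
hist = conformal_controller(stream)
print(np.mean([l for _, l in hist]))         # average risk should be near epsilon = 0.1
```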

Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network

  • paper_url: http://arxiv.org/abs/2310.05900
  • repo_url: None
  • paper_authors: Johannes Bausch, Andrew W Senior, Francisco J H Heras, Thomas Edlich, Alex Davies, Michael Newman, Cody Jones, Kevin Satzinger, Murphy Yuezhen Niu, Sam Blackwell, George Holland, Dvir Kafri, Juan Atalaya, Craig Gidney, Demis Hassabis, Sergio Boixo, Hartmut Neven, Pushmeet Kohli
  • for: 这个论文的目的是提高量子计算的可靠性,通过使用机器学习来解码量子错误 correction 代码。
  • methods: 这个论文使用了循环、变换器基本的神经网络,通过直接学习数据来解码表面码。
  • results: 论文的解码器在实际数据上(Google Sycamore 量子处理器)以及模拟数据上(包括干扰和误差)都有优异表现,可以覆盖距离3和5表面码,并且在训练时间25个循环后仍保持高准确率。
    Abstract Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On distances up to 11, the decoder maintains its advantage on simulated data with realistic noise including cross-talk, leakage, and analog readout signals, and sustains its accuracy far beyond the 25 cycles it was trained on. Our work illustrates the ability of machine learning to go beyond human-designed algorithms by learning from data directly, highlighting machine learning as a strong contender for decoding in quantum computers.
    摘要 量子错误纠正是可靠量子计算的必要前提。为达到这个目标,我们提出了一种循环、转换器基于神经网络,可以学习解码表面码,这是量子错误纠正代码的领先代码。我们的解码器在Google的Sycamore量子处理器上的真实数据上表现出优于当前最佳算法解码器,在距离3和5表面码上出现了优异表现。在距离11上,我们的解码器在实际噪音,包括交叠、泄漏和分析读取信号的 simulate 数据上维持了其优势,并保持了其精度远远超出了它被训练的25次。我们的工作表明了机器学习可以超越人类设计的算法,通过直接学习数据,机器学习成为量子计算中的强有力竞争者。

A Generalization Bound of Deep Neural Networks for Dependent Data

  • paper_url: http://arxiv.org/abs/2310.05892
  • repo_url: https://github.com/umd-huang-lab/neural-net-generalization-via-tensor
  • paper_authors: Quan Huu Do, Binh T. Nguyen, Lam Si Tung Ho
  • for: 为非平稳 $\phi$-混合数据建立前馈神经网络的泛化界。
  • methods: 在不要求数据独立同分布(iid)的情形下,针对前馈神经网络推导新的泛化界。
  • results: 建立了适用于非平稳 $\phi$-混合数据的前馈神经网络泛化界,摆脱了现有结果对 iid 假设的依赖。
    Abstract Existing generalization bounds for deep neural networks require data to be independent and identically distributed (iid). This assumption may not hold in real-life applications such as evolutionary biology, infectious disease epidemiology, and stock price prediction. This work establishes a generalization bound of feed-forward neural networks for non-stationary $\phi$-mixing data.
    摘要 现有的深度神经网络泛化界要求数据必须独立同分布(iid)。这一假设在演化生物学、传染病流行病学和股票价格预测等实际应用中可能并不成立。本工作为非平稳 $\phi$-混合数据建立了前馈神经网络的泛化界。

A Machine Learning Approach to Predicting Single Event Upsets

  • paper_url: http://arxiv.org/abs/2310.05878
  • repo_url: https://github.com/architg1/CREMER
  • paper_authors: Archit Gupta, Chong Yock Eng, Deon Lim Meng Wee, Rashna Analia Ahmed, See Min Sim
  • for: 预测单个事件异常 (SEU) 的发生,以提高半导体设备的可靠性。
  • methods: 使用机器学习技术,只使用位置数据预测 SEU 发生。
  • results: 提高半导体设备的可靠性,创造更安全的数字环境。
    Abstract A single event upset (SEU) is a critical soft error that occurs in semiconductor devices on exposure to ionising particles from space environments. SEUs cause bit flips in the memory component of semiconductors. This creates a multitude of safety hazards as stored information becomes less reliable. Currently, SEUs are only detected several hours after their occurrence. CREMER, the model presented in this paper, predicts SEUs in advance using machine learning. CREMER uses only positional data to predict SEU occurrence, making it robust, inexpensive and scalable. Upon implementation, the improved reliability of memory devices will create a digitally safer environment onboard space vehicles.
    摘要 一个单一事件冲击(SEU)是半导体设备中critical soft error的一种重要问题,它由宇宙射线粒子引起,导致内存组件中的比特跳变。这会导致存储的信息变得更加不可靠,带来多种安全风险。目前,SEU的发生只能在several hours后被探测出来。本文中提出的CREMER模型使用机器学习技术预测SEU发生,只使用位置数据,因此具有robust、便宜和可扩展的特点。在实施后,内存设备的可靠性会得到改善,从而在空间 vehicles上创造出一个更加数字安全的环境。

Bio-inspired computational memory model of the Hippocampus: an approach to a neuromorphic spike-based Content-Addressable Memory

  • paper_url: http://arxiv.org/abs/2310.05868
  • repo_url: None
  • paper_authors: Daniel Casanueva-Morato, Alvaro Ayuso-Martinez, Juan P. Dominguez-Morales, Angel Jimenez-Fernandez, Gabriel Jimenez-Moreno
  • for: 这篇论文目的是开发一种基于海马 CA3 区域的生物体现学习系统,能够学习、忘记和回忆非正式的记忆 fragment。
  • methods: 该模型使用脉冲神经网络(SNN)和SpiNNaker 硬件平台实现,并进行了功能、压力和实用性测试。
  • results: 该模型可以学习、忘记和回忆非正式的记忆 fragment,并且在不同的压力和环境下能够正常工作。这是首次实现了一个完全可工作的生物体现学习系统,将为未来的更复杂的neuromorphic系统开拓新的可能性。
    Abstract The brain has computational capabilities that surpass those of modern systems, being able to solve complex problems efficiently in a simple way. Neuromorphic engineering aims to mimic biology in order to develop new systems capable of incorporating such capabilities. Bio-inspired learning systems continue to be a challenge that must be solved, and much work needs to be done in this regard. Among all brain regions, the hippocampus stands out as an autoassociative short-term memory with the capacity to learn and recall memories from any fragment of them. These characteristics make the hippocampus an ideal candidate for developing bio-inspired learning systems that, in addition, resemble content-addressable memories. Therefore, in this work we propose a bio-inspired spiking content-addressable memory model based on the CA3 region of the hippocampus with the ability to learn, forget and recall memories, both orthogonal and non-orthogonal, from any fragment of them. The model was implemented on the SpiNNaker hardware platform using Spiking Neural Networks. A set of experiments based on functional, stress and applicability tests were performed to demonstrate its correct functioning. This work presents the first hardware implementation of a fully-functional bio-inspired spiking hippocampal content-addressable memory model, paving the way for the development of future more complex neuromorphic systems.
    摘要 大脑的计算能力超越现代系统,能够以简单的方式高效地解决复杂问题。神经形态工程试图模仿生物机制,以开发具备这类能力的新系统,而受生物启发的学习系统仍是一个有待解决的挑战。在大脑的各个区域中,海马体是一种自联想短期记忆,能够从记忆的任意片段学习并回忆完整记忆。这些特性使海马体成为开发类内容寻址记忆的生物启发式学习系统的理想候选。因此,本工作提出了一种基于海马体 CA3 区域的生物启发式脉冲内容寻址记忆模型,能够从任意片段学习、遗忘并回忆正交与非正交的记忆。该模型在 SpiNNaker 硬件平台上用脉冲神经网络实现,并通过功能、压力和适用性测试验证了其正确运行。这项工作展示了首个完全可用的生物启发式脉冲海马体内容寻址记忆模型的硬件实现,为未来更复杂的神经形态系统铺平了道路。

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

  • paper_url: http://arxiv.org/abs/2310.05858
  • repo_url: https://github.com/jingliang-duan/dsac-t
  • paper_authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li
  • for: 提高模型自适应RL方法的性能,解决常见的过估问题。
  • methods: 使用分布式软actor-critic算法(DSAC),并进行了三种改进:批处理梯度调整、双值分布学习和 variance-based target return clipping。
  • results: 在多种环境中,DSAC-T超过了多种主流模型自适应RL算法,包括SAC、TD3、DDPG、TRPO和PPO,而且保证了高稳定性的学习过程和不同奖励缩放下的相似性。
    Abstract Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and needs for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of critic gradient adjusting, twin value distribution learning, and variance-based target return clipping. The modified RL algorithm is named as DSAC with three refinements (DSAC-T or DSAC-v2), and its performances are systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
    摘要 “强化学习(RL)已经证明可以很好地解决复杂的决策和控制任务。然而,广泛使用的无策法RL方法经常会遭遇估计问题,导致性能下降。为了解决这个问题,我们最近提出了一种偏离策略RL算法,称为分布型软actor-批评(DSAC或DSAC-v1),可以有效地提高价值估计准确性。然而,标准DSAC有一些缺点,包括 occasionally 不稳定的学习过程和需要任务特定的奖励滤波,这可能会限制其总体性能和适应性。这篇文章进一步介绍了三种重要的DSAC改进,包括评价函数梯度调整、双值分布学习和归一化目标返回截卷。改进后的RL算法被称为DSAC-T或DSAC-v2,其性能在多种环境中进行系统性评估。无需任务特定的超参数调整,DSAC-T比许多主流无策法RL算法,包括SAC、TD3、DDPG、TRPO和PPO,在所有测试环境中表现出色。此外,DSAC-T不同于标准版本,可以保证学习过程非常稳定,并在不同的奖励档次下提供相似的性能。”
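
Of the three refinements, variance-based target return clipping is the simplest to write down: sampled target returns are clamped to a band around the current value distribution's mean, with the width set by its standard deviation. The helper below is an illustrative version; where the statistics come from and the exact constants follow the paper, not this sketch.

```python
import torch

def clip_target_returns(target_returns, value_mean, value_std, k=3.0):
    """Clamp sampled target returns to [mean - k*std, mean + k*std] of the
    current value distribution (k = 3 is an illustrative choice)."""
    lower = value_mean - k * value_std
    upper = value_mean + k * value_std
    return torch.maximum(torch.minimum(target_returns, upper), lower)

# toy usage
targets = torch.tensor([-5.0, 0.2, 0.9, 7.5])
print(clip_target_returns(targets, value_mean=torch.tensor(0.5), value_std=torch.tensor(1.0)))
```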

Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

  • paper_url: http://arxiv.org/abs/2310.19807
  • repo_url: None
  • paper_authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal
  • for: 这 paper 旨在解决 Federated reinforcement learning (FedRL) 中高度通信开销的问题,尤其是在 natural policy gradient (NPG) 方法中,以提高training效率。
  • methods: 该 paper 提出了 FedNPG-ADMM 框架,通过 alternating direction method of multipliers (ADMM) 方法来近似全局 NPG 方向,从而提高了training efficiency。
  • results: 该 paper theoretically 表明,使用 ADMM-based gradient updates 可以将 communication complexity 降低至 ${O}({d})$,其中 $d$ 是模型参数的数量。此外,paper 还证明了 FedNPG-ADMM 可以保持和标准 FedNPG 相同的 convergence rate。通过在 MuJoCo 环境中评估该 algorithm,paper 还证明了 FedNPG-ADMM 可以保持 reward performance,并且当Agent 数量增加时,其 convergence rate 会提高。
    Abstract Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from ${O}(d^{2})$ to ${O}(d)$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving an $\epsilon$-error stationary convergence requires ${O}(\frac{1}{(1-\gamma)^{2}\epsilon})$ iterations for discount factor $\gamma$, demonstrating that FedNPG-ADMM maintains the same convergence rate as the standard FedNPG. Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases.
    摘要 联邦强化学习(FedRL)允许多个代理在不共享各自数据的情况下协同训练全局策略。然而,高昂的通信开销仍是关键瓶颈,对于二阶的自然策略梯度(NPG)方法尤其如此。为解决这一问题,我们提出了 FedNPG-ADMM 框架,利用交替方向乘子法(ADMM)高效地近似全局 NPG 方向。我们从理论上证明,基于 ADMM 的梯度更新可将每轮迭代的通信复杂度从 $O(d^{2})$ 降低到 $O(d)$,其中 $d$ 为模型参数数量。此外,我们证明在折扣因子为 $\gamma$ 时,达到 $\epsilon$-误差的稳定点收敛需要 ${O}(\frac{1}{(1-\gamma)^{2}\epsilon})$ 次迭代,即 FedNPG-ADMM 保持了与标准 FedNPG 相同的收敛速率。通过在 MuJoCo 环境中的评估,我们表明 FedNPG-ADMM 能保持标准 FedNPG 的奖励性能,且当联邦代理数量增加时其收敛速度会进一步提升。

Robust Angular Synchronization via Directed Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05842
  • repo_url: None
  • paper_authors: Yixuan He, Gesine Reinert, David Wipf, Mihai Cucuringu
  • for: angular synchronization problem and its heterogeneous extension (sensor network localization, phase retrieval, and distributed clock synchronization)
  • methods: directed graph neural networks and new loss functions
  • results: competitive and often superior performance against a comprehensive set of baselines, validating the robustness of GNNSync even at high noise levels.
    Abstract The angular synchronization problem aims to accurately estimate (up to a constant additive phase) a set of unknown angles $\theta_1, \dots, \theta_n\in[0, 2\pi)$ from $m$ noisy measurements of their offsets $\theta_i-\theta_j \;\mbox{mod} \; 2\pi.$ Applications include, for example, sensor network localization, phase retrieval, and distributed clock synchronization. An extension of the problem to the heterogeneous setting (dubbed $k$-synchronization) is to estimate $k$ groups of angles simultaneously, given noisy observations (with unknown group assignment) from each group. Existing methods for angular synchronization usually perform poorly in high-noise regimes, which are common in applications. In this paper, we leverage neural networks for the angular synchronization problem, and its heterogeneous extension, by proposing GNNSync, a theoretically-grounded end-to-end trainable framework using directed graph neural networks. In addition, new loss functions are devised to encode synchronization objectives. Experimental results on extensive data sets demonstrate that GNNSync attains competitive, and often superior, performance against a comprehensive set of baselines for the angular synchronization problem and its extension, validating the robustness of GNNSync even at high noise levels.
    摘要 “angular synchronization problem”targets to accurately estimate(up to a constant additive phase)a set of unknown angles $\theta_1, \dots, \theta_n\in[0, 2\pi)$ from $m$ noisy measurements of their offsets $\theta_i-\theta_j \;\mbox{mod} \; 2\pi$. Applications include sensor network localization, phase retrieval, and distributed clock synchronization. An extension of the problem to the heterogeneous setting(dubbed $k$-synchronization)is to estimate $k$ groups of angles simultaneously, given noisy observations(with unknown group assignment)from each group. Existing methods for angular synchronization usually perform poorly in high-noise regimes, which are common in applications. In this paper, we leverage neural networks for the angular synchronization problem and its heterogeneous extension by proposing GNNSync, a theoretically-grounded end-to-end trainable framework using directed graph neural networks. In addition, new loss functions are devised to encode synchronization objectives. Experimental results on extensive data sets demonstrate that GNNSync attains competitive, and often superior, performance against a comprehensive set of baselines for the angular synchronization problem and its extension, validating the robustness of GNNSync even at high noise levels.

A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models

  • paper_url: http://arxiv.org/abs/2310.05833
  • repo_url: None
  • paper_authors: Sebastian G. Gruber, Florian Buettner
  • for: 这篇论文的目的是为了提供一种评估生成模型的泛化性和不确定性的理论框架。
  • methods: 该论文使用了对kernel scores的偏差-弹性-协方差分解,并提出了不偏的和一致的估计器,只需要生成的样本而不需要下游模型。
  • results: 该论文的应用是评估扩散模型的泛化评估和发现了少数群体的极化现象,以及验证了干扰和预测卷积 entropy 作为生成模型的不确定性度量。
    Abstract Generative models, like large language models, are becoming increasingly relevant in our daily lives, yet a theoretical framework to assess their generalization behavior and uncertainty does not exist. Particularly, the problem of uncertainty estimation is commonly solved in an ad-hoc manner and task dependent. For example, natural language approaches cannot be transferred to image generation. In this paper we introduce the first bias-variance-covariance decomposition for kernel scores and their associated entropy. We propose unbiased and consistent estimators for each quantity which only require generated samples but not the underlying model itself. As an application, we offer a generalization evaluation of diffusion models and discover how mode collapse of minority groups is a contrary phenomenon to overfitting. Further, we demonstrate that variance and predictive kernel entropy are viable measures of uncertainty for image, audio, and language generation. Specifically, our approach for uncertainty estimation is more predictive of performance on CoQA and TriviaQA question answering datasets than existing baselines and can also be applied to closed-source models.
    摘要 大量语言模型在我们日常生活中变得越来越重要,然而一个有效的理论框架来评估它们的泛化行为和不确定性并没有出现。特别是不确定性估计问题通常采用做出的方式和任务相关。例如,自然语言方法无法被转移到图像生成。在这篇论文中,我们介绍了首个偏差-变量- covariance 分解 для核分数和它们相关的熵。我们提议不偏和一致的估计器,只需要生成的样本而不需要下游模型本身。作为应用,我们对扩散模型进行总体评估,发现扩散的小组聚合是对权重过拟合的反应。此外,我们发现了变量和预测核熵是图像、音频和语言生成中的不确定性度量。 Specifically,我们的不确定性估计方法在CoQA和TriviaQA问答数据集上的性能预测比现有基elines高,并且可以应用于关闭源模型。
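
One ingredient behind kernel-score-based uncertainty is estimable from generated samples alone: the off-diagonal mean of the kernel Gram matrix is an unbiased estimate of E[k(X, X')] for two independent generations, and it rises as generations become less diverse. The snippet below computes only this ingredient with an assumed RBF kernel; the paper's full bias-variance-covariance decomposition and its estimators contain additional terms.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def expected_self_similarity(samples):
    """Unbiased estimate of E[k(X, X')] for independent generations X, X',
    using only generated samples (off-diagonal mean of the Gram matrix)."""
    K = rbf_kernel(samples, samples)
    n = len(samples)
    return (K.sum() - np.trace(K)) / (n * (n - 1))

rng = np.random.default_rng(0)
confident = rng.normal(scale=0.1, size=(200, 4))   # low-diversity generations
uncertain = rng.normal(scale=2.0, size=(200, 4))   # high-diversity generations
print(expected_self_similarity(confident), expected_self_similarity(uncertain))
```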

Pre-trained Spatial Priors on Multichannel NMF for Music Source Separation

  • paper_url: http://arxiv.org/abs/2310.05821
  • repo_url: None
  • paper_authors: Pablo Cabanas-Molero, Antonio J. Munoz-Montoro, Julio Carabias-Orti, Pedro Vera-Candeas
  • for: 这个论文提出了一种基于录音设置信息的声音来源分离方法,可以应用于现有的室内乐录音设置。
  • methods: 该方法使用 solo 段来训练空间混合筛选器,以捕捉室内回声和扬声器响应的信息。然后将这个预训练过的筛选器integrated into a multichannel non-negative matrix factorization 方法,以更好地捕捉不同声音来源的方差。
  • results: 实验表明,该提出的框架可以更好地分离声音来源,比传统的 MNMF 方法提高性能。
    Abstract This paper presents a novel approach to sound source separation that leverages spatial information obtained during the recording setup. Our method trains a spatial mixing filter using solo passages to capture information about the room impulse response and transducer response at each sensor location. This pre-trained filter is then integrated into a multichannel non-negative matrix factorization (MNMF) scheme to better capture the variances of different sound sources. The recording setup used in our experiments is the typical setup for orchestra recordings, with a main microphone and a close "cardioid" or "supercardioid" microphone for each section of the orchestra. This makes the proposed method applicable to many existing recordings. Experiments on polyphonic ensembles demonstrate the effectiveness of the proposed framework in separating individual sound sources, improving performance compared to conventional MNMF methods.
    摘要 这篇论文提出了一种利用录音布置中空间信息的新型声源分离方法。该方法使用独奏段落训练空间混合滤波器,以捕捉每个传感器位置上的房间冲激响应和换能器响应信息。预训练得到的滤波器随后被整合进多通道非负矩阵分解(MNMF)方案,以更好地刻画不同声源的方差。实验采用管弦乐录音的典型布置,即一个主传声器加上每个声部一个近距离的"心形"或"超心形"指向性传声器,这使得该方法适用于许多现有录音。在复调合奏上的实验表明,所提框架能够更有效地分离各个声源,性能优于传统 MNMF 方法。

Sharing Information Between Machine Tools to Improve Surface Finish Forecasting

  • paper_url: http://arxiv.org/abs/2310.05807
  • repo_url: None
  • paper_authors: Daniel R. Clarkson, Lawrence A. Bull, Tina A. Dardeno, Chandula T. Wickramarachchi, Elizabeth J. Cross, Timothy J. Rogers, Keith Worden, Nikolaos Dervilis, Aidan J. Hughes
  • for: 预测机器制造过程中表面质量
  • methods: bayesian hierarchical model、bayesian linear regression
  • results: 提高预测精度和不确定性评估
    Abstract At present, most surface-quality prediction methods can only perform single-task prediction which results in under-utilised datasets, repetitive work and increased experimental costs. To counter this, the authors propose a Bayesian hierarchical model to predict surface-roughness measurements for a turning machining process. The hierarchical model is compared to multiple independent Bayesian linear regression models to showcase the benefits of partial pooling in a machining setting with respect to prediction accuracy and uncertainty quantification.
    摘要 当前,大多数表面质量预测方法只能进行单任务预测,这会导致数据集利用不足、重复劳动和实验成本增加。为了解决这一问题,作者提出了一种贝叶斯分层模型,用于预测车削加工过程中的表面粗糙度测量值。该分层模型与多个相互独立的贝叶斯线性回归模型进行了比较,以展示部分汇集(partial pooling)在机加工场景中在预测精度和不确定性量化方面的优势。

Boosted Control Functions

  • paper_url: http://arxiv.org/abs/2310.05805
  • repo_url: https://github.com/zszszszsz/.config
  • paper_authors: Nicola Gnecco, Jonas Peters, Sebastian Engelke, Niklas Pfister
  • for: 这篇研究旨在bridging the gap between existing prediction methods and the presence of hidden confounding, especially when the training and testing data are different.
  • methods: 本研究使用了distribution generalization from machine learning和simultaneous equation models and control function from econometrics,并提出了一新的同时方程模型(SIMDG)来描述资料生成过程下的分布差异。
  • results: 研究发现了一个强制条件(boosted control function,BCF),可以在不同的训练和测试数据下预测成功,并且提供了必要和充分的条件来识别BCF。
    Abstract Modern machine learning methods and the availability of large-scale data opened the door to accurately predict target quantities from large sets of covariates. However, existing prediction methods can perform poorly when the training and testing data are different, especially in the presence of hidden confounding. While hidden confounding is well studied for causal effect estimation (e.g., instrumental variables), this is not the case for prediction tasks. This work aims to bridge this gap by addressing predictions under different training and testing distributions in the presence of unobserved confounding. In particular, we establish a novel connection between the field of distribution generalization from machine learning, and simultaneous equation models and control function from econometrics. Central to our contribution are simultaneous equation models for distribution generalization (SIMDGs) which describe the data-generating process under a set of distributional shifts. Within this framework, we propose a strong notion of invariance for a predictive model and compare it with existing (weaker) versions. Building on the control function approach from instrumental variable regression, we propose the boosted control function (BCF) as a target of inference and prove its ability to successfully predict even in intervened versions of the underlying SIMDG. We provide necessary and sufficient conditions for identifying the BCF and show that it is worst-case optimal. We introduce the ControlTwicing algorithm to estimate the BCF and analyze its predictive performance on simulated and real world data.
    摘要 现代机器学习方法与大规模数据的可用性,为从大量协变量中准确预测目标量打开了大门。然而,当训练数据与测试数据分布不同、尤其是存在隐藏混杂(hidden confounding)时,现有预测方法可能表现不佳。隐藏混杂在因果效应估计中已得到广泛研究(例如工具变量方法),但在预测任务中却并非如此。本研究旨在弥合这一差距,处理在存在未观测混杂的情况下、训练与测试分布不同时的预测问题。特别地,我们在机器学习中的分布泛化与计量经济学中的联立方程模型及控制函数之间建立了一种新的联系。我们贡献的核心是用于分布泛化的联立方程模型(SIMDGs),它刻画了在一组分布偏移下的数据生成过程。在此框架下,我们为预测模型提出了一种强不变性概念,并与现有的较弱版本进行比较。基于工具变量回归中的控制函数方法,我们提出了增强控制函数(BCF)作为推断目标,并证明其即使在底层 SIMDG 被干预的情形下也能成功预测。我们给出了识别 BCF 的充分必要条件,并证明其在最坏情况下是最优的。最后,我们提出了 ControlTwicing 算法来估计 BCF,并在模拟数据和真实数据上分析了其预测性能。
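
For readers unfamiliar with control functions, here is the classical two-stage version (a textbook sketch, not the paper's BCF estimator): regress the endogenous regressor on the instrument, and feed the first-stage residual into a boosted second-stage regression as an extra feature. The data-generating process and all constants below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 1))                        # instrument
H = rng.normal(size=n)                             # hidden confounder
X = 0.8 * Z[:, 0] + H + 0.3 * rng.normal(size=n)   # endogenous regressor
Y = np.sin(X) + 2.0 * H + 0.3 * rng.normal(size=n) # outcome, confounded by H

first_stage = LinearRegression().fit(Z, X)
V = X - first_stage.predict(Z)                     # control function (first-stage residual)

features = np.column_stack([X, V])                 # add the control as an extra feature
second_stage = GradientBoostingRegressor(random_state=0).fit(features, Y)

# Prediction under a shifted test distribution would reuse the same fitted pieces.
print(second_stage.score(features, Y))
```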

An operator preconditioning perspective on training in physics-informed machine learning

  • paper_url: http://arxiv.org/abs/2310.05801
  • repo_url: None
  • paper_authors: Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac
  • for: investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs
  • methods: employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies for preconditioning a critical differential operator
  • results: the difficulty in training these models is closely related to the conditioning of a specific differential operator, and preconditioning this operator is crucial for improving training
    Abstract In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
    摘要 在这篇论文中,我们研究了梯度下降算法在物理学知识Machine learning方法中的行为,如PINNs,它们用来最小化连接到部分偏微分方程(PDEs)的差异。我们的关键结果表明,训练这些模型的困难直接与一个特定的导数器的条件相关。这个导数器,则是PDE的导数器的 Hermitian平方的一个特殊情况。如果这个导数器是不良条件的,它会导致训练慢或不可能进行。因此,预conditioning这个关键导数器是关键。我们使用了严格的数学分析和实验评估来研究不同的策略,解释它们如何改善这个关键导数器的条件,并因此提高训练。

The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

  • paper_url: http://arxiv.org/abs/2310.05799
  • repo_url: https://github.com/claritychallenge/clarity/tree/main/recipes/cad1/task1
  • paper_authors: Gerardo Roa Dabike, Scott Bannister, Jennifer Firth, Simone Graetzer, Rebecca Vos, Michael A. Akeroyd, Jon Barker, Trevor J. Cox, Bruno Fazenda, Alinka Greasley, William Whitmer
  • for: 为听力受损人群提高音乐质量
  • methods: 使用信号处理挑战和个性化混音/分离技术
  • results: 提高音乐质量,使用HAAQI指数对象评估和人类评审者对subjective评估
    Abstract The Cadenza project aims to improve the audio quality of music for those who have a hearing loss. This is being done through a series of signal processing challenges, to foster better and more inclusive technologies. In the first round, two common listening scenarios are considered: listening to music over headphones, and with a hearing aid in a car. The first scenario is cast as a demixing-remixing problem, where the music is decomposed into vocals, bass, drums and other components. These can then be intelligently remixed in a personalized way, to increase the audio quality for a person who has a hearing loss. In the second scenario, music is coming from car loudspeakers, and the music has to be enhanced to overcome the masking effect of the car noise. This is done by taking into account the music, the hearing ability of the listener, the hearing aid and the speed of the car. The audio quality of the submissions will be evaluated using the Hearing Aid Audio Quality Index (HAAQI) for objective assessment and by a panel of people with hearing loss for subjective evaluation.
    摘要 Cadenza 项目旨在为有听力损失的人提高音乐的音质,并通过一系列信号处理挑战赛来推动更好、更具包容性的技术。首轮比赛考虑了两种常见的聆听场景:用耳机听音乐,以及在车内佩戴助听器听音乐。第一种场景被表述为解混-再混音问题:将音乐分解为人声、贝斯、鼓和其他成分,再以个性化的方式智能地重新混音,从而提升听力损失者的听音质量。第二种场景中,音乐来自车内扬声器,需要结合音乐本身、听者的听力情况、助听器和车速来增强音乐,以克服车内噪声的掩蔽效应。提交结果的音质将使用助听器音频质量指数(HAAQI)进行客观评估,并由听力损失人士组成的评审团进行主观评价。

Efficient Hybrid Oversampling and Intelligent Undersampling for Imbalanced Big Data Classification

  • paper_url: http://arxiv.org/abs/2310.05789
  • repo_url: None
  • paper_authors: Carla Vairetti, José Luis Assadi, Sebastián Maldonado
  • for: solves the issue of imbalanced classification in real-world applications
  • methods: combines intelligent undersampling and oversampling using a MapReduce framework
  • results: outperforms alternative resampling techniques for small- and medium-sized datasets, achieves positive results on large datasets with reduced running times.
    Abstract Imbalanced classification is a well-known challenge faced by many real-world applications. This issue occurs when the distribution of the target variable is skewed, leading to a prediction bias toward the majority class. With the arrival of the Big Data era, there is a pressing need for efficient solutions to solve this problem. In this work, we present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework. Both procedures are performed on the same pass over the data, conferring efficiency to the technique. The SMOTENN method is complemented with an efficient implementation of the neighborhoods related to the minority samples. Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets while achieving positive results on large datasets with reduced running times.
    摘要 不均衡分类是现实世界中许多应用面临的著名挑战之一。当目标变量的分布偏斜时就会出现这一问题,导致预测偏向多数类。随着大数据时代的到来,迫切需要高效的解决方案。在本工作中,我们提出了一种名为SMOTENN的新重抽样方法,它在MapReduce框架下将智能欠抽样与过抽样结合在一起。两种过程在同一次数据遍历中完成,从而保证了该技术的效率。SMOTENN方法还辅以对少数类样本邻域的高效实现。实验结果表明,该方法在小型和中型数据集上优于其他重抽样技术,同时在大型数据集上以更短的运行时间取得了良好效果。
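
The abstract combines SMOTE-style oversampling with neighbourhood-based ("intelligent") undersampling in a single pass over the data. The sketch below is a simplified, single-machine illustration of that hybrid idea (no MapReduce, binary 0/1 labels assumed); the function name, the k parameter, and the undersampling rule are assumptions for illustration, not the authors' SMOTENN implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hybrid_resample(X, y, minority=1, k=5, n_synthetic=100, rng=None):
    """Oversample the minority class with SMOTE-style interpolation and
    undersample majority points whose neighbourhoods are dominated by the
    minority class (a rough stand-in for 'intelligent' undersampling)."""
    rng = rng or np.random.default_rng(0)
    X_min, X_maj = X[y == minority], X[y != minority]

    # SMOTE-style oversampling: interpolate between minority neighbours
    nn_min = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    idx = nn_min.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self-match
    synth = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(k)]
        lam = rng.random()
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_synth = np.vstack(synth)

    # Intelligent undersampling: drop majority points surrounded by minority points
    nn_all = NearestNeighbors(n_neighbors=k).fit(X)
    neigh = nn_all.kneighbors(X_maj, return_distance=False)
    keep = np.array([np.mean(y[n] == minority) < 0.5 for n in neigh])
    X_maj_kept = X_maj[keep]

    X_new = np.vstack([X_min, X_synth, X_maj_kept])
    y_new = np.concatenate([
        np.full(len(X_min) + len(X_synth), minority, dtype=int),
        np.full(len(X_maj_kept), 1 - minority, dtype=int),
    ])
    return X_new, y_new
```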

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

  • paper_url: http://arxiv.org/abs/2310.05779
  • repo_url: https://github.com/copenlu/wiki-stance
  • paper_authors: Lucie-Aimée Kaffee, Arnav Arora, Isabelle Augenstein
  • for: The paper aims to improve the transparency of content moderation on online platforms, specifically on Wikipedia, by constructing a novel multilingual dataset of editor discussions and their reasoning.
  • methods: The paper uses a machine learning approach to jointly predict the stance and the reason (content moderation policy) of editors for each edit decision, adding transparency to the decision-making process.
  • results: The paper demonstrates that stance and the corresponding reason (policy) can be predicted jointly with a high degree of accuracy, providing a more transparent approach to content moderation.
    Abstract The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, this discussion is carried out publicly and the editors are encouraged to use the content moderation policies as explanations for making moderation decisions. Currently, only a few comments explicitly mention those policies -- 20% of the English ones, but as few as 2% of the German and Turkish comments. To aid in this process of understanding how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions along with their reasoning in three languages. The dataset contains the stances of the editors (keep, delete, merge, comment), along with the stated reason, and a content moderation policy, for each edit decision. We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process. We release both our joint prediction models and the multilingual content moderation dataset for further research on automated transparent content moderation.
    摘要 在线平台上的内容审核通常是不透明的。然而,在Wikipedia上,这类讨论是公开进行的,编辑者被鼓励以内容审核政策作为审核决定的解释。目前,只有少数评论明确提及这些政策:英语评论中约20%,而德语和土耳其语评论中仅约2%。为了帮助理解内容是如何被审核的,我们构建了一个新的多语言数据集,包含三种语言的Wikipedia编辑者讨论及其理由。数据集中为每个编辑决定记录了编辑者的立场(保留、删除、合并、评论)、所陈述的理由以及对应的内容审核政策。我们证明立场及相应的理由(政策)可以被联合预测且准确率很高,从而为决策过程增加透明度。我们公开了联合预测模型和该多语言内容审核数据集,以便进一步研究自动化的透明内容审核。

Foundation Models Meet Visualizations: Challenges and Opportunities

  • paper_url: http://arxiv.org/abs/2310.05771
  • repo_url: None
  • paper_authors: Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu
  • for: This paper explores the intersection of visualization techniques and foundation models like BERT and GPT, and how they can be used to improve transparency, explainability, fairness, and robustness in AI systems.
  • methods: The paper divides the intersections of visualization techniques and foundation models into two main areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS).
  • results: The paper highlights the challenges and opportunities that arise from the confluence of foundation models and visualizations, and provides a starting point for continued exploration in this promising avenue.
    Abstract Recent studies have indicated that foundation models, such as BERT and GPT, excel in adapting to a variety of downstream tasks. This adaptability has established them as the dominant force in building artificial intelligence (AI) systems. As visualization techniques intersect with these models, a new research paradigm emerges. This paper divides these intersections into two main areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS). In VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate models. This addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, within FM4VIS, we highlight how foundation models can be utilized to advance the visualization field itself. The confluence of foundation models and visualizations holds great promise, but it also comes with its own set of challenges. By highlighting these challenges and the growing opportunities, this paper seeks to provide a starting point for continued exploration in this promising avenue.
    摘要 近期研究表明,基础模型(如BERT和GPT)在适应各种下游任务方面表现出色。这种适应能力使其成为构建人工智能(AI)系统的主导力量。当可视化技术与这些模型交汇时,一种新的研究范式随之出现。本文将这些交汇点分为两个主要领域:面向基础模型的可视化(VIS4FM)和面向可视化的基础模型(FM4VIS)。在VIS4FM中,我们探讨可视化在理解、改进和评估这些复杂模型中的主要作用,以回应对透明度、可解释性、公平性和稳健性的迫切需求。相应地,在FM4VIS中,我们强调基础模型如何被用于推动可视化领域自身的发展。基础模型与可视化的交汇蕴含着巨大前景,但也伴随着一系列挑战。通过阐明这些挑战与不断增长的机遇,本文希望为这一富有前景的方向的持续探索提供一个起点。

LCOT: Linear circular optimal transport

  • paper_url: http://arxiv.org/abs/2310.06002
  • repo_url: None
  • paper_authors: Rocio Diaz Martin, Ivan Medri, Yikun Bai, Xinran Liu, Kangbai Yan, Gustavo K. Rohde, Soheil Kolouri
  • for: 这篇论文主要关注圆形概率测度,并提出了一种新的、计算高效的度量方法,即线性圆形最优运输(LCOT)。
  • methods: 论文引入的LCOT度量带有一个显式的线性嵌入,使得可以对嵌入后的测度应用机器学习(ML)算法,并将ML算法所依赖的底层度量轻松替换为LCOT。
  • results: 论文通过一系列数值实验展示了LCOT的有效性,表明其在学习圆形概率测度的表示方面优于传统的圆形最优运输(COT)度量。
    Abstract The optimal transport problem for measures supported on non-Euclidean spaces has recently gained ample interest in diverse applications involving representation learning. In this paper, we focus on circular probability measures, i.e., probability measures supported on the unit circle, and introduce a new computationally efficient metric for these measures, denoted as Linear Circular Optimal Transport (LCOT). The proposed metric comes with an explicit linear embedding that allows one to apply Machine Learning (ML) algorithms to the embedded measures and seamlessly modify the underlying metric for the ML algorithm to LCOT. We show that the proposed metric is rooted in the Circular Optimal Transport (COT) and can be considered the linearization of the COT metric with respect to a fixed reference measure. We provide a theoretical analysis of the proposed metric and derive the computational complexities for pairwise comparison of circular probability measures. Lastly, through a set of numerical experiments, we demonstrate the benefits of LCOT in learning representations of circular measures.
    摘要 支撑在非欧氏空间上的测度的最优运输问题,近来在涉及表示学习的各类应用中受到了广泛关注。本文关注圆形概率测度,即支撑在单位圆上的概率测度,并提出了一种新的、计算高效的度量,称为线性圆形最优运输(LCOT)。该度量带有一个显式的线性嵌入,使得可以对嵌入后的测度应用机器学习(ML)算法,并将ML算法所依赖的底层度量无缝替换为LCOT。我们证明所提出的度量根植于圆形最优运输(COT),可以视为COT度量关于固定参考测度的线性化。我们对该度量进行了理论分析,并推导了两两比较圆形概率测度的计算复杂度。最后,通过一系列数值实验,我们展示了LCOT在学习圆形测度表示方面的优势。

Nonlinear Correct and Smooth for Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.05757
  • repo_url: None
  • paper_authors: Yuanhang Shao, Xiuwen Liu
  • for: 本研究针对Graph-based semi-supervised learning (GSSL) 进行了改进,以提高预测性能。
  • methods: 本研究在 Label Propagation (LP) 与 Graph Neural Networks (GNN) 结合的基础上,提出了非线性纠正与平滑(NLCS)方法,将非线性和高阶表示引入残差传播,以提升表现。
  • results: 系统评估显示,本研究的方法在六个常用数据集上取得了显著提升:相较基础预测平均提升13.71%,相较现有最佳后处理方法平均提升2.16%。
    Abstract Graph-based semi-supervised learning (GSSL) has been used successfully in various applications. Existing methods leverage the graph structure and labeled samples for classification. Label Propagation (LP) and Graph Neural Networks (GNNs) both iteratively pass messages on graphs, where LP propagates node labels through edges and GNN aggregates node features from the neighborhood. Recently, combining LP and GNN has led to improved performance. However, utilizing labels and features jointly in higher-order graphs has not been explored. Therefore, we propose Nonlinear Correct and Smooth (NLCS), which improves the existing post-processing approach by incorporating non-linearity and higher-order representation into the residual propagation to handle intricate node relationships effectively. Systematic evaluations show that our method achieves remarkable average improvements of 13.71% over base prediction and 2.16% over the state-of-the-art post-processing method on six commonly used datasets. Comparisons and analyses show our method effectively utilizes labels and features jointly in higher-order graphs to resolve challenging graph relationships.
    摘要 基于图的半监督学习(GSSL)已经成功应用于多个领域。现有方法利用图结构和已标注样本进行分类。标签传播(LP)和图神经网络(GNN)都通过在图上迭代传递消息来工作:LP通过边传播节点标签,GNN则从邻域聚合节点特征。最近,将LP与GNN结合已带来性能提升。然而,在更高阶的图结构上联合利用标签与特征尚未被探索。因此,我们提出了非线性纠正与平滑(NLCS)方法,它通过在残差传播中引入非线性和高阶表示来改进现有的后处理方法,从而有效处理复杂的节点关系。系统评估表明,我们的方法在六个常用数据集上相较基础预测平均提升13.71%,相较最先进的后处理方法平均提升2.16%。对比与分析表明,我们的方法能够在高阶图上有效地联合利用标签与特征,以解决具有挑战性的图关系。
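
NLCS extends the standard (linear) Correct and Smooth post-processing pipeline, which first diffuses the residual error observed on labelled nodes over the graph and then smooths the corrected predictions. The NumPy sketch below shows that linear baseline, the part NLCS augments with non-linearity and higher-order representations; the symmetric normalisation, the alpha values, and the fixed number of propagation steps are common defaults assumed here.

```python
import numpy as np

def normalized_adj(A):
    """Symmetric normalisation D^{-1/2} A D^{-1/2} of an adjacency matrix."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def correct_and_smooth(A, base_pred, y_onehot, train_mask,
                       alpha1=0.8, alpha2=0.8, n_iter=50):
    """Linear Correct & Smooth: (1) propagate residual errors from labelled
    nodes, (2) propagate (smooth) the corrected predictions."""
    S = normalized_adj(A)

    # Correct: diffuse the residual error observed on labelled nodes
    E = np.zeros_like(base_pred)
    E[train_mask] = y_onehot[train_mask] - base_pred[train_mask]
    Z = E.copy()
    for _ in range(n_iter):
        Z = (1 - alpha1) * E + alpha1 * (S @ Z)
    corrected = base_pred + Z

    # Smooth: propagate labels, clamping the known training labels
    H = corrected.copy()
    H[train_mask] = y_onehot[train_mask]
    G = H.copy()
    for _ in range(n_iter):
        G = (1 - alpha2) * H + alpha2 * (S @ G)
    return G
```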

Deep Concept Removal

  • paper_url: http://arxiv.org/abs/2310.05755
  • repo_url: https://github.com/aman432/Spam-Classifier
  • paper_authors: Yegor Klochkov, Jean-Francois Ton, Ruocheng Guo, Yang Liu, Hang Li
  • for: 本研究旨在深度神经网络中解决概念除去问题,以学习不含特定概念(如性别等)的表示。
  • methods: 我们提出了一种基于在概念数据集上训练的对抗线性分类器的新方法,可以在移除目标属性的同时保持模型性能。我们在网络的不同层引入对抗探测分类器,有效地缓解了概念纠缠问题并改进了分布外(OOD)泛化。
  • results: 我们在一组带有虚假相关的流行分布鲁棒优化(DRO)基准以及分布外(OOD)泛化任务上进行了评估。结果表明,该方法能够在移除概念的同时保持模型性能。
    Abstract We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender etc.) We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps to remove the targeted attribute while maintaining model performance. Our approach Deep Concept Removal incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as out-of-distribution (OOD) generalization tasks.
    摘要 我们研究深度神经网络中的概念移除问题,目标是学习不编码特定指定概念(如性别等)的表示。我们提出了一种基于在概念数据集上训练的对抗线性分类器的新方法,能够在移除目标属性的同时保持模型性能。我们的方法Deep Concept Removal在网络的不同层引入对抗探测分类器,有效地应对概念纠缠并改进分布外泛化。此外,我们还引入了一种基于隐式梯度的技术,以解决使用线性分类器进行对抗训练所带来的挑战。我们在一组带有虚假相关的流行分布鲁棒优化(DRO)基准以及分布外(OOD)泛化任务上评估了概念移除的能力。
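
A common way to realise the adversarial-probe idea described above is to attach a linear classifier to an intermediate representation and penalise the encoder whenever that probe can recover the protected concept. The PyTorch sketch below illustrates this generic pattern with a simple alternating update; the architecture, the loss weight lam, and the update scheme are assumptions for illustration and do not reproduce the paper's implicit-gradient procedure.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
task_head = nn.Linear(16, 2)   # main prediction task
probe = nn.Linear(16, 2)       # adversarial linear probe for the concept (e.g. gender)

opt_model = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_probe = torch.optim.Adam(probe.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0  # strength of the concept-removal penalty (assumed)

def training_step(x, y_task, y_concept):
    # 1) Train the probe to predict the concept from the current representation
    with torch.no_grad():
        z_detached = encoder(x)
    opt_probe.zero_grad()
    probe_loss = ce(probe(z_detached), y_concept)
    probe_loss.backward()
    opt_probe.step()

    # 2) Train encoder + task head: solve the task while fooling the probe
    opt_model.zero_grad()
    z = encoder(x)
    task_loss = ce(task_head(z), y_task)
    adv_loss = -ce(probe(z), y_concept)   # maximise the probe's error
    (task_loss + lam * adv_loss).backward()
    opt_model.step()
    return task_loss.item(), probe_loss.item()
```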

Estimating Shape Distances on Neural Representations with Limited Samples

  • paper_url: http://arxiv.org/abs/2310.05742
  • repo_url: None
  • paper_authors: Dean A. Pospisil, Brett W. Larsen, Sarah E. Harvey, Alex H. Williams
  • for: 本研究旨在提供高维网络表示之间的几何相似性测量的一种有效方法,并且对这些方法进行了系统的分析和评估。
  • methods: 本研究分析了形状距离的标准估计器,并引入了一种偏差-方差可调的新矩估计方法(method-of-moments estimator),用于度量高维网络表示之间的几何相似性。
  • results: 研究发现,标准估计器在高维特征空间中表现不佳,而新引入的矩估计方法在模拟数据和神经数据上均取得了更优的性能,尤其是在高维设置下。
    Abstract Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning. Although many methods have been proposed, only a few works have rigorously analyzed their statistical efficiency or quantified estimator uncertainty in data-limited regimes. Here, we derive upper and lower bounds on the worst-case convergence of standard estimators of shape distance, a measure of representational dissimilarity proposed by Williams et al. (2021). These bounds reveal the challenging nature of the problem in high-dimensional feature spaces. To overcome these challenges, we introduce a new method-of-moments estimator with a tunable bias-variance tradeoff. We show that this estimator achieves superior performance to standard estimators in simulation and on neural data, particularly in high-dimensional settings. Thus, we lay the foundation for a rigorous statistical theory for high-dimensional shape analysis, and we contribute a new estimation method that is well-suited to practical scientific settings.
    摘要 测量高维网络表示之间的几何相似性是神经科学和深度学习领域长期关注的课题。尽管已经提出了许多方法,但只有少数工作严格分析了它们的统计效率,或在数据有限的情形下量化了估计量的不确定性。在本文中,我们推导了形状距离(Williams等人于2021年提出的一种表示差异度量)标准估计器在最坏情形下收敛的上界与下界。这些界限揭示了该问题在高维特征空间中的挑战性。为克服这些挑战,我们引入了一种偏差-方差可调的新矩估计方法。我们证明该估计器在模拟和神经数据上均优于标准估计器,尤其是在高维情形下。由此,我们为高维形状分析的严格统计理论奠定了基础,并贡献了一种适用于实际科学场景的新估计方法。

Post-hoc Bias Scoring Is Optimal For Fair Classification

  • paper_url: http://arxiv.org/abs/2310.05725
  • repo_url: None
  • paper_authors: Wenlong Chen, Yegor Klochkov, Yang Liu
  • for: 本文研究在分组公平约束下的二分类问题,约束可以是人口均等(DP)、机会均等(EOp)或均等化几率(EO)。
  • methods: 本文刻画了公平约束下的贝叶斯最优分类器,它是对无约束分类器的一条简单修改规则:为每个实例计算一个新的实例级偏见指标(bias score),并在其上应用修改规则。在DP和EOp约束下,该规则是对单一偏见得分设定阈值;在EO约束下,则需要拟合一条带有两个参数的线性修改规则。
  • results: 本文通过使用三个数据集(Adult、COMPAS 和 CelebA)进行实验,显示了与内部处理和后处理方法相比,该方法可以实现高精度和分组公平。此外,该方法不需要在推断时访问敏感特征。
    Abstract We consider a binary classification problem under group fairness constraints, which can be one of Demographic Parity (DP), Equalized Opportunity (EOp), or Equalized Odds (EO). We propose an explicit characterization of Bayes optimal classifier under the fairness constraints, which turns out to be a simple modification rule of the unconstrained classifier. Namely, we introduce a novel instance-level measure of bias, which we call bias score, and the modification rule is a simple linear rule on top of the finite amount of bias scores. Based on this characterization, we develop a post-hoc approach that allows us to adapt to fairness constraints while maintaining high accuracy. In the case of DP and EOp constraints, the modification rule is thresholding a single bias score, while in the case of EO constraints we are required to fit a linear modification rule with 2 parameters. The method can also be applied for composite group-fairness criteria, such as ones involving several sensitive attributes. We achieve competitive or better performance compared to both in-processing and post-processing methods across three datasets: Adult, COMPAS, and CelebA. Unlike most post-processing methods, we do not require access to sensitive attributes during the inference time.
    摘要 我们考虑带有分组公平约束的二分类问题,约束可以是人口均等(DP)、机会均等(EOp)或均等化几率(EO)。我们给出了公平约束下贝叶斯最优分类器的显式刻画,它实际上是对无约束分类器的一条简单修改规则。具体而言,我们引入了一种新的实例级偏见度量,称为偏见得分(bias score),修改规则是在有限个偏见得分之上的简单线性规则。基于这一刻画,我们提出了一种后处理方法,能够在保持高准确率的同时满足公平约束。在DP和EOp约束下,修改规则是对单一偏见得分设定阈值;在EO约束下,我们需要拟合带有两个参数的线性修改规则。该方法也可应用于复合的分组公平准则,例如涉及多个敏感属性的情形。我们在Adult、COMPAS和CelebA三个数据集上取得了与处理中(in-processing)和后处理方法相当或更优的性能。与大多数后处理方法不同,我们在推断时不需要访问敏感属性。
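
For the DP and EOp constraints, the paper's post-hoc rule reduces to thresholding a single instance-level bias score computed on top of the unconstrained classifier. The sketch below shows only that thresholding step plus a simple validation-set search for the threshold; how the bias score itself is constructed is the paper's contribution and is treated here as a given array, and the sensitive attribute is used only while tuning, not at inference.

```python
import numpy as np

def posthoc_correct(base_pred, bias_score, tau):
    """Post-hoc modification rule: switch the unconstrained prediction to the
    positive class whenever the instance's bias score exceeds the threshold tau.
    `bias_score` is assumed to be produced by some instance-level estimator."""
    corrected = base_pred.copy()
    corrected[bias_score > tau] = 1
    return corrected

def tune_threshold(base_pred, bias_score, groups_val, taus):
    """Choose tau on a validation set (where group labels ARE available) to
    minimise the Demographic Parity gap; inference itself never uses groups."""
    def dp_gap(pred):
        rates = [pred[groups_val == g].mean() for g in np.unique(groups_val)]
        return max(rates) - min(rates)
    gaps = [dp_gap(posthoc_correct(base_pred, bias_score, t)) for t in taus]
    return taus[int(np.argmin(gaps))]
```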

Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05723
  • repo_url: None
  • paper_authors: Trevor McInroe, Stefano V. Albrecht, Amos Storkey
  • for: 在有限的在线交互预算内,找到表现最佳的策略。
  • methods: 研究并改造主流的在线强化学习探索范式,并提出规划走出分布(PTGOOD)算法,以最大化在线数据收集的收益。
  • results: PTGOOD算法能够在有限的在线交互预算内显著提高智能体回报并找到最优策略,同时避免了许多基线算法在多种环境中出现的次优收敛。
    Abstract Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm that is well matched to a real-world RL deployment process: in few real settings would one deploy an offline policy with no test runs and tuning. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but this unnecessarily limits policy performance if the behavior policy is far from optimal. Instead, we forgo policy constraints and frame OtO RL as an exploration problem: we must maximize the benefit of the online data-collection. We study major online RL exploration paradigms, adapting them to work well with the OtO setting. These adapted methods contribute several strong baselines. Also, we introduce an algorithm for planning to go out of distribution (PTGOOD), which targets online exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy. In that way the limited interaction budget is used effectively. We show that PTGOOD significantly improves agent returns during online fine-tuning and finds the optimal policy in as few as 10k online steps in Walker and in as few as 50k in complex control tasks like Humanoid. Also, we find that PTGOOD avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.
    摘要 先用静态数据集进行离线预训练、再进行在线微调(offline-to-online,OtO)的范式与真实世界的强化学习(RL)部署流程十分契合:几乎没有哪个实际场景会在不经测试和调整的情况下直接部署离线策略。在这种情形下,我们的目标是在有限的在线交互预算内找到表现最佳的策略。以往OtO设定下的工作侧重于纠正离线RL算法的策略约束机制所引入的偏差。这类约束使学习到的策略贴近收集数据的行为策略,但如果行为策略远非最优,这会不必要地限制策略性能。因此,我们放弃策略约束,将OtO RL视为一个探索问题:我们必须使在线数据收集的收益最大化。我们研究了主流的在线RL探索范式,并将其改造以适配OtO设定,这些改造方法构成了多个强有力的基线。此外,我们提出了一种"规划走出分布"(PTGOOD)算法,其目标是在行为策略不太可能访问、而回报相对较高的状态-动作区域进行在线探索。借助条件熵瓶颈(Conditional Entropy Bottleneck)的思想,PTGOOD鼓励在线收集的数据提供有助于改进最终部署策略的新信息,从而有效利用有限的交互预算。我们表明,PTGOOD在在线微调期间显著提高了智能体回报,在Walker中仅需约1万步在线交互、在Humanoid等复杂控制任务中仅需约5万步即可找到最优策略。此外,我们发现PTGOOD避免了许多基线方法在若干环境中表现出的次优策略收敛。

Transformer Fusion with Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.05719
  • repo_url: https://github.com/Yahnnosh/Exploring-Model-Fusion-with-Optimal-Transport-on-Transformers
  • paper_authors: Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh
  • for: 这 paper 的目的是探讨 transformer 网络的合并技术,以提高模型的性能。
  • methods: 本文使用最优运输(Optimal Transport)对transformer网络的各个组成部分进行软对齐,以实现层级对齐;作者还提出了一种可在原则上推广到任意架构的层对齐抽象。
  • results: 实验表明,该方法可以提升融合后transformer网络的性能,经过短暂微调后甚至超过各个收敛的父模型;作者还揭示了软对齐在transformer融合中所起的重要作用。
    Abstract Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. In this paper, we present a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components. We flesh out an abstraction for layer alignment, that can generalize to arbitrary architectures -- in principle -- and we apply this to the key ingredients of Transformers such as multi-head self-attention, layer-normalization, and residual connections, and we discuss how to handle them via various ablation studies. Furthermore, our method allows the fusion of models of different sizes (heterogeneous fusion), providing a new and efficient way for compression of Transformers. The proposed approach is evaluated on both image classification tasks via Vision Transformer and natural language modeling tasks using BERT. Our approach consistently outperforms vanilla fusion, and, after a surprisingly short finetuning, also outperforms the individual converged parent models. In our analysis, we uncover intriguing insights about the significant role of soft alignment in the case of Transformers. Our results showcase the potential of fusing multiple Transformers, thus compounding their expertise, in the budding paradigm of model fusion and recombination.
    摘要 融合(fusion)是一种合并多个独立训练的神经网络、以整合其能力的技术。以往的尝试仅限于全连接网络、卷积网络和残差网络。本文提出了一种系统化的方法,利用最优运输对两个或多个基于transformer的网络的各种架构组件进行(软)对齐并加以融合。我们给出了一个层对齐的抽象,在原则上可以推广到任意架构,并将其应用于transformer的关键组成部分,如多头自注意力、层归一化和残差连接,并通过多组消融实验讨论了它们的处理方式。此外,我们的方法允许融合不同规模的模型(异构融合),为transformer的压缩提供了一种新的高效途径。所提出的方法在基于Vision Transformer的图像分类任务和基于BERT的自然语言建模任务上进行了评估。我们的方法始终优于朴素融合,并且在经过出乎意料地短暂的微调后,还超过了各个已收敛的父模型。在分析中,我们发现了关于软对齐在transformer中所扮演的重要角色的有趣见解。我们的结果展示了融合多个transformer、从而叠加其专长的潜力,呼应了模型融合与重组这一新兴范式。
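
The core operation in OT-based fusion is aligning the units of corresponding layers in two networks before averaging their weights. The sketch below uses a hard one-to-one assignment (Hungarian matching on weight similarity) as a simplified stand-in for the soft Optimal Transport alignment the paper advocates; it covers a single linear layer only and ignores attention heads, layer normalisation, and residual connections.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average_linear(W_a, W_b, b_a, b_b):
    """Fuse one linear layer from two models.

    W_a, W_b: (out_dim, in_dim) weight matrices; b_a, b_b: (out_dim,) biases.
    Output units of model B are permuted to best match model A before
    averaging -- a hard-assignment proxy for OT-based soft alignment.
    """
    # Cost of matching neuron i of A with neuron j of B: negative similarity
    cost = -W_a @ W_b.T
    row_ind, col_ind = linear_sum_assignment(cost)
    perm = np.empty_like(col_ind)
    perm[row_ind] = col_ind           # perm[i] = neuron of B matched to neuron i of A
    W_b_aligned, b_b_aligned = W_b[perm], b_b[perm]
    return 0.5 * (W_a + W_b_aligned), 0.5 * (b_a + b_b_aligned)
```

In a full network the same permutation must also be propagated to the input dimension of the following layer; handling that propagation, and using soft transport plans instead of hard permutations, is where the paper's method goes beyond this toy sketch.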

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

  • paper_url: http://arxiv.org/abs/2310.05712
  • repo_url: None
  • paper_authors: Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu
  • for: 本文提出模仿者学习(imitator learning, ItorL)这一新课题,旨在得到一个模仿者模块,能够基于极少的专家示范针对不同的未见任务即时重建模仿策略,并适应未预期的环境变化。
  • methods: 本文提出了一种基于单条专家示范的Demo-Attention Actor-Critic(DAAC)方法,它将模仿学习纳入强化学习框架,以在意外情况下规范策略的行为。此外,为了自主构建模仿策略,我们设计了一种基于示范的注意力架构,能够通过自适应地追踪示范中合适的状态,有效输出模仿动作以适应不同状态。
  • results: 我们在一个新的导航基准和一个机器人环境中测试了DAAC方法。与以往的模仿方法相比,DAAC在已见和未见任务上均有大幅提升:在已见任务上提升24.3%,在未见任务上提升110.8%。
    Abstract Imitation learning (IL) enables agents to mimic expert behaviors. Most previous IL techniques focus on precisely imitating one policy through mass demonstrations. However, in many applications, what humans require is the ability to perform various tasks directly through a few demonstrations of corresponding tasks, where the agent would meet many unexpected changes when deployed. In this scenario, the agent is expected to not only imitate the demonstration but also adapt to unforeseen environmental changes. This motivates us to propose a new topic called imitator learning (ItorL), which aims to derive an imitator module that can on-the-fly reconstruct the imitation policies based on very limited expert demonstrations for different unseen tasks, without any extra adjustment. In this work, we focus on imitator learning based on only one expert demonstration. To solve ItorL, we propose Demo-Attention Actor-Critic (DAAC), which integrates IL into a reinforcement-learning paradigm that can regularize policies' behaviors in unexpected situations. Besides, for autonomous imitation policy building, we design a demonstration-based attention architecture for imitator policy that can effectively output imitated actions by adaptively tracing the suitable states in demonstrations. We develop a new navigation benchmark and a robot environment for \topic~and show that DAAC~outperforms previous imitation methods \textit{with large margins} both on seen and unseen tasks.
    摘要 模仿学习(IL)使智能体能够模仿专家行为。以往大多数IL技术着眼于通过大量示范精确模仿某一个策略。然而,在许多应用中,人们需要的是智能体只需少量对应任务的示范即可直接执行各种任务,而智能体在部署时会遇到许多意料之外的变化。在这种情形下,智能体不仅要模仿示范,还要适应不可预见的环境变化。这促使我们提出一个名为模仿者学习(imitator learning,ItorL)的新课题,其目标是得到一个模仿者模块,能够基于极少的专家示范,针对不同的未见任务即时重建模仿策略,而无需任何额外调整。在本工作中,我们关注仅基于一条专家示范的模仿者学习。为求解ItorL,我们提出了Demo-Attention Actor-Critic(DAAC),它将模仿学习纳入强化学习范式,以在意外情况下规范策略的行为。此外,为了自主构建模仿策略,我们为模仿者策略设计了一种基于示范的注意力架构,通过自适应地追踪示范中合适的状态,有效地输出模仿动作。我们为该课题开发了一个新的导航基准和一个机器人环境,并表明DAAC在已见与未见任务上均以大幅优势超越以往的模仿方法。

Protecting Sensitive Data through Federated Co-Training

  • paper_url: http://arxiv.org/abs/2310.05696
  • repo_url: None
  • paper_authors: Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp
  • for: 保护敏感数据,避免公开泄露本地训练数据。
  • methods: 采用联邦协同训练方法:各客户端在公共无标注数据集上共享本地硬标签,并将其聚合为共识标签,供任意监督学习模型进行本地训练。
  • results: 与联邦学习和分布式蒸馏相比,联邦协同训练在达到相当模型质量的同时提供了更好的隐私保护,并且能够协同训练决策树、规则集等可解释模型。
    Abstract In many critical applications, sensitive data is inherently distributed. Federated learning trains a model collaboratively by aggregating the parameters of locally trained models. This avoids exposing sensitive local data. It is possible, though, to infer upon the sensitive data from the shared model parameters. At the same time, many types of machine learning models do not lend themselves to parameter aggregation, such as decision trees, or rule ensembles. It has been observed that in many applications, in particular healthcare, large unlabeled datasets are publicly available. They can be used to exchange information between clients by distributed distillation, i.e., co-regularizing local training via the discrepancy between the soft predictions of each local client on the unlabeled dataset. This, however, still discloses private information and restricts the types of models to those trainable via gradient-based methods. We propose to go one step further and use a form of federated co-training, where local hard labels on the public unlabeled datasets are shared and aggregated into a consensus label. This consensus label can be used for local training by any supervised machine learning model. We show that this federated co-training approach achieves a model quality comparable to both federated learning and distributed distillation on a set of benchmark datasets and real-world medical datasets. It improves privacy over both approaches, protecting against common membership inference attacks to the highest degree. Furthermore, we show that federated co-training can collaboratively train interpretable models, such as decision trees and rule ensembles, achieving a model quality comparable to centralized training.
    摘要 在许多关键应用中,敏感数据天然是分布式的。联邦学习通过聚合本地训练模型的参数来协同训练模型,从而避免暴露敏感的本地数据。然而,仍有可能从共享的模型参数中推断出敏感数据。同时,许多类型的机器学习模型(如决策树或规则集)并不适合参数聚合。人们注意到,在许多应用(尤其是医疗领域)中存在大量公开可用的无标注数据,可以通过分布式蒸馏在客户端之间交换信息,即通过各客户端在无标注数据上的软预测之间的差异来共同正则化本地训练。但这仍会泄露私有信息,并将可用模型限制为可通过基于梯度方法训练的模型。我们提议更进一步,采用一种联邦协同训练形式:各客户端在公共无标注数据上共享本地硬标签,并将其聚合为共识标签,该共识标签可用于任意监督机器学习模型的本地训练。我们在一组基准数据集和真实医疗数据集上表明,这种联邦协同训练方法可以达到与联邦学习和分布式蒸馏相当的模型质量,同时在隐私方面优于两者,能够最大程度地防御常见的成员推断攻击。此外,我们还表明联邦协同训练可以协同训练可解释模型(如决策树和规则集),并取得与集中式训练相当的模型质量。
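
The federated co-training loop can be sketched in a few lines: each client labels a shared public unlabeled set with its local model, the server aggregates these hard labels into a consensus label by majority vote, and every client then retrains on its private data plus the consensus-labeled public data. The model choice, number of rounds, and majority-vote aggregation below are illustrative assumptions; decision trees are used to mirror the paper's point that the scheme also works for models that cannot be parameter-averaged.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def federated_co_training(private_sets, X_public, rounds=5):
    """private_sets: list of (X_i, y_i) tuples, one per client.
    X_public: shared unlabeled data. Returns the locally trained models."""
    models = [DecisionTreeClassifier(max_depth=5).fit(X, y) for X, y in private_sets]
    for _ in range(rounds):
        # 1) Each client shares only hard labels on the public data
        votes = np.stack([m.predict(X_public) for m in models])   # (clients, n_public)
        # 2) Server aggregates the votes into a consensus label (majority vote)
        consensus = np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
        # 3) Each client retrains on private data + consensus-labeled public data
        models = [
            DecisionTreeClassifier(max_depth=5).fit(
                np.vstack([X, X_public]), np.concatenate([y, consensus]))
            for X, y in private_sets
        ]
    return models
```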

Hierarchical Reinforcement Learning for Temporal Pattern Prediction

  • paper_url: http://arxiv.org/abs/2310.05695
  • repo_url: None
  • paper_authors: Faith Johnson, Kristin Dana
  • for: 这个论文探讨了使用层次强化学习(HRL)来解决时间序列预测任务。
  • methods: 作者结合深度学习与HRL,开发了一个根据历史股价数据预测价格时间序列的股票智能体,以及一个根据第一人称行车记录仪图像预测转向角的车辆智能体。
  • results: 在两个领域中,作者发现一种称为封建强化学习(feudal reinforcement learning)的HRL方法能够带来更快、更稳定的训练以及更高的预测精度,这一成功归功于网络层级结构中引入的时间与空间抽象。
    Abstract In this work, we explore the use of hierarchical reinforcement learning (HRL) for the task of temporal sequence prediction. Using a combination of deep learning and HRL, we develop a stock agent to predict temporal price sequences from historical stock price data and a vehicle agent to predict steering angles from first person, dash cam images. Our results in both domains indicate that a type of HRL, called feudal reinforcement learning, provides significant improvements to training speed and stability and prediction accuracy over standard RL. A key component to this success is the multi-resolution structure that introduces both temporal and spatial abstraction into the network hierarchy.
    摘要 在本工作中,我们探索使用层次强化学习(HRL)来解决时间序列预测问题。通过将深度学习与HRL相结合,我们开发了一个根据历史股价数据预测价格时间序列的股票智能体,以及一个根据第一人称行车记录仪图像预测转向角的车辆智能体。两个领域的结果均表明,一种称为封建强化学习(feudal reinforcement learning)的HRL方法,在训练速度、稳定性和预测精度上都显著优于标准强化学习。这一成功的关键在于多分辨率结构,它将时间与空间抽象同时引入了网络层级之中。

Multi-timestep models for Model-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05672
  • repo_url: None
  • paper_authors: Abdelhakim Benechehab, Giuseppe Paolo, Albert Thomas, Maurizio Filippone, Balázs Kégl
  • for: This paper aims to improve the performance of model-based reinforcement learning (MBRL) algorithms by using a multi-timestep objective to train one-step models.
  • methods: The authors use a weighted sum of loss functions at various future horizons as their objective, with exponentially decaying weights, to improve the long-horizon performance of their models.
  • results: The authors find that their multi-timestep models outperform or match standard one-step models in both pure batch reinforcement learning (RL) and iterated batch RL scenarios, particularly in noisy environments.
    Abstract In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weights profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models were evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we found that our multi-timestep models outperform or match standard one-step models. This was especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
    摘要 在基于模型的强化学习(MBRL)中,大多数算法依赖从数据中学习的一步动力学模型来模拟轨迹。这种做法的一个关键挑战是,随着轨迹长度的增长,一步预测误差会不断累积。本文通过使用多时间步目标来训练一步模型,以解决这一问题。我们的目标函数是在多个未来时域上的损失函数(例如负对数似然)的加权和。我们探索并测试了多种权重分布,发现按指数衰减的权重能使模型的长时域R2分数显著提升,且这种改进在含噪数据上尤为明显。最后,在纯批量强化学习(RL)和迭代批量RL场景中使用soft actor-critic(SAC)智能体时,我们发现多时间步模型的表现优于或持平于标准一步模型,这一点在所考虑环境的含噪变体中尤其突出,凸显了该方法在实际应用中的潜力。
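
The multi-timestep objective is a weighted sum of per-horizon losses obtained by rolling the one-step model forward. The PyTorch sketch below shows the exponentially decaying weighting on a deterministic MSE loss; the paper works with probabilistic models and a negative log-likelihood, so the decay rate and the loss used here are simplifying assumptions.

```python
import torch

def multi_timestep_loss(model, s0, actions, targets, horizon=4, decay=0.5):
    """model(s, a) -> next-state prediction.
    actions: (T, batch, action_dim), targets: (T, batch, state_dim).
    Loss = sum_h w_h * MSE(rollout_h, target_h) with w_h proportional to decay**h."""
    weights = torch.tensor([decay ** h for h in range(horizon)])
    weights = weights / weights.sum()          # normalise the exponentially decaying weights
    s, loss = s0, 0.0
    for h in range(horizon):
        s = model(s, actions[h])               # roll the one-step model forward
        loss = loss + weights[h] * torch.mean((s - targets[h]) ** 2)
    return loss
```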

LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.05668
  • repo_url: None
  • paper_authors: Feiyi Chen, Zhen Qing, Yingying Zhang, Shuiguang Deng, Yi Xiao, Guansong Pang, Qingsong Wen
  • for: 这个研究旨在提出一种Light and Anti-overfitting Retraining Approach (LARA),用于深度Variational Autoencoder (VAEs) 时间序列异常检测方法中。
  • methods: 本研究提出了一个新的重训练流程,可以快速收敛并避免过拟合;此外,还设计了一个称为ruminate block的模块,能够在不储存历史数据的情况下加以利用。
  • results: 实验结果显示,仅使用43个时间槽的新分布数据进行重训练,LARA即可取得与使用充足数据训练的最新异常检测模型相当的F1分数;此外,本研究还验证了LARA的重训练开销相当轻量。
    Abstract Most of current anomaly detection models assume that the normal pattern remains same all the time. However, the normal patterns of Web services change dramatically and frequently. The model trained on old-distribution data is outdated after such changes. Retraining the whole model every time is expensive. Besides, at the beginning of normal pattern changes, there is not enough observation data from the new distribution. Retraining a large neural network model with limited data is vulnerable to overfitting. Thus, we propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder based time series anomaly detection methods (VAEs). This work aims to make three novel contributions: 1) the retraining process is formulated as a convex problem and can converge at a fast rate as well as prevent overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store them; 3) mathematically proving that when fine-tuning the latent vector and reconstructed data, the linear formations can achieve the least adjusting errors between the ground truths and the fine-tuned ones. Moreover, we have performed many experiments to verify that retraining LARA with even 43 time slots of data from new distribution can result in its competitive F1 Score in comparison with the state-of-the-art anomaly detection models trained with sufficient data. Besides, we verify its light overhead.
    摘要 现有的异常检测模型大多假设正常模式始终不变。然而,Web服务的正常模式会频繁且剧烈地变化,在旧分布数据上训练的模型在变化发生后便会过时,而每次都重训练整个模型代价高昂。此外,在正常模式刚开始变化时,新分布下的观测数据还不充足,用有限数据重训练大型神经网络模型很容易过拟合。因此,我们针对基于深度变分自编码器(VAE)的时间序列异常检测方法,提出了一种轻量且抗过拟合的重训练方法(LARA)。本工作的三个新贡献如下:1)将重训练过程形式化为一个凸问题,既能快速收敛又能防止过拟合;2)设计了一个反刍模块(ruminate block),可以在不存储历史数据的情况下利用它们;3)在数学上证明,在微调隐变量和重构数据时,线性形式能使微调结果与真实值之间的调整误差最小。此外,大量实验验证了仅用43个时间槽的新分布数据重训练LARA,其F1分数即可与使用充足数据训练的最新异常检测模型相媲美,同时其重训练开销十分轻量。

Binary Classification with Confidence Difference

  • paper_url: http://arxiv.org/abs/2310.05632
  • repo_url: https://github.com/wwangwitsel/ConfDiff
  • paper_authors: Wei Wang, Lei Feng, Yuchen Jiang, Gang Niu, Min-Ling Zhang, Masashi Sugiyama
  • for: 本研究旨在利用置信度差(Confidence Difference,简称ConfDiff)进行二分类,而不需要为每个训练样本提供逐点的标注置信度。
  • methods: 我们提出了一种风险一致(risk-consistent)的方法来解决该问题,证明其估计误差界达到最优收敛速率,并引入了一种风险修正方法来缓解过拟合,其一致性与收敛速率亦得到证明。
  • results: 我们在 benchmark 数据集和一个实际应用中的推荐系统数据集上进行了广泛的实验,并证明了我们的提议的有效性。
    Abstract Recently, learning with soft labels has been shown to achieve better performance than learning with hard labels in terms of model generalization, calibration, and robustness. However, collecting pointwise labeling confidence for all training examples can be challenging and time-consuming in real-world scenarios. This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification. Instead of pointwise labeling confidence, we are given only unlabeled data pairs with confidence difference that specifies the difference in the probabilities of being positive. We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate. We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven. Extensive experiments on benchmark data sets and a real-world recommender system data set validate the effectiveness of our proposed approaches in exploiting the supervision information of the confidence difference.
    摘要 近期研究表明,使用软标签学习在模型泛化、校准和鲁棒性方面优于使用硬标签学习。然而,在现实场景中,为所有训练样本收集逐点的标注置信度往往既困难又耗时。本文研究一种新的弱监督二分类问题,称为置信度差(ConfDiff)分类:我们得到的不是逐点置信度,而是带有置信度差的无标注数据对,该差值给出了两个样本为正类概率之差。我们提出了一种风险一致的方法来解决该问题,并证明其估计误差界达到最优收敛速率。我们还引入了一种风险修正方法来缓解过拟合,其一致性与收敛速率同样得到证明。在基准数据集和一个真实推荐系统数据集上的大量实验验证了所提方法在利用置信度差监督信息方面的有效性。

Cost-sensitive probabilistic predictions for support vector machines

  • paper_url: http://arxiv.org/abs/2310.05997
  • repo_url: None
  • paper_authors: Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo
  • for: 这种方法是为了生成SVM模型中的概率输出,并且能够处理不均衡数据集,以及使用参数优化过程中生成的有价值信息来提高模型的性能。
  • methods: 这种方法使用成本敏感的SVM模型,并将其嵌入到集成(ensemble)方法中,通过bootstrap估计来得到概率输出。
  • results: 数据测试表明,这种方法在各种数据集上比基准方法有着优异的性能。
    Abstract Support vector machines (SVMs) are widely used and constitute one of the best examined and used machine learning models for two-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information that are not fully exploited, not being used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The new method has the following three properties. First, it is designed to be cost-sensitive, and thus the different importance of sensitivity (or true positive rate, TPR) and specificity (true negative rate, TNR) is readily accommodated in the model. As a result, the model can deal with imbalanced datasets which are common in operational business problems as churn prediction or credit scoring. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameters tuning process. Finally, the probabilities estimation is done via bootstrap estimates, avoiding the use of parametric models as competing approaches. Numerical tests on a wide range of datasets show the advantages of our approach over benchmark procedures.
    摘要 支持向量机(SVM)应用广泛,是二分类问题中研究最深入、使用最多的机器学习模型之一。SVM的分类基于一个打分过程,给出确定性的分类规则;该规则可以转化为概率形式(如各类现成SVM库所实现的那样),但其本质并非概率性的。另一方面,SVM正则化参数的调优需要很高的计算代价,并且会产生未被充分利用的信息,这些信息并没有被用来构建概率分类规则。本文提出了一种为SVM生成概率输出的新方法,具有以下三个特点。第一,它是成本敏感的,因此可以直接在模型中体现灵敏度(真正类率,TPR)与特异度(真负类率,TNR)的不同重要性,从而能够处理在客户流失预测、信用评分等实际业务问题中常见的不均衡数据集。第二,SVM被嵌入到一种集成方法中以提升性能,充分利用参数调优过程中产生的有价值信息。第三,概率估计通过bootstrap估计完成,避免了使用参数化模型这一竞争做法。在广泛数据集上的数值实验显示了我们的方法相对基准方法的优势。
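
The proposal combines cost-sensitive SVMs with an ensemble and estimates class probabilities from bootstrap predictions rather than through a parametric link. The scikit-learn sketch below conveys that idea: class weights encode the asymmetric costs, each member is fit on a bootstrap resample, and the probability estimate for a new point is the share of members voting positive. The hyperparameters and the specific cost weighting are assumptions, not the paper's tuned configuration.

```python
import numpy as np
from sklearn.svm import SVC

def fit_bootstrap_svms(X, y, n_models=25, cost_fn=5.0, rng=None):
    """Train an ensemble of cost-sensitive SVMs on bootstrap resamples.
    cost_fn > 1 makes errors on the positive class more costly."""
    rng = rng or np.random.default_rng(0)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap resample
        clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: cost_fn})
        models.append(clf.fit(X[idx], y[idx]))
    return models

def predict_proba_bootstrap(models, X_new):
    """Probability of the positive class = share of ensemble members voting 1."""
    votes = np.stack([m.predict(X_new) for m in models])
    return votes.mean(axis=0)
```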

On Prediction-Modelers and Decision-Makers: Why Fairness Requires More Than a Fair Prediction Model

  • paper_url: http://arxiv.org/abs/2310.05598
  • repo_url: None
  • paper_authors: Teresa Scantamburlo, Joachim Baumann, Christoph Heitz
  • for: 本文旨在阐述在预测基于决策中的公平性问题,并提出一个框架来帮助实现公平性。
  • methods: 本文使用了概念分离技术,将预测和决策分为两个独立的步骤,以便更好地理解和实现公平性。
  • results: 本文提出了一个框架,可以帮助在预测基于决策中实现公平性,并提出了一些实现公平性的策略和方法。
    Abstract An implicit ambiguity in the field of prediction-based decision-making regards the relation between the concepts of prediction and decision. Much of the literature in the field tends to blur the boundaries between the two concepts and often simply speaks of 'fair prediction.' In this paper, we point out that a differentiation of these concepts is helpful when implementing algorithmic fairness. Even if fairness properties are related to the features of the used prediction model, what is more properly called 'fair' or 'unfair' is a decision system, not a prediction model. This is because fairness is about the consequences on human lives, created by a decision, not by a prediction. We clarify the distinction between the concepts of prediction and decision and show the different ways in which these two elements influence the final fairness properties of a prediction-based decision system. In addition to exploring this relationship conceptually and practically, we propose a framework that enables a better understanding and reasoning of the conceptual logic of creating fairness in prediction-based decision-making. In our framework, we specify different roles, namely the 'prediction-modeler' and the 'decision-maker,' and the information required from each of them for being able to implement fairness of the system. Our framework allows for deriving distinct responsibilities for both roles and discussing some insights related to ethical and legal requirements. Our contribution is twofold. First, we shift the focus from abstract algorithmic fairness to context-dependent decision-making, recognizing diverse actors with unique objectives and independent actions. Second, we provide a conceptual framework that can help structure prediction-based decision problems with respect to fairness issues, identify responsibilities, and implement fairness governance mechanisms in real-world scenarios.
    摘要 在基于预测的决策领域中,预测与决策两个概念之间的关系存在一种隐含的模糊性。该领域的许多文献往往混淆二者的边界,常常只是笼统地谈"公平预测"。本文指出,在落实算法公平时,区分这两个概念是有帮助的。即便公平性质与所用预测模型的特征相关,真正应被称为"公平"或"不公平"的是决策系统,而不是预测模型,因为公平关乎的是决策(而非预测)对人类生活造成的后果。我们厘清了预测与决策两个概念之间的区别,并展示了这两个要素以不同方式影响基于预测的决策系统最终的公平性质。除了在概念与实践层面探讨这一关系之外,我们还提出了一个框架,有助于更好地理解和推演在基于预测的决策中实现公平的概念逻辑。在该框架中,我们界定了"预测建模者"与"决策者"等不同角色,以及各角色为实现系统公平所需的信息。该框架可用于推导出各角色的不同责任,并讨论与伦理和法律要求相关的一些洞见。我们的贡献有两点:其一,将关注点从抽象的算法公平转向依赖具体情境的决策,承认存在目标各异、行动独立的多方参与者;其二,提供了一个概念框架,有助于围绕公平问题来组织基于预测的决策问题、界定责任,并在现实场景中落实公平治理机制。
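
The framework's central distinction, between a prediction-modeler who delivers scores and a decision-maker who turns scores into actions under a fairness constraint, can be made concrete with a small sketch. The group-specific thresholding used by the decision-maker below is just one possible decision policy, assumed here for illustration; the class names and the toy scorer are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

@dataclass
class PredictionModeler:
    """Delivers scores; knows nothing about the decision policy."""
    score_fn: Callable[[np.ndarray], np.ndarray]

    def scores(self, X):
        return self.score_fn(X)

@dataclass
class DecisionMaker:
    """Turns scores into decisions; owns the fairness constraint and therefore
    the group-specific thresholds (one illustrative decision policy)."""
    thresholds: Dict[str, float]

    def decide(self, scores, groups):
        return np.array([s >= self.thresholds[g] for s, g in zip(scores, groups)], dtype=int)

# Hypothetical usage: the same prediction model, two different decision policies
modeler = PredictionModeler(score_fn=lambda X: X[:, 0])            # stand-in scorer
uniform = DecisionMaker(thresholds={"A": 0.40, "B": 0.40})
fair_dp = DecisionMaker(thresholds={"A": 0.35, "B": 0.45})         # tuned toward parity
```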

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

  • paper_url: http://arxiv.org/abs/2310.05573
  • repo_url: https://github.com/sdascoli/odeformer
  • paper_authors: Stéphane d’Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, Niki Kilbertus
  • for: 介绍一种能够仅凭单条解轨迹的观测,以符号形式推断多维常微分方程(ODE)系统的Transformer模型(ODEFormer)。
  • methods: 使用Transformer以符号形式推断多维常微分方程系统。
  • results: ODEFormer在两个数据集(Strogatz与ODEBench)上均优于现有方法,对含噪和不规则采样的观测表现出显著更强的鲁棒性,并具有更快的推断速度。
    Abstract We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing "Strogatz" dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference. We release our code, model and benchmark dataset publicly.
    摘要 我们介绍ODEFormer,这是第一个能够从单条解轨迹的观测中以符号形式推断多维常微分方程(ODE)系统的Transformer。我们在两个数据集上进行了广泛评估:(一)现有的"Strogatz"数据集,包含二维系统;(二)ODEBench,我们从文献中精心筛选的一至四维系统集合,以提供更全面的基准。ODEFormer始终优于现有方法,同时对含噪和不规则采样的观测表现出显著更强的鲁棒性,并具有更快的推断速度。我们公开发布了代码、模型和基准数据集。

A New Transformation Approach for Uplift Modeling with Binary Outcome

  • paper_url: http://arxiv.org/abs/2310.05549
  • repo_url: None
  • paper_authors: Kun Li, Jiang Tian, Xiaojia Xiang
  • for: 这篇论文是关于如何实现更好的客户预测和精确的标的定义,以提高营销效果。
  • methods: 本论文针对二元结果提出了一种新的转换方法,将原始目标变量与处理指示变量结合定义新的目标变量,并充分利用零结果样本中的信息来预测客户反应。
  • results: 实验结果显示,新的变数方法可以优化客户预测和标的定义,提高营销效果。此外,这种方法还可以轻松地应用在实际应用中。
    Abstract Uplift modeling has been used effectively in fields such as marketing and customer retention, to target those customers who are more likely to respond due to the campaign or treatment. Essentially, it is a machine learning technique that predicts the gain from performing some action with respect to not taking it. A popular class of uplift models is the transformation approach that redefines the target variable with the original treatment indicator. These transformation approaches only need to train and predict the difference in outcomes directly. The main drawback of these approaches is that in general it does not use the information in the treatment indicator beyond the construction of the transformed outcome and usually is not efficient. In this paper, we design a novel transformed outcome for the case of the binary target variable and unlock the full value of the samples with zero outcome. From a practical perspective, our new approach is flexible and easy to use. Experimental results on synthetic and real-world datasets obviously show that our new approach outperforms the traditional one. At present, our new approach has already been applied to precision marketing in a China nation-wide financial holdings group.
    摘要 提升建模(uplift modeling)已在市场营销和客户保留等领域得到有效应用,用于锁定那些因营销活动或干预而更可能响应的客户。本质上,它是一种机器学习技术,预测执行某项行动相对于不执行该行动所带来的增益。一类常用的提升模型是转换方法,它利用原始的处理指示变量重新定义目标变量,这样只需直接训练并预测结果的差异即可。这类方法的主要缺点在于,除构建转换后的结果之外,通常没有利用处理指示变量中的其他信息,效率往往不高。本文针对二元目标变量设计了一种新的转换结果,充分挖掘零结果样本的价值。从实践角度看,我们的新方法灵活且易于使用。在合成数据和真实数据上的实验结果清楚地表明,新方法优于传统方法。目前,该方法已应用于中国一家全国性金融控股集团的精准营销之中。
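
For context, the traditional class-transformation baseline that the paper improves upon defines a new target Z = Y*T + (1-Y)*(1-T); under a 50/50 treatment assignment the uplift is recovered as 2*P(Z=1|x) - 1, so any standard classifier can be trained on Z. The sketch below implements that classical baseline, not the paper's new transformation (which the abstract does not spell out).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_transformed_outcome_uplift(X, y, t):
    """Classical class-transformation uplift model (binary outcome y,
    binary treatment t, assumed assigned with probability 0.5)."""
    z = y * t + (1 - y) * (1 - t)   # Z=1 iff (treated & responded) or (control & did not)
    return LogisticRegression(max_iter=1000).fit(X, z)

def predict_uplift(clf, X_new):
    # uplift(x) = P(y=1 | x, T=1) - P(y=1 | x, T=0) = 2 * P(Z=1 | x) - 1
    return 2 * clf.predict_proba(X_new)[:, 1] - 1
```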

NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification

  • paper_url: http://arxiv.org/abs/2310.05530
  • repo_url: https://github.com/koumajos/classification_by_nettisa_flow
  • paper_authors: Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka
  • for: 这篇论文旨在提出一种基于流量记录的网络流量监测方法,以便在各种网络基础设施上部署,包括承载数百万人的大型IPS网络。
  • methods: 该方法基于流量记录的时间序列分析,提出了一种新的扩展IP流记录(NetTiSA),并对25种网络类型任务进行了广泛的测试,以证明NetTiSA的广泛适用性和高可用性。
  • results: 测试结果表明,NetTiSA可以高度精准地分类网络流量,并且在计算流量扩展时对性能的影响较小。此外,NetTiSA可以在100Gbps级别的高速ISP网络上进行实际部署,因此可以提供广泛的网络安全保护。
    Abstract Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable network traffic analysis with high accuracy. Nevertheless, the flow extensions are often too large or hard to compute, which limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed), which is based on the analysis of the time series of packet sizes. By thoroughly testing 25 different network classification tasks, we show the broad applicability and high usability of NetTiSA, which often outperforms the best-performing related works. For practical deployment, we also consider the sizes of flows extended for NetTiSA and evaluate the performance impacts of its computation in the flow exporter. The novel feature set proved universal and deployable to high-speed ISP networks with 100\,Gbps lines; thus, it enables accurate and widespread network security protection.
    摘要 基于IP流的网络流量监测是一种标准的监测方式,可部署在各类网络基础设施上,包括连接数百万用户的大型ISP网络。由于流记录传统上只包含有限的信息(地址、传输层端口和交换的数据量),通常会对其进行扩展以加入额外特征,从而实现高精度的网络流量分析。然而,这些流扩展往往体积过大或计算代价过高,使其只能部署在规模较小的网络中。本文提出了一种新的扩展IP流,称为NetTiSA(Network Time Series Analysed),其基础是对数据包大小时间序列的分析。通过对25个不同的网络分类任务进行全面测试,我们展示了NetTiSA的广泛适用性和高可用性,它常常优于相关工作中表现最好的方法。为了便于实际部署,我们还考虑了加入NetTiSA特征后的流记录大小,并评估了在流导出器中计算这些特征对性能的影响。结果表明,这一新特征集具有普适性,可部署于具有100 Gbps链路的高速ISP网络,从而实现准确且广泛的网络安全防护。
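
The NetTiSA idea is to enrich a basic flow record with compact statistics computed over the time series of packet sizes observed in the flow. The sketch below computes a handful of such statistics for a single flow; the exact feature set used by NetTiSA is defined in the paper, so the field names here are illustrative.

```python
import numpy as np

def time_series_flow_features(packet_sizes, timestamps):
    """Summarise one flow's packet-size time series into fixed-size features
    that can be appended to a basic IP-flow record."""
    sizes = np.asarray(packet_sizes, dtype=float)
    ts = np.asarray(timestamps, dtype=float)
    gaps = np.diff(ts) if len(ts) > 1 else np.array([0.0])
    return {
        "pkt_count": len(sizes),
        "mean_size": sizes.mean(),
        "std_size": sizes.std(),
        "min_size": sizes.min(),
        "max_size": sizes.max(),
        "root_mean_square_size": np.sqrt(np.mean(sizes ** 2)),
        "mean_inter_arrival": gaps.mean(),
        "duration": ts[-1] - ts[0],
        "burstiness": gaps.std() / gaps.mean() if gaps.mean() > 0 else 0.0,
    }

# Hypothetical flow: packet sizes in bytes and arrival times in seconds
features = time_series_flow_features([60, 1500, 1500, 520, 60],
                                     [0.00, 0.01, 0.02, 0.35, 0.36])
```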

A novel Network Science Algorithm for Improving Triage of Patients

  • paper_url: http://arxiv.org/abs/2310.05996
  • repo_url: None
  • paper_authors: Pietro Hiram Guzzi, Annamaria De Filippo, Pierangelo Veltri
  • for: This paper aims to develop a novel algorithm for triaging patients based on the analysis of patient data, with the goal of improving the efficiency, accuracy, and consistency of patient prioritization.
  • methods: The algorithm is based on rigorous preprocessing and feature engineering of a comprehensive data set containing relevant patient information, such as vital signs, symptoms, and medical history.
  • results: The experimental results demonstrate that the algorithm achieved high accuracy and performance, outperforming traditional triage methods.
    Abstract Patient triage plays a crucial role in healthcare, ensuring timely and appropriate care based on the urgency of patient conditions. Traditional triage methods heavily rely on human judgment, which can be subjective and prone to errors. Recently, a growing interest has been in leveraging artificial intelligence (AI) to develop algorithms for triaging patients. This paper presents the development of a novel algorithm for triaging patients. It is based on the analysis of patient data to produce decisions regarding their prioritization. The algorithm was trained on a comprehensive data set containing relevant patient information, such as vital signs, symptoms, and medical history. The algorithm was designed to accurately classify patients into triage categories through rigorous preprocessing and feature engineering. Experimental results demonstrate that our algorithm achieved high accuracy and performance, outperforming traditional triage methods. By incorporating computer science into the triage process, healthcare professionals can benefit from improved efficiency, accuracy, and consistency, prioritizing patients effectively and optimizing resource allocation. Although further research is needed to address challenges such as biases in training data and model interpretability, the development of AI-based algorithms for triaging patients shows great promise in enhancing healthcare delivery and patient outcomes.
    摘要 医疗患者分类占据了医疗业中关键的地位,确保患者得到了时间适当的和适合的护理,根据患者的病情严重程度。传统的分类方法依赖于人类的判断,这可能是主观的和容易出错的。在最近的几年里,人们对使用人工智能(AI)开发患者分类算法表示了增加的兴趣。本文描述了一种基于患者数据分析的新的患者分类算法的开发。该算法通过对病人数据进行严格的预处理和特征工程来生成准确的患者分类结果。实验结果表明,我们的算法可以准确地将患者分为不同的分类 катего里,并且表现出了高度的准确率和性能,超过传统的分类方法。通过将计算机科学引入分类过程,医疗专业人员可以从而获得更高效、准确和一致的患者分类结果,优先级化患者,最大化资源的分配。虽然还需要进一步的研究,例如训练数据中存在的偏见和模型解释性等问题,但AI在患者分类中的应用显示了极大的潜力,以改善医疗服务和患者结果。
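
As described, the algorithm is a supervised classifier trained on preprocessed vital signs, symptoms, and history. A minimal scikit-learn pipeline illustrating that preprocessing-plus-classification setup is sketched below; the feature names, encoders, and model choice are placeholders, since the paper does not specify its exact architecture.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["heart_rate", "systolic_bp", "resp_rate", "temperature", "spo2", "age"]
categorical = ["chief_complaint", "arrival_mode"]

triage_pipeline = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("clf", GradientBoostingClassifier()),   # predicts the triage category
])

# triage_pipeline.fit(X_train, y_train)      # X_train: DataFrame with the columns above
# priorities = triage_pipeline.predict(X_new)
```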

Projecting infinite time series graphs to finite marginal graphs using number theory

  • paper_url: http://arxiv.org/abs/2310.05526
  • repo_url: None
  • paper_authors: Andreas Gerhardus, Jonas Wahl, Sofia Faltenbacher, Urmi Ninad, Jakob Runge
  • for: 本文旨在将因果图模型(causal-graphical-model)框架中针对有限图发展的结果与算法,推广应用到无穷时间序列图上。
  • methods: 本文提出了一种方法,可以将无穷时间序列图投影为有限时间窗口上的边缘图模型,从而回答关于无穷图的 $m$-separation 查询。
  • results: 本文给出了执行该投影的算法,并论证这些边缘图可用于时间序列中的因果发现与因果效应估计。
    Abstract In recent years, a growing number of method and application works have adapted and applied the causal-graphical-model framework to time series data. Many of these works employ time-resolved causal graphs that extend infinitely into the past and future and whose edges are repetitive in time, thereby reflecting the assumption of stationary causal relationships. However, most results and algorithms from the causal-graphical-model framework are not designed for infinite graphs. In this work, we develop a method for projecting infinite time series graphs with repetitive edges to marginal graphical models on a finite time window. These finite marginal graphs provide the answers to $m$-separation queries with respect to the infinite graph, a task that was previously unresolved. Moreover, we argue that these marginal graphs are useful for causal discovery and causal effect estimation in time series, effectively enabling to apply results developed for finite graphs to the infinite graphs. The projection procedure relies on finding common ancestors in the to-be-projected graph and is, by itself, not new. However, the projection procedure has not yet been algorithmically implemented for time series graphs since in these infinite graphs there can be infinite sets of paths that might give rise to common ancestors. We solve the search over these possibly infinite sets of paths by an intriguing combination of path-finding techniques for finite directed graphs and solution theory for linear Diophantine equations. By providing an algorithm that carries out the projection, our paper makes an important step towards a theoretically-grounded and method-agnostic generalization of a range of causal inference methods and results to time series.
    摘要 近年来,越来越多的方法与应用工作将因果图模型框架引入时间序列数据。其中许多工作采用按时间展开的因果图,它们向过去和未来无限延伸,且边随时间重复出现,从而体现了因果关系平稳(stationary)这一假设。然而,因果图模型框架中的大多数结果和算法并非为无穷图设计。在本工作中,我们提出了一种方法,可将边随时间重复的无穷时间序列图投影为有限时间窗口上的边缘图模型。这些有限的边缘图能够回答关于无穷图的 $m$-separation 查询,这一任务此前尚未得到解决。此外,我们指出这些边缘图对时间序列中的因果发现与因果效应估计十分有用,实际上使得为有限图建立的结果也能应用于无穷图。投影过程依赖于在待投影的图中寻找共同祖先,这一思路本身并不新颖;然而,该投影过程此前尚未针对时间序列图给出算法实现,因为在这类无穷图中,可能产生共同祖先的路径集合可能是无穷的。我们将有限有向图上的路径搜索技术与线性丢番图方程(Diophantine equation)的求解理论巧妙地结合,从而解决了在这些可能无穷的路径集合上的搜索问题。通过给出执行该投影的算法,本文朝着以理论为基础、与具体方法无关的方式将一系列因果推断方法与结果推广到时间序列迈出了重要一步。
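
The projection step reduces a search over possibly infinite sets of paths to questions about linear Diophantine equations of the form a*x + b*y = c over the non-negative integers (can two edge-repetition periods combine to realise a required time lag?). A minimal solver built on the extended Euclidean algorithm is sketched below; how such equations arise from the time series graph is the paper's construction and is not reproduced here.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def has_nonnegative_solution(a, b, c):
    """Does a*x + b*y = c admit a solution with x, y >= 0 (a, b, c > 0)?"""
    g, x0, y0 = extended_gcd(a, b)
    if c % g:
        return False
    x0, y0 = x0 * (c // g), y0 * (c // g)        # one particular solution
    step_x, step_y = b // g, a // g              # general solution: x0 + t*step_x, y0 - t*step_y
    if x0 < 0:
        k = (-x0 + step_x - 1) // step_x         # shift until x becomes non-negative
        x0, y0 = x0 + k * step_x, y0 - k * step_y
    else:
        k = x0 // step_x                         # shift down to the smallest non-negative x,
        x0, y0 = x0 - k * step_x, y0 + k * step_y  # which maximises y
    return y0 >= 0

# Example: edges repeating every 3 and every 5 steps can realise a lag of 11 (3*2 + 5*1)
assert has_nonnegative_solution(3, 5, 11)
```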

WeatherGNN: Exploiting Complicated Relationships in Numerical Weather Prediction Bias Correction

  • paper_url: http://arxiv.org/abs/2310.05517
  • repo_url: https://github.com/water-wbq/WeatherGNN
  • paper_authors: Binqing Wu, Weiqi Chen, Wengwei Wang, Bingqing Peng, Liang Sun, Ling Chen
  • for: corrected numerical weather prediction (NWP) bias
  • methods: Graph Neural Networks (GNN) and factor-wise GNN, fast hierarchical GNN
  • results: superior performance compared to other state-of-the-art (SOTA) methods, with an average improvement of 40.50% on RMSE compared to the original NWP.
    Abstract Numerical weather prediction (NWP) may be inaccurate or biased due to incomplete atmospheric physical processes, insufficient spatial-temporal resolution, and inherent uncertainty of weather. Previous studies have attempted to correct biases by using handcrafted features and domain knowledge, or by applying general machine learning models naively. They do not fully explore the complicated meteorologic interactions and spatial dependencies in the atmosphere dynamically, which limits their applicability in NWP bias-correction. Specifically, weather factors interact with each other in complex ways, and these interactions can vary regionally. In addition, the interactions between weather factors are further complicated by the spatial dependencies between regions, which are influenced by varied terrain and atmospheric motions. To address these issues, we propose WeatherGNN, an NWP bias-correction method that utilizes Graph Neural Networks (GNN) to learn meteorologic and geographic relationships in a unified framework. Our approach includes a factor-wise GNN that captures meteorological interactions within each grid (a specific location) adaptively, and a fast hierarchical GNN that captures spatial dependencies between grids dynamically. Notably, the fast hierarchical GNN achieves linear complexity with respect to the number of grids, enhancing model efficiency and scalability. Our experimental results on two real-world datasets demonstrate the superiority of WeatherGNN in comparison with other SOTA methods, with an average improvement of 40.50\% on RMSE compared to the original NWP.
    摘要 数值天气预报(NWP)可能不准确或存在偏差,原因包括大气物理过程刻画不完整、时空分辨率不足,以及天气本身固有的不确定性。以往的研究尝试利用人工设计的特征和领域知识来纠正偏差,或直接套用通用的机器学习模型。它们没有充分地、动态地刻画大气中复杂的气象相互作用和空间依赖关系,限制了其在NWP偏差纠正中的适用性。具体而言,各气象因子之间以复杂的方式相互作用,且这种相互作用在不同地区可能不同;此外,这些相互作用还会受到区域间空间依赖关系的进一步影响,而后者又由不同的地形和大气运动所决定。为解决这些问题,我们提出了WeatherGNN,一种利用图神经网络(GNN)在统一框架中学习气象关系与地理关系的NWP偏差纠正方法。我们的方法包括一个逐因子GNN,自适应地捕捉每个网格(特定位置)内部的气象相互作用,以及一个快速层次GNN,动态地捕捉网格之间的空间依赖关系。值得注意的是,快速层次GNN的复杂度与网格数量成线性关系,提升了模型的效率与可扩展性。在两个真实数据集上的实验结果表明,WeatherGNN优于其他最先进方法,相较原始NWP在RMSE上平均改进40.50%。

A Neural Tangent Kernel View on Federated Averaging for Deep Linear Neural Network

  • paper_url: http://arxiv.org/abs/2310.05495
  • repo_url: None
  • paper_authors: Xin Liu, Dazhi Zhan, Wei Tao, Xin Ma, Yu Pan, Yu Ding, Zhisong Pan
  • for: 这篇论文的目的是提供 FedAvg 在训练神经网络时的全球收敛性保证。
  • methods: 这篇论文使用 NTK 理论来研究 FedAvg 在训练神经网络时的收敛性。
  • results: 这篇论文提供了 FedAvg 在训练深度线性神经网络时的全球收敛性保证,并且通过实验验证了理论结论。
    Abstract Federated averaging (FedAvg) is a widely employed paradigm for collaboratively training models from distributed clients without sharing data. Nowadays, the neural network has achieved remarkable success due to its extraordinary performance, which makes it a preferred choice as the model in FedAvg. However, the optimization problem of the neural network is often non-convex even non-smooth. Furthermore, FedAvg always involves multiple clients and local updates, which results in an inaccurate updating direction. These properties bring difficulties in analyzing the convergence of FedAvg in training neural networks. Recently, neural tangent kernel (NTK) theory has been proposed towards understanding the convergence of first-order methods in tackling the non-convex problem of neural networks. The deep linear neural network is a classical model in theoretical subject due to its simple formulation. Nevertheless, there exists no theoretical result for the convergence of FedAvg in training the deep linear neural network. By applying NTK theory, we make a further step to provide the first theoretical guarantee for the global convergence of FedAvg in training deep linear neural networks. Specifically, we prove FedAvg converges to the global minimum at a linear rate $\mathcal{O}\big((1-\eta K /N)^t\big)$, where $t$ is the number of iterations, $\eta$ is the learning rate, $N$ is the number of clients and $K$ is the number of local updates. Finally, experimental evaluations on two benchmark datasets are conducted to empirically validate the correctness of our theoretical findings.
    摘要 联邦平均(FedAvg)是一种广泛使用的范式,用于在不共享数据的前提下协同训练分布式客户端上的模型。如今,神经网络凭借其卓越性能取得了巨大成功,因此成为FedAvg中的首选模型。然而,神经网络的优化问题往往是非凸的,甚至是非光滑的。此外,FedAvg总是涉及多个客户端和多步本地更新,这会导致更新方向不准确。这些特性给分析FedAvg训练神经网络的收敛性带来了困难。近年来,神经正切核(NTK)理论被提出,用于理解一阶方法在处理神经网络非凸问题时的收敛性。深度线性神经网络因其形式简单,是理论研究中的经典模型;然而,关于FedAvg训练深度线性神经网络的收敛性,此前尚无理论结果。通过应用NTK理论,我们迈出了进一步的一步,首次为FedAvg训练深度线性神经网络提供了全局收敛性的理论保证。具体来说,我们证明FedAvg以线性速率 $\mathcal{O}\big((1-\eta K/N)^t\big)$ 收敛到全局最小值,其中$t$是迭代次数,$\eta$是学习率,$N$是客户端数量,$K$是本地更新次数。最后,我们在两个标准数据集上进行了实验评估,以验证理论发现的正确性。
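
To make the FedAvg setting above concrete, the minimal sketch below runs one FedAvg round on least-squares clients: each client performs K local gradient steps from the shared weights and the server averages the results. The synthetic client data, learning rate, and number of local steps are illustrative assumptions; the sketch does not reproduce the paper's NTK analysis.

```python
import numpy as np

def fedavg_round(global_w, client_data, eta=0.1, local_steps=5):
    """One FedAvg round: K local gradient-descent steps per client on a
    least-squares objective, followed by server-side averaging."""
    local_ws = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5*mean((Xw - y)^2)
            w -= eta * grad
        local_ws.append(w)
    return np.mean(local_ws, axis=0)

# Toy demo with N=4 synthetic clients sharing a common ground-truth weight vector.
rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(3)
for t in range(100):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - w_true))   # distance to the ground truth shrinks toward the noise floor
```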

Integration-free Training for Spatio-temporal Multimodal Covariate Deep Kernel Point Processes

  • paper_url: http://arxiv.org/abs/2310.05485
  • repo_url: None
  • paper_authors: Yixuan Zhang, Quyu Kong, Feng Zhou
  • for: 本研究提出了一种新的深度时空点过程模型——深度核混合点过程(DKMPP),该模型融合了多模态协变量信息。
  • methods: DKMPP使用更灵活的深度核函数来建模事件与协变量数据之间的复杂关系,从而提高模型的表达能力。
  • results: 实验表明,DKMPP及其相应的基于得分的估计器优于基线模型,展示了融合协变量信息、深度核函数与基于得分的估计器的优势。
    Abstract In this study, we propose a novel deep spatio-temporal point process model, Deep Kernel Mixture Point Processes (DKMPP), that incorporates multimodal covariate information. DKMPP is an enhanced version of Deep Mixture Point Processes (DMPP), which uses a more flexible deep kernel to model complex relationships between events and covariate data, improving the model's expressiveness. To address the intractable training procedure of DKMPP due to the non-integrable deep kernel, we utilize an integration-free method based on score matching, and further improve efficiency by adopting a scalable denoising score matching method. Our experiments demonstrate that DKMPP and its corresponding score-based estimators outperform baseline models, showcasing the advantages of incorporating covariate information, utilizing a deep kernel, and employing score-based estimators.
    摘要 在本研究中,我们提出了一种融合多模态协变量信息的新型深度时空点过程模型——深度核混合点过程(DKMPP)。DKMPP是深度混合点过程(DMPP)的增强版本,它使用更灵活的深度核函数来建模事件与协变量数据之间的复杂关系,从而提升模型的表达能力。由于深度核函数不可积,DKMPP的训练过程难以直接处理;为此,我们采用了一种基于得分匹配的免积分方法,并进一步通过可扩展的去噪得分匹配方法提升效率。实验表明,DKMPP及其相应的基于得分的估计器优于基线模型,展示了融合协变量信息、使用深度核函数以及采用基于得分的估计器的优势。
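
As a rough illustration of the integration-free training idea, the sketch below implements a generic denoising score matching loss: data are perturbed with Gaussian noise and a network is regressed onto the known score of the perturbation kernel, so no normalizing integral is ever evaluated. The network, noise scale, and data are placeholders, and this is not the paper's exact estimator for DKMPP.

```python
import torch

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    """Generic denoising score matching: perturb x with Gaussian noise and regress
    the network output onto the analytic score -(noise)/sigma^2 of the kernel."""
    noise = torch.randn_like(x) * sigma
    target = -noise / sigma ** 2
    return ((score_net(x + noise) - target) ** 2).sum(dim=-1).mean()

# Toy usage with a small MLP as a (hypothetical) score network.
score_net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
x = torch.randn(256, 2)
loss = denoising_score_matching_loss(score_net, x)
loss.backward()
print(float(loss))
```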

Vibroacoustic Frequency Response Prediction with Query-based Operator Networks

  • paper_url: http://arxiv.org/abs/2310.05469
  • repo_url: https://github.com/ecker-lab/FQ-Operator
  • paper_authors: Jan van Delden, Julius Schultz, Christopher Blech, Sabine C. Langer, Timo Lüddecke
  • for: 本研究旨在提高机械结构如飞机、汽车和房屋等的震动声波传播的理解,以确保其用户的健康和舒适性。
  • methods: 本研究使用数据驱动模型来加速 numerical simulation,以便进行设计优化、不确定性评估和设计空间探索等任务。特别是,我们提出了一种新的频率查询运算符模型,该模型可以将板体几何特征映射到频率响应函数。
  • results: 我们在一个包含12,000个板体几何特征的全面性 benchmark 上评估了我们的方法,并发现它比 DeepONets、Fourier Neural Operators 和传统神经网络架构更高效。
    Abstract Understanding vibroacoustic wave propagation in mechanical structures like airplanes, cars and houses is crucial to ensure health and comfort of their users. To analyze such systems, designers and engineers primarily consider the dynamic response in the frequency domain, which is computed through expensive numerical simulations like the finite element method. In contrast, data-driven surrogate models offer the promise of speeding up these simulations, thereby facilitating tasks like design optimization, uncertainty quantification, and design space exploration. We present a structured benchmark for a representative vibroacoustic problem: Predicting the frequency response for vibrating plates with varying forms of beadings. The benchmark features a total of 12,000 plate geometries with an associated numerical solution and introduces evaluation metrics to quantify the prediction quality. To address the frequency response prediction task, we propose a novel frequency query operator model, which is trained to map plate geometries to frequency response functions. By integrating principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of resonance peaks of frequency responses. We evaluate the method on our vibrating-plates benchmark and find that it outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures. The code and dataset are available from https://eckerlab.org/code/delden2023_plate.
    摘要 In this study, we present a structured benchmark for a representative vibroacoustic problem: predicting the frequency response of vibrating plates with varying forms of beadings. The benchmark features a total of 12,000 plate geometries with associated numerical solutions and introduces evaluation metrics to quantify prediction quality. To address the frequency response prediction task, we propose a novel frequency query operator model, which is trained to map plate geometries to frequency response functions. By integrating principles from operator learning and implicit models for shape encoding, our approach effectively predicts the resonance peaks of frequency responses. We evaluate our method on our vibrating-plates benchmark and find that it outperforms DeepONets, Fourier Neural Operators, and more traditional neural network architectures. The code and dataset are available at https://eckerlab.org/code/delden2023_plate.

ExIFFI and EIF+: Interpretability and Enhanced Generalizability to Extend the Extended Isolation Forest

  • paper_url: http://arxiv.org/abs/2310.05468
  • repo_url: https://github.com/alessioarcudi/exiffi
  • paper_authors: Alessio Arcudi, Davide Frizzo, Chiara Masiero, Gian Antonio Susto
  • for: 本研究旨在提出一种可解释的异常检测方法,以帮助用户更好地理解模型的预测结果并进行根本分析。
  • methods: 本研究使用了一种加强版的扩展隔离林(EIF),并提出了一种新的可解释方法ExIFFI,该方法通过特征排名来提供异常检测结果的解释。
  • results: 实验结果显示,ExIFFI在异常检测和特征选择方面具有较高的效果和可解释性。此外,研究还提供了一些实际数据集的评估结果,以便进一步研究和复现。
    Abstract Anomaly detection, an essential unsupervised machine learning task, involves identifying unusual behaviors within complex datasets and systems. While Machine Learning algorithms and decision support systems (DSSs) offer effective solutions for this task, simply pinpointing anomalies often falls short in real-world applications. Users of these systems often require insight into the underlying reasons behind predictions to facilitate Root Cause Analysis and foster trust in the model. However, due to the unsupervised nature of anomaly detection, creating interpretable tools is challenging. This work introduces EIF+, an enhanced variant of Extended Isolation Forest (EIF), designed to enhance generalization capabilities. Additionally, we present ExIFFI, a novel approach that equips Extended Isolation Forest with interpretability features, specifically feature rankings. Experimental results provide a comprehensive comparative analysis of Isolation-based approaches for Anomaly Detection, including synthetic and real dataset evaluations that demonstrate ExIFFI's effectiveness in providing explanations. We also illustrate how ExIFFI serves as a valid feature selection technique in unsupervised settings. To facilitate further research and reproducibility, we also provide open-source code to replicate the results.
    摘要 异常检测是机器学习中的一项重要的无监督任务,其目的是在复杂的数据和系统中发现异常行为。机器学习算法和决策支持系统(DSS)可以为此提供有效的解决方案,但在实际应用中,仅仅指出异常点往往是不够的:用户通常需要了解预测背后的原因,以便进行根因分析并建立对模型的信任。然而,由于异常检测的无监督特性,构建可解释的工具十分困难。本工作提出了EIF+,一种扩展隔离森林(EIF)的增强变体,旨在提升其泛化能力。此外,我们还提出了ExIFFI,一种为扩展隔离森林赋予可解释性(具体为特征排名)的新方法。实验在合成与真实数据集上对多种基于隔离的异常检测方法进行了全面的比较分析,展示了ExIFFI在提供解释方面的有效性;我们还说明了ExIFFI可以作为无监督设定下的一种有效特征选择技术。为便于进一步研究和结果复现,我们提供了开源代码。

Temporal Convolutional Explorer Helps Understand 1D-CNN’s Learning Behavior in Time Series Classification from Frequency Domain

  • paper_url: http://arxiv.org/abs/2310.05467
  • repo_url: https://github.com/jrzhang33/tce
  • paper_authors: Junru Zhang, Lang Feng, Yang He, Yuhan Wu, Yabo Dong
  • for: 提高一维卷积神经网络(1D-CNN)在时间序列分类任务中的表现,并解释它们在应用中可能出现的不理想结果。
  • methods: 提出了一种Temporal Convolutional Explorer(TCE)来从频谱角度 empirically explore 1D-CNN 的学习行为。
  • results: 通过在广泛使用的 UCR、UEA 和 UCI 测试集上进行大量实验,显示了以下两点:1) TCE 对 1D-CNN 的学习行为提供了深入的理解;2) 我们的监管框架可以使现有的 1D-CNN 获得更好的表现,同时减少存储和计算开销。
    Abstract While one-dimensional convolutional neural networks (1D-CNNs) have been empirically proven effective in time series classification tasks, we find that there remain undesirable outcomes that could arise in their application, motivating us to further investigate and understand their underlying mechanisms. In this work, we propose a Temporal Convolutional Explorer (TCE) to empirically explore the learning behavior of 1D-CNNs from the perspective of the frequency domain. Our TCE analysis highlights that deeper 1D-CNNs tend to distract the focus from the low-frequency components leading to the accuracy degradation phenomenon, and the disturbing convolution is the driving factor. Then, we leverage our findings to the practical application and propose a regulatory framework, which can easily be integrated into existing 1D-CNNs. It aims to rectify the suboptimal learning behavior by enabling the network to selectively bypass the specified disturbing convolutions. Finally, through comprehensive experiments on widely-used UCR, UEA, and UCI benchmarks, we demonstrate that 1) TCE's insight into 1D-CNN's learning behavior; 2) our regulatory framework enables state-of-the-art 1D-CNNs to get improved performances with less consumption of memory and computational overhead.
    摘要 虽然一维卷积神经网络(1D-CNN)已被实验证明在时间序列分类任务中十分有效,但我们发现其应用中仍可能出现不理想的结果,这促使我们进一步研究并理解其内在机制。在本工作中,我们提出了时间卷积探索器(TCE),从频域角度实证地探索 1D-CNN 的学习行为。TCE 分析表明,更深的 1D-CNN 会使注意力偏离低频成分,从而导致精度下降现象,而造成干扰的卷积是其驱动因素。随后,我们将这些发现应用于实践,提出了一种可以方便地集成到现有 1D-CNN 中的监管框架,其目的是让网络有选择地绕过指定的干扰卷积,从而纠正次优的学习行为。最后,通过在广泛使用的 UCR、UEA 和 UCI 测试集上的全面实验,我们证明了:1) TCE 能够深入揭示 1D-CNN 的学习行为;2) 我们的监管框架能使最先进的 1D-CNN 获得更好的性能,同时减少内存和计算开销。
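
The frequency-domain perspective can be probed with a few lines of NumPy: compute the spectrum of a series (or of an intermediate feature map unrolled over time) and measure how much energy sits in a low-frequency band. The cutoff fraction below is an arbitrary illustrative choice, not the criterion used by TCE.

```python
import numpy as np

def low_band_energy_ratio(x, low_fraction=0.1):
    """Fraction of spectral energy in the lowest `low_fraction` of rFFT bins."""
    spectrum = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    cutoff = max(1, int(len(spectrum) * low_fraction))
    return spectrum[:cutoff].sum() / spectrum.sum()

t = np.linspace(0, 1, 512, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.2 * np.sin(2 * np.pi * 60 * t)
print(round(low_band_energy_ratio(signal), 3))   # dominated by the low-frequency component
```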

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05422
  • repo_url: https://github.com/sw-packages/a498e1142fb23106c12b054225864aab1156087a5ab634a1d88227024ecb1626
  • paper_authors: Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
  • for: 这项研究旨在提高离线强化学习的精度和可行性。
  • methods: 研究人员提出了一种名为"动力学奖励"的隐藏因素,它在不同的转移中保持一致,从而提高了模型的泛化能力。
  • results: 在合成任务上,MOREC展现出强大的泛化能力,甚至能够出人意料地恢复一些相距较远的未见转移;在21个离线任务上,MOREC超越了之前的最佳性能,在D4RL和NeoRL上的提升幅度分别为4.6%和25.9%;MOREC还是第一种能在12个D4RL任务中的6个以及9个NeoRL任务中的3个上达到95%以上在线RL性能的方法。
    Abstract Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model generates a batch of transitions and selects the one with the highest dynamics reward value. On a synthetic task, we visualize that MOREC has a strong generalization ability and can surprisingly recover some distant unseen transitions. On 21 offline tasks in D4RL and NeoRL benchmarks, MOREC improves the previous state-of-the-art performance by a significant margin, i.e., 4.6% on D4RL tasks and 25.9% on NeoRL tasks. Notably, MOREC is the first method that can achieve above 95% online RL performance in 6 out of 12 D4RL tasks and 3 out of 9 NeoRL tasks.
    摘要 学习精确的动力学模型对离线强化学习至关重要,但遗憾的是,这被证明相当困难:通过拟合历史转移学到的动力学模型往往难以泛化到未见过的转移。在本研究中,我们发现了一个隐藏却关键的因素——动力学奖励(dynamics reward),它在不同转移之间保持一致,为更好的泛化提供了途径。因此,我们提出了奖励一致的动力学模型这一思想:由动力学模型生成的任何轨迹都应最大化从数据中得到的动力学奖励。我们将这一思想实现为MOREC(基于模型的奖励一致离线强化学习)方法,它可以无缝集成到已有的离线基于模型的强化学习(MBRL)方法中。MOREC从离线数据中学习一个可泛化的动力学奖励函数,并将其用作任意离线MBRL方法中的转移过滤器:在生成转移时,动力学模型会生成一批候选转移,并选择其中动力学奖励值最高的一个。在一个合成任务上,我们直观地展示了MOREC具有强大的泛化能力,甚至能出人意料地恢复一些相距较远的未见转移。在D4RL和NeoRL基准的21个离线任务上,MOREC显著超越了之前的最佳性能,在D4RL任务上提升4.6%,在NeoRL任务上提升25.9%。值得注意的是,MOREC是第一种能在12个D4RL任务中的6个以及9个NeoRL任务中的3个上达到95%以上在线RL性能的方法。
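
The transition-filter idea can be sketched in a few lines: sample a batch of candidate next states from a stochastic dynamics model and keep the one whose learned dynamics reward is largest. The toy dynamics model, reward function, and candidate count below are stand-ins, not MOREC's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_dynamics(state, action):
    """Stand-in stochastic dynamics model: noisy linear step."""
    return state + 0.1 * action + 0.05 * rng.normal(size=state.shape)

def toy_dynamics_reward(state, action, next_state):
    """Stand-in learned dynamics reward: prefers transitions consistent with
    the noiseless linear step (a proxy for reward-consistent transitions)."""
    return -np.linalg.norm(next_state - (state + 0.1 * action))

def filtered_transition(state, action, n_candidates=8):
    candidates = [toy_dynamics(state, action) for _ in range(n_candidates)]
    scores = [toy_dynamics_reward(state, action, s) for s in candidates]
    return candidates[int(np.argmax(scores))]

s, a = np.zeros(3), np.ones(3)
print(filtered_transition(s, a))
```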

On sparse regression, Lp-regularization, and automated model discovery

  • paper_url: http://arxiv.org/abs/2310.06872
  • repo_url: None
  • paper_authors: Jeremy A. McCulloch, Skyler R. St. Pierre, Kevin Linka, Ellen Kuhl
  • for: automatic model discovery and inducing sparsity in nonlinear regression for material modeling
  • methods: a hybrid approach combining regularization and physical constraints; Lp regularization with constitutive neural networks, comparing L2, L1, and L0 regularization
  • results: discovery of interpretable models and physically meaningful parameters; demonstration that Lp regularized constitutive neural networks can simultaneously achieve interpretability and predictability, with potential applications in generative material design and the discovery of new materials with user-defined properties
    Abstract Sparse regression and feature extraction are the cornerstones of knowledge discovery from massive data. Their goal is to discover interpretable and predictive models that provide simple relationships among scientific variables. While the statistical tools for model discovery are well established in the context of linear regression, their generalization to nonlinear regression in material modeling is highly problem-specific and insufficiently understood. Here we explore the potential of neural networks for automatic model discovery and induce sparsity by a hybrid approach that combines two strategies: regularization and physical constraints. We integrate the concept of Lp regularization for subset selection with constitutive neural networks that leverage our domain knowledge in kinematics and thermodynamics. We train our networks with both, synthetic and real data, and perform several thousand discovery runs to infer common guidelines and trends: L2 regularization or ridge regression is unsuitable for model discovery; L1 regularization or lasso promotes sparsity, but induces strong bias; only L0 regularization allows us to transparently fine-tune the trade-off between interpretability and predictability, simplicity and accuracy, and bias and variance. With these insights, we demonstrate that Lp regularized constitutive neural networks can simultaneously discover both, interpretable models and physically meaningful parameters. We anticipate that our findings will generalize to alternative discovery techniques such as sparse and symbolic regression, and to other domains such as biology, chemistry, or medicine. Our ability to automatically discover material models from data could have tremendous applications in generative material design and open new opportunities to manipulate matter, alter properties of existing materials, and discover new materials with user-defined properties.
    摘要 稀疏回归与特征提取是从海量数据中发现知识的基石,其目标是发现可解释且具有预测能力的模型,用简单的关系刻画科学变量之间的联系。虽然模型发现的统计工具在线性回归的语境下已经相当成熟,但将其推广到材料建模中的非线性回归则高度依赖具体问题,目前仍缺乏充分的理解。在这里,我们探索了利用神经网络进行自动模型发现的潜力,并通过结合两种策略——正则化与物理约束——的混合方法来诱导稀疏性。我们将用于子集选择的Lp正则化与利用运动学和热力学领域知识的本构神经网络相结合,使用合成数据和真实数据训练网络,并进行了数千次模型发现实验,以归纳出共同的规律和趋势:L2正则化(岭回归)不适合模型发现;L1正则化(lasso)能促进稀疏性,但会引入较强的偏差;只有L0正则化能让我们透明地调节可解释性与预测性、简洁性与准确性、偏差与方差之间的权衡。基于这些洞见,我们证明了Lp正则化的本构神经网络能够同时发现可解释的模型和具有物理意义的参数。我们预期这些发现可以推广到稀疏回归、符号回归等其他发现技术,以及生物、化学、医学等其他领域。从数据中自动发现材料模型的能力有望在生成式材料设计中带来巨大的应用,并为操控物质、改变现有材料的性质以及发现具有用户自定义属性的新材料开辟新的机会。
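
The L2/L1/L0 trade-off discussed above is easy to see on a toy least-squares objective with an Lp penalty: L2 shrinks all weights, L1 shrinks and zeroes some, and an L0-style penalty simply counts active terms. This sketch is illustrative only and uses none of the paper's constitutive-network machinery.

```python
import numpy as np

def lp_penalty(w, p, eps=1e-8):
    """Lp penalty: p=2 (ridge), p=1 (lasso), p=0 counts nonzero weights."""
    if p == 0:
        return float(np.sum(np.abs(w) > eps))
    return float(np.sum(np.abs(w) ** p))

def regularized_loss(w, X, y, lam, p):
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2) + lam * lp_penalty(w, p)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])      # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=100)
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
for p in (2, 1, 0):
    print(p, round(regularized_loss(w_hat, X, y, lam=0.1, p=p), 3))
```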

Entropy-MCMC: Sampling from Flat Basins with Ease

  • paper_url: http://arxiv.org/abs/2310.05401
  • repo_url: None
  • paper_authors: Bolian Li, Ruqi Zhang
  • for: 这个论文的目的是提出一种偏置采样方法,以优化深度学习模型的 posterior 采样。
  • methods: 该方法基于一个辅助变量,使 MCMC 采样器偏向平坦区域,从而提高采样效率和准确性。
  • results: 实验结果表明,该方法可以成功采样到深度学习模型后验的平坦区域,并在分类、校准和分布外检测等多个基准测试上表现出色。
    Abstract Bayesian deep learning counts on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, sampling from the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection.
    摘要 贝叶斯深度学习依赖于后验分布估计的质量。然而,深度神经网络的后验本质上是高度多峰的,不同的局部峰所对应的泛化性能差异很大。在实际的计算预算下,直接从原始后验采样可能导致次优的性能,因为部分样本可能被困在"坏"的峰中并出现过拟合。基于"好"的峰(泛化误差低)往往位于能量地形平坦盆地中的观察,我们提出使后验采样偏向这些平坦区域。具体而言,我们引入一个辅助引导变量,其平稳分布近似于去除了尖峰的平滑后验,用以引导MCMC采样器进入平坦盆地。通过将该引导变量与模型参数联合,我们构造了一个简单的联合分布,能够以极小的额外计算开销实现高效采样。我们证明了该方法的收敛性,并进一步证明在强凸情形下它比若干现有的平坦性感知方法收敛更快。实验结果表明,我们的方法能够成功地从后验的平坦盆地中采样,并在分类、校准和分布外检测等多个基准测试上优于所有对比基线。

Find Your Optimal Assignments On-the-fly: A Holistic Framework for Clustered Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05397
  • repo_url: None
  • paper_authors: Yongxin Guo, Xiaoying Tang, Tao Lin
  • for: 这个论文旨在探讨现有的分布式机器学习方法中,如何处理客户端数据不同性,以提高模型在所有客户端上的表现。
  • methods: 该论文使用了聚类技术来解决客户端数据不同性的问题,并提出了一种四层框架,称为HCFL,以涵盖和扩展现有的方法。
  • results: 该论文通过广泛的数值评估表明,使用提出的聚类方法可以提高模型在客户端数据不同性下的表现,并且提出了进一步改进的聚类方法。
    Abstract Federated Learning (FL) is an emerging distributed machine learning approach that preserves client privacy by storing data on edge devices. However, data heterogeneity among clients presents challenges in training models that perform well on all local distributions. Recent studies have proposed clustering as a solution to tackle client heterogeneity in FL by grouping clients with distribution shifts into different clusters. However, the diverse learning frameworks used in current clustered FL methods make it challenging to integrate various clustered FL methods, gather their benefits, and make further improvements. To this end, this paper presents a comprehensive investigation into current clustered FL methods and proposes a four-tier framework, namely HCFL, to encompass and extend existing approaches. Based on the HCFL, we identify the remaining challenges associated with current clustering methods in each tier and propose an enhanced clustering method called HCFL+ to address these challenges. Through extensive numerical evaluations, we showcase the effectiveness of our clustering framework and the improved components. Our code will be publicly available.
    摘要 联邦学习(FL)是一种新兴的分布式机器学习方法,通过将数据保存在边缘设备上来保护客户端隐私。然而,客户端之间的数据异质性给训练在所有本地分布上都表现良好的模型带来了挑战。近期研究提出以聚类来应对FL中的客户端异质性,即将存在分布偏移的客户端划分到不同的簇中。然而,现有聚类联邦学习方法采用的学习框架各不相同,难以整合各种方法、汇集它们的优点并作进一步改进。为此,本文对现有的聚类联邦学习方法进行了全面的梳理,并提出了一个四层框架HCFL,以涵盖并扩展现有方法。基于HCFL,我们识别了各层中现有聚类方法仍存在的挑战,并提出了改进的聚类方法HCFL+来应对这些挑战。通过大量的数值评估,我们展示了所提聚类框架及其改进组件的有效性。我们的代码将公开发布。

Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning

  • paper_url: http://arxiv.org/abs/2310.05395
  • repo_url: None
  • paper_authors: Agnibh Dasgupta, Xin Zhong
  • for: 这篇论文旨在提出一种鲁棒的图像水印方法,用于在载体图像中嵌入和提取水印,并利用深度学习方法增强泛化能力和鲁棒性。
  • methods: 现有方法主要使用卷积和拼接来实现水印嵌入,并在训练过程中融入可能的数据增广;本文则引入交叉注意力与不变域学习。
  • results: 这篇论文提出了两项新颖且重要的进展:其一,设计了基于多头交叉注意力机制的水印嵌入技术,使载体图像与水印之间能够交换信息,从而识别语义上合适的嵌入位置;其二,提出学习一种同时刻画水印语义信息与噪声不变信息的不变域表示,为改进图像水印技术指明了有前景的方向。
    Abstract Image watermarking involves embedding and extracting watermarks within a cover image, with deep learning approaches emerging to bolster generalization and robustness. Predominantly, current methods employ convolution and concatenation for watermark embedding, while also integrating conceivable augmentation in the training process. This paper explores a robust image watermarking methodology by harnessing cross-attention and invariant domain learning, marking two novel, significant advancements. First, we design a watermark embedding technique utilizing a multi-head cross attention mechanism, enabling information exchange between the cover image and watermark to identify semantically suitable embedding locations. Second, we advocate for learning an invariant domain representation that encapsulates both semantic and noise-invariant information concerning the watermark, shedding light on promising avenues for enhancing image watermarking techniques.
    摘要 图像水印技术是在载体图像中嵌入并提取水印的过程,近年来深度学习方法被用于增强其泛化能力与鲁棒性。目前的方法主要依靠卷积和拼接来实现水印嵌入,并在训练过程中融入可能的数据增广。本文借助交叉注意力与不变域学习,探索了一种鲁棒的图像水印方法,并带来两项新颖且重要的进展。其一,我们设计了一种基于多头交叉注意力机制的水印嵌入技术,使载体图像与水印之间能够交换信息,从而识别语义上合适的嵌入位置。其二,我们提倡学习一种不变域表示,同时刻画水印的语义信息与噪声不变信息,为改进图像水印技术指明了有前景的方向。
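
A minimal sketch of the cross-attention step is shown below: cover-image patch tokens act as queries and watermark tokens as keys/values, and the attended features are added back as a residual. Token dimensions, the patchify step, and the residual projection are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionEmbedder(nn.Module):
    """Cover tokens attend to watermark tokens via multi-head cross-attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, cover_tokens, watermark_tokens):
        # query = cover tokens, key/value = watermark tokens
        attended, _ = self.attn(cover_tokens, watermark_tokens, watermark_tokens)
        return cover_tokens + self.proj(attended)

embedder = CrossAttentionEmbedder()
cover = torch.randn(2, 196, 64)       # e.g. 14x14 patch tokens
watermark = torch.randn(2, 32, 64)    # 32 watermark tokens
print(embedder(cover, watermark).shape)   # torch.Size([2, 196, 64])
```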

Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

  • paper_url: http://arxiv.org/abs/2310.05387
  • repo_url: None
  • paper_authors: Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney
  • for: The paper addresses discovering governing equations from data, which is important in many scientific and engineering applications.
  • methods: The paper proposes a novel equation discovery method based on kernel learning and Bayesian Spike-and-Slab priors (KBASS), which combines kernel regression with a Bayesian spike-and-slab prior for effective operator selection and uncertainty quantification.
  • results: The paper shows the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks, demonstrating its ability to overcome data sparsity and noise issues, as well as provide uncertainty quantification.
    Abstract Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
    摘要 从数据中发现控制方程对许多科学和工程应用都十分重要。尽管已有方法取得了可喜的成果,但它们仍受到数据稀疏和噪声问题的困扰,而这两者在实践中无处不在。此外,现有最先进的方法缺乏不确定性量化,且训练代价高昂。为克服这些限制,我们提出了一种基于核学习与贝叶斯尖峰-平板先验的新型方程发现方法(KBASS)。我们使用核回归来估计目标函数,这种方式灵活、表达力强,并且对数据稀疏和噪声更加鲁棒;再将其与贝叶斯尖峰-平板先验——一种理想的贝叶斯稀疏分布——相结合,以实现有效的算子选择和不确定性量化。我们开发了一种期望传播-期望最大化(EP-EM)算法,用于高效的后验推断和函数估计。为克服核回归的计算挑战,我们将函数值放置在网格上以诱导Kronecker积结构,并借助张量代数方法实现高效的计算与优化。我们在一系列基准ODE和PDE发现任务上展示了KBASS的显著优势。

Augmented Embeddings for Custom Retrievals

  • paper_url: http://arxiv.org/abs/2310.05380
  • repo_url: None
  • paper_authors: Anirudh Khatry, Yasharth Bajpai, Priyanshu Gupta, Sumit Gulwani, Ashish Tiwari
  • for: 这篇论文主要研究如何利用稠密检索技术来改进异构、严格的检索效果,以服务于为大语言模型(LLM)准备提示等新型任务。
  • methods: 论文提出了一种名为 Adapted Dense Retrieval 的机制,通过对预训练嵌入学习低秩残差适应,使其适配特定任务,从而改进异构、严格的检索效果。
  • results: 实验证明,Adapted Dense Retrieval 相比现有基于通用预训练嵌入的基线方法,在异构、严格的检索任务上取得了更好的检索效果。
    Abstract Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed as homogeneous and relaxed, where queries and corpus elements are both natural language (NL) utterances (homogeneous) and the goal is to pick most relevant elements from the corpus in the Top-K, where K is large, such as 10, 25, 50 or even 100 (relaxed). Recently, retrieval is being used extensively in preparing prompts for large language models (LLMs) to enable LLMs to perform targeted tasks. These new applications of retrieval are often heterogeneous and strict -- the queries and the corpus contain different kinds of entities, such as NL and code, and there is a need for improving retrieval at Top-K for small values of K, such as K=1 or 3 or 5. Current dense retrieval techniques based on pretrained embeddings provide a general-purpose and powerful approach for retrieval, but they are oblivious to task-specific notions of similarity of heterogeneous artifacts. We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding. We empirically validate our approach by showing improvements over the state-of-the-art general-purpose embeddings-based baseline.
    摘要 信息检索是指从语料库中选出与给定查询最相关的条目。经典应用中的检索通常是同构且宽松的:查询和语料元素都是自然语言(NL)语句(同构),目标是从语料库中选出Top-K个最相关的条目,其中K较大,例如10、25、50甚至100(宽松)。近年来,检索被广泛用于为大语言模型(LLM)准备提示,使其能够完成特定任务。这些新的检索应用往往是异构且严格的:查询和语料库包含不同类型的实体(如自然语言和代码),并且需要在K很小(如K=1、3或5)时改进Top-K检索效果。当前基于预训练嵌入的稠密检索技术提供了一种通用而强大的检索方案,但它们无法感知异构条目之间与具体任务相关的相似性概念。我们提出了Adapted Dense Retrieval,一种通过变换嵌入来改进面向特定任务的异构、严格检索的机制,其做法是学习对预训练黑盒嵌入的低秩残差适应。我们通过实验验证了该方法,相比最先进的通用嵌入基线取得了改进。
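
The core of the adaptation can be written as a low-rank residual on top of a frozen black-box embedding, e -> e + B(Ae), trained with a task-specific similarity objective. The dimensions, initialization, and the cosine-similarity retrieval below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

class LowRankAdapter:
    """Adapt a frozen embedding e -> e + B @ (A @ e); B starts at zero so the
    adapted embedding initially equals the pretrained one."""
    def __init__(self, dim, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.02, size=(rank, dim))
        self.B = np.zeros((dim, rank))

    def __call__(self, e):
        return e + self.B @ (self.A @ e)

def top_k(query_emb, corpus_embs, k=3):
    """Cosine-similarity retrieval over (adapted) embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    C = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    return np.argsort(-C @ q)[:k]

adapter = LowRankAdapter(dim=384, rank=16)
rng = np.random.default_rng(1)
corpus = rng.normal(size=(100, 384))
query = corpus[7] + 0.1 * rng.normal(size=384)
print(top_k(adapter(query), np.stack([adapter(e) for e in corpus])))
```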

Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

  • paper_url: http://arxiv.org/abs/2310.05350
  • repo_url: None
  • paper_authors: Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker, Priyanka Ranade, Torstein Collett, Grant Hodgson Perez, Christopher Krieger
  • for: 这个论文主要针对AI加速器处理能力和内存限制的问题,旨在探讨如何在可接受时间内执行机器学习任务(如训练和推理)。
  • methods: 这篇论文使用了分布式算法和电路优化技术来进行多节点环境中的模型扩展,提高模型训练和预处理的效率,并尝试将更多参数存储在有限的资源中。
  • results: 研究项目中对5个encoder-decoder LLMS进行了并行和分布式机器学习算法开发,并进行了细化的研究以量化三种ML并行方法(包括Microsoft DeepSpeed Zero Redundancy Optimizer(ZeRO)阶段)的关系。
    Abstract AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art, transformer-based model today requires use of GPU-accelerated high performance computers with high-speed interconnects. As datasets and models continue to increase in size, computational requirements and memory demands for AI also continue to grow. These challenges have inspired the development of distributed algorithm and circuit-based optimization techniques that enable the ability to progressively scale models in multi-node environments, efficiently minimize neural network cost functions for faster convergence, and store more parameters into a set number of available resources. In our research project, we focus on parallel and distributed machine learning algorithm development, specifically for optimizing the data processing and pre-training of a set of 5 encoder-decoder LLMs, ranging from 580 million parameters to 13 billion parameters. We performed a fine-grained study to quantify the relationships between three ML parallelism methods, specifically exploring Microsoft DeepSpeed Zero Redundancy Optimizer (ZeRO) stages.
    摘要 AI加速器的处理能力和内存限制在很大程度上决定了机器学习工作负载(例如训练和推理)能够在理想时间内完成的规模。如今,训练最先进的基于Transformer的模型需要使用配备高速互连的GPU加速高性能计算机。随着数据集和模型规模不断增长,AI的计算需求和内存需求也在持续上升。这些挑战推动了分布式算法和基于电路的优化技术的发展,使模型能够在多节点环境中逐步扩展,高效地最小化神经网络代价函数以加快收敛,并在有限的可用资源中存放更多参数。在我们的研究项目中,我们专注于并行与分布式机器学习算法的开发,特别是优化一组5个编码器-解码器LLM(参数规模从5.8亿到130亿)的数据处理和预训练。我们进行了细粒度的研究,以量化三种机器学习并行方法之间的关系,重点探索了Microsoft DeepSpeed零冗余优化器(ZeRO)的各个阶段。
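
For readers unfamiliar with the ZeRO stages mentioned above, the dictionary below shows the general shape of a DeepSpeed configuration that selects a stage (stage 1 partitions optimizer states, stage 2 additionally partitions gradients, stage 3 also partitions parameters). The specific values are placeholders, and field names should be checked against the DeepSpeed documentation before use.

```python
# Illustrative DeepSpeed/ZeRO configuration (normally written to ds_config.json
# and passed to the deepspeed launcher); values here are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,            # 1: optimizer states, 2: + gradients, 3: + parameters
        "overlap_comm": True,  # overlap communication with backward compute
    },
}
```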

DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05333
  • repo_url: https://github.com/felix-thu/DiffCPS
  • paper_authors: Longxiang He, Linrui Zhang, Junbo Tan, Xueqian Wang
  • for: 解决离线强化学习中的受限策略搜索问题,提出一种基于扩散模型的受限策略搜索方法(DiffCPS),以其高表达能力替代先前基于AWR的做法。
  • methods: 利用扩散模型的动作分布来消除受限策略搜索中的策略分布约束,再使用基于扩散模型策略的证据下界(ELBO)来近似KL约束。
  • results: 在D4RL数据集上进行了广泛的实验,证明DiffCPS的表现优于或至少不逊于传统基于AWR的基线以及近期基于扩散模型的离线RL基线。代码可在 $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$ 获取。
    Abstract Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying the state-of-the-art models with distribution expression capabilities (i.e., diffusion models) in the AWR framework is insufficient since AWR requires exact policy probability densities, which is intractable in diffusion models. In this paper, we propose a novel approach called $\textbf{Diffusion Model based Constrained Policy Search (DiffCPS)}$, which tackles the diffusion-based constrained policy search without resorting to AWR. The theoretical analysis reveals our key insights by leveraging the action distribution of the diffusion model to eliminate the policy distribution constraint in the CPS and then utilizing the Evidence Lower Bound (ELBO) of diffusion-based policy to approximate the KL constraint. Consequently, DiffCPS admits the high expressivity of diffusion models while circumventing the cumbersome density calculation brought by AWR. Extensive experimental results based on the D4RL benchmark demonstrate the efficacy of our approach. We empirically show that DiffCPS achieves better or at least competitive performance compared to traditional AWR-based baselines as well as recent diffusion-based offline RL methods. The code is now available at $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$.
    摘要 受限策略搜索(CPS)是离线强化学习中的一个基本问题,通常通过优势加权回归(AWR)来求解。然而,由于基于高斯分布的策略表达能力有限,先前的方法仍可能产生分布外的动作。另一方面,直接在AWR框架中套用具有强分布表达能力的最新模型(即扩散模型)也不可行,因为AWR需要精确的策略概率密度,而这在扩散模型中难以求得。本文提出了一种名为"基于扩散模型的受限策略搜索(DiffCPS)"的新方法,在不依赖AWR的情况下解决基于扩散模型的受限策略搜索问题。理论分析揭示了我们的关键洞见:利用扩散模型的动作分布消除CPS中的策略分布约束,再利用基于扩散模型策略的证据下界(ELBO)来近似KL约束。因此,DiffCPS既保留了扩散模型的高表达能力,又避免了AWR带来的繁琐密度计算。基于D4RL基准的大量实验结果证明了该方法的有效性:DiffCPS的表现优于或至少不逊于传统基于AWR的基线以及近期基于扩散模型的离线RL方法。代码已在 $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$ 发布。

Unlearning with Fisher Masking

  • paper_url: http://arxiv.org/abs/2310.05331
  • repo_url: https://github.com/shivank21/Unlearning-with-Fisher-Masking
  • paper_authors: Yufang Liu, Changzhi Sun, Yuanbin Wu, Aimin Zhou
  • for: Machine unlearning aims to revoke some training data after learning in response to requests from users, model developers, and administrators.
  • methods: The proposed method uses a new masking strategy tailored to unlearning based on Fisher information.
  • results: The proposed method can unlearn almost completely while maintaining most of the performance on the remaining data, and exhibits stronger stability compared to other unlearning baselines.
    Abstract Machine unlearning aims to revoke some training data after learning in response to requests from users, model developers, and administrators. Most previous methods are based on direct fine-tuning, which may neither remove data completely nor retain full performances on the remain data. In this work, we find that, by first masking some important parameters before fine-tuning, the performances of unlearning could be significantly improved. We propose a new masking strategy tailored to unlearning based on Fisher information. Experiments on various datasets and network structures show the effectiveness of the method: without any fine-tuning, the proposed Fisher masking could unlearn almost completely while maintaining most of the performance on the remain data. It also exhibits stronger stability compared to other unlearning baselines
    摘要 机器遗忘旨在根据用户、模型开发者和管理员的请求,在模型学习完成后撤销部分训练数据的影响。以往大多数方法基于直接微调,这既可能无法彻底移除数据,也可能无法在保留数据上维持完整的性能。在本工作中,我们发现,先对一些重要参数进行掩蔽再进行微调,可以显著提升遗忘效果。我们提出了一种基于费舍尔信息、专为遗忘设计的新掩蔽策略。在多个数据集和网络结构上的实验表明了该方法的有效性:即使不进行任何微调,所提出的费舍尔掩蔽也几乎可以完全遗忘,同时保留保留数据上的大部分性能;与其他遗忘基线相比,它还表现出更强的稳定性。
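
A rough sketch of Fisher-based masking: approximate the diagonal Fisher information with averaged squared gradients, then freeze (mask) the parameters whose scores fall in the top quantile before the subsequent update. The model, threshold, and the choice of which side of the quantile to freeze are illustrative assumptions and differ from the paper's exact procedure.

```python
import torch
import torch.nn as nn

def diagonal_fisher(model, loss_fn, batches):
    """Approximate diagonal Fisher information by averaging squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

def fisher_masks(fisher, freeze_ratio=0.2):
    """Binary masks that zero out updates for the top `freeze_ratio` Fisher scores."""
    flat = torch.cat([f.flatten() for f in fisher.values()])
    threshold = torch.quantile(flat, 1.0 - freeze_ratio)
    return {n: (f < threshold).float() for n, f in fisher.items()}

# Toy demo: two random batches through a tiny classifier.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
batches = [(torch.randn(16, 10), torch.randint(0, 2, (16,))) for _ in range(2)]
fisher = diagonal_fisher(model, nn.CrossEntropyLoss(), batches)
masks = fisher_masks(fisher)
print({n: float(m.mean()) for n, m in masks.items()})   # fraction of parameters left trainable
```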

Provable Compositional Generalization for Object-Centric Learning

  • paper_url: http://arxiv.org/abs/2310.05327
  • repo_url: None
  • paper_authors: Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel
  • for: bridging the gap between human and machine perception
  • methods: learning object-centric representations, using autoencoders with structural assumptions and enforcing encoder-decoder consistency
  • results: provable compositional generalization of object-centric representations through identifiability theory, validated through experiments on synthetic image data.
    Abstract Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception. One prominent effort is learning object-centric representations, which are widely conjectured to enable compositional generalization. Yet, it remains unclear when this conjecture will be true, as a principled theoretical or empirical understanding of compositional generalization is lacking. In this work, we investigate when compositional generalization is guaranteed for object-centric representations through the lens of identifiability theory. We show that autoencoders that satisfy structural assumptions on the decoder and enforce encoder-decoder consistency will learn object-centric representations that provably generalize compositionally. We validate our theoretical result and highlight the practical relevance of our assumptions through experiments on synthetic image data.
    摘要 学习能够泛化到已知概念的新组合的表示,对于弥合人类感知与机器感知之间的差距至关重要。一项重要的努力是学习以对象为中心的表示,人们普遍推测这种表示能够实现组合泛化。然而,由于目前缺乏对组合泛化的系统性理论或实证理解,这一推测何时成立仍不清楚。在本工作中,我们通过可识别性理论的视角,研究以对象为中心的表示在什么条件下能够保证组合泛化。我们证明,满足关于解码器的结构性假设并强制编码器-解码器一致性的自编码器,所学到的以对象为中心的表示可被证明具有组合泛化能力。我们在合成图像数据上的实验验证了这一理论结果,并凸显了所需假设的实际意义。

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

  • paper_url: http://arxiv.org/abs/2310.05324
  • repo_url: https://github.com/acstarnes/wain23-policy-regularization
  • paper_authors: Andrew Starnes, Anton Dereventsov, Clayton Webster
  • for: 本研究考察了正则化对基于策略梯度训练的强化学习智能体所生成策略的动作多样性的影响。
  • methods: 本文在策略的优化目标函数中加入由多种$\varphi$-散度和最大均值差异(MMD)构造的正则项,以促进策略的多样性。
  • results: 数值实验表明,使用促进多样性的策略正则化可以提升多种个性化任务的性能,且不会牺牲准确性。
    Abstract In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldomly, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy which encourages current policies to follow different state visitation and/or action choice distribution than previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and that its use on gradient-based approaches have significantly improved performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
    摘要 在本工作中,我们研究了正则化对由策略梯度训练的强化学习智能体所生成策略的动作多样性的影响。策略梯度智能体容易出现熵坍缩,即某些动作很少甚至从未被选择。我们在策略的优化目标函数中加入由多种$\varphi$-散度和最大均值差异(MMD)构造的正则项,鼓励当前策略的状态访问和/或动作选择分布不同于先前计算得到的策略。我们在MNIST、CIFAR10和Spotify数据集上进行了数值实验,结果表明了促进多样性的策略正则化的优势,其在基于梯度的方法上的应用显著提升了多种个性化任务的性能。此外,数值结果还表明,策略正则化能够在不损失准确性的前提下提升表现。
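
A minimal way to see a diversity-promoting term in code: a REINFORCE-style loss augmented with an entropy bonus, which is the simplest stand-in for the phi-divergence / MMD regularizers studied in the paper.

```python
import torch
import torch.nn.functional as F

def pg_loss_with_entropy(logits, actions, advantages, beta=0.01):
    """Policy-gradient loss minus an entropy bonus that counteracts entropy collapse."""
    logp = F.log_softmax(logits, dim=-1)
    chosen_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)
    return -(chosen_logp * advantages).mean() - beta * entropy.mean()

logits = torch.randn(32, 5, requires_grad=True)
actions = torch.randint(0, 5, (32,))
advantages = torch.randn(32)
loss = pg_loss_with_entropy(logits, actions, advantages)
loss.backward()
print(float(loss))
```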

eess.IV - 2023-10-09

Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2310.06162
  • repo_url: None
  • paper_authors: Mohammad Peivandi, Jason Zhang, Michael Lu, Dongxiao Zhu, Zhifeng Kou
  • for: 本研究旨在提高基于Segment Anything Model(SAM)的脑肿划分精度。
  • methods: 本研究使用了传输学习和Decathlon脑肿数据集来强化SAM的面掩码解码器。其中,对四维数据进行了三维封装,并使用了随机旋转和弹性变形来增加训练数据的大小。
  • results: 对比预训练的SAM和nnUNetv2,改进后的SAM在脑肿划分精度方面显示了显著提高,而nnUNetv2在总划分精度方面赢得了比较高的分数。然而,改进后的SAM在挑战性较高的案例中表现更为稳定,尤其是在 Hausdorff 距离95%的情况下。
    Abstract Brain tumor segmentation presents a formidable challenge in the field of Medical Image Segmentation. While deep-learning models have been useful, human expert segmentation remains the most accurate method. The recently released Segment Anything Model (SAM) has opened up the opportunity to apply foundation models to this difficult task. However, SAM was primarily trained on diverse natural images. This makes applying SAM to biomedical segmentation, such as brain tumors with less defined boundaries, challenging. In this paper, we enhanced SAM's mask decoder using transfer learning with the Decathlon brain tumor dataset. We developed three methods to encapsulate the four-dimensional data into three dimensions for SAM. An on-the-fly data augmentation approach has been used with a combination of rotations and elastic deformations to increase the size of the training dataset. Two key metrics: the Dice Similarity Coefficient (DSC) and the Hausdorff Distance 95th Percentile (HD95), have been applied to assess the performance of our segmentation models. These metrics provided valuable insights into the quality of the segmentation results. In our evaluation, we compared this improved model to two benchmarks: the pretrained SAM and the widely used model, nnUNetv2. We find that the improved SAM shows considerable improvement over the pretrained SAM, while nnUNetv2 outperformed the improved SAM in terms of overall segmentation accuracy. Nevertheless, the improved SAM demonstrated slightly more consistent results than nnUNetv2, especially on challenging cases that can lead to larger Hausdorff distances. In the future, more advanced techniques can be applied in order to further improve the performance of SAM on brain tumor segmentation.
    摘要 脑肿瘤分割是医学图像分割领域中的一大挑战。尽管深度学习模型已发挥了作用,人工专家分割仍然是最准确的方法。最近发布的Segment Anything Model(SAM)为将基础模型应用于这一难题开创了可能。然而,SAM主要在多样的自然图像上训练,这使得将其用于边界不够清晰的脑肿瘤等生物医学分割任务颇具挑战。在本文中,我们利用Decathlon脑肿瘤数据集,通过迁移学习增强了SAM的掩码解码器。我们开发了三种将四维数据封装为三维数据的方法,以便输入SAM;并采用在线数据增强策略,结合旋转和弹性变形来扩充训练集。我们使用Dice相似系数(DSC)和95% Hausdorff距离(HD95)两个关键指标评估分割模型的性能,这两个指标为分割结果的质量提供了有价值的评估。在评估中,我们将改进后的模型与两个基准进行比较:预训练的SAM和广泛使用的nnUNetv2。结果表明,改进后的SAM相比预训练的SAM有显著提升,而nnUNetv2在整体分割精度上优于改进后的SAM;不过,改进后的SAM在容易产生较大Hausdorff距离的高难度案例上表现得比nnUNetv2略为稳定。未来可以引入更先进的技术,进一步提升SAM在脑肿瘤分割上的性能。
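
The first of the two evaluation metrics is easy to state in code: the Dice Similarity Coefficient between a predicted and a reference binary mask (HD95 additionally needs surface-distance computations and is omitted here). This is the standard definition, not code from the study.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((64, 64), dtype=bool); a[16:48, 16:48] = True
b = np.zeros((64, 64), dtype=bool); b[20:52, 16:48] = True
print(round(dice_coefficient(a, b), 3))   # 0.875 for these two overlapping squares
```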

Dipole-Spread Function Engineering for 6D Super-Resolution Microscopy

  • paper_url: http://arxiv.org/abs/2310.05810
  • repo_url: None
  • paper_authors: Tingting Wu, Matthew D. Lew
  • for: 这篇论文旨在综述荧光分子的六维超分辨单分子取向-定位显微成像技术(SMOLM)。
  • methods: 论文详细介绍了荧光偶极子的成像理论,以及如何通过相位和偏振调制来改变显微镜所成的偶极子扩散函数(DSF)图像;还描述了若干面向最优性能的调制设计方法,并比较了最新的技术,包括double-helix、tetrapod、crescent和DeepSTORM3D学习的点扩散函数(PSF)。
  • results: 论文还详细介绍了一些实际应用,包括生物学应用,以及未来技术的发展和挑战。
    Abstract Fluorescent molecules are versatile nanoscale emitters that enable detailed observations of biophysical processes with nanoscale resolution. Because they are well-approximated as electric dipoles, imaging systems can be designed to visualize their 3D positions and 3D orientations, so-called dipole-spread function (DSF) engineering, for 6D super-resolution single-molecule orientation-localization microscopy (SMOLM). We review fundamental image-formation theory for fluorescent di-poles, as well as how phase and polarization modulation can be used to change the image of a dipole emitter produced by a microscope, called its DSF. We describe several methods for designing these modulations for optimum performance, as well as compare recently developed techniques, including the double-helix, tetrapod, crescent, and DeepSTORM3D learned point-spread functions (PSFs), in addition to the tri-spot, vortex, pixOL, raPol, CHIDO, and MVR DSFs. We also cover common imaging system designs and techniques for implementing engineered DSFs. Finally, we discuss recent biological applications of 6D SMOLM and future challenges for pushing the capabilities and utility of the technology.
    摘要 荧光分子是一类用途广泛的纳米尺度发射体,能够以纳米级分辨率细致观察生物物理过程。由于它们可以很好地近似为电偶极子,成像系统可以被设计成同时可视化其三维位置和三维取向,即所谓的偶极子扩散函数(DSF)工程,从而实现六维超分辨单分子取向-定位显微成像(SMOLM)。本文回顾了荧光偶极子的基本成像理论,以及如何利用相位和偏振调制来改变显微镜所成的偶极子发射体图像(即其DSF)。我们描述了若干面向最优性能的调制设计方法,并比较了近期提出的技术,包括double-helix、tetrapod、crescent和DeepSTORM3D学习的点扩散函数(PSF),以及tri-spot、vortex、pixOL、raPol、CHIDO和MVR等DSF。我们还介绍了常见的成像系统设计以及实现工程化DSF的技术。最后,我们讨论了六维SMOLM近期的生物学应用,以及进一步提升该技术能力与实用性所面临的挑战。

Efficient Predictive Coding of Intra Prediction Modes

  • paper_url: http://arxiv.org/abs/2310.05623
  • repo_url: None
  • paper_authors: Kevin Reuzé, Wassim Hamidouche, Pierrick Philippe, Olivier Déforges
  • for: 提高HEVC标准和JEM编码器的压缩效率,特别是在Intra块的压缩中。
  • methods: 提出了一种基于上下文信息的专用编码方案,包括预测、聚类和编码三个步骤,每个步骤都通过引入新元素(预测标签、聚类测试和编码码字)得到改进;并使用遗传算法在最小化码率代价的目标下寻找最高效的编码方案。
  • results: 在HEVC标准下,我们的方法可以实现显著的比特率减少,同时保持JEM编码器的编码效率,这些结果表明了我们的方法在压缩效率方面的潜在提升。
    Abstract The high efficiency video coding (HEVC) standard and the joint exploration model (JEM) codec incorporate 35 and 67 intra prediction modes (IPMs) respectively, which are essential for efficient compression of Intra coded blocks. These IPMs are transmitted to the decoder through a coding scheme. In our paper, we present an innovative approach to construct a dedicated coding scheme for IPM based on contextual information. This approach comprises three key steps: prediction, clustering, and coding, each of which has been enhanced by introducing new elements, namely, labels for prediction, tests for clustering, and codes for coding. In this context, we have proposed a method that utilizes a genetic algorithm to minimize the rate cost, aiming to derive the most efficient coding scheme while leveraging the available labels, tests, and codes. The resulting coding scheme, expressed as a binary tree, achieves the highest coding efficiency for a given level of complexity. In our experimental evaluation under the HEVC standard, we observed significant bitrate gains while maintaining coding efficiency under the JEM codec. These results demonstrate the potential of our approach to improve compression efficiency, particularly under the HEVC standard, while preserving the coding efficiency of the JEM codec.
    摘要 高效视频编码(HEVC)标准和联合探索模型(JEM)编码器共有35和67内部预测模式(IPM),这些IPM是为高效压缩内部块的必需组成部分。这些IPM通过编码方案传输到解码器。在我们的论文中,我们提出了一种创新的方法,基于上下文信息来构建专门的编码方案。这种方法包括三个关键步骤:预测、聚类和编码,每一步都通过引入新的元素来增强,例如标签 для预测、测试 для聚类和编码。在这个上下文中,我们提出了一种使用遗传算法来最小化比特成本,以 derivate最高效的编码方案,同时利用可用的标签、测试和编码。结果表明,该编码方案,表示为二进制树,在给定的复杂度下实现了最高的编码效率。在我们的实验中,使用HEVC标准,我们观察到了显著的比特率减少,同时保持JEM编码器的编码效率。这些结果表明了我们的方法的潜在提高压缩效率,特别是在HEVC标准下,而且不会削弱JEM编码器的编码效率。

Longitudinal Volumetric Study for the Progression of Alzheimer’s Disease from Structural MR Images

  • paper_url: http://arxiv.org/abs/2310.05558
  • repo_url: None
  • paper_authors: Prayas Sanyal, Srinjay Mukherjee, Arkapravo Das, Anindya Sen
  • for: This paper aims to survey imaging biomarkers corresponding to the progression of Alzheimer’s Disease (AD).
  • methods: The pipeline implemented includes modern pre-processing techniques such as spatial image registration, skull stripping, and inhomogeneity correction. The segmentation of tissue classes is done using an unsupervised learning approach based on intensity histogram information.
  • results: The study found that the structural change in the form of volumes of cerebrospinal fluid (CSF), grey matter (GM), and white matter (WM) can be used to track the progression of Alzheimer's Disease (AD). The segmented features provide insights such as atrophy, increase or intolerable shifting of GM, WM and CSF, which can help in future research for automated analysis of Alzheimer's detection with clinical domain explainability.
    Abstract Alzheimer's Disease (AD) is primarily an irreversible neurodegenerative disorder affecting millions of individuals today. The prognosis of the disease solely depends on treating symptoms as they arise and proper caregiving, as there are no current medical preventative treatments. For this purpose, early detection of the disease at its most premature state is of paramount importance. This work aims to survey imaging biomarkers corresponding to the progression of Alzheimer's Disease (AD). A longitudinal study of structural MR images was performed for given temporal test subjects selected randomly from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The pipeline implemented includes modern pre-processing techniques such as spatial image registration, skull stripping, and inhomogeneity correction. The temporal data across multiple visits spanning several years helped identify the structural change in the form of volumes of cerebrospinal fluid (CSF), grey matter (GM), and white matter (WM) as the patients progressed further into the disease. Tissue classes are segmented using an unsupervised learning approach using intensity histogram information. The segmented features thus extracted provide insights such as atrophy, increase or intolerable shifting of GM, WM and CSF and should help in future research for automated analysis of Alzheimer's detection with clinical domain explainability.
    摘要 阿尔茨海默病(AD)主要是一种不可逆的神经退行性疾病,如今影响着数以百万计的人群。由于目前尚无医学上的预防性治疗手段,该病的预后完全依赖于对症治疗和妥善的照护。因此,在疾病最早期阶段进行早期发现至关重要。本工作旨在梳理与阿尔茨海默病进展相对应的影像生物标志物。我们从阿尔茨海默病神经影像计划(ADNI)数据库中随机选取受试者,对其结构MR图像进行了纵向研究。所实现的流程包含现代预处理技术,如空间图像配准、颅骨剥离和不均匀性校正。跨越数年的多次随访数据帮助我们识别出随着病情进展,脑脊液(CSF)、灰质(GM)和白质(WM)体积所发生的结构变化。组织类别的分割采用基于灰度直方图信息的无监督学习方法。由此提取的分割特征可以反映灰质、白质和脑脊液的萎缩、增加或异常变化,有望为未来具备临床可解释性的阿尔茨海默病自动检测分析研究提供帮助。

eess.SP - 2023-10-09

Extended Reality via Cooperative NOMA in Hybrid Cloud/Mobile-Edge Computing Networks

  • paper_url: http://arxiv.org/abs/2310.06874
  • repo_url: None
  • paper_authors: Robert-Jeron Reifert, Hayssam Dahrouj, Aydin Sezgin
  • for: 这篇论文旨在解决未来的扩展现实(XR)应用程序中的资源消耗性任务问题,通过融合中央云(CC)、边缘计算(EC)和无人机(UAV)的能力,以提高XR应用程序的质量体验。
  • methods: 该论文提出了一种基于协作非正交多址接入(Co-NOMA)的无人机辅助混合云/移动边缘计算架构,以提升XR设备的用户体验质量;并构建了一个总对数速率最大化问题,用于联合确定计算与通信资源分配以及链路选择策略,在系统吞吐量与公平性之间取得折中。
  • results: 仿真结果验证了所提算法在对数速率最大化、延迟敏感性、可扩展性和运行时性能方面的表现,并表明该方案可在实际网络约束(如能耗和时延)下以分布式方式实现。
    Abstract Extended reality (XR) applications often perform resource-intensive tasks, which are computed remotely, a process that prioritizes the latency criticality aspect. To this end, this paper shows that through leveraging the power of the central cloud (CC), the close proximity of edge computers (ECs), and the flexibility of uncrewed aerial vehicles (UAVs), a UAV-aided hybrid cloud/mobile-edge computing architecture promises to handle the intricate requirements of future XR applications. In this context, this paper distinguishes between two types of XR devices, namely, strong and weak devices. The paper then introduces a cooperative non-orthogonal multiple access (Co-NOMA) scheme, pairing strong and weak devices, so as to aid the XR devices quality-of-user experience by intelligently selecting either the direct or the relay links toward the weak XR devices. A sum logarithmic-rate maximization problem is, thus, formulated so as to jointly determine the computation and communication resources, and link-selection strategy as a means to strike a trade-off between the system throughput and fairness. Subject to realistic network constraints, e.g., power consumption and delay, the optimization problem is then solved iteratively via discrete relaxations, successive-convex approximation, and fractional programming, an approach which can be implemented in a distributed fashion across the network. Simulation results validate the proposed algorithms performance in terms of log-rate maximization, delay-sensitivity, scalability, and runtime performance. The practical distributed Co-NOMA implementation is particularly shown to offer appreciable benefits over traditional multiple access and NOMA methods, highlighting its applicability in decentralized XR systems.
    摘要 扩展现实(XR)应用程序通常需要执行资源密集型任务,这些任务往往在远端计算完成,而该过程对时延极为敏感。为此,本文表明,通过融合中央云(CC)的算力、边缘计算节点(EC)的近距离优势以及无人机(UAV)的灵活性,一种由无人机辅助的混合云/移动边缘计算架构有望满足未来XR应用的复杂需求。在此背景下,本文将XR设备分为强设备和弱设备两类,并引入一种协作非正交多址接入(Co-NOMA)方案,将强、弱设备配对,通过智能地选择面向弱XR设备的直连链路或中继链路,提升XR设备的用户体验质量。由此,我们构建了一个总对数速率最大化问题,以联合确定计算与通信资源以及链路选择策略,在系统吞吐量与公平性之间取得折中。在能耗、时延等现实网络约束下,该优化问题通过离散松弛、逐次凸近似和分式规划进行迭代求解,且该方法可在网络中以分布式方式实现。仿真结果验证了所提算法在对数速率最大化、延迟敏感性、可扩展性和运行时性能方面的表现。实际的分布式Co-NOMA实现相比传统多址接入和NOMA方法展现出可观的优势,凸显了其在去中心化XR系统中的适用性。

Decomposition Based Interference Management Framework for Local 6G Networks

  • paper_url: http://arxiv.org/abs/2310.05809
  • repo_url: None
  • paper_authors: Samitha Gunarathne, Thushan Sivalingam, Nurul Huda Mahmood, Nandana Rajatheva, Matti Latva-Aho
  • for: 本研究旨在提出一种智能干扰管理框架,用于保障超可靠低时延通信(URLLC)应用所需的服务质量(QoS)。
  • methods: 所提算法首先采用一种先进的信号预处理技术——经验模态分解(EMD),然后使用序列到一(sequence-to-one)Transformer算法预测各分解分量的干扰功率;随后利用预测结果估计未来的信干噪比,并据此分配资源以保障URLLC应用所需的高可靠性;最后,基于预测的干扰信号探讨了干扰消除方案。
  • results: 与两种基线算法相比,所提的序列到一Transformer模型在干扰预测中展现出良好的鲁棒性;与基线方案相比,所提方案最多可将均方根误差(RMSE)降低55%。
    Abstract Managing inter-cell interference is among the major challenges in a wireless network, more so when strict quality of service needs to be guaranteed such as in ultra-reliable low latency communications (URLLC) applications. This study introduces a novel intelligent interference management framework for a local 6G network that allocates resources based on interference prediction. The proposed algorithm involves an advanced signal pre-processing technique known as empirical mode decomposition followed by prediction of each decomposed component using the sequence-to-one transformer algorithm. The predicted interference power is then used to estimate future signal-to-interference plus noise ratio, and subsequently allocate resources to guarantee the high reliability required by URLLC applications. Finally, an interference cancellation scheme is explored based on the predicted interference signal with the transformer model. The proposed sequence-to-one transformer model exhibits its robustness for interference prediction. The proposed scheme is numerically evaluated against two baseline algorithms, and is found that the root mean squared error is reduced by up to 55% over a baseline scheme.
    摘要 小区间干扰管理是无线网络中的主要挑战之一,在需要保证严格服务质量的场景(如超可靠低时延通信,URLLC)中尤为突出。本研究为本地6G网络提出了一种新颖的智能干扰管理框架,基于干扰预测进行资源分配。所提算法首先采用经验模态分解(EMD)这一先进的信号预处理技术,再使用序列到一Transformer算法预测各分解分量的干扰功率;随后利用预测的干扰功率估计未来的信干噪比,进而分配资源以保障URLLC应用所需的高可靠性;最后,基于Transformer模型预测的干扰信号探讨了一种干扰消除方案。所提的序列到一Transformer模型在干扰预测中表现出良好的鲁棒性。与两种基线算法的数值对比表明,所提方案可将均方根误差最多降低55%。
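
The step from predicted interference to a resource decision hinges on a simple quantity: the projected SINR. The helper below computes it from a predicted interference power and is purely illustrative; the paper's allocation logic on top of it is not reproduced.

```python
import math

def projected_sinr_db(signal_power_w, predicted_interference_w, noise_power_w):
    """Projected SINR (dB) given a predicted interference power."""
    return 10.0 * math.log10(signal_power_w / (predicted_interference_w + noise_power_w))

# Example: 10 mW received signal, 0.5 mW predicted interference, 0.1 mW noise.
print(round(projected_sinr_db(10e-3, 0.5e-3, 0.1e-3), 2))   # ~12.22 dB
```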

Computation-Limited Signals: A Channel Capacity Regime Constrained by Computational Complexity

  • paper_url: http://arxiv.org/abs/2310.05794
  • repo_url: None
  • paper_authors: Saulo Queiroz, João P. Vilela, Edmundo Monteiro
  • for: 这篇论文探讨了计算受限(comp-limited)信号,即一种以信号处理的时间计算复杂度(而非功率或带宽)为关键约束的通信容量体制。
  • methods: 作者提出了一种新的数学框架,结合信息论与计算复杂度的概念来关联容量与时间复杂度。特别地,作者定义了"算法容量"这一指标,即一个符号所能调制的比特数上界与将这些比特转换为通信符号所需时间复杂度下界之比。
  • results: 通过将该指标表示为信道资源的函数,作者给出了判定某一信号设计是否为comp-limited的准则;并以未编码OFDM发射机为例,证明其是comp-limited的,除非N点DFT问题的计算复杂度下界为 $\Omega(N)$,而这仍是理论计算机科学中的一个开放问题。
    Abstract In this letter, we introduce the computational-limited (comp-limited) signals, a communication capacity regime in which the signal time computational complexity overhead is the key constraint -- rather than power or bandwidth -- to the overall communication capacity. To relate capacity and time complexity, we propose a novel mathematical framework that builds on concepts of information theory and computational complexity. In particular, the algorithmic capacity stands for the ratio between the upper-bound number of bits modulated in a symbol and the lower-bound time complexity required to turn these bits into a communication symbol. By setting this ratio as function of the channel resources, we classify a given signal design as comp-limited if its algorithmic capacity nullifies as the channel resources grow. As a use-case, we show that an uncoded OFDM transmitter is comp-limited unless the lower-bound computational complexity of the N-point DFT problem verifies as $\Omega(N)$, which remains an open challenge in theoretical computer science.
    摘要 本文介绍了计算受限(comp-limited)信号,即一种通信容量体制:其中信号的时间计算复杂度开销(而非功率或带宽)是制约整体通信容量的关键因素。为了关联容量与时间复杂度,我们提出了一个基于信息论和计算复杂度概念的新数学框架。具体来说,算法容量表示每个符号可调制比特数的上界与把这些比特转换为通信符号所需时间复杂度下界之比。通过将该比值表示为信道资源的函数,若其随信道资源增长而趋于零,则相应的信号设计被归类为comp-limited。作为应用案例,我们证明未编码的OFDM发射机是comp-limited的,除非N点DFT问题的计算复杂度下界为 $\Omega(N)$,而这仍是理论计算机科学中的一个开放问题。
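An illustrative back-of-the-envelope check (not taken from the paper) of the comp-limited argument for uncoded OFDM: bits per symbol grow as N·log2(M), while the best known DFT cost grows as N·log2(N) (the FFT), so the bits-per-operation ratio behaves like log2(M)/log2(N) and vanishes as N grows, unless the DFT lower bound were Ω(N), which is the open question cited above. Constellation order and the O(N log N) cost model are illustrative choices.

```python
import math

def algorithmic_capacity(n_subcarriers: int, constellation_order: int) -> float:
    """Ratio of modulated bits per OFDM symbol to FFT operations per symbol."""
    bits_per_symbol = n_subcarriers * math.log2(constellation_order)
    fft_operations = n_subcarriers * math.log2(n_subcarriers)  # O(N log N) model
    return bits_per_symbol / fft_operations

for n in (64, 1024, 16384, 2**20):
    print(f"N={n:>8}: bits per operation = {algorithmic_capacity(n, 16):.3f}")
```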

Physical Layer Security in a Private 5G Network for Industrial and Mobility Application

  • paper_url: http://arxiv.org/abs/2310.05525
  • repo_url: None
  • paper_authors: Shivraj Hanumant Gonde, Christoph Frisch, Svetoslav Duhovnikov, Martin Kubisch, Thomas Meyerhoff, Dominic Schupke
  • for: This paper is written for organizations that operate Private 5G networks in industrial environments, particularly those that require secure communication between devices.
  • methods: The paper uses Physical Layer Key Generation (PLKG) to generate a symmetric secret key between two nodes in the presence of a potential passive eavesdropper.
  • results: The paper demonstrates the establishment of a long-term symmetric key between an aerial vehicle and IT infrastructure in a manufacturing environment, using the radio interface of the Private 5G network.
    Abstract Cellular communication technologies such as 5G are deployed on a large scale around the world. Compared to other communication technologies such as WiFi, Bluetooth, or Ultra Wideband, the 5G communication standard describes support for a large variety of use cases, e.g., Internet of Things, vehicular, industrial, and campus-wide communications. An organization can operate a Private 5G network to provide connectivity to devices in their manufacturing environment. Physical Layer Key Generation (PLKG) is a method to generate a symmetric secret on two nodes despite the presence of a potential passive eavesdropper. To the best of our knowledge, this work is one of the first to implement PLKG in a real Private 5G network. Therefore, it highlights the possibility of integrating PLKG in the communication technology highly relevant for industrial applications. This paper exemplifies the establishment of a long-term symmetric key between an aerial vehicle and IT infrastructure both located in a manufacturing environment and communicating via the radio interface of the Private 5G network.
    摘要 以5G为代表的蜂窝通信技术已在全球范围内大规模部署。相比WiFi、蓝牙或超宽带等其他通信技术,5G通信标准支持物联网、车联网、工业及园区通信等多种应用场景。组织可以运营专用(Private)5G网络,为其制造环境中的设备提供连接。物理层密钥生成(PLKG)是一种在存在潜在被动窃听者的情况下,于两个节点间生成对称密钥的方法。据我们所知,本工作是首批在真实专用5G网络中实现PLKG的研究之一,从而展示了将PLKG集成到这一对工业应用高度相关的通信技术中的可能性。论文以制造环境中的一台飞行器与IT基础设施为例,演示了二者通过专用5G网络的无线接口建立长期对称密钥的过程。
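A minimal sketch of the physical-layer key generation idea the paper builds on: two nodes quantize their noisy but correlated (reciprocal) channel measurements against a local threshold to obtain matching key bits. The median threshold, sample counts, and the omitted information-reconciliation and privacy-amplification steps are simplifications for illustration, not the paper's protocol.

```python
import numpy as np

def quantize(samples: np.ndarray) -> np.ndarray:
    """One-bit quantization of channel-gain samples against the local median."""
    return (samples > np.median(samples)).astype(int)

rng = np.random.default_rng(7)
true_channel = rng.normal(0, 1, 256)                 # shared reciprocal channel
node_a = quantize(true_channel + rng.normal(0, 0.05, 256))  # noisy measurement at A
node_b = quantize(true_channel + rng.normal(0, 0.05, 256))  # noisy measurement at B
agreement = (node_a == node_b).mean()
print(f"raw key bits: {len(node_a)}, bit agreement: {agreement:.1%}")
```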

MEDUSA: Scalable Biometric Sensing in the Wild through Distributed MIMO Radars

  • paper_url: http://arxiv.org/abs/2310.05507
  • repo_url: None
  • paper_authors: Yilong Li, Ramanujan K Sheshadri, Karthik Sundaresan, Eugene Chai, Suman Banerjee
  • for: 本研究旨在开发一个基于雷达的生命体征监测系统,以支持持续的非接触式生命体征监测和医疗应用。
  • methods: 系统采用一种新型的相干超宽带(UWB)分布式多输入多输出(MIMO)雷达,允许用户将16×16阵列自定义并分散为子阵列;并利用分布式但无线同步的MIMO阵列所带来的分集优势,在真实环境中实现稳健的生命体征监测。
  • results: 与使用商用雷达传感器的现有系统相比,MEDUSA平均获得20%的性能提升,证明了其空间分集增益可应对熟悉与陌生室内环境中目标与环境的动态变化。
    Abstract Radar-based techniques for detecting vital signs have shown promise for continuous contactless vital sign sensing and healthcare applications. However, real-world indoor environments face significant challenges for existing vital sign monitoring systems. These include signal blockage in non-line-of-sight (NLOS) situations, movement of human subjects, and alterations in location and orientation. Additionally, these existing systems failed to address the challenge of tracking multiple targets simultaneously. To overcome these challenges, we present MEDUSA, a novel coherent ultra-wideband (UWB) based distributed multiple-input multiple-output (MIMO) radar system, especially it allows users to customize and disperse the $16 \times 16$ into sub-arrays. MEDUSA takes advantage of the diversity benefits of distributed yet wirelessly synchronized MIMO arrays to enable robust vital sign monitoring in real-world and daily living environments where human targets are moving and surrounded by obstacles. We've developed a scalable, self-supervised contrastive learning model which integrates seamlessly with our hardware platform. Each attention weight within the model corresponds to a specific antenna pair of Tx and Rx. The model proficiently recovers accurate vital sign waveforms by decomposing and correlating the mixed received signals, including comprising human motion, mobility, noise, and vital signs. Through extensive evaluations involving 21 participants and over 200 hours of collected data (3.75 TB in total, with 1.89 TB for static subjects and 1.86 TB for moving subjects), MEDUSA's performance has been validated, showing an average gain of 20% compared to existing systems employing COTS radar sensors. This demonstrates MEDUSA's spatial diversity gain for real-world vital sign monitoring, encompassing target and environmental dynamics in familiar and unfamiliar indoor environments.
    摘要 基于雷达的生命体征检测技术已展现出持续非接触式生命体征感知与医疗应用的潜力。然而,真实室内环境给现有的生命体征监测系统带来了重大挑战,包括非视距(NLOS)情况下的信号遮挡、人体的移动以及位置与朝向的变化;此外,现有系统也无法同时跟踪多个目标。为解决这些问题,我们提出了MEDUSA,一种新型的相干超宽带(UWB)分布式多输入多输出(MIMO)雷达系统,它允许用户将16×16阵列自定义并分散为子阵列。MEDUSA利用分布式但无线同步的MIMO阵列的分集优势,在人体目标移动且被障碍物包围的真实日常环境中实现稳健的生命体征监测。我们还开发了一种可扩展的自监督对比学习模型,与硬件平台无缝集成;模型中的每个注意力权重对应一对特定的收发天线(Tx与Rx)。该模型通过对混合接收信号(包含人体运动、移动性、噪声与生命体征)进行分解与相关,能够准确恢复生命体征波形。通过对21名参与者、超过200小时数据(共3.75TB,其中静止目标1.89TB、移动目标1.86TB)的大量评估,MEDUSA的性能得到了验证,相比使用商用雷达传感器的现有系统平均提升20%。这表明MEDUSA的空间分集增益能够应对熟悉与陌生室内环境中目标与环境的动态变化。

Affine Frequency Division Multiplexing With Index Modulation

  • paper_url: http://arxiv.org/abs/2310.05475
  • repo_url: None
  • paper_authors: Yiwei Tao, Miaowen Wen, Yao Ge, Jun Li
  • for: 这篇论文研究一种基于啁啾信号的仿射频分复用(AFDM)系统,并在其框架下提出一种索引调制(IM)方案,即AFDM-IM。
  • methods: 在所提方案中,除常规星座符号外,信息比特还由离散仿射傅里叶(DAF)域子符号的激活状态承载;为高效实现索引调制,作者将DAF域子符号划分为若干组,并分别考虑了集中式与分布式两种分组策略。
  • results: 论文推导了在信道估计误差存在时最大似然检测平均误码率的闭式渐近紧上界,并通过计算机仿真验证了该方案相对基准方案的优越性;结果还表明,即使AFDM的满分集条件不满足,索引比特仍比调制比特具有更强的分集保护。
    Abstract Affine frequency division multiplexing (AFDM) is a new multicarrier technique based on chirp signals tailored for high-mobility communications, which can achieve full diversity. In this paper, we propose an index modulation (IM) scheme based on the framework of AFDM systems, named AFDM-IM. In the proposed AFDM-IM scheme, the information bits are carried by the activation state of the subsymbols in discrete affine Fourier (DAF) domain in addition to the conventional constellation symbols. To efficiently perform IM, we divide the subsymbols in DAF domain into several groups and consider both the localized and distributed strategies. An asymptotically tight upper bound on the average bit error rate (BER) of the maximum-likelihood detection in the existence of channel estimation errors is derived in closed-form. Computer simulations are carried out to evaluate the performance of the proposed AFDM-IM scheme, whose results corroborate its superiority over the benchmark schemes in the linear time-varying channels. We also evaluate the BER performance of the index and modulated bits for the AFDM-IM scheme with and without satisfying the full diversity condition of AFDM. The results show that the index bits have a stronger diversity protection than the modulated bits even when the full diversity condition of AFDM is not satisfied.
    摘要 仿射频分复用(AFDM)是一种基于啁啾信号、面向高移动性通信的新型多载波技术,能够获得满分集。本文在AFDM系统框架下提出了一种索引调制(IM)方案,称为AFDM-IM。在该方案中,信息比特除由常规星座符号承载外,还由离散仿射傅里叶(DAF)域子符号的激活状态承载。为高效实现索引调制,我们将DAF域子符号划分为若干组,并考虑集中式与分布式两种策略。我们推导了在信道估计误差存在时最大似然检测平均误码率(BER)的闭式渐近紧上界,并通过计算机仿真评估了所提AFDM-IM方案的性能,结果证实其在线性时变信道中优于基准方案。我们还评估了在满足与不满足AFDM满分集条件时索引比特与调制比特的BER性能,结果表明,即使不满足满分集条件,索引比特仍比调制比特具有更强的分集保护。
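A hedged sketch of the index-modulation idea described above: within each group of DAF-domain subsymbols, some index bits select which subsymbols are activated, and the remaining bits are mapped onto a constellation and placed on the active positions. The group size, number of active subsymbols, and the QPSK mapping are illustrative choices, not the paper's parameters.

```python
from itertools import combinations
import numpy as np

GROUP_SIZE, ACTIVE = 4, 2
PATTERNS = list(combinations(range(GROUP_SIZE), ACTIVE))[:4]   # 4 patterns -> 2 index bits
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def modulate_group(bits: list[int]) -> np.ndarray:
    """bits = 2 index bits + 2 QPSK bits per active subsymbol (6 bits total)."""
    index = bits[0] * 2 + bits[1]                   # index bits choose the activation pattern
    group = np.zeros(GROUP_SIZE, dtype=complex)
    payload = bits[2:]
    for k, pos in enumerate(PATTERNS[index]):
        group[pos] = QPSK[tuple(payload[2 * k:2 * k + 2])]
    return group

print(modulate_group([1, 0, 0, 1, 1, 1]))  # activation pattern carries information too
```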

Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach

  • paper_url: http://arxiv.org/abs/2310.05444
  • repo_url: None
  • paper_authors: Zhiqing Wei, Jinghui Piao, Xin Yuan, Huici Wu, J. Andrew Zhang, Zhiyong Feng, Lin Wang, Ping Zhang
  • for: 这篇论文主要探讨了integration sensing and communication(ISAC)系统中波形设计的问题,以及在5G-A和6G移动通信系统中ISAC技术的应用。
  • methods: 本论文使用了信息论中的统一性能指标,即相互信息(MI),来度量多普逻盘ISAC系统中的感知和通信性能。然后,提出了最优波形设计方案,以最大化感知MI、通信MI和权衡感知和通信MI的加权和。
  • results: 优化结果通过蒙特卡洛(Monte Carlo)仿真进行验证。本研究提供了有效的闭式表达式,使MIMO-OFDM ISAC系统能够实现感知与通信性能的均衡。
    Abstract Integrated sensing and communication (ISAC) is regarded as the enabling technology in the future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication system. ISAC waveform design is critical in ISAC system. However, the difference of the performance metrics between sensing and communication brings challenges for the ISAC waveform design. This paper applies the unified performance metrics in information theory, namely mutual information (MI), to measure the communication and sensing performance in multicarrier ISAC system. In multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC system, we first derive the sensing and communication MI with subcarrier correlation and spatial correlation. Then, we propose optimal waveform designs for maximizing the sensing MI, communication MI and the weighted sum of sensing and communication MI, respectively. The optimization results are validated by Monte Carlo simulations. Our work provides effective closed-form expressions for waveform design, enabling the realization of MIMO-OFDM ISAC system with balanced performance in communication and sensing.
    摘要 通感一体化(ISAC)被视为未来5G-A与6G移动通信系统的使能技术,而ISAC波形设计是ISAC系统的关键。然而,感知与通信性能指标的差异给ISAC波形设计带来了挑战。本文采用信息论中的统一性能指标,即互信息(MI),来度量多载波ISAC系统中的感知与通信性能。在多输入多输出正交频分复用(MIMO-OFDM)ISAC系统中,我们首先在考虑子载波相关性与空间相关性的条件下推导了感知MI与通信MI;随后分别针对最大化感知MI、最大化通信MI以及最大化两者加权和提出了最优波形设计,并通过蒙特卡洛仿真验证了优化结果。我们的工作给出了有效的闭式波形设计表达式,使MIMO-OFDM ISAC系统能够实现感知与通信性能的均衡。
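An illustrative sketch (not the paper's closed-form design) of the weighted objective it optimizes: under a Gaussian MIMO model, both the sensing and the communication terms can be written as log-det mutual informations, and a waveform covariance Q trades one off against the other through a weight rho. The channels, noise level, and uniform power allocation below are placeholder assumptions.

```python
import numpy as np

def mimo_mi(channel: np.ndarray, cov: np.ndarray, noise_var: float) -> float:
    """Gaussian MIMO mutual information log2 det(I + H Q H^H / sigma^2)."""
    n_rx = channel.shape[0]
    m = np.eye(n_rx) + channel @ cov @ channel.conj().T / noise_var
    return float(np.log2(np.linalg.det(m).real))

rng = np.random.default_rng(1)
h_comm = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # communication channel
h_sens = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # target response
q = np.eye(4)                                                     # uniform power waveform
for rho in (0.0, 0.5, 1.0):
    obj = rho * mimo_mi(h_sens, q, 1.0) + (1 - rho) * mimo_mi(h_comm, q, 1.0)
    print(f"rho={rho:.1f}: weighted MI = {obj:.2f} bits")
```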

A Stochastic Particle Variational Bayesian Inference Inspired Deep-Unfolding Network for Non-Convex Parameter Estimation

  • paper_url: http://arxiv.org/abs/2310.05382
  • repo_url: None
  • paper_authors: Zhixiang Hu, An Liu, Minjian Zhao
  • for: 本研究旨在提供一种高维非凸参数估计方法,以满足未来无线网络中普遍感知服务的需求。
  • methods: 本研究提出了一种并行随机粒子变分贝叶斯推断(PSPVBI)算法,并通过深度展开(DU)学习更优的超参数,以减少迭代次数、提高算法的速度与精度。
  • results: 仿真结果表明,所得的LPSPVBI算法在多个无线感知参数估计问题上优于现有方法。
    Abstract Future wireless networks are envisioned to provide ubiquitous sensing services, which also gives rise to a substantial demand for high-dimensional non-convex parameter estimation, i.e., the associated likelihood function is non-convex and contains numerous local optima. Variational Bayesian inference (VBI) provides a powerful tool for modeling complex estimation problems and reasoning with prior information, but poses a long-standing challenge on computing intractable posteriori distributions. Most existing variational methods generally rely on assumptions about specific distribution families to derive closed-form solutions, and are difficult to apply in high-dimensional, non-convex scenarios. Given these challenges, firstly, we propose a parallel stochastic particle variational Bayesian inference (PSPVBI) algorithm. Thanks to innovations such as particle approximation, additional updates of particle positions, and parallel stochastic successive convex approximation (PSSCA), PSPVBI can flexibly drive particles to fit the posteriori distribution with acceptable complexity, yielding high-precision estimates of the target parameters. Furthermore, additional speedup can be obtained by deep-unfolding (DU) the PSPVBI algorithm. Specifically, superior hyperparameters are learned to dramatically reduce the number of algorithmic iterations. In this PSPVBI-induced Deep-Unfolding Networks, some techniques related to gradient computation, data sub-sampling, differentiable sampling, and generalization ability are also employed to facilitate the practical deployment. Finally, we apply the LPSPVBI to solve several important parameter estimation problems in wireless sensing scenarios. Simulations indicate that the LPSPVBI algorithm outperforms existing solutions.
    摘要 未来无线网络有望提供无处不在的感知服务,这也带来了对高维非凸参数估计的巨大需求,即相关的似然函数是非凸的且含有大量局部最优点。变分贝叶斯推断(VBI)为建模复杂估计问题并结合先验信息进行推理提供了强大工具,但后验分布难以计算一直是长期存在的挑战。现有的变分方法大多依赖对特定分布族的假设来得到闭式解,难以应用于高维非凸场景。针对这些挑战,我们首先提出了并行随机粒子变分贝叶斯推断(PSPVBI)算法:借助粒子近似、粒子位置的额外更新以及并行随机逐次凸近似(PSSCA)等创新,PSPVBI能够灵活地驱动粒子拟合后验分布,并以可接受的复杂度获得高精度的参数估计。此外,通过对PSPVBI算法进行深度展开(DU)可以进一步加速:学习更优的超参数可显著减少算法迭代次数。在由PSPVBI导出的深度展开网络中,我们还引入了梯度计算、数据子采样、可微采样以及泛化能力等相关技术,以便于实际部署。最后,我们将LPSPVBI应用于无线感知场景中的若干重要参数估计问题,仿真结果表明其优于现有方案。

Distortion-Aware Phase Retrieval Receiver for High-Order QAM Transmission with Carrierless Intensity-Only Measurements

  • paper_url: http://arxiv.org/abs/2310.05314
  • repo_url: None
  • paper_authors: Hanzi Huang, Haoshuo Chen, Qi Gao, Yetian Huang, Nicolas K. Fontaine, Mikael Mazur, Lauren Dallachiesa, Roland Ryf, Zhengxuan Li, Yingxiong Song
  • for: investigate high-order quadrature amplitude modulation (QAM) signals transmission with carrierless and intensity-only measurements, and improve precision of phase retrieval (PR) algorithm.
  • methods: propose distortion-aware PR scheme with training and reconstruction stages, estimate and emulate distortion caused by channel impairments, improve agreement between estimated and measured amplitudes.
  • results: experimentally demonstrate 50-GBaud 16QAM and 32QAM signals transmission over 40km and 80km SSMF spans, achieve BERs below 6.25% HD-FEC and 25% SD-FEC thresholds, and achieve post-FEC data rate of up to 140 Gb/s with optimal pilot symbol ratio of 20%.
    Abstract We experimentally investigate transmitting high-order quadrature amplitude modulation (QAM) signals with carrierless and intensity-only measurements with phase retrieval (PR) receiving techniques. The intensity errors during measurement, including noise and distortions, are found to be a limiting factor for the precise convergence of the PR algorithm. To improve the PR reconstruction accuracy, we propose a distortion-aware PR scheme comprising both training and reconstruction stages. By estimating and emulating the distortion caused by various channel impairments, the proposed scheme enables enhanced agreement between the estimated and measured amplitudes throughout the PR iteration, thus resulting in improved reconstruction performance to support high-order QAM transmission. With the aid of proposed techniques, we experimentally demonstrate 50-GBaud 16QAM and 32QAM signals transmitting through a standard single-mode optical fiber (SSMF) span of 40 and 80 km, and achieve bit error rates (BERs) below the 6.25% hard decision (HD)-forward error correction (FEC) and 25% soft decision (SD)-FEC thresholds for the two modulation formats, respectively. By tuning the pilot symbol ratio and applying concatenated coding, we also demonstrate that a post-FEC data rate of up to 140 Gb/s can be achieved for both distances at an optimal pilot symbol ratio of 20%.
    摘要 我们通过实验研究了在无载波、仅强度测量条件下,利用相位恢复(PR)接收技术传输高阶正交幅度调制(QAM)信号。研究发现,测量过程中的强度误差(包括噪声与失真)是限制PR算法精确收敛的因素。为提高PR重建精度,我们提出了一种失真感知的PR方案,包含训练与重建两个阶段:通过估计并模拟各种信道损伤引起的失真,该方案使PR迭代过程中估计幅度与测量幅度更好地吻合,从而提升重建性能以支持高阶QAM传输。借助上述技术,我们在40公里和80公里标准单模光纤(SSMF)上实验演示了50-GBaud的16QAM与32QAM信号传输,两种调制格式的误码率分别低于6.25%硬判决前向纠错(HD-FEC)和25%软判决前向纠错(SD-FEC)门限。通过调节导频符号比例并采用级联编码,我们还表明在最优导频比例20%的条件下,两个传输距离均可实现高达140 Gb/s的FEC后数据速率。
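A generic alternating-projection phase-retrieval loop, included only to make the "agreement between estimated and measured amplitudes" idea concrete. It is not the paper's distortion-aware algorithm: the learned distortion-emulation step is omitted, and the FFT stands in for whatever measurement operator the receiver actually implements.

```python
import numpy as np

def phase_retrieval(measured_amp: np.ndarray, known_amp: np.ndarray,
                    iterations: int = 200) -> np.ndarray:
    """Alternate between enforcing measured amplitudes in the transform domain
    and known signal amplitudes in the time domain (Gerchberg-Saxton style)."""
    rng = np.random.default_rng(0)
    x = known_amp * np.exp(2j * np.pi * rng.random(known_amp.shape))
    for _ in range(iterations):
        y = np.fft.fft(x)
        y = measured_amp * np.exp(1j * np.angle(y))   # enforce measured amplitudes
        x = np.fft.ifft(y)
        x = known_amp * np.exp(1j * np.angle(x))      # enforce known signal amplitudes
    return x

n = 256
truth = np.exp(2j * np.pi * np.random.default_rng(1).random(n))
estimate = phase_retrieval(np.abs(np.fft.fft(truth)), np.abs(truth))
print("residual amplitude error:",
      np.abs(np.abs(np.fft.fft(estimate)) - np.abs(np.fft.fft(truth))).mean())
```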

cs.SD - 2023-10-08

VITS-based Singing Voice Conversion System with DSPGAN post-processing for SVCC2023

  • paper_url: http://arxiv.org/abs/2310.05118
  • repo_url: None
  • paper_authors: Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao
  • for: 这项研究的目的是为SVCC2023提供一个系统,以便在 singing voice conversion 领域实现高质量的音频转换。
  • methods: 该系统包括三个模块:特征提取器、声音转换器和后处理器。特征提取器使用 HuBERT 模型提取 singing voice 中的 F0 轨迹和 speaker-independent 语言内容。声音转换器使用 target speaker 的声音特征、F0 和语言内容来生成目标speaker 的波形。此外,为了进一步提高音质,我们还使用了一个精度调整的 DSPGAN vocoder。
  • results: 官方挑战赛结果显示,我们的系统表现优异,尤其在跨域任务中,自然度和相似度两项指标分别排名第1和第2。此外,我们还进行了消融实验,验证了系统设计的有效性。
    Abstract This paper presents the T02 team's system for the Singing Voice Conversion Challenge 2023 (SVCC2023). Our system entails a VITS-based SVC model, incorporating three modules: a feature extractor, a voice converter, and a post-processor. Specifically, the feature extractor provides F0 contours and extracts speaker-independent linguistic content from the input singing voice by leveraging a HuBERT model. The voice converter is employed to recompose the speaker timbre, F0, and linguistic content to generate the waveform of the target speaker. Besides, to further improve the audio quality, a fine-tuned DSPGAN vocoder is introduced to re-synthesise the waveform. Given the limited target speaker data, we utilize a two-stage training strategy to adapt the base model to the target speaker. During model adaptation, several tricks, such as data augmentation and joint training with auxiliary singer data, are involved. Official challenge results show that our system achieves superior performance, especially in the cross-domain task, ranking 1st and 2nd in naturalness and similarity, respectively. Further ablation justifies the effectiveness of our system design.

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

  • paper_url: http://arxiv.org/abs/2310.05078
  • repo_url: https://github.com/nii-yamagishilab/partial_rank_similarity
  • paper_authors: Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
  • for: 这项研究旨在提出一种新的质量 mean opinion score(MOS)预测函数,用于评估未经见过的语音合成系统的质量。
  • methods: 该目标函数度量小批次内预测MOS值相对排序的相似性,即部分排序相似度(PRS),而非像L1损失那样直接拟合实际MOS值。
  • results: 实验表明,PRS在零样本和半监督设定下优于L1损失,与真实值的相关性更强;同时均方误差与线性相关系数等指标可能并不适合评估MOS预测模型。
    Abstract This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of relative positions of predicted MOS values, in a mini-batch, rather than the actual MOS values. That is the partial rank similarity is measured (PRS) rather than the individual MOS values as with the L1 loss. Our experiments on out-of-domain speech synthesis systems demonstrate that the PRS outperforms L1 loss in zero-shot and semi-supervised settings, exhibiting stronger correlation with ground truth. These findings highlight the importance of considering rank order, as done by PRS, when training MOS prediction models. We also argue that mean squared error and linear correlation coefficient metrics may be unreliable for evaluating MOS prediction models. In conclusion, PRS-trained models provide a robust framework for evaluating speech quality and offer insights for developing high-quality speech synthesis systems. Code and models are available at github.com/nii-yamagishilab/partial_rank_similarity/
    摘要 本文提出了一种新的目标函数,用于预测未见语音合成系统质量的平均意见得分(MOS)。该函数度量小批次内预测MOS值相对位置的相似性,而非实际MOS值本身,即度量部分排序相似度(PRS)而不是使用L1损失。在域外语音合成系统上的实验表明,PRS在零样本与半监督设定下优于L1损失,与真实值的相关性更强。这些发现凸显了在训练MOS预测模型时考虑排序关系(如PRS所做)的重要性。我们还指出,均方误差与线性相关系数等指标在评估MOS预测模型时可能并不可靠。总之,基于PRS训练的模型为语音质量评估提供了一个稳健的框架,并为开发高质量语音合成系统提供了启示。代码和模型可在 github.com/nii-yamagishilab/partial_rank_similarity/ 获取。
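One plausible reading of the partial-rank-similarity idea, written as a differentiable pairwise objective: within a mini-batch, predicted MOS values are rewarded for ordering every pair of utterances the same way the ground truth does, rather than for matching the absolute scores as an L1 loss would. This is an interpretation for illustration; the official loss is in the linked repository.

```python
import torch

def partial_rank_similarity_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    pred_diff = pred.unsqueeze(0) - pred.unsqueeze(1)            # pairwise prediction gaps
    target_sign = torch.sign(target.unsqueeze(0) - target.unsqueeze(1))
    # Penalize pairs whose predicted ordering disagrees with the true ordering.
    return torch.relu(-target_sign * pred_diff).mean()

pred = torch.tensor([3.1, 2.0, 4.2], requires_grad=True)
target = torch.tensor([3.5, 1.5, 4.0])
loss = partial_rank_similarity_loss(pred, target)
loss.backward()
print(loss.item(), pred.grad)
```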

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

  • paper_url: http://arxiv.org/abs/2310.05051
  • repo_url: https://github.com/bakerbunker/salt
  • paper_authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie
  • for: 隐藏发音人的身份,保持语音质量和可理解性。
  • methods: 提出基于隐空间变换的说话人匿名系统SALT:利用自监督特征提取器提取隐特征,随机采样多个说话人及其权重,并通过插值实现说话人匿名;同时探索外插方法以进一步扩展伪说话人的多样性。
  • results: 在 Voice Privacy Challenge 数据集上,该系统在保持语音质量与可懂度的同时,取得了最先进的可区分性指标。
    Abstract Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech is subject to reduction in pseudo speaker distinctiveness, speech quality and intelligibility for out-of-distribution speaker. To solve this issue, we propose SALT, a Speaker Anonymization system based on Latent space Transformation. Specifically, we extract latent features by a self-supervised feature extractor and randomly sample multiple speakers and their weights, and then interpolate the latent vectors to achieve speaker anonymization. Meanwhile, we explore the extrapolation method to further extend the diversity of pseudo speakers. Experiments on Voice Privacy Challenge dataset show our system achieves a state-of-the-art distinctiveness metric while preserving speech quality and intelligibility. Our code and demo is availible at https://github.com/BakerBunker/SALT .
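A minimal sketch of the anonymization step described above: sample several speaker embeddings and random weights, then interpolate them to form a pseudo-speaker embedding. The embedding source, dimensionality, and the optional extrapolation factor are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def pseudo_speaker(speaker_bank: np.ndarray, k: int = 4,
                   extrapolate: float = 0.0, seed: int = 0) -> np.ndarray:
    """Interpolate k randomly chosen speaker embeddings with random weights."""
    rng = np.random.default_rng(seed)
    chosen = speaker_bank[rng.choice(len(speaker_bank), size=k, replace=False)]
    weights = rng.dirichlet(np.ones(k))                  # convex interpolation weights
    mixed = weights @ chosen
    # Optional extrapolation: push the mixture away from the bank centroid
    # to further diversify pseudo speakers.
    return mixed + extrapolate * (mixed - speaker_bank.mean(axis=0))

bank = np.random.default_rng(1).normal(size=(100, 256))  # 100 speakers, 256-d embeddings
print(pseudo_speaker(bank, extrapolate=0.3).shape)
```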

PromptSpeaker: Speaker Generation Based on Text Descriptions

  • paper_url: http://arxiv.org/abs/2310.05001
  • repo_url: None
  • paper_authors: Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li
  • for: 这项研究旨在实现文本描述基于的发音人生成(text-guided speaker generation),即通过文本描述控制发音人生成过程。
  • methods: 该研究提出了名为PromptSpeaker的文本引导说话人生成系统,由提示编码器、零样本VITS和Glow模型组成。提示编码器基于文本描述预测先验分布,并从该分布中采样得到语义表示;Glow模型将语义表示转换为说话人表示;零样本VITS最终基于说话人表示合成该说话人的语音。
  • results: 客观指标验证了PromptSpeaker能够生成训练集之外的新说话人,且合成的说话人语音与文本提示具有较为合理的主观匹配度。
    Abstract Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose PromptSpeaker, a text-guided speaker generation system. PromptSpeaker consists of a prompt encoder, a zero-shot VITS, and a Glow model, where the prompt encoder predicts a prior distribution based on the text description and samples from this distribution to obtain a semantic representation. The Glow model subsequently converts the semantic representation into a speaker representation, and the zero-shot VITS finally synthesizes the speaker's voice based on the speaker representation. We verify that PromptSpeaker can generate speakers new from the training set by objective metrics, and the synthetic speaker voice has reasonable subjective matching quality with the speaker prompt.
    摘要 近来,文本引导的内容生成受到了广泛关注。在这项工作中,我们探索了基于文本描述的说话人生成,即利用文本提示控制说话人生成过程。具体来说,我们提出了文本引导的说话人生成系统PromptSpeaker,它由提示编码器、零样本VITS和Glow模型组成:提示编码器根据文本描述预测一个先验分布,并从中采样得到语义表示;Glow模型随后将语义表示转换为说话人表示;零样本VITS最终基于说话人表示合成说话人的语音。我们通过客观指标验证了PromptSpeaker能够生成训练集之外的新说话人,且合成的说话人语音与文本提示具有合理的主观匹配质量。

cs.CV - 2023-10-08

Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints

  • paper_url: http://arxiv.org/abs/2310.05306
  • repo_url: https://github.com/rickywrq/Progressive-Neural-Compression
  • paper_authors: Ruiqi Wang, Hanyang Liu, Jiaming Qiu, Moran Xu, Roch Guerin, Chenyang Lu
  • for: 这篇论文旨在提出一种适应性的进步神经压缩方法,以提高机器学习应用程序在边缘服务器上的推论性能,并且在网络带宽不稳定的情况下进行图像卸载。
  • methods: 本文提出了渐进式神经压缩(PNC)方法,通过随机尾部丢弃训练一个多目标无率自编码器,使压缩特征按其对推理性能的重要性排序,从而可根据可用带宽进行图像卸载。
  • results: 相比于现有的神经压缩方法和传统压缩方法,PNC方法可以提高机器学习应用程序的推论性能,并且可以适应网络带宽不稳定的情况。
    Abstract IoT devices are increasingly the source of data for machine learning (ML) applications running on edge servers. Data transmissions from devices to servers are often over local wireless networks whose bandwidth is not just limited but, more importantly, variable. Furthermore, in cyber-physical systems interacting with the physical environment, image offloading is also commonly subject to timing constraints. It is, therefore, important to develop an adaptive approach that maximizes the inference performance of ML applications under timing constraints and the resource constraints of IoT devices. In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem. Although neural compression has been used to compress images for different ML applications, existing solutions often produce fixed-size outputs that are unsuitable for timing-constrained offloading over variable bandwidth. To address this limitation, we train a multi-objective rateless autoencoder that optimizes for multiple compression rates via stochastic taildrop to create a compression solution that produces features ordered according to their importance to inference performance. Features are then transmitted in that order based on available bandwidth, with classification ultimately performed using the (sub)set of features received by the deadline. We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed comprising an IoT device and an edge server connected over a wireless network with varying bandwidth.
    摘要 物联网(IoT)设备正日益成为运行在边缘服务器上的机器学习(ML)应用的数据来源。设备到服务器的数据传输通常经由本地无线网络,其带宽不仅有限,更重要的是时变的;在与物理环境交互的信息物理系统中,图像卸载还常常受到时限约束。因此,需要一种自适应方法,在时限约束和IoT设备资源约束下最大化ML应用的推理性能。本文以图像分类为目标应用,提出渐进式神经压缩(PNC)作为该问题的高效解决方案。尽管神经压缩已被用于多种ML应用的图像压缩,但现有方案通常产生固定大小的输出,不适合在带宽时变条件下进行有时限的卸载。为此,我们通过随机尾部丢弃训练一个面向多种压缩率优化的多目标无率自编码器,使其产生按推理重要性排序的特征;随后按可用带宽依重要性顺序传输这些特征,并最终利用在截止时刻前收到的(部分)特征完成分类。我们在由IoT设备与边缘服务器经时变带宽无线网络连接组成的测试平台上,验证了PNC相对最先进的神经压缩方法与传统压缩方法的优势。
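A hedged sketch of the offloading policy implied above: the encoder emits features ordered by importance, the device sends as many as the bandwidth and deadline allow, and the server classifies from whatever prefix arrived. Feature sizes, the bandwidth model, and the downstream classifier are placeholders, not the paper's configuration.

```python
import numpy as np

def features_within_deadline(features: np.ndarray, bytes_per_feature: int,
                             bandwidth_bps: float, deadline_s: float) -> np.ndarray:
    """Return the prefix of importance-ordered features that fits the deadline."""
    budget_bytes = bandwidth_bps * deadline_s / 8
    n_sent = min(len(features), int(budget_bytes // bytes_per_feature))
    return features[:n_sent]                      # most important features go first

encoded = np.random.default_rng(0).normal(size=128)       # importance-ordered features
for bw in (50e3, 200e3, 1e6):                              # bits per second
    prefix = features_within_deadline(encoded, bytes_per_feature=16,
                                      bandwidth_bps=bw, deadline_s=0.05)
    print(f"bandwidth {bw / 1e3:.0f} kbps -> {len(prefix)} features offloaded")
```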

GestSync: Determining who is speaking without a talking head

  • paper_url: http://arxiv.org/abs/2310.05304
  • repo_url: https://github.com/Sindhu-Hegde/gestsync
  • paper_authors: Sindhu B Hegde, Andrew Zisserman
  • For: The paper is written for determining if a person’s gestures are correlated with their speech or not, and exploring the use of self-supervised learning for this task.
  • Methods: The paper introduces a dual-encoder model for the task of Gesture-Sync, and compares the performance of different input representations, including RGB frames, keypoint images, and keypoint vectors.
  • Results: The paper shows that the model can be trained using self-supervised learning alone, and evaluates its performance on the LRS3 dataset. Additionally, the paper demonstrates applications of Gesture-Sync for audio-visual synchronisation and determining who is the speaker in a crowd without seeing their faces.
    Abstract In this paper we introduce a new synchronisation task, Gesture-Sync: determining if a person's gestures are correlated with their speech or not. In comparison to Lip-Sync, Gesture-Sync is far more challenging as there is a far looser relationship between the voice and body movement than there is between voice and lip motion. We introduce a dual-encoder model for this task, and compare a number of input representations including RGB frames, keypoint images, and keypoint vectors, assessing their performance and advantages. We show that the model can be trained using self-supervised learning alone, and evaluate its performance on the LRS3 dataset. Finally, we demonstrate applications of Gesture-Sync for audio-visual synchronisation, and in determining who is the speaker in a crowd, without seeing their faces. The code, datasets and pre-trained models can be found at: \url{https://www.robots.ox.ac.uk/~vgg/research/gestsync}.
    摘要 本文引入了一个新的同步任务Gesture-Sync:判断一个人的手势与其语音是否相关。与Lip-Sync相比,Gesture-Sync更具挑战性,因为语音与身体动作之间的关联远比语音与唇部运动之间的关联松散。我们为该任务提出了一种双编码器模型,并比较了RGB帧、关键点图像和关键点向量等多种输入表示,评估其性能与优势。我们表明该模型仅凭自监督学习即可训练,并在LRS3数据集上评估了其性能。最后,我们展示了Gesture-Sync在视听同步以及在不看人脸的情况下判断人群中谁在说话等场景中的应用。相关代码、数据集和预训练模型见:\url{https://www.robots.ox.ac.uk/~vgg/research/gestsync}。

Image Compression and Decompression Framework Based on Latent Diffusion Model for Breast Mammography

  • paper_url: http://arxiv.org/abs/2310.05299
  • repo_url: https://github.com/neogeoss/EMBED_Mammo_Models
  • paper_authors: InChan Hwang, MinJae Woo
  • for: 本研究旨在开发一个利用潜在扩散模型(LDM)的医疗图像压缩与解压缩框架。相比去噪扩散概率模型(DDPM),LDM有望在图像解压缩过程中以更少的计算资源获得更高的图像质量。
  • methods: 这个研究使用了隐藏扩散模型(LDM)和Torchvision进行图像扩大,并考虑了医疗图像数据的应用。
  • results: 实验结果显示,该方法优于传统文件压缩算法,且使用解压后图像训练的卷积神经网络(CNN)模型与使用原始图像训练的模型表现相当。此外,该方法可显著缩小数据集体积,使医疗图像在医疗设备中占用更少的存储空间。其研究意义还包括有损压缩算法中的降噪,以及替代复杂的基于小波的无损压缩算法。
    Abstract This research presents a novel framework for the compression and decompression of medical images utilizing the Latent Diffusion Model (LDM). The LDM represents advancement over the denoising diffusion probabilistic model (DDPM) with a potential to yield superior image quality while requiring fewer computational resources in the image decompression process. A possible application of LDM and Torchvision for image upscaling has been explored using medical image data, serving as an alternative to traditional image compression and decompression algorithms. The experimental outcomes demonstrate that this approach surpasses a conventional file compression algorithm, and convolutional neural network (CNN) models trained with decompressed files perform comparably to those trained with original image files. This approach also significantly reduces dataset size so that it can be distributed with a smaller size, and medical images take up much less space in medical devices. The research implications extend to noise reduction in lossy compression algorithms and substitute for complex wavelet-based lossless algorithms.
    摘要 本研究提出了一种利用潜在扩散模型(LDM)的医疗图像压缩与解压缩框架。相比去噪扩散概率模型(DDPM),LDM有望在图像解压缩过程中以更少的计算资源获得更高的图像质量。研究还利用医疗图像数据探索了基于LDM与Torchvision的图像放大,可作为传统图像压缩与解压缩算法的替代方案。实验结果表明,该方法优于传统文件压缩算法,且使用解压后图像训练的卷积神经网络(CNN)模型与使用原始图像训练的模型性能相当。该方法还能显著缩小数据集体积,便于以更小的尺寸分发,并使医疗图像在医疗设备中占用更少的存储空间。其研究意义还包括有损压缩算法中的降噪,以及替代复杂的基于小波的无损压缩算法。

MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles

  • paper_url: http://arxiv.org/abs/2310.05290
  • repo_url: None
  • paper_authors: Rusheng Zhang, Depu Meng, Shengyin Shen, Zhengxia Zou, Houqiang Li, Henry X. Liu
  • for: 这篇论文是为了探讨Connected Automated Vehicle(CAV)应用中的道路边缘感知技术。
  • methods: 这篇论文使用了路侧感知系统MSight,实现了实时车辆检测、定位、追踪和短期路径预测。
  • results: 评估结果显示MSight系统能够维持车道精度,并且具有几乎 zero latency,这表明了这个系统在CAV安全性和效率方面的应用潜力。
    Abstract As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception emerges as a pivotal tool for connected automated vehicle (CAV) applications. Due to their elevated positioning, roadside sensors, including cameras and lidars, often enjoy unobstructed views with diminished object occlusion. This provides them a distinct advantage over onboard perception, enabling more robust and accurate detection of road objects. This paper presents MSight, a cutting-edge roadside perception system specifically designed for CAVs. MSight offers real-time vehicle detection, localization, tracking, and short-term trajectory prediction. Evaluations underscore the system's capability to uphold lane-level accuracy with minimal latency, revealing a range of potential applications to enhance CAV safety and efficiency. Presently, MSight operates 24/7 at a two-lane roundabout in the City of Ann Arbor, Michigan.
    摘要 随着车载通信与组网技术的不断发展,基于基础设施的路侧感知正成为网联自动驾驶汽车(CAV)应用的关键工具。由于路侧传感器(包括摄像头和激光雷达)安装位置较高,通常拥有开阔视野且物体遮挡较少,这使其相比车载感知能够更稳健、更准确地检测道路物体。本文介绍了MSight,一个专为CAV设计的先进路侧感知系统,可实时提供车辆检测、定位、跟踪与短期轨迹预测。评估结果表明,MSight能够在极低延迟下保持车道级精度,展现出提升CAV安全性与效率的一系列潜在应用。目前,MSight在美国密歇根州安娜堡市的一个双车道环岛全天候(24/7)运行。

The Emergence of Reproducibility and Consistency in Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.05264
  • repo_url: None
  • paper_authors: Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Liyue Shen, Qing Qu
  • for: 这个论文的目的是探索Diffusion模型中的一种常见现象,即“一致性模型重复性”。
  • methods: 作者采用了大量实验和分析方法,包括使用不同的模型架构和训练策略,来研究Diffusion模型的一致性模型重复性。
  • results: 研究发现,Diffusion模型在两种不同的训练机制下均表现出一致的模型可复现性:一是“记忆化机制”,二是“泛化机制”。此外,作者还发现这种可复现性同样存在于多种Diffusion模型变体中,例如条件Diffusion模型、用于求解逆问题的Diffusion模型以及微调后的Diffusion模型。
    Abstract Recently, diffusion models have emerged as powerful deep generative models, showcasing cutting-edge performance across various applications such as image generation, solving inverse problems, and text-to-image synthesis. These models generate new data (e.g., images) by transforming random noise inputs through a reverse diffusion process. In this work, we uncover a distinct and prevalent phenomenon within diffusion models in contrast to most other generative models, which we refer to as ``consistent model reproducibility''. To elaborate, our extensive experiments have consistently shown that when starting with the same initial noise input and sampling with a deterministic solver, diffusion models tend to produce nearly identical output content. This consistency holds true regardless of the choices of model architectures and training procedures. Additionally, our research has unveiled that this exceptional model reproducibility manifests in two distinct training regimes: (i) ``memorization regime,'' characterized by a significantly overparameterized model which attains reproducibility mainly by memorizing the training data; (ii) ``generalization regime,'' in which the model is trained on an extensive dataset, and its reproducibility emerges with the model's generalization capabilities. Our analysis provides theoretical justification for the model reproducibility in ``memorization regime''. Moreover, our research reveals that this valuable property generalizes to many variants of diffusion models, including conditional diffusion models, diffusion models for solving inverse problems, and fine-tuned diffusion models. A deeper understanding of this phenomenon has the potential to yield more interpretable and controllable data generative processes based on diffusion models.

Structure-Preserving Instance Segmentation via Skeleton-Aware Distance Transform

  • paper_url: http://arxiv.org/abs/2310.05262
  • repo_url: None
  • paper_authors: Zudi Lin, Donglai Wei, Aarush Gupta, Xingyu Liu, Deqing Sun, Hanspeter Pfister
  • for: Histopathology image segmentation
  • methods: Skeleton-aware distance transform (SDT) combining object skeleton and distance transform
  • results: State-of-the-art performance in histopathology image segmentation
    Abstract Objects with complex structures pose significant challenges to existing instance segmentation methods that rely on boundary or affinity maps, which are vulnerable to small errors around contacting pixels that cause noticeable connectivity change. While the distance transform (DT) makes instance interiors and boundaries more distinguishable, it tends to overlook the intra-object connectivity for instances with varying width and result in over-segmentation. To address these challenges, we propose a skeleton-aware distance transform (SDT) that combines the merits of object skeleton in preserving connectivity and DT in modeling geometric arrangement to represent instances with arbitrary structures. Comprehensive experiments on histopathology image segmentation demonstrate that SDT achieves state-of-the-art performance.
    摘要 对于具有复杂结构的对象,依赖边界图或亲和图的现有实例分割方法面临巨大挑战:接触像素附近的微小误差即可引起明显的连通性变化,导致分割不准确。距离变换(DT)虽然能使实例内部与边界更易区分,但往往忽略宽度变化实例的内部连通性,从而导致过分割。为解决这些问题,我们提出骨架感知距离变换(SDT),它结合了对象骨架在保持连通性方面的优点与距离变换在建模几何布局方面的优点,能够表示任意结构的实例。在组织病理图像分割上的大量实验表明,SDT达到了最先进的性能。

SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval

  • paper_url: http://arxiv.org/abs/2310.05241
  • repo_url: None
  • paper_authors: Sunjae Yoon, Gwanhyeong Koo, Dahyun Kim, Chang D. Yoo
  • for: 本研究旨在提高视频时刻检索(VMR)系统的精度和效率,即根据给定的语言查询在视频中定位相应的时刻。
  • methods: 本研究提出了一种新的Scene Complexity Aware Network(SCANet),该网络能够评估多个视频中场景的复杂性,并根据场景的复杂性生成适应性的提案。
  • results: 实验结果表明,SCANet在三个检索基准(Charades-STA、ActivityNet、TVR)上达到了最先进的性能,验证了在VMR系统中引入场景复杂度的有效性。
    Abstract Video moment retrieval aims to localize moments in video corresponding to a given language query. To avoid the expensive cost of annotating the temporal moments, weakly-supervised VMR (wsVMR) systems have been studied. For such systems, generating a number of proposals as moment candidates and then selecting the most appropriate proposal has been a popular approach. These proposals are assumed to contain many distinguishable scenes in a video as candidates. However, existing proposals of wsVMR systems do not respect the varying numbers of scenes in each video, where the proposals are heuristically determined irrespective of the video. We argue that the retrieval system should be able to counter the complexities caused by varying numbers of scenes in each video. To this end, we present a novel concept of a retrieval system referred to as Scene Complexity Aware Network (SCANet), which measures the `scene complexity' of multiple scenes in each video and generates adaptive proposals responding to variable complexities of scenes in each video. Experimental results on three retrieval benchmarks (i.e., Charades-STA, ActivityNet, TVR) achieve state-of-the-art performances and demonstrate the effectiveness of incorporating the scene complexity.
    摘要 视频时刻检索的目标是根据给定的语言查询在视频中定位对应的时刻。为避免标注时刻的高昂成本,弱监督视频时刻检索(wsVMR)系统得到了研究。此类系统通常先生成若干候选时刻提议,再从中选择最合适的提议,这些提议被假定涵盖视频中众多可区分的场景。然而,现有wsVMR系统的提议并未考虑各视频中场景数量的差异,而是不依赖具体视频、启发式地确定。我们认为检索系统应当能够应对各视频中场景数量变化带来的复杂性。为此,我们提出了一种新的检索系统,即场景复杂度感知网络(SCANet),它度量每个视频中多个场景的“场景复杂度”,并根据各视频场景复杂度的变化生成自适应的提议。在三个检索基准(Charades-STA、ActivityNet、TVR)上的实验结果达到了最先进的性能,证明了引入场景复杂度的有效性。

Latent Diffusion Model for Medical Image Standardization and Enhancement

  • paper_url: http://arxiv.org/abs/2310.05237
  • repo_url: None
  • paper_authors: Md Selim, Jie Zhang, Faraneh Fathi, Michael A. Brooks, Ge Wang, Guoqiang Yu, Jin Chen
  • for: 这篇论文的目的是提出一个新的数据构造模型,以对 computed tomography (CT) 图像进行标准化,以提高医学研究中的可比性和精确性。
  • methods: 这篇论文使用了一个名为 DiffusionCT 的新型数据构造模型,该模型在 latent space 中将不同的 CT 图像转换为标准化的图像,以提高医学研究中的可比性和精确性。
  • results: 这篇论文的实验结果显示,DiffusionCT 可以对 CT 图像进行高品质的标准化,并且可以降低 SPAD 图像中的噪声,进一步验证了 DiffusionCT 的有效性。
    Abstract Computed tomography (CT) serves as an effective tool for lung cancer screening, diagnosis, treatment, and prognosis, providing a rich source of features to quantify temporal and spatial tumor changes. Nonetheless, the diversity of CT scanners and customized acquisition protocols can introduce significant inconsistencies in texture features, even when assessing the same patient. This variability poses a fundamental challenge for subsequent research that relies on consistent image features. Existing CT image standardization models predominantly utilize GAN-based supervised or semi-supervised learning, but their performance remains limited. We present DiffusionCT, an innovative score-based DDPM model that operates in the latent space to transform disparate non-standard distributions into a standardized form. The architecture comprises a U-Net-based encoder-decoder, augmented by a DDPM model integrated at the bottleneck position. First, the encoder-decoder is trained independently, without embedding DDPM, to capture the latent representation of the input data. Second, the latent DDPM model is trained while keeping the encoder-decoder parameters fixed. Finally, the decoder uses the transformed latent representation to generate a standardized CT image, providing a more consistent basis for downstream analysis. Empirical tests on patient CT images indicate notable improvements in image standardization using DiffusionCT. Additionally, the model significantly reduces image noise in SPAD images, further validating the effectiveness of DiffusionCT for advanced imaging tasks.

Enhancing Cross-Dataset Performance of Distracted Driving Detection With Score-Softmax Classifier

  • paper_url: http://arxiv.org/abs/2310.05202
  • repo_url: https://github.com/congduan-hnu/ssoftmax
  • paper_authors: Cong Duan, Zixuan Liu, Jiahao Xia, Minghai Zhang, Jiacai Liao, Libo Cao
  • for: 这个研究旨在提高车上司机的实时监控,以预测分心、疲劳和潜在危险。
  • methods: 我们引入了Score-Softmax分类器,通过增强类间独立性与类内不确定性来缓解跨数据集的捷径学习(shortcut learning)问题;并受人类评分模式启发,基于边缘高斯分布设计了二维监督矩阵用于训练该分类器。
  • results: 我们的研究表明,Score-Softmax分类器可以提高跨数据集表现,并且比传统方法更好地结合多个数据集。
    Abstract Deep neural networks enable real-time monitoring of in-vehicle driver, facilitating the timely prediction of distractions, fatigue, and potential hazards. This technology is now integral to intelligent transportation systems. Recent research has exposed unreliable cross-dataset end-to-end driver behavior recognition due to overfitting, often referred to as ``shortcut learning", resulting from limited data samples. In this paper, we introduce the Score-Softmax classifier, which addresses this issue by enhancing inter-class independence and Intra-class uncertainty. Motivated by human rating patterns, we designed a two-dimensional supervisory matrix based on marginal Gaussian distributions to train the classifier. Gaussian distributions help amplify intra-class uncertainty while ensuring the Score-Softmax classifier learns accurate knowledge. Furthermore, leveraging the summation of independent Gaussian distributed random variables, we introduced a multi-channel information fusion method. This strategy effectively resolves the multi-information fusion challenge for the Score-Softmax classifier. Concurrently, we substantiate the necessity of transfer learning and multi-dataset combination. We conducted cross-dataset experiments using the SFD, AUCDD-V1, and 100-Driver datasets, demonstrating that Score-Softmax improves cross-dataset performance without modifying the model architecture. This provides a new approach for enhancing neural network generalization. Additionally, our information fusion approach outperforms traditional methods.

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

  • paper_url: http://arxiv.org/abs/2310.05193
  • repo_url: None
  • paper_authors: Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao
  • for: 这种研究旨在更好地利用大规模预训练的uni-modal模型,以提高多模态学习的表现。
  • methods: 这种方法使用预训练的uni-modal模型,并将其作为初始模型进行多模态联合训练,以增强模式之间的适应性。
  • results: 研究表明,这种方法可以提高多模态模型的总表现,特别是在一些任务中,even when fine-tuned with only uni-modal data。
    Abstract This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal models in certain tasks. It's clear that their incorporation into multi-modal learning would significantly improve performance. However, multi-modal learning with these models still suffers from insufficient learning of uni-modal features, which weakens the resulting multi-modal model's generalization ability. While fine-tuning uni-modal models separately and then aggregating their predictions is straightforward, it doesn't allow for adequate adaptation between modalities, also leading to sub-optimal results. To this end, we introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA). By freezing the weights of uni-modal fine-tuned models, adding extra trainable rank decomposition matrices to them, and subsequently performing multi-modal joint training, our method enhances adaptation between modalities and boosts overall performance. We demonstrate the effectiveness of MMLoRA on three dataset categories: audio-visual (e.g., AVE, Kinetics-Sound, CREMA-D), vision-language (e.g., MM-IMDB, UPMC Food101), and RGB-Optical Flow (UCF101).
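A minimal LoRA-style adapter, included to make "extra trainable rank decomposition matrices" concrete: the frozen uni-modal linear layer is augmented with a low-rank update B·A that is the only part updated during multi-modal joint training. The rank, scaling, and where such adapters are attached are illustrative choices, not the MMLoRA configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, frozen: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.frozen = frozen
        for p in self.frozen.parameters():
            p.requires_grad = False                       # keep uni-modal weights fixed
        self.lora_a = nn.Parameter(torch.randn(rank, frozen.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(frozen.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```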

HOD: A Benchmark Dataset for Harmful Object Detection

  • paper_url: http://arxiv.org/abs/2310.05192
  • repo_url: https://github.com/poori-nuna/hod-benchmark-dataset
  • paper_authors: Eungyeom Ha, Heemook Kim, Sung Chul Hong, Dongbin Na
  • for: 这个论文的目标是开发自动识别危险内容的系统,以防止在在线服务平台上传播危险内容。
  • methods: 这个研究使用了最新的计算机视觉技术,包括使用最新的对象检测架构和大量的数据集来训练模型。
  • results: 研究人员通过实验表明,使用提议的数据集和方法可以准确地检测在线服务平台上的危险内容,并且可以在实时应用中提供有效的识别结果。
    Abstract Recent multi-media data such as images and videos have been rapidly spread out on various online services such as social network services (SNS). With the explosive growth of online media services, the number of image content that may harm users is also growing exponentially. Thus, most recent online platforms such as Facebook and Instagram have adopted content filtering systems to prevent the prevalence of harmful content and reduce the possible risk of adverse effects on users. Unfortunately, computer vision research on detecting harmful content has not yet attracted attention enough. Users of each platform still manually click the report button to recognize patterns of harmful content they dislike when exposed to harmful content. However, the problem with manual reporting is that users are already exposed to harmful content. To address these issues, our research goal in this work is to develop automatic harmful object detection systems for online services. We present a new benchmark dataset for harmful object detection. Unlike most related studies focusing on a small subset of object categories, our dataset addresses various categories. Specifically, our proposed dataset contains more than 10,000 images across 6 categories that might be harmful, consisting of not only normal cases but also hard cases that are difficult to detect. Moreover, we have conducted extensive experiments to evaluate the effectiveness of our proposed dataset. We have utilized the recently proposed state-of-the-art (SOTA) object detection architectures and demonstrated our proposed dataset can be greatly useful for the real-time harmful object detection task. The whole source codes and datasets are publicly accessible at https://github.com/poori-nuna/HOD-Benchmark-Dataset.
    摘要 近年来多媒体数据如图片和视频在不同的在线服务平台上迅速扩散,如社交媒体服务(SNS)。随着在线媒体服务的快速发展,具有可能伤害用户的图像内容的数量也在增长 exponentially。因此,现代在线平台如Facebook和Instagram已经采用内容筛选系统来防止有害内容的普及和减少可能的用户伤害的风险。然而,计算机视觉研究检测有害内容还没有吸引到够多的关注。用户们仍然通过手动报告按钮来认识他们看到的有害内容。然而,手动报告的问题在于用户已经曝露在有害内容中。为解决这些问题,我们在这项工作中的研究目标是开发自动检测有害对象系统。我们提出了一个新的比较 dataset,与大多数相关研究一样,我们的 dataset 覆盖了多个对象类别,并且包含了超过 10,000 个图像,这些图像包括不只是正常情况,还有一些困难检测的情况。此外,我们进行了广泛的实验,以评估我们提出的 dataset 的有效性。我们使用了最新的 state-of-the-art 对象检测架构,并证明了我们的 dataset 可以在实时有害对象检测任务中具有很高的有用性。整个源代码和数据集都可以在 上公开访问。

AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition

  • paper_url: http://arxiv.org/abs/2310.05184
  • repo_url: https://github.com/Lu-Feng/AANet
  • paper_authors: Feng Lu, Lijun Zhang, Shuting Dong, Baifan Chen, Chun Yuan
  • for: 该论文旨在提出一种高效的视觉地点识别(VPR)方法,用于机器人领域中的位置定位。
  • methods: 该方法采用分层两阶段的VPR框架:第一阶段使用全局特征检索候选图像,第二阶段使用局部特征进行重排序;并提出了一种动态对齐局部特征(DALF)算法,在空间约束下对局部特征进行对齐,无需额外的几何一致性验证。
  • results: 在四个常用的VPR数据集上进行了广泛的实验,结果显示所提出的AANet在耗时更少的情况下优于多种最先进的方法。
    Abstract Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots. Recently, the hierarchical two-stage VPR methods have become popular in this field due to the trade-off between accuracy and efficiency. These methods retrieve the top-k candidate images using the global features in the first stage, then re-rank the candidates by matching the local features in the second stage. However, they usually require additional algorithms (e.g. RANSAC) for geometric consistency verification in re-ranking, which is time-consuming. Here we propose a Dynamically Aligning Local Features (DALF) algorithm to align the local features under spatial constraints. It is significantly more efficient than the methods that need geometric consistency verification. We present a unified network capable of extracting global features for retrieving candidates via an aggregation module and aligning local features for re-ranking via the DALF alignment module. We call this network AANet. Meanwhile, many works use the simplest positive samples in triplet for weakly supervised training, which limits the ability of the network to recognize harder positive pairs. To address this issue, we propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks. Extensive experiments on four benchmark VPR datasets show that the proposed AANet can outperform several state-of-the-art methods with less time consumption. The code is released at https://github.com/Lu-Feng/AANet.
    摘要 视觉地点识别(VPR)是机器人学研究的热点之一,它利用视觉信息来定位机器人。近年来,层次两阶段的VPR方法因能兼顾精度与效率而在该领域得到广泛应用。这些方法首先使用全局特征检索top-k候选图像,然后在第二阶段通过匹配局部特征对候选图像重新排序。然而,它们在重排序阶段通常需要额外的算法(例如RANSAC)来验证几何一致性,这十分耗时。为此,我们提出了一种Dynamically Aligning Local Features(DALF)算法,可以在空间约束下对局部特征进行对齐,其效率显著高于需要几何一致性验证的方法。我们进一步提出了一个统一网络,通过聚合模块提取全局特征用于候选检索,并通过DALF对齐模块对齐局部特征用于重排序,我们称之为AANet。另外,许多工作在弱监督训练中仅使用三元组里最简单的正样本,这限制了网络识别更难正样本对的能力。为了解决这个问题,我们提出了一种半难正样本挖掘(ShPSM)策略,选择合适的难正样本图像,以训练更鲁棒的VPR网络。我们在四个常用的VPR数据集上进行了广泛的实验,结果显示,我们的AANet能够以更少的耗时超越多种最先进的方法。代码可以在https://github.com/Lu-Feng/AANet上下载。
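A hedged sketch of semi-hard positive mining for a VPR query: among the candidate positives, skip the easiest (most similar) ones and pick a moderately hard positive, so training sees informative pairs without the noisiest matches. The cosine-similarity ranking and the skip count are an illustration of the idea, not the exact ShPSM criterion.

```python
import numpy as np

def semi_hard_positive(query: np.ndarray, positives: np.ndarray,
                       skip_easiest: int = 2) -> int:
    """Index of a semi-hard positive: not the most similar, not the least."""
    sims = positives @ query / (np.linalg.norm(positives, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)                    # most similar candidates first
    return int(order[min(skip_easiest, len(order) - 1)])

rng = np.random.default_rng(3)
q = rng.normal(size=128)
pos = rng.normal(size=(8, 128)) + q              # candidates roughly matching the query
print("chosen positive index:", semi_hard_positive(q, pos))
```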

ITRE: Low-light Image Enhancement Based on Illumination Transmission Ratio Estimation

  • paper_url: http://arxiv.org/abs/2310.05158
  • repo_url: None
  • paper_authors: Yu Wang, Yihong Wang, Tong Liu, Xiubao Sui, Qian Chen
  • for: 提高低光照图像的品质
  • methods: 使用Retinex方法,包括分色域聚类、初始照明传输矩阵计算、基础模型生成和终端检查等步骤,以避免噪声、 artifacts 和过度曝光
  • results: 对比state-of-the-art方法,本方法在降低噪声、避免 artifacts、控制曝光水平方面具有优越的表现
    Abstract Noise, artifacts, and over-exposure are significant challenges in the field of low-light image enhancement. Existing methods often struggle to address these issues simultaneously. In this paper, we propose a novel Retinex-based method, called ITRE, which suppresses noise and artifacts from the origin of the model, prevents over-exposure throughout the enhancement process. Specifically, we assume that there must exist a pixel which is least disturbed by low light within pixels of same color. First, clustering the pixels on the RGB color space to find the Illumination Transmission Ratio (ITR) matrix of the whole image, which determines that noise is not over-amplified easily. Next, we consider ITR of the image as the initial illumination transmission map to construct a base model for refined transmission map, which prevents artifacts. Additionally, we design an over-exposure module that captures the fundamental characteristics of pixel over-exposure and seamlessly integrate it into the base model. Finally, there is a possibility of weak enhancement when inter-class distance of pixels with same color is too small. To counteract this, we design a Robust-Guard module that safeguards the robustness of the image enhancement process. Extensive experiments demonstrate the effectiveness of our approach in suppressing noise, preventing artifacts, and controlling over-exposure level simultaneously. Our method performs superiority in qualitative and quantitative performance evaluations by comparing with state-of-the-art methods.
    摘要 噪声、artefacts和过度曝光是低光照图像增强领域中的主要挑战。现有方法 oftentimes 难以同时解决这些问题。在这篇论文中,我们提出了一种新的Retinex基于的方法,称为ITRE,该方法可以在图像增强过程中对噪声和artefacts进行控制,同时避免过度曝光。具体来说,我们假设在图像中存在一个最少受到低光照的像素,我们可以通过RGB色彩空间的聚类来找到整个图像的照明传输矩阵(ITR),该矩阵确定了噪声不易过度增强。接着,我们将ITR矩阵作为图像的初始照明传输地图,并将其用于构建基本模型,以避免artefacts。此外,我们还设计了一个过度曝光模块,该模块可以融合到基本模型中,以捕捉图像过度曝光的基本特征。最后,当相同色彩的像素间距离过小时,可能会出现弱化效果。为了解决这个问题,我们设计了一个Robust-Guard模块,以保证图像增强过程的稳定性。广泛的实验表明,我们的方法可以同时控制噪声、artefacts和过度曝光水平,并且在质量和量化性能评价中表现出优于状态艺术方法。

LocoNeRF: A NeRF-based Approach for Local Structure from Motion for Precise Localization

  • paper_url: http://arxiv.org/abs/2310.05134
  • repo_url: None
  • paper_authors: Artem Nenashev, Mikhail Kurenkov, Andrei Potapov, Iana Zhura, Maksim Katerishich, Dzmitry Tsetserukou
  • for: 提高视觉定位精度, addresses the limitations of global SfM and the challenges of local SfM.
  • methods: 使用Neural Radiance Fields (NeRF) instead of image databases for storage, and sampling reference images around the prior query position for further improvements.
  • results: 比ground truth有0.068米的准确性,但是数据库大小减少至160MB,比COLMAP的400MB有所降低。 Additionally, the ablation study shows the impact of using reference images from the NeRF reconstruction.
    Abstract Visual localization is a critical task in mobile robotics, and researchers are continuously developing new approaches to enhance its efficiency. In this article, we propose a novel approach to improve the accuracy of visual localization using Structure from Motion (SfM) techniques. We highlight the limitations of global SfM, which suffers from high latency, and the challenges of local SfM, which requires large image databases for accurate reconstruction. To address these issues, we propose utilizing Neural Radiance Fields (NeRF), as opposed to image databases, to cut down on the space required for storage. We suggest that sampling reference images around the prior query position can lead to further improvements. We evaluate the accuracy of our proposed method against ground truth obtained using LIDAR and Advanced Lidar Odometry and Mapping in Real-time (A-LOAM), and compare its storage usage against local SfM with COLMAP in the conducted experiments. Our proposed method achieves an accuracy of 0.068 meters compared to the ground truth, which is slightly lower than the most advanced method COLMAP, which has an accuracy of 0.022 meters. However, the size of the database required for COLMAP is 400 megabytes, whereas the size of our NeRF model is only 160 megabytes. Finally, we perform an ablation study to assess the impact of using reference images from the NeRF reconstruction.
    摘要 视觉定位是移动机器人中的一项关键任务,研究人员不断开发新的方法来提高其效率。在这篇文章中,我们提出一种利用运动恢复结构(SfM)技术提高视觉定位准确性的新方法。我们指出了全局SfM的局限性(高延迟),以及局部SfM的挑战(需要大规模图像数据库才能实现准确重建)。为了解决这些问题,我们提议使用神经辐射场(NeRF)取代图像数据库,以减少存储空间;并建议在先验查询位置附近采样参考图像以获得进一步改进。我们将所提方法与由LIDAR和A-LOAM获得的真值进行对比评估,并在实验中与基于COLMAP的局部SfM比较存储占用。我们的方法准确性为0.068米,略逊于最先进的方法COLMAP(0.022米);然而,COLMAP需要400兆字节的数据库,而我们的NeRF模型只需160兆字节。最后,我们进行了消融研究,以评估使用NeRF重建参考图像的影响。

Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.05133
  • repo_url: https://github.com/USTCPCS/CVPR2018_attention
  • paper_authors: Dominik Hollidt, Clinton Wang, Polina Golland, Marc Pollefeys
  • for: 仅基于2D监督,利用神经辐射场(NeRF)实现3D语义分割。
  • methods: 沿表面点云提取特征,得到紧凑、样本高效且便于3D推理的场景表示;并通过掩码自编码以无监督方式学习该特征空间,从而支持少样本分割。
  • results: 该方法与场景参数化方式无关,可应用于任何类型的NeRF场景,并在少样本设置下实现3D语义分割。
    Abstract We present a novel approach to perform 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene which is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentation. Our method is agnostic to the scene parameterization, working on scenes fit with any type of NeRF.
    摘要 我们提出了一种新的方法,仅基于2D监督,利用神经辐射场(NeRF)完成3D语义分割。通过沿表面点云提取特征,我们获得了一个紧凑的场景表示,它具有样本效率且便于3D推理。通过掩码自编码以无监督方式学习该特征空间,我们实现了少样本分割。我们的方法与场景参数化方式无关,可应用于任何类型的NeRF场景。

Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis

  • paper_url: http://arxiv.org/abs/2310.05125
  • repo_url: None
  • paper_authors: Peipei Li, Xing Cui, Yibo Hu, Man Zhang, Ting Yao, Tao Mei
  • for: 本研究旨在缓解点云分析的计算系统开销,使其可以在移动或边缘设备上应用。
  • methods: 本文探索轻量点云模型的特征蒸馏,提出双向知识重构(BKR),将教师模型中有用的上下文知识蒸馏给学生模型。
  • results: 我们的方法在形状分类、部件分割和语义分割基准上均表现出色,证明了其通用性和优越性。
    Abstract Point cloud analysis faces computational system overhead, limiting its application on mobile or edge devices. Directly employing small models may result in a significant drop in performance since it is difficult for a small model to adequately capture local structure and global shape information simultaneously, which are essential clues for point cloud analysis. This paper explores feature distillation for lightweight point cloud models. To mitigate the semantic gap between the lightweight student and the cumbersome teacher, we propose bidirectional knowledge reconfiguration (BKR) to distill informative contextual knowledge from the teacher to the student. Specifically, a top-down knowledge reconfiguration and a bottom-up knowledge reconfiguration are developed to inherit diverse local structure information and consistent global shape knowledge from the teacher, respectively. However, due to the farthest point sampling in most point cloud models, the intermediate features between teacher and student are misaligned, deteriorating the feature distillation performance. To eliminate it, we propose a feature mover's distance (FMD) loss based on optimal transportation, which can measure the distance between unordered point cloud features effectively. Extensive experiments conducted on shape classification, part segmentation, and semantic segmentation benchmarks demonstrate the universality and superiority of our method.
    摘要 点云分析面临计算系统开销限制其在移动或边缘设备上应用。直接采用小型模型可能会导致显著性能下降,因为小型模型很难同时捕捉点云中的本地结构和全局形态信息,这些信息是点云分析的关键决定因素。本文探讨了降简点云模型的技术。为了减少教师和学生之间的Semantic gap,我们提出了双向知识重新配置(BKR),将教师知识中的有用Contextual information遗传给学生。具体来说,我们开发了从教师到学生的顶部知识重新配置和从学生到教师的底部知识重新配置,以继承教师的多样化本地结构信息和一致的全局形态知识。然而,由于多数点云模型中的远点抽样,学生和教师之间的中间特征不对Alignment,这会降低feature distillation的性能。为了解决这个问题,我们提出了基于最优运输的特征移动距离(FMD)损失,可以有效度量不同点云特征之间的距离。我们对shape classification、部分 segmentation和semantic segmentation benchmark进行了广泛的实验,结果表明我们的方法在 universality 和优势性方面具有出色的表现。
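
    The feature mover's distance (FMD) is described only as an optimal-transport distance between unordered feature sets; the abstract does not name a solver. A minimal sketch using entropic Sinkhorn iterations, with uniform marginals assumed, could look like this:

```python
import torch

def feature_movers_distance(f_s, f_t, eps=0.05, n_iter=50):
    """Entropic-OT approximation of a feature mover's distance.

    f_s, f_t: (N, C) and (M, C) unordered feature sets from student and
    teacher.  The Sinkhorn solver and the uniform marginals are
    illustrative choices, not necessarily those used by BKR.
    """
    cost = torch.cdist(f_s, f_t, p=2)                 # (N, M) pairwise cost
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n, device=cost.device)
    b = torch.full((m,), 1.0 / m, device=cost.device)

    K = torch.exp(-cost / eps)                        # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iter):                           # Sinkhorn iterations
        v = b / (K.t() @ u + 1e-9)
        u = a / (K @ v + 1e-9)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)        # transport plan
    return (plan * cost).sum()
```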

Cross-domain Robust Deepfake Bias Expansion Network for Face Forgery Detection

  • paper_url: http://arxiv.org/abs/2310.05124
  • repo_url: None
  • paper_authors: Weihua Liu, Lin Li, Chaochao Lin, Said Boumaraf
  • for: 这篇论文旨在提高人脸伪造检测的安全性,尤其是应对深度伪造技术的威胁。
  • methods: 本论文提出了一种解决方案——跨域鲁棒偏差扩展网络(BENet),通过自编码器重建输入人脸,保持真实人脸的不变性,同时选择性地增强伪造人脸与其原始样本之间的差异;并引入潜在空间注意力(LSA)模块,以更好地捕捉与伪造相关的不一致特征。
  • results: 与现有方法相比,BENet在库内和跨库测试中均表现出色,能够有效检测深度伪造人脸。
    Abstract The rapid advancement of deepfake technologies raises significant concerns about the security of face recognition systems. While existing methods leverage the clues left by deepfake techniques for face forgery detection, malicious users may intentionally manipulate forged faces to obscure the traces of deepfake clues and thereby deceive detection tools. Meanwhile, attaining cross-domain robustness for data-based methods poses a challenge due to potential gaps in the training data, which may not encompass samples from all relevant domains. Therefore, in this paper, we introduce a solution - a Cross-Domain Robust Bias Expansion Network (BENet) - designed to enhance face forgery detection. BENet employs an auto-encoder to reconstruct input faces, maintaining the invariance of real faces while selectively enhancing the difference between reconstructed fake faces and their original counterparts. This enhanced bias forms a robust foundation upon which dependable forgery detection can be built. To optimize the reconstruction results in BENet, we employ a bias expansion loss infused with contrastive concepts to attain the aforementioned objective. In addition, to further heighten the amplification of forged clues, BENet incorporates a Latent-Space Attention (LSA) module. This LSA module effectively captures variances in latent features between the auto-encoder's encoder and decoder, placing emphasis on inconsistent forgery-related information. Furthermore, BENet incorporates a cross-domain detector with a threshold to determine whether the sample belongs to a known distribution. The correction of classification results through the cross-domain detector enables BENet to defend against unknown deepfake attacks from cross-domain. Extensive experiments demonstrate the superiority of BENet compared with state-of-the-art methods in intra-database and cross-database evaluations.
    摘要 深度伪造技术的快速发展,给人脸识别系统的安全带来了重大隐忧。现有方法可以利用深伪技术留下的痕迹来检测伪造人脸,但恶意用户可能会故意修改伪造人脸,以掩盖深伪痕迹并欺骗检测工具。同时,由于训练数据可能无法涵盖所有相关领域,基于数据的方法难以获得跨域鲁棒性。因此,在这篇论文中,我们提出了一个解决方案——跨域鲁棒偏差扩展网络(BENet),用于增强人脸伪造检测。BENet 使用自编码器重建输入人脸,保持真实人脸的不变性,同时选择性地增强重建后的伪造人脸与其原始样本之间的差异。这种被放大的偏差构成了可靠伪造检测的稳健基础。为了优化 BENet 的重建结果,我们使用融合了对比概念的偏差扩展损失来达成上述目标。此外,为进一步放大伪造线索,BENet 还包含一个潜在空间注意力(LSA)模块,该模块能有效捕捉自编码器的编码器与解码器之间的潜在特征差异,将注意力集中在与伪造相关的不一致信息上。此外,BENet 还包含一个带阈值的跨域检测器,用于判断样本是否属于已知分布;通过跨域检测器对分类结果进行修正,使 BENet 能够防御来自未知域的深伪攻击。大量实验表明,BENet 在库内和跨库评估中均优于现有最先进方法。

Dynamic Multi-Domain Knowledge Networks for Chest X-ray Report Generation

  • paper_url: http://arxiv.org/abs/2310.05119
  • repo_url: None
  • paper_authors: Weihua Liu, Youyuan Xue, Chaochao Lin, Said Boumaraf
  • for: This paper aims to address the challenges of automatically generating radiology diagnostic reports, particularly the imbalance in data distribution between normal and abnormal samples, by proposing a Dynamic Multi-Domain Knowledge (DMDK) network.
  • methods: The proposed DMDK network consists of four modules: Chest Feature Extractor (CFE), Dynamic Knowledge Extractor (DKE), Specific Knowledge Extractor (SKE), and Multi-knowledge Integrator (MKI) module. The network utilizes dynamic disease topic labels, domain-specific dynamic knowledge graphs, and multi-knowledge integration to mitigate data biases and enhance interpretability.
  • results: The proposed method was extensively evaluated on two widely used datasets (IU X-Ray and MIMIC-CXR) and achieved state-of-the-art performance in all evaluation metrics, outperforming previous models.
    Abstract The automated generation of radiology diagnostic reports helps radiologists make timely and accurate diagnostic decisions while also enhancing clinical diagnostic efficiency. However, the significant imbalance in the distribution of data between normal and abnormal samples (including visual and textual biases) poses significant challenges for a data-driven task like automatically generating diagnostic radiology reports. Therefore, we propose a Dynamic Multi-Domain Knowledge(DMDK) network for radiology diagnostic report generation. The DMDK network consists of four modules: Chest Feature Extractor(CFE), Dynamic Knowledge Extractor(DKE), Specific Knowledge Extractor(SKE), and Multi-knowledge Integrator(MKI) module. Specifically, the CFE module is primarily responsible for extracting the unprocessed visual medical features of the images. The DKE module is responsible for extracting dynamic disease topic labels from the retrieved radiology diagnostic reports. We then fuse the dynamic disease topic labels with the original visual features of the images to highlight the abnormal regions in the original visual features to alleviate the visual data bias problem. The SKE module expands upon the conventional static knowledge graph to mitigate textual data biases and amplify the interpretability capabilities of the model via domain-specific dynamic knowledge graphs. The MKI distills all the knowledge and generates the final diagnostic radiology report. We performed extensive experiments on two widely used datasets, IU X-Ray and MIMIC-CXR. The experimental results demonstrate the effectiveness of our method, with all evaluation metrics outperforming previous state-of-the-art models.
    摘要 自动生成放射学诊断报告可以帮助放射科医生更快、更准确地作出诊断决策,同时提高临床诊断效率。然而,正常与异常样本之间数据分布的严重不均衡(包括视觉和文本偏差),对自动生成放射学诊断报告这类数据驱动任务而言是一个挑战。因此,我们提出了动态多域知识(DMDK)网络,用于放射学诊断报告生成。DMDK网络由四个模块组成:胸部特征提取器(CFE)、动态知识提取器(DKE)、特定知识提取器(SKE)和多知识整合器(MKI)模块。具体来说,CFE模块主要负责从原始医学影像中提取视觉特征;DKE模块负责从检索到的放射学诊断报告中提取动态疾病话题标签。我们将这些动态疾病话题标签与原始视觉特征融合,以突出原始视觉特征中的异常区域,从而缓解视觉数据偏差问题。SKE模块在传统静态知识图的基础上进行扩展,借助领域特定的动态知识图缓解文本数据偏差并增强模型的可解释性。MKI模块对所有知识进行提炼整合,并生成最终的放射学诊断报告。我们在IU X-Ray和MIMIC-CXR两个广泛使用的数据集上进行了大量实验,结果表明我们的方法在所有评价指标上均优于以往的最先进模型。

Lightweight In-Context Tuning for Multimodal Unified Models

  • paper_url: http://arxiv.org/abs/2310.05109
  • repo_url: None
  • paper_authors: Yixin Chen, Shuai Zhang, Boran Han, Jiaya Jia
  • for: The paper aims to address the challenges of in-context learning (ICL) in multimodal tasks, specifically the difficulty of extrapolating from contextual examples to perform ICL as more modalities are added.
  • methods: The proposed solution is called MultiModal In-conteXt Tuning (M$^2$IXT), a lightweight module that incorporates an expandable context window to incorporate various labeled examples of multiple modalities. The module can be prepended to various multimodal unified models and trained via a mixed-tasks strategy to enable rapid few-shot adaption on multiple tasks and datasets.
  • results: The paper shows that M$^2$IXT can significantly boost the few-shot ICL performance (e.g., 18% relative increase for OFA) and achieve state-of-the-art results across various tasks, including visual question answering, image captioning, visual grounding, and visual entailment, while being considerably small in terms of model parameters.
    Abstract In-context learning (ICL) involves reasoning from given contextual examples. As more modalities comes, this procedure is becoming more challenging as the interleaved input modalities convolutes the understanding process. This is exemplified by the observation that multimodal models often struggle to effectively extrapolate from contextual examples to perform ICL. To address these challenges, we introduce MultiModal In-conteXt Tuning (M$^2$IXT), a lightweight module to enhance the ICL capabilities of multimodal unified models. The proposed M$^2$IXT module perceives an expandable context window to incorporate various labeled examples of multiple modalities (e.g., text, image, and coordinates). It can be prepended to various multimodal unified models (e.g., OFA, Unival, LLaVA) of different architectures and trained via a mixed-tasks strategy to enable rapid few-shot adaption on multiple tasks and datasets. When tuned on as little as 50K multimodal data, M$^2$IXT can boost the few-shot ICL performance significantly (e.g., 18\% relative increase for OFA), and obtained state-of-the-art results across an array of tasks including visual question answering, image captioning, visual grounding, and visual entailment, while being considerably small in terms of model parameters (e.g., $\sim$$20\times$ smaller than Flamingo or MMICL), highlighting the flexibility and effectiveness of M$^2$IXT as a multimodal in-context learner.
    摘要 上下文学习(ICL)是指根据给定的上下文示例进行推理。随着模态数量的增加,这一过程变得更具挑战性,因为交错的多模态输入会使理解过程更加复杂。这体现在多模态模型往往难以从上下文示例中有效外推以完成ICL。为了解决这些挑战,我们提出了多模态上下文调优(M$^2$IXT)模块,用于提升多模态统一模型的ICL能力。该模块通过可扩展的上下文窗口纳入多种模态(如文本、图像和坐标)的带标注示例,可以前置于不同架构的多模态统一模型(例如OFA、Unival、LLaVA),并通过混合任务策略进行训练,以实现对多个任务和数据集的快速少样本适应。仅使用5万条多模态数据进行调优时,M$^2$IXT即可显著提升少样本ICL性能(例如OFA相对提升18%),并在视觉问答、图像描述、视觉定位、视觉蕴涵等一系列任务上取得最先进的结果,同时模型参数量很小(约为Flamingo或MMICL的1/20),这表明M$^2$IXT是一个灵活且有效的多模态上下文学习器。
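
    As a rough illustration of how an in-context prefix module can be prepended to a frozen unified model, here is a hedged sketch. `embed_image` and `embed_text` stand in for the host model's own (frozen) embedding functions and are hypothetical interfaces; the internals of the real M$^2$IXT module are not reproduced here.

```python
import torch
import torch.nn as nn

class InContextPrefix(nn.Module):
    """Minimal sketch of an M^2IXT-style in-context tuning module.

    Only the small projection below would be trained; everything else is
    assumed to come from the frozen unified model.  Batch dimensions are
    omitted for clarity.
    """
    def __init__(self, dim, embed_image, embed_text):
        super().__init__()
        self.embed_image, self.embed_text = embed_image, embed_text
        self.proj = nn.Linear(dim, dim)          # lightweight trainable part

    def forward(self, context_examples, query_tokens):
        # context_examples: list of (image, prompt_text, label_text) triples
        ctx = []
        for img, prompt, label in context_examples:
            ctx.append(self.embed_image(img))    # (L_i, dim)
            ctx.append(self.embed_text(prompt))  # (L_p, dim)
            ctx.append(self.embed_text(label))   # (L_l, dim)
        ctx = self.proj(torch.cat(ctx, dim=0))   # expandable context window
        # Prepend the fused context to the query before the unified model.
        return torch.cat([ctx, query_tokens], dim=0)
```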

Enhancing Representations through Heterogeneous Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.05108
  • repo_url: None
  • paper_authors: Zhong-Yu Li, Bo-Wen Yin, Shanghua Gao, Yongxiang Liu, Li Liu, Ming-Ming Cheng
  • for: 提高自监督学习中基础模型的表示质量,通过在基础模型上添加架构异构的辅助头来增强表示能力。
  • methods: 提出了异构自监督学习方法(HSSL),让基础模型向架构与其不同的辅助头学习,从而在不改变结构的情况下获得新的表示特性。
  • results: 在多种基础模型与辅助头组合上的实验表明,基础模型的表示质量随架构差异的增大而提升;该方法在图像分类、语义分割、实例分割和目标检测等多个下游任务上均取得更优性能。此外,还提出了一种快速确定最适合特定基础模型的辅助头的搜索策略,以及几种简单而有效的扩大模型差异的方法。
    Abstract Incorporating heterogeneous representations from different architectures has facilitated various vision tasks, e.g., some hybrid networks combine transformers and convolutions. However, complementarity between such heterogeneous architectures has not been well exploited in self-supervised learning. Thus, we propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model. In this process, HSSL endows the base model with new characteristics in a representation learning way without structural changes. To comprehensively understand the HSSL, we conduct experiments on various heterogeneous pairs containing a base model and an auxiliary head. We discover that the representation quality of the base model moves up as their architecture discrepancy grows. This observation motivates us to propose a search strategy that quickly determines the most suitable auxiliary head for a specific base model to learn and several simple but effective methods to enlarge the model discrepancy. The HSSL is compatible with various self-supervised methods, achieving superior performances on various downstream tasks, including image classification, semantic segmentation, instance segmentation, and object detection. Our source code will be made publicly available.
    摘要 将不同架构的表示结合在一起已经提高了许多视觉任务的性能,例如混合网络将转换器和卷积结合使用。然而,这些不同架构之间的补做性未得到了自我超vised学习中的充分利用。因此,我们提出了多样化自我超vised学习(HSSL),它要求基本模型从auxiliary头中学习,auxiliary头的架构与基本模型不同。在这个过程中,HSSL使得基本模型学习新的特征,不需要结构性改变。为了全面了解HSSL,我们在不同的 heterogeneous对中进行了实验,发现当基本模型和auxiliary头的架构差异增大时,基本模型的表示质量会提高。这一观察使我们提出了一种快速找到最适合基本模型学习的auxiliary头的搜索策略,以及一些简单 yet effective的方法来扩大模型差异。HSSL可以与多种自我超vised方法结合使用,在多种下游任务上达到了更高的性能,包括图像分类、 semantic segmentation、实例 segmentation和对象检测。我们将代码公开发布。

OV-PARTS: Towards Open-Vocabulary Part Segmentation

  • paper_url: http://arxiv.org/abs/2310.05107
  • repo_url: https://github.com/openrobotlab/ov_parts
  • paper_authors: Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang
  • for: 本研究旨在提出一个开 vocabulary part segmentation(OV-PARTS)benchmark,以探索在实世界中具有多元定义的部分构成的挑战。
  • methods: 本研究使用了两个公开可用的数据集:Pascal-Part-116和ADE20K-Part-234,并提出了三个特定任务:通用零基准部分分类、跨数据集部分分类和少量基准部分分类,以探索模型对于不同定义的部分的数据分类能力。
  • results: 本研究通过实验分析了两种现有的物件水平OVSS方法的适用性,并提供了一个精确的数据集和代码,以便未来研究者可以在OV-PARTS领域进行更多的探索和创新。
    Abstract Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks. While significant progress has been made in object-level Open-Vocabulary Semantic Segmentation (OVSS), i.e., segmenting objects with arbitrary text, the corresponding part-level research poses additional challenges. Firstly, part segmentation inherently involves intricate boundaries, while limited annotated data compounds the challenge. Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world. Furthermore, the large-scale vision and language models, which play a key role in the open vocabulary setting, struggle to recognize parts as effectively as objects. To comprehensively investigate and tackle these challenges, we propose an Open-Vocabulary Part Segmentation (OV-PARTS) benchmark. OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-Part-234. And it covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation, providing insights into analogical reasoning, open granularity and few-shot adapting abilities of models. Moreover, we analyze and adapt two prevailing paradigms of existing object-level OVSS methods for OV-PARTS. Extensive experimental analysis is conducted to inspire future research in leveraging foundational models for OV-PARTS. The code and dataset are available at https://github.com/OpenRobotLab/OV_PARTS.
    摘要 Segmenting and recognizing diverse object parts is a crucial ability in various computer vision and robotic tasks. Although significant progress has been made in object-level Open-Vocabulary Semantic Segmentation (OVSS), segmenting objects with arbitrary text, the corresponding part-level research poses additional challenges. Firstly, part segmentation involves intricate boundaries, and limited annotated data makes it more challenging. Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world. Moreover, large-scale vision and language models, which play a key role in the open vocabulary setting, struggle to recognize parts as effectively as objects.To comprehensively investigate and tackle these challenges, we propose an Open-Vocabulary Part Segmentation (OV-PARTS) benchmark. OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-Part-234. It covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation, providing insights into analogical reasoning, open granularity, and few-shot adapting abilities of models. Moreover, we analyze and adapt two prevailing paradigms of existing object-level OVSS methods for OV-PARTS. Extensive experimental analysis is conducted to inspire future research in leveraging foundational models for OV-PARTS. The code and dataset are available at https://github.com/OpenRobotLab/OV_PARTS.

Cross-head mutual Mean-Teaching for semi-supervised medical image segmentation

  • paper_url: http://arxiv.org/abs/2310.05082
  • repo_url: https://github.com/leesoon1984/cmmt-net
  • paper_authors: Wei Li, Ruifeng Bian, Wenyi Zhao, Weijin Xu, Huihua Yang
  • for: 提高 semi-supervised medical image segmentation 的精度和一致性
  • methods: 提出了一种新的 Cross-head mutual mean-teaching Network (CMMT-Net),包括 teacher-student 师生网络和 pseudo label 生成等技术,以提高自教学和一致学习的性能
  • results: 实验结果显示,CMMT-Net 在三个公开可用的数据集上取得了前所未有的提升,在不同的半监督场景中均表现出色
    Abstract Semi-supervised medical image segmentation (SSMIS) has witnessed substantial advancements by leveraging limited labeled data and abundant unlabeled data. Nevertheless, existing state-of-the-art (SOTA) methods encounter challenges in accurately predicting labels for the unlabeled data, giving rise to disruptive noise during training and susceptibility to erroneous information overfitting. Moreover, applying perturbations to inaccurate predictions further reduces consistent learning. To address these concerns, we propose a novel Cross-head mutual mean-teaching Network (CMMT-Net) incorporated strong-weak data augmentation, thereby benefitting both self-training and consistency learning. Specifically, our CMMT-Net consists of both teacher-student peer networks with a share encoder and dual slightly different decoders, and the pseudo labels generated by one mean teacher head are adopted to supervise the other student branch to achieve a mutual consistency. Furthermore, we propose mutual virtual adversarial training (MVAT) to smooth the decision boundary and enhance feature representations. To diversify the consistency training samples, we employ Cross-Set CutMix strategy, which also helps address distribution mismatch issues. Notably, CMMT-Net simultaneously implements data, feature, and network perturbations, amplifying model diversity and generalization performance. Experimental results on three publicly available datasets indicate that our approach yields remarkable improvements over previous SOTA methods across various semi-supervised scenarios. Code and logs will be available at https://github.com/Leesoon1984/CMMT-Net.
    摘要 半监督医学图像分割(SSMIS)通过利用有限的标注数据与大量的无标注数据,在近年取得了长足进步。然而,现有的最先进(SOTA)方法在准确预测无标注数据的标签方面仍面临挑战,从而在训练中引入干扰噪声,并容易对错误信息过拟合;此外,对不准确的预测施加扰动还会进一步削弱一致性学习。为解决这些问题,我们提出了一种新的交叉头相互均值教师网络(CMMT-Net),并结合强弱数据增强,从而同时促进自训练与一致性学习。具体来说,CMMT-Net由师生对等网络组成,共享编码器并带有两个略有差异的解码器;由一个均值教师头生成的伪标签用于监督另一个学生分支,以实现相互一致。此外,我们提出了相互虚拟对抗训练(MVAT),以平滑决策边界并增强特征表示。为使一致性训练样本更加多样,我们采用了Cross-Set CutMix策略,这也有助于缓解分布不匹配问题。值得注意的是,CMMT-Net同时实现了数据、特征和网络层面的扰动,从而提升模型多样性与泛化性能。在三个公开数据集上的实验表明,我们的方法在多种半监督场景下均显著优于以往的SOTA方法。代码和日志将在https://github.com/Leesoon1984/CMMT-Net上提供。
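
    Two of the building blocks mentioned above, the mean-teacher EMA update and cross-head pseudo-label supervision, are standard enough to sketch. The exact loss weighting and confidence handling in CMMT-Net are not given in the abstract, so treat this as an illustration only.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    """Mean-teacher update: teacher weights follow an EMA of the student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)

def cross_head_consistency(logits_a, logits_b, teacher_logits_a, teacher_logits_b):
    """Each student head is supervised by the *other* teacher head's pseudo labels.

    Mirrors the mutual-consistency idea described in the abstract; the plain
    cross-entropy form is a simplification.
    """
    pseudo_a = teacher_logits_a.argmax(dim=1).detach()   # pseudo labels from head A
    pseudo_b = teacher_logits_b.argmax(dim=1).detach()   # pseudo labels from head B
    loss = F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)
    return 0.5 * loss
```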

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face

  • paper_url: http://arxiv.org/abs/2310.05056
  • repo_url: None
  • paper_authors: Hao Zhang, Kaipeng Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao
  • for: 提出开放词汇关键点检测(OVKD)任务,利用文本提示定位任意物种的任意关键点。
  • methods: 提出基于语义特征匹配的开放词汇关键点检测方法(KDSM),利用视觉与语言模型建立文本与视觉之间的关联,通过将文本提示与相关关键点特征相匹配来实现关键点检测。
  • results: 实验表明,所提出的各组件带来了显著的性能提升,整体方法在OVKD上取得了出色的结果,甚至以零样本方式超过了现有的少样本关键点检测方法。
    Abstract Current approaches for image-based keypoint detection on animal (including human) body and face are limited to specific keypoints and species. We address the limitation by proposing the Open-Vocabulary Keypoint Detection (OVKD) task. It aims to use text prompts to localize arbitrary keypoints of any species. To accomplish this objective, we propose Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM), which utilizes both vision and language models to harness the relationship between text and vision and thus achieve keypoint detection through associating text prompt with relevant keypoint features. Additionally, KDSM integrates domain distribution matrix matching and some special designs to reinforce the relationship between language and vision, thereby improving the model's generalizability and performance. Extensive experiments show that our proposed components bring significant performance improvements, and our overall method achieves impressive results in OVKD. Remarkably, our method outperforms the state-of-the-art few-shot keypoint detection methods using a zero-shot fashion. We will make the source code publicly accessible.
    摘要 当前对人体和面部图像中的关键点检测方法受限于特定关键点和种类。我们解决这一限制,提出开放词汇关键点检测(OVKD)任务。该任务的目标是使用文本提示来确定任意种类的关键点。为 достичь这一目标,我们提出了开放词汇关键点检测与semantic特征匹配(KDSM)方法,该方法利用视觉和语言模型来利用视觉和文本之间的关系,从而通过文本提示与相关的关键点特征相匹配来实现关键点检测。此外,KDSM还 integrates域名分布矩阵匹配和一些特殊的设计,以强化语言和视觉之间的关系,从而提高模型的普适性和性能。我们的实验表明,我们提出的组件带来了显著的性能提升,而我们的总方法在OVKD中实现了卓越的成绩,并且在零shot模式下超越了现有的状态对抗方法。我们将将源代码公开访问。

FairTune: Optimizing Parameter Efficient Fine Tuning for Fairness in Medical Image Analysis

  • paper_url: http://arxiv.org/abs/2310.05055
  • repo_url: None
  • paper_authors: Raman Dutt, Ondrej Bohdal, Sotirios A. Tsaftaris, Timothy Hospedales
  • for: 这个论文目标是提高机器学习模型的鲁棒性和公平性,特别是在具有伦理敏感性的应用领域,如医学诊断。
  • methods: 这篇论文使用了两级优化方法来解决公平学习问题,即在验证集上优化学习策略以确保公平性,并在更新参数时考虑公平性。
  • results: 论文的实验结果表明,使用 FairTune 框架可以提高医学图像 datasets 上的公平性。
    Abstract Training models with robust group fairness properties is crucial in ethically sensitive application areas such as medical diagnosis. Despite the growing body of work aiming to minimise demographic bias in AI, this problem remains challenging. A key reason for this challenge is the fairness generalisation gap: High-capacity deep learning models can fit all training data nearly perfectly, and thus also exhibit perfect fairness during training. In this case, bias emerges only during testing when generalisation performance differs across subgroups. This motivates us to take a bi-level optimisation perspective on fair learning: Optimising the learning strategy based on validation fairness. Specifically, we consider the highly effective workflow of adapting pre-trained models to downstream medical imaging tasks using parameter-efficient fine-tuning (PEFT) techniques. There is a trade-off between updating more parameters, enabling a better fit to the task of interest vs. fewer parameters, potentially reducing the generalisation gap. To manage this tradeoff, we propose FairTune, a framework to optimise the choice of PEFT parameters with respect to fairness. We demonstrate empirically that FairTune leads to improved fairness on a range of medical imaging datasets.
    摘要 训练具有稳健群体公平性的模型,在医疗诊断等伦理敏感的应用领域至关重要。尽管已有越来越多的工作致力于减少人工智能中的人口统计偏差,这一问题仍具挑战性。其中一个关键原因是公平性泛化差距:高容量深度学习模型几乎可以完美拟合全部训练数据,因此在训练阶段也会表现出完美的公平性;偏差只有在测试阶段、各子群体的泛化性能出现差异时才会显现。这促使我们从双层优化的角度看待公平学习:基于验证集公平性来优化学习策略。具体而言,我们考虑利用参数高效微调(PEFT)技术将预训练模型适配到下游医学影像任务这一高效流程。更新更多参数可以更好地拟合目标任务,而更新更少参数则可能缩小泛化差距,两者之间存在权衡。为了权衡这一取舍,我们提出了FairTune框架,以公平性为目标优化PEFT参数的选择。实验表明,FairTune在一系列医学影像数据集上提升了公平性。
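
    The bi-level idea, choosing which PEFT parameters to update based on validation fairness, can be illustrated with a deliberately simplified outer loop; the real FairTune presumably uses a more efficient search than the exhaustive scan below, and all callables (`finetune_fn`, `group_fn`, `metric_fn`, the candidate masks) are placeholders.

```python
def fairness_gap(model, val_loader, group_fn, metric_fn):
    """Worst-minus-best subgroup performance on the validation set."""
    scores = {}
    for batch in val_loader:
        for group, score in zip(group_fn(batch), metric_fn(model, batch)):
            scores.setdefault(group, []).append(score)
    per_group = {g: sum(v) / len(v) for g, v in scores.items()}
    return max(per_group.values()) - min(per_group.values())

def fairtune_search(candidate_masks, finetune_fn, model_init_fn,
                    val_loader, group_fn, metric_fn):
    """Outer loop of a FairTune-style bi-level search (simplified sketch).

    Each mask says which PEFT parameters are updated in the inner loop;
    the configuration with the smallest validation fairness gap is kept.
    """
    best_mask, best_gap = None, float("inf")
    for mask in candidate_masks:
        model = finetune_fn(model_init_fn(), mask)      # inner optimisation
        gap = fairness_gap(model, val_loader, group_fn, metric_fn)
        if gap < best_gap:
            best_mask, best_gap = mask, gap
    return best_mask, best_gap
```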

Low-Resolution Self-Attention for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.05026
  • repo_url: https://github.com/yuhuan-wu/LRFormer
  • paper_authors: Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen
  • for: LRFormer is designed for semantic segmentation tasks, specifically to improve the efficiency of vision transformers while maintaining performance.
  • methods: LRFormer uses a Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a reduced computational cost, along with 3x3 depth-wise convolutions to capture fine details in the high-resolution space.
  • results: LRFormer outperforms state-of-the-art models on the ADE20K, COCO-Stuff, and Cityscapes datasets.
    Abstract Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on the ADE20K, COCO-Stuff, and Cityscapes datasets demonstrate that LRFormer outperforms state-of-the-art models. The code will be made available at https://github.com/yuhuan-wu/LRFormer.
    摘要 语义分割任务天然需要高分辨率信息进行像素级分割,也需要全局上下文信息进行类别预测。现有的视觉Transformer虽然表现不俗,但往往依赖高分辨率的上下文建模,造成计算瓶颈。在这项工作中,我们挑战传统观点,引入低分辨率自注意力(LRSA)机制,在显著降低计算成本的同时捕捉全局上下文。我们的方法是在固定的低分辨率空间内计算自注意力(与输入图像分辨率无关),并在高分辨率空间内使用3x3深度卷积捕捉细节。我们据此构建了LRFormer,一个具有编码器-解码器结构的视觉Transformer。在ADE20K、COCO-Stuff和Cityscapes数据集上的大量实验表明,LRFormer超越了当前最佳模型。代码将在https://github.com/yuhuan-wu/LRFormer上提供。
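
    A hedged PyTorch sketch of the LRSA idea, attention on a fixed low-resolution grid plus a 3x3 depth-wise convolution for high-resolution detail, is given below; the pooled size, head count, and the residual combination are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    """Sketch of Low-Resolution Self-Attention (LRSA) as read from the abstract."""

    def __init__(self, dim, pool_size=16, num_heads=8):
        super().__init__()
        self.pool_size = pool_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depth-wise 3x3

    def forward(self, x):                                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        low = F.adaptive_avg_pool2d(x, self.pool_size)               # fixed low-res grid
        tokens = low.flatten(2).transpose(1, 2)                      # (B, P*P, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)              # global context
        attn_out = attn_out.transpose(1, 2).reshape(b, c, self.pool_size, self.pool_size)
        global_ctx = F.interpolate(attn_out, size=(h, w), mode="bilinear",
                                   align_corners=False)              # back to high-res
        return x + global_ctx + self.dwconv(x)                       # local-detail branch
```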

Single Stage Warped Cloth Learning and Semantic-Contextual Attention Feature Fusion for Virtual TryOn

  • paper_url: http://arxiv.org/abs/2310.05024
  • repo_url: None
  • paper_authors: Sanhita Pathak, Vinay Kaushik, Brejesh Lall
  • for: 提供一种基于图像的虚拟尝试服装系统,使得用户可以在图像上尝试不同的服装。
  • methods: 提出了一种新的单阶段框架,从目标姿态关键点隐式学习服装变形与人体合成,并利用语义-上下文融合注意力模块和轻量级线性注意力框架来解决错位与伪影问题。
  • results: 与现有方法相比具有更高的效率和质量,提供更可靠、更逼真的虚拟试穿体验。
    Abstract Image-based virtual try-on aims to fit an in-shop garment onto a clothed person image. Garment warping, which aligns the target garment with the corresponding body parts in the person image, is a crucial step in achieving this goal. Existing methods often use multi-stage frameworks to handle clothes warping, person body synthesis and tryon generation separately or rely on noisy intermediate parser-based labels. We propose a novel single-stage framework that implicitly learns the same without explicit multi-stage learning. Our approach utilizes a novel semantic-contextual fusion attention module for garment-person feature fusion, enabling efficient and realistic cloth warping and body synthesis from target pose keypoints. By introducing a lightweight linear attention framework that attends to garment regions and fuses multiple sampled flow fields, we also address misalignment and artifacts present in previous methods. To achieve simultaneous learning of warped garment and try-on results, we introduce a Warped Cloth Learning Module. WCLM uses segmented warped garments as ground truth, operating within a single-stage paradigm. Our proposed approach significantly improves the quality and efficiency of virtual try-on methods, providing users with a more reliable and realistic virtual try-on experience. We evaluate our method on the VITON dataset and demonstrate its state-of-the-art performance in terms of both qualitative and quantitative metrics.
    摘要 基于图像的虚拟试穿旨在将店内服装穿着到已着装的人物图像上。服装变形(即将目标服装与人物图像中对应的身体部位对齐)是实现这一目标的关键步骤。现有方法往往使用多阶段框架分别处理服装变形、人体合成与试穿生成,或依赖含噪的中间解析标签。我们提出了一种新的单阶段框架,无需显式的多阶段学习即可隐式完成上述任务。我们的方法利用一种新的语义-上下文融合注意力模块进行服装与人体特征融合,从而根据目标姿态关键点实现高效且逼真的服装变形与人体合成。我们还引入一种轻量级线性注意力框架,关注服装区域并融合多个采样的流场,以解决先前方法中的错位与伪影问题。为了同时学习变形服装与试穿结果,我们引入了变形服装学习模块(WCLM),它以分割后的变形服装作为真值,在单阶段范式中进行学习。我们的方法显著提升了虚拟试穿的质量与效率,为用户提供更可靠、更逼真的虚拟试穿体验。我们在VITON数据集上进行评估,在定性与定量指标上均达到最先进水平。

Detecting Abnormal Health Conditions in Smart Home Using a Drone

  • paper_url: http://arxiv.org/abs/2310.05012
  • repo_url: None
  • paper_authors: Pronob Kumar Barman
  • for: 本研究旨在开发一种智能化的跌倒检测系统,以帮助年轻和老年人独立生活。
  • methods: 该系统使用视觉基于的跌倒监测,通过图像或视频分割和物体检测方法,以实现跌倒的识别。
  • results: 研究结果表明,该系统可以准确地识别跌倒物体,准确率为0.9948。
    Abstract Nowadays, detecting aberrant health issues is a difficult process. Falling, especially among the elderly, is a severe concern worldwide. Falls can result in deadly consequences, including unconsciousness, internal bleeding, and often times, death. A practical and optimal, smart approach of detecting falling is currently a concern. The use of vision-based fall monitoring is becoming more common among scientists as it enables senior citizens and those with other health conditions to live independently. For tracking, surveillance, and rescue, unmanned aerial vehicles use video or image segmentation and object detection methods. The Tello drone is equipped with a camera and with this device we determined normal and abnormal behaviors among our participants. The autonomous falling objects are classified using a convolutional neural network (CNN) classifier. The results demonstrate that the systems can identify falling objects with a precision of 0.9948.
    摘要

Data Augmentation through Pseudolabels in Automatic Region Based Coronary Artery Segmentation for Disease Diagnosis

  • paper_url: http://arxiv.org/abs/2310.05990
  • repo_url: None
  • paper_authors: Sandesh Pokhrel, Sanjay Bhandari, Eduard Vazquez, Yash Raj Shrestha, Binod Bhattarai
  • for: 冠状动脉疾病(CAD)的诊断往往困难且耗费资源,准确性和效率亟待提升。在此背景下,血管造影图像中的动脉分割成为辅助临床医生做出准确诊断的工具。
  • methods: 本研究使用伪标签作为数据增强技术,以提升基线 Yolo 模型的性能。
  • results: 该方法使基线 Yolo 模型的 F1 分数在验证集上提高了 9%,在测试集上提高了 3%。
    Abstract Coronary Artery Diseases(CADs) though preventable are one of the leading causes of death and disability. Diagnosis of these diseases is often difficult and resource intensive. Segmentation of arteries in angiographic images has evolved as a tool for assistance, helping clinicians in making accurate diagnosis. However, due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging. In this study, we introduce the idea of using pseudolabels as a data augmentation technique to improve the performance of the baseline Yolo model. This method increases the F1 score of the baseline by 9% in the validation dataset and by 3% in the test dataset.
    摘要 冠状动脉疾病(CAD)虽可预防,却是导致死亡和残疾的主要原因之一,其诊断往往困难且耗费资源。血管造影图像中的动脉分割已逐渐成为辅助临床医生做出准确诊断的工具。然而,由于数据有限且数据集构建困难,分割任务颇具挑战。在本研究中,我们提出使用伪标签作为数据增强技术,以提升基线 Yolo 模型的性能。该方法使基线模型的 F1 分数在验证集上提高 9%,在测试集上提高 3%。
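
    The pseudolabel augmentation loop can be sketched in a few lines; the detector interface below (returning boxes, scores, and classes) is assumed for illustration and does not match any particular YOLO release's API.

```python
import torch

@torch.no_grad()
def make_pseudolabels(model, unlabeled_images, conf_thresh=0.5):
    """Turn confident detections on unlabeled angiograms into pseudo-labels.

    `model(image)` is assumed to return (boxes, scores, classes) per image;
    adapt to your detector.  Confident predictions become extra training
    targets that are merged with the labeled set for another training round.
    """
    model.eval()
    pseudo_dataset = []
    for image in unlabeled_images:
        boxes, scores, classes = model(image)
        keep = scores >= conf_thresh                      # filter noisy predictions
        if keep.any():
            pseudo_dataset.append({
                "image": image,
                "boxes": boxes[keep],
                "labels": classes[keep],
            })
    return pseudo_dataset
```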

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

  • paper_url: http://arxiv.org/abs/2310.05010
  • repo_url: https://github.com/wengzejia1/open-vclip
  • paper_authors: Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang
  • for: The paper aims to adapt Contrastive Language-Image Pretraining (CLIP) for zero-shot video recognition, with the goal of identifying novel actions and events in videos.
  • methods: The proposed method, called Open-VCLIP++, modifies CLIP to capture spatial-temporal relationships in videos, and leverages a technique called Interpolated Weight Optimization to improve generalization. It also utilizes large language models to produce fine-grained video descriptions, which are aligned with video features to facilitate a better transfer of CLIP to the video domain.
  • results: The method achieves zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 respectively, outperforming the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. It also delivers competitive video-to-text and text-to-video retrieval performance on MSR-VTT with substantially less fine-tuning data compared to other methods.
    Abstract Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance, while utilizing substantially less fine-tuning data compared to other methods. Code is released at https://github.com/wengzejia1/Open-VCLIP.
    摘要 尽管对比语言-图像预训练(CLIP)在零样本图像识别中取得了显著成果,其在零样本视频识别方面的潜力尚未得到充分探索。本文提出了Open-VCLIP++,一个简单而有效的框架,可将CLIP改造为强大的零样本视频分类器,能够在测试时识别新的动作和事件。Open-VCLIP++仅对CLIP做最小限度的修改,使其能够捕捉视频中的时空关系,从而在构建专门化视频分类器的同时尽量保持泛化能力。我们正式证明,训练Open-VCLIP++等价于在没有任何历史数据的情况下进行持续学习。为解决这一问题,我们提出了插值权重优化(Interpolated Weight Optimization)技术,在训练与测试阶段均利用权重插值的优势。此外,我们借助大语言模型生成细粒度的视频描述,并将这些描述与视频特征对齐,以便更好地将CLIP迁移到视频领域。我们在三个常用的动作识别数据集上、按照多种零样本评估协议对方法进行了评估。结果表明,我们的方法在UCF、HMDB和Kinetics-600数据集上的零样本准确率分别达到88.1%、58.7%和81.2%,比表现最好的替代方法分别高出8.5%、8.2%和12.3%。我们还在MSR-VTT视频-文本检索数据集上评估了我们的方法,其在视频到文本与文本到视频检索中均具有竞争力,且所需的微调数据量远少于其他方法。代码发布于 https://github.com/wengzejia1/Open-VCLIP 。
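
    Interpolated Weight Optimization builds on plain weight-space interpolation between the original CLIP checkpoint and its video-finetuned counterpart. The static merge below shows only that base operation; the training-time schedule for alpha used by the full method is left out.

```python
import torch

@torch.no_grad()
def interpolate_weights(pretrained_state, finetuned_state, alpha=0.5):
    """Per-tensor interpolation between pretrained and fine-tuned weights.

    theta = (1 - alpha) * theta_pretrained + alpha * theta_finetuned.
    """
    merged = {}
    for name, w_pre in pretrained_state.items():
        w_ft = finetuned_state[name]
        merged[name] = (1.0 - alpha) * w_pre + alpha * w_ft
    return merged

# usage sketch (state dicts assumed to share keys):
# video_model.load_state_dict(interpolate_weights(clip_sd, finetuned_sd, alpha=0.5))
```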

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition

  • paper_url: http://arxiv.org/abs/2310.04999
  • repo_url: https://github.com/wzx99/clipocr
  • paper_authors: Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang
  • for: 这项研究旨在探讨CLIP模型在场景文本识别(STR)领域的潜力,并提出了一种新的对称语言特征蒸馏框架(CLIP-OCR),用于同时利用CLIP模型中的视觉和语言知识。
  • methods: 该研究提出了一种对称蒸馏策略(SDS),可以更好地捕捉CLIP模型中的语言知识。具体来说,通过将CLIP图像编码器与反转的CLIP文本编码器级联,构建了一个对称结构,形成一条同时涵盖视觉与语言信息的图像到文本特征流。
  • results: 实验结果表明,CLIP-OCR可以在六个流行的STR benchmark上达到93.8%的平均准确率。
    Abstract In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP. Different from previous CLIP-based methods mainly considering feature generalization on visual encoding, we propose a symmetrical distillation strategy (SDS) that further captures the linguistic knowledge in the CLIP text encoder. By cascading the CLIP image encoder with the reversed CLIP text encoder, a symmetrical structure is built with an image-to-text feature flow that covers not only visual but also linguistic information for distillation.Benefiting from the natural alignment in CLIP, such guidance flow provides a progressive optimization objective from vision to language, which can supervise the STR feature forwarding process layer-by-layer.Besides, a new Linguistic Consistency Loss (LCL) is proposed to enhance the linguistic capability by considering second-order statistics during the optimization. Overall, CLIP-OCR is the first to design a smooth transition between image and text for the STR task.Extensive experiments demonstrate the effectiveness of CLIP-OCR with 93.8% average accuracy on six popular STR benchmarks.Code will be available at https://github.com/wzx99/CLIPOCR.
    摘要 在这篇论文中,我们探索了CLIP模型在场景文本识别(STR)领域的潜力,并提出了一种新的对称语言特征蒸馏框架(CLIP-OCR),以同时利用CLIP中的视觉与语言知识。不同于以往主要关注视觉编码特征泛化的CLIP类方法,我们提出了对称蒸馏策略(SDS),进一步挖掘CLIP文本编码器中的语言知识。通过将CLIP图像编码器与反转的CLIP文本编码器级联,我们构建了一个对称结构,形成一条从图像到文本、同时涵盖视觉与语言信息的特征流用于蒸馏。得益于CLIP天然的对齐特性,这一引导流提供了一个从视觉到语言的渐进优化目标,可以逐层监督STR特征的前向传播过程。此外,我们提出了一种新的语言一致性损失(LCL),在优化中考虑二阶统计量,以增强模型的语言能力。总体而言,CLIP-OCR首次为STR任务设计了图像到文本的平滑过渡。大量实验表明,CLIP-OCR在六个流行的STR基准上取得了93.8%的平均准确率。代码将在https://github.com/wzx99/CLIPOCR上公开。
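
    The abstract says the Linguistic Consistency Loss uses second-order statistics but does not give its exact form; one common way to go beyond first-order matching is to align batch feature covariances, as in the following sketch.

```python
import torch

def linguistic_consistency_loss(student_feats, teacher_feats):
    """Second-order consistency between student and (frozen) teacher features.

    Matching batch covariances is an illustrative choice, not necessarily
    the exact LCL used by CLIP-OCR.  Both inputs are (N, D) tensors.
    """
    def covariance(f):
        f = f - f.mean(dim=0, keepdim=True)
        return (f.t() @ f) / (f.shape[0] - 1)

    first_order = torch.mean((student_feats - teacher_feats) ** 2)
    second_order = torch.mean((covariance(student_feats) -
                               covariance(teacher_feats.detach())) ** 2)
    return first_order + second_order
```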

SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment

  • paper_url: http://arxiv.org/abs/2310.04995
  • repo_url: None
  • paper_authors: Ganning Zhao, Wenhui Cui, Suya You, C. -C. Jay Kuo
  • for: 本研究旨在提出一种能够保持语义一致性的无监督图像到图像(I2I)翻译方法,以应对I2I翻译中的内容偏差(语义失真)问题。
  • methods: 该方法通过对比学习并最大化输入与输出之间的互信息来减少语义失真;此外还引入多尺度策略以进一步提升翻译性能。
  • results: 实验表明,该方法能够有效减少语义失真并达到最先进的性能;初步实验还表明,SemST可作为语义分割任务在域适应(DA)中的有益预训练。
    Abstract Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic consistency in translation is proposed and named SemST in this work. SemST reduces semantic distortion by employing contrastive learning and aligning the structural and textural properties of input and output by maximizing their mutual information. Furthermore, a multi-scale approach is introduced to enhance translation performance, thereby enabling the applicability of SemST to domain adaptation in high-resolution images. Experiments show that SemST effectively mitigates semantic distortion and achieves state-of-the-art performance. Also, the application of SemST to domain adaptation (DA) is explored. It is demonstrated by preliminary experiments that SemST can be utilized as a beneficial pre-training for the semantic segmentation task.
    摘要 无监督图像到图像(I2I)翻译学习跨域图像映射,将源域输入转换为目标域输出,同时保持其语义。一个挑战是源域与目标域的语义统计不同,会导致内容偏差,即语义失真。为解决这一问题,本文提出了一种新的I2I方法,命名为SemST,用于在翻译中保持语义一致性。SemST通过对比学习,对齐输入与输出的结构和纹理特性并最大化二者之间的互信息,从而减少语义失真。此外,本文还引入多尺度方法以提升翻译性能,使SemST可应用于高分辨率图像的域适应。实验表明,SemST能够有效缓解语义失真,并达到最先进性能。本文还探讨了SemST在域适应(DA)中的应用,初步实验表明SemST可作为语义分割任务的有益预训练。

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.04992
  • repo_url: None
  • paper_authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, Hui Miao, Li Guo, Shujun Zhang, Cheng Pei, Xiaojuan Fan, Jianqin Lei, Ting Wei, Junguo Duan, Chun Liu, Xiaobo Xia, Siqi Xiong, Junhong Li, Benny Lo, Yih Chung Tham, Tien Yin Wong, Ningli Wang, Wu Yuan
  • for: 这个论文是为了开发一个基于340万张眼科图像的基础模型,用于推动多种眼科人工智能应用程序。
  • methods: 该模型使用了340万张眼科图像,涵盖了各种眼科疾病、图像设备和人口类型,进行预训练。
  • results: 模型在12种常见眼科疾病的诊断中与专业医生的合作诊断性能相当或更高,并在新的大规模眼科疾病诊断数据集和检测数据集上表现出优异性能。
    Abstract We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
    摘要 我们介绍VisionFM,一个基础模型,通过340万张眼科图像和560457名个人数据进行预训练,覆盖了广泛的眼科疾病、模式、成像设备和人口学。预训练后,VisionFM提供了一个基础,以推动多种眼科人工智能应用程序,如疾病检测和诊断、疾病诊断、疾病类型分 subclassification和系统生物标志和疾病预测,每个应用程序都受到专家水平的智能和准确性的提高。VisionFM的通用智能超过了基本和中级水平的眼科医生,在共同诊断12种常见眼科疾病方面表现出色。在一个新的大规模眼科疾病诊断benchmark数据集和一个新的大规模分割和检测benchmark数据集上进行评估,VisionFM表现出色,并超越了强基线深度神经网络。眼科图像学习的VisionFM表现出了值得注意的解释性,并在新的眼科模式、疾病谱和成像设备上表现出了强大的普适性。作为基础模型,VisionFM具有大量学习眼科成像数据和多种数据集的能力。为了与这种能力相符,我们不仅使用了实际数据进行预训练,还生成并利用了合理的 synthetic眼科成像数据。实验结果表明,通过visual Turing测试,合理的synthetic数据也可以提高VisionFM的表征学习能力,导致下游眼科人工智能任务的性能提高。除了在本工作中开发、验证和示例的眼科人工智能应用程序外,VisionFM可以在高效和cost-effective的方式实现更多的应用。

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

  • paper_url: http://arxiv.org/abs/2310.04991
  • repo_url: None
  • paper_authors: Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
  • for: 这篇论文旨在提出一种视频语言基础模型,以便进行视频描述生成任务。
  • methods: 该模型使用多模态融合和精细的模态对齐来显著提高视频描述生成的效果。它利用冻结预训练的视觉和语言模块,并在描述生成过程中使用大型自然语言模型来生成 concise 和 elaborate 的视频描述。
  • results: 实验结果表明,该模型可以准确地理解视频内容,并生成 coherent 和精细的语言描述。 fine-grained 模态对齐目标可以提高模型的能力(4% 提高 CIDEr 分数在 MSR-VTT),仅需训练参数增加 13%,并在推理过程中不增加额外成本。
    Abstract This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task. Video-Teller boosts the training efficiency by utilizing frozen pretrained vision and language modules. It capitalizes on the robust linguistic capabilities of large language models, enabling the generation of both concise and elaborate video descriptions. To effectively integrate visual and auditory information, Video-Teller builds upon the image-based BLIP-2 model and introduces a cascaded Q-Former which fuses information across frames and ASR texts. To better guide video summarization, we introduce a fine-grained modality alignment objective, where the cascaded Q-Former's output embedding is trained to align with the caption/summary embedding created by a pretrained text auto-encoder. Experimental results demonstrate the efficacy of our proposed video-language foundation model in accurately comprehending videos and generating coherent and precise language descriptions. It is worth noting that the fine-grained alignment enhances the model's capabilities (4% improvement of CIDEr score on MSR-VTT) with only 13% extra parameters in training and zero additional cost in inference.
    摘要 本文提出了 Video-Teller,一种视频-语言基础模型,通过多模态融合和细粒度模态对齐显著提升视频到文本生成任务的能力。Video-Teller 通过使用冻结的预训练视觉与语言模块来提高训练效率,并利用大语言模型强大的语言能力,生成简洁与详尽兼备的视频描述。为了有效整合视觉和听觉信息,Video-Teller 基于图像版 BLIP-2 模型,引入级联 Q-Former,在帧与 ASR 文本之间融合信息。为了更好地引导视频摘要,我们引入了细粒度模态对齐目标,使级联 Q-Former 的输出嵌入与预训练文本自编码器生成的描述/摘要嵌入对齐。实验结果表明,我们提出的视频-语言基础模型能够准确理解视频,并生成连贯、精确的语言描述。值得注意的是,细粒度对齐仅增加13%的训练参数、推理时无额外开销,即可提升模型能力(在 MSR-VTT 上 CIDEr 分数提高4%)。

Compositional Semantics for Open Vocabulary Spatio-semantic Representations

  • paper_url: http://arxiv.org/abs/2310.04981
  • repo_url: None
  • paper_authors: Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda
  • for: 本研究旨在实现无需精确人工指令的通用移动机器人,借助大语言模型(LLM)与视觉语言模型(VLM)实现常识世界知识和基于推理的规划。
  • methods: 本研究提出了潜在组合语义嵌入 z*,作为可查询的空间语义记忆的知识表示。文中从数学上证明 z* 总是存在,且最优的 z* 是任意集合 Z 的质心,并证明 z* 可以通过梯度下降从视觉外观和单个描述中迭代优化得到。
  • results: 在包括CLIP和SBERT在内的嵌入空间上的实验显示,z* 最多可表示由SBERT编码的10个语义,对理想均匀分布的高维嵌入则可达100个语义;一个在COCO-Stuff数据集上训练的简单稠密VLM可以以42.23 mIoU学习181个重叠语义的 z*,同时将常规的非重叠开放词汇分割性能提升3.48 mIoU。
    Abstract General-purpose mobile robots need to complete tasks without exact human instructions. Large language models (LLMs) is a promising direction for realizing commonsense world knowledge and reasoning-based planning. Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z* as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z* can always be found, and the optimal z* is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z* is discoverable by iterative optimization by gradient descent from visual appearance and singular descriptions. We experimentally verify our findings on four embedding spaces incl. CLIP and SBERT. Our results show that z* can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We demonstrate that a simple dense VLM trained on the COCO-Stuff dataset can learn z* for 181 overlapping semantics by 42.23 mIoU, while improving conventional non-overlapping open-vocabulary segmentation performance by +3.48 mIoU compared with a popular SOTA model.
    摘要 通用移动机器人需要在没有精确人类指令的情况下完成任务。大语言模型(LLM)是实现常识世界知识和基于推理的规划的有前景方向,而视觉语言模型(VLM)则可将环境感知转化为LLM可解释的视觉语言语义。然而,完成复杂任务通常需要对当前感知之外的信息进行推理。我们提出潜在组合语义嵌入 z*,作为可查询时空语义记忆的一种有原则的、基于学习的知识表示。我们在数学上证明 z* 总是存在,且最优的 z* 是任意集合 Z 的质心;我们还推导了一个用于估计相关与不相关语义可分性的概率界。我们进一步证明,z* 可以通过梯度下降从视觉外观和单个描述中迭代优化得到。我们在包括 CLIP 和 SBERT 在内的四个嵌入空间上验证了上述结论:z* 最多可表示由 SBERT 编码的 10 个语义,对理想均匀分布的高维嵌入则可达 100 个语义。我们还展示了一个在 COCO-Stuff 数据集上训练的简单稠密 VLM 可以以 42.23 mIoU 学习 181 个重叠语义的 z*,并将常规的非重叠开放词汇分割性能较流行的SOTA模型提升 3.48 mIoU。
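
    The centroid result is easy to make concrete: for a set of embeddings Z, the closed-form optimum coincides with what gradient descent on a squared-distance objective recovers. The objective below is our illustrative choice, not necessarily the paper's exact training loss.

```python
import torch

def find_z_star(Z, steps=200, lr=0.1):
    """Recover a latent compositional embedding z* for a semantic set Z.

    Z: (K, D) stacked embeddings (e.g. from CLIP or SBERT).  The closed-form
    centroid is computed first; the gradient-descent loop (minimising mean
    squared distance to the members of Z) converges to the same point.
    """
    centroid = Z.mean(dim=0)                          # closed-form optimum

    z = torch.zeros(Z.shape[1], requires_grad=True)   # iterative recovery
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((z - Z) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()

    assert torch.allclose(z.detach(), centroid, atol=1e-3)
    return z.detach()

# usage sketch:
# Z = torch.nn.functional.normalize(torch.randn(10, 512), dim=-1)
# z_star = find_z_star(Z)
```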

Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling

  • paper_url: http://arxiv.org/abs/2310.04964
  • repo_url: None
  • paper_authors: Wanjie Sun, Zhenzhong Chen
  • for: 这篇论文旨在提出一种不需要对比的单图超解析(SISR)方法,用于处理真实世界中的图像,因为现有的大多数不监督的实世界SISR方法采用了两个阶段训练策略,首先将高分辨率图像转换成低分辨率图像,然后在监督下训练超解析模型。
  • methods: 该方法提出了一种名为SDFlow的图像下采样和超解析模型,该模型同时学习了 bidirectional 多对多 mapping между实世界低分辨率图像和高分辨率图像,无需对比。SDFlow 通过分离图像内容和降解信息在幂空间中,使得低分辨率图像和高分辨率图像的内容信息分布在共同的幂空间中匹配。
  • results: 实验结果表明,SDFlow 可以生成多个真实和可见的低分辨率图像和高分辨率图像,并且能够Quantitatively and qualitatively improve the performance of real-world image super-resolution.
    Abstract Learning based single image super-resolution (SISR) for real-world images has been an active research topic yet a challenging task, due to the lack of paired low-resolution (LR) and high-resolution (HR) training images. Most of the existing unsupervised real-world SISR methods adopt a two-stage training strategy by synthesizing realistic LR images from their HR counterparts first, then training the super-resolution (SR) models in a supervised manner. However, the training of image degradation and SR models in this strategy are separate, ignoring the inherent mutual dependency between downscaling and its inverse upscaling process. Additionally, the ill-posed nature of image degradation is not fully considered. In this paper, we propose an image downscaling and SR model dubbed as SDFlow, which simultaneously learns a bidirectional many-to-many mapping between real-world LR and HR images unsupervisedly. The main idea of SDFlow is to decouple image content and degradation information in the latent space, where content information distribution of LR and HR images is matched in a common latent space. Degradation information of the LR images and the high-frequency information of the HR images are fitted to an easy-to-sample conditional distribution. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.
    摘要 面向真实世界图像的基于学习的单图像超分辨率（SISR）一直是一个活跃而具有挑战性的研究课题，因为缺乏成对的低分辨率（LR）和高分辨率（HR）训练图像。大多数现有的无监督真实世界 SISR 方法采用两阶段训练策略：先从 HR 图像合成逼真的 LR 图像，再以监督方式训练超分辨率（SR）模型。然而，这种策略中图像退化模型与 SR 模型的训练是分离的，忽略了下采样与其逆向上采样过程之间固有的相互依赖关系，同时也没有充分考虑图像退化的病态性。在本文中，我们提出了一种名为 SDFlow 的图像下采样与 SR 模型，能够以无监督方式同时学习真实世界 LR 与 HR 图像之间的双向多对多映射。其主要思想是在隐空间中解耦图像内容与退化信息，使 LR 与 HR 图像的内容信息分布在共同的隐空间中得到匹配；LR 图像的退化信息与 HR 图像的高频信息则被拟合到一个易于采样的条件分布。在真实世界图像 SR 数据集上的实验结果表明，SDFlow 能够在定量和定性指标上生成多样且逼真的 LR 与 SR 图像。

cs.AI - 2023-10-08

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods

  • paper_url: http://arxiv.org/abs/2310.05309
  • repo_url: None
  • paper_authors: Constantine Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos
  • for: 这 paper 的目的是提供一种新的理论框架,用于分析 Deep Neural Networks 和 Reinforcement Learning 方法在解决复杂的 combinatorial 问题时的效果。
  • methods: 这 paper 使用的方法包括使用 Deep Neural Network 作为解决方案生成器,并通过 gradient-based 方法(例如策略梯度)进行训练,以获得更好的解决方案分布。
  • results: 本 paper 的主要贡献是提供了一个答案，证明 Deep Neural Networks 和 Reinforcement Learning 方法可以有效地解决 combinatorial 问题，包括 Max- 和 Min-Cut、Max-$k$-CSP、最大权重二分图匹配和旅行商问题。此外，这 paper 还介绍了一种新的正则化过程，用于改进 vanilla gradient descent，并提供了理论和实验证明，这种方法可以缓解梯度消失问题并逃离坏的驻点。
    Abstract Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e, polynomial in the size of the input, number of parameters; (iii) their optimization landscape is benign in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
    摘要 深度神经网络与强化学习方法在求解具有挑战性的组合优化问题上已展现出巨大潜力。在这些方法中，深度神经网络被用作解生成器，并通过基于梯度的方法（例如策略梯度）进行训练，从而不断得到更好的解分布。在本工作中，我们提出了一个新的理论框架来分析此类方法的有效性。我们追问是否存在这样的生成模型：(i) 具有足够的表达能力，可以生成近似最优解；(ii) 参数数量关于输入规模是多项式的；(iii) 其优化地形是良性的，即不包含次优驻点。我们的主要贡献是对该问题给出了肯定的回答。该结果适用于一大类组合问题，包括最大割与最小割、最大 $k$-CSP、最大权重二分图匹配以及旅行商问题。作为分析的副产品，我们还提出了一种针对原始梯度下降的新正则化过程，并从理论和实验两方面证明它有助于缓解梯度消失问题并逃离坏的驻点。
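To make the solution-sampler idea concrete, here is a generic REINFORCE loop for Max-Cut with an independent-Bernoulli policy per node. It is a toy sketch under assumed settings (random weighted graph, simple moving-average baseline), not the parameterization or the regularized landscape analyzed in the paper.

```python
# Generic policy-gradient (REINFORCE) sketch for Max-Cut; illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 20
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T   # random weighted graph

theta = np.zeros(n)                                       # logits of P(node on side 1)

def cut_value(x):                                         # x in {0,1}^n
    diff = x[:, None] != x[None, :]
    return (W * diff).sum() / 2.0

baseline = 0.0
for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random(n) < p).astype(float)                 # sample an assignment
    r = cut_value(x)
    baseline = 0.95 * baseline + 0.05 * r                 # moving-average baseline
    grad_logp = x - p                                     # d log pi(x) / d theta for Bernoulli
    theta += 0.05 * (r - baseline) * grad_logp            # REINFORCE update

print("best-found cut value:", cut_value((theta > 0).astype(float)))
```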

Tailoring Self-Attention for Graph via Rooted Subtrees

  • paper_url: http://arxiv.org/abs/2310.05296
  • repo_url: https://github.com/lumia-group/subtree-attention
  • paper_authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
  • for: 本文旨在提出一种新的多跳图注意机制,以解决现有图注意机制中的局部注意力和全局注意力的缺陷。
  • methods: 本文提出了一种名为Subtree Attention(STA)的新型多跳图注意机制,具有跨度更强的能力捕捉长距离信息和细腻的地方信息。STA还提供了一种有理证据的修正方法,以保证STA在极端情况下可以近似于全局注意力。
  • results: 对于十个节点分类 dataset,STA-based模型表现出色,超越现有的图Transformers和主流 GNNs。
    Abstract Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with theoretical proof that STA approximates the global attention under extreme settings. By allowing direct computation of attention weights among multi-hop neighbors, STA mitigates the inherent problems in existing graph attention mechanisms. Further we devise an efficient form for STA by employing kernelized softmax, which yields a linear time complexity. Our resulting GNN architecture, the STAGNN, presents a simple yet performant STA-based graph neural network leveraging a hop-aware attention strategy. Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
    摘要 注意机制在图学习中已经取得了重要进展,但它们仍然存在显著的限制:当地注意力不能够捕捉远程信息,因为消息传递方案的内在问题,而全局注意力则不能够反映层次结构和细化的地方信息。在这篇论文中,我们提出了一种新的多趟图注意机制,名为子树注意(STA),以解决以上问题。STA可以准确地计算多趟邻居之间的注意力权重,并且有理论证明,STA可以在极端情况下近似于全局注意。我们还提出了一种高效的STA实现方式,通过使用核函数软max,实现了线性时间复杂度。我们的结果是一种简单又高性能的STA-基于GNN,称为STAGNN,它利用跳跃注意策略来实现多趟图注意。我们对十个节点分类 datasets进行了广泛的评估,发现STA-based模型比现有的图transformer和主流GNN都有更好的性能。代码可以在https://github.com/LUMIA-Group/SubTree-Attention中下载。
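The linear-time claim rests on kernelized softmax attention. The sketch below shows the standard trick with an elu(x)+1 feature map: computing phi(Q)(phi(K)^T V) avoids materializing the N x N attention matrix. It illustrates the complexity argument only and is not the STA/STAGNN implementation.

```python
# Kernelized (linear-time) attention sketch: softmax(QK^T)V is approximated by
# phi(Q) (phi(K)^T V), so the cost is O(N d^2) instead of O(N^2 d).
import torch

def elu_feature_map(x):
    return torch.nn.functional.elu(x) + 1.0               # positive feature map

def linear_attention(Q, K, V, eps=1e-6):
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)        # (N, d)
    KV = Kf.transpose(-2, -1) @ V                          # (d, d_v): summary of all keys/values
    Z = Qf @ Kf.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1) normalizer
    return (Qf @ KV) / (Z + eps)

N, d = 1024, 64
Q, K, V = (torch.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)   # torch.Size([1024, 64])
```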

Generalizable Error Modeling for Search Relevance Data Annotation Tasks

  • paper_url: http://arxiv.org/abs/2310.05286
  • repo_url: None
  • paper_authors: Heinrich Peters, Alireza Hashemi, James Rae
  • for: 这篇论文旨在提高机器学习和人工智能系统的质量,具体来说是针对搜索 relevance 标注任务进行预测错误模型的建立和评估。
  • methods: 该论文使用了一种预测错误模型,并在三个产业级 ML 应用(音乐流媒体、视频流媒体、移动应用)中进行了实践。
  • results: 论文显示了预测错误模型可以在不同应用中具有moderate的模型性能(AUC=0.65-0.75),并且该模型在不同应用之间具有良好的泛化性。此外,论文还提供了模型解释分析,以便理解预测错误的主要驱动因素。最后,论文还证明了这种模型在审核中的有用性,可以提高数据标注过程中的效率和质量。
    Abstract Human data annotation is critical in shaping the quality of machine learning (ML) and artificial intelligence (AI) systems. One significant challenge in this context is posed by annotation errors, as their effects can degrade the performance of ML models. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps) and assesses its potential to enhance the quality and efficiency of the data annotation process. Drawing on real-world data from an extensive search relevance annotation program, we illustrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). We present model explainability analyses to identify which types of features are the main drivers of predictive performance. Additionally, we demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the amount of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results underscore that automated error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Thus, our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML.
    摘要 人工数据标注是机器学习(ML)和人工智能(AI)系统的关键因素。一个重要的挑战在这个上是标注错误,因为它们可以降低ML模型的性能。本文介绍了一个预测错误模型,用于检测搜索相关性标注任务中的可能错误,并评估其在三个产业级ML应用(音乐流媒体、视频流媒体和移动应用)中的可能性。基于广泛的搜索相关性标注计划的实际数据,我们示出了预测错误的能力(AUC=0.65-0.75),并证明模型性能可以通过应用之间进行泛化。我们还提供了模型解释分析,以确定预测性能的主要驱动因素。此外,我们还证明了模型在审核中的用途,可以减少标注错误的效率(例如,音乐流媒体应用中的40%效率提升)。这些结果证明了自动错误检测模型可以提供显著改善数据标注过程的效率和质量。因此,我们的发现对人类在ML过程中的循环提供了重要的洞察,并贡献到更广泛的人类-在-loop ML领域。
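A minimal sketch of the auditing idea on synthetic data: fit a classifier to predict annotation errors, report AUC, and audit the highest-risk tasks first. The feature set, the model choice (logistic regression), and the audit fraction are assumptions made only for illustration.

```python
# Sketch: predict annotation errors, then audit the highest-risk tasks first.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 5000, 8
X = rng.normal(size=(n, d))                 # e.g. rater tenure, task difficulty, disagreement...
logits = X @ rng.normal(size=d) * 0.8 - 1.5
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)   # 1 = annotation error

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, p), 3))

# Audit the top 10% highest-risk tasks and count how many true errors they contain.
k = int(0.1 * len(p))
top = np.argsort(-p)[:k]
print("errors caught in top 10% audited:", y_te[top].sum(), "of", y_te.sum())
```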

Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems

  • paper_url: http://arxiv.org/abs/2310.05280
  • repo_url: https://github.com/uclanlp/persona-biases
  • paper_authors: Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang
  • for: 这项研究旨在探讨对话系统中使用人物模拟的风险,以及这些风险如何影响对话系统的性能和用户体验。
  • methods: 本研究使用UNIVERSALPERSONA数据集,对四种不同的对话系统进行了比较,并采用了五种评价指标来评估对话系统中人物模拟的偏见。
  • results: 研究发现,使用人物模拟在对话系统中存在许多偏见,包括不够尊重和不当的回应,这些偏见可能会对用户造成困惑和不良影响。
    Abstract Recent advancements in Large Language Models empower them to follow freeform instructions, including imitating generic or specific demographic personas in conversations. We define generic personas to represent demographic groups, such as "an Asian person", whereas specific personas may take the form of specific popular Asian names like "Yumi". While the adoption of personas enriches user experiences by making dialogue systems more engaging and approachable, it also casts a shadow of potential risk by exacerbating social biases within model responses, thereby causing societal harm through interactions with users. In this paper, we systematically study "persona biases", which we define to be the sensitivity of dialogue models' harmful behaviors contingent upon the personas they adopt. We categorize persona biases into biases in harmful expression and harmful agreement, and establish a comprehensive evaluation framework to measure persona biases in five aspects: Offensiveness, Toxic Continuation, Regard, Stereotype Agreement, and Toxic Agreement. Additionally, we propose to investigate persona biases by experimenting with UNIVERSALPERSONA, a systematically constructed persona dataset encompassing various types of both generic and specific model personas. Through benchmarking on four different models -- including Blender, ChatGPT, Alpaca, and Vicuna -- our study uncovers significant persona biases in dialogue systems. Our findings also underscore the pressing need to revisit the use of personas in dialogue agents to ensure safe application.
    摘要 现代大语言模型可以遵循自由式指令,包括模仿 generic或特定民族人物的对话。我们定义了一些通用的人物类型来表示民族组成部分,例如“一个亚洲人”,而特定的人物可能是具体的受欢迎的亚洲名字“玉米”。虽然采用人物可以增加对话系统的互动性和可接近性,但也可能扩大社会偏见在模型响应中,从而对社会造成伤害。在这篇论文中,我们系统地研究了“人物偏见”,定义为对话模型的危险行为与人物相关的敏感性。我们分类人物偏见为表达偏见和同意偏见,并设计了全面的评价框架来测试人物偏见的五个方面:不礼貌、继续恶势力、尊敬、刻板印象同意和恶势力同意。此外,我们还提出了使用 UNIVERSALPERSONA 系统构建的人物数据集,包括各种通用和特定的模型人物。通过对四种不同的模型(包括 Blender、ChatGPT、Alpaca 和 Vicuna)进行比较,我们的研究发现了对话系统中的人物偏见。我们的发现也警示了对人物的使用以确保安全应用的需要。

Measuring reasoning capabilities of ChatGPT

  • paper_url: http://arxiv.org/abs/2310.05993
  • repo_url: None
  • paper_authors: Adrian Groza
  • for: 这个论文的目的是量化 chatGPT 在逻辑任务中生成的逻辑错误。
  • methods: 作者使用了 chatGPT 解决 144 个逻辑题目，并使用 Prover9 和 Mace4 来验证解决方案。
  • results: 作者发现 chatGPT 只能正确解决 7% 的题目，而 BARD 则可以正确解决 5% 的题目。此外，作者还发现 chatGPT 生成的解决方案中包含了 67 种逻辑错误，平均每个逻辑任务中包含 7 种错误。
    Abstract I shall quantify the logical faults generated by ChatGPT when applied to reasoning tasks. For experiments, I use the 144 puzzles from the library \url{https://users.utcluj.ro/~agroza/puzzles/maloga}~\cite{groza:fol}. The library contains puzzles of various types, including arithmetic puzzles, logical equations, Sudoku-like puzzles, zebra-like puzzles, truth-telling puzzles, grid puzzles, strange numbers, or self-reference puzzles. The correct solutions for these puzzles were checked using the theorem prover Prover9~\cite{mccune2005release} and the finite models finder Mace4~\cite{mccune2003mace4} based on human-modelling in Equational First Order Logic. A first output of this study is the benchmark of 100 logical puzzles. For this dataset ChatGPT provided both correct answer and justification for only 7\%, while BARD did so for 5\%. Since the dataset seems challenging, the researchers are invited to test the dataset on more advanced or tuned models than ChatGPT3.5 with more crafted prompts. A second output is the classification of reasoning faults conveyed by ChatGPT. This classification forms a basis for a taxonomy of reasoning faults generated by large language models. I have identified 67 such logical faults, among which: inconsistencies, implication does not hold, unsupported claim, lack of commonsense, wrong justification. The 100 solutions generated by ChatGPT contain 698 logical faults. That is, on average, 7 fallacies for each reasoning task. A third output is the annotated answers of the ChatGPT with the corresponding logical faults. Each wrong statement within the ChatGPT answer was manually annotated, aiming to quantify the amount of faulty text generated by the language model. On average, 26.03\% of the generated text was a logical fault.

Transforming Pixels into a Masterpiece: AI-Powered Art Restoration using a Novel Distributed Denoising CNN (DDCNN)

  • paper_url: http://arxiv.org/abs/2310.05270
  • repo_url: None
  • paper_authors: Sankar B., Mukil Saravanan, Kalaivanan Kumar, Siri Dubbaka
  • For: restore deteriorated artworks accurately and efficiently
  • Methods: 使用深度学习和计算机视觉技术，构建一种基于 DDCNN 的混合模型，可以根据不同的损坏程度和类型进行自适应修复
  • Results: 实验表明，该方法可以有效地去除损坏并保留细节，修复质量较传统方法有明显提升
    Abstract Art restoration is crucial for preserving cultural heritage, but traditional methods have limitations in faithfully reproducing original artworks while addressing issues like fading, staining, and damage. We present an innovative approach using deep learning, specifically Convolutional Neural Networks (CNNs), and Computer Vision techniques to revolutionize art restoration. We start by creating a diverse dataset of deteriorated art images with various distortions and degradation levels. This dataset trains a Distributed Denoising CNN (DDCNN) to remove distortions while preserving intricate details. Our method is adaptable to different distortion types and levels, making it suitable for various deteriorated artworks, including paintings, sketches, and photographs. Extensive experiments demonstrate our approach's efficiency and effectiveness compared to other Denoising CNN models. We achieve a substantial reduction in distortion, transforming deteriorated artworks into masterpieces. Quantitative evaluations confirm our method's superiority over traditional techniques, reshaping the art restoration field and preserving cultural heritage. In summary, our paper introduces an AI-powered solution that combines Computer Vision and deep learning with DDCNN to restore artworks accurately, overcoming limitations and paving the way for future advancements in art restoration.
    摘要 艺术修复是保护文化遗产的关键，但传统方法有限制，无法准确地复制原始艺术作品，同时解决抹涂、损坏等问题。我们提出了一种创新的方法，使用深度学习技术和计算机视觉技术，以推动艺术修复领域的革命性变革。我们开始创建一个多样化的褪色艺术图像数据集，用于训练分布式滤清神经网络（DDCNN），以除掉抹涂而保留细节。我们的方法适用于不同类型和水平的抹涂，可以应用于不同的艺术作品，包括画作、素描和照片。我们的实验证明，我们的方法可以减少抹涂，将褪色艺术作品转化为名画。量化评估表明，我们的方法比传统方法更高效，重新定义艺术修复领域，并为未来的艺术修复领域提供了新的发展方向。总之，我们的论文介绍了一种通过计算机视觉和深度学习技术，使用 DDCNN 恢复艺术作品的准确方法，超越传统技术，开拓了未来艺术修复领域的新途径。
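For flavor, a tiny residual denoising CNN in PyTorch: it predicts the distortion and subtracts it from the input. The architecture, sizes, and noise model are illustrative assumptions, not the paper's DDCNN or its distributed training setup.

```python
# Minimal residual denoising CNN sketch (illustrative; not the paper's DDCNN).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, channels=3, width=32, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.body(x)       # predict the distortion residual and remove it

model = TinyDenoiser()
clean = torch.rand(4, 3, 64, 64)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                    # a few toy optimization steps
    loss = nn.functional.mse_loss(model(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()
print("toy MSE:", loss.item())
```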

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

  • paper_url: http://arxiv.org/abs/2310.05269
  • repo_url: None
  • paper_authors: Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh, Sina Ebrahimi
  • for: 这份论文主要是为了探讨联盟学习(Federated Learning,FL)在机器学习系统中实现隐私安全性的可能性和挑战。
  • methods: 这份论文使用了分布式机器学习(Distributed Machine Learning)和封包技术来实现联盟学习,并且进行了评估和比较现有的FL应用,以评估其效率、精度和隐私保护。
  • results: 这份论文发现了联盟学习可以实现隐私安全性和成本效益,并且发现了一些未解决的问题和挑战,例如资料权益和安全性、资料分布和资料隐私保护等。
    Abstract In the realm of machine learning (ML) systems featuring client-host connections, the enhancement of privacy security can be effectively achieved through federated learning (FL) as a secure distributed ML methodology. FL effectively integrates cloud infrastructure to transfer ML models onto edge servers using blockchain technology. Through this mechanism, it guarantees the streamlined processing and data storage requirements of both centralized and decentralized systems, with an emphasis on scalability, privacy considerations, and cost-effective communication. In current FL implementations, data owners locally train their models, and subsequently upload the outcomes in the form of weights, gradients, and parameters to the cloud for overall model aggregation. This innovation obviates the necessity of engaging Internet of Things (IoT) clients and participants to communicate raw and potentially confidential data directly with a cloud center. This not only reduces the costs associated with communication networks but also enhances the protection of private data. This survey conducts an analysis and comparison of recent FL applications, aiming to assess their efficiency, accuracy, and privacy protection. However, in light of the complex and evolving nature of FL, it becomes evident that additional research is imperative to address lingering knowledge gaps and effectively confront the forthcoming challenges in this field. In this study, we categorize recent literature into the following clusters: privacy protection, resource allocation, case study analysis, and applications. Furthermore, at the end of each section, we tabulate the open areas and future directions presented in the referenced literature, affording researchers and scholars an insightful view of the evolution of the field.
    摘要 在机器学习(ML)系统中,通过联邦学习(FL)可以有效提高隐私安全性。FL可以将云基础设施与边缘服务器集成,通过块链技术实现模型传输。这种机制可以保证中央化和分布式系统之间的流畅处理和数据存储要求,同时强调扩展性、隐私考虑因素和效率沟通。现在的FL实现中,数据所有者在本地训练模型,然后将结果上传到云中进行总模型聚合。这种创新使得无需将互联网物联网(IoT)客户端和参与者直接与云中心进行明文和潜在敏感数据的直接交流,从而降低了通信网络成本并提高了隐私数据的保护。本文对现有的FL应用进行分析和比较,以评估其效率、准确率和隐私保护。然而,随着FL的复杂和不断演化,显然需要进一步的研究,以解决仍存的知识漏洞并有效地应对未来的挑战。在这个研究中,我们将 recens literature into以下类别:隐私保护、资源分配、案例分析和应用。此外,文章结尾附加了每个部分的开放领域和未来方向,为研究人员和学者提供了深入的视野,了解领域的演化。
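A compact FedAvg-style sketch of the core mechanism described above: clients run local updates on private data and the server aggregates only parameters, weighted by client dataset size. The linear-regression task and hyperparameters are toy assumptions.

```python
# FedAvg sketch: clients fit local models on private data; the server aggregates
# only parameters, never raw data. Purely illustrative linear regression.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)

def make_client(n):
    X = rng.normal(size=(n, 5))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

clients = [make_client(n) for n in (50, 200, 120)]
global_w = np.zeros(5)

for rnd in range(10):                                 # communication rounds
    updates, sizes = [], []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(20):                           # local gradient steps
            grad = X.T @ (X @ w - y) / len(y)
            w -= 0.1 * grad
        updates.append(w); sizes.append(len(y))
    global_w = np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

print("error vs. true weights:", np.linalg.norm(global_w - true_w))
```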

A Knowledge Graph-Based Search Engine for Robustly Finding Doctors and Locations in the Healthcare Domain

  • paper_url: http://arxiv.org/abs/2310.05258
  • repo_url: None
  • paper_authors: Mayank Kejriwal, Hamid Haidarian, Min-Hsueh Chiu, Andy Xiang, Deep Shrestha, Faizan Javed
  • for: 这篇论文是为了解决医疗领域患者找寻医生和位置的搜索问题而写的。
  • methods: 该论文使用知识图(KG)来结合 semi-structured 数据的感知模型、自然语言处理技术和结构化查询语言 like SPARQL 和 Cypher 来提供强大的搜索引擎体系。
  • results: Early results 表明,该方法可以对复杂查询提供明显更高的覆盖率,无需降低质量。
    Abstract Efficiently finding doctors and locations is an important search problem for patients in the healthcare domain, for which traditional information retrieval methods tend not to work optimally. In the last ten years, knowledge graphs (KGs) have emerged as a powerful way to combine the benefits of gleaning insights from semi-structured data using semantic modeling, natural language processing techniques like information extraction, and robust querying using structured query languages like SPARQL and Cypher. In this short paper, we present a KG-based search engine architecture for robustly finding doctors and locations in the healthcare domain. Early results demonstrate that our approach can lead to significantly higher coverage for complex queries without degrading quality.
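As an illustration of the structured-query side of such a system, the sketch below builds a tiny toy graph with rdflib and runs a SPARQL query for doctors by specialty and city. The schema, namespace, and entities are invented for the example and are not the paper's knowledge graph.

```python
# Toy knowledge-graph query sketch with rdflib (illustrative schema only).
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/health/")
g = Graph()
g.add((EX.dr_lee, RDF.type, EX.Doctor))
g.add((EX.dr_lee, EX.specialty, Literal("cardiology")))
g.add((EX.dr_lee, EX.worksAt, EX.clinic_12))
g.add((EX.clinic_12, EX.city, Literal("Austin")))

q = """
PREFIX ex: <http://example.org/health/>
SELECT ?doctor ?clinic WHERE {
  ?doctor a ex:Doctor ;
          ex:specialty "cardiology" ;
          ex:worksAt ?clinic .
  ?clinic ex:city "Austin" .
}
"""
for row in g.query(q):
    print(row.doctor, row.clinic)
```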

Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05255
  • repo_url: https://github.com/mehrdad-dev/persis
  • paper_authors: Mehrdad Mohammadian, Neda Maleki, Tobias Olsson, Fredrik Ahlgren
  • for: 这篇论文是为了解决视觉字体识别(VFR)系统中的波斯字体识别问题。
  • methods: 该论文使用卷积神经网络(CNN)来解决这个问题,并使用了新的公共可用数据集来训练模型。
  • results: 根据论文的结果,提出的管道可以达到78.0%的顶部准确率,89.1%的IDPL-PFOD数据集准确率,以及94.5%的KAFD数据集准确率。 Additionally, the average time spent in the entire pipeline for one sample of the proposed datasets is 0.54 seconds and 0.017 seconds for CPU and GPU, respectively.
    Abstract What happens if we encounter a suitable font for our design work but do not know its name? Visual Font Recognition (VFR) systems are used to identify the font typeface in an image. These systems can assist graphic designers in identifying fonts used in images. A VFR system also aids in improving the speed and accuracy of Optical Character Recognition (OCR) systems. In this paper, we introduce the first publicly available datasets in the field of Persian font recognition and employ Convolutional Neural Networks (CNN) to address this problem. The results show that the proposed pipeline obtained 78.0% top-1 accuracy on our new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the KAFD dataset. Furthermore, the average time spent in the entire pipeline for one sample of our proposed datasets is 0.54 and 0.017 seconds for CPU and GPU, respectively. We conclude that CNN methods can be used to recognize Persian fonts without the need for additional pre-processing steps such as feature extraction, binarization, normalization, etc.
    摘要 如果我们在设计工作中遇到一种适合的字体,但是不知道它的名称,可以使用视觉字体识别(VFR)系统来识别字体类型。这些系统可以帮助图形设计师在图像中识别字体。VFR 系统还可以提高光学字符识别(OCR)系统的速度和准确性。在这篇论文中,我们介绍了字体识别领域的第一个公共可用数据集,并使用卷积神经网络(CNN)解决这个问题。结果显示,我们的提案的管道取得了78.0%的顶部一准确率,89.1%的IDPL-PFOD数据集和94.5%的KAFD数据集。此外,我们的整个管道中对一个样本的平均时间为0.54秒和0.017秒,分别是CPU和GPU上的。我们 conclude 的是,CNN 方法可以用来识别波斯字体,不需要额外的预处理步骤,如特征提取、二进制化、Normalization等。

Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05253
  • repo_url: https://github.com/wang2226/folk
  • paper_authors: Haoran Wang, Kai Shu
  • for: 验证宣称的可靠性,对抗虚假信息的扩散。
  • methods: 使用First-Order-Logic-Guided Knowledge-Grounded (FOLK) Reasoning,无需人工标注数据,可以验证复杂的宣称,并生成可读的解释。
  • results: 在三个不同的数据集上,FOLK 已经超越强基eline,并且可以提供清晰的解释,帮助人工验证者更好地理解模型的决策过程。
    Abstract Claim verification plays a crucial role in combating misinformation. While existing works on claim verification have shown promising results, a crucial piece of the puzzle that remains unsolved is to understand how to verify claims without relying on human-annotated data, which is expensive to create at a large scale. Additionally, it is important for models to provide comprehensive explanations that can justify their decisions and assist human fact-checkers. This paper presents First-Order-Logic-Guided Knowledge-Grounded (FOLK) Reasoning that can verify complex claims and generate explanations without the need for annotated evidence using Large Language Models (LLMs). FOLK leverages the in-context learning ability of LLMs to translate the claim into a First-Order-Logic (FOL) clause consisting of predicates, each corresponding to a sub-claim that needs to be verified. Then, FOLK performs FOL-Guided reasoning over a set of knowledge-grounded question-and-answer pairs to make veracity predictions and generate explanations to justify its decision-making process. This process makes our model highly explanatory, providing clear explanations of its reasoning process in human-readable form. Our experiment results indicate that FOLK outperforms strong baselines on three datasets encompassing various claim verification challenges. Our code and data are available.

In-Context Convergence of Transformers

  • paper_url: http://arxiv.org/abs/2310.05249
  • repo_url: None
  • paper_authors: Yu Huang, Yuan Cheng, Yingbin Liang
  • for: 这个论文研究了一层转换器在梯度下降训练下的学习动力学,以便在不需要参数调整的情况下解决未看过的任务。
  • methods: 该论文使用了梯度下降训练方法,并对一层转换器的软max注意力进行研究。
  • results: 研究发现,对于具有平衡或不平衡特征的数据,转换器在梯度下降训练下可以在不同阶段达到近Zero预测错误的finite-time收敛保证。
    Abstract Transformers have recently revolutionized many domains in modern machine learning and one salient discovery is their remarkable in-context learning capability, where models can solve an unseen task by utilizing task-specific prompts without further parameters fine-tuning. This also inspired recent theoretical studies aiming to understand the in-context learning mechanism of transformers, which however focused only on linear transformers. In this work, we take the first step toward studying the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent in order to in-context learn linear function classes. We consider a structured data model, where each token is randomly sampled from a set of feature vectors in either balanced or imbalanced fashion. For data with balanced features, we establish the finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map. More notably, for data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process, where the transformer first converges to a near-zero prediction error for the query tokens of dominant features, and then converges later to a near-zero prediction error for the query tokens of under-represented features, respectively via one and four training phases. Our proof features new techniques for analyzing the competing strengths of two types of attention weights, the change of which determines different training phases.
    摘要 变换器最近在现代机器学习的诸多领域带来了革命性进展，其中一个突出发现是其上下文内学习能力：模型无需进一步微调参数，仅凭任务相关的提示就能解决未见过的任务。这也激发了近期旨在理解变换器上下文内学习机制的理论研究，但这些研究仅限于线性变换器。在本工作中，我们迈出第一步，研究通过梯度下降训练、带 softmax 注意力的单层变换器在上下文内学习线性函数类的学习动态。我们考虑一种结构化数据模型，其中每个词元以平衡或不平衡的方式从一组特征向量中随机采样。对于特征平衡的数据，我们通过对注意力图训练动态两个阶段的分析，建立了预测误差接近于零的有限时间收敛保证。更值得注意的是，对于特征不平衡的数据，我们证明学习动态呈现分阶段收敛：变换器先经过一个训练阶段使主导特征对应查询词元的预测误差接近于零，随后再经过四个训练阶段使代表性不足特征对应查询词元的预测误差接近于零。我们的证明引入了分析两类注意力权重此消彼长的新技术，其变化决定了不同的训练阶段。
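To picture the setup, the toy below generates in-context prompts for linear regression (pairs (x_i, <w, x_i>) plus a query) and reads out a prediction with a single softmax-attention step over the examples. It is a stand-in to show the data model and attention readout, not the trained one-layer transformer the analysis concerns.

```python
# Toy in-context linear-regression setup with a softmax-attention readout.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 64

def make_prompt():
    w = rng.normal(size=d)
    X = rng.normal(size=(n_ctx, d))          # in-context examples
    y = X @ w
    x_q = rng.normal(size=d)                 # query token
    return X, y, x_q, x_q @ w

def attention_readout(X, y, x_q, temp=0.5):
    scores = X @ x_q / temp                  # query-key dot products
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ y                          # value-weighted prediction

errs = []
for _ in range(200):
    X, y, x_q, y_true = make_prompt()
    errs.append((attention_readout(X, y, x_q) - y_true) ** 2)
print("mean squared in-context prediction error:", np.mean(errs))
```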

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

  • paper_url: http://arxiv.org/abs/2310.05242
  • repo_url: None
  • paper_authors: Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang
  • For: 这个研究旨在解决医疗影像分析中的报告生成问题，以实现诊断过程中的量化分析。
  • Methods: 这个研究使用了大型自然语言模型（LLM），发展了一个适应器“ChatRadio-Valuer”，以自动生成医疗影像报告。
  • Results: 研究结果显示，ChatRadio-Valuer在医疗影像报告中诊断疾病的能力高于现有的模型，特别是与ChatGPT和GPT-4等模型相比。
    Abstract Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels. However, complex and diverse radiology reports with cross-source heterogeneity pose a huge generalizability challenge to the current methods under massive data volume, mainly because the style and normativity of radiology reports are obviously distinctive among institutions, body regions inspected and radiologists. Recently, the advent of large language models (LLM) offers great potential for recognizing signs of health conditions. To resolve the above problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer based on the LLM, a tailored model for automatic radiology report generation that learns generalizable representations and provides a basis pattern for model adaptation in sophisticated analysts' cases. Specifically, ChatRadio-Valuer is trained based on the radiology reports from a single institution by means of supervised fine-tuning, and then adapted to disease diagnosis tasks for human multi-system evaluation (i.e., chest, abdomen, muscle-skeleton, head, and maxillofacial $\&$ neck) from six different institutions in clinical-level events. The clinical dataset utilized in this study encompasses a remarkable total of \textbf{332,673} observations. From the comprehensive results on engineering indicators, clinical efficacy and deployment cost metrics, it can be shown that ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al., in terms of the diseases diagnosis from radiology reports. ChatRadio-Valuer provides an effective avenue to boost model generalization performance and alleviate the annotation workload of experts to enable the promotion of clinical AI applications in radiology reports.
    摘要 医学影像分析中的 radiology 报告生成是医疗决策中的关键步骤,但是复杂和多样的 radiology 报告带有跨源差异性,对当前方法来说是一个巨大普适性挑战。这是因为 radiology 报告的风格和标准性在不同机构、身体区域和 radiologist 之间存在显著差异。然而,最近的大语言模型(LLM)的出现带来了识别健康状况的潜在可能性。为解决这个问题,我们与中国第二医学院合作,并提出了基于 LLM 的 ChatRadio-Valuer 自动 radiology 报告生成模型,该模型学习普适表示和提供基本模式 для模型适应复杂分析员的情况。具体来说,ChatRadio-Valuer 通过单机构的 radiology 报告进行监督微调训练,然后在多个机构的疾病诊断任务中进行人类多系统评估。这些临床数据的总量为 \textbf{332,673} 个观察。根据工程指标、临床效果和部署成本度量,可以看出,ChatRadio-Valuer 在疾病诊断方面与现有模型,特别是 ChatGPT(GPT-3.5-Turbo)和 GPT-4 等模型,表现出色,尤其是在 radiology 报告中诊断疾病。ChatRadio-Valuer 为临床 AI 应用提供了一个有效的通路,以提高模型普适性性和减轻专家的标注工作负担,以便推动临床 AI 应用的普及。

MindfulDiary: Harnessing Large Language Model to Support Psychiatric Patients’ Journaling

  • paper_url: http://arxiv.org/abs/2310.05231
  • repo_url: None
  • paper_authors: Taewan Kim, Seolyeong Bae, Hyun Ah Kim, Su-woo Lee, Hwajung Hong, Chanmo Yang, Young-Ho Kim
  • for: 帮助心理病人每天记录经验,并帮助心理医生更好地理解患者的思想和日常情境。
  • methods: 使用大语言模型(LLM)和移动应用程序,实现了患者每天的自由对话记录,并遵循专业指导方针。
  • results: 经四周的场景研究,发现 MindfulDiary 可以帮助患者日常记录更详细和系统化,同时帮助心理医生更好地理解患者的思想和日常情境,有助于提高心理医疗效果。
    Abstract In the mental health domain, Large Language Models (LLMs) offer promising new opportunities, though their inherent complexity and low controllability have raised questions about their suitability in clinical settings. We present MindfulDiary, a mobile journaling app incorporating an LLM to help psychiatric patients document daily experiences through conversation. Designed in collaboration with mental health professionals (MHPs), MindfulDiary takes a state-based approach to safely comply with the experts' guidelines while carrying on free-form conversations. Through a four-week field study involving 28 patients with major depressive disorder and five psychiatrists, we found that MindfulDiary supported patients in consistently enriching their daily records and helped psychiatrists better empathize with their patients through an understanding of their thoughts and daily contexts. Drawing on these findings, we discuss the implications of leveraging LLMs in the mental health domain, bridging the technical feasibility and their integration into clinical settings.
    摘要 在心理健康领域,大型自然语言模型(LLM)提供了新的机遇,但其内置的复杂性和控制性问题引起了许多关于其在临床设置中适用性的问题。我们介绍了一款名为 MindfulDiary的移动日记应用程序,该应用程序通过与心理医生(MHP)合作,使用 LLM 帮助心理病人每天记录他们的经验。我们在28名主观抑郁症患者和5名心理医生参与的四周实验中发现,MindfulDiary 可以帮助患者日常记录更加详细,并帮助心理医生更好地理解他们的患者的思想和日常背景。根据这些发现,我们讨论了在心理健康领域利用 LLM 的意义,把技术可行性和其在临床设置中的集成相结合。

Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology

  • paper_url: http://arxiv.org/abs/2310.05227
  • repo_url: None
  • paper_authors: Qingsong Xu, Yilei Shi, Jonathan Bamber, Ye Tuo, Ralf Ludwig, Xiao Xiang Zhu
  • for: Physics-aware machine learning (PaML) is introduced as a transformative approach to overcome the barrier between hydrology and machine learning, and to revolutionize both fields.
  • methods: The review includes a comprehensive analysis of existing PaML methodologies that integrate prior physical knowledge or physics-based modeling into machine learning, including physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning.
  • results: The review highlights the most promising and challenging directions for different objectives and PaML methods in hydrology, including rainfall-runoff hydrological processes and hydrodynamic processes. Additionally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications, which enhances the explainability and causality of machine learning and lays the groundwork for the digital water cycle's realization.
    Abstract Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing reviews predominantly concentrate on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of the physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We first conduct a systematic review of hydrology in PaML, including rainfall-runoff hydrological processes and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications. HydroPML enhances the explainability and causality of ML and lays the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.
    摘要 准确的水文理解与水循环预测对于应对水资源管理中的科学与社会挑战至关重要，尤其是在人类活动引起的气候变化的动态影响下。现有综述主要集中在机器学习（ML）在该领域的发展上，但水文与 ML 仍被视为两个彼此独立的范式。我们提出将先验物理知识融入 ML 的物理感知机器学习（PaML），作为跨越这一壁垒、革新两个领域的变革性方法。我们对 PaML 方法进行了系统性梳理，分为物理数据引导的 ML、物理信息约束的 ML、物理嵌入的 ML 以及物理感知混合学习四类。PaML 有助于由 ML 辅助提出假设，加速从大数据中获得洞见并推动科学发现。我们首先系统综述了 PaML 在水文中的应用，包括降雨—径流水文过程与水动力过程，并针对不同目标和 PaML 方法指出最有前景与最具挑战的方向。最后，我们发布了一个基于 PaML 的水文平台 HydroPML，以增强 ML 的可解释性与因果性，并为数字水循环的实现奠定基础。HydroPML 平台公开发布于 https://hydropml.github.io/。
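A minimal sketch of one PaML flavor (physics-informed ML): a network is fit to runoff data while a second loss term penalizes violations of a simple water-balance relation. The linear-reservoir "physics", the synthetic data, and the loss weight are assumptions chosen only to show the pattern.

```python
# Physics-informed loss sketch: data loss + penalty on violating a water balance.
import torch

torch.manual_seed(0)
T, k = 200, 0.2
precip = torch.rand(T)
storage = torch.zeros(T)
runoff = torch.zeros(T)
for t in range(1, T):                         # synthetic "truth" from a linear reservoir
    runoff[t] = k * storage[t - 1]
    storage[t] = storage[t - 1] + precip[t] - runoff[t]

prev_storage = torch.cat([torch.zeros(1), storage[:-1]])
X = torch.stack([precip, prev_storage], dim=1)   # inputs: rainfall, previous storage

model = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(300):
    q_hat = model(X).squeeze(-1)
    data_loss = torch.mean((q_hat[1:] - runoff[1:]) ** 2)
    # physics residual: storage change should equal precipitation minus predicted runoff
    residual = (storage[1:] - storage[:-1]) - (precip[1:] - q_hat[1:])
    phys_loss = torch.mean(residual ** 2)
    loss = data_loss + 0.1 * phys_loss
    opt.zero_grad(); loss.backward(); opt.step()

print("data loss:", float(data_loss), "physics loss:", float(phys_loss))
```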

Interpretable Semiotics Networks Representing Awareness

  • paper_url: http://arxiv.org/abs/2310.05212
  • repo_url: None
  • paper_authors: David Kupeev, Eyal Nitcany
  • for: 这个论文描述了一种计算模型,用于跟踪和模拟人类对物体的感知和communication中的表达。
  • methods: 该模型包括两个关键组件(’observed’和’seen’),与计算机视觉术语(‘encoding’和’decoding’)相关。这些元素结合形成了 semiotic networks,用于模拟人类对物体的感知和communication中的意识。
  • results: 作者在多个实验中证明了这个模型的可见性,并且在小训练数据集上,该模型的复合网络超过了单独的分类网络的性能。未来的工作将利用这个模型,以更好地理解人类communication和个人表达。
    Abstract Humans perceive objects daily and communicate their perceptions using various channels. Here, we describe a computational model that tracks and simulates objects' perception, and their representations as they pass in communication. We describe two key components of our internal representation ('observed' and 'seen') and relate them to familiar computer vision terms (encoding and decoding). These elements joined together to form semiotic networks, which simulate awareness in object perception and human communication. Nowadays, most neural networks are uninterpretable. On the other hand, our model is free from these disadvantages. We performed several experiments and demonstrated the visibility of our model. We describe how our network may be used as a preprocessing unit to any classification network. In our experiments the compound network outperforms the classification network on average on datasets with small training data. Future work would leverage our model to gain better understanding of human communications and personal representations.
    摘要 人们日常接触物体,通过不同的渠道传达自己的感知。我们描述了一种计算模型,可以跟踪和模拟物体的感知和表达,以及它们在交流中的表现。我们描述了两个关键组成部分('观察'和'看到'),与 familar computer vision terms(编码和解码)相关。这些元素结合形成了 semiotic networks,可以模拟人类对物体感知和communication的意识。现在,大多数神经网络都是不可解释的。然而,我们的模型免受这些缺点。我们进行了多个实验,并证明了我们的模型的可见性。我们描述了如何使用我们的网络作为任何分类网络的预处理单元,并在实验中发现了compound network在小训练数据集上的超越性。未来的工作将利用我们的模型,更好地理解人类通信和个人表示。

TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

  • paper_url: http://arxiv.org/abs/2310.05210
  • repo_url: https://github.com/hkust-knowcomp/tilfa
  • paper_authors: Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See
  • for: 本研究旨在分析作者的立场(Argument Mining)
  • methods: 该研究使用了一种新的框架—TILFA(约文本、图像和布局融合框架),可以处理混合数据(文本和图像),并且可以理解文本以及检测图像中的光学字符和布局细节
  • results: 该模型在Argumentative Stance Classification子 зада务中显著超过了现有的基eline,为知识共享(KnowComp)团队赢得了第一名
    Abstract A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets focusing only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset including both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), is designed to handle this mixed data. It excels at not only understanding text but also detecting optical characters and recognizing layout details in images. Our model significantly outperforms existing baselines, earning our team, KnowComp, the 1st place in the leaderboard of Argumentative Stance Classification subtask in this shared task.
    摘要 主要目标之一的Argument Mining(AM)是分析作者的态度。与过去的AM数据集仅专注于文本的情况不同,这个共同任务在10个Argument Mining工作坊中引入了包括文本和图像的数据集。重要的是,这些图像包含视觉元素和光学字符。我们的新框架TILFA(文本、图像和布局融合在Argument Mining中的一体化框架)针对这种混合数据进行处理。它不仅能够理解文本,还能探测光学字符和图像中的布局细节。我们的模型在Argumentative Stance Classification子任务中显著超越了现有的基线,让我们的团队 KnowComp 在领导板块中获得第一名。

Scaling Laws of RoPE-based Extrapolation

  • paper_url: http://arxiv.org/abs/2310.05209
  • repo_url: https://github.com/OpenLMLab/scaling-rope
  • paper_authors: Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin
  • for: 本文主要研究了基于Rotary Position Embedding(RoPE)的大型自然语言模型(LLM)的推断能力。
  • methods: 本文提出了一种基于RoPE的推断方法,包括修改RoPE的基数和提供长文本练习。
  • results: 本文在16K训练长度下,通过调整RoPE的基数和练习文本长度,实现了在1000000上下文长度内的推断。同时,本文还提出了一种periodic perspective下的扩展法则,以描述推断性能与基数和练习文本长度之间的关系。
    Abstract The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by replacing 10000, the rotary base of $\theta_n={10000}^{-2n/d}$ in the original RoPE, with a larger value and providing longer fine-tuning text. In this work, we first observe that fine-tuning a RoPE-based LLM with either a smaller or larger base in pre-training context length could significantly enhance its extrapolation performance. After that, we propose the Scaling Laws of RoPE-based Extrapolation, a unified framework from the periodic perspective, to describe the relationship between the extrapolation performance and base value as well as tuning context length. In this process, we also explain the origin of the RoPE-based extrapolation issue by the critical dimension for extrapolation. Besides these observations and analyses, we achieve extrapolation up to 1 million context length within only 16K training length on LLaMA2 7B and 13B.
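A short sketch of RoPE with an adjustable rotary base, following theta_n = base^(-2n/d) from the abstract: enlarging the base stretches the rotation periods, which is the knob the proposed scaling laws relate to extrapolation. The shapes and values below are illustrative.

```python
# RoPE sketch with an adjustable rotary base: theta_n = base ** (-2n/d).
import torch

def rope_rotate(x, positions, base=10000.0):
    # x: (..., seq, d) with even d; positions: (seq,)
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)   # (d/2,)
    ang = positions[:, None].float() * theta[None, :]                   # (seq, d/2)
    cos, sin = torch.cos(ang), torch.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 128, 64)
pos = torch.arange(128)
q_default = rope_rotate(q, pos, base=10000.0)
q_big_base = rope_rotate(q, pos, base=1000000.0)   # larger base: slower rotation, longer periods
print(q_default.shape, q_big_base.shape)
```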

Quantifying Zero-shot Coordination Capability with Behavior Preferring Partners

  • paper_url: http://arxiv.org/abs/2310.05208
  • repo_url: None
  • paper_authors: Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang
  • for: 评估 Zero-shot coordination(ZSC)能力的可靠、全面和高效的评估方法。
  • methods: 提出了一种基于“理想完整”评估伙伴的评估方法,包括构建“完整”评估伙伴和Multi-dimensional度量指标BR-Prox。
  • results: 使用提出的评估方法重新评估了强大的ZSC方法在Overcooked环境中的性能,结果显示一些最常用的布局下,不同ZSC方法的性能差异不明显。此外,评估的ZSC方法需要生成更多和更高性能的训练伙伴。
    Abstract Zero-shot coordination (ZSC) is a new challenge focusing on generalizing learned coordination skills to unseen partners. Existing methods train the ego agent with partners from pre-trained or evolving populations. The agent's ZSC capability is typically evaluated with a few evaluation partners, including human and agent, and reported by mean returns. Current evaluation methods for ZSC capability still need to improve in constructing diverse evaluation partners and comprehensively measuring the ZSC capability. We aim to create a reliable, comprehensive, and efficient evaluation method for ZSC capability. We formally define the ideal 'diversity-complete' evaluation partners and propose the best response (BR) diversity, which is the population diversity of the BRs to the partners, to approximate the ideal evaluation partners. We propose an evaluation workflow including 'diversity-complete' evaluation partners construction and a multi-dimensional metric, the Best Response Proximity (BR-Prox) metric. BR-Prox quantifies the ZSC capability as the performance similarity to each evaluation partner's approximate best response, demonstrating generalization capability and improvement potential. We re-evaluate strong ZSC methods in the Overcooked environment using the proposed evaluation workflow. Surprisingly, the results in some of the most used layouts fail to distinguish the performance of different ZSC methods. Moreover, the evaluated ZSC methods must produce more diverse and high-performing training partners. Our proposed evaluation workflow calls for a change in how we efficiently evaluate ZSC methods as a supplement to human evaluation.
    摘要 Zero-shot coordination (ZSC) 是一个新的挑战,旨在将已经学习的协调技能应用到未见过的伙伴上。现有的方法将自己作为主体Agent训练的伙伴来自预先训练或进化的人类和机器人 population。主体Agent的 ZSC 能力通常是通过一些评估伙伴,包括人类和机器人,并由平均回应报告。现有的评估方法 для ZSC 能力仍然需要改进,以建立多样化的评估伙伴和全面地衡量 ZSC 能力。我们希望创建一个可靠、全面和高效的评估方法。我们正式定义了理想的 '多样化完整' 评估伙伴,并提出了最佳回应多样性(BR 多样性),它是评估伙伴的 Population 多样性的最佳回应。我们提出了一个评估工作流程,包括 '多样化完整' 评估伙伴的建构和多维度度量,即最佳回应距离度量(BR-Prox)。BR-Prox 量化 ZSC 能力为对每个评估伙伴的近似最佳回应的性能相似度,显示了扩展性和改进潜力。我们在 Overcooked 环境中重新评估了强大 ZSC 方法,结果显示,在一些最常用的布局中,不能区分不同 ZSC 方法的表现。此外,评估 ZSC 方法的伙伴必须生成更多和更高性能的训练伙伴。我们的提出的评估工作流程将对 ZSC 方法的评估作为补充 human 评估。

Boosting Facial Action Unit Detection Through Jointly Learning Facial Landmark Detection and Domain Separation and Reconstruction

  • paper_url: http://arxiv.org/abs/2310.05207
  • repo_url: None
  • paper_authors: Ziqiao Shang, Li Yu
  • for: 这篇研究旨在提出一个新的 facial action unit (AU) 检测框架,以便在无标的面部图像中进行supervised检测。
  • methods: 这篇研究使用多任务学习,将AU领域分类和重建、面部标志检测共享同structural facial extraction模组的 Parameters。此外,提出了一个基于对照学习的新Feature alignment方案,加入了四个中途supervisors,以促进特征重建过程。
  • results: 实验结果显示,该方法在两个benchmark上具有较高的精度和稳定性,较之前所有方法有所提高。
    Abstract Recently how to introduce large amounts of unlabeled facial images in the wild into supervised Facial Action Unit (AU) detection frameworks has become a challenging problem. In this paper, we propose a new AU detection framework where multi-task learning is introduced to jointly learn AU domain separation and reconstruction and facial landmark detection by sharing the parameters of homostructural facial extraction modules. In addition, we propose a new feature alignment scheme based on contrastive learning by simple projectors and an improved contrastive loss, which adds four additional intermediate supervisors to promote the feature reconstruction process. Experimental results on two benchmarks demonstrate our superiority against the state-of-the-art methods for AU detection in the wild.

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

  • paper_url: http://arxiv.org/abs/2310.05205
  • repo_url: https://github.com/bigrl-team/gear
  • paper_authors: Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
  • for: 这篇论文旨在开发一种分布式、GPU-中心的经验回忆系统(GEAR),用于执行扩展的强化学习(RL),并使用大 sequences 模型(如 transformers)。
  • methods: GEAR 使用了一种优化的内存管理策略,使得 GPU 服务器的内存资源(包括主机内存和设备内存)可以有效地管理经验数据。此外,它还实现了分布式的 GPU 设备来快速执行不同的经验选择策略,从而缓解计算瓶颈。 GEAR 还使用了 GPU 加速器来收集经验数据,并使用零复制访问主机内存和远程指定内存访问来提高通信效率。
  • results: 根据集群实验结果,GEAR 可以与 Reverb 相比,在训练 state-of-the-art 大 RL 模型时达到6倍的性能水平。
    Abstract This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by enabling the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it facilitates decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote-directed-memory access over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6x greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
    摘要 这篇论文介绍了一种分布式、以 GPU 为中心的经验回放系统 GEAR，用于支撑使用大型序列模型（如 Transformer）的可扩展强化学习（RL）。对于此类模型，现有系统如 Reverb 在内存、计算和通信方面都会遇到严重瓶颈。GEAR 通过让 GPU 服务器上的内存资源（包括主机内存和设备内存）共同管理轨迹数据来优化内存效率；它还让分散的 GPU 设备直接执行各种轨迹选择策略，从而绕开计算瓶颈。GEAR 配备了可通过零拷贝访问主机内存来收集轨迹的 GPU kernel，并借助 InfiniBand 上的远程直接内存访问提升通信效率。集群实验表明，在训练最先进的大型 RL 模型时，GEAR 的性能最高可达 Reverb 的 6 倍。GEAR 已开源：https://github.com/bigrl-team/gear。
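To illustrate the memory-management idea (host-resident trajectory storage feeding GPU training), here is a toy replay buffer that keeps trajectories in a pre-allocated pinned host tensor and copies sampled batches to the device. It is only a sketch of the concept, not GEAR's kernels, RDMA path, or sharding.

```python
# Toy host-resident replay buffer feeding a GPU trainer (concept sketch only).
import torch

class PinnedReplay:
    def __init__(self, capacity, traj_len, obs_dim, device="cuda"):
        pin = torch.cuda.is_available()
        self.store = torch.empty(capacity, traj_len, obs_dim, pin_memory=pin)  # host memory
        self.capacity, self.size, self.ptr = capacity, 0, 0
        self.device = device if torch.cuda.is_available() else "cpu"

    def add(self, traj):                       # traj: (traj_len, obs_dim) CPU tensor
        self.store[self.ptr].copy_(traj)
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = torch.randint(0, self.size, (batch_size,))
        batch = self.store[idx]                # gather on the host
        # copy to the training device (true async overlap needs a pinned source buffer)
        return batch.to(self.device, non_blocking=True)

buf = PinnedReplay(capacity=1024, traj_len=32, obs_dim=16)
for _ in range(100):
    buf.add(torch.randn(32, 16))
print(buf.sample(8).shape, buf.sample(8).device)
```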

GMMFormer: Gaussian-Mixture-Model based Transformer for Efficient Partially Relevant Video Retrieval

  • paper_url: http://arxiv.org/abs/2310.05195
  • repo_url: None
  • paper_authors: Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia
  • For: This paper is written for partially relevant video retrieval (PRVR), which aims to find untrimmed videos containing pertinent moments in a database.
  • Methods: The paper proposes a novel method called GMMFormer, which models clip representations implicitly using a Gaussian-Mixture-Model (GMM) and Transformer architecture. The method incorporates Gaussian-Mixture-Model constraints during frame interactions to focus each frame on its adjacent frames, generating representations that contain multi-scale clip information.
  • Results: The paper demonstrates the superiority and efficiency of GMMFormer through extensive experiments on three large-scale video datasets (TVR, ActivityNet Captions, and Charades-STA). The results show that GMMFormer outperforms existing PRVR methods and achieves better efficiency by reducing the storage overhead and improving the embedding space.
    Abstract Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos containing pertinent moments in a database. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead. To solve the efficiency problem of PRVR methods, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer which models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. Then generated representations will contain multi-scale clip information, achieving implicit clip modeling. In addition, PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space more intensive and contain more semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer.
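A small sketch of the Gaussian-constrained attention idea: dot-product attention logits are biased by log-Gaussian priors over frame distance, and several variances give multi-scale, clip-like receptive fields. This is an illustration of the mechanism, not the GMMFormer architecture.

```python
# Gaussian-constrained frame attention sketch with multiple scales (illustrative).
import torch

def gaussian_attention(x, sigmas=(1.0, 4.0, 16.0)):
    # x: (frames, dim) features of one video
    n, d = x.shape
    scores = (x @ x.T) / d ** 0.5                    # plain dot-product attention logits
    pos = torch.arange(n, dtype=torch.float32)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    outs = []
    for s in sigmas:                                 # one branch per Gaussian scale
        bias = -dist2 / (2 * s ** 2)                 # log of an (unnormalized) Gaussian prior
        attn = torch.softmax(scores + bias, dim=-1)
        outs.append(attn @ x)
    return torch.cat(outs, dim=-1)                   # (frames, dim * len(sigmas))

feats = torch.randn(32, 256)
print(gaussian_attention(feats).shape)   # torch.Size([32, 768])
```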

Factuality Challenges in the Era of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05189
  • repo_url: None
  • paper_authors: Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni
  • for: 本研究旨在探讨Generative AI技术的发展以及其对社会的影响,尤其是LLMs技术的潜在的威胁和风险。
  • methods: 本研究采用了文献综述和讨论的方法,检视了现有的LLMs技术和其应用场景,并分析了这些技术的潜在的威胁和风险。
  • results: 本研究发现了一些LLMs技术的潜在威胁和风险,包括生成假信息和假 profiles,以及恶意利用这些技术来欺诈用户。同时,本研究还提出了一些可能的解决方案,如实施技术审核和评估机制,提高用户的AI理解水平,以及进行更多的研究和规范。
    Abstract The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.
    摘要 基于大语言模型（LLM）的工具，如 OpenAI 的 ChatGPT、Microsoft 的 Bing Chat 以及 Google 的 Bard，吸引了广泛的公众关注。这些极其有用、听起来自然的工具标志着自然语言生成的重要进步，但它们往往会生成虚假、错误或误导性的内容，通常被称为“幻觉”。此外，LLM 还可能被用于恶意用途，例如大规模生成虚假但看似可信的内容和账号资料。这给社会带来了用户被欺骗以及不准确信息加速传播的重大风险。鉴于这些风险，我们探讨事实核查机构、新闻机构以及更广泛的研究和政策界所需的技术创新、监管改革和人工智能素养举措。通过识别风险、迫在眉睫的威胁以及若干可行的解决方案，我们希望为生成式 AI 时代的真实性问题提供指引。

Evolutionary Retrosynthetic Route Planning

  • paper_url: http://arxiv.org/abs/2310.05186
  • repo_url: None
  • paper_authors: Yan Zhang, Hao Hao, Xiao He, Shuanhu Gao, Aimin Zhou
  • for: 本研究目的是提出一种基于进化算法的多步反Synthesis路径规划方法,以解决现有的反Synthesis问题。
  • methods: 该方法首先将反Synthesis问题转化为优化问题,定义搜索空间和操作。此外,为提高搜索效率, parallel 策略被实现。
  • results: 对四种产品的实验结果表明,相比较 Monte Carlo tree search 算法,EA 可以Significantly 减少单步模型的调用数(均减少53.9%),搜索三个解决方案的时间减少83.9%,并同时提高可行搜索路径的数量(增加5倍)。
    Abstract Molecular retrosynthesis is a significant and complex problem in the field of chemistry, however, traditional manual synthesis methods not only need well-trained experts but also are time-consuming. With the development of big data and machine learning, artificial intelligence (AI) based retrosynthesis is attracting more attention and is becoming a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, we propose a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem into an optimization problem, defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products, and is compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA significantly reduces the number of calling single-step model by an average of 53.9%. The time required to search three solutions decreased by an average of 83.9%, and the number of feasible search routes increases by 5 times.
    摘要 分子逆合成是化学领域中一个重要而复杂的问题，传统的手工合成方法不仅需要训练有素的专家，而且非常耗时。随着大数据和机器学习的发展，基于人工智能（AI）的逆合成受到越来越多的关注，正成为分子逆合成的一种有价值工具。目前，蒙特卡洛树搜索是解决该问题的主流搜索框架，但其搜索效率受制于庞大的搜索空间。因此，我们提出了一种基于进化优化的逆合成路线规划新方法，这是进化算法（EA）在多步逆合成领域的首次应用。该方法将逆合成问题建模为优化问题，定义了搜索空间和算子；此外，为提高搜索效率，还实现了并行策略。新方法被应用于四种案例产品，并与蒙特卡洛树搜索进行比较。实验结果表明，与蒙特卡洛树搜索算法相比，EA 平均减少了 53.9% 的单步模型调用次数，搜索三个解所需的时间平均减少了 83.9%，同时可行搜索路线的数量增加了 5 倍。

Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction

  • paper_url: http://arxiv.org/abs/2310.05185
  • repo_url: https://github.com/lhrlab/text2nkg
  • paper_authors: Haoran Luo, Haihong E, Yuhao Yang, Tianyu Yao, Yikai Guo, Zichen Tang, Wentai Zhang, Kaiyang Wan, Shiyao Peng, Meina Song, Wei Lin
  • for: 这篇论文旨在构建基于文本的n-ary关系知识图(NKG),以便更好地表达现实世界中的多元关系。
  • methods: 本文提出了一种新的细粒度n-ary关系抽取方法,使用 span-tuple 分类与 hetero-ordered merging 技术来实现不同元数(arity)下的n-ary关系抽取。
  • results: 实验结果表明,在 hyper-relational schema 下的细粒度n-ary关系抽取基准上,Text2NKG 的 $F_1$ 分数比此前最优模型提高了近20个百分点。
    Abstract Beyond traditional binary relational facts, n-ary relational knowledge graphs (NKGs) are comprised of n-ary relational facts containing more than two entities, which are closer to real-world facts with broader applications. However, the construction of NKGs still significantly relies on manual labor, and n-ary relation extraction still remains at a coarse-grained level, which is always in a single schema and fixed arity of entities. To address these restrictions, we propose Text2NKG, a novel fine-grained n-ary relation extraction framework for n-ary relational knowledge graph construction. We introduce a span-tuple classification approach with hetero-ordered merging to accomplish fine-grained n-ary relation extraction in different arity. Furthermore, Text2NKG supports four typical NKG schemas: hyper-relational schema, event-based schema, role-based schema, and hypergraph-based schema, with high flexibility and practicality. Experimental results demonstrate that Text2NKG outperforms the previous state-of-the-art model by nearly 20\% points in the $F_1$ scores on the fine-grained n-ary relation extraction benchmark in the hyper-relational schema. Our code and datasets are publicly available.
    摘要 与传统的二元关系事实不同,n-ary关系知识图(NKG)由包含两个以上实体的n-ary关系事实组成,更贴近现实世界的事实,具有更广泛的应用。然而,NKG的构建仍在很大程度上依赖人工劳动,且n-ary关系抽取仍停留在粗粒度层面,通常局限于单一schema和固定的实体元数。为了解决这些限制,我们提出了Text2NKG,一种面向n-ary关系知识图构建的新型细粒度n-ary关系抽取框架。我们引入了结合hetero-ordered merging的span-tuple分类方法,以实现不同元数下的细粒度n-ary关系抽取。此外,Text2NKG支持四种典型的NKG schema:hyper-relational schema、基于事件的schema、基于角色的schema和基于超图的schema,具有很高的灵活性和实用性。实验结果表明,在hyper-relational schema下的细粒度n-ary关系抽取基准上,Text2NKG的$F_1$分数比此前最优模型提高了近20个百分点。我们的代码和数据集均已公开。

Optimizing Large Language Models to Expedite the Development of Smart Contracts

  • paper_url: http://arxiv.org/abs/2310.05178
  • repo_url: None
  • paper_authors: Nii Osae Osae Dade, Margaret Lartey-Quaye, Emmanuel Teye-Kofi Odonkor, Paul Ammah
  • for: 本文旨在帮助开发者在区块链网络上构建去中心化应用(dApps),通过引入MazzumaGPT大语言模型生成智能合约代码,提高开发效率。
  • methods: 本文使用名为MazzumaGPT的大语言模型,该模型针对智能合约代码生成进行了优化;论文给出了优化与微调参数,并对功能正确性进行了评估。
  • results: 本文报告了MazzumaGPT在生成智能合约代码和提升开发效率方面的表现,结果显示模型能够生成正确的代码并提高开发效率;同时也讨论了研究的局限性与更广泛的影响。
    Abstract Programming has always been at the heart of technological innovation in the 21st century. With the advent of blockchain technologies and the proliferation of web3 paradigms of decentralised applications, smart contracts have been very instrumental in enabling developers to build applications that reside on decentralised blockchains. Despite the huge interest and potential of smart contracts, there is still a significant knowledge and skill gap that developers need to cross in order to build web3 applications. In light of this, we introduce MazzumaGPT, a large language model that has been optimised to generate smart contract code and aid developers to scaffold development and improve productivity. As part of this research, we outline the optimisation and fine-tuning parameters, evaluate the model's performance on functional correctness and address the limitations and broader impacts of our research.
    摘要 编程一直是21世纪科技创新的核心。随着区块链技术的出现和去中心化应用(web3)范式的普及,智能合约帮助开发者构建运行在去中心化区块链上的应用程序。尽管智能合约备受关注且潜力巨大,开发者仍需跨越相当的知识与技能差距才能构建web3应用。为此,我们介绍了MazzumaGPT,一个经过优化、可生成智能合约代码的大语言模型,帮助开发者搭建开发框架并提高生产力。在这项研究中,我们给出了优化与微调参数,评估了模型在功能正确性上的表现,并讨论了研究的局限性与更广泛的影响。

GSLB: The Graph Structure Learning Benchmark

  • paper_url: http://arxiv.org/abs/2310.05174
  • repo_url: https://github.com/gsl-benchmark/gslb
  • paper_authors: Zhixun Li, Liang Wang, Xin Sun, Yifan Luo, Yanqiao Zhu, Dingshuo Chen, Yingtao Luo, Xiangxin Zhou, Qiang Liu, Shu Wu, Liang Wang, Jeffrey Xu Yu
  • for: 本研究的目的是为Graph Structure Learning (GSL)提供一个系统的分析和评估,以便更好地理解GSL在不同情况下的表现。
  • methods: 本研究使用了20种不同的图 dataset和16种不同的 GSL 算法,并进行了系统的性能分析和比较。
  • results: 研究对最新的GSL算法在节点级和图级任务上进行了全面评估,并分析了其在鲁棒学习与模型复杂度方面的表现,揭示了GSL在多种下游任务上的潜在优势。
    Abstract Graph Structure Learning (GSL) has recently garnered considerable attention due to its ability to optimize both the parameters of Graph Neural Networks (GNNs) and the computation graph structure simultaneously. Despite the proliferation of GSL methods developed in recent years, there is no standard experimental setting or fair comparison for performance evaluation, which creates a great obstacle to understanding the progress in this field. To fill this gap, we systematically analyze the performance of GSL in different scenarios and develop a comprehensive Graph Structure Learning Benchmark (GSLB) curated from 20 diverse graph datasets and 16 distinct GSL algorithms. Specifically, GSLB systematically investigates the characteristics of GSL in terms of three dimensions: effectiveness, robustness, and complexity. We comprehensively evaluate state-of-the-art GSL algorithms in node- and graph-level tasks, and analyze their performance in robust learning and model complexity. Further, to facilitate reproducible research, we have developed an easy-to-use library for training, evaluating, and visualizing different GSL methods. Empirical results of our extensive experiments demonstrate the ability of GSL and reveal its potential benefits on various downstream tasks, offering insights and opportunities for future research. The code of GSLB is available at: https://github.com/GSL-Benchmark/GSLB.
    摘要 近年来,图结构学习(Graph Structure Learning,GSL)因其能够同时优化图神经网络(GNN)的参数和计算图结构而受到广泛关注。然而,近年来涌现的大量GSL方法缺乏统一的实验设置和公平的性能比较,这严重阻碍了人们理解该领域的进展。为填补这一空白,我们系统地分析了GSL在不同场景下的表现,并基于20个多样化的图数据集和16种不同的GSL算法构建了全面的图结构学习基准GSLB。具体而言,GSLB从有效性、鲁棒性和复杂度三个维度系统考察GSL的特性。我们在节点级和图级任务上全面评估了最新的GSL算法,并分析了它们在鲁棒学习和模型复杂度方面的表现。此外,为促进可复现研究,我们开发了一个易用的库,用于训练、评估和可视化不同的GSL方法。大量实验结果展示了GSL的能力及其在各种下游任务上的潜在优势,为未来研究提供了洞见与机会。GSLB的代码可在 https://github.com/GSL-Benchmark/GSLB 获取。

Multi-Ship Tracking by Robust Similarity metric

  • paper_url: http://arxiv.org/abs/2310.05171
  • repo_url: None
  • paper_authors: Hongyu Zhao, Gongming Wei, Yang Xiao, Xianglei Xing
  • for: 推进多船跟踪(MST)技术在海上态势感知与自主船舶导航系统开发中的应用。
  • methods: 在多目标跟踪(MOT)算法中引入同时包围预测框与检测框的最小凸形,以改进相似度度量并提升跟踪性能。
  • results: 将 TIoU 度量集成到 DeepSort、ByteTrack 等最新的目标跟踪框架中,持续提升了这些框架的跟踪性能。
    Abstract Multi-ship tracking (MST) as a core technology has been proven to be applied to situational awareness at sea and the development of a navigational system for autonomous ships. Despite impressive tracking outcomes achieved by multi-object tracking (MOT) algorithms for pedestrian and vehicle datasets, these models and techniques exhibit poor performance when applied to ship datasets. Intersection of Union (IoU) is the most popular metric for computing similarity used in object tracking. The low frame rates and severe image shake caused by wave turbulence in ship datasets often result in minimal, or even zero, Intersection of Union (IoU) between the predicted and detected bounding boxes. This issue contributes to frequent identity switches of tracked objects, undermining the tracking performance. In this paper, we address the weaknesses of IoU by incorporating the smallest convex shapes that enclose both the predicted and detected bounding boxes. The calculation of the tracking version of IoU (TIoU) metric considers not only the size of the overlapping area between the detection bounding box and the prediction box, but also the similarity of their shapes. Through the integration of the TIoU into state-of-the-art object tracking frameworks, such as DeepSort and ByteTrack, we consistently achieve improvements in the tracking performance of these frameworks.
    摘要 多船跟踪(MST)作为一项核心技术,已被证明可应用于海上态势感知以及自主船舶导航系统的开发。尽管多目标跟踪(MOT)算法在行人和车辆数据集上取得了令人瞩目的跟踪效果,这些模型和技术在船舶数据集上的表现却不尽如人意。交并比(IoU)是目标跟踪中最常用的相似度度量。然而,船舶数据集中帧率低、波浪湍流引起画面剧烈抖动,往往导致预测框与检测框之间的IoU极小甚至为零。这一问题造成被跟踪目标的身份频繁切换,损害了跟踪性能。本文通过引入同时包围预测框与检测框的最小凸形来弥补IoU的不足。跟踪版IoU(TIoU)度量的计算不仅考虑检测框与预测框重叠区域的大小,还考虑二者形状的相似性。通过将TIoU集成到DeepSort、ByteTrack等最新的目标跟踪框架中,我们持续地提升了这些框架的跟踪性能。
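The key ingredient of TIoU, measuring the two boxes against the smallest convex shape that encloses both, can be illustrated for axis-aligned boxes, where the smallest enclosing convex shape reduces to the enclosing rectangle (the same construction used by GIoU). The sketch below only shows this enclosing-shape idea; the paper's full TIoU additionally accounts for shape similarity, and its exact formula is not reproduced here.

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def iou_with_enclosing_term(pred, det):
    """pred/det are (x1, y1, x2, y2) boxes; returns (plain IoU, enclosing-shape score)."""
    ix1, iy1 = max(pred[0], det[0]), max(pred[1], det[1])
    ix2, iy2 = min(pred[2], det[2]), min(pred[3], det[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(pred) + box_area(det) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest axis-aligned shape enclosing both boxes.
    ex1, ey1 = min(pred[0], det[0]), min(pred[1], det[1])
    ex2, ey2 = max(pred[2], det[2]), max(pred[3], det[3])
    enclose = box_area((ex1, ey1, ex2, ey2))
    # Penalise the empty space inside the enclosing shape, so the score stays
    # informative even when the raw IoU is zero (the failure mode described above).
    score = iou - (enclose - union) / enclose if enclose > 0 else iou
    return iou, score

if __name__ == "__main__":
    # Two disjoint box pairs: IoU is 0 for both, but the enclosing-shape score still ranks proximity.
    print(iou_with_enclosing_term((0, 0, 10, 10), (12, 0, 22, 10)))
    print(iou_with_enclosing_term((0, 0, 10, 10), (30, 0, 40, 10)))
```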

DeepQTest: Testing Autonomous Driving Systems with Reinforcement Learning and Real-world Weather Data

  • paper_url: http://arxiv.org/abs/2310.05170
  • repo_url: https://github.com/simula-complex/deepqtest
  • paper_authors: Chengjie Lu, Tao Yue, Man Zhang, Shaukat Ali
  • for: 这个论文的目的是提出一种基于强化学习的自动驾驶系统测试方法,以确保自动驾驶系统的安全性。
  • methods: 这种测试方法使用强化学习的深度Q学习算法,以学习环境配置,并采用了三种安全和舒适度量来构建奖励函数。
  • results: 对于三个比较基线,深度Q测试表现出显著更高的效果,能够更好地激发自动驾驶系统的异常行为,并确保测试场景的现实性。
    Abstract Autonomous driving systems (ADSs) are capable of sensing the environment and making driving decisions autonomously. These systems are safety-critical, and testing them is one of the important approaches to ensure their safety. However, due to the inherent complexity of ADSs and the high dimensionality of their operating environment, the number of possible test scenarios for ADSs is infinite. Besides, the operating environment of ADSs is dynamic, continuously evolving, and full of uncertainties, which requires a testing approach adaptive to the environment. In addition, existing ADS testing techniques have limited effectiveness in ensuring the realism of test scenarios, especially the realism of weather conditions and their changes over time. Recently, reinforcement learning (RL) has demonstrated great potential in addressing challenging problems, especially those requiring constant adaptations to dynamic environments. To this end, we present DeepQTest, a novel ADS testing approach that uses RL to learn environment configurations with a high chance of revealing abnormal ADS behaviors. Specifically, DeepQTest employs Deep Q-Learning and adopts three safety and comfort measures to construct the reward functions. To ensure the realism of generated scenarios, DeepQTest defines a set of realistic constraints and introduces real-world weather conditions into the simulated environment. We employed three comparison baselines, i.e., random, greedy, and a state-of-the-art RL-based approach DeepCOllision, for evaluating DeepQTest on an industrial-scale ADS. Evaluation results show that DeepQTest demonstrated significantly better effectiveness in terms of generating scenarios leading to collisions and ensuring scenario realism compared with the baselines. In addition, among the three reward functions implemented in DeepQTest, Time-To-Collision is recommended as the best design according to our study.
    摘要 自动驾驶系统(ADS)具有感知环境和做出自主驾驶决策的能力。这些系统的安全性非常重要,测试是确保其安全的重要方法。然而,由于ADS的内在复杂性和操作环境的高维度,测试场景的数量是无限的。此外,ADS的操作环境是动态不断变化的,充满不确定性,需要能够适应环境的测试方法。同时,现有的ADS测试技术在保证测试场景真实性方面效果有限,特别是天气条件及其随时间的变化。最近,强化学习(RL)在解决需要不断适应动态环境的复杂问题方面表现出了极大的潜力。为此,我们提出了 DeepQTest,一种基于RL学习环境配置、以高概率暴露ADS异常行为的测试方法。具体来说,DeepQTest使用深度Q学习,并采用三种安全和舒适度量来构建奖励函数。为保证生成场景的真实性,DeepQTest定义了一组真实性约束,并将真实世界的天气条件引入模拟环境中。我们采用随机、贪婪以及最新的基于RL的方法DeepCOllision三种比较基线,在一个工业规模的ADS上评估DeepQTest。评估结果表明,与基线相比,DeepQTest在生成导致碰撞的场景和保证场景真实性方面显著更为有效。此外,在DeepQTest实现的三种奖励函数中,根据我们的研究,时间到碰撞(Time-To-Collision)被推荐为最佳设计。
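Because the study recommends the Time-To-Collision (TTC) reward, here is a hedged sketch of how a TTC-based reward term could be shaped from the ego vehicle's gap to the nearest obstacle and the closing speed. The threshold and functional form are assumptions for illustration; DeepQTest's actual reward functions are defined in the paper.

```python
def time_to_collision(distance_m, closing_speed_mps):
    """TTC = distance / closing speed; infinite when the gap is not shrinking."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return distance_m / closing_speed_mps

def ttc_reward(distance_m, closing_speed_mps, ttc_threshold=4.0):
    """Reward an environment configuration for stressing the ADS:
    a lower TTC (more safety-critical scenario) yields a higher reward, capped at 1."""
    ttc = time_to_collision(distance_m, closing_speed_mps)
    if ttc == float("inf"):
        return 0.0
    return min(1.0, ttc_threshold / ttc)

if __name__ == "__main__":
    print(ttc_reward(20.0, 10.0))   # TTC = 2 s  -> reward 1.0 (critical)
    print(ttc_reward(80.0, 10.0))   # TTC = 8 s  -> reward 0.5
    print(ttc_reward(80.0, -2.0))   # gap growing -> reward 0.0
```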

Hieros: Hierarchical Imagination on Structured State Space Sequence World Models

  • paper_url: http://arxiv.org/abs/2310.05167
  • repo_url: https://github.com/snagnar/hieros
  • paper_authors: Paul Mattes, Rainer Schlosser, Ralf Herbrich
  • for: 本研究旨在提高现代深度强化学习(DRL)算法的样本效率。
  • methods: 我们提出了一种层次策略(Hieros),该策略基于S5层学习时间抽象的世界表示,并在潜在空间中、于多个时间尺度上想象轨迹。
  • results: 我们的方法在Atari 100k基准上的平均与中位数归一化人类分数均超过了现有最优水平;所提出的世界模型能够非常准确地预测复杂的动力学;此外,Hieros还展现出更强的探索能力。
    Abstract One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is sample efficiency. Many approaches learn a world model in order to train an agent entirely in imagination, eliminating the need for direct environment interaction during training. However, these methods often suffer from either a lack of imagination accuracy, exploration capabilities, or runtime efficiency. We propose Hieros, a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space. Hieros uses an S5 layer-based world model, which predicts next world states in parallel during training and iteratively during environment interaction. Due to the special properties of S5 layers, our method can train in parallel and predict next world states iteratively during imagination. This allows for more efficient training than RNN-based world models and more efficient imagination than Transformer-based world models. We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark, and that our proposed world model is able to predict complex dynamics very accurately. We also show that Hieros displays superior exploration capabilities compared to existing approaches.
    摘要 现代深度强化学习(DRL)算法面临的最大挑战之一是样本效率。许多方法通过学习世界模型,使智能体完全在想象中训练,从而消除训练过程中与环境直接交互的需要。然而,这些方法往往受限于想象的准确性、探索能力或运行效率。我们提出了 Hieros,一种层次策略,它学习时间抽象的世界表示,并在潜在空间中、于多个时间尺度上想象轨迹。Hieros 使用基于 S5 层的世界模型,该模型在训练时并行地、在环境交互时迭代地预测下一个世界状态。得益于 S5 层的特殊性质,我们的方法可以并行训练,并在想象过程中迭代地预测下一个世界状态。这使得其训练比基于 RNN 的世界模型更高效,想象过程也比基于 Transformer 的世界模型更高效。我们表明,我们的方法在 Atari 100k 基准上的平均与中位数归一化人类分数均超过了现有最优水平,并且所提出的世界模型能够非常准确地预测复杂的动力学。此外,我们还证明 Hieros 在探索方面优于现有方法。

MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05157
  • repo_url: https://github.com/weiyifan1023/MenatQA
  • paper_authors: Yifan Wei, Yisong Su, Huanhuan Ma, Xiaoyan Yu, Fangyu Lei, Yuanzhe Zhang, Jun Zhao, Kang Liu
  • for: This paper aims to evaluate the time comprehension and reasoning abilities of large language models (LLMs) and investigate potential improvement strategies.
  • methods: The paper constructs a benchmark task called Multiple Sensitive Factors Time QA (MenatQA) that tests LLMs’ performance on three temporal factors (scope factor, order factor, counterfactual factor) with a total of 2,853 samples.
  • results: Most LLMs fall behind smaller temporal reasoning models in terms of performance on the MenatQA task, particularly in handling temporal biases and utilizing external information. The paper also explores potential improvement strategies such as devising specific prompts and leveraging external tools.
    Abstract Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks. As a result, it is natural for people to believe that LLMs have also mastered abilities such as time understanding and reasoning. However, research on the temporal sensitivity of LLMs has been insufficiently emphasized. To fill this gap, this paper constructs Multiple Sensitive Factors Time QA (MenatQA), which encompasses three temporal factors (scope factor, order factor, counterfactual factor) with total 2,853 samples for evaluating the time comprehension and reasoning abilities of LLMs. This paper tests current mainstream LLMs with different parameter sizes, ranging from billions to hundreds of billions. The results show most LLMs fall behind smaller temporal reasoning models with different degree on these factors. In specific, LLMs show a significant vulnerability to temporal biases and depend heavily on the temporal information provided in questions. Furthermore, this paper undertakes a preliminary investigation into potential improvement strategies by devising specific prompts and leveraging external tools. These approaches serve as valuable baselines or references for future research endeavors.
    摘要 大语言模型(LLM)在许多自然语言处理(NLP)任务上已表现出接近饱和的性能。因此,人们自然会认为 LLM 也已掌握时间理解与推理等能力。然而,针对 LLM 时间敏感性的研究一直未得到足够重视。为填补这一空白,本文构建了多重敏感因素时间问答基准(MenatQA),涵盖三个时间因素(范围因素、次序因素、反事实因素),共2,853个样本,用于评估 LLM 的时间理解与推理能力。本文测试了参数规模从数十亿到数千亿不等的当前主流 LLM。结果显示,大多数 LLM 在这些因素上不同程度地落后于规模更小的时间推理模型。具体而言,LLM 对时间偏差表现出明显的脆弱性,并且严重依赖问题中提供的时间信息。此外,本文还通过设计特定提示和利用外部工具,对潜在的改进策略进行了初步探索。这些方法可为未来的研究提供有价值的基线或参考。

  • paper_url: http://arxiv.org/abs/2310.05155
  • repo_url: https://github.com/qiancheng0/toolink
  • paper_authors: Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
  • for: The paper aims to develop a comprehensive framework for task-solving using tool-based chain-of-solving (CoS) approach, with the goal of leveraging smaller, open-sourced models for adaptability.
  • methods: The proposed framework, called Toolink, creates a toolkit and integrates planning and calling of tools through a CoS approach. The authors validate the efficacy of Toolink on ChatGPT and curate a CoS dataset (CoS-GPT) for task-solving. They finetune the LLaMA-7B model to create LLaMA-CoS, a powerful open-source model with advanced tool-planning and tool-calling capabilities.
  • results: The evaluation on diverse tasks from BIG-bench shows that LLaMA-CoS matches the CoS ability of ChatGPT while surpassing the chain-of-thought approach in performance. The study also demonstrates the generalization of LLaMA-CoS to unseen tasks and its capability in using toolkits not explicitly tailored for the target task, affirming its robustness in real-world scenarios.
    Abstract Large Language Models (LLMs) have demonstrated remarkable progress in utilizing tools, but their closed-source nature and high inference costs pose limitations on their adaptability, necessitating a valid method that leverages smaller, open-sourced models. In this paper, we introduce Toolink, a comprehensive framework that performs task-solving by first creating a toolkit and then integrating the planning and calling of tools through a chain-of-solving (CoS) approach. We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT. Subsequently, we curate CoS-GPT, a chain-of-solving dataset designed for tool-using, and finetune the LLaMA-7B model. It results in LLaMA-CoS, a powerful open-source model with advanced tool-planning and tool-calling capabilities. Evaluation on diverse tasks from BIG-bench demonstrates its CoS ability matches that of ChatGPT while its performance surpasses the chain-of-thought approach. Further studies highlight the generalization of LLaMA-CoS to unseen tasks and showcase its capability in using toolkits not explicitly tailored for the target task, affirming its robustness in real-world scenarios. All codes and data are released.
    摘要 大语言模型(LLM)在工具使用方面展现了显著的进步,但其闭源特性和高昂的推理成本限制了其适应性,因此需要一种能够利用更小的开源模型的有效方法。在本文中,我们介绍了Toolink,一个完整的框架:它首先构建工具包,再通过链式求解(CoS)方法将工具的规划与调用整合起来,从而完成任务求解。我们首先在ChatGPT上验证了Toolink在发挥模型创造力与CoS能力方面的有效性。随后,我们构建了面向工具使用的链式求解数据集CoS-GPT,并对LLaMA-7B模型进行微调,得到LLaMA-CoS——一个具备先进工具规划与工具调用能力的强大开源模型。在BIG-bench的多样化任务上的评估表明,LLaMA-CoS的CoS能力与ChatGPT相当,且其性能超过了链式思维(chain-of-thought)方法。进一步的研究显示,LLaMA-CoS能够泛化到未见过的任务,并能使用并非专为目标任务定制的工具包,证明了其在真实场景中的鲁棒性。所有代码和数据均已公开发布。

Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge

  • paper_url: http://arxiv.org/abs/2310.05146
  • repo_url: https://github.com/tanchongmin/arc-challenge
  • paper_authors: John Chong Min Tan, Mehul Motani
  • for: 解决Abstraction and Reasoning Corpus(ARC)挑战,使用大型自然语言模型(LLM)作为多个专家系统。
  • methods: 利用LLM可通过零样本、少样本以及基于上下文的提示完成各种新任务的灵活性;首先将输入图像转化为多种合适的基于文本的抽象空间,然后利用LLM的联想能力推导输入-输出关系,并将其映射为以动作形式表示的工作程序(类似于MineCraft中的Voyager / Ghost);此外,还利用迭代式的环境反馈来引导LLM完成任务。
  • results: 所提方法仅使用网格、对象和像素三个抽象空间,即可解决111个训练集问题中的50个(45%);我们认为,加入更多抽象空间和可学习的动作后,将能够解决更多问题。
    Abstract We attempt to solve the Abstraction and Reasoning Corpus (ARC) Challenge using Large Language Models (LLMs) as a system of multiple expert agents. Using the flexibility of LLMs to be prompted to do various novel tasks using zero-shot, few-shot, context-grounded prompting, we explore the feasibility of using LLMs to solve the ARC Challenge. We firstly convert the input image into multiple suitable text-based abstraction spaces. We then utilise the associative power of LLMs to derive the input-output relationship and map this to actions in the form of a working program, similar to Voyager / Ghost in the MineCraft. In addition, we use iterative environmental feedback in order to guide LLMs to solve the task. Our proposed approach achieves 50 solves out of 111 training set problems (45%) with just three abstraction spaces - grid, object and pixel - and we believe that with more abstraction spaces and learnable actions, we will be able to solve more.
    摘要 我们尝试将大语言模型(LLM)用作由多个专家代理组成的系统,来解决抽象与推理语料库(ARC)挑战。借助LLM可以通过零样本、少样本以及基于上下文的提示完成各种新任务的灵活性,我们探索了使用LLM求解ARC挑战的可行性。首先,我们将输入图像转换为多种合适的基于文本的抽象空间;然后,利用LLM的联想能力推导输入-输出关系,并将其映射为以动作形式表示的工作程序,类似于MineCraft中的Voyager / Ghost。此外,我们还利用迭代式的环境反馈来引导LLM完成任务。所提方法仅使用网格、对象和像素三个抽象空间,即可解决111个训练集问题中的50个(45%);我们相信,加入更多抽象空间和可学习的动作后,将能够解决更多问题。
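One of the three abstraction spaces mentioned above (objects) can be illustrated with a small connected-component routine that turns a 2-D colour grid into a text-friendly list of objects an LLM could be prompted with. This is only a plausible sketch of an "object" abstraction, not the authors' exact conversion.

```python
from collections import deque

def grid_to_objects(grid):
    """Extract 4-connected components of non-zero cells as (colour, cells) objects."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] != 0 and not seen[r][c]:
                colour, cells, queue = grid[r][c], [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    cells.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and grid[ny][nx] == colour:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                objects.append({"colour": colour, "cells": cells})
    return objects

if __name__ == "__main__":
    demo = [[0, 1, 1],
            [0, 0, 2],
            [2, 0, 2]]
    for obj in grid_to_objects(demo):
        print(obj)   # a text description an LLM could be prompted with
```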

NeuralFastLAS: Fast Logic-Based Learning from Raw Data

  • paper_url: http://arxiv.org/abs/2310.05145
  • repo_url: None
  • paper_authors: Theo Charalambous, Yaniv Aspis, Alessandra Russo
  • for: 本研究旨在提出一种可扩展和高效的综合方法,即NeuralFastLAS,用于同时训练神经网络和符号学习器。
  • methods: NeuralFastLAS使用一种新的约束优化技术,通过学习一个 posterior distribution 来提高训练稳定性。
  • results: 实验结果表明,NeuralFastLAS可以在算术和逻辑任务中达到最先进(state-of-the-art)的准确率,训练时间比其他联合训练神经网络和符号学习器的方法快多达两个数量级。
    Abstract Symbolic rule learners generate interpretable solutions, however they require the input to be encoded symbolically. Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network. Training the neural and symbolic components jointly is difficult, due to slow and unstable learning, hence many existing systems rely on hand-engineered rules to train the network. We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner. For a given task, NeuralFastLAS computes a relevant set of rules, proved to contain an optimal symbolic solution, trains a neural network using these rules, and finally finds an optimal symbolic solution to the task while taking network predictions into account. A key novelty of our approach is learning a posterior distribution on rules while training the neural network to improve stability during training. We provide theoretical results for a sufficient condition on network training to guarantee correctness of the final solution. Experimental results demonstrate that NeuralFastLAS is able to achieve state-of-the-art accuracy in arithmetic and logical tasks, with a training time that is up to two orders of magnitude faster than other jointly trained neuro-symbolic methods.
    摘要 符号规则学习器能够生成可解释的解,但要求输入以符号形式编码。神经-符号方法通过神经网络将原始数据映射到潜在的符号概念,从而克服了这一问题。然而,由于学习过程缓慢且不稳定,联合训练神经组件与符号组件十分困难,因此许多现有系统依赖手工设计的规则来训练网络。我们提出了NeuralFastLAS,一种可扩展且快速的端到端方法,可将神经网络与符号学习器联合训练。对于给定任务,NeuralFastLAS首先计算一个相关的规则集合(并证明其中包含最优的符号解),利用这些规则训练神经网络,最后在考虑网络预测的前提下找到该任务的最优符号解。我们方法的一个关键创新是在训练神经网络的同时学习规则上的后验分布,以提高训练的稳定性。我们给出了理论结果,给出了保证最终解正确性的网络训练充分条件。实验结果表明,NeuralFastLAS能够在算术和逻辑任务上达到最先进的准确率,训练时间比其他联合训练神经-符号方法快多达两个数量级。

ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05143
  • repo_url: https://github.com/microsoft/personalizedfl
  • paper_authors: Wang Lu, Hao Yu, Jindong Wang, Damien Teney, Haohan Wang, Yiqiang Chen, Qiang Yang, Xing Xie, Xiangyang Ji
  • for: 这篇论文旨在解决个性化 Federated Learning (FL) 中资源有限的问题,包括数据、计算和通信成本,以及访问模型的限制。
  • methods: 该论文提出了一种名为 ZOOPFL 的方法,使用零阶优化学习输入适配而不直接干预基础模型,并使用简单而有效的线性投影对预测结果进行重映射以实现个性化;此外,还提出了输入手术(input surgery),引入带有低维、客户端特定嵌入的自编码器,以降低计算成本并增强个性化。
  • results: 广泛的实验表明,ZOOPFL 能够有效地应用于基于黑盒基础模型的 FL 任务,并提升个性化的效果。
    Abstract When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization. i.e., distribution shifts between clients. To do so, we propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning. ZOOPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We provide theoretical support for ZOOPFL to analyze its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models.
    摘要 当个性化联邦学习(FL)遇到大规模基础模型时,各类资源限制带来了新的挑战。除了数据、计算和通信成本等典型限制外,对模型本身的访问也经常受限。本文旨在同时解决资源受限与个性化(即客户端之间的分布偏移)这两个挑战。为此,我们提出了一种名为ZOOPFL的方法,它使用零阶优化进行个性化联邦学习。ZOOPFL避免直接干预基础模型,而是通过零阶优化学习如何适配其输入。此外,我们使用简单而有效的线性投影对其预测结果进行重映射,以实现个性化。为了降低计算成本并增强个性化,我们提出了输入手术(input surgery),引入带有低维、客户端特定嵌入的自编码器。我们为ZOOPFL提供了理论支持,以分析其收敛性。在计算机视觉和自然语言处理任务上、使用流行基础模型进行的大量实验证明了ZOOPFL在黑盒基础模型上进行联邦学习的有效性。
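The primitive that lets ZOOPFL avoid touching the foundation model's internals is zeroth-order optimization: gradients of a black-box loss are estimated from function evaluations alone. A minimal two-point estimator is sketched below; the toy loss stands in for querying the frozen model, and the step sizes and parameterization are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def zo_gradient(loss_fn, theta, rng, mu=1e-2, n_samples=10):
    """Two-point zeroth-order estimate of the gradient of loss_fn at theta."""
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)
        grad += (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2 * mu) * u
    return grad / n_samples

if __name__ == "__main__":
    # Toy black-box loss standing in for "query the frozen foundation model".
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda x: float(np.sum((x - target) ** 2))
    theta = np.zeros(3)                        # e.g. a small input-adaptation vector
    rng = np.random.default_rng(0)
    for _ in range(200):
        theta -= 0.05 * zo_gradient(loss, theta, rng)
    print(theta)   # approaches `target` without ever computing an analytic gradient
```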

Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements

  • paper_url: http://arxiv.org/abs/2310.05140
  • repo_url: None
  • paper_authors: Yushan Qian, Wei-Nan Zhang, Ting Liu
  • for: 这个论文主要研究了大语言模型(LLMs)在建立和谐社会关系中的应用效果,以及如何使用LLMs提高对话的同理能力。
  • methods: 本文提出了三种改进方法,包括语义相似的上下文内学习(in-context learning)、两阶段交互式生成以及与知识库的结合。
  • results: 广泛的实验表明,LLMs可以在我们提出的方法的帮助下显著提高对话的同理能力,并在自动和人类评价中达到了领先水平。此外,我们还探讨了GPT-4可以模拟人类评价者的可能性。
    Abstract Empathetic dialogue is an indispensable part of building harmonious social relationships and contributes to the development of a helpful AI. Previous approaches are mainly based on fine small-scale language models. With the advent of ChatGPT, the application effect of large language models (LLMs) in this field has attracted great attention. This work empirically investigates the performance of LLMs in generating empathetic responses and proposes three improvement methods of semantically similar in-context learning, two-stage interactive generation, and combination with the knowledge base. Extensive experiments show that LLMs can significantly benefit from our proposed methods and is able to achieve state-of-the-art performance in both automatic and human evaluations. Additionally, we explore the possibility of GPT-4 simulating human evaluators.
    摘要 共情对话是构建和谐社会关系不可或缺的一部分,也有助于打造乐于助人的AI。以往的方法主要基于微调小规模语言模型。随着ChatGPT的出现,大语言模型(LLM)在该领域的应用效果受到了广泛关注。本文实证考察了LLM生成共情回复的表现,并提出了三种改进方法:语义相似的上下文内学习、两阶段交互式生成以及与知识库的结合。大量实验表明,LLM可以从所提方法中显著获益,并能够在自动评估和人工评估中达到最先进的性能。此外,我们还探讨了GPT-4模拟人类评估者的可能性。

Maximizing Utilitarian and Egalitarian Welfare of Fractional Hedonic Games on Tree-like Graphs

  • paper_url: http://arxiv.org/abs/2310.05139
  • repo_url: None
  • paper_authors: Tesshu Hanaka, Airi Ikeyama, Hirotaka Ono
  • for: Fractional hedonic games are coalition formation games where a player’s utility is determined by the average value they assign to the members of their coalition.
  • methods: The paper presents (pseudo)polynomial-time algorithms to compute welfare-maximizing partitions in fractional hedonic games on tree-like graphs, including two types of social welfare measures: utilitarian and egalitarian.
  • results: The paper provides a hardness result, demonstrating that the pseudopolynomial-time solvability is the best possible under the assumption P$\neq$NP.
    Abstract Fractional hedonic games are coalition formation games where a player's utility is determined by the average value they assign to the members of their coalition. These games are a variation of graph hedonic games, which are a class of coalition formation games that can be succinctly represented. Due to their applicability in network clustering and their relationship to graph hedonic games, fractional hedonic games have been extensively studied from various perspectives. However, finding welfare-maximizing partitions in fractional hedonic games is a challenging task due to the nonlinearity of utilities. In fact, it has been proven to be NP-hard and can be solved in polynomial time only for a limited number of graph classes, such as trees. This paper presents (pseudo)polynomial-time algorithms to compute welfare-maximizing partitions in fractional hedonic games on tree-like graphs. We consider two types of social welfare measures: utilitarian and egalitarian. Tree-like graphs refer to graphs with bounded treewidth and block graphs. A hardness result is provided, demonstrating that the pseudopolynomial-time solvability is the best possible under the assumption P$\neq$NP.
    摘要 分数享乐博弈(fractional hedonic games)是一类联盟形成博弈,其中玩家的效用由其对所在联盟成员赋值的平均值决定。这类博弈是图享乐博弈的一种变体,而图享乐博弈是一类可以被简洁表示的联盟形成博弈。由于其在网络聚类中的适用性及其与图享乐博弈的联系,分数享乐博弈已从多个角度得到了广泛研究。然而,由于效用的非线性,在分数享乐博弈中寻找最大化社会福利的划分是一项困难任务。事实上,该问题已被证明是NP难的,只有在树等少数图类上才能在多项式时间内求解。本文给出了在树状图上计算分数享乐博弈中福利最大化划分的(伪)多项式时间算法。我们考虑两类社会福利度量:功利主义(utilitarian)福利与平等主义(egalitarian)福利。树状图指具有有界树宽的图和块图。我们还给出了一个困难性结果,表明在P$\neq$NP的假设下,伪多项式时间可解性已是最好的可能结果。
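The two welfare notions in the abstract are straightforward to state in code: in a fractional hedonic game on a graph, a player's utility is its total value for coalition members divided by the coalition size, utilitarian welfare sums these utilities, and egalitarian welfare takes their minimum. The sketch below only evaluates a given partition; it does not implement the paper's tree-width-based algorithms, and the divide-by-coalition-size convention is one of the standard variants.

```python
def player_utility(player, coalition, value):
    """Fractional hedonic utility: total value for coalition members / coalition size."""
    return sum(value[player][q] for q in coalition if q != player) / len(coalition)

def welfare(partition, value):
    """Return (utilitarian, egalitarian) welfare of a partition into coalitions."""
    utilities = [player_utility(p, c, value) for c in partition for p in c]
    return sum(utilities), min(utilities)

if __name__ == "__main__":
    # Unweighted graph: a triangle {0, 1, 2} plus an isolated vertex 3.
    edges = {(0, 1), (1, 2), (0, 2)}
    value = [[1.0 if (i, j) in edges or (j, i) in edges else 0.0 for j in range(4)]
             for i in range(4)]
    print(welfare([{0, 1, 2}, {3}], value))   # grand triangle: (2.0, 0.0)
    print(welfare([{0, 1}, {2, 3}], value))   # pairing:        (1.0, 0.0)
```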

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

  • paper_url: http://arxiv.org/abs/2310.05136
  • repo_url: https://github.com/jyfenggogo/instructdet
  • paper_authors: Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
  • for: 本文提出了一种数据驱动的对象检测方法(InstructDET),用于基于用户指令(referring expressions,REC)进行对象检测。
  • methods: 本文使用了基于用户指令的数据驱动方法,并利用了新的视觉语言模型(VLM)和大语言模型(LLM)来生成指令和对象 bounding boxes(bbxs)。
  • results: 本文通过使用 InstructDET 方法和自制的 InDET dataset,实现了在标准 REC dataset 和 InDET 测试集上超越现有方法的对象检测性能。
    Abstract We propose InstructDET, a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions. While deriving from referring expressions (REC), the instructions we leverage are greatly diversified to encompass common user intentions related to object detection. For one image, we produce tremendous instructions that refer to every single object and different combinations of multiple objects. Each instruction and its corresponding object bounding boxes (bbxs) constitute one training data pair. In order to encompass common detection expressions, we involve emerging vision-language model (VLM) and large language model (LLM) to generate instructions guided by text prompts and object bbxs, as the generalizations of foundation models are effective to produce human-like expressions (e.g., describing object property, category, and relationship). We name our constructed dataset as InDET. It contains images, bbxs and generalized instructions that are from foundation models. Our InDET is developed from existing REC datasets and object detection datasets, with the expanding potential that any image with object bbxs can be incorporated through using our InstructDET method. By using our InDET dataset, we show that a conventional ROD model surpasses existing methods on standard REC datasets and our InDET test set. Our data-centric method InstructDET, with automatic data expansion by leveraging foundation models, directs a promising field that ROD can be greatly diversified to execute common object detection instructions.
    摘要 我们提出了InstructDET,一种数据驱动的指代目标检测(ROD)方法,它基于用户指令来定位目标对象。我们利用的指令虽然源自指代表达(REC),但经过了大幅多样化,以涵盖与目标检测相关的常见用户意图。对于一张图像,我们生成大量指令,分别指代每一个单独对象以及多个对象的不同组合;每条指令与其对应的对象边界框(bbxs)构成一个训练数据对。为了涵盖通用的检测表达,我们借助新兴的视觉语言模型(VLM)和大语言模型(LLM),在文本提示和对象bbxs的引导下生成指令,因为基础模型的泛化能力足以产生类似人类的表达(例如描述对象属性、类别和关系)。我们将构建的数据集命名为InDET,其中包含图像、bbxs以及来自基础模型的泛化指令。InDET由现有的REC数据集和目标检测数据集发展而来,且具有扩展潜力:任何带有对象bbxs的图像都可以通过我们的InstructDET方法纳入其中。通过使用InDET数据集,我们证明了传统的ROD模型在标准REC数据集和InDET测试集上都超越了现有方法。我们的数据驱动方法InstructDET借助基础模型实现数据的自动扩充,指向了一个有前景的方向:ROD可以被极大地多样化,以执行各种常见的目标检测指令。

Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT

  • paper_url: http://arxiv.org/abs/2310.05135
  • repo_url: None
  • paper_authors: Akshaj Kumar Veldanda, Fabian Grob, Shailja Thakur, Hammond Pearce, Benjamin Tan, Ramesh Karri, Siddharth Garg
  • for: 这个研究探讨了大语言模型(LLMs)在算法招聘中的应用,特别是将简历与职业类别相匹配。
  • methods: 研究使用了场景实验来评估大语言模型对保护属性的偏见(如性别、种族和生育状况)的影响。
  • results: 研究发现,LLMs在不同的种族和性别下表现一致,但在孕期状况和政治倾向上存在偏见。使用了开源的LLMs进行对比输入解码来探讨可能的偏见源。
    Abstract Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks. One domain of interest is their use in algorithmic hiring, specifically in matching resumes with job categories. Yet, this introduces issues of bias on protected attributes like gender, race and maternity status. The seminal work of Bertrand & Mullainathan (2003) set the gold-standard for identifying hiring bias via field experiments where the response rate for identical resumes that differ only in protected attributes, e.g., racially suggestive names such as Emily or Lakisha, is compared. We replicate this experiment on state-of-art LLMs (GPT-3.5, Bard, Claude and Llama) to evaluate bias (or lack thereof) on gender, race, maternity status, pregnancy status, and political affiliation. We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment relevant information. Overall, LLMs are robust across race and gender. They differ in their performance on pregnancy status and political affiliation. We use contrastive input decoding on open-source LLMs to uncover potential sources of bias.
    摘要 GPT-3.5、Bard和Claude等大语言模型(LLM)在众多任务中展现出适用性。其中一个受到关注的领域是它们在算法招聘中的应用,特别是将简历与职业类别进行匹配。然而,这会在性别、种族和生育状况等受保护属性上引入偏见问题。Bertrand & Mullainathan(2003)的开创性工作通过实地实验为识别招聘偏见设立了金标准:比较仅在受保护属性上有所不同(例如Emily或Lakisha等带有种族暗示的姓名)的相同简历所获得的回复率。我们在最先进的LLM(GPT-3.5、Bard、Claude和Llama)上复现了这一实验,以评估它们在性别、种族、生育状况、怀孕状况和政治倾向上的偏见(或其缺失)。我们在两项任务上评估LLM:(1)将简历匹配到职业类别;(2)对简历中与就业相关的信息进行摘要。总体而言,LLM在种族和性别方面表现稳健,但在怀孕状况和政治倾向上存在差异。我们使用对比输入解码(contrastive input decoding)在开源LLM上探查潜在的偏见来源。

ed-cec: improving rare word recognition using asr postprocessing based on error detection and context-aware error correction

  • paper_url: http://arxiv.org/abs/2310.05129
  • repo_url: None
  • paper_authors: Jiajun He, Zekun Yang, Tomoki Toda
  • for: 提高自然语言处理(NLP)任务中罕见词的识别精度,以优化下游任务 such as 关键词检测、意图检测和文本概要生成。
  • methods: 提出了一种基于错误检测和上下文感知纠错的ASR后处理方法,通过仅针对预测出的错误位置优化解码过程,在最大化准确率的同时尽量减少不必要的计算;此外,还利用罕见词列表提供额外的上下文知识,以便更好地纠正罕见词。
  • results: 在五个数据集上实验表明,我们的提议方法可以比前一些方法更好地降低单词错误率(WER),同时保持一定的推理速度,并且在不同的ASR系统上表现出良好的鲁棒性。
    Abstract Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text summarization. To address this challenge, we present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction. Our method optimizes the decoding process by targeting only the predicted error positions, minimizing unnecessary computations. Moreover, we leverage a rare word list to provide additional contextual knowledge, enabling the model to better correct rare words. Experimental results across five datasets demonstrate that our proposed method achieves significantly lower word error rates (WERs) than previous approaches while maintaining a reasonable inference speed. Furthermore, our approach exhibits promising robustness across different ASR systems.
    摘要 自动语音识别(ASR)系统常常难以准确识别罕见词,所产生的错误会对关键词检测、意图识别和文本摘要等下游任务造成负面影响。为应对这一挑战,我们提出了一种新的ASR后处理方法,通过错误检测和上下文感知的纠错来提升罕见词的识别。该方法仅针对预测出的错误位置优化解码过程,从而减少不必要的计算。此外,我们利用罕见词列表提供额外的上下文知识,使模型能够更好地纠正罕见词。在五个数据集上的实验结果表明,所提方法在保持合理推理速度的同时,相比以往方法显著降低了词错误率(WER)。此外,我们的方法在不同的ASR系统上也展现出良好的鲁棒性。
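A simplified illustration of the two ideas above, correcting only at predicted error positions and biasing those positions toward a rare-word list, is sketched below with a plain edit-distance matcher. The error detector is mocked by a list of flagged positions; in the paper it is a learned model, and the actual correction is context-aware rather than pure string matching.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def correct(hypothesis, error_positions, rare_words, max_dist=2):
    """Rewrite only the flagged token positions, and only with a close-enough rare word."""
    tokens = hypothesis.split()
    for pos in error_positions:                 # everything else is left untouched
        dist, best = min((edit_distance(tokens[pos], w), w) for w in rare_words)
        if dist <= max_dist:
            tokens[pos] = best
    return " ".join(tokens)

if __name__ == "__main__":
    hyp = "patient was given amoxicilin after the scan"
    print(correct(hyp, error_positions=[3], rare_words=["amoxicillin", "azithromycin"]))
```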

Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification

  • paper_url: http://arxiv.org/abs/2310.05128
  • repo_url: https://github.com/simonucl/HJCL
  • paper_authors: Simon Chi Lok U, Jie He, Víctor Gutiérrez-Basulto, Jeff Z. Pan
  • for: 这个研究的目的是解决多个标签分类中的多个标签间的关联性问题。
  • methods: 这个研究使用了对生成的标签类别进行对照学习,以将文本和标签嵌入更加接近。
  • results: 在四个多路径层次多标签文本分类数据集上的实验结果显示,HJCL 取得了优异的效果,证明了对比学习在层次多标签文本分类中的有效性。
    Abstract Hierarchical multi-label text classification (HMTC) aims at utilizing a label hierarchy in multi-label classification. Recent approaches to HMTC deal with the problem of imposing an over-constrained premise on the output space by using contrastive learning on generated samples in a semi-supervised manner to bring text and label embeddings closer. However, the generation of samples tends to introduce noise as it ignores the correlation between similar samples in the same batch. One solution to this issue is supervised contrastive learning, but it remains an underexplored topic in HMTC due to its complex structured labels. To overcome this challenge, we propose $\textbf{HJCL}$, a $\textbf{H}$ierarchy-aware $\textbf{J}$oint Supervised $\textbf{C}$ontrastive $\textbf{L}$earning method that bridges the gap between supervised contrastive learning and HMTC. Specifically, we employ both instance-wise and label-wise contrastive learning techniques and carefully construct batches to fulfill the contrastive learning objective. Extensive experiments on four multi-path HMTC datasets demonstrate that HJCL achieves promising results and the effectiveness of Contrastive Learning on HMTC.
    摘要 层次多标签文本分类(HMTC)旨在在多标签分类中利用标签层次结构。近期的HMTC方法为了解决对输出空间施加过强约束前提的问题,采用半监督方式对生成样本进行对比学习,使文本嵌入与标签嵌入彼此靠近。然而,样本生成往往会引入噪声,因为它忽略了同一批次中相似样本之间的相关性。解决该问题的一种思路是有监督对比学习,但由于HMTC的标签结构复杂,这一方向仍未得到充分探索。为克服这一挑战,我们提出了HJCL,一种层次感知的联合有监督对比学习方法,弥合了有监督对比学习与HMTC之间的差距。具体而言,我们同时采用实例级和标签级的对比学习技术,并精心构造批次以满足对比学习目标。在四个多路径HMTC数据集上的大量实验表明,HJCL取得了可观的效果,验证了对比学习在HMTC上的有效性。
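Since HJCL builds on supervised contrastive learning, a minimal PyTorch sketch of the standard supervised contrastive (SupCon) objective is given below for reference. HJCL's hierarchy-aware batch construction, label-wise contrast, and multi-label handling are not reproduced; this is only the single-label, instance-wise baseline the method extends.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """SupCon loss: pull same-label instances together, push the rest apart.
    embeddings: (N, D) float tensor; labels: (N,) long tensor of class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                                   # (N, N) similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))               # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)    # log-softmax over other instances
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()                 # skip anchors with no positives

if __name__ == "__main__":
    emb = torch.randn(8, 16, requires_grad=True)
    lab = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    loss = supervised_contrastive_loss(emb, lab)
    loss.backward()
    print(float(loss))
```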

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

  • paper_url: http://arxiv.org/abs/2310.05126
  • repo_url: https://github.com/lukeforeveryoung/ureader
  • paper_authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang
  • for: 这个研究旨在提出一个universal OCR-free visually-situated language understanding模型,以便在文档、表格、图表、自然图像和网页 screenshot 等多种类型的视觉文本中进行语言理解。
  • methods: 本研究使用 Multimodal Large Language Model (MLLM),并将其训练为可以进行多种类型的视觉文本理解任务,包括文档、表格、图表、自然图像和网页 screenshot 等。此外,研究者还将两个辅助任务添加到模型中,以增强模型的视觉文本和 semantics 理解能力。
  • results: 实验结果显示,在无需下游微调的情况下,该单一模型在涵盖5个领域的10项视觉文本理解任务中的8项上取得了最先进(state-of-the-art)的无OCR性能;同时,借助形状自适应裁剪模块,该模型能够有效处理高分辨率图像。
    Abstract Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM). By leveraging the shallow text recognition ability of the MLLM, we only finetuned 1.2% parameters and the training cost is much lower than previous work following domain-specific pretraining and finetuning paradigms. Concretely, UReader is jointly finetuned on a wide range of Visually-situated Language Understanding tasks via a unified instruction format. To enhance the visual text and semantic understanding, we further apply two auxiliary tasks with the same format, namely text reading and key points generation tasks. We design a shape-adaptive cropping module before the encoder-decoder architecture of MLLM to leverage the frozen low-resolution vision encoder for processing high-resolution images. Without downstream finetuning, our single model achieves state-of-the-art ocr-free performance in 8 out of 10 visually-situated language understanding tasks, across 5 domains: documents, tables, charts, natural images, and webpage screenshots. Codes and instruction-tuning datasets will be released.
    摘要 文本在我们的视觉世界中无处不在,承载着文档、网站和日常照片中的关键信息。在这项工作中,我们提出了UReader,基于多模态大语言模型(MLLM)、无需OCR的通用视觉情境语言理解的首次探索。借助MLLM较浅层的文本识别能力,我们仅微调了1.2%的参数,训练成本远低于此前遵循领域特定预训练加微调范式的工作。具体来说,UReader通过统一的指令格式,在广泛的视觉情境语言理解任务上进行联合微调。为增强对视觉文本和语义的理解,我们还引入了两个采用相同格式的辅助任务,即文本朗读任务和关键点生成任务。我们在MLLM的编码器-解码器结构之前设计了形状自适应裁剪模块,利用冻结的低分辨率视觉编码器来处理高分辨率图像。在无需下游微调的情况下,我们的单一模型在涵盖文档、表格、图表、自然图像和网页截图5个领域的10项视觉情境语言理解任务中的8项上取得了最先进的无OCR性能。代码和指令微调数据集将公开发布。

Distribution-Based Trajectory Clustering

  • paper_url: http://arxiv.org/abs/2310.05123
  • repo_url: https://github.com/IsolationKernel/TIDKC
  • paper_authors: Zi Jing Wang, Ye Zhu, Kai Ming Ting
  • for: 轨迹聚类,用于发现轨迹数据中的共同模式
  • methods: 以隔离分布核(Isolation Distributional Kernel,IDK)为核心工具,实现轨迹相似度度量与聚类
  • results: 与传统距离度量及基于深度学习的距离度量相比,IDK 能够更好地捕捉轨迹中的复杂结构,并带来更有效、更高效且更稳健的聚类性能。
    Abstract Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algorithms have another challenge: either effectiveness issues or high time complexity. In this paper, we propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges. The new IDK-based clustering algorithm, called TIDKC, makes full use of the distributional kernel for trajectory similarity measuring and clustering. TIDKC identifies non-linearly separable clusters with irregular shapes and varied densities in linear time. It does not rely on random initialisation and is robust to outliers. An extensive evaluation on 7 large real-world trajectory datasets confirms that IDK is more effective in capturing complex structures in trajectories than traditional and deep learning-based distance measures. Furthermore, the proposed TIDKC has superior clustering performance and efficiency to existing trajectory clustering algorithms.
    摘要 轨迹聚类能够揭示轨迹数据中的共同模式。现有的轨迹聚类方法依赖两点之间的距离度量来衡量两条轨迹之间的差异。所采用的距离度量面临两个挑战:计算成本高且保真度低。无论采用何种距离度量,现有的聚类算法还面临另一个挑战:要么效果不佳,要么时间复杂度高。在本文中,我们提出使用近期提出的隔离分布核(IDK)作为核心工具,以同时应对上述三个挑战。基于IDK的新聚类算法TIDKC充分利用分布核进行轨迹相似度度量与聚类。TIDKC能够在线性时间内识别形状不规则、密度各异且非线性可分的簇;它不依赖随机初始化,并对离群点具有鲁棒性。在7个大型真实世界轨迹数据集上的广泛评估证实,IDK比传统距离度量和基于深度学习的距离度量更能有效捕捉轨迹中的复杂结构。此外,所提出的TIDKC在聚类性能和效率上均优于现有的轨迹聚类算法。
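The central move of the abstract, comparing whole trajectories as distributions of points rather than point-by-point, can be illustrated with a kernel mean embedding: map each point of a trajectory into a feature space, average, and compare the averages. The sketch below uses random Fourier features of a Gaussian kernel as a stand-in; the Isolation Distributional Kernel used by TIDKC is data-dependent and differs from this choice.

```python
import numpy as np

def rff_map(points, n_features=256, gamma=0.5, seed=0):
    """Random Fourier features approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(points @ W + b)

def trajectory_similarity(traj_a, traj_b, **kw):
    """Inner product of the kernel mean embeddings of two point sets (trajectories)."""
    mean_a = rff_map(np.asarray(traj_a), **kw).mean(axis=0)
    mean_b = rff_map(np.asarray(traj_b), **kw).mean(axis=0)
    return float(mean_a @ mean_b)

if __name__ == "__main__":
    a = np.random.default_rng(1).normal(size=(100, 2))            # a trajectory as a 2-D point set
    b = a + 0.05                                                  # a nearly identical trajectory
    c = np.random.default_rng(2).normal(loc=5.0, size=(100, 2))   # a far-away trajectory
    print(trajectory_similarity(a, b), ">", trajectory_similarity(a, c))
```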

Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection

  • paper_url: http://arxiv.org/abs/2310.05115
  • repo_url: None
  • paper_authors: Nayoung Choi
  • for: 本研究旨在分析BERT中各层中的不同语言知识,以及分离含义的不同方面。
  • methods: 本研究在不更新预训练参数的前提下,对各层的中间输出施加二值掩码(binary mask),以分离词语的语义含义。
  • results: 实验结果表明,利用层级信息是有效的,而进一步分离语义含义还能带来额外的性能提升。
    Abstract Contextual word embeddings obtained from pre-trained language model (PLM) have proven effective for various natural language processing tasks at the word level. However, interpreting the hidden aspects within embeddings, such as syntax and semantics, remains challenging. Disentangled representation learning has emerged as a promising approach, which separates specific aspects into distinct embeddings. Furthermore, different linguistic knowledge is believed to be stored in different layers of PLM. This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers, without updating pre-trained parameters. The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning. Experiments with cased BERT$_{\texttt{base}}$ show that leveraging layer-wise information is effective and disentangling semantic sense further improves performance.
    摘要 从预训练语言模型(PLM)获得的上下文词嵌入已被证明对各类词级自然语言处理任务十分有效。然而,解释嵌入中隐含的句法、语义等方面仍然具有挑战性。解耦表示学习作为一种有前景的方法应运而生,它将特定方面分离到不同的嵌入中。此外,不同的语言学知识被认为存储在PLM的不同层中。本文旨在在不更新预训练参数的前提下,通过对各层中间输出施加二值掩码,从BERT中解耦出语义含义。我们通过二分类任务评估解耦后的嵌入,判断目标词在两个不同句子中是否具有相同含义。在cased BERT$_{\texttt{base}}$上的实验表明,利用层级信息是有效的,而解耦语义含义能进一步提升性能。
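A toy illustration of the paper's core operation, applying a fixed binary mask to a layer's hidden vector for a target word and deciding same-sense versus different-sense from cosine similarity, is given below on synthetic vectors. How the mask is selected per layer, and the use of real BERT hidden states, are not reproduced here.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def same_sense(h1, h2, mask, threshold=0.5):
    """h1, h2: hidden vectors of the target word in two sentences (same layer).
    mask: binary vector selecting the dimensions believed to carry semantic sense."""
    return cosine(h1 * mask, h2 * mask) >= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 768                                            # BERT-base hidden size
    mask = (rng.random(dim) < 0.3).astype(float)         # keep roughly 30% of the dimensions
    base = rng.normal(size=dim)
    h_same = base + 0.1 * rng.normal(size=dim)           # same sense: small perturbation
    h_diff = rng.normal(size=dim)                        # different sense: unrelated vector
    print(same_sense(base, h_same, mask), same_sense(base, h_diff, mask))   # True False
```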

Zero-Shot Detection of Machine-Generated Codes

  • paper_url: http://arxiv.org/abs/2310.05103
  • repo_url: https://github.com/baoguangsheng/fast-detect-gpt
  • paper_authors: Xianjun Yang, Kexun Zhang, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
  • for: 本研究旨在提出一种不需要训练的方法,用于检测 LLMS 生成的代码,以避免这些代码的不当使用而带来的风险。
  • methods: 我们修改了之前的零批文本检测方法 DetectGPT(Mitchell et al., 2023),使用一个代理白盒模型来估算最右侧的字符的概率,以便识别由语言模型生成的代码片断。
  • results: 我们通过对 CodeContest 和 APPS 数据集的 python 代码进行了广泛的实验,并demonstrated 我们的方法可以在 text-davinci-003、GPT-3.5 和 GPT-4 模型上达到领先的检测结果。此外,我们的方法还能够抗 Reynolds 攻击和通用化到 Java 代码。
    Abstract This work proposes a training-free approach for the detection of LLMs-generated codes, mitigating the risks associated with their indiscriminate usage. To the best of our knowledge, our research is the first to investigate zero-shot detection techniques applied to code generated by advanced black-box LLMs like ChatGPT. Firstly, we find that existing training-based or zero-shot text detectors are ineffective in detecting code, likely due to the unique statistical properties found in code structures. We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by language models. Through extensive experiments conducted on the python codes of the CodeContest and APPS dataset, our approach demonstrates its effectiveness by achieving state-of-the-art detection results on text-davinci-003, GPT-3.5, and GPT-4 models. Moreover, our method exhibits robustness against revision attacks and generalizes well to Java codes. We also find that the smaller code language model like PolyCoder-160M performs as a universal code detector, outperforming the billion-scale counterpart. The codes will be available at https://github.com/Xianjun-Yang/Code_detection.git
    摘要 本研究提出了一种无需训练的方法,用于检测 LLM 生成的代码,以降低此类代码被滥用所带来的风险。据我们所知,这是首次研究将零样本检测技术应用于 ChatGPT 等先进黑盒 LLM 所生成的代码。首先,我们发现现有的基于训练的或零样本文本检测器都无法有效检测代码,原因可能在于代码结构特有的统计特性。随后,我们对此前的零样本文本检测方法 DetectGPT(Mitchell et al., 2023)进行了改造,利用代理白盒模型估计最右侧 token 的概率,从而识别由语言模型生成的代码片段。通过在 CodeContest 和 APPS 数据集的 Python 代码上进行大量实验,我们的方法在 text-davinci-003、GPT-3.5 和 GPT-4 模型上取得了最先进的检测效果。此外,该方法能够抵御改写(revision)攻击,并能很好地泛化到 Java 代码。我们还发现,PolyCoder-160M 这样较小的代码语言模型可以作为通用的代码检测器,性能超过了十亿级规模的对应模型。代码将发布于 https://github.com/Xianjun-Yang/Code_detection.git 。
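The detection signal described above comes from token probabilities under a surrogate white-box model. The sketch below shows how per-token log-probabilities of a code snippet can be computed with a Hugging Face causal LM and reduced to a simple average-log-probability score; the paper's actual statistic (a DetectGPT-style comparison focused on the rightmost tokens) is more involved, and the `gpt2` surrogate here is only an example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # stand-in surrogate; a code-oriented causal LM would be a closer match

def avg_logprob(text, model, tokenizer):
    """Average log-probability the surrogate model assigns to each token of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)               # predict token t+1 from prefix
    token_lp = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(MODEL)
    lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()
    snippet = "def add(a, b):\n    return a + b\n"
    print("avg log-prob:", avg_logprob(snippet, lm, tok))
    # Higher (less negative) scores tend to indicate more "model-like" text; in practice
    # a decision threshold would be calibrated on known human- and machine-written code.
```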

Intelligent DRL-Based Adaptive Region of Interest for Delay-sensitive Telemedicine Applications

  • paper_url: http://arxiv.org/abs/2310.05099
  • repo_url: None
  • paper_authors: Abdulrahman Soliman, Amr Mohamed, Elias Yaacoub, Nikhil V. Navkar, Aiman Erbad
  • for: 本研究旨在提高 телемедицина应用的效率和质量,尤其是在 COVID-19 大流行后。
  • methods: 本研究使用 Deep Reinforcement Learning(DRL)模型,智能调整 ROI 大小和非 ROI 质量,以适应网络带宽变化。
  • results: 比较结果表明,DRL 模型可将延迟降低13%,同时将整体画质保持在可接受范围内;这些发现为远程医疗应用带来了有价值的改进。
    Abstract Telemedicine applications have recently received substantial potential and interest, especially after the COVID-19 pandemic. Remote experience will help people get their complex surgery done or transfer knowledge to local surgeons, without the need to travel abroad. Even with breakthrough improvements in internet speeds, the delay in video streaming is still a hurdle in telemedicine applications. This imposes using image compression and region of interest (ROI) techniques to reduce the data size and transmission needs. This paper proposes a Deep Reinforcement Learning (DRL) model that intelligently adapts the ROI size and non-ROI quality depending on the estimated throughput. The delay and structural similarity index measure (SSIM) comparison are used to assess the DRL model. The comparison findings and the practical application reveal that DRL is capable of reducing the delay by 13% and keeping the overall quality in an acceptable range. Since the latency has been significantly reduced, these findings are a valuable enhancement to telemedicine applications.
    摘要 远程医疗应用近来获得了巨大的潜力和关注,尤其是在 COVID-19 大流行之后。远程诊疗经验可以帮助患者完成复杂手术,或将知识传授给当地外科医生,而无需出国。尽管互联网速度已有显著提升,视频流传输的延迟仍是远程医疗应用的一大障碍。这就需要借助图像压缩和感兴趣区域(ROI)技术来减少数据量和传输需求。本文提出了一种深度强化学习(DRL)模型,能够根据估计的吞吐量智能调整 ROI 大小和非 ROI 区域的画质。我们采用延迟和结构相似性指数(SSIM)比较来评估该 DRL 模型。比较结果和实际应用表明,DRL 能够将延迟降低13%,并将整体画质保持在可接受范围内。由于延迟得到了显著降低,这些发现是对远程医疗应用的一项有价值的改进。

How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts

  • paper_url: http://arxiv.org/abs/2310.05095
  • repo_url: None
  • paper_authors: Tharindu Kumarage, Paras Sheth, Raha Moraffah, Joshua Garland, Huan Liu
  • for: 本研究旨在评估高性能探测器的可靠性,以响应AI生成文本的滥用问题。
  • methods: 我们提出了一种新的评估方法,通过一种新型软提示(通用规避提示)引导PLM生成"类人"文本,从而误导检测器。通用规避提示分两步获得:首先,通过提示微调为特定PLM构造规避软提示;然后,利用软提示的可迁移性,将学到的规避软提示迁移到另一个PLM上。
  • results: 我们通过多种PLM在不同写作任务中进行了广泛的实验,并评估了逃脱软提示的效果。结果表明,逃脱软提示能够成功地诱导探测器做出错误判断,并且可以在不同的PLM和写作任务中实现高度的可重复性和稳定性。
    Abstract In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing "human-like" text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.
    摘要 近年来,AI生成文本迅速扩散,这主要得益于强大的预训练语言模型(PLM)的发布。为了解决AI生成文本被滥用的问题,人们开发了多种高性能检测器,包括OpenAI检测器和斯坦福DetectGPT。在本研究中,我们追问这些检测器究竟有多可靠。为回答这一问题,我们设计了一种新方法,能够促使任何PLM生成可规避这些高性能检测器的文本。该方法提出了一种通用规避提示——一种新型软提示,它引导PLM生成"类人"文本,从而误导检测器。这种通用规避提示分两步获得:首先,通过提示微调为特定PLM定制规避软提示;然后,利用软提示的可迁移性,将学到的规避软提示从一个PLM迁移到另一个PLM。我们让多种PLM完成不同的写作任务,开展了大量实验,以评估规避软提示规避最先进检测器的有效性。

Learning Generalizable Agents via Saliency-Guided Features Decorrelation

  • paper_url: http://arxiv.org/abs/2310.05086
  • repo_url: None
  • paper_authors: Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi Chang, Lichao Sun, Bo Yang
  • for: 使视觉强化学习(Reinforcement Learning,RL)中的智能体能够泛化应对训练中未见过的环境变化。
  • methods: 我们提出了 Saliency-Guided Features Decorrelation(SGFD),它包括两个核心技术:Random Fourier Functions(RFF)和 Saliency Map。RFF 用于估计高维图像中复杂的非线性相关性,而 Saliency Map 则用于识别发生变化的特征。SGFD 通过样本重新加权,最小化与变化特征相关的估计相关性,从而实现特征去相关。
  • results: 实验结果显示,SGFD 可以在广泛的测试环境中实现良好的泛化,并在处理任务无关变化和任务相关变化方面均有明显改善。
    Abstract In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.
    摘要 在基于视觉的强化学习(RL)中,智能体经常难以泛化到训练中未观察到的状态空间环境变化。这些变化既可能来自与任务无关的特征(如背景噪声),也可能来自与最优决策相关的任务相关特征(如机器人配置)。为了在这两种情况下实现泛化,智能体需要准确理解变化特征对决策的影响,即在策略模型中建立变化特征与决策之间的真实关联。然而,由于状态空间中特征之间的内在相关性,特征与决策之间的关联变得相互纠缠,使策略难以区分它们。为此,我们提出了显著性引导特征去相关(SGFD),通过样本重新加权来消除这些相关性。具体而言,SGFD 包含两种核心技术:Random Fourier Functions(RFF)和显著性图。RFF 用于估计高维图像中复杂的非线性相关性,而显著性图用于识别发生变化的特征。在显著性图的引导下,SGFD 通过样本重新加权最小化与变化特征相关的估计相关性,从而在视觉 RL 任务中实现去相关。实验结果表明,SGFD 可以在各种测试环境上良好地泛化,并在处理任务无关变化和任务相关变化方面显著优于当前最先进的方法。
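
The following sketch illustrates the reweighting idea under simplifying assumptions: features are precomputed vectors, the saliency map is reduced to a boolean mask over feature dimensions, and sample weights are optimized to shrink the weighted cross-covariance between Random Fourier Feature mappings of the changed and unchanged features.

```python
# Illustrative sketch of saliency-guided decorrelation by sample reweighting (assumptions:
# precomputed feature vectors, a boolean "changed" mask standing in for the saliency map,
# arbitrary RFF dimension and optimizer settings).

import torch

torch.manual_seed(0)
N, D = 512, 16
feats = torch.randn(N, D)
feats[:, 0] = feats[:, 1] * 0.8 + 0.2 * torch.randn(N)          # inject a spurious correlation
changed = torch.zeros(D, dtype=torch.bool); changed[0] = True    # saliency says dim 0 changed

def rff(x, n_feat=64, sigma=1.0):
    g = torch.Generator().manual_seed(1)
    W = torch.randn(x.shape[1], n_feat, generator=g) / sigma
    b = torch.rand(n_feat, generator=g) * 2 * torch.pi
    return torch.cos(x @ W + b) * (2.0 / n_feat) ** 0.5

phi_c, phi_r = rff(feats[:, changed]), rff(feats[:, ~changed])

logits = torch.zeros(N, requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.05)
for _ in range(300):
    w = torch.softmax(logits, dim=0) * N                          # nonnegative weights, mean ~ 1
    mc = (w[:, None] * phi_c).mean(0); mr = (w[:, None] * phi_r).mean(0)
    cov = ((w[:, None] * (phi_c - mc)).T @ (phi_r - mr)) / N      # weighted cross-covariance
    loss = (cov ** 2).sum()                                       # decorrelation objective
    opt.zero_grad(); loss.backward(); opt.step()

print("residual correlation:", loss.item())
```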

FLatS: Principled Out-of-Distribution Detection with Feature-Based Likelihood Ratio Score

  • paper_url: http://arxiv.org/abs/2310.05083
  • repo_url: https://github.com/linhaowei1/flats
  • paper_authors: Haowei Lin, Yuntian Gu
  • for: 本文旨在提出一种理论支持的外围样本检测方法,用于帮助NLPT模型在实际应用中更好地识别外围样本。
  • methods: 本文提出的方法基于似然比(likelihood ratio)的思想,通过比较分布外 $\mathcal P_{\textit{out}}$ 与分布内 $\mathcal P_{\textit{in}}$,来衡量测试样本 $\boldsymbol{x}$ 的“OOD 程度”。而现有的 SOTA 方法(如 Maha 和 KNN)只估计分布内密度 $p_{\textit{in}}(\boldsymbol{x})$,因此是次优的。
  • results: 实验表明,提出的 FLatS 方法可以在流行的 benchmark 上建立新的 SOTA。此外,FLatS 还可以通过引入分布外密度 $p_{\textit{out}}(\boldsymbol{x})$ 的估计来增强其他 OOD 检测方法。
    Abstract Detecting out-of-distribution (OOD) instances is crucial for NLP models in practical applications. Although numerous OOD detection methods exist, most of them are empirical. Backed by theoretical analysis, this paper advocates for the measurement of the "OOD-ness" of a test case $\boldsymbol{x}$ through the likelihood ratio between out-distribution $\mathcal P_{\textit{out}}$ and in-distribution $\mathcal P_{\textit{in}}$. We argue that the state-of-the-art (SOTA) feature-based OOD detection methods, such as Maha and KNN, are suboptimal since they only estimate in-distribution density $p_{\textit{in}}(\boldsymbol{x})$. To address this issue, we propose FLatS, a principled solution for OOD detection based on likelihood ratio. Moreover, we demonstrate that FLatS can serve as a general framework capable of enhancing other OOD detection methods by incorporating out-distribution density $p_{\textit{out}}(\boldsymbol{x})$ estimation. Experiments show that FLatS establishes a new SOTA on popular benchmarks. Our code is publicly available at https://github.com/linhaowei1/FLatS.
    摘要 检测分布外(OOD)实例是 NLP 模型在实际应用中的关键。虽然已有许多 OOD 检测方法,但大多数都是经验性的。本文通过理论分析,主张用分布外 $\mathcal P_{\textit{out}}$ 与分布内 $\mathcal P_{\textit{in}}$ 之间的似然比来衡量测试样本 $\boldsymbol{x}$ 的“OOD 程度”。我们认为,现有的基于特征的 SOTA OOD 检测方法(如 Maha 和 KNN)是次优的,因为它们只估计分布内密度 $p_{\textit{in}}(\boldsymbol{x})$。为解决这一问题,我们提出了 FLatS,一种基于似然比的、有理论依据的 OOD 检测方法。此外,我们还证明 FLatS 可以作为一个通用框架,通过引入分布外密度 $p_{\textit{out}}(\boldsymbol{x})$ 的估计来增强其他 OOD 检测方法。实验表明,FLatS 在流行的 benchmark 上建立了新的 SOTA。我们的代码已公开于 https://github.com/linhaowei1/FLatS。
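
A minimal sketch of the feature-based likelihood-ratio score is given below; it approximates both densities with a k-NN estimate in feature space, which is one plausible instantiation rather than the paper's exact estimator.

```python
# Minimal sketch of a feature-based likelihood-ratio OOD score: log p_out(x) - log p_in(x),
# with both densities approximated here by k-NN density proxies in feature space.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_log_density(query, bank, k=10):
    """Unnormalized log-density proxy: negative log distance to the k-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k).fit(bank)
    dist, _ = nn.kneighbors(query)
    return -np.log(dist[:, -1] + 1e-12)

def flats_style_score(query_feats, in_feats, out_feats, k=10):
    """Higher score = more OOD."""
    return knn_log_density(query_feats, out_feats, k) - knn_log_density(query_feats, in_feats, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    in_feats = rng.normal(0.0, 1.0, size=(2000, 32))    # encoder features of in-distribution data
    out_feats = rng.normal(3.0, 1.0, size=(2000, 32))   # features of a generic corpus (proxy for P_out)
    id_query = rng.normal(0.0, 1.0, size=(5, 32))
    ood_query = rng.normal(3.0, 1.0, size=(5, 32))
    print("ID scores :", flats_style_score(id_query, in_feats, out_feats).round(2))
    print("OOD scores:", flats_style_score(ood_query, in_feats, out_feats).round(2))
```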

  • paper_url: http://arxiv.org/abs/2310.18324
  • repo_url: None
  • paper_authors: Ana L. C. Bazzan, Anderson R. Tavares, André G. Pereira, Cláudio R. Jung, Jacob Scharcanski, Joel Luis Carbonera, Luís C. Lamb, Mariana Recamonde-Mendoza, Thiago L. T. da Silveira, Viviane Moreira
  • for: The paper is written to provide an overview of the ever-evolving landscape of Artificial Intelligence (AI) and its applications in various sectors of the economy, impacting society and humanity.
  • methods: The paper analyzes the risks that come with rapid technological progress and future trends in AI, as well as the potential for AI to become a general-purpose technology like electricity.
  • results: The paper explores the transformative impact of AI on society, with the potential to revolutionize sectors of the economy and impact humanity in the same way that electricity did in the 19th and 20th centuries.
    Abstract The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that are redefining sectors of the economy, impacting society and humanity. We analyze the risks that may come along with rapid technological progress and future trends in AI, an area that is on the path to becoming a general-purpose technology, just like electricity, which revolutionized society in the 19th and 20th centuries. A provocativa comparação entre IA e eletricidade, feita pelo cientista da computação e empreendedor Andrew Ng, resume a profunda transformação que os recentes avanços em Inteligência Artificial (IA) têm desencadeado no mundo. Este capítulo apresenta uma visão geral pela paisagem em constante evolução da IA. Sem pretensões de exaurir o assunto, exploramos as aplicações que estão redefinindo setores da economia, impactando a sociedade e a humanidade. Analisamos os riscos que acompanham o rápido progresso tecnológico e as tendências futuras da IA, área que trilha o caminho para se tornar uma tecnologia de propósito geral, assim como a eletricidade, que revolucionou a sociedade dos séculos XIX e XX.
    摘要 计算机科学家、企业家安德鲁·吴(Andrew Ng)提出的人工智能与电力的类比,概括了近期人工智能(AI)进展在世界范围内引发的深刻变革。本章(以葡萄牙语撰写)概述了人工智能领域不断演变的图景,并不试图穷尽这一主题。我们探讨了正在重塑经济各部门、影响社会和人类的人工智能应用,并分析了随着技术快速进步可能出现的风险以及人工智能的未来趋势。人工智能正走在成为通用技术的道路上,正如电力在 19 世纪和 20 世纪所引发的社会变革一样。

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

  • paper_url: http://arxiv.org/abs/2310.05074
  • repo_url: https://github.com/hccngu/dialcot
  • paper_authors: Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang
  • for: 提高小语言模型(SLM)的逻辑能力
  • methods: 对 reasoning 任务进行对话指导,并使用 proximal policy optimization(PPO)算法优化逻辑路径选择
  • results: 在四个算术逻辑 dataset 上实现了显著性能提升,比前一代竞争者更好
    Abstract Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions, significantly reducing the task difficulty and making it more suitable for SLMs. Secondly, we optimize the model's reasoning path selection through the PPO algorithm. We conduct comprehensive experiments on four arithmetic reasoning datasets, demonstrating that our method achieves significant performance improvements compared to state-of-the-art competitors.
    摘要 链式思维(CoT)提示已被证明能有效提升参数量至少达千亿级的大语言模型(LLM)的推理能力,但将其应用于参数量不足百亿的小语言模型(SLM)的推理任务时,效果甚微甚至有害。为了解决这一局限,我们提出了对话引导链式思维(DialCoT),它使用对话格式生成中间推理步骤,引导模型得到最终答案。此外,我们使用近端策略优化(PPO)算法优化模型的推理路径选择,进一步提升其推理能力。与以往方法相比,我们的方法有以下优势:首先,我们将求解复杂推理问题的过程分解为一系列更简单的子问题,显著降低任务难度,使其更适合 SLM;其次,我们通过 PPO 算法优化模型的推理路径选择。我们在四个算术推理数据集上进行了全面的实验,结果表明我们的方法相比当前最先进的竞争方法取得了显著的性能提升。
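
The dialogue-guided decomposition can be sketched as the loop below; `slm_generate` is a placeholder for a small language model call, and the PPO-based reasoning-path selection described in the paper is not reproduced.

```python
# Sketch of a dialogue-guided decomposition loop (illustrative). The actual prompts, stop
# conditions, and the PPO-based path optimization from the paper are not reproduced here.

def slm_generate(prompt: str) -> str:
    raise NotImplementedError("plug in a small language model here")

def dialcot_answer(question: str, max_turns: int = 6) -> str:
    dialogue = [f"Question: {question}"]
    for _ in range(max_turns):
        sub_q = slm_generate("\n".join(dialogue) +
                             "\nWhat is the next simpler sub-question to solve? "
                             "Reply 'DONE' if the question can now be answered.")
        if sub_q.strip().upper().startswith("DONE"):
            break
        sub_a = slm_generate("\n".join(dialogue) + f"\nSub-question: {sub_q}\nAnswer briefly:")
        dialogue += [f"Sub-question: {sub_q}", f"Sub-answer: {sub_a}"]
    return slm_generate("\n".join(dialogue) + "\nTherefore, the final answer is:")
```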

Video-CSR: Complex Video Digest Creation for Visual-Language Models

  • paper_url: http://arxiv.org/abs/2310.05060
  • repo_url: None
  • paper_authors: Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang
  • for: 这个论文是用来评估视频语言模型的captioning、摘要和检索能力的新任务和人工标注数据集。
  • methods: 该数据集包含4800个YouTube视频clip,每个clip长度在20-60秒之间,覆盖了各种主题和兴趣。每个视频clip都有5个独立的标注caption(1句)和摘要(3-10句)。
  • results: 给任意选择的视频和其相应的ASR信息,我们评估视频语言模型在caption和摘要生成任务中,以及基于caption和摘要的检索任务中的表现。此外,我们还评估了不同现有评价指标与人类偏好的一致性,并提出了一个基线模型,以便作为Video-CSR任务的参考点。
    Abstract We present a novel task and human annotated dataset for evaluating the ability for visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval). The dataset contains 4.8K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests. Each video clip corresponds to 5 independently annotated captions (1 sentence) and summaries (3-10 sentences). Given any video selected from the dataset and its corresponding ASR information, we evaluate visual-language models on either caption or summary generation that is grounded in both the visual and auditory content of the video. Additionally, models are also evaluated on caption- and summary-based retrieval tasks, where the summary-based retrieval task requires the identification of a target video given excerpts of a corresponding summary. Given the novel nature of the paragraph-length video summarization task, we perform extensive comparative analyses of different existing evaluation metrics and their alignment with human preferences. Finally, we propose a foundation model with competitive generation and retrieval capabilities that serves as a baseline for the Video-CSR task. We aim for Video-CSR to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.
    摘要 我们介绍了一个新的任务和人工标注数据集,用于评估视觉语言模型对真实视频片段的captioning、摘要和检索能力,我们称之为Video-CSR(captioning、摘要和检索)。该数据集包含4800个YouTube视频片段,每个片段长度在20-60秒之间,覆盖了广泛的主题和兴趣。每个视频片段对应着5个独立标注的caption(1句)和摘要(3-10句)。给任意选择的视频和其相应的ASR信息,我们评估视觉语言模型在caption或摘要生成任务中的能力,这些任务都基于视频的视觉和听觉内容。此外,我们还评估模型在基于caption和摘要的检索任务中的能力,其中基于摘要的检索任务需要根据摘要的片段来识别目标视频。由于段落长度的视频摘要任务具有新颖性,我们对不同评估指标及其与人类偏好的一致性进行了详细的比较分析。最后,我们提出了一个具有竞争力的生成和检索能力的基础模型,作为Video-CSR任务的基线。我们希望Video-CSR能够在大型语言模型和复杂多模态任务时代发挥作用。

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

  • paper_url: http://arxiv.org/abs/2310.05058
  • repo_url: None
  • paper_authors: Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen
  • for: 本文提出了一种基于两个观察点的新方法,用于 speaker adaptation lip reading,目的是提高 lip reading 的精度和稳定性。
  • methods: 本文使用了 shallow 和 deep 层,将 speaker 的特征分别处理为 two different targets,以便自动学习 separable hidden unit contributions。在 shallow 层中,引入 speaker-adaptive features 来增强 speech content 相关的特征;在 deep 层中,引入 speaker-adaptive features 来抑制 speech content 不相关的噪音。
  • results: 本文的方法在不同设置下进行了广泛的分析和比较,并 consistently 超过了现有方法的性能。此外,本文还发布了一个新的测试集 CAS-VSR-S68h,以进一步评估在只有几个 speaker 的情况下,但涵盖了大量和多样化的 speech content 的情况下的性能。
    Abstract In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by his/her few facial images or even a single image with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to represent accurately. Therefore, we treat the shallow and deep layers differently for speaker adaptive lip reading. Secondly, we observe that a speaker's unique characteristics ( e.g. prominent oral cavity and mandible) have varied effects on lip reading performance for different words and pronunciations, necessitating adaptive enhancement or suppression of the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to automatically learn separable hidden unit contributions with different targets for shallow layers and deep layers respectively. For shallow layers where features related to the speaker's characteristics are stronger than the speech content related features, we introduce speaker-adaptive features to learn for enhancing the speech content features. For deep layers where both the speaker's features and the speech content features are all expressed well, we introduce the speaker-adaptive features to learn for suppressing the speech content irrelevant noise for robust lip reading. Our approach consistently outperforms existing methods, as confirmed by comprehensive analysis and comparison across different settings. Besides the evaluation on the popular LRW-ID and GRID datasets, we also release a new dataset for evaluation, CAS-VSR-S68h, to further assess the performance in an extreme setting where just a few speakers are available but the speech content covers a large and diversified range.
    摘要 在这篇论文中,我们基于两个观察提出了一种新的说话人自适应唇读方法。首先,一个说话人自身的特征通常只需少量脸部图像、甚至单张图像配合浅层网络就能很好地刻画,而说话脸上与语音内容相关的细粒度动态特征则需要深层时序网络才能准确表示。因此,我们对浅层和深层网络采取不同的处理方式。其次,我们发现说话人的独特特征(例如突出的口腔和下颌)对不同词语和发音的唇读性能有不同影响,需要对这些特征进行自适应的增强或抑制,以实现鲁棒的唇读。基于这两个观察,我们提出利用说话人自身的特征,分别针对浅层和深层自动学习可分离的隐藏单元贡献。在浅层,与说话人特征相关的特征强于与语音内容相关的特征,我们引入说话人自适应特征来增强语音内容相关的特征;在深层,说话人特征和语音内容特征均得到较好表达,我们引入说话人自适应特征来抑制与语音内容无关的噪声,以实现鲁棒的唇读。经过不同设置下的全面分析和比较,我们的方法始终优于现有方法。除了在流行的 LRW-ID 和 GRID 数据集上进行评估外,我们还发布了一个新的评估数据集 CAS-VSR-S68h,用于进一步评估在说话人数量很少但语音内容覆盖广泛且多样的极端场景下的性能。
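
One plausible form of the speaker-adaptive "separable hidden unit contributions" is a speaker-conditioned channel gate, sketched below; the paper's exact architecture may differ.

```python
# Illustrative sketch of speaker-conditioned per-channel gating (assumed form). A speaker
# embedding predicts channel-wise scales that can enhance speech-content features in shallow
# layers and suppress speaker-specific noise in deep layers.

import torch
import torch.nn as nn

class SpeakerGate(nn.Module):
    def __init__(self, spk_dim: int, channels: int):
        super().__init__()
        self.fc = nn.Linear(spk_dim, channels)

    def forward(self, feat: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, T), spk_emb: (B, spk_dim); gate in (0, 2) so it can enhance or suppress
        gate = 2.0 * torch.sigmoid(self.fc(spk_emb)).unsqueeze(-1)
        return feat * gate

if __name__ == "__main__":
    shallow_gate = SpeakerGate(spk_dim=128, channels=64)   # applied after shallow conv layers
    deep_gate = SpeakerGate(spk_dim=128, channels=512)     # applied after deep temporal layers
    feat = torch.randn(2, 64, 29)
    spk = torch.randn(2, 128)
    print(shallow_gate(feat, spk).shape)                    # torch.Size([2, 64, 29])
```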

FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

  • paper_url: http://arxiv.org/abs/2310.05053
  • repo_url: None
  • paper_authors: Lang Feng, Dong Xing, Junru Zhang, Gang Pan
  • for: 提高多代理人PPO算法的合作多代理人学习(MARL)理论保证性。
  • methods: 基于全管道思想,实现多平行优化管道,通过不同的等价分解方法表示代理人之间的连接。
  • results: FP3O算法在多智能体MuJoCo和StarCraftII任务上表现出色,超过了其他强基线,并在不同的参数共享配置下展现了强大的通用性。
    Abstract Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach is achieved upon the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure successfully formulates the interconnections among agents in a more general manner, i.e., the interconnections among pipelines, making it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) by several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraftII tasks demonstrate that FP3O outperforms other strong baselines and exhibits remarkable versatility across various parameter-sharing configurations.
    摘要 在将 PPO 的理论保证扩展到合作多智能体强化学习(MARL)时,现有的多智能体 PPO 算法缺乏对不同类型参数共享的兼容性。在这篇文章中,我们提出了一种新颖且通用的合作 MARL 多智能体 PPO 算法,以克服这一限制。我们的方法基于所提出的全管道(full-pipeline)范式,通过利用优势函数的多种等价分解,建立多条并行的优化管道。这一过程以更一般的方式刻画了智能体之间的相互关联,即管道之间的相互关联,使其与多种参数共享方式兼容。我们为策略提升提供了坚实的理论基础,并通过若干近似进一步开发了一种实用算法 FP3O。在 Multi-Agent MuJoCo 和 StarCraftII 任务上的实验评估表明,FP3O 的表现超越了其他强大的基线,并在不同的参数共享配置下展现出显著的通用性。

Learning Intra- and Inter-Cell Differences for Accurate Battery Lifespan Prediction across Diverse Conditions

  • paper_url: http://arxiv.org/abs/2310.05052
  • repo_url: None
  • paper_authors: Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian
  • for: 预测电池寿命的研究具有实际应用价值,尤其是在电池研发中。现有的数据驱动模型大多依靠特定电池的早期电学信号来预测它的寿命。然而,这些模型受限于特定老化条件,这不仅限制了它们的模型能力,还使其在不同条件下预测老化的效果减退。因此,这些模型经常错过了可以从其他条件下的历史数据中获得的全部益处。
  • methods: 我们引入了一种方法,可以考虑target电池和参照电池之间的差异,无论它们的材料和腐食条件如何。通过这种差异,我们不仅扩大了特征空间,而且开辟了一个通用的电池寿命预测框架。我们的模型结合了inter-和intra-cell差异,在多种条件下表现出了极高的效率和准确率,使用了所有可用的数据集。
  • results: 我们的方法可以充分利用older电池的数据,使 newer电池可以借鉴过去的电池的经验。这种方法不仅拓宽了电池数据利用策略,还为未来的电池管理系统提供了智能化的基础。
    Abstract Battery life prediction holds significant practical value for battery research and development. Currently, many data-driven models rely on early electrical signals from specific target batteries to predict their lifespan. A common shortfall is that most existing methods are developed based on specific aging conditions, which not only limits their model's capability but also diminishes their effectiveness in predicting degradation under varied conditions. As a result, these models often miss out on fully benefiting from the rich historical data available under other conditions. Here, to address above, we introduce an approach that explicitly captures differences between electrical signals of a target battery and a reference battery, irrespective of their materials and aging conditions, to forecast the target battery life. Through this inter-cell difference, we not only enhance the feature space but also pave the way for a universal battery life prediction framework. Remarkably, our model that combines the inter- and intra-cell differences shines across diverse conditions, standing out in its efficiency and accuracy using all accessible datasets. An essential application of our approach is its capability to leverage data from older batteries effectively, enabling newer batteries to capitalize on insights gained from past batteries. This work not only enriches the battery data utilization strategy but also sets the stage for smarter battery management system in the future.
    摘要 预测电池寿命具有重要的实践价值,对电池研发具有重要的意义。目前,许多数据驱动模型依靠特定目标电池早期的电学信号来预测它们的寿命。然而,大多数现有方法都是基于特定腐蚀条件下开发的,这不仅限制了他们的模型能力,而且降低了它们在不同条件下预测腐蚀的效果。因此,这些模型经常会错过利用可用的历史数据来预测腐蚀情况。在这里,我们引入了一种方法,可以跨电池和参照电池之间的差异来预测目标电池寿命。通过这种差异,我们不仅扩大了特征空间,而且开创了一个通用的电池寿命预测框架。另外,我们的模型结合了差异和内部差异,在多种条件下表现出色,高效精准地使用所有可用的数据集。这种方法不仅可以有效地利用较老的电池数据,还可以为未来的电池管理系统提供智能化的基础。
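
The inter-cell-difference idea can be sketched as follows with synthetic data: early-cycle features of each target cell are expressed relative to a reference cell (here a fleet average, which is an assumption), and a standard regressor maps the combined intra- and inter-cell features to lifespan.

```python
# Sketch of inter-cell difference features for lifespan regression (illustrative; data
# shapes, the choice of reference cell, and the regressor are all assumptions).

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_cells, n_feat = 200, 12
early_feats = rng.normal(size=(n_cells, n_feat))             # early-cycle electrical features
lifespan = 800 + 60 * early_feats[:, 0] - 40 * early_feats[:, 1] + rng.normal(0, 10, n_cells)

reference = early_feats.mean(axis=0)                          # a reference cell / fleet average
diff_feats = early_feats - reference                          # inter-cell difference features
X = np.hstack([early_feats, diff_feats])                      # intra-cell + inter-cell view

model = GradientBoostingRegressor().fit(X[:150], lifespan[:150])
pred = model.predict(X[150:])
print("MAE (cycles):", np.abs(pred - lifespan[150:]).mean().round(1))
```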

From Text to Tactic: Evaluating LLMs Playing the Game of Avalon

  • paper_url: http://arxiv.org/abs/2310.05036
  • repo_url: https://github.com/jonathanmli/avalon-llm
  • paper_authors: Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu
  • for: 这篇论文探讨了大语言模型(LLM)代理人在策略性社交推理游戏《抵抗组织:阿瓦隆》(Resistance: Avalon)中的潜力。
  • methods: 作者们使用了一个名为AvalonBench的游戏环境,以评估多代理LLM模型。这个环境包括Avalon游戏环境、基于规则的bot对手和ReAct风格的LLM代理人。
  • results: 作者们的评估结果显示,使用AvalonBench评估LLM模型时存在明显的能力差距。例如,使用ChatGPT扮演善良角色时,与基于规则的bot对手扮演邪恶角色的情况下,win rate为22.2%,而使用基于规则的bot扮演善良角色时,win rate为38.2%。
    Abstract In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compelling test-bed to study the decision-making and language-processing capabilities of LLM Agents. To facilitate research in this line, we introduce AvalonBench - a comprehensive game environment tailored for evaluating multi-agent LLM Agents. This benchmark incorporates: (1) a game environment for Avalon, (2) rule-based bots as baseline opponents, and (3) ReAct-style LLM agents with tailored prompts for each role. Notably, our evaluations based on AvalonBench highlight a clear capability gap. For instance, models like ChatGPT playing good-role got a win rate of 22.2% against rule-based bots playing evil, while good-role bot achieves 38.2% win rate in the same setting. We envision AvalonBench could be a good test-bed for developing more advanced LLMs (with self-playing) and agent frameworks that can effectively model the layered complexities of such game environments.
    摘要 在这篇论文中,我们探讨了大语言模型代理人(LLM)在游戏《巨大叛逆:阿瓦隆》中的潜力。游戏中的玩家不仅需要根据不断发展的游戏阶段进行了解的决策,还需要与其他玩家进行交流,包括谎言、推理和谈判。这些特点使得阿瓦隆成为了研究LLM代理人决策和语言处理能力的有力的测试场景。为了促进这种研究,我们介绍了阿瓦隆Bench,一个包含以下三个重要组成部分的游戏环境:(1)阿瓦隆游戏环境,(2)基于规则的 bot 作为基准对手,以及(3)ReAct 风格的 LLM 代理人,每个角色都有适应的提示。我们的评估结果表明,与基于规则的 bot 作为邪恶对手进行比较,ChatGPT 扮演善良角色时的胜率为 22.2%,而基于规则的 bot 扮演善良角色时的胜率为 38.2%。我们认为阿瓦隆Bench 可以成为 LLM 的发展和自适应代理人框架的试验场景。

Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection

  • paper_url: http://arxiv.org/abs/2310.05035
  • repo_url: None
  • paper_authors: Haodi Zhang, Min Cai, Xinhe Zhang, Chen Jason Zhang, Rui Mao, Kaishun Wu
  • for: 提高大语言模型(LLM)的复杂理解和具有技巧使用能力
  • methods: 使用提前训练的语言模型、询问、检查和修改步骤
  • results: 实验结果 validate Self-Convince 框架的有效性,与基准值进行比较获得了显著提高
    Abstract While large language models (LLMs) such as ChatGPT and PaLM have demonstrated remarkable performance in various language understanding and generation tasks, their capabilities in complex reasoning and intricate knowledge utilization still fall short of human-level proficiency. Recent studies have established the effectiveness of prompts in steering LLMs towards generating desired outputs. Building on these insights, we introduce a novel framework that harnesses the potential of large-scale pre-trained language models, to iteratively enhance performance of the LLMs. Our framework incorporates three components: \textit{Normal CoT}, a \textit{Convincer}, and an \textit{Answerer}. It processes the output of a typical few-shot chain-of-thought prompt, assesses the correctness of the response, scrutinizes the answer, refines the reasoning, and ultimately produces a new solution. Experimental results on the 7 datasets of miscellaneous problems validate the efficacy of the Self-Convince framework, achieving substantial improvements compared to the baselines. This study contributes to the burgeoning body of research focused on integrating pre-trained language models with tailored prompts and iterative refinement processes to augment their performance in complex tasks.
    摘要 尽管 ChatGPT 和 PaLM 等大语言模型(LLM)在各种语言理解和生成任务中表现出色,但它们在复杂推理和精细知识运用方面仍未达到人类水平。近期研究表明,提示能够有效引导 LLM 生成所需的输出。基于这些见解,我们提出了一个利用大规模预训练语言模型潜力、迭代提升 LLM 性能的新框架。该框架包含三个组成部分:Normal CoT、Convincer 和 Answerer。它对典型的少样本链式思维提示的输出进行处理,评估回答的正确性,仔细审视答案,改进推理,并最终生成新的解答。在 7 个不同类型问题数据集上的实验结果验证了 Self-Convince 框架的有效性,相比基线取得了显著提升。这项研究为将预训练语言模型与定制提示及迭代改进过程相结合、以增强其在复杂任务中表现的研究方向做出了贡献。
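
The iterate-and-introspect loop can be sketched as below; `llm` is a placeholder for any LLM API, and the prompts for the Normal CoT, Convincer, and Answerer roles are illustrative rather than the paper's templates.

```python
# Sketch of a self-convinced answering loop (illustrative; not the paper's prompt templates).

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM call here")

def self_convinced_answer(question: str, few_shot: str, max_rounds: int = 3) -> str:
    answer = llm(f"{few_shot}\nQ: {question}\nLet's think step by step.")             # Normal CoT
    for _ in range(max_rounds):
        verdict = llm(f"Question: {question}\nProposed reasoning and answer:\n{answer}\n"
                      "Is this answer correct? Point out any flaw, or reply CONVINCED.")  # Convincer
        if "CONVINCED" in verdict.upper():
            break
        answer = llm(f"Question: {question}\nPrevious attempt:\n{answer}\n"
                     f"Critique:\n{verdict}\nRevise the reasoning and give a new answer.")  # Answerer
    return answer
```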

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think – Introducing AI Detectability Index

  • paper_url: http://arxiv.org/abs/2310.05030
  • repo_url: None
  • paper_authors: Megha Chakraborty, S. M Towhidul Islam Tonmoy, S M Mehedi Zaman, Krish Sharma, Niyar R Barman, Chandan Gupta, Shreya Gautam, Tanay Kumar, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das
  • for: 这篇论文主要旨在评估当前的AI生成文本检测技术的robustness,以及评估不同大小的自然语言处理模型(LLMs)在生成文本检测中的可探测性。
  • methods: 这篇论文提出了Counter Turing Test(CT^2)作为一个全面评估AI生成文本检测技术鲁棒性的标准benchmark。它们还提出了一个名为AI Detectability Index(ADI)的指标,用于评估不同大小的大语言模型(LLMs)在生成文本检测中的可探测性。
  • results: 这篇论文的实验结果表明,现有的AI生成文本检测技术在面对CT^2的测试时鲁棒性较弱。此外,研究发现大型LLMs具有较高的AI Detectability Index(ADI),这意味着它们生成的文本更难被检测。
    Abstract With the rise of prolific ChatGPT, the risk and consequences of AI-generated text has increased alarmingly. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that 'If a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it'. Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by emergence of techniques to bypass detection. This paper introduces the Counter Turing Test (CT^2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a higher ADI, indicating they are less detectable compared to smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.
    摘要 随着ChatGPT的大量应用,AI生成文本带来的风险和后果急剧增加。为了解决AI生成内容的所有权归属问题,美国版权局发表声明称:如果作品中传统意义上的作者性元素是由机器产生的,那么该作品缺乏人类作者身份,版权局将不予登记。此外,美国和欧盟政府最近都起草了各自关于AI监管框架的初步提案。在生成式AI受到高度关注的背景下,AI生成文本检测(AGTD)已成为研究中立即受到关注的课题,一些初步方法已被提出,随后又出现了绕过检测的技术。本文介绍了Counter Turing Test(CT^2),这是一个旨在全面评估现有AGTD技术鲁棒性的基准。我们的实验结果明确表明,受检验的AGTD方法十分脆弱。在围绕AI发展监管政策的广泛讨论中,评估LLM生成内容的可检测性至关重要。因此,为建立一个可量化的谱系,以便根据可检测性水平对LLM进行评估和排名,我们提出了AI可检测性指数(ADI)。我们对15个当代LLM进行了深入考察,实验表明较大的LLM往往具有更高的ADI,即相比较小的LLM更难被检测。我们坚信ADI作为一个工具对更广泛的NLP社区具有重要价值,并有潜力在AI相关政策制定中发挥作用。

Revisiting Large Language Models as Zero-shot Relation Extractors

  • paper_url: http://arxiv.org/abs/2310.05028
  • repo_url: None
  • paper_authors: Guozheng Li, Peng Wang, Wenjun Ke
  • for: 这个论文主要研究了使用大语言模型(LLM)进行零样本关系抽取(RE)。
  • methods: 本研究使用了Chain-of-thought(CoT)技术和summarize-and-ask(SumAsk)提示法来提高零样本RE的性能。
  • results: 研究发现,SumAsk可以在不同的模型大小、benchmark和设置下一致且显著地提高LLM的性能。此外,使用ChatGPT的零样本提示取得了与零样本乃至全监督方法相当或更优的结果。LLM在抽取重叠关系方面也表现出色,但不同关系之间的性能差异较大;与小型语言模型不同,LLM能够有效处理具有挑战性的none-of-the-above(NoTA)关系。
    Abstract Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data even if under zero-shot setting. Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt, which provides the possibility of extracting relations from text without any data and parameter tuning. This work focuses on the study of exploring LLMs, such as ChatGPT, as zero-shot relation extractors. On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE. We propose the summarize-and-ask (\textsc{SumAsk}) prompting, a simple prompt recursively using LLMs to transform RE inputs to the effective question answering (QA) format. On the other hand, we conduct comprehensive experiments on various benchmarks and settings to investigate the capabilities of LLMs on zero-shot RE. Specifically, we have the following findings: (i) \textsc{SumAsk} consistently and significantly improves LLMs performance on different model sizes, benchmarks and settings; (ii) Zero-shot prompting with ChatGPT achieves competitive or superior results compared with zero-shot and fully supervised methods; (iii) LLMs deliver promising performance in extracting overlapping relations; (iv) The performance varies greatly regarding different relations. Different from small language models, LLMs are effective in handling challenge none-of-the-above (NoTA) relation.
    摘要 即使在零样本设定下,关系抽取(RE)通常也涉及一定量的标注或未标注数据。近期研究表明,大语言模型(LLM)只需一个自然语言提示就能很好地迁移到新任务,这为在不进行任何数据和参数调整的情况下从文本中抽取关系提供了可能。本研究聚焦于探索将ChatGPT等LLM用作零样本关系抽取器。一方面,我们分析了现有RE提示的缺点,并尝试结合链式思维(CoT)等最新提示技术来改进零样本RE。我们提出了摘要并提问(SumAsk)提示,一种简单的提示方法,利用LLM递归地将RE输入转换为有效的问答(QA)格式。另一方面,我们在多个benchmark和设置上进行了全面实验,以研究LLM在零样本RE中的能力。我们得到以下发现:(i) SumAsk在不同的模型大小、benchmark和设置上一致且显著地提升了LLM的表现;(ii) 使用ChatGPT的零样本提示取得了与零样本及全监督方法相当或更优的结果;(iii) LLM在抽取重叠关系方面表现出色;(iv) 不同关系之间的性能差异很大。与小型语言模型不同,LLM能够有效处理具有挑战性的none-of-the-above(NoTA)关系。
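
A rough sketch of a summarize-and-ask style pipeline is shown below; `llm` is a placeholder call, and the prompt wording and answer aggregation are assumptions rather than the paper's exact design.

```python
# Sketch of a summarize-and-ask style pipeline for zero-shot relation extraction (illustrative).

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in ChatGPT or another LLM here")

def sumask_relation(sentence: str, head: str, tail: str, relations: list[str]) -> str:
    # Step 1: summarize the sentence with respect to the entity pair.
    summary = llm(f"Summarize what the sentence says about '{head}' and '{tail}':\n{sentence}")
    # Step 2: turn each candidate relation into a yes/no question over the summary.
    answers = {}
    for rel in relations + ["none of the above"]:
        answers[rel] = llm(f"Summary: {summary}\nQuestion: does the relation '{rel}' hold between "
                           f"'{head}' and '{tail}'? Answer yes or no, then explain briefly.")
    # Step 3: pick the relation with the most affirmative answer (robust parsing omitted).
    return max(answers, key=lambda r: answers[r].lower().count("yes"))
```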

Fully Spiking Neural Network for Legged Robots

  • paper_url: http://arxiv.org/abs/2310.05022
  • repo_url: None
  • paper_authors: Xiaoyang Jiang, Qiang Zhang, Jingkai Sun, Renjing Xu
  • For: The paper aims to improve the performance of legged robots using a novel Spiking Neural Network (SNN) to process body perception signals, achieving better speed and energy consumption, and improved biological interpretability.
  • Methods: The paper employs a SNN to process legged robots' perception signals, which offers improved biological interpretability and natural advantages in inference speed and energy consumption compared to traditional artificial neural networks.
  • Results: The paper achieves outstanding results across a range of simulated terrains, demonstrating the effectiveness of SNN in legged robots.
    Abstract In recent years, legged robots based on deep reinforcement learning have made remarkable progress. Quadruped robots have demonstrated the ability to complete challenging tasks in complex environments and have been deployed in real-world scenarios to assist humans. Simultaneously, bipedal and humanoid robots have achieved breakthroughs in various demanding tasks. Current reinforcement learning methods can utilize diverse robot bodies and historical information to perform actions. However, prior research has not emphasized the speed and energy consumption of network inference, as well as the biological significance of the neural networks themselves. Most of the networks employed are traditional artificial neural networks that utilize multilayer perceptrons (MLP). In this paper, we successfully apply a novel Spiking Neural Network (SNN) to process legged robots, achieving outstanding results across a range of simulated terrains. SNN holds a natural advantage over traditional neural networks in terms of inference speed and energy consumption, and their pulse-form processing of body perception signals offers improved biological interpretability. To the best of our knowledge, this is the first work to implement SNN in legged robots.
    摘要 近年来,基于深度强化学习的足式机器人取得了显著进展。四足机器人已能够在复杂环境中完成具有挑战性的任务,并被部署到真实场景中协助人类;双足与人形机器人也在各种高难度任务上取得了突破。现有的强化学习方法可以利用多种机器人形态和历史信息来执行动作,但先前的研究并未重视网络推理的速度和能耗,以及神经网络本身的生物学意义,所采用的网络大多是使用多层感知机(MLP)的传统人工神经网络。在本文中,我们成功地将一种新颖的脉冲神经网络(SNN)应用于足式机器人,在一系列仿真地形上取得了出色的结果。SNN在推理速度和能耗方面相对传统神经网络具有天然优势,其对机体感知信号的脉冲式处理也提供了更好的生物学可解释性。据我们所知,这是首个在足式机器人上实现SNN的工作。
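
To make the pulse-form processing concrete, here is a minimal leaky integrate-and-fire (LIF) layer; it only illustrates spiking inference and is not the paper's network or training method.

```python
# Minimal leaky integrate-and-fire (LIF) layer illustrating pulse-form processing of
# proprioceptive signals (illustrative; no surrogate-gradient training is shown).

import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, tau: float = 2.0, v_th: float = 1.0):
        super().__init__()
        self.fc, self.tau, self.v_th = nn.Linear(in_dim, out_dim), tau, v_th

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (T, B, in_dim) -> spike train of shape (T, B, out_dim)
        v = torch.zeros(x_seq.shape[1], self.fc.out_features)
        spikes = []
        for x in x_seq:
            v = v + (self.fc(x) - v) / self.tau      # leaky membrane integration
            s = (v >= self.v_th).float()             # fire when the threshold is crossed
            v = v * (1.0 - s)                        # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes)

if __name__ == "__main__":
    obs_seq = torch.rand(16, 4, 48)                  # 16 time steps of body-perception input
    print(LIFLayer(48, 32)(obs_seq).mean().item())   # average firing rate
```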

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05015
  • repo_url: https://github.com/microsoft/moonlit
  • paper_authors: Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang
  • for: 这篇研究的目的是为了提高大型语言模型(LLM)的部署,特别是在具有限制的硬件资源的环境下。
  • methods: 这篇研究提出了一种新的框架,叫做Compresso,它通过与LLM的协作,在训练过程中学习最佳的剪枝决策。Compresso在指令微调过程中将LoRA技术融入$L_0$正则化,并引入了协同提示,以增强整体性能。
  • results: 根据实验结果,Compresso可以将LLaMA-7B剪枝到5.4B,保持原始性能,甚至在阅读理解测试中超过LLaMA-7B的表现。Compresso比单次剪枝(one-shot pruning)基线有更高的表现,在不同的稀疏比例下,在Commonsense Reasoning、Reading Comprehension、MMLU和BBH测试中分别可以达到2.21%、11.43%、7.04%、4.81%的更高分数。
    Abstract Despite the remarkable success of Large Language Models (LLMs), the massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and data collection challenges. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning, but lead to performance decline under the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank Adaptation (LoRA) into the $L_0$ regularization during the instruction tuning process. Then, we further augment the pruning algorithm by introducing a collaborative prompt that fosters collaboration between the LLM and the pruning algorithm, significantly boosting the overall performance. To this end, Compresso prunes LLaMA-7B to 5.4B, maintaining original performance and even surpassing LLaMA-7B in reading comprehension by 2.62%. Extensive experiments demonstrate that Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.
    摘要 尽管大语言模型(LLM)已经取得了非常出色的成功,但其巨大的规模给部署带来了严峻挑战,尤其是在资源受限的硬件上。现有的LLM压缩方法主要集中在量化上,而剪枝由于基于训练的方法成本高昂以及数据收集困难,相对仍未被充分探索。单次(one-shot)剪枝方法虽然成本低廉且无需数据,已成为LLM剪枝的主流,但在结构化剪枝设定下会导致性能下降。在这项工作中,我们提出了一种结构化剪枝LLM的新范式,称为Compresso。我们的方法通过所提出的资源高效剪枝算法与LLM本身的协作,在训练过程中学习最优的剪枝决策。Compresso在指令微调过程中将低秩适配(LoRA)融入$L_0$正则化,从而应对训练成本高和数据收集难的问题。此外,我们还在剪枝算法中引入了协同提示,促进LLM与剪枝算法之间的协作,显著提升整体性能。最终,Compresso将LLaMA-7B剪枝到5.4B,保持原有性能,甚至在阅读理解任务上超越LLaMA-7B达2.62%。大量实验表明,Compresso在各种稀疏比例下显著优于单次剪枝基线,在Commonsense Reasoning、Reading Comprehension、MMLU和BBH基准上分别取得最高2.21%、11.43%、7.04%和4.81%的提升。
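
A simplified sketch of gate-based structured pruning with an L0-style penalty follows; it omits Compresso's LoRA-based instruction tuning and collaborative prompt, and uses a plain sigmoid relaxation instead of the exact L0 formulation.

```python
# Simplified sketch of structured pruning with learnable gates and an L0-style sparsity
# penalty (illustrative; not Compresso's actual formulation).

import torch
import torch.nn as nn

class GatedHeads(nn.Module):
    """Attention-head pruning: one learnable gate per head, penalized toward sparsity."""
    def __init__(self, n_heads: int, head_dim: int):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(n_heads))
        self.n_heads, self.head_dim = n_heads, head_dim

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (B, n_heads, T, head_dim)
        g = torch.sigmoid(self.gate_logits).view(1, -1, 1, 1)
        return head_outputs * g

    def expected_sparsity_penalty(self) -> torch.Tensor:
        return torch.sigmoid(self.gate_logits).sum()   # relaxed proxy for the L0 norm

if __name__ == "__main__":
    layer = GatedHeads(n_heads=32, head_dim=128)
    out = layer(torch.randn(2, 32, 10, 128))
    loss = out.pow(2).mean() + 0.01 * layer.expected_sparsity_penalty()   # task loss + sparsity
    loss.backward()
    print(out.shape, layer.gate_logits.grad.abs().mean().item())
```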

The Reinforce Policy Gradient Algorithm Revisited

  • paper_url: http://arxiv.org/abs/2310.05000
  • repo_url: None
  • paper_authors: Shalabh Bhatnagar
  • for: 本文提出了一种改进版本的强化策略梯度算法,用于处理无穷状态和动作空间的系统。
  • methods: 本文借助一类随机搜索方法,通过对扰动参数进行函数测量来估计策略梯度,从而放宽了原本所需的一些正则性条件。
  • results: 本文证明了这种新算法的收敛性,并展示了其在无穷状态和动作空间系统上的有效性。
    Abstract We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relax some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.
    摘要 我们重新审视文献中的 REINFORCE 策略梯度算法。该算法通常基于随机长度回合所获得的成本回报,这些回合或者在到达目标状态时终止(回合制任务),或者由访问某个指定常返状态的时刻划分(持续性任务)。我们对基本算法提出了一项重要改进:借助一类随机搜索方法,通过对扰动参数进行函数测量来估计策略梯度。对于具有无穷状态空间和动作空间的系统,这样做可以放宽证明 REINFORCE 算法收敛所需的一些正则性条件。我们观察到,尽管我们使用性能目标本身(而非采样梯度)来估计性能目标的梯度,算法仍会收敛到局部极小值的邻域。我们还给出了该新算法的收敛性证明。
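
The perturbation-based gradient estimate can be illustrated on a toy objective as below (a simultaneous-perturbation style estimator); the episode cost, step sizes, and objective are assumptions, not the paper's setup.

```python
# Numerical sketch of estimating a policy gradient from cost measurements at perturbed
# parameters (random-search / simultaneous-perturbation style), on a toy noisy objective.

import numpy as np

rng = np.random.default_rng(0)

def episode_cost(theta: np.ndarray) -> float:
    """Stand-in for the noisy cost return of one episode under policy parameters theta."""
    return float(np.sum((theta - 1.0) ** 2) + rng.normal(0, 0.1))

def perturbation_gradient(theta: np.ndarray, delta: float = 0.05) -> np.ndarray:
    d = rng.choice([-1.0, 1.0], size=theta.shape)           # random perturbation direction
    j_plus = episode_cost(theta + delta * d)
    j_minus = episode_cost(theta - delta * d)
    return (j_plus - j_minus) / (2.0 * delta) * d            # unbiased (in expectation) estimate

theta = np.zeros(4)
for step in range(2000):
    theta -= 0.01 * perturbation_gradient(theta)             # stochastic descent on the cost
print("theta ->", theta.round(2))                            # approaches the minimizer [1, 1, 1, 1]
```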

Distantly-Supervised Joint Entity and Relation Extraction with Noise-Robust Learning

  • paper_url: http://arxiv.org/abs/2310.04994
  • repo_url: https://github.com/yul091/denrl
  • paper_authors: Yufei Li, Xiao Yu, Yanghong Guo, Yanchi Liu, Haifeng Chen, Cong Liu
  • for: 这个论文主要用于解决使用远程标注数据进行entity和关系抽象的问题,即使面临着噪声标注的问题。
  • methods: 该论文提出了一种新的噪声鲁棒方法,包括在序列标注模型中预训练GPT-2,以及使用一种新的噪声鲁棒学习框架,包括一个新的损失函数,惩罚与重要关系模式和实体关系依赖性不一致。
  • results: 实验结果显示,该方法可以在两个数据集上达到现有状态的 arts 方法的同等或更高的 JOINT 抽象性和噪声减少效果。
    Abstract Joint entity and relation extraction is a process that identifies entity pairs and their relations using a single model. We focus on the problem of training these models on distantly-labeled data, which is generated by aligning entity mentions in a text corpus with their corresponding entity and relation types in a knowledge base. One key challenge here is the presence of noisy labels, which arises from both entity and relation annotations, and significantly impair the effectiveness of supervised learning applications. However, existing research primarily addresses only one type of noise, thereby limiting the effectiveness of noise reduction. To fill this gap, we introduce a new noise-robust approach, that 1)~incorporates a pre-trained GPT-2 into a sequence tagging scheme for simultaneous entity and relation detection, and 2)~employs a noise-robust learning framework which includes a new loss function that penalizes inconsistency with both significant relation patterns and entity-relation dependencies, as well as a self-adaptive learning step that iteratively selects and trains on high-quality instances. Experiments on two datasets show that our method outperforms the existing state-of-the-art methods in both joint extraction performance and noise reduction effect.
    摘要 共同实体和关系抽取是一个过程,它通过单一模型标识实体对和其关系。我们关注在训练这些模型的远程标注数据上的问题,这些数据是通过文本库中的实体提及与知识库中的实体和关系类型的对应进行对齐的。一个关键挑战是噪声标注,它来自实体和关系注释,并对监督学习应用产生重要影响。然而,现有研究主要只处理一种噪声,因此限制了噪声减少的效iveness。为了填补这个空白,我们介绍了一种新的噪声Robust Approach,它包括以下两个部分:1. 使用预训练的 GPT-2 在序列标记方案中同时检测实体和关系,以提高实体和关系的同时检测能力。2. 使用一种噪声Robust的学习框架,包括一种新的损失函数,该损失函数考虑实体和关系之间的依赖关系和重要关系模式,以及一种自适应学习步骤,该步骤在高质量实例上进行逐步选择和训练。我们在两个数据集上进行了实验,结果表明,我们的方法在同时检测性能和噪声减少效果方面都超过了现有状态的方法。

The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations

  • paper_url: http://arxiv.org/abs/2310.04988
  • repo_url: None
  • paper_authors: Vipula Rawte, Swagata Chakraborty, Agnibh Pathak, Anubhav Sarkar, S. M Towhidul Islam Tonmoy, Aman Chadha, Amit P. Sheth, Amitava Das
  • for: 本研究旨在提供一种细化的幻觉分类方法,以及对幻觉的减轻策略。
  • methods: 本研究使用了15种当代大语言模型生成75,000个样本,并对其进行了人工标注。此外,本研究还提出了一个幻觉敏感指数(HVI),用于评估和排序不同的大语言模型在生成幻觉方面的敏感度。
  • results: 本研究对幻觉进行了细化分类,并提出了两种减轻幻觉的方法。
    Abstract The recent advancements in Large Language Models (LLMs) have garnered widespread acclaim for their remarkable emerging capabilities. However, the issue of hallucination has parallelly emerged as a by-product, posing significant concerns. While some recent endeavors have been made to identify and mitigate different types of hallucination, there has been a limited emphasis on the nuanced categorization of hallucination and associated mitigation methods. To address this gap, we offer a fine-grained discourse on profiling hallucination based on its degree, orientation, and category, along with offering strategies for alleviation. As such, we define two overarching orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining (SL). To provide a more comprehensive understanding, both orientations are further sub-categorized into intrinsic and extrinsic, with three degrees of severity - (i) mild, (ii) moderate, and (iii) alarming. We also meticulously categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a publicly available dataset comprising of 75,000 samples generated using 15 contemporary LLMs along with human annotations for the aforementioned categories. Finally, to establish a method for quantifying and to offer a comparative spectrum that allows us to evaluate and rank LLMs based on their vulnerability to producing hallucinations, we propose Hallucination Vulnerability Index (HVI). We firmly believe that HVI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making. In conclusion, we propose two solution strategies for mitigating hallucinations.
    摘要 最近的大自然语言模型(LLM)的进步得到了广泛的赞誉,但同时也出现了一种问题,即幻觉。幻觉的出现引起了一定的担忧,因为它可能会对语言处理 tasks 产生负面影响。然而,关于幻觉的细分分类和相应的缓解方法尚未得到了足够的重视。为了解决这个问题,我们提出了一种细化的幻觉分类方法,以及一些缓解方法。我们将幻觉分为两类:(i)实际幻觉(FM)和(ii)银色幻觉(SL)。这两类幻觉进一步分为内在和外在两类,并且分为三级幻觉的严重程度:(i)轻度、(ii)中度和(iii)警示级。此外,我们还将幻觉分为六种类型:(i)字符混淆、(ii)数字幻觉、(iii)生成的 GOLEM、(iv)虚拟之声、(v)地理错误和(vi)时间包袋。此外,我们还创建了一个名为 HallucInation eLiciTation(HILT)的公共可用数据集,包含75000个样本,由15种当代LLM生成,以及人工标注的相应类别。最后,我们提出了一种用于评估和排名 LLM 的幻觉抵触指数(HVI)。我们认为 HVI 对 NLP 社区拥有广泛的价值,并可能被用作 AI 相关的政策制定的工具。为了缓解幻觉,我们提出了两种解决方案。

A new economic and financial theory of money

  • paper_url: http://arxiv.org/abs/2310.04986
  • repo_url: None
  • paper_authors: Michael E. Glinsky, Sharon Sievert
  • for: This paper aims to reformulate economic and financial theory to include electronic currencies, and to develop a new view of electronic currency as a transactional equity associated with tangible assets.
  • methods: The paper uses macroeconomic theory and the fundamental equation of monetary policy to value electronic currencies, and employs multi time scale models to capture true risk. The decision-making process is approached using deep reinforcement learning, generative pretrained transformers, and other methods of artificial intelligence.
  • results: The paper develops a new view of electronic currency management firms as entities responsible for coordinated monetary and fiscal policies of a substantial sub-economy, and proposes a system response function and DRL/GPT/AI-based active nonlinear control to stabilize unstable equilibriums in the sub-economy.
    Abstract This paper fundamentally reformulates economic and financial theory to include electronic currencies. The valuation of the electronic currencies will be based on macroeconomic theory and the fundamental equation of monetary policy, not the microeconomic theory of discounted cash flows. The view of electronic currency as a transactional equity associated with tangible assets of a sub-economy will be developed, in contrast to the view of stock as an equity associated mostly with intangible assets of a sub-economy. The view will be developed of the electronic currency management firm as an entity responsible for coordinated monetary (electronic currency supply and value stabilization) and fiscal (investment and operational) policies of a substantial (for liquidity of the electronic currency) sub-economy. The risk model used in the valuations and the decision-making will not be the ubiquitous, yet inappropriate, exponential risk model that leads to discount rates, but will be multi time scale models that capture the true risk. The decision-making will be approached from the perspective of true systems control based on a system response function given by the multi scale risk model and system controllers that utilize the Deep Reinforcement Learning, Generative Pretrained Transformers, and other methods of Artificial Intelligence (DRL/GPT/AI). Finally, the sub-economy will be viewed as a nonlinear complex physical system with both stable equilibriums that are associated with short-term exploitation, and unstable equilibriums that need to be stabilized with active nonlinear control based on the multi scale system response functions and DRL/GPT/AI.
    摘要 本文从根本上重构经济与金融理论,以纳入电子货币。电子货币的估值将基于宏观经济理论和货币政策的基本方程,而非基于贴现现金流的微观经济理论。本文提出一种新的视角:电子货币是与某一子经济体的有形资产相关联的交易型权益,这与股票主要与子经济体的无形资产相关联的权益观形成对比。电子货币管理公司则被视为负责一个(为保证电子货币流动性而规模可观的)子经济体的协调货币政策(电子货币供给与币值稳定)和财政政策(投资与运营)的实体。估值与决策所采用的风险模型将不是无处不在却并不合适的、导致贴现率的指数型风险模型,而是能够刻画真实风险的多时间尺度模型。决策将从真正的系统控制视角出发,基于多尺度风险模型给出的系统响应函数,并采用利用深度强化学习、生成式预训练 Transformer 及其他人工智能方法(DRL/GPT/AI)的系统控制器。最后,子经济体将被视为一个非线性复杂物理系统,既有与短期开发相关的稳定平衡点,也有需要基于多尺度系统响应函数和 DRL/GPT/AI 进行主动非线性控制以实现稳定的不稳定平衡点。

Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset

  • paper_url: http://arxiv.org/abs/2310.04982
  • repo_url: None
  • paper_authors: Ze Liu
  • for: 本研究旨在利用深度学习提高 Text-to-Speech(TTS)合成的质量,但现代 TTS 模型需要大量数据。因此,本研究聚焦于迁移学习,特别是少样本(few-shot)、低资源和自定义数据集的场景。
  • methods: 本研究对现代 TTS 模型的迁移学习能力进行了系统的技术分析,并在受限数据集上对模型性能进行了实验比较。
  • results: 研究发现,迁移学习可以大幅提高 TTS 模型在小规模数据集上的表现,并且可以找到适合此类特殊条件的最优模型,在数据稀缺时仍提供高质量的语音输出。
    Abstract Text-to-Speech (TTS) synthesis using deep learning relies on voice quality. Modern TTS models are advanced, but they need large amount of data. Given the growing computational complexity of these models and the scarcity of large, high-quality datasets, this research focuses on transfer learning, especially on few-shot, low-resource, and customized datasets. In this research, "low-resource" specifically refers to situations where there are limited amounts of training data, such as a small number of audio recordings and corresponding transcriptions for a particular language or dialect. This thesis, is rooted in the pressing need to find TTS models that require less training time, fewer data samples, yet yield high-quality voice output. The research evaluates TTS state-of-the-art model transfer learning capabilities through a thorough technical analysis. It then conducts a hands-on experimental analysis to compare models' performance in a constrained dataset. This study investigates the efficacy of modern TTS systems with transfer learning on specialized datasets and a model that balances training efficiency and synthesis quality. Initial hypotheses suggest that transfer learning could significantly improve TTS models' performance on compact datasets, and an optimal model may exist for such unique conditions. This thesis predicts a rise in transfer learning in TTS as data scarcity increases. In the future, custom TTS applications will favour models optimized for specific datasets over generic, data-intensive ones.
    摘要 基于深度学习的文本到语音(TTS)合成依赖于语音质量。现代TTS模型十分先进,但需要大量数据。鉴于这些模型日益增长的计算复杂度以及大规模高质量数据集的稀缺,本研究聚焦于迁移学习,特别是少样本、低资源和自定义数据集的场景。这里的“低资源”特指训练数据有限的情况,例如某种语言或方言只有少量音频录音及对应转写。本文源于一个迫切需求:找到训练时间更短、所需数据更少、却仍能产生高质量语音的TTS模型。研究通过深入的技术分析评估了当前最先进TTS模型的迁移学习能力,随后在受限数据集上进行了实验比较。本研究考察了现代TTS系统在专门数据集上进行迁移学习的有效性,以及在训练效率与合成质量之间取得平衡的模型。初步假设认为,迁移学习能显著提升TTS模型在小规模数据集上的表现,并且在这种特殊条件下可能存在最优模型。本文预测,随着数据稀缺性的加剧,迁移学习在TTS中的应用将会增加;未来的定制TTS应用将更青睐针对特定数据集优化的模型,而非通用的数据密集型模型。

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

  • paper_url: http://arxiv.org/abs/2310.04965
  • repo_url: None
  • paper_authors: Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang
  • for: 提高AI虚拟助手完成日常任务的自动生成脚本能力,特别是对于不熟悉的任务。
  • methods: 基于多模态视频和文本描述,提出了两个新任务:多模态脚本生成和后续步骤预测。两个任务的输入都是目标任务名和一段完成目标任务的视频示例,输出包括(1)基于视频示例的结构化文本描述,和(2)基于视频示例的后续步骤文本描述。
  • results: 提出了两种基于大语言模型知识提示的多模态生成框架,并在MultiScript挑战任务上实现了显著提升。
    Abstract Automatically generating scripts (i.e. sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial to the modern AI virtual assistants to guide humans to complete everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge -- MultiScript, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description for the subsequent step, respectively. Built from WikiHow, MultiScript covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MultiScript, we propose two knowledge-guided multimodal generative frameworks that incorporate the task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.
    摘要 现代AI虚拟助手需要自动生成脚本(即文本描述的顺序步骤)从视频示例中,并根据示例视频进行逻辑推理来导引人类完成日常任务,特别是不熟悉的任务。然而,现有的生成脚本学习方法都是基于结构化的前置步骤(文本和/或图像),或者只能在特定领域中进行学习,这导致了与实际用户场景的差距。为了解决这些限制,我们提出了一个新的比赛挑战——MultiScript,包括两个新任务:(1)多媒体脚本生成和(2)后续步骤预测。对于两个任务,输入都是目标任务名和一段完成目标任务的视频示例,并且期望的输出是(1)基于示例视频的结构化文本描述,和(2)一个基于示例视频的文本描述。MultiScript由WikiHow建立,覆盖了视频和文本描述的多媒体脚本 для人类日常任务的19个不同领域,涵盖了6,655个任务。为了确定MultiScript的基准性能,我们提议两种基于大型自然语言模型(如Vicuna)的知识导向多媒体生成框架,实验结果表明,我们的提议方法具有显著的优势。

LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation

  • paper_url: http://arxiv.org/abs/2310.04963
  • repo_url: None
  • paper_authors: Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran
  • for: This paper explores the capability of state-of-the-art large language models (LLMs) to automatically generate tests and validate compiler implementations of a directive-based programming paradigm, OpenACC.
  • methods: The paper employs various prompt engineering techniques, including code templates, retrieval-augmented generation (RAG) with code templates, expressive prompts using RAG with code templates, one-shot examples, and RAG with one-shot examples.
  • results: The paper investigates the outcome of LLMs-generated tests and analyzes the capabilities of the latest LLMs for code generation.
    Abstract Large language models (LLMs) are a new and powerful tool for a wide span of applications involving natural language and demonstrate impressive code generation abilities. In this paper, we explore the capabilitity of state-of-the-art LLMs, including closed-source options like OpenAI GPT-4 and open-source alternatives like Meta AI Codellama, to automatically generate tests and use these tests to validate and verify compiler implementations of a directive-based programming paradigm, OpenACC. Our approach entails exploring various prompt engineering techniques including a code template, retrieval-augmented generation (RAG) with code template, expressive prompt using RAG with code template, one-shot example, and RAG with one-shot example. This paper focusses on (a) exploring the capabilities of the latest LLMs for code generation, (b) investigating prompt and fine tuning methods, and (c) analyzing the outcome of LLMs generated tests
    摘要 大型自然语言模型(LLM)是一种新的和强大的工具,可以应用于许多自然语言相关的应用程序。在这篇论文中,我们探讨了当前领先的LLM,包括OpenAI GPT-4和Meta AI Codellama等closed-source选择,以及open-source的选择,用于自动生成测试,并使用这些测试来验证和验证编译器实现的指令式编程方法OpenACC。我们的方法包括使用代码模板、代码检索增强生成(RAG)、表达式提示、一shot示例和RAG与一shot示例等多种提示工程技术。本文主要关注以下三点:1. 探讨最新的LLM代码生成能力2. 探讨提示和精度调整方法3. 分析LLM生成的测试结果
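
One of the prompt-engineering variants, RAG with a code template, can be sketched as follows; retrieval is reduced to keyword overlap, the template is illustrative, and `llm` is a placeholder for GPT-4, Codellama, or another model.

```python
# Sketch of retrieval-augmented prompting with a code template for compiler-test generation
# (illustrative; the retrieval step, template, and prompt wording are assumptions).

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM call here")

TEMPLATE = """#include <openacc.h>
#include <stdio.h>
int test() {
  /* TODO: exercise the feature under test and return 0 on success */
}
int main() { return test(); }"""

def retrieve(spec_snippets: list[str], feature: str, k: int = 2) -> list[str]:
    words = set(feature.lower().split())
    scored = sorted(spec_snippets, key=lambda s: -len(words & set(s.lower().split())))
    return scored[:k]

def build_prompt(feature: str, spec_snippets: list[str]) -> str:
    context = "\n---\n".join(retrieve(spec_snippets, feature))
    return (f"Relevant excerpts from the OpenACC specification:\n{context}\n\n"
            f"Using the template below, write a C test that validates '{feature}' "
            f"and returns 0 only if the compiler implements it correctly.\n{TEMPLATE}")

# usage: test_source = llm(build_prompt("acc parallel loop reduction", spec_snippets))
```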

Safe Deep Policy Adaptation

  • paper_url: http://arxiv.org/abs/2310.08602
  • repo_url: None
  • paper_authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi
  • for: 本研究旨在开发一种能够快速适应动态不确定环境的自主 робоット控制框架,同时保证安全性和稳定性。
  • methods: 本研究使用了policy adaptation基于再归折衔学习(RL),并提出了一种安全防止(Safety Filter)来保证实际世界中的安全性。
  • results: 实验结果显示,SafeDPA在三个不同的环境中(倒挠杆、Safety Gym和RC Car)具有出色的安全性和任务性能,与现有的基准值进行比较,SafeDPA在不可见干扰的实际世界中展现出了300%的安全率提升。
    Abstract A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
    摘要 自主性与人工智能的一个关键目标,是使自主机器人能够在动态且不确定的环境中快速适应。经典自适应控制和安全控制能够提供稳定性与安全性保证,但仅适用于特定类别的系统;相比之下,基于强化学习(RL)的策略自适应具有通用性和泛化能力,却带来安全性与鲁棒性方面的挑战。我们提出了 SafeDPA,一种同时解决策略自适应与安全强化学习问题的新型 RL 与控制框架。SafeDPA 在仿真中联合学习自适应策略与动力学模型,预测环境配置,并利用少量真实世界数据微调动力学模型。我们在 RL 策略之上引入基于控制障碍函数(CBF)的安全过滤器,以确保真实世界部署中的安全。我们给出了 SafeDPA 的理论安全保证,并展示了其对学习误差和额外扰动的鲁棒性。在经典控制问题(倒立摆)、仿真基准(Safety Gym)以及真实世界敏捷机器人平台(RC Car)上的全面实验表明,SafeDPA 在安全性和任务性能上均显著优于当前最先进的基线;特别是在真实世界实验的未见扰动下,SafeDPA 展现出显著的泛化能力,安全率较基线提升 300%。
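
The CBF-based safety filter can be illustrated on a toy 1-D integrator, where the usual quadratic program collapses to a simple clip on the action; SafeDPA's learned dynamics and barrier construction are not reproduced.

```python
# Sketch of a control-barrier-function safety filter over an RL action for a toy 1-D
# integrator (illustrative). Safe set: h(x) = X_MAX - x >= 0; dynamics: x_{t+1} = x + u * DT.

X_MAX, DT, ALPHA = 1.0, 0.05, 0.5

def h(x: float) -> float:
    return X_MAX - x                       # barrier: positive inside the safe set

def safety_filter(x: float, u_rl: float) -> float:
    # Discrete-time CBF condition: h(x_next) >= (1 - ALPHA) * h(x).
    # For this affine system it reduces to an upper bound on u, so the "QP" is a clip.
    u_max = ALPHA * h(x) / DT
    return min(u_rl, u_max)

x = 0.0
for t in range(20):
    u_rl = 2.0                             # an aggressive RL action pushing toward the boundary
    u = safety_filter(x, u_rl)
    x = x + u * DT
print("final state:", round(x, 5), "(never exceeds X_MAX =", X_MAX, ")")
```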

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

  • paper_url: http://arxiv.org/abs/2310.04951
  • repo_url: https://github.com/weixiangyan/codetransocean
  • paper_authors: Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang
  • for: 这个研究旨在提高代码翻译的质量和维护效率,并满足实际应用中的多元化需求。
  • methods: 这个研究使用了人工神经网络翻译模型,探索了多种程式语言之间的翻译,包括具有多种程式语言的复杂混合翻译。
  • results: 研究发现,这些多种程式语言翻译方法可以提高低资源语言的翻译质量和高资源语言的培训效率。此外,研究还提出了一个新的评估指标Debugging Success Rate@K,用于评估翻译后的程式码可行性。
    Abstract Recent code translation techniques exploit neural machine translation models to translate source code from one programming language to another to satisfy production compatibility or to improve efficiency of codebase maintenance. Most existing code translation datasets only focus on a single pair of popular programming languages. To advance research on code translation and meet diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation. CodeTransOcean consists of three novel multilingual datasets, namely, MultilingualTrans supporting translations between multiple popular programming languages, NicheTrans for translating between niche programming languages and popular ones, and LLMTrans for evaluating executability of translated code by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks. We develop multilingual modeling approaches for code translation and demonstrate their great potential in improving the translation quality of both low-resource and high-resource language pairs and boosting the training efficiency. We also propose a novel evaluation metric Debugging Success Rate@K for program-level code translation. Last but not least, we evaluate LLM ChatGPT on our datasets and investigate its potential for fuzzy execution predictions. We build baselines for CodeTransOcean and analyze challenges of code translation for guiding future research. The CodeTransOcean datasets and code are publicly available at https://github.com/WeixiangYAN/CodeTransOcean.
    摘要 现代代码翻译技术利用神经机器翻译模型将源代码从一种编程语言翻译到另一种编程语言,以满足生产兼容性或改善代码维护效率。现有大多数代码翻译数据集只关注单个受欢迎的编程语言对。为了推动代码翻译研究和满足实际应用的多样化需求,我们构建了CodeTransOcean,一个大规模、完整的benchmark,支持最多的编程语言对进行代码翻译。CodeTransOcean包括三个新的多语言数据集:MultilingualTrans、NicheTrans和LLMTrans。MultilingualTrans支持多种受欢迎编程语言之间的翻译,NicheTrans用于将特殊编程语言与受欢迎语言之间翻译,LLMTrans用于通过大语言模型(LLMs)评估翻译后代码的执行可能性。CodeTransOcean还包括一个跨框架数据集DLTrans,用于跨不同框架深度学习代码翻译。我们开发了多语言模型方法,并证明它们在低资源语言对和高资源语言对翻译质量提高和训练效率提高。我们还提出了一个新的评价指标Debugging Success Rate@K,用于评估翻译后代码的可调试性。最后,我们评估了LLM ChatGPT在我们的数据集上的性能,并调查其可能性于软件执行预测。我们建立了CodeTransOcean的基准,并分析了代码翻译的挑战,以帮助未来研究。CodeTransOcean数据集和代码可以在https://github.com/WeixiangYAN/CodeTransOcean上下载。
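
The proposed Debugging Success Rate@K metric can be sketched as below under an assumed formulation (a translation counts as successful if it executes correctly within K rounds of LLM-driven debugging); the paper's exact protocol may differ.

```python
# Sketch of a Debugging Success Rate@K style metric (assumed formulation).

def dsr_at_k(attempt_logs: list[list[bool]], k: int) -> float:
    """attempt_logs[i][j] is True if sample i executed correctly at debugging round j
    (round 0 = the initial translation). A sample succeeds if any of its first k+1 rounds did."""
    hits = sum(any(rounds[: k + 1]) for rounds in attempt_logs)
    return hits / len(attempt_logs)

if __name__ == "__main__":
    logs = [[True], [False, True], [False, False, False], [False, False, True]]
    for k in (0, 1, 2):
        print(f"DSR@{k} = {dsr_at_k(logs, k):.2f}")   # 0.25, 0.50, 0.75
```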

cs.CL - 2023-10-08

Visual Storytelling with Question-Answer Plans

  • paper_url: http://arxiv.org/abs/2310.05295
  • repo_url: None
  • paper_authors: Danyang Liu, Mirella Lapata, Frank Keller
  • for: 本研究旨在从图像序列中生成引人入胜的故事(视觉叙事)。
  • methods: 该模型将图像序列转化为语言模型可以理解的视觉前缀(一串连续嵌入),并利用一系列问答对作为蓝图计划,用于挑选关键的视觉概念并决定如何将它们组织成故事。
  • results: 自动和人工评估结果表明,基于蓝图的模型能够生成比基线和现有系统更连贯、更有趣、更自然的故事。
    Abstract Visual storytelling aims to generate compelling narratives from image sequences. Existing models often focus on enhancing the representation of the image sequence, e.g., with external knowledge sources or advanced graph structures. Despite recent progress, the stories are often repetitive, illogical, and lacking in detail. To mitigate these issues, we present a novel framework which integrates visual representations with pretrained language models and planning. Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret. It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative. Automatic and human evaluation on the VIST benchmark (Huang et al., 2016) demonstrates that blueprint-based models generate stories that are more coherent, interesting, and natural compared to competitive baselines and state-of-the-art systems.
    摘要 视觉叙事旨在从图像序列中生成引人入胜的故事。现有模型通常着力于增强图像序列的表示,例如引入外部知识源或更复杂的图结构。尽管近来有所进展,生成的故事仍常常重复、缺乏逻辑且细节不足。为缓解这些问题,我们提出了一个将视觉表示与预训练语言模型及规划相结合的新框架。我们的模型将图像序列转换为视觉前缀,即语言模型可以理解的一串连续嵌入;同时利用一系列问答对作为蓝图计划,用于挑选显著的视觉概念并决定如何将它们组织成叙事。在 VIST 基准(Huang et al., 2016)上的自动与人工评估表明,基于蓝图的模型所生成的故事比有竞争力的基线和最先进系统更连贯、更有趣、更自然。

Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus

  • paper_url: http://arxiv.org/abs/2310.05294
  • repo_url: https://github.com/hlt-mt/fbk-neutr-eval
  • paper_authors: Andrea Piergentili, Beatrice Savoldi, Dennis Fucci, Matteo Negri, Luisa Bentivogli
  • for: Addressing the lack of inclusive language in machine translation, particularly in grammatical gender languages.
  • methods: Proposing a dedicated benchmark and exploring automated evaluation methods for gender-neutral translation from English to Italian, including a natural, bilingual test set (GeNTE) and a reference-free evaluation approach.
  • results: A new, more inclusive approach to machine translation that challenges traditional binary gender assumptions and provides a more accurate assessment of gender-neutral translation.
    Abstract Gender inequality is embedded in our communication practices and perpetuated in translation technologies. This becomes particularly apparent when translating into grammatical gender languages, where machine translation (MT) often defaults to masculine and stereotypical representations by making undue binary gender assumptions. Our work addresses the rising demand for inclusive language by focusing head-on on gender-neutral translation from English to Italian. We start from the essentials: proposing a dedicated benchmark and exploring automated evaluation methods. First, we introduce GeNTE, a natural, bilingual test set for gender-neutral translation, whose creation was informed by a survey on the perception and use of neutral language. Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation.
    摘要 性别不平等根植于我们的交流习惯之中,并在翻译技术中被延续。这一点在翻译到具有语法性别的语言时尤为明显:机器翻译(MT)常常因做出不恰当的二元性别假设而默认采用阳性及刻板化的表达。我们的工作回应了对包容性语言日益增长的需求,聚焦于从英语到意大利语的性别中立翻译。我们从最基础的环节做起:提出一个专门的基准并探索自动评估方法。首先,我们介绍了 GeNTE,一个面向性别中立翻译的自然双语测试集,其构建过程参考了一项关于中立语言感知与使用的调查。随后,我们基于 GeNTE 综述了现有的基于参考译文的评估方法,指出其局限性,并提出了一种更适合评估性别中立翻译的无参考评估方法。

  • paper_url: http://arxiv.org/abs/2310.05276
  • repo_url: None
  • paper_authors: Anas Belfathi, Nicolas Hernandez, Laura Monceaux
  • For: 这篇研究论文提出了一种基于预训练语言模型(PLM)并融合句子位置信息的新型模型架构,用于自动预测法律意见中的修辞角色。
  • Methods: 该方法采用简单的模型结构,并使用 LegalEval@SemEval2023 评测提供的标注语料进行训练;在此基础上结合句子位置信息以提升模型性能。
  • Results: 研究发现,与在全局上下文中使用复杂层次模型的方法相比,该方法更为简单、参数更少、计算成本更低,同时仍能取得出色的表现。此外,在仅基于 BERT 的局部上下文层次模型中加入更多注意力并融合句子位置信息,可以进一步提升结果。
    Abstract The legal domain is a vast and complex field that involves a considerable amount of text analysis, including laws, legal arguments, and legal opinions. Legal practitioners must analyze these texts to understand legal cases, research legal precedents, and prepare legal documents. The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions given their complexity and diversity. In this research paper, we propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information within a document. Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs when compared to complex architectures employing a hierarchical model in a global-context, yet it achieves great performance. Moreover, we show that adding more attention to a hierarchical model based only on BERT in the local-context, along with incorporating sentence position information, enhances the results.
    摘要 法律领域是一个庞大而复杂的领域,涉及大量的文本分析,包括法律条文、法律论证和法律意见。法律从业者必须分析这些文本,以理解案件、研究判例并准备法律文书。法律意见的篇幅不断增长,加之其复杂性与多样性,使得构建能够准确预测法律意见修辞角色的模型变得越来越具有挑战性。在这篇研究论文中,我们提出了一种新的模型架构,利用预训练语言模型(PLM)并融合文档中句子位置信息,来自动预测修辞角色。基于 LegalEval@SemEval2023 评测提供的标注语料,我们证明了与在全局上下文中采用层次模型的复杂架构相比,我们的方法所需参数更少、计算成本更低,却仍能取得出色的性能。此外,我们还表明,在仅基于 BERT 的局部上下文层次模型中加入更多注意力,并融合句子位置信息,能够进一步提升结果。
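
A minimal sketch of enriching a pretrained encoder with sentence-position information is shown below. It is not the authors' architecture: the concatenation scheme, the embedding table size, and the number of rhetorical-role labels are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PositionAwareRoleClassifier(nn.Module):
    """Sketch (not the paper's exact model): concatenate each sentence's
    pooled encoding from a PLM with an embedding of its position in the
    document, then classify its rhetorical role."""

    def __init__(self, hidden=768, max_sentences=512, num_roles=13):  # sizes are placeholders
        super().__init__()
        self.pos_emb = nn.Embedding(max_sentences, hidden)
        self.classifier = nn.Linear(2 * hidden, num_roles)

    def forward(self, sent_encodings, sent_positions):
        # sent_encodings: (num_sentences, hidden) pooled outputs of a pretrained encoder
        # sent_positions: (num_sentences,) integer index of each sentence in the document
        feats = torch.cat([sent_encodings, self.pos_emb(sent_positions)], dim=-1)
        return self.classifier(feats)  # (num_sentences, num_roles) role logits
```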

XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words

  • paper_url: http://arxiv.org/abs/2310.05235
  • repo_url: None
  • paper_authors: Robin Algayres, Pablo Diego-Simon, Benoit Sagot, Emmanuel Dupoux
  • for: 这篇论文旨在提升无文本监督条件下的语音分词(将口语切分为词单元)任务的性能。
  • methods: 论文利用最新的自监督语音模型,这类模型即使在低资源条件下也能通过微调快速适应新任务。借鉴半监督学习的思想,作者微调 XLS-R 模型去预测由顶尖语音分词系统(DPDP、VG-HuBERT、GradSeg、DP-Parse)产生的词边界;微调后的 XLS-R 再推断出新的词边界标签,用于下一轮微调。
  • results: 该方法持续提升了每个系统的性能,并在五种语言的语料库上以正确发现的词 token 的 F1 分数衡量,较先前最优结果平均提升 130%,创下新的最优水平。此外,该系统还能以零样本方式对微调时未见过的语言进行分词。
    Abstract Due to the absence of explicit word boundaries in the speech stream, the task of segmenting spoken sentences into word units without text supervision is particularly challenging. In this work, we leverage the most recent self-supervised speech models that have proved to quickly adapt to new tasks through fine-tuning, even in low resource conditions. Taking inspiration from semi-supervised learning, we fine-tune an XLS-R model to predict word boundaries themselves produced by top-tier speech segmentation systems: DPDP, VG-HuBERT, GradSeg and DP-Parse. Once XLS-R is fine-tuned, it is used to infer new word boundary labels that are used in turn for another fine-tuning step. Our method consistently improves the performance of each system and sets a new state-of-the-art that is, on average 130% higher than the previous one as measured by the F1 score on correctly discovered word tokens on five corpora featuring different languages. Finally, our system can segment speech from languages unseen during fine-tuning in a zero-shot fashion.
    摘要 由于语音流中没有显式的词边界,在没有文本监督的情况下将口语句子切分为词单元是一项特别具有挑战性的任务。在这项工作中,我们利用最新的自监督语音模型,这类模型即使在低资源条件下也能通过微调快速适应新任务。借鉴半监督学习的思想,我们微调 XLS-R 模型去预测由顶尖语音分词系统(DPDP、VG-HuBERT、GradSeg 和 DP-Parse)产生的词边界。微调后的 XLS-R 随后被用来推断新的词边界标签,这些标签又被用于下一轮微调。我们的方法持续提升了每个系统的性能,并在五种不同语言的语料库上,以正确发现的词 token 的 F1 分数衡量,比此前的最优结果平均高出 130%,创下了新的最优水平。最后,我们的系统还能以零样本的方式对微调时未见过的语言进行语音分词。
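
The self-training loop described in the abstract can be sketched as follows. All callables are placeholders (e.g. `initial_segmenter` standing in for DPDP or DP-Parse output, `finetune` for XLS-R fine-tuning with a boundary-prediction head); this is not the authors' code.

```python
from typing import Callable, Sequence

def iterative_boundary_self_training(
    unlabeled_audio: Sequence,
    initial_segmenter: Callable,
    finetune: Callable,
    predict: Callable,
    rounds: int = 2,
):
    """Sketch of the loop: start from noisy word boundaries produced by an
    unsupervised segmenter, fine-tune a speech model to predict them, then
    re-label the audio with the fine-tuned model and repeat."""
    boundaries = [initial_segmenter(wav) for wav in unlabeled_audio]   # noisy pseudo-labels
    model = None
    for _ in range(rounds):
        model = finetune(unlabeled_audio, boundaries)                  # e.g. XLS-R + boundary head
        boundaries = [predict(model, wav) for wav in unlabeled_audio]  # refreshed pseudo-labels
    return model, boundaries
```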

Generative Spoken Language Model based on continuous word-sized audio tokens

  • paper_url: http://arxiv.org/abs/2310.05224
  • repo_url: None
  • paper_authors: Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux
  • for: 该论文旨在提出一种基于word-size连续值音频嵌入的生成语言模型(GSLM),以便生成多样化和表达力强的语言输出。
  • methods: 该模型使用词汇嵌入(Lexical Embedding)函数取代词表查找表,用对比损失取代交叉熵损失,并用 k-NN 采样取代多项式采样。
  • results: 该模型的生成质量与基于离散单元的 GSLM 相当,自动指标和人工评价均表明生成质量良好;同时由于使用较长的 200 毫秒单元,内存效率提升了五倍。此外,词汇嵌入器(Lexical Embedder)前后的嵌入在语音和语义层面都具有可解释性。
    Abstract In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard input of spoken LMs are 20ms or 40ms-long discrete units (shorter than a phoneme). Taking inspiration from word-based LM, we introduce a Generative Spoken Language Model (GSLM) based on word-size continuous-valued audio embeddings that can generate diverse and expressive language output. This is obtained by replacing lookup table for lexical types with a Lexical Embedding function, the cross entropy loss by a contrastive loss, and multinomial sampling by k-NN sampling. The resulting model is the first generative language model based on word-size continuous embeddings. Its performance is on par with discrete unit GSLMs regarding generation quality as measured by automatic metrics and subjective human judgements. Moreover, it is five times more memory efficient thanks to its large 200ms units. In addition, the embeddings before and after the Lexical Embedder are phonetically and semantically interpretable.
    摘要 在自然语言处理领域,基于词或子词的文本语言模型通常优于基于字符的模型。然而,在语音领域,口语语言模型的标准输入是长度为 20 毫秒或 40 毫秒的离散单元(比一个音素还短)。受基于词的语言模型启发,我们提出了一种基于词长度连续值音频嵌入的生成式口语语言模型(GSLM),能够生成多样且富有表现力的语言输出。具体做法是:用词汇嵌入(Lexical Embedding)函数取代词表查找表,用对比损失取代交叉熵损失,并用 k-NN 采样取代多项式采样。该模型是首个基于词长度连续嵌入的生成式语言模型。无论是自动指标还是人工主观评价,其生成质量都与基于离散单元的 GSLM 相当;同时由于使用较长的 200 毫秒单元,其内存效率提高了五倍。此外,词汇嵌入器前后的嵌入在语音和语义层面都具有可解释性。
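
One of the three substitutions, k-NN sampling over continuous word-sized embeddings, can be sketched as below. The mapping from distances to sampling probabilities (softmax over negative distances) is an illustrative assumption rather than the paper's exact procedure.

```python
import torch

def knn_sample(predicted_vec, lexicon_embeddings, k=10, temperature=1.0):
    """Sketch of k-NN sampling for a continuous-embedding LM: retrieve the k
    lexical-embedding entries closest to the predicted vector and sample one,
    with closer entries receiving higher probability."""
    # predicted_vec: (D,) model output; lexicon_embeddings: (V, D) embedding table
    dists = torch.cdist(predicted_vec.unsqueeze(0), lexicon_embeddings).squeeze(0)  # (V,)
    neg_knn_dists, knn_idx = torch.topk(-dists, k)               # k nearest neighbours
    probs = torch.softmax(neg_knn_dists / temperature, dim=0)    # closer -> higher probability
    choice = torch.multinomial(probs, 1)
    return knn_idx[choice].item()                                # index of the sampled unit
```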

Probing Language Models from A Human Behavioral Perspective

  • paper_url: http://arxiv.org/abs/2310.05216
  • repo_url: None
  • paper_authors: Xintong Wang, Xiaoyu Li, Xingshan Li, Chris Biemann
  • for: This paper aims to provide a better understanding of how large language models (LLMs) work and how they make predictions.
  • methods: The authors use eye-tracking measures to correlate with the values produced by LLMs and compare them to those of recurrent neural network-based language models (RNN-LMs). They also analyze the functions of self-attention and gate mechanisms in LLMs.
  • results: The study finds that LLMs exhibit a distinct prediction pattern compared to RNN-LMs, with a peak in memorization and linguistic knowledge encoding as the number of feed-forward network (FFN) layers increases, followed by a pivot to comprehension capacity. The self-attention mechanisms are found to be distributed across multiple heads, and the gate mechanisms control the flow of information, with some gates promoting and others eliminating information.
    Abstract Large Language Models (LLMs) have emerged as dominant foundational models in modern NLP. However, the understanding of their prediction process and internal mechanisms, such as feed-forward networks and multi-head self-attention, remains largely unexplored. In this study, we probe LLMs from a human behavioral perspective, correlating values from LLMs with eye-tracking measures, which are widely recognized as meaningful indicators of reading patterns. Our findings reveal that LLMs exhibit a prediction pattern distinct from that of RNN-based LMs. Moreover, with the escalation of FFN layers, the capacity for memorization and linguistic knowledge encoding also surges until it peaks, subsequently pivoting to focus on comprehension capacity. The functions of self-attention are distributed across multiple heads. Lastly, we scrutinize the gate mechanisms, finding that they control the flow of information, with some gates promoting, while others eliminating information.
    摘要 大型语言模型(LLMs)已成为现代自然语言处理中占主导地位的基础模型。然而,对其预测过程和内部机制(如前馈网络和多头自注意力)的理解仍然相当有限。在这项研究中,我们从人类行为的角度对 LLMs 进行探测,将 LLMs 产生的数值与眼动追踪指标相关联,后者被广泛认为是反映阅读模式的有意义指标。我们的发现表明,LLMs 表现出与基于 RNN 的语言模型截然不同的预测模式。此外,随着前馈网络(FFN)层数的增加,模型的记忆能力和语言知识编码能力也随之上升,直至达到峰值,随后转向理解能力。自注意力的功能分布在多个注意力头上。最后,我们考察了门控机制,发现它们控制着信息的流动:一些门促进信息通过,另一些则将信息剔除。
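
The probing recipe, correlating per-word quantities from a language model with per-word eye-tracking measures, reduces to a simple correlation computation; the sketch below assumes the two sequences are already aligned word by word.

```python
import numpy as np
from scipy.stats import spearmanr

def correlate_with_gaze(model_values, gaze_durations):
    """Sketch of the probing idea: correlate a per-word quantity from the LM
    (e.g. surprisal, an FFN activation statistic, or an attention score) with
    a per-word eye-tracking measure (e.g. first-pass gaze duration)."""
    rho, p = spearmanr(np.asarray(model_values), np.asarray(gaze_durations))
    return rho, p

# Illustrative call with made-up numbers:
print(correlate_with_gaze([3.1, 7.8, 2.2, 9.0], [180, 260, 150, 310]))
```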

A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023

  • paper_url: http://arxiv.org/abs/2310.05203
  • repo_url: None
  • paper_authors: Ryuichi Yamamoto, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda
  • for: 本文介绍我们为 2023 年歌声转换挑战赛(SVCC)开发的 T13 系统,采用基于自监督学习表示的识别-合成方法。
  • methods: 该方法首先利用公开可用的 750 小时大规模语音与歌声数据,训练一个基于扩散模型的任意到任意语音转换模型,然后针对每个目标歌手/说话人进行微调。
  • results: 大规模听测显示,我们的 T13 系统在更困难的跨域 SVC 任务上取得了具有竞争力的自然度与说话人相似度,表明该方法具有良好的泛化能力。
    Abstract This paper presents our systems (denoted as T13) for the singing voice conversion challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice conversion (SVC) tasks (Task 1 and Task 2), we adopt a recognition-synthesis approach with self-supervised learning-based representation. To achieve data-efficient SVC with a limited amount of target singer/speaker's data (150 to 160 utterances for SVCC 2023), we first train a diffusion-based any-to-any voice conversion model using publicly available large-scale 750 hours of speech and singing data. Then, we finetune the model for each target singer/speaker of Task 1 and Task 2. Large-scale listening tests conducted by SVCC 2023 show that our T13 system achieves competitive naturalness and speaker similarity for the harder cross-domain SVC (Task 2), which implies the generalization ability of our proposed method. Our objective evaluation results show that using large datasets is particularly beneficial for cross-domain SVC.
    摘要 本文介绍我们为 2023 年歌声转换挑战赛(SVCC)开发的系统(简称 T13)。针对域内与跨域英语歌声转换(SVC)两项任务(任务 1 和任务 2),我们采用基于自监督学习表示的识别-合成方法。为了在目标歌手/说话人数据量有限(SVCC 2023 中仅 150 到 160 条语句)的情况下实现数据高效的 SVC,我们首先利用公开可用的 750 小时大规模语音与歌声数据训练一个基于扩散模型的任意到任意语音转换模型,然后针对任务 1 和任务 2 的每个目标歌手/说话人进行微调。SVCC 2023 组织的大规模听测显示,我们的 T13 系统在更困难的跨域 SVC(任务 2)上取得了具有竞争力的自然度与说话人相似度,表明所提方法具有良好的泛化能力。客观评估结果表明,使用大规模数据集对跨域 SVC 尤其有益。

Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2310.05199
  • repo_url: None
  • paper_authors: Wei Shen, Rui Zheng, Wenyu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang
  • for: 这篇论文的目的是如何使用人类反馈来改善大型自然语言模型,使其更好地适应人类和社会价值。
  • methods: 这篇论文使用了Product-of-Experts(PoE)技术,将奖励模型分为两部分:主要专家关注人类意图,而偏差专家则专注于识别和捕捉长度偏差。另外,为了进一步增强对偏差的学习,我们向偏差专家引入扰动,以打断语义信息的流动。
  • results: 实验结果显示,我们的方法可以改善语言模型的性能,不受序列长度的影响。
    Abstract Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. This alignment requires a vast corpus of human feedback to learn a reward model, which is subsequently used to finetune language models. However, we have identified that the reward model often finds shortcuts to bypass its intended objectives, misleadingly assuming that humans prefer longer responses. The emergence of length bias often induces the model to favor longer outputs, yet it doesn't equate to an increase in helpful information within these outputs. In this paper, we propose an innovative solution, applying the Product-of-Experts (PoE) technique to separate reward modeling from the influence of sequence length. In our framework, the main expert concentrates on understanding human intents, while the biased expert targets the identification and capture of length bias. To further enhance the learning of bias, we introduce perturbations into the bias-focused expert, disrupting the flow of semantic information. Experimental results validate the effectiveness of our approach, indicating that language model performance is improved, irrespective of sequence length.
    摘要 基于人类反馈的强化学习是一座关键桥梁,使大型语言模型与人类及社会价值保持一致。这种对齐需要大量的人类反馈来学习一个奖励模型,随后用该奖励模型微调语言模型。然而,我们发现奖励模型常常会寻找捷径来绕过其预期目标,错误地假设人类更偏好较长的回复。这种长度偏差往往使模型倾向于生成更长的输出,但输出变长并不等于其中包含更多有用的信息。在本文中,我们提出了一种创新的解决方案:应用 Product-of-Experts(PoE)技术,将奖励建模与序列长度的影响分离开来。在我们的框架中,主要专家专注于理解人类意图,而偏差专家则负责识别和捕捉长度偏差。为了进一步增强对偏差的学习,我们向偏差专家引入扰动,以打断语义信息的流动。实验结果验证了我们方法的有效性,表明无论序列长度如何,语言模型的性能都得到了提升。
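
A rough sketch of a Product-of-Experts pairwise reward objective is given below. The way the two experts' Bradley-Terry probabilities are combined and renormalized is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import torch

def poe_preference_loss(r_main_w, r_main_l, r_bias_w, r_bias_l):
    """Sketch of a Product-of-Experts pairwise loss for reward modeling.
    Each expert turns its reward margin between the chosen (w) and rejected (l)
    response into a Bradley-Terry preference probability; the two probabilities
    are multiplied and renormalized before the log-loss is taken."""
    p_main = torch.sigmoid(r_main_w - r_main_l)   # main expert: human intent
    p_bias = torch.sigmoid(r_bias_w - r_bias_l)   # bias expert: length preference
    p = (p_main * p_bias) / (p_main * p_bias + (1 - p_main) * (1 - p_bias) + 1e-8)
    return -torch.log(p + 1e-8).mean()

# At RLHF time only the main expert's reward would be used, so the length
# shortcut absorbed by the bias expert does not leak into policy optimization.
```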

FABRIC: Automated Scoring and Feedback Generation for Essays

  • paper_url: http://arxiv.org/abs/2310.05191
  • repo_url: None
  • paper_authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Hyunseung Lim, Yoonsu Kim, Tak Yeon Lee, Hwajung Hong, Juho Kim, So-Yeon Ahn, Alice Oh
  • for: 这个论文是为了提供一种自动生成英语写作评分的工具,以帮助学生和教师在写作课程中更好地评分和反馈写作。
  • methods: 该论文使用了一种管道模型,包括DREsS、CASE和EssayCoT三部分。DREsS是一个基于标准的写作评分数据集,CASE是一种伪造策略,可以提高模型的准确率。EssayCoT是一种写作思维推荐策略,可以根据模型预测的分数提供更好的反馈。
  • results: 论文表明,使用新的数据集DREsS和伪造策略CASE可以提高模型的准确率,并且使用EssayCoT可以提供更好的反馈。论文还表明,学生和教师对新的评分和反馈表示满意,评分和反馈的帮助程度也得到了提升。
    Abstract Automated essay scoring (AES) provides a useful tool for students and instructors in writing classes by generating essay scores in real-time. However, previous AES models do not provide more specific rubric-based scores nor feedback on how to improve the essays, which can be even more important than the overall scores for learning. We present FABRIC, a pipeline to help students and instructors in English writing classes by automatically generating 1) the overall scores, 2) specific rubric-based scores, and 3) detailed feedback on how to improve the essays. Under the guidance of English education experts, we chose the rubrics for the specific scores as content, organization, and language. The first component of the FABRIC pipeline is DREsS, a real-world Dataset for Rubric-based Essay Scoring (DREsS). The second component is CASE, a Corruption-based Augmentation Strategy for Essays, with which we can improve the accuracy of the baseline model by 45.44%. The third component is EssayCoT, the Essay Chain-of-Thought prompting strategy which uses scores predicted from the AES model to generate better feedback. We evaluate the effectiveness of the new dataset DREsS and the augmentation strategy CASE quantitatively and show significant improvements over the models trained with existing datasets. We evaluate the feedback generated by EssayCoT with English education experts to show significant improvements in the helpfulness of the feedback across all rubrics. Lastly, we evaluate the FABRIC pipeline with students in a college English writing class who rated the generated scores and feedback with an average of 6 on the Likert scale from 1 to 7.
    摘要 自动化作文评分(AES)能够实时生成作文分数,为写作课程中的学生和教师提供了有用的工具。然而,以往的 AES 模型既不能给出更具体的、基于评分细则的分数,也不能提供如何改进作文的反馈,而这些对学习而言可能比总分更为重要。我们提出了 FABRIC 流水线,帮助英语写作课程中的学生和教师自动生成:1)总分;2)基于评分细则(内容、组织、语言)的具体分数;3)关于如何改进作文的详细反馈。FABRIC 的第一个组成部分是 DREsS,一个面向基于评分细则的作文评分的真实数据集;第二个组成部分是 CASE,一种基于数据损坏的作文增强策略,可将基线模型的准确率提升 45.44%;第三个组成部分是 EssayCoT,一种作文思维链提示策略,利用 AES 模型预测的分数生成更好的反馈。我们定量评估了新数据集 DREsS 与增强策略 CASE 的有效性,结果显著优于使用现有数据集训练的模型;由英语教育专家评估的 EssayCoT 反馈在所有评分维度上的有用性均有显著提升。最后,我们在一门大学英语写作课上对 FABRIC 流水线进行了评估,学生对生成的分数和反馈在 1 到 7 的 Likert 量表上平均给出 6 分。

Do Large Language Models Know about Facts?

  • paper_url: http://arxiv.org/abs/2310.05177
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Xuming Hu, Junzhe Chen, Xiaochuan Li, Yufei Guo, Lijie Wen, Philip S. Yu, Zhijiang Guo
  • for: 该论文旨在评估大型语言模型(LLMs)所掌握的事实知识的范围与深度,以及这些模型能否组合、更新事实并抵御对抗样本。
  • methods: 该论文构建了名为 Pinocchio 的基准,包含 20,000 个来自不同来源、时间线、领域、地区和语言的多样化事实问题,用于评估 LLMs 的事实知识。
  • results: 大量实验表明,现有的 LLMs 仍然缺乏事实知识,并且受到多种虚假相关性的影响。
    Abstract Large language models (LLMs) have recently driven striking performance improvements across a range of natural language processing tasks. The factual knowledge acquired during pretraining and instruction tuning can be useful in various downstream tasks, such as question answering, and language generation. Unlike conventional Knowledge Bases (KBs) that explicitly store factual knowledge, LLMs implicitly store facts in their parameters. Content generated by the LLMs can often exhibit inaccuracies or deviations from the truth, due to facts that can be incorrectly induced or become obsolete over time. To this end, we aim to comprehensively evaluate the extent and scope of factual knowledge within LLMs by designing the benchmark Pinocchio. Pinocchio contains 20K diverse factual questions that span different sources, timelines, domains, regions, and languages. Furthermore, we investigate whether LLMs are able to compose multiple facts, update factual knowledge temporally, reason over multiple pieces of facts, identify subtle factual differences, and resist adversarial examples. Extensive experiments on different sizes and types of LLMs show that existing LLMs still lack factual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing trustworthy artificial intelligence. The dataset Pinocchio and our codes will be publicly available.
    摘要 大型语言模型(LLM)近来在一系列自然语言处理任务上带来了显著的性能提升。模型在预训练和指令微调过程中获得的事实知识,可以在问答、语言生成等多种下游任务中发挥作用。与显式存储事实知识的传统知识库(KB)不同,LLM 将事实隐式地存储在模型参数之中。LLM 生成的内容常常包含不准确或偏离事实的信息,因为事实可能被错误地归纳,或随时间推移而过时。为此,我们设计了基准 Pinocchio,以全面评估 LLM 中事实知识的范围与深度。Pinocchio 包含 20,000 个多样化的事实问题,覆盖不同的来源、时间线、领域、地区和语言。此外,我们还考察了 LLM 能否组合多条事实、在时间维度上更新事实知识、对多条事实进行推理、识别细微的事实差异,以及抵抗对抗样本。在不同规模和类型的 LLM 上进行的大量实验表明,现有 LLM 仍然缺乏事实知识,并受到多种虚假相关性的影响。我们认为这是实现可信人工智能的关键瓶颈。Pinocchio 数据集与代码将公开发布。

On the Zero-Shot Generalization of Machine-Generated Text Detectors

  • paper_url: http://arxiv.org/abs/2310.05165
  • repo_url: None
  • paper_authors: Xiao Pu, Jingyu Zhang, Xiaochuang Han, Yulia Tsvetkov, Tianxing He
  • for: 本研究的目的是检测机器生成的文本,以确定新生成器输出的真实性。
  • methods: 本研究使用了许多大语言模型生成的数据,并使用神经网络检测器来检测机器生成的文本。
  • results: 研究发现,使用中等规模语言模型生成的数据训练的检测器,能够零样本泛化到同系列更大规模的模型。这表明,可以基于中等规模模型的数据(及其集成)构建稳健的机器生成文本检测器。
    Abstract The rampant proliferation of large language models, fluent enough to generate text indistinguishable from human-written language, gives unprecedented importance to the detection of machine-generated text. This work is motivated by an important research question: How will the detectors of machine-generated text perform on outputs of a new generator, that the detectors were not trained on? We begin by collecting generation data from a wide range of LLMs, and train neural detectors on data from each generator and test its performance on held-out generators. While none of the detectors can generalize to all generators, we observe a consistent and interesting pattern that the detectors trained on data from a medium-size LLM can zero-shot generalize to the larger version. As a concrete application, we demonstrate that robust detectors can be built on an ensemble of training data from medium-sized models.
    摘要 大型语言模型的迅速普及,使其生成的文本足以与人类撰写的文本难以区分,这让机器生成文本的检测变得前所未有地重要。这项工作围绕一个重要的研究问题展开:当检测器面对其未曾训练过的新生成器的输出时,表现会如何?我们首先从多种大型语言模型收集生成数据,针对每个生成器的数据训练神经检测器,并在留出的生成器上测试其性能。虽然没有任何一个检测器能够泛化到所有生成器,但我们观察到一个一致且有趣的规律:在中等规模语言模型数据上训练的检测器,能够零样本泛化到同系列的更大模型。作为一个具体应用,我们展示了可以基于多个中等规模模型训练数据的集成来构建稳健的检测器。

An Investigation of LLMs’ Inefficacy in Understanding Converse Relations

  • paper_url: http://arxiv.org/abs/2310.05163
  • repo_url: https://github.com/3b-group/convre
  • paper_authors: Chengwen Qi, Bowen Li, Binyuan Hui, Bailin Wang, Jinyang Li, Jinwang Wu, Yuanjun Laili
  • for: 本文以逆二元关系(converse binary relation)这一特例,研究 LLMs 是否真正理解形式语言的结构化语义。
  • methods: 本文提出了一个新的基准 ConvRE,包含 17 种关系和从流行的知识图谱补全数据集中抽取的 1240 个三元组。该基准包含 Re2Text 和 Text2Re 两个任务,均以多项选择问答的形式评估 LLMs 判断关系与相应文本是否匹配的能力。
  • results: 实验表明,LLMs 经常依赖捷径学习,在我们提出的基准上仍面临挑战。
    Abstract Large Language Models (LLMs) have achieved remarkable success in many formal language oriented tasks, such as structural data-to-text and semantic parsing. However current benchmarks mostly follow the data distribution of the pre-training data of LLMs. Therefore, a natural question rises that do LLMs really understand the structured semantics of formal languages. In this paper, we investigate this problem on a special case, converse binary relation. We introduce a new benchmark ConvRe focusing on converse relations, which contains 17 relations and 1240 triples extracted from popular knowledge graph completion datasets. Our ConvRE features two tasks, Re2Text and Text2Re, which are formulated as multi-choice question answering to evaluate LLMs' ability to determine the matching between relations and associated text. For the evaluation protocol, apart from different prompting methods, we further introduce variants to the test text and few-shot example text. We conduct experiments on three popular LLM families and have observed various scaling trends. The results suggest that LLMs often resort to shortcut learning and still face challenges on our proposed benchmark.
    摘要 大型语言模型(LLMs)在许多面向形式语言的任务(如结构化数据到文本、语义解析)中取得了显著成功。然而,现有基准大多遵循 LLMs 预训练数据的分布。由此自然产生一个问题:LLMs 是否真正理解形式语言的结构化语义?本文以逆二元关系(converse binary relation)这一特例来研究这一问题。我们提出了一个关注逆关系的新基准 ConvRE,其中包含 17 种关系和从流行的知识图谱补全数据集中抽取的 1240 个三元组。ConvRE 包含 Re2Text 和 Text2Re 两个任务,均以多项选择问答的形式来评估 LLMs 判断关系与相应文本是否匹配的能力。在评估协议方面,除了不同的提示方法外,我们还对测试文本和少样本示例文本引入了变体。我们在三个流行的 LLM 系列上进行了实验,观察到多种随规模变化的趋势。结果表明,LLMs 经常依赖捷径学习,在我们提出的基准上仍面临挑战。

Recurrent Neural Language Models as Probabilistic Finite-state Automata

  • paper_url: http://arxiv.org/abs/2310.05161
  • repo_url: None
  • paper_authors: Anej Svete, Ryan Cotterell
  • for: 本文通过已被充分理解的形式化框架来研究语言模型(LM),以精确刻画其能力与局限。
  • methods: 本文以循环神经网络(RNN)语言模型为对象,研究其能够表示哪些字符串上的概率分布。
  • results: 研究结果表明,简单的 RNN 等价于概率有限状态自动机的一个子类,只能表示有限状态模型可表达分布的一个严格子集;并且要表示一个在字母表 $\Sigma$ 上具有 $N$ 个状态的任意确定性有限状态语言模型,RNN 需要 $\Omega\left(N |\Sigma|\right)$ 个神经元。
    Abstract Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the representational capacity of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages -- rather, they define probability distributions over strings. In this work, we study what classes of such probability distributions RNN LMs can represent, which allows us to make more direct statements about their capabilities. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of probability distributions expressible by finite-state models. Furthermore, we study the space complexity of representing finite-state LMs with RNNs. We show that, to represent an arbitrary deterministic finite-state LM with $N$ states over an alphabet $\Sigma$, an RNN requires $\Omega\left(N |\Sigma|\right)$ neurons. These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.
    摘要 以已被充分理解的形式化框架来研究语言模型(LM),可以精确刻画其能力与局限。先前的工作从识别无权形式语言的能力出发,研究了循环神经网络(RNN)语言模型的表示能力。然而,语言模型描述的并不是无权形式语言,而是字符串上的概率分布。在这项工作中,我们研究 RNN 语言模型能够表示哪些此类概率分布,从而能够更直接地阐明其能力。我们证明,简单的 RNN 等价于概率有限状态自动机的一个子类,因此只能建模有限状态模型可表达的概率分布的一个严格子集。此外,我们研究了用 RNN 表示有限状态语言模型的空间复杂度:要表示一个在字母表 $\Sigma$ 上具有 $N$ 个状态的任意确定性有限状态语言模型,RNN 需要 $\Omega\left(N |\Sigma|\right)$ 个神经元。这些结果是刻画 RNN 语言模型可表示分布类别的第一步,有助于我们理解其能力与局限。
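
For reference, the string probability assigned by a probabilistic finite-state automaton (PFSA), the object RNN LMs are compared against here, can be written as below; λ, τ, ρ denote initial, transition, and final weights in standard PFSA notation (the notation itself is not taken from the paper).

```latex
p(w_1 \dots w_T) \;=\; \sum_{q_0, \dots, q_T} \lambda(q_0)\,\Bigg[\prod_{t=1}^{T} \tau(q_{t-1}, w_t, q_t)\Bigg]\,\rho(q_T)
```

For a deterministic PFSA the sum collapses to a single state path, and the paper's bound states that a simple RNN needs $\Omega\left(N |\Sigma|\right)$ neurons to represent an arbitrary deterministic finite-state LM with $N$ states over alphabet $\Sigma$.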

  • paper_url: http://arxiv.org/abs/2310.05150
  • repo_url: https://github.com/sebischair/kg-conv-exploratory-search
  • paper_authors: Phillip Schneider, Nils Rehtanz, Kristiina Jokinen, Florian Matthes
  • for: 这篇研究旨在探索新闻文章中的探索搜寻,以实现对话式搜寻和知识库的融合,从而将结构化和无结构化资料搜寻融合在一起。
  • methods: 本研究使用了对话式搜寻系统和知识库来支持探索搜寻,并透过自然语言问题来询问新闻文章中的相关资讯。
  • results: 根据54名参与者的用户研究,这种基于知识库的对话式搜寻系统被证明是有效的,并且提供了开发这类系统的设计假设。
    Abstract Exploratory search is an open-ended information retrieval process that aims at discovering knowledge about a topic or domain rather than searching for a specific answer or piece of information. Conversational interfaces are particularly suitable for supporting exploratory search, allowing users to refine queries and examine search results through interactive dialogues. In addition to conversational search interfaces, knowledge graphs are also useful in supporting information exploration due to their rich semantic representation of data items. In this study, we demonstrate the synergistic effects of combining knowledge graphs and conversational interfaces for exploratory search, bridging the gap between structured and unstructured information retrieval. To this end, we propose a knowledge-driven dialogue system for exploring news articles by asking natural language questions and using the graph structure to navigate between related topics. Based on a user study with 54 participants, we empirically evaluate the effectiveness of the graph-based exploratory search and discuss design implications for developing such systems.
    摘要 探索式搜寻是一种开放式的信息检索过程,旨在探索某一主题或领域的知识,而非寻找特定的答案或信息。对话式界面特别适合支持探索式搜寻,让使用者能够通过互动对话细化查询并检视搜寻结果。除了对话式搜寻界面之外,知识图谱因其对数据项丰富的语义表示,也有助于支持信息探索。在这项研究中,我们展示了将知识图谱与对话式界面相结合用于探索式搜寻的协同效应,弥合结构化与非结构化信息检索之间的差距。为此,我们提出了一个基于知识的对话系统,让使用者通过自然语言提问来探索新闻文章,并利用图结构在相关主题之间导航。基于 54 名参与者的用户研究,我们实证评估了基于图的探索式搜寻的有效性,并讨论了开发此类系统的设计启示。

Retrieval-Generation Synergy Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05149
  • repo_url: None
  • paper_authors: Zhangyin Feng, Xiaocheng Feng, Dezhi Zhao, Maojin Yang, Bing Qin
  • for: 提高大型语言模型的知识利用能力和多步推理能力
  • methods: 将任务相关文档与大型语言模型相融合,通过检索-生成协同机制,同时利用参数化与非参数化知识,帮助找到正确的推理路径
  • results: 在四个问答任务(包括单跳与多跳问答)上,实验结果表明我们的方法能显著提升大型语言模型的推理能力,并超越先前的基线。
    Abstract Large language models augmented with task-relevant documents have demonstrated impressive performance on knowledge-intensive tasks. However, regarding how to obtain effective documents, the existing methods are mainly divided into two categories. One is to retrieve from an external knowledge base, and the other is to utilize large language models to generate documents. We propose an iterative retrieval-generation collaborative framework. It is not only able to leverage both parametric and non-parametric knowledge, but also helps to find the correct reasoning path through retrieval-generation interactions, which is very important for tasks that require multi-step reasoning. We conduct experiments on four question answering datasets, including single-hop QA and multi-hop QA tasks. Empirical results show that our method significantly improves the reasoning ability of large language models and outperforms previous baselines.
    摘要 大型语言模型,通过与任务相关的文档的协同工作,已经在知识型任务中表现出了惊人的表现。然而,现有的方法主要分为两类:一是从外部知识库中检索,另一是利用大型语言模型生成文档。我们提出了一种迭代检索生成协同框架,不仅能充分利用参数化和非参数化知识,而且能够通过检索生成互动,找到正确的逻辑路径,这对于需要多步逻辑的任务非常重要。我们在四个问答dataset上进行了实验,包括单步QA和多步QA任务。实验结果表明,我们的方法可以显著提高大型语言模型的逻辑能力,并超越先前的基elines。
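
The iterative retrieval-generation collaboration described above can be sketched as a simple loop; `retrieve` and `generate` are placeholders for a retriever and an LLM call, and the way the draft answer is folded back into the next query is an illustrative choice.

```python
from typing import Callable, List

def iterative_retrieve_generate(question: str,
                                retrieve: Callable[[str], List[str]],
                                generate: Callable[[str, List[str]], str],
                                iterations: int = 3) -> str:
    """Sketch of an iterative retrieval-generation loop: documents retrieved
    for the current query condition the next generation, and that generation
    is fed back as an enriched query for the next retrieval round."""
    query, answer = question, ""
    for _ in range(iterations):
        docs = retrieve(query)                 # non-parametric knowledge
        answer = generate(question, docs)      # parametric + retrieved knowledge
        query = question + " " + answer        # let the draft guide the next retrieval
    return answer
```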

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

  • paper_url: http://arxiv.org/abs/2310.05130
  • repo_url: https://github.com/baoguangsheng/fast-detect-gpt
  • paper_authors: Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang
  • for: 本研究旨在區分機器生成與人類撰寫的內容,以建立可信賴的人工智能系統。
  • methods: 本研究提出條件概率曲率(conditional probability curvature),用以刻畫在給定上下文中大型語言模型與人類在用詞選擇上的差異,並據此構建零樣本檢測器 Fast-DetectGPT,以更高效的採樣步驟取代 DetectGPT 的擾動步驟。
  • results: Fast-DetectGPT 在白盒與黑盒設定下均優於 DetectGPT,可在不同的數據集、來源模型和測試條件下提升檢測效能,並將檢測過程加速約 340 倍。
    Abstract Large language models (LLMs) have shown the ability to produce fluent and cogent content, presenting both productivity opportunities and societal risks. To build trustworthy AI systems, it is imperative to distinguish between machine-generated and human-authored content. The leading zero-shot detector, DetectGPT, showcases commendable performance but is marred by its intensive computational costs. In this paper, we introduce the concept of conditional probability curvature to elucidate discrepancies in word choices between LLMs and humans within a given context. Utilizing this curvature as a foundational metric, we present Fast-DetectGPT, an optimized zero-shot detector, which substitutes DetectGPT's perturbation step with a more efficient sampling step. Our evaluations on various datasets, source models, and test conditions indicate that Fast-DetectGPT not only outperforms DetectGPT in both the white-box and black-box settings but also accelerates the detection process by a factor of 340, as detailed in Table 1.
    摘要 大型语言模型(LLMs)已展现出生成流畅且有说服力内容的能力,这既带来了生产力机遇,也带来了社会风险。要构建可信的人工智能系统,区分机器生成与人类撰写的内容至关重要。领先的零样本检测器 DetectGPT 表现可圈可点,但其计算开销极大。本文提出条件概率曲率的概念,用于刻画在给定上下文中 LLMs 与人类在用词选择上的差异。以该曲率作为基础度量,我们提出了经过优化的零样本检测器 Fast-DetectGPT,它以更高效的采样步骤取代了 DetectGPT 的扰动步骤。在多个数据集、源模型和测试条件下的评估表明,Fast-DetectGPT 不仅在白盒与黑盒设定下均优于 DetectGPT,还将检测过程加速了约 340 倍(详见原文表 1)。
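
A simplified sketch of a conditional-probability-curvature style score is shown below: the log-likelihood of the observed tokens is compared against the mean and variance of the log-likelihood expected if each token were re-sampled from the model's own conditional distribution. The analytic form and token alignment are simplifications; the official implementation in the linked repository may differ in detail.

```python
import torch

def conditional_probability_curvature(logits, labels):
    """logits: (T, V) next-token logits from a causal LM, already aligned so
    that logits[t] predicts labels[t]; labels: (T,) observed next tokens.
    Returns a scalar score; larger values are taken to suggest machine text."""
    log_probs = torch.log_softmax(logits, dim=-1)       # (T, V)
    probs = log_probs.exp()
    # Log-likelihood of the observed continuation.
    ll_observed = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum()
    # Per-position expectation and variance of the token log-probability under
    # the model's own conditional distribution, summed over positions.
    mean_ll = (probs * log_probs).sum(dim=-1).sum()
    var_ll = (probs * log_probs.pow(2)).sum(dim=-1).sum() \
             - (probs * log_probs).sum(dim=-1).pow(2).sum()
    return ((ll_observed - mean_ll) / var_ll.sqrt()).item()
```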

Enhancing Document-level Event Argument Extraction with Contextual Clues and Role Relevance

  • paper_url: http://arxiv.org/abs/2310.05991
  • repo_url: https://github.com/LWL-cpu/SCPRG-master
  • paper_authors: Wanlong Liu, Shaohuan Cheng, Dingyi Zeng, Hong Qu
  • for: 这个论文主要针对文档级事件论元抽取中的新挑战,即输入长度大、需要跨句推理。
  • methods: 我们提出了一种基于Span-trigger-based Contextual Pooling和 latent Role Guidance的SCPRG模型,包括两个新的有效模块,即 Span-Trigger-based Contextual Pooling(STCP)和 Role-based Latent Information Guidance (RLIG)。
  • results: 我们的SCPRG模型在两个公共数据集上进行了比较,与之前的状态态方法相比,提高了1.13和2.64的F1分数。
    Abstract Document-level event argument extraction poses new challenges of long input and cross-sentence inference compared to its sentence-level counterpart. However, most prior works focus on capturing the relations between candidate arguments and the event trigger in each event, ignoring two crucial points: a) non-argument contextual clue information; b) the relevance among argument roles. In this paper, we propose a SCPRG (Span-trigger-based Contextual Pooling and latent Role Guidance) model, which contains two novel and effective modules for the above problem. The Span-Trigger-based Contextual Pooling(STCP) adaptively selects and aggregates the information of non-argument clue words based on the context attention weights of specific argument-trigger pairs from pre-trained model. The Role-based Latent Information Guidance (RLIG) module constructs latent role representations, makes them interact through role-interactive encoding to capture semantic relevance, and merges them into candidate arguments. Both STCP and RLIG introduce no more than 1% new parameters compared with the base model and can be easily applied to other event extraction models, which are compact and transplantable. Experiments on two public datasets show that our SCPRG outperforms previous state-of-the-art methods, with 1.13 F1 and 2.64 F1 improvements on RAMS and WikiEvents respectively. Further analyses illustrate the interpretability of our model.
    摘要 与句子级任务相比,文档级事件论元抽取带来了输入更长、需要跨句推理等新挑战。然而,大多数已有工作只关注每个事件中候选论元与事件触发词之间的关系,忽略了两个关键点:a)非论元的上下文线索信息;b)论元角色之间的语义相关性。在本文中,我们提出了 SCPRG(Span-trigger-based Contextual Pooling and latent Role Guidance)模型,其中包含针对上述问题的两个新颖且有效的模块。基于跨度-触发词的上下文池化模块(STCP)根据预训练模型中特定论元-触发词对的上下文注意力权重,自适应地选择并聚合非论元线索词的信息;基于角色的潜在信息引导模块(RLIG)构建潜在角色表示,通过角色交互编码使其相互作用以捕捉语义相关性,并将其融入候选论元。STCP 和 RLIG 相比基础模型新增的参数不超过 1%,结构紧凑、易于移植,可以方便地应用于其他事件抽取模型并带来显著的性能提升。在两个公共数据集上的实验表明,我们的 SCPRG 优于此前的最先进方法,在 RAMS 和 WikiEvents 上分别提升了 1.13 和 2.64 个 F1 分数。进一步的分析也展示了模型的可解释性。
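
One simplified reading of span-trigger-based contextual pooling is sketched below: non-argument context tokens are pooled with weights derived from the attention they receive from the candidate span and the trigger. This is only a rough illustration of the idea, not the authors' implementation.

```python
import torch

def span_trigger_contextual_pooling(token_states, attn, span_idx, trigger_idx, context_idx):
    """token_states: (T, H) encoder outputs; attn: (T, T) attention weights
    averaged over heads/layers; span_idx, trigger_idx, context_idx: lists of
    token positions. Context tokens are pooled with weights given by the
    attention they receive from the candidate span and the trigger."""
    weights = attn[span_idx][:, context_idx].mean(0) + attn[trigger_idx][:, context_idx].mean(0)
    weights = torch.softmax(weights, dim=0)                              # (num_context,)
    return (weights.unsqueeze(-1) * token_states[context_idx]).sum(0)    # (H,) pooled clue vector
```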

CARLG: Leveraging Contextual Clues and Role Correlations for Improving Document-level Event Argument Extraction

  • paper_url: http://arxiv.org/abs/2310.05116
  • repo_url: None
  • paper_authors: Wanlong Liu, Wenyu Chen, Dingyi Zeng, Li Zhou, Hong Qu
  • for: 提高文档级事件论元抽取(EAE)的精度。
  • methods: 提出了利用上下文线索和角色相关性的 CARLG 模型,包括上下文线索聚合(CCA)模块和基于角色的潜在信息引导(RLIG)模块,分别利用上下文注意力权重和角色交互编码,从而提升文档级抽取的精度。
  • results: 在 RAMS、WikiEvents 和 MLEE 数据集上进行了广泛的实验,证明了 CARLG 模型的优越性:与之前的最先进方法相比,分别提升了 1.26、1.22 和 1.98 个 F1 分数,同时将推理时间降低了 31%。
    Abstract Document-level event argument extraction (EAE) is a crucial but challenging subtask in information extraction. Most existing approaches focus on the interaction between arguments and event triggers, ignoring two critical points: the information of contextual clues and the semantic correlations among argument roles. In this paper, we propose the CARLG model, which consists of two modules: Contextual Clues Aggregation (CCA) and Role-based Latent Information Guidance (RLIG), effectively leveraging contextual clues and role correlations for improving document-level EAE. The CCA module adaptively captures and integrates contextual clues by utilizing context attention weights from a pre-trained encoder. The RLIG module captures semantic correlations through role-interactive encoding and provides valuable information guidance with latent role representation. Notably, our CCA and RLIG modules are compact, transplantable and efficient, which introduce no more than 1% new parameters and can be easily equipped on other span-base methods with significant performance boost. Extensive experiments on the RAMS, WikiEvents, and MLEE datasets demonstrate the superiority of the proposed CARLG model. It outperforms previous state-of-the-art approaches by 1.26 F1, 1.22 F1, and 1.98 F1, respectively, while reducing the inference time by 31%. Furthermore, we provide detailed experimental analyses based on the performance gains and illustrate the interpretability of our model.
    摘要 文档级事件论元抽取(EAE)是信息抽取中关键但具有挑战性的子任务。现有方法大多关注论元与事件触发词之间的交互,忽略了两个要点:上下文线索信息,以及论元角色之间的语义相关性。在本文中,我们提出了 CARLG 模型,它由两个模块组成:上下文线索聚合(CCA)模块和基于角色的潜在信息引导(RLIG)模块,能够有效利用上下文线索和角色相关性来改进文档级 EAE。CCA 模块利用预训练编码器的上下文注意力权重,自适应地捕捉并整合上下文线索;RLIG 模块通过角色交互编码捕捉论元角色之间的语义相关性,并以潜在角色表示提供有价值的信息引导。CCA 和 RLIG 模块都十分紧凑、可移植且高效,新增参数不超过 1%,可以方便地应用于其他基于跨度的方法并带来显著的性能提升。在 RAMS、WikiEvents 和 MLEE 数据集上的大量实验证明了 CARLG 模型的优越性:与之前的最先进方法相比,分别提升 1.26、1.22 和 1.98 个 F1 分数,同时将推理时间降低 31%。此外,我们还基于性能提升进行了详细的实验分析,并展示了模型的可解释性。

Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction

  • paper_url: http://arxiv.org/abs/2310.05092
  • repo_url: None
  • paper_authors: Jun Gao, Huan Zhao, Yice Zhang, Wei Wang, Changlong Yu, Ruifeng Xu
  • for: 本研究旨在探讨大语言模型(LLMs)在自然语言处理中的信息提取 task 中的应用。
  • methods: 本研究构建了一个面向细粒度信息提取、专为 LLMs 设计的基准数据集,并为每种信息类型采用了增强的指令,包括任务描述、提取规则、输出格式和示例。
  • results: 我们的研究发现,encoder-decoder 模型(特别是 T5 和 FLAN-T5)对未见过的信息类型具有良好的泛化能力,而 ChatGPT 对新的任务形式表现出更强的适应性。结果还表明,性能并非仅由模型规模决定,模型架构、数据多样性和学习技术同样发挥着重要作用。这项研究为 LLMs 在信息提取中更加精细、多样的应用铺平了道路。
    Abstract Information Extraction (IE) is an essential task in Natural Language Processing. Traditional methods have relied on coarse-grained extraction with simple instructions. However, with the emergence of Large Language Models (LLMs), there is a need to adapt IE techniques to leverage the capabilities of these models. This paper introduces a fine-grained IE benchmark dataset tailored for LLMs, employing augmented instructions for each information type, which includes task descriptions, extraction rules, output formats, and examples. Through extensive evaluations, we observe that encoder-decoder models, particularly T5 and FLAN-T5, perform well in generalizing to unseen information types, while ChatGPT exhibits greater adaptability to new task forms. Our results also indicate that performance is not solely dictated by model scale, and highlight the significance of architecture, data diversity, and learning techniques. This work paves the way for a more refined and versatile utilization of LLMs in Information Extraction.
    摘要 信息提取(IE)是自然语言处理中的一项重要任务。传统方法通常依赖简单指令进行粗粒度提取。然而,随着大型语言模型(LLMs)的出现,需要调整 IE 技术以充分利用这些模型的能力。本文介绍了一个专为 LLMs 设计的细粒度 IE 基准数据集,为每种信息类型采用增强的指令,包括任务描述、提取规则、输出格式和示例。通过广泛的评估,我们发现 encoder-decoder 模型(特别是 T5 和 FLAN-T5)对未见过的信息类型具有良好的泛化能力,而 ChatGPT 对新的任务形式表现出更强的适应性。我们的结果还表明,性能并非仅由模型规模决定,模型架构、数据多样性和学习技术同样具有重要意义。这项工作为 LLMs 在信息提取中更加精细、多样的应用铺平了道路。
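
An augmented instruction of the kind described above could look like the following; the wording, information type, and output format are invented for illustration and are not taken from the benchmark.

```python
# Illustrative "augmented instruction" for one information type.
augmented_instruction = """Task: Extract all PERSON entities from the input text.
Extraction rules:
- Only extract real people's names; exclude organizations and fictional characters.
- Keep each name exactly as it appears in the text.
Output format: a JSON list of strings, e.g. ["Marie Curie", "Alan Turing"].
Example:
Input: "Ada Lovelace worked with Charles Babbage."
Output: ["Ada Lovelace", "Charles Babbage"]

Input: "{text}"
Output:"""

prompt = augmented_instruction.format(text="Grace Hopper joined the U.S. Navy in 1943.")
print(prompt)
```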

Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information

  • paper_url: http://arxiv.org/abs/2310.05073
  • repo_url: https://github.com/luoxiaoheics/ecase
  • paper_authors: Yun Luo, Zhen Yang, Fandong Meng, Yingjie Li, Jie Zhou, Yue Zhang
  • for: 本研究旨在提高对文档中Arguments的结构分析性能。
  • methods: 我们提出了一种高效的上下文感知ASE模型(ECASE),利用上下文信息来增强模型的表达能力和训练数据。具体来说,我们引入了序列注意力模块和距离权重相似损失函数,以便聚合上下文信息和 argumentative 信息。此外,我们还随机屏蔽了文档中的讨论标识符和句子,以降低模型对特定单词或 menos informative 句子的依赖。
  • results: 我们在五个不同领域的五个数据集上进行了实验,并确认了我们的模型在这些数据集上的状态知识表现。此外,我们还进行了减少模块的研究,以证明每个模块在我们的模型中的效果。
    Abstract Argument structure extraction (ASE) aims to identify the discourse structure of arguments within documents. Previous research has demonstrated that contextual information is crucial for developing an effective ASE model. However, we observe that merely concatenating sentences in a contextual window does not fully utilize contextual information and can sometimes lead to excessive attention on less informative sentences. To tackle this challenge, we propose an Efficient Context-aware ASE model (ECASE) that fully exploits contextual information by enhancing modeling capacity and augmenting training data. Specifically, we introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information. Additionally, we augment the training data by randomly masking discourse markers and sentences, which reduces the model's reliance on specific words or less informative sentences. Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance. Furthermore, ablation studies confirm the effectiveness of each module in our model.
    摘要 论辩结构抽取(ASE)旨在识别文档中论辩的语篇结构。以往研究表明,上下文信息对于构建有效的 ASE 模型至关重要。然而,我们观察到,仅仅将上下文窗口内的句子拼接起来并不能充分利用上下文信息,有时还会使模型过度关注信息量较低的句子。为应对这一挑战,我们提出了一种高效的上下文感知 ASE 模型(ECASE),通过增强建模能力和扩充训练数据来充分利用上下文信息。具体而言,我们引入了序列注意力模块和距离加权相似度损失,以聚合上下文信息与论辩信息;同时,我们通过随机遮盖语篇标记词和句子来扩充训练数据,从而降低模型对特定词语或信息量较低句子的依赖。在来自不同领域的五个数据集上的实验表明,我们的模型取得了最先进的性能;消融实验也验证了模型中各模块的有效性。

Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration

  • paper_url: http://arxiv.org/abs/2310.05069
  • repo_url: https://github.com/ercong21/calibration
  • paper_authors: Ercong Nie, Helmut Schmid, Hinrich Schütze
  • for: 这个论文主要针对Zero-shot和少量示例情景下的多语言任务和语言探测问题。
  • methods: 这个论文使用预训练多语言encoder模型,通过重写输入示例为cloze风格的问题,直接完成多语言任务或语言探测。这种方法不需要更新模型参数。但是,模型偏好预测频繁出现的标签词,导致性能有限制。为了解决这个问题,这个论文提出了一种简单的准确化方法,并与其他现有技术进行比较。
  • results: 这个论文使用准确化技术与预训练多语言encoder模型结合,在多种任务中实现了显著性能提升。
    Abstract Pretrained multilingual encoder models can directly perform zero-shot multilingual tasks or linguistic probing by reformulating the input examples into cloze-style prompts. This is accomplished by predicting the probabilities of the label words at the masked token position, without requiring any updates to the model parameters. However, the performance of this method is limited by the model's bias toward predicting label words which frequently occurred during the pretraining. These words typically receive high probabilities. To address this issue, we combine the models with calibration techniques which modify the probabilities of label words predicted by the models. We first validate the effectiveness of a proposed simple calibration method together with other existing techniques on monolingual encoders in both zero- and few-shot scenarios. We subsequently employ these calibration techniques on multilingual encoders, resulting in substantial performance improvements across a wide range of tasks.
    摘要 预训练多语言encoder模型可以通过将输入示例改写为cloze风格的提示,直接执行零样本多语言任务或语言探测:在掩码位置预测各标签词的概率,而无需更新任何模型参数。然而,这种方法的性能受限于模型偏好预测预训练中高频出现的标签词(这些词通常会获得较高的概率)。为了解决这个问题,我们将模型与校准技术相结合,对模型预测的标签词概率进行修正。我们首先在单语言encoder上,在零样本和少样本两种情景下验证了所提出的简单校准方法以及其他现有技术的有效性;随后将这些校准技术应用于多语言encoder,在广泛的任务中取得了显著的性能提升。
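
A minimal calibration sketch is shown below, in the spirit of dividing each label word's probability by a prior estimated from a content-free input and renormalizing; the paper's own calibration procedure may differ from this common recipe.

```python
import numpy as np

def calibrate_label_probs(label_probs, prior_probs):
    """Minimal calibration sketch: divide each label word's probability by the
    probability the model assigns it for a content-free / neutral input, then
    renormalize, so frequent label words no longer dominate by default."""
    scores = np.asarray(label_probs) / (np.asarray(prior_probs) + 1e-12)
    return scores / scores.sum()

# Example: a frequent label word dominates before calibration ...
print(calibrate_label_probs([0.70, 0.30], prior_probs=[0.85, 0.15]))  # -> roughly [0.29, 0.71]
```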

Guideline Learning for In-context Information Extraction

  • paper_url: http://arxiv.org/abs/2310.05066
  • repo_url: None
  • paper_authors: Chaoxu Pang, Yixuan Cao, Qiang Ding, Ping Luo
  • for: 提高嵌入式学习(ICL)中的信息提取性能(IE)。
  • methods: 提出指南学习(GL)框架:在学习阶段,基于少量错误案例自动合成指南;在推断阶段,检索有助于ICL的指南。同时,提出基于自我一致性的主动学习方法,以提高GL的效率。
  • results: 在事件提取和关系提取任务上,GL可以显著提高嵌入式IE的性能。
    Abstract Large language models (LLMs) can perform a new task by merely conditioning on task instructions and a few input-output examples, without optimizing any parameters. This is called In-Context Learning (ICL). In-context Information Extraction (IE) has recently garnered attention in the research community. However, the performance of In-context IE generally lags behind the state-of-the-art supervised expert models. We highlight a key reason for this shortfall: underspecified task description. The limited-length context struggles to thoroughly express the intricate IE task instructions and various edge cases, leading to misalignment in task comprehension with humans. In this paper, we propose a Guideline Learning (GL) framework for In-context IE which reflectively learns and follows guidelines. During the learning phrase, GL automatically synthesizes a set of guidelines based on a few error cases, and during inference, GL retrieves helpful guidelines for better ICL. Moreover, we propose a self-consistency-based active learning method to enhance the efficiency of GL. Experiments on event extraction and relation extraction show that GL can significantly improve the performance of in-context IE.
    摘要 大型语言模型(LLMs)只需依赖任务说明和少量输入输出示例即可执行新任务,而无需优化任何参数,这被称为上下文学习(ICL)。基于上下文学习的信息抽取(IE)近来受到研究社区的关注,但其性能通常落后于最先进的有监督专家模型。我们指出造成这一差距的一个关键原因:任务描述不够明确。有限长度的上下文难以充分表达复杂的 IE 任务说明及各种边界情况,导致模型对任务的理解与人类产生偏差。本文提出了面向上下文信息抽取的指南学习(GL)框架,它能够反思式地学习并遵循指南:在学习阶段,GL 基于少量错误案例自动合成一组指南;在推断阶段,GL 检索有助于 ICL 的指南。此外,我们提出了一种基于自我一致性的主动学习方法来提升 GL 的效率。在事件抽取和关系抽取任务上的实验表明,GL 能显著提升基于上下文的 IE 的性能。

sign.mt: Real-Time Multilingual Sign Language Translation Application

  • paper_url: http://arxiv.org/abs/2310.05064
  • repo_url: None
  • paper_authors: Amit Moryossef
  • for: 这个研究旨在为口语与手语之间的沟通障碍提供解决方案,促进听人与聋人之间的无障碍交流。
  • methods: 这个开源应用程序使用了现代的开源模型,包括对话语言模型和手语识别模型,以提供即时多语言对话的转换。
  • results: 这个应用程序可以实现即时多语言对话的转换,并且提供了自定义的真实人工手语演示,以激发用户参与和满意度。
    Abstract This demo paper presents sign.mt, an open-source application pioneering real-time multilingual bi-directional translation between spoken and signed languages. Harnessing state-of-the-art open-source models, this tool aims to address the communication divide between the hearing and the deaf, facilitating seamless translation in both spoken-to-signed and signed-to-spoken translation directions. Promising reliable and unrestricted communication, sign.mt offers offline functionality, crucial in areas with limited internet connectivity. It further enhances user engagement by offering customizable photo-realistic sign language avatars, thereby encouraging a more personalized and authentic user experience. Licensed under CC BY-NC-SA 4.0, sign.mt signifies an important stride towards open, inclusive communication. The app can be used, and modified for personal and academic uses, and even supports a translation API, fostering integration into a wider range of applications. However, it is by no means a finished product. We invite the NLP community to contribute towards the evolution of sign.mt. Whether it be the integration of more refined models, the development of innovative pipelines, or user experience improvements, your contributions can propel this project to new heights. Available at https://sign.mt, it stands as a testament to what we can achieve together, as we strive to make communication accessible to all.
    摘要 这篇演示文章介绍了开源应用 sign.mt,它率先实现了口语与手语之间的实时多语言双向翻译。该工具利用最先进的开源模型,旨在弥合听人与聋人之间的沟通鸿沟,在口语到手语和手语到口语两个方向上提供流畅的翻译。sign.mt 提供可靠且不受限制的沟通,并具备离线功能,这在网络连接受限的地区尤为重要。它还通过可定制的逼真手语虚拟形象来提升用户参与度,带来更个性化、更真实的用户体验。sign.mt 以 CC BY-NC-SA 4.0 许可发布,标志着迈向开放、包容沟通的重要一步。该应用可用于个人和学术用途并可自由修改,还支持翻译 API,便于集成到更广泛的应用中。不过,它绝非一款成品。我们邀请 NLP 社区共同推动 sign.mt 的演进:无论是集成更精细的模型、开发创新的流水线,还是改进用户体验,你的贡献都能使这个项目更上一层楼。应用地址为 https://sign.mt,它见证了我们携手让沟通惠及所有人的努力。

BRAINTEASER: Lateral Thinking Puzzles for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05057
  • repo_url: None
  • paper_authors: Yifan Jiang, Filip Ilievski, Kaixin Ma, Zhivar Sourati
  • for: 该论文旨在检验语言模型是否具备横向思维(lateral thinking)能力,即能否摆脱默认的常识关联。
  • methods: 该论文构建了多项选择问答任务 BRAINTEASER,要求模型从多个选项中选出正确答案,以检验模型的横向思维能力。
  • results: 研究发现,当前的语言模型在横向思维任务上表现不佳,与人类表现存在显著差距;当进一步考虑模型在对抗性改写格式间的一致性时,这一差距还会进一步扩大。
    Abstract The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking and defy default commonsense associations. We design a three-step procedure for creating the first lateral thinking benchmark, consisting of data collection, distractor generation, and generation of adversarial examples, leading to 1,100 puzzles with high-quality annotations. To assess the consistency of lateral reasoning by models, we enrich BRAINTEASER based on a semantic and contextual reconstruction of its questions. Our experiments with state-of-the-art instruction- and commonsense language models reveal a significant gap between human and model performance, which is further widened when consistency across adversarial formats is considered. We make all of our code and data available to stimulate work on developing and evaluating lateral thinking models.
    摘要 语言模型的成功促使 NLP 社区关注那些需要依靠类人常识机制进行隐式、复杂推理的任务。这类纵向思维任务相对受到较多关注,而横向思维谜题却鲜有人研究。为填补这一空白,我们设计了 BRAINTEASER:一个多项选择问答任务,用于测试模型展现横向思维、摆脱默认常识关联的能力。我们设计了三步流程来创建首个横向思维基准,包括数据收集、干扰项生成和对抗样本生成,最终得到 1,100 道带有高质量标注的谜题。为了评估模型横向推理的一致性,我们还基于语义与上下文重构对 BRAINTEASER 的问题进行了扩充。我们在最先进的指令微调与常识语言模型上进行的实验显示,人类与模型的表现之间存在显著差距,而当考虑跨对抗格式的一致性时,差距进一步扩大。我们公开了全部代码和数据,以推动横向思维模型的开发与评估。

Harnessing the Power of ChatGPT in Fake News: An In-Depth Exploration in Generation, Detection and Explanation

  • paper_url: http://arxiv.org/abs/2310.05046
  • repo_url: None
  • paper_authors: Yue Huang, Lichao Sun
  • For: The paper aims to explore ChatGPT's proficiency in generating, explaining, and detecting fake news.
  • Methods: The paper employs four prompt methods to generate fake news samples and obtains nine features to characterize fake news based on ChatGPT's explanations. It also examines ChatGPT's capacity to identify fake news and proposes a reason-aware prompt method to improve its performance.
  • Results: The paper demonstrates that ChatGPT shows commendable performance in detecting fake news, but there is still room for improvement. It also explores the potential extra information that could bolster its effectiveness in detecting fake news.
    Abstract The rampant spread of fake news has adversely affected society, resulting in extensive research on curbing its spread. As a notable milestone in large language models (LLMs), ChatGPT has gained significant attention due to its exceptional natural language processing capabilities. In this study, we present a thorough exploration of ChatGPT's proficiency in generating, explaining, and detecting fake news as follows. Generation -- We employ four prompt methods to generate fake news samples and prove the high quality of these samples through both self-assessment and human evaluation. Explanation -- We obtain nine features to characterize fake news based on ChatGPT's explanations and analyze the distribution of these factors across multiple public datasets. Detection -- We examine ChatGPT's capacity to identify fake news. We explore its detection consistency and then propose a reason-aware prompt method to improve its performance. Although our experiments demonstrate that ChatGPT shows commendable performance in detecting fake news, there is still room for its improvement. Consequently, we further probe into the potential extra information that could bolster its effectiveness in detecting fake news.
    摘要 假新闻的泛滥给社会带来了不良影响,因此遏制其传播的研究层出不穷。作为大型语言模型(LLM)的一个重要里程碑,ChatGPT 凭借出色的自然语言处理能力受到了广泛关注。在本研究中,我们对 ChatGPT 在生成、解释和检测假新闻方面的能力进行了全面探索。生成——我们使用四种提示方法生成假新闻样本,并通过自我评估和人工评估证明了这些样本的高质量。解释——我们基于 ChatGPT 的解释归纳出九个刻画假新闻的特征,并分析了这些特征在多个公共数据集中的分布。检测——我们考察了 ChatGPT 识别假新闻的能力,研究其检测的一致性,并提出了一种考虑理由的提示方法来提升其性能。尽管实验表明 ChatGPT 在检测假新闻方面表现可圈可点,但仍有改进空间。因此,我们进一步探究了可能增强其假新闻检测效果的额外信息。

Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

  • paper_url: http://arxiv.org/abs/2310.05029
  • repo_url: None
  • paper_authors: Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz
  • for: 这篇论文的目的是提出一种新的长文理解方法,以解决现有的自注意机制受限的问题。
  • methods: 该方法将语言模型视为交互式代理:首先将长文处理成摘要节点树;收到查询后,模型通过迭代提示在树上导航寻找相关信息,并在收集到足够信息后给出答案。
  • results: 与基线方法相比,该方法在长文问答任务上表现出色,并且通过在导航过程中高亮与查询相关的文本片段来增强可解释性。
    Abstract Large language models (LLMs) have advanced in large strides due to the effectiveness of the self-attention mechanism that processes and compares all tokens at once. However, this mechanism comes with a fundamental issue -- the predetermined context window is bound to be limited. Despite attempts to extend the context window through methods like extrapolating the positional embedding, using recurrence, or selectively retrieving essential parts of the long sequence, long-text understanding continues to be a challenge. We propose an alternative approach which instead treats the LLM as an interactive agent, allowing it to decide how to read the text via iterative prompting. We introduce MemWalker, a method that first processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information. On long-text question answering tasks our method outperforms baseline approaches that use long context windows, recurrence, and retrieval. We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text; pinpointing the relevant text segments related to the query.
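
To make the interactive-reading idea above concrete, here is a minimal sketch of a MemWalker-style summary tree and navigation loop. It is not the authors' implementation; `call_llm` is a hypothetical placeholder for whatever chat-completion API is available, and the fan-out and prompts are illustrative.

```python
# Minimal sketch of MemWalker-style interactive reading (not the authors' code).
# `call_llm` is a hypothetical stand-in for any chat-completion API.
from textwrap import shorten

def call_llm(prompt: str) -> str:
    """Placeholder LLM call; replace with a real chat-completion request."""
    raise NotImplementedError

def build_summary_tree(segments, fanout=4):
    """Recursively summarise fixed-size segments into a tree of summary nodes."""
    nodes = [{"summary": call_llm(f"Summarize:\n{s}"), "children": [], "text": s}
             for s in segments]
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes), fanout):
            group = nodes[i:i + fanout]
            joined = "\n".join(n["summary"] for n in group)
            parents.append({"summary": call_llm(f"Summarize:\n{joined}"),
                            "children": group, "text": None})
        nodes = parents
    return nodes[0]

def navigate(root, query, max_steps=10):
    """Walk the tree: at each node ask the LLM which child to open, or answer at a leaf."""
    node = root
    for _ in range(max_steps):
        if not node["children"]:                      # leaf: answer from the raw text
            return call_llm(f"Answer '{query}' using:\n{node['text']}")
        menu = "\n".join(f"[{i}] {shorten(c['summary'], 120)}"
                         for i, c in enumerate(node["children"]))
        choice = call_llm(f"Query: {query}\nChoose the most relevant node index:\n{menu}")
        node = node["children"][int(choice.strip())]
    return call_llm(f"Answer '{query}' using:\n{node['summary']}")
```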

Synslator: An Interactive Machine Translation Tool with Online Learning

  • paper_url: http://arxiv.org/abs/2310.05025
  • repo_url: None
  • paper_authors: Jiayi Wang, Ke Wang, Fengming Zhou, Chengyu Wang, Zhiyong Fu, Zeyu Feng, Yu Zhao, Yuqi Zhang
  • for: 这篇论文旨在描述一种名为Synslator的计算机辅助翻译(CAT)工具,该工具不仅支持交互式机器翻译(IMT),而且能够利用实时翻译记忆进行在线学习。
  • methods: 该工具使用两种不同的神经翻译模型来处理翻译记忆,以适应不同的部署环境。此外,系统还使用语言模型来提高互动模式下的翻译流畅性。
  • results: 我们经过评估,确认了在线学习过程中的翻译模型的有效性,并发现使用Synslator的互动功能可以提高翻译效率13%。更多细节可以参考:https://youtu.be/K0vRsb2lTt8。
    Abstract Interactive machine translation (IMT) has emerged as a progression of the computer-aided translation paradigm, where the machine translation system and the human translator collaborate to produce high-quality translations. This paper introduces Synslator, a user-friendly computer-aided translation (CAT) tool that not only supports IMT, but is adept at online learning with real-time translation memories. To accommodate various deployment environments for CAT services, Synslator integrates two different neural translation models to handle translation memories for online learning. Additionally, the system employs a language model to enhance the fluency of translations in an interactive mode. In evaluation, we have confirmed the effectiveness of online learning through the translation models, and have observed a 13% increase in post-editing efficiency with the interactive functionalities of Synslator. A tutorial video is available at:https://youtu.be/K0vRsb2lTt8.
    摘要 交互式机器翻译(IMT)是计算机辅助翻译范式的进一步发展,在这种模式下,机器翻译系统和人类翻译员协作生成高质量翻译。这篇文章介绍了Synslator,一款用户友好的计算机辅助翻译(CAT)工具,不仅支持IMT,还能利用实时翻译记忆进行在线学习。为满足不同的CAT服务部署环境,Synslator集成了两种不同的神经翻译模型来处理翻译记忆。此外,系统还使用语言模型来提高交互模式下的翻译流畅性。经评估,我们确认了通过翻译模型进行在线学习的有效性,并观察到Synslator的交互功能可将译后编辑效率提高13%。教程视频请参考:https://youtu.be/K0vRsb2lTt8。

Hybrid Quantum-Classical Machine Learning for Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2310.10672
  • repo_url: None
  • paper_authors: Abu Kaisar Mohammad Masum, Anshul Maurya, Dhruthi Sridhar Murthy, Pratibha, Naveed Mahmud
  • for: 本研究旨在探讨量子计算和经典机器学习的合作在自然语言处理中的可能性,尤其是对大规模数据集中表达的人类情感和意见的情感分析。
  • methods: 本研究提出了一种混合量子-经典机器学习的方法,包括量子核方法和基于变分量子电路的分类器,并将其与PCA、Haar小波变换等经典降维技术相结合。
  • results: 实验结果表明,在对数据进行降维后,基于量子的混合算法性能稳定且优于经典方法。
    Abstract The collaboration between quantum computing and classical machine learning offers potential advantages in natural language processing, particularly in the sentiment analysis of human emotions and opinions expressed in large-scale datasets. In this work, we propose a methodology for sentiment analysis using hybrid quantum-classical machine learning algorithms. We investigate quantum kernel approaches and variational quantum circuit-based classifiers and integrate them with classical dimension reduction techniques such as PCA and Haar wavelet transform. The proposed methodology is evaluated using two distinct datasets, based on English and Bengali languages. Experimental results show that after dimensionality reduction of the data, performance of the quantum-based hybrid algorithms were consistent and better than classical methods.
    摘要 量子计算与经典机器学习的结合可以在自然语言处理中带来潜在优势,特别是在大规模数据集中对人们表达的情感和意见进行分析。在这项工作中,我们提出了一种基于混合量子-经典机器学习算法的情感分析方法。我们研究了量子核方法和基于变分量子电路的分类器,并将其与PCA、Haar小波变换等经典降维技术相结合。我们在两个不同的数据集上进行了实验,一个是英语数据集,另一个是孟加拉语数据集。实验结果表明,在对数据进行降维后,基于量子的混合算法性能稳定且优于经典方法。
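
The sketch below shows only the classical scaffolding assumed around such a pipeline: PCA for dimensionality reduction followed by an SVM on a precomputed kernel matrix. The quantum kernel itself is replaced by a classical RBF stand-in (`kernel_fn`) so the example runs end to end, and the data and labels are synthetic.

```python
# Sketch of the hybrid pipeline's classical scaffolding (assumed, not the paper's code):
# reduce features with PCA, then classify with an SVM on a precomputed kernel matrix.
# `kernel_fn` is where a quantum kernel (state-fidelity based) would plug in; a classical
# RBF stands in here so the example runs end to end.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def kernel_fn(a, b, gamma=1.0):
    """Placeholder for a quantum kernel k(a, b); classical RBF used as a stand-in."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def gram(XA, XB):
    return np.array([[kernel_fn(a, b) for b in XB] for a in XA])

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 50))                                 # e.g. text feature vectors
y = (X[:, 0] + 0.3 * rng.normal(size=80) > 0).astype(int)     # toy sentiment labels

X_low = PCA(n_components=4).fit_transform(X)                  # dimensionality reduction step
X_tr, X_te, y_tr, y_te = X_low[:60], X_low[60:], y[:60], y[60:]

clf = SVC(kernel="precomputed").fit(gram(X_tr, X_tr), y_tr)
print("test accuracy:", clf.score(gram(X_te, X_tr), y_te))
```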

WikiIns: A High-Quality Dataset for Controlled Text Editing by Natural Language Instruction

  • paper_url: http://arxiv.org/abs/2310.05009
  • repo_url: https://github.com/casparswift/wikiins
  • paper_authors: Xiang Chen, Zheng Li, Xiaojun Wan
  • for: 本研究针对基于自然语言指令的受控文本编辑问题。
  • methods: 研究者首先对Wikipedia编辑历史数据库进行预处理以提取原始数据,然后通过众包构建高质量的验证集、测试集和小规模训练集,并提出了自动生成大规模“银”训练集的方法。
  • results: 研究者对WikiIns数据集进行了分析和实验,给出了评测结果和编辑意图分析等有价值的结论。
    Abstract Text editing, i.e., the process of modifying or manipulating text, is a crucial step in human writing process. In this paper, we study the problem of controlled text editing by natural language instruction. According to a given instruction that conveys the edit intention and necessary information, an original draft text is required to be revised into a target text. Existing automatically constructed datasets for this task are limited because they do not have informative natural language instruction. The informativeness requires the information contained in the instruction to be enough to produce the revised text. To address this limitation, we build and release WikiIns, a high-quality controlled text editing dataset with improved informativeness. We first preprocess the Wikipedia edit history database to extract the raw data (WikiIns-Raw). Then we crowdsource high-quality validation and test sets, as well as a small-scale training set (WikiIns-Gold). With the high-quality annotated dataset, we further propose automatic approaches to generate a large-scale ``silver'' training set (WikiIns-Silver). Finally, we provide some insightful analysis on our WikiIns dataset, including the evaluation results and the edit intention analysis. Our analysis and the experiment results on WikiIns may assist the ongoing research on text editing. The dataset, source code and annotation guideline are available at https://github.com/CasparSwift/WikiIns.
    摘要 文本编辑,即对文本进行修改或操作的过程,是人类写作过程中的关键步骤。在这篇论文中,我们研究了基于自然语言指令的受控文本编辑问题:根据一条传达修改意图和必要信息的自然语言指令,将原始草稿文本修改为目标文本。现有自动构建的此类数据集存在局限,因为其自然语言指令的信息量不足以产生修改后的文本。为了解决这一限制,我们构建并发布了信息量更充分的高质量受控文本编辑数据集 WikiIns。我们首先对 Wikipedia 编辑历史数据库进行预处理,提取原始数据(WikiIns-Raw);然后通过众包构建高质量的验证集和测试集,以及一个小规模训练集(WikiIns-Gold);在此基础上,我们进一步提出了自动生成大规模“银”训练集(WikiIns-Silver)的方法。最后,我们给出了对 WikiIns 数据集的分析,包括评测结果和修改意图分析。这些分析和实验结果可为当前的文本编辑研究提供参考。数据集、源代码和标注指南见:https://github.com/CasparSwift/WikiIns。

MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering

  • paper_url: http://arxiv.org/abs/2310.05007
  • repo_url: None
  • paper_authors: Xiusi Chen, Jyun-Yu Jiang, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Wei Wang
  • for: 在只有少量训练样本的情况下,使机器问答系统仍能取得令人满意的结果。
  • methods: 提出了一种基于近似图算法和无监督问题生成的最小数据增强框架 MinPrompt,可以有效提高开放域 QA 任务的微调效率与精度。
  • results: 实验结果表明,MinPrompt 能以更高的效率取得与基线相当或更好的准确率,在多个 benchmark 数据集上 F-1 分数最高提升 27.5%。
    Abstract Few-shot question answering (QA) aims at achieving satisfactory results on machine question answering when only a few training samples are available. Recent advances mostly rely on the power of pre-trained large language models (LLMs) and fine-tuning in specific settings. Although the pre-training stage has already equipped LLMs with powerful reasoning capabilities, LLMs still need to be fine-tuned to adapt to specific domains to achieve the best results. In this paper, we propose to select the most informative data for fine-tuning, thereby improving the efficiency of the fine-tuning process with comparative or even better accuracy on the open-domain QA task. We present MinPrompt, a minimal data augmentation framework for open-domain QA based on an approximate graph algorithm and unsupervised question generation. We transform the raw text into a graph structure to build connections between different factual sentences, then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text. We then generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model. Empirical results on several benchmark datasets and theoretical analysis show that MinPrompt is able to achieve comparable or better results than baselines with a high degree of efficiency, bringing improvements in F-1 scores by up to 27.5%.
    摘要 少样本问答(QA)旨在仅有少量训练样本的情况下,使机器问答仍能取得令人满意的结果。近期进展主要依赖预训练大语言模型(LLM)的能力,并在特定设置下进行微调。尽管预训练阶段已经使 LLM 具备了强大的推理能力,LLM 仍需要微调以适应特定领域,从而达到最佳结果。在这篇论文中,我们提议选择最具信息量的数据进行微调,从而提高微调过程的效率,同时在开放域 QA 任务中保持相当甚至更好的准确率。我们提出了名为 MinPrompt 的最小数据增强框架,基于近似图算法和无监督问题生成。我们将原始文本转换成图结构,建立不同事实句子之间的连接,然后应用图算法选出覆盖原始文本中最多信息的最小句子集。我们基于选出的句子子集生成 QA 对,并在这些句子上训练模型,得到最终模型。在多个基准数据集上的实验结果和理论分析表明,MinPrompt 能以很高的效率取得与基线相当或更好的结果,F-1 分数最高提升 27.5%。
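
The following toy sketch illustrates the flavour of the selection step: sentences are treated as covering the content words they mention, and a greedy approximate set cover picks a minimal informative subset. This is an assumption-level simplification for illustration, not the paper's released graph algorithm.

```python
# Toy sketch of MinPrompt's selection idea (an assumption, not the released method):
# treat each sentence as covering the content words it mentions and greedily pick
# the smallest sentence subset that covers all words (approximate set cover).
import re

def content_words(sentence, stop={"the", "a", "of", "in", "is", "and", "to"}):
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in stop}

def greedy_minimal_subset(sentences):
    covers = [content_words(s) for s in sentences]
    universe = set().union(*covers)
    chosen, covered = [], set()
    while covered != universe:
        best = max(range(len(sentences)), key=lambda i: len(covers[i] - covered))
        if not covers[best] - covered:      # nothing new can be covered
            break
        chosen.append(best)
        covered |= covers[best]
    return [sentences[i] for i in chosen]

doc = ["Marie Curie won the Nobel Prize in Physics in 1903.",
       "She also won the Nobel Prize in Chemistry in 1911.",
       "Curie was born in Warsaw.",
       "She won the Prize in Physics."]     # redundant with the first sentence
print(greedy_minimal_subset(doc))            # the redundant sentence is never selected
```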

Self-Knowledge Guided Retrieval Augmentation for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05002
  • repo_url: https://github.com/THUNLP-MT/SKR
  • paper_authors: Yile Wang, Peng Li, Maosong Sun, Yang Liu
  • for: 在不进行任务特定微调的情况下,更好地结合 LLM 的内部知识与外部世界知识,提升其在问答等任务上的表现。
  • methods: 提出自我认知引导的检索增强(SKR)方法,让 LLM 参考其之前遇到过的问题、识别自己知道什么和不知道什么,并在处理新问题时自适应地调用外部检索资源。
  • results: 在使用 InstructGPT 或 ChatGPT 的评测中,SKR 在多个数据集上均优于基于 chain-of-thought 的方法和完全基于检索的方法。
    Abstract Large language models (LLMs) have shown superior performance without task-specific fine-tuning. Despite the success, the knowledge stored in the parameters of LLMs could still be incomplete and difficult to update due to the computational costs. As complementary, retrieval-based methods can offer non-parametric world knowledge and improve the performance on tasks such as question answering. However, we find that the retrieved knowledge does not always help and even has a negative impact on original responses occasionally. To better make use of both internal knowledge and external world knowledge, we investigate eliciting the model's ability to recognize what they know and do not know (which is also called self-knowledge) and propose Self-Knowledge guided Retrieval augmentation (SKR), a simple yet effective method which can let LLMs refer to the questions they have previously encountered and adaptively call for external resources when dealing with new questions. We evaluate SKR on multiple datasets and demonstrate that it outperforms chain-of-thought based and fully retrieval-based methods by using either InstructGPT or ChatGPT.
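
A minimal sketch of the self-knowledge idea, under simplifying assumptions: the system remembers which past questions the model handled correctly without retrieval, and a new question triggers retrieval only when it is unfamiliar or resembles previously "unknown" questions. The memory entries and similarity threshold are illustrative, not from the SKR implementation.

```python
# Minimal sketch of self-knowledge guided retrieval (hypothetical, not the SKR release):
# remember which past questions the model answered correctly without retrieval, and
# only call the retriever when the new question resembles previously "unknown" ones.
from difflib import SequenceMatcher

memory = [  # (question, answered_correctly_without_retrieval)
    ("Who wrote Hamlet?", True),
    ("What is the capital of France?", True),
    ("Who won the 2022 FIFA World Cup?", False),   # beyond the model's internal knowledge
]

def needs_retrieval(question, threshold=0.45):
    sims = [(SequenceMatcher(None, question.lower(), q.lower()).ratio(), known)
            for q, known in memory]
    score, known = max(sims)                        # most similar previously seen question
    return score < threshold or not known          # unfamiliar, or previously unknown

for q in ["Who wrote Macbeth?", "Who won the 2023 Rugby World Cup?"]:
    print(q, "-> retrieve" if needs_retrieval(q) else "-> answer directly")
```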

TopicAdapt- An Inter-Corpora Topics Adaptation Approach

  • paper_url: http://arxiv.org/abs/2310.04978
  • repo_url: None
  • paper_authors: Pritom Saha Akash, Trisha Das, Kevin Chen-Chuan Chang
  • for: 本研究提出了一种基于神经网络的话题模型,用于改进话题模型在实际场景中的表现。
  • methods: 本研究使用一种基于神经网络的话题模型,可以从相关的源语料库中迁移有用的话题,同时还能在目标语料库中发现源语料库中不存在的新话题。
  • results: 实验结果表明,所提话题模型在多个不同领域的数据集上优于最先进的话题模型。
    Abstract Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including insensitivity to user guidance, sensitivity to the amount and quality of data, and the inability to adapt learned topics from one corpus to another. To address these challenges, this paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus. The proposed model offers a promising approach to improve topic modeling performance in practical scenarios. Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.

Exploring the Usage of Chinese Pinyin in Pretraining

  • paper_url: http://arxiv.org/abs/2310.04960
  • repo_url: None
  • paper_authors: Baojun Wang, Kun Xu, Lifeng Shang
  • for: 这篇论文旨在提高中文模型对同音或近音(如 ASR 引入的)错误的容错与纠错能力。
  • methods: 论文探索了多种在预训练中使用拼音的方法,并提出了字符与拼音并行预训练的新方法 PmBERT,以增强对此类错误的鲁棒性。
  • results: 实验结果表明,这种新的预训练方法可以提高中文语言模型对同音或近音错误的鲁棒性,并在公共错误纠正数据集上取得了比 SOTA 模型更好的表现。
    Abstract Unlike alphabetic languages, Chinese spelling and pronunciation are different. Both characters and pinyin take an important role in Chinese language understanding. In Chinese NLP tasks, we almost adopt characters or words as model input, and few works study how to use pinyin. However, pinyin is essential in many scenarios, such as error correction and fault tolerance for ASR-introduced errors. Most of these errors are caused by the same or similar pronunciation words, and we refer to this type of error as SSP(the same or similar pronunciation) errors for short. In this work, we explore various ways of using pinyin in pretraining models and propose a new pretraining method called PmBERT. Our method uses characters and pinyin in parallel for pretraining. Through delicate pretraining tasks, the characters and pinyin representation are fused, which can enhance the error tolerance for SSP errors. We do comprehensive experiments and ablation tests to explore what makes a robust phonetic enhanced Chinese language model. The experimental results on both the constructed noise-added dataset and the public error-correction dataset demonstrate that our model is more robust compared to SOTA models.
    摘要 与字母语言不同,中文的书写(汉字)与发音(拼音)是相互分离的,两者在中文语言理解中都扮演着重要角色。在中文 NLP 任务中,人们几乎都采用字符或词作为模型输入,很少有工作研究如何利用拼音。然而,拼音在许多场景中十分关键,例如错误纠正以及对 ASR 引入错误的容错。这类错误大多由相同或相近发音的词引起,我们将其简称为 SSP(same or similar pronunciation)错误。在这项工作中,我们探索了在预训练模型中使用拼音的多种方式,并提出了一种新的预训练方法 PmBERT。我们的方法在预训练中并行使用字符和拼音,通过精心设计的预训练任务将字符表示与拼音表示融合,从而增强对 SSP 错误的容错能力。我们进行了全面的实验和消融测试,以探索怎样的设计才能得到一个鲁棒的语音增强中文语言模型。在构造的加噪数据集和公共错误纠正数据集上的实验结果均表明,我们的模型比 SOTA 模型更加鲁棒。

Towards Better Chain-of-Thought Prompting Strategies: A Survey

  • paper_url: http://arxiv.org/abs/2310.04959
  • repo_url: None
  • paper_authors: Zihan Yu, Liang He, Zhen Wu, Xinyu Dai, Jiajun Chen
  • for: 本文旨在探讨Chain-of-Thought(CoT)提示Strategy的效果,并系统地分析其关键因素以及如何更好地应用于不同应用场景。
  • methods: 本文通过审查广泛的当前研究,提供了系统的和全面的分析,涵盖了CoT提示的各种因素的影响,以及如何更好地应用其在不同应用场景。
  • results: 本文提出了一些挑战和未来发展方向,以帮助读者更好地理解和应用CoT提示。
    Abstract Chain-of-Thought (CoT), a step-wise and coherent reasoning chain, shows its impressive strength when used as a prompting strategy for large language models (LLM). Recent years, the prominent effect of CoT prompting has attracted emerging research. However, there still lacks of a systematic summary about key factors of CoT prompting and comprehensive guide for prompts utilizing. For a deeper understanding about CoT prompting, we survey on a wide range of current research, presenting a systematic and comprehensive analysis on several factors that may influence the effect of CoT prompting, and introduce how to better apply it in different applications under these discussions. We further analyze the challenges and propose some future directions about CoT prompting. This survey could provide an overall reference on related research.
    摘要 Chain-of-Thought(CoT),一种逐步逻辑推理链,在大语言模型(LLM)中作为提示策略显示出了惊人的力量。近年来,CoT提示的明显效果吸引了学术界的关注。然而,当前还缺乏一个系统化的总结和完整的指南,用于解释CoT提示的关键因素和如何更好地应用它们。为了深入了解CoT提示,我们在广泛的当前研究中进行了系统化和完整的分析,并对各种因素的影响进行了分析,以及如何在不同应用中更好地使用它们。我们还分析了挑战和提出了未来的发展方向。这种调查可以为相关研究提供一个总体参考。

Domain Knowledge Graph Construction Via A Simple Checker

  • paper_url: http://arxiv.org/abs/2310.04949
  • repo_url: None
  • paper_authors: Yueling Zeng, Li-C. Wang
  • for: 这项研究的目的是为Semiconductor chip设计公司提供一种基于语言模型的知识图构建方法,以满足公司的两个重要考虑因素:保密性和可扩展性。
  • methods: 本文提出了一种 oracle-checker 方案来利用 GPT3.5 的能力,并指出该问题的核心在于对领域专家背景知识的提炼。
  • results: 本文以 RISC-V 非特权 ISA 规范为例,介绍了关键思路并讨论了所提 oracle-checker 方法的实用性。
    Abstract With the availability of large language models, there is a growing interest for semiconductor chip design companies to leverage the technologies. For those companies, deployment of a new methodology must include two important considerations: confidentiality and scalability. In this context, this work tackles the problem of knowledge graph construction from hardware-design domain texts. We propose an oracle-checker scheme to leverage the power of GPT3.5 and demonstrate that the essence of the problem is in distillation of domain expert's background knowledge. Using RISC-V unprivileged ISA specification as an example, we explain key ideas and discuss practicality of our proposed oracle-checker approach.
    摘要 随着大型语言模型的出现,半导体芯片设计公司开始关注如何利用这些技术。对这些公司而言,部署新方法必须考虑两个重要因素:保密性和可扩展性。在此背景下,本文研究从硬件设计领域文本构建知识图谱的问题。我们提出一种 oracle-checker 方案来利用 GPT3.5 的能力,并指出该问题的核心在于对领域专家背景知识的提炼。我们以 RISC-V 非特权 ISA 规范为例,介绍关键思路并讨论所提 oracle-checker 方法的实用性。

TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.04948
  • repo_url: None
  • paper_authors: Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, Yan Liu
  • for: 本研究旨在开发一种新的时间序列表示学习框架,以提高时间序列预测的准确性。
  • methods: 该框架基于两个针对时间序列任务的关键归纳偏置:(一)分解趋势、季节性和残差成分之间的复杂交互;(二)引入基于选择的提示,以促进非平稳时间序列的分布适应。
  • results: 对多个时间序列benchmark datasets进行实验,TEMPO模型表现出了与现有方法相比的显著性能提升,不仅在标准的指导学习 Setting中,而且在未经见过数据集和多模式输入的情况下也能够获得出色的表现。这一结果表明TEMPO具有成为基础模型构建框架的潜力。
    Abstract The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the selection-based prompts to facilitate distribution adaptation in non-stationary time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on a number of time series benchmark datasets. This performance gain is observed not only in standard supervised learning settings but also in scenarios involving previously unseen datasets as well as in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.
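
The first inductive bias above (trend/seasonal/residual decomposition) can be illustrated with an off-the-shelf STL decomposition; how TEMPO tokenises the components and combines them with selected prompts inside the GPT backbone is omitted here, and the series and period are synthetic.

```python
# Sketch of TEMPO's decomposition inductive bias only (illustrative; the prompt selection
# and GPT backbone are omitted). Uses statsmodels' STL decomposition.
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(0)
t = np.arange(240)
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=t.size)

res = STL(series, period=12).fit()
trend, seasonal, residual = res.trend, res.seasonal, res.resid

# Each component could then be embedded separately and combined with a selected soft
# prompt before being passed to the frozen language-model backbone (not shown here).
components = np.stack([trend, seasonal, residual], axis=0)
print(components.shape)          # (3, 240): three aligned channels for the model
```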

cs.LG - 2023-10-08

Adversarial Attacks on Combinatorial Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.05308
  • repo_url: None
  • paper_authors: Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao
  • for: 该论文研究了针对组合多臂老虎机(CMAB)的奖励投毒攻击。
  • methods: 该论文给出了 CMAB 可被攻击的充分必要条件,并为可攻击的 CMAB 实例设计了攻击算法。
  • results: 该论文发现了一个出人意料的事实:CMAB 实例是否可被攻击还取决于攻击者是否知晓该实例所处的环境。这意味着在实际应用中攻击 CMAB 非常困难,并且不存在适用于任意 CMAB 实例的通用攻击策略。论文通过实验验证了这些理论发现。
    Abstract We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, which depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.
    摘要 我们研究了针对组合多臂老虎机(CMAB)的奖励投毒攻击。我们首先给出了 CMAB 可被攻击的充分必要条件,该条件取决于 CMAB 实例本身的性质,例如超臂的奖励分布和基础臂的结果分布。此外,我们还为可攻击的 CMAB 实例设计了攻击算法。与以往对多臂老虎机的认识不同,我们发现了一个出人意料的事实:特定 CMAB 实例的可攻击性还取决于攻击者是否知晓该实例的环境。这一发现表明在实践中攻击 CMAB 非常困难,并且由于环境对攻击者而言大多是未知的,不存在适用于任意 CMAB 实例的通用攻击策略。我们通过在真实 CMAB 应用上的大量实验验证了这些理论发现,包括 probabilistic maximum covering problem、online minimum spanning tree、cascading bandits for online ranking 和 online shortest path。

Successive Data Injection in Conditional Quantum GAN Applied to Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.05307
  • repo_url: None
  • paper_authors: Benjamin Kalfon, Soumaya Cherkaoui, Jean-Frédéric Laprade, Ola Ahmad, Shengrui Wang
  • for: 这个论文主要针对的是如何使用量子生成器网络(QGAN)进行异常检测,尤其是在通信网络中采集的时间序列数据上。
  • methods: 这篇论文提出了一种新的高维编码方法,named Successive Data Injection(SuDaI),以便在量子状态中扩展更大的数据空间,从而适应更高维的时间序列数据。
  • results: 该方法可以在高维时间序列数据上进行异常检测,并且同样适用于其他类型的高维时间序列数据,因此开辟了多个应用领域。
    Abstract Classical GAN architectures have shown interesting results for solving anomaly detection problems in general and for time series anomalies in particular, such as those arising in communication networks. In recent years, several quantum GAN architectures have been proposed in the literature. When detecting anomalies in time series using QGANs, huge challenges arise due to the limited number of qubits compared to the size of the data. To address these challenges, we propose a new high-dimensional encoding approach, named Successive Data Injection (SuDaI). In this approach, we explore a larger portion of the quantum state than that in the conventional angle encoding, the method used predominantly in the literature, through repeated data injections into the quantum state. SuDaI encoding allows us to adapt the QGAN for anomaly detection with network data of a much higher dimensionality than with the existing known QGANs implementations. In addition, SuDaI encoding applies to other types of high-dimensional time series and can be used in contexts beyond anomaly detection and QGANs, opening up therefore multiple fields of application.
    摘要 经典GAN架构在异常检测问题上取得了不少有趣的结果,特别是在通信网络中出现的时间序列异常问题上。近年来,文献中提出了多种量子GAN架构。在使用QGAN检测时间序列异常时,由于可用量子比特数量相对于数据规模非常有限,面临着巨大挑战。为解决这些挑战,我们提出了一种新的高维编码方法,名为Successive Data Injection(SuDaI)。通过向量子态重复注入数据,SuDaI 能够比文献中常用的角度编码探索量子态的更大部分,从而使 QGAN 适用于维度远高于现有 QGAN 实现的网络数据异常检测。此外,SuDaI 编码还适用于其他类型的高维时间序列,并可用于异常检测和 QGAN 之外的场景,因此开辟了多个应用领域。

Clustering Three-Way Data with Outliers

  • paper_url: http://arxiv.org/abs/2310.05288
  • repo_url: None
  • paper_authors: Katharine M. Clark, Paul D. McNicholas
  • for: clustering matrix-variate normal data with outliers
  • methods: 基于子集对数似然(subset log-likelihood)的分布,将 OCLUST 算法扩展到矩阵变量正态数据,并使用迭代方法检测和剔除异常点
  • results: 可以有效地检测和剔除matrix-variate normal data中的异常点
    Abstract Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.
    摘要 矩阵变量分布是基于模型的聚类领域的新成员,使得分析具有复杂结构的矩阵形式数据(如图像和时间序列)成为可能。由于其出现较晚,关于矩阵变量数据的文献非常有限,关于在这类模型中处理异常值的研究则更少。本文讨论了一种对含异常值的矩阵变量正态数据进行聚类的方法。该方法利用子集对数似然的分布,将 OCLUST 算法扩展到矩阵变量正态数据,并使用迭代方法检测和剔除异常值。
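
The sketch below conveys only the iterative trim-and-refit idea, under a strong simplification: observations are vectorised and modelled with a single multivariate normal rather than the matrix-variate mixture used by the extended OCLUST algorithm. All names and thresholds are illustrative.

```python
# Simplified sketch of the iterative trim-and-refit idea (not the OCLUST extension itself):
# matrices are vectorised and modelled with one multivariate normal; at each step the
# observation with the lowest log-likelihood under the refitted model is removed.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3, 4)).reshape(100, -1)            # 100 "matrix" observations
X[:5] += 6.0                                                 # five planted outliers

def trim_outliers(X, n_trim):
    keep = np.arange(len(X))
    for _ in range(n_trim):
        sub = X[keep]
        mu = sub.mean(axis=0)
        cov = np.cov(sub, rowvar=False) + 1e-6 * np.eye(sub.shape[1])   # refit the model
        ll = multivariate_normal(mean=mu, cov=cov).logpdf(X[keep])
        keep = np.delete(keep, np.argmin(ll))                 # drop the least likely point
    return keep

kept = trim_outliers(X, n_trim=5)
print(sorted(set(range(100)) - set(kept)))                    # indices flagged as outliers
```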

Learning force laws in many-body systems

  • paper_url: http://arxiv.org/abs/2310.05273
  • repo_url: https://github.com/wyu54/many-body-force-infer
  • paper_authors: Wentao Yu, Eslam Abdelaleem, Ilya Nemenman, Justin C. Burton
  • for: The paper is written to demonstrate a machine learning (ML) approach for discovering force laws in dusty plasma experiments.
  • methods: The paper uses 3D particle trajectories to train an ML model that incorporates physical intuition to infer the effective non-reciprocal forces between particles, accounting for inherent symmetries and non-identical particles.
  • results: The model accurately learns the force laws and extracts each particle's mass and charge, with an accuracy of R^2 > 0.99, indicating new physics in dusty plasma beyond the resolution of current theories and demonstrating the potential of ML-powered approaches for guiding new routes of scientific discovery in many-body systems.
    Abstract Scientific laws describing natural systems may be more complex than our intuition can handle, and thus how we discover laws must change. Machine learning (ML) models can analyze large quantities of data, but their structure should match the underlying physical constraints to provide useful insight. Here we demonstrate a ML approach that incorporates such physical intuition to infer force laws in dusty plasma experiments. Trained on 3D particle trajectories, the model accounts for inherent symmetries and non-identical particles, accurately learns the effective non-reciprocal forces between particles, and extracts each particle's mass and charge. The model's accuracy (R^2 > 0.99) points to new physics in dusty plasma beyond the resolution of current theories and demonstrates how ML-powered approaches can guide new routes of scientific discovery in many-body systems.

Simplifying GNN Performance with Low Rank Kernel Models

  • paper_url: http://arxiv.org/abs/2310.05250
  • repo_url: https://github.com/lucianoavinas/lowrank-gnn-kernels
  • paper_authors: Luciano Vinas, Arash A. Amini
  • for: 本研究重新审视了近期针对半监督节点分类(SSNC)的谱 GNN 方法,指出许多现有 GNN 架构可能存在过度设计。
  • methods: 研究将非参数估计技术应用于谱域,以替代许多受深度学习启发的 GNN 设计。这些传统技术适用于多种图类型,在许多常见 SSNC 基准上达到了最先进的性能。
  • results: 研究表明,近期 GNN 方法的性能提升部分可归因于评估惯例的变化。此外,研究还对 GNN 谱滤波技术的各种超参数进行了消融研究。代码可以在 https://github.com/lucianoAvinas/lowrank-gnn-kernels 找到。
    Abstract We revisit recent spectral GNN approaches to semi-supervised node classification (SSNC). We posit that many of the current GNN architectures may be over-engineered. Instead, simpler, traditional methods from nonparametric estimation, applied in the spectral domain, could replace many deep-learning inspired GNN designs. These conventional techniques appear to be well suited for a variety of graph types reaching state-of-the-art performance on many of the common SSNC benchmarks. Additionally, we show that recent performance improvements in GNN approaches may be partially attributed to shifts in evaluation conventions. Lastly, an ablative study is conducted on the various hyperparameters associated with GNN spectral filtering techniques. Code available at: https://github.com/lucianoAvinas/lowrank-gnn-kernels
    摘要 我们重新审视了近期针对半监督节点分类(SSNC)的谱 GNN 方法。我们认为许多现有的 GNN 架构可能存在过度设计。相反,将更简单的传统非参数估计方法应用于谱域,就可以取代许多受深度学习启发的 GNN 设计。这些传统技术适用于多种图类型,在许多常见的 SSNC 基准上达到了最先进的表现。此外,我们还表明,近期 GNN 方法的部分性能提升可归因于评估惯例的变化。最后,我们对 GNN 谱滤波技术的各种超参数进行了消融研究。代码见:https://github.com/lucianoAvinas/lowrank-gnn-kernels。
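
As a rough illustration of replacing a deep GNN with a classical spectral estimator, the sketch below builds a low-rank kernel from the leading eigenvectors of the normalised adjacency and fits kernel ridge regression on the labelled nodes. The kernel construction here is an assumption made for illustration; the paper's exact kernels are in its repository.

```python
# Sketch of a low-rank spectral kernel for semi-supervised node labels (an assumed,
# simplified construction; see the repository above for the paper's kernels).
import numpy as np

def normalized_adjacency(A):
    d = A.sum(1)
    D = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return D @ A @ D

def low_rank_kernel(A, rank=4, power=2):
    """Kernel built from the top-`rank` eigenvectors of the normalised adjacency."""
    w, V = np.linalg.eigh(normalized_adjacency(A))
    idx = np.argsort(w)[::-1][:rank]
    return (V[:, idx] * (w[idx] ** power)) @ V[:, idx].T

def kernel_ridge_predict(K, y, labeled, lam=1e-2):
    Kll = K[np.ix_(labeled, labeled)]
    alpha = np.linalg.solve(Kll + lam * np.eye(len(labeled)), y[labeled])
    return K[:, labeled] @ alpha                    # scores for every node

# Two 4-node cliques joined by a single edge, one labelled node per clique.
A = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            A[i, j] = float(i != j)
A[3, 4] = A[4, 3] = 1.0
y = np.array([1, 0, 0, 0, -1, 0, 0, 0], float)
print(kernel_ridge_predict(low_rank_kernel(A), y, labeled=[0, 4]).round(2))
```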

Enhancing Kernel Flexibility via Learning Asymmetric Locally-Adaptive Kernels

  • paper_url: http://arxiv.org/abs/2310.05236
  • repo_url: https://github.com/hefansjtu/labrbf_kernel
  • paper_authors: Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens
  • for: 这篇论文的目的是通过可训练的局部自适应带宽(LAB)来增强径向基函数(RBF)核,从而提高核学习的灵活性。
  • methods: 这篇论文首次建立了非对称核岭回归(asymmetric kernel ridge regression)框架,并引入了一种迭代核学习算法来训练局部自适应带宽。
  • results: 实验结果表明,所提方法在真实数据集上表现出色,比基于 Nyström 近似的算法更具可扩展性,并且在回归准确率上优于现有的核学习方法,甚至超过了残差神经网络。
    Abstract The lack of sufficient flexibility is the key bottleneck of kernel-based learning that relies on manually designed, pre-given, and non-trainable kernels. To enhance kernel flexibility, this paper introduces the concept of Locally-Adaptive-Bandwidths (LAB) as trainable parameters to enhance the Radial Basis Function (RBF) kernel, giving rise to the LAB RBF kernel. The parameters in LAB RBF kernels are data-dependent, and its number can increase with the dataset, allowing for better adaptation to diverse data patterns and enhancing the flexibility of the learned function. This newfound flexibility also brings challenges, particularly with regards to asymmetry and the need for an efficient learning algorithm. To address these challenges, this paper for the first time establishes an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm. This novel approach not only reduces the demand for extensive support data but also significantly improves generalization by training bandwidths on the available training data. Experimental results on real datasets underscore the remarkable performance of the proposed algorithm, showcasing its superior capability in handling large-scale datasets compared to Nystr\"om approximation-based algorithms. Moreover, it demonstrates a significant improvement in regression accuracy over existing kernel-based learning methods and even surpasses residual neural networks.
    摘要 依赖人工设计、预先给定且不可训练的核,其灵活性不足是核学习的主要瓶颈。为了增强核的灵活性,本文引入局部自适应带宽(LAB)作为可训练参数来增强径向基函数(RBF)核,从而得到 LAB RBF 核。LAB RBF 核中的参数依赖于数据,其数量可以随数据集增长,从而更好地适应多样的数据模式,提高所学函数的灵活性。这种新的灵活性也带来了挑战,特别是核的非对称性以及对高效学习算法的需求。为了解决这些挑战,本文首次建立了非对称核岭回归框架,并引入了一种迭代核学习算法。这一新方法不仅降低了对大量支持数据的需求,还通过在训练数据上学习带宽显著提升了泛化能力。在真实数据集上的实验结果突显了所提算法的出色性能:与基于 Nyström 近似的算法相比,它在处理大规模数据集时更具优势;在回归准确率上,它显著优于现有的核学习方法,甚至超过了残差神经网络。
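
A minimal numpy sketch of the kernel's assumed form follows: each support point carries its own per-dimension bandwidth, making the kernel matrix asymmetric, and the regression weights come from a ridge-regularised least-squares solve. The paper trains the bandwidths iteratively; here they are fixed so the example stays short.

```python
# Minimal numpy sketch of a Locally-Adaptive-Bandwidth RBF kernel (assumed form):
# each support point x_j carries its own per-dimension bandwidth theta_j, so
# k(x, x_j) = exp(-||theta_j * (x - x_j)||^2) and the kernel matrix is asymmetric.
# The paper learns the bandwidths iteratively; they are fixed here for illustration.
import numpy as np

def lab_rbf(X, support, thetas):
    """Kernel matrix K[i, j] = exp(-||thetas[j] * (X[i] - support[j])||^2)."""
    diff = X[:, None, :] - support[None, :, :]            # (n, m, d)
    return np.exp(-np.sum((diff * thetas[None, :, :]) ** 2, axis=2))

def fit_ridge(K, y, lam=1e-3):
    """Least-squares ridge solve, appropriate for a rectangular/asymmetric kernel matrix."""
    return np.linalg.solve(K.T @ K + lam * np.eye(K.shape[1]), K.T @ y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

support = X[::10]                                         # 20 support points
thetas = np.full_like(support, 1.5)                       # trainable in the real method
alpha = fit_ridge(lab_rbf(X, support, thetas), y)

X_test = np.linspace(-3, 3, 5)[:, None]
print(lab_rbf(X_test, support, thetas) @ alpha)           # approximates sin on [-3, 3]
```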

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

  • paper_url: http://arxiv.org/abs/2310.05230
  • repo_url: None
  • paper_authors: Shicong Cen, Yuejie Chi
  • for: 策略梯度方法,用于强化学习、博弈与控制等序贯决策问题中的策略优化。
  • methods: 利用一阶信息最大化价值函数来搜索目标策略。
  • results: 综述了近期在策略梯度方法全局最优性保证方面的进展,以及其关于关键问题参数的有限时间收敛率。
    Abstract Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and control. Guaranteeing the global optimality of policy gradient methods, however, is highly nontrivial due to nonconcavity of the value functions. In this exposition, we highlight recent progresses in understanding and developing policy gradient methods with global convergence guarantees, putting an emphasis on their finite-time convergence rates with regard to salient problem parameters.
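
For readers new to the topic, the sketch below shows the basic first-order policy gradient estimator (REINFORCE with a running baseline) on a toy two-armed bandit. It is purely illustrative background for the survey, not code or analysis from the paper.

```python
# Minimal REINFORCE sketch on a two-armed bandit, illustrating the first-order policy
# gradient update studied by the survey (illustrative only; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])          # arm 1 is better
theta = np.zeros(2)                        # softmax policy parameters
lr, baseline = 0.1, 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = rng.normal(true_means[a], 0.1)
    grad_log_pi = -probs                   # grad of log pi(a) for a softmax policy:
    grad_log_pi[a] += 1.0                  # one-hot(a) - probs
    theta += lr * (r - baseline) * grad_log_pi     # REINFORCE with a running baseline
    baseline += 0.01 * (r - baseline)

print(softmax(theta))                      # probability mass concentrates on arm 1
```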

Accelerating Machine Learning Primitives on Commodity Hardware

  • paper_url: http://arxiv.org/abs/2310.05218
  • repo_url: None
  • paper_authors: Roman Snytsar
  • for: 这篇论文是用于探讨深度神经网络(DNN)中的滑动窗口卷积技术,并对其进行了广泛的研究和评估。
  • methods: 本论文使用了滑动窗口卷积技术来提高深度神经网络的训练和推理效率,并对其进行了广泛的研究和评估。
  • results: 研究结果表明,使用滑动窗口卷积技术可以减少内存占用和提高计算效率,并在CPU和专门设计的硬件加速器上实现显著的速度提升。这种技术可能会推动AI在低功耗和低内存设备上的广泛应用,无需特殊硬件。
    Abstract Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have shown before how both pooling and convolution 1-D primitives could be expressed as sliding sums and evaluated by the compute kernels with a shared structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). The Sliding Window technique addresses the memory bloating problem and demonstrates a significant speedup in 2-D convolution. We explore the performance of this technique on a range of implementations, including custom kernels for specific filter sizes. Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators. This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware. We also discuss the compatibility of model compression methods and optimized network architectures with the Sliding Window technique, encouraging further research in these areas.
    摘要 滑动窗口求和算法已经成功地应用于深度神经网络的训练和推理。我们此前已经展示了池化和一维卷积原语都可以表示为滑动求和,并由具有共享结构的计算核来求值。在这篇论文中,我们对滑动窗口卷积技术进行了深入研究,将其作为深度神经网络中常用的基于通用矩阵乘法(GEMM)的卷积的更高效替代方案。滑动窗口技术解决了内存膨胀问题,并在二维卷积中显示出明显的加速。我们在多种实现上考察了该技术的性能,包括针对特定滤波器大小的自定义计算核。结果表明,滑动窗口计算核在CPU乃至专用硬件加速器上都可以超越基于GEMM的卷积。这有望推动AI在低功耗、低内存设备上的更广泛应用,而无需专用硬件。我们还讨论了模型压缩方法和优化网络架构与滑动窗口技术的兼容性,鼓励在这些方向上开展进一步研究。
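
The contrast between the two formulations can be seen in a few lines: the sliding-window version accumulates each output directly, while the GEMM version first materialises an im2col buffer of shape (out_len, k), which is the memory bloat the paper targets. Both produce identical results; the sketch is illustrative, not the paper's kernels.

```python
# 1-D convolution computed two ways: a sliding-window kernel versus im2col + GEMM.
# Both give identical results; im2col materialises an (out_len x k) buffer, which is
# the memory overhead the sliding-window formulation avoids. Illustrative sketch only.
import numpy as np

def conv1d_sliding(x, w):
    k, out_len = len(w), len(x) - len(w) + 1
    y = np.zeros(out_len)
    for i in range(out_len):               # one fused multiply-accumulate window per output
        y[i] = np.dot(x[i:i + k], w)
    return y

def conv1d_gemm(x, w):
    k, out_len = len(w), len(x) - len(w) + 1
    cols = np.stack([x[i:i + k] for i in range(out_len)])    # im2col buffer: (out_len, k)
    return cols @ w                                          # a single GEMM/GEMV

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])
print(conv1d_sliding(x, w))
print(np.allclose(conv1d_sliding(x, w), conv1d_gemm(x, w)))  # True
```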

Towards Optimizing with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05204
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Pei-Fu Guo, Ying-Hsuan Chen, Yun-Da Tsai, Shou-De Lin
  • for: 这个研究是为了评估LLMs在不同任务和数据大小下的优化能力。
  • methods: 该研究采用交互式提示方法:在每个优化步骤中,LLM 根据过去生成的解决方案及其取值生成新的解决方案,新方案经过评估后进入下一个优化步骤。研究者还引入了三种从不同角度综合评估任务性能的指标,这些指标适用于广泛的优化任务,并且对测试样本的变化不那么敏感。
  • results: 研究发现,在处理小规模样本时 LLM 表现出较强的优化能力,但其表现受数据规模和取值的显著影响,这表明 LLM 在优化任务上仍需进一步研究。
    Abstract In this work, we conduct an assessment of the optimization capabilities of LLMs across various tasks and data sizes. Each of these tasks corresponds to unique optimization domains, and LLMs are required to execute these tasks with interactive prompting. That is, in each optimization step, the LLM generates new solutions from the past generated solutions with their values, and then the new solutions are evaluated and considered in the next optimization step. Additionally, we introduce three distinct metrics for a comprehensive assessment of task performance from various perspectives. These metrics offer the advantage of being applicable for evaluating LLM performance across a broad spectrum of optimization tasks and are less sensitive to variations in test samples. By applying these metrics, we observe that LLMs exhibit strong optimization capabilities when dealing with small-sized samples. However, their performance is significantly influenced by factors like data size and values, underscoring the importance of further research in the domain of optimization tasks for LLMs.
    摘要 在这项工作中,我们评估了 LLM 在多种任务和数据规模下的优化能力。每个任务对应一个独特的优化领域,LLM 需要通过交互式提示来执行这些任务:在每个优化步骤中,LLM 根据过去生成的解决方案及其取值生成新的解决方案,然后对新方案进行评估并纳入下一个优化步骤。此外,我们引入了三种指标,从多个角度全面评估任务性能。这些指标可用于评估 LLM 在广泛优化任务上的表现,并且对测试样本的变化不那么敏感。通过应用这些指标,我们发现 LLM 在处理小规模样本时表现出较强的优化能力,但其表现受数据规模和取值等因素的显著影响,这凸显了进一步研究 LLM 优化任务的必要性。
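
The interactive prompting loop described above can be sketched as follows, with `llm_propose` as a hypothetical placeholder for an actual model call and a toy quadratic objective standing in for the paper's tasks.

```python
# Sketch of the interactive optimisation loop described above (assumed structure):
# past (solution, value) pairs are formatted into the prompt, the LLM proposes a new
# candidate, and the candidate is scored and fed back. `llm_propose` is hypothetical.
def llm_propose(prompt: str) -> str:
    """Placeholder for a chat-completion call returning e.g. '3.2, -1.0'."""
    raise NotImplementedError

def objective(x):
    return -((x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2)      # maximise; optimum at (3, -1)

def optimize(steps=20):
    history = [((0.0, 0.0), objective((0.0, 0.0)))]
    for _ in range(steps):
        pairs = "\n".join(f"solution: {s}  value: {v:.3f}" for s, v in history)
        prompt = (f"Here are previous solutions and their values:\n{pairs}\n"
                  "Propose a new solution (two comma-separated floats) with a higher value.")
        candidate = tuple(float(t) for t in llm_propose(prompt).split(","))
        history.append((candidate, objective(candidate)))
    return max(history, key=lambda p: p[1])
```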

Lifelong Learning for Fog Load Balancing: A Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2310.05187
  • repo_url: None
  • paper_authors: Maad Ebrahim, Abdelhakim Senhaji Hafid, Mohamed Riduan Abid
  • for: 本文旨在提出一种基于Reinforcement Learning(RL)的fog computing环境中的负载均衡(LB)策略,以提高系统性能。
  • methods: 本文使用privacy-aware RL agents来优化 fog computing环境中的负载均衡,并提出一种生命周期学习框架,使用 Transfer Learning(TL)来减少训练成本和适应环境变化。
  • results: 本文的实验结果显示,使用TL可以大幅减少RL agents的训练时间和失败概率,并在不同的环境下保持鲁棒性。
    Abstract Fog computing emerged as a promising paradigm to address the challenges of processing and managing data generated by the Internet of Things (IoT). Load balancing (LB) plays a crucial role in Fog computing environments to optimize the overall system performance. It requires efficient resource allocation to improve resource utilization, minimize latency, and enhance the quality of service for end-users. In this work, we improve the performance of privacy-aware Reinforcement Learning (RL) agents that optimize the execution delay of IoT applications by minimizing the waiting delay. To maintain privacy, these agents optimize the waiting delay by minimizing the change in the number of queued requests in the whole system, i.e., without explicitly observing the actual number of requests that are queued in each Fog node nor observing the compute resource capabilities of those nodes. Besides improving the performance of these agents, we propose in this paper a lifelong learning framework for these agents, where lightweight inference models are used during deployment to minimize action delay and only retrained in case of significant environmental changes. To improve the performance, minimize the training cost, and adapt the agents to those changes, we explore the application of Transfer Learning (TL). TL transfers the knowledge acquired from a source domain and applies it to a target domain, enabling the reuse of learned policies and experiences. TL can be also used to pre-train the agent in simulation before fine-tuning it in the real environment; this significantly reduces failure probability compared to learning from scratch in the real environment. To our knowledge, there are no existing efforts in the literature that use TL to address lifelong learning for RL-based Fog LB; this is one of the main obstacles in deploying RL LB solutions in Fog systems.

Unified speech and gesture synthesis using flow matching

  • paper_url: http://arxiv.org/abs/2310.05181
  • repo_url: None
  • paper_authors: Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
  • for: 这篇论文旨在描述一种新的多模态合成方法,可以从文本同时生成语音和手势动作。
  • methods: 该方法使用最优传输条件流匹配(OT-CFM)来联合合成语音声学特征和基于骨架的三维手势动作;它比此前的最先进方法更简单、内存占用更小,并能捕捉语音与手势的联合分布,在单一过程中同时生成两种模态。
  • results: 主观测试表明,与现有基准相比,该方法生成的语音更自然、手势更接近人类动作,跨模态的匹配度也更好;新的训练方式还能以更少的网络评估次数达到更高的合成质量。
    Abstract As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures. This paper presents a novel, unified architecture for jointly synthesising speech acoustics and skeleton-based 3D gesture motion from text, trained using optimal-transport conditional flow matching (OT-CFM). The proposed architecture is simpler than the previous state of the art, has a smaller memory footprint, and can capture the joint distribution of speech and gestures, generating both modalities together in one single process. The new training regime, meanwhile, enables better synthesis quality in much fewer steps (network evaluations) than before. Uni- and multimodal subjective tests demonstrate improved speech naturalness, gesture human-likeness, and cross-modal appropriateness compared to existing benchmarks.
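
A minimal PyTorch sketch of the flow-matching objective that OT-CFM builds on is given below: sample noise and data, form the straight-line interpolant, and regress the network's predicted velocity onto the target velocity. The optimal-transport pairing, text conditioning, and the joint speech/gesture decoder are omitted, and the network and data are placeholders.

```python
# Minimal conditional flow matching training step in PyTorch (the objective OT-CFM
# builds on). The OT pairing of noise and data, the text conditioning, and the joint
# speech/gesture decoder are all omitted; network and data are placeholders.
import torch
import torch.nn as nn

dim = 16
velocity_net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)

def cfm_step(x1, sigma_min=1e-4):
    """One flow-matching step: regress the predicted velocity onto the target field."""
    x0 = torch.randn_like(x1)                                  # noise sample
    t = torch.rand(x1.shape[0], 1)
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1              # straight-line interpolant
    target = x1 - (1 - sigma_min) * x0                         # target velocity field
    pred = velocity_net(torch.cat([x_t, t], dim=1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

for _ in range(100):
    batch = torch.randn(32, dim) * 0.5 + 2.0                   # stand-in "data" batch
    loss = cfm_step(batch)
print("final loss:", loss)
```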

Distributional Reinforcement Learning with Online Risk-awareness Adaption

  • paper_url: http://arxiv.org/abs/2310.05179
  • repo_url: None
  • paper_authors: Yupeng Wu, Wenjie Huang
  • for: 本研究旨在提出一种新的分布式RL框架,以快速适应不确定环境中的不同风险水平,以提高RL在安全关键环境中的可靠优化策略。
  • methods: 该框架基于分布式RL的基础上,通过在线解决一个总变量最小化问题, dynamically选择适度的epistemic风险水平,以满足安全性和稳定性的要求。这里使用了一种 Follow-The-Leader 类型的搜索算法,以及一种特殊修改的损失函数,以实现在线选择风险水平。
  • results: 对多种任务进行比较,研究发现,DRL-ORA方法在面对不确定环境中表现出色,超过了基于固定风险水平或手动适应风险水平的方法。此外,研究还发现,DRL-ORA方法可以轻松地与多种RL算法结合使用,不需要进行大量的修改。
    Abstract The use of reinforcement learning (RL) in practical applications requires considering sub-optimal outcomes, which depend on the agent's familiarity with the uncertain environment. Dynamically adjusting the level of epistemic risk over the course of learning can tactically achieve reliable optimal policy in safety-critical environments and tackle the sub-optimality of a static risk level. In this work, we introduce a novel framework, Distributional RL with Online Risk Adaption (DRL-ORA), which can quantify the aleatory and epistemic uncertainties compositely and dynamically select the epistemic risk levels via solving a total variation minimization problem online. The risk level selection can be efficiently achieved through grid search using a Follow-The-Leader type algorithm, and its offline oracle is related to "satisficing measure" (in the decision analysis community) under a special modification of the loss function. We show multiple classes of tasks where DRL-ORA outperforms existing methods that rely on either a fixed risk level or manually predetermined risk level adaption. Given the simplicity of our modifications, we believe the framework can be easily incorporated into most RL algorithm variants.
    摘要 在实际应用中使用强化学习(RL)需要考虑次优结果,而这些结果取决于智能体对不确定环境的熟悉程度。在学习过程中动态调整认知(epistemic)风险水平,可以在安全关键环境中策略性地获得可靠的最优策略,并克服固定风险水平带来的次优性。在这项工作中,我们提出了一个新的框架——带在线风险适应的分布式强化学习(DRL-ORA),它能够综合量化偶然(aleatory)不确定性和认知不确定性,并通过在线求解总变差最小化问题来动态选择认知风险水平。风险水平的选择可以通过 Follow-The-Leader 类型算法的网格搜索高效实现,其离线 oracle 在对损失函数进行特定修改后,与决策分析领域中的“满意度量(satisficing measure)”相关。我们在多类任务上证明了 DRL-ORA 优于依赖固定风险水平或人工预设风险水平调整的现有方法。鉴于我们的修改十分简单,我们相信该框架可以轻松地整合进大多数 RL 算法变体中。

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

  • paper_url: http://arxiv.org/abs/2310.05175
  • repo_url: https://github.com/luuyin/owl
  • paper_authors: Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu
  • for: The paper aims to improve the practical deployment of large language models (LLMs) by applying traditional network pruning techniques.
  • methods: The paper introduces a novel LLM pruning methodology called Outlier Weighed Layerwise sparsity (OWL), which incorporates non-uniform layerwise sparsity ratios tailored for LLM pruning.
  • results: The paper demonstrates the distinct advantages offered by OWL over previous methods, achieving remarkable perplexity gains of 61.22 and 6.80 over the state-of-the-art Wanda and SparseGPT, respectively, at a high sparsity level of 70%.
    Abstract Large Language Models (LLMs), renowned for their remarkable performance, present a challenge due to their colossal model size when it comes to practical deployment. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters can be pruned in one-shot without hurting performance. Building upon insights gained from pre-LLM models, prevailing LLM pruning strategies have consistently adhered to the practice of uniformly pruning all layers at equivalent sparsity. However, this observation stands in contrast to the prevailing trends observed in the field of vision models, where non-uniform layerwise sparsity typically yields substantially improved results. To elucidate the underlying reasons for this disparity, we conduct a comprehensive analysis of the distribution of token features within LLMs. In doing so, we discover a strong correlation with the emergence of outliers, defined as features exhibiting significantly greater magnitudes compared to their counterparts in feature dimensions. Inspired by this finding, we introduce a novel LLM pruning methodology that incorporates a tailored set of non-uniform layerwise sparsity ratios specifically designed for LLM pruning, termed as Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL is directly proportional to the outlier ratio observed within each layer, facilitating a more effective alignment between layerwise weight sparsity and outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family and OPT, spanning various benchmarks, demonstrates the distinct advantages offered by OWL over previous methods. For instance, our approach exhibits a remarkable performance gain, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity at a high sparsity level of 70%, respectively.
    摘要 大型语言模型(LLM)以出色的表现著称,但其庞大的模型规模给实际部署带来了挑战。为应对这一挑战,研究者尝试将传统的网络剪枝技术应用于 LLM,发现可以一次性剪除大量参数而不损害性能。沿用此前模型上获得的经验,现有的 LLM 剪枝策略普遍对所有层施加相同稀疏度的均匀剪枝;然而这与视觉模型领域的主流趋势相反——在视觉模型中,非均匀的逐层稀疏度通常会带来明显更好的结果。为探究造成这种差异的原因,我们对 LLM 中 token 特征的分布进行了全面分析,发现其与离群值(即幅值显著大于同维度其他特征的特征)的出现高度相关。受此启发,我们提出了一种专为 LLM 剪枝设计、采用非均匀逐层稀疏度的新方法,称为离群值加权逐层稀疏(OWL)。OWL 的逐层稀疏度与各层观察到的离群值比例直接相关,从而使逐层权重稀疏度与离群值比例更好地对齐。我们在 LLaMA-V1 家族和 OPT 上进行了覆盖多个基准的实验,结果显示在 70% 的高稀疏度下,我们的方法分别以 61.22 和 6.80 的困惑度优势超越了最先进的 Wanda 和 SparseGPT。
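
The sketch below shows two generic ingredients behind such non-uniform pruning, under simplifying assumptions: a per-layer outlier ratio computed from plain weight magnitudes (the paper defines its outlier metric on token features and Wanda-style statistics), and per-layer magnitude pruning given a non-uniform sparsity vector. The example sparsity values are placeholders; OWL's exact mapping from outlier ratios to layerwise sparsities is in the paper.

```python
# Illustrative sketch (not the released OWL implementation): (1) a per-layer outlier
# ratio, i.e. the fraction of entries whose magnitude exceeds M times the layer mean
# (plain weight magnitude used only to keep the sketch self-contained), and (2) per-layer
# magnitude pruning with a non-uniform sparsity vector, which OWL derives from the ratios.
import numpy as np

def outlier_ratio(weights, m=5.0):
    mag = np.abs(weights)
    return float((mag > m * mag.mean()).mean())

def magnitude_prune(weights, sparsity):
    """Zero out roughly the smallest-magnitude fraction `sparsity` of the entries."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= thresh, 0.0, weights)

rng = np.random.default_rng(0)
layers = {f"layer{i}": rng.standard_normal((64, 64)) for i in range(4)}
layers["layer2"].ravel()[:40] *= 20.0            # plant a few large-magnitude outliers
print({name: outlier_ratio(w) for name, w in layers.items()})

# A non-uniform sparsity assignment (placeholder values; OWL computes these from ratios).
sparsities = {"layer0": 0.75, "layer1": 0.72, "layer2": 0.60, "layer3": 0.73}
pruned = {name: magnitude_prune(w, sparsities[name]) for name, w in layers.items()}
print({name: float((w == 0).mean()) for name, w in pruned.items()})
```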

Investigating the Ability of PINNs To Solve Burgers’ PDE Near Finite-Time BlowUp

  • paper_url: http://arxiv.org/abs/2310.05169
  • repo_url: None
  • paper_authors: Dibyakanti Kumar, Anirbit Mukherjee
  • for: 这篇论文旨在从严格的理论角度研究物理信息神经网络(PINNs)在求解可能出现有限时间爆破的偏微分方程(PDE)时的稳定性。
  • methods: 作者推导了 PINNs 求解任意维度 Burgers 方程时在允许有限时间爆破条件下的泛化界,并通过实验考察这些界与神经网络所得替代解到真实爆破解的 $\ell_2$ 距离之间的关系。
  • results: 研究表明,在所计算的 PDE 序列逐渐逼近爆破时,所推导的泛化界与替代解到真实爆破解的 $\ell_2$ 距离显著相关,说明 PINNs 的行为能够反映有限时间爆破的存在。
    Abstract Physics Informed Neural Networks (PINNs) have been achieving ever newer feats of solving complicated PDEs numerically while offering an attractive trade-off between accuracy and speed of inference. A particularly challenging aspect of PDEs is that there exist simple PDEs which can evolve into singular solutions in finite time starting from smooth initial conditions. In recent times some striking experiments have suggested that PINNs might be good at even detecting such finite-time blow-ups. In this work, we embark on a program to investigate this stability of PINNs from a rigorous theoretical viewpoint. Firstly, we derive generalization bounds for PINNs for Burgers' PDE, in arbitrary dimensions, under conditions that allow for a finite-time blow-up. Then we demonstrate via experiments that our bounds are significantly correlated to the $\ell_2$-distance of the neurally found surrogate from the true blow-up solution, when computed on sequences of PDEs that are getting increasingly close to a blow-up.
    摘要 物理信息神经网络(PINNs)在数值求解复杂偏微分方程(PDE)方面不断取得新进展,同时在精度与推理速度之间提供了有吸引力的折中。PDE 的一个特别具有挑战性的性质是:存在一些简单的 PDE,即使从光滑初始条件出发,也会在有限时间内演化出奇异解。近期一些引人注目的实验表明,PINNs 甚至可能擅长检测这类有限时间爆破。在这项工作中,我们从严格的理论角度出发,研究 PINNs 的这种稳定性。首先,我们在允许有限时间爆破的条件下,推导了 PINNs 求解任意维度 Burgers 方程的泛化界。随后我们通过实验表明,当所计算的 PDE 序列逐渐逼近爆破时,这些界与神经网络所得替代解到真实爆破解的 $\ell_2$ 距离显著相关。
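
For reference, the sketch below evaluates the PINN residual of 1-D viscous Burgers, u_t + u u_x = nu u_xx, with automatic differentiation at random collocation points. The training loop, initial/boundary losses, and the paper's generalization-bound machinery are omitted; the network width and viscosity are illustrative.

```python
# Minimal PyTorch sketch of the PINN residual for 1-D viscous Burgers at random
# collocation points. Training loop, initial/boundary terms, and the paper's
# generalization-bound analysis are omitted; all hyperparameters are illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def burgers_residual(net, x, t, nu=0.01):
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
    u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x),
                               create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx              # driven toward zero during training

x = torch.rand(256, 1) * 2 - 1                    # collocation points in [-1, 1] x [0, 1]
t = torch.rand(256, 1)
pde_loss = burgers_residual(net, x, t).pow(2).mean()
print(float(pde_loss))
```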

A Corrected Expected Improvement Acquisition Function Under Noisy Observations

  • paper_url: http://arxiv.org/abs/2310.05166
  • repo_url: https://github.com/han678/correctednoisyei
  • paper_authors: Han Zhou, Xingchen Ma, Matthew B Blaschko
  • for: 该论文主要针对的是 Bayesian 优化中的难题,即在含有噪声的观测中使用预期改进(EI)策略。
  • methods: 该论文提出了一种基于 Gaussian Process 模型的 EI 策略修正方法,该方法可以考虑噪声的影响,并提供一个包容更多情况的 acquisition function。
  • results: 该论文通过 theoretically 和实验来证明,该修正方法可以提高 EI 策略在含有噪声的情况下的性能,并且可以在黑盒优化和神经网络模型压缩等问题中提供更好的解决方案。
    Abstract Sequential maximization of expected improvement (EI) is one of the most widely used policies in Bayesian optimization because of its simplicity and ability to handle noisy observations. In particular, the improvement function often uses the best posterior mean as the best incumbent in noisy settings. However, the uncertainty associated with the incumbent solution is often neglected in many analytic EI-type methods: a closed-form acquisition function is derived in the noise-free setting, but then applied to the setting with noisy observations. To address this limitation, we propose a modification of EI that corrects its closed-form expression by incorporating the covariance information provided by the Gaussian Process (GP) model. This acquisition function specializes to the classical noise-free result, and we argue should replace that formula in Bayesian optimization software packages, tutorials, and textbooks. This enhanced acquisition provides good generality for noisy and noiseless settings. We show that our method achieves a sublinear convergence rate on the cumulative regret bound under heteroscedastic observation noise. Our empirical results demonstrate that our proposed acquisition function can outperform EI in the presence of noisy observations on benchmark functions for black-box optimization, as well as on parameter search for neural network model compression.
    摘要 顺序最大化期望改进(EI)因其简单性和处理噪声观测的能力,是贝叶斯优化中最广泛使用的策略之一。特别地,在有噪声的情形下,改进函数通常以后验均值的最优值作为当前最优解(incumbent)。然而,许多解析形式的 EI 类方法往往忽略了当前最优解所带的不确定性:其闭式采集函数是在无噪声设定下推导的,却被直接用于含噪声观测的场景。为了解决这一局限,我们提出了一种对 EI 的修正,通过引入高斯过程(GP)模型提供的协方差信息来修正其闭式表达式。该采集函数在无噪声情形下退化为经典结果,我们认为它应当取代贝叶斯优化软件包、教程和教材中的原有公式。这一改进的采集函数在有噪声和无噪声设定下都具有良好的通用性。我们证明了在异方差观测噪声下,所提方法的累积遗憾界具有次线性收敛率。实验结果表明,在存在噪声观测的情况下,所提采集函数在黑盒优化基准函数以及神经网络模型压缩的参数搜索任务上均优于 EI。
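
The sketch below contrasts the classical closed-form EI with one schematic way of folding in the incumbent's uncertainty and its covariance with the candidate: apply the Gaussian expected-positive-part formula to the posterior of f(x) - f(x_incumbent). This is an illustrative variant under stated assumptions, not the paper's exact corrected expression.

```python
# Schematic acquisition functions (not the paper's exact derivation): classical EI via the
# standard closed form, and a covariance-aware variant that applies the Gaussian
# expected-positive-part formula to the difference f(x) - f(x_incumbent), whose posterior
# uses the variances at both points and their covariance from the GP.
import numpy as np
from scipy.stats import norm

def expected_positive_part(mean, std):
    """E[max(Z, 0)] for Z ~ N(mean, std^2)."""
    std = np.maximum(std, 1e-12)
    z = mean / std
    return mean * norm.cdf(z) + std * norm.pdf(z)

def classical_ei(mu_x, sigma_x, best_mu):
    return expected_positive_part(mu_x - best_mu, sigma_x)

def covariance_aware_ei(mu_x, var_x, mu_inc, var_inc, cov_x_inc):
    diff_var = np.maximum(var_x + var_inc - 2.0 * cov_x_inc, 0.0)
    return expected_positive_part(mu_x - mu_inc, np.sqrt(diff_var))

# Toy numbers from a hypothetical GP posterior at a candidate x and the noisy incumbent.
print(classical_ei(mu_x=1.2, sigma_x=0.5, best_mu=1.0))
print(covariance_aware_ei(mu_x=1.2, var_x=0.25, mu_inc=1.0, var_inc=0.09, cov_x_inc=0.05))
```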

Transferable Availability Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2310.05141
  • repo_url: https://github.com/trustmlrg/transpoison
  • paper_authors: Yiyong Liu, Michael Backes, Xiao Zhang
  • for: 这篇论文研究可用性投毒攻击:攻击者通过对训练数据施加微小扰动来降低机器学习模型的整体测试精度。
  • methods: 论文分析了当受害者采用与攻击者不同的学习算法时现有攻击的失效情况,并提出了一种可迁移投毒攻击(Transferable Poisoning),交替利用监督与无监督对比学习两种算法的梯度信息来生成高频投毒扰动。
  • results: 论文表明,若受害者使用与攻击者不同的学习范式训练模型,现有投毒攻击的效果会显著下降;而在基准图像数据集上的大量实验表明,所提出的可迁移投毒攻击能够生成迁移性显著提升的投毒样本,不仅适用于用于设计攻击的两种学习器,还适用于其他学习算法乃至其他学习范式。
    Abstract We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it can achieve some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks if the victim uses a different learning paradigm to train the model and show marked differences in frequency-level characteristics between perturbations generated with respect to different learners and attack methods. To enhance the attack transferability, we propose Transferable Poisoning, which generates high-frequency poisoning perturbations by alternately leveraging the gradient information with two specific algorithms selected from supervised and unsupervised contrastive learning paradigms. Through extensive experiments on benchmark image datasets, we show that our transferable poisoning attack can produce poisoned samples with significantly improved transferability, not only applicable to the two learners used to devise the attack but also for learning algorithms and even paradigms beyond.
    摘要 我们考虑了数据毒化攻击,敌人想要降低机器学习模型的总测试准确率,通过对训练数据进行小幅度的修改。现有的攻击策略可以实现攻击目标,但假设攻击者使用的学习方法与受害者使用的学习方法一样。在这篇论文中,我们认为这是一个强大的假设,因为受害者可以选择任何学习算法来训练模型,只要它可以在干净数据上达到一定的性能目标。我们在实验中观察到,如果受害者使用不同的学习方法来训练模型,攻击效果会减弱很多。为了提高攻击的传送性,我们提议了可传递的毒化攻击,通过交替使用两种特定的算法来生成高频毒化干扰。我们通过对标准图像集进行广泛的实验,证明了我们的可传递毒化攻击可以生成高质量的毒化样本,不仅适用于我们用于制定攻击的两种学习算法,还可以应用于其他学习算法和学习方法。

How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space

  • paper_url: http://arxiv.org/abs/2310.05105
  • repo_url: None
  • paper_authors: Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
  • for: 本研究旨在以更可解释的方式刻画深度学习模型的学习行为。对于图神经网络(GNN),已有大量工作形式化地刻画了它们能表示哪些函数,但GNN在优化过程中是否以及如何学到所需函数仍不清楚。
  • methods: 研究者借助过参数化的分析框架研究GNN在函数空间中的学习动态,发现GNN的训练过程可以重新表述为更为熟悉的标签传播框架;在此基础上,通过对学习动态进行稀疏化并直接实现,得到一种简洁高效的半监督学习算法。
  • results: 该视角解释了GNN学到的函数为何能够成功泛化,以及它们在异配(heterophilic)图上表现不佳的原因,这些解释与实验观察一致;所得到的极简算法兼具经典方法的效率与现代GNN的效果。
    Abstract A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, however it remains less clear whether and how GNNs learn desired functions during the optimization process. To fill this critical gap, we study the learning dynamics of GNNs in function space via the analytic framework of overparameterization. In particular, we find that the seemingly complicated training process of GNNs can be re-cast into a more familiar label propagation framework, due to the graph inductive bias implicit in this process. From this vantage point, we provide explanations for why the learned GNN functions successfully generalize and for their pathological behavior on heterophilic graphs, which are consistent with observations. Practically, sparsifying and implementing the learning dynamics lead to a minimalist semi-supervised learning algorithm with the efficiency of classic algorithms and the effectiveness of modern GNNs.
    摘要 深度学习领域的一个长期目标,是以更可解释的方式刻画黑盒模型的学习行为。对于图神经网络(GNN),在形式化其可表示函数方面已有许多进展,但GNN在优化过程中是否以及如何学到所需函数仍不清楚。为填补这一关键空白,我们借助过参数化的分析框架,研究GNN在函数空间中的学习动态。我们发现,由于训练过程中隐含的图归纳偏置,GNN看似复杂的训练过程可以重新表述为更为熟悉的标签传播框架。从这一视角出发,我们解释了GNN学到的函数为何能成功泛化,以及它们在异配图上的病态行为,这些解释与实验观察相符。在实践中,对学习动态进行稀疏化并直接实现,可得到一种极简的半监督学习算法,兼具经典算法的效率与现代GNN的效果。
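
The label-propagation view referred to above can be grounded with the classic iterative scheme below — a minimal NumPy sketch of standard label propagation on a graph, shown here only as background for the analogy; it is not the paper's derived algorithm.

```python
import numpy as np

def label_propagation(adj, y_onehot, train_mask, alpha=0.9, iters=50):
    """Semi-supervised label propagation on an undirected graph.

    adj        : (n, n) symmetric adjacency matrix
    y_onehot   : (n, c) one-hot labels; rows of unlabeled nodes can be zero
    train_mask : (n,) boolean mask of labeled nodes
    alpha      : how much mass is propagated vs. re-injected from the seeds
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    s = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}

    seeds = np.where(train_mask[:, None], y_onehot, 0.0)
    f = seeds.copy()
    for _ in range(iters):
        f = alpha * s @ f + (1.0 - alpha) * seeds          # propagate + clamp
    return f.argmax(axis=1)

if __name__ == "__main__":
    # Two triangles joined by one edge; one labeled node per triangle.
    adj = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        adj[i, j] = adj[j, i] = 1.0
    y = np.zeros((6, 2)); y[0, 0] = 1.0; y[5, 1] = 1.0
    mask = np.array([True, False, False, False, False, True])
    print(label_propagation(adj, y, mask))   # expected: [0 0 0 1 1 1]
```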

Asymmetrically Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05093
  • repo_url: None
  • paper_authors: Qinglun Li, Miao Zhang, Nan Yin, Quanjun Yin, Li Shen
  • for: This paper aims to address the communication burden and privacy concerns associated with centralized servers in Federated Learning (FL) by proposing a Decentralized Federated Learning (DFL) algorithm based on asymmetric topologies and the Push-Sum protocol.
  • methods: The proposed DFedSGPSM algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum to improve algorithm performance and alleviate local heterogeneous overfitting in FL. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, while the local momentum accelerates the optimization process.
  • results: The paper demonstrates the superior performance of the proposed DFedSGPSM algorithm compared to state-of-the-art optimizers through extensive experiments on the MNIST, CIFAR10, and CIFAR100 datasets. The theoretical analysis also proves that the algorithm achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions, and that better topological connectivity achieves tighter upper bounds.
    Abstract To address the communication burden and privacy concerns associated with the centralized server in Federated Learning (FL), Decentralized Federated Learning (DFL) has emerged, which discards the server with a peer-to-peer (P2P) communication framework. However, most existing DFL algorithms are based on symmetric topologies, such as ring and grid topologies, which can easily lead to deadlocks and are susceptible to the impact of network link quality in practice. To address these issues, this paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol to effectively solve consensus optimization problems. To further improve algorithm performance and alleviate local heterogeneous overfitting in Federated Learning (FL), our algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, mitigating local heterogeneous overfitting. The local momentum accelerates the optimization process of the SAM optimizer. Theoretical analysis proves that DFedSGPSM achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions. This analysis also reveals that better topological connectivity achieves tighter upper bounds. Empirically, extensive experiments are conducted on the MNIST, CIFAR10, and CIFAR100 datasets, demonstrating the superior performance of our algorithm compared to state-of-the-art optimizers.
    摘要 为了解决联邦学习(FL)中中心化服务器带来的通信负担和隐私问题,去中心化联邦学习(DFL)应运而生,它摒弃了服务器,采用点对点(P2P)通信框架。然而,现有的大多数DFL算法基于环形、网格等对称拓扑,在实践中容易产生死锁,并且易受网络链路质量的影响。为解决这些问题,本文提出了基于非对称拓扑的DFedSGPSM算法,利用Push-Sum协议有效求解共识优化问题。为了进一步提升算法性能并缓解联邦学习中的本地异质过拟合,该算法结合了锐度感知最小化(SAM)优化器与本地动量:SAM优化器利用梯度扰动生成局部平坦的模型,并搜索损失值均匀较低的模型,从而缓解本地异质过拟合;本地动量则加速SAM优化器的优化过程。理论分析证明,在温和假设下,DFedSGPSM在非凸光滑设定中可达到 $\mathcal{O}(\frac{1}{\sqrt{T}})$ 的收敛速率,其中 $T$ 为迭代次数;分析还表明,更好的拓扑连通性可以得到更紧的上界。实验方面,我们在 MNIST、CIFAR10 和 CIFAR100 数据集上进行了大量实验,结果显示该算法性能显著优于当前最优的优化器。
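
As background for the SAM-plus-local-momentum component, here is a small PyTorch sketch of one Sharpness-Aware Minimization step combined with heavy-ball momentum on a local batch. The hyper-parameters (rho, lr, beta) are illustrative, and the Push-Sum mixing over the asymmetric topology is omitted; this is not the DFedSGPSM implementation.

```python
import torch

def sam_momentum_step(model, loss_fn, x, y, momentum_buf,
                      lr=0.05, rho=0.05, beta=0.9):
    """One SAM step with heavy-ball momentum on a single local batch.

    1) ascend to the worst-case point within an L2 ball of radius rho,
    2) take the gradient there,
    3) apply a momentum update at the original weights.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Perturb weights towards higher loss (the "sharpness" direction).
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights.
    loss_perturbed = loss_fn(model(x), y)
    grads_sam = torch.autograd.grad(loss_perturbed, params)

    # Undo the perturbation and apply the momentum update.
    with torch.no_grad():
        for p, e, g, buf in zip(params, eps, grads_sam, momentum_buf):
            p.sub_(e)
            buf.mul_(beta).add_(g)
            p.sub_(lr * buf)
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(10, 2)
    buf = [torch.zeros_like(p) for p in model.parameters()]
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    for _ in range(5):
        print(sam_momentum_step(model, torch.nn.functional.cross_entropy, x, y, buf))
```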

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

  • paper_url: http://arxiv.org/abs/2310.05079
  • repo_url: https://github.com/chengzhang-98/llm-mixed-q
  • paper_authors: Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao
  • for: 这篇论文旨在降低大型语言模型(LLM)推理所需的计算与存储资源成本。
  • methods: 论文分析了LLM各层的统计与学习特性,发现LLM量化的瓶颈在于数值缩放偏移。在此基础上,作者为LLM引入块量化方法,即在打包在一起的数值之间共享缩放因子,仅从算术角度就能有效减小数值缩放偏移,而无需在计算路径上增加额外处理。
  • results: 论文提出的近乎无损6位量化LLM,相对于float32基线可达到 $19\times$ 的算术密度和 $5\times$ 的存储密度,并在算术密度上超过此前最优的8位量化 $2.5\times$、在存储密度上超过 $1.2\times$,且无需任何数据校准或重新训练。此外,作者还分享了关于低于8位LLM量化的经验,包括激活与权重分布的不匹配、最优的微调策略,以及LLM统计特性所蕴含的更细的量化粒度;后两者使得在下游任务上实现近乎无损的4位LLM成为可能。
    Abstract The inference of Large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has merged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a $19\times$ higher arithmetic density and $5\times$ memory density than the float32 baseline, surpassing the prior art 8-bit quantisation by $2.5\times$ in arithmetic density and $1.2\times$ in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced.
    摘要 大型语言模型(LLM)的推理需要巨量的计算和存储资源。为了降低这些成本,量化成为一种有前景的解决方案,但现有的LLM量化工作主要集中在8位。在本工作中,我们研究了LLM各层的统计与学习特性,并将LLM量化的瓶颈归因于数值缩放偏移。为此,我们为LLM引入块量化方法,这类方法在打包在一起的数值之间共享缩放因子,仅从算术角度就能有效减小数值缩放偏移,而无需在计算路径上增加额外处理。我们得到的近乎无损6位量化LLM,相对于float32基线可达到19倍的算术密度和5倍的存储密度,并在算术密度上超过此前最优的8位量化2.5倍、在存储密度上超过1.2倍,且无需任何数据校准或重新训练。我们还分享了关于低于8位LLM量化的经验,包括激活与权重分布的不匹配、最优的微调策略,以及LLM统计特性所蕴含的更细的量化粒度;后两者使得在下游任务上实现近乎无损的4位LLM成为可能。我们的代码已经开源。
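
A minimal NumPy sketch of the block-quantisation idea — one shared scaling factor per block of packed values, rounded to a low-bit signed grid — is shown below. The block size and bit-width are illustrative choices, and the snippet is not the authors' released implementation.

```python
import numpy as np

def block_quantize(x, bits=6, block_size=16):
    """Quantize a 1-D tensor with one shared scale per block of values."""
    n = x.size
    pad = (-n) % block_size
    x_pad = np.concatenate([x, np.zeros(pad)]).reshape(-1, block_size)

    qmax = 2 ** (bits - 1) - 1                      # symmetric signed grid
    scale = np.abs(x_pad).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid division by zero
    q = np.clip(np.round(x_pad / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale, n

def block_dequantize(q, scale, n):
    return (q.astype(np.float32) * scale).reshape(-1)[:n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Outlier-heavy weights: per-block scales keep small values from being crushed.
    w = rng.normal(size=1000).astype(np.float32)
    w[::97] *= 20.0
    q, s, n = block_quantize(w, bits=6, block_size=16)
    w_hat = block_dequantize(q, s, n)
    print("mean abs error:", np.abs(w - w_hat).mean())
    print("bits per weight (payload only):", 6 + 32 / 16)  # int6 + fp32 scale per block
```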

FedFed: Feature Distillation against Data Heterogeneity in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05077
  • repo_url: https://github.com/visitworld123/fedfed
  • paper_authors: Zhiqin Yang, Yonggang Zhang, Yu Zheng, Xinmei Tian, Hao Peng, Tongliang Liu, Bo Han
  • for: 这篇论文旨在解决联合学习(Federated Learning,FL)面临的数据不一致问题,即客户端数据的分布差异。
  • methods: 该论文提出了一种新的方法 called Federated Feature Distillation(FedFed),它将数据分为性能敏感特征(大量对模型性能的贡献)和性能鲁棒特征(对模型性能有限度贡献)。性能敏感特征被全局共享,以减轻数据不一致问题,而性能鲁棒特征被保留在本地。客户端可以使用本地和共享数据来训练模型。
  • results: 实验表明,FedFed 可以提高模型性能。
    Abstract Federated learning (FL) typically faces data heterogeneity, i.e., distribution shifting among clients. Sharing clients' information has shown great potentiality in mitigating data heterogeneity, yet incurs a dilemma in preserving privacy and promoting model performance. To alleviate the dilemma, we raise a fundamental question: \textit{Is it possible to share partial features in the data to tackle data heterogeneity?} In this work, we give an affirmative answer to this question by proposing a novel approach called {\textbf{Fed}erated \textbf{Fe}ature \textbf{d}istillation} (FedFed). Specifically, FedFed partitions data into performance-sensitive features (i.e., greatly contributing to model performance) and performance-robust features (i.e., limitedly contributing to model performance). The performance-sensitive features are globally shared to mitigate data heterogeneity, while the performance-robust features are kept locally. FedFed enables clients to train models over local and shared data. Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance.
    摘要 联邦学习(FL)通常面临数据异质性问题,即各客户端之间的数据分布偏移。共享客户端信息在缓解数据异质性方面展现了巨大潜力,但同时在保护隐私与提升模型性能之间带来了两难。为缓解这一矛盾,我们提出了一个基本问题:“能否只共享数据中的部分特征来应对数据异质性?”在本工作中,我们给出了肯定的回答,并提出了一种名为联邦特征蒸馏(FedFed)的新方法。具体而言,FedFed将数据划分为性能敏感特征(对模型性能贡献较大)与性能鲁棒特征(对模型性能贡献有限):性能敏感特征在全局共享以缓解数据异质性,而性能鲁棒特征保留在本地。客户端可以利用本地数据和共享数据共同训练模型。大量实验证明了FedFed在提升模型性能方面的有效性。

Towards Scalable Wireless Federated Learning: Challenges and Solutions

  • paper_url: http://arxiv.org/abs/2310.05076
  • repo_url: None
  • paper_authors: Yong Zhou, Yuanming Shi, Haibo Zhou, Jingjing Wang, Liqun Fu, Yang Yang
  • for: 本研究旨在探讨在无线网络中实现可信批处理的分布式机器学习( Federated Learning,FL)的挑战和解决方案。
  • methods: 本文从两个方面提出解决方案:一是通过面向任务的模型聚合与相应的无线技术(降低模型聚合失真、提高设备参与度)来增强通信可扩展性;二是通过计算高效的资源分配来增强算法可扩展性。
  • results: 本文提出了三种面向任务的学习算法,以实现计算高效的资源分配,从而提升算法可扩展性,并指出了若干值得进一步研究的问题。
    Abstract The explosive growth of smart devices (e.g., mobile phones, vehicles, drones) with sensing, communication, and computation capabilities gives rise to an unprecedented amount of data. The generated massive data together with the rapid advancement of machine learning (ML) techniques spark a variety of intelligent applications. To distill intelligence for supporting these applications, federated learning (FL) emerges as an effective distributed ML framework, given its potential to enable privacy-preserving model training at the network edge. In this article, we discuss the challenges and solutions of achieving scalable wireless FL from the perspectives of both network design and resource orchestration. For network design, we discuss how task-oriented model aggregation affects the performance of wireless FL, followed by proposing effective wireless techniques to enhance the communication scalability via reducing the model aggregation distortion and improving the device participation. For resource orchestration, we identify the limitations of the existing optimization-based algorithms and propose three task-oriented learning algorithms to enhance the algorithmic scalability via achieving computation-efficient resource allocation for wireless FL. We highlight several potential research issues that deserve further study.
    摘要 随着智能设备(如移动电话、汽车、无人机)的激增,生成了历史上无 precedent的数据量。这些大量数据,加上机器学习(ML)技术的快速发展,使得各种智能应用得以实现。为了提取智能,聚合式学习(FL)作为一种有效的分布式ML框架,在网络边缘实现隐私保护的模型训练。本文从网络设计和资源调度两个角度出发,探讨了无线FL的挑战和解决方案。从网络设计角度来看,我们讨论了任务导向的模型聚合如何影响无线FL的性能,然后提出了有效的无线技术来增强通信可扩展性,例如减少模型聚合误差和提高设备参与度。从资源调度角度来看,我们发现现有的优化算法有限制,因此提出了三种任务导向的学习算法,以实现计算效率的资源分配,从而提高无线FL的算法可扩展性。我们还指出了一些需要进一步研究的问题。

Robust-GBDT: A Novel Gradient Boosting Model for Noise-Robust Classification

  • paper_url: http://arxiv.org/abs/2310.05067
  • repo_url: https://github.com/luojiaqimath/robust-gbdt
  • paper_authors: Jiaqi Luo, Yuedong Quan, Shixin Xu
  • for: 该研究旨在提出一种能够应对标签噪声的梯度提升模型,并可处理多类分类任务。
  • methods: 该研究基于先进的梯度提升决策树(GBDT)框架,引入鲁棒损失函数以抵御标签噪声的影响,并提出了新的 Robust Focal Loss 来应对类别不平衡。
  • results: 研究发现,Robust-GBDT 模型能够给出更准确的预测,并能更好地应对标签噪声与类别不平衡问题。
    Abstract Robust boosting algorithms have emerged as alternative solutions to traditional boosting techniques for addressing label noise in classification tasks. However, these methods have predominantly focused on binary classification, limiting their applicability to multi-class tasks. Furthermore, they encounter challenges with imbalanced datasets, missing values, and computational efficiency. In this paper, we establish that the loss function employed in advanced Gradient Boosting Decision Trees (GBDT), particularly Newton's method-based GBDT, need not necessarily exhibit global convexity. Instead, the loss function only requires convexity within a specific region. Consequently, these GBDT models can leverage the benefits of nonconvex robust loss functions, making them resilient to noise. Building upon this theoretical insight, we introduce a new noise-robust boosting model called Robust-GBDT, which seamlessly integrates the advanced GBDT framework with robust losses. Additionally, we enhance the existing robust loss functions and introduce a novel robust loss function, Robust Focal Loss, designed to address class imbalance. As a result, Robust-GBDT generates more accurate predictions, significantly enhancing its generalization capabilities, especially in scenarios marked by label noise and class imbalance. Furthermore, Robust-GBDT is user-friendly and can easily integrate existing open-source code, enabling it to effectively handle complex datasets while improving computational efficiency. Numerous experiments confirm the superiority of Robust-GBDT over other noise-robust methods.
    摘要 鲁棒提升(robust boosting)算法已成为传统提升技术的替代方案,用于应对分类任务中的标签噪声。然而,这类方法主要集中在二分类问题上,限制了其在多类任务中的适用性;此外,它们在面对类别不平衡、缺失值以及计算效率等方面也存在困难。在本文中,我们证明先进的梯度提升决策树(GBDT),特别是基于牛顿法的GBDT,其损失函数并不需要具有全局凸性,只需在特定区域内保持凸性即可。因此,这些GBDT模型可以利用非凸的鲁棒损失函数,从而获得抗噪声能力。基于这一理论发现,我们提出了一种新的抗噪声提升模型 Robust-GBDT,将先进的GBDT框架与鲁棒损失无缝结合。此外,我们改进了现有的鲁棒损失函数,并提出了一种新的鲁棒损失 Robust Focal Loss,用于应对类别不平衡。因此,Robust-GBDT能够生成更准确的预测,显著增强其泛化能力,尤其是在标签噪声和类别不平衡的场景下。同时,Robust-GBDT易于使用,可以方便地与现有开源代码集成,在有效处理复杂数据集的同时提升计算效率。大量实验证明了Robust-GBDT相对于其他抗噪声方法的优越性。

Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain

  • paper_url: http://arxiv.org/abs/2310.05063
  • repo_url: None
  • paper_authors: Gerald Woo, Chenghao Liu, Akshat Kumar, Doyen Sahoo
  • for: 这篇论文旨在提供大规模时间序列预测数据集,以便进一步研究预测模型的预训练和扩展。
  • methods: 本研究发布了来自云运维(CloudOps)领域的三个大规模时间序列预测数据集,其中最大的数据集包含数十亿个观测值,为时间序列模型的预训练与扩展研究奠定了实证基础。
  • results: 研究表明,所提出的预训练方法是一个强大的零样本基线,并且随着模型和数据集规模的扩大性能进一步提升;在最大的数据集上,相比经典方法和深度学习基线,误差降低了27%。
    Abstract Time series has been left behind in the era of pre-training and transfer learning. While research in the fields of natural language processing and computer vision are enjoying progressively larger datasets to train massive models, the most popular time series datasets consist of only tens of thousands of time steps, limiting our ability to study the effectiveness of pre-training and scaling. Recent studies have also cast doubt on the need for expressive models and scale. To alleviate these issues, we introduce three large-scale time series forecasting datasets from the cloud operations (CloudOps) domain, the largest having billions of observations, enabling further study into pre-training and scaling of time series models. We build the empirical groundwork for studying pre-training and scaling of time series models and pave the way for future research by identifying a promising candidate architecture. We show that it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size. Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method - achieving a 27% reduction in error on the largest dataset. Code and datasets will be released.
    摘要 在预训练与迁移学习的时代,时间序列研究明显滞后。自然语言处理和计算机视觉领域正受益于规模越来越大的数据集来训练大模型,而最流行的时间序列数据集往往只有几万个时间步,限制了我们研究预训练与规模化有效性的能力;近期的一些研究甚至对表达力强的模型和规模化的必要性提出了质疑。为缓解这些问题,我们发布了来自云运维(CloudOps)领域的三个大规模时间序列预测数据集,其中最大的数据集包含数十亿个观测值,使进一步研究时间序列模型的预训练与规模化成为可能。我们为研究时间序列模型的预训练与规模化奠定了实证基础,并通过确定一种有前景的候选架构为未来研究铺路。我们证明该架构是一个强大的零样本基线,并且随着模型与数据集规模的扩大而进一步受益。随这些数据集一同发布的还有一整套全面的基准结果,将经典方法和深度学习基线与我们的预训练方法进行比较——在最大的数据集上误差降低了27%。代码和数据集将会公开。

Online Learning in Contextual Second-Price Pay-Per-Click Auctions

  • paper_url: http://arxiv.org/abs/2310.05047
  • repo_url: None
  • paper_authors: Mengxiao Zhang, Haipeng Luo
  • for: 本文研究上下文相关的按点击付费(Pay-Per-Click)拍卖中的在线学习:每一轮,学习者收到某个上下文和一组广告,需要估计各广告的点击率(CTR)以运行二价按点击付费拍卖。学习者的目标是最小化遗憾,即其总收益与一个总能完美预测CTR的oracle策略之间的差距。
  • methods: 我们首先证明,$\sqrt{T}$量级的遗憾可以通过一个计算上低效的算法达到,并且这一遗憾是不可避免的,因为该问题不比经典的多臂老虎机问题容易。随后,借鉴近期高效上下文老虎机算法的进展,我们提出了两种实用的上下文拍卖算法:第一种采用指数加权方案并结合乐观的平方误差,保持同样的 $\sqrt{T}$ 遗憾界;第二种通过简单的 $\epsilon$-贪婪策略将问题归约为在线回归,但遗憾界稍差。
  • results: 我们在合成数据集上进行了实验,验证了所提算法在实践中的有效性与优越性。
    Abstract We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads and needs to make an estimate on their click-through rate (CTR) in order to run a second-price pay-per-click auction. The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy that always makes perfect CTR predictions. We first show that $\sqrt{T}$-regret is obtainable via a computationally inefficient algorithm and that it is unavoidable since our algorithm is no easier than the classical multi-armed bandit problem. A by-product of our results is a $\sqrt{T}$-regret bound for the simpler non-contextual setting, improving upon a recent work of [Feng et al., 2023] by removing the inverse CTR dependency that could be arbitrarily large. Then, borrowing ideas from recent advances on efficient contextual bandit algorithms, we develop two practically efficient contextual auction algorithms: the first one uses the exponential weight scheme with optimistic square errors and maintains the same $\sqrt{T}$-regret bound, while the second one reduces the problem to online regression via a simple epsilon-greedy strategy, albeit with a worse regret bound. Finally, we conduct experiments on a synthetic dataset to showcase the effectiveness and superior performance of our algorithms.
    摘要 我们研究上下文相关的按点击付费拍卖中的在线学习:在 $T$ 轮中的每一轮,学习者收到某个上下文和一组广告,需要估计各广告的点击率(CTR),以运行二价按点击付费拍卖。学习者的目标是最小化遗憾,即其总收益与一个总能完美预测CTR的oracle策略之间的差距。我们首先证明,$\sqrt{T}$ 量级的遗憾可以通过一个计算上低效的算法达到,并且这是不可避免的,因为该问题不比经典的多臂老虎机问题容易。作为副产品,我们还为更简单的非上下文设定给出了 $\sqrt{T}$ 遗憾界,改进了 [Feng et al., 2023] 的近期结果,消除了可能任意大的CTR倒数依赖。随后,借鉴近期高效上下文老虎机算法的进展,我们提出了两种实用的上下文拍卖算法:第一种采用指数加权方案并结合乐观的平方误差,保持同样的 $\sqrt{T}$ 遗憾界;第二种通过简单的 $\epsilon$-贪婪策略将问题归约为在线回归,但遗憾界稍差。最后,我们在合成数据集上进行了实验,展示了所提算法的有效性与优越性。
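
The epsilon-greedy reduction to online regression mentioned above can be illustrated with the simulation below. It is a simplified sketch: the CTR model is a synthetic logistic function, the regret tracked is the expected-value regret of the allocation rather than the paper's revenue-based regret, and the second-price payment rule is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_ppc_simulation(T=20000, d=8, K=5, eps=0.05, lr=0.1, seed=0):
    """Epsilon-greedy contextual allocation with an online logistic CTR model."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)               # hidden CTR model
    theta = np.zeros(d)                       # learner's online logistic regressor
    regret = np.zeros(T)

    for t in range(T):
        feats = rng.normal(size=(K, d)) / np.sqrt(d)   # context-ad features
        bids = rng.uniform(0.1, 1.0, size=K)           # advertisers' per-click bids
        ctr_true = sigmoid(feats @ w_true)
        ctr_hat = sigmoid(feats @ theta)

        # Explore a random ad with probability eps, otherwise allocate greedily
        # on predicted expected value (bid x predicted CTR).
        k = rng.integers(K) if rng.random() < eps else int(np.argmax(bids * ctr_hat))

        # Per-round regret against an oracle with perfect CTR predictions.
        regret[t] = np.max(bids * ctr_true) - bids[k] * ctr_true[k]

        # Observe the click of the shown ad only, and update the CTR model online.
        click = float(rng.random() < ctr_true[k])
        theta -= lr * (sigmoid(feats[k] @ theta) - click) * feats[k]

    print("avg regret, first 10% of rounds:", regret[: T // 10].mean())
    print("avg regret, last  10% of rounds:", regret[-T // 10:].mean())

if __name__ == "__main__":
    run_ppc_simulation()
```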

Deep Reinforcement Learning Based Cross-Layer Design in Terahertz Mesh Backhaul Networks

  • paper_url: http://arxiv.org/abs/2310.05034
  • repo_url: None
  • paper_authors: Zhifeng Hu, Chong Han, Xudong Wang
  • for: 该论文旨在解决太赫兹(THz)网状回程网络中的跨层路由与长期资源分配问题,以支撑下一代集成接入与回程(IAB)的无线回程系统。
  • methods: 论文提出了基于深度强化学习(DRL)的跨层设计方案 DEFLECT:首先设计了一种启发式路由度量,以提升能量与子阵列使用的资源效率;随后提出了基于DRL的资源分配算法,利用多任务结构联合优化功率与子阵列分配,并借助分层架构为每个基站实现定制化的资源分配与学习知识迁移,以实现长期资源效率最大化和链路中断后的快速恢复。
  • results: 仿真结果表明,DEFLECT 路由相比最小跳数度量消耗更少的资源,且不会造成丢包或秒级时延;此外,DEFLECT 的DRL方法能够在1秒内从链路中断中恢复资源高效的回程传输。
    Abstract Supporting ultra-high data rates and flexible reconfigurability, Terahertz (THz) mesh networks are attractive for next-generation wireless backhaul systems that empower the integrated access and backhaul (IAB). In THz mesh backhaul networks, the efficient cross-layer routing and long-term resource allocation is yet an open problem due to dynamic traffic demands as well as possible link failures caused by the high directivity and high non-line-of-sight (NLoS) path loss of THz spectrum. In addition, unpredictable data traffic and the mixed integer programming property with the NP-hard nature further challenge the effective routing and long-term resource allocation design. In this paper, a deep reinforcement learning (DRL) based cross-layer design in THz mesh backhaul networks (DEFLECT) is proposed, by considering dynamic traffic demands and possible sudden link failures. In DEFLECT, a heuristic routing metric is first devised to facilitate resource efficiency (RE) enhancement regarding energy and sub-array usages. Furthermore, a DRL based resource allocation algorithm is developed to realize long-term RE maximization and fast recovery from broken links. Specifically in the DRL method, the exploited multi-task structure cooperatively benefits joint power and sub-array allocation. Additionally, the leveraged hierarchical architecture realizes tailored resource allocation for each base station and learned knowledge transfer for fast recovery. Simulation results show that DEFLECT routing consumes less resource, compared to the minimal hop-count metric. Moreover, unlike conventional DRL methods causing packet loss and second-level latency, DEFLECT DRL realizes the long-term RE maximization with no packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1s.
    摘要 太赫兹(THz)网状网络支持超高数据速率与灵活的可重构性,是支撑集成接入与回程(IAB)的下一代无线回程系统的有力候选。在THz网状回程网络中,由于流量需求动态变化,且THz频段的高方向性与高非视距(NLoS)路径损耗可能导致链路失效,高效的跨层路由与长期资源分配仍是一个未解决的问题;此外,不可预测的数据流量以及问题本身的混合整数规划、NP难特性,也进一步加大了路由与长期资源分配设计的难度。本文提出了一种基于深度强化学习(DRL)的THz网状回程网络跨层设计方案 DEFLECT,同时考虑动态流量需求与可能的链路突发失效。在DEFLECT中,首先设计了一种启发式路由度量,以提升能量与子阵列使用方面的资源效率(RE);进而提出了基于DRL的资源分配算法,实现长期资源效率最大化以及链路中断后的快速恢复。具体而言,DRL方法中的多任务结构协同优化功率与子阵列分配,分层架构则为每个基站实现定制化的资源分配,并通过学习知识迁移实现快速恢复。仿真结果表明,与最小跳数度量相比,DEFLECT路由消耗更少的资源;并且不同于会造成丢包和秒级时延的传统DRL方法,DEFLECT的DRL方法在实现长期资源效率最大化的同时不产生丢包、时延保持在毫秒级,且能在1秒内从链路中断中恢复资源高效的回程传输。

Compressed online Sinkhorn

  • paper_url: http://arxiv.org/abs/2310.05019
  • repo_url: None
  • paper_authors: Fengpei Wang, Clarice Poon, Tony Shardlow
  • for: This paper focuses on the use of optimal transport (OT) distances and the Sinkhorn algorithm for large-scale data processing.
  • methods: The paper revisits the online Sinkhorn algorithm introduced by Mensch and Peyr'e in 2020, and improves the convergence analysis with a faster rate under certain parameter choices. Additionally, the paper proposes a compressed online Sinkhorn algorithm that combines measure compression techniques with the online Sinkhorn algorithm.
  • results: The paper provides numerical results to verify the sharpness of the improved convergence rate, as well as practical numerical gains and theoretical guarantees on the efficiency of the compressed online Sinkhorn algorithm.
    Abstract The use of optimal transport (OT) distances, and in particular entropic-regularised OT distances, is an increasingly popular evaluation metric in many areas of machine learning and data science. Their use has largely been driven by the availability of efficient algorithms such as the Sinkhorn algorithm. One of the drawbacks of the Sinkhorn algorithm for large-scale data processing is that it is a two-phase method, where one first draws a large stream of data from the probability distributions, before applying the Sinkhorn algorithm to the discrete probability measures. More recently, there have been several works developing stochastic versions of Sinkhorn that directly handle continuous streams of data. In this work, we revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyr\'e, 2020]. Our contributions are twofold: We improve the convergence analysis for the online Sinkhorn algorithm, the new rate that we obtain is faster than the previous rate under certain parameter choices. We also present numerical results to verify the sharpness of our result. Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm. We provide numerical experiments to show practical numerical gains, as well as theoretical guarantees on the efficiency of our approach.
    摘要 最优传输(OT)距离,尤其是熵正则化的OT距离,正日益成为机器学习与数据科学诸多领域中流行的评估指标,这在很大程度上得益于 Sinkhorn 算法等高效算法的出现。对于大规模数据处理,Sinkhorn 算法的一个不足是它分两个阶段进行:先从概率分布中抽取大量数据,再对得到的离散概率测度运行 Sinkhorn 算法。近期已有若干工作提出了能够直接处理连续数据流的随机化 Sinkhorn 算法。本文重新审视了 [Mensch and Peyr\'e, 2020] 提出的在线 Sinkhorn 算法,主要贡献有两点:其一,我们改进了在线 Sinkhorn 算法的收敛性分析,在特定参数选择下得到了比已有结果更快的收敛速率,并给出数值结果验证该速率的紧致性;其二,我们提出了压缩在线 Sinkhorn 算法,将测度压缩技术与在线 Sinkhorn 算法相结合,并给出了实际数值收益与关于其效率的理论保证。
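
For background, the batch Sinkhorn iteration that the online and compressed online variants build on is shown below — a standard textbook implementation for fixed discrete histograms; the paper's algorithms instead operate on continuous streams of samples.

```python
import numpy as np

def sinkhorn(a, b, cost, epsilon=0.05, iters=500):
    """Entropic-regularised OT between histograms a (n,) and b (m,).

    cost    : (n, m) ground-cost matrix
    epsilon : entropic regularisation strength
    Returns the transport plan and the regularised OT value <P, C>.
    """
    K = np.exp(-cost / epsilon)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)                    # match row marginals
        v = b / (K.T @ u)                  # match column marginals
    plan = u[:, None] * K * v[None, :]
    return plan, float((plan * cost).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(50, 2))       # support of the first measure
    y = rng.normal(1.0, 1.0, size=(60, 2))       # support of the second measure
    a = np.full(50, 1 / 50)
    b = np.full(60, 1 / 60)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()                     # keep exp(-cost/eps) well-scaled
    plan, value = sinkhorn(a, b, cost)
    print("marginal error:", np.abs(plan.sum(1) - a).max(), np.abs(plan.sum(0) - b).max())
    print("entropic OT cost:", value)
```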

Human-in-the-loop: The future of Machine Learning in Automated Electron Microscopy

  • paper_url: http://arxiv.org/abs/2310.05018
  • repo_url: None
  • paper_authors: Sergei V. Kalinin, Yongtao Liu, Arpan Biswas, Gerd Duscher, Utkarsh Pratiush, Kevin Roccapriore, Maxim Ziatdinov, Rama Vasudevan
  • for: 这篇论文旨在介绍机器学习技术在电子显微镜中的应用,以及如何通过人工介入的自动化实验提高实验效率和可靠性。
  • methods: 该论文讨论了利用机器学习算法和主要仪器厂商提供的API,对显微镜数据进行实时分析,并对显微镜的操作进行实时决策与反馈控制。
  • results: 该论文提出了一种新的实验方法,称为人类在循环(hAE),其中人类操作员监督实验的进行,并通过调整机器学习算法的策略来引导实验向特定目标进行。
    Abstract Machine learning methods are progressively gaining acceptance in the electron microscopy community for de-noising, semantic segmentation, and dimensionality reduction of data post-acquisition. The introduction of the APIs by major instrument manufacturers now allows the deployment of ML workflows in microscopes, not only for data analytics but also for real-time decision-making and feedback for microscope operation. However, the number of use cases for real-time ML remains remarkably small. Here, we discuss some considerations in designing ML-based active experiments and pose that the likely strategy for the next several years will be human-in-the-loop automated experiments (hAE). In this paradigm, the ML learning agent directly controls beam position and image and spectroscopy acquisition functions, and human operator monitors experiment progression in real- and feature space of the system and tunes the policies of the ML agent to steer the experiment towards specific objectives.
    摘要 机器学习方法正逐渐被电子显微镜领域接受,用于采集后数据的去噪、语义分割和降维。随着主要仪器厂商推出API,机器学习工作流如今可以直接部署在显微镜中,不仅用于数据分析,还可用于显微镜操作的实时决策与反馈。然而,实时机器学习的应用案例仍然非常有限。本文讨论了设计基于机器学习的主动实验时需要考虑的若干问题,并提出未来几年最可能的策略将是人在回路的自动化实验(hAE):在这一范式中,机器学习智能体直接控制束斑位置以及图像与能谱采集功能,而人类操作者在系统的实空间与特征空间中监控实验进程,并调整智能体的策略,使实验朝特定目标推进。

Prompt-augmented Temporal Point Process for Streaming Event Sequence

  • paper_url: http://arxiv.org/abs/2310.04993
  • repo_url: https://github.com/yanyanSann/PromptTPP
  • paper_authors: Siqiao Xue, Yan Wang, Zhixuan Chu, Xiaoming Shi, Caigao Jiang, Hongyan Hao, Gangwei Jiang, Xiaoyun Feng, James Y. Zhang, Jun Zhou
  • for: 本研究旨在解决神经时序点过程(TPP)在流式事件序列上持续学习的挑战,同时满足隐私与内存约束。
  • methods: 我们提出了一个简单而有效的框架 PromptTPP,它将基础TPP与一个连续时间检索提示池相结合,使模型能够随时间推移持续学习流式事件序列。
  • results: 我们在三个真实用户行为数据集上展示了 PromptTPP 的优越性能,在隐私与内存约束下实现了持续学习。
    Abstract Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real-world applications, event data is typically received in a \emph{streaming} manner, where the distribution of patterns may shift over time. Additionally, \emph{privacy and memory constraints} are commonly observed in practical scenarios, further compounding the challenges. Therefore, the continuous monitoring of a TPP to learn the streaming event sequence is an important yet under-explored problem. Our work paper addresses this challenge by adopting Continual Learning (CL), which makes the model capable of continuously learning a sequence of tasks without catastrophic forgetting under realistic constraints. Correspondingly, we propose a simple yet effective framework, PromptTPP\footnote{Our code is available at {\small \url{ https://github.com/yanyanSann/PromptTPP}}, by integrating the base TPP with a continuous-time retrieval prompt pool. The prompts, small learnable parameters, are stored in a memory space and jointly optimized with the base TPP, ensuring that the model learns event streams sequentially without buffering past examples or task-specific attributes. We present a novel and realistic experimental setup for modeling event streams, where PromptTPP consistently achieves state-of-the-art performance across three real user behavior datasets.
    摘要 神经时序点过程(TPP)是建模连续时间事件序列(如网络用户行为、金融交易)的主流范式。在实际应用中,事件数据通常以流式方式到达,其模式分布可能随时间发生偏移;同时,实际场景中普遍存在隐私与内存约束,使问题更加复杂。因此,持续监控TPP以学习流式事件序列是一个重要但尚未被充分研究的问题。本工作采用持续学习(CL)来应对这一挑战,使模型能够在现实约束下连续学习一系列任务而不发生灾难性遗忘。为此,我们提出了一个简单而有效的框架 PromptTPP,它将基础TPP与一个连续时间检索提示池相结合:提示是存储在记忆空间中的少量可学习参数,并与基础TPP联合优化,使模型能够顺序地学习事件流,而无需缓存历史样本或任务特定属性。我们还设计了一个新颖且贴近现实的事件流建模实验设置,在三个真实用户行为数据集上,PromptTPP 均取得了最先进的性能。

Waveformer for modelling dynamical systems

  • paper_url: http://arxiv.org/abs/2310.04990
  • repo_url: None
  • paper_authors: N Navaneeth, Souvik Chakraborty
  • for: 学习动力系统(微分方程)的解算子。
  • methods: 结合小波变换与Transformer,分别捕捉解场的空间多尺度行为与长时程动态。
  • results: 在涉及 Burgers 方程、KS 方程、Allen-Cahn 方程和 Navier-Stokes 方程的四个数值算例中,Waveformer 能够高精度地学习解算子,性能超过现有最先进的算子学习算法最多可达一个数量级,其优势在外推区域尤为明显。
    Abstract Neural operators have gained recognition as potent tools for learning solutions of a family of partial differential equations. The state-of-the-art neural operators excel at approximating the functional relationship between input functions and the solution space, potentially reducing computational costs and enabling real-time applications. However, they often fall short when tackling time-dependent problems, particularly in delivering accurate long-term predictions. In this work, we propose "waveformer", a novel operator learning approach for learning solutions of dynamical systems. The proposed waveformer exploits wavelet transform to capture the spatial multi-scale behavior of the solution field and transformers for capturing the long horizon dynamics. We present four numerical examples involving Burgers's equation, KS-equation, Allen Cahn equation, and Navier Stokes equation to illustrate the efficacy of the proposed approach. Results obtained indicate the capability of the proposed waveformer in learning the solution operator and show that the proposed Waveformer can learn the solution operator with high accuracy, outperforming existing state-of-the-art operator learning algorithms by up to an order, with its advantage particularly visible in the extrapolation region
    摘要 神经算子已被公认为学习一族偏微分方程解的有力工具。最先进的神经算子擅长逼近输入函数与解空间之间的函数关系,有望降低计算成本并支持实时应用;然而,它们在处理时变问题、尤其是给出准确的长期预测方面往往力不从心。在本工作中,我们提出了一种新的算子学习方法 Waveformer,用于学习动力系统的解。Waveformer 利用小波变换捕捉解场的空间多尺度行为,并利用Transformer捕捉长时程动态。我们给出了涉及 Burgers 方程、KS 方程、Allen-Cahn 方程和 Navier-Stokes 方程的四个数值算例,以说明该方法的有效性。结果表明,Waveformer 能够以高精度学习解算子,性能超过现有最先进的算子学习算法最多可达一个数量级,其优势在外推区域尤为明显。

Data-centric Graph Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.04987
  • repo_url: None
  • paper_authors: Cheng Yang, Deyu Bo, Jixi Liu, Yufei Peng, Boyu Chen, Haoran Dai, Ao Sun, Yue Yu, Yixin Xiao, Qi Zhang, Chunchen Wang, Yuxin Guo, Chuan Shi
  • for: 本文旨在探讨如何在深度学习时更好地处理图数据,以提高图模型的能力。
  • methods: 本文使用数据中心的方法,包括修改图数据的方法,以提高图模型的性能。
  • results: 本文提出了一种基于图学习管道的新分类法,并分析了图数据中的一些潜在问题,以及如何在数据中心的方法下解决这些问题。
    Abstract The history of artificial intelligence (AI) has witnessed the significant impact of high-quality data on various deep learning models, such as ImageNet for AlexNet and ResNet. Recently, instead of designing more complex neural architectures as model-centric approaches, the attention of AI community has shifted to data-centric ones, which focuses on better processing data to strengthen the ability of neural models. Graph learning, which operates on ubiquitous topological data, also plays an important role in the era of deep learning. In this survey, we comprehensively review graph learning approaches from the data-centric perspective, and aim to answer two crucial questions: (1) when to modify graph data and (2) how to modify graph data to unlock the potential of various graph models. Accordingly, we propose a novel taxonomy based on the stages in the graph learning pipeline, and highlight the processing methods for different data structures in the graph data, i.e., topology, feature and label. Furthermore, we analyze some potential problems embedded in graph data and discuss how to solve them in a data-centric manner. Finally, we provide some promising future directions for data-centric graph learning.
    摘要 人工智能(AI)的历史见证了高质量数据对各种深度学习模型的重要影响,如ImageNet对AlexNet和ResNet。在最近,AI社区的注意力转移到了数据中心的方法,而不是设计更复杂的神经网络模型。图学习,它在深度学习时代处理普遍的 topological 数据,也扮演着重要的角色。在本综述中,我们从数据中心的角度全面回顾图学习方法,并试图回答两个关键问题:(1)何时修改图数据,以及(2)如何修改图数据以解锁不同图模型的潜力。因此,我们提出了一种新的分类方法,基于图学习管道中的阶段,并高亮了不同数据结构在图数据中的处理方法,即 topological、特征和标签。此外,我们分析了图数据中的一些可能的问题,并讨论了如何在数据中心的方法下解决这些问题。最后,我们提出了一些未来的可能性,以推动数据中心的图学习发展。

Model-adapted Fourier sampling for generative compressed sensing

  • paper_url: http://arxiv.org/abs/2310.04984
  • repo_url: None
  • paper_authors: Aaron Berk, Simone Brugiapaglia, Yaniv Plan, Matthew Scott, Xia Sheng, Ozgur Yilmaz
  • for: 研究生成式压缩感知,其中测量矩阵是从一个固定的酉矩阵中随机子采样得到的,DFT 是其中重要的特例。
  • methods: 构建与模型相适应的采样策略,将采样复杂度改进为 $\textit{O}(kd\|\boldsymbol{\alpha}\|_{2}^{2})$。这由两步组成:首先为非均匀随机采样分布建立新的理论恢复保证,然后优化采样分布以最小化所需的测量次数。
  • results: 所得到的采样复杂度适用于自然信号类,这类信号通常与低频傅里叶向量几乎最大相干;此外,论文还考虑了一种替代采样方案,并在实验中验证了其恢复性能。
    Abstract We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbol{\alpha}$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\| \boldsymbol{\alpha}\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset.
    摘要 我们研究生成式压缩感知,其中测量矩阵是从一个酉矩阵中随机子采样得到的(DFT 为重要特例)。最近的研究表明,$\textit{O}(kdn\| \boldsymbol{\alpha}\|_{\infty}^{2})$ 个均匀随机的傅里叶测量足以恢复位于深度为 $d$ 的神经网络 $G:\mathbb{R}^k \to \mathbb{R}^n$ 值域中的信号,其中局部相干向量 $\boldsymbol{\alpha}$ 的每个分量刻画了相应傅里叶向量与 $G$ 值域的对齐程度。我们构建了与模型相适应的采样策略,将采样复杂度改进为 $\textit{O}(kd\| \boldsymbol{\alpha}\|_{2}^{2})$ 个测量。这依赖于两点:(1) 我们为非均匀随机采样分布建立了新的理论恢复保证;(2) 在此基础上优化采样分布,使保证所需的测量数最小。该采样复杂度适用于自然信号类,这类信号往往与低频傅里叶向量几乎最大相干。最后,我们考虑了一种替代采样方案,并在 CelebA 数据集上的恢复实验中验证了其性能。
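
The model-adapted sampling idea can be sketched as follows: frequencies are drawn from a distribution proportional to squared per-frequency weights (standing in for the local coherences), and the selected DFT rows are importance-reweighted so the measurement operator is an isometry in expectation. The coherence weights in the demo are synthetic placeholders; computing them for an actual generative network is what the paper addresses.

```python
import numpy as np

def model_adapted_fourier_measurements(x, alpha, m, rng):
    """Subsampled-DFT measurements with frequencies drawn proportionally to alpha**2.

    x     : (n,) signal (e.g. an output of the generative model)
    alpha : (n,) nonnegative per-frequency weights (stand-ins for local coherences)
    m     : number of measurements
    Rows are rescaled by 1/sqrt(m * p_k) so that E[A^H A] = I.
    """
    n = x.size
    p = alpha ** 2 / np.sum(alpha ** 2)           # model-adapted sampling distribution
    idx = rng.choice(n, size=m, replace=True, p=p)
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # unitary DFT matrix
    A = F[idx] / np.sqrt(m * p[idx])[:, None]     # importance re-weighting
    return A, A @ x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 256, 64
    t = np.linspace(0, 1, n, endpoint=False)
    x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)   # smooth signal
    freqs = np.minimum(np.arange(n), n - np.arange(n))                # |frequency|
    alpha = 1.0 / (1.0 + freqs)          # synthetic weights favouring low frequencies

    # Averaged over draws, the re-weighted measurement operator is an isometry.
    diag_mean = 0.0
    for _ in range(200):
        A, y = model_adapted_fourier_measurements(x, alpha, m, rng)
        diag_mean += np.real(np.diag(A.conj().T @ A)).mean() / 200
    print("mean diagonal of A^H A over 200 draws (should be close to 1):", diag_mean)
```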

Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.04971
  • repo_url: None
  • paper_authors: Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman
  • for: 本文研究了multimodal contrastive learning(MMCL)方法在不同频谱上的表达学习,尤其是CLIP的成功。
  • methods: 本文使用了rigorous分析方法,探讨了MMCL的robustness机制,发现了两种机制:intra-class contrasting和inter-class feature sharing。
  • results: 本文的理论发现和实验结果表明,丰富的描述文本以及对不同类型细节进行标注,可以提升模型在分布偏移下的鲁棒性和零样本分类精度。
    Abstract Recently, multimodal contrastive learning (MMCL) approaches, such as CLIP, have achieved a remarkable success in learning representations that are robust against distribution shift and generalize to new domains. Despite the empirical success, the mechanism behind learning such generalizable representations is not understood. In this work, we rigorously analyze this problem and uncover two mechanisms behind MMCL's robustness: \emph{intra-class contrasting}, which allows the model to learn features with a high variance, and \emph{inter-class feature sharing}, where annotated details in one class help learning other classes better. Both mechanisms prevent spurious features that are over-represented in the training data to overshadow the generalizable core features. This yields superior zero-shot classification accuracy under distribution shift. Furthermore, we theoretically demonstrate the benefits of using rich captions on robustness and explore the effect of annotating different types of details in the captions. We validate our theoretical findings through experiments, including a well-designed synthetic experiment and an experiment involving training CLIP on MS COCO and evaluating the model on variations of shifted ImageNet.
    摘要 近期,多模态对比学习(MMCL)方法(如CLIP)在学习对分布偏移鲁棒、能泛化到新领域的表示方面取得了显著成功。尽管有这些实证成果,学到此类可泛化表示背后的机制尚不清楚。在本工作中,我们对这一问题进行了严格分析,揭示了MMCL鲁棒性背后的两种机制:类内对比(intra-class contrasting),使模型能够学习具有高方差的特征;以及类间特征共享(inter-class feature sharing),即一个类别中标注的细节有助于更好地学习其他类别。这两种机制共同防止训练数据中过度出现的伪特征掩盖可泛化的核心特征,从而在分布偏移下获得更优的零样本分类精度。此外,我们从理论上论证了丰富描述文本对鲁棒性的好处,并探讨了在描述中标注不同类型细节的影响。我们通过实验验证了理论结论,包括一个精心设计的合成实验,以及在 MS COCO 上训练 CLIP 并在多种偏移版本的 ImageNet 上评估模型的实验。
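
The multimodal contrastive objective underlying the analysis is the standard CLIP-style symmetric InfoNCE loss, sketched below as a generic PyTorch implementation (not the paper's experimental code): paired image and caption embeddings are contrasted against in-batch negatives, which is where the intra-class contrasting and inter-class feature sharing effects arise.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb : (B, d) embeddings; row i of each is a matched pair.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)         # match images to captions
    loss_t2i = F.cross_entropy(logits.t(), targets)     # match captions to images
    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    torch.manual_seed(0)
    B, d = 16, 32
    img_encoder = torch.nn.Linear(64, d)
    txt_encoder = torch.nn.Linear(48, d)
    opt = torch.optim.Adam(list(img_encoder.parameters()) + list(txt_encoder.parameters()), lr=1e-2)
    images, captions = torch.randn(B, 64), torch.randn(B, 48)
    for step in range(50):
        loss = clip_contrastive_loss(img_encoder(images), txt_encoder(captions))
        opt.zero_grad(); loss.backward(); opt.step()
    print("final loss:", loss.item())   # should drop well below log(B) ~ 2.77
```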

Improved Active Learning via Dependent Leverage Score Sampling

  • paper_url: http://arxiv.org/abs/2310.04966
  • repo_url: None
  • paper_authors: Atsushi Shimizu, Xiaoou Cheng, Christopher Musco, Jonathan Weare
  • for: 本文旨在提出改进的主动学习方法,用于不可知(对抗噪声)设定下的学习。
  • methods: 本文将边际杠杆分数采样与促进空间覆盖的非独立采样策略相结合,提出了一种基于枢轴采样(pivotal sampling)算法、易于实现的方法,并在受参数化偏微分方程和不确定性量化启发的学习问题上进行了测试。
  • results: 与独立采样相比,该方法将达到给定目标精度所需的样本数最多减少了50%。论文还给出了两个理论结果:其一,任何满足弱单边 $\ell_{\infty}$ 独立性条件的非独立杠杆分数采样方法(包括枢轴采样),都可以用 $O(d\log d)$ 个样本主动学习 $d$ 维线性函数,与独立采样相当;这一结果推广了近期关于 $\ell_{\infty}$ 独立性下矩阵 Chernoff 界的工作,并可能有助于分析其他采样策略。其二,对于重要的多项式回归问题,枢轴方法可以得到改进的 $O(d)$ 样本界。
    Abstract We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to $50\%$. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided $\ell_{\infty}$ independence condition (which includes pivotal sampling) can actively learn $d$ dimensional linear functions with $O(d\log d)$ samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under $\ell_{\infty}$ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of $O(d)$ samples.
    摘要 我们展示了如何在不可知(对抗噪声)设定下,通过将边际杠杆分数采样与促进空间覆盖的非独立采样策略相结合,获得改进的主动学习方法。具体而言,我们提出了一种基于枢轴采样(pivotal sampling)算法、易于实现的方法,并在受参数化偏微分方程学习与不确定性量化启发的问题上进行了测试。与独立采样相比,该方法将达到给定目标精度所需的样本数最多减少了50%。我们用两个理论结果支撑这些发现。其一,任何满足弱单边 $\ell_{\infty}$ 独立性条件的非独立杠杆分数采样方法(包括枢轴采样),都可以用 $O(d\log d)$ 个样本主动学习 $d$ 维线性函数,与独立采样相当;这一结果推广了近期关于 $\ell_{\infty}$ 独立性下矩阵 Chernoff 界的工作,并可能有助于分析枢轴采样之外的其他采样策略。其二,对于重要的多项式回归问题,我们的枢轴方法可以得到改进的 $O(d)$ 样本界。
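
The two ingredients — leverage scores of the design matrix and the sequential pivotal method for non-independent sampling — can be combined as in the NumPy sketch below. It follows the standard Deville–Tillé sequential pivotal procedure and is only an illustration; the paper's reweighting and downstream regression pipeline are omitted.

```python
import numpy as np

def leverage_scores(A):
    """Leverage score of row i of A is the squared norm of row i of Q, where A = QR."""
    q, _ = np.linalg.qr(A)
    return (q ** 2).sum(axis=1)

def pivotal_sample(pi, rng):
    """Sequential pivotal sampling (Deville-Tille) with inclusion probabilities pi.

    Each unit keeps its marginal inclusion probability pi[i], but the draws are
    negatively dependent, which spreads the sample out over the design space.
    """
    pi = np.clip(np.asarray(pi, dtype=float).copy(), 0.0, 1.0)
    selected = np.zeros(len(pi), dtype=bool)
    alive = 0                                  # index of the currently unresolved unit
    for k in range(1, len(pi)):
        a, b, s = pi[alive], pi[k], pi[alive] + pi[k]
        if s >= 2.0 - 1e-12:                   # both already certain to be included
            pi[alive] = pi[k] = 1.0
        elif s <= 1.0:                         # one unit absorbs the other's mass
            if s > 0 and rng.random() < a / s:
                pi[alive], pi[k] = s, 0.0
            else:
                pi[alive], pi[k] = 0.0, s
        else:                                  # one is selected, the other keeps the rest
            if rng.random() < (1.0 - b) / (2.0 - s):
                pi[alive], pi[k] = 1.0, s - 1.0
            else:
                pi[alive], pi[k] = s - 1.0, 1.0
        for i in (alive, k):
            if pi[i] >= 1.0 - 1e-12:
                selected[i] = True
        if not (1e-12 < pi[alive] < 1.0 - 1e-12):
            alive = k                          # the unresolved unit carries on
    if 1e-12 < pi[alive] < 1.0 - 1e-12:        # resolve a possible leftover unit
        selected[alive] = rng.random() < pi[alive]
    return np.flatnonzero(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(500, 8))              # candidate design (e.g. feature matrix)
    scores = leverage_scores(A)                # these sum to the column dimension, 8
    k = 40                                     # desired number of labelled points
    pi = np.minimum(1.0, k * scores / scores.sum())
    idx = pivotal_sample(pi, rng)
    print("sample size:", len(idx), "| sum of inclusion probabilities:", pi.sum())
```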

Towards Explainable Machine Learning: The Effectiveness of Reservoir Computing in Wireless Receive Processing

  • paper_url: http://arxiv.org/abs/2310.04956
  • repo_url: None
  • paper_authors: Shashank Jere, Karim Said, Lizhong Zheng, Lingjia Liu
  • for: This paper aims to improve the performance of channel equalization in wireless communications using a learning-based technique called Reservoir Computing (RC) and provide a first principles-based understanding of its operation.
  • methods: The paper uses an echo state network (ESN) as a channel equalizer and incorporates available domain knowledge in the form of wireless channel statistics into the weights of the ESN model. This optimized initialization of the model weights leads to improved receive processing/symbol detection performance.
  • results: The paper shows improved performance in receive processing/symbol detection through simulations, demonstrating the effectiveness of the proposed approach. This is a first step towards explainable machine learning (XML) and assigning practical model interpretability that can be utilized to improve performance and enhance detection reliability.
    Abstract Deep learning has seen a rapid adoption in a variety of wireless communications applications, including at the physical layer. While it has delivered impressive performance in tasks such as channel equalization and receive processing/symbol detection, it leaves much to be desired when it comes to explaining this superior performance. In this work, we investigate the specific task of channel equalization by applying a popular learning-based technique known as Reservoir Computing (RC), which has shown superior performance compared to conventional methods and other learning-based approaches. Specifically, we apply the echo state network (ESN) as a channel equalizer and provide a first principles-based signal processing understanding of its operation. With this groundwork, we incorporate the available domain knowledge in the form of the statistics of the wireless channel directly into the weights of the ESN model. This paves the way for optimized initialization of the ESN model weights, which are traditionally untrained and randomly initialized. Finally, we show the improvement in receive processing/symbol detection performance with this optimized initialization through simulations. This is a first step towards explainable machine learning (XML) and assigning practical model interpretability that can be utilized together with the available domain knowledge to improve performance and enhance detection reliability.
    摘要 深度学习已在多种无线通信应用中得到快速采用,其中包括物理层。尽管它在信道均衡和接收处理/符号检测等任务中表现出色,但对这种优越性能的解释仍远远不够。在本工作中,我们针对信道均衡这一具体任务,采用了一种流行的基于学习的技术——储备池计算(RC),该技术相比传统方法和其他基于学习的方法表现出更优的性能。具体而言,我们使用回声状态网络(ESN)作为信道均衡器,并从信号处理的第一性原理出发解释其工作机理。在此基础上,我们将以无线信道统计特性形式存在的领域知识直接融入ESN模型的权重中,从而为传统上不经训练、随机初始化的ESN权重提供优化的初始化方式。最后,我们通过仿真展示了这种优化初始化带来的接收处理/符号检测性能提升。这是迈向可解释机器学习(XML)的第一步,使模型具备可用的可解释性,并可与已有领域知识结合,以提升性能并增强检测可靠性。
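
A bare-bones echo state network equalizer looks like the sketch below: a fixed random reservoir driven by the received samples, with only a ridge-regression readout trained on pilot symbols. The ISI channel and all parameters are synthetic and illustrative; the paper's contribution — initializing the input weights from wireless-channel statistics — is only indicated by a comment.

```python
import numpy as np

def make_esn(n_in, n_res, spectral_radius=0.9, seed=0):
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))   # could instead be drawn
                                                         # from channel statistics
    w = rng.normal(size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))  # echo state property
    return w_in, w

def run_reservoir(u, w_in, w):
    states = np.zeros((u.shape[0], w.shape[0]))
    x = np.zeros(w.shape[0])
    for t in range(u.shape[0]):
        x = np.tanh(w_in @ u[t] + w @ x)
        states[t] = x
    return states

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # BPSK symbols through a causal 3-tap ISI channel plus noise.
    h = np.array([1.0, 0.5, -0.3])
    def channel(sym):
        return np.convolve(sym, h)[: sym.size] + 0.05 * rng.normal(size=sym.size)

    train_sym = rng.choice([-1.0, 1.0], size=2000)      # known pilot symbols
    test_sym = rng.choice([-1.0, 1.0], size=2000)
    u_train = channel(train_sym)[:, None]
    u_test = channel(test_sym)[:, None]

    w_in, w = make_esn(n_in=1, n_res=100)
    s_train = run_reservoir(u_train, w_in, w)
    s_test = run_reservoir(u_test, w_in, w)

    # Ridge-regression readout trained on the pilots (the only trained part).
    lam = 1e-3
    w_out = np.linalg.solve(s_train.T @ s_train + lam * np.eye(100), s_train.T @ train_sym)
    ber = np.mean(np.sign(s_test @ w_out) != test_sym)
    print("bit error rate on held-out symbols:", ber)
```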

Information-Theoretic Bounds on The Removal of Attribute-Specific Bias From Neural Networks

  • paper_url: http://arxiv.org/abs/2310.04955
  • repo_url: None
  • paper_authors: Jiazhi Li, Mahyar Khayatkhoei, Jiageng Zhu, Hanchen Xie, Mohamed E. Hussein, Wael AbdAlmageed
  • for: 本研究旨在探讨避免基于保护特征(如种族、性别、年龄)的神经网络预测中的偏见问题。
  • methods: 已有若干有前景的属性偏见移除方法被提出,但它们的局限性尚未得到充分探讨;本研究从数学上和实验上揭示了这些方法在强偏见情形下的一个重要局限。
  • results: 研究推导出一个关于偏见强度的、普适且非平凡的信息论性能上界,并通过在合成、图像和人口普查数据集上的大量实验加以验证。结果表明,现有的属性偏见移除方法只有在数据集中固有偏见相对较弱时才有效,这提醒人们在可能出现强属性偏见的小数据集上慎用这些方法,并呼吁发展能够克服这一局限的新方法。
    Abstract Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for predictions is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in presence of strong bias. Specifically, we derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak, thus cautioning against the use of these methods in smaller datasets where strong attribute bias can occur, and advocating the need for methods that can overcome this limitation.
    摘要 确保神经网络在预测时不依赖受保护属性(例如种族、性别、年龄),对于推进公平且可信的人工智能至关重要。尽管已有若干有前景的神经网络属性偏见移除方法被提出,但它们的局限性仍未被充分探讨。在本工作中,我们从数学上和实验上揭示了属性偏见移除方法在强偏见情形下的一个重要局限。具体而言,我们推导出一个关于偏见强度的、普适且非平凡的信息论上界,用以刻画任何属性偏见移除方法的性能。我们在合成、图像和人口普查数据集上进行了大量实验,以验证该理论上界及其在实践中的影响。我们的结果表明,现有的属性偏见移除方法只有在数据集中固有偏见相对较弱时才有效,这提醒人们在可能出现强属性偏见的小数据集上慎用这些方法,并呼吁发展能够克服这一局限的新方法。

A framework to generate sparsity-inducing regularizers for enhanced low-rank matrix completion

  • paper_url: http://arxiv.org/abs/2310.04954
  • repo_url: None
  • paper_authors: Zhi-Yong Wang, Hing Cheung So
  • for: 提出一个用于生成具有闭式邻近算子的稀疏诱导正则化项(SIR)的框架,并将其应用于低秩矩阵补全。
  • methods: 利用半二次优化从常用损失函数生成相应的正则化项,并基于交替方向乘子法(ADMM)设计求解算法。
  • results: 大量数值实验验证了所提方法在恢复性能和运行时间方面的优势。
    Abstract Applying half-quadratic optimization to loss functions can yield the corresponding regularizers, while these regularizers are usually not sparsity-inducing regularizers (SIRs). To solve this problem, we devise a framework to generate an SIR with closed-form proximity operator. Besides, we specify our framework using several commonly-used loss functions, and produce the corresponding SIRs, which are then adopted as nonconvex rank surrogates for low-rank matrix completion. Furthermore, algorithms based on the alternating direction method of multipliers are developed. Extensive numerical results show the effectiveness of our methods in terms of recovery performance and runtime.
    摘要 将半二次优化应用于损失函数可以得到相应的正则化项,但这些正则化项通常并非稀疏诱导正则化项(SIR)。为解决这一问题,我们设计了一个能够生成具有闭式邻近算子的SIR的框架。此外,我们用若干常用损失函数具体化该框架,得到相应的SIR,并将其作为非凸的秩替代函数用于低秩矩阵补全。我们还基于交替方向乘子法(ADMM)设计了求解算法。大量数值结果表明,所提方法在恢复性能和运行时间方面均十分有效。
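
For reference, the ADMM structure that such regularizers plug into can be illustrated with the standard convex nuclear-norm special case below (a minimal NumPy sketch); the paper replaces the singular-value soft-thresholding step with the closed-form proximity operator of its nonconvex sparsity-inducing rank surrogate.

```python
import numpy as np

def svt(A, tau):
    """Singular-value soft-thresholding: prox of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def admm_matrix_completion(M_obs, mask, rho=1.0, iters=300):
    """min ||Z||_*  s.t.  X = Z and X agrees with M_obs on observed entries."""
    X = np.where(mask, M_obs, 0.0)
    Z = X.copy()
    U = np.zeros_like(X)
    for _ in range(iters):
        X = Z - U
        X[mask] = M_obs[mask]              # project onto the data constraint
        Z = svt(X + U, 1.0 / rho)          # replace with a nonconvex prox for SIRs
        U = U + X - Z
    return Z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, r = 60, 3
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # rank-3 ground truth
    mask = rng.random((n, n)) < 0.4                         # observe 40% of entries
    M_obs = np.where(mask, M, 0.0)
    M_hat = admm_matrix_completion(M_obs, mask)
    rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
    print("relative recovery error:", rel_err)
```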

eess.SP - 2023-10-08

5G Advanced: Wireless Channel Virtualization and Resource Mapping for Real Time Spectrum Sharing

  • paper_url: http://arxiv.org/abs/2310.05271
  • repo_url: None
  • paper_authors: Walaa Alqwider, Aly Sabri Abdalla, Vuk Marojevic
  • for: 这个论文是为了探讨有效的频率资源利用和共享,以支持重要服务的协调访问。
  • methods: 本论文提出了一种基于无线信道虚拟化的实时频谱共享技术,包括虚拟到物理资源映射框架、映射类型和控制信令;该技术对协议的改动极小,并且对终端用户应用保持透明。
  • results: 作者验证了提议的技术,并发现了需要进一步研究设计有效的频率需求或频率使用预测信号的方法。
    Abstract The coexistence between active wireless communications and passive RF spectrum use becomes an increasingly important requirement for coordinated spectrum access supporting critical services. The ongoing research and technological progress are focused on effective spectrum utilization including large-scale MIMO and energy efficient and low-power communications, innovative spectrum use and management, and resilient spectrum sharing, just to name a few. This paper introduces a new tool for real time spectrum sharing among emerging cellular networks and passive RF sensing systems used for remote sensing and radio astronomy, among others. Specifically we propose leveraging wireless channel virtualization and propose a virtual-to-physical resource mapping framework, mapping types, and control signaling that extends the current 5G New Radio (NR) specifications. Our technology introduces minimal changes to the protocol and is meant to be transparent to the end user application. We validate the proposed technology by extending a 3GPP compliant 5G NR downlink simulator and identify further research directions where work is needed on designing effective ways to explicitly signal the need for spectrum or spectrum use predictions.
    摘要 有源无线通信与无源射频频谱使用之间的共存,正成为支撑关键业务的协调频谱接入中日益重要的需求。当前的研究与技术进展聚焦于高效的频谱利用,包括大规模MIMO与高能效低功耗通信、创新的频谱使用与管理,以及有弹性的频谱共享等。本文提出了一种新的实时频谱共享工具,用于新兴蜂窝网络与用于遥感、射电天文等领域的无源射频感知系统之间的频谱共享。具体而言,我们建议利用无线信道虚拟化,并提出了虚拟到物理的资源映射框架、映射类型以及扩展现有5G新空口(NR)规范的控制信令。该技术对协议的改动极小,并且对终端用户应用保持透明。我们通过扩展一个符合3GPP规范的5G NR下行链路仿真器验证了所提技术,并指出了进一步的研究方向,包括如何设计有效的方式来显式传递频谱需求或频谱使用预测。

Unsupervised deep learning framework for temperature-compensated damage assessment using ultrasonic guided waves on edge device

  • paper_url: http://arxiv.org/abs/2310.05154
  • repo_url: None
  • paper_authors: Pankhi Kashyap, Kajal Shivgan, Sheetal Patil, Ramana Raja B, Sagar Mahajan, Sauvik Banerjee, Siddharth Tallur
  • for: 本研究旨在提出一种轻量级机器学习(TinyML)框架,以实现附加到边缘设备的深度学习模型,从而降低云计算和图形处理器(GPU)的需求,提高Structural Health Monitoring(SHM)系统的可扩展性和可行性。
  • methods: 本研究使用的方法包括:(1)使用TinyML框架开发轻量级深度学习模型,(2)采用不监督学习方法检测损害,(3)使用Xilinx Artix-7 FPGA进行数据收集和控制,以及边缘推理损害。
  • results: 本研究的结果表明,使用TinyML框架可以实现轻量级深度学习模型,并且可以在不同温度(0℃-90℃)下进行损害检测,并且可以实现边缘推理。
    Abstract Fueled by the rapid development of machine learning (ML) and greater access to cloud computing and graphics processing units (GPUs), various deep learning based models have been proposed for improving performance of ultrasonic guided wave structural health monitoring (GW-SHM) systems, especially to counter complexity and heterogeneity in data due to varying environmental factors (e.g., temperature) and types of damages. Such models typically comprise of millions of trainable parameters, and therefore add to cost of deployment due to requirements of cloud connectivity and processing, thus limiting the scale of deployment of GW-SHM. In this work, we propose an alternative solution that leverages TinyML framework for development of light-weight ML models that could be directly deployed on embedded edge devices. The utility of our solution is illustrated by presenting an unsupervised learning framework for damage detection in honeycomb composite sandwich structure (HCSS) with disbond and delamination type of damages, validated using data generated by finite element (FE) simulations and experiments performed at various temperatures in the range 0{\deg}C to 90{\deg}C. We demonstrate a fully-integrated solution using a Xilinx Artix-7 FPGA for data acquisition and control, and edge-inference of damage.
    摘要 得益于机器学习(ML)的快速发展以及云计算与图形处理器(GPU)的普及,各种基于深度学习的模型被提出,用于提升超声导波结构健康监测(GW-SHM)系统的性能,尤其是应对由环境因素变化(如温度)和损伤类型不同所导致的数据复杂性与异质性。此类模型通常包含数百万个可训练参数,需要云端连接与云端处理,从而增加了部署成本,限制了GW-SHM的部署规模。在本工作中,我们提出了一种替代方案,利用TinyML框架开发可直接部署在嵌入式边缘设备上的轻量级ML模型。我们以含脱粘与分层两类损伤的蜂窝复合材料夹层结构(HCSS)的损伤检测为例,给出了一个无监督学习框架来说明该方案的实用性,并利用有限元(FE)仿真和在0°C至90°C温度范围内开展的实验所产生的数据进行了验证。我们还演示了一个完全集成的解决方案:使用Xilinx Artix-7 FPGA进行数据采集与控制,并在边缘侧完成损伤推理。
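
The unsupervised damage-detection recipe — train a small autoencoder on baseline (undamaged) signals and flag measurements whose reconstruction error exceeds a threshold learned from that baseline — can be sketched as below. The waveforms, network size, and threshold rule are illustrative stand-ins; the paper's actual features, temperature compensation, and FPGA deployment are not reproduced here.

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """A deliberately small model, in the spirit of TinyML edge deployment."""
    def __init__(self, n_in=128, n_hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def make_signals(n, damaged=False, seed=0):
    """Synthetic stand-ins for guided-wave envelopes (not real SHM data)."""
    g = torch.Generator().manual_seed(seed)
    t = torch.linspace(0, 1, 128)
    base = torch.sin(2 * torch.pi * 5 * t) * torch.exp(-4 * t)
    sigs = base + 0.05 * torch.randn(n, 128, generator=g)
    if damaged:
        sigs += 0.3 * torch.sin(2 * torch.pi * 11 * t) * torch.exp(-8 * (t - 0.5) ** 2)
    return sigs

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    healthy = make_signals(512, damaged=False, seed=1)

    for epoch in range(500):                       # train on healthy data only
        recon = model(healthy)
        loss = nn.functional.mse_loss(recon, healthy)
        opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():
        err_healthy = ((model(healthy) - healthy) ** 2).mean(dim=1)
        threshold = err_healthy.quantile(0.99)     # alarm threshold from the baseline
        test_damaged = make_signals(64, damaged=True, seed=2)
        err_damaged = ((model(test_damaged) - test_damaged) ** 2).mean(dim=1)
        print("damage detection rate:", (err_damaged > threshold).float().mean().item())
```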

Secure Short-Packet Transmission with Aerial Relaying: Blocklength and Trajectory Co-Design

  • paper_url: http://arxiv.org/abs/2310.05142
  • repo_url: None
  • paper_authors: Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst
  • for: 提高下一代互联网物联网(IoT)网络中secure短包通信(SPC)系统的总体机密通信性能。
  • methods: 利用无人飞行器(UAV)作为移动中继器,实现在地面潜在 listened 侦测器的存在下,可靠地和安全地交换干扰性短包。
  • results: 提出一种低复杂度算法,通过分解原问题为两个互相独立的子问题,以循环迭代法解决,实现优化性能。实验结果显示,提出的设计方案在其他参考方案相比,具有显著的性能改进。
    Abstract In this paper, we propose a secure short-packet communication (SPC) system involving an unmanned aerial vehicle (UAV)-aided relay in the presence of a terrestrial passive eavesdropper. The considered system, which is applicable to various next-generation Internet-of-Things (IoT) networks, exploits a UAV as a mobile relay, facilitating the reliable and secure exchange of intermittent short packets between a pair of remote IoT devices with strict latency. Our objective is to improve the overall secrecy throughput performance of the system by carefully designing key parameters such as the coding blocklengths and the UAV trajectory. However, this inherently poses a challenging optimization problem that is difficult to solve optimally. To address the issue, we propose a low-complexity algorithm inspired by the block successive convex approximation approach, where we divide the original problem into two subproblems and solve them alternately until convergence. Numerical results demonstrate that the proposed design achieves significant performance improvements relative to other benchmarks, and offer valuable insights into determining appropriate coding blocklengths and UAV trajectory.
    摘要 本文提出了一种由无人机(UAV)中继辅助的安全短包通信(SPC)系统,其中存在一个地面无源窃听者。该系统适用于多种下一代物联网(IoT)网络,利用无人机作为移动中继,在严格时延约束下,实现一对远程IoT设备之间间歇性短数据包的可靠且安全的交换。我们的目标是通过精心设计编码块长和无人机轨迹等关键参数,提升系统的整体保密吞吐性能。然而,这本质上构成了一个难以求得最优解的优化问题。为此,我们受块连续凸近似方法启发,提出了一种低复杂度算法,将原问题分解为两个子问题并交替求解直至收敛。数值结果表明,所提设计相比其他基准方案取得了显著的性能提升,并为确定合适的编码块长与无人机轨迹提供了有价值的见解。

Decentralized Federated Learning via MIMO Over-the-Air Computation: Consensus Analysis and Performance Optimization

  • paper_url: http://arxiv.org/abs/2310.05075
  • repo_url: None
  • paper_authors: Zhiyuan Zhai, Xiaojun Yuan, Xin Wang
  • for: 采用分布式学习方法来处理大量无线数据,提高学习效率和精度。
  • methods: 采用空中计算与多输入多输出(MIMO)技术,提出了一种新的MIMO空中计算去中心化联邦学习(MIMO OA-DFL)框架。
  • results: 通过一般性的收敛分析和数值实验发现,通信误差与混合矩阵的谱间隙对学习性能有显著影响;据此构建了一个通信与学习联合优化问题,用于优化收发波束成形器和混合矩阵。
    Abstract Decentralized federated learning (DFL), inherited from distributed optimization, is an emerging paradigm to leverage the explosively growing data from wireless devices in a fully distributed manner.DFL enables joint training of machine learning model under device to device (D2D) communication fashion without the coordination of a parameter server. However, the deployment of wireless DFL is facing some pivotal challenges. Communication is a critical bottleneck due to the required extensive message exchange between neighbor devices to share the learned model. Besides, consensus becomes increasingly difficult as the number of devices grows because there is no available central server to perform coordination. To overcome these difficulties, this paper proposes employing over-the-air computation (Aircomp) to improve communication efficiency by exploiting the superposition property of analog waveform in multi-access channels, and introduce the mixing matrix mechanism to promote consensus using the spectral property of symmetric doubly stochastic matrix. Specifically, we develop a novel multiple-input multiple-output over-the-air DFL (MIMO OA-DFL) framework to study over-the-air DFL problem over MIMO multiple access channels. We conduct a general convergence analysis to quantitatively capture the influence of aggregation weight and communication error on the MIMO OA-DFL performance in \emph{ad hoc} networks. The result shows that the communication error together with the spectral gap of mixing matrix has a significant impact on the learning performance. Based on this, a joint communication-learning optimization problem is formulated to optimize transceiver beamformers and mixing matrix. Extensive numerical experiments are performed to reveal the characteristics of different topologies and demonstrate the substantial learning performance enhancement of our proposed algorithm.
    Summary: Decentralized federated learning (DFL) is an emerging paradigm for exploiting the explosively growing data on wireless devices in a fully distributed manner: devices jointly train a machine learning model through device-to-device (D2D) communication without a parameter server. Deploying wireless DFL, however, faces key challenges: communication is a bottleneck because neighboring devices must exchange extensive messages to share the learned model, and reaching consensus becomes harder as the number of devices grows since no central server coordinates them. To address these difficulties, the paper applies over-the-air computation (AirComp), which exploits the superposition property of analog waveforms in multi-access channels to improve communication efficiency, and introduces a mixing matrix mechanism that promotes consensus through the spectral properties of symmetric doubly stochastic matrices. A MIMO over-the-air DFL (MIMO OA-DFL) framework is developed to study the problem over MIMO multiple access channels, and a general convergence analysis quantifies how the aggregation weights and communication error affect performance in ad hoc networks. The analysis shows that the communication error and the spectral gap of the mixing matrix have a significant impact on learning performance, motivating a joint communication-learning optimization of the transceiver beamformers and the mixing matrix. Extensive numerical experiments characterize different topologies and demonstrate substantial learning performance gains of the proposed algorithm.
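To make the role of the mixing matrix and its spectral gap concrete, the sketch below builds a symmetric doubly stochastic mixing matrix from Metropolis-Hastings weights on an assumed ring topology, computes its spectral gap, and runs decentralized averaging with additive noise standing in for over-the-air aggregation error. The topology, noise level, and scalar "model" are illustrative assumptions and do not reproduce the paper's MIMO channel model or beamforming design.

```python
import numpy as np

def metropolis_mixing_matrix(adjacency):
    """Symmetric doubly stochastic mixing matrix from Metropolis-Hastings weights."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()          # rows (and columns) sum to one
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of the mixing matrix."""
    eigvals = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - eigvals[1]

n = 10
A = np.zeros((n, n), dtype=int)
for i in range(n):                          # ring topology: each device has 2 neighbors
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
W = metropolis_mixing_matrix(A)
print("spectral gap of the ring mixing matrix:", spectral_gap(W))

# Decentralized averaging with additive aggregation noise, a crude stand-in for
# the over-the-air communication error analysed in the paper.
rng = np.random.default_rng(0)
x = rng.normal(size=(n, 1))                 # each device's local scalar "model"
target = x.mean()
for _ in range(100):
    x = W @ x + 0.01 * rng.normal(size=(n, 1))
print("worst-case deviation from consensus:", float(np.max(np.abs(x - target))))
```

A larger spectral gap (denser connectivity) makes the consensus error shrink faster per mixing round, which is the qualitative effect the convergence analysis quantifies.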

Performance Analysis of RIS-Aided Double Spatial Scattering Modulation for mmWave MIMO Systems

  • paper_url: http://arxiv.org/abs/2310.05072
  • repo_url: None
  • paper_authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Jun Li, Nan Cheng, Fangjiong Chen, Changle Li
  • for: This paper investigates a practical reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) structure for millimeter-wave multiple-input multiple-output (MIMO) systems.
  • methods: A suboptimal detector is proposed that first demodulates the beam direction from the received beam strength and then demodulates the remaining information with a maximum likelihood algorithm; based on this detector, the conditional pairwise error probability expression is derived.
  • results: Exact numerical-integral and closed-form expressions of the unconditional pairwise error probability (UPEP) are derived via two different approaches, together with upper-bound and asymptotic expressions and the diversity gain of the RIS-DSSM scheme; combining the UPEP with the number of error bits yields a union upper bound on the average bit error probability (ABEP). Simulations validate the derived bounds and show that the proposed system achieves better ABEP with phase shift keying (PSK) than with quadrature amplitude modulation (QAM), with the advantage growing as the number of RIS elements increases.
    Abstract In this paper, we investigate a practical structure of reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) for millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. A suboptimal detector is proposed, in which the beam direction is first demodulated according to the received beam strength, and then the remaining information is demodulated by adopting the maximum likelihood algorithm. Based on the proposed suboptimal detector, we derive the conditional pairwise error probability expression. Further, the exact numerical integral and closed-form expressions of the unconditional pairwise error probability (UPEP) are derived via two different approaches. To provide more insights, we derive the upper bound and asymptotic expressions of UPEP. In addition, the diversity gain of the RIS-DSSM scheme is also given. Furthermore, the union upper bound of the average bit error probability (ABEP) is obtained by combining the UPEP and the number of error bits. Simulation results are provided to validate the derived upper bound and asymptotic expressions of ABEP. We find the interesting phenomenon that the ABEP performance of the proposed system with phase shift keying is better than that with quadrature amplitude modulation. Additionally, the ABEP performance advantage becomes more significant as the number of RIS elements increases.
    Summary: This paper studies a practical reconfigurable intelligent surface (RIS)-based double spatial scattering modulation (DSSM) structure for millimeter-wave (mmWave) MIMO systems. A suboptimal detector is proposed that first demodulates the beam direction from the received beam strength and then applies the maximum likelihood algorithm to the remaining information. Based on this detector, the conditional pairwise error probability expression is derived, and the exact numerical-integral and closed-form expressions of the unconditional pairwise error probability (UPEP) are obtained via two different approaches, along with upper-bound and asymptotic expressions and the diversity gain of the RIS-DSSM scheme. Combining the UPEP with the number of error bits gives a union upper bound on the average bit error probability (ABEP). Simulations validate the derived bounds and reveal that phase shift keying achieves better ABEP than quadrature amplitude modulation in the proposed system, with the gain becoming more pronounced as the number of RIS elements increases.
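The two-stage idea behind the suboptimal detector (pick the beam index from the strongest received branch, then run maximum likelihood only over the constellation symbol) can be sketched with a toy Monte Carlo simulation. The flat Rayleigh branches, PSK mapping, and SNR below are simplifying assumptions; the actual RIS-aided mmWave channel and the UPEP/ABEP expressions of the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n_beams, psk_order, snr_db = 4, 8, 15
psk = np.exp(2j * np.pi * np.arange(psk_order) / psk_order)   # unit-energy PSK symbols

def simulate(n_trials=20000):
    errors = 0
    noise_std = 10 ** (-snr_db / 20)
    for _ in range(n_trials):
        beam = rng.integers(n_beams)                           # spatial (beam) information
        sym = rng.integers(psk_order)                          # constellation information
        h = (rng.normal(size=n_beams) + 1j * rng.normal(size=n_beams)) / np.sqrt(2)
        y = np.zeros(n_beams, dtype=complex)
        y[beam] = h[beam] * psk[sym]                           # only the chosen beam carries the symbol
        y += noise_std * (rng.normal(size=n_beams) + 1j * rng.normal(size=n_beams)) / np.sqrt(2)

        beam_hat = int(np.argmax(np.abs(y)))                   # stage 1: strongest received branch
        sym_hat = int(np.argmin(np.abs(y[beam_hat] - h[beam_hat] * psk) ** 2))  # stage 2: ML over PSK only
        errors += (beam_hat != beam) or (sym_hat != sym)
    return errors / n_trials

print("symbol error rate of the two-stage detector:", simulate())
```

Restricting the maximum likelihood search to the symbol after the beam decision is what keeps the detector's complexity low relative to a joint search over all beam-symbol pairs.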

Robust matrix completion via Novel M-estimator Functions

  • paper_url: http://arxiv.org/abs/2310.04953
  • repo_url: None
  • paper_authors: Zhi-Yong Wang, Hing Cheung So
  • for: robust matrix completion
  • methods: generates a class of nonconvex functions to down-weight outlier-corrupted observations, and develops efficient algorithms based on these functions
  • results: superior recovery accuracy and runtime compared to competitors
    Abstract M-estimators including the Welsch and Cauchy have been widely adopted for robustness against outliers, but they also down-weight the uncontaminated data. To address this issue, we devise a framework to generate a class of nonconvex functions which only down-weight outlier-corrupted observations. Our framework is then applied to the Welsch, Cauchy and $\ell_p$-norm functions to produce the corresponding robust loss functions. Targeting the application of robust matrix completion, efficient algorithms based on these functions are developed and their convergence is analyzed. Finally, extensive numerical results demonstrate that the proposed methods are superior to the competitors in terms of recovery accuracy and runtime.
    Summary: M-estimators such as the Welsch and Cauchy functions are widely used for robustness against outliers, but they also down-weight uncontaminated data. To address this, the paper devises a framework that generates a class of nonconvex functions which down-weight only outlier-corrupted observations, and applies it to the Welsch, Cauchy and $\ell_p$-norm functions to obtain the corresponding robust loss functions. Targeting robust matrix completion, efficient algorithms based on these losses are developed and their convergence is analyzed. Extensive numerical results show that the proposed methods outperform competing approaches in both recovery accuracy and runtime.
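To illustrate how an M-estimator weight enters a matrix completion loop, the sketch below runs weighted alternating least squares in which observed entries are iteratively re-weighted by a Welsch-type function of their residuals. Note this uses the standard Welsch weight, which still shrinks clean entries somewhat; the paper's contribution is a family of functions that avoid this, whose exact form is not reproduced here. The rank, regularization, and bandwidth choices are illustrative assumptions.

```python
import numpy as np

def welsch_weight(residual, sigma):
    """Standard Welsch weight: close to 1 for small residuals, decays for large ones."""
    return np.exp(-(residual / sigma) ** 2)

def robust_complete(M, mask, rank=5, iters=20, lam=1e-2, rng=None):
    """Weighted alternating least squares with iteratively re-weighted observations."""
    rng = rng or np.random.default_rng(0)
    m, n = M.shape
    U = rng.normal(size=(m, rank))
    V = rng.normal(size=(n, rank))
    W = mask.astype(float)                            # per-entry observation weights
    for _ in range(iters):
        for i in range(m):                            # row-wise weighted LS solve for U
            w = W[i]
            A = (V * w[:, None]).T @ V + lam * np.eye(rank)
            U[i] = np.linalg.solve(A, (V * w[:, None]).T @ M[i])
        for j in range(n):                            # column-wise weighted LS solve for V
            w = W[:, j]
            A = (U * w[:, None]).T @ U + lam * np.eye(rank)
            V[j] = np.linalg.solve(A, (U * w[:, None]).T @ M[:, j])
        resid = np.abs(U @ V.T - M)                   # re-weight observed entries
        sigma = 3.0 * np.median(resid[mask]) + 1e-9
        W = mask * welsch_weight(resid, sigma)
    return U @ V.T

# Synthetic test: rank-5 matrix, 40% of entries observed, 5% of those corrupted.
rng = np.random.default_rng(0)
m, n, r = 60, 50, 5
M_true = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
mask = rng.random((m, n)) < 0.4
M_obs = np.where(mask, M_true, 0.0)
outliers = mask & (rng.random((m, n)) < 0.05)
M_obs[outliers] += 20 * rng.normal(size=int(outliers.sum()))
M_hat = robust_complete(M_obs, mask)
print("relative recovery error:", np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true))
```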