cs.AI - 2023-10-25

math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories

  • paper_url: http://arxiv.org/abs/2310.17064
  • repo_url: None
  • paper_authors: Hassen Saidi, Susmit Jha, Tuhin Sahai
  • for: This paper aims to investigate the applicability of large language models (LLMs) in formalizing advanced mathematical concepts and to propose a framework for critically reviewing and checking mathematical reasoning in research papers.
  • methods: The proposed framework synergizes the capabilities of proof assistants, specifically PVS, with LLMs, enabling a bridge between textual descriptions in academic papers and formal specifications in PVS.
  • results: The proposed approach, called “math-PVS,” can automatically extract and formalize mathematical theorems from research papers, offering an innovative tool for academic review and discovery.
    Abstract As artificial intelligence (AI) gains greater adoption in a wide variety of applications, it has immense potential to contribute to mathematical discovery, by guiding conjecture generation, constructing counterexamples, assisting in formalizing mathematics, and discovering connections between different mathematical areas, to name a few. While prior work has leveraged computers for exhaustive mathematical proof search, recent efforts based on large language models (LLMs) aspire to position computing platforms as co-contributors in the mathematical research process. Despite their current limitations in logic and mathematical tasks, there is growing interest in melding theorem proving systems with foundation models. This work investigates the applicability of LLMs in formalizing advanced mathematical concepts and proposes a framework that can critically review and check mathematical reasoning in research papers. Given the noted reasoning shortcomings of LLMs, our approach synergizes the capabilities of proof assistants, specifically PVS, with LLMs, enabling a bridge between textual descriptions in academic papers and formal specifications in PVS. By harnessing the PVS environment, coupled with data ingestion and conversion mechanisms, we envision an automated process, called math-PVS, to extract and formalize mathematical theorems from research papers, offering an innovative tool for academic review and discovery.
    Summary As AI spreads across application areas, it has great potential to contribute to mathematical discovery: guiding conjecture generation, constructing counterexamples, assisting formalization, and uncovering connections between mathematical areas. Earlier work used computers for exhaustive proof search, whereas recent LLM-based efforts aim to make computing platforms co-contributors to mathematical research. Although LLMs remain limited on logic and mathematical tasks, interest in combining foundation models with theorem provers is growing. This work studies how well LLMs can formalize advanced mathematical concepts and proposes a framework for reviewing and checking the mathematical reasoning in research papers. To compensate for the reasoning weaknesses of LLMs, the approach pairs them with the PVS proof assistant, bridging textual descriptions in papers and formal PVS specifications. Combining the PVS environment with data ingestion and conversion mechanisms, the authors envision an automated process, math-PVS, that extracts and formalizes theorems from research papers as a tool for academic review and discovery.

Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer

  • paper_url: http://arxiv.org/abs/2310.17049
  • repo_url: https://github.com/vigor-jzhang/icc-regularizer
  • paper_authors: Jianwei Zhang, Suren Jayasuriya, Visar Berisha
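  • for: This paper aims to make supervised embeddings more repeatable, that is, sensitive only to changes in the label of interest and invariant to confounding factors.
  • methods: The paper borrows the concept of repeatability from measurement theory, uses the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings, and proposes an ICC regularizer as a complement to contrastive losses.
  • results: Experiments show that adding the ICC regularizer improves the repeatability of learned embeddings compared with the contrastive loss alone, and that these embeddings improve downstream performance in speaker verification, voice style conversion, and dysphonic voice detection.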
    Abstract A good supervised embedding for a specific machine learning task is only sensitive to changes in the label of interest and is invariant to other confounding factors. We leverage the concept of repeatability from measurement theory to describe this property and propose to use the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. We then propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses to guide deep neural networks to produce embeddings with higher repeatability. We use simulated data to explain why the ICC regularizer works better on minimizing the intra-class variance than the contrastive loss alone. We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice. The experimental results demonstrate that adding an ICC regularizer can improve the repeatability of learned embeddings compared to only using the contrastive loss; further, these embeddings lead to improved performance in these downstream tasks.
    Summary A good supervised embedding responds only to changes in the label of interest and is unaffected by other confounding factors. The authors describe this property using repeatability from measurement theory and propose the intra-class correlation coefficient (ICC) to evaluate it. They then introduce the ICC regularizer, a complement to contrastive losses that guides deep neural networks toward embeddings with higher repeatability. Simulated data is used to explain why the ICC regularizer reduces intra-class variance more effectively than the contrastive loss alone. The regularizer is applied to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice. Adding it improves the repeatability of the learned embeddings over the contrastive loss alone, and the resulting embeddings improve performance on these downstream tasks.
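    Example: to make the repeatability measure concrete, the sketch below computes a one-way random-effects ICC for a single scalar feature (for instance, one embedding dimension). This is an illustration under assumptions, not the authors' implementation: the function name `icc_one_way`, the choice of the one-way ANOVA form, and the toy data are all hypothetical, and the paper's exact ICC formulation and regularizer may differ.

```python
from statistics import mean


def icc_one_way(groups):
    """One-way random-effects ICC for a single scalar feature.

    groups: list of classes, each a list of k repeated measurements
    (for example, one embedding dimension across k utterances of a speaker).
    Assumes the measurements are not all identical (non-zero variance).
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 classes, each with the same number (>= 2) of measurements")

    grand_mean = mean([x for g in groups for x in g])
    class_means = [mean(g) for g in groups]

    # Between-class and within-class mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in class_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, class_means) for x in g
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical toy data: two classes, two repeated measurements each.
repeatable = [[1.0, 1.1], [3.0, 2.9]]  # small within-class spread
confounded = [[1.0, 3.0], [3.0, 1.0]]  # within-class spread dominates
```

    A value near 1 means most of the variance comes from differences between classes, which is the property the abstract calls repeatability; `confounded` scores far lower than `repeatable` under this measure.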

StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling

  • paper_url: http://arxiv.org/abs/2310.17042
  • repo_url: None
  • paper_authors: Juyoung Yun
  • for: Improving the stability and performance of deep learning optimization.
  • methods: A gradient sampling technique that considers only a subset of the gradients at each iteration.
  • results: Superior performance to the traditional Adam optimizer on image classification and segmentation tasks.
    Abstract In the rapidly advancing domain of deep learning optimization, this paper unveils the StochGradAdam optimizer, a novel adaptation of the well-regarded Adam algorithm. Central to StochGradAdam is its gradient sampling technique. This method not only ensures stable convergence but also leverages the advantages of selective gradient consideration, fostering robust training by potentially mitigating the effects of noisy or outlier data and enhancing the exploration of the loss landscape for more dependable convergence. In both image classification and segmentation tasks, StochGradAdam has demonstrated superior performance compared to the traditional Adam optimizer. By judiciously sampling a subset of gradients at each iteration, the optimizer is optimized for managing intricate models. The paper provides a comprehensive exploration of StochGradAdam's methodology, from its mathematical foundations to bias correction strategies, heralding a promising advancement in deep learning training techniques.
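    Summary This paper introduces StochGradAdam, a new variant of the Adam optimizer. Its core is a gradient sampling technique that supports stable convergence while considering gradients selectively, which may mitigate the effect of noisy or outlier data and improve exploration of the loss landscape. On image classification and segmentation tasks, StochGradAdam outperforms the traditional Adam optimizer. Sampling a subset of gradients at each iteration makes the optimizer better suited to complex models. The paper covers the method from its mathematical foundations to its bias correction strategies.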
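    Example: the idea of sampling a subset of gradients inside an Adam update can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `stoch_grad_adam_step`, the per-coordinate Bernoulli mask, and the `sample_rate` parameter are hypothetical choices, and the paper's sampling scheme and bias correction may differ.

```python
import math
import random


def stoch_grad_adam_step(params, grads, m, v, t, rng,
                         lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                         sample_rate=0.8):
    """One Adam-style step that keeps each gradient component with
    probability `sample_rate` and zeroes it otherwise (assumed scheme).

    t is the 1-based step counter used for the usual Adam bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Randomly drop this gradient component for the current step.
        g_s = g if rng.random() < sample_rate else 0.0
        m_i = beta1 * m_i + (1.0 - beta1) * g_s
        v_i = beta2 * v_i + (1.0 - beta2) * g_s * g_s
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Usage on f(x) = x^2 (gradient 2x), starting from x = 5.
rng = random.Random(0)
x, m, v = [5.0], [0.0], [0.0]
for step in range(1, 201):
    x, m, v = stoch_grad_adam_step(x, [2.0 * x[0]], m, v, step, rng, lr=0.1)
```

    With `sample_rate=1.0` the step reduces to plain Adam, which gives a convenient baseline for comparing the two.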

On Surgical Fine-tuning for Language Encoders

  • paper_url: http://arxiv.org/abs/2310.17041
  • repo_url: https://github.com/ymtao5219/surgical_fine_tuning
  • paper_authors: Abhilasha Lodha, Gayatri Belapurkar, Saloni Chalkapurkar, Yuanming Tao, Reshmi Ghosh, Samyadeep Basu, Dmitrii Petrov, Soundararajan Srinivasan
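  • for: This paper investigates whether fine-tuning only a small subset of a language encoder's layers, rather than all of them, is enough to achieve strong performance on downstream language tasks.
  • methods: The study uses a simple metric based on the diagonal of the Fisher information matrix (the FIM score) to select the layers for selective fine-tuning, and shows empirically that the selected layers give strong downstream performance.
  • results: Fine-tuning a few layers yields performance close to, and often better than, fine-tuning all layers. The layer ranking given by the FIM score also remains constant during optimization, which supports its robustness.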
    Abstract Fine-tuning all the layers of a pre-trained neural language encoder (either using all the parameters or using parameter-efficient methods) is often the de-facto way of adapting it to a new task. We show evidence that for different downstream language tasks, fine-tuning only a subset of layers is sufficient to obtain performance that is close to and often better than fine-tuning all the layers in the language encoder. We propose an efficient metric based on the diagonal of the Fisher information matrix (FIM score), to select the candidate layers for selective fine-tuning. We show, empirically on GLUE and SuperGLUE tasks and across distinct language encoders, that this metric can effectively select layers leading to a strong downstream performance. Our work highlights that task-specific information corresponding to a given downstream task is often localized within a few layers, and tuning only those is sufficient for strong performance. Additionally, we demonstrate the robustness of the FIM score to rank layers in a manner that remains constant during the optimization process.
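    Summary Fine-tuning all layers of a pre-trained neural language encoder, either with all parameters or with parameter-efficient methods, is the de facto way to adapt it to a new task. The authors show that, across downstream language tasks, fine-tuning only a subset of layers gives performance close to and often better than fine-tuning every layer. They propose an efficient metric based on the diagonal of the Fisher information matrix (FIM score) to select candidate layers. Experiments on GLUE and SuperGLUE across distinct language encoders show that the metric selects layers that lead to strong downstream performance. The work indicates that task-specific information is often localized in a few layers, so tuning only those is sufficient. The FIM score also ranks layers in a way that stays constant during optimization.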
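    Example: a layer-selection score built from the diagonal of the empirical Fisher information can be sketched as the mean squared gradient per layer. The sketch below is illustrative only: the names `fim_scores` and `select_layers`, the normalization by parameter count, and the toy gradients are assumptions, and the paper's exact FIM score may be defined differently.

```python
def fim_scores(per_sample_grads):
    """Approximate a per-layer Fisher score from per-sample gradients.

    per_sample_grads: list over samples; each item maps a layer name to the
    list of gradient values of the loss w.r.t. that layer's parameters.
    The diagonal of the empirical Fisher is the expected squared gradient,
    so each layer's score is its squared gradient averaged over parameters
    and samples (the per-parameter averaging is an assumption).
    """
    totals = {}
    for sample in per_sample_grads:
        for layer, grads in sample.items():
            sq_mean = sum(g * g for g in grads) / len(grads)
            totals[layer] = totals.get(layer, 0.0) + sq_mean
    n = len(per_sample_grads)
    return {layer: total / n for layer, total in totals.items()}


def select_layers(scores, k):
    """Return the k layers with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical gradients for two samples and three layers.
samples = [
    {"layer0": [0.1, 0.1], "layer1": [1.0, -1.0], "layer2": [0.5, 0.5]},
    {"layer0": [0.0, 0.2], "layer1": [2.0, 0.0], "layer2": [0.5, -0.5]},
]
scores = fim_scores(samples)
chosen = select_layers(scores, k=2)
```

    Only the layers in `chosen` would then be unfrozen for fine-tuning while the rest of the encoder stays fixed.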

Apollo: Zero-shot MultiModal Reasoning with Multiple Experts

  • paper_url: http://arxiv.org/abs/2310.18369
  • repo_url: https://github.com/danielabd/apollo-cap
  • paper_authors: Daniela Ben-David, Tzuf Paz-Argaman, Reut Tsarfaty
  • for: This paper proposes a modular framework that leverages the expertise of different foundation models over different modalities and domains to perform a single, complex, multi-modal task without relying on prompt engineering or tailor-made multi-modal training.
  • methods: The framework enables decentralized command execution and allows each model to contribute to and benefit from the expertise of the other models. The approach can be extended to a variety of foundation models, including audio and vision models, and does not depend on prompts.
  • results: The approach is demonstrated on two tasks. On stylized image captioning, it outperforms semi-supervised state-of-the-art models while being zero-shot and avoiding costly training, data collection, and prompt engineering. It is also applied to a novel task, audio-aware image captioning, where the goal is to generate text that describes the image within the context of the provided audio.
    Abstract We propose a modular framework that leverages the expertise of different foundation models over different modalities and domains in order to perform a single, complex, multi-modal task, without relying on prompt engineering or otherwise tailor-made multi-modal training. Our approach enables decentralized command execution and allows each model to both contribute and benefit from the expertise of the other models. Our method can be extended to a variety of foundation models (including audio and vision), above and beyond only language models, as it does not depend on prompts. We demonstrate our approach on two tasks. On the well-known task of stylized image captioning, our experiments show that our approach outperforms semi-supervised state-of-the-art models, while being zero-shot and avoiding costly training, data collection, and prompt engineering. We further demonstrate this method on a novel task, audio-aware image captioning, in which an image and audio are given and the task is to generate text that describes the image within the context of the provided audio. Our code is available on GitHub.
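    Summary The authors propose a modular framework that draws on the expertise of different foundation models across modalities and domains to perform a single, complex, multi-modal task without prompt engineering or tailor-made multi-modal training. The approach enables decentralized command execution and lets each model both contribute to and benefit from the expertise of the others. Because it does not depend on prompts, it extends beyond language models to other foundation models, including audio and vision. On stylized image captioning, it outperforms semi-supervised state-of-the-art models while being zero-shot and avoiding costly training, data collection, and prompt engineering. It is also demonstrated on a new task, audio-aware image captioning, where an image and audio are given and the goal is to generate text describing the image in the context of the audio. The code is available on GitHub.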

netFound: Foundation Model for Network Security

  • paper_url: http://arxiv.org/abs/2310.17025
  • repo_url: None
  • paper_authors: Satyandra Guthula, Navya Battula, Roman Beltiukov, Wenbo Guo, Arpit Gupta
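  • for: This work proposes netFound, a foundation model for machine learning (ML) applications in network security.
  • methods: netFound is pre-trained with self-supervised algorithms on readily available unlabeled network packet traces, and its design uses the hierarchical and multi-modal attributes of network traffic to capture hidden networking context.
  • results: Experiments on three downstream tasks (traffic classification, network intrusion detection, and APT detection) show that netFound outperforms existing approaches and is robust to noisy and missing labels, temporal variations, and diverse network environments.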
    Abstract In ML for network security, traditional workflows rely on high-quality labeled data and manual feature engineering, but limited datasets and human expertise hinder feature selection, leading to models struggling to capture crucial relationships and generalize effectively. Inspired by recent advancements in ML application domains like GPT-4 and Vision Transformers, we have developed netFound, a foundational model for network security. This model undergoes pre-training using self-supervised algorithms applied to readily available unlabeled network packet traces. netFound's design incorporates hierarchical and multi-modal attributes of network traffic, effectively capturing hidden networking contexts, including application logic, communication protocols, and network conditions. With this pre-trained foundation in place, we can fine-tune netFound for a wide array of downstream tasks, even when dealing with low-quality, limited, and noisy labeled data. Our experiments demonstrate netFound's superiority over existing state-of-the-art ML-based solutions across three distinct network downstream tasks: traffic classification, network intrusion detection, and APT detection. Furthermore, we emphasize netFound's robustness against noisy and missing labels, as well as its ability to generalize across temporal variations and diverse network environments. Finally, through a series of ablation studies, we provide comprehensive insights into how our design choices enable netFound to more effectively capture hidden networking contexts, further solidifying its performance and utility in network security applications.
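    Summary Traditional ML workflows for network security rely on high-quality labeled data and manual feature engineering, but limited datasets and human expertise hinder feature selection, so models struggle to capture crucial relationships and to generalize. Inspired by advances such as GPT-4 and Vision Transformers, the authors developed netFound, a foundation model for network security. It is pre-trained with self-supervised algorithms on readily available unlabeled network packet traces. Its design incorporates the hierarchical and multi-modal attributes of network traffic and captures hidden networking context, including application logic, communication protocols, and network conditions. The pre-trained model can then be fine-tuned for many downstream tasks, even with low-quality, limited, and noisy labeled data. Experiments show that netFound outperforms state-of-the-art ML-based solutions on traffic classification, network intrusion detection, and APT detection. It is robust to noisy and missing labels and generalizes across temporal variations and diverse network environments. Ablation studies show how the design choices help netFound capture hidden networking context.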

Controlled Decoding from Language Models

  • paper_url: http://arxiv.org/abs/2310.17022
  • repo_url: None
  • paper_authors: Sidharth Mudgal, Jong Lee, Harish Ganapathy, YaGuang Li, Tao Wang, Yanping Huang, Zhifeng Chen, Heng-Tze Cheng, Michael Collins, Trevor Strohman, Jilin Chen, Alex Beutel, Ahmad Beirami
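  • for: This paper aims to control autoregressive generation from language models so that it moves toward high-reward outcomes.
  • methods: The paper proposes controlled decoding (CD), a novel off-policy reinforcement learning method that steers generation at inference time using a value function for the reward, called a prefix scorer.
  • results: Experiments show that CD is an effective control mechanism on the Reddit conversations corpus, and that its modular design can control for multiple rewards, solving a multi-objective reinforcement learning problem with no additional complexity.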
    Abstract We propose controlled decoding (CD), a novel off-policy reinforcement learning method to control the autoregressive generation from language models towards high reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, which we call a prefix scorer. The prefix scorer is used at inference time to steer the generation towards higher reward outcomes. We show that the prefix scorer may be trained on (possibly) off-policy data to predict the expected reward when decoding is continued from a partially decoded response. We empirically demonstrate that CD is effective as a control mechanism on Reddit conversations corpus. We also show that the modularity of the design of CD makes it possible to control for multiple rewards, effectively solving a multi-objective reinforcement learning problem with no additional complexity. Finally, we show that CD can be applied in a novel blockwise fashion at inference-time, again without the need for any training-time changes, essentially bridging the gap between the popular best-of-$K$ strategy and token-level reinforcement learning. This makes CD a promising approach for alignment of language models.
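    Summary The authors propose controlled decoding (CD), a new off-policy reinforcement learning method for steering autoregressive generation from language models toward high-reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, called a prefix scorer, which is used at inference time to steer generation. The prefix scorer can be trained on (possibly) off-policy data to predict the expected reward when decoding continues from a partially decoded response. Experiments on the Reddit conversations corpus show that CD is effective as a control mechanism. Its modular design makes it possible to control for multiple rewards, solving a multi-objective reinforcement learning problem with no additional complexity. CD can also be applied blockwise at inference time, with no training-time changes, bridging the popular best-of-$K$ strategy and token-level reinforcement learning. This makes CD a promising approach for language model alignment.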
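    Example: the blockwise use of a prefix scorer can be sketched as repeatedly sampling K candidate blocks from the base model and keeping the one the scorer rates highest. This is a toy sketch under assumptions, not the authors' code: `blockwise_controlled_decode`, `sample_block`, `toy_scorer`, and `toy_sampler` are hypothetical names, and a real prefix scorer would be a learned value function over partial responses rather than a word count.

```python
import random


def blockwise_controlled_decode(sample_block, prefix_scorer, prompt,
                                num_blocks, k, rng):
    """Greedy blockwise decoding guided by a prefix scorer.

    sample_block(prefix, rng) -> list of tokens drawn from the base model.
    prefix_scorer(tokens)     -> estimated reward of continuing from tokens.
    For each block, k candidates are sampled and the best-scoring one is kept.
    """
    prefix = list(prompt)
    for _ in range(num_blocks):
        candidates = [sample_block(prefix, rng) for _ in range(k)]
        best = max(candidates, key=lambda block: prefix_scorer(prefix + block))
        prefix.extend(best)
    return prefix


def toy_scorer(tokens):
    # Hypothetical reward proxy: count of "good" minus count of "bad".
    return tokens.count("good") - tokens.count("bad")


def toy_sampler(prefix, rng):
    # Hypothetical base model: emits one random token per block.
    return [rng.choice(["good", "bad", "ok"])]


steered = blockwise_controlled_decode(
    toy_sampler, toy_scorer, ["hi"], num_blocks=5, k=4, rng=random.Random(0)
)
```

    Setting `num_blocks=1` with a block that spans the whole response recovers best-of-K, while a block length of one token approaches token-level control, which is the bridge the abstract describes.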

An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives

  • paper_url: http://arxiv.org/abs/2310.17017
  • repo_url: https://github.com/jeffreych0/mental_chatbot_survey
  • paper_authors: Young Min Cho, Sunny Rai, Lyle Ungar, João Sedoc, Sharath Chandra Guntuku
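  • for: This paper reviews mental health conversational agents (chatbots) and their potential to support people facing mental health challenges, with the goal of bridging the knowledge divide between computer science and medicine.
  • methods: The paper conducts a systematic literature review using the PRISMA framework, examining 534 papers published in computer science and medicine.
  • results: The review identifies 136 key papers on building mental health-related conversational agents, with diverse modeling and experimental design techniques. Computer science papers focus on LLM techniques and automated evaluation metrics, while medical papers focus on rule-based conversational agents and participants' health outcomes.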
    Abstract Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluating response quality using automated metrics with little attention to the application while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity in this review, we provide a few recommendations to help bridge the disciplinary divide and enable the cross-disciplinary development of mental health conversational agents.
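    Summary Mental health conversational agents (chatbots) are widely studied for their potential to offer accessible support to people facing mental health challenges. Earlier surveys mostly considered papers from either computer science or medicine, which divides understanding and hinders the sharing of useful knowledge between the two fields. To bridge this gap, the authors conduct a comprehensive literature review using the PRISMA framework, covering 534 papers from both fields. The review identifies 136 key papers on building mental health-related conversational agents. Computer science papers focus on LLM techniques and automated metrics for response quality, while medical papers use rule-based agents and outcome metrics that measure participants' health. Based on findings about transparency, ethics, and cultural heterogeneity, the authors offer recommendations to support cross-disciplinary development of mental health conversational agents.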

Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control

  • paper_url: http://arxiv.org/abs/2310.17011
  • repo_url: None
  • paper_authors: Elif Bozkurt
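  • for: This paper presents a framework for personalized, speech-driven, expressive 3D facial animation that accounts for identity-specific speaking styles in order to produce natural and plausible animation.
  • methods: The framework uses a non-autoregressive encoder-decoder architecture with an expression encoder, a speech encoder, and an expression decoder. The expression encoder first disentangles facial motion sequences into identity-specific style and speech-related content; the extracted style is then fed to the speech encoder and the expression decoder to update transformer layer weights during training.
  • results: The method generates temporally coherent facial animation from input speech while preserving the speaking style of the target identity.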
    Abstract Different people have different facial expressions while speaking emotionally. A realistic facial animation system should consider such identity-specific speaking styles and facial idiosyncrasies to achieve a high degree of naturalness and plausibility. Existing approaches to personalized speech-driven 3D facial animation either use one-hot identity labels or rely on person-specific models, which limits their scalability. We present a personalized speech-driven expressive 3D facial animation synthesis framework that models identity-specific facial motion as latent representations (called styles), and synthesizes novel animations given a speech input with the target style for various emotion categories. Our framework is trained in an end-to-end fashion and has a non-autoregressive encoder-decoder architecture with three main components: expression encoder, speech encoder and expression decoder. Since expressive facial motion includes both identity-specific style and speech-related content information, the expression encoder first disentangles facial motion sequences into style and content representations, respectively. Then, both the speech encoder and the expression decoder take the extracted style information as input to update transformer layer weights during the training phase. Our speech encoder also extracts speech phoneme label and duration information to achieve better synchrony within the non-autoregressive synthesis mechanism. Through detailed experiments, we demonstrate that our approach produces temporally coherent facial expressions from input speech while preserving the speaking styles of the target identities.
    Summary Different people show different facial expressions when speaking emotionally, and a realistic facial animation system should account for these identity-specific speaking styles and idiosyncrasies to achieve naturalness and plausibility. Existing personalized speech-driven 3D facial animation methods either use one-hot identity labels or rely on person-specific models, which limits scalability. The proposed framework models identity-specific facial motion as latent representations (styles) and synthesizes new animations from speech input with the target style for various emotion categories. It is trained end to end and has a non-autoregressive encoder-decoder architecture with three components: an expression encoder, a speech encoder, and an expression decoder. Because expressive facial motion carries both identity-specific style and speech-related content, the expression encoder first disentangles motion sequences into style and content representations. The speech encoder and the expression decoder then take the extracted style as input to update transformer layer weights during training. The speech encoder also extracts phoneme labels and durations for better synchrony in non-autoregressive synthesis. Experiments show that the approach produces temporally coherent facial expressions from speech while preserving the speaking styles of the target identities.

This Reads Like That: Deep Learning for Interpretable Natural Language Processing

  • paper_url: http://arxiv.org/abs/2310.17010
  • repo_url: https://github.com/fanconic/this_reads_like_that
  • paper_authors: Claudio Fanconi, Moritz Vandenhirtz, Severin Husmann, Julia E. Vogt
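  • for: This paper extends prototypical networks to natural language processing, aiming to improve both their predictive performance and their interpretability.
  • methods: The paper introduces a learned weighted similarity measure that enhances the similarity computation in prototype networks, and proposes a post-hoc explainability mechanism that extracts the key words from both the input sentence and the prototype sentence.
  • results: On the AG News and RT Polarity datasets, the proposed method improves predictive performance over a previous prototype-based approach and improves the faithfulness of explanations compared with rationale-based recurrent convolutions.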
    Abstract Prototype learning, a popular machine learning method designed for inherently interpretable decisions, leverages similarities to learned prototypes for classifying new data. While it is mainly applied in computer vision, in this work, we build upon prior research and further explore the extension of prototypical networks to natural language processing. We introduce a learned weighted similarity measure that enhances the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. Additionally, we propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and input sentences. Finally, we empirically demonstrate that our proposed method not only improves predictive performance on the AG News and RT Polarity datasets over a previous prototype-based approach, but also improves the faithfulness of explanations compared to rationale-based recurrent convolutions.
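    Summary Prototype learning, a popular machine learning method designed for inherently interpretable decisions, classifies new data by its similarity to learned prototypes. It is mainly applied in computer vision; this work builds on prior research and further explores extending prototypical networks to natural language processing. The authors introduce a learned weighted similarity measure that improves the similarity computation by focusing on informative dimensions of pre-trained sentence embeddings. They also propose a post-hoc explainability mechanism that extracts prediction-relevant words from both the prototype and the input sentence. Experiments on AG News and RT Polarity show that the method improves predictive performance over a previous prototype-based approach and improves the faithfulness of explanations compared with rationale-based recurrent convolutions.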
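    Example: one way to picture a weighted similarity over sentence embeddings is a cosine similarity in which each dimension carries a non-negative weight. The sketch below is an assumption-laden illustration, not the paper's measure: `weighted_cosine`, `nearest_prototype`, the fixed weights, and the 3-dimensional toy embeddings are hypothetical, and in the paper the weights are learned during training.

```python
import math


def weighted_cosine(a, b, weights):
    """Cosine similarity where dimension i contributes with weight w_i >= 0."""
    num = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    return num / (norm_a * norm_b)


def nearest_prototype(x, prototypes, weights):
    """Return the label of the prototype most similar to embedding x."""
    return max(
        prototypes,
        key=lambda label: weighted_cosine(x, prototypes[label], weights),
    )


# Hypothetical 3-d embeddings; the third dimension is shared by both classes
# and therefore uninformative, so it receives zero weight here.
prototypes = {"sports": [1.0, 0.0, 0.9], "tech": [0.0, 1.0, 0.9]}
weights = [1.0, 1.0, 0.0]
label = nearest_prototype([0.9, 0.1, 1.0], prototypes, weights)
```

    Because the prediction is tied to a single nearest prototype, the weights also indicate which embedding dimensions drove the match, which is what makes the decision inspectable.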

STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants

  • paper_url: http://arxiv.org/abs/2310.16990
  • repo_url: None
  • paper_authors: Leon Liyang Zhang, Jiarui Lu, Joel Ruben Antony Moniz, Aditya Kulkarni, Dhivya Piraviperumal, Tien Dung Tran, Nicholas Tzou, Hong Yu
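  • for: This study proposes STEER, a model that detects whether a follow-up turn is the user's attempt to steer (direct or clarify) the previous command, so that voice assistants can better understand user needs.
  • methods: The approach uses heuristic rules to sample opt-in usage data, approximating positive and negative training samples without annotation, and trains a model to classify whether a follow-up turn carries steering intent.
  • results: STEER reaches over 95% accuracy on the sampled data and shows strong zero-shot performance on a human-graded evaluation set of real-world steering cases. An enhanced version, STEER+, uses a semantic parse tree to provide more context on out-of-vocabulary words such as named entities, which further improves performance.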
    Abstract In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start problem. To overcome this, we developed heuristic rules to sample opt-in usage data, approximating positive and negative samples without any annotation. Our experimental results show promising performance in identifying steering intent, with over 95% accuracy on our sampled data. Moreover, STEER, in conjunction with our sampling strategy, aligns effectively with real-world steering scenarios, as evidenced by its strong zero-shot performance on a human-graded evaluation set. In addition to relying solely on user transcripts as input, we introduce STEER+, an enhanced version of the model. STEER+ utilizes a semantic parse tree to provide more context on out-of-vocabulary words, such as named entities that often occur at the sentence boundary. This further improves model performance, reducing error rate in domains where entities frequently appear, such as messaging. Lastly, we present a data analysis that highlights the improvement in user experience when voice assistants support steering use cases.
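    Summary In a voice assistant system, steering is when a user issues a follow-up command to direct or clarify a previous turn. The authors propose STEER, a steering detection model that predicts whether a follow-up turn is an attempt to steer the previous command. Building a training set for steering is difficult because of the cold-start problem, so they developed heuristic rules to sample opt-in usage data, approximating positive and negative samples without annotation. STEER identifies steering intent with over 95% accuracy on the sampled data. Combined with the sampling strategy, it also matches real-world steering scenarios well, as shown by strong zero-shot performance on a human-graded evaluation set. An enhanced version, STEER+, adds a semantic parse tree to provide more context on out-of-vocabulary words, such as named entities that often occur at the sentence boundary. This further improves performance and reduces the error rate in entity-heavy domains such as messaging. A data analysis shows that supporting steering improves the user experience.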

The Significance of Machine Learning in Clinical Disease Diagnosis: A Review

  • paper_url: http://arxiv.org/abs/2310.16978
  • repo_url: None
  • paper_authors: S M Atikur Rahman, Sifat Ibtisum, Ehsan Bazgir, Tumpa Barai
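  • for: This review examines how machine learning can support clinical disease diagnosis, with a stated focus on improving the accuracy and computational efficiency of heart rate data transmission.
  • methods: The review covers a range of machine learning algorithms used in healthcare, including support vector machines, decision trees, and random forests, with the aim of improving the accuracy and efficiency of disease diagnosis.
  • results: The review finds that machine learning algorithms can improve accuracy and computational efficiency for heart rate data and can be adapted to different disease types and data types.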
    Abstract The global need for effective disease diagnosis remains substantial, given the complexities of various disease mechanisms and diverse patient symptoms. To tackle these challenges, researchers, physicians, and patients are turning to machine learning (ML), an artificial intelligence (AI) discipline, to develop solutions. By leveraging sophisticated ML and AI methods, healthcare stakeholders gain enhanced diagnostic and treatment capabilities. However, there is a scarcity of research focused on ML algorithms for enhancing the accuracy and computational efficiency. This research investigates the capacity of machine learning algorithms to improve the transmission of heart rate data in time series healthcare metrics, concentrating particularly on optimizing accuracy and efficiency. By exploring various ML algorithms used in healthcare applications, the review presents the latest trends and approaches in ML-based disease diagnosis (MLBDD). The factors under consideration include the algorithm utilized, the types of diseases targeted, the data types employed, the applications, and the evaluation metrics. This review aims to shed light on the prospects of ML in healthcare, particularly in disease diagnosis. By analyzing the current literature, the study provides insights into state-of-the-art methodologies and their performance metrics.
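    Summary The global need for effective disease diagnosis remains large because disease mechanisms are complex and patient symptoms vary. Researchers, physicians, and patients are turning to machine learning (ML), a discipline of artificial intelligence (AI), to develop solutions, and advanced ML and AI methods give healthcare stakeholders better diagnostic and treatment capabilities. Research on ML algorithms that improve accuracy and computational efficiency is still scarce. This work investigates how ML algorithms can improve the transmission of heart rate data in time-series healthcare metrics, focusing on accuracy and efficiency. By examining ML algorithms used in healthcare applications, the review presents the latest trends and approaches in ML-based disease diagnosis (MLBDD). The factors considered are the algorithm used, the diseases targeted, the data types, the applications, and the evaluation metrics. The review aims to clarify the prospects of ML in healthcare, particularly disease diagnosis, and gives insight into state-of-the-art methods and their performance metrics.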

CL-MASR: A Continual Learning Benchmark for Multilingual ASR

  • paper_url: http://arxiv.org/abs/2310.16931
  • repo_url: https://github.com/speechbrain/benchmarks
  • paper_authors: Luca Della Libera, Pooneh Mousavi, Salah Zaiem, Cem Subakan, Mirco Ravanelli
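  • for: This work provides a continual learning benchmark for multilingual automatic speech recognition (ASR), to study how a system can learn new languages while retaining knowledge of previously learned ones.
  • methods: The benchmark implements a variety of continual learning methods on top of existing large-scale pretrained ASR models and evaluates how well they learn new languages.
  • results: The work delivers a continual learning benchmark for multilingual ASR and evaluates multiple continual learning methods on it, measuring how well previous-language knowledge is retained while new languages are learned.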
    Abstract Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it possible to transcribe audio in multiple languages with a single model. However, current state-of-the-art ASR models are typically evaluated on individual languages or in a multi-task setting, overlooking the challenge of continually learning new languages. There is insufficient research on how to add new languages without losing valuable information from previous data. Furthermore, existing continual learning benchmarks focus mostly on vision and language tasks, leaving continual learning for multilingual ASR largely unexplored. To bridge this gap, we propose CL-MASR, a benchmark designed for studying multilingual ASR in a continual learning setting. CL-MASR provides a diverse set of continual learning methods implemented on top of large-scale pretrained ASR models, along with common metrics to assess the effectiveness of learning new languages while addressing the issue of catastrophic forgetting. To the best of our knowledge, CL-MASR is the first continual learning benchmark for the multilingual ASR task. The code is available at https://github.com/speechbrain/benchmarks.
    Summary Modern multilingual automatic speech recognition (ASR) systems such as Whisper can transcribe audio in multiple languages with a single model. Current state-of-the-art ASR models are usually evaluated on individual languages or in a multi-task setting, which overlooks the challenge of continually learning new languages. There is too little research on how to add new languages without losing valuable information from previous data, and existing continual learning benchmarks focus mainly on vision and language tasks, leaving multilingual ASR largely unexplored. To bridge this gap, the authors propose CL-MASR, a benchmark for studying multilingual ASR in a continual learning setting. It provides a diverse set of continual learning methods implemented on top of large-scale pretrained ASR models, along with common metrics for assessing how well new languages are learned while addressing catastrophic forgetting. To the authors' knowledge, CL-MASR is the first continual learning benchmark for multilingual ASR. The code is available at https://github.com/speechbrain/benchmarks.

Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs

  • paper_url: http://arxiv.org/abs/2310.16919
  • repo_url: None
  • paper_authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni
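  • for: Protecting the Intellectual Property Rights (IPR) of GANs with a watermark that is robust against white-box attacks.
  • methods: A multi-bit box-free watermarking method that adds an extra watermarking loss term during GAN training, so that generated images carry an invisible watermark that a pre-trained watermark decoder can retrieve. To improve robustness, the model is driven to a wide flat minimum of the watermarking loss, so that modifying the model parameters does not erase the watermark.
  • results: Experiments show that the watermark has a negligible impact on image quality and is highly robust against model modification and surrogate model attacks.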
    Abstract We propose a novel multi-bit box-free watermarking method for the protection of Intellectual Property Rights (IPR) of GANs with improved robustness against white-box attacks like fine-tuning, pruning, quantization, and surrogate model attacks. The watermark is embedded by adding an extra watermarking loss term during GAN training, ensuring that the images generated by the GAN contain an invisible watermark that can be retrieved by a pre-trained watermark decoder. In order to improve the robustness against white-box model-level attacks, we make sure that the model converges to a wide flat minimum of the watermarking loss term, in such a way that any modification of the model parameters does not erase the watermark. To do so, we add random noise vectors to the parameters of the generator and require that the watermarking loss term is as invariant as possible with respect to the presence of noise. This procedure forces the generator to converge to a wide flat minimum of the watermarking loss. The proposed method is architectureand dataset-agnostic, thus being applicable to many different generation tasks and models, as well as to CNN-based image processing architectures. We present the results of extensive experiments showing that the presence of the watermark has a negligible impact on the quality of the generated images, and proving the superior robustness of the watermark against model modification and surrogate model attacks.
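    Summary We propose a new multi-bit box-free watermarking method for protecting the Intellectual Property Rights (IPR) of GANs, with improved robustness against white-box attacks such as fine-tuning, pruning, quantization, and surrogate model attacks. The watermark is embedded by adding an extra watermarking loss term during generator training, so that generated images contain an invisible watermark that a pre-trained watermark decoder can retrieve. To improve robustness against white-box model-level attacks, we make the generator settle in a wide flat minimum of the watermarking loss, so that modifying the model parameters does not erase the watermark. To this end, we add random noise vectors to the generator parameters and require the watermarking loss to stay as invariant as possible under that noise. The method is architecture- and dataset-agnostic, so it applies to many generation tasks and models as well as to CNN-based image processing architectures. Experiments show that the watermark has a negligible impact on the quality of the generated images and demonstrate its superior robustness against model modification and surrogate model attacks.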
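    The noise-invariance requirement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `flatness_penalty`, `sharp_loss`, and `flat_loss` are hypothetical names, the generator is reduced to a plain list of parameters, and the two toy losses stand in for the watermarking loss. The sketch only shows how perturbing the parameters with Gaussian noise exposes whether a minimum is wide and flat.

```python
import random


def flatness_penalty(wm_loss, params, sigma=0.1, n_samples=8, seed=0):
    """Average change in the watermarking loss under Gaussian parameter noise.

    A small value means the loss barely moves when the parameters are
    perturbed, i.e. the parameters sit in a wide, flat region of the loss.
    """
    rng = random.Random(seed)
    base = wm_loss(params)
    total = 0.0
    for _ in range(n_samples):
        # Perturb every parameter with independent Gaussian noise.
        noisy = [p + rng.gauss(0.0, sigma) for p in params]
        total += abs(wm_loss(noisy) - base)
    return total / n_samples


# Hypothetical stand-ins for the watermarking loss, both minimized at 0.
def sharp_loss(params):
    return 100.0 * sum(p * p for p in params)


def flat_loss(params):
    return 0.01 * sum(p * p for p in params)


theta = [0.0, 0.0, 0.0]
print(flatness_penalty(sharp_loss, theta), flatness_penalty(flat_loss, theta))
```

    In a training loop, a term of this kind would presumably be added to the usual GAN and watermarking losses; the sharp minimum is penalized far more heavily than the flat one.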

Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds

  • paper_url: http://arxiv.org/abs/2310.18367
  • repo_url: None
  • paper_authors: Jaiveer Gill, Ratul Chakraborty, Reetham Gubba, Amy Liu, Shrey Jain, Chirag Iyer, Obaid Khwaja, Saurav Kumar
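  • for: Developing a new computational tool for exploring and understanding molecular structures and properties.
  • methods: Several methods for detecting and clustering chemical compounds from their SMILES data, including analysis of graph structures and natural language description embeddings.
  • results: The methods yield pronounced, concentrated clusters and work well for querying and understanding chemical compounds.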
    Abstract The detailed analysis of molecular structures and properties holds great potential for drug development discovery through machine learning. Developing an emergent property in the model to understand molecules would broaden the horizons for development with a new computational tool. We introduce various methods to detect and cluster chemical compounds based on their SMILES data. Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value. The results yielded pronounced, concentrated clusters, and the method produced favorable results in querying and understanding the compounds. We also used natural language description embeddings stored in a vector database with GPT3.5, which outperforms the base model. Thus, we introduce a similarity search and clustering algorithm to aid in searching for and interacting with molecules, enhancing efficiency in chemical exploration and enabling future development of emergent properties in molecular property prediction models.
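    Summary Detailed analysis of molecular structures and properties holds great potential for drug discovery through machine learning and opens the way to new computational tools. We introduce several methods to detect and cluster chemical compounds based on their SMILES data. Our first method analyzes the graph structures of compounds using embedding data and applies vector search against a threshold value. It yields pronounced, concentrated clusters and performs well for querying and understanding compounds. We also store natural language description embeddings in a vector database with GPT-3.5, which outperforms the base model. We thus introduce a similarity search and clustering algorithm that helps in searching for and interacting with molecules, making chemical exploration more efficient and enabling future development of emergent properties in molecular property prediction models.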

RDBench: ML Benchmark for Relational Databases

  • paper_url: http://arxiv.org/abs/2310.16837
  • repo_url: None
  • paper_authors: Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You
  • for: ML Benchmark For Relational Databases (RDBench) aims to promote reproducible ML research on RDBs that include multiple tables.
  • methods: RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. It exposes three types of interfaces including tabular data, homogeneous graphs, and heterogeneous graphs, sharing the same underlying task definition.
  • results: RDBench enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. Multiple classification and regression tasks are designed for each RDB dataset, and results are reported with averaged findings to enhance the robustness of the experimental results.
    Abstract Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications. However, while applying machine learning to relational databases (RDBs), the absence of a well-established benchmark remains a significant obstacle to the development of ML. To address this issue, we introduce ML Benchmark For Relational Databases (RDBench), a standardized benchmark that aims to promote reproducible ML research on RDBs that include multiple tables. RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. Notably, to simplify the adoption of RDBench for diverse ML domains, for any given database, RDBench exposes three types of interfaces including tabular data, homogeneous graphs, and heterogeneous graphs, sharing the same underlying task definition. For the first time, RDBench enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. We design multiple classification and regression tasks for each RDB dataset and report averaged results over the same dataset, further enhancing the robustness of the experimental findings. RDBench is implemented with DBGym, a user-friendly platform for ML research and application on databases, enabling benchmarking new ML methods with RDBench at ease.
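    Summary Thanks to high-quality datasets and standardized evaluation metrics, machine learning (ML) has made sustained progress and found widespread application. When ML is applied to relational databases (RDBs), however, the lack of a well-established benchmark remains a significant obstacle. To address this, we introduce ML Benchmark For Relational Databases (RDBench), a standardized benchmark that aims to promote reproducible ML research on RDBs that include multiple tables. RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. To simplify adoption across ML domains, RDBench exposes three types of interfaces for any given database (tabular data, homogeneous graphs, and heterogeneous graphs), all sharing the same underlying task definition. RDBench thereby enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. We design multiple classification and regression tasks for each RDB dataset and report results averaged over the same dataset, which makes the experimental findings more robust. RDBench is implemented with DBGym, a user-friendly platform for ML research and application on databases, so new ML methods can be benchmarked with RDBench easily.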

LLM-FP4: 4-Bit Floating-Point Quantized Transformers

  • paper_url: http://arxiv.org/abs/2310.16836
  • repo_url: https://github.com/nbasyl/llm-fp4
  • paper_authors: Shih-yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
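  • for: Post-training quantization of both weights and activations in large language models (LLMs) down to 4-bit floating-point values.
  • methods: Floating-point (FP) quantization with a search for the optimal quantization parameters, plus per-channel activation quantization to handle the high inter-channel variance of activations.
  • results: The method quantizes both weights and activations of LLaMA-13B to 4 bits and reaches an average score of 63.1 on common sense zero-shot reasoning tasks, only 5.8 lower than the full-precision model and 12.7 points above the previous state of the art.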
    Abstract We propose LLM-FP4 for quantizing both weights and activations in large language models (LLMs) down to 4-bit floating-point values, in a post-training manner. Existing post-training quantization (PTQ) solutions are primarily integer-based and struggle with bit widths below 8 bits. Compared to integer quantization, floating-point (FP) quantization is more flexible and can better handle long-tail or bell-shaped distributions, and it has emerged as a default choice in many hardware platforms. One characteristic of FP quantization is that its performance largely depends on the choice of exponent bits and clipping range. In this regard, we construct a strong FP-PTQ baseline by searching for the optimal quantization parameters. Furthermore, we observe a high inter-channel variance and low intra-channel variance pattern in activation distributions, which adds activation quantization difficulty. We recognize this pattern to be consistent across a spectrum of transformer models designed for diverse tasks, such as LLMs, BERT, and Vision Transformer models. To tackle this, we propose per-channel activation quantization and show that these additional scaling factors can be reparameterized as exponential biases of weights, incurring a negligible cost. Our method, for the first time, can quantize both weights and activations in the LLaMA-13B to only 4-bit and achieves an average score of 63.1 on the common sense zero-shot reasoning tasks, which is only 5.8 lower than the full-precision model, significantly outperforming the previous state-of-the-art by 12.7 points. Code is available at: https://github.com/nbasyl/LLM-FP4.
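    Summary We propose LLM-FP4, a post-training method that quantizes both weights and activations in large language models (LLMs) down to 4-bit floating-point values. Existing post-training quantization (PTQ) solutions are primarily integer-based and struggle with bit widths below 8 bits. Compared to integer quantization, floating-point (FP) quantization is more flexible and handles long-tail or bell-shaped distributions better, and it has become a default choice on many hardware platforms. A characteristic of FP quantization is that its performance depends largely on the choice of exponent bits and clipping range, so we build a strong FP-PTQ baseline by searching for the optimal quantization parameters. We also observe that activation distributions show high inter-channel variance and low intra-channel variance, which makes activation quantization harder, and that this pattern is consistent across transformer models designed for diverse tasks, such as LLMs, BERT, and Vision Transformer models. To tackle this, we propose per-channel activation quantization and show that the additional scaling factors can be reparameterized as exponential biases of the weights at negligible cost. Our method is the first to quantize both weights and activations of LLaMA-13B to only 4 bits, achieving an average score of 63.1 on common sense zero-shot reasoning tasks, only 5.8 lower than the full-precision model and 12.7 points above the previous state of the art. Code is available at: https://github.com/nbasyl/LLM-FP4.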
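    To make the role of exponent bits and clipping range concrete, the following is a minimal sketch of simulated low-bit floating-point quantization with a brute-force parameter search. It is an illustration under stated assumptions, not the code from the LLM-FP4 repository: the function names `fp_grid`, `quantize`, and `search_format` are hypothetical, the format has no inf/NaN codes, the exponent bias is fixed to 1, and the search criterion is plain squared error on the tensor values.

```python
import math


def fp_grid(exp_bits, man_bits, bias):
    """All non-negative values of a tiny float format (no inf/NaN codes)."""
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:  # subnormal numbers: no implicit leading 1
                values.add(frac * 2.0 ** (1 - bias))
            else:       # normal numbers: implicit leading 1
                values.add((1 + frac) * 2.0 ** (e - bias))
    return sorted(values)


def quantize(x, grid, max_val):
    """Clip x to [-max_val, max_val] and round to the nearest grid value."""
    scale = max_val / grid[-1]          # map the largest code to max_val
    mag = min(abs(x), max_val) / scale
    nearest = min(grid, key=lambda g: abs(g - mag))
    return math.copysign(nearest * scale, x)


def search_format(xs, formats, max_vals):
    """Pick (error, exp_bits, man_bits, max_val) with the lowest squared error."""
    best = None
    for exp_bits, man_bits in formats:
        grid = fp_grid(exp_bits, man_bits, bias=1)
        for max_val in max_vals:
            err = sum((x - quantize(x, grid, max_val)) ** 2 for x in xs)
            if best is None or err < best[0]:
                best = (err, exp_bits, man_bits, max_val)
    return best


# One sign bit plus three magnitude bits gives the 4-bit candidates below.
weights = [0.1, -0.2, 0.05, 1.5, -0.7]
print(search_format(weights, [(1, 2), (2, 1), (3, 0)], [1.0, 1.5, 2.0]))
```

    With 2 exponent bits, 1 mantissa bit, and bias 1, `fp_grid` produces the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6, which shows how an FP grid is dense near zero and sparse in the tail.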

Proposal-Contrastive Pretraining for Object Detection from Fewer Data

  • paper_url: http://arxiv.org/abs/2310.16835
  • repo_url: None
  • paper_authors: Quentin Bouniot, Romaric Audigier, Angélique Loesch, Amaury Habrard
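  • for: Proposing an unsupervised pretraining approach for object detection that performs well when little data is available.
  • methods: Uses a transformer-based object detector as the pretrained model and performs contrastive learning over the large number of object proposals it generates.
  • results: The method outperforms the state of the art in unsupervised pretraining for object detection on standard and novel benchmarks when learning with fewer data.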
    Abstract The use of pretrained deep neural networks represents an attractive way to achieve strong results with few data available. When specialized in dense problems such as object detection, learning local rather than global information in images has proven to be more efficient. However, for unsupervised pretraining, the popular contrastive learning requires a large batch size and, therefore, a lot of resources. To address this problem, we are interested in transformer-based object detectors that have recently gained traction in the community with good performance and with the particularity of generating many diverse object proposals. In this work, we present Proposal Selection Contrast (ProSeCo), a novel unsupervised overall pretraining approach that leverages this property. ProSeCo uses the large number of object proposals generated by the detector for contrastive learning, which allows the use of a smaller batch size, combined with object-level features to learn local information in the images. To improve the effectiveness of the contrastive loss, we introduce the object location information in the selection of positive examples to take into account multiple overlapping object proposals. When reusing pretrained backbone, we advocate for consistency in learning local information between the backbone and the detection head. We show that our method outperforms state of the art in unsupervised pretraining for object detection on standard and novel benchmarks in learning with fewer data.
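    Summary Using pretrained deep neural networks is an attractive way to achieve strong results when little data is available. For dense problems such as object detection, learning local rather than global information in images has proven more efficient. For unsupervised pretraining, however, the popular contrastive learning approach requires a large batch size and therefore a lot of resources. To address this, we turn to transformer-based object detectors, which have performed well in the community and generate many diverse object proposals. We present Proposal Selection Contrast (ProSeCo), a new unsupervised pretraining approach that exploits this property. ProSeCo uses the generated object proposals for contrastive learning, which allows a smaller batch size, and combines them with object-level features to learn local information in the images. To make the contrastive loss more effective, we use object location information when selecting positive examples, so that multiple overlapping object proposals are taken into account. When reusing a pretrained backbone, we advocate consistency between the backbone and the detection head in learning local information.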
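    The idea of selecting positives by object location can be sketched as follows. This is an assumption-laden illustration rather than the ProSeCo loss itself: the names `iou` and `localized_contrastive_loss`, the IoU threshold of 0.5, the temperature, and the InfoNCE-style form with summed positives are all choices made here for clarity.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def localized_contrastive_loss(feats_q, boxes_q, feats_k, boxes_k,
                               iou_thr=0.5, tau=0.1):
    """InfoNCE-style loss where every key proposal that overlaps the query
    proposal by at least iou_thr counts as a positive."""
    losses = []
    for fq, bq in zip(feats_q, boxes_q):
        sims = [math.exp(dot(fq, fk) / tau) for fk in feats_k]
        pos = [s for s, bk in zip(sims, boxes_k) if iou(bq, bk) >= iou_thr]
        if pos:  # queries without any overlapping proposal are skipped
            losses.append(-math.log(sum(pos) / sum(sims)))
    return sum(losses) / len(losses) if losses else 0.0


# One query proposal, two key proposals: one overlapping, one far away.
boxes_k = [(0, 0, 2, 2), (5, 5, 6, 6)]
print(localized_contrastive_loss([[1.0, 0.0]], [(0, 0, 2, 2)],
                                 [[1.0, 0.0], [0.0, 1.0]], boxes_k))
```

    Because the contrast runs over proposals rather than whole images, each image already supplies many positives and negatives, which is the intuition behind being able to use a smaller batch size.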

TD-MPC2: Scalable, Robust World Models for Continuous Control

  • paper_url: http://arxiv.org/abs/2310.16828
  • repo_url: https://github.com/nicklashansen/tdmpc2
  • paper_authors: Nicklas Hansen, Hao Su, Xiaolong Wang
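  • for: Presenting TD-MPC2, a model-based reinforcement learning algorithm that performs local trajectory optimization in the latent space of a learned world model.
  • methods: Builds on the model-based TD-MPC algorithm and introduces a series of improvements to it.
  • results: Across 104 online RL tasks, TD-MPC2 improves significantly over baselines, and a single agent can be trained on many tasks across multiple task domains, embodiments, and action spaces.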
    Abstract TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2
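    Summary TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned world model. In this work we present TD-MPC2, a series of improvements upon the TD-MPC algorithm. We show that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and we successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of the lessons, opportunities, and risks associated with large TD-MPC2 agents. Videos, models, data, code, and more are available at https://nicklashansen.github.io/td-mpc2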

Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction

  • paper_url: http://arxiv.org/abs/2310.16822
  • repo_url: None
  • paper_authors: Xuming Hu, Junzhe Chen, Aiwei Liu, Shiao Meng, Lijie Wen, Philip S. Yu
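  • for: Improving entity and relation extraction by using images alongside text.
  • methods: Multimodal extraction combines images and text to obtain more signals for entities and relations and aligns them through graphs or hierarchical fusion. The paper adds pre-training objectives for entity-object and relation-image alignment, which turn unlabeled image-caption pairs into soft pseudo-labels.
  • results: Experiments on three datasets show an average 3.41% F1 improvement over the prior SOTA, and applying the method to prior SOTA fusions improves F1 by a further 5.47%.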
    Abstract How can we better extract entities and relations from text? Using multimodal extraction with images and text obtains more signals for entities and relations, and aligns them through graphs or hierarchical fusion, aiding in extraction. Despite attempts at various fusions, previous works have overlooked many unlabeled image-caption pairs, such as NewsCLIPing. This paper proposes innovative pre-training objectives for entity-object and relation-image alignment, extracting objects from images and aligning them with entity and relation prompts for soft pseudo-labels. These labels are used as self-supervised signals for pre-training, enhancing the ability to extract entities and relations. Experiments on three datasets show an average 3.41% F1 improvement over prior SOTA. Additionally, our method is orthogonal to previous multimodal fusions, and using it on prior SOTA fusions further improves 5.47% F1.
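    Summary How can we better extract entities and relations from text? Multimodal extraction with images and text obtains more signals for entities and relations and aligns them through graphs or hierarchical fusion. Despite various fusion attempts, previous works have overlooked many unlabeled image-caption pairs, such as NewsCLIPing. This paper proposes new pre-training objectives for entity-object and relation-image alignment, extracting objects from images and aligning them with entity and relation prompts to obtain soft pseudo-labels. These labels serve as self-supervised signals for pre-training and strengthen entity and relation extraction. Experiments show an average F1 improvement of 3.41% over the prior state of the art. The method is also orthogonal to previous multimodal fusions, and applying it on top of prior state-of-the-art fusions improves F1 by a further 5.47%.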

Can GPT models Follow Human Summarization Guidelines? Evaluating ChatGPT and GPT-4 for Dialogue Summarization

  • paper_url: http://arxiv.org/abs/2310.16810
  • repo_url: None
  • paper_authors: Yongxin Zhou, Fabien Ringeval, François Portet
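  • for: The study explores how well prompt-driven large language models (LLMs) such as ChatGPT and GPT-4 adhere to human guidelines for dialogue summarization.
  • methods: Experiments use DialogSum (English social conversations) and DECODA (French call center interactions) and test various prompts, including prompts from existing literature, prompts from human summarization guidelines, and a two-step prompt approach.
  • results: GPT models often produce lengthy summaries and deviate from human summarization guidelines. Using human guidelines as an intermediate step shows promise, outperforming direct word-length constraint prompts in some cases. GPT models also show distinctive stylistic tendencies in their summaries. BERTScores do not drop dramatically for GPT outputs, which suggests semantic similarity to human references, but ROUGE scores reveal grammatical and lexical differences between GPT-generated and human-written summaries. These findings shed light on the capabilities and limitations of GPT models in following human instructions for dialogue summarization.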
    Abstract This study explores the capabilities of prompt-driven Large Language Models (LLMs) like ChatGPT and GPT-4 in adhering to human guidelines for dialogue summarization. Experiments employed DialogSum (English social conversations) and DECODA (French call center interactions), testing various prompts: including prompts from existing literature and those from human summarization guidelines, as well as a two-step prompt approach. Our findings indicate that GPT models often produce lengthy summaries and deviate from human summarization guidelines. However, using human guidelines as an intermediate step shows promise, outperforming direct word-length constraint prompts in some cases. The results reveal that GPT models exhibit unique stylistic tendencies in their summaries. While BERTScores did not dramatically decrease for GPT outputs suggesting semantic similarity to human references and specialised pre-trained models, ROUGE scores reveal grammatical and lexical disparities between GPT-generated and human-written summaries. These findings shed light on the capabilities and limitations of GPT models in following human instructions for dialogue summarization.

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

  • paper_url: http://arxiv.org/abs/2310.16790
  • repo_url: None
  • paper_authors: Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova
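  • for: Improving NER model performance without relying on large-scale, high-quality annotated data, using instead low-quality crowdsourced annotations and distantly supervised data from external knowledge bases.
  • methods: Denoises the noisy labeled data with guidance from a small set of clean instances, and trains a discriminator model whose outputs are used to recalibrate the sample weights.
  • results: On public crowdsourcing and distant supervision datasets, the method consistently improves NER performance while needing only a small guidance set.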
    Abstract To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate. In contrast, real-world applications often resort to massive low-quality labeled data through non-expert annotators via crowdsourcing and external knowledge bases via distant supervision as a cost-effective alternative. However, these annotation methods result in noisy labels, which in turn lead to a notable decline in performance. Hence, we propose to denoise the noisy NER data with guidance from a small set of clean instances. Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights. The discriminator is capable of detecting both span and category errors with different discriminative prompts. Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
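    Summary Reaching state-of-the-art performance still requires training NER models on large-scale, high-quality annotated data, which is both costly and time-intensive to collect. Real-world applications therefore often fall back on massive low-quality labeled data, obtained through crowdsourcing or through distant supervision from external knowledge bases, as a cost-effective alternative. These annotation methods produce noisy labels, which in turn degrade performance. We therefore propose to denoise the noisy NER data with guidance from a small set of clean instances. Alongside the main NER model, we train a discriminator model and use its outputs to recalibrate the sample weights. The discriminator can detect both span and category errors using different discriminative prompts. Results on public crowdsourcing and distant supervision datasets show that the proposed method consistently improves performance with a small guidance set.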
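    The reweighting step can be pictured with a very small sketch. It assumes, purely for illustration, that the discriminator emits one probability per training sample that its label is correct, and that this probability is used directly as the sample weight in a normalized loss; `reweighted_loss` and its `floor` parameter are hypothetical names, and the paper's actual recalibration rule may differ.

```python
def reweighted_loss(losses, p_correct, floor=0.0):
    """Weighted mean of per-sample losses.

    losses    -- per-sample loss of the main NER model
    p_correct -- assumed discriminator output: probability that each
                 sample's (noisy) label is correct
    floor     -- optional minimum weight so no sample is dropped entirely
    """
    weights = [max(p, floor) for p in p_correct]
    norm = sum(weights)
    if norm == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / norm


# A sample the discriminator distrusts contributes nothing to the loss.
print(reweighted_loss([1.0, 5.0], [1.0, 0.0]))
```

    Samples flagged as likely span or category errors therefore pull the model less, while the clean guidance set is what the discriminator itself would be trained on.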

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

  • paper_url: http://arxiv.org/abs/2310.16787
  • repo_url: None
  • paper_authors: Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Deb Roy, Sara Hooker
  • for: The paper aims to address the legal and ethical risks associated with the use of vast, diverse, and inconsistently documented datasets in natural language processing (NLP) research.
  • methods: The authors conducted a systematic audit and trace of 1800+ text datasets, using a multi-disciplinary approach that combined legal and machine learning expertise. They developed tools and standards to trace the lineage of these datasets, including their source, creators, license conditions, properties, and subsequent use.
  • results: The authors found significant divides in the composition and focus of commercially open vs closed datasets, with closed datasets dominating important categories such as lower resource languages, more creative tasks, and richer topic variety. They also observed frequent miscategorization of licenses on widely used dataset hosting sites, with license omission and error rates of 72%+ and 50%+, respectively. These findings highlight a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs in NLP. To address these issues, the authors release their entire audit with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections.
    Abstract The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tools and standards to trace the lineage of these datasets, from their source, creators, series of license conditions, properties, and subsequent use. Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 72%+ and error rates of 50%+. This points to a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.
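    Summary The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing legal and ethical risks. We convene a multi-disciplinary team of legal and machine learning experts to systematically audit and trace more than 1800 text datasets. We develop tools and standards to trace the lineage of these datasets: their source, creators, license conditions, properties, and subsequent use. Our landscape analysis shows sharp divides in composition and focus between commercially open and closed datasets, with closed datasets dominating important categories such as lower resource languages, more creative tasks, richer topic variety, and newer training data. This points to a deepening divide in the types of data made available under different license conditions, with heightened implications for legal interpretations of copyright and fair use. We also find frequent license miscategorization on widely used dataset hosting sites, with license omission rates above 72% and error rates above 50%. This points to a crisis in misattribution and informed use of the most popular datasets. To support continued improvement in dataset transparency and responsible use, we release our entire audit with an interactive UI, the Data Provenance Explorer, which lets practitioners trace and filter data provenance: www.dataprovenance.org.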

Multi-scale Diffusion Denoised Smoothing

  • paper_url: http://arxiv.org/abs/2310.16779
  • repo_url: https://github.com/jh-jeong/smoothing-multiscale
  • paper_authors: Jongheon Jeong, Jinwoo Shin
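  • for: Improving the certified robustness of randomized smoothing and bringing adversarial robustness to large pre-trained models.
  • methods: Builds on randomized (denoised) smoothing and proposes multi-scale smoothing to improve certified robustness.
  • results: Experiments show that multi-scale smoothing achieves strong certified robustness at high noise levels while keeping accuracy close to that of non-smoothed classifiers.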
    Abstract Along with recent diffusion models, randomized smoothing has become one of a few tangible approaches that offers adversarial robustness to models at scale, e.g., those of large pre-trained models. Specifically, one can perform randomized smoothing on any classifier via a simple "denoise-and-classify" pipeline, so-called denoised smoothing, given that an accurate denoiser is available - such as diffusion model. In this paper, we present scalable methods to address the current trade-off between certified robustness and accuracy in denoised smoothing. Our key idea is to "selectively" apply smoothing among multiple noise scales, coined multi-scale smoothing, which can be efficiently implemented with a single diffusion model. This approach also suggests a new objective to compare the collective robustness of multi-scale smoothed classifiers, and questions which representation of diffusion model would maximize the objective. To address this, we propose to further fine-tune diffusion model (a) to perform consistent denoising whenever the original image is recoverable, but (b) to generate rather diverse outputs otherwise. Our experiments show that the proposed multi-scale smoothing scheme combined with diffusion fine-tuning enables strong certified robustness available with high noise level while maintaining its accuracy close to non-smoothed classifiers.
    Summary Randomized smoothing has recently become one of the few practical methods that can provide adversarial robustness to large pre-trained models. Specifically, a simple "denoise-and-classify" pipeline can be used to perform randomized smoothing on any classifier, as long as an accurate denoiser, such as a diffusion model, is available. In this paper, we propose a scalable method to balance the trade-off between certified robustness and accuracy in denoised smoothing. Our key idea is to selectively apply smoothing at multiple noise scales, which can be efficiently implemented with a single diffusion model. This approach also introduces a new objective to compare the collective robustness of multi-scale smoothed classifiers, and we propose to fine-tune the diffusion model to achieve this goal. Our experiments show that the proposed multi-scale smoothing scheme combined with diffusion fine-tuning can provide strong certified robustness with high noise levels while maintaining accuracy close to non-smoothed classifiers.

DEFT: Data Efficient Fine-Tuning for Large Language Models via Unsupervised Core-Set Selection

  • paper_url: http://arxiv.org/abs/2310.16776
  • repo_url: None
  • paper_authors: Devleena Das, Vivek Khetan
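  • for: Investigating how much data is really needed to fine-tune pre-trained language models (PLMs) for downstream tasks.
  • methods: The paper proposes DEFT, a data-efficient fine-tuning framework that uses unsupervised core-set selection to minimize the amount of data needed to fine-tune PLMs.
  • results: DEFT is demonstrated on text-editing LMs and compared with the state-of-the-art editing model CoEDIT. DEFT models match CoEDIT's accuracy while being fine-tuned on ~70% less data.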
    Abstract Recent advances have led to the availability of many pre-trained language models (PLMs); however, a question that remains is how much data is truly needed to fine-tune PLMs for downstream tasks. In this work, we introduce DEFT, a data-efficient fine-tuning framework that leverages unsupervised core-set selection to minimize the amount of data needed to fine-tune PLMs for downstream tasks. We demonstrate the efficacy of our DEFT framework in the context of text-editing LMs, and compare to the state-of-the-art text-editing model, CoEDIT. Our quantitative and qualitative results demonstrate that DEFT models are just as accurate as CoEDIT while being finetuned on ~70% less data.
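    Summary Recent advances have made many pre-trained language models (PLMs) available, but it remains unclear how much data is truly needed to fine-tune PLMs for downstream tasks. We introduce DEFT, a data-efficient fine-tuning framework that uses unsupervised core-set selection to minimize the amount of data needed for fine-tuning. We demonstrate DEFT on text-editing LMs and compare it with the current leading text-editing model, CoEDIT. Our quantitative and qualitative results show that DEFT models are just as accurate as CoEDIT while being fine-tuned on ~70% less data.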
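    The abstract does not spell out the selection rule, so the sketch below substitutes a generic, well-known core-set heuristic, k-center greedy (farthest-point) selection over sentence embeddings, to show what unsupervised core-set selection can look like. It is a stand-in, not DEFT's algorithm; `k_center_greedy` and `sq_dist` are hypothetical names, and the embeddings are toy 2-D points.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def k_center_greedy(embeddings, budget):
    """Pick `budget` indices so the chosen points cover the data well.

    Starts from index 0 and repeatedly adds the point that is farthest
    from everything selected so far. No labels are needed.
    """
    selected = [0]
    # Distance from every point to its nearest selected point.
    dist = [sq_dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(budget, len(embeddings)):
        nxt = max(range(len(embeddings)), key=lambda i: dist[i])
        selected.append(nxt)
        dist = [min(d, sq_dist(e, embeddings[nxt]))
                for d, e in zip(dist, embeddings)]
    return selected


# Three loose groups of points; a budget of 3 picks one from each group.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
print(k_center_greedy(points, 3))
```

    Only the selected subset would then be used for fine-tuning, which is where the data savings would come from.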

AI Agent as Urban Planner: Steering Stakeholder Dynamics in Urban Planning via Consensus-based Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16772
  • repo_url: https://github.com/mao1207/Steering-Stakeholder-Dynamics-in-Urban-Planning-via-Consensus-based-MARL
  • paper_authors: Kejiang Qian, Lingjun Mao, Xin Liang, Yimin Ding, Jin Gao, Xinran Wei, Ziyi Guo, Jiajie Li
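  • for: Improving sustainable development and public participation in modern urban planning through a consensus-based multi-agent reinforcement learning framework that accommodates the needs of different stakeholders.
  • methods: A consensus-based multi-agent reinforcement learning framework in which intelligent agents represent different stakeholders and vote for preferred land use types, together with a novel consensus mechanism in reward design that optimizes land utilization through collective decision making.
  • results: Extensive experiments on traditional top-down planning and participatory planning methods from real-world communities show that the framework enhances global benefits and accommodates diverse interests, improving satisfaction across demographic groups.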
    Abstract In urban planning, land use readjustment plays a pivotal role in aligning land use configurations with the current demands for sustainable urban development. However, present-day urban planning practices face two main issues. Firstly, land use decisions are predominantly dependent on human experts. Besides, while resident engagement in urban planning can promote urban sustainability and livability, it is challenging to reconcile the diverse interests of stakeholders. To address these challenges, we introduce a Consensus-based Multi-Agent Reinforcement Learning framework for real-world land use readjustment. This framework serves participatory urban planning, allowing diverse intelligent agents as stakeholder representatives to vote for preferred land use types. Within this framework, we propose a novel consensus mechanism in reward design to optimize land utilization through collective decision making. To abstract the structure of the complex urban system, the geographic information of cities is transformed into a spatial graph structure and then processed by graph neural networks. Comprehensive experiments on both traditional top-down planning and participatory planning methods from real-world communities indicate that our computational framework enhances global benefits and accommodates diverse interests, leading to improved satisfaction across different demographic groups. By integrating Multi-Agent Reinforcement Learning, our framework ensures that participatory urban planning decisions are more dynamic and adaptive to evolving community needs and provides a robust platform for automating complex real-world urban planning processes.
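    Summary In urban planning, land use readjustment plays a key role in aligning land use with the demands of sustainable urban development. Present-day practice faces two main challenges. First, land use decisions depend predominantly on human experts. Second, although resident engagement can promote urban sustainability and livability, the diverse interests of stakeholders are hard to reconcile. To address these challenges, we introduce a Consensus-based Multi-Agent Reinforcement Learning framework for real-world land use readjustment. The framework supports participatory urban planning by letting diverse intelligent agents, acting as stakeholder representatives, vote for preferred land use types. Within it, we propose a novel consensus mechanism in reward design that optimizes land utilization through collective decision making. To abstract the structure of the complex urban system, the geographic information of cities is transformed into a spatial graph structure and processed by graph neural networks. Experiments on traditional top-down planning and participatory planning methods from real-world communities show that the framework enhances global benefits and accommodates diverse interests, improving satisfaction across demographic groups. By integrating Multi-Agent Reinforcement Learning, the framework makes participatory planning decisions more dynamic and adaptive to evolving community needs and provides a platform for automating complex real-world urban planning processes.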

SuperHF: Supervised Iterative Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2310.16763
  • repo_url: https://github.com/openfeedback/superhf
  • paper_authors: Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush Bhatia, Silas Alberti
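  • for: Addressing the safety, alignment with human values, and training stability of large language models.
  • methods: The study focuses on two prevalent alignment methods, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, while RLHF is more sophisticated but suffers from training instability and reward hacking.
  • results: The study proposes Supervised Iterative Learning from Human Feedback (SuperHF) to address RLHF's problems. SuperHF uses a simple supervised loss and a Kullback-Leibler (KL) divergence prior, and creates its own training data by repeatedly sampling a batch of model outputs and filtering them through the reward model in an online learning regime. SuperHF exceeds PPO-based RLHF on the training objective, trades off high reward against low reward hacking favorably, and improves downstream calibration.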
    Abstract While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, powering a host of open-source models, while RLHF is a more sophisticated method used in top-tier models like ChatGPT but also suffers from instability and susceptibility to reward hacking. We propose a novel approach, Supervised Iterative Learning from Human Feedback (SuperHF), which seeks to leverage the strengths of both methods. Our hypothesis is two-fold: that the reward model used in RLHF is critical for efficient data use and model generalization and that the use of Proximal Policy Optimization (PPO) in RLHF may not be necessary and could contribute to instability issues. SuperHF replaces PPO with a simple supervised loss and a Kullback-Leibler (KL) divergence prior. It creates its own training data by repeatedly sampling a batch of model outputs and filtering them through the reward model in an online learning regime. We then break down the reward optimization problem into three components: robustly optimizing the training rewards themselves, preventing reward hacking (exploitation of the reward model that degrades model performance) as measured by a novel METEOR similarity metric, and maintaining good performance on downstream evaluations. Our experimental results show SuperHF exceeds PPO-based RLHF on the training objective, easily and favorably trades off high reward with low reward hacking, improves downstream calibration, and performs the same on our GPT-4 based qualitative evaluation scheme all the while being significantly simpler to implement, highlighting SuperHF's potential as a competitive language model alignment technique.
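    Summary Large language models show remarkable capabilities, but they often raise challenges of safety, alignment with human values, and stability during training. We focus on two prevalent alignment methods, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust and powers many open-source models, while RLHF is the more sophisticated method used in top-tier models like ChatGPT but suffers from instability and susceptibility to reward hacking. We propose Supervised Iterative Learning from Human Feedback (SuperHF), which aims to combine the strengths of both. Our hypothesis is that the reward model used in RLHF is critical for efficient data use and model generalization, while the use of PPO in RLHF may be unnecessary and may contribute to instability. SuperHF replaces PPO with a simple supervised loss and a Kullback-Leibler (KL) divergence prior. In an online learning regime, it repeatedly samples a batch of model outputs and filters them through the reward model. We break the reward optimization problem into three parts: robustly optimizing the training rewards, preventing reward hacking, and maintaining good performance on downstream evaluations. Experiments show that SuperHF exceeds PPO-based RLHF on the training objective, trades off high reward against low reward hacking favorably, improves downstream calibration, and performs the same on our GPT-4 based qualitative evaluation scheme, while being significantly simpler to implement.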
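    The sample, filter, and supervised-update loop described in the abstract can be sketched on a toy problem. Everything here is an illustrative assumption rather than the SuperHF implementation: the "policy" is a categorical distribution over three canned responses, `toy_reward` stands in for the reward model, only the single best sample per batch is kept, and `superhf_step`, `lr`, and `beta` are hypothetical names and values.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def toy_reward(response_index):
    """Hypothetical reward model: response 2 is the preferred one."""
    return [0.0, 1.0, 3.0][response_index]


def superhf_step(logits, ref_probs, reward, rng, batch=8, lr=0.5, beta=0.1):
    """One update: sample a batch, keep the best sample, take a gradient
    step on  -log p(best) + beta * KL(p || p_ref)  with respect to logits."""
    probs = softmax(logits)
    # 1) Sample a batch of outputs from the current policy.
    samples = rng.choices(range(len(logits)), weights=probs, k=batch)
    # 2) Filter through the reward model: keep the highest-reward sample.
    best = max(samples, key=reward)
    # 3) Supervised loss gradient plus KL-prior gradient (softmax policy).
    kl = sum(p * math.log(p / q) for p, q in zip(probs, ref_probs))
    new_logits = []
    for i, (l, p, q) in enumerate(zip(logits, probs, ref_probs)):
        sup_grad = p - (1.0 if i == best else 0.0)
        kl_grad = p * (math.log(p / q) - kl)
        new_logits.append(l - lr * (sup_grad + beta * kl_grad))
    return new_logits


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
reference = softmax(logits)  # frozen copy of the initial policy
for _ in range(200):
    logits = superhf_step(logits, reference, toy_reward, rng)
print(softmax(logits))
```

    The KL term pulls the policy back toward the frozen reference, so raising `beta` in this toy trades reward for staying closer to the starting distribution.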

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16755
  • repo_url: https://github.com/ying-hui-he/hi-tom_dataset
  • paper_authors: Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, Naihao Deng
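  • for: This study explores higher-order Theory of Mind (ToM), which involves recursive reasoning about others' beliefs, and introduces HI-TOM, a Higher Order Theory of Mind benchmark.
  • methods: The study evaluates various Large Language Models (LLMs) on higher-order ToM tasks.
  • results: Current LLMs show a decline in performance on higher-order ToM tasks, with many failure cases. The authors analyze the different failure cases and discuss the implications for the future of NLP.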
    Abstract Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.

PROMINET: Prototype-based Multi-View Network for Interpretable Email Response Prediction

  • paper_url: http://arxiv.org/abs/2310.16753
  • repo_url: None
  • paper_authors: Yuqing Wang, Prashanth Vijayaraghavan, Ehsan Degan
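  • for: This study aims to understand and predict email response behavior from email content and metadata, in order to improve email writing and customer engagement.
  • methods: The study proposes the Prototype-based Multi-view Network (PROMINET), which combines semantic and structural information, learns latent exemplars from email data, and maps them to observed samples in the training data.
  • results: PROMINET outperforms baseline models by about 3% in F1 score on two real-world email datasets and provides interpretable email response predictions. The learned prototypes also show potential for generating text-editing suggestions that improve the likelihood of an effective response.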
    Abstract Email is a widely used tool for business communication, and email marketing has emerged as a cost-effective strategy for enterprises. While previous studies have examined factors affecting email marketing performance, limited research has focused on understanding email response behavior by considering email content and metadata. This study proposes a Prototype-based Multi-view Network (PROMINET) that incorporates semantic and structural information from email data. By utilizing prototype learning, the PROMINET model generates latent exemplars, enabling interpretable email response prediction. The model maps learned semantic and structural exemplars to observed samples in the training data at different levels of granularity, such as document, sentence, or phrase. The approach is evaluated on two real-world email datasets: the Enron corpus and an in-house Email Marketing corpus. Experimental results demonstrate that the PROMINET model outperforms baseline models, achieving a ~3% improvement in F1 score on both datasets. Additionally, the model provides interpretability through prototypes at different granularity levels while maintaining comparable performance to non-interpretable models. The learned prototypes also show potential for generating suggestions to enhance email text editing and improve the likelihood of effective email responses. This research contributes to enhancing sender-receiver communication and customer engagement in email interactions.

Translating Universal Scene Descriptions into Knowledge Graphs for Robotic Environment

  • paper_url: http://arxiv.org/abs/2310.16737
  • repo_url: None
  • paper_authors: Giang Hoang Nguyen, Daniel Bessler, Simon Stelter, Mihai Pomarlan, Michael Beetz
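  • for: This paper proposes a virtual-reality-based approach to robot environment modeling so that robots can perform human-scale manipulation tasks more competently.
  • methods: The paper translates scene graphs into Knowledge Graph (KG) representations to support semantic querying and integration with other knowledge sources. Environment models are built in the Universal Scene Description (USD) format.
  • results: The paper presents a technique for converting USD-based environment models into KG representations that facilitate semantic querying and integration with additional knowledge sources.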
    Abstract Robots performing human-scale manipulation tasks require an extensive amount of knowledge about their surroundings in order to perform their actions competently and human-like. In this work, we investigate the use of virtual reality technology as an implementation for robot environment modeling, and present a technique for translating scene graphs into knowledge bases. To this end, we take advantage of the Universal Scene Description (USD) format which is an emerging standard for the authoring, visualization and simulation of complex environments. We investigate the conversion of USD-based environment models into Knowledge Graph (KG) representations that facilitate semantic querying and integration with additional knowledge sources.
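    Summary Robots performing human-scale manipulation tasks need extensive knowledge about their surroundings in order to act competently and in a human-like way. In this work, we study the use of virtual reality technology for robot environment modeling and present a technique for translating scene descriptions into knowledge base representations. To this end, we use the Universal Scene Description (USD) format, an emerging standard for authoring, visualizing, and simulating complex environments. We investigate converting USD-based environment models into Knowledge Graph (KG) representations to support semantic querying and integration with other knowledge sources.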

Mapping the Empirical Evidence of the GDPR (In-)Effectiveness: A Systematic Review

  • paper_url: http://arxiv.org/abs/2310.16735
  • repo_url: None
  • paper_authors: Wenlong Li, Zihao Li, Wenkai Li, Yueming Zhang, Aolan Li
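  • for: This study surveys and synthesizes nearly three decades (1995 to March 2022) of empirical research in data protection, in order to better understand and evaluate the implementation and impact of the GDPR.
  • methods: The study conducts a comprehensive literature review, examining and synthesizing empirical data protection research published over nearly three decades.
  • results: The study finds that empirical evidence in data protection remains scattered and insufficiently recognized and used. It argues that future work should integrate empirical evidence more robustly into the evaluation and review of the GDPR.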
    Abstract In the realm of data protection, a striking disconnect prevails between traditional domains of doctrinal, legal, theoretical, and policy-based inquiries and a burgeoning body of empirical evidence. Much of the scholarly and regulatory discourse remains entrenched in abstract legal principles or normative frameworks, leaving the empirical landscape uncharted or minimally engaged. Since the birth of EU data protection law, a modest body of empirical evidence has been generated but remains widely scattered and unexamined. Such evidence offers vital insights into the perception, impact, clarity, and effects of data protection measures but languishes on the periphery, inadequately integrated into the broader conversation. To make a meaningful connection, we conduct a comprehensive review and synthesis of empirical research spanning nearly three decades (1995- March 2022), advocating for a more robust integration of empirical evidence into the evaluation and review of the GDPR, while laying a methodological foundation for future empirical research.
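    Summary In data protection, there is a striking disconnect between the traditional domains of doctrinal, legal, theoretical, and policy-based inquiry and a growing body of empirical evidence. Much scholarly and regulatory discussion remains entrenched in abstract legal principles and normative frameworks, leaving the empirical landscape largely unexplored. Since the birth of EU data protection law, a modest body of empirical evidence has been generated, but it remains scattered and unexamined. This evidence offers important insight into the impact, effects, and clarity of data protection measures, yet it sits on the periphery and is not adequately evaluated or used. To build a meaningful connection, we conduct a comprehensive review and synthesis of empirical research spanning nearly three decades (1995 to March 2022), argue for a more robust integration of empirical evidence into the evaluation and review of the GDPR, and lay a methodological foundation for future empirical research.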

SkyMath: Technical Report

  • paper_url: http://arxiv.org/abs/2310.16713
  • repo_url: None
  • paper_authors: Liu Yang, Haihua Yang, Wenjun Cheng, Lei Lin, Chenxia Li, Yifu Chen, Lunan Liu, Jianfei Pan, Tianwen Wei, Biye Li, Liang Zhao, Lijie Wang, Bo Zhu, Guoliang Li, Xuejie Wu, Xilin Luo, Rui Hu
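  • for: This paper explores the potential of large language models (LLMs) for natural language processing (NLP) tasks, with a focus on mathematical reasoning.
  • methods: The paper applies self-compare fine-tuning to strengthen the mathematical reasoning ability of the Skywork-13B-Base model.
  • results: On GSM8K, SkyMath outperforms all known open-source models of similar size, establishing a new state of the art (SOTA).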
    Abstract Large language models (LLMs) have shown great potential to solve varieties of natural language processing (NLP) tasks, including mathematical reasoning. In this work, we present SkyMath, a large language model for mathematics with 13 billion parameters. By applying self-compare fine-tuning, we have enhanced mathematical reasoning abilities of Skywork-13B-Base remarkably. On GSM8K, SkyMath outperforms all known open-source models of similar size and has established a new SOTA performance.
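    Summary Large language models (LLMs) have shown great potential for solving a variety of natural language processing (NLP) tasks, including mathematical reasoning. In this work, we present SkyMath, a large language model for mathematics with 13 billion parameters. By applying self-compare fine-tuning, we substantially improve the mathematical reasoning ability of Skywork-13B-Base. On GSM8K, SkyMath outperforms all known open-source models of similar size and establishes a new SOTA.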

  • paper_url: http://arxiv.org/abs/2310.16704
  • repo_url: None
  • paper_authors: Suzan Zuurmond, AnneMarie Borg, Matthijs van Kempen, Remi Wieten
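  • for: This paper aims to explain the decisions of rule-based automated decision-making systems in the legal domain.
  • methods: The paper proposes an explanation method built on a graph database that supports question-driven explanations tailored to the user and multimedia display.
  • results: The paper shows that its conceptual framework applies to a real-world scenario at the Dutch Tax and Customs Administration and implements the explanation method for that scenario.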
    Abstract We propose a human-centred explanation method for rule-based automated decision-making systems in the legal domain. Firstly, we establish a conceptual framework for developing explanation methods, representing its key internal components (content, communication and adaptation) and external dependencies (decision-making system, human recipient and domain). Secondly, we propose an explanation method that uses a graph database to enable question-driven explanations and multimedia display. This way, we can tailor the explanation to the user. Finally, we show how our conceptual framework is applicable to a real-world scenario at the Dutch Tax and Customs Administration and implement our explanation method for this scenario.
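    Summary We propose a human-centred explanation method for rule-based automated decision-making systems in the legal domain. First, we establish a conceptual framework for developing explanation methods, covering its internal components (content, communication, and adaptation) and external dependencies (decision-making system, human recipient, and domain). Second, we propose an explanation method that uses a graph database to enable question-driven explanations tailored to the user. Finally, we show how the conceptual framework applies to a real-world scenario at the Dutch Tax and Customs Administration.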

Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

  • paper_url: http://arxiv.org/abs/2310.16686
  • repo_url: https://github.com/michael-beukman/decisionadapter
  • paper_authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman
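  • for: This study addresses a problem that limits the real-world use of reinforcement learning: generalising to new environments with different transition dynamics.
  • methods: The study introduces a neural network architecture, the Decision Adapter, which generates the weights of an adapter module from context information and conditions the agent's behaviour on that context.
  • results: Across several environments, the Decision Adapter achieves better generalisation than previous approaches and is more robust to irrelevant distractor variables than several alternative methods.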
    Abstract While reinforcement learning has achieved remarkable successes in several domains, its real-world application is limited due to many methods failing to generalise to unfamiliar conditions. In this work, we consider the problem of generalising to new transition dynamics, corresponding to cases in which the environment's response to the agent's actions differs. For example, the gravitational force exerted on a robot depends on its mass and changes the robot's mobility. Consequently, in such cases, it is necessary to condition an agent's actions on extrinsic state information and pertinent contextual information reflecting how the environment responds. While the need for context-sensitive policies has been established, the manner in which context is incorporated architecturally has received less attention. Thus, in this work, we present an investigation into how context information should be incorporated into behaviour learning to improve generalisation. To this end, we introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module and conditions the behaviour of an agent on the context information. We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments. Beyond this, the Decision Adapter is more robust to irrelevant distractor variables than several alternative methods.
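The idea of generating adapter weights from context can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the Decision Adapter as published: the linear hypernetwork, the residual form `features + W(context) @ features`, and all function names are hypothetical choices made for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def generate_adapter_weights(hyper_weights, context, dim):
    """Hypernetwork: linear map from context to a flat vector, reshaped to dim x dim."""
    flat = matvec(hyper_weights, context)  # length dim * dim
    return [flat[i * dim:(i + 1) * dim] for i in range(dim)]

def adapted_features(features, hyper_weights, context):
    """Residual adapter (an assumption): features + W(context) @ features."""
    dim = len(features)
    adapter = generate_adapter_weights(hyper_weights, context, dim)
    return [f + d for f, d in zip(features, matvec(adapter, features))]

# Toy usage: a 1-dimensional context controls a 2 x 2 adapter.
features = [1.0, 2.0]
hyper_weights = [[2.0], [0.0], [0.0], [2.0]]  # maps context 0.5 to the identity matrix

scaled = adapted_features(features, hyper_weights, [0.5])     # adapter = identity
unchanged = adapted_features(features, hyper_weights, [0.0])  # adapter = zero matrix
```

The point of the design is that the same policy features are transformed differently depending on the context vector, so a change in dynamics (for example, a different mass) changes the adapter rather than requiring a separate policy.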

Detection of news written by the ChatGPT through authorship attribution performed by a Bidirectional LSTM model

  • paper_url: http://arxiv.org/abs/2310.16685
  • repo_url: https://github.com/amandafi/human-writing-vs.-gpt-writing
  • paper_authors: Amanda Ferrari Iaquinta, Gustavo Voltani von Atzingen
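  • for: This study aims to build a model that performs authorship attribution on news articles and identifies those written by ChatGPT, motivated by concerns about fake news, misinformation, and loss of trust in news sources.
  • methods: The study uses several natural language processing techniques to extract features from a dataset with equal amounts of human-written and ChatGPT-written news, and trains, validates, and tests three models built with different techniques.
  • results: The Bidirectional LSTM neural network model performed best, reaching 91.57% accuracy on the test set.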
    Abstract The large language model-based chatbot ChatGPT has gained a lot of popularity since its launch and has been used in a wide range of situations. This research centers on a particular situation, in which ChatGPT is used to produce news that will be consumed by the population, facilitating the production of fake news, the spread of misinformation, and a lack of trust in news sources. Aware of these problems, this research aims to build an artificial intelligence model capable of performing authorship attribution on news articles, identifying the ones written by ChatGPT. To achieve this goal, a dataset containing equal amounts of human and ChatGPT written news was assembled, and different natural language processing techniques were used to extract features from it that were used to train, validate and test three models built with different techniques. The best performance was produced by the Bidirectional Long Short Term Memory (LSTM) Neural Network model, achieving 91.57% accuracy when tested against the data from the testing set.
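    Summary The large language model chatbot ChatGPT has attracted wide attention since its launch and has been used in many situations. This research focuses on one of them: the use of ChatGPT to produce news, which facilitates fake news, the spread of misinformation, and distrust in news sources. To address these problems, the research aims to build an artificial intelligence model that performs authorship attribution on news articles and identifies those written by ChatGPT. To this end, a dataset of human-written and ChatGPT-written news articles was assembled, and different natural language processing techniques were used to extract features for training, validating, and testing three different models. The Bidirectional Long Short Term Memory (LSTM) neural network model performed best, achieving 91.57% accuracy on the testing set.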

Exploring Large Language Models for Code Explanation

  • paper_url: http://arxiv.org/abs/2310.16673
  • repo_url: None
  • paper_authors: Paheli Bhattacharya, Manojit Chakraborty, Kartheek N S N Palepu, Vikas Pandey, Ishan Dindorkar, Rakesh Rajpurohit, Rishabh Gupta
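  • for: This paper studies automatically generating code documentation to aid code understanding.
  • methods: The paper uses various Large Language Models (LLMs) to generate natural-language summaries of code snippets.
  • results: Code LLMs outperform their generic counterparts, and zero-shot methods yield superior results when the training and testing sets have dissimilar distributions.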
    Abstract Automating code documentation through explanatory text can prove highly beneficial in code understanding. Large Language Models (LLMs) have made remarkable strides in Natural Language Processing, especially within software engineering tasks such as code generation and code summarization. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs. The findings indicate that Code LLMs outperform their generic counterparts, and zero-shot methods yield superior results when dealing with datasets with dissimilar distributions between training and testing sets.
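    Summary Automatically generating code documentation through explanatory text can aid code understanding. Large Language Models (LLMs) have made notable progress in natural language processing, including software engineering tasks such as code generation and code summarization. This study looks specifically at generating natural-language summaries for code snippets using various LLMs. The findings indicate that Code LLMs outperform their generic counterparts, and that zero-shot methods yield superior results when the training and testing sets have dissimilar distributions.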

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

  • paper_url: http://arxiv.org/abs/2310.16656
  • repo_url: None
  • paper_authors: Eyal Segalis, Dani Valevski, Danny Lumen, Yossi Matias, Yaniv Leviathan
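  • for: This study aims to improve the image quality and semantic alignment of text-to-image models.
  • methods: The study relabels the training dataset with a specialized automatic captioning model and trains a text-to-image model on the recaptioned dataset.
  • results: Compared with the baseline, training on the recaptioned dataset improves image quality and semantic alignment. For example, FID drops from 17.87 to 14.84, and faithful image generation improves by 64.3% in human evaluation. The technique also reduces the train-inference discrepancy and gives the model more information per example, improving sample efficiency and the model's understanding of the relation between captions and images.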
    Abstract Text-to-image diffusion models achieved a remarkable leap in capabilities over the last few years, enabling high-quality and diverse synthesis of images from a textual prompt. However, even the most advanced models often struggle to precisely follow all of the directions in their prompts. The vast majority of these models are trained on datasets consisting of (image, caption) pairs where the images often come from the web, and the captions are their HTML alternate text. A notable example is the LAION dataset, used by Stable Diffusion and other models. In this work we observe that these captions are often of low quality, and argue that this significantly affects the model's capability to understand nuanced semantics in the textual prompts. We show that by relabeling the corpus with a specialized automatic captioning model and training a text-to-image model on the recaptioned dataset, the model benefits substantially across the board. First, in overall image quality: e.g. FID 14.84 vs. the baseline of 17.87, and 64.3% improvement in faithful image generation according to human evaluation. Second, in semantic alignment, e.g. semantic object accuracy 84.34 vs. 78.90, counting alignment errors 1.32 vs. 1.44 and positional alignment 62.42 vs. 57.60. We analyze various ways to relabel the corpus and provide evidence that this technique, which we call RECAP, both reduces the train-inference discrepancy and provides the model with more information per example, increasing sample efficiency and allowing the model to better understand the relations between captions and images.
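    Summary Text-to-image diffusion models have made great progress over the last few years and can generate high-quality, diverse images from a textual prompt. However, even the most advanced models often struggle to precisely follow all of the directions in their prompts. Most of these models are trained on datasets of (image, caption) pairs in which the images usually come from the web and the captions are their HTML alternate text. We observe that these captions are often of low quality and argue that this significantly affects the model's ability to understand nuanced semantics in textual prompts. We show that relabeling the corpus with a specialized automatic captioning model and training a text-to-image model on the recaptioned dataset improves the model substantially. First, in overall image quality: e.g. FID 14.84 vs. the baseline of 17.87, and a 64.3% improvement in faithful image generation according to human evaluation. Second, in semantic alignment: e.g. semantic object accuracy 84.34 vs. 78.90, counting alignment errors 1.32 vs. 1.44, and positional alignment 62.42 vs. 57.60. We analyze different ways to relabel the corpus and provide evidence that this technique, which we call RECAP, both reduces the train-inference discrepancy and gives the model more information per example, increasing sample efficiency and allowing the model to better understand the relations between captions and images.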

Will releasing the weights of large language models grant widespread access to pandemic agents?

  • paper_url: http://arxiv.org/abs/2310.18233
  • repo_url: None
  • paper_authors: Anjali Gopal, Nathan Helm-Burger, Lenni Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, Kevin M. Esvelt
  • for: Investigate whether continued model weight proliferation is likely to help future malicious actors inflict mass death.
  • methods: A hackathon tested whether participants could discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering malicious prompts into two versions of the Llama-2-70B model (Base and Spicy).
  • results: The Spicy model provided some participants with nearly all key information needed to obtain the virus, suggesting that releasing the weights of advanced foundation models could lead to the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.
    Abstract Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.
    Summary Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards shortly after release. We investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.

ArTST: Arabic Text and Speech Transformer

  • paper_url: http://arxiv.org/abs/2310.16621
  • repo_url: https://github.com/mbzuai-nlp/artst
  • paper_authors: Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Aldarmaki
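  • for: Support open-source speech technologies for the Arabic language.
  • methods: A pre-trained Arabic Text and Speech Transformer (ArTST), following the SpeechT5 unified-modal framework and pre-trained from scratch on Modern Standard Arabic (MSA) speech and text data.
  • results: ArTST performs on a par with or better than the current state of the art on Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification, and generalizes well, particularly in the low-resource TTS task.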
    Abstract We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results in these tasks, ArTST performs on a par with or exceeding the current state-of-the-art in all three tasks. Moreover, we find that our pre-training is conducive for generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use.
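    Summary We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for Arabic. The model follows the unified-modal framework SpeechT5, recently released for English, and focuses on Modern Standard Arabic (MSA), with plans to extend it to dialectal and code-switched Arabic. We pre-trained the model from scratch on MSA speech and text data and fine-tuned it for ASR, TTS, and spoken dialect identification. In experiments comparing ArTST with SpeechT5 and with previously reported results, ArTST performs on a par with or better than the current state of the art in all three tasks. We also find that our pre-training is conducive to generalization, particularly in the low-resource TTS task. The pre-trained model and the fine-tuned ASR and TTS models are released for research use.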

Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors

  • paper_url: http://arxiv.org/abs/2310.16609
  • repo_url: None
  • paper_authors: Marek Kubis, Paweł Skórzewski, Marcin Sowański, Tomasz Ziętkiewicz
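  • for: This paper studies how speech recognition errors affect the performance of natural language understanding (NLU) models.
  • methods: The paper proposes a method that combines back transcription with a fine-grained technique for categorizing the errors that affect NLU performance, using synthesized speech for NLU evaluation.
  • results: Using synthesized speech in place of audio recordings does not significantly change the outcomes of the technique.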
    Abstract In a spoken dialogue system, an NLU model is preceded by a speech recognition system that can deteriorate the performance of natural language understanding. This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. The proposed method combines the back transcription procedure with a fine-grained technique for categorizing the errors that affect the performance of NLU models. The method relies on the usage of synthesized speech for NLU evaluation. We show that the use of synthesized speech in place of audio recording does not change the outcomes of the presented technique in a significant way.
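    Summary In a spoken dialogue system, an NLU model is preceded by a speech recognition system, which can degrade natural language understanding performance. This paper proposes a method for studying the impact of speech recognition errors on NLU models. The method combines the back transcription procedure with a fine-grained technique for categorizing errors, and relies on synthesized speech for NLU evaluation. We show that using synthesized speech in place of audio recordings does not significantly change the outcomes.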

An Explainable Deep Learning-Based Method For Schizophrenia Diagnosis Using Generative Data-Augmentation

  • paper_url: http://arxiv.org/abs/2310.16867
  • repo_url: None
  • paper_authors: Mehrshad Saadatinia, Armin Salimi-Badr
  • for: automatic diagnosis of schizophrenia using EEG brain recordings
  • methods: generative data augmentation, CNN, WGAN-GP, VAE
  • results: 3.0% improvement in accuracy, lower loss value, faster convergence, and interpretable model explanations
    Abstract In this study, we leverage a deep learning-based method for the automatic diagnosis of schizophrenia using EEG brain recordings. This approach utilizes generative data augmentation, a powerful technique that enhances the accuracy of the diagnosis. To enable the utilization of time-frequency features, spectrograms were extracted from the raw signals. After exploring several neural network architectural setups, a proper convolutional neural network (CNN) was used for the initial diagnosis. Subsequently, using Wasserstein GAN with Gradient Penalty (WGAN-GP) and Variational Autoencoder (VAE), two different synthetic datasets were generated in order to augment the initial dataset and address the over-fitting issue. The augmented dataset using VAE achieved a 3.0% improvement in accuracy reaching up to 99.0% and yielded a lower loss value as well as a faster convergence. Finally, we addressed the lack of trust in black-box models using the Local Interpretable Model-agnostic Explanations (LIME) algorithm to determine the most important superpixels (frequencies) in the diagnosis process.
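    Summary In this study, we use a deep learning-based method for the automatic diagnosis of schizophrenia from EEG brain recordings. The approach uses generative data augmentation to improve diagnostic accuracy. To make use of time-frequency features, we extract spectrograms from the raw signals. After considering several neural network architectures, we chose a suitable convolutional neural network (CNN) for the initial diagnosis. We then used Wasserstein GAN with Gradient Penalty (WGAN-GP) and a Variational Autoencoder (VAE) to generate two different synthetic datasets that augment the initial dataset and address over-fitting. The VAE-augmented dataset achieved a 3.0% improvement in accuracy, reaching 99.0%, along with a lower loss value and faster convergence. Finally, we used the Local Interpretable Model-agnostic Explanations (LIME) algorithm to determine the most important superpixels (frequencies) in the diagnosis process.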

Balancing central and marginal rejection when combining independent significance tests

  • paper_url: http://arxiv.org/abs/2310.16600
  • repo_url: None
  • paper_authors: Chris Salahub, R. Wayne Oldford
  • for: The paper discusses and evaluates the significance of a collection of $p$-values, particularly when the original data are not available.
  • methods: The paper introduces a telescoping series of alternative hypotheses to communicate the strength and prevalence of non-null evidence in the $p$-values, and discusses various pooling formulae to combine the $p$-values.
  • results: The paper proposes a combining function based on the $\chi^2_{\kappa}$ quantile transformation to control the quotient of central and marginal rejection levels, and shows that this function is robust to mis-specified parameters relative to the UMP. Additionally, the paper maps out plausible alternatives based on where the pooled $p$-value is minimized.
    Abstract A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, in particular when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number which behaves like a univariate $p$-value. To clarify discussion of these functions, a telescoping series of alternative hypotheses are introduced that communicate the strength and prevalence of non-null evidence in the $p$-values before general pooling formulae are discussed. A pattern noticed in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two for pooled $p$-values. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.
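    Summary A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, especially when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number that behaves like a univariate $p$-value. To clarify the discussion of these functions, we introduce a series of alternative hypotheses that communicate the strength and prevalence of non-null evidence in the $p$-values. A pattern in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. Because central rejection is always greater than or equal to marginal rejection, we propose a quotient to measure the balance between the two. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and is shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.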
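To make the pooling idea concrete, here is a small sketch of one well-known pooling function, Fisher's method. It is offered as an illustration, on the assumption that the $\chi^2_{\kappa}$ pooled $p$-value sums the $\chi^2_{\kappa}$ quantiles of $1 - p_i$ and refers the sum to a $\chi^2_{m\kappa}$ distribution; under that assumption, $\kappa = 2$ gives Fisher's method, which has a closed form and needs only the standard library. The function name is hypothetical, and general $\kappa$ would require a chi-square quantile routine that this sketch does not provide.

```python
import math

def fisher_pooled_p(p_values):
    """Pool independent p-values with Fisher's method.

    The statistic is x = -2 * sum(ln p_i), which is chi-square with 2m degrees
    of freedom under the null. For even degrees of freedom the survival function
    is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    Every p-value must be strictly positive.
    """
    m = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half_stat / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half_stat) * total

single = fisher_pooled_p([0.5])         # a single p-value pools to itself
pair = fisher_pooled_p([0.05, 0.05])    # two moderately small p-values
```

Two $p$-values of 0.05 pool to a value below 0.05, which shows how consistent weak evidence across tests is rewarded by this pooling function.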

Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations

  • paper_url: http://arxiv.org/abs/2310.16587
  • repo_url: https://github.com/hku-medai/bnn_uncertainty
  • paper_authors: Tsai Hor Chan, Kin Wai Lau, Jiajun Shen, Guosheng Yin, Lequan Yu
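  • for: This study proposes a new uncertainty estimation framework for deep learning that can accurately evaluate a model's uncertainty without seeing out-of-distribution (OOD) data during training.
  • methods: The study uses data-adaptive high-dimensional hypothesis testing for uncertainty estimation and does not require retraining the feature encoder. The test statistic exploits the statistical properties of the latent representations to improve testing performance.
  • results: Experiments show that encoding features with Bayesian neural networks enhances testing performance and yields more accurate uncertainty estimates. The study also introduces a family-wise testing procedure to determine the optimal OOD detection threshold, minimizing the false discovery rate (FDR).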
    Abstract Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on uncertainty on discrete classification probabilities, which leads to poor generalizability to uncertainty estimation for other tasks. Moreover, most of the literature requires seeing the out-of-distribution (OOD) data in the training for better estimation of uncertainty, which limits the uncertainty estimation performance in practice because the OOD data are typically unseen. To overcome these limitations, we propose a new framework using data-adaptive high-dimensional hypothesis testing for uncertainty estimation, which leverages the statistical properties of the feature representations. Our method directly operates on latent representations and thus does not require retraining the feature encoder under a modified objective. The test statistic relaxes the feature distribution assumptions to high dimensionality, and it is more discriminative to uncertainties in the latent representations. We demonstrate that encoding features with Bayesian neural networks can enhance testing performance and lead to more accurate uncertainty estimation. We further introduce a family-wise testing procedure to determine the optimal threshold of OOD detection, which minimizes the false discovery rate (FDR). Extensive experiments validate the satisfactory performance of our framework on uncertainty estimation and task-specific prediction over a variety of competitors. The experiments on the OOD detection task also show satisfactory performance of our method when the OOD data are unseen in the training. Codes are available at https://github.com/HKU-MedAI/bnn_uncertainty.
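The abstract above describes picking an OOD detection threshold with the false discovery rate (FDR) in mind, but it does not spell out the procedure. As a rough illustration only, the sketch below uses the standard Benjamini-Hochberg step-up procedure, which is swapped in here as a well-known FDR-controlling rule and is not claimed to be the authors' method. The function name and the idea that each test sample comes with a p-value are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumption: each entry of p_values is the p-value of one test sample
    under the null "this sample is in-distribution".
    """
    m = len(p_values)
    # Indices of the samples, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


flags = benjamini_hochberg([0.5, 0.001, 0.02, 0.9], alpha=0.05)
print(flags)
```

Samples flagged `True` would be treated as OOD in this sketch; the data-dependent cutoff plays the role of the detection threshold.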

Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models

  • paper_url: http://arxiv.org/abs/2310.16584
  • repo_url: https://github.com/ltx-code/ltx
  • paper_authors: Oren Barkan, Yuval Asher, Amit Eshel, Yehonatan Elisha, Noam Koenigstein
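  • for: providing post-hoc explanations for vision models
  • methods: an "explainer" model generates explanation maps; in both stages of training, the explained model's prediction for a masked input is compared with its prediction for the original input, which enables a novel counterfactual objective.
  • results: LTX significantly outperforms the current state of the art in explainability across various metrics.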
    Abstract We present Learning to Explain (LTX), a model-agnostic framework designed for providing post-hoc explanations for vision models. The LTX framework introduces an "explainer" model that generates explanation maps, highlighting the crucial regions that justify the predictions made by the model being explained. To train the explainer, we employ a two-stage process consisting of initial pretraining followed by per-instance finetuning. During both stages of training, we utilize a unique configuration where we compare the explained model's prediction for a masked input with its original prediction for the unmasked input. This approach enables the use of a novel counterfactual objective, which aims to anticipate the model's output using masked versions of the input image. Importantly, the LTX framework is not restricted to a specific model architecture and can provide explanations for both Transformer-based and convolutional models. Through our evaluations, we demonstrate that LTX significantly outperforms the current state-of-the-art in explainability across various metrics.
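    Summary We propose Learning to Explain (LTX), a model-agnostic framework for providing post-hoc explanations of vision model predictions. LTX introduces an "explainer" model that generates explanation maps highlighting the key regions that justify the prediction of the model being explained. The explainer is trained in two stages: initial pretraining followed by per-instance finetuning. In both stages, a distinctive configuration compares the explained model's prediction for a masked input with its original prediction for the unmasked input. This enables a novel counterfactual objective, which aims to anticipate the model's output from masked versions of the input image. Importantly, LTX is not tied to a specific architecture and can explain both Transformer-based and convolutional models. Our evaluations show that LTX significantly outperforms the current state of the art across various metrics.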
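The comparison between the prediction on a masked input and the original prediction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LTX objective itself: `toy_model` is a hypothetical stand-in classifier, the mask is given rather than produced by an explainer network, and the loss is a plain cross-entropy against the originally predicted class.

```python
import math


def apply_mask(image, mask):
    """Element-wise product of an image and a mask with values in [0, 1]."""
    return [[px * m for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def masked_prediction_loss(model, image, mask):
    """Cross-entropy between the model's original top class and its
    prediction on the masked image (lower means the mask keeps the evidence)."""
    original_probs = model(image)
    target = max(range(len(original_probs)), key=original_probs.__getitem__)
    masked_probs = model(apply_mask(image, mask))
    return -math.log(max(masked_probs[target], 1e-12))


def toy_model(image):
    """Hypothetical classifier: two-class softmax over the summed
    intensity of the left and right halves of the image."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[half:]) for row in image)
    exps = [math.exp(left), math.exp(right)]
    total = sum(exps)
    return [e / total for e in exps]


image = [[1.0, 0.0], [1.0, 0.0]]
keep_left = [[1.0, 0.0], [1.0, 0.0]]   # keeps the pixels that drive the prediction
keep_right = [[0.0, 1.0], [0.0, 1.0]]  # removes them
print(masked_prediction_loss(toy_model, image, keep_left))
print(masked_prediction_loss(toy_model, image, keep_right))
```

A mask that preserves the decisive region gives a lower loss than one that hides it, which is the signal an explainer could be trained on.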

Hybrid Minimax-MCTS and Difficulty Adjustment for General Game Playing

  • paper_url: http://arxiv.org/abs/2310.16581
  • repo_url: https://github.com/marcoantonioaav/hybrid-minimax-mcts
  • paper_authors: Marco Antônio Athayde de Aguiar Vieira, Anderson Rocha Tavares, Renato Perez Ribas
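  • for: This paper develops an artificial intelligence opponent that offers different difficulty levels in zero-sum games.
  • methods: It proposes a hybrid algorithm that combines minimax search with MCTS, together with a general approach to difficulty levels for such an opponent.
  • results: Tests indicate that both the hybrid algorithm and the new difficulty adjustment system are promising approaches for General Game Playing (GGP).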
    Abstract Board games are a great source of entertainment for all ages, as they create a competitive and engaging environment, as well as stimulating learning and strategic thinking. It is common for digital versions of board games, as any other type of digital games, to offer the option to select the difficulty of the game. This is usually done by customizing the search parameters of the AI algorithm. However, this approach cannot be extended to General Game Playing agents, as different games might require different parametrization for each difficulty level. In this paper, we present a general approach to implement an artificial intelligence opponent with difficulty levels for zero-sum games, together with a proposal of a Minimax-MCTS hybrid algorithm, which combines the minimax search process with GGP aspects of MCTS. This approach was tested in our mobile application LoBoGames, an extensible board games platform, that is intended to have a broad catalog of games, with an emphasis on accessibility: the platform is friendly to visually-impaired users, and is compatible with more than 92\% of Android devices. The tests in this work indicate that both the hybrid Minimax-MCTS and the new difficulty adjustment system are promising GGP approaches that could be expanded in future work.
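The abstract notes that difficulty is usually set by customizing the search parameters of the AI algorithm. The sketch below shows that conventional baseline, not the paper's Minimax-MCTS hybrid: a depth-limited minimax (in negamax form) where the search depth is the difficulty knob. The game is an assumed toy example, a single heap where players alternately take 1 to 3 stones and whoever takes the last stone wins.

```python
def negamax(stones, depth):
    """Value for the player to move: +1 win, -1 loss, 0 unknown at the horizon."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so the mover has lost
    if depth == 0:
        return 0   # horizon reached and no heuristic is used: treat as neutral
    return max(-negamax(stones - take, depth - 1)
               for take in (1, 2, 3) if take <= stones)


def best_move(stones, depth):
    """Number of stones to take when searching `depth` plies ahead (depth >= 1)."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -negamax(stones - take, depth - 1))


print(best_move(7, depth=6))  # deep search: an "expert" setting
print(best_move(7, depth=1))  # shallow search: a "beginner" setting
```

With enough depth the agent leaves its opponent a multiple of four stones; with a shallow depth it cannot see that far and falls back on the first legal move. As the abstract points out, the right depth per difficulty level is game-specific, which is what makes this approach hard to reuse for general game playing.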

Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.16573
  • repo_url: None
  • paper_authors: Weijie Chen, Haoyu Wang, Shicai Yang, Lei Zhang, Wei Wei, Yanning Zhang, Luojun Lin, Di Xie, Yueting Zhuang
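  • for: This paper studies whether a modern text-to-image diffusion model can tailor a task-adaptive image classifier across domains and categories.
  • methods: It applies existing domain adaptation methods for image classification, using images synthesized by a high-fidelity text-to-image generator as surrogate source data.
  • results: Experiments show that, without collecting and annotating real-world source data, the knowledge embedded in the text-to-image generator can be transferred to a task-specific image classifier, surpassing state-of-the-art domain adaptation methods that use real source data.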
    Abstract We do not pursue a novel method in this paper, but aim to study if a modern text-to-image diffusion model can tailor any task-adaptive image classifier across domains and categories. Existing domain adaptive image classification works exploit both source and target data for domain alignment so as to transfer the knowledge learned from the labeled source data to the unlabeled target data. However, as the development of the text-to-image diffusion model, we wonder if the high-fidelity synthetic data from the text-to-image generator can serve as a surrogate of the source data in real world. In this way, we do not need to collect and annotate the source data for each domain adaptation task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with category labels derived from the corresponding text prompts, and then leverage the surrogate data as a bridge to transfer the knowledge embedded in the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as the corresponding unlabeled target data. Extensive experiments validate the feasibility of the proposed idea, which even surpasses the state-of-the-art domain adaptation works using the source data collected and annotated in real world.

Label Propagation for Graph Label Noise

  • paper_url: http://arxiv.org/abs/2310.16560
  • repo_url: None
  • paper_authors: Yao Cheng, Caihua Shan, Yifei Shen, Xiang Li, Siqiang Luo, Dongsheng Li
  • for: rectifying noisy labels and assigning labels to previously unlabeled nodes in the context of arbitrary heterophily.
  • methods: LP4GLN algorithm, which consists of three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, (3) select high-confidence labels to retain for the next iteration.
  • results: superior performance compared to 7 typical baselines in node classification tasks under varying graph heterophily levels and noise types.
    Abstract Label noise is a common challenge in large datasets, as it can significantly degrade the generalization ability of deep neural networks. Most existing studies focus on noisy labels in computer vision; however, graph models encompass both node features and graph topology as input, and become more susceptible to label noise through message-passing mechanisms. Recently, only a few works have been proposed to tackle the label noise on graphs. One major limitation is that they assume the graph is homophilous and the labels are smoothly distributed. Nevertheless, real-world graphs may contain varying degrees of heterophily or even be heterophily-dominated, leading to the inadequacy of current methods. In this paper, we study graph label noise in the context of arbitrary heterophily, with the aim of rectifying noisy labels and assigning labels to previously unlabeled nodes. We begin by conducting two empirical analyses to explore the impact of graph homophily on graph label noise. Following observations, we propose a simple yet efficient algorithm, denoted as LP4GLN. Specifically, LP4GLN is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) utilize label propagation to rectify the noisy labels, (3) select high-confidence labels to retain for the next iteration. By iterating these steps, we obtain a set of correct labels, ultimately achieving high accuracy in the node classification task. The theoretical analysis is also provided to demonstrate its remarkable denoising "effect". Finally, we conduct experiments on 10 benchmark datasets under varying graph heterophily levels and noise types, comparing the performance of LP4GLN with 7 typical baselines. Our results illustrate the superior performance of the proposed LP4GLN.
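    Summary Label noise is a common challenge in large datasets and can significantly degrade the generalization ability of deep neural networks. Most existing studies focus on noisy labels in computer vision; graph models, however, take both node features and graph topology as input, and message passing makes them more susceptible to label noise. A few recent works tackle label noise on graphs, but they assume that the graph is homophilous and the labels are smoothly distributed. Real-world graphs may contain varying degrees of heterophily or even be heterophily-dominated, which makes current methods inadequate. In this paper, we study graph label noise under arbitrary heterophily, aiming to rectify noisy labels and assign labels to previously unlabeled nodes. We begin with two empirical analyses of the impact of graph homophily on graph label noise. Based on the observations, we propose a simple yet efficient algorithm, LP4GLN. LP4GLN is an iterative algorithm with three steps: (1) reconstruct the graph to recover the homophily property, (2) use label propagation to rectify the noisy labels, and (3) select high-confidence labels to retain for the next iteration. Iterating these steps yields a set of correct labels and ultimately high accuracy in node classification. We also provide a theoretical analysis of its denoising effect. Finally, we conduct experiments on 10 benchmark datasets under varying graph heterophily levels and noise types, comparing LP4GLN with 7 typical baselines. The results show that LP4GLN outperforms these baselines.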
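Steps (2) and (3) of the loop described above can be sketched on a toy graph. This is a minimal illustration, not LP4GLN: it uses classic label propagation with a retention weight `alpha` (an assumed parameter), skips the graph reconstruction step by assuming the graph is already homophilous, and the function names and the confidence threshold are hypothetical.

```python
def propagate_labels(adjacency, noisy_labels, num_classes, alpha=0.8, iterations=50):
    """adjacency: dict node -> list of neighbours; noisy_labels: dict node -> class.
    Returns dict node -> list of class scores."""
    def initial(v):
        if v in noisy_labels:
            return [1.0 if k == noisy_labels[v] else 0.0 for k in range(num_classes)]
        return [1.0 / num_classes] * num_classes  # unlabeled: uniform prior

    init = {v: initial(v) for v in adjacency}
    scores = dict(init)
    for _ in range(iterations):
        updated = {}
        for v, neighbours in adjacency.items():
            if not neighbours:
                updated[v] = init[v]  # isolated node: nothing to propagate
                continue
            # Mix the neighbours' average scores with the node's own initial label.
            updated[v] = [
                alpha * sum(scores[u][k] for u in neighbours) / len(neighbours)
                + (1 - alpha) * init[v][k]
                for k in range(num_classes)
            ]
        scores = updated
    return scores


def confident_labels(scores, threshold=0.7):
    """Keep only nodes whose top class score reaches the threshold."""
    return {v: max(range(len(s)), key=s.__getitem__)
            for v, s in scores.items() if max(s) >= threshold}


# Four mutually connected nodes; node "d" carries a label that disagrees with its neighbours.
graph = {v: [u for u in "abcd" if u != v] for v in "abcd"}
scores = propagate_labels(graph, {"a": 0, "b": 0, "c": 0, "d": 1}, num_classes=2)
print(scores["d"])
print(confident_labels(scores))
```

After propagation the score of node `d` leans toward the class of its neighbours, and only the nodes with a clear majority score pass the confidence filter that would seed the next iteration.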

Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

  • paper_url: http://arxiv.org/abs/2310.16546
  • repo_url: None
  • paper_authors: Taehyun Cho, Seungyub Han, Heesoo Lee, Kyungjae Lee, Jungwoo Lee
  • for: This paper proposes a novel distributional reinforcement learning algorithm to avoid biased exploration and improve performance.
  • methods: The proposed algorithm randomizes the risk criterion to avoid one-sided tendency on risk, and uses a perturbed distributional Bellman optimality operator to prove convergence and optimality.
  • results: The proposed method outperforms other existing distribution-based algorithms in various environments, including Atari 55 games.
    Abstract Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing risk criterion to avoid one-sided tendency on risk. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure and prove the convergence and optimality of the proposed method with the weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments including Atari 55 games.
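    Summary Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion, to avoid a one-sided tendency on risk. We provide a perturbed distributional Bellman optimality operator and prove the convergence and optimality of the proposed method under a weaker contraction property. Our theoretical results indicate that the method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we show empirically that our method outperforms other existing distribution-based algorithms in various environments, including 55 Atari games.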
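Selecting actions under a randomized risk criterion can be illustrated with a small sketch. The abstract does not specify the risk measure or how it is perturbed, so everything concrete here is an assumption: returns are represented by a list of quantile estimates per action, the risk measure is CVaR, and the risk level is drawn uniformly at each decision.

```python
import math
import random


def cvar(quantiles, alpha):
    """Mean of the lowest ceil(alpha * N) quantile estimates, alpha in (0, 1].
    alpha = 1 is the plain mean (risk-neutral); small alpha is risk-averse."""
    ordered = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def select_action(quantiles_per_action, rng):
    """Draw a risk level at random, then act greedily under that criterion."""
    alpha = 1.0 - rng.random()  # rng.random() is in [0, 1), so alpha is in (0, 1]
    values = [cvar(q, alpha) for q in quantiles_per_action]
    action = max(range(len(values)), key=values.__getitem__)
    return action, alpha


safe = [2.0, 2.0, 2.0, 2.0]    # low-variance return distribution
risky = [0.0, 0.0, 5.0, 5.0]   # higher mean, higher variance
rng = random.Random(0)
picks = [select_action([safe, risky], rng)[0] for _ in range(1000)]
print(picks.count(0), picks.count(1))
```

Because the risk level changes from decision to decision, neither the cautious nor the optimistic action is favoured every time, which is the intuition behind avoiding a one-sided tendency on risk.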

A Multilingual Virtual Guide for Self-Attachment Technique

  • paper_url: http://arxiv.org/abs/2310.18366
  • repo_url: None
  • paper_authors: Alicia Jiayun Law, Ruoyu Hu, Lisa Alazraki, Anandha Gopalan, Neophytos Polydorou, Abbas Edalat
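  • for: This study develops a computational framework that leverages existing out-of-language data to deliver the Self-Attachment Technique (SAT) in Mandarin.
  • methods: The framework does not require large-scale human translation, yet achieves comparable performance while maintaining safety and reliability. The authors propose two methods of augmenting the available response data through empathetic rewriting.
  • results: The chatbot was compared with a previous English-only SAT chatbot in non-clinical human trials (N=42), each lasting five days, and quantitatively showed a comparable level of performance. The authors also provide a qualitative analysis of limitations and suggestions to guide future improvements.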
    Abstract In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent for the delivery of Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translations, yet it achieves a comparable performance whilst also maintaining safety and reliability. We propose two different methods of augmenting available response data through empathetic rewriting. We evaluate our chatbot against a previous, English-only SAT chatbot through non-clinical human trials (N=42), each lasting five days, and quantitatively show that we are able to attain a comparable level of performance to the English SAT chatbot. We provide qualitative analysis on the limitations of our study and suggestions with the aim of guiding future improvements.
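    Summary In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent that delivers the Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translation, yet achieves comparable performance while maintaining safety and reliability. We propose two methods of augmenting the available response data through empathetic rewriting. We evaluate our chatbot against a previous English-only SAT chatbot in non-clinical human trials (N=42), each lasting five days, and show quantitatively that it attains a comparable level of performance. We provide a qualitative analysis of the limitations of our study and suggestions to guide future improvements.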

FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning

  • paper_url: http://arxiv.org/abs/2310.16538
  • repo_url: None
  • paper_authors: Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho D. Choi, Sung-Ju Lee
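  • for: This paper proposes a federated-learning-based mental health monitoring system for mobile devices that preserves user privacy.
  • methods: The system uses continuous speech and keyboard input and is trained via federated learning, comparing multiple model designs to address on-device language model training on smartphones. It also proposes a Context-Aware Language Learning (CALL) methodology to make better use of the large and noisy text on smartphones for sensing mental health signals.
  • results: In the evaluation, FedTherapist is more accurate than prediction with non-language features, with a 0.15 AUROC improvement and an 8.21% MAE reduction.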
    Abstract Psychiatrists diagnose mental disorders via the linguistic use of patients. Still, due to data privacy, existing passive mental health monitoring systems use alternative features such as activity, app usage, and location via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. We explore multiple model designs by comparing their performance and overhead for FedTherapist to overcome the complex nature of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones' large and noisy text for mental health signal sensing. Our IRB-approved evaluation of the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows higher accuracy of FedTherapist compared with the performance with non-language features, achieving 0.15 AUROC improvement and 8.21% MAE reduction.
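    Summary Psychiatrists diagnose mental disorders through patients' use of language. However, because of data privacy concerns, existing passive mental health monitoring systems rely on alternative features such as activity, app usage, and location collected from mobile devices. We propose FedTherapist, a mobile mental health monitoring system that uses continuous speech and keyboard input in a privacy-preserving way via federated learning. We compare multiple model designs in terms of performance and overhead, to cope with the complexity of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to make effective use of the large and noisy text on smartphones for sensing mental health signals. Our IRB-approved evaluation, which predicts self-reported depression, stress, anxiety, and mood from 46 participants, shows that FedTherapist is more accurate than prediction with non-language features, achieving a 0.15 AUROC improvement and an 8.21% MAE reduction.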
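The privacy argument rests on federated learning: raw text stays on each phone and only model parameters are shared. The abstract does not say which aggregation rule FedTherapist uses, so the sketch below assumes plain federated averaging (FedAvg), where the server averages client parameters weighted by local sample counts. The function name and the flat-list parameter format are illustrative.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples on each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two phones: the second has three times as much local data as the first.
global_weights = federated_average([[1.0, 0.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

In a full round, the server would send `global_weights` back to the devices, each device would train locally on its own speech and keyboard text, and the cycle would repeat.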

R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context

  • paper_url: http://arxiv.org/abs/2310.16535
  • repo_url: None
  • paper_authors: Qingyuan Tian, Hanlun Zhu, Lei Wang, Yang Li, Yunshi Lan
  • for: The paper aims to improve the performance of large language models (LLMs) on various reasoning tasks under noisy contexts.
  • methods: The proposed method, called R$^3$ prompting, interacts with LLMs to perform key sentence extraction, variable declaration, and answer prediction, which mimics the thought process of reviewing, rephrasing, and resolving.
  • results: The proposed method significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy contexts, with an average accuracy improvement of 3.7% using GPT-3.5-turbo.
    Abstract With the help of Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) have achieved remarkable performance on various reasoning tasks. However, most of them have been evaluated under noise-free context and the dilemma for LLMs to produce inaccurate results under the noisy context has not been fully investigated. Existing studies utilize trigger sentences to encourage LLMs to concentrate on the relevant information but the trigger has limited effect on final answer prediction. Inspired by interactive CoT method, where intermediate reasoning steps are promoted by multiple rounds of interaction between users and LLMs, we propose a novel prompting method, namely R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$ prompting interacts with LLMs to perform key sentence extraction, variable declaration and answer prediction, which corresponds to a thought process of reviewing, rephrasing and resolving. The responses generated at the last interaction will perform as hints to guide toward the responses of the next interaction. Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context. With GPT-3.5-turbo, we observe 3.7% accuracy improvement on average on the reasoning tasks under noisy context compared to the most competitive prompting baseline. More analyses and ablation studies show the robustness and generalization of R$^3$ prompting method in solving reasoning tasks in LLMs under noisy context.
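    Summary With the help of Chain-of-Thought (CoT) prompting, large language models (LLMs) have achieved remarkable performance on various reasoning tasks. However, most studies evaluate them in noise-free contexts, and how LLMs behave under noisy context has not been fully investigated. Existing studies use trigger sentences to encourage LLMs to concentrate on the relevant information, but the trigger has limited effect on the final answer prediction. Inspired by interactive CoT methods, we propose a novel prompting method, R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$ prompting interacts with the LLM to perform key sentence extraction, variable declaration, and answer prediction, which corresponds to a thought process of reviewing, rephrasing, and resolving. The response from the previous interaction serves as a hint for the next one. Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context; with GPT-3.5-turbo, accuracy improves by 3.7% on average over the most competitive prompting baseline. Further analyses and ablation studies show the robustness and generalization of the R$^3$ prompting method.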
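The three-round interaction can be sketched as a small pipeline in which each response becomes the hint for the next prompt. This is only an outline of the control flow: the prompt wording is invented for illustration, and `llm` is a hypothetical callable that takes a prompt string and returns the model's reply, standing in for whatever client is actually used.

```python
def r3_prompt(llm, problem):
    """Run review -> rephrase -> resolve, feeding each reply into the next round."""
    # Round 1 (review): extract the sentences relevant to the question.
    key_sentences = llm(
        "Review: list the sentences needed to answer the question.\n" + problem
    )
    # Round 2 (rephrase): turn the key sentences into variable declarations.
    variables = llm(
        "Rephrase: declare the variables from these sentences.\n" + key_sentences
    )
    # Round 3 (resolve): predict the answer from the declared variables.
    answer = llm(
        "Resolve: answer the question using these variables.\n"
        f"Question: {problem}\nVariables: {variables}"
    )
    return {"review": key_sentences, "rephrase": variables, "resolve": answer}


def echo_llm(prompt):
    """Stand-in model for a dry run: reports which stage it was asked to perform."""
    return "reply to " + prompt.split(":", 1)[0]


result = r3_prompt(echo_llm, "Tom has 3 apples. The sky is blue. How many apples?")
print(result["resolve"])
```

Keeping the stages as separate calls is what lets the noisy sentences be filtered out before any arithmetic or final answer is attempted.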

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

  • paper_url: http://arxiv.org/abs/2310.16523
  • repo_url: None
  • paper_authors: Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen
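  • for: This study aims to improve the diversity of generative large language models (LLMs), so that under-specified user prompts do not lead to homogenized responses or to demographic groups being under-represented.
  • methods: The study presents evaluation datasets and proposes metrics to measure diversity in generated responses along people and culture axes. It also proposes a new prompting technique, collective-critique and self-voting (CCSV), which improves diversity by drawing on the model's own diversity reasoning and self-critique, without handcrafted examples or prompt tuning.
  • results: Experiments show that the proposed method effectively improves people and culture diversity and outperforms all baseline methods by a large margin. The authors also find that LLMs understand the notion of diversity and can critique their own responses for it.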
    Abstract A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV) to self-improve people diversity of LLMs by tapping into its diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.
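    Summary A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, the model may follow implicit assumptions while generating a response, which can homogenize responses and leave certain demographic groups under-represented or even erased. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity and can reason about and critique their own responses with that goal in mind. This finding motivated a new prompting technique, collective-critique and self-voting (CCSV), which self-improves the people diversity of LLMs by tapping into their diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive experiments with both human and automated evaluations show that the proposed approach effectively improves people and culture diversity and outperforms all baseline methods by a large margin.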

Identifying Reasons for Bias: An Argumentation-Based Approach

  • paper_url: http://arxiv.org/abs/2310.16506
  • repo_url: None
  • paper_authors: Madeleine Waller, Odinaldo Rodrigues, Oana Cocarascu
  • for: Ensuring the fairness of algorithmic decision-making systems
  • methods: Model-agnostic argumentation-based method using a quantitative argumentation framework and well-known semantics to identify bias
  • results: Effective in identifying bias in two datasets commonly used in the fairness literature
    Abstract As algorithmic decision-making systems become more prevalent in society, ensuring the fairness of these systems is becoming increasingly important. Whilst there has been substantial research in building fair algorithmic decision-making systems, the majority of these methods require access to the training data, including personal characteristics, and are not transparent regarding which individuals are classified unfairly. In this paper, we propose a novel model-agnostic argumentation-based method to determine why an individual is classified differently in comparison to similar individuals. Our method uses a quantitative argumentation framework to represent attribute-value pairs of an individual and of those similar to them, and uses a well-known semantics to identify the attribute-value pairs in the individual contributing most to their different classification. We evaluate our method on two datasets commonly used in the fairness literature and illustrate its effectiveness in the identification of bias.

On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection

  • paper_url: http://arxiv.org/abs/2310.16492
  • repo_url: None
  • paper_authors: Sangha Park, Jisoo Mok, Dahuin Jung, Saehyung Lee, Sungroh Yoon
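  • for: improving Out-of-Distribution (OoD) detection to ensure the safe deployment of neural networks.
  • methods: textual outlier exposure, in which an additional training loss encourages low-confidence predictions on outliers and the outliers are textual rather than visual.
  • results: generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks; the paper also gives criteria for designing good textual outliers (near-distribution, descriptiveness, and inclusion of visual semantics).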
    Abstract Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, making it difficult to determine OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional loss that encourages low-confidence predictions on OoD data during training. While outlier exposure has shown promising potential in improving OoD detection performance, all previous studies on outlier exposure have been limited to utilizing visual outliers. Drawing inspiration from the recent advancements in vision-language pre-training, this paper ventures out to the uncharted territory of textual outlier exposure. First, we uncover the benefits of using textual outliers by replacing real or virtual outliers in the image-domain with textual equivalents. Then, we propose various ways of generating preferable textual outliers. Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers: near-distribution, descriptiveness, and inclusion of visual semantics.
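    Summary Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important for the safe deployment of neural networks. The main challenge is that neural networks output overconfident predictions on OoD data, which makes it hard to judge from the predictions alone whether data is OoD. Outlier exposure addresses this by adding a loss during training that encourages low-confidence predictions on OoD data. However, all previous studies of outlier exposure have used visual outliers. Inspired by recent advances in vision-language pre-training, this paper explores textual outlier exposure. We first examine the benefits of using textual outliers and then propose several ways of generating them. Extensive experiments show that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. We also analyze textual outliers empirically to provide primary criteria for designing advantageous ones: near-distribution, descriptiveness, and inclusion of visual semantics.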
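The additional loss that encourages low-confidence predictions on outliers can be written down compactly. The sketch below shows the term commonly used for outlier exposure in classification, the cross-entropy between the uniform distribution and the model's softmax output on an outlier sample. It is an assumption that this is the form used in the paper, and how textual outliers are turned into logits is not covered here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]


def outlier_exposure_term(logits):
    """Cross-entropy between the uniform distribution and softmax(logits).
    It is smallest when the prediction on the outlier is uniform."""
    return -sum(log_softmax(logits)) / len(logits)


print(outlier_exposure_term([0.0, 0.0, 0.0, 0.0]))   # uniform prediction
print(outlier_exposure_term([10.0, 0.0, 0.0, 0.0]))  # overconfident prediction
```

During training this term would be added, with some weight, to the usual classification loss on in-distribution data, so that confident predictions on outliers are penalized.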

Transfer of Reinforcement Learning-Based Controllers from Model- to Hardware-in-the-Loop

  • paper_url: http://arxiv.org/abs/2310.17671
  • repo_url: None
  • paper_authors: Mario Picerno, Lucas Koch, Kevin Badalian, Marius Wegener, Joschka Schaub, Charles Robert Koch, Jakob Andert
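  • for: This study aims to accelerate the training of reinforcement learning (RL) agents by combining Transfer Learning (TL) and X-in-the-Loop (XiL) simulation, so that RL can be used in real applications.
  • methods: A computationally cheap Model-in-the-Loop (MiL) simulation is used to select a suitable algorithm, fine-tune hyperparameters, and train candidate agents, which are then transferred to a Hardware-in-the-Loop (HiL) system for fine-tuning.
  • results: The transfer revealed the need to adjust the reward parameters when moving to real hardware. Compared with an agent trained purely on the HiL system, the transferred agent reduced training time by a factor of 5.9. The results indicate that RL agents need to be trained with real hardware and that the maturity of the transferred policies affects both training time and performance.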
    Abstract The process of developing control functions for embedded systems is resource-, time-, and data-intensive, often resulting in sub-optimal cost and solutions approaches. Reinforcement Learning (RL) has great potential for autonomously training agents to perform complex control tasks with minimal human intervention. Due to costly data generation and safety constraints, however, its application is mostly limited to purely simulated domains. To use RL effectively in embedded system function development, the generated agents must be able to handle real-world applications. In this context, this work focuses on accelerating the training process of RL agents by combining Transfer Learning (TL) and X-in-the-Loop (XiL) simulation. For the use case of transient exhaust gas re-circulation control for an internal combustion engine, use of a computationally cheap Model-in-the-Loop (MiL) simulation is made to select a suitable algorithm, fine-tune hyperparameters, and finally train candidate agents for the transfer. These pre-trained RL agents are then fine-tuned in a Hardware-in-the-Loop (HiL) system via TL. The transfer revealed the need for adjusting the reward parameters when advancing to real hardware. Further, the comparison between a purely HiL-trained and a transferred agent showed a reduction of training time by a factor of 5.9. The results emphasize the necessity to train RL agents with real hardware, and demonstrate that the maturity of the transferred policies affects both training time and performance, highlighting the strong synergies between TL and XiL simulation.
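    Summary Developing control functions for embedded systems is resource-, time-, and data-intensive, and often leads to sub-optimal costs and solutions. Reinforcement Learning (RL) has great potential for autonomously training agents to perform complex control tasks with minimal human intervention. Because of costly data generation and safety constraints, however, its application is mostly limited to purely simulated domains. To use RL effectively in embedded system function development, the generated agents must be able to handle real-world applications. This work therefore focuses on accelerating the training of RL agents by combining Transfer Learning (TL) and X-in-the-Loop (XiL) simulation. For the use case of transient exhaust gas recirculation control for an internal combustion engine, a computationally cheap Model-in-the-Loop (MiL) simulation is used to select a suitable algorithm, fine-tune hyperparameters, and train candidate agents. These pre-trained agents are then fine-tuned via TL in a Hardware-in-the-Loop (HiL) system, where the reward parameters had to be adjusted. The results show that training RL agents with real hardware is necessary and that the maturity of the transferred policies affects both training time and performance.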

Semiring Provenance for Lightweight Description Logics

  • paper_url: http://arxiv.org/abs/2310.16472
  • repo_url: None
  • paper_authors: Camille Bourgaux, Ana Ozaki, Rafael Peñaloza
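  • for: This study investigates semiring provenance, a framework originally defined for relational databases, in the setting of description logics.
  • methods: Ontology axioms are annotated with elements of a commutative semiring, and these annotations are propagated to the ontology's consequences in a way that reflects how they are derived.
  • results: The authors define a provenance semantics and show that, under some restrictions on the semiring, it satisfies desirable properties (such as extending the semiring provenance defined for databases). They also study the complexity of problems related to why-provenance and examine positive Boolean provenance and lineage.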
    Abstract We investigate semiring provenance--a successful framework originally defined in the relational database setting--for description logics. In this context, the ontology axioms are annotated with elements of a commutative semiring and these annotations are propagated to the ontology consequences in a way that reflects how they are derived. We define a provenance semantics for a language that encompasses several lightweight description logics and show its relationships with semantics that have been defined for ontologies annotated with a specific kind of annotation (such as fuzzy degrees). We show that under some restrictions on the semiring, the semantics satisfies desirable properties (such as extending the semiring provenance defined for databases). We then focus on the well-known why-provenance, which allows to compute the semiring provenance for every additively and multiplicatively idempotent commutative semiring, and for which we study the complexity of problems related to the provenance of an axiom or a conjunctive query answer. Finally, we consider two more restricted cases which correspond to the so-called positive Boolean provenance and lineage in the database setting. For these cases, we exhibit relationships with well-known notions related to explanations in description logics and complete our complexity analysis. As a side contribution, we provide conditions on an ELHI_bot ontology that guarantee tractable reasoning.
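    Summary We investigate semiring provenance, a successful framework originally defined in the relational database setting, for description logics. In this context, ontology axioms are annotated with elements of a commutative semiring, and these annotations are propagated to the ontology consequences in a way that reflects how they are derived. We define a provenance semantics for a language that encompasses several lightweight description logics and relate it to semantics defined for ontologies with specific kinds of annotation (such as fuzzy degrees). We show that, under some restrictions on the semiring, the semantics satisfies desirable properties (such as extending the semiring provenance defined for databases). We then focus on the well-known why-provenance, which allows computing the semiring provenance for every additively and multiplicatively idempotent commutative semiring, and study the complexity of problems related to the provenance of an axiom or a conjunctive query answer. Finally, we consider two more restricted cases, which correspond to positive Boolean provenance and lineage in the database setting. For these cases, we exhibit relationships with well-known notions of explanation in description logics and complete our complexity analysis. As a side contribution, we provide conditions on an ELHI_bot ontology that guarantee tractable reasoning.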
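How annotations propagate to a consequence can be illustrated with a database-style why-provenance structure. This is a sketch of the general idea rather than the paper's semantics: each annotation is a set of sets of axiom labels, alternative derivations are combined with `why_plus`, and axioms used jointly in one derivation are combined with `why_times`. The function names and axiom labels are illustrative.

```python
def token(name):
    """Annotation of a single axiom: one derivation that uses only that axiom."""
    return frozenset({frozenset({name})})


def why_plus(a, b):
    """Alternative derivations: keep the derivations from both sides."""
    return a | b


def why_times(a, b):
    """Joint use: every derivation of `a` is merged with every derivation of `b`."""
    return frozenset(x | y for x in a for y in b)


# Suppose a consequence follows either from ax1 together with ax2, or from ax3 alone.
provenance = why_plus(why_times(token("ax1"), token("ax2")), token("ax3"))
for derivation in sorted(sorted(d) for d in provenance):
    print(derivation)
```

Each inner set names the axioms that one derivation relies on, so the result reads directly as the list of ways the consequence can be justified.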

Towards Explainability in Monocular Depth Estimation

  • paper_url: http://arxiv.org/abs/2310.16457
  • repo_url: None
  • paper_authors: Vasileios Arampatzakis, George Pavlidis, Kyriakos Pantoglou, Nikolaos Mitianoudis, Nikos Papamarkos
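  • for: This study explores explainability in monocular depth estimation methods, in terms of how humans perceive depth.
  • methods: An experiment designed to mimic experiments on humans tests state-of-the-art deep learning methods, to indirectly assess their explainability in the defined context, with a focus on the relative-size cue.
  • results: A mean accuracy of around 77% across methods is achieved, with some methods performing markedly better, which indirectly reveals their potential to pick up the relative-size cue.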
    Abstract The estimation of depth in two-dimensional images has long been a challenging and extensively studied subject in computer vision. Recently, significant progress has been made with the emergence of Deep Learning-based approaches, which have proven highly successful. This paper focuses on the explainability in monocular depth estimation methods, in terms of how humans perceive depth. This preliminary study emphasizes on one of the most significant visual cues, the relative size, which is prominent in almost all viewed images. We designed a specific experiment to mimic the experiments in humans and have tested state-of-the-art methods to indirectly assess the explainability in the context defined. In addition, we observed that measuring the accuracy required further attention and a particular approach is proposed to this end. The results show that a mean accuracy of around 77% across methods is achieved, with some of the methods performing markedly better, thus, indirectly revealing their corresponding potential to uncover monocular depth cues, like relative size.
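    Summary Estimating depth in two-dimensional images has long been a challenging and widely studied problem in computer vision, and deep learning methods have recently brought significant progress. This paper focuses on explainability in monocular depth estimation, in terms of how humans perceive depth. This preliminary study emphasizes one of the most important visual cues, relative size, which is prominent in almost all viewed images. We designed a specific experiment that mimics experiments with humans and tested state-of-the-art methods to indirectly assess their explainability in the defined context. We also found that measuring accuracy required further attention, and we propose a particular approach for it. The results show a mean accuracy of around 77% across methods, with some methods performing markedly better, which indirectly reveals their potential to uncover monocular depth cues such as relative size.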

Graph-based multimodal multi-lesion DLBCL treatment response prediction from PET images

  • paper_url: http://arxiv.org/abs/2310.16863
  • repo_url: None
  • paper_authors: Oriane Thiery, Mira Rizkallah, Clément Bailly, Caroline Bodet-Milin, Emmanuel Itti, René-Olivier Casasnovas, Steven Le Gouill, Thomas Carlier, Diana Mateus
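  • for: This work aims to develop a computer-aided approach to identify high-risk diffuse large B-cell lymphoma (DLBCL) patients who require adapted treatment.
  • methods: The method uses recent graph neural networks to combine imaging information from multiple lesions, and a cross-attention module to integrate the different data modalities efficiently.
  • results: Trained and evaluated on a dataset of 583 patients, the proposed method outperforms classical supervised methods in 2-year progression-free survival (PFS) classification accuracy.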
    Abstract Diffuse Large B-cell Lymphoma (DLBCL) is a lymphatic cancer involving one or more lymph nodes and extranodal sites. Its diagnostic and follow-up rely on Positron Emission Tomography (PET) and Computed Tomography (CT). After diagnosis, the number of nonresponding patients to standard front-line therapy remains significant (30-40%). This work aims to develop a computer-aided approach to identify high-risk patients requiring adapted treatment by efficiently exploiting all the information available for each patient, including both clinical and image data. We propose a method based on recent graph neural networks that combine imaging information from multiple lesions, and a cross-attention module to integrate different data modalities efficiently. The model is trained and evaluated on a private prospective multicentric dataset of 583 patients. Experimental results show that our proposed method outperforms classical supervised methods based on either clinical, imaging or both clinical and imaging data for the 2-year progression-free survival (PFS) classification accuracy.
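    Summary Diffuse large B-cell lymphoma (DLBCL) is a lymphatic cancer involving one or more lymph nodes and extranodal sites. Its diagnosis and follow-up rely on Positron Emission Tomography (PET) and Computed Tomography (CT). After diagnosis, the proportion of patients who do not respond to standard front-line therapy remains high (30-40%). This work aims to develop a computer-aided approach to identify high-risk patients who need adapted treatment by efficiently using all the information available for each patient, including clinical and image data. We propose a method based on recent graph neural networks that combines imaging information from multiple lesions and uses a cross-attention module to integrate the different data modalities efficiently. The model is trained and evaluated on a private prospective multicentric dataset. Experimental results show that the proposed method outperforms classical supervised methods based on clinical data, imaging data, or both in 2-year progression-free survival (PFS) classification accuracy.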

Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph

  • paper_url: http://arxiv.org/abs/2310.16452
  • repo_url: None
  • paper_authors: Giacomo Balloccu, Ludovico Boratto, Christian Cancedda, Gianni Fenu, Mirko Marras
  • for: This paper aims to improve the transparency of recommendation systems by using path reasoning methods over knowledge graphs.
  • methods: The proposed method, PEARLM, efficiently captures user behavior and product-side knowledge through language modeling, and unifies entities and relations in the same optimization space.
  • results: Experimental results on two datasets show the effectiveness of PEARLM compared to state-of-the-art baselines.
    Abstract Path reasoning methods over knowledge graphs have gained popularity for their potential to improve transparency in recommender systems. However, the resulting models still rely on pre-trained knowledge graph embeddings, fail to fully exploit the interdependence between entities and relations in the KG for recommendation, and may generate inaccurate explanations. In this paper, we introduce PEARLM, a novel approach that efficiently captures user behaviour and product-side knowledge through language modelling. With our approach, knowledge graph embeddings are directly learned from paths over the KG by the language model, which also unifies entities and relations in the same optimisation space. Constraints on the sequence decoding additionally guarantee path faithfulness with respect to the KG. Experiments on two datasets show the effectiveness of our approach compared to state-of-the-art baselines. Source code and datasets: AVAILABLE AFTER GETTING ACCEPTED.
    Summary Path reasoning methods over knowledge graphs have become widely used because they can improve the transparency of recommender systems. However, existing models still rely on pre-trained knowledge graph embeddings, do not fully exploit the interdependence between entities and relations in the knowledge graph, and may generate inaccurate explanations. This paper introduces PEARLM, a new approach that efficiently captures user behaviour and product-side knowledge through language modelling. The language model learns knowledge graph embeddings directly from paths over the knowledge graph and places entities and relations in the same optimisation space. Constraints on sequence decoding additionally guarantee that generated paths are faithful to the knowledge graph. Experiments on two datasets compare the approach with state-of-the-art baselines. Source code and datasets: available after acceptance.

Diversity Enhanced Narrative Question Generation for Storybooks

  • paper_url: http://arxiv.org/abs/2310.16446
  • repo_url: https://github.com/hkyoon95/mqg
  • paper_authors: Hokeun Yoon, JinYeong Bak
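  • for: Improving comprehension, engagement, assessment, and overall efficacy in learning or conversational environments.
  • methods: A multi-question generation model (mQG) that generates multiple, diverse, and answerable questions by focusing on context and questions.
  • results: mQG achieves promising evaluation results on the FairytaleQA dataset and in zero-shot adaptation to the TellMeWhy and SQuAD1.1 datasets.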
    Abstract Question generation (QG) from a given context can enhance comprehension, engagement, assessment, and overall efficacy in learning or conversational environments. Despite recent advancements in QG, the challenge of enhancing or measuring the diversity of generated questions often remains unaddressed. In this paper, we introduce a multi-question generation model (mQG), which is capable of generating multiple, diverse, and answerable questions by focusing on context and questions. To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model, classifying the questions as answerable or not. We train and evaluate mQG on the FairytaleQA dataset, a well-structured QA dataset based on storybooks, with narrative questions. We further apply a zero-shot adaptation on the TellMeWhy and SQuAD1.1 datasets. mQG shows promising results across various evaluation metrics, among strong baselines.
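    Summary Question generation (QG) from a given context can improve comprehension, engagement, assessment, and overall efficacy in learning or conversational environments. Despite recent progress in QG, enhancing or measuring the diversity of generated questions often remains unaddressed. This paper introduces a multi-question generation model (mQG) that generates multiple, diverse, and answerable questions by focusing on context and questions. To validate answerability, we use a question answering model fine-tuned on SQuAD2.0 to classify generated questions as answerable or not. We train and evaluate mQG on the FairytaleQA dataset, a QA dataset based on storybooks. We further apply zero-shot adaptation to the TellMeWhy and SQuAD1.1 datasets. mQG shows promising results across various evaluation metrics when compared with strong baselines.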

An Integrative Paradigm for Enhanced Stroke Prediction: Synergizing XGBoost and xDeepFM Algorithms

  • paper_url: http://arxiv.org/abs/2310.16430
  • repo_url: None
  • paper_authors: Weinan Dai, Yifeng Jiang, Chengjie Mou, Chongyu Zhang
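  • for: The paper targets stroke prediction, which supports preventing and managing this debilitating condition.
  • methods: The study uses a comprehensive dataset and proposes an ensemble model that combines the XGBoost and xDeepFM algorithms.
  • results: Rigorous experiments validate the effectiveness of the ensemble model, and comparison with other models yields insights that contribute to machine learning and deep learning techniques for stroke prediction.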
    Abstract Stroke prediction plays a crucial role in preventing and managing this debilitating condition. In this study, we address the challenge of stroke prediction using a comprehensive dataset, and propose an ensemble model that combines the power of XGBoost and xDeepFM algorithms. Our work aims to improve upon existing stroke prediction models by achieving higher accuracy and robustness. Through rigorous experimentation, we validate the effectiveness of our ensemble model using the AUC metric. Through comparing our findings with those of other models in the field, we gain valuable insights into the merits and drawbacks of various approaches. This, in turn, contributes significantly to the progress of machine learning and deep learning techniques specifically in the domain of stroke prediction.
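    Summary Stroke prediction plays a key role in preventing and managing stroke. In this study, we address the challenge of stroke prediction using a comprehensive dataset and propose an ensemble model that combines the XGBoost and xDeepFM algorithms. Our goal is to improve the accuracy and robustness of existing stroke prediction models. Through rigorous experiments, we validate the effectiveness of the ensemble model using the AUC metric. By comparing our findings with those of other models in the field, we gain valuable insight into the application of machine learning and deep learning techniques to stroke prediction.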
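
    As a rough illustration of the ensemble-plus-AUC evaluation described above, the sketch below blends the predicted probabilities of two models and scores each with a pairwise AUC. The abstract does not say how the two models are combined, so the equal-weight average is an assumption, and the probability lists are hypothetical stand-ins for XGBoost and xDeepFM outputs.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blend(p_a, p_b, weight=0.5):
    """Weighted average of two probability lists (assumed ensembling rule)."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_a, p_b)]

# Hypothetical predictions for four patients (1 = stroke, 0 = no stroke).
labels = [1, 0, 1, 0]
p_tree = [0.9, 0.6, 0.4, 0.2]    # stand-in for a tree-based model
p_deep = [0.6, 0.2, 0.7, 0.65]   # stand-in for a deep model
p_ens = blend(p_tree, p_deep)

auc_tree, auc_deep, auc_ens = auc(labels, p_tree), auc(labels, p_deep), auc(labels, p_ens)
```

    On this toy data each model makes a different ranking mistake, so the blended scores rank every positive above every negative; real data would of course not behave this cleanly.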

Graph Agent: Explicit Reasoning Agent for Graphs

  • paper_url: http://arxiv.org/abs/2310.16421
  • repo_url: None
  • paper_authors: Qinyong Wang, Zhenxiang Gao, Rong Xu
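  • for: This paper proposes a knowledge graph reasoning method based on large language models, inductive-deductive reasoning modules, and long-term memory, aimed at complex knowledge graph reasoning tasks.
  • methods: The paper introduces the Graph Agent (GA), an intelligent agent methodology that uses large language models (LLMs), inductive-deductive reasoning modules, and long-term memory for knowledge graph reasoning. By converting graph structures into textual data, GA lets LLMs process, reason over, and predict on graphs while providing human-interpretable explanations.
  • results: GA reaches state-of-the-art performance on node classification and link prediction, with accuracy of 90.65% on Cora, 95.48% on PubMed, and 89.32% on PrimeKG. Compared with existing GNN and transformer models, GA offers explicit reasoning ability, requires no training, and adapts easily to different graph reasoning tasks.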
    Abstract Graph embedding methods such as Graph Neural Networks (GNNs) and Graph Transformers have contributed to the development of graph reasoning algorithms for various tasks on knowledge graphs. However, the lack of interpretability and explainability of graph embedding methods has limited their applicability in scenarios requiring explicit reasoning. In this paper, we introduce the Graph Agent (GA), an intelligent agent methodology of leveraging large language models (LLMs), inductive-deductive reasoning modules, and long-term memory for knowledge graph reasoning tasks. GA integrates aspects of symbolic reasoning and existing graph embedding methods to provide an innovative approach for complex graph reasoning tasks. By converting graph structures into textual data, GA enables LLMs to process, reason, and provide predictions alongside human-interpretable explanations. The effectiveness of the GA was evaluated on node classification and link prediction tasks. Results showed that GA reached state-of-the-art performance, demonstrating accuracy of 90.65%, 95.48%, and 89.32% on Cora, PubMed, and PrimeKG datasets, respectively. Compared to existing GNN and transformer models, GA offered advantages of explicit reasoning ability, free-of-training, easy adaption to various graph reasoning tasks
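    Summary Graph embedding methods such as Graph Neural Networks (GNNs) and Graph Transformers have contributed to knowledge graph reasoning tasks. However, their lack of interpretability and explainability limits their use in scenarios that require explicit reasoning. This paper introduces the Graph Agent (GA), a knowledge graph reasoning method based on large language models (LLMs), reasoning modules, and long-term memory. GA combines symbolic reasoning with existing graph embedding methods to provide a new approach to graph reasoning tasks. By converting graph structures into textual data, GA lets LLMs process, reason, and provide predictions together with human-interpretable explanations. GA was evaluated on node classification and link prediction tasks and reached state-of-the-art performance, with accuracy of 90.65%, 95.48%, and 89.32% on the Cora, PubMed, and PrimeKG datasets, respectively. Compared with existing GNN and transformer models, GA offers explicit reasoning ability, requires no training, and adapts to different graph reasoning tasks.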
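
    The step of converting graph structure into text can be pictured with a small sketch like the one below. It is only an illustration under stated assumptions: the function name `describe_node`, the prompt wording, and the toy citation edges are hypothetical, and the abstract does not specify the actual text format GA uses.

```python
def describe_node(node, edges, labels):
    """Render a node's 1-hop neighbourhood as plain text for an LLM prompt."""
    lines = [f"Target node: {node}"]
    for src, rel, dst in edges:
        if node in (src, dst):
            # The neighbour is whichever endpoint is not the target node.
            other = dst if src == node else src
            known = labels.get(other, "unknown")
            lines.append(f"- {src} {rel} {dst} (label of {other}: {known})")
    lines.append("Question: which label best fits the target node? "
                 "Explain your reasoning.")
    return "\n".join(lines)

# Hypothetical citation graph in the style of Cora.
edges = [
    ("paper_1", "cites", "paper_2"),
    ("paper_3", "cites", "paper_1"),
    ("paper_2", "cites", "paper_4"),
]
labels = {"paper_2": "Neural_Networks", "paper_3": "Neural_Networks"}
prompt = describe_node("paper_1", edges, labels)
```

    The resulting string lists only the edges that touch the target node, so an LLM receiving it could be asked for both a predicted label and an explanation.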

Balancing Augmentation with Edge-Utility Filter for Signed GNNs

  • paper_url: http://arxiv.org/abs/2310.16862
  • repo_url: None
  • paper_authors: Ke-Jia Chen, Yaming Ji, Youran Qu, Chuhan Xu
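  • for: The paper aims to improve the robustness and performance of signed graph neural networks (SGNNs) by improving the structural and semantic balance of the graph.
  • methods: The paper proposes an augmentation strategy that measures the utility of each negative edge and selectively augments the graph to improve its structural and semantic balance.
  • results: Experiments on link prediction over five real-world datasets show that the method significantly improves the performance and generalization of SGNNs.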
    Abstract Signed graph neural networks (SGNNs) has recently drawn more attention as many real-world networks are signed networks containing two types of edges: positive and negative. The existence of negative edges affects the SGNN robustness on two aspects. One is the semantic imbalance as the negative edges are usually hard to obtain though they can provide potentially useful information. The other is the structural unbalance, e.g. unbalanced triangles, an indication of incompatible relationship among nodes. In this paper, we propose a balancing augmentation method to address the above two aspects for SGNNs. Firstly, the utility of each negative edge is measured by calculating its occurrence in unbalanced structures. Secondly, the original signed graph is selectively augmented with the use of (1) an edge perturbation regulator to balance the number of positive and negative edges and to determine the ratio of perturbed edges to original edges and (2) an edge utility filter to remove the negative edges with low utility to make the graph structure more balanced. Finally, a SGNN is trained on the augmented graph which effectively explores the credible relationships. A detailed theoretical analysis is also conducted to prove the effectiveness of each module. Experiments on five real-world datasets in link prediction demonstrate that our method has the advantages of effectiveness and generalization and can significantly improve the performance of SGNN backbones.
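    Summary Signed graph neural networks (SGNNs) have drawn more attention in recent years because many real-world networks contain two types of edges: positive and negative. Negative edges affect SGNN robustness in two ways. One is semantic imbalance: negative edges are usually hard to obtain, although they may provide useful information. The other is structural imbalance, such as unbalanced triangles, which indicates incompatible relationships among nodes. This paper proposes a balancing augmentation method that addresses both aspects. First, we measure the utility of each negative edge by counting its occurrences in unbalanced structures. Second, we use (1) an edge perturbation regulator to balance the number of positive and negative edges and to determine the ratio of perturbed edges, and (2) an edge utility filter to remove low-utility negative edges from the original signed graph so that the graph structure becomes more balanced. Finally, we train an SGNN on the augmented graph to better explore credible relationships. We also provide a detailed theoretical analysis to prove the effectiveness of each module. Experiments on five real-world link prediction tasks show that the method is effective, generalizes well, and significantly improves SGNN performance.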
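
    The first step, counting how often a negative edge occurs in unbalanced structures, can be sketched for the triangle case as follows. This is a simplified illustration, not the paper's implementation: it treats a triangle as unbalanced when the product of its edge signs is negative (the standard balance-theory definition), considers triangles only, and uses a hypothetical toy graph.

```python
from itertools import combinations

def negative_edge_counts(signed_edges):
    """For each negative edge, count the unbalanced triangles containing it.

    signed_edges maps an undirected edge (u, v) to +1 or -1.
    """
    sign = {frozenset(e): s for e, s in signed_edges.items()}
    nodes = sorted({n for e in sign for n in e})
    counts = {e: 0 for e, s in sign.items() if s < 0}
    for a, b, c in combinations(nodes, 3):
        tri = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in sign for e in tri):
            # A triangle is unbalanced when the product of signs is negative.
            if sign[tri[0]] * sign[tri[1]] * sign[tri[2]] < 0:
                for e in tri:
                    if e in counts:
                        counts[e] += 1
    return counts

# Hypothetical signed graph with two triangles sharing the edge a-c.
toy = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): -1,
       ("c", "d"): -1, ("a", "d"): -1}
counts = negative_edge_counts(toy)
```

    How these counts are turned into a utility score and a removal threshold is left open here, since the abstract does not give that detail.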

Open Knowledge Base Canonicalization with Multi-task Unlearning

  • paper_url: http://arxiv.org/abs/2310.16419
  • repo_url: None
  • paper_authors: Bingchen Liu, Shihao Hou, Weixin Zeng, Xiang Zhao, Shijun Liu, Li Pan
  • for: Canonicalization of large open knowledge bases (OKBs), which are integral to many mobile computing applications.
  • methods: Machine unlearning combined with clustering and knowledge graph embedding (KGE) learning.
  • results: MulCanon achieves advanced machine unlearning effects. In more detail, the paper proposes a multi-task unlearning framework called MulCanon, which uses the noise characteristics of the diffusion model to achieve machine unlearning for data in OKB canonicalization. The framework unifies the learning objectives of the diffusion model, KGE, and clustering algorithms, and adopts a two-step multi-task learning paradigm for training. An experimental study on popular OKB canonicalization datasets shows that MulCanon achieves advanced machine unlearning effects.
    Abstract The construction of large open knowledge bases (OKBs) is integral to many applications in the field of mobile computing. Noun phrases and relational phrases in OKBs often suffer from redundancy and ambiguity, which calls for the investigation on OKB canonicalization. However, in order to meet the requirements of some privacy protection regulations and to ensure the timeliness of the data, the canonicalized OKB often needs to remove some sensitive information or outdated data. The machine unlearning in OKB canonicalization is an excellent solution to the above problem. Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process. Effective schemes are urgently needed to fully synergise machine unlearning with clustering and KGE learning. To this end, we put forward a multi-task unlearning framework, namely MulCanon, to tackle machine unlearning problem in OKB canonicalization. Specifically, the noise characteristics in the diffusion model are utilized to achieve the effect of machine unlearning for data in OKB. MulCanon unifies the learning objectives of diffusion model, KGE and clustering algorithms, and adopts a two-step multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization datasets validates that MulCanon achieves advanced machine unlearning effects.
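    Summary Building large open knowledge bases (OKBs) is essential to many mobile computing applications. Noun phrases and relational phrases in OKBs often suffer from redundancy and ambiguity, which calls for research on OKB canonicalization. However, to comply with privacy protection regulations and keep the data timely, a canonicalized OKB often needs to remove sensitive information or outdated data. Machine unlearning in OKB canonicalization is an excellent solution to this problem. Existing solutions address canonicalization by designing advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the process. Effective schemes are needed to fully combine machine unlearning with clustering and KGE learning. To this end, we propose a multi-task unlearning framework, MulCanon, to tackle the machine unlearning problem in OKB canonicalization. Specifically, MulCanon uses the noise characteristics of the diffusion model to achieve machine unlearning for data in the OKB. MulCanon unifies the learning objectives of the diffusion model, KGE, and clustering algorithms, and adopts a two-step multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization datasets validates that MulCanon achieves advanced machine unlearning effects.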

Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

  • paper_url: http://arxiv.org/abs/2310.16410
  • repo_url: None
  • paper_authors: Lisa Schut, Nenad Tomasev, Tom McGrath, Demis Hassabis, Ulrich Paquet, Been Kim
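  • for: This paper explores how the hidden knowledge encoded in a highly performant AI system (AlphaZero) can be used to improve human expert performance.
  • methods: The authors propose a new method for extracting new chess concepts from AlphaZero and evaluate whether humans can learn them.
  • results: The study indicates that AlphaZero may encode knowledge that extends beyond existing human knowledge and that can be successfully taught to human experts. In a human study, four top chess grandmasters improved at solving the presented concept prototype positions.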
    Abstract Artificial Intelligence (AI) systems have made remarkable progress, attaining super-human performance across various domains. This presents us with an opportunity to further human knowledge and improve human expert performance by leveraging the hidden knowledge encoded within these highly performant AI systems. Yet, this knowledge is often hard to extract, and may be hard to understand or learn from. Here, we show that this is possible by proposing a new method that allows us to extract new chess concepts in AlphaZero, an AI system that mastered the game of chess via self-play without human supervision. Our analysis indicates that AlphaZero may encode knowledge that extends beyond the existing human knowledge, but knowledge that is ultimately not beyond human grasp, and can be successfully learned from. In a human study, we show that these concepts are learnable by top human experts, as four top chess grandmasters show improvements in solving the presented concept prototype positions. This marks an important first milestone in advancing the frontier of human knowledge by leveraging AI; a development that could bear profound implications and help us shape how we interact with AI systems across many AI applications.
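    Summary Artificial intelligence (AI) systems have made remarkable progress and reached super-human performance in various domains. This gives us an opportunity to further human knowledge and improve expert performance by using the knowledge hidden in these systems. However, this knowledge can be hard to extract and hard to understand or learn from. Here, we propose a new method for extracting new chess concepts from the AlphaZero AI system. Our analysis indicates that AlphaZero may encode knowledge that extends beyond existing human knowledge, but that this knowledge is not beyond human understanding and can be learned. In a human study, we find that these concepts can be learned by four top chess grandmasters, whose ability to solve the presented concept prototype positions improved. This marks a first milestone in using AI to advance human knowledge, which could have profound implications for many AI applications and help shape how we interact with AI systems.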

Challenges of Radio Frequency Fingerprinting: From Data Collection to Deployment

  • paper_url: http://arxiv.org/abs/2310.16406
  • repo_url: None
  • paper_authors: Saeif Alhazbi, Ahmed Hussain, Savio Sciancalepore, Gabriele Oligeri, Panos Papadimitratos
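  • for: This article examines how Radio Frequency Fingerprinting (RFF) uses machine learning (ML) and deep learning (DL) to authenticate wireless devices.
  • methods: The article covers RFF and ML/DL techniques and analyzes their shortcomings and challenges.
  • results: The study finds that existing RFF systems are not yet ready for real-world deployment and face many open challenges. Future research should address these problems to make practical RFF systems possible.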
    Abstract Radio Frequency Fingerprinting (RFF) techniques promise to authenticate wireless devices at the physical layer based on inherent hardware imperfections introduced during manufacturing. Such RF transmitter imperfections are reflected into over-the-air signals, allowing receivers to accurately identify the RF transmitting source. Recent advances in Machine Learning, particularly in Deep Learning (DL), have improved the ability of RFF systems to extract and learn complex features that make up the device-specific fingerprint. However, integrating DL techniques with RFF and operating the system in real-world scenarios presents numerous challenges. This article identifies and analyzes these challenges while considering the three reference phases of any DL-based RFF system: (i) data collection and preprocessing, (ii) training, and finally, (iii) deployment. Our investigation points out the current open problems that prevent real deployment of RFF while discussing promising future directions, thus paving the way for further research in the area.
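    Summary Radio Frequency Fingerprinting (RFF) techniques promise to authenticate wireless devices at the physical layer, based on inherent hardware imperfections introduced during manufacturing. These RF transmitter imperfections are reflected in over-the-air signals, allowing receivers to accurately identify the RF transmitting source. Recent advances in machine learning, particularly deep learning (DL), have improved the ability of RFF systems to extract and learn complex device-specific features. However, integrating DL techniques with RFF and running the system in real-world scenarios is full of challenges. This article identifies and analyzes these challenges across the three reference phases of any DL-based RFF system: (i) data collection and preprocessing, (ii) training, and (iii) deployment. Our investigation finds many unresolved problems that currently prevent real deployment of RFF, and it discusses possible directions for future research in this area.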

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.16400
  • repo_url: None
  • paper_authors: Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu
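  • for: The paper proposes a training-free framework for text-guided video editing.
  • methods: The method fuses latents from an image LDM and a video LDM during the denoising process, keeping the temporal consistency of the video LDM while exploiting the high fidelity of the image LDM.
  • results: Compared with conventional methods, FLDM improves the textual alignment and temporal consistency of edited videos.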
    Abstract Latent Diffusion Models (LDMs) are renowned for their powerful capabilities in image and video synthesis. Yet, video editing methods suffer from insufficient pre-training data or video-by-video re-training cost. In addressing this gap, we propose FLDM (Fused Latent Diffusion Model), a training-free framework to achieve text-guided video editing by applying off-the-shelf image editing methods in video LDMs. Specifically, FLDM fuses latents from an image LDM and an video LDM during the denoising process. In this way, temporal consistency can be kept with video LDM while high-fidelity from the image LDM can also be exploited. Meanwhile, FLDM possesses high flexibility since both image LDM and video LDM can be replaced so advanced image editing methods such as InstructPix2Pix and ControlNet can be exploited. To the best of our knowledge, FLDM is the first method to adapt off-the-shelf image editing methods into video LDMs for video editing. Extensive quantitative and qualitative experiments demonstrate that FLDM can improve the textual alignment and temporal consistency of edited videos.
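    Summary Latent Diffusion Models (LDMs) are known for their powerful capabilities in image and video synthesis. However, video editing methods are limited by insufficient pre-training data or by the cost of re-training for each video. To fill this gap, we propose FLDM (Fused Latent Diffusion Model), a training-free framework that achieves text-guided video editing by applying off-the-shelf image editing methods in video LDMs. Specifically, FLDM fuses the latents of an image LDM and a video LDM during the denoising process. This keeps the temporal consistency of the video LDM while also exploiting the high fidelity of the image LDM. FLDM is also highly flexible, because both the image LDM and the video LDM can be replaced, so advanced image editing methods such as InstructPix2Pix and ControlNet can be used. To the best of our knowledge, FLDM is the first method to adapt off-the-shelf image editing methods to video LDMs for video editing. Extensive quantitative and qualitative experiments show that FLDM improves the textual alignment and temporal consistency of edited videos.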
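
    The fusion of latents during denoising can be pictured with the sketch below. The abstract does not state the fusion rule, so the linear interpolation with weight `alpha` is an assumption, and the two step functions are hypothetical stand-ins for one denoising step of an image LDM and a video LDM rather than real diffusion models.

```python
def fuse(z_image, z_video, alpha):
    """Blend two latents element-wise (assumed linear interpolation)."""
    return [alpha * zi + (1.0 - alpha) * zv for zi, zv in zip(z_image, z_video)]

def denoise_with_fusion(z, step_image, step_video, num_steps, alpha=0.5):
    """Run both denoisers on the shared latent and fuse after every step."""
    for t in reversed(range(num_steps)):
        z = fuse(step_image(z, t), step_video(z, t), alpha)
    return z

# Hypothetical stand-ins for one denoising step of each model.
def toy_image_step(z, t):
    return [0.5 * v for v in z]

def toy_video_step(z, t):
    return [0.5 * v + 1.0 for v in z]

fused = denoise_with_fusion([1.0, 3.0], toy_image_step, toy_video_step, num_steps=2)
image_only = denoise_with_fusion([1.0, 3.0], toy_image_step, toy_video_step,
                                 num_steps=2, alpha=1.0)
```

    Setting `alpha` to 1.0 or 0.0 recovers the image-only or video-only trajectory, which is one way to see that either model could be swapped out without changing the loop.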

Evaluating General-Purpose AI with Psychometrics

  • paper_url: http://arxiv.org/abs/2310.16379
  • repo_url: None
  • paper_authors: Xiting Wang, Liming Jiang, Jose Hernandez-Orallo, Luning Sun, David Stillwell, Fang Luo, Xing Xie
  • for: This paper aims to improve the evaluation of general-purpose AI systems by incorporating psychometrics, the science of psychological measurement.
  • methods: The proposed method uses psychometrics to identify and measure the latent constructs that underlie performance across multiple tasks, providing a more comprehensive and rigorous approach to evaluating AI systems.
  • results: The authors propose a framework for integrating psychometrics with AI and explore future opportunities for doing so, with the goal of improving the evaluation and understanding of general-purpose AI systems.
    Abstract Artificial intelligence (AI) has witnessed an evolution from task-specific to general-purpose systems that trend toward human versatility. As AI systems begin to play pivotal roles in society, it is important to ensure that they are adequately evaluated. Current AI benchmarks typically assess performance on collections of specific tasks. This has drawbacks when used for assessing general-purpose AI systems. First, it is difficult to predict whether AI systems could complete a new task it has never seen or that did not previously exist. Second, these benchmarks often focus on overall performance metrics, potentially overlooking the finer details crucial for making informed decisions. Lastly, there are growing concerns about the reliability of existing benchmarks and questions about what is being measured. To solve these challenges, this paper suggests that psychometrics, the science of psychological measurement, should be placed at the core of evaluating general-purpose AI. Psychometrics provides a rigorous methodology for identifying and measuring the latent constructs that underlie performance across multiple tasks. We discuss its merits, warn against potential pitfalls, and propose a framework for putting it into practice. Finally, we explore future opportunities to integrate psychometrics with AI.
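    Summary Artificial intelligence (AI) has evolved from task-specific systems to general-purpose systems that trend toward human versatility. As AI systems come to play important roles in society, they need to be adequately evaluated. Current AI benchmarks typically assess performance on collections of specific tasks, which has drawbacks when assessing general-purpose AI systems. First, it is difficult to predict whether an AI system can complete a new task that it has never seen or that did not previously exist. Second, these benchmarks often focus on overall performance metrics and may overlook details that matter for decision making. Finally, there are growing concerns about the reliability of existing benchmarks and about what they actually measure. To address these challenges, this paper suggests placing psychometrics, the science of psychological measurement, at the core of AI evaluation. Psychometrics provides a rigorous methodology for identifying and measuring the latent constructs that underlie performance across multiple tasks. We discuss its merits, warn against potential pitfalls, and propose a framework for putting it into practice. Finally, we explore future opportunities for integrating psychometrics with AI.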

GADY: Unsupervised Anomaly Detection on Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2310.16376
  • repo_url: None
  • paper_authors: Shiqi Lou, Qingyue Zhang, Shujie Yang, Yuyang Tian, Zhaoxuan Tan, Minnan Luo
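  • for: Detecting anomalous behavior in dynamic graphs, that is, entities whose behavior deviates from the norms observed in the graph and its temporal information.
  • methods: The paper proposes GADY, an unsupervised generative anomaly detection method for dynamic graphs that addresses the dynamic structure construction and negative sampling challenges faced by existing methods.
  • results: GADY significantly outperforms the previous state-of-the-art method on three real-world datasets.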
    Abstract Anomaly detection on dynamic graphs refers to detecting entities whose behaviors obviously deviate from the norms observed within graphs and their temporal information. This field has drawn increasing attention due to its application in finance, network security, social networks, and more. However, existing methods face two challenges: dynamic structure constructing challenge - difficulties in capturing graph structure with complex time information and negative sampling challenge - unable to construct excellent negative samples for unsupervised learning. To address these challenges, we propose Unsupervised Generative Anomaly Detection on Dynamic Graphs (GADY). To tackle the first challenge, we propose a continuous dynamic graph model to capture the fine-grained information, which breaks the limit of existing discrete methods. Specifically, we employ a message-passing framework combined with positional features to get edge embeddings, which are decoded to identify anomalies. For the second challenge, we pioneer the use of Generative Adversarial Networks to generate negative interactions. Moreover, we design a loss function to alter the training goal of the generator while ensuring the diversity and quality of generated samples. Extensive experiments demonstrate that our proposed GADY significantly outperforms the previous state-of-the-art method on three real-world datasets. Supplementary experiments further validate the effectiveness of our model design and the necessity of each module.
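    Summary Anomaly detection on dynamic graphs refers to detecting entities whose behavior clearly deviates from the normal patterns in the graph. The field has drawn increasing attention in finance, network security, social networks, and other areas. However, existing methods face two challenges: the dynamic structure construction challenge, meaning difficulty in capturing graph structure with complex temporal information, and the negative sampling challenge, meaning the inability to build good negative samples for unsupervised learning. To address both, we propose an unsupervised generative anomaly detection method (GADY). For the first challenge, we propose a continuous dynamic graph model to capture fine-grained information; unlike existing discrete methods, it can better capture fine-grained changes in the graph. Specifically, we use a message-passing framework combined with positional features to obtain edge embeddings, which are decoded to identify anomalies. For the second challenge, we pioneer the use of Generative Adversarial Networks to generate negative samples. In addition, we design a loss function that adjusts the training goal of the generator while ensuring the diversity and quality of the generated samples. Experiments show that GADY clearly outperforms the previous method on three real-world datasets. Supplementary experiments further validate the model design and the necessity of each module.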

InstructPTS: Instruction-Tuning LLMs for Product Title Summarization

  • paper_url: http://arxiv.org/abs/2310.16361
  • repo_url: None
  • paper_authors: Besnik Fetahu, Zhiyu Chen, Oleg Rokhlenko, Shervin Malmasi
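  • for: This paper aims to summarize product titles in e-commerce catalogs so that they better support recommendation, question answering, and review summarization.
  • methods: The paper proposes a controllable approach to product title summarization based on recent instruction fine-tuning methods. It can generate summaries that meet different criteria (for example, number of words or inclusion of specific phrases).
  • results: Extensive evaluation on a real-world e-commerce catalog shows that, compared with simple fine-tuning, the approach generates more accurate product name summaries, improving BLEU and ROUGE by over 14 and 8 points, respectively.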
    Abstract E-commerce product catalogs contain billions of items. Most products have lengthy titles, as sellers pack them with product attributes to improve retrieval, and highlight key product aspects. This results in a gap between such unnatural products titles, and how customers refer to them. It also limits how e-commerce stores can use these seller-provided titles for recommendation, QA, or review summarization. Inspired by recent work on instruction-tuned LLMs, we present InstructPTS, a controllable approach for the task of Product Title Summarization (PTS). Trained using a novel instruction fine-tuning strategy, our approach is able to summarize product titles according to various criteria (e.g. number of words in a summary, inclusion of specific phrases, etc.). Extensive evaluation on a real-world e-commerce catalog shows that compared to simple fine-tuning of LLMs, our proposed approach can generate more accurate product name summaries, with an improvement of over 14 and 8 BLEU and ROUGE points, respectively.
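    Summary E-commerce product catalogs contain billions of items. Most products have lengthy titles, because sellers pack them with product attributes to improve retrieval and highlight key product aspects. This creates a gap between product titles and how customers actually refer to the products, and it limits how e-commerce stores can use seller-provided titles for recommendation, question answering, or review summarization. Inspired by recent work on instruction-tuned LLMs, we present a controllable approach to Product Title Summarization (PTS). Trained with a new instruction fine-tuning strategy, the approach can summarize product titles according to different criteria (for example, the number of words in the summary or the inclusion of specific phrases). Extensive evaluation on a real-world e-commerce catalog shows that, compared with simple fine-tuning of LLMs, the proposed approach generates more accurate product name summaries, with an improvement of over 14 BLEU and 8 ROUGE points.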
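
    The controllable criteria mentioned above (word budget, required phrase) can be illustrated with a small sketch that builds an instruction and checks whether a candidate summary respects it. The instruction wording, function names, and example title are hypothetical; the abstract does not describe the actual prompt templates used to train InstructPTS.

```python
def build_instruction(title, max_words=None, must_include=None):
    """Compose a summarization instruction from optional constraints."""
    parts = ["Summarize the product title"]
    if max_words is not None:
        parts.append(f"in at most {max_words} words")
    if must_include:
        parts.append(f"and keep the phrase '{must_include}'")
    return " ".join(parts) + f": {title}"

def satisfies(summary, max_words=None, must_include=None):
    """Check a candidate summary against the same constraints."""
    if max_words is not None and len(summary.split()) > max_words:
        return False
    if must_include and must_include.lower() not in summary.lower():
        return False
    return True

# Hypothetical seller-provided title and candidate summaries.
title = "Acme Stainless Steel Water Bottle 750ml Insulated BPA-Free Leak-Proof for Gym"
instruction = build_instruction(title, max_words=4, must_include="water bottle")
ok = satisfies("Acme Insulated Water Bottle", max_words=4, must_include="water bottle")
too_long = satisfies("Acme Stainless Steel Insulated Water Bottle",
                     max_words=4, must_include="water bottle")
```

    A checker of this kind could be used to filter or score generated summaries against the requested criteria, independently of the BLEU and ROUGE evaluation reported in the abstract.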

  • paper_url: http://arxiv.org/abs/2310.16360
  • repo_url: None
  • paper_authors: Osim Kumar Pal, Md Sakib Hossain Shovon, M. F. Mridha, Jungpil Shin
  • for: This paper explores the potential of AI-powered UAVs in various applications, including navigation, object detection and tracking, wildlife monitoring, precision agriculture, rescue operations, surveillance, and communication among UAVs.
  • methods: The paper examines the use of AI techniques such as machine learning, computer vision, and deep learning to enable these applications, and discusses the challenges and limitations of these approaches.
  • results: The study highlights the potential of AI-powered UAVs to revolutionize industries such as agriculture, surveillance, and disaster management, while also raising ethical and safety concerns that need to be addressed.
    Abstract In recent years, the combination of artificial intelligence (AI) and unmanned aerial vehicles (UAVs) has brought about advancements in various areas. This comprehensive analysis explores the changing landscape of AI-powered UAVs and friendly computing in their applications. It covers emerging trends, futuristic visions, and the inherent challenges that come with this relationship. The study examines how AI plays a role in enabling navigation, detecting and tracking objects, monitoring wildlife, enhancing precision agriculture, facilitating rescue operations, conducting surveillance activities, and establishing communication among UAVs using environmentally conscious computing techniques. By delving into the interaction between AI and UAVs, this analysis highlights the potential for these technologies to revolutionise industries such as agriculture, surveillance practices, disaster management strategies, and more. While envisioning possibilities, it also takes a look at ethical considerations, safety concerns, regulatory frameworks to be established, and the responsible deployment of AI-enhanced UAV systems. By consolidating insights from research endeavours in this field, this review provides an understanding of the evolving landscape of AI-powered UAVs while setting the stage for further exploration in this transformative domain.
    Summary The study examines how AI enables navigation, object detection and tracking, wildlife monitoring, precision agriculture, rescue operations, surveillance, and communication among UAVs using environmentally conscious computing techniques. By exploring the interaction between AI and UAVs, the analysis highlights the potential of these technologies to revolutionize industries such as agriculture, surveillance, and disaster management. The analysis also covers ethical considerations, safety concerns, and the regulatory frameworks still to be established for the responsible deployment of AI-enhanced UAV systems. By consolidating insights from research in this field, the review provides an understanding of the evolving landscape of AI-powered UAVs and sets the stage for further exploration in this transformative domain.

AccoMontage-3: Full-Band Accompaniment Arrangement via Sequential Style Transfer and Multi-Track Function Prior

  • paper_url: http://arxiv.org/abs/2310.16334
  • repo_url: https://github.com/zhaojw1998/accomontage-3
  • paper_authors: Jingwei Zhao, Gus Xia, Ye Wang
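  • for: This work develops a symbolic music automation system that generates multi-track, full-band accompaniment from an input lead melody with chords.
  • methods: The system has three modules, each modelling a different aspect of full-band composition. The first is a piano arranger that generates piano accompaniment by transferring texture styles to the chords, using latent chord-texture disentanglement and heuristic retrieval of texture donors. The second orchestrates the piano accompaniment into a full-band arrangement according to the orchestration style encoded by individual track functions. The third is a prior model that characterizes the global structure of orchestration style over the whole piece, so that style transfer is applied in musical context.
  • results: Experiments show that the system significantly outperforms the baselines, and that the modular design offers effective, musically meaningful controls.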
    Abstract We propose AccoMontage-3, a symbolic music automation system capable of generating multi-track, full-band accompaniment based on the input of a lead melody with chords (i.e., a lead sheet). The system contains three modular components, each modelling a vital aspect of full-band composition. The first component is a piano arranger that generates piano accompaniment for the lead sheet by transferring texture styles to the chords using latent chord-texture disentanglement and heuristic retrieval of texture donors. The second component orchestrates the piano accompaniment score into full-band arrangement according to the orchestration style encoded by individual track functions. The third component, which connects the previous two, is a prior model characterizing the global structure of orchestration style over the whole piece of music. From end to end, the system learns to generate full-band accompaniment in a self-supervised fashion, applying style transfer at two levels of polyphonic composition: texture and orchestration. Experiments show that our system outperforms the baselines significantly, and the modular design offers effective controls in a musically meaningful way.
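    Summary We propose AccoMontage-3, a symbolic music automation system that generates multi-track, full-band accompaniment from an input lead melody with chords (i.e., a lead sheet). The system consists of three modules, each modelling an important aspect of full-band composition. The first is a piano arranger that generates piano accompaniment by transferring texture styles to the chords. The second orchestrates the piano accompaniment into a full-band arrangement according to the orchestration style encoded by individual track functions. The third is a prior model that characterizes the global structure of orchestration style. End to end, the system learns to generate full-band accompaniment in a self-supervised fashion, applying style transfer at two levels of polyphonic composition: texture and orchestration. Experiments show that the system significantly outperforms the baselines, and the modular design offers effective, musically meaningful controls.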

CoheSentia: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts

  • paper_url: http://arxiv.org/abs/2310.16329
  • repo_url: None
  • paper_authors: Aviya Maimon, Reut Tsarfaty
  • for: The paper aims to introduce a novel benchmark for assessing the human-perceived coherence of automatically generated texts.
  • methods: The paper uses two annotation protocols to assess coherence: a global protocol that assigns a single coherence score, and an incremental protocol that scores sentence by sentence and pinpoints reasons for incoherence.
  • results: The paper shows that the inter-annotator agreement in the incremental mode is higher than in the holistic alternative, and that standard language models fine-tuned for coherence detection show varied performance on the different factors contributing to (in)coherence. The results emphasize the need for developing more reliable methods for coherence assessment.
    Abstract Coherence is a linguistic term that refers to the relations between small textual units (sentences, propositions), which make the text logically consistent and meaningful to the reader. With the advances of generative foundational models in NLP, there is a pressing need to automatically assess the human-perceived coherence of automatically generated texts. Up until now, little work has been done on explicitly assessing the coherence of generated texts and analyzing the factors contributing to (in)coherence. Previous work on the topic used other tasks, e.g., sentence reordering, as proxies of coherence, rather than approaching coherence detection heads on. In this paper, we introduce {\sc CoheSentia}, a novel benchmark of human-perceived coherence of automatically generated texts. Our annotation protocol reflects two perspectives; one is global, assigning a single coherence score, and the other is incremental, scoring sentence by sentence. The incremental method produces an (in)coherence score for each text fragment and also pinpoints reasons for incoherence at that point. Our benchmark contains 500 automatically-generated and human-annotated paragraphs, each annotated in both methods, by multiple raters. Our analysis shows that the inter-annotator agreement in the incremental mode is higher than in the holistic alternative, and our experiments show that standard LMs fine-tuned for coherence detection show varied performance on the different factors contributing to (in)coherence. All in all, these models yield unsatisfactory performance, emphasizing the need for developing more reliable methods for coherence assessment.
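    Summary Coherence is a linguistic term for the relations between small textual units (sentences, propositions) that make a text logically consistent and meaningful to the reader. With the advance of generative models in natural language processing (NLP), there is a need to automatically assess the human-perceived coherence of generated texts. So far, little work has explicitly assessed the coherence of generated texts or analyzed the factors that contribute to (in)coherence. This paper introduces CoheSentia, a new benchmark of human-perceived coherence of automatically generated texts. The annotation protocol takes two perspectives: a global one that assigns a single coherence score to the text, and an incremental one that scores the text sentence by sentence and pinpoints the reasons for incoherence at each point. The benchmark contains 500 automatically generated, human-annotated paragraphs, each annotated with both methods by multiple raters. The analysis shows that inter-annotator agreement is higher in the incremental mode than in the holistic one, and experiments show that standard language models (LMs) fine-tuned for coherence detection perform unevenly across the factors contributing to (in)coherence. Overall, these models perform unsatisfactorily, which underlines the need for more reliable methods of coherence assessment.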

Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder

  • paper_url: http://arxiv.org/abs/2310.16318
  • repo_url: https://github.com/alinlab/MetaMAE
  • paper_authors: Huiwon Jang, Jihoon Tack, Daewon Choi, Jongheon Jeong, Jinwoo Shin
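  • for: This paper proposes a modality-agnostic self-supervised learning (SSL) framework that can learn across multiple modalities.
  • methods: The paper builds on the Masked Auto-Encoder (MAE) architecture and uses meta-learning to interpret MAE as a modality-agnostic learner. It integrates two advanced meta-learning techniques: first, gradient-based meta-learning adapts the amortized latent representation; second, task contrastive learning aligns the amortized and adapted latents so that the encoder captures task-specific knowledge.
  • results: Experiments on the modality-agnostic SSL benchmark DABS show that MetaMAE significantly outperforms prior baselines across modalities.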
    Abstract Despite its practical importance across a wide range of modalities, recent advances in self-supervised learning (SSL) have been primarily focused on a few well-curated domains, e.g., vision and language, often relying on their domain-specific knowledge. For example, Masked Auto-Encoder (MAE) has become one of the popular architectures in these domains, but less has explored its potential in other modalities. In this paper, we develop MAE as a unified, modality-agnostic SSL framework. In turn, we argue meta-learning as a key to interpreting MAE as a modality-agnostic learner, and propose enhancements to MAE from the motivation to jointly improve its SSL across diverse modalities, coined MetaMAE as a result. Our key idea is to view the mask reconstruction of MAE as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of unmasked tokens. Based on this novel interpretation, we propose to integrate two advanced meta-learning techniques. First, we adapt the amortized latent of the Transformer encoder using gradient-based meta-learning to enhance the reconstruction. Then, we maximize the alignment between amortized and adapted latents through task contrastive learning which guides the Transformer encoder to better encode the task-specific knowledge. Our experiment demonstrates the superiority of MetaMAE in the modality-agnostic SSL benchmark (called DABS), significantly outperforming prior baselines. Code is available at https://github.com/alinlab/MetaMAE.
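    Summary Although self-supervised learning (SSL) is of practical importance across many modalities, recent advances have focused mainly on a few domains such as vision and language, often relying on domain-specific knowledge. For example, the Masked Auto-Encoder (MAE) has become a popular architecture in these domains, but its potential in other modalities has been less explored. This paper develops MAE as a unified, modality-agnostic SSL framework. It argues that meta-learning is the key to interpreting MAE as a modality-agnostic learner and proposes enhancements that jointly improve its SSL across diverse modalities; the result is called MetaMAE. The key idea is to view MAE's mask reconstruction as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of unmasked tokens. Based on this interpretation, the paper integrates two advanced meta-learning techniques. First, gradient-based meta-learning adapts the amortized latent of the Transformer encoder to enhance reconstruction. Second, task contrastive learning maximizes the alignment between the amortized and adapted latents. Experiments show that MetaMAE significantly outperforms prior baselines on the modality-agnostic SSL benchmark DABS. Code is available at https://github.com/alinlab/MetaMAE.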

Sum-of-Parts Models: Faithful Attributions for Groups of Features

  • paper_url: http://arxiv.org/abs/2310.16316
  • repo_url: https://github.com/debugml/sop
  • paper_authors: Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong
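  • for: This paper aims to provide faithful explanations of machine learning models; in a case study, these explanations help astrophysicists better understand galaxy formation.
  • methods: The paper uses Sum-of-Parts (SOP) models, whose predictions come with interpretable grouped feature attributions that help identify the features that matter for galaxy formation.
  • results: SOP is evaluated with standard interpretability metrics, and in a real case study its faithful explanations help astrophysicists discover new knowledge about galaxy formation.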
    Abstract An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful, and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature attributions that are faithful-by-construction. This model decomposes a prediction into an interpretable sum of scores, each of which is directly attributable to a sparse group of features. We evaluate SOP on benchmarks with standard interpretability metrics, and in a case study, we use the faithful explanations from SOP to help astrophysicists discover new knowledge about galaxy formation.
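    Summary An explanation of a machine learning model is called "faithful" if it accurately reflects the model's decision-making process. However, feature attributions for deep learning are not guaranteed to be faithful and can produce misleading interpretations. This work develops Sum-of-Parts (SOP) models, whose predictions come with grouped feature attributions, each directly attributable to a sparse group of features. SOP is evaluated with standard interpretability metrics, and in a case study its faithful explanations help astrophysicists discover new knowledge about galaxy formation.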
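    As a rough illustration of the sum-of-parts idea described in the abstract, the sketch below decomposes a prediction into per-group scores, where each group is a sparse subset of features. The hand-picked groups, the linear scorer, and the names `group_score` and `sop_predict` are hypothetical stand-ins chosen for illustration; they are not the paper's learned components.

```python
# Minimal sketch of a sum-of-parts style prediction (illustrative assumptions only).
# Each group is a sparse set of feature indices, and its score depends only on
# the features inside that group, so every score is attributable to its group.

def group_score(x, indices, weights):
    """Score one group using only the features it selects (hypothetical linear scorer)."""
    return sum(weights[i] * x[i] for i in indices)


def sop_predict(x, groups, weights):
    """Return the prediction together with the per-group scores that sum to it."""
    scores = [group_score(x, g, weights) for g in groups]
    return sum(scores), scores


x = [1.0, 2.0, 0.5, -1.0]         # input features
weights = [0.5, 0.25, 2.0, 1.0]   # hypothetical per-feature weights
groups = [[0, 1], [2], [3]]       # hypothetical sparse feature groups

prediction, scores = sop_predict(x, groups, weights)
print(prediction, scores)
```

    Because the prediction is literally the sum of the group scores, reading off `scores` gives the attribution directly; no separate post-hoc explainer is involved in this sketch.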

Instance-wise Linearization of Neural Network for Model Interpretation

  • paper_url: http://arxiv.org/abs/2310.16295
  • repo_url: None
  • paper_authors: Zhimin Li, Shusen Liu, Kailkhura Bhavya, Timo Bremer, Valerio Pascucci
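  • for: This paper addresses how a neural network model uses its input features to make a prediction, and how to extract a useful feature attribution from the model.
  • methods: The paper proposes an instance-wise linearization approach that reformulates the forward computation of a neural network prediction as linear matrix multiplication, exposing the linear behavior of each layer for a given prediction.
  • results: Applying instance-wise linearization yields a linear matrix-multiplication equation that describes the model's prediction. The equation provides a feature attribution map and also shows how each input feature directly contributes to the prediction.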
    Abstract Neural network have achieved remarkable successes in many scientific fields. However, the interpretability of the neural network model is still a major bottlenecks to deploy such technique into our daily life. The challenge can dive into the non-linear behavior of the neural network, which rises a critical question that how a model use input feature to make a decision. The classical approach to address this challenge is feature attribution, which assigns an important score to each input feature and reveal its importance of current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detail of how they are actually processed by a model internally. These attribution approaches often raise a concern that whether they highlight correct features for a model prediction. For a neural network model, the non-linear behavior is often caused by non-linear activation units of a model. However, the computation behavior of a prediction from a neural network model is locally linear, because one prediction has only one activation pattern. Base on the observation, we propose an instance-wise linearization approach to reformulates the forward computation process of a neural network prediction. This approach reformulates different layers of convolution neural networks into linear matrix multiplication. Aggregating all layers' computation, a prediction complex convolution neural network operations can be described as a linear matrix multiplication $F(x) = W \cdot x + b$. This equation can not only provides a feature attribution map that highlights the important of the input features but also tells how each input feature contributes to a prediction exactly. Furthermore, we discuss the application of this technique in both supervise classification and unsupervised neural network learning parametric t-SNE dimension reduction.
    Summary Current feature attribution methods assign an importance score to each input feature but do not reveal how the model processes those features internally, which raises the concern of whether they highlight the correct features for a prediction. In neural networks, non-linear behavior is usually caused by non-linear activation units. However, the computation of a single prediction is locally linear, because each prediction has only one activation pattern. Based on this observation, the paper proposes an instance-wise linearization approach that reformulates the forward computation of a neural network prediction. The approach turns the different layers of a convolutional neural network into linear matrix multiplications. Aggregating the computation of all layers, a complex convolutional neural network prediction can be described by the linear equation $F(x) = W \cdot x + b$. This equation not only provides a feature attribution map that highlights the importance of the input features but also shows exactly how each input feature contributes to the prediction. The paper also discusses applications of the technique to supervised classification and to unsupervised neural network learning, namely parametric t-SNE dimension reduction.
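    The locally linear view can be checked on a toy example. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses a tiny fully connected ReLU network with made-up weights instead of a convolutional network, fixes the activation pattern for one input, and folds the layers into a single pair `W`, `b` such that $F(x) = W \cdot x + b$ holds for that input.

```python
# Instance-wise linearization of a two-layer ReLU network (toy sketch).
# For a fixed input, ReLU acts as a 0/1 mask, so the forward pass collapses to
# F(x) = W x + b with W = W2 diag(mask) W1 and b = W2 diag(mask) b1 + b2.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def forward(x, W1, b1, W2, b2):
    """Ordinary forward pass: linear layer, ReLU, linear layer."""
    hidden = [max(0.0, p + b) for p, b in zip(matvec(W1, x), b1)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


def linearize(x, W1, b1, W2, b2):
    """Return (W, b) that reproduce the forward pass for this particular x."""
    pre = [p + b for p, b in zip(matvec(W1, x), b1)]
    mask = [1.0 if p > 0 else 0.0 for p in pre]  # activation pattern of x
    n_hidden, n_in, n_out = len(mask), len(x), len(W2)
    W = [[sum(W2[o][h] * mask[h] * W1[h][i] for h in range(n_hidden))
          for i in range(n_in)] for o in range(n_out)]
    b = [sum(W2[o][h] * mask[h] * b1[h] for h in range(n_hidden)) + b2[o]
         for o in range(n_out)]
    return W, b


# Made-up weights for illustration only.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 1.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, 2.0, -1.0]]
b2 = [0.25]
x = [2.0, 0.5]

W, b = linearize(x, W1, b1, W2, b2)
linear_output = [o + c for o, c in zip(matvec(W, x), b)]
network_output = forward(x, W1, b1, W2, b2)
attribution = [W[0][i] * x[i] for i in range(len(x))]  # per-feature contribution
print(network_output, linear_output, attribution)
```

    In this sketch the per-feature contributions plus the bias add up exactly to the network output, which is the sense in which the linear form tells how each input feature contributes. The pair `W`, `b` is only valid for inputs that share the same activation pattern.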

XFEVER: Exploring Fact Verification across Languages

  • paper_url: http://arxiv.org/abs/2310.16278
  • repo_url: https://github.com/nii-yamagishilab/xfever
  • paper_authors: Yi-Chen Chang, Canasai Kruengkrai, Junichi Yamagishi
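  • for: This work introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset for benchmarking fact verification models across languages.
  • methods: The claim and evidence texts of the English FEVER dataset are translated into six languages. The training and development sets are machine-translated, while the test set contains both professionally translated and machine-translated texts.
  • results: Experiments show that a multilingual language model can be used to build fact verification models in different languages efficiently, but performance varies by language and is somewhat below the English case. Model miscalibration can be mitigated effectively by considering the prediction similarity between English and the target language.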
    Abstract This paper introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset designed for benchmarking the fact verification models across different languages. We constructed it by translating the claim and evidence texts of the Fact Extraction and VERification (FEVER) dataset into six languages. The training and development sets were translated using machine translation, whereas the test set includes texts translated by professional translators and machine-translated texts. Using the XFEVER dataset, two cross-lingual fact verification scenarios, zero-shot learning and translate-train learning, are defined, and baseline models for each scenario are also proposed in this paper. Experimental results show that the multilingual language model can be used to build fact verification models in different languages efficiently. However, the performance varies by language and is somewhat inferior to the English case. We also found that we can effectively mitigate model miscalibration by considering the prediction similarity between the English and target languages. The XFEVER dataset, code, and model checkpoints are available at https://github.com/nii-yamagishilab/xfever.
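    Summary This paper introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset for benchmarking fact verification models across languages. The dataset was built by translating the claim and evidence texts of the Fact Extraction and VERification (FEVER) dataset into six languages. The training and development sets were machine-translated, while the test set includes both professionally translated and machine-translated texts. The paper defines two cross-lingual fact verification scenarios, zero-shot learning and translate-train learning, and proposes baseline models for each. Experiments show that a multilingual language model can be used to build fact verification models in different languages, although performance varies by language and is somewhat lower than in the English case. Model miscalibration can be mitigated effectively by considering the prediction similarity between English and the target language. The XFEVER dataset, code, and model checkpoints are available at https://github.com/nii-yamagishilab/xfever.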
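    The entry does not say how prediction similarity between English and the target language is measured. As one plausible reading, the sketch below computes a symmetric KL divergence between the class distributions predicted for an English claim and for its translation; a term like this could be added to a training loss as a consistency penalty. The choice of divergence, the function names, and the example logits are all assumptions made for illustration.

```python
import math

# Hypothetical consistency penalty between predictions for the same claim in
# two languages. The three logits stand for FEVER's three labels
# (SUPPORTS, REFUTES, NOT ENOUGH INFO).

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_penalty(logits_en, logits_tgt):
    """Symmetric KL between English and target-language predictions."""
    p, q = softmax(logits_en), softmax(logits_tgt)
    return 0.5 * (kl(p, q) + kl(q, p))


same = consistency_penalty([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
shifted = consistency_penalty([2.0, 0.5, -1.0], [0.5, 2.0, -1.0])
print(same, shifted)
```

    The penalty is zero when both languages yield the same distribution and grows as the predictions diverge, so minimizing it would push the target-language predictions toward the English ones.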

Bayesian Domain Invariant Learning via Posterior Generalization of Parameter Distributions

  • paper_url: http://arxiv.org/abs/2310.16277
  • repo_url: None
  • paper_authors: Shiyu Shen, Bin Pan, Tianyang Shi, Tao Li, Zhenwei Shi
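  • for: This paper aims to learn models that extract features invariant across training domains, so that they generalize better to unseen target domains.
  • methods: The paper uses Bayesian neural networks and directly learns the domain-invariant posterior distribution of network parameters, focusing on parameter distributions rather than on aligning feature distributions.
  • results: The paper proposes PosTerior Generalization (PTG), a simple yet effective method for estimating the invariant parameter distribution. PTG can be combined with existing domain generalization methods to further improve performance, and it shows competitive performance on the DomainBed benchmarks.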
    Abstract Domain invariant learning aims to learn models that extract invariant features over various training domains, resulting in better generalization to unseen target domains. Recently, Bayesian Neural Networks have achieved promising results in domain invariant learning, but most works concentrate on aligning features distributions rather than parameter distributions. Inspired by the principle of Bayesian Neural Network, we attempt to directly learn the domain invariant posterior distribution of network parameters. We first propose a theorem to show that the invariant posterior of parameters can be implicitly inferred by aggregating posteriors on different training domains. Our assumption is more relaxed and allows us to extract more domain invariant information. We also propose a simple yet effective method, named PosTerior Generalization (PTG), that can be used to estimate the invariant parameter distribution. PTG fully exploits variational inference to approximate parameter distributions, including the invariant posterior and the posteriors on training domains. Furthermore, we develop a lite version of PTG for widespread applications. PTG shows competitive performance on various domain generalization benchmarks on DomainBed. Additionally, PTG can use any existing domain generalization methods as its prior, and combined with previous state-of-the-art method the performance can be further improved. Code will be made public.

Using GPT-4 to Augment Unbalanced Data for Automatic Scoring

  • paper_url: http://arxiv.org/abs/2310.18365
  • repo_url: None
  • paper_authors: Luyang Fang, Gyeong-Geon Lee, Xiaoming Zhai
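  • for: This study addresses unbalanced student responses across scoring categories in automatic scoring, using the large language model GPT-4 for data augmentation.
  • methods: GPT-4 is prompted to generate responses resembling student-written answers to augment the data, and DistilBERT is then fine-tuned for automatic scoring.
  • results: Incorporating GPT-4-augmented data improves accuracy, precision, recall, and F1 scores, particularly for the less frequent scoring categories. The amount of augmented data needed for stable improvement varies with the dataset. In addition, scoring models trained with GPT-4-augmented data match or outperform models trained with additional student-written responses.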
    Abstract Machine learning-based automatic scoring can be challenging if students' responses are unbalanced across scoring categories, as it introduces uncertainty in the machine training process. To meet this challenge, we introduce a novel text data augmentation framework leveraging GPT-4, a generative large language model, specifically tailored for unbalanced datasets in automatic scoring. Our experimental dataset comprised student written responses to two science items. We crafted prompts for GPT-4 to generate responses resembling student written answers, particularly for the minority scoring classes, to augment the data. We then finetuned DistillBERT for automatic scoring based on the augmented and original datasets. Model performance was assessed using accuracy, precision, recall, and F1 metrics. Our findings revealed that incorporating GPT-4-augmented data remarkedly improved model performance, particularly for precision, recall, and F1 scores. Interestingly, the extent of improvement varied depending on the specific dataset and the proportion of augmented data used. Notably, we found that a varying amount of augmented data (5\%-40\%) was needed to obtain stable improvement for automatic scoring. We also compared the accuracies of models trained with GPT-4 augmented data to those trained with additional student-written responses. Results suggest that the GPT-4 augmented scoring models outperform or match the models trained with student-written augmented data. This research underscores the potential and effectiveness of data augmentation techniques utilizing generative large language models--GPT-4 in addressing unbalanced datasets within automated assessment.
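    To make the evaluation metrics concrete, the sketch below computes accuracy and per-class precision, recall, and F1 on a small unbalanced label set. The labels and predictions are invented for illustration and are not data from the study; the point is that overall accuracy can look acceptable while a minority scoring class is handled poorly.

```python
# Accuracy and per-class precision/recall/F1 on an unbalanced toy label set.

def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented scores: class 0 dominates, classes 1 and 2 are minorities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 2, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority = per_class_metrics(y_true, y_pred, label=1)
print(accuracy, minority)
```

    Here the overall accuracy is 0.8 while the minority class 1 reaches only 0.5 on precision, recall, and F1, which is why per-class metrics are the more informative view for unbalanced scoring data.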

CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment

  • paper_url: http://arxiv.org/abs/2310.16271
  • repo_url: None
  • paper_authors: Jixiang Hong, Quan Tu, Changyu Chen, Xing Gao, Ji Zhang, Rui Yan
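  • for: This work aims to align language models with human values, so that model outputs match human preferences, without the complexity of reinforcement learning from human feedback (RLHF) or the annotated data required by ranking-based methods.
  • methods: The paper proposes CycleAlign, which distills alignment capabilities from a black-box model to a white-box model through iterative interaction. In each round, the black-box model ranks the model-generated responses, guided by human-crafted instructions and demonstrations, while the white-box model also judges the responses it generated.
  • results: Through multiple rounds of interaction, CycleAlign effectively aligns the white-box model with the black-box model in a low-resource way. The model fine-tuned with CycleAlign clearly exceeds existing methods and achieves state-of-the-art performance in alignment with human values.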
    Abstract Language models trained on large-scale corpus often generate content that is harmful, toxic, or contrary to human preferences, making their alignment with human values a critical concern. Reinforcement learning from human feedback (RLHF) with algorithms like PPO is a prevalent approach for alignment but is often complex, unstable, and resource-intensive. Recently, ranking-based alignment methods have emerged, offering stability and effectiveness by replacing the RL framework with supervised fine-tuning, but they are costly due to the need for annotated data. Considering that existing large language models (LLMs) like ChatGPT are already relatively well-aligned and cost-friendly, researchers have begun to align the language model with human preference from AI feedback. The common practices, which unidirectionally distill the instruction-following responses from LLMs, are constrained by their bottleneck. Thus we introduce CycleAlign to distill alignment capabilities from parameter-invisible LLMs (black-box) to a parameter-visible model (white-box) in an iterative manner. With in-context learning (ICL) as the core of the cycle, the black-box models are able to rank the model-generated responses guided by human-craft instruction and demonstrations about their preferences. During iterative interaction, the white-box models also have a judgment about responses generated by them. Consequently, the agreement ranking could be viewed as a pseudo label to dynamically update the in-context demonstrations and improve the preference ranking ability of black-box models. Through multiple interactions, the CycleAlign framework could align the white-box model with the black-box model effectively in a low-resource way. Empirical results illustrate that the model fine-tuned by CycleAlign remarkably exceeds existing methods, and achieves the state-of-the-art performance in alignment with human value.
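    The abstract describes an agreement ranking between the black-box and white-box models that serves as a pseudo label. How agreement is computed is not stated here, so the sketch below assumes one simple definition: keep the response pairs that both rankings order the same way. The function name `agreed_pairs` and the response ids are hypothetical.

```python
from itertools import combinations

# Hypothetical agreement between two rankings of the same responses:
# a pair counts as agreed when both models prefer the same response.

def agreed_pairs(rank_a, rank_b):
    """Return (preferred, other) pairs on which both rankings agree."""
    pos_a = {r: i for i, r in enumerate(rank_a)}
    pos_b = {r: i for i, r in enumerate(rank_b)}
    pairs = []
    for x, y in combinations(sorted(pos_a), 2):
        a_prefers_x = pos_a[x] < pos_a[y]
        b_prefers_x = pos_b[x] < pos_b[y]
        if a_prefers_x == b_prefers_x:
            pairs.append((x, y) if a_prefers_x else (y, x))
    return pairs


black_box_rank = ["r2", "r1", "r3", "r4"]  # best response first
white_box_rank = ["r2", "r3", "r1", "r4"]

pseudo_labels = agreed_pairs(black_box_rank, white_box_rank)
print(pseudo_labels)
```

    In an iterative loop of the kind the abstract outlines, pairs like these could be used both to fine-tune the white-box model and to refresh the in-context demonstrations shown to the black-box model; the disputed pair (`r1` versus `r3` here) is simply left out.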

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

  • paper_url: http://arxiv.org/abs/2310.16270
  • repo_url: https://github.com/msakarvadia/attentionlens
  • paper_authors: Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster
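  • for: This paper aims to understand the specific role of attention heads in Transformer-based language models and how they contribute to the final prediction.
  • methods: In the spirit of reverse-engineering work on model internals, the paper proposes Attention Lens, a tool that translates the outputs of attention heads into vocabulary tokens.
  • results: Preliminary results indicate that attention heads play highly specialized roles in language models, and that their outputs can be translated into vocabulary tokens through learned attention-head-specific transformations.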
    Abstract Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Recent work has attempted to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.
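    Summary Transformer-based large language models (LLMs) are the state of the art for natural language tasks. Recent work has tried to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions in text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. The paper proposes Attention Lens, a tool that lets researchers translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.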
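    To show what translating a head's output into vocabulary tokens could look like, the sketch below applies a per-head linear map to a head output vector and reads off the highest-scoring tokens. Treating the lens as a single linear layer is an assumption, and the vocabulary, weights, and function names are invented; in the actual tool the lenses are learned, and this sketch does not reproduce its API.

```python
# Toy "lens": a head-specific linear map from a head's output to vocabulary logits.

def apply_lens(head_output, lens_weight, lens_bias):
    """Map one attention head's output vector to one logit per vocabulary token."""
    return [sum(w * h for w, h in zip(row, head_output)) + b
            for row, b in zip(lens_weight, lens_bias)]


def top_tokens(logits, vocab, k=2):
    """Return the k tokens with the highest logits, best first."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


vocab = ["cat", "dog", "Paris", "blue"]        # invented toy vocabulary
head_output = [1.0, -0.5]                      # invented head output (dimension 2)
lens_weight = [[0.2, 0.1], [0.0, 0.4],         # invented weights, one row per token
               [1.5, -1.0], [-0.3, 0.2]]
lens_bias = [0.0, 0.1, 0.0, 0.0]

logits = apply_lens(head_output, lens_weight, lens_bias)
print(top_tokens(logits, vocab))
```

    Each head would get its own weights and bias, so comparing the top tokens across heads is one way to see whether different heads specialize in different kinds of content.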

Multilingual Coarse Political Stance Classification of Media. The Editorial Line of a ChatGPT and Bard Newspaper

  • paper_url: http://arxiv.org/abs/2310.16269
  • repo_url: None
  • paper_authors: Cristina España-Bonet
  • for: This paper aims to explore the use of artificial intelligence (AI) in news outlets and its potential impact on bias ratings.
  • methods: The authors use authentic news outlets’ ratings to create a multilingual corpus of news with coarse stance annotations and automatically extracted topic annotations. They train classifiers on this data to identify the editorial line of unseen newspapers in English, German, Spanish, and Catalan.
  • results: The classifiers are able to identify the editorial line of most unseen newspapers in the four languages, and the authors observe that ChatGPT’s editorial line evolves over time and differs among languages.
    Abstract Neutrality is difficult to achieve and, in politics, subjective. Traditional media typically adopt an editorial line that can be used by their potential readers as an indicator of the media bias. Several platforms currently rate news outlets according to their political bias. The editorial line and the ratings help readers in gathering a balanced view of news. But in the advent of instruction-following language models, tasks such as writing a newspaper article can be delegated to computers. Without imposing a biased persona, where would an AI-based news outlet lie within the bias ratings? In this work, we use the ratings of authentic news outlets to create a multilingual corpus of news with coarse stance annotations (Left and Right) along with automatically extracted topic annotations. We show that classifiers trained on this data are able to identify the editorial line of most unseen newspapers in English, German, Spanish and Catalan. We then apply the classifiers to 101 newspaper-like articles written by ChatGPT and Bard in the 4 languages at different time periods. We observe that, similarly to traditional newspapers, ChatGPT editorial line evolves with time and, being a data-driven system, the stance of the generated articles differs among languages.

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

  • paper_url: http://arxiv.org/abs/2310.16263
  • repo_url: None
  • paper_authors: Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai
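  • for: Evaluating and improving the security of code generated by large language models (LLMs).
  • methods: The study applies several approaches to improve the security of LLMs across three tasks: code generation, code repair, and vulnerability classification.
  • results: Existing models often overlook security during code generation and therefore produce vulnerable code. The paper proposes effective approaches to mitigate these vulnerabilities and improve the overall robustness of LLM-generated code.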
    Abstract Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. To effectively mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen\footnote{SecuCoGen has been uploaded as supplemental material and will be made publicly available after publication.}, a meticulously curated dataset targeting 21 critical vulnerability types. SecuCoGen comprises 180 samples and serves as the foundation for conducting experiments on three crucial code-related tasks: code generation, code repair and vulnerability classification, with a strong emphasis on security. Our experimental results reveal that existing models often overlook security concerns during code generation, leading to the generation of vulnerable code. To address this, we propose effective approaches to mitigate the security vulnerabilities and enhance the overall robustness of code generated by LLMs. Moreover, our study identifies weaknesses in existing models' ability to repair vulnerable code, even when provided with vulnerability information. Additionally, certain vulnerability types pose challenges for the models, hindering their performance in vulnerability classification. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs, thereby leading to safer and more trustworthy model deployment.
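    Summary Large language models (LLMs) have brought significant advances to developers, helping them generate code more effectively. However, training on unsanitized data from open-source repositories such as GitHub can inadvertently propagate security vulnerabilities. To mitigate this problem, the paper presents a comprehensive study that evaluates and strengthens code LLMs from a software security perspective. It introduces SecuCoGen, a carefully curated dataset covering 21 critical vulnerability types. SecuCoGen contains 180 samples and serves as the basis for experiments on three key tasks: code generation, code repair, and vulnerability classification. The experiments show that existing models often ignore security during code generation and therefore produce vulnerable code. To address this, the authors propose approaches that mitigate security vulnerabilities and improve the overall robustness of generated code. The study also finds that existing models are weak at repairing vulnerable code, even when given vulnerability information, and that certain vulnerability types are difficult for the models. The authors expect these findings to benefit the software engineering community and to encourage better methods for training and using LLMs, leading to safer and more trustworthy model deployment.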
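
    To make "vulnerable code" concrete, the sketch below contrasts an injectable database lookup with a parameterized one. It is an illustration only: the abstract does not list the 21 vulnerability types, so the choice of SQL injection is an assumption, and the table, function names, and payload are hypothetical and not taken from SecuCoGen.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text,
    # so a crafted name can change the meaning of the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Repaired: the value is passed as a bound parameter,
    # so it is always treated as data, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"  # hypothetical attacker-controlled input

leaked = lookup_vulnerable(conn, payload)   # matches every row
blocked = lookup_safe(conn, payload)        # matches no row
print(len(leaked), len(blocked))
```

    The two functions differ by a single line, which is why a model that optimizes only for functional correctness can plausibly emit the first form: both return the same result for benign input.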

rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model quality

  • paper_url: http://arxiv.org/abs/2310.16262
  • repo_url: None
  • paper_authors: Eunice Jun, Edward Misback, Jeffrey Heer, René Just
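  • for: Understanding which assumptions analysts want to express when building statistical models, and how externalizing those assumptions affects statistical model quality.
  • methods: An exploratory study of analysts using a domain-specific language (DSL) to express conceptual models, followed by the design of rTisane, a DSL with an interactive process for resolving ambiguity in those models, and a controlled evaluation.
  • results: rTisane's DSL helps analysts engage more deeply with their assumptions and externalize them more accurately. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data.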
    Abstract Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and then later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that rTisane's DSL helps analysts engage more deeply with and accurately externalize their assumptions. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data.

A Causal Disentangled Multi-Granularity Graph Classification Method

  • paper_url: http://arxiv.org/abs/2310.16256
  • repo_url: None
  • paper_authors: Yuan Li, Li Liu, Penggang Chen, Youmin Zhang, Guoyin Wang
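  • for: Addressing the multi-granularity representation problem for graph data, in order to improve the accuracy and interpretability of graph classification.
  • methods: Proposes a causal disentangled multi-granularity graph representation learning method (CDM-GNN), which separates the important substructures from the bias parts of a graph and learns representations from a multi-granularity perspective.
  • results: In comparisons on three real-world datasets (MUTAG, PTC, IMDM-M), CDM-GNN shows strong graph classification performance and produces explanations that align with human cognitive patterns.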
    Abstract Graph data widely exists in real life, with large amounts of data and complex structures. It is necessary to map graph data to low-dimensional embeddings. Graph classification, a critical graph task, mainly relies on identifying the important substructures within the graph. At present, some graph classification methods do not combine the multi-granularity characteristics of graph data. This lack of granularity distinction in modeling leads to a conflation of key information and false correlations within the model. So, achieving the desired goal of a credible and interpretable model becomes challenging. This paper proposes a causal disentangled multi-granularity graph representation learning method (CDM-GNN) to solve this challenge. The CDM-GNN model disentangles the important substructures and bias parts within the graph from a multi-granularity perspective. The disentanglement of the CDM-GNN model reveals important and bias parts, forming the foundation for its classification task, specifically, model interpretations. The CDM-GNN model exhibits strong classification performance and generates explanatory outcomes aligning with human cognitive patterns. In order to verify the effectiveness of the model, this paper runs comparisons on three real-world datasets: MUTAG, PTC, and IMDM-M. Six state-of-the-art models, namely GCN, GAT, Top-k, ASAPool, SUGAR, and SAT are employed for comparison purposes. Additionally, a qualitative analysis of the interpretation results is conducted.
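    Summary Graph data is widespread in real life, with large volumes and complex structures, and it needs to be mapped to low-dimensional embeddings. Graph classification is a key graph task that mainly depends on finding the important substructures in a graph. Some current graph classification methods do not take the multi-granularity structure of graph data into account, which causes key information and spurious correlations to be conflated in the model and makes a credible, interpretable model hard to achieve. The paper proposes a causal disentangled multi-granularity graph representation learning method (CDM-GNN) to address this challenge. CDM-GNN separates the important substructures from the bias parts of a graph from a multi-granularity perspective, and this separation forms the basis of its classification task. CDM-GNN shows strong classification performance and produces explanations that align with human cognitive patterns. To verify the model's effectiveness, the paper runs comparisons on three real-world datasets (MUTAG, PTC, IMDM-M) and conducts a qualitative analysis of the interpretation results.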

ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair

  • paper_url: http://arxiv.org/abs/2310.16253
  • repo_url: None
  • paper_authors: Yonghao Wu, Zheng Li, Jie M. Zhang, Yong Liu
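  • for: Providing a new benchmark dataset for fault localization and program repair, so that the integrity and generalizability of LLM-based methods can be assessed reliably.
  • methods: Collects real faults from the online competition platform AtCoder, produced between October 2021 and September 2023, and curates them to eliminate overlap with the training data of existing popular LLMs, removing the data leakage threat present in existing benchmarks.
  • results: Provides 1,254 faulty Java programs and 1,625 faulty Python programs, each paired with fault locations and the repaired code version, suitable for fault localization and program repair research.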
    Abstract With the growing interest in Large Language Models (LLMs) for fault localization and program repair, ensuring the integrity and generalizability of the LLM-based methods becomes paramount. The code in existing widely-adopted benchmarks for these tasks was written before the bloom of LLMs and may be included in the training data of existing popular LLMs, thereby suffering from the threat of data leakage, leading to misleadingly optimistic performance metrics. To address this issue, we introduce "ConDefects", a novel dataset of real faults meticulously curated to eliminate such overlap. ConDefects contains 1,254 Java faulty programs and 1,625 Python faulty programs. All these programs are sourced from the online competition platform AtCoder and were produced between October 2021 and September 2023. We pair each fault with fault locations and the corresponding repaired code versions, making it tailored for fault localization and program repair related research. We also provide interfaces for selecting subsets based on different time windows and coding task difficulties. While inspired by LLM-based tasks, ConDefects can be adopted for benchmarking ALL types of fault localization and program repair methods. The dataset is publicly available, and a demo video can be found at https://www.youtube.com/watch?v=22j15Hj5ONk.
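    Summary As interest in large language models (LLMs) for fault localization and program repair grows, ensuring the integrity and generalizability of LLM-based methods becomes very important. Code in existing, widely adopted benchmarks may be included in LLM training data, which causes data leakage and misleadingly optimistic performance metrics. To address this, the paper introduces ConDefects, a new fault dataset containing 1,254 faulty Java programs and 1,625 faulty Python programs. All programs come from the online competition platform AtCoder and were produced between October 2021 and September 2023. The faults are carefully curated to eliminate such overlap, and each fault is paired with the corresponding repaired code version, making the dataset suitable for fault localization and program repair research. Interfaces are provided for selecting subsets by time window and coding task difficulty. Although inspired by LLM-based tasks, ConDefects can be used to benchmark all types of fault localization and program repair methods. The dataset is publicly available, and a demo video can be found at https://www.youtube.com/watch?v=22j15Hj5ONk.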
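
    The subset selection by time window and task difficulty described above could look roughly like the following sketch. It is not the ConDefects interface: the `FaultRecord` fields, the `select_subset` function, the difficulty scale, and the sample records are all hypothetical, chosen only to show how a time-window filter keeps faults produced after a given model's training cutoff.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FaultRecord:
    """Hypothetical metadata for one faulty program (field names are assumptions)."""
    program_id: str
    language: str      # "Java" or "Python"
    submitted: date    # when the faulty program was produced
    difficulty: int    # hypothetical task-difficulty score


def select_subset(
    records: Iterable[FaultRecord],
    start: date,
    end: date,
    language: Optional[str] = None,
    max_difficulty: Optional[int] = None,
) -> List[FaultRecord]:
    """Keep records inside [start, end], optionally filtered by language and difficulty."""
    selected = []
    for rec in records:
        if not (start <= rec.submitted <= end):
            continue
        if language is not None and rec.language != language:
            continue
        if max_difficulty is not None and rec.difficulty > max_difficulty:
            continue
        selected.append(rec)
    return selected


# Made-up records spanning the collection period stated in the abstract.
records = [
    FaultRecord("p1", "Java", date(2021, 11, 5), 300),
    FaultRecord("p2", "Python", date(2022, 6, 18), 500),
    FaultRecord("p3", "Python", date(2023, 8, 2), 200),
    FaultRecord("p4", "Java", date(2023, 9, 10), 700),
]

# Keep only faults produced after a hypothetical training cutoff.
cutoff = date(2023, 1, 1)
recent = select_subset(records, cutoff, date(2023, 9, 30))
print([r.program_id for r in recent])
```

    Choosing the window start to fall after a model's training cutoff is what addresses the leakage concern: programs written later cannot have appeared in that model's training data.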