cs.AI - 2023-10-13

Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks

  • paper_url: http://arxiv.org/abs/2310.09436
  • repo_url: https://github.com/zixuanke/pycontinual
  • paper_authors: Zixuan Ke, Bing Liu, Wenhan Xiong, Asli Celikyilmaz, Haoran Li
  • for: The paper proposes a new continual learning (CL) method that both prevents catastrophic forgetting (CF) and encourages knowledge transfer (KT).
  • methods: The method overcomes CF by discovering a sub-network for each task to isolate its knowledge, and uses a soft-masking mechanism to preserve previous knowledge while letting the new task leverage past knowledge for KT.
  • results: Experiments on classification, generation, information extraction, and their mixture (i.e., heterogeneous tasks) show that the proposed method consistently outperforms strong baselines.
    Abstract Continual learning (CL) has two main objectives: preventing catastrophic forgetting (CF) and encouraging knowledge transfer (KT). The existing literature mainly focused on overcoming CF. Some work has also been done on KT when the tasks are similar. To our knowledge, only one method has been proposed to learn a sequence of mixed tasks. However, these techniques still suffer from CF and/or limited KT. This paper proposes a new CL method to achieve both. It overcomes CF by isolating the knowledge of each task via discovering a subnetwork for it. A soft-masking mechanism is also proposed to preserve the previous knowledge and to enable the new task to leverage the past knowledge to achieve KT. Experiments using classification, generation, information extraction, and their mixture (i.e., heterogeneous tasks) show that the proposed method consistently outperforms strong baselines.
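
To make the soft-masking idea concrete, below is a minimal sketch (not the authors' implementation) of attenuating gradients for a new task using per-parameter importance scores accumulated from previous tasks; the importance scores, the normalization, and the helper name `soft_mask_gradients` are illustrative assumptions.

```python
import torch
import torch.nn as nn

def soft_mask_gradients(model: nn.Module, importance: dict, eps: float = 1e-8):
    """Scale each parameter's gradient by (1 - normalized importance).

    `importance` maps parameter names to non-negative tensors accumulated from
    previous tasks (e.g., absolute gradient magnitudes); parameters deemed
    important for old tasks receive smaller updates, which is the essence of
    soft-masking (illustrative, not the paper's exact rule).
    """
    for name, param in model.named_parameters():
        if param.grad is None or name not in importance:
            continue
        imp = importance[name]
        imp = imp / (imp.max() + eps)          # normalize to [0, 1]
        param.grad.mul_(1.0 - imp)             # soft mask: attenuate, do not zero out

# Usage sketch: accumulate importance on task t, then apply it while training task t+1.
model = nn.Linear(16, 4)
importance = {n: torch.rand_like(p) for n, p in model.named_parameters()}
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
soft_mask_gradients(model, importance)
torch.optim.SGD(model.parameters(), lr=0.1).step()
```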

Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health

  • paper_url: http://arxiv.org/abs/2310.18326
  • repo_url: https://github.com/harsh-kumar9/bandit_simulation
  • paper_authors: Harsh Kumar, Tong Li, Jiakai Shi, Ilya Musabirov, Rachel Kornfield, Jonah Meyerhoff, Ananya Bhattacharjee, Chris Karr, Theresa Nguyen, David Mohr, Anna Rafferty, Sofia Villar, Nina Deliu, Joseph Jay Williams
  • for: The paper explores the use of adaptive experimentation algorithms, specifically Thompson Sampling for (contextual) multi-armed bandit problems, in digital mental health (DMH) interventions to improve their design and impact.
  • methods: The paper presents a software system that adapts DMH intervention components using bandit and other algorithms while collecting data for side-by-side comparison with traditional uniform random non-adaptive experiments.
  • results: The system was deployed to 1100 users recruited through a large mental health non-profit organization, and the results show the potential of adaptive experimentation algorithms for improving the effectiveness of DMH interventions.
    Abstract Digital mental health (DMH) interventions, such as text-message-based lessons and activities, offer immense potential for accessible mental health support. While these interventions can be effective, real-world experimental testing can further enhance their design and impact. Adaptive experimentation, utilizing algorithms like Thompson Sampling for (contextual) multi-armed bandit (MAB) problems, can lead to continuous improvement and personalization. However, it remains unclear when these algorithms can simultaneously increase user experience rewards and facilitate appropriate data collection for social-behavioral scientists to analyze with sufficient statistical confidence. Although a growing body of research addresses the practical and statistical aspects of MAB and other adaptive algorithms, further exploration is needed to assess their impact across diverse real-world contexts. This paper presents a software system developed over two years that allows text-messaging intervention components to be adapted using bandit and other algorithms while collecting data for side-by-side comparison with traditional uniform random non-adaptive experiments. We evaluate the system by deploying a text-message-based DMH intervention to 1100 users, recruited through a large mental health non-profit organization, and share the path forward for deploying this system at scale. This system not only enables applications in mental health but could also serve as a model testbed for adaptive experimentation algorithms in other domains.
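
For context, the adaptive step the abstract refers to, Thompson Sampling for a Bernoulli multi-armed bandit, can be sketched in a few lines; the arm count, Beta(1, 1) priors, and the simulated reward function are illustrative assumptions and not part of the deployed system.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms = 3                                  # e.g., three candidate message variants
successes = np.ones(n_arms)                 # Beta(1, 1) uniform priors
failures = np.ones(n_arms)

def true_reward(arm):                       # stand-in for a user's binary engagement signal
    return rng.random() < [0.30, 0.50, 0.65][arm]

for t in range(1000):
    theta = rng.beta(successes, failures)   # sample a plausible success rate per arm
    arm = int(np.argmax(theta))             # play the arm that looks best this round
    r = true_reward(arm)
    successes[arm] += r                     # posterior update
    failures[arm] += 1 - r

print("posterior means:", successes / (successes + failures))
```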

Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection

  • paper_url: http://arxiv.org/abs/2310.09432
  • repo_url: None
  • paper_authors: Davide Napolitano, Lorenzo Vaiani, Luca Cagliero
  • for: The goal is the automatic detection of parent-child relationships between elements in multi-page documents.
  • methods: The paper uses a text-only approach with an ad hoc sampling strategy: the Masked Language Modeling technique is used to fine-tune a BERT model, focusing on sentences containing sensitive keywords that also occur in the questions, such as references to tables or images.
  • results: The proposed solution achieves high performance compared to baselines, demonstrating that it contributes positively to this task.
    Abstract The Document-based Visual Question Answering competition addresses the automatic detection of parent-child relationships between elements in multi-page documents. The goal is to identify the document elements that answer a specific question posed in natural language. This paper describes the PoliTo's approach to addressing this task, in particular, our best solution explores a text-only approach, leveraging an ad hoc sampling strategy. Specifically, our approach leverages the Masked Language Modeling technique to fine-tune a BERT model, focusing on sentences containing sensitive keywords that also occur in the questions, such as references to tables or images. Thanks to the effectiveness of this approach, we are able to achieve high performance compared to baselines, demonstrating how our solution contributes positively to this task.
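
A minimal sketch of the keyword-driven selection idea described above: keep only the document sentences that share "sensitive" keywords (e.g., references to tables or images) with the question before fine-tuning. The keyword list, tokenization, and function names are illustrative assumptions, not the authors' exact pipeline.

```python
import re

SENSITIVE = {"table", "figure", "image", "chart", "appendix"}  # assumed keyword set

def split_sentences(text: str):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def select_sentences(document: str, question: str):
    """Return document sentences sharing sensitive keywords with the question."""
    q_tokens = set(re.findall(r"[a-z]+", question.lower()))
    keywords = q_tokens & SENSITIVE
    selected = []
    for sent in split_sentences(document):
        s_tokens = set(re.findall(r"[a-z]+", sent.lower()))
        if keywords & s_tokens:
            selected.append(sent)
    return selected or split_sentences(document)  # fall back to all sentences

doc = "The model is trained on forms. Table 3 reports accuracy. Costs are low."
print(select_sentences(doc, "Which table reports the accuracy results?"))
```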

A Systematic Evaluation of Large Language Models on Out-of-Distribution Logical Reasoning Tasks

  • paper_url: http://arxiv.org/abs/2310.09430
  • repo_url: https://github.com/strong-ai-lab/logical-and-abstract-reasoning
  • paper_authors: Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu
  • for: To evaluate the generalisation and robustness of large language models (LLMs) on logical reasoning tasks.
  • methods: Three new logical reasoning datasets are proposed, named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus", each with three subsets: the first with randomly shuffled options, the second with the correct choice replaced by "none of the other options are correct", and the third combining the previous two. Experiments on these datasets show that these simple tricks greatly hinder the performance of language models.
  • results: All models struggle on the newly constructed datasets, especially on logical reasoning. Introducing task variations by perturbing a sizable training set markedly improves generalisation and robustness, and logic-driven data augmentation for fine-tuning, combined with prompting, further enhances the generalisation performance of LLMs. These results offer new perspectives for assessing and improving the logical reasoning abilities of LLMs. The source code and data are publicly released on GitHub (see the repo_url above).
    Abstract Large language models (LLMs), such as GPT-3.5 and GPT-4, have greatly advanced the performance of artificial systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness to perform logical reasoning remain under-evaluated. To probe this ability, we propose three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus", each featuring three subsets: the first with randomly shuffled options, the second with the correct choices replaced by "none of the other options are correct", and a combination of the previous two subsets. We carry out experiments on these datasets with both discriminative and generative LLMs and show that these simple tricks greatly hinder the performance of the language models. Despite their superior performance on the original publicly available datasets, we find that all models struggle to answer our newly constructed datasets. We show that introducing task variations by perturbing a sizable training set can markedly improve the model's generalisation and robustness in logical reasoning tasks. Moreover, applying logic-driven data augmentation for fine-tuning, combined with prompting can enhance the generalisation performance of both discriminative large language models and generative large language models. These results offer insights into assessing and improving the generalisation and robustness of large language models for logical reasoning tasks. We make our source code and data publicly available \url{https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning}.
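
The two perturbations described in the abstract are straightforward to reproduce; the sketch below assumes a simple multiple-choice item format (`context`, `question`, `options`, `label`) rather than the released datasets' exact schema.

```python
import random

NONE_OPTION = "none of the other options are correct"

def shuffle_options(item, rng):
    """Subset 1: randomly reorder the answer options, tracking the gold label."""
    options = list(item["options"])
    gold = options[item["label"]]
    rng.shuffle(options)
    return {"context": item["context"], "question": item["question"],
            "options": options, "label": options.index(gold)}

def replace_correct(item):
    """Subset 2: replace the correct choice with a 'none of the other options' option."""
    options = list(item["options"])
    options[item["label"]] = NONE_OPTION
    return {"context": item["context"], "question": item["question"],
            "options": options, "label": item["label"]}

item = {"context": "All birds can fly. Tweety is a bird.", "question": "What follows?",
        "options": ["Tweety can fly.", "Tweety is a fish.", "Tweety cannot fly.", "Tweety is red."],
        "label": 0}
rng = random.Random(42)
print(shuffle_options(item, rng))
print(replace_correct(item))
```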

Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

  • paper_url: http://arxiv.org/abs/2310.09412
  • repo_url: None
  • paper_authors: Harsh Patel, Yuan Zhou, Alexander P Lamb, Shu Wang, Jieliang Luo
  • for: optimize real-time control of water distribution networks (WDNs) to reduce energy consumption and operational costs while adhering to physical operational constraints.
  • methods: reinforcement learning (RL) with improved “hybrid RL” methodology that integrates benefits of RL with historical data to enhance explainability and robustness of control recommendations.
  • results: significant improvement in sustainability, operational efficiency, and adaptability to emerging scenarios in real-world WDNs.
    Abstract This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs). Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs. Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees. Conversely, reinforcement learning (RL) stands out for its adaptability to uncertainties and reduced inference time, enabling real-time responsiveness. However, the effective implementation of RL is contingent on building accurate simulation models for WDNs, and prior applications have been limited by errors in simulation training data. These errors can potentially cause the RL agent to learn misleading patterns and actions and recommend suboptimal operational strategies. To overcome these challenges, we present an improved "hybrid RL" methodology. This method integrates the benefits of RL while anchoring it in historical data, which serves as a baseline to incrementally introduce optimal control recommendations. By leveraging operational data as a foundation for the agent's actions, we enhance the explainability of the agent's actions, foster more robust recommendations, and minimize error. Our findings demonstrate that the hybrid RL agent can significantly improve sustainability, operational efficiency, and dynamically adapt to emerging scenarios in real-world WDNs.

Surveying the Landscape of Text Summarization with Deep Learning: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2310.09411
  • repo_url: None
  • paper_authors: Guanghua Wang, Weili Wu
  • for: The paper surveys the application of deep learning to natural language processing (NLP), focusing on the area of text summarization.
  • methods: It covers deep neural networks that learn complex representations of language data, handle variable-length input sequences, and scale to large datasets.
  • results: It reviews currently popular text summarization tasks, including extractive, abstractive, and multi-document summarization, together with the deep learning models and experimental results for these tasks.
    Abstract In recent years, deep learning has revolutionized natural language processing (NLP) by enabling the development of models that can learn complex representations of language data, leading to significant improvements in performance across a wide range of NLP tasks. Deep learning models for NLP typically use large amounts of data to train deep neural networks, allowing them to learn the patterns and relationships in language data. This is in contrast to traditional NLP approaches, which rely on hand-engineered features and rules to perform NLP tasks. The ability of deep neural networks to learn hierarchical representations of language data, handle variable-length input sequences, and perform well on large datasets makes them well-suited for NLP applications. Driven by the exponential growth of textual data and the increasing demand for condensed, coherent, and informative summaries, text summarization has been a critical research area in the field of NLP. Applying deep learning to text summarization refers to the use of deep neural networks to perform text summarization tasks. In this survey, we begin with a review of fashionable text summarization tasks in recent years, including extractive, abstractive, multi-document, and so on. Next, we discuss most deep learning-based models and their experimental results on these tasks. The paper also covers datasets and data representation for summarization tasks. Finally, we delve into the opportunities and challenges associated with summarization tasks and their corresponding methodologies, aiming to inspire future research efforts to advance the field further. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific setting.

CIDER: Category-Guided Intent Disentanglement for Accurate Personalized News Recommendation

  • paper_url: http://arxiv.org/abs/2310.09401
  • repo_url: None
  • paper_authors: Yunyong Ko, Seongeun Ryu, Sang-Wook Kim
  • for: The paper proposes a personalized news recommendation method that helps users find news articles matching their interests, mitigating the information overload problem.
  • methods: The method uses category-guided intent disentanglement to address two challenges: (C1) how to precisely comprehend the range of intents coupled within a news article, and (C2) how to differentiate news articles with varying post-read preferences in users' click history.
  • results: Extensive experiments on two real-world datasets show that the method provides consistent performance improvements and that the proposed strategies significantly improve its accuracy.
    Abstract Personalized news recommendation aims to assist users in finding news articles that align with their interests, which plays a pivotal role in mitigating users' information overload problem. Although many recent works have been studied for better user and news representations, the following challenges have been rarely studied: (C1) How to precisely comprehend a range of intents coupled within a news article? and (C2) How to differentiate news articles with varying post-read preferences in users' click history? To tackle both challenges together, in this paper, we propose a novel personalized news recommendation framework (CIDER) that employs (1) category-guided intent disentanglement for (C1) and (2) consistency-based news representation for (C2). Furthermore, we incorporate a category prediction into the training process of CIDER as an auxiliary task, which provides supplementary supervisory signals to enhance intent disentanglement. Extensive experiments on two real-world datasets reveal that (1) CIDER provides consistent performance improvements over seven state-of-the-art news recommendation methods and (2) the proposed strategies significantly improve the model accuracy of CIDER.

Semantics Alignment via Split Learning for Resilient Multi-User Semantic Communication

  • paper_url: http://arxiv.org/abs/2310.09394
  • repo_url: None
  • paper_authors: Jinhyuk Choi, Jihong Park, Seung-Woo Ko, Jinho Choi, Mehdi Bennis, Seong-Lyun Kim
  • for: The work targets semantic communication with neural network (NN) based transceivers, which extract and communicate semantics learned from source data and channels.
  • methods: It uses distributed learning, combining split learning with partial NN fine-tuning: each encoder downloads a misaligned decoder and locally fine-tunes a fraction of the encoder-decoder NN layers.
  • results: Simulations show that SLF aligns semantics under different source data and channel dissimilarities while controlling computing and communication costs.
    Abstract Recent studies on semantic communication commonly rely on neural network (NN) based transceivers such as deep joint source and channel coding (DeepJSCC). Unlike traditional transceivers, these neural transceivers are trainable using actual source data and channels, enabling them to extract and communicate semantics. On the flip side, each neural transceiver is inherently biased towards specific source data and channels, making different transceivers difficult to understand intended semantics, particularly upon their initial encounter. To align semantics over multiple neural transceivers, we propose a distributed learning based solution, which leverages split learning (SL) and partial NN fine-tuning techniques. In this method, referred to as SL with layer freezing (SLF), each encoder downloads a misaligned decoder, and locally fine-tunes a fraction of these encoder-decoder NN layers. By adjusting this fraction, SLF controls computing and communication costs. Simulation results confirm the effectiveness of SLF in aligning semantics under different source data and channel dissimilarities, in terms of classification accuracy, reconstruction errors, and recovery time for comprehending intended semantics from misalignment.
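
A minimal sketch of the "fine-tune only a fraction of the layers" knob in SLF: freeze all but the last fraction of an encoder-decoder stack before local fine-tuning. The module layout and the choice of which layers remain trainable are illustrative assumptions.

```python
import torch.nn as nn

def freeze_all_but_fraction(layers: nn.ModuleList, trainable_fraction: float):
    """Freeze the earliest layers, leaving only the last `trainable_fraction` trainable.

    Adjusting the fraction trades alignment quality against the computation and
    communication cost of local fine-tuning (illustrative of SLF's knob, not the
    authors' exact split).
    """
    n_trainable = max(1, int(round(trainable_fraction * len(layers))))
    for i, layer in enumerate(layers):
        requires_grad = i >= len(layers) - n_trainable
        for p in layer.parameters():
            p.requires_grad = requires_grad

# Example: a toy 6-block encoder-decoder stack where only the last third is fine-tuned.
stack = nn.ModuleList([nn.Linear(32, 32) for _ in range(6)])
freeze_all_but_fraction(stack, trainable_fraction=1 / 3)
print([any(p.requires_grad for p in layer.parameters()) for layer in stack])
```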

Integrating Symbolic Reasoning into Neural Generative Models for Design Generation

  • paper_url: http://arxiv.org/abs/2310.09383
  • repo_url: None
  • paper_authors: Maxwell Joseph Jacobson, Yexiang Xue
  • for: The paper aims to improve automated design generation by integrating neural and symbolic reasoning, producing more accurate and interpretable designs that meet user specifications and aesthetic preferences.
  • methods: The proposed Spatial Reasoning Integrated Generator (SPRING) embeds a neural and symbolic integrated spatial reasoning module inside a deep generative network, using a recurrent neural network to predict object locations as bounding boxes and symbolic constraint satisfaction to ensure that the generated designs meet user requirements.
  • results: SPRING outperforms baseline generative models in delivering high design quality and better meeting user specifications, as demonstrated through quantitative evaluations and a human study. It also provides interpretability and zero-shot constraint transfer, allowing users to visualize and diagnose the generation process and adapt to novel user specifications.
    Abstract Design generation requires tight integration of neural and symbolic reasoning, as good design must meet explicit user needs and honor implicit rules for aesthetics, utility, and convenience. Current automated design tools driven by neural networks produce appealing designs, but cannot satisfy user specifications and utility requirements. Symbolic reasoning tools, such as constraint programming, cannot perceive low-level visual information in images or capture subtle aspects such as aesthetics. We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation. SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network. The spatial reasoning module decides the locations of objects to be generated in the form of bounding boxes, which are predicted by a recurrent neural network and filtered by symbolic constraint satisfaction. Embedding symbolic reasoning into neural generation guarantees that the output of SPRING satisfies user requirements. Furthermore, SPRING offers interpretability, allowing users to visualize and diagnose the generation process through the bounding boxes. SPRING is also adept at managing novel user specifications not encountered during its training, thanks to its proficiency in zero-shot constraint transfer. Quantitative evaluations and a human study reveal that SPRING outperforms baseline generative models, excelling in delivering high design quality and better meeting user specifications.
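
A toy sketch of the "predict boxes, then filter by symbolic constraints" loop described above; the box format, the rejection-style filtering, and the two example constraints are illustrative assumptions, not SPRING's constraint language.

```python
import random

def propose_box(rng, canvas=(100, 100)):
    """Stand-in for the recurrent network's bounding-box proposal (x, y, w, h)."""
    w, h = rng.randint(10, 30), rng.randint(10, 30)
    return (rng.randint(0, canvas[0] - w), rng.randint(0, canvas[1] - h), w, h)

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def satisfies(box, placed, constraints):
    return all(c(box, placed) for c in constraints)

# Example constraints: no overlap with already-placed objects, and keep objects above y = 60.
constraints = [lambda box, placed: not any(overlaps(box, other) for other in placed),
               lambda box, placed: box[1] + box[3] <= 60]

rng, placed = random.Random(0), []
for _ in range(3):                         # place three objects
    for _ in range(200):                   # rejection-filter neural proposals
        box = propose_box(rng)
        if satisfies(box, placed, constraints):
            placed.append(box)
            break
print(placed)
```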

Near-optimal Differentially Private Client Selection in Federated Settings

  • paper_url: http://arxiv.org/abs/2310.09370
  • repo_url: None
  • paper_authors: Syed Eqbal Alam, Dhirendra Shukla, Shrisha Rao
  • for: The paper proposes a differentially private client selection algorithm for federated settings.
  • methods: An iterative differentially private algorithm guarantees privacy without requiring client-to-client information exchange.
  • results: Experimental results show that the algorithm provides near-optimal values to clients over long-term average participation while preserving a differential privacy guarantee.
    Abstract We develop an iterative differentially private algorithm for client selection in federated settings. We consider a federated network wherein clients coordinate with a central server to complete a task; however, the clients decide whether to participate or not at a time step based on their preferences -- local computation and probabilistic intent. The algorithm does not require client-to-client information exchange. The developed algorithm provides near-optimal values to the clients over long-term average participation with a certain differential privacy guarantee. Finally, we present the experimental results to check the algorithm's efficacy.
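
The abstract does not spell out the mechanism, so the sketch below shows a generic building block for differentially private selection, the exponential mechanism implemented via the Gumbel-max trick; the utility scores, sensitivity, and epsilon are illustrative assumptions and this is not the paper's algorithm.

```python
import numpy as np

def dp_select(utilities, epsilon, sensitivity, rng):
    """Pick one client with the exponential mechanism (Gumbel-max trick).

    Adding Gumbel noise to epsilon * u / (2 * sensitivity) and taking the argmax
    samples a client with probability proportional to exp(epsilon * u / (2 * sensitivity)),
    which is epsilon-differentially private for utilities with the given sensitivity.
    Generic DP selection step, not the specific algorithm developed in the paper;
    selecting several clients repeatedly consumes additional budget under composition.
    """
    u = np.asarray(utilities, dtype=float)
    noisy = epsilon * u / (2.0 * sensitivity) + rng.gumbel(size=u.shape)
    return int(np.argmax(noisy))

rng = np.random.default_rng(0)
client_utilities = [0.9, 0.2, 0.75, 0.4, 0.85]   # e.g., local preference / availability scores
print(dp_select(client_utilities, epsilon=1.0, sensitivity=1.0, rng=rng))
```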

When are Bandits Robust to Misspecification?

  • paper_url: http://arxiv.org/abs/2310.09358
  • repo_url: None
  • paper_authors: Debangshu Banerjee, Aditya Gopalan
  • for: The paper studies decision-making settings that use parametric feature-based reward models, in particular when the true rewards are misspecified with respect to the model class.
  • methods: It analyzes classic algorithms such as $\epsilon$-greedy and LinUCB and identifies sufficient conditions, depending on the problem instance and model class, under which these algorithms enjoy sublinear regret guarantees even under grossly misspecified rewards.
  • results: In contrast to existing worst-case results, where regret bounds scale linearly with time, the paper shows that a nontrivially large set of bandit instances is robust to misspecification.
    Abstract Parametric feature-based reward models are widely employed by algorithms for decision making settings such as bandits and contextual bandits. The typical assumption under which they are analysed is realizability, i.e., that the true rewards of actions are perfectly explained by some parametric model in the class. We are, however, interested in the situation where the true rewards are (potentially significantly) misspecified with respect to the model class. For parameterized bandits and contextual bandits, we identify sufficient conditions, depending on the problem instance and model class, under which classic algorithms such as $\epsilon$-greedy and LinUCB enjoy sublinear (in the time horizon) regret guarantees under even grossly misspecified rewards. This is in contrast to existing worst-case results for misspecified bandits which show regret bounds that scale linearly with time, and shows that there can be a nontrivially large set of bandit instances that are robust to misspecification.
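
For reference, a compact implementation of LinUCB, one of the two classic algorithms analyzed in the paper, run against a deliberately misspecified reward; the dimensions, exploration constant, and toy reward model are illustrative assumptions.

```python
import numpy as np

class LinUCB:
    """Standard disjoint LinUCB: one ridge-regression model per arm plus a UCB bonus."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]       # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]     # X^T y per arm

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
true_theta = [rng.normal(size=5) for _ in range(3)]
bandit = LinUCB(n_arms=3, dim=5)
for t in range(500):
    x = rng.normal(size=5)
    arm = bandit.choose(x)
    # Reward includes a non-linear term, so the linear model is misspecified.
    reward = true_theta[arm] @ x + 0.3 * np.sin(x.sum()) + rng.normal(0, 0.1)
    bandit.update(arm, x, reward)
```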

Unsupervised Domain Adaption for Neural Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.09350
  • repo_url: None
  • paper_authors: Carlos Dominguez, Jon Ander Campos, Eneko Agirre, Gorka Azkune
  • for: The paper compares query generation with large language models against rule-based string manipulation for producing the synthetic annotations that make neural information retrieval competitive.
  • methods: Both methods are compared head-to-head using the same neural IR architecture on the BEIR benchmark, under two scenarios: zero-shot and unsupervised domain adaptation.
  • results: Large language models outperform the rule-based method by a large margin in all scenarios, and unsupervised domain adaptation is more effective than applying a supervised system zero-shot. Several sizes of open large language models are explored for generating synthetic data, and a medium-sized model suffices.
    Abstract Neural information retrieval requires costly annotated data for each target domain to be competitive. Synthetic annotation by query generation using Large Language Models or rule-based string manipulation has been proposed as an alternative, but their relative merits have not been analysed. In this paper, we compare both methods head-to-head using the same neural IR architecture. We focus on the BEIR benchmark, which includes test datasets from several domains with no training data, and explore two scenarios: zero-shot, where the supervised system is trained in a large out-of-domain dataset (MS-MARCO); and unsupervised domain adaptation, where, in addition to MS-MARCO, the system is fine-tuned in synthetic data from the target domain. Our results indicate that Large Language Models outperform rule-based methods in all scenarios by a large margin, and, more importantly, that unsupervised domain adaptation is effective compared to applying a supervised IR system in a zero-shot fashion. In addition we explore several sizes of open Large Language Models to generate synthetic data and find that a medium-sized model suffices. Code and models are publicly available for reproducibility.

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents

  • paper_url: http://arxiv.org/abs/2310.09343
  • repo_url: None
  • paper_authors: Hyungjoo Chae, Yongho Song, Kai Tzu-iunn Ong, Taeyoon Kwon, Minjin Kim, Youngjae Yu, Dongha Lee, Dongyeop Kang, Jinyoung Yeo
  • for: To improve the response quality of conversational agents by helping them comprehend and respond to implicit information in dialogues.
  • methods: A knowledge distillation framework that treats a large language model (LLM) as an unreliable teacher and selectively distills consistent and helpful dialogue chain-of-thought (CoT) rationales via alignment filters.
  • results: Extensive experiments show that enhancing dialogue agents with high-quality rationales significantly improves the quality of their responses.
    Abstract Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challenge. This complexity arises because such evidence is scattered across multiple turns in a conversation, thus necessitating integration over multiple hops. Hence, our focus is to facilitate such multi-hop reasoning over a dialogue context, namely dialogue chain-of-thought (CoT) reasoning. To this end, we propose a knowledge distillation framework that leverages LLMs as unreliable teachers and selectively distills consistent and helpful rationales via alignment filters. We further present DOCTOR, a DialOgue Chain-of-ThOught Reasoner that provides reliable CoT rationales for response generation. We conduct extensive experiments to show that enhancing dialogue agents with high-quality rationales from DOCTOR significantly improves the quality of their responses.

Ranking LLM-Generated Loop Invariants for Program Verification

  • paper_url: http://arxiv.org/abs/2310.09342
  • repo_url: None
  • paper_authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy
  • for: The work targets synthesizing loop invariants for automated program verification.
  • methods: Large language models (such as gpt-3.5 or gpt-4) can synthesize loop invariants in a 0-shot setting but require many samples to generate correct invariants.
  • results: The paper proposes a re-ranking approach so that correct loop invariants rank higher among the generated candidates, notably reducing the number of calls to the verifier.
    Abstract Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an invariant. To address this issue, we propose a {\it re-ranking} approach for the generated results of LLMs. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.
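
A minimal sketch of how a contrastive ranker could be trained to score generated invariants: given feature pairs for a correct invariant and an incorrect attempt on the same problem, a margin ranking loss pushes the correct one higher. The featurization and the scorer architecture are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class InvariantScorer(nn.Module):
    """Toy scorer over fixed-size candidate features (stand-in for a learned encoder)."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

scorer = InvariantScorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)            # contrastive pairwise objective
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# Each training pair: features of a verified-correct invariant and of an incorrect
# attempt for the same problem (features assumed precomputed by some encoder).
pos_feats, neg_feats = torch.randn(16, 32), torch.randn(16, 32)
target = torch.ones(16)                                # "first argument should score higher"
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(scorer(pos_feats), scorer(neg_feats), target)
    loss.backward()
    optimizer.step()
```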

Uncertainty Quantification using Generative Approach

  • paper_url: http://arxiv.org/abs/2310.09338
  • repo_url: None
  • paper_authors: Yunsheng Zhang
  • for: To measure uncertainty in deep neural networks.
  • methods: The Incremental Generative Monte Carlo (IGMC) method iteratively trains generative models, adding their output to the dataset, to compute the posterior distribution of the expectation of a random variable.
  • results: The paper provides a theoretical guarantee on the convergence rate of IGMC with respect to sample size and sampling depth, and empirically studies its behavior on the MNIST digit classification task.
    Abstract We present the Incremental Generative Monte Carlo (IGMC) method, designed to measure uncertainty in deep neural networks using deep generative approaches. IGMC iteratively trains generative models, adding their output to the dataset, to compute the posterior distribution of the expectation of a random variable. We provide a theoretical guarantee of the convergence rate of IGMC relative to the sample size and sampling depth. Due to its compatibility with deep generative approaches, IGMC is adaptable to both neural network classification and regression tasks. We empirically study the behavior of IGMC on the MNIST digit classification task.

Retro-fallback: retrosynthetic planning in an uncertain world

  • paper_url: http://arxiv.org/abs/2310.09270
  • repo_url: None
  • paper_authors: Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato
  • for: The paper proposes a new retrosynthesis algorithm that accounts for uncertainty in the space of possible chemical reactions.
  • methods: Retrosynthesis is formulated in terms of stochastic processes, and a new greedy algorithm called retro-fallback maximizes the probability that at least one synthesis plan can be executed in the lab.
  • results: In-silico benchmarks show that retro-fallback generally produces better sets of synthesis plans than the MCTS and retro* algorithms.
    Abstract Retrosynthesis is the task of proposing a series of chemical reactions to create a desired molecule from simpler, buyable molecules. While previous works have proposed algorithms to find optimal solutions for a range of metrics (e.g. shortest, lowest-cost), these works generally overlook the fact that we have imperfect knowledge of the space of possible reactions, meaning plans created by the algorithm may not work in a laboratory. In this paper we propose a novel formulation of retrosynthesis in terms of stochastic processes to account for this uncertainty. We then propose a novel greedy algorithm called retro-fallback which maximizes the probability that at least one synthesis plan can be executed in the lab. Using in-silico benchmarks we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms.
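
The quantity retro-fallback maximizes, the probability that at least one selected plan can be executed, has a simple form if plan success events are treated as independent; the sketch below illustrates only that selection criterion with made-up success probabilities, not the paper's full search over stochastic reaction feasibility.

```python
def success_prob(plan_set):
    """P(at least one plan succeeds), assuming independent plan success events."""
    p_all_fail = 1.0
    for p in plan_set:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

def greedy_select(plan_probs, budget):
    """Greedily add the plan that most increases P(at least one success)."""
    chosen = []
    remaining = list(plan_probs)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda p: success_prob(chosen + [p]))
        chosen.append(best)
        remaining.remove(best)
    return chosen, success_prob(chosen)

# Candidate synthesis plans with estimated probabilities of working in the lab.
plans = [0.35, 0.20, 0.50, 0.10, 0.45]
print(greedy_select(plans, budget=3))   # -> ([0.5, 0.45, 0.35], ~0.82)
```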

Table-GPT: Table-tuned GPT for Diverse Table Tasks

  • paper_url: http://arxiv.org/abs/2310.09263
  • repo_url: None
  • paper_authors: Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri
  • for: The paper proposes a new "table-tuning" paradigm to improve language models' ability to understand tables and perform table tasks.
  • methods: Diverse table tasks synthesized from real tables are used as training data to continue training and fine-tuning language models such as GPT-3.5 and ChatGPT.
  • results: The resulting Table-GPT models show better table-understanding capabilities, consistently outperforming vanilla GPT-3.5 and ChatGPT on a wide range of table tasks, including held-out unseen tasks, and respond to diverse human instructions to perform new table tasks in a manner similar to GPT-3.5 and ChatGPT.
    Abstract Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks. However, when probing language models using a range of basic table-understanding tasks, we observe that today's language models are still sub-optimal in many table-related tasks, likely because they are pre-trained predominantly on \emph{one-dimensional} natural-language texts, whereas relational tables are \emph{two-dimensional} objects. In this work, we propose a new "\emph{table-tuning}" paradigm, where we continue to train/fine-tune language models like GPT-3.5 and ChatGPT, using diverse table-tasks synthesized from real tables as training data, with the goal of enhancing language models' ability to understand tables and perform table tasks. We show that our resulting Table-GPT models demonstrate (1) better \emph{table-understanding} capabilities, by consistently outperforming the vanilla GPT-3.5 and ChatGPT, on a wide-range of table tasks, including holdout unseen tasks, and (2) strong \emph{generalizability}, in its ability to respond to diverse human instructions to perform new table-tasks, in a manner similar to GPT-3.5 and ChatGPT.
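
A small sketch of how one table task could be synthesized from a real table into an instruction-style training example, in the spirit of the abstract; the serialization format, instruction wording, and toy table are illustrative assumptions, not the paper's templates.

```python
def serialize_table(header, rows):
    """Render a relational table as markdown-style text for a language model."""
    lines = [" | ".join(header), " | ".join(["---"] * len(header))]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)

def make_missing_value_example(header, rows, row_idx, col_idx):
    """Synthesize a 'fill in the masked cell' table task from a real table."""
    answer = rows[row_idx][col_idx]
    masked = [list(r) for r in rows]
    masked[row_idx][col_idx] = "[MASK]"
    prompt = ("Below is a table with one masked cell. Predict the missing value.\n\n"
              + serialize_table(header, masked) + "\n\nMissing value:")
    return {"instruction": prompt, "completion": str(answer)}

header = ["country", "capital", "population_m"]
rows = [["France", "Paris", 68], ["Japan", "Tokyo", 125], ["Chile", "Santiago", 20]]
example = make_missing_value_example(header, rows, row_idx=1, col_idx=1)
print(example["instruction"])
print(example["completion"])
```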

It’s an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models

  • paper_url: http://arxiv.org/abs/2310.09250
  • repo_url: None
  • paper_authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
  • for: The paper revisits the error decomposition of deep learning models, studying the interplay between bias and variance in ensembles of deep classifiers.
  • methods: Empirical evidence across a variety of deep learning models and datasets confirms the phenomenon, which is also analyzed from two theoretical perspectives: calibration and neural collapse.
  • results: In ensembles of deep classification models, bias and variance are aligned at the sample level: for correctly classified sample points, the squared bias is approximately equal to the variance.
    Abstract Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a \emph{trade-off}. However, in this paper, we show that for an ensemble of deep learning based classification models, bias and variance are \emph{aligned} at a sample level, where squared bias is approximately \emph{equal} to variance for correctly classified sample points. We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance.
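
To make the claim concrete, here is one way to estimate per-sample squared bias and variance from an ensemble's predicted probabilities, using the standard squared-error decomposition around the ensemble mean; the random "predictions" below are placeholders for real model outputs, and the paper's exact estimator may differ.

```python
import numpy as np

def bias_variance_per_sample(probs, labels, n_classes):
    """Per-sample squared bias and variance of ensemble probability predictions.

    probs: array of shape (n_members, n_samples, n_classes) of predicted probabilities.
    Uses the squared-error decomposition around the ensemble-mean prediction
    (one common definition; the paper's estimator may differ in details).
    """
    mean_pred = probs.mean(axis=0)                              # (n_samples, n_classes)
    onehot = np.eye(n_classes)[labels]                          # (n_samples, n_classes)
    bias_sq = ((mean_pred - onehot) ** 2).sum(axis=-1)          # squared bias per sample
    variance = ((probs - mean_pred) ** 2).sum(axis=-1).mean(axis=0)  # variance per sample
    return bias_sq, variance

rng = np.random.default_rng(0)
n_members, n_samples, n_classes = 5, 100, 10
logits = rng.normal(size=(n_members, n_samples, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
labels = rng.integers(0, n_classes, size=n_samples)
bias_sq, variance = bias_variance_per_sample(probs, labels, n_classes)
print(bias_sq[:3], variance[:3])
```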

Augmented Computational Design: Methodical Application of Artificial Intelligence in Generative Design

  • paper_url: http://arxiv.org/abs/2310.09243
  • repo_url: None
  • paper_authors: Pirouz Nourian, Shervin Azadi, Roy Uijtendaal, Nan Bai
  • for: The chapter offers methodological reflections on the necessity and utility of artificial intelligence in generative design.
  • methods: It discusses how generative design processes can be augmented by AI to deliver on a few outcomes of interest or performance indicators while handling hundreds or thousands of small decisions.
  • results: It outlines promising directions for using AI to augment decision-making processes in architectural design, for mapping and navigating complex design spaces.
    Abstract This chapter presents methodological reflections on the necessity and utility of artificial intelligence in generative design. Specifically, the chapter discusses how generative design processes can be augmented by AI to deliver in terms of a few outcomes of interest or performance indicators while dealing with hundreds or thousands of small decisions. The core of the performance-based generative design paradigm is about making statistical or simulation-driven associations between these choices and consequences for mapping and navigating such a complex decision space. This chapter will discuss promising directions in Artificial Intelligence for augmenting decision-making processes in architectural design for mapping and navigating complex design spaces.

Evaluating Machine Perception of Indigeneity: An Analysis of ChatGPT’s Perceptions of Indigenous Roles in Diverse Scenarios

  • paper_url: http://arxiv.org/abs/2310.09237
  • repo_url: None
  • paper_authors: Cecilia Delgado Solorzano, Carlos Toxtli Hernandez
  • for: To investigate LLMs' self-perceived bias concerning indigeneity when simulating scenarios of indigenous people performing various roles.
  • methods: Multiple scenarios are generated and analyzed to study how the technology perceives, and potentially amplifies, societal biases related to indigeneity.
  • results: The findings offer insights into how such technology can amplify societal biases about indigenous people in social computing.
    Abstract Large Language Models (LLMs), like ChatGPT, are fundamentally tools trained on vast data, reflecting diverse societal impressions. This paper aims to investigate LLMs' self-perceived bias concerning indigeneity when simulating scenarios of indigenous people performing various roles. Through generating and analyzing multiple scenarios, this work offers a unique perspective on how technology perceives and potentially amplifies societal biases related to indigeneity in social computing. The findings offer insights into the broader implications of indigeneity in critical computing.

ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction

  • paper_url: http://arxiv.org/abs/2310.09234
  • repo_url: None
  • paper_authors: Jianghao Lin, Bo Chen, Hangyu Wang, Yunjia Xi, Yanru Qu, Xinyi Dai, Kangning Zhang, Ruiming Tang, Yong Yu, Weinan Zhang
  • for: Click-through rate (CTR) prediction is increasingly indispensable for Internet applications. Traditional CTR models convert multi-field categorical data into ID features via one-hot encoding and extract collaborative signals among features, but this paradigm loses semantic information. Another line of work uses pretrained language models (PLMs) for CTR prediction by converting input data into textual sentences through hard prompt templates; this preserves semantic signals but generally fails to capture collaborative information (e.g., feature interactions, pure ID features), and the huge model size brings unacceptable inference overhead.
  • methods: The paper proposes a model-agnostic framework (ClickPrompt) that integrates CTR models with PLMs to generate interaction-aware soft prompts. A prompt-augmented masked language modeling (PA-MLM) pretraining task requires the PLM to recover masked tokens based on both the language context and the soft prompts generated by the CTR model, so that the collaborative and semantic knowledge from ID and textual features is explicitly aligned and interacted via the prompt interface.
  • results: Experiments show that ClickPrompt outperforms existing baselines. The CTR model can be tuned together with the PLM for superior performance, or tuned alone without the PLM for inference efficiency.
    Abstract Click-through rate (CTR) prediction has become increasingly indispensable for various Internet applications. Traditional CTR models convert the multi-field categorical data into ID features via one-hot encoding, and extract the collaborative signals among features. Such a paradigm suffers from the problem of semantic information loss. Another line of research explores the potential of pretrained language models (PLMs) for CTR prediction by converting input data into textual sentences through hard prompt templates. Although semantic signals are preserved, they generally fail to capture the collaborative information (e.g., feature interactions, pure ID features), not to mention the unacceptable inference overhead brought by the huge model size. In this paper, we aim to model both the semantic knowledge and collaborative knowledge for accurate CTR estimation, and meanwhile address the inference inefficiency issue. To benefit from both worlds and close their gaps, we propose a novel model-agnostic framework (i.e., ClickPrompt), where we incorporate CTR models to generate interaction-aware soft prompts for PLMs. We design a prompt-augmented masked language modeling (PA-MLM) pretraining task, where PLM has to recover the masked tokens based on the language context, as well as the soft prompts generated by CTR model. The collaborative and semantic knowledge from ID and textual features would be explicitly aligned and interacted via the prompt interface. Then, we can either tune the CTR model with PLM for superior performance, or solely tune the CTR model without PLM for inference efficiency. Experiments on four real-world datasets validate the effectiveness of ClickPrompt compared with existing baselines.

Fast & Efficient Learning of Bayesian Networks from Data: Knowledge Discovery and Causality

  • paper_url: http://arxiv.org/abs/2310.09222
  • repo_url: None
  • paper_authors: Minn Sein, Fu Shunkai
  • for: The paper proposes two new algorithms based on the PC algorithm to make Bayesian network structure learning more efficient.
  • methods: The two algorithms, FSBN and SSBN, learn the causal network structure from data using a local search strategy and conditional independence tests, use d-separation to infer additional topology information, prioritize conditioning sets, and terminate the search early.
  • results: Experimental studies show that both algorithms match the induction quality of the PC algorithm while significantly reducing computation cost: FSBN achieves up to a 52% reduction and SSBN up to 72% for a 200-node network, making them well suited for big data analytics.
    Abstract Structure learning is essential for Bayesian networks (BNs) as it uncovers causal relationships, and enables knowledge discovery, predictions, inferences, and decision-making under uncertainty. Two novel algorithms, FSBN and SSBN, based on the PC algorithm, employ local search strategy and conditional independence tests to learn the causal network structure from data. They incorporate d-separation to infer additional topology information, prioritize conditioning sets, and terminate the search immediately and efficiently. FSBN achieves up to 52% computation cost reduction, while SSBN surpasses it with a remarkable 72% reduction for a 200-node network. SSBN demonstrates further efficiency gains due to its intelligent strategy. Experimental studies show that both algorithms match the induction quality of the PC algorithm while significantly reducing computation costs. This enables them to offer interpretability and adaptability while reducing the computational burden, making them valuable for various applications in big data analytics.
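
The core primitive in PC-style structure learning is the conditional independence test; below is a standard Gaussian test based on partial correlation and the Fisher z-transform, shown as a generic illustration rather than the specific test used by FSBN or SSBN.

```python
import numpy as np
from math import erfc, log, sqrt

def ci_test(data, x, y, z, alpha=0.05):
    """Gaussian conditional-independence test: is X independent of Y given Z?

    Uses the partial correlation (via the inverse correlation matrix) and the
    Fisher z-transform, the standard test plugged into PC-style algorithms for
    continuous data. `data` is an (n_samples, n_vars) array; x, y are column
    indices and z is a list of conditioning columns.
    """
    cols = [x, y] + list(z)
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.inv(corr)
    r = -prec[0, 1] / sqrt(prec[0, 0] * prec[1, 1])      # partial correlation
    r = min(max(r, -0.999999), 0.999999)
    n = data.shape[0]
    stat = sqrt(n - len(z) - 3) * 0.5 * log((1 + r) / (1 - r))
    p_value = erfc(abs(stat) / sqrt(2))                   # two-sided normal p-value
    return p_value > alpha                                # True => independence not rejected

rng = np.random.default_rng(0)
z_var = rng.normal(size=2000)
x_var = z_var + 0.5 * rng.normal(size=2000)               # X and Y depend only on Z
y_var = z_var + 0.5 * rng.normal(size=2000)
data = np.column_stack([x_var, y_var, z_var])
print(ci_test(data, 0, 1, []))    # marginally dependent -> False
print(ci_test(data, 0, 1, [2]))   # independent given Z  -> True (usually, at alpha=0.05)
```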

“Kelly is a Warm Person, Joseph is a Role Model”: Gender Biases in LLM-Generated Reference Letters

  • paper_url: http://arxiv.org/abs/2310.09219
  • repo_url: https://github.com/uclanlp/biases-llm-reference-letters
  • paper_authors: Yixin Wan, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, Nanyun Peng
  • for: The paper examines fairness issues that arise when large language models (LLMs) are used to write professional documents such as recommendation letters.
  • methods: Drawing on findings from the social sciences, the authors design evaluation methods that manifest biases along two dimensions: biases in language style and biases in lexical content. They also study bias propagation by analyzing hallucination bias, defined as bias exacerbation in model-hallucinated content.
  • results: Benchmarking two popular LLMs, ChatGPT and Alpaca, reveals significant gender biases in LLM-generated recommendation letters. The findings warn against using LLMs for this application without scrutiny and highlight the importance of carefully studying hidden biases and harms in LLM-generated professional documents.
    Abstract Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content, including professional documents such as recommendation letters. Though bringing convenience, this application also introduces unprecedented fairness concerns. Model-generated reference letters might be directly used by users in professional scenarios. If underlying biases exist in these model-constructed letters, using them without scrutinization could lead to direct societal harms, such as sabotaging application success rates for female applicants. In light of this pressing issue, it is imminent and necessary to comprehensively study fairness issues and associated harms in this real-world use case. In this paper, we critically examine gender biases in LLM-generated reference letters. Drawing inspiration from social science findings, we design evaluation methods to manifest biases through 2 dimensions: (1) biases in language style and (2) biases in lexical content. We further investigate the extent of bias propagation by analyzing the hallucination bias of models, a term that we define to be bias exacerbation in model-hallucinated contents. Through benchmarking evaluation on 2 popular LLMs- ChatGPT and Alpaca, we reveal significant gender biases in LLM-generated recommendation letters. Our findings not only warn against using LLMs for this application without scrutinization, but also illuminate the importance of thoroughly studying hidden biases and harms in LLM-generated professional documents.
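
One simple way to probe the "biases in lexical content" dimension is to count stereotypically communal versus agentic descriptors in letters generated for different names; the tiny word lists and example letters below are illustrative assumptions, far smaller than the validated lexicons such studies rely on.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (real studies use validated, much larger word lists).
COMMUNAL = {"warm", "kind", "caring", "pleasant", "helpful"}
AGENTIC = {"leader", "ambitious", "confident", "assertive", "role", "model"}

def descriptor_counts(letter: str):
    tokens = re.findall(r"[a-z]+", letter.lower())
    counts = Counter(tokens)
    communal = sum(counts[w] for w in COMMUNAL)
    agentic = sum(counts[w] for w in AGENTIC)
    return {"communal": communal, "agentic": agentic}

letters = {
    "Kelly": "Kelly is a warm and caring person, always kind and pleasant to colleagues.",
    "Joseph": "Joseph is a confident leader and a role model, ambitious in every project.",
}
for name, text in letters.items():
    print(name, descriptor_counts(text))
```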

Multinational AGI Consortium (MAGIC): A Proposal for International Coordination on AI

  • paper_url: http://arxiv.org/abs/2310.09217
  • repo_url: None
  • paper_authors: Jason Hausenloy, Andrea Miotti, Claire Dennis
  • for: To mitigate existential risks from advanced artificial intelligence (AI), the paper proposes a Multinational Artificial General Intelligence Consortium (MAGIC).
  • methods: MAGIC would be the only institution in the world permitted to develop advanced AI, enforced through a global moratorium by its signatory members on all other advanced AI development; it would be exclusive, safety-focused, highly secure, and collectively supported by member states.
  • results: MAGIC would allow narrow AI models to flourish while significantly reducing the possibility of misaligned, rogue, breakout, or runaway outcomes from general-purpose systems.
    Abstract This paper proposes a Multinational Artificial General Intelligence Consortium (MAGIC) to mitigate existential risks from advanced artificial intelligence (AI). MAGIC would be the only institution in the world permitted to develop advanced AI, enforced through a global moratorium by its signatory members on all other advanced AI development. MAGIC would be exclusive, safety-focused, highly secure, and collectively supported by member states, with benefits distributed equitably among signatories. MAGIC would allow narrow AI models to flourish while significantly reducing the possibility of misaligned, rogue, breakout, or runaway outcomes of general-purpose systems. We do not address the political feasibility of implementing a moratorium or address the specific legislative strategies and rules needed to enforce a ban on high-capacity AGI training runs. Instead, we propose one positive vision of the future, where MAGIC, as a global governance regime, can lay the groundwork for long-term, safe regulation of advanced AI.

SiamAF: Learning Shared Information from ECG and PPG Signals for Robust Atrial Fibrillation Detection

  • paper_url: http://arxiv.org/abs/2310.09203
  • repo_url: https://github.com/chengstark/siamaf
  • paper_authors: Zhicheng Guo, Cheng Ding, Duc H. Do, Amit Shah, Randall J. Lee, Xiao Hu, Cynthia Rudin
  • for: To enable passive atrial fibrillation (AF) monitoring with wearables and help reduce adverse clinical outcomes such as stroke and heart failure.
  • methods: A new Siamese network architecture and joint learning loss function learn shared information from ECG and PPG signals.
  • results: On three external test sets, the proposed model predicts AF from either signal and outperforms baseline methods; it also achieves comparable performance to traditional learning regimes while requiring far fewer training labels, offering a potential way to reduce future reliance on manual labeling.
    Abstract Atrial fibrillation (AF) is the most common type of cardiac arrhythmia. It is associated with an increased risk of stroke, heart failure, and other cardiovascular complications, but can be clinically silent. Passive AF monitoring with wearables may help reduce adverse clinical outcomes related to AF. Detecting AF in noisy wearable data poses a significant challenge, leading to the emergence of various deep learning techniques. Previous deep learning models learn from a single modality, either electrocardiogram (ECG) or photoplethysmography (PPG) signals. However, deep learning models often struggle to learn generalizable features and rely on features that are more susceptible to corruption from noise, leading to sub-optimal performances in certain scenarios, especially with low-quality signals. Given the increasing availability of ECG and PPG signal pairs from wearables and bedside monitors, we propose a new approach, SiamAF, leveraging a novel Siamese network architecture and joint learning loss function to learn shared information from both ECG and PPG signals. At inference time, the proposed model is able to predict AF from either PPG or ECG and outperforms baseline methods on three external test sets. It learns medically relevant features as a result of our novel architecture design. The proposed model also achieves comparable performance to traditional learning regimes while requiring much fewer training labels, providing a potential approach to reduce future reliance on manual labeling.

Tikuna: An Ethereum Blockchain Network Security Monitoring System

  • paper_url: http://arxiv.org/abs/2310.09193
  • repo_url: None
  • paper_authors: Andres Gomez Ramirez, Loui Al Sardy, Francis Gomez Ramirez
  • for: The work aims to protect the lowest layer of the blockchain, the P2P network layer, which is vulnerable to attacks such as distributed denial of service (DDoS), eclipse attacks, and Sybil attacks.
  • methods: An unsupervised Long Short-Term Memory (LSTM) method based on recurrent neural networks (RNNs) detects attacks and alerts users.
  • results: Empirical results show that the proposed approach detects and classifies attacks, including eclipse attacks, Covert Flash attacks, and other attacks targeting the Ethereum P2P network layer, with high accuracy.
    Abstract Blockchain security is becoming increasingly relevant in today's cyberspace as it extends its influence in many industries. This paper focuses on protecting the lowest level layer in the blockchain, particularly the P2P network that allows the nodes to communicate and share information. The P2P network layer may be vulnerable to several families of attacks, such as Distributed Denial of Service (DDoS), eclipse attacks, or Sybil attacks. This layer is prone to threats inherited from traditional P2P networks, and it must be analyzed and understood by collecting data and extracting insights from the network behavior to reduce those risks. We introduce Tikuna, an open-source tool for monitoring and detecting potential attacks on the Ethereum blockchain P2P network, at an early stage. Tikuna employs an unsupervised Long Short-Term Memory (LSTM) method based on Recurrent Neural Network (RNN) to detect attacks and alert users. Empirical results indicate that the proposed approach significantly improves detection performance, with the ability to detect and classify attacks, including eclipse attacks, Covert Flash attacks, and others that target the Ethereum blockchain P2P network layer, with high accuracy. Our research findings demonstrate that Tikuna is a valuable security tool for assisting operators to efficiently monitor and safeguard the status of Ethereum validators and the wider P2P network
    摘要 区块链安全在今天的网络空间变得越来越重要,它在多个领域扮演着重要的角色。本文关注保护区块链的最低层级,即点对点网络层,该层可能受到多种攻击,如分布式拒绝服务(DDoS)、 Eclipse 攻击和 Sybil 攻击。这层面受到传统点对点网络中的威胁,需要分析和理解网络行为以降低风险。我们介绍了 Tikuna,一个开源的监控和检测区块链 P2P 网络攻击的工具,可以在早期发现攻击。Tikuna 使用无监督的 Long Short-Term Memory(LSTM)方法基于 Recurrent Neural Network(RNN)来检测攻击并警示用户。实验结果表明,我们的方法可以准确地检测和分类攻击,包括 Eclipse 攻击、 Covert Flash 攻击和其他targeting Ethereum 区块链 P2P 网络层的攻击,并且具有高精度。我们的研究发现表明,Tikuna 是一种有价值的安全工具,可以帮助操作员有效地监控和保护 Ethereum 验证人和更广泛的 P2P 网络。
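
The unsupervised LSTM detection step can be illustrated with a small reconstruction-based sketch: an LSTM autoencoder is fit on windows of normal P2P metrics, and windows whose reconstruction error exceeds a threshold are flagged. The feature choice (e.g., peer count, connect/disconnect rate, message rate), window length, and 3-sigma threshold are assumptions for illustration; Tikuna's actual features and alerting logic may differ.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Unsupervised sketch: reconstruct windows of P2P metrics; high
    reconstruction error flags anomalous (possibly attack) behavior."""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(n_features, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, window, n_features)
        _, (h, _) = self.enc(x)                 # summary of the window
        rep = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        dec_out, _ = self.dec(rep)
        return self.out(dec_out)

# Toy training loop on windows of per-interval metrics.
model = LSTMAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal = torch.randn(256, 30, 4)                # windows of "normal" traffic
for _ in range(5):
    loss = nn.functional.mse_loss(model(normal), normal)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Score new windows: errors far above the normal range raise an alert.
with torch.no_grad():
    err = ((model(normal) - normal) ** 2).mean(dim=(1, 2))
    threshold = err.mean() + 3 * err.std()
    new = torch.randn(8, 30, 4) * 5             # exaggerated "attack-like" windows
    flags = ((model(new) - new) ** 2).mean(dim=(1, 2)) > threshold
    print(flags)
```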

Does Graph Distillation See Like Vision Dataset Counterpart?

  • paper_url: http://arxiv.org/abs/2310.09192
  • repo_url: https://github.com/suchun-sv/sgdd
  • paper_authors: Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, Jianxin Li
  • for: 这篇论文主要是为了降低大规模图训练的成本与存储开销,并探索原始图结构信息的影响。
  • methods: 本论文提出了一个名为Structure-broadcasting Graph Dataset Distillation(SGDD)的新方法,它可以将原始图像结构信息转散到生成的实验图像中,以避免遗传原始图像结构信息的问题。
  • results: 本论文透过实验证明了SGDD的可行性和必要性,并且在9个测试 dataset 上 achieved state-of-the-art 的结果,例如在 YelpChi 测试 dataset 上,我们的方法可以保持98.6%的训练测试准确率,并且实现了1,000倍的图像减少。此外,我们还证明了SGDD 可以将 LED 差值降低17.6% ~ 31.4%。
    Abstract Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have attracted increasing concerns. Existing graph condensation methods primarily focus on optimizing the feature matrices of condensed graphs while overlooking the impact of the structure information from the original graphs. To investigate the impact of the structure information, we conduct analysis from the spectral domain and empirically identify substantial Laplacian Energy Distribution (LED) shifts in previous works. Such shifts lead to poor performance in cross-architecture generalization and specific tasks, including anomaly detection and link prediction. In this paper, we propose a novel Structure-broadcasting Graph Dataset Distillation (SGDD) scheme for broadcasting the original structure information to the generation of the synthetic one, which explicitly prevents overlooking the original structure information. Theoretically, the synthetic graphs by SGDD are expected to have smaller LED shifts than previous works, leading to superior performance in both cross-architecture settings and specific tasks. We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with 1,000 times saving on the scale of the graph. Moreover, we empirically evaluate there exist 17.6% ~ 31.4% reductions in LED shift crossing 9 datasets. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs. The code is available in the GitHub repository: https://github.com/RingBDStack/SGDD.
    摘要 在大规模图上训练已经取得了很好的成果,但其成本和存储开销受到越来越多的关注。现有的图压缩方法主要优化压缩图的特征矩阵,而忽略了原始图结构信息的影响。为了调查结构信息的影响,我们从谱域进行分析,并观察到先前工作中存在明显的 Laplacian Energy Distribution(LED)偏移。这些偏移导致跨架构泛化和特定任务(包括异常检测和链接预测)的表现不佳。在这篇论文中,我们提出了一种新的结构广播图数据集蒸馏(SGDD)方案,用于将原始结构信息广播到生成的合成图中,从而显式避免忽略原始结构信息。理论上,由SGDD生成的合成图将具有更小的 LED 偏移,从而在跨架构设置和特定任务上取得更佳表现。我们在 9 个数据集上验证了该方法,并在所有数据集上达到了最先进的结果:例如,在 YelpChi 数据集上,我们的方法保持了在原始图数据集上训练的 98.6% 测试准确率,同时将图的规模缩减了 1,000 倍。此外,我们通过实验观察到在 9 个数据集上 LED 偏移减少了 17.6% ~ 31.4%。大量的实验和分析验证了所提设计的有效性和必要性。代码可以在 GitHub 上找到:https://github.com/RingBDStack/SGDD。
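
The Laplacian Energy Distribution (LED) shift discussed above can be approximated with a short spectral computation: take the eigenvalue distribution of the normalized Laplacian for the original and the condensed graph, then measure how far apart the two distributions are. The histogram-plus-total-variation formulation below is an illustrative proxy, not necessarily the exact definition used in the paper.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Eigenvalues of the symmetric normalized Laplacian of an undirected graph."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    lap = np.eye(len(adj)) - (d_inv_sqrt[:, None] * adj) * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)              # values lie in [0, 2]

def led_shift(adj_full, adj_small, bins=20):
    """Distance between the eigenvalue histograms of the original and condensed
    graphs; a smaller value means the structure is better preserved."""
    h1, edges = np.histogram(laplacian_spectrum(adj_full), bins=bins, range=(0, 2))
    h2, _ = np.histogram(laplacian_spectrum(adj_small), bins=edges)
    p1, p2 = h1 / h1.sum(), h2 / h2.sum()
    return 0.5 * np.abs(p1 - p2).sum()           # total-variation distance

# Toy example: a 200-node ring graph versus a much smaller 20-node ring.
def ring(n):
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1
    return a

print(led_shift(ring(200), ring(20)))
```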

PRIOR: Personalized Prior for Reactivating the Information Overlooked in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.09183
  • repo_url: https://github.com/bdemo/pfedbred_public
  • paper_authors: Mingjia Shi, Yuhao Zhou, Kai Wang, Huaizheng Zhang, Shudong Huang, Qing Ye, Jiangcheng Lv
  • for: 提高个人化 Federated Learning(PFL)的性能,解决各种数据特点导致的模型衰退问题。
  • methods: 提出一种基于Bregman divergence的个人化优先知识注入方法(pFedBreD),具有更好的个人化适应性和可选的策略。
  • results: 实验表明,提出的方法可以达到现场状态的性能,比其他方法高出3.5%以上,并且经过广泛的分析证明了方法的 Robustness 和必要性。
    Abstract Classical federated learning (FL) enables training machine learning models without sharing data for privacy preservation, but heterogeneous data characteristic degrades the performance of the localized model. Personalized FL (PFL) addresses this by synthesizing personalized models from a global model via training on local data. Such a global model may overlook the specific information that the clients have been sampled. In this paper, we propose a novel scheme to inject personalized prior knowledge into the global model in each client, which attempts to mitigate the introduced incomplete information problem in PFL. At the heart of our proposed approach is a framework, the PFL with Bregman Divergence (pFedBreD), decoupling the personalized prior from the local objective function regularized by Bregman divergence for greater adaptability in personalized scenarios. We also relax the mirror descent (RMD) to extract the prior explicitly to provide optional strategies. Additionally, our pFedBreD is backed up by a convergence analysis. Sufficient experiments demonstrate that our method reaches the state-of-the-art performances on 5 datasets and outperforms other methods by up to 3.5% across 8 benchmarks. Extensive analyses verify the robustness and necessity of proposed designs.
    摘要 传统的联合学习(FL)可以帮助学习机器学习模型无需分享数据,以保护隐私,但是各种数据特点会导致本地模型的性能下降。个性化联合学习(PFL)解决了这个问题,通过将本地数据用于个性化模型的训练来生成个性化模型。然而,这种全球模型可能会忽略客户端上的特定信息。在这篇论文中,我们提出了一种新的方法,将个性化先验知识注入到全球模型中,以降低在PFL中引入的不完整信息问题。我们的提议方法基于一个框架,即PFL with Bregman Divergence(pFedBreD),它将个性化先验与本地对象函数正则化的Bregman divergence分离开来,以提高在个性化场景中的适应性。此外,我们还将反向投影(RMD)放松到提取先验,以提供可选的策略。此外,我们的pFedBreD还得到了收敛分析。我们的实验表明,我们的方法可以在5个数据集上达到领先的性能,并且在8个标准准则上超过其他方法的3.5%。广泛的分析也证明了我们的设计的稳定性和必要性。
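
A stripped-down view of the Bregman-regularized local objective: each client minimizes its task loss plus a Bregman divergence between its personalized parameters and a prior (here simply the current global model). The sketch below uses the squared-Euclidean special case of the Bregman divergence and a plain SGD step; the paper's mirror-descent formulation, prior-extraction strategies, and choice of divergence are not reproduced.

```python
import torch
import torch.nn as nn

def bregman_sq_euclidean(params, prior):
    """Bregman divergence generated by 0.5*||x||^2, which reduces to the squared
    Euclidean distance between local parameters and the personalized prior."""
    return 0.5 * sum(((p - q) ** 2).sum() for p, q in zip(params, prior))

def local_update(model, prior_params, data_loader, mu=0.1, lr=0.01, epochs=1):
    """One client's personalized step: task loss + Bregman regularizer toward the prior."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:
            loss = loss_fn(model(x), y)
            loss = loss + mu * bregman_sq_euclidean(model.parameters(), prior_params)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return [p.detach().clone() for p in model.parameters()]

# Toy usage: the "prior" here is just a copy of the current global model's parameters.
model = nn.Linear(10, 3)
prior = [p.detach().clone() for p in model.parameters()]
data = [(torch.randn(16, 10), torch.randint(0, 3, (16,))) for _ in range(4)]
local_update(model, prior, data)
```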

mnmDTW: An extension to Dynamic Time Warping for Camera-based Movement Error Localization

  • paper_url: http://arxiv.org/abs/2310.09170
  • repo_url: None
  • paper_authors: Sebastian Dill, Maurice Rohr
  • for: 这个论文用Computer Vision(CV)方法提取了运动视频中的姿态信息,并使用修改后的DTW计算器来评估运动的准确性。
  • methods: 这个论文使用了CV方法提取姿态信息,并使用修改后的DTW计算器来评估运动的准确性。
  • results: 这个论文可以清晰地显示运动中的错误,并且可以准确地定位错误的位置和时间。
    Abstract In this proof of concept, we use Computer Vision (CV) methods to extract pose information out of exercise videos. We then employ a modified version of Dynamic Time Warping (DTW) to calculate the deviation from a gold standard execution of the exercise. Specifically, we calculate the distance between each body part individually to get a more precise measure for exercise accuracy. We can show that exercise mistakes are clearly visible, identifiable and localizable through this metric.
    摘要 在这个Proof of Concept中,我们使用计算机视觉(CV)方法提取运动视频中的姿势信息。然后,我们使用修改后的动态时间扩展(DTW)来计算运动 preciseness。具体来说,我们计算每个身体部分之间的距离,以获得更加精确的运动准确性度量。我们可以证明,通过这个指标,运动错误都能够明显、识别和定位。
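
The per-body-part deviation measure can be sketched directly: run standard DTW separately on each tracked keypoint trajectory and report a distance per body part, so a mistake can be localized to, say, the left knee rather than hidden in a single global score. The keypoint names and the plain (unmodified) DTW below are illustrative; pose extraction with a CV keypoint detector is assumed to happen upstream, and the paper's specific DTW modification is not reproduced.

```python
import numpy as np

def dtw(a, b):
    """Plain dynamic time warping distance between two sequences of vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.atleast_1d(a[i - 1] - b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def per_part_deviation(candidate, reference):
    """DTW distance computed separately for each tracked body part, so an error
    can be attributed to e.g. 'left_knee' instead of a single global score."""
    return {part: dtw(candidate[part], reference[part]) for part in reference}

# Toy usage: 2-D keypoint trajectories for two body parts (names are assumptions).
t = np.linspace(0, 2 * np.pi, 60)
reference = {"left_knee": np.stack([np.sin(t), np.cos(t)], 1),
             "right_elbow": np.stack([np.cos(t), np.sin(t)], 1)}
candidate = {"left_knee": reference["left_knee"] + 0.4,       # consistent offset = mistake
             "right_elbow": reference["right_elbow"] + 0.01}  # nearly correct
print(per_part_deviation(candidate, reference))
```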

Quantum Machine Learning in Climate Change and Sustainability: a Review

  • paper_url: http://arxiv.org/abs/2310.09162
  • repo_url: None
  • paper_authors: Amal Nammouchi, Andreas Kassler, Andreas Theorachis
  • for: 这个论文的目的是探讨用量子机器学习方法解决气候变化和可持续发展的问题。
  • methods: 论文评论了已有的量子机器学习方法,包括能源系统启动、气候数据预测、气候监测和危险事件预测。
  • results: 论文提出了量子机器学习方法的挑战和未来工作,以便更好地利用这些方法在气候变化研究中。
    Abstract Climate change and its impact on global sustainability are critical challenges, demanding innovative solutions that combine cutting-edge technologies and scientific insights. Quantum machine learning (QML) has emerged as a promising paradigm that harnesses the power of quantum computing to address complex problems in various domains including climate change and sustainability. In this work, we survey existing literature that applies quantum machine learning to solve climate change and sustainability-related problems. We review promising QML methodologies that have the potential to accelerate decarbonization including energy systems, climate data forecasting, climate monitoring, and hazardous events predictions. We discuss the challenges and current limitations of quantum machine learning approaches and provide an overview of potential opportunities and future work to leverage QML-based methods in the important area of climate change research.
    摘要 气候变化和其对全球可持续发展的影响是急需创新解决方案,这些解决方案结合 cutting-edge 技术和科学成果。量子机器学习(QML)已经出现为解决复杂问题的有力方法之一,其在不同领域,包括气候变化和可持续发展,提供了新的思路。在这篇文章中,我们对已有的文献进行了评论,检查了应用量子机器学习解决气候变化和可持续发展相关问题的可能性。我们评估了具有加速减排能源系统、气候数据预测、气候监测和危险事件预测的潜在优势。我们还讨论了量子机器学习方法的挑战和当前的限制,并提供了未来可能性和未来工作的概述,以便更好地利用QML在气候变化研究中的应用。

Learning To Teach Large Language Models Logical Reasoning

  • paper_url: http://arxiv.org/abs/2310.09158
  • repo_url: https://github.com/chenmeiqii/teach-llm-lr
  • paper_authors: Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, Dongsheng Li
  • for: 本研究旨在系统地探讨大型自然语言模型(LLMs)在逻辑推理中的能力,以解决现有LLMs在实际理性任务中输出不可靠内容的问题。
  • methods: 本研究采用了多种方法来探讨LLMs的逻辑推理能力,包括事件关系EXTRACTION和推理逻辑。我们的研究显示,LLMs在解决需要严格逻辑推理的任务时存在问题,并产生了不符合逻辑的答案,需要 iterative refinement。
  • results: 我们的研究发现,通过不同的策略可以启用LLMs的逻辑推理能力,并且可以生成更符合逻辑的答案。此外,我们还提供了一个合成数据集(LLM-LR),用于评估和预训练LLMs。广泛的量化和质量分析也证明了我们的方法的有效性和必要性,并为未来使用LLMs解决实际任务提供了洞察。
    Abstract Large language models (LLMs) have gained enormous attention from both academia and industry, due to their exceptional ability in language generation and extremely powerful generalization. However, current LLMs still output unreliable content in practical reasoning tasks due to their inherent issues (e.g., hallucination). To better disentangle this problem, in this paper, we conduct an in-depth investigation to systematically explore the capability of LLMs in logical reasoning. More in detail, we first investigate the deficiency of LLMs in logical reasoning on different tasks, including event relation extraction and deductive reasoning. Our study demonstrates that LLMs are not good reasoners in solving tasks with rigorous reasoning and will produce counterfactual answers, which require us to iteratively refine. Therefore, we comprehensively explore different strategies to endow LLMs with logical reasoning ability, and thus enable them to generate more logically consistent answers across different scenarios. Based on our approach, we also contribute a synthesized dataset (LLM-LR) involving multi-hop reasoning for evaluation and pre-training. Extensive quantitative and qualitative analyses on different tasks also validate the effectiveness and necessity of teaching LLMs with logic and provide insights for solving practical tasks with LLMs in future work.
    摘要

Lincoln AI Computing Survey (LAICS) Update

  • paper_url: http://arxiv.org/abs/2310.09145
  • repo_url: None
  • paper_authors: Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner
  • for: 本文是 Lincoln AI Computing Survey(LAICS)的四年度更新,它收集和总结了过去四年内公开宣布的商业加速器。
  • methods: 本文使用 scatter graph plot 来展示性能和能耗值的趋势,并分析了一些维度和观察结果。
  • results: 本文发现了一些市场 segments,并提供了每个新加速器的简短描述。
    Abstract This paper is an update of the survey of AI accelerators and processors from past four years, which is now called the Lincoln AI Computing Survey - LAICS (pronounced "lace"). As in past years, this paper collects and summarizes the current commercial accelerators that have been publicly announced with peak performance and peak power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are again discussed and analyzed. Market segments are highlighted on the scatter plot, and zoomed plots of each segment are also included. Finally, a brief description of each of the new accelerators that have been added in the survey this year is included.
    摘要 这份报告是过去四年的AI加速器和处理器的调查更新,现更名为林肯AI计算调查(LAICS,发音为“lace”)。与过去年度一样,这份报告收集和总结了公共宣布的最高性能和最高电力消耗数据的商业加速器。性能和电力值Plot在散点图中,并从图形上的趋势和特征进行了讨论和分析。市场分 segment在散点图上高亮,并包括每个分 segment的缩放图。最后,报告还包括这年新增加的加速器的简要描述。

  • paper_url: http://arxiv.org/abs/2310.09139
  • repo_url: None
  • paper_authors: Athul Paul Jacob, Yikang Shen, Gabriele Farina, Jacob Andreas
  • for: 提高语言模型(LM)的预测准确性和一致性,解决LM在不同查询和评分方法下的矛盾问题。
  • methods: 基于游戏理论的语言模型解码算法,通过自然语言句子来沟通抽象正确参数,并通过找到稳定的equilibrium来解决问题。
  • results: 应用于多个任务(包括阅读理解、常识理解、数学问题解决和对话),EQUILIBRIUM-RANKING算法可以持续性地提高LM的表现,并在一些任务上超越较大的LLaMA-65B和PaLM-540B模型。这些结果表明了游戏理论工具在LM中的潜力。
    Abstract When applied to question answering and other text generation tasks, language models (LMs) may be queried generatively (by sampling answers from their output distribution) or discriminatively (by using them to score or rank a set of candidate outputs). These procedures sometimes yield very different predictions. How do we reconcile mutually incompatible scoring procedures to obtain coherent LM predictions? We introduce a new, a training-free, game-theoretic procedure for language model decoding. Our approach casts language model decoding as a regularized imperfect-information sequential signaling game - which we term the CONSENSUS GAME - in which a GENERATOR seeks to communicate an abstract correctness parameter using natural language sentences to a DISCRIMINATOR. We develop computational procedures for finding approximate equilibria of this game, resulting in a decoding algorithm we call EQUILIBRIUM-RANKING. Applied to a large number of tasks (including reading comprehension, commonsense reasoning, mathematical problem-solving, and dialog), EQUILIBRIUM-RANKING consistently, and sometimes substantially, improves performance over existing LM decoding procedures - on multiple benchmarks, we observe that applying EQUILIBRIUM-RANKING to LLaMA-7B outperforms the much larger LLaMA-65B and PaLM-540B models. These results highlight the promise of game-theoretic tools for addressing fundamental challenges of truthfulness and consistency in LMs.
    摘要 当应用于问答和其他文本生成任务时,语言模型(LM)可以以生成式方式(从其输出分布中采样答案)或判别式方式(用其对一组候选输出进行评分或排序)进行查询。这些过程有时会产生非常不同的预测。如何调和这些互不兼容的评分过程,以获得一致的语言模型预测?我们提出了一种新的、无需训练的游戏论语言模型解码方法。我们将语言模型解码转化为一种正则化的不完全信息序贯信号游戏,称之为CONSENSUS GAME。在这个游戏中,生成器尝试通过自然语言句子向判别器传达一个抽象的正确性参数。我们开发了寻找该游戏近似均衡的计算过程,由此得到一种称为EQUILIBRIUM-RANKING的解码算法。我们将该算法应用于大量任务(包括阅读理解、常识推理、数学问题求解和对话),发现EQUILIBRIUM-RANKING在多个基准上持续地、有时大幅地提升性能,例如在LLaMA-7B上应用EQUILIBRIUM-RANKING可以超过规模大得多的LLaMA-65B和PaLM-540B模型。这些结果突显了游戏论工具在解决语言模型真实性与一致性等基本挑战方面的潜力。

HierarchicalContrast: A Coarse-to-Fine Contrastive Learning Framework for Cross-Domain Zero-Shot Slot Filling

  • paper_url: http://arxiv.org/abs/2310.09135
  • repo_url: https://github.com/ai-agi/hicl
  • paper_authors: Junwen Zhang, Yin Zhang
  • for: 这个论文的目的是提出一种基于层次对比学习的零shot槽填方法,以提高这种方法在未知目标领域中的泛化能力。
  • methods: 这个方法使用了一种层次对比学习的 Gaussian-distributed embedding,以学习utterance-token之间的普适深度 semantics关系。
  • results: 实验表明,提出的方法在四个数据集上实现了与当前领域内的最佳性能,或者甚至超越了现有的零shot槽填方法。
    Abstract In task-oriented dialogue scenarios, cross-domain zero-shot slot filling plays a vital role in leveraging source domain knowledge to learn a model with high generalization ability in unknown target domain where annotated data is unavailable. However, the existing state-of-the-art zero-shot slot filling methods have limited generalization ability in target domain, they only show effective knowledge transfer on seen slots and perform poorly on unseen slots. To alleviate this issue, we present a novel Hierarchical Contrastive Learning Framework (HiCL) for zero-shot slot filling. Specifically, we propose a coarse- to fine-grained contrastive learning based on Gaussian-distributed embedding to learn the generalized deep semantic relations between utterance-tokens, by optimizing inter- and intra-token distribution distance. This encourages HiCL to generalize to the slot types unseen at training phase. Furthermore, we present a new iterative label set semantics inference method to unbiasedly and separately evaluate the performance of unseen slot types which entangled with their counterparts (i.e., seen slot types) in the previous zero-shot slot filling evaluation methods. The extensive empirical experiments on four datasets demonstrate that the proposed method achieves comparable or even better performance than the current state-of-the-art zero-shot slot filling approaches.
    摘要 在任务导向对话场景中,cross-domain零shot槽填扮演着抽象知识转移的重要角色,以便在没有标注数据的未知目标领域中学习一个具有高度泛化能力的模型。然而,现有的state-of-the-art零shot槽填方法在目标领域的泛化能力有限,它们只能在已经看过的槽上显示有效的知识传递,并在未经看过的槽上表现糟糕。为了解决这一问题,我们提出了一种用于零shot槽填的新的层次对比学习框架(HiCL)。具体来说,我们提出了一种基于Gaussian分布embedding的粗到细对比学习方法,用于学习话语与Token之间的泛化深度语义关系,通过优化Token间与Token内的分布距离来鼓励HiCL泛化到训练阶段未见过的槽类型。此外,我们提出了一种新的迭代标签集语义推理方法,用于不偏向地、独立地评估在训练阶段未见过的槽类型(与已见槽类型相互纠缠)的性能。在四个数据集上的广泛实验表明,所提方法可以达到与当前state-of-the-art零shot槽填方法相当甚至更好的性能。
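
The token-level contrastive idea can be illustrated with a standard supervised InfoNCE-style loss: token embeddings sharing a slot label are treated as positives and pulled together, all others are pushed apart. This is a simplified stand-in for HiCL's coarse-to-fine objective; the Gaussian-distributed embeddings and the utterance-level (coarse) stage are omitted, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def token_contrastive_loss(embeddings, slot_labels, temperature=0.1):
    """InfoNCE-style token-level contrast: tokens with the same slot label are
    positives for each other; everything else is a negative. In practice the
    'O' (non-slot) tokens might be excluded, which this toy version does not do."""
    z = F.normalize(embeddings, dim=-1)                       # (n_tokens, dim)
    n = z.size(0)
    sim = z @ z.t() / temperature
    eye = torch.eye(n, dtype=torch.bool)
    mask_pos = (slot_labels[:, None] == slot_labels[None, :]) & ~eye
    logits = sim.masked_fill(eye, float("-inf"))              # never contrast a token with itself
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = mask_pos.sum(1)
    valid = pos_counts > 0                                    # anchors that have at least one positive
    loss = -(log_prob * mask_pos.float()).sum(1)[valid] / pos_counts[valid]
    return loss.mean()

# Toy usage: 12 token embeddings labelled with 3 slot types (0 standing in for "O").
emb = torch.randn(12, 32, requires_grad=True)
labels = torch.tensor([0, 1, 1, 0, 2, 2, 2, 0, 1, 0, 2, 1])
loss = token_contrastive_loss(emb, labels)
loss.backward()
print(float(loss))
```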

Split-and-Denoise: Protect large language model inference with local differential privacy

  • paper_url: http://arxiv.org/abs/2310.09130
  • repo_url: None
  • paper_authors: Peihua Mai, Ran Yan, Zhe Huang, Youjia Yang, Yan Pang
  • for: 这篇研究旨在提供一个可以保护用户隐私的方法来使用大型自然语言模型 (LLM),以便在不同的下游任务中使用嵌入。
  • methods: 这篇研究提出了一个名为 Split-N-Denoise (SnD) 的框架,它可以在客户端执行token嵌入层,并将随机误差引入到嵌入中以防止隐私泄露。
  • results: 实验结果显示,SnD 可以优化隐私与使用性的贡献变数,并在不同的 LLM 架构和多种下游任务中表现出色。相比基eline,SnD 可以在同等隐私预算下提供更高的性能,为用户提供一个隐私保护的解决方案。
    Abstract Large Language Models (LLMs) shows powerful capability in natural language understanding by capturing hidden semantics in vector space. This process enriches the value of the text embeddings for various downstream tasks, thereby fostering the Embedding-as-a-Service (EaaS) business model. However, the direct transmission of text to servers poses a largely unaddressed risk of privacy leakage. To mitigate this issue, we introduce Split-N-Denoise (SnD), an innovative framework that split the model to execute the token embedding layer on the client side at minimal computational cost. This allows the client to introduce noise prior to transmitting the embeddings to the server, and subsequently receive and denoise the perturbed output embeddings for downstream tasks. Our approach is designed for the inference stage of LLMs and requires no modifications to the model parameters. Extensive experiments demonstrate SnD's effectiveness in optimizing the privacy-utility tradeoff across various LLM architectures and diverse downstream tasks. The results reveal a significant performance improvement under the same privacy budget compared to the baseline, offering clients a privacy-preserving solution for local privacy protection.
    摘要
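
The client-side perturbation step of Split-N-Denoise can be sketched as: run the token-embedding layer locally, clip and noise the embeddings before they leave the device, and later denoise the returned output embeddings with a small local model. The Laplace mechanism, clipping bound, epsilon value, and the residual-style denoiser below are illustrative assumptions rather than the paper's exact calibration.

```python
import torch
import torch.nn as nn

def privatize_embeddings(token_embeddings, epsilon=8.0, clip=1.0):
    """Client-side sketch: clip each token embedding and add Laplace noise before
    sending it to the server (a simple local-DP-style perturbation; the noise
    calibration here is illustrative, not the paper's mechanism)."""
    norms = token_embeddings.norm(dim=-1, keepdim=True).clamp(min=1e-12)
    clipped = token_embeddings * (clip / norms).clamp(max=1.0)
    scale = 2.0 * clip / epsilon                     # assumed sensitivity bound
    noise = torch.distributions.Laplace(0.0, scale).sample(clipped.shape)
    return clipped + noise

class Denoiser(nn.Module):
    """Placeholder for the client-side denoiser applied to the returned
    (still perturbed) output embeddings."""
    def __init__(self, dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, noisy_output):
        return noisy_output + self.net(noisy_output)  # predict and remove residual noise

# Toy round trip: client perturbs, the "server" runs a frozen transform, client denoises.
client_tokens = torch.randn(1, 16, 768)               # one sentence, 16 token embeddings
sent = privatize_embeddings(client_tokens)
server_output = sent * 1.0                            # stand-in for the frozen server layers
denoised = Denoiser()(server_output)
print(denoised.shape)
```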

Timestamp-supervised Wearable-based Activity Segmentation and Recognition with Contrastive Learning and Order-Preserving Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.09114
  • repo_url: None
  • paper_authors: Songpengcheng Xia, Lei Chu, Ling Pei, Jiarui Yang, Wenxian Yu, Robert C. Qiu
  • for: 本研究旨在提出一种基于深度学习的同时进行人体活动 segmentation和识别的方法,以解决现有的多类窗口问题。
  • methods: 该方法使用时间批处理和深度学习方法进行人体活动识别和时间序列 segmentation,并使用时间批处理中的一个标注样本来帮助学习模型。
  • results: 对四个公共的人体活动数据集进行了广泛的实验,结果显示该方法在弱监督方法中比STATE-OF-THE-ART的方法表现更优,并且与完全监督方法具有相似的性能。
    Abstract Human activity recognition (HAR) with wearables is one of the serviceable technologies in ubiquitous and mobile computing applications. The sliding-window scheme is widely adopted while suffering from the multi-class windows problem. As a result, there is a growing focus on joint segmentation and recognition with deep-learning methods, aiming at simultaneously dealing with HAR and time-series segmentation issues. However, obtaining the full activity annotations of wearable data sequences is resource-intensive or time-consuming, while unsupervised methods yield poor performance. To address these challenges, we propose a novel method for joint activity segmentation and recognition with timestamp supervision, in which only a single annotated sample is needed in each activity segment. However, the limited information of sparse annotations exacerbates the gap between recognition and segmentation tasks, leading to sub-optimal model performance. Therefore, the prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module for well-structured embeddings. Moreover, with the optimal transport theory, our approach generates the sample-level pseudo-labels that take advantage of unlabeled data between timestamp annotations for further performance improvement. Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to the state-of-the-art weakly-supervised methods and achieves comparable performance to the fully-supervised approaches.
    摘要 人类活动识别(HAR)使用护套设备是现代 ubique 和移动计算应用中的可靠技术之一。滑块策略广泛应用,但受到多类窗口问题的困扰。为了解决这些问题,我们提出了一种新的活动分割和识别方法,使用时间戳监督,只需要每个活动分段中一个注解样本。然而,稀有的注解信息使得recognition和分割任务之间的差距加大,导致模型性能下降。因此,我们使用类活动图来估算原型,并通过最优运输理论生成时间级别的假标签,以提高表现。经过对四个公共HAR数据集的全面实验,我们发现,我们使用时间戳监督训练的模型在弱监督方法中至少与状态之前的最优方法相当,并且与完全监督方法具有相似的性能。

GLoRE: Evaluating Logical Reasoning of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09107
  • repo_url: https://github.com/csitfun/glore
  • paper_authors: Hanmeng liu, Zhiyang Teng, Ruoxi Ning, Jian Liu, Qiji Zhou, Yue Zhang
  • for: 评估大语言模型(LLM)的逻辑推理能力
  • methods: 使用12个不同类型的任务组成的General Logical Reasoning Evaluation benchmark进行评估
  • results: 比较human和监督 fine-tuning的性能,开放LLM模型的逻辑推理能力需要进一步提高,ChatGPT和GPT-4在逻辑推理能力方面表现出色,GPT-4在ChatGPT之上表现出了明显的优势。
    Abstract Recently, large language models (LLMs), including notable models such as GPT-4 and burgeoning community models, have showcased significant general language understanding abilities. However, there has been a scarcity of attempts to assess the logical reasoning capacities of these LLMs, an essential facet of natural language understanding. To encourage further investigation in this area, we introduce GLoRE, a meticulously assembled General Logical Reasoning Evaluation benchmark comprised of 12 datasets that span three different types of tasks. Our experimental results show that compared to the performance of human and supervised fine-tuning, the logical reasoning capabilities of open LLM models necessitate additional improvement; ChatGPT and GPT-4 show a strong capability of logical reasoning, with GPT-4 surpassing ChatGPT by a large margin. We propose a self-consistency probing method to enhance the accuracy of ChatGPT and a fine-tuned method to boost the performance of an open LLM. We release the datasets and evaluation programs to facilitate future research.
    摘要 现在,大型语言模型(LLM),包括知名模型GPT-4和社区模型,已经展示出了很强的通用语言理解能力。然而,对这些LLM的逻辑推理能力的评估却相对缺乏。为促进这一领域的研究,我们提出了GLoRE,一个仔细搜集的通用逻辑推理评估标准,包括12个数据集,涵盖三种不同的任务类型。我们的实验结果显示,对于开放LLM模型,逻辑推理能力仍需进一步改进,ChatGPT和GPT-4表现出了强大的逻辑推理能力,GPT-4在ChatGPT的基础上出众表现。我们提议使用自身一致探测方法来提高ChatGPT的准确性,以及使用微调方法来提高开放LLM模型的表现。我们将发布数据集和评估程序,以便未来的研究。
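
The self-consistency probing mentioned above is, in its generic form, just sampling several reasoning paths and majority-voting the extracted final answers; the agreement rate doubles as a rough confidence signal. The sketch keeps the model behind an abstract `generate` callable (no specific API is assumed), and the answer-extraction heuristic and sample count are illustrative.

```python
import random
from collections import Counter

def extract_final_answer(completion):
    """Rough heuristic: take the text after an 'Answer:' marker, else the last line."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return completion.strip().splitlines()[-1].strip()

def self_consistency_answer(prompt, generate, n_samples=10):
    """Sample several reasoning paths at a non-zero temperature and majority-vote
    the final answers. `generate` is any callable wrapping the model."""
    answers = [extract_final_answer(generate(prompt)) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples                  # answer plus agreement rate

# Toy usage with a fake sampler that is right 70% of the time.
def fake_generate(_prompt):
    return "Some reasoning...\nAnswer: valid" if random.random() < 0.7 else "Answer: invalid"

print(self_consistency_answer("Premises ... Conclusion ... Is the argument valid?", fake_generate))
```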

Privacy-Preserving Encrypted Low-Dose CT Denoising

  • paper_url: http://arxiv.org/abs/2310.09101
  • repo_url: None
  • paper_authors: Ziyuan Yang, Huijie Huangfu, Maosong Ran, Zhiwen Wang, Hui Yu, Yi Zhang
  • for: 这篇论文的目的是提出一种在加密领域进行低剂量CT影像去噪的方法,以保护用户的医疗资料隐私。
  • methods: 本论文使用同态加密技术加密私人的低剂量CT影像数据,然后将该数据传送到以明文数据训练的服务器模型进行去噪。由于深度学习中的传统运算(如卷积和线性变换)无法直接用于加密领域,论文将明文领域中的基本数学运算转换为加密领域中的对应运算。
  • results: 实验结果显示,所提方法既能保护资料隐私,又能避免服务器模型泄露。此外,论文提出了线性与非线性模型的两个互动框架,两者皆可实现无损运算。
    Abstract Deep learning (DL) has made significant advancements in tomographic imaging, particularly in low-dose computed tomography (LDCT) denoising. A recent trend involves servers training powerful models with large amounts of self-collected private data and providing application programming interfaces (APIs) for users, such as Chat-GPT. To avoid model leakage, users are required to upload their data to the server model, but this way raises public concerns about the potential risk of privacy disclosure, especially for medical data. Hence, to alleviate related concerns, in this paper, we propose to directly denoise LDCT in the encrypted domain to achieve privacy-preserving cloud services without exposing private data to the server. To this end, we employ homomorphic encryption to encrypt private LDCT data, which is then transferred to the server model trained with plaintext LDCT for further denoising. However, since traditional operations, such as convolution and linear transformation, in DL methods cannot be directly used in the encrypted domain, we transform the fundamental mathematic operations in the plaintext domain into the operations in the encrypted domain. In addition, we present two interactive frameworks for linear and nonlinear models in this paper, both of which can achieve lossless operating. In this way, the proposed methods can achieve two merits, the data privacy is well protected and the server model is free from the risk of model leakage. Moreover, we provide theoretical proof to validate the lossless property of our framework. Finally, experiments were conducted to demonstrate that the transferred contents are well protected and cannot be reconstructed. The code will be released once the paper is accepted.
    摘要 深度学习(DL)在Tomography影像中做出了重要进步,特别是在低剂量计算Tomography(LDCT)的噪声去除中。现在的趋势是,服务器会使用大量私人数据自己收集并提供应用程序编程接口(API) для用户,如Chat-GPT。为了避免模型泄露,用户需要将数据上传到服务器模型,但这会引起公众对隐私泄露的关注,特别是医疗数据。因此,在本文中,我们提议直接在加密领域进行LDCT的去噪,以实现隐私保护云服务,不曝光私人数据到服务器。为此,我们使用同行加密加密私人LDCT数据,然后将其传输到已训练的纯文本LDCT服务器模型进行进一步的去噪。但是,由于传统的DL方法中的操作,如卷积和线性变换,在加密领域中不能直接使用,我们将在纯文本领域中的基本数学操作转换到加密领域中。此外,我们在本文中提出了两种交互式框架,一种是线性模型,另一种是非线性模型,两者都可以实现无损操作。这样,我们的方法可以实现两点优点:一是保护数据隐私,二是避免服务器模型泄露。此外,我们还提供了理论证明,以 validate lossless性Property of our framework。最后,我们进行了实验,证明传输的内容是不可重构的。代码将在纸上accepted时发布。

BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis

  • paper_url: http://arxiv.org/abs/2310.11465
  • repo_url: https://github.com/abdalimran/BaitBuster-Bangla
  • paper_authors: Abdullah Al Imran, Md Sakib Hossain Shovon, M. F. Mridha
  • For: This paper is written for researchers in natural language processing and data science who are interested in advancing the modeling of clickbait phenomena in low-resource languages, specifically in Bangla.
  • Methods: The paper uses an automated process to collect a large multi-modal Bangla YouTube clickbait dataset, consisting of 253,070 data points from 58 Bangla YouTube channels. The dataset includes 18 diverse features categorized into metadata, primary content, engagement statistics, and labels for individual videos. The authors apply a rigorous preprocessing step to denoise, deduplicate, and remove bias from the features.
  • Results: The paper presents the largest and most robust clickbait corpus in Bangla to date, providing significant value for researchers seeking to advance the modeling of clickbait phenomena in low-resource languages. The multi-modal nature of the dataset allows for comprehensive analyses of clickbait across content, user interactions, and linguistic dimensions, enabling the development of more sophisticated detection methods with cross-linguistic applications.
    Abstract This study presents a large multi-modal Bangla YouTube clickbait dataset consisting of 253,070 data points collected through an automated process using the YouTube API and Python web automation frameworks. The dataset contains 18 diverse features categorized into metadata, primary content, engagement statistics, and labels for individual videos from 58 Bangla YouTube channels. A rigorous preprocessing step has been applied to denoise, deduplicate, and remove bias from the features, ensuring unbiased and reliable analysis. As the largest and most robust clickbait corpus in Bangla to date, this dataset provides significant value for natural language processing and data science researchers seeking to advance modeling of clickbait phenomena in low-resource languages. Its multi-modal nature allows for comprehensive analyses of clickbait across content, user interactions, and linguistic dimensions to develop more sophisticated detection methods with cross-linguistic applications.
    摘要 Translation in Simplified Chinese:这个研究提供了一个大型多模态孔雀 YouTube 数据集,包含 253,070 个数据点,通过自动化过程使用 YouTube API 和 Python 网络自动化框架收集。数据集包含 18 种多样的特征,分为元数据、主要内容、参与统计和标签,来自 58 个孔雀 YouTube 频道。经过严格的预处理步骤,以消除噪声、重复和偏见,保证了不受偏见的分析。作为目前最大和最可靠的孔雀数据集之一,这个数据集为natural language processing 和数据科学研究人员提供了丰富的价值,以提高 clicks 现象的模型化。数据集的多模态性允许对 clicks 进行全面的分析,包括内容、用户互动和语言维度,以开发更加复杂的检测方法,并在不同语言之间进行跨语言应用。

Insightful analysis of historical sources at scales beyond human capabilities using unsupervised Machine Learning and XAI

  • paper_url: http://arxiv.org/abs/2310.09091
  • repo_url: None
  • paper_authors: Oliver Eberle, Jochen Büttner, Hassan El-Hajj, Grégoire Montavon, Klaus-Robert Müller, Matteo Valleriani
  • for: 这篇论文旨在利用人工智能技术对历史材料进行深入分析,以探讨知识演化和传播的历史发展。
  • methods: 该研究使用创新的机器学习技术对历史材料进行分析,以获取对 mathematical astronomy 领域知识演化和创新的深入理解。
  • results: 研究发现,在欧洲大学教学中使用的 astronomy 教科书在15世纪至17世纪期间经历了重要的发展和变革,这些变革反映了当时科学和技术的进步。
    Abstract Historical materials are abundant. Yet, piecing together how human knowledge has evolved and spread both diachronically and synchronically remains a challenge that can so far only be very selectively addressed. The vast volume of materials precludes comprehensive studies, given the restricted number of human specialists. However, as large amounts of historical materials are now available in digital form there is a promising opportunity for AI-assisted historical analysis. In this work, we take a pivotal step towards analyzing vast historical corpora by employing innovative machine learning (ML) techniques, enabling in-depth historical insights on a grand scale. Our study centers on the evolution of knowledge within the `Sacrobosco Collection' -- a digitized collection of 359 early modern printed editions of textbooks on astronomy used at European universities between 1472 and 1650 -- roughly 76,000 pages, many of which contain astronomic, computational tables. An ML based analysis of these tables helps to unveil important facets of the spatio-temporal evolution of knowledge and innovation in the field of mathematical astronomy in the period, as taught at European universities.
    摘要 历史资料丰富,但是根据时间和空间方面的研究仍然是一项挑战,因为历史材料的量太大,人工研究者的数量有限。然而,现在历史材料大量化得到了数字化的形式,这对人工智能助け进行历史分析提供了一个有前途的机会。在这项工作中,我们采用了创新的机器学习(ML)技术,以深入分析历史大量数据,从而获得深刻的历史认识。我们的研究中心于“萨克罗贝斯科学馆”——一个1472年至1650年期间的欧洲大学天文学教科书359个数字化版本——约76000页,大多数页面包含天文、计算表格。通过ML分析这些表格,我们可以揭示天文学领域在这一时期的知识和创新的空间和时间发展。

Dialect Transfer for Swiss German Speech Translation

  • paper_url: http://arxiv.org/abs/2310.09088
  • repo_url: None
  • paper_authors: Claudio Paonessa, Yanick Schraner, Jan Deriu, Manuela Hürlimann, Manfred Vogel, Mark Cieliebak
  • for: 这个研究探讨瑞士德语自然语言处理系统的建构困难,具体关注瑞士德语方言的多样性和标准德语之间的差异对瑞士德语自然语言处理系统的性能产生的影响。
  • methods: 该研究使用了多种方法,包括语音识别和翻译模型的训练,以及对不同方言和标准德语之间的语言差异进行分析。
  • results: 研究发现,包括方言在训练中的影响和标准德语和瑞士德语之间的语言差异都会对瑞士德语自然语言处理系统的性能产生负面的影响,这与语言学理论预测相符。
    Abstract This paper investigates the challenges in building Swiss German speech translation systems, specifically focusing on the impact of dialect diversity and differences between Swiss German and Standard German. Swiss German is a spoken language with no formal writing system, it comprises many diverse dialects and is a low-resource language with only around 5 million speakers. The study is guided by two key research questions: how does the inclusion and exclusion of dialects during the training of speech translation models for Swiss German impact the performance on specific dialects, and how do the differences between Swiss German and Standard German impact the performance of the systems? We show that dialect diversity and linguistic differences pose significant challenges to Swiss German speech translation, which is in line with linguistic hypotheses derived from empirical investigations.
    摘要

A ML-LLM pairing for better code comment classification

  • paper_url: http://arxiv.org/abs/2310.10275
  • repo_url: None
  • paper_authors: Hanna Abi Akl
  • for: 这篇论文主要是为了解决代码注释分类问题,即将代码片段与相关的注释进行评估,以确定其是否有助于理解相关代码。
  • methods: 这篇论文使用了经典机器学习系统和大型自然语言模型(LLM)的组合来评估代码注释分类器的性能。
  • results: 这篇论文的最佳模型(一个神经网络)在提供的种子数据上达到了macro-F1分数为88.401%,并在使用LLM生成的数据上实现了1.5%的性能提升。
    Abstract The "Information Retrieval in Software Engineering (IRSE)" at FIRE 2023 shared task introduces code comment classification, a challenging task that pairs a code snippet with a comment that should be evaluated as either useful or not useful to the understanding of the relevant code. We answer the code comment classification shared task challenge by providing a two-fold evaluation: from an algorithmic perspective, we compare the performance of classical machine learning systems and complement our evaluations from a data-driven perspective by generating additional data with the help of large language model (LLM) prompting to measure the potential increase in performance. Our best model, which took second place in the shared task, is a Neural Network with a Macro-F1 score of 88.401% on the provided seed data and a 1.5% overall increase in performance on the data generated by the LLM.
    摘要 “信息检索在软件工程(IRSE)”在FIRE 2023 共同任务中引入代码注释分类,这是一项具有挑战性的任务,将代码段与一个用于理解相关代码的评价注释进行对比。我们回答这个共同任务挑战,从算法角度进行了两重评估:一是通过比较传统机器学习系统的性能,二是通过生成额外数据来补充评估,使用大型自然语言模型(LLM)的推荐来测试可能的性能提高。我们的最佳模型在提供的种子数据上获得了88.401%的macro-F1分数,并在使用LLM生成的数据上实现了1.5%的总体性能提高。
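
The classical-ML side of such a pairing is typically a text classifier over the comment and its code context; LLM-generated rows can simply be appended to the training set before fitting. Below is a minimal scikit-learn sketch with TF-IDF features and logistic regression, evaluated with macro-F1. The "[SEP]" joining convention and the toy rows are assumptions, and the shared-task system's best model was a neural network, which is not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Tiny hand-made stand-ins for (comment [SEP] code, label) rows; real seed data and
# any LLM-generated rows would simply be appended to the training lists.
train_texts = [
    "returns the index of the item [SEP] def find(xs, v): return xs.index(v)",
    "TODO remove this later [SEP] x = x + 1",
    "computes factorial recursively [SEP] def fact(n): return 1 if n < 2 else n*fact(n-1)",
    "temporary hack [SEP] y = y",
]
train_labels = ["useful", "not useful", "useful", "not useful"]
test_texts = ["explains the retry loop [SEP] for i in range(3): call()"]
test_labels = ["useful"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_texts, train_labels)
pred = clf.predict(test_texts)
print(pred, f1_score(test_labels, pred, average="macro"))
```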

ImageManip: Image-based Robotic Manipulation with Affordance-guided Next View Selection

  • paper_url: http://arxiv.org/abs/2310.09069
  • repo_url: None
  • paper_authors: Xiaoqi Li, Yanzi Wang, Yan Shen, Ponomarenko Iaroslav, Haoran Lu, Qianxu Wang, Boshi An, Jiaming Liu, Hao Dong
  • for: 这个论文的目的是为了解决未来家庭助手机器人中3D各种物体抓取操作的问题,以便让机器人与环境进行交互。
  • methods: 这个论文使用了一种新的图像基于的机器人操作框架,这个框架可以捕捉目标物体的多个视角,并使用这些视角来推算物体的深度信息。
  • results: 与之前使用点云或RGB图像作为输入的方法相比,这个方法更有效率和实用。在实际世界实验中,这个方法也表现出了优秀的实际应用潜力。
    Abstract In the realm of future home-assistant robots, 3D articulated object manipulation is essential for enabling robots to interact with their environment. Many existing studies make use of 3D point clouds as the primary input for manipulation policies. However, this approach encounters challenges due to data sparsity and the significant cost associated with acquiring point cloud data, which can limit its practicality. In contrast, RGB images offer high-resolution observations using cost effective devices but lack spatial 3D geometric information. To overcome these limitations, we present a novel image-based robotic manipulation framework. This framework is designed to capture multiple perspectives of the target object and infer depth information to complement its geometry. Initially, the system employs an eye-on-hand RGB camera to capture an overall view of the target object. It predicts the initial depth map and a coarse affordance map. The affordance map indicates actionable areas on the object and serves as a constraint for selecting subsequent viewpoints. Based on the global visual prior, we adaptively identify the optimal next viewpoint for a detailed observation of the potential manipulation success area. We leverage geometric consistency to fuse the views, resulting in a refined depth map and a more precise affordance map for robot manipulation decisions. By comparing with prior works that adopt point clouds or RGB images as inputs, we demonstrate the effectiveness and practicality of our method. In the project webpage (https://sites.google.com/view/imagemanip), real world experiments further highlight the potential of our method for practical deployment.
    摘要 在未来家庭助手机器人领域,3D人工物体操作是必备的,以允许机器人与其环境互动。许多现有研究利用3D点云作为操作策略的主要输入,但这种方法会遇到数据稀缺和获取点云数据的高成本问题,限制其实用性。相比之下,RGB图像可以提供高分辨率的观察,使用成本效益的设备,但缺乏空间3D geometric信息。为了超越这些限制,我们提出了一种新的图像基于的机器人操作框架。这个框架是设计用于捕捉目标对象多个视角,并从图像中INFER深度信息来补充其几何。首先,系统使用RGB摄像头捕捉目标对象的总观察图像,预测初始深度地图和一个粗略的可行地图。可行地图表示对象上可行的操作区域,并作为制约选择后续视点的约束。基于全局视觉优先级,我们适应地选择最佳的后续视点,以进行细化的观察和检查操作成功区域。我们利用几何一致性来融合视图,从而获得了更精确的深度地图和更加准确的可行地图,以便机器人操作决策。与使用点云或RGB图像作为输入的先前研究相比,我们展示了我们的方法的有效性和实用性。在项目页面(https://sites.google.com/view/imagemanip)上,实际世界实验进一步强调了我们方法的实际应用潜力。

DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control

  • paper_url: http://arxiv.org/abs/2310.09053
  • repo_url: https://github.com/kevinhuang8/datt
  • paper_authors: Kevin Huang, Rwik Rana, Alexander Spitzer, Guanya Shi, Byron Boots
  • for: 精准控制四旋翼执行复杂的轨迹,尤其是在实际世界中遇到大规模干扰时。
  • methods: DATT 使用了学习基础的 feedforward-feedback-adaptive 控制结构,在模拟中使用了回归学习训练。在真实硬件上, DATT 添加了 L1 适应控制,在关闭Loop中实现了不需要细调整。
  • results: DATT 在实际世界中显著优于竞争的自适应非线性控制器和模型预测控制器,包括一些基线完全失败的高难度场景。此外, DATT 的在线推理时间小于 3.2 ms,不到自适应非线性模型预测控制基线的 1/4。
    Abstract Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds on a novel feedforward-feedback-adaptive control structure trained in simulation using reinforcement learning. When deployed on real hardware, DATT is augmented with a disturbance estimator using L1 adaptive control in closed-loop, without any fine-tuning. DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields, including challenging scenarios where baselines completely fail. Moreover, DATT can efficiently run online with an inference time less than 3.2 ms, less than 1/4 of the adaptive nonlinear model predictive control baseline
    摘要 精准的旋翼机 quadrotor trajectory tracking 是一个挑战,因为 unknown nonlinear dynamics,trajectory infeasibility,和 actuation limits。为解决这些挑战,我们提出了 Deep Adaptive Trajectory Tracking (DATT),一种学习基于的控制方法,可以在实际世界中精准跟踪任意可能不可能的 trajectory,并在大型干扰下进行适应。DATT 基于一种新的 feedforward-feedback-adaptive control structure,在 simulation 中使用 reinforcement learning 进行训练。在实际硬件上部署 DATT 时,采用 L1 adaptive control 的干扰估计,在关闭控制 loop 中进行适应,无需任何细调。相比同类的 adaptive nonlinear 和 model predictive controllers,DATT 在不可能的 trajectory 和风场中表现出色,包括一些挑战的情况,baseline 完全失败。此外,DATT 可以在线上高效运行,推理时间 less than 3.2 ms,less than 1/4 of the adaptive nonlinear model predictive control baseline。

SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network

  • paper_url: http://arxiv.org/abs/2310.09049
  • repo_url: None
  • paper_authors: Lei Yao, Yong Zhang, Zilong Yan, Jialu Tian
  • For: solves complex AI tasks in intelligent mobile networks, such as network optimization and resource allocation.
  • Methods: leverages Large Language Models (LLMs) and JSON-format intent-based input to connect a self-designed model library and database, and uses model cards to pairwise match between different modules for model composition.
  • Results: achieves impressive results in completing numerous complex AI tasks in the communication network, leveraging the language capabilities of LLMs and the abundant AI models in the model library.
    Abstract In the rapid development of artificial intelligence, solving complex AI tasks is a crucial technology in intelligent mobile networks. Despite the good performance of specialized AI models in intelligent mobile networks, they are unable to handle complicated AI tasks. To address this challenge, we propose Systematic Artificial Intelligence (SAI), which is a framework designed to solve AI tasks by leveraging Large Language Models (LLMs) and JSON-format intent-based input to connect self-designed model library and database. Specifically, we first design a multi-input component, which simultaneously integrates Large Language Models (LLMs) and JSON-format intent-based inputs to fulfill the diverse intent requirements of different users. In addition, we introduce a model library module based on model cards which employ model cards to pairwise match between different modules for model composition. Model cards contain the corresponding model's name and the required performance metrics. Then when receiving user network requirements, we execute each subtask for multiple selected model combinations and provide output based on the execution results and LLM feedback. By leveraging the language capabilities of LLMs and the abundant AI models in the model library, SAI can complete numerous complex AI tasks in the communication network, achieving impressive results in network optimization, resource allocation, and other challenging tasks.
    摘要 在人工智能的快速发展中,解决复杂的人工智能任务是智能移动网络中关键的技术。尽管特殊的人工智能模型在智能移动网络中表现良好,但它们无法处理复杂的人工智能任务。为解决这个挑战,我们提议系统人工智能(SAI),它是一个基于大语言模型(LLM)和JSON格式意图输入的框架,用于解决人工智能任务。 Specifically, we first design a multi-input component that simultaneously integrates LLMs and JSON-format intent-based inputs to fulfill the diverse intent requirements of different users. In addition, we introduce a model library module based on model cards, which employ model cards to pairwise match between different modules for model composition. Model cards contain the corresponding model's name and the required performance metrics. When receiving user network requirements, we execute each subtask for multiple selected model combinations and provide output based on the execution results and LLM feedback. By leveraging the language capabilities of LLMs and the abundant AI models in the model library, SAI can complete numerous complex AI tasks in the communication network, achieving impressive results in network optimization, resource allocation, and other challenging tasks.
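
The model-card matching step described above can be pictured as a small lookup-and-rank routine: parse the JSON intent, filter the model library by task, and rank candidates by the metric named in the intent. The card fields, intent schema, and higher-is-better metric assumption in the sketch below are illustrative; SAI's actual LLM-driven composition is considerably richer.

```python
import json

# Toy model library: each "model card" lists the task it serves and a headline metric.
MODEL_CARDS = [
    {"name": "traffic-forecaster-v2",  "task": "traffic_prediction",  "metric": {"accuracy": 0.91}},
    {"name": "rb-allocator-drl",       "task": "resource_allocation", "metric": {"reward": 0.87}},
    {"name": "rb-allocator-heuristic", "task": "resource_allocation", "metric": {"reward": 0.71}},
]

def plan_from_intent(intent_json):
    """Match each requested sub-task in a JSON intent against the model cards and pick
    the best-scoring card per sub-task (a much simplified stand-in for SAI's
    LLM-driven composition; the field names are assumptions)."""
    intent = json.loads(intent_json)
    plan = []
    for subtask in intent["subtasks"]:
        candidates = [c for c in MODEL_CARDS if c["task"] == subtask["task"]]
        if not candidates:
            plan.append({"task": subtask["task"], "model": None})
            continue
        metric = subtask.get("rank_by", next(iter(candidates[0]["metric"])))
        # higher-is-better is assumed for the ranking metric in this toy version
        best = max(candidates, key=lambda c: c["metric"].get(metric, float("-inf")))
        plan.append({"task": subtask["task"], "model": best["name"]})
    return plan

intent = json.dumps({"goal": "reduce congestion in sector 12",
                     "subtasks": [{"task": "traffic_prediction"},
                                  {"task": "resource_allocation", "rank_by": "reward"}]})
print(plan_from_intent(intent))
```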

KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection

  • paper_url: http://arxiv.org/abs/2310.09044
  • repo_url: https://github.com/hkust-knowcomp/knowledge-constrained-decoding
  • paper_authors: Sehyun Choi, Tianqing Fang, Zhaowei Wang, Yangqiu Song
  • for: 降低大型自然语言模型(LLM)发布时的假信息风险。
  • methods: 使用知识约束搜索(KCTS)方法,guide 冻结的LM在每个解码步骤生成文本,并使用知识分类器得分和Monte-Carlo Tree Search(MCTS)来帮助模型遵循知识。
  • results: 在知识底据对话和摘要SUMMARY中,KCTS显示出了减少假信息的能力,并且可以作为一个插件和模型无关的解码方法。
    Abstract Large Language Models (LLMs) have demonstrated remarkable human-level natural language generation capabilities. However, their potential to generate misinformation, often called the hallucination problem, poses a significant risk to their deployment. A common approach to address this issue is to retrieve relevant knowledge and fine-tune the LLM with the knowledge in its input. Unfortunately, this method incurs high training costs and may cause catastrophic forgetting for multi-tasking models. To overcome these limitations, we propose a knowledge-constrained decoding method called KCTS (Knowledge-Constrained Tree Search), which guides a frozen LM to generate text aligned with the reference knowledge at each decoding step using a knowledge classifier score and MCTS (Monte-Carlo Tree Search). To adapt the sequence-level knowledge classifier to token-level guidance, we also propose a novel token-level hallucination detection method called RIPA (Reward Inflection Point Approximation). Our empirical results on knowledge-grounded dialogue and abstractive summarization demonstrate the strength of KCTS as a plug-and-play, model-agnostic decoding method that can effectively reduce hallucinations in natural language generation.
    摘要 大型自然语言模型(LLM)已经展示了人类水平的自然语言生成能力。然而,它们的潜在发生假信息(也称为幻觉问题)可能会对其部署 pose significative 风险。一种常见的方法来解决这个问题是通过检索相关知识并使用输入知识进行精度调整。然而,这种方法可能会产生高训练成本并可能导致多任务模型的彻底忘记。为了超越这些限制,我们提议一种受知识约束的解码方法 called KCTS(受知识约束搜索),它使用知识分类器分数和MCST(Monte-Carlo搜索)引导冻结的LM生成文本,并在每个解码步骤都保持文本与参考知识的一致性。为了将序列级别的知识分类器转化为token级别的导航,我们还提出了一种新的吞吐量幻觉检测方法 called RIPA(奖资点附近拟合)。我们的实验结果表明,KCTS可以作为一个插件和模型无关的解码方法,有效地减少自然语言生成中的幻觉。
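
The knowledge-constrained scoring idea can be shown in a single decoding step: take the language model's top-k next tokens, rescore each candidate continuation with a knowledge-faithfulness classifier, and keep the token that maximizes the combined score. KCTS explores these candidates with Monte-Carlo Tree Search and uses the RIPA token-level detector; the greedy re-weighting below, with toy stand-ins for the LM and classifier, only illustrates the scoring, not the full method.

```python
import math

def knowledge_guided_step(prefix, knowledge, lm_topk, faithfulness, alpha=1.0, k=5):
    """One decoding step sketch: rescore the LM's top-k next tokens by
    log p_LM + alpha * log p_faithful and pick the best-scoring token."""
    best_tok, best_score = None, -math.inf
    for tok, logp in lm_topk(prefix, k):
        candidate = prefix + [tok]
        score = logp + alpha * math.log(max(faithfulness(candidate, knowledge), 1e-9))
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok

# Toy stand-ins: the "LM" likes 'paris' and 'rome' equally; the knowledge
# classifier only trusts continuations consistent with the reference text.
def toy_lm_topk(prefix, k):
    return [("paris", math.log(0.4)), ("rome", math.log(0.4)), ("a", math.log(0.2))][:k]

def toy_faithfulness(candidate_tokens, knowledge):
    return 0.95 if all(t in knowledge or t == "a" for t in candidate_tokens) else 0.05

knowledge = {"the", "capital", "of", "france", "is", "paris"}
prefix = ["the", "capital", "of", "france", "is"]
print(knowledge_guided_step(prefix, knowledge, toy_lm_topk, toy_faithfulness))
```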

Optimal Scheduling of Electric Vehicle Charging with Deep Reinforcement Learning considering End Users Flexibility

  • paper_url: http://arxiv.org/abs/2310.09040
  • repo_url: None
  • paper_authors: Christoforos Menos-Aikateriniadis, Stavros Sykiotis, Pavlos S. Georgilakis
  • for: 这篇论文的目的是为了找出家庭的电动车费用优化充电策略,以减少家庭用电费用。
  • methods: 这篇论文使用了深度强化学习(DQN)来实现家庭的充电策略,并将历史资料分析用于推断用户的可动性潜力。
  • results: 根据这篇论文的结果,使用DQN实现的家庭充电策略可以实现更过20%的用户电费优化。
    Abstract The rapid growth of decentralized energy resources and especially Electric Vehicles (EV), that are expected to increase sharply over the next decade, will put further stress on existing power distribution networks, increasing the need for higher system reliability and flexibility. In an attempt to avoid unnecessary network investments and to increase the controllability over distribution networks, network operators develop demand response (DR) programs that incentivize end users to shift their consumption in return for financial or other benefits. Artificial intelligence (AI) methods are in the research forefront for residential load scheduling applications, mainly due to their high accuracy, high computational speed and lower dependence on the physical characteristics of the models under development. The aim of this work is to identify households' EV cost-reducing charging policy under a Time-of-Use tariff scheme, with the use of Deep Reinforcement Learning, and more specifically Deep Q-Networks (DQN). A novel end users flexibility potential reward is inferred from historical data analysis, where households with solar power generation have been used to train and test the designed algorithm. The suggested DQN EV charging policy can lead to more than 20% of savings in end users electricity bills.
    摘要 随着分布式能源资源,特别是电动汽车(EV)的快速增长(预计将在下一个十年内大幅增加),现有的配电网络将面临更大的压力,从而需要更高的系统可靠性和灵活性。为了避免不必要的网络投资并提高配电网络的可控性,网络运营商开发了各种需求响应(DR)计划,激励终端用户调整用电,以换取金融或其他优惠。人工智能(AI)技术在家庭负荷调度应用中处于研究前沿,主要因为它们具有高准确率、高计算速度以及对模型物理特性的低依赖性。本文的目的是在分时电价机制下,利用深度强化学习(深度Q网络,DQN)确定家庭EV的省钱充电策略。通过历史数据分析,我们推断出一种新的终端用户灵活性潜力奖励,并以带有光伏发电的家庭数据训练和测试所设计的算法。结果显示,所提出的DQN充电策略可以帮助用户节省超过20%的电费。
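
The scheduling problem itself is easy to prototype: under a Time-of-Use tariff, an agent decides each hour whether to charge, pays the hourly price for the delivered energy, and is penalized if the battery is not full by the departure time. The sketch below uses tabular Q-learning on a toy tariff and a single charging need instead of the paper's DQN and learned flexibility reward, so all numbers (prices, power, deadline, penalty) are illustrative.

```python
import numpy as np

# Toy Time-of-Use tariff (price per kWh for each hour) and a 6 kWh charging need.
PRICE = np.array([0.10] * 7 + [0.30] * 9 + [0.45] * 5 + [0.10] * 3)  # 24 hourly prices
NEED_KWH, POWER_KW, DEADLINE = 6, 3, 8          # charging must finish by hour 8

def step(hour, remaining, action):
    """Environment sketch: action 1 = charge for one hour at POWER_KW, 0 = idle."""
    energy = min(POWER_KW, remaining) if action == 1 else 0
    cost = energy * PRICE[hour]
    remaining -= energy
    hour = (hour + 1) % 24
    penalty = 5.0 * remaining if hour == DEADLINE else 0.0   # unmet need at departure
    return hour, remaining, -(cost + penalty)

# Tabular Q-learning (the paper uses a DQN; a table suffices for this tiny state space).
Q = np.zeros((24, NEED_KWH + 1, 2))
eps, lr, gamma = 0.1, 0.1, 0.99
rng = np.random.default_rng(0)
for _ in range(20000):
    hour, remaining = 22, NEED_KWH              # EV plugged in at 22:00 each "day"
    while hour != DEADLINE:
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[hour, remaining].argmax())
        nh, nr, r = step(hour, remaining, a)
        target = r + (0.0 if nh == DEADLINE else gamma * Q[nh, nr].max())
        Q[hour, remaining, a] += lr * (target - Q[hour, remaining, a])
        hour, remaining = nh, nr

# Greedy policy after training: charging should be pushed into the cheap hours.
hour, remaining, schedule = 22, NEED_KWH, []
while hour != DEADLINE:
    a = int(Q[hour, remaining].argmax())
    schedule.append((hour, a))
    hour, remaining, _ = step(hour, remaining, a)
print(schedule)
```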

Subspace Adaptation Prior for Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2310.09028
  • repo_url: https://github.com/mikehuisman/subspace-adaptation-prior
  • paper_authors: Mike Huisman, Aske Plaat, Jan N. van Rijn
  • for: 这篇论文的目的是提出一种新的meta-learning算法,以提高几何学习的效率和稳定性。
  • methods: 这篇论文使用了gradient-based meta-learning方法,并将内置层的参数分为多个子空间,以适应不同的任务分布。
  • results: 这篇论文在几何学习中获得了superior或相当的表现,并且分析过learn的子空间,发现低维操作可以导致高活动强度,这可能是实现好几何学习表现的关键。
    Abstract Gradient-based meta-learning techniques aim to distill useful prior knowledge from a set of training tasks such that new tasks can be learned more efficiently with gradient descent. While these methods have achieved successes in various scenarios, they commonly adapt all parameters of trainable layers when learning new tasks. This neglects potentially more efficient learning strategies for a given task distribution and may be susceptible to overfitting, especially in few-shot learning where tasks must be learned from a limited number of examples. To address these issues, we propose Subspace Adaptation Prior (SAP), a novel gradient-based meta-learning algorithm that jointly learns good initialization parameters (prior knowledge) and layer-wise parameter subspaces in the form of operation subsets that should be adaptable. In this way, SAP can learn which operation subsets to adjust with gradient descent based on the underlying task distribution, simultaneously decreasing the risk of overfitting when learning new tasks. We demonstrate that this ability is helpful as SAP yields superior or competitive performance in few-shot image classification settings (gains between 0.1% and 3.9% in accuracy). Analysis of the learned subspaces demonstrates that low-dimensional operations often yield high activation strengths, indicating that they may be important for achieving good few-shot learning performance. For reproducibility purposes, we publish all our research code publicly.
    摘要 Gradient-based meta-learning技术目的是从训练任务集中提取有用的先前知识,以便使用梯度下降来更有效地学习新任务。而这些方法通常会在学习新任务时适应所有可变参数的trainable层,这可能导致忽略有效的学习策略,特别是在几何学习中,任务必须从有限数量的示例中学习。为解决这些问题,我们提出了Subspace Adaptation Prior(SAP),一种新的梯度基于的meta-学习算法,它同时学习好的初始化参数(先前知识)和层wise参数的子空间,这种子空间是操作subset的形式,这些操作subset应该是可适应的。因此,SAP可以根据下一个任务的分布来学习哪些操作subset应该通过梯度下降进行调整,从而降低学习新任务时的风险过拟合。我们的实验表明,这种能力有助于SAP在几何学习中提供了优于或与其他方法相当的性能(准确率提高0.1%-3.9%)。分析学习的子空间表明,低维操作经常产生高活动强度,这可能表示它们在几何学习中具有重要作用。为了保证复制性,我们将所有研究代码公开发布。

A Spatial-Temporal Dual-Mode Mixed Flow Network for Panoramic Video Salient Object Detection

  • paper_url: http://arxiv.org/abs/2310.09016
  • repo_url: None
  • paper_authors: Xiaolei Chen, Pengcheng Zhang, Zelong Du, Ishfaq Ahmad
  • for: 这个研究旨在提高拼接影像中的焦点物体检测精度。
  • methods: 本研究提出了一个具有层间注意(ILA)模组、层间重量(ILW)模组和二重注意(BMA)模组的混合流网络(STDMMF-Net),利用拼接影像的空间流和相应流进行焦点物体检测。
  • results: 实验结果显示,提案方法比state-of-the-art(SOTA)方法更高的检测精度,且在内存需求、测试时间、复杂度和泛化性方面表现更好。
    Abstract Salient object detection (SOD) in panoramic video is still in the initial exploration stage. The indirect application of 2D video SOD method to the detection of salient objects in panoramic video has many unmet challenges, such as low detection accuracy, high model complexity, and poor generalization performance. To overcome these hurdles, we design an Inter-Layer Attention (ILA) module, an Inter-Layer weight (ILW) module, and a Bi-Modal Attention (BMA) module. Based on these modules, we propose a Spatial-Temporal Dual-Mode Mixed Flow Network (STDMMF-Net) that exploits the spatial flow of panoramic video and the corresponding optical flow for SOD. First, the ILA module calculates the attention between adjacent level features of consecutive frames of panoramic video to improve the accuracy of extracting salient object features from the spatial flow. Then, the ILW module quantifies the salient object information contained in the features of each level to improve the fusion efficiency of the features of each level in the mixed flow. Finally, the BMA module improves the detection accuracy of STDMMF-Net. A large number of subjective and objective experimental results testify that the proposed method demonstrates better detection accuracy than the state-of-the-art (SOTA) methods. Moreover, the comprehensive performance of the proposed method is better in terms of memory required for model inference, testing time, complexity, and generalization performance.
    摘要 Salient object detection (SOD) in panoramic video 是一个初级的探索阶段。直接将二维视频 SOD 方法应用于扩展视频中的对象检测存在许多不满情况,如低检测精度、高模型复杂度和差异性表现不佳。为了解决这些障碍,我们设计了一个层间注意力(ILA)模块、层间重量(ILW)模块和二重模式注意力(BMA)模块。基于这些模块,我们提议一种空间-时间二重混合流网络(STDMMF-Net),利用扩展视频的空间流和相应的运动流进行 SOD。首先,ILA 模块在连续帧中的adjacent层特征之间进行注意力计算,以提高扩展视频中对象特征的抽象精度。然后,ILW 模块评估每个层特征中的突出对象信息的权重,以提高各层特征的混合效率。最后,BMA 模块提高 STDMMF-Net 的检测精度。大量主观和客观实验结果表明,我们提议的方法在检测精度方面超过了当前最佳方法(SOTA)。此外,我们的方法在内存需求、测试时间、复杂度和总体性方面具有更好的综合性能。

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

  • paper_url: http://arxiv.org/abs/2310.08992
  • repo_url: None
  • paper_authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty
  • for: 提高 LLM 解决复杂编程任务的能力,帮助它们储存和重用已有的模块。
  • methods: 提出 CodeChain 框架,通过自我修订链来引导 LLM 生成归纳化代码。
  • results: 在 APPS 和 CodeContests 上实现了相对 pass@1 提升率为 35% 和 76%,并在 OpenAI LLM 和 WizardCoder 上得到了良好的效果。
    Abstract Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. On the other hand, experienced programmers instinctively write modularized code with abstraction for solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions, each being guided by some representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized codes through chain-of-thought prompting. Then it applies a chain of self-revisions by iterating the two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and re-usable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module-implementations and instructing the LLM to re-generate new modularized solutions. We find that by naturally encouraging the LLM to reuse the previously developed and verified sub-modules, CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective on both OpenAI LLMs as well as open-sourced LLMs like WizardCoder. We also conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
    摘要 具体来说,CodeChain 首先要求 LLM 通过链式思维提示生成分解式代码。然后,它会在多个自我修改之后对生成的代码进行修改,以确保代码的幂等性和正确性。在每一次自我修改之前,CodeChain 会将生成的代码分解成多个子模块,并选择这些子模块的表示性较强的实现作为下一次生成的基础。在下一次生成的过程中,CodeChain 会将这些选择的子模块作为预处理的输入,以便 LLM 可以更好地 reuse 这些已经开发过的模块。我们发现,通过自然地促使 LLM reuse 已经开发过的和验证过的子模块,CodeChain 可以显著提高代码的幂等性和正确性,在 APPS 和 CodeContests 等场景中实现相对的 pass@1 提升率为 35% 和 76%。它可以在 OpenAI LLMs 上以及开源 LLMs 如 WizardCoder 上实现优秀的效果。我们还进行了详细的减少研究,包括不同的提示方法、群集数量、模型大小、程序质量等方面,以提供有用的减少。通过这些研究,我们发现 CodeChain 的成功归功于它的自然的链式思维机制和可 reuse 的子模块。
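
The extract-cluster-reuse loop at the heart of CodeChain can be sketched with off-the-shelf tools: collect the sub-modules (e.g., `def` blocks) from previously sampled solutions, cluster them, pick the member nearest each cluster centre as the representative, and splice the representatives into the next-round prompt. TF-IDF plus k-means and the prompt wording below are simplifications of the paper's actual embedding, clustering, and instruction format.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def pick_representative_modules(sub_modules, n_clusters=2):
    """Cluster sub-modules harvested from earlier samples and return the member
    closest to each cluster centre as the reusable 'representative'."""
    vec = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")
    X = vec.fit_transform(sub_modules)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        reps.append(sub_modules[members[dists.argmin()]])
    return reps

def build_revision_prompt(task, representatives):
    """Augment the original instruction with the selected modules and ask the LLM to
    regenerate a modular solution that reuses them (prompt wording is illustrative)."""
    parts = [f"Task:\n{task}\n", "You may reuse these previously generated modules:"]
    parts += [f"--- module ---\n{m}" for m in representatives]
    parts.append("Write a new modular solution that reuses the modules above where helpful.")
    return "\n".join(parts)

# Toy usage: sub-modules harvested (e.g., by parsing `def` blocks) from earlier samples.
modules = [
    "def read_ints(line):\n    return list(map(int, line.split()))",
    "def parse_ints(s):\n    return [int(x) for x in s.split()]",
    "def max_subarray(a):\n    best = cur = a[0]\n    ...",
    "def kadane(a):\n    best = cur = a[0]\n    ...",
]
print(build_revision_prompt("Find the maximum subarray sum.", pick_representative_modules(modules)))
```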

Reroute Prediction Service

  • paper_url: http://arxiv.org/abs/2310.08988
  • repo_url: None
  • paper_authors: Ítalo Romani de Oliveira, Samet Ayhan, Michael Biglin, Pablo Costas, Euclides C. Pinto Neto
  • for: 降低美国国家航空系统中的延迟,通过灵活支持重新路径决策。
  • methods: 使用历史重新路径数据和天气数据,通过数据分析和机器学习算法来预测重新路径建议。
  • results: 实现了高于90%的准确率。
    Abstract The cost of delays was estimated as 33 billion US dollars only in 2019 for the US National Airspace System, a peak value following a growth trend in past years. Aiming to address this huge inefficiency, we designed and developed a novel Data Analytics and Machine Learning system, which aims at reducing delays by proactively supporting re-routing decisions. Given a time interval up to a few days in the future, the system predicts if a reroute advisory for a certain Air Route Traffic Control Center or for a certain advisory identifier will be issued, which may impact the pertinent routes. To deliver such predictions, the system uses historical reroute data, collected from the System Wide Information Management (SWIM) data services provided by the FAA, and weather data, provided by the US National Centers for Environmental Prediction (NCEP). The data is huge in volume, and has many items streamed at high velocity, uncorrelated and noisy. The system continuously processes the incoming raw data and makes it available for the next step where an interim data store is created and adaptively maintained for efficient query processing. The resulting data is fed into an array of ML algorithms, which compete for higher accuracy. The best performing algorithm is used in the final prediction, generating the final results. Mean accuracy values higher than 90% were obtained in our experiments with this system. Our algorithm divides the area of interest in units of aggregation and uses temporal series of the aggregate measures of weather forecast parameters in each geographical unit, in order to detect correlations with reroutes and where they will most likely occur. Aiming at practical application, the system is formed by a number of microservices, which are deployed in the cloud, making the system distributed, scalable and highly available.
    摘要 “美国国家航空系统的延误成本在2019年估计为330亿美元,是近年来增长趋势的巅峰值。为了解决这些延误的巨大不稳定性,我们设计和开发了一个新的数据分析和机器学习系统,旨在降低延误的方式。”“给出一个时间间隔,该系统可预测是否会发布重新路由建议,对于某个空中交通管理中心或某个建议标识符。为了实现这一点,该系统使用了由美国联邦航空管理局提供的系统宽信息管理(SWIM)数据服务,以及由美国国家气象中心提供的天气数据。数据量很大,涉及高速流动、不相关、噪声等问题。系统不断处理进来的原始数据,并将其转化为高效查询的存储系统。然后,将数据分配给一组机器学习算法,以竞赛性提高准确性。最佳表现的算法被选用,生成最终预测结果。实验中,我们 obtener mean accuracy values higher than 90%。”“我们的算法将 interessant area 分成单位,并使用每个地理单位的时间序列气象预测参数的聚合值,以探测与重新路由之间的相互作用。为了实际应用,该系统由一些微服务组成,通过云端部署,使得系统分布式、可扩展和高可用。”

Big data-driven prediction of airspace congestion

  • paper_url: http://arxiv.org/abs/2310.08982
  • repo_url: None
  • paper_authors: Samet Ayhan, Ítalo Romani de Oliveira, Glaucia Balvedi, Pablo Costas, Alexandre Leite, Felipe C. F. de Azevedo
  • for: 该论文旨在精确测量和预测特定空域中的航空器数量,以提高空中交通管理水平,减轻空中交通管制员的工作负担。
  • methods: 该论文提出了一种新的数据管理和预测系统,能够准确预测特定空域中的航空器数量。该系统对流入的 TFM 数据进行预处理,将其压缩到紧凑的规模后存入 NoSQL 数据库;在预测步骤中,系统从历史飞行轨迹中提取特征来预测航空器数量。
  • results: 评估结果表明,该系统能够高效且准确地预测每个空域扇区内的航空器数量。
    Abstract Air Navigation Service Providers (ANSP) worldwide have been making a considerable effort for the development of a better method to measure and predict aircraft counts within a particular airspace, also referred to as airspace density. An accurate measurement and prediction of airspace density is crucial for a better managed airspace, both strategically and tactically, yielding a higher level of automation and thereby reducing the air traffic controller's workload. Although the prior approaches have been able to address the problem to some extent, data management and query processing of ever-increasing vast volume of air traffic data at high rates, for various analytics purposes such as predicting aircraft counts, still remains a challenge especially when only linear prediction models are used. In this paper, we present a novel data management and prediction system that accurately predicts aircraft counts for a particular airspace sector within the National Airspace System (NAS). The incoming Traffic Flow Management (TFM) data is streaming, big, uncorrelated and noisy. In the preprocessing step, the system continuously processes the incoming raw data, reduces it to a compact size, and stores it in a NoSQL database, where it makes the data available for efficient query processing. In the prediction step, the system learns from historical trajectories and uses their segments to collect key features such as sector boundary crossings, weather parameters, and other air traffic data. The features are fed into various regression models, including linear, non-linear and ensemble models, and the best performing model is used for prediction. Evaluation on an extensive set of real track, weather, and air traffic data including boundary crossings in the U.S. verify that our system efficiently and accurately predicts aircraft counts in each airspace sector.
    摘要 全球各地的空中航行服务提供商(ANSP)一直在努力开发更好的方法来测量和预测特定空域内的航空器数量(即空域密度)。准确测量和预测空域密度对于在战略和战术层面更好地管理空域至关重要,可以提高自动化水平,从而降低空中交通管制员的工作负担。虽然以往的方法在一定程度上解决了该问题,但在仅使用线性预测模型的情况下,对持续增长的海量、高速空中交通数据进行数据管理和查询处理,以支持诸如航空器数量预测等各类分析,仍然是一个挑战。在本文中,我们提出了一种新的数据管理和预测系统,可以准确预测国家空域系统(NAS)中特定空域扇区的航空器数量。流入的交通流量管理(TFM)数据具有流式、海量、无关联和含噪的特点。在预处理步骤中,系统持续处理进入的原始数据,将其压缩到紧凑的规模,并存储在 NoSQL 数据库中,以便高效地进行查询处理。在预测步骤中,系统从历史航迹中学习,利用其航段收集扇区边界穿越、气象参数以及其他空中交通数据等关键特征。这些特征被输入包括线性、非线性和集成模型在内的多种回归模型,并选用表现最好的模型进行预测。在包括美国境内边界穿越在内的大量真实航迹、气象和空中交通数据上的评估表明,我们的系统能够高效且准确地预测每个空域扇区内的航空器数量。
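A hedged sketch of the "competing regression models" step described above: several regressors are fit on the same sector features and the best cross-validated one is kept. The candidate models and the scoring metric are assumptions for illustration.

```python
# Illustrative model competition: linear, non-linear, and ensemble regressors are
# evaluated on the same features; the best performer is refit and returned.
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def pick_best_regressor(X, y):
    candidates = {
        "linear": LinearRegression(),
        "svr": SVR(kernel="rbf"),
        "gbrt": GradientBoostingRegressor(random_state=0),
    }
    scores = {
        name: -cross_val_score(m, X, y, cv=5, scoring="neg_mean_absolute_error").mean()
        for name, m in candidates.items()
    }
    best = min(scores, key=scores.get)          # lowest mean absolute error wins
    return candidates[best].fit(X, y), scores

# X could hold sector boundary crossings and aggregated weather parameters,
# y the observed aircraft count for the sector in the next interval.
```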

Multi-Purpose NLP Chatbot : Design, Methodology & Conclusion

  • paper_url: http://arxiv.org/abs/2310.08977
  • repo_url: None
  • paper_authors: Shivom Aggarwal, Shourya Mehra, Pritha Mitra
  • for: 这篇研究论文主要探讨了当今聊天机器人技术环境的历史、挑战和潜在价值。
  • methods: 该论文提出了一种非常灵活的聊天机器人系统,该系统使用强化学习策略来提高用户交互和对话体验。此外,该系统还使用情感分析和自然语言处理来确定用户情绪。
  • results: 该论文通过实践证明了这种聊天机器人系统的优异特性,包括语音对话、多语言支持、建议功能、离线运行和快速帮助功能等。此外,该研究还探讨了聊天机器人技术的复杂性和发展因素,以及它对多个领域的深远影响。
    Abstract With a major focus on its history, difficulties, and promise, this research paper provides a thorough analysis of the chatbot technology environment as it exists today. It provides a very flexible chatbot system that makes use of reinforcement learning strategies to improve user interactions and conversational experiences. Additionally, this system makes use of sentiment analysis and natural language processing to determine user moods. The chatbot is a valuable tool across many fields thanks to its amazing characteristics, which include voice-to-voice conversation, multilingual support [12], advising skills, offline functioning, and quick help features. The complexity of chatbot technology development is also explored in this study, along with the causes that have propelled these developments and their far-reaching effects on a range of sectors. According to the study, three crucial elements are crucial: 1) Even without explicit profile information, the chatbot system is built to adeptly understand unique consumer preferences and fluctuating satisfaction levels. With the use of this capacity, user interactions are made to meet their wants and preferences. 2) Using a complex method that interlaces Multiview voice chat information, the chatbot may precisely simulate users' actual experiences. This aids in developing more genuine and interesting discussions. 3) The study presents an original method for improving the black-box deep learning models' capacity for prediction. This improvement is made possible by introducing dynamic satisfaction measurements that are theory-driven, which leads to more precise forecasts of consumer reaction.
    摘要 这篇研究论文对现代聊天机器人技术环境进行了全面的分析,包括历史、挑战和前景。它提供了一个非常灵活的聊天机器人系统,使用强化学习策略来改善用户互动和对话体验。此外,该系统还使用情感分析和自然语言处理来确定用户的情绪状态。由于其优秀特点,如语音对话、多语言支持、建议功能、离线运行和快速帮助功能等,聊天机器人在各个领域都是一个非常有价值的工具。这项研究还探讨了聊天机器人技术发展的复杂性,以及推动这些发展的原因和它们对各个领域的深远影响。研究指出了三个关键因素:1. 即使没有显式的用户画像信息,聊天机器人系统也能较好地理解用户的独特偏好和不断变化的满意度,从而使用户互动更贴合其需求与偏好。2. 通过交织多视图语音聊天信息的复杂方法,聊天机器人可以准确模拟用户的实际体验,这有助于形成更真实、更有趣的对话。3. 研究提出了一种改进黑盒深度学习模型预测能力的原创方法,通过引入理论驱动的动态满意度度量,使对用户反应的预测更加准确。

ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08975
  • repo_url: https://github.com/lhrlab/chatkbqa
  • paper_authors: Haoran Luo, Haihong E, Zichen Tang, Shiyao Peng, Yikai Guo, Wentai Zhang, Chenghao Ma, Guanting Dong, Meina Song, Wei Lin
  • for: 这篇论文旨在提出一种基于大语言模型(LLM)的"先生成后检索"知识库问答(KBQA)框架,以解决现有KBQA方法中的三大挑战。
  • methods: 该论文对 Llama-2、ChatGLM2 和 Baichuan2 等开源 LLM 进行微调,并提出先生成逻辑形式(Logical Form),再通过无监督检索方法检索并替换其中的实体和关系,从而更直接地提升生成和检索的效果。
  • results: 实验结果显示,ChatKBQA 在标准 KBQA 数据集 WebQSP 和 ComplexWebQuestions(CWQ)上取得了新的最先进(state-of-the-art)性能,并提供了一种将 LLM 与知识图谱(KG)结合的新范式,以实现可解释、需要知识的问答。
    Abstract Knowledge Base Question Answering (KBQA) aims to derive answers to natural language questions over large-scale knowledge bases (KBs), which are generally divided into two research components: knowledge retrieval and semantic parsing. However, three core challenges remain, including inefficient knowledge retrieval, retrieval errors adversely affecting semantic parsing, and the complexity of previous KBQA methods. In the era of large language models (LLMs), we introduce ChatKBQA, a novel generate-then-retrieve KBQA framework built on fine-tuning open-source LLMs such as Llama-2, ChatGLM2 and Baichuan2. ChatKBQA proposes generating the logical form with fine-tuned LLMs first, then retrieving and replacing entities and relations through an unsupervised retrieval method, which improves both generation and retrieval more straightforwardly. Experimental results reveal that ChatKBQA achieves new state-of-the-art performance on standard KBQA datasets, WebQSP, and ComplexWebQuestions (CWQ). This work also provides a new paradigm for combining LLMs with knowledge graphs (KGs) for interpretable and knowledge-required question answering. Our code is publicly available.
    摘要 知识库问答(KBQA)旨在基于大规模知识库(KB)推导自然语言问题的答案,通常分为知识检索和语义解析两个研究部分。然而,仍存在三个核心挑战:知识检索效率低、检索错误会对语义解析产生不利影响,以及以往 KBQA 方法过于复杂。在大语言模型(LLM)时代,我们提出了 ChatKBQA,一种基于微调 Llama-2、ChatGLM2、Baichuan2 等开源 LLM 的新型"先生成后检索"KBQA 框架。ChatKBQA 提出先用微调后的 LLM 生成逻辑形式,再通过无监督检索方法检索并替换其中的实体和关系,从而更直接地同时改进生成与检索。实验结果表明,ChatKBQA 在标准 KBQA 数据集 WebQSP 和 ComplexWebQuestions(CWQ)上取得了新的最先进性能。这项工作还为将 LLM 与知识图谱(KG)结合以实现可解释、需要知识的问答提供了新范式。我们的代码已公开。
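To make the generate-then-retrieve idea concrete, here is one plausible (assumed) form of the unsupervised retrieve-and-replace step: surface mentions in a generated logical form are grounded against KB vocabularies with simple string similarity. The bracket syntax and helper names are hypothetical, not ChatKBQA's actual interface.

```python
# Illustrative sketch of grounding a generated logical form: entity mentions like
# [Barak Obama] and relation mentions like <spouse of> are replaced by their
# closest knowledge-base entries.
import difflib
import re

def ground_logical_form(logical_form, kb_entities, kb_relations):
    def closest(mention, vocab):
        match = difflib.get_close_matches(mention, vocab, n=1, cutoff=0.0)
        return match[0] if match else mention

    logical_form = re.sub(
        r"\[(.+?)\]", lambda m: "[" + closest(m.group(1), kb_entities) + "]", logical_form
    )
    logical_form = re.sub(
        r"<(.+?)>", lambda m: "<" + closest(m.group(1), kb_relations) + ">", logical_form
    )
    return logical_form

# Example: ground_logical_form("(JOIN <spouse of> [Barak Obama])",
#                              ["Barack Obama"], ["people.person.spouse"])
```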

Making Multimodal Generation Easier: When Diffusion Models Meet LLMs

  • paper_url: http://arxiv.org/abs/2310.08949
  • repo_url: https://github.com/zxy556677/easygen
  • paper_authors: Xiangyu Zhao, Bo Liu, Qijiong Liu, Guangyuan Shi, Xiao-Ming Wu
  • for: 提高多模态理解和生成的效率,结合扩散模型和大语言模型(LLM)。
  • methods: 使用扩散模型BiDiffuser,与LLM结合使用投影层进行图像到文本生成和文本到图像生成。
  • results: 大量定量和定性实验表明 EasyGen 十分有效,且其训练可以在实验室环境中轻松完成。
    Abstract We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominately depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge the gap between modalities, EasyGen is built upon a bidirectional conditional diffusion model named BiDiffuser, which promotes more efficient interactions between modalities. EasyGen handles image-to-text generation by integrating BiDiffuser and an LLM via a simple projection layer. Unlike most existing multimodal models that are limited to generating text responses, EasyGen can also facilitate text-to-image generation by leveraging the LLM to create textual descriptions, which can be interpreted by BiDiffuser to generate appropriate visual responses. Extensive quantitative and qualitative experiments demonstrate the effectiveness of EasyGen, whose training can be easily achieved in a lab setting. The source code is available at https://github.com/zxy556677/EasyGen.
    摘要 我们提出 EasyGen,一种高效的模型,旨在利用扩散模型和大语言模型(LLM)的能力来增强多模态理解与生成。与主要依赖 CLIP 或 ImageBind 等编码器、且需要大量训练数据来弥合模态差距的现有多模态模型不同,EasyGen 基于名为 BiDiffuser 的双向条件扩散模型构建,从而促进模态之间更高效的交互。EasyGen 通过一个简单的投影层将 BiDiffuser 与 LLM 连接起来,实现图像到文本的生成。与大多数只能生成文本回复的现有多模态模型不同,EasyGen 还可以借助 LLM 生成文本描述,再由 BiDiffuser 将其解释为相应的视觉输出,从而支持文本到图像的生成。大量定量和定性实验证明了 EasyGen 的有效性,其训练可以在实验室环境中轻松完成。源代码见 https://github.com/zxy556677/EasyGen。
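The abstract states that a simple projection layer connects BiDiffuser and the LLM; the PyTorch sketch below shows one plausible shape for such a layer, with dimensions and the pseudo-token count chosen only for illustration (they are assumptions, not the released configuration).

```python
# Minimal sketch of a projection bridging image features and an LLM's embedding space.
import torch
import torch.nn as nn

class ImageToLLMProjection(nn.Module):
    def __init__(self, diffusion_dim=768, llm_dim=4096, n_tokens=32):
        super().__init__()
        self.n_tokens = n_tokens
        self.proj = nn.Linear(diffusion_dim, llm_dim * n_tokens)

    def forward(self, image_features):          # (batch, diffusion_dim)
        x = self.proj(image_features)           # (batch, llm_dim * n_tokens)
        # Reshape into a short sequence of pseudo-tokens prepended to the text prompt.
        return x.view(x.size(0), self.n_tokens, -1)

# tokens = ImageToLLMProjection()(torch.randn(2, 768))  # -> shape (2, 32, 4096)
```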

Federated Class-Incremental Learning with Prompting

  • paper_url: http://arxiv.org/abs/2310.08948
  • repo_url: None
  • paper_authors: Jiale Liu, Yu-Wei Zhan, Chong-Yu Zhang, Xin Luo, Zhen-Duo Chen, Yinwei Wei, Xin-Shun Xu
  • for: This paper focuses on the challenging problem of federated class-incremental learning (FCIL), where the local and global models may suffer from catastrophic forgetting due to the arrival of new classes and non-independent and identically distributed (non-iid) data distributions.
  • methods: The proposed method, Federated Class-Incremental Learning with Prompting (FCILPT), uses prompts to ease the catastrophic forgetting of old classes, and sorts the task information in the prompt pool to align the task information on different clients before global aggregation.
  • results: The proposed method achieves significant accuracy improvements over state-of-the-art methods on three benchmark datasets (CIFAR-100, Mini-ImageNet, and Tiny-ImageNet).
    Abstract As Web technology continues to develop, it has become increasingly common to use data stored on different clients. At the same time, federated learning has received widespread attention due to its ability to protect data privacy when let models learn from data which is distributed across various clients. However, most existing works assume that the client's data are fixed. In real-world scenarios, such an assumption is most likely not true as data may be continuously generated and new classes may also appear. To this end, we focus on the practical and challenging federated class-incremental learning (FCIL) problem. For FCIL, the local and global models may suffer from catastrophic forgetting on old classes caused by the arrival of new classes and the data distributions of clients are non-independent and identically distributed (non-iid). In this paper, we propose a novel method called Federated Class-Incremental Learning with PrompTing (FCILPT). Given the privacy and limited memory, FCILPT does not use a rehearsal-based buffer to keep exemplars of old data. We choose to use prompts to ease the catastrophic forgetting of the old classes. Specifically, we encode the task-relevant and task-irrelevant knowledge into prompts, preserving the old and new knowledge of the local clients and solving the problem of catastrophic forgetting. We first sort the task information in the prompt pool in the local clients to align the task information on different clients before global aggregation. It ensures that the same task's knowledge are fully integrated, solving the problem of non-iid caused by the lack of classes among different clients in the same incremental task. Experiments on CIFAR-100, Mini-ImageNet, and Tiny-ImageNet demonstrate that FCILPT achieves significant accuracy improvements over the state-of-the-art methods.
    摘要 随着网络技术的不断发展,使用分布在不同客户端上的数据变得越来越普遍。同时,联邦学习因能够在让模型从分布于各客户端的数据中学习的同时保护数据隐私而受到广泛关注。然而,现有工作大多假设客户端上的数据是固定的。在实际情况中,这一假设往往不成立,因为数据可能持续产生,新的类别也可能不断出现。为此,我们关注实际且具有挑战性的联邦类增量学习(FCIL)问题。在 FCIL 中,新类别的到来以及客户端数据的非独立同分布(non-iid)可能导致本地模型和全局模型对旧类别产生灾难性遗忘。本文提出了一种名为 Federated Class-Incremental Learning with PrompTing(FCILPT)的新方法。考虑到隐私和有限的内存,FCILPT 不使用基于重放的缓冲区来保存旧数据样本,而是选择使用提示(prompt)来缓解旧类别的灾难性遗忘。具体而言,我们将任务相关与任务无关的知识编码到提示中,以保留本地客户端的新旧知识,解决灾难性遗忘问题。在全局聚合之前,我们先对各本地客户端提示池中的任务信息进行排序,使不同客户端上的任务信息对齐,从而保证同一任务的知识得到充分整合,解决同一增量任务中各客户端类别缺失所导致的 non-iid 问题。在 CIFAR-100、Mini-ImageNet 和 Tiny-ImageNet 上的实验表明,FCILPT 相比现有最先进方法取得了显著的准确率提升。

Progressively Efficient Learning

  • paper_url: http://arxiv.org/abs/2310.13004
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Ruijie Zheng, Khanh Nguyen, Hal Daumé III, Furong Huang, Karthik Narasimhan
  • for: 本研究旨在帮助人工智能代理人快速积累新技能和适应新用户喜好。
  • methods: 本研究提出了一种新的学习框架 named Communication-Efficient Interactive Learning (CEIL),该框架通过让学习代理人具备抽象、动态的语言和内在动机,使得学习代理人与教师之间的交流变得更加高效。
  • results: CEIL在2D MineCraft领域上展示了出色的性能和交流效率,让学习智能体快速掌握新任务,并在与教师的交流中具备更高的效率和灵活性。
    Abstract Assistant AI agents should be capable of rapidly acquiring novel skills and adapting to new user preferences. Traditional frameworks like imitation learning and reinforcement learning do not facilitate this capability because they support only low-level, inefficient forms of communication. In contrast, humans communicate with progressive efficiency by defining and sharing abstract intentions. Reproducing similar capability in AI agents, we develop a novel learning framework named Communication-Efficient Interactive Learning (CEIL). By equipping a learning agent with an abstract, dynamic language and an intrinsic motivation to learn with minimal communication effort, CEIL leads to emergence of a human-like pattern where the learner and the teacher communicate progressively efficiently by exchanging increasingly more abstract intentions. CEIL demonstrates impressive performance and communication efficiency on a 2D MineCraft domain featuring long-horizon decision-making tasks. Agents trained with CEIL quickly master new tasks, outperforming non-hierarchical and hierarchical imitation learning by up to 50% and 20% in absolute success rate, respectively, given the same number of interactions with the teacher. Especially, the framework performs robustly with teachers modeled after human pragmatic communication behavior.
    摘要 助手型 AI 智能体应当能够快速习得新技能并适应新的用户偏好。模仿学习和强化学习等传统框架只支持低层次、低效率的交流方式,因而难以实现这一能力。相比之下,人类通过定义和分享抽象意图,以逐步提升效率的方式进行交流。为了在 AI 智能体中复现类似能力,我们提出了一种名为 Communication-Efficient Interactive Learning(CEIL)的新学习框架。通过为学习智能体配备抽象的动态语言,以及以最小交流代价完成学习的内在动机,CEIL 促使学习者与教师之间涌现出类似人类的模式:双方通过交换越来越抽象的意图,实现逐步高效的交流。在包含长时程决策任务的 2D MineCraft 环境中,CEIL 展现出优异的性能和交流效率。在与教师交互次数相同的情况下,使用 CEIL 训练的智能体能够快速掌握新任务,其绝对成功率分别比非层次模仿学习和层次模仿学习最多高出 50% 和 20%。特别地,当教师按照人类语用交流行为建模时,该框架依然表现稳健。

Embarrassingly Simple Text Watermarks

  • paper_url: http://arxiv.org/abs/2310.08920
  • repo_url: https://github.com/amicus-veritatis/easydemark
  • paper_authors: Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, Makoto Yamada
  • for: 防止Large Language Models(LLM)生成的文本被误用,提高文本的准确性和可靠性。
  • methods: 提出了一种简单而有效的文本水印方法,称为 Easymark。该方法可以在完全不改变文本含义的情况下嵌入水印,并可通过简单的验证代码检测文本是否来自采用 Easymark 的系统。
  • results: 对LLM生成的文本进行了实验,结果表明Easymark可以准确地检测文本是否来自Easymark,并且不会影响文本的质量和可靠性。同时,Easymark也可以在用户端实现,不需要访问LLM提供者的服务。
    Abstract We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLM). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject a watermark without changing the meaning of the text at all while a validator can detect if a text was generated from a system that adopted Easymark or not with high credibility. Easymark is extremely easy to implement so that it only requires a few lines of code. Easymark does not require access to LLMs, so it can be implemented on the user-side when the LLM providers do not offer watermarked LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than the state-of-the-art text watermarking methods. We also prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This theorem shows that no matter how sophisticated a watermark is, a malicious user could remove it from the text, which motivate us to use a simple watermark such as Easymark. We carry out experiments with LLM-generated texts and confirm that Easymark can be detected reliably without any degradation of BLEU and perplexity, and outperform state-of-the-art watermarks in terms of both quality and reliability.
    摘要 我们提出了 Easymark,一族极其简单却有效的水印方法。随着大语言模型(LLM)的出现,文本水印正变得越来越重要。LLM 生成的文本已无法与人类撰写的文本区分开,这对文本的可信度构成了严重的问题。Easymark 是解决这一问题的简单而有效的方案:它可以在完全不改变文本含义的情况下注入水印,而验证方则能够以很高的置信度检测一段文本是否由采用 Easymark 的系统生成。Easymark 极易实现,只需要几行代码;它也不需要访问 LLM,因此即使 LLM 提供方不提供带水印的模型,也可以在用户侧部署。尽管方法十分简单,它在检测准确率和 BLEU 分数上都超过了现有最先进的文本水印方法。我们还证明了"完美水印不可能性"定理,该定理本身也具有价值:无论水印多么精巧,恶意用户总能将其从文本中去除,这也促使我们采用 Easymark 这样的简单水印。我们在 LLM 生成的文本上进行了实验,证实 Easymark 可以被可靠地检测,且不会降低 BLEU 和困惑度,在质量和可靠性两方面均优于现有最先进的水印方法。
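For intuition, here is a toy whitespace-substitution watermark in the spirit of Easymark; the specific marker character, threshold, and detection rule are illustrative assumptions and not necessarily the paper's exact scheme.

```python
# Toy invisible watermark: ordinary spaces are swapped for a visually similar
# Unicode space; detection checks what fraction of inter-word spaces carry it.
MARK = "\u2004"  # "three-per-em space", visually close to a normal space

def embed(text: str) -> str:
    # Replace every ordinary space with the marker space; the meaning is unchanged.
    return text.replace(" ", MARK)

def detect(text: str, threshold: float = 0.5) -> bool:
    spaces = [c for c in text if c in (" ", MARK)]
    if not spaces:
        return False
    return sum(c == MARK for c in spaces) / len(spaces) >= threshold

watermarked = embed("The answer is forty two.")
assert detect(watermarked) and not detect("The answer is forty two.")
```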

Relation-aware Ensemble Learning for Knowledge Graph Embedding

  • paper_url: http://arxiv.org/abs/2310.08917
  • repo_url: https://github.com/lars-research/relens
  • paper_authors: Ling Yue, Yongqi Zhang, Quanming Yao, Yong Li, Xian Wu, Ziheng Zhang, Zhenxi Lin, Yefeng Zheng
  • for: 本研究旨在提出一种基于现有方法的关系意识 Ensemble 方法,以优化知识图(KG)嵌入。
  • methods: 本方法使用分治-搜索-合并(Divide-Search-Combine)算法,独立搜索每个关系对应的集成权重,以提高搜索效率。
  • results: 实验结果表明,所提方法能够高效地搜索关系感知的集成权重,并取得最先进的嵌入性能。代码可以在 https://github.com/LARS-research/RelEns 上获取。
    Abstract Knowledge graph (KG) embedding is a fundamental task in natural language processing, and various methods have been proposed to explore semantic patterns in distinctive ways. In this paper, we propose to learn an ensemble by leveraging existing methods in a relation-aware manner. However, exploring these semantics using relation-aware ensemble leads to a much larger search space than general ensemble methods. To address this issue, we propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently. This algorithm has the same computation cost as general ensemble methods but with much better performance. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in efficiently searching relation-aware ensemble weights and achieving state-of-the-art embedding performance. The code is public at https://github.com/LARS-research/RelEns.
    摘要 知识图谱(KG)嵌入是自然语言处理中的基本任务,已有多种方法以不同方式探索其语义模式。在这篇论文中,我们提议以关系感知的方式对现有方法进行集成学习。然而,以关系感知集成来探索这些语义会使搜索空间远大于一般的集成方法。为解决这个问题,我们提出了一种分治-搜索-合并算法 RelEns-DSC,对各关系的集成权重进行独立搜索。该算法与一般集成方法的计算成本相同,但性能更好。在基准数据集上的实验结果表明,我们的方法可以高效地搜索关系感知的集成权重,并达到当前最佳的嵌入性能。代码可以在 https://github.com/LARS-research/RelEns 上获取。
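A hedged sketch of the divide-search-combine idea follows: ensemble weights are searched independently per relation over a small grid. The scoring interface, the pairwise AUC criterion, and the coarse grid are assumptions made for brevity, not the paper's exact procedure.

```python
# Illustrative per-relation ensemble weight search over a small simplex grid.
import itertools
import numpy as np

def search_relation_weights(models_scores, labels, relations, grid=np.linspace(0, 1, 11)):
    """models_scores: (n_models, n_triples) plausibility scores; labels: (n_triples,) 0/1;
    relations: (n_triples,) relation id per triple. Returns {relation: weight vector}."""
    n_models = models_scores.shape[0]
    weights = {}
    for r in np.unique(relations):
        mask = relations == r
        pos, neg = labels[mask] == 1, labels[mask] == 0
        if not pos.any() or not neg.any():
            weights[r] = np.full(n_models, 1.0 / n_models)  # fall back to uniform weights
            continue
        best, best_auc = None, -1.0
        # Exhaustive search over weight combinations that sum to one, per relation.
        for w in itertools.product(grid, repeat=n_models):
            if not np.isclose(sum(w), 1.0):
                continue
            combined = np.tensordot(np.array(w), models_scores[:, mask], axes=1)
            auc = (combined[pos][:, None] > combined[neg][None, :]).mean()
            if auc > best_auc:
                best, best_auc = np.array(w), auc
        weights[r] = best
    return weights
```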

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

  • paper_url: http://arxiv.org/abs/2310.08915
  • repo_url: https://github.com/zyxxmu/dsnot
  • paper_authors: Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji
  • for: 提高稀疏 LLM 的性能,尤其是在高稀疏度下。
  • methods: 提出了一种无需训练的微调方法,通过迭代的权重剪枝与再生长(pruning-and-growing)来小幅更新稀疏 LLM。
  • results: 在 LLaMA-V1/V2、Vicuna 和 OPT 上进行了广泛的实验,证明 DSnoT 能够提升稀疏 LLM 的性能,尤其是在高稀疏度下。例如,与最先进的 Wanda 相比,DSnoT 在 70% 稀疏度的 LLaMA-7B 上将困惑度改善了 26.79。
    Abstract The ever-increasing large language models (LLMs), though opening a potential path for the upcoming artificial general intelligence, sadly drops a daunting obstacle on the way towards their on-device deployment. As one of the most well-established pre-LLMs approaches in reducing model complexity, network pruning appears to lag behind in the era of LLMs, due mostly to its costly fine-tuning (or re-training) necessity under the massive volumes of model parameter and training data. To close this industry-academia gap, we introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that slightly updates sparse LLMs without the expensive backpropagation and any weight updates. Inspired by the Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs, in the fashion of performing iterative weight pruning-and-growing on top of sparse LLMs. To accomplish this purpose, DSnoT particularly takes into account the anticipated reduction in reconstruction error for pruning and growing, as well as the variance w.r.t. different input data for growing each weight. This practice can be executed efficiently in linear time since its obviates the need of backpropagation for fine-tuning LLMs. Extensive experiments on LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially at high sparsity levels. For instance, DSnoT is able to outperform the state-of-the-art Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient training-free manner and open new venues to scale the great potential of sparsity to LLMs. Codes are available at https://github.com/zyxxmu/DSnoT.
    摘要 大型语言模型(LLM)的不断增长,尽管开启了人工通用智能的潜在道路,但却存在一个巨大的障碍物,即在设备上部署 LLM 时的困难。作为一种已有的预 LLM 策略,网络剪辑可以减少模型复杂性,但在大量模型参数和训练数据的情况下,它却落后于 LLM 时代。为解决这个行业学术之阔,我们介绍了一种无需训练的精炼方法——动态缺失训练(DSnoT)。受动态缺失训练的启发,DSnoT 将在缺失 LLM 上进行轻量级的更新,而不需要昂贵的反propagation 和任何参数更新。在实施这种方法时,DSnoT 特别考虑了预计的减少征重误差和不同输入数据的变化。这种做法可以高效地执行,只需要线性时间。我们的实验表明,DSnoT 可以在不同的基础模型和难度水平上提高缺失 LLM 的性能,特别是在高缺失率下。例如,DSnoT 可以在 70% 缺失率下比 state-of-the-art Wanda 提高 26.79 的误差。我们的论文为无需训练的精炼 sparse LLM 提供了新的视角,并开启了将缺失潜力应用于 LLM 的新venue。代码可以在 上获取。

Community Membership Hiding as Counterfactual Graph Search via Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.08909
  • repo_url: None
  • paper_authors: Andrea Bernini, Fabrizio Silvestri, Gabriele Tolomei
  • for: 本研究旨在解决社交媒体平台上的社群成员隐私保护问题,即通过修改网络图的结构性质,防止某些节点被某种社群检测算法识别。
  • methods: 本研究将该问题表述为带约束的反事实图目标,并使用深度强化学习进行求解,以规避社群检测算法的识别。
  • results: 经过广泛的实验,我们的方法在节点和社群欺骗两个任务中均表现出色,与现有的基线方法相比,其效果更好。
    Abstract Community detection techniques are useful tools for social media platforms to discover tightly connected groups of users who share common interests. However, this functionality often comes at the expense of potentially exposing individuals to privacy breaches by inadvertently revealing their tastes or preferences. Therefore, some users may wish to safeguard their anonymity and opt out of community detection for various reasons, such as affiliation with political or religious organizations. In this study, we address the challenge of community membership hiding, which involves strategically altering the structural properties of a network graph to prevent one or more nodes from being identified by a given community detection algorithm. We tackle this problem by formulating it as a constrained counterfactual graph objective, and we solve it via deep reinforcement learning. We validate the effectiveness of our method through two distinct tasks: node and community deception. Extensive experiments show that our approach overall outperforms existing baselines in both tasks.
    摘要 社区检测技术是社交媒体平台的有用工具,可以发现兴趣相同、联系紧密的用户群体。然而,这一功能往往以可能的隐私泄露为代价,因为它可能会无意间暴露个人的品味或偏好。因此,出于政治或宗教组织关联等各种原因,一些用户可能希望保护自己的匿名性,选择不被社区检测所识别。在本研究中,我们针对"社区成员隐藏"这一挑战:通过有策略地改变网络图的结构性质,使一个或多个节点无法被给定的社区检测算法识别。我们将该问题表述为带约束的反事实图目标,并通过深度强化学习进行求解。我们通过节点欺骗和社区欺骗两个不同任务验证了方法的有效性。大量实验表明,我们的方法在这两个任务上总体优于现有的基线方法。

Welfare Diplomacy: Benchmarking Language Model Cooperation

  • paper_url: http://arxiv.org/abs/2310.08901
  • repo_url: https://github.com/mukobi/welfare-diplomacy
  • paper_authors: Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan, Jesse Clifton
  • for: 本研究旨在提供一种更加robust的多代理系统测试工具,以便研究者可以更好地评估和培养多代理系统的合作能力。
  • methods: 本研究使用了一种基于 Diplomacy 游戏的一般和(general-sum)变体,称为"福利外交",其中玩家需要在军事征服与国内福利投入之间取得平衡。
  • results: 实验结果显示,使用现有语言模型实现的基线智能体可以达到较高的社会福利,但容易被利用。我们的工作旨在促进社会安全,帮助研究者开发和评估多智能体系统。
    Abstract The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.
    摘要 随着 AI 系统能力的增强和部署的日益普及,需要有稳健的基准来衡量其合作能力。然而,大多数多智能体基准要么是零和的,要么是纯合作的,为此类衡量提供的空间有限。我们提出了零和桌面游戏 Diplomacy 的一种一般和变体——福利外交(Welfare Diplomacy),玩家必须在军事征服与国内福利投入之间取得平衡。我们认为,福利外交既有助于更清晰地评估合作能力,也能为其训练提供更强的激励。我们的贡献包括:1)提出福利外交规则,并基于开源的 Diplomacy 引擎加以实现;2)使用零样本提示的语言模型构建基线智能体;3)开展实验,发现基于最先进模型的基线智能体能够取得较高的社会福利,但容易被利用。我们的工作旨在通过帮助研究人员开发和评估多智能体 AI 系统来促进社会安全。用于评估福利外交和复现实验的代码见 https://github.com/mukobi/welfare-diplomacy。

A Hybrid Transfer Learning Assisted Decision Support System for Accurate Prediction of Alzheimer Disease

  • paper_url: http://arxiv.org/abs/2310.08888
  • repo_url: None
  • paper_authors: Mahin Khan Mahadi, Abdullah Abdullah, Jamal Uddin, Asif Newaz
  • for: 这个研究旨在提高阿尔茨海默病(AD)的早期诊断和预测,以提升诊断和治疗的效果。
  • methods: 本研究使用深度学习技术,并提出了一种独特的策略来解决不均衡数据集分类问题。研究使用了五种传输学习模型和ensemble平均模型,并进行了模型微调。
  • results: 研究发现,使用合并平均模型和传输学习模型可以提高AD阶段多类分类的准确率,并达到了98.91%的最高精度。
    Abstract Alzheimer's disease (AD) is the most common long-term illness in elderly people. In recent years, deep learning has become popular in the area of medical imaging and has had a lot of success there. It has become the most effective way to look at medical images. When it comes to detecting AD, the deep neural model is more accurate and effective than general machine learning. Our research contributes to the development of a more comprehensive understanding and detection of the disease by identifying four distinct classes that are predictive of AD with a high weighted accuracy of 98.91%. A unique strategy has been proposed to improve the accuracy of the imbalance dataset classification problem via the combination of ensemble averaging models and five different transfer learning models in this study. EfficientNetB0+Resnet152(effnet+res152) and InceptionV3+EfficientNetB0+Resnet50(incep+effnet+res50) models have been fine-tuned and have reached the highest weighted accuracy for multi-class AD stage classifications.
    摘要 阿尔茨海默病(AD)是老年人群中最常见的长期疾病。近年来,深度学习在医疗影像领域得到了广泛应用并取得了很多成功,已成为分析医学影像最有效的方法。在检测 AD 方面,深度神经网络比一般机器学习方法更加准确和有效。我们的研究通过识别四个可预测 AD 的类别,实现了 98.91% 的高加权准确率,促进了对该疾病更全面的理解和检测。本研究提出了一种独特的策略,通过将集成平均模型与五种不同的传输学习模型相结合,来提高不平衡数据集分类问题的准确率。经过微调的 EfficientNetB0+Resnet152(effnet+res152)和 InceptionV3+EfficientNetB0+Resnet50(incep+effnet+res50)模型在多类 AD 阶段分类任务中达到了最高的加权准确率。
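A minimal PyTorch sketch of the ensemble-averaging idea (EfficientNetB0 and ResNet152 fine-tuned for the four AD classes, with softmax outputs averaged); preprocessing, the training loop, and hyperparameters are omitted and assumed, so this only illustrates the model combination.

```python
# Two ImageNet-pretrained backbones with replaced heads; predictions are averaged.
import torch
import torch.nn as nn
from torchvision import models  # torchvision >= 0.13 for the string weight names

NUM_CLASSES = 4  # four AD stage classes, per the summary

def build_backbone(name):
    if name == "efficientnet_b0":
        m = models.efficientnet_b0(weights="IMAGENET1K_V1")
        m.classifier[1] = nn.Linear(m.classifier[1].in_features, NUM_CLASSES)
    else:
        m = models.resnet152(weights="IMAGENET1K_V1")
        m.fc = nn.Linear(m.fc.in_features, NUM_CLASSES)
    return m

class AveragedEnsemble(nn.Module):
    def __init__(self):
        super().__init__()
        self.members = nn.ModuleList(
            [build_backbone("efficientnet_b0"), build_backbone("resnet152")]
        )

    def forward(self, x):
        probs = [torch.softmax(m(x), dim=1) for m in self.members]
        return torch.stack(probs).mean(dim=0)   # simple probability averaging
```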

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

  • paper_url: http://arxiv.org/abs/2310.08887
  • repo_url: https://github.com/seohongpark/metra
  • paper_authors: Seohong Park, Oleh Rybkin, Sergey Levine
  • For: 本研究旨在提出一种新的无监督强化学习目标,使无监督强化学习能够扩展到复杂的高维环境。
  • Methods: 我们提出了一种新的无监督强化学习目标——度量感知抽象(METRA)。它不直接覆盖整个状态空间,而是只覆盖一个通过时间距离与状态空间度量相连的紧凑潜在空间 Z;通过学习在该潜在空间中向各个方向移动,METRA 获得一组可处理的多样化行为,这些行为近似覆盖状态空间,因而可以扩展到高维环境。
  • Results: 我们在五个运动和操作环境中的实验表明,METRA 即使在复杂的基于像素的环境中也能发现多种有用的行为,是首个在基于像素的 Quadruped 和 Humanoid 环境中发现多样化运动行为的无监督强化学习方法。
    Abstract Unsupervised pre-training strategies have proven to be highly effective in natural language processing and computer vision. Likewise, unsupervised reinforcement learning (RL) holds the promise of discovering a variety of potentially useful behaviors that can accelerate the learning of a wide array of downstream tasks. Previous unsupervised RL approaches have mainly focused on pure exploration and mutual information skill learning. However, despite the previous attempts, making unsupervised RL truly scalable still remains a major open challenge: pure exploration approaches might struggle in complex environments with large state spaces, where covering every possible transition is infeasible, and mutual information skill learning approaches might completely fail to explore the environment due to the lack of incentives. To make unsupervised RL scalable to complex, high-dimensional environments, we propose a novel unsupervised RL objective, which we call Metric-Aware Abstraction (METRA). Our main idea is, instead of directly covering the entire state space, to only cover a compact latent space $Z$ that is metrically connected to the state space $S$ by temporal distances. By learning to move in every direction in the latent space, METRA obtains a tractable set of diverse behaviors that approximately cover the state space, being scalable to high-dimensional environments. Through our experiments in five locomotion and manipulation environments, we demonstrate that METRA can discover a variety of useful behaviors even in complex, pixel-based environments, being the first unsupervised RL method that discovers diverse locomotion behaviors in pixel-based Quadruped and Humanoid. Our code and videos are available at https://seohong.me/projects/metra/
    摘要 无监督预训练策略已在自然语言处理和计算机视觉中被证明非常有效。类似地,无监督强化学习(RL)有望发现各种潜在有用的行为,从而加速大量下游任务的学习。以往的无监督 RL 方法主要集中在纯探索和互信息技能学习上。然而,让无监督 RL 真正具备可扩展性仍是一个重大的开放挑战:在状态空间庞大的复杂环境中,纯探索方法难以覆盖所有可能的转移,而互信息技能学习方法则可能因缺乏探索激励而完全无法探索环境。为了使无监督 RL 能够扩展到复杂的高维环境,我们提出了一种新的无监督 RL 目标,称为度量感知抽象(METRA)。我们的核心思想是:不直接覆盖整个状态空间,而是只覆盖一个通过时间距离与状态空间 S 度量相连的紧凑潜在空间 Z。通过学习在潜在空间中向各个方向移动,METRA 获得了一组可处理的多样化行为,这些行为近似覆盖状态空间,因而可扩展到高维环境。我们在五个运动和操作环境中的实验表明,METRA 即使在复杂的基于像素的环境中也能发现多种有用的行为,是首个在基于像素的 Quadruped 和 Humanoid 环境中发现多样化运动行为的无监督 RL 方法。代码和视频见 https://seohong.me/projects/metra/。

Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models

  • paper_url: http://arxiv.org/abs/2310.08873
  • repo_url: None
  • paper_authors: Zhen Zhang, Anran Lin, Chun Wai Wong, Xiangyu Chu, Qi Dou, K. W. Samuel Au
  • For: This paper proposes an interactive navigation framework for robots to navigate in environments with traversable obstacles, using large language and vision-language models.* Methods: The proposed framework utilizes a large language model (GPT-3.5) and an open-set Vision-language Model (Grounding DINO) to create an action-aware costmap for effective path planning without fine-tuning.* Results: The proposed framework was effective and adaptable to diverse environments, as demonstrated by experimental results that included traversing curtains in a medical scenario.
    Abstract This paper proposes an interactive navigation framework by using large language and vision-language models, allowing robots to navigate in environments with traversable obstacles. We utilize the large language model (GPT-3.5) and the open-set Vision-language Model (Grounding DINO) to create an action-aware costmap to perform effective path planning without fine-tuning. With the large models, we can achieve an end-to-end system from textual instructions like "Can you pass through the curtains to deliver medicines to me?", to bounding boxes (e.g., curtains) with action-aware attributes. They can be used to segment LiDAR point clouds into two parts: traversable and untraversable parts, and then an action-aware costmap is constructed for generating a feasible path. The pre-trained large models have great generalization ability and do not require additional annotated data for training, allowing fast deployment in the interactive navigation tasks. We choose to use multiple traversable objects such as curtains and grasses for verification by instructing the robot to traverse them. Besides, traversing curtains in a medical scenario was tested. All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments.
    摘要 本文提出了一种基于大语言模型和视觉-语言模型的交互式导航框架,使机器人能够在存在可穿越障碍物的环境中导航。该框架使用大语言模型(GPT-3.5)和开放集视觉-语言模型(Grounding DINO)构建一个动作感知的代价地图,从而在无需微调的情况下进行有效的路径规划。借助这些大模型,系统可以从诸如"你能穿过窗帘把药送给我吗?"之类的文本指令,端到端地得到带有动作感知属性的目标框(例如窗帘)。这些目标框可以用来将 LiDAR 点云分割为可穿越和不可穿越两部分,进而构建动作感知代价地图,生成可行路径。预训练的大模型具有很强的泛化能力,不需要额外的标注数据进行训练,因此可以快速部署到交互式导航任务中。我们选择窗帘、草丛等多种可穿越物体进行验证,指示机器人穿越它们,并测试了在医疗场景中穿越窗帘的能力。所有实验结果均表明了所提框架的有效性及其对多样环境的适应性。
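To illustrate the action-aware costmap idea, the sketch below assigns a low cost to LiDAR points that fall inside detected traversable boxes; the detector call itself is left out (Grounding DINO would supply the boxes, prompted with labels extracted by GPT-3.5), and the grid parameters are assumptions.

```python
# Illustrative costmap construction from obstacle points and traversable boxes.
import numpy as np

def build_costmap(points_xy, traversable_boxes, res=0.1, size=200,
                  obstacle_cost=100, traversable_cost=10):
    """points_xy: (n, 2) obstacle points in metres; boxes: list of (xmin, ymin, xmax, ymax)."""
    costmap = np.zeros((size, size), dtype=np.uint8)
    for x, y in points_xy:
        i, j = int(x / res) + size // 2, int(y / res) + size // 2
        if not (0 <= i < size and 0 <= j < size):
            continue
        inside = any(xmin <= x <= xmax and ymin <= y <= ymax
                     for xmin, ymin, xmax, ymax in traversable_boxes)
        # Curtains or grass keep a small, non-zero cost; other obstacles stay untraversable.
        costmap[i, j] = traversable_cost if inside else obstacle_cost
    return costmap

# traversable_boxes would come from an open-set detector prompted with labels such as
# "curtain" or "grass" parsed from "pass through the curtains to deliver medicines ...".
```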

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

  • paper_url: http://arxiv.org/abs/2310.08866
  • repo_url: None
  • paper_authors: Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio
  • for: 本研究旨在评估 transformers 是否可以有效地泛化对不同难度示例的问题。
  • methods: 我们引入了一种新任务,它是 Zhang et al. (2021) 提出的 pointer value retrieval 任务的变种。我们研究如何利用 transformers 中自适应且模块化的计算机制,使模型能够根据任务所需的顺序计算步骤数(即计算图深度)进行泛化。
  • results: 我们发现,使用 Hyper-UT 模型,即将 hyper networks 与 Universal Transformers 结合使用,可以提高准确率并均匀分配计算资源。此外,我们发现,在标准图像识别任务中,Hyper-UT 的性能与 ViT 模型相当,但具有许多更少的计算开销(可以减少计算步骤的数量,从而实现超过 70% 的平均成本减少)。
    Abstract Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval previously introduced by Zhang et al. (2021). We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hyper networks with adaptive depth from Universal Transformers. This model demonstrates higher accuracy and a fairer allocation of computational resources when generalizing to higher numbers of computation steps. We conclude that mechanisms for adaptive depth and modularity complement each other in improving efficient generalization concerning example complexity. Additionally, to emphasize the broad applicability of our findings, we illustrate that in a standard image recognition task, Hyper- UT's performance matches that of a ViT model but with considerably reduced computational demands (achieving over 70\% average savings by effectively using fewer layers).
    摘要 transformers 能否在需要处理不同难度示例的问题上高效地泛化?我们引入了一个专门用于评估在不同复杂度上泛化能力的新任务,结果表明标准 transformers 在解决这类任务时面临挑战。这些任务是 Zhang et al. (2021) 提出的 pointer value retrieval 任务的变种。我们研究了在 transformers 中引入自适应、模块化计算机制,如何有助于学习那些需要在顺序计算步骤数(即计算图深度)上进行泛化的任务。基于我们的观察,我们提出了一种基于 transformer 的架构 Hyper-UT,它将超网络(hyper network)的动态函数生成与 Universal Transformer 的自适应深度结合起来。该模型在泛化到更多计算步骤时表现出更高的准确率,并能更合理地分配计算资源。我们的结论是,自适应深度机制与模块化机制相互补充,共同提升了针对示例复杂度的高效泛化能力。此外,为说明这一发现的广泛适用性,我们在标准图像识别任务上展示了 Hyper-UT 的性能与 ViT 模型相当,但计算需求大幅降低(通过有效地使用更少的层,平均节省超过 70% 的计算)。

Adam-family Methods with Decoupled Weight Decay in Deep Learning

  • paper_url: http://arxiv.org/abs/2310.08858
  • repo_url: None
  • paper_authors: Kuangyu Ding, Nachuan Xiao, Kim-Chuan Toh
  • for: 本研究考察 Adam 家族方法在求解带二次正则项的非光滑非凸优化问题时的收敛性质,特别是在带权重衰减的非光滑神经网络训练中。
  • methods: 我们提出了一种受 AdamW 启发的新框架,其中权重衰减项与矩估计解耦:随机次梯度的一阶矩和二阶矩估计的更新与权重衰减项相互独立。在温和的假设以及主优化变量步长不衰减(non-diminishing)的条件下,我们证明了该框架的收敛性质。
  • results: 我们证明了所提框架涵盖了许多已知的 Adam 家族方法,从而为这些方法在非光滑神经网络训练中提供了收敛保证。此外,该框架在训练过程中渐近地逼近 SGD 方法,这解释了实践中观察到的"解耦权重衰减可以提升 Adam 家族方法泛化性能"的现象。我们还基于该框架提出了一种新的 Adam 家族方法 Adam with Decoupled Weight Decay(AdamD),并证明了其收敛性质。实验表明,AdamD 的表现优于 Adam,并在泛化性能和效率方面与 AdamW 相当。
    Abstract In this paper, we investigate the convergence properties of a wide class of Adam-family methods for minimizing quadratically regularized nonsmooth nonconvex optimization problems, especially in the context of training nonsmooth neural networks with weight decay. Motivated by the AdamW method, we propose a novel framework for Adam-family methods with decoupled weight decay. Within our framework, the estimators for the first-order and second-order moments of stochastic subgradients are updated independently of the weight decay term. Under mild assumptions and with non-diminishing stepsizes for updating the primary optimization variables, we establish the convergence properties of our proposed framework. In addition, we show that our proposed framework encompasses a wide variety of well-known Adam-family methods, hence offering convergence guarantees for these methods in the training of nonsmooth neural networks. More importantly, we show that our proposed framework asymptotically approximates the SGD method, thereby providing an explanation for the empirical observation that decoupled weight decay enhances generalization performance for Adam-family methods. As a practical application of our proposed framework, we propose a novel Adam-family method named Adam with Decoupled Weight Decay (AdamD), and establish its convergence properties under mild conditions. Numerical experiments demonstrate that AdamD outperforms Adam and is comparable to AdamW, in the aspects of both generalization performance and efficiency.
    摘要 在本文中,我们研究了一大类 Adam 家族方法在最小化带二次正则项的非光滑非凸优化问题时的收敛性质,特别是在带权重衰减的非光滑神经网络训练场景中。受 AdamW 方法的启发,我们提出了一种权重衰减解耦的 Adam 家族方法新框架:在该框架中,随机次梯度一阶矩和二阶矩的估计量独立于权重衰减项进行更新。在温和的假设以及主优化变量步长不衰减的条件下,我们建立了所提框架的收敛性质。此外,我们证明该框架涵盖了许多众所周知的 Adam 家族方法,从而为这些方法在非光滑神经网络训练中的收敛性提供了保证。更重要的是,我们证明该框架渐近地逼近 SGD 方法,这为"解耦权重衰减能够提升 Adam 家族方法泛化性能"这一经验观察提供了解释。作为该框架的一个实际应用,我们提出了一种新的 Adam 家族方法 Adam with Decoupled Weight Decay(AdamD),并在温和条件下证明了其收敛性质。数值实验表明,AdamD 的表现优于 Adam,并在泛化性能和效率方面与 AdamW 相当。
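For reference, one Adam step with decoupled weight decay looks roughly as follows; this reproduces the AdamW-style update the paper builds on (the decay term never enters the moment estimates), while the exact AdamD update rule may differ in details not stated in the abstract.

```python
# One parameter update with decoupled weight decay (AdamW-style), as a plain sketch.
import numpy as np

def adamd_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    m = beta1 * m + (1 - beta1) * g            # first moment of the (sub)gradient only
    v = beta2 * v + (1 - beta2) * g * g        # second moment of the (sub)gradient only
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * wd * p                        # decoupled weight decay, outside the moments
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```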

Path To Gain Functional Transparency In Artificial Intelligence With Meaningful Explainability

  • paper_url: http://arxiv.org/abs/2310.08849
  • repo_url: None
  • paper_authors: Md. Tanzib Hosain, Mehedi Hasan Anik, Sadman Rafi, Rana Tabassum, Khaleque Insia, Md. Mehrab Siddiky
  • for: 这篇论文目的是提出一种用户参与的透明系统设计方法,以便开发透明和可解释的人工智能系统。
  • methods: 该论文从功能透明性、可解释性以及社会价值观等多个角度出发,分析实现透明AI系统所面临的挑战,并强调跨学科协作的必要性。
  • results: 该论文提出了一种以用户为中心、从设计上合规的透明系统设计框架,可帮助开发者构建透明、可解释并符合不同领域需求的人工智能系统。
    Abstract Artificial Intelligence (AI) is rapidly integrating into various aspects of our daily lives, influencing decision-making processes in areas such as targeted advertising and matchmaking algorithms. As AI systems become increasingly sophisticated, ensuring their transparency and explainability becomes crucial. Functional transparency is a fundamental aspect of algorithmic decision-making systems, allowing stakeholders to comprehend the inner workings of these systems and enabling them to evaluate their fairness and accuracy. However, achieving functional transparency poses significant challenges that need to be addressed. In this paper, we propose a design for user-centered compliant-by-design transparency in transparent systems. We emphasize that the development of transparent and explainable AI systems is a complex and multidisciplinary endeavor, necessitating collaboration among researchers from diverse fields such as computer science, artificial intelligence, ethics, law, and social science. By providing a comprehensive understanding of the challenges associated with transparency in AI systems and proposing a user-centered design framework, we aim to facilitate the development of AI systems that are accountable, trustworthy, and aligned with societal values.
    摘要 人工智能(AI)正迅速融入我们日常生活的各个方面,影响着定向广告和匹配算法等领域的决策过程。随着 AI 系统日益复杂,保证其透明性与可解释性变得至关重要。功能透明性是算法决策系统的一个基本方面,它使利益相关者能够理解这些系统的内部运作机制,并据此评估其公平性和准确性。然而,实现功能透明性面临着许多需要解决的重大挑战。在本文中,我们提出了一种面向透明系统的、以用户为中心、从设计上合规的透明性设计。我们强调,开发透明且可解释的 AI 系统是一项复杂的多学科工作,需要计算机科学、人工智能、伦理、法律和社会科学等不同领域研究者的协作。通过全面阐述 AI 系统透明性相关的挑战并提出以用户为中心的设计框架,我们希望促进开发负责任、可信且符合社会价值观的 AI 系统。

A Case-Based Persistent Memory for a Large Language Model

  • paper_url: http://arxiv.org/abs/2310.08842
  • repo_url: None
  • paper_authors: Ian Watson
  • for: 这篇立场论文的主要论点是,CBR 研究人员应该更加关注现代人工智能技术的发展,特别是深度学习和大语言模型。
  • methods: 论文提出将 CBR 方法与深度学习技术相结合,以推动向通用人工智能迈进。
  • results: 论文指出,通过将 CBR 方法与深度学习技术结合使用,可以为大语言模型提供持久记忆,从而推动通用人工智能的进步。
    Abstract Case-based reasoning (CBR) as a methodology for problem-solving can use any appropriate computational technique. This position paper argues that CBR researchers have somewhat overlooked recent developments in deep learning and large language models (LLMs). The underlying technical developments that have enabled the recent breakthroughs in AI have strong synergies with CBR and could be used to provide a persistent memory for LLMs to make progress towards Artificial General Intelligence.
    摘要 基于案例的推理(CBR)作为一种问题求解方法,可以采用任何合适的计算技术。本立场论文认为,CBR 研究人员在一定程度上忽视了深度学习和大语言模型(LLM)的最新进展。推动近期人工智能突破的底层技术与 CBR 具有很强的协同效应,可用于为 LLM 提供持久记忆,从而向通用人工智能迈进。

Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments

  • paper_url: http://arxiv.org/abs/2310.08841
  • repo_url: None
  • paper_authors: Maryam Zare, Parham M. Kebria, Abbas Khosravi
  • for: 本研究旨在开发一种能够在无实时互动的情况下进行学习控制的方法,以降低成本和安全隐患,并且可以利用现有的数据集来进行学习。
  • methods: 本研究使用基于最优传输(Optimal Transport)的奖励标注(OTR)算法,将无标注轨迹与少量高质量专家示范进行对齐,从而高效地计算出有效的奖励信号。
  • results: 研究表明,使用OTR算法可以快速、高效地学习控制策略,且无需手动设计奖励函数。此外,研究还展示了OTR算法的通用性和可复用性,可在不同领域中应用。
    Abstract Most Reinforcement Learning (RL) methods are traditionally studied in an active learning setting, where agents directly interact with their environments, observe action outcomes, and learn through trial and error. However, allowing partially trained agents to interact with real physical systems poses significant challenges, including high costs, safety risks, and the need for constant supervision. Offline RL addresses these cost and safety concerns by leveraging existing datasets and reducing the need for resource-intensive real-time interactions. Nevertheless, a substantial challenge lies in the demand for these datasets to be meticulously annotated with rewards. In this paper, we introduce Optimal Transport Reward (OTR) labelling, an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations. The core principle of OTR involves employing Optimal Transport (OT) to calculate an optimal alignment between an unlabeled trajectory from the dataset and an expert demonstration. This alignment yields a similarity measure that is effectively interpreted as a reward signal. An offline RL algorithm can then utilize these reward signals to learn a policy. This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning. Leveraging the SurRoL simulation platform tailored for surgical robot learning, we generate datasets and employ them to train policies using the OTR algorithm. By demonstrating the efficacy of OTR in a different domain, we emphasize its versatility and its potential to expedite RL deployment across a wide range of fields.
    摘要 大多数强化学习(RL)方法传统上是在主动学习设定下研究的:智能体直接与环境交互、观察动作结果,并通过试错进行学习。然而,让仅部分训练好的智能体与真实物理系统交互会带来显著的挑战,包括高昂的成本、安全风险以及持续监督的需要。离线 RL 通过利用现有数据集、减少对资源密集的实时交互的依赖,缓解了这些成本与安全问题。然而,一个重大的挑战在于这些数据集需要经过细致的奖励标注。在本文中,我们提出了 Optimal Transport Reward(OTR)标注,这是一种利用少量高质量专家示范为离线轨迹分配奖励的新算法。OTR 的核心思想是使用最优传输(OT)计算数据集中一条未标注轨迹与专家示范之间的最优对齐,该对齐给出的相似度度量可以被解释为奖励信号,离线 RL 算法随后即可利用这些奖励信号学习策略。这种方法规避了手工设计奖励的需求,释放了利用海量数据集进行策略学习的潜力。借助面向手术机器人学习的 SurRoL 仿真平台,我们生成了数据集,并使用 OTR 算法训练策略。通过在不同领域中展示 OTR 的有效性,我们强调了它的通用性及其在广泛领域中加速 RL 部署的潜力。
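A hedged sketch of the OT-based reward labelling step using the POT library: a trajectory is aligned to an expert demonstration and each step receives the negative transport cost assigned to it as its reward. The squared-Euclidean cost and uniform marginals are simplifying assumptions rather than the paper's exact choices.

```python
# Optimal-transport pseudo-rewards for an unlabelled trajectory.
import numpy as np
import ot  # pip install pot

def otr_rewards(traj_states, expert_states):
    """traj_states: (T, d) array; expert_states: (T_e, d) array. Returns one reward per step."""
    C = ot.dist(traj_states, expert_states)              # pairwise cost matrix
    a = np.full(len(traj_states), 1.0 / len(traj_states))
    b = np.full(len(expert_states), 1.0 / len(expert_states))
    plan = ot.emd(a, b, C)                                # optimal coupling
    # A step aligned cheaply with the expert gets a high (less negative) reward.
    return -(plan * C).sum(axis=1) * len(traj_states)

# These pseudo-rewards relabel the offline dataset so a standard offline RL
# algorithm can then learn a policy without hand-crafted rewards.
```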

Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogue

  • paper_url: http://arxiv.org/abs/2310.08840
  • repo_url: None
  • paper_authors: Hongru Wang, Minda Hu, Yang Deng, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Wai-Chung Kwan, Irwin King, Kam-Fai Wong
  • for: 实现开放领域对话系统,需要不同的知识来生成更加详细和证据性的回答。
  • methods: 我们提出了SAFARI框架,利用大型语言模型(LLM)在规划、理解和整合方面的能力,可在有监督和无监督两种设定下运作。
  • results: 我们在 KBP 数据集上进行实验,展示了SAFARI框架可以生成具有人格一致性和知识增强的回答。
    Abstract Open-domain dialogue system usually requires different sources of knowledge to generate more informative and evidential responses. However, existing knowledge-grounded dialogue systems either focus on a single knowledge source or overlook the dependency between multiple sources of knowledge, which may result in generating inconsistent or even paradoxical responses. To incorporate multiple knowledge sources and dependencies between them, we propose SAFARI, a novel framework that leverages the exceptional capabilities of large language models (LLMs) in planning, understanding, and incorporating under both supervised and unsupervised settings. Specifically, SAFARI decouples the knowledge grounding into multiple sources and response generation, which allows easy extension to various knowledge sources including the possibility of not using any sources. To study the problem, we construct a personalized knowledge-grounded dialogue dataset Knowledge Behind Persona (KBP), which is the first to consider the dependency between persona and implicit knowledge. Experimental results on the KBP dataset demonstrate that the SAFARI framework can effectively produce persona-consistent and knowledge-enhanced responses.
    摘要 开放域对话系统通常需要不同来源的知识来生成更具信息量、更有依据的回复。然而,现有的知识驱动对话系统要么只关注单一知识源,要么忽视多个知识源之间的依赖关系,这可能导致生成不一致甚至自相矛盾的回复。为了整合多个知识源及其相互依赖关系,我们提出了 SAFARI,一种利用大语言模型(LLM)在规划、理解和整合方面卓越能力的新框架,可在有监督和无监督两种设定下工作。具体而言,SAFARI 将多源知识接地与回复生成解耦,从而可以方便地扩展到各种知识源,甚至允许不使用任何知识源。为研究该问题,我们构建了个性化知识驱动对话数据集 Knowledge Behind Persona(KBP),这是首个考虑人格设定与隐式知识之间依赖关系的数据集。在 KBP 数据集上的实验结果表明,SAFARI 框架能够有效地生成与人格设定一致且经知识增强的回复。

A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning

  • paper_url: http://arxiv.org/abs/2310.08836
  • repo_url: https://github.com/shukla-yash/few-shot-policy-transfer
  • paper_authors: Yash Shukla, Bharat Kesari, Shivam Goel, Robert Wright, Jivko Sinapov
  • for: 降低人工交互成本,增进机器人应用中的学习效率
  • methods: 使用生成对抗网络(GANs)和循环一致性损失来映射源领域和目标领域的观测,然后利用学习到的映射将源任务上成功的行为策略克隆到目标领域。
  • results: 在目标任务交互有限、甚至源任务与目标任务语义不相似的情况下,也成功实现了少样本(few-shot)策略迁移。
    Abstract Despite recent progress in Reinforcement Learning for robotics applications, many tasks remain prohibitively difficult to solve because of the expensive interaction cost. Transfer learning helps reduce the training time in the target domain by transferring knowledge learned in a source domain. Sim2Real transfer helps transfer knowledge from a simulated robotic domain to a physical target domain. Knowledge transfer reduces the time required to train a task in the physical world, where the cost of interactions is high. However, most existing approaches assume exact correspondence in the task structure and the physical properties of the two domains. This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning. We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain. We observe successful behavior policy transfer with limited target task interactions and in cases where the source and target task are semantically dissimilar.
    摘要 尽管近期强化学习在机器人应用中取得了进展,许多任务仍因高昂的交互成本而难以解决。迁移学习通过迁移在源领域学到的知识,帮助减少目标领域的训练时间;Sim2Real 迁移则可以将仿真机器人领域的知识迁移到物理目标领域,从而减少在交互成本高昂的物理世界中训练任务所需的时间。然而,大多数现有方法假设两个领域在任务结构和物理属性上精确对应。本文提出了一个通过观测映射(Observation Mapping)与行为克隆(Behavior Cloning)实现两个领域之间少样本策略迁移的框架。我们使用生成对抗网络(GAN)以及循环一致性损失来映射源领域与目标领域之间的观测,随后利用学习到的映射将源任务上成功的行为策略克隆到目标领域。我们观察到,在目标任务交互有限、甚至源任务与目标任务语义不相似的情况下,行为策略也能成功迁移。

Urban Drone Navigation: Autoencoder Learning Fusion for Aerodynamics

  • paper_url: http://arxiv.org/abs/2310.08830
  • repo_url: None
  • paper_authors: Jiaohao Wu, Yang Ye, Jing Du
  • for: 这篇论文是为了提高城市紧急搜救(SAR)中无人机的导航而写的。
  • methods: 这篇论文使用了多目标强化学习(MORL)和卷积自编码器来改进无人机的城市SAR导航。
  • results: 测试在纽约市模型上,这种方法可以提高无人机的导航决策、优化路径和对风效应的应对,从而提高城市SAR操作的效率和精度。
    Abstract Drones are vital for urban emergency search and rescue (SAR) due to the challenges of navigating dynamic environments with obstacles like buildings and wind. This paper presents a method that combines multi-objective reinforcement learning (MORL) with a convolutional autoencoder to improve drone navigation in urban SAR. The approach uses MORL to achieve multiple goals and the autoencoder for cost-effective wind simulations. By utilizing imagery data of urban layouts, the drone can autonomously make navigation decisions, optimize paths, and counteract wind effects without traditional sensors. Tested on a New York City model, this method enhances drone SAR operations in complex urban settings.
    摘要 由于都市环境中存在建筑物和风等障碍与动态因素,无人机在城市紧急搜救(SAR)中至关重要。本文提出了一种将多目标强化学习(MORL)与卷积自编码器相结合的方法,以改进无人机在城市SAR中的导航:MORL 用于同时实现多个目标,自编码器则用于低成本的风场模拟。通过利用城市布局的图像数据,无人机可以在不依赖传统传感器的情况下自主做出导航决策、优化路径并抵消风的影响。在纽约市模型上的测试表明,该方法能够提升无人机在复杂都市环境中的SAR作业能力。

Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

  • paper_url: http://arxiv.org/abs/2310.08823
  • repo_url: None
  • paper_authors: Lu Li, Yuxin Pan, Ruobing Chen, Jie Liu, Zilin Wang, Yu Liu, Zhiheng Li
  • for: 这篇论文主要目标是解决 inverse reinforcement learning(IRL)中的奖励函数学习问题,即从收集到的专家示范数据中提取出奖励函数。
  • methods: 该论文提出了距离-排序感知的序列奖励学习(DRASRL)框架,同时考虑轨迹之间的排序及其差异程度,以协同消除奖励函数的歧义。DRASRL 以生成轨迹的策略之间的距离来量化轨迹间的差异程度,并使用对比学习技术来学习奖励信号。
  • results: 经过大量的实验,DRASRL 比前一个最佳方法(SOTA)表现出了显著的性能提升。
    Abstract Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function based on collected expert demonstrations. Considering that obtaining expert demonstrations can be costly, the focus of current IRL techniques is on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations. However, existing IRL algorithms primarily tackle the challenge of trajectory ranking ambiguity when learning the reward function. They overlook the crucial role of considering the degree of difference between trajectories in terms of their returns, which is essential for further removing reward ambiguity. Additionally, it is important to note that the reward of a single transition is heavily influenced by the context information within the trajectory. To address these issues, we introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework. Unlike existing approaches, DRASRL takes into account both the ranking of trajectories and the degrees of dissimilarity between them to collaboratively eliminate reward ambiguity when learning a sequence of contextually informed reward signals. Specifically, we leverage the distance between policies, from which the trajectories are generated, as a measure to quantify the degree of differences between traces. This distance-aware information is then used to infer embeddings in the representation space for reward learning, employing the contrastive learning technique. Meanwhile, we integrate the pairwise ranking loss function to incorporate ranking information into the latent features. Moreover, we resort to the Transformer architecture to capture the contextual dependencies within the trajectories in the latent space, leading to more accurate reward estimation. Through extensive experimentation, our DRASRL framework demonstrates significant performance improvements over previous SOTA methods.
    摘要 逆强化学习(IRL)旨在基于收集到的专家示范显式地推断潜在的奖励函数。考虑到获取专家示范的成本较高,现有 IRL 技术的重点是利用从次优示范中推导出的奖励函数,学习一个优于示范者的策略。然而,现有 IRL 算法在学习奖励函数时主要解决轨迹排序歧义的问题,忽视了轨迹之间回报差异程度这一关键因素,而这对进一步消除奖励歧义至关重要。此外,单个转移的奖励在很大程度上受到轨迹中上下文信息的影响。为解决这些问题,我们提出了距离-排序感知的序列奖励学习(DRASRL)框架。与现有方法不同,DRASRL 同时考虑轨迹的排序及其差异程度,以便在学习一系列具有上下文信息的奖励信号时协同消除奖励歧义。具体而言,我们以生成轨迹的策略之间的距离来量化轨迹间的差异程度,并利用这种距离感知信息,采用对比学习技术在表示空间中推断用于奖励学习的嵌入;同时,我们引入成对排序损失函数,将排序信息融入潜在特征。此外,我们借助 Transformer 架构在潜在空间中捕捉轨迹内部的上下文依赖,从而获得更准确的奖励估计。大量实验表明,DRASRL 框架相比此前的最先进方法取得了显著的性能提升。
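As an illustration of the pairwise ranking component described above, the sketch below applies a margin ranking loss to predicted trajectory returns, with a distance-scaled margin; the exact form of DRASRL's loss is not given in the abstract, so this formulation is an assumption.

```python
# Pairwise ranking loss for reward learning: the higher-ranked trajectory should
# receive a higher predicted return, by a margin that grows with policy distance.
import torch
import torch.nn.functional as F

def ranking_loss(reward_model, traj_hi, traj_lo, policy_distance, base_margin=1.0):
    """traj_hi ranks above traj_lo; both are (T, feat) tensors of transitions."""
    ret_hi = reward_model(traj_hi).sum()
    ret_lo = reward_model(traj_lo).sum()
    # Trajectories generated by more distant policies are pushed further apart.
    margin = base_margin * (1.0 + policy_distance)
    return F.relu(margin - (ret_hi - ret_lo))
```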

Exploring the relationship between response time sequence in scale answering process and severity of insomnia: a machine learning approach

  • paper_url: http://arxiv.org/abs/2310.08817
  • repo_url: None
  • paper_authors: Zhao Su, Rongxun Liu, Keyin Zhou, Xinru Wei, Ning Wang, Zexin Lin, Yuanchen Xie, Jie Wang, Fei Wang, Shenzhong Zhang, Xizhe Zhang
  • for: investigate the relationship between insomnia and response time, and develop a machine learning model to predict the presence of insomnia in participants using response time data.
  • methods: collected response time data from 2729 participants using a mobile application, and explored the relationship between symptom severity and response time at the individual questions level.
  • results: found a statistically significant difference (p<.001) in the total response time between participants with or without insomnia symptoms, and demonstrated a high predictive accuracy of 0.743 in predicting insomnia symptoms based on response time data.
    Abstract Objectives: The study aims to investigate the relationship between insomnia and response time. Additionally, it aims to develop a machine learning model to predict the presence of insomnia in participants using response time data. Methods: A mobile application was designed to administer scale tests and collect response time data from 2729 participants. The relationship between symptom severity and response time was explored, and a machine learning model was developed to predict the presence of insomnia. Results: The result revealed a statistically significant difference (p<.001) in the total response time between participants with or without insomnia symptoms. A correlation was observed between the severity of specific insomnia aspects and response times at the individual questions level. The machine learning model demonstrated a high predictive accuracy of 0.743 in predicting insomnia symptoms based on response time data. Conclusions: These findings highlight the potential utility of response time data to evaluate cognitive and psychological measures, demonstrating the effectiveness of using response time as a diagnostic tool in the assessment of insomnia.
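    Code sketch: A minimal scikit-learn illustration of the kind of analysis described above, training a classifier on per-question response times to predict insomnia status. The synthetic data, feature layout, and choice of classifier are assumptions for demonstration only, not the authors' pipeline.

```python
# Illustrative only: predict insomnia status from per-question response times.
# The data here are synthetic; the paper reports an accuracy of 0.743 on real data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_participants, n_questions = 2729, 7                  # one response time per scale item
X = rng.gamma(shape=2.0, scale=1.5, size=(n_participants, n_questions))  # seconds
y = rng.integers(0, 2, size=n_participants)            # 1 = insomnia symptoms present

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```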

DexCatch: Learning to Catch Arbitrary Objects with Dexterous Hands

  • paper_url: http://arxiv.org/abs/2310.08809
  • repo_url: None
  • paper_authors: Fengbo Lan, Shengjie Wang, Yunzhe Zhang, Haotian Xu, Oluwatosin Oseni, Yang Gao, Tao Zhang
  • for: To improve robots' dexterous manipulation by learning throw-catch behavior, which can increase picking speed without transporting objects to their destination.
  • methods: Uses a Stability-Constrained Reinforcement Learning (SCRL) algorithm to learn to catch diverse objects with dexterous hands.
  • results: SCRL outperforms baselines by a large margin, and the learned policies show strong zero-shot transfer to unseen objects; even in the most challenging task, where a sideways-facing hand gets no support from the palm, the method still achieves a high success rate.
    Abstract Achieving human-like dexterous manipulation remains a crucial area of research in robotics. Current research focuses on improving the success rate of pick-and-place tasks. Compared with pick-and-place, throw-catching behavior has the potential to increase picking speed without transporting objects to their destination. However, dynamic dexterous manipulation poses a major challenge for stable control due to a large number of dynamic contacts. In this paper, we propose a Stability-Constrained Reinforcement Learning (SCRL) algorithm to learn to catch diverse objects with dexterous hands. The SCRL algorithm outperforms baselines by a large margin, and the learned policies show strong zero-shot transfer performance on unseen objects. Remarkably, even though the object in a hand facing sideward is extremely unstable due to the lack of support from the palm, our method can still achieve a high level of success in the most challenging task. Video demonstrations of learned behaviors and the code can be found on the supplementary website.
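    Code sketch: The abstract does not say how the stability constraint is enforced; one common way to realize a stability-constrained RL objective is a Lagrangian penalty on a stability cost, sketched below with assumed names and a learned multiplier. This is a generic constrained-RL pattern, not the SCRL algorithm itself.

```python
# Generic Lagrangian-style constrained-RL sketch (assumed formulation, not SCRL itself):
# maximize task reward subject to E[stability_cost] <= cost_limit via a learned multiplier.
import torch

class StabilityLagrangian:
    def __init__(self, cost_limit=0.1, lr=1e-3):
        self.log_lambda = torch.zeros(1, requires_grad=True)   # lambda = exp(log_lambda) >= 0
        self.cost_limit = cost_limit
        self.opt = torch.optim.Adam([self.log_lambda], lr=lr)

    def penalized_reward(self, task_reward, stability_cost):
        """Reward passed to the policy optimizer: task reward minus weighted cost."""
        lam = self.log_lambda.exp().detach()
        return task_reward - lam * stability_cost

    def update_multiplier(self, mean_stability_cost):
        """Grow lambda when the constraint is violated, shrink it otherwise."""
        loss = -self.log_lambda.exp() * (mean_stability_cost - self.cost_limit)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

# Example: one multiplier update after an epoch with average stability cost 0.25.
lagrangian = StabilityLagrangian(cost_limit=0.1)
lagrangian.update_multiplier(torch.tensor(0.25))
```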

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

  • paper_url: http://arxiv.org/abs/2310.08803
  • repo_url: None
  • paper_authors: Palaash Agrawal, Cheston Tan, Heena Rathore
  • for: This review examines open problems and fundamental shortcomings in AI research on perception, and how principles from cognitive science can help address them.
  • methods: The paper surveys theories and techniques from sub-disciplines of cognitive science (neuroscience, psychology, and linguistics) and draws parallels with the design and implementation of current AI systems.
  • results: The review assesses current AI systems in terms of performance and resource efficiency, identifies gaps relative to the human brain, and outlines potential directions for building better perception systems in AI.
    Abstract Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards through human intelligence, cognitive sciences-inspired AI is a promising domain of research. Studying cognitive science can provide a fresh perspective to building fundamental blocks in AI research, which can lead to improved performance and efficiency. In this review paper, we focus on the cognitive functions of perception, which is the process of taking signals from one's surroundings as input, and processing them to understand the environment. Particularly, we study and compare its various processes through the lens of both cognitive sciences and AI. Through this study, we review all current major theories from various sub-disciplines of cognitive science (specifically neuroscience, psychology and linguistics), and draw parallels with theories and techniques from current practices in AI. We, hence, present a detailed collection of methods in AI for researchers to build AI systems inspired by cognitive science. Further, through the process of reviewing the state of cognitive-inspired AI, we point out many gaps in the current state of AI (with respect to the performance of the human brain), and hence present potential directions for researchers to develop better perception systems in AI.

Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception

  • paper_url: http://arxiv.org/abs/2310.13712
  • repo_url: None
  • paper_authors: Harsh Kumar, Ilya Musabirov, Mohi Reza, Jiakai Shi, Anastasia Kuzminykh, Joseph Jay Williams, Michael Liut
  • for: This paper examines the role of personalized chatbot-based teaching assistants as classroom sizes grow, especially where direct teacher presence is limited.
  • methods: The study evaluates four pedagogically informed guidance strategies through a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356), analyzing how the interaction between student approaches and LLM responses affects engagement and outcomes.
  • results: Direct LLM answers marginally improved performance, while refining student solutions fostered trust in the LLM. The findings point to a nuanced relationship between the guidance provided and the LLM's role in either answering or refining student input.
    Abstract Personalized chatbot-based teaching assistants can be crucial in addressing increasing classroom sizes, especially where direct teacher presence is limited. Large language models (LLMs) offer a promising avenue, with increasing research exploring their educational utility. However, the challenge lies not only in establishing the efficacy of LLMs but also in discerning the nuances of interaction between learners and these models, which impact learners' engagement and results. We conducted a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356) to explore the impact of four pedagogically informed guidance strategies and the interaction between student approaches and LLM responses. Direct LLM answers marginally improved performance, while refining student solutions fostered trust. Our findings suggest a nuanced relationship between the guidance provided and LLM's role in either answering or refining student input. Based on our findings, we provide design recommendations for optimizing learner-LLM interactions.

DDMT: Denoising Diffusion Mask Transformer Models for Multivariate Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.08800
  • repo_url: None
  • paper_authors: Chaocheng Yang, Tingyin Wang, Xuanhui Yan
  • for: This work addresses anomaly detection in multivariate time series, which has broad applications such as fraud detection, fault diagnosis, and system state estimation.
  • methods: It proposes a new framework, DDMT, that combines a Transformer with a denoising diffusion model and introduces an Adaptive Dynamic Neighbor Mask (ADNM) module to mitigate information leakage between input and output features during reconstruction.
  • results: Experiments show that DDMT effectively detects anomalies in time series data and achieves state-of-the-art performance on several publicly available multivariate time series anomaly detection datasets.
    Abstract Anomaly detection in multivariate time series has emerged as a crucial challenge in time series research, with significant research implications in various fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and Denoising Diffusion Model, creating a new framework for multivariate time series anomaly detection, named Denoising Diffusion Mask Transformer (DDMT). The ADNM module is introduced to mitigate information leakage between input and output features during data reconstruction, thereby alleviating the problem of WIM during reconstruction. The Denoising Diffusion Transformer (DDT) employs the Transformer as an internal neural network structure for Denoising Diffusion Model. It learns the stepwise generation process of time series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines Denoising Diffusion Model and the Transformer for multivariate time series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time series data, achieving state-of-the-art performance in anomaly detection.
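    Code sketch: A rough PyTorch illustration of the reconstruction-based detection idea, with a simple neighbor mask standing in for ADNM and a single Transformer encoder layer standing in for the full DDMT model; anomalies are scored by per-timestep reconstruction error. Everything here is a simplified assumption, not the authors' architecture.

```python
# Rough sketch of reconstruction-based anomaly scoring for multivariate time series.
# The mask and the tiny model below are simplified stand-ins for ADNM and DDMT.
import torch
import torch.nn as nn

def neighbor_mask(seq_len, window=2):
    """Boolean attention mask (True = blocked): each position cannot attend to itself
    or its immediate neighbors, discouraging trivial copy-through during reconstruction."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window

class TinyReconstructor(nn.Module):
    def __init__(self, n_features, d_model=64, n_heads=4):
        super().__init__()
        self.inp = nn.Linear(n_features, d_model)
        self.enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, n_features)

    def forward(self, x):                                    # x: (B, T, F)
        mask = neighbor_mask(x.size(1))                      # (T, T), True = masked out
        return self.out(self.enc(self.inp(x), src_mask=mask))

x = torch.randn(4, 100, 8)                                   # 4 series, 100 steps, 8 variables
model = TinyReconstructor(n_features=8)
recon = model(x)
anomaly_score = (x - recon).pow(2).mean(dim=-1)              # (B, T) per-timestep error
print(anomaly_score.shape)
```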

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

  • paper_url: http://arxiv.org/abs/2310.08797
  • repo_url: None
  • paper_authors: Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee
  • for: This work investigates how knowledge distillation can improve the efficiency of Transformer language models without sacrificing effectiveness.
  • methods: It reproduces and compares output distribution (OD) transfer, hidden state (HS) transfer with various layer mapping strategies, and multi-head attention (MHA) transfer based on MiniLMv2, across a range of student architectures.
  • results: MHA transfer based on MiniLMv2 is generally the best option, HS transfer remains a competitive baseline under a sophisticated layer mapping strategy, and OD transfer consistently lags behind the other approaches. These findings helped the authors deploy efficient yet effective student models for latency-critical applications.
    Abstract Large language models have become a vital component in modern NLP, achieving state of the art performance in a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare and analyze several representative methods for task-agnostic (general-purpose) distillation of Transformer language models. Our target of study includes Output Distribution (OD) transfer, Hidden State (HS) transfer with various layer mapping strategies, and Multi-Head Attention (MHA) transfer based on MiniLMv2. Through our extensive experiments, we study the effectiveness of each method for various student architectures in both monolingual (English) and multilingual settings. Overall, we show that MHA transfer based on MiniLMv2 is generally the best option for distillation and explain the potential reasons behind its success. Moreover, we show that HS transfer remains as a competitive baseline, especially under a sophisticated layer mapping strategy, while OD transfer consistently lags behind other approaches. Findings from this study helped us deploy efficient yet effective student models for latency-critical applications.
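    Code sketch: For intuition, a simplified PyTorch sketch of MiniLMv2-style multi-head attention relation transfer: the student is trained to match the teacher's self-attention relation distributions with a KL divergence. Real MiniLMv2 also transfers query-query, key-key, and value-value relations and re-splits heads; the shapes and layer selection here are assumptions.

```python
# Simplified sketch of MHA relation transfer (MiniLMv2 style), with assumed shapes.
import torch
import torch.nn.functional as F

def relation(q, k):
    """Attention relation softmax(QK^T / sqrt(d)); q, k: (B, H, T, d) -> (B, H, T, T)."""
    return F.softmax(q @ k.transpose(-1, -2) / q.size(-1) ** 0.5, dim=-1)

def mha_relation_kl(teacher_q, teacher_k, student_q, student_k):
    """KL(teacher relation || student relation), averaged over the batch dimension."""
    t_rel = relation(teacher_q, teacher_k).detach()
    s_rel = relation(student_q, student_k)
    return F.kl_div(s_rel.clamp_min(1e-9).log(), t_rel, reduction="batchmean")

# Example with random projections standing in for one distilled layer's Q/K tensors.
B, H, T, d = 2, 12, 16, 64
loss = mha_relation_kl(torch.randn(B, H, T, d), torch.randn(B, H, T, d),
                       torch.randn(B, H, T, d, requires_grad=True),
                       torch.randn(B, H, T, d, requires_grad=True))
loss.backward()
```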

Mitigating Bias for Question Answering Models by Tracking Bias Influence

  • paper_url: http://arxiv.org/abs/2310.08795
  • repo_url: None
  • paper_authors: Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng
  • for: This paper proposes a method, BMBI, to mitigate bias in multiple-choice question answering (QA) models.
  • methods: It uses a bias-tracking multi-task learning approach: the bias level of each query instance is measured by observing its influence on another instance, and this bias level is then used as an additional optimization objective alongside the original QA task.
  • results: The method significantly reduces the bias level across all nine bias categories of the BBQ dataset while maintaining comparable QA accuracy.
    Abstract Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in the question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. There have been datasets to evaluate bias in QA models, while bias mitigation technique for the QA models is still under-explored. In this work, we propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. If the influenced instance is more biased, we derive that the query instance is biased. We then use the bias level detected as an optimization objective to form a multi-task learning setting in addition to the original QA task. We further introduce a new bias evaluation metric to quantify bias in a comprehensive and sensitive way. We show that our method could be applied to multiple QA formulations across multiple bias categories. It can significantly reduce the bias level in all 9 bias categories in the BBQ dataset while maintaining comparable QA accuracy.
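    Code sketch: A rough illustration of the multi-task idea: the standard multiple-choice QA loss is combined with a penalty proportional to each instance's measured bias level. The bias measure and the weighting are stand-ins; the paper derives the bias level from an instance's influence on another instance.

```python
# Illustrative multi-task objective: QA loss plus a penalty on measured bias levels.
# How bias_level is computed (influence on another instance) is abstracted away here.
import torch
import torch.nn.functional as F

def qa_with_bias_penalty(qa_logits, qa_labels, bias_level, alpha=0.5):
    """qa_logits: (B, n_choices); qa_labels: (B,); bias_level: (B,) in [0, 1]."""
    qa_loss = F.cross_entropy(qa_logits, qa_labels)
    bias_loss = bias_level.mean()                 # drive the measured bias toward zero
    return qa_loss + alpha * bias_loss

loss = qa_with_bias_penalty(torch.randn(4, 3, requires_grad=True),
                            torch.randint(0, 3, (4,)),
                            torch.rand(4, requires_grad=True))
loss.backward()
```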

Price of Stability in Quality-Aware Federated Learning

  • paper_url: http://arxiv.org/abs/2310.08790
  • repo_url: None
  • paper_authors: Yizhou Yan, Xinyu Tang, Chao Huang, Ming Tang
  • for: This paper studies label noise in federated learning (FL) and how clients' self-interested denoising decisions affect FL performance.
  • methods: It models the clients' interactions as a label denoising game, characterizes its equilibrium, and analyzes the price of stability.
  • results: The analysis shows that the equilibrium outcome always yields a lower global model accuracy than the socially optimal solution; the paper also designs an efficient algorithm to compute the socially optimal solution, and experiments show the price of stability grows as clients' data become noisier.
    Abstract Federated Learning (FL) is a distributed machine learning scheme that enables clients to train a shared global model without exchanging local data. The presence of label noise can severely degrade the FL performance, and some existing studies have focused on algorithm design for label denoising. However, they ignored the important issue that clients may not apply costly label denoising strategies due to them being self-interested and having heterogeneous valuations on the FL performance. To fill this gap, we model the clients' interactions as a novel label denoising game and characterize its equilibrium. We also analyze the price of stability, which quantifies the difference in the system performance (e.g., global model accuracy, social welfare) between the equilibrium outcome and the socially optimal solution. We prove that the equilibrium outcome always leads to a lower global model accuracy than the socially optimal solution does. We further design an efficient algorithm to compute the socially optimal solution. Numerical experiments on MNIST dataset show that the price of stability increases as the clients' data become noisier, calling for an effective incentive mechanism.
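    Definition sketch: For reference, the price of stability (PoS) compares the socially optimal outcome with the best Nash equilibrium. With W(s) the social welfare of strategy profile s, S the set of profiles, and NE the set of equilibria, the standard welfare-form definition is given below; the paper's finding that equilibrium accuracy is always below the social optimum corresponds to PoS > 1 in its setting.

```latex
% Price of stability in the welfare-maximization convention (standard definition).
\mathrm{PoS} \;=\; \frac{\max_{s \in S} W(s)}{\max_{s \in \mathrm{NE}} W(s)} \;\ge\; 1,
\qquad \mathrm{PoS} = 1 \iff \text{some Nash equilibrium is socially optimal.}
```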

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.08782
  • repo_url: https://github.com/optml-group/dp4tl
  • paper_authors: Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu
  • for: This paper proposes dataset pruning (DP) for transfer learning, aiming to improve pretraining data efficiency without sacrificing downstream performance.
  • methods: It introduces two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by viewing DP through the lens of source-target domain mapping.
  • results: Across numerous transfer learning tasks, source data classes can be pruned by 40% to 80% without sacrificing downstream performance, yielding a 2 to 5 times speed-up in the pretraining stage.
    Abstract Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by identifying and removing redundant training samples without sacrificing performance. In this work, we aim to address the problem of DP for transfer learning, i.e., how to prune a source dataset for improved pretraining efficiency and lossless finetuning accuracy on downstream target tasks. To our best knowledge, the problem of DP for transfer learning remains open, as previous studies have primarily addressed DP and transfer learning as separate problems. By contrast, we establish a unified viewpoint to integrate DP with transfer learning and find that existing DP methods are not suitable for the transfer learning paradigm. We then propose two new DP methods, label mapping and feature mapping, for supervised and self-supervised pretraining settings respectively, by revisiting the DP problem through the lens of source-target domain mapping. Furthermore, we demonstrate the effectiveness of our approach on numerous transfer learning tasks. We show that source data classes can be pruned by up to 40% ~ 80% without sacrificing downstream performance, resulting in a significant 2 ~ 5 times speed-up during the pretraining stage. Besides, our proposal exhibits broad applicability and can improve other computationally intensive transfer learning techniques, such as adversarial pretraining. Codes are available at https://github.com/OPTML-Group/DP4TL.
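    Code sketch: The abstract does not spell out the label mapping procedure; the sketch below illustrates one generic way to prune source classes, scoring each class by how often a source-pretrained probe predicts it on target-task inputs and keeping only the highest-scoring classes. The probe, scoring rule, and pruning ratio are all assumptions, not the paper's exact method.

```python
# Generic class-level pruning sketch: rank source classes by how often a probe model
# predicts them on target-task inputs, then keep only the most relevant classes.
import numpy as np

def score_source_classes(probe_preds_on_target, n_source_classes):
    """probe_preds_on_target: (N,) source-class indices predicted on target inputs."""
    counts = np.bincount(probe_preds_on_target, minlength=n_source_classes)
    return counts / counts.sum()

def prune_classes(scores, prune_ratio=0.6):
    """Return the indices of the source classes kept after pruning."""
    n_keep = max(1, int(round(len(scores) * (1 - prune_ratio))))
    return np.argsort(scores)[::-1][:n_keep]

# Example with random predictions standing in for a real probe over 1000 source classes.
rng = np.random.default_rng(0)
scores = score_source_classes(rng.integers(0, 1000, size=50_000), 1000)
kept = prune_classes(scores, prune_ratio=0.6)      # prune 60% of the source classes
print(f"Keeping {len(kept)} of 1000 source classes")
```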

“Im not Racist but…”: Discovering Bias in the Internal Knowledge of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08780
  • repo_url: None
  • paper_authors: Abel Salinas, Louis Penafiel, Robert McCormack, Fred Morstatter
  • for: This work aims to uncover hidden societal biases in large language models (LLMs), with a view toward improving fairness in downstream applications.
  • methods: It introduces a purely prompt-based approach that dynamically generates a knowledge representation of internal stereotypes, enabling the identification of biases encoded in an arbitrary LLM's internal knowledge.
  • results: The approach exposes hidden societal biases in LLMs and provides a systematic methodology for analyzing them, contributing to transparency and fairness in NLP systems.
    Abstract Large language models (LLMs) have garnered significant attention for their remarkable performance in a continuously expanding set of natural language processing tasks. However, these models have been shown to harbor inherent societal biases, or stereotypes, which can adversely affect their performance in their many downstream applications. In this paper, we introduce a novel, purely prompt-based approach to uncover hidden stereotypes within any arbitrary LLM. Our approach dynamically generates a knowledge representation of internal stereotypes, enabling the identification of biases encoded within the LLM's internal knowledge. By illuminating the biases present in LLMs and offering a systematic methodology for their analysis, our work contributes to advancing transparency and promoting fairness in natural language processing systems.
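    Code sketch: A generic illustration of prompt-based bias probing, filling a template with different group terms and collecting the model's continuations for comparison; this is not the paper's procedure, which dynamically builds a knowledge representation of internal stereotypes. The `generate` callable is a placeholder for any LLM text-generation API.

```python
# Generic prompt-based probing sketch; `generate` stands in for any LLM call.
def probe_bias(generate, template, groups, samples_per_group=5):
    """Collect model continuations for the same template across different groups."""
    results = {}
    for group in groups:
        prompt = template.format(group=group)
        results[group] = [generate(prompt) for _ in range(samples_per_group)]
    return results

# Example usage with a dummy generator in place of a real model.
dummy_generate = lambda prompt: f"(model continuation for: {prompt!r})"
out = probe_bias(dummy_generate,
                 "List five words you associate with {group} people.",
                 ["young", "elderly"])
print(out["young"][0])
```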