cs.AI - 2023-10-14

A Neuro-Mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization

  • paper_url: http://arxiv.org/abs/2310.15177
  • repo_url: None
  • paper_authors: Alexander Ororbia, Mary Alexandria Kelly
  • for: The paper examines the rise of generative AI and the questions it raises for Cognitive Science.
  • methods: The COGnitive Neural GENerative system, a neuro-mimetic architecture in which Hebbian adaptation operates in service of optimizing a variational free energy functional.
  • results: A cognitive architecture that casts the Common Model of Cognition in terms of neuro-mimetic generative building blocks, usable for modeling cognitive processing in the brain.
    Abstract Over the last few years, large neural generative models, capable of synthesizing intricate sequences of words or producing complex image patterns, have recently emerged as a popular representation of what has come to be known as "generative artificial intelligence" (generative AI). Beyond opening the door to new opportunities as well as challenges for the domain of statistical machine learning, the rising popularity of generative AI brings with it interesting questions for Cognitive Science, which seeks to discover the nature of the processes that underpin minds and brains as well as to understand how such functionality might be acquired and instantiated in biological (or artificial) substrate. With this goal in mind, we argue that a promising long-term pathway lies in the crafting of cognitive architectures, a long-standing tradition of the field, cast fundamentally in terms of neuro-mimetic generative building blocks. Concretely, we discuss the COGnitive Neural GENerative system, which is an architecture that casts the Common Model of Cognition in terms of Hebbian adaptation operating in service of optimizing a variational free energy functional.
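The following is a minimal illustrative sketch (not the paper's COGnitive Neural GENerative system) of the core idea the abstract describes: a Hebbian, purely local weight update that descends a simple variational free energy in a one-layer generative model. Dimensions and learning rates are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)                   # observed input
W = rng.normal(scale=0.1, size=(16, 8))   # generative weights
z = np.zeros(8)                           # latent state
lr_z, lr_w = 0.1, 0.01

for step in range(200):
    e = x - W @ z                         # prediction error
    F = 0.5 * (e @ e + z @ z)             # simple (Gaussian) free-energy proxy
    z += lr_z * (W.T @ e - z)             # inference: settle the latent state
    W += lr_w * np.outer(e, z)            # Hebbian-like update: error x activity

print(f"final free energy: {F:.4f}")
```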

Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring

  • paper_url: http://arxiv.org/abs/2310.09680
  • repo_url: None
  • paper_authors: Ankitha Sudarshan, Vinay Samuel, Parth Patwa, Ibtihel Amara, Aman Chadha
  • for: Improving the recognition of context-dependent words and phrases in automatic speech recognition (ASR) systems.
  • methods: Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) combined with deep neural network models, integrating acoustic and language modeling, with a transformer-based model rescoring the word lattice.
  • results: Strong results on the LibriSpeech dataset, with a palpable reduction in Word Error Rate (WER).
    Abstract Automatic Speech Recognition (ASR) has witnessed a profound research interest. Recent breakthroughs have given ASR systems different prospects such as faithfully transcribing spoken language, which is a pivotal advancement in building conversational agents. However, there is still an imminent challenge of accurately discerning context-dependent words and phrases. In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing leveraging the power of deep learning models in accurately delivering spot-on transcriptions across a wide variety of vocabularies and speaking styles. Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models integrating both language and acoustic modeling for better accuracy. We infused our network with the use of a transformer-based model to properly rescore the word lattice achieving remarkable capabilities with a palpable reduction in Word Error Rate (WER). We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
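The abstract does not spell out the rescoring step; below is a hedged sketch of the common recipe it points to — rescoring lattice-derived n-best hypotheses by mixing the first-pass score with a transformer language-model score. The GPT-2 checkpoint, the toy hypotheses, and the interpolation weight are placeholders, not the paper's setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def lm_logprob(text: str) -> float:
    """Approximate total log-probability of the hypothesis under the LM."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss          # mean NLL per token
    return -loss.item() * ids.shape[1]

# hypotheses as (text, first-pass acoustic/LM score) pairs, e.g. read off lattice paths
nbest = [("your reservation is at seven pm", -42.1),
         ("your reservation is at seven p m", -41.8)]
alpha = 0.6                                      # interpolation weight (placeholder)
best = max(nbest, key=lambda h: (1 - alpha) * h[1] + alpha * lm_logprob(h[0]))
print(best[0])
```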

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.09676
  • repo_url: None
  • paper_authors: Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang
  • for: Enabling robots to perform manipulation guided by multimodal prompts that combine text descriptions with visual signals.
  • methods: A two-stage training pipeline of inverse dynamics pretraining and multi-task fine-tuning; a multimodal prompt encoder augments a pretrained language model with a residual connection to the visual input and models the dependencies among action dimensions.
  • results: A new state of the art on VIMA-BENCH (a 10% improvement in success rate), along with notable in-context learning ability.
    Abstract Prompt-based learning has been demonstrated as a compelling paradigm contributing to large language models' tremendous success (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning. However, not much attention has been paid to embodied tasks with multimodal prompts, combining vision signals with text descriptions. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. In this work, we introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts from multi-task expert trajectories. Our methods consist of a two-stage training pipeline that performs inverse dynamics pretraining and multi-task finetuning. To facilitate multimodal understanding, we design our multimodal prompt encoder by augmenting a pretrained LM with a residual connection to the visual input and model the dependencies among action dimensions. Empirically, we evaluate the efficacy of our method on the VIMA-BENCH and establish a new state-of-the-art (10% improvement in success rate). Moreover, we demonstrate that our model exhibits remarkable in-context learning ability.
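As a rough illustration (the dimensions, module names, and exact placement of the residual path are my assumptions, not the paper's design), a prompt encoder of the kind described above can be sketched as a pretrained embedding table for text plus a visual projection refined by a residual block:

```python
import torch
import torch.nn as nn

class MultimodalPromptEncoder(nn.Module):
    """Toy encoder: text tokens reuse the pretrained LM's embedding table; visual
    features are projected into the same space and refined with a residual MLP."""
    def __init__(self, lm_embed: nn.Embedding, vis_dim: int = 512):
        super().__init__()
        d = lm_embed.embedding_dim
        self.lm_embed = lm_embed                     # taken from the pretrained LM
        self.vis_proj = nn.Linear(vis_dim, d)        # visual input -> LM space
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, text_ids, vis_feats):
        txt = self.lm_embed(text_ids)                # (B, Lt, d)
        vis = self.vis_proj(vis_feats)               # (B, Lv, d)
        vis = vis + self.mlp(vis)                    # residual path from the visual input
        return torch.cat([txt, vis], dim=1)          # combined multimodal prompt

enc = MultimodalPromptEncoder(nn.Embedding(32000, 768), vis_dim=512)
out = enc(torch.randint(0, 32000, (1, 6)), torch.randn(1, 3, 512))
print(out.shape)   # torch.Size([1, 9, 768])
```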

Efficient Model-Agnostic Multi-Group Equivariant Networks

  • paper_url: http://arxiv.org/abs/2310.09675
  • repo_url: None
  • paper_authors: Razan Baltaji, Sourya Basu, Lav R. Varshney
  • for: This paper aims to address the computational expense of constructing model-agnostic group equivariant networks, such as equitune, for large product groups.
  • methods: The paper proposes two efficient model-agnostic equivariant designs for two related problems: one with multiple inputs and another with a single input but a large product group. The designs use a novel fusion layer called an IS layer, which is a universal approximator of invariant-symmetric functions.
  • results: The paper shows that the proposed designs are competitive with equitune and its variants, while being computationally more efficient. The designs are applied to three applications: multi-image classification, language compositionality, and robust zero-shot image classification.
    Abstract Constructing model-agnostic group equivariant networks, such as equitune (Basu et al., 2023b) and its generalizations (Kim et al., 2023), can be computationally expensive for large product groups. We address this by providing efficient model-agnostic equivariant designs for two related problems: one where the network has multiple inputs each with potentially different groups acting on them, and another where there is a single input but the group acting on it is a large product group. For the first design, we initially consider a linear model and characterize the entire equivariant space that satisfies this constraint. This characterization gives rise to a novel fusion layer between different channels that satisfies an invariance-symmetry (IS) constraint, which we call an IS layer. We then extend this design beyond linear models, similar to equitune, consisting of equivariant and IS layers. We also show that the IS layer is a universal approximator of invariant-symmetric functions. Inspired by the first design, we use the notion of the IS property to design a second efficient model-agnostic equivariant design for large product groups acting on a single input. For the first design, we provide experiments on multi-image classification where each view is transformed independently with transformations such as rotations. We find equivariant models are robust to such transformations and perform competitively otherwise. For the second design, we consider three applications: language compositionality on the SCAN dataset to product groups; fairness in natural language generation from GPT-2 to address intersectionality; and robust zero-shot image classification with CLIP. Overall, our methods are simple and general, competitive with equitune and its variants, while also being computationally more efficient.
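The IS layer itself is not specified in the abstract. For context, here is a minimal sketch of the equitune-style symmetrization the paper builds on and compares against: averaging an arbitrary base classifier over a finite group (90-degree image rotations here) yields logits that are invariant to that group by construction.

```python
import torch
import torch.nn as nn

class C4InvariantWrapper(nn.Module):
    """Model-agnostic symmetrization: average a base classifier over the four
    90-degree rotations, making its logits invariant to that group."""
    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base

    def forward(self, x):  # x: (B, C, H, W)
        outs = [self.base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs).mean(dim=0)

base = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model = C4InvariantWrapper(base)
x = torch.randn(2, 3, 32, 32)
# rotating the input does not change the symmetrized output
assert torch.allclose(model(x), model(torch.rot90(x, 1, dims=(2, 3))), atol=1e-5)
```

The cost of this baseline grows with the group size, which is exactly the expense for large product groups that the paper's IS-based designs aim to avoid.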

Edge-InversionNet: Enabling Efficient Inference of InversionNet on Edge Devices

  • paper_url: http://arxiv.org/abs/2310.09667
  • repo_url: None
  • paper_authors: Zhepeng Wang, Isaacshubhanand Putla, Weiwen Jiang, Youzuo Lin
  • for: Making seismic full waveform inversion (FWI) efficient on resource-constrained edge devices by slimming down the data-driven InversionNet model.
  • methods: A structured pruning algorithm produces a lightweight InversionNet for efficient inference on edge devices, prototyped on a Raspberry Pi.
  • results: The pruned InversionNet achieves up to a 98.2% reduction in computing resources with only moderate degradation in model performance.
    Abstract Seismic full waveform inversion (FWI) is a widely used technique in geophysics for inferring subsurface structures from seismic data. And InversionNet is one of the most successful data-driven machine learning models that is applied to seismic FWI. However, the high computing costs to run InversionNet have made it challenging to be efficiently deployed on edge devices that are usually resource-constrained. Therefore, we propose to employ the structured pruning algorithm to get a lightweight version of InversionNet, which can make an efficient inference on edge devices. And we also made a prototype with Raspberry Pi to run the lightweight InversionNet. Experimental results show that the pruned InversionNet can achieve up to 98.2 % reduction in computing resources with moderate model performance degradation.
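The abstract names structured pruning without further detail; the snippet below is a generic sketch using PyTorch's built-in structured pruning to zero out entire convolutional filters — the kind of operation that reduces compute for edge inference. A toy CNN stands in for InversionNet, and the 50% amount is a placeholder.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# stand-in for InversionNet's encoder (the real network is much larger)
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)

for module in net.modules():
    if isinstance(module, nn.Conv2d):
        # remove 50% of output filters by L2 norm (structured pruning along dim=0)
        prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
        prune.remove(module, "weight")   # make the zeroed filters permanent

zeros = sum((m.weight == 0).sum().item() for m in net.modules()
            if isinstance(m, nn.Conv2d))
total = sum(m.weight.numel() for m in net.modules() if isinstance(m, nn.Conv2d))
print(f"structured sparsity: {zeros / total:.1%}")
```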

A Generalized Extensive-Form Fictitious Play Algorithm

  • paper_url: http://arxiv.org/abs/2310.09658
  • repo_url: None
  • paper_authors: Tim P. Schulze
  • for: Finding equilibria of two-player, zero-sum games.
  • methods: A simple extensive-form algorithm that is realization equivalent to a generalized form of Fictitious Play.
  • results: Compared with a similar extensive-form fictitious play algorithm and a counterfactual regret minimization algorithm, the new method shares their storage and complexity advantages over normal-form fictitious play while being intuitive and straightforward to implement.
    Abstract We introduce a simple extensive-form algorithm for finding equilibria of two-player, zero-sum games. The algorithm is realization equivalent to a generalized form of Fictitious Play. We compare its performance to that of a similar extensive-form fictitious play algorithm and a counter-factual regret minimization algorithm. All three algorithms share the same advantages over normal-form fictitious play in terms of reducing storage requirements and computational complexity. The new algorithm is intuitive and straightforward to implement, making it an appealing option for those looking for a quick and easy game solving tool.
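For reference, the classic normal-form fictitious play that the paper's extensive-form algorithm generalizes can be sketched in a few lines; on matching pennies the empirical strategies converge to the (0.5, 0.5) equilibrium. This is background, not the paper's algorithm.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoffs (matching pennies)
counts_row = np.ones(2)                     # empirical action counts
counts_col = np.ones(2)

for t in range(10000):
    # each player best-responds to the opponent's empirical mixed strategy
    br_row = np.argmax(A @ (counts_col / counts_col.sum()))
    br_col = np.argmin((counts_row / counts_row.sum()) @ A)
    counts_row[br_row] += 1
    counts_col[br_col] += 1

print("row strategy:", counts_row / counts_row.sum())   # approx [0.5, 0.5]
print("col strategy:", counts_col / counts_col.sum())
```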

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

  • paper_url: http://arxiv.org/abs/2310.09653
  • repo_url: None
  • paper_authors: Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
  • for: The paper proposes SelfVC, a training strategy that iteratively improves a voice conversion model with self-synthesized examples.
  • methods: Speech representations derived from self-supervised learning and speaker verification models are used to train a controllable voice conversion model, which is then refined by feeding its own voice-converted outputs back as inputs to a reconstruction task.
  • results: Using self-synthesized examples during training improves the speaker similarity and naturalness of the generated speech; SelfVC also applies to zero-shot voice conversion, cross-lingual voice conversion, and controllable speech synthesis.
    Abstract We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on explicitly disentangling speech representations to separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss terms can lead to information loss by discarding finer nuances of the original signal. In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning and speaker verification models. First, we develop techniques to derive prosodic information from the audio signal and SSL representations to train predictive submodules in the synthesis model. Next, we propose a training strategy to iteratively improve the synthesis model for voice conversion, by creating a challenging training objective using self-synthesized examples. In this training approach, the current state of the synthesis model is used to generate voice-converted variations of an utterance, which serve as inputs for the reconstruction task, ensuring a continuous and purposeful refinement of the model. We demonstrate that incorporating such self-synthesized examples during training improves the speaker similarity of generated speech as compared to a baseline voice conversion model trained solely on heuristically perturbed inputs. SelfVC is trained without any text and is applicable to a range of tasks such as zero-shot voice conversion, cross-lingual voice conversion, and controllable speech synthesis with pitch and pace modifications. SelfVC achieves state-of-the-art results in zero-shot voice conversion on metrics evaluating naturalness, speaker similarity, and intelligibility of synthesized audio.

Lexical Entrainment for Conversational Systems

  • paper_url: http://arxiv.org/abs/2310.09651
  • repo_url: None
  • paper_authors: Zhengxiang Shi, Procheta Sen, Aldo Lipani
  • for: This paper aims to address the issue of lexical entrainment (LE) in conversational systems, which is a crucial humanlike phenomenon that is not adequately addressed by current response generation models.
  • methods: The authors propose a new dataset, named MULTIWOZ-ENTR, and a measure for LE for conversational systems. They also suggest two new tasks, a LE extraction task and a LE generation task, and present two baseline approaches for the LE extraction task.
  • results: The authors demonstrate the effectiveness of their proposed approach with experiments conducted on the MULTIWOZ-ENTR dataset.
    Abstract Conversational agents have become ubiquitous in assisting with daily tasks, and are expected to possess human-like features. One such feature is lexical entrainment (LE), a phenomenon in which speakers in human-human conversations tend to naturally and subconsciously align their lexical choices with those of their interlocutors, leading to more successful and engaging conversations. As an example, if a digital assistant replies 'Your appointment for Jinling Noodle Pub is at 7 pm' to the question 'When is my reservation for Jinling Noodle Bar today?', it may feel as though the assistant is trying to correct the speaker, whereas a response of 'Your reservation for Jinling Noodle Bar is at 7 pm' would likely be perceived as more positive. This highlights the importance of LE in establishing a shared terminology for maximum clarity and reducing ambiguity in conversations. However, we demonstrate in this work that current response generation models do not adequately address this crucial humanlike phenomenon. To address this, we propose a new dataset, named MULTIWOZ-ENTR, and a measure for LE for conversational systems. Additionally, we suggest a way to explicitly integrate LE into conversational systems with two new tasks, a LE extraction task and a LE generation task. We also present two baseline approaches for the LE extraction task, which aim to detect LE expressions from dialogue contexts.
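The MULTIWOZ-ENTR dataset and the paper's LE measure are not reproduced here; as a toy illustration of the phenomenon, the function below scores how much of the user's content vocabulary a candidate response reuses, so the "Noodle Bar" reply from the abstract scores higher than the "Noodle Pub" one.

```python
def lexical_overlap(user_utterance: str, response: str) -> float:
    """Fraction of the user's content tokens that the response reuses."""
    stop = {"is", "my", "for", "the", "at", "when", "your", "today"}
    user = {w.strip("?.,").lower() for w in user_utterance.split()} - stop
    resp = {w.strip("?.,").lower() for w in response.split()} - stop
    return len(user & resp) / max(len(user), 1)

user = "When is my reservation for Jinling Noodle Bar today?"
print(lexical_overlap(user, "Your appointment for Jinling Noodle Pub is at 7 pm"))   # 0.5
print(lexical_overlap(user, "Your reservation for Jinling Noodle Bar is at 7 pm"))   # 1.0
```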

Multimodal Federated Learning in Healthcare: a review

  • paper_url: http://arxiv.org/abs/2310.09650
  • repo_url: None
  • paper_authors: Jacob Thrasher, Alina Devkota, Prasiddha Siwakotai, Rohit Chivukula, Pranav Poudel, Chaunbo Hu, Binod Bhattarai, Prashnna Gyawali
  • for: A review of multimodal federated learning (MMFL) in healthcare, where accurate and robust AI systems must be built while keeping patient data private and secure.
  • methods: The review surveys approaches that combine multimodal learning with federated learning across institutions that keep their data locally.
  • results: The paper examines the limitations of existing MMFL models and outlines directions for future advancements in the field.
    Abstract Recent advancements in multimodal machine learning have empowered the development of accurate and robust AI systems in the medical domain, especially within centralized database systems. Simultaneously, Federated Learning (FL) has progressed, providing a decentralized mechanism where data need not be consolidated, thereby enhancing the privacy and security of sensitive healthcare data. The integration of these two concepts supports the ongoing progress of multimodal learning in healthcare while ensuring the security and privacy of patient records within local data-holding agencies. This paper offers a concise overview of the significance of FL in healthcare and outlines the current state-of-the-art approaches to Multimodal Federated Learning (MMFL) within the healthcare domain. It comprehensively examines the existing challenges in the field, shedding light on the limitations of present models. Finally, the paper outlines potential directions for future advancements in the field, aiming to bridge the gap between cutting-edge AI technology and the imperative need for patient data privacy in healthcare applications.
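For concreteness, here is a minimal FedAvg-style sketch of the federated setting the review assumes: each site trains a copy of the global model on its private data, and only parameters — never patient records — are averaged on the server. The toy linear model and random data are placeholders.

```python
import torch
import torch.nn as nn

def local_update(model, data, epochs=1):
    """Train a copy of the global model on one site's private data."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

def fed_avg(states):
    """Server step: average parameters across sites (equal weights)."""
    return {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}

global_model = nn.Linear(4, 1)
sites = [[(torch.randn(8, 4), torch.randn(8, 1))] for _ in range(3)]  # toy private data
states = []
for data in sites:
    local = nn.Linear(4, 1)
    local.load_state_dict(global_model.state_dict())
    states.append(local_update(local, data))
global_model.load_state_dict(fed_avg(states))
```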

Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy

  • paper_url: http://arxiv.org/abs/2310.11467
  • repo_url: None
  • paper_authors: Rohith Arumugam S, Angel Deborah S
  • for: Improving the accuracy of a binary code-comment quality classification model.
  • methods: integrating generated code and comment pairs to improve model accuracy
  • results: Two classification models are trained, one on the original dataset and one on the augmented dataset containing the generated code-comment pairs; the model trained on the augmented data reaches higher accuracy (85.1% versus 78.4% for the original-data model).
    Abstract This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs, to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a Large Language Model Architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one utilizing the original dataset and another incorporating the augmented dataset with the newly generated code comment pairs and labels.

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09624
  • repo_url: https://github.com/alexmeigz/assert
  • paper_authors: Alex Mei, Sharon Levy, William Yang Wang
  • for: Making the safety evaluation of large language models robust to the high-variance ways in which users may invoke them.
  • methods: ASSERT comprises three methods — semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection — used to algorithmically generate a test suite of prompts covering semantic equivalence, related scenarios, and adversarial settings, partitioned into four safety domains.
  • results: Despite dedicated safeguards in existing state-of-the-art models, performance differs by up to 11% in absolute classification accuracy among semantically related scenarios, with error rates of up to 19% absolute in zero-shot adversarial settings, raising concerns for users' physical safety.
    Abstract As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment. Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system. This paper proposes ASSERT, Automated Safety Scenario Red Teaming, consisting of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. For robust safety evaluation, we apply these methods in the critical domain of AI safety to algorithmically generate a test suite of prompts covering diverse robustness settings -- semantic equivalence, related scenarios, and adversarial. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. Despite dedicated safeguards in existing state-of-the-art models, we find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings, raising concerns for users' physical safety.

A decoder-only foundation model for time-series forecasting

  • paper_url: http://arxiv.org/abs/2310.10688
  • repo_url: None
  • paper_authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou
  • for: Designing a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets approaches that of state-of-the-art supervised forecasting models trained on each individual dataset.
  • methods: A patched-decoder style attention model pretrained on a large time-series corpus.
  • results: The model forecasts accurately across datasets and works well across different forecasting history lengths, prediction lengths, and temporal granularities.
    Abstract Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
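The abstract mentions a patched-decoder design without specifics; the sketch below covers only the input side of such a model — slicing a univariate history into fixed-length patches and projecting each patch to a token embedding that a causal, decoder-only transformer could attend over. Patch length and model width are placeholders.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Turn a univariate series (B, T) into patch tokens (B, T//P, d_model)."""
    def __init__(self, patch_len: int = 32, d_model: int = 256):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        b, t = series.shape
        t = (t // self.patch_len) * self.patch_len          # drop the ragged tail
        patches = series[:, :t].reshape(b, -1, self.patch_len)
        return self.proj(patches)

tokens = PatchEmbed()(torch.randn(4, 512))   # 512-step history -> 16 tokens
print(tokens.shape)                          # torch.Size([4, 16, 256])
# `tokens` would then feed a causal (decoder-only) transformer that predicts
# the next patch, giving forecasts over horizons that are multiples of the patch length.
```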

Deep Neural Networks Can Learn Generalizable Same-Different Visual Relations

  • paper_url: http://arxiv.org/abs/2310.09612
  • repo_url: None
  • paper_authors: Alexa R. Tartaglini, Sheridan Feucht, Michael A. Lepori, Wai Keen Vong, Charles Lovering, Brenden M. Lake, Ellie Pavlick
  • for: Investigating whether deep neural networks can learn and generalize the same-different relation, both within and out of distribution.
  • methods: A range of architectures, pretraining schemes, and fine-tuning datasets are compared on same-different tasks.
  • results: Certain pretrained transformers learn a same-different relation that generalizes with near-perfect accuracy to out-of-distribution stimuli, and fine-tuning on abstract shapes lacking texture or color gives the strongest out-of-distribution generalization.
    Abstract Although deep neural networks can achieve human-level performance on many object recognition benchmarks, prior work suggests that these same models fail to learn simple abstract relations, such as determining whether two objects are the same or different. Much of this prior work focuses on training convolutional neural networks to classify images of two same or two different abstract shapes, testing generalization on within-distribution stimuli. In this article, we comprehensively study whether deep neural networks can acquire and generalize same-different relations both within and out-of-distribution using a variety of architectures, forms of pretraining, and fine-tuning datasets. We find that certain pretrained transformers can learn a same-different relation that generalizes with near perfect accuracy to out-of-distribution stimuli. Furthermore, we find that fine-tuning on abstract shapes that lack texture or color provides the strongest out-of-distribution generalization. Our results suggest that, with the right approach, deep neural networks can learn generalizable same-different visual relations.

Penetrative AI: Making LLMs Comprehend the Physical World

  • paper_url: http://arxiv.org/abs/2310.09605
  • repo_url: None
  • paper_authors: Huatao Xu, Liying Han, Mo Li, Mani Srivastava
  • for: Exploring how large language models (LLMs) can interact with and reason about the physical world through IoT sensors and actuators, a concept the authors term "Penetrative AI".
  • methods: LLMs are extended to process sensory signals at two levels of penetration into the physical world.
  • results: With ChatGPT as the representative example, LLMs show considerable and unique proficiency in applying knowledge acquired during training to interpret IoT sensor data and reason about tasks in the physical realm, opening new applications beyond text and new ways to incorporate human knowledge in cyber-physical systems.
    Abstract Recent developments in Large Language Models (LLMs) have demonstrated their remarkable capabilities across a range of tasks. Questions, however, persist about the nature of LLMs and their potential to integrate common-sense human knowledge when performing tasks involving information about the real physical world. This paper delves into these questions by exploring how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators, a concept that we term "\textit{Penetrative AI}". The paper explores such an extension at two levels of LLMs' ability to penetrate into the physical world via the processing of sensory signals. Our preliminary findings indicate that LLMs, with ChatGPT being the representative example in our exploration, have considerable and unique proficiency in employing the knowledge they learned during training for interpreting IoT sensor data and reasoning over them about tasks in the physical realm. Not only this opens up new applications for LLMs beyond traditional text-based tasks, but also enables new ways of incorporating human knowledge in cyber-physical systems.
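At the simpler of the two levels the paper explores, sensor readings are summarized into text for the LLM to reason over. A toy sketch of that prompt construction is below; the actual sensing tasks and prompts in the paper differ, and ask_llm stands in for whatever chat-completion API is used.

```python
def build_motion_prompt(accel_readings: list) -> str:
    """Summarize raw accelerometer samples (x, y, z in m/s^2) into a textual question."""
    mags = [(x * x + y * y + z * z) ** 0.5 for x, y, z in accel_readings]
    stats = (f"{len(mags)} samples, magnitude min={min(mags):.2f}, "
             f"max={max(mags):.2f}, mean={sum(mags) / len(mags):.2f}")
    return ("You are analyzing smartphone accelerometer data.\n"
            f"Readings: {stats}.\n"
            "Is the user most likely still, walking, or running? Answer in one word.")

readings = [(0.1, 0.2, 9.8), (1.5, 0.3, 10.4), (2.2, 1.1, 11.0)]
prompt = build_motion_prompt(readings)
print(prompt)
# answer = ask_llm(prompt)   # hypothetical call to a chat-completion API
```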

Context-aware Session-based Recommendation with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.09593
  • repo_url: https://github.com/brilliantzhang/cares
  • paper_authors: Zhihui Zhang, JianXiang Yu, Xiang Li
  • for: Improving the accuracy of session-based recommendation (SBR), in particular capturing user interests from the different types of context available within and across sessions.
  • methods: CARES, a context-aware session-based recommendation model with graph neural networks, builds a multi-relation cross-session graph, learns personalized item representations, and uses a label collaboration strategy to generate soft user-preference labels.
  • results: On three benchmark datasets, CARES consistently outperforms state-of-the-art models on P@20 and MRR@20.
    Abstract Session-based recommendation (SBR) is a task that aims to predict items based on anonymous sequences of user behaviors in a session. While there are methods that leverage rich context information in sessions for SBR, most of them have the following limitations: 1) they fail to distinguish the item-item edge types when constructing the global graph for exploiting cross-session contexts; 2) they learn a fixed embedding vector for each item, which lacks the flexibility to reflect the variation of user interests across sessions; 3) they generally use the one-hot encoded vector of the target item as the hard label to predict, thus failing to capture the true user preference. To solve these issues, we propose CARES, a novel context-aware session-based recommendation model with graph neural networks, which utilizes different types of contexts in sessions to capture user interests. Specifically, we first construct a multi-relation cross-session graph to connect items according to intra- and cross-session item-level contexts. Further, to encode the variation of user interests, we design personalized item representations. Finally, we employ a label collaboration strategy for generating soft user preference distribution as labels. Experiments on three benchmark datasets demonstrate that CARES consistently outperforms state-of-the-art models in terms of P@20 and MRR@20. Our data and codes are publicly available at https://github.com/brilliantZhang/CARES.

Solving Math Word Problems with Reexamination

  • paper_url: http://arxiv.org/abs/2310.09590
  • repo_url: https://github.com/steven640pixel/psedualmwp
  • paper_authors: Yi Bin, Wenhao Shi, Yujuan Ding, Yang Yang, See-Kiong Ng
  • for: Improving math word problem (MWP) solving.
  • methods: A pseudo-dual (PseDual) learning scheme adds a reexamination step during training: the numbers in the predicted expression are filled back into the original word problem with its numbers masked.
  • results: Equipping several representative MWP solvers with the pseudo-dual scheme improves their solving performance in empirical studies.
    Abstract Math word problem (MWP) solving aims to understand the descriptive math problem and calculate the result, for which previous efforts are mostly devoted to upgrade different technical modules. This paper brings a different perspective of reexamination process during training by introducing a pseudo-dual task to enhance the MWP solving. We propose a pseudo-dual (PseDual) learning scheme to model such process, which is model-agnostic thus can be adapted to any existing MWP solvers. The pseudo-dual task is specifically defined as filling the numbers in the expression back into the original word problem with numbers masked. To facilitate the effective joint learning of the two tasks, we further design a scheduled fusion strategy for the number infilling task, which smoothly switches the input from the ground-truth math expressions to the predicted ones. Our pseudo-dual learning scheme has been tested and proven effective when being equipped in several representative MWP solvers through empirical studies. The codes and trained models are available at: https://github.com/steven640pixel/PsedualMWP.
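A small sketch of the data side of the pseudo-dual task, under my own simplifying assumptions: mask the numbers in the original problem, then fill the slots back in from a (predicted or ground-truth) expression.

```python
import re

def mask_numbers(problem: str):
    """Replace each number in the word problem with [NUM], keeping the originals."""
    numbers = re.findall(r"\d+(?:\.\d+)?", problem)
    masked = re.sub(r"\d+(?:\.\d+)?", "[NUM]", problem)
    return masked, numbers

def infill(masked: str, expression: str) -> str:
    """Pseudo-dual task: put the numbers from the expression back into the slots."""
    nums = iter(re.findall(r"\d+(?:\.\d+)?", expression))
    return re.sub(r"\[NUM\]", lambda _: next(nums), masked)

problem = "Tom has 3 apples and buys 5 more. How many apples does he have?"
masked, gold = mask_numbers(problem)
print(masked)                    # Tom has [NUM] apples and buys [NUM] more. ...
print(infill(masked, "3 + 5"))   # reconstructs the original problem
```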

Autonomous Tree-search Ability of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10686
  • repo_url: None
  • paper_authors: Zheyu Zhang, Zhuorui Ye, Yikang Shen, Chuang Gan
  • for: Improving the reasoning ability of large language models on tasks that require exploration, strategic foresight, and sequential decision-making.
  • methods: An autonomous tree-search (ATS) ability is elicited through a fixed system prompt, so the LLM itself generates a response containing the search trajectory leading to the correct answer, without external search programs.
  • results: On four puzzle games, the ATS-BFS method improves accuracy over Chain of Thought by 33% on average and needs 65.6% or 47.7% less GPT-API cost than Tree of Thoughts for comparable accuracy; fine-tuning LLaMA on data collected with the ATS prompt outperforms CoT-tuned models by 40.6% and 38.5% for LLaMA2-7B and LLaMA2-13B, respectively.
    Abstract Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques, but they fall short on tasks that require exploration, strategic foresight, and sequential decision-making. Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks. Though impressive results have been achieved, there are several fundamental limitations of these approaches. First, passive tree searches are not efficient as they usually require multiple rounds of LLM API calls to solve one single problem. Moreover, passive search methods are not flexible since they need task-specific program designs. Then a natural question arises: can we maintain the tree-search capability of LLMs without the aid of external programs, and can still generate responses that clearly demonstrate the process of a tree-structure search? To this end, we propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer. Concretely, we perform search trajectories using capable LLM API via a fixed system prompt, allowing them to perform autonomous tree-search (ATS) right out of the box. Experiments on 4 puzzle games demonstrate our method can achieve huge improvements. The ATS-BFS method outperforms the Chain of Thought approach by achieving an average accuracy improvement of 33%. Compared to Tree of Thoughts, it requires 65.6% or 47.7% less GPT-api cost to attain a comparable level of accuracy. Moreover, we have collected data using the ATS prompt method and fine-tuned LLaMA. This approach yield a greater improvement compared to the ones fine-tuned on CoT data. Specifically, it outperforms CoT-tuned LLaMAs by an average of 40.6% and 38.5% for LLaMA2-7B and LLaMA2-13B, respectively.

PS-AAS: Portfolio Selection for Automated Algorithm Selection in Black-Box Optimization

  • paper_url: http://arxiv.org/abs/2310.10685
  • repo_url: None
  • paper_authors: Ana Kostovska, Gjorgjina Cenikj, Diederick Vermetten, Anja Jankovic, Ana Nikolikj, Urban Skvorc, Peter Korosec, Carola Doerr, Tome Eftimov
  • for: Selecting the algorithm portfolio for automated algorithm selection (AAS), balancing the flexibility of large portfolios against the added complexity of the AAS task.
  • methods: A data-driven portfolio selection technique: build meta-representations of algorithm behavior, construct a graph over the algorithms from meta-representation similarity, and apply a graph algorithm to select a final portfolio of diverse, representative, and non-redundant algorithms.
  • results: Evaluated with two meta-representations (SHAP and performance2vec) over 324 CMA-ES variants on the BBOB single-objective problems in dimensions 5 and 30: performance2vec-based portfolios are small with negligible error relative to the virtual best solver of the selected portfolio, SHAP-based portfolios gain flexibility at the cost of AAS performance, and personalized portfolios match or slightly beat the classical greedy approach while outperforming the full portfolio in all scenarios.
    Abstract The performance of automated algorithm selection (AAS) strongly depends on the portfolio of algorithms to choose from. Selecting the portfolio is a non-trivial task that requires balancing the trade-off between the higher flexibility of large portfolios with the increased complexity of the AAS task. In practice, probably the most common way to choose the algorithms for the portfolio is a greedy selection of the algorithms that perform well in some reference tasks of interest. We set out in this work to investigate alternative, data-driven portfolio selection techniques. Our proposed method creates algorithm behavior meta-representations, constructs a graph from a set of algorithms based on their meta-representation similarity, and applies a graph algorithm to select a final portfolio of diverse, representative, and non-redundant algorithms. We evaluate two distinct meta-representation techniques (SHAP and performance2vec) for selecting complementary portfolios from a total of 324 different variants of CMA-ES for the task of optimizing the BBOB single-objective problems in dimensionalities 5 and 30 with different cut-off budgets. We test two types of portfolios: one related to overall algorithm behavior and the `personalized' one (related to algorithm behavior per each problem separately). We observe that the approach built on the performance2vec-based representations favors small portfolios with negligible error in the AAS task relative to the virtual best solver from the selected portfolio, whereas the portfolios built from the SHAP-based representations gain from higher flexibility at the cost of decreased performance of the AAS. Across most considered scenarios, personalized portfolios yield comparable or slightly better performance than the classical greedy approach. They outperform the full portfolio in all scenarios.
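A simplified sketch of the data-driven selection idea (the exact graph algorithm in the paper may differ): connect algorithms whose meta-representations are nearly identical, then keep one representative per connected component so the portfolio stays diverse and non-redundant. Random vectors stand in for SHAP or performance2vec representations.

```python
import numpy as np

rng = np.random.default_rng(1)
meta = rng.normal(size=(12, 16))          # meta-representation per algorithm
perf = rng.uniform(size=12)               # e.g. average performance (higher = better)

# cosine-similarity graph: edge if two algorithms behave almost identically
norm = meta / np.linalg.norm(meta, axis=1, keepdims=True)
adj = (norm @ norm.T > 0.9) & ~np.eye(len(meta), dtype=bool)

# connected components via a simple traversal, then keep the best algorithm of each
unvisited, portfolio = set(range(len(meta))), []
while unvisited:
    stack, comp = [unvisited.pop()], []
    while stack:
        i = stack.pop()
        comp.append(i)
        for j in np.flatnonzero(adj[i]):
            if j in unvisited:
                unvisited.remove(j)
                stack.append(int(j))
    portfolio.append(max(comp, key=lambda i: perf[i]))

print("selected portfolio:", sorted(portfolio))
```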

Does CLIP’s Generalization Performance Mainly Stem from High Train-Test Similarity?

  • paper_url: http://arxiv.org/abs/2310.09562
  • repo_url: None
  • paper_authors: Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak, Matthias Bethge, Wieland Brendel
  • for: Testing whether CLIP's zero-shot and few-shot performance on out-of-distribution (OOD) benchmarks stems mainly from high train-test similarity.
  • methods: CLIP is retrained on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks.
  • results: Performance drops on some benchmarks but overall remains high, so high train-test similarity alone cannot explain CLIP's OOD performance and other properties of the training data must drive it to learn more generalizable representations; pruning data points dissimilar to the OOD benchmarks also uncovers a 100M split of LAION (a quarter of its original size) on which CLIP can be trained to match its original OOD performance.
    Abstract Foundation models like CLIP are trained on hundreds of millions of samples and effortlessly generalize to new tasks and inputs. Out of the box, CLIP shows stellar zero-shot and few-shot capabilities on a wide range of out-of-distribution (OOD) benchmarks, which prior works attribute mainly to today's large and comprehensive training dataset (like LAION). However, it is questionable how meaningful terms like out-of-distribution generalization are for CLIP as it seems likely that web-scale datasets like LAION simply contain many samples that are similar to common OOD benchmarks originally designed for ImageNet. To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP's overall performance remains high. This shows that high train-test similarity is insufficient to explain CLIP's OOD performance, and other properties of the training data must drive CLIP to learn more generalizable representations. Additionally, by pruning data points that are dissimilar to the OOD benchmarks, we uncover a 100M split of LAION (1/4th of its original size) on which CLIP can be trained to match its original OOD performance.
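The pruning criterion is defined over similarity of image representations; the sketch below shows the generic operation — drop every training sample whose nearest neighbour in an OOD benchmark is too similar — with random vectors standing in for the actual embeddings and an arbitrary threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 64))    # stand-ins for LAION image embeddings
bench_emb = rng.normal(size=(100, 64))     # stand-ins for OOD-benchmark embeddings

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

train_n, bench_n = l2_normalize(train_emb), l2_normalize(bench_emb)
# highest cosine similarity of each training sample to any benchmark sample
nearest = (train_n @ bench_n.T).max(axis=1)

threshold = 0.5                             # placeholder similarity cut-off
keep = nearest < threshold
print(f"kept {keep.sum()} / {len(keep)} training samples")
pruned_train = train_emb[keep]
```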

Graph Neural Network approaches for single-cell data: A recent overview

  • paper_url: http://arxiv.org/abs/2310.09561
  • repo_url: None
  • paper_authors: Konstantinos Lazaros, Dimitris E. Koumadorakis, Panagiotis Vlamos, Aristidis G. Vrahatis
  • for: An overview of recent graph neural network (GNN) methodologies tailored for single-cell data and of the objectives GNN strategies can serve in this setting.
  • methods: The review covers a diverse range of graph deep learning architectures, centered on Graph Attention Network (GAT) methodologies alongside graph convolutional networks and related methods.
  • results: Studies combining GNNs with single-cell data show promising results across objectives such as cell-type annotation, data integration and imputation, gene regulatory network reconstruction, and clustering, and the review anticipates GNNs becoming central to single-cell analysis.
    Abstract Graph Neural Networks (GNN) are reshaping our understanding of biomedicine and diseases by revealing the deep connections among genes and cells. As both algorithmic and biomedical technologies have advanced significantly, we're entering a transformative phase of personalized medicine. While pioneering tools like Graph Attention Networks (GAT) and Graph Convolutional Neural Networks (Graph CNN) are advancing graph-based learning, the rise of single-cell sequencing techniques is reshaping our insights on cellular diversity and function. Numerous studies have combined GNNs with single-cell data, showing promising results. In this work, we highlight the GNN methodologies tailored for single-cell data over the recent years. We outline the diverse range of graph deep learning architectures that center on GAT methodologies. Furthermore, we underscore the several objectives of GNN strategies in single-cell data contexts, ranging from cell-type annotation, data integration and imputation, gene regulatory network reconstruction, clustering and many others. This review anticipates a future where GNNs become central to single-cell analysis efforts, particularly as vast omics datasets are continuously generated and the interconnectedness of cells and genes enhances our depth of knowledge in biomedicine.

UNIQA: A Unified Framework for Both Full-Reference and No-Reference Image Quality Assessment

  • paper_url: http://arxiv.org/abs/2310.09560
  • repo_url: None
  • paper_authors: Yi Ke Yun, Weisi Lin
  • for: A unified network for both full-reference (FR) and no-reference (NR) image quality assessment (IQA), improving performance while handling either type of input.
  • methods: An encoder extracts multi-level features; a Hierarchical Self-Attention (HSA) module acts as a universal adapter for FR and NR inputs to model the spatial distortion level at each stage, and a Cross-Scale Cross-Attention (CSCA) module models correlations between distortions at shallow and deep stages and the semantic impact they cause.
  • results: Extensive experiments on four synthetically distorted and three authentically distorted datasets show the proposed network outperforms the relevant state-of-the-art FR and NR methods.
    Abstract The human visual system (HVS) is effective at distinguishing low-quality images due to its ability to sense the distortion level and the resulting semantic impact. Prior research focuses on developing dedicated networks based on the presence and absence of pristine images, respectively, and this results in limited application scope and potential performance inconsistency when switching from NR to FR IQA. In addition, most methods heavily rely on spatial distortion modeling through difference maps or weighted features, and this may not be able to well capture the correlations between distortion and the semantic impact it causes. To this end, we aim to design a unified network for both Full-Reference (FR) and No-Reference (NR) IQA via semantic impact modeling. Specifically, we employ an encoder to extract multi-level features from input images. Then a Hierarchical Self-Attention (HSA) module is proposed as a universal adapter for both FR and NR inputs to model the spatial distortion level at each encoder stage. Furthermore, considering that distortions contaminate encoder stages and damage image semantic meaning differently, a Cross-Scale Cross-Attention (CSCA) module is proposed to examine correlations between distortion at shallow stages and deep ones. By adopting HSA and CSCA, the proposed network can effectively perform both FR and NR IQA. Extensive experiments demonstrate that the proposed simple network is effective and outperforms the relevant state-of-the-art FR and NR methods on four synthetic-distorted datasets and three authentic-distorted datasets.

A study of the impact of generative AI-based data augmentation on software metadata classification

  • paper_url: http://arxiv.org/abs/2310.13714
  • repo_url: None
  • paper_authors: Tripti Kumari, Chakali Sai Charan, Ayan Das
  • for: Automatically predicting the usefulness of code-comment pairs.
  • methods: A machine-learning model is trained on neural contextual representations of comments and their corresponding code, and the impact of adding Large Language Model (LLM)-generated data to the base data is analyzed.
  • results: In the official FIRE IRSE 2023 assessment, the system achieves a 4% increase in F1-score over the baseline, along with good quality of the generated data.
    Abstract This paper presents the system submitted by the team from IIT(ISM) Dhanbad in FIRE IRSE 2023 shared task 1 on the automatic usefulness prediction of code-comment pairs as well as the impact of Large Language Model(LLM) generated data on original base data towards an associated source code. We have developed a framework where we train a machine learning-based model using the neural contextual representations of the comments and their corresponding codes to predict the usefulness of code-comments pair and performance analysis with LLM-generated data with base data. In the official assessment, our system achieves a 4% increase in F1-score from baseline and the quality of generated data.

Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction

  • paper_url: http://arxiv.org/abs/2310.11466
  • repo_url: None
  • paper_authors: Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan. ZQ. Li
  • for: Addressing the drop in structure-based protein property prediction accuracy when predicted rather than experimental structures are used.
  • methods: The problem is formulated as Protein 3D Graph Structure Learning for Robust Protein Property Prediction (PGSL-RP3), and a Structure embedding Alignment Optimization (SAO) framework mitigates the structure embedding bias between predicted and experimental protein structures.
  • results: The model-agnostic framework improves property prediction for both predicted and experimental structures; benchmark datasets and code will be released.
    Abstract Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. The existing methods highly rely on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) were utilized as alternatives. However, we observed that current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. While similar phenomena have been extensively studied in general fields (e.g., Computer Vision) as model robustness, their impact on protein property prediction remains unexplored. In this paper, we first investigate the reason behind the performance decrease when utilizing predicted structures, attributing it to the structure embedding bias from the perspective of structure representation learning. To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures. Extensive experiments have shown that our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures. The benchmark datasets and codes will be released to benefit the community.

Software Metadata Classification based on Generative Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.13006
  • repo_url: None
  • paper_authors: Seetharam Killivalavan, Durairaj Thenmozhi
  • for: Improving the performance of a binary code-comment quality classification model.
  • methods: Using the OpenAI API, 1239 new code-comment pairs are generated from GitHub repositories and open-source projects, labeled as "Useful" or "Not Useful", and integrated into the existing corpus of 9048 pairs in the C programming language.
  • results: With the generated data, the Support Vector Machine (SVM) model's precision rises by 6% (from 0.79 to 0.85) and the Artificial Neural Network (ANN) model's recall rises by 1.5% (from 0.731 to 0.746).
    Abstract This paper presents a novel approach to enhance the performance of binary code comment quality classification models through the application of Generative Artificial Intelligence (AI). By leveraging the OpenAI API, a dataset comprising 1239 newly generated code-comment pairs, extracted from various GitHub repositories and open-source projects, has been labelled as "Useful" or "Not Useful", and integrated into the existing corpus of 9048 pairs in the C programming language. Employing a cutting-edge Large Language Model Architecture, the generated dataset demonstrates notable improvements in model accuracy. Specifically, when incorporated into the Support Vector Machine (SVM) model, a 6% increase in precision is observed, rising from 0.79 to 0.85. Additionally, the Artificial Neural Network (ANN) model exhibits a 1.5% increase in recall, climbing from 0.731 to 0.746. This paper sheds light on the potential of Generative AI in augmenting code comment quality classification models. The results affirm the effectiveness of this methodology, indicating its applicability in broader contexts within software development and quality assurance domains. The findings underscore the significance of integrating generative techniques to advance the accuracy and efficacy of machine learning models in practical software engineering scenarios.
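A hedged sketch of the comparison the report describes: train the same TF-IDF + SVM pipeline once on the original code-comment pairs and once with generated pairs added, then compare precision. The toy strings below stand in for the real 9048 + 1239 pairs, so the printed numbers are not the report's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import precision_score

# toy stand-ins for the original and generated code-comment pairs (1 = Useful)
original = [("int n; /* loop counter used below */", 1),
            ("i++; /* increment i */", 0),
            ("fd = open(path, O_RDONLY); /* open config file read-only */", 1),
            ("x = x; /* set x */", 0)] * 50
generated = [("free(buf); /* release buffer allocated in init() */", 1),
             ("return 0; /* return zero */", 0)] * 50
test = [("memcpy(dst, src, n); /* copy n bytes into the output frame */", 1),
        ("j--; /* decrement j */", 0)] * 25

def run(train_pairs):
    X, y = zip(*train_pairs)
    clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(X, y)
    Xt, yt = zip(*test)
    return precision_score(yt, clf.predict(Xt))

print("original only :", run(original))
print("with generated:", run(original + generated))
```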

Instruction Tuning with Human Curriculum

  • paper_url: http://arxiv.org/abs/2310.09518
  • repo_url: None
  • paper_authors: Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo
  • for: Exploring whether a structured, cognitively motivated curriculum improves instruction tuning of contemporary large language models such as ChatGPT and GPT-4.
  • methods: A highly structured synthetic dataset mimics the progressive and organized nature of human education: samples are aligned with educational frameworks and annotated with topic and cognitive-rigor level (following Bloom's taxonomy), spanning stages from middle school to graduate school.
  • results: The cognitively rigorous training yields +3.06 on the MMLU benchmark and a further +1.28 on the AI2 Reasoning Challenge (hard set) over conventional randomized training, without additional computational cost.
    Abstract The dominant paradigm for instruction tuning is the random-shuffled training of maximally diverse instruction-response pairs. This paper explores the potential benefits of applying a structured cognitive learning approach to instruction tuning in contemporary large language models like ChatGPT and GPT-4. Unlike the previous conventional randomized instruction dataset, we propose a highly structured synthetic dataset that mimics the progressive and organized nature of human education. We curate our dataset by aligning it with educational frameworks, incorporating meta information including its topic and cognitive rigor level for each sample. Our dataset covers comprehensive fine-grained topics spanning diverse educational stages (from middle school to graduate school) with various questions for each topic to enhance conceptual depth using Bloom's taxonomy-a classification framework distinguishing various levels of human cognition for each concept. The results demonstrate that this cognitive rigorous training approach yields significant performance enhancements - +3.06 on the MMLU benchmark and an additional +1.28 on AI2 Reasoning Challenge (hard set) - compared to conventional randomized training, all while avoiding additional computational costs. This research highlights the potential of leveraging human learning principles to enhance the capabilities of language models in comprehending and responding to complex instructions and tasks.
    摘要 指令调教的主流范式是对最大多样性的指令-响应对进行随机洗牌训练。本文探讨了应用结构化认知学习方法来改进现代大语言模型(如ChatGPT和GPT-4)的指令调教。与传统的随机化指令数据集不同,我们提议一个高度结构化的合成数据集,模拟人类教育的渐进性和有序性。我们将数据集与教育框架对齐,并为每个样本附加主题和认知难度等级等元信息。我们的数据涵盖了广泛的细粒度主题(从中学到研究生阶段),每个主题都配有多个问题,并使用布隆分类法(一种区分不同认知水平的分类框架)来增强概念深度。结果表明,这种注重认知严谨性的训练方法可以显著提升表现:在 MMLU 基准上比传统随机训练高出 3.06,在 AI2 逻辑挑战(困难集)上额外高出 1.28,且不增加额外计算成本。这些研究表明,可以利用人类学习原理来增强语言模型理解和响应复杂指令与任务的能力。
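A minimal sketch of the curriculum idea, assuming each instruction sample is tagged with an educational stage and a Bloom's-taxonomy level; samples are ordered from easier to harder before fine-tuning instead of being randomly shuffled. The tags and their numeric encodings are illustrative assumptions.

```python
# Illustrative only: order instruction samples by educational stage and Bloom's level
# so the model sees easier, lower-rigor material before harder material.
BLOOM = {"remember": 1, "understand": 2, "apply": 3, "analyze": 4, "evaluate": 5, "create": 6}
STAGE = {"middle_school": 1, "high_school": 2, "undergraduate": 3, "graduate": 4}

samples = [
    {"topic": "algebra", "stage": "graduate", "bloom": "create", "instruction": "Design a new proof..."},
    {"topic": "algebra", "stage": "middle_school", "bloom": "remember", "instruction": "State the rule..."},
    {"topic": "biology", "stage": "undergraduate", "bloom": "apply", "instruction": "Use the model to..."},
]

def curriculum_order(batch):
    # Sort by (stage, cognitive rigor) instead of random shuffling.
    return sorted(batch, key=lambda s: (STAGE[s["stage"]], BLOOM[s["bloom"]]))

for s in curriculum_order(samples):
    print(s["stage"], s["bloom"], s["topic"])
```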

Towards Semantic Communication Protocols for 6G: From Protocol Learning to Language-Oriented Approaches

  • paper_url: http://arxiv.org/abs/2310.09506
  • repo_url: None
  • paper_authors: Jihong Park, Seung-Woo Ko, Jinho Choi, Seong-Lyun Kim, Mehdi Bennis
  • for: 这篇论文旨在探讨未来的6G系统将如何面对多种非平稳(non-stationary)任务的挑战,以及传统静态、预先定义的媒体存取控制(MAC)协议为何难以应对这些挑战。
  • methods: 这篇论文提出了一个新的分类方法,将资料驱动的MAC协议分为三级:Level 1 MAC 是使用多代理深度强化学习(MADRL)构建的任务导向神经协议;Level 2 MAC 是将 Level 1 MAC 的输出转换为明确符号的面向神经网络的符号化协议;Level 3 MAC 是利用大语言模型(LLM)和生成模型构建的语言导向语义协议。
  • results: 这篇论文通过探讨这些层次的基本技术和选择性案例研究,提供了关于资料驱动MAC协议的未来走势和未来研究方向的对答。
    Abstract The forthcoming 6G systems are expected to address a wide range of non-stationary tasks. This poses challenges to traditional medium access control (MAC) protocols that are static and predefined. In response, data-driven MAC protocols have recently emerged, offering ability to tailor their signaling messages for specific tasks. This article presents a novel categorization of these data-driven MAC protocols into three levels: Level 1 MAC. task-oriented neural protocols constructed using multi-agent deep reinforcement learning (MADRL); Level 2 MAC. neural network-oriented symbolic protocols developed by converting Level 1 MAC outputs into explicit symbols; and Level 3 MAC. language-oriented semantic protocols harnessing large language models (LLMs) and generative models. With this categorization, we aim to explore the opportunities and challenges of each level by delving into their foundational techniques. Drawing from information theory and associated principles as well as selected case studies, this study provides insights into the trajectory of data-driven MAC protocols and sheds light on future research directions.
    摘要 即将到来的6G系统预期需要处理各种非平稳任务。这对传统的媒体存取控制(MAC)协议构成挑战,因为这些协议通常是静态且预先定义的。对此,资料驱动的MAC协议近来应运而生,能够针对特定任务量身定制其信令讯息。本文提出了一种新的分类方法,将这些资料驱动的MAC协议分为三级:Level 1 MAC:使用多智能体深度强化学习(MADRL)构建的任务导向神经协议;Level 2 MAC:将 Level 1 MAC 的输出转换为明确符号而得到的面向神经网络的符号化协议;Level 3 MAC:利用大型语言模型(LLM)和生成模型的语言导向语义协议。透过这个分类,我们希望深入探讨每个等级的基础技术,以分析其机遇和挑战,并从信息论及相关原理以及选定的案例研究中获得见解。这篇研究为资料驱动MAC协议的发展轨迹提供了洞察,并指明了未来的研究方向。
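A toy sketch of the Level 2 idea, assuming the Level 1 neural protocol emits continuous message vectors that are then quantized against a small codebook to obtain explicit symbols; the dimensions and codebook are illustrative assumptions.

```python
# Illustrative only: turn continuous messages from a Level 1 neural MAC policy into
# explicit symbols by nearest-neighbour lookup in a small codebook (the Level 2 idea).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))          # 8 symbols over a 4-dimensional message space
neural_messages = rng.normal(size=(5, 4))   # messages emitted by the neural protocol

def symbolize(msg, codebook):
    # Map a continuous message to the index of its nearest codeword (an explicit symbol).
    return int(np.argmin(np.linalg.norm(codebook - msg, axis=1)))

symbols = [symbolize(m, codebook) for m in neural_messages]
print("symbolic MAC messages:", symbols)
```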

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09499
  • repo_url: None
  • paper_authors: Hang Shao, Bei Liu, Yanmin Qian
  • for: 提高生成预训练变换器(GPT)家族模型的实用性,通过量化、剪枝和其他方法提高模型的效率。
  • methods: 提出一种基于海森(Hessian)敏感度感知的混合稀疏剪枝方法,无需重新训练即可将GPT类模型剪枝到至少50%的稀疏率;该方法根据敏感度自适应地分配稀疏度,在保持总体稀疏率的同时降低剪枝引入的误差。
  • results: 提出的方法可以在极高稀疏率下进一步提高LLM模型的效率,并且兼容量化,可以进一步压缩LLM模型。
    Abstract Various Large Language Models(LLMs) from the Generative Pretrained Transformer~(GPT) family have achieved outstanding performances in a wide range of text generation tasks. However, the enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiencies of LLMs through quantization, pruning, and other means has been a key issue in LLM studies. In this work, we propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50\% sparsity without the need of any retraining. It allocates sparsity adaptively based on sensitivity, allowing us to reduce pruning-induced error while maintaining the overall sparsity level. The advantages of the proposed method exhibit even more when the sparsity is extremely high. Furthermore, our method is compatible with quantization, enabling further compression of LLMs.
    摘要 生成式预训练变换器(GPT)家族的各种大型语言模型(LLM)已经在广泛的文本生成任务中表现出色。然而,这些模型的巨大规模带来了较高的推理延迟,阻碍了它们在实际应用中的使用。因此,通过量化、剪枝等手段提高LLM的效率已成为LLM研究中的关键问题。在这项工作中,我们提出了一种基于海森(Hessian)敏感度感知的混合稀疏剪枝方法,无需任何重新训练即可将LLM剪枝到至少50%的稀疏率。该方法根据敏感度自适应地分配稀疏度,使我们能够在保持总体稀疏率的同时降低剪枝引入的误差。当稀疏率极高时,该方法的优势更加明显。此外,我们的方法与量化兼容,可以进一步压缩LLM。
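A simplified sketch of sensitivity-aware mixed-sparsity pruning, assuming a diagonal Hessian proxy is available per weight; weights are scored by a salience heuristic and the globally least salient half are removed, so per-layer sparsity adapts to sensitivity while the overall 50% target is met. The salience formula and the proxy are assumptions, not the paper's exact criterion.

```python
# Illustrative only: score each weight with a second-order salience proxy (w^2 * h_diag,
# where h_diag stands in for a diagonal Hessian estimate), then zero out the globally
# least salient weights so the 50% target is met with non-uniform per-layer sparsity.
import numpy as np

rng = np.random.default_rng(0)
layers = {name: rng.normal(size=shape) for name, shape in
          [("attn", (64, 64)), ("mlp_in", (64, 256)), ("mlp_out", (256, 64))]}
hessian_diag = {name: rng.uniform(0.1, 1.0, size=w.shape) for name, w in layers.items()}

target_sparsity = 0.5
salience = {name: (layers[name] ** 2) * hessian_diag[name] for name in layers}
threshold = np.quantile(np.concatenate([s.ravel() for s in salience.values()]), target_sparsity)

for name, w in layers.items():
    mask = salience[name] > threshold        # keep only the more sensitive weights
    layers[name] = w * mask
    print(f"{name}: sparsity {1.0 - mask.mean():.2f}")
```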

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09497
  • repo_url: https://github.com/ielab/llm-rankers
  • paper_authors: Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon
  • for: This paper evaluates the effectiveness and efficiency of different prompting approaches for large language models (LLMs) in zero-shot document ranking tasks.
  • methods: The paper compares Pointwise, Pairwise, and Listwise prompting approaches within a consistent experimental framework and proposes a novel Setwise prompting approach for LLM-based zero-shot ranking.
  • results: The experiments show that Pointwise approaches are efficient but less effective and Pairwise approaches are effective but computationally expensive, while the Setwise approach significantly reduces computational costs while retaining high zero-shot ranking effectiveness.
    Abstract Large Language Models (LLMs) demonstrate impressive effectiveness in zero-shot document ranking tasks. Pointwise, Pairwise, and Listwise prompting approaches have been proposed for LLM-based zero-shot ranking. Our study begins by thoroughly evaluating these existing approaches within a consistent experimental framework, considering factors like model size, token consumption, latency, among others. This first-of-its-kind comparative evaluation of these approaches allows us to identify the trade-offs between effectiveness and efficiency inherent in each approach. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. To further enhance the efficiency of LLM-based zero-shot ranking, we propose a novel Setwise prompting approach. Our approach reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, significantly improving the efficiency of LLM-based zero-shot ranking. We test our method using the TREC DL datasets and the BEIR zero-shot document ranking benchmark. The empirical results indicate that our approach considerably reduces computational costs while also retaining high zero-shot ranking effectiveness.
    摘要 大型语言模型(LLM)在零shot文档排序任务中表现出众,Pointwise、Pairwise和Listwise promptingapproaches已经被提出用于LLM基于的零shot排序。我们的研究开始于在一个共同的实验室中仔细评估这些现有的方法,考虑因素如模型大小、token消耗量、延迟时间等。这是第一次对这些方法进行了系统性的比较评估,从而找到每个方法之间的质量和效率之间的交易。我们发现,虽然Pointwise方法具有高效性,但效果不佳。相反,Pairwise方法表现出色,但计算开销很高。为了进一步提高LLM基于的零shot排序的效率,我们提议了一种新的Setwise promptingapproach。我们的方法可以减少LLM的推理数量和提示Token的消耗量,从而显著提高LLM基于的零shot排序的效率。我们使用TREC DL数据集和BEIR零shot文档排序标准套件进行测试,实验结果表明,我们的方法可以减少计算成本,同时保持高的零shot排序效果。
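A minimal sketch of the Setwise idea: a single prompt presents a set of candidate passages and asks the LLM which is most relevant, and repeated selections yield a ranking. The `ask_llm` stub and the simple selection loop are assumptions standing in for a real LLM call and for the paper's more efficient sorting procedure.

```python
# Illustrative only: a setwise prompt shows the LLM several passages at once and asks
# for the most relevant one; repeatedly selecting a best passage produces a ranking.
def ask_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; this stub always answers "A".
    return "A"

def setwise_prompt(query, docs):
    labels = [chr(ord("A") + i) for i in range(len(docs))]
    body = "\n".join(f"Passage {l}: {d}" for l, d in zip(labels, docs))
    return (f"Query: {query}\n{body}\n"
            "Which passage is most relevant to the query? Answer with its label only.")

def pick_best(query, docs, set_size=3):
    best, rest = docs[0], docs[1:]
    for i in range(0, len(rest), set_size - 1):      # compare the incumbent against the next chunk
        subset = [best] + rest[i:i + set_size - 1]
        idx = ord(ask_llm(setwise_prompt(query, subset)).strip()[0].upper()) - ord("A")
        if 0 <= idx < len(subset):
            best = subset[idx]
    return best

def setwise_rank(query, docs, set_size=3):
    remaining, ranked = list(docs), []
    while remaining:
        best = pick_best(query, remaining, set_size)
        ranked.append(best)
        remaining.remove(best)
    return ranked

print(setwise_rank("effects of caffeine", ["a note on tea", "a study of caffeine and sleep", "a gardening guide"]))
```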

Mirage: Model-Agnostic Graph Distillation for Graph Classification

  • paper_url: http://arxiv.org/abs/2310.09486
  • repo_url: None
  • paper_authors: Mridul Gupta, Sahil Manchanda, Hariprasad Kodamana, Sayan Ranu
  • for: The paper aims to scale training of graph neural networks (GNNs) to large datasets while reducing the computation and data resources required.
  • methods: The paper proposes a distillation algorithm called Mirage, which compresses the computation data itself into a concise distilled summary instead of emulating gradient flows on the original training set.
  • results: The paper reports that Mirage outperforms state-of-the-art baselines in generalization accuracy, data compression, and distillation efficiency, while being unsupervised and architecture-agnostic.
    Abstract GNNs, like other deep learning models, are data and computation hungry. There is a pressing need to scale training of GNNs on large datasets to enable their usage on low-resource environments. Graph distillation is an effort in that direction with the aim to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initial efforts are promising, this work is motivated by two key observations: (1) Existing graph distillation algorithms themselves rely on training with the full dataset, which undermines the very premise of graph distillation. (2) The distillation process is specific to the target GNN architecture and hyper-parameters and thus not robust to changes in the modeling pipeline. We circumvent these limitations by designing a distillation algorithm called Mirage for graph classification. Mirage is built on the insight that a message-passing GNN decomposes the input graph into a multiset of computation trees. Furthermore, the frequency distribution of computation trees is often skewed in nature, enabling us to condense this data into a concise distilled summary. By compressing the computation data itself, as opposed to emulating gradient flows on the original training set-a prevalent approach to date-Mirage transforms into an unsupervised and architecture-agnostic distillation algorithm. Extensive benchmarking on real-world datasets underscores Mirage's superiority, showcasing enhanced generalization accuracy, data compression, and distillation efficiency when compared to state-of-the-art baselines.
    摘要 与其他深度学习模型一样,图神经网络(GNN)对数据和计算的需求都很大,亟需在大规模数据集上扩展其训练,以便在低资源环境中使用。图蒸馏(Graph Distillation)正是朝这一方向的努力,旨在从原始训练数据中构建一个更小的合成训练集,同时不明显损失模型性能。然而,现有的图蒸馏算法本身仍需使用完整数据集进行训练,这与图蒸馏的初衷相悖。另外,现有的蒸馏过程往往特定于目标GNN结构和超参数,因此对建模流程的变化缺乏鲁棒性。为了解决这些限制,我们提出了一种名为 Mirage 的图蒸馏算法,用于图分类。Mirage 基于这样一个洞察:消息传递GNN会将输入图分解为一个由 computation tree 组成的多重集合。此外,这些 computation tree 的频率分布往往高度偏斜,因此我们可以将这些数据压缩成一个简洁的蒸馏摘要。与以往模拟原始训练集上梯度流的做法不同,Mirage 直接压缩计算数据本身,因而成为一种无监督、与模型结构无关的蒸馏算法。我们在真实数据集上进行了广泛的测试,结果显示 Mirage 在泛化精度、数据压缩率和蒸馏效率上均明显优于现有的最先进基线。
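A rough sketch of the observation behind Mirage: a message-passing GNN effectively sees each node through its depth-L computation tree, and the frequency distribution of such trees is often skewed, so keeping only the frequent trees yields a compact summary. The toy graph encoding and canonical-form function below are simplifications, not the paper's implementation.

```python
# Illustrative only: decompose a toy labelled graph into per-node computation trees of
# depth L and keep the most frequent canonical trees as the distilled summary.
from collections import Counter

graph = {  # node id -> (label, neighbour ids)
    0: ("C", [1, 2]), 1: ("H", [0]), 2: ("H", [0]),
    3: ("C", [4, 5]), 4: ("H", [3]), 5: ("H", [3]),
}

def comp_tree(graph, node, depth):
    # Canonical form of the depth-`depth` computation tree rooted at `node`.
    label, neighbours = graph[node]
    if depth == 0:
        return (label,)
    return (label, tuple(sorted(comp_tree(graph, n, depth - 1) for n in neighbours)))

tree_counts = Counter(comp_tree(graph, v, depth=2) for v in graph)
distilled = tree_counts.most_common(2)   # a skewed distribution lets a few trees summarise the data
for tree, freq in distilled:
    print(freq, tree)
```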

Unified High-binding Watermark for Unconditional Image Generation Models

  • paper_url: http://arxiv.org/abs/2310.09479
  • repo_url: None
  • paper_authors: Ruinan Ma, Yu-an Tan, Shangbo Wu, Tian Chen, Yajie Wang, Yuanzhang Li
  • for: 防止AI生成图像模型数据盗用和版权侵犯
  • methods: 使用隐形编码器将水印图像写入原始 AIGC 工具的输出图像,再通过相应的解码器提取水印,以检测和验证可疑模型是否盗用了原始 AIGC 工具的数据
  • results: 实验表明,我们的方法可以在只使用模型输出图像的情况下,几乎达到零假阳性率,并且可以跨多种 UIG 模型进行数据盗用验证,提高方法的实用性。
    Abstract Deep learning techniques have implemented many unconditional image generation (UIG) models, such as GAN, Diffusion model, etc. The extremely realistic images (also known as AI-Generated Content, AIGC for short) produced by these models bring urgent needs for intellectual property protection such as data traceability and copyright certification. An attacker can steal the output images of the target model and use them as part of the training data to train a private surrogate UIG model. The implementation mechanisms of UIG models are diverse and complex, and there is no unified and effective protection and verification method at present. To address these issues, we propose a two-stage unified watermark verification mechanism with high-binding effects for such models. In the first stage, we use an encoder to invisibly write the watermark image into the output images of the original AIGC tool, and reversely extract the watermark image through the corresponding decoder. In the second stage, we design the decoder fine-tuning process, and the fine-tuned decoder can make correct judgments on whether the suspicious model steals the original AIGC tool data. Experiments demonstrate our method can complete the verification work with almost zero false positive rate under the condition of only using the model output images. Moreover, the proposed method can achieve data steal verification across different types of UIG models, which further increases the practicality of the method.
    摘要 深度学习技术已经实现了许多无条件图像生成(UIG)模型,如GAN、扩散模型等。这些模型生成的极其真实的图像(也称为AI生成内容,AIGC)带来了知识产权保护的紧迫需求,如数据追溯和版权证书。一个攻击者可以偷窃目标模型的输出图像,并使其成为私人代理UIG模型的训练数据。现有的实现机制多样化和复杂,无一统的有效保护和验证方法。为解决这些问题,我们提出了一种两阶段统一水印验证机制,具有高绑定效果。在第一阶段,我们使用编码器将水印图像隐身地写入原始AIGC工具的输出图像中,并通过对应的解码器反向提取水印图像。在第二阶段,我们设计了细化过程,并使用细化过的解码器可以正确地判断是否有恶意模型偷窃原始AIGC工具数据。实验表明,我们的方法可以在只使用模型输出图像的情况下完成验证工作,并且几乎没有假阳性结果。此外,我们的方法还可以验证不同类型的UIG模型数据,从而进一步提高方法的实用性。
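A toy illustration of the verification workflow only: a watermark is invisibly embedded in every image the AIGC tool releases, and a matching decoder later checks whether a suspect model's outputs carry it. The least-significant-bit embedding below is a trivial stand-in for the paper's learned encoder/decoder.

```python
# Illustrative only: the verification workflow with a trivial LSB embedding standing in
# for the paper's learned invisible encoder/decoder.
import numpy as np

def embed(image, bits):
    flat = image.ravel().copy()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits   # write watermark bits into LSBs
    return flat.reshape(image.shape)

def extract(image, n_bits):
    return image.ravel()[:n_bits] & 1

rng = np.random.default_rng(0)
watermark = rng.integers(0, 2, size=64, dtype=np.uint8)
output = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)  # an image produced by the AIGC tool

protected = embed(output, watermark)          # every released image carries the watermark
recovered = extract(protected, watermark.size)
print("match rate:", (recovered == watermark).mean())  # close to 1.0 suggests the data originated here
```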

HIO-SDF: Hierarchical Incremental Online Signed Distance Fields

  • paper_url: http://arxiv.org/abs/2310.09463
  • repo_url: None
  • paper_authors: Vasileios Vasilopoulos, Suveer Garg, Jinwook Huh, Bhoram Lee, Volkan Isler
  • for: 这篇论文旨在为大型复杂的移动机器人工作空间开发一种空间效率高、又能增量在线更新的表示方法。
  • methods: 该方法使用签名距离场(SDF)表示环境,并采用层次结构,将捕捉已观测区域的粗粒度体素网格与利用高分辨率局部信息训练的神经网络相结合,以兼顾高效更新与空间效率。
  • results: 在所有测试场景中,该方法的全局SDF平均误差比最先进的连续表示方法低46%,比与粗粒度网格相同分辨率的离散表示低30%。
    Abstract A good representation of a large, complex mobile robot workspace must be space-efficient yet capable of encoding relevant geometric details. When exploring unknown environments, it needs to be updatable incrementally in an online fashion. We introduce HIO-SDF, a new method that represents the environment as a Signed Distance Field (SDF). State of the art representations of SDFs are based on either neural networks or voxel grids. Neural networks are capable of representing the SDF continuously. However, they are hard to update incrementally as neural networks tend to forget previously observed parts of the environment unless an extensive sensor history is stored for training. Voxel-based representations do not have this problem but they are not space-efficient especially in large environments with fine details. HIO-SDF combines the advantages of these representations using a hierarchical approach which employs a coarse voxel grid that captures the observed parts of the environment together with high-resolution local information to train a neural network. HIO-SDF achieves a 46% lower mean global SDF error across all test scenes than a state of the art continuous representation, and a 30% lower error than a discrete representation at the same resolution as our coarse global SDF grid.
    摘要 一个好的大型移动机器工作空间表示应该是效率高而能够包含相关的几何细节。当探索未知环境时,它需要在线更新可能。我们介绍了HIO-SDF,一种新的方法,它表示环境为签名距离场(SDF)。现有的SDF表示方法包括神经网络或VOXEL网格。神经网络可以持续表示SDF,但它们难以在线更新,除非保留了大量的感知历史用于训练。VOXEL网格没有这个问题,但它们在大型环境中不是太空效率。HIO-SDF结合了这些表示方法的优点,使用层次方法,其中使用粗粒度的VOXEL网格捕捉到环境中观察到的部分,并使用高分辨率的地方信息来训练神经网络。HIO-SDF在所有测试场景中的平均全球SDF误差比现有的连续表示方法下降46%,比同等分辨率的粗粒度全球SDF网格下降30%。
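A minimal sketch of a hierarchical SDF query, assuming a coarse voxel grid holds the global field and a small network (stubbed out here) supplies a local refinement; the resolution, blending, and update rule are illustrative assumptions rather than the HIO-SDF architecture.

```python
# Illustrative only: answer SDF queries from a coarse global voxel grid plus a local
# refinement term (a stub standing in for the neural network).
import numpy as np

cell = 0.5                                   # coarse grid resolution in metres
coarse = np.full((20, 20, 20), 1.0)          # coarse SDF values, initialised to "far from any surface"
origin = np.zeros(3)

def coarse_lookup(p):
    idx = np.clip(((p - origin) / cell).astype(int), 0, np.array(coarse.shape) - 1)
    return coarse[tuple(idx)]

def fine_residual(p):
    # Stand-in for the high-resolution local model that refines the coarse value.
    return 0.0

def sdf(p):
    return coarse_lookup(np.asarray(p, dtype=float)) + fine_residual(p)

coarse[4, 4, 4] = 0.05                       # incremental online update from a new observation
print(sdf([2.2, 2.2, 2.2]))                  # query near the updated voxel
```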

A Framework for Empowering Reinforcement Learning Agents with Causal Analysis: Enhancing Automated Cryptocurrency Trading

  • paper_url: http://arxiv.org/abs/2310.09462
  • repo_url: None
  • paper_authors: Rasoul Amirzadeh, Dhananjay Thiruvady, Asef Nazari, Mong Shan Ee
  • for: 本研究旨在提高人工智能增强交易方法的财务效益,通过开发一个基于强化学习的自动交易系统,以便在随时变化的加密货币市场中实现更高的回报。
  • methods: 我们提出了名为 CausalReinforceNet 的框架,作为决策支持系统。该框架通过因果分析增强强化学习代理的能力:在特征工程阶段,使用贝叶斯网络识别与加密货币价格变动存在因果关系的最相关特征;在决策阶段,引入概率性价格方向信号以提升代理的决策能力。鉴于加密货币市场的高投机性,框架采用保守策略,限制买入和卖出的仓位大小以管理风险。
  • results: 我们的框架在比较以 Buy-and-Hold 策略为准的情况下,有显著的财务效益。此外,我们开发了两个基于 CausalReinforceNet 框架的强化学习代理,其中一个基于 Q-learning 算法,另一个基于 deep Q-learning 算法。两个代理在 Binance Coin 和 Ethereum 等加密货币上都实现了显著的回报。
    Abstract Despite advances in artificial intelligence-enhanced trading methods, developing a profitable automated trading system remains challenging in the rapidly evolving cryptocurrency market. This study aims to address these challenges by developing a reinforcement learning-based automated trading system for five popular altcoins~(cryptocurrencies other than Bitcoin): Binance Coin, Ethereum, Litecoin, Ripple, and Tether. To this end, we present CausalReinforceNet, a framework framed as a decision support system. Designed as the foundational architecture of the trading system, the CausalReinforceNet framework enhances the capabilities of the reinforcement learning agent through causal analysis. Within this framework, we use Bayesian networks in the feature engineering process to identify the most relevant features with causal relationships that influence cryptocurrency price movements. Additionally, we incorporate probabilistic price direction signals from dynamic Bayesian networks to enhance our reinforcement learning agent's decision-making. Due to the high volatility of the cryptocurrency market, we design our framework to adopt a conservative approach that limits sell and buy position sizes to manage risk. We develop two agents using the CausalReinforceNet framework, each based on distinct reinforcement learning algorithms. The results indicate that our framework substantially surpasses the Buy-and-Hold benchmark strategy in profitability. Additionally, both agents generated notable returns on investment for Binance Coin and Ethereum.
    摘要 尽管人工智能增强的交易方法有所进步,但在快速发展的加密货币市场中构建可盈利的自动交易系统仍然是一项挑战。本研究旨在解决这些挑战,开发一个基于强化学习的自动交易系统,用于五种流行的山寨币(非比特币的加密货币):Binance Coin、Ethereum、Litecoin、Ripple 和 Tether。为此,我们提出了 CausalReinforceNet 框架,它被设计为决策支持系统,并作为交易系统的基础架构,通过因果分析增强强化学习代理的能力。在该框架中,我们在特征工程过程中使用贝叶斯网络,识别与加密货币价格变动存在因果关系的最相关特征;并将来自动态贝叶斯网络的概率性价格方向信号纳入强化学习代理的决策。鉴于加密货币市场的高波动性,我们的框架采用保守策略,限制买入和卖出的仓位大小以管理风险。我们基于 CausalReinforceNet 框架开发了两个采用不同强化学习算法的代理。结果表明,我们的框架在盈利能力上显著优于买入并持有(Buy-and-Hold)基准策略,且两个代理在 Binance Coin 和 Ethereum 上都取得了可观的投资回报。
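A condensed sketch of the decision loop, assuming a probabilistic price-direction signal (a stand-in for the paper's Bayesian network) is folded into the RL state and position sizes are capped to stay conservative; the signal stub, state discretization, and reward are illustrative assumptions.

```python
# Illustrative only: a probabilistic direction signal enters the RL state, actions are
# limited to small positions, and a tabular Q-learning update drives learning.
import random
from collections import defaultdict

ACTIONS = ["hold", "buy_small", "sell_small"]      # conservative: no large positions allowed
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def direction_signal(features):
    # Stand-in for the dynamic Bayesian network: probability that the price goes up.
    return 0.6 if features["momentum"] > 0 else 0.4

def choose_action(state):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

features = {"momentum": 0.02}                      # one illustrative step
state = ("up" if direction_signal(features) > 0.5 else "down",)
action = choose_action(state)
q_update(state, action, reward=0.01, next_state=state)
print(action)
```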

Metacognitive threshold: a computational account

  • paper_url: http://arxiv.org/abs/2310.13005
  • repo_url: None
  • paper_authors: Brendan Conway-Smith, Robert L. West
  • for: 本研究旨在计算地考虑认知门槛(认知状态能够被识别的最小刺激量),并讨论可能影响认知门槛的认知训练和冥想。
  • methods: 本研究使用计算方法计算认知门槛,并采用认知训练和冥想来调整认知门槛。
  • results: 论文讨论了元认知训练与冥想可能借以影响元认知阈值的潜在认知机制。
    Abstract This paper will explore ways of computationally accounting for the metacognitive threshold -- the minimum amount of stimulus needed for a mental state to be perceived -- and discuss potential cognitive mechanisms by which this threshold can be influenced through metacognitive training and meditation.
    摘要 这篇论文将探讨计算方法来考虑认知阈值(最少的刺激量可以让一种心理状态被感知),并讨论可能通过认知培训和禅定来影响认知阈值的认知机制。

Large Language Model Unlearning

  • paper_url: http://arxiv.org/abs/2310.10683
  • repo_url: https://github.com/kevinyaobytedance/llm_unlearn
  • paper_authors: Yuanshun Yao, Xiaojun Xu, Yang Liu
  • for: 本研究旨在探讨如何在大语言模型(LLM)上进行遗忘(unlearning),即忘记不良(mis)行为。
  • methods: 本研究提出了遗忘技术,仅使用负例(negative examples)即可对 LLM 进行对齐(alignment),涵盖删除有害回复、按要求删除受版权保护内容以及消除幻觉等场景。
  • results: 研究显示,仅凭负例即可通过遗忘有效地调整 LLM 的行为,且该方法计算效率高;消融实验表明,在仅用 RLHF 约 2% 计算时间的情况下,其对齐效果仍优于 RLHF。
    Abstract We study how to perform unlearning, i.e. forgetting undesirable (mis)behaviors, on large language models (LLMs). We show at least three scenarios of aligning LLMs with human preferences can benefit from unlearning: (1) removing harmful responses, (2) erasing copyright-protected content as requested, and (3) eliminating hallucinations. Unlearning, as an alignment technique, has three advantages. (1) It only requires negative (e.g. harmful) examples, which are much easier and cheaper to collect (e.g. via red teaming or user reporting) than positive (e.g. helpful and often human-written) examples required in RLHF (RL from human feedback). (2) It is computationally efficient. (3) It is especially effective when we know which training samples cause the misbehavior. To the best of our knowledge, our work is among the first to explore LLM unlearning. We are also among the first to formulate the settings, goals, and evaluations in LLM unlearning. We show that if practitioners only have limited resources, and therefore the priority is to stop generating undesirable outputs rather than to try to generate desirable outputs, unlearning is particularly appealing. Despite only having negative samples, our ablation study shows that unlearning can still achieve better alignment performance than RLHF with just 2% of its computational time.
    摘要 我们研究如何在大语言模型(LLM)上执行"遗忘"(unlearning),即忘记不良(mis)行为。我们展示了至少三种可从遗忘中受益的LLM对齐场景:(1) 删除有害回复;(2) 按要求删除受版权保护的内容;(3) 消除幻觉。遗忘作为一种对齐技术有三个优势:(1) 只需要负面(例如有害)示例,这类示例比RLHF(基于人类反馈的强化学习)所需的正面(例如有用且往往由人撰写的)示例更容易、更廉价地收集(例如通过红队测试或用户报告);(2) 计算效率高;(3) 当我们知道哪些训练样本导致不良行为时尤其有效。据我们所知,我们的工作是最早探索LLM遗忘的研究之一,也率先给出了LLM遗忘的设定、目标和评估方式。我们指出,当从业者资源有限、优先目标是停止生成不良输出而非生成理想输出时,遗忘尤其有吸引力。尽管只有负面样本,我们的消融研究表明,遗忘仅用RLHF约2%的计算时间仍能取得比其更好的对齐性能。
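A bare-bones sketch of the core unlearning step, assuming we only have negative examples: gradient ascent on the language-modelling loss of the unwanted sequences makes the model less likely to produce them. A tiny toy next-token model stands in for a real LLM; the paper's actual recipe lives in the linked repository.

```python
# Illustrative only: gradient ASCENT on the next-token loss of an unwanted sequence,
# using a tiny toy model in place of a real LLM.
import torch
import torch.nn as nn

vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

bad = torch.tensor([[5, 17, 42, 9, 3]])   # token ids of a harmful completion (placeholders)
inputs, targets = bad[:, :-1], bad[:, 1:]

for _ in range(10):
    logits = model(inputs)                                  # (1, T-1, vocab)
    nll = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
    (-nll).backward()                                       # ascend: make the bad sequence less likely
    opt.step()
    opt.zero_grad()

# In practice such a step is usually combined with terms that preserve behaviour on normal data.
print("NLL on the unwanted sequence after unlearning:", nll.item())
```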

LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents

  • paper_url: http://arxiv.org/abs/2310.09454
  • repo_url: https://github.com/shukla-yash/LgTS-LLM-guided-Dynamic-Task-Sampling-for-RL-agents
  • paper_authors: Yash Shukla, Wenchang Gao, Vasanth Sarathy, Alvaro Velasquez, Robert Wright, Jivko Sinapov
  • for: 这篇论文旨在探讨大语言模型(LLM)在机器人与人工智能代理问题中的高层规划能力,以及如何利用LLM帮助强化学习(RL)代理学习与行动。
  • methods: 论文提出了一种新的LgTS(LLM引导的教师-学生学习)方法,利用LLM生成达到目标状态的子目标图表示,并采用教师-学生学习算法让RL代理学习从起始状态到目标状态的策略,同时尽量减少与环境的交互次数。与以往利用LLM的方法不同,该方法既不需要访问专有或经过微调的LLM,也不需要预先训练好的策略来完成LLM提出的子目标。
  • results: 在基于DoorKey的网格世界和受搜救任务启发的环境上的实验表明,LLM生成的子目标图表示有助于RL代理学习这些子目标,而教师-学生学习算法能够在转移动态未知时减少环境交互次数。
    Abstract Recent advancements in reasoning abilities of Large Language Models (LLM) has promoted their usage in problems that require high-level planning for robots and artificial agents. However, current techniques that utilize LLMs for such planning tasks make certain key assumptions such as, access to datasets that permit finetuning, meticulously engineered prompts that only provide relevant and essential information to the LLM, and most importantly, a deterministic approach to allow execution of the LLM responses either in the form of existing policies or plan operators. In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs to provide a graphical representation of the sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment. The RL agent uses Teacher-Student learning algorithm to learn a set of successful policies for reaching the goal state from the start state while simultaneously minimizing the number of environmental interactions. Unlike previous methods that utilize LLMs, our approach does not assume access to a propreitary or a fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM. Through experiments on a gridworld based DoorKey domain and a search-and-rescue inspired domain, we show that generating a graphical structure of sub-goals helps in learning policies for the LLM proposed sub-goals and the Teacher-Student learning algorithm minimizes the number of environment interactions when the transition dynamics are unknown.
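A high-level sketch of the teacher-student loop, assuming the LLM has already proposed a directed graph of sub-goals: the teacher samples the next sub-goal whose prerequisites look mastered but which the student has not yet mastered, then updates success statistics after each attempt. The graph, thresholds, and sampling rule are illustrative assumptions.

```python
# Illustrative only: the teacher picks the next sub-goal from an LLM-proposed graph,
# favouring sub-goals whose prerequisites are mastered but which are not yet mastered.
import random

subgoal_graph = {"get_key": ["open_door"], "open_door": ["reach_goal"], "reach_goal": []}
success_rate = {g: 0.0 for g in subgoal_graph}

def parents(goal):
    return [g for g, children in subgoal_graph.items() if goal in children]

def teacher_sample():
    candidates = [g for g in subgoal_graph
                  if success_rate[g] < 0.8 and all(success_rate[p] >= 0.8 for p in parents(g))]
    return random.choice(candidates) if candidates else None

def student_attempt(goal):
    # Stand-in for running an RL episode; returns True if the sub-goal was reached.
    return random.random() < 0.5 + success_rate[goal] / 2

for episode in range(20):
    goal = teacher_sample()
    if goal is None:
        break
    success_rate[goal] = 0.9 * success_rate[goal] + 0.1 * float(student_attempt(goal))

print(success_rate)
```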