cs.CL - 2023-08-20

CharacterChat: Learning towards Conversational AI with Personalized Social Support

  • paper_url: http://arxiv.org/abs/2308.10278
  • repo_url: https://github.com/morecry/characterchat
  • paper_authors: Quan Tu, Chuanqi Chen, Jinpeng Li, Yanran Li, Shuo Shang, Dongyan Zhao, Ran Wang, Rui Yan
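  • for: Providing personalized social support.
  • methods: Uses MBTI-based persona decomposition to build a bank of virtual characters (the MBTI-1024 Bank), develops CharacterChat, a conversational system driven by personas and memories, and adds an interpersonal matching mechanism that pairs each individual with a persona-compatible supporter.
  • results: Experiments indicate that CharacterChat provides effective personalized social support and that interpersonal matching brings substantial advantages.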
    Abstract In our modern, fast-paced, and interconnected world, the importance of mental well-being has grown into a matter of great urgency. However, traditional methods such as Emotional Support Conversations (ESC) face challenges in effectively addressing a diverse range of individual personalities. In response, we introduce the Social Support Conversation (S2Conv) framework. It comprises a series of support agents and the interpersonal matching mechanism, linking individuals with persona-compatible virtual supporters. Utilizing persona decomposition based on the MBTI (Myers-Briggs Type Indicator), we have created the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Through improved role-playing prompts with behavior preset and dynamic memory, we facilitate the development of the MBTI-S2Conv dataset, which contains conversations between the characters in the MBTI-1024 Bank. Building upon these foundations, we present CharacterChat, a comprehensive S2Conv system, which includes a conversational model driven by personas and memories, along with an interpersonal matching plugin model that dispatches the optimal supporters from the MBTI-1024 Bank for individuals with specific personas. Empirical results indicate the remarkable efficacy of CharacterChat in providing personalized social support and highlight the substantial advantages derived from interpersonal matching. The source code is available at https://github.com/morecry/CharacterChat.
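    Summary In today's fast-paced, interconnected world, mental well-being has become an urgent concern. Traditional approaches such as Emotional Support Conversations (ESC), however, struggle to serve people with diverse personalities. The authors therefore propose the Social Support Conversation (S2Conv) framework, which combines a set of support agents with an interpersonal matching mechanism that links each individual to a persona-compatible virtual supporter. Using MBTI (Myers-Briggs Type Indicator) persona decomposition, they build the MBTI-1024 Bank, a group of virtual characters with distinct profiles. Improved role-playing prompts with behavior presets and dynamic memory are used to produce the MBTI-S2Conv dataset, which contains conversations between characters in the bank. On this basis they present CharacterChat, a complete S2Conv system consisting of a persona- and memory-driven conversational model and an interpersonal matching plugin model that dispatches the best-suited supporter from the MBTI-1024 Bank for an individual with a given persona. Experiments show that CharacterChat is highly effective at personalized social support and that interpersonal matching brings substantial benefits. The code is available at https://github.com/morecry/CharacterChat.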

Scaled-up Discovery of Latent Concepts in Deep NLP Models

  • paper_url: http://arxiv.org/abs/2308.10263
  • repo_url: None
  • paper_authors: Majd Hawasly, Fahim Dalvi, Nadir Durrani
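  • for: Comparing clustering algorithms for discovering the concepts encoded in the representations of pre-trained language models.
  • methods: Three clustering algorithms, Agglomerative Hierarchical Clustering, the Leaders Algorithm, and K-Means Clustering, are compared by how well the concepts they uncover align with human-defined ontologies.
  • results: K-Means has the potential to scale to very large datasets, enabling rich latent concept discovery at both the word and phrase level.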
    Abstract Pre-trained language models (pLMs) learn intricate patterns and contextual dependencies via unsupervised learning on vast text data, driving breakthroughs across NLP tasks. Despite these achievements, these models remain black boxes, necessitating research into understanding their decision-making processes. Recent studies explore representation analysis by clustering latent spaces within pre-trained models. However, these approaches are limited in terms of scalability and the scope of interpretation because of high computation costs of clustering algorithms. This study focuses on comparing clustering algorithms for the purpose of scaling encoded concept discovery of representations from pLMs. Specifically, we compare three algorithms in their capacity to unveil the encoded concepts through their alignment to human-defined ontologies: Agglomerative Hierarchical Clustering, Leaders Algorithm, and K-Means Clustering. Our results show that K-Means has the potential to scale to very large datasets, allowing rich latent concept discovery, both on the word and phrase level.
    Summary Pre-trained language models (pLMs) learn intricate patterns and contextual dependencies through unsupervised learning on vast amounts of text, which has driven breakthroughs across NLP tasks. These models nevertheless remain black boxes, so their decision-making needs to be studied. Recent work analyzes representations by clustering the latent spaces of pre-trained models, but the high computational cost of clustering limits both scalability and the scope of interpretation. This study compares Agglomerative Hierarchical Clustering, the Leaders Algorithm, and K-Means Clustering on how well the concepts they uncover align with human-defined ontologies. K-Means shows the potential to scale to very large datasets, enabling rich latent concept discovery at both the word and phrase level.
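
To make the comparison more concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) on toy vectors. It is an illustration only, not the authors' pipeline: the function name `kmeans`, the first-k initialization, the Euclidean metric, and the 2-D points standing in for pLM representations are all assumptions.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with a deterministic init (first k points as centroids)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster simply keeps its old centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Toy stand-ins for token representations (hypothetical data).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_assignment, toy_centroids = kmeans(toy_points, k=2)
```

Each iteration touches every point once per centroid, whereas agglomerative clustering generally needs pairwise distances between all points, which is one common reason K-Means is regarded as easier to scale.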

How Good Are Large Language Models at Out-of-Distribution Detection?

  • paper_url: http://arxiv.org/abs/2308.10261
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Bo Liu, Liming Zhan, Zexin Lu, Yujie Feng, Lei Xue, Xiao-Ming Wu
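  • for: An empirical study of how well large language models (LLMs) perform out-of-distribution (OOD) detection, focusing on the LLaMA series from 7B to 65B.
  • methods: Commonly used OOD detectors are evaluated in both zero-grad and fine-tuning settings. The earlier discriminative in-distribution fine-tuning is replaced with generative fine-tuning, which aligns the pre-training objective of LLMs with downstream tasks.
  • results: A simple cosine distance OOD detector performs best, outperforming the other detectors. The authors attribute this to the isotropic embedding spaces of LLMs, in contrast to the anisotropy observed in smaller BERT-family models, an insight that may help improve the adaptability and reliability of LLMs in dynamic environments.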
    Abstract Out-of-distribution (OOD) detection plays a vital role in enhancing the reliability of machine learning (ML) models. The emergence of large language models (LLMs) has catalyzed a paradigm shift within the ML community, showcasing their exceptional capabilities across diverse natural language processing tasks. While existing research has probed OOD detection with relatively small-scale Transformers like BERT, RoBERTa and GPT-2, the stark differences in scales, pre-training objectives, and inference paradigms call into question the applicability of these findings to LLMs. This paper embarks on a pioneering empirical investigation of OOD detection in the domain of LLMs, focusing on LLaMA series ranging from 7B to 65B in size. We thoroughly evaluate commonly-used OOD detectors, scrutinizing their performance in both zero-grad and fine-tuning scenarios. Notably, we alter previous discriminative in-distribution fine-tuning into generative fine-tuning, aligning the pre-training objective of LLMs with downstream tasks. Our findings unveil that a simple cosine distance OOD detector demonstrates superior efficacy, outperforming other OOD detectors. We provide an intriguing explanation for this phenomenon by highlighting the isotropic nature of the embedding spaces of LLMs, which distinctly contrasts with the anisotropic property observed in smaller BERT family models. The new insight enhances our understanding of how LLMs detect OOD data, thereby enhancing their adaptability and reliability in dynamic environments.
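    Summary The emergence of large language models (LLMs) has triggered a paradigm shift in the machine learning (ML) community. Existing research has examined OOD detection with relatively small Transformers such as BERT, RoBERTa, and GPT-2, but it is unclear whether those findings carry over to LLMs. This paper presents a pioneering empirical study of OOD detection for LLMs, focusing on the LLaMA series from 7B to 65B. Commonly used OOD detectors are evaluated thoroughly in both zero-grad and fine-tuning scenarios. The earlier discriminative in-distribution fine-tuning is changed to generative fine-tuning so that the pre-training objective of LLMs matches the downstream tasks. A simple cosine distance OOD detector turns out to perform best, surpassing the other detectors. The authors explain this by the isotropic nature of LLM embedding spaces, which differs from the anisotropy seen in smaller BERT-family models. The finding improves understanding of how LLMs detect OOD data and thereby their adaptability and reliability in dynamic environments.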
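
The cosine distance detector mentioned above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: scoring by the nearest in-distribution embedding, the `threshold` value, and the 2-D toy vectors are assumptions made for the example.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def ood_score(embedding, in_dist_embeddings):
    """Distance to the nearest in-distribution embedding; larger means more OOD-like."""
    return min(cosine_distance(embedding, ref) for ref in in_dist_embeddings)


def is_ood(embedding, in_dist_embeddings, threshold=0.3):
    # Assumption: the threshold would normally be tuned on held-out data.
    return ood_score(embedding, in_dist_embeddings) > threshold


# Hypothetical sentence embeddings of in-distribution training examples.
reference = [(1.0, 0.0), (0.9, 0.1)]
```

With these toy vectors, `is_ood((1.0, 0.05), reference)` is false because the query points in nearly the same direction as a reference, while `is_ood((0.0, 1.0), reference)` is true.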

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

  • paper_url: http://arxiv.org/abs/2308.10253
  • repo_url: https://github.com/icoz69/stablellava
  • paper_authors: Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
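  • for: Developing multimodal large language models (LLMs) that align the visual and textual modalities effectively while understanding human instructions.
  • methods: A new data collection method that synthesizes images and dialogues synchronously for visual instruction tuning. It combines ChatGPT with text-to-image generative models to produce a diverse, controllable dataset with varied image content.
  • results: Extensive experiments on various datasets, using the open-source LLAVA model as a testbed, show marked improvements across more than ten commonly assessed capabilities.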
    Abstract The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. In an effort to mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets using the open-source LLAVA model as a testbed for our proposed pipeline. Our results underscore marked enhancements across more than ten commonly assessed capabilities.
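    Summary The multimodal capabilities of OpenAI's GPT-4 have attracted wide attention and spurred the development of multimodal large language models (LLMs). A main research goal is to align the textual and visual modalities effectively while understanding human instructions. Current methods usually build image-dialogue datasets from the annotations of benchmark datasets, much like instruction tuning in LLMs, but such datasets often carry domain bias that may limit the models' generative capabilities. To address this, the authors propose a new data collection method that synthesizes images and dialogues synchronously for visual instruction tuning. It draws on generative models, combining ChatGPT with text-to-image models, to produce a diverse and controllable dataset. This gives more flexibility than existing methods and clearly improves several model capabilities. Comprehensive experiments on various datasets, with the open-source LLAVA model as the testbed for the proposed pipeline, show marked improvements across more than ten commonly assessed capabilities.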

Activation Addition: Steering Language Models Without Optimization

  • paper_url: http://arxiv.org/abs/2308.10248
  • repo_url: None
  • paper_authors: Alex Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid
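  • for: Reliably controlling the behavior of large language models (LLMs), which remains an open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback (RLHF), prompt engineering, and guided decoding. The authors instead investigate activation engineering: modifying activations at inference time to alter model behavior predictably, specifically by adding a "steering vector" that is specified implicitly through natural language.
  • methods: Unlike earlier work (Subramani et al. 2022; Hernandez et al. 2023), the ActAdd method does not learn the steering vectors. It computes them from the activation differences that result from pairs of prompts.
  • results: Demonstrated with GPT-2 on OpenWebText and ConceptNet, the inference-time method gives control over high-level properties of the output while preserving off-target model performance. It needs far less compute and implementation effort than finetuning or RLHF, lets users specify the desired behavior in natural language, and has an overhead that scales naturally with model size.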
    Abstract Reliably controlling the behavior of large language models (LLMs) is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback (RLHF), prompt engineering and guided decoding. We instead investigate activation engineering: modifying activations at inference time to predictably alter model behavior. In particular, we bias the forward pass with an added 'steering vector' implicitly specified through natural language. Unlike past work which learned these steering vectors (Subramani, Suresh, and Peters 2022; Hernandez, Li, and Andreas 2023), our Activation Addition (ActAdd) method computes them by taking the activation differences that result from pairs of prompts. We demonstrate ActAdd on GPT-2 on OpenWebText and ConceptNet. Our inference-time approach yields control over high-level properties of output and preserves off-target model performance. It involves far less compute and implementation effort compared to finetuning or RLHF, allows users to provide natural language specifications, and its overhead scales naturally with model size.
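    Summary Reliably controlling large language models (LLMs) is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback (RLHF), prompt engineering, and guided decoding. The authors instead investigate activation engineering, which modifies activations at inference time to change model behavior predictably. Specifically, they bias the forward pass with an added "steering vector" that is specified implicitly through natural language. Unlike earlier work (Subramani et al. 2022; Hernandez et al. 2023), Activation Addition (ActAdd) does not learn these vectors but computes them from the activation differences between pairs of prompts. ActAdd is demonstrated with GPT-2 on OpenWebText and ConceptNet, where it controls high-level properties of the output while preserving off-target model performance. It requires far less compute and implementation effort than finetuning or RLHF, lets users provide natural-language specifications, and its overhead scales naturally with model size.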
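
The core arithmetic of the idea, taking the difference of activations from a pair of prompts and adding it to the forward pass, can be sketched on plain vectors. This is only a toy under stated assumptions: the function names, the scaling coefficient `coeff`, and the hand-written activation values are hypothetical, and no real model or layer hook is involved.

```python
def steering_vector(act_pos, act_neg):
    """Difference of activations recorded for a contrasting pair of prompts."""
    return [p - n for p, n in zip(act_pos, act_neg)]


def add_steering(activation, steer, coeff=1.0):
    """Bias an activation vector by a scaled steering vector (coeff is an assumption)."""
    return [a + coeff * s for a, s in zip(activation, steer)]


# Hypothetical activations at one layer for two contrasting prompts.
act_for_prompt_a = [1.0, 2.0, 3.0]
act_for_prompt_b = [0.5, 2.0, 1.0]
steer = steering_vector(act_for_prompt_a, act_for_prompt_b)

# Hypothetical activation of a new input at the same layer, before and after steering.
original = [0.0, 1.0, 0.0]
steered = add_steering(original, steer, coeff=2.0)
```

Nothing is trained here: the vector comes from two forward passes, which is consistent with the abstract's claim of low compute cost relative to finetuning or RLHF.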

Indonesian Automatic Speech Recognition with XLSR-53

  • paper_url: http://arxiv.org/abs/2308.11589
  • repo_url: None
  • paper_authors: Panji Arisaputra, Amalia Zahra
  • for: Developing an Indonesian Automatic Speech Recognition (ASR) system with the XLSR-53 pre-trained model, in order to reduce the amount of training data needed for a competitive Word Error Rate (WER).
  • methods: The study uses the XLSR-53 pre-trained model with a combination of three datasets (TITML-IDN, Magic Data, and Common Voice) totaling 24 hours, 18 minutes, and 1 second of data, and adds a language model to reduce the WER further.
  • results: The model reaches a WER of 20%, competitive with similar models on the Common Voice test split. Adding a language model lowers the WER to 12%, a reduction of about 8 percentage points, showing that a better Indonesian ASR system can be built with less data.
    Abstract This study focuses on the development of Indonesian Automatic Speech Recognition (ASR) using the XLSR-53 pre-trained model, where XLSR stands for cross-lingual speech representations. The XLSR-53 pre-trained model is used to significantly reduce the amount of training data in non-English languages required to achieve a competitive Word Error Rate (WER). The total amount of data used in this study is 24 hours, 18 minutes, and 1 second: (1) TITML-IDN, 14 hours and 31 minutes; (2) Magic Data, 3 hours and 33 minutes; and (3) Common Voice, 6 hours, 14 minutes, and 1 second. With a WER of 20%, the model built in this study can compete with similar models on the Common Voice test split. The WER can be decreased by around 8 percentage points using a language model, from 20% to 12%. Thus, the results of this study improve on previous research and contribute to a better Indonesian ASR with a smaller amount of data.
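
Since the results hinge on WER, a short sketch of how the metric is usually computed may help: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is missing from the hypothesis.
example_wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
```

Going from a WER of 20% to 12% is an absolute drop of 8 percentage points, which corresponds to a 40% relative reduction in errors.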

WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning

  • paper_url: http://arxiv.org/abs/2308.10195
  • repo_url: None
  • paper_authors: Dongjian Huo, Zehong Zhang, Hanjing Su, Guanbin Li, Chaowei Fang, Qingyao Wu
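  • for: Visible watermark removal, studied as an adversarial means of improving watermark robustness.
  • methods: A nested Transformer that uses implicit joint learning with a gate mechanism, cross-channel attention, and nested structures for multi-scale information.
  • results: Outperforms existing state-of-the-art methods by a large margin on several challenging benchmarks.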
    Abstract Watermarking serves as a widely adopted approach to safeguard media copyright. In parallel, the research focus has extended to watermark removal techniques, offering an adversarial means to enhance watermark robustness and foster advancements in the watermarking field. Existing watermark removal methods mainly rely on UNet with task-specific decoder branches--one for watermark localization and the other for background image restoration. However, watermark localization and background restoration are not isolated tasks; precise watermark localization inherently implies regions necessitating restoration, and the background restoration process contributes to more accurate watermark localization. To holistically integrate information from both branches, we introduce an implicit joint learning paradigm. This empowers the network to autonomously navigate the flow of information between implicit branches through a gate mechanism. Furthermore, we employ cross-channel attention to facilitate local detail restoration and holistic structural comprehension, while harnessing nested structures to integrate multi-scale information. Extensive experiments are conducted on various challenging benchmarks to validate the effectiveness of our proposed method. The results demonstrate our approach's remarkable superiority, surpassing existing state-of-the-art methods by a large margin.
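    Summary Watermarking is a widely adopted way to protect media copyright. Research has also turned to watermark removal, which serves as an adversarial means of improving watermark robustness and advancing the watermarking field. Existing removal methods mainly rely on a UNet with task-specific decoder branches, one for watermark localization and one for background image restoration. These tasks are not independent: precise localization indicates which regions need restoration, and background restoration in turn makes localization more accurate. To integrate information from both branches, the authors introduce an implicit joint learning paradigm in which a gate mechanism lets the network regulate the flow of information between implicit branches on its own. They also use cross-channel attention for local detail restoration and holistic structural understanding, and nested structures to integrate multi-scale information. Extensive experiments on several challenging benchmarks show a large advantage over existing state-of-the-art methods.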

FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt

  • paper_url: http://arxiv.org/abs/2308.10173
  • repo_url: None
  • paper_authors: Zhixiao Qi, Yijiong Yu, Meiqi Tu, Junyi Tan, Yongfeng Huang
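  • for: Building a large language model for the food testing domain.
  • methods: An incremental pre-training method that handles structured knowledge and scanned documents, plus a knowledge graph that serves as an external knowledge base to support retrieval in the large language model.
  • results: No experimental data are reported; the authors state that specific experimental data will appear in future versions.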
    Abstract Currently, the construction of large language models in specific domains is done by fine-tuning on a base model. Some models also incorporate knowledge bases without the need for pre-training. This is because the base model already contains domain-specific knowledge during the pre-training process. We build a large language model for food testing. Unlike the above approach, a significant amount of data in this domain exists in scanned format for domain standard documents. In addition, there is a large amount of untrained structured knowledge. Therefore, we introduce an incremental pre-training step to inject this knowledge into a large language model. In this paper, we propose a method for handling structured knowledge and scanned documents in incremental pre-training. To overcome the problem of machine hallucination, we construct a knowledge graph to serve as an external knowledge base for supporting retrieval in the large language model. It is worth mentioning that this paper is a technical report of our pre-release version, and we will report our specific experimental data in future versions.
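    Summary Domain-specific large language models are currently built mostly by fine-tuning a base model. Some models also incorporate knowledge bases without further pre-training, because the base model already acquired domain knowledge during pre-training. The authors build a large language model for the food testing domain, where the situation differs: much of the data exists as scanned domain standard documents, and there is a large amount of structured knowledge that the model has not been trained on. They therefore add an incremental pre-training step to inject this knowledge and propose a method for handling structured knowledge and scanned documents in that step. To counter machine hallucination, they construct a knowledge graph that serves as an external knowledge base supporting retrieval. The paper is a technical report on a pre-release version, and specific experimental data are to be reported in future versions.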
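
One simple way to picture a knowledge graph prompt is to look up triples that mention an entity from the question and place them in front of the question. The sketch below does exactly that with substring matching. It is an assumption-laden illustration: the triple format, the function names, the prompt wording, and the toy triples are all hypothetical, and the report does not describe its retrieval at this level of detail.

```python
def retrieve_triples(question, triples):
    """Return triples whose subject or object appears in the question text."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question, triples):
    """Prepend retrieved facts so the model can ground its answer in them."""
    facts = retrieve_triples(question, triples)
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        f"Known facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the facts above."
    )


# Toy knowledge graph (illustrative entries, not data from the paper).
toy_graph = [
    ("milk powder", "tested_for", "melamine"),
    ("honey", "tested_for", "chloramphenicol"),
]
toy_prompt = build_prompt("What is milk powder tested for?", toy_graph)
```

A production system would more likely use entity linking or embedding search than substring matching; the point here is only that retrieved facts, rather than the model's parametric memory, supply the answer content.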

FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

  • paper_url: http://arxiv.org/abs/2308.10170
  • repo_url: None
  • paper_authors: Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen
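  • for: Multi-turn fashion image retrieval in a real-world setting, where users iteratively provide information to refine the retrieval results until they find an item that meets all their requirements.
  • methods: A new memory-based method, FashionNTM, built on a Cascaded Memory Neural Turing Machine (CM-NTM) for implicit state management, which learns to integrate information from all past turns when retrieving new images. Unlike a vanilla Neural Turing Machine (NTM), CM-NTM operates on multiple inputs that interact with their respective memories through individual read and write heads to learn complex relationships.
  • results: FashionNTM outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ and achieves a relative improvement of 12.6% on Multi-turn Shoes, an extension of the single-turn Shoes dataset created in this work. Analysis in a real-world interactive setting shows memory retention across turns and agnosticity to turn order for non-contradictory feedback. In a user study, images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm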
    Abstract Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images, for a given turn. Unlike vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5%, on Multi-turn FashionIQ -- the only existing multi-turn fashion dataset currently, in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm
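    Summary Multi-turn fashion image retrieval with textual feedback targets a real-world setting in which users provide information turn by turn to refine the results until they find an item that meets all their requirements. For such a multi-turn system the authors propose a new memory-based method, FashionNTM. The framework uses a new Cascaded Memory Neural Turing Machine (CM-NTM) for implicit state management, learning to integrate information from all past turns when retrieving new images. Unlike a vanilla Neural Turing Machine (NTM), CM-NTM handles multiple inputs, which interact with their respective memories through individual read and write heads to learn complex relationships. In extensive evaluation on Multi-turn FashionIQ and Multi-turn Shoes, the method outperforms the previous state-of-the-art algorithm by 50.5% on the former and achieves a relative improvement of 12.6% on the latter. Further analysis in a real-world interactive setting shows two properties: the model retains memory across turns, and it is agnostic to turn order for non-contradictory feedback. In a user study, images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm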

Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?

  • paper_url: http://arxiv.org/abs/2308.10168
  • repo_url: None
  • paper_authors: Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, Xin Luna Dong
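  • for: Answering the question of how knowledgeable large language models (LLMs) are.
  • methods: The authors construct Head-to-Tail, a benchmark of 18K question-answer pairs about head, torso, and tail facts, graded by popularity. They design an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes.
  • results: A comprehensive evaluation of 14 publicly available LLMs shows that existing LLMs are still far from perfect in their grasp of factual knowledge, especially for facts about torso-to-tail entities.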
    Abstract Since the recent prosperity of Large Language Models (LLMs), there have been interleaved discussions regarding how to reduce hallucinations from LLM responses, how to increase the factuality of LLMs, and whether Knowledge Graphs (KGs), which store the world knowledge in a symbolic form, will be replaced with LLMs. In this paper, we try to answer these questions from a new angle: How knowledgeable are LLMs? To answer this question, we constructed Head-to-Tail, a benchmark that consists of 18K question-answer (QA) pairs regarding head, torso, and tail facts in terms of popularity. We designed an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes. Through a comprehensive evaluation of 14 publicly available LLMs, we show that existing LLMs are still far from being perfect in terms of their grasp of factual knowledge, especially for facts of torso-to-tail entities.
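    Summary Since the rise of large language models (LLMs), there has been much discussion about how to reduce hallucinations in LLM responses, how to increase the factuality of LLMs, and whether Knowledge Graphs (KGs) will be replaced by LLMs. The paper approaches these questions from a new angle: how knowledgeable are LLMs? To answer it, the authors construct Head-to-Tail, a benchmark of 18K question-answer (QA) pairs about head, torso, and tail facts in terms of popularity. They design an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes. A comprehensive evaluation of 14 publicly available LLMs shows that existing LLMs are still far from perfect in their grasp of factual knowledge, especially for facts about torso-to-tail entities.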

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

  • paper_url: http://arxiv.org/abs/2308.10107
  • repo_url: https://github.com/espnet/espnet
  • paper_authors: Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe
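  • for: Controllable alignment prediction in transducer-based speech recognition, so that the predicted alignment satisfies specific desired properties.
  • methods: A Bayes risk function assigns lower risk values to the preferred paths, making the predicted alignment more likely to have the desired properties.
  • results: The proposed Bayes Risk Transducer (BRT) saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.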
    Abstract Automatic speech recognition (ASR) based on transducers is widely used. In training, a transducer maximizes the summed posteriors of all paths. The path with the highest posterior is commonly defined as the predicted alignment between the speech and the transcription. While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction. Specifically, this work proposes Bayes Risk Transducer (BRT), which uses a Bayes risk function to set lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. We further demonstrate that these predicted alignments with intentionally designed properties can provide practical advantages over the vanilla transducer. Experimentally, the proposed BRT saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.
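    Summary Transducer-based automatic speech recognition (ASR) is widely used. In training, a transducer maximizes the summed posteriors of all paths, and the path with the highest posterior is commonly taken as the predicted alignment between the speech and the transcription. The vanilla transducer has no prior preference among the valid paths; this work aims to enforce preferred paths and achieve controllable alignment prediction. It proposes the Bayes Risk Transducer (BRT), which uses a Bayes risk function to assign lower risk values to the preferred paths so that the predicted alignment is more likely to satisfy specific desired properties. The authors further show that alignments with intentionally designed properties offer practical advantages over the vanilla transducer. Experimentally, BRT saves inference cost by up to 46% for non-streaming ASR and reduces overall system latency by 41% for streaming ASR.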

PACE: Improving Prompt with Actor-Critic Editing for Large Language Model

  • paper_url: http://arxiv.org/abs/2308.10088
  • repo_url: None
  • paper_authors: Yihong Dong, Kangcheng Luo, Xue Jiang, Zhi Jin, Ge Li
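  • for: Improving the performance of large language models (LLMs) by editing prompts automatically.
  • methods: Prompt with Actor-Critic Editing (PACE), inspired by the actor-critic algorithm, uses LLMs in the dual roles of actor and critic and treats the prompt as a kind of policy.
  • results: Extensive experiments on 24 instruction induction tasks and 21 big-bench tasks show that PACE raises the relative performance of medium/low-quality human-written prompts by up to 98%, reaching performance comparable to high-quality human-written prompts. PACE is also notably effective for prompt generation.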
    Abstract Large language models (LLMs) have showcased remarkable potential across various tasks by conditioning on prompts. However, the quality of different human-written prompts leads to substantial discrepancies in LLMs' performance, and improving prompts usually necessitates considerable human effort and expertise. To this end, this paper proposes Prompt with Actor-Critic Editing (PACE) for LLMs to enable automatic prompt editing. Drawing inspiration from the actor-critic algorithm in reinforcement learning, PACE leverages LLMs as the dual roles of actors and critics, conceptualizing prompt as a type of policy. PACE refines prompt, taking into account the feedback from both actors performing prompt and critics criticizing response. This process helps LLMs better align prompt to a specific task, thanks to real responses and thinking from LLMs. We conduct extensive experiments on 24 instruction induction tasks and 21 big-bench tasks. Experimental results indicate that PACE elevates the relative performance of medium/low-quality human-written prompts by up to 98%, which has comparable performance to high-quality human-written prompts. Moreover, PACE also exhibits notable efficacy for prompt generation.
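    Summary Large language models (LLMs) show remarkable potential across tasks when conditioned on prompts, but the quality of human-written prompts causes large differences in performance, and improving prompts usually takes considerable human effort and expertise. The paper therefore proposes Prompt with Actor-Critic Editing (PACE) to edit prompts automatically. Inspired by the actor-critic algorithm in reinforcement learning, PACE uses LLMs in the dual roles of actor and critic and treats the prompt as a kind of policy. It refines the prompt using feedback from both the actors that execute the prompt and the critics that criticize the responses, which helps align the prompt with a specific task. Extensive experiments on 24 instruction induction tasks and 21 big-bench tasks show that PACE raises the relative performance of medium/low-quality human-written prompts by up to 98%, comparable to high-quality human-written prompts. PACE is also notably effective for prompt generation.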
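
The actor, critic, and edit steps described above can be pictured as a small loop. This is a rough sketch under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a text prompt to a text completion, the instruction wording is invented, and the paper's handling of multiple actors and critics is reduced to one of each per round.

```python
def pace_edit(prompt, task_input, llm, rounds=2):
    """Refine a prompt with an actor -> critic -> edit loop.

    `llm` is a hypothetical str -> str callable; plug in any text-generation client.
    """
    for _ in range(rounds):
        # Actor: execute the current prompt on a task input.
        response = llm(f"{prompt}\n\nInput: {task_input}")
        # Critic: judge the response produced under the current prompt.
        critique = llm(
            "Critique this response to the prompt.\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        # Edit: rewrite the prompt in light of the critique.
        prompt = llm(
            "Rewrite the prompt using the critique.\n"
            f"Prompt: {prompt}\nCritique: {critique}"
        )
    return prompt
```

Passing the model in as a plain callable keeps the loop independent of any vendor API and makes it easy to test with a stub function.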