results: Compared against text-only baselines, our model excels at inferring hate speech, and we conduct extensive ablation studies.
Abstract
We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior.
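To make the interwoven-fusion idea more concrete, here is a minimal, hedged sketch of a layer that lets per-comment text tokens attend to image embeddings before a discussion-level graph transformer would aggregate comment nodes; the module names and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an interwoven text-image fusion layer, assuming an mDT-style
# setup with per-comment text token embeddings and image embeddings.
# Names and dimensions are illustrative, not the paper's actual architecture.
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (batch, n_text, dim); image_tokens: (batch, n_img, dim)
        fused, _ = self.cross_attn(query=text_tokens, key=image_tokens, value=image_tokens)
        return self.norm(text_tokens + fused)  # residual fusion of the two modalities

# A discussion-level graph transformer would then operate on one pooled embedding
# per comment node, so the label for each comment sees the surrounding context.
comments = torch.randn(8, 32, 256)   # 8 comments, 32 text tokens each
images = torch.randn(8, 16, 256)     # 16 image patch embeddings per comment
fused = FusionLayer()(comments, images)
print(fused.shape)                   # torch.Size([8, 32, 256])
```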
Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition Tasks
paper_authors: Chengguang Gan, Qinghao Zhang, Tatsunori Mori
for: This study investigates the intricate interactions between the traditionally segmented subtasks of sentence classification and named entity recognition, and the mutual reinforcement effect between these two information extraction subtasks.
methods: We propose a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We also develop a Sentence-to-Label Generation (SLG) framework and use a generative model to produce SC labels, NER labels, and the associated text segments.
results: In SCNM, SC accuracy improves by 1.13 points and NER accuracy by 1.06 points over the standalone tasks, and the Constraint Mechanism (CM) improves the accuracy of the generated format. Applied to the standalone SC task, the SLG framework also outperforms the baseline, and it performs markedly better in few-shot learning experiments.
Abstract
Information extraction (IE) is a crucial subfield within natural language processing. However, for the traditionally segmented approach to sentence classification and Named Entity Recognition, the intricate interactions between these individual subtasks remain largely uninvestigated. In this study, we propose an integrative analysis, converging sentence classification with Named Entity Recognition, with the objective of unveiling and comprehending the mutual reinforcement effect within these two information extraction subtasks. To achieve this, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance. We additionally implemented the SLG framework on the single SC task. It yielded superior accuracies compared to the baseline on two distinct Japanese SC datasets. Notably, in few-shot learning experiments, the SLG framework shows much better performance than the fine-tuning method. These empirical findings contribute additional evidence to affirm the efficacy of the SLG framework.
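To illustrate the format-converter idea, the sketch below shows one way SC and NER supervision could be serialized into a single generation target for a generative model; the tags and separators are assumptions for illustration, not the paper's actual format.

```python
# Hypothetical format converter: turn (sentence, SC label, NER spans) into one
# target string a generative model can produce, so both tasks share one
# input/output format. The markers below are assumed, not the paper's own.
def to_generation_target(sentence: str, sc_label: str, entities: list[tuple[str, str]]) -> str:
    ner_part = "; ".join(f"{etype}: {span}" for etype, span in entities)
    return f"SC: {sc_label} | NER: {ner_part}"

example = to_generation_target(
    "山田太郎は東京大学で学んだ。",
    "Person biography",
    [("PERSON", "山田太郎"), ("ORG", "東京大学")],
)
print(example)  # SC: Person biography | NER: PERSON: 山田太郎; ORG: 東京大学
```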
results: Across language modeling, text classification, and image classification, LRPE outperforms existing methods, and the framework can be used to derive further relative positional encoding methods.
Abstract
Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.
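For intuition, here is a minimal sketch of the linear-attention setting LRPE targets: a non-negative feature map decomposes the attention kernel, and a rotation-style (unitary) transform of the per-position features makes the query-key inner product depend only on relative position while keeping linear complexity. This illustrates the general recipe, under assumed choices of feature map and rotation, not the paper's exact algorithms.

```python
# Sketch of linear attention with a rotation-style (unitary) relative positional
# encoding, as one simple member of the family described above. Illustrative only.
import numpy as np

def feature_map(x):
    return np.maximum(x, 0) + 1e-6          # simple non-negative kernel feature map

def rotate(x, positions, base=10000.0):
    # Position-dependent 2D rotations (a unitary transform) of feature pairs, so that
    # <rotate(q, n), rotate(k, m)> depends on positions only through n - m.
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)
    angles = positions[:, None] * freqs[None, :]          # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:2 * half]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

n, d = 128, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
pos = np.arange(n, dtype=float)
qf, kf = rotate(feature_map(q), pos), rotate(feature_map(k), pos)

# Linear space-time attention: a (d x d) key-value summary instead of an n x n matrix.
kv = kf.T @ v                 # (d, d)
out = qf @ kv                 # (n, d); softmax-style normalization omitted for brevity
print(out.shape)              # (128, 64)
```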
Text vectorization via transformer-based language models and n-gram perplexities
For: This paper aims to address the limitations of using scalar perplexity as a measure of text quality, and instead proposes a new method based on vector values that take into account the probability distribution of individual tokens within the input.
Methods: The proposed method uses n-gram perplexities to calculate the relative perplexity of each text token, and combines these values into a single vector representing the input. This approach allows for a more nuanced assessment of text quality, taking into account the probability distribution of individual tokens as well as their overall probability.
Results: The authors evaluate the effectiveness of their proposed method in several experiments, and show that it outperforms traditional scalar perplexity measures in accurately assessing text quality. They also demonstrate the applicability of their method to a variety of natural language processing tasks, including language modeling and text classification.
Abstract
As the probability (and thus perplexity) of a text is calculated based on the product of the probabilities of individual tokens, it may happen that one unlikely token significantly reduces the probability (i.e., increases the perplexity) of some otherwise highly probable input, while potentially representing a simple typographical error. Also, given that perplexity is a scalar value that refers to the entire input, information about the probability distribution within it is lost in the calculation: a relatively good text that has one unlikely token and another text in which each token is equally likely can have the same perplexity value, especially for longer texts. As an alternative to scalar perplexity, this research proposes a simple algorithm used to calculate vector values based on n-gram perplexities within the input. Such representations consider the previously mentioned aspects, and instead of a unique value, the relative perplexity of each text token is calculated, and these values are combined into a single vector representing the input.
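A hedged sketch of the kind of computation the abstract describes: obtain per-token probabilities from a causal language model, turn them into local n-gram perplexities, and stack those into a vector instead of a single scalar. The model choice, window size, and aggregation below are assumptions for illustration.

```python
# Sketch: represent a text by a vector of local n-gram perplexities rather than one
# scalar perplexity. GPT-2 and the sliding-window size are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def ngram_perplexity_vector(text: str, n: int = 3) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # log-probability of each token given its prefix
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:, None]).squeeze(1)   # (seq_len - 1,)
    # perplexity of each sliding window of n consecutive tokens
    windows = token_lp.unfold(0, n, 1)                           # (num_windows, n)
    return torch.exp(-windows.mean(dim=1))

vec = ngram_perplexity_vector("The quick brown fox jumps over the lazy dog.")
print(vec)   # one local perplexity per n-gram window; spikes reveal unlikely tokens
```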
PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models
results: Evaluated on four types of language datasets and six types of models, the method improves the quantified uncertainty by 63% on average compared to a standard baseline method.
Abstract
Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to concerns about generating hallucinated facts. In this paper, we propose to learn neural prediction set models that come with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.
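As a rough, hedged illustration of what a neurally parameterized prediction set can look like (the paper's actual construction and PAC calibration procedure are more involved): a small network produces an input-dependent threshold, and the set contains every candidate whose model score clears it.

```python
# Illustrative sketch only, not the paper's exact PAC algorithm: a small network maps
# input features to a threshold, and the prediction set keeps every candidate whose
# score clears that threshold. In the paper this is calibrated so that the set covers
# the true answer with a PAC-style guarantee.
import torch
import torch.nn as nn

threshold_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

def prediction_set(features: torch.Tensor, candidate_scores: torch.Tensor) -> torch.Tensor:
    """features: (d,) input summary; candidate_scores: (k,) GLM scores per candidate."""
    tau = threshold_net(features).squeeze(-1)
    return candidate_scores >= tau        # boolean mask: candidates kept in the set

def empirical_coverage(held_out) -> float:
    """held_out: list of (features, candidate_scores, true_index) triples."""
    hits = [prediction_set(x, s)[y].item() for x, s, y in held_out]
    return sum(hits) / len(hits)          # calibration tunes the net until this reaches
                                          # the target coverage level
```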
Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications
results: The study finds gendered word associations, biased language usage, and biased narratives in the outputs of the LLMs, and discusses their ethical implications and potential societal impact.
Abstract
Gender bias in artificial intelligence (AI) and natural language processing has garnered significant attention due to its potential impact on societal perceptions and biases. This research paper aims to analyze gender bias in Large Language Models (LLMs) with a focus on multiple comparisons between GPT-2 and GPT-3.5, some prominent language models, to better understand its implications. Through a comprehensive literature review, the study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge. The methodology involves collecting and preprocessing data from GPT-2 and GPT-3.5, and employing in-depth quantitative analysis techniques to evaluate gender bias in the generated text. The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of these Large Language Models. The discussion explores the ethical implications of gender bias and its potential consequences on social perceptions and marginalized communities. Additionally, the paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques. The research highlights the importance of interdisciplinary collaborations and the role of sociological studies in mitigating gender bias in AI models. By addressing these issues, we can pave the way for more inclusive and unbiased AI systems that have a positive impact on society.
Attention over pre-trained Sentence Embeddings for Long Document Classification
results: On three standard document classification datasets, the studied method obtains results competitive with current state-of-the-art models that use standard fine-tuning, and it obtains even better results when the underlying transformers are frozen.
Abstract
Despite being the current de-facto models in most NLP tasks, transformers are often limited to short sequences due to their quadratic attention complexity on the number of tokens. Several attempts to address this issue were studied, either by reducing the cost of the self-attention computation or by modeling smaller sequences and combining them through a recurrence mechanism or using a new transformer model. In this paper, we suggest to take advantage of pre-trained sentence transformers to start from semantically meaningful embeddings of the individual sentences, and then combine them through a small attention layer that scales linearly with the document length. We report the results obtained by this simple architecture on three standard document classification datasets. When compared with the current state-of-the-art models using standard fine-tuning, the studied method obtains competitive results (even if there is no clear best model in this configuration). We also showcase that the studied architecture obtains better results when freezing the underlying transformers. A configuration that is useful when we need to avoid complete fine-tuning (e.g. when the same frozen transformer is shared by different applications). Finally, two additional experiments are provided to further evaluate the relevancy of the studied architecture over simpler baselines.
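A minimal sketch of the idea (frozen sentence encoder, small attention pooling over sentence embeddings, linear classifier); the encoder name and dimensions are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: classify a long document by attending over frozen pre-trained sentence
# embeddings with a single learned query vector, so cost grows linearly with the
# number of sentences. Encoder choice and sizes are assumptions.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # frozen; 384-dim embeddings

class SentenceAttentionClassifier(nn.Module):
    def __init__(self, dim: int = 384, num_classes: int = 2):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
        # sent_emb: (num_sentences, dim); one attention weight per sentence
        weights = torch.softmax(sent_emb @ self.query / sent_emb.shape[-1] ** 0.5, dim=0)
        doc_emb = (weights[:, None] * sent_emb).sum(dim=0)
        return self.classifier(doc_emb)

sentences = ["First sentence of the document.", "Second sentence.", "And so on."]
emb = torch.tensor(encoder.encode(sentences))        # frozen embeddings, no gradient needed
logits = SentenceAttentionClassifier()(emb)
print(logits.shape)   # torch.Size([2])
```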
Towards a Neural Era in Dialogue Management for Collaboration: A Literature Survey
results: The paper analyzes a selected set of recent works that apply neural approaches to collaborative dialogue management and examines the prevailing trends in the field. It aims to provide foundational background for future advances in collaborative dialogue management, particularly as the dialogue systems community embraces large language models.
Abstract
Dialogue-based human-AI collaboration can revolutionize collaborative problem-solving, creative exploration, and social support. To realize this goal, the development of automated agents proficient in skills such as negotiating, following instructions, establishing common ground, and progressing shared tasks is essential. This survey begins by reviewing the evolution of dialogue management paradigms in collaborative dialogue systems, from traditional handcrafted and information-state based methods to AI planning-inspired approaches. It then shifts focus to contemporary data-driven dialogue management techniques, which seek to transfer deep learning successes from form-filling and open-domain settings to collaborative contexts. The paper proceeds to analyze a selected set of recent works that apply neural approaches to collaborative dialogue management, spotlighting prevailing trends in the field. This survey hopes to provide foundational background for future advancements in collaborative dialogue management, particularly as the dialogue systems community continues to embrace the potential of large language models.
On the (In)Effectiveness of Large Language Models for Chinese Text Correction
results: The study finds that ChatGPT currently shows both impressive performance and unsatisfactory behavior on the two main Chinese text correction scenarios, Chinese Grammatical Error Correction and Chinese Spelling Check.
Abstract
Recently, the development and progress of Large Language Models (LLMs) have amazed the entire Artificial Intelligence community. As an outstanding representative of LLMs and the foundation model that set off this wave of research on LLMs, ChatGPT has attracted more and more researchers to study its capabilities and performance on various downstream Natural Language Processing (NLP) tasks. While marveling at ChatGPT's incredible performance on kinds of tasks, we notice that ChatGPT also has excellent multilingual processing capabilities, such as Chinese. To explore the Chinese processing ability of ChatGPT, we focus on Chinese Text Correction, a fundamental and challenging Chinese NLP task. Specifically, we evaluate ChatGPT on the Chinese Grammatical Error Correction (CGEC) and Chinese Spelling Check (CSC) tasks, which are two main Chinese Text Correction scenarios. From extensive analyses and comparisons with previous state-of-the-art fine-tuned models, we empirically find that the ChatGPT currently has both amazing performance and unsatisfactory behavior for Chinese Text Correction. We believe our findings will promote the landing and application of LLMs in the Chinese NLP community.
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning
methods: The method fine-tunes a pre-trained end-to-end model (Whisper) on demonstrations with prompt examples so that it learns domain characteristics. We further extend the approach to text-only fine-tuning to achieve both domain sensitivity and domain adaptation.
results: Evaluated across different domains and prompts, the model achieves Word Error Rate (WER) reductions of up to 33% on unseen datasets, while the text-only fine-tuned model reaches a WER reduction of up to 29% on a medical conversation dataset.
Abstract
In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning its generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability can be generalized to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, with the model reaching the most WER reduction of 29% on the medical conversation dataset.
AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models
paper_authors: Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi
for: automatic entity alignment between knowledge graphs (KGs)
methods: constructs a predicate-proximity-graph with the help of large language models, computes entity embeddings using TransE, and shifts the two KGs’ entity embeddings into the same vector space based on attribute similarity
results: improves the performance of entity alignment significantly compared to state-of-the-art methods.
Abstract
The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.
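For reference, here is a minimal sketch of the TransE objective used for the entity embeddings; the predicate-proximity-graph construction and the attribute-based embedding shift are omitted, so this illustrates one component, not AutoAlign itself.

```python
# Minimal TransE sketch: embeddings are trained so that head + relation ≈ tail for
# true triples, with a margin ranking loss against corrupted triples. AutoAlign runs
# this per KG and then aligns the two embedding spaces via attribute similarity
# (not shown here).
import torch
import torch.nn as nn

class TransE(nn.Module):
    def __init__(self, n_entities: int, n_relations: int, dim: int = 100, margin: float = 1.0):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.margin = margin

    def score(self, h, r, t):
        # lower is better: distance between (head + relation) and tail
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=1, dim=-1)

    def loss(self, pos, neg):
        (h, r, t), (hn, rn, tn) = pos, neg
        return torch.relu(self.margin + self.score(h, r, t) - self.score(hn, rn, tn)).mean()

model = TransE(n_entities=1000, n_relations=50)
pos = (torch.randint(0, 1000, (32,)), torch.randint(0, 50, (32,)), torch.randint(0, 1000, (32,)))
neg = (torch.randint(0, 1000, (32,)), torch.randint(0, 50, (32,)), torch.randint(0, 1000, (32,)))
print(model.loss(pos, neg))
```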
Mitigating Label Bias via Decoupled Confident Learning
paper_authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
for: Mitigating algorithmic bias and addressing label bias in training data.
methods: Decoupled Confident Learning (DeCoLe) pruning method to identify and remove biased labels.
results: Successfully identified biased labels and outperformed competing approaches in the context of hate speech detection.
Abstract
Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.
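The abstract does not spell out the pruning rule, so the following is only a confident-learning-style sketch of what label pruning can look like, with per-group thresholds as one assumed reading of "decoupled"; it is not the authors' algorithm.

```python
# Hedged sketch of confident-learning-style label pruning, decoupled by group: within
# each group, flag examples whose out-of-sample predicted probability for their
# observed label falls below that group's average self-confidence for that label.
# This is an assumed illustration, not the DeCoLe algorithm itself.
import numpy as np

def prune_suspect_labels(pred_probs: np.ndarray, labels: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """pred_probs: (n, k) out-of-sample class probabilities; returns a boolean mask of suspect labels."""
    suspect = np.zeros(len(labels), dtype=bool)
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        for c in np.unique(labels[idx]):
            members = idx[labels[idx] == c]
            threshold = pred_probs[members, c].mean()     # group- and class-specific confidence
            suspect[members] = pred_probs[members, c] < threshold
    return suspect

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.55, 0.45], [0.1, 0.9]])
mask = prune_suspect_labels(probs, labels=np.array([0, 0, 1, 1]), groups=np.array([0, 0, 1, 1]))
print(mask)   # examples whose given label looks inconsistent with model confidence
```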
NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning
results: Experiments on both natural language understanding (NLU) and natural language generation (NLG) tasks show that the proposed method is effective for PLM fine-tuning.
Abstract
Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)--which reveals the gradient descent dynamics of neural networks--of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.
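A rough sketch of the sub-MLP clustering idea follows: treat each hidden unit as a sub-MLP, cluster the units, and rebuild a smaller MLP from the centroids. The NTK-matching analysis and the exact reconstruction used in the paper are simplified away here.

```python
# Hedged sketch: compress a two-layer MLP by k-means clustering its hidden units (each
# unit = one column of W_in plus one row of W_out) and rebuilding a smaller MLP from
# the cluster centroids. The NTK-approximation analysis that justifies this in the
# paper is not reproduced here.
import numpy as np
from sklearn.cluster import KMeans

def fuse_mlp(w_in: np.ndarray, w_out: np.ndarray, n_centroids: int):
    """w_in: (d_model, d_hidden), w_out: (d_hidden, d_model)."""
    units = np.concatenate([w_in.T, w_out], axis=1)            # one row per hidden unit
    km = KMeans(n_clusters=n_centroids, n_init=10).fit(units)
    d_model = w_in.shape[0]
    new_in = km.cluster_centers_[:, :d_model].T                # (d_model, n_centroids)
    new_out = km.cluster_centers_[:, d_model:]                 # (n_centroids, d_model)
    # Rescale each centroid's output weights by its cluster size so the summed
    # contribution of the merged units is roughly preserved.
    counts = np.bincount(km.labels_, minlength=n_centroids)
    return new_in, new_out * counts[:, None]

w_in, w_out = np.random.randn(64, 256), np.random.randn(256, 64)
small_in, small_out = fuse_mlp(w_in, w_out, n_centroids=32)
print(small_in.shape, small_out.shape)   # (64, 32) (32, 64)
```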
Teach model to answer questions after comprehending the document
results: Experimental results show that the student model, when equipped with our method, achieves significant improvements on the MRC task, demonstrating the effectiveness of the method.
Abstract
Multi-choice Machine Reading Comprehension (MRC) is a challenging extension of Natural Language Processing (NLP) that requires the ability to comprehend the semantics and logical relationships between entities in a given text. The MRC task has traditionally been viewed as a process of answering questions based on the given text. This single-stage approach has often led the network to concentrate on generating the correct answer, potentially neglecting the comprehension of the text itself. As a result, many prevalent models have faced challenges in performing well on this task when dealing with longer texts. In this paper, we propose a two-stage knowledge distillation method that teaches the model to better comprehend the document by dividing the MRC task into two separate stages. Our experimental results show that the student model, when equipped with our method, achieves significant improvements, demonstrating the effectiveness of our method.
Large Language Models Perform Diagnostic Reasoning
for: This paper explores extending chain-of-thought (CoT) prompting to the task of automatic diagnosis.
methods: Motivated by doctors' underlying reasoning process, the paper proposes Diagnostic-Reasoning Chain-of-Thought (DR-CoT) prompting.
results: Prompting large language models trained only on a general text corpus with two DR-CoT exemplars improves diagnostic accuracy by 15% over standard prompting, with the gap reaching 18% in out-of-domain settings. These results suggest that expert-knowledge reasoning can be elicited from large language models through proper prompting.
Abstract
We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on general text corpus with two DR-CoT exemplars, the diagnostic accuracy improves by 15% comparing to standard prompting. Moreover, the gap reaches a pronounced 18% in out-domain settings. Our findings suggest expert-knowledge reasoning in large language models can be elicited through proper promptings.
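The exact DR-CoT exemplars are not given in the abstract; below is a hypothetical illustration of what a diagnostic-reasoning exemplar in such a prompt might look like, shown only to convey the structure, not the authors' wording.

```python
# Hypothetical DR-CoT-style exemplar (illustrative wording, not from the paper): the
# prompt walks through symptom gathering, differential reasoning, and a conclusion
# before the model sees the new patient description.
DR_COT_EXEMPLAR = """Patient: A 58-year-old man reports crushing chest pain radiating to the left arm, sweating, and nausea for the past hour.
Reasoning: The pain character, radiation, and autonomic symptoms suggest an acute coronary syndrome; the sudden onset and risk profile make myocardial infarction more likely than musculoskeletal pain or reflux.
Diagnosis: Acute myocardial infarction."""

def build_prompt(exemplars: list[str], new_case: str) -> str:
    return "\n\n".join(exemplars) + f"\n\nPatient: {new_case}\nReasoning:"
```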
An Integrated NPL Approach to Sentiment Analysis in Satisfaction Surveys
For: The paper aims to apply an integrated approach to natural language processing (NLP) to satisfaction surveys in order to understand and extract relevant information from survey responses, analyze feelings, and identify recurring word patterns.
Methods: The paper will use NLP techniques such as emotional polarity detection, response classification into positive, negative, or neutral categories, and opinion mining to highlight participants' opinions. The analysis of word patterns in satisfaction survey responses will also be conducted using NLP.
Results: The paper will obtain results that can be used to identify areas for improvement, understand respondents' preferences, and make strategic decisions based on analysis to improve respondent satisfaction. The results will provide a deeper understanding of feelings, opinions, and themes and trends present in respondents' responses.
Abstract
The research project aims to apply an integrated approach to natural language processing (NLP) to satisfaction surveys. It will focus on understanding and extracting relevant information from survey responses, analyzing feelings, and identifying recurring word patterns. NLP techniques will be used to determine emotional polarity, classify responses into positive, negative, or neutral categories, and use opinion mining to highlight participants' opinions. This approach will help identify the most relevant aspects for participants and understand their opinions in relation to those specific aspects. A key component of the research project will be the analysis of word patterns in satisfaction survey responses using NLP. This analysis will provide a deeper understanding of feelings, opinions, and themes and trends present in respondents' responses. The results obtained from this approach can be used to identify areas for improvement, understand respondents' preferences, and make strategic decisions based on analysis to improve respondent satisfaction.
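As a concrete starting point for the polarity-classification step described above, here is a short sketch using an off-the-shelf sentiment pipeline; the default model and the neutral cutoff are assumptions for illustration, not the project's final design.

```python
# Sketch of the polarity step: classify each free-text survey response as positive or
# negative (with a confidence score) using an off-the-shelf pipeline. The default
# model and the neutral cutoff below are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def classify_responses(responses, neutral_cutoff=0.6):
    results = []
    for text, pred in zip(responses, classifier(responses)):
        label = pred["label"] if pred["score"] >= neutral_cutoff else "NEUTRAL"
        results.append({"response": text, "polarity": label, "score": pred["score"]})
    return results

print(classify_responses(["The support team solved my issue quickly.",
                          "The billing process was confusing."]))
```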
Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge
results: The study thoroughly evaluates the performance of various models on these tasks, highlights the significant findings with reproducible evaluation metrics, and discusses both future opportunities and the remaining challenges of this approach.
Abstract
Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM
AlpaGasus: Training A Better Alpaca with Fewer Data
paper_authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
for: This paper proposes a simple and effective data selection strategy to improve the performance of large language models (LLMs) during instruction-finetuning (IFT).
methods: The strategy uses a strong language model (e.g., ChatGPT) to automatically identify and remove low-quality data.
results: The strategy improves the instruction-following capability of LLMs while substantially reducing training time. AlpaGasus outperforms the original Alpaca on multiple test sets and approaches the performance of its teacher model (Text-Davinci-003).
Abstract
Large language models~(LLMs) obtain instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and removes low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes \footnote{We apply IFT for the same number of epochs as Alpaca(7B) but on fewer data, using 4$\times$NVIDIA A100 (80GB) GPUs and following the original Alpaca setting and hyperparameters.}. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: \url{https://lichang-chen.github.io/AlpaGasus/}.
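A hedged sketch of the data-selection loop: a strong judge LLM scores each (instruction, response) pair and only high-scoring pairs are kept for IFT. The judge prompt, scoring scale, threshold, and `query_judge_llm` below are placeholders, not the paper's exact setup or a real API call.

```python
# Hedged sketch of LLM-based data filtering for instruction tuning. The judge prompt,
# the 1-5 scale, the 4.5 cutoff, and `query_judge_llm` are placeholders.
JUDGE_PROMPT = (
    "Rate the quality of the following response to the instruction on a scale "
    "from 1 to 5, considering accuracy and helpfulness. Reply with a number only.\n"
    "Instruction: {instruction}\nResponse: {response}\nScore:"
)

def query_judge_llm(prompt: str) -> str:
    """Placeholder for a call to a strong judge LLM such as ChatGPT."""
    raise NotImplementedError

def filter_ift_data(pairs, threshold=4.5):
    kept = []
    for instruction, response in pairs:
        raw = query_judge_llm(JUDGE_PROMPT.format(instruction=instruction, response=response))
        try:
            score = float(raw.strip())
        except ValueError:
            continue                      # skip unparsable judgments
        if score >= threshold:
            kept.append((instruction, response))
    return kept
```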
Multilingual Speech-to-Speech Translation into Multiple Target Languages
results: On benchmark translation test sets, the proposed multilingual model outperforms bilingual models when translating from English into 16 target languages.
Abstract
Speech-to-speech translation (S2ST) enables spoken communication between people talking in different languages. Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language. We present the first work on multilingual S2ST supporting multiple target languages. Leveraging recent advance in direct S2ST with speech-to-unit and vocoder, we equip these key components with multilingual capability. Speech-to-masked-unit (S2MU) is the multilingual extension of S2U, which applies masking to units which don't belong to the given target language to reduce the language interference. We also propose multilingual vocoder which is trained with language embedding and the auxiliary loss of language identification. On benchmark translation testsets, our proposed multilingual model shows superior performance than bilingual models in the translation from English into $16$ target languages.
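As a small illustration of the masking idea in S2MU (assumed mechanics; the paper applies masking when training the speech-to-unit model), units outside the target language's inventory can simply be excluded from consideration.

```python
# Illustrative sketch of masking discrete units outside the target language's
# inventory before choosing the next unit, to reduce cross-language interference.
# The unit inventories and the point where masking is applied are assumptions.
import torch

def mask_units(logits: torch.Tensor, allowed_units: torch.Tensor) -> torch.Tensor:
    """logits: (vocab,) scores over all discrete speech units;
    allowed_units: indices of units belonging to the target language."""
    masked = torch.full_like(logits, float("-inf"))
    masked[allowed_units] = logits[allowed_units]
    return masked

logits = torch.randn(1000)                          # scores over a shared unit vocabulary
target_units = torch.arange(200, 400)               # hypothetical inventory for one language
next_unit = mask_units(logits, target_units).argmax()
print(int(next_unit))                               # always falls inside the allowed range
```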
Retentive Network: A Successor to Transformer for Large Language Models
results: Experimental results on language modeling show that RetNet achieves favorable scaling and enables parallel training, low-cost deployment, and efficient inference.
Abstract
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet.
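To make the equivalence between the parallel and recurrent computation paradigms concrete, here is a small sketch of the core retention operation for a single head; projections, gating, normalization, and the chunkwise form are omitted.

```python
# Sketch of retention for one head: the parallel form uses a decay-masked
# attention-like matrix, while the recurrent form keeps an O(1) state
# S_n = gamma * S_{n-1} + k_n^T v_n and emits o_n = q_n S_n.
# Both produce the same outputs.
import torch

def retention_parallel(q, k, v, gamma=0.9):
    n = q.shape[0]
    idx = torch.arange(n)
    decay = gamma ** (idx[:, None] - idx[None, :]).clamp(min=0).float()
    decay = torch.tril(decay)                 # D[n, m] = gamma^(n-m) for m <= n, else 0
    return (q @ k.T * decay) @ v

def retention_recurrent(q, k, v, gamma=0.9):
    d = q.shape[-1]
    state = torch.zeros(d, d)
    outs = []
    for qn, kn, vn in zip(q, k, v):
        state = gamma * state + kn[:, None] @ vn[None, :]   # constant-size recurrent state
        outs.append(qn @ state)
    return torch.stack(outs)

q, k, v = (torch.randn(6, 4) for _ in range(3))
print(torch.allclose(retention_parallel(q, k, v), retention_recurrent(q, k, v), atol=1e-5))  # True
```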
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
results: The proposed MDSM outperforms the baseline method by a large margin of +10.13 mean IoU, demonstrating its effectiveness.
Abstract
In this study, we aim to develop a model that comprehends a natural language instruction (e.g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object. The task is challenging because it requires (1) the understanding of the referring expressions for multiple objects in the instruction, (2) the prediction of the target phrase of the sentence among the multiple phrases, and (3) the generation of pixel-wise segmentation masks rather than bounding boxes. Studies have been conducted on language-based segmentation methods; however, they sometimes mask irrelevant regions for complex sentences. In this paper, we propose the Multimodal Diffusion Segmentation Model (MDSM), which generates a mask in the first stage and refines it in the second stage. We introduce a crossmodal parallel feature extraction mechanism and extend diffusion probabilistic models to handle crossmodal features. To validate our model, we built a new dataset based on the well-known Matterport3D and REVERIE datasets. This dataset consists of instructions with complex referring expressions accompanied by real indoor environmental images that feature various target objects, in addition to pixel-wise segmentation masks. The performance of MDSM surpassed that of the baseline method by a large margin of +10.13 mean IoU.