results: The study finds that generated texts tend to be more chaotic, while literary works are more complex. Moreover, clustering human texts yields messier, fuzzier clusters, whereas clustering bot-generated texts yields more compact and well-separated clusters.
Abstract
With the development of generative models like GPT-3, it is increasingly challenging to differentiate generated texts from human-written ones. A large number of studies have demonstrated good results in bot identification. However, the majority of such works depend on supervised learning methods that require labelled data and/or prior knowledge about the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings from semantic analysis by clustering (crisp and fuzzy) with information techniques, we construct a robust model that detects generated text across different types of bots. We find that generated texts tend to be more chaotic, while literary works are more complex. We also demonstrate that clustering human texts results in fuzzier clusters than the more compact and well-separated clusters of bot-generated texts.
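To make the clustering comparison concrete, here is a minimal sketch, under stated assumptions, of how cluster fuzziness could be compared between two corpora: plain fuzzy c-means over sentence embeddings, with the fuzzy partition coefficient (FPC) as a crispness score. The five-cluster setting and the randomly generated embeddings are placeholders for real sentence embeddings; this is an illustration of the general idea, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): fuzzy c-means over sentence
# embeddings, with the fuzzy partition coefficient (FPC) as a crispness score.
# Higher FPC suggests compact, well-separated clusters (as reported for
# bot-generated text); lower FPC suggests fuzzier clusters (human text).
import numpy as np

def fuzzy_cmeans(X, n_clusters=5, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)              # memberships sum to 1 per sample
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)
    return centers, u

def partition_coefficient(u):
    """Ranges from 1/n_clusters (totally fuzzy) to 1.0 (crisp)."""
    return float((u ** 2).sum() / len(u))

# Placeholder data standing in for real sentence embeddings of a human corpus
# (diffuse, little structure) and a bot corpus (a few tight clusters).
rng = np.random.default_rng(42)
human_emb = rng.standard_normal((200, 64))
bot_emb = rng.standard_normal((200, 64)) * 0.3 + np.repeat(np.eye(5, 64) * 4.0, 40, axis=0)

for name, emb in [("human-like", human_emb), ("bot-like", bot_emb)]:
    _, u = fuzzy_cmeans(emb, n_clusters=5)
    print(name, "FPC =", round(partition_coefficient(u), 3))
```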
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
results: Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
Abstract
Spoken language understanding (SLU) is a fundamental task in task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair understanding performance and lead to error propagation. Although there are some attempts to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally, without discrimination, in fine-tuning; (2) neglect the fact that semantically similar pairs are still pushed away when applying contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL) vanishing. In this paper, we propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness in SLU. Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models. We also introduce a distance polarization regularizer to avoid pushing away intra-cluster pairs as much as possible. Moreover, we use a cyclical annealing schedule to mitigate the KL vanishing issue. Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
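Among the three remedies named above, the cyclical annealing schedule is a well-established technique for KL vanishing; a minimal sketch of one such schedule follows. The cycle count and ramp fraction are illustrative assumptions, not values taken from the paper.

```python
# Sketch (assumed hyperparameters, not the authors' code): a cyclical annealing
# schedule for the KL weight. Within each cycle the weight ramps linearly from
# 0 to 1 over the first half, then stays at 1 until the next cycle restarts.
def cyclical_kl_weight(step: int, total_steps: int,
                       n_cycles: int = 4, ramp_fraction: float = 0.5) -> float:
    cycle_len = max(1, total_steps // n_cycles)
    pos = (step % cycle_len) / cycle_len   # position within the current cycle, in [0, 1)
    return min(pos / ramp_fraction, 1.0)

# Usage inside a training loop (hypothetical loss names):
#   loss = task_loss + cyclical_kl_weight(step, total_steps) * kl_loss
```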
CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies
paper_authors: Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan
for: Efficient annotation of complex hierarchical cluster structures over texts
methods: A recursive, incremental construction approach that builds clusters and the hierarchy simultaneously
results: Cluster hierarchies can be built quickly, while their reliability and comparability are preserved
Abstract
Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open-source tool for incrementally constructing both clusters and the hierarchy simultaneously over any type of text. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster-hierarchy annotations and resolve disagreements.
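As a rough illustration of the two invariants the abstract mentions, transitivity at the cluster level and at the hierarchy level, the sketch below keeps cluster membership in a union-find structure and the hierarchy as parent links between cluster roots. This is an assumed toy representation, not CHAMP's actual implementation, and the entity names are invented for the example.

```python
# Toy sketch (assumed representation, not CHAMP itself): union-find keeps
# cluster membership transitive; parent links between cluster roots keep the
# hierarchy transitive. Simplified: merges are assumed to happen before the
# corresponding attach calls.
class ClusterHierarchy:
    def __init__(self):
        self.parent_item = {}      # union-find parent per item (cluster membership)
        self.parent_cluster = {}   # hierarchy: child cluster root -> parent cluster root

    def _find(self, x):
        self.parent_item.setdefault(x, x)
        while self.parent_item[x] != x:
            self.parent_item[x] = self.parent_item[self.parent_item[x]]  # path halving
            x = self.parent_item[x]
        return x

    def merge(self, a, b):
        """Assert that a and b belong to the same cluster (transitive by construction)."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent_item[rb] = ra

    def attach(self, child_member, parent_member):
        """Place child_member's cluster under parent_member's cluster in the hierarchy."""
        self.parent_cluster[self._find(child_member)] = self._find(parent_member)

    def ancestors(self, item):
        """All clusters above item's cluster (transitive closure along the hierarchy)."""
        node, chain = self._find(item), []
        while node in self.parent_cluster:
            node = self.parent_cluster[node]
            chain.append(node)
        return chain

# Invented example: two mentions merged into one cluster, nested under a broader one.
ch = ClusterHierarchy()
ch.merge("NYC", "New York City")
ch.attach("NYC", "US cities")
print(ch.ancestors("New York City"))   # ['US cities']
```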
A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation
results: Compared with baseline models, improves automatic metrics and human evaluation scores by approximately 5% and 10%, respectively.
Abstract
Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature-capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus, which allows EtriCA to adapt to a wider range of data samples. This results in approximately 5% improvement in automatic metrics and over 10% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing SOTA baselines. These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.
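The core mechanism described above, cross-attention that maps context features onto event-sequence states followed by a residual connection, might look roughly like the PyTorch sketch below. The layer dimensions and the use of nn.MultiheadAttention are assumptions made for illustration; this is not the released EtriCA code.

```python
# Sketch of the assumed mechanism (not the released EtriCA code): event-sequence
# states attend to context features via cross-attention, and the result is fused
# back through a residual connection with layer normalization.
import torch
import torch.nn as nn

class ContextEventCrossAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, event_states, context_features):
        # event_states:     (batch, n_events, d_model) -- queries
        # context_features: (batch, n_ctx, d_model)    -- keys and values
        attended, _ = self.attn(event_states, context_features, context_features)
        return self.norm(event_states + attended)      # residual mapping

# Example with arbitrary shapes
layer = ContextEventCrossAttention()
events, context = torch.randn(2, 6, 512), torch.randn(2, 20, 512)
print(layer(events, context).shape)    # torch.Size([2, 6, 512])
```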
Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters
methods: Builds a human-annotated Visual Chinese Character Checking dataset and proposes new baseline methods evaluated on it.
results: Experiments and analyses show that Visual-C$^3$ is a high-quality yet challenging dataset, and the new baseline methods achieve notable performance on it.
Abstract
Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial for detecting and correcting wrong characters. In the real world, where handwriting accounts for the vast majority of written text, the characters humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies only focus on misspelled characters, mainly caused by phonological or visual confusion, thereby ignoring faked characters, which are more common and difficult. To break through this dilemma, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual dataset and the largest human-crafted dataset for the Chinese character checking scenario. Additionally, we propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.
Rethinking Large Language Models in Mental Health Applications
results: The paper argues that while large language models show promise in mental health applications, human counselors' empathy, nuanced interpretation, and contextual awareness remain irreplaceable; LLMs should be viewed as tools that complement human experts rather than as replacements.
Abstract
Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluations to maintain their reliability and dependability. The paper also distinguishes between the often interchangeable terms "explainability" and "interpretability", advocating for developing inherently interpretable methods instead of relying on potentially hallucinated self-explanations generated by LLMs. Despite the advancements in LLMs, human counselors' empathetic understanding, nuanced interpretation, and contextual awareness remain irreplaceable in the sensitive and complex realm of mental health counseling. The use of LLMs should be approached with a judicious and considerate mindset, viewing them as tools that complement human expertise rather than seeking to replace it.
Causal ATE Mitigates Unintended Bias in Controlled Text Generation
for: Studies the attribute-control problem in language models through the Causal Average Treatment Effect (Causal ATE) method.
methods: Applies the Causal ATE method to attribute control in language models and provides a theoretical foundation, with a proof that the method reduces false positives.
results: Shows that the Causal ATE method addresses spurious correlations in language models and reduces unintended bias against protected groups.
Abstract
We study attribute control in language models through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in Language Models (LMs) check for the co-occurrence of words in a sentence with the attribute of interest and control for them. However, spurious correlation of the words with the attribute in the training dataset can cause models to hallucinate the presence of the attribute when presented with the spurious correlate during inference. We show that the simple perturbation-based method of Causal ATE removes this unintended effect. Additionally, we offer a theoretical foundation for investigating Causal ATE in the classification task and prove that it reduces the number of false positives, thereby mitigating the issue of unintended bias. Specifically, we ground it in the problem of toxicity mitigation, where a significant challenge lies in the inadvertent bias that often emerges towards protected groups after detoxification. We show that this unintended bias can be solved by the use of the Causal ATE metric.
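A minimal sketch of the perturbation-based idea described above: estimate a word's treatment effect on the attribute score by swapping it for a neutral token and averaging the change over sentences containing it. The toxicity_score function and the masking token are hypothetical stand-ins, and this is not the paper's released implementation.

```python
# Sketch of a perturbation-based Causal ATE estimate (an assumed reading of the
# abstract, not the paper's code). `toxicity_score` is a hypothetical stand-in
# for any classifier returning P(attribute | text).
from statistics import mean

def causal_ate(word, sentences, toxicity_score, neutral_token="[MASK]"):
    """Average change in the attribute score when `word` is replaced by a neutral token."""
    effects = []
    for s in sentences:
        tokens = s.split()
        if word not in tokens:
            continue
        counterfactual = " ".join(neutral_token if t == word else t for t in tokens)
        effects.append(toxicity_score(s) - toxicity_score(counterfactual))
    return mean(effects) if effects else 0.0

# A word that co-occurs with toxicity but has near-zero ATE is likely a spurious
# correlate (for example, a protected-group mention) and need not be penalized.
```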