cs.CL - 2023-09-10

The Effect of Alignment Objectives on Code-Switching Translation

  • paper_url: http://arxiv.org/abs/2309.05044
  • repo_url: None
  • paper_authors: Mohamed Anwar
  • for: Aims to improve machine translation models' ability to translate code-switching content, especially with the rise of social media and user-generated content.
  • methods: Proposes training a single machine translation model that translates monolingual sentences from one language to the other and also translates code-switched sentences into either language; the model can be viewed as bilingual in the human sense. To make better use of parallel data, synthetic code-switched (CSW) data is generated and an alignment loss is added on the encoder to align representations across languages (a sketch of such a loss follows the abstract below).
  • results: On the WMT14 English-French (En-Fr) dataset, the trained model strongly outperforms bidirectional baselines on code-switched translation while maintaining quality on non-code-switched (monolingual) data.
    Abstract One of the things that need to change when it comes to machine translation is the models' ability to translate code-switching content, especially with the rise of social media and user-generated content. In this paper, we are proposing a way of training a single machine translation model that is able to translate monolingual sentences from one language to another, along with translating code-switched sentences to either language. This model can be considered a bilingual model in the human sense. For better use of parallel data, we generated synthetic code-switched (CSW) data along with an alignment loss on the encoder to align representations across languages. Using the WMT14 English-French (En-Fr) dataset, the trained model strongly outperforms bidirectional baselines on code-switched translation while maintaining quality for non-code-switched (monolingual) data.
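The paper adds an alignment loss on the encoder so that representations of parallel (and synthetic code-switched) inputs line up across languages. Below is a minimal sketch of one way such a loss could be combined with the usual translation objective, assuming mean-pooled encoder states and a cosine-distance penalty; the pooling choice, distance function, and weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def alignment_loss(enc_src, enc_csw, mask_src, mask_csw):
    """Cosine distance between mean-pooled encoder states of two parallel inputs.

    enc_*:  (batch, seq_len, hidden) encoder outputs
    mask_*: (batch, seq_len) 1.0 for real tokens, 0.0 for padding
    """
    pooled_src = (enc_src * mask_src.unsqueeze(-1)).sum(1) / mask_src.sum(1, keepdim=True)
    pooled_csw = (enc_csw * mask_csw.unsqueeze(-1)).sum(1) / mask_csw.sum(1, keepdim=True)
    return (1.0 - F.cosine_similarity(pooled_src, pooled_csw, dim=-1)).mean()

def total_loss(logits, target_ids, enc_src, enc_csw, mask_src, mask_csw,
               alpha=0.1, pad_id=0):
    """Standard translation cross-entropy plus a weighted encoder alignment term."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1),
                         ignore_index=pad_id)
    return ce + alpha * alignment_loss(enc_src, enc_csw, mask_src, mask_csw)
```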

Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps

  • paper_url: http://arxiv.org/abs/2309.05021
  • repo_url: None
  • paper_authors: Yaonai Wei, Tuo Zhang, Han Zhang, Tianyang Zhong, Lin Zhao, Zhengliang Liu, Chong Ma, Songyao Zhang, Muheng Shang, Lei Du, Xiao Li, Tianming Liu, Junwei Han
  • for: Aims to improve the accuracy of text queries in meta-analysis, using large language models (LLMs) to resolve existing issues such as semantic redundancy and ambiguity.
  • methods: Proposes Chat2Brain, which couples LLMs with the basic text-to-brain-map model Text2Brain to map open-ended semantic queries to brain activation maps (see the pipeline sketch after the abstract below).
  • results: Chat2Brain synthesizes biologically plausible brain activation maps from text queries and outperforms the Text2Brain model in data-scarce and complex query settings.
    Abstract Over decades, neuroscience has accumulated a wealth of research results in the text modality that can be used to explore cognitive processes. Meta-analysis is a typical method that successfully establishes a link from text queries to brain activation maps using these research results, but it still relies on an ideal query environment. In practical applications, text queries used for meta-analyses may encounter issues such as semantic redundancy and ambiguity, resulting in an inaccurate mapping to brain images. On the other hand, large language models (LLMs) like ChatGPT have shown great potential in tasks such as context understanding and reasoning, displaying a high degree of consistency with human natural language. Hence, LLMs could improve the connection between the text modality and neuroscience, resolving existing challenges of meta-analyses. In this study, we propose a method called Chat2Brain that combines LLMs with the basic text-to-image model Text2Brain to map open-ended semantic queries to brain activation maps in data-scarce and complex query environments. By utilizing the understanding and reasoning capabilities of LLMs, the performance of the mapping model is optimized by transferring text queries to semantic queries. We demonstrate that Chat2Brain can synthesize anatomically plausible neural activation patterns for more complex text queries.
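At a high level, Chat2Brain uses an LLM to turn a noisy, open-ended query into a cleaner semantic query before handing it to Text2Brain. The sketch below only illustrates that two-step flow; `call_llm` and `text2brain_predict` are hypothetical injected callables standing in for an LLM API and the pretrained Text2Brain model, not the paper's released code.

```python
from typing import Callable

def chat2brain(open_ended_query: str,
               call_llm: Callable[[str], str],
               text2brain_predict: Callable[[str], object]):
    """Map an open-ended query to a brain activation map via an LLM rewrite.

    call_llm:           any function that sends a prompt to an LLM and returns text
    text2brain_predict: any function mapping a semantic query to an activation map
    Both are injected placeholders; the paper's actual interfaces are not public.
    """
    prompt = ("Rewrite the following question as a short, unambiguous neuroscience "
              "query suitable for a meta-analysis tool:\n" + open_ended_query)
    semantic_query = call_llm(prompt)                     # LLM resolves redundancy/ambiguity
    activation_map = text2brain_predict(semantic_query)   # Text2Brain-style mapping
    return semantic_query, activation_map
```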

Machine Translation Models Stand Strong in the Face of Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2309.06527
  • repo_url: None
  • paper_authors: Pavel Burnyshev, Elizaveta Kostenok, Alexey Zaytsev
  • for: Investigates the vulnerability of deep learning models to adversarial attacks, specifically attacks on sequence-to-sequence (seq2seq) machine translation models.
  • methods: Introduces attacks based on basic text perturbation heuristics and more advanced strategies, such as a gradient-based attack that uses a differentiable approximation of the inherently non-differentiable translation metric (a character-mixing perturbation is sketched after the abstract below).
  • results: Machine translation models display robustness against the best-performing known attacks, with the degree of output perturbation directly proportional to the input perturbation. Among the weaker attacks, the proposed attacks give the best relative performance; another strong candidate is an attack based on mixing individual characters.
    Abstract Adversarial attacks expose vulnerabilities of deep learning models by introducing minor perturbations to the input, which lead to substantial alterations in the output. Our research focuses on the impact of such adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models. We introduce algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as the gradient-based attack, which utilizes a differentiable approximation of the inherently non-differentiable translation metric. Through our investigation, we provide evidence that machine translation models display robustness against the best-performing known adversarial attacks, as the degree of perturbation in the output is directly proportional to the perturbation in the input. However, among the weaker attacks, ours outperform the alternatives, providing the best relative performance. Another strong candidate is an attack based on mixing of individual characters.
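One of the simpler attacks discussed is based on mixing individual characters of the input. Below is a minimal sketch of such a character-level perturbation and of measuring how much the translation degrades; the adjacent-swap rule, the swap rate, and the use of sacrebleu as the metric are illustrative assumptions rather than the paper's exact procedure.

```python
import random

def mix_characters(sentence: str, swap_rate: float = 0.1, seed: int = 0) -> str:
    """Randomly swap adjacent alphabetic characters at the given rate."""
    rng = random.Random(seed)
    chars = list(sentence)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < swap_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def attack_effect(translate, source: str, reference: str, swap_rate: float = 0.1):
    """Compare translation quality before and after the character-mixing attack.

    `translate` is any callable mapping a source sentence to a translation string.
    Requires `pip install sacrebleu`; any sentence-level metric would do.
    """
    from sacrebleu import sentence_bleu
    clean_bleu = sentence_bleu(translate(source), [reference]).score
    attacked_bleu = sentence_bleu(translate(mix_characters(source, swap_rate)), [reference]).score
    return clean_bleu, attacked_bleu
```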

Mitigating Word Bias in Zero-shot Prompt-based Classifiers

  • paper_url: http://arxiv.org/abs/2309.04992
  • repo_url: None
  • paper_authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales
  • for: Improves the performance of prompt-based classifiers by mitigating word biases.
  • methods: Reweights the classes' predicted probabilities in an unsupervised fashion so that the prior over classes is uniform, and draws a theoretical connection to the language model's word prior (see the sketch after the abstract below).
  • results: Achieves large, consistent performance gains across prompt settings, correlates strongly with the oracle upper-bound performance, and allows thresholds to be set in a zero-resource fashion.
    Abstract Prompt-based classifiers are an attractive approach for zero-shot classification. However, the precise choice of the prompt template and label words can largely influence performance, with semantically equivalent settings often showing notable performance difference. This discrepancy can be partly attributed to word biases, where the classifier may be biased towards classes. To address this problem, it is possible to optimise classification thresholds on a labelled data set, however, this mitigates some of the advantages of prompt-based classifiers. This paper instead approaches this problem by examining the expected marginal probabilities of the classes. Here, probabilities are reweighted to have a uniform prior over classes, in an unsupervised fashion. Further, we draw a theoretical connection between the class priors and the language models' word prior, and offer the ability to set a threshold in a zero-resource fashion. We show that matching class priors correlates strongly with the oracle upper bound performance and demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
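The core idea is to reweight a prompt-based classifier's probabilities so that the expected marginal over classes is uniform, estimated without labels. Below is a minimal numpy sketch of that reweighting over a batch of unlabeled examples; the exact estimator and the zero-resource thresholding in the paper may differ from this simple divide-and-renormalise version.

```python
import numpy as np

def reweight_to_uniform_prior(probs: np.ndarray) -> np.ndarray:
    """Debias zero-shot class probabilities toward a uniform class prior.

    probs: (num_examples, num_classes) class probabilities from the prompt
           classifier on unlabeled data.
    Returns reweighted probabilities whose expected marginal over the batch
    is approximately uniform.
    """
    marginal = probs.mean(axis=0)                         # estimated class prior
    reweighted = probs / marginal                         # divide out the prior
    reweighted /= reweighted.sum(axis=1, keepdims=True)   # renormalise per example
    return reweighted

# toy usage: a classifier heavily biased toward class 0
probs = np.array([[0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.9, 0.1]])
print(reweight_to_uniform_prior(probs).argmax(axis=1))    # some predictions flip to class 1
```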

Retrieval-Augmented Meta Learning for Low-Resource Text Classification

  • paper_url: http://arxiv.org/abs/2309.04979
  • repo_url: https://github.com/Carolmelon/RAML
  • paper_authors: Rongsheng Li, Yangning Li, Yinghui Li, Chaiyut Luoyiching, Hai-Tao Zheng, Nannan Zhou, Hanjing Su
  • for: Improves low-resource text classification, where knowledge is transferred from source classes to identify target classes.
  • methods: Uses a parameterized neural network for inference while also retrieving non-parametric knowledge from an external corpus, integrated through a multi-view passages fusion network (a retrieval-augmented classification sketch follows the abstract below).
  • results: Significantly outperforms current state-of-the-art models on low-resource text classification.
    Abstract Meta learning has achieved promising performance in low-resource text classification, which aims to identify target classes with knowledge transferred from source classes using sets of small tasks named episodes. However, due to the limited training data in the meta-learning scenario and the inherent properties of parameterized neural networks, poor generalization performance has become a pressing problem that needs to be addressed. To deal with this issue, we propose a meta-learning based method called Retrieval-Augmented Meta Learning (RAML). It not only uses parameterization for inference but also retrieves non-parametric knowledge from an external corpus to make inferences, which greatly alleviates the problem of poor generalization performance caused by the lack of diverse training data in meta-learning. This method differs from previous models that solely rely on parameters, as it explicitly emphasizes the importance of non-parametric knowledge, aiming to strike a balance between parameterized neural networks and non-parametric knowledge. The model is required to determine which knowledge to access and utilize during inference. Additionally, our multi-view passages fusion network module can effectively and efficiently integrate the retrieved information into the low-resource classification task. Extensive experiments demonstrate that RAML significantly outperforms current SOTA low-resource text classification models.
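RAML augments the parametric classifier with passages retrieved from an external corpus and fuses them before predicting. The sketch below shows one simple way such retrieval and fusion could be wired together, assuming a generic text encoder, dot-product retrieval, and mean fusion; the paper's multi-view passages fusion network is more elaborate, and every name here is a placeholder rather than the released RAML code.

```python
import torch
import torch.nn as nn

class RetrievalAugmentedClassifier(nn.Module):
    """Toy retrieval-augmented classifier: encode query, retrieve passages,
    fuse their representations, then classify."""

    def __init__(self, encoder, corpus_embeddings, num_classes, hidden=768, top_k=3):
        super().__init__()
        self.encoder = encoder                      # any text -> (hidden,) encoder
        self.corpus_embeddings = corpus_embeddings  # (num_passages, hidden), precomputed
        self.top_k = top_k
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def retrieve(self, query_vec):
        scores = self.corpus_embeddings @ query_vec           # dot-product retrieval
        top = torch.topk(scores, self.top_k).indices
        return self.corpus_embeddings[top]                     # (top_k, hidden)

    def forward(self, query_text):
        q = self.encoder(query_text)                           # (hidden,)
        passages = self.retrieve(q)
        fused = passages.mean(dim=0)                           # stand-in for multi-view fusion
        return self.classifier(torch.cat([q, fused], dim=-1))  # (num_classes,) logits
```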

Prompt Learning With Knowledge Memorizing Prototypes For Generalized Few-Shot Intent Detection

  • paper_url: http://arxiv.org/abs/2309.04971
  • repo_url: None
  • paper_authors: Chaiyut Luoyiching, Yangning Li, Yinghui Li, Rongsheng Li, Hai-Tao Zheng, Nannan Zhou, Hanjing Su
  • for: solves the challenging problem of generalized few-shot intent detection (GFSID) by converting the task into the class incremental learning paradigm.
  • methods: proposes a two-stage learning framework that sequentially learns the knowledge of different intents in various periods via prompt learning, and uses prototypes to categorize both seen and novel intents (a prototype-classification sketch follows the abstract below).
  • results: achieves promising performance on two widely used datasets through extensive experiments and detailed analyses.
    Abstract Generalized Few-Shot Intent Detection (GFSID) is challenging and realistic because it needs to categorize both seen and novel intents simultaneously. Previous GFSID methods rely on the episodic learning paradigm, which makes it hard to extend to a generalized setup as they do not explicitly learn the classification of seen categories and the knowledge of seen intents. To address the dilemma, we propose to convert the GFSID task into the class incremental learning paradigm. Specifically, we propose a two-stage learning framework, which sequentially learns the knowledge of different intents in various periods via prompt learning. We then exploit prototypes for categorizing both seen and novel intents. Furthermore, to transfer knowledge of intents across stages, we design two knowledge preservation methods for different scenarios that are close to realistic applications. Extensive experiments and detailed analyses on two widely used datasets show that our framework based on the class incremental learning paradigm achieves promising performance.
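The framework classifies both seen and novel intents with prototypes. Below is a minimal sketch of the standard prototype step (class prototype = mean of support embeddings; classify a query by its nearest prototype); the prompt-learning stages and knowledge-preservation methods from the paper are not shown, and the Euclidean metric is an assumption.

```python
import torch

def build_prototypes(support_embeddings: torch.Tensor, support_labels: torch.Tensor):
    """Average the support embeddings of each intent class into a prototype.

    support_embeddings: (num_support, hidden)
    support_labels:     (num_support,) integer intent ids
    Returns (num_classes, hidden) prototypes, ordered by class id.
    """
    classes = support_labels.unique(sorted=True)
    return torch.stack([support_embeddings[support_labels == c].mean(dim=0) for c in classes])

def classify_by_prototype(query_embeddings: torch.Tensor, prototypes: torch.Tensor):
    """Assign each query to the class whose prototype is nearest (Euclidean distance)."""
    distances = torch.cdist(query_embeddings, prototypes)   # (num_queries, num_classes)
    return distances.argmin(dim=1)
```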

What’s Hard in English RST Parsing? Predictive Models for Error Analysis

  • paper_url: http://arxiv.org/abs/2309.04940
  • repo_url: None
  • paper_authors: Yang Janet Liu, Tatsuya Aoyama, Amir Zeldes
  • for: Examines why hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging and models the factors behind these difficulties.
  • methods: Models factors associated with parsing difficulty identified in previous work, including implicit discourse relations, long-distance relations, and out-of-vocabulary items, and releases two annotated English test sets with explicit correct and distracting discourse markers tied to gold-standard RST relations (an error-prediction sketch follows the abstract below).
  • results: As in shallow discourse parsing, the explicit/implicit distinction plays a role, but long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. The final model predicts where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.
    Abstract Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test-sets with explicit correct and distracting discourse markers associated with gold standard RST relations. Our results show that, as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.
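The final model predicts where an RST parser will make errors from properties of each relation instance. The sketch below illustrates that kind of predictive model with a logistic regression over three hand-picked features (distance between units, presence of an explicit marker, lexical overlap); these particular features, the toy data, and the classifier choice are illustrative assumptions, not the paper's feature set or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical features per relation instance:
#   [distance between units in EDUs, explicit marker present (0/1), lexical overlap ratio]
X = np.array([
    [1, 1, 0.30],
    [2, 1, 0.10],
    [8, 0, 0.05],
    [12, 0, 0.00],
    [3, 0, 0.20],
    [15, 1, 0.02],
])
# 1 = parser made an error on this relation, 0 = parsed correctly (toy labels)
y = np.array([0, 0, 1, 1, 0, 1])

error_model = LogisticRegression().fit(X, y)
print(error_model.predict_proba([[10, 0, 0.01]])[:, 1])  # predicted error probability
```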

Unsupervised Chunking with Hierarchical RNN

  • paper_url: http://arxiv.org/abs/2309.04919
  • repo_url: https://github.com/manga-uofa/uchrnn
  • paper_authors: Zijun Wu, Anup Anand Deshmukh, Yongkang Wu, Jimmy Lin, Lili Mou
  • for: Introduces an unsupervised approach to chunking, using a recurrent neural network (RNN) to induce syntactic groupings of words without manual annotation.
  • methods: Uses a two-layer Hierarchical Recurrent Neural Network (HRNN) to model word-to-chunk and chunk-to-sentence composition, with a two-stage training process: pretraining with an unsupervised parser, then finetuning on downstream NLP tasks (a minimal HRNN sketch follows the abstract below).
  • results: On the CoNLL-2000 dataset, the approach improves phrase F1 by up to 6 percentage points over existing unsupervised methods, and finetuning on downstream tasks yields a further gain. Interestingly, the emergence of the chunking structure is transient during the neural model's downstream-task training. The study advances unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.
    Abstract In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points. Further, finetuning with downstream tasks results in an additional performance improvement. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model's downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.
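The model is a two-layer hierarchical RNN: a word-level RNN proposes chunk boundaries and words are pooled into chunk vectors, then a chunk-level RNN composes chunks into a sentence representation. Below is a minimal PyTorch sketch of that structure with hard 0/1 boundary decisions and mean pooling for readability; the paper's actual boundary mechanism, pretraining with an unsupervised parser, and training objectives are not reproduced here.

```python
import torch
import torch.nn as nn

class TinyHRNN(nn.Module):
    """Minimal word-to-chunk / chunk-to-sentence hierarchical RNN."""

    def __init__(self, vocab_size, emb=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.word_rnn = nn.GRU(emb, hidden, batch_first=True)
        self.boundary = nn.Linear(hidden, 1)           # chunk-boundary score per word
        self.chunk_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, token_ids):                      # token_ids: (1, seq_len)
        word_states, _ = self.word_rnn(self.embed(token_ids))   # (1, seq_len, hidden)
        # hard boundary decision per position (True = a chunk ends here)
        boundaries = (torch.sigmoid(self.boundary(word_states)).squeeze(-1) > 0.5).squeeze(0)
        boundaries[-1] = True                          # the last word always closes a chunk
        # pool consecutive words into chunk vectors (word-to-chunk composition)
        chunks, start = [], 0
        for i, is_end in enumerate(boundaries.tolist()):
            if is_end:
                chunks.append(word_states[0, start:i + 1].mean(dim=0))
                start = i + 1
        chunk_seq = torch.stack(chunks).unsqueeze(0)   # (1, num_chunks, hidden)
        _, sentence_state = self.chunk_rnn(chunk_seq)  # chunk-to-sentence composition
        return boundaries, sentence_state.squeeze(0)

# toy usage
model = TinyHRNN(vocab_size=100)
tokens = torch.randint(0, 100, (1, 7))
chunk_ends, sentence_vec = model(tokens)
print(chunk_ends.int().tolist(), sentence_vec.shape)
```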