cs.CL - 2023-08-05

LaDA: Latent Dialogue Action For Zero-shot Cross-lingual Neural Network Language Modeling

paper_url: http://arxiv.org/abs/2308.02903
repo_url: None
paper_authors: Zhanyu Ma, Jian Ye, Shuang Cheng
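for: Improving cross-lingual spoken language understanding (SLU) systems.
methods: A Latent Dialogue Action (LaDA) layer optimizes the decoding strategy, improving the handling of complex multilingual intents and slot values in distant languages.
results: State-of-the-art results on public datasets in both zero-shot and few-shot adaptation.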

Abstract
Cross-lingual adaptation has proven effective in spoken language understanding (SLU) systems with limited resources. Existing methods are frequently unsatisfactory for intent detection and slot filling, particularly for distant languages that differ significantly from the source language in scripts, morphology, and syntax. Latent Dialogue Action (LaDA) layer is proposed to optimize decoding strategy in order to address the aforementioned issues. The model consists of an additional layer of latent dialogue action. It enables our model to improve a system's capability of handling conversations with complex multilingual intent and slot values of distant languages. To the best of our knowledge, this is the first exhaustive investigation of the use of latent variables for optimizing cross-lingual SLU policy during the decode stage. LaDA obtains state-of-the-art results on public datasets for both zero-shot and few-shot adaptation.

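Summary
Cross-lingual adaptation has proven effective for spoken language understanding (SLU) systems. Existing methods often fall short on intent detection and slot filling, especially for distant languages that differ markedly from the source language in script, morphology, and syntax. To address this, we propose a Latent Dialogue Action (LaDA) layer, which lets the model optimize at the decoding stage and improves its handling of multilingual intents and slot values. To the best of our knowledge, this is the first study of latent variables for optimizing cross-lingual SLU policy during decoding. LaDA achieves state-of-the-art results on public datasets for both zero-shot and few-shot adaptation.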

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

paper_url: http://arxiv.org/abs/2308.02870
repo_url: None
paper_authors: Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu
for: Improving the performance of automatic speech recognition (ASR) models.
methods: Early stopping and checkpoint averaging are rethought from the perspective of the bias-variance tradeoff, with training loss and validation loss serving as proxies for bias and variance.
results: A 2.5%-3.7% and 3.1%-4.6% CER reduction on the AISHELL-1 and AISHELL-2 datasets, respectively, when evaluated with advanced ASR models.

Abstract
The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set while relying on a validation set to prevent overfitting using early stopping and 2) average several last checkpoints or that of the lowest validation losses to obtain the final model. In this paper, we rethink and update the early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. Theoretically, the bias and variance represent the fitness and variability of a model and the tradeoff of them determines the overall generalization error. But, it's impractical to evaluate them precisely. As an alternative, we take the training loss and validation loss as proxies of bias and variance and guide the early stopping and checkpoint averaging using their tradeoff, namely an Approximated Bias-Variance Tradeoff (ApproBiVT). When evaluating with advanced ASR models, our recipe provides 2.5%-3.7% and 3.1%-4.6% CER reduction on the AISHELL-1 and AISHELL-2, respectively.

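Summary
The conventional recipe for automatic speech recognition (ASR) models is to 1) train multiple checkpoints on a training set, using a validation set to prevent overfitting, and 2) average the last several checkpoints, or those with the lowest validation losses, to obtain the final model. In this paper we rethink and update early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. In theory, bias and variance capture a model's fitness and variability, and their tradeoff determines the overall generalization error, but evaluating them precisely is impractical. As an alternative, we use the training loss and validation loss as proxies for bias and variance and let their tradeoff guide early stopping and checkpoint averaging, an approach we call Approximated Bias-Variance Tradeoff (ApproBiVT). Evaluated with advanced ASR models, the recipe yields CER reductions of 2.5%-3.7% on AISHELL-1 and 3.1%-4.6% on AISHELL-2.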
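
The recipe described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the two losses are combined, so the plain sum in `tradeoff_score` is a hypothetical stand-in, and `stop_epoch`, `average_checkpoints`, and `select_and_average` are illustrative names rather than the authors' code. Checkpoints are represented as plain dictionaries of weight lists to keep the example free of framework dependencies.

```python
from typing import Dict, List, Tuple

Checkpoint = Dict[str, List[float]]  # parameter name -> flat list of weights


def tradeoff_score(train_loss: float, val_loss: float) -> float:
    """Hypothetical tradeoff proxy (an assumption): training loss stands in
    for bias, validation loss for variance, and the two are simply summed."""
    return train_loss + val_loss


def stop_epoch(scores: List[float], patience: int) -> int:
    """Return the epoch index at which training stops: the first epoch where
    the score has failed to improve for `patience` consecutive epochs."""
    best, best_idx = float("inf"), -1
    for i, score in enumerate(scores):
        if score < best:
            best, best_idx = score, i
        elif i - best_idx >= patience:
            return i
    return len(scores) - 1


def average_checkpoints(ckpts: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of the parameters of several checkpoints."""
    n = len(ckpts)
    return {
        name: [sum(c[name][j] for c in ckpts) / n for j in range(len(values))]
        for name, values in ckpts[0].items()
    }


def select_and_average(
    history: List[Tuple[float, float, Checkpoint]], k: int
) -> Checkpoint:
    """Average the k checkpoints with the lowest tradeoff score.
    Each history entry is (train_loss, val_loss, checkpoint)."""
    ranked = sorted(history, key=lambda h: tradeoff_score(h[0], h[1]))
    return average_checkpoints([h[2] for h in ranked[:k]])
```

Ranking by the combined score rather than by validation loss alone is the only difference from the conventional recipe in this sketch; any other way of combining the two losses would slot into `tradeoff_score` without touching the rest.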

EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education

paper_url: http://arxiv.org/abs/2308.02773
repo_url: https://github.com/icalk-nlp/educhat
paper_authors: Yuhao Dan, Zhikai Lei, Yiyang Gu, Yong Li, Jianghao Yin, Jiaju Lin, Linhao Ye, Zhiyan Tie, Yougen Zhou, Yilei Wang, Aimin Zhou, Ze Zhou, Qin Chen, Jie Zhou, Liang He, Xipeng Qiu
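for: Supporting personalized, fair, and compassionate intelligent education for teachers, students, and parents.
methods: Guided by theories from psychology and education, the system strengthens educational functions such as open question answering, essay assessment, Socratic teaching, and emotional support; it learns domain-specific knowledge by pre-training on an educational corpus and fine-tuning on designed system prompts and instructions.
results: EduChat is available online as an open-source project, with its code, data, and model parameters published on platforms such as GitHub and Hugging Face, along with an online demonstration of its capabilities (https://vimeo.com/851004454). The aim is to promote research on and applications of LLMs in education.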

Abstract
EduChat (https://www.educhat.top/) is a large-scale language model (LLM)-based chatbot system in the education domain. Its goal is to support personalized, fair, and compassionate intelligent education, serving teachers, students, and parents. Guided by theories from psychology and education, it further strengthens educational functions such as open question answering, essay assessment, Socratic teaching, and emotional support based on the existing basic LLMs. Particularly, we learn domain-specific knowledge by pre-training on the educational corpus and stimulate various skills with tool use by fine-tuning on designed system prompts and instructions. Currently, EduChat is available online as an open-source project, with its code, data, and model parameters available on platforms (e.g., GitHub https://github.com/icalk-nlp/EduChat, Hugging Face https://huggingface.co/ecnu-icalk ). We also prepare a demonstration of its capabilities online (https://vimeo.com/851004454). This initiative aims to promote research and applications of LLMs for intelligent education.

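Summary
EduChat (https://www.educhat.top/) is a chatbot system for the education domain built on a large-scale language model (LLM). Its goal is to support personalized, fair, and compassionate intelligent education for teachers, students, and parents. Guided by theories from psychology and education, it strengthens educational functions such as open question answering, essay assessment, Socratic teaching, and emotional support on top of existing base LLMs. In particular, we learn domain-specific knowledge by pre-training on an educational corpus and stimulate a range of skills by fine-tuning on system prompts and instructions. EduChat is currently available online as an open-source project (e.g., GitHub https://github.com/icalk-nlp/EduChat, Hugging Face https://huggingface.co/ecnu-icalk). We have also prepared a demonstration video of its capabilities (https://vimeo.com/851004454). This initiative aims to promote research on and applications of LLMs in intelligent education.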

Meta-Tsallis-Entropy Minimization: A New Self-Training Approach for Domain Adaptation on Text Classification

paper_url: http://arxiv.org/abs/2308.02746
repo_url: None
paper_authors: Menglong Lu, Zhen Huang, Zhiliang Tian, Yunxiang Zhao, Xuanyu Fei, Dongsheng Li
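for: Proposing a self-training approach for adapting text classification models across domains.
methods: Meta-Tsallis Entropy Minimization (MTEM), a meta-learning algorithm that adapts the model to the target domain, together with an approximation of the second-order derivation that reduces computation cost and an annealing sampling mechanism for efficiently generating pseudo labels.
results: Experiments show that MTEM improves the adaptation performance of BERT by an average of 4 percent on the benchmark dataset.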

Abstract
Text classification is a fundamental task for natural language processing, and adapting text classification models across domains has broad applications. Self-training generates pseudo-examples from the model's predictions and iteratively trains on the pseudo-examples, i.e., minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus, self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy minimization (MTEM), which applies a meta-learning algorithm to optimize the instance adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose an approximation technique to approximate the Second-order derivation involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model's prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze the effectiveness of MTEM in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT with an average of 4 percent on the benchmark dataset.
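
For readers unfamiliar with the quantity being minimized, the standard Tsallis entropy of a distribution p with entropic index q is S_q(p) = (1 - sum_i p_i^q) / (q - 1), which recovers the Gibbs (Shannon) entropy as q approaches 1. The sketch below computes both for a single prediction vector. It illustrates the general definition only, for q > 0; how MTEM chooses q per instance and performs the meta-learning update is not specified in the abstract and is not modeled here. The function names are illustrative.

```python
import math
from typing import Sequence


def gibbs_entropy(p: Sequence[float]) -> float:
    """Gibbs (Shannon) entropy in nats; zero-probability classes contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p: Sequence[float], q: float) -> float:
    """Tsallis entropy with entropic index q > 0.
    Falls back to the Gibbs entropy at q = 1, its limiting case."""
    if abs(q - 1.0) < 1e-8:
        return gibbs_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


# Entropy of an uncertain and a confident prediction at q = 2
uncertain = tsallis_entropy([0.5, 0.5], q=2.0)
confident = tsallis_entropy([1.0, 0.0], q=2.0)
```

Treating q as an ordinary argument makes it easy to see how the index reshapes the penalty on uncertain predictions; an instance-adaptive scheme would supply a different q for each target-domain example.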

How Good Are SOTA Fake News Detectors

paper_url: http://arxiv.org/abs/2308.02727
repo_url: https://github.com/miceland2/fake_news_detection
paper_authors: Matthew Iceland
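for: Evaluating how reliable machine learning models are at fake news detection, so that false statements can be caught before they spread widely.
methods: A range of traditional and deep learning models, including Support Vector Machines (SVM), Random Forest (RF), Long Short-Term Memory (LSTM), and Transformers, are evaluated across different tasks.
results: Traditional models generalize better to out-of-distribution data, while deep learning models may perform better on specific tasks; the best model may depend on the task at hand.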

Abstract
Automatic fake news detection with machine learning can prevent the dissemination of false statements before they gain many views. Several datasets labeling statements as legitimate or false have been created since the 2016 United States presidential election for the prospect of training machine learning models. We evaluate the robustness of both traditional and deep state-of-the-art models to gauge how well they may perform in the real world. We find that traditional models tend to generalize better to data outside the distribution it was trained on compared to more recently-developed large language models, though the best model to use may depend on the specific task at hand.

Summary
Automatic fake news detection with machine learning can stop false statements from spreading before they gain many views. Since the 2016 United States presidential election, several datasets labeling statements as legitimate or false have been created for training machine learning models. We evaluate the robustness of both traditional and deep state-of-the-art models to gauge how well they may perform in the real world. We find that traditional models tend to generalize better to data outside the distribution they were trained on, though the best model may depend on the specific task.

Forget Demonstrations, Focus on Learning from Textual Instructions

paper_url: http://arxiv.org/abs/2308.03795
repo_url: None
paper_authors: Renze Lou, Wenpeng Yin
for: Zero-shot cross-task generalization via demonstration-free learning from textual instructions.
methods: Automatically finding the critical sentences in the task definition, plus a ranking objective that forces the model to generate gold outputs with higher probability when those critical parts are highlighted.
results: State-of-the-art performance on a challenging benchmark.

Abstract
This work studies a challenging yet more realistic setting for zero-shot cross-task generalization: demonstration-free learning from textual instructions, presuming the existence of a paragraph-style task definition while no demonstrations exist. To better learn the task supervision from the definition, we propose two strategies: first, to automatically find out the critical sentences in the definition; second, a ranking objective to force the model to generate the gold outputs with higher probabilities when those critical parts are highlighted in the definition. The joint efforts of the two strategies yield state-of-the-art performance on the challenging benchmark. Our code will be released in the final version of the paper.

Summary

1. Automatically identifying the critical sentences in the definition.
2. A ranking objective that encourages the model to generate the correct outputs with higher probability when the critical parts are highlighted in the definition.

The combination of these two strategies achieves state-of-the-art performance on a challenging benchmark. Our code will be released with the final version of the paper.
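
One common way to express a ranking objective of this kind is a margin-based hinge loss on log-probabilities. The sketch below is a guess at the general shape, not the paper's formulation: the hinge form, the `margin` parameter, and the function names are all assumptions. It takes the log-probability the model assigns to the gold output under two versions of the task definition, one with the critical sentences highlighted and one without.

```python
from typing import List, Tuple


def ranking_loss(logp_highlighted: float, logp_plain: float, margin: float = 0.1) -> float:
    """Hinge-style ranking loss (assumed form): zero once the gold output is
    at least `margin` more likely with the critical sentences highlighted."""
    return max(0.0, margin - (logp_highlighted - logp_plain))


def batch_ranking_loss(pairs: List[Tuple[float, float]], margin: float = 0.1) -> float:
    """Mean ranking loss over (logp_highlighted, logp_plain) pairs."""
    return sum(ranking_loss(h, p, margin) for h, p in pairs) / len(pairs)
```

The loss is zero when highlighting already raises the gold output's log-probability by the margin, and grows linearly when highlighting helps too little or hurts.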

Adapting the NICT-JLE Corpus for Disfluency Detection Models

paper_url: http://arxiv.org/abs/2308.02482
repo_url: None
paper_authors: Lucy Skidmore, Roger K. Moore
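for: Providing a standardised dataset for training and evaluating models that detect disfluencies in learner speech.
methods: The NICT-JLE corpus is adapted into a format suitable for training and evaluating disfluency detection models.
results: A standardised train, heldout, and test set for use in future disfluency detection research.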

Abstract
The detection of disfluencies such as hesitations, repetitions and false starts commonly found in speech is a widely studied area of research. With a standardised process for evaluation using the Switchboard Corpus, model performance can be easily compared across approaches. This is not the case for disfluency detection research on learner speech, however, where such datasets have restricted access policies, making comparison and subsequent development of improved models more challenging. To address this issue, this paper describes the adaptation of the NICT-JLE corpus, containing approximately 300 hours of English learners' oral proficiency tests, to a format that is suitable for disfluency detection model training and evaluation. Points of difference between the NICT-JLE and Switchboard corpora are explored, followed by a detailed overview of adaptations to the tag set and meta-features of the NICT-JLE corpus. The result of this work provides a standardised train, heldout and test set for use in future research on disfluency detection for learner speech.

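Summary
Detecting disfluencies in speech, such as hesitations, repetitions, and false starts, is a widely studied research area. With a standardised evaluation process based on the Switchboard Corpus, model performance can be compared easily. For disfluency detection on learner speech, however, the relevant datasets have restricted access policies, which makes comparison and the subsequent improvement of models more difficult. To address this, the paper describes the adaptation of the NICT-JLE corpus (approximately 300 hours of English learners' oral proficiency tests) into a format suitable for training and evaluating disfluency detection models. It first explores the differences between the NICT-JLE and Switchboard corpora, then details the changes made to the tag set and meta-features of the NICT-JLE corpus. The result is a standardised train, heldout, and test set for future research on disfluency detection for learner speech.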

Towards Generalist Foundation Model for Radiology

paper_url: http://arxiv.org/abs/2308.02463
repo_url: https://github.com/chaoyi-wu/radfm
paper_authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
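for: Initiating the development of a Radiology Foundation Model (RadFM).
methods: The work considers data, model design, and evaluation thoroughly, and proposes an architecture that integrates text input interleaved with 2D or 3D medical scans to generate responses for diverse radiologic tasks. The model is first pre-trained on MedMD, a large-scale medical multi-modal dataset, and then domain-specifically fine-tuned on RadMD, a radiologic cleaned version of MedMD.
results: Experiments show that RadFM outperforms existing multi-modal foundation models across a range of tasks. The code, data, and model checkpoints will be made publicly available to support further research and development in the field.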

Abstract
In this study, we aim to initiate the development of a Radiology Foundation Model, termed RadFM. We consider the construction of foundational models from the perspectives of data, model design, and evaluation thoroughly. Our contributions can be summarized as follows: (i) we construct a large-scale Medical Multi-modal Dataset, MedMD, consisting of 16M 2D and 3D medical scans. To the best of our knowledge, this is the first multi-modal dataset containing 3D medical scans. (ii) We propose an architecture that enables visually conditioned generative pre-training, allowing for the integration of text input interleaved with 2D or 3D medical scans to generate responses for diverse radiologic tasks. The model was initially pre-trained on MedMD and subsequently domain-specific fine-tuned on RadMD, a radiologic cleaned version of MedMD, containing 3M radiologic visual-language pairs. (iii) We propose a new evaluation benchmark that comprises five tasks, aiming to comprehensively assess the capability of foundation models in handling practical clinical problems. Our experimental results confirm that RadFM significantly outperforms existing multi-modal foundation models. The codes, data, and model checkpoint will all be made publicly available to promote further research and development in the field.

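Summary
In this study, we aim to develop a Radiology Foundation Model (RadFM). We consider the construction of foundation models thoroughly, covering data, model design, and evaluation. Our contributions fall into three parts: (i) we construct a large-scale medical multi-modal dataset, MedMD, containing 16 million 2D and 3D medical scans; to our knowledge, it is the first multi-modal dataset to include 3D medical scans. (ii) We propose an architecture for visually conditioned generation that integrates text input interleaved with 2D or 3D medical scans to produce responses for a variety of medical tasks. The model is first pre-trained on MedMD and then domain-specifically fine-tuned on RadMD, which contains 3 million radiologic visual-language pairs. (iii) We propose a new evaluation benchmark of five tasks designed to comprehensively assess how well foundation models handle practical clinical problems. Our experimental results show that RadFM outperforms existing multi-modal foundation models across a range of medical tasks. We will release the code, data, and model checkpoints to support further research and development.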

Legal Summarisation through LLMs: The PRODIGIT Project

paper_url: http://arxiv.org/abs/2308.04416
repo_url: None
paper_authors: Thiago Dal Pont, Federico Galli, Andrea Loreggia, Giuseppe Pisano, Riccardo Rovatti, Giovanni Sartor
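for: Supporting tax judges and lawyers through digital technology, with a focus on artificial intelligence (AI).
methods: Different tools and approaches to extractive and abstractive summarisation are deployed and evaluated, including LLMs and GPT4 in particular.
results: Expert tax judges and lawyers judged the results satisfactory. On this basis, a prototype application is being built and will be made publicly available.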

Abstract
We present some initial results of a large-scale Italian project called PRODIGIT, which aims to support tax judges and lawyers through digital technology, focusing on AI. We have focused on the generation of summaries of judicial decisions and on the extraction of related information, such as the identification of legal issues and decision-making criteria, and the specification of keywords. To this end, we have deployed and evaluated different tools and approaches to extractive and abstractive summarisation. We have applied LLMs, particularly GPT4, which has enabled us to obtain results that proved satisfactory, according to an evaluation by expert tax judges and lawyers. On this basis, a prototype application is being built which will be made publicly available.

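Summary
We present some initial results of PRODIGIT, a large-scale Italian project that aims to support tax judges and lawyers through digital technology, with a focus on AI. We have concentrated on generating summaries of judicial decisions and extracting related information, including the identification of legal issues and decision-making criteria and the specification of keywords. To this end, we have deployed and evaluated different tools and approaches, including LLMs and GPT4. According to an evaluation by expert tax judges and lawyers, the results are satisfactory. On this basis, we are building a prototype application that we plan to release publicly.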