cs.CL - 2023-08-21

Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2308.10783
  • repo_url: None
  • paper_authors: Md. Arid Hasan, Shudipta Das, Afiyat Anjum, Firoj Alam, Anika Anjum, Avijit Sarker, Sheak Rashed Haider Noori
  • for: This study presents a sizeable manually annotated dataset of Bangla news tweets and Facebook comments, and investigates zero- and few-shot in-context learning for Bangla with language models.
  • methods: The study evaluates several language models, including Flan-T5, GPT-4, and Bloomz, and offers a comparative analysis against fine-tuned models.
  • results: Monolingual transformer-based models consistently outperform the other models, even in zero- and few-shot scenarios.
    Abstract The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages, such as Bangla, remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,605 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.
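
The zero- and few-shot setups compared in the paper can be illustrated with a minimal prompt-construction sketch (our own template with a hypothetical example pair, not the paper's exact prompts):

```python
# Illustrative zero-/few-shot prompt construction for three-way Bangla
# sentiment classification; the label set and template are assumptions.
LABELS = ["Positive", "Neutral", "Negative"]

def build_prompt(text, examples=None):
    """Return a classification prompt; `examples` enables few-shot mode."""
    lines = [
        "Classify the sentiment of the following Bangla text as "
        + ", ".join(LABELS) + "."
    ]
    for ex_text, ex_label in (examples or []):
        lines.append(f"Text: {ex_text}\nSentiment: {ex_label}")
    lines.append(f"Text: {text}\nSentiment:")
    return "\n\n".join(lines)

zero_shot = build_prompt("খবরটা খুব ভালো লাগলো!")
few_shot = build_prompt(
    "খবরটা খুব ভালো লাগলো!",
    examples=[("সার্ভিসটা একদম বাজে।", "Negative")],
)
```

The prompt is then sent verbatim to the LLM; the fine-tuned baselines in the comparison need no prompt at all, which is part of what the paper contrasts.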

DepreSym: A Depression Symptom Annotated Corpus and the Role of LLMs as Assessors of Psychological Markers

  • paper_url: http://arxiv.org/abs/2308.10758
  • repo_url: None
  • paper_authors: Anxo Pérez, Marcos Fernández-Pichel, Javier Parapar, David E. Losada
  • for: The paper investigates computational methods for detecting depressive symptoms in texts posted online.
  • methods: The study draws on traces of depression mined from online publications and on the symptoms specified by the Beck Depression Inventory-II (BDI-II).
  • results: The paper presents a new search task and releases 21,580 annotated sentences to support further research on depression symptom detection.
    Abstract Computational methods for depression detection aim to mine traces of depression from online publications posted by Internet users. However, solutions trained on existing collections exhibit limited generalisation and interpretability. To tackle these issues, recent studies have shown that identifying depressive symptoms can lead to more robust models. The eRisk initiative fosters research in this area and has recently proposed a new ranking task focused on developing search methods to find sentences related to depressive symptoms. This search challenge relies on the symptoms specified by the Beck Depression Inventory-II (BDI-II), a questionnaire widely used in clinical practice. Based on the participant systems' results, we present the DepreSym dataset, consisting of 21,580 sentences annotated according to their relevance to the 21 BDI-II symptoms. The labelled sentences come from a pool of diverse ranking methods, and the final dataset serves as a valuable resource for advancing the development of models that incorporate depressive markers such as clinical symptoms. Due to the complex nature of this relevance annotation, we designed a robust assessment methodology carried out by three expert assessors (including an expert psychologist). Additionally, we explore here the feasibility of employing recent Large Language Models (ChatGPT and GPT4) as potential assessors in this complex task. We undertake a comprehensive examination of their performance, determine their main limitations and analyze their role as a complement or replacement for human annotators.
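
One natural way to quantify how well an LLM assessor tracks the human assessors is chance-corrected agreement. A minimal sketch using Cohen's kappa on toy relevance labels (our illustrative choice; the paper's own assessment methodology may differ):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two assessors' labels over the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy labels: does each sentence match a BDI-II symptom ("rel") or not?
human = ["rel", "rel", "non", "non", "rel", "non"]
llm   = ["rel", "non", "non", "non", "rel", "non"]
print(round(cohens_kappa(human, llm), 3))  # 0.667
```

Kappa near 1 would suggest the LLM could replace a human assessor; a middling value, as here, supports the paper's framing of LLMs as a complement.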

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

  • paper_url: http://arxiv.org/abs/2308.10755
  • repo_url: https://github.com/opendatalab/WanJuan1.0
  • paper_authors: Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin
  • For: The paper addresses the lack of transparency and the scarcity of open-source data in the development of large language models (LLMs) and multimodal large language models (MLLMs) by presenting a large-scale multimodal dataset called "WanJuan".
  • Methods: The "WanJuan" dataset includes text, image-text, and video modalities, with a total volume exceeding 2TB, and is used to train a model called InternLM.
  • Results: The paper demonstrates the effectiveness of the "WanJuan" dataset and the InternLM model in multi-dimensional evaluations, showing significant advantages over models of a similar scale.
    Abstract The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs). These cutting-edge models owe their remarkable performance to high-quality data. However, the details of the training data used in leading paradigms are often kept confidential. This lack of transparency, coupled with the scarcity of open-source data, impedes further developments within the community. As a response, this paper presents "Wan Juan", a large-scale multimodal dataset composed of both Chinese and English data, collected from a wide range of web sources. The dataset incorporates text, image-text, and video modalities, with a total volume exceeding 2TB. It was utilized in the training of InternLM, a model that demonstrated significant advantages in multi-dimensional evaluations when compared to models of a similar scale. All data can be accessed at https://opendatalab.org.cn/WanJuan1.0.

Systematic Offensive Stereotyping (SOS) Bias in Language Models

  • paper_url: http://arxiv.org/abs/2308.10684
  • repo_url: None
  • paper_authors: Fatma Elsafoury
  • For: This paper investigates the systematic offensive stereotyping (SOS) bias in language models (LMs) and its impact on their performance and fairness in the task of hate speech detection.
  • Methods: The authors propose a method to measure the SOS bias in LMs and validate it using a dataset of tweets. They also investigate the effectiveness of debiasing methods from the literature at removing the SOS bias.
  • Results: All the inspected LMs are found to be SOS biased, and the SOS bias is reflective of the hate experienced online by marginalized groups. Removing the SOS bias using a popular debiasing method leads to worse SOS bias scores, and there is evidence that the SOS bias affects the fairness of the LMs but no strong evidence that it affects their performance on hate speech detection.
    Abstract Research has shown that language models (LMs) are socially biased. However, toxicity and offensive stereotyping bias in LMs are understudied. In this paper, we investigate the systematic offensive stereotype (SOS) bias in LMs. We propose a method to measure it. Then, we validate the SOS bias and investigate the effectiveness of debias methods from the literature on removing it. Finally, we investigate the impact of the SOS bias in LMs on their performance and their fairness on the task of hate speech detection. Our results suggest that all the inspected LMs are SOS biased. The results suggest that the SOS bias in LMs is reflective of the hate experienced online by the inspected marginalized groups. The results indicate that removing the SOS bias in LMs, using a popular debias method from the literature, leads to worse SOS bias scores. Finally, our results show no strong evidence that the SOS bias in LMs is impactful on their performance on hate speech detection. On the other hand, there is evidence that the SOS bias in LMs is impactful on their fairness.

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

  • paper_url: http://arxiv.org/abs/2308.10682
  • repo_url: None
  • paper_authors: Joerg Schmalenstroeer, Tobias Gburrek, Reinhold Haeb-Umbach
  • for: The dataset serves as a test set for clock synchronization algorithms, meeting separation, diarization, and transcription systems on ad-hoc wireless acoustic sensor networks.
  • methods: Nine devices, five smartphones with a single recording channel and four microphone arrays, record a total of 29 channels.
  • results: The dataset covers two different meeting rooms and is complemented with ground-truth diarization information.
    Abstract We present LibriWASN, a data set whose design follows closely the LibriCSS meeting recognition data set, with the marked difference that the data is recorded with devices that are randomly positioned on a meeting table and whose sampling clocks are not synchronized. Nine different devices, five smartphones with a single recording channel and four microphone arrays, are used to record a total of 29 channels. Other than that, the data set follows closely the LibriCSS design: the same LibriSpeech sentences are played back from eight loudspeakers arranged around a meeting table and the data is organized in subsets with different percentages of speech overlap. LibriWASN is meant as a test set for clock synchronization algorithms, meeting separation, diarization and transcription systems on ad-hoc wireless acoustic sensor networks. Due to its similarity to LibriCSS, meeting transcription systems developed for the former can readily be tested on LibriWASN. The data set is recorded in two different rooms and is complemented with ground-truth diarization information of who speaks when.
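
The effect of the unsynchronized sampling clocks that LibriWASN is designed to expose can be illustrated with back-of-the-envelope arithmetic (the drift and duration values below are assumed figures, not measured properties of the dataset):

```python
# Sketch: how a device whose sampling clock drifts relative to a
# reference accumulates a sample offset over the course of a recording.
def accumulated_offset_samples(nominal_rate_hz, drift_ppm, duration_s):
    """Samples by which a device with the given clock drift (in parts
    per million) leads/lags a reference device after `duration_s`."""
    actual_rate = nominal_rate_hz * (1 + drift_ppm * 1e-6)
    return (actual_rate - nominal_rate_hz) * duration_s

# An assumed 10 ppm drift at 16 kHz over a 10-minute meeting:
print(accumulated_offset_samples(16_000, 10, 600))  # ~96 samples
```

Offsets of this size are enough to break multichannel separation that assumes sample-synchronous inputs, which is why the dataset targets clock synchronization algorithms.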

BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service

  • paper_url: http://arxiv.org/abs/2308.10592
  • repo_url: https://github.com/ziliat-nask/ban-pl
  • paper_authors: Inez Okulska, Kinga Głąbińska, Anna Kołos, Agnieszka Karlińska, Emilia Wiśnios, Adam Nowakowski, Paweł Ellerik, Andrzej Prałat
  • for: The paper provides a publicly available Polish-language dataset of harmful social media content to advance automated offensive language detection.
  • methods: The dataset comprises posts and comments flagged and removed by professional moderators, divided into two classes, "harmful" and "neutral"; the collection and preprocessing procedures are described in detail, and advanced preprocessing scripts (e.g., for unmasking profanities) are provided.
  • results: The resulting BAN-PL dataset contains 691,662 pieces of content from the Wykop social networking service, evenly distributed across the two classes.
    Abstract Advances in automated detection of offensive language online, including hate speech and cyberbullying, require improved access to publicly available datasets comprising social media content. In this paper, we introduce BAN-PL, the first open dataset in the Polish language that encompasses texts flagged as harmful and subsequently removed by professional moderators. The dataset encompasses a total of 691,662 pieces of content from a popular social networking service, Wykop, often referred to as the "Polish Reddit", including both posts and comments, and is evenly distributed into two distinct classes: "harmful" and "neutral". We provide a comprehensive description of the data collection and preprocessing procedures, as well as highlight the linguistic specificity of the data. The BAN-PL dataset, along with advanced preprocessing scripts for, i.a., unmasking profanities, will be publicly available.
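
The kind of profanity unmasking mentioned above can be sketched as lexicon-driven mask matching (an illustrative reimplementation with placeholder lexicon entries; the released BAN-PL scripts may work differently):

```python
# Illustrative lexicon-driven unmasking; the lexicon entries below are
# placeholders, and the actual BAN-PL preprocessing may differ.
LEXICON = ["badword", "slur"]

def unmask(text, lexicon=LEXICON):
    """Replace masked tokens like 'b*dw*rd' with the unique lexicon
    entry whose visible characters match; leave ambiguous tokens as-is."""
    def candidates(token):
        for word in lexicon:
            if len(word) == len(token) and all(
                t in "*#@" or t == w for t, w in zip(token, word)
            ):
                yield word
    out = []
    for token in text.split():
        matches = list(candidates(token.lower()))
        out.append(matches[0] if len(matches) == 1 else token)
    return " ".join(out)

print(unmask("you are a b*dw*rd"))  # "you are a badword"
```

Restoring the original wording matters here because classifiers trained on masked text learn the mask pattern rather than the offensive vocabulary itself.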

Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning

  • paper_url: http://arxiv.org/abs/2308.10585
  • repo_url: https://github.com/zirui-hit/bridge_for_numerical_reasoning
  • paper_authors: Dingzirui Wang, Longxu Dou, Wenbin Zhang, Junyu Zeng, Wanxiang Che
  • For: The paper aims to improve the performance of numerical reasoning models by using equations as Intermediate Meaning Representations (IMRs), addressing two problems: (1) theoretically proving that equations are generated more accurately than programs, and (2) improving the accuracy with which large language models (LLMs) generate equations.
  • Methods: The proposed method, Bridge, consists of two parts: (1) a proposition, with proof, for comparing the generation accuracy of different IMRs, and (2) a method that improves equation generation accuracy by reducing the tendency to generate constant expressions and programs.
  • Results: The method achieves state-of-the-art performance on three datasets (GSM8K, SVAMP, and Algebra) under the single reasoning path setting, improving on previous methods by 2.2%, 0.9%, and 1.7%, respectively.
    Abstract Numerical reasoning is vital for natural language processing models to understand and process numerical information in real-world scenarios. Most current methods first generate the Intermediate Meaning Representations (IMRs) of questions and then generate answers. Current SOTA methods generate programs as IMRs with large language models (LLMs). Intuitively, equations have fewer restrictions and closer semantics to the question than programs, leading to higher generation accuracy. However, current LLMs generate equations worse than programs, where we assume that the equation data is rare in pre-training data compared to programs. So in this paper, we try to use equations as IMRs to solve the numerical reasoning task by addressing two problems: (1) Theoretically, how to prove that the equation is an IMR with higher generation accuracy than programs; (2) Empirically, how to improve the generation accuracy of equations with LLMs. For the first problem, we propose and prove a proposition to theoretically compare the generation accuracy of different IMRs. For the second problem, we present a method called Boosting Numerical Reasoning by Decomposing the Generation of Equations (Bridge), which can improve the accuracy of LLMs in generating equations as IMRs by reducing the tendency of generating constant expressions and programs. Our method improves the performance by 2.2%, 0.9%, and 1.7% on GSM8K, SVAMP, and Algebra datasets compared to the previous state-of-the-art methods under the single reasoning path setting. Our codes and prompts are released in https://github.com/zirui-HIT/Bridge_for_Numerical_Reasoning.
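
The contrast between program and equation IMRs can be made concrete with a toy GSM8K-style question (our own construction for illustration, not one of the paper's prompts):

```python
# Question: "Tom has 5 apples and buys 3 bags of 2 apples each.
# How many apples does he have now?"

# Program IMR: a sequence of imperative steps.
def program_imr():
    apples = 5
    bought = 3 * 2
    return apples + bought

# Equation IMR: a single expression that mirrors the question's wording
# more directly; evaluated with eval() here purely for illustration.
equation_imr = "5 + 3 * 2"

assert program_imr() == eval(equation_imr) == 11
```

The paper's argument is that the equation form imposes fewer syntactic constraints on the generator than the program form, so an LLM is less likely to derail while producing it.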

Weakly synchronous systems with three machines are Turing powerful

  • paper_url: http://arxiv.org/abs/2308.10578
  • repo_url: None
  • paper_authors: Cinzia Di Giusto, Davide Ferré, Etienne Lozes, Nicolas Nisse
  • for: The paper studies communicating finite-state machines (CFMs) as a model of weakly synchronous distributed systems, and the reachability problem for such systems.
  • methods: Processes communicate in phases in which messages are first sent and then received; the paper studies the configuration reachability problem and the treewidth of the Message Sequence Charts (MSCs) such systems can generate.
  • results: The configuration reachability problem for weakly synchronous systems is undecidable even with three processes. The result is driven by the construction of a three-process system that generates MSCs of arbitrarily large treewidth.
    Abstract Communicating finite-state machines (CFMs) are a Turing powerful model of asynchronous message-passing distributed systems. In weakly synchronous systems, processes communicate through phases in which messages are first sent and then received, for each process. Such systems enjoy a limited form of synchronization, and for some communication models, this restriction is enough to make the reachability problem decidable. In particular, we explore the intriguing case of p2p (FIFO) communication, for which the reachability problem is known to be undecidable for four processes, but decidable for two. We show that the configuration reachability problem for weakly synchronous systems of three processes is undecidable. This result is heavily inspired by our study on the treewidth of the Message Sequence Charts (MSCs) that might be generated by such systems. In this sense, the main contribution of this work is a weakly synchronous system with three processes that generates MSCs of arbitrarily large treewidth.
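
The phase discipline of weakly synchronous p2p (FIFO) communication can be sketched with a minimal simulation (our own, heavily simplified relative to the CFM formalism): within a phase, every process performs all its sends before any message is received.

```python
from collections import deque, defaultdict

class WeaklySyncSystem:
    """Toy weakly synchronous system with one FIFO channel per ordered
    pair of processes (p2p communication)."""

    def __init__(self):
        self.channels = defaultdict(deque)   # (src, dst) -> FIFO queue
        self.inboxes = defaultdict(list)     # dst -> delivered messages

    def phase(self, sends):
        """`sends` maps a sender to a list of (receiver, message)."""
        for src, msgs in sends.items():               # send sub-phase
            for dst, msg in msgs:
                self.channels[(src, dst)].append(msg)
        for (src, dst), chan in self.channels.items():  # receive sub-phase
            while chan:
                self.inboxes[dst].append((src, chan.popleft()))

system = WeaklySyncSystem()
system.phase({"p1": [("p2", "a"), ("p3", "b")], "p2": [("p3", "c")]})
print(system.inboxes["p3"])  # [('p1', 'b'), ('p2', 'c')]
```

The paper's point is that even under this restricted communication pattern, three processes suffice to make configuration reachability undecidable.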

Software Entity Recognition with Noise-Robust Learning

  • paper_url: http://arxiv.org/abs/2308.10564
  • repo_url: https://github.com/taidnguyen/software_entity_recognition
  • paper_authors: Tai Nguyen, Yifeng Di, Joohan Lee, Muhao Chen, Tianyi Zhang
  • for: Recognizing software entities in free-form text enables software engineering technologies such as automated documentation, traceability link recovery, and API recommendation.
  • methods: The authors leverage the Wikipedia taxonomy to build a lexicon of 79K unique software entities in 12 fine-grained types together with a large labeled dataset, and propose self-regularization, a noise-robust learning approach, for training the software entity recognition (SER) model.
  • results: Models trained with self-regularization outperform both their vanilla counterparts and state-of-the-art approaches on the Wikipedia benchmark and two Stack Overflow benchmarks, showing that noise-robust learning improves SER performance.
    Abstract Recognizing software entities such as library names from free-form text is essential to enable many software engineering (SE) technologies, such as traceability link recovery, automated documentation, and API recommendation. While many approaches have been proposed to address this problem, they suffer from small entity vocabularies or noisy training data, hindering their ability to recognize software entities mentioned in sophisticated narratives. To address this challenge, we leverage the Wikipedia taxonomy to develop a comprehensive entity lexicon with 79K unique software entities in 12 fine-grained types, as well as a large labeled dataset of over 1.7M sentences. Then, we propose self-regularization, a noise-robust learning approach, to the training of our software entity recognition (SER) model by accounting for many dropouts. Results show that models trained with self-regularization outperform both their vanilla counterparts and state-of-the-art approaches on our Wikipedia benchmark and two Stack Overflow benchmarks. We release our models, data, and code for future research.
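
The flavor of training "by accounting for many dropouts" can be sketched as a consistency penalty between two stochastic forward passes of the same input (a generic consistency-regularization sketch with a toy stand-in model; the paper's actual self-regularization loss may differ):

```python
import math
import random

def kl(p, q):
    """KL divergence between two discrete label distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def consistency_loss(forward, x, seed_a=0, seed_b=1):
    """Symmetric KL between two dropout-perturbed predictions for x."""
    p, q = forward(x, seed_a), forward(x, seed_b)
    return 0.5 * (kl(p, q) + kl(q, p))

def noisy_softmax(x, seed):
    """Toy stand-in for a model forward pass with dropout noise."""
    rng = random.Random(seed)
    scores = [xi + rng.gauss(0, 0.1) for xi in x]
    z = [math.exp(s) for s in scores]
    total = sum(z)
    return [zi / total for zi in z]

loss = consistency_loss(noisy_softmax, [2.0, 0.5, -1.0])
assert loss >= 0.0  # zero only when both passes agree exactly
```

Adding such a term to the task loss discourages the model from overfitting individual noisy labels, since memorized noise tends to produce unstable predictions across dropout masks.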

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding

  • paper_url: http://arxiv.org/abs/2308.10529
  • repo_url: https://github.com/alibaba-nlp/seqgpt
  • paper_authors: Tianyu Yu, Chengyue Jiang, Chao Lou, Shen Huang, Xiaobin Wang, Wei Liu, Jiong Cai, Yangning Li, Yinghui Li, Kewei Tu, Hai-Tao Zheng, Ningyu Zhang, Pengjun Xie, Fei Huang, Yong Jiang
  • for: SeqGPT is an open-source autoregressive model specially enhanced for open-domain natural language understanding (NLU).
  • methods: SeqGPT expresses all NLU tasks with two atomic tasks, and is specialized through instruction tuning on fine-grained labeled data synthesized by ChatGPT, followed by fine-tuning on 233 atomic tasks from 152 datasets.
  • results: SeqGPT shows decent classification and extraction ability across domains and datasets, and can perform language understanding tasks on unseen domains.
    Abstract Large language models (LLMs) have shown impressive ability for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks which always have restricted output and input format. Their performances on NLU tasks are highly related to prompts or demonstrations and are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but still ``open'' for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our model is accessible at https://github.com/Alibaba-NLP/SeqGPT.
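
The two atomic tasks, classification and extraction with a fixed input/output format but arbitrary label sets, can be illustrated with simple templates (our formatting, not necessarily SeqGPT's exact instructions):

```python
import json

# Illustrative fixed-format templates for the two atomic NLU tasks.
def atomic_classify(text, labels):
    """Classification atom: pick labels from an arbitrary label set."""
    return f"Input: {text}\nClassify into: {json.dumps(labels)}\nOutput:"

def atomic_extract(text, types):
    """Extraction atom: return spans for arbitrary span types."""
    return f"Input: {text}\nExtract spans of: {json.dumps(types)}\nOutput:"

# Entity typing reduces to the classification atom; event extraction
# reduces to the extraction atom over event-role types.
print(atomic_classify("SeqGPT is released by Alibaba.", ["ORG", "PER", "LOC"]))
print(atomic_extract("SeqGPT is released by Alibaba.", ["organization"]))
```

Because the instruction format is fixed while the label set is free text, any NLU task with a definable label or span inventory can be routed through one of the two atoms.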

GradientCoin: A Peer-to-Peer Decentralized Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10502
  • repo_url: None
  • paper_authors: Yeqi Gao, Zhao Song, Junze Yin
  • for: The paper proposes a decentralized large language model (LLM) inspired by the Bitcoin electronic cash system, to address the centralized control and trust issues of existing LLMs.
  • methods: Drawing on the techniques and concepts behind the Bitcoin cash system, the paper presents a purely theoretical design of a decentralized LLM.
  • results: The proposed design addresses the centralized control and trust problems of existing LLMs, but implementing such a system would encounter various practical difficulties, and it is unlikely to outperform the standard Bitcoin system economically.
    Abstract Since 2008, after the proposal of a Bitcoin electronic cash system, Bitcoin has fundamentally changed the economic system over the last decade. Since 2022, large language models (LLMs) such as GPT have outperformed humans in many real-life tasks. However, these large language models have several practical issues. For example, the model is centralized and controlled by a specific unit. One weakness is that if that unit decides to shut down the model, it cannot be used anymore. The second weakness is the lack of guaranteed discrepancy behind this model, as certain dishonest units may design their own models and feed them unhealthy training data. In this work, we propose a purely theoretical design of a decentralized LLM that operates similarly to a Bitcoin cash system. However, implementing such a system might encounter various practical difficulties. Furthermore, this new system is unlikely to perform better than the standard Bitcoin system in economics. Therefore, the motivation for designing such a system is limited. It is likely that only two types of people would be interested in setting up a practical system for it: $\bullet$ Those who prefer to use a decentralized ChatGPT-like software. $\bullet$ Those who believe that the purpose of carbon-based life is to create silicon-based life, such as Optimus Prime in Transformers. The reason the second type of people may be interested is that it is possible that one day an AI system like this will awaken and become the next level of intelligence on this planet.

An Effective Method using Phrase Mechanism in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2308.10482
  • repo_url: https://github.com/phuongnm94/PhraseTransformer
  • paper_authors: Phuong Minh Nguyen, Le Minh Nguyen
  • for: Improving the Neural Machine Translation (NMT) system for Vietnamese-Chinese parallel corpora.
  • methods: Using a phrase mechanism, PhraseTransformer, based on the Transformer model, to improve the system for Vietnamese-Chinese parallel corpora.
  • results: Achieved BLEU scores of 35.3 on Vietnamese-to-Chinese and 33.2 on Chinese-to-Vietnamese data on the VLSP 2022 competition MT dataset.
    Abstract Machine Translation is one of the essential tasks in Natural Language Processing (NLP), which has massive applications in real life and contributes to other tasks in the NLP research community. Recently, Transformer-based methods have attracted numerous researchers in this domain and achieved state-of-the-art results for most language pairs. In this paper, we report an effective method using a phrase mechanism, PhraseTransformer, to improve the strong baseline model Transformer in constructing a Neural Machine Translation (NMT) system for the parallel corpora Vietnamese-Chinese. Our experiments on the MT dataset of the VLSP 2022 competition achieved a BLEU score of 35.3 on Vietnamese to Chinese and 33.2 on Chinese to Vietnamese data. Our code is available at https://github.com/phuongnm94/PhraseTransformer.
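
The reported BLEU scores can be grounded with a minimal sentence-level BLEU sketch (real evaluations use corpus-level BLEU with proper tokenization and smoothing; this crude version is for intuition only):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum((cand & ref).values())          # clipped matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # crude smoothing
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 3))  # 1.0
```

A score of 35.3 corresponds to roughly a third of the reference n-gram mass being matched after length penalization, which is strong for a low-resource language pair.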

Implicit Self-supervised Language Representation for Spoken Language Diarization

  • paper_url: http://arxiv.org/abs/2308.10470
  • repo_url: None
  • paper_authors: Jagabandhu Mishra, S. R. Mahadeva Prasanna
  • for: The paper studies spoken language diarization (LD), in particular implicit frameworks, which can be adapted to low/zero-resource languages.
  • methods: Three frameworks are proposed, based on fixed segmentation, change-point-based segmentation, and E2E modeling; they use x-vectors as implicit language representations, with the analysis window length ($N$) tuned for best performance.
  • results: With x-vectors and an appropriate $N$, implicit LD matches explicit LD, and the E2E framework achieves the best implicit performance (JER of 6.38) on the synthetic TTSF-LD dataset. On the practical Microsoft CS (MSCS) dataset, however, performance degrades to a JER of 60.4, mainly because the distribution of monolingual segment durations of the secondary language differs between the datasets. Avoiding segment smoothing calls for a small $N$, but with a small $N$ the x-vector representation cannot capture the required language discrimination, since the same speaker speaks both languages. The study therefore proposes a self-supervised implicit language representation, which yields a relative improvement of 63.9% over x-vectors and a JER of 21.8 with the E2E framework.
    Abstract In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zero resource languages. Inspired by the speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change point-based segmentation and (3) E2E are proposed to perform LD. The initial exploration with the synthetic TTSF-LD dataset shows that using x-vectors as implicit language representations with an appropriate analysis window length ($N$) can achieve performance on par with explicit LD. The best implicit LD performance of $6.38$ in terms of Jaccard error rate (JER) is achieved by using the E2E framework. However, with the E2E framework the performance of implicit LD degrades to $60.4$ when used with the practical Microsoft CS (MSCS) dataset. The difference in performance is mostly due to the distributional difference between the monolingual segment durations of the secondary language in the MSCS and TTSF-LD datasets. Moreover, to avoid segment smoothing, the smaller duration of the monolingual segments suggests the use of a small value of $N$. At the same time, with small $N$, the x-vector representation is unable to capture the required language discrimination due to the acoustic similarity, as the same speaker is speaking both languages. Therefore, to resolve the issue, a self-supervised implicit language representation is proposed in this study. In comparison with the x-vector representation, the proposed representation provides a relative improvement of $63.9\%$ and achieves a JER of $21.8$ using the E2E framework.
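
The Jaccard error rate (JER) used above can be sketched on toy language segments (frame-discretized for simplicity and averaged without optimal mapping; real scoring tools use exact interval arithmetic):

```python
# Toy JER for language diarization: per-language Jaccard index between
# reference and hypothesis time segments, averaged and reported in %.
def frames(segments, step=0.01):
    """Convert (start, end) intervals in seconds to 10 ms frame indices."""
    out = set()
    for start, end in segments:
        out.update(range(round(start / step), round(end / step)))
    return out

def jer(reference, hypothesis):
    """reference/hypothesis: dict language -> list of (start, end)."""
    errors = []
    for lang in reference:
        ref = frames(reference[lang])
        hyp = frames(hypothesis.get(lang, []))
        union = ref | hyp
        jaccard = len(ref & hyp) / len(union) if union else 1.0
        errors.append(1.0 - jaccard)
    return 100.0 * sum(errors) / len(errors)

ref = {"hi": [(0.0, 2.0)], "en": [(2.0, 4.0)]}
hyp = {"hi": [(0.0, 1.5)], "en": [(1.5, 4.0)]}
print(round(jer(ref, hyp), 2))  # 22.5
```

A shift of the language boundary by half a second already costs over 20 JER points here, which illustrates why short monolingual segments in MSCS make the task so much harder than on the synthetic data.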

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10462
  • repo_url: None
  • paper_authors: Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, Houari Sahraoui
  • for: 这个研究旨在探索如何将大型自然语言模型(LLMs)特化为任务特定数据,以提高其生成代码的能力。
  • methods: 研究使用参数高效微调(PEFT)技术将LLM特化到任务特定数据,并进行了广泛的实验来评估这些技术在自动代码生成场景中的效果。
  • results: 实验结果显示,PEFT技术可以有效地将LLMs特化为任务特定数据,并提高代码生成的性能和可扩展性。
    Abstract Large Language Models (LLMs) possess impressive capabilities to generate meaningful code snippets given natural language intents in zero-shot, i.e., without the need for specific fine-tuning. In the perspective of unleashing their full potential, prior work has demonstrated the benefits of fine-tuning the models to task-specific data. However, fine-tuning process demands heavy computational costs and is intractable when resources are scarce, especially for models with billions of parameters. In light of these challenges, previous studies explored In-Context Learning (ICL) as an effective strategy to generate contextually appropriate code without fine-tuning. However, it operates at inference time and does not involve learning task-specific parameters, potentially limiting the model's performance on downstream tasks. In this context, we foresee that Parameter-Efficient Fine-Tuning (PEFT) techniques carry a high potential for efficiently specializing LLMs to task-specific data. In this paper, we deliver a comprehensive study of LLMs with the impact of PEFT techniques under the automated code generation scenario. Our experimental results reveal the superiority and potential of such techniques over ICL on a wide range of LLMs in reducing the computational burden and improving performance. Therefore, the study opens opportunities for broader applications of PEFT in software engineering scenarios.
    摘要 大型语言模型(LLM)具有令人印象深刻的能力,可以在零样本情况下根据自然语言意图生成有意义的代码片段,无需特定的微调。为充分释放其潜力,先前的研究已证明在任务特定数据上微调模型的好处。然而,微调过程需要巨大的计算成本,在资源稀缺时难以进行,对于拥有数十亿参数的模型尤其如此。鉴于这些挑战,先前的研究探索了上下文学习(ICL)作为一种无需微调即可生成符合上下文的代码的有效策略。然而,ICL在推理时进行,不学习任务特定的参数,这可能会限制模型在下游任务上的性能。在此背景下,我们预见参数高效微调(PEFT)技术在将LLM高效特化到任务特定数据方面具有很高的潜力。在本文中,我们对PEFT技术在自动代码生成场景下对LLM的影响进行了全面研究。实验结果表明,在众多LLM上,PEFT技术在降低计算负担和提升性能方面优于ICL。因此,本研究为PEFT在软件工程场景中的更广泛应用开辟了机会。
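To make the "parameter-efficient" claim concrete: PEFT methods such as LoRA (one representative technique; the paper surveys several) freeze the pretrained weights and train only a low-rank additive update. A minimal NumPy sketch of the idea — the sizes and names below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 1024, 1024, 8             # layer sizes and LoRA rank (illustrative)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)
alpha = 16.0                               # scaling hyperparameter

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

full = W.size            # parameters a full fine-tune would update
lora = A.size + B.size   # parameters LoRA actually trains
print(f"trainable fraction: {lora / full:.2%}")  # → 1.56% of a full fine-tune
```

With the zero-initialized `B`, the adapted layer starts out identical to the frozen one, so training begins from the pretrained model's behavior.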

Comparing Measures of Linguistic Diversity Across Social Media Language Data and Census Data at Subnational Geographic Areas

  • paper_url: http://arxiv.org/abs/2308.10452
  • repo_url: None
  • paper_authors: Sidney G. -J. Wong, Jonathan Dunn, Benjamin Adams
  • for: 这个研究旨在比较在线空间(社交媒体语言数据)和实际世界空间(新西兰地区)的语言生态学。
  • methods: 研究使用了社交媒体语言数据和实际世界地区的语言多样性的比较方法。
  • results: 研究结果表明,可以使用社交媒体语言数据观察实际地区的语言多样性的空间和时间变化,但需要进一步的研究以确定社交媒体是否准确反映实际行为。
    Abstract This paper describes a preliminary study on the comparative linguistic ecology of online spaces (i.e., social media language data) and real-world spaces in Aotearoa New Zealand (i.e., subnational administrative areas). We compare measures of linguistic diversity between these different spaces and discuss how social media users align with real-world populations. The results from the current study suggest that there is potential to use online social media language data to observe spatial and temporal changes in linguistic diversity at subnational geographic areas; however, further work is required to understand how well social media represents real-world behaviour.
    摘要 本文描述了一项初步研究,比较了线上空间(即社交媒体语言数据)与新西兰奥特亚罗瓦的现实空间(即次国家级行政区域)的语言生态。我们比较了这些不同空间之间的语言多样性度量,并讨论了社交媒体用户与现实人口的对应程度。本研究的结果表明,利用在线社交媒体语言数据观察次国家级地理区域语言多样性的时空变化具有潜力;然而,还需要进一步的工作来理解社交媒体在多大程度上代表了现实世界的行为。
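Areal linguistic diversity is commonly summarized with an entropy-style index over language shares; the specific measures compared in the paper are not detailed in the abstract, so the sketch below uses Shannon entropy as one standard choice, with made-up counts for a hypothetical area:

```python
import math

def shannon_diversity(language_counts: dict) -> float:
    """Shannon entropy (in nats) of the language distribution in an area.
    Higher values indicate a more even mix of languages."""
    total = sum(language_counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in language_counts.values() if n > 0)

# Hypothetical speaker counts for one subnational area (illustrative only):
area = {"English": 800, "Maori": 120, "Samoan": 50, "Mandarin": 30}
print(round(shannon_diversity(area), 3))  # → 0.688
```

The same function can be applied to census tabulations and to per-area language-identified social media posts, which is the kind of comparison the paper describes.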

Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts

  • paper_url: http://arxiv.org/abs/2308.10410
  • repo_url: None
  • paper_authors: Fan Gao, Hang Jiang, Moritz Blum, Jinghui Lu, Yuang Jiang, Irene Li
  • for: 这篇论文的目标是评估大语言模型(LLM)在计算机科学-自然语言处理(NLP)领域生成简明综述文章的能力。
  • methods: 论文使用自动评估和人工评估来衡量GPT-3.5和GPT-4在20个选定主题上的综述文章生成能力。
  • results: 自动评估表明GPT-4优于GPT-3.5;人工评估者指出了GPT-4生成综述文章的优缺点,包括信息不完整和事实不准确的情况。
    Abstract Large Language Models (LLMs) have achieved significant success across various natural language processing (NLP) tasks, encompassing question-answering, summarization, and machine translation, among others. While LLMs excel in general tasks, their efficacy in domain-specific applications remains under exploration. Additionally, LLM-generated text sometimes exhibits issues like hallucination and disinformation. In this study, we assess LLMs' capability of producing concise survey articles within the computer science-NLP domain, focusing on 20 chosen topics. Automated evaluations indicate that GPT-4 outperforms GPT-3.5 when benchmarked against the ground truth. Furthermore, four human evaluators provide insights from six perspectives across four model configurations. Through case studies, we demonstrate that while GPT often yields commendable results, there are instances of shortcomings, such as incomplete information and lapses in factual accuracy.
    摘要 大型语言模型(LLM)在各种自然语言处理(NLP)任务中取得了显著成功,包括问答、摘要和机器翻译等。虽然LLM在通用任务上表现出色,但其在特定领域应用中的效果仍有待探索。此外,LLM生成的文本有时会出现幻觉和虚假信息等问题。在本研究中,我们评估了LLM在计算机科学-NLP领域生成简明综述文章的能力,关注20个选定主题。自动评估结果表明,以真实综述为基准,GPT-4的表现优于GPT-3.5。此外,四名人工评估者从六个角度对四种模型配置提供了见解。通过案例研究,我们发现GPT虽然常能产生值得称道的结果,但也存在不足之处,例如信息不完整和事实准确性上的失误。

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

  • paper_url: http://arxiv.org/abs/2308.10390
  • repo_url: https://github.com/zihanzhaosjtu/librisqa
  • paper_authors: Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
  • for: 本研究旨在提升大语言模型(LLM)处理多模态功能的能力,特别是口语问答(SQA)任务,该任务需要语音与文本特征之间的精确对齐和深度交互。
  • methods: 我们从Librispeech构建了自由形式、开放式的LibriSQA数据集,并提出了一种轻量级的端到端框架来执行SQA任务;通过将ASR改写为SQA格式,进一步证明了框架处理ASR任务的能力。
  • results: 实验结果表明,我们的框架在LibriSQA数据集上取得了显著成果,证明了LLM对齐和理解多模态信息的能力,为通用多模态LLM的发展铺平了道路。
    Abstract While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features. To address the SQA challenge on LLMs, we initially curated the free-form and open-ended LibriSQA dataset from Librispeech, comprising Part I with natural conversational formats and Part II encompassing multiple-choice questions followed by answers and analytical segments. Both parts collectively include 107k SQA pairs that cover various topics. Given the evident paucity of existing speech-text LLMs, we propose a lightweight, end-to-end framework to execute the SQA task on the LibriSQA, witnessing significant results. By reforming ASR into the SQA format, we further substantiate our framework's capability in handling ASR tasks. Our empirical findings bolster the LLMs' aptitude for aligning and comprehending multimodal information, paving the way for the development of universal multimodal LLMs. The dataset and demo can be found at https://github.com/ZihanZhaoSJTU/LibriSQA.
    摘要 虽然大型语言模型(LLM)已在众多领域和任务上表现出色,但现有LLM在处理多模态功能方面仍存在明显不足,尤其是口语问答(SQA)任务,它需要语音与文本特征之间的精确对齐和深度交互。为应对LLM上的SQA挑战,我们首先基于Librispeech构建了自由形式、开放式的LibriSQA数据集,其中Part I为自然对话格式,Part II为多项选择题及答案与分析段落,两部分共包含覆盖多种话题的107k个SQA对。鉴于现有语音-文本LLM的明显匮乏,我们提出了一种轻量级的端到端框架在LibriSQA上执行SQA任务,并取得了显著成果。通过将ASR改写为SQA格式,我们进一步证明了该框架处理ASR任务的能力。实验结果证实了LLM对齐和理解多模态信息的能力,为通用多模态LLM的发展铺平了道路。数据集和演示可在 https://github.com/ZihanZhaoSJTU/LibriSQA 获取。
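The abstract's idea of "reforming ASR into the SQA format" can be pictured as a simple data transformation: the audio clip becomes the context, a fixed transcription-style prompt becomes the question, and the reference transcript becomes the answer. A sketch of that reframing — the field names and prompt wording are illustrative assumptions, not the dataset's actual schema:

```python
def asr_to_sqa(audio_path: str, transcript: str) -> dict:
    """Wrap a plain ASR example as a spoken question-answering example,
    so an SQA framework can be evaluated on the ASR task unchanged."""
    return {
        "speech": audio_path,                           # input audio clip
        "question": "What is said in this recording?",  # fixed ASR-style prompt
        "answer": transcript,                           # reference transcript
    }

example = asr_to_sqa("clip_0001.flac", "the quick brown fox")
print(example["question"])
```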

cantnlp@LT-EDI@RANLP-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models

  • paper_url: http://arxiv.org/abs/2308.10370
  • repo_url: None
  • paper_authors: Sidney G. -J. Wong, Matthew Durward, Benjamin Adams, Jonathan Dunn
  • for: The paper was written to develop a multiclass classification system for detecting homophobic and transphobic content in social media comments across five languages.
  • methods: The authors used a BERT-based language model and retrained a transformer-based cross-language pretrained language model, XLMRoBERTa, with spatially and temporally relevant social media language data. They also retrained a subset of models with simulated script-mixed social media language data.
  • results: The authors developed the best performing seven-label classification system for Malayalam, with variable performance for other language and class-label conditions. The inclusion of spatio-temporal data improved the classification performance for all language and task conditions compared to the baseline, and the results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
    Abstract This paper describes our multiclass classification system developed as part of the LTEDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLMRoBERTa, with spatially and temporally relevant social media language data. We also retrained a subset of models with simulated script-mixed social media language data, with varied performance. We developed the best performing seven-label classification system for Malayalam based on weighted macro-averaged F1 score (ranked first out of six), with variable performance for other language and class-label conditions. We found that the inclusion of this spatio-temporal data improved the classification performance for all language and task conditions when compared with the baseline. The results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
    摘要 本文描述了我们作为LTEDI@RANLP-2023共享任务的一部分开发的多类分类系统。我们使用基于BERT的语言模型检测社交媒体评论中的恐同和恐跨性别内容,涵盖五种语言条件:英语、西班牙语、印地语、马拉雅拉姆语和泰米尔语。我们使用具有空间和时间相关性的社交媒体语言数据,对基于Transformer的跨语言预训练语言模型XLMRoBERTa进行了再训练。我们还用模拟的混合文字社交媒体语言数据对部分模型进行了再训练,其性能各有差异。我们基于加权宏平均F1分数,为马拉雅拉姆语开发了表现最佳的七标签分类系统(六个参赛系统中排名第一),而在其他语言和类别标签条件下性能不一。我们发现,与基线相比,纳入这些时空数据提升了所有语言和任务条件下的分类性能。这些结果表明,基于Transformer的语言分类系统对特定语域和特定语言的再训练较为敏感。
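The shared task's ranking metric, weighted macro-averaged F1, weights each class's F1 score by its support, so performance on the dominant "none" class counts proportionally more than on rare classes. A minimal sketch with made-up per-class values (the numbers are illustrative, not the paper's results):

```python
def weighted_f1(f1_by_class: dict, support_by_class: dict) -> float:
    """Support-weighted average of per-class F1 scores
    (equivalent to scikit-learn's f1_score with average='weighted')."""
    total = sum(support_by_class.values())
    return sum(f1_by_class[c] * support_by_class[c] / total for c in f1_by_class)

# Hypothetical three-label example with class imbalance (illustrative only):
f1 = {"none": 0.95, "homophobia": 0.60, "transphobia": 0.50}
support = {"none": 900, "homophobia": 60, "transphobia": 40}
print(round(weighted_f1(f1, support), 3))  # → 0.911
```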

Economic Policy Uncertainty: A Review on Applications and Measurement Methods with Focus on Text Mining Methods

  • paper_url: http://arxiv.org/abs/2308.10304
  • repo_url: None
  • paper_authors: Fatemeh Kaveh-Yazdy, Sajjad Zarifzadeh
  • for: This paper focuses on the measurement of Economic Policy Uncertainty (EPU) and its impact on investments, unemployment rates, and recessions.
  • methods: The paper reviews and compares three major groups of EPU measurement methods: financial-parameter-based, text-mining-based, and implied-uncertainty-based methods.
  • results: The paper surveys the research areas that rely on EPU indices and proposes a list of future research directions focusing on text-based measurement methods.
    Abstract Economic Policy Uncertainty (EPU) represents the uncertainty experienced by investors during economic policy alterations. EPU is a critical indicator in economic studies for predicting future investments, the unemployment rate, and recessions. EPU values can be estimated directly from financial parameters or indirectly as implied uncertainty using text mining methods. Although EPU is a well-studied topic within economics, the methods utilized to measure it are understudied. In this article, we briefly define EPU, review the methods used to measure it, and survey the areas influenced by changes in the EPU level. We divide the EPU measurement methods into three major groups with respect to their input data. Examples of each group of methods are listed, and the pros and cons of the groups are discussed. Among the EPU measures, text-mining-based ones are the most widely studied. These methods measure the realized uncertainty by taking into account the uncertainty represented in the news and publicly available sources of financial information. Finally, we survey the research areas that rely on measuring the EPU index, with the hope that studying the impacts of uncertainty will attract further attention from researchers in various research fields. In addition, we propose a list of future research approaches focusing on measuring EPU using textual material.
    摘要 经济政策不确定性(EPU)反映了投资者在经济政策变动期间所感受到的不确定性。EPU是经济研究中预测未来投资、失业率和经济衰退的关键指标。EPU值可以直接基于金融参数估计,也可以通过文本挖掘方法间接估计隐含的不确定性。尽管EPU在经济领域已得到深入研究,但用于衡量它的方法却研究不足。在本文中,我们简要定义EPU,回顾衡量EPU的各种方法,并综述受EPU水平变化影响的领域。我们根据输入数据将EPU衡量方法分为三大类,列举了每类方法的示例,并讨论了各类方法的优缺点。在各种EPU度量中,基于文本挖掘的方法研究最为广泛,这类方法通过考虑新闻和公开金融信息来源中所体现的不确定性来衡量已实现的不确定性。最后,我们综述了依赖EPU指数的研究领域,希望对不确定性影响的研究能吸引更多来自不同领域的研究者的关注。此外,我们还提出了一系列聚焦于利用文本材料衡量EPU的未来研究方向。
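The dominant text-mining approach to EPU (in the style of the Baker, Bloom and Davis newspaper index) flags an article as uncertainty-related when it contains at least one term from each of the economy, policy, and uncertainty categories, then tracks the share of flagged articles over time. A simplified sketch — the term lists are abbreviated for illustration and omit the normalization steps of the full index:

```python
# Abbreviated term lists per category (illustrative; real indices use
# curated, language-specific lists and normalize counts across outlets).
ECONOMY = {"economy", "economic"}
POLICY = {"policy", "regulation", "congress", "legislation"}
UNCERTAINTY = {"uncertain", "uncertainty"}

def is_epu_article(text: str) -> bool:
    """True if the article mentions all three term categories."""
    words = set(text.lower().split())
    return bool(words & ECONOMY) and bool(words & POLICY) and bool(words & UNCERTAINTY)

def epu_share(articles: list) -> float:
    """Fraction of articles flagged as EPU-related (pre-normalization)."""
    return sum(map(is_epu_article, articles)) / len(articles)

corpus = [
    "economic uncertainty surrounds the new tax policy",
    "local team wins the championship",
    "congress debates economic legislation amid uncertainty",
    "weather forecast predicts rain",
]
print(epu_share(corpus))  # → 0.5
```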