cs.CL - 2023-12-02

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report

  • paper_url: http://arxiv.org/abs/2312.01244
  • repo_url: None
  • paper_authors: Ali Hürriyetoğlu, Hristo Tanev, Osman Mutlu, Surendrabikram Thapa, Fiona Anting Tan, Erdem Yörük
  • for: The workshop brings together all aspects of event information collection across technical and social science fields, advancing text-based event extraction.
  • methods: The workshop consists of regular papers, three keynotes, working papers of shared-task participants, and shared-task overview papers.
  • results: The workshop provides a space for organizing a multimodal event information collection task and contributes to progress in text-based event extraction.
    Abstract We provide a summary of the sixth edition of the CASE workshop that is held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to the progress in text based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.

UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection

  • paper_url: http://arxiv.org/abs/2312.01225
  • repo_url: None
  • paper_authors: Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, Elke Rundensteiner
  • for: Detection and identification of foodborne-illness reports in tweets.
  • methods: A deep learning framework that augments a small set of expert-labeled tweets with crowdsourced-labeled and massive unlabeled data.
  • results: Across different expert-labeled set sizes and class-imbalance ratios, the EGAL model outperforms strong baseline models; a case study on a multistate outbreak of Salmonella Typhimurium linked to packaged salad greens shows it surfaces tweets with valuable outbreak insights.
    Abstract Foodborne illnesses significantly impact public health. Deep learning surveillance applications using social media data aim to detect early warning signals. However, labeling foodborne illness-related tweets for model training requires extensive human resources, making it challenging to collect a sufficient number of high-quality labels for tweets within a limited budget. The severe class imbalance resulting from the scarcity of foodborne illness-related tweets among the vast volume of social media further exacerbates the problem. Classifiers trained on a class-imbalanced dataset are biased towards the majority class, making accurate detection difficult. To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data. Specifically, by leveraging tweets labeled by experts as a reward set, EGAL learns to assign a weight of zero to incorrectly labeled tweets to mitigate their negative influence. Other tweets receive proportionate weights to counter-balance the unbalanced class distribution. Extensive experiments on real-world \textit{TWEET-FID} data show that EGAL outperforms strong baseline models across different settings, including varying expert-labeled set sizes and class imbalance ratios. A case study on a multistate outbreak of Salmonella Typhimurium infection linked to packaged salad greens demonstrates how the trained model captures relevant tweets offering valuable outbreak insights. EGAL, funded by the U.S. Department of Agriculture (USDA), has the potential to be deployed for real-time analysis of tweet streaming, contributing to foodborne illness outbreak surveillance efforts.
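
The abstract's weighting idea — zero weight for tweets judged mislabeled against the expert reward set, and class-proportionate weights elsewhere to offset imbalance — can be sketched roughly as follows. The weighting rule and data layout are our assumptions for illustration, not the released EGAL implementation.

```python
# Rough sketch of instance re-weighting for imbalanced, noisily labeled data,
# in the spirit of the abstract (not the actual EGAL implementation).
from collections import Counter

def instance_weights(labels, flagged_incorrect):
    """Zero weight for tweets judged mislabeled; otherwise weight inversely
    proportional to class frequency to counter-balance class imbalance."""
    counts = Counter(labels)
    n = len(labels)
    weights = []
    for i, y in enumerate(labels):
        if i in flagged_incorrect:   # e.g. judged wrong against the expert reward set
            weights.append(0.0)
        else:
            weights.append(n / (len(counts) * counts[y]))
    return weights

labels = [0, 0, 0, 0, 1]             # heavy majority of non-illness tweets
print(instance_weights(labels, flagged_incorrect={3}))
# -> [0.625, 0.625, 0.625, 0.0, 2.5]
```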

English to Arabic machine translation of mathematical documents

  • paper_url: http://arxiv.org/abs/2312.03753
  • repo_url: None
  • paper_authors: Mustapha Eddahibi, Mohammed Mensouri
  • for: This paper aims to develop a machine translation system for LATEX mathematical documents, specifically tailored for translating English LATEX documents into Arabic LATEX.
  • methods: The proposed system utilizes a Transformer model as the core of the translation system, ensuring enhanced accuracy and fluency in the translated Arabic LATEX documents. Additionally, the system integrates RyDArab, an Arabic mathematical TEX extension, and a rule-based translator for Arabic mathematical expressions.
  • results: The developed system demonstrates efficacy in bridging the language gap in the domain of mathematical documentation, providing precise rendering of complex mathematical symbols and equations in the translated output.
    Abstract This paper is about the development of a machine translation system tailored specifically for LATEX mathematical documents. The system focuses on translating English LATEX mathematical documents into Arabic LATEX, catering to the growing demand for multilingual accessibility in scientific and mathematical literature. With the vast proliferation of LATEX mathematical documents, the need for an efficient and accurate translation system has become increasingly essential. This paper addresses the necessity for a robust translation tool that enables seamless communication and comprehension of complex mathematical content across language barriers. The proposed system leverages a Transformer model as the core of the translation system, ensuring enhanced accuracy and fluency in the translated Arabic LATEX documents. Furthermore, the integration of RyDArab, an Arabic mathematical TEX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output. The paper discusses the architecture and methodology of the developed system, highlighting its efficacy in bridging the language gap in the domain of mathematical documentation.
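
The abstract pairs a Transformer translation model with rule-based handling of mathematical expressions. A minimal illustrative sketch of that idea — not the authors' system — is to mask math spans before neural translation and restore them afterwards; the Helsinki-NLP/opus-mt-en-ar checkpoint and the masking scheme below are assumptions.

```python
# Illustrative sketch: shield LaTeX math while neurally translating the prose.
# The MT checkpoint and masking scheme are assumptions, not the paper's system.
import re
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar")

def translate_latex_paragraph(text: str) -> str:
    math_spans = re.findall(r"\$[^$]+\$", text)
    masked = text
    for i, span in enumerate(math_spans):
        masked = masked.replace(span, f"MATH{i}", 1)   # keep math out of the MT model
    translated = translator(masked)[0]["translation_text"]
    for i, span in enumerate(math_spans):
        # a real system would route `span` through a rule-based math translator here
        translated = translated.replace(f"MATH{i}", span, 1)
    return translated

print(translate_latex_paragraph("The derivative of $x^2$ is $2x$."))
```

In practice the placeholder tokens must be chosen so the MT model passes them through unchanged; that detail is elided here.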

Automatic Scoring of Students’ Science Writing Using Hybrid Neural Network

  • paper_url: http://arxiv.org/abs/2312.03752
  • repo_url: None
  • paper_authors: Ehsan Latif, Xiaoming Zhai
  • for: This study explores the efficacy of a multi-perspective hybrid neural network (HNN) for automatically scoring student responses in science education.
  • methods: The accuracy of the HNN model is compared with four ML approaches (BERT, AACR, Naive Bayes, and Logistic Regression).
  • results: The HNN achieved 8%, 3%, 1%, and 0.12% higher accuracy than Naive Bayes, Logistic Regression, AACR, and BERT, respectively, across five scoring aspects (p<0.001). Overall HNN accuracy (M = 96.23%, SD = 1.45%) is comparable to that of the (training- and inference-) expensive BERT model (M = 96.12%, SD = 1.52%), while HNN is about twice as efficient in training and inference as BERT and comparable in efficiency to the lightweight but less accurate Naive Bayes model.
    Abstract This study explores the efficacy of a multi-perspective hybrid neural network (HNN) for scoring student responses in science education with an analytic rubric. We compared the accuracy of the HNN model with four ML approaches (BERT, AACR, Naive Bayes, and Logistic Regression). The results have shown that HNN achieved 8%, 3%, 1%, and 0.12% higher accuracy than Naive Bayes, Logistic Regression, AACR, and BERT, respectively, for five scoring aspects (p<0.001). The overall HNN's perceived accuracy (M = 96.23%, SD = 1.45%) is comparable to the (training and inference) expensive BERT model's accuracy (M = 96.12%, SD = 1.52%). We also observed that HNN is x2 more efficient in training and inferencing than BERT and has comparable efficiency to the lightweight but less accurate Naive Bayes model. Our study confirmed the accuracy and efficiency of using HNN to score students' science writing automatically.

Enabling Quantum Natural Language Processing for Hindi Language

  • paper_url: http://arxiv.org/abs/2312.01221
  • repo_url: None
  • paper_authors: Naman Srivastava, Gaurang Belekar, Sunil Saumya, Aswath Babu H
  • for: This paper aims to bring Quantum Natural Language Processing (QNLP) techniques to Hindi, addressing the shortcomings of classical Natural Language Processing (NLP) and moving towards a more "explainable" NLP system.
  • methods: Sentence diagrams are drawn using the pregroup representation of Hindi and the DisCoCat framework, and then translated into parameterized quantum circuits based on an Instantaneous Quantum Polynomial (IQP) style ansatz.
  • results: These parameterized quantum circuits allow training grammar- and topic-aware sentence classifiers for the Hindi language.
    Abstract Quantum Natural Language Processing (QNLP) is taking huge leaps in solving the shortcomings of classical Natural Language Processing (NLP) techniques and moving towards a more "Explainable" NLP system. The current literature around QNLP focuses primarily on implementing QNLP techniques in sentences in the English language. In this paper, we propose to enable the QNLP approach to HINDI, which is the third most spoken language in South Asia. We present the process of building the parameterized quantum circuits required to undertake QNLP on Hindi sentences. We use the pregroup representation of Hindi and the DisCoCat framework to draw sentence diagrams. Later, we translate these diagrams to Parameterised Quantum Circuits based on Instantaneous Quantum Polynomial (IQP) style ansatz. Using these parameterized quantum circuits allows one to train grammar and topic-aware sentence classifiers for the Hindi Language.
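
The abstract maps DisCoCat sentence diagrams to IQP-style parameterized circuits. Below is a minimal sketch of such an ansatz written with Qiskit; the qubit assignment and layer structure are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative IQP-style ansatz: Hadamard walls around diagonal parameterized
# gates. Qubit counts and layering are assumptions for illustration only.
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

def iqp_ansatz(n_qubits: int, n_layers: int) -> QuantumCircuit:
    qc = QuantumCircuit(n_qubits)
    k = 0
    for _ in range(n_layers):
        qc.h(range(n_qubits))                  # Hadamard wall
        for q in range(n_qubits):              # single-qubit diagonal rotations
            qc.rz(Parameter(f"theta_{k}"), q)
            k += 1
        for q in range(n_qubits - 1):          # entangling diagonal rotations
            qc.crz(Parameter(f"theta_{k}"), q, q + 1)
            k += 1
    qc.h(range(n_qubits))
    return qc

# e.g. noun and sentence wires each mapped to a single qubit (an assumption)
print(iqp_ansatz(n_qubits=2, n_layers=1).draw())
```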

Understanding Opinions Towards Climate Change on Social Media

  • paper_url: http://arxiv.org/abs/2312.01217
  • repo_url: None
  • paper_authors: Yashaswi Pupneja, Joseph Zou, Sacha Lévy, Shenyang Huang
  • for: This study aims to understand how public opinion on climate change evolves on social media, in particular how community structure and opinions shift around Conference of the Parties (COP) events.
  • methods: The study analyzes 13.6 million tweets from 3.6 million users, constructs a temporal graph from the user-user mentions network, applies the Louvain community detection algorithm, and uses NLP tools for sentiment analysis and topic modeling.
  • results: Community structure around climate-change discussions changes noticeably around COP events, along with users' opinions; the analysis also surfaces key topics and sentiment shifts, such as political polarization and the spread of misinformation.
    Abstract Social media platforms such as Twitter (now known as X) have revolutionized how the public engage with important societal and political topics. Recently, climate change discussions on social media became a catalyst for political polarization and the spreading of misinformation. In this work, we aim to understand how real world events influence the opinions of individuals towards climate change related topics on social media. To this end, we extracted and analyzed a dataset of 13.6 millions tweets sent by 3.6 million users from 2006 to 2019. Then, we construct a temporal graph from the user-user mentions network and utilize the Louvain community detection algorithm to analyze the changes in community structure around Conference of the Parties on Climate Change~(COP) events. Next, we also apply tools from the Natural Language Processing literature to perform sentiment analysis and topic modeling on the tweets. Our work acts as a first step towards understanding the evolution of pro-climate change communities around COP events. Answering these questions helps us understand how to raise people's awareness towards climate change thus hopefully calling on more individuals to join the collaborative effort in slowing down climate change.
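
For the mention-network analysis the abstract describes, a minimal sketch of building a user-user mention graph and running Louvain community detection might look as follows; the tweet fields and the library choice (NetworkX) are assumptions, not the authors' pipeline.

```python
# Illustrative sketch: weighted mention graph + Louvain communities.
import networkx as nx
from networkx.algorithms.community import louvain_communities

tweets = [
    {"user": "alice", "mentions": ["bob"], "timestamp": "2015-11-30"},
    {"user": "bob", "mentions": ["carol", "alice"], "timestamp": "2015-12-01"},
]

G = nx.Graph()
for t in tweets:
    for mentioned in t["mentions"]:
        # accumulate an edge weight per user-user mention pair
        w = G.get_edge_data(t["user"], mentioned, default={"weight": 0})["weight"]
        G.add_edge(t["user"], mentioned, weight=w + 1)

# Louvain community detection (available in NetworkX >= 2.8)
communities = louvain_communities(G, weight="weight", seed=0)
print(communities)
```

Repeating this on time slices of the graph (e.g., before and after each COP) gives the temporal view of community structure the paper studies.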

Here Is Not There: Measuring Entailment-Based Trajectory Similarity for Location-Privacy Protection and Beyond

  • paper_url: http://arxiv.org/abs/2312.01151
  • repo_url: None
  • paper_authors: Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi, Jinmeng Rao, Song Gao, Ling Cai, Anita Graser
  • for: This paper discusses the limitations of current trajectory-similarity measures defined in abstract space and proposes a new measure based on logical entailment that better accounts for the rich structure of geographic space.
  • methods: The paper uses a case study to formalize entailment-based trajectory similarity and evaluate the effectiveness of the proposed measure using a privacy-preserving trajectory-generation model (LSTM-TrajGAN).
  • results: The paper shows that the proposed entailment-based measure can reveal potential consequences of disregarding the structure of geographic space, such as miscalculated insurance risk due to regional shifts, and highlights the advantage of applying logical entailment to trajectory-similarity reasoning for location-privacy protection and beyond.
    Abstract While the paths humans take play out in social as well as physical space, measures to describe and compare their trajectories are carried out in abstract, typically Euclidean, space. When these measures are applied to trajectories of actual individuals in an application area, alterations that are inconsequential in abstract space may suddenly become problematic once overlaid with geographic reality. In this work, we present a different view on trajectory similarity by introducing a measure that utilizes logical entailment. This is an inferential perspective that considers facts as triple statements deduced from the social and environmental context in which the travel takes place, and their practical implications. We suggest a formalization of entailment-based trajectory similarity, measured as the overlapping proportion of facts, which are spatial relation statements in our case study. With the proposed measure, we evaluate LSTM-TrajGAN, a privacy-preserving trajectory-generation model. The entailment-based model evaluation reveals potential consequences of disregarding the rich structure of geographic space (e.g., miscalculated insurance risk due to regional shifts in our toy example). Our work highlights the advantage of applying logical entailment to trajectory-similarity reasoning for location-privacy protection and beyond.
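
The proposed measure is described as the overlapping proportion of entailed facts (spatial-relation statements). A rough sketch under that reading — not the authors' formalization — follows; the example triples and region names are hypothetical.

```python
# Illustrative sketch: entailment-based trajectory similarity as the
# overlapping proportion of spatial-relation fact triples.
def entailment_similarity(facts_a: set, facts_b: set) -> float:
    """Overlapping proportion of entailed facts (Jaccard-style overlap)."""
    if not facts_a and not facts_b:
        return 1.0
    return len(facts_a & facts_b) / len(facts_a | facts_b)

# Facts are (subject, relation, object) statements deduced from context.
original = {("traj_1", "within", "flood_zone_A"), ("traj_1", "crosses", "county_X")}
synthetic = {("traj_1_synth", "within", "flood_zone_A")}  # regional shift dropped "crosses county_X"

# Compare relation/object parts only, since trajectory identifiers differ by design.
sim = entailment_similarity({f[1:] for f in original}, {f[1:] for f in synthetic})
print(f"entailment-based similarity: {sim:.2f}")
```

A low overlap here would flag exactly the kind of consequence the abstract mentions, e.g., a synthetic trajectory shifting out of a flood-risk region and so distorting insurance-risk estimates.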

Towards leveraging LLMs for Conditional QA

  • paper_url: http://arxiv.org/abs/2312.01143
  • repo_url: None
  • paper_authors: Syed-Amad Hussain, Parag Pravin Dakle, SaiKrishna Rallabandi, Preethi Raghavan
  • for: This study explores the capabilities and limitations of large language models (LLMs) on conditional question-answering tasks.
  • methods: Uses the Conditional Question Answering (CQA) dataset, focusing on the generative models T5 and UL2, and evaluates LLM performance across diverse question types.
  • results: Fine-tuned LLMs can surpass state-of-the-art (SOTA) performance in some cases, even without fully encoding the input context, with gains of 7-8 points in EM and F1 scores for Yes/No questions; however, they lag the SOTA by more than 10 points on extractive question answering and struggle to avoid injecting false information.
    Abstract This study delves into the capabilities and limitations of Large Language Models (LLMs) in the challenging domain of conditional question-answering. Utilizing the Conditional Question Answering (CQA) dataset and focusing on generative models like T5 and UL2, we assess the performance of LLMs across diverse question types. Our findings reveal that fine-tuned LLMs can surpass the state-of-the-art (SOTA) performance in some cases, even without fully encoding all input context, with an increase of 7-8 points in Exact Match (EM) and F1 scores for Yes/No questions. However, these models encounter challenges in extractive question answering, where they lag behind the SOTA by over 10 points, and in mitigating the risk of injecting false information. A study with oracle-retrievers emphasizes the critical role of effective evidence retrieval, underscoring the necessity for advanced solutions in this area. Furthermore, we highlight the significant influence of evaluation metrics on performance assessments and advocate for a more comprehensive evaluation framework. The complexity of the task, the observed performance discrepancies, and the need for effective evidence retrieval underline the ongoing challenges in this field and underscore the need for future work focusing on refining training tasks and exploring prompt-based techniques to enhance LLM performance in conditional question-answering tasks.
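
The reported gains are in Exact Match (EM) and F1. A standard SQuAD-style computation of these metrics — an assumption, since the paper's exact normalization is not given here — looks like this:

```python
# Illustrative sketch of Exact Match (EM) and token-level F1 for QA evaluation.
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("yes", "Yes"))                    # 1.0
print(token_f1("in the contract", "the contract"))  # 0.8
```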

TURead: An eye movement dataset of Turkish reading

  • paper_url: http://arxiv.org/abs/2312.01114
  • repo_url: None
  • paper_authors: Cengiz Acarturk, Aysegul Ozkan, Tugce Nur Pekcetin, Zuhal Ormanoglu, Bilal Kirkici
  • for: Presents an eye-movement dataset of Turkish reading to study the relationship between morphology and oculomotor control.
  • methods: A target-word design in which target words are manipulated by length and by the addition of two commonly used Turkish suffixes; the dataset includes well-established eye-movement variables, prelexical characteristics (e.g., vowel harmony, bigram-trigram frequencies), word features (length, predictability, frequency), eye-voice span measures, Cloze-test scores of root-word and suffix predictabilities, and scores from two working-memory tests.
  • results: Findings on fixation parameters and word characteristics are in line with patterns reported in the relevant literature.
    Abstract In this study, we present TURead, an eye movement dataset of silent and oral sentence reading in Turkish, an agglutinative language with a shallow orthography understudied in reading research. TURead provides empirical data to investigate the relationship between morphology and oculomotor control. We employ a target-word approach in which target words are manipulated by word length and by the addition of two commonly used suffixes in Turkish. The dataset contains well-established eye movement variables; prelexical characteristics such as vowel harmony and bigram-trigram frequencies and word features, such as word length, predictability, frequency, eye voice span measures, Cloze test scores of the root word and suffix predictabilities, as well as the scores obtained from two working memory tests. Our findings on fixation parameters and word characteristics are in line with the patterns reported in the relevant literature.

Which linguistic cues make people fall for fake news? A comparison of cognitive and affective processing

  • paper_url: http://arxiv.org/abs/2312.03751
  • repo_url: None
  • paper_authors: Bernhard Lutz, Marc Adam, Stefan Feuerriegel, Nicolas Pröllochs, Dirk Neumann
  • for: To understand which linguistic cues make people fall for fake news and how to design effective countermeasures for social media.
  • methods: A within-subject experiment collecting neurophysiological measurements from 42 subjects while they read 40 real and fake news articles, measuring cognitive processing through eye fixations and affective processing through heart rate variability.
  • results: Users engage in more cognitive processing for longer fake news articles, while affective processing is more pronounced for fake news written in analytic words; to the authors' knowledge this is the first work studying linguistic cues in fake news processing, with implications for designing online platforms that encourage careful thinking and thus keep users from falling for fake news.
    Abstract Fake news on social media has large, negative implications for society. However, little is known about what linguistic cues make people fall for fake news and, hence, how to design effective countermeasures for social media. In this study, we seek to understand which linguistic cues make people fall for fake news. Linguistic cues (e.g., adverbs, personal pronouns, positive emotion words, negative emotion words) are important characteristics of any text and also affect how people process real vs. fake news. Specifically, we compare the role of linguistic cues across both cognitive processing (related to careful thinking) and affective processing (related to unconscious automatic evaluations). To this end, we performed a within-subject experiment where we collected neurophysiological measurements of 42 subjects while these read a sample of 40 real and fake news articles. During our experiment, we measured cognitive processing through eye fixations, and affective processing in situ through heart rate variability. We find that users engage more in cognitive processing for longer fake news articles, while affective processing is more pronounced for fake news written in analytic words. To the best of our knowledge, this is the first work studying the role of linguistic cues in fake news processing. Altogether, our findings have important implications for designing online platforms that encourage users to engage in careful thinking and thus prevent them from falling for fake news.

End-to-End Speech-to-Text Translation: A Survey

  • paper_url: http://arxiv.org/abs/2312.01053
  • repo_url: None
  • paper_authors: Nivedita Sethiya, Chandresh Kumar Maurya
  • for: A survey of end-to-end speech-to-text translation models, intended to help researchers develop and apply such models.
  • methods: Reviews cascaded systems built from automatic speech recognition (ASR) and machine translation (MT) models as well as end-to-end (E2E) models, together with the metrics and datasets used for speech translation tasks.
  • results: Provides a comprehensive comparison of existing approaches and outlines challenges and future research directions with new insights.
    Abstract Speech-to-text translation pertains to the task of converting speech signals in a language to text in another language. It finds its application in various domains, such as hands-free communication, dictation, video lecture transcription, and translation, to name a few. Automatic Speech Recognition (ASR), as well as Machine Translation(MT) models, play crucial roles in traditional ST translation, enabling the conversion of spoken language in its original form to written text and facilitating seamless cross-lingual communication. ASR recognizes spoken words, while MT translates the transcribed text into the target language. Such disintegrated models suffer from cascaded error propagation and high resource and training costs. As a result, researchers have been exploring end-to-end (E2E) models for ST translation. However, to our knowledge, there is no comprehensive review of existing works on E2E ST. The present survey, therefore, discusses the work in this direction. Our attempt has been to provide a comprehensive review of models employed, metrics, and datasets used for ST tasks, providing challenges and future research direction with new insights. We believe this review will be helpful to researchers working on various applications of ST models.
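
As a point of reference for the cascaded ASR-then-MT systems the survey contrasts with end-to-end models, a minimal cascade sketch is shown below; the specific checkpoints are illustrative examples, not the survey's choices, and the two-stage structure is exactly where cascaded error propagation arises.

```python
# Illustrative cascade: speech -> transcript (ASR) -> target text (MT).
# Errors from the ASR stage propagate unchecked into the MT stage.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

def cascaded_speech_translation(audio_path: str) -> str:
    transcript = asr(audio_path)["text"]          # stage 1: speech -> source text
    return mt(transcript)[0]["translation_text"]  # stage 2: source text -> target text

# print(cascaded_speech_translation("example.wav"))  # hypothetical audio file
```

An E2E model replaces both stages with a single network trained on (speech, target-text) pairs, avoiding the intermediate transcript entirely.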

Structured, Complex and Time-complete Temporal Event Forecasting

  • paper_url: http://arxiv.org/abs/2312.01052
  • repo_url: https://github.com/yecchen/gdelt-complexevent
  • paper_authors: Yunshan Ma, Chenchen Ye, Zijian Wu, Xiang Wang, Yixin Cao, Liang Pang, Tat-Seng Chua
  • for: Proposes a new formulation of temporal events to improve representation quality and forecasting ability.
  • methods: A simple, fully automated pipeline constructs Structured, Complex, and Time-complete Temporal Events (SCTc-TE) from a large amount of news articles, and a forecasting model named LoGo leverages both local and global contexts.
  • results: On two large-scale datasets, MidEast-TE and GDELT-TE, extensive evaluations demonstrate the advantages of the constructed datasets and the effectiveness of the LoGo forecasting model.
    Abstract Temporal event forecasting aims to predict what will happen next given the observed events in history. Previous formulations of temporal event are unstructured, atomic, or lacking full temporal information, thus largely restricting the representation quality and forecasting ability of temporal events. To address these limitations, we introduce a novel formulation for Structured, Complex, and Time-complete Temporal Event (SCTc-TE). Based on this new formulation, we develop a simple and fully automated pipeline for constructing such SCTc-TEs from a large amount of news articles. Furthermore, we propose a novel model that leverages both Local and Global contexts for SCTc-TE forecasting, named LoGo. To evaluate our model, we construct two large-scale datasets named MidEast-TE and GDELT-TE. Extensive evaluations demonstrate the advantages of our datasets in multiple aspects, while experimental results justify the effectiveness of our forecasting model LoGo. We release the code and dataset via https://github.com/yecchen/GDELT-ComplexEvent.
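
One way to picture a "structured, complex, and time-complete" event is as a set of timestamped relational triples grouped into a complex event. The record layout below is our assumption based on the abstract, not the released GDELT-ComplexEvent schema (see the linked repo for the actual format).

```python
# Illustrative sketch of a structured, time-complete event record.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AtomicEvent:
    subject: str        # actor initiating the event
    relation: str       # typed relation / event code
    obj: str            # actor or entity affected
    timestamp: date     # complete time information

@dataclass
class ComplexEvent:
    name: str
    atomic_events: list[AtomicEvent] = field(default_factory=list)

evt = ComplexEvent(
    name="hypothetical_negotiation_2019",
    atomic_events=[
        AtomicEvent("country_A", "express_intent_to_negotiate", "country_B", date(2019, 3, 1)),
        AtomicEvent("country_B", "host_visit", "country_A", date(2019, 3, 14)),
    ],
)
print(len(evt.atomic_events))
```

Forecasting then amounts to predicting the next atomic events of a complex event from its history, using both its own (local) context and the (global) context of co-occurring events.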

  • paper_url: http://arxiv.org/abs/2312.01050
  • repo_url: None
  • paper_authors: Nazzere Oryngozha, Pakizar Shamoi, Ayan Igali
  • for: This paper aims to detect and analyze stress-related posts in Reddit academic communities, with the goal of understanding the stress levels within these communities and developing measures to address the issue effectively.
  • methods: The authors use natural language processing and machine learning techniques, specifically the Bag of Words and Logistic Regression classifier, to classify text as stressed or not. They use a dataset of labeled posts from Reddit (DReaddit) as their training set, and also collect and analyze posts from various academic subreddits.
  • results: The authors find that the most effective individual feature for stress detection is the Bag of Words, paired with the Logistic Regression classifier, which achieves an accuracy rate of 77.78% and an F1 score of 0.79 on the DReaddit dataset. They also find that posts and comments in professors’ Reddit communities are the most stressful, compared to other academic levels.
    Abstract Nowadays, the significance of monitoring stress levels and recognizing early signs of mental illness cannot be overstated. Automatic stress detection in text can proactively help manage stress and protect mental well-being. In today's digital era, social media platforms reflect the psychological well-being and stress levels within various communities. This study focuses on detecting and analyzing stress-related posts in Reddit academic communities. Due to online education and remote work, these communities have become central for academic discussions and support. We classify text as stressed or not using natural language processing and machine learning classifiers, with Dreaddit as our training dataset, which contains labeled data from Reddit. Next, we collect and analyze posts from various academic subreddits. We identified that the most effective individual feature for stress detection is the Bag of Words, paired with the Logistic Regression classifier, achieving a 77.78% accuracy rate and an F1 score of 0.79 on the DReaddit dataset. This combination also performs best in stress detection on human-annotated datasets, with a 72% accuracy rate. Our key findings reveal that posts and comments in professors Reddit communities are the most stressful, compared to other academic levels, including bachelor, graduate, and Ph.D. students. This research contributes to our understanding of the stress levels within academic communities. It can help academic institutions and online communities develop measures and interventions to address this issue effectively.
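
The best-performing configuration reported (Bag of Words features with a Logistic Regression classifier) corresponds to a very small scikit-learn pipeline; the toy texts and labels below are hypothetical stand-ins for the Dreaddit training data.

```python
# Illustrative sketch of the reported Bag of Words + Logistic Regression setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I can't sleep, the thesis deadline is destroying me",
    "Had a great lab meeting today, feeling good about the project",
]
train_labels = [1, 0]  # 1 = stressed, 0 = not stressed

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

print(model.predict(["exam week again and I am completely overwhelmed"]))
```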

Large Language Models Are Zero-Shot Text Classifiers

  • paper_url: http://arxiv.org/abs/2312.01044
  • repo_url: https://github.com/yeyimilk/llm-zero-shot-classifiers
  • paper_authors: Zhiqiang Wang, Yiran Pang, Yanbin Lin
  • for: Validates the capability of GPT models as zero-shot text classifiers.
  • methods: Applies LLMs with zero-shot learning (ZSL) via chain-of-thought (CoT) step-by-step reasoning prompts instead of conventional question-and-answer formats, and compares them with traditional machine learning, deep learning, and ZSL methods.
  • results: Experiments show that LLMs are effective zero-shot text classifiers on three of the four datasets analyzed, which is especially advantageous for small businesses or teams without extensive text-classification expertise.
    Abstract Retrained large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP). In NLP, text classification problems have garnered considerable focus, but still faced with some limitations related to expensive computational cost, time consumption, and robust performance to unseen classes. With the proposal of chain of thought prompting (CoT), LLMs can be implemented using zero-shot learning (ZSL) with the step by step reasoning prompts, instead of conventional question and answer formats. The zero-shot LLMs in the text classification problems can alleviate these limitations by directly utilizing pretrained models to predict both seen and unseen classes. Our research primarily validates the capability of GPT models in text classification. We focus on effectively utilizing prompt strategies to various text classification scenarios. Besides, we compare the performance of zero shot LLMs with other state of the art text classification methods, including traditional machine learning methods, deep learning methods, and ZSL methods. Experimental results demonstrate that the performance of LLMs underscores their effectiveness as zero-shot text classifiers in three of the four datasets analyzed. The proficiency is especially advantageous for small businesses or teams that may not have extensive knowledge in text classification.
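
A zero-shot chain-of-thought classification prompt of the kind the paper studies can be sketched as follows; the label set, prompt wording, and the call_llm helper are hypothetical, not the paper's exact prompts.

```python
# Illustrative sketch: zero-shot CoT prompting for text classification.
LABELS = ["sports", "business", "technology", "politics"]

def build_zero_shot_cot_prompt(text: str) -> str:
    return (
        f"Classify the following text into one of {LABELS}.\n"
        f"Text: {text}\n"
        "Let's think step by step about the topic, then answer with "
        "'Label: <label>' on the last line."
    )

def parse_label(response: str) -> str:
    # take whatever follows the final 'Label:' marker, if present
    tail = response.rsplit("Label:", 1)[-1].strip().lower()
    return next((label for label in LABELS if label in tail), "unknown")

prompt = build_zero_shot_cot_prompt("The central bank raised interest rates again today.")
# response = call_llm(prompt)   # hypothetical LLM call
# print(parse_label(response))
```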

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

  • paper_url: http://arxiv.org/abs/2312.01040
  • repo_url: None
  • paper_authors: Qiang Li, Xiaoyan Yang, Haowen Wang, Qin Wang, Lei Liu, Junjie Wang, Yang Zhang, Mingyuan Chu, Sen Hu, Yicheng Chen, Yue Shen, Cong Fan, Wangshu Zhang, Teng Xu, Jinjie Gu, Jing Zheng, Guannan Zhang (Ant Group)
  • for: Adapts large language models (LLMs) to sensitive applications such as reasoning over medical knowledge and answering medical questions in a physician-like manner.
  • methods: A three-stage optimization procedure: general medical knowledge injection, medical-domain instruction tuning, and specific medical task adaptation; a Verification-of-Choice prompting approach is also proposed for multi-choice questions.
  • results: The resulting AntGLM-Med-10B model outperforms most LLMs on PubMedQA, including both general and medical LLMs, even those with larger model sizes.
    Abstract Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller-scale model sizes (<100B). In this work, we start from a pre-trained general LLM model (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B), which leverages a 3-stage optimization procedure, \textit{i.e.}, general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM in medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question-answering, medical reasoning, multi-choice questions, and medical conversations. (3) Specifically for multi-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompting engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model can outperform the most of LLMs on PubMedQA, including both general and medical LLMs, even when these LLMs have larger model size.

Dual-Teacher De-biasing Distillation Framework for Multi-domain Fake News Detection

  • paper_url: http://arxiv.org/abs/2312.01006
  • repo_url: https://github.com/ningljy/dtdbd
  • paper_authors: Jiayang Li, Xuan Feng, Tianlong Gu, Liang Chang
  • for: Multi-domain fake news detection aims to identify whether news from different domains is real or fake, which has become an urgent and important problem.
  • methods: Proposes the Dual-Teacher De-biasing Distillation framework (DTDBD) to mitigate domain bias; DTDBD adopts a teacher-student structure in which a pre-trained unbiased teacher and a clean teacher jointly guide the student model to reduce domain bias while maintaining performance.
  • results: Extensive experiments on Chinese and English datasets show that the method substantially outperforms state-of-the-art baselines on bias metrics while maintaining competitive performance.
    Abstract Multi-domain fake news detection aims to identify whether various news from different domains is real or fake and has become urgent and important. However, existing methods are dedicated to improving the overall performance of fake news detection, ignoring the fact that unbalanced data leads to disparate treatment for different domains, i.e., the domain bias problem. To solve this problem, we propose the Dual-Teacher De-biasing Distillation framework (DTDBD) to mitigate bias across different domains. Following the knowledge distillation methods, DTDBD adopts a teacher-student structure, where pre-trained large teachers instruct a student model. In particular, the DTDBD consists of an unbiased teacher and a clean teacher that jointly guide the student model in mitigating domain bias and maintaining performance. For the unbiased teacher, we introduce an adversarial de-biasing distillation loss to instruct the student model in learning unbiased domain knowledge. For the clean teacher, we design domain knowledge distillation loss, which effectively incentivizes the student model to focus on representing domain features while maintaining performance. Moreover, we present a momentum-based dynamic adjustment algorithm to trade off the effects of two teachers. Extensive experiments on Chinese and English datasets show that the proposed method substantially outperforms the state-of-the-art baseline methods in terms of bias metrics while guaranteeing competitive performance.
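
The abstract describes a student guided by an unbiased teacher (adversarial de-biasing distillation) and a clean teacher (domain knowledge distillation), with a momentum-based adjustment trading off the two. A rough PyTorch sketch of such a combined objective — our reading of the abstract, not the released code at github.com/ningljy/dtdbd — follows.

```python
# Illustrative sketch: student loss combining two teacher signals, with a
# momentum-updated coefficient balancing de-biasing vs. domain knowledge.
import torch
import torch.nn.functional as F

def dual_teacher_loss(student_logits, unbiased_t_logits, clean_t_logits,
                      labels, alpha, temperature=2.0):
    t = temperature
    kl = lambda s, te: F.kl_div(
        F.log_softmax(s / t, dim=-1), F.softmax(te / t, dim=-1),
        reduction="batchmean") * (t * t)
    ce = F.cross_entropy(student_logits, labels)       # task loss on true labels
    debias = kl(student_logits, unbiased_t_logits)     # follow the unbiased teacher
    domain = kl(student_logits, clean_t_logits)        # preserve domain knowledge
    return ce + alpha * debias + (1.0 - alpha) * domain

def update_alpha(alpha, debias_loss, domain_loss, momentum=0.9):
    """Momentum-based adjustment of the trade-off coefficient (form assumed)."""
    target = debias_loss / (debias_loss + domain_loss + 1e-8)
    return momentum * alpha + (1.0 - momentum) * float(target)

logits = torch.randn(4, 2)
loss = dual_teacher_loss(logits, torch.randn(4, 2), torch.randn(4, 2),
                         torch.tensor([0, 1, 0, 1]), alpha=0.5)
print(loss.item())
```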