cs.CL - 2023-09-06

RoDia: A New Dataset for Romanian Dialect Identification from Speech

  • paper_url: http://arxiv.org/abs/2309.03378
  • repo_url: https://github.com/codrut2/rodia
  • paper_authors: Codrut Rotaru, Nicolae-Catalin Ristea, Radu Tudor Ionescu
  • for: The paper introduces a dataset for Romanian dialect identification from speech, to enable further research on this task.
  • methods: The paper evaluates a set of competitive baseline models for Romanian dialect identification, including a model based on audio features and a model based on text features.
  • results: The top-scoring model achieves a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging.
    Abstract Dialect identification is a critical task in speech processing and language technology, enhancing various applications such as speech recognition, speaker verification, and many others. While most research studies have been dedicated to dialect identification in widely spoken languages, limited attention has been given to dialect identification in low-resource languages, such as Romanian. To address this research gap, we introduce RoDia, the first dataset for Romanian dialect identification from speech. The RoDia dataset includes a varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments, totaling 2 hours of manually annotated speech data. Along with our dataset, we introduce a set of competitive models to be used as baselines for future research. The top scoring model achieves a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging. We thus believe that RoDia is a valuable resource that will stimulate research aiming to address the challenges of Romanian dialect identification. We publicly release our dataset and code at https://github.com/codrut2/RoDia.
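
The baselines are reported under both macro and micro F1, which weight the five regional classes differently. A minimal sketch of how the two scores are computed with scikit-learn (the labels and predictions below are illustrative, not from the dataset):

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted dialect labels for five regions.
y_true = ["banat", "crisana", "maramures", "moldavia", "transylvania", "banat"]
y_pred = ["banat", "crisana", "moldavia", "moldavia", "transylvania", "crisana"]

# Macro F1 averages per-class F1 scores, treating all regions equally.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# Micro F1 pools all decisions, so frequent regions dominate the score.
micro_f1 = f1_score(y_true, y_pred, average="micro")
print(f"macro F1: {macro_f1:.4f}, micro F1: {micro_f1:.4f}")
```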

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

  • paper_url: http://arxiv.org/abs/2309.03340
  • repo_url: None
  • paper_authors: Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz
  • for: The paper addresses the overparameterization and large memory footprint of models for automated audio captioning.
  • methods: The authors propose a data augmentation technique for generating hallucinated captions and show that similarity in an audio-text shared latent space is suitable for detecting hallucination. They also propose a parameter-efficient, inference-time faithful decoding algorithm that reduces model size and computational complexity.
  • results: On benchmark datasets, the method matches the performance of larger models while requiring less memory and computation.
    Abstract There has been significant research on developing pretrained transformer architectures for multimodal-to-text generation tasks. Albeit performance improvements, such models are frequently overparameterized, hence suffer from hallucination and large memory footprint making them challenging to deploy on edge devices. In this paper, we address both these issues for the application of automated audio captioning. First, we propose a data augmentation technique for generating hallucinated audio captions and show that similarity based on an audio-text shared latent space is suitable for detecting hallucination. Then, we propose a parameter efficient inference time faithful decoding algorithm that enables smaller audio captioning models with performance equivalent to larger models trained with more data. During the beam decoding step, the smaller model utilizes an audio-text shared latent representation to semantically align the generated text with corresponding input audio. Faithful guidance is introduced into the beam probability by incorporating the cosine similarity between latent representation projections of greedy rolled out intermediate beams and audio clip. We show the efficacy of our algorithm on benchmark datasets and evaluate the proposed scheme against baselines using conventional audio captioning and semantic similarity metrics while illustrating tradeoffs between performance and complexity.
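
The faithful decoding step re-weights beam probabilities with the cosine similarity between the audio clip's latent representation and the latent projection of each greedily rolled-out beam. A minimal sketch of that re-scoring, assuming precomputed projection functions (embed_audio, embed_text, and the weighting factor alpha are placeholders, not the paper's actual components):

```python
import torch
import torch.nn.functional as F

def faithful_beam_scores(beam_log_probs, beam_texts, audio_clip,
                         embed_text, embed_audio, alpha=0.5):
    """Re-rank beams by mixing language-model probability with
    audio-text similarity in a shared latent space (illustrative)."""
    audio_z = embed_audio(audio_clip)                           # (d,) audio latent
    text_z = torch.stack([embed_text(t) for t in beam_texts])   # (B, d)
    # Cosine similarity between each rolled-out beam and the audio clip.
    sim = F.cosine_similarity(text_z, audio_z.unsqueeze(0), dim=-1)  # (B,)
    # Faithful guidance: blend beam log-probability with semantic alignment.
    return beam_log_probs + alpha * sim
```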

Gender-specific Machine Translation with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03175
  • repo_url: None
  • paper_authors: Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
  • for: investigate the use of Decoder-only Large Language Models (LLMs) for gender-specific translations
  • methods: use LLaMa, a decoder-only LLM, to generate gender-specific translations and compare its performance to a state-of-the-art multilingual NMT system (NLLB)
  • results: LLaMa can generate gender-specific translations with competitive accuracy and gender bias mitigation compared to NLLB; its translations show significant performance drops against opposite-gender references in gender-ambiguous datasets while maintaining consistency in less ambiguous contexts.
    Abstract Decoder-only Large Language Models (LLMs) have demonstrated potential in machine translation (MT), albeit with performance slightly lagging behind traditional encoder-decoder Neural Machine Translation (NMT) systems. However, LLMs offer a unique advantage: the ability to control the properties of the output through prompts. In this study, we harness this flexibility to explore LLaMa's capability to produce gender-specific translations for languages with grammatical gender. Our results indicate that LLaMa can generate gender-specific translations with competitive accuracy and gender bias mitigation when compared to NLLB, a state-of-the-art multilingual NMT system. Furthermore, our experiments reveal that LLaMa's translations are robust, showing significant performance drops when evaluated against opposite-gender references in gender-ambiguous datasets but maintaining consistency in less ambiguous contexts. This research provides insights into the potential and challenges of using LLMs for gender-specific translations and highlights the importance of in-context learning to elicit new tasks in LLMs.
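
The study relies on prompting alone to control the gender of the output. The abstract does not give the exact prompt wording, so the following gender-conditioned few-shot prompt is only an illustration of the idea:

```python
def gender_specific_prompt(source: str, gender: str) -> str:
    """Build an illustrative few-shot prompt asking a decoder-only LLM
    for a translation whose references to the speaker use one gender."""
    example = "Estoy cansada." if gender == "female" else "Estoy cansado."
    return (
        f"Translate from English to Spanish, assuming the speaker is {gender}.\n"
        "English: I am tired.\n"
        f"Spanish: {example}\n"
        f"English: {source}\n"
        "Spanish:"
    )

print(gender_specific_prompt("I am happy to be here.", "female"))
```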

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03079
  • repo_url: https://github.com/UditGupta10/GPT-InvestAR
  • paper_authors: Udit Gupta
  • for: The paper aims to simplify the process of assessing Annual Reports of all firms by leveraging the capabilities of Large Language Models (LLMs) to generate insights and improve stock price predictions.
  • methods: The paper uses LLMs to analyze Annual Reports and generate insights, which are compiled into a Quant-styled dataset and augmented with historical stock price data. A machine learning model is trained with the LLM outputs as features to predict stock prices.
  • results: The walkforward test results show promising outperformance compared to S&P 500 returns, indicating the effectiveness of the proposed framework.
    Abstract Annual Reports of publicly listed companies contain vital information about their financial health which can help assess the potential impact on Stock price of the firm. These reports are comprehensive in nature, going up to, and sometimes exceeding, 100 pages. Analysing these reports is cumbersome even for a single firm, let alone the whole universe of firms that exist. Over the years, financial experts have become proficient in extracting valuable information from these documents relatively quickly. However, this requires years of practice and experience. This paper aims to simplify the process of assessing Annual Reports of all the firms by leveraging the capabilities of Large Language Models (LLMs). The insights generated by the LLM are compiled in a Quant styled dataset and augmented by historical stock price data. A Machine Learning model is then trained with LLM outputs as features. The walkforward test results show promising outperformance wrt S&P500 returns. This paper intends to provide a framework for future work in this direction. To facilitate this, the code has been released as open source.
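
The evaluation is a walkforward test: the model is repeatedly trained on a past window and evaluated on the following period, so no future data leaks into training. A minimal sketch of that loop with scikit-learn (the feature matrix of LLM-derived scores and the return targets are placeholders):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder data: rows ordered by time, columns are LLM-derived features.
X = np.random.rand(200, 8)   # e.g. sentiment/quality scores per annual report
y = np.random.rand(200)      # e.g. forward stock returns

predictions = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = GradientBoostingRegressor().fit(X[train_idx], y[train_idx])
    # Each fold predicts only periods strictly after its training window.
    predictions.append(model.predict(X[test_idx]))
```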

Narrative as a Dynamical System

  • paper_url: http://arxiv.org/abs/2309.06600
  • repo_url: None
  • paper_authors: Isidoros Doxas, James Meiss, Steven Bottone, Tom Strelich, Andrew Plummer, Adrienne Breland, Simon Dennis, Kathy Garvin-Doxas, Michael Klymkowsky
  • for: The paper treats human activity in general, and narrative in particular, as a dynamical system in the physics sense, describing its evolution with concepts from physics.
  • methods: The authors construct three average narrative paths by averaging about 500 different narratives.
  • results: The average paths are consistent with an action principle, suggesting that the evolution of narrative can be described as a dynamical system.
    Abstract There is increasing evidence that human activity in general, and narrative in particular, can be treated as a dynamical system in the physics sense; a system whose evolution is described by an action integral, such that the average of all possible paths from point A to point B is given by the extremum of the action. We create by construction three such paths by averaging about 500 different narratives, and we show that the average path is consistent with an action principle.
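
The claim rests on the classical action principle: among all possible paths from point A to point B, the realized (average) path extremizes the action. In standard notation (a generic statement of the principle, not the paper's specific Lagrangian):

```latex
S[q] = \int_{t_A}^{t_B} L\bigl(q(t), \dot{q}(t)\bigr)\, dt,
\qquad \delta S = 0 \ \text{along the average path.}
```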

Everyone Deserves A Reward: Learning Customized Human Preferences

  • paper_url: http://arxiv.org/abs/2309.03126
  • repo_url: https://github.com/linear95/dsp
  • paper_authors: Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du
  • for: The work aims to better align large language models (LLMs) with human preferences, improving interaction quality in customized and personalized application scenarios.
  • methods: A three-stage customized reward model (RM) learning scheme is proposed, and multiple training and data strategies are tested for preserving general preferring ability; a domain-specific preference (DSP) dataset is also collected.
  • results: The three-stage customized RM scheme adapts better to personalized application scenarios while preserving general preferring ability; general preference enrichment and customized preference imitation learning are found to be especially effective at preserving that ability.
    Abstract Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences with respect to different religions, politics, cultures, etc. Moreover, each individual can have their unique preferences on various topics. Neglecting the diversity of human preferences, current human feedback aligning methods only consider a general reward model, which is below satisfaction for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which includes preferred responses for each given query from four practical domains. Besides, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies on the three learning stages. We find several ways to better preserve the general preferring ability while training the customized RMs, especially general preference enrichment, and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP.
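
Reward models of this kind are typically trained on pairs of preferred/rejected responses with a Bradley-Terry-style pairwise loss. A minimal PyTorch sketch of that common objective (the reward_model callable and batch encodings are placeholders, not the paper's architecture):

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry loss: push the preferred response's scalar reward
    above the rejected one's (a standard RM objective, used illustratively)."""
    r_chosen = reward_model(chosen_ids)      # (batch,) scalar rewards
    r_rejected = reward_model(rejected_ids)  # (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```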

Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.03118
  • repo_url: None
  • paper_authors: Chao Feng, Xinyu Zhang, Zichu Fei
  • for: To improve the domain-specific knowledge and explainability of large language models (LLMs).
  • methods: The paper proposes Knowledge Solver (KSL), which harnesses the strong generalizability of LLMs to teach them to search for essential knowledge from external knowledge bases.
  • results: Experiments on three datasets (CommonsenseQA, OpenbookQA, and MedQA-USMLE) show that the approach improves LLM baseline performance by a relatively large margin.
    Abstract Large language models (LLMs), such as ChatGPT and GPT-4, are versatile and can solve different tasks due to their emergent ability and generalizability. However, LLMs sometimes lack domain-specific knowledge to perform tasks, which would also cause hallucination during inference. In some previous works, additional modules like graph neural networks (GNNs) are trained on retrieved knowledge from external knowledge bases, aiming to mitigate the problem of lacking domain-specific knowledge. However, incorporating additional modules: 1) would need retraining additional modules when encountering novel domains; 2) would become a bottleneck since LLMs' strong abilities are not fully utilized for retrieval. In this paper, we propose a paradigm, termed Knowledge Solver (KSL), to teach LLMs to search for essential knowledge from external knowledge bases by harnessing their own strong generalizability. Specifically, we design a simple yet effective prompt to transform retrieval into a multi-hop decision sequence, which empowers LLMs with searching knowledge ability in zero-shot manner. Additionally, KSL is able to provide complete retrieval paths and therefore increase explainability of LLMs' reasoning processes. We conduct experiments on three datasets: CommonsenseQA, OpenbookQA, and MedQA-USMLE, and found that our approach improves LLM baseline performance by a relatively large margin.
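
KSL casts retrieval as a multi-hop decision sequence: at each hop the LLM is prompted to pick which relation to follow in the knowledge graph, yielding a complete retrieval path. A minimal zero-shot sketch of such a loop (the llm callable, the graph structure, and the prompt wording are placeholders, not the paper's exact design):

```python
def knowledge_search(llm, graph, start_entity, question, max_hops=3):
    """Let the LLM walk a knowledge graph one hop at a time,
    returning the full retrieval path for explainability."""
    path, current = [start_entity], start_entity
    for _ in range(max_hops):
        neighbors = graph[current]  # e.g. {"treats": "disease_x", ...}
        prompt = (
            f"Question: {question}\n"
            f"Current entity: {current}. Candidate relations: {list(neighbors)}.\n"
            "Which single relation is most useful? Answer with the relation "
            "name, or STOP if none helps."
        )
        choice = llm(prompt).strip()
        if choice == "STOP" or choice not in neighbors:
            break
        current = neighbors[choice]
        path.append(current)
    return path  # the complete retrieval path
```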

ContrastWSD: Enhancing Metaphor Detection with Word Sense Disambiguation Following the Metaphor Identification Procedure

  • paper_url: http://arxiv.org/abs/2309.03103
  • repo_url: None
  • paper_authors: Mohamad Elzohbi, Richard Zhao
  • for: The work develops a RoBERTa-based metaphor detection model that combines the Metaphor Identification Procedure (MIP) with Word Sense Disambiguation (WSD), extracting and contrasting a word's contextual meaning with its basic meaning to determine whether it is used metaphorically in a sentence.
  • methods: The model uses a WSD model to obtain the different senses of a word and combines them with contextual embeddings to enhance the metaphor detection process.
  • results: Evaluated on various benchmark datasets against strong baselines, the model outperforms methods that rely solely on contextual embeddings or that integrate only basic definitions and other external knowledge.
    Abstract This paper presents ContrastWSD, a RoBERTa-based metaphor detection model that integrates the Metaphor Identification Procedure (MIP) and Word Sense Disambiguation (WSD) to extract and contrast the contextual meaning with the basic meaning of a word to determine whether it is used metaphorically in a sentence. By utilizing the word senses derived from a WSD model, our model enhances the metaphor detection process and outperforms other methods that rely solely on contextual embeddings or integrate only the basic definitions and other external knowledge. We evaluate our approach on various benchmark datasets and compare it with strong baselines, indicating the effectiveness in advancing metaphor detection.
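
Following MIP, the model contrasts a word's in-context representation with a representation of its basic sense. A minimal sketch of that contrast with Hugging Face transformers (the gloss lookup and the decision threshold stand in for the paper's trained components):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pooled RoBERTa embedding of a piece of text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq, dim)
    return hidden.mean(dim=1).squeeze(0)

# Contrast the word's context against a gloss of its basic sense.
context_vec = sentence_embedding("He attacked every weak point in my argument.")
basic_vec = sentence_embedding("attack: to use violence to try to hurt someone")
divergence = 1 - torch.cosine_similarity(context_vec, basic_vec, dim=0)
is_metaphor = divergence.item() > 0.5  # illustrative threshold, not the paper's
```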

Persona-aware Generative Model for Code-mixed Language

  • paper_url: http://arxiv.org/abs/2309.02915
  • repo_url: https://github.com/victor7246/paradox
  • paper_authors: Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty
  • for: The work develops a persona-aware generative model for code-mixed text, to generate text that more closely resembles real-life code-mixing by individuals.
  • methods: PARADOX, a Transformer-based encoder-decoder model, encodes each utterance conditioned on a user's persona and generates code-mixed text without monolingual reference data; an alignment module re-calibrates the generated sequence to resemble real-life code-mixed text.
  • results: On average, PARADOX achieves 1.6 points better CM BLEU, 47% better perplexity, and 32% better semantic coherence than its non-persona-based counterparts.
    Abstract Code-mixing and script-mixing are prevalent across online social networks and multilingual societies. However, a user's preference toward code-mixing depends on the socioeconomic status, demographics of the user, and the local context, which existing generative models mostly ignore while generating code-mixed texts. In this work, we make a pioneering attempt to develop a persona-aware generative model to generate texts resembling real-life code-mixed texts of individuals. We propose a Persona-aware Generative Model for Code-mixed Generation, PARADOX, a novel Transformer-based encoder-decoder model that encodes an utterance conditioned on a user's persona and generates code-mixed texts without monolingual reference data. We propose an alignment module that re-calibrates the generated sequence to resemble real-life code-mixed texts. PARADOX generates code-mixed texts that are semantically more meaningful and linguistically more valid. To evaluate the personification capabilities of PARADOX, we propose four new metrics -- CM BLEU, CM Rouge-1, CM Rouge-L and CM KS. On average, PARADOX achieves 1.6 points better CM BLEU, 47% better perplexity and 32% better semantic coherence than the non-persona-based counterparts.

Leave no Place Behind: Improved Geolocation in Humanitarian Documents

  • paper_url: http://arxiv.org/abs/2309.02914
  • repo_url: None
  • paper_authors: Enrico M. Belliardo, Kyriaki Kalimeri, Yelena Mejova
  • for: The paper aims to improve the performance of natural language processing (NLP) tools in the humanitarian sector by developing annotated resources for geotagging humanitarian texts.
  • methods: The authors use two popular Named Entity Recognition (NER) tools, Spacy and roBERTa, and develop a geocoding method called FeatureRank to link candidate locations to the GeoNames database.
  • results: The authors find that the humanitarian-domain data improves the performance of the classifiers (up to F1 = 0.92) and alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. However, they conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.
    Abstract Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Latest developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools Spacy and roBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method FeatureRank which links the candidate locations to the GeoNames database. We find that not only does the humanitarian-domain data improves the performance of the classifiers (up to F1 = 0.92), but it also alleviates some of the bias of the existing tools, which erroneously favor locations in the Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for the deployment in the humanitarian sector.
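
The pipeline first tags candidate place names with an off-the-shelf NER model and then links them to GeoNames entries. A minimal sketch of the tagging step with spaCy, plus a toy stand-in for the FeatureRank linking (the scoring features are invented for illustration; the paper's actual FeatureRank method is not specified in the abstract):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_locations(text: str):
    """Return location-like entity spans tagged by spaCy's NER."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

def rank_candidates(mention: str, gazetteer_rows: list):
    """Toy stand-in for FeatureRank: score GeoNames candidate rows by
    simple features such as exact name match and population."""
    def score(row):
        exact = 1.0 if row["name"].lower() == mention.lower() else 0.0
        return exact + 1e-6 * row.get("population", 0)
    return max(gazetteer_rows, key=score)

locations = extract_locations("Flooding displaced families in Juba, South Sudan.")
print(locations)  # e.g. ['Juba', 'South Sudan']
```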

ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese

  • paper_url: http://arxiv.org/abs/2309.02902
  • repo_url: https://github.com/phanchauthang/ViCGCN
  • paper_authors: Chau-Thang Phan, Quoc-Nam Nguyen, Chi-Thanh Dang, Trong-Hop Do, Kiet Van Nguyen
  • for: To improve information mining on Vietnamese social media by exploiting the graph structure of the data to address imbalanced and noisy data.
  • methods: The paper proposes ViCGCN, which combines PhoBERT with Graph Convolutional Networks, jointly training contextualized embeddings with GCNs to capture more syntactic and semantic dependencies.
  • results: Experiments on several Vietnamese benchmark datasets show that applying a GCN to the final layer of BERTology models significantly improves performance, and that ViCGCN outperforms 13 baseline models, including BERTology models, fused BERTology-GCN models, other baselines, and the state of the art, on three social media datasets.
    Abstract Social media processing is a fundamental task in natural language processing with numerous applications. As Vietnamese social media and information science have grown rapidly, the necessity of information-based mining on Vietnamese social media has become crucial. However, state-of-the-art research faces several significant drawbacks, including imbalanced data and noisy data on social media platforms. Imbalanced and noisy are two essential issues that need to be addressed in Vietnamese social media texts. Graph Convolutional Networks can address the problems of imbalanced and noisy data in text classification on social media by taking advantage of the graph structure of the data. This study presents a novel approach based on contextualized language model (PhoBERT) and graph-based method (Graph Convolutional Networks). In particular, the proposed approach, ViCGCN, jointly trained the power of Contextualized embeddings with the ability of Graph Convolutional Networks, GCN, to capture more syntactic and semantic dependencies to address those drawbacks. Extensive experiments on various Vietnamese benchmark datasets were conducted to verify our approach. The observation shows that applying GCN to BERTology models as the final layer significantly improves performance. Moreover, the experiments demonstrate that ViCGCN outperforms 13 powerful baseline models, including BERTology models, fusion BERTology and GCN models, other baselines, and SOTA on three benchmark social media datasets. Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best Contextualized Language Models, including multilingual and monolingual, on three benchmark datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our integrated model ViCGCN achieves the best performance compared to other BERTology integrated with GCN models.
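
The core operation is a graph-convolution layer applied on top of contextual embeddings: each node's representation becomes a normalized aggregate of its neighbors'. A minimal PyTorch sketch of one such layer (the adjacency matrix and feature sizes are placeholders; the paper's exact fusion with PhoBERT may differ):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        A_hat = A + torch.eye(A.size(0))        # add self-loops
        deg = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(deg.pow(-0.5))  # symmetric normalization
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.linear(H))

# Toy usage: 4 nodes with 768-dim (PhoBERT-sized) features.
H = torch.randn(4, 768)
A = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 1],
                  [0, 1, 0, 0], [0, 1, 0, 0]], dtype=torch.float)
out = GCNLayer(768, 256)(H, A)
```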

Addressing the Blind Spots in Spoken Language Processing

  • paper_url: http://arxiv.org/abs/2309.06572
  • repo_url: None
  • paper_authors: Amit Moryossef
  • for: The work examines the critical but often overlooked role of non-verbal cues in human communication, including co-speech gestures and facial expressions, and their implications for Natural Language Processing (NLP).
  • methods: Borrowing from advances in sign language processing, the author proposes developing universal automatic gesture segmentation and transcription models to transcribe non-verbal cues into textual form, bridging blind spots in spoken language understanding and broadening the scope and applicability of NLP models.
  • results: Motivating examples demonstrate the limitations of relying solely on text-based models. The proposed approach is computationally efficient, flexible, and integrates readily with existing NLP pipelines; the paper concludes by calling on the research community to develop universal transcription methods and validate their effectiveness.
    Abstract This paper explores the critical but often overlooked role of non-verbal cues, including co-speech gestures and facial expressions, in human communication and their implications for Natural Language Processing (NLP). We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements. Borrowing from advances in sign language processing, we propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form. Such a methodology aims to bridge the blind spots in spoken language understanding, enhancing the scope and applicability of NLP models. Through motivating examples, we demonstrate the limitations of relying solely on text-based models. We propose a computationally efficient and flexible approach for incorporating non-verbal cues, which can seamlessly integrate with existing NLP pipelines. We conclude by calling upon the research community to contribute to the development of universal transcription methods and to validate their effectiveness in capturing the complexities of real-world, multi-modal interactions.

Aligning Large Language Models for Clinical Tasks

  • paper_url: http://arxiv.org/abs/2309.02884
  • repo_url: https://github.com/ssm123ssm/medGPT
  • paper_authors: Supun Manathunga, Isuru Hettigoda
  • for: The paper examines the applicability and effectiveness of large language models (LLMs) for clinical applications.
  • methods: A combination of techniques is used, including instruction tuning and in-prompt strategies such as few-shot and chain-of-thought prompting, to improve LLM performance.
  • results: The proposed 'expand-guess-refine' alignment strategy improves performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset.
    Abstract Large Language Models (LLMs) have demonstrated remarkable adaptability, showcasing their capacity to excel in tasks for which they were not explicitly trained. However, despite their impressive natural language processing (NLP) capabilities, effective alignment of LLMs remains a crucial challenge when deploying them for specific clinical applications. The ability to generate responses with factually accurate content and to engage in non-trivial reasoning steps are crucial for the LLMs to be eligible for applications in clinical medicine. Employing a combination of techniques including instruction-tuning and in-prompt strategies like few-shot and chain-of-thought prompting has significantly enhanced the performance of LLMs. Our proposed alignment strategy for medical question-answering, known as 'expand-guess-refine', offers a parameter and data-efficient solution. A preliminary analysis of this method demonstrated outstanding performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset.
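
The abstract names the strategy 'expand-guess-refine' without spelling out its stages. One plausible reading, sketched purely for illustration, is a three-step prompt chain: expand on the relevant clinical concepts, guess an answer, then refine it (the llm callable and all prompt wording are assumptions, not the paper's implementation):

```python
def expand_guess_refine(llm, question: str) -> str:
    """Illustrative three-step prompt chain; stage wording is assumed."""
    expansion = llm(f"List the clinical concepts relevant to: {question}")
    guess = llm(f"Question: {question}\nRelevant concepts: {expansion}\n"
                "Give your best answer with brief reasoning.")
    refined = llm(f"Question: {question}\nDraft answer: {guess}\n"
                  "Check the draft against the concepts and give a final answer.")
    return refined
```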

Agent-based simulation of pedestrians’ earthquake evacuation; application to Beirut, Lebanon

  • paper_url: http://arxiv.org/abs/2309.02812
  • repo_url: None
  • paper_authors: Rouba Iskandar, Kamel Allaw, Julie Dugdale, Elise Beck, Jocelyne Adjizian-Gérard, Cécile Cornou, Jacques Harb, Pascal Lacroix, Nada Badaro-Saliba, Stéphane Cartier, Rita Zaarour
  • for: The study develops a city-scale pedestrian simulator to estimate evacuation behaviour during earthquakes.
  • methods: The agent-based model integrates the seismic hazard, physical vulnerability, and individuals' behaviours and mobility. A highly realistic urban environment is implemented in GAMA, supported by previous data on buildings and soil and by new geographic data extracted from high-resolution Pleiades satellite images.
  • results: When open spaces are freely accessible, 52% of the population reaches an open space within 5 minutes of the earthquake; when access to one open space is blocked, this drops to 39%. The presence of accessible open spaces and their proximity to residential buildings are therefore crucial for people's safety.
    Abstract Most seismic risk assessment methods focus on estimating the damages to the built environment and the consequent socioeconomic losses without fully taking into account the social aspect of risk. Yet, human behaviour is a key element in predicting the human impact of an earthquake, therefore, it is important to include it in quantitative risk assessment studies. In this study, an interdisciplinary approach simulating pedestrians' evacuation during earthquakes at the city scale is developed using an agent-based model. The model integrates the seismic hazard, the physical vulnerability as well as individuals' behaviours and mobility. The simulator is applied to the case of Beirut, Lebanon. Lebanon is at the heart of the Levant fault system that has generated several Mw>7 earthquakes, the latest being in 1759. It is one of the countries with the highest seismic risk in the Mediterranean region. This is due to the high seismic vulnerability of the buildings due to the absence of mandatory seismic regulation until 2012, the high level of urbanization, and the lack of adequate spatial planning and risk prevention policies. Beirut as the main residential, economic and institutional hub of Lebanon is densely populated. To accommodate the growing need for urban development, constructions have almost taken over all of the green areas of the city; squares and gardens are disappearing to give place to skyscrapers. However, open spaces are safe places to shelter, away from debris, and therefore play an essential role in earthquake evacuation. Despite the massive urbanization, there are a few open spaces but locked gates and other types of anthropogenic barriers often limit their access. To simulate this complex context, pedestrians' evacuation simulations are run in a highly realistic spatial environment implemented in GAMA [1]. Previous data concerning soil and buildings in Beirut [2, 3] are complemented by new geographic data extracted from high-resolution Pleiades satellite images. The seismic loading is defined as a peak ground acceleration of 0.3g, as stated in Lebanese seismic regulations. Building damages are estimated using an artificial neural network trained to predict the mean damage [4] based on the seismic loading as well as the soil and building vibrational properties [5]. Moreover, the quantity and the footprint of the generated debris around each building are also estimated and included in the model. We simulate how topography, buildings, debris, and access to open spaces, affect individuals' mobility. Two city configurations are implemented: 1. Open spaces are accessible without any barriers; 2. Access to some open spaces is blocked. The first simulation results show that while 52% of the population is able to arrive to an open space within 5 minutes after an earthquake, this number is reduced to 39% when one of the open spaces is locked. These results show that the presence of accessible open spaces in a city and their proximity to the residential buildings is a crucial factor for ensuring people's safety when an earthquake occurs.

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

  • paper_url: http://arxiv.org/abs/2309.02780
  • repo_url: None
  • paper_authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai
  • for: The paper explores instruction fine-tuning for speech-to-semantic tasks, proposing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data.
  • methods: The model is pre-trained on large and diverse data, with instruction-speech pairs constructed via a text-to-speech (TTS) system.
  • results: Extensive experiments on multiple benchmarks show that the proposed model achieves state-of-the-art (SOTA) results on speech named entity recognition, speech sentiment analysis, speech question answering, and more, along with competitive performance in zero-shot and few-shot scenarios.
    Abstract This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

Improving Code Generation by Dynamic Temperature Sampling

  • paper_url: http://arxiv.org/abs/2309.02772
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Yuqi Zhu, Jia Allen Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
  • for: The paper conducts the first systematic study of a decoding strategy specialized for code generation, to improve the performance of existing large language models (LLMs) on this task.
  • methods: By analyzing the loss distributions of code tokens, the authors find that code tokens fall into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. They propose Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens.
  • results: Applied to LLMs of different sizes and evaluated on two popular datasets, AdapT sampling significantly outperforms state-of-the-art decoding strategies.
    Abstract Recently, Large Language Models (LLMs) have shown impressive results in code generation. However, existing decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming languages (PL). Due to this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized in code generation. With an analysis of loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling for challenging tokens, allowing LLMs to explore diverse choices. We employ a smaller temperature for confident tokens avoiding the influence of tail randomness noises. We apply AdapT sampling to LLMs with different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategy.
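
AdapT applies a higher temperature when the model is uncertain (challenging tokens) and a lower one when it is confident. A minimal sketch of per-step temperature selection from the next-token distribution (the two temperature values and the confidence threshold are illustrative, not the paper's tuned settings):

```python
import torch

def adapt_sample(logits: torch.Tensor, t_low=0.2, t_high=1.0, conf=0.8):
    """Sample one token with a temperature chosen from the model's own
    confidence: high temperature for hard tokens, low for easy ones."""
    probs = torch.softmax(logits, dim=-1)
    # Confident step: the greedy token already dominates the distribution.
    temperature = t_low if probs.max().item() >= conf else t_high
    scaled = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(scaled, num_samples=1).item()

# Toy usage over a vocabulary of 5 tokens.
token = adapt_sample(torch.tensor([2.5, 0.1, 0.1, 0.0, -1.0]))
```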

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

  • paper_url: http://arxiv.org/abs/2309.02706
  • repo_url: None
  • paper_authors: Guijin Son, Hanwool Lee, Suwan Kim, Huiseo Kim, Jaecheol Lee, Je Won Yeom, Jihyu Jung, Jung Woo Kim, Songseong Kim
  • for: The study evaluates how well large language models (LLMs) capture Korean language and culture, and assesses the viability of Large Language-Specific Models (LLSMs) for language-specific knowledge.
  • methods: The HAE-RAE Bench comprises 6 tasks covering vocabulary, history, and general knowledge, used to evaluate language models across these domains.
  • results: Korean-specific models roughly 13 times smaller than GPT-3.5 can match its performance on language-specific knowledge retrieval, underscoring the importance of homogeneous training corpora; however, these smaller LMs show a perplexing performance dip when asked to generate structured answers.
    Abstract Large Language Models (LLMs) pretrained on massive corpora exhibit remarkable capabilities across a wide range of tasks, however, the attention given to non-English languages has been limited in this field of research. To address this gap and assess the proficiency of language models in the Korean language and culture, we present HAE-RAE Bench, covering 6 tasks including vocabulary, history, and general knowledge. Our evaluation of language models on this benchmark highlights the potential advantages of employing Large Language-Specific Models(LLSMs) over a comprehensive, universal model like GPT-3.5. Remarkably, our study reveals that models approximately 13 times smaller than GPT-3.5 can exhibit similar performance levels in terms of language-specific knowledge retrieval. This observation underscores the importance of homogeneous corpora for training professional-level language-specific models. On the contrary, we also observe a perplexing performance dip in these smaller LMs when they are tasked to generate structured answers.

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

  • paper_url: http://arxiv.org/abs/2309.02691
  • repo_url: https://github.com/lil-lab/phrase_grounding
  • paper_authors: Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi
  • for: The work studies phrase grounding in natural language understanding within visual contexts, i.e., linking words and phrases to image regions.
  • methods: A framework is proposed to jointly study task performance and phrase grounding, along with three benchmarks to study the relation between the two.
  • results: Contemporary models show inconsistency between their ability to ground phrases and to solve tasks; this can be addressed through brute-force training on phrase grounding annotations.
    Abstract Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conducive to generalization. We propose a framework to jointly study task performance and phrase grounding, and propose three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on ground phrasing annotations, and analyze the dynamics it creates. Code and data are available at https://github.com/lil-lab/phrase_grounding.

Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation

  • paper_url: http://arxiv.org/abs/2309.03238
  • repo_url: None
  • paper_authors: Mimansa Jaiswal
  • for: The thesis aims to improve the accuracy and robustness of emotion recognition, examining challenges across multiple facets of the field.
  • methods: The research uses a range of methods, including dataset collection, data augmentation, annotation analysis, and the handling of confounding and hidden variables, to improve the accuracy and robustness of emotion recognition models.
  • results: Introducing controlled stressors during data collection better reflects real-world emotion production and helps avoid biases arising from the subjectivity of annotation labels; the thesis also proposes optimized evaluation metrics that better measure the performance of emotion recognition models.
    Abstract Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.
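
Stripping demographic information from learned representations is commonly done with adversarial learning, e.g. a gradient-reversal layer that trains the encoder to fool a demographic classifier. A minimal PyTorch sketch of that general mechanism (an illustration of the adversarial technique, not the thesis's exact architecture):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward
    pass, so the encoder learns to *remove* what the adversary predicts."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

# Usage: features -> reversed gradients -> demographic adversary head.
features = torch.randn(8, 128, requires_grad=True)
adversary = torch.nn.Linear(128, 2)  # e.g. a binary demographic label
logits = adversary(GradReverse.apply(features, 1.0))
```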

Zero-Resource Hallucination Prevention for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02654
  • repo_url: None
  • paper_authors: Junyu Luo, Cao Xiao, Fenglong Ma
    for: 这篇论文的目的是为了减少大语言模型(LLM)中的“幻视”现象,即模型生成的信息中包含无根据或错误的资讯。methods: 这篇论文使用了一种新的预先检测自我识别技术,称为“自我熟悉度”(SELF-FAMILIARITY),它评估模型对输入指令中的概念的熟悉程度,并在缺乏熟悉的情况下停止生成回答。results: 这篇论文的结果显示,使用 SELF-FAMILIARITY 技术可以与现有的方法相比,实现更高的检测性和可靠性,并且可以提高模型的解释性和应用性。
    Abstract The prevalent use of large language models (LLMs) in various domains has drawn attention to the issue of "hallucination," which refers to instances where LLMs generate factually inaccurate or ungrounded information. Existing techniques for hallucination detection in language assistants rely on intricate fuzzy, specific free-language-based chain of thought (CoT) techniques or parameter-based methods that suffer from interpretability issues. Additionally, the methods that identify hallucinations post-generation could not prevent their occurrence and suffer from inconsistent performance due to the influence of the instruction format and model style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction and withholding the generation of response in case of unfamiliar concepts. This approach emulates the human ability to refrain from responding to unfamiliar topics, thus reducing hallucinations. We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques. Our findings propose a significant shift towards preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.
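
The core idea is a gate before generation: estimate the model's familiarity with each concept in the instruction and refuse to answer when any concept falls below a threshold. A minimal sketch of such a gate (the familiarity probe, here the perplexity of a concept-explanation prompt, and the threshold are assumptions; the paper's actual scoring is more involved):

```python
def self_familiarity_gate(llm_perplexity, extract_concepts, instruction,
                          threshold=50.0):
    """Withhold generation when the model seems unfamiliar with any
    concept in the instruction (illustrative perplexity-based probe)."""
    for concept in extract_concepts(instruction):
        # High perplexity on explaining a concept suggests unfamiliarity.
        ppl = llm_perplexity(f"Explain the concept of {concept}.")
        if ppl > threshold:
            return f"I am not familiar enough with '{concept}' to answer."
    return None  # no gate triggered; proceed with normal generation
```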

Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2309.02640
  • repo_url: None
  • paper_authors: Keyu Chen, Di Zhuang, Mingchen Li, J. Morris Chang
  • for: To improve the performance of Neural Machine Translation (NMT) models on new domains, especially with limited data.
  • methods: A novel episodic training framework is proposed, together with a denoised curriculum learning technique, to strengthen the model's robustness and adaptability to domain shift.
  • results: Experiments show that Epi-Curriculum improves performance on both seen and unseen domains and enhances the encoder's and decoder's robustness to domain shift.
    Abstract Neural Machine Translation (NMT) models have become successful, but their performance remains poor when translating on new domains with a limited number of data. In this paper, we present a novel approach Epi-Curriculum to address low-resource domain adaptation (DA), which contains a new episodic training framework along with denoised curriculum learning. Our episodic training framework enhances the model's robustness to domain shift by episodically exposing the encoder/decoder to an inexperienced decoder/encoder. The denoised curriculum learning filters the noised data and further improves the model's adaptability by gradually guiding the learning process from easy to more difficult tasks. Experiments on English-German and English-Romanian translation show that: (i) Epi-Curriculum improves both model's robustness and adaptability in seen and unseen domains; (ii) Our episodic training framework enhances the encoder and decoder's robustness to domain shift.
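
Denoised curriculum learning filters noisy pairs and then presents the remainder from easy to hard. A minimal sketch of scoring examples with a reference model's loss, dropping the noisiest tail, and ordering the rest (the scoring function and the noise cutoff are placeholders, not the paper's exact criteria):

```python
def build_curriculum(examples, loss_fn, noise_quantile=0.9):
    """Score each (src, tgt) pair, drop the highest-loss tail as noise,
    and sort the remainder from easy to hard (illustrative)."""
    scored = sorted(((loss_fn(src, tgt), src, tgt)
                     for src, tgt in examples), key=lambda x: x[0])
    keep = int(len(scored) * noise_quantile)  # drop noisiest pairs
    return [(src, tgt) for _, src, tgt in scored[:keep]]
```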