results: Experimental results show that ASR error correction with the ChatGPT model can largely improve ASR system performance, especially in the 1-shot setting.
Abstract
ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained with supervised training using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive and the model needs to be re-trained when switching the underlying ASR model. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example to examine its ability to perform ASR error correction in the zero-shot or 1-shot settings. We use the ASR N-best list as model input and propose unconstrained error correction and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that we can largely improve the ASR system performance with error correction using the powerful ChatGPT model.
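The two prompting strategies above can be illustrated with a small sketch that formats an ASR N-best list into a correction prompt. This is a minimal illustration, not the paper's actual prompt wording; the function name and phrasing are assumptions.

```python
def nbest_correction_prompt(nbest, constrained=True):
    """Format an ASR N-best list into a zero-shot correction prompt.

    Illustrative sketch, not the paper's exact prompt. `constrained=True`
    mirrors N-best constrained correction (the model must pick one of the
    listed hypotheses); `constrained=False` mirrors unconstrained
    correction (the model may output any corrected transcription).
    """
    hyps = "\n".join(f"{i}. {h}" for i, h in enumerate(nbest, start=1))
    if constrained:
        task = ("Below are the N-best hypotheses from an ASR system for one "
                "utterance. Select the hypothesis most likely to be the "
                "correct transcription and output it unchanged.")
    else:
        task = ("Below are the N-best hypotheses from an ASR system for one "
                "utterance. Output the corrected transcription, fixing any "
                "recognition errors.")
    return f"{task}\n{hyps}"
```

The constrained variant keeps the output within the ASR system's own hypothesis space, which avoids the model hallucinating words the recognizer never proposed.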
Dream Content Discovery from Reddit with an Unsupervised Mixed-Method Approach
for: This paper aims to develop a new, data-driven approach for analyzing dream reports and understanding the topics and themes that appear in dreams.
methods: The authors use natural language processing techniques to identify topics in free-form dream reports and group them into larger themes. They also compare their results to the Hall and van de Castle scale to validate their findings.
results: The authors analyze 44,213 dream reports from Reddit's r/Dreams subreddit and identify 217 topics, grouped into 22 larger themes. They also show how their method can be used to understand changes in collective dream experiences over time and around major events like the COVID-19 pandemic and the Russo-Ukrainian war.
Abstract
Dreaming is a fundamental but not fully understood part of human experience that can shed light on our thought patterns. Traditional dream analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Mostly based on retrospective surveys or lab studies, they struggle to be applied on a large scale or to show the importance and connections between different dream themes. To overcome these issues, we developed a new, data-driven mixed-method approach for identifying topics in free-form dream reports through natural language processing. We tested this method on 44,213 dream reports from Reddit's r/Dreams subreddit, where we found 217 topics, grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing it to the widely-used Hall and van de Castle scale. Going beyond traditional scales, our method can find unique patterns in different dream types (like nightmares or recurring dreams), understand topic importance and connections, and observe changes in collective dream experiences over time and around major events, like the COVID-19 pandemic and the recent Russo-Ukrainian war. We envision that the applications of our method will provide valuable insights into the intricate nature of dreaming.
Towards cross-language prosody transfer for dialog
results: The authors' findings can inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
Abstract
Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
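The prosodic dissimilarity metric described above can be sketched as Euclidean distance between per-utterance feature vectors, with each feature z-normalized across the corpus so that features on different scales (e.g. pitch in Hz vs. speaking rate) contribute comparably. The normalization step and feature choices here are assumptions for illustration, not the paper's exact feature set.

```python
import numpy as np

def z_normalize(corpus_feats):
    """Z-normalize each prosodic feature column across a corpus so no
    single feature dominates the distance (an assumed preprocessing step)."""
    corpus_feats = np.asarray(corpus_feats, dtype=float)
    mu = corpus_feats.mean(axis=0)
    sd = corpus_feats.std(axis=0)
    return (corpus_feats - mu) / np.where(sd == 0, 1.0, sd)

def prosodic_dissimilarity(u, v):
    """Euclidean distance between two utterances' normalized prosodic
    feature vectors (pitch, energy, rate statistics, etc.)."""
    return float(np.linalg.norm(np.asarray(u, float) - np.asarray(v, float)))
```

A dissimilarity of 0 means prosodically identical feature vectors; larger values indicate greater cross-language prosodic divergence between a matched utterance pair.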
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
results: The method is evaluated on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime.
Abstract
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. Previous work has primarily considered silver-standard data augmentation or zero-shot methods, however, exploiting few-shot gold data is comparatively unexplored. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between probabilistic latent variables using Optimal Transport. We demonstrate how this direct guidance improves parsing from natural languages using fewer examples and less training. We evaluate our method on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime. Ablation studies further reveal that our method improves performance even without parallel input translations. In addition, we show that our model better captures cross-lingual structure in the latent space to improve semantic representation similarity.
Bidirectional Attention as a Mixture of Continuous Word Experts
results: The study finds that the proposed MoE-based extension to categorical tabular data outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. The paper also theoretically characterizes when linear word analogies are present in bidirectional attention's word embeddings.
Abstract
Bidirectional attention $\unicode{x2013}$ composed of self-attention with positional encodings and the masked language model (MLM) objective $\unicode{x2013}$ has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.
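The mixture-of-experts view above rests on the fact that each row of the attention matrix is a probability distribution over positions, so each output vector is a convex combination of value vectors. A minimal single-layer, single-head sketch (not the paper's reparameterization itself, just the object it analyzes):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bidirectional_attention(X, P, Wq, Wk, Wv):
    """Single-layer, single-head bidirectional attention.

    X: (n, d) token embeddings; P: (n, d) positional encodings;
    Wq, Wk, Wv: (d, d) projection matrices. Each row of A sums to 1 --
    these rows play the role of mixture-of-experts weights in the
    CBOW-with-MoE view -- and each output is the corresponding convex
    combination of the value vectors.
    """
    H = X + P                                    # inject position information
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (n, n), rows are distributions
    return A @ V, A
```

Because every row of `A` is non-negative and sums to one, fitting this layer amounts to fitting position-dependent mixture weights over the (continuous) word representations, which is the observation behind the CBOW-with-MoE equivalence.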
paper_authors: Tran Hien Van, Abhay Goyal, Muhammad Siddique, Lam Yin Cheung, Nimay Parekh, Jonathan Y Huang, Keri McCrickerd, Edson C Tandoc Jr., Gerard Chung, Navin Kumar
results: The analysis finds that fathers were not framed as central to the family unit in the Singaporean online environment.
Abstract
The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nation's flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways on the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other.
Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes
results: The authors find that although the chatbots' outputs are fluent and plausible, critical gaps remain in providing accurate and reliable financial information.
Abstract
Increasingly powerful Large Language Model (LLM) based chatbots, like ChatGPT and Bard, are becoming available to users that have the potential to revolutionize the quality of decision-making achieved by the public. In this context, we set out to investigate how such systems perform in the personal finance domain, where financial inclusion has been an overarching stated aim of banks for decades. We asked 13 questions representing banking products in personal finance: bank account, credit card, and certificate of deposits and their inter-product interactions, and decisions related to high-value purchases, payment of bank dues, and investment advice, and in different dialects and languages (English, African American Vernacular English, and Telugu). We find that although the outputs of the chatbots are fluent and plausible, there are still critical gaps in providing accurate and reliable financial information using LLM-based chatbots.
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
results: Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
Abstract
Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions. We conduct thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate human annotation process. Experimental results show that 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.
results: Experiments show a trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.
Abstract
Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real-time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text, or alternatively, to predict a small text-editing program. Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.