paper_authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
for: Improving the performance of ASR systems, making them more accurate and effective.
methods: Using the ChatGPT large language model in zero-shot or one-shot settings to perform error correction on ASR N-best lists.
results: Applying error correction to a Conformer-Transducer model and the pre-trained Whisper model can substantially improve ASR system performance.
Abstract
ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained in a supervised manner using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive, and the model needs to be re-trained whenever the underlying ASR model is switched. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example to examine its ability to perform ASR error correction in zero-shot or 1-shot settings. We use the ASR N-best list as model input and propose unconstrained and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that ASR system performance can be largely improved through error correction with the powerful ChatGPT model.
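As an illustration of the N-best prompting setup described above, here is a minimal sketch of zero-shot error correction through the OpenAI chat API. The prompt wording, model name, and the `correct_with_nbest` helper are assumptions for illustration, not the paper's exact configuration.

```python
# Illustrative sketch of zero-shot ASR error correction from an N-best list.
# The prompts below are assumptions; the paper's exact prompts may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def correct_with_nbest(nbest: list[str], constrained: bool = False) -> str:
    """Ask the model for a corrected transcription given an ASR N-best list."""
    hypotheses = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    if constrained:
        # N-best constrained correction: the output must be one of the hypotheses.
        task = ("Select the hypothesis that is most likely the correct "
                "transcription of the utterance. Reply with that hypothesis only.")
    else:
        # Unconstrained correction: the model may produce a new corrected sentence.
        task = ("The hypotheses below are from a speech recognizer. "
                "Output the most plausible corrected transcription.")
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output for evaluation
        messages=[{"role": "user", "content": f"{task}\n\n{hypotheses}"}],
    )
    return response.choices[0].message.content.strip()

# Example with a hypothetical 3-best list:
print(correct_with_nbest([
    "the whether today is fine",
    "the weather today is fine",
    "the weather to day is fine",
], constrained=True))
```

The constrained variant keeps the output within the recognizer's hypothesis space, which avoids hallucinated rewrites at the cost of never fixing an error shared by all N hypotheses.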
Emotion-Guided Music Accompaniment Generation Based on Variational Autoencoder
results: Our method enhances the AI's capacity for emotional creativity in the music composition process and generates more beautiful accompaniments.
Abstract
Music accompaniment generation is a crucial aspect of the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to effectively incorporate human emotions to create beautiful accompaniments. Existing models struggle to characterize human emotions within neural networks while composing music. To address this issue, we propose using an easy-to-represent emotion flow model, the Valence/Arousal Curve, which makes emotional information compatible with the model through data transformation, and we enhance the interpretability of emotional factors by using a Variational Autoencoder as the model structure. Further, we use relative self-attention to maintain the structure of the music at the phrase level and, combined with rules of music theory, to generate a richer accompaniment.
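To make the conditioning idea concrete, below is a minimal PyTorch sketch of a VAE whose encoder and decoder are conditioned on Valence/Arousal values. The layer sizes, feature dimensions, and simple concatenation-based conditioning are illustrative assumptions; the paper's actual architecture, including relative self-attention, is more involved.

```python
# Minimal sketch: a VAE conditioned on Valence/Arousal (V/A) emotion values.
# All dimensions and the concatenation-based conditioning are assumptions.
import torch
import torch.nn as nn

class EmotionConditionedVAE(nn.Module):
    def __init__(self, music_dim=128, emotion_dim=2, latent_dim=32, hidden=256):
        super().__init__()
        # Encoder sees music features concatenated with the V/A values.
        self.encoder = nn.Sequential(
            nn.Linear(music_dim + emotion_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Decoder reconstructs accompaniment features from z plus the emotion.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + emotion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, music_dim))

    def forward(self, x, emotion):
        h = self.encoder(torch.cat([x, emotion], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([z, emotion], dim=-1))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

# Example: one forward/backward pass on random data with a fixed V/A target.
x = torch.randn(8, 128)
emotion = torch.tensor([[0.7, 0.3]]).expand(8, -1)  # one V/A point per example
model = EmotionConditionedVAE()
recon, mu, logvar = model(x, emotion)
vae_loss(recon, x, mu, logvar).backward()
```

Conditioning both the encoder and the decoder on the emotion input lets the latent space focus on musical content while the V/A values steer the emotional character of the generated accompaniment.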