eess.AS - 2023-11-16

Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

  • paper_url: http://arxiv.org/abs/2311.10149
  • repo_url: https://github.com/WangHelin1997/Aty-TTS
  • paper_authors: Helin Wang, Venkatesh Ravichandran, Milind Rao, Becky Lammers, Myra Sydnor, Nicholas Maragakis, Ankur A. Butala, Jayne Zhang, Lora Clawson, Victoria Chovaz, Laureano Moro-Velazquez
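  • for: Improving how spoken language understanding (SLU) systems handle atypical speech
  • methods: Text-to-Speech (TTS) synthesis-based data augmentation, using knowledge transfer from a voice conversion model to capture the speaker and atypical characteristics of atypical speakers
  • results: Generates high-quality atypical speech and, used as data augmentation, leads to fairer SLU systems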
    Abstract Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments. Recent advancements in Text-to-Speech (TTS) synthesis-based augmentation for more fair SLU have struggled to accurately capture the unique vocal characteristics of atypical speakers, largely due to insufficient data. To address this issue, we present a novel data augmentation method for atypical speakers by finetuning a TTS model, called Aty-TTS. Aty-TTS models speaker and atypical characteristics via knowledge transferring from a voice conversion model. Then, we use the augmented data to train SLU models adapted to atypical speech. To train these data augmentation models and evaluate the resulting SLU systems, we have collected a new atypical speech dataset containing intent annotation. Both objective and subjective assessments validate that Aty-TTS is capable of generating high-quality atypical speech. Furthermore, it serves as an effective data augmentation strategy, contributing to more fair SLU systems that can better accommodate individuals with atypical speech patterns.
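    The augmentation step described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the `Utterance` record, the `augment_with_tts` helper, the intent labels, and the `synthesize` callable (a stand-in for a speaker-adapted TTS model such as Aty-TTS) are all hypothetical names introduced here for illustration. The sketch only shows the general idea of pairing each atypical speaker with extra text prompts, synthesizing speech for them, and merging the result with the real recordings before SLU training.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    """One intent-annotated training example (hypothetical record layout)."""
    speaker_id: str
    text: str
    intent: str
    audio: Tuple[float, ...]  # waveform samples; placeholder representation
    synthetic: bool


def augment_with_tts(
    real: Sequence[Utterance],
    prompts: Sequence[Tuple[str, str]],
    synthesize: Callable[[str, str], Sequence[float]],
) -> List[Utterance]:
    """Return the real utterances plus one synthetic utterance per
    (speaker, prompt) pair.

    `synthesize(text, speaker_id)` is an assumed interface standing in for a
    TTS model adapted to that speaker; it is not a real library call.
    """
    speakers = sorted({u.speaker_id for u in real})
    synthetic = [
        Utterance(spk, text, intent, tuple(synthesize(text, spk)), True)
        for spk in speakers
        for text, intent in prompts
    ]
    return list(real) + synthetic


def fake_synthesize(text: str, speaker_id: str) -> List[float]:
    """Stub synthesizer: returns silence, one sample per input character."""
    return [0.0] * len(text)


# Two real recordings from two speakers, plus two extra text prompts.
real = [
    Utterance("spk_a", "turn on the lights", "lights_on", (0.0,), False),
    Utterance("spk_b", "play music", "music_play", (0.0,), False),
]
prompts = [("turn off the lights", "lights_off"), ("stop the music", "music_stop")]

train_set = augment_with_tts(real, prompts, fake_synthesize)
```

    In this sketch the synthesizer is passed in as a callable so that the stub can be swapped for an actual model without changing the data-mixing logic. The `synthetic` flag keeps real and generated examples distinguishable, which is useful if evaluation should be restricted to real atypical speech.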