eess.AS - 2023-07-03

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

  • paper_url: http://arxiv.org/abs/2307.00782
  • repo_url: None
  • paper_authors: Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
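  • for: This study aims to improve the quality of paragraph (long-form) reading in text-to-speech (TTS) systems.
  • methods: The study proposes ContextSpeech, a lightweight yet effective TTS system. It first designs a memory-cached recurrence mechanism that incorporates global text and speech context into sentence encoding. It then constructs hierarchically structured textual semantics to broaden the scope of global context enhancement. Finally, it integrates linearized self-attention to improve model efficiency.
  • results: Experiments show that ContextSpeech significantly improves voice quality and prosody expressiveness in paragraph reading while keeping model efficiency competitive. Audio samples are available at: https://contextspeech.github.io/demo/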
    Abstract While state-of-the-art Text-to-Speech systems can generate natural speech of very high quality at sentence level, they still meet great challenges in speech generation for paragraph / long-form reading. Such deficiencies are due to i) ignorance of cross-sentence contextual information, and ii) high computation and memory cost for long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. Then we construct hierarchically-structured textual semantics to broaden the scope for global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
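    The memory-cached recurrence mentioned in the abstract can be illustrated with a minimal NumPy sketch in the style of Transformer-XL segment recurrence. This is not the authors' implementation. The class name `MemoryCachedEncoder`, the single attention head, the random stand-in weights, and the `mem_len` cache limit are all hypothetical. Only text-side hidden states are cached here; the paper also incorporates speech context, whose exact form is not described in this entry.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MemoryCachedEncoder:
    """Hypothetical single-head attention layer with a cross-sentence memory.

    Each new sentence attends over [cached states of previous sentences;
    its own states]. Its outputs are then appended to the cache, which is
    truncated to the most recent `mem_len` positions.
    """

    def __init__(self, d_model, mem_len, seed=0):
        if mem_len < 1:
            raise ValueError("mem_len must be at least 1")
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        # Random projections stand in for learned weights
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0.0, scale, size=(d_model, d_model)) for _ in range(3)
        )
        self.d_model = d_model
        self.mem_len = mem_len
        self.memory = np.zeros((0, d_model))  # empty cache before the first sentence

    def encode_sentence(self, x):
        """x: (T, d_model) hidden states of the current sentence."""
        context = np.concatenate([self.memory, x], axis=0)  # (M + T, d_model)
        q = x @ self.w_q          # queries come from the current sentence only
        k = context @ self.w_k    # keys and values also cover cached context
        v = context @ self.w_v
        weights = softmax(q @ k.T / np.sqrt(self.d_model))  # (T, M + T)
        out = x + weights @ v     # residual connection
        # Update the cache; a trained model would also stop gradients through it
        self.memory = np.concatenate([self.memory, out], axis=0)[-self.mem_len:]
        return out


# Usage: encode a three-sentence "paragraph" one sentence at a time
enc = MemoryCachedEncoder(d_model=8, mem_len=10)
rng = np.random.default_rng(1)
for length in (4, 5, 6):
    out = enc.encode_sentence(rng.normal(size=(length, 8)))
    print(out.shape, enc.memory.shape)
# (4, 8) (4, 8)
# (5, 8) (9, 8)
# (6, 8) (10, 8)
```

    Because queries come only from the current sentence while keys and values also span the cache, each sentence can draw on earlier context without re-encoding the whole paragraph, and the bounded cache keeps memory use fixed as the paragraph grows.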
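    Linearized self-attention replaces the N x N softmax weight matrix with a positive kernel feature map, so the cost grows linearly with sequence length rather than quadratically, which matters for paragraph-length inputs. The sketch below uses the elu(x) + 1 feature map from Katharopoulos et al. (2020), one common non-causal linear-attention variant. Which variant ContextSpeech actually uses is not stated in this entry, so treat that choice, and the function names `linear_attention` and `quadratic_reference`, as assumptions.

```python
import numpy as np


def elu_feature_map(x):
    # phi(x) = elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise,
    # so all features are strictly positive (np.minimum avoids exp overflow)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention. q, k: (N, d); v: (N, d_v).

    Cost is O(N * d * d_v) instead of O(N^2 * d).
    """
    phi_q = elu_feature_map(q)
    phi_k = elu_feature_map(k)
    kv = phi_k.T @ v           # (d, d_v): keys and values summarized once
    k_sum = phi_k.sum(axis=0)  # (d,): normalizer summary
    numer = phi_q @ kv         # (N, d_v)
    denom = phi_q @ k_sum      # (N,)
    return numer / (denom[:, None] + eps)


def quadratic_reference(q, k, v, eps=1e-6):
    # Same kernel, but materializes the full N x N weight matrix
    scores = elu_feature_map(q) @ elu_feature_map(k).T
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)


# Usage: both forms agree, but only the first avoids the N x N matrix
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(1000, 16)) for _ in range(3))
print(np.allclose(linear_attention(q, k, v), quadratic_reference(q, k, v)))
```

    The reordering (phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V) is what removes the quadratic term; the trade-off is that the kernel only approximates the behavior of softmax attention rather than reproducing it.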