Abstract
While state-of-the-art Text-to-Speech (TTS) systems can generate natural, high-quality speech at the sentence level, they still face significant challenges in paragraph and long-form reading. These deficiencies stem from i) neglect of cross-sentence contextual information, and ii) the high computation and memory cost of long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. Then we construct hierarchically-structured textual semantics to broaden the scope for global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
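To make the efficiency claim concrete, the following is a minimal sketch of linearized self-attention in general, not ContextSpeech's actual implementation. The abstract does not say which linearization is used, so the `elu(x) + 1` feature map, the non-causal single-head form, and all function names here are assumptions chosen for illustration. The idea is to replace the softmax kernel with a positive feature map so that keys and values can be summarized once, reducing cost from quadratic to linear in sequence length.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed here, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Non-causal linearized attention. q, k: (n, d); v: (n, d_v)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v           # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)      # (d,): normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_reference(q, k, v):
    """Same kernel computed via the explicit (n, n) weight matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    weights = phi_q @ phi_k.T
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(6, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 3))
out = linear_attention(q, k, v)
```

Both functions compute the same result; the linear form never materializes the `(n, n)` weight matrix, which is what makes it attractive for paragraph-length inputs.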