results: improved the detection capability of anti-spoofing systems, in particular for TTS-generated speech
Abstract
Spoofing speech detection is an active and in-demand research field. However, current spoofing speech detection systems lack convincing evidence. In this paper, to increase the reliability of detection systems, the flaws in the rhythm information of TTS-generated speech are analyzed. TTS models take text as input and use acoustic models to predict rhythm information, which introduces artifacts into that information. After the vocal tract response is filtered out, the remaining glottal flow, which carries the rhythm information, is still sufficient to detect TTS-generated speech. Based on these analyses, a rhythm perturbation module is proposed to enhance the copy-synthesis data augmentation method. Fake utterances generated by the proposed method force the detection model to attend to the artifacts in rhythm information and effectively improve the ability of anti-spoofing countermeasures to detect TTS-generated speech.
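The abstract does not say how the vocal tract response is filtered out. As a minimal sketch, assuming linear-prediction (LPC) inverse filtering, which is one common way to approximate this step and not necessarily the authors' method, the code below estimates an all-pole vocal tract model and removes it, leaving a residual that serves as a rough proxy for the glottal excitation. The function names, the model order, and the synthetic test signal are all illustrative assumptions.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns the prediction-error filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    as a list of order+1 coefficients, plus the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply A(z) as an FIR filter: e[n] = sum_j a[j] * x[n - j]."""
    p = len(a) - 1
    return [
        sum(a[j] * x[n - j] for j in range(min(p, n) + 1))
        for n in range(len(x))
    ]


# Toy stand-in for a speech frame (assumption): white-noise excitation
# shaped by a known, stable two-pole "vocal tract" filter.
random.seed(0)
excitation = [random.gauss(0.0, 1.0) for _ in range(4000)]
x = []
for n, e in enumerate(excitation):
    prev1 = x[n - 1] if n >= 1 else 0.0
    prev2 = x[n - 2] if n >= 2 else 0.0
    x.append(1.3 * prev1 - 0.6 * prev2 + e)

a, err = levinson_durbin(autocorr(x, 2), 2)
residual = inverse_filter(x, a)  # approximate excitation signal
```

On real speech this would be applied frame by frame with a higher model order. The residual only approximates the glottal source, so any detector built on it inherits that approximation.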