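results: The study finds that the synthetic dataset retains similarity to the original VoxCeleb2 set and can be used for the downstream speaker verification task, although some challenges remain that call for further research and improvement.
Abstract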
The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely used VoxCeleb2 dataset for speaker recognition is no longer accessible from the official website. To mitigate these concerns, this work presents an initiative to generate a privacy-friendly synthetic VoxCeleb2 dataset that ensures the quality of the generated speech in terms of privacy, utility, and fairness. We also discuss the challenges of using synthetic data for the downstream task of speaker verification.
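Summary
Deep learning for speaker recognition depends on large amounts of data. However, the data-hungry nature of deep learning methods has been questioned, because large-scale natural speech collected from real human speakers raises ethical, privacy, and legal issues. For example, the widely used VoxCeleb2 dataset can no longer be downloaded from the official website. To address these issues, this study proposes a privacy-friendly initiative for generating a synthetic VoxCeleb2 dataset, ensuring that the quality of the generated speech meets the requirements of privacy, utility, and fairness. We also discuss the challenges of using the generated data for the downstream task of speaker verification.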
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?
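results: Experiments show that continually training the SSL model on vocoded data improves overall CM performance, and that a new SSL model distilled from the two SSL models improves CM performance further, particularly on unseen test sets.
Abstract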
A speech spoofing countermeasure (CM) that discriminates between unseen spoofed and bona fide data requires diverse training data. While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders were also effective as the spoofed training data. Since many neural vocoders are fast to build and fast at generation, this study used multiple neural vocoders and created more than 9,000 hours of vocoded data on the basis of the VoxCeleb2 corpus. This study investigates how this large-scale vocoded data can improve spoofing countermeasures that use data-hungry self-supervised learning (SSL) models. Experiments demonstrated that the overall CM performance on multiple test sets improved when using features extracted by an SSL model continually trained on the vocoded data. Further improvement was observed when using a new SSL model distilled from the two SSL models, the one before and the one after the continual training. The CM with the distilled SSL model outperformed the previous best model on challenging unseen test sets, including the ASVspoof 2019 logical access, WaveFake, and In-the-Wild.
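Summary
A speech spoofing countermeasure (CM) needs diverse training data. Many datasets use spoofed data generated by speech synthesis systems, but it was recently found that data generated by neural vocoders is also effective as spoofed data. Because neural vocoders are fast to build and fast at generation, this study used multiple neural vocoders to create more than 9,000 hours of vocoded data based on the VoxCeleb2 corpus. The study investigates how this large-scale vocoded data can improve spoofing countermeasures that use data-hungry self-supervised learning (SSL) models. Experiments show that features extracted by an SSL model continually trained on the vocoded data improve overall CM performance. A new SSL model distilled from the two SSL models, the one before and the one after continual training, improves CM performance further. The resulting CM outperforms the previous best model on challenging test sets, including ASVspoof 2019 logical access, WaveFake, and In-the-Wild.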