eess.AS - 2023-10-26

Privacy-preserving Representation Learning for Speech Understanding

  • paper_url: http://arxiv.org/abs/2310.17194
  • repo_url: None
  • paper_authors: Minh Tran, Mohammad Soleymani
  • for: Existing privacy-preserving speech representation learning methods target only a single application domain. This paper proposes a new framework that anonymizes speech representations generated by pre-trained encoders and demonstrates its effectiveness across a range of speech classification tasks.
  • methods: A Transformer is trained to estimate, from the representations produced by a pre-trained encoder, the representations of the same utterances spoken by other speakers. During inference, the extracted representations can be converted into different identities to preserve privacy.
  • results: Compared with the voice anonymization baselines from the VoicePrivacy 2022 challenge, the method performs better on both privacy and utility for paralinguistic tasks such as emotion recognition and depression classification, and achieves comparable performance for intent classification.
    Abstract Existing privacy-preserving speech representation learning methods target a single application domain. In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks. Specifically, given the representations from a pre-trained encoder, we train a Transformer to estimate the representations for the same utterances spoken by other speakers. During inference, the extracted representations can be converted into different identities to preserve privacy. We compare the results with the voice anonymization baselines from the VoicePrivacy 2022 challenge. We evaluate our framework on speaker identification for privacy and emotion recognition, depression classification, and intent classification for utility. Our method outperforms the baselines on privacy and utility in paralinguistic tasks and achieves comparable performance for intent classification.
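The core idea in the methods section — a Transformer that maps an utterance-level embedding, conditioned on a target speaker, to the estimated embedding of the same utterance spoken by that speaker — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `EmbeddingAnonymizer`, the embedding dimension, and the way the speaker embedding is injected (as a second sequence position) are all assumptions.

```python
# Hypothetical sketch of the anonymization mapping described in the paper.
# Assumes utterance-level embeddings from a pre-trained encoder and a
# target-speaker embedding of the same dimension; all names and sizes here
# are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class EmbeddingAnonymizer(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(dim, dim)

    def forward(self, utt_emb: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # Treat the utterance embedding and the target-speaker embedding as a
        # length-2 sequence so self-attention can mix identity into content.
        seq = torch.stack([utt_emb, spk_emb], dim=1)  # (batch, 2, dim)
        mixed = self.transformer(seq)
        # Read the re-voiced utterance embedding off the first position.
        return self.out(mixed[:, 0])


model = EmbeddingAnonymizer()
utt = torch.randn(8, 256)     # embeddings from a pre-trained encoder
target = torch.randn(8, 256)  # embeddings of pseudo-speaker identities
anon = model(utt, target)     # (8, 256): same utterances, different "voices"
```

At training time such a model could be fit with a regression loss (e.g. MSE) against embeddings of the same utterances actually spoken by the target speakers; at inference, swapping in pseudo-speaker embeddings yields anonymized representations, matching the paper's described usage.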