eess.AS - 2023-12-04

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

  • paper_url: http://arxiv.org/abs/2312.01744
  • repo_url: None
  • paper_authors: Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler
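  • for: Improving speech enhancement performance
  • methods: A deep neural network (DNN) that combines maximum likelihood training with Generative Adversarial Networks (GANs), using a Normalizing Flow (NF) as the generator
  • results: Strongly outperforms the baseline NF-based model without adding complexity to the enhancement network, maintains high-quality audio generation and log-likelihood estimation, and is competitive with other state-of-the-art models in computational metrics and a listening experiment.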
    Abstract This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech using a Normalizing Flow (NF) as generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain high quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.
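    The hybrid objective described in the abstract can be illustrated with a toy example. The sketch below is not the SEFGAN architecture or its loss. It assumes a single element-wise affine flow conditioned on the noisy signal, a standard normal prior, and a least-squares adversarial term. The names `affine_flow_forward`, `affine_flow_inverse`, `hybrid_loss`, the scalar parameters `w` and `b`, and the weight `lam` are all hypothetical. It only shows why a flow generator can supply both an exact log-likelihood (forward pass) and generated samples for a discriminator (inverse pass).

```python
import numpy as np

def affine_flow_forward(x, y, w, b):
    """Map clean speech x to latent z, conditioned on noisy speech y.
    Scale and shift depend only on y, so the map stays invertible in x."""
    log_s = np.tanh(w * y)            # per-sample log-scale (toy conditioner)
    t = b * y                         # per-sample shift (toy conditioner)
    z = (x - t) * np.exp(-log_s)
    log_det = -np.sum(log_s)          # log |det dz/dx| of the affine map
    return z, log_det

def affine_flow_inverse(z, y, w, b):
    """Generate a speech estimate from latent z, conditioned on y."""
    log_s = np.tanh(w * y)
    t = b * y
    return z * np.exp(log_s) + t

def nll(x, y, w, b):
    """Exact negative log-likelihood via the change-of-variables formula."""
    z, log_det = affine_flow_forward(x, y, w, b)
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2 * np.pi)
    return -(log_pz + log_det)

def generator_adv_loss(d_fake):
    """Least-squares GAN generator loss (an assumed choice for this sketch)."""
    return np.mean((d_fake - 1.0) ** 2)

def hybrid_loss(x, y, z_sample, w, b, discriminator, lam=1.0):
    """Maximum likelihood term plus a weighted adversarial term."""
    x_hat = affine_flow_inverse(z_sample, y, w, b)   # sample from the flow
    return nll(x, y, w, b) + lam * generator_adv_loss(discriminator(x_hat))
```

    In this sketch, `discriminator` is any callable that scores a signal. Setting `lam=0` recovers pure maximum likelihood training of the flow.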
    Summary