cs.SD - 2023-11-05

Yet Another Generative Model For Room Impulse Response Estimation

  • paper_url: http://arxiv.org/abs/2311.02581
  • repo_url: None
  • paper_authors: Sungho Lee, Hyeong-Seok Choi, Kyogu Lee
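  • for: This paper proposes a new neural room impulse response (RIR) estimator that improves estimation quality.
  • methods: The paper uses an alternate generator architecture. An autoencoder with residual quantization learns a discrete latent token space, and RIR estimation is cast as a reference-conditioned autoregressive token generation task.
  • results: Experiments show that the proposed system is preferable to the baselines across various evaluation metrics.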
    Abstract Recent neural room impulse response (RIR) estimators typically comprise an encoder for reference audio analysis and a generator for RIR synthesis. Especially, it is the performance of the generator that directly influences the overall estimation quality. In this context, we explore an alternate generator architecture for improved performance. We first train an autoencoder with residual quantization to learn a discrete latent token space, where each token represents a small time-frequency patch of the RIR. Then, we cast the RIR estimation problem as a reference-conditioned autoregressive token generation task, employing transformer variants that operate across frequency, time, and quantization depth axes. This way, we address the standard blind estimation task and additional acoustic matching problem, which aims to find an RIR that matches the source signal to the target signal's reverberation characteristics. Experimental results show that our system is preferable to other baselines across various evaluation metrics.
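The abstract mentions residual quantization without spelling it out. The idea can be illustrated with a minimal NumPy sketch: a latent vector is quantized by a stack of codebooks, where each depth level encodes the residual left by the previous levels, producing one token per level. Everything here is an assumption made for illustration, not the authors' implementation: the function name `residual_quantize`, the dimensions, and the random codebooks (in a trained autoencoder the codebooks would be learned).

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize vector x with a stack of codebooks, one per depth level.

    Each level picks the codeword nearest to the residual left by the
    earlier levels. Returns the token indices (one per depth level) and
    the reconstruction, which is the sum of the selected codewords.
    """
    residual = np.array(x, dtype=float)      # working copy of the input
    recon = np.zeros_like(residual)
    tokens = []
    for codebook in codebooks:                # codebook shape: (num_codes, dim)
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))           # nearest codeword at this depth
        tokens.append(idx)
        recon += codebook[idx]
        residual -= codebook[idx]             # pass the remainder to the next level
    return tokens, recon

# Hypothetical sizes and random codebooks, chosen only for this example.
rng = np.random.default_rng(0)
dim, num_codes, depth = 8, 16, 4
codebooks = [rng.normal(scale=0.5 ** d, size=(num_codes, dim)) for d in range(depth)]

patch_latent = rng.normal(size=dim)           # stands in for one time-frequency patch
tokens, recon = residual_quantize(patch_latent, codebooks)
print(tokens)                                 # one token index per quantization depth
```

In the setting the abstract describes, each time-frequency patch of the RIR would yield such a stack of tokens, which is why the generator operates along a quantization depth axis in addition to frequency and time.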