eess.AS - 2023-10-11

Damping Density of an Absorptive Shoebox Room Derived from the Image-Source Method

  • paper_url: http://arxiv.org/abs/2310.07363
  • repo_url: None
  • paper_authors: Sebastian J. Schlecht, Karolina Prawda, Rudolf Rabenstein, Maximilian Schäfer
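  • for: This paper addresses the fast computation of room impulse responses (RIRs) for shoebox rooms with arbitrary absorption.
  • methods: Starting from the image-source method, the authors derive a closed-form expression for the damping density, which characterizes the overall multi-slope energy decay. The omnidirectional energy decay over time follows directly from the damping density.
  • results: The resulting energy decay model accurately matches late reverberation simulated with the image-source method across various wall damping coefficients and consistently outperforms a state-of-the-art approximation method. It also enables fast stochastic synthesis of late reverberation by shaping noise with the energy envelope.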
    Abstract The image-source method is widely applied to compute room impulse responses (RIRs) of shoebox rooms with arbitrary absorption. However, with increasing RIR lengths, the number of image sources grows rapidly, leading to slow computation. In this paper, we derive a closed-form expression for the damping density, which characterizes the overall multi-slope energy decay. The omnidirectional energy decay over time is directly derived from the damping density. The resulting energy decay model accurately matches the late reverberation simulated via the image-source method. The proposed model allows the fast stochastic synthesis of late reverberation by shaping noise with the energy envelope. Simulations of various wall damping coefficients demonstrate the model's accuracy. The proposed model consistently outperforms the energy decay prediction accuracy compared to a state-of-the-art approximation method. The paper elaborates on the proposed damping density's applicability to modeling multi-sloped sound energy decay, predicting reverberation time in non-diffuse sound fields, and fast frequency-dependent RIR synthesis.
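
The noise-shaping synthesis mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the damping density is replaced by a few discrete decay rates with hand-picked weights (`rates` and `weights` are hypothetical placeholder values, not the closed-form expression derived in the paper), and the function names are made up for this example.

```python
import numpy as np

def energy_envelope(t, rates, weights):
    """Multi-slope energy envelope: sum_k weights[k] * exp(-rates[k] * t)."""
    t = np.asarray(t, dtype=float)
    rates = np.asarray(rates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Rows index time samples, columns index decay rates.
    return np.exp(-np.outer(t, rates)) @ weights

def synthesize_late_reverb(fs, duration, rates, weights, seed=0):
    """Shape Gaussian noise so its expected energy follows the envelope."""
    rng = np.random.default_rng(seed)
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    envelope = energy_envelope(t, rates, weights)
    noise = rng.standard_normal(n)
    # The envelope describes energy, so the amplitude gain is its square root.
    return t, envelope, noise * np.sqrt(envelope)

# Placeholder decay rates (1/s) and weights, chosen only for illustration.
rates = [6.0, 14.0, 30.0]
weights = [0.2, 0.5, 0.3]
t, envelope, rir_tail = synthesize_late_reverb(
    fs=16000, duration=1.0, rates=rates, weights=weights
)
```

The cost is linear in the number of output samples, regardless of how many image sources the equivalent image-source simulation would need.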

Magnitude-and-phase-aware Speech Enhancement with Parallel Sequence Modeling

  • paper_url: http://arxiv.org/abs/2310.07316
  • repo_url: None
  • paper_authors: Yuewei Zhang, Huanbin Zou, Jie Zhu
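  • for: This paper addresses speech enhancement (SE), where phase estimation is important for perceptual quality.
  • methods: A real-valued network estimates the magnitude mask and the normalized complex ideal ratio mask (cIRM), avoiding the large increase in model complexity caused by complex-valued networks. The authors also devise a parallel sequence modeling (PSM) block to improve the RNN block in the convolutional recurrent network (CRN)-based SE model.
  • results: Experiments show that the proposed MPCRN achieves superior SE performance.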
    Abstract In speech enhancement (SE), phase estimation is important for perceptual quality, so many methods take clean speech's complex short-time Fourier transform (STFT) spectrum or the complex ideal ratio mask (cIRM) as the learning target. To predict these complex targets, the common solution is to design a complex neural network, or use a real network to separately predict the real and imaginary parts of the target. But in this paper, we propose to use a real network to estimate the magnitude mask and normalized cIRM, which not only avoids the significant increase of the model complexity caused by complex networks, but also shows better performance than previous phase estimation methods. Meanwhile, we devise a parallel sequence modeling (PSM) block to improve the RNN block in the convolutional recurrent network (CRN)-based SE model. We name our method as magnitude-and-phase-aware and PSM-based CRN (MPCRN). The experimental results illustrate that our MPCRN has superior SE performance.
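
To make the learning targets in the abstract more concrete, the sketch below computes a magnitude mask and a unit-modulus complex mask from a pair of clean and noisy STFT spectra. It rests on an assumption: the "normalized cIRM" is read here as the cIRM divided by its own magnitude, so that it carries only the phase correction. The paper's exact definition may differ, and the function names are hypothetical.

```python
import numpy as np

def magnitude_and_phase_targets(clean, noisy, eps=1e-8):
    """Split the cIRM into a real magnitude mask and a unit-modulus phase mask.

    Assumption: normalized cIRM = cIRM / |cIRM| (not confirmed by the abstract).
    """
    cirm = clean / (noisy + eps)          # complex ideal ratio mask
    mag_mask = np.abs(cirm)               # real-valued magnitude mask
    norm_cirm = cirm / (mag_mask + eps)   # complex, modulus close to 1
    return mag_mask, norm_cirm

def apply_targets(noisy, mag_mask, norm_cirm):
    """Rebuild the enhanced spectrum from the two masks."""
    return mag_mask * norm_cirm * noisy

# Random spectra standing in for real STFT frames (frequency bins x frames).
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
noisy = clean + 0.5 * (rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5)))

mag_mask, norm_cirm = magnitude_and_phase_targets(clean, noisy)
enhanced = apply_targets(noisy, mag_mask, norm_cirm)

# A real-valued network can regress these three real arrays.
real_targets = np.stack([mag_mask, norm_cirm.real, norm_cirm.imag])
```

With ideal masks the reconstruction recovers the clean spectrum up to the `eps` regularization, which is why both quantities together are a sufficient target for a real-valued network.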

VSANet: Real-time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention

  • paper_url: http://arxiv.org/abs/2310.07295
  • repo_url: None
  • paper_authors: Yuewei Zhang, Huanbin Zou, Jie Zhu
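  • for: Improving speech enhancement (SE) performance.
  • methods: A multi-task learning framework in which an SE module and a voice activity detection (VAD) module share one encoder and are optimized with a weighted loss, combined with a causal spatial attention (CSA) block.
  • results: Experiments show that VSANet achieves excellent SE performance, with both the multi-task learning framework and the CSA block contributing to the improvement.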
    Abstract The deep learning-based speech enhancement (SE) methods always take the clean speech's waveform or time-frequency spectrum feature as the learning target, and train the deep neural network (DNN) by reducing the error loss between the DNN's output and the target. This is a conventional single-task learning paradigm, which has been proven to be effective, but we find that the multi-task learning framework can improve SE performance. Specifically, we design a framework containing a SE module and a voice activity detection (VAD) module, both of which share the same encoder, and the whole network is optimized by the weighted loss of the two modules. Moreover, we design a causal spatial attention (CSA) block to promote the representation capability of DNN. Combining the VAD aided multi-task learning framework and CSA block, our SE network is named VSANet. The experimental results prove the benefits of multi-task learning and the CSA block, which give VSANet an excellent SE performance.
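
The weighted two-module loss described in the abstract can be illustrated with a small sketch. The abstract does not state which loss functions or weights are used, so the choices here (mean squared error for SE, binary cross-entropy for VAD, and the weights `w_se` and `w_vad`) are assumptions made only for this example.

```python
import numpy as np

def se_loss(enhanced, clean):
    """Mean squared error between enhanced and clean features (assumed SE loss)."""
    enhanced = np.asarray(enhanced, dtype=float)
    clean = np.asarray(clean, dtype=float)
    return float(np.mean((enhanced - clean) ** 2))

def vad_loss(speech_prob, speech_label, eps=1e-7):
    """Binary cross-entropy on frame-level speech probabilities (assumed VAD loss)."""
    p = np.clip(np.asarray(speech_prob, dtype=float), eps, 1.0 - eps)
    y = np.asarray(speech_label, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def multitask_loss(enhanced, clean, speech_prob, speech_label, w_se=1.0, w_vad=0.5):
    """Weighted sum of both task losses; the weights are hypothetical."""
    return w_se * se_loss(enhanced, clean) + w_vad * vad_loss(speech_prob, speech_label)

# Toy values standing in for the outputs of the two modules.
total = multitask_loss(
    enhanced=[0.5, 0.0], clean=[1.0, 0.0],
    speech_prob=[0.5, 0.5], speech_label=[1, 0],
)
```

Because both modules share one encoder, the gradient of this single scalar would update the encoder with information from both tasks, which is the mechanism the abstract credits for the improvement.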