cs.SD - 2023-09-07

Causal Signal-Based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement

  • paper_url: http://arxiv.org/abs/2309.03684
  • repo_url: None
  • paper_authors: Julitta Bartolewska, Stanisław Kacprzak, Konrad Kowalczyk
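  • for: Improving the quality and intelligibility of speech in single-channel noisy signals, for online use
  • methods: A signal-based causal DCCRN that reduces the required look-ahead and the number of network parameters
  • results: Experiments indicate that the proposed model achieves similar or better performance than the original DCCRN on various speech enhancement metrics, while reducing latency and the number of network parameters by around 30%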
    Abstract The aim of speech enhancement is to improve speech signal quality and intelligibility from a noisy microphone signal. In many applications, it is crucial to enable processing with small computational complexity and minimal requirements regarding access to future signal samples (look-ahead). This paper presents signal-based causal DCCRN that improves online single-channel speech enhancement by reducing the required look-ahead and the number of network parameters. The proposed modifications include complex filtering of the signal, application of overlapped-frame prediction, causal convolutions and deconvolutions, and modification of the loss function. Results of performed experiments indicate that the proposed model with overlapped signal prediction and additional adjustments, achieves similar or better performance than the original DCCRN in terms of various speech enhancement metrics, while it reduces the latency and network parameter number by around 30%.
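The abstract lists causal convolutions among the proposed modifications. As a rough illustration of what causality means here (not the authors' DCCRN code), the sketch below pads only on the past side, so output sample t depends only on inputs up to t. The function name `causal_conv1d` and the toy two-tap kernel are assumptions made for this example.

```python
def causal_conv1d(signal, kernel):
    """Filter `signal` with `kernel` using only current and past samples.

    kernel[0] weights the current sample, kernel[1] the previous one, etc.
    Hypothetical helper for illustration; not taken from the paper.
    """
    pad = len(kernel) - 1
    padded = [0.0] * pad + list(signal)  # zero-pad the past side only
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            # padded[t + pad] is signal[t]; step k samples back in time
            acc += w * padded[t + pad - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = causal_conv1d(x, [0.5, 0.5])
    print(y)

    # Changing a future sample must leave earlier outputs unchanged.
    x_future = [1.0, 2.0, 3.0, 100.0]
    print(causal_conv1d(x_future, [0.5, 0.5])[:3] == y[:3])
```

Because no future samples are read, a layer of this kind adds no look-ahead of its own, which is the property the paper relies on for online processing.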

Topological fingerprints for audio identification

  • paper_url: http://arxiv.org/abs/2309.03516
  • repo_url: https://github.com/wreise/top_audio_id
  • paper_authors: Wojciech Reise, Ximena Fernández, Maria Dominguez, Heather A. Harrington, Mariano Beguerisse-Díaz
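  • for: Proposes a topological audio fingerprinting approach for robustly identifying duplicate audio tracks.
  • methods: Applies persistent homology to local spectral decompositions of audio signals, using filtered cubical complexes computed from mel-spectrograms.
  • results: The algorithm accurately detects time-aligned audio matchings and outperforms existing methods in scenarios involving topological distortions, such as time stretching and pitch shifting.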
    Abstract We present a topological audio fingerprinting approach for robustly identifying duplicate audio tracks. Our method applies persistent homology on local spectral decompositions of audio signals, using filtered cubical complexes computed from mel-spectrograms. By encoding the audio content in terms of local Betti curves, our topological audio fingerprints enable accurate detection of time-aligned audio matchings. Experimental results demonstrate the accuracy of our algorithm in the detection of tracks with the same audio content, even when subjected to various obfuscations. Our approach outperforms existing methods in scenarios involving topological distortions, such as time stretching and pitch shifting.
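The abstract describes encoding audio as local Betti curves obtained from a filtration of mel-spectrogram values. As a simplified sketch of that idea (not the code in the linked repository), the function below computes only the degree-0 Betti curve: for each threshold it counts the connected components of the cells whose value is at most that threshold. The name `betti0_curve`, the 4-neighbour connectivity, and the toy 3x3 grid standing in for a spectrogram patch are all assumptions.

```python
def betti0_curve(grid, thresholds):
    """Count connected components of {cell <= t} for each threshold t.

    Cells are connected through shared edges (4-connectivity), which is
    an assumption of this sketch.
    """
    rows, cols = len(grid), len(grid[0])
    curve = []
    for t in thresholds:
        seen = [[False] * cols for _ in range(rows)]
        components = 0
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] > t or seen[r][c]:
                    continue
                # New component found: flood-fill it with a stack.
                components += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    i, j = stack.pop()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        inside = 0 <= ni < rows and 0 <= nj < cols
                        if inside and not seen[ni][nj] and grid[ni][nj] <= t:
                            seen[ni][nj] = True
                            stack.append((ni, nj))
        curve.append(components)
    return curve


if __name__ == "__main__":
    patch = [
        [0, 3, 0],
        [3, 3, 3],
        [1, 3, 2],
    ]
    print(betti0_curve(patch, [0, 1, 2, 3]))
```

In the toy patch the four low-valued corners appear one after another as separate components and merge only once the threshold reaches the surrounding value, so the curve rises and then drops to one.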

Simulating room transfer functions between transducers mounted on audio devices using a modified image source method

  • paper_url: http://arxiv.org/abs/2309.03486
  • repo_url: https://github.com/audiolabs/DEISM
  • paper_authors: Zeyu Xu, Adrian Herzog, Alexander Lodermeyer, Emanuël A. P. Habets, Albert G. Prinn
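  • for: Extends the image source method (ISM) so that room acoustics simulations include acoustic diffraction effects.
  • methods: Extends the ISM with spherical harmonic directivity coefficients to capture the diffraction caused by source and receiver transducers mounted on physical devices.
  • results: The proposed method simulates room transfer functions more accurately than currently available image source methods; its accuracy is related to the sizes, shapes, number, and positions of the devices inside the room.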
    Abstract The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency. The standard ISM is limited to simulations of room impulse responses between point sources and omnidirectional receivers. In this work, the ISM is extended using spherical harmonic directivity coefficients to include acoustic diffraction effects due to source and receiver transducers mounted on physical devices, which are typically encountered in practical situations. The proposed method is verified using finite element simulations of various loudspeaker and microphone configurations in a rectangular room. It is shown that the accuracy of the proposed method is related to the sizes, shapes, number, and positions of the devices inside a room. A simplified version of the proposed method, which can significantly reduce computational effort, is also presented. The proposed method and its simplified version can simulate room transfer functions more accurately than currently available image source methods and can aid the development and evaluation of speech and acoustic signal processing algorithms, including speech enhancement, acoustic scene analysis, and acoustic parameter estimation.
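For context, the standard ISM that the paper extends replaces each wall reflection with a mirrored copy of the point source. The sketch below covers only that baseline for a rectangular room, first-order reflections, and an omnidirectional receiver; it does not implement the paper's spherical harmonic directivity extension. The function names, the room layout (one corner at the origin), and the speed of sound of 343 m/s are assumptions, and wall reflection coefficients are omitted.

```python
import math


def first_order_images(source, room):
    """Return the six first-order image positions of a point source.

    The room is an axis-aligned box with one corner at the origin and
    dimensions room = (Lx, Ly, Lz). Each image is the source mirrored
    across one wall.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(source)
            img[axis] = 2.0 * wall - source[axis]  # mirror across the wall
            images.append(tuple(img))
    return images


def path_delays(source, receiver, room, c=343.0):
    """Delays in seconds: direct path first, then the six reflections."""
    points = [tuple(source)] + first_order_images(source, room)
    return [math.dist(p, receiver) / c for p in points]


if __name__ == "__main__":
    room = (4.0, 5.0, 3.0)
    src = (1.0, 2.0, 1.5)
    mic = (3.0, 3.0, 1.5)
    for delay in path_delays(src, mic, room):
        print(f"{delay * 1000:.2f} ms")
```

Each delay corresponds to one impulse in a simplified room impulse response; higher reflection orders repeat the mirroring on the images themselves.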