methods: Integrates a probabilistic (i.e., variational) latent space model into the deep complex U-Network architecture for speech enhancement.
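results: Achieves consistently high speech enhancement performance on the MS-DNS 2020 and Voicebank+Demand datasets, e.g., an SI-SDR of up to 20.2 dB, about 0.5 to 1.4 dB higher than the ablated version without a probabilistic latent space, and higher than WaveUNet and PHASEN.
Abstract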
In this paper, we propose to extend the deep, complex U-Network architecture for speech enhancement by incorporating a probabilistic (i.e., variational) latent space model. The proposed model is evaluated against several ablated versions of itself in order to study the effects of the variational latent space model, complex-value processing, and self-attention. Evaluation on the MS-DNS 2020 and Voicebank+Demand datasets yields consistently high performance. For example, the proposed model achieves an SI-SDR of up to 20.2 dB, about 0.5 to 1.4 dB higher than its ablated version without probabilistic latent space, 2-2.4 dB higher than WaveUNet, and 6.7 dB above PHASEN. Compared to real-valued magnitude spectrogram processing with a variational U-Net, the complex U-Net achieves an improvement of up to 4.5 dB SI-SDR. Complex spectrum encoding as magnitude and phase yields the best performance in anechoic conditions, whereas real and imaginary part representation results in better generalization to (novel) reverberation conditions, possibly due to the underlying physics of sound.
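Summary
This paper proposes extending the deep complex U-Network architecture for speech enhancement with a probabilistic (i.e., variational) latent space model. Several ablated versions of the model are compared to study the effects of the variational latent space, complex-valued processing, and self-attention. Evaluation on the MS-DNS 2020 and Voicebank+Demand datasets shows consistently high performance: the model reaches an SI-SDR of up to 20.2 dB, which is 0.5 to 1.4 dB higher than the version without a probabilistic latent space, 2 to 2.4 dB higher than WaveUNet, and 6.7 dB higher than PHASEN. Compared with a variational U-Net that processes real-valued magnitude spectrograms, the complex U-Net improves SI-SDR by up to 4.5 dB. Encoding the complex spectrum as magnitude and phase performs best in anechoic conditions, whereas the real and imaginary part representation generalizes better to (novel) reverberation conditions, possibly because of the underlying physics of sound.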
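The results above are reported in SI-SDR (scale-invariant signal-to-distortion ratio), which the abstract does not define. The following is a minimal sketch of the commonly used definition, assuming two time-domain signals of equal length. The function name `si_sdr`, the mean removal, and the `eps` stabilizer are illustrative choices and are not taken from the paper.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a clean reference.

    Illustrative sketch only: mean removal and `eps` are assumptions,
    not details confirmed by the paper.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove any DC offset so that it does not count as distortion.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference: the optimal scaling factor
    # makes the metric invariant to the gain of the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic usage example: a 440 Hz tone with additive noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noise = rng.standard_normal(clean.shape)

mild = si_sdr(clean + 0.1 * noise, clean)
heavy = si_sdr(clean + 0.5 * noise, clean)
rescaled = si_sdr(2.0 * (clean + 0.1 * noise), clean)
```

In this sketch `mild` is higher than `heavy` because less noise remains, and `rescaled` matches `mild` because rescaling the estimate does not change the score. A gain of 0.5 to 1.4 dB, as quoted above, corresponds to a residual distortion energy that is roughly 11% to 28% lower at the same target energy.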
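methods: Uses previously released quantum simulator code for the ARM-based Raspberry Pi Pico embedded microcontroller and builds several examples on it: a Quantum MIDI processor that generates additional accompaniment notes and unique quantum-generated instruments from the input notes, and a Quantum Distortion module that changes an instrument's raw sound according to a quantum circuit.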
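results: Presents a self-contained Quantum Stylophone and an effect module plugin called 'QubitCrusher' for the Korg Nu:Tekt NTS-1. The paper also discusses future work and directions, and releases all examples as open source. To the author's knowledge, this is the first example of embedded Quantum Simulators for Instruments of Music (another QSIM).
Abstract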
This paper describes how to 'Free the Qubit' for art, by creating standalone quantum musical effects and instruments. Previously released quantum simulator code for an ARM-based Raspberry Pi Pico embedded microcontroller is utilised here, and several examples are built demonstrating different methods of utilising embedded resources. The first is a Quantum MIDI processor that generates additional notes for accompaniment and unique quantum-generated instruments based on the input notes, decoded and passed through a quantum circuit in an embedded simulator. The second is a Quantum Distortion module that changes an instrument's raw sound according to a quantum circuit, which is presented in two forms: a self-contained Quantum Stylophone, and an effect module plugin called 'QubitCrusher' for the Korg Nu:Tekt NTS-1. This paper also discusses future work and directions for quantum instruments, and provides all examples as open source. This is, to the author's knowledge, the first example of embedded Quantum Simulators for Instruments of Music (another QSIM).
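To make the idea of passing a decoded note through a quantum circuit more concrete, here is a small desktop sketch in Python with NumPy. It is not the author's embedded Raspberry Pi Pico code. The three-qubit circuit (X gates to encode the note, then H and CNOT), the `INTERVALS` table, and every function name are hypothetical choices made purely for illustration.

```python
import numpy as np

# Standard single-qubit gates.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# Hypothetical mapping from a 3-bit measurement outcome to a semitone offset
# (major-scale degrees); not taken from the paper.
INTERVALS = (0, 2, 4, 5, 7, 9, 11, 12)


def apply_gate(state, gate, target, n_qubits):
    """Apply a single-qubit gate to `target` (qubit 0 is the least significant bit)."""
    psi = state.reshape([2] * n_qubits)
    axis = n_qubits - 1 - target  # the first tensor axis is the most significant qubit
    psi = np.tensordot(gate, psi, axes=([1], [axis]))
    psi = np.moveaxis(psi, 0, axis)
    return psi.reshape(-1)


def apply_cnot(state, control, target):
    """Flip `target` on every basis state in which `control` is 1."""
    new = state.copy()
    for i in range(len(state)):
        if (i >> control) & 1:
            new[i ^ (1 << target)] = state[i]
    return new


def circuit_probabilities(note, n_qubits=3):
    """Encode the low bits of the pitch class, run H and CNOT, return outcome probabilities."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    pitch_class = note % 12
    for q in range(n_qubits):
        if (pitch_class >> q) & 1:
            state = apply_gate(state, X, q, n_qubits)
    state = apply_gate(state, H, 0, n_qubits)  # superposition on qubit 0
    state = apply_cnot(state, 0, 1)            # entangle qubit 0 with qubit 1
    probs = np.abs(state) ** 2
    return probs / probs.sum()


def quantum_accompaniment(note, rng):
    """Sample one measurement and map it to an accompaniment MIDI note (0..127)."""
    probs = circuit_probabilities(note)
    outcome = rng.choice(len(probs), p=probs)
    return min(127, max(0, note + INTERVALS[outcome]))


rng = np.random.default_rng(1)
accompaniment = [quantum_accompaniment(60, rng) for _ in range(8)]
```

For middle C (MIDI note 60) this particular circuit leaves only the outcomes 0 and 3 with equal probability, so the sampled accompaniment note is either 60 or 65. A real embedded implementation would have to fit the state vector and gate arithmetic into the microcontroller's memory and timing budget, which this sketch does not attempt to model.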