results: Experiments validate the theoretical framework, which supports independent beam steering on a spherical loudspeaker array.
Abstract
Spherical loudspeaker arrays have been recently studied for directional sound radiation, where the compact arrangement of the loudspeaker units around a sphere facilitated the control of sound radiation in three-dimensional space. Directivity of sound radiation, or beamforming, was achieved by driving each loudspeaker unit independently, where the design of beamforming weights was typically achieved by numerical optimization with reference to a given desired beam pattern. This is in contrast to the methods already developed for microphone arrays in general and spherical microphone arrays in particular, where beamformer weights are designed to satisfy a wider range of objectives, related to directivity, robustness, and side-lobe level, for example. This paper presents the development of a physical-model-based, optimal beamforming framework for spherical loudspeaker arrays, similar to the framework already developed for spherical microphone arrays, facilitating efficient beamforming in the spherical harmonics domain, with independent steering. In particular, it is shown that from a beamforming perspective, the spherical loudspeaker array is similar to the spherical microphone array with microphones arranged around a rigid sphere. Experimental investigation validates the theoretical framework of beamformer design.
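Summary
Spherical loudspeaker arrays have recently been studied for directional sound radiation, where the compact arrangement of loudspeaker units around a sphere makes it easier to control radiation in three-dimensional space. Directivity, or beamforming, is achieved by driving each loudspeaker unit independently, with the beamforming weights typically designed by numerical optimization. This differs from existing methods for microphone arrays and spherical microphone arrays, whose weights are designed to meet a wider range of objectives, including directivity, robustness, and side-lobe level. This paper presents a physical-model-based optimal beamforming framework for spherical loudspeaker arrays, similar to the framework for spherical microphone arrays, which enables efficient beamforming in the spherical harmonics domain with independent steering of the radiation direction. In particular, from a beamforming perspective the spherical loudspeaker array behaves like a spherical microphone array with microphones arranged around a rigid sphere. Experimental investigation validates the theoretical framework.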
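methods: The study uses results on the spatial-temporal correlation of broadband diffuse sound fields to derive the diffuse-field zones of quiet in the near field and the far field of the secondary source.
results: For low-pass filtered noise, the size of the zone of quiet is tied to the center frequency of the noise band, and the theoretical analysis can predict it to a first approximation.
Abstract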
The zones of quiet in pure-tone diffuse sound fields have been studied extensively in the past, both theoretically and experimentally, with the well-known result of the 10 dB attenuation extending to about a tenth of a wavelength. Recent results on the spatial-temporal correlation of broadband diffuse sound fields are used in this study to develop a theoretical framework for predicting the extension of the zones of quiet in broadband diffuse sound fields. This can be used to study the acoustic limitations imposed on local active sound control systems, such as an active headrest, when controlling broadband noise. Spatial-temporal correlation is first reviewed, after which derivations of the diffuse field zones of quiet in the near-field and the far-field of the secondary source are presented. The theoretical analysis is supported by simulation examples comparing the zones of quiet for diffuse fields excited by tonal and broadband signals. It is shown that, as a first approximation, the zone of quiet of a low-pass filtered noise is comparable to that of a pure tone with a frequency equal to the center frequency of the broadband noise bandwidth.
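Summary
Zones of quiet in pure-tone diffuse sound fields have been studied extensively, both theoretically and experimentally, leading to the well-known result that the 10 dB attenuation extends to about a tenth of a wavelength. This study uses recent results on the spatial-temporal correlation of broadband diffuse sound fields to develop a theoretical framework for predicting the extent of the zones of quiet in such fields. The framework can be used to study the acoustic limits of local active sound control systems, such as an active headrest, when they control broadband noise. Spatial-temporal correlation is reviewed first, followed by derivations of the diffuse-field zones of quiet in the near field and the far field of the secondary source. The theoretical analysis is supported by simulation examples that compare the zones of quiet for diffuse fields excited by tonal and broadband signals. As a first approximation, the zone of quiet of low-pass filtered noise is comparable to that of a pure tone at the center frequency of the noise bandwidth.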
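The "tenth of a wavelength" figure for the pure-tone case can be checked numerically with a minimal sketch. It assumes a simplified far-field model in which the normalized residual mean-square pressure at a distance r from the cancellation point is 1 - sinc^2(kr), with sinc(kr) being the spatial correlation of a pure-tone diffuse field. This expression is not stated in the abstract and is used here only as an illustrative assumption; the function names are hypothetical.

```python
import numpy as np

def residual_far_field(r_over_lambda):
    """Assumed model: normalized residual 1 - sinc^2(kr), with k = 2*pi/lambda."""
    kr = 2.0 * np.pi * np.asarray(r_over_lambda, dtype=float)
    # np.sinc(x) computes sin(pi*x)/(pi*x), so divide kr by pi to get sin(kr)/kr.
    rho = np.sinc(kr / np.pi)
    return 1.0 - rho ** 2

def quiet_radius(threshold_db=-10.0):
    """Smallest r/lambda at which the residual exceeds the given attenuation level."""
    target = 10.0 ** (threshold_db / 10.0)
    # On [0, 0.5] wavelengths the assumed residual rises monotonically from 0 to 1.
    r = np.linspace(1e-6, 0.5, 500001)
    eps = residual_far_field(r)
    return float(r[np.argmax(eps > target)])

radius = quiet_radius()
print(f"10 dB zone of quiet radius: {radius:.3f} wavelengths")
```

Under this assumed model the 10 dB boundary falls a little below 0.1 wavelengths from the cancellation point, in line with the order of magnitude quoted in the abstract. The broadband case treated in the paper requires the spatial-temporal correlation and is not reproduced here.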
Spatial sampling and beamforming for spherical microphone arrays
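results: The paper reviews recent progress in beamforming methods for spherical microphone arrays, including delay-and-sum, Dolph-Chebyshev, and more advanced optimal methods.
Abstract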
Spherical microphone arrays have been recently studied for spatial sound recording, speech communication, and sound field analysis for room acoustics and noise control. Complementary theoretical studies presented progress in spatial sampling and beamforming methods. This paper reviews recent results in spatial sampling that facilitate a wide range of spherical array configurations, from a single rigid sphere to free positioning of microphones. The paper then presents an overview of beamforming methods recently presented for spherical arrays, from the widely used delay-and-sum and Dolph-Chebyshev, to the more advanced optimal methods, typically performed in the spherical harmonics domain.
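Summary
Spherical microphone arrays have recently been studied for spatial sound recording, speech communication, and sound field analysis for room acoustics and noise control. Related theoretical studies have reported progress in spatial sampling and beamforming methods for these arrays. This paper reviews recent spatial sampling results that support configurations ranging from a single rigid sphere to free positioning of the microphones. It then gives an overview of beamforming methods for spherical arrays, from the widely used delay-and-sum and Dolph-Chebyshev designs to more advanced optimal methods, typically performed in the spherical harmonics domain.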
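Beamforming in the spherical harmonics domain can be illustrated with a minimal sketch for the axis-symmetric case. It assumes the beam pattern is written as y(theta) = sum over n of d_n (2n+1)/(4 pi) P_n(cos theta), where P_n is the Legendre polynomial and d_n are per-order weights. The notation, the function names, and the choice d_n = 1 are assumptions made for illustration and are not taken from the abstract.

```python
import numpy as np
from numpy.polynomial import legendre

def _legendre_coeffs(d):
    """Scale per-order weights d_n by (2n + 1) / (4*pi)."""
    d = np.asarray(d, dtype=float)
    n = np.arange(len(d))
    return d * (2 * n + 1) / (4.0 * np.pi)

def beam_pattern(theta, d):
    """Axis-symmetric beam pattern at angle theta (radians) from the look direction."""
    return legendre.legval(np.cos(theta), _legendre_coeffs(d))

def directivity_factor(d):
    """Look-direction power divided by the power averaged over the sphere."""
    coeffs = _legendre_coeffs(d)
    # Gauss-Legendre nodes integrate the squared pattern (degree 2N) exactly.
    x, w = legendre.leggauss(len(coeffs) + 1)
    mean_power = 0.5 * np.sum(w * legendre.legval(x, coeffs) ** 2)
    return float(legendre.legval(1.0, coeffs) ** 2 / mean_power)

order = 3
weights = np.ones(order + 1)  # d_n = 1 for n = 0..N (assumed weights)
print(directivity_factor(weights))  # equals (N + 1)^2 = 16 up to rounding
```

With d_n = 1 the directivity factor works out to (N + 1)^2, which follows from the orthogonality of the Legendre polynomials. Other weight choices, such as Dolph-Chebyshev, trade directivity for side-lobe control by changing only the vector `d`.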
A privacy-preserving method using secret key for convolutional neural network-based speech classification
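results: Experiments show that, with the proposed encryption method, the encrypted speech data can be used in exactly the same way as the original data and is robust against reconstruction attacks. The study also evaluates how difficult it is to recover the original information from the encrypted speech data.
Abstract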
In this paper, we propose a privacy-preserving method with a secret key for convolutional neural network (CNN)-based speech classification tasks. Recently, many methods related to privacy preservation have been developed in image classification research fields. In contrast, in speech classification research fields, little research has considered these risks. To promote research on privacy preservation for speech classification, we provide an encryption method with a secret key in CNN-based speech classification systems. The encryption method is based on an invertible random matrix. Encrypted speech data with a correct key can be accepted by a model with an encrypted kernel generated using the inverse of the random matrix. Although the encrypted speech data is strongly distorted, the classification tasks can be correctly performed when a correct key is provided. Additionally, in this paper, we evaluate the difficulty of reconstructing the original information from the encrypted spectrograms and waveforms. In our experiments, the proposed encryption methods are performed in automatic speech recognition (ASR) and automatic speaker verification (ASV) tasks. The results show that the encrypted data can be used in exactly the same way as the original data when a correct secret key is provided in the transformer-based ASR and x-vector-based ASV with self-supervised front-end systems. The robustness of the encrypted data against reconstruction attacks is also illustrated.
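Summary
This paper proposes a privacy-preserving method with a secret key for convolutional neural network (CNN)-based speech classification tasks. Many privacy-preservation methods have recently been developed for image classification, whereas little research in speech classification has considered these risks. To promote such research, the paper provides an encryption method based on an invertible random matrix. Encrypted speech data with the correct key is accepted by a model whose kernel has been encrypted using the inverse of that matrix. Although the encrypted speech data is strongly distorted, classification is still performed correctly when the correct key is provided. The paper also evaluates how difficult it is to reconstruct the original information from the encrypted data. The experiments cover automatic speech recognition (ASR) and automatic speaker verification (ASV), using a transformer-based ASR system and an x-vector-based ASV system with self-supervised front ends. The results show that, with the correct key, the encrypted data can be used in exactly the same way as the original data, and they illustrate its robustness against reconstruction attacks.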
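The idea of pairing encrypted data with an encrypted kernel can be sketched in a few lines of linear algebra. This is an illustration under stated assumptions, not the paper's scheme: a single linear layer stands in for the CNN kernel, the secret key is an integer seed for the random matrix, and all function names are hypothetical.

```python
import numpy as np

def make_key_matrix(key, dim):
    """Draw a random matrix from the key; redraw if it is badly conditioned."""
    rng = np.random.default_rng(key)
    while True:
        m = rng.standard_normal((dim, dim))
        if np.linalg.cond(m) < 1e6:
            return m

def encrypt_features(x, key):
    """Encrypt feature frames x (dim x frames) by left-multiplying with the key matrix."""
    return make_key_matrix(key, x.shape[0]) @ x

def encrypt_kernel(w, key):
    """Fold the inverse key matrix into the layer weights w (out x dim)."""
    return w @ np.linalg.inv(make_key_matrix(key, w.shape[1]))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))   # stand-in for spectrogram frames
w = rng.standard_normal((3, 8))   # stand-in for a trained kernel

w_enc = encrypt_kernel(w, key=1234)
y_plain = w @ x
y_correct_key = w_enc @ encrypt_features(x, key=1234)
y_wrong_key = w_enc @ encrypt_features(x, key=9999)

print(np.allclose(y_plain, y_correct_key))
print(np.allclose(y_plain, y_wrong_key))
```

With the correct key the inverse matrix cancels the encryption, so the layer output matches the unencrypted case up to floating-point error; with a different key the product no longer reduces to the original weights and the output differs. How the paper applies this to convolutional kernels, spectrograms, and waveforms is not specified in the abstract.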