eess.AS - 2023-08-01

Generative adversarial networks with physical sound field priors

  • paper_url: http://arxiv.org/abs/2308.00426
  • repo_url: https://github.com/xefonon/soundfieldgan
  • paper_authors: Xenofon Karakonstantis, Efren Fernandez-Grande
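  • for: This paper proposes a deep learning-based method for sound field reconstruction that uses generative adversarial networks (GANs) to capture the statistical distribution of sound fields in rooms.
  • methods: The method uses a plane wave basis and learns the statistical distribution of pressure in rooms to accurately reconstruct sound fields from a limited number of measurements.
  • results: Experiments show improved accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region. The method also handles varying numbers of measurement positions and configurations without sacrificing performance.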
    Abstract This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs). The method utilises a plane wave basis and learns the underlying statistical distributions of pressure in rooms to accurately reconstruct sound fields from a limited number of measurements. The performance of the method is evaluated using two established datasets and compared to state-of-the-art methods. The results show that the model is able to achieve an improved reconstruction performance in terms of accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region. Furthermore, the proposed method can handle a varying number of measurement positions and configurations without sacrificing performance. The results suggest that this approach provides a promising approach to sound field reconstruction using generative models that allow for a physically informed prior to acoustics problems.
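    The plane wave basis mentioned in the abstract can be illustrated with a minimal sketch. It only evaluates a pressure field from given plane wave coefficients; how the paper's GAN interacts with the basis is not shown. The function name, the speed of sound, the example directions and coefficients, and the exp(-j k·r) sign convention are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def plane_wave_pressure(position, directions, coefficients, frequency_hz):
    """Complex pressure at `position` as a weighted sum of plane waves.

    `directions` are unit propagation vectors and `coefficients` are the
    complex amplitudes of the corresponding plane waves (hypothetical inputs).
    """
    k = 2.0 * math.pi * frequency_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    total = 0j
    for (dx, dy, dz), amplitude in zip(directions, coefficients):
        phase = k * (dx * position[0] + dy * position[1] + dz * position[2])
        total += amplitude * cmath.exp(-1j * phase)  # assumed sign convention
    return total


# Two illustrative plane waves travelling along x and y.
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coefficients = [1.0 + 0j, 0.5 + 0j]

p_origin = plane_wave_pressure((0.0, 0.0, 0.0), directions, coefficients, 500.0)
p_shifted = plane_wave_pressure((0.1, 0.2, 0.0), directions, coefficients, 500.0)
print(abs(p_origin), abs(p_shifted))
```

    Once the coefficients are known, the same function evaluates the field at any position, including points outside the measurement region. This is why a plane wave representation lends itself to extrapolation.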

Circumvent spherical Bessel function nulls for open sphere microphone arrays with physics informed neural network

  • paper_url: http://arxiv.org/abs/2308.00242
  • repo_url: None
  • paper_authors: Fei Ma, Thushara D. Abhayapala, Prasanga N. Samarasinghe
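  • for: Improve the sound field analysis capability of open sphere microphone arrays (OSMAs).
  • methods: A physics informed neural network (PINN) models the OSMA measurement and predicts the sound field on another sphere with a different radius.
  • results: Because spherical Bessel function nulls vary with radius, sound field coefficients that are hard to obtain directly from the OSMA measurement can be obtained from the prediction.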
    Abstract Open sphere microphone arrays (OSMAs) are simple to design and do not introduce scattering fields, and thus can be advantageous than other arrays for implementing spatial acoustic algorithms under spherical model decomposition. However, an OSMA suffers from spherical Bessel function nulls which make it hard to obtain some sound field coefficients at certain frequencies. This paper proposes to assist an OSMA for sound field analysis with physics informed neural network (PINN). A PINN models the measurement of an OSMA and predicts the sound field on another sphere whose radius is different from that of the OSMA. Thanks to the fact that spherical Bessel function nulls vary with radius, the sound field coefficients which are hard to obtain based on the OSMA measurement directly can be obtained based on the prediction. Simulations confirm the effectiveness of this approach and compare it with the rigid sphere approach.
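    Summary Open sphere microphone arrays (OSMAs) are easy to design and do not produce scattering fields, so they may be better suited than other arrays to spatial acoustic algorithms under spherical model decomposition. However, an OSMA is affected by spherical Bessel function nulls, which make some sound field coefficients hard to obtain at certain frequencies. This paper proposes using a physics informed neural network (PINN) to assist an OSMA in sound field analysis. The PINN models the OSMA measurement and predicts the sound field on another sphere of different radius. Because the nulls vary with radius, the coefficients that are hard to obtain directly from the OSMA measurement can be obtained from the prediction. Simulations confirm the effectiveness of this approach and compare it with the rigid sphere approach.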
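    The dependence of the nulls on radius can be checked for the order-0 case, where the spherical Bessel function is j0(x) = sin(x)/x and vanishes at kr = nπ. The sketch below is illustrative only: the radii (0.10 m and 0.12 m), the speed of sound, and the function names are assumptions, and the paper's PINN is not reproduced.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def spherical_j0(x):
    """Order-0 spherical Bessel function, j0(x) = sin(x) / x."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def j0_null_frequencies(radius_m, count=3):
    """Frequencies where j0(k r) = 0, i.e. k r = n * pi, so f = n c / (2 r)."""
    return [n * SPEED_OF_SOUND / (2.0 * radius_m) for n in range(1, count + 1)]


# Hypothetical radii: the measurement sphere and a second, larger sphere.
nulls_a = j0_null_frequencies(0.10)
nulls_b = j0_null_frequencies(0.12)

# At the first null of the 0.10 m sphere, j0 is far from zero on the 0.12 m sphere.
k_null = 2.0 * math.pi * nulls_a[0] / SPEED_OF_SOUND
print(nulls_a, nulls_b)
print(spherical_j0(k_null * 0.10), spherical_j0(k_null * 0.12))
```

    Higher orders have nulls at other values of kr, but the same reasoning applies: a frequency that falls on a null for one radius generally does not for another.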

The role of vowel and consonant onsets in neural tracking of natural speech

  • paper_url: http://arxiv.org/abs/2308.00161
  • repo_url: None
  • paper_authors: Mohammad Jalilpour Monesi, Jonas Vanthornhout, Hugo Van hamme, Tom Francart
  • for: investigate how the auditory system processes natural speech
  • methods: used recorded EEG signals from 105 subjects while they listened to fairy tale stories, and related EEG to speech representations using forward modeling and match-mismatch tasks
  • results: vowel-consonant onsets outperform onsets of any phone in both tasks, suggesting that neural tracking of vowel vs. consonant exists in the EEG to some degree, and vowel (syllable nucleus) onsets are better related to EEG compared to syllable onsets.
    Abstract To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of the speech. Mainly the speech envelope has been used, but also phonetic representations. We investigated to which degree of granularity phonetic representations can be related to the EEG signal. We used recorded EEG signals from 105 subjects while they listened to fairy tale stories. We utilized speech representations, including onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class (NPC) onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from speech representations. In the match-mismatch task, we trained a long short term memory (LSTM) based model to determine which of two candidate speech segments matches with a given EEG segment. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel vs. consonant exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets are better related to EEG compared to syllable onsets. Finally, our findings suggest that neural tracking previously thought to be associated with broad phonetic classes might actually originate from vowel-consonant onsets rather than the differentiation between different phonetic classes.
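    Summary To study how the human auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a listener to different representations of the speech. The speech envelope has mainly been used, along with phonetic representations. We studied the degree of granularity at which phonetic representations can be related to the EEG signal. We used recorded EEG from 105 participants listening to fairy tale stories. The speech representations included onsets of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class (NPC) onsets, each related to the EEG. In forward modeling, a linear model predicted the EEG from the speech representations. In the match-mismatch task, a long short-term memory (LSTM) based model was trained to determine which of two candidate speech segments matches a given EEG segment. Vowel-consonant onsets outperformed onsets of any phone in both tasks, suggesting that the EEG tracks the vowel vs. consonant distinction to some degree. Vowel (syllable nucleus) onsets were also better related to the EEG than syllable onsets. Finally, the findings suggest that neural tracking previously attributed to broad phonetic classes may actually originate from vowel-consonant onsets rather than from differentiation between phonetic classes.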
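    The onset representations and the match-mismatch decision can be sketched on toy data. This is not the authors' model: the paper trains an LSTM-based classifier, whereas the sketch swaps in a plain Pearson correlation as the decision rule. The onset times, the 64 Hz sample rate, the synthetic single-channel "EEG" with no response lag, and all function names are assumptions.

```python
import math


def onset_train(onset_times_s, duration_s, sample_rate_hz):
    """Binary impulse train with a 1.0 at every onset time."""
    n = int(round(duration_s * sample_rate_hz))
    train = [0.0] * n
    for t in onset_times_s:
        idx = int(round(t * sample_rate_hz))
        if 0 <= idx < n:
            train[idx] = 1.0
    return train


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def pick_match(eeg_segment, candidate_a, candidate_b):
    """Return 'a' or 'b' for the candidate that correlates better with the EEG."""
    score_a = pearson(eeg_segment, candidate_a)
    score_b = pearson(eeg_segment, candidate_b)
    return "a" if score_a >= score_b else "b"


# Toy data: the synthetic EEG follows the matched onsets plus a small ripple.
matched = onset_train([0.1, 0.4, 0.7], 1.0, 64)
mismatched = onset_train([0.25, 0.55, 0.9], 1.0, 64)
toy_eeg = [0.8 * s + 0.05 * math.sin(i) for i, s in enumerate(matched)]

choice = pick_match(toy_eeg, matched, mismatched)
print(choice)
```

    Real EEG responses are delayed and noisy, so a practical version would at least search over time lags. The sketch omits that to keep the task structure visible.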

An enhanced system for the detection and active cancellation of snoring signals

  • paper_url: http://arxiv.org/abs/2307.16809
  • repo_url: None
  • paper_authors: Valeria Bruschi, Michela Cantarini, Luca Serafini, Stefano Nobili, Stefania Cecchi, Stefano Squartini
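  • for: Reduce the annoyance caused by snoring, a common disorder that affects people's social and marital lives.
  • methods: A convolutional recurrent neural network detects snoring activity, and a delayless subband approach performs active snoring cancellation.
  • results: Experiments with real snoring signals show that the active snoring cancellation system performs better when the snoring activity detection stage is turned on, demonstrating the benefit of a preliminary detection stage.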
    Abstract Snoring is a common disorder that affects people's social and marital lives. The annoyance caused by snoring can be partially solved with active noise control systems. In this context, the present work aims at introducing an enhanced system based on the use of a convolutional recurrent neural network for snoring activity detection and a delayless subband approach for active snoring cancellation. Thanks to several experiments conducted using real snoring signals, this work shows that the active snoring cancellation system achieves better performance when the snoring activity detection stage is turned on, demonstrating the beneficial effect of a preliminary snoring detection stage in the perspective of snoring cancellation.
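    Summary Snoring is a common disorder that affects people's social and marital lives. The annoyance it causes can be partially relieved with active noise control systems. In this context, the present work introduces an enhanced system based on a convolutional recurrent neural network for snoring activity detection and a delayless subband approach for active snoring cancellation. Experiments with real snoring signals show that the active snoring cancellation system performs better when the snoring activity detection stage is enabled, demonstrating that a preliminary snoring detection stage benefits snoring cancellation.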
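    The idea of letting a detector control an adaptive canceller can be sketched with a heavily simplified stand-in. The abstract does not say how the detector is coupled to the canceller, so freezing adaptation while no snoring is detected is an assumption. The sketch uses a fullband normalized LMS (NLMS) filter instead of the paper's delayless subband scheme, ignores the secondary acoustic path, and replaces the neural detector with a given list of flags. All names, the 250 Hz test tone, and the parameter values are hypothetical.

```python
import math


def gated_nlms(reference, disturbance, active_flags, taps=4, step=0.5, eps=1e-8):
    """NLMS canceller whose weights adapt only while the detector flag is True.

    Returns the residual (disturbance minus the filter's estimate) per sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    residual = []
    for x, d, active in zip(reference, disturbance, active_flags):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        if active:  # adaptation is frozen when no snoring is flagged
            norm = sum(h * h for h in history) + eps
            weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Toy signals: a 250 Hz tone at 8 kHz, and a delayed, scaled copy as the disturbance.
n = 2000
reference = [math.sin(2.0 * math.pi * 250.0 * i / 8000.0) for i in range(n)]
disturbance = [0.0] + [0.8 * r for r in reference[:-1]]

residual = gated_nlms(reference, disturbance, [True] * n)
tail_residual = sum(e * e for e in residual[-200:])
tail_disturbance = sum(d * d for d in disturbance[-200:])
print(tail_residual, tail_disturbance)
```

    With the flag always on, the residual energy falls far below the disturbance energy once the filter has converged. With the flag always off, the weights stay at zero and the disturbance passes through unchanged.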