eess.AS - 2023-07-26

Sound Field Estimation around a Rigid Sphere with Physics-informed Neural Network

  • paper_url: http://arxiv.org/abs/2307.14013
  • repo_url: None
  • paper_authors: Xingyu Chen, Fei Ma, Amy Bastine, Prasanga Samarasinghe, Huiyuan Sun
  • for: Accurately estimating the sound field around a rigid sphere requires sufficient sampling on the sphere, which is not always feasible. This paper proposes a sound field estimation method based on a physics-informed neural network, integrating physical knowledge into the network architecture and training process. Unlike other learning-based methods, the proposed approach generalizes better and requires fewer samples.
  • methods: physics-informed neural network
  • results: In simulations, the proposed method achieves more accurate sound field estimates from limited measurements than the spherical harmonic method and the plane-wave decomposition method, without requiring a large number of samples.
    Abstract Accurate estimation of the sound field around a rigid sphere necessitates adequate sampling on the sphere, which may not always be possible. To overcome this challenge, this paper proposes a method for sound field estimation based on a physics-informed neural network. This approach integrates physical knowledge into the architecture and training process of the network. In contrast to other learning-based methods, the proposed method incorporates additional constraints derived from the Helmholtz equation and the zero radial velocity condition on the rigid sphere. Consequently, it can generate physically feasible estimations without requiring a large dataset. In contrast to the spherical harmonic-based method, the proposed approach has better fitting abilities and circumvents the ill condition caused by truncation. Simulation results demonstrate the effectiveness of the proposed method in achieving accurate sound field estimations from limited measurements, outperforming the spherical harmonic method and plane-wave decomposition method.
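The paper's key idea, combining a Helmholtz-equation residual with a zero-radial-velocity constraint on the rigid sphere as training losses, can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny untrained network, finite-difference derivatives, sphere radius, frequency, and collocation-point counts are all illustrative assumptions (a real PINN would use automatic differentiation and complex-valued pressure).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP standing in for the PINN (random, untrained weights for illustration).
W1 = rng.normal(size=(3, 32)) * 0.5; b1 = rng.normal(size=32)
W2 = rng.normal(size=(32, 1)) * 0.5; b2 = rng.normal(size=1)

def p(x):
    """Estimated (real-valued, for simplicity) sound pressure at points x: (n, 3) -> (n,)."""
    h = np.tanh(x @ W1 + b1)
    return (h @ W2 + b2).ravel()

def laplacian(f, x, h=1e-3):
    """Central finite-difference Laplacian of f at points x (autodiff in a real PINN)."""
    lap = np.zeros(len(x))
    for d in range(3):
        e = np.zeros(3); e[d] = h
        lap += (f(x + e) - 2.0 * f(x) + f(x - e)) / h**2
    return lap

def radial_derivative(f, x, h=1e-3):
    """Derivative of f along the outward radial direction at points x on a sphere."""
    n = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (f(x + h * n) - f(x - h * n)) / (2.0 * h)

k = 2.0 * np.pi * 1000.0 / 343.0   # wavenumber at 1 kHz, c = 343 m/s (assumed)
a = 0.1                            # rigid-sphere radius in metres (assumed)

# Collocation points in the exterior region and points on the sphere surface.
x_int = rng.normal(size=(64, 3))
x_int *= rng.uniform(a, a + 0.3, size=(64, 1)) / np.linalg.norm(x_int, axis=1, keepdims=True)
x_bnd = rng.normal(size=(64, 3))
x_bnd *= a / np.linalg.norm(x_bnd, axis=1, keepdims=True)

# Physics losses: Helmholtz residual (nabla^2 p + k^2 p = 0) in the exterior,
# and zero radial velocity (dp/dr = 0) on the rigid sphere.
loss_pde = np.mean((laplacian(p, x_int) + k**2 * p(x_int))**2)
loss_bnd = np.mean(radial_derivative(p, x_bnd)**2)
loss = loss_pde + loss_bnd
```

Training would minimize `loss` (plus a data-fitting term on the limited measurements) over the network weights, so estimates stay physically feasible even with few samples.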

Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods

  • paper_url: http://arxiv.org/abs/2308.00129
  • repo_url: None
  • paper_authors: Qingming Tang
  • for: This thesis studies representation learning for sequence data over time or space, aiming to improve downstream sequence prediction tasks using the learned representations.
  • methods: Multiple learning settings, including supervised learning with auxiliary losses, unsupervised learning, semi-supervised learning, and multi-view learning.
  • results: A broad study of representation learning on speech data across multiple settings and methods, yielding findings that remain useful and can also be applied to other domains.
    Abstract This thesis focuses on representation learning for sequence data over time or space, aiming to improve downstream sequence prediction tasks by using the learned representations. Supervised learning has been the most dominant approach for training deep neural networks for learning good sequential representations. However, one limiting factor to scale supervised learning is the lack of enough annotated data. Motivated by this challenge, it is natural to explore representation learning methods that can utilize large amounts of unlabeled and weakly labeled data, as well as an additional data modality. I describe my broad study of representation learning for speech data. Unlike most other works that focus on a single learning setting, this thesis studies multiple settings: supervised learning with auxiliary losses, unsupervised learning, semi-supervised learning, and multi-view learning. Besides different learning problems, I also explore multiple approaches for representation learning. Though I focus on speech data, the methods described in this thesis can also be applied to other domains. Overall, the field of representation learning is developing rapidly. State-of-the-art results on speech related tasks are typically based on Transformers pre-trained with large-scale self-supervised learning, which aims to learn generic representations that can benefit multiple downstream tasks. Since 2020, large-scale pre-training has been the de facto choice to achieve good performance. This delayed thesis does not attempt to summarize and compare with the latest results on speech representation learning; instead, it presents a unique study on speech representation learning before the Transformer era, that covers multiple learning settings. Some of the findings in this thesis can still be useful today.
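One of the settings the thesis covers, supervised learning with auxiliary losses, amounts to training a shared encoder on a main objective plus a weighted auxiliary term. The sketch below is a toy illustration, not the thesis's models: the feature dimensions, the reconstruction auxiliary task, and the weight `lam` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared encoder with two heads: a main frame classifier and an
# auxiliary input-reconstruction head (hypothetical architecture).
d_in, d_hid, n_classes = 40, 64, 10
W_enc = rng.normal(size=(d_in, d_hid)) * 0.1
W_cls = rng.normal(size=(d_hid, n_classes)) * 0.1
W_rec = rng.normal(size=(d_hid, d_in)) * 0.1

def encode(x):
    """Shared representation used by both heads."""
    return np.tanh(x @ W_enc)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# A batch of acoustic feature frames with labels (random stand-ins).
x = rng.normal(size=(8, d_in))
y = rng.integers(0, n_classes, size=8)

h = encode(x)
probs = softmax(h @ W_cls)
loss_main = -np.mean(np.log(probs[np.arange(8), y]))  # cross-entropy on the main task
loss_aux = np.mean((h @ W_rec - x) ** 2)              # auxiliary reconstruction loss
lam = 0.1                                             # auxiliary weight (assumed)
loss = loss_main + lam * loss_aux
```

Both losses backpropagate through the shared encoder, so the auxiliary task acts as a regularizer that can help when labeled data for the main task is scarce.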