cs.SD - 2023-08-03

Versatile Time-Frequency Representations Realized by Convex Penalty on Magnitude Spectrogram

  • paper_url: http://arxiv.org/abs/2308.01665
  • repo_url: None
  • paper_authors: Keidai Arai, Koki Yamada, Kohei Yatabe
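  • for: This study proposes a convex-optimization-based framework that realizes time-frequency (T-F) representations whose magnitude has characteristics specified by the user.
  • methods: The study builds on optimization-based T-F representations, in particular extensions of basis pursuit, which allow the representation to be designed through objective functions.
  • results: The proposed framework yields T-F representations tailored to user requirements. Its properties are analyzed mathematically, and numerical examples show sparse T-F representations with, e.g., low-rank or smooth magnitude, which had not been realized before.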
    Abstract Sparse time-frequency (T-F) representations have been an important research topic for more than several decades. Among them, optimization-based methods (in particular, extensions of basis pursuit) allow us to design the representations through objective functions. Since acoustic signal processing utilizes models of spectrogram, the flexibility of optimization-based T-F representations is helpful for adjusting the representation for each application. However, acoustic applications often require models of magnitude of T-F representations obtained by discrete Gabor transform (DGT). Adjusting a T-F representation to such a magnitude model (e.g., smoothness of magnitude of DGT coefficients) results in a non-convex optimization problem that is difficult to solve. In this paper, instead of tackling difficult non-convex problems, we propose a convex optimization-based framework that realizes a T-F representation whose magnitude has characteristics specified by the user. We analyzed the properties of the proposed method and provide numerical examples of sparse T-F representations having, e.g., low-rank or smooth magnitude, which have not been realized before.
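
The abstract builds on basis pursuit, where sparse complex T-F coefficients are found by minimizing an l1 penalty on coefficient magnitudes. The following is a minimal sketch, not the authors' method. It runs ISTA for basis pursuit denoising, min_c 0.5*||y - Dc||^2 + lam*||c||_1, with a hypothetical two-basis dictionary D = [I, F^H] (spikes plus unitary DFT atoms) standing in for a DGT frame. The complex soft-threshold shrinks each coefficient's magnitude and keeps its phase. The plain l1 norm is the simplest convex penalty on magnitudes; the paper targets richer magnitude models (e.g., low-rank or smooth), which this sketch does not implement. The helper names (soft_threshold, synth, analysis, ista) are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||z||_1 for complex z: shrink each magnitude by t, keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)
    return scale * z

def synth(c, n):
    """Hypothetical toy dictionary D = [I, F^H]: spike atoms plus unitary-DFT atoms."""
    return c[:n] + np.fft.ifft(c[n:], norm="ortho")

def analysis(r):
    """Adjoint D^H r = [r, F r]; the 'ortho' FFT is unitary, so ifft is its adjoint."""
    return np.concatenate([r, np.fft.fft(r, norm="ortho")])

def ista(y, lam, n_iter=500):
    """ISTA for 0.5*||y - D c||^2 + lam*||c||_1 (basis pursuit denoising)."""
    n = y.size
    L = 2.0  # Lipschitz constant ||D||^2 = 2, because D D^H = 2I
    c = np.zeros(2 * n, dtype=complex)
    for _ in range(n_iter):
        # Gradient step on the data term, then the prox of the l1 penalty.
        grad_step = c + analysis(y - synth(c, n)) / L
        c = soft_threshold(grad_step, lam / L)
    return c

# Test signal: one spike (index 3) plus one complex exponential (Fourier index k).
n, k = 64, 5
e = np.eye(n)
y = 5 * e[3] + 5 * np.fft.ifft(e[k], norm="ortho")
c = ista(y, lam=0.05)
```

With lam = 0.05, the largest spike coefficient sits at index 3 and the largest Fourier coefficient at index k = 5, the two atoms that generated y, while every other coefficient stays small.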

Optimizing multi-user sound communications in reverberating environments with acoustic reconfigurable metasurfaces

  • paper_url: http://arxiv.org/abs/2308.01531
  • repo_url: None
  • paper_authors: Hongkuan Zhang, Qiyuan Wang, Mathias Fink, Guancong Ma
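  • for: Addressing how several people in a reverberant room can speak simultaneously to several others, being perfectly understood without crosstalk between messages.
  • methods: An intelligent acoustic wall is developed: part of the room boundary is covered with a broadband acoustic reconfigurable metasurface (ARMs) whose 200 independently tunable units are reconfigured electronically by a learning algorithm that adapts to the room geometry and the positions of sources and receivers.
  • results: Control of multi-spectral sound fields over a spectrum much larger than the room's coherence bandwidth is demonstrated, enabling crosstalk-free acoustic communication, frequency-multiplexed communication, and multi-user communication. An experiment with two music sources achieves crosstalk-free simultaneous playback for two listeners.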
    Abstract How do you ensure that, in a reverberant room, several people can speak simultaneously to several other people, making themselves perfectly understood and without any crosstalk between messages? In this work, we report a conceptual solution to this problem by developing an intelligent acoustic wall, which can be reconfigured electronically and is controlled by a learning algorithm that adapts to the geometry of the room and the positions of sources and receivers. To this end, a portion of the room boundaries is covered with a smart mirror made of a broadband acoustic reconfigurable metasurface (ARMs) designed to provide a two-state (0 or π) phase shift in the reflected waves by 200 independently tunable units. The whole phase pattern is optimized to maximize the Shannon capacity while minimizing crosstalk between the different sources and receivers. We demonstrate the control of multi-spectral sound fields covering a spectrum much larger than the coherence bandwidth of the room for diverse striking functionalities, including crosstalk-free acoustic communication, frequency-multiplexed communications, and multi-user communications. An experiment conducted with two music sources for two different people demonstrates a crosstalk-free simultaneous music playback. Our work opens new routes for the control of sound waves in complex media and for a new generation of acoustic devices.
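
The abstract describes choosing a binary (0 or π) phase for each of 200 metasurface units so as to maximize Shannon capacity while suppressing crosstalk. The following toy sketch rests on loudly stated assumptions, all of which are hypothetical:
- a single frequency;
- a random linear channel model h = direct + coupling @ s, where s_m = +1 or -1 stands for a reflection phase of 0 or π;
- a sum of per-receiver log2(1 + SINR) as the capacity proxy;
- greedy bit-flip hill climbing in place of the paper's learning algorithm, whose details the abstract does not give.

The function names and channel model are illustrative only.

```python
import numpy as np

def sum_capacity(s, direct, coupling, noise=1.0):
    """Hypothetical capacity proxy: sum over receivers of log2(1 + SINR)."""
    # h[r, t]: channel from transmitter t to receiver r; each unit adds coupling[r, t, m] * s[m].
    h = direct + coupling @ s
    signal = np.abs(np.diag(h)) ** 2                      # intended link r <- r
    leakage = np.sum(np.abs(h) ** 2, axis=1) - signal     # crosstalk at receiver r
    sinr = signal / (noise + leakage)
    return float(np.sum(np.log2(1.0 + sinr)))

def greedy_flip(direct, coupling, n_sweeps=5, seed=0):
    """Hill climbing over binary phases: keep a flip only if the objective increases."""
    rng = np.random.default_rng(seed)
    m = coupling.shape[-1]
    s = rng.choice([-1.0, 1.0], size=m)  # random initial 0/π pattern
    best = initial = sum_capacity(s, direct, coupling)
    for _ in range(n_sweeps):
        improved = False
        for i in rng.permutation(m):
            s[i] = -s[i]
            val = sum_capacity(s, direct, coupling)
            if val > best:
                best, improved = val, True
            else:
                s[i] = -s[i]  # undo the flip
        if not improved:
            break
    return s, initial, best

# Toy setup: 2 transmitter/receiver pairs, 200 binary units, random complex channels.
rng = np.random.default_rng(1)
n_users, n_units = 2, 200
shape = (n_users, n_users)
direct = rng.normal(size=shape) + 1j * rng.normal(size=shape)
coupling = 0.1 * (rng.normal(size=shape + (n_units,)) + 1j * rng.normal(size=shape + (n_units,)))
s, initial, best = greedy_flip(direct, coupling)
```

Putting crosstalk in the SINR denominator lets one objective reward both a strong intended link and low leakage. Hill climbing never lowers the objective, but it only reaches a local optimum.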