cs.SD - 2023-10-21

  • paper_url: http://arxiv.org/abs/2310.14018
  • repo_url: None
  • paper_authors: Tatsuki Kobayashi, Yoshiko Maruyama, Isao Nambu, Shohei Yano, Yasuhiro Wada
  • for: Virtual sound synthesis technology allows users to perceive spatial sound through headphones or earphones, but accurate virtual sound requires an individual head-related transfer function (HRTF).
  • methods: The study proposes generating the HRTF for one direction from the HRTF of another direction, using temporal convolutional neural networks (TCNs) trained on publicly available horizontal-plane datasets.
  • results: The trained networks generated HRIRs for directions other than the front direction in the public datasets. On a newly measured dataset, similarity measured by spectral distortion was slightly degraded, but behavioral experiments with human participants found the generated HRIRs equivalent to the measured ones. These results suggest that the proposed TCNs can be used to generate personalized HRIRs for virtual sound.
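The core building block of a TCN, as named in the methods summary, is a stack of causal dilated 1-D convolutions. The sketch below illustrates that mechanism in plain Python on a single-channel impulse response. It is a minimal illustration under stated assumptions, not the authors' network: the function names, the residual ReLU block, the kernel sizes, and the dilation schedule are all hypothetical, the weights are untrained placeholders, and feeding the frontal HRIR as input is an inference from the summary rather than a confirmed detail.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """y[t] = sum_i kernel[i] * signal[t - i * dilation], treating samples before t=0 as zero."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:  # causal: never look at future samples
                acc += w * signal[idx]
        out.append(acc)
    return out


def relu(xs):
    return [max(0.0, x) for x in xs]


def tcn_forward(hrir, layers):
    """Apply residual blocks x <- x + relu(conv(x)); layers is a list of (kernel, dilation).

    Hypothetical block design; the paper's actual architecture is not given in this summary.
    """
    x = list(hrir)
    for kernel, dilation in layers:
        y = relu(causal_dilated_conv(x, kernel, dilation))
        x = [a + b for a, b in zip(x, y)]
    return x


def receptive_field(layers):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((len(kernel) - 1) * dilation for kernel, dilation in layers)


# Placeholder (untrained) weights with dilations doubling per layer.
layers = [([0.5, 0.5], 1), ([0.5, 0.5], 2), ([0.5, 0.5], 4), ([0.5, 0.5], 8)]
front_hrir = [1.0] + [0.0] * 19        # toy stand-in for a measured frontal HRIR
generated = tcn_forward(front_hrir, layers)
print(len(generated), receptive_field(layers))
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth, which is why a shallow stack can cover an impulse response that is a few hundred samples long. A trained model would normally be built in a deep-learning framework with multiple channels and learned weights.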
    Abstract Virtual sound synthesis is a technology that allows users to perceive spatial sound through headphones or earphones. However, accurate virtual sound requires an individual head-related transfer function (HRTF), which is difficult to measure because it requires a specialized environment. In this study, we propose a method to generate the HRTF for one direction from that of another. To this end, we use temporal convolutional neural networks (TCNs) to generate head-related impulse responses (HRIRs), trained on publicly available datasets in the horizontal plane. Using the trained networks, we successfully generated HRIRs for directions other than the front direction in these datasets. To test the generalization of the method, we measured HRIRs for a new dataset and tested whether the trained networks could be applied to it. Although the similarity evaluated by spectral distortion was slightly degraded, behavioral experiments with human participants showed that the generated HRIRs were equivalent to the measured ones. These results suggest that the proposed TCNs can be used to generate personalized HRIRs from one direction to another, which could contribute to the personalization of virtual sound.
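The abstract evaluates similarity between generated and measured HRIRs by spectral distortion. A common form of this metric is the root-mean-square difference of the log-magnitude spectra in decibels, SD = sqrt((1/K) Σ_k (20 log10(|H(f_k)| / |Ĥ(f_k)|))²). The sketch below computes that form with a naive DFT; the exact definition, frequency range, and any smoothing used in the paper are not stated here, so treat the function names and the `eps` guard as assumptions.

```python
import cmath
import math


def magnitude_spectrum(hrir):
    """Magnitudes of DFT bins 0..N/2 of a real impulse response (naive O(N^2) DFT)."""
    n = len(hrir)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(hrir))
        mags.append(abs(acc))
    return mags


def spectral_distortion_db(measured, generated, eps=1e-12):
    """RMS over frequency bins of the log-magnitude difference, in dB.

    eps is an assumed guard against log(0) for bins with zero magnitude.
    """
    if len(measured) != len(generated):
        raise ValueError("HRIRs must have the same length")
    ref = magnitude_spectrum(measured)
    est = magnitude_spectrum(generated)
    squared = [(20.0 * math.log10((r + eps) / (e + eps))) ** 2 for r, e in zip(ref, est)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: a generated HRIR that is the measured one at half amplitude.
measured = [1.0, 0.5, -0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
generated = [0.5 * x for x in measured]
print(spectral_distortion_db(measured, generated))
```

Because the metric compares magnitudes only, it ignores phase and interaural time differences, which is one reason a small degradation in spectral distortion need not translate into a perceptible difference in listening tests.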