cs.SD - 2023-11-22

Spatial Audio and Individualized HRTFs using a Convolutional Neural Network (CNN)

  • paper_url: http://arxiv.org/abs/2311.13397
  • repo_url: None
  • paper_authors: Ludovic Pirard
  • for: This work aims to provide an HRTF individualization method based on automatically extracted anthropometric features, in order to deliver a personalized spatial listening experience.
  • methods: A Convolutional Neural Network (CNN) automatically positions landmarks on ear images; 12 of these landmarks are mapped to 7 anthropometric measurements defined by the HUTUBS database, and the closest match is computed against the database's anthropometric measurements.
  • results: The study presents an HRTF individualization method based on automatically extracted anthropometric features that can provide a highly personalized listening experience.
    Abstract Spatial audio and 3-dimensional sound rendering techniques play a pivotal and essential role in immersive audio experiences. Head-Related Transfer Functions (HRTFs) are acoustic filters which represent how sound interacts with an individual's unique head and ear anatomy. The use of HRTFs that conform to the subject's anatomical traits is crucial to ensure a personalized and unique spatial experience. This work proposes the implementation of an HRTF individualization method based on anthropometric features automatically extracted from ear images using a Convolutional Neural Network (CNN). Firstly, a CNN is implemented and tested to assess the performance of machine learning in positioning landmarks on ear images. The I-BUG dataset, containing ear images each annotated with 55 landmarks, was used to train and test the neural network. Subsequently, 12 relevant landmarks were selected to correspond to 7 specific anthropometric measurements established by the HUTUBS database. These landmarks serve as references for computing distances in pixels in order to retrieve the anthropometric measurements from the ear images. Once the 7 pixel distances are extracted from the ear image, they are converted to centimetres using conversion factors, and a best-match method is implemented that computes the Euclidean distance to each set in a database of 116 ears with their corresponding 7 anthropometric measurements provided by the HUTUBS database. The closest anthropometric match can be identified and the corresponding set of HRTFs obtained for personalized use. The method is evaluated for its validity rather than the accuracy of its results. The conceptual scope of each stage has been verified and substantiated to function correctly. The various steps and the available elements in the process are reviewed and challenged to define a larger algorithmic pipeline designed for the desired task.
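The final matching stage (pixel-to-centimetre conversion followed by a Euclidean best-match search over the 116-ear database) can be sketched as below. The conversion factors and database values are placeholders; only the mechanism (scale, then nearest neighbour by Euclidean distance over the 7 measurements) follows the abstract.

```python
import numpy as np

def to_centimetres(distances_px, factors_cm_per_px):
    """Scale pixel distances to centimetres with per-measurement conversion factors."""
    return np.asarray(distances_px, dtype=float) * np.asarray(factors_cm_per_px, dtype=float)

def best_match(query_cm, database_cm):
    """Index of the database row (one ear, 7 measurements in cm) nearest to the
    query in Euclidean distance, plus that distance."""
    db = np.asarray(database_cm, dtype=float)
    d = np.linalg.norm(db - np.asarray(query_cm, dtype=float), axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])
```

In the paper's pipeline, the returned index would select the matching subject in HUTUBS, whose measured HRTF set is then used for personalized rendering.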