cs.SD - 2023-10-02

Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.01353
  • repo_url: None
  • paper_authors: Yun-Ning Hung, Ju-Chiang Wang, Minz Won, Duc Le
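  • for: Improve the success of Music Information Retrieval (MIR) tasks by addressing the scarcity of labeled data.
  • methods: Semi-supervised teacher-student training that iteratively creates and refines pseudo-labels to improve performance on MIR tasks.
  • results: By scaling up both model size and training data, the models achieve state-of-the-art results on several MIR tasks, outperforming both supervised models and models based on a self-supervised pretrained model.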
    Abstract In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major concerns to the success of an MIR task. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR datasets. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. Knowledge expansion is also explored to iteratively scale up the model sizes from as small as less than 3M to almost 100M parameters. We study the performance correlation between data size and model size in the experiments. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a self-supervised pretrained model. To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.
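The iterative pseudo-labeling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D nearest-centroid "model", the function names, the margin-based confidence filter, and all parameter values are assumptions made for illustration, and the model size stays fixed here, whereas the paper also grows it between iterations.

```python
import random

def fit_centroids(xs, ys):
    """Train a toy 1-D nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, margin), where margin is the distance gap between
    the nearest and second-nearest class centroids."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def teacher_student(labeled_x, labeled_y, unlabeled_x,
                    rounds=3, min_margin=1.0, noise_std=0.1, seed=0):
    """Hypothetical noisy teacher-student loop (all defaults are assumptions)."""
    rng = random.Random(seed)
    # The first teacher sees only the labeled data.
    teacher = fit_centroids(labeled_x, labeled_y)
    for _ in range(rounds):
        # The teacher pseudo-labels clean unlabeled inputs; keep confident ones.
        pseudo = [(x, *predict(teacher, x)) for x in unlabeled_x]
        kept = [(x, y) for x, y, margin in pseudo if margin >= min_margin]
        # The student trains on labeled data plus noised pseudo-labeled data.
        xs = list(labeled_x) + [x + rng.gauss(0.0, noise_std) for x, _ in kept]
        ys = list(labeled_y) + [y for _, y in kept]
        student = fit_centroids(xs, ys)
        # The student becomes the teacher for the next round.
        teacher = student
    return teacher

model = teacher_student([0.0, 10.0], [0, 1], [1.0, 2.0, 8.0, 9.0, 5.0])
```

In this toy run the ambiguous point at 5.0 sits midway between the classes, so the confidence filter drops it, while the remaining unlabeled points pull each class centroid toward the unlabeled data.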

F0 analysis of Ghanaian pop singing reveals progressive alignment with equal temperament over the past three decades: a case study

  • paper_url: http://arxiv.org/abs/2310.00870
  • repo_url: None
  • paper_authors: Iran R. Roman, Daniel Faronbi, Isabelle Burger-Weiser, Leila Adu-Gilmore
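  • for: Study how contemporary Ghanaian popular singing combines European and traditional Ghanaian influences, and how this combination has affected the Ghanaian style.
  • methods: The authors extract F0 values from isolated vocals in songs by the Ghanaian singer Daddy Lumba released between 1989 and 2016, and analyze them with Gaussian mixture modeling (GMM).
  • results: Daddy Lumba's singing aligns progressively better with equal temperament, especially in recent years, and its microtonal content has decreased over time. The results suggest that Ghanaian singing may be affected by exposure to equal temperament, and that further research is needed to map and archive Ghanaian singing styles.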
    Abstract Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the present. Studying a singular musician as a case study allows us to refine our analysis without over-interpreting the findings. We curated a collection of his songs, distributed between 1989 and 2016, to extract F0 values from isolated vocals. We used Gaussian mixture modeling (GMM) to approximate each song's scale and found that the pitch variance has been decreasing over time. We also determined whether the GMM components follow the arithmetic relationships observed in equal-tempered scales, and observed that Daddy Lumba's singing better aligns with equal temperament in recent years. Together, results reveal the impact of exposure to equal-tempered scales, resulting in lessened microtonal content in Daddy Lumba's singing. Our study highlights a potential vulnerability of Ghanaian musical scales and implies a need for research that maps and archives singing styles.
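One way to make "alignment with equal temperament" concrete is to measure how far the intervals between estimated scale degrees fall from multiples of 100 cents, the size of an equal-tempered semitone. The sketch below is an assumption-laden illustration, not the authors' analysis: the function names, the 440 Hz default reference, and the example component means are hypothetical, and the GMM fit that would produce the component means is not shown.

```python
import math

def hz_to_cents(f0_hz, ref_hz=440.0):
    """Interval from ref_hz to f0_hz in cents (1200 cents per octave).
    The 440 Hz default is an assumption; the source states no reference."""
    return 1200.0 * math.log2(f0_hz / ref_hz)

def deviation_from_semitone_grid(interval_cents):
    """Signed distance in cents to the nearest multiple of 100, in [-50, 50)."""
    return (interval_cents + 50.0) % 100.0 - 50.0

def scale_step_deviations(component_means_cents):
    """For sorted scale-degree estimates (e.g. GMM component means),
    return how far each consecutive step is from a whole number of semitones."""
    means = sorted(component_means_cents)
    steps = [b - a for a, b in zip(means, means[1:])]
    return [deviation_from_semitone_grid(s) for s in steps]

# Hypothetical component means in cents, invented for illustration only.
tempered = [0.0, 200.0, 400.0, 500.0, 700.0]
microtonal = [0.0, 170.0, 340.0, 500.0, 700.0]

tempered_devs = scale_step_deviations(tempered)
microtonal_devs = scale_step_deviations(microtonal)
```

Working with intervals between components rather than absolute pitch makes the measure independent of the reference frequency. Deviations near zero indicate equal-tempered steps, while larger magnitudes indicate microtonal content.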