eess.AS - 2023-10-24

Pre-training Music Classification Models via Music Source Separation

paper_url: http://arxiv.org/abs/2310.15845
repo_url: https://github.com/cgaroufis/msspt
paper_authors: Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
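for: This paper studies whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks.
methods: The authors first pre-train U-Net networks under various music source separation objectives, such as isolating vocal or instrumental sources from a musical piece; they then attach a convolutional tail network to the pre-trained U-Net and jointly fine-tune the whole network. Skip connections also propagate the features learned by the separation network to the tail network.
results: Experiments on two publicly available datasets indicate that pre-training the U-Net with a music source separation objective can improve music classification performance, specifically in music auto-tagging when vocal separation is used, and in music genre classification for the case of multi-source separation.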

Abstract
In this paper, we study whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks. To this end, we first pre-train U-Net networks under various music source separation objectives, such as the isolation of vocal or instrumental sources from a musical piece; afterwards, we attach a convolutional tail network to the pre-trained U-Net and jointly finetune the whole network. The features learned by the separation network are also propagated to the tail network through skip connections. Experimental results in two widely used and publicly available datasets indicate that pre-training the U-Nets with a music source separation objective can improve performance compared to both training the whole network from scratch and using the tail network as a standalone in two music classification tasks: music auto-tagging, when vocal separation is used, and music genre classification for the case of multi-source separation.
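
The two-stage recipe described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the repository linked above for that): the class names `TinyUNet` and `TailNet`, the layer sizes, the L1 and binary cross-entropy losses, the optimizer settings, and the random toy tensors are all assumptions chosen to keep the example small. In particular, passing the decoder features to the tail by channel concatenation is only one plausible way to realize the skip connections.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyUNet(nn.Module):
    """Hypothetical one-level U-Net that masks a magnitude spectrogram."""
    def __init__(self, ch=8):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(2 * ch, ch, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU())
        self.mask = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)                                   # (B, ch, F, T)
        e2 = self.enc2(e1)                                  # (B, 2ch, F/2, T/2)
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))   # U-Net skip connection
        est = torch.sigmoid(self.mask(d)) * x               # masked source estimate
        return est, d                                       # also expose features

class TailNet(nn.Module):
    """Hypothetical convolutional tail that classifies from the U-Net output and features."""
    def __init__(self, in_ch, n_tags):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(16, n_tags)

    def forward(self, est, feats):
        h = self.conv(torch.cat([est, feats], dim=1)).flatten(1)
        return self.fc(h)

# Random toy data, used only to exercise the shapes (assumption).
mix = torch.rand(4, 1, 16, 16)              # mixture spectrograms
vocals = mix * torch.rand(4, 1, 16, 16)     # stand-in for the target source
tags = torch.randint(0, 2, (4, 5)).float()  # multi-label tag targets

unet = TinyUNet(ch=8)
tail = TailNet(in_ch=1 + 8, n_tags=5)

# Stage 1: pre-train the U-Net alone on a separation objective.
opt = torch.optim.Adam(unet.parameters(), lr=1e-3)
for _ in range(5):
    est, _ = unet(mix)
    loss_sep = nn.functional.l1_loss(est, vocals)
    opt.zero_grad()
    loss_sep.backward()
    opt.step()

# Stage 2: attach the tail and fine-tune both networks jointly.
opt = torch.optim.Adam(list(unet.parameters()) + list(tail.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(5):
    est, feats = unet(mix)
    logits = tail(est, feats)
    loss_cls = bce(logits, tags)
    opt.zero_grad()
    loss_cls.backward()
    opt.step()
```

The binary cross-entropy loss here matches a multi-label setting such as auto-tagging; for single-label genre classification one would presumably swap in `nn.CrossEntropyLoss` with integer class targets. The from-scratch baseline mentioned in the abstract corresponds to skipping stage 1 in this sketch.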
