Abstract
The emergence of soft-labeled data for sound event detection (SED) helps overcome the scarcity of traditional strong-labeled data. However, the performance of current SED systems trained on such soft labels remains unsatisfactory. In this work, we introduce a dual-branch SED model designed to leverage the information within soft labels. We present four variations of the interacted convolutional module to investigate effective mechanisms for information interaction. Furthermore, we apply a scene-based mask, generated by an estimator, directly to the predictions of the SED model. Experimental results show that the mask estimator achieves performance comparable to or better than that of a manually designed mask and significantly improves SED performance. The proposed approach achieved the top ranking in the DCASE 2023 Task4B Challenge.