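results: Evaluates these baseline approaches under varying unknown class distributions and examines how they cope with overlapping sound events in a multi-label setting.
Abstract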
Current audio classification models have small class vocabularies relative to the large number of sound event classes of interest in the real world. Thus, they provide a limited view of the world that may miss important yet unexpected or unknown sound events. To address this issue, open-set audio classification techniques have been developed to detect sound events from unknown classes. Although these methods have been applied to a multi-class context in audio, such as sound scene classification, they have yet to be investigated for polyphonic audio in which sound events overlap, requiring the use of multi-label models. In this study, we establish the problem of multi-label open-set audio classification by creating a dataset with varying unknown class distributions and evaluating baseline approaches built upon existing techniques.
Intelligibility prediction with a pretrained noise-robust automatic speech recognition model
results: Both systems predict intelligibility accurately on the CPC2 evaluation, even though the underlying ASR model never saw CPC2 data.
Abstract
This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions with derived ASR uncertainty. The ASR model is only pretrained with a simulated noisy speech corpus and does not take advantage of the CPC2 data. The systems nonetheless predict accurately on the CPC2 evaluation, which indicates that they are robust to unseen scenarios.
Neural domain alignment for spoken language recognition based on optimal transport
for: Improving cross-domain spoken language recognition (SLR) by addressing the domain shift challenge.
methods: Unsupervised domain adaptation (UDA) that does not rely on class labels in the target domain.
results: The proposed optimal transport (OT)-based UDA algorithm significantly improved performance in a cross-channel SLR task compared to existing UDA algorithms.
Abstract
Domain shift poses a significant challenge in cross-domain spoken language recognition (SLR) by reducing its effectiveness. Unsupervised domain adaptation (UDA) algorithms have been explored to address domain shifts in SLR without relying on class labels in the target domain. One successful UDA approach focuses on learning domain-invariant representations to align feature distributions between domains. However, disregarding the class structure during the learning process of domain-invariant representations can result in over-alignment, negatively impacting the classification task. To overcome this limitation, we propose an optimal transport (OT)-based UDA algorithm for a cross-domain SLR, leveraging the distribution geometry structure-aware property of OT. An OT-based discrepancy measure on a joint distribution over feature and label information is considered during domain alignment in OT-based UDA. Our previous study discovered that completely aligning the distributions between the source and target domains can introduce a negative transfer, where classes or irrelevant classes from the source domain map to a different class in the target domain during distribution alignment. This negative transfer degrades the performance of the adaptive model. To mitigate this issue, we introduce coupling-weighted partial optimal transport (POT) within our UDA framework for SLR, where soft weighting on the OT coupling based on transport cost is adaptively set during domain alignment. A cross-domain SLR task was used in the experiments to evaluate the proposed UDA. The results demonstrated that our proposed UDA algorithm significantly improved the performance over existing UDA algorithms in a cross-channel SLR task.
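The idea of softly weighting an OT coupling by transport cost can be illustrated with a small sketch. This is not the authors' algorithm: it uses plain entropy-regularized OT (Sinkhorn iterations) on toy 2-D features, and the weighting function exp(-cost / temperature), the function names, and all parameter values are assumptions chosen only for illustration. It shows how a source point that lies far from every target point keeps its share of mass under full alignment but contributes almost nothing once the coupling is down-weighted by cost.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_coupling(cost, reg=1.0, n_iters=200):
    """Entropy-regularized OT coupling between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform source marginal
    b = [1.0 / m] * m  # uniform target marginal
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def cost_weighted_coupling(coupling, cost, temperature=1.0):
    """Hypothetical soft weighting: shrink coupling entries with high cost."""
    return [
        [g * math.exp(-c / temperature) for g, c in zip(g_row, c_row)]
        for g_row, c_row in zip(coupling, cost)
    ]


# Toy features: the third source point is far from every target point.
source = [[0.0, 0.0], [1.0, 0.0], [3.0, 3.0]]
target = [[0.1, 0.0], [0.9, 0.1]]

cost = [[squared_distance(s, t) for t in target] for s in source]
coupling = sinkhorn_coupling(cost)
weighted = cost_weighted_coupling(coupling, cost)

for name, plan in (("full", coupling), ("weighted", weighted)):
    print(name, [round(sum(row), 6) for row in plan])
```

In the full coupling every source point, including the distant one, must ship its entire mass to some target point, which is the kind of forced matching the abstract associates with negative transfer. After weighting, the total transported mass is below one, so only part of the source distribution is aligned.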