results: Achieves a 95% reduction in mean absolute error with a minimal increase in model size compared to the baseline model, PhonMatchNet.
for: Addressing barge-in conflict scenarios in human-machine interaction.
methods: Uses implicit acoustic echo cancellation (iAEC) techniques to improve the efficiency of user-defined keyword spotting models.
Abstract
In response to the growing interest in human-machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses barge-in scenarios, in which user speech overlaps with device playback audio and thereby creates a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to increase the efficiency of user-defined keyword spotting models, achieving a 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its ability to learn iAEC functionality without requiring a clean signal. Our findings indicate that the proposed model achieves competitive performance under real-world deployment conditions on smart devices.