eess.AS - 2023-09-12

iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation

paper_url: http://arxiv.org/abs/2309.06096
repo_url: None
paper_authors: Yong-Hyeok Lee, Namhyun Cho
for: Addresses barge-in scenarios in human-machine communication, where user speech overlaps with the device's own playback audio.
methods: Leverages implicit acoustic echo cancellation (iAEC) techniques to improve user-defined keyword spotting models.
results: Achieves a 95% reduction in mean absolute error with only a 0.13% increase in model size compared to the baseline model, PhonMatchNet.

Abstract
In response to the increasing interest in human-machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses the challenge of barge-in scenarios, wherein user speech overlaps with device playback audio, thereby creating a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to increase the efficiency of user-defined keyword spotting models, achieving a remarkable 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its capability to learn iAEC functionality without requiring a clean signal. The findings of our study indicate that the proposed model achieves competitive performance in real-world deployment conditions of smart devices.
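
To make the barge-in setting concrete, the following is a minimal sketch of how such an input could be simulated. It is not the authors' pipeline: the echo path (a single delayed, attenuated copy of the playback), the function names `simulate_barge_in` and `relative_reduction`, and all numeric values are assumptions chosen for illustration. The point it shows is that an implicit AEC model would plausibly receive the microphone mixture together with the playback reference, and never a clean speech target.

```python
import random

def simulate_barge_in(speech, playback, delay=3, gain=0.6):
    """Toy microphone signal: user speech plus a delayed, attenuated playback echo.

    Assumption: a real echo path is a room impulse response; a single tap
    (delay, gain) is used here only to keep the sketch short.
    Requires len(playback) >= len(speech).
    """
    mic = []
    for t in range(len(speech)):
        echo = gain * playback[t - delay] if t >= delay else 0.0
        mic.append(speech[t] + echo)
    return mic

def relative_reduction(baseline, proposed):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline

random.seed(0)
speech = [random.uniform(-1.0, 1.0) for _ in range(16)]    # hypothetical user keyword audio
playback = [random.uniform(-1.0, 1.0) for _ in range(16)]  # hypothetical device playback

mic = simulate_barge_in(speech, playback)

# Hypothetical model inputs: the overlapped microphone signal and the
# playback reference. No clean speech signal is part of the input.
model_inputs = (mic, playback)

# Placeholder error values (NOT taken from the paper), shown only to
# illustrate how a "95% reduction" figure is computed.
print(relative_reduction(100.0, 5.0))
```

The first `delay` samples of `mic` equal the speech alone, since the echo has not yet arrived; after that, each sample contains both sources, which is the overlap the keyword spotter has to cope with.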
