eess.AS - 2023-09-12

iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation

  • paper_url: http://arxiv.org/abs/2309.06096
  • repo_url: None
  • paper_authors: Yong-Hyeok Lee, Namhyun Cho
  • for: Addresses barge-in scenarios in human-machine communication, where user speech overlaps with device playback audio.
  • methods: Leverages implicit acoustic echo cancellation (iAEC) techniques to improve user-defined keyword spotting models.
  • results: Achieves a 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet.
    Abstract: In response to the increasing interest in human-machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses the challenge of barge-in scenarios, wherein user speech overlaps with device playback audio, thereby creating a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to increase the efficiency of user-defined keyword spotting models, achieving a remarkable 95% reduction in mean absolute error with a minimal increase in model size (0.13%) compared to the baseline model, PhonMatchNet. We also present an efficient model structure and demonstrate its capability to learn iAEC functionality without requiring a clean signal. The findings of our study indicate that the proposed model achieves competitive performance in real-world deployment conditions of smart devices.
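The barge-in condition described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: it uses a standard linear echo model (microphone = user speech + playback filtered by an echo path) and assumes the playback reference is handed to the model alongside the microphone signal. The function names, the echo path taps, and the two-channel layout are hypothetical and are not taken from the paper.

```python
import random


def convolve_truncated(signal, kernel, length):
    """Causal FIR filtering: out[n] = sum_k kernel[k] * signal[n - k] for n < length."""
    out = []
    for n in range(length):
        acc = 0.0
        for k, tap in enumerate(kernel):
            if 0 <= n - k < len(signal):
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def simulate_barge_in(speech, playback, echo_path, echo_gain=1.0):
    """Return the microphone signal: user speech plus an echo of the device playback."""
    echo = convolve_truncated(playback, echo_path, len(speech))
    return [s + echo_gain * e for s, e in zip(speech, echo)]


def stack_mic_and_reference(mic, reference):
    """Assumed input layout: [microphone channel, playback reference channel]."""
    if len(mic) != len(reference):
        raise ValueError("mic and reference must have the same length")
    return [list(mic), list(reference)]


# Synthetic stand-ins for real audio (hypothetical values).
random.seed(0)
num_samples = 1600
speech = [random.gauss(0.0, 0.1) for _ in range(num_samples)]
playback = [random.gauss(0.0, 0.3) for _ in range(num_samples)]
echo_path = [0.0, 0.6, 0.3, 0.1]  # hypothetical short echo path

mic = simulate_barge_in(speech, playback, echo_path)
model_input = stack_mic_and_reference(mic, playback)
```

In an explicit AEC pipeline, a separate stage would estimate and subtract the echo before keyword spotting. In the implicit setting the abstract describes, the keyword spotting model would instead be trained directly on inputs like `model_input`, with no clean `speech` signal used as a target.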