Abstract
This paper details the experimental results of adapting OpenAI's Whisper model for code-switched Mandarin-English speech recognition (ASR) on the SEAME and ASRU2019 corpora. We conducted two experiments: a) using adaptation data ranging from 1 to 100/200 hours to demonstrate the effectiveness of adaptation, and b) examining different language ID setups in the Whisper prompt. The Mixed Error Rate (MER) results show that as little as $1\sim10$ hours of adaptation data may suffice to saturate the performance gain (SEAME), while the ASRU task continued to show performance improvement with more adaptation data ($>$100 hours). For the language prompt, the results show that although various prompting strategies initially produce different outcomes, adapting the Whisper model with code-switching data uniformly improves its performance. These results may also be relevant to the community when applying Whisper to related tasks or adapting it to new target domains.