cs.SD - 2023-08-20

Indonesian Automatic Speech Recognition with XLSR-53

  • paper_url: http://arxiv.org/abs/2308.11589
  • repo_url: None
  • paper_authors: Panji Arisaputra, Amalia Zahra
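  • For: This study aims to develop an Indonesian automatic speech recognition (ASR) system using the XLSR-53 pre-trained model, in order to reduce the amount of training data that non-English languages need to reach a competitive word error rate (WER).
  • Methods: The study builds on the XLSR-53 pre-trained model and trains on the TITML-IDN, Magic Data, and Common Voice datasets.
  • Results: The model reaches a WER of 20% on the Common Voice test split, which is competitive with similar models. Adding a language model lowers the WER by about 8 percentage points, from 20% to 12%, while using a smaller amount of training data.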
    Abstract This study focuses on the development of Indonesian Automatic Speech Recognition (ASR) using the XLSR-53 pre-trained model, where XLSR stands for cross-lingual speech representations. The XLSR-53 pre-trained model is used to significantly reduce the amount of training data that non-English languages require to achieve a competitive Word Error Rate (WER). The total amount of data used in this study is 24 hours, 18 minutes, and 1 second: (1) TITML-IDN, 14 hours and 31 minutes; (2) Magic Data, 3 hours and 33 minutes; and (3) Common Voice, 6 hours, 14 minutes, and 1 second. With a WER of 20%, the model built in this study can compete with similar models on the Common Voice test split. Adding a language model decreases the WER by around 8 percentage points, from 20% to 12%. Thus, the results of this study improve on previous research and contribute to a better Indonesian ASR built with a smaller amount of data.
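The figures in the abstract can be checked with a few lines of arithmetic. The sketch below is not the authors' code: it sums the three reported corpus durations, implements the standard word-level edit-distance definition of WER, and separates the absolute from the relative WER reduction. The function name `word_error_rate` and the Indonesian example sentence are illustrative assumptions, not taken from the paper.

```python
from datetime import timedelta

# Corpus durations as reported in the abstract.
durations = {
    "TITML-IDN": timedelta(hours=14, minutes=31),
    "Magic Data": timedelta(hours=3, minutes=33),
    "Common Voice": timedelta(hours=6, minutes=14, seconds=1),
}
total = sum(durations.values(), timedelta())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration; the paper does not specify
    which tool was used to compute WER.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# WER values reported in the abstract, without and with a language model.
wer_without_lm = 0.20
wer_with_lm = 0.12
absolute_reduction = wer_without_lm - wer_with_lm          # percentage points
relative_reduction = absolute_reduction / wer_without_lm   # share of the baseline

print("total training data:", total)
print("example WER:", word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
print(f"absolute reduction: {absolute_reduction:.2f}, relative: {relative_reduction:.0%}")
```

The "around 8%" in the abstract is an absolute drop of 8 percentage points; relative to the 20% baseline it corresponds to a 40% reduction in errors.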
    Summary