eess.AS - 2023-07-01

Enhancing the EEG Speech Match Mismatch Tasks With Word Boundaries

paper_url: http://arxiv.org/abs/2307.00366
repo_url: https://github.com/iiscleap/eegspeech-matchmismatch
paper_authors: Akshara Soman, Vidhi Sinha, Sriram Ganapathy
for: This paper aims to analyze the neural mechanisms underlying human speech comprehension, using match-mismatch classification of speech stimuli and the corresponding neural (EEG) responses.
methods: The paper uses a network of convolution layers to process both speech and EEG signals, followed by a word boundary-based average pooling and a recurrent layer to incorporate inter-word context.
results: The experiments show that match-mismatch classification accuracy improves significantly, reaching 93% on a publicly available speech-EEG data set, compared with 65-75% in previous efforts on this task.

Abstract
Recent studies have shown that the underlying neural mechanisms of human speech comprehension can be analyzed using a match-mismatch classification of the speech stimulus and the neural response. However, such studies have been conducted for fixed-duration segments without accounting for the discrete processing of speech in the brain. In this work, we establish that word boundary information plays a significant role in sentence processing by relating EEG to its speech input. We process the speech and the EEG signals using a network of convolution layers. Then, a word boundary-based average pooling is performed on the representations, and the inter-word context is incorporated using a recurrent layer. The experiments show that the match-mismatch classification accuracy can be significantly improved to 93% on a publicly available speech-EEG data set, while previous efforts achieved an accuracy of 65-75% for this task.
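
The word boundary-based average pooling step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `word_boundary_avg_pool`, the end-exclusive `(start, end)` frame indices, and the toy feature values are all assumptions made for illustration. The sketch only shows how frame-level representations (for example, the outputs of the convolution layers) could be averaged within each word so that every word yields one vector, which a recurrent layer could then consume. The convolution and recurrent layers themselves are omitted.

```python
from typing import List, Sequence, Tuple


def word_boundary_avg_pool(
    frames: Sequence[Sequence[float]],
    boundaries: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Average frame-level feature vectors within each word span.

    frames:     T feature vectors, one per time frame (hypothetical input,
                standing in for convolution-layer outputs).
    boundaries: one (start, end) frame-index pair per word, end exclusive
                (an assumed convention, not taken from the paper).
    Returns one averaged feature vector per word.
    """
    pooled = []
    for start, end in boundaries:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"invalid word span ({start}, {end})")
        span = frames[start:end]
        dim = len(span[0])
        # Mean over the frames of this word, computed per feature dimension.
        pooled.append([sum(f[d] for f in span) / len(span) for d in range(dim)])
    return pooled


# Toy data: 6 frames with 2 features each, split into two words of
# unequal duration (2 frames and 4 frames).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
words = [(0, 2), (2, 6)]
print(word_boundary_avg_pool(frames, words))  # [[1.0, 2.0], [7.0, 8.0]]
```

Unlike fixed-duration pooling, the number of output vectors here equals the number of words, so words of different lengths each contribute exactly one representation.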

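Summary
Recent studies show that the neural mechanisms underlying human speech comprehension can be analyzed through match-mismatch classification of speech stimuli and neural responses. However, these studies typically operate on fixed-duration segments and do not account for the brain's discrete processing of speech. In this work, we show that word boundary information plays an important role in sentence processing, and we process the speech and EEG signals with a neural network. We then apply word boundary-based average pooling to the representations and incorporate inter-word context through a recurrent layer. Experimental results show that the model's accuracy can be raised to 93% on a publicly available speech-EEG data set, whereas previous efforts reached only 65-75%.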