eess.AS - 2023-10-20

GenDistiller: Distilling Pre-trained Language Models based on Generative Models

  • paper_url: http://arxiv.org/abs/2310.13418
  • repo_url: None
  • paper_authors: Yingying Gao, Shilei Zhang, Zihao Cui, Yanhan Xu, Chao Deng, Junlan Feng
  • for: Improving downstream task performance on resource-limited devices
  • methods: A knowledge distillation framework based on a generative language model, which generates the hidden layers of the teacher network
  • results: Improves downstream task performance over the baseline system and can be deployed on resource-limited devices
    Abstract Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significant improvements for numerous downstream tasks. Despite the success of these methods, their large memory footprint and heavy computational requirements hinder their application on resource-restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework that distills hidden representations from the teacher network based on a generative language model. The generative structure enables the proposed model to generate the target teacher hidden layers autoregressively, considering the interactions between hidden layers without introducing additional inputs. A two-dimensional attention mechanism is implemented to ensure causality across hidden layers while preserving bidirectional attention in the time dimension. Experiments reveal the advantage of the generative distiller over the baseline system, which predicts the hidden layers of the teacher network directly without a generative model.
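    The abstract's two-dimensional attention (causal across the teacher's layer dimension, bidirectional across the time dimension) can be illustrated with a mask over flattened (layer, time) positions. The sketch below is a minimal, hypothetical illustration of that masking idea in PyTorch; the function name, tensor layout, and use with torch.nn.MultiheadAttention are assumptions for illustration, not details taken from the paper.

    ```python
    import torch

    def build_layer_causal_mask(num_layers: int, num_frames: int) -> torch.Tensor:
        """Build a 2-D attention mask over flattened (layer, time) positions.

        A query at teacher layer l may only attend to keys from layers <= l
        (causal across layers), but may attend to any time frame within those
        layers (bidirectional in time). Returns a boolean (L*T, L*T) matrix
        where True marks an allowed query-key pair.
        """
        # Layer index of each flattened (layer, time) position, e.g.
        # L=3, T=4 -> [0,0,0,0, 1,1,1,1, 2,2,2,2].
        layer_idx = torch.arange(num_layers).repeat_interleave(num_frames)
        # Allowed iff key layer <= query layer, regardless of time position.
        return layer_idx.unsqueeze(1) >= layer_idx.unsqueeze(0)

    if __name__ == "__main__":
        mask = build_layer_causal_mask(num_layers=3, num_frames=4)
        # Additive form expected by torch.nn.MultiheadAttention:
        # 0.0 where attention is allowed, -inf where it is blocked.
        attn_mask = torch.zeros_like(mask, dtype=torch.float).masked_fill(~mask, float("-inf"))
        print(mask.shape)      # torch.Size([12, 12])
        print(mask[:4, 4:8])   # layer-0 queries cannot see layer-1 keys -> all False
    ```

    In this layout, autoregressive generation of the next teacher layer only ever conditions on previously generated layers, while each layer's frames can still exchange information freely across time.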