results: The proposed technique was preferred over a traditional equalisation technique and stereo playback.
Abstract
The topic of room equalisation has been at the forefront of research and product development for many years, with the aim of improving the playback quality of loudspeakers in reverberant rooms. Traditional room equalisation systems comprise a number of filters that, when applied to the primary loudspeakers, compensate for the colouration added by the room. This publication introduces a novel equalisation technique in which gammatone filter band energy is added to the reverberant sound field via two surround loudspeakers. The direct sound from the primary loudspeakers is left unaltered, while the sum of direct and reverberant energy is equalised at the listening position. Unlike traditional systems, this method allows the target function of the direct sound to differ from that of the reverberant sound field. The proposed method is motivated by the different roles that direct and reverberant sound components play in human perception of sound. Along with the proposed method, results from a subjective listening test are presented, demonstrating a preference for the proposed technique over a traditional room equalisation technique and stereo playback.
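The abstract does not give the algorithm, so the following is only a minimal sketch of the per-band energy bookkeeping it describes: energy is added to the reverberant field until the sum of direct and reverberant energy reaches a target in each band. The function name `added_band_energy`, the band values, and the assumption that band energies add linearly are all hypothetical, and no gammatone filterbank is implemented here.

```python
def added_band_energy(direct, reverberant, target):
    """Energy to add per band so direct + reverberant + added reaches target.

    All values are linear energies (not dB), one entry per filter band.
    Energy can only be added, so bands already at or above target get 0.
    This is an illustrative assumption, not the paper's stated method.
    """
    if not (len(direct) == len(reverberant) == len(target)):
        raise ValueError("all band lists must have the same length")
    return [max(0.0, t - (d + r)) for d, r, t in zip(direct, reverberant, target)]


# Hypothetical energies for three bands at the listening position.
direct = [1.0, 1.0, 1.0]
reverberant = [0.5, 0.25, 1.0]
target = [1.75, 1.75, 1.75]

added = added_band_energy(direct, reverberant, target)
total = [d + r + a for d, r, a in zip(direct, reverberant, added)]
```

In this toy case the first two bands are raised to the target, while the third already exceeds it and is left alone, because a scheme that only adds reverberant energy cannot reduce a band.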
An analysis of large speech models-based representations for speech emotion recognition
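results: The study finds that, even without finetuning, the representations of some large speech models contain information relevant to emotion recognition, giving performance nearly matching state-of-the-art results on standard datasets.
Abstract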
Features derived from large speech models have recently shown increased performance over signal-based features across multiple downstream tasks, even when the networks are not finetuned towards the target task. In this paper we present an analysis of several signal-derived and neural-model-derived features for speech emotion recognition. We use pretrained models and explore their inherent potential abstractions of emotions. Simple classification methods are used so as not to interfere with, or add knowledge to, the task. We show that, even without finetuning, the representations of some of these large neural speech models can encode information that enables performance close to, and even beyond, state-of-the-art results across six standard speech emotion recognition datasets.
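The abstract does not name the classifiers used, so the sketch below substitutes a nearest-centroid classifier as one example of a "simple classification method" applied to frozen features. The toy two-dimensional vectors stand in for frame-level embeddings from a pretrained model, and all function names and data are hypothetical.

```python
from collections import defaultdict


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fit_centroids(features, labels):
    """Return one mean feature vector per emotion label."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {y: mean_pool(xs) for y, xs in grouped.items()}


def predict(centroids, x):
    """Pick the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


# Toy stand-ins for frame-level embeddings from a frozen pretrained model.
train_utterances = [
    [[1.0, 0.0], [0.8, 0.2]],  # labelled "happy"
    [[0.9, 0.1], [1.0, 0.1]],  # labelled "happy"
    [[0.0, 1.0], [0.2, 0.8]],  # labelled "sad"
    [[0.1, 0.9], [0.0, 1.0]],  # labelled "sad"
]
train_labels = ["happy", "happy", "sad", "sad"]

# Pool frames to one vector per utterance; the embeddings are never updated.
features = [mean_pool(u) for u in train_utterances]
centroids = fit_centroids(features, train_labels)
prediction = predict(centroids, mean_pool([[0.7, 0.3], [0.9, 0.2]]))
```

Because the classifier has no trainable parameters beyond the class means, any accuracy it reaches reflects what the frozen representation already encodes, which is the point of keeping the classification stage simple.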