results: Proposes a virtual audience framework that recreates live-event ambience and audience feedback in virtual conferences.
Abstract
The COVID-19 pandemic shifted many events in our daily lives into the virtual domain. While virtual conference systems provide an alternative to physical meetings, larger events require a muted audience to avoid an accumulation of background noise and distorted audio. However, performing artists strongly rely on the feedback of their audience. We propose a concept for a virtual audience framework which supports all participants with the ambience of a real audience. Audience feedback is collected locally, allowing users to express enthusiasm or discontent by selecting means such as clapping, whistling, booing, and laughter. This feedback is sent as abstract information to a virtual audience server. We broadcast the combined virtual audience feedback information to all participants, which can be synthesized as a single acoustic feedback by the client. The synthesis can be done by turning the collective audience feedback into a prompt that is fed to state-of-the-art models such as AudioGen. This way, each user hears a single acoustic feedback sound of the entire virtual event, without needing to unmute or risking hearing distorted, unsynchronized feedback.
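The server-side step described above (combine abstract feedback events from all clients, then turn them into a text prompt for a generative model such as AudioGen) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the event names, intensity thresholds, and prompt wording are all assumptions.

```python
from collections import Counter

# Assumed set of abstract feedback events a client can send (hypothetical names).
FEEDBACK_TYPES = {"clap": "clapping", "whistle": "whistling",
                  "boo": "booing", "laugh": "laughter"}

def aggregate_feedback(events):
    """Combine abstract feedback events received from all clients into counts."""
    return dict(Counter(e for e in events if e in FEEDBACK_TYPES))

def feedback_to_prompt(counts, audience_size):
    """Turn the aggregated feedback into a text prompt for an audio model.

    The intensity wording is scaled by the share of the audience reacting;
    the thresholds (0.2, 0.6) are arbitrary choices for this sketch.
    """
    if not counts:
        return "a quiet audience with soft murmuring"
    parts = []
    for kind, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        share = n / audience_size
        if share < 0.2:
            intensity = "a few people"
        elif share < 0.6:
            intensity = "many people"
        else:
            intensity = "a whole crowd"
        parts.append(f"{intensity} {FEEDBACK_TYPES[kind]}")
    return "audience ambience: " + ", ".join(parts)
```

For example, 50 clap events and 5 boo events from a 100-person audience would yield a prompt like "audience ambience: many people clapping, a few people booing", which each client could pass to a text-to-audio model and play back as one synchronized sound.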
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
results: Through empirical studies, the paper demonstrates the efficacy of selected features, showing that they achieve competitive or state-of-the-art performance.
Abstract
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's development principles and contents and highlight key features we include in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, through empirical studies, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.