results: The study found that the ANW alone yielded significant reductions in perceived annoyance (PAY) and perceived loudness (PLN) and improved ISO pleasantness, although water masking could increase PLN. In addition, combining ANC with maskers produced interaction effects, with both maskers significantly reducing PAY compared to ANC alone.
Abstract
Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone.
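As a rough illustration of the within-subjects factorial design described above, the sketch below computes condition-level means from hypothetical ratings. The "none" masker level, the 0-10 rating scale, and all numbers are invented for illustration; this is not the study's actual data or analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical PAY ratings for a 3 (noise) x 3 (masker) x 2 (ANC)
# within-subjects design with 40 participants. The "none" masker
# level and the 0-10 scale are assumptions for this sketch.
noise_types = ["traffic", "train", "aircraft"]
maskers = ["none", "bird", "water"]
anc_states = ["off", "on"]
ratings = rng.uniform(0, 10, size=(40, 3, 3, 2))

# Cell means: collapse over participants (axis 0)
cell_means = ratings.mean(axis=0)        # shape (3, 3, 2)

# ANC-by-masker marginal means (collapse over noise type) expose the
# kind of interaction the abstract reports: maskers + ANC vs ANC alone.
anc_by_masker = cell_means.mean(axis=0)  # shape (3, 2)
```

A full analysis would instead fit a three-way repeated-measures ANOVA on such a table; the sketch only shows how the factorial cells are organized.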
On decoder-only architecture for speech-to-text and large language model integration
results: On multilingual speech-to-text translation tasks, the method shows significant improvements over strong baselines, suggesting potential advantages of decoder-only models for speech-to-text conversion.
Abstract
Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
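The core idea of mapping compressed acoustic features into the LLM's embedding space can be sketched minimally as frame stacking followed by a linear projection. This is an illustrative NumPy stand-in, not the paper's actual encoder; the stacking factor, feature dimension, and 4096-dim embedding size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress_and_project(feats, stack=4, d_model=4096, W=None):
    """Stack every `stack` consecutive acoustic frames (temporal
    compression), then linearly project into the LLM embedding space."""
    T, d = feats.shape
    T = T - T % stack                        # drop leftover frames
    stacked = feats[:T].reshape(T // stack, stack * d)
    if W is None:                            # random projection as a stand-in
        W = rng.standard_normal((stack * d, d_model)) * 0.01
    return stacked @ W                       # (T // stack, d_model)

# 100 frames of 80-dim log-mel features -> 25 "audio tokens"
audio = rng.standard_normal((100, 80))
audio_emb = compress_and_project(audio)
text_emb = rng.standard_normal((10, 4096))   # stand-in for text token embeddings

# A decoder-only LLM would then attend over the concatenated sequence:
llm_input = np.concatenate([audio_emb, text_emb], axis=0)
```

In the actual system the projected audio embeddings are consumed by the pretrained LLM alongside text tokens, so the speech is treated as just another prefix in the decoder-only sequence.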
LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad
paper_authors: Siting Xu, Yunlong Tang, Feng Zheng
for: The paper is written for those who want to create music visualization designs for the Launchpad musical instrument.
methods: The paper proposes LaunchpadGPT, which uses a language model to automatically generate music visualization designs on the Launchpad.
results: The proposed method creates better music visualizations than random generation methods and shows potential for a broader range of music visualization applications.
Abstract
Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of Launchpad light effects, and to provide a more accessible way for beginners to create music visualizations with this instrument, we propose the LaunchpadGPT model to automatically generate music visualization designs on the Launchpad. Building on the strong generation ability of language models, our LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and the corresponding Launchpad-playing video frames as prompt-completion pairs with which to train the language model. Experimental results show that the proposed method creates better music visualizations than random generation methods and holds potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.
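The prompt-completion formulation above can be sketched as serializing each Launchpad lighting frame into tokens paired with a textual description of the music segment. The 8x8 grid, the colour-index encoding, and the audio descriptor string are all assumptions for illustration; the paper's exact data format may differ.

```python
def frame_to_text(frame):
    """Serialize an 8x8 grid of pad colour indices into a token string."""
    return " ".join(str(c) for row in frame for c in row)

def make_pair(audio_desc, frame):
    # prompt: a textual stand-in for the music segment's features;
    # completion: the corresponding Launchpad lighting frame as tokens
    return {"prompt": audio_desc, "completion": frame_to_text(frame)}

# One dark frame with the top-left pad lit with (hypothetical) colour 5
frame = [[0] * 8 for _ in range(8)]
frame[0][0] = 5
pair = make_pair("bar:1 beat:1 onset:strong", frame)
```

A language model fine-tuned on many such pairs can then complete a lighting frame given a description of the music, which is the kind of mapping LaunchpadGPT learns from the collected videos.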