methods: The paper uses four-dimensional hyperspherical harmonics (HSHs) and spherical harmonics (SHs) merged with one-dimensional basis functions, and compares the two approaches.
results: The study finds that four-dimensional continuous functional models allow HRTF magnitude spectra to be expressed as a small set of coefficients that can be decoded back into values at any direction and frequency. HSHs and SHs merged with the reverse Fourier-Bessel series performed best, with HSHs offering better compression and achieving slightly higher reconstruction accuracy for low numbers of coefficients. These models can be used for HRTF interpolation, compression and parametrization, and can also be applied to other types of directivity functions.
Abstract
Utilizing the spherical harmonic (SH) domain has been established as the default method of obtaining continuity over space in head-related transfer functions (HRTFs). This paper concerns different variants of extending this solution by replacing SHs with four-dimensional (4D) continuous functional models in which frequency is treated as another physical dimension. The recently developed hyperspherical harmonic (HSH) representation is compared with models defined in a spherindrical coordinate system by merging SHs with one-dimensional basis functions. The efficiency of both approaches is evaluated based on the reproduction errors for individual HRTFs from the HUTUBS database, including a detailed analysis of their dependence on the chosen orders of approximation in frequency and space. Employing continuous functional models defined in 4D coordinate systems allows HRTF magnitude spectra to be expressed as a small set of coefficients which can be decoded back into values at any direction and frequency. The best performance was noted for HSHs and SHs merged with the reverse Fourier-Bessel series, with the former featuring better compression abilities, achieving slightly higher accuracy for low numbers of coefficients. The presented models can serve multiple purposes, such as interpolation, compression or parametrization for machine learning applications, and can be applied not only to HRTFs but also to other types of directivity functions, e.g. sound source directivity.
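To make the idea of a 4D functional model concrete, the sketch below shows the general form such an expansion might take for the spherindrical variant, where SHs over direction are merged with a one-dimensional basis over frequency. The coefficients c_{nlm}, the radial functions R_n and the truncation orders N, L are illustrative placeholders, not values or notation taken from the paper.

```latex
% Illustrative spherindrical expansion of an HRTF magnitude spectrum:
% direction (theta, phi) is handled by spherical harmonics Y_l^m and
% frequency f by a one-dimensional basis R_n (e.g. a Fourier-Bessel-type series).
H(\theta, \phi, f) \;\approx\; \sum_{n=0}^{N} \sum_{l=0}^{L} \sum_{m=-l}^{l}
    c_{nlm}\, R_n(f)\, Y_l^m(\theta, \phi)
```

Once the coefficients are fitted to measured HRTFs, the magnitude response at any direction and frequency is obtained by evaluating this sum, which is what makes interpolation and compression from a small coefficient set possible.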
Interpretable Timbre Synthesis using Variational Autoencoders Regularized on Timbre Descriptors
paper_authors: Anastasia Natsiou, Luca Longo, Sean O’Leary
for: Research on controllable timbre synthesis methods
methods: Using deep neural networks and Variational Autoencoders (VAEs) to generate timbre representations
results: Proposes a regularized VAE-based latent space that incorporates timbre descriptors, and utilizes the harmonic content of sounds to reduce the dimensionality of the latent space.
Abstract
Controllable timbre synthesis has been a subject of research for several decades, and deep neural networks have been the most successful in this area. Deep generative models such as Variational Autoencoders (VAEs) have the ability to generate a high-level representation of audio while providing a structured latent space. Despite their advantages, the interpretability of these latent spaces in terms of human perception is often limited. To address this limitation and enhance the control over timbre generation, we propose a regularized VAE-based latent space that incorporates timbre descriptors. Moreover, we suggest a more concise representation of sound by utilizing its harmonic content, in order to minimize the dimensionality of the latent space.
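As a rough illustration of how timbre descriptors could regularize a VAE latent space, the sketch below adds a penalty that ties selected latent dimensions to per-example descriptor values on top of a standard VAE objective. The penalty form, the weights and the names (vae_descriptor_loss, beta, gamma) are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def vae_descriptor_loss(x, x_hat, mu, logvar, z, descriptors, beta=1.0, gamma=1.0):
    """Toy VAE objective with a timbre-descriptor regularization term.

    x, x_hat    -- input and reconstructed (harmonic) representations
    mu, logvar  -- encoder outputs parameterizing q(z|x)
    z           -- sampled latent codes, shape (batch, latent_dim)
    descriptors -- normalized timbre descriptors (e.g. spectral centroid),
                   shape (batch, n_desc) with n_desc <= latent_dim
    """
    recon = F.mse_loss(x_hat, x)  # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    # Regularization: push the first n_desc latent dimensions towards the
    # descriptor values so that those dimensions stay interpretable.
    n_desc = descriptors.shape[1]
    reg = F.mse_loss(z[:, :n_desc], descriptors)
    return recon + beta * kl + gamma * reg
```

In practice the descriptors would be normalized to the latent scale, and other regularizers (e.g. rank-based attribute losses) would slot into the same place in the objective.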
On Computing In the Network: Covid-19 Coughs Detection Case Study
results: Simulation-based comparisons show that performing the computation inside network devices outperforms edge computing in terms of Round Trip Time (RTT) and traffic filtering.
Abstract
Computing in the network (COIN) is a promising technology that allows processing to be carried out within network devices such as switches and network interface cards. Time-sensitive applications can achieve their quality of service (QoS) targets by flexibly distributing the caching and computing tasks in the cloud-edge-mist continuum. This paper highlights the advantages of in-network computing, compared to edge computing, in terms of latency and traffic filtering. We consider a critical use case related to a Covid-19 alert application in an airport setting. Arriving travelers are monitored through cough analysis so that potentially infected cases can be detected and isolated for medical tests. A performance comparison has been carried out between an architecture using in-network computing and another one using edge computing. We show using simulations that in-network computing outperforms edge computing in terms of Round Trip Time (RTT) and traffic filtering.
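The intuition behind the RTT and traffic-filtering gains can be captured with a toy latency model: when the cough classifier runs in a switch close to the sensors, negative samples are dropped before they ever cross the links towards the edge server. The delays and filtering ratio below are made-up illustrative numbers, not the paper's simulation parameters.

```python
# Toy model: one-way link delays (ms) and processing time (ms).
d_access = 2.0    # sensor <-> access switch
d_edge = 10.0     # access switch <-> edge server
t_proc = 5.0      # cough-classification time (same model in both setups)
drop_ratio = 0.9  # fraction of samples classified as negative and dropped

# Edge computing: every sample travels to the edge server and back.
rtt_edge = 2 * (d_access + d_edge) + t_proc

# In-network computing: classification happens at the access switch;
# only positive samples are forwarded further for follow-up.
rtt_coin = 2 * d_access + t_proc
upstream_traffic_coin = 1.0 - drop_ratio  # relative to the edge baseline

print(f"RTT edge: {rtt_edge:.1f} ms, RTT in-network: {rtt_coin:.1f} ms")
print(f"Traffic towards edge with COIN: {upstream_traffic_coin:.0%} of baseline")
```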
Multilingual Speech-to-Speech Translation into Multiple Target Languages
paper_authors: Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino
for: Multilingual speech-to-speech translation (S2ST), i.e., spoken communication between people speaking different languages.
methods: Leverages recent advances in direct S2ST, including speech-to-unit (S2U) and vocoder components, and equips these key components with multilingual capability. Specifically, the paper proposes a multilingual extension of S2U, called speech-to-masked-unit (S2MU), which applies masking to units that do not belong to the given target language to reduce language interference. It also proposes a multilingual vocoder trained with language embedding and an auxiliary language-identification loss.
results: On benchmark translation test sets, the proposed multilingual model outperforms bilingual models in translation from English into 16 target languages.
Abstract
Speech-to-speech translation (S2ST) enables spoken communication between people talking in different languages. Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language. We present the first work on multilingual S2ST supporting multiple target languages. Leveraging recent advances in direct S2ST with speech-to-unit and vocoder, we equip these key components with multilingual capability. Speech-to-masked-unit (S2MU) is the multilingual extension of S2U, which applies masking to units which do not belong to the given target language to reduce the language interference. We also propose a multilingual vocoder which is trained with language embedding and the auxiliary loss of language identification. On benchmark translation test sets, our proposed multilingual model shows superior performance to bilingual models in the translation from English into $16$ target languages.
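A rough sketch of the masking idea behind speech-to-masked-unit decoding: scores of discrete units that are not associated with the target language are set to -inf before the next unit is chosen. The assumption that each unit carries a single language id, and the function name masked_unit_logits, are illustrative simplifications rather than the paper's actual formulation.

```python
import torch

def masked_unit_logits(logits, unit_language_ids, target_lang_id):
    """Mask unit scores so that only units of the target language can be emitted.

    logits            -- (batch, vocab) scores over the discrete unit vocabulary
    unit_language_ids -- (vocab,) language id assigned to each unit
                         (simplifying assumption: one language per unit)
    target_lang_id    -- id of the desired target language
    """
    mask = unit_language_ids != target_lang_id
    return logits.masked_fill(mask, float("-inf"))

# Example: a shared vocabulary of 8 units spread over 3 languages.
logits = torch.randn(2, 8)
unit_langs = torch.tensor([0, 0, 1, 1, 1, 2, 2, 0])
next_unit = masked_unit_logits(logits, unit_langs, target_lang_id=1).argmax(dim=-1)
print(next_unit)  # selected indices are guaranteed to fall in {2, 3, 4}
```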