paper_authors: Vikentii Pankov, Valeria Pronina, Alexander Kuzmin, Maksim Borisov, Nikita Usoltsev, Xingshan Zeng, Alexander Golubkov, Nikolai Ermolenko, Aleksandra Shirshova, Yulia Matveeva
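results: The method produces high-quality generated audio under noisy conditions and requires no denoising or noise labeling of the training data. The authors also propose a multi-task approach that adds a self-supervised DINO loss to the joint training of the speaker verification and voice cloning systems, which improves the quality of the generated audio.
Abstract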
Recent progress in self-supervised representation learning has opened up new opportunities for training from unlabeled data and has been a growing trend in voice conversion. However, unsupervised training of voice cloning remains a challenging task. In this paper we propose a semi-supervised zero-shot voice cloning approach that adapts a HuBERT-based voice conversion system to the voice cloning task. We show that such a system is robust to noise both in the training data (we add noise resulting in a signal-to-noise ratio of up to 0 dB to 35% of the training data with no significant degradation of evaluation metrics) and in the target speaker reference audio at inference. Moreover, the method does not require any denoising or noise labeling of the training data. Finally, we introduce a novel multi-tasking approach that incorporates a self-supervised DINO loss into the joint training of a CAM++ based speaker verification system and a unit-based VITS cloning system. We show that it significantly improves the quality of generated audio over baselines, especially for noisy target speaker references.
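The noise-augmentation setup described in the abstract (noise mixed into 35% of the training data at a given signal-to-noise ratio) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `mix_at_snr` and `corrupt_fraction`, the random selection of utterances, and the use of 0 dB as a fixed target are all assumptions made for illustration. The gain follows from the definition of SNR in decibels, 10·log10(P_speech / P_noise).

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR.

    Assumes `noise` is at least as long as `speech`; extra samples are ignored.
    """
    p_speech = power(speech)
    p_noise = power(noise[:len(speech)])
    if p_noise == 0.0:
        return list(speech)  # silent noise: nothing to add
    # Solve snr_db = 10 * log10(p_speech / (gain**2 * p_noise)) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def corrupt_fraction(utterances, noises, fraction=0.35, snr_db=0.0, seed=0):
    """Mix noise into a randomly chosen `fraction` of the utterances.

    Hypothetical helper: the selection strategy is an assumption, not
    something the abstract specifies. Returns the new list and the set
    of indices that were corrupted.
    """
    rng = random.Random(seed)
    n_noisy = round(fraction * len(utterances))
    noisy_idx = set(rng.sample(range(len(utterances)), n_noisy))
    out = []
    for i, utt in enumerate(utterances):
        if i in noisy_idx:
            out.append(mix_at_snr(utt, rng.choice(noises), snr_db))
        else:
            out.append(list(utt))
    return out, noisy_idx
```

At 0 dB the scaled noise carries the same power as the speech, which is why this setting is a demanding test of robustness.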
Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks
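results: Monthly average sound speed distributions are predicted more accurately at different depth layers than with other existing methods, with an error below 1 m/s.
Abstract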
The spatial-temporal distribution of underwater sound velocity affects the propagation mode of underwater acoustic signals. Therefore, rapid estimation and prediction of the underwater sound velocity distribution is crucial for providing underwater positioning, navigation and timing (PNT) services. Currently, sound speed profile (SSP) inversion methods respond faster than direct measurement methods. However, most SSP inversion methods focus on constructing sound velocity fields in the spatial dimension and depend heavily on sonar observation data, which places high requirements on observation data sources. To explore the distribution pattern of sound velocity in the time dimension and achieve future SSP prediction without sonar observation data, we propose a hierarchical long short-term memory (H-LSTM) neural network for SSP prediction. With our SSP prediction method, the sound speed distribution can be estimated without any on-site data measurement, so time efficiency is greatly improved. Compared with other state-of-the-art methods, H-LSTM predicts the monthly average sound velocity distribution more accurately, with an error of less than 1 m/s in different depth layers.
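The accuracy claim above is stated per depth layer. As a minimal sketch of how such a figure could be computed, the function below reports the root-mean-square error at each depth layer across a set of profiles. The abstract does not name the error metric, so RMSE is an assumption here, and `rmse_per_depth` is a hypothetical helper rather than part of the authors' method.

```python
import math


def rmse_per_depth(predicted, observed):
    """Root-mean-square error at each depth layer.

    `predicted` and `observed` are lists of sound speed profiles; each
    profile is a list of sound speeds in m/s sampled at the same depth
    layers. Returns one RMSE value (m/s) per depth layer.
    """
    if len(predicted) != len(observed):
        raise ValueError("need the same number of predicted and observed profiles")
    n_profiles = len(predicted)
    n_depths = len(predicted[0])
    errors = []
    for d in range(n_depths):
        # Accumulate squared error over all profiles at this depth layer.
        squared = sum(
            (predicted[p][d] - observed[p][d]) ** 2 for p in range(n_profiles)
        )
        errors.append(math.sqrt(squared / n_profiles))
    return errors
```

Checking that every returned value is below 1.0 would correspond to the "less than 1 m/s in different depth layers" statement under this assumed metric.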
for: bridging the gap between recent advancements in Neural Audio Synthesis (NAS) and standardized evaluation methodologies.
methods: open-source Python library with a range of audio quality metrics, including a unique Python implementation of the basic PEAQ algorithm, and multiple operating modes to accommodate various user needs.
results: simplifies and standardizes the evaluation of NAS systems.
Abstract
Recent advancements in Neural Audio Synthesis (NAS) have outpaced the development of standardized evaluation methodologies and tools. To bridge this gap, we introduce AquaTk, an open-source Python library specifically designed to simplify and standardize the evaluation of NAS systems. AquaTk offers a range of audio quality metrics, including a unique Python implementation of the basic PEAQ algorithm, and operates in multiple modes to accommodate various user needs.