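results: Experiments show that the DNN-based beamformer, trained with the ARROW loss through supervised learning, performs speech enhancement and speaker localization jointly and outperforms two baselines on both tasks.
Abstract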
Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-aWare (ARROW) loss function. The ARROW loss exploits the array responses of the target and interferer by using the ground truth relative transfer functions (RTFs). The DNN-based beamforming system, trained with ARROW loss through supervised learning, is able to perform speech enhancement and speaker localization jointly. Experimental results have shown that the proposed deep beamformer, trained with the linearly weighted scale-invariant source-to-noise ratio (SI-SNR) and ARROW loss functions, achieves superior performance in speech enhancement and speaker localization compared to two baselines.
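The SI-SNR term named in the abstract can be illustrated with a short NumPy sketch. This is only the commonly used SI-SNR definition, not the authors' training code. The abstract does not define the ARROW loss or the weighting scheme, so `weighted_loss`, its `alpha` parameter, and the `arrow_term` argument are hypothetical placeholders.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant source-to-noise ratio (in dB) for 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to isolate the target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return float(10.0 * np.log10(ratio))


def weighted_loss(si_snr_db, arrow_term, alpha=0.5):
    """Hypothetical linear weighting of a negative SI-SNR term and a second term.

    `arrow_term` stands in for the ARROW loss value, which the abstract does
    not define; `alpha` and the form of the combination are assumptions.
    """
    return (1.0 - alpha) * (-si_snr_db) + alpha * arrow_term


rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)
noisy = clean + 0.1 * rng.standard_normal(1000)
print(si_snr(noisy, clean))        # score of a lightly corrupted estimate
print(si_snr(3.0 * noisy, clean))  # rescaling the estimate leaves it unchanged
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate does not change the score, which is what makes the measure scale-invariant.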
On Feature Importance and Interpretability of Speaker Representations
results: The study finds that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, whereas the more abstract voice quality features investigated cannot.
Abstract
Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask which properties of a speaker's voice are captured, and we investigate to what extent individual embedding vector components are responsible for them, using the concept of Shapley values. Our findings show that certain speaker-specific acoustic-phonetic properties can be fairly well predicted from the speaker embedding, while the investigated more abstract voice quality features cannot.
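To make the Shapley-value idea concrete, here is a minimal sketch of the exact computation for a tiny set of "embedding components". The value function `toy_value` and its numbers are invented for illustration and do not come from the paper. Real embeddings have far too many components for exact enumeration, so approximations are normally used.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            # Weight given to coalitions of size k in the Shapley average.
            w = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of component i to coalition s.
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical predictor quality when only `coalition` components are used."""
    score = 0.0
    if 0 in coalition:
        score += 2.0
    if 1 in coalition:
        score += 1.0
    if 0 in coalition and 2 in coalition:
        score += 1.0  # component 2 only helps together with component 0
    return score


phi = shapley_values(3, toy_value)
print(phi)
```

In this toy game the interaction bonus is split evenly between components 0 and 2, so the attributions come out at roughly 2.5, 1.0, and 0.5. They sum to the value of the full set, which is the efficiency property of Shapley values.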
A New Time Series Similarity Measure and Its Smart Grid Applications
paper_authors: Rui Yuan, S. Ali Pourmousavi, Wen L. Soong, Andrew J. Black, Jon A. R. Liisberg, Julian Lemos-Vinasco
for: This paper aims to provide a new distance measure for comparing electricity usage patterns in smart grid applications, addressing the limitations of existing measures such as Euclidean Distance (ED) and Dynamic Time Warping (DTW).
methods: The proposed method consists of two phases: (1) amplitude-based distance and (2) temporal-based distance, which quantify the effort required to reshape one time series into another considering both amplitude and temporal changes.
results: The proposed distance measure outperforms ED and DTW in three smart grid applications: (1) identifying the best load scheduling strategy, (2) detecting anomalous days with irregular electricity usage, and (3) determining electricity users’ behind-the-meter (BTM) equipment.
Abstract
Many smart grid applications involve data mining, clustering, classification, identification, and anomaly detection, among others. These applications primarily depend on the measurement of similarity, which is the distance between different time series or subsequences of a time series. The commonly used time series distance measures, namely Euclidean Distance (ED) and Dynamic Time Warping (DTW), do not quantify the flexible nature of electricity usage data in terms of temporal dynamics. As a result, there is a need for a new distance measure that can quantify both the amplitude and temporal changes of electricity time series for smart grid applications, e.g., demand response and load profiling. This paper introduces a novel distance measure to compare electricity usage patterns. The method consists of two phases that quantify the effort required to reshape one time series into another, considering both amplitude and temporal changes. The proposed method is evaluated against ED and DTW using real-world data in three smart grid applications. Overall, the proposed measure outperforms ED and DTW in accurately identifying the best load scheduling strategy, detecting anomalous days with irregular electricity usage, and determining electricity users' behind-the-meter (BTM) equipment.
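The limitation of the two baseline measures can be seen on a toy example. The sketch below implements plain ED and a textbook DTW with an absolute-difference cost and no warping window; it is not the paper's proposed measure. The two load profiles are made-up values in which the same unit of consumption is used one time step later.

```python
import numpy as np


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between equal-length series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def dtw_distance(a, b):
    """Classic dynamic time warping cost with absolute-difference local cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Hypothetical load profiles: the same usage peak, shifted by one time step.
original = [0, 0, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0]

print(euclidean_distance(original, shifted))  # treats the shift as an amplitude mismatch
print(dtw_distance(original, shifted))        # absorbs the shift entirely
```

ED reports a nonzero distance because it compares values only at matching time stamps, while DTW warps the shift away and reports zero. Neither number expresses how far in time the load was moved, which is the gap the abstract describes.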