cs.SD - 2023-09-09

Exploring Music Genre Classification: Algorithm Analysis and Deployment Architecture

paper_url: http://arxiv.org/abs/2309.04861
repo_url: None
paper_authors: Ayan Biswas, Supriya Dhabal, Palaniandavar Venkateswaran
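for: This paper studies music genre classification.
methods: The paper combines Digital Signal Processing (DSP) and Deep Learning (DL) techniques into a single music genre classification algorithm.
results: The algorithm was tested on the GTZAN dataset and achieved high accuracy. The paper also proposes an end-to-end deployment architecture for integration into music-related applications.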

Abstract
Music genre classification has become increasingly critical with the advent of various streaming applications. Nowadays, we find it impossible to imagine using the artist's name and song title to search for music in a sophisticated music app. It is always difficult to classify music correctly because the information linked to music, such as region, artist, album, or non-album, is so variable. This paper presents a study on music genre classification using a combination of Digital Signal Processing (DSP) and Deep Learning (DL) techniques. A novel algorithm is proposed that utilizes both DSP and DL methods to extract relevant features from audio signals and classify them into various genres. The algorithm was tested on the GTZAN dataset and achieved high accuracy. An end-to-end deployment architecture is also proposed for integration into music-related applications. The performance of the algorithm is analyzed and future directions for improvement are discussed. The proposed DSP and DL-based music genre classification algorithm and deployment architecture demonstrate a promising approach for music genre classification.
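
The abstract does not say which DSP features the algorithm extracts, so the following is only a minimal sketch of one common front end, assuming a Hann-windowed short-time Fourier transform followed by a spectral centroid per frame. The function names, frame length, and hop size are illustrative choices, not the authors' settings; the 22050 Hz sample rate matches the GTZAN recordings.

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(mag, sample_rate, frame_len=1024):
    """Magnitude-weighted mean frequency of each frame, in Hz."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

# Hypothetical input: one second of a 440 Hz sine instead of a real recording.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

mag = stft_magnitude(tone)
centroid = spectral_centroid(mag, sample_rate)
```

In a pipeline of the kind the abstract describes, a spectrogram such as `mag` (or features derived from it) would be the input to the deep learning classifier.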

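Summary
Music genre classification has become a key component of modern music applications. Today it is hard to imagine searching a sophisticated music app using only the artist's name and song title. Classifying music correctly is difficult because the associated information, such as region, artist, album, or non-album status, varies widely. This paper proposes a music genre classification algorithm that combines Digital Signal Processing (DSP) and Deep Learning (DL) techniques. The algorithm uses both to extract relevant features from audio signals and classify them into genres. It was tested on the GTZAN dataset and achieved high accuracy. The paper also proposes an end-to-end deployment architecture for integrating the algorithm into music-related applications. The algorithm's performance is analyzed and directions for future improvement are discussed. The proposed DSP- and DL-based classification algorithm and deployment architecture show promise for practical use.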

Generalized Minimum Error with Fiducial Points Criterion for Robust Learning

paper_url: http://arxiv.org/abs/2309.04670
repo_url: None
paper_authors: Haiquan Zhao, Yuan Gao, Yingying Zhu
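for: To make the minimum error entropy (MEE) criterion more flexible and more sensitive to error mean values, and to address uncertainty about the location of the error probability density function.
methods: Adopts the Generalized Gaussian Density (GGD) function as the kernel, which gives more control over tail behavior and peakedness.
results: In numerical simulations covering system identification, acoustic echo cancellation, time series prediction, and supervised classification, the proposed algorithms perform well when applied to adaptive filtering, kernel recursive algorithms, and multilayer perceptrons.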

Abstract
The conventional Minimum Error Entropy (MEE) criterion has limitations: it shows reduced sensitivity to error mean values and uncertainty regarding the location of the error probability density function. To overcome this, an MEE with fiducial points criterion (MEEF) was presented. However, the efficacy of the MEEF is not consistent due to its reliance on a fixed Gaussian kernel. In this paper, a generalized minimum error with fiducial points criterion (GMEEF) is presented by adopting the Generalized Gaussian Density (GGD) function as the kernel. The GGD extends the Gaussian distribution by introducing a shape parameter that provides more control over the tail behavior and peakedness. In addition, because of the high computational complexity of the GMEEF criterion, a quantization idea is introduced to notably lower the computational load of GMEEF-type algorithms. Finally, the proposed criteria are applied to adaptive filtering, kernel recursive algorithms, and multilayer perceptrons. Several numerical simulations, covering system identification, acoustic echo cancellation, time series prediction, and supervised classification, indicate that the novel algorithms perform excellently.
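
To make the role of the shape parameter concrete, here is a minimal sketch of a generalized Gaussian density used as a kernel on error values. It assumes the standard GGD parameterization with shape `alpha` and scale `beta`; these symbols, the function names, and the example values are assumptions, since the abstract does not give the paper's exact notation or the full GMEEF cost.

```python
import math
import numpy as np

def ggd_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density evaluated at error value(s) e.

    alpha is the shape parameter (controls tails and peakedness) and
    beta is the scale parameter. Both names are illustrative.
    """
    e = np.asarray(e, dtype=float)
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * np.exp(-np.abs(e / beta) ** alpha)

def gaussian_pdf(e, sigma=1.0):
    """Ordinary zero-mean Gaussian density, for comparison."""
    e = np.asarray(e, dtype=float)
    return np.exp(-e**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

x = np.linspace(-30.0, 30.0, 60001)
dx = x[1] - x[0]

# alpha = 2 with beta = sigma * sqrt(2) recovers the Gaussian kernel.
sigma = 1.5
same_as_gaussian = np.allclose(
    ggd_kernel(x, alpha=2.0, beta=sigma * math.sqrt(2.0)),
    gaussian_pdf(x, sigma),
)

# alpha = 1 gives a Laplacian shape: sharper peak, heavier tails.
laplace_like = ggd_kernel(x, alpha=1.0, beta=1.0)
area = float(laplace_like.sum() * dx)  # Riemann-sum check that it integrates to ~1
```

Choosing `alpha` below 2 produces heavier tails than the Gaussian and choosing it above 2 produces lighter tails, which is the extra control over tail behavior and peakedness that the abstract refers to.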

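Summary
The conventional Minimum Error Entropy (MEE) criterion has limitations: reduced sensitivity to error mean values and uncertainty about the location of the error probability density function. To mitigate these limitations, an MEE with fiducial points criterion (MEEF) was proposed. However, MEEF performs inconsistently because it relies on a fixed Gaussian kernel. This paper proposes a generalized minimum error with fiducial points criterion (GMEEF) that adopts the Generalized Gaussian Density (GGD) function as the kernel. The GGD extends the Gaussian distribution and provides more control over tail behavior and peakedness. Because the GMEEF criterion is computationally expensive, the paper also introduces a quantization idea to reduce the computational load of GMEEF-type algorithms. Finally, the proposed criteria are applied to adaptive filtering, kernel recursive algorithms, and multilayer perceptrons. Numerical simulations show that the new algorithms perform excellently in system identification, acoustic echo cancellation, time series prediction, and supervised classification.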

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

paper_url: http://arxiv.org/abs/2309.04654
repo_url: None
paper_authors: Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi
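for: This paper examines how effectively Mask-CTC-based pre-training improves the accuracy and latency of streaming automatic speech recognition (ASR) systems.
methods: The paper uses Mask-CTC-based encoder pre-training, triggered attention, and different model architectures such as Transformer-Transducer and contextual block streaming ASR.
results: The study reports that Mask-CTC-based pre-training improves the accuracy and latency of streaming ASR across different model architectures and yields accurate output spike timing.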

Abstract
Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latency, which hurts the streaming performance. In the Mask-CTC framework, an encoder network is trained to learn the feature representation that anticipates long-term contexts, which is desirable for streaming ASR. Mask-CTC-based encoder pre-training has been shown to be beneficial in achieving low latency and high accuracy for triggered attention-based ASR. However, the effectiveness of this method has not been demonstrated for various model architectures, nor has it been verified that the encoder has the expected look-ahead capability to reduce latency. This study, therefore, examines the effectiveness of Mask-CTC-based pre-training for models with different architectures, such as Transformer-Transducer and contextual block streaming ASR. We also discuss the effect of the proposed pre-training method on obtaining accurate output spike timing.
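
As a rough illustration of what "output spike timing" refers to, the sketch below runs standard greedy CTC decoding on frame-level scores and records the frame at which each token is first emitted. It is not the paper's evaluation procedure, which the abstract does not describe; the function name, the blank index, and the toy scores are assumptions.

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol

def greedy_ctc_with_timing(frame_scores):
    """Greedy CTC decoding that also returns the emission frame of each token.

    frame_scores has shape (T, V): one score per frame and vocabulary entry.
    Repeated labels are collapsed and blanks are dropped, as in standard CTC.
    """
    best = np.argmax(frame_scores, axis=1)
    tokens, frames = [], []
    prev = BLANK
    for t, tok in enumerate(best):
        # A token "spikes" when a non-blank label starts a new run.
        if tok != BLANK and tok != prev:
            tokens.append(int(tok))
            frames.append(t)
        prev = tok
    return tokens, frames

# Toy scores whose per-frame argmax follows this path (0 = blank).
path = [0, 1, 1, 0, 2, 0, 0, 2, 2]
scores = np.full((len(path), 3), 0.05)
scores[np.arange(len(path)), path] = 0.9

tokens, frames = greedy_ctc_with_timing(scores)
print(tokens, frames)  # [1, 2, 2] [1, 4, 7]
```

Comparing emission frames like these against reference token boundaries is one way to judge whether a streaming model emits tokens early or late.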

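Summary
Achieving high accuracy with low latency has always been a challenge in streaming automatic speech recognition (ASR) systems. A streaming ASR model improves its accuracy by attending to more future context, but doing so increases latency, which hurts streaming performance. In the Mask-CTC framework, an encoder network is trained to learn feature representations that anticipate long-term context, which is desirable for streaming ASR. Mask-CTC-based encoder pre-training helps triggered attention-based ASR achieve low latency and high accuracy. However, the effectiveness of this method has not yet been demonstrated on different model architectures, nor has it been verified that the encoder has the expected look-ahead capability to reduce latency. This study therefore examines the effectiveness of Mask-CTC-based pre-training for different model architectures, such as Transformer-Transducer and contextual block streaming ASR. It also discusses the effect of the pre-training method on obtaining accurate output spike timing.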