cs.SD - 2023-07-25

A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

  • paper_url: http://arxiv.org/abs/2307.13346
  • repo_url: None
  • paper_authors: Li Xiao, Xiuping Yang, Xinhong Li, Weiping Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren
  • for: This paper aims to identify the obstruction site of the upper airways in patients with Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) by analyzing snoring sounds.
  • methods: The paper proposes a snore-based sleep body position recognition dataset (SSBPR) consisting of 7570 snoring recordings, which includes six distinct labels for sleep body position. The authors use machine learning algorithms to analyze the acoustic features of snoring sounds and identify the sleep body position.
  • results: The experimental results show that snoring sounds exhibit certain acoustic features that can be used effectively to identify body posture during sleep in real-world scenarios.
    Abstract Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to the influence of sleep body position on upper airways. To address this challenge, this paper proposes a snore-based sleep body position recognition dataset (SSBPR) consisting of 7570 snoring recordings, which comprises six distinct labels for sleep body position: supine, supine but left lateral head, supine but right lateral head, left-side lying, right-side lying and prone. Experimental results show that snoring sounds exhibit certain acoustic features that enable their effective utilization for identifying body posture during sleep in real-world scenarios.
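Since the SSBPR dataset has no public repository (repo_url: None), its on-disk format is unknown; the sketch below only encodes the six body-position labels named in the abstract as integer class indices, a common first step before training a classifier. The label strings and integer encoding are assumptions for illustration.

```python
# Hypothetical label mapping for the six SSBPR sleep body positions.
# Label names follow the abstract; the integer encoding is an assumption.
SSBPR_LABELS = {
    "supine": 0,
    "supine_left_lateral_head": 1,
    "supine_right_lateral_head": 2,
    "left_side_lying": 3,
    "right_side_lying": 4,
    "prone": 5,
}

def encode_label(name: str) -> int:
    """Map a body-position label string to its integer class index."""
    try:
        return SSBPR_LABELS[name]
    except KeyError:
        raise ValueError(f"unknown body position: {name!r}")
```

A classifier trained on acoustic features of the 7570 recordings would predict one of these six indices per snoring event.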

On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

  • paper_url: http://arxiv.org/abs/2307.13343
  • repo_url: None
  • paper_authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung
  • for: Enhancing speaker privacy while preserving speech recognition accuracy
  • methods: Attaches gradient-reversal-based speaker adversarial layers that anonymize acoustic embeddings on device, then transmits the anonymized embeddings to the cloud, where the rest of the model is executed
  • results: Improves ASR performance with a 6.2% relative WER reduction and reduces speaker recognition relative accuracy by 33%
    Abstract Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task: Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient reversal based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by executing the initial layers of the ASR model and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our method efficiently reduces speaker recognition relative accuracy by 33%, and improves ASR performance by achieving a 6.2% relative Word Error Rate (WER) reduction.
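The core mechanism here is the gradient reversal layer (GRL): identity in the forward pass, but gradients from the speaker-adversarial branch are flipped in the backward pass, so the preceding ASR layers learn to strip speaker identity from their embeddings. The following is a minimal conceptual NumPy sketch of a GRL with an explicit forward/backward pair, not the paper's implementation (which integrates with an ASR model's autograd).

```python
import numpy as np

class GradientReversalLayer:
    """Minimal gradient reversal layer: identity forward,
    gradient scaled by -lambda backward. Conceptual sketch only."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # adversarial trade-off weight

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Acoustic embeddings pass through unchanged.
        return x

    def backward(self, grad_output: np.ndarray) -> np.ndarray:
        # Gradients from the speaker classifier are negated, pushing
        # the upstream layers to *hurt* speaker identification while
        # the main ASR loss keeps recognition accuracy intact.
        return -self.lam * grad_output
```

In frameworks with autograd (e.g., PyTorch's `torch.autograd.Function`), the same effect is obtained with a custom backward that returns the negated gradient.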

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

  • paper_url: http://arxiv.org/abs/2307.13295
  • repo_url: None
  • paper_authors: Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu
  • for: Improving the quality of low-bitrate speech codecs
  • methods: A novel framework, CQNV, that combines the coarsely quantized parameters of a traditional parametric codec with a neural vocoder, reducing the bitrate without sacrificing quality
  • results: Compared with Lyra and Encodec, the proposed method achieves higher reconstructed speech quality at a bitrate of 1.1 kbps
    Abstract Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization remains visible in codec architectures that combine a traditional codec with a neural vocoder. In this paper, we propose a novel framework named CQNV, which combines the coarsely quantized parameters of a traditional parametric codec, to reduce the bitrate, with a neural vocoder, to improve the quality of the decoded speech. Furthermore, we introduce a parameter processing module into the neural vocoder to better adapt the bitstream of traditional speech coding parameters to the neural vocoder, further improving the quality of the reconstructed speech. In the experiments, both subjective and objective evaluations demonstrate the effectiveness of the proposed CQNV framework. Specifically, our proposed method achieves higher-quality reconstructed speech at 1.1 kbps than Lyra and Encodec at 3 kbps.
    摘要 Note:* " parametric codec" transformed into "参数化编码器" (parameterized encoder)* "traditional speech coding parameters" transformed into "传统语音编码参数" (traditional speech coding parameters)* "neural vocoder" transformed into "神经 vocoder" (neural vocoder)* "bitstream" transformed into "比特流" (bitstream)* "reconstructed speech" transformed into "重建语音" (reconstructed speech)