eess.AS - 2023-07-11

Predicting Tuberculosis from Real-World Cough Audio Recordings and Metadata

  • paper_url: http://arxiv.org/abs/2307.04842
  • repo_url: None
  • paper_authors: George P. Kafentzis, Stephane Tetsing, Joe Brew, Lola Jover, Mindaugas Galvosas, Carlos Chaccour, Peter M. Small
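  • for: This study aims to make tuberculosis (TB) case-finding more efficient by recording cough sounds with a mobile phone application and classifying them using spectral and time-domain features.
  • methods: The study uses a very large dataset of TB and non-TB cough recordings from south-east Africa, India, and south-east Asia, collected with a fully automated phone-based application (Hyfe) without manual annotation. Statistical classifiers are fitted on spectral and time-domain features of the cough sounds, with and without the participants' demographic and clinical metadata, and evaluated with stratified grouped cross-validation.
  • results: Cough sounds alone give an average area under the curve (AUC) of approximately 0.70 $\pm$ 0.05, at both the cough level and the participant level. Adding demographic and clinical factors raises the average AUC to approximately 0.81 $\pm$ 0.05. These results suggest that phone-based applications integrating clinical symptoms and cough sound analysis could help community health workers and health service programs improve TB case-finding while reducing costs.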
    Abstract Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis and primarily affects the lungs, as well as other body parts. TB is spread through the air when an infected person coughs, sneezes, or talks. Medical doctors diagnose TB in patients via clinical examinations and specialized tests. However, coughing is a common symptom of respiratory diseases such as TB. Literature suggests that cough sounds coming from different respiratory diseases can be distinguished by both medical doctors and computer algorithms. Therefore, cough recordings associated with patients with and without TB seems to be a reasonable avenue of investigation. In this work, we utilize a very large dataset of TB and non-TB cough audio recordings obtained from the south-east of Africa, India, and the south-east of Asia using a fully automated phone-based application (Hyfe), without manual annotation. We fit statistical classifiers based on spectral and time domain features with and without clinical metadata. A stratified grouped cross-validation approach shows that an average Area Under Curve (AUC) of approximately 0.70 $\pm$ 0.05 both for a cough-level and a participant-level classification can be achieved using cough sounds alone. The addition of demographic and clinical factors increases performance, resulting in an average AUC of approximately 0.81 $\pm$ 0.05. Our results suggest mobile phone-based applications that integrate clinical symptoms and cough sound analysis could help community health workers and, most importantly, health service programs to improve TB case-finding efforts while reducing costs, which could substantially improve public health.
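The abstract reports AUC at both the cough level and the participant level under a stratified grouped cross-validation. The sketch below is only an illustration of those three ideas on hypothetical toy data. The fold assignment is a simplified round-robin scheme, the participant-level score is assumed to be the mean of the cough-level scores, and none of the names or numbers come from the paper.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def participant_level(participants, labels, scores):
    """Average cough-level scores per participant (an assumed aggregation rule)."""
    by_pid = defaultdict(list)
    label_of = {}
    for pid, y, s in zip(participants, labels, scores):
        by_pid[pid].append(s)
        label_of[pid] = y
    pids = sorted(by_pid)
    mean_scores = [sum(by_pid[p]) / len(by_pid[p]) for p in pids]
    return [label_of[p] for p in pids], mean_scores


def stratified_group_folds(participants, labels, k):
    """Assign whole participants to k folds, round-robin within each class.

    Grouping keeps all coughs of one participant in the same fold;
    stratifying spreads TB and non-TB participants across folds.
    """
    label_of = dict(zip(participants, labels))
    fold_of = {}
    for cls in sorted(set(label_of.values())):
        members = sorted(p for p, y in label_of.items() if y == cls)
        for i, pid in enumerate(members):
            fold_of[pid] = i % k
    return [fold_of[p] for p in participants]


# Hypothetical toy data: one row per cough (1 = TB, 0 = non-TB).
participants = ["a", "a", "b", "b", "c", "d", "d"]
labels = [1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6, 0.1]

cough_auc = roc_auc(labels, scores)
p_labels, p_scores = participant_level(participants, labels, scores)
participant_auc = roc_auc(p_labels, p_scores)
folds = stratified_group_folds(participants, labels, k=2)
```

Assigning folds per participant rather than per cough matters because coughs from the same person are correlated; splitting them across training and test folds would tend to inflate the measured AUC.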

Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer

  • paper_url: http://arxiv.org/abs/2307.04744
  • repo_url: None
  • paper_authors: Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck
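  • for: This study analyzes the behavior of speaker embeddings of patients during oral cancer treatment.
  • methods: The authors use speaker embeddings to analyze the voice characteristics of oral cancer patients at different treatment stages.
  • results: Pre- and post-treatment speaker embeddings differ significantly, indicating a substantial change in voice characteristics, although a partial recovery to pre-operative voice traits is observed 12 months after the operation. Same-speaker similarity across treatment stages is comparable to that of healthy speakers, indicating that the embeddings capture characterizing features of even severely impaired speech. A speaker verification analysis shows a stable false positive rate and a variable false negative rate when speech samples from different treatment stages are combined, which indicates that the embeddings are robust against other speakers while still capturing the voice changes during treatment.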
    Abstract In this paper, we analyze the behavior of speaker embeddings of patients during oral cancer treatment. First, we found that pre- and post-treatment speaker embeddings differ significantly, notifying a substantial change in voice characteristics. However, a partial recovery to pre-operative voice traits is observed after 12 months post-operation. Secondly, the same-speaker similarity at distinct treatment stages is similar to healthy speakers, indicating that the embeddings can capture characterizing features of even severely impaired speech. Finally, a speaker verification analysis signifies a stable false positive rate and variable false negative rate when combining speech samples of different treatment stages. This indicates robustness of the embeddings towards other speakers, while still capturing the changing voice characteristics during treatment. To the best of our knowledge, this is the first analysis of speaker embeddings during oral cancer treatment of patients.
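The speaker verification finding (stable false positive rate, variable false negative rate) can be made concrete with a small sketch. It assumes cosine similarity as the scoring function and a fixed decision threshold. The 3-dimensional vectors, the threshold of 0.7, and all names are hypothetical illustrations, not values or methods taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def error_rates(trials, threshold):
    """Return (false_positive_rate, false_negative_rate).

    trials is a list of (score, same_speaker) pairs; a trial is accepted
    when its score is at or above the threshold.
    """
    target = [s for s, same in trials if same]
    nontarget = [s for s, same in trials if not same]
    fnr = sum(s < threshold for s in target) / len(target)
    fpr = sum(s >= threshold for s in nontarget) / len(nontarget)
    return fpr, fnr


# Hypothetical embeddings: one patient at two stages, plus another speaker.
enrol = [1.0, 0.0, 0.0]          # patient, pre-treatment enrollment
pre_test = [0.8, 0.6, 0.0]       # patient, pre-treatment test sample
post_test = [0.6, 0.8, 0.0]      # patient, post-treatment test sample
other = [0.0, 0.6, 0.8]          # a different speaker

trials = [
    (cosine(enrol, pre_test), True),
    (cosine(enrol, post_test), True),
    (cosine(enrol, other), False),
]
fpr, fnr = error_rates(trials, threshold=0.7)
```

In this toy setup the post-treatment sample drifts below the threshold and is rejected, which raises the false negative rate, while the other speaker stays far from the patient at either stage, so the false positive rate is unaffected.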