cs.SD - 2023-10-24

IA Para el Mantenimiento Predictivo en Canteras: Modelado

  • paper_url: http://arxiv.org/abs/2310.16140
  • repo_url: None
  • paper_authors: Fernando Marcos, Rodrigo Tamaki, Mateo Cámara, Virginia Yagüe, José Luis Blanco
  • for: The paper aims to optimize operations in the aggregates and mining sector.
  • methods: An unsupervised learning scheme trains a variational autoencoder on sound recordings captured at different points of the processing line during plant operation.
  • results: The model reconstructs the recorded sounds and represents them in latent space, capturing differences between operating conditions and between pieces of equipment; in the future, this should facilitate sound classification and the detection of machinery degradation.
    Abstract Dependence on raw materials, especially in the mining sector, is a key part of today's economy. Aggregates are vital, being the second most used raw material after water. Digitally transforming this sector is key to optimizing operations. However, supervision and maintenance (predictive and corrective) remain little explored, owing to the particularities of the sector, its machinery, and its environmental conditions, and this despite the successes achieved in other scenarios with acoustic and contact-sensor monitoring. We present an unsupervised learning scheme that trains a variational autoencoder model on a set of sound records. This is the first such dataset collected during processing plant operations, containing information from different points of the processing line. Our results demonstrate the model's ability to reconstruct the recorded sounds and to represent in latent space the differences between operating conditions and between pieces of equipment. In the future, this should facilitate the classification of sounds, as well as the detection of anomalies and degradation patterns in the operation of the machinery.
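The anomaly-detection use anticipated in the abstract can be sketched in miniature. The toy below substitutes a linear autoencoder (PCA) for the paper's variational autoencoder, since the scoring idea is the same: compress sounds from normal operation into a low-dimensional latent space and flag recordings that reconstruct poorly. All feature dimensions, data, and thresholds are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_autoencoder(X, latent_dim):
    """Fit encoder/decoder as the top principal components of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:latent_dim]              # rows span the latent subspace
    return mean, W

def reconstruction_error(X, mean, W):
    """Per-sample mean squared error after an encode/decode round trip."""
    Z = (X - mean) @ W.T             # encode into latent space
    X_hat = Z @ W + mean             # decode back to feature space
    return np.mean((X - X_hat) ** 2, axis=1)

# Synthetic "normal operation" spectral features near a low-dim subspace.
basis = rng.normal(size=(3, 32))
normal = rng.normal(size=(200, 3)) @ basis
mean, W = fit_linear_autoencoder(normal, latent_dim=3)

# A degraded machine produces features that leave that subspace.
anomalous = normal[:5] + rng.normal(scale=2.0, size=(5, 32))

err_normal = reconstruction_error(normal, mean, W)
err_anom = reconstruction_error(anomalous, mean, W)
threshold = np.percentile(err_normal, 99)
flags = err_anom > threshold         # candidate anomalies to inspect
```

A real deployment would score log-mel or similar spectral frames and calibrate the threshold on held-out normal recordings per machine.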

CDSD: Chinese Dysarthria Speech Database

  • paper_url: http://arxiv.org/abs/2310.15930
  • repo_url: None
  • paper_authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang
  • For: The paper is written for researchers and professionals working on dysarthria, specifically those interested in speech recognition for dysarthric speech.
  • Methods: The paper describes the data collection and annotation processes for the Chinese Dysarthria Speech Database (CDSD) and presents an approach for establishing a baseline for dysarthric speech recognition. The authors also conducted a speaker-dependent dysarthric speech recognition experiment using additional data from one participant.
  • Results: Extensive data-driven model training followed by fine-tuning on limited quantities of speaker-specific data yields commendable results in speaker-dependent dysarthric speech recognition; however, recognition results vary significantly among different dysarthric speakers.
    Abstract We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Each participant recorded one hour of speech, and one of them recorded an additional 10 hours, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. Participants read these texts and recorded their speech with a mobile device or a ZOOM F8n multi-track field recorder. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using the additional 10 hours of speech data from one of our participants. Our findings indicate that, after extensive data-driven model training, fine-tuning on limited quantities of speaker-specific data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.
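The per-speaker variation the authors report is typically quantified with character error rate (CER), the standard metric for Chinese ASR. A minimal, model-independent sketch of a Levenshtein-based CER follows; the speaker names and transcripts are made up for illustration, not CDSD data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (rolling 1-D DP row)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution (or free match)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

# Hypothetical per-speaker comparison showing how recognition quality
# can differ sharply between dysarthric speakers.
pairs = {
    "speaker_A": [("speech database", "speech databose")],
    "speaker_B": [("speech database", "spich dutabase")],
}
for spk, samples in pairs.items():
    avg = sum(cer(r, h) for r, h in samples) / len(samples)
    print(spk, round(avg, 3))
```

In practice one would compute CER over each speaker's full test set (e.g. with a library such as jiwer) and compare distributions rather than single utterances.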

FOLEY-VAE: Generación de efectos de audio para cine con inteligencia artificial

  • paper_url: http://arxiv.org/abs/2310.15663
  • repo_url: None
  • paper_authors: Mateo Cámara, José Luis Blanco
  • for: The study develops a Variational Autoencoder-based interface for the innovative creation of Foley effects.
  • methods: Trained on a wide range of natural sounds, the model transfers new sound features in real time to prerecorded audio or microphone-captured speech. It also lets users interactively modify latent variables for precise, customized artistic adjustments.
  • results: Building on the authors' study presented at the same congress the previous year, the work analyzes the existing RAVE model, a variational autoencoder specialized for audio effect generation. Various audio effects were successfully generated, including electromagnetic, science-fiction, and water sounds. The approach underpinned the first Spanish short film with AI-assisted sound effects, illustrating the technology's potential in film production.
    Abstract In this research, we present an interface based on Variational Autoencoders trained on a wide range of natural sounds for the innovative creation of Foley effects. The model can transfer new sound features to prerecorded audio or microphone-captured speech in real time. In addition, it allows interactive modification of latent variables, facilitating precise and customized artistic adjustments. Taking as a starting point our previous study on Variational Autoencoders presented at this same congress last year, we analyzed an existing implementation: RAVE [1]. This model has been specifically trained for audio effects production. Various audio effects have been successfully generated, ranging from electromagnetic and science-fiction sounds to water sounds, among others published with this work. This innovative approach was the basis for the artistic creation of the first Spanish short film with sound effects assisted by artificial intelligence. This milestone palpably illustrates the transformative potential of this technology in the film industry, opening the door to new possibilities for sound creation and the improvement of artistic quality in film productions.
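The interactive latent-variable control described above can be illustrated schematically. The encoder/decoder below is a stand-in orthonormal linear map, not RAVE's actual model, and every dimension, signal, and slider value is hypothetical; the point is only the workflow: encode two sounds, blend and nudge their latent codes, decode the result.

```python
import numpy as np

rng = np.random.default_rng(1)
D, L = 64, 8                         # feature and latent dims (hypothetical)
Q, _ = np.linalg.qr(rng.normal(size=(D, L)))  # orthonormal columns

def encode(x):
    """Project audio features onto the latent basis (toy encoder)."""
    return x @ Q

def decode(z):
    """Map a latent code back to feature space (toy decoder)."""
    return z @ Q.T

# Two hypothetical sources: captured speech and a target sound texture.
x_speech = rng.normal(size=D)
x_effect = rng.normal(size=D)
z_a, z_b = encode(x_speech), encode(x_effect)

# Interactive control: crossfade between sources and nudge one latent
# dimension, mimicking sliders a sound designer moves in real time.
alpha = 0.5
z_mix = (1 - alpha) * z_a + alpha * z_b
z_mix[0] += 0.3                      # manual offset on one latent variable
y = decode(z_mix)                    # hybrid Foley-style output features
```

With the real RAVE model the same loop runs on audio buffers, which is what makes the real-time timbre transfer described in the abstract possible.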