results: The study yields many valuable results, including the discovery of several new side-channel attacks and defense strategies, as well as the identification of a deep connection between acoustic side channels and inverse problems.
Abstract
We provide a state-of-the-art analysis of acoustic side channels, cover all the significant academic research in the area, discuss their security implications and countermeasures, and identify areas for future research. We also make an attempt to bridge side channels and inverse problems, two fields that appear to be completely isolated from each other but have deep connections.
Characterization of cough sounds using statistical analysis
For: The paper aims to characterize cough sounds with voiced content and cough sounds without voiced content, and to compare the cough sound characteristics with speech signals.
Methods: The proposed method uses spectral roll-off, spectral entropy, spectral flatness, spectral flux, zero crossing rate, spectral centroid, and spectral bandwidth attributes to describe cough sounds in terms of the respiratory system, glottal information, and the voice model. These attributes are then subjected to statistical analysis using the measures of minimum, maximum, mean, median, and standard deviation.
Results: The experimental results show that the mean and frequency distribution of spectral roll-off, spectral centroid, and spectral bandwidth are higher for cough sounds than for speech signals. Spectral flatness in cough sounds is around 0.22, while spectral flux varies between 0.3 and 0.6. The zero crossing rate of most cough-sound frames lies between 0.05 and 0.4. These attributes contribute significant information when characterizing cough sounds.
Abstract
Cough is a primary symptom of most respiratory diseases, and changes in cough characteristics provide valuable information for diagnosing respiratory diseases. The characterization of cough sounds still lacks concrete evidence, which makes it difficult to accurately distinguish between different types of coughs and other sounds. The objective of this research work is to characterize cough sounds with voiced content and cough sounds without voiced content. Further, the cough sound characteristics are compared with the characteristics of speech. The proposed method utilizes spectral roll-off, spectral entropy, spectral flatness, spectral flux, zero crossing rate, spectral centroid, and spectral bandwidth attributes, which describe the cough sounds in terms of the respiratory system, glottal information, and the voice model. These attributes are then subjected to statistical analysis using the measures of minimum, maximum, mean, median, and standard deviation. The experimental results show that the mean and frequency distribution of spectral roll-off, spectral centroid, and spectral bandwidth are higher for cough sounds than for speech signals. Spectral flatness levels in cough sounds rise to around 0.22, whereas spectral flux varies between 0.3 and 0.6. The Zero Crossing Rate (ZCR) of most frames of cough sounds is between 0.05 and 0.4. These attributes contribute significant information when characterizing cough sounds.
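The spectral attributes listed in the abstract are all standard frame-level descriptors. The following is a minimal NumPy sketch of how they could be computed; the frame length, hop size, and 85% roll-off threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def spectral_attributes(x, sr=16000, frame_len=1024, hop=512):
    """Per-frame spectral attributes of the kind used to characterize cough sounds."""
    frames = frame_signal(x, frame_len, hop)
    win = np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames * win, axis=1))   # per-frame magnitude spectra
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    power = mag ** 2
    total = power.sum(axis=1) + 1e-12

    # spectral centroid (Hz) and bandwidth (spread around the centroid)
    centroid = (power * freqs).sum(axis=1) / total
    bandwidth = np.sqrt((power * (freqs - centroid[:, None]) ** 2).sum(axis=1) / total)
    # flatness: geometric mean / arithmetic mean of the magnitude spectrum
    flatness = np.exp(np.mean(np.log(mag + 1e-12), axis=1)) / (mag.mean(axis=1) + 1e-12)
    # roll-off: frequency below which 85% of the spectral energy lies
    cum = np.cumsum(power, axis=1)
    rolloff = freqs[np.argmax(cum >= 0.85 * total[:, None], axis=1)]
    # spectral entropy of the normalized power spectrum
    p = power / total[:, None]
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)
    # zero crossing rate: fraction of adjacent-sample sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # spectral flux: frame-to-frame spectral change (0 for the first frame)
    flux = np.r_[0.0, np.sqrt(((mag[1:] - mag[:-1]) ** 2).sum(axis=1))]
    return dict(centroid=centroid, bandwidth=bandwidth, flatness=flatness,
                rolloff=rolloff, entropy=entropy, zcr=zcr, flux=flux)
```

The per-frame values would then be summarized with the minimum, maximum, mean, median, and standard deviation, as the paper describes.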
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation
paper_authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan
for: The paper aims to generate realistic dance sequences that align effectively with the input music.
methods: The paper proposes a novel cascaded motion diffusion model, DiffDance, to generate high-resolution, long-form dance sequences. The model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To connect the music and motion spaces, DiffDance uses a pretrained audio representation learning model to extract music embeddings and aligns their embedding space to motion via a contrastive loss.
results: Extensive experiments on the AIST++ benchmark dataset show that DiffDance generates realistic dance sequences that align effectively with the input music. These results are comparable to those of state-of-the-art autoregressive methods.
Abstract
When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
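The abstract mentions aligning the music embedding space to motion via a contrastive loss. A common way to realize such an alignment is an InfoNCE-style objective over paired embeddings; the sketch below is a generic NumPy illustration of that idea, not DiffDance's exact formulation, and the temperature value is an assumption.

```python
import numpy as np

def contrastive_alignment_loss(music_emb, motion_emb, temperature=0.1):
    """InfoNCE-style loss that pulls each music embedding toward its paired
    motion embedding and pushes it away from the other pairs in the batch.
    Generic sketch; DiffDance's actual loss may differ in detail."""
    # L2-normalize so the dot product is cosine similarity
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    d = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    logits = m @ d.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: the i-th music clip matches the i-th motion
    return -float(np.mean(np.diag(log_probs)))
```

Minimizing this loss makes matched music/motion pairs more similar than mismatched ones, which is the alignment property the conditional generation relies on.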
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
results: The thesis shows that quantization-based transformations can reduce speaker PPI without compromising the utility of the speech signal. It also proposes a new attack method to invert anonymization.
Abstract
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users, as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice-cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to evaluate the degree of privacy protection properly. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study and examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce speaker PPI as much as possible while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we devise a new attack method to invert anonymization.
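The quantization-based transformations promoted by the thesis can be illustrated with a toy example: uniformly quantizing each dimension of a speaker-related embedding discards fine-grained speaker detail while keeping coarse structure. This is a minimal sketch of the general idea only; the function name, the uniform scheme, and the number of levels are assumptions, and the thesis's actual transformations are more elaborate.

```python
import numpy as np

def quantize_embedding(embedding, n_levels=8):
    """Uniformly quantize each dimension of an embedding to n_levels values.
    Coarse quantization removes fine-grained (speaker-identifying) detail
    while preserving the coarse structure that utility may depend on."""
    lo, hi = embedding.min(), embedding.max()
    # map to [0, n_levels - 1], round to the nearest level, map back
    scaled = (embedding - lo) / (hi - lo + 1e-12) * (n_levels - 1)
    return np.round(scaled) / (n_levels - 1) * (hi - lo) + lo
```

The privacy/utility trade-off is then controlled by `n_levels`: fewer levels remove more speaker PPI but also more of the information downstream tasks can use.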