cs.LG - 2023-09-27

Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images

  • paper_url: http://arxiv.org/abs/2309.16066
  • repo_url: None
  • paper_authors: Yehyun Suh, Peter Chan, J. Ryan Martin, Daniel Moyer
  • for: Predicting clinical landmarks in hip radiograph images.
  • methods: Trains an automated medical landmark detector with a label-only augmentation scheme, using a generic U-Net architecture and a two-phase curriculum.
  • results: On six radiograph datasets with gold-standard expert annotations, the method yields highly sample-efficient landmark estimators and outperforms traditional data augmentation.
    Abstract This work reports the empirical performance of an automated medical landmark detection method for predicting clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.
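The two-phase curriculum can be illustrated with a small sketch. The following is a minimal, hypothetical example (not the authors' code): landmark points are dilated into disk-shaped label regions whose radius shrinks over training, with the radius schedule and image size chosen arbitrarily.

```python
import numpy as np

def landmark_mask(points, shape, radius):
    """Binary target mask with a disk of `radius` pixels around each landmark."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=np.float32)
    for (r, c) in points:
        mask[(yy - r) ** 2 + (xx - c) ** 2 <= radius ** 2] = 1.0
    return mask

def curriculum_radius(epoch, total_epochs, r_max=25, r_min=1):
    """Phase 1: relaxed task (large regions); phase 2: erode back toward points."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return int(round(r_max - frac * (r_max - r_min)))

# Example: targets for a 256x256 radiograph with two landmarks.
points = [(100, 120), (180, 60)]
for epoch in (0, 10, 19):
    r = curriculum_radius(epoch, total_epochs=20)
    target = landmark_mask(points, (256, 256), r)
    print(epoch, r, int(target.sum()))
```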

Predicting Cardiovascular Complications in Post-COVID-19 Patients Using Data-Driven Machine Learning Models

  • paper_url: http://arxiv.org/abs/2309.16059
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Hector J. Castro
  • for: Predicting the risk of cardiovascular complications in post-COVID-19 patients.
  • methods: Uses data-driven machine learning models to predict cardiovascular complication risk in 352 post-COVID-19 patients from Iraq.
  • results: The machine learning models performed well in identifying at-risk patients; early detection through these models promises timely interventions and improved outcomes.
    Abstract The COVID-19 pandemic has globally posed numerous health challenges, notably the emergence of post-COVID-19 cardiovascular complications. This study addresses this by utilizing data-driven machine learning models to predict such complications in 352 post-COVID-19 patients from Iraq. Clinical data, including demographics, comorbidities, lab results, and imaging, were collected and used to construct predictive models. These models, leveraging various machine learning algorithms, demonstrated commendable performance in identifying patients at risk. Early detection through these models promises timely interventions and improved outcomes. In conclusion, this research underscores the potential of data-driven machine learning for predicting post-COVID-19 cardiovascular complications, emphasizing the need for continued validation and research in diverse clinical settings.

Machine Learning-driven Analysis of Gastrointestinal Symptoms in Post-COVID-19 Patients

  • paper_url: http://arxiv.org/abs/2310.00540
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Fadhil G. Al-Amran, Salman Rawaf, Mohammad Abdulla Grmt
  • for: This study aims to investigate the prevalence and patterns of gastrointestinal (GI) symptoms in individuals recovering from COVID-19 and to identify predictive factors for these symptoms using machine learning algorithms.
  • methods: The study uses data from 913 post-COVID-19 patients in Iraq collected during 2022 and 2023. The researchers use machine learning algorithms to identify predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness.
  • results: The study finds that a notable percentage of post-COVID-19 patients experience GI symptoms during their recovery phase, with diarrhea being the most frequently reported symptom. The researchers also identify significant predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness.
    Abstract The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has posed significant health challenges worldwide. While respiratory symptoms have been the primary focus, emerging evidence has highlighted the impact of COVID-19 on various organ systems, including the gastrointestinal (GI) tract. This study, based on data from 913 post-COVID-19 patients in Iraq collected during 2022 and 2023, investigates the prevalence and patterns of GI symptoms in individuals recovering from COVID-19 and leverages machine learning algorithms to identify predictive factors for these symptoms. The research findings reveal that a notable percentage of post-COVID-19 patients experience GI symptoms during their recovery phase. Diarrhea emerged as the most frequently reported symptom, followed by abdominal pain and nausea. Machine learning analysis uncovered significant predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness. These findings underscore the importance of monitoring and addressing GI symptoms in post-COVID-19 care, with machine learning offering valuable tools for early identification and personalized intervention. This study contributes to the understanding of the long-term consequences of COVID-19 on GI health and emphasizes the potential benefits of utilizing machine learning-driven analysis in predicting and managing these symptoms. Further research is warranted to delve into the mechanisms underlying GI symptoms in COVID-19 survivors and to develop targeted interventions for symptom management. Keywords: COVID-19, gastrointestinal symptoms, machine learning, predictive factors, post-COVID-19 care, long COVID.

Identifying Risk Factors for Post-COVID-19 Mental Health Disorders: A Machine Learning Perspective

  • paper_url: http://arxiv.org/abs/2309.16055
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Fadhil G. Al-Amran, Hector J. Castro
  • For: This study aimed to identify risk factors associated with post-COVID-19 mental health disorders in a sample of 669 patients in Iraq.
  • Methods: The study used machine learning techniques to analyze demographic, clinical, and psychosocial factors that may influence the development of mental health disorders in post-COVID-19 patients.
  • Results: The study found that age, gender, geographical region of residence, comorbidities, and the severity of COVID-19 illness were significant risk factors for developing mental health disorders. Additionally, psychosocial factors such as social support, coping strategies, and perceived stress levels played a substantial role.
    Abstract In this study, we leveraged machine learning techniques to identify risk factors associated with post-COVID-19 mental health disorders. Our analysis, based on data collected from 669 patients across various provinces in Iraq, yielded valuable insights. We found that age, gender, and geographical region of residence were significant demographic factors influencing the likelihood of developing mental health disorders in post-COVID-19 patients. Additionally, comorbidities and the severity of COVID-19 illness were important clinical predictors. Psychosocial factors, such as social support, coping strategies, and perceived stress levels, also played a substantial role. Our findings emphasize the complex interplay of multiple factors in the development of mental health disorders following COVID-19 recovery. Healthcare providers and policymakers should consider these risk factors when designing targeted interventions and support systems for individuals at risk. Machine learning-based approaches can provide a valuable tool for predicting and preventing adverse mental health outcomes in post-COVID-19 patients. Further research and prospective studies are needed to validate these findings and enhance our understanding of the long-term psychological impact of the COVID-19 pandemic. This study contributes to the growing body of knowledge regarding the mental health consequences of the COVID-19 pandemic and underscores the importance of a multidisciplinary approach to address the diverse needs of individuals on the path to recovery. Keywords: COVID-19, mental health, risk factors, machine learning, Iraq

Cognizance of Post-COVID-19 Multi-Organ Dysfunction through Machine Learning Analysis

  • paper_url: http://arxiv.org/abs/2309.16736
  • repo_url: None
  • paper_authors: Hector J. Castro, Maitham G. Yousif
  • For: The paper aims to analyze and predict multi-organ dysfunction in individuals experiencing Post-COVID-19 Syndrome using machine learning techniques.
  • Methods: The study covers data collection and preprocessing, feature selection and engineering, model development and validation, and ethical considerations to enhance early detection and management of Post-COVID-19 Syndrome.
  • Results: The paper aims to improve our understanding of Post-COVID-19 Syndrome through machine learning, potentially improving patient outcomes and quality of life.
    Abstract In the year 2022, a total of 466 patients from various cities across Iraq were included in this study. This research paper focuses on the application of machine learning techniques to analyse and predict multi-organ dysfunction in individuals experiencing Post-COVID-19 Syndrome, commonly known as Long COVID. Post-COVID-19 Syndrome presents a wide array of persistent symptoms affecting various organ systems, posing a significant challenge to healthcare. Leveraging the power of artificial intelligence, this study aims to enhance early detection and management of this complex condition. The paper outlines the importance of data collection and preprocessing, feature selection and engineering, model development and validation, and ethical considerations in conducting research in this field. By improving our understanding of Post-COVID-19 Syndrome through machine learning, healthcare providers can identify at-risk individuals and offer timely interventions, potentially improving patient outcomes and quality of life. Further research is essential to refine models, validate their clinical utility, and explore treatment options for Long COVID. Keywords: Post-COVID-19 Syndrome, Machine Learning, Multi-Organ Dysfunction, Healthcare, Artificial Intelligence.

Improving Adaptive Online Learning Using Refined Discretization

  • paper_url: http://arxiv.org/abs/2309.16044
  • repo_url: None
  • paper_authors: Zhiyu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis
  • for: Studies unconstrained online linear optimization, aiming to simultaneously achieve (i) second-order gradient adaptivity and (ii) comparator-norm adaptivity (also known as "parameter freeness"). Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have a suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$; the present work improves this to the optimal $O(\sqrt{V_T})$ rate using a novel continuous-time-inspired algorithm, without any impractical doubling trick.
  • methods: Uses a novel continuous-time-inspired algorithm together with a new discretization argument that preserves the desired adaptivity in the discrete-time adversarial setting, refining the non-gradient-adaptive discretization argument of (Harvey et al., 2023) both algorithmically and analytically.
  • results: The proposed algorithm achieves an $O(\sqrt{V_T})$ regret bound in the adversarial setting, improving on the existing $O(\sqrt{V_T\log V_T})$ bounds; the result also extends to the setting with unknown Lipschitz constant, eliminating the range-ratio problem of prior work.
    Abstract We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.
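For readers less familiar with the setting, the quantities appearing in the bound can be written out explicitly. The block below gives standard definitions of the comparator regret and the gradient variance in unconstrained online linear optimization; the notation is conventional and not taken verbatim from the paper.

```latex
% Comparator regret and gradient variance in unconstrained OLO (standard notation):
\mathrm{Reg}_T(u) \;=\; \sum_{t=1}^{T} \langle g_t,\; x_t - u \rangle ,
\qquad
V_T \;=\; \sum_{t=1}^{T} \lVert g_t \rVert^2 .
% The paper tightens the V_T-dependence of comparator-adaptive (parameter-free)
% bounds from O(\sqrt{V_T \log V_T}) to the optimal O(\sqrt{V_T}).
```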

Analytical Modelling of Raw Data for Flow-Guided In-body Nanoscale Localization

  • paper_url: http://arxiv.org/abs/2309.16034
  • repo_url: None
  • paper_authors: Guillem Pascual, Filip Lemic, Carmen Delgado, Xavier Costa-Perez
  • for: Proposes an analytical model of raw data for flow-guided localization in nanoscale devices, addressing the communication and energy constraints that currently limit precision-medicine applications.
  • methods: Analytically models how the raw data produced by a nanodevice is affected by its communication and energy-related capabilities, and compares the model's output against a simulator to assess its accuracy.
  • results: Across a number of scenarios and heterogeneous performance metrics, the model-generated and simulator-generated raw datasets show high similarity.
    Abstract Advancements in nanotechnology and material science are paving the way toward nanoscale devices that combine sensing, computing, data and energy storage, and wireless communication. In precision medicine, these nanodevices show promise for disease diagnostics, treatment, and monitoring from within the patients' bloodstreams. Assigning the location of a sensed biological event with the event itself, which is the main proposition of flow-guided in-body nanoscale localization, would be immensely beneficial from the perspective of precision medicine. The nanoscale nature of the nanodevices and the challenging environment that the bloodstream represents, result in current flow-guided localization approaches being constrained in their communication and energy-related capabilities. The communication and energy constraints of the nanodevices result in different features of raw data for flow-guided localization, in turn affecting its performance. An analytical modeling of the effects of imperfect communication and constrained energy causing intermittent operation of the nanodevices on the raw data produced by the nanodevices would be beneficial. Hence, we propose an analytical model of raw data for flow-guided localization, where the raw data is modeled as a function of communication and energy-related capabilities of the nanodevice. We evaluate the model by comparing its output with the one obtained through the utilization of a simulator for objective evaluation of flow-guided localization, featuring comparably higher level of realism. Our results across a number of scenarios and heterogeneous performance metrics indicate high similarity between the model and simulator-generated raw datasets.

Learning Dissipative Neural Dynamical Systems

  • paper_url: http://arxiv.org/abs/2309.16032
  • repo_url: None
  • paper_authors: Yuezhu Xu, S. Sivaranjani
  • for: Learning a model of an unknown nonlinear dynamical system while preserving the system's dissipativity property.
  • methods: Learns the model in two stages: first an unconstrained neural dynamical model is trained to approximate the system dynamics, then sufficient conditions are derived to perturb the model's weights to ensure dissipativity, followed by perturbation of the biases to retain the fit to the system's trajectories.
  • results: The two perturbation problems can be solved independently, yielding a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system's trajectories.
    Abstract Consider an unknown nonlinear dynamical system that is known to be dissipative. The objective of this paper is to learn a neural dynamical model that approximates this system, while preserving the dissipativity property in the model. In general, imposing dissipativity constraints during neural network training is a hard problem for which no known techniques exist. In this work, we address the problem of learning a dissipative neural dynamical system model in two stages. First, we learn an unconstrained neural dynamical model that closely approximates the system dynamics. Next, we derive sufficient conditions to perturb the weights of the neural dynamical model to ensure dissipativity, followed by perturbation of the biases to retain the fit of the model to the trajectories of the nonlinear system. We show that these two perturbation problems can be solved independently to obtain a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system.

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

  • paper_url: http://arxiv.org/abs/2309.16022
  • repo_url: https://github.com/chenfengzhao/gnnhls
  • paper_authors: Chenfeng Zhao, Zehao Dong, Yixin Chen, Xuan Zhang, Roger D. Chamberlain
  • for: Efficient Graph Neural Network (GNN) inference using Field-Programmable Gate Arrays (FPGAs) as the execution platform.
  • methods: Uses High-Level Synthesis (HLS) tools to map GNN models to optimized FPGA implementations, packaged as the open-source GNNHLS framework with a software stack for data generation and baseline deployment and six well-tuned GNN HLS kernels.
  • results: On four graph datasets, GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to CPU baselines, and up to 5.16x speedup and 74.5x energy reduction relative to GPU baselines.
    Abstract With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programming Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low-power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development efforts and rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS, containing a software stack for data generation and baseline deployment, and FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

  • paper_url: http://arxiv.org/abs/2309.16014
  • repo_url: https://github.com/geriskenderi/graph-jepa
  • paper_authors: Geri Skenderi, Hang Li, Jiliang Tang, Marco Cristani
  • for: Self-supervised graph-level representation learning with Joint-Embedding Predictive Architectures (JEPAs), proposing Graph-JEPA, the first JEPA for the graph domain.
  • methods: Uses masked modeling to learn embeddings for different subgraphs of the input graph, with an alternative training objective that predicts the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane.
  • results: Graph-JEPA learns representations that are expressive and competitive in both graph classification and regression problems.
    Abstract Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal $y$ from a context signal $x$. JEPAs bypass the need for data augmentation and negative samples, which are typically required by contrastive learning, while avoiding the overfitting issues associated with generative-based pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm and propose Graph-JEPA, the first JEPA for the graph domain. In particular, we employ masked modeling to learn embeddings for different subgraphs of the input graph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative training objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Extensive validation shows that Graph-JEPA can learn representations that are expressive and competitive in both graph classification and regression problems.
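The hyperbola-based objective can be sketched concretely. The snippet below is a hypothetical illustration, not the released Graph-JEPA code: each target subgraph is assigned a scalar parameter t (assumed here), its target coordinates are (cosh t, sinh t) on the unit hyperbola x^2 - y^2 = 1, and a small predictor is trained against those coordinates with an MSE loss.

```python
import torch

def hyperbola_targets(t):
    """Map scalar parameters t to points (cosh t, sinh t) on the unit hyperbola x^2 - y^2 = 1."""
    return torch.stack([torch.cosh(t), torch.sinh(t)], dim=-1)

# Hypothetical setup: context embeddings -> predicted 2D coordinates of target subgraphs.
batch, dim = 32, 64
context = torch.randn(batch, dim)            # encoded context subgraphs (placeholder)
t = torch.rand(batch) * 2.0                  # assumed per-subgraph hyperbola parameter
predictor = torch.nn.Sequential(torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

pred = predictor(context)                    # predicted (x, y)
loss = torch.nn.functional.mse_loss(pred, hyperbola_targets(t))
loss.backward()
print(float(loss))
```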

Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems

  • paper_url: http://arxiv.org/abs/2309.15995
  • repo_url: https://github.com/xuqinghua-china/tosem
  • paper_authors: Qinghua Xu, Shaukat Ali, Tao Yue
  • for: Improving anomaly detection in cyber-physical systems (CPS) by optimizing the learning paradigm with an easy-to-difficult curriculum.
  • methods: Extends the digital twin-based method ATTAIN with curriculum learning (LATTICE), assigning each sample a difficulty score and scheduling training batches from easy to difficult data.
  • results: Evaluated on data from five real-world CPS testbeds, LATTICE outperforms ATTAIN and two other baselines by 0.906%-2.367% in F1 score, while reducing ATTAIN's training time by 4.2% on average.
    Abstract Anomaly detection is critical to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of attacks and CPS themselves, anomaly detection in CPS is becoming more and more challenging. In our previous work, we proposed a digital twin-based anomaly detection method, called ATTAIN, which takes advantage of both historical and real-time data of CPS. However, such data vary significantly in terms of difficulty. Therefore, similar to human learning processes, deep learning models (e.g., ATTAIN) can benefit from an easy-to-difficult curriculum. To this end, in this paper, we present a novel approach, named digitaL twin-based Anomaly deTecTion wIth Curriculum lEarning (LATTICE), which extends ATTAIN by introducing curriculum learning to optimize its learning paradigm. LATTICE attributes each sample with a difficulty score, before being fed into a training scheduler. The training scheduler samples batches of training data based on these difficulty scores such that learning from easy to difficult data can be performed. To evaluate LATTICE, we use five publicly available datasets collected from five real-world CPS testbeds. We compare LATTICE with ATTAIN and two other state-of-the-art anomaly detectors. Evaluation results show that LATTICE outperforms the three baselines and ATTAIN by 0.906%-2.367% in terms of the F1 score. LATTICE also, on average, reduces the training time of ATTAIN by 4.2% on the five datasets and is on par with the baselines in terms of detection delay time.
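The easy-to-difficult scheduling idea can be sketched as follows; this is a generic curriculum sampler under assumed per-sample difficulty scores, not the LATTICE implementation.

```python
import numpy as np

def curriculum_batches(difficulty, n_epochs, batch_size, rng=np.random.default_rng(0)):
    """Yield index batches drawn from an easy-to-difficult expanding pool."""
    order = np.argsort(difficulty)                     # easiest samples first
    n = len(order)
    for epoch in range(n_epochs):
        frac = (epoch + 1) / n_epochs                  # admissible pool grows toward the full set
        pool = order[: max(batch_size, int(frac * n))]
        idx = rng.choice(pool, size=batch_size, replace=False)
        yield epoch, idx

difficulty = np.random.default_rng(1).random(1000)     # assumed per-sample difficulty scores
for epoch, batch in curriculum_batches(difficulty, n_epochs=5, batch_size=32):
    print(epoch, difficulty[batch].mean().round(3))    # mean difficulty rises over epochs
```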

Machine Learning Based Analytics for the Significance of Gait Analysis in Monitoring and Managing Lower Extremity Injuries

  • paper_url: http://arxiv.org/abs/2309.15990
  • repo_url: None
  • paper_authors: Mostafa Rezapour, Rachel B. Seymour, Stephen H. Sims, Madhav A. Karunakar, Nahir Habet, Metin Nafi Gurcan
  • for: Using gait analysis to assess post-injury complications, such as infection, malunion, or hardware irritation, in patients with lower extremity fractures.
  • methods: Trains supervised machine learning models on consecutive gait datasets. Patients with lower extremity fractures at an academic center underwent gait analysis with a chest-mounted IMU device; raw gait data was preprocessed in software, emphasizing 12 essential gait variables.
  • results: XGBoost was the best-performing model both before and after addressing class imbalance with SMOTE; prior to SMOTE it achieved an average test AUC of 0.90 and test accuracy of 86%.
    Abstract This study explored the potential of gait analysis as a tool for assessing post-injury complications, e.g., infection, malunion, or hardware irritation, in patients with lower extremity fractures. The research focused on the proficiency of supervised machine learning models predicting complications using consecutive gait datasets. We identified patients with lower extremity fractures at an academic center. Patients underwent gait analysis with a chest-mounted IMU device. Using software, raw gait data was preprocessed, emphasizing 12 essential gait variables. Machine learning models including XGBoost, Logistic Regression, SVM, LightGBM, and Random Forest were trained, tested, and evaluated. Attention was given to class imbalance, addressed using SMOTE. We introduced a methodology to compute the Rate of Change (ROC) for gait variables, independent of the time difference between gait analyses. XGBoost was the optimal model both before and after applying SMOTE. Prior to SMOTE, the model achieved an average test AUC of 0.90 (95% CI: [0.79, 1.00]) and test accuracy of 86% (95% CI: [75%, 97%]). Feature importance analysis attributed importance to the duration between injury and gait analysis. Data patterns showed early physiological compensations, followed by stabilization phases, emphasizing prompt gait analysis. This study underscores the potential of machine learning, particularly XGBoost, in gait analysis for orthopedic care. Predicting post-injury complications, early gait assessment becomes vital, revealing intervention points. The findings support a shift in orthopedics towards a data-informed approach, enhancing patient outcomes.
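As a hedged illustration of the modeling pipeline described in the abstract, the sketch below combines SMOTE oversampling with an XGBoost classifier and reports test AUC. The data, features, and hyper-parameters are synthetic placeholders, not the study's gait variables or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Synthetic stand-in for 12 gait variables with imbalanced complication labels.
X, y = make_classification(n_samples=500, n_features=12, weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance only the training split
model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X_res, y_res)

print("test AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```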

Open Source Infrastructure for Differentiable Density Functional Theory

  • paper_url: http://arxiv.org/abs/2309.15985
  • repo_url: None
  • paper_authors: Advika Vidhyadhiraja, Arun Pa Thiagarajan, Shang Zhu, Venkat Viswanathan, Bharath Ramsundar
  • for: Training exchange correlation functionals used in quantum chemistry calculations from data.
  • methods: Builds open-source infrastructure for training neural exchange correlation functionals, standardizing the processing pipeline by adapting state-of-the-art techniques from multiple groups.
  • results: The model is open-sourced in the DeepChem library, providing a platform for further research on differentiable quantum chemistry methods.
    Abstract Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art techniques from work done by multiple groups. We have open sourced the model in the DeepChem library to provide a platform for additional research on differentiable quantum chemistry methods.

TraCE: Trajectory Counterfactual Explanation Scores

  • paper_url: http://arxiv.org/abs/2309.15965
  • repo_url: None
  • paper_authors: Jeffrey N. Clark, Edward A. Small, Nawid Keshtmand, Michelle W. L. Wan, Elena Fillola Mayoral, Enrico Werner, Christopher P. Bourdeaux, Raul Santos-Rodriguez
  • for: Extending counterfactual explanations of black-box classifier predictions to evaluate progress in sequential decision-making tasks.
  • methods: Proposes a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which distills and condenses progress in highly complex scenarios into a single value.
  • results: In two case studies spanning healthcare and climate change, TraCE successfully captures and summarizes progress.
    Abstract Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

  • paper_url: http://arxiv.org/abs/2309.15938
  • repo_url: None
  • paper_authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani
  • For: Proposes a simple multi-channel contrastive learning framework (MC-SimCLR) to encode the 'what' and 'where' of spatial audio. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audio, improving both event classification and sound localization in downstream tasks.
  • Methods: Proposes a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features, along with simple yet effective channel-wise augmentations that randomly swap the microphone order and mask Mel and GCC channels.
  • Results: With these augmentations, linear layers on top of the learned representation significantly outperform supervised models in both event classification accuracy and localization error; the study also analyzes the effect of each augmentation and compares fine-tuning performance with different amounts of labeled data.
    Abstract In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
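The channel-wise augmentations can be sketched in a few lines. The example below is illustrative only (tensor shapes and probabilities are assumptions): it randomly permutes the microphone order of a multi-channel Mel-spectrogram tensor and masks one randomly chosen channel to produce two contrastive views.

```python
import torch

def channel_augment(mel, mask_prob=0.5, generator=None):
    """mel: (n_mics, n_mels, n_frames). Randomly swap mic order and zero out one channel."""
    n_mics = mel.shape[0]
    perm = torch.randperm(n_mics, generator=generator)                  # random microphone order
    out = mel[perm].clone()
    if torch.rand(1, generator=generator).item() < mask_prob:
        out[torch.randint(n_mics, (1,), generator=generator)] = 0.0     # mask one channel
    return out

mel = torch.randn(4, 64, 100)        # assumed 4-mic array, 64 Mel bins, 100 frames
view1, view2 = channel_augment(mel), channel_augment(mel)               # two views for contrastive learning
print(view1.shape, view2.shape)
```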

Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks

  • paper_url: http://arxiv.org/abs/2310.06857
  • repo_url: None
  • paper_authors: Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus
  • for: Delivering consistently high-quality video to wireless users, despite variable encoded bitrates caused by dynamic video content and fluctuating channel bitrates caused by wireless fading.
  • methods: Proposes a real-time deep-learning-based H.264 controller that uses instantaneous channel quality data from the physical layer, together with the video chunk, to dynamically estimate the optimal encoder parameters, avoiding quality loss from underutilized bandwidth and artifacts from packet loss.
  • results: Experiments show PSNR improvements of 10-20 dB over state-of-the-art adaptive bitrate video streaming, with an average packet drop rate as low as 0.002.
    Abstract Providing wireless users with high-quality video content has become increasingly important. However, ensuring consistent video quality poses challenges due to variable encoded bitrate caused by dynamic video content and fluctuating channel bitrate caused by wireless fading effects. Suboptimal selection of encoder parameters can lead to video quality loss due to underutilized bandwidth or the introduction of video artifacts due to packet loss. To address this, a real-time deep learning based H.264 controller is proposed. This controller leverages instantaneous channel quality data driven from the physical layer, along with the video chunk, to dynamically estimate the optimal encoder parameters with a negligible delay in real-time. The objective is to maintain an encoded video bitrate slightly below the available channel bitrate. Experimental results, conducted on both QCIF dataset and a diverse selection of random videos from public datasets, validate the effectiveness of the approach. Remarkably, improvements of 10-20 dB in PSNR with repect to the state-of-the-art adaptive bitrate video streaming is achieved, with an average packet drop rate as low as 0.002.

Multi-unit soft sensing permits few-shot learning

  • paper_url: http://arxiv.org/abs/2309.15828
  • repo_url: None
  • paper_authors: Bjarne Grimstad, Kristian Løvland, Lars S. Imsland
  • for: Improving soft sensors with learning algorithms, specifically by strengthening performance through learning from multiple related tasks.
  • methods: Formulates a multi-unit soft sensor as a hierarchical model implemented with a deep neural network, and investigates how well it generalizes as the number of units increases.
  • results: When learned from a sufficient number of tasks, the soft sensor generalizes well and permits few-shot learning on data from new units, often reaching high performance from only 1-3 data points.
    Abstract Recent literature has explored various ways to improve soft sensors using learning algorithms with transferability. Broadly put, the performance of a soft sensor may be strengthened when it is learned by solving multiple tasks. The usefulness of transferability depends on how strongly related the devised learning tasks are. A particularly relevant case for transferability, is when a soft sensor is to be developed for a process of which there are many realizations, e.g. system or device with many implementations from which data is available. Then, each realization presents a soft sensor learning task, and it is reasonable to expect that the different tasks are strongly related. Applying transferability in this setting leads to what we call multi-unit soft sensing, where a soft sensor models a process by learning from data from all of its realizations. This paper explores the learning abilities of a multi-unit soft sensor, which is formulated as a hierarchical model and implemented using a deep neural network. In particular, we investigate how well the soft sensor generalizes as the number of units increase. Using a large industrial dataset, we demonstrate that, when the soft sensor is learned from a sufficient number of tasks, it permits few-shot learning on data from new units. Surprisingly, regarding the difficulty of the task, few-shot learning on 1-3 data points often leads to a high performance on new units.

Fair Canonical Correlation Analysis

  • paper_url: http://arxiv.org/abs/2309.15809
  • repo_url: https://github.com/pennshenlab/fair_cca
  • paper_authors: Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Boning Tong, Jia Xu, Yanbo Feng, Qi Long, Li Shen
  • for: Investigating fairness and bias in Canonical Correlation Analysis (CCA), proposing a framework that mitigates the correlation disparity error associated with protected attributes.
  • methods: Learns global projection matrices from all data points while ensuring that these matrices yield correlation levels comparable to group-specific projection matrices.
  • results: Experiments on synthetic and real-world datasets show the method reduces correlation disparity error without compromising CCA accuracy.
    Abstract This paper investigates fairness and bias in Canonical Correlation Analysis (CCA), a widely used statistical technique for examining the relationship between two sets of variables. We present a framework that alleviates unfairness by minimizing the correlation disparity error associated with protected attributes. Our approach enables CCA to learn global projection matrices from all data points while ensuring that these matrices yield comparable correlation levels to group-specific projection matrices. Experimental evaluation on both synthetic and real-world datasets demonstrates the efficacy of our method in reducing correlation disparity error without compromising CCA accuracy.
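The quantity being controlled can be made concrete with a short sketch on synthetic data. This is not the authors' fair-CCA method; it fits an ordinary CCA on all data and then measures the gap in canonical correlation between two protected groups, i.e., the correlation disparity that the proposed framework seeks to reduce.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 400, 10, 8
X, Y = rng.normal(size=(n, p)), rng.normal(size=(n, q))
group = rng.integers(0, 2, size=n)           # assumed binary protected attribute

cca = CCA(n_components=1).fit(X, Y)          # global projection matrices learned on all data
Xc, Yc = cca.transform(X, Y)

def corr(a, b):
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

per_group = [corr(Xc[group == g], Yc[group == g]) for g in (0, 1)]
print("group correlations:", per_group, "disparity:", abs(per_group[0] - per_group[1]))
```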

Node-Aligned Graph-to-Graph Generation for Retrosynthesis Prediction

  • paper_url: http://arxiv.org/abs/2309.15798
  • repo_url: None
  • paper_authors: Lin Yao, Zhen Wang, Wentao Guo, Shang Xiang, Wentan Liu, Guolin Ke
  • For: The paper aims to develop a template-free machine learning model for single-step retrosynthesis, which can fully leverage the topological information of the molecule and align atoms between the product and reactants.
  • Methods: The proposed method, NAG2G, uses 2D molecular graphs and 3D conformation information, and incorporates node alignment to determine a specific order for node generation. The method generates molecular graphs in an auto-regressive manner, ensuring that the node generation order coincides with the node order in the input graph.
  • Results: The proposed NAG2G method outperforms previous state-of-the-art baselines in various metrics, demonstrating its effectiveness in single-step retrosynthesis.
    Abstract Single-step retrosynthesis is a crucial task in organic chemistry and drug design, requiring the identification of required reactants to synthesize a specific compound. with the advent of computer-aided synthesis planning, there is growing interest in using machine-learning techniques to facilitate the process. Existing template-free machine learning-based models typically utilize transformer structures and represent molecules as ID sequences. However, these methods often face challenges in fully leveraging the extensive topological information of the molecule and aligning atoms between the production and reactants, leading to results that are not as competitive as those of semi-template models. Our proposed method, Node-Aligned Graph-to-Graph (NAG2G), also serves as a transformer-based template-free model but utilizes 2D molecular graphs and 3D conformation information. Furthermore, our approach simplifies the incorporation of production-reactant atom mapping alignment by leveraging node alignment to determine a specific order for node generation and generating molecular graphs in an auto-regressive manner node-by-node. This method ensures that the node generation order coincides with the node order in the input graph, overcoming the difficulty of determining a specific node generation order in an auto-regressive manner. Our extensive benchmarking results demonstrate that the proposed NAG2G can outperform the previous state-of-the-art baselines in various metrics.

Learning the Efficient Frontier

  • paper_url: http://arxiv.org/abs/2309.15775
  • repo_url: https://github.com/Sugoto/Algorithmic-Trading-Using-Unsupervised-Learning
  • paper_authors: Philippe Chatigny, Ivan Sergienko, Ryan Ferguson, Jordan Weir, Maxime Bergeron
  • for: Solving the efficient frontier resource allocation problem: finding an optimal portfolio that maximizes reward at a given level of risk.
  • methods: Uses a fast neural approximation framework (NeuralEF) that reformulates the convex optimization problem as a sequence-to-sequence problem, handling heterogeneous linear constraints and a variable number of optimization inputs.
  • results: NeuralEF robustly forecasts the solution of the efficient frontier optimization problem, accelerating large-scale simulation while handling discontinuous behavior.
    Abstract The efficient frontier (EF) is a fundamental resource allocation problem where one has to find an optimal portfolio maximizing a reward at a given level of risk. This optimal solution is traditionally found by solving a convex optimization problem. In this paper, we introduce NeuralEF: a fast neural approximation framework that robustly forecasts the result of the EF convex optimization problem with respect to heterogeneous linear constraints and variable number of optimization inputs. By reformulating an optimization problem as a sequence to sequence problem, we show that NeuralEF is a viable solution to accelerate large-scale simulation while handling discontinuous behavior.
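For context, the convex program that NeuralEF learns to approximate is the classical mean-variance problem. The sketch below traces a few frontier points on synthetic data by minimizing portfolio variance subject to a budget constraint and a target return; the asset statistics and constraint set are arbitrary assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_assets = 5
mu = rng.uniform(0.02, 0.10, n_assets)                 # assumed expected returns
A = rng.normal(size=(n_assets, n_assets))
Sigma = A @ A.T / n_assets + 1e-3 * np.eye(n_assets)   # positive-definite covariance

def frontier_point(target_return):
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0},
            {"type": "eq", "fun": lambda w: w @ mu - target_return}]
    res = minimize(lambda w: w @ Sigma @ w, x0=np.full(n_assets, 1 / n_assets),
                   constraints=cons, bounds=[(0, 1)] * n_assets, method="SLSQP")
    return res.x, float(res.fun)

for r in np.linspace(mu.min() * 1.05, mu.max() * 0.95, 5):
    w, var = frontier_point(r)
    print(round(r, 3), round(var, 4), np.round(w, 2))
```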

Importance-Weighted Offline Learning Done Right

  • paper_url: http://arxiv.org/abs/2309.15771
  • repo_url: None
  • paper_authors: Germano Gabbianelli, Gergely Neu, Matteo Papini
  • for: Offline policy optimization in stochastic contextual bandits: learning a near-optimal policy from a dataset collected by a suboptimal behavior policy, without structural assumptions on the reward function.
  • methods: Instead of the standard approach of importance-weighted value estimates with a "pessimistic" adjustment, uses a simple alternative based on the "implicit exploration" estimator of Neu (2015).
  • results: Yields performance guarantees superior to all previous results in nearly all terms, removing the highly restrictive "uniform coverage" assumption, extending to infinite policy classes in a PAC-Bayesian fashion, and showing robustness to the choice of hyper-parameters in numerical simulations.
    Abstract We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this class. In this setting, a standard approach is to compute importance-weighted estimators of the value of each policy, and select a policy that minimizes the estimated value up to a "pessimistic" adjustment subtracted from the estimates to reduce their random fluctuations. In this paper, we show that a simple alternative approach based on the "implicit exploration" estimator of \citet{Neu2015} yields performance guarantees that are superior in nearly all possible terms to all previous results. Most notably, we remove an extremely restrictive "uniform coverage" assumption made in all previous works. These improvements are made possible by the observation that the upper and lower tails importance-weighted estimators behave very differently from each other, and their careful control can massively improve on previous results that were all based on symmetric two-sided concentration inequalities. We also extend our results to infinite policy classes in a PAC-Bayesian fashion, and showcase the robustness of our algorithm to the choice of hyper-parameters by means of numerical simulations.
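The difference between the two estimators can be seen in a small simulation. The sketch below is illustrative (the uniform logging policy, Bernoulli rewards, and the constant gamma are assumptions, not the paper's construction): the standard importance-weighted value estimate divides the observed reward by the logging propensity, while the implicit-exploration variant adds gamma to the denominator, trading a small bias for lighter upper tails.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n = 5, 10_000
logging = np.full(n_actions, 1 / n_actions)          # behavior policy (uniform)
target = np.eye(n_actions)[2]                        # deterministic target policy: always action 2
true_reward = rng.uniform(0, 1, n_actions)

actions = rng.choice(n_actions, size=n, p=logging)
rewards = rng.binomial(1, true_reward[actions]).astype(float)

def iw_value(gamma=0.0):
    """Importance-weighted value estimate; gamma > 0 gives the implicit-exploration variant."""
    w = target[actions] / (logging[actions] + gamma)
    return float(np.mean(w * rewards))

print("true value:", true_reward[2])
print("standard IW:", iw_value(0.0), " implicit exploration (gamma=0.05):", iw_value(0.05))
```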

Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator

  • paper_url: http://arxiv.org/abs/2309.15769
  • repo_url: https://github.com/deshen24/ols_interpolator
  • paper_authors: Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon
  • for: Studying the behavior of the ordinary least squares (OLS) interpolator in high-dimensional settings, to understand its ability to generalize and its implications for causal inference.
  • methods: Provides high-dimensional algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator, including equivalents of the leave-$k$-out residual formula, Cochran's formula, and the Frisch-Waugh-Lovell theorem.
  • results: Under the Gauss-Markov model, the paper presents a high-dimensional extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors, with simulation studies exploring the stochastic properties of the OLS interpolator.
    Abstract Deep learning research has uncovered the phenomenon of benign overfitting for over-parameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical settings, its behavior in high-dimensional settings is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide high-dimensional algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Additionally, under the Gauss-Markov model, we present statistical results such as a high-dimensional extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors. To substantiate our theoretical contributions, we conduct simulation studies that further explore the stochastic properties of the OLS interpolator.
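The object under study can be reproduced in a few lines of synthetic-data code: in the overparameterized regime p > n, the minimum $\ell_2$-norm OLS solution given by the pseudoinverse interpolates the training responses exactly. This is a generic illustration, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                         # overparameterized: more features than samples
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p) / np.sqrt(p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = np.linalg.pinv(X) @ y       # minimum l2-norm solution among all interpolators
print("max train residual:", float(np.abs(X @ beta_hat - y).max()))   # ~0: exact interpolation
print("norm of min-norm solution:", float(np.linalg.norm(beta_hat)))
```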

Provably Efficient Exploration in Constrained Reinforcement Learning:Posterior Sampling Is All You Need

  • paper_url: http://arxiv.org/abs/2309.15737
  • repo_url: None
  • paper_authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
  • for: Provably efficient exploration in constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting.
  • methods: Proposes a posterior sampling algorithm that is also empirically advantageous compared to existing algorithms.
  • results: The main theoretical result is a Bayesian regret bound of $\tilde{O}(HS\sqrt{AT})$ for each cost component in any communicating CMDP with $S$ states, $A$ actions, and hitting-time bound $H$; this matches the lower bound in the order of the time horizon $T$ and is the best-known regret bound for communicating CMDPs in this setting. Empirically, despite its simplicity, the posterior sampling algorithm outperforms existing constrained reinforcement learning algorithms.
    Abstract We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of \tilde{O} (HS \sqrt{AT}) for any communicating CMDP with S states, A actions, and bound on the hitting time H. This regret bound matches the lower bound in order of time horizon T and is the best-known regret bound for communicating CMDPs in the infinite-horizon undiscounted setting. Empirical results show that, despite its simplicity, our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.

Deep Learning-based Analysis of Basins of Attraction

  • paper_url: http://arxiv.org/abs/2309.15732
  • repo_url: https://github.com/redlynx96/deep-learning-based-analysis-of-basins-of-attraction
  • paper_authors: David Valle, Alexandre Wagemakers, Miguel A. F. Sanjuán
  • for: Using convolutional neural networks (CNNs) to characterize the complexity and unpredictability of basins of attraction for diverse dynamical systems.
  • methods: Applies CNNs to efficiently explore different parameters of dynamical systems, since conventional methods are computationally expensive for characterizing multiple basins of attraction.
  • results: A comparison of different CNN architectures shows that the proposed characterization method outperforms conventional methods, even with obsolete architectures.
    Abstract This study showcases the effectiveness of convolutional neural networks (CNNs) in characterizing the complexity and unpredictability of basins of attraction for diverse dynamical systems. This novel method is optimal for exploring different parameters of dynamical systems since the conventional methods are computationally expensive for characterizing multiple basins of attraction. Additionally, our research includes a comparison of different CNN architectures for this task showing the superiority of our proposed characterization method over the conventional methods, even with obsolete architectures.

Temporal graph models fail to capture global temporal dynamics

  • paper_url: http://arxiv.org/abs/2309.15730
  • repo_url: https://github.com/temporal-graphs-negative-sampling/tgb
  • paper_authors: Michał Daniluk, Jacek Dąbrowski
  • for: prediction with dynamic (temporal) graph models, in particular on datasets with strong global temporal dynamics.
  • methods: a trivial optimization-free baseline of "recently popular nodes", together with two Wasserstein-distance-based measures that quantify the strength of a dataset's short-term and long-term global dynamics.
  • results: the simple baseline outperforms other methods on medium and large datasets, and standard negative-sampling evaluation can be unsuitable for datasets with strong global dynamics, potentially causing model degeneration during training.
    Abstract A recently released Temporal Graph Benchmark is analyzed in the context of Dynamic Link Property Prediction. We outline our observations and propose a trivial optimization-free baseline of "recently popular nodes" outperforming other methods on medium and large-size datasets in the Temporal Graph Benchmark. We propose two measures based on Wasserstein distance which can quantify the strength of short-term and long-term global dynamics of datasets. By analyzing our unexpectedly strong baseline, we show how standard negative sampling evaluation can be unsuitable for datasets with strong temporal dynamics. We also show how simple negative-sampling can lead to model degeneration during training, resulting in impossible to rank, fully saturated predictions of temporal graph networks. We propose improved negative sampling schemes for both training and evaluation and prove their usefulness. We conduct a comparison with a model trained non-contrastively without negative sampling. Our results provide a challenging baseline and indicate that temporal graph network architectures need deep rethinking for usage in problems with significant global dynamics, such as social media, cryptocurrency markets or e-commerce. We open-source the code for baselines, measures and proposed negative sampling schemes.
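As a rough illustration of the optimization-free baseline described above, the sketch below scores candidate destination nodes by how often they appeared in recent edges; the exact window and scoring used in the paper may differ, and the function name is ours.

```python
from collections import Counter

def recently_popular_scores(past_edges, window):
    """Score candidate destinations by how often they appeared recently.

    past_edges: list of (timestamp, src, dst) tuples sorted by timestamp.
    window: only edges with timestamp >= (latest - window) are counted.
    """
    if not past_edges:
        return Counter()
    latest = past_edges[-1][0]
    recent = [dst for t, _, dst in past_edges if t >= latest - window]
    return Counter(recent)

# Rank candidates for the next interaction by recent popularity.
edges = [(1, "a", "x"), (2, "b", "x"), (3, "c", "y"), (4, "d", "x")]
scores = recently_popular_scores(edges, window=2)
print(scores.most_common())  # [('x', 2), ('y', 1)]
```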

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

  • paper_url: http://arxiv.org/abs/2309.15717
  • repo_url: None
  • paper_authors: Frank Cwitkowitz, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji
  • for: improving music transcription performance, especially on low-resource tasks.
  • methods: Timbre-Trap, a novel framework that unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre.
  • results: the framework reaches transcription performance comparable to state-of-the-art methods while requiring only a small amount of annotated data.
    Abstract In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single U-Net to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to timbre-less audio, which can be interpreted as pitch salience. We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods, while only requiring a small amount of annotated data.

Maximum Weight Entropy

  • paper_url: http://arxiv.org/abs/2309.15704
  • repo_url: https://github.com/antoinedemathelin/openood
  • paper_authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis
  • for: uncertainty quantification and out-of-distribution detection in deep learning with Bayesian and ensemble methods.
  • methods: a practical maximum-entropy weight distribution that counters the over-restricted weight sampling of standard, over-regularized approaches when used out-of-distribution.
  • results: the proposed algorithm ranks among the top three methods in all configurations of an extensive out-of-distribution detection benchmark with more than thirty competitors.
    Abstract This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a practical solution to the lack of prediction diversity observed recently for standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et al., 2021). Considering that this issue is mainly related to a lack of weight diversity, we claim that standard methods sample in "over-restricted" regions of the weight space due to the use of "over-regularization" processes, such as weight decay and zero-mean centered Gaussian priors. We propose to solve the problem by adopting the maximum entropy principle for the weight distribution, with the underlying idea to maximize the weight diversity. Under this paradigm, the epistemic uncertainty is described by the weight distribution of maximal entropy that produces neural networks "consistent" with the training observations. Considering stochastic neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy. We develop a novel weight parameterization for the stochastic model, based on the singular value decomposition of the neural network's hidden representations, which enables a large increase of the weight entropy for a small empirical risk penalization. We provide both theoretical and numerical results to assess the efficiency of the approach. In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.

Breaking NoC Anonymity using Flow Correlation Attack

  • paper_url: http://arxiv.org/abs/2309.15687
  • repo_url: None
  • paper_authors: Hansika Weerasena, Pan Zhixin, Khushboo Rani, Prabhat Mishra
  • for: the security of the Network-on-Chip (NoC), the internal communication fabric of today's multicore System-on-Chip (SoC) designs.
  • methods: an analysis of existing anonymous routing protocols, and a lightweight anonymous routing scheme that uses traffic obfuscation to defend against machine learning (ML)-based flow correlation attacks.
  • results: experiments with real and synthetic traffic show that existing anonymous routing is vulnerable to ML-based flow correlation attacks (up to 99% accuracy across diverse traffic patterns), while the proposed lightweight countermeasure defends against such attacks with minor hardware and performance overhead.
    Abstract Network-on-Chip (NoC) is widely used as the internal communication fabric in today's multicore System-on-Chip (SoC) designs. Security of the on-chip communication is crucial because exploiting any vulnerability in shared NoC would be a goldmine for an attacker. NoC security relies on effective countermeasures against diverse attacks. We investigate the security strength of existing anonymous routing protocols in NoC architectures. Specifically, this paper makes two important contributions. We show that the existing anonymous routing is vulnerable to machine learning (ML) based flow correlation attacks on NoCs. We propose a lightweight anonymous routing that use traffic obfuscation techniques which can defend against ML-based flow correlation attacks. Experimental studies using both real and synthetic traffic reveal that our proposed attack is successful against state-of-the-art anonymous routing in NoC architectures with a high accuracy (up to 99%) for diverse traffic patterns, while our lightweight countermeasure can defend against ML-based attacks with minor hardware and performance overhead.

Projection based fuzzy least squares twin support vector machine for class imbalance problems

  • paper_url: http://arxiv.org/abs/2309.15886
  • repo_url: None
  • paper_authors: M. Tanveer, Ritik Mishra, Bharat Richhariya
  • for: addresses the problem of class imbalance and noisy datasets in real-world classification tasks
  • methods: proposes two novel fuzzy-based approaches, IF-RELSTSVM and F-RELSTSVM, which use intuitionistic fuzzy membership and hyperplane-based fuzzy membership, respectively
  • results: outperforms baseline algorithms on several benchmark and synthetic datasets, with statistical tests confirming the significance of the proposed algorithms on noisy and imbalanced datasets.
    Abstract Class imbalance is a major problem in many real world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to remove these problems we propose a novel fuzzy based approach to deal with class imbalanced as well noisy datasets. We propose two approaches to address these problems. The first approach is based on the intuitionistic fuzzy membership, termed as robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed as robust energy-based fuzzy least square twin support vector machine (F-RELSTSVM). By using this technique, the membership values are based on a projection based approach, where the data points are projected on the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms on noisy as well as imbalanced datasets.

Joint Sampling and Optimisation for Inverse Rendering

  • paper_url: http://arxiv.org/abs/2309.15676
  • repo_url: None
  • paper_authors: Martin Balint, Karol Myszkowski, Hans-Peter Seidel, Gurprit Singh
  • for: solve difficult inverse problems such as inverse rendering using Monte Carlo estimated gradients
  • methods: use interleaving sampling and optimisation, update and reuse past samples with low-variance finite-difference estimators, combine proportional and finite-difference samples to continuously reduce variance
  • results: speed up convergence of optimisation tasks, demonstrate effectiveness in inverse path tracing
    Abstract When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this variance trivially. However, for problems that require thousands of optimisation iterations, the computational cost of this approach rises quickly. We derive a theoretical framework for interleaving sampling and optimisation. We update and reuse past samples with low-variance finite-difference estimators that describe the change in the estimated gradients between each iteration. By combining proportional and finite-difference samples, we continuously reduce the variance of our novel gradient meta-estimators throughout the optimisation process. We investigate how our estimator interlinks with Adam and derive a stable combination. We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.

On Computational Entanglement and Its Interpretation in Adversarial Machine Learning

  • paper_url: http://arxiv.org/abs/2309.15669
  • repo_url: None
  • paper_authors: YenLung Lai, Xingbo Dong, Zhe Jin
  • for: This paper explores the intrinsic complexity and interpretability of adversarial machine learning models.
  • methods: The authors define entanglement computationally and demonstrate the existence of strong correlations between distant feature samples, akin to entanglement in the quantum realm.
  • results: The study reveals links between machine learning model complexity and Einstein’s theory of special relativity, challenging conventional perspectives on adversarial transferability and providing insights into more robust and interpretable models.
    Abstract Adversarial examples in machine learning has emerged as a focal point of research due to their remarkable ability to deceive models with seemingly inconspicuous input perturbations, potentially resulting in severe consequences. In this study, we embark on a comprehensive exploration of adversarial machine learning models, shedding light on their intrinsic complexity and interpretability. Our investigation reveals intriguing links between machine learning model complexity and Einstein's theory of special relativity, through the concept of entanglement. More specific, we define entanglement computationally and demonstrate that distant feature samples can exhibit strong correlations, akin to entanglement in quantum realm. This revelation challenges conventional perspectives in describing the phenomenon of adversarial transferability observed in contemporary machine learning models. By drawing parallels with the relativistic effects of time dilation and length contraction during computation, we gain deeper insights into adversarial machine learning, paving the way for more robust and interpretable models in this rapidly evolving field.

Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency

  • paper_url: http://arxiv.org/abs/2309.15659
  • repo_url: None
  • paper_authors: Long Tan Le, Tuan Dung Nguyen, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran
  • for: a distributed learning framework based on deep equilibrium learning and consensus optimization that addresses the communication bottlenecks, data heterogeneity, and memory limitations of edge-AI deployments.
  • methods: a compact shared representation is learned across edge nodes and used to derive personalized models for each node; the model consists of an equilibrium layer followed by traditional neural network layers, trained with an ADMM-based consensus algorithm.
  • results: experiments across benchmarks show that FeDEQ matches state-of-the-art personalized methods while using models up to 4 times smaller in communication size and with a 1.5 times lower training memory footprint.
    Abstract Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhance privacy-preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models of up to 4 times smaller in communication size and 1.5 times lower memory footprint during training.
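The equilibrium layer mentioned above can be illustrated with a minimal sketch: the shared representation is defined implicitly as a fixed point z = f(z, x) and solved here by plain fixed-point iteration. FeDEQ's actual architecture, personalization, and ADMM-based training are not reproduced; the weights and the small-spectral-norm trick below are assumptions made for the demo.

```python
import numpy as np

def equilibrium_layer(x, W, U, b, iters=50, tol=1e-6):
    """Solve z = tanh(W @ z + U @ x + b) by fixed-point iteration (a DEQ-style layer)."""
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16
W = 0.1 * rng.normal(size=(d_hidden, d_hidden))  # small norm encourages a contraction
U = rng.normal(size=(d_hidden, d_in))
b = np.zeros(d_hidden)
z_star = equilibrium_layer(rng.normal(size=d_in), W, U, b)
# A personalized local head would then map the shared z_star to client-specific outputs.
```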

SANGEA: Scalable and Attributed Network Generation

  • paper_url: http://arxiv.org/abs/2309.15648
  • repo_url: None
  • paper_authors: Valentin Lemaire, Youssef Achenchabe, Lucas Ody, Houssem Eddine Souid, Gianmarco Aversano, Nicolas Posocco, Sabri Skhiri
  • for: extending the applicability of existing synthetic graph generators (SGGs) to large graphs.
  • methods: the large graph is first split into communities, one SGG is trained per community, and the generated community graphs are then linked back together.
  • results: experiments show that the generated graphs are similar to the original in both topology and node feature distribution, achieve high utility on downstream tasks such as link prediction, and obtain reasonable privacy scores.
    Abstract The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling. However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in $\mathcal{O}(N^2)$, with $N$ being the number of nodes in the graph. For this reason, many state-of-the-art SGGs are not applicable to large graphs. In this paper, we present SANGEA, a sizeable synthetic graph generation framework which extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph. Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.
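A plausible sketch of the community-splitting step is shown below using networkx's greedy modularity communities; the per-community generator and the final linking step are SANGEA's contribution and appear only as placeholder comments.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Sketch of the partitioning step only (placeholder names, not the SANGEA code):
# split a graph into communities, then each community subgraph would be handed
# to its own synthetic graph generator (SGG).
G = nx.karate_club_graph()
communities = greedy_modularity_communities(G)

subgraphs = [G.subgraph(c).copy() for c in communities]
for i, sg in enumerate(subgraphs):
    print(f"community {i}: {sg.number_of_nodes()} nodes, {sg.number_of_edges()} edges")
    # train_sgg(sg)  # placeholder: one generator per community, as described above
# Finally, the generated community graphs are linked back into one large synthetic graph.
```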

Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.15646
  • repo_url: None
  • paper_authors: Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan
  • for: solve the user cold-start problem in the matching stage of recommender systems.
  • methods: utilize side information or meta-learning to model cold-start users, and incorporate the results from two experts using a gate network. Additionally, dynamic knowledge distillation is used to assist experts in better learning user representation, and comprehensive mutual information is used to select highly relevant features for the bias net.
  • results: outperform other models on all user types on public datasets, and achieve a significant increase in app dwell time and user retention rate on an industrial short video platform.
    Abstract Cold-start recommendation is one of the major challenges faced by recommender systems (RS). Herein, we focus on the user cold-start problem. Recently, methods utilizing side information or meta-learning have been used to model cold-start users. However, it is difficult to deploy these methods to industrial RS. There has not been much research that pays attention to the user cold-start problem in the matching stage. In this paper, we propose Cold & Warm Net based on expert models who are responsible for modeling cold-start and warm-up users respectively. A gate network is applied to incorporate the results from two experts. Furthermore, dynamic knowledge distillation acting as a teacher selector is introduced to assist experts in better learning user representation. With comprehensive mutual information, features highly relevant to user behavior are selected for the bias net which explicitly models user behavior bias. Finally, we evaluate our Cold & Warm Net on public datasets in comparison to models commonly applied in the matching stage and it outperforms other models on all user types. The proposed model has also been deployed on an industrial short video platform and achieves a significant increase in app dwell time and user retention rate.

Why do Angular Margin Losses work well for Semi-Supervised Anomalous Sound Detection?

  • paper_url: http://arxiv.org/abs/2309.15643
  • repo_url: None
  • paper_authors: Kevin Wilkinghoff, Frank Kurth
  • for: investigating why angular margin losses with auxiliary tasks work well for detecting anomalous sounds.
  • methods: angular margin losses are combined with a related classification task as the auxiliary task; it is shown, both theoretically and experimentally, that minimizing the angular margin loss also minimizes compactness loss while preventing trivial solutions.
  • results: experiments show that a related classification auxiliary task teaches the model representations suitable for detecting anomalous sounds, significantly outperforming generative and one-class models in noisy conditions.
    Abstract State-of-the-art anomalous sound detection systems often utilize angular margin losses to learn suitable representations of acoustic data using an auxiliary task, which usually is a supervised or self-supervised classification task. The underlying idea is that, in order to solve this auxiliary task, specific information about normal data needs to be captured in the learned representations and that this information is also sufficient to differentiate between normal and anomalous samples. Especially in noisy conditions, discriminative models based on angular margin losses tend to significantly outperform systems based on generative or one-class models. The goal of this work is to investigate why using angular margin losses with auxiliary tasks works well for detecting anomalous sounds. To this end, it is shown, both theoretically and experimentally, that minimizing angular margin losses also minimizes compactness loss while inherently preventing learning trivial solutions. Furthermore, multiple experiments are conducted to show that using a related classification task as an auxiliary task teaches the model to learn representations suitable for detecting anomalous sounds in noisy conditions. Among these experiments are performance evaluations, visualizing the embedding space with t-SNE and visualizing the input representations with respect to the anomaly score using randomized input sampling for explanation.
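For readers unfamiliar with this loss family, here is a minimal ArcFace-style additive angular margin loss in PyTorch; the paper analyzes such losses in general, so the margin, scale, and auxiliary-task setup below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def angular_margin_loss(embeddings, labels, class_centers, margin=0.2, scale=30.0):
    """ArcFace-style additive angular margin loss (illustrative, not the paper's exact loss)."""
    emb = F.normalize(embeddings, dim=1)          # unit-length embeddings
    centers = F.normalize(class_centers, dim=1)   # unit-length class centers
    cos = emb @ centers.t()                       # cosine similarity to every class
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, cos.size(1)).bool()
    # Add the margin only to the angle of the correct class, then rescale.
    logits = torch.where(target, torch.cos(theta + margin), cos) * scale
    return F.cross_entropy(logits, labels)

emb = torch.randn(8, 64, requires_grad=True)
centers = torch.randn(5, 64, requires_grad=True)  # one learnable center per auxiliary class
loss = angular_margin_loss(emb, torch.randint(0, 5, (8,)), centers)
loss.backward()
```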

Efficient tensor network simulation of IBM’s largest quantum processors

  • paper_url: http://arxiv.org/abs/2309.15642
  • repo_url: None
  • paper_authors: Siddhartha Patra, Saeed S. Jahromi, Sukhbinder Singh, Roman Orus
  • for: The paper demonstrates how quantum-inspired 2D tensor networks can be used to efficiently and accurately simulate large quantum processors, specifically the IBM Eagle, Osprey, and Condor processors.
  • methods: The paper uses graph-based Projected Entangled Pair States (gPEPS) to simulate the dynamics of a complex quantum many-body system, the kicked Ising experiment.
  • results: The paper achieves very large unprecedented accuracy with remarkably low computational resources for this model, and extends the results to larger qubit counts (433 and 1121 qubits) and longer evolution times. Additionally, the paper demonstrates accurate simulations for infinitely-many qubits.
    Abstract We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system -- specifically, the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023) -- using graph-based Projected Entangled Pair States (gPEPS), which was proposed by some of us in PRB 99, 195105 (2019). Our results show that simple tensor updates are already sufficient to achieve very large unprecedented accuracy with remarkably low computational resources for this model. Apart from simulating the original experiment for 127 qubits, we also extend our results to 433 and 1121 qubits, and for evolution times around 8 times longer, thus setting a benchmark for the newest IBM quantum machines. We also report accurate simulations for infinitely-many qubits. Our results show that gPEPS are a natural tool to efficiently simulate quantum computers with an underlying lattice-based qubit connectivity, such as all quantum processors based on superconducting qubits.

Enhancing Sharpness-Aware Optimization Through Variance Suppression

  • paper_url: http://arxiv.org/abs/2309.15639
  • repo_url: None
  • paper_authors: Bingcong Li, Georgios B. Giannakis
  • for: improving the generalization of deep neural networks without sizable data augmentation.
  • methods: building on the geometry of the loss function, where neighborhoods of 'flat minima' improve generalization, sharpness-aware minimization seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing the parameters within a neighborhood.
  • results: a new variance-suppressed sharpness-aware optimizer (VaSSO) that stabilizes the adversary and avoids the 'over-friendly adversary' problem, with provable stability, numerical improvements over SAM on model-agnostic tasks such as image classification and machine translation, and robustness to high levels of label noise.
    Abstract Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
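As background, here is a minimal sketch of the base sharpness-aware minimization (SAM) step that VaSSO builds on: ascend along the normalized gradient to a worst-case neighbor, take the gradient there, restore the weights, and update. VaSSO's variance suppression of the adversarial perturbation is not shown; rho and the helper name are assumptions.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One sharpness-aware minimization step (base SAM; VaSSO additionally stabilizes the perturbation)."""
    model.zero_grad()
    # 1) gradient at the current weights
    loss_fn(model(x), y).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None))
    # 2) ascend to the worst-case neighbor w + e_w within the rho-ball
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()
    # 3) gradient at the perturbed weights, then restore and update the original weights
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```

In practice this wraps a standard base optimizer such as SGD or Adam, which is the setting the paper's stability analysis targets.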

Entropic Matching for Expectation Propagation of Markov Jump Processes

  • paper_url: http://arxiv.org/abs/2309.15604
  • repo_url: None
  • paper_authors: Bastian Alt, Heinz Koeppl
  • for: statistical inference for latent continuous-time stochastic processes, which is often intractable, particularly for discrete-state-space processes described by Markov jump processes.
  • methods: a new tractable inference scheme based on an entropic matching framework that can be embedded into the well-known expectation propagation algorithm; closed-form results are given for a simple family of approximate distributions and applied to the general class of chemical reaction networks, a crucial modeling tool in systems biology.
  • results: closed-form expressions for point estimation of the underlying parameters via an approximate expectation-maximization procedure, evaluated on several chemical reaction network instantiations including a stochastic Lotka-Volterra example, together with a discussion of limitations and potential future improvements.
    Abstract This paper addresses the problem of statistical inference for latent continuous-time stochastic processes, which is often intractable, particularly for discrete state space processes described by Markov jump processes. To overcome this issue, we propose a new tractable inference scheme based on an entropic matching framework that can be embedded into the well-known expectation propagation algorithm. We demonstrate the effectiveness of our method by providing closed-form results for a simple family of approximate distributions and apply it to the general class of chemical reaction networks, which are a crucial tool for modeling in systems biology. Moreover, we derive closed form expressions for point estimation of the underlying parameters using an approximate expectation maximization procedure. We evaluate the performance of our method on various chemical reaction network instantiations, including a stochastic Lotka-Voltera example, and discuss its limitations and potential for future improvements. Our proposed approach provides a promising direction for addressing complex continuous-time Bayesian inference problems.

Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization

  • paper_url: http://arxiv.org/abs/2309.15603
  • repo_url: None
  • paper_authors: Bang Giang Le, Viet Cuong Ta
  • for: improving the data efficiency of multi-task reinforcement learning agents by transferring knowledge between different but related tasks.
  • methods: an optimal-transport-based reward that stabilizes knowledge transfer; the optimal transport distance between the state distributions of tasks is approximated with the Sinkhorn mapping and used as an amortized reward.
  • results: on several grid-based multi-goal navigation tasks, the method speeds up the agents' learning process and outperforms several baselines.
    Abstract In multi-task reinforcement learning, it is possible to improve the data efficiency of training agents by transferring knowledge from other different but related tasks. Because the experiences from different tasks are usually biased toward the specific task goals. Traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge from one task to the others. In this work, we explore the direction of replacing the Kullback-Leibler divergence with a novel Optimal transport-based regularization. By using the Sinkhorn mapping, we can approximate the Optimal transport distance between the state distribution of tasks. The distance is then used as an amortized reward to regularize the amount of sharing information. We experiment our frameworks on several grid-based navigation multi-goal to validate the effectiveness of the approach. The results show that our added Optimal transport-based rewards are able to speed up the learning process of agents and outperforms several baselines on multi-task learning.
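The Sinkhorn approximation mentioned above can be sketched in a few lines of numpy: alternating scaling updates yield an entropy-regularized transport plan whose cost approximates the optimal transport distance between two state histograms. How this distance is shaped into an amortized reward follows the paper and is omitted; epsilon and the iteration count are illustrative.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.1, iters=200):
    """Entropy-regularized OT distance between histograms a and b with cost matrix `cost`."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    transport = u[:, None] * K * v[None, :]   # approximate optimal transport plan
    return float(np.sum(transport * cost))

# Toy example: two distributions over 4 discrete states with |i - j| ground cost.
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.3, 0.4])
cost = np.abs(np.subtract.outer(np.arange(4), np.arange(4))).astype(float)
print(sinkhorn_distance(a, b, cost))
```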

OceanBench: The Sea Surface Height Edition

  • paper_url: http://arxiv.org/abs/2309.15599
  • repo_url: https://github.com/jejjohnson/oceanbench
  • paper_authors: J. Emmanuel Johnson, Quentin Febvre, Anastasia Gorbunova, Sammy Metref, Maxime Ballarotta, Julien Le Sommer, Ronan Fablet
  • for: This paper aims to provide a unifying framework for machine learning (ML) researchers to benchmark their models and customize their pipelines for ocean satellite data interpolation challenges.
  • methods: The paper uses satellite remote sensing data and machine learning techniques to develop a standardized processing framework called OceanBench, which provides plug-and-play data and pre-configured pipelines for ML researchers.
  • results: The paper demonstrates the effectiveness of the OceanBench framework through a first edition dedicated to sea surface height (SSH) interpolation challenges, providing datasets and ML-ready benchmarking pipelines for simulated ocean satellite data, multi-modal and multi-sensor fusion issues, and transfer-learning to real ocean satellite observations.
    Abstract The ocean profoundly influences human activities and plays a critical role in climate regulation. Our understanding has improved over the last decades with the advent of satellite remote sensing data, allowing us to capture essential quantities over the globe, e.g., sea surface height (SSH). However, ocean satellite data presents challenges for information extraction due to their sparsity and irregular sampling, signal complexity, and noise. Machine learning (ML) techniques have demonstrated their capabilities in dealing with large-scale, complex signals. Therefore we see an opportunity for ML models to harness the information contained in ocean satellite data. However, data representation and relevant evaluation metrics can be the defining factors when determining the success of applied ML. The processing steps from the raw observation data to a ML-ready state and from model outputs to interpretable quantities require domain expertise, which can be a significant barrier to entry for ML researchers. OceanBench is a unifying framework that provides standardized processing steps that comply with domain-expert standards. It provides plug-and-play data and pre-configured pipelines for ML researchers to benchmark their models and a transparent configurable framework for researchers to customize and extend the pipeline for their tasks. In this work, we demonstrate the OceanBench framework through a first edition dedicated to SSH interpolation challenges. We provide datasets and ML-ready benchmarking pipelines for the long-standing problem of interpolating observations from simulated ocean satellite data, multi-modal and multi-sensor fusion issues, and transfer-learning to real ocean satellite observations. The OceanBench framework is available at github.com/jejjohnson/oceanbench and the dataset registry is available at github.com/quentinf00/oceanbench-data-registry.

Exciton-Polariton Condensates: A Fourier Neural Operator Approach

  • paper_url: http://arxiv.org/abs/2309.15593
  • repo_url: None
  • paper_authors: Surya T. Sathujoda, Yuan Wang, Kanishk Gandhi
  • for: a machine-learning-based Fourier Neural Operator approach to the nonlinear dynamics of exciton-polariton condensate systems.
  • methods: a Fourier Neural Operator is used to solve the Gross-Pitaevskii equations coupled with extra exciton rate equations.
  • results: the method predicts final-state solutions with high accuracy roughly 1000 times faster than CUDA-based GPU solvers, opening the way toward all-optical chip design workflows that integrate experimental data.
    Abstract Advancements in semiconductor fabrication over the past decade have catalyzed extensive research into all-optical devices driven by exciton-polariton condensates. Preliminary validations of such devices, including transistors, have shown encouraging results even under ambient conditions. A significant challenge still remains for large scale application however: the lack of a robust solver that can be used to simulate complex nonlinear systems which require an extended period of time to stabilize. Addressing this need, we propose the application of a machine-learning-based Fourier Neural Operator approach to find the solution to the Gross-Pitaevskii equations coupled with extra exciton rate equations. This work marks the first direct application of Neural Operators to an exciton-polariton condensate system. Our findings show that the proposed method can predict final-state solutions to a high degree of accuracy almost 1000 times faster than CUDA-based GPU solvers. Moreover, this paves the way for potential all-optical chip design workflows by integrating experimental data.

Demographic Parity: Mitigating Biases in Real-World Data

  • paper_url: http://arxiv.org/abs/2309.17347
  • repo_url: None
  • paper_authors: Orestis Loukas, Ho-Ryun Chung
  • for: a robust method that removes unwanted biases from computer-aided decision systems while maximally preserving classification utility.
  • methods: from real-world data, an asymptotic dataset is derived that uniquely encodes demographic parity and realism and can be used to train well-established classifiers.
  • results: benchmarking confirms that classifiers trained on synthetic samples generated from this dataset show no explicit or implicit bias.
    Abstract Computer-based decision systems are widely used to automate decisions in many aspects of everyday life, which include sensitive areas like hiring, loaning and even criminal sentencing. A decision pipeline heavily relies on large volumes of historical real-world data for training its models. However, historical training data often contains gender, racial or other biases which are propagated to the trained models influencing computer-based decisions. In this work, we propose a robust methodology that guarantees the removal of unwanted biases while maximally preserving classification utility. Our approach can always achieve this in a model-independent way by deriving from real-world data the asymptotic dataset that uniquely encodes demographic parity and realism. As a proof-of-principle, we deduce from public census records such an asymptotic dataset from which synthetic samples can be generated to train well-established classifiers. Benchmarking the generalization capability of these classifiers trained on our synthetic data, we confirm the absence of any explicit or implicit bias in the computer-aided decision.
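As a reference point for the parity criterion discussed above, here is a small sketch that measures the demographic parity difference of binary predictions across a binary protected group; the paper's asymptotic-dataset construction itself is not reproduced here.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """|P(yhat=1 | group=0) - P(yhat=1 | group=1)| for binary predictions and a binary group."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_0 = y_pred[group == 0].mean()
    rate_1 = y_pred[group == 1].mean()
    return abs(rate_0 - rate_1)

# Toy check: identical positive rates across groups give a difference of 0.
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, group))  # 0.0
```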

Towards Faithful Neural Network Intrinsic Interpretation with Shapley Additive Self-Attribution

  • paper_url: http://arxiv.org/abs/2309.15559
  • repo_url: None
  • paper_authors: Ying Sun, Hengshu Zhu, Hui Xiong
  • for: a self-interpreting neural network with a solid theoretical foundation for genuine interpretability that does not compromise model expressiveness.
  • methods: a generic Additive Self-Attribution (ASA) framework and the Shapley Additive Self-Attributing Neural Network (SASANet), whose self-attribution values are theoretically guaranteed to equal the output's Shapley values; SASANet uses a marginal-contribution-based sequential schema and internal distillation-based training to model meaningful outputs for any number of features without approximating the value function.
  • results: experiments show that SASANet outperforms existing self-attributing models, rivals black-box models, and interprets its own predictions more precisely and efficiently than post-hoc methods.
    Abstract Self-interpreting neural networks have garnered significant interest in research. Existing works in this domain often (1) lack a solid theoretical foundation ensuring genuine interpretability or (2) compromise model expressiveness. In response, we formulate a generic Additive Self-Attribution (ASA) framework. Observing the absence of Shapley value in Additive Self-Attribution, we propose Shapley Additive Self-Attributing Neural Network (SASANet), with theoretical guarantees for the self-attribution value equal to the output's Shapley values. Specifically, SASANet uses a marginal contribution-based sequential schema and internal distillation-based training strategies to model meaningful outputs for any number of features, resulting in un-approximated meaningful value function. Our experimental results indicate SASANet surpasses existing self-attributing models in performance and rivals black-box models. Moreover, SASANet is shown more precise and efficient than post-hoc methods in interpreting its own predictions.

Startup success prediction and VC portfolio simulation using CrunchBase data

  • paper_url: http://arxiv.org/abs/2309.15552
  • repo_url: None
  • paper_authors: Mark Potanin, Andrey Chertok, Konstantin Zorin, Cyril Shtabtsovsky
  • for: This paper aims to predict key success milestones for startups at their Series B and Series C investment stages, such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M&A).
  • methods: The paper introduces a novel deep learning model for predicting startup success, which integrates a variety of factors such as funding metrics, founder features, and industry category. The model uses a comprehensive backtesting algorithm to simulate the venture capital investment process and evaluate its performance against historical data.
  • results: The paper achieved a 14 times capital growth and successfully identified high-potential startups in their B round, including Revolut, DigitalOcean, Klarna, Github, and others. The empirical findings highlight the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy.
    Abstract Predicting startup success presents a formidable challenge due to the inherently volatile landscape of the entrepreneurial ecosystem. The advent of extensive databases like Crunchbase jointly with available open data enables the application of machine learning and artificial intelligence for more accurate predictive analytics. This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M\&A). We introduce novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, industry category. A distinctive feature of our research is the use of a comprehensive backtesting algorithm designed to simulate the venture capital investment process. This simulation allows for a robust evaluation of our model's performance against historical data, providing actionable insights into its practical utility in real-world investment contexts. Evaluating our model on Crunchbase's, we achieved a 14 times capital growth and successfully identified on B round high-potential startups including Revolut, DigitalOcean, Klarna, Github and others. Our empirical findings illuminate the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy. In summary, our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success and sets the stage for future advancements in this research area.

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15531
  • repo_url: None
  • paper_authors: Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
  • for: a new weight-only quantization method that makes serving large language models (LLMs) more efficient in small-batch inference settings.
  • methods: per-IC quantization, which forms quantization groups within each input channel (IC) rather than the conventional per-output-channel (OC) grouping so that activation-induced outliers are isolated within a group, together with Adaptive Dimensions (AdaDim), a versatile quantization framework that adapts to various weight sensitivity patterns.
  • results: augmenting Round-To-Nearest and GPTQ with the proposed schemes yields significant improvements across language modeling benchmarks, up to +4.7% on MMLU for base LLMs and up to +10% on HumanEval for instruction-tuned LLMs.
    Abstract Large Language Models (LLMs) have recently demonstrated a remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to its large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output channel (OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers to be within a group. We also find that activation outliers do not dictate quantization difficulty, and inherent weight sensitivities also exist. With per-IC quantization as a new outlier-friendly scheme, we then propose Adaptive Dimensions (AdaDim), a versatile quantization framework that can adapt to various weight sensitivity patterns. We demonstrate the effectiveness of AdaDim by augmenting prior methods such as Round-To-Nearest and GPTQ, showing significant improvements across various language modeling benchmarks for both base (up to +4.7% on MMLU) and instruction-tuned (up to +10% on HumanEval) LLMs.
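A minimal numpy sketch contrasting conventional per-OC grouping with the per-IC grouping described above: with one outlier-heavy input channel, per-IC groups confine the outliers to their own quantization groups and the reconstruction error drops. Group size, bit width, and the simple round-to-nearest scheme are illustrative assumptions, not the paper's AdaDim implementation.

```python
import numpy as np

def quantize_1d_groups(v, group_size=4, bits=3):
    """Asymmetric round-to-nearest quantization of a 1-D vector in contiguous groups."""
    q_levels = 2 ** bits - 1
    out = np.empty_like(v)
    for s in range(0, len(v), group_size):
        g = v[s:s + group_size]
        lo, hi = g.min(), g.max()
        scale = (hi - lo) / q_levels if hi > lo else 1.0
        out[s:s + group_size] = np.round((g - lo) / scale) * scale + lo
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))        # (output channels, input channels)
W[:, 3] *= 25                        # one outlier-heavy input channel

# Conventional per-OC grouping: groups run across input channels inside each row,
# so every group touching column 3 inherits a huge quantization step.
W_oc = np.apply_along_axis(quantize_1d_groups, 1, W)
# Per-IC grouping: groups run across output channels inside each column, so the
# outlier column is isolated in its own groups and the rest keep small steps.
W_ic = np.apply_along_axis(quantize_1d_groups, 0, W)

print("per-OC mean abs error:", np.abs(W - W_oc).mean())
print("per-IC mean abs error:", np.abs(W - W_ic).mean())
```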

GNN4EEG: A Benchmark and Toolkit for Electroencephalography Classification with Graph Neural Network

  • paper_url: http://arxiv.org/abs/2309.15515
  • repo_url: https://github.com/miracle-2001/gnn4eeg
  • paper_authors: Kaiyuan Zhang, Ziyi Ye, Qingyao Ai, Xiaohui Xie, Yiqun Liu
  • for: a versatile and user-friendly toolkit for Graph Neural Network (GNN)-based modeling of EEG signals.
  • methods: the toolkit comprises (i) a large benchmark built on four EEG classification tasks with EEG data collected from 123 participants, (ii) easy-to-use implementations of state-of-the-art GNN-based EEG classification models such as DGCNN and RGNN, and (iii) comprehensive experimental settings and evaluation protocols, e.g., data splitting and cross-validation protocols.
  • results: using the GNN4EEG toolkit, accurate GNN-based EEG classification can be reproduced and benchmarked.
    Abstract Electroencephalography(EEG) classification is a crucial task in neuroscience, neural engineering, and several commercial applications. Traditional EEG classification models, however, have often overlooked or inadequately leveraged the brain's topological information. Recognizing this shortfall, there has been a burgeoning interest in recent years in harnessing the potential of Graph Neural Networks (GNN) to exploit the topological information by modeling features selected from each EEG channel in a graph structure. To further facilitate research in this direction, we introduce GNN4EEG, a versatile and user-friendly toolkit for GNN-based modeling of EEG signals. GNN4EEG comprises three components: (i)A large benchmark constructed with four EEG classification tasks based on EEG data collected from 123 participants. (ii)Easy-to-use implementations on various state-of-the-art GNN-based EEG classification models, e.g., DGCNN, RGNN, etc. (iii)Implementations of comprehensive experimental settings and evaluation protocols, e.g., data splitting protocols, and cross-validation protocols. GNN4EEG is publicly released at https://github.com/Miracle-2001/GNN4EEG.

Bayesian Personalized Federated Learning with Shared and Personalized Uncertainty Representations

  • paper_url: http://arxiv.org/abs/2309.15499
  • repo_url: None
  • paper_authors: Hui Chen, Hengyu Liu, Longbing Cao, Tiancheng Zhang
  • for: This paper aims to address challenges in existing personalized federated learning (PFL) by introducing a Bayesian personalized federated learning (BPFL) framework that quantifies uncertainty and heterogeneity within and across clients.
  • methods: The BPFL framework uses a Bayesian federated neural network (BPFed) to decompose hidden neural representations into shared and local components, and jointly learns cross-client shared uncertainty and client-specific personalized uncertainty over statistically heterogeneous client data.
  • results: The paper provides theoretical analysis and guarantees, as well as experimental evaluation of BPFed against diversified baselines, to demonstrate the effectiveness of the proposed approach.
    Abstract Bayesian personalized federated learning (BPFL) addresses challenges in existing personalized FL (PFL). BPFL aims to quantify the uncertainty and heterogeneity within and across clients towards uncertainty representations by addressing the statistical heterogeneity of client data. In PFL, some recent preliminary work proposes to decompose hidden neural representations into shared and local components and demonstrates interesting results. However, most of them do not address client uncertainty and heterogeneity in FL systems, while appropriately decoupling neural representations is challenging and often ad hoc. In this paper, we make the first attempt to introduce a general BPFL framework to decompose and jointly learn shared and personalized uncertainty representations on statistically heterogeneous client data over time. A Bayesian federated neural network BPFed instantiates BPFL by jointly learning cross-client shared uncertainty and client-specific personalized uncertainty over statistically heterogeneous and randomly participating clients. We further involve continual updating of prior distribution in BPFed to speed up the convergence and avoid catastrophic forgetting. Theoretical analysis and guarantees are provided in addition to the experimental evaluation of BPFed against the diversified baselines.
    摘要 贝叶斯个性化联邦学习(BPFL)旨在解决现有个性化联邦学习(PFL)中的挑战。BPFL 通过处理客户端数据的统计异质性,量化客户端内部及客户端之间的不确定性与异质性,并得到相应的不确定性表征。在 PFL 中,一些近期的初步工作提出将隐藏神经表示分解为共享和本地两部分,并取得了有趣的结果。然而,这些方法大多没有处理联邦学习系统中客户端的不确定性与异质性,而且如何恰当地解耦神经表示本身就具有挑战性,往往只能依赖经验做法。在本文中,我们首次提出一个通用的 BPFL 框架,用于在统计异质的客户端数据上随时间分解并联合学习共享与个性化的不确定性表征。贝叶斯联邦神经网络 BPFed 实例化了 BPFL,在统计异质且随机参与的客户端上联合学习跨客户端的共享不确定性与各客户端特有的个性化不确定性。我们还在 BPFed 中引入先验分布的持续更新,以加速收敛并避免灾难性遗忘。除与多种基线对比的实验评估外,我们还提供了理论分析与保证。
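As a rough illustration of the shared/personalized decomposition, the sketch below splits each client model into a shared Bayesian layer, whose variational parameters are averaged across clients, and a personalized Bayesian head that never leaves the client. It is a minimal stand-in rather than the BPFed algorithm: the mean-field Gaussian layers, the plain FedAvg-style aggregation, and all sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BayesLinear(nn.Module):
    """Mean-field Gaussian linear layer (weight uncertainty via reparameterization)."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.mu + torch.exp(self.log_sigma) * torch.randn_like(self.mu)  # sample weights
        return x @ w.t() + self.bias

class ClientModel(nn.Module):
    """Hidden representation split into a shared and a personalized uncertain component."""
    def __init__(self, d_in: int, d_hidden: int, n_classes: int):
        super().__init__()
        self.shared = BayesLinear(d_in, d_hidden)         # aggregated across clients
        self.personal = BayesLinear(d_hidden, n_classes)  # stays local to the client

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.personal(torch.relu(self.shared(x)))

def aggregate_shared(models):
    """FedAvg-style averaging of only the shared layer's variational parameters."""
    with torch.no_grad():
        for name in ["mu", "log_sigma", "bias"]:
            mean = torch.stack([getattr(m.shared, name) for m in models]).mean(dim=0)
            for m in models:
                getattr(m.shared, name).copy_(mean)

clients = [ClientModel(10, 16, 3) for _ in range(5)]
aggregate_shared(clients)  # personalized layers are never averaged
```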

Explainable machine learning-based prediction model for diabetic nephropathy

  • paper_url: http://arxiv.org/abs/2309.16730
  • repo_url: None
  • paper_authors: Jing-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, Lang Zou
  • for: The paper aims to analyze the effect of serum metabolites on diabetic nephropathy (DN) and to predict the prevalence of DN through a machine learning approach.
  • methods: The dataset consists of 548 patients from April 2018 to April 2019 at the Second Affiliated Hospital of Dalian Medical University (SAHDMU). The optimal 38 features were selected through a Least absolute shrinkage and selection operator (LASSO) regression model and 10-fold cross-validation. Four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree, and logistic regression, were compared using AUC-ROC curves, decision curves, and calibration curves. The Shapley Additive exPlanations (SHAP) method was used to quantify feature importance and interaction effects in the optimal predictive model.
  • results: The XGB model had the best performance for screening for DN, with the highest AUC value of 0.966; it also gained more clinical net benefit than the alternatives and showed a better degree of fit. Significant interactions between serum metabolites and duration of diabetes were found. A predictive model was developed using the XGB algorithm to screen for DN, with C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys being the most contributing factors; these factors could potentially serve as biomarkers for DN.
    Abstract The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and to predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients treated between April 2018 and April 2019 at the Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree, and logistic regression, using AUC-ROC curves, decision curves, and calibration curves. We quantify feature importance and interaction effects in the optimal predictive model with the Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance for screening for DN, with the highest AUC value of 0.966. The XGB model also gains more clinical net benefit than the others and shows a better degree of fit. In addition, there are significant interactions between serum metabolites and the duration of diabetes. We develop a predictive model with the XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute greatly to the model and can possibly serve as biomarkers for DN.
    摘要 本研究旨在分析血清代谢物对糖尿病肾病(DN)的影响,并通过机器学习方法预测 DN 的患病率。数据集包含 2018 年 4 月至 2019 年 4 月在大连医科大学附属第二医院(SAHDMU)就诊的 548 名患者。我们通过最小绝对收缩与选择算子(LASSO)回归模型和 10 折交叉验证选出最优的 38 个特征,并比较了四种机器学习算法,包括极端梯度提升(XGB)、随机森林、决策树和逻辑回归,比较依据为 AUC-ROC 曲线、决策曲线和校准曲线。我们使用 Shapley Additive exPlanations(SHAP)方法来衡量最优预测模型中的特征重要性和交互效应。XGB 模型在 DN 筛查方面表现最佳,AUC 值为 0.966,临床净获益更多,拟合程度也更好。此外,血清代谢物与糖尿病病程之间存在显著交互效应。我们基于 XGB 算法开发了用于 DN 筛查的预测模型,其中 C2、C5DC、Tyr、Ser、Met、C24、C4DC 和 Cys 贡献较大,可能成为 DN 的生物标志物。
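A hedged sketch of the described pipeline (LASSO-based feature selection, an XGBoost classifier, and SHAP attributions) is given below. The data are synthetic stand-ins for the metabolite/clinical table, and the column names, hyperparameters, and split are illustrative rather than the study's actual settings.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in: 548 patients, 60 candidate features, label driven by two of them.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(548, 60)), columns=[f"feat_{i}" for i in range(60)])
y = (X["feat_0"] - 1.5 * X["feat_1"] + 0.5 * rng.normal(size=548) > 0).astype(int)

# 1) LASSO with 10-fold cross-validation keeps features with nonzero coefficients.
lasso = LassoCV(cv=10, random_state=0).fit(X, y)
selected = X.columns[np.abs(lasso.coef_) > 1e-8]

# 2) Train an XGBoost classifier on the selected features.
X_tr, X_te, y_tr, y_te = train_test_split(X[selected], y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                      eval_metric="logloss").fit(X_tr, y_tr)

# 3) SHAP quantifies per-feature contributions on the held-out set.
shap_values = shap.TreeExplainer(model).shap_values(X_te)
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=selected)
print(importance.sort_values(ascending=False).head(8))
```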

Fast Locality Sensitive Hashing with Theoretical Guarantee

  • paper_url: http://arxiv.org/abs/2309.15479
  • repo_url: None
  • paper_authors: Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo, Ming Du
  • for: nearest neighbor search task
  • methods: random sampling and random projection
  • results: up to 80x speedup in hash function evaluation, on par with state-of-the-art methods in terms of answer quality, space occupation, and query efficiency
    Abstract Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to data dimensions, and thus often the performance bottleneck when dimensionality is high and the number of hash functions involved is large. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper, we design a simple yet efficient LSH scheme, named FastLSH, under the l2 norm. By combining random sampling and random projection, FastLSH reduces the time complexity from O(n) to O(m) (m<n), where n is the data dimensionality and m is the number of sampled dimensions, while provably preserving the LSH property, which distinguishes it from non-LSH sketches. Extensive experiments on real and synthetic datasets show that FastLSH is on par with the state-of-the-art methods in terms of answer quality, space occupation, and query efficiency, while achieving up to 80x speedup in hash function evaluation. We believe that FastLSH is a promising LSH scheme.
    摘要 本文提出了一种简单而高效的 LSH 方案,名为 FastLSH,用于 l2 范数下的最近邻搜索任务。该方案结合随机采样与随机投影,将哈希的时间复杂度从 O(n) 降低至 O(m)(m<n),其中 n 是数据维度,m 是采样的维度数。此外,FastLSH 具有可证明的 LSH 性质,这一点不同于非 LSH 的 sketch 方法。我们在真实数据集和若干合成数据集上进行了广泛的实验,结果表明 FastLSH 在答案质量、空间占用和查询效率方面与最先进方法相当,同时在哈希函数计算上可获得最高 80 倍的加速。我们认为 FastLSH 是一种有前途的 LSH 方案。
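The sketch below illustrates one plausible reading of the sampling-plus-projection idea under the l2 norm: each hash function looks at only m randomly chosen coordinates and applies an E2LSH-style quantized projection to them, so a single hash evaluation costs O(m) instead of O(n). The exact FastLSH construction and parameters may differ; the class name, m, bucket width, and number of hash functions below are assumptions.

```python
import numpy as np

class SampledL2LSH:
    """Hedged sketch of sampling-then-projection hashing for the l2 norm (not the paper's code)."""
    def __init__(self, dim: int, m: int = 30, w: float = 4.0, n_hashes: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        # For each hash function: a random subset of m coordinates ...
        self.samples = [rng.choice(dim, size=m, replace=False) for _ in range(n_hashes)]
        # ... and a Gaussian projection applied only to those coordinates.
        self.projs = rng.normal(size=(n_hashes, m))
        self.offsets = rng.uniform(0.0, w, size=n_hashes)
        self.w = w

    def hash(self, v: np.ndarray) -> tuple:
        codes = []
        for idx, a, b in zip(self.samples, self.projs, self.offsets):
            codes.append(int(np.floor((v[idx] @ a + b) / self.w)))  # O(m) per hash function
        return tuple(codes)

# Toy usage: nearby points usually share the same hash codes.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
lsh = SampledL2LSH(dim=1000)
print(lsh.hash(x) == lsh.hash(x + 0.01 * rng.normal(size=1000)))  # usually True
```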

DTC: Deep Tracking Control – A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion

  • paper_url: http://arxiv.org/abs/2309.15462
  • repo_url: None
  • paper_authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter
  • for: 本研究旨在开发一种准确且鲁棒的腿式机器人运动控制方法,以便在真实环境中提供稳定的行走能力。
  • methods: 本研究将基于模型的轨迹优化方法与数据驱动的强化学习相结合,以同时获得更高的精度与鲁棒性。
  • results: 研究表明,该控制方法能够在立足点稀疏的地形上实现更高的落脚精度和鲁棒性,并且能够泛化到训练中未见过的其他轨迹优化方法。
    Abstract Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, and most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation and may hinder successful sim-to-real transfer. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.
    摘要 腿式运动控制是一个复杂的控制问题,需要同时具备精度与鲁棒性,以应对真实世界中的挑战。传统上,腿式系统采用结合逆动力学的轨迹优化方法进行控制。这类分层的基于模型的方法具有代价函数调节直观、规划精确等优点,更重要的是,十余年的深入研究积累了对其深刻的理解。然而,模型失配和假设被违反是导致运行故障的常见原因,并可能阻碍从仿真到真实环境的迁移。相比之下,基于仿真的强化学习能够获得前所未有的鲁棒性和恢复能力;但在有效立足点稀少的环境(如缝隙或垫脚石)中,所有学习算法都会受到稀疏奖励的困扰。为此,我们提出一种混合控制架构,结合两者的优势,同时实现更强的鲁棒性、更高的落脚精度以及对地形的泛化能力。我们的方法在训练中使用基于模型的规划器生成参考运动,并在仿真中训练一个深度神经网络策略来跟踪优化得到的立足点。我们在立足点稀疏、纯数据驱动方法容易失效的地形上评估了该运动控制管线的精度;与基于模型的方法相比,我们的方法在湿滑或可变形地面上表现出更强的鲁棒性。最后,我们证明所提出的跟踪控制器能够泛化到训练中未见过的其他轨迹优化方法。综上,我们的工作将在线规划的预测能力和最优性保证与离线学习带来的固有鲁棒性结合在一起。

Deep Learning in Deterministic Computational Mechanics

  • paper_url: http://arxiv.org/abs/2309.15421
  • repo_url: None
  • paper_authors: Leon Herrmann, Stefan Kollmannsberger
  • for: 本文旨在帮助计算力学领域的研究人员更好地了解深度学习技术的应用,以便更有效地探索这一领域。
  • methods: 本文描述了五种主要的深度学习方法,包括模拟替换、模拟加强、离散方法作为神经网络、生成方法和深度强化学习。
  • results: 本文的综述聚焦于深度学习方法本身,而非其在计算力学中的具体应用,以帮助研究人员更有效地了解这一领域。
    Abstract The rapid growth of deep learning research, including within the field of computational mechanics, has resulted in an extensive and diverse body of literature. To help researchers identify key concepts and promising methodologies within this field, we provide an overview of deep learning in deterministic computational mechanics. Five main categories are identified and explored: simulation substitution, simulation enhancement, discretizations as neural networks, generative approaches, and deep reinforcement learning. This review focuses on deep learning methods rather than applications for computational mechanics, thereby enabling researchers to explore this field more effectively. As such, the review is not necessarily aimed at researchers with extensive knowledge of deep learning -- instead, the primary audience is researchers on the verge of entering this field or those who attempt to gain an overview of deep learning in computational mechanics. The discussed concepts are, therefore, explained as simply as possible.
    摘要 深度学习研究的迅速发展(包括其在计算力学领域中的研究)已经形成了广泛而多样的文献。为帮助研究者识别该领域中的关键概念和有前景的方法,我们对确定性计算力学中的深度学习进行了综述,并将其分为五大类:模拟替代、模拟增强、离散化表示为神经网络、生成式方法以及深度强化学习。本综述聚焦于深度学习方法本身,而非其在计算力学中的应用,从而帮助研究者更有效地探索这一领域。因此,本文主要面向刚进入该领域或希望对计算力学中的深度学习有一个总体了解的研究者,而非深度学习专家,相关概念也尽量以浅显的方式加以解释。

Automatic Feature Fairness in Recommendation via Adversaries

  • paper_url: http://arxiv.org/abs/2309.15418
  • repo_url: https://github.com/holdenhu/advfm
  • paper_authors: Hengchang Hu, Yiming Cao, Zhankui He, Samson Tan, Min-Yen Kan
  • for: The paper aims to achieve equitable treatment across diverse groups defined by various feature combinations in recommender systems, by proposing feature fairness as the foundation for practical implementation.
  • methods: The paper introduces unbiased feature learning through adversarial training, using adversarial perturbation to enhance feature representation. The authors adapt adversaries automatically based on two forms of feature biases: frequency and combination variety of feature values.
  • results: The paper shows that the proposed method, AAFM, surpasses strong baselines in both fairness and accuracy measures. AAFM excels in providing item- and user-fairness for single- and multi-feature tasks, showcasing their versatility and scalability. However, the authors find that adversarial perturbation must be well-managed during training to maintain good accuracy.Here’s the Chinese translation of the three key points:
  • for: 这篇论文的目标是以特征公平为基础,在推荐系统中为由各种特征组合界定的不同群体实现公平对待,并推动其实际落地。
  • methods: 论文通过对抗训练提出无偏的特征学习方法,利用对抗扰动增强特征表示,并根据特征偏差的两种形式(特征取值的频率与组合多样性)自动调整对抗者。
  • results: 论文表明,所提出的 AAFM 方法在公平性和准确性两个指标上都超越了强基线。AAFM 在单特征和多特征任务中都能提供物品和用户层面的公平性,展示了其通用性和可扩展性。但作者也发现,训练过程中必须妥善管理对抗扰动,才能保持良好的准确性。
    Abstract Fairness is a widely discussed topic in recommender systems, but its practical implementation faces challenges in defining sensitive features while maintaining recommendation accuracy. We propose feature fairness as the foundation to achieve equitable treatment across diverse groups defined by various feature combinations. This improves overall accuracy through balanced feature generalizability. We introduce unbiased feature learning through adversarial training, using adversarial perturbation to enhance feature representation. The adversaries improve model generalization for under-represented features. We adapt adversaries automatically based on two forms of feature biases: frequency and combination variety of feature values. This allows us to dynamically adjust perturbation strengths and adversarial training weights. Stronger perturbations are applied to feature values with fewer combination varieties to improve generalization, while higher weights for low-frequency features address training imbalances. We leverage the Adaptive Adversarial perturbation based on the widely-applied Factorization Machine (AAFM) as our backbone model. In experiments, AAFM surpasses strong baselines in both fairness and accuracy measures. AAFM excels in providing item- and user-fairness for single- and multi-feature tasks, showcasing their versatility and scalability. To maintain good accuracy, we find that adversarial perturbation must be well-managed: during training, perturbations should not overly persist and their strengths should decay.
    摘要 公平性是推荐系统中被广泛讨论的话题,但其实际落地面临着在保持推荐准确性的同时界定敏感特征的挑战。我们提出以特征公平为基础,在由各种特征组合界定的不同群体之间实现公平对待,并通过均衡的特征泛化能力来提升整体准确性。我们通过对抗训练引入无偏的特征学习,利用对抗扰动增强特征表示;对抗者能够提升模型对代表性不足特征的泛化能力。我们根据两类特征偏差(特征取值的频率与组合多样性)自动调整对抗者,从而动态调节扰动强度和对抗训练权重:对组合多样性较少的特征取值施加更强的扰动以改进泛化,对低频特征赋予更高的权重以缓解训练不平衡。我们以广泛应用的因子分解机为骨干,构建了基于自适应对抗扰动的模型(AAFM)。实验表明,AAFM 在公平性和准确性两方面均超越了强基线,并能在单特征和多特征任务中同时提供物品和用户层面的公平性,体现了其通用性与可扩展性。为保持良好的准确性,我们发现对抗扰动必须得到妥善管理:在训练过程中,扰动不应过度持续,且其强度应逐渐衰减。
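To ground the idea of frequency-adaptive adversarial perturbations on feature embeddings, the sketch below pairs a tiny factorization-machine-style scorer with an FGSM-style step whose perturbation radius grows for rarer feature values. It is a simplified stand-in, not the released AdvFM/AAFM implementation; the epsilon schedule and model sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFM(nn.Module):
    """Minimal factorization-machine-style scorer over categorical feature fields."""
    def __init__(self, n_values: int, dim: int = 16):
        super().__init__()
        self.emb = nn.Embedding(n_values, dim)
        self.linear = nn.Embedding(n_values, 1)

    def forward(self, ids: torch.Tensor, emb_perturb: torch.Tensor = None) -> torch.Tensor:
        e = self.emb(ids)                      # (batch, fields, dim)
        if emb_perturb is not None:
            e = e + emb_perturb                # adversarial noise on the embeddings
        pairwise = 0.5 * (e.sum(dim=1).pow(2) - e.pow(2).sum(dim=1)).sum(dim=1)
        return self.linear(ids).sum(dim=(1, 2)) + pairwise

def adversarial_training_step(model, ids, labels, value_freq, base_eps=0.05):
    """FGSM-style step where rarer feature values receive stronger perturbations."""
    delta = torch.zeros(ids.size(0), ids.size(1), model.emb.embedding_dim, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(model(ids, delta), labels)
    grad = torch.autograd.grad(loss, delta, retain_graph=True)[0]
    # Per-value epsilon: inversely related to how often each feature value occurs.
    eps = base_eps / value_freq[ids].clamp(min=1.0).sqrt()          # (batch, fields)
    adv_delta = (eps.unsqueeze(-1) * grad.sign()).detach()
    adv_loss = F.binary_cross_entropy_with_logits(model(ids, adv_delta), labels)
    return loss + adv_loss

model = TinyFM(n_values=1000)
ids = torch.randint(0, 1000, (32, 3))                 # 3 categorical fields per example
labels = torch.randint(0, 2, (32,)).float()
value_freq = torch.full((1000,), 50.0)                # how often each value appears in training
adversarial_training_step(model, ids, labels, value_freq).backward()
```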

Revolutionizing Terrain-Precipitation Understanding through AI-driven Knowledge Discovery

  • paper_url: http://arxiv.org/abs/2309.15400
  • repo_url: None
  • paper_authors: Hao Xu, Yuntian Chen, Zhenzhong Zeng, Nina Li, Jian Li, Dongxiao Zhang
  • for: 增进当前气候科学中复杂地形区域气候过程的理解,特别是全球气候变化的背景下。
  • methods: 利用先进的人工智能驱动的知识发现技术,首次得到了刻画地形特征与降水模式之间复杂关系的显式方程,揭示了此前被掩盖的气候动力。
  • results: 发现了名为'1995 转折点'的现象,即大约在 1995 年前后,受气候变化的影响,地形与降水之间的关系发生了显著转变。这些方程具有实际应用价值,尤其是可以从低分辨率的未来气候数据中降尺度得到精细的降水预测,为不同地形在未来气候情景下的降水变化提供参考。
    Abstract Advancing our understanding of climate processes in regions characterized by intricate terrain complexity is a paramount challenge in contemporary climate science, particularly in the context of global climate change. Notably, the scarcity of observational data in these regions has imposed substantial limitations on understanding the nuanced climate dynamics therein. For the first time, utilizing cutting-edge AI-driven knowledge discovery techniques, we have uncovered explicit equations that elucidate the intricate relationship between terrain features and precipitation patterns, illuminating the previously concealed complexities governing these relationships. These equations, thus far undisclosed, exhibit remarkable accuracy compared to conventional empirical models when applied to precipitation data. Building on this foundation, we reveal a phenomenon known as the '1995 turning point,' indicating a significant shift in the terrain-precipitation relationship in approximately 1995, related to the forces of climate change. These equations have practical applications, particularly in achieving fine-scale downscaling precipitation predictions from low-resolution future climate data. This capability provides invaluable insights into the expected changes in precipitation patterns across diverse terrains under future climate scenarios.
    摘要 在全球气候变化背景下,增进对复杂地形区域气候过程的理解是当代气候科学面临的重大挑战。值得注意的是,这些区域观测数据的匮乏严重限制了对其中微妙气候动力的认识。我们首次利用前沿的人工智能驱动的知识发现技术,得到了刻画地形特征与降水模式之间复杂关系的显式方程,揭示了此前被掩盖的规律。这些此前未被揭示的方程在应用于降水数据时,准确性明显优于传统的经验模型。在此基础上,我们发现了'1995 转折点'现象,表明大约在 1995 年,受气候变化驱动,地形与降水的关系发生了显著转变。这些方程具有实际应用价值,尤其是能够从低分辨率的未来气候数据中降尺度得到精细的降水预测,为不同地形在未来气候情景下降水模式的预期变化提供了宝贵的信息。

Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs

  • paper_url: http://arxiv.org/abs/2309.15395
  • repo_url: None
  • paper_authors: Zihan Zhou, Honghao Wei, Lei Ying
  • for: 本文研究在线约束马尔可夫决策过程(CMDPs)中的最优策略识别(BPI)问题。
  • methods: 本文基于 Koole (1988) 与 Ross (1989) 证明的 CMDP 基本结构性质,提出了一种用于识别近似最优策略的新算法,名为 Pruning-Refinement-Identification(PRI)。
  • results: PRI 在在线 CMDPs 中同时实现三个目标:(i) PRI 是一种无模型(model-free)算法;(ii) PRI 在学习结束时以高概率输出一个近似最优策略;(iii) 在表格型设定下,PRI 保证 $\tilde{\mathcal{O}}(\sqrt{K})$ 的遗憾与约束违反,显著优于现有无模型算法的最佳遗憾界 $\tilde{\mathcal{O}}(K^{4/5})$,其中 $K$ 为总回合数。
    Abstract This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs proved in Koole (1988) and Ross (1989), which we call limited stochasticity. The property says that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs a near-optimal policy with a high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{4/5})$ under a model-free algorithm, where $K$ is the total number of episodes.
    摘要 Our proposed algorithm, Pruning-Refinement-Identification (PRI), is based on a fundamental structural property of CMDPs proven in Koole (1988) and Ross (1989), which we call limited stochasticity. This property states that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. PRI first identifies the steps and states where stochastic decisions need to be taken and then fine-tunes the distributions of these decisions. The algorithm achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs a near-optimal policy with a high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{4/5})$ under a model-free algorithm, where $K$ is the total number of episodes.
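The limited-stochasticity property itself can be checked numerically on a toy CMDP by solving the occupancy-measure linear program and counting the states where the optimal occupancy spreads mass over more than one action; with a single constraint, a basic optimal solution randomizes in at most one state. The sketch below illustrates only this structural property, not the PRI algorithm, and the random CMDP and budget are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
R = rng.uniform(size=(S, A))                 # rewards
C = rng.uniform(size=(S, A))                 # one constraint cost, so N = 1
mu0 = np.full(S, 1.0 / S)                    # initial state distribution

# Flow constraints on the occupancy measure x[s, a] >= 0 (flattened row-major):
# sum_a x[s,a] - gamma * sum_{s',a'} P[s',a',s] x[s',a'] = (1 - gamma) * mu0[s]
A_eq = np.zeros((S, S * A))
for s in range(S):
    for sp in range(S):
        for a in range(A):
            A_eq[s, sp * A + a] = (1.0 if sp == s else 0.0) - gamma * P[sp, a, s]
b_eq = (1 - gamma) * mu0

def solve(reward, budget=None):
    res = linprog(-reward.reshape(-1),
                  A_ub=C.reshape(1, -1) if budget is not None else None,
                  b_ub=[budget] if budget is not None else None,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(S, A)

min_cost = (solve(-C) * C).sum()             # smallest achievable expected cost
x = solve(R, budget=min_cost + 0.05)         # reward-optimal occupancy under the constraint

stochastic_states = sum(int((x[s] > 1e-8).sum() > 1) for s in range(S))
print("states with a stochastic decision:", stochastic_states)  # expected: at most 1
```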

ADGym: Design Choices for Deep Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.15376
  • repo_url: https://github.com/minqi824/adgym
  • paper_authors: Minqi Jiang, Chaochuan Hou, Ao Zheng, Songqiao Han, Hailiang Huang, Qingsong Wen, Xiyang Hu, Yue Zhao
  • for: 本文旨在探讨深度学习方法中的异常检测问题,并提出了两个关键问题:(1)深度异常检测方法中的设计选择对检测异常的影响是多大?(2)如何自动选择适合的设计选择来优化异常检测模型?
  • methods: 本文提出了一个名为ADGym的平台,用于全面评估和自动选择深度异常检测方法中的设计元素。
  • results: 经过广泛的实验,结果表明,仅仅依靠现有的领先方法并不足够,而使用 ADGym 开发的模型则显著超越了当前的领先技术。
    Abstract Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques.
    摘要 深度学习(DL)技术近来在金融、医疗服务和云计算等多个领域的异常检测(AD)中取得了成功。然而,当前多数研究往往将深度异常检测算法视为一个整体,而没有剖析损失函数、网络结构等各个设计选择的贡献。这种视角削弱了数据预处理等前置步骤的价值,使更多的注意力集中在新设计的损失函数、网络结构和学习范式上。在本文中,我们试图填补这一空白,并提出两个关键问题:(i) 深度异常检测方法中哪些设计选择对检测异常至关重要?(ii) 如何针对给定的异常检测数据集自动选择最优的设计选择,而不是依赖通用的既有方案?为此,我们提出了 ADGym,一个专为深度异常检测方法的设计元素进行全面评估与自动选择而打造的平台。大量实验表明,仅仅依靠现有的领先方法并不足够,而使用 ADGym 开发的模型则显著超越了当前最先进的技术。

PPG to ECG Signal Translation for Continuous Atrial Fibrillation Detection via Attention-based Deep State-Space Modeling

  • paper_url: http://arxiv.org/abs/2309.15375
  • repo_url: None
  • paper_authors: Khuong Vo, Mostafa El-Khamy, Yoojin Choi
  • for: 本研究旨在提出一种与受试者无关、基于注意力的深度状态空间模型,将光电容积脉搏波(PPG)信号转换为对应的心电图(ECG)波形,使重要的心脏体征无需严格的临床测量即可在日常生活中得到监测。
  • methods: 该模型利用非侵入、低成本的光学测量(PPG),结合注意力机制并以概率图模型的形式引入先验知识,以高数据效率实现从 PPG 到 ECG 波形的转换。
  • results: 实验结果表明,该方法数据效率高,能够生成与真实 ECG 对应的波形;借助连续的 PPG 监测,它还可以辅助检测成人中最常见的心律失常,即心房颤动(AFib)。
    Abstract An electrocardiogram (ECG or EKG) is a medical test that measures the heart's electrical activity. ECGs are often used to diagnose and monitor a wide range of heart conditions, including arrhythmias, heart attacks, and heart failure. On the one hand, the conventional ECG requires clinical measurement, which restricts its deployment to medical facilities. On the other hand, single-lead ECG has become popular on wearable devices using administered procedures. An alternative to ECG is Photoplethysmography (PPG), which uses non-invasive, low-cost optical methods to measure cardiac physiology, making it a suitable option for capturing vital heart signs in daily life. As a result, it has become increasingly popular in health monitoring and is used in various clinical and commercial wearable devices. While ECG and PPG correlate strongly, the latter does not offer significant clinical diagnostic value. Here, we propose a subject-independent attention-based deep state-space model to translate PPG signals to corresponding ECG waveforms. The model is highly data-efficient by incorporating prior knowledge in terms of probabilistic graphical models. Notably, the model enables the detection of atrial fibrillation (AFib), the most common heart rhythm disorder in adults, by complementing ECG's accuracy with continuous PPG monitoring. We evaluated the model on 55 subjects from the MIMIC III database. Quantitative and qualitative experimental results demonstrate the effectiveness and efficiency of our approach.
    摘要 心电图(ECG 或 EKG)是一种测量心脏电活动的医学检查,常用于诊断和监测多种心脏疾病,包括心律失常、心肌梗死和心力衰竭。一方面,常规 ECG 需要临床测量,这使其只能部署在医疗机构中;另一方面,单导联 ECG 已在可穿戴设备上流行,但需要按规定的操作流程进行。ECG 的一种替代方案是光电容积脉搏波(PPG),它利用非侵入、低成本的光学方法测量心脏生理信号,适合在日常生活中采集重要的心脏体征,因而在健康监测中日益普及,被用于各类临床和商用可穿戴设备。虽然 PPG 与 ECG 高度相关,但 PPG 本身并不具备显著的临床诊断价值。为此,我们提出一种与受试者无关、基于注意力的深度状态空间模型,将 PPG 信号转换为对应的 ECG 波形。该模型通过以概率图模型形式引入先验知识,具有很高的数据效率。值得注意的是,该模型结合连续的 PPG 监测来补充 ECG 的准确性,从而能够检测成人中最常见的心律失常,即心房颤动(AFib)。我们在 MIMIC III 数据库的 55 名受试者上评估了该模型,定量与定性实验结果均表明了该方法的有效性和效率。
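As a rough sketch of signal-to-signal translation with attention, the model below encodes a PPG window with a recurrent network, applies self-attention over the encoded sequence, and decodes a latent state trajectory into an ECG estimate of the same length. It is a generic stand-in, not the paper's subject-independent attention-based deep state-space model; the architecture and sizes are assumptions.

```python
import torch
import torch.nn as nn

class PPG2ECG(nn.Module):
    """Toy PPG-to-ECG translator: recurrent encoder, self-attention, recurrent decoder."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden, num_heads=4, batch_first=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, ppg: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(ppg)        # (batch, length, 2 * hidden)
        ctx, _ = self.attn(h, h, h)     # attend over the encoded PPG sequence
        z, _ = self.decoder(ctx)        # latent "state" trajectory
        return self.out(z)              # (batch, length, 1) ECG estimate

model = PPG2ECG()
ppg = torch.randn(4, 256, 1)            # four 256-sample PPG windows
ecg_hat = model(ppg)
loss = nn.functional.mse_loss(ecg_hat, torch.randn(4, 256, 1))  # waveform regression loss
```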

Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences

  • paper_url: http://arxiv.org/abs/2309.15366
  • repo_url: None
  • paper_authors: Vanessa Lopez-Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo
  • for: 本研究旨在支持生物科学研究,尤其是放射生物学等数据稀缺的领域。
  • methods: 本研究采用测度传输方法,特别是三角传输映射,来处理和分析服从广泛一类概率测度的数据。
  • results: 研究发现,在数据稀缺时,使用稀疏传输映射具有优势,能够发现隐藏在数据中的信息,例如某些基因之间的关系及其在辐射暴露下的动态变化。
    Abstract One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scarce data scenarios, which are common in domains such as radiation biology, are of particular interest. We find that when data is scarce, sparse transport maps are advantageous. In particular, statistics gathered from computing series of (sparse) adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, leads to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.
    摘要 测度传输方法的诸多优点之一,是它们为处理和分析服从广泛一类概率测度的数据提供了统一的框架。在此背景下,我们介绍了一系列计算研究的结果,以评估测度传输技术,特别是三角传输映射,在支持生物科学研究的工作流程中的潜力。数据稀缺的情形(在放射生物学等领域十分常见)尤其值得关注。我们发现,当数据稀缺时,稀疏传输映射具有优势:在可用数据样本的一系列随机子集上训练一系列(稀疏的)自适应传输映射,并汇总所得的统计量,能够发现隐藏在数据中的信息。因此,在本文考虑的放射生物学应用中,这一方法为提出关于基因关系及其在辐射暴露下动态变化的假设提供了工具。
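The sketch below gives a crude, sample-based version of a lower-triangular (Knothe-Rosenblatt) transport map in two dimensions: the first component is pushed to a standard normal through its empirical CDF, and the second through conditional empirical CDFs obtained by binning on the first coordinate. Practical triangular maps are usually parameterized with monotone expansions and fit by optimization, so treat this purely as an illustration; the binning scheme and toy data are assumptions.

```python
import numpy as np
from scipy.stats import norm

def fit_triangular_map(samples: np.ndarray, n_bins: int = 20):
    """Rank/bin-based stand-in for a 2D lower-triangular transport map to N(0, I)."""
    x1, x2 = samples[:, 0], samples[:, 1]
    x1_sorted = np.sort(x1)
    edges = np.quantile(x1, np.linspace(0, 1, n_bins + 1))
    # Conditional empirical CDFs of x2, one per x1-bin.
    x2_by_bin = [np.sort(x2[(x1 >= edges[b]) & (x1 <= edges[b + 1])]) for b in range(n_bins)]

    def transport(xy: np.ndarray) -> np.ndarray:
        u1 = (np.searchsorted(x1_sorted, xy[:, 0]) + 0.5) / (len(x1_sorted) + 1)
        b = np.clip(np.searchsorted(edges, xy[:, 0]) - 1, 0, n_bins - 1)
        u2 = np.array([(np.searchsorted(x2_by_bin[bi], v) + 0.5) / (len(x2_by_bin[bi]) + 1)
                       for bi, v in zip(b, xy[:, 1])])
        return np.column_stack([norm.ppf(u1), norm.ppf(u2)])

    return transport

# Toy data with strong dependence between the two coordinates.
rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=2000)
T = fit_triangular_map(np.column_stack([x1, x2]))
z = T(np.column_stack([x1, x2]))
print("raw corr:   ", round(float(np.corrcoef(x1, x2)[0, 1]), 2))
print("mapped corr:", round(float(np.corrcoef(z[:, 0], z[:, 1])[0, 1]), 2))  # much closer to 0
```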

Exploring Learned Representations of Neural Networks with Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2309.15328
  • repo_url: None
  • paper_authors: Amit Harlev, Andrew Engel, Panos Stinis, Tony Chiang
  • for: 本研究探讨深度神经网络(DNN)中的特征表示方法,以提高Explainable AI领域的理解。
  • methods: 本研究使用主成分分析(PCA)来研究一个ResNet-18模型在CIFAR-10数据集上的各层特征表示,并利用k- nearest neighbors分类器(k-NN)、最近类中心分类器(NCC)和支持向量机来评估这些特征表示的性能。
  • results: 研究发现,在某些层上,仅需约 20% 的中间特征空间方差即可实现高精度分类;而且在所有层上,前约 100 个主成分就完全决定了 k-NN 和 NCC 分类器的性能。研究将这些发现与神经坍缩(neural collapse)现象联系起来,并为相关的中间神经坍缩现象提供了部分证据;同时给出了三种互不相同但均可解释的特征表示代理模型,其中仿射线性模型性能最佳。此外,研究还表明,利用多个代理模型可以估计神经坍缩在 DNN 中最初发生的位置。
    Abstract Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), a nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation, with an affine linear model performing best. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN.
    摘要 Understanding deep neural network (DNN) feature representation remains an open question in the broader field of explainable AI. We use principal component analysis (PCA) to study the performance of k-nearest neighbors classifiers (k-NN), nearest class-centers classifiers (NCC), and support vector machines on the layer-wise representations of a ResNet-18 trained on CIFAR-10. We find that in certain layers, as little as 20% of the intermediate feature-space variance is sufficient for high-accuracy classification, and that the first ~100 principal components (PCs) completely determine the performance of the k-NN and NCC classifiers across all layers. Our findings are related to the phenomenon of neural collapse, and we provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation, with an affine linear model being the best performing. We also show that leveraging multiple surrogate models allows us to estimate where neural collapse may initially occur within the DNN.
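The analysis pattern (project layer activations onto the top principal components, then score k-NN and nearest-class-center classifiers as the number of components grows) can be sketched as follows. Synthetic class-structured features stand in for the actual ResNet-18/CIFAR-10 activations, so the printed numbers only illustrate the workflow, not the paper's findings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

# Stand-in for activations of one layer of a trained network.
rng = np.random.default_rng(0)
n, d, n_classes = 3000, 512, 10
labels = rng.integers(0, n_classes, size=n)
feats = rng.normal(size=(n_classes, d))[labels] + 0.8 * rng.normal(size=(n, d))

X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.3, random_state=0)

for k in [5, 20, 100, 512]:
    pca = PCA(n_components=k).fit(X_tr)
    Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)
    knn_acc = KNeighborsClassifier(n_neighbors=10).fit(Z_tr, y_tr).score(Z_te, y_te)
    ncc_acc = NearestCentroid().fit(Z_tr, y_tr).score(Z_te, y_te)
    print(f"top-{k:>3} PCs | explained var {pca.explained_variance_ratio_.sum():.2f} "
          f"| k-NN acc {knn_acc:.3f} | NCC acc {ncc_acc:.3f}")
```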

Neural Operators for Accelerating Scientific Simulations and Design

  • paper_url: http://arxiv.org/abs/2309.15325
  • repo_url: None
  • paper_authors: Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, Anima Anandkumar
  • for: 以数值模拟替代物理实验,从而加速科学发现与工程设计并降低其成本。
  • methods: 使用人工智能技术,特别是神经算子(neural operators),学习定义在连续域上的函数之间的映射,能够在训练中未见过的新位置上进行外推和预测,即实现零样本超分辨率。
  • results: 神经算子可以在计算流体力学、天气预报和材料建模等诸多应用中增强甚至替代现有模拟器,速度可快 4 到 5 个数量级;还可以与物理及其他领域约束相结合,以获得高保真度的解并具备良好的泛化能力。
    Abstract Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as neural operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
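A minimal Fourier-neural-operator-style layer conveys the core mechanism: transform the input function to the frequency domain, apply learned complex weights to the lowest modes, and transform back, so the learned map acts on functions sampled on a grid rather than on fixed feature vectors. The sketch below is a didactic stand-in in PyTorch, not the authors' released libraries; widths, mode counts, and depth are assumptions.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """1D Fourier layer: FFT, multiply the lowest `modes` frequencies by learned weights, inverse FFT."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        self.weights = nn.Parameter(
            (1.0 / channels) * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_ft = torch.fft.rfft(x)                       # x: (batch, channels, grid)
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights)
        return torch.fft.irfft(out_ft, n=x.size(-1))

class TinyFNO(nn.Module):
    """Two Fourier layers with pointwise skips; maps an input function to an output function."""
    def __init__(self, width: int = 32, modes: int = 12):
        super().__init__()
        self.lift = nn.Conv1d(1, width, 1)
        self.spectral = nn.ModuleList([SpectralConv1d(width, modes) for _ in range(2)])
        self.pointwise = nn.ModuleList([nn.Conv1d(width, width, 1) for _ in range(2)])
        self.project = nn.Conv1d(width, 1, 1)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        h = self.lift(u)
        for spec, pw in zip(self.spectral, self.pointwise):
            h = torch.relu(spec(h) + pw(h))
        return self.project(h)

model = TinyFNO()
u = torch.randn(4, 1, 128)     # e.g., an initial condition sampled on 128 grid points
print(model(u).shape)          # torch.Size([4, 1, 128])
```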

On the Power of SVD in the Stochastic Block Model

  • paper_url: http://arxiv.org/abs/2309.15322
  • repo_url: None
  • paper_authors: Xinyu Mao, Jiapeng Zhang
  • for: 该研究旨在理解谱方法在聚类问题中的行为,并在随机块模型(SBM)中研究 vanilla-SVD 算法的能力。
  • methods: 该研究分析了将 vanilla-SVD 算法用于聚类问题的效果,并在对称设定下证明该算法能够正确恢复所有簇。
  • results: 研究表明,在对称设定下,vanilla-SVD 算法可以准确恢复所有簇,从而回答了 Van Vu(Combinatorics Probability and Computing, 2018)在对称设定下提出的一个公开问题。
    Abstract A popular heuristic method for improving clustering results is to apply dimensionality reduction before running clustering algorithms. It has been observed that spectral-based dimensionality reduction tools, such as PCA or SVD, improve the performance of clustering algorithms in many applications. This phenomenon indicates that spectral method not only serves as a dimensionality reduction tool, but also contributes to the clustering procedure in some sense. It is an interesting question to understand the behavior of spectral steps in clustering problems. As an initial step in this direction, this paper studies the power of vanilla-SVD algorithm in the stochastic block model (SBM). We show that, in the symmetric setting, vanilla-SVD algorithm recovers all clusters correctly. This result answers an open question posed by Van Vu (Combinatorics Probability and Computing, 2018) in the symmetric setting.
    摘要 一种常用的启发式做法是在运行聚类算法之前先进行降维。人们观察到,基于谱的降维工具(如 PCA 或 SVD)在许多应用中都能提升聚类算法的性能。这一现象表明,谱方法不仅是降维工具,在某种意义上也对聚类过程本身有所贡献。理解谱步骤在聚类问题中的行为是一个有趣的问题。作为这一方向的初步工作,本文研究了 vanilla-SVD 算法在随机块模型(SBM)中的能力。我们证明,在对称设定下,vanilla-SVD 算法能够正确恢复所有簇。这一结果回答了 Van Vu(Combinatorics Probability and Computing, 2018)在对称设定下提出的一个公开问题。
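The vanilla-SVD pipeline the paper analyzes can be demonstrated empirically on a small symmetric SBM: sample an adjacency matrix, embed each node using the top-k singular vectors, and read off the communities (a k-means step is used here for the final assignment, which is a convenience added for the demo rather than part of the paper's analysis). All parameters below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
k, n_per, p_in, p_out = 3, 200, 0.12, 0.02       # symmetric SBM parameters (illustrative)
n = k * n_per
labels = np.repeat(np.arange(k), n_per)

# Sample a symmetric SBM adjacency matrix.
probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
upper = np.triu(rng.random((n, n)) < probs, 1)
A = (upper | upper.T).astype(float)

# Vanilla-SVD step: embed nodes with the top-k singular vectors of A, then cluster the rows.
U, S, _ = np.linalg.svd(A)
embedding = U[:, :k] * S[:k]
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print("adjusted Rand index:", round(adjusted_rand_score(labels, pred), 3))  # close to 1.0
```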