cs.LG - 2023-11-15

Beyond PCA: A Probabilistic Gram-Schmidt Approach to Feature Extraction

  • paper_url: http://arxiv.org/abs/2311.09386
  • repo_url: None
  • paper_authors: Bahram Yaghooti, Netanel Raviv, Bruno Sinopoli
  • for: Linear feature extraction in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning.
  • methods: The authors propose a Probabilistic Gram-Schmidt (PGS) type orthogonalization process to detect and map out redundant dimensions. Specifically, applying the PGS process over any family of functions that presumably captures the nonlinear dependencies in the data yields a series of covariance matrices that can be used either to remove those dependencies from the principal components or to identify new large-variance directions.
  • results: Two methods are provided that extract linear features while removing nonlinear redundancies. For the first, the authors prove that under certain assumptions the algorithms detect and remove nonlinear dependencies whenever those dependencies lie in the linear span of the chosen function family; for the second, they give information-theoretic guarantees in terms of entropy reduction. Simulations on synthetic and real-world datasets show improved performance over PCA, both in variance maximization of the extracted features and in downstream classification performance.
    Abstract Linear feature extraction in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Probabilistic Gram-Schmidt (PGS) type orthogonalization process in order to detect and map out redundant dimensions. Specifically, by applying the PGS process over any family of functions which presumably captures the nonlinear dependencies in the data, we construct a series of covariance matrices that can either be used to remove those dependencies from the principal components, or to identify new large-variance directions. In the former case, we prove that under certain assumptions the resulting algorithms detect and remove nonlinear dependencies whenever those dependencies lie in the linear span of the chosen function family. In the latter, we provide information-theoretic guarantees in terms of entropy reduction. Both proposed methods extract linear features from the data while removing nonlinear redundancies. We provide simulation results on synthetic and real-world datasets which show improved performance over PCA and state-of-the-art linear feature extraction algorithms, both in terms of variance maximization of the extracted features, and in terms of improved performance of classification algorithms.
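To make the covariance-correction idea concrete, here is a minimal numpy sketch: residualize the data against a chosen function family before eigen-decomposition. The monomial family and the plain least-squares projection are illustrative stand-ins, not the paper's probabilistic Gram-Schmidt procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
# Second coordinate is a nonlinear function of the first: a redundant dimension.
X = np.column_stack([x, x**2 + 0.1 * rng.normal(size=n)])

# Function family assumed to capture the nonlinear dependencies (here: monomials).
Phi = np.column_stack([x, x**2, x**3])
Phi = (Phi - Phi.mean(0)) / Phi.std(0)

Xc = X - X.mean(0)
# Project out the span of the function family (least-squares residualization).
coef, *_ = np.linalg.lstsq(Phi, Xc, rcond=None)
residual = Xc - Phi @ coef

# Eigen-decomposition of the residual covariance highlights variance
# not explained by the chosen function family.
cov = residual.T @ residual / n
eigvals, eigvecs = np.linalg.eigh(cov)
print("residual-covariance eigenvalues:", eigvals)
```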

Time-dependent Probabilistic Generative Models for Disease Progression

  • paper_url: http://arxiv.org/abs/2311.09369
  • repo_url: None
  • paper_authors: Onintze Zaballa, Aritz Pérez, Elisa Gómez-Inhiesto, Teresa Acaiturri-Ayesta, Jose A. Lozano
  • for: This paper aims to exploit the information in electronic health records to monitor patients' health trajectories over time.
  • methods: A Markovian generative model of treatments is used to understand the underlying patterns and dynamics of diseases, with a latent class representing treatment subtypes and latent stages indicating progression phases; the model is learned with the Expectation-Maximization algorithm, solved efficiently with a dynamic programming-based method, and the time intervals between medical events are modeled with geometric, exponential, and Weibull distributions.
  • results: The results demonstrate the model's effectiveness in recovering the underlying model from data and in accurately modeling the irregular time intervals between medical events.
    Abstract Electronic health records contain valuable information for monitoring patients' health trajectories over time. Disease progression models have been developed to understand the underlying patterns and dynamics of diseases using these data as sequences. However, analyzing temporal data from EHRs is challenging due to the variability and irregularities present in medical records. We propose a Markovian generative model of treatments developed to (i) model the irregular time intervals between medical events; (ii) classify treatments into subtypes based on the patient sequence of medical events and the time intervals between them; and (iii) segment treatments into subsequences of disease progression patterns. We assume that sequences have an associated structure of latent variables: a latent class representing the different subtypes of treatments; and a set of latent stages indicating the phase of progression of the treatments. We use the Expectation-Maximization algorithm to learn the model, which is efficiently solved with a dynamic programming-based method. Various parametric models have been employed to model the time intervals between medical events during the learning process, including the geometric, exponential, and Weibull distributions. The results demonstrate the effectiveness of our model in recovering the underlying model from data and accurately modeling the irregular time intervals between medical actions.
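As a small illustration of one ingredient, the sketch below fits the three candidate inter-event time distributions named in the abstract (geometric, exponential, Weibull) to a toy series of gaps and compares log-likelihoods; the EM machinery and the latent classes/stages of the full model are omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Toy inter-event gaps in whole days (>= 1 after the ceiling).
gaps_days = np.ceil(stats.weibull_min.rvs(1.5, scale=10, size=500, random_state=rng))

# Geometric MLE on integer day counts: p = 1 / mean.
p_hat = 1.0 / gaps_days.mean()
ll_geom = stats.geom.logpmf(gaps_days, p_hat).sum()

# Exponential and Weibull MLEs treating the gaps as continuous.
_, scale_exp = stats.expon.fit(gaps_days, floc=0)
ll_exp = stats.expon.logpdf(gaps_days, scale=scale_exp).sum()

shape_w, _, scale_w = stats.weibull_min.fit(gaps_days, floc=0)
ll_weib = stats.weibull_min.logpdf(gaps_days, shape_w, scale=scale_w).sum()

print(f"log-likelihood  geometric: {ll_geom:.1f}  exponential: {ll_exp:.1f}  Weibull: {ll_weib:.1f}")
```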

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

  • paper_url: http://arxiv.org/abs/2311.09354
  • repo_url: None
  • paper_authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani
  • for: This work develops an image-based method for quantifying cellular viability, automatically assessing the viability of 3D tissue cultures and their responses to perturbation.
  • methods: An image processing algorithm quantifies viability in 3D cultures without assay-based indicators; images acquired with a high content imaging system are analyzed with machine learning image segmentation.
  • results: The algorithm performs comparably to a pair of human experts across a range of days and culture matrix compositions while reducing analysis time by 97%; because it is independent of the microscope or imaging system used, it can improve the robustness and reproducibility of 3D culture analysis across biological and clinical research.
    Abstract Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in and for improving the robustness and reproducibility of 3D culture analysis across biological and clinical research.
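Since the paper's segmentation model itself is not public, the following hedged sketch uses a simple Otsu threshold as a stand-in segmenter and reports a hypothetical per-object, intensity-based viability proxy, only to illustrate the assay-free, image-only workflow.

```python
import numpy as np
from skimage import filters, measure

rng = np.random.default_rng(2)
img = rng.normal(0.2, 0.05, (256, 256))
yy, xx = np.mgrid[:256, :256]
img[(yy - 128) ** 2 + (xx - 128) ** 2 < 40**2] += 0.5  # synthetic "spheroid"

# Stand-in segmenter: Otsu threshold instead of the paper's learned model.
mask = img > filters.threshold_otsu(img)
labels = measure.label(mask)
for region in measure.regionprops(labels, intensity_image=img):
    vals = region.intensity_image[region.image]
    # Hypothetical proxy: fraction of below-average-intensity pixels per object.
    dark_fraction = np.mean(vals < vals.mean())
    print(f"object {region.label}: area={region.area}, proxy viability={1 - dark_fraction:.2f}")
```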

Challenges for Predictive Modeling with Neural Network Techniques using Error-Prone Dietary Intake Data

  • paper_url: http://arxiv.org/abs/2311.09338
  • repo_url: None
  • paper_authors: Dylan Spicker, Amir Nazemi, Joy Hutchinson, Paul Fieguth, Sharon I. Kirkpatrick, Michael Wallace, Kevin W. Dodd
  • for: This paper examines how dietary intake data, routinely used to explore diet-health relationships, are affected by measurement error that distorts the true relationships.
  • methods: Neural networks are used to capture the complex interactions among dietary components, and the ways in which measurement error erodes their predictive performance are demonstrated.
  • results: Measurement error degrades the predictive performance of neural networks; sample size and replicate measurements matter, transformations to additivity merit investigation, and caution is required to prevent overfitting. Substantial care and further methodological development are needed before these techniques outperform more traditional statistical procedures.
    Abstract Dietary intake data are routinely drawn upon to explore diet-health relationships. However, these data are often subject to measurement error, distorting the true relationships. Beyond measurement error, there are likely complex synergistic and sometimes antagonistic interactions between different dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these complex interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of machine learning techniques, and in particular, neural networks. Neural networks are computational models that are able to capture highly complex, nonlinear relationships so long as sufficient data are available. While these models have been applied in many domains, the impacts of measurement error on the performance of predictive modeling have not been systematically investigated. However, dietary intake data are typically collected using self-report methods and are prone to large amounts of measurement error. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks, and illustrate the care that is required for leveraging these models in the presence of error. We demonstrate the role that sample size and replicate measurements play on model performance, indicate a motivation for the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting. While the past performance of neural networks across various domains makes them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to observe increased predictive performance when applying these techniques, compared to more traditional statistical procedures.
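The core phenomenon is easy to reproduce. A hedged sketch, assuming classical additive measurement error on the inputs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
n, p = 2000, 5
X_true = rng.normal(size=(n, p))                     # "true" intakes
# Outcome with synergistic (product) and nonlinear (squared) terms.
y = X_true[:, 0] * X_true[:, 1] + X_true[:, 2] ** 2 + 0.1 * rng.normal(size=n)
X_err = X_true + rng.normal(scale=1.0, size=(n, p))  # error-prone self-report

for name, X in [("true intakes", X_true), ("error-prone intakes", X_err)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    net.fit(Xtr, ytr)
    print(f"{name}: test R^2 = {r2_score(yte, net.predict(Xte)):.3f}")
```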

A Comparative Analysis of Machine Learning Models for Early Detection of Hospital-Acquired Infections

  • paper_url: http://arxiv.org/abs/2311.09329
  • repo_url: None
  • paper_authors: Ethan Harvey, Junzi Dong, Erina Ghosh, Ali Samadani
  • for: This study compares two machine learning models for the early detection of hospital-acquired infections (HAIs), which could substantially improve patient outcomes and hospital infection management through early interventions.
  • methods: The two models, the Infection Risk Index (IRI), built to predict all HAIs, and the Ventilator-Associated Pneumonia (VAP) prediction model, differ in infection label definition, cohort selection, and prediction schema.
  • results: The comparative analysis characterizes the concordances and confusions between the two models in predicting HAIs, informing how multiple concurrent disease-specific models might be deployed in the future.
    Abstract As more and more infection-specific machine learning models are developed and planned for clinical deployment, simultaneously running predictions from different models may provide overlapping or even conflicting information. It is important to understand the concordance and behavior of parallel models in deployment. In this study, we focus on two models for the early detection of hospital-acquired infections (HAIs): 1) the Infection Risk Index (IRI) and 2) the Ventilator-Associated Pneumonia (VAP) prediction model. The IRI model was built to predict all HAIs, whereas the VAP model identifies patients at risk of developing ventilator-associated pneumonia. These models could make important improvements in patient outcomes and hospital management of infections through early detection of infections and in turn, enable early interventions. The two models vary in terms of infection label definition, cohort selection, and prediction schema. In this work, we present a comparative analysis between the two models to characterize concordances and confusions in predicting HAIs by these models. The learnings from this study will provide important findings for how to deploy multiple concurrent disease-specific models in the future.
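A minimal sketch of this kind of concordance analysis, using hypothetical binary alerts from two parallel models (the IRI/VAP labels below are synthetic):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(4)
base = rng.random(1000)                             # shared underlying risk signal
iri_alert = (base + 0.1 * rng.random(1000)) > 0.7   # hypothetical IRI alerts
vap_alert = (base + 0.3 * rng.random(1000)) > 0.8   # hypothetical VAP alerts

print("raw agreement:", np.mean(iri_alert == vap_alert))
print("Cohen's kappa:", cohen_kappa_score(iri_alert, vap_alert))
print("confusion matrix (IRI x VAP):\n", confusion_matrix(iri_alert, vap_alert))
```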

A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width

  • paper_url: http://arxiv.org/abs/2311.09197
  • repo_url: None
  • paper_authors: Jason Gaitonde, Elchanan Mossel
  • for: The paper revisits efficiently learning the underlying parameters of Ising models from data, extending guarantees beyond the standard assumptions of i.i.d. samples from the stationary measure and bounded "width" of the model.
  • methods: A simple existing approach based on node-wise logistic regression is shown to provably recover the underlying model in several new settings: dynamically generated data from a wide variety of local Markov chains (such as block or round-robin dynamics), and the Sherrington-Kirkpatrick spin-glass model given $\mathsf{poly}(n)$ independent samples in most of the known high-temperature regime.
  • results: Logistic regression achieves optimal sample complexity up to $\log\log n$ factors for data from local Markov chains, yields an exponential improvement in learning from samples in the M-regime of data considered by Dutt, Lokhov, Vuffray, and Misra [ICML'21], and gives novel guarantees for learning from the adversarial Glauber dynamics of Chin, Moitra, Mossel, and Sandon [ArXiv'23], all without algorithmic modification.
    Abstract We revisit the problem of efficiently learning the underlying parameters of Ising models from data. Current algorithmic approaches achieve essentially optimal sample complexity when given i.i.d. samples from the stationary measure and the underlying model satisfies "width" bounds on the total $\ell_1$ interaction involving each node. We show that a simple existing approach based on node-wise logistic regression provably succeeds at recovering the underlying model in several new settings where these assumptions are violated: (1) Given dynamically generated data from a wide variety of local Markov chains, like block or round-robin dynamics, logistic regression recovers the parameters with optimal sample complexity up to $\log\log n$ factors. This generalizes the specialized algorithm of Bresler, Gamarnik, and Shah [IEEE Trans. Inf. Theory'18] for structure recovery in bounded degree graphs from Glauber dynamics. (2) For the Sherrington-Kirkpatrick model of spin glasses, given $\mathsf{poly}(n)$ independent samples, logistic regression recovers the parameters in most of the known high-temperature regime via a simple reduction to weaker structural properties of the measure. This improves on recent work of Anari, Jain, Koehler, Pham, and Vuong [ArXiv'23] which gives distribution learning at higher temperature. (3) As a simple byproduct of our techniques, logistic regression achieves an exponential improvement in learning from samples in the M-regime of data considered by Dutt, Lokhov, Vuffray, and Misra [ICML'21] as well as novel guarantees for learning from the adversarial Glauber dynamics of Chin, Moitra, Mossel, and Sandon [ArXiv'23]. Our approach thus significantly generalizes the elegant analysis of Wu, Sanghavi, and Dimakis [Neurips'19] without any algorithmic modification.
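The node-wise logistic regression estimator at the heart of the paper is short to state in code. For $\pm 1$ spins, $P(x_i = 1 \mid x_{-i}) = \sigma(2\sum_j J_{ij} x_j)$, so the logistic coefficients of $x_i$ regressed on the remaining spins estimate $2J_{ij}$. A minimal sketch on a toy model, with samples drawn by single-site Glauber dynamics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 8
J = np.zeros((n, n))
J[0, 1] = J[1, 0] = 0.8      # one strong coupling
J[2, 3] = J[3, 2] = -0.6     # one anti-ferromagnetic coupling

# Draw samples with Glauber dynamics (burn-in, then thinning).
x = rng.choice([-1.0, 1.0], size=n)
samples = []
for t in range(60000):
    i = rng.integers(n)
    field = J[i] @ x
    x[i] = 1.0 if rng.random() < 1 / (1 + np.exp(-2 * field)) else -1.0
    if t > 10000 and t % 10 == 0:
        samples.append(x.copy())
S = np.array(samples)

# Node-wise logistic regression: coefficients / 2 estimate the couplings.
J_hat = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    clf = LogisticRegression(C=10.0).fit(S[:, others], S[:, i])
    J_hat[i, others] = clf.coef_[0] / 2.0
print("true J[0,1] = 0.8, estimated:", round(J_hat[0, 1], 2))
```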

Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge

  • paper_url: http://arxiv.org/abs/2311.09195
  • repo_url: None
  • paper_authors: Sang-Hyun Lee, Seung-Woo Seo
  • for: This work targets a major bottleneck in applying reinforcement learning to real-world scenarios: the need for substantial human intervention to reset the environment between episodes.
  • methods: Existing autonomous reinforcement learning (ARL) algorithms generate curricula for jointly training reset and forward policies, reducing the number of required manual resets by tracking the agent's learning progress, but they rely on task-specific knowledge such as predefined initial states or reset reward functions. The proposed ARL algorithm instead generates a curriculum adaptive to the agent's learning progress without any task-specific knowledge, empowering the agent to autonomously reset to diverse and informative initial states. It introduces a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy, trained on relabeled transitions in a self-supervised manner.
  • results: Experiments show the algorithm generates an adaptive curriculum and enables the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.
    Abstract A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.
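A minimal sketch of the success-discriminator component: a binary classifier mapping an initial state to the probability that the forward policy succeeds from it, trained on relabeled outcomes. The toy success labels and the "informative state" selection rule below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
disc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Toy relabeled data: success is more likely from initial states near the origin.
states = torch.randn(4096, 2)
success = (states.norm(dim=1) < 1.0).float().unsqueeze(1)

for _ in range(200):
    opt.zero_grad()
    loss = bce(disc(states), success)
    loss.backward()
    opt.step()

# A reset policy could then prefer informative initial states, e.g. those with
# estimated success probability near 0.5 (neither trivially easy nor hopeless).
probs = torch.sigmoid(disc(states)).detach()
informative = states[(probs - 0.5).abs().squeeze() < 0.1]
print("candidate informative initial states:", informative.shape[0])
```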

Approaching adverse event detection utilizing transformers on clinical time-series

  • paper_url: http://arxiv.org/abs/2311.09165
  • repo_url: None
  • paper_authors: Helge Fredriksen, Per Joel Burman, Ashenafi Woldaregay, Karl Øyvind Mikalsen, Ståle Nymo
  • for: Detecting deviations from patients' expected clinical trajectories in order to help avoid adverse events.
  • methods: An anomaly detection system built on 16 months of vital sign recordings from the Nordland Hospital Trust (NHT): a self-supervised framework based on the STraTS transformer architecture represents the time series in a latent space, and various clustering techniques are applied to explore potential patient phenotypes based on clinical progress.
  • results: Preliminary results from this ongoing research are promising, but they underscore the need to enhance the dataset with additional patient demographic information for a more comprehensive evaluation of the method's performance.
    Abstract Patients being admitted to a hospital will most often be associated with a certain clinical development during their stay. However, there is always a risk of patients being subject to the wrong diagnosis or to a certain treatment not pertaining to the desired effect, potentially leading to adverse events. Our research aims to develop an anomaly detection system for identifying deviations from expected clinical trajectories. To address this goal we analyzed 16 months of vital sign recordings obtained from the Nordland Hospital Trust (NHT). We employed an self-supervised framework based on the STraTS transformer architecture to represent the time series data in a latent space. These representations were then subjected to various clustering techniques to explore potential patient phenotypes based on their clinical progress. While our preliminary results from this ongoing research are promising, they underscore the importance of enhancing the dataset with additional demographic information from patients. This additional data will be crucial for a more comprehensive evaluation of the method's performance.
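A hedged sketch of the clustering stage: given latent representations Z from any sequence encoder (the paper uses a self-supervised STraTS transformer, omitted here), candidate phenotypes can be explored as follows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
# Stand-in embeddings: two synthetic "phenotype" clusters in a 16-d latent space.
Z = np.vstack([rng.normal(0, 1, (200, 16)), rng.normal(3, 1, (200, 16))])

for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    print(f"k={k}: silhouette={silhouette_score(Z, labels):.3f}")
```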

Improved Sparse Ising Optimization

  • paper_url: http://arxiv.org/abs/2311.09275
  • repo_url: None
  • paper_authors: Kenneth M. Zick
  • for: Sparse Ising problems arise in areas such as logistics, condensed matter physics, and the training of deep Boltzmann networks, but can be very difficult to solve with high efficiency and accuracy.
  • methods: A new heuristic search algorithm, tested on the large sparse instances of the Gset benchmark suite. Relative to leading reported combinations of speed and accuracy (e.g., Toshiba's Simulated Bifurcation Machine and Breakout Local Search), a proof-of-concept implementation reached targets 2-4 orders of magnitude faster.
  • results: On longstanding benchmark problems with up to 20,000 variables, the algorithm discovered better solutions than all previously reported values on two instances (G72 and G77), with solution bitstrings provided as confirmation. The data suggest the sparse Ising performance frontier can be pushed further, potentially strengthening algorithm portfolios, AI toolkits, and decision-making systems.
    Abstract Sparse Ising problems can be found in application areas such as logistics, condensed matter physics and training of deep Boltzmann networks, but can be very difficult to tackle with high efficiency and accuracy. This report presents new data demonstrating significantly higher performance on some longstanding benchmark problems with up to 20,000 variables. The data come from a new heuristic algorithm tested on the large sparse instances from the Gset benchmark suite. Relative to leading reported combinations of speed and accuracy (e.g., from Toshiba's Simulated Bifurcation Machine and Breakout Local Search), a proof-of-concept implementation reached targets 2-4 orders of magnitude faster. For two instances (G72 and G77) the new algorithm discovered a better solution than all previously reported values. Solution bitstrings confirming these two best solutions are provided. The data suggest exciting possibilities for pushing the sparse Ising performance frontier to potentially strengthen algorithm portfolios, AI toolkits and decision-making systems.
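The paper's heuristic itself is not described in enough detail to reproduce, but the class of baselines it is measured against is easy to sketch: greedy single-flip local search on the sparse Ising energy $E(x) = -\tfrac{1}{2} x^\top J x$.

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(7)
n = 500
# Random sparse symmetric couplings with +/-1 weights, zero diagonal.
J = sparse_random(n, n, density=0.01, random_state=0,
                  data_rvs=lambda k: rng.choice([-1.0, 1.0], k))
J = ((J + J.T) / 2).tolil()
J.setdiag(0)
J = J.tocsr()

x = rng.choice([-1.0, 1.0], size=n)
field = J @ x                            # h_i = sum_j J_ij x_j
for sweep in range(50):
    improved = False
    for i in range(n):
        delta = 2 * x[i] * field[i]      # energy change of flipping spin i
        if delta < 0:
            field = field - 2 * x[i] * J[:, i].toarray().ravel()
            x[i] = -x[i]
            improved = True
    if not improved:                     # local minimum reached
        break
print("final energy:", -0.5 * x @ (J @ x))
```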

Model Agnostic Explainable Selective Regression via Uncertainty Estimation

  • paper_url: http://arxiv.org/abs/2311.09145
  • repo_url: None
  • paper_authors: Andrea Pugnana, Carlos Mougan, Dan Saattrup Nielsen
  • for: Improving the trustworthiness of machine learning systems by allowing regression models to abstain from predicting.
  • methods: A novel approach to selective regression that uses model-agnostic, non-parametric uncertainty estimation.
  • results: The framework outperforms state-of-the-art selective regressors in comprehensive benchmarking on 69 datasets; explainable AI techniques are used to understand the drivers behind selective regression, and the method is released in the open-source Python package doubt.
    Abstract With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to refrain from predicting. Such a framework is known as selective prediction. While selective prediction for classification tasks has been widely analyzed, the problem of selective regression is understudied. This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation. Our proposed framework showcases superior performance compared to state-of-the-art selective regressors, as demonstrated through comprehensive benchmarking on 69 datasets. Finally, we use explainable AI techniques to gain an understanding of the drivers behind selective regression. We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.
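A generic, model-agnostic sketch of the idea (the paper's own method ships in the `doubt` package; the ensemble-spread uncertainty below is a stand-in, not that package's API):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=3000, n_features=10, noise=10.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

# Non-parametric uncertainty: spread of per-tree predictions.
per_tree = np.stack([t.predict(Xte) for t in model.estimators_])
pred, unc = per_tree.mean(0), per_tree.std(0)

accept = unc <= np.quantile(unc, 0.8)     # abstain on the 20% most uncertain inputs
print("MSE, all inputs:     ", mean_squared_error(yte, pred))
print("MSE, accepted subset:", mean_squared_error(yte[accept], pred[accept]))
```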

Machine-learning parameter tracking with partial state observation

  • paper_url: http://arxiv.org/abs/2311.09142
  • repo_url: None
  • paper_authors: Zheng-Meng Zhai, Mohammadamin Moradi, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai
  • for: This paper addresses machine-learning-based tracking of time-varying parameters in complex, nonlinear dynamical systems, which is essential for tasks such as state estimation, prediction, and control.
  • methods: Formulating an inverse problem and exploiting reservoir computing, the framework is model-free and fully data-driven, learning time-varying parameters directly from partial state observation in real time without knowledge of the system's model structure.
  • results: With training data from a subset of the dynamical variables for a small number of known parameter values, the framework accurately predicts the parameter variations in time, demonstrated on low- and high-dimensional, Markovian and non-Markovian nonlinear dynamical systems.
    Abstract Complex and nonlinear dynamical systems often involve parameters that change with time, accurate tracking of which is essential to tasks such as state estimation, prediction, and control. Existing machine-learning methods require full state observation of the underlying system and tacitly assume adiabatic changes in the parameter. Formulating an inverse problem and exploiting reservoir computing, we develop a model-free and fully data-driven framework to accurately track time-varying parameters from partial state observation in real time. In particular, with training data from a subset of the dynamical variables of the system for a small number of known parameter values, the framework is able to accurately predict the parameter variations in time. Low- and high-dimensional, Markovian and non-Markovian nonlinear dynamical systems are used to demonstrate the power of the machine-learning based parameter-tracking framework. Pertinent issues affecting the tracking performance are addressed.
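A minimal echo-state-network sketch of the idea: drive a random reservoir with a partial observation and train a ridge readout to emit the time-varying parameter. The toy system (a sine whose frequency is the "parameter") and the architecture choices are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(8)
T, N = 5000, 200
t = np.arange(T) * 0.05
param = 1.0 + 0.5 * np.sin(0.002 * np.arange(T))     # slowly varying parameter
obs = np.sin(param * t) + 0.01 * rng.normal(size=T)  # partial state observation

W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # spectral radius < 1
W_in = rng.normal(size=N)

states = np.zeros((T, N))
r = np.zeros(N)
for k in range(T):
    r = np.tanh(W @ r + W_in * obs[k])               # leak-free reservoir update
    states[k] = r

# Ridge-regression readout: train on the first half, track on the second half.
lam, H = 1e-3, states[:T // 2]
w_out = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ param[:T // 2])
pred = states[T // 2:] @ w_out
print("tracking RMSE:", np.sqrt(np.mean((pred - param[T // 2:]) ** 2)))
```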

Causal prediction models for medication safety monitoring: The diagnosis of vancomycin-induced acute kidney injury

  • paper_url: http://arxiv.org/abs/2311.09137
  • repo_url: None
  • paper_authors: Izak Yasrebi-de Kom, Joanna Klopotowska, Dave Dongelmans, Nicolette De Keizer, Kitty Jager, Ameen Abu-Hanna, Giovanni Cinà
  • for: This study aims to provide data-driven decision support for medication safety monitoring, improving on the current manual, resource-intensive practice for the retrospective diagnosis of adverse drug events (ADEs).
  • methods: A causal modeling approach with two key causal inference components: (1) the target trial emulation framework and (2) estimation of individualized treatment effects using machine learning, used to estimate a lower bound of the probability of causation (PC$_{low}$) from observational data.
  • results: The method is applied to vancomycin-induced acute kidney injury in ICU patients, and the causal model-based PC$_{low}$ estimates are compared with qualitative estimates of the PC provided by a medical expert.
    Abstract The current best practice approach for the retrospective diagnosis of adverse drug events (ADEs) in hospitalized patients relies on a full patient chart review and a formal causality assessment by multiple medical experts. This evaluation serves to qualitatively estimate the probability of causation (PC); the probability that a drug was a necessary cause of an adverse event. This practice is manual, resource intensive and prone to human biases, and may thus benefit from data-driven decision support. Here, we pioneer a causal modeling approach using observational data to estimate a lower bound of the PC (PC$_{low}$). This method includes two key causal inference components: (1) the target trial emulation framework and (2) estimation of individualized treatment effects using machine learning. We apply our method to the clinically relevant use-case of vancomycin-induced acute kidney injury in intensive care patients, and compare our causal model-based PC$_{low}$ estimates to qualitative estimates of the PC provided by a medical expert. Important limitations and potential improvements are discussed, and we conclude that future improved causal models could provide essential data-driven support for medication safety monitoring in hospitalized patients.
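A hedged sketch of the estimation step, using a T-learner and the classical Tian-Pearl lower bound $\mathrm{PC}_{low} = \max\big(0, (P(Y\,|\,\text{treated}) - P(Y\,|\,\text{untreated})) / P(Y\,|\,\text{treated})\big)$; whether this matches the paper's exact estimand is an assumption, and the target-trial-emulation confounding adjustment is assumed to be handled upstream. All data below are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(9)
n = 5000
X = rng.normal(size=(n, 4))                   # patient covariates (toy)
treated = rng.integers(0, 2, n)               # drug exposure (toy)
p_ade = 0.1 + 0.15 * treated * (X[:, 0] > 0)  # drug harms one subgroup
y = rng.random(n) < p_ade                     # adverse event (toy)

# T-learner: separate outcome models for treated and untreated patients.
mu1 = GradientBoostingClassifier().fit(X[treated == 1], y[treated == 1])
mu0 = GradientBoostingClassifier().fit(X[treated == 0], y[treated == 0])

p1 = mu1.predict_proba(X)[:, 1]
p0 = mu0.predict_proba(X)[:, 1]
pc_low = np.clip((p1 - p0) / np.clip(p1, 1e-6, None), 0, 1)
print("mean PC_low, harmed subgroup:", pc_low[X[:, 0] > 0].mean().round(2))
print("mean PC_low, other patients: ", pc_low[X[:, 0] <= 0].mean().round(2))
```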

Fast Detection of Phase Transitions with Multi-Task Learning-by-Confusion

  • paper_url: http://arxiv.org/abs/2311.09128
  • repo_url: None
  • paper_authors: Julian Arnold, Frank Schäfer, Niels Lörch
  • for: studying phase transitions with machine learning
  • methods: the learning-by-confusion scheme, reformulated as multi-task learning with a single multi-class classifier
  • results: significant speedups, with only minor deviations from the ideal case
    Abstract Machine learning has been successfully used to study phase transitions. One of the most popular approaches to identifying critical points from data without prior knowledge of the underlying phases is the learning-by-confusion scheme. As input, it requires system samples drawn from a grid of the parameter whose change is associated with potential phase transitions. Up to now, the scheme required training a distinct binary classifier for each possible splitting of the grid into two sides, resulting in a computational cost that scales linearly with the number of grid points. In this work, we propose and showcase an alternative implementation that only requires the training of a single multi-class classifier. Ideally, such multi-task learning eliminates the scaling with respect to the number of grid points. In applications to the Ising model and an image dataset generated with Stable Diffusion, we find significant speedups that closely correspond to the ideal case, with only minor deviations.
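A minimal sketch of the multi-task trick on toy data: train one multi-class classifier over the parameter grid, then read off any tentative split's binary prediction by summing class probabilities on each side, instead of training a separate binary classifier per split. The toy "phases" below are mean-shifted Gaussians.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
grid = np.linspace(0, 1, 21)                 # tuning-parameter grid
X, g = [], []
for j, p in enumerate(grid):                 # synthetic transition at p = 0.5
    mean = 0.0 if p < 0.5 else 2.0
    X.append(rng.normal(mean, 1.0, (200, 5)))
    g.append(np.full(200, j))
X, g = np.vstack(X), np.concatenate(g)

# One multi-class classifier over all grid points replaces per-split binaries.
clf = LogisticRegression(max_iter=2000).fit(X, g)
proba = clf.predict_proba(X)

# For a split after grid index j, P(left side) is the summed class probability.
for j in (5, 10, 15):
    p_left = proba[:, : j + 1].sum(axis=1)
    acc = np.mean((p_left > 0.5) == (g <= j))
    print(f"split at grid point {j}: binary accuracy {acc:.3f}")
```

The per-split accuracy peaks near the true transition, which is the learning-by-confusion signal; here it comes from a single trained model rather than one classifier per grid splitting.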

Constructing interpretable principal curve using Neural ODEs

  • paper_url: http://arxiv.org/abs/2311.09274
  • repo_url: None
  • paper_authors: Guangzheng Zhang, Bingxian Xu
  • for: This paper aims to characterize high-dimensional data sets in a dynamical manner, using neural ODEs to define a principal flow that summarizes the space.
  • methods: The principal flow, defined with neural ODEs, directs the motion of a particle through the space so that the particle's trajectory resembles the principal curve of the dataset, capturing local geometry dynamically rather than as a static, non-parametric structure.
  • results: The framework can characterize shapes of various complexities and is flexible enough to incorporate summaries of relaxation dynamics.
    Abstract The study of high dimensional data sets often rely on their low dimensional projections that preserve the local geometry of the original space. While numerous methods have been developed to summarize this space as variations of tree-like structures, they are usually non-parametric and "static" in nature. As data may come from systems that are dynamical such as a differentiating cell, a static, non-parametric characterization of the space may not be the most appropriate. Here, we developed a framework, the principal flow, that is capable of characterizing the space in a dynamical manner. The principal flow, defined using neural ODEs, directs motion of a particle through the space, where the trajectory of the particle resembles the principal curve of the dataset. We illustrate that our framework can be used to characterize shapes of various complexities, and is flexible to incorporate summaries of relaxation dynamics.
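A loose sketch of the ingredients, not the paper's objective: a learned vector field integrated with explicit Euler steps, trained with a toy loss that pulls the resulting curve toward the data cloud.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.randn(500, 2) * torch.tensor([2.0, 0.3])   # elongated point cloud

field = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(field.parameters(), lr=1e-2)

def rollout(z0, steps=20, dt=0.2):
    traj, z = [z0], z0
    for _ in range(steps):
        z = z + dt * field(z)          # Euler step of dz/dt = f_theta(z)
        traj.append(z)
    return torch.stack(traj)

z0 = data.mean(0) - torch.tensor([3.0, 0.0])            # start left of the cloud
for _ in range(300):
    opt.zero_grad()
    traj = rollout(z0)
    # Toy objective: every data point should lie near some point of the curve.
    loss = torch.cdist(data, traj).min(dim=1).values.mean()
    loss.backward()
    opt.step()
print("mean data-to-curve distance:", loss.item())
```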

Damped Proximal Augmented Lagrangian Method for weakly-Convex Problems with Convex Constraints

  • paper_url: http://arxiv.org/abs/2311.09065
  • repo_url: None
  • paper_authors: Hari Dahal, Wei Liu, Yangyang Xu
  • for: Solving problems with a weakly convex objective and convex linear/nonlinear constraints.
  • methods: A damped proximal augmented Lagrangian method (DPALM) that adopts a damped dual stepsize, rather than a full stepsize, to ensure the boundedness of the dual iterates.
  • results: DPALM produces a (near) $\varepsilon$-KKT point within $O(\varepsilon^{-2})$ outer iterations when each subproblem is solved to proper accuracy; the overall complexity is $\widetilde{\mathcal{O}}(\varepsilon^{-2.5})$ for regularized smooth objectives and $\widetilde{\mathcal{O}}(\varepsilon^{-3})$ for regularized compositional objectives. Numerical experiments show DPALM is empirically more efficient than several state-of-the-art methods.
    Abstract We give a damped proximal augmented Lagrangian method (DPALM) for solving problems with a weakly-convex objective and convex linear/nonlinear constraints. Instead of taking a full stepsize, DPALM adopts a damped dual stepsize to ensure the boundedness of dual iterates. We show that DPALM can produce a (near) $\varepsilon$-KKT point within $O(\varepsilon^{-2})$ outer iterations if each DPALM subproblem is solved to a proper accuracy. In addition, we establish overall iteration complexity of DPALM when the objective is either a regularized smooth function or in a regularized compositional form. For the former case, DPALM achieves the complexity of $\widetilde{\mathcal{O}}\left(\varepsilon^{-2.5}\right)$ to produce an $\varepsilon$-KKT point by applying an accelerated proximal gradient (APG) method to each DPALM subproblem. For the latter case, the complexity of DPALM is $\widetilde{\mathcal{O}}\left(\varepsilon^{-3}\right)$ to produce a near $\varepsilon$-KKT point by using an APG to solve a Moreau-envelope smoothed version of each subproblem. Our outer iteration complexity and the overall complexity either generalize existing best ones from unconstrained or linear-constrained problems to convex-constrained ones, or improve over the best-known results on solving the same-structured problems. Furthermore, numerical experiments on linearly/quadratically constrained non-convex quadratic programs and linear-constrained robust nonlinear least squares are conducted to demonstrate the empirical efficiency of the proposed DPALM over several state-of-the art methods.
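Schematically, with augmented Lagrangian function and damped dual update (the paper's exact proximal terms and stepsize rules may differ):

$$
\mathcal{L}_\beta(x, y) = f(x) + \langle y, c(x) \rangle + \frac{\beta}{2}\,\|c(x)\|^2,
\qquad
y^{k+1} = y^k + \alpha\,\beta\, c(x^{k+1}), \quad \alpha \in (0, 1),
$$

where $\alpha < 1$ is the damping factor that keeps the dual iterates bounded; $\alpha = 1$ would recover the classical full-stepsize multiplier update.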

New Horizons in Parameter Regularization: A Constraint Approach

  • paper_url: http://arxiv.org/abs/2311.09058
  • repo_url: None
  • paper_authors: Jörg K. H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter
  • for: This work proposes constrained parameter regularization (CPR), a new training method intended as an alternative to traditional weight decay.
  • methods: Learning is reformulated as a constrained optimization problem that enforces an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups, solved with an adaptation of the augmented Lagrangian method. This allows regularization strengths to vary across parameter groups without explicit penalty coefficients in the regularization terms.
  • results: Experiments on the "grokking" phenomenon, image classification, and language modeling show that CPR can counteract the effects of grokking and consistently matches or surpasses traditional weight decay, while requiring only two hyperparameters and introducing no measurable runtime overhead.
    Abstract This work presents constrained parameter regularization (CPR), an alternative to traditional weight decay. Instead of applying a constant penalty uniformly to all parameters, we enforce an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups. This reformulates learning as a constrained optimization problem. To solve this, we utilize an adaptation of the augmented Lagrangian method. Our approach allows for varying regularization strengths across different parameter groups, removing the need for explicit penalty coefficients in the regularization terms. CPR only requires two hyperparameters and introduces no measurable runtime overhead. We offer empirical evidence of CPR's effectiveness through experiments in the "grokking" phenomenon, image classification, and language modeling. Our findings show that CPR can counteract the effects of grokking, and it consistently matches or surpasses the performance of traditional weight decay.
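A schematic sketch of the constrained formulation in an augmented-Lagrangian style (the paper's exact adaptation may differ; the quadratic penalty term is omitted for brevity): enforce $\|\theta_g\|_2^2 \le \kappa$ per parameter group $g$ via a multiplier $\lambda_g$ updated by dual ascent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

kappa, mu = 1.0, 0.1                                  # constraint level, dual stepsize
groups = [p for p in model.parameters() if p.dim() > 1]  # weight matrices only
lams = [torch.zeros(()) for _ in groups]              # one multiplier per group

X, y = torch.randn(256, 10), torch.randn(256, 1)
for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    for p, lam in zip(groups, lams):                  # Lagrangian terms, no fixed penalty coefficient
        loss = loss + lam * (p.pow(2).sum() - kappa)
    loss.backward()
    opt.step()
    with torch.no_grad():                             # dual ascent, projected to lambda >= 0
        for i, p in enumerate(groups):
            lams[i] = torch.clamp(lams[i] + mu * (p.pow(2).sum() - kappa), min=0.0)

print("squared group norms:", [round(p.pow(2).sum().item(), 2) for p in groups])
```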

On the Foundation of Distributionally Robust Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.09018
  • repo_url: None
  • paper_authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
  • for: This paper contributes to the theoretical foundation of distributionally robust reinforcement learning (DRRL) by providing a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs).
  • methods: The paper unifies and extends existing formulations of DRMDPs, and rigorously constructs DRMDPs that embrace various modeling attributes for both the decision maker and the adversary.
  • results: The paper examines conditions for the existence or absence of the dynamic programming principle (DPP) within the DRMDP framework, and provides streamlined proofs grounded in a unified methodology. Additionally, the paper offers counterexamples for settings in which a DPP with full generality is absent.
    Abstract Motivated by the need for a robust policy in the face of environment shifts between training and the deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embraces various modeling attributes for both the decision maker and the adversary. These attributes include adaptability granularity, exploring history-dependent, Markov, and Markov time-homogeneous decision maker and adversary dynamics. Additionally, we delve into the flexibility of shifts induced by the adversary, examining SA and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of DPP holds significant implications, as the vast majority of existing data and computationally efficiency RL algorithms are reliant on the DPP. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.
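For reference, in the common $(s,a)$-rectangular case the DPP, when it holds, validates the robust Bellman equation (standard notation, not this paper's most general setting):

$$
V(s) = \max_{a \in \mathcal{A}} \; \min_{P \in \mathcal{P}(s,a)}
\Big[ r(s,a) + \gamma \, \mathbb{E}_{s' \sim P}\big[ V(s') \big] \Big],
$$

where $\mathcal{P}(s,a)$ is the adversary's ambiguity set of transition kernels at $(s,a)$; the paper studies when equations of this kind remain valid under richer controller and adversary attributes.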

Semidefinite programs simulate approximate message passing robustly

  • paper_url: http://arxiv.org/abs/2311.09017
  • repo_url: None
  • paper_authors: Misha Ivkov, Tselil Schramm
  • for: AMP algorithms are known to optimally solve many average-case optimization problems.
  • methods: shows that a large class of AMP algorithms can be simulated in polynomial time by local statistics hierarchy semidefinite programs (SDPs)
  • results: the first robust guarantees for many of these problems, holding even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted, including for optimizing the Sherrington-Kirkpatrick Hamiltonian
    Abstract Approximate message passing (AMP) is a family of iterative algorithms that generalize matrix power iteration. AMP algorithms are known to optimally solve many average-case optimization problems. In this paper, we show that a large class of AMP algorithms can be simulated in polynomial time by \emph{local statistics hierarchy} semidefinite programs (SDPs), even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted. Ours are the first robust guarantees for many of these problems. Further, our results offer an interesting counterpoint to strong lower bounds against less constrained SDP relaxations for average-case max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and other problems.
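For context, a generic AMP iteration that such SDPs would need to simulate has the form (with the Onsager correction term; the nonlinearities $f_t$ vary by problem):

$$
x^{t+1} = A\, f_t(x^t) - b_t\, f_{t-1}(x^{t-1}),
\qquad
b_t = \frac{1}{n} \sum_{i=1}^{n} f_t'\big(x_i^t\big),
$$

where $A$ is the data matrix and the $f_t$ are applied entrywise; the Onsager term $b_t f_{t-1}(x^{t-1})$ is what distinguishes AMP from plain power iteration.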

sQUlearn – A Python Library for Quantum Machine Learning

  • paper_url: http://arxiv.org/abs/2311.08990
  • repo_url: https://github.com/squlearn/squlearn
  • paper_authors: David A. Kreplin, Moritz Willmann, Jan Schnabel, Frederic Rapp, Marco Roth
  • for: This paper presents a user-friendly, NISQ-ready Python library for quantum machine learning (QML), designed for seamless integration with classical machine learning tools like scikit-learn.
  • methods: The library's dual-layer architecture serves both QML researchers and practitioners, providing a comprehensive toolset that includes quantum kernel methods and quantum neural networks, along with customizable data encoding strategies, automated execution handling, and specialized kernel regularization techniques.
  • results: By focusing on NISQ compatibility and end-to-end automation, sQUlearn aims to bridge the gap between current quantum computing capabilities and practical machine learning applications, enabling efficient prototyping, experimentation, and pipelining.
    Abstract sQUlearn introduces a user-friendly, NISQ-ready Python library for quantum machine learning (QML), designed for seamless integration with classical machine learning tools like scikit-learn. The library's dual-layer architecture serves both QML researchers and practitioners, enabling efficient prototyping, experimentation, and pipelining. sQUlearn provides a comprehensive toolset that includes both quantum kernel methods and quantum neural networks, along with features like customizable data encoding strategies, automated execution handling, and specialized kernel regularization techniques. By focusing on NISQ-compatibility and end-to-end automation, sQUlearn aims to bridge the gap between current quantum computing capabilities and practical machine learning applications.

A Multimodal Dataset of 21,412 Recorded Nights for Sleep and Respiratory Research

  • paper_url: http://arxiv.org/abs/2311.08979
  • repo_url: None
  • paper_authors: Alon Diament, Maria Gorodetski, Adam Jankelow, Ayya Keshet, Tal Shor, Daphna Weissglas-Volkov, Hagai Rossman, Eran Segal
  • for: This study introduces a novel, rich dataset of home sleep apnea tests to support sleep research, personalized healthcare, and machine learning applications in biomedicine.
  • methods: Data were collected with the FDA-approved WatchPAT-300 device from 7,077 participants over 21,412 nights, comprising three levels of sleep data: raw multi-channel time series from sensors, annotated sleep events, and computed summary statistics, including 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV).
  • results: The dataset improves predictive capability for various health-related traits, including body composition, bone density, blood sugar levels, and cardiovascular health, and provides reference values for AHI, sleep efficiency, WASO, and HRV sample entropy, stratified by age and sex.
    Abstract This study introduces a novel, rich dataset obtained from home sleep apnea tests using the FDA-approved WatchPAT-300 device, collected from 7,077 participants over 21,412 nights. The dataset comprises three levels of sleep data: raw multi-channel time-series from sensors, annotated sleep events, and computed summary statistics, which include 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). We present reference values for Apnea/Hypopnea Index (AHI), sleep efficiency, Wake After Sleep Onset (WASO), and HRV sample entropy, stratified by age and sex. Moreover, we demonstrate that the dataset improves the predictive capability for various health related traits, including body composition, bone density, blood sugar levels and cardiovascular health. These results illustrate the dataset's potential to advance sleep research, personalized healthcare, and machine learning applications in biomedicine.
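As one concrete feature, here is a textbook sample entropy (SampEn) implementation of the kind behind the dataset's HRV sample-entropy statistic; the dataset's exact parameter choices are not stated here, so the common defaults m = 2 and r = 0.2·std are assumed.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A/B), where B counts template matches of length m and
    A counts matches of length m+1, under a Chebyshev tolerance r."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def count_matches(mm):
        templates = np.lib.stride_tricks.sliding_window_view(x, mm)
        # Chebyshev distance between all template pairs (i < j).
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
        return np.sum(d[np.triu_indices(len(templates), k=1)] <= r)

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(11)
rr_intervals = 0.8 + 0.05 * rng.normal(size=600)   # toy RR-interval series (seconds)
print("SampEn:", round(sample_entropy(rr_intervals), 3))
```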

Probability of Collision of satellites and space debris for short-term encounters: Rederivation and fast-to-compute upper and lower bounds

  • paper_url: http://arxiv.org/abs/2311.08978
  • repo_url: None
  • paper_authors: Ricardo Ferreira, Cláudia Soares, Marta Guimarães
  • for: The proliferation of space debris in low Earth orbit (LEO) is a major concern for the space industry; predicting the probability of collision between orbiting objects during short-term encounters is therefore a crucial problem.
  • methods: A new derivation from first principles, under the standard short-term encounter assumptions, yields tight and fast-to-compute upper and lower bounds on the probability of collision, reducing the computation to two one-dimensional integrals.
  • results: On a real CDM dataset from ESA's Collision Avoidance Challenge, the approach has the potential to significantly reduce processing time relative to the conventional method of Akella and Alfriend (2000), from an 80% reduction to nearly real-time.
    Abstract The proliferation of space debris in LEO has become a major concern for the space industry. With the growing interest in space exploration, the prediction of potential collisions between objects in orbit has become a crucial issue. It is estimated that, in orbit, there are millions of fragments a few millimeters in size and thousands of inoperative satellites and discarded rocket stages. Given the high speeds that these fragments can reach, even fragments a few millimeters in size can cause fractures in a satellite's hull or put a serious crack in the window of a space shuttle. The conventional method proposed by Akella and Alfriend in 2000 remains widely used to estimate the probability of collision in short-term encounters. Given the small period of time, it is assumed that, during the encounter: (1) trajectories are represented by straight lines with constant velocity; (2) there is no velocity uncertainty and the position exhibits a stationary distribution throughout the encounter; and (3) position uncertainties are independent and represented by Gaussian distributions. This study introduces a novel derivation based on first principles that naturally allows for tight and fast upper and lower bounds for the probability of collision. We tested implementations of both probability and bound computations with the original and our formulation on a real CDM dataset used in ESA's Collision Avoidance Challenge. Our approach reduces the calculation of the probability to two one-dimensional integrals and has the potential to significantly reduce the processing time compared to the traditional method, from 80% to nearly real-time.
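For orientation, the conventional short-term-encounter computation already reduces to a single 1D integral once the position covariance is diagonalized in the encounter plane; the sketch below implements that classical integral of a 2D Gaussian over the combined hard-body disk (not the paper's new bounds).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def collision_probability(mu, sigma, R):
    """mu: (2,) miss vector in the encounter plane; sigma: (2,) standard
    deviations along the plane's principal axes; R: combined hard-body radius."""
    mx, my = mu
    sx, sy = sigma

    def integrand(x):
        half_chord = np.sqrt(max(R**2 - x**2, 0.0))
        # Probability mass of y inside the disk's chord at this x.
        p_y = norm.cdf((half_chord - my) / sy) - norm.cdf((-half_chord - my) / sy)
        return norm.pdf((x - mx) / sx) / sx * p_y

    val, _ = quad(integrand, -R, R)
    return val

# Toy encounter: 10 m combined radius, ~50/20 m miss, 200/100 m uncertainties.
print("Pc =", collision_probability(mu=(50.0, 20.0), sigma=(200.0, 100.0), R=10.0))
```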

A Single-Loop Algorithm for Decentralized Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.08945
  • repo_url: None
  • paper_authors: Youran Dong, Shiqian Ma, Junfeng Yang, Chao Yin
  • for: This paper studies bilevel optimization in decentralized networks, a problem receiving growing attention due to its wide applications in machine learning.
  • methods: A novel single-loop algorithm is proposed for decentralized bilevel optimization with a strongly convex lower-level problem. The algorithm is fully single-loop, avoids heavy matrix-vector multiplications when approximating the hypergradient, and, unlike existing methods for decentralized and federated bilevel optimization, requires no gradient heterogeneity assumption.
  • results: The analysis shows the proposed algorithm achieves the best known convergence rate for bilevel optimization algorithms.
    Abstract Bilevel optimization has received more and more attention recently due to its wide applications in machine learning. In this paper, we consider bilevel optimization in decentralized networks. In particular, we propose a novel single-loop algorithm for solving decentralized bilevel optimization with strongly convex lower level problem. Our algorithm is fully single-loop and does not require heavy matrix-vector multiplications when approximating the hypergradient. Moreover, unlike existing methods for decentralized bilevel optimization and federated bilevel optimization, our algorithm does not require any gradient heterogeneity assumption. Our analysis shows that the proposed algorithm achieves the best known convergence rate for bilevel optimization algorithms.
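For reference, the problem has the generic decentralized bilevel form (notation generic, over $n$ agents with local upper- and lower-level functions $f_i$, $g_i$; not the paper's exact statement):

$$
\min_{x} \; \frac{1}{n}\sum_{i=1}^{n} f_i\big(x, y^*(x)\big)
\quad \text{s.t.} \quad
y^*(x) = \arg\min_{y} \; \frac{1}{n}\sum_{i=1}^{n} g_i(x, y),
$$

where each agent only communicates with its network neighbors and the lower-level objective is strongly convex in $y$, which is what makes $y^*(x)$ well defined.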

Efficiently Escaping Saddle Points for Non-Convex Policy Optimization

  • paper_url: http://arxiv.org/abs/2311.08914
  • repo_url: None
  • paper_authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
  • for: This work proposes a variance-reduced second-order method for non-convex policy optimization that converges to an approximate second-order stationary point (SOSP), avoiding the bad local optima and saddle points that first-order stationary points may be.
  • methods: The method uses second-order information in the form of Hessian-vector products (HVPs), avoiding explicit Hessian computation, and its variance-reduction technique bypasses importance-sampling weights by using the HVP terms.
  • results: It reaches an approximate SOSP with sample complexity $\tilde{O}(\epsilon^{-3})$, improving the best-known sample complexity for approximate SOSPs by a factor of $O(\epsilon^{-0.5})$; experiments show the algorithm outperforms the state of the art and is more robust to changes in random seeds.
    Abstract Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(\epsilon^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$. This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of $O(\epsilon^{-0.5})$. Moreover, the proposed variance reduction technique bypasses IS weights by using HVP terms. Our experimental results show that the proposed algorithm outperforms the state of the art and is more robust to changes in random seeds.
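The key primitive, a Hessian-vector product computed without ever forming the Hessian, is the standard double-backprop identity Hv = grad(grad(L) . v). A minimal sketch of that primitive (not the paper's full variance-reduced update):

```python
import torch

def hvp(loss_fn, params, v):
    """Hessian-vector product via double backprop: two backward passes,
    no explicit Hessian matrix."""
    g = torch.autograd.grad(loss_fn(params), params, create_graph=True)[0]
    return torch.autograd.grad(torch.dot(g, v), params)[0]

# Sanity check on L(w) = 0.5 * w' A w, whose Hessian is A.
A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
w = torch.zeros(2, requires_grad=True)
v = torch.tensor([1.0, -1.0])
print(hvp(lambda p: 0.5 * p @ A @ p, w, v))   # tensor([ 1.5000, -0.5000]) = A v
```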

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series

  • paper_url: http://arxiv.org/abs/2311.08902
  • repo_url: https://github.com/ratschlab/clinical-embeddings
  • paper_authors: Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch
  • for: These experiments examine how recent deep learning architectures for heterogeneous tabular data transfer to time-series from electronic health records, particularly ICU bedside monitoring records.
  • methods: The study applies these tabular deep learning methods as step-wise (per-time-step) embedding modules for clinical sequences, benchmarking against the tree-based approaches that remain the clinical state of the art.
  • results: Across clinically relevant tasks on two large-scale ICU datasets, the new methods improve time-series modeling performance; in addition, grouping features into predefined semantic groups within the step-wise embedding module yields significant further gains on clinical time-series.
    Abstract Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time-series from electronic health records. In particular, in problems related to the Intensive Care Unit (ICU), the state-of-the-art remains to tackle sequence classification in a tabular manner with tree-based methods. Recent findings in deep learning for tabular data are now surpassing these classical methods by better handling the severe heterogeneity of data input features. Given the similar level of feature heterogeneity exhibited by ICU time-series and motivated by these findings, we explore these novel methods' impact on clinical sequence modeling tasks. By jointly using such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time-series as time-step embedding models, showing overall performance improvement. In particular, we evidence the importance of feature grouping in clinical time-series, with significant performance gains when considering features within predefined semantic groups in the step-wise embedding module.
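The paper's central ingredient, a step-wise embedding with semantic feature grouping, can be sketched as one small encoder per feature group applied independently at every time step, with the group embeddings concatenated before any sequence model. Group indices and sizes below are illustrative assumptions, not the clinical grouping used in the paper.

```python
import torch
import torch.nn as nn

class GroupedStepEmbedding(nn.Module):
    """Step-wise embedding with semantic feature grouping (sketch).
    Each predefined group of input features gets its own small MLP;
    group embeddings are concatenated into one vector per time step."""
    def __init__(self, groups, d_group=16):
        super().__init__()
        self.groups = groups  # list of index lists, e.g. vitals / labs / meds
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(len(g), d_group), nn.ReLU()) for g in groups
        )

    def forward(self, x):                 # x: (batch, time, features)
        parts = [enc(x[..., g]) for g, enc in zip(self.groups, self.encoders)]
        return torch.cat(parts, dim=-1)   # (batch, time, d_group * n_groups)

emb = GroupedStepEmbedding(groups=[[0, 1, 2], [3, 4], [5, 6, 7, 8]])
print(emb(torch.randn(4, 24, 9)).shape)   # torch.Size([4, 24, 48])
```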

FedCode: Communication-Efficient Federated Learning via Transferring Codebooks

  • paper_url: http://arxiv.org/abs/2311.09270
  • repo_url: None
  • paper_authors: Saeed Khalilian, Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia
  • for: To improve communication efficiency in Federated Learning (FL), reducing the transmission burden between clients and the server.
  • methods: FedCode has clients transmit only codebooks, i.e., the cluster centers of updated model weight values, while full model weights are exchanged periodically to keep clusters calibrated between server and clients and the learning curve smooth.
  • results: Evaluated on several public datasets with ResNet-20 and MobileNet backbones, FedCode achieves a 12.2-fold average reduction in data transmission while remaining close to FedAvg in model performance (an average accuracy loss of 1.3%). Under non-IID data distributions, it achieves roughly a 12.7-fold reduction with a 2.0% average accuracy loss.
    Abstract Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized local data. While FL offers appealing properties for clients' data privacy, it imposes high communication burdens for exchanging model weights between a server and the clients. Existing approaches rely on model compression techniques, such as pruning and weight clustering to tackle this. However, transmitting the entire set of weight updates at each federated round, even in a compressed format, limits the potential for a substantial reduction in communication volume. We propose FedCode where clients transmit only codebooks, i.e., the cluster centers of updated model weight values. To ensure a smooth learning curve and proper calibration of clusters between the server and the clients, FedCode periodically transfers model weights after multiple rounds of solely communicating codebooks. This results in a significant reduction in communication volume between clients and the server in both directions, without imposing significant computational overhead on the clients or leading to major performance degradation of the models. We evaluate the effectiveness of FedCode using various publicly available datasets with ResNet-20 and MobileNet backbone model architectures. Our evaluations demonstrate a 12.2-fold data transmission reduction on average while maintaining a comparable model performance with an average accuracy loss of 1.3% compared to FedAvg. Further validation of FedCode performance under non-IID data distributions showcased an average accuracy loss of 2.0% compared to FedAvg while achieving approximately a 12.7-fold data transmission reduction.
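The codebook step can be sketched as scalar k-means over a client's weight values: only the k cluster centers travel over the wire. How assignments stay in sync between client and server is glossed over here (this sketch just keeps them locally), so treat it as the compression idea only, not the full FedCode protocol.

```python
import numpy as np
from sklearn.cluster import KMeans

def weights_to_codebook(w, k=32):
    """Compress a flat weight vector into a k-entry codebook (the
    cluster centers of the weight values) plus local assignments."""
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(w.reshape(-1, 1))
    return km.cluster_centers_.ravel(), km.labels_

def codebook_to_weights(codebook, labels):
    """Reconstruct weights by looking each value up in the codebook."""
    return codebook[labels]

w = np.random.randn(10_000).astype(np.float32)
cb, lab = weights_to_codebook(w, k=32)       # only cb (32 floats) is uploaded
print(np.mean((w - codebook_to_weights(cb, lab)) ** 2))   # quantization MSE
```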

Towards Label Embedding – Measuring classification difficulty

  • paper_url: http://arxiv.org/abs/2311.08874
  • repo_url: None
  • paper_authors: Katharina Hechinger, Christoph Koller, Xiao Xiang Zhu, Göran Kauermann
  • for: To construct high-quality label embeddings from the voting distributions of multiple independent annotators, without assuming a single ground-truth label per instance.
  • methods: The votes are modeled in a Bayesian setup via a Dirichlet-Multinomial model; the model and posteriors are estimated with a stochastic Expectation-Maximization algorithm using Markov Chain Monte Carlo steps.
  • results: Applied to three benchmark datasets, the approach yields high-quality label embeddings, and the resulting correlation matrices, which can be read as generalized confusion matrices, reflect the semantic similarity of the original classes well.
    Abstract Uncertainty quantification in machine learning is a timely and vast field of research. In supervised learning, uncertainty can already occur in the very first stage of the training process, the labelling step. In particular, this is the case when not every instance can be unambiguously classified. The problem occurs for classifying instances, where classes may overlap or instances can not be clearly categorised. In other words, there is inevitable ambiguity in the annotation step and not necessarily a 'ground truth'. We look exemplary at the classification of satellite images. Each image is annotated independently by multiple labellers and classified into local climate zones (LCZs). For each instance we have multiple votes, leading to a distribution of labels rather than a single value. The main idea of this work is that we do not assume a ground truth label but embed the votes into a K-dimensional space, with K as the number of possible categories. The embedding is derived from the voting distribution in a Bayesian setup, modelled via a Dirichlet-Multinomial model. We estimate the model and posteriors using a stochastic Expectation Maximisation algorithm with Markov Chain Monte Carlo steps. While we focus on the particular example of LCZ classification, the methods developed in this paper readily extend to other situations where multiple annotators independently label texts or images. We also apply our approach to two other benchmark datasets for image classification to demonstrate this. Besides the embeddings themselves, we can investigate the resulting correlation matrices, which can be seen as generalised confusion matrices and reflect the semantic similarities of the original classes very well for all three exemplary datasets. The insights gained are valuable and can serve as general label embedding if a single ground truth per observation cannot be guaranteed.
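The conjugacy at the heart of this setup has a closed form worth seeing: with a Dirichlet prior and multinomial vote counts, the posterior mean places each instance inside the K-simplex. The sketch below uses this conjugate shortcut as a stand-in for the paper's stochastic-EM/MCMC estimation.

```python
import numpy as np

def vote_embedding(counts, alpha):
    """Embed an instance's label votes as the Dirichlet-Multinomial
    posterior mean: with prior Dir(alpha) and observed counts n, the
    posterior mean is (n + alpha) / (sum(n) + sum(alpha))."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + np.sum(alpha))

# 11 labellers voting over K = 4 classes, with a symmetric prior.
print(vote_embedding([7, 3, 1, 0], alpha=0.5 * np.ones(4)))
```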

Statistical learning by sparse deep neural networks

  • paper_url: http://arxiv.org/abs/2311.08845
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Felix Abramovich
  • for: This paper studies deep neural network estimators based on empirical risk minimization with l_1-regularization.
  • methods: Empirical risk minimization with an l_1 penalty on the network weights.
  • results: The paper derives a general bound on the excess risk and proves that the estimator is adaptively nearly-minimax (up to log-factors) simultaneously across a wide range of function classes, in both regression and (multiclass) classification.
    Abstract We consider a deep neural network estimator based on empirical risk minimization with l_1-regularization. We derive a general bound for its excess risk in regression and classification (including multiclass), and prove that it is adaptively nearly-minimax (up to log-factors) simultaneously across the entire range of various function classes.
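The estimator under analysis is plain empirical risk minimization with an l_1 penalty on the weights; a minimal sketch of the training objective (the architecture and penalty weight are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

def l1_penalized_loss(model, inputs, targets, lam=1e-4):
    """Empirical risk plus an l1 penalty over all parameters, the
    sparsity-inducing objective the paper's theory concerns."""
    risk = nn.functional.cross_entropy(model(inputs), targets)
    l1 = sum(p.abs().sum() for p in model.parameters())
    return risk + lam * l1

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
loss = l1_penalized_loss(model, x, y)
loss.backward()
print(float(loss))
```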

Neuroscience inspired scientific machine learning (Part-1): Variable spiking neuron for regression

  • paper_url: http://arxiv.org/abs/2311.09267
  • repo_url: None
  • paper_authors: Shailesh Garg, Souvik Chakraborty
  • for: To reduce redundant information transfer in neural networks, and thereby the complexity and power consumption of deep learning models.
  • methods: A new Variable Spiking Neuron (VSN) is proposed, blending the biologically inspired Leaky Integrate-and-Fire Spiking Neuron (LIF-SN) with the artificial neuron: it inherits intermittent firing from the former and continuous activation from the latter.
  • results: Tested on both classification and regression tasks, the VSN performs favorably, particularly on regression, a known weak point of vanilla spiking neurons.
    Abstract Redundant information transfer in a neural network can increase the complexity of the deep learning model, thus increasing its power consumption. We introduce in this paper a novel spiking neuron, termed Variable Spiking Neuron (VSN), which can reduce the redundant firing using lessons from biological neuron inspired Leaky Integrate and Fire Spiking Neurons (LIF-SN). The proposed VSN blends LIF-SN and artificial neurons. It garners the advantage of intermittent firing from the LIF-SN and utilizes the advantage of continuous activation from the artificial neuron. This property of the proposed VSN makes it suitable for regression tasks, which is a weak point for the vanilla spiking neurons, all while keeping the energy budget low. The proposed VSN is tested against both classification and regression tasks. The results produced advocate favorably towards the efficacy of the proposed spiking neuron, particularly for regression tasks.
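A toy rendering of the idea: leaky integration with intermittent firing, but a continuous-valued emission so that regression targets can be represented. The exact blending rule below is an assumption for illustration, not the paper's VSN definition.

```python
import numpy as np

def variable_spiking_neuron(inputs, leak=0.9, threshold=1.0):
    """Sketch of a leaky integrate-and-fire neuron whose 'spike' is
    continuous-valued: the membrane potential leaks and integrates
    input; on crossing the threshold the neuron emits the potential
    itself (not a binary 1) and then resets."""
    u, out = 0.0, []
    for x in inputs:
        u = leak * u + x
        if u >= threshold:
            out.append(u)    # continuous emission suits regression
            u = 0.0
        else:
            out.append(0.0)  # stay silent: intermittent, energy-sparing firing
    return out

print(variable_spiking_neuron(np.sin(np.linspace(0, 6, 20)) + 0.5))
```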

Environment-independent mmWave Fall Detection with Interacting Multiple Model

  • paper_url: http://arxiv.org/abs/2311.08755
  • repo_url: None
  • paper_authors: Xuyao Yu, Jiazhao Wang, Wenchao Jiang
  • for: To build a high-accuracy, robust fall detection system for in-home elderly care in future smart homes, operating in a non-invasive, non-cooperative, and non-contact manner.
  • methods: The system uses mmWave radar together with an Interacting Multiple Model (IMM) state estimator that extracts environment-independent features for accurate and near-instantaneous fall detection, plus a robust multi-user tracking system to handle noise from the environment and from other human bodies.
  • results: Deployed on a low-compute, low-power system-on-chip and tested in real-world scenarios, the system detects falls with accuracy of up to 95%.
    Abstract The ageing society brings attention to daily elderly care through sensing technologies. The future smart home is expected to enable in-home daily monitoring, such as fall detection, for seniors in a non-invasive, non-cooperative, and non-contact manner. The mmWave radar is a promising candidate technology for its privacy-preserving and non-contact manner. However, existing solutions suffer from low accuracy and robustness due to environment dependent features. In this paper, we present FADE (FAll DEtection), a practical fall detection radar system with enhanced accuracy and robustness in real-world scenarios. The key enabler underlying FADE is an interacting multiple model (IMM) state estimator that can extract environment-independent features for highly accurate and instantaneous fall detection. Furthermore, we proposed a robust multiple-user tracking system to deal with noises from the environment and other human bodies. We deployed our algorithm on low computing power and low power consumption system-on-chip (SoC) composed of data front end, DSP, and ARM processor, and tested its performance in real-world. The experiment shows that the accuracy of fall detection is up to 95%.
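The IMM idea is easiest to see in scalar form: run one filter per motion hypothesis (say, quasi-static versus fast fall), mix their states through the model-transition matrix, and re-weight model probabilities by measurement likelihood. A generic textbook sketch, not the paper's radar pipeline:

```python
import numpy as np

def imm_step(mu, x, P, filters, z, Pi):
    """One cycle of a minimal scalar Interacting Multiple Model estimator.
    mu: model probabilities; x, P: per-model means/variances; filters:
    callables (x, P, z) -> (x, P, likelihood); Pi[i, j]: switch prob i->j."""
    M = len(mu)
    c = Pi.T @ mu                                  # predicted model probabilities
    w = (Pi * mu[:, None]) / c[None, :]            # mixing weights
    x0 = [sum(w[i, j] * x[i] for i in range(M)) for j in range(M)]
    P0 = [sum(w[i, j] * (P[i] + (x[i] - x0[j]) ** 2) for i in range(M))
          for j in range(M)]
    like = np.empty(M)
    for j in range(M):                             # model-matched filtering
        x[j], P[j], like[j] = filters[j](x0[j], P0[j], z)
    mu = like * c
    mu = mu / mu.sum()                             # posterior model probabilities
    return mu, x, P, sum(mu[j] * x[j] for j in range(M))

def kf1d(q, r):
    def step(x, P, z):
        P = P + q                                  # random-walk predict
        innov, S = z - x, P + r
        x, P = x + (P / S) * innov, (1 - P / S) * P
        return x, P, np.exp(-0.5 * innov**2 / S) / np.sqrt(2 * np.pi * S)
    return step

mu, Pi = np.array([0.5, 0.5]), np.array([[0.95, 0.05], [0.05, 0.95]])
x, P = [0.0, 0.0], [1.0, 1.0]
filters = [kf1d(q=0.01, r=0.25), kf1d(q=2.0, r=0.25)]   # "still" vs "falling"
for z in [0.0, 0.05, 0.8, 2.2, 3.9]:                    # measurements jump
    mu, x, P, est = imm_step(mu, x, P, filters, z, Pi)
print(mu)   # probability mass shifts toward the high-dynamics model
```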

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling

  • paper_url: http://arxiv.org/abs/2311.08745
  • repo_url: None
  • paper_authors: Naoki Sato, Hideaki Iiduka
  • for: This study provides a theoretical analysis of gradient-descent-type methods for finding globally optimal solutions of nonconvex functions.
  • methods: It defines a new family of nonconvex functions for graduated optimization, discusses sufficient conditions for membership, analyzes the convergence of the graduated optimization algorithm, and studies a new framework that uses a decaying learning rate and an increasing batch size.
  • results: The analysis shows that the learning rate and batch size determine how strongly SGD smooths the objective, giving a theoretical account of why large batch sizes fall into sharp local minima and why decaying learning rates with increasing batch sizes outperform fixed ones; image-classification experiments support the theory.
    Abstract The graduated optimization approach is a heuristic method for finding globally optimal solutions for nonconvex functions and has been theoretically analyzed in several studies. This paper defines a new family of nonconvex functions for graduated optimization, discusses their sufficient conditions, and provides a convergence analysis of the graduated optimization algorithm for them. It shows that stochastic gradient descent (SGD) with mini-batch stochastic gradients has the effect of smoothing the function, the degree of which is determined by the learning rate and batch size. This finding provides theoretical insights from a graduated optimization perspective on why large batch sizes fall into sharp local minima, why decaying learning rates and increasing batch sizes are superior to fixed learning rates and batch sizes, and what the optimal learning rate scheduling is. To the best of our knowledge, this is the first paper to provide a theoretical explanation for these aspects. Moreover, a new graduated optimization framework that uses a decaying learning rate and increasing batch size is analyzed and experimental results of image classification that support our theoretical findings are reported.
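For context, the explicit (classical) graduated-optimization loop looks like this: descend on progressively less-smoothed surrogates, warm-starting each stage from the last. The paper's point is that SGD performs such smoothing implicitly, with the degree set by the learning rate and batch size; the smoothing distribution and schedule below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_grad(grad_f, x, delta, n=64):
    """Monte Carlo gradient of the smoothed surrogate
    f_delta(x) = E_u[f(x + delta * u)], u uniform on the unit sphere."""
    u = rng.normal(size=(n, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return np.mean([grad_f(x + delta * ui) for ui in u], axis=0)

def graduated_descent(grad_f, x, deltas=(1.0, 0.3, 0.1, 0.0), lr=0.02, iters=400):
    for d in deltas:                    # coarse-to-fine smoothing schedule
        for _ in range(iters):
            g = smoothed_grad(grad_f, x, d) if d > 0 else grad_f(x)
            x = x - lr * g
    return x

# Deceptive objective f(x) = x^2 + 2*sin(5x): many local minima, one basin.
grad_f = lambda x: 2 * x + 10 * np.cos(5 * x)
print(graduated_descent(grad_f, np.array([3.0])))   # typically lands near the
                                                    # global minimum (~ -0.3)
```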

Towards Graph-Aware Diffusion Modeling for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.08744
  • repo_url: None
  • paper_authors: Yunqin Zhu, Chao Wang, Hui Xiong
  • for: This paper recovers masked implicit feedback with a diffusion-based neural model, helping recommender systems better capture user preferences.
  • methods: It proposes a conditional diffusion framework for collaborative filtering that iteratively reconstructs a user's hidden preferences guided by historical interactions. Forward diffusion applies synthetic smoothing filters to interaction signals on an item-item graph; via the graph Fourier transform, the model is equivalently characterized as an anisotropic Gaussian diffusion in the graph spectral domain.
  • results: The model outperforms state-of-the-art methods by a large margin on one dataset and yields competitive results on the others.
    Abstract Recovering masked feedback with neural models is a popular paradigm in recommender systems. Seeing the success of diffusion models in solving ill-posed inverse problems, we introduce a conditional diffusion framework for collaborative filtering that iteratively reconstructs a user's hidden preferences guided by its historical interactions. To better align with the intrinsic characteristics of implicit feedback data, we implement forward diffusion by applying synthetic smoothing filters to interaction signals on an item-item graph. The resulting reverse diffusion can be interpreted as a personalized process that gradually refines preference scores. Through graph Fourier transform, we equivalently characterize this model as an anisotropic Gaussian diffusion in the graph spectral domain, establishing both forward and reverse formulations. Our model outperforms state-of-the-art methods by a large margin on one dataset and yields competitive results on the others.
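One plausible instantiation of the forward smoothing step: filter a user's interaction vector with a heat-kernel low-pass filter exp(-tL) built from the normalized item-item Laplacian, expressed in the graph Fourier basis. This isotropic stand-in illustrates the spectral-domain view; the paper's filter is anisotropic and learned.

```python
import numpy as np

def graph_smooth(A, r, t=1.0):
    """Smooth an interaction vector r in the graph spectral domain:
    apply exp(-t * L) where L is the symmetric normalized Laplacian
    of the item-item graph A, via its eigendecomposition (the graph
    Fourier transform)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    lam, U = np.linalg.eigh(L)                  # graph Fourier basis
    return U @ (np.exp(-t * lam) * (U.T @ r))   # low-pass filter in spectrum

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy item-item graph
r = np.array([1.0, 0.0, 0.0, 0.0])          # user interacted with item 0 only
print(graph_smooth(A, r, t=1.0))            # signal bleeds to related items
```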

Enabling CMF Estimation in Data-Constrained Scenarios: A Semantic-Encoding Knowledge Mining Model

  • paper_url: http://arxiv.org/abs/2311.08690
  • repo_url: None
  • paper_authors: Yanlin Qi, Jia Li, Michael Zhang
  • for: To provide a reliable and interpretable knowledge-mining framework for estimating Crash Modification Factors (CMFs), reducing the reliance on site-specific crash data.
  • methods: Drawing inspiration from human comprehension processes, the framework uses advanced Natural Language Processing (NLP) techniques to extract intricate variations and patterns from existing CMF knowledge, encoding unstructured countermeasure scenarios into machine-readable representations and modeling the complex relationships between scenarios and CMF values.
  • results: Experiments on real-world CMF Clearinghouse data show significant accuracy improvements over baseline methods, and the approach remains usable when the availability of crash data or time imposes constraints.
    Abstract Precise estimation of Crash Modification Factors (CMFs) is central to evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at given sites. This not only makes the estimation costly, but the results are also less transferable, since the intrinsic similarities between different safety countermeasure scenarios are not fully explored. Aiming to fill this gap, this study introduces a novel knowledge-mining framework for CMF prediction. This framework delves into the connections of existing countermeasures and reduces the reliance of CMF estimation on crash data availability and manual data collection. Specifically, it draws inspiration from human comprehension processes and introduces advanced Natural Language Processing (NLP) techniques to extract intricate variations and patterns from existing CMF knowledge. It effectively encodes unstructured countermeasure scenarios into machine-readable representations and models the complex relationships between scenarios and CMF values. This new data-driven framework provides a cost-effective and adaptable solution that complements the case-specific approaches for CMF estimation, which is particularly beneficial when availability of crash data or time imposes constraints. Experimental validation using real-world CMF Clearinghouse data demonstrates the effectiveness of this new approach, which shows significant accuracy improvements compared to baseline methods. This approach provides insights into new possibilities of harnessing accumulated transportation knowledge in various applications.
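The pipeline shape, encoding free-text countermeasure descriptions and regressing CMF values on them, can be sketched with off-the-shelf components. Everything below (the descriptions, the CMF values, and the TF-IDF/ridge encoder) is an illustrative stand-in for the paper's semantic-encoding model, not its actual data or architecture.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical toy records: countermeasure descriptions -> observed CMFs.
scenarios = [
    "install single-lane roundabout at four-leg stop-controlled intersection",
    "add centerline rumble strips on rural two-lane road",
    "convert two-way left-turn lane to raised median on urban arterial",
    "install pedestrian hybrid beacon at midblock crossing",
]
cmfs = [0.65, 0.85, 0.78, 0.71]   # illustrative values only

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(scenarios, cmfs)
print(model.predict(["install roundabout at three-leg intersection"]))
```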

Federated Learning for Sparse Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2311.08677
  • repo_url: None
  • paper_authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee
  • for: To propose a federated formulation of Sparse Principal Component Analysis (SPCA) so that data owners can train jointly while keeping their data local.
  • methods: SPCA is cast as a consensus optimization problem solved with the Alternating Direction Method of Multipliers (ADMM); beyond the usual L1 regularization, a smoothing function enables gradient-based optimization, and a least-squares approximation yields analytic solutions for the optimization steps, bringing substantial computational savings.
  • results: Extensive experiments on synthetic and public datasets, with both IID and non-IID random features across data owners, confirm the efficacy of the federated SPCA approach.
    Abstract In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keeping data localized. Instead of sending raw data to a central server, only model updates are exchanged, enhancing data security. We apply this framework to Sparse Principal Component Analysis (SPCA) in this work. SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability. Beside the L1 norm regularization term in conventional SPCA, we add a smoothing function to facilitate gradient-based optimization methods. Moreover, in order to improve computational efficiency, we introduce a least squares approximation to original SPCA. This enables analytic solutions on the optimization processes, leading to substantial computational improvements. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets affirm the efficacy of our federated SPCA approach.
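The federation structure is the part worth sketching: raw data never leave the clients, only iterate-sized messages do. For brevity this sketch swaps the paper's ADMM consensus solver for a soft-thresholded federated power iteration with the same communication pattern (clients upload only local matrix-vector products).

```python
import numpy as np

def federated_sparse_pc(Sigmas, d, lam=0.05, iters=100):
    """Each client k keeps its local covariance Sigma_k private and
    uploads only Sigma_k @ v; the server averages the uploads,
    soft-thresholds for sparsity, renormalizes, and broadcasts v.
    A simple stand-in for the paper's ADMM formulation."""
    v = np.ones(d) / np.sqrt(d)
    for _ in range(iters):
        avg = np.mean([S @ v for S in Sigmas], axis=0)            # client uploads
        avg = np.sign(avg) * np.maximum(np.abs(avg) - lam, 0.0)   # sparsify
        v = avg / max(np.linalg.norm(avg), 1e-12)                 # broadcast
    return v

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
X[:, 0] += 2.0 * rng.normal(size=300)      # planted high-variance coordinate
Sigmas = [x.T @ x / len(x) for x in np.array_split(X, 3)]   # 3 "clients"
print(federated_sparse_pc(Sigmas, d=8).round(3))   # loading concentrates on coord 0
```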

Coreset Selection with Prioritized Multiple Objectives

  • paper_url: http://arxiv.org/abs/2311.08675
  • repo_url: None
  • paper_authors: Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Tongliang Liu
  • for: To reduce computational costs and accelerate data processing, enabling deep learning algorithms to train on large-scale data via small coresets.
  • methods: The paper poses the problem of "coreset selection with prioritized multiple objectives", seeking the smallest coreset size under model-performance constraints, and proposes a method that maintains an optimization priority order over model performance and coreset size, with a proven convergence guarantee.
  • results: Extensive experiments confirm its superiority over previous strategies, often yielding better model performance with smaller coreset sizes.
    Abstract Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. When coreset selection is applied in realistic scenes, under the premise that the identified coreset has achieved comparable model performance, practitioners regularly desire the identified coreset can have a size as small as possible for lower costs and greater acceleration. Motivated by this desideratum, for the first time, we pose the problem of "coreset selection with prioritized multiple objectives", in which the smallest coreset size under model performance constraints is explored. Moreover, to address this problem, an innovative method is proposed, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes.
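The problem statement, smallest coreset subject to a performance constraint, can be made concrete as a binary search over an importance-ranked prefix, assuming the constraint is monotone in prefix size. This concretizes the objective only; the paper's prioritized optimization method is different.

```python
def smallest_coreset(order, meets_constraint):
    """Binary-search the smallest prefix of an importance-ranked example
    order whose selected subset still satisfies the model-performance
    constraint. meets_constraint(subset) -> bool is assumed monotone
    in the prefix size."""
    lo, hi = 1, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_constraint(order[:mid]):
            hi = mid          # constraint met: try a smaller coreset
        else:
            lo = mid + 1      # constraint violated: need more examples
    return order[:lo]

# Toy stand-in: "performance" is simply having enough ranked examples.
order = list(range(1000))     # indices sorted by an importance score
print(len(smallest_coreset(order, lambda s: len(s) >= 137)))   # -> 137
```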

Supervised low-rank semi-nonnegative matrix factorization with frequency regularization for forecasting spatio-temporal data

  • paper_url: http://arxiv.org/abs/2311.08636
  • repo_url: None
  • paper_authors: Keunsu Kim, Hanbaek Lyu, Jinsu Kim, Jae-Hun Jung
  • for: To forecast spatio-temporal data using supervised semi-nonnegative matrix factorization (SSNMF) with frequency regularization.
  • methods: Matrix factorization decomposes the spatio-temporal data into spatial and temporal components; a nonnegativity constraint on the temporal domain improves the clarity of temporal patterns, while regularization in the frequency domain selects features in frequency space, making interpretation there more convenient. Soft and hard regularizations are proposed, with convergence guarantees to first-order stationary points.
  • results: Applied to GRACE data, the method produces results comparable to previous work in the geophysical sciences while offering clearer interpretability.
    Abstract We propose a novel methodology for forecasting spatio-temporal data using supervised semi-nonnegative matrix factorization (SSNMF) with frequency regularization. Matrix factorization is employed to decompose spatio-temporal data into spatial and temporal components. To improve clarity in the temporal patterns, we introduce a nonnegativity constraint on the time domain along with regularization in the frequency domain. Specifically, regularization in the frequency domain involves selecting features in the frequency space, making an interpretation in the frequency domain more convenient. We propose two methods in the frequency domain: soft and hard regularizations, and provide convergence guarantees to first-order stationary points of the corresponding constrained optimization problem. While our primary motivation stems from geophysical data analysis based on GRACE (Gravity Recovery and Climate Experiment) data, our methodology has the potential for wider application. Consequently, when applying our methodology to GRACE data, we find that the results with the proposed methodology are comparable to previous research in the field of geophysical sciences but offer clearer interpretability.
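A sketch of what soft frequency regularization can look like on the temporal factor: penalize DFT energy outside a band of slow modes, whose gradient is simply the high-pass projection of the factor (zeroing bins and inverting the rFFT is an orthogonal projection). The cutoff and usage below are illustrative assumptions.

```python
import numpy as np

def freq_reg_grad(T, keep=5):
    """Gradient of the soft penalty 0.5 * ||highpass(T)||^2 on the
    temporal factor T (time x rank): energy outside the first `keep`
    DFT modes is penalized; low-frequency bins contribute nothing."""
    F = np.fft.rfft(T, axis=0)
    F[:keep] = 0.0                     # slow modes are unpenalized
    return np.fft.irfft(F, n=T.shape[0], axis=0)

# Inside an SSNMF solver one could take projected-gradient steps like
#   T = np.maximum(T - lr * (recon_grad + gamma * freq_reg_grad(T)), 0)
# where np.maximum enforces the temporal nonnegativity constraint.
T = np.random.default_rng(0).normal(size=(64, 3))
print(np.abs(freq_reg_grad(T)).mean())
```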

Non-Uniform Smoothness for Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.08615
  • repo_url: https://github.com/lindonroberts/nonuniform-smoothness
  • paper_authors: Albert S. Berahas, Lindon Roberts, Fred Roosta
  • for: This paper proposes a gradient-descent-type method whose stepsizes are set by a local first-order smoothness oracle (LFSO) rather than by expensive hyperparameter tuning.
  • methods: The LFSO generalizes the Lipschitz-continuous-gradient smoothness condition and applies to any twice-differentiable function; a suitably modified gradient descent method uses it to calibrate stepsizes, with global and local convergence results.
  • results: With LFSOs, the modified first-order method attains global linear convergence rates on non-strongly convex problems with extremely flat minima, improving on the lower bound on rates achievable by general (accelerated) first-order methods.
    Abstract The analysis of gradient descent-type methods typically relies on the Lipschitz continuity of the objective gradient. This generally requires an expensive hyperparameter tuning process to appropriately calibrate a stepsize for a given problem. In this work we introduce a local first-order smoothness oracle (LFSO) which generalizes the Lipschitz continuous gradients smoothness condition and is applicable to any twice-differentiable function. We show that this oracle can encode all relevant problem information for tuning stepsizes for a suitably modified gradient descent method and give global and local convergence results. We also show that LFSOs in this modified first-order method can yield global linear convergence rates for non-strongly convex problems with extremely flat minima, and thus improve over the lower bound on rates achievable by general (accelerated) first-order methods.
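A minimal sketch of the mechanism: query a local smoothness estimate at the current iterate and step with 1/L(x). On f(x) = x^4, whose gradient is not globally Lipschitz and whose minimum is extremely flat, this gives a linear contraction, the kind of behavior the paper's theory formalizes. The oracle here is exact by construction; the paper's method and safeguards are more refined.

```python
def lfso_gd(grad_f, lfso, x, iters=50):
    """Gradient descent with per-iterate stepsizes from a local
    first-order smoothness oracle: lfso(x) bounds the gradient's
    Lipschitz constant near x, and the step is 1/lfso(x)."""
    for _ in range(iters):
        x = x - grad_f(x) / lfso(x)
    return x

# f(x) = x^4: grad is 4x^3, local curvature bound L(x) = 12x^2, so the
# update is x <- x - x/3 = (2/3)x, a global linear rate despite the
# completely flat (non-strongly-convex) minimum at 0.
print(lfso_gd(lambda x: 4 * x**3, lambda x: 12 * x**2 + 1e-12, x=3.0))
```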

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

  • paper_url: http://arxiv.org/abs/2311.08610
  • repo_url: None
  • paper_authors: Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf
  • for: This work targets privacy-preserving deep learning, specifically secure inference with transformer models over Homomorphic Encryption (HE).
  • methods: It introduces the first polynomial transformer: a transformer architecture tailored for HE together with a novel method for converting operators to their polynomial equivalents, enabling secure inference for language modeling as well as image classification.
  • results: The models achieve results comparable to traditional methods on WikiText-103, CIFAR-100, and Tiny-ImageNet, bridging the performance gap with transformers of similar scale; stability is assessed and a series of ablations quantifies the contribution of each model component.
    Abstract Designing privacy-preserving deep learning models is a major challenge within the deep learning community. Homomorphic Encryption (HE) has emerged as one of the most promising approaches in this realm, enabling the decoupling of knowledge between the model owner and the data owner. Despite extensive research and application of this technology, primarily in convolutional neural networks, incorporating HE into transformer models has been challenging because of the difficulties in converting these models into a polynomial form. We break new ground by introducing the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers. This includes a transformer architecture tailored for HE, alongside a novel method for converting operators to their polynomial equivalent. This innovation enables us to perform secure inference on LMs with WikiText-103. It also allows us to perform image classification with CIFAR-100 and Tiny-ImageNet. Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications. Finally, we assess the stability of our models and conduct a series of ablations to quantify the contribution of each model component.
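The flavor of the operator-conversion step can be seen by fitting a low-degree polynomial surrogate for a non-polynomial activation over the range activations are expected to occupy; additions and multiplications are all an HE scheme can evaluate. The degree, range, and GELU target below are illustrative, and the paper's conversion method is more involved.

```python
import numpy as np

xs = np.linspace(-4, 4, 2001)
gelu = 0.5 * xs * (1 + np.tanh(np.sqrt(2 / np.pi) * (xs + 0.044715 * xs**3)))

# Degree-8 Chebyshev fit: a polynomial stand-in for GELU on [-4, 4].
poly = np.polynomial.chebyshev.Chebyshev.fit(xs, gelu, deg=8)
print(np.max(np.abs(poly(xs) - gelu)))   # sup-norm error on the fit range
```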