cs.LG - 2023-12-04

Dissecting Medical Referral Mechanisms in Health Services: Role of Physician Professional Networks

  • paper_url: http://arxiv.org/abs/2312.02387
  • repo_url: None
  • paper_authors: Regina de Brito Duarte, Qiwei Han, Claudia Soares
  • for: 这篇论文探讨了医生之间专科医生(SC)和基础医生(PC)的医疗referral对病人质量、满意度和成本产生的影响。
  • methods: 该论文使用了葡萄牙私人医疗提供商五年的咨询数据,进行了探索性数据分析,并将医生之间的专业社交连接建立为referral网络。然后,使用图神经网络(GNN)模型学习推荐网络中的潜在表示。
  • results: 分析结果表明,医生的专业社交连接可以预测医疗referral,这可能会提高医疗机构内部的合作和提高医疗服务质量。这项研究对主要-专科referral的下面机制进行了可质的分析,为改善病人护理和有效管理医疗服务提供了有价值的意见。
    Abstract Medical referrals between primary care physicians (PC) and specialist care (SC) physicians profoundly impact patient care regarding quality, satisfaction, and cost. This paper investigates the influence of professional networks among medical doctors on referring patients from PC to SC. Using five-year consultation data from a Portuguese private health provider, we conducted exploratory data analysis and constructed both professional and referral networks among physicians. We then apply Graph Neural Network (GNN) models to learn latent representations of the referral network. Our analysis supports the hypothesis that doctors' professional social connections can predict medical referrals, potentially enhancing collaboration within organizations and improving healthcare services. This research contributes to dissecting the underlying mechanisms in primary-specialty referrals, thereby providing valuable insights for enhancing patient care and effective healthcare management.
    摘要 医疗专家之间的医疗参考(PC)和专科医生(SC)之间的医疗参考对患者的质量、满意度和成本产生了深远的影响。这篇论文研究了医生之间的专业联系对从PC到SC的患者转介的影响。使用葡萄牙私人医疗提供商五年的咨询数据,我们进行了探索性数据分析,并构建了医生之间的专业和转介网络。然后,我们应用图 нейрон网络(GNN)模型来学习转介网络的隐藏表示。我们的分析支持医生的专业社会连接可以预测医疗参考,可能提高组织内协作和改善医疗服务。这项研究为Primary-specialty referrals的内部机制的分析提供了有价值的信息,以提高患者的护理和有效的医疗管理。

FaultFormer: Transformer-based Prediction of Bearing Faults

  • paper_url: http://arxiv.org/abs/2312.02380
  • repo_url: https://github.com/anthonyzhou-1/faultformer
  • paper_authors: Anthony Zhou, Amir Barati Farimani
  • for: 这个研究用于预测不同类型的发动机缺陷(Bearing Faults),并使用Transformer架构进行分析。
  • methods: 这个研究使用了扩展资料和抽取 Fourier 模式来训练Transformer Encoder,以达到最高精度。
  • results: 研究发现Transformer 能够自动从讯号中提取特征,并学习全局和本地关系,以进行分类。此外,还提出了两种预训练策略,以便在生产线上适应新数据、情况或机器。
    Abstract The growth of deep learning in the past decade has motivated important applications to smart manufacturing and machine health monitoring. In particular, vibration data offers a rich and reliable source to provide meaningful insights into machine health and predictive maintenance. In this work, we present a Transformer based framework for analyzing vibration signals to predict different types of bearing faults (FaultFormer). In particular, we process signal data using data augmentations and extract their Fourier modes to train a transformer encoder to achieve state of the art accuracies. The attention mechanism as well as model outputs were analyzed to confirm the transformer's ability to automatically extract features within signals and learn both global and local relationships to make classifications. Lastly, two pretraining strategies were proposed to pave the way for large, generalizable transformers that could adapt to new data, situations, or machinery on the production floor.
    摘要 随着深度学习技术在过去一代的发展,智能制造和机器健康监测中得到了重要应用。特别是震动数据提供了一种丰富和可靠的源,以提供机器健康的有价值信息和预测维护。在这项工作中,我们提出了基于Transformer框架的震动信号分析方法,以预测不同类型的承式 fault(FaultFormer)。特别是,我们对信号数据进行了数据扩充和抽取其傅里埃模式,并使用Transformer编码器来实现state of the art的准确率。transformer的注意机制以及模型输出被分析,以确认transformer的能力自动提取信号中的特征和学习全局和局部关系,以进行分类。最后,我们提出了两种预训练策略,以便在生产线上适应新数据、情况或机器。

On the Trade-Off between Stability and Representational Capacity in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2312.02372
  • repo_url: None
  • paper_authors: Zhan Gao, Amanda Prorok, Elvin Isufi
  • for: 本研究旨在 investigate graph neural networks (GNNs) 的稳定性 unter topological perturbations,以便更好地理解它们的可转移性和每个architecture component的作用。
  • methods: 本研究使用 EdgeNet 框架,覆盖了更多 than twenty 的解决方案,包括卷积和注意力基本类、图 isomorphism 网络和混合体系。我们证明了所有 EdgeNet 框架中的 GNNs 都是 topological perturbations 的稳定的。
  • results: 我们发现, EdgeNet 框架中 GNNs 的稳定性与不同类型的 EdgeNet 分类有关。 Specifically, 具有较少的参数空间度量的 GNNs (相对于其表示能力的下降) 更加稳定。这种负面关系的关键因素是 EdgeNet 参数矩阵和图Shift operator 的eigenvector misalignment。例如,使用单个整数表示每个信号shift (即完美对齐) 的 graph convolutional neural networks 更加稳定。
    Abstract Analyzing the stability of graph neural networks (GNNs) under topological perturbations is key to understanding their transferability and the role of each architecture component. However, stability has been investigated only for particular architectures, questioning whether it holds for a broader spectrum of GNNs or only for a few instances. To answer this question, we study the stability of EdgeNet: a general GNN framework that unifies more than twenty solutions including the convolutional and attention-based classes, as well as graph isomorphism networks and hybrid architectures. We prove that all GNNs within the EdgeNet framework are stable to topological perturbations. By studying the effect of different EdgeNet categories on the stability, we show that GNNs with fewer degrees of freedom in their parameter space, linked to a lower representational capacity, are more stable. The key factor yielding this trade-off is the eigenvector misalignment between the EdgeNet parameter matrices and the graph shift operator. For example, graph convolutional neural networks that assign a single scalar per signal shift (hence, with a perfect alignment) are more stable than the more involved node or edge-varying counterparts. Extensive numerical results corroborate our theoretical findings and highlight the role of different architecture components in the trade-off.
    摘要

RINAS: Training with Dataset Shuffling Can Be General and Fast

  • paper_url: http://arxiv.org/abs/2312.02368
  • repo_url: None
  • paper_authors: Tianle Zhong, Jiechen Zhao, Xindi Guo, Qiang Su, Geoffrey Fox
  • for: 这个论文的目的是提高深度学习模型训练pipeline中数据处理的效率,特别是针对大规模 dataset的随机抽取。
  • methods: 本文使用了一种名为RINAS的数据加载框架,通过不同的数据加载方式来提高模型训练的并行性和效率。
  • results: 实验结果表明,RINAS可以提高通用语言模型训练和视觉模型训练的 Throughput by up to 59%和89%,尤其是在大规模 dataset 上。
    Abstract Deep learning datasets are expanding at an unprecedented pace, creating new challenges for data processing in model training pipelines. A crucial aspect of these pipelines is dataset shuffling, which significantly improves unbiased learning and convergence accuracy by adhering to the principles of random sampling. However, loading shuffled data for large datasets incurs significant overhead in the deep learning pipeline and severely impacts the end-to-end training throughput. To mitigate this, current deep learning systems often resort to partial dataset shuffling, sacrificing global randomness to maintain acceptable training throughput on large datasets, still leaving global shuffling efficiency issues not fully explored. In this work, we present RINAS, a data loading framework that systematically addresses the performance bottleneck of loading global shuffled datasets. Our key contribution is to offer an intra-batch unordered data fetching approach, which unleashes unexplored parallelism of data loading. We implement RINAS under the PyTorch framework for common dataset libraries HuggingFace and TorchVision. Our experimental results show that RINAS improves the throughput of general language model training and vision model training by up to 59% and 89%, respectively.
    摘要 深度学习数据集的扩展正在无前例的速度下进行,创造了模型训练管道中数据处理的新挑战。一个关键的方面是数据洗混,可以帮助避免深度学习模型受到偏见的影响,并提高模型的准确率。然而,为大型数据集加载洗混数据可能会带来深度学习管道中的重要负担,影响整个训练过程的终端性能。为此,当前的深度学习系统通常采用部分数据洗混,把全局随机抽取替换为保持可接受的训练过程速度。但这些方法仍然留下全局洗混效率的问题未经探讨。在这种情况下,我们提出了RINAS,一个数据加载框架,系统地解决了加载全球洗混数据的性能瓶颈。我们的关键贡献是提供了内批量无序数据fetching方法,使得数据加载中的并行性得到了更好的发挥。我们在PyTorch框架下实现了RINAS,并对HuggingFace和TorchVision等常用的数据库进行了实现。我们的实验结果表明,RINAS可以提高普通语言模型训练和视觉模型训练的throughput by up to 59%和89%,分别。

A Waddington landscape for prototype learning in generalized Hopfield networks

  • paper_url: http://arxiv.org/abs/2312.03012
  • repo_url: https://github.com/nacer-eb/krotovhopfieldwaddington
  • paper_authors: Nacer Eddine Boukacem, Allen Leary, Robin Thériault, Felix Gottlieb, Madhav Mani, Paul François
  • for: 这篇论文研究了通用化胡波尔特网络的学习动态,以及这些网络在学习过程中的内存分化。
  • methods: 作者使用了一种称为”特征至原型”的学习方法,以及一些数学计算和数值仿真方法来研究这种学习动态。
  • results: 研究发现,当增强网络的非线性度时,学习过程会经历一种”特征至原型”的转变,其中内存的学习状态从混合态转变为纯态。此外,作者还发现了一些可靠的分解点和稳定点,这些点在学习过程中可以控制和预测内存的分化。
    Abstract Networks in machine learning offer examples of complex high-dimensional dynamical systems reminiscent of biological systems. Here, we study the learning dynamics of Generalized Hopfield networks, which permit a visualization of internal memories. These networks have been shown to proceed through a 'feature-to-prototype' transition, as the strength of network nonlinearity is increased, wherein the learned, or terminal, states of internal memories transition from mixed to pure states. Focusing on the prototype learning dynamics of the internal memories we observe a strong resemblance to the canalized, or low-dimensional, dynamics of cells as they differentiate within a Waddingtonian landscape. Dynamically, we demonstrate that learning in a Generalized Hopfield Network proceeds through sequential 'splits' in memory space. Furthermore, order of splitting is interpretable and reproducible. The dynamics between the splits are canalized in the Waddington sense -- robust to variations in detailed aspects of the system. In attempting to make the analogy a rigorous equivalence, we study smaller subsystems that exhibit similar properties to the full system. We combine analytical calculations with numerical simulations to study the dynamical emergence of the feature-to-prototype transition, and the behaviour of splits in the landscape, saddles points, visited during learning. We exhibit regimes where saddles appear and disappear through saddle-node bifurcations, qualitatively changing the distribution of learned memories as the strength of the nonlinearity is varied -- allowing us to systematically investigate the mechanisms that underlie the emergence of Waddingtonian dynamics. Memories can thus differentiate in a predictive and controlled way, revealing new bridges between experimental biology, dynamical systems theory, and machine learning.
    摘要 机器学习网络提供了复杂高维动力系统的示例,类似生物系统。我们研究通用哈夫菲尔德网络的学习动力学,允许内存的可视化。当网络强度不同时,这些网络学习的终态状态会从杂合到纯状态转换。我们关注内存学习动力学,发现它们与瓦丁顿矩阵中细胞分化的动力学具有很强的相似性。我们通过分析和数值仿真来研究学习过程中的特征转换和峰点的行为,发现学习过程会经过顺序的分裂,并且这些分裂是可预测的和控制的。我们可以通过调整非线性度来控制学习过程中的分裂和峰点的出现,从而系统地研究emergence的机制。因此,内存可以在预测和控制的情况下分化,揭示了新的生物学、动力学理论和机器学习之间的桥梁。

FLea: Improving federated learning on scarce and label-skewed data via privacy-preserving feature augmentation

  • paper_url: http://arxiv.org/abs/2312.02327
  • repo_url: None
  • paper_authors: Tong Xia, Abhirup Ghosh, Cecilia Mascolo
  • for: 提高 Federated Learning (FL) 中客户端数据小和标签偏好的性能。
  • methods: 提出了一种名为 \textit{FLea} 的框架,通过保护客户端数据的私钥特征进行增强本地模型学习,并通过将本地和共享特征组合为增强本地模型的学习。
  • results: 在广泛的实验中,\textit{FLea} 比起现有的 FL 方法(只分享模型参数)提高了性能,最高提高达 $17.6%$,并且比起分享数据增强法 ($6.3%$) 提高了性能,同时减少了对共享数据增强的隐私敏感性。
    Abstract Learning a global model by abstracting the knowledge, distributed across multiple clients, without aggregating the raw data is the primary goal of Federated Learning (FL). Typically, this works in rounds alternating between parallel local training at several clients, followed by model aggregation at a server. We found that existing FL methods under-perform when local datasets are small and present severe label skew as these lead to over-fitting and local model bias. This is a realistic setting in many real-world applications. To address the problem, we propose \textit{FLea}, a unified framework that tackles over-fitting and local bias by encouraging clients to exchange privacy-protected features to aid local training. The features refer to activations from an intermediate layer of the model, which are obfuscated before being shared with other clients to protect sensitive information in the data. \textit{FLea} leverages a novel way of combining local and shared features as augmentations to enhance local model learning. Our extensive experiments demonstrate that \textit{FLea} outperforms the start-of-the-art FL methods, sharing only model parameters, by up to $17.6\%$, and FL methods that share data augmentations by up to $6.3\%$, while reducing the privacy vulnerability associated with shared data augmentations.
    摘要 学习全球模型,抽象客户端上的知识,不聚合原始数据,是联邦学习(FL)的主要目标。通常情况下,这会在多个客户端上并发地进行本地训练,然后在服务器上进行模型集成。我们发现现有的FL方法在本地数据较少且标签偏度严重时表现不佳,这会导致过拟合和本地模型偏见。这是许多实际应用中的常见情况。为解决问题,我们提议了《FLea》框架,通过保护隐私信息的方式,让客户端交换隐私保护的特征,以帮助本地训练。特征指的是模型中间层的活动,以防止数据泄露。《FLea》利用了一种新的将本地特征和共享特征相结合的方式,以增强本地模型学习。我们的广泛实验表明,《FLea》可以与现有的FL方法相比,在共享模型参数的情况下,提高性能达17.6%,并且在共享数据增强情况下,提高性能达6.3%,同时降低了与共享数据增强相关的隐私暴露风险。

Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

  • paper_url: http://arxiv.org/abs/2312.02300
  • repo_url: None
  • paper_authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu
  • for: 本研究探讨了连续健康监测使用智能Device时的机器学习模型评估的挑战,以及 conventional metrics 之外的新评估策略。
  • methods: 本研究采用了大规模心血管研究的经验,提供了一个完整的评估指南,以帮助评估连续健康监测中的机器学习模型。
  • results: 研究表明,现实世界的变化、疾病动态、用户特性和假 notifications 等因素会导致 ML 模型的评估变得复杂,需要采用新的评估策略。
    Abstract This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart studies, the paper offers a comprehensive guideline for robust ML model evaluation on continuous health monitoring.
    摘要 Here's the text in Simplified Chinese:这篇论文探讨了使用便携式设备进行连续健康监测的机器学习(ML)模型评估的挑战,并提出了新的评估策略来解决这些挑战。这些复杂性来自实际世界的变化、疾病动态、用户特定特征以及假阳通知的存在,需要新的评估策略。 drawing on insights from large-scale heart studies, the paper provides a comprehensive guide for robust ML model evaluation in continuous health monitoring.

Cotton Yield Prediction Using Random Forest

  • paper_url: http://arxiv.org/abs/2312.02299
  • repo_url: None
  • paper_authors: Alakananda Mitra, Sahila Beegum, David Fleisher, Vangimalla R. Reddy, Wenguang Sun, Chittaranjan Ray, Dennis Timlin, Arindam Malakar
    for: This paper aims to develop a machine learning model to predict cotton yield based on climate change, soil diversity, cultivar, and inorganic nitrogen levels.methods: The authors use a combination of field data from the 1980s to the 1990s and process-based crop modeling to develop a machine learning model that can accurately predict cotton yield.results: The Random Forest Regressor method used in the study achieved a 97.75% accuracy rate, with a root mean square error of 55.05 kg/ha and an R2 of around 0.98. These results demonstrate the potential of machine learning techniques for supporting the cotton industry’s climate-smart initiatives.Here’s the text in Simplified Chinese:for: 这篇论文目标是开发一种基于气候变化、土壤多样性、栽培变种和无机氮水平的机器学习模型,以预测棉花收成。methods: 作者使用了1980年代到1990年代的田间数据和基于过程模型的气候模拟技术,开发出一种可靠地预测棉花收成的机器学习模型。results: 试用Random Forest Regressor方法,实验室得到了97.75%的准确率,root mean square error为55.05 kg/ha,R2约为0.98。这些结果表明机器学习技术可以准确地预测棉花收成,并且可以支持棉花行业的气候减灾Initiaves。
    Abstract The cotton industry in the United States is committed to sustainable production practices that minimize water, land, and energy use while improving soil health and cotton output. Climate-smart agricultural technologies are being developed to boost yields while decreasing operating expenses. Crop yield prediction, on the other hand, is difficult because of the complex and nonlinear impacts of cultivar, soil type, management, pest and disease, climate, and weather patterns on crops. To solve this issue, we employ machine learning (ML) to forecast production while considering climate change, soil diversity, cultivar, and inorganic nitrogen levels. From the 1980s to the 1990s, field data were gathered across the southern cotton belt of the United States. To capture the most current effects of climate change over the previous six years, a second data source was produced using the process-based crop model, GOSSYM. We concentrated our efforts on three distinct areas inside each of the three southern states: Texas, Mississippi, and Georgia. To simplify the amount of computations, accumulated heat units (AHU) for each set of experimental data were employed as an analogy to use time-series weather data. The Random Forest Regressor yielded a 97.75% accuracy rate, with a root mean square error of 55.05 kg/ha and an R2 of around 0.98. These findings demonstrate how an ML technique may be developed and applied as a reliable and easy-to-use model to support the cotton climate-smart initiative.
    摘要 美国棉花industry是承诺实施可持续生产方法,以最小化水、地、能源使用,同时提高棉花output和土壤健康。为提高产量而降低运营成本, клима smart 农业技术正在开发。然而,作物产量预测很具挑战,因为作物的种植、土壤类型、管理、害虫和病虫、气候和天气patterns具有复杂和非线性的影响。为解决这个问题,我们使用机器学习(ML)来预测生产,同时考虑气候变化、土壤多样性、种植和无机氮水平。在1980年代至1990年代之间,在南方棉花带的美国,我们收集了大量的场地数据。为捕捉最近六年的气候变化的影响,我们还生成了一个第二个数据源,使用过程基模型GOSSYM。我们专注于三个不同的区域内每个州的三个南方州:德克萨斯、密西西比和 Джорджи亚。为简化计算量,我们使用积累热量单位(AHU)作为时间序列天气数据的类比。随机森林回归器实现了97.75%的准确率,其中平均差分为55.05 kg/ha,R2约为0.98。这些发现表明了一种机器学习技术可以在支持棉花气候明智initiative的情况下发展和应用。

ALEXR: Optimal Single-Loop Algorithms for Convex Finite-Sum Coupled Compositional Stochastic Optimization

  • paper_url: http://arxiv.org/abs/2312.02277
  • repo_url: None
  • paper_authors: Bokun Wang, Tianbao Yang
  • for: solves a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with applications in group distributionally robust optimization (GDRO), reinforcement learning, and learning to rank.
  • methods: introduces a unified family of efficient single-loop primal-dual block-coordinate proximal algorithms called ALEXR, which leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable.
  • results: establishes the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions of involved functions, which improve the best rates in previous works on smooth cFCCO problems and expand the realm of cFCCO for solving more challenging non-smooth problems such as the dual form of GDRO.
    Abstract This paper revisits a class of convex Finite-Sum Coupled Compositional Stochastic Optimization (cFCCO) problems with many applications, including group distributionally robust optimization (GDRO), reinforcement learning, and learning to rank. To better solve these problems, we introduce a unified family of efficient single-loop primal-dual block-coordinate proximal algorithms, dubbed ALEXR. This algorithm leverages block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions of involved functions, which not only improve the best rates in previous works on smooth cFCCO problems but also expand the realm of cFCCO for solving more challenging non-smooth problems such as the dual form of GDRO. Finally, we present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.
    摘要 ALEXR uses block-coordinate stochastic mirror ascent updates for the dual variable and stochastic proximal gradient descent updates for the primal variable. The authors establish the convergence rates of ALEXR in both convex and strongly convex cases, under smoothness and non-smoothness conditions of the involved functions. These rates improve upon the best rates in previous works on smooth cFCCO problems and expand the realm of cFCCO to solve more challenging non-smooth problems, such as the dual form of GDRO.Finally, the authors present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate stochastic algorithms for the considered class of cFCCO problems.

Scaling Laws in Jet Classification

  • paper_url: http://arxiv.org/abs/2312.02264
  • repo_url: None
  • paper_authors: Joshua Batson, Yonatan Kahn
  • for: 这 paper investigate 粒子Physics 中 Jet 分类问题中的缩放法律。
  • methods: 研究人员使用 six 种 физи学上有意义的分类器,发现这些分类器在训练集大小不同时的 binary cross-entropy 测试损失Display power-law Scaling 性质。
  • results: 研究发现,随着训练集大小的增加,不同的分类器的最佳性能可能会有很大变化,而且这些变化可能与自然语言和图像 dataset 中观察到的缩放法律有关。
    Abstract We demonstrate the emergence of scaling laws in the benchmark top versus QCD jet classification problem in collider physics. Six distinct physically-motivated classifiers exhibit power-law scaling of the binary cross-entropy test loss as a function of training set size, with distinct power law indices. This result highlights the importance of comparing classifiers as a function of dataset size rather than for a fixed training set, as the optimal classifier may change considerably as the dataset is scaled up. We speculate on the interpretation of our results in terms of previous models of scaling laws observed in natural language and image datasets.
    摘要 我们展示了扩展法则在Collider物理中的benchmark顶点对QCD涂抹分类问题中的出现。我们的六个 физи学上 Motivated的分类器在训练集大小的函数上显示了对数跨分布损失的power-law扩展法则,其中每个扩展法则都有不同的power law指数。这一结果 highlights the importance of在不同的训练集大小上比较分类器,而不是只用一个固定的训练集,因为最佳的分类器可能会在训练集的扩展上发生重大变化。我们对我们的结果进行了解释,并与之前在自然语言和图像 dataset 中观察到的扩展法则进行了比较。

Learning Polynomial Problems with $SL(2,\mathbb{R})$ Equivariance

  • paper_url: http://arxiv.org/abs/2312.02146
  • repo_url: None
  • paper_authors: Hannah Lawrence, Mitchell Tong Harris
  • for: 该文章是用来解决数学和工程应用中的几何问题,包括动力系统和运筹学问题。
  • methods: 该文章使用神经网络来解决这些问题,并使用数据驱动的方法来实现高精度和高速度。
  • results: 文章显示,使用神经网络可以高效地解决这些问题,并且可以达到10倍的速度提升,而无需增加维度和度数。此外,文章还发现了一些 polynomial 学习问题的特点,如 $SL(2,\mathbb{R})$ 非 compat 群的对称性。
    Abstract Optimizing and certifying the positivity of polynomials are fundamental primitives across mathematics and engineering applications, from dynamical systems to operations research. However, solving these problems in practice requires large semidefinite programs, with poor scaling in dimension and degree. In this work, we demonstrate for the first time that neural networks can effectively solve such problems in a data-driven fashion, achieving tenfold speedups while retaining high accuracy. Moreover, we observe that these polynomial learning problems are equivariant to the non-compact group $SL(2,\mathbb{R})$, which consists of area-preserving linear transformations. We therefore adapt our learning pipelines to accommodate this structure, including data augmentation, a new $SL(2,\mathbb{R})$-equivariant architecture, and an architecture equivariant with respect to its maximal compact subgroup, $SO(2, \mathbb{R})$. Surprisingly, the most successful approaches in practice do not enforce equivariance to the entire group, which we prove arises from an unusual lack of architecture universality for $SL(2,\mathbb{R})$ in particular. A consequence of this result, which is of independent interest, is that there exists an equivariant function for which there is no sequence of equivariant polynomials multiplied by arbitrary invariants that approximates the original function. This is a rare example of a symmetric problem where data augmentation outperforms a fully equivariant architecture, and provides interesting lessons in both theory and practice for other problems with non-compact symmetries.
    摘要 Translated into Simplified Chinese:优化和证明多项式的正性是数学和工程应用的基本原则,从动力系统到运筹学。然而,在实践中解决这些问题需要大量的semidefinitePrograms,具有糟糕的维度和度量。在这种工作中,我们第一次示出了神经网络可以效果地解决这些问题,实现了十倍的速度提升,同时保持高度准确。此外,我们发现这些多项式学习问题与非紧定群$SL(2,\mathbb{R})$相关,该群包括保持面积的线性变换。我们因此适应这种结构,包括数据扩充、新的$SL(2,\mathbb{R})$-equivariant架构和$SO(2,\mathbb{R})$的最大凝合子架构。奇怪的是,在实践中最成功的方法并不强制执行 equivariant 到整个群,我们证明这是由 $SL(2,\mathbb{R})$ 特殊的 Architecture 缺乏 universality 所致。这个结果的一个独立的 interesseting 结论是,存在一个 equivariant 函数,其中不存在任何 equivariant 多项式乘以arbitrary invariants 可以近似原始函数。这是非常罕见的对称问题,数据扩充 exceeds 完全 equivariant 架构,提供了有趣的理论和实践教训 для其他具有非紧定对称的问题。

Mitigating Data Injection Attacks on Federated Learning

  • paper_url: http://arxiv.org/abs/2312.02102
  • repo_url: None
  • paper_authors: Or Shalom, Amir Leshem, Waheed U. Bajwa
  • for: 防止false data injection攻击在 federated learning 系统中
  • methods: 提议一种本地方法,通过协调节点在训练过程中检测和忽略伪装攻击者的数据,使用概率评估来识别伪装攻击者,并在满足某些条件时忽略伪装攻击者的数据
  • results: simulations 表明,当协调节点检测并忽略所有伪装攻击者时,模型会恢复并 converges 到真实的模型
    Abstract Federated learning is a technique that allows multiple entities to collaboratively train models using their data without compromising data privacy. However, despite its advantages, federated learning can be susceptible to false data injection attacks. In these scenarios, a malicious entity with control over specific agents in the network can manipulate the learning process, leading to a suboptimal model. Consequently, addressing these data injection attacks presents a significant research challenge in federated learning systems. In this paper, we propose a novel technique to detect and mitigate data injection attacks on federated learning systems. Our mitigation method is a local scheme, performed during a single instance of training by the coordinating node, allowing the mitigation during the convergence of the algorithm. Whenever an agent is suspected to be an attacker, its data will be ignored for a certain period, this decision will often be re-evaluated. We prove that with probability 1, after a finite time, all attackers will be ignored while the probability of ignoring a trustful agent becomes 0, provided that there is a majority of truthful agents. Simulations show that when the coordinating node detects and isolates all the attackers, the model recovers and converges to the truthful model.
    摘要 federated learning 是一种技术,允许多个实体共同训练模型,使用他们的数据,而不需要数据隐私泄露。然而,这种技术可能受到假数据插入攻击。在这些情况下,一个黑客控制specific agents在网络中的实体可以操纵学习过程,导致模型下降。因此,解决这些数据插入攻击对 Federated Learning 系统具有重要的研究挑战。在这篇论文中,我们提出了一种检测和 Mitigate 数据插入攻击的新技术。我们的避免方法是一种本地方案,在协调节点单个训练过程中执行,allowing mitigation during the convergence of the algorithm。当一个代理被怀疑是黑客时,其数据将被忽略一定时间,这个决定将经常重新评估。我们证明,在某些条件下,在一定时间后,所有的黑客都将被忽略,而不会忽略任何真实的代理,假设有多数的真实代理。仪表试验显示,当协调节点检测并隔离所有黑客时,模型会恢复并 converges 到真实的模型。

Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

  • paper_url: http://arxiv.org/abs/2312.02095
  • repo_url: None
  • paper_authors: Jan Mielniczuk, Adam Wawrzeńczyk
  • for: 这个论文是为了探讨使用Empirical Risk Minimization(ERM)适应正例数据的分类器在单例情景下的性能问题。
  • methods: 该论文使用了ERM适应正例数据的分类器,并对其在单例情景下的性能进行了分析和比较。
  • results: 研究发现,在大多数情况下,使用ERM适应正例数据的分类器在单例情景下的性能会显著下降。此外,研究还引入了单例情景下的非正式风险分类器,并对其与原始提议进行了比较,发现它们在一些情况下存在显著差异。
    Abstract In the paper we argue that performance of the classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data, which are designed for case-control sampling scheme may significantly deteriorate when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. Also, we introduce a single-sample case analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that the significant differences occur between them, especiall when half or more positive of observations are labeled. The opposite case when ERM minimizer designed for the case-control case is applied for single-sample data is also considered and similar conclusions are drawn. Taking into account difference of scenarios requires a sole, but crucial, change in the definition of the Empirical Risk.
    摘要 在论文中,我们Arguments that the performance of classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data, designed for case-control sampling scheme, may significantly deteriorate when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. In addition, we introduce a single-sample case analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that significant differences occur between them, especially when half or more positive observations are labeled. The opposite case when ERM minimizer designed for the case-control case is applied for single-sample data is also considered, and similar conclusions are drawn. Taking into account the differences of scenarios requires a sole, but crucial, change in the definition of the Empirical Risk.Here's the breakdown of the translation:* 在论文中 (shang zhong yu yan jiu) - In the paper* 我们Arguments (wo men ying xiang) - We argue* dass (da) - that* 表现 (biao xian) - performance* 可能 (ke yi) - may* 显著 (xian zhi) - significantly* 下降 (xia jiang) - deteriorate* 应用 (ying yao) - when applied* 到 (dao) - to* 单个样本 (dan ye xiang mu) - a single-sample scenario* 我们 (wo men) - we* 揭示 (jiang shi) - reveal* 他们 (ta men) - their* 行为 (xing wei) - behavior* 取决 (qiu jue) - depends* 在 (zai) - in* 很特殊 (hen tai xi) - all but very specific cases* 场景 (chang jing) - scenario* 以及 (yi qi) - and* 我们 (wo men) - we* 引入 (yin jian) - introduce* 一个单例 (yi ge dan yi) - a single-sample case analogue* 的 (de) - of* 非正式 (fei zheng xi) - non-negative risk classifier* 设计 (she jian) - designed* для (for) - for* 整体 (zheng ti) - overall* 数据 (shu ju) - data* 比较 (bi xie) - compare* 其性能 (qie xing neng) - its performance* 与 (yu) - with* 原始提案 (yuan shi tiaolan) - the original proposal* 显著差异 (xian zhi yi yi) - significant differences* 出现 (chu xian) - occur* 特别 (te bi) - especially* 半数 (ban shu) - half* 或更多 (or geng duo) - or more* 正面 (zheng mian) - positive* 观察 (guan yu) - observations* Label (jiao) - labeled* 相反 (xiang fan) - the opposite case* 当 (when) - when* ERM minimizer (ERM minimizer) - designed* 应用 (ying yao) - applied* 于 (yu) - to* 单个样本 (dan ye xiang mu) - a single-sample scenario* 我们 (wo men) - we* 揭示 (jiang shi) - reveal* 他们 (ta men) - their* 行为 (xing wei) - behavior* 取决 (qiu jue) - depends* 在 (zai) - in* 很特殊 (hen tai xi) - all but very specific cases* 场景 (chang jing) - scenario* 以及 (yi qi) - and* 我们 (wo men) - we* 需要 (xū yào) - requires* 一个小 cambio (yī gè xiǎo jiāng) - a sole, but crucial, change* 在 (zai) - in* 定义 (dìng yì) - definition* Empirical Risk (Empirical Risk) - of* 不同 (bù tiān) - different* 场景 (chang jing) - scenariosNote that the translation is in Simplified Chinese, and the word order may be different from the original text in Traditional Chinese.

Deep Set Neural Networks for forecasting asynchronous bioprocess timeseries

  • paper_url: http://arxiv.org/abs/2312.02079
  • repo_url: None
  • paper_authors: Maxim Borisyak, Stefan Born, Peter Neubauer, Mariano Nicolas Cruz-Bournazou
  • for: 这种论文主要是为了处理缺失数据和不规则的时间序列,提供了一种基于深度学习的方法,可以不需要填充或对时间序列进行对齐处理。
  • methods: 这种方法使用了深度学习的 triplet 编码器,可以successfully 处理生物过程数据,并且可以适应不同的任务,如在线监测、预测控制、实验设计等。
  • results: 在这篇论文中,作者使用了这种方法进行了一些预测任务,并与传统的适应方法和基于填充和对齐的方法进行了比较,结果表明,这种方法可以更好地处理缺失数据和不规则的时间序列,并且可以提高预测的准确性。
    Abstract Cultivation experiments often produce sparse and irregular time series. Classical approaches based on mechanistic models, like Maximum Likelihood fitting or Monte-Carlo Markov chain sampling, can easily account for sparsity and time-grid irregularities, but most statistical and Machine Learning tools are not designed for handling sparse data out-of-the-box. Among popular approaches there are various schemes for filling missing values (imputation) and interpolation into a regular grid (alignment). However, such methods transfer the biases of the interpolation or imputation models to the target model. We show that Deep Set Neural Networks equipped with triplet encoding of the input data can successfully handle bio-process data without any need for imputation or alignment procedures. The method is agnostic to the particular nature of the time series and can be adapted for any task, for example, online monitoring, predictive control, design of experiments, etc. In this work, we focus on forecasting. We argue that such an approach is especially suitable for typical cultivation processes, demonstrate the performance of the method on several forecasting tasks using data generated from macrokinetic growth models under realistic conditions, and compare the method to a conventional fitting procedure and methods based on imputation and alignment.
    摘要 培养实验经常产生稀疏和不规则的时间序列数据。经典方法,如最大可能性适应或 Monte-Carlo Markov chain 采样,可以轻松处理稀疏数据和时间格点不规则,但大多数统计学和机器学习工具不是直接处理稀疏数据的设计。在各种方法中,有许多填充缺失值(imputation)和对时间序列进行排序(alignment)的方法,但这些方法会将推断模型的偏见传递给目标模型。我们表明,使用深度集成神经网络,并将输入数据 triplet 编码,可以成功处理生物过程数据,无需进行填充或对齐过程。这种方法不依赖于特定时间序列的性质,可以适应任何任务,例如在线监测、预测控制、实验设计等。在这个工作中,我们关注于预测任务。我们认为,这种方法特别适用于典型的培养过程,并在使用数据生成自 макрокинетиче增长模型的实际条件下进行了证明。我们还与传统适应过程和基于填充和对齐的方法进行比较。

Federated Learning is Better with Non-Homomorphic Encryption

  • paper_url: http://arxiv.org/abs/2312.02074
  • repo_url: None
  • paper_authors: Konstantin Burlachenko, Abdulmajeed Alrowithi, Fahad Ali Albalawi, Peter Richtarik
    for:本研究旨在提供一种可靠、安全、可扩展的分布式学习(Federated Learning,FL)框架,以帮助解决传统AI方法所遇到的中央数据收集和隐私问题。methods:本研究使用了卷积 encryption 和分布式加密技术,以提供安全和可靠的训练过程。此外,本研究还提出了一种基于 permutation 的压缩算法,以减少计算和存储成本。results:本研究的实验结果表明,使用本研究提出的框架和算法可以减少计算和存储成本,同时保持较高的训练效果。此外,本研究还发现了一些可能的应用场景,如医疗领域和智能家居等。
    Abstract Traditional AI methodologies necessitate centralized data collection, which becomes impractical when facing problems with network communication, data privacy, or storage capacity. Federated Learning (FL) offers a paradigm that empowers distributed AI model training without collecting raw data. There are different choices for providing privacy during FL training. One of the popular methodologies is employing Homomorphic Encryption (HE) - a breakthrough in privacy-preserving computation from Cryptography. However, these methods have a price in the form of extra computation and memory footprint. To resolve these issues, we propose an innovative framework that synergizes permutation-based compressors with Classical Cryptography, even though employing Classical Cryptography was assumed to be impossible in the past in the context of FL. Our framework offers a way to replace HE with cheaper Classical Cryptography primitives which provides security for the training process. It fosters asynchronous communication and provides flexible deployment options in various communication topologies.
    摘要 传统的人工智能方法ologies需要中央数据收集,这在面临网络通信、数据隐私和存储容量问题时变得不实际。 Federated Learning(FL)提供了一种分布式人工智能模型训练的方法,无需收集原始数据。在FL训练中,保护隐私是一个重要的问题。一种流行的方法是使用扩展式加密(HE),这是 encrypt computation 的一种突破口。然而,这些方法带有额外的计算和存储占用。为解决这些问题,我们提出了一个创新的框架,它将 permutation-based compressors 与古典密码学结合,即使在FL中使用古典密码学 primitives 也可以提供安全性。我们的框架可以将 HE 替换为更便宜的古典密码学 primitives,并且支持异步通信和多种通信拓扑的灵活部署。

The GPU Phase Folding and Deep Learning Method for Detecting Exoplanet Transits

  • paper_url: http://arxiv.org/abs/2312.02063
  • repo_url: None
  • paper_authors: Kaitlyn Wang, Kevin Wang, Jian Ge, Yinan Zhao, Kevin Willis
  • for: 找寻外星系(exoplanets)使用过程方法
  • methods: 使用GPU加速临时减法(Phase Folding)和卷积神经网络(CNN)系统来检测外星系
  • results: 比 traditional Box-fitting Least Squares(BLS)方法快三个数量级,并且在准确率和准确率之间具有更高的准确率和更高的准确率。
    Abstract This paper presents GPFC, a novel Graphics Processing Unit (GPU) Phase Folding and Convolutional Neural Network (CNN) system to detect exoplanets using the transit method. We devise a fast folding algorithm parallelized on a GPU to amplify low signal-to-noise ratio transit signals, allowing a search at high precision and speed. A CNN trained on two million synthetic light curves reports a score indicating the likelihood of a planetary signal at each period. GPFC improves on speed by three orders of magnitude over the predominant Box-fitting Least Squares (BLS) method. Our simulation results show GPFC achieves 97% training accuracy, higher true positive rate at the same false positive rate of detection, and higher precision at the same recall rate when compared to BLS. GPFC recovers 100% of known ultra-short-period planets in Kepler light curves from a blind search. These results highlight the promise of GPFC as an alternative approach to the traditional BLS algorithm for finding new transiting exoplanets in data taken with Kepler and other space transit missions such as K2, TESS and future PLATO and Earth 2.0.
    摘要

GFS: Graph-based Feature Synthesis for Prediction over Relational Databases

  • paper_url: http://arxiv.org/abs/2312.02037
  • repo_url: None
  • paper_authors: Han Zhang, Quan Gan, David Wipf, Weinan Zhang
  • for: 这篇论文主要是为了解决关系数据库中的数据挖掘和机器学习问题。
  • methods: 该论文提出了一种新的框架 called Graph-based Feature Synthesis (GFS),它将关系数据库转换为不同表之间的异种图,以保留数据中的关系结构。
  • results: 在四个真实的多表关系数据库上进行了广泛的实验,GFS的性能超过了之前为关系数据库设计的方法,证明了它的优越性。
    Abstract Relational databases are extensively utilized in a variety of modern information system applications, and they always carry valuable data patterns. There are a huge number of data mining or machine learning tasks conducted on relational databases. However, it is worth noting that there are limited machine learning models specifically designed for relational databases, as most models are primarily tailored for single table settings. Consequently, the prevalent approach for training machine learning models on data stored in relational databases involves performing feature engineering to merge the data from multiple tables into a single table and subsequently applying single table models. This approach not only requires significant effort in feature engineering but also destroys the inherent relational structure present in the data. To address these challenges, we propose a novel framework called Graph-based Feature Synthesis (GFS). GFS formulates the relational database as a heterogeneous graph, thereby preserving the relational structure within the data. By leveraging the inductive bias from single table models, GFS effectively captures the intricate relationships inherent in each table. Additionally, the whole framework eliminates the need for manual feature engineering. In the extensive experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases, demonstrating its superior performance.
    摘要 现代信息系统中广泛使用关系数据库,这些数据库总是含有价值的数据模式。有很多数据挖掘或机器学习任务在关系数据库上进行。然而,需要注意的是,关系数据库上的机器学习模型很少,大多数模型都是单表设计的。因此,通常需要在多个表中合并数据,并应用单表模型进行训练。这种方法不仅需要大量的特 engineering功能,还会遗弃关系数据库中的自然结构。为解决这些挑战,我们提出了一种新的框架,即图基于特征合成(GFS)。GFS将关系数据库转化为不同类型表的异类图,从而保留数据中的自然关系。通过单表模型的假设,GFS可以有效捕捉每个表中的复杂关系。此外,整个框架不需要手动特 engineering。在四个真实的多表关系数据库上进行了广泛的实验,GFS在前一些设计于关系数据库上的方法之上表现出优于性。

Stochastic Optimal Control Matching

  • paper_url: http://arxiv.org/abs/2312.02027
  • repo_url: None
  • paper_authors: Carles Domingo-Enrich, Jiequn Han, Brandon Amos, Joan Bruna, Ricky T. Q. Chen
  • for: 这篇论文的目的是提出一种新的可逆Diffusion Optimization(IDO)技术,用于数值最佳控制,以应对噪音系统的行为。
  • methods: 这篇论文使用的方法是Stochastic Optimal Control Matching(SOCM),它基于diffusion模型的条件Score Matching损失函数,并通过一个迭代的数值最佳控制算法来实现。
  • results: 实验结果显示,这篇论文的算法在四个不同的控制设定下均能够取得较低的错误值,比其他所有IDO技术更好。
    Abstract Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for four different control settings. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that is of independent interest, e.g., for generative modeling.
    摘要

Optimal Data Generation in Multi-Dimensional Parameter Spaces, using Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2312.02012
  • repo_url: None
  • paper_authors: M. R. Mahani, Igor A. Nechepurenko, Yasmin Rahimof, Andreas Wicht
  • for: 采集大量数据点以便训练精准机器学习(ML)模型,特别在科学领域中数据收集具有资源投入的性。
  • methods: 我们提出了一种新的方法,利用 Gaussian process regression(GPR)模仿输入和输出参数之间的下面关系,以构建高度有用的最小数据库。通过知道的数据,GPR提供了预测性的方差和标准差,从而选择数据点,以获得高精度的ML模型训练数据库。
  • results: 我们的结果表明,使用 Bayesian optimization方法选择数据点后,ML模型在训练数据库上表现出了高精度和较小的数据量,在与传统方法训练的数据库上表现出了明显的优势。这种方法为高维复杂参数空间中的数据收集做出了贡献,以实现高精度机器学习预测。
    Abstract Acquiring a substantial number of data points for training accurate machine learning (ML) models is a big challenge in scientific fields where data collection is resource-intensive. Here, we propose a novel approach for constructing a minimal yet highly informative database for training ML models in complex multi-dimensional parameter spaces. To achieve this, we mimic the underlying relation between the output and input parameters using Gaussian process regression (GPR). Using a set of known data, GPR provides predictive means and standard deviation for the unknown data. Given the predicted standard deviation by GPR, we select data points using Bayesian optimization to obtain an efficient database for training ML models. We compare the performance of ML models trained on databases obtained through this method, with databases obtained using traditional approaches. Our results demonstrate that the ML models trained on the database obtained using Bayesian optimization approach consistently outperform the other two databases, achieving high accuracy with a significantly smaller number of data points. Our work contributes to the resource-efficient collection of data in high-dimensional complex parameter spaces, to achieve high precision machine learning predictions.
    摘要 获取具有大量数据点的高精度机器学习(ML)模型训练是科学领域中的一大挑战,因为数据收集是资源占用的。在这里,我们提出了一种新的方法,用于构建高度简洁又高度有 информатив性的数据库,用于训练ML模型在复杂多维参数空间中。为此,我们使用 Gaussian process regression(GPR)模仿输出和输入参数之间的下面关系。使用一组已知数据,GPR提供了预测性的方差和标准差,用于未知数据。根据预测的标准差,我们使用 Bayesian optimization来选择数据点,以获得高效的数据库 для训练ML模型。我们对使用这种方法 construct的数据库与传统方法 construct的数据库进行比较。我们的结果表明,ML模型在使用 Bayesian optimization方法 construct的数据库上训练时,常常表现出高精度和较少的数据点。我们的工作对高维复杂参数空间中资源有效的数据收集做出了贡献,以实现高精度机器学习预测。

Information Modified K-Nearest Neighbor

  • paper_url: http://arxiv.org/abs/2312.01991
  • repo_url: None
  • paper_authors: Mohammad Ali Vahedifar, Azim Akhtarshenas, Mariam Sabbaghian, Mohammad Rafatpanah
  • for: 提高 K-Nearest Neighbors(KNN)算法的表现
  • methods: 利用 Mutual Information(MI)增强质量,基于 Cooperative Game Theory 的 Shapley values 细化值分配
  • results: 比较了 7 种 contemporary KNN 变种和传统 KNN,在 12 个广泛使用的数据集上进行了实验,并通过精度、准确率和回归率来评估方法的表现,研究发现 IMKNN 在不同的数据集和评价标准上具有显著的优势,能够在多种分类任务中提高 KNN 算法的表现。
    Abstract In this research paper, we introduce a novel classification method aimed at improving the performance of the K-Nearest Neighbors (KNN) algorithm. Our approach leverages Mutual Information (MI) to enhance the significance of weights and draw inspiration from Shapley values, a concept originating from cooperative game theory, to refine value allocation. The fundamental concept underlying KNN is the classification of samples based on the majority thorough their k-nearest neighbors. While both the distances and labels of these neighbors are crucial, traditional KNN assigns equal weight to all samples and prevance considering the varying importance of each neighbor based on their distances and labels. In the proposed method, known as Information-Modified KNN (IMKNN), we address this issue by introducing a straightforward algorithm. To evaluate the effectiveness of our approach, it is compared with 7 contemporary variants of KNN, as well as the traditional KNN. Each of these variants exhibits its unique advantages and limitations. We conduct experiments on 12 widely-used datasets, assessing the methods' performance in terms of accuracy, precision and recall. Our study demonstrates that IMKNN consistently outperforms other methods across different datasets and criteria by highlighting its superior performance in various classification tasks. These findings underscore the potential of IMKNN as a valuable tool for enhancing the capabilities of the KNN algorithm in diverse applications.
    摘要 在这篇研究论文中,我们介绍了一种新的分类方法,旨在提高基于K-最近邻居(KNN)算法的性能。我们的方法利用了相互信息(MI)来增强权重的重要性,并启取着合作游戏理论中的希柏利值(Shapley value)来细化值分配。KNN算法的基本思想是根据样本的k nearest neighbors来分类样本。而传统的KNN方法却忽略了每个邻居的重要性,不考虑它们的距离和标签的不同程度。在我们提出的方法中,称为信息修正KNN(IMKNN),我们解决了这个问题。我们对12种广泛使用的数据集进行了实验,对不同的数据集和评价标准进行了比较。我们的研究表明,IMKNN在不同的数据集和评价标准上具有优秀的性能,在多种分类任务中表现出色。这些发现表明IMKNN在多种应用中具有潜在的价值。

  • paper_url: http://arxiv.org/abs/2312.02248
  • repo_url: None
  • paper_authors: Sophia Krix, Ella Wilczynski, Neus Falgàs, Raquel Sánchez-Valle, Eti Yoles, Uri Nevo, Kuti Baruch, Holger Fröhlich
  • for: 这研究旨在探讨血液基因表示的抑郁症诊断方法,以提高诊断的准确性和效率。
  • methods: 本研究使用机器学习算法和机理模型方法,包括代理模型,对血液中免疫系统相关biomarker的发现进行深入分析和评估。
  • results: 研究发现,血液中免疫系统相关biomarker可能成为抑郁症的早期诊断工具,并且机器学习算法和机理模型方法可以帮助找到这些biomarker。
    Abstract Alzheimer's disease has an increasing prevalence in the population world-wide, yet current diagnostic methods based on recommended biomarkers are only available in specialized clinics. Due to these circumstances, Alzheimer's disease is usually diagnosed late, which contrasts with the currently available treatment options that are only effective for patients at an early stage. Blood-based biomarkers could fill in the gap of easily accessible and low-cost methods for early diagnosis of the disease. In particular, immune-based blood-biomarkers might be a promising option, given the recently discovered cross-talk of immune cells of the central nervous system with those in the peripheral immune system. With the help of machine learning algorithms and mechanistic modeling approaches, such as agent-based modeling, an in-depth analysis of the simulation of cell dynamics is possible as well as of high-dimensional omics resources indicative of pathway signaling changes. Here, we give a background on advances in research on brain-immune system cross-talk in Alzheimer's disease and review recent machine learning and mechanistic modeling approaches which leverage modern omics technologies for blood-based immune system-related biomarker discovery.
    摘要 阿尔茨海默病在全球人口中的发病率不断增加,然而现有的诊断方法基于推荐的生物标志物仅限于专业医疗机构。由于这种情况,阿尔茨海默病通常在晚期诊断,与现有的治疗方法只有在早期效果的情况下。血液基的生物标志物可以填补访问 easily 和成本低的诊断方法的空白。特别是免疫基的血液生物标志物可能是一个有前途的选择,因为最近发现的中枢神经系统免疫细胞与 périphériques 免疫系统之间的交互。通过机器学习算法和机制模型方法,如代理模型,可以进行深入的 cell 动态分析以及高维 omics 资源表示Pathway 信号变化。本文提供了关于研究大脑免疫系统之间的交互在阿尔茨海默病中的进展,以及最近机器学习和机制模型方法的应用,以探索血液基免疫系统相关生物标志物的发现。

Maximising Quantum-Computing Expressive Power through Randomised Circuits

  • paper_url: http://arxiv.org/abs/2312.01947
  • repo_url: None
  • paper_authors: Yingli Yang, Zongkang Zhang, Anbang Wang, Xiaosi Xu, Xiaoting Wang, Ying Li
  • for: 这个论文旨在提出一种新的变量量量子算法(VQA),使用随机量子电路生成变量波函数。
  • methods: 该论文使用随机量子电路和人工神经网络来参数化分布函数,并通过优化来找到解决方案。
  • results: 论文通过数值实验表明,随机量子电路approximation可以达到任意精度,并且与时间成本和门槛数之间存在贝叶矩阵关系。这些结果表明随机电路方法在量子计算中表现出了扎实的潜力。
    Abstract In the noisy intermediate-scale quantum era, variational quantum algorithms (VQAs) have emerged as a promising avenue to obtain quantum advantage. However, the success of VQAs depends on the expressive power of parameterised quantum circuits, which is constrained by the limited gate number and the presence of barren plateaus. In this work, we propose and numerically demonstrate a novel approach for VQAs, utilizing randomised quantum circuits to generate the variational wavefunction. We parameterize the distribution function of these random circuits using artificial neural networks and optimize it to find the solution. This random-circuit approach presents a trade-off between the expressive power of the variational wavefunction and time cost, in terms of the sampling cost of quantum circuits. Given a fixed gate number, we can systematically increase the expressive power by extending the quantum-computing time. With a sufficiently large permissible time cost, the variational wavefunction can approximate any quantum state with arbitrary accuracy. Furthermore, we establish explicit relationships between expressive power, time cost, and gate number for variational quantum eigensolvers. These results highlight the promising potential of the random-circuit approach in achieving a high expressive power in quantum computing.
    摘要 在噪声中等级量子时代,变量量子算法(VQA)已经出现为实现量子优势的有力的方法。然而,VQA的成功取决于可调量子电路的表达力,它受到了门数限制和板块障碍的影响。在这项工作中,我们提出了一种新的方法,使用随机量子电路来生成变量波函数。我们使用人工神经网络来参数化这些随机电路的分布函数,并优化它以找到解决方案。这种随机电路方法存在与表达力和时间成本之间的交易,即在固定门数下,可以逐渐增加表达力,但是随着时间成本的增加,表达力也会逐渐下降。此外,我们确立了变量量子电路的表达力、时间成本和门数之间的直接关系,这些结果表明随机电路方法在量子计算中实现高表达力的潜力。

Intrusion Detection System with Machine Learning and Multiple Datasets

  • paper_url: http://arxiv.org/abs/2312.01941
  • repo_url: https://github.com/priscilla100/ensemble_IDS
  • paper_authors: Haiyan Xuan, Mohith Manohar
  • For: The paper aims to enhance the performance of an intrusion detection system (IDS) using machine learning (ML) and hyperparameter tuning to combat attacks by unethical hackers.* Methods: The paper explores the use of multiple datasets and machine learning models, including XGBoost and random forest classifiers, to improve the accuracy and efficacy of the IDS. The paper also employs the RandomizedSearchCV hyperparameter technique to optimize the performance of the models.* Results: The proposed multi-dataset integration method achieved an accuracy score of 99.9% when equipped with XGBoost and random forest classifiers and the RandomizedSearchCV hyperparameter technique.
    Abstract As Artificial Intelligence (AI) technologies continue to gain traction in the modern-day world, they ultimately pose an immediate threat to current cybersecurity systems via exploitative methods. Prompt engineering is a relatively new field that explores various prompt designs that can hijack large language models (LLMs). If used by an unethical attacker, it can enable an AI system to offer malicious insights and code to them. In this paper, an enhanced intrusion detection system (IDS) that utilizes machine learning (ML) and hyperparameter tuning is explored, which can improve a model's performance in terms of accuracy and efficacy. Ultimately, this improved system can be used to combat the attacks made by unethical hackers. A standard IDS is solely configured with pre-configured rules and patterns; however, with the utilization of machine learning, implicit and different patterns can be generated through the models' hyperparameter settings and parameters. In addition, the IDS will be equipped with multiple datasets so that the accuracy of the models improves. We evaluate the performance of multiple ML models and their respective hyperparameter settings through various metrics to compare their results to other models and past research work. The results of the proposed multi-dataset integration method yielded an accuracy score of 99.9% when equipped with the XGBoost and random forest classifiers and RandomizedSearchCV hyperparameter technique.
    摘要 如果人工智能(AI)技术继续在现代世界得到普及,它们最终会对当前的网络安全系统 pose 潜在的威胁。快速工程是一个相对较新的领域,它探索了不同的提示设计,可以劫持大型自然语言模型(LLM)。如果由不道德的攻击者使用,它可以启用 AI 系统提供有害的洞察和代码。在这篇论文中,我们探讨了一种增强型网络入侵检测系统(IDS),它利用机器学习(ML)和超参数调整来提高模型的性能。最终,这种改进的系统可以用来对抗不道德的黑客的攻击。标准 IDS 仅仅配置了预先配置的规则和模式,但通过利用机器学习,可以生成透过模型的超参数设置和参数的不同和含义的隐式模式。此外,IDS 还将被配置多个数据集,以便提高模型的准确率。我们通过多种 metrics 评估多种 ML 模型和其相应的超参数设置,并与之前的研究成果进行比较。结果表明,我们的多数据集综合方法可以达到 99.9% 的准确率,当使用 XGBoost 和Random Forest 分类器和RandomizedSearchCV 超参数技术。

Analysis and mining of low-carbon and energy-saving tourism data characteristics based on machine learning algorithm

  • paper_url: http://arxiv.org/abs/2312.03037
  • repo_url: None
  • paper_authors: Lukasz Wierzbinski
  • For: This paper aims to study the formation mechanism of residents’ low-carbon awareness and provide an important basis for traffic managers to guide urban residents to choose low-carbon travel modes.* Methods: The paper uses data mining technology to analyze the data of low-carbon travel questionnaires, and applies machine learning algorithms such as K-means clustering and random forest to explore the mechanism of residents’ low-carbon travel willingness.* Results: The paper finds that residents’ low-carbon travel willingness can be divided into three categories based on their social attribute characteristics, travel characteristics, and other factors. The four most significant factors affecting residents’ low-carbon travel willingness are occupation, residence, family composition, and commuting time.
    Abstract In order to study the formation mechanism of residents' low-carbon awareness and provide an important basis for traffic managers to guide urban residents to choose low-carbon travel mode, this paper proposes a low-carbon energy-saving travel data feature analysis and mining based on machine learning algorithm. This paper uses data mining technology to analyze the data of low-carbon travel questionnaire, and regards the 15-dimensional problem under the framework of planned behavior theory as the internal cause variable that characterizes residents' low-carbon travel willingness. The author uses K-means clustering algorithm to classify the intensity of residents' low-carbon travel willingness, and applies the results as the explanatory variables to the random forest model to explore the mechanism of residents' social attribute characteristics, travel characteristics, etc. on their low-carbon travel willingness. The experimental results show that based on the Silhouette index test and t-SNE dimensionality reduction, residents' low-carbon travel willingness can be divided into three categories: strong, neutral, and not strong; Based on the importance index, the four most significant factors are the occupation, residence, family composition and commuting time of residents. Conclusion: This method provides policy recommendations for the development and management of urban traffic low-carbon from multiple perspectives.
    摘要 为了研究城市居民低碳意识形成机制并为城市交通管理者提供低碳交通模式选择的重要依据,这篇论文提议利用机器学习算法进行低碳能源节能旅行数据特征分析和挖掘。本论文使用数据挖掘技术分析低碳旅行问卷数据,并在决定理论框架下视居民低碳旅行意愿的15个维度问题为内在原因变量。作者使用K-means归一化算法将居民低碳旅行意愿的强度分类,并将结果作为解释变量 aplicar Random Forest 模型探索居民社会属性、旅行属性等对其低碳旅行意愿的影响。实验结果显示,根据Silhouette 指数测试和t-SNE维度减少,居民低碳旅行意愿可以分为三类:强、中性和不强;根据重要性指数,居民的四个最重要因素是职业、住址、家庭结构和通勤时间。结论:这种方法为城市交通发展和管理带来多种视角的政策建议。

Unlocking optimal batch size schedules using continuous-time control and perturbation theory

  • paper_url: http://arxiv.org/abs/2312.01898
  • repo_url: None
  • paper_authors: Stefan Perko
  • for: 这篇论文的目的是为了 theoretically derive optimal batch size schedules for Stochastic Gradient Descent (SGD) and similar algorithms.
  • methods: 这篇论文使用了一种 familyd of stochastic differential equations 来近似 Parameters 的更新过程,并通过对 learning rate 的扩展来更好地处理 state-dependent diffusion coefficient.
  • results: 这篇论文得到了一种 continuous-time optimal batch size schedule для大家族of diffusion coefficients,并在 linear regression Setting 中应用了结果。
    Abstract Stochastic Gradient Descent (SGD) and its variants are almost universally used to train neural networks and to fit a variety of other parametric models. An important hyperparameter in this context is the batch size, which determines how many samples are processed before an update of the parameters occurs. Previous studies have demonstrated the benefits of using variable batch sizes. In this work, we will theoretically derive optimal batch size schedules for SGD and similar algorithms, up to an error that is quadratic in the learning rate. To achieve this, we approximate the discrete process of parameter updates using a family of stochastic differential equations indexed by the learning rate. To better handle the state-dependent diffusion coefficient, we further expand the solution of this family into a series with respect to the learning rate. Using this setup, we derive a continuous-time optimal batch size schedule for a large family of diffusion coefficients and then apply the results in the setting of linear regression.
    摘要

Non-Intrusive Load Monitoring for Feeder-Level EV Charging Detection: Sliding Window-based Approaches to Offline and Online Detection

  • paper_url: http://arxiv.org/abs/2312.01887
  • repo_url: None
  • paper_authors: Cameron Martin, Fucai Ke, Hao Wang
    for: 这篇论文的目的是为了实现电动车(EV)充电 networks 上的有效管理,并帮助确保能源和交通领域的减排。methods: 这篇论文使用了进步的计量基础设施,从分布网络中收集高精度的负载数据,并使用非入侵式负载监控(NILM)技术来探测EV充电。results: 这篇论文获得了高精度的EV充电探测结果,在 feeder 层获得了98.88%的偏好分(F-Score),并在线上探测中获得了93.01%的偏好分。
    Abstract Understanding electric vehicle (EV) charging on the distribution network is key to effective EV charging management and aiding decarbonization across the energy and transport sectors. Advanced metering infrastructure has allowed distribution system operators and utility companies to collect high-resolution load data from their networks. These advancements enable the non-intrusive load monitoring (NILM) technique to detect EV charging using load measurement data. While existing studies primarily focused on NILM for EV charging detection in individual households, there is a research gap on EV charging detection at the feeder level, presenting unique challenges due to the combined load measurement from multiple households. In this paper, we develop a novel and effective approach for EV detection at the feeder level, involving sliding-window feature extraction and classical machine learning techniques, specifically models like XGBoost and Random Forest. Our developed method offers a lightweight and efficient solution, capable of quick training. Moreover, our developed method is versatile, supporting both offline and online EV charging detection. Our experimental results demonstrate high-accuracy EV charging detection at the feeder level, achieving an F-Score of 98.88% in offline detection and 93.01% in online detection.
    摘要 Translated into Simplified Chinese:理解电动车(EV)充电在分布网络上的理解是管理电动车充电和能源交通领域的关键,以便实现碳 neutrality。高级计量基础设施已经允许分布系统运营商和供应商收集高分辨率的荷量数据。这些进步使得非侵入式荷量监测(NILM)技术可以通过荷量测量数据探测电动车充电。而现有研究主要集中在各个家庭的NILM电动车充电探测上,存在一个研究空白,即集成荷量测量数据从多个家庭的EV充电探测。在这篇论文中,我们开发了一种新的和有效的方法,用于在分布器级别探测电动车充电,包括滑动窗口特征提取和классиical机器学习技术,具体使用XGBoost和Random Forest模型。我们开发的方法具有轻量级和高效的特点,可以快速训练。此外,我们的方法是多样化的,支持在线和离线电动车充电探测。我们的实验结果表明,在分布器级别,我们的方法可以准确探测电动车充电,达到了98.88%的偏好分和93.01%的在线探测精度。

HGPROMPT: Bridging Homogeneous and Heterogeneous Graphs for Few-shot Prompt Learning

  • paper_url: http://arxiv.org/abs/2312.01878
  • repo_url: None
  • paper_authors: Xingtong Yu, Zemin Liu, Yuan Fang, Xinming Zhang
  • for: 这篇论文的目的是提出一个统一Homogeneous Graph Neural Networks (HGNNs)和heterogeneous graph neural networks (GNNs)的预训练和提示框架,以实现统一预训练和下游任务的目的。
  • methods: 这篇论文提出了一个名为HGPROMPT的新的预训练和提示框架,该框架使用了一个双板设计,具有以下两个特点:first, it unifies homogeneous and heterogeneous graphs via a dual-template design; second, it proposes dual-prompt to assist downstream tasks in locating the most relevant prior to bridge the gaps caused by not only feature variations but also heterogeneity differences across tasks.
  • results: 在这篇论文中, authors thoroughly evaluated and analyzed HGPROMPT through extensive experiments on three public datasets, and achieved promising results. Specifically, HGPROMPT outperformed the state-of-the-art baselines on all three datasets, and demonstrated its ability to bridge the gap between pre-training and downstream tasks, as well as between homogeneous and heterogeneous graphs.
    Abstract Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm,but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been some early exploration of prompt-based learning on graphs, they primarily deal with homogeneous graphs, ignoring the heterogeneous graphs that are prevalent in downstream applications. In this paper, we propose HGPROMPT, a novel pre-training and prompting framework to unify not only pre-training and downstream tasks but also homogeneous and heterogeneous graphs via a dual-template design. Moreover, we propose dual-prompt in HGPROMPT to assist a downstream task in locating the most relevant prior to bridge the gaps caused by not only feature variations but also heterogeneity differences across tasks. Finally, we thoroughly evaluate and analyze HGPROMPT through extensive experiments on three public datasets.
    摘要 GRAPH NEURAL NETWORKS (GNNs) 和异种 GRAPH NEURAL NETWORKS (HGNNs) 是Homogeneous 和异种 GRAPH 表示学习的主要技术,但它们在终端指导下的性能受到任务特定的监督的限制。为了减少标注成本,预训练在自我监督任务上是一种流行的方法,但是这两个任务之间的差异可能会导致 gap 问题。为了衔接这个 gap,提示学习在 few-shot Settings 中 emerge 为一种有前途的方向,不需要完全 fine-tune 预训练模型。然而,早期的提示学习研究主要集中在Homogeneous GRAPH 上,忽略了 downstream 应用中广泛存在的异种 GRAPH。在这篇论文中,我们提出了 HGPROMPT,一种 novel 的预训练和提示框架,可以Unify 不只是预训练和终端任务,还有 homogeneous 和异种 GRAPH。此外,我们还提出了 dual-prompt 在 HGPROMPT 中,可以帮助下游任务在不同的 feature 和异种 GRAPH 之间找到最相关的优化。最后,我们通过了 Extensive 的实验和分析,证明了 HGPROMPT 的可行性和有效性。

FlowHON: Representing Flow Fields Using Higher-Order Networks

  • paper_url: http://arxiv.org/abs/2312.02243
  • repo_url: None
  • paper_authors: Nan Chen, Zhihong Li, Jun Tao
  • for: 本文旨在构建高阶网络(HONs)从流场中提取高阶相关性。
  • methods: 本文提出了一种基于流场的HON构建方法,通过三个线性变换来生成节点和边。
  • results: 本文通过应用不同的下游任务,如流体density估计、流场分区和网络图示 Representation来证明FlowHON的效果。
    Abstract Flow fields are often partitioned into data blocks for massively parallel computation and analysis based on blockwise relationships. However, most of the previous techniques only consider the first-order dependencies among blocks, which is insufficient in describing complex flow patterns. In this work, we present FlowHON, an approach to construct higher-order networks (HONs) from flow fields. FlowHON captures the inherent higher-order dependencies in flow fields as nodes and estimates the transitions among them as edges. We formulate the HON construction as an optimization problem with three linear transformations. The first two layers correspond to the node generation and the third one corresponds to edge estimation. Our formulation allows the node generation and edge estimation to be solved in a unified framework. With FlowHON, the rich set of traditional graph algorithms can be applied without any modification to analyze flow fields, while leveraging the higher-order information to understand the inherent structure and manage flow data for efficiency. We demonstrate the effectiveness of FlowHON using a series of downstream tasks, including estimating the density of particles during tracing, partitioning flow fields for data management, and understanding flow fields using the node-link diagram representation of networks.
    摘要 <>translate("Flow fields are often partitioned into data blocks for massively parallel computation and analysis based on blockwise relationships. However, most of the previous techniques only consider the first-order dependencies among blocks, which is insufficient in describing complex flow patterns. In this work, we present FlowHON, an approach to construct higher-order networks (HONs) from flow fields. FlowHON captures the inherent higher-order dependencies in flow fields as nodes and estimates the transitions among them as edges. We formulate the HON construction as an optimization problem with three linear transformations. The first two layers correspond to the node generation and the third one corresponds to edge estimation. Our formulation allows the node generation and edge estimation to be solved in a unified framework. With FlowHON, the rich set of traditional graph algorithms can be applied without any modification to analyze flow fields, while leveraging the higher-order information to understand the inherent structure and manage flow data for efficiency. We demonstrate the effectiveness of FlowHON using a series of downstream tasks, including estimating the density of particles during tracing, partitioning flow fields for data management, and understanding flow fields using the node-link diagram representation of networks.")]Here's the translation in Simplified Chinese:流场 часто被分割为数据块进行大规模并行计算和分析,基于块之间的关系。然而,大多数先前技术只考虑流场中块之间的首次依赖关系,这不够用于描述复杂的流动模式。在这种工作中,我们提出了FlowHON方法,可以从流场中构建高阶网络(HON)。FlowHON捕捉流场中块之间的内在高阶依赖关系,并将它们作为节点和边进行估计。我们将HON构建定义为一个优化问题,包括三个线性变换。第一两层对应于节点生成,第三层对应于边估计。我们的定义允许节点生成和边估计在一个统一的框架中解决。与FlowHON相比,传统的图算法可以无需修改应用于分析流场,同时利用更高级的信息来理解流场的内在结构和管理流数据的效率。我们使用一系列下游任务,包括跟踪中粒子的浓度估计、分割流场数据管理和流场使用节点连接图表示方法来证明FlowHON的效果。

Class Symbolic Regression: Gotta Fit ‘Em All

  • paper_url: http://arxiv.org/abs/2312.01816
  • repo_url: https://github.com/wassimtenachi/physo
  • paper_authors: Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis
  • for: 这个论文是为了自动找到多个数据集中的一个分析函数形式,以便准确地适应每个数据集的特定(可能不同的)适应参数。
  • methods: 该框架使用了一种层次结构,利用所有类physical phenomenon的共同约束,即所有数据集都遵循同一个分析法律。该方法 build upon our earlier Physical Symbolic Optimization($\Phi$-SO)框架,并 integra了维度分析约束和深度鼓励学习来发现数据集中的符号analytical函数。
  • results: 该研究通过应用该新方法到一组synthetic toy case数据集,并成功从 simulate stellar streams中提取了一个分析星系潜能函数。
    Abstract We introduce "Class Symbolic Regression" a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets - each governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for symbolic analytical function discovery from data. We demonstrate the efficacy of this novel approach by applying it to a panel of synthetic toy case datasets and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.
    摘要 我们介绍“类别符号 regresión”,一个首创的框架,能够自动获得多个数据集合的单一数学函数形式,对应每个数据集合的专有适应parameter。这层次框架利用所有物理现象的共同限制,即所有物理现象都受到共同的管理法则。我们的方法延伸了我们之前的物理符号优化($\Phi$-SO)框架,它组合了尺度分析限制和深度游戏学 Symbolic Regression 的数学函数发现。我们透过对一系列人工构建的实验数据集合进行应用,成功地从数据中提取了一个analytic galaxy potential。

Distributed Continual Learning with CoCoA in High-dimensional Linear Regression

  • paper_url: http://arxiv.org/abs/2312.01795
  • repo_url: None
  • paper_authors: Martin Hellkvist, Ayça Özçelikkale, Anders Ahlén
  • for: 本研究针对 continual learning 问题,即在不断收到新任务的情况下,保持之前已经学习的任务的性能。
  • methods: 本研究使用了分布式学习算法 COCOA,并提供了对 continual learning 的分布式学习的正确分析。
  • results: 研究发现,适当调整网络大小可以显著降低总体化错误,且最佳网络大小取决于任务相似性和任务数量。
    Abstract We consider estimation under scenarios where the signals of interest exhibit change of characteristics over time. In particular, we consider the continual learning problem where different tasks, e.g., data with different distributions, arrive sequentially and the aim is to perform well on the newly arrived task without performance degradation on the previously seen tasks. In contrast to the continual learning literature focusing on the centralized setting, we investigate the problem from a distributed estimation perspective. We consider the well-established distributed learning algorithm COCOA, which distributes the model parameters and the corresponding features over the network. We provide exact analytical characterization for the generalization error of COCOA under continual learning for linear regression in a range of scenarios, where overparameterization is of particular interest. These analytical results characterize how the generalization error depends on the network structure, the task similarity and the number of tasks, and show how these dependencies are intertwined. In particular, our results show that the generalization error can be significantly reduced by adjusting the network size, where the most favorable network size depends on task similarity and the number of tasks. We present numerical results verifying the theoretical analysis and illustrate the continual learning performance of COCOA with a digit classification task.
    摘要 我们考虑在信号 интереса的特征发展过时进行估计。特别是,我们考虑了连续学习问题,其中不同任务(例如数据具有不同分布)在紧随着紧随着时间顺序抵达。相比于中央化学习文献,我们从分布式估计角度 investigate这个问题。我们考虑了已知的分布式学习算法COCOA,它将模型参数和相应的特征分布在网络上。我们提供了对于连续学习的一般化误差的正式分析,包括在不同任务类型下的情况。这些分析结果显示了模型参数的误差随着网络结构、任务相似度和任务数量的变化而变化。具体来说,我们的结果表明,通过调整网络大小,可以显著降低误差。最佳的网络大小取决于任务相似度和任务数量。我们提供了数值结果,证明了理论分析,并通过数字分类任务 illustrate了COCOA在连续学习中的性能。

Wild-Tab: A Benchmark For Out-Of-Distribution Generalization In Tabular Regression

  • paper_url: http://arxiv.org/abs/2312.01792
  • repo_url: None
  • paper_authors: Sergey Kolesnikov
  • for: 该论文旨在提出一个大规模的标准 benchmark,用于评估深度学习模型在 tabular regresion 任务中的异常情况泛化性。
  • methods: 论文使用了 10 种不同的异常情况泛化方法进行评估,包括 Empirical Risk Minimization (ERM) 等。
  • results: 研究发现,许多异常情况泛化方法在未看到的数据上表现不佳,其 OOD 性能与 seen 数据上的性能之间存在显著的差异。而 ERM 简单的方法却能够在所有评估中表现出色,与当前状态法律环境齐平。
    Abstract Out-of-Distribution (OOD) generalization, a cornerstone for building robust machine learning models capable of handling data diverging from the training set's distribution, is an ongoing challenge in deep learning. While significant progress has been observed in computer vision and natural language processing, its exploration in tabular data, ubiquitous in many industrial applications, remains nascent. To bridge this gap, we present Wild-Tab, a large-scale benchmark tailored for OOD generalization in tabular regression tasks. The benchmark incorporates 3 industrial datasets sourced from fields like weather prediction and power consumption estimation, providing a challenging testbed for evaluating OOD performance under real-world conditions. Our extensive experiments, evaluating 10 distinct OOD generalization methods on Wild-Tab, reveal nuanced insights. We observe that many of these methods often struggle to maintain high-performance levels on unseen data, with OOD performance showing a marked drop compared to in-distribution performance. At the same time, Empirical Risk Minimization (ERM), despite its simplicity, delivers robust performance across all evaluations, rivaling the results of state-of-the-art methods. Looking forward, we hope that the release of Wild-Tab will facilitate further research on OOD generalization and aid in the deployment of machine learning models in various real-world contexts where handling distribution shifts is a crucial requirement.
    摘要 OUT-OF-DISTRIBUTION(OOD)泛化,一种关键技术 для 建立 Robust 机器学习模型,能够处理数据与训练集的分布不同。虽然在计算机视觉和自然语言处理领域已经取得了重要进展,但在表格数据领域(ubiquitous 在多个工业应用中)的探索仍然处于初期阶段。为了bridging 这个差距,我们提出了 Wild-Tab,一个大规模的 Benchmark ,专门用于表格回归任务中的 OOD 泛化。该 Benchmark 综合了3个来自天气预测和电力消耗预测等领域的工业数据,提供了一个真实的测试床。我们的广泛的实验,对 Wild-Tab 上的 10 种 OOD 泛化方法进行了详细的评估。我们发现,许多这些方法在未看到的数据上保持高性能水平很困难,OOD 性能与在分布靠拢的性能之间存在显著的差异。同时,Empirical Risk Minimization(ERM),尽管简单,仍然在所有评估中提供了Robust 的性能,与当前State-of-the-art 方法相当。我们希望,通过 Wild-Tab 的发布,将促进 OOD 泛化的研究,并帮助机器学习模型在各种实际应用中,处理分布shift 的需求。

EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly Detection in Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2312.01729
  • repo_url: None
  • paper_authors: Jie Liu, Qilin Li, Senjian An, Bradley Ezard, Ling Li
  • for: 这篇论文旨在提出一种基于Transformer的时间序列异常探测方法,以便更好地处理多元时间序列数据。
  • methods: 这篇论文提出了一种名为EdgeConvFormer的新方法,它结合了Time2vec嵌入、堆式动态图像 CNN 和 Transformer,以提取全球和本地时空信息。
  • results: 实验结果显示,EdgeConvFormer 能够从多元时间序列数据中学习时空相互关联,并在许多真实世界数据集上达到更好的异常探测性能。
    Abstract Transformer-based models for anomaly detection in multivariate time series can benefit from the self-attention mechanism due to its advantage in modeling long-term dependencies. However, Transformer-based anomaly detection models have problems such as a large amount of data being required for training, standard positional encoding is not suitable for multivariate time series data, and the interdependence between time series is not considered. To address these limitations, we propose a novel anomaly detection method, named EdgeConvFormer, which integrates Time2vec embedding, stacked dynamic graph CNN, and Transformer to extract global and local spatial-time information. This design of EdgeConvFormer empowers it with decomposition capacities for complex time series, progressive spatiotemporal correlation discovery between time series, and representation aggregation of multi-scale features. Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal correlations from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales.
    摘要 “transformer基本模型可以吸取自注意机制,但它们在异常检测中存在一些问题,如需要大量数据进行训练、标准的 pozitional encoding 不适合多变量时间序列数据、并且不考虑时间序列之间的依赖关系。为解决这些限制,我们提出了一种新的异常检测方法,名为 EdgeConvFormer,它将 Time2vec 嵌入、堆式动态图 CNN 和 transformer 结合使用,以提取全球和本地空间时间信息。这种 EdgeConvFormer 的设计使得它具有对复杂时间序列进行分解、逐步发现时间序列之间的空间相关性、以及多尺度特征的代表汇总能力。实验表明,EdgeConvFormer 可以从多变量时间序列数据中学习空间时间相关性,并在许多真实世界数据集上超过当前状态的方法进行异常检测。”

ImputeFormer: Graph Transformers for Generalizable Spatiotemporal Imputation

  • paper_url: http://arxiv.org/abs/2312.01728
  • repo_url: None
  • paper_authors: Tong Nie, Guoyang Qin, Yuewen Mei, Jian Sun
  • for: This paper addresses the problem of multivariate time series imputation using deep neural architectures.
  • methods: The paper proposes a novel imputation model that leverages low-rank imputation methods and incorporates three key knowledge-driven enhancements: projected temporal attention, global adaptive graph convolution, and Fourier imputation loss.
  • results: The proposed model demonstrates superiority in terms of accuracy, efficiency, and flexibility on heterogeneous datasets, and provides strong empirical results that incorporating time series primitives can facilitate the development of a generalizable imputation model for a wide range of spatiotemporal imputation problems.
    Abstract This paper focuses on the multivariate time series imputation problem using deep neural architectures. The ubiquitous issue of missing data in both scientific and engineering tasks necessitates the development of an effective and general imputation model. Leveraging the wisdom and expertise garnered from low-rank imputation methods, we power the canonical Transformers with three key knowledge-driven enhancements, including projected temporal attention, global adaptive graph convolution, and Fourier imputation loss. These task-agnostic inductive biases exploit the inherent structures of incomplete time series, and thus make our model versatile for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and flexibility on heterogeneous datasets, including traffic speed, traffic volume, solar energy, smart metering, and air quality. Comprehensive case studies are performed to further strengthen the interpretability. Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rank properties, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.
    摘要

The Self-Loop Paradox: Investigating the Impact of Self-Loops on Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2312.01721
  • repo_url: https://github.com/m-lampert/self-loop-paradox
  • paper_authors: Moritz Lampert, Ingo Scholtes
  • for: 本文研究了Graph Neural Networks(GNN)在图上的自 Loop 的影响。
  • methods: 作者采用了一种分析方法,用于研究一种给定的度序列 ensemble 上的统计图。
  • results: 研究发现,在某些 GNN 架构下,自 Loop 的信息可能会在图中具有更小的影响,相比于没有自 Loop 的图。这种现象被称为自 Loop парадок斯,它与 GNN 层数 $k$ 和 $k$ 是否为偶数有关。实验 validate 了这些理论发现,并在 23 个实际图上进行了实践研究。
    Abstract Many Graph Neural Networks (GNNs) add self-loops to a graph to include feature information about a node itself at each layer. However, if the GNN consists of more than one layer, this information can return to its origin via cycles in the graph topology. Intuition suggests that this "backflow" of information should be larger in graphs with self-loops compared to graphs without. In this work, we counter this intuition and show that for certain GNN architectures, the information a node gains from itself can be smaller in graphs with self-loops compared to the same graphs without. We adopt an analytical approach for the study of statistical graph ensembles with a given degree sequence and show that this phenomenon, which we call the self-loop paradox, can depend both on the number of GNN layers $k$ and whether $k$ is even or odd. We experimentally validate our theoretical findings in a synthetic node classification task and investigate its practical relevance in 23 real-world graphs.
    摘要 很多图 neural network (GNN) 添加自环到图以包含节点自身的特征信息在每层。然而,如果 GNN 包含多于一层,这些信息可以通过图 topology 的循环返回到其原点。人类的直觉 suggets 这些 "backflow" 信息在图有自环时应该更大。在这项工作中,我们反对这个直觉,并证明在某些 GNN 架构下,图中的自环可以使节点获得的信息更小。我们采用分析方法研究统计图集合的特性,并证明这种现象(我们称之为 "自环 парадок斯")可以受到层数 $k$ 和 $k$ 是偶数或奇数的影响。我们在 sintetic 节点分类任务中进行了实验验证我们的理论发现,并在 23 个真实世界图中进行了实验研究其实际 relevance。

Estimating Coronal Mass Ejection Mass and Kinetic Energy by Fusion of Multiple Deep-learning Models

  • paper_url: http://arxiv.org/abs/2312.01691
  • repo_url: None
  • paper_authors: Khalid A. Alobaid, Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Shen Fan, Jialiang Li, Huseyin Cavus, Vasyl Yurchyshyn
  • for: 这 paper 的目的是估算 Coronal Mass Ejections (CMEs) 的质量和动能。
  • methods: 该 paper 使用的方法是一种深度学习模型,名为 DeepCME,通过抽象 LASCO C2 图像,以估算 CMEs 的质量和动能。
  • results: 实验结果显示,DeepCME 模型比最佳组件模型 InceptionResNet 和 InceptionNet 更加准确地估算 CMEs 的质量和动能,其中 MRE 为 0.013。
    Abstract Coronal mass ejections (CMEs) are massive solar eruptions, which have a significant impact on Earth. In this paper, we propose a new method, called DeepCME, to estimate two properties of CMEs, namely, CME mass and kinetic energy. Being able to estimate these properties helps better understand CME dynamics. Our study is based on the CME catalog maintained at the Coordinated Data Analysis Workshops (CDAW) Data Center, which contains all CMEs manually identified since 1996 using the Large Angle and Spectrometric Coronagraph (LASCO) on board the Solar and Heliospheric Observatory (SOHO). We use LASCO C2 data in the period between January 1996 and December 2020 to train, validate and test DeepCME through 10-fold cross validation. The DeepCME method is a fusion of three deep learning models, including ResNet, InceptionNet, and InceptionResNet. Our fusion model extracts features from LASCO C2 images, effectively combining the learning capabilities of the three component models to jointly estimate the mass and kinetic energy of CMEs. Experimental results show that the fusion model yields a mean relative error (MRE) of 0.013 (0.009, respectively) compared to the MRE of 0.019 (0.017, respectively) of the best component model InceptionResNet (InceptionNet, respectively) in estimating the CME mass (kinetic energy, respectively). To our knowledge, this is the first time that deep learning has been used for CME mass and kinetic energy estimations.
    摘要 coronial mass ejections (CMEs) 是太阳大规模的喷发,对地球有重要影响。在这篇论文中,我们提出了一种新的方法,称为深度CME,用于估计CME的质量和动能。可以更好地理解CME的动态。我们的研究基于由协调数据分析工作shop(CDAW)数据中心维护的CME目录,该目录包含自1996年以来由大角度和光谱成像仪(LASCO)在太阳和太阳系观测卫星(SOHO)上手动识别的所有CME。我们使用LASCO C2数据在1996年1月至2020年12月期间进行了10次交叉验证。深度CME方法是三个深度学习模型的 fusión,包括ResNet、InceptionNet和InceptionResNet。我们的混合模型从LASCO C2图像中提取特征,有效地将三个组件模型的学习能力结合起来,共同估计CME的质量和动能。实验结果表明,我们的混合模型的相对误差(MRE)为0.013(0.009,分别),比最佳组件模型InceptionResNet(InceptionNet,分别)在估计CME质量(动能,分别)的MRE的0.019(0.017,分别)低得多。到目前为止,这是深度学习首次用于CME质量和动能估计。

Optimizing Bus Travel: A Novel Approach to Feature Mining with P-KMEANS and P-LDA Algorithms

  • paper_url: http://arxiv.org/abs/2312.01687
  • repo_url: None
  • paper_authors: Hongjie Liu, Haotian Shi, Sicheng Fu, Tengfei Yuan, Xinhuan Zhang, Hongzhe Xu, Bin Ran
    for: This paper aims to develop a method for extracting features from public transportation data, specifically for bus travel, to improve the attractiveness, usage, and sustainability of public transportation.methods: The method uses Point of Interest (POI) data and combines enhanced P-KMENAS and P-LDA algorithms to overcome the limitations of disorganized and unstructured public transportation data. The method segments passenger travel paths into distinct clusters and identifies features such as age, occupation, gender, sports, cost, safety, and personality traits.results: The method successfully mines the diverse aspects of bus travel and effectively calculates relationships between individual travel behaviors. It assigns explanatory and evaluative probabilities to POI labels, thereby enhancing bus travel optimization.
    Abstract Customizing services for bus travel can bolster its attractiveness, optimize usage, alleviate traffic congestion, and diminish carbon emissions. This potential is realized by harnessing recent advancements in positioning communication facilities, the Internet of Things, and artificial intelligence for feature mining in public transportation. However, the inherent complexities of disorganized and unstructured public transportation data introduce substantial challenges to travel feature extraction. This study presents a bus travel feature extraction method rooted in Point of Interest (POI) data, employing enhanced P-KMENAS and P-LDA algorithms to overcome these limitations. While the KMEANS algorithm adeptly segments passenger travel paths into distinct clusters, its outcomes can be influenced by the initial K value. On the other hand, Latent Dirichlet Allocation (LDA) excels at feature identification and probabilistic interpretations yet encounters difficulties with feature intermingling and nuanced sub-feature interactions. Incorporating the POI dimension enhances our understanding of travel behavior, aligning it more closely with passenger attributes and facilitating easier data analysis. By incorporating POI data, our refined P-KMENAS and P-LDA algorithms grant a holistic insight into travel behaviors and attributes, effectively mitigating the limitations above. Consequently, this POI-centric algorithm effectively amalgamates diverse POI attributes, delineates varied travel contexts, and imparts probabilistic metrics to feature properties. Our method successfully mines the diverse aspects of bus travel, such as age, occupation, gender, sports, cost, safety, and personality traits. It effectively calculates relationships between individual travel behaviors and assigns explanatory and evaluative probabilities to POI labels, thereby enhancing bus travel optimization.
    摘要 The KMEANS algorithm segments passenger travel paths into distinct clusters, but the outcome can be influenced by the initial K value. On the other hand, LDA excels at feature identification and probabilistic interpretations, but it struggles with feature intermingling and nuanced sub-feature interactions. By incorporating POI data, our refined P-KMENAS and P-LDA algorithms provide a comprehensive understanding of travel behaviors and attributes, facilitating easier data analysis.Incorporating POI data enhances our understanding of travel behavior, aligning it more closely with passenger attributes and facilitating easier data analysis. Our method effectively amalgamates diverse POI attributes, delineates varied travel contexts, and imparts probabilistic metrics to feature properties. We successfully mine the diverse aspects of bus travel, such as age, occupation, gender, sports, cost, safety, and personality traits. Our method calculates relationships between individual travel behaviors and assigns explanatory and evaluative probabilities to POI labels, thereby enhancing bus travel optimization.

EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research

  • paper_url: http://arxiv.org/abs/2312.01674
  • repo_url: None
  • paper_authors: Jingyu Pan, Chen-Chia Chang, Zhiyao Xie, Yiran Chen
  • for: 这篇论文的目的是提出一个可用于机器学习(ML)在电子设计自动化(EDA)中的大规模集成(VLSI)设计的开源数据集。
  • methods: 该论文使用了一个综合的数据流程,从抽象到物理实现,以扩大数据收集的范围。它还提供了深入的数据分析,帮助用户更好地理解数据的特性和分布,从而创建更高效的ML模型。
  • results: 该论文提出了一个首个的开源数据集,可以用于ML在EDA中的研究。该数据集包含了现代VLSI设计的复杂性,并且可以促进ML模型的转移性研究。
    Abstract The application of Machine Learning (ML) in Electronic Design Automation (EDA) for Very Large-Scale Integration (VLSI) design has garnered significant research attention. Despite the requirement for extensive datasets to build effective ML models, most studies are limited to smaller, internally generated datasets due to the lack of comprehensive public resources. In response, we introduce EDALearn, the first holistic, open-source benchmark suite specifically for ML tasks in EDA. This benchmark suite presents an end-to-end flow from synthesis to physical implementation, enriching data collection across various stages. It fosters reproducibility and promotes research into ML transferability across different technology nodes. Accommodating a wide range of VLSI design instances and sizes, our benchmark aptly represents the complexity of contemporary VLSI designs. Additionally, we provide an in-depth data analysis, enabling users to fully comprehend the attributes and distribution of our data, which is essential for creating efficient ML models. Our contributions aim to encourage further advances in the ML-EDA domain.
    摘要 machine learning(ml)在电子设计自动化(eda)中的应用在大规模inteграción(vlsi)设计方面已经引起了广泛的研究关注。尽管需要较大的数据来建立有效的ml模型,但大多数研究仅仅使用内部生成的小规模数据集,主要是因为lack comprehensive public resources。为了解决这个问题,我们介绍了EDALearn,首个涵盖ml任务在eda中的开源benchmark suite。这个benchmark suite提供了从合成到物理实现的端到端流程,从而扩大数据收集的范围。它促进了重复性和ml模型在不同技术节点之间的传输性研究。我们的benchmark适应了 contemporary vlsi设计的复杂性,并且提供了深入的数据分析,使用户可以全面了解数据的特征和分布,这是建立有效ml模型的关键。我们的贡献旨在鼓励ml-eda领域的进一步发展。

Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control

  • paper_url: http://arxiv.org/abs/2312.01662
  • repo_url: None
  • paper_authors: Chao Shen, Wenkang Zhan, Jian Tang, Zhaofeng Wu, Bo Xu, Chao Zhao, Zhanguo Wang
  • for: 本研究旨在 automatizing 厚膜沉积前的substrate除氧过程,以提高厚膜的品质和可靠性。
  • methods: 该研究使用机器学习(ML)混合 convolution和视觉转换器(CNN-ViT)模型,使用干涉高能电子折射(RHEED)视频作为输入,判断substrate的除氧状态,并实现了自动化substrate除氧 beneath a controlled architecture。
  • results: 该研究表明,使用该ML模型可以准确地判断substrate的除氧状态,并且可以在不同的MBE设备上实现高精度的部署。此外,该研究还表明,使用模型训练数据来实现高精度的部署,可以标准化厚膜沉积过程中的除氧温度,从而提高厚膜的品质和可靠性。
    Abstract Thin film deposition is an essential step in the semiconductor process. During preparation or loading, the substrate is exposed to the air unavoidably, which has motivated studies of the process control to remove the surface oxide before thin film deposition. Optimizing the deoxidation process in molecular beam epitaxy (MBE) for a random substrate is a multidimensional challenge and sometimes controversial. Due to variations in semiconductor materials and growth processes, the determination of substrate deoxidation temperature is highly dependent on the grower's expertise; the same substrate may yield inconsistent results when evaluated by different growers. Here, we employ a machine learning (ML) hybrid convolution and vision transformer (CNN-ViT) model. This model utilizes reflection high-energy electron diffraction (RHEED) video as input to determine the deoxidation status of the substrate as output, enabling automated substrate deoxidation under a controlled architecture. This also extends to the successful application of deoxidation processes on other substrates. Furthermore, we showcase the potential of models trained on data from a single MBE equipment to achieve high-accuracy deployment on other equipment. In contrast to traditional methods, our approach holds exceptional practical value. It standardizes deoxidation temperatures across various equipment and substrate materials, advancing the standardization research process in semiconductor preparation, a significant milestone in thin film growth technology. The concepts and methods demonstrated in this work are anticipated to revolutionize semiconductor manufacturing in optoelectronics and microelectronics industries by applying them to diverse material growth processes.
    摘要 紧跨膜填充是半导体生产过程中不可或缺的步骤。在准备或加载过程中,SUBSTRATE会被不可避免地暴露在空气中,这有动使研究者努力控制过程以移除表层氧化。在分子束激光 Epitaxy (MBE) 中优化废氧化过程是一个多维度挑战和有争议的问题,因为不同的半导体材料和生长过程会导致SUBSTRATE的氧化状态异常。在这种情况下,我们采用了一种机器学习(ML)混合 convolution 和视觉变换(CNN-ViT)模型。这个模型使用射电高能电子折射(RHEED)视频作为输入,以确定SUBSTRATE的氧化状态,从而实现自动化SUBSTRATE氧化,并在控制的架构下进行。这种方法也可以应用于其他SUBSTRATE上。此外,我们还表明了模型在不同的MBE设备上进行训练后可以实现高精度的部署。与传统方法相比,我们的方法具有非常重要的实用价值。它标准化了不同设备和半导体材料的氧化温度,从而推动了半导体准备研究的标准化进程,这是膜生长技术中的一个重要突破。我们的方法和技术在optoelectronics和微电子业务中的应用将对半导体生产技术产生深远的影响,并且可以应用于多种材料生长过程。

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

  • paper_url: http://arxiv.org/abs/2312.01658
  • repo_url: https://github.com/intelligent-machine-learning/dlrover
  • paper_authors: Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang
    for: This paper proposes a new approach to designing the preconditioning matrix for adaptive optimizers, which enhances the generalization performance of deep learning models.methods: The proposed method utilizes the gradient difference between two successive steps as the diagonal elements of the preconditioning matrix, and introduces an auto-switching function that enables the preconditioning matrix to switch dynamically between SGD and the adaptive optimizer.results: The proposed optimizer, named AGD, outperforms state-of-the-art optimizers on public datasets of NLP, CV, and RecSys, achieving highly competitive or significantly better predictive performance. Additionally, the paper analyzes the effects of the auto-switching function on various scenarios.
    Abstract Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the so-called preconditioning matrix, providing enhanced gradient information and regulating the step size of each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements. These diagonal elements are closely related to the Hessian and can be perceived as an approximation of the inner product between the Hessian row vectors and difference of the adjacent parameter vectors. Additionally, we introduce an auto-switching function that enables the preconditioning matrix to switch dynamically between Stochastic Gradient Descent (SGD) and the adaptive optimizer. Based on these two techniques, we develop a new optimizer named AGD that enhances the generalization performance. We evaluate AGD on public datasets of Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) optimizers, achieving highly competitive or significantly better predictive performance. Furthermore, we analyze how AGD is able to switch automatically between SGD and the adaptive optimizer and its actual effects on various scenarios. The code is available at https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.
    摘要 适应优化器,如 Adam,在深度学习中取得了非常出色的成功。这些适应优化器的关键组件之一是所谓的预conditioning矩阵,提供了加强的梯度信息和控制每个梯度方向的步长。在这篇论文中,我们提议一种新的预conditioning矩阵设计方法,利用两个连续步骤之间的梯度差为对角元素。这些对角元素与梯度矩阵和参数向量之间的内积非常相关,可以被视为梯度矩阵的一种近似。此外,我们引入了一种自动切换函数,使预conditioning矩阵可以在Stochastic Gradient Descent(SGD)和适应优化器之间动态切换。基于这两种技术,我们开发了一种新的优化器名为AGD,它可以提高总化性能。我们在公共数据集上进行了Natural Language Processing(NLP)、Computer Vision(CV)和Recommendation Systems(RecSys)等领域的实验,结果表明AGD可以与当前最佳优化器(SOTA)匹配或提高预测性能。此外,我们还分析了AGD自动切换SGD和适应优化器的实际效果和不同场景下的表现。代码可以在https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers中找到。

An End-to-End Network Pruning Pipeline with Sparsity Enforcement

  • paper_url: http://arxiv.org/abs/2312.01653
  • repo_url: None
  • paper_authors: Evan Dogariu
  • for: 这篇研究目的是为了开发一个可以在有限资源设备上部署的神经网络模型,并且可以维持竞争性能。
  • methods: 这篇研究使用了非标准的模型参数初始化、预先剪枝训练方法和后剪枝训练优化。
  • results: 研究发现,使用这些方法可以实现剪枝神经网络模型,并且可以获得重要的性能提升。
    Abstract Neural networks have emerged as a powerful tool for solving complex tasks across various domains, but their increasing size and computational requirements have posed significant challenges in deploying them on resource-constrained devices. Neural network sparsification, and in particular pruning, has emerged as an effective technique to alleviate these challenges by reducing model size, computational complexity, and memory footprint while maintaining competitive performance. However, many pruning pipelines modify the standard training pipeline at only a single stage, if at all. In this work, we look to develop an end-to-end training pipeline that befits neural network pruning and sparsification at all stages of training. To do so, we make use of nonstandard model parameter initialization, pre-pruning training methodologies, and post-pruning training optimizations. We conduct experiments utilizing combinations of these methods, in addition to different techniques used in the pruning step, and find that our combined pipeline can achieve significant gains over current state of the art approaches to neural network sparsification.
    摘要 To achieve this, we utilize nonstandard model parameter initialization, pre-pruning training methodologies, and post-pruning training optimizations. We conduct experiments using combinations of these methods, as well as different techniques used in the pruning step, and find that our combined pipeline can achieve significant gains over current state-of-the-art approaches to neural network sparsification.

Robust Streaming, Sampling, and a Perspective on Online Learning

  • paper_url: http://arxiv.org/abs/2312.01634
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Evan Dogariu, Jiatong Yu
  • for: 本文提供了统计学习的概述,以及流动数据处理中的稳健技术和挑战。
  • methods: 本文使用了多种稳健技术,包括敏感搜索、稳健搜索、稳健学习等。
  • results: 本文得到了一些坚实的结论,证明了一些在统计学习和流动数据处理中的深刻关系。
    Abstract In this work we present an overview of statistical learning, followed by a survey of robust streaming techniques and challenges, culminating in several rigorous results proving the relationship that we motivate and hint at throughout the journey. Furthermore, we unify often disjoint theorems in a shared framework and notation to clarify the deep connections that are discovered. We hope that by approaching these results from a shared perspective, already aware of the technical connections that exist, we can enlighten the study of both fields and perhaps motivate new and previously unconsidered directions of research.
    摘要 在这项工作中,我们提供了统计学学习的概述,然后进行了不可靠流处理技术和挑战的抽查,最终得出了许多严谨的结论,证明我们所提出的关系。此外,我们将常见的证明整合到共同框架和符号系统中,以便强调深刻的连接。我们希望通过从共同角度出发,已经熟悉技术之间的连接,以便启发新的研究方向和未曾考虑的研究方向。Here's the word-for-word translation:在这项工作中,我们提供了统计学学习的概述,然后进行了不可靠流处理技术和挑战的抽查,最终得出了许多严谨的结论,证明我们所提出的关系。此外,我们将常见的证明整合到共同框架和符号系统中,以便强调深刻的连接。我们希望通过从共同角度出发,已经熟悉技术之间的连接,以便启发新的研究方向和未曾考虑的研究方向。

How Many Validation Labels Do You Need? Exploring the Design Space of Label-Efficient Model Ranking

  • paper_url: http://arxiv.org/abs/2312.01619
  • repo_url: https://github.com/ppsmk388/morabench
  • paper_authors: Zhengyu Hu, Jieyu Zhang, Yue Yu, Yuchen Zhuang, Hui Xiong
  • for: 降低模型选择任务的标注成本
  • methods: 使用ensemble方法生成pseudo标签,使用uncertainty sampling获取目标,使用Z-score机制进行轮次委员会重新选择模型
  • results: 在多种选择指标下达到与完全标注数据集相同的结果,大幅降低标注成本,并在弱监督和半监督学习设置下有效地导航提示选择。
    Abstract The paper introduces LEMR, a framework that reduces annotation costs for model selection tasks. Our approach leverages ensemble methods to generate pseudo-labels, employs uncertainty sampling for target acquisition, and utilizes a Z-score mechanism for iterative committee reelection to refine model ranks. We present a systematic study across various selection metrics, demonstrating that LEMR achieves comparable results to fully labeled datasets with a fraction of the labeling budget. Our findings indicate that LEMR not only economizes the labeling effort in weak supervision and semi-supervised learning settings but also effectively guides prompt selection for large language models. With extensive experiments across 23 tasks, we reveal that our framework can dramatically decrease the labeling cost without compromising the accuracy of model selection, thereby offering a cost-effective alternative to traditional practices.
    摘要 文章介绍了LEMR框架,这个框架可以减少模型选择任务的标注成本。我们的方法利用协同方法生成 Pseudo-标签,使用不确定采样来获取目标,并使用Z-score机制来进行迭代委员会重新选择来细化模型排名。我们在不同的选择指标下进行了系统性的研究,展示了LEMR可以与完全标注数据集相比,使用一个小部分的标注预算来实现相同的结果。我们的发现表明,LEMR不仅可以在弱监督和半监督学习设置下减少标注努力,还可以有效地指导大语言模型的提问选择。我们在23个任务上进行了广泛的实验,发现LEMR可以减少标注成本,不会降低模型选择的准确性,从而提供一种经济的代替方案。

Deep Learning-Driven Enhancement of Welding Quality Control: Predicting Welding Depth and Pore Volume in Hairpin Welding

  • paper_url: http://arxiv.org/abs/2312.01606
  • repo_url: None
  • paper_authors: Amena Darwish, Stefan Ericson, Rohollah Ghasemi, Tobias Andersson, Dan Lönn, Andreas Andersson Lassila, Kent Salomonsson
  • for: 这个研究是为了提高焊接程序的品质保证,应用深度学习模型预测焊接过程中两个重要性能指标(KPC):焊接深度和平均孔隙量。
  • methods: 本研究使用了一系列激光焊接的关键输入特征(KIC),包括焊接槽形状、焊接速率、焊接道路重复次数以及所有路径的明亮焊接比率。这些数据来自于卷状焊接实验。本研究使用了两个深度学习网络,每个网络有多个隐藏层和线性活化函数,以展示深度学习网络在焊接KPC和KIC之间的复杂非线性关系。
  • results: 对于小型数据集的焊接实验,使用深度学习网络已经获得了扣准的结果,MAE值为0.1079 для预测焊接深度,并且MAE值为0.0641 для预测平均孔隙量。此外,验证过程显示了提案的方法的可靠性。这些结果 Promise significant advantages in controlling welding outcomes, moving beyond the current trend of relying merely on monitoring for defect classification.
    Abstract To advance quality assurance in the welding process, this study presents a robust deep learning model that enables the prediction of two critical welds Key Performance Characteristics (KPCs): welding depth and average pore volume. In the proposed approach, a comprehensive range of laser welding Key Input Characteristics (KICs) is utilized, including welding beam geometries, welding feed rates, path repetitions for weld beam geometries, and bright light weld ratios for all paths, all of which were obtained from hairpin welding experiments. Two deep learning networks are employed with multiple hidden dense layers and linear activation functions to showcase the capabilities of deep neural networks in capturing the intricate nonlinear connections inherent within welding KPCs and KICs. Applying deep learning networks to the small numerical experimental hairpin welding dataset has shown promising results, achieving Mean Absolute Error (MAE) values as low as 0.1079 for predicting welding depth and 0.0641 for average pore volume. Additionally, the validity verification demonstrates the reliability of the proposed method. This, in turn, promises significant advantages in controlling welding outcomes, moving beyond the current trend of relying merely on monitoring for defect classification.
    摘要 为提高钉焊质量控制,本研究提出了一种可靠的深度学习模型,可以预测两个关键钉焊性能指标(KPC):钉焊深度和平均孔颗率。在提posed方法中,使用了涵盖了激光钉焊的各种钉焊输入特征(KIC),包括钉焊光束几何、钉焊速率、路径重复数 для钉焊光束几何、和所有路径的明亮焊率。两个深度学习网络被采用,每个网络有多个隐藏层和线性活化函数,以示深度神经网络在钉焊KPC和KIC之间的非线性连接的强大能力。在应用深度学习网络于小规模的数字实验室钉焊 dataset中,已经达到了0.1079的 Mean Absolute Error(MAE)值,用于预测钉焊深度,以及0.0641的 MAE值,用于预测平均孔颗率。此外,有效验证表明提posed方法的可靠性。这有助于控制钉焊结果,超出了现在的仅仅通过监测来控制钉焊质量的趋势。

ActiveClean: Generating Line-Level Vulnerability Data via Active Learning

  • paper_url: http://arxiv.org/abs/2312.01588
  • repo_url: None
  • paper_authors: Ashwin Kallingal Joshy, Mirza Sanjida Alam, Shaila Sharmin, Qi Li, Wei Le
  • for: 这个论文的目的是开发一种系统atic tools,以生成大量高质量的line-level漏洞数据,从而提高漏洞检测的准确率。
  • methods: 该论文使用了活动学习方法,通过对commitlines进行Semantic和 sintactic特征的设计,然后使用这些特征来训练模型。
  • results: 该论文的ActiveClean模型可以很好地生成line-level漏洞数据,并且可以和现有的static分析方法相比肩。在进行了4.3K commits和119K commit lines的评估中,ActiveClean achieved an F1 score between 70-74。此外,该论文还示出了使用活动学习方法可以使用少量的训练数据来达到高度的准确率。
    Abstract Deep learning vulnerability detection tools are increasing in popularity and have been shown to be effective. These tools rely on large volume of high quality training data, which are very hard to get. Most of the currently available datasets provide function-level labels, reporting whether a function is vulnerable or not vulnerable. However, for a vulnerability detection to be useful, we need to also know the lines that are relevant to the vulnerability. This paper makes efforts towards developing systematic tools and proposes. ActiveClean to generate the large volume of line-level vulnerability data from commits. That is, in addition to function-level labels, it also reports which lines in the function are likely responsible for vulnerability detection. In the past, static analysis has been applied to clean commits to generate line-level data. Our approach based on active learning, which is easy to use and scalable, provide a complementary approach to static analysis. We designed semantic and syntactic properties from commit lines and use them to train the model. We evaluated our approach on both Java and C datasets processing more than 4.3K commits and 119K commit lines. AcitveClean achieved an F1 score between 70-74. Further, we also show that active learning is effective by using just 400 training data to reach F1 score of 70.23. Using ActiveClean, we generate the line-level labels for the entire FFMpeg project in the Devign dataset, including 5K functions, and also detected incorrect function-level labels. We demonstrated that using our cleaned data, LineVul, a SOTA line-level vulnerability detection tool, detected 70 more vulnerable lines and 18 more vulnerable functions, and improved Top 10 accuracy from 66% to 73%.
    摘要 深度学习漏洞检测工具在使用中日益受欢迎,其效果也得到了证明。这些工具需要大量高质量训练数据,但获得这些数据很 diffficult。现有的大多数数据集只提供函数级别的标签,而不是每行的标签。为了使漏洞检测有用,我们需要知道漏洞的相关行。这篇论文努力于开发系统化工具,并提出了ActiveClean来生成大量的行级漏洞数据。即除了函数级别的标签外,ActiveClean还报告了哪些行在函数中是漏洞检测的关键。在过去,静态分析已经被应用于清理提交以生成行级数据。我们的方法基于活动学习,它是容易使用和扩展的。我们从提交行中提取语义和 sintaxis 属性,并将其用于训练模型。我们对Java和C datasets进行了评估,处理了超过4300个提交和119000行提交。ActiveClean在这些数据集上 achieved F1 score between 70-74。此外,我们还证明了活动学习的有效性,只使用400个训练数据可以达到 F1 score of 70.23。使用ActiveClean,我们为FFMpeg项目在Devign数据集中生成了全部函数级别的漏洞数据,包括5000个函数,并检测到了错误的函数级别标签。我们示出了使用我们的净化数据,LineVul,当前最佳的行级漏洞检测工具,检测到了70个更多的漏洞行和18个更多的漏洞函数,并提高了Top 10准确率从66%提高到73%。

Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains

  • paper_url: http://arxiv.org/abs/2312.01587
  • repo_url: None
  • paper_authors: Tiancheng Qin, S. Rasoul Etesami
  • for: 这种研究是为了找到一种能够在多个玩家的随机游戏中找到最佳策略的算法。
  • methods: 这种算法使用了均衡权重Reflective descent算法,并且使用了可靠集的技术来保证高概率估计Unknown transition matrices。
  • results: 这种算法可以在 polynomial 时间内 converge to the set of $\epsilon$-NE policies,并且可以在一个弱的距离(即 Nikaido-Isoda 距离的均值)中 converge to the stable $\epsilon$-NE policy。
    Abstract We study a subclass of $n$-player stochastic games, namely, stochastic games with independent chains and unknown transition matrices. In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players. However, players' decisions are coupled through their payoff functions. We assume players can receive only realizations of their payoffs, and that the players can not observe the states and actions of other players, nor do they know the transition probability matrices of their own Markov chain. Relying on a compact dual formulation of the game based on occupancy measures and the technique of confidence set to maintain high-probability estimates of the unknown transition matrices, we propose a fully decentralized mirror descent algorithm to learn an $\epsilon$-NE for this class of games. The proposed algorithm has the desired properties of independence, scalability, and convergence. Specifically, under no assumptions on the reward functions, we show the proposed algorithm converges in polynomial time in a weaker distance (namely, the averaged Nikaido-Isoda gap) to the set of $\epsilon$-NE policies with arbitrarily high probability. Moreover, assuming the existence of a variationally stable Nash equilibrium policy, we show that the proposed algorithm converges asymptotically to the stable $\epsilon$-NE policy with arbitrarily high probability. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that, under some mild assumptions, admit polynomial-time learning algorithms for finding their stationary $\epsilon$-NE policies.
    摘要 我们研究一种$n$-Player随机游戏的子集,即独立链随机游戏和未知过渡矩阵。在这类游戏中,玩家控制自己的内部Markov链,过渡不依赖于其他玩家的状态/动作。但是玩家的决策是通过奖励函数相互联系的。我们假设玩家只能获得实现的奖励,并且玩家不能见到其他玩家的状态和动作,也不知道自己的过渡概率矩阵。我们基于Markov链的占据 dual 形式和信任集的技术,提出了一种完全分布式镜像搜索算法,用于学习这类游戏的 $\epsilon$-NE 策略。我们的算法具有独立、可扩展性和收敛性的特点。具体来说,不论奖励函数的假设,我们证明我们的算法在弱Distance(即Nikaido-Isoda距离)中的平均时间内 converges 到 $\epsilon$-NE 策略的集合,并且可以在高概率下获得这些策略。此外,假设存在一个变化稳定的 Nash 平衡策略,我们证明我们的算法会在 asymptotic 上转移到稳定 $\epsilon$-NE 策略,并且可以在高概率下获得这些策略。除了Markov potential game和线性-quadratic stochastic game之外,这种研究还提供了另一种$n$-Player随机游戏的子集,这些游戏在某些轻微的假设下可以在 polynomial 时间内学习其站点 $\epsilon$-NE 策略。

RJHMC-Tree for Exploration of the Bayesian Decision Tree Posterior

  • paper_url: http://arxiv.org/abs/2312.01577
  • repo_url: None
  • paper_authors: Jodie A. Cochrane, Adrian G. Wills, Sarah J. Johnson
  • for: 学习决策树(Decision Tree)使用感知机制(Bayesian Approach),解决机器学习社区中广泛应用决策树的问题。
  • methods: 使用Markov Chain Monte Carlo(MCMC)方法,其效果和效率归结于提议质量。本文探讨使用梯形 Monte Carlo(HMC)方法来更有效地探索bayesian决策树 posterior,通过利用可见性的几何学特性来实现全局更新。
  • results: HMC方法在预测测试精度、接受率和树复杂性方面表现优于现有方法。
    Abstract Decision trees have found widespread application within the machine learning community due to their flexibility and interpretability. This paper is directed towards learning decision trees from data using a Bayesian approach, which is challenging due to the potentially enormous parameter space required to span all tree models. Several approaches have been proposed to combat this challenge, with one of the more successful being Markov chain Monte Carlo (MCMC) methods. The efficacy and efficiency of MCMC methods fundamentally rely on the quality of the so-called proposals, which is the focus of this paper. In particular, this paper investigates using a Hamiltonian Monte Carlo (HMC) approach to explore the posterior of Bayesian decision trees more efficiently by exploiting the geometry of the likelihood within a global update scheme. Two implementations of the novel algorithm are developed and compared to existing methods by testing against standard datasets in the machine learning and Bayesian decision tree literature. HMC-based methods are shown to perform favourably with respect to predictive test accuracy, acceptance rate, and tree complexity.
    摘要

Toward Automated Quantum Variational Machine Learning

  • paper_url: http://arxiv.org/abs/2312.01567
  • repo_url: None
  • paper_authors: Omer Subasi
  • for: automatizing quantum variational machine learning
  • methods: multi-locality parallelizable search algorithm called MUSE
  • results: + improves detection accuracy of quantum variational classifiers by 2.3 times on average + improves quality of predictions from negative to positive coefficients of determination on real-world regression datasets + classification and regression scores of quantum variational models trained with MUSE are on par with classical counterparts
    Abstract In this work, we address the problem of automating quantum variational machine learning. We develop a multi-locality parallelizable search algorithm, called MUSE, to find the initial points and the sets of parameters that achieve the best performance for quantum variational circuit learning. Simulations with five real-world classification datasets indicate that on average, MUSE improves the detection accuracy of quantum variational classifiers 2.3 times with respect to the observed lowest scores. Moreover, when applied to two real-world regression datasets, MUSE improves the quality of the predictions from negative coefficients of determination to positive ones. Furthermore, the classification and regression scores of the quantum variational models trained with MUSE are on par with the classical counterparts.
    摘要 在这项工作中,我们解决了量子变量机器学习自动化问题。我们开发了多地点并行搜索算法MUSE,以找到最佳性能的量子变量环境学习初始点和参数集。通过对五个真实世界分类 dataset 进行 simulations,我们发现,MUSE 在量子变量分类器的检测精度方面平均提高了2.3倍,并且当应用于两个真实世界回归 dataset 时,MUSE 可以从负相关系数提升到正相关系数。此外,使用 MUSE 训练的量子变量模型的分类和回归得分与классиical counterparts 相当。

Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression

  • paper_url: http://arxiv.org/abs/2312.01547
  • repo_url: None
  • paper_authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas
  • for: 这个论文是关于 Gaussian 平均估计和robust 线性回归的基本问题的研究,它们在 Gaussian 冲击下存在 Huber 污染。
  • methods: 这个论文使用了 near-optimal 和 almost linear-time 算法,并且使用了多向滤波技术来解决这些问题。
  • results: 这个论文提出了一个 sample near-optimal 和 almost linear-time 算法,可以在 $\mathbb{R}^d$ 上进行 Gaussian 平均估计和robust 线性回归,并且可以在 $\ell_2$-error 内部 approximates 目标平均和目标回归函数。这个结果比先前的工作更好,并且解决了一个在文献中开放的问题。
    Abstract We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination parameter $\epsilon \in (0, \epsilon_0)$ for a small absolute constant $\epsilon_0$, we give an algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target mean within $\ell_2$-error $O(\epsilon)$. This improves on prior work that achieved this error guarantee with polynomially suboptimal sample and time complexity. For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(\epsilon)$. This is the first polynomial sample and time algorithm achieving the optimal error guarantee, answering an open question in the literature. At the technical level, we develop a methodology that yields almost-linear time algorithms for multi-directional filtering that may be of broader interest.
    摘要 我们研究了 Gaussian 平均估计和线性回传 regression 问题中的基本问题,在 Hubert 污染的存在下。我们的主要贡献是设计了首个近乎线性时间和几乎线性时间的算法,具有最佳误差保证且适用于这两个问题。具体来说,我们在 $\mathbb{R}^d$ 上的 Gaussian 稳定平均估计问题中,使用污染参数 $\epsilon \in (0, \epsilon_0)$,其中 $\epsilon_0$ 是小数字常数,我们提供了一个算法,其sample 复杂度为 $n = \tilde{O}(d/\epsilon^2)$,runtime 为近乎线性时间,可以将目标平均点 approximated 到 $\ell_2$-误差 $O(\epsilon)$ 之内。这比先前的工作更好,它们可以通过多项式复杂度和时间来实现这个误差保证。另外,我们还提供了一个可能对更广泛的应用有用的方法,即多向 filtering 方法。For the robust linear regression, we also provide the first algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(\epsilon)$. This is the first polynomial sample and time algorithm achieving the optimal error guarantee, answering an open question in the literature.