for: This paper proposes a rewiring technique based on Augmented Forman-Ricci curvature (AFRC) to mitigate over-smoothing and over-squashing effects in message-passing Graph Neural Networks (GNNs).
methods: The proposed technique uses AFRC, a scalable curvature notion that can be computed in linear time, to characterize over-smoothing and over-squashing effects in GNNs.
results: The proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. The paper also provides effective heuristics for hyperparameters in curvature-based rewiring, which avoid expensive hyperparameter searches.
Abstract
While Graph Neural Networks (GNNs) have been successfully leveraged for learning on graph-structured data across domains, several potential pitfalls have been described recently. Those include the inability to accurately leverage information encoded in long-range connections (over-squashing), as well as difficulties distinguishing the learned representations of nearby nodes with growing network depth (over-smoothing). An effective way to characterize both effects is discrete curvature: long-range connections that underlie over-squashing effects have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has given rise to rewiring techniques, which add or remove edges to mitigate over-smoothing and over-squashing. Several rewiring approaches utilizing graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. However, existing methods, especially those based on curvature, often require expensive subroutines and careful hyperparameter tuning, which limits their applicability to large-scale graphs. Here we propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notion that can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs. We complement our theoretical results with experiments, which demonstrate that the proposed approach achieves state-of-the-art performance while significantly reducing the computational cost in comparison with other methods. Utilizing fundamental properties of discrete curvature, we propose effective heuristics for hyperparameters in curvature-based rewiring, which avoid expensive hyperparameter searches, further improving the scalability of the proposed approach.
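Summary
Graph Neural Networks (GNNs) have been applied successfully to graph-structured data across domains, but several potential pitfalls have been described recently. These include the inability to accurately exploit information carried by long-range connections (over-squashing) and the growing similarity of the learned representations of nearby nodes as network depth increases (over-smoothing). Discrete curvature characterizes both effects: long-range connections that underlie over-squashing have low curvature, whereas edges that contribute to over-smoothing have high curvature. This observation has motivated rewiring techniques, which add or remove edges to mitigate both problems. Several rewiring methods based on graph characteristics, such as curvature or the spectrum of the graph Laplacian, have been proposed. However, existing methods, especially curvature-based ones, often require expensive subroutines and careful hyperparameter tuning, which limits their use on large-scale graphs. We propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a discrete curvature notion that can be computed in linear time. We prove that AFRC effectively characterizes over-smoothing and over-squashing in GNNs. Experiments show that our method achieves state-of-the-art performance at a lower computational cost than other methods. Using fundamental properties of discrete curvature, we propose effective hyperparameter heuristics that avoid expensive searches and further improve the scalability of the method.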
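To make the curvature idea concrete, here is a minimal sketch of one common form of augmented Forman curvature on an unweighted, undirected graph: 4 - deg(u) - deg(v) + 3 * (number of triangles containing the edge). The abstract does not state the formula, so this particular form, the function name `augmented_forman_curvature`, and the adjacency-dictionary input are assumptions made for illustration.

```python
def augmented_forman_curvature(adj):
    """Edge curvature for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    Assumed formula (not taken from the abstract):
        4 - deg(u) - deg(v) + 3 * (#triangles containing edge (u, v))
    """
    curvature = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                triangles = len(adj[u] & adj[v])  # common neighbours
                curvature[(u, v)] = 4 - len(adj[u]) - len(adj[v]) + 3 * triangles
    return curvature


# Two triangles joined by a single bridge edge (2, 3).
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
for edge, value in sorted(augmented_forman_curvature(graph).items()):
    print(edge, value)
```

In this toy graph the bridge edge (2, 3) gets curvature -2, while edges inside a triangle get 2 or 3, which matches the abstract's description of bottleneck edges having low curvature and densely connected edges having high curvature.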
results: The global model trained with fixed-length sequences converges faster than one trained with varying-length sequences.
Abstract
In this work, we explored federated learning under temporal heterogeneity across clients. We observed that the global model obtained by \texttt{FedAvg} converges faster when trained with fixed-length sequences than with varying-length sequences. Based on this empirical observation, we proposed methods to mitigate temporal heterogeneity for efficient federated learning.
Summary
In this work, we explore federated learning under temporal heterogeneity across clients. We find that the global model obtained by \texttt{FedAvg} converges faster when trained with fixed-length sequences than with varying-length sequences. Based on this empirical observation, we propose methods that mitigate temporal heterogeneity for efficient federated learning.
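For readers unfamiliar with \texttt{FedAvg}, the aggregation step can be sketched as a sample-size-weighted average of client parameters. This is a minimal illustration only: the function name `fedavg`, the flat-list parameter representation, and the example numbers are assumptions, and the sketch does not include the temporal-heterogeneity mitigation the paper proposes.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats (one per client)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)  # [2.5, 3.5]
```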
Fully Convolutional Generative Machine Learning Method for Accelerating Non-Equilibrium Greens Function Simulations
results: The ML-NEGF method is reported to accelerate convergence by 60% on average, reducing computation time while maintaining the same accuracy.
Abstract
This work describes a novel simulation approach that combines machine learning and device modelling simulations. The device simulations are based on the quantum mechanical non-equilibrium Greens function (NEGF) approach and the machine learning method is an extension to a convolutional generative network. We have named our new simulation approach ML-NEGF and we have implemented it in our in-house simulator called NESS (nano-electronics simulations software). The reported results demonstrate the improved convergence speed of the ML-NEGF method in comparison to the standard NEGF approach. The trained ML model effectively learns the underlying physics of nano-sheet transistor behaviour, resulting in faster convergence of the coupled Poisson-NEGF simulations. Quantitatively, our ML-NEGF approach achieves an average convergence acceleration of 60%, substantially reducing the computational time while maintaining the same accuracy.
A Survey on Congestion Control and Scheduling for Multipath TCP: Machine Learning vs Classical Approaches
paper_authors: Maisha Maliha, Golnaz Habibi, Mohammed Atiquzzaman
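for: This survey addresses several problems in Multipath TCP (MPTCP), including network congestion and communication latency.
methods: The survey covers two main families of approaches: non-data-driven (classical) methods and data-driven (machine learning) methods.
results: The survey compares the strengths and weaknesses of the two families and describes the simulation of MPTCP and its implementation in real environments.
Abstract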
Multipath TCP (MPTCP) has been widely used as an efficient way for communication in many applications. Data centers, smartphones, and network operators use MPTCP to balance the traffic in a network efficiently. MPTCP is an extension of TCP (Transmission Control Protocol), which provides multiple paths, leading to higher throughput and lower latency. Although MPTCP has shown better performance than TCP in many applications, it has its own challenges. The network can become congested due to heavy traffic in the multiple paths (subflows) if the subflow rates are not determined correctly. Moreover, communication latency can occur if the packets are not scheduled correctly between the subflows. This paper reviews techniques to solve the above-mentioned problems based on two main approaches: non-data-driven (classical) and data-driven (machine learning) approaches. This paper compares these two approaches and highlights their strengths and weaknesses with a view to motivating future researchers in this exciting area of machine learning for communications. This paper also provides details on the simulation of MPTCP and its implementations in real environments.
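Summary
Multipath TCP (MPTCP) is widely used in many applications to raise network throughput and lower latency. Data centers, smartphones, and network operators use MPTCP to balance network traffic. MPTCP is an extension of TCP (Transmission Control Protocol) that provides multiple paths, yielding higher throughput and lower latency. Although MPTCP performs better in many applications, challenges remain. If the rates of the multiple flows (subflows) are not determined correctly, the network can become congested. In addition, latency arises if packets are not scheduled correctly. This paper reviews two families of solutions to these problems: non-data-driven (classical) methods and data-driven (machine learning) methods. It compares their strengths and weaknesses and highlights the challenges and opportunities for future research in this area. It also details the simulation of MPTCP and its implementation in real environments.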
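As an illustration of what a classical (non-data-driven) scheduling policy looks like, the sketch below picks, for each packet, the subflow with the lowest round-trip time among those that still have room in their congestion window. Lowest-RTT-first is a widely used baseline policy, but the abstract does not name specific schedulers, so the `Subflow` fields, the function name `pick_subflow`, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subflow:
    name: str
    rtt_ms: float   # smoothed round-trip time estimate
    cwnd: int       # congestion window, in packets
    in_flight: int  # packets sent but not yet acknowledged


def pick_subflow(subflows: List[Subflow]) -> Optional[Subflow]:
    """Return the lowest-RTT subflow that can still accept a packet."""
    available = [s for s in subflows if s.in_flight < s.cwnd]
    if not available:
        return None  # every window is full; the sender has to wait
    return min(available, key=lambda s: s.rtt_ms)


paths = [
    Subflow("wifi", rtt_ms=20.0, cwnd=10, in_flight=10),  # window full
    Subflow("lte", rtt_ms=50.0, cwnd=10, in_flight=3),
]
print(pick_subflow(paths).name)  # lte
```

A data-driven scheduler would replace the fixed `min` rule with a learned policy that maps the same subflow state to a choice.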
An Automatic Tuning MPC with Application to Ecological Cruise Control
results: Simulation results show that the proposed approach effectively reduces the fuel consumption of the ecological cruise control system under different road geometries.
Abstract
Model predictive control (MPC) is a powerful tool for planning and controlling dynamical systems due to its capacity for handling constraints and taking advantage of preview information. Nevertheless, MPC performance is highly dependent on the choice of cost function tuning parameters. In this work, we demonstrate an approach for online automatic tuning of an MPC controller with an example application to an ecological cruise control system that saves fuel by using a preview of road grade. We solve the global fuel consumption minimization problem offline using dynamic programming and find the corresponding MPC cost function by solving the inverse optimization problem. A neural network fitted to these offline results is used to generate the desired MPC cost function weight during online operation. The effectiveness of the proposed approach is verified in simulation for different road geometries.
Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
paper_authors: Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
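for: This study uses machine learning to predict molecular properties, with the aim of accelerating drug discovery and material design.
methods: The study uses deep learning techniques, including a multilayer encoder and decoder architecture, for classification tasks.
results: Applied to various types of input data, including the Matbench and Moleculenet benchmarks, the approach achieves high predictive power. A comprehensive analysis of vector representations of chemical compounds reveals underlying patterns in molecular data.
Abstract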
The application of machine learning (ML) techniques in computational chemistry has led to significant advances in predicting molecular properties, accelerating drug discovery, and material design. ML models can extract hidden patterns and relationships from complex and large datasets, allowing for the prediction of various chemical properties with high accuracy. The use of such methods has enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks. We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds. In particular, we developed and tested the model using the Matbench and Moleculenet benchmarks, which include crystal properties and drug design-related benchmarks. We also conduct a comprehensive analysis of vector representations of chemical compounds, shedding light on the underlying patterns in molecular data. The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning when applied to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average accuracy of 96%, surpassing the previous best result by 10%. Our code is publicly available at https://github.com/dmamur/elembert.
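Summary
The application of machine learning (ML) techniques in computational chemistry has brought significant advances in predicting molecular properties, accelerating drug discovery, and material design. ML models can extract hidden patterns and relationships from large, complex datasets and predict various chemical properties with high accuracy. These methods have enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model based on deep learning techniques, including a multilayer encoder and decoder architecture, for classification tasks. We demonstrate the approach by applying it to different types of input data, including organic and inorganic compounds. Specifically, we use the Matbench and Moleculenet benchmarks, which cover crystal properties and drug design-related tasks, and carry out a comprehensive analysis of vector representations of chemical compounds that reveals underlying patterns in molecular data. The models show a high degree of predictive power, which supports further application of machine learning to molecular and material data. For example, on the Tox21 dataset we achieve an average accuracy of 96%, 10% higher than the previous best result. Our code is available at https://github.com/dmamur/elembert.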
Simulation-based Inference for Exoplanet Atmospheric Retrieval: Insights from winning the Ariel Data Challenge 2023 using Normalizing Flows
paper_authors: Mayeul Aubin, Carolina Cuesta-Lazaro, Ethan Tregidga, Javier Viaña, Cecilia Garraffo, Iouli E. Gordon, Mercedes López-Morales, Robert J. Hargreaves, Vladimir Yu. Makhnev, Jeremy J. Drake, Douglas P. Finkbeiner, Phillip Cargile
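results: The study presents novel machine learning models that analyze exoplanet atmosphere spectra more efficiently. It also introduces an alternative model with higher performance potential, even though that model scored lower in the challenge. These findings indicate that the evaluation metric should be reevaluated and that more efficient and accurate approaches to exoplanet atmosphere spectra analysis should be explored.
Abstract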
Advancements in space telescopes have opened new avenues for gathering vast amounts of data on exoplanet atmosphere spectra. However, accurately extracting chemical and physical properties from these spectra poses significant challenges due to the non-linear nature of the underlying physics. This paper presents novel machine learning models developed by the AstroAI team for the Ariel Data Challenge 2023, where one of the models secured the top position among 293 competitors. Leveraging Normalizing Flows, our models predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions. Moreover, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for exoplanet atmosphere spectra analysis. Finally, we present recommendations to enhance the challenge and models, providing valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, advancing our understanding of these distant worlds.
Summary
This paper presents new machine learning models developed by the AstroAI team for the Ariel Data Challenge 2023. One of our models achieved the top position among 293 competitors by leveraging Normalizing Flows to predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions. Furthermore, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for analyzing exoplanet atmosphere spectra. Finally, we provide recommendations to enhance the challenge and models, offering valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, deepening our understanding of these distant worlds.
Experiential-Informed Data Reconstruction for Fishery Sustainability and Policies in the Azores
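results: The results show that the data set can be reconstructed effectively with a range of modeling approaches, yielding new insights into the behavior of different fisheries and the impact of metiers over time, which are important for future fish population assessment and management.
Abstract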
Fishery analysis is critical in maintaining the long-term sustainability of species and the livelihoods of millions of people who depend on fishing for food and income. The fishing gear, or metier, is a key factor significantly impacting marine habitats, selectively targeting species and fish sizes. Analysis of commercial catches or landings by metier in fishery stock assessment and management is crucial, providing robust estimates of fishing efforts and their impact on marine ecosystems. In this paper, we focus on a unique data set from the Azores' fishing data collection programs between 2010 and 2017, where little information on metiers is available and sparse throughout our timeline. Our main objective is to tackle the task of data set reconstruction, leveraging domain knowledge and machine learning methods to retrieve or associate metier-related information to each fish landing. We empirically validate the feasibility of this task using a diverse set of modeling approaches and demonstrate how it provides new insights into different fisheries' behavior and the impact of metiers over time, which are essential for future fish population assessments, management, and conservation efforts.
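Summary
Fishery analysis is key to maintaining the long-term sustainability of species and of fishing livelihoods. The fishing gear, or metier, is a key factor affecting marine ecosystems, as it selectively targets species and fish sizes. Analysis of commercial catch data is very important in fishery stock assessment and management, providing robust estimates of fishing effort and its impact on marine ecosystems. This paper focuses on a unique data set from the Azores' fishing data collection programs between 2010 and 2017, in which metier information is scarce and sparse across the timeline. Our main objective is to reconstruct this data set using domain knowledge and machine learning methods, associating metier-related information with each fish landing. We validate the feasibility of this task experimentally with a variety of modeling approaches and show that it provides new insights into fishery behavior and the impact of metiers, which are important for future fish stock assessment, management, and conservation.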
Kinematics-aware Trajectory Generation and Prediction with Latent Stochastic Differential Modeling
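results: Our method outperforms baseline methods in generating and predicting vehicle trajectories, producing trajectories that are more realistic, physically feasible, and precisely controllable.
Abstract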
Trajectory generation and trajectory prediction are two critical tasks for autonomous vehicles, which generate various trajectories during development and predict the trajectories of surrounding vehicles during operation, respectively. However, despite significant advances in improving their performance, it remains a challenging problem to ensure that the generated/predicted trajectories are realistic, explainable, and physically feasible. Existing model-based methods provide explainable results, but are constrained by predefined model structures, limiting their capabilities to address complex scenarios. Conversely, existing deep learning-based methods have shown great promise in learning various traffic scenarios and improving overall performance, but they often act as opaque black boxes and lack explainability. In this work, we integrate kinematic knowledge with neural stochastic differential equations (SDE) and develop a variational autoencoder based on a novel latent kinematics-aware SDE (LK-SDE) to generate vehicle motions. Our approach combines the advantages of both model-based and deep learning-based techniques. Experimental results demonstrate that our method significantly outperforms baseline approaches in producing realistic, physically-feasible, and precisely-controllable vehicle trajectories, benefiting both generation and prediction tasks.
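Summary
Trajectory generation and trajectory prediction are two key tasks for autonomous vehicles: trajectories are generated during development, and the trajectories of surrounding vehicles are predicted during operation. Despite significant progress, it remains challenging to ensure that generated or predicted trajectories are realistic, explainable, and physically feasible. Existing model-based methods give explainable results but are constrained by predefined model structures, which limits their ability to handle complex scenarios. Conversely, existing deep learning-based methods perform well in learning varied traffic scenarios, but they often act as opaque black boxes and lack explainability. In this work, we combine kinematic knowledge with neural stochastic differential equations (SDE) and develop a variational autoencoder based on a novel latent kinematics-aware SDE (LK-SDE) to generate vehicle motions. Our method combines the advantages of model-based and deep learning-based techniques. Experimental results show that it outperforms baseline methods on generation and prediction tasks, producing realistic, physically feasible, and precisely controllable trajectories.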
Energy stable neural network for gradient flow equations
results: Experiments show that the network generates highly accurate and stable predictions.
Abstract
In this paper, we propose an energy stable network (EStable-Net) for solving gradient flow equations. The solution update scheme in our neural network EStable-Net is inspired by a proposed auxiliary variable based equivalent form of the gradient flow equation. EStable-Net enables decreasing of a discrete energy along the neural network, which is consistent with the property in the evolution process of the gradient flow equation. The architecture of the neural network EStable-Net consists of a few energy decay blocks, and the output of each block can be interpreted as an intermediate state of the evolution process of the gradient flow equation. This design provides a stable, efficient and interpretable network structure. Numerical experimental results demonstrate that our network is able to generate high accuracy and stable predictions.
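Summary
In this paper, we propose an energy stable network (EStable-Net) for solving gradient flow equations. The solution update scheme in EStable-Net is inspired by a proposed auxiliary-variable-based equivalent form of the gradient flow equation. EStable-Net makes a discrete energy decrease along the network, consistent with the property of the evolution process of the gradient flow equation. Its architecture consists of a few energy decay blocks, and the output of each block can be interpreted as an intermediate state of the evolution process. This design provides a stable, efficient, and interpretable network structure. Numerical experiments show that the network generates highly accurate and stable predictions.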
Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets
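methods: The paper analyzes SGD and proves an exponentially fast convergence rate in continuous time, which also applies to smooth unbounded activations such as SoftPlus.
results: The paper proves that SGD converges to the global minima of these objectives for arbitrary data and any number of gates with adequately smooth and bounded activations.
Abstract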
In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth $2$ nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.
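Summary
In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth $2$ nets, for arbitrary data and any number of gates with adequately smooth and bounded activations such as sigmoid and tanh. We also prove a fast convergence rate for continuous-time SGD that applies to smooth unbounded activations such as SoftPlus. Our key idea is to show that Frobenius norm regularized logistic loss functions on constant-sized neural nets are "Villani functions", so that we can build on recent progress in analyzing SGD on such objectives.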
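The kind of objective described above can be written out as follows. This is a sketch under stated assumptions rather than the note's exact formulation: the symbols $n$, $p$, $a_j$, $w_j$, $\lambda$, $\eta$ and the mini-batch $B_t$ are introduced here for illustration, labels are taken as $y_i \in \{-1, +1\}$, and the outer weights $a_j$ are assumed fixed so that only the inner weight matrix $W$ (with rows $w_j$) is trained.

```latex
% Depth-2 net with p gates and activation \sigma (e.g. sigmoid or tanh)
f(x; W) = \sum_{j=1}^{p} a_j \, \sigma\!\left(w_j^{\top} x\right)

% Frobenius-norm regularized logistic empirical risk on n samples
\hat{L}_{\lambda}(W) = \frac{1}{n} \sum_{i=1}^{n}
    \log\!\left(1 + e^{-y_i f(x_i; W)}\right)
    + \frac{\lambda}{2} \, \lVert W \rVert_F^{2}

% One SGD step on a mini-batch B_t with step size \eta
W_{t+1} = W_t - \eta \left( \frac{1}{|B_t|} \sum_{i \in B_t}
    \nabla_W \log\!\left(1 + e^{-y_i f(x_i; W_t)}\right)
    + \lambda W_t \right)
```

The regularization term $\frac{\lambda}{2}\lVert W \rVert_F^2$ is what the abstract refers to as "appropriately regularized"; how large $\lambda$ has to be for the guarantee is not stated in the abstract.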
User Assignment and Resource Allocation for Hierarchical Federated Learning over Wireless Networks
results: Experimental results show that the proposed HFL framework clearly improves on existing studies in energy consumption and latency.
Abstract
The large population of wireless users is a key driver of data-crowdsourced Machine Learning (ML). However, data privacy remains a significant concern. Federated Learning (FL) encourages data sharing in ML without requiring data to leave users' devices but imposes heavy computation and communications overheads on mobile devices. Hierarchical FL (HFL) alleviates this problem by performing partial model aggregation at edge servers. HFL can effectively reduce energy consumption and latency through effective resource allocation and appropriate user assignment. Nevertheless, resource allocation in HFL involves optimizing multiple variables, and the objective function should consider both energy consumption and latency, making the development of resource allocation algorithms very complicated. Moreover, it is challenging to perform user assignment, which is a combinatorial optimization problem in a large search space. This article proposes a spectrum resource optimization algorithm (SROA) and a two-stage iterative algorithm (TSIA) for HFL. Given an arbitrary user assignment pattern, SROA optimizes CPU frequency, transmit power, and bandwidth to minimize system cost. TSIA aims to find a user assignment pattern that considerably reduces the total system cost. Experimental results demonstrate the superiority of the proposed HFL framework over existing studies in energy and latency reduction.
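Summary
The large population of wireless users is a key driver of data-crowdsourced machine learning (ML), but data privacy remains a major concern. Federated Learning (FL) lets learning take place without data leaving users' devices, but it imposes heavy computation and communication overhead on mobile devices. Hierarchical FL (HFL) alleviates this problem by performing partial model aggregation at edge servers. HFL can effectively reduce energy consumption and latency through effective resource allocation and appropriate user assignment. However, resource allocation in HFL involves optimizing multiple variables, and the objective function should account for both energy consumption and latency, which makes developing resource allocation algorithms very complicated. Moreover, user assignment is a combinatorial optimization problem with a large search space. This article proposes a spectrum resource optimization algorithm (SROA) and a two-stage iterative algorithm (TSIA) for resource allocation and user assignment in HFL. Given an arbitrary user assignment pattern, SROA optimizes CPU frequency, transmit power, and bandwidth to minimize system cost. TSIA aims to find a user assignment pattern that considerably reduces the total system cost. Experimental results show that the proposed HFL framework performs better in both energy and latency.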
High-dimensional manifold of solutions in neural networks: insights from statistical physics
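results: The notes review how zero-training-error configurations are geometrically arranged and how this arrangement changes as the training set grows. They give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, they show that studying linear mode connectivity between solutions gives insight into the average shape of the solution manifold.
Abstract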
In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights, in the classification setting. I will review Gardner's approach based on the replica method and the derivation of the SAT/UNSAT transition in the storage setting. Then, I discuss some recent works that unveiled how the zero training error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.
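Summary
In these pedagogic notes, I review the statistical mechanics approach to neural networks, using the perceptron architecture as the paradigmatic example and focusing on the classification setting. I review Gardner's approach, including the derivation based on the replica method and the SAT/UNSAT transition in the storage setting. I then discuss recent works that describe how zero-training-error configurations are geometrically arranged and how this arrangement changes with the size of the training set. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I show that, in binary weight models, algorithmic hardness results from the disappearance of a clustered region of solutions. Finally, I discuss how linear mode connectivity between solutions gives insight into the average shape of the solution manifold.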
Globally Convergent Accelerated Algorithms for Multilinear Sparse Logistic Regression with $\ell_0$-constraints
for: The paper analyzes multidimensional data using a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints ($\ell_0$-MLSR).
methods: The paper proposes an Accelerated Proximal Alternating Linearized Minimization with Adaptive Momentum (APALM$^+$) method to solve the $\ell_0$-MLSR model, whose $\ell_0$-norm constraint is better suited to feature selection than the $\ell_1$-norm or $\ell_2$-norm.
results: On synthetic and real-world datasets, APALM$^+$ outperforms other state-of-the-art methods in both accuracy and speed. The paper also proves that APALM$^+$ ensures convergence of the objective function of the $\ell_0$-MLSR model and, using the Kurdyka-Lojasiewicz property, establishes global convergence to a first-order critical point together with a convergence rate.
Abstract
Tensor data represents a multidimensional array. Regression methods based on low-rank tensor decomposition leverage structural information to reduce the parameter count. Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data. To improve its efficacy and interpretability, we present a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints ($\ell_0$-MLSR). In contrast to the $\ell_1$-norm and $\ell_2$-norm, the $\ell_0$-norm constraint is better suited for feature selection. However, due to its nonconvex and nonsmooth properties, solving it is challenging and convergence guarantees are lacking. Additionally, the multilinear operation in $\ell_0$-MLSR also brings non-convexity. To tackle these challenges, we propose an Accelerated Proximal Alternating Linearized Minimization with Adaptive Momentum (APALM$^+$) method to solve the $\ell_0$-MLSR model. We provide a proof that APALM$^+$ can ensure the convergence of the objective function of $\ell_0$-MLSR. We also demonstrate that APALM$^+$ is globally convergent to a first-order critical point as well as establish convergence rate by using the Kurdyka-Lojasiewicz property. Empirical results obtained from synthetic and real-world datasets validate the superior performance of our algorithm in terms of both accuracy and speed compared to other state-of-the-art methods.
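Summary
Tensor data represents a multidimensional array. Regression methods based on low-rank tensor decomposition use structural information to reduce the number of parameters. Multilinear logistic regression is a powerful tool for analyzing multidimensional data. To improve its efficacy and interpretability, we propose a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints ($\ell_0$-MLSR). Compared with the $\ell_1$-norm and $\ell_2$-norm, the $\ell_0$-norm constraint is better suited for feature selection. However, because it is nonconvex and nonsmooth, solving it is difficult and convergence guarantees are lacking. The multilinear operation in $\ell_0$-MLSR also introduces non-convexity. To address these challenges, we propose an Accelerated Proximal Alternating Linearized Minimization with Adaptive Momentum (APALM$^+$) method to solve the $\ell_0$-MLSR model. We prove that APALM$^+$ ensures convergence of the objective function of $\ell_0$-MLSR. We also prove that APALM$^+$ converges globally to a first-order critical point and establish its convergence rate using the Kurdyka-Lojasiewicz property. Results on synthetic and real-world datasets show that our algorithm outperforms current state-of-the-art methods in both accuracy and speed.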
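Although the $\ell_0$ constraint is nonconvex, projecting a vector onto the set of vectors with at most $s$ nonzero entries has a simple closed form: keep the $s$ entries of largest magnitude and zero the rest. Proximal methods for $\ell_0$-constrained problems typically rely on this step. The sketch below shows only that projection; the function name `project_l0` and the sparsity parameter `s` are hypothetical, and the abstract does not describe the actual APALM$^+$ updates, so this is not the authors' algorithm.

```python
def project_l0(x, s):
    """Project x onto {z : z has at most s nonzero entries}.

    Keeps the s largest-magnitude entries of x and sets the others to 0.
    Ties are broken in favour of the earlier index.
    """
    if s <= 0:
        return [0.0] * len(x)
    # Indices sorted by decreasing magnitude; sorted() is stable.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


print(project_l0([0.5, -3.0, 1.0, 2.0], 2))  # [0.0, -3.0, 0.0, 2.0]
```

Because the surviving entries are left unshrunk, the selected features are not biased toward zero, in contrast with the soft-thresholding step that an $\ell_1$ penalty induces.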
Provable learning of quantum states with graphical models
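results: The paper proves that two-hop neighborhood learning algorithms can learn certain quantum states efficiently, with a sample complexity exponentially better than naive tomography.
Abstract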
The complete learning of an $n$-qubit quantum state requires a number of samples exponential in $n$. Several works consider subclasses of quantum states that can be learned with polynomial sample complexity, such as stabilizer states or high-temperature Gibbs states. Other works consider a weaker sense of learning, such as PAC learning and shadow tomography. In this work, we consider learning states that are close to neural network quantum states, which can efficiently be represented by a graphical model called restricted Boltzmann machines (RBMs). To this end, we exhibit robustness results for efficient provable two-hop neighborhood learning algorithms for ferromagnetic and locally consistent RBMs. We consider the $L_p$-norm as a measure of closeness, including both total variation distance and max-norm distance in the limit. Our results allow certain quantum states to be learned with a sample complexity \textit{exponentially} better than naive tomography. We hence provide new classes of efficiently learnable quantum states and apply new strategies to learn them.
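Summary
Completely learning an $n$-qubit quantum state requires a number of samples exponential in $n$. Several works consider subclasses of quantum states that can be learned with polynomial sample complexity, such as stabilizer states or high-temperature Gibbs states. Other works consider a weaker sense of learning, such as PAC learning and shadow tomography. In this work, we consider learning states that are close to neural network quantum states, which can be represented efficiently by restricted Boltzmann machines (RBMs). To this end, we present robustness results for two-hop neighborhood learning algorithms for ferromagnetic and locally consistent RBMs. We use the $L_p$-norm as the measure of closeness, which includes total variation distance and max-norm distance in the limit. Our results allow certain quantum states to be learned with better sample complexity than naive tomography. We thus provide new classes of efficiently learnable quantum states and apply new strategies to learn them.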
Double Normalizing Flows: Flexible Bayesian Gaussian Process ODEs Learning
paper_authors: Jian Xu, Shian Du, Junmei Yang, Xinghao Ding, John Paisley, Delu Zeng
for: Bayesian inference for models of the vector field of continuous dynamical systems.
methods: Normalizing flows are used to reparameterize the vector field of ODEs and to perform posterior inference.
results: The model improves accuracy and uncertainty estimates and achieves better results on simulated dynamical systems and real-world human motion data.
Abstract
Recently, Gaussian processes have been utilized to model the vector field of continuous dynamical systems. Bayesian inference for such models \cite{hegde2022variational} has been extensively studied and has been applied in tasks such as time series prediction, providing uncertainty estimates. However, previous Gaussian Process Ordinary Differential Equation (ODE) models may underperform on datasets with non-Gaussian process priors, as their constrained priors and mean-field posteriors may lack flexibility. To address this limitation, we incorporate normalizing flows to reparameterize the vector field of ODEs, resulting in a more flexible and expressive prior distribution. Additionally, due to the analytically tractable probability density functions of normalizing flows, we apply them to the posterior inference of GP ODEs, generating a non-Gaussian posterior. Through these dual applications of normalizing flows, our model improves accuracy and uncertainty estimates for Bayesian Gaussian Process ODEs. The effectiveness of our approach is demonstrated on simulated dynamical systems and real-world human motion data, including tasks such as time series prediction and missing data recovery. Experimental results indicate that our proposed method effectively captures model uncertainty while improving accuracy.
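Summary
Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems. Bayesian inference for such models \cite{hegde2022variational} has been studied extensively and provides uncertainty estimates in tasks such as time series prediction. However, previous Gaussian Process Ordinary Differential Equation (ODE) models may perform poorly on datasets with non-Gaussian process priors, because their constrained priors and mean-field posteriors may lack flexibility. To address this limitation, we incorporate normalizing flows into the vector field of ODEs, obtaining a more flexible and expressive prior distribution. In addition, because normalizing flows have analytically tractable probability density functions, we apply them to the posterior inference of GP ODEs, generating a non-Gaussian posterior. Through this dual application of normalizing flows, our model improves the accuracy and uncertainty estimates of Bayesian Gaussian Process ODEs. We evaluate the method on simulated dynamical systems and real-world human motion data, on tasks including time series prediction and missing data recovery; the results show that it captures model uncertainty effectively while improving accuracy.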
MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference
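results: The proposed MFRL control scheme performs well when the process model is unknown, and its theoretical properties are also guaranteed. Numerical studies demonstrate the effectiveness and efficiency of the method.
Abstract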
The design of a process control scheme is critical for quality assurance, as it reduces variations in manufacturing systems. Taking semiconductor manufacturing as an example, extensive literature focuses on control optimization based on certain process models (usually linear models), which are obtained by experiments before a manufacturing process starts. However, in real applications, pre-defined models may not be accurate, especially for a complex manufacturing system. To tackle model inaccuracy, we propose a model-free reinforcement learning (MFRL) approach to conduct experiments and optimize control simultaneously according to real-time data. Specifically, we design a novel MFRL control scheme by updating the distribution of disturbances using Bayesian inference to reduce their large variations during manufacturing processes. As a result, the proposed MFRL controller is demonstrated to perform well in a nonlinear chemical mechanical planarization (CMP) process when the process model is unknown. Theoretical properties are also guaranteed when disturbances are additive. The numerical studies also demonstrate the effectiveness and efficiency of our methodology.
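Summary
The design of a process control scheme is critical to quality assurance in manufacturing systems. Taking semiconductor manufacturing as an example, extensive literature focuses on control optimization based on certain process models (usually linear models), which are typically obtained by experiments before a manufacturing process starts. However, in real applications, pre-defined models may not be accurate, especially for a complex manufacturing system. To tackle model inaccuracy, we propose a model-free reinforcement learning (MFRL) approach that conducts experiments and optimizes control simultaneously, adjusting to real-time data. Specifically, we design a novel MFRL control scheme by updating the distribution of disturbances using Bayesian inference to reduce their large variations during manufacturing processes. As a result, the proposed MFRL controller is demonstrated to perform well in a nonlinear chemical mechanical planarization (CMP) process when the process model is unknown. Theoretical properties are also guaranteed when disturbances are additive. Numerical studies also demonstrate the effectiveness and efficiency of our methodology.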
End-to-End Optimized Pipeline for Prediction of Protein Folding Kinetics
paper_authors: Vijay Arvind. R, Haribharathi Sivakumar, Brindha. R
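for: An algorithm pipeline for predicting protein folding kinetics with high accuracy and a low memory footprint.
methods: A machine learning model is used for prediction.
results: The model is 4.8% more accurate than previous state-of-the-art models, while using 327x less memory and running 7.3% faster.
Abstract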
Protein folding is the intricate process by which a linear sequence of amino acids self-assembles into a unique three-dimensional structure. Protein folding kinetics is the study of the pathways and time-dependent mechanisms a protein undergoes when it folds. Understanding protein kinetics is essential because a protein needs to fold correctly to perform its biological functions optimally, and a misfolded protein can sometimes be contorted into shapes that are not ideal for a cellular environment, giving rise to many degenerative and neurodegenerative disorders and amyloid diseases. Monitoring at-risk individuals and detecting discrepancies in a protein's folding kinetics at an early stage could yield major public health benefits, as preventive measures can be taken. This research proposes an efficient pipeline for predicting protein folding kinetics with high accuracy and low memory footprint. The deployed machine learning (ML) model outperformed the state-of-the-art ML models by 4.8% in terms of accuracy while consuming 327x less memory and being 7.3% faster.
Data-Driven Reachability Analysis of Stochastic Dynamical Systems with Conformal Inference
results: The paper provides reachability guarantees for learning-enabled control systems and can handle complex closed-loop dynamics.
Abstract
We consider data-driven reachability analysis of discrete-time stochastic dynamical systems using conformal inference. We assume that we are not provided with a symbolic representation of the stochastic system, but instead have access to a dataset of $K$-step trajectories. The reachability problem is to construct a probabilistic flowpipe such that the probability that a $K$-step trajectory can violate the bounds of the flowpipe does not exceed a user-specified failure probability threshold. The key ideas in this paper are: (1) to learn a surrogate predictor model from data, (2) to perform reachability analysis using the surrogate model, and (3) to quantify the surrogate model's incurred error using conformal inference in order to give probabilistic reachability guarantees. We focus on learning-enabled control systems with complex closed-loop dynamics that are difficult to model symbolically, but where state transition pairs can be queried, e.g., using a simulator. We demonstrate the applicability of our method on examples from the domain of learning-enabled cyber-physical systems.
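Summary
We consider data-driven reachability analysis of discrete-time stochastic dynamical systems. We assume that no symbolic representation of the system is available and that we instead have a dataset of $K$-step trajectories. The goal is to construct a flowpipe such that the probability that a trajectory violates its bounds does not exceed a user-specified failure probability threshold. The key ideas are: (1) learn a surrogate predictor model from data, (2) perform reachability analysis using the surrogate model, and (3) use conformal inference to quantify the surrogate model's incurred error in order to give reachability guarantees. We focus on learning-enabled control systems with complex closed-loop dynamics whose state transition pairs can be queried, e.g., using a simulator. We demonstrate the method on examples from learning-enabled cyber-physical systems.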
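Step (3) of the key ideas can be illustrated with standard split conformal prediction. The sketch below is not the paper's implementation: the nonconformity score (maximum Euclidean prediction error over the $K$ steps) and the function names `trajectory_score` and `conformal_radius` are assumptions made for illustration.

```python
import math
import numpy as np

def trajectory_score(true_traj, pred_traj):
    """Nonconformity score: largest Euclidean error over the K steps.

    Both inputs have shape (K, state_dim). The choice of score is an
    assumption for this sketch.
    """
    diff = np.asarray(true_traj, dtype=float) - np.asarray(pred_traj, dtype=float)
    return float(np.linalg.norm(diff, axis=-1).max())

def conformal_radius(calibration_scores, delta):
    """Return the ceil((n + 1) * (1 - delta))-th smallest calibration score.

    For exchangeable scores, a fresh score exceeds this value with
    probability at most delta. Returns infinity when the calibration
    set is too small for the requested delta.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - delta))
    if k > n:
        return math.inf
    return float(scores[k - 1])
```

Inflating the surrogate model's reachable set at every step by the returned radius would then bound the true trajectory with probability at least 1 - delta, under the exchangeability assumption on calibration and test trajectories.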
On the Connection Between Riemann Hypothesis and a Special Class of Neural Networks
results: The paper provides an extended analytic criterion and a new way to examine RH.
Abstract
The Riemann hypothesis (RH) is a long-standing open problem in mathematics. It conjectures that non-trivial zeros of the zeta function all have real part equal to 1/2. The extent of the consequences of RH is far-reaching and touches a wide spectrum of topics including the distribution of prime numbers, the growth of arithmetic functions, the growth of Euler totient, etc. In this note, we revisit and extend an old analytic criterion of the RH known as the Nyman-Beurling criterion which connects the RH to a minimization problem that involves a special class of neural networks. This note is intended for an audience unfamiliar with RH. A gentle introduction to RH is provided.
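Summary
The Riemann hypothesis (RH) is a long-standing open problem in mathematics. It conjectures that the non-trivial zeros of the zeta function all have real part equal to 1/2. Its consequences are far-reaching and touch many topics, including the distribution of prime numbers, the growth of arithmetic functions, and the growth of the Euler totient function. In this note, we revisit and extend an old analytic criterion known as the Nyman-Beurling criterion, which connects RH to a minimization problem involving a special class of neural networks. The note is intended for readers unfamiliar with RH and provides a gentle introduction to it.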
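For readers who want the statements in symbols, the following is the classical textbook form of RH and of the Nyman-Beurling criterion, not the extended criterion of the note (which is not reproduced here). Here $\{x\}$ denotes the fractional part of $x$.

```latex
\[
  \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}, \qquad \operatorname{Re}(s) > 1,
\]
extended to $\mathbb{C} \setminus \{1\}$ by analytic continuation.
\[
  \text{RH:} \quad \zeta(\rho) = 0 \ \text{ and } \ 0 < \operatorname{Re}(\rho) < 1
  \ \Longrightarrow \ \operatorname{Re}(\rho) = \tfrac{1}{2}.
\]
\[
  \mathcal{N} = \Big\{ f(x) = \sum_{k=1}^{n} c_k \Big\{ \frac{\theta_k}{x} \Big\}
  \;:\; n \ge 1,\ 0 < \theta_k \le 1,\ \sum_{k=1}^{n} c_k \theta_k = 0 \Big\},
\]
\[
  \text{RH} \iff \inf_{f \in \mathcal{N}} \int_{0}^{1} \lvert 1 - f(x) \rvert^{2} \, dx = 0 .
\]
```

Each term $c_k \{\theta_k / x\}$ can be read as one hidden unit with the fractional part as its activation, which is one way to see the link to neural networks; the precise network class used in the note may differ.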
Integration of geoelectric and geochemical data using Self-Organizing Maps (SOM) to characterize a landfill
paper_authors: Camila Juliao, Johan Diaz, Yosmely Bermúdez, Milagrosa Aldana
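for: The study aims to determine whether the areas surrounding a landfill are at potential risk of contamination.
methods: Geoelectric data (resistivity and IP) and surface methane measurements are processed and classified with an unsupervised neural network of the Kohonen type.
results: Self-Organizing Classification Maps (SOM) yield a precise delimitation of the potential risk zones and separate them into distinct classes. Two graphic outputs were obtained from the training, from which groups of neurons with similar behaviour were selected.
Abstract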
Leachates from garbage dumps can significantly compromise their surrounding area. Even if the distance between these and the populated areas could be considerable, the risk of affecting the aquifers for public use is imminent in most cases. For this reason, the delimitation and monitoring of the leachate plume are of significant importance. Geoelectric data (resistivity and IP), and surface methane measurements, are integrated and classified using an unsupervised Neural Network to identify possible risk zones in areas surrounding a landfill. The Neural Network used is a Kohonen type, which generates, as a result, Self-Organizing Classification Maps or SOM (Self-Organizing Map). Two graphic outputs were obtained from the training performed in which groups of neurons that presented a similar behaviour were selected. Contour maps corresponding to the location of these groups and the individual variables were generated to compare the classification obtained and the different anomalies associated with each of these variables. Two of the groups resulting from the classification are related to typical values of liquids percolated in the landfill for the parameters evaluated individually. In this way, a precise delimitation of the affected areas in the studied landfill was obtained, integrating the input variables via SOMs. The location of the study area is not detailed for confidentiality reasons.
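Summary
Leachates from garbage dumps can significantly affect the surrounding environment. Even when the distance between a dump and populated areas is considerable, the risk of affecting aquifers for public use remains high. The delimitation and monitoring of the leachate plume are therefore very important. An unsupervised neural network of the Kohonen type integrates resistivity and IP measurements to generate Self-Organizing Classification Maps (SOM). During training, groups of neurons with similar behaviour were selected, and the corresponding contour maps were generated to compare the classification with the anomalies associated with each variable. Two of the groups resulting from the classification are related to typical values of liquids percolated in the landfill for the parameters evaluated individually. In this way, a precise delimitation of the affected areas in the studied landfill was obtained, integrating the input variables via SOMs. The location of the study area is not detailed for confidentiality reasons.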
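To make the Kohonen-type classification more concrete, here is a minimal NumPy sketch of a Self-Organizing Map. It is not the authors' implementation: the grid size, learning-rate schedule, neighbourhood function, and the names `train_som` and `assign_units` are all assumptions. Each input row is assumed to hold standardized measurements at one station (for example resistivity, IP and methane).

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    sigma0 = max(rows, cols) / 2.0
    # Fixed 2-D grid position of every neuron.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each neuron's weight vector from a randomly chosen sample.
    weights = data[rng.integers(0, len(data), size=rows * cols)].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood width
            # Best matching unit: neuron whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood measured on the grid, centred on the BMU.
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_units(data, weights):
    """Index of the best matching unit for every sample."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Stations that map to the same or neighbouring units behave similarly across all variables; plotting the unit index of each station at its location would give a map comparable in spirit to the contour maps described in the abstract.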
Total Variation Distance Estimation Is as Easy as Probabilistic Inference
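results: The paper gives an FPRAS for estimating the TV distance between distributions over any class of Bayes nets that admits an efficient probabilistic inference algorithm. The approach also applies to estimating TV distances between high-dimensional distributions.
Abstract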
In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. In particular, we present an efficient, structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV distances between distributions over any class of Bayes nets for which there is an efficient probabilistic inference algorithm. In particular, it leads to an FPRAS for estimating TV distances between distributions that are defined by Bayes nets of bounded treewidth. Prior to this work, such approximation schemes only existed for estimating TV distances between product distributions. Our approach employs a new notion of $partial$ couplings of high-dimensional distributions, which might be of independent interest.
Summary
In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. Specifically, we present a structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV distances between distributions over any class of Bayes nets for which there is an efficient probabilistic inference algorithm. In particular, it leads to an FPRAS for estimating TV distances between distributions that are defined by Bayes nets of bounded treewidth. Prior to this work, such approximation schemes only existed for estimating TV distances between product distributions. Our approach employs a new notion of $partial$ couplings of high-dimensional distributions, which might be of independent interest.
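For context, the TV distance between two distributions $P$ and $Q$ on a finite set is $\frac{1}{2}\sum_x |P(x) - Q(x)|$. The sketch below computes it exactly by brute force for two product distributions over $\{0,1\}^n$. It is a baseline for illustration only, not the paper's reduction, and the function names are hypothetical. Its cost grows as $2^n$, which is why an approximation scheme is of interest.

```python
from itertools import product

def product_pmf(ps, x):
    """Probability of bit string x when bit i is 1 with probability ps[i]."""
    prob = 1.0
    for p, xi in zip(ps, x):
        prob *= p if xi == 1 else 1.0 - p
    return prob

def tv_distance_bruteforce(ps, qs):
    """Exact TV distance between two product distributions on {0,1}^n.

    Enumerates all 2**n outcomes, so it is only usable for small n.
    """
    if len(ps) != len(qs):
        raise ValueError("both distributions must have the same dimension")
    n = len(ps)
    total = sum(
        abs(product_pmf(ps, x) - product_pmf(qs, x))
        for x in product((0, 1), repeat=n)
    )
    return 0.5 * total
```

For example, `tv_distance_bruteforce([0.5], [0.25])` evaluates to 0.25, since the two single-bit distributions differ by 0.25 on each outcome.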