For: Learning mappings between continuous functions in 3D Euclidean space.
Methods: A coefficient learning scheme combined with a residual operator layer, guaranteed to be SE(3)-equivariant by design. From the graph spectrum view, the method can be interpreted as convolution on graphons, which the authors term InfGCN.
Results: Extensive experiments on large-scale electron density datasets show that the model significantly outperforms current state-of-the-art architectures. Multiple ablation studies further demonstrate the effectiveness of the proposed architecture.
Abstract
We propose a general architecture that combines the coefficient learning scheme with a residual operator layer for learning mappings between continuous functions in the 3D Euclidean space. Our proposed model is guaranteed to achieve SE(3)-equivariance by design. From the graph spectrum view, our method can be interpreted as convolution on graphons (dense graphs with infinitely many nodes), which we term InfGCN. By leveraging both the continuous graphon structure and the discrete graph structure of the input data, our model can effectively capture the geometric information while preserving equivariance. Through extensive experiments on large-scale electron density datasets, we observed that our model significantly outperformed the current state-of-the-art architectures. Multiple ablation studies were also carried out to demonstrate the effectiveness of the proposed architecture.
A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions
results: The algorithm computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-1/3} + \epsilon^{-2})$ gradient and function evaluations and $\widetilde{O}(n \epsilon^{-4/3})$ additional runtime. In the special case where each $f_i$ is linear, the algorithm finds an $\epsilon$-approximate solution in runtime $\widetilde{O}(n (d/\epsilon)^{2/3} + nd + d\epsilon^{-2})$, which improves over all known first-order methods when $n>d$ and $\epsilon=1/\sqrt{n}$.
Abstract
We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-1/3} + \epsilon^{-2})$ gradient and function evaluations, and $\widetilde{O}(n \epsilon^{-4/3})$ additional runtime. For large $n$, our evaluation complexity is optimal up to polylogarithmic factors. In the special case where each $f_i$ is linear -- which corresponds to finding a near-optimal primal strategy in a matrix game -- our method finds an $\epsilon$-approximate solution in runtime $\widetilde{O}(n (d/\epsilon)^{2/3} + nd + d\epsilon^{-2})$. For $n>d$ and $\epsilon=1/\sqrt{n}$ this improves over all existing first-order methods. When additionally $d = \omega(n^{8/11})$ our runtime also improves over all known interior point methods. Our algorithm combines three novel primitives: (1) A dynamic data structure which enables efficient stochastic gradient estimation in small $\ell_2$ or $\ell_1$ balls. (2) A mirror descent algorithm tailored to our data structure implementing an oracle which minimizes the objective over these balls. (3) A simple ball oracle acceleration framework suitable for non-Euclidean geometry.
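To make the problem setup concrete, here is a minimal baseline sketch (not the paper's accelerated ball-oracle method) that minimizes $\max_{i} f_i(x)$ for smooth $f_i$ by running plain gradient descent on a logsumexp smoothing of the maximum; the quadratic test functions, smoothing parameter `mu`, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.normal(size=(n, d))          # illustrative smooth components
b = rng.normal(size=n)

def f(x):
    # f_i(x) = 0.5 * (a_i . x - b_i)^2, each 1-smooth up to scaling
    return 0.5 * (A @ x - b) ** 2

def smoothed_max_grad(x, mu=0.01):
    # gradient of mu * logsumexp(f(x)/mu): a softmax-weighted sum of grad f_i(x)
    vals = f(x)
    w = np.exp((vals - vals.max()) / mu)
    w /= w.sum()
    grads = (A @ x - b)[:, None] * A   # grad f_i(x) = (a_i . x - b_i) a_i
    return grads.T @ w

x = np.zeros(d)
for _ in range(2000):                  # gradient descent on the smoothed objective
    x -= 0.05 * smoothed_max_grad(x)
print("max_i f_i(x) after descent:", f(x).max())
```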
A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games
paper_authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan
for: quantum zero-sum games
methods: hierarchy of quantum optimization algorithms, including Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm
results: quadratic speed-up relative to the previous algorithm, with an average-iterate convergence complexity of $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria.
Abstract
Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/\epsilon^2)$ iterations to $\epsilon$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
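For intuition, here is a minimal sketch of optimistic multiplicative weights on a classical matrix game, the probability-simplex analogue of the spectraplex updates discussed above; the payoff matrix, step size, and iteration count are illustrative assumptions, and the quantum (density-matrix) version is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(-1, 1, size=(30, 30))    # illustrative zero-sum payoff matrix (row player pays x^T A y)
eta = 0.1

def normalize(v):
    return v / v.sum()

nx, ny = A.shape
x = np.ones(nx) / nx                      # row player's mixed strategy (minimizer)
y = np.ones(ny) / ny                      # column player's mixed strategy (maximizer)
gx_prev, gy_prev = A @ y, A.T @ x
x_avg, y_avg = np.zeros(nx), np.zeros(ny)

T = 2000
for _ in range(T):
    gx, gy = A @ y, A.T @ x               # payoff gradients at the current strategies
    # optimistic update: use the previous gradient as a prediction of the next one
    x = normalize(x * np.exp(-eta * (2 * gx - gx_prev)))
    y = normalize(y * np.exp(+eta * (2 * gy - gy_prev)))
    gx_prev, gy_prev = gx, gy
    x_avg += x / T
    y_avg += y / T

# duality gap of the averaged strategies shrinks toward 0 near an epsilon-equilibrium
gap = (A.T @ x_avg).max() - (A @ y_avg).min()
print("duality gap:", gap)
```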
Accelerating L-shaped Two-stage Stochastic SCUC with Learning Integrated Benders Decomposition
results: The computational cost and memory usage of Benders decomposition are reduced by creating tighter cuts and shrinking the master problem. Three approaches are proposed: regression Benders, classification Benders, and regression-classification Benders. A regressor reads load profile scenarios and predicts subproblem objective function proxy variables to form tighter cuts. A criterion is defined to measure how useful a cut is for lower bound improvement, and useful cuts are identified with and without a classification learner. Useful cuts are iteratively added to the master problem while non-useful cuts are discarded, reducing the computational burden of each Benders iteration. Case studies on multiple test systems show that the proposed learning-aided Benders decomposition solves two-stage SCUC more efficiently than conventional multi-cut Benders decomposition.
Abstract
Benders decomposition is widely used to solve large mixed-integer problems. This paper takes advantage of machine learning and proposes enhanced variants of Benders decomposition for solving two-stage stochastic security-constrained unit commitment (SCUC). The problem is decomposed into a master problem and subproblems corresponding to a load scenario. The goal is to reduce the computational costs and memory usage of Benders decomposition by creating tighter cuts and reducing the size of the master problem. Three approaches are proposed, namely regression Benders, classification Benders, and regression-classification Benders. A regressor reads load profile scenarios and predicts subproblem objective function proxy variables to form tighter cuts for the master problem. A criterion is defined to measure the level of usefulness of cuts with respect to their contribution to lower bound improvement. Useful cuts that contain the necessary information to form the feasible region are identified with and without a classification learner. Useful cuts are iteratively added to the master problem, and non-useful cuts are discarded to reduce the computational burden of each Benders iteration. Simulation studies on multiple test systems show the effectiveness of the proposed learning-aided Benders decomposition for solving two-stage SCUC as compared to conventional multi-cut Benders decomposition.
Machine learning phase transitions: Connections to the Fisher information
results: The study proves that several machine-learning indicators of phase transitions bound the square root of the system's (quantum) Fisher information from below, and numerically demonstrates the quality of these bounds for phase transitions in classical and quantum systems.
Abstract
Despite the widespread use and success of machine-learning techniques for detecting phase transitions from data, their working principle and fundamental limits remain elusive. Here, we explain the inner workings and identify potential failure modes of these techniques by rooting popular machine-learning indicators of phase transitions in information-theoretic concepts. Using tools from information geometry, we prove that several machine-learning indicators of phase transitions approximate the square root of the system's (quantum) Fisher information from below -- a quantity that is known to indicate phase transitions but is often difficult to compute from data. We numerically demonstrate the quality of these bounds for phase transitions in classical and quantum systems.
Optimal Embedding Dimension for Sparse Subspace Embeddings
paper_authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson
for: The paper is written to address the main open question posed by Nelson and Nguyen (FOCS 2013) on the embedding dimension of oblivious subspace embeddings (OSEs) and to improve on the previous results by Cohen (SODA 2016).
methods: The paper uses a random matrix with randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column to construct an OSE with $\epsilon = O_{\theta}(1)$.
results: The paper shows that the proposed OSE has an embedding dimension of $m=O(d)$ and achieves a distortion of $\epsilon = O_{\theta}(1)$, which improves on the previous results of $m=O(d\log(d))$ and $\epsilon = O(1)$ respectively. Additionally, the paper presents an optimal single-pass algorithm for least squares regression using the proposed OSE.
Abstract
A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq R^n$, $P\big(\,\forall_{x\in W}\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon = O_\theta(1)$. However, such optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having $s\ll m$ non-zeros per column, with applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on $m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to construct even sparser non-oblivious embeddings, leading to the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
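Below is a minimal sketch of the kind of sparse sign embedding studied here: each column of $S$ receives $s$ random $\pm 1/\sqrt{s}$ entries, and the distortion over a random $d$-dimensional subspace is checked via the extreme singular values of $SU$. The sizes and the value of $s$ are illustrative, not the paper's tuned constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, s = 5000, 20, 60, 8           # illustrative sizes; s non-zeros per column

# Build the sparse embedding S: s random rows per column, signs +-1/sqrt(s)
S = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=s, replace=False)
    S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

# Random d-dimensional subspace of R^n, represented by an orthonormal basis U
U, _ = np.linalg.qr(rng.normal(size=(n, d)))

# Distortion over the whole subspace = extreme singular values of S @ U
sv = np.linalg.svd(S @ U, compute_uv=False)
print("singular values of S U lie in [%.3f, %.3f]" % (sv.min(), sv.max()))
# values close to 1 mean (1+eps)^{-1} ||x|| <= ||Sx|| <= (1+eps) ||x|| for all x in the subspace
```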
Multiparameter Persistent Homology for Molecular Property Prediction
for: This paper presents a novel method for generating molecular fingerprints based on multiparameter persistent homology, which reveals the latent structures and relationships within molecular geometry and detects topological features that exhibit persistence across multiple scales.
methods: The proposed fingerprinting method uses multiparameter persistent homology, which is a more comprehensive and interpretable approach than traditional graph neural networks. The method incorporates multiple parameters such as atomic mass, partial charge, and bond type, and can be further enhanced by incorporating additional parameters.
results: The proposed method has been demonstrated to be effective in predicting molecular properties through extensive experiments on the Lipophilicity, FreeSolv, and ESOL datasets. The method provides fresh perspectives on molecular structure that are not easily discernible from single-parameter or single-scale analysis.
Abstract
In this study, we present a novel molecular fingerprint generation method based on multiparameter persistent homology. This approach reveals the latent structures and relationships within molecular geometry, and detects topological features that exhibit persistence across multiple scales along multiple parameters, such as atomic mass, partial charge, and bond type, and can be further enhanced by incorporating additional parameters like ionization energy, electron affinity, chirality and orbital hybridization. The proposed fingerprinting method provides fresh perspectives on molecular structure that are not easily discernible from single-parameter or single-scale analysis. Besides, in comparison with traditional graph neural networks, multiparameter persistent homology has the advantage of providing a more comprehensive and interpretable characterization of the topology of the molecular data. We have established theoretical stability guarantees for multiparameter persistent homology, and have conducted extensive experiments on the Lipophilicity, FreeSolv, and ESOL datasets to demonstrate its effectiveness in predicting molecular properties.
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
paper_authors: Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, Ronan Fablet
for: This paper addresses how artificial intelligence and deep learning can improve numerical simulation frameworks, and how neural networks can be used within these frameworks to model physical systems.
methods: The paper introduces an online learning method called EGA (Euler Gradient Approximation), which assumes an additive neural correction to the physical model and uses an explicit Euler approximation of the gradients.
results: Experiments on various case studies, including ocean-atmosphere dynamics, show significant improvements. The results also indicate that online learning provides better forecasting performance for hybrid models than traditional offline learning.
Abstract
Artificial intelligence and deep learning are currently reshaping numerical simulation frameworks by introducing new modeling capabilities. These frameworks are extensively investigated in the context of model correction and parameterization where they demonstrate great potential and often outperform traditional physical models. Most of these efforts in defining hybrid dynamical systems follow {offline} learning strategies in which the neural parameterization (called here sub-model) is trained to output an ideal correction. Yet, these hybrid models can face hard limitations when defining what should be a relevant sub-model response that would translate into a good forecasting performance. End-to-end learning schemes, also referred to as online learning, could address such a shortcoming by allowing the deep learning sub-models to train on historical data. However, defining end-to-end training schemes for the calibration of neural sub-models in hybrid systems requires working with an optimization problem that involves the solver of the physical equations. Online learning methodologies thus require the numerical model to be differentiable, which is not the case for most modeling systems. To overcome this difficulty and bypass the differentiability challenge of physical models, we present an efficient and practical online learning approach for hybrid systems. The method, called EGA for Euler Gradient Approximation, assumes an additive neural correction to the physical model, and an explicit Euler approximation of the gradients. We demonstrate that the EGA converges to the exact gradients in the limit of infinitely small time steps. Numerical experiments are performed on various case studies, including prototypical ocean-atmosphere dynamics. Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
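As an illustration of the general idea, here is a minimal PyTorch sketch of an additive neural correction to a known physical tendency, rolled out with an explicit Euler scheme and trained end-to-end against a trajectory. The toy dynamics (a damped oscillator with an unmodelled cubic term), network size, and step size are illustrative assumptions, not the paper's configuration or the EGA gradient approximation itself.

```python
import torch

torch.manual_seed(0)
dt, steps = 0.05, 200

def f_phys(x):
    # known (incomplete) physics: damped linear oscillator, state x = (position, velocity)
    return torch.stack([x[..., 1], -x[..., 0] - 0.1 * x[..., 1]], dim=-1)

def f_true(x):
    # "real" system adds an unmodelled cubic stiffness term
    return f_phys(x) + torch.stack([torch.zeros_like(x[..., 0]), -0.5 * x[..., 0] ** 3], dim=-1)

def rollout(x0, tendency):
    xs, x = [x0], x0
    for _ in range(steps):
        x = x + dt * tendency(x)       # explicit Euler step
        xs.append(x)
    return torch.stack(xs)

x0 = torch.tensor([1.5, 0.0])
with torch.no_grad():
    obs = rollout(x0, f_true)           # synthetic observations of the real system

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for epoch in range(300):
    pred = rollout(x0, lambda x: f_phys(x) + net(x))   # hybrid model: physics + neural correction
    loss = ((pred - obs) ** 2).mean()
    opt.zero_grad()
    loss.backward()                     # gradients flow through the Euler steps
    opt.step()
print("trajectory MSE after training:", loss.item())
```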
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
paper_authors: Shafagh Keyvanian, Michelle J. Johnson, Nadia Figueroa
for: This paper aims to build a realistic human kinematic model for more accurate modeling of human arm motion in human-robot interaction, biomechanics, and robot-assisted rehabilitation.
methods: A data-driven approach fits a one-class support vector machine to joint-space exploration motion data, with an efficient hyperparameter tuning scheme.
results: The method effectively learns realistic anatomically constrained range-of-motion boundaries, and an impairment index (II) metric is proposed to quantitatively assess the capability difference between healthy and impaired arms.
Abstract
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as stroke patients.
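A minimal scikit-learn sketch of the core ingredient follows: fitting a one-class SVM to joint-angle samples so that its decision boundary approximates the reachable range of motion. The synthetic three-joint data and the hyperparameters are illustrative assumptions, not the paper's motion-capture setup or tuned values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "joint-space exploration" data: 3 correlated joint angles (radians)
theta1 = rng.uniform(-1.5, 1.5, 2000)
theta2 = 0.5 * theta1 + rng.uniform(-0.8, 0.8, 2000)   # crude inter-joint dependency
theta3 = rng.uniform(-0.5, 2.0, 2000)
poses = np.column_stack([theta1, theta2, theta3])

# nu bounds the fraction of training poses treated as outliers
rom_model = OneClassSVM(kernel="rbf", gamma=2.0, nu=0.05).fit(poses)

query = np.array([[0.2, 0.1, 1.0],      # plausible pose
                  [1.4, -1.4, 2.0]])    # violates the learned inter-joint dependency
print(rom_model.predict(query))          # +1 = inside the learned RoM boundary, -1 = outside
print(rom_model.decision_function(query))  # signed score relative to the boundary
```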
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach
results: Experimental results show that the HMM outperforms the naive forecast in some metrics, further supporting the idea that collision warnings may be Markovian.
Abstract
Space is becoming more crowded in Low Earth Orbit due to increased space activity. Such a dense space environment increases the risk of collisions between space objects endangering the whole space population. Therefore, the need to consider collision avoidance as part of routine operations is evident to satellite operators. Current procedures rely on the analysis of multiple collision warnings by human analysts. However, with the continuous growth of the space population, this manual approach may become unfeasible, highlighting the importance of automation in risk assessment. In 2019, ESA launched a competition to study the feasibility of applying machine learning in collision risk estimation and released a dataset that contained sequences of Conjunction Data Messages (CDMs) in support of real close encounters. The competition results showed that the naive forecast and its variants are strong predictors for this problem, which suggests that the CDMs may follow the Markov property. The proposed work investigates this theory by benchmarking Hidden Markov Models (HMM) in predicting the risk of collision between two resident space objects by using one feature of the entire dataset: the sequence of the probability in the CDMs. In addition, Bayesian statistics are used to infer a joint distribution for the parameters of the models, which allows the development of robust and reliable probabilistic predictive models that can incorporate physical or prior knowledge about the problem within a rigorous theoretical framework and provides prediction uncertainties that nicely reflect the accuracy of the predicted risk. This work shows that the implemented HMM outperforms the naive solution in some metrics, which further adds to the idea that the collision warnings may be Markovian and suggests that this is a powerful method to be further explored.
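For reference, here is a minimal sketch of the modelling ingredient used here: evaluating a sequence of discretized collision-probability observations under a hidden Markov model with the scaled forward algorithm. The two-state transition/emission matrices and the discretization are illustrative assumptions, not the parameters fitted in the paper, and the Bayesian treatment of the parameters is omitted.

```python
import numpy as np

# Two hidden regimes ("low risk", "high risk") and 3 discretized CDM probability bins
A = np.array([[0.9, 0.1],       # state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],  # emission probabilities per state
              [0.1, 0.3, 0.6]])
pi = np.array([0.8, 0.2])       # initial state distribution

def forward_loglik(obs):
    """Log-likelihood of an observation sequence under the HMM (scaled forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

calm_sequence = [0, 0, 1, 0, 0, 0]       # discretized CDM risk levels over time
escalating    = [0, 1, 1, 2, 2, 2]
print(forward_loglik(calm_sequence), forward_loglik(escalating))
```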
A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs
for: The paper addresses the challenge of large-scale graph machine learning, where the complexity of learning models scales with the graph size.
methods: The authors propose a signal sampling theory for a type of graph limit called the graphon, and prove that certain sampling sets are unique and consistent for graphon signals.
results: The authors propose a related graphon signal sampling algorithm and demonstrate its good empirical performance on graph machine learning tasks.
Abstract
Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit -- the graphon. We prove a Poincar\'e inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks.
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks
paper_authors: Benjamin Feuer, Chinmay Hegde, Niv Cohen
for: This study investigates the best way to summarize the labelled training samples before feeding them to a pre-trained Prior-Data Fitted Network (PFN) for tabular data.
methods: Sketching and feature-selection methods are used to summarize the labelled training samples, and the results are compared with conventionally fitted tabular models.
results: Sketching and feature-selection methods can effectively condense the labelled training samples, and the study notes certain key differences between TabPFN and conventionally fitted tabular models.
Abstract
Tabular classification has traditionally relied on supervised algorithms, which estimate the parameters of a prediction model using its training data. Recently, Prior-Data Fitted Networks (PFNs) such as TabPFN have successfully learned to classify tabular data in-context: the model parameters are designed to classify new samples based on labelled training samples given after the model training. While such models show great promise, their applicability to real-world data remains limited due to the computational scale needed. Here we study the following question: given a pre-trained PFN for tabular data, what is the best way to summarize the labelled training samples before feeding them to the model? We conduct an initial investigation of sketching and feature-selection methods for TabPFN, and note certain key differences between it and conventionally fitted tabular models.
Implicit Maximum a Posteriori Filtering via Adaptive Optimization
methods: The paper frames Bayesian filtering as optimization over a time-varying objective, without maintaining large matrices or Monte Carlo estimates.
results: Experiments show that the approach yields effective, robust, and scalable filters in high-dimensional state spaces, comparing well against standard Bayesian filtering solutions; fine-tuning an optimizer is easier than specifying the correct filtering equations.
Abstract
Bayesian filtering approximates the true underlying behavior of a time-varying system by inverting an explicit generative model to convert noisy measurements into state estimates. This process typically requires either storage, inversion, and multiplication of large matrices or Monte Carlo estimation, neither of which are practical in high-dimensional state spaces such as the weight spaces of artificial neural networks. Here, we frame the standard Bayesian filtering problem as optimization over a time-varying objective. Instead of maintaining matrices for the filtering equations or simulating particles, we specify an optimizer that defines the Bayesian filter implicitly. In the linear-Gaussian setting, we show that every Kalman filter has an equivalent formulation using K steps of gradient descent. In the nonlinear setting, our experiments demonstrate that our framework results in filters that are effective, robust, and scalable to high-dimensional systems, comparing well against the standard toolbox of Bayesian filtering solutions. We suggest that it is easier to fine-tune an optimizer than it is to specify the correct filtering equations, making our framework an attractive option for high-dimensional filtering problems.
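A minimal sketch of the linear-Gaussian observation above: for a scalar system, the Kalman measurement update coincides with the minimizer of the per-step MAP objective, which a few gradient steps approach. The system parameters below are illustrative assumptions.

```python
import numpy as np

# Scalar linear-Gaussian step: prior N(m, P) after prediction, measurement y = x + noise with variance R
m, P = 0.0, 2.0         # predicted mean and variance
R, y = 0.5, 1.3         # measurement noise variance and observed value

# Exact Kalman measurement update
K_gain = P / (P + R)
m_kf = m + K_gain * (y - m)

# Implicit filter: minimize the MAP objective J(x) = (x-m)^2/(2P) + (y-x)^2/(2R)
def grad_J(x):
    return (x - m) / P + (x - y) / R

x, lr = m, 0.2           # start from the predicted mean
for _ in range(50):      # "K steps of gradient descent" stand in for the filter update
    x -= lr * grad_J(x)

print("Kalman update:", m_kf, " optimizer update:", x)   # the two estimates coincide
```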
Graph Neural Networks for Pressure Estimation in Water Distribution Systems
results: The method was evaluated on a large-scale water distribution network in The Netherlands and achieved high accuracy and robustness, outperforming approaches from previous studies on the same network and on other WDN benchmarks.
Abstract
Pressure and flow estimation in Water Distribution Networks (WDN) allows water management companies to optimize their control operations. For many years, mathematical simulation tools have been the most common approach to reconstructing an estimate of the WDN hydraulics. However, pure physics-based simulations involve several challenges, e.g. partially observable data, high uncertainty, and extensive manual configuration. Thus, data-driven approaches have gained traction to overcome such limitations. In this work, we combine physics-based modeling and Graph Neural Networks (GNN), a data-driven approach, to address the pressure estimation problem. First, we propose a new data generation method using a mathematical simulation but not considering temporal patterns and including some control parameters that remain untouched in previous works; this contributes to a more diverse training data. Second, our training strategy relies on random sensor placement making our GNN-based estimation model robust to unexpected sensor location changes. Third, a realistic evaluation protocol considers real temporal patterns and additionally injects the uncertainties intrinsic to real-world scenarios. Finally, a multi-graph pre-training strategy allows the model to be reused for pressure estimation in unseen target WDNs. Our GNN-based model estimates the pressure of a large-scale WDN in The Netherlands with a MAE of 1.94mH$_2$O and a MAPE of 7%, surpassing the performance of previous studies. Likewise, it outperformed previous approaches on other WDN benchmarks, showing a reduction of absolute error up to approximately 52% in the best cases.
paper_authors: Adam D. Cobb, Brian Matejek, Daniel Elenius, Anirban Roy, Susmit Jha
for: This paper proposes a new amortized estimator for likelihood-free simulation-based inference (SBI).
methods: The paper uses a direct neural ratio estimator (DNRE) that estimates the likelihood ratio with a single forward pass, unlike previous approaches.
results: Alongside the DNRE, the authors derive a corresponding Monte Carlo estimate of the posterior and benchmark the new ratio estimator, showing that it often outperforms previous approaches. They also introduce a new derivative estimator for likelihood ratio estimators, enabling a comparison of likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH) and showing that HMC is equally competitive. Finally, the neural ratio estimator is used to design a quadcopter as a real-world application. Code is available at https://github.com/SRI-CSL/dnre.
Abstract
We introduce a new amortized likelihood ratio estimator for likelihood-free simulation-based inference (SBI). Our estimator is simple to train and estimates the likelihood ratio using a single forward pass of the neural estimator. Our approach directly computes the likelihood ratio between two competing parameter sets which is different from the previous approach of comparing two neural network output values. We refer to our model as the direct neural ratio estimator (DNRE). As part of introducing the DNRE, we derive a corresponding Monte Carlo estimate of the posterior. We benchmark our new ratio estimator and compare to previous ratio estimators in the literature. We show that our new ratio estimator often outperforms these previous approaches. As a further contribution, we introduce a new derivative estimator for likelihood ratio estimators that enables us to compare likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH). We show that HMC is equally competitive, which has not been previously shown. Finally, we include a novel real-world application of SBI by using our neural ratio estimator to design a quadcopter. Code is available at https://github.com/SRI-CSL/dnre.
RONAALP: Reduced-Order Nonlinear Approximation with Active Learning Procedure
for: This paper is written for engineers and researchers who need to evaluate expensive, non-linear high-dimensional functions in their applications.
methods: The paper proposes the RONAALP algorithm, which is a reduced-order nonlinear approximation with active learning procedure to incrementally learn a fast and accurate reduced-order surrogate model of a target function on-the-fly. The algorithm combines nonlinear auto-encoders, community clustering, and radial basis function networks to learn an efficient and compact surrogate model with limited training data.
results: The paper demonstrates the effectiveness of the RONAALP algorithm on three direct numerical simulations of hypersonic flows in chemical nonequilibrium. The results show that the algorithm can reduce the cost of the simulation by up to 75% while maintaining an error of less than 10% on relevant quantities of interest.
Abstract
Many engineering applications rely on the evaluation of expensive, non-linear high-dimensional functions. In this paper, we propose the RONAALP algorithm (Reduced Order Nonlinear Approximation with Active Learning Procedure) to incrementally learn a fast and accurate reduced-order surrogate model of a target function on-the-fly as the application progresses. First, the combination of nonlinear auto-encoder, community clustering and radial basis function networks allows learning an efficient and compact surrogate model with limited training data. Secondly, the active learning procedure overcomes any extrapolation issues when evaluating the surrogate model outside of its initial training range during the online stage. This results in generalizable, fast and accurate reduced-order models of high-dimensional functions. The method is demonstrated on three direct numerical simulations of hypersonic flows in chemical nonequilibrium. Accurate simulations of these flows rely on detailed thermochemical gas models that dramatically increase the cost of such calculations. Using RONAALP to learn a reduced-order thermodynamic model surrogate on-the-fly, the cost of such simulation was reduced by up to 75% while maintaining an error of less than 10% on relevant quantities of interest.
Utilizing VQ-VAE for End-to-End Health Indicator Generation in Predicting Rolling Bearing RUL
results: On the PMH2012 dataset, methods using VQ-VAE for label construction achieved lower MAD and MV values, and the ASTCN prediction model trained with VQ-VAE labels attained the lowest MAD and MV values.
Abstract
The prediction of the remaining useful life (RUL) of rolling bearings is a pivotal issue in industrial production. A crucial approach to tackling this issue involves transforming vibration signals into health indicators (HI) to aid model training. This paper presents an end-to-end HI construction method, vector quantised variational autoencoder (VQ-VAE), which addresses the need for dimensionality reduction of latent variables in traditional unsupervised learning methods such as autoencoder. Moreover, concerning the inadequacy of traditional statistical metrics in reflecting curve fluctuations accurately, two novel statistical metrics, mean absolute distance (MAD) and mean variance (MV), are introduced. These metrics accurately depict the fluctuation patterns in the curves, thereby indicating the model's accuracy in discerning similar features. On the PMH2012 dataset, methods employing VQ-VAE for label construction achieved lower values for MAD and MV. Furthermore, the ASTCN prediction model trained with VQ-VAE labels demonstrated commendable performance, attaining the lowest values for MAD and MV.
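The following is a minimal PyTorch sketch of the vector-quantization step at the heart of a VQ-VAE: each latent vector is snapped to its nearest codebook entry, with a straight-through estimator so gradients can reach the encoder. The codebook size and dimensions are illustrative; the encoder/decoder and the health-indicator pipeline of the paper are not reproduced.

```python
import torch

torch.manual_seed(0)
num_codes, dim = 64, 16
codebook = torch.nn.Parameter(torch.randn(num_codes, dim))

def vector_quantize(z):
    """Snap each latent vector in z (batch, dim) to its nearest codebook entry."""
    dists = torch.cdist(z, codebook)              # (batch, num_codes) Euclidean distances
    idx = dists.argmin(dim=1)                     # discrete code assignment
    z_q = codebook[idx]
    # straight-through estimator: forward uses z_q, backward passes gradients to z
    z_st = z + (z_q - z).detach()
    # codebook / commitment losses as used when training a VQ-VAE
    codebook_loss = ((z.detach() - z_q) ** 2).mean()
    commit_loss = ((z - z_q.detach()) ** 2).mean()
    return z_st, idx, codebook_loss + 0.25 * commit_loss

z = torch.randn(8, dim, requires_grad=True)       # stands in for encoder output
z_q, codes, vq_loss = vector_quantize(z)
print(codes.tolist(), float(vq_loss))
```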
Causal Fairness-Guided Dataset Reweighting using Neural Networks
paper_authors: Xuan Zhao, Klaus Broelemann, Salvatore Ruggieri, Gjergji Kasneci
for: This paper aims to address the issue of fairness in machine learning models from a causal perspective, and proposes a reweighting scheme of datasets to mitigate bias and achieve causal fairness.
methods: The proposed method uses two neural networks to approximate the causal model of the data and the causal model of interventions, and applies reweighting guided by a discriminator to achieve various fairness notions.
results: The experiments on real-world datasets show that the proposed method can achieve causal fairness on the data while remaining close to the original data for downstream tasks.
Abstract
The importance of achieving fairness in machine learning models cannot be overstated. Recent research has pointed out that fairness should be examined from a causal perspective, and several fairness notions based on Pearl's causal framework have been proposed. In this paper, we construct a reweighting scheme of datasets to address causal fairness. Our approach aims at mitigating bias by considering the causal relationships among variables and incorporating them into the reweighting process. The proposed method adopts two neural networks, whose structures are intentionally used to reflect the structures of a causal graph and of an interventional graph. The two neural networks can approximate the causal model of the data, and the causal model of interventions. Furthermore, reweighting guided by a discriminator is applied to achieve various fairness notions. Experiments on real-world datasets show that our method can achieve causal fairness on the data while remaining close to the original data for downstream tasks.
Handling Overlapping Asymmetric Datasets – A Twice Penalized P-Spline Approach
results: Through data simulations, parameter tuning, and model adaptations for continuous and binary responses, the twice penalized approach offers an enhanced fit over a linear B-spline and a once penalized P-spline approximation. Applied to a real-life dataset on the risk of developing Non-Alcoholic Steatohepatitis, model fit performance improves by over 65%.
Abstract
Overlapping asymmetric datasets are common in data science and pose questions of how they can be incorporated together into a predictive analysis. In healthcare datasets there is often a small amount of information that is available for a larger number of patients such as an electronic health record, however a small number of patients may have had extensive further testing. Common solutions such as missing imputation can often be unwise if the smaller cohort is significantly different in scale to the larger sample, therefore the aim of this research is to develop a new method which can model the smaller cohort against a particular response, whilst considering the larger cohort also. Motivated by non-parametric models, and specifically flexible smoothing techniques via generalized additive models, we model a twice penalized P-Spline approximation method to firstly prevent over/under-fitting of the smaller cohort and secondly to consider the larger cohort. This second penalty is created through discrepancies in the marginal value of covariates that exist in both the smaller and larger cohorts. Through data simulations, parameter tunings and model adaptations to consider a continuous and binary response, we find our twice penalized approach offers an enhanced fit over a linear B-Spline and once penalized P-Spline approximation. Applying to a real-life dataset relating to a person's risk of developing Non-Alcoholic Steatohepatitis, we see an improved model fit performance of over 65%. Areas for future work within this space include adapting our method to not require dimensionality reduction and also consider parametric modelling methods. However, to our knowledge this is the first work to propose additional marginal penalties in a flexible regression of which we can report a vastly improved model fit that is able to consider asymmetric datasets, without the need for missing data imputation.
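For context, here is a minimal sketch of the building block being extended, a once penalized P-spline: a B-spline design matrix with a second-order difference penalty on the coefficients, fitted by ridge-type least squares. The twice penalized marginal term proposed in the paper is not reproduced; the basis size, penalty weight, and synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)     # illustrative noisy response

# Cubic B-spline design matrix on equally spaced knots
n_basis, degree = 20, 3
knots = np.concatenate([[0] * degree, np.linspace(0, 1, n_basis - degree + 1), [1] * degree])
Bmat = BSpline.design_matrix(x, knots, degree).toarray()   # (n_obs, n_basis)

# Second-order difference penalty D'D on adjacent coefficients (the P-spline penalty)
D = np.diff(np.eye(n_basis), n=2, axis=0)
lam = 5.0
beta = np.linalg.solve(Bmat.T @ Bmat + lam * D.T @ D, Bmat.T @ y)

fitted = Bmat @ beta
print("residual std:", (y - fitted).std())
```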
Robustness Enhancement in Neural Networks with Alpha-Stable Training Noise
paper_authors: Xueqiong Yuan, Jipeng Li, Ercan Engin Kuruoğlu
For: The paper aims to improve the robustness of deep learning systems by exploring the use of alpha-stable noise instead of Gaussian noise for data augmentation.
Methods: The paper compares the testing accuracy of models trained with Gaussian noise and alpha-stable noise on data corrupted by different types of noise, and finds that training with alpha-stable noise is more effective, especially for impulsive noise.
Results: The paper shows that training with alpha-stable noise improves the robustness of deep learning models on various datasets, including image and time series datasets, and other benchmark corrupted datasets.
Abstract
With the increasing use of deep learning on data collected by non-perfect sensors and in non-perfect environments, the robustness of deep learning systems has become an important issue. A common approach for obtaining robustness to noise has been to train deep learning systems with data augmented with Gaussian noise. In this work, we challenge the common choice of Gaussian noise and explore the possibility of stronger robustness for non-Gaussian impulsive noise, specifically alpha-stable noise. Justified by the Generalized Central Limit Theorem and evidenced by observations in various application areas, alpha-stable noise is widely present in nature. By comparing the testing accuracy of models trained with Gaussian noise and alpha-stable noise on data corrupted by different noise, we find that training with alpha-stable noise is more effective than Gaussian noise, especially when the dataset is corrupted by impulsive noise, thus improving the robustness of the model. The generality of this conclusion is validated through experiments conducted on various deep learning models with image and time series datasets, and other benchmark corrupted datasets. Consequently, we propose a novel data augmentation method that replaces Gaussian noise, which is typically added to the training data, with alpha-stable noise.
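A minimal sketch of the proposed augmentation follows: additive symmetric alpha-stable noise drawn with `scipy.stats.levy_stable` in place of Gaussian noise. The stability index `alpha`, the scale, and the clipping threshold are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def augment_alpha_stable(batch, alpha=1.5, scale=0.05, clip=3.0):
    """Add symmetric alpha-stable noise (beta=0) to a batch of samples.

    alpha=2 recovers Gaussian noise; alpha<2 produces heavier, impulsive tails,
    so extreme draws are clipped to keep training numerically stable.
    """
    noise = levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale,
                            size=batch.shape, random_state=rng)
    noise = np.clip(noise, -clip, clip)
    return batch + noise

images = rng.uniform(0, 1, size=(16, 28, 28))        # stands in for a training batch
noisy = augment_alpha_stable(images)
print(noisy.shape, float(np.abs(noisy - images).max()))
```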
Maintenance Techniques for Anomaly Detection AIOps Solutions
paper_authors: Lorena Poenaru-Olaru, Natalia Karpova, Luis Cruz, Jan Rellermeyer, Arie van Deursen
for: This study examines how anomaly detection techniques automate the monitoring of IT systems and operations, and how model performance can be preserved as operational data changes over time.
methods: Two model maintenance techniques are analyzed with respect to update frequency, namely blind model retraining and informed model retraining, along with retraining on all available data (full-history approach) versus only the newest data (sliding window approach).
results: The study finds that updating the model with the full-history approach preserves higher detection accuracy, while the sliding window approach adapts to changes over time; in addition, a data change monitoring tool can help determine when the anomaly detection model needs to be retrained.
Abstract
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques imply that machine learning algorithms are trained on operational data corresponding to a specific period of time and that they are continuously evaluated on newly emerging data. Operational data is constantly changing over time, which affects the performance of deployed anomaly detection models. Therefore, continuous model maintenance is required to preserve the performance of anomaly detectors over time. In this work, we analyze two different anomaly detection model maintenance techniques in terms of the model update frequency, namely blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all the available data (full-history approach) and on only the newest data (sliding window approach). Moreover, we investigate whether a data change monitoring tool is capable of determining when the anomaly detection model needs to be updated through retraining.
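Below is a minimal sketch contrasting the two blind-retraining schedules discussed, full-history retraining versus a sliding window, using `IsolationForest` as a stand-in anomaly detector. The window size, retraining period, and synthetic drifting stream are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic operational stream whose mean slowly drifts over time
stream = np.concatenate([rng.normal(loc=m, scale=1.0, size=(200, 3)) for m in (0.0, 0.5, 1.0, 1.5)])

def periodic_retraining(data, mode="sliding", window=200, period=100):
    """Blind model retraining every `period` samples, on full history or a sliding window."""
    scores = []
    model = IsolationForest(random_state=0).fit(data[:period])
    for t in range(period, len(data), period):
        scores.extend(model.decision_function(data[t:t + period]))
        train = data[:t + period] if mode == "full" else data[max(0, t + period - window):t + period]
        model = IsolationForest(random_state=0).fit(train)    # retrain for the next chunk
    return np.array(scores)

for mode in ("full", "sliding"):
    s = periodic_retraining(stream, mode=mode)
    print(mode, "mean decision score on later data (lower = more anomalous):", s[-200:].mean())
```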
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
for: This paper proposes a method to improve the efficiency of multi-task model training, which is often hindered by the variation in input sequence length.
methods: The proposed method, called DynaPipe, uses dynamic micro-batching to tackle sequence length variation and enable efficient training of large language models.
results: The authors evaluate DynaPipe on the FLANv2 dataset and show that it achieves up to 4.39x higher training throughput when training T5 and 3.25x when training GPT, compared with packing-based baselines.
Abstract
Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
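To illustrate the dynamic-programming idea behind variable-size micro-batch construction, here is a minimal sketch that partitions a length-sorted list of sequences into contiguous micro-batches so that the total padded token count is minimized under a per-micro-batch token budget. The budget and lengths are illustrative; DynaPipe's cost model and pipeline scheduling are not reproduced.

```python
from functools import lru_cache

def plan_microbatches(lengths, token_budget):
    """Split length-sorted sequences into contiguous micro-batches minimizing padded tokens."""
    lengths = sorted(lengths)
    n = len(lengths)

    @lru_cache(maxsize=None)
    def best(i):
        # minimal padded tokens for sequences i..n-1, plus the chosen batch boundaries
        if i == n:
            return 0, ()
        best_cost, best_plan = float("inf"), None
        for j in range(i + 1, n + 1):
            padded = lengths[j - 1] * (j - i)           # pad batch i..j-1 to its longest sequence
            if padded > token_budget:
                break
            rest_cost, rest_plan = best(j)
            if padded + rest_cost < best_cost:
                best_cost, best_plan = padded + rest_cost, ((i, j),) + rest_plan
        return best_cost, best_plan

    cost, plan = best(0)
    return cost, [lengths[i:j] for i, j in plan]

cost, batches = plan_microbatches([12, 48, 50, 51, 200, 210, 700], token_budget=1024)
print(cost, batches)
```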
Decentralized Energy Marketplace via NFTs and AI-based Agents
results: Extensive evaluations demonstrate the system's scalability and the effectiveness of the FDRL method in optimizing energy distribution. The research contributes to developing sophisticated decentralized smart grid infrastructures and broadens potential blockchain and AI applications in sustainable energy systems.
Abstract
The paper introduces an advanced Decentralized Energy Marketplace (DEM) integrating blockchain technology and artificial intelligence to manage energy exchanges among smart homes with energy storage systems. The proposed framework uses Non-Fungible Tokens (NFTs) to represent unique energy profiles in a transparent and secure trading environment. Leveraging Federated Deep Reinforcement Learning (FDRL), the system promotes collaborative and adaptive energy management strategies, maintaining user privacy. A notable innovation is the use of smart contracts, ensuring high efficiency and integrity in energy transactions. Extensive evaluations demonstrate the system's scalability and the effectiveness of the FDRL method in optimizing energy distribution. This research significantly contributes to developing sophisticated decentralized smart grid infrastructures. Our approach broadens potential blockchain and AI applications in sustainable energy systems and addresses incentive alignment and transparency challenges in traditional energy trading mechanisms. The implementation of this paper is publicly accessible at \url{https://github.com/RasoulNik/DEM}.
results: The approach is evaluated on both research and production use cases, covering training and inference, across several optimization problems, multiple compilers and their versions, and gym infrastructures.
Abstract
There is a growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance and framework independence. Practical deployment and transparency for the end-user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and its versions, and gym infrastructures.
摘要
人们对用机器学习(ML)模型增强编译器优化的兴趣日益增长,但编译器与ML框架之间的交互仍然是一个挑战。一些优化需要模型与编译器内部紧密耦合,这带来了模块化、性能和框架独立性方面的问题。面向最终用户的实际部署和透明性也是重要的考量。我们提出了ML-Compiler-Bridge,使得可以在传统的Python框架中开发ML模型,同时实现与优化编译器之间高效的端到端集成。我们在多个研究和生产用例上对其进行了评估,涵盖训练和推理、多个优化问题、多个编译器及其版本,以及gym基础设施。
Delete My Account: Impact of Data Deletion on Machine Learning Classifiers
results: 我们发现,删除的数据量、数据集自身的特点、删除偏差的选择以及对用户行为的假设,都会对机器学习模型的表现产生强烈影响。Abstract
Users are more aware than ever of the importance of their own data, thanks to reports about security breaches and leaks of private, often sensitive data in recent years. Additionally, the GDPR has been in effect in the European Union for over three years and many people have encountered its effects in one way or another. Consequently, more and more users are actively protecting their personal data. One way to do this is to make use of the right to erasure guaranteed in the GDPR, which has potential implications for a number of different fields, such as big data and machine learning. Our paper presents an in-depth analysis of the impact of the use of the right to erasure on the performance of machine learning models on classification tasks. We conduct various experiments utilising different datasets as well as different machine learning algorithms to analyse a variety of deletion behaviour scenarios. Due to the lack of credible data on actual user behaviour, we make reasonable assumptions for various deletion modes and biases and provide insight into the effects of different plausible scenarios for right to erasure usage on data quality of machine learning. Our results show that the impact depends strongly on the amount of data deleted, the particular characteristics of the dataset, the bias chosen for deletion and the assumptions on user behaviour.
摘要
得益于近年来关于安全漏洞和私密(往往是敏感)数据泄露事件的报道,用户比以往任何时候都更加意识到自身数据的重要性。此外,GDPR在欧盟已生效三年多,许多人都以某种方式感受到了它的影响。因此,越来越多的用户开始主动保护自己的个人数据。其中一种方式是行使GDPR所保障的被遗忘权(删除权),这对大数据和机器学习等多个领域都可能产生影响。本文深入分析了行使删除权对机器学习模型在分类任务上性能的影响。我们在不同的数据集和不同的机器学习算法上进行了多种实验,以分析各种删除行为场景。由于缺乏关于真实用户行为的可靠数据,我们对各种删除模式和偏差做出了合理假设,并就删除权使用的不同可能场景对机器学习数据质量的影响给出了分析。结果表明,其影响在很大程度上取决于删除的数据量、数据集自身的特点、所选择的删除偏差以及对用户行为的假设。
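As a rough illustration of the kind of deletion experiment described above (not the paper's datasets or protocol), the sketch below trains a classifier, deletes either a uniform or a class-biased fraction of the training data, retrains, and compares test accuracy; all dataset parameters are arbitrary.

```python
# A toy sketch of a data-deletion experiment: train a classifier, delete a
# (possibly biased) fraction of the training data, retrain, and compare accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy_after_deletion(frac, biased_class=None, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y_tr))
    # uniform deletion draws from all rows; biased deletion only from one class
    pool = idx if biased_class is None else idx[y_tr == biased_class]
    drop = rng.choice(pool, size=int(frac * len(pool)), replace=False)
    keep = np.setdiff1d(idx, drop)
    clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    return clf.score(X_te, y_te)

for frac in (0.0, 0.3, 0.6, 0.9):
    print(frac,
          round(accuracy_after_deletion(frac), 3),                 # uniform deletion
          round(accuracy_after_deletion(frac, biased_class=1), 3)) # class-biased deletion
```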
Adaptive Modelling Approach for Row-Type Dependent Predictive Analysis (RTDPA): A Framework for Designing Machine Learning Models for Credit Risk Analysis in Banking Sector
For: 这个论文的目的是提出一种与行类型相关的自适应预测分析方法(RTDPA),以便更好地处理同一数据集中不同行类型的数据。* Methods: 该方法针对不同的行类型分别进行数据预处理和特征工程,并选用传统机器学习预测模型和先进的集成技术。* Results: 研究发现,所有预测方法的精确率均不低于90%;RTDPA对每种行类型分别应用算法,使模型能够捕捉各行类型特有的模式和特征,从而为银行业提供更准确、更有针对性的分类结果。Abstract
In many real-world datasets, rows may have distinct characteristics and require different modeling approaches for accurate predictions. In this paper, we propose an adaptive modeling approach for row-type dependent predictive analysis (RTDPA). Our framework enables the development of models that can effectively handle diverse row types within a single dataset. Our dataset from XXX bank contains two different risk categories, personal loan and agriculture loan, each of which is categorised into four classes: standard, sub-standard, doubtful and loss. We performed tailored data pre-processing and feature engineering for the different row types, and selected traditional machine learning predictive models as well as advanced ensemble techniques. Our findings indicate that all predictive approaches consistently achieve a precision rate of no less than 90%. For RTDPA, the algorithms are applied separately for each row type, allowing the models to capture the specific patterns and characteristics of each row type. This approach enables targeted predictions based on the row type, providing a more accurate and tailored classification for the given dataset. Additionally, the suggested model consistently offers decision makers valuable and enduring insights that are strategic in nature in the banking sector.
摘要
在许多真实世界的数据集中,不同的行可能具有不同的特点,需要不同的建模方法才能获得准确的预测。在这篇论文中,我们提出了一种与行类型相关的自适应预测分析方法(RTDPA),使模型能够在单个数据集中有效处理多种行类型。我们的数据集来自XXX银行,包含两个风险类别:个人贷款和农业贷款,每个类别又分为四个等级:标准、次级、可疑和损失。我们针对不同的行类型进行了定制的数据预处理和特征工程,并选用了传统的机器学习预测模型和先进的集成技术。我们的结果表明,所有预测方法的精确率均不低于90%。在RTDPA中,算法针对每种行类型分别应用,使模型能够捕捉各行类型特有的模式和特征。这种按行类型进行针对性预测的方式,为给定数据集提供了更准确、更有针对性的分类。此外,所提出的模型还能持续为银行业的决策者提供具有战略价值的长期洞见。
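A minimal sketch of the row-type-dependent idea follows: fit one model per row type and route predictions by that type. The column names (`loan_type`, `class`) and the gradient-boosting choice are illustrative assumptions, not the paper's actual pipeline.

```python
# A minimal sketch of row-type-dependent modeling (hypothetical column names):
# fit a separate classifier for each row type and route predictions by row type.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def fit_rtdpa(df: pd.DataFrame, features, label_col="class", type_col="loan_type"):
    models = {}
    for row_type, part in df.groupby(type_col):
        # each row type (e.g., personal vs. agriculture loans) gets its own model
        models[row_type] = GradientBoostingClassifier().fit(part[features], part[label_col])
    return models

def predict_rtdpa(models, df, features, type_col="loan_type"):
    out = pd.Series(index=df.index, dtype=object)
    for row_type, part in df.groupby(type_col):
        out.loc[part.index] = models[row_type].predict(part[features])
    return out
```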
Few-shot Message-Enhanced Contrastive Learning for Graph Anomaly Detection
paper_authors: Fan Xu, Nan Wang, Xuezhi Wen, Meiqi Gao, Chaoqun Guo, Xibin Zhao
for: The paper is focused on developing a novel few-shot graph anomaly detection model called FMGAD, which can effectively identify anomalies in graph data with limited labeled information.
methods: The proposed FMGAD model uses a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Additionally, the model employs a Deep-GNN message-enhanced reconstruction module to extensively exploit few-shot label information and disseminate supervision signals to deeper unlabeled nodes.
results: The paper demonstrates that FMGAD achieves better performance than other state-of-the-art methods on six real-world datasets, regardless of artificially injected anomalies or domain-organic anomalies.Abstract
Graph anomaly detection plays a crucial role in identifying exceptional instances in graph data that deviate significantly from the majority. It has gained substantial attention in various domains of information security, including network intrusion, financial fraud, and malicious comments, et al. Existing methods are primarily developed in an unsupervised manner due to the challenge in obtaining labeled data. For lack of guidance from prior knowledge in unsupervised manner, the identified anomalies may prove to be data noise or individual data instances. In real-world scenarios, a limited batch of labeled anomalies can be captured, making it crucial to investigate the few-shot problem in graph anomaly detection. Taking advantage of this potential, we propose a novel few-shot Graph Anomaly Detection model called FMGAD (Few-shot Message-Enhanced Contrastive-based Graph Anomaly Detector). FMGAD leverages a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Furthermore, we propose the Deep-GNN message-enhanced reconstruction module, which extensively exploits the few-shot label information and enables long-range propagation to disseminate supervision signals to deeper unlabeled nodes. This module in turn assists in the training of self-supervised contrastive learning. Comprehensive experimental results on six real-world datasets demonstrate that FMGAD can achieve better performance than other state-of-the-art methods, regardless of artificially injected anomalies or domain-organic anomalies.
摘要
GRAPH anomaly detection plays a crucial role in identifying exceptional instances in graph data that deviate significantly from the majority. It has gained substantial attention in various domains of information security, including network intrusion, financial fraud, and malicious comments, etc. Existing methods are primarily developed in an unsupervised manner due to the challenge in obtaining labeled data. For lack of guidance from prior knowledge in unsupervised manner, the identified anomalies may prove to be data noise or individual data instances. In real-world scenarios, a limited batch of labeled anomalies can be captured, making it crucial to investigate the few-shot problem in graph anomaly detection. Taking advantage of this potential, we propose a novel few-shot Graph Anomaly Detection model called FMGAD (Few-shot Message-Enhanced Contrastive-based Graph Anomaly Detector). FMGAD leverages a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Furthermore, we propose the Deep-GNN message-enhanced reconstruction module, which extensively exploits the few-shot label information and enables long-range propagation to disseminate supervision signals to deeper unlabeled nodes. This module in turn assists in the training of self-supervised contrastive learning. Comprehensive experimental results on six real-world datasets demonstrate that FMGAD can achieve better performance than other state-of-the-art methods, regardless of artificially injected anomalies or domain-organic anomalies.
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
results: 在一组ML模型上,与GPU共享模式下的作业完成时间(JCT)相比,基于FIKIT的推理系统可将高优先级任务加速1.33到14.87倍,其中超过一半的情况加速超过3.5倍。同时,在抢占式共享下,低优先级任务的JCT与默认GPU共享模式相当,为其0.84到1倍。此外,我们将内核测量和细粒度内核调度的开销限制在10%以下。Abstract
Highly parallelized workloads like machine learning training, inference and general HPC tasks are greatly accelerated using GPU devices. In a cloud computing cluster, serving a GPU's computation power through multi-task sharing is in high demand, since there are always more task requests than the number of GPUs available. Existing GPU sharing solutions focus on reducing task-level waiting time or task-level switching costs when multiple jobs compete for a single GPU. Non-stopped computation requests come with different priorities, which have a non-symmetric impact on QoS when sharing a GPU device. Existing work missed the kernel-level optimization opportunity brought by this setting. To address this problem, we present a novel kernel-level scheduling strategy called FIKIT: Filling Inter-kernel Idle Time. FIKIT incorporates task-level priority information, fine-grained kernel identification, and kernel measurement, allowing low-priority tasks to execute during a high-priority task's inter-kernel idle time. This fills the GPU's device runtime more fully and reduces the overall impact of GPU sharing on cloud services. Across a set of ML models, the FIKIT-based inference system accelerated high-priority tasks by 1.33 to 14.87 times compared to the JCT in GPU sharing mode, and more than half of the cases are accelerated by more than 3.5 times. Alternatively, under preemptive sharing, the low-priority tasks have a JCT comparable to the default GPU sharing mode, with a 0.84 to 1 times ratio. We further limit the kernel measurement and runtime fine-grained kernel scheduling overhead to less than 10%.
摘要
机器学习训练与推理以及通用HPC任务等高度并行化的工作负载,可借助GPU设备获得大幅加速。在云计算集群中,由于任务请求总是多于可用GPU的数量,通过多任务共享来提供GPU算力的需求非常高。现有的GPU共享方案主要着眼于在多个作业竞争同一GPU时降低任务级的等待时间或切换开销。然而,不间断的计算请求带有不同的优先级,对共享GPU的服务质量(QoS)有着不对称的影响,现有工作忽视了这一设定所带来的内核级优化机会。为解决这个问题,我们提出了一种新的内核级调度策略FIKIT(Filling Inter-kernel Idle Time,填充内核间空闲时间)。FIKIT结合任务级优先级信息、细粒度的内核识别与内核测量,让低优先级任务在高优先级任务的内核间空闲时间内执行,从而更充分地利用GPU的设备运行时间,并减少GPU共享对云服务的整体影响。在一组ML模型上,与GPU共享模式下的JCT相比,基于FIKIT的推理系统可将高优先级任务加速1.33到14.87倍,其中超过一半的情况加速超过3.5倍。另一方面,在抢占式共享下,低优先级任务的JCT与默认GPU共享模式相当,为其0.84到1倍。我们还将内核测量和运行时细粒度内核调度的开销限制在10%以下。
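The core scheduling idea, launching profiled low-priority kernels inside a high-priority task's inter-kernel gaps, can be illustrated with a toy simulation; this is not FIKIT's implementation and ignores launch overheads, streams, and measurement error.

```python
# A toy simulation of the "fill inter-kernel idle time" idea (not FIKIT's code):
# given the gaps between a high-priority task's kernels and a queue of profiled
# low-priority kernel durations, greedily launch low-priority kernels that fit.
from collections import deque

def fill_idle_time(hp_kernels, lp_kernel_durations):
    """hp_kernels: list of (start, end) times; lp_kernel_durations: profiled durations."""
    queue = deque(sorted(lp_kernel_durations))
    schedule = []
    for (_, gap_start), (gap_end, _) in zip(hp_kernels, hp_kernels[1:]):
        t = gap_start
        # launch queued low-priority kernels as long as they finish before the
        # next high-priority kernel starts
        while queue and t + queue[0] <= gap_end:
            d = queue.popleft()
            schedule.append((t, t + d))
            t += d
    return schedule, list(queue)

hp = [(0, 5), (9, 14), (20, 25)]   # high-priority kernels with idle gaps 5-9 and 14-20
lp = [3, 1, 4, 2, 6]               # low-priority kernel durations (profiled)
print(fill_idle_time(hp, lp))
```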
How False Data Affects Machine Learning Models in Electrochemistry?
for: This study aims to evaluate the performance of machine learning models in noisy electrochemical data and to determine whether stacking models can provide robustness to weak-to-noise models.
methods: The study uses 12 standalone models and a stacking model to test the performance of different machine learning models on electrochemical data. The models include XGB, LGBM, RF, GB, ADA, NN, ELAS, LASS, RIDGE, SVM, KNN, DT, and the stacking model.
results: The study finds that linear models handle noise well but suffer from low prediction accuracy, while tree-based models have poor noise handling but high prediction accuracy. The stacking model exhibits both high accuracy and good noise handling, making it a viable choice for beginner and experienced machine learning researchers in electrochemistry. Additionally, the study shows that neural networks are not suitable for electrochemical data and can be susceptible to noise.Abstract
Recently, machine learning models have often been selected based only on the data distribution, without considering the noise in the data. This study aims to distinguish which models perform well under noisy data, and to establish whether stacking machine learning models actually provides robustness to otherwise weak-to-noise models. The electrochemical data were tested with 12 standalone models and a stacking model: XGB, LGBM, RF, GB, ADA, NN, ELAS, LASS, RIDGE, SVM, KNN, DT, and the stacking model. It is found that linear models handle noise well, with an average error slope of 1.75 F g-1 per 100% of added noise, but they suffer in prediction accuracy, with an average error of 60.19 F g-1 even at 0% added noise. Tree-based models fail in terms of noise handling (average slope of 55.24 F g-1 per 100% of added noise), but they can provide higher prediction accuracy (lowest error of 23.9 F g-1) than linear models. To address this trade-off between prediction accuracy and noise handling, the stacking model was constructed; it not only shows high accuracy (intercept of 25.03 F g-1) but also exhibits good noise handling (slope of 43.58 F g-1), making stacking models a relatively low-risk and viable choice for beginner and experienced machine learning researchers in electrochemistry. Although neural networks (NN) are gaining popularity in the electrochemistry field, this study shows that NN is not well suited to these electrochemical data, with improper tuning resulting in a model that is susceptible to noise. Thus, stacking models should provide better benefits in that, even with untuned base models, they can achieve an accurate and noise-tolerant model. Overall, this work provides insight into machine learning model selection for electrochemical data, which should aid the understanding of data science in a chemistry context.
摘要
近来,机器学习模型的选择往往只基于数据分布,而不考虑数据中的噪声。本研究旨在辨别哪些模型在含噪数据上表现良好,并确定堆叠(stacking)模型是否真能为抗噪能力较弱的模型带来稳健性。我们用12个独立模型和一个堆叠模型对电化学数据进行了测试,包括XGB、LGBM、RF、GB、ADA、NN、ELAS、LASS、RIDGE、SVM、KNN、DT以及堆叠模型。结果显示,线性模型抗噪能力较好,误差随噪声增加的斜率平均仅为1.75 F g-1(每增加100%噪声),但其预测精度不足,在0%噪声下平均误差仍高达60.19 F g-1。基于树的模型抗噪能力较差(平均斜率为55.24 F g-1),但能提供比线性模型更高的预测精度(最低误差为23.9 F g-1)。为了解决预测精度与抗噪能力之间的矛盾,我们构建了堆叠模型,它不仅精度高(截距为25.03 F g-1),而且抗噪表现良好(斜率为43.58 F g-1),因此对于电化学领域的机器学习初学者和资深研究者来说,堆叠模型都是风险较低且可行的选择。尽管神经网络(NN)在电化学领域日益流行,但本研究表明NN并不适合这类电化学数据,调参不当会使模型容易受噪声影响。因此,即便基础模型未经调参,堆叠模型也能获得准确且耐噪的模型。总体而言,这项工作为电化学数据的机器学习模型选择提供了参考,有助于在化学背景下理解数据科学。
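For readers who want to reproduce the spirit of this comparison, the sketch below builds a stacking regressor from tree-based and linear base learners and evaluates it on targets with increasing synthetic noise. The dataset, base models, and noise model are placeholders, not the paper's experimental setup.

```python
# A minimal sketch of a stacking setup with tree-based and linear base learners
# combined by a Ridge meta-learner, evaluated on increasingly noisy targets.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)

def mean_abs_error(model, noise_pct):
    rng = np.random.default_rng(0)
    y_noisy = y + rng.normal(0, noise_pct / 100 * y.std(), size=y.shape)
    return -cross_val_score(model, X, y_noisy, cv=5,
                            scoring="neg_mean_absolute_error").mean()

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge())

for noise_pct in (0, 50, 100):
    print(noise_pct, round(mean_abs_error(stack, noise_pct), 2))
```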
Towards Machine Learning-based Quantitative Hyperspectral Image Guidance for Brain Tumor Resection
paper_authors: David Black, Declan Byrne, Anna Walke, Sidong Liu, Antonio Di leva, Sadahiro Kaneko, Walter Stummer, Septimiu Salcudean, Eric Suero Molina for: 这篇论文的主要目标是评估基于高光谱成像的荧光光谱特征,用于在术中帮助神经外科医生区分不同类型的肿瘤和组织。methods: 该论文利用高光谱成像分析五种荧光团的发射光谱特征,并使用机器学习算法对不同的肿瘤类型和组织进行分类。results: 结果表明,利用这五种荧光团的光谱特征可以较准确地区分不同的肿瘤类型、级别和组织,且这些荧光团的丰度在不同的肿瘤和组织类别之间存在显著差异。Abstract
Complete resection of malignant gliomas is hampered by the difficulty in distinguishing tumor cells at the infiltration zone. Fluorescence guidance with 5-ALA assists in reaching this goal. Using hyperspectral imaging, previous work characterized five fluorophores' emission spectra in most human brain tumors. In this paper, the effectiveness of these five spectra was explored for different tumor and tissue classification tasks in 184 patients (891 hyperspectral measurements) harboring low- (n=30) and high-grade gliomas (n=115), non-glial primary brain tumors (n=19), radiation necrosis (n=2), miscellaneous (n=10) and metastases (n=8). Four machine learning models were trained to classify tumor type, grade, glioma margins and IDH mutation. Using random forests and multi-layer perceptrons, the classifiers achieved average test accuracies of 74-82%, 79%, 81%, and 93% respectively. All five fluorophore abundances varied between tumor margin types and tumor grades (p < 0.01). For tissue type, at least four of the five fluorophore abundances were found to be significantly different (p < 0.01) between all classes. These results demonstrate the fluorophores' differing abundances in different tissue classes, as well as the value of the five fluorophores as potential optical biomarkers, opening new opportunities for intraoperative classification systems in fluorescence-guided neurosurgery.
摘要
恶性胶质瘤的完全切除受限于在浸润区难以分辨肿瘤细胞。5-ALA荧光引导有助于实现这一目标。此前的工作利用高光谱成像刻画了大多数人脑肿瘤中五种荧光团的发射光谱。本文在184名患者(891次高光谱测量)中研究了这五种光谱在不同肿瘤和组织分类任务中的有效性,患者包括低级别胶质瘤(n=30)、高级别胶质瘤(n=115)、非胶质原发性脑肿瘤(n=19)、放射性坏死(n=2)、其他(n=10)和转移瘤(n=8)。我们训练了四个机器学习模型来分类肿瘤类型、级别、胶质瘤边缘和IDH突变。使用随机森林和多层感知机,分类器的平均测试准确率分别达到74-82%、79%、81%和93%。五种荧光团的丰度在不同的肿瘤边缘类型和肿瘤级别之间均存在差异(p < 0.01)。对于组织类型,五种荧光团中至少有四种的丰度在所有类别之间均存在显著差异(p < 0.01)。这些结果表明不同组织类别中荧光团丰度的差异,以及这五种荧光团作为潜在光学生物标志物的价值,为荧光引导神经外科手术中的术中分类系统开辟了新的机会。
Graph Sparsifications using Neural Network Assisted Monte Carlo Tree Search
paper_authors: Alvin Chiu, Mithun Ghosh, Reyan Ahmed, Kwang-Sung Jun, Stephen Kobourov, Michael T. Goodrich
for: 这篇论文的目的是计算图的稀疏化(graph sparsifier)。
methods: 该论文结合图神经网络和蒙特卡洛树搜索来计算图的稀疏化。首先训练一个图神经网络,它接受部分解作为输入并输出建议加入的新节点;然后在蒙特卡洛树搜索中使用该网络来计算稀疏化图。
results: 该方法在不同类型的图上持续优于多种标准近似算法,并经常找到最优解。Abstract
Graph neural networks have been successful for machine learning, as well as for combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing graph sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes as input a partial solution and proposes a new node to be added as output. This neural network is then used in a Monte Carlo search to compute a sparsifier. The proposed method consistently outperforms several standard approximation algorithms on different types of graphs and often finds the optimal solution.
摘要
图神经网络已在机器学习以及子图同构问题、旅行商问题等组合与图问题上取得成功。我们提出了一种结合图神经网络与蒙特卡洛树搜索的图稀疏化计算方法:首先训练一个图神经网络,它以部分解为输入并输出建议加入的新节点;随后在蒙特卡洛树搜索中利用该网络计算稀疏化图。所提方法在不同类型的图上持续优于多种标准近似算法,并经常找到最优解。
Interpretable Modeling of Single-cell perturbation Responses to Novel Drugs Using Cycle Consistence Learning
methods: 该框架基于编码器-解码器架构,将初始细胞状态映射到潜在空间,并假设药物扰动对细胞状态的影响在该空间中满足线性可加性。此外,我们引入循环一致性约束:初始细胞状态施加药物扰动后应产生相应的扰动响应,而从扰动后的细胞状态中去除药物扰动则应恢复初始细胞状态。
results: 我们在三类数据集上进行了验证,包括批量转录组响应、批量蛋白质组响应和单细胞转录组药物扰动响应。结果显示,我们的模型优于现有的最先进方法。Abstract
Phenotype-based screening has attracted much attention for identifying cell-active compounds. Transcriptional and proteomic profiles of cell population or single cells are informative phenotypic measures of cellular responses to perturbations. In this paper, we proposed a deep learning framework based on encoder-decoder architecture that maps the initial cellular states to a latent space, in which we assume the effects of drug perturbation on cellular states follow linear additivity. Next, we introduced the cycle consistency constraints to enforce that initial cellular state subjected to drug perturbations would produce the perturbed cellular responses, and, conversely, removal of drug perturbation from the perturbed cellular states would restore the initial cellular states. The cycle consistency constraints and linear modeling in latent space enable to learn interpretable and transferable drug perturbation representations, so that our model can predict cellular response to unseen drugs. We validated our model on three different types of datasets, including bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations. The experimental results show that our model achieves better performance than existing state-of-the-art methods.
摘要
基于表型的筛选在寻找具有细胞活性的化合物方面受到广泛关注。细胞群体或单细胞的转录组与蛋白质组谱是反映细胞对扰动响应的重要表型度量。本文提出了一种基于编码器-解码器架构的深度学习框架,将初始细胞状态映射到潜在空间,并假设药物扰动对细胞状态的影响在该空间中满足线性可加性。随后,我们引入循环一致性约束:初始细胞状态在施加药物扰动后应产生相应的扰动响应,而从扰动后的细胞状态中去除药物扰动则应恢复初始细胞状态。循环一致性约束与潜在空间中的线性建模使得模型能够学习可解释、可迁移的药物扰动表示,从而预测细胞对未见过药物的响应。我们在三类数据集上验证了该模型,包括批量转录组响应、批量蛋白质组响应和单细胞转录组药物扰动响应。实验结果显示,我们的模型性能优于现有的最先进方法。
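A minimal sketch of the latent linear-additivity plus cycle-consistency idea is given below, with toy dimensions, a single drug-embedding table, and squared-error losses; it illustrates the training signal only and is not the paper's architecture or hyper-parameters.

```python
# A minimal sketch (toy dimensions, squared-error losses) of latent linear
# additivity with a cycle-consistency term; not the paper's training code.
import torch
import torch.nn as nn

n_genes, latent, n_drugs = 2000, 64, 10

enc = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent))
dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, n_genes))
drug_emb = nn.Embedding(n_drugs, latent)   # perturbation acts additively in latent space

opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters())
                       + list(drug_emb.parameters()), lr=1e-3)
mse = nn.MSELoss()

def training_step(x0, x_pert, drug_idx):
    """x0: control expression, x_pert: observed perturbed expression."""
    d = drug_emb(drug_idx)
    pred_pert = dec(enc(x0) + d)        # forward: control + drug -> perturbed
    recon_ctrl = dec(enc(x_pert) - d)   # cycle: perturbed - drug -> control
    loss = mse(pred_pert, x_pert) + mse(recon_ctrl, x0)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# one toy step on random tensors
x0, x_pert = torch.randn(32, n_genes), torch.randn(32, n_genes)
print(training_step(x0, x_pert, torch.randint(0, n_drugs, (32,))))
```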
Imagination-augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments
results: 在五个复杂的城市驾驶任务中,我们的分层智能体成功地表现出具有安全意识的交互行为,其成功率高于基线方法,平均回合步数也低于基线方法。Abstract
Hierarchical reinforcement learning (HRL) has led to remarkable achievements in diverse fields. However, existing HRL algorithms still cannot be applied to real-world navigation tasks. These tasks require an agent to perform safety-aware behaviors and interact with surrounding objects in dynamic environments. In addition, an agent in these tasks should perform consistent and structured exploration as they are long-horizon and have complex structures with diverse objects and task-specific rules. Designing HRL agents that can handle these challenges in real-world navigation tasks is an open problem. In this paper, we propose imagination-augmented HRL (IAHRL), a new and general navigation algorithm that allows an agent to learn safe and interactive behaviors in real-world navigation tasks. Our key idea is to train a hierarchical agent in which a high-level policy infers interactions by interpreting behaviors imagined with low-level policies. Specifically, the high-level policy is designed with a permutation-invariant attention mechanism to determine which low-level policy generates the most interactive behavior, and the low-level policies are implemented with an optimization-based behavior planner to generate safe and structured behaviors following task-specific rules. To evaluate our algorithm, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that our hierarchical agent performs safety-aware behaviors and properly interacts with surrounding vehicles, achieving higher success rates and lower average episode steps than baselines in urban driving tasks.
摘要
Leveraging Function Space Aggregation for Federated Learning at Scale
results: 对实际大规模跨设备测试,该算法在客户端模型偏离度增加时表现更加稳定,并在本地训练轮次增加时表现出显著提高。此外,该算法可以更好地实现本地化个性化,例如在 Stack Overflow 上进行几步个性化后, FedFish 比 FedAvg 提高了下一个token预测的正确率7%。Abstract
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
摘要
联邦学习范式推动了在不共享客户端数据的前提下,将多个客户端更新聚合为全局服务器模型的方法的发展。许多联邦学习算法,包括经典的联邦平均(FedAvg),受分布式优化结果的启发,直接对客户端的参数更新取(可能加权的)平均。在这项工作中,我们从函数空间的视角出发,提出了一种新算法FedFish,它利用基于费雪信息的估计来聚合客户端所学函数的局部近似。我们在真实的大规模跨设备基准上评估了FedFish。当客户端模型相互偏离得越来越远时,FedAvg的性能可能下降,而我们证明FedFish对更长的本地训练更加稳健。在图像和语言基准的多种设置下,随着本地训练轮数增加,FedFish的表现优于FedAvg。此外,FedFish得到的全局网络更容易通过在相同或偏移的数据分布上进行本地微调来实现高效的个性化。例如,先在C4数据集上进行联邦预训练,再在Stack Overflow上进行少样本个性化,FedFish相比FedAvg将下一个token预测的准确率提高了7%。
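To contrast the two aggregation views, here is a small numpy sketch: FedAvg averages parameters directly, while a Fisher-weighted aggregate weights each parameter by a per-client (here diagonal) Fisher estimate. The diagonal formula is a common approximation used purely for illustration and is not claimed to be FedFish's exact update.

```python
# A minimal numpy sketch contrasting parameter-space averaging (FedAvg) with a
# diagonal-Fisher-weighted aggregate (an illustrative approximation only).
import numpy as np

def fedavg(client_params, client_weights):
    w = np.asarray(client_weights, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))

def fisher_weighted(client_params, client_fishers, eps=1e-8):
    # per-parameter precision weighting: theta = (sum_i F_i * theta_i) / (sum_i F_i)
    num = sum(F * p for F, p in zip(client_fishers, client_params))
    den = sum(client_fishers) + eps
    return num / den

params = [np.array([1.0, 2.0]), np.array([3.0, 0.0])]
fishers = [np.array([10.0, 0.1]), np.array([0.1, 10.0])]  # per-client diagonal Fisher estimates
print(fedavg(params, [1, 1]))            # -> [2. 1.]
print(fisher_weighted(params, fishers))  # leans toward the client with higher Fisher per coordinate
```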
Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures
results: 在语言分类和标题分类两个应用上,实验结果显示,与基于线性反馈移位寄存器和MATLAB随机函数的两种传统超维向量生成方法相比,使用Sobol序列生成超维向量可将准确率提高最多10.79%(取决于向量维度),同时所提出的编码硬件能耗更低,面积-时延乘积也更优。Abstract
Hyperdimensional computing (HDC) is an emerging computing paradigm with significant promise for efficient and robust learning. In HDC, objects are encoded with high-dimensional vector symbolic sequences called hypervectors. The quality of hypervectors, defined by their distribution and independence, directly impacts the performance of HDC systems. Despite a large body of work on the processing parts of HDC systems, little to no attention has been paid to data encoding and the quality of hypervectors. Most prior studies have generated hypervectors using inherent random functions, such as MATLAB`s or Python`s random function. This work introduces an optimization technique for generating hypervectors by employing quasi-random sequences. These sequences have recently demonstrated their effectiveness in achieving accurate and low-discrepancy data encoding in stochastic computing systems. The study outlines the optimization steps for utilizing Sobol sequences to produce high-quality hypervectors in HDC systems. An optimization algorithm is proposed to select the most suitable Sobol sequences for generating minimally correlated hypervectors, particularly in applications related to symbol-oriented architectures. The performance of the proposed technique is evaluated in comparison to two traditional approaches of generating hypervectors based on linear-feedback shift registers and MATLAB random function. The evaluation is conducted for two applications: (i) language and (ii) headline classification. Our experimental results demonstrate accuracy improvements of up to 10.79%, depending on the vector size. Additionally, the proposed encoding hardware exhibits reduced energy consumption and a superior area-delay product.
摘要
超维计算(HDC)是一种新兴的计算范式,有望实现高效且稳健的学习。在HDC中,对象被编码为称作超向量(hypervector)的高维向量符号序列。超向量的质量由其分布和独立性决定,直接影响HDC系统的性能。尽管已有大量关于HDC系统处理部分的研究,但数据编码和超向量质量却鲜有人关注。以往研究大多使用内置的随机函数(如MATLAB或Python的随机函数)生成超向量。本文提出一种利用准随机序列生成超向量的优化技术;这类序列近来已在随机计算系统中展示了实现准确、低差异数据编码的有效性。研究给出了利用Sobol序列在HDC系统中生成高质量超向量的优化步骤,并提出一种优化算法,用于挑选最合适的Sobol序列以生成相关性最小的超向量,尤其适用于面向符号的体系结构。我们将所提技术与基于线性反馈移位寄存器和MATLAB随机函数的两种传统超向量生成方法进行了比较,并在语言分类和标题分类两个应用上进行了评估。实验结果表明,取决于向量维度,准确率最多可提升10.79%;此外,所提出的编码硬件能耗更低,面积-时延乘积更优。
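A minimal sketch of generating bipolar hypervectors from a Sobol sequence with SciPy is shown below; the thresholding scheme and dimensions are illustrative, and the paper's additional step of selecting among Sobol sequences to minimize correlation is not reproduced.

```python
# Generate bipolar hypervectors from a scrambled Sobol sequence (illustrative only).
import numpy as np
from scipy.stats import qmc

def sobol_hypervectors(n_symbols, dim, seed=0):
    sampler = qmc.Sobol(d=n_symbols, scramble=True, seed=seed)
    # each column of the quasi-random block becomes one hypervector
    u = sampler.random(dim)                 # shape (dim, n_symbols), values in [0, 1)
    return np.where(u > 0.5, 1, -1).T       # bipolar {-1, +1} hypervectors, shape (n_symbols, dim)

hvs = sobol_hypervectors(n_symbols=26, dim=8192)
# quasi-random generation keeps pairwise similarity close to zero
cos = hvs @ hvs.T / hvs.shape[1]
print(np.abs(cos - np.eye(26)).max())
```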
Multiscale Hodge Scattering Networks for Data Analysis
results: 该方法能够提取对单纯形重排保持不变且稳健的特征,并可用于信号分类、域(图/单纯复形)分类和分子动力学预测。Abstract
We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.
摘要
我们提出了一种用于单纯复形上信号的新型散射网络,称为多尺度霍奇散射网络(MHSN)。其构造基于单纯复形上的多尺度基词典,即我们近期提出的 $\kappa$-GHWT 和 $\kappa$-HGLET,它们将基于节点的广义Haar-Walsh变换(GHWT)和层次图拉普拉斯特征变换(HGLET)推广到给定单纯复形中维数为 $\kappa$ 的单纯形上。这两种词典都构成冗余的多尺度基向量集合,并给出信号相应的展开系数。MHSN采用类似卷积神经网络(CNN)的分层结构,对词典系数取模后逐层级联其矩。所得特征对单纯形的重新排序(即底层图的节点置换)保持不变。重要的是,多尺度基词典的使用带来了一种类似CNN局部池化的自然池化操作,可以在局部或按尺度执行;而这类池化操作在基于Morlet小波的传统散射网络和基于扩散小波的几何散射网络中都难以定义。因此,我们能够提取丰富而稳健的描述性特征,配合非常简单的机器学习方法(如逻辑回归或支持向量机)即可构建高精度的分类系统,其需要训练的参数远少于大多数现代图神经网络。最后,我们在三类问题上展示了MHSN的有效性:信号分类、域(图/单纯复形)分类和分子动力学预测。
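The cascade of moments of the modulus of dictionary coefficients can be sketched generically as follows; the random matrices stand in for the $\kappa$-GHWT/$\kappa$-HGLET dictionaries, so this only conveys the layered, permutation-invariant feature construction, not the actual MHSN bases.

```python
# A schematic numpy sketch of cascading moments of the modulus of dictionary
# coefficients; the dictionaries here are random stand-ins, not GHWT/HGLET bases.
import numpy as np

rng = np.random.default_rng(0)
n = 64                                                          # number of kappa-simplices
dictionaries = [rng.standard_normal((n, n)) for _ in range(3)]  # one dictionary per scale (stand-in)

def scattering_features(signal, dictionaries, n_layers=2, moments=(1, 2, 3)):
    feats, layer = [], [signal]
    for _ in range(n_layers):
        next_layer = []
        for x in layer:
            for Phi in dictionaries:
                coeff = np.abs(Phi.T @ x)                            # modulus of expansion coefficients
                feats.extend(np.mean(coeff ** q) for q in moments)   # permutation-invariant moments
                next_layer.append(coeff)                             # cascade to the next layer
        layer = next_layer
    return np.array(feats)

x = rng.standard_normal(n)
print(scattering_features(x, dictionaries).shape)
```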
results: 相比现有的DCD方法提高了稳定性和性能,能够扩展到数千个变量的场景。Abstract
Inferring causal relationships as directed acyclic graphs (DAGs) is an important but challenging problem. Differentiable Causal Discovery (DCD) is a promising approach to this problem, framing the search as a continuous optimization. But existing DCD methods are numerically unstable, with poor performance beyond tens of variables. In this paper, we propose Stable Differentiable Causal Discovery (SDCD), a new method that improves previous DCD methods in two ways: (1) It employs an alternative constraint for acyclicity; this constraint is more stable, both theoretically and empirically, and fast to compute. (2) It uses a training procedure tailored for sparse causal graphs, which are common in real-world scenarios. We first derive SDCD and prove its stability and correctness. We then evaluate it with both observational and interventional data and on both small-scale and large-scale settings. We find that SDCD outperforms existing methods in both convergence speed and accuracy and can scale to thousands of variables.
摘要
将因果关系推断为有向无环图(DAG)是一个重要但具有挑战性的问题。可微分因果发现(DCD)是一种有前景的方法,它将搜索转化为连续优化问题。但现有的DCD方法在数值上不稳定,当变量超过几十个时性能不佳。在这篇论文中,我们提出了稳定可微分因果发现(SDCD)方法,它在两个方面改进了以往的DCD方法:(1)采用另一种无环性约束,该约束在理论和实践上都更加稳定,且计算快速;(2)采用针对稀疏因果图(现实场景中常见)设计的训练流程。我们首先推导了SDCD并证明其稳定性和正确性,随后在观测数据和干预数据、以及小规模和大规模设置下进行了评估。结果表明,SDCD在收敛速度和准确性上均优于现有方法,并可扩展到数千个变量。
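For context, the sketch below computes the classic matrix-exponential acyclicity penalty $h(W)=\operatorname{tr}(e^{W\circ W})-d$ used by several earlier DCD methods; SDCD's contribution is precisely to replace this kind of constraint with a more stable alternative, which is not reproduced here.

```python
# The classic differentiable acyclicity penalty used by earlier DCD methods
# (NOTEARS-style); shown for context only, it is not SDCD's constraint.
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # zero iff the weighted graph is acyclic

dag = np.array([[0., 1., 0.],
                [0., 0., 2.],
                [0., 0., 0.]])
cyclic = dag + np.array([[0., 0., 0.],
                         [0., 0., 0.],
                         [0.5, 0., 0.]])   # adds edge 2->0, creating the cycle 0->1->2->0
print(notears_acyclicity(dag))     # ~0.0
print(notears_acyclicity(cyclic))  # > 0
```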
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems
For: This paper aims to develop a new framework called FREE for modeling environmental ecosystems, which can capture the complex relationships between various environmental data over space and time.* Methods: The FREE framework uses Large Language Models (LLMs) to map available environmental data into a text space and convert the traditional predictive modeling task into a semantic recognition problem. This allows for the incorporation of natural language descriptions and the capture of data semantics.* Results: The proposed FREE framework is evaluated in two real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. The results show that FREE outperforms multiple baseline methods and is more data- and computation-efficient, as it can be pre-trained on simulated data generated by physics-based models.Abstract
Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.
摘要
Modeling environmental ecosystems is critical for the sustainability of our planet, but it is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time?In this paper, we introduce a new framework called FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions, allowing for the capture of data semantics and the harnessing of irregularities in the input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction.We evaluate the efficacy of FREE in the context of two societally important real-world applications: predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Compared to multiple baseline methods, FREE achieves superior predictive performance and is more data- and computation-efficient, as it can be pre-trained on simulated data generated by physics-based models.
Degeneration of kernel regression with Matern kernels into low-order polynomial regression in high dimension
results: 当特征空间维度较高且数据稀疏时,Matern类核的最优长度参数会变得很大,使核方法实际上退化为低阶多项式回归,从而失去核方法的优势。这些结果进一步解释了PIP等多项式近似在中等规模分子上取得成功的原因,也凸显了基于耦合阶数的模型以及物理驱动的(再生)核的重要性。Abstract
Kernel methods such as kernel ridge regression and Gaussian process regressions with Matern type kernels have been increasingly used, in particular, to fit potential energy surfaces (PES) and density functionals, and for materials informatics. When the dimensionality of the feature space is high, these methods are used with necessarily sparse data. In this regime, the optimal length parameter of a Matern-type kernel tends to become so large that the method effectively degenerates into a low-order polynomial regression and therefore loses any advantage over such regression. This is demonstrated theoretically as well as numerically on the examples of six- and fifteen-dimensional molecular PES using squared exponential and simple exponential kernels. The results shed additional light on the success of polynomial approximations such as PIP for medium size molecules and on the importance of orders-of-coupling based models for preserving the advantages of kernel methods with Matern type kernels or on the use of physically-motivated (reproducing) kernels.
摘要
核岭回归和使用Matern类核的高斯过程回归等核方法的应用日益广泛,尤其用于拟合势能面(PES)和密度泛函,以及材料信息学。当特征空间维度较高时,这些方法所用的数据必然是稀疏的。在这种情形下,Matern类核的最优长度参数往往变得非常大,使得方法实际上退化为低阶多项式回归,从而失去相对于这类回归的任何优势。我们以六维和十五维分子势能面为例,使用平方指数核和简单指数核,从理论和数值两方面论证了这一点。这些结果进一步解释了PIP等多项式近似在中等规模分子上取得成功的原因,也说明了基于耦合阶数的模型对于保留Matern类核方法优势的重要性,以及使用物理驱动的(再生)核的价值。
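The degeneration effect can be probed numerically with a small, illustrative experiment (not the paper's PES data): fit Gaussian process regression with a Matern kernel at increasing length scales on sparse high-dimensional data and check how close its predictions come to a plain linear fit.

```python
# An illustrative probe of kernel-regression degeneration at large length scales;
# the toy target and settings are assumptions, not the paper's datasets.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
d, n_train, n_test = 15, 200, 100
X = rng.uniform(-1, 1, size=(n_train + n_test, d))
y = np.sin(X).sum(axis=1) + 0.5 * (X ** 2).sum(axis=1)   # smooth toy target
X_tr, y_tr, X_te = X[:n_train], y[:n_train], X[n_train:]

lin_pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)

for length_scale in (1.0, 10.0, 100.0):
    gp = GaussianProcessRegressor(kernel=Matern(length_scale=length_scale, nu=2.5),
                                  alpha=1e-4, optimizer=None)
    gp_pred = gp.fit(X_tr, y_tr).predict(X_te)
    # similarity between the kernel predictions and the low-order (linear) fit
    print(length_scale, round(np.corrcoef(gp_pred, lin_pred)[0, 1], 3))
```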
results: 实验表明,该方法可以高效且准确地学习数据集中的特征表示。
for: The paper is written to resolve the problem of heterogeneity in data collected at different times or locations.
methods: The paper proposes a modified NMF objective called Stratified-NMF, which learns strata-dependent statistics and a shared topics matrix.
results: The paper presents experimental results on synthetic data and real-world datasets to demonstrate the efficiency and accuracy of the method.Abstract
Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real world datasets and empirically investigate their learned features.
摘要
非负矩阵分解(NMF)是获得数据集低维表示的重要技术。然而,经典的NMF没有考虑在不同时间或地点采集的数据可能存在的异质性。我们通过求解一个改进的NMF目标函数——Stratified-NMF——来解决这一问题,它同时学习与分层相关的统计量和一个共享的主题矩阵。我们为该目标函数推导了乘法更新规则,并证明了目标函数的收敛性。随后,我们在合成数据上进行实验,以展示该方法的效率和准确性。最后,我们将该方法应用于三个真实世界数据集,并对其学习到的特征进行了实证分析。
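As a naive illustration of the shared-topics structure (and only of the structure: the updates below are the standard Lee-Seung multiplicative rules pooled over strata, not the paper's Stratified-NMF objective or its derived updates), each stratum keeps its own weight matrix while a single topics matrix is shared.

```python
# A naive sketch of per-stratum weights with a shared topics matrix, updated by
# standard multiplicative rules; not the paper's Stratified-NMF updates.
import numpy as np

def stratified_nmf_sketch(X_list, rank=5, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n_cols = X_list[0].shape[1]
    H = rng.random((rank, n_cols)) + eps
    W = [rng.random((X.shape[0], rank)) + eps for X in X_list]
    for _ in range(n_iter):
        for s, X in enumerate(X_list):                    # stratum-specific weight updates
            W[s] *= (X @ H.T) / (W[s] @ H @ H.T + eps)
        num = sum(Ws.T @ X for Ws, X in zip(W, X_list))   # shared-topics update pools all strata
        den = sum(Ws.T @ Ws @ H for Ws in W) + eps
        H *= num / den
    return W, H

# two strata with different numbers of rows but the same feature space
X_list = [np.random.rand(100, 40), np.random.rand(60, 40)]
W, H = stratified_nmf_sketch(X_list)
print([Wi.shape for Wi in W], H.shape)
```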