results: Through a series of experiments, the advantages of the proposed design principles are demonstrated, including relative gains of up to 11.88% and 44.85% in ROC-AUC and PR-AUC, respectively, while maintaining fairness when recommending specific popular and unpopular titles.
Abstract
Current multi-armed bandit approaches in recommender systems (RS) have focused more on devising effective exploration techniques, while not adequately addressing common exploitation challenges related to distributional changes and item cannibalization. Little work exists to guide the design of robust bandit frameworks that can address these frequent challenges in RS. In this paper, we propose new design principles to (i) make bandit models robust to time-variant metadata signals, (ii) make them less prone to item cannibalization, and (iii) prevent their weights from fluctuating due to data sparsity. Through a series of experiments, we systematically examine the influence of several important bandit design choices. We demonstrate the advantage of our proposed design principles in making bandit models robust to dynamic behavioral changes through in-depth analyses. Notably, we show a relative gain over a baseline bandit model that does not incorporate our design choices of up to $11.88\%$ and $44.85\%$, respectively, in ROC-AUC and PR-AUC. Case studies on fairness in recommending specific popular and unpopular titles are presented to demonstrate the robustness of our proposed design in addressing popularity biases.
The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance
results: Experiments show that the proposed method recovers variable importance rankings in complex simulation setups and accurately estimates the true ranking of variable importance. The paper also provides theoretical guarantees on consistency and finite sample error rates, together with a real-world case study demonstrating the method's practical utility.
Abstract
Quantifying variable importance is essential for answering high-stakes questions in fields like genetics, public policy, and medicine. Current methods generally calculate variable importance for a given model trained on a given dataset. However, for a given dataset, there may be many models that explain the target outcome equally well; without accounting for all possible explanations, different researchers may arrive at many conflicting yet equally valid conclusions given the same data. Additionally, even when accounting for all possible explanations for a given dataset, these insights may not generalize because not all good explanations are stable across reasonable data perturbations. We propose a new variable importance framework that quantifies the importance of a variable across the set of all good models and is stable across the data distribution. Our framework is extremely flexible and can be integrated with most existing model classes and global variable importance metrics. We demonstrate through experiments that our framework recovers variable importance rankings for complex simulation setups where other methods fail. Further, we show that our framework accurately estimates the true importance of a variable for the underlying data distribution. We provide theoretical guarantees on the consistency and finite sample error rates for our estimator. Finally, we demonstrate its utility with a real-world case study exploring which genes are important for predicting HIV load in persons with HIV, highlighting an important gene that has not previously been studied in connection with HIV. Code is available here.
Improving Robustness of Deep Convolutional Neural Networks via Multiresolution Learning
results: our results show that multiresolution learning can significantly improve the robustness of DNN models for both 1D signal and 2D signal (image) prediction problems, and that this improvement can be achieved with small training dataset size and without sacrificing standard accuracy.
Abstract
The current learning process of deep learning, regardless of any deep neural network (DNN) architecture and/or learning algorithm used, is essentially a single resolution training. We explore multiresolution learning and show that multiresolution learning can significantly improve robustness of DNN models for both 1D signal and 2D signal (image) prediction problems. We demonstrate this improvement in terms of both noise and adversarial robustness as well as with small training dataset size. Our results also suggest that it may not be necessary to trade standard accuracy for robustness with multiresolution learning, which is, interestingly, contrary to the observation obtained from the traditional single resolution learning setting.
Generative Residual Diffusion Modeling for Km-scale Atmospheric Downscaling
results: The results show that ResDiff exhibits encouraging skill in bulk RMSE and CRPS scores. It also accurately predicts important dynamical features of storms, such as the distributions of precipitation and wind speed. Case studies further show that ResDiff behaves appropriately across different weather phenomena.
Abstract
The state of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs. Here, a km-scale downscaling diffusion model is presented as a cost-effective alternative. The model is trained from a regional high-resolution weather model over Taiwan, and conditioned on ERA5 reanalysis data. To address the uncertainties of downscaling, the large resolution ratio (25 km to 2 km), the different physics involved at different scales, and the prediction of channels that are not in the input data, we employ a two-step approach (\textit{ResDiff}) in which a (UNet) regression predicts the mean in the first step and a diffusion model predicts the residual in the second step. \textit{ResDiff} exhibits encouraging skill in bulk RMSE and CRPS scores. The predicted spectra and distributions from ResDiff faithfully recover important power law relationships regulating damaging wind and rain extremes. Case studies of coherent weather phenomena reveal appropriate multivariate relationships reminiscent of learnt physics. This includes the sharp wind and temperature variations that co-locate with intense rainfall in a cold front, and the extreme winds and rainfall bands that surround the eyewall of typhoons. Some evidence of simultaneous bias correction is found. A first attempt at downscaling directly from an operational global forecast model successfully retains many of these benefits. The implication is that a new era of fully end-to-end, global-to-regional machine learning weather prediction is likely near at hand.
Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups
paper_authors: Kathlén Kohn, Anna-Laura Sattelberger, Vahid Shahverdi
for: investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group.
methods: explicit description of their dimension, degree, Euclidean distance degree, and singularities.
results: fully characterize invariance for arbitrary permutation groups and equivariance for cyclic groups, and prove that all invariant linear functions can be learned by linear autoencoders.
Abstract
The set of functions parameterized by a linear fully-connected neural network is a determinantal variety. We investigate the subvariety of functions that are equivariant or invariant under the action of a permutation group. Examples of such group actions are translations or $90^\circ$ rotations on images. For such equivariant or invariant subvarieties, we provide an explicit description of their dimension, their degree as well as their Euclidean distance degree, and their singularities. We fully characterize invariance for arbitrary permutation groups, and equivariance for cyclic groups. We draw conclusions for the parameterization and the design of equivariant and invariant linear networks, such as a weight sharing property, and we prove that all invariant linear functions can be learned by linear autoencoders.
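To make the algebraic conditions behind this abstract concrete, the following sketch (an illustration based on standard linear-algebra facts, not code from the paper) checks that an end-to-end linear map $W$ is invariant under a cyclic-shift permutation $P$ when $WP = W$ and equivariant when $WP = PW$; the circulant weight sharing in the equivariant example mirrors the weight-sharing property mentioned above.

```python
import numpy as np

def cyclic_shift_matrix(n, k=1):
    """Permutation matrix P that cyclically shifts coordinates by k."""
    P = np.zeros((n, n))
    for i in range(n):
        P[(i + k) % n, i] = 1.0
    return P

n = 4
P = cyclic_shift_matrix(n)

# An invariant linear map: every row is constant, so W @ P == W
# (the output ignores the ordering of the inputs).
W_inv = np.ones((2, n))

# An equivariant linear map (input and output dimension equal):
# circulant weight sharing gives W @ P == P @ W.
c = np.array([1.0, 2.0, 0.5, -1.0])
W_eq = np.stack([np.roll(c, i) for i in range(n)])

assert np.allclose(W_inv @ P, W_inv)     # invariance:   f(Px) = f(x)
assert np.allclose(W_eq @ P, P @ W_eq)   # equivariance: f(Px) = P f(x)

x = np.random.randn(n)
print(np.allclose(W_inv @ (P @ x), W_inv @ x))        # True
print(np.allclose(W_eq @ (P @ x), P @ (W_eq @ x)))    # True
```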
Towards Tuning-Free Minimum-Volume Nonnegative Matrix Factorization
results: The paper proposes a minimum-volume NMF formulation whose tuning parameter does not need to be matched to the unknown noise level, together with a majorization-minimization algorithm to fit it. Employing this method, the authors show that the optimal choice of the tuning parameter is insensitive to the noise level in the data.
Abstract
Nonnegative Matrix Factorization (NMF) is a versatile and powerful tool for discovering latent structures in data matrices, with many variations proposed in the literature. Recently, Leplat et al. (2019) introduced a minimum-volume NMF for the identifiable recovery of rank-deficient matrices in the presence of noise. The performance of their formulation, however, requires the selection of a tuning parameter whose optimal value depends on the unknown noise level. In this work, we propose an alternative formulation of minimum-volume NMF inspired by the square-root lasso and its tuning-free properties. Our formulation also requires the selection of a tuning parameter, but its optimal value does not depend on the noise level. To fit our NMF model, we propose a majorization-minimization (MM) algorithm that comes with global convergence guarantees. We show empirically that the optimal choice of our tuning parameter is insensitive to the noise level in the data.
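For context, here is a minimal sketch of plain NMF fitted with the classical Lee-Seung multiplicative updates; it illustrates the baseline factorization being regularized, not the paper's minimum-volume, square-root-lasso-inspired objective or its MM algorithm.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-9, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates for ||X - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

# Toy nonnegative data with an approximate rank-3 structure plus noise.
rng = np.random.default_rng(1)
X = rng.random((50, 3)) @ rng.random((3, 40)) + 0.01 * rng.random((50, 40))
W, H = nmf_multiplicative(X, r=3)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # small relative error
```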
Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense
results: Deep neural networks with these activations can overcome the curse of dimensionality (COD) for such PDEs, in the sense that the required network size grows at most polynomially, rather than exponentially, in the PDE dimension and the reciprocal of the approximation accuracy; the approximation of high-dimensional PDE solutions is established in the $L^p$-sense.
Abstract
Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computational operations they require to achieve a certain approximation accuracy $\varepsilon\in(0,\infty)$ grows at most polynomially in the PDE dimension $d\in\mathbb N$ and the reciprocal of $\varepsilon$. While there is thus far no mathematical result that proves that any such method is indeed capable of overcoming the COD, there are now a number of rigorous results in the literature that show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the COD in the sense that the number of parameters used to describe the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb N$ and the reciprocal of the approximation accuracy $\varepsilon>0$. Roughly speaking, in the literature it has been proved for every $T>0$ that solutions $u_d\colon [0,T]\times\mathbb R^d\to \mathbb R$, $d\in\mathbb N$, of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated by DNNs with ReLU activation at the terminal time in the $L^2$-sense without the COD provided that the initial value functions $\mathbb R^d\ni x\mapsto u_d(0,x)\in\mathbb R$, $d\in\mathbb N$, can be approximated by ReLU DNNs without the COD. It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$ and by allowing the activation function to be more general covering the ReLU, the leaky ReLU, and the softplus activation functions as special cases.
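In rough LaTeX form, and only as a paraphrase of the abstract with generic constants (the reference measure and constants below are illustrative assumptions), the PDE class and the "no curse of dimensionality" statement read:

```latex
% Semilinear heat PDE with Lipschitz nonlinearity f and initial value \varphi_d:
\partial_t u_d(t,x) = \Delta_x u_d(t,x) + f\bigl(u_d(t,x)\bigr),
\qquad u_d(0,x) = \varphi_d(x), \qquad (t,x) \in [0,T] \times \mathbb{R}^d .

% "No curse of dimensionality" in the L^p-sense: there exist constants c, \kappa > 0
% such that for every dimension d and accuracy \varepsilon \in (0,1] there is a DNN
% \Phi_{d,\varepsilon} (with ReLU, leaky ReLU, or softplus activation) satisfying
\#\mathrm{params}(\Phi_{d,\varepsilon}) \le c\, d^{\kappa} \varepsilon^{-\kappa}
\quad\text{and}\quad
\Bigl( \int_{\mathbb{R}^d} \bigl| u_d(T,x) - \Phi_{d,\varepsilon}(x) \bigr|^p \, \nu_d(\mathrm{d}x) \Bigr)^{1/p} \le \varepsilon ,
% for a suitable reference probability measure \nu_d on \mathbb{R}^d.
```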
Federated Deep Multi-View Clustering with Global Self-Supervision
paper_authors: Xinyue Chen, Jie Xu, Yazhou Ren, Xiaorong Pu, Ce Zhu, Xiaofeng Zhu, Zhifeng Hao, Lifang He
for: This work aims to address the challenges of incomplete multi-view data in distributed environments, where label information is unknown and data privacy must be preserved.
results: Extensive experimental results show that the proposed method effectively handles the incompleteness and privacy concerns of multi-view data and achieves superior performance.
Abstract
Federated multi-view clustering has the potential to learn a global clustering model from data distributed across multiple devices. In this setting, label information is unknown and data privacy must be preserved, leading to two major challenges. First, views on different clients often have feature heterogeneity, and mining their complementary cluster information is not trivial. Second, the storage and usage of data from multiple clients in a distributed environment can lead to incompleteness of multi-view data. To address these challenges, we propose a novel federated deep multi-view clustering method that can mine complementary cluster structures from multiple clients, while dealing with data incompleteness and privacy concerns. Specifically, in the server environment, we propose sample alignment and data extension techniques to explore the complementary cluster structures of multiple views. The server then distributes global prototypes and global pseudo-labels to each client as global self-supervised information. In the client environment, multiple clients use the global self-supervised information and deep autoencoders to learn view-specific cluster assignments and embedded features, which are then uploaded to the server for refining the global self-supervised information. Finally, the results of our extensive experiments demonstrate that our proposed method exhibits superior performance in addressing the challenges of incomplete multi-view data in distributed environments.
Performance Evaluation of Equal-Weight Portfolio and Optimum Risk Portfolio on Indian Stocks
paper_authors: Abhiraj Sen, Jaydip Sen
for: The paper aims to design optimal portfolios in which suitable weights are assigned to the constituent stocks so that the return and risk of the portfolio are optimized.
methods: Three approaches to portfolio design are used: minimizing the risk, optimizing the risk, and assigning equal weights to the stocks of a portfolio.
results: Using historical stock-market data, three portfolios of the top ten stocks per sector are designed to maximize return and minimize risk. Their performance is evaluated on stock-price data from Jan 1, 2022 to Dec 31, 2022 and compared across the approaches.
Abstract
Designing an optimum portfolio for allocating suitable weights to its constituent assets so that the return and risk associated with the portfolio are optimized is a computationally hard problem. The seminal work of Markowitz that attempted to solve the problem by estimating the future returns of the stocks is found to perform sub-optimally on real-world stock market data. This is because the estimation task becomes extremely challenging due to the stochastic and volatile nature of stock prices. This work illustrates three approaches to portfolio design minimizing the risk, optimizing the risk, and assigning equal weights to the stocks of a portfolio. Thirteen critical sectors listed on the National Stock Exchange (NSE) of India are first chosen. Three portfolios are designed following the above approaches choosing the top ten stocks from each sector based on their free-float market capitalization. The portfolios are designed using the historical prices of the stocks from Jan 1, 2017, to Dec 31, 2022. The portfolios are evaluated on the stock price data from Jan 1, 2022, to Dec 31, 2022. The performances of the portfolios are compared, and the portfolio yielding the higher return for each sector is identified.
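As a minimal, self-contained illustration of the three allocation schemes discussed (equal weight, minimum risk, and a maximum-Sharpe portfolio as one common reading of "optimum risk"), the sketch below uses synthetic returns; the asset universe, dates, and numbers are placeholders, not the NSE data used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Synthetic daily returns for 10 hypothetical stocks (placeholder for NSE data).
returns = rng.normal(loc=0.0005, scale=0.02, size=(750, 10))
mu = returns.mean(axis=0) * 252             # annualized mean returns
cov = np.cov(returns, rowvar=False) * 252   # annualized covariance

n = len(mu)
w_equal = np.ones(n) / n                    # equal-weight portfolio

def solve(objective):
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * n               # long-only, fully invested
    return minimize(objective, w_equal, bounds=bounds, constraints=cons).x

w_minvar = solve(lambda w: w @ cov @ w)                        # minimum variance
w_sharpe = solve(lambda w: -(w @ mu) / np.sqrt(w @ cov @ w))   # max Sharpe ("optimum risk")

for name, w in [("equal", w_equal), ("min-var", w_minvar), ("max-Sharpe", w_sharpe)]:
    ret, vol = w @ mu, np.sqrt(w @ cov @ w)
    print(f"{name:10s} return={ret: .3f}  volatility={vol: .3f}")
```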
for: This paper is written to study the role of regularization in multiclass learning with arbitrary label sets, and to introduce optimal learning algorithms that incorporate regularization using one-inclusion graphs (OIGs).
methods: The paper uses OIGs to exhibit optimal learning algorithms that relax structural risk minimization on two dimensions: allowing the regularization function to be “local” to datapoints, and using an unsupervised learning stage to learn this regularizer at the outset. The paper also introduces a combinatorial sequence called the Hall complexity, which is the first to characterize a problem’s transductive error rate exactly.
results: The paper shows that the introduced optimal learner relaxes structural risk minimization on two dimensions and uses an unsupervised learning stage to learn a regularizer at the outset. The paper also demonstrates that an agnostic version of the Hall complexity characterizes error rates exactly, and exhibits an optimal learner using maximum entropy programs.
Abstract
The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner which relaxes structural risk minimization on two dimensions: it allows the regularization function to be "local" to datapoints, and uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either dimension fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs -- judged using nodes' outdegrees minus a system of node-dependent credits -- characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
methods: The paper proposes a variance-reduced gradient descent technique (VRGD) based on the gradient signal-to-noise ratio (GSNR) and applies it to popular optimizers such as SGD/Adam/LARS/LAMB. The authors also carry out a convergence-rate analysis and a generalization analysis to explain its fast training dynamics and smaller generalization gap.
results: Experiments show that VRGD accelerates training (1-2x), narrows the generalization gap, and improves final accuracy. The authors push the batch size of BERT pretraining to 128k/64k and of DLRM to 512k without affecting accuracy, and improve ImageNet Top-1 accuracy at a 96k batch size by 0.52pp over LARS. Overall, the generalization gap in BERT and ImageNet training is substantially reduced.
Abstract
As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, a large number of GPUs/TPUs are paralleled as a large batch (LB) to improve training throughput. However, training such LB tasks often incurs a large generalization gap and degrades final accuracy, which limits enlarging the batch size. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap on LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at 96k by $0.52pp$ compared with LARS. The generalization gap of BERT and ImageNet training is significantly reduced by over $65\%$.
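GSNR is typically defined per parameter as the squared mean of the per-sample gradients divided by their variance; the sketch below estimates that quantity from per-sample gradients (an illustration of the statistic itself, not the VRGD optimizer).

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Gradient signal-to-noise ratio per parameter.

    per_sample_grads: array of shape (batch_size, num_params), one gradient row
    per training example. GSNR_j = mean_j**2 / var_j over the batch.
    """
    mean = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0)
    return mean**2 / (var + eps)

# Toy example: parameters with a strong shared gradient direction have high GSNR,
# parameters whose per-sample gradients mostly cancel have low GSNR.
rng = np.random.default_rng(0)
signal = np.array([1.0, 0.1, 0.0])                 # "true" gradient per parameter
grads = signal + rng.normal(scale=0.5, size=(256, 3))
print(gsnr(grads))   # roughly [4.0, 0.04, ~0] up to sampling noise
```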
Topology-Agnostic Detection of Temporal Money Laundering Flows in Billion-Scale Transactions
results: On a dataset containing transactions from multiple large European banks, the framework shows clear superiority in both efficiency and usefulness when compared against two state-of-the-art methods for detecting suspicious flows of transactions.
Abstract
Money launderers exploit the weaknesses in detection systems by purposefully placing their ill-gotten money into multiple accounts, at different banks. That money is then layered and moved around among mule accounts to obscure the origin and the flow of transactions. Consequently, the money is integrated into the financial system without raising suspicion. Path finding algorithms that aim at tracking suspicious flows of money usually struggle with scale and complexity. Existing community detection techniques also fail to properly capture the time-dependent relationships. This is particularly evident when performing analytics over massive transaction graphs. We propose a framework (called FaSTMAN), adapted for domain-specific constraints, to efficiently construct a temporal graph of sequential transactions. The framework includes a weighting method, using 2nd order graph representation, to quantify the significance of the edges. This method enables us to distribute complex queries on smaller and densely connected networks of flows. Finally, based on those queries, we can effectively identify networks of suspicious flows. We extensively evaluate the scalability and the effectiveness of our framework against two state-of-the-art solutions for detecting suspicious flows of transactions. For a dataset of over 1 Billion transactions from multiple large European banks, the results show a clear superiority of our framework both in efficiency and usefulness.
Fantastic Generalization Measures are Nowhere to be Found
results: Through mathematical analysis and empirical study, the authors find that in the overparameterized setting no generalization bound of the first kind can be uniformly tight. Moreover, if a learning algorithm achieves good accuracy on certain distributions, then no algorithm-dependent generalization bound can be tight for it. The authors conclude that generalization bounds in the overparameterized setting cannot be tight without suitable assumptions on the population distribution.
Abstract
Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, none of these bounds are tight. For instance, in their paper ``Fantastic Generalization Measures and Where to Find Them'', Jiang et al. (2020) examine more than a dozen generalization bounds, and show empirically that none of them imply guarantees that can explain the remarkable performance of neural networks. This raises the question of whether tight generalization bounds are at all possible. We consider two types of generalization bounds common in the literature: (1) bounds that depend on the training set and the output of the learning algorithm. There are multiple bounds of this type in the literature (e.g., norm-based and margin-based bounds), but we prove mathematically that no such bound can be uniformly tight in the overparameterized setting; (2) bounds that depend on the training set and on the learning algorithm (e.g., stability bounds). For these bounds, we show a trade-off between the algorithm's performance and the bound's tightness. Namely, if the algorithm achieves good accuracy on certain distributions in the overparameterized setting, then no generalization bound can be tight for it. We conclude that generalization bounds in the overparameterized setting cannot be tight without suitable assumptions on the population distribution.
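Schematically, and only as an illustration of the two families discussed (not bounds from the paper), the first kind controls the population loss by an empirical term plus a complexity term depending on the sample and the learned predictor, while the second kind depends on the algorithm, e.g. through uniform stability:

```latex
% (1) bounds depending on the training set S and the learned predictor h = A(S),
%     e.g. norm- or margin-based complexity terms C(h, S):
L_{\mathcal{D}}(h) \;\le\; L_{S}(h) + \sqrt{\frac{C(h, S)}{n}} .

% (2) bounds depending on S and the algorithm A itself, e.g. uniform stability \beta:
\mathbb{E}_{S}\bigl[ L_{\mathcal{D}}(A(S)) - L_{S}(A(S)) \bigr] \;\le\; \beta .
```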
A Probabilistic Model for Data Redundancy in the Feature Domain
results: The paper provides a method for estimating, in a large dataset, the size of a feature set that exhibits low collinearity and low multicollinearity, and proves an auxiliary result about mutually good constrained sets that is of independent interest.
Abstract
In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features (multicollinearity) and we use the probabilistic method to obtain upper and lower bounds of the same order, for the size of a feature set that exhibits low collinearity and low multicollinearity. We also prove an auxiliary result regarding mutually good constrained sets that is of independent interest.
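The notion of a feature set with low pairwise correlation can be illustrated empirically with a simple greedy filter; this demonstrates the quantity being bounded, not the paper's probabilistic argument, and the 0.3 threshold is arbitrary.

```python
import numpy as np

def greedy_low_collinearity(X, threshold=0.3):
    """Greedily keep features whose absolute pairwise correlation with every
    already-selected feature stays below `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 5))
# 15 features: 5 independent columns plus 10 noisy copies (highly correlated).
X = np.hstack([base,
               base + 0.1 * rng.normal(size=(1000, 5)),
               base + 0.1 * rng.normal(size=(1000, 5))])
print(greedy_low_collinearity(X))   # typically only the 5 independent columns survive
```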
REWAFL: Residual Energy and Wireless Aware Participant Selection for Efficient Federated Learning over Mobile Devices
results: Experimental results show that REWAFL improves training accuracy and efficiency while avoiding the "flat battery" problem of mobile devices.
Abstract
Participant selection (PS) helps to accelerate federated learning (FL) convergence, which is essential for the practical deployment of FL over mobile devices. However, most existing PS approaches focus on improving training accuracy and efficiency rather than on the residual energy of mobile devices, which fundamentally determines whether the selected devices can participate. Meanwhile, the impacts of mobile devices' heterogeneous wireless transmission rates on PS and FL training efficiency are largely ignored. Moreover, PS causes the staleness issue. Prior research exploits isolated functions to force long-neglected devices to participate, which is decoupled from original PS designs. In this paper, we propose a residual energy and wireless aware PS design for efficient FL training over mobile devices (REWAFL). REWAFL introduces a novel PS utility function that jointly considers global FL training utilities and local energy utility, which integrates energy consumption and residual battery energy of candidate mobile devices. Under the proposed PS utility function framework, REWAFL further presents a residual energy and wireless aware local computing policy. Besides, REWAFL buries the staleness solution into its utility function and local computing policy. The experimental results show that REWAFL is effective in improving training accuracy and efficiency, while avoiding the "flat battery" problem of mobile devices.
Crack-Net: Prediction of Crack Propagation in Composites
results: The results show that Crack-Net can predict the crack growth patterns and the stress-strain curve of a material with high accuracy, and that it can handle more complex microstructural designs.
Abstract
Computational solid mechanics has become an indispensable approach in engineering, and numerical investigation of fracture in composites is essential as composites are widely used in structural applications. Crack evolution in composites is the bridge to elucidate the relationship between the microstructure and fracture performance, but crack-based finite element methods are computationally expensive and time-consuming, limiting their application in computation-intensive scenarios. Here we propose a deep learning framework called Crack-Net, which incorporates the relationship between crack evolution and stress response to predict the fracture process in composites. Trained on a high-precision fracture development dataset generated using the phase field method, Crack-Net demonstrates a remarkable capability to accurately forecast the long-term evolution of crack growth patterns and the stress-strain curve for a given composite design. The Crack-Net captures the essential principle of crack growth, which enables it to handle more complex microstructures such as binary co-continuous structures. Moreover, transfer learning is adopted to further improve the generalization ability of Crack-Net for composite materials with reinforcements of different strengths. The proposed Crack-Net holds great promise for practical applications in engineering and materials science, in which accurate and efficient fracture prediction is crucial for optimizing material performance and microstructural design.
Reinforcement-Enhanced Autoregressive Feature Transformation: Gradient-steered Search in Continuous Space for Postfix Expressions
paper_authors: Dongjie Wang, Meng Xiao, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu
for: This paper aims to improve the efficiency and effectiveness of feature transformation for machine learning tasks by reformulating the discrete search space into a continuous optimization task.
methods: The proposed method includes four steps: (1) reinforcement-enhanced data preparation, (2) feature transformation operation sequence embedding, (3) gradient-steered optimal embedding search, and (4) transformation operation sequence reconstruction.
results: The proposed method is expected to fundamentally fill the gap between efficiency and stability/robustness in feature transformation, and to provide a more effective and efficient way to optimize feature transformation for machine learning tasks.
Abstract
Feature transformation aims to generate new pattern-discriminative feature space from original features to improve downstream machine learning (ML) task performances. However, the discrete search space for the optimal feature explosively grows on the basis of combinations of features and operations from low-order forms to high-order forms. Existing methods, such as exhaustive search, expansion reduction, evolutionary algorithms, reinforcement learning, and iterative greedy, suffer from large search space. Overly emphasizing efficiency in algorithm design usually sacrifices stability or robustness. To fundamentally fill this gap, we reformulate discrete feature transformation as a continuous space optimization task and develop an embedding-optimization-reconstruction framework. This framework includes four steps: 1) reinforcement-enhanced data preparation, aiming to prepare high-quality transformation-accuracy training data; 2) feature transformation operation sequence embedding, intending to encapsulate the knowledge of prepared training data within a continuous space; 3) gradient-steered optimal embedding search, dedicating to uncover potentially superior embeddings within the learned space; 4) transformation operation sequence reconstruction, striving to reproduce the feature transformation solution to pinpoint the optimal feature space.
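A toy sketch of steps 2-4 (embedding, gradient-steered search, reconstruction) is given below; the module shapes and names are illustrative assumptions, and the evaluator is left untrained here (in practice it would be fit on the reinforcement-prepared data from step 1), so the loop only demonstrates the mechanics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, seq_len, d = 16, 8, 32   # operation vocabulary size, sequence length, embedding dim

encoder = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * vocab, d))        # sequence -> z
evaluator = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))    # z -> predicted utility
decoder = nn.Linear(d, seq_len * vocab)                                     # z -> sequence logits

# Step 2 (embedding): encode a one-hot operation sequence into a continuous vector z.
seq = nn.functional.one_hot(torch.randint(vocab, (1, seq_len)), vocab).float()
z = encoder(seq).detach().requires_grad_(True)

# Step 3 (gradient-steered search): ascend the evaluator's predicted utility in z-space.
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(50):
    opt.zero_grad()
    loss = -evaluator(z).mean()      # maximize predicted downstream performance
    loss.backward()
    opt.step()

# Step 4 (reconstruction): decode the optimized embedding back into an operation sequence.
logits = decoder(z).view(1, seq_len, vocab)
print(logits.argmax(dim=-1))         # candidate transformation-operation sequence
```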
DPA-WNO: A gray box model for a class of stochastic mechanics problem
for: To address the lack of interpretability, the data hunger, and the poor generalization of purely data-driven models, a data-physics fusion approach is proposed, introducing a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO) that fuses a data-driven model with a physics-based solver so that the framework can learn from data while retaining the interpretability and generalizability of physics-based solvers.
methods: The proposed DPA-WNO combines a differentiable physics solver with the Wavelet Neural Operator (WNO), which enables the method to exploit the data-driven learning capability of the WNO while retaining the interpretability and generalizability of the physics-based solver.
results: Four benchmark time-dependent uncertainty quantification and reliability analysis problems from different fields of science and engineering are solved, and the results show that the approach effectively mitigates the interpretability, data-hunger, and generalization issues of purely data-driven models.
Abstract
The well-known governing physics in science and engineering is often based on certain assumptions and approximations. Therefore, analyses and designs carried out based on these equations are also approximate. The emergence of data-driven models has, to a certain degree, addressed this challenge; however, the purely data-driven models often (a) lack interpretability, (b) are data-hungry, and (c) do not generalize beyond the training window. Operator learning has recently been proposed as a potential alternative to address the aforementioned challenges; however, the challenges are still persistent. We here argue that one of the possible solutions resides in data-physics fusion, where the data-driven model is used to correct/identify the missing physics. To that end, we propose a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO). The proposed DPA-WNO blends a differentiable physics solver with the Wavelet Neural Operator (WNO), where the role of WNO is to model the missing physics. This empowers the proposed framework to exploit the capability of WNO to learn from data while retaining the interpretability and generalizability associated with physics-based solvers. We illustrate the applicability of the proposed approach in solving time-dependent uncertainty quantification problems due to randomness in the initial condition. Four benchmark uncertainty quantification and reliability analysis examples from various fields of science and engineering are solved using the proposed approach. The results presented illustrate interesting features of the proposed approach.
Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling
paper_authors: Henrik Christiansen, Federico Errica, Francesco Alesiani
for: This work aims to automatically tune the parameters of Hamiltonian Monte Carlo so that phase space is explored quickly.
methods: The study uses a fully differentiable setup and backpropagation to optimize the parameters. Furthermore, an attention-like loss is defined to allow for gradient-driven learning of the distribution of integration steps.
results: Experiments on the one-dimensional harmonic oscillator and alanine dipeptide show a good correspondence between the proposed loss and the autocorrelation times, yielding well-tuned parameters for Hamiltonian Monte Carlo.
Abstract
The performance of Hamiltonian Monte Carlo crucially depends on its parameters, in particular the integration timestep and the number of integration steps. We present an adaptive general-purpose framework to automatically tune these parameters based on a loss function which promotes the fast exploration of phase-space. For this, we make use of a fully-differentiable set-up and use backpropagation for optimization. An attention-like loss is defined which allows for the gradient driven learning of the distribution of integration steps. We also highlight the importance of jittering for a smooth loss-surface. Our approach is demonstrated for the one-dimensional harmonic oscillator and alanine dipeptide, a small protein common as a test-case for simulation methods. We find a good correspondence between our loss and the autocorrelation times, resulting in well-tuned parameters for Hamiltonian Monte Carlo.
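For reference, here is a bare-bones HMC sampler for the one-dimensional harmonic oscillator, exposing the two quantities the paper tunes (the integration timestep and the number of leapfrog steps); this is the standard algorithm, not the authors' differentiable tuning scheme.

```python
import numpy as np

def potential(q):        # 1D harmonic oscillator, U(q) = q^2 / 2
    return 0.5 * q**2

def grad_potential(q):
    return q

def hmc_step(q, timestep, n_steps, rng):
    """One Hamiltonian Monte Carlo step with leapfrog integration."""
    p = rng.normal()                                  # resample momentum
    q_new, p_new = q, p
    p_new -= 0.5 * timestep * grad_potential(q_new)   # half step in momentum
    for _ in range(n_steps):
        q_new += timestep * p_new                     # full step in position
        p_new -= timestep * grad_potential(q_new)     # full step in momentum
    p_new += 0.5 * timestep * grad_potential(q_new)   # correct the last momentum step
    # Metropolis accept/reject on the change in total energy.
    dH = (potential(q_new) + 0.5 * p_new**2) - (potential(q) + 0.5 * p**2)
    return q_new if rng.random() < np.exp(-dH) else q

rng = np.random.default_rng(0)
q, samples = 0.0, []
for _ in range(5000):
    q = hmc_step(q, timestep=0.3, n_steps=10, rng=rng)
    samples.append(q)
print(np.mean(samples), np.var(samples))   # approx 0 and 1 for the unit Gaussian target
```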
Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
results: The study shows that the lower bounds on the learning error from existing theory do not apply under the data heterogeneity found in practical scenarios, and proves a new lower bound. In addition, a robust variant of distributed gradient descent is proposed, and experiments validate the analysis.
Abstract
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.
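For reference, (G,B)-gradient dissimilarity is commonly formalized as a bound on how far the local gradients may deviate, on average, from the global gradient (notation paraphrased; the paper's exact definition may differ in details):

```latex
% (G,B)-gradient dissimilarity: for all model parameters \theta,
\frac{1}{n} \sum_{i=1}^{n} \bigl\| \nabla f_i(\theta) - \nabla f(\theta) \bigr\|^2
\;\le\; G^2 + B^2 \bigl\| \nabla f(\theta) \bigr\|^2 ,
\qquad \text{where } f(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta)
\text{ and } f_i \text{ is the local loss of worker } i.
```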
Physics Informed Neural Network Code for 2D Transient Problems (PINN-2DT) Compatible with Google Colab
paper_authors: Paweł Maczuga, Maciej Skoczeń, Przemysław Rożnawski, Filip Tłuszcz, Marcin Szubert, Marcin Łoś, Witold Dzwinel, Keshav Pingali, Maciej Paszyński
for: The paper presents an open-source Physics Informed Neural Network (PINN) environment for simulating transient phenomena on two-dimensional rectangular domains.
methods: The PINN environment uses a neural network to solve time-dependent partial differential equations (PDEs) and supports various boundary conditions, including Neumann and Dirichlet conditions. It also allows for customization of the number of layers and neurons per layer, as well as for arbitrary activation functions.
results: The PINN environment provides a simple interface for defining the residual loss, boundary condition, and initial loss, together with their weights. It also includes a library of problems, such as non-stationary heat transfer, wave equation modeling a tsunami, atmospheric simulations including thermal inversion, and tumor growth simulations.
Abstract
We present an open-source Physics Informed Neural Network environment for simulations of transient phenomena on two-dimensional rectangular domains, with the following features: (1) it is compatible with Google Colab which allows automatic execution on cloud environment; (2) it supports two dimensional time-dependent PDEs; (3) it provides simple interface for definition of the residual loss, boundary condition and initial loss, together with their weights; (4) it support Neumann and Dirichlet boundary conditions; (5) it allows for customizing the number of layers and neurons per layer, as well as for arbitrary activation function; (6) the learning rate and number of epochs are available as parameters; (7) it automatically differentiates PINN with respect to spatial and temporal variables; (8) it provides routines for plotting the convergence (with running average), initial conditions learnt, 2D and 3D snapshots from the simulation and movies (9) it includes a library of problems: (a) non-stationary heat transfer; (b) wave equation modeling a tsunami; (c) atmospheric simulations including thermal inversion; (d) tumor growth simulations.
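Below is a minimal PyTorch sketch of the core PINN ingredient described above — differentiating the network with respect to space and time to form residual and initial-condition losses — for the 2D transient heat equation $u_t = \alpha(u_{xx}+u_{yy})$; it illustrates the idea only and is not the PINN-2DT code or its interface (the diffusivity, sampling, and initial condition are assumptions, and the boundary-condition loss is omitted for brevity).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
alpha = 0.1                                              # thermal diffusivity (assumed value)

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))                    # u(t, x, y)

def pde_residual(txy):
    """Residual of u_t - alpha * (u_xx + u_yy) at collocation points txy = (t, x, y)."""
    txy = txy.clone().requires_grad_(True)
    u = net(txy)
    g = torch.autograd.grad(u, txy, torch.ones_like(u), create_graph=True)[0]
    u_t, u_x, u_y = g[:, 0:1], g[:, 1:2], g[:, 2:3]
    u_xx = torch.autograd.grad(u_x, txy, torch.ones_like(u_x), create_graph=True)[0][:, 1:2]
    u_yy = torch.autograd.grad(u_y, txy, torch.ones_like(u_y), create_graph=True)[0][:, 2:3]
    return u_t - alpha * (u_xx + u_yy)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    interior = torch.rand(256, 3)                                          # (t, x, y) in [0, 1]^3
    initial = torch.cat([torch.zeros(64, 1), torch.rand(64, 2)], dim=1)    # t = 0 points
    u0 = torch.sin(torch.pi * initial[:, 1:2]) * torch.sin(torch.pi * initial[:, 2:3])
    loss = (pde_residual(interior).pow(2).mean()          # residual loss
            + (net(initial) - u0).pow(2).mean())          # initial-condition loss
    # (a Dirichlet/Neumann boundary loss would be added here in the same way)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```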
Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution
results: SEvo improves recommender-system performance and can be easily integrated with existing optimizers. In particular, it delivers consistent performance improvements across different models and datasets.
Abstract
Embedding plays a critical role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike GNN (Graph Neural Network) that typically serves as an intermediate part, SEvo is able to directly inject the graph structure information into embedding with negligible computational overhead in training. The convergence properties of SEvo as well as its possible variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structure information beyond explicit GNN modules.
Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities
results: Experimental results show that, when tackling unlimited staleness, the proposed approach improves the trained model accuracy by up to 20% and speeds up FL training progress by up to 35%.
Abstract
The efficiency of Federated Learning (FL) is often affected by both data and device heterogeneities. Data heterogeneity is defined as the heterogeneity of data distributions on different clients. Device heterogeneity is defined as the clients' variant latencies in uploading their local model updates due to heterogeneous conditions of local hardware resources, and causes the problem of staleness when being addressed by asynchronous FL. Traditional schemes of tackling the impact of staleness consider data and device heterogeneities as two separate and independent aspects in FL, but this assumption is unrealistic in many practical FL scenarios where data and device heterogeneities are intertwined. In these cases, traditional schemes of weighted aggregation in FL have been proved to be ineffective, and a better approach is to convert a stale model update into a non-stale one. In this paper, we present a new FL framework that leverages the gradient inversion technique for such conversion, hence efficiently tackling unlimited staleness in clients' model updates. Our basic idea is to use gradient inversion to get estimations of clients' local training data from their uploaded stale model updates, and use these estimations to compute non-stale client model updates. In this way, we address the problem of possible data quality drop when using gradient inversion, while still preserving the clients' local data privacy. We compared our approach with the existing FL strategies on mainstream datasets and models, and experiment results demonstrate that when tackling unlimited staleness, our approach can significantly improve the trained model accuracy by up to 20% and speed up the FL training progress by up to 35%.
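The primitive the framework builds on, gradient inversion, can be sketched as optimizing dummy inputs and labels so that their gradient matches a received (stale) update. The toy example below inverts a single example's gradient for a small linear classifier; it illustrates the general technique, not the paper's conversion procedure or its privacy-preserving refinements.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(8, 3)                     # toy client model
x_true = torch.randn(1, 8)
y_true = torch.tensor([2])

# Gradient the (stale) client would have uploaded.
loss_true = F.cross_entropy(model(x_true), y_true)
target_grads = torch.autograd.grad(loss_true, model.parameters())

# Gradient inversion: optimize dummy data/labels to match the uploaded gradient.
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 3, requires_grad=True)   # soft labels
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    loss_dummy = torch.sum(-F.softmax(y_dummy, -1) * F.log_softmax(model(x_dummy), -1))
    dummy_grads = torch.autograd.grad(loss_dummy, model.parameters(), create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, target_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(50):
    opt.step(closure)
print(torch.norm(x_dummy.detach() - x_true))   # small if the inversion succeeded
```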
Data-Driven Modeling of an Unsaturated Bentonite Buffer Model Test Under High Temperatures Using an Enhanced Axisymmetric Reproducing Kernel Particle Method
methods: A deep neural network (DNN) is used to model the soil-water retention curve of bentonite, and the model is integrated into a Reproducing Kernel Particle Method (RKPM) for THM simulations.
results: By modeling a tank-scale experiment in which a bentonite layer is subjected to central heating, new axisymmetric Reproducing Kernel basis functions are developed to better capture the THM behavior of the bentonite buffer.
Abstract
In deep geological repositories for high level nuclear waste with close canister spacings, bentonite buffers can experience temperatures higher than 100 °C. In this range of extreme temperatures, phenomenological constitutive laws face limitations in capturing the thermo-hydro-mechanical (THM) behavior of the bentonite, since the pre-defined functional constitutive laws often lack generality and flexibility to capture a wide range of complex coupling phenomena as well as the effects of stress state and path dependency. In this work, a deep neural network (DNN)-based soil-water retention curve (SWRC) of bentonite is introduced and integrated into a Reproducing Kernel Particle Method (RKPM) for conducting THM simulations of the bentonite buffer. The DNN-SWRC model incorporates temperature as an additional input variable, allowing it to learn the relationship between suction and degree of saturation under the general non-isothermal condition, which is difficult to represent using a phenomenological SWRC. For effective modeling of the tank-scale test, new axisymmetric Reproducing Kernel basis functions enriched with singular Dirichlet enforcement representing heater placement and an effective convective heat transfer coefficient representing thin-layer composite tank construction are developed. The proposed method is demonstrated through the modeling of a tank-scale experiment involving a cylindrical layer of MX-80 bentonite exposed to central heating.
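A minimal sketch of the DNN-SWRC idea — a small network mapping suction and temperature to degree of saturation — trained on synthetic data from a van Genuchten-type curve; the functional form, parameters, and temperature dependence are placeholders, not the MX-80 data or model used in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def synthetic_swrc(suction_kpa, temperature_c):
    """Placeholder van Genuchten-like curve with an assumed mild temperature effect."""
    a = 1e-3 * (1.0 + 0.002 * (temperature_c - 20.0))   # assumed T-dependence of alpha
    n = 1.3
    return (1.0 + (a * suction_kpa) ** n) ** -(1.0 - 1.0 / n)

# Training data: log-suction and (scaled) temperature as inputs, saturation as target.
suction = 10 ** (torch.rand(2000, 1) * 5)          # 1 .. 1e5 kPa
temp = 20.0 + 80.0 * torch.rand(2000, 1)           # 20 .. 100 deg C
inputs = torch.cat([torch.log10(suction), temp / 100.0], dim=1)
targets = synthetic_swrc(suction, temp)

model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                      nn.Linear(32, 32), nn.Tanh(),
                      nn.Linear(32, 1), nn.Sigmoid())   # degree of saturation in (0, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    opt.step()
print(float(loss))   # the fitted DNN-SWRC could then be queried inside a THM solver
```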