results: Favorable results.
Abstract
Detecting anomalies in a daily time series with a weekly pattern is a common task with a wide range of applications. A typical way of performing the task is by using a decomposition method. However, the method often generates false positives where a data point falls within its weekly range but is off from its weekday position. We refer to this type of anomaly as an "in-season anomaly", and propose a k-parameter approach to address the issue. The approach provides configurable extra tolerance for in-season anomalies to suppress misleading alerts while preserving real positives. It yields favorable results.
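The abstract does not spell out the exact form of the k-parameter rule, so the following is a minimal numpy sketch under one plausible reading: a per-weekday tolerance band widened by k times the spread across weekday levels, so points that merely drift toward another weekday's level are forgiven while out-of-range points are still flagged. The synthetic series, the z threshold, and the specific rule z * std + k * range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily series with a weekly pattern: 52 weeks of data (Mon..Sun).
weekly_pattern = np.array([100, 95, 97, 99, 98, 140, 150], dtype=float)
series = np.tile(weekly_pattern, 52) + rng.normal(0, 3, 52 * 7)
series[200] = 135  # "in-season": within the weekly range, off for its weekday
series[300] = 250  # genuine anomaly: outside the weekly range entirely

def detect(series, k=0.0, z=3.0):
    """Flag points whose residual from the weekday median exceeds
    z * (weekday std) + k * (spread across weekday levels). k > 0 adds
    extra tolerance for in-season deviations."""
    x = series.reshape(-1, 7)
    med = np.median(x, axis=0)              # per-weekday level
    std = x.std(axis=0)                     # per-weekday spread
    week_range = med.max() - med.min()      # spread across weekdays
    flags = np.abs(x - med) > z * std + k * week_range
    return np.flatnonzero(flags.ravel())

print("k=0.0:", detect(series, k=0.0))  # flags 300, the in-season point 200,
                                        # and possibly a few noise hits
print("k=0.6:", detect(series, k=0.6))  # extra tolerance suppresses 200;
                                        # the true anomaly 300 remains flagged
```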
results: The paper proves a trichotomy determined by a combination of the VC dimension and the Littlestone dimension, and provides a variety of bounds, including a new lower bound that improves on the previously known one. The results also extend to multiclass classification and the agnostic setting.
Abstract
We present new upper and lower bounds on the number of learner mistakes in the `transductive' online learning setting of Ben-David, Kushilevitz and Mansour (1997). This setting is similar to standard online learning, except that the adversary fixes a sequence of instances $x_1,\dots,x_n$ to be labeled at the start of the game, and this sequence is known to the learner. Qualitatively, we prove a trichotomy, stating that the minimal number of mistakes made by the learner as $n$ grows can take only one of precisely three possible values: $n$, $\Theta\left(\log (n)\right)$, or $\Theta(1)$. Furthermore, this behavior is determined by a combination of the VC dimension and the Littlestone dimension. Quantitatively, we show a variety of bounds relating the number of mistakes to well-known combinatorial dimensions. In particular, we improve the known lower bound on the constant in the $\Theta(1)$ case from $\Omega\left(\sqrt{\log(d)}\right)$ to $\Omega(\log(d))$ where $d$ is the Littlestone dimension. Finally, we extend our results to cover multiclass classification and the agnostic setting.
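As a concrete illustration of the setting (not the paper's new bounds), the classic Halving algorithm already makes at most log2(|H|) mistakes on any realizable labeled sequence over a finite hypothesis class, which is one way the logarithmic and constant regimes can arise. A toy sketch with a threshold class; the class and the fixed instance sequence are illustrative:

```python
import math

# Transductive online learning toy: the instance sequence is fixed and known
# in advance; labels are revealed one at a time. With a finite hypothesis
# class and realizable labels, Halving makes at most log2(|H|) mistakes.
xs = list(range(10))                                  # known sequence x_1..x_n
H = [lambda x, t=t: int(x >= t) for t in range(11)]   # thresholds on 0..10
target = H[6]                                         # adversary's realizable labeling

version_space = list(H)
mistakes = 0
for x in xs:
    votes = sum(h(x) for h in version_space)
    pred = int(votes * 2 >= len(version_space))       # majority vote
    y = target(x)
    if pred != y:
        mistakes += 1
    # Keep only hypotheses consistent with the revealed label.
    version_space = [h for h in version_space if h(x) == y]

print(f"mistakes={mistakes}, bound=log2(|H|)={math.log2(11):.2f}")
```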
A comprehensive analysis of concept drift locality in data streams
For: This study investigates concept drift detection for effective model adaptation in online learning.* Methods: The study conducts a comparative assessment of 9 existing concept drift detectors across benchmark problems of varying difficulty.* Results: The study finds that drift locality and scale substantially influence classifier performance, and proposes the best adaptation strategies for different drift categories.
Abstract
Adapting to drifting data streams is a significant challenge in online learning. Concept drift must be detected for effective model adaptation to evolving data properties. Concept drift can impact the data distribution entirely or partially, which makes it difficult for drift detectors to accurately identify the concept drift. Despite the numerous concept drift detectors in the literature, standardized procedures and benchmarks for comprehensive evaluation considering the locality of the drift are lacking. We present a novel categorization of concept drift based on its locality and scale. A systematic approach leads to a set of 2,760 benchmark problems, reflecting various difficulty levels following our proposed categorization. We conduct a comparative assessment of 9 state-of-the-art drift detectors across diverse difficulties, highlighting their strengths and weaknesses for future research. We examine how drift locality influences the classifier performance and propose strategies for different drift categories to minimize the recovery time. Lastly, we provide lessons learned and recommendations for future concept drift research. Our benchmark data streams and experiments are publicly available at https://github.com/gabrieljaguiar/locality-concept-drift.
A statistical perspective on algorithm unrolling models for inverse problems
results: The paper shows that the unrolling depth needed for the optimal statistical performance of the Gradient Descent Network (GDN) is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is the sample size and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. Moreover, when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$.
Abstract
We consider inverse problems where the conditional distribution of the observation ${\bf y}$ given the latent variable of interest ${\bf x}$ (also known as the forward model) is known, and we have access to a data set in which multiple instances of ${\bf x}$ and ${\bf y}$ are both observed. In this context, algorithm unrolling has become a very popular approach for designing state-of-the-art deep neural network architectures that effectively exploit the forward model. We analyze the statistical complexity of the gradient descent network (GDN), an algorithm unrolling architecture driven by proximal gradient descent. We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is the sample size, and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. We also show that when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, then a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$. Our results thus also suggest that algorithm unrolling models are prone to overfitting as the unrolling depth $D'$ increases. We provide several examples to illustrate these results.
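A minimal sketch of the architecture being analyzed: an unrolled proximal gradient (ISTA-style) network for a sparse inverse problem, where each "layer" is one gradient step on the data-fit term followed by a proximal map. Here the step size and threshold are fixed analytically rather than learned, and the forward model is synthetic; in a trained GDN these would be learnable parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward model y = A x + noise, with sparse latent x (prox = soft threshold).
n_obs, n_lat = 50, 100
A = rng.normal(size=(n_obs, n_lat)) / np.sqrt(n_obs)
x_true = np.zeros(n_lat)
x_true[rng.choice(n_lat, 5, replace=False)] = rng.normal(size=5)
y = A @ x_true + 0.01 * rng.normal(size=n_obs)

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def gdn(y, A, depth, lam=0.05):
    """Unrolled proximal gradient descent: each 'layer' is one ISTA step.
    In a learned GDN the step size and threshold would be trainable."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, the classical step size
    x = np.zeros(A.shape[1])
    for _ in range(depth):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x

for D in (1, 5, 20, 80):
    err = np.linalg.norm(gdn(y, A, D) - x_true)
    print(f"depth {D:3d}: ||x_hat - x_true|| = {err:.4f}")
```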
Theory and implementation of inelastic Constitutive Artificial Neural Networks
paper_authors: Hagen Holthusen, Lukas Lamm, Tim Brepols, Stefanie Reese, Ellen Kuhl
For: The paper aims to extend Constitutive Artificial Neural Networks (CANNs) to model the inelastic behavior of materials.* Methods: The paper uses a combination of feed-forward networks of the free energy and pseudo potential with a recurrent neural network approach to take time dependencies into account.* Results: The paper demonstrates that the iCANN is capable of autonomously discovering models for artificially generated data, the response of polymers for cyclic loading, and the relaxation behavior of muscle data.
Abstract
Nature has always been our inspiration in the research, design and development of materials and has driven us to gain a deep understanding of the mechanisms that characterize anisotropy and inelastic behavior. All this knowledge has been accumulated in the principles of thermodynamics. Deduced from these principles, the multiplicative decomposition combined with pseudo potentials are powerful and universal concepts. Simultaneously, the tremendous increase in computational performance enabled us to investigate and rethink our history-dependent material models to make the most of our predictions. Today, we have reached a point where materials and their models are becoming increasingly sophisticated. This raises the question: How do we find the best model that includes all inelastic effects to explain our complex data? Constitutive Artificial Neural Networks (CANN) may answer this question. Here, we extend the CANNs to inelastic materials (iCANN). Rigorous considerations of objectivity, rigid motion of the reference configuration, multiplicative decomposition and its inherent non-uniqueness, restrictions of energy and pseudo potential, and consistent evolution guide us towards the architecture of the iCANN satisfying thermodynamics per design. We combine feed-forward networks of the free energy and pseudo potential with a recurrent neural network approach to take time dependencies into account. We demonstrate that the iCANN is capable of autonomously discovering models for artificially generated data, the response of polymers for cyclic loading and the relaxation behavior of muscle data. As the design of the network is not limited to visco-elasticity, our vision is that the iCANN will reveal to us new ways to find the various inelastic phenomena hidden in the data and to understand their interaction. Our source code, data, and examples are available at doi.org/10.5281/zenodo.10066805
Higher-Order Newton Methods with Polynomial Work per Iteration
results: The method has local convergence of order $d$, resulting in lower oracle complexity than the classical Newton method. Numerical examples show that the basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, a modified algorithm is presented, again with polynomial cost per iteration, that is globally convergent and retains local convergence of order $d$.
Abstract
We present generalizations of Newton's method that incorporate derivatives of an arbitrary order $d$ but maintain a polynomial dependence on dimension in their cost per iteration. At each step, our $d^{\text{th}}$-order method uses semidefinite programming to construct and minimize a sum of squares-convex approximation to the $d^{\text{th}}$-order Taylor expansion of the function we wish to minimize. We prove that our $d^{\text{th}}$-order method has local convergence of order $d$. This results in lower oracle complexity compared to the classical Newton method. We show on numerical examples that basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, we present a modified algorithm, again with polynomial cost per iteration, which is globally convergent and has local convergence of order $d$.
Blockchain-Enabled Federated Learning Approach for Vehicular Networks
results: The method maintains high accuracy (91.92%) even under malicious-vehicle attacks, offering greater security and reliability than other decentralized federated learning techniques.
Abstract
Data from interconnected vehicles may contain sensitive information such as location, driving behavior, personal identifiers, etc. Without adequate safeguards, sharing this data jeopardizes data privacy and system security. The current centralized data-sharing paradigm in these systems raises particular concerns about data privacy. Recognizing these challenges, the shift towards decentralized interactions in technology, as echoed by the principles of Industry 5.0, becomes paramount. This work is closely aligned with these principles, emphasizing decentralized, human-centric, and secure technological interactions in an interconnected vehicular ecosystem. To embody this, we propose a practical approach that merges two emerging technologies: Federated Learning (FL) and Blockchain. The integration of these technologies enables the creation of a decentralized vehicular network. In this setting, vehicles can learn from each other without compromising privacy while also ensuring data integrity and accountability. Initial experiments show that compared to conventional decentralized federated learning techniques, our proposed approach significantly enhances the performance and security of vehicular networks. The system's accuracy stands at 91.92\%. While this may appear to be low in comparison to state-of-the-art federated learning models, our work is noteworthy because, unlike others, it was achieved in a malicious vehicle setting. Despite the challenging environment, our method maintains high accuracy, making it a competent solution for preserving data privacy in vehicular networks.
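The abstract does not detail the consensus or ledger design, so the following is only a conceptual sketch: federated averaging over simulated vehicle updates, with each update appended to a tamper-evident SHA-256 hash chain standing in for the blockchain. The toy linear "models" and the chain format are assumptions for illustration.

```python
import hashlib
import json
import numpy as np

rng = np.random.default_rng(0)

def block_hash(prev_hash, payload):
    return hashlib.sha256((prev_hash + json.dumps(payload)).encode()).hexdigest()

# Each vehicle holds a toy local model; weights stand in for its local learner.
global_w = np.zeros(3)
ledger = [{"hash": "0" * 64}]  # genesis block

for rnd in range(3):
    updates = []
    for vehicle in range(5):
        local_w = global_w + rng.normal(0, 0.1, 3)  # simulated local training
        payload = {"round": rnd, "vehicle": vehicle, "w": local_w.round(6).tolist()}
        ledger.append({"hash": block_hash(ledger[-1]["hash"], payload), **payload})
        updates.append(local_w)
    global_w = np.mean(updates, axis=0)   # FedAvg aggregation

# Tamper-evidence: recompute the chain and compare hashes.
ok = all(
    ledger[i]["hash"] == block_hash(ledger[i - 1]["hash"],
                                    {k: v for k, v in ledger[i].items() if k != "hash"})
    for i in range(1, len(ledger))
)
print("chain valid:", ok, "| global w:", global_w.round(3))
```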
The AeroSonicDB (YPAD-0523) Dataset for Acoustic Detection and Classification of Aircraft
results: Baseline results are presented for three binary classification models, and the limitations of the current dataset and its future potential are discussed.
Abstract
The time and expense required to collect and label audio data has been a prohibitive factor in the availability of domain specific audio datasets. As the predictive specificity of a classifier depends on the specificity of the labels it is trained on, it follows that finely-labelled datasets are crucial for advances in machine learning. Aiming to stimulate progress in the field of machine listening, this paper introduces AeroSonicDB (YPAD-0523), a dataset of low-flying aircraft sounds for training acoustic detection and classification systems. This paper describes the method of exploiting ADS-B radio transmissions to passively collect and label audio samples, provides a summary of the collated dataset, presents baseline results from three binary classification models, and then discusses the limitations of the current dataset and its future potential. The dataset contains 625 aircraft recordings ranging in event duration from 18 to 60 seconds, for a total of 8.87 hours of aircraft audio. These 625 samples feature 301 unique aircraft, each of which is supplied with 14 supplementary (non-acoustic) labels to describe the aircraft. The dataset also contains 3.52 hours of ambient background audio ("silence"), as a means to distinguish aircraft noise from other local environmental noises. Additionally, 6 hours of urban soundscape recordings (with aircraft annotations) are included as an ancillary method for evaluating model performance, and to provide a testing ground for real-time applications.
CALLOC: Curriculum Adversarial Learning for Secure and Robust Indoor Localization
results: Experiments show that CALLOC improves accuracy across diverse indoor environments, mobile devices, and attack scenarios, achieving up to 6.03x lower mean error and 4.6x lower worst-case error than state-of-the-art indoor localization frameworks.
Abstract
Indoor localization has become increasingly vital for many applications from tracking assets to delivering personalized services. Yet, achieving pinpoint accuracy remains a challenge due to variations across indoor environments and devices used to assist with localization. Another emerging challenge is adversarial attacks on indoor localization systems that not only threaten service integrity but also reduce localization accuracy. To combat these challenges, we introduce CALLOC, a novel framework designed to resist adversarial attacks and variations across indoor environments and devices that reduce system accuracy and reliability. CALLOC employs a novel adaptive curriculum learning approach with a domain specific lightweight scaled-dot product attention neural network, tailored for adversarial and variation resilience in practical use cases with resource constrained mobile devices. Experimental evaluations demonstrate that CALLOC can achieve improvements of up to 6.03x in mean error and 4.6x in worst-case error against state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and adversarial attack scenarios.
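For reference, the scaled dot-product attention that CALLOC's lightweight network builds on has the compact closed form softmax(QK^T / sqrt(d_k)) V. A minimal numpy sketch follows; the domain-specific design and curriculum adversarial training are not reproduced, and the toy RSSI-token embedding is an assumption:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
# Toy: 8 Wi-Fi RSSI "tokens" embedded in 16 dims, attended over for localization.
X = rng.normal(size=(8, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) / 4 for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (8, 16)
```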
Compact Matrix Quantum Group Equivariant Neural Networks
for: To study neural networks that learn from data with an underlying quantum symmetry.
methods: The paper applies the Woronowicz formulation of Tannaka-Krein duality to characterise the weight matrices that appear in these neural networks for any easy compact matrix quantum group.
results: The paper derives a new type of neural network, the compact matrix quantum group equivariant neural network, that learns from data with an underlying quantum symmetry. It shows that these networks contain all compact matrix group equivariant neural networks as a subclass, and obtains characterisations of the weight matrices for many compact matrix group equivariant neural networks that have not previously appeared in the machine learning literature.
Abstract
We derive the existence of a new type of neural network, called a compact matrix quantum group equivariant neural network, that learns from data that has an underlying quantum symmetry. We apply the Woronowicz formulation of Tannaka-Krein duality to characterise the weight matrices that appear in these neural networks for any easy compact matrix quantum group. We show that compact matrix quantum group equivariant neural networks contain, as a subclass, all compact matrix group equivariant neural networks. Moreover, we obtain characterisations of the weight matrices for many compact matrix group equivariant neural networks that have not previously appeared in the machine learning literature.
EVORA: Deep Evidential Traversability Learning for Risk-Aware Off-Road Autonomy
paper_authors: Xiaoyi Cai, Siddharth Ancha, Lakshay Sharma, Philip R. Osteen, Bernadette Bucher, Stephen Phillips, Jiuguang Wang, Michael Everett, Nicholas Roy, Jonathan P. How
for: To improve a robot's ability to traverse terrain with good traction for fast off-road navigation, especially on unfamiliar terrain.
methods: The study uses self-supervised learning to learn terrain traction properties directly from data, rather than manually designing costs based on terrain features.
results: The study proposes a method that efficiently quantifies and mitigates risk by learning discrete traction distributions and probability densities of the traction predictor's latent features, together with a novel uncertainty-aware loss function; these improve learning accuracy and navigation performance.
Abstract
Traversing terrain with good traction is crucial for achieving fast off-road navigation. Instead of manually designing costs based on terrain features, existing methods learn terrain properties directly from data via self-supervision, but challenges remain to properly quantify and mitigate risks due to uncertainties in learned models. This work efficiently quantifies both aleatoric and epistemic uncertainties by learning discrete traction distributions and probability densities of the traction predictor's latent features. Leveraging evidential deep learning, we parameterize Dirichlet distributions with the network outputs and propose a novel uncertainty-aware squared Earth Mover's distance loss with a closed-form expression that improves learning accuracy and navigation performance. The proposed risk-aware planner simulates state trajectories with the worst-case expected traction to handle aleatoric uncertainty, and penalizes trajectories moving through terrain with high epistemic uncertainty. Our approach is extensively validated in simulation and on wheeled and quadruped robots, showing improved navigation performance compared to methods that assume no slip, assume the expected traction, or optimize for the worst-case expected cost.
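For 1-D ordered bins, the squared Earth Mover's distance between two discrete distributions reduces to the sum of squared differences of their CDFs, which is presumably the kind of closed form the proposed loss exploits. A sketch on hypothetical traction bins, without the paper's Dirichlet-based uncertainty weighting:

```python
import numpy as np

def squared_emd(p, q):
    """Squared EMD between discrete distributions over ordered 1-D bins:
    closed form as the sum of squared differences of their CDFs."""
    return np.sum((np.cumsum(p) - np.cumsum(q)) ** 2)

# Discrete traction distributions over 5 ordered bins (low -> high traction).
target = np.array([0.0, 0.1, 0.6, 0.3, 0.0])
pred_near = np.array([0.0, 0.2, 0.5, 0.3, 0.0])   # mass near the target bins
pred_far = np.array([0.7, 0.3, 0.0, 0.0, 0.0])    # mass in distant bins

print("near:", squared_emd(pred_near, target))   # small: mistakes in nearby bins
print("far :", squared_emd(pred_far, target))    # large: distant bins penalized
# Unlike cross-entropy, EMD respects the bin ordering, which matters when
# bins encode physically ordered traction levels.
```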
Learning material synthesis-structure-property relationship by data fusion: Bayesian Co-regionalization N-Dimensional Piecewise Function Learning
paper_authors: A. Gilad Kusne, Austin McDannald, Brian DeCost
For: To advance next-generation technologies such as quantum computing, carbon capture, and low-cost medical imaging.* Methods: The researchers use knowledge of the material synthesis-structure-property relationship together with data fusion, merging knowledge gathered across instruments, measurement modalities, and laboratories.* Results: The researchers propose the Synthesis-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm, which fuses knowledge across data sources to learn synthesis-structure-property relationships.
Abstract
Advanced materials are needed to further next-generation technologies such as quantum computing, carbon capture, and low-cost medical imaging. However, advanced materials discovery is confounded by two fundamental challenges: the challenge of a high-dimensional, complex materials search space and the challenge of combining knowledge, i.e., data fusion across instruments and labs. To overcome the first challenge, researchers employ knowledge of the underlying material synthesis-structure-property relationship, as a material's structure is often predictive of its functional property and vice versa. For example, optimal materials often occur along composition-phase boundaries or within specific phase regions. Additionally, knowledge of the synthesis-structure-property relationship is fundamental to understanding underlying physical mechanisms. However, quantifying the synthesis-structure-property relationship requires overcoming the second challenge. Researchers must merge knowledge gathered across instruments, measurement modalities, and even laboratories. We present the Synthesis-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-structure-property relationships.
Does Differential Privacy Prevent Backdoor Attacks in Practice?
paper_authors: Fereshteh Razmi, Jian Lou, Li Xiong
for: This paper aims to investigate the effectiveness of different differential privacy (DP) techniques in preventing backdoor attacks in machine learning (ML) models, specifically examining PATE and Label-DP.
methods: The paper employs DP-SGD and PATE to defend against backdoor attacks, and explores the role of different components of DP algorithms in defending against these attacks. The authors also propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE.
results: The experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms, and that Label-DP algorithms can be more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.
Abstract
Differential Privacy (DP) was originally developed to protect privacy. However, it has recently been utilized to secure machine learning (ML) models from poisoning attacks, with DP-SGD receiving substantial attention. Nevertheless, a thorough investigation is required to assess the effectiveness of different DP techniques in preventing backdoor attacks in practice. In this paper, we investigate the effectiveness of DP-SGD and, for the first time in literature, examine PATE in the context of backdoor attacks. We also explore the role of different components of DP algorithms in defending against backdoor attacks and will show that PATE is effective against these attacks due to the bagging structure of the teacher models it employs. Our experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms. Additionally, we propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE. We conclude that while Label-DP algorithms generally offer weaker privacy protection, accurate hyper-parameter tuning can make them more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.
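As background for the DP-SGD baseline discussed here, its core step is per-example gradient clipping followed by Gaussian noise. A minimal numpy sketch for logistic regression; the noise scale sigma is arbitrary rather than calibrated to a target (epsilon, delta) budget:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, sigma=1.0):
    """One DP-SGD step for logistic regression: clip each per-example
    gradient to L2 norm <= clip, sum, add Gaussian noise, then average."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    per_example = (p - y)[:, None] * X                       # per-example grads
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example / np.maximum(1.0, norms / clip)    # clip to norm <= C
    noisy = clipped.sum(0) + sigma * clip * rng.normal(size=w.shape)
    return w - lr * noisy / len(X)

X = rng.normal(size=(256, 5))
y = (X @ np.array([1., -2., 0.5, 0., 1.]) + 0.3 * rng.normal(size=256) > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print("train accuracy:", np.mean((X @ w > 0) == y))
```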
Differentiable VQ-VAE’s for Robust White Matter Streamline Encodings
results: Compared with several state-of-the-art autoencoder methods, the DVQ-VAE shows superior encoding and synthesis performance.
Abstract
Given the complex geometry of white matter streamlines, autoencoders have been proposed as a dimension-reduction tool to simplify the analysis of streamlines in a low-dimensional latent space. However, despite these recent successes, the majority of encoder architectures only perform dimension reduction on single streamlines as opposed to a full bundle of streamlines. This is a severe limitation of the encoder architecture that completely disregards the global geometric structure of streamlines at the expense of individual fibers. Moreover, the latent space may not be well structured, which casts doubt on its interpretability. In this paper we propose a novel Differentiable Vector Quantized Variational Autoencoder, which is engineered to ingest entire bundles of streamlines as a single data point and provides reliable, trustworthy encodings that can later be used to analyze streamlines in the latent space. Comparisons with several state-of-the-art autoencoders demonstrate superior performance in both encoding and synthesis.
Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication
results: The paper shows that the algorithm achieves logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret despite the asymmetry in reward information. This is asymptotically optimal in $T$. The algorithm also performs empirically better than the current state-of-the-art algorithm for this environment.
Abstract
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player simultaneously selects an action. Based on the actions selected by all players, the team of players receives a reward. The actions of all the players are commonly observed. However, each player receives a noisy version of the reward which cannot be shared with other players. Since players receive potentially different rewards, there is an asymmetry in the information used to select their actions. In this paper, we provide an algorithm based on upper and lower confidence bounds that the players can use to select their optimal actions despite the asymmetry in the reward information. We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret. This is asymptotically optimal in $T$. We also show that it performs empirically better than the current state-of-the-art algorithm for this environment.
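The abstract does not give the multiplayer algorithm's exact form, so the sketch below shows only the single-player UCB1 building block, upper confidence bounds computed from noisy rewards, that such confidence-bound methods extend to the cooperative setting:

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb1(means, T):
    """UCB1 on noisy rewards: play each arm once, then the arm with the
    highest empirical mean plus confidence radius sqrt(2 ln t / n)."""
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(ucb))
        r = means[a] + 0.1 * rng.normal()   # noisy reward observation
        counts[a] += 1
        sums[a] += r
        regret += max(means) - means[a]
        # In the cooperative setting, each player would run a rule like this
        # on its own noisy reward while all players observe the joint action.
    return regret

print("regret over T=5000:", ucb1([0.3, 0.5, 0.45], 5000))
```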
Time Scale Network: A Shallow Neural Network For Time Series Data
results: The Time Scale Network performs strongly in atrial dysfunction detection on ECG signals, with superior accuracy-per-parameter and accuracy-per-operation, fast training and inference speeds, and visualizable, interpretable learned patterns. It also achieves impressive performance in seizure prediction from EEG signals.
Abstract
Time series data is often composed of information at multiple time scales, particularly in biomedical data. While numerous deep learning strategies exist to capture this information, many make networks larger, require more data, are more demanding to compute, and are difficult to interpret. This limits their usefulness in real-world applications facing even modest computational or data constraints and can further complicate their translation into practice. We present a minimal, computationally efficient Time Scale Network combining the translation and dilation sequence used in discrete wavelet transforms with traditional convolutional neural networks and back-propagation. The network simultaneously learns features at many time scales for sequence classification with significantly reduced parameters and operations. We demonstrate advantages in Atrial Dysfunction detection including: superior accuracy-per-parameter and accuracy-per-operation, fast training and inference speeds, and visualization and interpretation of learned patterns in atrial dysfunction detection on ECG signals. We also demonstrate impressive performance in seizure prediction using EEG signals. Our network isolated a few time scales that could be strategically selected to achieve 90.9% accuracy using only 1,133 active parameters and consistently converged on pulsatile waveform shapes. This method does not rest on any constraints or assumptions regarding signal content and could be leveraged in any area of time series analysis dealing with signals containing features at many time scales.
Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models
paper_authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balu Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy
for: The paper is written to study the impact of greenhouse gases, warming, and ice sheet melting on the ocean, as well as the effects of ocean processes on phenomena such as hurricanes and droughts.
methods: The authors use a combination of idealized ocean models, perturbed parameter ensemble data, and surrogate neural network models to analyze the sensitivity of the model output to unmeasurable parameters.
results: The authors compute the parametric sensitivity of the one-step forward dynamics of the model, providing insights into the impact of unmeasurable parameters on the model output.
Abstract
Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, of which we then computed the parametric sensitivity.
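A toy stand-in for the workflow: fit a small neural surrogate to a synthetic one-step dynamics map built from a perturbed parameter ensemble, then estimate parametric sensitivity by central finite differences through the surrogate. The dynamics function and parameters below are invented for illustration, not the ocean model:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-in for one-step model dynamics: next_state = f(state, theta),
# where theta holds unmeasurable parameters perturbed in an ensemble.
def one_step(state, theta):
    return state + 0.01 * (theta[0] * np.sin(state) - theta[1] * state)

# Perturbed parameter ensemble training data.
states = rng.uniform(-2, 2, size=5000)
thetas = rng.uniform(0.5, 1.5, size=(5000, 2))
X = np.column_stack([states, thetas])
y = one_step(states, thetas.T)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                         random_state=0).fit(X, y)

# Parametric sensitivity d(next_state)/d(theta_i) via central differences.
x0 = np.array([[1.0, 1.0, 1.0]])   # (state, theta_0, theta_1)
eps = 1e-2
for i in (1, 2):
    xp, xm = x0.copy(), x0.copy()
    xp[0, i] += eps
    xm[0, i] -= eps
    sens = (surrogate.predict(xp) - surrogate.predict(xm)) / (2 * eps)
    print(f"d f / d theta_{i - 1}: surrogate estimate {sens[0]:+.4f}")
```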
Interpretable Graph Anomaly Detection using Gradient Attention Maps
paper_authors: Yifei Yang, Peng Wang, Xiaofan He, Dongmian Zou
for: The paper proposes an interpretability-based approach to graph anomaly detection that improves detection performance.
methods: The method uses the gradients of graph neural networks to derive an attention map, which serves as the basis for scoring anomalies.
results: The method consistently outperforms baseline approaches on multiple synthetic datasets and offers insight into its anomaly detection decision-making process.
Abstract
Detecting unusual patterns in graph data is a crucial task in data mining. However, existing methods often face challenges in consistently achieving satisfactory performance and lack interpretability, which hinders our understanding of anomaly detection decisions. In this paper, we propose a novel approach to graph anomaly detection that leverages the power of interpretability to enhance performance. Specifically, our method extracts an attention map derived from gradients of graph neural networks, which serves as a basis for scoring anomalies. In addition, we conduct theoretical analysis using synthetic data to validate our method and gain insights into its decision-making process. To demonstrate the effectiveness of our method, we extensively evaluate our approach against state-of-the-art graph anomaly detection techniques. The results consistently demonstrate the superior performance of our method compared to the baselines.
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias
paper_authors: Jiyoung Park, Ian Pelakh, Stephan Wojtowytsch
for: To study how shallow ReLU networks interpolate between known regions.
methods: The analysis shows that empirical risk minimizers converge to a minimum norm interpolant as the number of data points and parameters tends to infinity, when a weight decay regularizer is penalized with a coefficient that vanishes at a precise rate as the network width and the number of data points grow.
results: Numerical studies show that common optimization algorithms exhibit an implicit bias towards known minimum norm interpolants, both with and without explicit regularization.
Abstract
We investigate how shallow ReLU networks interpolate between known regions. Our analysis shows that empirical risk minimizers converge to a minimum norm interpolant as the number of data points and parameters tends to infinity when a weight decay regularizer is penalized with a coefficient which vanishes at a precise rate as the network width and the number of data points grow. With and without explicit regularization, we numerically study the implicit bias of common optimization algorithms towards known minimum norm interpolants.
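The linear analogue of this phenomenon is easy to verify directly: for underdetermined least squares, the ridge (weight decay) solution converges to the minimum-norm interpolant as the regularization coefficient vanishes. A worked numpy check (linear models, not the shallow ReLU networks the paper studies):

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined interpolation: more parameters (20) than data points (8),
# so infinitely many exact interpolants exist.
X = rng.normal(size=(8, 20))
y = rng.normal(size=8)

w_min_norm = np.linalg.pinv(X) @ y   # minimum L2-norm interpolant

for lam in (1e-1, 1e-3, 1e-6):
    # Ridge / weight decay: w = (X^T X + lam I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(20), X.T @ y)
    print(f"lam={lam:.0e}  ||w - w_min_norm|| = {np.linalg.norm(w - w_min_norm):.2e}")
# As lam -> 0, the explicitly regularized solution approaches the minimum
# norm interpolant -- the linear analogue of the paper's result.
```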
Distributionally Robust Skeleton Learning of Discrete Bayesian Networks
methods: Building on distributionally robust optimization and a regression approach, the method optimizes the most adverse (worst-case) risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
results: The approach applies to general categorical random variables without assuming faithfulness, an ordinal relationship, or a specific form of conditional distribution. Efficient algorithms are provided, with non-asymptotic guarantees under mild assumptions, and numerical studies validate the method's effectiveness. Code is available at https://github.com/DanielLeee/drslbn.
Abstract
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data. Building on distributionally robust optimization and a regression approach, we propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution. The worst-case risk accounts for the effect of outliers. The proposed approach applies for general categorical random variables without assuming faithfulness, an ordinal relationship or a specific form of conditional distribution. We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach. Under mild assumptions, we derive non-asymptotic guarantees for successful structure learning with logarithmic sample complexities for bounded-degree graphs. Numerical study on synthetic and real datasets validates the effectiveness of our method. Code is available at https://github.com/DanielLeee/drslbn.
Turbulence Scaling from Deep Learning Diffusion Generative Models
results: The newly generated turbulent flow solutions exhibit statistical scaling properties consistent with the expected Kolmogorov scaling, with lower errors than the training data, providing strong evidence that the model captures essential features of real-world turbulence.
Abstract
Complex spatial and temporal structures are inherent characteristics of turbulent fluid flows and comprehending them poses a major challenge. This comprehension necessitates an understanding of the space of turbulent fluid flow configurations. We employ a diffusion-based generative model to learn the distribution of turbulent vorticity profiles and generate snapshots of turbulent solutions to the incompressible Navier-Stokes equations. We consider the inverse cascade in two spatial dimensions and generate diverse turbulent solutions that differ from those in the training dataset. We analyze the statistical scaling properties of the new turbulent profiles, calculate their structure functions, energy power spectrum, velocity probability distribution function and moments of local energy dissipation. All the learnt scaling exponents are consistent with the expected Kolmogorov scaling and have lower errors than the training ones. This agreement with established turbulence characteristics provides strong evidence of the model's capability to capture essential features of real-world turbulence.
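One of the diagnostics mentioned, the energy power spectrum, can be computed by angle-averaging |u_k|^2 over wavenumber shells. A sketch applied to a synthetic random field with a prescribed Kolmogorov-like slope (a stand-in for a generated sample, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

def isotropic_energy_spectrum(u):
    """Angle-averaged energy spectrum E(k) of a 2-D periodic field:
    bin 0.5*|u_k|^2 by the nearest integer wavenumber magnitude."""
    n = u.shape[0]
    energy = 0.5 * np.abs(np.fft.fft2(u) / n**2) ** 2
    freq = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    kmag = np.sqrt(freq[:, None] ** 2 + freq[None, :] ** 2)
    kint = np.rint(kmag).astype(int).ravel()
    E = np.bincount(kint, weights=energy.ravel())
    return np.arange(len(E)), E

# Synthetic field with |u_k| ~ k^{-4/3}, so the shell-summed spectrum scales
# as E(k) ~ k * |u_k|^2 ~ k^{-5/3}, the Kolmogorov slope.
n = 128
freq = np.fft.fftfreq(n, d=1.0 / n)
k = np.sqrt(freq[:, None] ** 2 + freq[None, :] ** 2)
amp = np.where(k > 0, k, 1.0) ** (-4.0 / 3.0) * (k > 0)
u = np.real(np.fft.ifft2(amp * np.exp(2j * np.pi * rng.random((n, n)))))
ks, E = isotropic_energy_spectrum(u)
slope = np.polyfit(np.log(ks[2:30]), np.log(E[2:30]), 1)[0]
print(f"fitted spectral slope: {slope:.2f}  (Kolmogorov: -5/3 = -1.67)")
```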
An Interpretable Machine Learning Framework to Understand Bikeshare Demand before and during the COVID-19 Pandemic in New York City
results: Based on the relative importance of the explanatory variables considered in this study, share of female users and hour of day were the two most important variables in both models; however, the month variable had higher importance in the pandemic model than in the pre-pandemic model.
Abstract
In recent years, bikesharing systems have become increasingly popular as affordable and sustainable micromobility solutions. Advanced mathematical models such as machine learning are required to generate good forecasts for bikeshare demand. To this end, this study proposes a machine learning modeling framework to estimate hourly demand in a large-scale bikesharing system. Two Extreme Gradient Boosting models were developed: one using data from before the COVID-19 pandemic (March 2019 to February 2020) and the other using data from during the pandemic (March 2020 to February 2021). Furthermore, a model interpretation framework based on SHapley Additive exPlanations was implemented. Based on the relative importance of the explanatory variables considered in this study, share of female users and hour of day were the two most important explanatory variables in both models. However, the month variable had higher importance in the pandemic model than in the pre-pandemic model.
1-Lipschitz Neural Networks are more expressive with N-Activations
results: The paper shows that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments, unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. It also introduces the new N-activation function, which is provably more expressive than currently popular activation functions.
Abstract
A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system's inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at https://github.com/berndprach/NActivation.
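For context, MaxMin acts by sorting coordinate pairs, so it is 1-Lipschitz (indeed gradient-norm preserving); the sketch below implements it and checks the Lipschitz bound numerically. The paper's N-activation is not specified in the abstract and is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def maxmin(x):
    """MaxMin activation: pair up coordinates and output (max, min) of each
    pair. A permutation within each pair, hence 1-Lipschitz. Assumes the
    input dimension is even."""
    a, b = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = np.maximum(a, b)
    out[1::2] = np.minimum(a, b)
    return out

# Empirical 1-Lipschitz check: ||f(x) - f(y)|| <= ||x - y|| on random pairs.
ratios = []
for _ in range(10000):
    x, y = rng.normal(size=8), rng.normal(size=8)
    ratios.append(np.linalg.norm(maxmin(x) - maxmin(y)) / np.linalg.norm(x - y))
print("max observed Lipschitz ratio:", max(ratios))  # never exceeds 1.0
```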
Symbolic Regression as Feature Engineering Method for Machine and Deep Learning Regression Tasks
results: SR-derived features significantly enhance the predictive accuracy of both machine and deep learning regression models, improving root mean square error (RMSE) by 34-86% on synthetic datasets and 4-11.5% on real-world datasets, and improving the prediction of superconducting critical temperatures by more than 20% in terms of RMSE in a realistic use-case.
Abstract
In the realm of machine and deep learning regression tasks, the role of effective feature engineering (FE) is pivotal in enhancing model performance. Traditional approaches of FE often rely on domain expertise to manually design features for machine learning models. In the context of deep learning models, the FE is embedded in the neural network's architecture, making it hard for interpretation. In this study, we propose to integrate symbolic regression (SR) as an FE process before a machine learning model to improve its performance. We show, through extensive experimentation on synthetic and real-world physics-related datasets, that the incorporation of SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models with 34-86% root mean square error (RMSE) improvement in synthetic datasets and 4-11.5% improvement in real-world datasets. In addition, as a realistic use-case, we show the proposed method improves the machine learning performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models.
Doubly Robust Structure Identification from Temporal Data
results: Experimental results show that the method has clear advantages on noisy data and data with cycles, and accurately recovers the true underlying causal structure.
Abstract
Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications. Moreover, existing algorithms often do not work with cyclic data. To address these challenges, we propose a new doubly robust method for Structure Identification from Temporal Data ( SITD ). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles and they may be confounded. We further perform extensive experiments to showcase the superior performance of our method.
Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes
paper_authors: Jinhao Gu, Ángel F. García-Fernández, Robert E. Firth, Lennart Svensson
for: This paper proposes a metric to measure the dissimilarity between graphs with different numbers of nodes.
methods: The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric for sets to graphs, and includes costs associated with node attribute errors, missed and false nodes, and edge mismatches between graphs.
results: The metric is computable in polynomial time using linear programming, and its properties are demonstrated via simulated and empirical datasets.
Abstract
This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes. The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric, which is a metric for sets, to graphs. The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatches between graphs. The computation of this metric is based on finding the optimal assignments between nodes in the two graphs, with the possibility of leaving some of the nodes unassigned. We also propose a lower bound for the metric, which is also a metric for graphs and is computable in polynomial time using linear programming. The metric is first derived for undirected unweighted graphs and it is then extended to directed and weighted graphs. The properties of the metric are demonstrated via simulated and empirical datasets.
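A simplified sketch in the spirit of the metric: an optimal node assignment with attribute costs capped at a cutoff c and a c/2 penalty per unassigned (missed or false) node, solved with the Hungarian algorithm. The edge-mismatch term and the LP-based lower bound from the paper are omitted, and the p = 1 cost form is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_gospa(attrs1, attrs2, c=2.0):
    """Simplified GOSPA-style node term between two graphs with node
    attributes: assignment cost is the attribute distance capped at c,
    and each unassigned node costs c/2 (missed / false node penalty)."""
    n1, n2 = len(attrs1), len(attrs2)
    D = np.linalg.norm(attrs1[:, None, :] - attrs2[None, :, :], axis=-1)
    D = np.minimum(D, c)                 # cutoff: bad matches = non-matches
    # Pad to a square matrix so leaving nodes unassigned (cost c/2) is allowed.
    size = n1 + n2
    C = np.zeros((size, size))
    C[:n1, :n2] = D
    C[:n1, n2:] = c / 2                  # node in graph 1 left unassigned
    C[n1:, :n2] = c / 2                  # node in graph 2 left unassigned
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

g1 = np.array([[0.0], [1.0], [5.0]])     # 3 nodes with scalar attributes
g2 = np.array([[0.1], [1.2]])            # 2 nodes: one node "missing"
print("distance:", node_gospa(g1, g2))   # 0.1 + 0.2 + c/2 for the extra node
```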
results: The paper proves that, for these functions, bandit feedback in the nonstochastic setting admits $\big(1 - \frac{1}{e}\big)$-regret bounds of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
Abstract
Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
Plasma Surrogate Modelling using Fourier Neural Operators
paper_authors: Vignesh Gopakumar, Stanislas Pamela, Lorenzo Zanisi, Zongyi Li, Ander Gray, Daniel Brennand, Nitesh Bhatia, Gregory Stathopoulos, Matt Kusner, Marc Peter Deisenroth, Anima Anandkumar, JOREK Team, MAST Team
results: The FNO accurately predicts plasma evolution in both simulation and experimental domains. On camera data recorded within the MAST tokamak (cameras looking across the central solenoid and the divertor), it accurately forecasts the evolution and shape of the plasma as well as the locations of the plasma's interactions with the central solenoid and the divertor. The FNO is quick to train and infer, requires fewer data points, can perform zero-shot super-resolution, and yields high-fidelity solutions.
Abstract
Predicting plasma evolution within a Tokamak reactor is crucial to realizing the goal of sustainable fusion. Capabilities in forecasting the spatio-temporal evolution of plasma rapidly and accurately allow us to quickly iterate over design and control strategies on current Tokamak devices and future reactors. Modelling plasma evolution using numerical solvers is often expensive, consuming many hours on supercomputers, and hence, we need alternative inexpensive surrogate models. We demonstrate accurate predictions of plasma evolution both in simulation and experimental domains using deep learning-based surrogate modelling tools, viz., Fourier Neural Operators (FNO). We show that FNO has a speedup of six orders of magnitude over traditional solvers in predicting the plasma dynamics simulated from magnetohydrodynamic models, while maintaining a high accuracy (MSE $\approx$ $10^{-5}$). Our modified version of the FNO is capable of solving multi-variable Partial Differential Equations (PDE), and can capture the dependence among the different variables in a single model. FNOs can also predict plasma evolution on real-world experimental data observed by the cameras positioned within the MAST Tokamak, i.e., cameras looking across the central solenoid and the divertor in the Tokamak. We show that FNOs are able to accurately forecast the evolution of plasma and have the potential to be deployed for real-time monitoring. We also illustrate their capability in forecasting the plasma shape, the locations of interactions of the plasma with the central solenoid and the divertor for the full duration of the plasma shot within MAST. The FNO offers a viable alternative for surrogate modelling as it is quick to train and infer, and requires fewer data points, while being able to do zero-shot super-resolution and getting high-fidelity solutions.
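The core FNO building block is a spectral convolution: transform to Fourier space, multiply a truncated set of low modes by learned complex weights, and transform back. A minimal 1-D, single-channel numpy sketch with random (untrained) weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_conv_1d(u, weights, n_modes):
    """Core FNO layer: transform to Fourier space, multiply the lowest
    n_modes by learned complex weights, discard the rest, transform back."""
    uk = np.fft.rfft(u)
    out = np.zeros_like(uk)
    out[:n_modes] = uk[:n_modes] * weights   # learned spectral filter
    return np.fft.irfft(out, n=len(u))

n, n_modes = 256, 16
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)  # trainable in a real FNO
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = np.sin(3 * x) + 0.5 * np.sin(11 * x)
v = spectral_conv_1d(u, weights, n_modes)
print(v.shape)  # (256,) -- the same weights apply at any grid resolution,
                # which is what enables zero-shot super-resolution
```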
Multiscale Neural Operators for Solving Time-Independent PDEs
paper_authors: Winfried Ripken, Lisa Coiffard, Felix Pieper, Sebastian Dziadzio
for: To address the challenges posed by large-mesh time-independent PDEs for data-driven neural PDE solvers.
methods: The paper proposes a graph rewiring technique that bridges distant nodes, enhancing the global interaction capabilities of GNNs.
results: Experiments on three datasets show that GNN-based methods set new performance standards for time-independent PDEs on irregular meshes, and the graph rewiring strategy boosts the performance of baseline methods, achieving state-of-the-art results in one of the tasks.
Abstract
Time-independent Partial Differential Equations (PDEs) on large meshes pose significant challenges for data-driven neural PDE solvers. We introduce a novel graph rewiring technique to tackle some of these challenges, such as aggregating information across scales and on irregular meshes. Our proposed approach bridges distant nodes, enhancing the global interaction capabilities of GNNs. Our experiments on three datasets reveal that GNN-based methods set new performance standards for time-independent PDEs on irregular meshes. Finally, we show that our graph rewiring strategy boosts the performance of baseline methods, achieving state-of-the-art results in one of the tasks.
Hierarchical deep learning-based adaptive time-stepping scheme for multiscale simulations
paper_authors: Asif Hamid, Danish Rafiq, Shahkar Ahmad Nahvi, Mohammad Abid Bazaz
for: To simulate multiscale problems in complex nonlinear systems.
methods: The study proposes a new method that uses deep neural network time steppers, adapting time steps to approximate dynamical system flow maps across timescales.
results: The scheme achieves state-of-the-art performance in less computational time than fixed-step neural network solvers.
Abstract
Multiscale is a hallmark feature of complex nonlinear systems. While the simulation using the classical numerical methods is restricted by the local Taylor series constraints, the multiscale techniques are often limited by finding heuristic closures. This study proposes a new method for simulating multiscale problems using deep neural networks. By leveraging the hierarchical learning of neural network time steppers, the method adapts time steps to approximate dynamical system flow maps across timescales. This approach achieves state-of-the-art performance in less computational time compared to fixed-step neural network solvers. The proposed method is demonstrated on several nonlinear dynamical systems, and source codes are provided for implementation. This method has the potential to benefit multiscale analysis of complex systems and encourage further investigation in this area.
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
results: Experiments on three real-world datasets (Baby, Sports, and Clothing) show that the method outperforms state-of-the-art multimodal recommendation methods and that fine-grained ID embeddings effectively enhance the semantic features of content and structure representations.
Abstract
Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance to combine (user- and item-) ID embeddings with multimodal salient features, indicating the value of IDs. However, there is a lack of a thorough analysis of the ID embeddings in terms of feature semantics in the literature. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study regarding its semantics, which we recognize as subtle features of content and structures. Then, we propose a novel recommendation model by incorporating ID embeddings to enhance the semantic features of both content and structures. Specifically, we put forward a hierarchical attention mechanism to incorporate ID embeddings in modality fusing, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolutional network for each modality to amalgamate neighborhood and ID embeddings for improving structural representations. Finally, the content and structure representations are combined to form the ultimate item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
Learning-Augmented Scheduling for Solar-Powered Electric Vehicle Charging
for: scheduling the charging of electric vehicles equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions.
methods: leverages a novel learning-augmented policy that employs a dynamic robustness budget, which is adapted in real-time based on the reinforcement learning policy’s performance, using the temporal difference (TD) error to assess the trustworthiness of the machine-learned policy.
results: markedly improves scheduling effectiveness and reliability, particularly in OOD contexts, paving the way for more resilient and adaptive EV charging systems.
Abstract
We tackle the complex challenge of scheduling the charging of electric vehicles (EVs) equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions. Traditional scheduling approaches, such as reinforcement learning (RL) and model predictive control (MPC), often fail to provide satisfactory results when faced with OOD data, struggling to balance robustness (worst-case performance) and consistency (near-optimal average performance). To address this gap, we introduce a novel learning-augmented policy. This policy employs a dynamic robustness budget, which is adapted in real-time based on the reinforcement learning policy's performance. Specifically, it leverages the temporal difference (TD) error, a measure of the learning policy's prediction accuracy, to assess the trustworthiness of the machine-learned policy. This method allows for a more effective balance between consistency and robustness in EV charging schedules, significantly enhancing adaptability and efficiency in real-world, unpredictable environments. Our results demonstrate that this approach markedly improves scheduling effectiveness and reliability, particularly in OOD contexts, paving the way for more resilient and adaptive EV charging systems.
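A hedged sketch of the dynamic-budget idea: map recent TD errors to a trust level and interpolate between the learned action and a robust baseline. The exponential mapping and the linear blend are our illustrative choices, not the paper's exact policy.

```python
import numpy as np

def robustness_budget(td_errors, scale=1.0):
    """Map recent TD errors to a trust level in [0, 1]: small errors mean
    the RL policy's value estimates are reliable, so we follow it more."""
    return float(np.exp(-scale * np.mean(np.abs(td_errors))))

def blended_action(rl_action, robust_action, td_errors):
    """Interpolate between the learned action and a robust baseline
    (e.g., an MPC schedule) according to the dynamic budget."""
    trust = robustness_budget(td_errors)
    return trust * rl_action + (1.0 - trust) * robust_action

# toy usage: large TD errors (OOD conditions) pull us toward the baseline
print(blended_action(5.0, 2.0, td_errors=[0.1, 0.05]))  # close to 5.0
print(blended_action(5.0, 2.0, td_errors=[3.0, 4.0]))   # close to 2.0
```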
Aggregation Weighting of Federated Learning via Generalization Bound Estimation
results: Experiments show that the proposed aggregation strategy significantly improves the performance of several representative FL algorithms on benchmark datasets.
Abstract
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. However, this naive weighting method may lead to unfairness and degradation in model performance due to statistical heterogeneity and the inclusion of noisy data among clients. Theoretically, distributional robustness analysis has shown that the generalization performance of a learning model with respect to any shifted distribution is bounded. This motivates us to reconsider the weighting approach in federated learning. In this paper, we replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model. Specifically, we estimate the upper and lower bounds of the second-order origin moment of the shifted distribution for the current local model, and then use these bound disagreements as the aggregation proportions for weighting in each communication round. Experiments demonstrate that the proposed weighting strategy significantly improves the performance of several representative FL algorithms on benchmark datasets.
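A minimal sketch of the weighting rule as we read it: turn each client's upper/lower bound disagreement into an aggregation proportion. The simple sum-to-one normalization is an illustrative assumption.

```python
import numpy as np

def aggregate(client_params, upper_bounds, lower_bounds):
    """Weight each client's parameters by the disagreement between its
    estimated upper and lower generalization bounds, then normalize.
    client_params: list of 1-D parameter vectors of equal length."""
    gaps = np.asarray(upper_bounds) - np.asarray(lower_bounds)
    weights = gaps / gaps.sum()          # one plausible normalization
    stacked = np.stack(client_params)    # (n_clients, n_params)
    return weights @ stacked             # weighted average of parameters

params = [np.ones(4), 3 * np.ones(4), 5 * np.ones(4)]
print(aggregate(params, upper_bounds=[2.0, 1.5, 1.2],
                lower_bounds=[1.0, 1.0, 1.0]))
```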
Federated Learning with Manifold Regularization and Normalized Update Reaggregation
methods: Adopts a manifold model fusion scheme together with a new global optimizer that aggregates the norms of client updates, to alleviate model inconsistency. Specifically, a hyperbolic graph manifold regularizer keeps the representations of the data in the local and global models close to each other in a low-dimensional subspace, which reflects model inconsistency better than Euclidean distance.
results: Experiments show that FedMRUR achieves a new state-of-the-art (SOTA) accuracy with less communication, and the algorithm is proven to achieve a linear speedup property in the non-convex setting.
Abstract
Federated Learning (FL) is an emerging collaborative machine learning framework where multiple clients train the global model without sharing their own datasets. In FL, the model inconsistency caused by the local data heterogeneity across clients results in the near-orthogonality of client updates, which leads to the global update norm reduction and slows down the convergence. Most previous works focus on eliminating the difference of parameters (or gradients) between the local and global models, which may fail to reflect the model inconsistency due to the complex structure of the machine learning model and the Euclidean space's limitation in meaningful geometric representations. In this paper, we propose FedMRUR by adopting the manifold model fusion scheme and a new global optimizer to alleviate the negative impacts. Concretely, FedMRUR adopts a hyperbolic graph manifold regularizer enforcing that the representations of the data in the local and global models are close to each other in a low-dimensional subspace. Because the machine learning model has the graph structure, the distance in hyperbolic space can reflect the model bias better than the Euclidean distance. In this way, FedMRUR exploits the manifold structures of the representations to significantly reduce the model inconsistency. FedMRUR also aggregates the client update norms as the global update norm, which can appropriately enlarge each client's contribution to the global update, thereby mitigating the norm reduction introduced by the near-orthogonality of client updates. Furthermore, we theoretically prove that our algorithm can achieve a linear speedup property for the non-convex setting under partial client participation. Experiments demonstrate that FedMRUR can achieve a new state-of-the-art (SOTA) accuracy with less communication.
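The norm-reaggregation idea in isolation, sketched under our reading of the abstract: keep the averaged update direction but restore its magnitude from the clients' own update norms, counteracting the shrinkage caused by near-orthogonal updates. The exact rule in the paper may differ.

```python
import numpy as np

def reaggregate(client_updates):
    """Average the update directions, then rescale so the global update
    norm matches the mean of the client update norms rather than the
    (smaller) norm of the mean."""
    updates = np.stack(client_updates)  # (n_clients, n_params)
    mean_update = updates.mean(axis=0)
    target_norm = np.mean([np.linalg.norm(u) for u in client_updates])
    direction = mean_update / (np.linalg.norm(mean_update) + 1e-12)
    return target_norm * direction

# two near-orthogonal unit updates: plain averaging shrinks the norm
u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(np.linalg.norm((u1 + u2) / 2))          # ~0.707: norm reduction
print(np.linalg.norm(reaggregate([u1, u2])))  # 1.0: magnitude restored
```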
An alternative for one-hot encoding in neural network models
results: Experimental results show that binary encoding, combined with the modified forward and backpropagation procedures, ensures that model weight changes arising from data instances of a given feature category affect only the computations for instances of that same category, improving model performance.
Abstract
This paper proposes an algorithm that binary-encodes the categorical features of neural network input data and modifies the forward and backpropagation procedures accordingly. The modification guarantees that model weight changes resulting from the learning process on data instances of a given feature category only affect the forward-pass calculations for input instances of that same category, as is the case when one-hot encoding is used for categorical features.
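A small sketch contrasting the two encodings; the gradient-isolation machinery of the modified forward/backward passes is summarized in a comment rather than implemented, since the paper's exact procedure is not reproduced here.

```python
import numpy as np

def one_hot(index, n_categories):
    v = np.zeros(n_categories)
    v[index] = 1.0
    return v

def binary_encode(index, n_categories):
    """Encode a category index with ceil(log2(n)) bits instead of n
    one-hot positions, shrinking the input layer."""
    n_bits = max(1, int(np.ceil(np.log2(n_categories))))
    return np.array([(index >> b) & 1 for b in range(n_bits)], dtype=float)

print(one_hot(5, 8))        # 8 inputs: [0. 0. 0. 0. 0. 1. 0. 0.]
print(binary_encode(5, 8))  # 3 inputs: [1. 0. 1.]
# With one-hot, the weights on category c's input column are only ever
# touched by instances of c; the paper's modified forward/backward passes
# recover this isolation property for the compact binary code as well.
```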
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
results: FlashFFTConv speeds up exact FFT convolutions by up to 7.93x over PyTorch and achieves up to 4.4x end-to-end speedup. In addition, given the same compute budget, FlashFFTConv improves model quality for Hyena-GPT-s and M2-BERT-base, matching models with twice the parameter count.
Abstract
Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks but lag behind the most optimized Transformers in wall-clock time. A major bottleneck is the Fast Fourier Transform (FFT)--which allows long convolutions to run in $O(N \log N)$ time in sequence length $N$ but has poor hardware utilization. In this paper, we study how to optimize the FFT convolution. We find two key bottlenecks: the FFT does not effectively use specialized matrix multiply units, and it incurs expensive I/O between layers of the memory hierarchy. In response, we propose FlashFFTConv. FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O. We also present two sparse convolution algorithms--1) partial convolutions and 2) frequency-sparse convolutions--which can be implemented simply by skipping blocks in the matrix decomposition, enabling further opportunities for memory and compute savings. FlashFFTConv speeds up exact FFT convolutions by up to 7.93$\times$ over PyTorch and achieves up to 4.4$\times$ speedup end-to-end. Given the same compute budget, FlashFFTConv allows Hyena-GPT-s to achieve 2.3 points better perplexity on the PILE and M2-BERT-base to achieve 3.3 points higher GLUE score--matching models with twice the parameter count. FlashFFTConv also achieves 96.1% accuracy on Path-512, a high-resolution vision task where no model had previously achieved better than 50%. Furthermore, partial convolutions enable longer-sequence models--yielding the first DNA model that can process the longest human genes (2.3M base pairs)--and frequency-sparse convolutions speed up pretrained models while maintaining or improving model quality.
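For reference, the baseline operation being accelerated, a long circular convolution computed through the FFT in $O(N \log N)$, fits in a few lines of numpy; FlashFFTConv reorganizes this computation into matrix multiplies to exploit tensor cores.

```python
import numpy as np

def fft_conv(x, k):
    """Circular convolution of a length-N sequence with a length-N filter,
    computed in O(N log N) via the FFT."""
    n = len(x)
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(k), n=n)

x = np.random.randn(1 << 14)  # a long sequence
k = np.random.randn(1 << 14)  # an equally long filter
y = fft_conv(x, k)

# sanity check on a small case: convolving with a unit impulse is identity
xs, ks = np.arange(4.0), np.array([1.0, 0.0, 0.0, 0.0])
assert np.allclose(fft_conv(xs, ks), xs)
```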
Can Machine Learning Uncover Insights into Vehicle Travel Demand from Our Built Environment?
results: The results show that the computational model helps designers quickly obtain feedback on vehicle travel demand, including its total amount and temporal distribution, and assists in evaluating and optimizing urban land-use planning from the perspective of vehicle travel.
Abstract
In this paper, we propose a machine learning-based approach to address the lack of ability for designers to optimize urban land use planning from the perspective of vehicle travel demand. Research shows that our computational model can help designers quickly obtain feedback on the vehicle travel demand, which includes its total amount and temporal distribution based on the urban function distribution designed by the designers. It also assists in design optimization and evaluation of the urban function distribution from the perspective of vehicle travel. We obtain the city function distribution information and vehicle hours traveled (VHT) information by collecting the city point-of-interest (POI) data and online vehicle data. The artificial neural networks (ANNs) with the best performance in prediction are selected. By using data sets collected in different regions for mutual prediction and remapping the predictions onto a map for visualization, we evaluate the extent to which the computational model generalizes across regions, in an attempt to reduce the workload of future urban researchers. Finally, we demonstrate the application of the computational model to help designers obtain feedback on vehicle travel demand in the built environment and combine it with genetic algorithms to optimize the current state of the urban environment to provide recommendations to designers.
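A toy sketch of the modeling pipeline under stated assumptions (synthetic POI counts, a linear ground truth, an off-the-shelf MLP); the paper's features, data, and networks are richer.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# synthetic stand-in: rows are city grid cells, columns are POI counts per
# urban function (residential, office, retail, ...), target is VHT
rng = np.random.default_rng(0)
poi_counts = rng.poisson(20, size=(500, 6)).astype(float)
vht = poi_counts @ np.array([0.8, 1.5, 1.1, 0.3, 0.6, 0.9]) \
      + rng.normal(0, 5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(poi_counts, vht, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out cells: {model.score(X_te, y_te):.2f}")
```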
results: Proposes a novel high-order TRPCA method, LMH-BRTF, which automatically determines the tensor multi-rank by building a low-rank model based on the order-$d$ t-SVD with proper priors, and leverages noise information more effectively, thereby improving TRPCA performance.
Abstract
The recently proposed tensor robust principal component analysis (TRPCA) methods based on tensor singular value decomposition (t-SVD) have achieved numerous successes in many fields. However, most of these methods are only applicable to third-order tensors, whereas the data obtained in practice are often of higher order, such as fourth-order color videos, fourth-order hyperspectral videos, and fifth-order light-field images. Additionally, in the t-SVD framework, the multi-rank of a tensor can describe more fine-grained low-rank structure in the tensor compared with the tubal rank. However, determining the multi-rank of a tensor is a much more difficult problem than determining the tubal rank. Moreover, most of the existing TRPCA methods do not explicitly model the noise except the sparse noise, which may compromise the accuracy of estimating the low-rank tensor. In this work, we propose a novel high-order TRPCA method, named as Low-Multi-rank High-order Bayesian Robust Tensor Factorization (LMH-BRTF), within the Bayesian framework. Specifically, we decompose the observed corrupted tensor into three parts, i.e., the low-rank component, the sparse component, and the noise component. By constructing a low-rank model for the low-rank component based on the order-$d$ t-SVD and introducing a proper prior for the model, LMH-BRTF can automatically determine the tensor multi-rank. Meanwhile, benefiting from the explicit modeling of both the sparse and noise components, the proposed method can leverage information from the noise more effectively, leading to an improved performance of TRPCA. Then, an efficient variational inference algorithm is established for parameters estimation. Empirical studies on synthetic and real-world datasets demonstrate the effectiveness of the proposed method in terms of both qualitative and quantitative results.
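In symbols (notation ours, following the abstract), LMH-BRTF posits the observation model $\mathcal{Y} = \mathcal{L} + \mathcal{S} + \mathcal{N}$, where $\mathcal{L}$ is low-multi-rank under the order-$d$ t-SVD, $\mathcal{S}$ is entrywise sparse, and $\mathcal{N}$ is dense noise; placing suitable priors on each component lets the multi-rank be inferred automatically during variational inference.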
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
results: Experimental results show that the \textsc{Hiformer} model delivers significant improvements in a web-scale recommender system (up to +2.66% on key engagement metrics) while maintaining fast inference for online deployment.
Abstract
Learning feature interaction is the critical backbone to building recommender systems. In web-scale applications, learning feature interaction is extremely challenging due to the sparse and large input feature space; meanwhile, manually crafting effective feature interactions is infeasible because of the exponential solution space. We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions. Transformer architectures have witnessed great success in many domains, such as natural language processing and computer vision. However, there has not been much adoption of Transformer architecture for feature interaction modeling in industry. We aim at closing the gap. We identify two key challenges for applying the vanilla Transformer architecture to web-scale recommender systems: (1) Transformer architecture fails to capture the heterogeneous feature interactions in the self-attention layer; (2) The serving latency of Transformer architecture might be too high to be deployed in web-scale recommender systems. We first propose a heterogeneous self-attention layer, which is a simple yet effective modification to the self-attention layer in Transformer, to take into account the heterogeneity of feature interactions. We then introduce \textsc{Hiformer} (\textbf{H}eterogeneous \textbf{I}nteraction Trans\textbf{former}) to further improve the model expressiveness. With low-rank approximation and model pruning, \hiformer enjoys fast inference for online deployment. Extensive offline experiment results corroborate the effectiveness and efficiency of the \textsc{Hiformer} model. We have successfully deployed the \textsc{Hiformer} model to a real world large scale App ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66\%).
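One simple way to realize type-aware attention, sketched under our assumptions (per-type query/key projections, shared values); Hiformer's actual layer additionally uses low-rank approximation and pruning for serving speed, which are not reproduced here.

```python
import torch
import torch.nn as nn

class HeterogeneousAttention(nn.Module):
    """Self-attention where each feature type gets its own query/key
    projections, so interactions between different types are scored by
    type-specific parameters."""
    def __init__(self, n_types, dim):
        super().__init__()
        self.q = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_types))
        self.k = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_types))
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, type_ids):
        # x: (n_features, dim); type_ids: feature-type index per feature
        q = torch.stack([self.q[t](x[i]) for i, t in enumerate(type_ids)])
        k = torch.stack([self.k[t](x[i]) for i, t in enumerate(type_ids)])
        attn = torch.softmax(q @ k.T * self.scale, dim=-1)
        return attn @ self.v(x)

layer = HeterogeneousAttention(n_types=3, dim=16)
out = layer(torch.randn(5, 16), type_ids=[0, 0, 1, 2, 2])
print(out.shape)  # torch.Size([5, 16])
```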
For: This paper investigates the problem of determining whether two random databases are statistically dependent or not.
* Methods: The paper formulates this problem as a hypothesis testing problem, and uses techniques from information theory and matrix analysis to derive thresholds for optimal testing.
* Results: The paper shows that the thresholds for optimal testing depend on $n$, $d$, and the spectral properties of the generative distributions of the datasets, and proves that weak detection is statistically impossible when a certain function of the eigenvalues of the likelihood function and $d$ is below a certain threshold, as $d\to\infty$. The paper also derives strong and weak detection lower and upper bounds for the case where $d$ is fixed.
Abstract
In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$, is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This mimics the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.
Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes
results: Experiments show that the method achieves better utility and fairness measures than competing methods on popular benchmark datasets; the estimation errors and loss of utility of the proposed approach are also characterized theoretically.
Abstract
As the data-driven decision process becomes dominant in industrial applications, fairness-aware machine learning has attracted great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of sensitive attributes and response variables, the proposed penalty is able to handle versatile formats of the sensitive attributes, so it is more extensively applicable in practice than many existing algorithms. This penalty enables us to build a computationally efficient group-level in-processing fairness-aware training framework. Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods. We also theoretically characterize estimation errors and loss of utility of the proposed neural-penalized risk minimization problem.
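A heavily hedged sketch of how a group-level penalty might be built from randomly sampled sensitive-attribute anchors, which works even for continuous attributes; the soft-group weighting below is our illustration, not the paper's penalty.

```python
import torch

def sampled_fairness_penalty(preds, sens, n_samples=32):
    """Compare the mean prediction of a randomly sampled soft subgroup
    (anchored at a sampled sensitive-attribute value) against the overall
    mean; large gaps indicate group-level disparity."""
    overall = preds.mean()
    penalty = preds.new_zeros(())
    for _ in range(n_samples):
        anchor = sens[torch.randint(len(sens), (1,))]  # random anchor value
        weights = torch.exp(-(sens - anchor) ** 2)     # soft group membership
        group_mean = (weights * preds).sum() / weights.sum()
        penalty = penalty + (group_mean - overall) ** 2
    return penalty / n_samples

preds = torch.sigmoid(torch.randn(100))
sens = torch.randn(100)  # a continuous sensitive attribute
print(sampled_fairness_penalty(preds, sens))
```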
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization
results: Using the clipped-objective policy gradient (COPG) objective in continuous action spaces improves on PPO without adding computational cost or complexity, and offers comparable or superior performance to TRPO.
Abstract
To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences. Natural policy gradient methods, including Trust Region Policy Optimization (TRPO), seek to produce monotonic improvement through bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a commonly used, first-order algorithm that instead uses loss clipping to take multiple safe optimization steps per batch of data, replacing the bound on the single step of TRPO with regularization on multiple steps. In this work, we find that the performance of PPO, when applied to continuous action spaces, may be consistently improved through a simple change in objective. Instead of the importance sampling objective of PPO, we instead recommend a basic policy gradient, clipped in an equivalent fashion. While both objectives produce biased gradient estimates with respect to the RL objective, they also both display significantly reduced variance compared to the unbiased off-policy policy gradient. Additionally, we show that (1) the clipped-objective policy gradient (COPG) objective is on average "pessimistic" compared to both the PPO objective and (2) this pessimism promotes enhanced exploration. As a result, we empirically observe that COPG produces improved learning compared to PPO in single-task, constrained, and multi-task learning, without adding significant computational cost or complexity. Compared to TRPO, the COPG approach is seen to offer comparable or superior performance, while retaining the simplicity of a first-order method.
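The two objectives side by side. The PPO form is standard; the COPG form shown is our reading of "a basic policy gradient, clipped in an equivalent fashion" and should be checked against the paper before use.

```python
import torch

def ppo_objective(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized)."""
    ratio = torch.exp(logp_new - logp_old)
    return torch.min(ratio * adv,
                     torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

def copg_objective(logp_new, logp_old, adv, eps=0.2):
    """Clipped-objective policy gradient: the same pessimistic min, but
    applied to the basic log-probability objective rather than the
    importance ratio (our interpretation of the abstract)."""
    return torch.min(logp_new * adv,
                     torch.clamp(logp_new, logp_old - eps,
                                 logp_old + eps) * adv).mean()

logp_new = torch.log(torch.tensor([0.30, 0.60]))
logp_old = torch.log(torch.tensor([0.25, 0.70]))
adv = torch.tensor([1.0, -0.5])
print(ppo_objective(logp_new, logp_old, adv),
      copg_objective(logp_new, logp_old, adv))
```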
AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training
results: The results show that the proposed scheme accelerates edge pipeline-parallel training by up to 3x in the considered experimental settings.
Abstract
It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for accelerating the edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. Therefore, the proposed latency predictor leads to better model partitioning which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to significantly speed up edge pipeline parallel training up to 3 times faster in the considered experimental settings.
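A deliberately minimal stand-in for the adaptive latency predictor: a per-device linear model, latency ≈ a·GFLOPs + b, updated online from observed timings so it adapts to unseen devices. The paper's predictor is a learned model; this sketch only conveys the continuous-adaptation idea.

```python
class OnlineLatencyPredictor:
    """Tiny per-device latency model updated by online gradient descent."""
    def __init__(self, lr=0.1):
        self.a, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, gflops):
        return self.a * gflops + self.b               # seconds

    def update(self, gflops, observed_latency):
        err = self.predict(gflops) - observed_latency  # d(MSE/2)/d(pred)
        self.a -= self.lr * err * gflops
        self.b -= self.lr * err

pred = OnlineLatencyPredictor()
for _ in range(200):                 # a layer costing 1 GFLOP runs in 2 ms
    pred.update(1.0, observed_latency=0.002)
print(f"{pred.predict(1.0):.4f} s")  # converges to ~0.0020
```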
Machine Learning-powered Compact Modeling of Stochastic Electronic Devices using Mixture Density Networks
paper_authors: Jack Hutchins, Shamiul Alam, Dana S. Rampini, Bakhrom G. Oripov, Adam N. McCaughan, Ahmedullah Aziz
for: This paper aims to address the challenge of accurately modeling the stochastic behavior of electronic devices in circuit design and simulation.
methods: The authors use Mixture Density Networks (MDNs), a machine learning approach, to model the stochastic behavior of electronic devices and demonstrate their method on heater cryotrons.
results: The authors achieve a mean absolute error of 0.82% in capturing the stochastic switching dynamics of heater cryotrons, showcasing the effectiveness of their approach in accurately simulating the behavior of electronic devices.
Abstract
The relentless pursuit of miniaturization and performance enhancement in electronic devices has led to a fundamental challenge in the field of circuit design and simulation: how to accurately account for the inherent stochastic nature of certain devices. While conventional deterministic models have served as indispensable tools for circuit designers, they fall short when it comes to capturing the subtle yet critical variability exhibited by many electronic components. In this paper, we present an innovative approach that transcends the limitations of traditional modeling techniques by harnessing the power of machine learning, specifically Mixture Density Networks (MDNs), to faithfully represent and simulate the stochastic behavior of electronic devices. We demonstrate our approach by modeling heater cryotrons, where the model is able to capture the stochastic switching dynamics observed in the experiment. Our model shows 0.82% mean absolute error for switching probability. This paper marks a significant step forward in the quest for accurate and versatile compact models, poised to drive innovation in the realm of electronic circuits.
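A minimal MDN head with the standard mixture negative log-likelihood, to make the modeling approach concrete; the device-specific inputs and outputs (e.g., bias conditions, switching delay) are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

class MDN(nn.Module):
    """Minimal Mixture Density Network: maps an input to a K-component
    Gaussian mixture over the stochastic output."""
    def __init__(self, in_dim, k=3, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, k)         # mixture logits
        self.mu = nn.Linear(hidden, k)         # component means
        self.log_sigma = nn.Linear(hidden, k)  # component log-scales

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted mixture."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1))           # (batch, k)
    log_mix = torch.log_softmax(pi_logits, dim=-1) + log_prob
    return -torch.logsumexp(log_mix, dim=-1).mean()

model = MDN(in_dim=2)
x, y = torch.randn(64, 2), torch.randn(64)
loss = mdn_nll(*model(x), y)
loss.backward()
print(loss.item())
```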
Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction
paper_authors: Shanghao Shi, Ning Wang, Yang Xiao, Chaoyu Zhang, Yi Shi, Y. Thomas Hou, Wenjing Lou
for: This paper aims to address the issue of model inversion attacks (MIAs) in federated learning, which can compromise the data privacy of individual users.
methods: The proposed method, called Scale-MIA, uses a two-step process to efficiently and accurately recover training samples from the aggregated model updates. The first step involves reconstructing the latent space representations (LSRs) from the updates using a closed-form inversion mechanism, and the second step involves recovering the whole input batches from the LSRs using a fine-tuned generative decoder.
results: The proposed Scale-MIA method achieves excellent recovery performance on different datasets, with high reconstruction rates, accuracy, and attack efficiency compared to state-of-the-art MIAs. The method is able to efficiently recover the training samples even when the system is under the protection of a robust secure aggregation protocol.
Abstract
Federated learning is known for its capability to safeguard participants' data privacy. However, recently emerged model inversion attacks (MIAs) have shown that a malicious parameter server can reconstruct individual users' local data samples through model updates. The state-of-the-art attacks either rely on computation-intensive search-based optimization processes to recover each input batch, making scaling difficult, or they involve the malicious parameter server adding extra modules before the global model architecture, rendering the attacks too conspicuous and easily detectable. To overcome these limitations, we propose Scale-MIA, a novel MIA capable of efficiently and accurately recovering training samples of clients from the aggregated updates, even when the system is under the protection of a robust secure aggregation protocol. Unlike existing approaches treating models as black boxes, Scale-MIA recognizes the importance of the intricate architecture and inner workings of machine learning models. It identifies the latent space as the critical layer for breaching privacy and decomposes the complex recovery task into an innovative two-step process to reduce computation complexity. The first step involves reconstructing the latent space representations (LSRs) from the aggregated model updates using a closed-form inversion mechanism, leveraging specially crafted adversarial linear layers. In the second step, the whole input batches are recovered from the LSRs by feeding them into a fine-tuned generative decoder. We implemented Scale-MIA on multiple commonly used machine learning models and conducted comprehensive experiments across various settings. The results demonstrate that Scale-MIA achieves excellent recovery performance on different datasets, exhibiting high reconstruction rates, accuracy, and attack efficiency on a larger scale compared to state-of-the-art MIAs.
Improvements on Uncertainty Quantification for Node Classification via Distance-Based Regularization
paper_authors: Russell Alan Hart, Linlin Yu, Yifei Lou, Feng Chen
for: The paper focuses on uncertainty quantification for interdependent node-level classification, specifically addressing the limitations of the widely-used uncertainty cross-entropy (UCE) loss function and proposing a distance-based regularization to improve the performance of graph posterior networks (GPNs) in detecting out-of-distribution (OOD) nodes.
methods: The paper uses graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function, and proposes a distance-based regularization to encourage clustered OOD nodes to remain clustered in the latent space.
results: The proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection, as demonstrated through extensive comparison experiments on eight standard datasets.
Abstract
Deep neural networks have achieved significant success in the last decades, but they are not well-calibrated and often produce unreliable predictions. A large body of literature relies on uncertainty quantification to evaluate the reliability of a learning model, which is particularly important for applications of out-of-distribution (OOD) detection and misclassification detection. We are interested in uncertainty quantification for interdependent node-level classification. We start our analysis based on graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function. We describe the theoretical limitations of the widely-used UCE loss. To alleviate the identified drawbacks, we propose a distance-based regularization that encourages clustered OOD nodes to remain clustered in the latent space. We conduct extensive comparison experiments on eight standard datasets and demonstrate that the proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection.
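One plausible instantiation of a distance-based regularizer, hedged as our sketch rather than the paper's exact form: pull each node's latent representation toward its cluster centroid, so nodes that are clustered in input space stay clustered in latent space.

```python
import torch

def distance_regularizer(z, cluster_ids):
    """Penalize each node's squared latent distance to its cluster
    centroid. One plausible distance-based regularizer; the paper's
    exact form may differ."""
    loss = z.new_zeros(())
    clusters = cluster_ids.unique()
    for c in clusters:
        members = z[cluster_ids == c]
        loss = loss + ((members - members.mean(0)) ** 2).sum(1).mean()
    return loss / len(clusters)

z = torch.randn(100, 8, requires_grad=True)  # node latents
clusters = torch.randint(0, 4, (100,))       # e.g., input-space clustering
reg = distance_regularizer(z, clusters)
reg.backward()
print(reg.item())
```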