cs.LG - 2023-11-24

Effective Structural Encodings via Local Curvature Profiles

  • paper_url: http://arxiv.org/abs/2311.14864
  • repo_url: None
  • paper_authors: Lukas Fesser, Melanie Weber
  • for: Improving the performance of Graph Neural Networks on downstream tasks.
  • methods: Uses discrete Ricci curvature (Local Curvature Profiles, LCP) as a structural encoding and combines it with global positional encodings to improve downstream performance (a minimal LCP sketch follows the abstract).
  • results: Compared with existing encoding approaches, LCP delivers significant performance gains; using curvature information for structural encodings also yields larger improvements than curvature-based rewiring techniques.
    Abstract Structural and Positional Encodings can significantly improve the performance of Graph Neural Networks in downstream tasks. Recent literature has begun to systematically investigate differences in the structural properties that these approaches encode, as well as performance trade-offs between them. However, the question of which structural properties yield the most effective encoding remains open. In this paper, we investigate this question from a geometric perspective. We propose a novel structural encoding based on discrete Ricci curvature (Local Curvature Profiles, short LCP) and show that it significantly outperforms existing encoding approaches. We further show that combining local structural encodings, such as LCP, with global positional encodings improves downstream performance, suggesting that they capture complementary geometric information. Finally, we compare different encoding types with (curvature-based) rewiring techniques. Rewiring has recently received a surge of interest due to its ability to improve the performance of Graph Neural Networks by mitigating over-smoothing and over-squashing effects. Our results suggest that utilizing curvature information for structural encodings delivers significantly larger performance increases than rewiring.
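The following is a minimal, hedged sketch of how a local curvature profile could be assembled as a node-level structural encoding. It uses a simple augmented Forman-style curvature as a stand-in for the paper's discrete Ricci curvature, and the particular aggregation statistics (min/max/mean/median/std over incident edges) are assumptions for illustration.

```python
import numpy as np
import networkx as nx

def forman_curvature(G, u, v):
    # Augmented Forman-style curvature of an unweighted edge (u, v):
    # 4 - deg(u) - deg(v) + 3 * (number of triangles containing the edge).
    triangles = len(set(G[u]) & set(G[v]))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

def local_curvature_profile(G):
    # One row per node: [min, max, mean, median, std] of the curvatures
    # of its incident edges.
    profiles = []
    for n in G.nodes():
        curvs = [forman_curvature(G, n, m) for m in G.neighbors(n)] or [0.0]
        profiles.append([np.min(curvs), np.max(curvs), np.mean(curvs),
                         np.median(curvs), np.std(curvs)])
    return np.asarray(profiles)

G = nx.karate_club_graph()
lcp = local_curvature_profile(G)   # shape: (num_nodes, 5)
# These profiles would be concatenated to the input node features of a GNN
# as a local structural encoding.
```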

An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification

  • paper_url: http://arxiv.org/abs/2311.14859
  • repo_url: None
  • paper_authors: Prakhar Ganesh
  • for: This paper aims to address the issue of model multiplicity in deep learning, which occurs when multiple models achieve similar performance but exhibit distinct underlying behaviors.
  • methods: The paper proposes a framework called “multiplicity sheets” to benchmark multiplicity in various scenarios, and translates several trustworthy metrics into accuracy under appropriate interventions.
  • results: The paper demonstrates the advantages of the proposed setup through a case study in image classification and provides actionable insights into the impact and trends of different hyperparameters on model multiplicity. Additionally, the paper shows that multiplicity persists in deep learning models even after enforcing additional specifications during model selection.
    Abstract Deep learning models have proven to be highly successful. Yet, their over-parameterization gives rise to model multiplicity, a phenomenon in which multiple models achieve similar performance but exhibit distinct underlying behaviours. This multiplicity presents a significant challenge and necessitates additional specifications in model selection to prevent unexpected failures during deployment. While prior studies have examined these concerns, they focus on individual metrics in isolation, making it difficult to obtain a comprehensive view of multiplicity in trustworthy machine learning. Our work stands out by offering a one-stop empirical benchmark of multiplicity across various dimensions of model design and its impact on a diverse set of trustworthy metrics. In this work, we establish a consistent language for studying model multiplicity by translating several trustworthy metrics into accuracy under appropriate interventions. We also develop a framework, which we call multiplicity sheets, to benchmark multiplicity in various scenarios. We demonstrate the advantages of our setup through a case study in image classification and provide actionable insights into the impact and trends of different hyperparameters on model multiplicity. Finally, we show that multiplicity persists in deep learning models even after enforcing additional specifications during model selection, highlighting the severity of over-parameterization. The concerns of under-specification thus remain, and we seek to promote a more comprehensive discussion of multiplicity in trustworthy machine learning.
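As a small illustration of the multiplicity phenomenon (not the paper's multiplicity-sheets framework), the hedged sketch below trains several equally configured classifiers that differ only in random seed and reports their pairwise prediction disagreement; the dataset and model are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

preds, accs = [], []
for seed in range(5):
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    preds.append(clf.predict(X_te))
    accs.append(clf.score(X_te, y_te))

# Multiplicity shows up as disagreement between near-equally accurate models.
disagreement = np.mean([np.mean(preds[i] != preds[j])
                        for i in range(5) for j in range(i + 1, 5)])
print(f"mean accuracy {np.mean(accs):.3f}, mean pairwise disagreement {disagreement:.3f}")
```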

Disruption Prediction in Fusion Devices through Feature Extraction and Logistic Regression

  • paper_url: http://arxiv.org/abs/2311.14856
  • repo_url: None
  • paper_authors: Diogo R. Ferreira
  • for: Describes the approach used in the Multi-Machine Disruption Prediction Challenge for Fusion Energy by ITU, a data science competition held from September to November 2023 on the online platform Zindi.
  • methods: Extracts features from each diagnostic signal and applies logistic regression on top of those features; each signal is treated as a separate predictor, and these predictors are then combined (a minimal sketch follows the abstract).
  • results: The combined predictors achieved first place on the leaderboard, indicating that the approach predicts disruption events accurately.
    Abstract This document describes an approach used in the Multi-Machine Disruption Prediction Challenge for Fusion Energy by ITU, a data science competition which ran from September to November 2023, on the online platform Zindi. The competition involved data from three fusion devices - C-Mod, HL-2A, and J-TEXT - with most of the training data coming from the last two, and the test data coming from the first one. Each device has multiple diagnostics and signals, and it turns out that a critical issue in this competition was to identify which signals, and especially which features from those signals, were most relevant to achieve accurate predictions. The approach described here is based on extracting features from signals, and then applying logistic regression on top of those features. Each signal is treated as a separate predictor and, in the end, a combination of such predictors achieved the first place on the leaderboard.
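A hedged sketch of the pipeline described above: extract summary features from each diagnostic signal, fit one logistic-regression predictor per signal, and combine the per-signal disruption probabilities. The specific features, synthetic data, and the simple averaging rule are placeholders, not the competition entry itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_shots, n_signals, T = 200, 3, 500
signals = rng.normal(size=(n_shots, n_signals, T))   # stand-in for diagnostic signals
disrupted = rng.integers(0, 2, size=n_shots)         # stand-in disruption labels

def extract_features(x):
    # Simple summary features of one signal; the competition entry engineered
    # such features per diagnostic.
    return np.array([x.mean(), x.std(), x.max(), np.abs(np.diff(x)).mean()])

per_signal_probs = []
for s in range(n_signals):
    F = np.stack([extract_features(signals[i, s]) for i in range(n_shots)])
    clf = LogisticRegression(max_iter=1000).fit(F, disrupted)
    per_signal_probs.append(clf.predict_proba(F)[:, 1])

# Each signal acts as a separate predictor; combine by averaging probabilities.
combined = np.mean(per_signal_probs, axis=0)
```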

Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling

  • paper_url: http://arxiv.org/abs/2311.17068
  • repo_url: None
  • paper_authors: Takiah Ebbs-Picken, David A. Romero, Carlos M. Da Silva, Cristina H. Amon
  • for: Developing a deep-learning-based surrogate for computationally intensive conjugate heat transfer (CHT) models.
  • methods: Proposes DeepEDH, a modular deep convolutional encoder-decoder hierarchical neural network with a two-stage temperature prediction architecture that couples velocity and temperature models.
  • results: DeepEDH achieves higher accuracy than other deep learning surrogates such as U-Net and DenseED, with up to a 65% improvement in the coefficient of determination ($R^{2}$), enabling efficient evaluation for large-scale CHT problems.
    Abstract Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT models. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples velocity and temperature models. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational model of the cold plate is developed and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. The FEM results are transformed and scaled from unstructured to structured, image-like meshes to create training and test datasets. The DeepEDH methodology's performance is examined in relation to data scaling, training dataset size, and network depth. Our performance analysis covers the impact of the novel architecture, separate field models, output geometry masks, multi-stage temperature models, and optimizations of the hyperparameters and architecture. Furthermore, we quantify the influence of the CHT thermal boundary condition on surrogate model performance, highlighting improved temperature model performance with higher heat fluxes. Compared to other deep learning neural network surrogate models, such as U-Net and DenseED, the proposed DeepEDH methodology for CHT models exhibits up to a 65% enhancement in the coefficient of determination ($R^{2}$).

Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning

  • paper_url: http://arxiv.org/abs/2311.14828
  • repo_url: None
  • paper_authors: Thomas Baldwin-McDonald, Mauricio A. Álvarez
  • for: Modelling phenomena in highly nonlinear dynamical systems while accurately quantifying uncertainty.
  • methods: Proposes the deep latent force model (DLFM), a domain-agnostic approach consisting of a deep Gaussian process architecture in which the kernel at each layer is derived from an ordinary differential equation using the framework of process convolutions.
  • results: Presents two DLFM formulations based on weight-space and variational inducing-points Gaussian process approximations, both amenable to doubly stochastic variational inference; experiments show the DLFM captures highly nonlinear behaviour in real-world multivariate time series and performs comparably to other probabilistic models on benchmark regression tasks.
    Abstract Effectively modeling phenomena present in highly nonlinear dynamical systems whilst also accurately quantifying uncertainty is a challenging task, which often requires problem-specific techniques. We outline the deep latent force model (DLFM), a domain-agnostic approach to tackling this problem, which consists of a deep Gaussian process architecture where the kernel at each layer is derived from an ordinary differential equation using the framework of process convolutions. Two distinct formulations of the DLFM are presented which utilise weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We provide evidence that our model is capable of capturing highly nonlinear behaviour in real-world multivariate time series data. In addition, we find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.

Revisiting Quantum Algorithms for Linear Regressions: Quadratic Speedups without Data-Dependent Parameters

  • paper_url: http://arxiv.org/abs/2311.14823
  • repo_url: None
  • paper_authors: Zhao Song, Junze Yin, Ruizhe Zhang
  • for: The linear regression problem: finding $x'$ such that $\|Ax' - b\|_2^2 \leq (1+\epsilon)\min_{x}\|Ax - b\|_2^2$.
  • methods: A quantum algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ time, providing a quadratic quantum speedup in $n$ over the classical lower bound without any dependence on data-dependent parameters.
  • results: Unlike prior quantum linear regression algorithms, whose exponential speedups depend on quantities such as the condition number of $A$, the running time here does not depend on any data-dependent parameters; the result generalizes to multiple regression and ridge linear regression.
    Abstract Linear regression is one of the most fundamental linear algebra problems. Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal is to find $x'$ such that $ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand, quantum linear regression algorithms can achieve exponential quantum speedups, as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017, Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of these algorithms depend on some quantum linear algebra-related parameters, such as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$ over the classical lower bound without any dependence on data-dependent parameters. In addition, we also show our result can be generalized to multiple regression and ridge linear regression.

Differentiable and accelerated spherical harmonic and Wigner transforms

  • paper_url: http://arxiv.org/abs/2311.14670
  • repo_url: https://github.com/astro-informatics/s2fft
  • paper_authors: Matthew A. Price, Jason D. McEwen
  • for: The paper is written for researchers and practitioners who work with data defined on spherical manifolds and require efficient computation of gradients for machine learning or other differentiable programming tasks.
  • methods: The paper presents novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere and rotation group, including a recursive algorithm for the calculation of Wigner $d$-functions and a hybrid automatic and manual differentiation approach.
  • results: The paper reports up to a 400-fold acceleration and very close to optimal linear scaling with increasing number of GPUs when benchmarked against alternative C codes, and exhibits an unprecedented effective linear time complexity when distributing over multiple GPUs.
    Abstract Many areas of science and engineering encounter data defined on spherical manifolds. Modelling and analysis of spherical data often necessitates spherical harmonic transforms, at high degrees, and increasingly requires efficient computation of gradients for machine learning or other differentiable programming tasks. We develop novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere $\mathbb{S}^2$ and rotation group $\text{SO}(3)$, i.e. spherical harmonic and Wigner transforms, respectively. We present a recursive algorithm for the calculation of Wigner $d$-functions that is both stable to high harmonic degrees and extremely parallelisable. By tightly coupling this with separable spherical transforms, we obtain algorithms that exhibit an extremely parallelisable structure that is well-suited for the high throughput computing of modern hardware accelerators (e.g. GPUs). We also develop a hybrid automatic and manual differentiation approach so that gradients can be computed efficiently. Our algorithms are implemented within the JAX differentiable programming framework in the S2FFT software code. Numerous samplings of the sphere are supported, including equiangular and HEALPix sampling. Computational errors are at the order of machine precision for spherical samplings that admit a sampling theorem. When benchmarked against alternative C codes we observe up to a 400-fold acceleration. Furthermore, when distributing over multiple GPUs we achieve very close to optimal linear scaling with increasing number of GPUs due to the highly parallelised and balanced nature of our algorithms. Provided access to sufficiently many GPUs our transforms thus exhibit an unprecedented effective linear time complexity.

Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

  • paper_url: http://arxiv.org/abs/2311.14658
  • repo_url: None
  • paper_authors: Zhen Qin, Xuwei Tan, Zhihui Zhu
  • for: Providing a theoretical analysis of orthonormality in deep neural network training, in particular a convergence analysis for training orthonormal deep linear neural networks.
  • methods: Analyses Riemannian gradient descent with an appropriate initialization, excluding the orthonormality requirement for one layer (a minimal sketch of a Riemannian step on the Stiefel manifold follows the abstract).
  • results: Shows convergence at a linear rate for a class of loss functions, sheds light on how increasing the number of hidden layers impacts convergence speed, and validates the analysis experimentally.
    Abstract Enforcing orthonormal or isometric property for the weight matrices has been shown to enhance the training of deep neural networks by mitigating gradient exploding/vanishing and increasing the robustness of the learned networks. However, despite its practical performance, the theoretical analysis of orthonormality in neural networks is still lacking; for example, how orthonormality affects the convergence of the training process. In this letter, we aim to bridge this gap by providing convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that Riemannian gradient descent with an appropriate initialization converges at a linear rate for training orthonormal deep linear neural networks with a class of loss functions. Unlike existing works that enforce orthonormal weight matrices for all the layers, our approach excludes this requirement for one layer, which is crucial to establish the convergence guarantee. Our results shed light on how increasing the number of hidden layers can impact the convergence speed. Experimental results validate our theoretical analysis.
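Below is a minimal sketch of a single Riemannian gradient-descent step on the Stiefel manifold of orthonormal matrices, using the standard tangent-space projection and a QR retraction; the paper's specific initialization, loss functions, and the one unconstrained layer are not reproduced, so this is only an illustration of the manifold update.

```python
import numpy as np

def stiefel_step(W, euclid_grad, lr):
    # Project the Euclidean gradient onto the tangent space at W (where W^T W = I) ...
    sym = 0.5 * (W.T @ euclid_grad + euclid_grad.T @ W)
    riem_grad = euclid_grad - W @ sym
    # ... take a step and retract back onto the manifold with a QR factorization.
    Q, R = np.linalg.qr(W - lr * riem_grad)
    return Q * np.sign(np.diag(R))   # sign fix keeps the retraction well-defined

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.normal(size=(8, 4)))   # random orthonormal 8x4 matrix
G = rng.normal(size=(8, 4))                    # placeholder for dLoss/dW
W_new = stiefel_step(W, G, lr=0.1)
print(np.allclose(W_new.T @ W_new, np.eye(4))) # stays (numerically) orthonormal
```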

JetLOV: Enhancing Jet Tree Tagging through Neural Network Learning of Optimal LundNet Variables

  • paper_url: http://arxiv.org/abs/2311.14654
  • repo_url: https://github.com/giorgiocerro/jetlov
  • paper_authors: Mauricio A. Diaz, Giorgio Cerro, Jacan Chaplais, Srinandan Dasmahapatra, Stefano Moretti
  • for: Using machine learning, and deep learning in particular, to tackle complex classification problems in physics such as jet tagging.
  • methods: Introduces JetLOV, a composite of two models: a straightforward multilayer perceptron (MLP) and the well-established LundNet.
  • results: Comparable jet tagging performance can be achieved without relying on the pre-computed LundNet variables by letting the network autonomously learn an entirely new set of variables; this may help mitigate model dependence through generalization and training on diverse datasets.
    Abstract Machine learning has played a pivotal role in advancing physics, with deep learning notably contributing to solving complex classification problems such as jet tagging in the field of jet physics. In this experiment, we aim to harness the full potential of neural networks while acknowledging that, at times, we may lose sight of the underlying physics governing these models. Nevertheless, we demonstrate that we can achieve remarkable results obscuring physics knowledge and relying completely on the model's outcome. We introduce JetLOV, a composite comprising two models: a straightforward multilayer perceptron (MLP) and the well-established LundNet. Our study reveals that we can attain comparable jet tagging performance without relying on the pre-computed LundNet variables. Instead, we allow the network to autonomously learn an entirely new set of variables, devoid of a priori knowledge of the underlying physics. These findings hold promise, particularly in addressing the issue of model dependence, which can be mitigated through generalization and training on diverse data sets.

Data-driven Prior Learning for Bayesian Optimisation

  • paper_url: http://arxiv.org/abs/2311.14653
  • repo_url: https://github.com/sighellan/plebo
  • paper_authors: Sigrid Passano Hellan, Christopher G. Lucas, Nigel H. Goddard
  • for: Improving the sample efficiency of Bayesian optimisation without assuming that all optimisation tasks share similar optimal inputs.
  • methods: Uses Prior Learning for Bayesian Optimisation (PLeBO), which learns priors for the hyperparameters of the Gaussian process surrogate model to better approximate the underlying function.
  • results: Experiments on synthetic data and a recent air pollution optimisation problem show that PLeBO and prior transfer find good inputs in fewer evaluations than other transfer learning approaches.
    Abstract Transfer learning for Bayesian optimisation has generally assumed a strong similarity between optimisation tasks, with at least a subset having similar optimal inputs. This assumption can reduce computational costs, but it is violated in a wide range of optimisation problems where transfer learning may nonetheless be useful. We replace this assumption with a weaker one only requiring the shape of the optimisation landscape to be similar, and analyse the recent method Prior Learning for Bayesian Optimisation - PLeBO - in this setting. By learning priors for the hyperparameters of the Gaussian process surrogate model we can better approximate the underlying function, especially for few function evaluations. We validate the learned priors and compare to a breadth of transfer learning approaches, using synthetic data and a recent air pollution optimisation problem as benchmarks. We show that PLeBO and prior transfer find good inputs in fewer evaluations.

Learning in Deep Factor Graphs with Gaussian Belief Propagation

  • paper_url: http://arxiv.org/abs/2311.14649
  • repo_url: None
  • paper_authors: Seth Nabarro, Mark van der Wilk, Andrew J Davison
  • for: An approach to learning in Gaussian factor graphs.
  • methods: Treats all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and views both training and prediction as inference problems with different observed nodes, each solved efficiently with belief propagation (BP).
  • results: The approach scales to deep networks and provides a natural means of continual learning by using the BP-estimated parameter marginals of the current task as parameter priors for the next; it outperforms a classical factor graph approach on a video denoising task and shows encouraging performance for continual image classification on MNIST.
    Abstract We propose an approach to do learning in Gaussian factor graphs. We treat all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and view both training and prediction as inference problems with different observed nodes. Our experiments show that these problems can be efficiently solved with belief propagation (BP), whose updates are inherently local, presenting exciting opportunities for distributed and asynchronous training. Our approach can be scaled to deep networks and provides a natural means to do continual learning: use the BP-estimated parameter marginals of the current task as parameter priors for the next. On a video denoising task we demonstrate the benefit of learnable parameters over a classical factor graph approach and we show encouraging performance of deep factor graphs for continual image classification on MNIST.

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

  • paper_url: http://arxiv.org/abs/2311.14646
  • repo_url: None
  • paper_authors: James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin
  • for: Providing theoretical backing for the benefits of overparameterization, overfitting, and more data in random feature (RF) regression.
  • methods: Studies random feature regression, a class of models equivalent to shallow networks with only the last layer trained, with an optimally tuned ridge penalty controlling model complexity (a minimal RF-regression sketch follows the abstract).
  • results: Shows that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, so infinite-width architectures are preferable to any finite width; for a large class of tasks with power-law eigenstructure, near-optimal performance requires training to near-zero training loss.
    Abstract In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by powerlaw eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models.
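The sketch below sets up the random-feature regression model the abstract refers to: a fixed random first layer with only the linear readout trained by ridge regression. The dimensions, activation, and ridge value are illustrative; the paper's monotonicity claims concern the optimally tuned ridge penalty, which is not tuned here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_features, ridge = 200, 5, 2000, 1e-3   # far more features than samples
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Fixed random first layer; only the linear readout is trained (RF regression).
W = rng.normal(size=(d, n_features)) / np.sqrt(d)
Phi = np.maximum(X @ W, 0.0)                   # ReLU random features

# Ridge-regularized least squares for the readout weights.
a = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_features), Phi.T @ y)

X_test = rng.normal(size=(1000, d))
y_pred = np.maximum(X_test @ W, 0.0) @ a
```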

A General Framework for User-Guided Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2311.14645
  • repo_url: None
  • paper_authors: Carl Hvarfner, Frank Hutter, Luigi Nardi
  • for: Optimizing expensive-to-evaluate black-box functions, which are prevalent in many scientific disciplines.
  • methods: Bayesian optimisation is automatic, general and sample-efficient but has limited ability to incorporate expert knowledge or beliefs to accelerate optimisation; the paper proposes ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value, applicable across different Monte Carlo acquisition functions and types of user beliefs.
  • results: Experiments show that ColaBO substantially accelerates optimisation when the prior information is accurate and retains approximately default performance when it is misleading.
    Abstract The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.

Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

  • paper_url: http://arxiv.org/abs/2311.14632
  • repo_url: None
  • paper_authors: Xinwei Zhang, Zhiqi Bu, Zhiwei Steven Wu, Mingyi Hong
  • for: Providing a differential privacy mechanism that lets deep learning models be trained on sensitive data while preserving privacy.
  • methods: Analyses Differentially Private SGD with gradient clipping (DPSGD-GC) and proposes an error-feedback (EF) DP algorithm with a Rényi-DP privacy analysis, whose clipping threshold can be chosen independently of problem-specific parameters (the DPSGD-GC baseline is sketched after the abstract).
  • results: Experiments on the Cifar-10/100 and E2E datasets show higher accuracy than DPSGD while maintaining the same level of DP guarantee.
    Abstract Differentially Private Stochastic Gradient Descent with gradient clipping (DPSGD-GC) is a powerful tool for training deep learning models using sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) comes at the cost of model performance degradation due to DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC, and has shown that it only converges when using large clipping thresholds that are dependent on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the {\it constant} bias introduced by the clipping. In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias, but more importantly, it allows for an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R{\'e}nyi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on Cifar-10/100 and E2E datasets, show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
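For reference, the hedged sketch below implements one step of the DPSGD-GC baseline discussed in the abstract (per-sample gradient clipping plus Gaussian noise) for logistic regression; the hyperparameters and data are placeholders, and the paper's proposed error-feedback algorithm, which removes the constant clipping bias, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dpsgd_gc_step(w, X_batch, y_batch, lr=0.1, clip=1.0, noise_mult=1.0):
    # One DPSGD-GC step: clip each per-sample gradient to norm <= clip,
    # sum the clipped gradients, then add Gaussian noise scaled to the clip norm.
    clipped = []
    for xi, yi in zip(X_batch, y_batch):
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        g = (p - yi) * xi
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        clipped.append(g)
    g_sum = np.sum(clipped, axis=0)
    g_sum += rng.normal(scale=noise_mult * clip, size=w.shape)
    return w - lr * g_sum / len(X_batch)

X = rng.normal(size=(64, 10))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(10)
for _ in range(100):
    w = dpsgd_gc_step(w, X, y)
```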

Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization

  • paper_url: http://arxiv.org/abs/2311.14609
  • repo_url: None
  • paper_authors: Selina Drews, Michael Kohler
  • for: Showing that over-parametrized deep neural networks can achieve good estimation rates without a regularization term.
  • methods: Analyses deep neural network estimates learned by applying gradient descent to the (unregularized) empirical $L_2$ risk, with a suitable initialization of the network, a suitable number of gradient descent steps, and a suitable step size.
  • results: The resulting estimate is universally consistent for bounded predictor variables; if the regression function is H\"older smooth with exponent $1/2 \leq p \leq 1$, the expected $L_2$ error converges to zero at a rate of approximately $n^{-1/(1+d)}$, and for interaction models the derived rate does not depend on the input dimension $d$.
    Abstract Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. In the case of a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the $L_2$ error converges to zero with a convergence rate of approximately $n^{-1/(1+d)}$. Furthermore, in case of an interaction model, where the regression function consists of a sum of H\"older smooth functions with $d^*$ components, a rate of convergence is derived which does not depend on the input dimension $d$.

A Metalearned Neural Circuit for Nonparametric Bayesian Inference

  • paper_url: http://arxiv.org/abs/2311.14601
  • repo_url: https://github.com/jakesnell/neural-circuits
  • paper_authors: Jake C. Snell, Gianluca Bencomo, Thomas L. Griffiths
  • for: Addressing the mismatch between the common closed-set, balanced-classes assumption in classification and real-world data, where class occurrence statistics often follow a long-tailed power-law distribution and not all classes appear in a single sample.
  • methods: Extracts the inductive bias of a nonparametric Bayesian model and transfers it to an artificial neural network by metalearning a sequence model on data simulated with a nonparametric Bayesian prior, enabling inference over an unlimited set of classes (a sketch of sampling from such a prior follows the abstract).
  • results: The metalearned neural circuit achieves comparable or better performance than particle-filter-based inference methods for these models, while being faster and simpler to use than methods that explicitly incorporate Bayesian nonparametric inference.
    Abstract Most applications of machine learning to classification assume a closed set of balanced classes. This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution and it is unlikely that all classes are seen in a single sample. Nonparametric Bayesian models naturally capture this phenomenon, but have significant practical barriers to widespread adoption, namely implementation complexity and computational inefficiency. To address this, we present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network. By simulating data with a nonparametric Bayesian prior, we can metalearn a sequence model that performs inference over an unlimited set of classes. After training, this "neural circuit" has distilled the corresponding inductive bias and can successfully perform sequential inference over an open set of classes. Our experimental results show that the metalearned neural circuit achieves comparable or better performance than particle filter-based methods for inference in these models while being faster and simpler to use than methods that explicitly incorporate Bayesian nonparametric inference.
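A small sketch of the kind of data generation involved: sampling class assignments from a Chinese restaurant process, a standard nonparametric Bayesian prior under which the number of classes grows with the sequence length. Whether the paper uses exactly this prior and observation model is an assumption; the sketch only illustrates how such simulated sequences could be produced for metalearning.

```python
import numpy as np

def sample_crp_labels(n, alpha, rng):
    # Chinese restaurant process: a nonparametric Bayesian prior over partitions,
    # under which the number of distinct classes grows with the sequence length.
    labels, counts = np.zeros(n, dtype=int), []
    for t in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(0)      # a brand-new class is opened
        counts[k] += 1
        labels[t] = k
    return labels

rng = np.random.default_rng(0)
labels = sample_crp_labels(50, alpha=1.0, rng=rng)
# Pairing each label with a class-specific observation (e.g. a Gaussian draw per
# class) yields training sequences on which a sequence model can be metalearned
# to imitate posterior inference over an open-ended set of classes.
```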

One Fits All: Universal Time Series Analysis by Pretrained LM and Specially Designed Adaptors

  • paper_url: http://arxiv.org/abs/2311.14782
  • repo_url: https://github.com/psacfc/gpt4ts_adapter
  • paper_authors: Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, Rong Jin
  • for: Exploring the use of pretrained language models for universal time series analysis.
  • methods: Keeps the self-attention and feedforward layers of a pretrained language or image model frozen (the Frozen Pretrained Transformer, FPT), adds input and output projection matrices, and introduces four adapters designed for downstream tasks such as forecasting and anomaly detection (a minimal FPT sketch follows the abstract).
  • results: The simple FPT achieves top-tier performance across various time series analysis tasks, and fine-tuning it with the custom-designed adapters further improves performance, outperforming specialized task-specific models.
    Abstract Despite the impressive achievements of pre-trained models in the fields of natural language processing (NLP) and computer vision (CV), progress in the domain of time series analysis has been limited. In contrast to NLP and CV, where a single model can handle various tasks, time series analysis still relies heavily on task-specific methods for activities such as classification, anomaly detection, forecasting, and few-shot learning. The primary obstacle to developing a pre-trained model for time series analysis is the scarcity of sufficient training data. In our research, we overcome this obstacle by utilizing pre-trained models from language or CV, which have been trained on billions of data points, and apply them to time series analysis. We assess the effectiveness of the pre-trained transformer model in two ways. Initially, we maintain the original structure of the self-attention and feedforward layers in the residual blocks of the pre-trained language or image model, using the Frozen Pre-trained Transformer (FPT) for time series analysis with the addition of projection matrices for input and output. Additionally, we introduce four unique adapters, designed specifically for downstream tasks based on the pre-trained model, including forecasting and anomaly detection. These adapters are further enhanced with efficient parameter tuning, resulting in superior performance compared to all state-of-the-art methods.Our comprehensive experimental studies reveal that (a) the simple FPT achieves top-tier performance across various time series analysis tasks; and (b) fine-tuning the FPT with the custom-designed adapters can further elevate its performance, outshining specialized task-specific models.
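A hedged PyTorch sketch of the frozen-pretrained-transformer idea: project time-series patches into the embedding space of a frozen GPT-2 backbone and project the hidden states to the target horizon. The choice of trainable parameters (layer norms only), the patching scheme, and all dimensions are assumptions for illustration, and the paper's four task-specific adapters are not reproduced.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class FPTForecaster(nn.Module):
    def __init__(self, patch_len=16, n_patches=32, horizon=96, d_model=768):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")
        for name, p in self.backbone.named_parameters():
            # Freeze attention and feed-forward weights; keep layer norms trainable.
            p.requires_grad = "ln" in name
        self.in_proj = nn.Linear(patch_len, d_model)              # input projection
        self.out_proj = nn.Linear(n_patches * d_model, horizon)   # output projection

    def forward(self, patches):            # patches: (batch, n_patches, patch_len)
        h = self.backbone(inputs_embeds=self.in_proj(patches)).last_hidden_state
        return self.out_proj(h.flatten(1)) # (batch, horizon)

model = FPTForecaster()
forecast = model(torch.randn(4, 32, 16))   # -> shape (4, 96)
```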

Example-Based Explanations of Random Forest Predictions

  • paper_url: http://arxiv.org/abs/2311.14581
  • repo_url: None
  • paper_authors: Henrik Boström
  • for: Providing more useful example-based explanations of random forest predictions.
  • methods: Modifies the prediction procedure to include only the top-weighted training examples, reducing the number of examples involved in each explanation (the underlying weight identity is sketched after the abstract).
  • results: Experiments on regression and classification tasks show that the number of examples used in each explanation can be substantially reduced while maintaining, or even improving, predictive performance compared with the standard prediction procedure.
    Abstract A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leafs of the forest into which the test object falls; each prediction can hence be explained exactly by the set of training examples for which the weights are non-zero. The number of examples used in such explanations is shown to vary with the dimensionality of the training set and hyperparameters of the random forest algorithm. This means that the number of examples involved in each prediction can to some extent be controlled by varying these parameters. However, for settings that lead to a required predictive performance, the number of examples involved in each prediction may be unreasonably large, preventing the user to grasp the explanations. In order to provide more useful explanations, a modified prediction procedure is proposed, which includes only the top-weighted examples. An investigation on regression and classification tasks shows that the number of examples used in each explanation can be substantially reduced while maintaining, or even improving, predictive performance compared to the standard prediction procedure.
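The identity described in the abstract can be made concrete with scikit-learn: with bootstrapping disabled, each leaf value is the mean label of the training examples reaching it, so the forest prediction equals a weighted sum of training labels, and an explanation can keep only the top-weighted examples. The renormalisation of the top weights at the end is an assumption about the paper's modified procedure, not its exact form.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
# bootstrap=False so each leaf value is the mean label of the original training
# examples reaching it, which makes the weight identity below exact.
rf = RandomForestRegressor(n_estimators=100, bootstrap=False,
                           max_features=0.5, random_state=0).fit(X, y)
x_test = X[:1]

# w_i = (1/T) * sum over trees t of 1[i falls in the same leaf as x_test] / leaf size,
# so that the forest prediction equals sum_i w_i * y_i.
train_leaves = rf.apply(X)              # (n_samples, n_trees)
test_leaves = rf.apply(x_test)[0]       # (n_trees,)
weights = np.zeros(len(X))
for t in range(rf.n_estimators):
    in_leaf = train_leaves[:, t] == test_leaves[t]
    weights[in_leaf] += 1.0 / in_leaf.sum()
weights /= rf.n_estimators
print(np.allclose(weights @ y, rf.predict(x_test)[0]))   # True

# Keeping only the top-weighted examples gives a compact explanation.
top = np.argsort(weights)[::-1][:10]
approx_prediction = weights[top] @ y[top] / weights[top].sum()
```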

Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China

  • paper_url: http://arxiv.org/abs/2311.14577
  • repo_url: None
  • paper_authors: Jen-Yin Yeh, Hsin-Yu Chiu, Jhih-Huei Huang
  • for: Using machine learning models to predict the failure of P2P lending platforms in China.
  • methods: Employs the filter method and the wrapper method with forward selection and backward elimination to establish the robustness and importance of variables (a forward-selection sketch follows the abstract).
  • results: Identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models; reducing the number of variables increases the false acceptance rate while performance metrics remain stable, with an AUC of approximately 0.96 and an F1 score of around 0.88.
    Abstract This study employs machine learning models to predict the failure of Peer-to-Peer (P2P) lending platforms, specifically in China. By employing the filter method and wrapper method with forward selection and backward elimination, we establish a rigorous and practical procedure that ensures the robustness and importance of variables in predicting platform failures. The research identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models, suggesting their reliability and relevance in predicting platform failures. The study highlights that reducing the number of variables in the feature subset leads to an increase in the false acceptance rate while the performance metrics remain stable, with an AUC value of approximately 0.96 and an F1 score of around 0.88. The findings of this research provide significant practical implications for regulatory authorities and investors operating in the Chinese P2P lending industry.
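A hedged sketch of the wrapper method with forward selection: greedily add the feature that most improves cross-validated AUC of a logistic-regression model and stop when no candidate helps. The synthetic data, classifier settings, and stopping tolerance are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # Score every candidate feature added to the current subset by CV AUC.
    scores = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, selected + [j]], y,
                                 cv=5, scoring="roc_auc").mean()
              for j in remaining}
    j_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score + 1e-4:     # stop when no candidate improves the AUC
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_score = s_best

print(selected, round(best_score, 3))
```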

FRUITS: Feature Extraction Using Iterated Sums for Time Series Classification

  • paper_url: http://arxiv.org/abs/2311.14549
  • repo_url: https://github.com/irkri/fruits
  • paper_authors: Joscha Diehl, Richard Krieg
  • for: A time series classification pipeline that extracts features based on the iterated-sums signature (ISS) and then applies a linear classifier; the features are intrinsically nonlinear, capture chronological information, and under certain settings are invariant to time-warping.
  • methods: Iterated-sums-signature feature extraction followed by a linear classifier (a minimal ISS sketch follows the abstract).
  • results: Competitive with state-of-the-art methods on the UCR archive, both in terms of accuracy and speed; the code is available at \url{https://github.com/irkri/fruits}.
    Abstract We introduce a pipeline for time series classification that extracts features based on the iterated-sums signature (ISS) and then applies a linear classifier. These features are intrinsically nonlinear, capture chronological information, and, under certain settings, are invariant to time-warping. We are competitive with state-of-the-art methods on the UCR archive, both in terms of accuracy and speed. We make our code available at \url{https://github.com/irkri/fruits}.
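A minimal sketch of a single iterated-sums-signature feature, under the (assumed) convention of strictly increasing indices over the series' increments; the FRUITS pipeline combines many such words with additional preprocessing and feeds the resulting features to a linear classifier, which is not reproduced here.

```python
import numpy as np

def iterated_sum(x, word):
    # <word, x> = sum over i1 < i2 < ... < ik of dx[i1]**p1 * ... * dx[ik]**pk,
    # computed with nested cumulative sums over the increments dx of the series.
    dx = np.diff(x, prepend=x[0])                 # increments, dx[0] = 0
    acc = np.ones_like(dx, dtype=float)
    for k, p in enumerate(word):
        prev = acc if k == 0 else np.concatenate(([0.0], acc[:-1]))  # enforce i_{k-1} < i_k
        acc = np.cumsum(prev * dx**p)
    return acc[-1]

x = np.cumsum(np.random.default_rng(0).normal(size=100))   # a toy time series
features = [iterated_sum(x, w) for w in ([1], [2], [1, 1], [1, 2], [2, 1])]
# A bank of such features over many words (and preprocessed copies of the series)
# would then be fed to a linear classifier.
```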

Finding Foundation Models for Time Series Classification with a PreText Task

  • paper_url: http://arxiv.org/abs/2311.14534
  • repo_url: https://github.com/msd-irimas/domainfoundationmodelstsc
  • paper_authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier
  • for: Addressing the overfitting problem in time series classification, especially when training data are scarce.
  • methods: Introduces pre-trained domain foundation models trained with a novel pretext task spanning multiple datasets (identifying the originating dataset of each time series sample), followed by a fine-tuning phase for specific dataset classification.
  • results: Extensive experiments on the UCR archive show that this pre-training strategy significantly outperforms conventional training without pre-training and effectively reduces overfitting on small datasets.
    Abstract Over the past decade, Time Series Classification (TSC) has gained an increasing attention. While various methods were explored, deep learning - particularly through Convolutional Neural Networks (CNNs)-stands out as an effective approach. However, due to the limited availability of training data, defining a foundation model for TSC that overcomes the overfitting problem is still a challenging task. The UCR archive, encompassing a wide spectrum of datasets ranging from motion recognition to ECG-based heart disease detection, serves as a prime example for exploring this issue in diverse TSC scenarios. In this paper, we address the overfitting challenge by introducing pre-trained domain foundation models. A key aspect of our methodology is a novel pretext task that spans multiple datasets. This task is designed to identify the originating dataset of each time series sample, with the goal of creating flexible convolution filters that can be applied across different datasets. The research process consists of two phases: a pre-training phase where the model acquires general features through the pretext task, and a subsequent fine-tuning phase for specific dataset classifications. Our extensive experiments on the UCR archive demonstrate that this pre-training strategy significantly outperforms the conventional training approach without pre-training. This strategy effectively reduces overfitting in small datasets and provides an efficient route for adapting these models to new datasets, thus advancing the capabilities of deep learning in TSC.

Comparing Feature Engineering and End-to-End Deep Learning for Autism Spectrum Disorder Assessment based on Fullbody-Tracking

  • paper_url: http://arxiv.org/abs/2311.14533
  • repo_url: None
  • paper_authors: Alberto Altozano, Maria Eleonora Minissi, Mariano Alcañiz, Javier Marín-Morales
  • for: Assessing the efficacy of different approaches to Autism Spectrum Disorder (ASD) assessment based on full-body tracking, in order to identify more reliable and flexible methods.
  • methods: Compares two approaches on multiple motor tasks in a virtual reality environment: classification with hand-crafted features and end-to-end models, evaluated with a reliable repeated cross-validation framework.
  • results: Hand-crafted features performed best on specific tasks, reaching a state-of-the-art area under the curve (AUC) of 0.90$\pm$0.06, while end-to-end models gave more consistent results with less variability across all VR tasks (maximum task AUC of 0.89$\pm$0.06), demonstrating domain generalization and reliability.
    Abstract Autism Spectrum Disorder (ASD) is characterized by challenges in social communication and restricted patterns, with motor abnormalities gaining traction for early detection. However, kinematic analysis in ASD is limited, often lacking robust validation and relying on hand-crafted features for single tasks, leading to inconsistencies across studies. Thus, end-to-end models have become promising methods to overcome the need for feature engineering. Our aim is to assess both approaches across various kinematic tasks to measure the efficacy of commonly used features in ASD assessment, while comparing them to end-to-end models. Specifically, we developed a virtual reality environment with multiple motor tasks and trained models using both classification approaches. We prioritized a reliable validation framework with repeated cross-validation. Our comparative analysis revealed that hand-crafted features outperformed our deep learning approach in specific tasks, achieving a state-of-the-art area under the curve (AUC) of 0.90$\pm$0.06. Conversely, end-to-end models provided more consistent results with less variability across all VR tasks, demonstrating domain generalization and reliability, with a maximum task AUC of 0.89$\pm$0.06. These findings show that end-to-end models enable less variable and context-independent ASD assessments without requiring domain knowledge or task specificity. However, they also recognize the effectiveness of hand-crafted features in specific task scenarios.

Fault Detection in Telecom Networks using Bi-level Federated Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.14469
  • repo_url: None
  • paper_authors: R. Bourgerie, T. Zanouda
  • for: Detecting and diagnosing faults in 5G and beyond networks through anomaly detection, to support maintenance and operation.
  • methods: Proposes a Bi-level Federated Graph Neural Network anomaly detection and diagnosis model that detects anomalies in telecom networks in a privacy-preserving manner while adapting to different deployment scenarios.
  • results: Experiments with real-world data from an operational network show that the Personalized Federated Temporal Graph Neural Networks approach outperforms the most commonly used anomaly detection techniques.
    Abstract 5G and Beyond Networks become increasingly complex and heterogeneous, with diversified and high requirements from a wide variety of emerging applications. The complexity and diversity of Telecom networks place an increasing strain on maintenance and operation efforts. Moreover, the strict security and privacy requirements present a challenge for mobile operators to leverage network data. To detect network faults, and mitigate future failures, prior work focused on leveraging traditional ML/DL methods to locate anomalies in networks. The current approaches, although powerful, do not consider the intertwined nature of embedded and software-intensive Radio Access Network systems. In this paper, we propose a Bi-level Federated Graph Neural Network anomaly detection and diagnosis model that is able to detect anomalies in Telecom networks in a privacy-preserving manner, while minimizing communication costs. Our method revolves around conceptualizing Telecom data as a bi-level temporal Graph Neural Networks. The first graph captures the interactions between different RAN nodes that are exposed to different deployment scenarios in the network, while each individual Radio Access Network node is further elaborated into its software (SW) execution graph. Additionally, we use Federated Learning to address privacy and security limitations. Furthermore, we study the performance of anomaly detection model under three settings: (1) Centralized (2) Federated Learning and (3) Personalized Federated Learning using real-world data from an operational network. Our comprehensive experiments showed that Personalized Federated Temporal Graph Neural Networks method outperforms the most commonly used techniques for Anomaly Detection.

Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

  • paper_url: http://arxiv.org/abs/2311.14468
  • repo_url: None
  • paper_authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh
  • for: Improving stochastic gradient descent (SGD) by reducing the noise in gradients estimated from a mini-batch of data samples.
  • methods: A novel adaptive importance sampling scheme that forms mini-batches prioritizing crucial data points, together with a simplified importance function that relies solely on the loss gradient of the output layer (a minimal sketch follows the abstract).
  • results: The proposed gradient estimation techniques improve convergence in classification and regression tasks with minimal computational overhead, validated on image and point-cloud datasets.
    Abstract Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
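A toy sketch of importance sampling for SGD in the spirit described above: for logistic regression the output-layer loss-gradient magnitude is |p - y| per sample, mini-batches are drawn with probability proportional to it, and inverse-probability weights keep the gradient estimate unbiased. Recomputing scores over the full dataset each step is done here only for simplicity and is not how the paper integrates the scheme into a training framework.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, lr, batch = 1024, 10, 0.1, 32
X = rng.normal(size=(N, d))
y = (X[:, 0] + 0.5 * rng.normal(size=N) > 0).astype(float)
w = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # Importance proportional to the output-layer loss-gradient magnitude,
    # which for logistic regression is |p - y| per sample (cheap to evaluate).
    scores = np.abs(sigmoid(X @ w) - y) + 1e-3       # keep all probabilities > 0
    probs = scores / scores.sum()
    idx = rng.choice(N, size=batch, p=probs)
    # Inverse-probability weights keep the mini-batch gradient estimate unbiased.
    weights = 1.0 / (N * probs[idx])
    grad = (weights * (sigmoid(X[idx] @ w) - y[idx])) @ X[idx] / batch
    w -= lr * grad
```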

Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

  • paper_url: http://arxiv.org/abs/2311.14464
  • repo_url: None
  • paper_authors: Loh Sher En Jessica, Naheed Anjum Arafat, Wei Xian Lim, Wai Lee Chan, Adams Wai Kin Kong
  • for: Improving the accuracy and efficiency of deep-learning-based CFD simulation.
  • methods: Introduces two novel geometric representations, Shortest Vector (SV) and Directional Integrated Distance (DID), and uses Finite Volume Features (FVF) as node and edge attributes in the graph convolutions, together with residual training on low-resolution data.
  • results: Experiments on two datasets with five state-of-the-art GNN methods for CFD show that SV, DID, FVF and residual training reduce the predictive error of current GNN-based methods by as much as 41%.
    Abstract Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a common method in practical CFD applications. Specifically, the input nodes in these GNN methods have very limited information about any object immersed in the simulation domain and its surrounding environment. Also, the cell characteristics of the mesh such as cell volume, face surface area, and face centroid are not included in the message-passing operations in the GNN methods. To address these weaknesses, this work proposes two novel geometric representations: Shortest Vector (SV) and Directional Integrated Distance (DID). Extracted from the mesh, the SV and DID provide global geometry perspective to each input node, thus removing the need to collect this information through message-passing. This work also introduces the use of Finite Volume Features (FVF) in the graph convolutions as node and edge attributes, enabling its message-passing operations to adjust to different nodes. Finally, this work is the first to demonstrate how residual training, with the availability of low-resolution data, can be adopted to improve the flow field prediction accuracy. Experimental results on two datasets with five different state-of-the-art GNN methods for CFD indicate that SV, DID, FVF and residual training can effectively reduce the predictive error of current GNN-based methods by as much as 41%.
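To make the Shortest Vector idea concrete, the sketch below attaches an SV-style feature to every mesh node: the vector to the nearest sampled point on the immersed object's surface, found with a KD-tree. The 2D mesh, the unit-circle object, and the sampling density are synthetic assumptions; DID and FVF are not shown.

```python
# Illustrative Shortest-Vector-style node feature: for each mesh node, the vector
# to the nearest sampled point on the immersed object's surface.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
mesh_nodes = rng.uniform(-2.0, 2.0, size=(500, 2))           # synthetic 2D mesh node coordinates
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
surface = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit circle as the immersed object

tree = cKDTree(surface)
dist, nearest = tree.query(mesh_nodes)                       # nearest surface sample per node
shortest_vector = surface[nearest] - mesh_nodes              # SV-style feature (dx, dy) per node
features = np.concatenate([shortest_vector, dist[:, None]], axis=1)
print(features.shape)  # (500, 3): global geometry information attached to every input node
```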

Disentangling the Spectral Properties of the Hodge Laplacian: Not All Small Eigenvalues Are Equal

  • paper_url: http://arxiv.org/abs/2311.14427
  • repo_url: None
  • paper_authors: Vincent P. Grande, Michael T. Schaub
  • for: Providing richer spectral information for graph theory, machine learning, and graph signal processing by disentangling the spectrum of the Hodge Laplacian.
  • methods: Analysis of the small eigenvalues of the Hodge Laplacian on higher-order graph models (simplicial and cellular complexes), distinguishing harmonic, curl, and gradient eigenmodes, and a notion of persistent eigenvector similarity that tracks individual eigenvectors/-values through the persistence filtration across scales.
  • results: A novel form of topological spectral clustering, and a classification of edges and higher-order simplices based on their relationship to the smallest harmonic, curl, and gradient eigenvectors.
    Abstract The rich spectral information of the graph Laplacian has been instrumental in graph theory, machine learning, and graph signal processing for applications such as graph classification, clustering, or eigenmode analysis. Recently, the Hodge Laplacian has come into focus as a generalisation of the ordinary Laplacian for higher-order graph models such as simplicial and cellular complexes. Akin to the traditional analysis of graph Laplacians, many authors analyse the smallest eigenvalues of the Hodge Laplacian, which are connected to important topological properties such as homology. However, small eigenvalues of the Hodge Laplacian can carry different information depending on whether they are related to curl or gradient eigenmodes, and thus may not be comparable. We therefore introduce the notion of persistent eigenvector similarity and provide a method to track individual harmonic, curl, and gradient eigenvectors/-values through the so-called persistence filtration, leveraging the full information contained in the Hodge-Laplacian spectrum across all possible scales of a point cloud. Finally, we use our insights (a) to introduce a novel form of topological spectral clustering and (b) to classify edges and higher-order simplices based on their relationship to the smallest harmonic, curl, and gradient eigenvectors.
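The distinction the paper draws between gradient, curl, and harmonic eigenmodes can be illustrated on a tiny simplicial complex: with node-edge and edge-triangle boundary matrices B1 and B2, the 1-Hodge Laplacian is L1 = B1ᵀB1 + B2B2ᵀ, and any edge flow splits orthogonally into the three components. The four-node complex below is a made-up example, not one of the paper's point clouds.

```python
# Hodge decomposition of an edge flow on a small oriented 2-complex
# (4 nodes, 5 edges, one filled triangle (0,1,2)).
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
B1 = np.zeros((4, 5))                                  # node-to-edge incidence
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0
B2 = np.array([[1.0], [-1.0], [1.0], [0.0], [0.0]])    # edge-to-triangle incidence for (0,1,2)

L1 = B1.T @ B1 + B2 @ B2.T                             # 1-Hodge Laplacian
eigvals = np.linalg.eigvalsh(L1)

def proj(A, x):
    # Orthogonal projector onto the column space of A (rank-safe via pseudoinverse).
    return A @ (np.linalg.pinv(A) @ x)

flow = np.array([1.0, 0.0, -1.0, 2.0, -0.5])           # an arbitrary edge flow
grad_part = proj(B1.T, flow)                           # gradient component, im(B1^T)
curl_part = proj(B2, flow)                             # curl component, im(B2)
harm_part = flow - grad_part - curl_part               # harmonic component, ker(L1)

print(np.round(eigvals, 3))                            # exactly one zero eigenvalue (one unfilled cycle)
print(np.allclose(L1 @ harm_part, 0.0, atol=1e-10))    # True: harmonic flows lie in ker(L1)
```

A small eigenvalue of L1 can thus be "small" for different reasons, which is precisely why the paper argues that harmonic, curl, and gradient eigenpairs should be tracked separately through the filtration.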

Approximation of Convex Envelope Using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.14421
  • repo_url: None
  • paper_authors: Vivek S. Borkar, Adit Akarsh
  • for: Estimating the convex envelope of a non-convex function, starting from Oberman's stochastic control formulation of the problem.
  • methods: A reinforcement learning scheme that approximates the convex envelope using a variant of Q-learning for controlled optimal stopping.
  • results: Very promising results on a standard library of test problems.
    Abstract Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for controlled optimal stopping. It shows very promising results on a standard library of test problems.

A Comparison of PDF Projection with Normalizing Flows and SurVAE

  • paper_url: http://arxiv.org/abs/2311.14412
  • repo_url: None
  • paper_authors: Paul M. Baggenstoss, Felix Govaers
  • for: Examining Normalizing Flows (NF) and Surjection VAE (SurVAE), which construct generative networks with exact likelihood calculation from composable layers.
  • methods: A comparison of NF and SurVAE with PDF projection, covering both dimension-preserving and dimension-altering transformations.
  • results: NF and SurVAE are shown to be a re-invention of PDF projection, which appeared over twenty years earlier and is much further developed.
    Abstract Normalizing flows (NF) recently gained attention as a way to construct generative networks with exact likelihood calculation out of composable layers. However, NF is restricted to dimension-preserving transformations. Surjection VAE (SurVAE) has been proposed to extend NF to dimension-altering transformations. Such networks are desirable because they are expressive and can be precisely trained. We show that the approaches are a re-invention of PDF projection, which appeared over twenty years earlier and is much further developed.

Unveiling The Factors of Aesthetic Preferences with Explainable AI

  • paper_url: http://arxiv.org/abs/2311.14410
  • repo_url: None
  • paper_authors: Derya Soydaner, Johan Wagemans
  • for: Understanding which attributes drive aesthetic preferences in images and predicting aesthetic scores with machine learning models.
  • methods: Several machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, trained on aesthetic attributes and explained with the SHapley Additive exPlanations (SHAP) technique.
  • results: Experiments on three image-aesthetics benchmarks compare the models' accuracy in predicting aesthetic scores, while SHAP provides interpretable insights into the roles of individual attributes and their interactions.
    Abstract The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing machine learning models that focus on aesthetic attributes known to influence preferences. Through a data mining approach, our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology involves employing various machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, to compare their performances in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, providing insights into the roles of attributes and their interactions. Ultimately, our study aims to shed light on the complex nature of aesthetic preferences in images through machine learning and provides a deeper understanding of the attributes that influence aesthetic judgements.
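The attributes-to-regressor-to-SHAP pipeline is straightforward to reproduce in outline. The sketch below uses scikit-learn and the `shap` package on synthetic attribute columns and scores; the attribute names and the Random Forest choice are placeholders, not the benchmarks or tuned models from the paper.

```python
# Minimal attributes -> regressor -> SHAP pipeline (synthetic data, not the paper's benchmarks).
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
attrs = pd.DataFrame({                         # hypothetical aesthetic attributes per image
    "color_harmony": rng.uniform(0, 1, 500),
    "symmetry": rng.uniform(0, 1, 500),
    "depth_of_field": rng.uniform(0, 1, 500),
    "rule_of_thirds": rng.uniform(0, 1, 500),
})
score = 2 * attrs["color_harmony"] + attrs["symmetry"] + rng.normal(0, 0.1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(attrs, score, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)          # attribute-level explanations of predicted scores
shap_values = explainer.shap_values(X_te)
print("R^2 on held-out images:", model.score(X_te, y_te))
print("mean |SHAP| per attribute:", np.abs(shap_values).mean(axis=0))
```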

BHGNN-RT: Network embedding for directed heterogeneous graphs

  • paper_url: http://arxiv.org/abs/2311.14404
  • repo_url: https://github.com/albertlordsun/bhgnn-rt
  • paper_authors: Xiyang Sun, Fumiyasu Komaki
  • for: Proposing a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT) for network embedding of directed heterogeneous graphs, including mitigation of the over-smoothing problem.
  • methods: An embedding method that leverages a bidirectional message-passing process and network heterogeneity, with the teleport proportion optimized to overcome over-smoothing.
  • results: BHGNN-RT achieves state-of-the-art performance on node classification and unsupervised clustering across datasets; ablations examine the effects of message components, model layers, and teleport proportion.
    Abstract Networks are one of the most valuable data structures for modeling problems in the real world. However, the most recent node embedding strategies have focused on undirected graphs, with limited attention to directed graphs, especially directed heterogeneous graphs. In this study, we first investigated the network properties of directed heterogeneous graphs. Based on network analysis, we proposed an embedding method, a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT), for directed heterogeneous graphs, that leverages bidirectional message-passing process and network heterogeneity. With the optimization of teleport proportion, BHGNN-RT is beneficial to overcome the over-smoothing problem. Extensive experiments on various datasets were conducted to verify the efficacy and efficiency of BHGNN-RT. Furthermore, we investigated the effects of message components, model layer, and teleport proportion on model performance. The performance comparison with all other baselines illustrates that BHGNN-RT achieves state-of-the-art performance, outperforming the benchmark methods in both node classification and unsupervised clustering tasks.
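The released implementation is at the repo_url above; as a schematic of the two ingredients named in the abstract, the NumPy sketch below aggregates separately along in-edges and out-edges and mixes in a random-teleport term controlled by a proportion alpha. The graph, feature sizes, and update rule are illustrative assumptions, not the authors' architecture.

```python
# Schematic bidirectional propagation with random teleport on a small directed graph.
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.3).astype(float)           # directed adjacency (A[i, j]=1: edge i -> j)
np.fill_diagonal(A, 0.0)
X = rng.normal(size=(n, d))                            # initial node features

def row_normalize(M):
    deg = M.sum(axis=1, keepdims=True)
    return np.divide(M, deg, out=np.zeros_like(M), where=deg > 0)

P_out, P_in = row_normalize(A), row_normalize(A.T)     # aggregate over out- and in-neighbours
W_out, W_in = rng.normal(size=(d, d)), rng.normal(size=(d, d))
alpha = 0.15                                           # teleport proportion (tuned in the paper)

H = X
for _ in range(3):
    msg = P_out @ H @ W_out + P_in @ H @ W_in          # bidirectional message passing
    H = (1 - alpha) * np.tanh(msg) + alpha * X         # teleport back to the input mitigates over-smoothing
print(H.shape)
```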

TEA: Test-time Energy Adaptation

  • paper_url: http://arxiv.org/abs/2311.14402
  • repo_url: None
  • paper_authors: Yige Yuan, Bingbing Xu, Liang Hou, Fei Sun, Huawei Shen, Xueqi Cheng
  • for: Improving model generalizability when test data diverge from the training distribution.
  • methods: An energy-based perspective that transforms the trained classifier into an energy-based model and aligns the model's distribution with that of the test data.
  • results: Extensive experiments across tasks, benchmarks, and architectures show superior generalization over state-of-the-art methods, with a more comprehensive perception of test distributions and improved calibration.
    Abstract Test-time adaptation (TTA) aims to improve model generalizability when test data diverges from training distribution, offering the distinct advantage of not requiring access to training data and processes, especially valuable in the context of large pre-trained models. However, current TTA methods fail to address the fundamental issue: covariate shift, i.e., the decreased generalizability can be attributed to the model's reliance on the marginal distribution of the training data, which may impair model calibration and introduce confirmation bias. To address this, we propose a novel energy-based perspective, enhancing the model's perception of target data distributions without requiring access to training data or processes. Building on this perspective, we introduce Test-time Energy Adaptation (TEA), which transforms the trained classifier into an energy-based model and aligns the model's distribution with the test data's, enhancing its ability to perceive test distributions and thus improving overall generalizability. Extensive experiments across multiple tasks, benchmarks and architectures demonstrate TEA's superior generalization performance against state-of-the-art methods. Further in-depth analyses reveal that TEA can equip the model with a comprehensive perception of test distribution, ultimately paving the way toward improved generalization and calibration.
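The energy view of a classifier can be written down directly: E(x) = -logsumexp over the logits, as in joint energy-based models. The sketch below takes a few gradient steps that lower the energy the classifier assigns to an unlabeled test batch, adapting only a small subset of parameters. It is an illustrative reduction, not the full TEA procedure: the stand-in network, the choice of adapted parameters, and the omission of the partition-function term that a complete EBM objective would require are all simplifying assumptions.

```python
# Energy view of a classifier and a simplified test-time adaptation step (not the full TEA method).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))  # stand-in trained classifier

def energy(logits):
    # JEM-style energy of an input under the classifier: E(x) = -logsumexp_y f(x)[y].
    return -torch.logsumexp(logits, dim=1)

x_test = torch.randn(128, 20)                 # unlabeled test batch from a shifted distribution
adapt_params = list(model[2].parameters())    # adapt only a small subset of parameters at test time
opt = torch.optim.SGD(adapt_params, lr=1e-3)

for _ in range(5):                            # a few adaptation steps on the test batch
    opt.zero_grad()
    e = energy(model(x_test)).mean()          # lower the energy assigned to test inputs
    e.backward()
    opt.step()

probs = model(x_test).softmax(dim=1)
print(probs.max(dim=1).values.mean())         # average confidence after adaptation
```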

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

  • paper_url: http://arxiv.org/abs/2311.14387
  • repo_url: None
  • paper_authors: Mingze Wang, Zeping Min, Lei Wu
  • for: Investigating the margin-maximization bias exhibited by gradient-based algorithms when classifying linearly separable data.
  • methods: An in-depth analysis of the velocity field associated with (normalized) gradients and its role in margin maximization; based on this analysis, a new algorithm, Progressive Rescaling Gradient Descent (PRGD), is proposed and shown to maximize the margin at an exponential rate.
  • results: Mild conditions on the data distribution are identified under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) provably fail to maximize the margin efficiently; PRGD also shows promise in improving generalization on linearly non-separable datasets and deep neural networks.
    Abstract In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an exponential rate. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow polynomial rate. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) provably fail in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
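The two ingredients named in the abstract, normalized gradient steps and an explicit progressive rescaling of the parameter norm, can be combined on a toy separable dataset as below. The exponential loss, the periodic doubling schedule, and the stopping point are guesses for illustration only; they are not the schedule or the guarantees proved in the paper.

```python
# Schematic of normalized gradient steps with progressive norm rescaling on separable data.
# The rescaling schedule below is an illustrative guess, not the schedule analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])        # linearly separable two-class data

def margin(w):
    return np.min(y * (X @ w)) / np.linalg.norm(w)

w = rng.normal(size=2)
for t in range(1, 201):
    losses = np.exp(-y * (X @ w))                      # exponential loss per sample
    grad = -(y * losses) @ X
    w -= grad / (np.linalg.norm(grad) + 1e-12)         # normalized gradient step
    if t % 20 == 0:
        w *= 2.0                                       # progressive rescaling of the parameter norm
    # Larger ||w|| sharpens the loss around the current direction, which is the intuition
    # behind accelerating margin growth beyond plain (N)GD.
print("normalized margin:", margin(w))
```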

Deciphering and integrating invariants for neural operator learning with various physical mechanisms

  • paper_url: http://arxiv.org/abs/2311.14361
  • repo_url: None
  • paper_authors: Rui Zhang, Qi Meng, Zhi-Ming Ma
  • for: Developing a neural-operator surrogate, the Physical Invariant Attention Neural Operator (PIANO), for simulating physical systems governed by PDEs with various physical mechanisms.
  • methods: PIANO extracts physical knowledge via self-supervised learning and integrates the resulting invariants into dynamic convolutional layers through attention mechanisms.
  • results: PIANO reduces the relative error on PDE forecasting tasks by 13.6%-82.2% compared with existing techniques; downstream tasks show that the deciphered physical embeddings align well with the underlying invariants of the PDE systems, confirming their physical significance.
    Abstract Neural operators have been explored as surrogate models for simulating physical systems to overcome the limitations of traditional partial differential equation (PDE) solvers. However, most existing operator learning methods assume that the data originate from a single physical mechanism, limiting their applicability and performance in more realistic scenarios. To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms. PIANO employs self-supervised learning to extract physical knowledge and attention mechanisms to integrate them into dynamic convolutional layers. Compared to existing techniques, PIANO can reduce the relative error by 13.6\%-82.2\% on PDE forecasting tasks across varying coefficients, forces, or boundary conditions. Additionally, varied downstream tasks reveal that the PI embeddings deciphered by PIANO align well with the underlying invariants in the PDE systems, verifying the physical significance of PIANO. The source code will be publicly available at: https://github.com/optray/PIANO.

Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study

  • paper_url: http://arxiv.org/abs/2311.14359
  • repo_url: None
  • paper_authors: Xueqing Liu, Nina Deliu, Tanujit Chakraborty, Lauren Bell, Bibhas Chakraborty
  • for: Improving distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions in mobile health (mHealth).
  • methods: A contextual bandit framework that adapts interventions to individual time-varying contexts, combining four count-data models (Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regressions) with Thompson sampling.
  • results: The proposed algorithms improve user engagement on real data from the Drink Less trial, outperform existing algorithms on simulated data in maximizing cumulative proximal outcomes, and come with theoretical regret bounds; an R package, countts, is publicly available.
    Abstract Mobile health (mHealth) technologies aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions. Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts, intending to maximize cumulative proximal outcomes. However, unique challenges such as modeling count outcomes within bandit frameworks have hindered the widespread application of contextual bandits to mHealth studies. The current work addresses this challenge by leveraging count data models into online decision-making approaches. Specifically, we combine four common offline count data models (Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regressions) with Thompson sampling, a popular contextual bandit algorithm. The proposed algorithms are motivated by and evaluated on a real dataset from the Drink Less trial, where they are shown to improve user engagement with the mHealth system. The proposed methods are further evaluated on simulated data, achieving improvement in maximizing cumulative proximal outcomes over existing algorithms. Theoretical results on regret bounds are also derived. A user-friendly R package countts that implements the proposed methods for assessing contextual bandit algorithms is made publicly available at https://cran.r-project.org/web/packages/countts.
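The non-contextual core of Thompson sampling for count rewards can be sketched with a conjugate Gamma-Poisson model per action, which keeps the posterior update to two additions. This is only the skeleton of the decision loop: the paper's contextual, zero-inflated regression models, the Drink Less data, and the countts package interface are not reproduced here, and the two "actions" and their rates are invented for the example.

```python
# Gamma-Poisson Thompson sampling for count outcomes (non-contextual sketch;
# the paper uses contextual zero-inflated count regressions instead).
import numpy as np

rng = np.random.default_rng(0)
true_rates = [0.5, 1.5]                   # unknown mean counts for actions 0 (no prompt) and 1 (prompt)
alpha = np.ones(2)                        # Gamma(alpha, beta) posterior parameters per action
beta = np.ones(2)

total = 0
for t in range(2000):
    sampled_rates = rng.gamma(alpha, 1.0 / beta)   # one posterior draw per action (scale = 1/rate)
    a = int(np.argmax(sampled_rates))              # Thompson sampling: act greedily on the draw
    r = rng.poisson(true_rates[a])                 # observed count outcome (e.g., app screen views)
    alpha[a] += r                                  # conjugate Gamma-Poisson posterior update
    beta[a] += 1.0
    total += r

print("average count per decision:", total / 2000)
print("posterior means:", alpha / beta)
```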

Cycle Invariant Positional Encoding for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2311.14333
  • repo_url: https://github.com/pkuyzy/CycleNet
  • paper_authors: Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, Chao Chen, Yusu Wang
  • for: Enhancing graph learning models with the cycle information contained in graph-structured data.
  • methods: A structure encoding module, CycleNet, that encodes cycle information via edge structure encoding in a permutation-invariant manner, using a cycle basis (a minimal set of cycles generating the cycle space) computed from the kernel of the 1-dimensional Hodge Laplacian and, inspired by BasisNet, the orthogonal projector of the cycle basis; a more efficient variant requires the input graph to have a unique shortest cycle basis.
  • results: Theoretical analysis of the module's expressive power, and experiments showing that networks enhanced by CycleNet outperform several state-of-the-art models across benchmarks (see the sketch after the abstract).
    Abstract Cycles are fundamental elements in graph-structured data and have demonstrated their effectiveness in enhancing graph learning models. To encode such information into a graph learning framework, prior works often extract a summary quantity, ranging from the number of cycles to the more sophisticated persistence diagram summaries. However, more detailed information, such as which edges are encoded in a cycle, has not yet been used in graph neural networks. In this paper, we make one step towards addressing this gap, and propose a structure encoding module, called CycleNet, that encodes cycle information via edge structure encoding in a permutation invariant manner. To efficiently encode the space of all cycles, we start with a cycle basis (i.e., a minimal set of cycles generating the cycle space) which we compute via the kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle information via the orthogonal projector of the cycle basis, which is inspired by BasisNet proposed by Lim et al. We also develop a more efficient variant which however requires that the input graph has a unique shortest cycle basis. To demonstrate the effectiveness of the proposed module, we provide some theoretical understandings of its expressive power. Moreover, we show via a range of experiments that networks enhanced by our CycleNet module perform better in various benchmarks compared to several existing SOTA models.
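The construction the module relies on, a basis of the cycle space taken from the kernel of the 1-dimensional Hodge Laplacian and its basis-invariant orthogonal projector, can be reproduced with NumPy/SciPy. The small two-triangle graph is arbitrary, and since `null_space` returns an orthonormal basis the general projector C(CᵀC)⁻¹Cᵀ simplifies to CCᵀ; the released code at the repo_url above is the authoritative implementation.

```python
# Cycle space of a graph from the kernel of the 1-Hodge Laplacian, and its
# basis-invariant orthogonal projector (small illustrative graph, no filled 2-cells).
import numpy as np
from scipy.linalg import null_space

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]   # two triangles sharing node 2
n_nodes = 5
B1 = np.zeros((n_nodes, len(edges)))                       # node-to-edge incidence
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0

L1 = B1.T @ B1                            # 1-Hodge Laplacian (no B2 term without filled faces)
C = null_space(L1)                        # columns form an orthonormal cycle basis
projector = C @ C.T                       # invariant to the choice of cycle basis

print("cycle space dimension:", C.shape[1])            # E - V + #components = 6 - 5 + 1 = 2
print(np.allclose(projector @ projector, projector))   # True: it is an orthogonal projector
```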

GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation

  • paper_url: http://arxiv.org/abs/2311.14332
  • repo_url: None
  • paper_authors: Yakun Chen, Xianzhi Wang, Guandong Xu
  • for: Spatiotemporal imputation of missing values by exploiting the spatial and temporal relationships in observed multivariate time series, using a pre-trained large language model (LLM).
  • methods: The GATGPT framework integrates a graph attention mechanism with a pre-trained LLM, keeping most LLM parameters frozen to leverage existing knowledge of temporal patterns while fine-tuning the upper layers for the specific application.
  • results: On three real-world datasets, the approach achieves results comparable to established deep learning baselines.
    Abstract The analysis of spatiotemporal data is increasingly utilized across diverse domains, including transportation, healthcare, and meteorology. In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors. The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and temporal relationships in the observed multivariate time series. Traditionally, spatiotemporal imputation has relied on specific, intricate architectures designed for this purpose, which suffer from limited applicability and high computational complexity. In contrast, our approach integrates pre-trained large language models (LLMs) into spatiotemporal imputation, introducing a groundbreaking framework, GATGPT. This framework merges a graph attention mechanism with LLMs. We maintain most of the LLM parameters unchanged to leverage existing knowledge for learning temporal patterns, while fine-tuning the upper layers tailored to various applications. The graph attention component enhances the LLM's ability to understand spatial relationships. Through tests on three distinct real-world datasets, our innovative approach demonstrates comparable results to established deep learning benchmarks.
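The freeze-most, tune-top parameter pattern described in the abstract can be sketched with Hugging Face's GPT-2. The backbone choice, the number of unfrozen blocks, and the simple linear input/output heads are assumptions (no code is released for this paper), and the graph attention component is omitted.

```python
# Sketch of the freeze-most / tune-top pattern with a pre-trained GPT-2 backbone
# (backbone choice and number of tuned blocks are assumptions; graph attention omitted).
import torch.nn as nn
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")
for p in backbone.parameters():
    p.requires_grad = False                     # keep pre-trained knowledge frozen
for block in backbone.h[-2:]:                   # fine-tune only the top transformer blocks
    for p in block.parameters():
        p.requires_grad = True
for p in backbone.ln_f.parameters():
    p.requires_grad = True

input_proj = nn.Linear(1, backbone.config.n_embd)    # embed each (possibly masked) sensor reading
output_head = nn.Linear(backbone.config.n_embd, 1)   # predict the imputed value per time step

trainable = [p for p in backbone.parameters() if p.requires_grad]
trainable += list(input_proj.parameters()) + list(output_head.parameters())
print(sum(p.numel() for p in trainable), "trainable parameters")
```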

Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing

  • paper_url: http://arxiv.org/abs/2311.14766
  • repo_url: None
  • paper_authors: Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi
  • for: Bridging the gap between commercial objectives and model training in RLHF by using statistical business feedback instead of costly manual human feedback.
  • methods: Reinforcement Learning from Statistical Feedback (RLSF) based on AB testing; statistical inference supplies the preferences used to train the reward network, and AB testing is extended from double selections at a single time point to ANT testing with multiple selections at different feedback time points.
  • results: Numerical experiments validate the effectiveness of the proposed algorithm framework.
    Abstract Reinforcement Learning from Human Feedback (RLHF) has played a crucial role in the success of large models such as ChatGPT. RLHF is a reinforcement learning framework which combines human feedback to improve learning effectiveness and performance. However, obtaining preference feedback manually is quite expensive in commercial applications. Statistical commercial indicators are often more valuable, yet they are typically ignored in RLHF. There exists a gap between commercial targets and model training. In our research, we attempt to fill this gap with statistical business feedback instead of human feedback, using AB testing, a well-established statistical method. Reinforcement Learning from Statistical Feedback (RLSF) based on AB testing is proposed. Statistical inference methods are used to obtain preferences for training the reward network, which fine-tunes the pre-trained model in the reinforcement learning framework, achieving greater business value. Furthermore, we extend AB testing with double selections at a single time-point to ANT testing with multiple selections at different feedback time points. Moreover, we design numerical experiments to validate the effectiveness of our algorithm framework.

AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine

  • paper_url: http://arxiv.org/abs/2311.14304
  • repo_url: None
  • paper_authors: Jie Lian, Xufang Luo, Caihua Shan, Dongqi Han, Varut Vardhanabhuti, Dongsheng Li
  • for: Clinical prediction tasks in precision medicine tailored to individual patients.
  • methods: Machine learning on personalized data from images, genetics, and assessments; in particular, graphs are constructed by linking similar patients and graph neural networks (GNNs) are applied, with a new algorithm that automatically selects important features to build multiple patient-similarity graphs and trains GNNs on them as weak learners in adaptive boosting.
  • results: The algorithm shows superior performance on two real-world medical scenarios.
    Abstract Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients is aggregated and considered for prediction. However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources. Previous studies rely on human expertise to select the edge feature, which is neither scalable nor efficient in pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named AdaMedGraph, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. AdaMedGraph is evaluated on two real-world medical scenarios and shows superior performance.

Out-of-Distribution Generalized Dynamic Graph Neural Network with Disentangled Intervention and Invariance Promotion

  • paper_url: http://arxiv.org/abs/2311.14255
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Zeyang Zhang, Xin Wang, Ziwei Zhang, Haoyang Li, Wenwu Zhu
  • for: Improving dynamic graph neural networks (DyGNNs) so that predictions remain reliable under distribution shifts in dynamic graphs.
  • methods: Disentangled Intervention-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA), which captures variant and invariant spatio-temporal patterns with a disentangled attention network, creates multiple interventional distributions via a spatio-temporal intervention mechanism, infers latent spatio-temporal environments, and minimizes the variance of predictions across them so that predictions rely on invariant patterns.
  • results: Extensive experiments show superiority over state-of-the-art baselines under distribution shifts; this is the first study of spatio-temporal distribution shifts in dynamic graphs.
    Abstract Dynamic graph neural networks (DyGNNs) have demonstrated powerful predictive abilities by exploiting graph structural and temporal dynamics. However, the existing DyGNNs fail to handle distribution shifts, which naturally exist in dynamic graphs, mainly because the patterns exploited by DyGNNs may be variant with respect to labels under distribution shifts. In this paper, we propose Disentangled Intervention-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA) to handle spatio-temporal distribution shifts in dynamic graphs by discovering and utilizing invariant patterns, i.e., structures and features whose predictive abilities are stable across distribution shifts. Specifically, we first propose a disentangled spatio-temporal attention network to capture the variant and invariant patterns. By utilizing the disentangled patterns, we design a spatio-temporal intervention mechanism to create multiple interventional distributions and an environment inference module to infer the latent spatio-temporal environments, and minimize the variance of predictions among these intervened distributions and environments, so that our model can make predictions based on invariant patterns with stable predictive abilities under distribution shifts. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines under distribution shifts. Our work is the first study of spatio-temporal distribution shifts in dynamic graphs, to the best of our knowledge.