results: Experimental results show that the neural-network preconditioner efficiently solves Poisson problems with mixed boundary conditions and, on several practical applications, is more efficient than algebraic multigrid and other recent neural preconditioners.
Abstract
We introduce a neural-preconditioned iterative solver for Poisson equations with mixed boundary conditions. The Poisson equation is ubiquitous in scientific computing: it governs a wide array of physical phenomena, arises as a subproblem in many numerical algorithms, and serves as a model problem for the broader class of elliptic PDEs. The most popular Poisson discretizations yield large sparse linear systems. At high resolution, and for performance-critical applications, iterative solvers can be advantageous for these -- but only when paired with powerful preconditioners. The core of our solver is a neural network trained to approximate the inverse of a discrete structured-grid Laplace operator for a domain of arbitrary shape and with mixed boundary conditions. The structure of this problem motivates a novel network architecture that we demonstrate is highly effective as a preconditioner even for boundary conditions outside the training set. We show that on challenging test cases arising from an incompressible fluid simulation, our method outperforms state-of-the-art solvers like algebraic multigrid as well as some recent neural preconditioners.
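To make concrete how a learned operator can slot into an iterative Krylov solver, here is a minimal NumPy sketch of preconditioned conjugate gradient where the preconditioner is any callable approximating the inverse operator. The Jacobi placeholder below stands in for the trained network and the 1D Laplacian is a toy system; both are illustrative assumptions, not the paper's architecture or solver.

```python
import numpy as np

def preconditioned_cg(A, b, apply_preconditioner, tol=1e-8, max_iter=500):
    """Conjugate gradient where apply_preconditioner(r) ~ A^{-1} r.
    In the neural-preconditioned setting, this callable would be the trained network."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_preconditioner(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Placeholder preconditioner: Jacobi (diagonal) scaling stands in for the network.
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D discrete Laplacian
b = np.random.default_rng(0).standard_normal(n)
jacobi = lambda r: r / np.diag(A)
x = preconditioned_cg(A, b, jacobi)
print(np.linalg.norm(A @ x - b))
```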
Tight Bounds for Volumetric Spanners and Applications
results: The paper gives nearly optimal bounds on the size of volumetric spanners for all $\ell_p$ norms and shows they can be attained with a simple local search algorithm; the results are then applied to other tasks, including finding coresets for the Minimum Volume Enclosing Ellipsoid (MVEE) problem.
Abstract
Given a set of points of interest, a volumetric spanner is a subset of the points using which all the points can be expressed using "small" coefficients (measured in an appropriate norm). Formally, given a set of vectors $X = \{v_1, v_2, \dots, v_n\}$, the goal is to find $T \subseteq [n]$ such that every $v \in X$ can be expressed as $\sum_{i\in T} \alpha_i v_i$, with $\|\alpha\|$ being small. This notion, which has also been referred to as a well-conditioned basis, has found several applications, including bandit linear optimization, determinant maximization, and matrix low rank approximation. In this paper, we give almost optimal bounds on the size of volumetric spanners for all $\ell_p$ norms, and show that they can be constructed using a simple local search procedure. We then show the applications of our result to other tasks and in particular the problem of finding coresets for the Minimum Volume Enclosing Ellipsoid (MVEE) problem.
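As a small illustration of the definition, the sketch below scores a candidate subset $T$: for each point it computes minimum-norm coefficients over the chosen columns via the pseudoinverse and reports the largest coefficient norm (a local search would then swap indices to shrink this value). The subset choice and dimensions are arbitrary placeholders, and the check assumes the chosen columns span the points.

```python
import numpy as np

def spanner_quality(V, T, p=2):
    """For each column v of V, compute min-norm coefficients over the columns indexed
    by T (the pseudoinverse gives the minimum l2-norm solution when v lies in their
    span; otherwise this is only a least-squares fit) and return the largest ||alpha||_p.
    Small values mean T is a good volumetric spanner."""
    B = V[:, T]                      # d x |T| candidate basis
    alphas = np.linalg.pinv(B) @ V   # coefficients for every point
    return np.max(np.linalg.norm(alphas, ord=p, axis=0))

rng = np.random.default_rng(1)
V = rng.standard_normal((5, 40))     # 40 points in R^5
T = list(range(8))                   # a hypothetical candidate subset
print(spanner_quality(V, T))
```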
ADMET property prediction through combinations of molecular fingerprints
results: Gradient-boosted decision trees (CatBoost) combined with ECFP, Avalon, and ErG fingerprints plus 200 molecular properties gave the best performance, successfully validated on 22 Therapeutics Data Commons ADMET benchmarks.
Abstract
While investigating methods to predict small molecule potencies, we found random forests or support vector machines paired with extended-connectivity fingerprints (ECFP) consistently outperformed recently developed methods. A detailed investigation into regression algorithms and molecular fingerprints revealed gradient-boosted decision trees, particularly CatBoost, in conjunction with a combination of ECFP, Avalon, and ErG fingerprints, as well as 200 molecular properties, to be most effective. Incorporating a graph neural network fingerprint further enhanced performance. We successfully validated our model across 22 Therapeutics Data Commons ADMET benchmarks. Our findings underscore the significance of richer molecular representations for accurate property prediction.
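A rough sketch of the kind of combined featurization plus gradient-boosted regressor described above, assuming RDKit and CatBoost are available; the fingerprint sizes, hyperparameters, and toy SMILES/labels are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors, rdReducedGraphs
from rdkit.Avalon import pyAvalonTools
from catboost import CatBoostRegressor

def featurize(smiles):
    """Concatenate ECFP, Avalon, and ErG fingerprints with RDKit descriptors,
    roughly mirroring the combined representation described above."""
    mol = Chem.MolFromSmiles(smiles)
    ecfp = np.zeros(2048)
    DataStructs.ConvertToNumpyArray(
        AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048), ecfp)
    avalon = np.zeros(1024)
    DataStructs.ConvertToNumpyArray(pyAvalonTools.GetAvalonFP(mol, nBits=1024), avalon)
    erg = np.asarray(rdReducedGraphs.GetErGFingerprint(mol))
    desc = np.array([fn(mol) for _, fn in Descriptors.descList])  # ~200 properties
    return np.concatenate([ecfp, avalon, erg, desc])

# Hypothetical toy data; real use would draw from the TDC ADMET benchmarks.
smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
y = [0.1, 0.5, 0.9]
X = np.stack([featurize(s) for s in smiles])
model = CatBoostRegressor(iterations=200, verbose=False).fit(X, y)
print(model.predict(X))
```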
One for All: Towards Training One Graph Model for All Classification Tasks
results: By training simultaneously on graph data from multiple domains, the paper demonstrates the model's generality across tasks, performing well in supervised, few-shot, and zero-shot learning scenarios.
Abstract
Designing a single model that addresses multiple tasks has been a long-standing objective in artificial intelligence. Recently, large language models have demonstrated exceptional capability in integrating and solving different tasks within the language domain. However, a unified model for various tasks on graphs remains underexplored, primarily due to the challenges unique to the graph learning domain. First, graph data from different areas carry distinct attributes and follow different distributions. Such discrepancy makes it hard to represent graphs in a single representation space. Second, tasks on graphs diversify into node, link, and graph tasks, requiring distinct embedding strategies. Finally, an appropriate graph prompting paradigm for in-context learning is unclear. Striving to handle all the aforementioned challenges, we propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges. Specifically, OFA proposes text-attributed graphs to unify different graph data by describing nodes and edges with natural language and uses language models to encode the diverse and possibly cross-domain text attributes to feature vectors in the same embedding space. Furthermore, OFA introduces the concept of nodes-of-interest to standardize different tasks with a single task representation. For in-context learning on graphs, OFA introduces a novel graph prompting paradigm that appends prompting substructures to the input graph, which enables it to address varied tasks without fine-tuning. We train the OFA model using graph data from multiple domains (including citation networks, molecular graphs, knowledge graphs, etc.) simultaneously and evaluate its ability in supervised, few-shot, and zero-shot learning scenarios. OFA performs well across different tasks, making it the first general-purpose graph classification model across domains.
On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
results: The study finds that in optimization, uncertainty quantification, and continual learning, large-width neural networks do not exhibit the behavior predicted by the infinite-width limit, even when they are much wider than they are deep; this observed disconnect calls the link between theory and practice into question.
Abstract
The infinite-width limit of neural networks (NNs) has garnered significant attention as a theoretical framework for analyzing the behavior of large-scale, overparametrized networks. By approaching infinite width, NNs effectively converge to a linear model with features characterized by the neural tangent kernel (NTK). This establishes a connection between NNs and kernel methods, the latter of which are well understood. Based on this link, theoretical benefits and algorithmic improvements have been hypothesized and empirically demonstrated in synthetic architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that practically relevant architectures do not exhibit behavior as predicted via the NTK. In this work, we empirically investigate whether the limiting regime either describes the behavior of large-width architectures used in practice or is informative for algorithmic improvements. Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification or continual learning. This observed disconnect between theory and practice calls into question the practical relevance of the infinite-width limit.
Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs
paper_authors: Jean Kossaifi, Nikola Kovachki, Kamyar Azizzadenesheli, Anima Anandkumar
for: addresses the limitations of learning solution operators of partial differential equations (PDEs) at high resolutions by introducing a new data efficient and highly parallelizable approach with reduced memory requirement and better generalization.
methods: leverages local and global structures of full-scale, real-world phenomena through a decomposition of both the input domain and the operator’s parameter space, and represents the parameters of the model in a high-order latent subspace of the Fourier domain through a global tensor factorization.
results: achieves superior performance on the turbulent Navier-Stokes equations with less than half the error and over 150x compression, and reduces the number of parameters by over 150x and the domain size by 7x without losses in accuracy, while slightly enabling parallelism.
Abstract
Memory complexity and data scarcity have so far prohibited learning solution operators of partial differential equations (PDEs) at high resolutions. We address these limitations by introducing a new data efficient and highly parallelizable operator learning approach with reduced memory requirement and better generalization, called multi-grid tensorized neural operator (MG-TFNO). MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena, through a decomposition of both the input domain and the operator's parameter space. Our contributions are threefold: i) we enable parallelization over input samples with a novel multi-grid-based domain decomposition, ii) we represent the parameters of the model in a high-order latent subspace of the Fourier domain, through a global tensor factorization, resulting in an extreme reduction in the number of parameters and improved generalization, and iii) we propose architectural improvements to the backbone FNO. Our approach can be used in any operator learning setting. We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression. The tensorization combined with the domain decomposition, yields over 150x reduction in the number of parameters and 7x reduction in the domain size without losses in accuracy, while slightly enabling parallelism.
Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks
paper_authors: Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, Connor W. Coley, Yizhou Sun, Wei Wang
for: studies the potential of molecular representation learning (MRL) for chemical applications.
methods: uses Graph Neural Networks (GNNs) and ensemble learning over conformer structures to learn molecular representations.
results: shows that learning directly from an accessible conformer space can improve performance across a variety of tasks and models.
Abstract
Molecular Representation Learning (MRL) has proven impactful in numerous biochemical applications such as drug discovery and enzyme design. While Graph Neural Networks (GNNs) are effective at learning molecular representations from a 2D molecular graph or a single 3D structure, existing works often overlook the flexible nature of molecules, which continuously interconvert across conformations via chemical bond rotations and minor vibrational perturbations. To better account for molecular flexibility, some recent works formulate MRL as an ensemble learning problem, focusing on explicitly learning from a set of conformer structures. However, most of these studies have limited datasets, tasks, and models. In this work, we introduce the first MoleculAR Conformer Ensemble Learning (MARCEL) benchmark to thoroughly evaluate the potential of learning on conformer ensembles and suggest promising research directions. MARCEL includes four datasets covering diverse molecule- and reaction-level properties of chemically diverse molecules including organocatalysts and transition-metal catalysts, extending beyond the scope of common GNN benchmarks that are confined to drug-like molecules. In addition, we conduct a comprehensive empirical study, which benchmarks representative 1D, 2D, and 3D molecular representation learning models, along with two strategies that explicitly incorporate conformer ensembles into 3D MRL models. Our findings reveal that direct learning from an accessible conformer space can improve performance on a variety of tasks and models.
Reinforcement Learning for Node Selection in Branch-and-Bound
paper_authors: Alexander Mattick, Christopher Mutschler
for: improving the optimization performance of branch-and-bound algorithms
methods: uses a reinforcement learning (RL) model that selects nodes based on the state of the entire search tree
results: induces a high-quality node-selection policy on a range of complex problem sets and, under strict time constraints, improves optimality-gap reduction and per-node efficiency
Abstract
A big challenge in branch and bound lies in identifying the optimal node within the search tree from which to proceed. Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data. We propose a novel bi-simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes. To achieve this, we train a graph neural network that produces a probability distribution based on the path from the model's root to its ``to-be-selected'' leaves. Modelling node-selection as a probability distribution allows us to train the model using state-of-the-art RL techniques that capture both intrinsic node-quality and node-evaluation costs. Our method induces a high quality node selection policy on a set of varied and complex problem sets, despite only being trained on specially designed, synthetic TSP instances. Experiments on several benchmarks show significant improvements in optimality gap reductions and per-node efficiency under strict time constraints.
Gradient and Uncertainty Enhanced Sequential Sampling for Global Fit
paper_authors: Sven Lämmle, Can Bogoclu, Kevin Cremanns, Dirk Roos
for: proposes a new sequential sampling strategy to improve the accuracy and efficiency of global surrogate fits.
methods: builds a machine-learning-based sampling strategy named Gradient and Uncertainty Enhanced Sequential Sampling (GUESS), whose acquisition function combines two terms: the predictive posterior uncertainty of the surrogate and a weighted approximation of second- and higher-order Taylor expansion values.
results: across 26 deterministic benchmark functions of 1 to 8 dimensions, GUESS achieved on average the highest sample efficiency for global fitting; an ablation study examines its behavior in higher dimensions and the importance of surrogate choice.
Abstract
Surrogate models based on machine learning methods have become an important part of modern engineering to replace costly computer simulations. The data used for creating a surrogate model are essential for the model accuracy and often restricted due to cost and time constraints. Adaptive sampling strategies have been shown to reduce the number of samples needed to create an accurate model. This paper proposes a new sampling strategy for global fit called Gradient and Uncertainty Enhanced Sequential Sampling (GUESS). The acquisition function uses two terms: the predictive posterior uncertainty of the surrogate model for exploration of unseen regions and a weighted approximation of the second and higher-order Taylor expansion values for exploitation. Although various sampling strategies have been proposed so far, the selection of a suitable method is not trivial. Therefore, we compared our proposed strategy to 9 adaptive sampling strategies for global surrogate modeling, based on 26 different 1 to 8-dimensional deterministic benchmarks functions. Results show that GUESS achieved on average the highest sample efficiency compared to other surrogate-based strategies on the tested examples. An ablation study considering the behavior of GUESS in higher dimensions and the importance of surrogate choice is also presented.
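A toy acquisition in the spirit of GUESS is sketched below: the surrogate's predictive standard deviation handles exploration, and a crude curvature-times-distance proxy stands in for the weighted Taylor-expansion exploitation term. The paper's exact Taylor-based term and weighting differ, so everything beyond "uncertainty plus a weighted Taylor-style term" is an assumption for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def guess_like_acquisition(gp, X_train, x, h=1e-2, w=1.0):
    """Toy acquisition: posterior std (exploration) plus a finite-difference
    curvature/remainder proxy (exploitation). Illustrative stand-in only."""
    x = np.atleast_2d(x)
    _, std = gp.predict(x, return_std=True)
    curv = 0.0
    for d in range(x.shape[1]):
        e = np.zeros_like(x); e[0, d] = h
        f_p, f_m, f_0 = gp.predict(x + e), gp.predict(x - e), gp.predict(x)
        curv += abs(f_p[0] - 2 * f_0[0] + f_m[0]) / h**2   # 2nd-order term estimate
    dist = np.min(np.linalg.norm(X_train - x, axis=1))      # distance to nearest sample
    return std[0] + w * curv * dist**2

X = np.random.default_rng(0).uniform(-1, 1, (8, 2))
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
gp = GaussianProcessRegressor().fit(X, y)
print(guess_like_acquisition(gp, X, np.array([0.2, -0.3])))
```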
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things
results: The benchmark results shed light on the opportunities and challenges of FL for AIoT; FedAIoT can serve as a valuable resource for advancing FL in this field.
Abstract
There is a significant relevance of federated learning (FL) in the realm of Artificial Intelligence of Things (AIoT). However, most existing FL works are not conducted on datasets collected from authentic IoT devices that capture unique modalities and inherent challenges of IoT data. In this work, we introduce FedAIoT, an FL benchmark for AIoT to fill this critical gap. FedAIoT includes eight datasets collected from a wide range of IoT devices. These datasets cover unique IoT modalities and target representative applications of AIoT. FedAIoT also includes a unified end-to-end FL framework for AIoT that simplifies benchmarking the performance of the datasets. Our benchmark results shed light on the opportunities and challenges of FL for AIoT. We hope FedAIoT could serve as an invaluable resource to foster advancements in the important field of FL for AIoT. The repository of FedAIoT is maintained at https://github.com/AIoT-MLSys-Lab/FedAIoT.
methods: LaLiGAN learns a mapping from data to a latent space in which the symmetries become linear, and simultaneously discovers the symmetries in that latent space.
results: Experiments show that LaLiGAN captures the intrinsic symmetry in high-dimensional observations, yielding a well-structured latent space useful for downstream tasks; for example, it improves performance on equation discovery and long-term forecasting.
Abstract
Equivariant neural networks require explicit knowledge of the symmetry group. Automatic symmetry discovery methods aim to relax this constraint and learn invariance and equivariance from data. However, existing symmetry discovery methods are limited to linear symmetries in their search space and cannot handle the complexity of symmetries in real-world, often high-dimensional data. We propose a novel generative model, Latent LieGAN (LaLiGAN), which can discover nonlinear symmetries from data. It learns a mapping from data to a latent space where the symmetries become linear and simultaneously discovers symmetries in the latent space. Theoretically, we show that our method can express any nonlinear symmetry under certain conditions. Experimentally, our method can capture the intrinsic symmetry in high-dimensional observations, which results in a well-structured latent space that is useful for other downstream tasks. We demonstrate the use cases for LaLiGAN in improving equation discovery and long-term forecasting for various dynamical systems.
Federated Learning with Differential Privacy for End-to-End Speech Recognition
results: The paper achieves user-level ($7.2$, $10^{-9}$)-DP and ($4.5$, $10^{-9}$)-DP privacy guarantees and trains FL models that are nearly optimal even under data heterogeneity and domain shift.
Abstract
While federated learning (FL) has recently emerged as a promising approach to train machine learning models, it is limited to only preliminary explorations in the domain of automatic speech recognition (ASR). Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent $\textit{large end-to-end transformer models}$: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a $\textit{practical}$ number of central aggregations we are able to train $\textbf{FL models}$ that are \textbf{nearly optimal} even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in the prior work. Remarkably, we achieve user-level ($7.2$, $10^{-9}$)-$\textbf{DP}$ (resp. ($4.5$, $10^{-9}$)-$\textbf{DP}$) with a 1.3% (resp. 4.6%) absolute drop in the word error rate for extrapolation to high (resp. low) population scale for $\textbf{FL with DP in ASR}$.
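To illustrate the per-layer clipping the authors revive, here is a simplified NumPy sketch where each layer's gradient is clipped to its own norm bound before Gaussian noise scaled to that bound is added. Real DP-SGD clips per-example gradients before aggregation and calibrates the noise to the privacy budget, so the bounds and noise multiplier below are placeholder assumptions.

```python
import numpy as np

def per_layer_clip_and_noise(grads, clip_norms, noise_multiplier, rng):
    """Per-layer clipping sketch: each layer's gradient is rescaled to its own
    norm bound and Gaussian noise scaled to that bound is added."""
    noisy = []
    for g, c in zip(grads, clip_norms):
        norm = np.linalg.norm(g)
        g_clipped = g * min(1.0, c / (norm + 1e-12))
        noise = rng.normal(0.0, noise_multiplier * c, size=g.shape)
        noisy.append(g_clipped + noise)
    return noisy

rng = np.random.default_rng(0)
grads = [rng.standard_normal((4, 4)), rng.standard_normal(4)]   # e.g. weight, bias
print([np.linalg.norm(g) for g in per_layer_clip_and_noise(grads, [1.0, 0.5], 0.1, rng)])
```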
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym
results: BO-based optimizers perform well when the evaluation budget is limited but are often outperformed by algorithms from other families once the budget grows; several algorithms from the BBO community also perform surprisingly well on ML tasks.
Abstract
The growing ubiquity of machine learning (ML) has led it to enter various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and more generally for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the budget of evaluations increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. Therefore, we urge to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and show visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific setting of hyperparameters, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit with a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.
EPiC-ly Fast Particle Cloud Generation with Flow-Matching and Diffusion
paper_authors: Erik Buhmann, Cedric Ewen, Darius A. Faroughy, Tobias Golling, Gregor Kasieczka, Matthew Leigh, Guillaume Quétant, John Andrew Raine, Debajyoti Sengupta, David Shih
for: This paper is written for researchers and practitioners in the field of particle physics and generative modeling. The authors aim to provide two novel methods for generating LHC jets as point clouds, which can be used for a variety of applications such as particle physics experiments and simulations.
methods: The paper introduces two novel methods for generating LHC jets as point clouds: \epcjedi and \epcfm. \epcjedi combines score-matching diffusion models with the Equivariant Point Cloud (EPiC) architecture based on the deep sets framework, while \epcfm is the first permutation equivariant continuous normalizing flow (CNF) for particle cloud generation. Both methods are trained using the flow-matching objective, which is a scalable and easy-to-train objective based on optimal transport.
results: The authors demonstrate that both \epcjedi and \epcfm achieve state-of-the-art performance on the top-quark JetNet datasets while maintaining fast generation speed. Specifically, \epcfm consistently outperforms all the other generative models considered in the paper across every metric. Additionally, the authors introduce two new particle cloud performance metrics: one based on the Kullback-Leibler divergence between feature distributions, and the other is the negative log-posterior of a multi-model ParticleNet classifier.
Abstract
Jets at the LHC, typically consisting of a large number of highly correlated particles, are a fascinating laboratory for deep generative modeling. In this paper, we present two novel methods that generate LHC jets as point clouds efficiently and accurately. We introduce \epcjedi, which combines score-matching diffusion models with the Equivariant Point Cloud (EPiC) architecture based on the deep sets framework. This model offers a much faster alternative to previous transformer-based diffusion models without reducing the quality of the generated jets. In addition, we introduce \epcfm, the first permutation equivariant continuous normalizing flow (CNF) for particle cloud generation. This model is trained with {\it flow-matching}, a scalable and easy-to-train objective based on optimal transport that directly regresses the vector fields connecting the Gaussian noise prior to the data distribution. Our experiments demonstrate that \epcjedi and \epcfm both achieve state-of-the-art performance on the top-quark JetNet datasets whilst maintaining fast generation speed. Most notably, we find that the \epcfm model consistently outperforms all the other generative models considered here across every metric. Finally, we also introduce two new particle cloud performance metrics: the first based on the Kullback-Leibler divergence between feature distributions, the second is the negative log-posterior of a multi-model ParticleNet classifier.
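The flow-matching objective mentioned above can be written in a few lines: sample noise, interpolate linearly toward a data point, and regress the model's velocity onto the difference. The PyTorch sketch below uses a throwaway linear "velocity network" over point clouds; the EPiC architecture and any conditioning are omitted, and the linear interpolation path is one common choice rather than the paper's exact parameterization.

```python
import torch

def flow_matching_loss(v_model, x1):
    """Conditional flow-matching loss with a linear path x_t = (1-t)x0 + t*x1,
    x0 ~ N(0, I); the regression target is the velocity x1 - x0."""
    x0 = torch.randn_like(x1)                       # Gaussian prior sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    return ((v_model(xt, t) - target) ** 2).mean()

# Tiny stand-in "velocity network" over point clouds of shape (batch, particles, feats).
v_model = lambda x, t: torch.nn.functional.linear(
    torch.cat([x, t.expand_as(x[..., :1])], dim=-1), torch.randn(3, 4))
x1 = torch.randn(8, 30, 3)                          # batch of 30-particle clouds
print(flow_matching_loss(v_model, x1).item())
```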
Machine Learning Clifford invariants of ADE Coxeter elements
results: The paper finds that these Clifford algebraic datasets can be machine-learned to very high accuracy, and it sheds light on relationships between these novel geometric invariants and other well-known ones.
Abstract
There has been recent interest in novel Clifford geometric invariants of linear transformations. This motivates the investigation of such invariants for a certain type of geometric transformation of interest in the context of root systems, reflection groups, Lie groups and Lie algebras: the Coxeter transformations. We perform exhaustive calculations of all Coxeter transformations for $A_8$, $D_8$ and $E_8$ for a choice of basis of simple roots and compute their invariants, using high-performance computing. This computational algebra paradigm generates a dataset that can then be mined using techniques from data science such as supervised and unsupervised machine learning. In this paper we focus on neural network classification and principal component analysis. Since the output -- the invariants -- is fully determined by the choice of simple roots and the permutation order of the corresponding reflections in the Coxeter element, we expect huge degeneracy in the mapping. This provides the perfect setup for machine learning, and indeed we see that the datasets can be machine learned to very high accuracy. This paper is a pump-priming study in experimental mathematics using Clifford algebras, showing that such Clifford algebraic datasets are amenable to machine learning, and shedding light on relationships between these novel and other well-known geometric invariants and also giving rise to analytic results.
Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction
results: The study shows that GCN link prediction has a within-group preferential attachment bias; a training-time regularization strategy can reduce this within-group unfairness, and a new within-group fairness metric is proposed to quantify disparities in link prediction scores.
Abstract
Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group fairness and ``rich get richer'' dynamics of link prediction remain underexplored. However, these aspects have significant consequences for degree and power imbalances in networks. In this paper, we shed light on how degree bias in networks affects Graph Convolutional Network (GCN) link prediction. In particular, we theoretically uncover that GCNs with a symmetric normalized graph filter have a within-group preferential attachment bias. We validate our theoretical analysis on real-world citation, collaboration, and online social networks. We further bridge GCN's preferential attachment bias with unfairness in link prediction and propose a new within-group fairness metric. This metric quantifies disparities in link prediction scores between social groups, towards combating the amplification of degree and power disparities. Finally, we propose a simple training-time strategy to alleviate within-group unfairness, and we show that it is effective on citation, online social, and credit networks.
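For reference, the symmetric normalized graph filter analyzed above is $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$; the sketch below builds it, forms dot-product link scores from one propagation step, and computes a mean-score gap between two node groups. The gap is only an illustrative disparity measure and the group split is hypothetical; the paper's within-group fairness metric is defined differently.

```python
import numpy as np

def sym_norm_filter(A):
    """Symmetric normalized GCN filter: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
A = (rng.random((20, 20)) < 0.2).astype(float); A = np.triu(A, 1); A = A + A.T
X = rng.standard_normal((20, 8))
H = sym_norm_filter(A) @ X                       # one propagation step
scores = H @ H.T                                  # dot-product link scores
groups = np.array([0] * 10 + [1] * 10)            # hypothetical node groups
gap = scores[np.ix_(groups == 0, groups == 0)].mean() - \
      scores[np.ix_(groups == 1, groups == 1)].mean()
print(gap)                                        # toy score disparity between groups
```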
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
results: On Atari games, the Cleanba variants match or exceed strong IMPALA and PPO baselines while requiring shorter training time and yielding more reproducible learning curves.
Abstract
Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time. Despite recent progress in the field, reproducibility issues have not been sufficiently explored. This paper first shows that the typical actor-learner framework can have reproducibility issues even if hyperparameters are controlled. We then introduce Cleanba, a new open-source platform for distributed DRL that proposes a highly reproducible architecture. Cleanba implements highly optimized distributed variants of PPO and IMPALA. Our Atari experiments show that these variants can obtain equivalent or higher scores than strong IMPALA baselines in moolib and torchbeast and PPO baseline in CleanRL. However, Cleanba variants present 1) shorter training time and 2) more reproducible learning curves in different hardware settings. Cleanba's source code is available at \url{https://github.com/vwxyzjn/cleanba}
Maximal Volume Matrix Cross Approximation for Image Compression and Least Squares Solution
methods: gives a new proof of a classic estimate for matrix cross approximation and a family of greedy algorithms based on maximal-volume submatrices.
results: proposes a maximal-volume-based matrix cross approximation with theoretical convergence guarantees and demonstrates its effectiveness on applications such as image compression and least squares approximation of continuous functions, supported by numerical results.
Abstract
We study the classic cross approximation of matrices based on the maximal volume submatrices. Our main results consist of an improvement of a classic estimate for matrix cross approximation and a greedy approach for finding the maximal volume submatrices. Indeed, we present a new proof of a classic estimate of the inequality with an improved constant. Also, we present a family of greedy maximal volume algorithms which improve the error bound of cross approximation of a matrix in the Chebyshev norm and also improve the computational efficiency of classic maximal volume algorithm. The proposed algorithms are shown to have theoretical guarantees of convergence. Finally, we present two applications: one is image compression and the other is least squares approximation of continuous functions. Our numerical results in the end of the paper demonstrate the effective performances of our approach.
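A compact NumPy sketch of the cross approximation itself, $A \approx A[:,J]\,A[I,J]^{-1}\,A[I,:]$, together with a naive greedy growth of the index sets by determinant (volume). The greedy routine is only an illustrative stand-in and far slower than the algorithms proposed in the paper; the low-rank test matrix is a toy example.

```python
import numpy as np

def cross_approximation(A, I, J):
    """Cross (CUR-type) approximation A ~ A[:, J] @ inv(A[I, J]) @ A[I, :].
    The error is small when A[I, J] has (near-)maximal volume, i.e. large |det|."""
    G = A[np.ix_(I, J)]
    return A[:, J] @ np.linalg.solve(G, A[I, :])

def greedy_maxvol_indices(A, k, seed=0):
    """Very simple greedy growth of k rows/columns that tries to increase
    |det(A[I, J])|; an illustrative stand-in, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    I, J = [int(rng.integers(A.shape[0]))], [int(rng.integers(A.shape[1]))]
    while len(I) < k:
        best = None
        for i in range(A.shape[0]):
            for j in range(A.shape[1]):
                if i in I or j in J:
                    continue
                vol = abs(np.linalg.det(A[np.ix_(I + [i], J + [j])]))
                if best is None or vol > best[0]:
                    best = (vol, i, j)
        I.append(best[1]); J.append(best[2])
    return I, J

A = np.random.default_rng(1).standard_normal((40, 5)) @ \
    np.random.default_rng(2).standard_normal((5, 30))   # exact rank-5 matrix
I, J = greedy_maxvol_indices(A, 5)
print(np.linalg.norm(A - cross_approximation(A, I, J)))  # near machine precision
```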
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
results: The method obtains significant improvements in visual speech recognition (VSR) performance on the LRS3 dataset while maintaining practical automatic speech recognition (ASR) and audio-visual speech recognition (AVSR) performance; in addition, it leverages unlabeled visual-only speech data to further improve VSR.
Abstract
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. Our models are trained for speech recognition from audio-visual inputs and can perform speech recognition using both audio and visual modalities, or only one modality. Our method uses the same audio-visual model for both supervised training and pseudo-label generation, mitigating the need for external speech recognition models to generate pseudo-labels. AV-CPL obtains significant improvements in VSR performance on the LRS3 dataset while maintaining practical ASR and AVSR performance. Finally, using visual-only speech data, our method is able to leverage unlabeled visual speech to improve VSR.
results: Experiments show that the TCA module performs comparably to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient, with clear gains when using the same number of tokens.
Abstract
Cross Attention is a popular method for retrieving information from a set of context tokens for making predictions. At inference time, for each prediction, Cross Attention scans the full set of $\mathcal{O}(N)$ tokens. In practice, however, often only a small subset of tokens are required for good performance. Methods such as Perceiver IO are cheap at inference as they distill the information to a smaller-sized set of latent tokens $L < N$ on which cross attention is then applied, resulting in only $\mathcal{O}(L)$ complexity. However, in practice, as the number of input tokens and the amount of information to distill increases, the number of latent tokens needed also increases significantly. In this work, we propose Tree Cross Attention (TCA) - a module based on Cross Attention that only retrieves information from a logarithmic $\mathcal{O}(\log(N))$ number of tokens for performing inference. TCA organizes the data in a tree structure and performs a tree search at inference time to retrieve the relevant tokens for prediction. Leveraging TCA, we introduce ReTreever, a flexible architecture for token-efficient inference. We show empirically that Tree Cross Attention (TCA) performs comparable to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient. Furthermore, we compare ReTreever against Perceiver IO, showing significant gains while using the same number of tokens for inference.
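For reference, the $\mathcal{O}(N)$ baseline being improved upon is plain cross attention, sketched below in NumPy: every query scores all $N$ context tokens before aggregating them, which is exactly the full scan that TCA replaces with a tree search over $\mathcal{O}(\log(N))$ tokens. The shapes here are arbitrary.

```python
import numpy as np

def cross_attention(query, context):
    """Plain cross attention: each query scans the full set of N context tokens,
    giving the O(N) per-prediction cost that TCA reduces to O(log N)."""
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)                 # (n_queries, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context                                 # (n_queries, d)

rng = np.random.default_rng(0)
print(cross_attention(rng.standard_normal((2, 16)), rng.standard_normal((128, 16))).shape)
```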
Parallel Computation of Multi-Slice Clustering of Third-Order Tensors
methods: implements the MSC algorithm on a distributed-memory system, performing spectral analysis of the tensor slices and handling each tensor mode independently in parallel.
results: shows that the parallel scheme outperforms sequential computation and makes the MSC method scalable.
Abstract
Machine Learning approaches like clustering methods deal with massive datasets that present an increasing challenge. We devise parallel algorithms to compute the Multi-Slice Clustering (MSC) for 3rd-order tensors. The MSC method is based on spectral analysis of the tensor slices and works independently on each tensor mode. Such features fit well in the parallel paradigm via a distributed memory system. We show that our parallel scheme outperforms sequential computing and allows for the scalability of the MSC method.
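Because each slice (and each mode) is analyzed independently, the spectral step parallelizes trivially. The sketch below distributes per-slice SVDs over a Python process pool as a shared-memory stand-in for the paper's distributed-memory setting; the tensor shape and use of the leading singular vector are illustrative assumptions.

```python
import numpy as np
from multiprocessing import Pool

def top_singular_vector(slice_2d):
    """Spectral analysis of one tensor slice: leading left singular vector."""
    u, _, _ = np.linalg.svd(slice_2d, full_matrices=False)
    return u[:, 0]

def spectra_per_slice(T, mode_axis=0, workers=4):
    """Each slice along a mode is processed independently, so the work can be
    spread across processes (or, in the paper's setting, across MPI ranks)."""
    slices = [np.take(T, i, axis=mode_axis) for i in range(T.shape[mode_axis])]
    with Pool(workers) as pool:
        return pool.map(top_singular_vector, slices)

if __name__ == "__main__":
    T = np.random.default_rng(0).standard_normal((6, 50, 40))
    vecs = spectra_per_slice(T, mode_axis=0)
    print(len(vecs), vecs[0].shape)
```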
Adversarial Imitation Learning from Visual Observations using Latent Information
results: On high-dimensional continuous robotic tasks, the algorithm matches state-of-the-art performance while providing significant computational advantages; the method can also improve the efficiency of reinforcement learning from pixels by leveraging expert videos.
Abstract
We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source. The challenges of this framework include the absence of expert actions and the partial observability of the environment, as the ground-truth states can only be inferred from pixels. To tackle this problem, we first conduct a theoretical analysis of imitation learning in partially observable environments. We establish upper bounds on the suboptimality of the learning agent with respect to the divergence between the expert and the agent latent state-transition distributions. Motivated by this analysis, we introduce an algorithm called Latent Adversarial Imitation from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations. In experiments on high-dimensional continuous robotic tasks, we show that our algorithm matches state-of-the-art performance while providing significant computational advantages. Additionally, we show how our method can be used to improve the efficiency of reinforcement learning from pixels by leveraging expert videos. To ensure reproducibility, we provide free access to our code.
Graph-based Neural Weather Prediction for Limited Area Modeling
paper_authors: Joel Oskarsson, Tomas Landelius, Fredrik Lindsten
for: applies neural weather prediction methods to limited area weather forecasting.
methods: uses a graph-based neural weather prediction approach and proposes a multi-scale hierarchical model extension.
results: Experiments with a local model for the Nordic region show the approach is effective for limited area modeling and provides high-resolution forecasts.
Abstract
The rise of accurate machine learning methods for weather forecasting is creating radical new possibilities for modeling the atmosphere. In the time of climate change, having access to high-resolution forecasts from models like these is also becoming increasingly vital. While most existing Neural Weather Prediction (NeurWP) methods focus on global forecasting, an important question is how these techniques can be applied to limited area modeling. In this work we adapt the graph-based NeurWP approach to the limited area setting and propose a multi-scale hierarchical model extension. Our approach is validated by experiments with a local model for the Nordic region.
Module-wise Training of Neural Networks via the Minimizing Movement Scheme
results: Experiments show that adding this regularization improves the accuracy of module-wise training, outperforming other module-wise training methods and often end-to-end training, especially when memory is limited.
Abstract
Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
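One plausible reading of the module-wise regularization is sketched below in PyTorch: each module is trained greedily with an auxiliary head, and a transport term penalizes how far the module moves its input representation. The exact regularizer, schedule, and architectures in TRGL differ, so treat the loss form and the toy module as assumptions.

```python
import torch, torch.nn as nn

def module_wise_step(module, head, h_in, y, optimizer, lam=0.1):
    """One greedy step for a single module: auxiliary-head task loss plus a
    transport-style regularizer ||module(h) - h||^2 on the representation shift."""
    h_out = module(h_in)
    task_loss = nn.functional.cross_entropy(head(h_out), y)
    transport = ((h_out - h_in) ** 2).mean()
    loss = task_loss + lam * transport
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return h_out.detach(), loss.item()   # detached output feeds the next module

module = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 10)
opt = torch.optim.SGD(list(module.parameters()) + list(head.parameters()), lr=0.1)
h, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
print(module_wise_step(module, head, h, y, opt)[1])
```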
Efficient Biologically Plausible Adversarial Training
results: The results show that the biologically plausible learning algorithm PEPITA has higher intrinsic adversarial robustness and, with adversarial training, a more favorable natural-vs-adversarial performance trade-off across several computer vision tasks.
Abstract
Artificial Neural Networks (ANNs) trained with Backpropagation (BP) show astounding performance and are increasingly often used in performing our daily life tasks. However, ANNs are highly vulnerable to adversarial attacks, which alter inputs with small targeted perturbations that drastically disrupt the models' performance. The most effective method to make ANNs robust against these attacks is adversarial training, in which the training dataset is augmented with exemplary adversarial samples. Unfortunately, this approach has the drawback of increased training complexity since generating adversarial samples is very computationally demanding. In contrast to ANNs, humans are not susceptible to adversarial attacks. Therefore, in this work, we investigate whether biologically-plausible learning algorithms are more robust against adversarial attacks than BP. In particular, we present an extensive comparative analysis of the adversarial robustness of BP and Present the Error to Perturb the Input To modulate Activity (PEPITA), a recently proposed biologically-plausible learning algorithm, on various computer vision tasks. We observe that PEPITA has higher intrinsic adversarial robustness and, with adversarial training, has a more favourable natural-vs-adversarial performance trade-off as, for the same natural accuracies, PEPITA's adversarial accuracies decrease in average by 0.26% and BP's by 8.05%.
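A rough NumPy sketch of PEPITA's two-forward-pass idea for a two-layer network: the output error is projected back onto the input by a fixed random matrix, a second modulated forward pass is run, and layer-local updates are computed from the difference between the two passes. Signs, scalings, and layer-wise details below follow a reading of the scheme rather than the original paper, so treat them as assumptions.

```python
import numpy as np

def pepita_step(W1, W2, F, x, y_onehot, lr=0.01):
    """Two-forward-pass PEPITA-style update (rough sketch). No backward pass:
    the output error is projected to the input by a fixed random matrix F."""
    relu = lambda z: np.maximum(z, 0.0)
    # standard forward pass
    h1 = relu(W1 @ x);              out = W2 @ h1
    e = out - y_onehot                                  # output error
    # second ("modulated") forward pass with the error fed back to the input
    h1_m = relu(W1 @ (x - F @ e))
    # layer-local updates from differences between the two passes
    W1 -= lr * np.outer(h1 - h1_m, x - F @ e)
    W2 -= lr * np.outer(e, h1_m)
    return W1, W2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 784)) * 0.05
W2 = rng.standard_normal((10, 64)) * 0.05
F = rng.standard_normal((784, 10)) * 0.05               # fixed random projection
x, y = rng.standard_normal(784), np.eye(10)[3]
W1, W2 = pepita_step(W1, W2, F, x, y)
```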
Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer
results: Evaluated on data from a real SaaS company, Outage-Watch significantly outperforms traditional methods with an average AUC of 0.98, detects all outages that exhibit a change in service metrics, and reduces the Mean Time To Detection (MTTD) of outages by up to 88%, demonstrating the effectiveness of the proposed method.
Abstract
Cloud services are omnipresent and critical cloud service failure is a fact of life. In order to retain customers and prevent revenue loss, it is important to provide high reliability guarantees for these services. One way to do this is by predicting outages in advance, which can help in reducing the severity as well as time to recovery. It is difficult to forecast critical failures due to the rarity of these events. Moreover, critical failures are ill-defined in terms of observable data. Our proposed method, Outage-Watch, defines critical service outages as deteriorations in the Quality of Service (QoS) captured by a set of metrics. Outage-Watch detects such outages in advance by using current system state to predict whether the QoS metrics will cross a threshold and initiate an extreme event. A mixture of Gaussian is used to model the distribution of the QoS metrics for flexibility and an extreme event regularizer helps in improving learning in tail of the distribution. An outage is predicted if the probability of any one of the QoS metrics crossing threshold changes significantly. Our evaluation on a real-world SaaS company dataset shows that Outage-Watch significantly outperforms traditional methods with an average AUC of 0.98. Additionally, Outage-Watch detects all the outages exhibiting a change in service metrics and reduces the Mean Time To Detection (MTTD) of outages by up to 88% when deployed in an enterprise cloud-service system, demonstrating efficacy of our proposed method.
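A simplified stand-in for the modeling idea: fit a Gaussian mixture to a QoS metric and compute the probability of it exceeding a threshold as a weighted sum of component tail probabilities; an alert could fire when this probability rises sharply. The synthetic latency data, threshold, and component count below are illustrative assumptions, and the paper's extreme-event regularizer is not included.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def prob_exceeds(gmm, threshold):
    """P(metric > threshold) under a 1-D Gaussian mixture: weighted sum of
    per-component tail probabilities."""
    means = gmm.means_.ravel()
    stds = np.sqrt(gmm.covariances_.ravel())
    return float(np.sum(gmm.weights_ * norm.sf(threshold, loc=means, scale=stds)))

# Synthetic "latency" QoS metric standing in for real service telemetry.
latency = np.random.default_rng(0).lognormal(mean=3.0, sigma=0.4, size=2000)
gmm = GaussianMixture(n_components=3, random_state=0).fit(latency.reshape(-1, 1))
print(prob_exceeds(gmm, threshold=60.0))
```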
Scaling Experiments in Self-Supervised Cross-Table Representation Learning
results: We assess whether our architecture scales to larger tabular data by linear probing of the pretrained models and comparing the results against conventional baselines.
Abstract
To analyze the scaling potential of deep tabular representation learning models, we introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning by utilizing table-specific tokenizers and a shared Transformer backbone. Our training approach encompasses both single-table and cross-table models, trained via missing value imputation through a self-supervised masked cell recovery objective. To understand the scaling behavior of our method, we train models of varying sizes, ranging from approximately $10^4$ to $10^7$ parameters. These models are trained on a carefully curated pretraining dataset, consisting of 135M training tokens sourced from 76 diverse datasets. We assess the scaling of our architecture in both single-table and cross-table pretraining setups by evaluating the pretrained models using linear probing on a curated set of benchmark datasets and comparing the results with conventional baselines.
Robust Stochastic Optimization via Gradient Quantile Clipping
results: For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high-probability bounds on the final estimation error. In the non-convex case, we show that the limit distribution is localized on a neighborhood with low gradient. We also propose an implementation based on rolling quantiles, which yields a highly efficient optimization procedure with strong robustness properties, as confirmed by numerical experiments.
Abstract
We introduce a clipping strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clipping thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber contamination. Our mathematical analysis leverages the connection between constant step size SGD and Markov chains and handles the bias introduced by clipping in an original way. For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high probability bounds on the final estimation error. In the non-convex case, we prove that the limit distribution is localized on a neighborhood with low gradient. We propose an implementation of this algorithm using rolling quantiles which leads to a highly efficient optimization procedure with strong robustness properties, as confirmed by our numerical experiments.
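A minimal sketch of the clipping rule on a toy heavy-tailed regression problem, assuming a squared loss, a fixed window size, and a 90th-percentile clipping level; these are illustrative choices rather than the paper's exact algorithm or analysis.

```python
# Minimal sketch of quantile-clipped SGD: the clipping threshold is a rolling
# quantile of recently observed gradient norms. Window size, quantile level and
# the toy objective are illustrative choices, not the paper's exact settings.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.standard_t(df=2, size=1000)  # heavy-tailed noise

w = np.zeros(5)
step = 0.05
norm_history = deque(maxlen=200)   # rolling window of gradient norms
q = 0.9                            # clip at the 90th percentile of recent norms

for t in range(5000):
    i = rng.integers(len(X))
    grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
    gnorm = np.linalg.norm(grad)
    norm_history.append(gnorm)
    tau = np.quantile(norm_history, q)       # rolling-quantile clipping threshold
    if gnorm > tau and gnorm > 0:
        grad = grad * (tau / gnorm)          # rescale the gradient to the threshold
    w -= step * grad

print("estimated weights:", np.round(w, 2))
```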
Leave-one-out Distinguishability in Machine Learning
methods: We use Gaussian processes to model the randomness of machine learning algorithms and validate LOOD through extensive empirical analysis of information leakage.
results: We find that LOOD quantifies data memorization and privacy risks and enables analysis of influential training data points. Moreover, optimized queries can be used to disclose the most significant information about the training data.
Abstract
We introduce a new analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD). This problem is key to measuring data **memorization** and **information leakage** in machine learning, and the **influence** of training data points on model predictions. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of information leakage using membership inference attacks. Our theoretical framework enables us to investigate the causes of information leakage and where the leakage is high. For example, we analyze the influence of activation functions, on data memorization. Additionally, our method allows us to optimize queries that disclose the most significant information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate **reconstruction** of training data.
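The toy sketch below illustrates the flavour of LOOD with a plain Gaussian-process surrogate: it compares the posterior predictive at a query point with and without one training point and scores the gap with a Gaussian KL divergence. The RBF kernel, synthetic data, and KL-based score are assumptions for illustration, not the paper's exact construction.

```python
# Toy sketch of leave-one-out distinguishability with a Gaussian-process surrogate:
# compare the GP posterior predictive at a query point with and without one
# training point, and score the gap with a univariate Gaussian KL divergence.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_predict(X, y, Xq, noise=1e-2):
    K = rbf(X, X) + noise * np.eye(len(X))
    Kq = rbf(Xq, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Kq @ alpha
    v = np.linalg.solve(L, Kq.T)
    var = rbf(Xq, Xq).diagonal() - (v**2).sum(0) + noise
    return mean, var

def kl_gauss(m1, v1, m2, v2):
    # KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
query = np.array([[0.5]])

m_full, v_full = gp_predict(X, y, query)
m_loo, v_loo = gp_predict(X[1:], y[1:], query)   # leave out training point 0
lood = kl_gauss(m_full[0], v_full[0], m_loo[0], v_loo[0])
print(f"LOOD-style score at the query (KL between predictives): {lood:.4f}")
```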
Navigating the Design Space of Equivariant Diffusion-Based Generative Models for De Novo 3D Molecule Generation
methods: The study focuses on previously unexplored regions of the design space of E(3) equivariant diffusion models. A comparative analysis evaluates the interplay between continuous and discrete state spaces. Out of this investigation, the EQGAT-diff model is introduced, which consistently outperforms established models on the QM9 and GEOM-Drugs datasets by a large margin.
results: Experiments show that EQGAT-diff substantially outperforms previous models on the QM9 and GEOM-Drugs datasets. Moreover, with limited training data, EQGAT-diff transfers to target distributions with explicit hydrogens, and a few fine-tuning iterations further push state-of-the-art performance across datasets.
Abstract
Deep generative diffusion models are a promising avenue for de novo 3D molecular design in material science and drug discovery. However, their utility is still constrained by suboptimal performance with large molecular structures and limited training data. Addressing this gap, we explore the design space of E(3) equivariant diffusion models, focusing on previously blank spots. Our extensive comparative analysis evaluates the interplay between continuous and discrete state spaces. Out of this investigation, we introduce the EQGAT-diff model, which consistently surpasses the performance of established models on the QM9 and GEOM-Drugs datasets by a large margin. Distinctively, EQGAT-diff takes continuous atomic positions while chemical elements and bond types are categorical and employ a time-dependent loss weighting that significantly increases training convergence and the quality of generated samples. To further strengthen the applicability of diffusion models to limited training data, we examine the transferability of EQGAT-diff trained on the large PubChem3D dataset with implicit hydrogens to target distributions with explicit hydrogens. Fine-tuning EQGAT-diff for a couple of iterations further pushes state-of-the-art performance across datasets. We envision that our findings will find applications in structure-based drug design, where the accuracy of generative models for small datasets of complex molecules is critical.
Deep learning soliton dynamics and complex potentials recognition for 1D and 2D PT-symmetric saturable nonlinear Schrödinger equations
for: This paper extends physics-informed neural networks (PINNs) to learn data-driven stationary and non-stationary solitons of the 1D and 2D saturable nonlinear Schrödinger equations (SNLSEs) with two fundamental PT-symmetric Scarf-II and periodic potentials in optical fibers.
methods: The paper extends PINNs to learn data-driven solutions of the SNLSEs and studies data-driven inverse problems for the discovery of PT-symmetric potential functions rather than just potential parameters. In particular, a modified PINNs (mPINNs) scheme is proposed that identifies the 1D and 2D PT-symmetric potential functions directly from solution data.
results: The results show that deep neural networks achieve high-accuracy solutions of the 1D and 2D SNLSEs, and that two network structures compared under different parameter conditions reach similarly high accuracy. The paper also analyzes the main factors affecting network performance, including activation functions, network structure, and the size of the training data.
Abstract
In this paper, we firstly extend the physics-informed neural networks (PINNs) to learn data-driven stationary and non-stationary solitons of 1D and 2D saturable nonlinear Schr\"odinger equations (SNLSEs) with two fundamental PT-symmetric Scarf-II and periodic potentials in optical fibers. Secondly, the data-driven inverse problems are studied for PT-symmetric potential functions discovery rather than just potential parameters in the 1D and 2D SNLSEs. Particularly, we propose a modified PINNs (mPINNs) scheme to identify directly the PT potential functions of the 1D and 2D SNLSEs by the solution data. And the inverse problems about 1D and 2D PT -symmetric potentials depending on propagation distance z are also investigated using mPINNs method. We also identify the potential functions by the PINNs applied to the stationary equation of the SNLSE. Furthermore, two network structures are compared under different parameter conditions such that the predicted PT potentials can achieve the similar high accuracy. These results illustrate that the established deep neural networks can be successfully used in 1D and 2D SNLSEs with high accuracies. Moreover, some main factors affecting neural networks performance are discussed in 1D and 2D PT Scarf-II and periodic potentials, including activation functions, structures of the networks, and sizes of the training data. In particular, twelve different nonlinear activation functions are in detail analyzed containing the periodic and non-periodic functions such that it is concluded that selecting activation functions according to the form of solution and equation usually can achieve better effect.
In search of dispersed memories: Generative diffusion models are associative memory networks
results: The study finds that the storage capacity of a diffusion model is identical to that of a continuous modern Hopfield network. These results establish a strong link between generative modeling and the theoretical neuroscience of memory, providing a powerful computational foundation for creative generation and memory recall.
Abstract
Hopfield networks are widely used in neuroscience as simplified theoretical models of biological associative memory. The original Hopfield networks store memories by encoding patterns of binary associations, which result in a synaptic learning mechanism known as Hebbian learning rule. Modern Hopfield networks can achieve exponential capacity scaling by using highly non-linear energy functions. However, the energy function of these newer models cannot be straightforwardly compressed into binary synaptic couplings and it does not directly provide new synaptic learning rules. In this work we show that generative diffusion models can be interpreted as energy-based models and that, when trained on discrete patterns, their energy function is equivalent to that of modern Hopfield networks. This equivalence allows us to interpret the supervised training of diffusion models as a synaptic learning process that encodes the associative dynamics of a modern Hopfield network in the weight structure of a deep neural network. Accordingly, in our experiments we show that the storage capacity of a continuous modern Hopfield network is identical to the capacity of a diffusion model. Our results establish a strong link between generative modeling and the theoretical neuroscience of memory, which provide a powerful computational foundation for the reconstructive theory of memory, where creative generation and memory recall can be seen as parts of a unified continuum.
Toward Robust Recommendation via Real-time Vicinal Defense
results: Extensive experiments show that RVD effectively defends against various targeted poisoning attacks without requiring changes to the model structure or training process, making it more practical.
Abstract
Recommender systems have been shown to be vulnerable to poisoning attacks, where malicious data is injected into the dataset to cause the recommender system to provide biased recommendations. To defend against such attacks, various robust learning methods have been proposed. However, most methods are model-specific or attack-specific, making them lack generality, while other methods, such as adversarial training, are oriented towards evasion attacks and thus have a weak defense strength in poisoning attacks. In this paper, we propose a general method, Real-time Vicinal Defense (RVD), which leverages neighboring training data to fine-tune the model before making a recommendation for each user. RVD works in the inference phase to ensure the robustness of the specific sample in real-time, so there is no need to change the model structure and training process, making it more practical. Extensive experimental results demonstrate that RVD effectively mitigates targeted poisoning attacks across various models without sacrificing accuracy. Moreover, the defensive effect can be further amplified when our method is combined with other strategies.
Utility-based Adaptive Teaching Strategies using Bayesian Theory of Mind
paper_authors: Clémence Grislain, Hugo Caselles-Dupré, Olivier Sigaud, Mohamed Chetouani
for: This paper aims to build teacher agents based on a Bayesian Theory of Mind (ToM) so that, like humans, they can adapt to a learner's internal state and select the best demonstrations for that learner.
methods: The paper uses Bayesian ToM mechanisms to build models of the learner's internal state from behavioral observations, then selects demonstrations that maximize the learner's reward while minimizing teaching costs.
results: Experiments show that learners taught with this ToM-based strategy learn faster and reach higher performance, especially when the teacher's model of the learner aligns more closely with the learner's actual state.
Good teachers always tailor their explanations to the learners. Cognitive scientists model this process under the rationality principle: teachers try to maximise the learner's utility while minimising teaching costs. To this end, human teachers seem to build mental models of the learner's internal state, a capacity known as Theory of Mind (ToM). Inspired by cognitive science, we build on Bayesian ToM mechanisms to design teacher agents that, like humans, tailor their teaching strategies to the learners. Our ToM-equipped teachers construct models of learners' internal states from observations and leverage them to select demonstrations that maximise the learners' rewards while minimising teaching costs. Our experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way. This effect gets stronger when the teacher's model of the learner better aligns with the actual learner's state, either using a more accurate prior or after accumulating observations of the learner's behaviour. This work is a first step towards social machines that teach us and each other, see https://teacher-with-tom.github.io.
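A tiny sketch of the utility-based selection rule, assuming a hand-made set of learner types, a reward table, demonstration costs, and observation likelihoods; it only illustrates the Bayesian update plus expected-utility argmax, not the paper's full ToM teacher.

```python
# Tiny sketch of utility-based teaching with a Bayesian model of the learner:
# maintain a posterior over hypothesized learner types from observed behaviour,
# then pick the demonstration maximizing expected learner reward minus teaching
# cost. Learner types, rewards, costs and likelihoods are made-up placeholders.
import numpy as np

learner_types = ["novice", "intermediate", "expert"]
demos = ["short_demo", "full_demo", "verbose_demo"]

# reward[type][demo]: assumed benefit of each demo for each learner type
reward = np.array([[0.2, 0.8, 0.9],
                   [0.5, 0.7, 0.7],
                   [0.9, 0.6, 0.4]])
cost = np.array([0.1, 0.3, 0.6])          # assumed teaching cost per demo

prior = np.ones(3) / 3.0
# likelihood[type][observation]: assumed probability of each observed behaviour
likelihood = np.array([[0.7, 0.3],
                       [0.5, 0.5],
                       [0.2, 0.8]])

def posterior_after(obs_index, prior):
    post = prior * likelihood[:, obs_index]
    return post / post.sum()

post = posterior_after(obs_index=1, prior=prior)    # teacher observed behaviour #1
expected_utility = post @ reward - cost              # E[learner reward] - teaching cost
best = demos[int(np.argmax(expected_utility))]
print("posterior over learner types:", np.round(post, 2))
print("selected demonstration:", best)
```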
Estimation and Inference in Distributional Reinforcement Learning
results: The paper proves that with a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\right)$, the $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. Additionally, the paper shows that a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2}(1-\gamma)^{4}}\right)$ is sufficient to ensure the Kolmogorov metric and total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability. Finally, the paper demonstrates that the empirical process $\sqrt{n}(\hat\eta^\pi-\eta^\pi)$ converges weakly to a Gaussian process in certain function spaces.
Abstract
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete distribution of the random return (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat\eta^\pi$, given a generative model is available. We show that in this circumstance we need a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\right)$ to guarantee a $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. Also, we show that under different mild assumptions a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2}(1-\gamma)^{4}}\right)$ suffices to ensure the Kolmogorov metric and total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ is below $\epsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat\eta^\pi$. We demonstrate that the ``empirical process'' $\sqrt{n}(\hat\eta^\pi-\eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on Lipschitz function class $\ell^\infty(\mathcal{F}_{W_1})$, also in the space of bounded functionals on indicator function class $\ell^\infty(\mathcal{F}_{\mathrm{KS}})$ and bounded measurable function class $\ell^\infty(\mathcal{F}_{\mathrm{TV}})$ when some mild conditions hold. Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $\eta^\pi$.
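The sketch below illustrates the certainty-equivalence construction on a small random MDP: estimate the transition kernel from generative-model samples, then collect Monte Carlo returns of the policy in the empirical MDP as an estimate of $\eta^\pi$. The MDP, sample counts, and Monte Carlo readout are illustrative assumptions, not the setting of the paper's bounds.

```python
# Sketch of certainty-equivalent distributional policy evaluation: estimate the
# transition kernel from samples of a generative model, then roll out the policy
# in the empirical MDP and collect discounted returns as an estimate of eta^pi.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))      # true transition kernel (unknown to the learner)
R = rng.uniform(0, 1, size=(S, A))              # deterministic rewards, assumed known here
pi = rng.dirichlet(np.ones(A), size=S)          # policy to evaluate

# Certainty equivalence: estimate P from n generative-model samples per (s, a).
n = 200
P_hat = np.zeros_like(P)
for s in range(S):
    for a in range(A):
        samples = rng.choice(S, size=n, p=P[s, a])
        P_hat[s, a] = np.bincount(samples, minlength=S) / n

def sample_return(P_model, s0, horizon=200):
    s, g, disc = s0, 0.0, 1.0
    for _ in range(horizon):                    # truncated discounted return
        a = rng.choice(A, p=pi[s])
        g += disc * R[s, a]
        disc *= gamma
        s = rng.choice(S, p=P_model[s, a])
    return g

eta_hat = np.array([sample_return(P_hat, s0=0) for _ in range(2000)])  # empirical return distribution
print(f"mean return {eta_hat.mean():.3f}, 10%/90% quantiles "
      f"{np.quantile(eta_hat, 0.1):.3f}/{np.quantile(eta_hat, 0.9):.3f}")
```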
Data-driven localized waves and parameter discovery in the massive Thirring model via extended physics-informed neural networks with interface zones
results: Through data-driven simulations and analysis of various solutions, the paper demonstrates the high accuracy and fast convergence of the method and successfully solves inverse problems for different types of localized wave solutions.
Abstract
In this paper, we study data-driven localized wave solutions and parameter discovery in the massive Thirring (MT) model via the deep learning in the framework of physics-informed neural networks (PINNs) algorithm. Abundant data-driven solutions including soliton of bright/dark type, breather and rogue wave are simulated accurately and analyzed contrastively with relative and absolute errors. For higher-order localized wave solutions, we employ the extended PINNs (XPINNs) with domain decomposition to capture the complete pictures of dynamic behaviors such as soliton collisions, breather oscillations and rogue-wave superposition. In particular, we modify the interface line in domain decomposition of XPINNs into a small interface zone and introduce the pseudo initial, residual and gradient conditions as interface conditions linked adjacently with individual neural networks. Then this modified approach is applied successfully to various solutions ranging from bright-bright soliton, dark-dark soliton, dark-antidark soliton, general breather, Kuznetsov-Ma breather and second-order rogue wave. Experimental results show that this improved version of XPINNs reduce the complexity of computation with faster convergence rate and keep the quality of learned solutions with smoother stitching performance as well. For the inverse problems, the unknown coefficient parameters of linear and nonlinear terms in the MT model are identified accurately with and without noise by using the classical PINNs algorithm.
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data
results: Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representation generation, and employ weighted similarity learning and contrastive learning to learn gene-gene relationships across datasets. This design allows us to provide gene representations that capture functional similarity across different contexts and modalities in a joint space.
Abstract
Discovering genes with similar functions across diverse biomedical contexts poses a significant challenge in gene representation learning due to data heterogeneity. In this study, we resolve this problem by introducing a novel model called Multimodal Similarity Learning Graph Neural Network, which combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data. Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representations generation, while incorporating regularization with weighted similarity learning and contrastive learning to learn cross-data gene-gene relationships. This novel design ensures that we can offer gene representations containing functional similarity across different contexts in a joint space. Comprehensive benchmarking analysis shows our model's capacity to effectively capture gene function similarity across multiple modalities, outperforming state-of-the-art methods in gene representation learning by up to 97.5%. Moreover, we employ bioinformatics tools in conjunction with gene representations to uncover pathway enrichment, regulation causal networks, and functions of disease-associated or dosage-sensitive genes. Therefore, our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.
paper_authors: Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang
for: This paper aims to explain the mechanism behind the effectiveness of ensemble methods on out-of-distribution (OOD) data.
methods: The paper studies weight-space ensemble methods, in particular WiSE-FT, which interpolates the parameters of a pre-trained and a fine-tuned model.
results: The study finds that WiSE-FT performs strongly on OOD data and corrects many cases in which the individual models predict incorrectly. It further shows that ensemble methods reduce prediction errors by utilizing a more diverse set of spurious features.
Abstract
Generalization to out-of-distribution (OOD) data is a critical challenge in machine learning. Ensemble-based methods, like weight space ensembles that interpolate model parameters, have been shown to achieve superior OOD performance. However, the underlying mechanism for their effectiveness remains unclear. In this study, we closely examine WiSE-FT, a popular weight space ensemble method that interpolates between a pre-trained and a fine-tuned model. We observe an unexpected phenomenon, in which WiSE-FT successfully corrects many cases where each individual model makes incorrect predictions, which contributes significantly to its OOD effectiveness. To gain further insights, we conduct theoretical analysis in a multi-class setting with a large number of spurious features. Our analysis predicts the above phenomenon and it further shows that ensemble-based models reduce prediction errors in the OOD settings by utilizing a more diverse set of spurious features. Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance. Empirically we demonstrate the effectiveness of utilizing diverse spurious features on a MultiColorMNIST dataset, and our experimental results are consistent with the theoretical analysis. Building upon the new theoretical insights into the efficacy of ensemble methods, we further identify an issue of WiSE-FT caused by the overconfidence of fine-tuned models in OOD situations. This overconfidence magnifies the fine-tuned model's incorrect prediction, leading to deteriorated OOD ensemble performance. To remedy this problem, we propose a novel method called BAlaNced averaGing (BANG), which significantly enhances the OOD performance of WiSE-FT.
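A minimal sketch of the WiSE-FT-style weight-space interpolation discussed above; the numpy dicts stand in for real model state dicts, the mixing coefficient is an arbitrary choice, and BANG's confidence-balancing step is not shown.

```python
# Minimal sketch of WiSE-FT-style weight-space ensembling: linearly interpolate
# corresponding parameters of a pre-trained and a fine-tuned model. The numpy
# dicts stand in for real model state dicts; alpha = 0.5 is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(0)
pretrained = {"layer1.weight": rng.normal(size=(4, 4)), "layer1.bias": rng.normal(size=4)}
finetuned = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in pretrained.items()}

def wise_ft(theta_pre, theta_ft, alpha=0.5):
    """Parameter-wise interpolation: (1 - alpha) * pretrained + alpha * finetuned."""
    assert theta_pre.keys() == theta_ft.keys()
    return {k: (1.0 - alpha) * theta_pre[k] + alpha * theta_ft[k] for k in theta_pre}

ensembled = wise_ft(pretrained, finetuned, alpha=0.5)
print({k: v.shape for k, v in ensembled.items()})
```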
Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes
paper_authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
for: The paper compares the performance of Gated Recurrent Unit (GRU) and Transformer-XL (TrXL) in deep reinforcement learning tasks, specifically in memorizing long sequences, withstanding noise, and generalizing.
methods: The paper uses partially observable 2D environments with discrete controls, such as Mortar Mayhem, Mystery Path, and Searing Spotlights, and extrapolates these environments to novel endless tasks as an automatic curriculum. The paper also uses Proximal Policy Optimization and a sliding window approach with TrXL as episodic memory.
results: The paper shows that GRU consistently outperforms TrXL by significant margins on all endless tasks. However, TrXL demonstrates superior sample efficiency in Mystery Path and outperforms GRU in Mortar Mayhem.
Abstract
Memory Gym introduces a unique benchmark designed to test Deep Reinforcement Learning agents, specifically comparing Gated Recurrent Unit (GRU) against Transformer-XL (TrXL), on their ability to memorize long sequences, withstand noise, and generalize. It features partially observable 2D environments with discrete controls, namely Mortar Mayhem, Mystery Path, and Searing Spotlights. These originally finite environments are extrapolated to novel endless tasks that act as an automatic curriculum, drawing inspiration from the car game ``I packed my bag". These endless tasks are not only beneficial for evaluating efficiency but also intriguingly valuable for assessing the effectiveness of approaches in memory-based agents. Given the scarcity of publicly available memory baselines, we contribute an implementation driven by TrXL and Proximal Policy Optimization. This implementation leverages TrXL as episodic memory using a sliding window approach. In our experiments on the finite environments, TrXL demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins.
ResBit: Residual Bit Vector for Categorical Values
methods: Proposes ResBit, a hierarchical bit representation extending the diffusion-based Analog Bits, and Table Residual Bit Diffusion (TRBD), which incorporates it into the TabDDPM tabular data generation method.
results: Experiments show that TRBD quickly generates diverse, high-quality tabular data, and that ResBit can also serve as an alternative to the one-hot vector, for conditioning in GANs and as a label expression in image classification.
Abstract
The one-hot vector has long been widely used in machine learning as a simple and generic method for representing discrete data. However, this method increases the number of dimensions linearly with the categorical data to be represented, which is problematic from the viewpoint of spatial computational complexity in deep learning, which requires a large amount of data. Recently, Analog Bits, a method for representing discrete data as a sequence of bits, was proposed on the basis of the high expressiveness of diffusion models. However, since the number of category types to be represented in a generation task is not necessarily at a power of two, there is a discrepancy between the range that Analog Bits can represent and the range represented as category data. If such a value is generated, the problem is that the original category value cannot be restored. To address this issue, we propose Residual Bit Vector (ResBit), which is a hierarchical bit representation. Although it is a general-purpose representation method, in this paper, we treat it as numerical data and show that it can be used as an extension of Analog Bits using Table Residual Bit Diffusion (TRBD), which is incorporated into TabDDPM, a tabular data generation method. We experimentally confirmed that TRBD can generate diverse and high-quality data from small-scale table data to table data containing diverse category values faster than TabDDPM. Furthermore, we show that ResBit can also serve as an alternative to the one-hot vector by utilizing ResBit for conditioning in GANs and as a label expression in image classification.
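For context, the sketch below shows the plain Analog Bits encoding that ResBit extends: a category index is written as fixed-length bits scaled to {-1, +1} and decoded by thresholding, and a decoded value can fall outside the valid category range when the number of categories is not a power of two. ResBit's hierarchical residual scheme itself is not reproduced here.

```python
# Sketch of the plain Analog Bits encoding the abstract contrasts with: encode a
# category index as fixed-length bits scaled to {-1, +1}, decode by thresholding.
# When the decoded integer falls outside the category range (the mismatch ResBit
# targets), the original category cannot be recovered.
import numpy as np

def analog_bits_encode(index, n_bits):
    bits = [(index >> b) & 1 for b in range(n_bits)][::-1]
    return np.array(bits, dtype=float) * 2.0 - 1.0      # {0,1} -> {-1,+1}

def analog_bits_decode(vec):
    bits = (vec > 0).astype(int)
    return int("".join(map(str, bits)), 2)

n_categories = 5                     # not a power of two
n_bits = int(np.ceil(np.log2(n_categories)))
code = analog_bits_encode(3, n_bits)
print("encoded:", code, "-> decoded:", analog_bits_decode(code))

# A generative model outputs continuous values; thresholding may yield 5, 6 or 7,
# none of which corresponds to a valid category among the 5 defined above.
noisy_output = np.array([0.9, 0.8, 0.7])                # decodes to 7
print("out-of-range decode:", analog_bits_decode(noisy_output))
```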
Generalized Activation via Multivariate Projection
for: This paper aims to improve the performance of neural networks by introducing a new activation function called Multivariate Projection Unit (MPU).
methods: The paper uses a mathematical proof to establish the expressive power of MPU compared to the widely used Rectified Linear Unit (ReLU) activation function. Experimental evaluations are also conducted to compare the performance of MPU with other activation functions.
results: The paper shows that MPU outperforms ReLU and other activation functions in terms of expressive power, and provides a mathematical proof to support this claim. Experimental results also corroborate the effectiveness of MPU on widely-adopted architectures.Abstract
Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof establishing that FNNs activated by SOC projections outperform those utilizing ReLU in terms of expressive power. Experimental evaluations on widely-adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
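The sketch below contrasts ReLU, viewed as the projection onto the nonnegative half-line, with the closed-form Euclidean projection onto the second-order cone, the multivariate building block behind the MPU; how a layer's pre-activations would be grouped into $(x, t)$ blocks is an assumption made only for illustration.

```python
# Sketch: ReLU is the projection of a scalar onto the nonnegative half-line; the
# multivariate analogue used by the MPU is the closed-form projection onto the
# second-order cone {(x, t) : ||x|| <= t}.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)          # projection of z onto the half-line R+

def soc_projection(x, t):
    """Euclidean projection of (x, t) onto the cone {(x, t): ||x|| <= t}."""
    nx = np.linalg.norm(x)
    if nx <= t:                        # already inside the cone
        return x.copy(), t
    if nx <= -t:                       # projects onto the cone's apex
        return np.zeros_like(x), 0.0
    s = 0.5 * (nx + t)                 # otherwise project onto the cone's boundary
    return s * x / nx, s

print(relu(np.array([-1.0, 2.0])))
x_proj, t_proj = soc_projection(np.array([3.0, 4.0]), 1.0)
print(x_proj, t_proj, np.linalg.norm(x_proj) <= t_proj + 1e-12)
```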
RECOMBINER: Robust and Enhanced Compression with Bayesian Implicit Neural Representations
results: Extensive experiments across several data modalities show that RECOMBINER is competitive with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates.
Abstract
COMpression with Bayesian Implicit NEural Representations (COMBINER) is a recent data compression method that addresses a key inefficiency of previous Implicit Neural Representation (INR)-based approaches: it avoids quantization and enables direct optimization of the rate-distortion performance. However, COMBINER still has significant limitations: 1) it uses factorized priors and posterior approximations that lack flexibility; 2) it cannot effectively adapt to local deviations from global patterns in the data; and 3) its performance can be susceptible to modeling choices and the variational parameters' initializations. Our proposed method, Robust and Enhanced COMBINER (RECOMBINER), addresses these issues by 1) enriching the variational approximation while maintaining its computational cost via a linear reparameterization of the INR weights, 2) augmenting our INRs with learnable positional encodings that enable them to adapt to local details and 3) splitting high-resolution data into patches to increase robustness and utilizing expressive hierarchical priors to capture dependency across patches. We conduct extensive experiments across several data modalities, showcasing that RECOMBINER achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates.
FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation
results: The paper presents FedZeN, a federated zeroth-order algorithm designed to achieve superlinear convergence. Numerical simulations confirm the superlinear convergence rate and show that FedZeN outperforms the federated zeroth-order methods available in the literature.
Abstract
Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We take an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.
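As background, the sketch below shows the basic two-point zeroth-order gradient estimator that such methods build on, applied to a toy quadratic; the number of directions, smoothing radius, and step size are illustrative, and FedZeN's incremental Hessian estimator and Stiefel-sampled directions are not reproduced.

```python
# Sketch of a two-point zeroth-order gradient estimator: average finite
# differences of the objective along random unit directions. The quadratic
# objective and all hyperparameters are illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = rng.normal(size=(d, d)); A = A @ A.T / d + np.eye(d)   # strongly convex quadratic
b = rng.normal(size=d)
f = lambda x: 0.5 * x @ A @ x - b @ x                       # accessed only via point evaluations

def zo_gradient(f, x, n_dirs=20, mu=1e-4):
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.shape)
        u /= np.linalg.norm(u)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g * (len(x) / n_dirs)   # factor len(x) corrects for E[u u^T] = I/d on the unit sphere

x = np.zeros(d)
for _ in range(500):
    x -= 0.05 * zo_gradient(f, x)
print("distance to the true minimizer:", np.linalg.norm(x - np.linalg.solve(A, b)))
```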
Efficient Interpretable Nonlinear Modeling for Multiple Time Series
results: Experimental results show that the proposed method better identifies the support of the VAR coefficients in a parsimonious manner while also improving time-series prediction accuracy compared to state-of-the-art methods.
Abstract
Predictive linear and nonlinear models based on kernel machines or deep neural networks have been used to discover dependencies among time series. This paper proposes an efficient nonlinear modeling approach for multiple time series, with a complexity comparable to linear vector autoregressive (VAR) models while still incorporating nonlinear interactions among different time-series variables. The modeling assumption is that the set of time series is generated in two steps: first, a linear VAR process in a latent space, and second, a set of invertible and Lipschitz continuous nonlinear mappings that are applied per sensor, that is, a component-wise mapping from each latent variable to a variable in the measurement space. The VAR coefficient identification provides a topology representation of the dependencies among the aforementioned variables. The proposed approach models each component-wise nonlinearity using an invertible neural network and imposes sparsity on the VAR coefficients to reflect the parsimonious dependencies usually found in real applications. To efficiently solve the formulated optimization problems, a custom algorithm is devised combining proximal gradient descent, stochastic primal-dual updates, and projection to enforce the corresponding constraints. Experimental results on both synthetic and real data sets show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner while also improving the time-series prediction, as compared to the current state-of-the-art methods.
results: An extensive evaluation on a predefined benchmark of 19 classification datasets shows that GRANDE outperforms existing gradient-boosting and deep learning frameworks on most datasets.
Abstract
Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose $\text{GRANDE}$, $\text{GRA}$die$\text{N}$t-Based $\text{D}$ecision Tree $\text{E}$nsembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which affords to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which is a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both, simple and complex relations, within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
Style Transfer for Non-differentiable Audio Effects
for: Audio production style matching, particularly for multi-band compressor effects.
methods: Deep learning approach using audio embeddings, which can be applied to various classes of effects and does not require auto-differentiation.
results: The proposed approach convincingly style-matches a multi-band compressor effect, and the audio embeddings can be used for downstream tasks such as timbral information retrieval.
Abstract
Digital audio effects are widely used by audio engineers to alter the acoustic and temporal qualities of audio data. However, these effects can have a large number of parameters which can make them difficult to learn for beginners and hamper creativity for professionals. Recently, there have been a number of efforts to employ progress in deep learning to acquire the low-level parameter configurations of audio effects by minimising an objective function between an input and reference track, commonly referred to as style transfer. However, current approaches use inflexible black-box techniques or require that the effects under consideration are implemented in an auto-differentiation framework. In this work, we propose a deep learning approach to audio production style matching which can be used with effects implemented in some of the most widely used frameworks, requiring only that the parameters under consideration have a continuous domain. Further, our method includes style matching for various classes of effects, many of which are difficult or impossible to be approximated closely using differentiable functions. We show that our audio embedding approach creates logical encodings of timbral information, which can be used for a number of downstream tasks. Further, we perform a listening test which demonstrates that our approach is able to convincingly style match a multi-band compressor effect.
results: Extensive experiments on multiple benchmark datasets show that this generalization significantly improves performance, achieving top results for hypergraph node classification among the hypergraph networks commonly found in the literature.
Abstract
Higher-order relations are widespread in nature, with numerous phenomena involving complex interactions that extend beyond simple pairwise connections. As a result, advancements in higher-order processing can accelerate the growth of various fields requiring structured data. Current approaches typically represent these interactions using hypergraphs. We enhance this representation by introducing cellular sheaves for hypergraphs, a mathematical construction that adds extra structure to the conventional hypergraph while maintaining its local, higher-order connectivity. Drawing inspiration from existing Laplacians in the literature, we develop two unique formulations of sheaf hypergraph Laplacians: linear and non-linear. Our theoretical analysis demonstrates that incorporating sheaves into the hypergraph Laplacian provides a more expressive inductive bias than standard hypergraph diffusion, creating a powerful instrument for effectively modelling complex data structures. We employ these sheaf hypergraph Laplacians to design two categories of models: Sheaf Hypergraph Neural Networks and Sheaf Hypergraph Convolutional Networks. These models generalize classical Hypergraph Networks often found in the literature. Through extensive experimentation, we show that this generalization significantly improves performance, achieving top results on multiple benchmark datasets for hypergraph node classification.
Benchmarking Collaborative Learning Methods Cost-Effectiveness for Prostate Segmentation
results: Our experiments show that, in the practical scenario considered, CBM provides results equal to or better than FL while being highly cost-effective. These results indicate that the consensus paradigm may be a viable alternative to FL.
Abstract
Healthcare data is often split into medium/small-sized collections across multiple hospitals and access to it is encumbered by privacy regulations. This brings difficulties to use them for the development of machine learning and deep learning models, which are known to be data-hungry. One way to overcome this limitation is to use collaborative learning (CL) methods, which allow hospitals to work collaboratively to solve a task, without the need to explicitly share local data. In this paper, we address a prostate segmentation problem from MRI in a collaborative scenario by comparing two different approaches: federated learning (FL) and consensus-based methods (CBM). To the best of our knowledge, this is the first work in which CBM, such as label fusion techniques, are used to solve a problem of collaborative learning. In this setting, CBM combine predictions from locally trained models to obtain a federated strong learner with ideally improved robustness and predictive variance properties. Our experiments show that, in the considered practical scenario, CBMs provide equal or better results than FL, while being highly cost-effective. Our results demonstrate that the consensus paradigm may represent a valid alternative to FL for typical training tasks in medical imaging.
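A minimal sketch of the consensus idea: fuse the per-voxel foreground probabilities of locally trained models by averaging or majority vote. The three "local model" outputs below are synthetic arrays, not real segmentation predictions, and the paper's specific label-fusion techniques may differ.

```python
# Minimal sketch of consensus-based fusion for segmentation: combine the per-voxel
# foreground probabilities predicted by locally trained models, either by
# averaging or by majority vote. The "local model" outputs are synthetic.
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4
local_probs = [np.clip(rng.normal(0.5, 0.25, size=(H, W)), 0, 1) for _ in range(3)]  # one map per hospital

mean_fusion = np.mean(local_probs, axis=0) > 0.5                      # average probabilities, then threshold
majority_vote = np.sum([p > 0.5 for p in local_probs], axis=0) >= 2   # at least 2 of 3 models agree

print("probability-averaging mask:\n", mean_fusion.astype(int))
print("majority-vote mask:\n", majority_vote.astype(int))
```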
Too Big, so Fail? – Enabling Neural Construction Methods to Solve Large-Scale Routing Problems
results: Proposes a neural construction approach based on the ruin recreate principle that performs better on large-scale problems; thorough experiments on four datasets of varying distributions and modalities demonstrate its advantages.
Abstract
In recent years new deep learning approaches to solve combinatorial optimization problems, in particular NP-hard Vehicle Routing Problems (VRP), have been proposed. The most impactful of these methods are sequential neural construction approaches which are usually trained via reinforcement learning. Due to the high training costs of these models, they usually are trained on limited instance sizes (e.g. serving 100 customers) and later applied to vastly larger instance size (e.g. 2000 customers). By means of a systematic scale-up study we show that even state-of-the-art neural construction methods are outperformed by simple heuristics, failing to generalize to larger problem instances. We propose to use the ruin recreate principle that alternates between completely destroying a localized part of the solution and then recreating an improved variant. In this way, neural construction methods like POMO are never applied to the global problem but just in the reconstruction step, which only involves partial problems much closer in size to their original training instances. In thorough experiments on four datasets of varying distributions and modalities we show that our neural ruin recreate approach outperforms alternative forms of improving construction methods such as sampling and beam search and in several experiments also advanced local search approaches.
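A schematic of the ruin-recreate loop on a TSP-style tour: repeatedly destroy a spatially localized part of the solution and rebuild only that part. Cheapest insertion stands in here for the neural construction model (e.g. POMO) that the paper would apply to the small reconstruction subproblem; instance size, ruin radius, and iteration count are arbitrary.

```python
# Schematic ruin-recreate loop on a TSP-style tour: remove a spatially localized
# chunk of the current tour, then rebuild only that part. Cheapest insertion
# stands in for a learned construction model in the recreate step.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
tour = list(rng.permutation(len(pts)))                       # start from a random tour

def tour_length(t):
    return sum(dist[t[i], t[(i + 1) % len(t)]] for i in range(len(t)))

def ruin(t, k=20):
    center = rng.integers(len(pts))
    removed = set(np.argsort(dist[center])[:k])              # k customers closest to a random center
    return [c for c in t if c not in removed], list(removed)

def recreate(partial, removed):
    for c in removed:                                        # cheapest insertion per removed customer
        costs = [dist[partial[i], c] + dist[c, partial[(i + 1) % len(partial)]]
                 - dist[partial[i], partial[(i + 1) % len(partial)]] for i in range(len(partial))]
        partial.insert(int(np.argmin(costs)) + 1, c)
    return partial

best = tour_length(tour)
for _ in range(300):
    candidate = recreate(*ruin(list(tour)))
    if tour_length(candidate) < best:
        tour, best = candidate, tour_length(candidate)
print(f"tour length after ruin-recreate: {best:.2f}")
```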
From Empirical Measurements to Augmented Data Rates: A Machine Learning Approach for MCS Adaptation in Sidelink Communication
results: The paper shows that a machine learning approach can adaptively predict MCS levels and significantly improve prediction performance. In addition, the paper presents a large data set acquired in extensive drive tests and makes it publicly available.
Abstract
Due to the lack of a feedback channel in the C-V2X sidelink, finding a suitable modulation and coding scheme (MCS) is a difficult task. However, recent use cases for vehicle-to-everything (V2X) communication with higher demands on data rate necessitate choosing the MCS adaptively. In this paper, we propose a machine learning approach to predict suitable MCS levels. Additionally, we propose the use of quantile prediction and evaluate it in combination with different algorithms for the task of predicting the MCS level with the highest achievable data rate. Thereby, we show significant improvements over conventional methods of choosing the MCS level. Using a machine learning approach, however, requires larger real-world data sets than are currently publicly available for research. For this reason, this paper presents a data set that was acquired in extensive drive tests, and that we make publicly available.
Diffusion Models as Stochastic Quantization in Lattice Field Theory
results: Numerical simulations show that the DM can serve as a global sampler for generating quantum lattice field configurations in two-dimensional $\phi^4$ theory. The DM also notably reduces autocorrelation times, especially in the critical region where MCMC algorithms suffer from critical slowing down. These findings may inspire further advancements in lattice field theory simulations, particularly in cases where generating large ensembles is expensive.
Abstract
In this work, we establish a direct connection between generative diffusion models (DMs) and stochastic quantization (SQ). The DM is realized by approximating the reversal of a stochastic process dictated by the Langevin equation, generating samples from a prior distribution to effectively mimic the target distribution. Using numerical simulations, we demonstrate that the DM can serve as a global sampler for generating quantum lattice field configurations in two-dimensional $\phi^4$ theory. We demonstrate that DMs can notably reduce autocorrelation times in the Markov chain, especially in the critical region where standard Markov Chain Monte-Carlo (MCMC) algorithms experience critical slowing down. The findings can potentially inspire further advancements in lattice field theory simulations, in particular in cases where it is expensive to generate large ensembles.
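For reference, the sketch below runs the discretized Langevin (stochastic quantization) dynamics that the diffusion model is connected to, on a small 2D lattice with a standard $\phi^4$ action; lattice size, couplings, step size, and the magnetization observable are illustrative choices.

```python
# Sketch of stochastic quantization: discretized Langevin updates
#   phi <- phi - eps * dS/dphi + sqrt(2 eps) * noise
# for the 2D lattice action
#   S[phi] = sum_x [ 1/2 sum_mu (phi_{x+mu} - phi_x)^2 + m^2/2 phi_x^2 + lam/4 phi_x^4 ].
import numpy as np

rng = np.random.default_rng(0)
L, m2, lam, eps = 16, -1.0, 1.0, 1e-3
phi = rng.normal(size=(L, L))

def drift(phi):
    """dS/dphi for the action above (nearest neighbours, periodic boundaries)."""
    neighbours = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                  np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
    return (4.0 + m2) * phi - neighbours + lam * phi**3

mags = []
for step in range(20000):
    phi = phi - eps * drift(phi) + np.sqrt(2.0 * eps) * rng.normal(size=phi.shape)
    if step > 5000:                       # discard thermalization
        mags.append(np.abs(phi.mean()))
print(f"<|magnetization|> ~ {np.mean(mags):.3f}")
```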
On the Power of the Weisfeiler-Leman Test for Graph Motif Parameters
results: This work provides a precise characterization of the WL-dimension of labeled graph motif parameters and shows that, when the $k$WL test distinguishes graphs with different numbers of occurrences of a pattern, the exact number of occurrences can be computed using only local information from the last layer of a corresponding GNN.
Abstract
Seminal research in the field of graph neural networks (GNNs) has revealed a direct correspondence between the expressive capabilities of GNNs and the $k$-dimensional Weisfeiler-Leman ($k$WL) test, a widely-recognized method for verifying graph isomorphism. This connection has reignited interest in comprehending the specific graph properties effectively distinguishable by the $k$WL test. A central focus of research in this field revolves around determining the least dimensionality $k$, for which $k$WL can discern graphs with different number of occurrences of a pattern graph $P$. We refer to such a least $k$ as the WL-dimension of this pattern counting problem. This inquiry traditionally delves into two distinct counting problems related to patterns: subgraph counting and induced subgraph counting. Intriguingly, despite their initial appearance as separate challenges with seemingly divergent approaches, both of these problems are interconnected components of a more comprehensive problem: "graph motif parameters". In this paper, we provide a precise characterization of the WL-dimension of labeled graph motif parameters. As specific instances of this result, we obtain characterizations of the WL-dimension of the subgraph counting and induced subgraph counting problem for every labeled pattern $P$. We additionally demonstrate that in cases where the $k$WL test distinguishes between graphs with varying occurrences of a pattern $P$, the exact number of occurrences of $P$ can be computed uniformly using only local information of the last layer of a corresponding GNN. We finally delve into the challenge of recognizing the WL-dimension of various graph parameters. We give a polynomial time algorithm for determining the WL-dimension of the subgraph counting problem for given pattern $P$, answering an open question from previous work.
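As a concrete reference point, the sketch below implements 1-dimensional Weisfeiler-Leman colour refinement and applies it to a 6-cycle versus two disjoint triangles, a classic pair that 1-WL cannot distinguish even though the two graphs contain different numbers of triangle subgraphs; the example is illustrative only.

```python
# Sketch of 1-dimensional Weisfeiler-Leman (colour refinement): repeatedly replace
# each vertex colour by a hash of its own colour and the multiset of neighbour
# colours, then compare colour histograms. Different histograms certify
# non-isomorphism; equal histograms are inconclusive.
from collections import Counter

def wl_colours(adj, rounds=3):
    n = len(adj)
    colours = [0] * n
    for _ in range(rounds):
        signatures = [(colours[v], tuple(sorted(colours[u] for u in adj[v]))) for v in range(n)]
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colours = [relabel[sig] for sig in signatures]
    return Counter(colours)

# Two 6-vertex 2-regular graphs: a single 6-cycle vs. two disjoint triangles.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print("1-WL histograms equal:", wl_colours(cycle6) == wl_colours(two_triangles))
# Prints True: 1-WL cannot separate the pair, yet they contain 0 vs. 2 triangles,
# so counting triangle occurrences needs more than the 1-dimensional test.
```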
摘要
图神经网络(GNNs)领域的开创性研究表明,GNNs的表达能力与$k$维Weisfeiler-Leman($k$WL)测试之间存在直接的对应关系。这一联系重新激发了人们对$k$WL测试究竟能区分哪些图性质的兴趣。该领域的一个核心问题是确定最小的维度$k$,使得$k$WL能够区分模式图$P$出现次数不同的图;我们将这一最小的$k$称为该模式计数问题的WL维度。这一问题传统上涉及两类计数问题:子图计数和诱导子图计数。尽管二者乍看是各自独立的挑战,但它们实际上都是更一般的"图模体参数"(graph motif parameters)问题的组成部分。在本文中,我们给出了带标签图模体参数WL维度的精确刻画,并由此得到每个带标签模式$P$的子图计数与诱导子图计数问题的WL维度刻画。此外,我们还证明:当$k$WL测试能够区分模式$P$出现次数不同的图时,可以仅利用相应GNN最后一层的局部信息统一地计算$P$的确切出现次数。最后,我们研究了识别各类图参数WL维度的难度,并给出了对给定模式$P$确定子图计数问题WL维度的多项式时间算法,回答了先前工作中的一个开放问题。
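As background, the sketch below implements the classical 1-dimensional WL test (color refinement), the simplest member of the $k$WL hierarchy discussed above; the higher-dimensional tests refine colors of $k$-tuples of vertices analogously.

```python
# 1WL color refinement on adjacency-list graphs; two graphs are 1WL-distinguishable
# when their stabilized color histograms differ after joint refinement.
from collections import Counter

def color_refinement(adj):
    """Iterate 1WL color refinement until the vertex partition stabilizes."""
    colors = [0] * len(adj)
    while True:
        sig = [(colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in range(len(adj))]
        relabel = {s: i for i, s in enumerate(sorted(set(sig)))}
        new = [relabel[s] for s in sig]
        if new == colors:
            return colors
        colors = new

def wl_distinguishes(adj1, adj2):
    # refine on the disjoint union so color names are directly comparable
    offset = len(adj1)
    union = [list(nbrs) for nbrs in adj1] + [[u + offset for u in nbrs] for nbrs in adj2]
    colors = color_refinement(union)
    return Counter(colors[:offset]) != Counter(colors[offset:])

tri = [[1, 2], [0, 2], [0, 1]]      # triangle: contains the pattern K3 once
path = [[1], [0, 2], [1]]           # 3-vertex path: contains it zero times
print(wl_distinguishes(tri, path))  # True -- 1WL separates these two graphs
```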
Efficient Agnostic Learning with Average Smoothness
results: 我们完全填补了这些空白,给出了无分布假设的一致收敛界,并给出与之匹配的计算高效的不可知学习算法。我们的结果适用于任意完全有界的度量空间,表明可实现学习的保证可以迁移到不可知情形。Abstract
We study distribution-free nonparametric regression following a notion of average smoothness initiated by Ashlagi et al. (2021), which measures the "effective" smoothness of a function with respect to an arbitrary unknown underlying distribution. While the recent work of Hanneke et al. (2023) established tight uniform convergence bounds for average-smooth functions in the realizable case and provided a computationally efficient realizable learning algorithm, both of these results currently lack analogs in the general agnostic (i.e. noisy) case. In this work, we fully close these gaps. First, we provide a distribution-free uniform convergence bound for average-smoothness classes in the agnostic setting. Second, we match the derived sample complexity with a computationally efficient agnostic learning algorithm. Our results, which are stated in terms of the intrinsic geometry of the data and hold over any totally bounded metric space, show that the guarantees recently obtained for realizable learning of average-smooth functions transfer to the agnostic setting. At the heart of our proof, we establish the uniform convergence rate of a function class in terms of its bracketing entropy, which may be of independent interest.
摘要
我们研究无分布假设的非参数回归,沿用Ashlagi等(2021)提出的平均平滑性概念,该概念衡量函数相对于任意未知底层分布的"有效"平滑程度。Hanneke等(2023)的近期工作已在可实现情形下给出了平均平滑函数类的紧致一致收敛界,并提供了计算高效的可实现学习算法,但这两个结果目前在一般的不可知(即含噪)情形下尚无对应结论。在本工作中,我们完全填补了这些空白。首先,我们给出了不可知情形下平均平滑函数类的无分布假设一致收敛界;其次,我们给出了与该样本复杂度相匹配的计算高效的不可知学习算法。我们的结果以数据的内在几何表述,并适用于任意完全有界的度量空间,表明近期针对平均平滑函数可实现学习所得到的保证可以迁移到不可知情形。在证明的核心部分,我们以函数类的括号熵(bracketing entropy)刻画其一致收敛速率,这一结果可能具有独立的价值。
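For orientation, one common way to formalize average smoothness relative to a distribution $\mu$ on a metric space $(\mathcal{X}, \rho)$ — the exponent and normalization may differ from the paper's exact definition — is

$$\Lambda_f(x) \;=\; \sup_{x' \neq x} \frac{|f(x) - f(x')|}{\rho(x, x')^{\beta}}, \qquad \bar{\Lambda}_f(\mu) \;=\; \mathbb{E}_{x \sim \mu}\!\left[\Lambda_f(x)\right],$$

which replaces the worst-case Hölder seminorm $\sup_x \Lambda_f(x)$ by its average under $\mu$. The agnostic bound described above then controls uniform convergence over a class of the form $\{f : \bar{\Lambda}_f(\mu) \le \Lambda\}$ via its bracketing entropy.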
Feature Cognition Enhancement via Interaction-Aware Automated Transformation
for: This paper aims to address the challenges of representation learning in machine learning, specifically the issues of heavy reliance on manual feature engineering, lack of explainability, and inflexible feature space reconstruction.
methods: The proposed approach is based on interaction-aware reinforcement generation, which involves creating meaningful features and controlling feature set size through selection. The authors use a hierarchical reinforcement learning structure with cascading Markov Decision Processes to automate feature and operation selection, as well as feature crossing.
results: The authors conduct extensive experiments to validate their proposed approach, demonstrating the effectiveness of their method in generating intelligible and efficient feature spaces that emulate human decision-making.Abstract
Creating an effective representation space is crucial for mitigating the curse of dimensionality, enhancing model generalization, addressing data sparsity, and leveraging classical models more effectively. Recent advancements in automated feature engineering (AutoFE) have made significant progress in addressing various challenges associated with representation learning, issues such as heavy reliance on intensive labor and empirical experiences, lack of explainable explicitness, and inflexible feature space reconstruction embedded into downstream tasks. However, these approaches are constrained by: 1) generation of potentially unintelligible and illogical reconstructed feature spaces, stemming from the neglect of expert-level cognitive processes; 2) lack of systematic exploration, which subsequently results in slower model convergence for identification of optimal feature space. To address these, we introduce an interaction-aware reinforced generation perspective. We redefine feature space reconstruction as a nested process of creating meaningful features and controlling feature set size through selection. We develop a hierarchical reinforcement learning structure with cascading Markov Decision Processes to automate feature and operation selection, as well as feature crossing. By incorporating statistical measures, we reward agents based on the interaction strength between selected features, resulting in intelligent and efficient exploration of the feature space that emulates human decision-making. Extensive experiments are conducted to validate our proposed approach.
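To make the reward idea concrete, here is a much-simplified, greedy stand-in for the cascaded (feature, operation, feature) decisions, scoring a candidate crossing by how much it adds over the better of its parent features; the actual method uses hierarchical reinforcement-learning agents and richer interaction statistics.

```python
# Greedy stand-in for interaction-aware feature crossing: reward a generated feature
# by its relevance minus the best relevance of its parents.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=500)     # hidden pairwise interaction

ops = {"add": np.add, "mul": np.multiply, "sub": np.subtract}

def relevance(feat, target):
    if np.std(feat) < 1e-12:            # guard against constant features
        return 0.0
    return abs(np.corrcoef(feat, target)[0, 1])

best = None
for i in range(X.shape[1]):
    for name, op in ops.items():
        for j in range(X.shape[1]):
            crossed = op(X[:, i], X[:, j])
            reward = relevance(crossed, y) - max(relevance(X[:, i], y), relevance(X[:, j], y))
            if best is None or reward > best[0]:
                best = (round(reward, 3), i, name, j)

print("best crossing:", best)   # expected: multiplying features 0 and 1
```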
Deep Representation Learning for Prediction of Temporal Event Sets in the Continuous Time Domain
results: 与现有方法相比,所提方法在多个数据集上进行了广泛实验,在预测精度和计算效率上均取得了更好的表现。Abstract
Temporal Point Processes (TPP) play an important role in predicting or forecasting events. Although these problems have been studied extensively, predicting multiple simultaneously occurring events can be challenging. For instance, more often than not, a patient gets admitted to a hospital with multiple conditions at a time. Similarly people buy more than one stock and multiple news breaks out at the same time. Moreover, these events do not occur at discrete time intervals, and forecasting event sets in the continuous time domain remains an open problem. Naive approaches for extending the existing TPP models for solving this problem lead to dealing with an exponentially large number of events or ignoring set dependencies among events. In this work, we propose a scalable and efficient approach based on TPPs to solve this problem. Our proposed approach incorporates contextual event embeddings, temporal information, and domain features to model the temporal event sets. We demonstrate the effectiveness of our approach through extensive experiments on multiple datasets, showing that our model outperforms existing methods in terms of prediction metrics and computational efficiency. To the best of our knowledge, this is the first work that solves the problem of predicting event set intensities in the continuous time domain by using TPPs.
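A toy sketch of the modeling idea follows: a recurrent encoder over past event sets feeds a per-type softplus intensity head, so several event types can fire simultaneously in continuous time. The paper's architecture (contextual event embeddings, domain features, its training objective) is richer than this.

```python
# Toy continuous-time event-set intensity model.
import torch
import torch.nn as nn

class EventSetIntensity(nn.Module):
    def __init__(self, n_types, hidden=32):
        super().__init__()
        self.embed = nn.Linear(n_types + 1, hidden)    # multi-hot event set + time gap
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_types)

    def forward(self, event_sets, gaps):
        # event_sets: (B, T, n_types) multi-hot; gaps: (B, T, 1) inter-event times
        x = self.embed(torch.cat([event_sets, gaps], dim=-1))
        h, _ = self.rnn(x)
        return nn.functional.softplus(self.head(h))    # (B, T, n_types) intensities

model = EventSetIntensity(n_types=10)
lam = model(torch.rand(4, 6, 10).round(), torch.rand(4, 6, 1))
print(lam.shape)   # torch.Size([4, 6, 10])
```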
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
results: 在离线RL中,生成模型作为策略展现了对多模态数据的表达能力;在离线到在线RL中,一致性策略比扩散策略计算更快,性能相当;在在线RL中,一致性策略表现出显著加速,平均性能甚至高于扩散策略。Abstract
Score-based generative models like the diffusion model have been testified to be effective in modeling multi-modal data from image generation to reinforcement learning (RL). However, the inference process of diffusion model can be slow, which hinders its usage in RL with iterative sampling. We propose to apply the consistency model as an efficient yet expressive policy representation, namely consistency policy, with an actor-critic style algorithm for three typical RL settings: offline, offline-to-online and online. For offline RL, we demonstrate the expressiveness of generative models as policies from multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computational efficient than diffusion policy, with a comparable performance. For online RL, the consistency policy demonstrates significant speedup and even higher average performances than the diffusion policy.
摘要
基于评分的生成模型(如扩散模型)已被证明在从图像生成到强化学习(RL)的多模态数据建模中十分有效。然而,扩散模型的推理过程可能较慢,这阻碍了其在需要迭代采样的强化学习中的应用。我们提出将一致性模型作为一种高效且表达力强的策略表示,即一致性策略,并配合actor-critic风格的算法应用于三种典型的强化学习设置:离线、离线到在线和在线。在离线RL中,我们展示了生成模型作为策略对多模态数据的表达能力;在离线到在线RL中,一致性策略比扩散策略计算效率更高,性能相当;在在线RL中,一致性策略表现出显著的加速,平均性能甚至高于扩散策略。
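The sketch below shows why a consistency policy is cheap at inference time: it maps pure noise to an action in a single forward pass, where a diffusion policy would need many denoising steps. Training (consistency distillation/training and the actor-critic coupling) is omitted, and the network sizes and noise scale are illustrative.

```python
# One-step action sampling with a consistency-style policy network.
import torch
import torch.nn as nn

class ConsistencyPolicy(nn.Module):
    def __init__(self, state_dim, act_dim, hidden=256, sigma_max=80.0):
        super().__init__()
        self.sigma_max = sigma_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, state, noisy_action, sigma):
        return self.net(torch.cat([state, noisy_action, sigma], dim=-1))

    @torch.no_grad()
    def act(self, state):
        noise = self.sigma_max * torch.randn(state.shape[0], self.net[-1].out_features)
        sigma = torch.full((state.shape[0], 1), self.sigma_max)
        return self(state, noise, sigma)           # one forward pass per action

policy = ConsistencyPolicy(state_dim=17, act_dim=6)
print(policy.act(torch.randn(2, 17)).shape)        # torch.Size([2, 6])
```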
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
results: 本文对GAUDI进行了完整的性能比较,揭示了MME和TPC的相对优劣点。此外,本文还提出了优化MME和TPC使用的策略,并评估了Transformer模型在GAUDI上的性能。Abstract
Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements. The quadratic complexity of the self-attention mechanism further exacerbates these challenges when dealing with long sequences and large datasets. Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues. GAUDI features a Matrix Multiplication Engine (MME) and a cluster of fully programmable Tensor Processing Cores (TPC). This paper explores the untapped potential of using GAUDI processors to accelerate Transformer-based models, addressing key challenges in the process. Firstly, we provide a comprehensive performance comparison between the MME and TPC components, illuminating their relative strengths and weaknesses. Secondly, we explore strategies to optimize MME and TPC utilization, offering practical insights to enhance computational efficiency. Thirdly, we evaluate the performance of Transformers on GAUDI, particularly in handling long sequences and uncovering performance bottlenecks. Lastly, we evaluate the end-to-end performance of two Transformer-based large language models (LLM) on GAUDI. The contributions of this work encompass practical insights for practitioners and researchers alike. We delve into GAUDI's capabilities for Transformers through systematic profiling, analysis, and optimization exploration. Our study bridges a research gap and offers a roadmap for optimizing Transformer-based model training on the GAUDI architecture.
摘要
Transformer模型已在各类机器学习任务中取得了显著成功,但其计算复杂度和资源需求较高。自注意力机制的二次复杂度在处理长序列和大规模数据集时进一步加剧了这些挑战。专用的AI硬件加速器,如Habana GAUDI架构,为解决这些问题提供了有前景的方案。GAUDI架构包括一个矩阵乘法引擎(MME)和一组完全可编程的张量处理核心(TPC)。本文探讨了利用GAUDI处理器加速基于Transformer的模型这一尚未充分开发的潜力,并解决其中的关键挑战。首先,我们对MME和TPC组件进行了全面的性能比较,揭示它们各自的优势与劣势;其次,我们探讨了优化MME和TPC利用率的策略,为提升计算效率提供实用指导;第三,我们评估了Transformer在GAUDI上的性能,特别是处理长序列的能力,并找出性能瓶颈;最后,我们评估了两个基于Transformer的大语言模型(LLM)在GAUDI上的端到端性能。本文通过系统的性能剖析、分析与优化探索,深入研究GAUDI对Transformer的支持能力,为实践者和研究者提供实用见解,填补了相关研究空白,并为在GAUDI架构上优化Transformer模型训练提供了路线图。
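A generic micro-benchmark harness of the kind used for such profiling is sketched below, timing a batched matmul (MME-style work) against a softmax-attention block (heavier on element-wise TPC-style work) across sequence lengths. It deliberately omits Gaudi-specific device handling, which would go through the vendor's PyTorch bridge; the shapes and iteration counts are illustrative.

```python
# Generic operator timing harness (CPU by default; swap in your accelerator device).
import time
import torch

def bench(fn, warmup=3, iters=10):
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

for seq in (256, 1024, 4096):
    q = torch.randn(8, seq, 64)
    k = torch.randn(8, seq, 64)
    v = torch.randn(8, seq, 64)
    t_mm = bench(lambda: q @ k.transpose(1, 2))
    t_attn = bench(lambda: torch.softmax((q @ k.transpose(1, 2)) / 8.0, dim=-1) @ v)
    print(f"seq={seq:5d}  matmul {t_mm*1e3:7.2f} ms  attention {t_attn*1e3:7.2f} ms")
```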
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
results: RO2O在线上适应中实现了稳定的学习进程,并在限制的在线交互情况下达到了显著的改进。Abstract
To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online (O2O) RL provides a paradigm for improving an offline trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates Q-ensemble for uncertainty penalty and adversarial samples for policy and value smoothness, which enable RO2O to maintain a consistent learning procedure in online adaptation without requiring special changes to the learning objective. Theoretical analyses in linear MDPs demonstrate that the uncertainty and smoothness lead to a tighter optimality bound in O2O against distribution shift. Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions.
摘要
在强化学习(RL)中,为了以更少的交互获得接近最优的策略,一种有前景的思路是将利用离线数据集提升样本效率的离线RL与通过与环境交互探索有信息量转移的在线RL相结合。离线到在线(O2O)RL提供了在有限在线交互内改进离线训练智能体的范式。然而,由于在线经验与离线数据之间存在显著的分布偏移,大多数离线RL算法在O2O适应中会出现性能下降,难以实现稳定的策略改进。为解决这一问题,我们提出鲁棒离线到在线(RO2O)算法,通过不确定性与平滑性来增强离线策略,并缓解在线适应中的性能下降。具体而言,RO2O利用Q集成(Q-ensemble)施加不确定性惩罚,并利用对抗样本实现策略与价值函数的平滑,使RO2O能够在在线适应中保持一致的学习过程,而无需对学习目标做特殊修改。在线性MDP中的理论分析表明,不确定性与平滑性使O2O在分布偏移下具有更紧的最优性界。实验结果表明,RO2O能够促进稳定的离线到在线学习,并在有限的在线交互下取得显著改进。
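A minimal sketch of the two ingredients, assuming a standard actor-critic setup: an ensemble lower bound on Q as the uncertainty penalty, and a smoothness term computed here with random perturbations for brevity (the paper uses adversarial samples); loss weights and network sizes are illustrative.

```python
# Q-ensemble pessimism plus a policy-smoothness regularizer.
import torch
import torch.nn as nn

def pessimistic_q(q_ensemble, state, action, beta=1.0):
    qs = torch.stack([q(torch.cat([state, action], -1)) for q in q_ensemble])  # (E, B, 1)
    return qs.mean(0) - beta * qs.std(0)           # penalize ensemble disagreement

def smoothness_loss(policy, state, eps=1e-3):
    noise = eps * torch.randn_like(state)          # random (not adversarial) perturbation
    return ((policy(state) - policy(state + noise)) ** 2).mean()

q_ensemble = nn.ModuleList([nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
                            for _ in range(5)])
policy = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4), nn.Tanh())

s, a = torch.randn(32, 6), torch.randn(32, 4)
actor_loss = -pessimistic_q(q_ensemble, s, a).mean() + 10.0 * smoothness_loss(policy, s)
print(actor_loss.item())
```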
Multi-Resolution Active Learning of Fourier Neural Operators
for: 提高FNO的训练和预测效率,降低数据成本。
methods: 动态选择输入函数和分辨率;使用集成蒙特卡洛实现有效的后验推断算法;利用矩匹配和矩阵行列式引理实现可计算且高效的效用计算。
results: 在多个基准算子学习任务中表现优异,并能避免因过度惩罚高分辨率查询而使主动学习过早停滞在低分辨率查询上的问题。Abstract
Fourier Neural Operator (FNO) is a popular operator learning framework, which not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO is a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.
摘要
傅里叶神经算子(FNO)是一种流行的算子学习框架,不仅在许多任务中达到最先进的性能,而且在训练和预测上都非常高效。然而,在实践中为FNO收集训练数据往往代价高昂,因为通常需要昂贵的物理仿真。为解决这一问题,我们提出FNO的多分辨率主动学习方法(MRA-FNO),它能够动态选择输入函数与分辨率,在尽量降低数据成本的同时优化学习效率。具体地,我们提出一种概率多分辨率FNO,并使用集成蒙特卡洛方法构建有效的后验推断算法。为进行主动学习,我们在每一步最大化效用-成本比作为采集函数,以获取新的样本和分辨率。我们利用矩匹配与矩阵行列式引理实现可计算且高效的效用计算。此外,我们提出成本退火框架,以避免在早期阶段对高分辨率查询的过度惩罚:当不同分辨率之间的成本差异较大时,这种过度惩罚会使主动学习长期停留在低分辨率查询上,导致性能不佳。我们的方法克服了这一问题,并适用于一般的多保真度主动学习与优化问题。我们在多个基准算子学习任务上展示了该方法的优势。
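The acquisition rule can be sketched as follows, with ensemble disagreement standing in for the paper's moment-matched posterior utility and an annealed exponent on the cost implementing the cost-annealing idea; the candidate resolutions and their costs are assumed values.

```python
# Utility/cost acquisition over (input, resolution) pairs with cost annealing.
import numpy as np

rng = np.random.default_rng(0)
costs = {64: 1.0, 128: 4.0, 256: 16.0}        # per-resolution simulation cost (assumed)

def utility(candidate, resolution, ensemble_preds):
    return ensemble_preds[(candidate, resolution)].var()   # disagreement as utility proxy

def acquire(candidates, ensemble_preds, cost_weight=1.0):
    # cost_weight < 1 early in training so high resolutions are not over-penalized
    scored = [(utility(c, r, ensemble_preds) / (costs[r] ** cost_weight), c, r)
              for c in candidates for r in costs]
    return max(scored)[1:]

candidates = range(5)
ensemble_preds = {(c, r): rng.normal(size=8) for c in candidates for r in costs}
print(acquire(candidates, ensemble_preds, cost_weight=0.3))
```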
Controlling Continuous Relaxation for Combinatorial Optimization
results: 实验表明,在稠密图上的 CO 问题上,本文提出的方法可以获得更好的结果,而且在相对稀疏图上的 CO 问题上也表现出色。此外,计算时间呈线性增长,与PI-GNN算法一样。Abstract
Recent advancements in combinatorial optimization (CO) problems emphasize the potential of graph neural networks (GNNs). The physics-inspired GNN (PI-GNN) solver, which finds approximate solutions through unsupervised learning, has attracted significant attention for large-scale CO problems. Nevertheless, there has been limited discussion on the performance of the PI-GNN solver for CO problems on relatively dense graphs where the performance of greedy algorithms worsens. In addition, since the PI-GNN solver employs a relaxation strategy, an artificial transformation from the continuous space back to the original discrete space is necessary after learning, potentially undermining the robustness of the solutions. This paper numerically demonstrates that the PI-GNN solver can be trapped in a local solution, where all variables are zero, in the early stage of learning for CO problems on the dense graphs. Then, we address these problems by controlling the continuity and discreteness of relaxed variables while avoiding the local solution: (i) introducing a new penalty term that controls the continuity and discreteness of the relaxed variables and eliminates the local solution; (ii) proposing a new continuous relaxation annealing (CRA) strategy. This new annealing first prioritizes continuous solutions and intensifies exploration by leveraging the continuity while avoiding the local solution and then schedules the penalty term for prioritizing a discrete solution until the relaxed variables are almost discrete values, which eliminates the need for an artificial transformation from the continuous to the original discrete space. Empirically, better results are obtained for CO problems on the dense graphs, where the PI-GNN solver struggles to find reasonable solutions, and for those on relatively sparse graphs. Furthermore, the computational time scaling is identical to that of the PI-GNN solver.
摘要
组合优化(CO)问题的近期进展凸显了图神经网络(GNN)的潜力。通过无监督学习寻找近似解的物理启发式GNN(PI-GNN)求解器在大规模CO问题上受到显著关注。然而,对于贪心算法表现变差的相对稠密图上的CO问题,PI-GNN求解器的性能讨论仍然有限。此外,由于PI-GNN求解器采用松弛策略,学习结束后需要将松弛变量从连续空间人工转换回原始的离散空间,这可能损害解的稳健性。本文通过数值实验表明,在稠密图上的CO问题中,PI-GNN求解器可能在学习早期陷入所有变量均为零的局部解。随后,我们通过控制松弛变量的连续性与离散性并避开该局部解来解决这些问题:(i)引入一个新的罚项,用于控制松弛变量的连续性与离散性并消除该局部解;(ii)提出一种新的连续松弛退火(CRA)策略。该退火策略首先偏向连续解,利用连续性加强探索并避开局部解,随后逐步调整罚项以偏向离散解,直至松弛变量几乎取离散值,从而无需再做从连续空间到原始离散空间的人工转换。实验表明,该方法在PI-GNN求解器难以找到合理解的稠密图CO问题以及相对稀疏图上的CO问题上都取得了更好的结果,且计算时间的扩展规律与PI-GNN求解器相同。
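A sketch of the kind of penalty and schedule described: an extra term whose coefficient is swept from negative (favoring interior, continuous values) to positive (favoring near-binary values); the exact functional form and exponent used in the paper may differ.

```python
# Annealed penalty controlling continuity vs. discreteness of relaxed variables p in [0, 1].
import torch

def cra_penalty(p, gamma, alpha=2):
    # negative gamma rewards p near 0.5; positive gamma pushes p toward {0, 1}
    return gamma * torch.sum(1.0 - (2.0 * p - 1.0) ** alpha)

def gamma_schedule(step, total, gamma_start=-2.0, gamma_end=2.0):
    return gamma_start + (gamma_end - gamma_start) * step / total

p = torch.sigmoid(torch.randn(10, requires_grad=True))
for step in (0, 500, 1000):
    print(step, cra_penalty(p, gamma_schedule(step, 1000)).item())
```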
Leveraging Optimization for Adaptive Attacks on Image Watermarks
paper_authors: Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum
for: The paper is written to address the issue of untrustworthy users misusing image generators to create high-quality deepfakes and engage in online spam or disinformation campaigns.
methods: The paper proposes a method of watermarking to deter misuse by marking generated content with a hidden message, and uses an adaptive attack to evaluate the robustness of the watermarking algorithm.
results: The paper demonstrates that an adaptive attack can break all five surveyed watermarking methods at negligible degradation in image quality, emphasizing the need for more rigorous robustness testing against adaptive, learnable attackers.
results: 论文表明,适应性攻击可以破坏所有调查的五种水印方法,而且这些攻击不会导致图像质量明显下降,从而强调了对适应性攻击的更加严格的安全性测试。Abstract
Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in online spam or disinformation campaigns. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. A challenge when evaluating watermarking algorithms and their (adaptive) attacks is to determine whether an adaptive attack is optimal, i.e., it is the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack's parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at negligible degradation in image quality. These findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
摘要
不可靠用户可能会滥用图像生成器生成高质量的深伪图并进行在线垃圾或歪曲信息运动。水印可以防止这种滥用行为,通过将生成内容中加入隐藏的信息,并使用机密水印键来检测。水印的核心安全特性是 robustness,即攻击者只能通过重大干扰图像质量来逃脱检测。评估robustness需要设计适应攻击的特定水印算法。一个挑战是在评估水印算法和其适应攻击时,确定攻击是最佳的。我们解决这个问题,通过定义一个目标函数,然后将适应攻击看作优化问题。我们的适应攻击的核心思想是在本地复制秘密水印键,通过创建可微分的代理键来优化攻击参数。我们示例中,对于稳定扩散模型,攻击者可以破坏所评估的五种水印方法,而且这些攻击对图像质量的影响非常小。这些发现强调了对适应、学习型攻击的更加严格的Robustness测试。
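The attack pattern can be sketched as a projected-gradient optimization against a locally trained, differentiable surrogate decoder; the decoder below is a placeholder, and the fixed perturbation budget is an assumed stand-in for an image-quality constraint.

```python
# Optimization-based evasion against a surrogate watermark decoder (placeholder model).
import torch
import torch.nn as nn

surrogate_decoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 48))  # placeholder

def evade(image, message, steps=50, lr=0.01, budget=0.03):
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = surrogate_decoder(image + delta)
        # maximize decoding error of the embedded message
        loss = -nn.functional.binary_cross_entropy_with_logits(logits, message)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)          # keep quality degradation small
    return (image + delta).clamp(0, 1).detach()

img = torch.rand(1, 3, 64, 64)
msg = torch.randint(0, 2, (1, 48)).float()
print(evade(img, msg).shape)
```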
Beyond Tides and Time: Machine Learning Triumph in Water Quality
results: LightGBM模型表现最佳,实现最高的平均准确率;树型模型在解决回归问题方面表现出色,而MLP神经网络对特征缩放较为敏感。Abstract
Water resources are essential for sustaining human livelihoods and environmental well being. Accurate water quality prediction plays a pivotal role in effective resource management and pollution mitigation. In this study, we assess the effectiveness of five distinct predictive models linear regression, Random Forest, XGBoost, LightGBM, and MLP neural network, in forecasting pH values within the geographical context of Georgia, USA. Notably, LightGBM emerges as the top performing model, achieving the highest average precision. Our analysis underscores the supremacy of tree-based models in addressing regression challenges, while revealing the sensitivity of MLP neural networks to feature scaling. Intriguingly, our findings shed light on a counterintuitive discovery: machine learning models, which do not explicitly account for time dependencies and spatial considerations, outperform spatial temporal models. This unexpected superiority of machine learning models challenges conventional assumptions and highlights their potential for practical applications in water quality prediction. Our research aims to establish a robust predictive pipeline accessible to both data science experts and those without domain specific knowledge. In essence, we present a novel perspective on achieving high prediction accuracy and interpretability in data science methodologies. Through this study, we redefine the boundaries of water quality forecasting, emphasizing the significance of data driven approaches over traditional spatial temporal models. Our findings offer valuable insights into the evolving landscape of water resource management and environmental protection.
摘要
水资源对维持人类生计和环境健康至关重要。准确的水质预测在有效的资源管理和污染治理中发挥着关键作用。在本研究中,我们评估了五种不同的预测模型——线性回归、随机森林、XGBoost、LightGBM和MLP神经网络——在美国佐治亚州(Georgia)预测pH值的效果。其中LightGBM表现最佳,取得了最高的平均精度。我们的分析突显了树模型在应对回归问题上的优势,同时揭示了MLP神经网络对特征缩放的敏感性。有趣的是,我们的结果带来一个反直觉的发现:并未显式考虑时间依赖与空间因素的机器学习模型,表现反而优于时空模型。这一出人意料的优势挑战了传统假设,并凸显了机器学习模型在水质预测实际应用中的潜力。我们的研究旨在建立一个数据科学专家和不具备领域知识的人员都能使用的稳健预测流程。总之,我们为在数据科学方法中同时取得高预测精度和可解释性提供了新的视角。通过这项研究,我们重新定义了水质预测的边界,强调数据驱动方法相对于传统时空模型的重要性。我们的发现为水资源管理和环境保护领域的发展提供了有价值的见解。
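A minimal sketch of the best-performing setup reported (gradient-boosted trees regressing pH); the covariates here are synthetic placeholders, not the study's actual water-quality features or data.

```python
# LightGBM pH regression with cross-validation on synthetic placeholder data.
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(20, 5, 1000),
    "dissolved_oxygen": rng.normal(8, 2, 1000),
    "conductivity": rng.normal(300, 50, 1000),
})
df["pH"] = 7.0 + 0.02 * df["temperature"] - 0.05 * df["dissolved_oxygen"] \
           + rng.normal(0, 0.1, 1000)

model = LGBMRegressor(n_estimators=300, learning_rate=0.05)
scores = cross_val_score(model, df.drop(columns="pH"), df["pH"], cv=5, scoring="r2")
print("mean R^2:", scores.mean())
```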
results: 实验结果显示,NeuIM能够高精度、高效率地进行电磁暂态仿真,并且在没有数据的情况下仍能准确预测感应电机的动态行为。Abstract
This rapid communication devises a Neural Induction Machine (NeuIM) model, which pilots the use of physics-informed machine learning to enable AI-based electromagnetic transient simulations. The contributions are threefold: (1) a formation of NeuIM to represent the induction machine in phase domain; (2) a physics-informed neural network capable of capturing fast and slow IM dynamics even in the absence of data; and (3) a data-physics-integrated hybrid NeuIM approach which is adaptive to various levels of data availability. Extensive case studies validate the efficacy of NeuIM and in particular, its advantage over purely data-driven approaches.
摘要
这篇快报提出了一种神经感应电机(NeuIM)模型,率先利用物理信息机器学习来实现基于AI的电磁暂态仿真。其贡献有三方面:(1)构建NeuIM,在相域中表示感应电机;(2)提出一种即使在缺乏数据的情况下也能捕捉感应电机快、慢动态的物理信息神经网络;(3)提出一种数据-物理融合的混合NeuIM方法,能够适应不同的数据可用程度。大量案例研究验证了NeuIM的有效性,特别是其相对于纯数据驱动方法的优势。
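Schematically, the data-physics-integrated idea can be sketched as a loss mixing a data term with the residual of the machine's phase-domain dynamics evaluated on the network's prediction; `im_residual` below is a placeholder, not the actual induction-machine equations, and the network shape is illustrative.

```python
# Generic physics-informed loss structure of the data-physics hybrid kind.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 3))  # inputs -> machine states

def im_residual(pred, inputs):
    # placeholder for the phase-domain induction-machine equations (e.g., d(psi)/dt = v - R i)
    return pred.sum(dim=-1, keepdim=True) - inputs[:, :1]

def neuim_loss(inputs, targets, lam=1.0):
    pred = net(inputs)
    data_term = nn.functional.mse_loss(pred, targets) if targets is not None else 0.0
    physics_term = im_residual(pred, inputs).pow(2).mean()
    return data_term + lam * physics_term          # physics term applies even with no data

x = torch.randn(16, 4)
print(neuim_loss(x, targets=None).item())
```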
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks
for: 为基于图神经网络(GNNs)的布尔可满足性问题(SAT)求解器提供了一个综合评估框架。
methods: 使用了多种GNN模型,包括不同的预测任务、训练目标和推理算法,并对其进行了比较。
results: 结果显示,GNN模型能够有效地学习一种类似于贪心局部搜索的求解策略,但难以在隐空间中学习回溯搜索。Abstract
Graph neural networks (GNNs) have recently emerged as a promising approach for solving the Boolean Satisfiability Problem (SAT), offering potential alternatives to traditional backtracking or local search SAT solvers. However, despite the growing volume of literature in this field, there remains a notable absence of a unified dataset and a fair benchmark to evaluate and compare existing approaches. To address this crucial gap, we present G4SATBench, the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers. In G4SATBench, we meticulously curate a large and diverse set of SAT datasets comprising 7 problems with 3 difficulty levels and benchmark a broad range of GNN models across various prediction tasks, training objectives, and inference algorithms. To explore the learning abilities and comprehend the strengths and limitations of GNN-based SAT solvers, we also compare their solving processes with the heuristics in search-based SAT solvers. Our empirical results provide valuable insights into the performance of GNN-based SAT solvers and further suggest that existing GNN models can effectively learn a solving strategy akin to greedy local search but struggle to learn backtracking search in the latent space.
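For context, GNN SAT solvers in such benchmarks typically operate on a graph built from the CNF formula; a minimal literal-clause graph construction is sketched below (the benchmark's exact graph encodings may differ).

```python
# Build a literal-clause graph (LCG): one node per literal, one per clause,
# membership edges, plus edges tying each literal to its negation.
def cnf_to_lcg(n_vars, clauses):
    # literal i -> node i-1 (positive) or n_vars + |i| - 1 (negative); clause j -> 2*n_vars + j
    def lit_node(lit):
        return (lit - 1) if lit > 0 else (n_vars + (-lit) - 1)
    edges = []
    for j, clause in enumerate(clauses):
        for lit in clause:
            edges.append((lit_node(lit), 2 * n_vars + j))
    for v in range(n_vars):                        # connect x_v with its negation
        edges.append((v, n_vars + v))
    return edges

# (x1 or not x2) and (x2 or x3)
print(cnf_to_lcg(3, [[1, -2], [2, 3]]))
```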
Symmetry Leads to Structured Constraint of Learning
results: 论文表明,对称性会导致神经网络中的稀疏性、低秩性和同质集成等现象,并可用于设计以可微方式施加硬约束的算法。Abstract
Due to common architecture designs, symmetries exist extensively in contemporary neural networks. In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. We prove that every mirror symmetry of the loss function leads to a structured constraint, which becomes a favored solution when either the weight decay or gradient noise is large. As direct corollaries, we show that rescaling symmetry leads to sparsity, rotation symmetry leads to low rankness, and permutation symmetry leads to homogeneous ensembling. Then, we show that the theoretical framework can explain the loss of plasticity and various collapse phenomena in neural networks and suggest how symmetries can be used to design algorithms to enforce hard constraints in a differentiable way.
摘要
由于常见的架构设计,对称性在当代神经网络中广泛存在。在这项工作中,我们揭示了损失函数的对称性对机器学习模型学习行为的重要影响,甚至起到决定性作用。我们证明,损失函数的每一个镜像对称都会导致一种结构化约束,当权重衰减或梯度噪声较大时,该约束会成为受偏好的解。作为直接推论,我们证明了缩放对称导致稀疏性,旋转对称导致低秩性,置换对称导致同质集成。随后,我们说明该理论框架可以解释神经网络中的可塑性丧失以及各种坍缩现象,并提示如何利用对称性设计以可微方式施加硬约束的算法。
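A tiny numerical illustration of the rescaling-symmetry-implies-sparsity claim, assuming the simplest case of a single effective weight parameterized as a product $u\cdot v$ (the loss is invariant under $(u,v)\mapsto(cu, v/c)$): with enough weight decay, gradient descent collapses both factors to zero instead of retaining a small effective weight.

```python
# Rescaling symmetry + weight decay -> the zero (sparse) solution is favored.
import numpy as np

def run(weight_decay, steps=5000, lr=0.01):
    u, v = 1.5, 0.2
    for _ in range(steps):
        grad_eff = 2 * (u * v - 0.05)          # fit a tiny target effective weight 0.05
        gu = grad_eff * v + weight_decay * u
        gv = grad_eff * u + weight_decay * v
        u -= lr * gu
        v -= lr * gv
    return u * v

print(run(weight_decay=0.0))    # keeps the small effective weight (~0.05)
print(run(weight_decay=0.5))    # collapses toward 0: the symmetric, sparse solution
```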
Unlabeled Out-Of-Domain Data Improves Generalization
results: 使用本文的方法可以得到比ERM显著更优的泛化误差界。Abstract
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems, where scenarios involving the minimization of either i) adversarially robust or ii) non-robust loss functions have been considered. Notably, we allow the unlabeled samples to deviate slightly (in total variation sense) from the in-domain distribution. The core idea behind our framework is to combine Distributionally Robust Optimization (DRO) with self-supervised training. As a result, we also leverage efficient polynomial-time algorithms for the training stage. From a theoretical standpoint, we apply our framework on the classification problem of a mixture of two Gaussians in $\mathbb{R}^d$, where in addition to the $m$ independent and labeled samples from the true distribution, a set of $n$ (usually with $n\gg m$) out of domain and unlabeled samples are gievn as well. Using only the labeled data, it is known that the generalization error can be bounded by $\propto\left(d/m\right)^{1/2}$. However, using our method on both isotropic and non-isotropic Gaussian mixture models, one can derive a new set of analytically explicit and non-asymptotic bounds which show substantial improvement on the generalization error compared ERM. Our results underscore two significant insights: 1) out-of-domain samples, even when unlabeled, can be harnessed to narrow the generalization gap, provided that the true data distribution adheres to a form of the "cluster assumption", and 2) the semi-supervised learning paradigm can be regarded as a special case of our framework when there are no distributional shifts. We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
摘要
我们提出了一种将无标注数据纳入半监督分类问题的新框架,其中考虑了最小化(i)对抗鲁棒损失函数或(ii)非鲁棒损失函数两种情形。值得注意的是,我们允许无标注样本与域内分布存在轻微偏差(以总变差距离衡量)。该框架的核心思想是将分布鲁棒优化(DRO)与自监督训练相结合,因而在训练阶段还可以利用高效的多项式时间算法。从理论角度,我们将该框架应用于$\mathbb{R}^d$中两个高斯分布混合的分类问题:除了来自真实分布的$m$个独立带标签样本外,还给定$n$个(通常$n\gg m$)域外无标注样本。已知仅使用带标签数据时,泛化误差的界约为$\propto\left(d/m\right)^{1/2}$;而使用我们的方法,无论在各向同性还是非各向同性的高斯混合模型上,都可以推导出一组新的解析显式且非渐近的界,相比ERM显著改善了泛化误差。我们的结果强调两点洞见:1)只要真实数据分布满足某种"聚类假设",即使是无标注的域外样本也可以用来缩小泛化差距;2)半监督学习范式可以视为我们框架在不存在分布偏移时的特例。我们通过在多种合成和真实数据集上的实验验证了上述结论。
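Schematically — with the weighting $\lambda$ and the choice of self-supervised loss $\ell_{\mathrm{self}}$ being assumptions rather than the paper's exact formulation — the combined objective has the shape

$$\min_{\theta}\; \underbrace{\frac{1}{m}\sum_{i=1}^{m} \ell\big(f_\theta(x_i), y_i\big)}_{\text{labeled, in-domain}} \;+\; \lambda \,\underbrace{\sup_{Q:\, d_{\mathrm{TV}}(Q,\, \widehat{P}_n) \le \varepsilon}\; \mathbb{E}_{x \sim Q}\big[\ell_{\mathrm{self}}(f_\theta, x)\big]}_{\text{DRO term over the } n \text{ unlabeled out-of-domain samples}},$$

where $\widehat{P}_n$ is the empirical distribution of the unlabeled samples and $\varepsilon$ bounds their allowed total-variation deviation from the in-domain distribution.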