cs.LG - 2023-10-30

Efficient Subgraph GNNs by Learning Effective Selection Policies

  • paper_url: http://arxiv.org/abs/2310.20082
  • repo_url: None
  • paper_authors: Beatrice Bevilacqua, Moshe Eliasof, Eli Meirom, Bruno Ribeiro, Haggai Maron
  • for: This paper aims to learn to select a small subset of the large set of possible subgraphs in a data-driven fashion, improving the applicability of Subgraph GNNs.
  • methods: The paper proposes a new approach, called Policy-Learn, that learns how to select subgraphs in an iterative manner.
  • results: Experimental results show that Policy-Learn outperforms existing baselines across a wide range of datasets.
    Abstract Subgraph GNNs are provably expressive neural architectures that learn graph representations from sets of subgraphs. Unfortunately, their applicability is hampered by the computational complexity associated with performing message passing on many subgraphs. In this paper, we consider the problem of learning to select a small subset of the large set of possible subgraphs in a data-driven fashion. We first motivate the problem by proving that there are families of WL-indistinguishable graphs for which there exist efficient subgraph selection policies: small subsets of subgraphs that can already identify all the graphs within the family. We then propose a new approach, called Policy-Learn, that learns how to select subgraphs in an iterative manner. We prove that, unlike popular random policies and prior work addressing the same problem, our architecture is able to learn the efficient policies mentioned above. Our experimental results demonstrate that Policy-Learn outperforms existing baselines across a wide range of datasets.
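The selection policy can be pictured as a small scoring network applied iteratively over nodes. The sketch below is a hedged toy illustration of that loop; the node features, the scorer, and the way the selected subgraphs would be consumed downstream are assumptions for illustration, not the Policy-Learn architecture described in the paper.

```python
# Toy sketch of iterative subgraph selection: a small scorer looks at node
# features plus which nodes were already picked, and greedily selects the next
# node whose (node-marked) subgraph a Subgraph GNN would process. Illustrative
# only; not the Policy-Learn architecture from the paper.
import torch
import torch.nn as nn


def node_features(adj: torch.Tensor, selected: torch.Tensor) -> torch.Tensor:
    """Cheap per-node features: degree, mean neighbour degree, selection flag."""
    deg = adj.sum(dim=1, keepdim=True)
    neigh_deg = (adj @ deg) / deg.clamp(min=1)
    return torch.cat([deg, neigh_deg, selected.unsqueeze(1)], dim=1)


class SelectionPolicy(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 16):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, adj: torch.Tensor, budget: int):
        n = adj.shape[0]
        selected = torch.zeros(n)
        order = []
        for _ in range(budget):
            scores = self.scorer(node_features(adj, selected)).squeeze(-1)
            scores = scores.masked_fill(selected.bool(), float("-inf"))  # no repeats
            pick = int(scores.argmax())
            selected[pick] = 1.0
            order.append(pick)
        return order


if __name__ == "__main__":
    torch.manual_seed(0)
    n = 8
    adj = (torch.rand(n, n) < 0.3).float()
    adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)   # random undirected graph
    policy = SelectionPolicy()
    picks = policy(adj, budget=3)
    print("nodes whose subgraphs would be processed:", picks)
    # Training such a policy end-to-end would need a differentiable relaxation
    # of the argmax (e.g. Gumbel-softmax) or an RL-style objective; omitted here.
```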

Hybridizing Physics and Neural ODEs for Predicting Plasma Inductance Dynamics in Tokamak Fusion Reactors

  • paper_url: http://arxiv.org/abs/2310.20079
  • repo_url: None
  • paper_authors: Allen M. Wang, Darren T. Garnier, Cristina Rea
  • for: This work aims to improve plasma control in tokamak fusion reactors, a prerequisite for making them an economical energy source.
  • methods: The study applies the neural ordinary differential equations (ODE) framework to predict coupled plasma current and internal inductance dynamics, combining physics-based equations with the neural network model as inductive biases.
  • results: On data from the Alcator C-Mod fusion reactor, the hybrid model combining physics-based equations with a neural ODE outperforms both existing physics-motivated ODEs and a pure neural ODE model.
    Abstract While fusion reactors known as tokamaks hold promise as a firm energy source, advances in plasma control, and handling of events where control of plasmas is lost, are needed for them to be economical. A significant bottleneck towards applying more advanced control algorithms is the need for better plasma simulation, where both physics-based and data-driven approaches currently fall short. The former is bottle-necked by both computational cost and the difficulty of modelling plasmas, and the latter is bottle-necked by the relative paucity of data. To address this issue, this work applies the neural ordinary differential equations (ODE) framework to the problem of predicting a subset of plasma dynamics, namely the coupled plasma current and internal inductance dynamics. As the neural ODE framework allows for the natural inclusion of physics-based inductive biases, we train both physics-based and neural network models on data from the Alcator C-Mod fusion reactor and find that a model that combines physics-based equations with a neural ODE performs better than both existing physics-motivated ODEs and a pure neural ODE model.
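The hybrid idea from the abstract, a known physics term plus a learned correction inside one ODE right-hand side, can be sketched in a few lines. The following is a minimal illustration assuming PyTorch and the torchdiffeq package; the placeholder physics term, state layout, and synthetic data are not the paper's plasma model.

```python
# Minimal sketch (not the paper's model) of a hybrid physics + neural ODE:
# the right-hand side is a known physics term plus a small neural correction,
# integrated with torchdiffeq. Physics term, state layout, and data below are
# illustrative placeholders.
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumes the torchdiffeq package is installed


class HybridRHS(nn.Module):
    def __init__(self, state_dim: int = 2, hidden: int = 32):
        super().__init__()
        # Learnable scalar standing in for an uncertain physical coefficient.
        self.tau = nn.Parameter(torch.tensor(1.0))
        # Neural correction term for dynamics the physics model misses.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, state_dim)
        )

    def forward(self, t, x):
        physics = -x / self.tau          # toy relaxation dynamics (placeholder physics)
        correction = self.net(x)         # data-driven residual
        return physics + correction


def train_step(model, optimizer, t_grid, x0, x_target):
    """One gradient step fitting the ODE trajectory to observed states."""
    optimizer.zero_grad()
    pred = odeint(model, x0, t_grid)     # shape: (len(t_grid), batch, state_dim)
    loss = torch.mean((pred - x_target) ** 2)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = HybridRHS()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    t_grid = torch.linspace(0.0, 1.0, 20)
    x0 = torch.randn(8, 2)                                        # synthetic initial states
    x_target = torch.stack([x0 * torch.exp(-t) for t in t_grid])  # synthetic "data"
    for step in range(100):
        loss = train_step(model, opt, t_grid, x0, x_target)
    print("final loss:", loss)
```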

Meek Separators and Their Applications in Targeted Causal Discovery

  • paper_url: http://arxiv.org/abs/2310.20075
  • repo_url: https://github.com/uhlerlab/meek_sep
  • paper_authors: Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler
  • for: Learning causal structures from interventional data, focusing on settings where learning only part of the causal graph suffices (targeted causal discovery), namely subset search and causal matching.
  • methods: The paper introduces the $Meek~separator$, a subset of vertices that, when intervened on, decomposes the remaining unoriented edges into smaller connected components, and proposes two randomized divide-and-conquer algorithms for subset search and causal matching.
  • results: The algorithms achieve logarithmic approximation and provide the first known average-case provable guarantees for both problems, opening up the design of near-optimal methods for other targeted causal structure learning problems.
    Abstract Learning causal structures from interventional data is a fundamental problem with broad applications across various fields. While many previous works have focused on recovering the entire causal graph, in practice, there are scenarios where learning only part of the causal graph suffices. This is called $targeted$ causal discovery. In our work, we focus on two such well-motivated problems: subset search and causal matching. We aim to minimize the number of interventions in both cases. Towards this, we introduce the $Meek~separator$, which is a subset of vertices that, when intervened, decomposes the remaining unoriented edges into smaller connected components. We then present an efficient algorithm to find Meek separators that are of small sizes. Such a procedure is helpful in designing various divide-and-conquer-based approaches. In particular, we propose two randomized algorithms that achieve logarithmic approximation for subset search and causal matching, respectively. Our results provide the first known average-case provable guarantees for both problems. We believe that this opens up possibilities to design near-optimal methods for many other targeted causal structure learning problems arising from various applications.

Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation

  • paper_url: http://arxiv.org/abs/2310.20062
  • repo_url: None
  • paper_authors: Vishal Ramesh, Rui Zhao, Naman Goel
  • for: The paper explores the use of synthetic data in machine learning, with a focus on privacy-preserving methods for generating synthetic data.
  • methods: The paper uses a novel system that combines Solid (Social Linked Data), MPC (Secure Multi-Party Computation), and TEEs (Trusted Execution Environments) to generate differentially private synthetic data.
  • results: The paper demonstrates the effectiveness of the approach through rigorous empirical results on simulated and real datasets, and shows that the method can address various challenges in responsible and trustworthy synthetic data generation, including contributor autonomy, decentralisation, privacy, and scalability.
    Abstract Synthetic data is emerging as a promising way to harness the value of data, while reducing privacy risks. The potential of synthetic data is not limited to privacy-friendly data release, but also includes complementing real data in use-cases such as training machine learning algorithms that are more fair and robust to distribution shifts etc. There is a lot of interest in algorithmic advances in synthetic data generation for providing better privacy and statistical guarantees and for its better utilisation in machine learning pipelines. However, for responsible and trustworthy synthetic data generation, it is not sufficient to focus only on these algorithmic aspects and instead, a holistic view of the synthetic data generation pipeline must be considered. We build a novel system that allows the contributors of real data to autonomously participate in differentially private synthetic data generation without relying on a trusted centre. Our modular, general and scalable solution is based on three building blocks namely: Solid (Social Linked Data), MPC (Secure Multi-Party Computation) and Trusted Execution Environments (TEEs). Solid is a specification that lets people store their data securely in decentralised data stores called Pods and control access to their data. MPC refers to the set of cryptographic methods for different parties to jointly compute a function over their inputs while keeping those inputs private. TEEs such as Intel SGX rely on hardware based features for confidentiality and integrity of code and data. We show how these three technologies can be effectively used to address various challenges in responsible and trustworthy synthetic data generation by ensuring: 1) contributor autonomy, 2) decentralisation, 3) privacy and 4) scalability. We support our claims with rigorous empirical results on simulated and real datasets and different synthetic data generation algorithms.

AdaSub: Stochastic Optimization Using Second-Order Information in Low-Dimensional Subspaces

  • paper_url: http://arxiv.org/abs/2310.20060
  • repo_url: https://github.com/jvictormata/adasub
  • paper_authors: João Victor Galvão da Mata, Martin S. Andersen
  • for: To make second-order stochastic optimization efficient and practical by managing its computational cost.
  • methods: Computes a search direction based on second-order information restricted to a low-dimensional subspace that is defined adaptively from current and past information, with a selectable subspace dimension.
  • results: Preliminary numerical results show that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
    Abstract We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses, making them impractical. To address this issue, our approach enables the management of computational expenses and algorithm efficiency by enabling the selection of the subspace dimension for the search. Our code is freely available on GitHub, and our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
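A subspace second-order step of this general flavor can be sketched with Hessian-vector products: build a small basis from recent gradients, form the projected Hessian, and solve the resulting tiny Newton system. The sketch below is an illustration under those assumptions, not the AdaSub update rule as published.

```python
# Hedged sketch of the general idea behind subspace second-order steps: a small
# basis from recent gradients, a projected Hessian via Hessian-vector products,
# and a damped Newton step. Illustration only, not the AdaSub algorithm.
import torch


def subspace_newton_step(loss_fn, params, grad_history, lr=1.0, damping=1e-3):
    """params: 1-D tensor with requires_grad=True; grad_history: list of past gradients."""
    loss = loss_fn(params)
    (grad,) = torch.autograd.grad(loss, params, create_graph=True)

    # Orthonormal basis Q (d x k) spanning the current and a few past gradients.
    vecs = torch.stack([grad.detach()] + grad_history[-4:], dim=1)
    Q, _ = torch.linalg.qr(vecs)

    # Projected Hessian H_k = Q^T H Q via one Hessian-vector product per column.
    cols = []
    for j in range(Q.shape[1]):
        (hvp,) = torch.autograd.grad(grad, params, grad_outputs=Q[:, j], retain_graph=True)
        cols.append(Q.T @ hvp)
    H_k = torch.stack(cols, dim=1)
    H_k = 0.5 * (H_k + H_k.T) + damping * torch.eye(Q.shape[1])

    # Newton direction inside the subspace, mapped back to parameter space.
    g_k = Q.T @ grad.detach()
    direction = Q @ torch.linalg.solve(H_k, g_k)
    with torch.no_grad():
        params -= lr * direction
    grad_history.append(grad.detach())
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    A = torch.randn(50, 10)
    b = torch.randn(50)
    x = torch.zeros(10, requires_grad=True)
    history = []
    for _ in range(20):
        l = subspace_newton_step(lambda p: 0.5 * ((A @ p - b) ** 2).mean(), x, history)
    print("loss:", l)
```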

Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo

  • paper_url: http://arxiv.org/abs/2310.20053
  • repo_url: None
  • paper_authors: Szilvia Ujváry, Gergely Flamich, Vincent Fortuin, José Miguel Hernández Lobato
  • for: investigate the tightness of PAC-Bayes bounds when restricting the posterior family to factorized Gaussian distributions.
  • methods: Sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, estimate its KL divergence from the prior with thermodynamic integration, and propose three methods to obtain high-probability bounds under different assumptions.
  • results: Experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6% in some cases.
    Abstract An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using MFVI. Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6\% in some cases.
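For context, once the KL term has been estimated (the step the paper tackles with HMC and thermodynamic integration), evaluating a PAC-Bayes-kl bound reduces to inverting the binary KL divergence. The sketch below shows that final computation with placeholder numbers; it is not the paper's estimation pipeline.

```python
# Small numeric sketch of how a PAC-Bayes-kl bound is evaluated once KL(Q || P)
# is available. The inputs below are placeholder values, not results from the paper.
import math


def binary_kl(q: float, p: float) -> float:
    """kl(q || p) for Bernoulli parameters, with the usual 0*log0 = 0 convention."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def kl_inverse(emp_risk: float, bound_rhs: float) -> float:
    """Largest p in [emp_risk, 1] with kl(emp_risk || p) <= bound_rhs (bisection)."""
    lo, hi = emp_risk, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_risk, mid) <= bound_rhs:
            lo = mid
        else:
            hi = mid
    return lo


def pac_bayes_kl_bound(emp_risk: float, kl_qp: float, n: int, delta: float) -> float:
    """Maurer-style bound: kl(emp_risk || true_risk) <= (KL(Q||P) + ln(2 sqrt(n)/delta)) / n."""
    rhs = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse(emp_risk, rhs)


if __name__ == "__main__":
    # Placeholder numbers: empirical risk 0.05, estimated KL of 2000 nats, n = 60000.
    print(pac_bayes_kl_bound(emp_risk=0.05, kl_qp=2000.0, n=60000, delta=0.05))
```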

The Expressibility of Polynomial based Attention Scheme

  • paper_url: http://arxiv.org/abs/2310.20051
  • repo_url: None
  • paper_authors: Zhao Song, Guangyi Xu, Junze Yin
  • for: This paper aims to provide a theoretical analysis of the expressive capabilities of polynomial attention in transformer architectures, and to explore the effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets.
  • methods: The paper uses a combination of theoretical analysis and experimental evaluation to study the representational capacity of polynomial attention. The authors construct two carefully designed datasets, namely $\mathcal{D}_0$ and $\mathcal{D}_1$, and demonstrate the ability of a single-layer polynomial attention network to distinguish between these datasets using a sufficiently high degree $\beta$.
  • results: The paper shows that with a high degree $\beta$, a single-layer polynomial attention network can effectively separate the two datasets, while with a low degree $\beta$, the network cannot effectively distinguish between them. The analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and capturing intricate linguistic correlations.
    Abstract Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. As a result, they have influenced and, to some extent, reshaped people's lifestyles. However, the quadratic complexity of attention in transformer architectures poses a challenge when scaling up these models for processing long textual contexts. This issue makes it impractical to train very large models on lengthy texts or use them efficiently during inference. While a recent study by [KMZ23] introduced a technique that replaces the softmax with a polynomial function and polynomial sketching to speed up attention mechanisms, the theoretical understandings of this new approach are not yet well understood. In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention. Our study reveals a disparity in the ability of high-degree and low-degree polynomial attention. Specifically, we construct two carefully designed datasets, namely $\mathcal{D}_0$ and $\mathcal{D}_1$, where $\mathcal{D}_1$ includes a feature with a significantly larger value compared to $\mathcal{D}_0$. We demonstrate that with a sufficiently high degree $\beta$, a single-layer polynomial attention network can distinguish between $\mathcal{D}_0$ and $\mathcal{D}_1$. However, with a low degree $\beta$, the network cannot effectively separate the two datasets. This analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets. Our analysis offers insight into the representational capacity of polynomial attention and provides a rationale for incorporating higher-degree polynomials in attention mechanisms to capture intricate linguistic correlations.
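One simple way to see the degree-$\beta$ amplification the analysis is about is to contrast softmax attention with a polynomial variant in which scores are raised to an even power and row-normalized. The sketch below is such an illustration; it is not the exact scheme of [KMZ23] nor the paper's constructions $\mathcal{D}_0$ and $\mathcal{D}_1$.

```python
# Hedged sketch contrasting softmax attention with a simple polynomial variant
# (scores raised to an even power beta, then row-normalized). Illustrates the
# degree-beta amplification of large values discussed in the paper.
import torch


def softmax_attention(Q, K, V):
    scores = (Q @ K.transpose(-2, -1)) / Q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ V


def polynomial_attention(Q, K, V, beta: int = 4, eps: float = 1e-9):
    scores = (Q @ K.transpose(-2, -1)) / Q.shape[-1] ** 0.5
    weights = scores ** beta                       # even beta keeps weights non-negative
    weights = weights / (weights.sum(dim=-1, keepdim=True) + eps)
    return weights @ V, weights


if __name__ == "__main__":
    torch.manual_seed(0)
    n, d = 6, 8
    Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
    K[0] = 3.0 * Q[0]          # give one query-key pair a much larger inner product
    for beta in (2, 4, 8):
        _, w = polynomial_attention(Q, K, V, beta=beta)
        print(f"beta={beta}: weight of query 0 on key 0 = {w[0, 0]:.3f}")
    _ = softmax_attention(Q, K, V)   # softmax baseline for comparison
```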

Scaling Riemannian Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.20030
  • repo_url: https://github.com/louaaron/Scaling-Riemannian-Diffusion
  • paper_authors: Aaron Lou, Minkai Xu, Stefano Ermon
  • for: To improve the performance of Riemannian diffusion models, which learn distributions on general manifolds, and to enable their application in high dimensions.
  • methods: Leverages the observation that most relevant manifolds are symmetric spaces and combines various ansätze to compute the relevant quantities of the diffusion process quickly and to high precision.
  • results: On low-dimensional datasets the proposed corrections produce a noticeable improvement, making diffusion competitive with other methods, and the approach scales to high-dimensional tasks such as modeling QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high-dimensional hyperspheres.
    Abstract Riemannian diffusion models draw inspiration from standard Euclidean space diffusion models to learn distributions on general manifolds. Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements. Our key observation is that most relevant manifolds are symmetric spaces, which are much more amenable to computation. By leveraging and combining various ans\"{a}tze, we can quickly compute relevant quantities to high precision. On low dimensional datasets, our correction produces a noticeable improvement, allowing diffusion to compete with other methods. Additionally, we show that our method enables us to scale to high dimensional tasks on nontrivial manifolds. In particular, we model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices

  • paper_url: http://arxiv.org/abs/2310.19991
  • repo_url: None
  • paper_authors: Minghao Yan, Hongyi Wang, Shivaram Venkataraman
  • for: To improve the energy efficiency of neural network (NN) inference on edge devices.
  • methods: Optimizes the configuration of on-device hardware elements, including GPU, memory, and CPU frequency, using Constrained Bayesian Optimization in an energy-conserving manner.
  • results: The empirical evaluation shows that this configuration optimization can save up to 36 percent of energy for popular models and quickly converges towards near-optimal settings while satisfying application constraints.
    Abstract As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware-elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner. Our empirical evaluation uncovers novel facets of the energy-performance equilibrium showing that we can save up to 36 percent of energy for popular models. We also validate that PolyThrottle can quickly converge towards near-optimal settings while satisfying application constraints.

Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations

  • paper_url: http://arxiv.org/abs/2310.19978
  • repo_url: None
  • paper_authors: Edward Raff, Amol Khanna, Fred Lu
  • for: To enable training differentially private regression models on sparse input data.
  • methods: Adapts the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
  • results: Reduces the training time from $\mathcal{O}(TDS + TNS)$ to $\mathcal{O}(NS + T\sqrt{D}\log{D} + TS^2)$, yielding runtime reductions of up to a factor of $2,200\times$, depending on the privacy parameter $\epsilon$ and the sparsity of the dataset.
    Abstract To the best of our knowledge, there are no methods today for training differentially private regression models on sparse input data. To remedy this, we adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively. In doing so, we reduce the training time of the algorithm from $\mathcal{O}( T D S + T N S)$ to $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$, where $T$ is the number of iterations and a sparsity rate $S$ of a dataset with $N$ rows and $D$ features. Our results demonstrate that this procedure can reduce runtime by a factor of up to $2,200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
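For orientation, a dense-data sketch of private Frank-Wolfe over the $L_1$ ball for logistic regression is shown below: the linear-minimization vertex is chosen with a (simplified) exponential mechanism. The noise scale is illustrative and not calibrated to a formal DP guarantee, and the paper's contribution, exploiting input sparsity to speed up the iterations, is not reproduced here.

```python
# Minimal sketch of Frank-Wolfe over the L1 ball for logistic regression, with
# the vertex chosen by a simplified exponential mechanism. Dense inputs and an
# illustrative noise scale; NOT the paper's sparsity-aware algorithm and NOT a
# calibrated differential privacy guarantee.
import numpy as np


def private_frank_wolfe_logreg(X, y, radius=1.0, T=200, noise_scale=0.05, rng=None):
    """X: (n, d) array, y in {0, 1}; returns weights constrained to ||w||_1 <= radius."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # predicted probabilities
        grad = X.T @ (p - y) / n                      # logistic loss gradient
        # Scores of the 2d vertices {+-radius * e_j}; exponential mechanism picks one.
        scores = np.concatenate([-radius * grad, radius * grad])  # -<grad, vertex>
        probs = np.exp((scores - scores.max()) / max(noise_scale, 1e-12))
        probs /= probs.sum()
        idx = rng.choice(2 * d, p=probs)
        vertex = np.zeros(d)
        vertex[idx % d] = radius if idx < d else -radius
        gamma = 2.0 / (t + 2.0)                       # standard Frank-Wolfe step size
        w = (1.0 - gamma) * w + gamma * vertex
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 50
    X = rng.normal(size=(n, d))
    true_w = np.zeros(d)
    true_w[:3] = [1.0, -1.0, 0.5]
    y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
    w = private_frank_wolfe_logreg(X, y, radius=2.0)
    print("train accuracy:", np.mean((X @ w > 0) == (y > 0.5)))
```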

Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy

  • paper_url: http://arxiv.org/abs/2310.19973
  • repo_url: None
  • paper_authors: Chendi Wang, Buxin Su, Jiayuan Ye, Reza Shokri, Weijie J. Su
  • for: To improve privacy guarantees for differentially private machine learning algorithms whose randomness is hard to analyze, in particular shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initialization.
  • methods: Uses $f$-DP to analyze these mixture mechanisms, deriving a closed-form expression of the trade-off function for shuffling models and studying the effect of random initialization on DP-GD, supported by a new inequality for trade-off functions.
  • results: The $f$-DP analysis outperforms the most up-to-date $(\epsilon,\delta)$-DP results for shuffling models, and numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD; the new trade-off-function inequality implies the joint convexity of $F$-divergences.
    Abstract Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initializations using $f$-DP. We derive a closed-form expression of the trade-off function for shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effects of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper. This inequality implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.

Early detection of inflammatory arthritis to improve referrals using multimodal machine learning from blood testing, semi-structured and unstructured patient records

  • paper_url: http://arxiv.org/abs/2310.19967
  • repo_url: None
  • paper_authors: Bing Wang, Weizi Li, Anthony Bradlow, Antoni T. Y. Chan, Eghosa Bazuaye
  • for: Early detection of inflammatory arthritis (IA) to support efficient and accurate hospital referral triage, enabling timely treatment and preventing deterioration of the disease course under limited healthcare resources.
  • methods: Fusion and ensemble learning over multimodal data, combining blood testing results with semi-structured and unstructured patient records, to support decision-making in the early detection of IA.
  • results: The study shows that multimodal fusion and ensemble learning can assist the early detection of IA from GP referrals; to the authors' knowledge, it is the first attempt to use multimodal data for this purpose.
    Abstract Early detection of inflammatory arthritis (IA) is critical to efficient and accurate hospital referral triage for timely treatment and preventing the deterioration of the IA disease course, especially under limited healthcare resources. The manual assessment process is the most common approach in practice for the early detection of IA, but it is extremely labor-intensive and inefficient. A large amount of clinical information needs to be assessed for every referral from General Practice (GP) to the hospitals. Machine learning shows great potential in automating repetitive assessment tasks and providing decision support for the early detection of IA. However, most machine learning-based methods for IA detection rely on blood testing results. But in practice, blood testing data is not always available at the point of referrals, so we need methods to leverage multimodal data such as semi-structured and unstructured data for early detection of IA. In this research, we present fusion and ensemble learning-based methods using multimodal data to assist decision-making in the early detection of IA. To the best of our knowledge, our study is the first attempt to utilize multimodal data to support the early detection of IA from GP referrals.
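A common way to realize this kind of multimodal fusion and ensembling is sketched below with scikit-learn: structured blood-test values and free-text referral notes are combined in one feature space and fed to a soft-voting ensemble. The column names, models, and synthetic data are placeholders, not the paper's clinical pipeline.

```python
# Hedged sketch: fuse structured blood-test values with free-text referral notes
# and ensemble two classifiers. Column names, models, and data are placeholders.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for GP referral data: two blood markers plus a note.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "crp": rng.gamma(2.0, 5.0, size=n),                  # C-reactive protein (made up)
    "esr": rng.gamma(2.0, 8.0, size=n),                  # erythrocyte sedimentation rate
    "note": rng.choice(["joint swelling and morning stiffness",
                        "mild knee pain after exercise",
                        "persistent synovitis in both hands"], size=n),
})
y = ((df["crp"] > 12) & df["note"].str.contains("swelling|synovitis")).astype(int)

features = ColumnTransformer([
    ("labs", StandardScaler(), ["crp", "esr"]),
    ("text", TfidfVectorizer(ngram_range=(1, 2)), "note"),
])

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    voting="soft",
)

model = Pipeline([("features", features), ("clf", ensemble)])
X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.25, random_state=0)
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```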

Topological Learning for Motion Data via Mixed Coordinates

  • paper_url: http://arxiv.org/abs/2310.19960
  • repo_url: https://github.com/hrluo/topologicalmotionseries
  • paper_authors: Hengrui Luo, Jisu Kim, Alice Patania, Mikael Vejdemo-Johansson
  • for: To integrate topological information into a multiple-output Gaussian process model for transfer learning purposes.
  • methods: Extends circular coordinates to a framework of mixed-valued coordinates that accounts for linear trends in time series, and uses topologically induced clustering to construct a cluster-based kernel in a multiple-output Gaussian process model, yielding a unified framework for using topological information in time and motion series.
  • results: The method can effectively learn from multiple time series via a multiple-output Gaussian process model and achieves better performance than traditional methods.
    Abstract Topology can extract the structural information in a dataset efficiently. In this paper, we attempt to incorporate topological information into a multiple output Gaussian process model for transfer learning purposes. To achieve this goal, we extend the framework of circular coordinates into a novel framework of mixed valued coordinates to take linear trends in the time series into consideration. One of the major challenges to learn from multiple time series effectively via a multiple output Gaussian process model is constructing a functional kernel. We propose to use topologically induced clustering to construct a cluster based kernel in a multiple output Gaussian process model. This kernel not only incorporates the topological structural information, but also allows us to put forward a unified framework using topological information in time and motion series.

PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning

  • paper_url: http://arxiv.org/abs/2310.19958
  • repo_url: None
  • paper_authors: Tianyue Chu, Mengwei Yang, Nikolaos Laoutaris, Athina Markopoulou
  • for: To quantify and improve privacy protection in Federated Learning (FL) when model pruning is used.
  • methods: Derives information-theoretic upper bounds on the amount of information leaked by pruned FL models, and validates these theoretical findings with experiments involving state-of-the-art privacy attacks on several FL pruning schemes.
  • results: Experiments show that PriPrune significantly improves the privacy-accuracy tradeoff in FL and can be applied after any pruned FL scheme on the client, without modification, to defend against inversion attacks by the server.
    Abstract Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model, by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning -- a paradigm that is widely used to reduce the size and complexity of models. Intuitively, by making local models coarser, pruning is expected to also provide some protection against privacy attacks in the context of FL. However this protection has not been previously characterized, formally or experimentally, and it is unclear if it is sufficient against state-of-the-art attacks. In this paper, we perform the first investigation of privacy guarantees for model pruning in FL. We derive information-theoretic upper bounds on the amount of information leaked by pruned FL models. We complement and validate these theoretical findings, with comprehensive experiments that involve state-of-the-art privacy attacks, on several state-of-the-art FL pruning schemes, using benchmark datasets. This evaluation provides valuable insights into the choices and parameters that can affect the privacy protection provided by pruning. Based on these insights, we introduce PriPrune -- a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server. Our empirical evaluation demonstrates that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes that do not take privacy into account.
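The mask mechanics behind defense pruning can be sketched in a few lines: standard magnitude pruning followed by a personalized random defense mask that hides a further fraction of weights before the update leaves the client. The snippet below is a toy illustration of that idea; PriPrune itself adapts the defense pruning rate jointly with model performance, which is omitted here.

```python
# Toy illustration of defense pruning: keep the largest-magnitude weights
# (standard pruning), then apply a per-client random defense mask that hides a
# further fraction before sharing. Not the PriPrune algorithm itself.
import torch


def magnitude_mask(weights: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Binary mask keeping roughly the top `keep_ratio` fraction of weights by magnitude."""
    k = max(1, int(keep_ratio * weights.numel()))
    threshold = weights.abs().flatten().kthvalue(weights.numel() - k + 1).values
    return (weights.abs() >= threshold).float()


def defended_update(weights: torch.Tensor, keep_ratio: float,
                    defense_rate: float, client_seed: int) -> torch.Tensor:
    """Prune by magnitude, then drop an extra `defense_rate` fraction at random,
    using a per-client seed so the defense mask is personalized."""
    prune_mask = magnitude_mask(weights, keep_ratio)
    gen = torch.Generator().manual_seed(client_seed)
    defense_mask = (torch.rand(weights.shape, generator=gen) > defense_rate).float()
    return weights * prune_mask * defense_mask


if __name__ == "__main__":
    torch.manual_seed(0)
    local_update = torch.randn(4, 8)             # stand-in for a layer's weight update
    shared = defended_update(local_update, keep_ratio=0.5, defense_rate=0.2, client_seed=7)
    kept = (shared != 0).float().mean().item()
    print(f"fraction of weights actually shared: {kept:.2f}")
```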

The Acquisition of Physical Knowledge in Generative Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19943
  • repo_url: https://github.com/cross32768/PlaNet_PyTorch
  • paper_authors: Luca M. Schulze Buschoff, Eric Schulz, Marcel Binz
  • for: investigate how the learning trajectories of deep generative neural networks compare to children’s developmental trajectories using physical understanding as a testbed.
  • methods: use physical understanding as a testbed to examine two distinct hypotheses of human development - stochastic optimization and complexity increase.
  • results: find that while our models are able to accurately predict a number of physical processes, their learning trajectories under both hypotheses do not follow the developmental trajectories of children.
    Abstract As children grow older, they develop an intuitive understanding of the physical processes around them. Their physical understanding develops in stages, moving along developmental trajectories which have been mapped out extensively in previous empirical research. Here, we investigate how the learning trajectories of deep generative neural networks compare to children's developmental trajectories using physical understanding as a testbed. We outline an approach that allows us to examine two distinct hypotheses of human development - stochastic optimization and complexity increase. We find that while our models are able to accurately predict a number of physical processes, their learning trajectories under both hypotheses do not follow the developmental trajectories of children.

Lyapunov-Based Dropout Deep Neural Network (Lb-DDNN) Controller

  • paper_url: http://arxiv.org/abs/2310.19938
  • repo_url: None
  • paper_authors: Saiedeh Akbari, Emily J. Griffis, Omkar Sudhir Patil, Warren E. Dixon
  • for: To compensate for unstructured uncertainties in nonlinear dynamic systems.
  • methods: Uses dropout regularization, stochastically deactivating weights in each layer of the DNN, together with a Lyapunov-based real-time weight adaptation law for online unsupervised learning.
  • results: In simulation, the proposed dropout DNN-based adaptive controller achieves a 38.32% improvement in tracking error, a 53.67% improvement in function approximation error, and 50.44% lower control effort compared to a baseline adaptive DNN controller without dropout regularization.
    Abstract Deep neural network (DNN)-based adaptive controllers can be used to compensate for unstructured uncertainties in nonlinear dynamic systems. However, DNNs are also very susceptible to overfitting and co-adaptation. Dropout regularization is an approach where nodes are randomly dropped during training to alleviate issues such as overfitting and co-adaptation. In this paper, a dropout DNN-based adaptive controller is developed. The developed dropout technique allows the deactivation of weights that are stochastically selected for each individual layer within the DNN. Simultaneously, a Lyapunov-based real-time weight adaptation law is introduced to update the weights of all layers of the DNN for online unsupervised learning. A non-smooth Lyapunov-based stability analysis is performed to ensure asymptotic convergence of the tracking error. Simulation results of the developed dropout DNN-based adaptive controller indicate a 38.32% improvement in the tracking error, a 53.67% improvement in the function approximation error, and 50.44% lower control effort when compared to a baseline adaptive DNN-based controller without dropout regularization.

Sim2Real for Environmental Neural Processes

  • paper_url: http://arxiv.org/abs/2310.19932
  • repo_url: https://github.com/jonas-scholz123/sim2real-downscaling
  • paper_authors: Jonas Scholz, Tom R. Andersson, Anna Vaughan, James Requeima, Richard E. Turner
  • for: To obtain more accurate machine learning models for weather prediction and climate monitoring by learning directly from station observations.
  • methods: Uses a convolutional conditional neural process (ConvCNP) that conditions on both gridded reanalysis data and off-the-grid station observations to make uncertainty-aware predictions at target locations, following a 'Sim2Real' recipe of pre-training on reanalysis data and fine-tuning on observations.
  • results: On held-out weather stations in a surface air temperature task over Germany, Sim2Real training substantially outperforms training on reanalysis data or station data alone, showing that reanalysis data can serve as a stepping stone for learning from real observations.
    Abstract Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.
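The Sim2Real recipe itself is a two-stage schedule: pre-train on plentiful reanalysis-like data, then fine-tune the same weights on a small set of station observations with a smaller learning rate. The sketch below illustrates that schedule with a toy MLP and synthetic data standing in for the ConvCNP and the weather datasets.

```python
# Hedged sketch of the two-stage Sim2Real schedule: pre-train on a large
# "reanalysis" set, then fine-tune on few "station" observations at a lower
# learning rate. A toy MLP and synthetic data stand in for the real setup.
import torch
import torch.nn as nn


def fit(model, X, y, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

# Stage 1: large "reanalysis" set (smooth but biased approximation of reality).
X_sim = torch.rand(5000, 2) * 10
y_sim = torch.sin(X_sim[:, :1]) + 0.3 * X_sim[:, 1:]             # simulator "physics"
sim_loss = fit(model, X_sim, y_sim, lr=1e-3, epochs=200)

# Stage 2: few "station" observations that include effects the simulator misses.
X_obs = torch.rand(100, 2) * 10
y_obs = torch.sin(X_obs[:, :1]) + 0.3 * X_obs[:, 1:] + 0.5 * torch.cos(X_obs[:, 1:])
obs_loss = fit(model, X_obs, y_obs, lr=1e-4, epochs=200)          # gentle fine-tuning

print(f"pre-training loss: {sim_loss:.4f}, fine-tuning loss: {obs_loss:.4f}")
```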

Solving a Class of Cut-Generating Linear Programs via Machine Learning

  • paper_url: http://arxiv.org/abs/2310.19920
  • repo_url: None
  • paper_authors: Atefeh Rajabalizadeh, Danial Davarnia
  • for: To propose a machine learning-based method for deciding at which nodes of the branch-and-bound tree a cut-generating linear program (CGLP) is worth solving, improving dual bounds for mixed-integer programs.
  • methods: Translates the CGLP into an indicator function of the objective function vector and approximates its optimal value with conventional data classification techniques, together with a systematic procedure to generate training data based on the CGLP structure.
  • results: Computational experiments on benchmark instances show that the approximate CGLP obtained from classification can improve solution time compared to conventional cutting plane methods.
    Abstract Cut-generating linear programs (CGLPs) play a key role as a separation oracle to produce valid inequalities for the feasible region of mixed-integer programs. When incorporated inside branch-and-bound, the cutting planes obtained from CGLPs help to tighten relaxations and improve dual bounds. However, running the CGLPs at the nodes of the branch-and-bound tree is computationally cumbersome due to the large number of node candidates and the lack of a priori knowledge on which nodes admit useful cutting planes. As a result, CGLPs are often avoided at default settings of branch-and-cut algorithms despite their potential impact on improving dual bounds. In this paper, we propose a novel framework based on machine learning to approximate the optimal value of a CGLP class that determines whether a cutting plane can be generated at a node of the branch-and-bound tree. Translating the CGLP as an indicator function of the objective function vector, we show that it can be approximated through conventional data classification techniques. We provide a systematic procedure to efficiently generate training data sets for the corresponding classification problem based on the CGLP structure. We conduct computational experiments on benchmark instances using classification methods such as logistic regression. These results suggest that the approximate CGLP obtained from classification can improve the solution time compared to that of conventional cutting plane methods. Our proposed framework can be efficiently applied to a large number of nodes in the branch-and-bound tree to identify the best candidates for adding a cut.
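The classification idea can be sketched directly with scikit-learn: train a classifier on node features to predict whether the CGLP at a node would yield a useful cut, and only pay for the expensive CGLP where the prediction is positive. The features, labels, and threshold below are synthetic placeholders rather than the paper's training-data generation procedure.

```python
# Hedged sketch of the classification idea: predict from node features whether
# the CGLP would produce a violated cut, and skip it otherwise. Features and
# labels here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_nodes = 2000

# Toy node features: depth, LP relaxation gap, fractionality of the LP solution.
depth = rng.integers(1, 40, size=n_nodes)
gap = rng.uniform(0.0, 0.3, size=n_nodes)
fractionality = rng.uniform(0.0, 0.5, size=n_nodes)
X = np.column_stack([depth, gap, fractionality])

# Stand-in label: "CGLP produced a violated cut" (in practice obtained by
# actually solving CGLPs offline on training instances).
useful_cut = ((gap > 0.1) & (fractionality > 0.2)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, useful_cut, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("accuracy on held-out nodes:", clf.score(X_te, y_te))
# At solve time: skip the CGLP whenever the predicted probability is low.
should_solve_cglp = clf.predict_proba(X_te)[:, 1] > 0.5
print("fraction of nodes where a CGLP would be solved:", should_solve_cglp.mean())
```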

Meta-Learning Strategies through Value Maximization in Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19919
  • repo_url: None
  • paper_authors: Rodrigo Carrasco-Davis, Javier Masís, Andrew M. Saxe
  • for: To understand how biological and artificial learning agents should make meta-learning choices, from hyperparameter selection to curricula, offering normative accounts of cognitive control functions.
  • methods: Presents a tractable learning effort framework that efficiently optimizes control signals on a fully normative objective, discounted cumulative performance throughout learning, using average dynamical equations for gradient descent in simple neural network architectures.
  • results: Across settings, control effort is most beneficial when applied to easier aspects of a task early in learning, followed by sustained effort on harder aspects; the framework provides a tractable theoretical test bed for studying normative control strategies over learning trajectories in a variety of learning systems.
    Abstract Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions like curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks due to the complexity of optimizing through the entire learning process. Here we theoretically investigate optimal strategies in a tractable setting. We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective: discounted cumulative performance throughout learning. We obtain computational tractability by using average dynamical equations for gradient descent, available for simple neural network architectures. Our framework accommodates a range of meta-learning and automatic curriculum learning methods in a unified normative setting. We apply this framework to investigate the effect of approximations in common meta-learning algorithms; infer aspects of optimal curricula; and compute optimal neuronal resource allocation in a continual learning setting. Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning; followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed to study normative benefits of interventions in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by established theories in cognitive neuroscience.

GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models

  • paper_url: http://arxiv.org/abs/2310.19915
  • repo_url: None
  • paper_authors: Seongwon Kim, Parisa Mollaei, Akshay Antony, Rishikesh Magar, Amir Barati Farimani
  • for: To interpret the sequential design of G protein-coupled receptors (GPCRs) using protein language models.
  • methods: Fine-tunes the pre-trained Prot-Bert model on prediction tasks over variations in the conserved motifs (NPxxY, CWxP, E/DRY) to probe relationships between residues in the binding pocket and these motifs.
  • results: Attention weights and hidden states reveal the extent to which individual amino acids dictate the masked motif residues, the fine-tuned models predict hidden residues within the motifs with high accuracy, and analysis of the embeddings over 3D structures elucidates higher-order interactions within receptor conformations.
    Abstract With the rise of Transformers and Large Language Models (LLMs) in Chemistry and Biology, new avenues for the design and understanding of therapeutics have opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, specifically with the abundance of our access to the protein sequence datasets. In this paper, we developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs). GPCRs are the target of over one-third of FDA-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship between amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, E/DRY). By utilizing the pre-trained protein model (Prot-Bert) and fine-tuning with prediction tasks of variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we took advantage of attention weights, and hidden states of the model that are interpreted to extract the extent of contributions of amino acids in dictating the type of masked ones. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the analysis of embedding was performed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
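A masked-residue probe of this kind can be run against the public ProtBert checkpoint with the Hugging Face transformers library, as sketched below: mask one position in a sequence fragment, read off the model's predictions, and inspect the attention from the masked position. The fragment and the use of the base checkpoint are assumptions; this is not the paper's fine-tuned GPCR-BERT model.

```python
# Hedged sketch of a masked-residue probe with a pre-trained protein BERT.
# Assumes the `transformers` package and the public Rostlab/prot_bert checkpoint;
# the fragment below is made up and this is not the paper's GPCR-BERT model.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert", output_attentions=True)
model.eval()

# ProtBert expects uppercase amino acids separated by spaces.
sequence = list("NPLIYAFLD")                 # toy fragment around an NPxxY-like stretch
mask_pos = 4                                 # position to hide (0-indexed in the fragment)
tokens = [aa if i != mask_pos else tokenizer.mask_token for i, aa in enumerate(sequence)]
inputs = tokenizer(" ".join(tokens), return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
probs = torch.softmax(outputs.logits[0, mask_index], dim=-1)
top = torch.topk(probs, k=5)
print("true residue:", sequence[mask_pos])
print("top-5 predictions:", [tokenizer.convert_ids_to_tokens(int(i)) for i in top.indices])

# Attention from the masked position in the last layer, averaged over heads,
# indicates which residues contribute most to the prediction.
last_attn = outputs.attentions[-1][0].mean(dim=0)[mask_index]
print("attention over tokens:", last_attn)
```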

Bayesian Simulation-based Inference for Cosmological Initial Conditions

  • paper_url: http://arxiv.org/abs/2310.19910
  • repo_url: None
  • paper_authors: Florian List, Noemi Anau Montel, Christoph Weniger
  • for: To reconstruct astrophysical and cosmological fields from observations.
  • methods: A Bayesian field reconstruction algorithm rooted in simulation-based inference and enhanced by autoregressive modeling, applicable to generic (non-differentiable) forward simulators and allowing sampling from the posterior of the underlying field.
  • results: The paper shows first promising results on a proof-of-concept application: recovering cosmological initial conditions from late-time density fields.
    Abstract Reconstructing astrophysical and cosmological fields from observations is challenging. It requires accounting for non-linear transformations, mixing of spatial structure, and noise. In contrast, forward simulators that map fields to observations are readily available for many applications. We present a versatile Bayesian field reconstruction algorithm rooted in simulation-based inference and enhanced by autoregressive modeling. The proposed technique is applicable to generic (non-differentiable) forward simulators and allows sampling from the posterior for the underlying field. We show first promising results on a proof-of-concept application: the recovery of cosmological initial conditions from late-time density fields.

BTRec: BERT-Based Trajectory Recommendation for Personalized Tours

  • paper_url: http://arxiv.org/abs/2310.19886
  • repo_url: https://github.com/nxh912/BTRec_RecSys23
  • paper_authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim
  • for: To provide personalized tour itinerary recommendations that help travelers enjoy visits to unfamiliar cities.
  • methods: Proposes BTREC, an iterative algorithm extending the POIBERT embedding algorithm, which incorporates user demographic information and past POI visits into a modified BERT language model to recommend a personalized POI itinerary given a pair of source and destination POIs.
  • results: Experiments on datasets of eight cities of different sizes show that BTREC is stable and outperforms many other sequence prediction algorithms, measured by recall, precision, and F1-scores.
    Abstract An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual users of the system. We propose an iterative algorithm in this paper, namely: BTREC (BERT-based Trajectory Recommendation), that extends from the POIBERT embedding algorithm to recommend personalized itineraries on POIs using the BERT framework. Our BTREC algorithm incorporates users' demographic information alongside past POI visits into a modified BERT language model to recommend a personalized POI itinerary prediction given a pair of source and destination POIs. Our recommendation system can create a travel itinerary that maximizes POIs visited, while also taking into account user preferences for categories of POIs and time availability. Our recommendation algorithm is largely inspired by the problem of sentence completion in natural language processing (NLP). Using a dataset of eight cities of different sizes, our experimental results demonstrate that our proposed algorithm is stable and outperforms many other sequence prediction algorithms, measured by recall, precision, and F1-scores.

Learning quantum states and unitaries of bounded gate complexity

  • paper_url: http://arxiv.org/abs/2310.19882
  • repo_url: None
  • paper_authors: Haimeng Zhao, Laura Lewis, Ishaan Kannan, Yihui Quek, Hsin-Yuan Huang, Matthias C. Caro
  • for: To study the complexity of learning quantum states and unitaries generated by circuits of bounded gate complexity.
  • methods: Proves sample complexity and query complexity bounds for learning states and unitaries generated by quantum circuits with $G$ two-qubit gates.
  • results: A sample complexity scaling linearly in $G$ is necessary and sufficient to learn such states to small trace distance, and the optimal query complexity for learning such unitaries to small average-case error also scales linearly in $G$; under reasonable cryptographic conjectures, however, the computational complexity must scale exponentially in $G$. These results relate the complexity of learning quantum states and unitaries to the complexity of creating them.
    Abstract While quantum state tomography is notoriously hard, most states hold little interest to practically-minded tomographers. Given that states and unitaries appearing in Nature are of bounded gate complexity, it is natural to ask if efficient learning becomes possible. In this work, we prove that to learn a state generated by a quantum circuit with $G$ two-qubit gates to a small trace distance, a sample complexity scaling linearly in $G$ is necessary and sufficient. We also prove that the optimal query complexity to learn a unitary generated by $G$ gates to a small average-case error scales linearly in $G$. While sample-efficient learning can be achieved, we show that under reasonable cryptographic conjectures, the computational complexity for learning states and unitaries of gate complexity $G$ must scale exponentially in $G$. We illustrate how these results establish fundamental limitations on the expressivity of quantum machine learning models and provide new perspectives on no-free-lunch theorems in unitary learning. Together, our results answer how the complexity of learning quantum states and unitaries relate to the complexity of creating these states and unitaries.
    Summary: Learning a state prepared by a circuit with $G$ two-qubit gates to small trace distance, or a unitary built from $G$ gates to small average-case error, requires sample and query complexity linear in $G$, while under reasonable cryptographic conjectures the computational complexity must scale exponentially in $G$.

Metric Flows with Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19870
  • repo_url: https://github.com/teluashish/traffic-flow-volume-prediction
  • paper_authors: James Halverson, Fabian Ruehle
  • for: A theoretical study of flows in the space of Riemannian metrics induced by neural network gradient descent.
  • methods: The authors derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time; infinite-width limits and locality assumptions simplify the dynamics.
  • results: The ideas are applied to numerical Calabi-Yau metrics, including a discussion of the importance of feature learning.
    Abstract We develop a theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincar\'e conjecture. We apply these ideas to numerical Calabi-Yau metrics, including a discussion on the importance of feature learning.
    Summary: Gradient descent on neural network parameters induces a flow on the space of Riemannian metrics governed by a metric neural tangent kernel; infinite-width limits fix the kernel and simplify the dynamics, locality assumptions recover Perelman's formulation of Ricci flow, and the framework is applied to numerical Calabi-Yau metrics.
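
For readers who want the schematic form of such a flow, the chain rule gives the evolution below for a metric parametrized by network weights $\theta$ trained by gradient flow at learning rate $\eta$; the notation is illustrative, and the paper's precise conventions and kernel definition may differ.

```latex
% Gradient flow on parameters, d\theta_a/dt = -\eta \, \partial \mathcal{L}/\partial \theta_a,
% induces a non-local flow on the parametrized metric g_{\mu\nu}(x;\theta):
\frac{\partial g_{\mu\nu}(x)}{\partial t}
   = -\eta \int_X \Theta_{\mu\nu\rho\sigma}(x,x')\,
       \frac{\delta \mathcal{L}[g]}{\delta g_{\rho\sigma}(x')}\,\mathrm{d}x',
\qquad
\Theta_{\mu\nu\rho\sigma}(x,x')
   = \sum_a \frac{\partial g_{\mu\nu}(x)}{\partial \theta_a}\,
            \frac{\partial g_{\rho\sigma}(x')}{\partial \theta_a}.
% Here \Theta is the metric neural tangent kernel; in an infinite-width limit
% it becomes fixed in time and the dynamics simplify.
```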

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

  • paper_url: http://arxiv.org/abs/2310.19861
  • repo_url: None
  • paper_authors: Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang
  • for: This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) with general function approximation.
  • methods: The authors propose model-based posterior sampling methods that control both players to learn a Nash equilibrium, and incorporate an adversarial generalized eluder coefficient (GEC) to handle partial observability.
  • results: The proposed algorithms enjoy regret bounds that scale sublinearly with the proposed GEC and the number of episodes $T$, and apply to a majority of tractable zero-sum Markov game classes in both fully observable and partially observable settings with self-play and adversarial learning.
    Abstract This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capturing the exploration-exploitation trade-off in MGs. Based on self-play GEC, we propose a model-based self-play posterior sampling method to control both players to learn Nash equilibrium, which can successfully handle the partial observability of states. Furthermore, we identify a set of partially observable MG models fitting MG learning with the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for learning adversarial MG with potential partial observability. We further provide low regret bounds for proposed algorithms that can scale sublinearly with the proposed GEC and the number of episodes $T$. To the best of our knowledge, we for the first time develop generic model-based posterior sampling algorithms for competitive RL that can be applied to a majority of tractable zero-sum MG classes in both fully observable and partially observable MGs with self-play and adversarial learning.
    Summary: Using self-play and adversarial generalized eluder coefficients as complexity measures, the paper develops model-based posterior sampling algorithms for zero-sum Markov games, including partially observable settings, with regret bounds that scale sublinearly in the GEC and the number of episodes $T$.

Robust Causal Bandits for Linear Models

  • paper_url: http://arxiv.org/abs/2310.19794
  • repo_url: None
  • paper_authors: Zirui Yan, Arpan Mukherjee, Burak Varıcı, Ali Tajer
  • for: Studies the robustness of causal bandits when the underlying causal model fluctuates over time.
  • methods: Sequential design of interventions in causal systems with linear structural equation models, with cumulative regret as the design criterion.
  • results: Existing methods can incur regret linear in $T$ after even a few model deviations, whereas the proposed robust causal bandit algorithm achieves near-optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
    Abstract Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^\frac{1}{2L}$, where $T$ is the time horizon and $L$ is the longest causal path in the graph, the existing algorithms will have linear regret in $T$. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.
    Summary: In causal bandits with linear SEMs whose models fluctuate over time, existing algorithms can suffer linear regret after only $T^{\frac{1}{2L}}$ instances of model deviation; the proposed robust algorithm attains regret upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT}+NC))$, which is nearly optimal with $\tilde{\mathcal{O}}(\sqrt{T})$ regret when the deviation measure $C$ is $o(\sqrt{T})$.

On Learning Gaussian Multi-index Models with Gradient Flow

  • paper_url: http://arxiv.org/abs/2310.19793
  • repo_url: None
  • paper_authors: Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien
  • for: Studies gradient flow for multi-index regression on high-dimensional Gaussian data.
  • methods: A two-timescale algorithm in which the low-dimensional link function is fitted with a non-parametric model much faster than the subspace parametrizing the low-rank projection is learned.
  • results: By exploiting the matrix semigroup structure of the subspace correlation matrices, the paper establishes global convergence of the Grassmannian population gradient flow and gives a quantitative description of its saddle-to-saddle dynamics.
    Abstract We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated `saddle-to-saddle' dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function. In contrast with these positive results, we also show that the related \emph{planted} problem, where the link function is known and fixed, in fact has a rough optimization landscape, in which gradient flow dynamics might get trapped with high probability.
    Summary: For Gaussian multi-index regression, a two-timescale gradient flow that fits the link function infinitely faster than the projection subspace converges globally, with saddle-to-saddle timescales characterized by a Hermite decomposition of the target link; in contrast, the related planted problem with a known, fixed link has a rough optimization landscape in which gradient flow can get trapped with high probability.
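
As a concrete reference for the setup, a Gaussian multi-index model can be written as below; the symbols ($U$, $g$, $r$) are generic notation for the unknown projection and link described in the abstract, not necessarily the paper's exact conventions.

```latex
% Multi-index regression target on Gaussian inputs x ~ N(0, I_d):
f^{*}(x) \;=\; g\!\left(U^{\top} x\right),
\qquad U \in \mathbb{R}^{d \times r},\; U^{\top}U = I_r \ (\text{taken orthonormal for convenience}),\; r \ll d,
% with g : R^r -> R an unknown low-dimensional link. The two-timescale scheme
% fits g non-parametrically (fast timescale) while the subspace spanned by U
% evolves under a Grassmannian gradient flow (slow timescale).
```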

Locally Optimal Best Arm Identification with a Fixed Budget

  • paper_url: http://arxiv.org/abs/2310.19788
  • repo_url: None
  • paper_authors: Masahiro Kato
  • for: Identifying the best treatment arm, i.e., the arm with the highest expected outcome, under a fixed budget of treatment-allocation rounds.
  • methods: The problem has been studied under names such as best arm identification (BAI) and ordinal optimization; the paper proposes the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, extending the Neyman allocation of Neyman (1934) and the Uniform-EBA strategy of Bubeck et al. (2011).
  • results: The GNA-EBA strategy is asymptotically optimal in the small-gap regime: its probability of misidentification matches the derived lower bounds, so it is locally asymptotically optimal.
    Abstract This study investigates the problem of identifying the best treatment arm, a treatment arm with the highest expected outcome. We aim to identify the best treatment arm with a lower probability of misidentification, which has been explored under various names across numerous research fields, including \emph{best arm identification} (BAI) and ordinal optimization. In our experiments, the number of treatment-allocation rounds is fixed. In each round, a decision-maker allocates a treatment arm to an experimental unit and observes a corresponding outcome, which follows a Gaussian distribution with a variance different among treatment arms. At the end of the experiment, we recommend one of the treatment arms as an estimate of the best treatment arm based on the observations. The objective of the decision-maker is to design an experiment that minimizes the probability of misidentifying the best treatment arm. With this objective in mind, we develop lower bounds for the probability of misidentification under the small-gap regime, where the gaps of the expected outcomes between the best and suboptimal treatment arms approach zero. Then, assuming that the variances are known, we design the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, which is an extension of the Neyman allocation proposed by Neyman (1934) and the Uniform-EBA strategy proposed by Bubeck et al. (2011). For the GNA-EBA strategy, we show that the strategy is asymptotically optimal because its probability of misidentification aligns with the lower bounds as the sample size approaches infinity under the small-gap regime. We refer to such optimal strategies as locally asymptotic optimal because their performance aligns with the lower bounds within restricted situations characterized by the small-gap regime.
    Summary: For fixed-budget best arm identification with Gaussian outcomes and known variances, lower bounds on the misidentification probability are derived in the small-gap regime, and the GNA-EBA strategy is shown to be locally asymptotically optimal because its misidentification probability matches these lower bounds as the sample size grows.
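
A minimal sketch of the fixed-budget workflow, assuming known outcome standard deviations: draws are allocated to each arm in proportion to its standard deviation (Neyman's classical rule, used here only as a stand-in for the paper's generalized allocation ratios) and the arm with the highest empirical mean is recommended. The function names and toy instance are hypothetical.

```python
# Sketch of a variance-proportional, fixed-budget allocation with an
# empirical-best-arm recommendation. The paper's GNA ratios generalize this
# sigma-proportional rule; the rule below is only an illustrative stand-in.
import numpy as np

def gna_eba(sample_arm, sigmas, budget, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    sigmas = np.asarray(sigmas, dtype=float)
    # Allocate the budget across arms in proportion to their std. deviations.
    n_draws = np.maximum(1, np.round(budget * sigmas / sigmas.sum())).astype(int)
    means = np.array([np.mean([sample_arm(a, rng) for _ in range(n)])
                      for a, n in enumerate(n_draws)])
    return int(np.argmax(means))  # empirical best arm (EBA)

# Toy experiment: arm 1 is best by a small gap.
true_means, true_sigmas = [0.0, 0.1, 0.05], [1.0, 1.0, 2.0]
draw = lambda a, rng: rng.normal(true_means[a], true_sigmas[a])
print(gna_eba(draw, true_sigmas, budget=3000))
```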

Autoregressive Attention Neural Networks for Non-Line-of-Sight User Tracking with Dynamic Metasurface Antennas

  • paper_url: http://arxiv.org/abs/2310.19767
  • repo_url: None
  • paper_authors: Kyriakos Stylianopoulos, Murat Bayraktar, Nuria González Prelcic, George C. Alexandropoulos
  • for: User localization and tracking in next-generation wireless networks equipped with Dynamic Metasurface Antennas (DMAs).
  • methods: A two-stage machine-learning approach for non-line-of-sight multipath settings: an attention-based neural network maps noisy channel responses to candidate user positions, and a learnable autoregressive model exploits time-correlated channel information to produce the final position estimates.
  • results: Numerical evaluations show high positioning accuracy across various multipath settings despite LoS blockage.
    Abstract User localization and tracking in the upcoming generation of wireless networks have the potential to be revolutionized by technologies such as the Dynamic Metasurface Antennas (DMAs). Commonly proposed algorithmic approaches rely on assumptions about relatively dominant Line-of-Sight (LoS) paths, or require pilot transmission sequences whose length is comparable to the number of DMA elements, thus, leading to limited effectiveness and considerable measurement overheads in blocked LoS and dynamic multipath environments. In this paper, we present a two-stage machine-learning-based approach for user tracking, specifically designed for non-LoS multipath settings. A newly proposed attention-based Neural Network (NN) is first trained to map noisy channel responses to potential user positions, regardless of user mobility patterns. This architecture constitutes a modification of the prominent vision transformer, specifically modified for extracting information from high-dimensional frequency response signals. As a second stage, the NN's predictions for the past user positions are passed through a learnable autoregressive model to exploit the time-correlated channel information and obtain the final position predictions. The channel estimation procedure leverages a DMA receive architecture with partially-connected radio frequency chains, which results to reduced numbers of pilots. The numerical evaluation over an outdoor ray-tracing scenario illustrates that despite LoS blockage, this methodology is capable of achieving high position accuracy across various multipath settings.
    Summary: A modified vision-transformer-style attention network maps noisy DMA channel responses to candidate positions, and a learnable autoregressive model over past predictions exploits temporal channel correlation; with a partially connected DMA receive architecture that needs few pilots, the method achieves high positioning accuracy in ray-traced outdoor scenarios despite LoS blockage.

Epidemic outbreak prediction using machine learning models

  • paper_url: http://arxiv.org/abs/2310.19760
  • repo_url: None
  • paper_authors: Akshara Pramod, JS Abhishek, Dr. Suganthi K
  • for: Predicting outbreaks of influenza, hepatitis, and malaria in the state of New York so that authorities and healthcare organizations can prepare medications and logistics in advance.
  • methods: Machine- and deep-learning models, served through a portal, predict the expected number of cases five weeks into the future from historical data; non-clinical factors such as Google search trends, social media data, and weather data are also used to estimate the probability of an outbreak.
  • results: The system provides five-week-ahead case forecasts and outbreak alerts that can help local authorities and healthcare organizations prepare their response.
    Abstract In today's world, the risk of emerging and re-emerging epidemics has increased. The recent advancement in healthcare technology has made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities to be prepared with the necessary medications and logistics required to keep things in control. In this article, we try to predict the epidemic outbreak (influenza, hepatitis and malaria) for the state of New York, USA using machine and deep learning algorithms, and a portal has been created for the same which can alert the authorities and health care organizations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors like Google search trends, social media data and weather data have also been used to predict the probability of an outbreak.
    Summary: Machine- and deep-learning models, exposed through a portal, forecast influenza, hepatitis, and malaria cases in New York State five weeks ahead from historical data, with Google search trends, social media, and weather data used to estimate outbreak probability.
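
The forecasting setup described above amounts to a standard supervised windowing of weekly counts plus exogenous signals. The sketch below only illustrates that framing; the window length, covariates, and model choice in the paper may differ, and the data here are synthetic.

```python
# Sketch of the supervised framing implied by the abstract: from weekly case
# counts (optionally joined with search-trend and weather covariates), build
# (history window -> next 5 weeks) training pairs for any regressor.
import numpy as np

def make_windows(series, covariates, history=12, horizon=5):
    """Return X of shape (n, history * n_features) and Y of shape (n, horizon)."""
    feats = np.column_stack([series] + list(covariates))
    X, Y = [], []
    for t in range(history, len(series) - horizon + 1):
        X.append(feats[t - history:t].ravel())
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

weeks = 150
cases = np.random.default_rng(0).poisson(50, size=weeks).astype(float)
trends = np.random.default_rng(1).random(weeks)       # stand-in for a search-trend signal
X, Y = make_windows(cases, [trends])
print(X.shape, Y.shape)                               # (134, 24) (134, 5)
```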

Differentially Private Reward Estimation with Preference Feedback

  • paper_url: http://arxiv.org/abs/2310.19733
  • repo_url: None
  • paper_authors: Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan
  • for: Aligning generative models with human interests via preference-based feedback while protecting the privacy of human labelers.
  • methods: Reinforcement learning with human feedback (RLHF) combined with label differential privacy (DP); the latent reward parameter $\theta^* \in \mathbb{R}^d$ is estimated from pairwise comparison feedback under the parametric Bradley-Terry-Luce (BTL) model.
  • results: Tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central DP models: for privacy budget $\epsilon$ and $n$ samples, the additional cost of label-DP is $\Theta\big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$ under the local model and $\Theta\big(\frac{\text{poly}(d)}{\epsilon n}\big)$ under the weaker central model; simulations on synthetic data corroborate the theory.
    Abstract Learning from preference-based feedback has recently gained considerable traction as a promising approach to align generative models with human interests. Instead of relying on numerical rewards, the generative models are trained using reinforcement learning with human feedback (RLHF). These approaches first solicit feedback from human labelers typically in the form of pairwise comparisons between two possible actions, then estimate a reward model using these comparisons, and finally employ a policy based on the estimated reward model. An adversarial attack in any step of the above pipeline might reveal private and sensitive information of human labelers. In this work, we adopt the notion of label differential privacy (DP) and focus on the problem of reward estimation from preference-based feedback while protecting privacy of each individual labelers. Specifically, we consider the parametric Bradley-Terry-Luce (BTL) model for such pairwise comparison feedback involving a latent reward parameter $\theta^* \in \mathbb{R}^d$. Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP. We show, for a given privacy budget $\epsilon$ and number of samples $n$, that the additional cost to ensure label-DP under local model is $\Theta \big(\frac{1}{ e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$, while it is $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ under the weaker central model. We perform simulations on synthetic data that corroborate these theoretical results.
    Summary: For reward estimation from pairwise comparisons under the BTL model with label differential privacy, the additional estimation cost is $\Theta\big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$ under the local model and $\Theta\big(\frac{\text{poly}(d)}{\epsilon n}\big)$ under the central model, with simulations on synthetic data matching the theory.
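
A small sketch of the setting may help: the BTL likelihood below is the standard parametric model named in the abstract, and the label-flipping step is textbook randomized response for local label-DP, shown only to illustrate where privacy noise enters. It is not the paper's estimator, and the naive fit shown is biased unless the flipping is corrected for.

```python
# Sketch: BTL pairwise-comparison likelihood with locally label-DP responses.
# theta* in R^d is the latent reward parameter; a comparison with feature
# difference x_a - x_b yields y = 1{a preferred}, P(y=1) = sigmoid(<theta*, x_a - x_b>).
# Under local label-DP, each label is flipped with prob 1/(e^eps + 1) before release.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def privatize_labels(y, eps, rng):
    flip = rng.random(len(y)) < 1.0 / (np.exp(eps) + 1.0)
    return np.where(flip, 1 - y, y)

def fit_btl(diffs, y, lr=0.1, steps=2000):
    """Maximum-likelihood fit of theta by gradient ascent on the BTL log-likelihood."""
    theta = np.zeros(diffs.shape[1])
    for _ in range(steps):
        p = sigmoid(diffs @ theta)
        theta += lr * diffs.T @ (y - p) / len(y)
    return theta

rng = np.random.default_rng(0)
d, n, eps = 5, 4000, 1.0
theta_star = rng.normal(size=d)
diffs = rng.normal(size=(n, d))                  # x_a - x_b for each comparison
y = (rng.random(n) < sigmoid(diffs @ theta_star)).astype(int)
y_priv = privatize_labels(y, eps, rng)           # what a label-DP pipeline observes
theta_hat = fit_btl(diffs, y_priv)
print(np.linalg.norm(theta_hat - theta_star))    # estimation error (biased without debiasing)
```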

Support matrix machine: A review

  • paper_url: http://arxiv.org/abs/2310.19717
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Anuradha Kumari, Mushir Akhtar, Rupal Shah, M. Tanveer
  • for: This review is aimed at academics and researchers who work with matrix-valued input data and want SVM-style classifiers that respect that structure.
  • methods: The paper surveys the support matrix machine (SMM), which handles matrix inputs directly and preserves their structural information through the spectral elastic net property, a combination of the nuclear norm and Frobenius norm.
  • results: It provides an in-depth account of the development of the SMM model, including robust, sparse, class-imbalance, and multi-class variants, discusses applications, and outlines potential future research directions.
    Abstract Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.
    Summary: A review of the support matrix machine, which classifies matrix-valued inputs directly by combining the nuclear and Frobenius norms (the spectral elastic net) to preserve structural information, covering robust, sparse, class-imbalance, and multi-class variants, applications, and open research directions.
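
For reference, a commonly used formulation of the SMM objective combines a hinge loss on matrix inputs with the spectral elastic net penalty described above; the notation below follows the usual presentation and should be checked against the review for the specific variant of interest.

```latex
% A standard SMM objective (binary labels y_i \in \{\pm 1\}, matrix inputs X_i,
% regression matrix W, bias b, trade-off parameters \tau and C):
\min_{W,\,b}\;
  \underbrace{\tfrac{1}{2}\,\mathrm{tr}\!\left(W^{\top} W\right)
   + \tau \lVert W \rVert_{*}}_{\text{spectral elastic net}}
  \;+\; C \sum_{i=1}^{n}
  \max\!\Bigl(0,\; 1 - y_i\bigl[\mathrm{tr}(W^{\top} X_i) + b\bigr]\Bigr).
% The nuclear norm \lVert W \rVert_* encourages a low-rank (structure-aware) W,
% while tr(W^T W) = \lVert W \rVert_F^2 keeps the problem strongly convex.
```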

Exact Recovery and Bregman Hard Clustering of Node-Attributed Stochastic Block Model

  • paper_url: http://arxiv.org/abs/2310.19854
  • repo_url: None
  • paper_authors: Maximilien Dreveton, Felipe S. Fernandes, Daniel R. Figueiredo
  • for: Recovering community labels in networks whose nodes also carry attributes correlated with the clustering structure.
  • methods: An information-theoretic criterion for exact recovery and an iterative clustering algorithm that maximizes the joint likelihood of network interactions and node attributes, assuming exponential-family distributions and exploiting their connection to Bregman divergences.
  • results: Extensive experiments on synthetic data show the proposed algorithm outperforms classic algorithms that use only network or only attribute information, as well as state-of-the-art algorithms that use both.
    Abstract Network clustering tackles the problem of identifying sets of nodes (communities) that have similar connection patterns. However, in many scenarios, nodes also have attributes that are correlated with the clustering structure. Thus, network information (edges) and node information (attributes) can be jointly leveraged to design high-performance clustering algorithms. Under a general model for the network and node attributes, this work establishes an information-theoretic criterion for the exact recovery of community labels and characterizes a phase transition determined by the Chernoff-Hellinger divergence of the model. The criterion shows how network and attribute information can be exchanged in order to have exact recovery (e.g., more reliable network information requires less reliable attribute information). This work also presents an iterative clustering algorithm that maximizes the joint likelihood, assuming that the probability distribution of network interactions and node attributes belong to exponential families. This covers a broad range of possible interactions (e.g., edges with weights) and attributes (e.g., non-Gaussian models), as well as sparse networks, while also exploring the connection between exponential families and Bregman divergences. Extensive numerical experiments using synthetic data indicate that the proposed algorithm outperforms classic algorithms that leverage only network or only attribute information as well as state-of-the-art algorithms that also leverage both sources of information. The contributions of this work provide insights into the fundamental limits and practical techniques for inferring community labels on node-attributed networks.
    Summary: For node-attributed stochastic block models, an information-theoretic criterion based on a Chernoff-Hellinger divergence characterizes exact recovery of community labels, showing how network and attribute information can be traded off, and an exponential-family (Bregman) hard clustering algorithm outperforms methods that use only one information source.

Convolutional State Space Models for Long-Range Spatiotemporal Modeling

  • paper_url: http://arxiv.org/abs/2310.19694
  • repo_url: None
  • paper_authors: Jimmy T. H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon
  • for: Modeling long spatiotemporal sequences, which requires capturing complex spatial correlations and long-range temporal dependencies simultaneously.
  • methods: Convolutional state space models (ConvSSMs) combine the tensor-modeling ideas of ConvLSTM with the long-sequence modeling of state space methods such as S4 and S5; parallel scans over the convolutional recurrence give subquadratic parallelization and fast autoregressive generation.
  • results: ConvS5 outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3x faster than ConvLSTM and generating samples 400x faster than Transformers, and it matches or exceeds state-of-the-art methods on the DMLab, Minecraft, and Habitat prediction benchmarks.
    Abstract Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks and enables new directions for modeling long spatiotemporal sequences.
    Summary: ConvSSMs combine ConvLSTM-style tensor states with S4/S5-style state space modeling; parallel scans make training subquadratic, and the resulting ConvS5 model trains 3x faster than ConvLSTM, generates samples 400x faster than Transformers on long-horizon Moving-MNIST, and matches or exceeds state-of-the-art methods on DMLab, Minecraft, and Habitat benchmarks.
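
A schematic of the convolutional state-space recurrence, written sequentially for clarity: because the state update is linear in the state, it admits the parallel scan mentioned above. This is a minimal sketch, not ConvS5's actual parameterization, initialization, or scan implementation.

```python
# Schematic convolutional state-space recurrence (sequential reference
# implementation). State, input, and output are spatial feature maps.
import torch
import torch.nn as nn

class ConvSSMCell(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.A = nn.Conv2d(channels, channels, kernel_size, padding=pad, bias=False)
        self.B = nn.Conv2d(channels, channels, kernel_size, padding=pad, bias=False)
        self.C = nn.Conv2d(channels, channels, kernel_size, padding=pad, bias=False)

    def forward(self, inputs):                     # inputs: (T, B, C, H, W)
        state = torch.zeros_like(inputs[0])
        outputs = []
        for u_t in inputs:                         # linear recurrence in the state,
            state = self.A(state) + self.B(u_t)    # so it admits a parallel scan
            outputs.append(self.C(state))
        return torch.stack(outputs)

x = torch.randn(8, 2, 16, 32, 32)                  # 8 frames, batch 2, 16 channels
print(ConvSSMCell(16)(x).shape)                    # torch.Size([8, 2, 16, 32, 32])
```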

Towards Practical Non-Adversarial Distribution Alignment via Variational Bounds

  • paper_url: http://arxiv.org/abs/2310.19690
  • repo_url: None
  • paper_authors: Ziyu Gong, Ben Usman, Han Zhao, David I. Inouye
  • for: Learning invariant representations, with applications to fairness and robustness.
  • methods: A non-adversarial, VAE-based alignment method derived from variational bounds that can be applied to any model pipeline.
  • results: The proposed alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures, significantly broadening the applicability of non-adversarial alignment methods.
    Abstract Distribution alignment can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial alignment methods but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for alignment. To overcome these limitations, we propose a non-adversarial VAE-based alignment method that can be applied to any model pipeline. We develop a set of alignment upper bounds (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based alignment approaches both theoretically and empirically. Finally, we demonstrate that our novel alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures -- thereby significantly broadening the applicability of non-adversarial alignment methods.
    Summary: The paper derives VAE-style variational upper bounds (including a noisy bound) for distribution alignment that avoid unstable adversarial minimax training, compares them theoretically and empirically with prior VAE-based alignment approaches, and shows the new losses can replace adversarial losses in standard invariant representation learning pipelines without architectural changes.

DGFN: Double Generative Flow Networks

  • paper_url: http://arxiv.org/abs/2310.19685
  • repo_url: None
  • paper_authors: Elaine Lau, Nikhil Vemgal, Doina Precup, Emmanuel Bengio
  • for: Drug discovery with deep learning, in particular generating diverse candidates with Generative Flow Networks (GFlowNets/GFNs).
  • methods: Double Generative Flow Networks (DGFNs): drawing on reinforcement learning and Double Deep Q-Learning, a target network is used to sample trajectories while the main network is updated on those sampled trajectories.
  • results: Empirical results show that DGFNs effectively enhance exploration in sparse-reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.
    Abstract Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from reinforcement learning and Double Deep Q-Learning, we introduce a target network used to sample trajectories, while updating the main network with these sampled trajectories. Empirical results confirm that DGFNs effectively enhance exploration in sparse reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.
    Summary: DGFNs adapt the double-network idea from Double Deep Q-Learning to GFlowNets, sampling trajectories with a target network while updating the main network on them, which improves exploration in sparse-reward, high-dimensional de-novo molecular design.

Density Estimation for Entry Guidance Problems using Deep Learning

  • paper_url: http://arxiv.org/abs/2310.19684
  • repo_url: None
  • paper_authors: Jens A. Rataczak, Davide Amato, Jay W. McMahon
  • for: Estimating atmospheric density profiles for planetary entry guidance with deep learning.
  • methods: A long short-term memory (LSTM) network learns the mapping from onboard measurements (spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement) to the density profile; training data come from a Monte Carlo analysis of a Mars entry mission with the fully numerical predictor-corrector guidance (FNPEG) algorithm, with truth profiles sampled from MarsGRAM and a curriculum learning procedure for refinement.
  • results: The trained LSTM both predicts the density profile the vehicle will fly through and reconstructs the profile already flown; integrating it into FNPEG yields better terminal accuracy than an exponential density model, with or without a first-order fading-memory filter, for both noisy and noiseless measurements.
    Abstract This work presents a deep-learning approach to estimate atmospheric density profiles for use in planetary entry guidance problems. A long short-term memory (LSTM) neural network is trained to learn the mapping between measurements available onboard an entry vehicle and the density profile through which it is flying. Measurements include the spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement. Training data for the network is initially generated by performing a Monte Carlo analysis of an entry mission at Mars using the fully numerical predictor-corrector guidance (FNPEG) algorithm that utilizes an exponential density model, while the truth density profiles are sampled from MarsGRAM. A curriculum learning procedure is developed to refine the LSTM network's predictions for integration within the FNPEG algorithm. The trained LSTM is capable of both predicting the density profile through which the vehicle will fly and reconstructing the density profile through which it has already flown. The performance of the FNPEG algorithm is assessed for three different density estimation techniques: an exponential model, an exponential model augmented with a first-order fading-memory filter, and the LSTM network. Results demonstrate that using the LSTM model results in superior terminal accuracy compared to the other two techniques when considering both noisy and noiseless measurements.
    Summary: An LSTM trained on FNPEG Monte Carlo entry trajectories with MarsGRAM truth profiles predicts and reconstructs atmospheric density from onboard measurements, and using it inside the FNPEG guidance algorithm yields better terminal accuracy than exponential density models with or without a fading-memory filter.
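
A minimal sketch of the measurement-to-density mapping, assuming a flat vector of onboard measurements per time step; the feature dimensions, output parameterization, and training pipeline here are illustrative rather than the paper's.

```python
# Sketch: an LSTM consumes onboard measurements over time and regresses the
# local atmospheric (log-)density at each step. Sizes are placeholders.
import torch
import torch.nn as nn

class DensityLSTM(nn.Module):
    def __init__(self, n_meas=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_meas, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, meas):                  # meas: (batch, time, n_meas)
        h, _ = self.lstm(meas)
        return self.head(h).squeeze(-1)       # predicted log-density per step

model = DensityLSTM()
meas = torch.randn(4, 200, 10)                # e.g. spherical state, sensed accel., surface pressure
rho_log = model(meas)
print(rho_log.shape)                          # torch.Size([4, 200])
```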

An Online Bootstrap for Time Series

  • paper_url: http://arxiv.org/abs/2310.19683
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Nicolai Palm, Thomas Nagler
  • for: Resampling-based uncertainty quantification for real-time analysis of large streams of dependent data, such as time series or spatially correlated observations.
  • methods: An online bootstrap based on an autoregressive sequence of increasingly dependent resampling weights, designed to account for data dependencies and to be executed online.
  • results: The scheme is proven theoretically valid under general conditions, and extensive simulations show reliable uncertainty quantification even under complex data dependencies, bridging classical resampling techniques and modern data-rich, dynamic environments.
    Abstract Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
    Summary: The proposed online bootstrap uses an autoregressive sequence of increasingly dependent resampling weights, is proven valid under general conditions, and provides reliable uncertainty quantification for dependent data streams in simulations.
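
One simple way to realize autoregressively dependent multiplier weights in an online fashion is sketched below; the AR(1) weight process and the class interface are assumptions made for illustration and are not the paper's construction or its guarantees.

```python
# Illustrative online multiplier bootstrap with AR(1)-dependent weights.
# Each of the B bootstrap replicates maintains a running weighted mean that is
# updated as observations stream in.
import numpy as np

class OnlineDependentBootstrap:
    def __init__(self, n_boot=200, rho=0.9, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.rho = rho
        self.w = np.ones(n_boot)          # current multiplier weight per replicate
        self.sum_wx = np.zeros(n_boot)    # running sum of w_t * x_t
        self.sum_w = np.zeros(n_boot)     # running sum of w_t

    def update(self, x):
        # AR(1) weights with mean 1: w_t = 1 + rho*(w_{t-1} - 1) + noise.
        noise = self.rng.normal(scale=np.sqrt(1 - self.rho**2), size=self.w.shape)
        self.w = 1.0 + self.rho * (self.w - 1.0) + noise
        self.sum_wx += self.w * x
        self.sum_w += self.w

    def ci(self, alpha=0.05):
        boot_means = self.sum_wx / np.maximum(self.sum_w, 1e-12)
        return np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
boot = OnlineDependentBootstrap(rng=rng)
x = 0.0
for _ in range(5000):                     # AR(1) data stream
    x = 0.7 * x + rng.normal()
    boot.update(x)
print(boot.ci())                          # interval for the long-run mean
```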

HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding

  • paper_url: http://arxiv.org/abs/2310.19676
  • repo_url: None
  • paper_authors: Giorgio Angelotti
  • for: Injecting relative positional information into the permutation-invariant attention mechanism of Transformer architectures.
  • methods: Hyperbolic Positional Encoding (HyPE) uses properties of hyperbolic functions to encode tokens' relative positions, biasing attention without storing the $O(L^2)$ mask values ($L$ being the input length); it relies on preliminary concatenation operations and matrix multiplications, is compatible with FlashAttention-2, and supports gradient backpropagation through any learnable parameters in the encoding.
  • results: Analytically, with careful hyperparameter selection HyPE can approximate the attention bias of ALiBi, suggesting promising generalization to contexts longer than those seen during pretraining; experimental evaluation is proposed as future work.
    Abstract In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed or learnable parameters. We introduce Hyperbolic Positional Encoding (HyPE), a novel method that utilizes hyperbolic functions' properties to encode tokens' relative positions. This approach biases the attention mechanism without the necessity of storing the $O(L^2)$ values of the mask, with $L$ being the length of the input sequence. HyPE leverages preliminary concatenation operations and matrix multiplications, facilitating the encoding of relative distances indirectly incorporating biases into the softmax computation. This design ensures compatibility with FlashAttention-2 and supports the gradient backpropagation for any potential learnable parameters within the encoding. We analytically demonstrate that, by careful hyperparameter selection, HyPE can approximate the attention bias of ALiBi, thereby offering promising generalization capabilities for contexts extending beyond the lengths encountered during pretraining. The experimental evaluation of HyPE is proposed as a direction for future research.
    Summary: HyPE encodes relative token positions through hyperbolic functions applied via concatenations and matrix multiplications, avoiding the $O(L^2)$ bias mask while remaining compatible with FlashAttention-2; with suitable hyperparameters it can approximate ALiBi's attention bias, suggesting good length generalization beyond the pretraining context.
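
The key enabling fact is that hyperbolic functions turn per-position features into relative-position quantities under inner products, via $\cosh(a-b)=\cosh a\cosh b-\sinh a\sinh b$. The snippet below only demonstrates that identity in the concatenate-then-multiply style the abstract describes; the paper's actual feature construction, bias shape, and learnable parameters may differ.

```python
# Demo: a *relative* quantity (i - j) emerges from per-position features
# combined by an inner product, using cosh(a - b) = cosh(a)cosh(b) - sinh(a)sinh(b).
import numpy as np

L, lam = 6, 0.1
i = np.arange(L)[:, None]                                # query positions
j = np.arange(L)[None, :]                                # key positions

q_feat = np.concatenate([np.cosh(lam * i), -np.sinh(lam * i)], axis=1)    # (L, 2)
k_feat = np.concatenate([np.cosh(lam * j.T), np.sinh(lam * j.T)], axis=1)

bias_from_features = q_feat @ k_feat.T                   # (L, L), built by matmul
bias_direct = np.cosh(lam * (i - j))                     # depends only on i - j
print(np.allclose(bias_from_features, bias_direct))      # True
```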

Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

  • paper_url: http://arxiv.org/abs/2310.19666
  • repo_url: https://github.com/wzhut/dynamic-tensor-decomposition-via-neural-diffusion-reaction-processes
  • paper_authors: Zheng Wang, Shikai Fang, Shibo Li, Shandian Zhe
  • for: This paper proposes Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE), a method for dynamic tensor decomposition that captures both the commonalities and the personalities of the entities in the tensor.
  • methods: The proposed method uses a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode, and a neural network to model the entry value as a nonlinear function of the embedding trajectories.
  • results: The proposed method shows advantages in both a simulation study and real-world applications, capturing the underlying temporal structure of the data more effectively than existing methods.
    Abstract Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation study and real-world applications. The code is available at https://github.com/wzhut/Dynamic-Tensor-Decomposition-via-Neural-Diffusion-Reaction-Processes.
    Summary: DEMOTE builds a multi-partite graph over the observed tensor entries, co-evolves entity embeddings through a graph diffusion process together with per-entity neural reaction processes, models entry values as nonlinear functions of the embedding trajectories, and is trained with an ODE-solver-based stochastic mini-batch algorithm using stratified sampling; it outperforms baselines in simulations and real-world applications.

Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model

  • paper_url: http://arxiv.org/abs/2310.19849
  • repo_url: https://github.com/eurekazhu/diffaffinity
  • paper_authors: Shiwei Liu, Tian Zhu, Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang
  • for: Predicting the effect of amino acid mutations on protein-protein binding, which is vital for protein engineering and therapeutic discovery.
  • methods: A representation-learning approach trained on unlabelled experimental protein structures: a Riemannian diffusion model learns the generative process of side-chain conformations and provides structural-context representations of mutations on the protein-protein interface.
  • results: State-of-the-art performance in predicting mutational effects on protein-protein binding; SidechainDiff is the first diffusion-based generative model for side-chains, in contrast with prior work focused on protein backbone structures.
    Abstract Many crucial biological processes rely on networks of protein-protein interactions. Predicting the effect of amino acid mutations on protein-protein binding is vital in protein engineering and therapeutic discovery. However, the scarcity of annotated experimental data on binding energy poses a significant challenge for developing computational approaches, particularly deep learning-based methods. In this work, we propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures. SidechainDiff utilizes a Riemannian diffusion model to learn the generative process of side-chain conformations and can also give the structural context representations of mutations on the protein-protein interface. Leveraging the learned representations, we achieve state-of-the-art performance in predicting the mutational effects on protein-protein binding. Furthermore, SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.
    Summary: SidechainDiff learns a Riemannian diffusion model of side-chain conformations from unlabelled protein structures and uses the learned structural-context representations of interface mutations to achieve state-of-the-art prediction of mutational effects on protein-protein binding.

Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity

  • paper_url: http://arxiv.org/abs/2310.19614
  • repo_url: https://github.com/fmi-basel/disinhibitory-control
  • paper_authors: Julian Rossbroich, Friedemann Zenke
  • for: Addressing the credit assignment problem in neuronal circuits.
  • methods: A plausible microcircuit model and a Hebbian learning rule derived within an adaptive control theory framework, with errors assumed to be encoded in top-down dis-inhibitory synaptic afferents.
  • results: Error-modulated learning emerges naturally at the circuit level; the same rule accounts for experimentally observed plasticity in the absence of inhibition and performs comparably to back-propagation of error on several non-linearly separable benchmarks.
    Abstract How neuronal circuits achieve credit assignment remains a central unsolved question in systems neuroscience. Various studies have suggested plausible solutions for back-propagating error signals through multi-layer networks. These purely functionally motivated models assume distinct neuronal compartments to represent local error signals that determine the sign of synaptic plasticity. However, this explicit error modulation is inconsistent with phenomenological plasticity models in which the sign depends primarily on postsynaptic activity. Here we show how a plausible microcircuit model and Hebbian learning rule derived within an adaptive control theory framework can resolve this discrepancy. Assuming errors are encoded in top-down dis-inhibitory synaptic afferents, we show that error-modulated learning emerges naturally at the circuit level when recurrent inhibition explicitly influences Hebbian plasticity. The same learning rule accounts for experimentally observed plasticity in the absence of inhibition and performs comparably to back-propagation of error (BP) on several non-linearly separable benchmarks. Our findings bridge the gap between functional and experimentally observed plasticity rules and make concrete predictions on inhibitory modulation of excitatory plasticity.
    Summary: Assuming errors are encoded in top-down dis-inhibitory afferents, a microcircuit model with a Hebbian rule derived from adaptive control theory yields error-modulated learning at the circuit level, reproduces plasticity observed without inhibition, and performs comparably to back-propagation of error on non-linearly separable benchmarks, bridging functional and experimentally observed plasticity rules.

Efficient Exploration in Continuous-time Model-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.19848
  • repo_url: None
  • paper_authors: Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause
  • for: Model-based reinforcement learning for systems whose underlying dynamics are continuous in time.
  • methods: Continuous-time dynamics are represented with nonlinear ordinary differential equations (ODEs), epistemic uncertainty is captured with well-calibrated probabilistic models, and exploration follows the optimistic principle; the analysis highlights the role of the measurement selection strategy (MSS), i.e., deciding when to observe the system.
  • results: Regret is sublinear when the ODEs are modeled with Gaussian processes under common MSS choices such as equidistant sampling, and a proposed adaptive, data-dependent MSS achieves sublinear regret with significantly fewer samples.
    Abstract Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy(MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.
    Summary: A model-based RL algorithm represents continuous-time dynamics with nonlinear ODEs and calibrated probabilistic models, explores optimistically, and achieves sublinear regret with GP dynamics under common measurement selection strategies; an adaptive, data-dependent strategy attains the same guarantee with far fewer samples, and continuous-time modeling outperforms its discrete-time counterpart on several applications.

On Feynman–Kac training of partial Bayesian neural networks

  • paper_url: http://arxiv.org/abs/2310.19608
  • repo_url: None
  • paper_authors: Zheng Zhao, Sebastian Mair, Thomas B. Schön, Jens Sjölund
  • for: An efficient training strategy for partial Bayesian neural networks (pBNNs), which treat only a subset of the parameters as stochastic.
  • methods: Training is formulated as simulating a Feynman-Kac model, and variations of sequential Monte Carlo samplers estimate the parameters and the latent posterior distribution simultaneously at tractable computational cost.
  • results: On a range of synthetic and real-world datasets, the proposed training scheme outperforms the state of the art in predictive performance.
    Abstract Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent-variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. We show on various synthetic and real-world datasets that our proposed training scheme outperforms the state of the art in terms of predictive performance.
    Summary: Training a partial Bayesian neural network is cast as simulating a Feynman-Kac model, and sequential Monte Carlo samplers jointly estimate the deterministic parameters and the latent posterior at tractable cost, outperforming state-of-the-art training schemes in predictive performance.

Deep Kalman Filters Can Filter

  • paper_url: http://arxiv.org/abs/2310.19603
  • repo_url: https://github.com/rishabhpahuja/Apple-Tracking
  • paper_authors: Blanka Hovart, Anastasis Kratsios, Yannick Limmer, Xuwei Yang
  • for: Deep Kalman filters (DKFs) are neural network models that generate Gaussian probability measures from sequential data, but they have lacked concrete theoretical ties to the stochastic filtering problem.
  • methods: The paper exhibits a class of continuous-time DKFs that can approximately implement the conditional law of a broad class of non-Markovian, conditionally Gaussian signal processes given noisy continuous-time measurements.
  • results: The approximation guarantees hold uniformly over sufficiently regular compact sets of paths, with the error quantified by the worst-case 2-Wasserstein distance over the given compact set.
    Abstract Deep Kalman filters (DKFs) are a class of neural network models that generate Gaussian probability measures from sequential data. Though DKFs are inspired by the Kalman filter, they lack concrete theoretical ties to the stochastic filtering problem, thus limiting their applicability to areas where traditional model-based filters have been used, e.g.\ model calibration for bond and option prices in mathematical finance. We address this issue in the mathematical foundations of deep learning by exhibiting a class of continuous-time DKFs which can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-times measurements. Our approximation results hold uniformly over sufficiently regular compact subsets of paths, where the approximation error is quantified by the worst-case 2-Wasserstein distance computed uniformly over the given compact set of paths.
    Summary: Continuous-time deep Kalman filters are shown to approximately implement the conditional law of a broad class of non-Markovian, conditionally Gaussian signal processes given noisy continuous-time measurements, with approximation error measured uniformly by the worst-case 2-Wasserstein distance over compact sets of paths.

Operator Learning Enhanced Physics-informed Neural Networks for Solving Partial Differential Equations Characterized by Sharp Solutions

  • paper_url: http://arxiv.org/abs/2310.19590
  • repo_url: None
  • paper_authors: Bin Lin, Zhiping Mao, Zhicheng Wang, George Em Karniadakis
  • for: Solving partial differential equations (PDEs) characterized by sharp solutions.
  • methods: Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN): a DeepONet is first trained to learn the solution operator for a set of smooth problems related to the target PDE and is then integrated with a PINN to resolve the sharp-solution problem.
  • results: The method successfully handles the nonlinear diffusion-reaction equation, the Burgers equation, and the incompressible Navier-Stokes equations at high Reynolds number; compared with the vanilla PINN it needs only a small number of residual points, achieves higher accuracy with a robust training process, and also handles ill-posed problems with only partial boundary conditions.
    Abstract Physics-informed Neural Networks (PINNs) have been shown as a promising approach for solving both forward and inverse problems of partial differential equations (PDEs). Meanwhile, the neural operator approach, including methods such as Deep Operator Network (DeepONet) and Fourier neural operator (FNO), has been introduced and extensively employed in approximating solution of PDEs. Nevertheless, to solve problems consisting of sharp solutions poses a significant challenge when employing these two approaches. To address this issue, we propose in this work a novel framework termed Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN). Initially, we utilize DeepONet to learn the solution operator for a set of smooth problems relevant to the PDEs characterized by sharp solutions. Subsequently, we integrate the pre-trained DeepONet with PINN to resolve the target sharp solution problem. We showcase the efficacy of OL-PINN by successfully addressing various problems, such as the nonlinear diffusion-reaction equation, the Burgers equation and the incompressible Navier-Stokes equation at high Reynolds number. Compared with the vanilla PINN, the proposed method requires only a small number of residual points to achieve a strong generalization capability. Moreover, it substantially enhances accuracy, while also ensuring a robust training process. Furthermore, OL-PINN inherits the advantage of PINN for solving inverse problems. To this end, we apply the OL-PINN approach for solving problems with only partial boundary conditions, which usually cannot be solved by the classical numerical methods, showing its capacity in solving ill-posed problems and consequently more complex inverse problems.
    摘要 Physics-informed neural networks (PINNs) 已被证明是求解偏微分方程(PDEs)正问题和反问题的有前途的方法。另一方面,神经算子方法(如 Deep Operator Network (DeepONet) 和 Fourier neural operator (FNO))也已被引入并广泛用于逼近 PDE 的解。然而,当问题具有锐利解时,这两类方法都会面临很大挑战。为解决这一问题,我们在本工作中提出了一个新框架,称为 Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN):首先使用 DeepONet 学习与锐利解 PDE 相关的一组光滑问题的解算子,然后将预训练的 DeepONet 与 PINN 相结合,以求解目标锐利解问题。我们在多个问题上成功应用了 OL-PINN,包括非线性扩散-反应方程、Burgers 方程以及高雷诺数下的不可压缩 Navier-Stokes 方程。与普通 PINN 相比,所提方法只需少量残差点即可获得很强的泛化能力,同时显著提高精度并保证训练过程的稳健性。此外,OL-PINN 继承了 PINN 求解反问题的优势,可以处理仅有部分边界条件、通常无法用经典数值方法求解的问题,从而展示了其求解不适定问题及更复杂反问题的能力。
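
A hedged sketch of the operator-plus-PINN coupling on a 1-D toy problem: a frozen stand-in for the pretrained operator network provides a base solution, and a small PINN correction is trained with a residual loss so that their sum satisfies u''(x) = f(x) with zero boundary values. The operator network here is a plain MLP placeholder (not a real DeepONet), and the equation, loss weights, and training schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
op_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # stand-in for the pretrained operator
for p in op_net.parameters():
    p.requires_grad_(False)                                             # frozen after "pretraining"
pinn = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))     # trainable correction

def f(x):                                   # an oscillatory ("sharp") right-hand side
    return -(20 * torch.pi) ** 2 * torch.sin(20 * torch.pi * x)

opt = torch.optim.Adam(pinn.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, 1, requires_grad=True)           # residual (collocation) points in (0, 1)
    u = op_net(x) + pinn(x)                               # combined solution ansatz
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    xb = torch.tensor([[0.0], [1.0]])
    residual = ((d2u - f(x)) ** 2).mean()                 # PDE residual  u'' - f
    boundary = ((op_net(xb) + pinn(xb)) ** 2).mean()      # u(0) = u(1) = 0
    loss = residual + 100.0 * boundary
    opt.zero_grad(); loss.backward(); opt.step()
```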

Modeling Dynamics over Meshes with Gauge Equivariant Nonlinear Message Passing

  • paper_url: http://arxiv.org/abs/2310.19589
  • repo_url: https://github.com/jypark0/hermes
  • paper_authors: Jung Yeon Park, Lawson L. S. Wong, Robin Walters
  • for: 解决 Computer graphics 和生物physical systems 中数据 sobre non-Euclidean manifolds 问题
  • methods: 使用 gauge equivariant convolutional and attentional architectures on meshes
  • results: 在动力学高度复杂、非线性的曲面 PDE 建模任务上提升了性能;但与非网格情形类似,不同任务下的设计权衡(trade-offs)会使卷积、注意力或消息传递网络各有优劣
    Abstract Data over non-Euclidean manifolds, often discretized as surface meshes, naturally arise in computer graphics and biological and physical systems. In particular, solutions to partial differential equations (PDEs) over manifolds depend critically on the underlying geometry. While graph neural networks have been successfully applied to PDEs, they do not incorporate surface geometry and do not consider local gauge symmetries of the manifold. Alternatively, recent works on gauge equivariant convolutional and attentional architectures on meshes leverage the underlying geometry but underperform in modeling surface PDEs with complex nonlinear dynamics. To address these issues, we introduce a new gauge equivariant architecture using nonlinear message passing. Our novel architecture achieves higher performance than either convolutional or attentional networks on domains with highly complex and nonlinear dynamics. However, similar to the non-mesh case, design trade-offs favor convolutional, attentional, or message passing networks for different tasks; we investigate in which circumstances our message passing method provides the most benefit.
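
The gauge-equivariance machinery in the paper (parallel transport of features between per-vertex frames) is non-trivial; the sketch below only shows the plain nonlinear message-passing skeleton it builds on, with illustrative layer sizes, so the equivariance part is deliberately omitted.

```python
import torch
import torch.nn as nn

class NonlinearMessagePassing(nn.Module):
    """Plain nonlinear message passing over a mesh graph (gauge equivariance omitted)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, h, edge_index):
        src, dst = edge_index                                  # (2, E) directed edges
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))      # nonlinear messages per edge
        agg = torch.zeros_like(h).index_add_(0, dst, m)        # sum incoming messages per vertex
        return h + self.upd(torch.cat([h, agg], dim=-1))       # residual vertex update

# Toy usage on a 4-vertex mesh patch.
h = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2, 3, 1, 2], [1, 2, 3, 0, 0, 1]])
out = NonlinearMessagePassing(16)(h, edge_index)
```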

Model Uncertainty based Active Learning on Tabular Data using Boosted Trees

  • paper_url: http://arxiv.org/abs/2310.19573
  • repo_url: None
  • paper_authors: Sharath M Shankaranarayana
  • for: This paper focuses on active learning for tabular data using boosted trees, with a particular emphasis on measuring model uncertainty and leveraging it for efficient label acquisition.
  • methods: The paper proposes an uncertainty-based sampling strategy for active learning, using entropy as a measure of model uncertainty. Additionally, the authors propose two novel cost-effective active learning methods for regression and classification tasks.
  • results: The authors evaluate the proposed methods on several benchmark datasets and show that their uncertainty-based sampling strategy and cost-effective active learning methods achieve better performance compared to existing methods.
    Abstract Supervised machine learning relies on the availability of good labelled data for model training. Labelled data is acquired by human annotation, which is a cumbersome and costly process, often requiring subject matter experts. Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently by selecting the most valuable data instances for model training and querying the labels only for those instances from the human annotator. Recently, a lot of research has been done in the field of active learning, especially for deep neural network based models. Although deep learning shines when dealing with image\textual\multimodal data, gradient boosting methods still tend to achieve much better results on tabular data. In this work, we explore active learning for tabular data using boosted trees. Uncertainty based sampling in active learning is the most commonly used querying strategy, wherein the labels of those instances are sequentially queried for which the current model prediction is maximally uncertain. Entropy is often the choice for measuring uncertainty. However, entropy is not exactly a measure of model uncertainty. Although there has been a lot of work in deep learning for measuring model uncertainty and employing it in active learning, it is yet to be explored for non-neural network models. To this end, we explore the effectiveness of boosted trees based model uncertainty methods in active learning. Leveraging this model uncertainty, we propose an uncertainty based sampling in active learning for regression tasks on tabular data. Additionally, we also propose a novel cost-effective active learning method for regression tasks along with an improved cost-effective active learning method for classification tasks.
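
A hedged sketch of uncertainty-based pool querying with boosted trees: the paper derives uncertainty measures native to gradient boosting, whereas this toy loop uses the prediction variance of a small bagged ensemble of sklearn `GradientBoostingRegressor` models as a generic stand-in, then queries the most uncertain pool point each round. The dataset, ensemble size, and query budget are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_pool = rng.uniform(-3, 3, size=(500, 2))
y_pool = np.sin(X_pool[:, 0]) + 0.1 * rng.normal(size=500)

labeled = [int(i) for i in rng.choice(len(X_pool), size=20, replace=False)]

for round_ in range(5):
    # Bagged ensemble of boosted trees as a proxy for model uncertainty.
    preds = []
    for seed in range(10):
        idx = rng.choice(labeled, size=len(labeled), replace=True)
        model = GradientBoostingRegressor(random_state=seed).fit(X_pool[idx], y_pool[idx])
        preds.append(model.predict(X_pool))
    uncertainty = np.var(preds, axis=0)
    uncertainty[labeled] = -np.inf                      # never re-query labeled points
    query = int(np.argmax(uncertainty))                 # most uncertain pool point
    labeled.append(query)                               # "ask the annotator" for its label
    print(f"round {round_}: queried index {query}")
```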

DataZoo: Streamlining Traffic Classification Experiments

  • paper_url: http://arxiv.org/abs/2310.19568
  • repo_url: None
  • paper_authors: Jan Luxemburk, Karel Hynek
  • for: 这篇论文主要是为了解决网络流量分类领域缺乏标准benchmark数据集和支持工具的问题。
  • methods: 该论文提出了一个名为DataZoo的工具集,用于加速网络流量分类领域的开发。DataZoo包括了标准化API访问三个大型数据集(CESNET-QUIC22、CESNET-TLS22和CESNET-TLS-Year22),以及feature scaling和realistic dataset partitioning方法。
  • results: 该论文通过DataZoo工具集,使得网络流量分类领域的开发更加容易、更加准确,同时也提高了result的 reproduceability和cross-comparison的能力。
    Abstract The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.
    摘要 machine learning 社区,如计算机视觉或自然语言处理等,已经开发出了许多支持工具和标准评估数据集,以加速开发。而网络流量分类领域却缺乏大多数任务的标准评估数据集,可用的支持软件也很有限。这篇论文想要填补这个空白,并引入数据 zoo,一套用于协调 dataset 管理的工具集。数据 zoo 提供了访问三个广泛的数据集——CESNET-QUIC22、CESNET-TLS22 和 CESNET-TLS-Year22 的标准 API。此外,它还包括特征整形和现实 dataset 分区方法,考虑了时间和服务相关因素。数据 zoo 工具集可以简化实际评估场景的创建,使得cross- comparing 分类方法和重复结果更加容易。

Non-parametric regression for robot learning on manifolds

  • paper_url: http://arxiv.org/abs/2310.19561
  • repo_url: None
  • paper_authors: P. C. Lopez-Custodio, K. Bharath, A. Kucukyilmaz, S. P. Preston
  • for: 本研究旨在提出一种直接在流形上进行回归的方法,以便在机器人学习中处理流形值数据。
  • methods: 该方法是"内在的":在流形上取一个合适的概率分布,令其参数为预测变量(例如时间)的函数,再通过带核的"局部似然"方法对该函数进行非参数估计,称为核化似然估计(kernelised likelihood estimation)。
  • results: 在机器人应用中常见的三类流形值数据上的实验表明,该方法的预测精度优于基于投影的算法。
    Abstract Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms.
    摘要 许多机器人学习工具是为欧几何数据设计。然而,许多机器人应用中的数据是拥有非欧几何结构的。例如,Orientation可以表示为3x3旋转矩阵或 quarternion,这些空间都是非欧几何 manifold。在机器人学习中,把拥有非欧几何结构的数据处理为Euclidean space 是一个常见的做法。这些方法可能会导致预测精度不佳和算法复杂。在这篇论文中,我们提出了一种“内在”的回归方法,可以直接在拥有非欧几何结构的 manifold 上进行。这种方法是基于一个适当的概率分布在拥有非欧几何结构的 manifold 上,使其参数为预测变量,例如时间。然后使用一种“本地概率”方法来估计这个函数,这种方法包含一个核函数。我们称之为核化概率估计。这种方法概念简单,通用于不同的拥有非欧几何结构的 manifold。我们在三种常见的机器人学习中使用了不同类型的拥有非欧几何结构的数据进行实验,实验结果表明这种方法的预测精度比 projection-based 算法更高。
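
A small numpy illustration of the "intrinsic" idea for orientation data: a kernel-weighted mean of unit quaternions at a query time, computed as the top eigenvector of the weighted outer-product matrix (the weighted chordal mean, which is also insensitive to the q vs. -q sign ambiguity). This is only one simple instance; the paper's kernelised likelihood estimator fits a full distribution on the manifold, and the Gaussian kernel and bandwidth below are assumptions.

```python
import numpy as np

def kernel_quaternion_mean(times, quats, t0, bandwidth=0.1):
    """Kernel-weighted mean orientation at time t0 from unit quaternions quats (N, 4)."""
    w = np.exp(-0.5 * ((times - t0) / bandwidth) ** 2)          # Gaussian kernel weights
    M = (w[:, None, None] * np.einsum("ni,nj->nij", quats, quats)).sum(axis=0)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -1]                                          # eigenvector of the largest eigenvalue

# Toy usage: orientations rotating about the z-axis over time, observed with noise.
rng = np.random.default_rng(0)
times = np.linspace(0, 1, 50)
angles = 0.5 * np.pi * times
quats = np.stack([np.cos(angles / 2), np.zeros_like(angles),
                  np.zeros_like(angles), np.sin(angles / 2)], axis=1)
quats += 0.02 * rng.normal(size=quats.shape)
quats /= np.linalg.norm(quats, axis=1, keepdims=True)
print(kernel_quaternion_mean(times, quats, t0=0.5))
```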

Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification

  • paper_url: http://arxiv.org/abs/2310.19558
  • repo_url: None
  • paper_authors: Yiwei Li, Chien-Wei Huang, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek
  • for: 这篇论文关注 Federated Learning (FL) 中一类损失函数非凸且非光滑的问题。这类问题在 FL 应用中很普遍,但由于其复杂的非凸性和非光滑性,同时还需要兼顾通信效率和隐私保证,因此很难处理。
  • methods: 论文提出了一种基于 primal-dual 算法的联邦学习方法,其特点是双向模型稀疏化,从而获得更好的通信效率;同时应用差分隐私技术以提供强隐私保证。
  • results: 在真实数据上的实验显示,所提出的联邦学习方法明显优于若干现有的 FL 算法;实验还验证了论文中的全部分析结果和性质。
    Abstract Federated learning (FL) has been recognized as a rapidly growing research area, where the model is trained over massively distributed clients under the orchestration of a parameter server (PS) without sharing clients' data. This paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, that are prevalent in FL applications but challenging to handle due to their intricate non-convexity and non-smoothness nature and the conflicting requirements on communication efficiency and privacy protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored for non-convex and non-smooth FL problems, and differential privacy is applied for strong privacy guarantee. Its unique insightful properties and some privacy and convergence analyses are also presented for the FL algorithm design guidelines. Extensive experiments on real-world data are conducted to demonstrate the effectiveness of the proposed algorithm and much superior performance than some state-of-the-art FL algorithms, together with the validation of all the analytical results and properties.

Approximation Theory, Computing, and Deep Learning on the Wasserstein Space

  • paper_url: http://arxiv.org/abs/2310.19548
  • repo_url: None
  • paper_authors: Massimo Fornasier, Pascal Heid, Giacomo Enrico Sodini
  • for: 本研究探讨了使用机器学习方法数值逼近定义在概率空间上的 Sobolev 光滑函数(以 Wasserstein 距离函数为典型例子)的问题。
  • methods: 本研究采用了三种机器学习方法来定义函数逼近:1. 求解有限个最优运输问题并计算相应的 Wasserstein potentials;2. 在 Wasserstein Sobolev 空间中使用带 Tikhonov 正则化的 empirical risk minimization;3. 通过刻画 Tikhonov 泛函弱形式 Euler-Lagrange 方程的鞍点形式来求解。
  • results: 本研究为上述每种解法给出了显式且定量的泛化误差界。证明中利用了度量 Sobolev 空间理论,并结合最优运输、变分法和大偏差界等技术。数值实现使用专门设计的神经网络作为基函数;这些网络训练后可以快速求值,因此在同等精度下,所构造的解的求值速度比当前最先进方法快几个数量级。
    Abstract The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, our constructive solutions significantly enhance at equal accuracy the evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude.
    摘要 “函数approximation在无穷dimensional空间中从finite samples中进行approximation是广泛认为是困难的挑战。在这种研究中,我们对 Sobolev-smooth 函数定义在概率空间上进行数值 aproximation 进行了研究。我们的特定关注点在于 Wasserstein 距离函数,它是一个有 relevance 的例子。相比之前的文献中关注点在于快速地计算点 wise 评估,我们采取了三种机器学习基于方法:1. 解决 finite 个优质 transport 问题,并计算相应的 Wasserstein 潜 potential。2. 使用 Tikhonov 补做 regularization 在 Wasserstein Sobolev 空间中进行 Empirical Risk Minimization。3. 通过 saddle point 表示法,解决这个问题。在证明中,我们利用了 metric Sobolev 空间理论和optimal transport 技术,并将其与variational calculus 和大数据准则绑定在一起。在数值实现中,我们使用适应设计的神经网络作为基函数。这些神经网络进行训练,并通过不同的方法进行训练。这种方法使得我们可以获得高度精度的函数 aproximation,并且可以在训练后快速地计算这些函数。因此,我们的构建解决方案可以在同等精度下大幅提高评估速度,超过现有方法几个数量级。”

On consequences of finetuning on data with highly discriminative features

  • paper_url: http://arxiv.org/abs/2310.19537
  • repo_url: None
  • paper_authors: Wojciech Masarczyk, Tomasz Trzciński, Mateusz Ostaszewski
  • for: 这篇论文主要是为了探讨在传输学习时, neural network 是否会忽略先前学习的特征,以及这种现象对网络性能和内部表示的影响。
  • methods: 作者使用了多种方法来分析传输学习中的特征衰退现象,包括网络性能测试、特征重要性分析和内部表示分析等。
  • results: 研究发现,在传输学习中,网络倾向于优先学习基本数据模式,导致已经学习的特征被忽略,从而影响网络的性能和内部表示。
    Abstract In the era of transfer learning, training neural networks from scratch is becoming obsolete. Transfer learning leverages prior knowledge for new tasks, conserving computational resources. While its advantages are well-documented, we uncover a notable drawback: networks tend to prioritize basic data patterns, forsaking valuable pre-learned features. We term this behavior "feature erosion" and analyze its impact on network performance and internal representations.
    摘要 在转移学习时代,从头开始训练神经网络已成为过时。转移学习利用了先前学习的知识,以便应用于新任务,减少计算资源。虽然它的优点已经很好地记录下来,但我们发现了一个明显的缺点:神经网络往往强调基本数据模式,抛弃有价值的先前学习特征。我们称这种行为为“特征蚀减”,并分析其对网络性能和内部表示的影响。

Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation

  • paper_url: http://arxiv.org/abs/2310.19536
  • repo_url: None
  • paper_authors: Jialin Liu, Xinyan Su, Zeyu He, Xiangyu Zhao, Jun Li
  • for: 本研究目标是学习奖励(LTR),即在 reinforcement learning 中学习用户奖励。
  • methods: 我们提出了一种批量反 inverse reinforcement learning 方法,利用折扣站点分布 corrections 结合 LTR 和 recommender agent 评估。我们还利用 Bellman 变换和 KL 正则化来保持 consecutive policy 更新的 Compositional requirement。
  • results: 我们在两个实际数据集上进行了实验,结果显示,我们的方法可以相对提高效iveness(2.3%)和效率(11.53%)。
    Abstract Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We use two real-world datasets which represent two compositional coverage to conduct empirical studies, the results also show that the proposed method relatively improves both effectiveness (2.3\%) and efficiency (11.53\%)
    摘要 奖励是用户满意度的度量,也是交互式推荐系统中的限制因素。在本研究中,我们关注学习奖励(learning to reward, LTR)问题,这是强化学习中的基础问题。以往的方法要么为学习奖励引入额外的过程,从而增加优化的复杂度,要么假设用户与代理的交互提供了完美的示范,而这在实践中并不可行。理想情况下,我们希望用一种统一的方法,利用组合式示范同时优化奖励和策略。然而这带来一个挑战:奖励本质上是对在策略(on-policy)用户反馈的量化,而推荐代理近似的是离策略(off-policy)的未来累积价值。为解决这一挑战,我们提出了一种能满足上述要求的新的批量逆强化学习范式:利用折扣平稳分布校正,将 LTR 与推荐代理的评估结合起来;为满足组合式要求,通过保守性引入悲观主义,具体而言,使用 Bellman 变换修改原始校正,并施加 KL 正则化以约束连续的策略更新。我们在代表两种组合覆盖情况的两个真实数据集上进行了实验,结果表明所提方法在有效性(2.3%)和效率(11.53%)上均有相对提升。

Decoupled Actor-Critic

  • paper_url: http://arxiv.org/abs/2310.19527
  • repo_url: None
  • paper_authors: Michal Nauman, Marek Cygan
  • for: 本研究旨在化解 actor-critic 方法中两个看似矛盾的要求:一方面,critic 容易过度估计,因此需要用基于下界 Q 值的保守策略来采样 temporal-difference 目标;另一方面,已有结果表明,面对不确定性保持乐观的策略能获得更低的 regret。
  • methods: 我们提出了一种名为 Decoupled Actor-Critic(DAC)的离策略(off-policy)actor-critic 方法,通过梯度反向传播学习两个不同的 actor:一个保守的 actor 用于 temporal-difference 学习,一个乐观的 actor 用于探索。
  • results: 我们在 DeepMind Control 任务上分别在低回放率和高回放率两种设置下进行了测试,并对多个设计选择做了消融实验。结果显示,在几乎不增加计算开销的情况下,DAC 在运动控制(locomotion)任务上取得了最先进的性能和样本效率。
    Abstract Actor-Critic methods are in a stalemate of two seemingly irreconcilable problems. Firstly, critic proneness towards overestimation requires sampling temporal-difference targets from a conservative policy optimized using lower-bound Q-values. Secondly, well-known results show that policies that are optimistic in the face of uncertainty yield lower regret levels. To remedy this dichotomy, we propose Decoupled Actor-Critic (DAC). DAC is an off-policy algorithm that learns two distinct actors by gradient backpropagation: a conservative actor used for temporal-difference learning and an optimistic actor used for exploration. We test DAC on DeepMind Control tasks in low and high replay ratio regimes and ablate multiple design choices. Despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.
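
A minimal torch sketch of the decoupled-actor idea under toy assumptions (network sizes, the optimism coefficient `beta`, and the update schedule are illustrative, not the paper's): two critics provide a pessimistic lower bound for the conservative actor and a "mean plus disagreement" optimism bonus for the exploratory actor.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, beta = 8, 2, 1.0
def mlp(inp, out): return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)      # twin critics
actor_conservative, actor_optimistic = mlp(obs_dim, act_dim), mlp(obs_dim, act_dim)
opt_c = torch.optim.Adam(actor_conservative.parameters(), lr=3e-4)
opt_o = torch.optim.Adam(actor_optimistic.parameters(), lr=3e-4)

obs = torch.randn(256, obs_dim)                       # a batch sampled from the replay buffer

# Conservative actor: maximise the lower-bound Q (min of the two critics).
a_c = torch.tanh(actor_conservative(obs))
q_pair = torch.cat([q1(torch.cat([obs, a_c], -1)), q2(torch.cat([obs, a_c], -1))], -1)
loss_c = -q_pair.min(dim=-1).values.mean()
opt_c.zero_grad(); loss_c.backward(); opt_c.step()

# Optimistic actor: maximise mean + beta * critic disagreement (an optimism bonus) for exploration.
a_o = torch.tanh(actor_optimistic(obs))
q_pair = torch.cat([q1(torch.cat([obs, a_o], -1)), q2(torch.cat([obs, a_o], -1))], -1)
loss_o = -(q_pair.mean(dim=-1) + beta * q_pair.std(dim=-1)).mean()
opt_o.zero_grad(); loss_o.backward(); opt_o.step()
```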

Generator Identification for Linear SDEs with Additive and Multiplicative Noise

  • paper_url: http://arxiv.org/abs/2310.19491
  • repo_url: None
  • paper_authors: Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong
  • for: 这篇论文研究在给定固定初始状态的情况下,何种条件下可以从解过程的分布中辨识线性随机微分方程(SDE)的生成器。
  • methods: 论文从解过程的分布出发,推导线性 SDE 生成器的可辨识性条件。
  • results: 论文给出了加性噪声线性 SDE 生成器可辨识的充分必要条件,以及乘性噪声线性 SDE 生成器可辨识的充分条件,并给出了这些条件的几何解释。
    Abstract In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.
    摘要 在这篇论文中,我们给出了在固定初始状态下,从解过程的分布中辨识线性随机微分方程(SDE)生成器的条件。这些条件对使用线性 SDE 进行因果推断至关重要,因为它们使我们能够从观测分布中确定干预后(post-intervention)的分布。具体而言,我们得到了加性噪声线性 SDE 生成器可辨识的充分必要条件,以及乘性噪声线性 SDE 生成器可辨识的充分条件,并证明这两类条件都是通有的(generic)。此外,我们还给出了这些可辨识性条件的几何解释,以帮助理解。为验证理论结果,我们进行了一系列模拟实验,其结果支持并证实了所建立的结论。

Adaptive Meta-Learning-Based KKL Observer Design for Nonlinear Dynamical Systems

  • paper_url: http://arxiv.org/abs/2310.19489
  • repo_url: None
  • paper_authors: Lukas Trommer, Halil Yigit Oksuz
  • for: 这篇论文是关于非线性系统观察器设计的研究,具体来说是通过meta-学习来优化观察器的设计,以便更好地适应非线性系统的不同状况和特性。
  • methods: 该论文使用了人工神经网络来 aproximate非线性变换Map,并通过一种基于学习的方法来设计观察器。
  • results: 实验结果表明,该方法可以高度准确地估计非线性系统的状态,并且具有良好的泛化能力、鲁棒性和适应性。
    Abstract The theory of Kazantzis-Kravaris/Luenberger (KKL) observer design introduces a methodology that uses a nonlinear transformation map and its left inverse to estimate the state of a nonlinear system through the introduction of a linear observer state space. Data-driven approaches using artificial neural networks have demonstrated the ability to accurately approximate these transformation maps. This paper presents a novel approach to observer design for nonlinear dynamical systems through meta-learning, a concept in machine learning that aims to optimize learning models for fast adaptation to a distribution of tasks through an improved focus on the intrinsic properties of the underlying learning problem. We introduce a framework that leverages information from measurements of the system output to design a learning-based KKL observer capable of online adaptation to a variety of system conditions and attributes. To validate the effectiveness of our approach, we present comprehensive experimental results for the estimation of nonlinear system states with varying initial conditions and internal parameters, demonstrating high accuracy, generalization capability, and robustness against noise.
    摘要 Kazantzis-Kravaris/Luenberger(KKL)观测器设计理论引入了一种方法论:通过一个非线性变换映射及其左逆,在引入线性观测器状态空间的基础上估计非线性系统的状态。基于人工神经网络的数据驱动方法已被证明能够高精度地近似这些变换映射。本文提出了一种基于元学习(meta-learning)的非线性动力系统观测器设计新方法;元学习旨在通过更关注学习问题的内在性质,优化学习模型以快速适应任务分布。我们提出了一个利用系统输出测量信息的框架,用于设计能够在线适应多种系统条件和特性的基于学习的 KKL 观测器。为验证方法的有效性,我们针对初始条件和内部参数变化的非线性系统状态估计给出了全面的实验结果,展示了高精度、良好的泛化能力以及对噪声的鲁棒性。

Grokking Tickets: Lottery Tickets Accelerate Grokking

  • paper_url: http://arxiv.org/abs/2310.19470
  • repo_url: https://github.com/gouki510/grokking-tickets
  • paper_authors: Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
  • for: 本研究旨在探讨神经网络 generale 的机制,即从 Lottery Ticket Hypothesis 出发,找到可以快速泛化的 ‘’Grokking Tickets’’(好 sparse subnetworks),并证明这些子网络在不同的配置下(MLP 和 Transformer,以及数学和图像分类任务)可以快速泛化。
  • methods: 本研究使用 ‘’Grokking Tickets’’ 来描述从记忆化解决方案转移到泛化解决方案的过渡阶段。 ‘’Grokking Tickets’’ 通过 magnitude pruning после完美泛化而被识别出来。
  • results: 研究发现,使用 ‘’Grokking Tickets’’ 可以大幅加速泛化,并且这种加速不仅在不同的配置下得到证明,而且比 dense network 更快。此外,研究还发现,在适当的剔除率下,泛化可以在没有权重衰减的情况下实现。
    Abstract Grokking is one of the most surprising puzzles in neural network generalization: a network first reaches a memorization solution with perfect training accuracy and poor generalization, but with further training, it reaches a perfectly generalized solution. We aim to analyze the mechanism of grokking from the lottery ticket hypothesis, identifying the process to find the lottery tickets (good sparse subnetworks) as the key to describing the transitional phase between memorization and generalization. We refer to these subnetworks as ''Grokking tickets'', which is identified via magnitude pruning after perfect generalization. First, using ''Grokking tickets'', we show that the lottery tickets drastically accelerate grokking compared to the dense networks on various configurations (MLP and Transformer, and an arithmetic and image classification tasks). Additionally, to verify that ''Grokking ticket'' are a more critical factor than weight norms, we compared the ''good'' subnetworks with a dense network having the same L1 and L2 norms. Results show that the subnetworks generalize faster than the controlled dense model. In further investigations, we discovered that at an appropriate pruning rate, grokking can be achieved even without weight decay. We also show that speedup does not happen when using tickets identified at the memorization solution or transition between memorization and generalization or when pruning networks at the initialization (Random pruning, Grasp, SNIP, and Synflow). The results indicate that the weight norm of network parameters is not enough to explain the process of grokking, but the importance of finding good subnetworks to describe the transition from memorization to generalization. The implementation code can be accessed via this link: \url{https://github.com/gouki510/Grokking-Tickets}.
    摘要 Grokking 是神经网络泛化中最令人惊讶的现象之一:网络先到达一个训练精度完美但泛化很差的记忆化解,随着继续训练,又会到达完美泛化的解。我们从彩票假设(lottery ticket hypothesis)出发分析 grokking 的机制,将寻找"彩票"(好的稀疏子网络)的过程视为描述记忆化与泛化之间过渡阶段的关键。我们称这些子网络为 "Grokking tickets",它们通过在完美泛化之后进行幅值剪枝来识别。首先,我们利用 "Grokking tickets" 表明,与稠密网络相比,这些彩票子网络在多种配置(MLP 与 Transformer、算术任务与图像分类任务)上都能显著加速 grokking。此外,为验证 "Grokking tickets" 比权重范数更为关键,我们将这些"好"的子网络与具有相同 L1 和 L2 范数的稠密网络进行比较,结果显示子网络比对照稠密模型泛化得更快。进一步研究发现,在合适的剪枝率下,即使不使用权重衰减也能实现 grokking。我们还表明,若使用在记忆化解处或记忆化与泛化过渡阶段识别的彩票,或在初始化时剪枝网络(随机剪枝、Grasp、SNIP、Synflow),则不会出现加速。这些结果表明,仅靠网络参数的权重范数不足以解释 grokking 过程,关键在于找到能够描述从记忆化到泛化过渡的好子网络。实现代码可通过此链接获取:\url{https://github.com/gouki510/Grokking-Tickets}。
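
A minimal torch sketch of how a "grokking ticket" could be extracted and reused: global magnitude pruning of a fully generalized model yields binary masks, which are then re-applied to keep the subnetwork sparse during retraining. The sparsity level and toy MLP are illustrative; the paper studies MLPs and Transformers on modular-arithmetic and image tasks.

```python
import torch
import torch.nn as nn

def magnitude_prune_masks(model, sparsity=0.8):
    """Global magnitude pruning: keep the largest-magnitude weights, zero the rest."""
    weights = torch.cat([p.abs().flatten() for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                  # prune weight matrices, keep biases
            masks[name] = (p.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

# Toy usage with an MLP pruned *after* it has fully generalized.
model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 32))
masks = magnitude_prune_masks(model, sparsity=0.8)
apply_masks(model, masks)   # re-apply after every optimizer step when retraining the ticket
```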

Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems

  • paper_url: http://arxiv.org/abs/2310.19468
  • repo_url: None
  • paper_authors: Jialin Yi
  • for: 这个论文主要针对的是多智能体系统(MACL)的设计和分析,用于解决决策问题。
  • methods: 这个论文使用了多智能体系统的各种学习算法,包括优化算法和搜索算法,以实现最佳决策。
  • results: 这个论文提出了一系列的 regret Lower bound,用于衡量多智能体系统在决策问题上的性能。这些 regret Lower bound 取决于通信网络的连接度和延迟时间,从而为 MACL 系统的设计提供了有用的指导。
    Abstract A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important metric of the learning algorithm for decision making problems is its regret, i.e. the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic values. In this thesis, I analyze MACL systems for different sequential decision making problems. Concretely, the Chapter 3 and 4 investigate the cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form high-productive teams for agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present the regret lower bounds for feasible learning algorithms and provide the efficient algorithms to achieve this bound. The regret bounds I present in Chapter 3, 4 and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on design of the communication protocol in MACL systems
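
A toy numpy illustration of the cooperative bandit setting studied in Chapters 3-4: agents on a ring network pool their arm statistics with neighbours and run UCB on the pooled counts. This is a generic cooperative-UCB sketch, not one of the thesis's algorithms, and it ignores communication delay.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_arms, horizon = 4, 5, 2000
true_means = rng.uniform(0, 1, n_arms)
# Ring communication network: each agent shares statistics with its two neighbours.
neighbours = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

counts = np.zeros((n_agents, n_arms))
sums = np.zeros((n_agents, n_arms))

for t in range(1, horizon + 1):
    for i in range(n_agents):
        # Pool statistics from the agent and its neighbours (a crude stand-in for gossip).
        c = counts[[i] + neighbours[i]].sum(axis=0)
        s = sums[[i] + neighbours[i]].sum(axis=0)
        means = np.where(c > 0, s / np.maximum(c, 1), 1.0)
        ucb = means + np.sqrt(2 * np.log(t) / np.maximum(c, 1))
        arm = int(np.argmax(np.where(c == 0, np.inf, ucb)))   # pull unseen arms first
        reward = rng.binomial(1, true_means[arm])
        counts[i, arm] += 1
        sums[i, arm] += reward

# Realized cumulative (pseudo-)regret over all agents.
print("cumulative regret:", horizon * n_agents * true_means.max() - sums.sum())
```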

MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation

  • paper_url: http://arxiv.org/abs/2310.19454
  • repo_url: None
  • paper_authors: Chandrani Kumari, Rahul Siddharthan
  • for: 这两个任务是关于不同类型数据表的 clustering 和生成Synthetic数据的新算法。
  • methods: 这两个任务使用的方法是基于EM的聚类算法和深度学习生成Synthetic数据。
  • results: 这两个任务的结果是一个高性能的聚类算法和一种可以生成高质量Synthetic数据的算法。
    Abstract We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM (``Madras Mixture Model''), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.
    摘要 我们提供了新的算法,用于两个与异类表格数据相关的任务:聚类和生成synthetic数据。异类表格数据通常包含不同的数据类型(数值、排序、 categorical)的列,但可能也有隐藏的聚类结构在其行中:例如,它们可能来自不同的地理、社会经济、方法学来源,并且结果变量(如疾病存在)可能不仅取决于其他变量,而且受到聚类上下文的影响。此外,生物医学数据共享受到了患者隐私法律的限制,现在有兴趣在使用深度学习生成synthetic表格数据。我们描述了一种新的EM基于的聚类算法,名为Madras Mixture Model(MMM),它在异类数据上表现出色,并在实际数据中恢复结构。基于这种算法,我们提出了一种synthetic表格数据生成算法,名为MMMsynth,它先对输入数据进行聚类,然后生成每个聚类的cluster-specific synthetic数据,假设每个列的数据分布是固定的。我们对这种算法进行了测试,并发现它在训练与实际发表数据之间的性能相当接近。
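
A hedged numpy sketch of the "pre-cluster, then generate per cluster" structure behind MMMsynth, not its actual model: given cluster labels, numeric columns are resampled from a per-cluster Gaussian and a categorical column from its per-cluster frequencies. Function and variable names, and the two-cluster toy data, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_from_clusters(num_cols, cat_col, labels, n_samples):
    """Cluster-wise synthetic tabular data from per-cluster Gaussians and category frequencies."""
    out_num, out_cat = [], []
    clusters, cluster_counts = np.unique(labels, return_counts=True)
    draw = rng.multinomial(n_samples, cluster_counts / cluster_counts.sum())
    for k, m in zip(clusters, draw):
        Xk, Ck = num_cols[labels == k], cat_col[labels == k]
        mu = Xk.mean(axis=0)
        cov = np.cov(Xk, rowvar=False) + 1e-6 * np.eye(Xk.shape[1])
        out_num.append(rng.multivariate_normal(mu, cov, size=m))
        cats, freqs = np.unique(Ck, return_counts=True)
        out_cat.append(rng.choice(cats, size=m, p=freqs / freqs.sum()))
    return np.vstack(out_num), np.concatenate(out_cat)

# Toy usage: two hidden clusters in a numeric column pair plus one categorical column.
labels = np.repeat([0, 1], 200)
num = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
cat = np.concatenate([rng.choice(["a", "b"], 200, p=[0.9, 0.1]),
                      rng.choice(["a", "b"], 200, p=[0.2, 0.8])])
synth_num, synth_cat = synth_from_clusters(num, cat, labels, n_samples=300)
```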

Hodge-Compositional Edge Gaussian Processes

  • paper_url: http://arxiv.org/abs/2310.19450
  • repo_url: None
  • paper_authors: Maosheng Yang, Viacheslav Borovitskiy, Elvin Isufi
  • for: 本研究旨在开发 principlized Gaussian processes (GPs),用于模型 simplicial 2-complex 上的函数,特别是流体数据网络中的边流。
  • methods: 本研究使用 Hodge 分解,开发出 divergence-free 和 curl-free 边 GPs,并将它们组合成 \emph{Hodge-compositional edge GPs},以便直接学习不同 Hodge ком component of 边函数。
  • results: 研究人员在 currency exchange, ocean flows 和 water supply networks 中应用了这些 GPs,并与其他模型进行比较,结果表明这些 GPs 能够准确地捕捉边函数的 relevance。
    Abstract We propose principled Gaussian processes (GPs) for modeling functions defined over the edge set of a simplicial 2-complex, a structure similar to a graph in which edges may form triangular faces. This approach is intended for learning flow-type data on networks where edge flows can be characterized by the discrete divergence and curl. Drawing upon the Hodge decomposition, we first develop classes of divergence-free and curl-free edge GPs, suitable for various applications. We then combine them to create \emph{Hodge-compositional edge GPs} that are expressive enough to represent any edge function. These GPs facilitate direct and independent learning for the different Hodge components of edge functions, enabling us to capture their relevance during hyperparameter optimization. To highlight their practical potential, we apply them for flow data inference in currency exchange, ocean flows and water supply networks, comparing them to alternative models.
    摘要 我们提出了有原则的高斯过程(Gaussian processes, GPs),用于对 simplicial 2-complex(一种类似于图、但其中边可以组成三角形面的结构)上定义在边集合上的函数进行建模。这种方法适用于网络上的流型数据:边上的流可以用离散散度和旋度来刻画。借助 Hodge 分解,我们首先构造了无散度(divergence-free)和无旋度(curl-free)的 edge GPs,适用于不同的应用;随后将它们组合成 Hodge-compositional edge GPs,其表达能力足以表示任意边函数。这些 GPs 可以对边函数的不同 Hodge 分量进行直接且相互独立的学习,从而在超参数优化时捕捉各分量的相对重要性。为展示其实用潜力,我们将其应用于货币兑换、海洋流和供水网络中的流数据推断,并与其他模型进行比较。
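
A small numpy sketch of the ingredients behind edge GPs: the incidence matrices B1 and B2 of a tiny simplicial 2-complex give the down (gradient/divergence) and up (curl) parts of the Hodge 1-Laplacian. The diffusion-style kernel formed from it at the end is only a stand-in, since the paper's Hodge-compositional kernels treat the harmonic, gradient, and curl parts separately; the toy complex and the time parameter are assumptions.

```python
import numpy as np
from scipy.linalg import expm

# A tiny simplicial 2-complex: a triangle 0-1-2 plus a dangling edge 2-3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
triangles = [(0, 1, 2)]

B1 = np.zeros((len(nodes), len(edges)))              # node-to-edge incidence
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0

B2 = np.zeros((len(edges), len(triangles)))          # edge-to-triangle incidence
for k, (a, b, c) in enumerate(triangles):
    for edge, sign in [((a, b), 1.0), ((b, c), 1.0), ((a, c), -1.0)]:
        B2[edges.index(edge), k] = sign

L_down = B1.T @ B1                                    # gradient / divergence part
L_up = B2 @ B2.T                                      # curl part
L1 = L_down + L_up                                    # Hodge 1-Laplacian
K = expm(-0.5 * L1)                                   # simple edge diffusion kernel (stand-in)
print(K.shape)                                        # (n_edges, n_edges)
```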

A Federated Learning Framework for Stenosis Detection

  • paper_url: http://arxiv.org/abs/2310.19445
  • repo_url: None
  • paper_authors: Mariachiara Di Cosmo, Giovanna Migliorelli, Matteo Francioni, Andi Mucaj, Alessandro Maolo, Alessandro Aprile, Emanuele Frontoni, Maria Chiara Fiorentino, Sara Moccia
  • for: 这个研究探讨了 Federated Learning (FL) 在 coronary angiography 影像中的狭窄部分检测。
  • methods: 我们使用了 Faster R-CNN 模型进行检测,并在两个客户机构之间共享模型背部重量,使用 Federated Averaging (FedAvg) 进行重量聚合。
  • results: 我们的结果显示,FL 框架不会严重影响客户机构 2 的性能,但对客户机构 1 而言,FL 框架可以提高性能,对比本地训练模型,提高了 +3.76%、+17.21% 和 +10.80%,分别为 P rec = 73.56、Rec = 67.01 和 F1 = 70.13。这些结果显示,FL 可以实现多中心研究,并且保持患者隐私。
    Abstract This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA). Two heterogeneous datasets from two institutions were considered: Dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); Dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature. Stenosis detection was performed by using a Faster R-CNN model. In our FL framework, only the weights of the model backbone were shared among the two client institutions, using Federated Averaging (FedAvg) for weight aggregation. We assessed the performance of stenosis detection using Precision (P rec), Recall (Rec), and F1 score (F1). Our results showed that the FL framework does not substantially affects clients 2 performance, which already achieved good performance with local training; for client 1, instead, FL framework increases the performance with respect to local model of +3.76%, +17.21% and +10.80%, respectively, reaching P rec = 73.56, Rec = 67.01 and F1 = 70.13. With such results, we showed that FL may enable multicentric studies relevant to automatic stenosis detection in CA by addressing data heterogeneity from various institutions, while preserving patient privacy.
    摘要 这项研究探讨了在冠状动脉造影图像(CA)中使用联邦学习(FL)进行狭窄检测。我们考虑了来自两个机构的两个异构数据集:数据集 1 包含我们在意大利安科纳 Ospedale Riuniti 采集的 200 名患者的 1219 张图像;数据集 2 包含文献中一项先前研究提供的 90 名患者的 7492 张连续图像。检测使用 Faster R-CNN 模型。在我们的 FL 框架中,只有模型主干(backbone)的权重在两个客户机构之间共享,并使用 Federated Averaging (FedAvg) 进行权重聚合。我们用精确率(Prec)、召回率(Rec)和 F1 分数评估狭窄检测性能。结果表明,FL 框架并未明显影响客户机构 2 的性能(其本地训练已取得良好效果);而对客户机构 1,FL 框架相比本地模型分别提升了 +3.76%、+17.21% 和 +10.80%,达到 Prec = 73.56、Rec = 67.01、F1 = 70.13。这些结果表明,FL 能够在保护患者隐私的同时,通过应对来自不同机构的数据异构性,支持与 CA 自动狭窄检测相关的多中心研究。
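
A minimal sketch of the weight-sharing scheme described above: only parameters under a backbone prefix are averaged across clients (FedAvg), while head parameters stay local. The `backbone.` prefix, the toy `Net`, and equal client weights are assumptions for illustration, not the paper's exact implementation.

```python
import copy
import torch
import torch.nn as nn

def fedavg_backbone(client_models, backbone_prefix="backbone.", client_weights=None):
    """Average only the backbone parameters across clients and push them back (FedAvg)."""
    n = len(client_models)
    client_weights = client_weights or [1.0 / n] * n
    avg = copy.deepcopy(client_models[0].state_dict())
    for name, tensor in avg.items():
        if name.startswith(backbone_prefix) and tensor.is_floating_point():
            avg[name] = sum(w * m.state_dict()[name] for w, m in zip(client_weights, client_models))
    for m in client_models:                               # broadcast the averaged backbone back
        merged = {**m.state_dict(), **{k: v for k, v in avg.items() if k.startswith(backbone_prefix)}}
        m.load_state_dict(merged)
    return avg

class Net(nn.Module):                      # toy client model: shared backbone + local head
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head = nn.Linear(8, 2)

clients = [Net(), Net()]
fedavg_backbone(clients)                   # one aggregation round after local training
```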

Asymmetric Diffusion Based Channel-Adaptive Secure Wireless Semantic Communications

  • paper_url: http://arxiv.org/abs/2310.19439
  • repo_url: None
  • paper_authors: Xintian Ren, Jun Wu, Hansong Xu, Qianqian Pan
  • for: The paper proposes a secure semantic communication system called DiffuSeC to address the security problem caused by semantic attacks in end-to-end data transmission tasks like image classification and image reconstruction.
  • methods: The system leverages the diffusion model and deep reinforcement learning (DRL) to mitigate perturbations added by semantic attacks, including data source attacks and channel attacks. A DRL-based channel-adaptive diffusion step selection scheme is developed to improve robustness under unstable channel conditions.
  • results: Simulation results demonstrate that DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments.
    Abstract Semantic communication has emerged as a new deep learning-based communication paradigm that drives the research of end-to-end data transmission in tasks like image classification, and image reconstruction. However, the security problem caused by semantic attacks has not been well explored, resulting in vulnerabilities within semantic communication systems exposed to potential semantic perturbations. In this paper, we propose a secure semantic communication system, DiffuSeC, which leverages the diffusion model and deep reinforcement learning (DRL) to address this issue. With the diffusing module in the sender end and the asymmetric denoising module in the receiver end, the DiffuSeC mitigates the perturbations added by semantic attacks, including data source attacks and channel attacks. To further improve the robustness under unstable channel conditions caused by semantic attacks, we developed a DRL-based channel-adaptive diffusion step selection scheme to achieve stable performance under fluctuating environments. A timestep synchronization scheme is designed for diffusion timestep coordination between the two ends. Simulation results demonstrate that the proposed DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions, and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments.
    摘要 语义通信(semantic communication)是一种新的基于深度学习的通信范式,推动了图像分类、图像重建等端到端数据传输任务的研究。然而,语义攻击带来的安全问题尚未得到充分探讨,使得语义通信系统在潜在的语义扰动面前存在漏洞。本文提出了一种安全的语义通信系统 DiffuSeC,利用扩散模型和深度强化学习(DRL)来解决这一问题。通过发送端的扩散模块和接收端的非对称去噪模块,DiffuSeC 可以缓解由语义攻击(包括数据源攻击和信道攻击)引入的扰动。为进一步提高在语义攻击导致的不稳定信道条件下的鲁棒性,我们设计了基于 DRL 的信道自适应扩散步数选择方案,以在波动环境中保持稳定性能,并设计了用于两端扩散时间步协调的时间步同步方案。仿真结果表明,DiffuSeC 在广泛的信道条件下比已有工作具有更高的鲁棒精度,并能根据信噪比(SNR)在不稳定环境中快速调整模型状态。

LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee’s Advertisement Recommendation

  • paper_url: http://arxiv.org/abs/2310.19394
  • repo_url: None
  • paper_authors: Dang Minh Nguyen, Chenfei Wang, Yan Shen, Yifan Zeng
  • for: 本研究探讨了在大规模电商搜索中使用图 neural network (GNN) 的应用,以及如何在实际项目中建立高质量图、处理数据稀缺和冷启动问题。
  • methods: 本研究提出了一种简单 yet novel的图建构技术, combinig strong-signal用户行为和高精度协同推荐(CF)算法来构建高质量item图。 此外,我们还提出了一种名为 LightSAGE 的新的 GNN 架构,用于生成高质量items的嵌入,以便vector搜索。
  • results: 我们的模型在线上A/B测试中表现出色,并在Shopee的推荐广告系统中进行了实质性的应用。我们的模型可以有效地处理冷启动和长尾项目问题,并且在offline评估中也提供了显著的改善。
    Abstract Graph Neural Network (GNN) is the trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may bring some gaps when applying GNN in the industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how GNN is applied for large-scale e-commerce item retrieval at Shopee. We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality items' embeddings for vector search. Finally, we design multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvement in offline evaluations, online A/B tests, and are deployed to the main traffic of Shopee's Recommendation Advertisement system.
    摘要 Graph Neural Network (GNN) 是当前推荐问题的流行解决方案。然而,最新的报告强调新的模型建立。这可能会导致在实际应用中, besides 模型,构建图和处理数据稀缺问题的重要性被忽略。在这篇文章中,我们报道了在 Shopee 的大规模电商ITEM检索中使用 GNN。我们介绍了我们的简单 yet novel 和有影响力的图构建、模型化和数据偏度处理技术。 Specifically,我们结合强信号用户行为和高精度共同推荐算法来构建高质量的ITEM图。然后,我们开发了一种名为 LightSAGE 的新的 GNN 架构,以生成高质量的ITEM嵌入 Vector 搜索。最后,我们设计了多种方法来处理冷启动和长尾ITEM,这些方法在广告系统中是关键的。我们的模型在线评估、A/B 测试中提供了改进,并在 Shopee 推荐广告系统的主要流量中部署。

Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2310.19391
  • repo_url: https://github.com/Ehyaei/Causal-Fair-Metric-Learning
  • paper_authors: Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi
  • for: 本研究旨在提出一种基于 causal 结构的 fair 度量,以保证无论敏感属性如何都能获得公平(equitable)的对待。
  • methods: 本研究使用了 adversarial perturbation 和 protected causal perturbation 来检测和修复模型的漏损和不公正。在 metric learning 方面,提出了一种方法 для metric estimation 和 deployment。
  • results: 本研究提出了一种基于 causal 结构的 fair 度量,可以应用于 adversarial training, fair learning, algorithmic recourse, 和 causal reinforcement learning 等领域。
    Abstract Adversarial perturbation is used to expose vulnerabilities in machine learning models, while the concept of individual fairness aims to ensure equitable treatment regardless of sensitive attributes. Despite their initial differences, both concepts rely on metrics to generate similar input data instances. These metrics should be designed to align with the data's characteristics, especially when it is derived from causal structure and should reflect counterfactuals proximity. Previous attempts to define such metrics often lack general assumptions about data or structural causal models. In this research, we introduce a causal fair metric formulated based on causal structures that encompass sensitive attributes. For robustness analysis, the concept of protected causal perturbation is presented. Additionally, we delve into metric learning, proposing a method for metric estimation and deployment in real-world problems. The introduced metric has applications in the fields adversarial training, fair learning, algorithmic recourse, and causal reinforcement learning.
    摘要 对抗扰动被用来暴露机器学习模型的脆弱性,而个体公平性的概念旨在保证无论敏感属性如何都能获得公平的对待。尽管两者出发点不同,但它们都依赖度量来生成相似的输入数据实例。这些度量的设计应当与数据的特性相符,尤其当数据来源于因果结构时,还应当反映反事实的邻近性。以往定义此类度量的尝试往往缺乏关于数据或结构因果模型的一般假设。在本研究中,我们基于涵盖敏感属性的因果结构,提出了一种因果公平度量;针对鲁棒性分析,我们引入了受保护因果扰动的概念。此外,我们还探讨了度量学习,提出了一种在实际问题中进行度量估计与部署的方法。所提出的度量可应用于对抗训练、公平学习、算法补救(algorithmic recourse)以及因果强化学习等领域。

Implicit Manifold Gaussian Process Regression

  • paper_url: http://arxiv.org/abs/2310.19390
  • repo_url: None
  • paper_authors: Bernardo Fichera, Viacheslav Borovitskiy, Andreas Krause, Aude Billard
  • for: 这篇论文是用于提高 Gaussian process regression 在高维度数据上的预测性和准确性。
  • methods: 这篇论文提出了一种可以直接从数据中推导隐藏结构的 Gaussian process regression 技术,并且可以处理高维度数据。
  • results: 实验结果表明,该方法可以扩展到数十万个数据点,并有望在高维数据上提升标准 Gaussian process regression 的预测性能与校准。
    Abstract Gaussian process regression is widely used because of its ability to provide well-calibrated uncertainty estimates and handle small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold upon which the data actually lies, as postulated by the manifold hypothesis. Prior work ordinarily requires the manifold structure to be explicitly provided though, i.e. given by a mesh or be known to be one of the well-known manifolds like the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Mat\'ern Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points, and may improve the predictive performance and calibration of the standard Gaussian process regression in high-dimensional~settings.
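
A numpy/scipy sketch of the manifold idea using a fixed graph instead of a learned one: build a kNN graph over the samples, form its Laplacian, and use a graph Matérn-type kernel (2ν/κ² I + L)^(−ν) for GP regression between the sample points. The paper instead infers the implicit structure in a fully differentiable way; the kernel form, hyperparameters, and toy circle data below are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.01 * rng.normal(size=(300, 2))  # a circle
y = np.sin(3 * theta)                                    # a smooth function on the manifold

# kNN graph and its (combinatorial) Laplacian.
D = cdist(X, X)
k = 8
A = np.zeros((300, 300))
for i in range(300):
    A[i, np.argsort(D[i])[1:k + 1]] = 1.0
A = np.maximum(A, A.T)                                   # symmetrise
L = np.diag(A.sum(1)) - A

nu, kappa = 2, 2.0
K = np.linalg.matrix_power(np.linalg.inv(2 * nu / kappa**2 * np.eye(300) + L), nu)

# GP regression with this kernel from a few labeled points.
train = rng.choice(300, 20, replace=False)
alpha_vec = np.linalg.solve(K[np.ix_(train, train)] + 1e-2 * np.eye(20), y[train])
pred = K[:, train] @ alpha_vec
print("RMSE over all points:", np.sqrt(np.mean((pred - y) ** 2)))
```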

Gradient-free online learning of subgrid-scale dynamics with neural emulators

  • paper_url: http://arxiv.org/abs/2310.19385
  • repo_url: None
  • paper_authors: Hugo Frezat, Guillaume Balarac, Julien Le Sommer, Ronan Fablet
  • for: 这 paper 的目的是提出一种通用的算法,用于在线(即 $\textit{a posteriori}$ 损失函数)对 numerical 解析器进行机器学习基于 parametrization 的训练。
  • methods: 该方法利用神经 emulator 训练一个简化状态空间求解器的可微近似,然后利用该近似在时间积分步骤中传播梯度。
  • results: 实验表明,用各自的损失分别训练神经 emulator 和参数化组件,可以最小化某些近似偏差的传播。
    Abstract In this paper, we propose a generic algorithm to train machine learning-based subgrid parametrizations online, i.e., with $\textit{a posteriori}$ loss functions for non-differentiable numerical solvers. The proposed approach leverage neural emulators to train an approximation of the reduced state-space solver, which is then used to allows gradient propagation through temporal integration steps. The algorithm is able to recover most of the benefit of online strategies without having to compute the gradient of the original solver. It is demonstrated that training the neural emulator and parametrization components separately with respective loss quantities is necessary in order to minimize the propagation of some approximation bias.
    摘要 在这篇论文中,我们提出了一种通用算法,用于在线训练(即使用 a posteriori 损失函数)基于机器学习的子网格参数化,适用于数值求解器不可微分的情形。我们的方法利用神经仿真器(neural emulator)训练一个简化状态空间求解器的可微近似,然后借助该近似在时间积分步骤中传播梯度。该算法无需计算原始求解器的梯度,即可保留在线训练策略的大部分优势。我们还发现,用各自的损失分别训练神经仿真器和参数化组件,有助于减少某些近似偏差的传播。
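
A torch sketch of the two-stage recipe above under toy assumptions: the non-differentiable reduced solver is a black-box scalar map, a neural emulator is first fitted to its one-step behaviour, and the subgrid closure is then trained through the frozen emulator with a multi-step (a posteriori) loss. The reference trajectory here ignores the true closure, so this is a shape-of-the-algorithm sketch only, not the paper's setup.

```python
import torch
import torch.nn as nn

def black_box_step(x):                                  # stand-in for a non-differentiable solver step
    return (x + 0.1 * torch.sin(3.0 * x)).detach()

emulator = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
closure = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # subgrid parametrization

# Stage 1: fit the emulator on (state, next state) pairs produced by the black-box solver.
opt_e = torch.optim.Adam(emulator.parameters(), lr=1e-3)
for _ in range(2000):
    x = 2 * torch.rand(256, 1) - 1
    loss = ((emulator(x) - black_box_step(x)) ** 2).mean()
    opt_e.zero_grad(); loss.backward(); opt_e.step()
for p in emulator.parameters():
    p.requires_grad_(False)                             # freeze: gradients only flow *through* it

# Stage 2: a-posteriori training of the closure through the frozen emulator.
opt_c = torch.optim.Adam(closure.parameters(), lr=1e-3)
x0 = 2 * torch.rand(64, 1) - 1
target = x0.clone()
with torch.no_grad():                                   # reference trajectory from the true solver
    for _ in range(5):
        target = black_box_step(target)
state = x0
for _ in range(5):                                      # rollout with emulator + learned closure
    state = emulator(state) + closure(state)
loss = ((state - target) ** 2).mean()                   # multi-step (a posteriori) loss
opt_c.zero_grad(); loss.backward(); opt_c.step()
```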

Deep anytime-valid hypothesis testing

  • paper_url: http://arxiv.org/abs/2310.19384
  • repo_url: None
  • paper_authors: Teodora Pandeva, Patrick Forré, Aaditya Ramdas, Shubhanshu Shekhar
  • for: 这个论文旨在提供一种通用框架,用于构建强大、顺序的假设测试方法,用于处理非参数测试问题。
  • methods: 该框架用两个已知算子在数据分布上的作用以抽象形式定义原假设,从而可以统一处理两样本检验、独立性检验和条件独立性检验等多种任务。
  • results: 与传统的批处理检验相比,该框架具有以下优点:1)能够持续监测在线数据流,并高效地累积反对原假设的证据;2)无需多重检验校正即可严格控制第一类错误;3)能根据问题的未知难度自适应地调整样本量。
    Abstract We propose a general framework for constructing powerful, sequential hypothesis tests for a large class of nonparametric testing problems. The null hypothesis for these problems is defined in an abstract form using the action of two known operators on the data distribution. This abstraction allows for a unified treatment of several classical tasks, such as two-sample testing, independence testing, and conditional-independence testing, as well as modern problems, such as testing for adversarial robustness of machine learning (ML) models. Our proposed framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null, 2) it provides tight control over the type I error without the need for multiple testing correction, 3) it adapts the sample size requirement to the unknown hardness of the problem. We develop a principled approach of leveraging the representation capability of ML models within the testing-by-betting framework, a game-theoretic approach for designing sequential tests. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines on several tasks.
    摘要 我们提出一种通用框架,用于为一大类非参数检验问题构建强大的序贯假设检验。原假设以抽象形式由两个已知算子在数据分布上的作用来定义,这一抽象使我们能够统一处理多种经典任务(如两样本检验、独立性检验和条件独立性检验)以及现代问题(如机器学习模型的对抗鲁棒性检验)。相比经典的批处理检验,我们的框架具有以下优势:1)持续监测在线数据流并高效地累积反对原假设的证据;2)无需多重检验校正即可严格控制第一类错误;3)根据问题的未知难度自适应地调整样本量。我们提出了一种在"测试即下注"(testing-by-betting)这一博弈论框架内利用机器学习模型表达能力的原则性方法,用于设计序贯检验。在合成和真实数据集上的实验表明,用该通用框架实例化的检验在多个任务上可与专门的基线方法相媲美。
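
A tiny numpy illustration of the betting view of sequential testing: under the null (equal distributions), the wealth process below is a nonnegative martingale starting at 1, so stopping whenever it exceeds 1/α controls the type-I error at level α at any time. The witness is a fixed tanh of the difference; the paper's point is to learn that witness with an ML model, and its null hypotheses are far more general than the mean-shift toy here.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lam = 0.05, 0.2
wealth = 1.0
for t in range(1, 5001):
    x = rng.normal(0.0, 1.0)                       # sample from P
    y = rng.normal(0.3, 1.0)                       # sample from Q (shifted mean, so H0 is false)
    bet = np.clip(np.tanh(x - y), -0.99, 0.99)     # fixed witness; learned by an ML model in the paper
    wealth *= 1.0 - lam * bet                      # fair bet under H0 (E[tanh(x - y)] = 0 by symmetry)
    if wealth >= 1.0 / alpha:                      # anytime-valid rejection threshold
        print(f"reject H0 at step {t}, wealth = {wealth:.1f}")
        break
else:
    print("no rejection within 5000 steps")
```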

Musical Form Generation

  • paper_url: http://arxiv.org/abs/2310.19842
  • repo_url: https://github.com/rshohan/Jeffrey-Whalen-from-Yorktown1
  • paper_authors: Lilac Atassi
  • for: 这篇论文是为了生成结构化、可靠的音乐作品而写的。
  • methods: 该方法使用 conditional generative model 创造乐曲的不同部分,并使用大型自然语言模型提供乐曲的高级结构。
  • results: 该方法可以生成结构化、可靠的乐曲,不受机会性的限制,可以延展到无限长。
    Abstract While recent generative models can produce engaging music, their utility is limited. The variation in the music is often left to chance, resulting in compositions that lack structure. Pieces extending beyond a minute can become incoherent or repetitive. This paper introduces an approach for generating structured, arbitrarily long musical pieces. Central to this approach is the creation of musical segments using a conditional generative model, with transitions between these segments. The generation of prompts that determine the high-level composition is distinct from the creation of finer, lower-level details. A large language model is then used to suggest the musical form.

An interpretable clustering approach to safety climate analysis: examining driver group distinction in safety climate perceptions

  • paper_url: http://arxiv.org/abs/2310.19841
  • repo_url: None
  • paper_authors: Kailai Sun, Tianxiang Lan, Yang Miang Goh, Sufiana Safiena, Yueng-Hsiang Huang, Bailey Lytle, Yimin He
  • for: 本研究旨在提高卡车驾驶员安全性,特别是通过分析驾驶员安全氛围的不同群体来开发更有效的安全预防措施。
  • methods: 本研究使用了5种不同的聚类算法来分析驾驶员对安全氛围的感知,并提出了一种新的量化评估部分依赖图(QPDP)的方法来更好地解释聚类结果。
  • results: 研究发现，主管关怀的促进（supervisory care promotion）是区分不同驾驶员群体的关键因素。此外，不同的聚类算法可能给出不同的结果，因此需要对多种算法进行比较和分析。
    Abstract The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks accounted for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perception is innovative and has not been extensively utilized in research. Identifying clusters of drivers based on their safety climate perception allows the organization to profile its workforce and devise more impactful interventions. The lack of utilizing the clustering approach could be due to difficulties interpreting or explaining the factors influencing employees' cluster membership. Moreover, existing safety-related studies did not compare multiple clustering algorithms, resulting in potential bias. To address these issues, this study introduces an interpretable clustering approach for safety climate analysis. This study compares 5 algorithms for clustering truck drivers based on their safety climate perceptions. It proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). To better interpret the clustering results, this study introduces different interpretable machine learning measures (SHAP, PFI, and QPDP). Drawing on data collected from more than 7,000 American truck drivers, this study significantly contributes to the scientific literature. It highlights the critical role of supervisory care promotion in distinguishing various driver groups. The Python code is available at https://github.com/NUS-DBE/truck-driver-safety-climate.
    摘要 运输行业，尤其是卡车运输行业，易发生工伤事故和死亡事件，大型卡车事故在整体交通死亡人数中占相当比例。认识到安全氛围在事故预防中的关键作用，研究者一直试图理解其构成因素并衡量其在组织中的影响。尽管现有的数据驱动安全氛围研究已取得显著进展，但基于员工安全氛围感知进行聚类的做法仍属创新，尚未在研究中得到广泛应用。根据驾驶员的安全氛围感知识别不同群体，有助于组织刻画其员工构成并设计更有效的干预措施。聚类方法未被充分利用，可能是由于难以解释影响员工所属聚类的因素；此外，现有安全相关研究没有比较多种聚类算法，可能带来偏差。为解决这些问题，本研究提出了一种可解释的聚类方法用于安全氛围分析，比较了 5 种按安全氛围感知对卡车驾驶员进行聚类的算法，并提出了一种定量评估部分依赖图的新方法（QPDP）。为更好地解释聚类结果，本研究引入了多种可解释机器学习度量（SHAP、PFI 和 QPDP）。基于从 7,000 多名美国卡车驾驶员收集的数据，本研究对科学文献做出了重要贡献，突显了主管关怀的促进在区分不同驾驶员群体中的关键作用。Python 代码见 https://github.com/NUS-DBE/truck-driver-safety-climate 。
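As a rough illustration of the general recipe described above (cluster drivers by their safety-climate perceptions, then explain what drives cluster membership), the sketch below clusters a synthetic survey matrix with k-means and interprets the clusters with permutation feature importance from scikit-learn. The survey items and data are hypothetical placeholders; the paper's QPDP measure and its comparison of five clustering algorithms are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Hypothetical survey matrix: one row per driver, one column per safety-climate item.
X = rng.normal(size=(500, 8))
items = [f"item_{i}" for i in range(8)]

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Fit a classifier to predict cluster membership, then ask which survey items drive it.
clf = RandomForestClassifier(random_state=0).fit(X, labels)
pfi = permutation_importance(clf, X, labels, n_repeats=20, random_state=0)
for name, score in sorted(zip(items, pfi.importances_mean), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```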

ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.19322
  • repo_url: None
  • paper_authors: Yang Lin
  • for: 这 paper 是为了解决多个时间序列预测问题,提出了一种基于深度学习的 ProNet 模型,可以同时使用 AR 和 NAR 策略。
  • methods: ProNet 将预测时间范围划分为多个子段，以非自回归方式预测每个子段中最关键的时间步，其余时间步则以自回归方式预测。划分过程依赖隐变量，通过变分推断有效刻画各时间步的重要性。
  • results: ProNet 在四个大型数据集上进行了全面评估并辅以消融实验，结果显示其在准确率和预测速度两方面均优于最先进的 AR 和 NAR 模型。
    Abstract In this paper, we introduce ProNet, a novel deep learning approach designed for multi-horizon time series forecasting, adaptively blending autoregressive (AR) and non-autoregressive (NAR) strategies. Our method involves dividing the forecasting horizon into segments, predicting the most crucial steps in each segment non-autoregressively, and the remaining steps autoregressively. The segmentation process relies on latent variables, which effectively capture the significance of individual time steps through variational inference. In comparison to AR models, ProNet showcases remarkable advantages, requiring fewer AR iterations, resulting in faster prediction speed, and mitigating error accumulation. On the other hand, when compared to NAR models, ProNet takes into account the interdependency of predictions in the output space, leading to improved forecasting accuracy. Our comprehensive evaluation, encompassing four large datasets and an ablation study, demonstrates the effectiveness of ProNet, highlighting its superior performance in terms of accuracy and prediction speed, outperforming state-of-the-art AR and NAR forecasting models.
    摘要 本文介绍 ProNet，一种用于多步时间序列预测的新型深度学习方法，能够自适应地融合自回归（AR）与非自回归（NAR）策略。该方法将预测时间范围划分为若干子段，对每个子段中最关键的时间步进行非自回归预测，其余时间步进行自回归预测。划分过程依赖隐变量，通过变分推断有效刻画各时间步的重要性。与 AR 模型相比，ProNet 所需的 AR 迭代更少，预测速度更快，并缓解了误差累积；与 NAR 模型相比，ProNet 考虑了输出空间中预测值之间的相互依赖，从而提高了预测精度。在四个大型数据集上的全面评估与消融实验表明，ProNet 在准确率和预测速度上均优于最先进的 AR 与 NAR 预测模型。

Dual-Directed Algorithm Design for Efficient Pure Exploration

  • paper_url: http://arxiv.org/abs/2310.19319
  • repo_url: None
  • paper_authors: Chao Qin, Wei You
  • For: The paper is written to address the problem of pure exploration in stochastic sequential adaptive experiments with a finite set of alternative options. The goal is to accurately identify the best alternative with high confidence and minimal measurement efforts.
  • Methods: The paper uses dual variables to derive necessary and sufficient conditions for optimality, and proposes an information-directed selection rule to adaptively pick from a candidate set based on information gain. The top-two Thompson sampling algorithm is also used to solve the problem of best-arm identification.
  • Results: The paper establishes that the proposed algorithm is optimal for Gaussian best-arm identification, and is also applicable to other pure exploration problems such as $\epsilon$-best-arm identification and thresholding bandit problems. The numerical experiments show that the proposed algorithm is more efficient than existing ones.
    Abstract We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allow us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.
    摘要 我们考虑随机顺序自适应实验中的纯探索问题，其中备选方案为有限集合。决策者的目标是以最少的测量代价、高置信度地回答关于备选方案的查询问题；典型的查询是识别性能最优的方案，即排序与选择问题，或机器学习文献中的最优臂识别问题。我们关注固定精度设定，并以“强收敛到最优样本分配”这一概念给出最优性的充分条件。借助对偶变量，我们刻画了一个分配为最优的充分必要条件，从而绕过了仅依赖原变量的最优性条件中的组合结构。值得注意的是，这些最优性条件使得最初为最优臂识别提出的 top-two 算法设计原则得以推广。此外，这些条件还导出了一种简单而高效的选择规则，称为信息导向选择（information-directed selection），它根据候选项的信息增益自适应地从候选集中挑选。我们概述了该算法框架可应用的广泛场景，并证明：与信息导向选择结合后，top-two Thompson 采样对高斯最优臂识别是（渐近）最优的，解决了纯探索文献中一个突出的开放问题。该算法对 $\epsilon$-最优臂识别和阈值老虎机问题同样最优。我们的分析还给出了将 Thompson 采样推广到纯探索问题的一般原则。数值实验表明，所提算法相对于现有方法具有显著的效率优势。
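A minimal sketch of top-two Thompson sampling for Gaussian best-arm identification is given below, under the assumption of a known unit noise variance and an uninformative prior. The selection step between the leader and the challenger uses a crude "pull the less-sampled candidate" proxy rather than the paper's information-directed selection rule, which would choose based on information gain.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.6, 1.0])      # arm 3 is the best arm
K, sigma = len(true_means), 1.0
counts = np.ones(K)
sums = rng.normal(true_means, sigma)             # one initial pull per arm

for t in range(2000):
    mean, std = sums / counts, sigma / np.sqrt(counts)   # Gaussian posterior per arm
    leader = int(np.argmax(rng.normal(mean, std)))       # Thompson sample -> leader
    challenger = leader
    for _ in range(100):                                 # resample until another arm wins (capped)
        cand = int(np.argmax(rng.normal(mean, std)))
        if cand != leader:
            challenger = cand
            break
    if challenger == leader:                             # fallback once posteriors separate
        challenger = int(np.argmax(np.where(np.arange(K) == leader, -np.inf, mean)))
    # Choose between the top-two candidates. The paper derives an information-directed
    # rule based on information gain; pulling the less-sampled candidate is a crude proxy.
    arm = leader if counts[leader] <= counts[challenger] else challenger
    sums[arm] += rng.normal(true_means[arm], sigma)
    counts[arm] += 1

print("recommended arm:", int(np.argmax(sums / counts)), "pulls per arm:", counts.astype(int))
```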

A Planning-and-Exploring Approach to Extreme-Mechanics Force Fields

  • paper_url: http://arxiv.org/abs/2310.19306
  • repo_url: None
  • paper_authors: Pengjie Shi, Zhiping Xu
  • for: 本研究旨在理解断裂过程中裂纹的萌生与扩展，以及裂纹尖端处逐步演化的微观结构。
  • methods: 本研究结合第一性原理计算数据、应变状态空间的预采样与主动学习技术，构建了用于断裂问题的神经网络力场 NN-F$^3$，以刻画机械能耗散、裂纹路径选择和动态失稳（如扭折与分叉）等过程。
  • results: 以 h-BN 和转角双层石墨烯的断裂为模型问题，结果表明高保真力场需要纳入应变的张量特性与稀有事件的能量学，并需借助第一性原理的电子结构信息才能预测此类极端力学过程。
    Abstract Extreme mechanical processes such as strong lattice distortion and bond breakage during fracture are ubiquitous in nature and engineering, which often lead to catastrophic failure of structures. However, understanding the nucleation and growth of cracks is challenged by their multiscale characteristics spanning from atomic-level structures at the crack tip to the structural features where the load is applied. Molecular simulations offer an important tool to resolve the progressive microstructural changes at crack fronts and are widely used to explore processes therein, such as mechanical energy dissipation, crack path selection, and dynamic instabilities (e.g., kinking, branching). Empirical force fields developed based on local descriptors based on atomic positions and the bond orders do not yield satisfying predictions of fracture, even for the nonlinear, anisotropic stress-strain relations and the energy densities of edges. High-fidelity force fields thus should include the tensorial nature of strain and the energetics of rare events during fracture, which, unfortunately, have not been taken into account in both the state-of-the-art empirical and machine-learning force fields. Based on data generated by first-principles calculations, we develop a neural network-based force field for fracture, NN-F$^3$, by combining pre-sampling of the space of strain states and active-learning techniques to explore the transition states at critical bonding distances. The capability of NN-F$^3$ is demonstrated by studying the rupture of h-BN and twisted bilayer graphene as model problems. The simulation results confirm recent experimental findings and highlight the necessity to include the knowledge of electronic structures from first-principles calculations in predicting extreme mechanical processes.
    摘要 强烈的晶格畸变和断裂过程中化学键的断裂等极端力学过程在自然界和工程中十分普遍，往往导致结构的灾难性失效。然而，裂纹的萌生与扩展具有多尺度特征——从裂纹尖端的原子级结构到施加载荷处的结构特征——使其难以理解。分子模拟是解析裂纹前沿微观结构演化的重要工具，被广泛用于研究其中的机械能耗散、裂纹路径选择以及动态失稳（如扭折、分叉）等过程。然而，基于原子位置与键序等局部描述符构建的经验力场，即使对于非线性、各向异性的应力-应变关系和边缘能量密度，也无法给出令人满意的断裂预测。高保真力场应当纳入应变的张量特性以及断裂中稀有事件的能量学，而这在当前最先进的经验力场和机器学习力场中均未被考虑。基于第一性原理计算生成的数据，我们结合应变状态空间的预采样与主动学习技术来探索临界键长处的过渡态，构建了用于断裂的神经网络力场 NN-F$^3$。以 h-BN 和转角双层石墨烯的断裂作为模型问题，我们展示了 NN-F$^3$ 的能力。模拟结果印证了近期的实验发现，并突显了在预测极端力学过程时引入第一性原理电子结构知识的必要性。

Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.19304
  • repo_url: None
  • paper_authors: Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu
  • for: 本文旨在解决多家金融机构之间协作检测金融异常的问题；由于监管和竞争，这些机构之间的信任有限。
  • methods: 本文提出了一种名为 PV4FAD 的新方案，将全同态加密、安全多方计算、差分隐私与随机化技术相结合，在训练中平衡隐私与精度，并在部署阶段防止推断攻击。
  • results: 该方案通过显著降低每家银行的噪声水平，在满足分布式差分隐私的同时取得了较高的效用与精度；方案生成随机森林集成模型，利用集成方法降低方差、提高精度。该方案在美国隐私增强技术（PETs）奖励挑战赛第一阶段获得第二名。
    Abstract The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
    摘要 有效检测金融异常的证据需要多个拥有不同数据的实体协作，例如支付网络系统（PNS）及其合作银行，而监管与竞争限制了这些金融机构之间的信任。联邦学习（FL）使各实体能够在数据按纵向或横向划分的情况下协作训练模型。然而，在真实的金融异常检测场景中，数据往往同时按纵向和横向划分，因此无法直接套用现有的 FL 方法。我们提出的新方案 PV4FAD 结合了全同态加密（HE）、安全多方计算（SMPC）、差分隐私（DP）与随机化技术，在训练期间平衡隐私与精度，并防止模型部署时的推断威胁。该方案通过 HE 和 SMPC 提供输入隐私，通过 DP 抵御推断阶段的攻击以保护输出隐私。具体而言，我们证明在“诚实但好奇”威胁模型下，银行无法获知 PNS 交易的任何敏感特征，而 PNS 仅能获得预测标签，无法获知银行数据集的任何信息。我们还设计并分析了在推断阶段保护输出隐私的 DP 机制。该方案在满足分布式 DP 的同时显著降低每家银行的噪声水平，从而生成高效用模型。为保证高精度，我们的方法生成集成模型（随机森林），利用集成方法降低方差、提高精度。该方案在美国隐私增强技术（PETs）奖励挑战赛第一阶段获得第二名。

Stage-Aware Learning for Dynamic Treatments

  • paper_url: http://arxiv.org/abs/2310.19300
  • repo_url: None
  • paper_authors: Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu
  • For: 本研究旨在提出一种新的个体化学习方法，以优化患者的临床收益。
  • Methods: 该方法估计动态治疗方案（DTR），并着重强调观察到的治疗轨迹与最优方案在各决策阶段所给出轨迹之间的对齐；同时引入阶段重要性得分与注意力机制以刻画各决策阶段的异质性。
  • Results: 通过模拟实验与 COVID-19 疫情的真实案例研究，证明该方法能够提高样本效率和稳定性，并更好地考虑各决策阶段之间的差异。
    Abstract Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms could suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving long stages of decision-making. To address these challenges, we propose a novel individualized learning method which estimates the DTR with a focus on prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of inverse probability weighted based methods. In particular, the proposed learning scheme builds a more general framework which includes the popular outcome weighted learning framework as a special case of ours. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including the Fisher consistency and finite-sample performance bound. Empirically, we evaluate the proposed method in extensive simulated environments and a real case study for COVID-19 pandemic.
    摘要 动态治疗方案（DTR）的最新进展提供了高效的最优治疗搜索算法，这些算法针对个体的具体需求量身定制，能够最大化其预期临床收益。然而，现有算法在最优治疗下可能面临样本量不足的问题，尤其是涉及长决策过程的慢性疾病。为应对这些挑战，我们提出了一种新的个体化学习方法，在估计 DTR 时着重强调观察到的治疗轨迹与最优方案轨迹在各决策阶段之间的对齐。通过放宽“观察轨迹必须与最优治疗完全一致”的限制，我们的方法大幅提高了基于逆概率加权方法的样本效率与稳定性。特别地，所提学习框架构建了一个更一般的框架，将流行的结果加权学习框架作为其特例。此外，我们引入阶段重要性得分与注意力机制，以显式刻画各决策阶段之间的异质性。我们建立了该方法的理论性质，包括 Fisher 一致性和有限样本性能界。在大量模拟环境以及 COVID-19 疫情的真实案例研究中，我们对所提方法进行了实证评估。
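The inverse-probability-weighted (IPW) value estimate that such DTR methods build on can be sketched for a single decision stage as follows. The data-generating process, candidate decision rule, and known propensity are illustrative assumptions; the paper's multi-stage alignment weighting, stage importance scores, and attention mechanism are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)                    # patient covariate
A = rng.binomial(1, 0.5, size=n)          # observed treatment, randomized with prob 0.5
Y = X * (2 * A - 1) + rng.normal(size=n)  # outcome: treating (A=1) helps when X > 0

def rule(x):
    # Candidate decision rule d(X): treat exactly when the covariate is positive.
    return (x > 0).astype(int)

prop = 0.5  # known propensity P(A = a | X) under randomization
ipw_value = np.mean(Y * (A == rule(X)) / prop)   # IPW estimate of E[Y under rule d]
print(f"estimated value of the rule: {ipw_value:.3f}")
```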

AMLNet: Adversarial Mutual Learning Neural Network for Non-AutoRegressive Multi-Horizon Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.19289
  • repo_url: None
  • paper_authors: Yang Lin
  • For: 提高多步时间序列预测的准确率和速度。
  • Methods: 提出了一种新的非自回归（NAR）模型 AMLNet，以协作方式训练深层 AR 解码器和深层 NAR 解码器作为集成教师，并通过在线知识蒸馏（KD）的两种机制——结果驱动 KD（outcome-driven KD）与提示驱动 KD（hint-driven KD）——将知识传递给较浅的 NAR 解码器。
  • Results: 与传统 AR 和 NAR 模型相比，AMLNet 取得了更高的准确率和更快的计算速度。
    Abstract Multi-horizon time series forecasting, crucial across diverse domains, demands high accuracy and speed. While AutoRegressive (AR) models excel in short-term predictions, they suffer speed and error issues as the horizon extends. Non-AutoRegressive (NAR) models suit long-term predictions but struggle with interdependence, yielding unrealistic results. We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation (KD) approach. AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner, serving as ensemble teachers that impart knowledge to a shallower NAR decoder. This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation. Extensive experimentation showcases AMLNet's superiority over conventional AR and NAR models, thereby presenting a promising avenue for multi-horizon time series forecasting that enhances accuracy and expedites computation.
    摘要 多步时间序列预测在众多领域中至关重要，对准确率和速度都有很高要求。自回归（AR）模型擅长短期预测，但随着预测步长的增加，其速度和误差问题日益突出；非自回归（NAR）模型适合长期预测，却难以刻画预测值之间的相互依赖，容易产生不合实际的结果。我们提出 AMLNet，一种通过在线知识蒸馏（KD）实现符合实际的预测的新型 NAR 模型。AMLNet 以协作方式训练深层 AR 解码器和深层 NAR 解码器，作为集成教师向较浅的 NAR 解码器传递知识。这种知识传递通过两种关键机制实现：1) 结果驱动 KD，动态加权来自各教师模型的 KD 损失，使浅层 NAR 解码器能够吸收集成的多样性；2) 提示驱动 KD，利用对抗训练从模型的隐状态中提取有价值的信息用于蒸馏。大量实验表明，AMLNet 优于传统 AR 和 NAR 模型，在准确率和计算速度方面为多步时间序列预测提供了有前景的方案。
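A toy version of the outcome-driven weighting idea is sketched below: each teacher's contribution to the distillation loss is weighted by how well that teacher fits the target. The softmax weighting, MSE losses, and data are illustrative assumptions; AMLNet's actual outcome-driven and hint-driven (adversarial) KD terms are more involved.

```python
import numpy as np

def outcome_weighted_kd_loss(student, ar_teacher, nar_teacher, target):
    """Toy outcome-driven distillation: teachers that fit the target better
    receive a larger share of the distillation loss."""
    errs = np.array([np.mean((ar_teacher - target) ** 2),
                     np.mean((nar_teacher - target) ** 2)])
    w = np.exp(-errs) / np.exp(-errs).sum()          # softmax over negative teacher errors
    kd = w[0] * np.mean((student - ar_teacher) ** 2) \
       + w[1] * np.mean((student - nar_teacher) ** 2)
    task = np.mean((student - target) ** 2)          # ordinary forecasting loss
    return task + kd

y = np.sin(np.linspace(0, 3, 24))                    # a 24-step forecasting horizon
print(outcome_weighted_kd_loss(y + 0.05, y + 0.1, y + 0.3, y))
```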

Enhancing Scalability and Reliability in Semi-Decentralized Federated Learning With Blockchain: Trust Penalization and Asynchronous Functionality

  • paper_url: http://arxiv.org/abs/2310.19287
  • repo_url: None
  • paper_authors: Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Amir Jaberzadeh, Jason Geng
  • for: 提高分布式联合学习的可扩展性和可靠性,通过结合区块链技术。
  • methods: 使用信任惩罚机制提高参与节点的可信度，同时支持异步功能以实现高效、稳健的模型更新。
  • results: 实现了一个公平、安全、透明的分布式联邦学习环境，且不损害数据隐私。
    Abstract The paper presents an innovative approach to address the challenges of scalability and reliability in Distributed Federated Learning by leveraging the integration of blockchain technology. The paper focuses on enhancing the trustworthiness of participating nodes through a trust penalization mechanism while also enabling asynchronous functionality for efficient and robust model updates. By combining Semi-Decentralized Federated Learning with Blockchain (SDFL-B), the proposed system aims to create a fair, secure and transparent environment for collaborative machine learning without compromising data privacy. The research presents a comprehensive system architecture, methodologies, experimental results, and discussions that demonstrate the advantages of this novel approach in fostering scalable and reliable SDFL-B systems.
    摘要 这篇论文提出了一种创新的方法,以解决分布式联合学习中的可扩展性和可靠性问题,通过把区块链技术与联合学习相结合。该论文将参与节点的可信性提高通过信任惩罚机制,同时允许 asynchronous 功能,以实现高效和可靠的模型更新。通过结合 Semi-Decentralized Federated Learning with Blockchain(SDFL-B),该提案旨在创造一个公平、安全和透明的合作机器学习环境,无需妥协数据隐私。论文采用了完整的系统架构、方法、实验结果和讨论,以示该新方法在推动可扩展和可靠的 SDFL-B 系统的优势。

Facilitating Graph Neural Networks with Random Walk on Simplicial Complexes

  • paper_url: http://arxiv.org/abs/2310.19285
  • repo_url: https://github.com/zhouc20/hodgerandomwalk
  • paper_authors: Cai Zhou, Xiyuan Wang, Muhan Zhang
  • for: 这篇论文旨在通过系统分析单纯复形（simplicial complexes, SC）上的随机游走，提升图神经网络（Graph Neural Network, GNN）的表达能力。
  • methods: 论文在不同阶的单纯形上分析随机游走，包括节点层面（0-单纯形）、边层面（1-单纯形）以及更高阶单纯形；基于 Hodge 1-Laplacian 的谱分析，提出了一种置换等变且表达力强的边层面位置编码 Hodge1Lap。
  • results: 大量实验验证了基于随机游走的方法（包括随机游走位置编码 RWSE 和 Hodge1Lap）的有效性，这些方法能够提升 GNN 的表达能力。
    Abstract Node-level random walk has been widely used to improve Graph Neural Networks. However, there is limited attention to random walk on edge and, more generally, on $k$-simplices. This paper systematically analyzes how random walk on different orders of simplicial complexes (SC) facilitates GNNs in their theoretical expressivity. First, on $0$-simplices or node level, we establish a connection between existing positional encoding (PE) and structure encoding (SE) methods through the bridge of random walk. Second, on $1$-simplices or edge level, we bridge edge-level random walk and Hodge $1$-Laplacians and design corresponding edge PE respectively. In the spatial domain, we directly make use of edge level random walk to construct EdgeRWSE. Based on the spectral analysis of Hodge $1$-Laplcians, we propose Hodge1Lap, a permutation equivariant and expressive edge-level positional encoding. Third, we generalize our theory to random walk on higher-order simplices and propose the general principle to design PE on simplices based on random walk and Hodge Laplacians. Inter-level random walk is also introduced to unify a wide range of simplicial networks. Extensive experiments verify the effectiveness of our random walk-based methods.
    摘要 节点层面的随机游走已被广泛用于改进图神经网络，但对边乃至一般 $k$-单纯形上的随机游走关注有限。本文系统分析了不同阶单纯复形（SC）上的随机游走如何增强 GNN 的理论表达能力。首先，在 0-单纯形（节点）层面，我们借助随机游走在现有位置编码（PE）与结构编码（SE）方法之间建立了联系。其次，在 1-单纯形（边）层面，我们将边层面随机游走与 Hodge 1-Laplacian 联系起来，并分别设计了相应的边位置编码：在空间域中，直接利用边层面随机游走构造 EdgeRWSE；基于 Hodge 1-Laplacian 的谱分析，提出置换等变且表达力强的边层面位置编码 Hodge1Lap。第三，我们将理论推广到更高阶单纯形上的随机游走，提出基于随机游走与 Hodge Laplacian 在单纯形上设计位置编码的一般原则，并引入跨层随机游走以统一多种单纯形网络。大量实验验证了这些基于随机游走方法的有效性。
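The node-level ($0$-simplex) random-walk structural encoding mentioned above can be computed directly from powers of the random-walk transition matrix, as in the small numpy sketch below; the edge-level encodings (EdgeRWSE, Hodge1Lap) would instead operate on the Hodge $1$-Laplacian and are not shown here.

```python
import numpy as np

def random_walk_se(adj, k=4):
    """Node-level random-walk structural encoding: for each node, the probability
    of returning to itself after 1..k steps of a simple random walk."""
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]                 # row-stochastic transition matrix
    feats, Pk = [], np.eye(len(adj))
    for _ in range(k):
        Pk = Pk @ P
        feats.append(np.diag(Pk))          # return probabilities at this step
    return np.stack(feats, axis=1)         # shape: (num_nodes, k)

# A 4-cycle: every node plays the same structural role, so all rows are identical.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
print(random_walk_se(adj))
```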

rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2310.19283
  • repo_url: None
  • paper_authors: Yu Enokibori
  • for: 这篇论文提出了一种基于IMU的人体活动识别(HAR)的新型深度神经网络(DNN)模型,即rTsfNet。
  • methods: rTsfNet 通过在网络内部推导 3D 旋转参数，利用多头 3D 旋转自动选择应从哪些 3D 基上提取特征，随后提取时间序列特征（TSF），并使用多层感知机（MLP）完成人体活动识别。
  • results: 在严格管理的基准测试条件下，该模型在多个针对不同活动的数据集（UCI HAR、PAMAP2、Daphnet、OPPORTUNITY）上均取得了高于现有模型的最佳精度。
    Abstract This paper proposes rTsfNet, a DNN model with multi-head 3D rotation and time series feature extraction, as a new DNN model for IMU-based human activity recognition (HAR). rTsfNet automatically selects the 3D bases from which features should be derived by estimating 3D rotation parameters within the DNN. Time series features (TSFs), the wisdom of many researchers, are then derived, and HAR is realized using an MLP. Although the model does not use a CNN, it achieved higher accuracy than existing models under well-managed benchmark conditions and on multiple datasets targeting different activities: UCI HAR, PAMAP2, Daphnet, and OPPORTUNITY.
    摘要 本文提出 rTsfNet，一种结合多头 3D 旋转与时间序列特征提取的深度神经网络模型，作为基于 IMU 的人体活动识别（HAR）的新模型。rTsfNet 通过在网络内部推导 3D 旋转参数，自动选择应从哪些 3D 基上提取特征；随后提取凝聚了众多研究者经验的时间序列特征（TSF），并使用 MLP 完成 HAR。尽管该模型不使用 CNN，但在严格管理的基准测试条件下，它在针对不同活动的多个数据集（UCI HAR、PAMAP2、Daphnet、OPPORTUNITY）上取得了高于现有模型的精度。

Machine Learning Regularization for the Minimum Volume Formula of Toric Calabi-Yau 3-folds

  • paper_url: http://arxiv.org/abs/2310.19276
  • repo_url: None
  • paper_authors: Eugene Choi, Rak-Kyeong Seong
  • for: 这篇论文是为了计算某些特定的5个维度Sasaki-Einstein流形的最小体积而写的。
  • methods: 这篇论文使用了机器学习正则化技术来计算这些流形的最小体积。
  • results: 这篇论文提出了一些可解释的、基于流形的几何 invariants的确定最小体积的公式。这些公式可以高度准确地计算这些流形的最小体积。
    Abstract We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of 4d N=1 supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding 4d N=1 superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.
    摘要 我们提出了一系列Explicit的方程式,用于找到Sasaki-Einstein 5-dimensional manifold的最小体积。这些5-dimensional manifold的对偶是一种toric Calabi-Yau 3-fold。这些toric Calabi-Yau 3-fold和4d N=1 supersymmetric gauge theory之间存在一个无穷的等级关系,它们是D3-brane在toric Calabi-Yau 3-fold上的世界volume理论。根据AdS/CFT对偶,Sasaki-Einstein底物的最小体积与4d N=1 superconformal field theory的中心质量有逆比例关系。我们提出的方程式是使用机器学习调整技术所得到的,这些技术超过了过去使用机器学习来决定最小体积的应用。此外,使用机器学习调整技术允许我们提供可解释和可读的方程式,用于找到最小体积。我们的工作证明,即使是大量的toric Calabi-Yau 3-fold中,我们的方程式仍然可以对最小体积进行高精度的近似。

Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19274
  • repo_url: None
  • paper_authors: Jaehong Chung, Rasool Ahmad, WaiChing Sun, Wei Cai, Tapan Mukerji
  • for: 本研究旨在使用图神经网络（GNN）方法，从岩石的数字 CT 扫描图像预测其有效弹性模量。
  • methods: 我们使用 Mapper 算法将三维数字岩石图像转换为包含关键几何信息的图数据集，并以此训练模型来预测弹性模量。
  • results: GNN 模型在由不同子立方体尺寸得到的各种图规模上均表现出良好的预测能力，并在测试集以及未见过的岩石和未探索的子立方体尺寸上保持较高的预测精度。与卷积神经网络（CNN）的对比分析表明，GNN 在预测未见岩石性质方面表现更好。此外，与 CNN 所用的网格表示相比，微结构的图表示显著降低了 GPU 显存需求，使批大小的选择更加灵活。这项研究表明 GNN 模型有潜力提升岩石性质预测精度并提高数字岩石分析的效率。
    Abstract This study presents a Graph Neural Networks (GNNs)-based approach for predicting the effective elastic moduli of rocks from their digital CT-scan images. We use the Mapper algorithm to transform 3D digital rock images into graph datasets, encapsulating essential geometrical information. These graphs, after training, prove effective in predicting elastic moduli. Our GNN model shows robust predictive capabilities across various graph sizes derived from various subcube dimensions. Not only does it perform well on the test dataset, but it also maintains high prediction accuracy for unseen rocks and unexplored subcube sizes. Comparative analysis with Convolutional Neural Networks (CNNs) reveals the superior performance of GNNs in predicting unseen rock properties. Moreover, the graph representation of microstructures significantly reduces GPU memory requirements (compared to the grid representation for CNNs), enabling greater flexibility in the batch size selection. This work demonstrates the potential of GNN models in enhancing the prediction accuracy of rock properties and boosting the efficiency of digital rock analysis.
    摘要 本研究提出一种基于图神经网络（GNN）的方法，用于从岩石的数字 CT 扫描图像预测其有效弹性模量。我们使用 Mapper 算法将三维数字岩石图像转换为包含关键几何信息的图数据集；经过训练后，这些图能够有效预测弹性模量。我们的 GNN 模型在由不同子立方体尺寸得到的各种图规模上均表现出稳健的预测能力：它不仅在测试集上表现良好，对未见过的岩石和未探索的子立方体尺寸也保持较高的预测精度。与卷积神经网络（CNN）的对比分析显示，GNN 在预测未见岩石性质方面更胜一筹。此外，与 CNN 所用的网格表示相比，微结构的图表示显著降低了 GPU 显存需求，使批大小的选择更加灵活。这项工作展示了 GNN 模型在提升岩石性质预测精度、提高数字岩石分析效率方面的潜力。

Invariant kernels on Riemannian symmetric spaces: a harmonic-analytic approach

  • paper_url: http://arxiv.org/abs/2310.19270
  • repo_url: None
  • paper_authors: Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said
  • for: 证明经典高斯核在非欧几里得对称空间上，无论参数如何选取，都不是正定核。
  • methods: 发展新的几何与分析论证，给出高斯核正定性的严格刻画；除少数通过数值计算处理的低维情形外，该刻画是完备的。
  • results: 提出了 L$^{p}$-Godement 定理（$p = 1,2$），给出了非紧型对称空间上核函数正定的可验证的充分必要条件；与适用范围更广但难以应用的 Bochner-Godement 定理相比，这些结果更易于使用。
    Abstract This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter. To achieve this goal, the paper develops new geometric and analytical arguments. These provide a rigorous characterization of the positive-definiteness of the Gaussian kernel, which is complete but for a limited number of scenarios in low dimensions that are treated by numerical computations. Chief among these results are the L$^{\!\scriptscriptstyle p}$-$\hspace{0.02cm}$Godement theorems (where $p = 1,2$), which provide verifiable necessary and sufficient conditions for a kernel defined on a symmetric space of non-compact type to be positive-definite. A celebrated theorem, sometimes called the Bochner-Godement theorem, already gives such conditions and is far more general in its scope, but is especially hard to apply. Beyond the connection with the Gaussian kernel, the new results in this work lay out a blueprint for the study of invariant kernels on symmetric spaces, bringing forth specific harmonic analysis tools that suggest many future applications.
    摘要 本工作旨在证明：经典高斯核在非欧几里得对称空间上，无论参数如何选取，都不是正定核。为此，本文发展了新的几何与分析论证，对高斯核的正定性给出了严格刻画；除少数通过数值计算处理的低维情形外，该刻画是完备的。其中最主要的结果是 L$^{p}$-Godement 定理（$p = 1,2$），它们为非紧型对称空间上定义的核函数的正定性提供了可验证的充分必要条件。著名的 Bochner-Godement 定理也给出此类条件且适用范围更广，但特别难以应用。除了与高斯核的联系之外，本文的新结果还为对称空间上不变核的研究勾勒了蓝图，引入了具体的调和分析工具，预示着诸多后续应用。
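Positive-definiteness of a kernel on a finite point set can be probed numerically by checking the eigenvalues of its Gram matrix. The sketch below does this for the Gaussian kernel $\exp(-d^2/2\sigma^2)$ built from the geodesic distance on the Poincaré disk (one example of a non-compact symmetric space). A negative smallest eigenvalue for some $\sigma$ certifies failure of positive-definiteness for that parameter, while non-negative eigenvalues on a single random draw prove nothing; the point sample and $\sigma$ grid here are illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def poincare_dist(u, v):
    """Geodesic distance on the Poincaré disk (hyperbolic plane)."""
    du, dv, duv = 1 - u @ u, 1 - v @ v, np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * duv / (du * dv))

# Sample points inside the disk and build the Gaussian Gram matrix K = exp(-d^2 / (2 s^2)).
pts = rng.uniform(-0.9, 0.9, size=(40, 2))
pts = pts[np.linalg.norm(pts, axis=1) < 0.95]
D = np.array([[poincare_dist(u, v) for v in pts] for u in pts])
for s in (0.2, 0.5, 1.0, 2.0):
    K = np.exp(-D ** 2 / (2 * s ** 2))
    print(f"sigma={s}: smallest eigenvalue = {np.linalg.eigvalsh(K).min():.2e}")
```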

A Metadata-Driven Approach to Understand Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.19263
  • repo_url: None
  • paper_authors: Ting Wei Li, Qiaozhu Mei, Jiaqi Ma
  • For: This paper aims to understand the limitations of Graph Neural Networks (GNNs) and identify critical data properties that affect their performance.
  • Methods: The authors propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties, using a multivariate sparse regression analysis on benchmarking data.
  • Results: The authors find that dataset degree distribution has a significant impact on GNN performance, with more balanced degree distributions leading to better linear separability of node representations and better GNN performance. Theoretical analysis and controlled experiments verify the effectiveness of the proposed approach.
    Abstract Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a $\textit{model-driven}$ approach that leverage heuristics and domain knowledge from network science or graph theory to model the GNN behaviors, which is time-consuming and highly subjective. In this work, we propose a $\textit{metadata-driven}$ approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. Our theoretical findings reveal that datasets with more balanced degree distribution exhibit better linear separability of node representations, thus leading to better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, both the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.
    摘要 图神经网络（GNN）已在多种应用中取得显著成功，但其性能可能对所处理图数据的特定性质十分敏感。现有关于 GNN 局限性的研究主要采用“模型驱动”的方法，即借助网络科学或图论中的启发式规则与领域知识来刻画 GNN 的行为，这既耗时又带有很强的主观性。受日益丰富的图学习基准数据启发，本文提出一种“元数据驱动”的方法来分析 GNN 对图数据性质的敏感性。我们对在多样数据集上基准测试 GNN 性能所得到的元数据进行多元稀疏回归分析，得到一组显著的数据性质。为验证这一数据驱动方法的有效性，我们聚焦于其中识别出的一个性质——度分布，并通过理论分析和受控实验研究该性质如何影响 GNN 性能。理论结果表明，度分布更均衡的数据集中节点表示具有更好的线性可分性，从而带来更好的 GNN 性能；使用具有不同度分布的合成数据集进行的受控实验与理论结论高度吻合。综合来看，理论分析与受控实验都验证了所提元数据驱动方法在识别对 GNN 至关重要的数据性质方面的有效性。
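The multivariate sparse regression step can be sketched with scikit-learn's MultiTaskLasso: regress the performance of several GNNs (tasks) on dataset-level metadata features and read off which properties receive non-zero coefficients. The property names and synthetic numbers below are hypothetical placeholders for the real benchmark metadata used in the paper.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
# Hypothetical benchmark metadata: rows = datasets, columns = graph data properties.
props = ["gini_degree", "avg_clustering", "homophily", "num_classes", "feature_dim"]
X = rng.normal(size=(60, len(props)))
# Hypothetical performance of two GNN architectures on each dataset (two regression tasks).
Y = np.column_stack([-0.8 * X[:, 0] + 0.3 * X[:, 2] + 0.1 * rng.normal(size=60),
                     -0.6 * X[:, 0] + 0.2 * X[:, 2] + 0.1 * rng.normal(size=60)])

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
for name, coefs in zip(props, model.coef_.T):
    print(f"{name}: {np.round(coefs, 2)}")   # near-zero rows = properties GNNs are insensitive to
```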

Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement

  • paper_url: http://arxiv.org/abs/2310.19261
  • repo_url: None
  • paper_authors: Daesol Cho, Seungjae Lee, H. Jin Kim
  • for: 本文提出了一种新的课程RL方法,叫做Diversify for Disagreement & Conquer(D2C),用于解决RL面临的无知搜索问题。
  • methods: D2C 方法只需少量期望结果示例即可工作，且不依赖环境的几何结构或期望结果示例的分布。该方法首先对目标条件分类器进行多样化，使其在已访问状态与期望结果状态上保持一致，而在分布外状态上产生分歧，从而量化未探索区域，并以简单直观的方式设计目标条件内在奖励信号；随后通过二分图匹配定义课程学习目标，生成一系列调节得当的中间目标，使智能体自动探索并征服未探索区域。
  • results: 实验结果表明，即使期望结果示例是任意分布的，D2C 在定量与定性两方面都优于先前的课程强化学习方法。
    Abstract Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with the arbitrarily distributed desired outcome examples.
    摘要 强化学习（RL）常常面临无信息搜索的挑战：智能体需要在无法获得环境特性或外部奖励等领域知识的情况下进行探索。为应对这些挑战，本工作提出了一种新的课程强化学习方法——Diversify for Disagreement & Conquer（D2C）。与以往的课程学习方法不同，D2C 只需少量期望结果示例，并可在任何环境中工作，而不依赖环境的几何结构或期望结果示例的分布。该方法首先对目标条件分类器进行多样化，以识别已访问状态与期望结果状态之间的相似性，并确保分类器在分布外状态上产生分歧，从而能够量化未探索区域，并以简单直观的方式设计任意目标条件的内在奖励信号。随后，该方法利用二分图匹配定义课程学习目标，生成一系列调节得当的中间目标，使智能体能够自动探索并征服未探索区域。实验结果表明，即使期望结果示例是任意分布的，D2C 在定量与定性两方面都优于先前的课程强化学习方法。
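A toy illustration of using ensemble disagreement as an intrinsic exploration signal is given below. The random-feature classifiers and states are made up for illustration; in D2C the goal-conditioned classifiers are additionally trained (diversified) on visited and desired-outcome states so that their disagreement concentrates on unexplored, out-of-distribution regions.

```python
import numpy as np

rng = np.random.default_rng(0)

class GoalClassifier:
    """Tiny random-feature logistic scorer of how 'desired-outcome-like' a state is."""
    def __init__(self, dim, width=32):
        self.W = rng.normal(size=(dim, width))
        self.w = rng.normal(size=width)
    def prob(self, s):
        h = np.tanh(s @ self.W)
        return 1.0 / (1.0 + np.exp(-(h @ self.w)))

ensemble = [GoalClassifier(dim=2) for _ in range(5)]   # an ensemble of classifiers

def intrinsic_reward(state):
    # Ensemble disagreement (std of the classifiers' scores) serves as the exploration
    # bonus; D2C trains the classifiers so that disagreement marks unexplored states.
    scores = np.array([c.prob(state) for c in ensemble])
    return scores.std()

print(intrinsic_reward(np.zeros(2)), intrinsic_reward(np.array([5.0, -7.0])))
```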

Flow-based Distributionally Robust Optimization

  • paper_url: http://arxiv.org/abs/2310.19253
  • repo_url: None
  • paper_authors: Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie
  • for: 解决流基于分布 robust优化(DRO)问题,使用 Wasserstein 不确定集,并需要最差情况分布(也称为最不利分布,LFD)是连续的,以便可以扩展到更大的样本大小和提高适应性。
  • methods: 使用流模型,连续时间可变Transport map,并开发 Wasserstein proximal梯度流算法来解决计算挑战。在实践中,Transport map 是通过一个序列 neural network 逐步训练的块式方式进行参数化。
  • results: 在对真实高维数据进行实验中,提出了一种新的机制来实现数据驱动分布扰动隐私,并在分布性检测和鲁棒化学习中显示了强емпириical性能。
    Abstract We present a computationally efficient framework, called \texttt{FlowDRO}, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, when requiring the worst-case distribution (also called the Least Favorable Distribution, LFD) to be continuous so that the algorithm can be scalable to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinitely dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.
    摘要 我们提出一种计算高效的框架 FlowDRO，用于求解带 Wasserstein 不确定集的基于流的分布鲁棒优化（DRO）问题；当要求最坏情况分布（又称最不利分布，LFD）为连续分布时，算法可以扩展到更大的样本规模，并使所诱导的鲁棒算法具有更好的泛化能力。为应对这一无穷维优化问题带来的计算挑战，我们利用流模型，即数据分布与目标分布之间的连续时间可逆传输映射，并提出一种 Wasserstein 近端梯度流类型的算法。在实现中，传输映射由一系列神经网络参数化，并按块逐步以梯度下降进行训练。该计算框架具有通用性，能够处理大样本量的高维数据，并可用于多种应用。我们在对抗学习、分布鲁棒假设检验以及一种新的数据驱动分布扰动差分隐私机制中展示了其用法，所提方法在真实高维数据上表现出强劲的实证性能。

Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data

  • paper_url: http://arxiv.org/abs/2310.19250
  • repo_url: None
  • paper_authors: Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa
  • For: This paper aims to investigate the use of differentially private synthetic data in end-to-end machine learning pipelines, specifically exploring the extent to which synthetic data can replace real, tabular data and identifying the most effective synthetic data generation techniques for training and evaluating machine learning models.
  • Methods: The authors use a training and evaluation framework that does not assume the availability of real data for testing the utility and fairness of machine learning models trained on synthetic data. They analyze several different definitions of fairness and compare the utility and fairness of models trained using marginal-based and GAN-based synthetic data generation algorithms.
  • Results: The authors find that marginal-based synthetic data generators outperform GAN-based ones in terms of model training utility for tabular data, and that models trained using data generated by the marginal-based algorithm MWEM PGM can achieve similar utility to models trained using real data. Additionally, the authors show that these models can also exhibit fairness characteristics similar to those obtained by models trained with real data.
    Abstract Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
    摘要 差分隐私（DP）合成数据集是在保护个体数据提供者隐私的前提下共享数据的一种解决方案。理解在端到端机器学习流程中使用 DP 合成数据的影响，对医疗和人道主义行动等数据稀缺且受严格隐私法规约束的领域尤为重要。在这项工作中，我们考察了合成数据在机器学习流程中能在多大程度上替代真实的表格数据，并识别了在训练和评估机器学习模型时最有效的合成数据生成技术。我们从效用和公平性两个角度研究差分隐私合成数据对下游分类任务的影响。分析涵盖两类主要合成数据生成算法的代表：基于边缘分布的方法与基于 GAN 的方法。据我们所知，本工作首次：(i) 提出了一个不假设存在真实数据可用于测试的训练与评估框架，用以检验在合成数据上训练的模型的效用和公平性；(ii) 就训练机器学习模型时的效用与公平性，对合成数据生成算法进行了最为全面的分析；(iii) 涵盖了多种不同的公平性定义。结果表明，对表格数据而言，基于边缘分布的合成数据生成器在模型训练效用方面优于基于 GAN 的生成器；使用边缘分布算法生成的数据所训练的模型可以达到与使用真实数据训练的模型相近的效用。分析还显示，基于边缘分布的生成器 MWEM PGM 训练出的模型能够同时获得与真实数据训练模型相近的效用与公平性特征。

A spectral regularisation framework for latent variable models designed for single channel applications

  • paper_url: http://arxiv.org/abs/2310.19246
  • repo_url: None
  • paper_authors: Ryan Balshaw, P. Stephan Heyns, Daniel N. Wilke, Stephan Schmidt
  • for: 这篇论文旨在提供一个 Python 包，用于解决单通道潜变量模型（LVM）应用中的源重复问题。
  • methods: 该包通过引入一个新的谱正则化项来解决源重复问题，并提供了在 LVM 参数估计过程中施加谱正则化的框架。
  • results: 该包便于研究和使用带谱正则化的 LVM，并为单通道时间序列应用提供了一个一致的线性 LVM 优化框架。
    Abstract Latent variable models (LVMs) are commonly used to capture the underlying dependencies, patterns, and hidden structure in observed data. Source duplication is a by-product of the data hankelisation pre-processing step common to single channel LVM applications, which hinders practical LVM utilisation. In this article, a Python package titled spectrally-regularised-LVMs is presented. The proposed package addresses the source duplication issue via the addition of a novel spectral regularisation term. This package provides a framework for spectral regularisation in single channel LVM applications, thereby making it easier to investigate and utilise LVMs with spectral regularisation. This is achieved via the use of symbolic or explicit representations of potential LVM objective functions which are incorporated into a framework that uses spectral regularisation during the LVM parameter estimation process. The objective of this package is to provide a consistent linear LVM optimisation framework which incorporates spectral regularisation and caters to single channel time-series applications.
    摘要 潜变量模型（LVM）通常用于捕捉观测数据中的潜在依赖、模式与隐藏结构。源重复是单通道 LVM 应用中常见的数据 Hankel 化预处理步骤所带来的副产品，它妨碍了 LVM 的实际使用。本文介绍一个名为 spectrally-regularised-LVMs 的 Python 包，通过加入一个新的谱正则化项来解决源重复问题。该包为单通道 LVM 应用提供了谱正则化框架，使研究和使用带谱正则化的 LVM 更加容易。其实现方式是：以符号或显式形式表示可能的 LVM 目标函数，并将其纳入一个在 LVM 参数估计过程中施加谱正则化的框架。该包的目标是提供一个一致的线性 LVM 优化框架，纳入谱正则化并服务于单通道时间序列应用。
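One simple way to penalise source duplication spectrally is to penalise overlap between the power spectra of the recovered latent sources, as in the sketch below. This is an illustrative penalty in the spirit of the package's spectral regularisation term, not its exact objective; the function name and example signals are assumptions.

```python
import numpy as np

def spectral_duplication_penalty(sources):
    """Pairwise cosine similarity between the power spectra of recovered latent
    sources; duplicated sources share a spectrum and therefore get penalised."""
    spectra = np.abs(np.fft.rfft(sources, axis=1)) ** 2
    spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)
    gram = spectra @ spectra.T
    off_diag = gram - np.diag(np.diag(gram))
    return off_diag.sum() / 2.0

t = np.linspace(0, 10, 1000)
distinct = np.vstack([np.sin(2 * np.pi * 3 * t), np.sin(2 * np.pi * 7 * t)])
duplicated = np.vstack([np.sin(2 * np.pi * 3 * t), np.sin(2 * np.pi * 3 * t + 0.4)])
# Distinct sources give a penalty near 0, duplicated ones a penalty near 1.
print(spectral_duplication_penalty(distinct), spectral_duplication_penalty(duplicated))
```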

Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.19222
  • repo_url: https://github.com/wfwf10/mkor
  • paper_authors: Feng Wang, Senem Velipasalar, M. Cenk Gursoy
  • for: 保护客户端数据隐私,防止泄露客户端数据。
  • methods: 使用最大知识正交重构(MKOR)方法,通过秘密地修改参数传递给客户端,从客户端的梯度更新中重构客户端的输入数据。
  • results: 在 MNIST、CIFAR-100 和 ImageNet 数据集上的评估表明，与现有方法相比，MKOR 能够高效且不易察觉地从客户端的梯度更新中重建高质量的输入图像。
    Abstract Federated learning (FL) aims at keeping client data local to preserve privacy. Instead of gathering the data itself, the server only collects aggregated gradient updates from clients. Following the popularity of FL, there has been considerable amount of work, revealing the vulnerability of FL approaches by reconstructing the input data from gradient updates. Yet, most existing works assume an FL setting with unrealistically small batch size, and have poor image quality when the batch size is large. Other works modify the neural network architectures or parameters to the point of being suspicious, and thus, can be detected by clients. Moreover, most of them can only reconstruct one sample input from a large batch. To address these limitations, we propose a novel and completely analytical approach, referred to as the maximum knowledge orthogonality reconstruction (MKOR), to reconstruct clients' input data. Our proposed method reconstructs a mathematically proven high quality image from large batches. MKOR only requires the server to send secretly modified parameters to clients and can efficiently and inconspicuously reconstruct the input images from clients' gradient updates. We evaluate MKOR's performance on the MNIST, CIFAR-100, and ImageNet dataset and compare it with the state-of-the-art works. The results show that MKOR outperforms the existing approaches, and draws attention to a pressing need for further research on the privacy protection of FL so that comprehensive defense approaches can be developed.
    摘要 联邦学习（FL）旨在将客户端数据保留在本地以保护隐私：服务器并不收集数据本身，而只收集来自客户端的聚合梯度更新。随着 FL 的流行，已有大量工作揭示了 FL 方法的脆弱性，即可以从梯度更新中重建输入数据。然而，大多数现有工作假设了不切实际的小批量设置，在批量较大时重建图像质量很差；另一些工作则需要修改神经网络结构或参数，以致显得可疑而可能被客户端察觉；此外，它们大多只能从一个大批量中重建一个输入样本。为克服这些限制，我们提出了一种全新的、完全解析的方法——最大知识正交重建（MKOR），用于重建客户端的输入数据。该方法能够从大批量中重建经数学证明的高质量图像。MKOR 只需服务器向客户端发送经过隐秘修改的参数，即可高效且不易察觉地从客户端的梯度更新中重建输入图像。我们在 MNIST、CIFAR-100 和 ImageNet 数据集上评估了 MKOR 的性能，并与最先进的方法进行比较。结果表明 MKOR 优于现有方法，这也提醒人们亟需进一步研究 FL 的隐私保护，以发展全面的防御手段。

From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals

  • paper_url: http://arxiv.org/abs/2310.19220
  • repo_url: None
  • paper_authors: Titing Cui, Su Jia, Thomas Lavastida
  • for: This paper focuses on the dynamic pricing problem, specifically addressing the issue of high-valuation customers leaving the market early and causing a shift in the valuation distribution.
  • methods: The authors propose a minimax optimal algorithm that computes a non-adaptive policy to guarantee a $1/k$ fraction of the optimal revenue, given any set of $k$ prices. Additionally, they present an adaptive learn-then-earn policy based on a novel debiasing approach.
  • results: The authors prove an $\tilde O(kn^{3/4})$ regret bound for the adaptive policy, and further improve the bound to $\tilde O(k^{3/4} n^{3/4})$ using martingale concentration inequalities.
    Abstract The dynamic pricing problem has been extensively studied under the \textbf{stream} model: A stream of customers arrives sequentially, each with an independently and identically distributed valuation. However, this formulation is not entirely reflective of the real world. In many scenarios, high-valuation customers tend to make purchases earlier and leave the market, leading to a \emph{shift} in the valuation distribution. Thus motivated, we consider a model where a \textbf{pool} of $n$ non-strategic unit-demand customers interact repeatedly with the seller. Each customer monitors the price intermittently according to an independent Poisson process and makes a purchase if the observed price is lower than her \emph{private} valuation, whereupon she leaves the market permanently. We present a minimax \emph{optimal} algorithm that efficiently computes a non-adaptive policy which guarantees a $1/k$ fraction of the optimal revenue, given any set of $k$ prices. Moreover, we present an adaptive \emph{learn-then-earn} policy based on a novel \emph{debiasing} approach, and prove an $\tilde O(kn^{3/4})$ regret bound. We further improve the bound to $\tilde O(k^{3/4} n^{3/4})$ using martingale concentration inequalities.
    摘要 动态定价问题在“流”模型下已被广泛研究：客户依次到达，每位客户的估值独立同分布。但这一设定并不完全符合现实：在许多场景中，高估值客户往往较早购买并离开市场，导致估值分布发生偏移。受此启发，我们考虑一个由 $n$ 名非策略性单位需求客户组成的“池”，这些客户与卖家反复交互。每位客户按照独立的泊松过程间歇性地查看价格，一旦观察到的价格低于其私人估值便购买，随后永久离开市场。我们给出一个极小极大意义下最优的算法，能够高效计算一个非自适应定价策略：对任意给定的 $k$ 个价格，保证获得最优收益的 $1/k$ 比例。此外，我们提出一种基于新型去偏（debiasing）思想的“先学习后获利”自适应策略，并证明其后悔界为 $\tilde O(kn^{3/4})$；进一步利用鞅集中不等式，将该界改进为 $\tilde O(k^{3/4} n^{3/4})$。

A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions

  • paper_url: http://arxiv.org/abs/2310.19218
  • repo_url: None
  • paper_authors: Jiaxi Yang, Yang Zhao
  • for: 本文提出了一种 Federated Unlearning (FU) 的概述,强调了在 Federated Learning (FL) 环境下实现 Right to be Forgotten (RTBF) 的挑战。
  • methods: 本文评论了现有的 FU 算法、目标函数和评价指标,并将它们分类为不同的方案、应用场景和未来发展方向。
  • results: 本文通过对一些研究的回顾和比较,总结了它们的特点和优劣点,并提出了未来研究的可能性和挑战。
    Abstract With the development of trustworthy Federated Learning (FL), the requirement of implementing right to be forgotten gives rise to the area of Federated Unlearning (FU). Comparing to machine unlearning, a major challenge of FU lies in the decentralized and privacy-preserving nature of FL, in which clients jointly train a global model without sharing their raw data, making it substantially more intricate to selectively unlearn specific information. In that regard, many efforts have been made to tackle the challenges of FU and have achieved significant progress. In this paper, we present a comprehensive survey of FU. Specially, we provide the existing algorithms, objectives, evaluation metrics, and identify some challenges of FU. By reviewing and comparing some studies, we summarize them into a taxonomy for various schemes, potential applications and future directions.
    摘要 随着可靠的 Federated Learning (FL) 的发展,实现“忘记权”的要求给出了 Federated Unlearning (FU) 领域的挑战。与机器解启相比,FU 的主要挑战在于 Federated Learning 的分布式和隐私保护特性, clients 在无需分享原始数据的情况下集成全球模型,使其 SELECTIVE 地忘记特定信息变得非常复杂。为此,许多努力已经被作出,并取得了显著进步。在这篇论文中,我们提供了 FU 的全面报告,包括现有的算法、目标、评价指标,并对一些研究进行了概要总结和比较,将其分类为不同的方案、应用场景和未来方向。

On the accuracy and efficiency of group-wise clipping in differentially private optimization

  • paper_url: http://arxiv.org/abs/2310.19215
  • repo_url: None
  • paper_authors: Zhiqi Bu, Ruixuan Liu, Yu-Xiang Wang, Sheng Zha, George Karypis
  • for: 该论文主要研究具有数百万到数十亿参数的深度学习模型的差分隐私（DP）优化问题。
  • methods: 论文深入研究了 DP 优化中的核心组件——逐样本梯度裁剪方式。研究表明，不同的裁剪方式具有相同的时间复杂度，但体现了精度与显存之间的权衡：粗粒度的全层裁剪最为常用且通常精度最高，但比层级等分组裁剪占用更多显存。
  • results: 研究显示，随着模型规模增大，分组裁剪与全层裁剪之间的精度差距逐渐缩小，而分组裁剪的显存优势依然存在；因此对大型模型进行 DP 优化时，分组裁剪能够同时实现高精度和低峰值显存。
    Abstract Recent advances have substantially improved the accuracy, memory cost, and training speed of differentially private (DP) deep learning, especially on large vision and language models with millions to billions of parameters. In this work, we thoroughly study the per-sample gradient clipping style, a key component in DP optimization. We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off: while the all-layer clipping (of coarse granularity) is the most prevalent and usually gives the best accuracy, it incurs heavier memory cost compared to other group-wise clipping, such as the layer-wise clipping (of finer granularity). We formalize this trade-off through our convergence theory and complexity analysis. Importantly, we demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models, while the memory advantage of the group-wise clipping remains. Consequently, the group-wise clipping allows DP optimization of large models to achieve high accuracy and low peak memory simultaneously.
    摘要 近期进展显著提升了差分隐私（DP）深度学习的精度、显存开销和训练速度，尤其是在拥有数百万到数十亿参数的大型视觉和语言模型上。本文深入研究 DP 优化中的关键组件——逐样本梯度裁剪方式。我们证明不同的裁剪方式具有相同的时间复杂度，却体现出精度与显存之间的权衡：粗粒度的全层裁剪最为常用且通常精度最高，但与层级等分组裁剪相比会带来更高的显存开销。我们通过收敛理论和复杂度分析对这一权衡进行了形式化刻画。重要的是，我们证明随着模型规模增大，分组裁剪与全层裁剪之间的精度差距逐渐缩小，而分组裁剪的显存优势依然保持。因此，分组裁剪使大模型的 DP 优化能够同时实现高精度和低峰值显存。
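The difference between all-layer and group-wise per-sample clipping can be made concrete with a small numpy sketch: both rescale a single sample's gradients, but the clipping norm is computed either over all layers jointly or separately per group of layers. Noise addition and privacy accounting, which complete DP-SGD, are omitted here.

```python
import numpy as np

def clip_per_sample(grads, max_norm, groups=None):
    """grads: list of per-layer gradients for ONE sample.
    groups=None -> all-layer clipping with a single global norm;
    otherwise each listed group of layers is clipped to max_norm on its own."""
    if groups is None:
        groups = [list(range(len(grads)))]
    out = [g.copy() for g in grads]
    for idx in groups:
        norm = np.sqrt(sum(np.sum(out[i] ** 2) for i in idx))
        scale = min(1.0, max_norm / (norm + 1e-12))
        for i in idx:
            out[i] *= scale
    return out

rng = np.random.default_rng(0)
sample_grads = [rng.normal(size=(64, 64)), rng.normal(size=(64,))]   # two "layers"
flat = clip_per_sample(sample_grads, max_norm=1.0)                   # all-layer clipping
layerwise = clip_per_sample(sample_grads, max_norm=1.0, groups=[[0], [1]])
print([float(np.linalg.norm(g)) for g in flat],
      [float(np.linalg.norm(g)) for g in layerwise])
```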

Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices

  • paper_url: http://arxiv.org/abs/2310.19214
  • repo_url: https://github.com/cvxgrp/mlr_fitting
  • paper_authors: Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd
  • for: 这篇论文研究多层低秩（MLR）矩阵：对若干矩阵之和施加同一行、列置换，其中每个矩阵都是前一个矩阵的分块对角细化，且所有块均为以因子形式给出的低秩矩阵。
  • methods: 论文针对用 MLR 矩阵在 Frobenius 范数意义下拟合给定矩阵时出现的三个问题——因子拟合、秩分配以及行列的层次划分——分别给出了求解方法。
  • results: MLR 矩阵在扩展低秩矩阵的同时保留了其许多性质，如所需的总存储量和矩阵-向量乘法的复杂度；论文还附带了一个实现所提方法的开源包。
    Abstract We consider multilevel low rank (MLR) matrices, defined as a row and column permutation of a sum of matrices, each one a block diagonal refinement of the previous one, with all blocks low rank given in factored form. MLR matrices extend low rank matrices but share many of their properties, such as the total storage required and complexity of matrix-vector multiplication. We address three problems that arise in fitting a given matrix by an MLR matrix in the Frobenius norm. The first problem is factor fitting, where we adjust the factors of the MLR matrix. The second is rank allocation, where we choose the ranks of the blocks in each level, subject to the total rank having a given value, which preserves the total storage needed for the MLR matrix. The final problem is to choose the hierarchical partition of rows and columns, along with the ranks and factors. This paper is accompanied by an open source package that implements the proposed methods.
    摘要 我们考虑多层低阶(MLR)矩阵,定义为一行和列排序的一个总和矩阵的各个矩阵,每个矩阵都是前一个矩阵的对角线均分划分,所有块都是低阶矩阵的实际形式。 MLR 矩阵扩展了低阶矩阵,但与其属性相似,例如总储存需求和矩阵-向量乘法的复杂度。我们处理三个在适用给一个矩阵的 MLR 矩阵的问题:1. 因数适应(factor fitting):我们调整 MLR 矩阵的因数。2. 权重分配(rank allocation):我们选择每个层的块的权重,以保持总权重的给定值,并保持 MLR 矩阵的总储存需求不变。3. 垂直分解和因数选择(hierarchical partition and factor selection):我们选择行和列的垂直分解,以及每个层的因数。这篇文章附加了一个开源套件,实现了我们提议的方法。

Investigative Pattern Detection Framework for Counterterrorism

  • paper_url: http://arxiv.org/abs/2310.19211
  • repo_url: None
  • paper_authors: Shashika R. Muramudalige, Benjamin W. K. Hung, Rosanne Libretti, Jytte Klausen, Anura P. Jayasumana
  • for: 预防暴力激进分子的袭击,保障公众安全。
  • methods: 使用自动化工具抽取信息以回答分析人员的查询，持续扫描新信息并与既往事件整合，从而发现新出现的威胁。
  • results: 开发了反恐调查性模式检测框架 INSPECT，能够自动检测行为指标与风险画像/群体，并将大规模的细致取证传记挖掘与查询工作自动化；INSPECT 已在美国国内圣战主义数据集上得到验证和评估。
    Abstract Law-enforcement investigations aimed at preventing attacks by violent extremists have become increasingly important for public safety. The problem is exacerbated by the massive data volumes that need to be scanned to identify complex behaviors of extremists and groups. Automated tools are required to extract information to respond queries from analysts, continually scan new information, integrate them with past events, and then alert about emerging threats. We address challenges in investigative pattern detection and develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT). The framework integrates numerous computing tools that include machine learning techniques to identify behavioral indicators and graph pattern matching techniques to detect risk profiles/groups. INSPECT also automates multiple tasks for large-scale mining of detailed forensic biographies, forming knowledge networks, and querying for behavioral indicators and radicalization trajectories. INSPECT targets human-in-the-loop mode of investigative search and has been validated and evaluated using an evolving dataset on domestic jihadism.
    摘要 旨在防范暴力极端分子袭击的执法调查对公共安全日益重要。需要扫描海量数据以识别极端分子及其团体的复杂行为，使这一问题更加严峻。因此需要自动化工具来抽取信息以回应分析人员的查询、持续扫描新信息并将其与既往事件整合，进而对新出现的威胁发出警报。我们针对调查性模式检测中的挑战，提出了反恐调查性模式检测框架 INSPECT。该框架集成了多种计算工具，包括用于识别行为指标的机器学习技术和用于检测风险画像/群体的图模式匹配技术；INSPECT 还将详细取证传记的大规模挖掘、知识网络的构建以及行为指标与激进化轨迹的查询等多项任务自动化。INSPECT 面向“人在回路”的调查检索模式，并已使用一个持续演化的美国国内圣战主义数据集进行了验证和评估。