cs.LG - 2023-10-26

A Spectral Condition for Feature Learning

  • paper_url: http://arxiv.org/abs/2310.17813
  • repo_url: None
  • paper_authors: Greg Yang, James B. Simon, Jeremy Bernstein
  • for: Studies how to initialize and train large neural networks so that the network's internal representations evolve nontrivially at all widths, a process known as feature learning.
  • methods: Scales the spectral norm of weight matrices and their updates, and shows that this scaling yields feature learning across widths, in contrast to heuristic scalings based on Frobenius norm and entry size (see the sketch after this entry).
  • results: Controlling the spectral norm of the weight matrices and of their updates suffices for feature learning; the spectral scaling analysis also gives an elementary derivation of maximal update parametrization.
    Abstract The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a process known as feature learning. Here, we show that feature learning is achieved by scaling the spectral norm of weight matrices and their updates like $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$, in contrast to widely used but heuristic scalings based on Frobenius norm and entry size. Our spectral scaling analysis also leads to an elementary derivation of \emph{maximal update parametrization}. All in all, we aim to provide the reader with a solid conceptual understanding of feature learning in neural networks.
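A minimal PyTorch sketch of the spectral scaling rule quoted in the abstract (the shapes, the Gaussian draws, and the explicit rescaling are illustrative assumptions, not the authors' implementation): rescale a weight matrix and an update so that their spectral norms equal $\sqrt{\texttt{fan-out}/\texttt{fan-in}}$.

```python
import torch

def spectral_rescale(w: torch.Tensor) -> torch.Tensor:
    """Rescale a matrix so its spectral norm equals sqrt(fan_out / fan_in)."""
    fan_out, fan_in = w.shape
    target = (fan_out / fan_in) ** 0.5
    sigma = torch.linalg.matrix_norm(w, ord=2)  # largest singular value
    return w * (target / sigma)

# Initialize a weight matrix and an update direction at the prescribed spectral scale.
fan_in, fan_out = 256, 1024
w = spectral_rescale(torch.randn(fan_out, fan_in))
delta = spectral_rescale(torch.randn(fan_out, fan_in))  # rescaled update direction
w = w + delta
print(torch.linalg.matrix_norm(w, ord=2))
```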

Interacting Diffusion Processes for Event Sequence Forecasting

  • paper_url: http://arxiv.org/abs/2310.17800
  • repo_url: None
  • paper_authors: Mai Zeng, Florence Regol, Mark Coates
  • for: Long-horizon forecasting of event sequences that occur at irregular time intervals (temporal point processes, TPPs).
  • methods: Proposes a diffusion-based generative model that performs sequence-to-sequence, multi-step prediction from historical event sequences and directly learns the joint probability distribution of event types and inter-arrival times, using two interacting diffusion processes coupled through their denoising functions.
  • results: Outperforms state-of-the-art baselines on long-horizon TPP forecasting.
    Abstract Neural Temporal Point Processes (TPPs) have emerged as the primary framework for predicting sequences of events that occur at irregular time intervals, but their sequential nature can hamper performance for long-horizon forecasts. To address this, we introduce a novel approach that incorporates a diffusion generative model. The model facilitates sequence-to-sequence prediction, allowing multi-step predictions based on historical event sequences. In contrast to previous approaches, our model directly learns the joint probability distribution of types and inter-arrival times for multiple events. This allows us to fully leverage the high dimensional modeling capability of modern generative models. Our model is composed of two diffusion processes, one for the time intervals and one for the event types. These processes interact through their respective denoising functions, which can take as input intermediate representations from both processes, allowing the model to learn complex interactions. We demonstrate that our proposal outperforms state-of-the-art baselines for long-horizon forecasting of TPP.

Neural Stress Fields for Reduced-order Elastoplasticity and Fracture

  • paper_url: http://arxiv.org/abs/2310.17790
  • repo_url: None
  • paper_authors: Zeshun Zong, Xuan Li, Minchen Li, Maurizio M. Chiaramonte, Wojciech Matusik, Eitan Grinspun, Kevin Carlberg, Chenfanfu Jiang, Peter Yichen Chen
  • for: A hybrid neural-network and physics framework for reduced-order modeling of elastoplasticity and fracture, targeting applications constrained by computation time and memory, such as virtual reality.
  • methods: Trains a low-dimensional neural stress field (NSF), an implicit neural representation of the Kirchhoff stress field, together with neural deformation and affine fields that share the same low-dimensional latent space, so that stresses and internal forces can be evaluated efficiently at arbitrary spatial locations (see the sketch after this entry).
  • results: New simulations run by evolving in the single latent space, yielding dimension reduction of up to 100,000x and time savings of up to 10x across elastica, sand, metal, non-Newtonian fluids, fracture, contact, and collision.
    Abstract We propose a hybrid neural network and physics framework for reduced-order modeling of elastoplasticity and fracture. State-of-the-art scientific computing models like the Material Point Method (MPM) faithfully simulate large-deformation elastoplasticity and fracture mechanics. However, their long runtime and large memory consumption render them unsuitable for applications constrained by computation time and memory usage, e.g., virtual reality. To overcome these barriers, we propose a reduced-order framework. Our key innovation is training a low-dimensional manifold for the Kirchhoff stress field via an implicit neural representation. This low-dimensional neural stress field (NSF) enables efficient evaluations of stress values and, correspondingly, internal forces at arbitrary spatial locations. In addition, we also train neural deformation and affine fields to build low-dimensional manifolds for the deformation and affine momentum fields. These neural stress, deformation, and affine fields share the same low-dimensional latent space, which uniquely embeds the high-dimensional simulation state. After training, we run new simulations by evolving in this single latent space, which drastically reduces the computation time and memory consumption. Our general continuum-mechanics-based reduced-order framework is applicable to any phenomena governed by the elastodynamics equation. To showcase the versatility of our framework, we simulate a wide range of material behaviors, including elastica, sand, metal, non-Newtonian fluids, fracture, contact, and collision. We demonstrate dimension reduction by up to 100,000X and time savings by up to 10X.
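To make the neural stress field idea concrete, here is a minimal sketch of an implicit neural representation mapping a shared latent code and a spatial query to a Kirchhoff stress tensor; the layer sizes, latent dimension, and class name are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class NeuralStressField(nn.Module):
    """Implicit neural representation: (latent code z, spatial query x) -> 3x3 stress tensor.
    A toy sketch; the paper additionally trains neural deformation and affine fields sharing z."""
    def __init__(self, latent_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 9),  # flattened 3x3 stress tensor
        )

    def forward(self, z, x):
        out = self.net(torch.cat([z, x], dim=-1))
        return out.view(*out.shape[:-1], 3, 3)

nsf = NeuralStressField()
z = torch.zeros(1, 16)   # one latent state of the reduced simulation
x = torch.rand(1, 3)     # arbitrary spatial query location
stress = nsf(z, x)       # efficient stress evaluation at x
```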

Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates

  • paper_url: http://arxiv.org/abs/2310.17786
  • repo_url: None
  • paper_authors: Nicholas E. Corrado, Josiah P. Hanna
  • for: Identifies which aspects of dynamics-invariant data augmentation (DA) are responsible for improved data efficiency in reinforcement learning (RL).
  • methods: Integrates dynamics-invariant augmentation functions into model-free RL updates and isolates three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio); a toy sketch of the replay-ratio knob follows this entry.
  • results: Increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and decreasing the augmented replay ratio substantially improves data efficiency; certain tasks are solvable only when the replay ratio is sufficiently low.
    Abstract Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy will improve data efficiency. In this paper, we seek to identify general aspects of DA responsible for observed learning improvements. Our study focuses on sparse-reward tasks with dynamics-invariant data augmentation functions, serving as an initial step towards a more general understanding of DA and its integration into RL training. Experimentally, we isolate three relevant aspects of DA: state-action coverage, reward density, and the number of augmented transitions generated per update (the augmented replay ratio). From our experiments, we draw two conclusions: (1) increasing state-action coverage often has a much greater impact on data efficiency than increasing reward density, and (2) decreasing the augmented replay ratio substantially improves data efficiency. In fact, certain tasks in our empirical study are solvable only when the replay ratio is sufficiently low.
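One plausible way to picture the augmented replay ratio as a tunable knob is sketched below; the augmentation function, buffer layout, and mixing rule are invented for illustration and may differ from the paper's exact definition and implementation.

```python
import random
import numpy as np

def augment(transition, shift):
    # Toy dynamics-invariant augmentation: translate states, assuming
    # translation-invariant dynamics and reward (purely illustrative).
    s, a, r, s_next = transition
    return (s + shift, a, r, s_next + shift)

real_buffer, aug_buffer = [], []

def store(transition, copies_per_transition=4):
    real_buffer.append(transition)
    for _ in range(copies_per_transition):
        aug_buffer.append(augment(transition, shift=np.random.uniform(-1.0, 1.0)))

def sample_batch(batch_size, augmented_replay_ratio):
    # One reading of the augmented replay ratio: augmented transitions used per
    # real transition in each update batch.
    n_aug = min(int(batch_size * augmented_replay_ratio / (1 + augmented_replay_ratio)),
                len(aug_buffer))
    n_real = min(batch_size - n_aug, len(real_buffer))
    return random.sample(real_buffer, n_real) + random.sample(aug_buffer, n_aug)

for _ in range(100):
    store((np.random.randn(2), 0, 0.0, np.random.randn(2)))
batch = sample_batch(32, augmented_replay_ratio=0.25)  # a low ratio, as the paper recommends
```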

Learning Extrinsic Dexterity with Parameterized Manipulation Primitives

  • paper_url: http://arxiv.org/abs/2310.17785
  • repo_url: None
  • paper_authors: Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov
  • for: Robot grasping problems in which all grasps on the target object are occluded, e.g., by the environment, so that single-shot grasp planning fails and the object must first be manipulated into a configuration that affords a grasp.
  • methods: Uses hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives; the learned low-level policies control the object's state by exploiting interactions between the object, the gripper, and the environment, operating directly on depth perception data without object detection, pose estimation, or manually designed controllers.
  • results: Picks box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace; the policy transfers to a real robot and completes the picking task in 98% of experimental trials.
    Abstract Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98\% of experimental trials.

Learning Optimal Classification Trees Robust to Distribution Shifts

  • paper_url: http://arxiv.org/abs/2310.17772
  • repo_url: None
  • paper_authors: Nathan Justin, Sina Aghaei, Andrés Gómez, Phebe Vayanos
  • for: High-stakes settings such as public health and social work, where self-reported survey data is highly sensitive to question framing, the time and place of the survey, and the interviewee's comfort level, creating distribution shifts between training and deployment.
  • methods: Casts learning an optimal robust classification tree as a single-stage mixed-integer robust optimization problem with a highly nonlinear and discontinuous objective, reformulates it equivalently as a two-stage linear robust optimization problem, and solves it with a tailored constraint-generation procedure.
  • results: Compared with a regularized, non-robust optimal tree, the robust trees improve worst-case accuracy by up to 12.48% and average-case accuracy by up to 4.85% across several datasets and distribution shifts.
    Abstract We consider the problem of learning classification trees that are robust to distribution shifts between training and testing/deployment data. This problem arises frequently in high stakes settings such as public health and social work where data is often collected using self-reported surveys which are highly sensitive to e.g., the framing of the questions, the time when and place where the survey is conducted, and the level of comfort the interviewee has in sharing information with the interviewer. We propose a method for learning optimal robust classification trees based on mixed-integer robust optimization technology. In particular, we demonstrate that the problem of learning an optimal robust tree can be cast as a single-stage mixed-integer robust optimization problem with a highly nonlinear and discontinuous objective. We reformulate this problem equivalently as a two-stage linear robust optimization problem for which we devise a tailored solution procedure based on constraint generation. We evaluate the performance of our approach on numerous publicly available datasets, and compare the performance to a regularized, non-robust optimal tree. We show an increase of up to 12.48% in worst-case accuracy and of up to 4.85% in average-case accuracy across several datasets and distribution shifts from using our robust solution in comparison to the non-robust one.

Distributed Personalized Empirical Risk Minimization

  • paper_url: http://arxiv.org/abs/2310.17761
  • repo_url: None
  • paper_authors: Yuyang Deng, Mohammad Mahdi Kamani, Pouria Mahdavinia, Mehrdad Mahdavi
  • for: Proposes Personalized Empirical Risk Minimization (PERM), a paradigm for learning from heterogeneous data sources without imposing stringent constraints on the computational resources shared by participating devices.
  • methods: Learns a distinct model for each client by learning whom to learn with and personalizing the aggregation of local empirical losses through effective estimation of the statistical discrepancy among data distributions, which yields optimal statistical accuracy for all local distributions and overcomes data heterogeneity.
  • results: A distributed algorithm that replaces standard model averaging with model shuffling optimizes the PERM objectives for all devices simultaneously and allows distinct model architectures (e.g., neural networks with different numbers of parameters) per client, respecting each client's memory and compute budget; convergence is analyzed rigorously and experiments corroborate the approach.
    Abstract This paper advocates a new paradigm Personalized Empirical Risk Minimization (PERM) to facilitate learning from heterogeneous data sources without imposing stringent constraints on computational resources shared by participating devices. In PERM, we aim to learn a distinct model for each client by learning who to learn with and personalizing the aggregation of local empirical losses by effectively estimating the statistical discrepancy among data distributions, which entails optimal statistical accuracy for all local distributions and overcomes the data heterogeneity issue. To learn personalized models at scale, we propose a distributed algorithm that replaces the standard model averaging with model shuffling to simultaneously optimize PERM objectives for all devices. This also allows us to learn distinct model architectures (e.g., neural networks with different numbers of parameters) for different clients, thus confining underlying memory and compute resources of individual clients. We rigorously analyze the convergence of the proposed algorithm and conduct experiments that corroborate the effectiveness of the proposed paradigm.

Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization

  • paper_url: http://arxiv.org/abs/2310.17759
  • repo_url: None
  • paper_authors: Liang Zhang, Junchi Yang, Amin Karbasi, Niao He
  • for: Studies algorithmic reproducibility of machine learning algorithms, i.e., the deviation in outputs under minor changes to the training process, and its trade-off with convergence rate (gradient complexity).
  • methods: Uses regularization-based algorithms to achieve an optimal trade-off between reproducibility and gradient complexity under various error-prone oracle settings.
  • results: For smooth convex minimization and smooth convex-concave minimax problems, optimal reproducibility and near-optimal gradient complexity can be achieved simultaneously; with an inexact initialization oracle, the regularization-based algorithms attain the best of both worlds, with an inexact gradient oracle the near-optimal guarantees also hold for minimax optimization, and with a stochastic gradient oracle, stochastic gradient descent ascent is optimal in both reproducibility and gradient complexity.
    Abstract Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to an enhanced understanding of the reproducibility-convergence trade-off in the context of convex optimization.

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

  • paper_url: http://arxiv.org/abs/2310.17752
  • repo_url: None
  • paper_authors: Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
  • for: On-device learning and privacy-preserving personalization (e.g., locally fine-tuning large language models on personalized data).
  • methods: Introduces PockEngine, a tiny, sparse, and efficient engine for training and fine-tuning on edge devices; it supports sparse backpropagation (pruning the backward graph and sparsely updating the model) and is compilation-first, deriving the entire training graph at compile time and applying graph optimizations such as operator reordering and backend switching.
  • results: PockEngine compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPU/GPU/DSPs; it achieves up to 15x speedup over off-the-shelf TensorFlow on Raspberry Pi, 5.6x memory saving for backpropagation on Jetson AGX Orin, and fine-tunes LLaMAv2-7B on Jetson AGX Orin at 550 tokens/s, 7.9x faster than PyTorch.
    Abstract On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data). However, existing training frameworks are designed for cloud servers with powerful accelerators (e.g., GPUs, TPUs) and lack the optimizations for learning on the edge, which faces challenges of resource limitations and edge hardware diversity. We introduce PockEngine: a tiny, sparse and efficient engine to enable fine-tuning on various edge devices. PockEngine supports sparse backpropagation: it prunes the backward graph and sparsely updates the model with measured memory saving and latency reduction while maintaining the model quality. Secondly, PockEngine is compilation first: the entire training graph (including forward, backward and optimization steps) is derived at compile-time, which reduces the runtime overhead and brings opportunities for graph transformations. PockEngine also integrates a rich set of training graph optimizations, thus can further accelerate the training cost, including operator reordering and backend switching. PockEngine supports diverse applications, frontends and hardware backends: it flexibly compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPU/GPU/DSPs. We evaluated PockEngine on both vision models and large language models. PockEngine achieves up to 15 $\times$ speedup over off-the-shelf TensorFlow (Raspberry Pi), 5.6 $\times$ memory saving back-propagation (Jetson AGX Orin). Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9$\times$ faster than the PyTorch.

Making the End-User a Priority in Benchmarking: OrionBench for Unsupervised Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.17748
  • repo_url: https://github.com/sintel-dev/orion
  • paper_authors: Sarah Alnegheimish, Laure Berti-Equille, Kalyan Veeramachaneni
  • for: A user-centric, continuously maintained benchmark for unsupervised time series anomaly detection.
  • methods: The OrionBench framework provides universal abstractions to represent models (including deep learning-based pipelines), extensibility to add new pipelines and datasets, hyperparameter standardization, pipeline verification, and frequent releases with published benchmarks.
  • results: Demonstrates the usage of OrionBench and the progression of pipelines across 15 releases published over three years, and walks through two real scenarios that highlight the importance of continuous benchmarks in unsupervised time series anomaly detection.
    Abstract Time series anomaly detection is a prevalent problem in many application domains such as patient monitoring in healthcare, forecasting in finance, or predictive maintenance in energy. This has led to the emergence of a plethora of anomaly detection methods, including more recently, deep learning based methods. Although several benchmarks have been proposed to compare newly developed models, they usually rely on one-time execution over a limited set of datasets and the comparison is restricted to a few models. We propose OrionBench -- a user centric continuously maintained benchmark for unsupervised time series anomaly detection. The framework provides universal abstractions to represent models, extensibility to add new pipelines and datasets, hyperparameter standardization, pipeline verification, and frequent releases with published benchmarks. We demonstrate the usage of OrionBench, and the progression of pipelines across 15 releases published over the course of three years. Moreover, we walk through two real scenarios we experienced with OrionBench that highlight the importance of continuous benchmarks in unsupervised time series anomaly detection.

BERT-PIN: A BERT-based Framework for Recovering Missing Data Segments in Time-series Load Profiles

  • paper_url: http://arxiv.org/abs/2310.17742
  • repo_url: None
  • paper_authors: Yi Hu, Kai Ye, Hyeonjin Kim, Ning Lu
  • for: Proposes BERT-PIN, a Transformer (BERT)-based Profile Inpainting Network for recovering multiple missing data segments (MDSs) in time-series load profiles.
  • methods: Segments load and temperature profiles into line segments, treating each segment as a word and the entire profile as a sentence for a standard Transformer structure; a top-candidates selection process produces a sequence of probability distributions from which users can generate multiple plausible imputed data sets at different confidence levels (see the tokenization sketch after this entry).
  • results: On real-world data for multiple-MDS recovery and demand response baseline estimation, BERT-PIN outperforms existing methods in accuracy and can restore multiple MDSs within a longer window; as a pre-trained model it can be fine-tuned for downstream tasks such as classification and super resolution.
    Abstract Inspired by the success of the Transformer model in natural language processing and computer vision, this paper introduces BERT-PIN, a Bidirectional Encoder Representations from Transformers (BERT) powered Profile Inpainting Network. BERT-PIN recovers multiple missing data segments (MDSs) using load and temperature time-series profiles as inputs. To adopt a standard Transformer model structure for profile inpainting, we segment the load and temperature profiles into line segments, treating each segment as a word and the entire profile as a sentence. We incorporate a top candidates selection process in BERT-PIN, enabling it to produce a sequence of probability distributions, based on which users can generate multiple plausible imputed data sets, each reflecting different confidence levels. We develop and evaluate BERT-PIN using real-world dataset for two applications: multiple MDSs recovery and demand response baseline estimation. Simulation results show that BERT-PIN outperforms the existing methods in accuracy while is capable of restoring multiple MDSs within a longer window. BERT-PIN, served as a pre-trained model, can be fine-tuned for conducting many downstream tasks, such as classification and super resolution.
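The "segment as a word, profile as a sentence" tokenization described in the abstract might look roughly like the following; the profile resolution, segment length, and mask placement are illustrative assumptions.

```python
import numpy as np

def profile_to_tokens(profile: np.ndarray, seg_len: int) -> np.ndarray:
    """Split a 1-D load (or temperature) profile into line segments; each segment plays the
    role of a 'word' and the full profile the role of a 'sentence' for a Transformer."""
    n_seg = len(profile) // seg_len
    return profile[: n_seg * seg_len].reshape(n_seg, seg_len)

load = np.random.rand(96)                    # e.g. a daily profile at 15-minute resolution
tokens = profile_to_tokens(load, seg_len=8)  # 12 "words" of length 8
mask = np.zeros(len(tokens), dtype=bool)
mask[4:6] = True                             # mark two segments as missing data segments (MDSs)
print(tokens.shape, mask)
```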

GNN-GMVO: Graph Neural Networks for Optimizing Gross Merchandise Value in Similar Item Recommendation

  • paper_url: http://arxiv.org/abs/2310.17732
  • repo_url: None
  • paper_authors: Ramin Giahi, Reza Yousefi Maragheh, Nima Farrokhsiar, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan
  • for: Improving similar item recommendation on e-commerce platforms while maximizing Gross Merchandise Value (GMV).
  • methods: Proposes GNN-GMVO, a graph neural network that directly optimizes a GMV objective while modeling the complex relations between items, together with a customized edge construction method that tailors the graph to the similar item recommendation task and mitigates noisy, heterogeneous item-item relations.
  • results: In comprehensive experiments on three real-world datasets, the model achieves higher prediction performance and higher expected GMV for top-ranked items than selected state-of-the-art baselines.
    Abstract Similar item recommendation is a critical task in the e-Commerce industry, which helps customers explore similar and relevant alternatives based on their interested products. Despite the traditional machine learning models, Graph Neural Networks (GNNs), by design, can understand complex relations like similarity between products. However, in contrast to their wide usage in retrieval tasks and their focus on optimizing the relevance, the current GNN architectures are not tailored toward maximizing revenue-related objectives such as Gross Merchandise Value (GMV), which is one of the major business metrics for e-Commerce companies. In addition, defining accurate edge relations in GNNs is non-trivial in large-scale e-Commerce systems, due to the heterogeneity nature of the item-item relationships. This work aims to address these issues by designing a new GNN architecture called GNN-GMVO (Graph Neural Network - Gross Merchandise Value Optimizer). This model directly optimizes GMV while considering the complex relations between items. In addition, we propose a customized edge construction method to tailor the model toward similar item recommendation task and alleviate the noisy and complex item-item relations. In our comprehensive experiments on three real-world datasets, we show higher prediction performance and expected GMV for top ranked items recommended by our model when compared with selected state-of-the-art benchmark models.

Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms

  • paper_url: http://arxiv.org/abs/2310.17716
  • repo_url: None
  • paper_authors: Alexander Nietner
  • for: Builds a unified perspective bridging the statistical and parametrized learning paradigms, covering quantum learning settings that admit neither a statistical query (SQ) nor a quantum statistical query (QSQ) analog.
  • methods: Taking inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle, studies learning from an evaluation oracle that provides estimates of function values, and introduces an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries and characterizes the query complexity of learning linear function classes.
  • results: Extends learnability results for output distributions of quantum circuits and Clifford unitaries from the SQ to the (multi-copy) QSQ setting, implying exponential separations between learning stabilizer states from (multi-copy) QSQs versus from quantum samples, and gives an intuitive picture of the hardness of several popular quantum machine learning (QML) settings that goes beyond barren plateaus and the statistical dimension.
    Abstract Kearns' statistical query (SQ) oracle (STOC'93) lends a unifying perspective for most classical machine learning algorithms. This ceases to be true in quantum learning, where many settings do not admit, neither an SQ analog nor a quantum statistical query (QSQ) analog. In this work, we take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle (TOCT'14) and establish a unified perspective bridging the statistical and parametrized learning paradigms in a novel way. We explore the problem of learning from an evaluation oracle, which provides an estimate of function values, and introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries and characterizes the query complexity for learning linear function classes. The framework is directly applicable to the QSQ setting and virtually all algorithms based on loss function optimization. Our first application is to extend prior results on the learnability of output distributions of quantum circuits and Clifford unitaries from the SQ to the (multi-copy) QSQ setting, implying exponential separations between learning stabilizer states from (multi-copy) QSQs versus from quantum samples. Our second application is to analyze some popular quantum machine learning (QML) settings. We gain an intuitive picture of the hardness of many QML tasks which goes beyond existing methods such as barren plateaus and the statistical dimension, and contains crucial setting-dependent implications. Our framework not only unifies the perspective of cost concentration with that of the statistical dimension in a unified language but exposes their connectedness and similarity.

Community Detection and Classification Guarantees Using Embeddings Learned by Node2Vec

  • paper_url: http://arxiv.org/abs/2310.17712
  • repo_url: None
  • paper_authors: Andrew Davison, S. Carlyle Morgan, Owen G. Ward
  • for: Theoretical properties of the network node embeddings learned by the node2vec algorithm.
  • methods: Learns node embeddings with node2vec and recovers communities by applying k-means clustering to the embedding vectors (a minimal pipeline sketch follows this entry).
  • results: k-means clustering on node2vec embeddings gives weakly consistent community recovery for nodes in (degree-corrected) stochastic block models; the embeddings are also useful for node and link prediction, as demonstrated empirically.
    Abstract Embedding the nodes of a large network into an Euclidean space is a common objective in modern machine learning, with a variety of tools available. These embeddings can then be used as features for tasks such as community detection/node clustering or link prediction, where they achieve state of the art performance. With the exception of spectral clustering methods, there is little theoretical understanding for other commonly used approaches to learning embeddings. In this work we examine the theoretical properties of the embeddings learned by node2vec. Our main result shows that the use of k-means clustering on the embedding vectors produced by node2vec gives weakly consistent community recovery for the nodes in (degree corrected) stochastic block models. We also discuss the use of these embeddings for node and link prediction tasks. We demonstrate this result empirically, and examine how this relates to other embedding tools for network data.
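A minimal sketch of the pipeline the theory covers: embed a stochastic block model graph with node2vec and cluster the embeddings with k-means. It assumes the community `node2vec` Python package and NetworkX; all hyperparameters are illustrative rather than those used in the paper.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans
from node2vec import Node2Vec  # community "node2vec" package (assumed available)

# A two-block stochastic block model as a toy example.
sizes, p = [100, 100], [[0.10, 0.02], [0.02, 0.10]]
g = nx.stochastic_block_model(sizes, p, seed=0)

# Learn node2vec embeddings (hyperparameters are illustrative, not the paper's).
n2v = Node2Vec(g, dimensions=32, walk_length=20, num_walks=50, workers=1)
model = n2v.fit(window=5, min_count=1)
emb = np.array([model.wv[str(v)] for v in g.nodes()])

# Recover communities with k-means on the embedding vectors.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
```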

High-Dimensional Prediction for Sequential Decision Making

  • paper_url: http://arxiv.org/abs/2310.17651
  • repo_url: None
  • paper_authors: Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie
  • for: Making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, tailored to downstream decision makers.
  • methods: Presents efficient algorithms for this problem, together with a number of applications that stem from choosing an appropriate set of conditioning events.
  • results: Obtains efficient no-subsequence-regret algorithms in extensive-form games (EFGs), yielding a new family of regret guarantees that generalizes some existing EFG regret notions, and develops a transparent alternative to conformal prediction for building valid online adversarial multiclass prediction sets with strong conditional validity guarantees and improved loss compared to any collection of benchmark models.
    Abstract We study the problem of making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers. We give efficient algorithms for solving this problem, as well as a number of applications that stem from choosing an appropriate set of conditioning events. For example, we can efficiently make predictions targeted at polynomially many decision makers, giving each of them optimal swap regret if they best-respond to our predictions. We generalize this to online combinatorial optimization, where the decision makers have a very large action space, to give the first algorithms offering polynomially many decision makers no regret on polynomially many subsequences that may depend on their actions and the context. We apply these results to get efficient no-subsequence-regret algorithms in extensive-form games (EFGs), yielding a new family of regret guarantees for EFGs that generalizes some existing EFG regret notions, e.g. regret to informed causal deviations, and is generally incomparable to other known such notions. Next, we develop a novel transparent alternative to conformal prediction for building valid online adversarial multiclass prediction sets. We produce class scores that downstream algorithms can use for producing valid-coverage prediction sets, as if these scores were the true conditional class probabilities. We show this implies strong conditional validity guarantees including set-size-conditional and multigroup-fair coverage for polynomially many downstream prediction sets. Moreover, our class scores can be guaranteed to have improved $L_2$ loss, cross-entropy loss, and generally any Bregman loss, compared to any collection of benchmark models, yielding a high-dimensional real-valued version of omniprediction.

Counterfactual Fairness for Predictions using Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2310.17687
  • repo_url: None
  • paper_authors: Yuchen Ma, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel
  • for: Achieving counterfactual fairness in predictions, i.e., ensuring the prediction for an individual is the same as in a counterfactual world under a different sensitive attribute.
  • methods: Proposes the Generative Counterfactual Fairness Network (GCFN), which uses a tailored generative adversarial network to directly learn the counterfactual distribution of the descendants of the sensitive attribute and enforces fair predictions through a novel counterfactual mediator regularization; if the counterfactual distribution is learned sufficiently well, counterfactual fairness is mathematically guaranteed.
  • results: Achieves state-of-the-art performance across various experiments and, in a real-world recidivism prediction case study, produces meaningful predictions in practice.
    Abstract Fairness in predictions is of direct importance in practice due to legal, ethical, and societal reasons. It is often achieved through counterfactual fairness, which ensures that the prediction for an individual is the same as that in a counterfactual world under a different sensitive attribute. However, achieving counterfactual fairness is challenging as counterfactuals are unobservable. In this paper, we develop a novel deep neural network called Generative Counterfactual Fairness Network (GCFN) for making predictions under counterfactual fairness. Specifically, we leverage a tailored generative adversarial network to directly learn the counterfactual distribution of the descendants of the sensitive attribute, which we then use to enforce fair predictions through a novel counterfactual mediator regularization. If the counterfactual distribution is learned sufficiently well, our method is mathematically guaranteed to ensure the notion of counterfactual fairness. Thereby, our GCFN addresses key shortcomings of existing baselines that are based on inferring latent variables, yet which (a) are potentially correlated with the sensitive attributes and thus lead to bias, and (b) have weak capability in constructing latent representations and thus low prediction performance. Across various experiments, our method achieves state-of-the-art performance. Using a real-world case study from recidivism prediction, we further demonstrate that our method makes meaningful predictions in practice.

Do Graph Neural Networks Dream of Landau Damping? Insights from Kinetic Simulations of a Plasma Sheet Model

  • paper_url: http://arxiv.org/abs/2310.17646
  • repo_url: None
  • paper_authors: Diogo D Carvalho, Diogo R Ferreira, Luis O Silva
  • for: Explores fully replacing a kinetic plasma physics simulator with a graph neural network-based simulator.
  • methods: Uses a graph neural network surrogate because its message-passing update mechanism resembles a traditional physics solver update and because known physical priors can be enforced in the graph construction and update.
  • results: The model learns the kinetic dynamics of the one-dimensional plasma sheet model and recovers well-known kinetic plasma processes, including plasma thermalization, electrostatic fluctuations about thermal equilibrium, the drag on a fast sheet, and Landau damping; performance is compared against the original plasma model in terms of run time, conservation laws, and the temporal evolution of key physical quantities.
    Abstract We explore the possibility of fully replacing a plasma physics kinetic simulator with a graph neural network-based simulator. We focus on this class of surrogate models given the similarity between their message-passing update mechanism and the traditional physics solver update, and the possibility of enforcing known physical priors into the graph construction and update. We show that our model learns the kinetic plasma dynamics of the one-dimensional plasma model, a predecessor of contemporary kinetic plasma simulation codes, and recovers a wide range of well-known kinetic plasma processes, including plasma thermalization, electrostatic fluctuations about thermal equilibrium, and the drag on a fast sheet and Landau damping. We compare the performance against the original plasma model in terms of run-time, conservation laws, and temporal evolution of key physical quantities. The limitations of the model are presented and possible directions for higher-dimensional surrogate models for kinetic plasmas are discussed.

Where you go is who you are – A study on machine learning based semantic privacy attacks

  • paper_url: http://arxiv.org/abs/2310.17643
  • repo_url: https://github.com/mie-lab/trip_purpose_privacy
  • paper_authors: Nina Wiedemann, Ourania Kounadi, Martin Raubal, Krzysztof Janowicz
  • for: Investigates the risk of privacy loss from machine learning-based semantic attacks on raw location data, even when the data is inaccurate.
  • methods: Presents two attack scenarios, location categorization and user profiling, and conducts experiments on the Foursquare dataset and tracking data to demonstrate the potential for abuse of high-quality spatial information (see the toy sketch after this entry).
  • results: With location obfuscation of more than 1 km, spatial information hardly adds any value, yet a high privacy risk from temporal information alone remains; the availability of public context data such as POIs plays a key role in inference based on spatial information.
    Abstract Concerns about data privacy are omnipresent, given the increasing usage of digital applications and their underlying business model that includes selling user data. Location data is particularly sensitive since they allow us to infer activity patterns and interests of users, e.g., by categorizing visited locations based on nearby points of interest (POI). On top of that, machine learning methods provide new powerful tools to interpret big data. In light of these considerations, we raise the following question: What is the actual risk that realistic, machine learning based privacy attacks can obtain meaningful semantic information from raw location data, subject to inaccuracies in the data? In response, we present a systematic analysis of two attack scenarios, namely location categorization and user profiling. Experiments on the Foursquare dataset and tracking data demonstrate the potential for abuse of high-quality spatial information, leading to a significant privacy loss even with location inaccuracy of up to 200m. With location obfuscation of more than 1 km, spatial information hardly adds any value, but a high privacy risk solely from temporal information remains. The availability of public context data such as POIs plays a key role in inference based on spatial information. Our findings point out the risks of ever-growing databases of tracking data and spatial context data, which policymakers should consider for privacy regulations, and which could guide individuals in their personal location protection measures.
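A toy picture of the location categorization attack scenario (the POI list, coordinates, and obfuscation scheme are invented for illustration): a visit is labeled by its nearest point of interest, and spatial obfuscation on the order of a kilometer washes out that inference.

```python
import numpy as np

# Toy POI database: ((x_km, y_km), category). Real attacks would use e.g. Foursquare/OSM POIs.
pois = [((0.0, 0.0), "restaurant"), ((0.5, 0.2), "gym"), ((1.2, 0.9), "office")]

def obfuscate(visit, radius_km):
    """Perturb a visit location uniformly within a disk of the given radius (illustrative noise model)."""
    angle = np.random.uniform(0, 2 * np.pi)
    r = np.random.uniform(0, radius_km)
    return (visit[0] + r * np.cos(angle), visit[1] + r * np.sin(angle))

def categorize(visit):
    """Infer the semantic category of a visit from the nearest POI."""
    dists = [np.hypot(visit[0] - p[0][0], visit[1] - p[0][1]) for p in pois]
    return pois[int(np.argmin(dists))][1]

true_visit = (0.05, -0.02)
print(categorize(true_visit))                  # likely "restaurant"
print(categorize(obfuscate(true_visit, 1.0)))  # with ~1 km obfuscation the inference degrades
```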

Generative Fractional Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.17638
  • repo_url: None
  • paper_authors: Gabriel Nobis, Marco Aversa, Maximilian Springenberg, Michael Detzel, Stefano Ermon, Shinichi Nakajima, Roderick Murray-Smith, Sebastian Lapuschkin, Christoph Knochenhauer, Luis Oala, Wojciech Samek
  • for: Extends the continuous-time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM).
  • methods: Represents FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes, deriving a continuous reparameterization trick and a reverse-time model to define generative fractional diffusion models (GFDM), whose driving noise converges to a non-Markovian process of infinite quadratic variation.
  • results: The Hurst index $H\in(0,1)$ of FBM controls the roughness of the distribution-transforming path (illustrated by the sketch after this entry); to the authors' knowledge, this is the first generative model built on a stochastic process of infinite quadratic variation.
    Abstract We generalize the continuous time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes to define generative fractional diffusion models (GFDM) with driving noise converging to a non-Markovian process of infinite quadratic variation. The Hurst index $H\in(0,1)$ of FBM enables control of the roughness of the distribution transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.
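To illustrate how the Hurst index $H$ controls path roughness, here is a generic exact sampler of fractional Brownian motion via a Cholesky factor of its covariance; it is a standalone illustration of FBM itself, not the paper's Ornstein-Uhlenbeck-based construction or reparameterization trick.

```python
import numpy as np

def sample_fbm(n_steps, T, H, rng):
    """Exact sampling of fractional Brownian motion on a grid via Cholesky of its covariance,
    Cov(B_H(s), B_H(t)) = 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H})."""
    t = np.linspace(T / n_steps, T, n_steps)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))
    return np.concatenate([[0.0], L @ rng.standard_normal(n_steps)])

rng = np.random.default_rng(0)
rough = sample_fbm(256, 1.0, H=0.25, rng=rng)   # rougher path (low Hurst index)
smooth = sample_fbm(256, 1.0, H=0.75, rng=rng)  # smoother path (high Hurst index)
```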

Combating Representation Learning Disparity with Geometric Harmonization

  • paper_url: http://arxiv.org/abs/2310.17622
  • repo_url: https://github.com/MediaBrain-SJTU/Geometric-Harmonization
  • paper_authors: Zhihan Zhou, Jiangchao Yao, Feng Hong, Ya Zhang, Bo Han, Yanfeng Wang
  • for: Improving the robustness of self-supervised representation learning under long-tailed data distributions, where head classes dominate the feature regime and tail classes passively collapse.
  • methods: Proposes Geometric Harmonization (GH), which measures the population statistics of the embedding space on top of self-supervised learning and infers a fine-grained instance-wise calibration that constrains the space expansion of head classes and avoids the passive collapse of tail classes, encouraging category-level uniformity.
  • results: GH does not alter the self-supervised learning setup, can be integrated into existing methods at low cost, and shows strong effectiveness with high tolerance to distribution skewness on a range of benchmark datasets.
    Abstract Self-supervised learning (SSL) as an effective paradigm of representation learning has achieved tremendous success on various curated datasets in diverse scenarios. Nevertheless, when facing the long-tailed distribution in real-world applications, it is still hard for existing methods to capture transferable and robust representation. Conventional SSL methods, pursuing sample-level uniformity, easily leads to representation learning disparity where head classes dominate the feature regime but tail classes passively collapse. To address this problem, we propose a novel Geometric Harmonization (GH) method to encourage category-level uniformity in representation learning, which is more benign to the minority and almost does not hurt the majority under long-tailed distribution. Specially, GH measures the population statistics of the embedding space on top of self-supervised learning, and then infer an fine-grained instance-wise calibration to constrain the space expansion of head classes and avoid the passive collapse of tail classes. Our proposal does not alter the setting of SSL and can be easily integrated into existing methods in a low-cost manner. Extensive results on a range of benchmark datasets show the effectiveness of GH with high tolerance to the distribution skewness. Our code is available at https://github.com/MediaBrain-SJTU/Geometric-Harmonization.

A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces

  • paper_url: http://arxiv.org/abs/2310.17610
  • repo_url: None
  • paper_authors: Jonathan W. Siegel, Stephan Wojtowytsch
  • for: Optimization of convex objective functions by gradient flow/gradient descent and by heavy ball/accelerated gradient descent dynamics.
  • methods: Analyzes the decay of the excess energy $f(x_t) - \inf f$ along gradient flow curves in finite- and infinite-dimensional Hilbert spaces, with analogous results for discrete-time gradient descent, stochastic gradient descent with multiplicative noise, and the heavy ball ODE.
  • results: If $f$ has no minimizer, the convergence $f(x_t)\to\inf f$ can be arbitrarily slow; if $f$ has a minimizer, the excess energy is integrable in time and hence $f(x_t) - \inf f = o(1/t)$; in infinite-dimensional Hilbert spaces this rate is optimal, whereas in finite dimension it is not, e.g., $\liminf_{t\to\infty} t\log^2(t)\,\{f(x_t)-\inf f\}=0$ (a quick numerical check follows this entry).
    Abstract We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following: 1. If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow. 2. If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as $t\to\infty$. 3. In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$, even for a fixed quadratic objective. 4. In finite dimension (or more generally, for all gradient flow curves of finite length), this is not optimal: We prove that there are convex monotone decreasing integrable functions $g(t)$ which decrease to zero slower than $f(x_t)-\inf f$ for the gradient flow of any convex function on $\mathbb R^d$. For instance, we show that any gradient flow $x_t$ of a convex function $f$ in finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot \big\{f(x_t) -\inf f\big\}\big)=0$. This improves on the commonly reported $O(1/t)$ rate and provides a sharp characterization of the energy decay law. We also note that it is impossible to establish a rate $O(1/(t\phi(t))$ for any function $\phi$ which satisfies $\lim_{t\to\infty}\phi(t) = \infty$, even asymptotically. Similar results are obtained in related settings for (1) discrete time gradient descent, (2) stochastic gradient descent with multiplicative noise and (3) the heavy ball ODE. In the case of stochastic gradient descent, the summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely - an improvement on the convergence almost surely up to a subsequence which follows from the $O(1/n)$ decay estimate.
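A quick numerical sanity check of the finite-dimensional claim, using gradient descent as a crude stand-in for the gradient flow on the convex function $f(x)=x^4$; the step size, horizon, and the identification of iteration count with flow time are rough assumptions of this sketch.

```python
import numpy as np

# Gradient descent as a discretization of the gradient flow for the convex function f(x) = x^4,
# which has minimizer 0 and inf f = 0.
f = lambda x: x ** 4
grad = lambda x: 4 * x ** 3

x, lr = 1.0, 1e-3
for t in range(1, 200001):
    x -= lr * grad(x)
    if t % 50000 == 0:
        excess = f(x)
        # Both t * excess and t * log(t)^2 * excess shrink as t grows;
        # the theorem guarantees liminf 0 for the continuous-time flow.
        print(t, t * excess, t * np.log(t) ** 2 * excess)
```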

A minimax optimal control approach for robust neural ODEs

  • paper_url: http://arxiv.org/abs/2310.17584
  • repo_url: None
  • paper_authors: Cristina Cipriani, Alessandro Scagliotti, Tobias Wöhrer
  • for: Making neural ODEs robust to input perturbations through adversarial training viewed from a robust control perspective.
  • methods: Interprets neural ODEs as discretizations of control systems and formulates adversarial training with perturbed data as a minimax optimal control problem, deriving first-order optimality conditions in the form of Pontryagin's Maximum Principle; this interpretation leads to an alternative weighted training technique, tested on a low-dimensional classification task (a generic minimax training sketch follows this entry).
  • results: The control-theoretic adversarial training improves robustness to input perturbations, and the proposed weighted technique further improves performance on the tested task.
    Abstract In this paper, we address the adversarial training of neural ODEs from a robust control perspective. This is an alternative to the classical training via empirical risk minimization, and it is widely used to enforce reliable outcomes for input perturbations. Neural ODEs allow the interpretation of deep neural networks as discretizations of control systems, unlocking powerful tools from control theory for the development and the understanding of machine learning. In this specific case, we formulate the adversarial training with perturbed data as a minimax optimal control problem, for which we derive first order optimality conditions in the form of Pontryagin's Maximum Principle. We provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task.
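For context, the sketch below shows a generic empirical minimax (PGD-style) adversarial training step in PyTorch on a stand-in classifier; this is the standard baseline that the paper reinterprets through optimal control, not the Pontryagin-based or weighted scheme proposed by the authors.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))  # stands in for a (neural-ODE) classifier
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def pgd_attack(x, y, eps=0.1, alpha=0.02, steps=10):
    # Inner maximization: find a worst-case perturbation within an L_inf ball of radius eps.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return delta.detach()

x, y = torch.randn(64, 2), torch.randint(0, 2, (64,))
delta = pgd_attack(x, y)                 # maximize the loss over perturbations
opt.zero_grad()
loss_fn(model(x + delta), y).backward()  # outer minimization over model parameters
opt.step()
```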

Convergence of flow-based generative models via proximal gradient descent in Wasserstein space

  • paper_url: http://arxiv.org/abs/2310.17582
  • repo_url: None
  • paper_authors: Xiuyuan Cheng, Jianfeng Lu, Yixin Tan, Yao Xie
  • for: Theoretical guarantees for data generation by a progressive flow model, the JKO flow model.
  • methods: Implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network and analyzes it as proximal gradient descent (GD) in Wasserstein space.
  • results: Proves an $O(\varepsilon^2)$ Kullback-Leibler (KL) guarantee for data generation by a JKO flow model using $N \lesssim \log(1/\varepsilon)$ JKO steps, KL-$W_2$ mixed error guarantees when there are inversion errors in the reverse process, and a non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD for a general class of convex objective functionals.
    Abstract Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remain sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderleherer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow) where $\varepsilon $ is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest.

BLIS-Net: Classifying and Analyzing Signals on Graphs

  • paper_url: http://arxiv.org/abs/2310.17579
  • repo_url: None
  • paper_authors: Charles Xu, Laney Goldman, Valentina Guo, Benjamin Hollander-Bodie, Maedee Trank-Greene, Ian Adelstein, Edward De Brouwer, Rex Ying, Smita Krishnaswamy, Michael Perlmutter
  • for: Signal classification on graphs, where the data consists of many functions (signals) defined on the vertices of a single graph and may exhibit intricate multi-frequency behavior and long-range interactions.
  • methods: Proposes BLIS-Net (Bi-Lipschitz Scattering Net), a new GNN built on the geometric scattering transform that captures both local and global signal structure as well as low- and high-frequency information.
  • results: On synthetic and real-world datasets (traffic flow and fMRI), BLIS-Net outperforms the original geometric scattering architecture and achieves higher signal-classification performance.
    Abstract Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification. However, much less work has been done on signal classification, where the data consists of many functions (referred to as signals) defined on the vertices of a single graph. These tasks require networks designed differently from those designed for traditional GNN tasks. Indeed, traditional GNNs rely on localized low-pass filters, and signals of interest may have intricate multi-frequency behavior and exhibit long range interactions. This motivates us to introduce the BLIS-Net (Bi-Lipschitz Scattering Net), a novel GNN that builds on the previously introduced geometric scattering transform. Our network is able to capture both local and global signal structure and is able to capture both low-frequency and high-frequency information. We make several crucial changes to the original geometric scattering architecture which we prove increase the ability of our network to capture information about the input signal and show that BLIS-Net achieves superior performance on both synthetic and real-world data sets based on traffic flow and fMRI data.

Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent

  • paper_url: http://arxiv.org/abs/2310.17556
  • repo_url: None
  • paper_authors: Yixiao Chen, Hao Xie, Han Wang
  • for: Efficiently solving the damped Fisher matrix in large-scale settings where the number of parameters greatly exceeds the number of available samples, a problem fundamental to natural gradient descent and stochastic reconfiguration.
  • methods: A new, generally applicable algorithm based on Cholesky decomposition.
  • results: Benchmark results show the algorithm is significantly faster than existing methods.
    Abstract We propose a new algorithm for efficiently solving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples. This problem is fundamental for natural gradient descent and stochastic reconfiguration. Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
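To make the parameters-much-larger-than-samples setting concrete, here is a minimal NumPy sketch that solves a damped Fisher system through the Woodbury identity plus a Cholesky factorization of the small sample-by-sample matrix. The per-sample gradient matrix `J`, the damping value, and the Woodbury route are illustrative assumptions; the paper's exact algorithm may differ.

```python
import numpy as np

def damped_fisher_solve(J, g, damping):
    """Solve (J.T @ J / M + damping * I) x = g without forming the P x P matrix.

    J: (M, P) per-sample gradients with M samples and P parameters, P >> M.
    Only an M x M Cholesky factorization is needed (Woodbury identity).
    """
    M = J.shape[0]
    S = damping * M * np.eye(M) + J @ J.T                  # small M x M system
    L = np.linalg.cholesky(S)
    w = np.linalg.solve(L.T, np.linalg.solve(L, J @ g))    # w = S^{-1} J g
    return (g - J.T @ w) / damping                         # Woodbury form of F^{-1} g

# Sanity check against the dense solve on a small example
rng = np.random.default_rng(0)
M, P = 20, 200
J, g, lam = rng.standard_normal((M, P)), rng.standard_normal(P), 1e-2
x = damped_fisher_solve(J, g, lam)
F = J.T @ J / M + lam * np.eye(P)
print(np.max(np.abs(F @ x - g)))                           # tiny residual
```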

Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.17544
  • repo_url: None
  • paper_authors: Aysin Tumay, Mustafa E. Aydin, Suleyman S. Kozat
  • for: An ensemble feature-selection method based on hierarchical stacking for non-stationary settings with a limited number of samples and a large number of features, addressing the limitations of traditional feature-selection methods and feature-importance scores.
  • methods: A machine learning model is first trained on a subset of the features; another algorithm then uses the remaining features to update the model's output so as to minimise the target loss. The hierarchical structure allows flexible depth and feature selection.
  • results: Tests on synthetic and real-life datasets show improved performance over traditional methods and state-of-the-art approaches, together with better scalability and stability.
    Abstract We study a novel ensemble approach for feature selection based on hierarchical stacking in cases of non-stationarity and limited number of samples with large number of features. Our approach exploits the co-dependency between features using a hierarchical structure. Initially, a machine learning model is trained using a subset of features, and then the model's output is updated using another algorithm with the remaining features to minimize the target loss. This hierarchical structure allows for flexible depth and feature selection. By exploiting feature co-dependency hierarchically, our proposed approach overcomes the limitations of traditional feature selection methods and feature importance scores. The effectiveness of the approach is demonstrated on synthetic and real-life datasets, indicating improved performance with scalability and stability compared to the traditional methods and state-of-the-art approaches.
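A minimal two-level sketch of the hierarchical-stacking recipe described above, assuming a simple residual-correction scheme; the feature split, model choices, and synthetic data are illustrative, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 4]) + 0.1 * rng.standard_normal(500)

first, remaining = [0, 1, 2, 3], [4, 5, 6, 7]

# Level 1: one model trained on a subset of the features.
level1 = Ridge().fit(X[:, first], y)
residual = y - level1.predict(X[:, first])

# Level 2: a different algorithm uses the remaining features to shrink the leftover loss.
level2 = GradientBoostingRegressor().fit(X[:, remaining], residual)

pred = level1.predict(X[:, first]) + level2.predict(X[:, remaining])
print("MSE:", float(np.mean((y - pred) ** 2)))
```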

EqDrive: Efficient Equivariant Motion Forecasting with Multi-Modality for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.17540
  • repo_url: None
  • paper_authors: Yuping Wang, Jier Chen
  • for: Forecasting vehicle motion in autonomous driving, which requires modelling agent interactions and preserving motion equivariance under Euclidean transformations; traditional models often miss these intricate dynamics, leading to higher prediction errors and lower training efficiency.
  • methods: Applies EqMotion, an equivariant particle and human motion prediction model that also accounts for invariant agent interactions, to multi-agent vehicle motion forecasting, together with a multi-modal prediction mechanism that treats multiple possible future paths probabilistically.
  • results: The model achieves state-of-the-art (SOTA) performance with only 1.2 million parameters and a training time of less than 2 hours.
    Abstract Forecasting vehicular motions in autonomous driving requires a deep understanding of agent interactions and the preservation of motion equivariance under Euclidean geometric transformations. Traditional models often lack the sophistication needed to handle the intricate dynamics inherent to autonomous vehicles and the interaction relationships among agents in the scene. As a result, these models have a lower model capacity, which then leads to higher prediction errors and lower training efficiency. In our research, we employ EqMotion, a leading equivariant particle, and human prediction model that also accounts for invariant agent interactions, for the task of multi-agent vehicle motion forecasting. In addition, we use a multi-modal prediction mechanism to account for multiple possible future paths in a probabilistic manner. By leveraging EqMotion, our model achieves state-of-the-art (SOTA) performance with fewer parameters (1.2 million) and a significantly reduced training time (less than 2 hours).

Little Exploration is All You Need

  • paper_url: http://arxiv.org/abs/2310.17538
  • repo_url: https://github.com/ArchizSolutions/Lead-Management-Software-2020-for-Generate-More-Sales-
  • paper_authors: Henry H. H. Chen, Jiaming Lu
  • for: Improving action selection in the multi-armed bandit problem by accounting not only for uncertainty but also for the difficulty of the task.
  • methods: Proposes UCB$^\tau$, a modification of the standard UCB algorithm whose exploration bonus is $1/n^\tau$ with $\tau > 1/2$, so that task difficulty is taken into account.
  • results: In comparative evaluations on synthetic data, UCB$^\tau$ is not only more effective but also exhibits lower risk across various environmental conditions and hyperparameter settings.
    Abstract The prevailing principle of "Optimism in the Face of Uncertainty" advocates for the incorporation of an exploration bonus, generally assumed to be proportional to the inverse square root of the visit count ($1/\sqrt{n}$), where $n$ is the number of visits to a particular state-action pair. This approach, however, exclusively focuses on "uncertainty," neglecting the inherent "difficulty" of different options. To address this gap, we introduce a novel modification of standard UCB algorithm in the multi-armed bandit problem, proposing an adjusted bonus term of $1/n^\tau$, where $\tau > 1/2$, that accounts for task difficulty. Our proposed algorithm, denoted as UCB$^\tau$, is substantiated through comprehensive regret and risk analyses, confirming its theoretical robustness. Comparative evaluations with standard UCB and Thompson Sampling algorithms on synthetic datasets demonstrate that UCB$^\tau$ not only outperforms in efficacy but also exhibits lower risk across various environmental conditions and hyperparameter settings.
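A minimal sketch of the adjusted bonus on a toy Bernoulli bandit; the constant `c`, the reward model, and the tie-free argmax are assumptions for illustration rather than the paper's analysis constants.

```python
import numpy as np

def ucb_tau(means, horizon, tau=0.7, c=1.0, seed=0):
    """UCB index with the difficulty-aware bonus c / n**tau (tau > 1/2)."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts, estimates = np.zeros(k), np.zeros(k)
    for t in range(horizon):
        if t < k:                               # pull each arm once first
            arm = t
        else:
            bonus = c / counts ** tau           # replaces the usual 1/sqrt(n)-style bonus
            arm = int(np.argmax(estimates + bonus))
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts                               # pull counts per arm

print(ucb_tau([0.3, 0.5, 0.7], horizon=5000))   # most pulls should go to the 0.7 arm
```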

Learning Regularized Graphon Mean-Field Games with Unknown Graphons

  • paper_url: http://arxiv.org/abs/2310.17531
  • repo_url: None
  • paper_authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang
  • for: Learning the Nash Equilibrium (NE) of regularized Graphon Mean-Field Games (GMFGs) when the graphons are unknown.
  • methods: Proposes the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and an efficient algorithm to estimate the transition kernels, reward functions, and graphons from sampled agents using kernel embedding of distributions.
  • results: Shows that the proposed algorithms converge at a rate of $O(T^{-1/3})$ after $T$ iterations with an estimation oracle, and demonstrates through simulations that learning the unknown graphons effectively reduces exploitability.
    Abstract We design and analyze reinforcement learning algorithms for Graphon Mean-Field Games (GMFGs). In contrast to previous works that require the precise values of the graphons, we aim to learn the Nash Equilibrium (NE) of the regularized GMFGs when the graphons are unknown. Our contributions are threefold. First, we propose the Proximal Policy Optimization for GMFG (GMFG-PPO) algorithm and show that it converges at a rate of $O(T^{-1/3})$ after $T$ iterations with an estimation oracle, improving on a previous work by Xie et al. (ICML, 2021). Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents. Convergence rates are then derived when the positions of the agents are either known or unknown. Results for the combination of the optimization algorithm GMFG-PPO and the estimation algorithm are then provided. These algorithms are the first specifically designed for learning graphons from sampled agents. Finally, the efficacy of the proposed algorithms are corroborated through simulations. These simulations demonstrate that learning the unknown graphons reduces the exploitability effectively.

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

  • paper_url: http://arxiv.org/abs/2310.17502
  • repo_url: None
  • paper_authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu
  • for: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained control, despite the scarcity of appropriately labelled data and the ethical concerns of editing a real person's voice.
  • methods: Generates artificial speaker embeddings that cannot be linked to a real human and require no speaker or style labels; during training the synthesis system is conditioned on embeddings of real humans, while the artificial, controllable embeddings are used at inference.
  • results: Yields a speech synthesis system with intuitive and fine-grained control over voice and speaking style without sacrificing privacy during inference.
    Abstract Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

  • paper_url: http://arxiv.org/abs/2310.17498
  • repo_url: https://github.com/zhenxianglance/cbd
  • paper_authors: Zhen Xiang, Zidi Xiong, Bo Li
  • for: Detecting backdoor attacks in deep neural networks with a certified backdoor detector (CBD).
  • methods: CBD uses a novel, adjustable conformal prediction scheme based on the proposed statistic, the local dominant probability.
  • results: CBD achieves comparable or higher detection accuracy than state-of-the-art detectors on four benchmark datasets while additionally providing detection certification; for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0.75$, it reaches up to 100% empirical detection true positive rate with low false positive rates.
    Abstract Backdoor attack is a common threat to deep neural networks. During testing, samples embedded with a backdoor trigger will be misclassified as an adversarial target by a backdoored model, while samples without the backdoor trigger will be correctly classified. In this paper, we present the first certified backdoor detector (CBD), which is based on a novel, adjustable conformal prediction scheme based on our proposed statistic local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which the attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound for the false positive rate. Our theoretical results show that attacks with triggers that are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and it in addition provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq0.75$ which achieves more than 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.

Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach

  • paper_url: http://arxiv.org/abs/2310.17496
  • repo_url: None
  • paper_authors: Nian Si
  • for: Improving the accuracy and effectiveness of recommendation systems by tackling the interference that data training loops induce in A/B tests.
  • methods: Weighted training: a model is trained to predict the probability of each data point appearing in either the treatment or control data, and weighted losses based on these probabilities are then applied during model training.
  • results: The approach achieves the least variance among all estimators without causing shifts in the training distributions, and simulation studies show lower bias and variance than other methods.
    Abstract In modern recommendation systems, the standard pipeline involves training machine learning models on historical data to predict user behaviors and improve recommendations continuously. However, these data training loops can introduce interference in A/B tests, where data generated by control and treatment algorithms, potentially with different distributions, are combined. To address these challenges, we introduce a novel approach called weighted training. This approach entails training a model to predict the probability of each data point appearing in either the treatment or control data and subsequently applying weighted losses during model training. We demonstrate that this approach achieves the least variance among all estimators without causing shifts in the training distributions. Through simulation studies, we demonstrate the lower bias and variance of our approach compared to other methods.
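One plausible instantiation of the weighted-training idea, sketched with scikit-learn: first estimate each point's probability of belonging to the treatment data, then pass those probabilities as per-sample weights when fitting the outcome model. The logistic membership model, linear outcome model, and synthetic data are assumptions, not the paper's estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.standard_normal((n, 5))
in_treatment = rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 0]))   # membership depends on features
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.3]) + 0.1 * rng.standard_normal(n)

# Step 1: model P(point came from the treatment data | features).
membership = LogisticRegression().fit(X, in_treatment.astype(int))
p_treat = membership.predict_proba(X)[:, 1]

# Step 2: fit the outcome model with a weighted loss via per-sample weights.
outcome = LinearRegression().fit(X, y, sample_weight=p_treat)
print(outcome.coef_)
```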

FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing

  • paper_url: http://arxiv.org/abs/2310.17491
  • repo_url: None
  • paper_authors: Terence Jie Chua, Wenhan Yu, Jun Zhao, Kwok-Yan Lam
  • for: Addressing the deployment and fine-tuning challenges of AI foundation models with an emulator-assisted, parameter-efficient tuning approach.
  • methods: Combines Emulator-Assisted Tuning (EAT) with Parameter-Efficient Fine-Tuning (PEFT) into Parameter-Efficient Emulator-Assisted Tuning (PEAT), and extends it to federated learning as FedPEAT, which uses adapters, emulators, and PEFT for federated model tuning to improve model privacy and memory efficiency.
  • results: Tested in a scenario where a server participates in collaborative federated tuning, demonstrating the framework's potential for tackling foundation-model challenges.
    Abstract The emergence of foundation models, including language and vision models, has reshaped AI's landscape, offering capabilities across various applications. Deploying and fine-tuning these large models, like GPT-3 and BERT, presents challenges, especially in the current foundation model era. We introduce Emulator-Assisted Tuning (EAT) combined with Parameter-Efficient Fine-Tuning (PEFT) to form Parameter-Efficient Emulator-Assisted Tuning (PEAT). Further, we expand this into federated learning as Federated PEAT (FedPEAT). FedPEAT uses adapters, emulators, and PEFT for federated model tuning, enhancing model privacy and memory efficiency. Adapters adjust pre-trained models, while emulators give a compact representation of original models, addressing both privacy and efficiency. Adaptable to various neural networks, our approach also uses deep reinforcement learning for hyper-parameter optimization. We tested FedPEAT in a unique scenario with a server participating in collaborative federated tuning, showcasing its potential in tackling foundation model challenges.

Fair collaborative vehicle routing: A deep multi-agent reinforcement learning approach

  • paper_url: http://arxiv.org/abs/2310.17485
  • repo_url: None
  • paper_authors: Stephen Mak, Liming Xu, Tim Pearce, Michael Ostroumov, Alexandra Brintrup
  • for: Collaborative vehicle routing problem
  • methods: Deep multi-agent reinforcement learning
  • results: Reduction in run-time of 88%
    Abstract Collaborative vehicle routing occurs when carriers collaborate through sharing their transportation requests and performing transportation requests on behalf of each other. This achieves economies of scale, thus reducing cost, greenhouse gas emissions and road congestion. But which carrier should partner with whom, and how much should each carrier be compensated? Traditional game theoretic solution concepts are expensive to calculate as the characteristic function scales exponentially with the number of agents. This would require solving the vehicle routing problem (NP-hard) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game solved using deep multi-agent reinforcement learning, where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function; thus, when deployed in production, we only need to evaluate the expensive post-collaboration vehicle routing problem once. Our contribution is that we are the first to consider both the route allocation problem and gain sharing problem simultaneously - without access to the expensive characteristic function. Through decentralised machine learning, our agents bargain with each other and agree to outcomes that correlate well with the Shapley value - a fair profit allocation mechanism. Importantly, we are able to achieve a reduction in run-time of 88%.
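To see why the characteristic function is the bottleneck, here is a tiny exact Shapley-value computation: every coalition value must be known, and in collaborative routing each value means solving an NP-hard VRP. The three-carrier savings table below is a made-up toy, not routing data.

```python
from itertools import permutations

def shapley(players, v):
    """Exact Shapley values: average marginal contribution over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orderings) for p in players}

# Toy cost-savings characteristic function for carriers A, B, C.
savings = {frozenset(): 0, frozenset("A"): 0, frozenset("B"): 0, frozenset("C"): 0,
           frozenset("AB"): 4, frozenset("AC"): 3, frozenset("BC"): 2, frozenset("ABC"): 6}
print(shapley("ABC", lambda s: savings[frozenset(s)]))
```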

Secure short-term load forecasting for smart grids with transformer-based federated learning

  • paper_url: http://arxiv.org/abs/2310.17477
  • repo_url: https://github.com/JonasSievers/transformerBasedFederatedLearningForSecureSTLFInSG
  • paper_authors: Jonas Sievers, Thomas Blank
  • for: Short-term electricity load forecasting to help smart grids balance supply and demand.
  • methods: Federated learning with a transformer-based deep learning model: training is performed locally on private data, and only the trained model parameters are merged and updated on a global server.
  • results: On a dataset from a German university campus, the federated architecture is benchmarked against central and local learning and compared with long short-term memory models and convolutional neural networks; transformer-based forecasting proves a promising alternative to state-of-the-art models within federated learning.
    Abstract Electricity load forecasting is an essential task within smart grids to assist demand and supply balance. While advanced deep learning models require large amounts of high-resolution data for accurate short-term load predictions, fine-grained load profiles can expose users' electricity consumption behaviors, which raises privacy and security concerns. One solution to improve data privacy is federated learning, where models are trained locally on private data, and only the trained model parameters are merged and updated on a global server. Therefore, this paper presents a novel transformer-based deep learning approach with federated learning for short-term electricity load prediction. To evaluate our results, we benchmark our federated learning architecture against central and local learning and compare the performance of our model to long short-term memory models and convolutional neural networks. Our simulations are based on a dataset from a German university campus and show that transformer-based forecasting is a promising alternative to state-of-the-art models within federated learning.
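A minimal sketch of the federated step the abstract relies on: clients train locally on private load data and the server only averages the returned parameters. The tiny linear model stands in for the paper's transformer, and the synthetic clients are illustrative assumptions.

```python
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=20):
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)   # local gradient steps on private data
    return w

def fedavg(client_weights, client_sizes):
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes / sizes.sum())

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 0.0])
clients = []
for _ in range(3):                                  # three buildings with private data
    X = rng.standard_normal((50, 4))
    clients.append((X, X @ true_w + 0.1 * rng.standard_normal(50)))

global_w = np.zeros(4)
for _ in range(10):                                 # communication rounds
    local = [local_update(global_w, X, y) for X, y in clients]
    global_w = fedavg(local, [len(y) for _, y in clients])
print(global_w)                                     # approaches true_w
```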

Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration

  • paper_url: http://arxiv.org/abs/2310.17471
  • repo_url: None
  • paper_authors: Xiang Chen, Zhiheng Guo, Xijun Wang, Howard H. Yang, Chenyuan Feng, Junshen Su, Sihui Zheng, Tony Q. S. Quek
  • for: This paper aims to redefine modes of collaboration between devices and servers and construct native intelligence libraries for 6G native AI.
  • methods: The proposed framework is based on foundation models and includes a customization approach for intent-aware PFM, a construction of a task-oriented AI toolkit, and a novel cloud-edge-end collaboration paradigm.
  • results: The proposed framework is applied to orchestration, achieving the maximum sum rate within a wireless communication system, and preliminary evaluation results are presented.
    Abstract Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration between devices and servers and constructing native intelligence libraries become critically important in 6G. In this paper, we analyze the challenges of achieving 6G native AI from the perspectives of data, intelligence, and networks. Then, we propose a 6G native AI framework based on foundation models, provide a customization approach for intent-aware PFM, present a construction of a task-oriented AI toolkit, and outline a novel cloud-edge-end collaboration paradigm. As a practical use case, we apply this framework for orchestration, achieving the maximum sum rate within a wireless communication system, and presenting preliminary evaluation results. Finally, we outline research directions for achieving native AI in 6G.

The statistical thermodynamics of generative diffusion models

  • paper_url: http://arxiv.org/abs/2310.17467
  • repo_url: None
  • paper_authors: Luca Ambrogioni
  • for: Understanding generative diffusion models, whose performance spans many areas of generative modelling, through the tools of equilibrium statistical mechanics.
  • methods: Reformulates generative diffusion models using equilibrium statistical mechanics and shows that they undergo second-order phase transitions corresponding to symmetry-breaking phenomena.
  • results: Argues that this symmetry-breaking instability lies at the heart of the models' generative capabilities and can be described by a set of mean-field critical exponents, and analyses recent work connecting diffusion models and associative memory networks in view of the thermodynamic formulation.
    Abstract Generative diffusion models have achieved spectacular performance in many areas of generative modeling. While the fundamental ideas behind these models come from non-equilibrium physics, in this paper we show that many aspects of these models can be understood using the tools of equilibrium statistical mechanics. Using this reformulation, we show that generative diffusion models undergo second-order phase transitions corresponding to symmetry breaking phenomena. We argue that this lead to a form of instability that lies at the heart of their generative capabilities and that can be described by a set of mean field critical exponents. We conclude by analyzing recent work connecting diffusion models and associative memory networks in view of the thermodynamic formulations.

Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation

  • paper_url: http://arxiv.org/abs/2310.17463
  • repo_url: https://github.com/konstantinhess/bayesian-neural-cde
  • paper_authors: Konstantin Hess, Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
  • for: Estimating treatment effects in continuous time to support personalized medicine.
  • methods: Proposes a Bayesian neural controlled differential equation (BNCDE) in which the time dimension is modelled by a coupled system of neural controlled differential equations and neural stochastic differential equations, enabling tractable variational Bayesian inference.
  • results: For an assigned sequence of treatments, the BNCDE provides meaningful posterior predictive distributions of the potential outcomes, giving the uncertainty estimates needed for reliable decision-making in medicine.
    Abstract Treatment effect estimation in continuous time is crucial for personalized medicine. However, existing methods for this task are limited to point estimates of the potential outcomes, whereas uncertainty estimates have been ignored. Needless to say, uncertainty quantification is crucial for reliable decision-making in medical applications. To fill this gap, we propose a novel Bayesian neural controlled differential equation (BNCDE) for treatment effect estimation in continuous time. In our BNCDE, the time dimension is modeled through a coupled system of neural controlled differential equations and neural stochastic differential equations, where the neural stochastic differential equations allow for tractable variational Bayesian inference. Thereby, for an assigned sequence of treatments, our BNCDE provides meaningful posterior predictive distributions of the potential outcomes. To the best of our knowledge, ours is the first tailored neural method to provide uncertainty estimates of treatment effects in continuous time. As such, our method is of direct practical value for promoting reliable decision-making in medicine.

Coalitional Bargaining via Reinforcement Learning: An Application to Collaborative Vehicle Routing

  • paper_url: http://arxiv.org/abs/2310.17458
  • repo_url: None
  • paper_authors: Stephen Mak, Liming Xu, Tim Pearce, Michael Ostroumov, Alexandra Brintrup
  • for: Collaborative Vehicle Routing (CVR), where delivery companies cooperate by sharing delivery information and performing delivery requests on behalf of each other to reduce cost, greenhouse gas emissions, and road congestion.
  • methods: A modified Independent Proximal Policy Optimisation (IPPO) algorithm, a decentralised approach that respects the self-interested nature of companies, combined with a coalitional bargaining game that implicitly reasons about the characteristic function and avoids evaluating the Vehicle Routing Problem (VRP) an exponential number of times.
  • results: The decentralised approach outperforms a strong heuristic bot: agents identify the optimal coalitions 79% of the time, with an average optimality gap of 4.2% and a 62% reduction in run-time.
    Abstract Collaborative Vehicle Routing is where delivery companies cooperate by sharing their delivery information and performing delivery requests on behalf of each other. This achieves economies of scale and thus reduces cost, greenhouse gas emissions, and road congestion. But which company should partner with whom, and how much should each company be compensated? Traditional game theoretic solution concepts, such as the Shapley value or nucleolus, are difficult to calculate for the real-world problem of Collaborative Vehicle Routing due to the characteristic function scaling exponentially with the number of agents. This would require solving the Vehicle Routing Problem (an NP-Hard problem) an exponential number of times. We therefore propose to model this problem as a coalitional bargaining game where - crucially - agents are not given access to the characteristic function. Instead, we implicitly reason about the characteristic function, and thus eliminate the need to evaluate the VRP an exponential number of times - we only need to evaluate it once. Our contribution is that our decentralised approach is both scalable and considers the self-interested nature of companies. The agents learn using a modified Independent Proximal Policy Optimisation. Our RL agents outperform a strong heuristic bot. The agents correctly identify the optimal coalitions 79% of the time with an average optimality gap of 4.2% and reduction in run-time of 62%.

Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks

  • paper_url: http://arxiv.org/abs/2310.17683
  • repo_url: https://github.com/sds-lab/sliceformer
  • paper_authors: Shen Yuan, Hongteng Xu
  • for: A more efficient and effective alternative to the Transformer for discriminative tasks.
  • methods: Replaces the multi-head attention (MHA) mechanism with an extremely simple "slicing-sorting" operation that projects inputs linearly to a latent space and sorts them along each feature dimension, and analyses different implementations of this operation.
  • results: On the Long-Range Arena benchmark, image classification, text classification, and molecular property prediction, Sliceformer achieves comparable or better performance with lower memory cost and faster speed than the Transformer and its variants, and empirically suppresses the risk of mode collapse.
    Abstract As one of the most popular neural network modules, Transformer plays a central role in many fundamental deep learning models, e.g., the ViT in computer vision and the BERT and GPT in natural language processing. The effectiveness of the Transformer is often attributed to its multi-head attention (MHA) mechanism. In this study, we discuss the limitations of MHA, including the high computational complexity due to its ``query-key-value'' architecture and the numerical issue caused by its softmax operation. Considering the above problems and the recent development tendency of the attention layer, we propose an effective and efficient surrogate of the Transformer, called Sliceformer. Our Sliceformer replaces the classic MHA mechanism with an extremely simple ``slicing-sorting'' operation, i.e., projecting inputs linearly to a latent space and sorting them along different feature dimensions (or equivalently, called channels). For each feature dimension, the sorting operation implicitly generates an implicit attention map with sparse, full-rank, and doubly-stochastic structures. We consider different implementations of the slicing-sorting operation and analyze their impacts on the Sliceformer. We test the Sliceformer in the Long-Range Arena benchmark, image classification, text classification, and molecular property prediction, demonstrating its advantage in computational complexity and universal effectiveness in discriminative tasks. Our Sliceformer achieves comparable or better performance with lower memory cost and faster speed than the Transformer and its variants. Moreover, the experimental results reveal that applying our Sliceformer can empirically suppress the risk of mode collapse when representing data. The code is available at \url{https://github.com/SDS-Lab/sliceformer}.
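The core "slicing-sorting" operation is simple enough to sketch directly: project the token sequence linearly into latent channels, then sort each channel along the sequence axis. Shapes and the plain NumPy projection are illustrative assumptions.

```python
import numpy as np

def slice_sort(X, W):
    """X: (seq_len, d_in) token features; W: (d_in, d_latent) projection matrix."""
    Z = X @ W                    # linear "slicing" into d_latent channels
    return np.sort(Z, axis=0)    # sort every channel independently along the sequence

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))              # 6 tokens, 8 features
W = rng.standard_normal((8, 4)) / np.sqrt(8)
print(slice_sort(X, W).shape)                # (6, 4)
```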

Likelihood-based Out-of-Distribution Detection with Denoising Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2310.17432
  • repo_url: None
  • paper_authors: Joseph Goodier, Neill D. F. Campbell
  • for: Likelihood-based out-of-distribution detection with generative models, extended to diffusion models.
  • methods: Proposes a new likelihood ratio for deep denoising diffusion models, the Complexity Corrected Likelihood Ratio, constructed from Evidence Lower-Bound evaluations of a single model at various noising levels.
  • results: Achieves results comparable to state-of-the-art generative-model-based out-of-distribution detection methods.
    Abstract Out-of-Distribution detection between dataset pairs has been extensively explored with generative models. We show that likelihood-based Out-of-Distribution detection can be extended to diffusion models by leveraging the fact that they, like other likelihood-based generative models, are dramatically affected by the input sample complexity. Currently, all Out-of-Distribution detection methods with Diffusion Models are reconstruction-based. We propose a new likelihood ratio for Out-of-Distribution detection with Deep Denoising Diffusion Models, which we call the Complexity Corrected Likelihood Ratio. Our likelihood ratio is constructed using Evidence Lower-Bound evaluations from an individual model at various noising levels. We present results that are comparable to state-of-the-art Out-of-Distribution detection methods with generative models.

Causal Modeling with Stationary Diffusions

  • paper_url: http://arxiv.org/abs/2310.17405
  • repo_url: None
  • paper_authors: Lars Lorch, Andreas Krause, Bernhard Schölkopf
  • for: A new approach to causal modelling that does not rely on structural equations over a causal graph.
  • methods: Learns stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions; inference is based on a kernel deviation from stationarity (KDS) objective in a reproducing kernel Hilbert space.
  • results: In several cases the learned models generalize to unseen interventions on their variables, often better than classical approaches.
    Abstract We develop a novel approach towards causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they generalize to unseen interventions on their variables, often better than classical approaches. Our inference method is based on a new theoretical result that expresses a stationarity condition on the diffusion's generator in a reproducing kernel Hilbert space. The resulting kernel deviation from stationarity (KDS) is an objective function of independent interest.

Enhancing Graph Neural Networks with Structure-Based Prompt

  • paper_url: http://arxiv.org/abs/2310.17394
  • repo_url: None
  • paper_authors: Qingqing Ge, Zeyuan Zhao, Yiding Liu, Anfeng Cheng, Xiang Li, Shuaiqiang Wang, Dawei Yin
  • for: Improving how Graph Neural Networks (GNNs) learn the semantics of graph data under the "pre-train, prompt" paradigm.
  • methods: Proposes SAP, a structure-based prompting method that exploits graph structure in both the pre-training and prompt-tuning stages: a dual-view contrastive objective aligns the latent semantic spaces of node attributes and graph structure, and structure information is incorporated into the prompted graph to elicit more pre-trained knowledge during prompt tuning.
  • results: SAP is more effective on node classification and graph classification, and achieves better performance in the more challenging few-shot setting on both homophilous and heterophilous graphs.
    Abstract Graph Neural Networks (GNNs) are powerful in learning semantics of graph data. Recently, a new paradigm "pre-train, prompt" has shown promising results in adapting GNNs to various tasks with less supervised data. The success of such paradigm can be attributed to the more consistent objectives of pre-training and task-oriented prompt tuning, where the pre-trained knowledge can be effectively transferred to downstream tasks. However, an overlooked issue of existing studies is that the structure information of graph is usually exploited during pre-training for learning node representations, while neglected in the prompt tuning stage for learning task-specific parameters. To bridge this gap, we propose a novel structure-based prompting method for GNNs, namely SAP, which consistently exploits structure information in both pre-training and prompt tuning stages. In particular, SAP 1) employs a dual-view contrastive learning to align the latent semantic spaces of node attributes and graph structure, and 2) incorporates structure information in prompted graph to elicit more pre-trained knowledge in prompt tuning. We conduct extensive experiments on node classification and graph classification tasks to show the effectiveness of SAP. Moreover, we show that SAP can lead to better performance in more challenging few-shot scenarios on both homophilous and heterophilous graphs.

A Challenge in Reweighting Data with Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2310.17386
  • repo_url: None
  • paper_authors: Anastasia Ivanova, Pierre Ablin
  • for: Learning a weight for each training point so that a model trained on a large training set performs well on a smaller test set with a different distribution.
  • methods: Formulates data reweighting as a bilevel optimization problem and studies classical bilevel solvers based on a warm-start strategy, in which the model parameters and the data weights are learned at the same time.
  • results: This joint dynamic can lead to sub-optimal solutions whose final data weights are very sparse, illustrating the difficulty of data reweighting and offering a clue as to why the method is rarely used in practice.
    Abstract In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution. Learning a weight for each data point of the training set is an appealing solution, as it ideally allows one to automatically learn the importance of each training point for generalization on the testing set. This task is usually formalized as a bilevel optimization problem. Classical bilevel solvers are based on a warm-start strategy where both the parameters of the models and the data weights are learned at the same time. We show that this joint dynamic may lead to sub-optimal solutions, for which the final data weights are very sparse. This finding illustrates the difficulty of data reweighting and offers a clue as to why this method is rarely used in practice.
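For reference, the bilevel data-reweighting problem discussed above is usually stated as follows (generic notation, not the paper's):

$$
\min_{w\ge 0}\;\sum_{j\in\mathcal{D}_{\text{val}}}\ell\big(f_{\theta^{\star}(w)}(x_j),\,y_j\big)
\qquad\text{subject to}\qquad
\theta^{\star}(w)\;=\;\operatorname*{arg\,min}_{\theta}\;\sum_{i\in\mathcal{D}_{\text{train}}} w_i\,\ell\big(f_{\theta}(x_i),\,y_i\big),
$$

and the warm-start solvers discussed in the abstract alternate updates on $\theta$ and $w$ rather than solving the inner problem to completion.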

Multitask Online Learning: Listen to the Neighborhood Buzz

  • paper_url: http://arxiv.org/abs/2310.17385
  • repo_url: None
  • paper_authors: Juliette Achddou, Nicolò Cesa-Bianchi, Pierre Laforgue
  • for: Multitask online learning in a setting where agents can only exchange information with their neighbors on an arbitrary communication network.
  • methods: Introduces $\texttt{MT-CO}_2\texttt{OL}$, a decentralized algorithm whose regret depends on the interplay between the task similarities and the network structure.
  • results: The regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse (up to constants) than the bound obtained when agents do not share information, and it improves significantly when neighboring agents operate on similar tasks; the algorithm can also be made differentially private with a negligible impact on the regret for linear losses, and experiments support the theory.
    Abstract We study multitask online learning in a setting where agents can only exchange information with their neighbors on an arbitrary communication network. We introduce $\texttt{MT-CO}_2\texttt{OL}$, a decentralized algorithm for this setting whose regret depends on the interplay between the task similarities and the network structure. Our analysis shows that the regret of $\texttt{MT-CO}_2\texttt{OL}$ is never worse (up to constants) than the bound obtained when agents do not share information. On the other hand, our bounds significantly improve when neighboring agents operate on similar tasks. In addition, we prove that our algorithm can be made differentially private with a negligible impact on the regret when the losses are linear. Finally, we provide experimental support for our theory.

On the recognition of the game type based on physiological signals and eye tracking

  • paper_url: http://arxiv.org/abs/2310.17383
  • repo_url: None
  • paper_authors: Łukasz Czekaj, Łukasz Radzinski, Mateusz Kolimaga, Jakub Domaszewicz, Robert Kitłowski, Mariusz Szwoch, Włodzisław Duch
  • for: Exploring whether cognitive activity can be recognized from a particular set of physiological and eye-tracking signals.
  • methods: Builds a classifier that distinguishes three games (Space Invaders, Tetris, Tower Defence) and the inter-game pause, validated in both player-independent and player-dependent scenarios.
  • results: Based on the game-classification results, the improvement in the player-dependent scenario is discussed in the context of biometric person recognition, along with potential applications in smart surveillance and quantified self.
    Abstract Automated interpretation of signals yields many impressive applications from the area of affective computing and human activity recognition (HAR). In this paper we ask the question about possibility of cognitive activity recognition on the base of particular set of signals. We use recognition of the game played by the participant as a playground for exploration of the problem. We build classifier of three different games (Space Invaders, Tetris, Tower Defence) and inter-game pause. We validate classifier in the player-independent and player-dependent scenario. We discuss the improvement in the player-dependent scenario in the context of biometric person recognition. On the base of the results obtained in game classification, we consider potential applications in smart surveillance and quantified self.

Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

  • paper_url: http://arxiv.org/abs/2310.17360
  • repo_url: None
  • paper_authors: Junfeng Hu, Xu Liu, Zhencheng Fan, Yuxuan Liang, Roger Zimmermann
  • for: A unified approach to spatio-temporal graph learning for Web-of-Things applications such as smart cities, human mobility, and climate analysis.
  • methods: Views the learning tasks as predictions based on conditional information with shared spatio-temporal patterns, and proposes Unified Spatio-Temporal Diffusion Models (USTD), an uncertainty-aware diffusion framework consisting of a shared spatio-temporal encoder and task-specific attention-based denoising networks.
  • results: On forecasting and kriging tasks, USTD achieves state-of-the-art performance compared with deterministic and probabilistic baselines while also providing valuable uncertainty estimates.
    Abstract Spatio-temporal graph learning is a fundamental problem in the Web of Things era, which enables a plethora of Web applications such as smart cities, human mobility and climate analysis. Existing approaches tackle different learning tasks independently, tailoring their models to unique task characteristics. These methods, however, fall short of modeling intrinsic uncertainties in the spatio-temporal data. Meanwhile, their specialized designs limit their universality as general spatio-temporal learning solutions. In this paper, we propose to model the learning tasks in a unified perspective, viewing them as predictions based on conditional information with shared spatio-temporal patterns. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly within the uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and attention-based denoising networks that are task-specific. The shared encoder, optimized by a pre-training strategy, effectively captures conditional spatio-temporal patterns. The denoising networks, utilizing both cross- and self-attention, integrate conditional dependencies and generate predictions. Opting for forecasting and kriging as downstream tasks, we design Gated Attention (SGA) and Temporal Gated Attention (TGA) for each task, with different emphases on the spatial and temporal dimensions, respectively. By combining the advantages of deterministic encoders and probabilistic diffusion models, USTD achieves state-of-the-art performances compared to deterministic and probabilistic baselines in both tasks, while also providing valuable uncertainty estimates.

Exploring the Trie of Rules: a fast data structure for the representation of association rules

  • paper_url: http://arxiv.org/abs/2310.17355
  • repo_url: https://github.com/arm-interpretation/trie-of-rules
  • paper_authors: Mikhail Kudriavtsev, Dr Marija Bezbradica, Dr Andrew McCarren
  • for: A faster data structure for representing association rules, so that knowledge can be extracted from large mined rulesets more efficiently.
  • methods: Proposes the Trie of rules, a prefix-tree graph structure for storing a mined ruleset in which similar rules overlay each other; each node represents a rule whose consequent is the node itself and whose antecedent is the path from that node to the root.
  • results: Compared with traditional data structures, the Trie of rules compresses a ruleset with almost no data loss, speeds up basic operations such as searching for a specific rule and sorting, and achieves an 8-fold improvement in traversal time.
    Abstract Association rule mining techniques can generate a large volume of sequential data when implemented on transactional databases. Extracting insights from a large set of association rules has been found to be a challenging process. When examining a ruleset, the fundamental question is how to summarise and represent meaningful mined knowledge efficiently. Many algorithms and strategies have been developed to address issue of knowledge extraction; however, the effectiveness of this process can be limited by the data structures. A better data structure can sufficiently affect the speed of the knowledge extraction process. This paper proposes a novel data structure, called the Trie of rules, for storing a ruleset that is generated by association rule mining. The resulting data structure is a prefix-tree graph structure made of pre-mined rules. This graph stores the rules as paths within the prefix-tree in a way that similar rules overlay each other. Each node in the tree represents a rule where a consequent is this node, and an antecedent is a path from this node to the root of the tree. The evaluation showed that the proposed representation technique is promising. It compresses a ruleset with almost no data loss and benefits in terms of time for basic operations such as searching for a specific rule and sorting, which is the base for many knowledge discovery methods. Moreover, our method demonstrated a significant improvement in traversing time, achieving an 8-fold increase compared to traditional data structures.
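A minimal Python sketch of a prefix-tree over association rules in the spirit described above, where rules sharing antecedent items overlay the same path. The exact node layout of the paper's Trie of rules may differ; this version stores consequents and confidences at the node reached by the sorted antecedent.

```python
class TrieNode:
    def __init__(self, item=None):
        self.item = item
        self.children = {}        # item -> TrieNode
        self.consequents = {}     # consequent -> confidence (a rule ends at this node)

class TrieOfRules:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, antecedent, consequent, confidence):
        node = self.root
        for item in sorted(antecedent):   # canonical order so similar rules share a prefix
            node = node.children.setdefault(item, TrieNode(item))
        node.consequents[consequent] = confidence

    def lookup(self, antecedent):
        node = self.root
        for item in sorted(antecedent):
            node = node.children.get(item)
            if node is None:
                return {}
        return node.consequents

trie = TrieOfRules()
trie.insert({"bread", "butter"}, "milk", 0.8)
trie.insert({"bread"}, "butter", 0.6)
print(trie.lookup({"bread", "butter"}))   # {'milk': 0.8}
```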

De-novo Chemical Reaction Generation by Means of Temporarily Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2310.17341
  • repo_url: None
  • paper_authors: Andrei Buin, Hung Yi Chiang, S. Andrew Gadsden, Faraz A. Alderson
  • for: De-novo generation of chemical reactions by combining Recurrent Neural Networks (RNN) with Temporarily Convolutional Neural Networks (TCN).
  • methods: Uses a novel Reaction-SMILES-like representation (CGRSmiles) with atom mapping directly incorporated; both RNNs and TCNs are autoregressive architectures widely used in natural language processing (NLP), the TCN offering a wide receptive field while preserving causality.
  • results: Combining the latent representations of the TCN and RNN yields better reaction generation than the RNN alone, and transfer learning on a dataset of interest shows that different fine-tuning protocols have a profound impact on the generative scope of the model.
    Abstract We present here a combination of two networks, Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN) in de novo reaction generation using the novel Reaction Smiles-like representation of reactions (CGRSmiles) with atom mapping directly incorporated. Recurrent Neural Networks are known for their autoregressive properties and are frequently used in language modelling with direct application to SMILES generation. The relatively novel TCNs possess similar properties with wide receptive field while obeying the causality required for natural language processing (NLP). The combination of both latent representations expressed through TCN and RNN results in an overall better performance compared to RNN alone. Additionally, it is shown that different fine-tuning protocols have a profound impact on generative scope of the model when applied on a dataset of interest via transfer learning.
    摘要 我们现在提出了两个网络的结合,回传神经网络(RNN)和时间卷积神经网络(TCN),用于从头开始的化学反应生成,使用新的化学反应SMILES表示法(CGRSmiles),直接包含原子映射。RNN和TCN都是有 autoregressive 性的,通常用于语言模型化,直接应用于 SMILES 生成。TCN 具有广泛的接收领域,并遵循自然语言处理(NLP)的 causality。两个隐藏表示的结合,通过 TCN 和 RNN 的表示,导致整体性能更好,相比 RNN alone。此外,我们还发现,在应用到适当的资料集时,不同的精致调整协议对模型的生成范围有着深远的影响。
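As a side note on the TCN ingredient mentioned above, the sketch below shows the dilated causal 1-D convolution that gives a TCN its autoregressive, wide-receptive-field behaviour. It is a generic PyTorch illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated causal 1-D convolution: the output at step t only depends on inputs <= t."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                     # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))      # pad on the left only -> causality
        return F.relu(self.conv(x))

x = torch.randn(2, 64, 128)                   # e.g. an embedded token sequence
print(CausalConv1d(64, dilation=2)(x).shape)  # torch.Size([2, 64, 128])
```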

A multi-artifact EEG denoising by frequency-based deep learning

  • paper_url: http://arxiv.org/abs/2310.17335
  • repo_url: None
  • paper_authors: Matteo Gabardi, Aurora Saibene, Francesca Gasparini, Daniele Rizzo, Fabio Antonio Stella
  • for: 本研究旨在提高脑电图(EEG)信号质量,减少背景噪声,以便更好地刻画大脑活动。
  • methods: 该研究提出了一种基于频域的EEG减噪模型,利用噪声特征的先验知识来自适应计算权重滤波器,以实现噪声分离。模型通过学习empirical关系,将噪声特征与噪声信号和清洁信号的spectral特征进行非线性变换,实现信号减噪。
  • results: 实验结果表明,所提出的减噪模型在 EEGdenoiseNet 数据集上,无论按时间域还是频域度量均取得最佳结果,能有效去除输入 EEG 数据中的生理伪迹;其表现与基准模型相当或更优,且无需针对特定伪迹类型训练即可同时去除肌电和眼动伪迹。
    Abstract Electroencephalographic (EEG) signals are fundamental to neuroscience research and clinical applications such as brain-computer interfaces and neurological disorder diagnosis. These signals are typically a combination of neurological activity and noise, originating from various sources, including physiological artifacts like ocular and muscular movements. Under this setting, we tackle the challenge of distinguishing neurological activity from noise-related sources. We develop a novel EEG denoising model that operates in the frequency domain, leveraging prior knowledge about noise spectral features to adaptively compute optimal convolutional filters for noise separation. The model is trained to learn an empirical relationship connecting the spectral characteristics of noise and noisy signal to a non-linear transformation which allows signal denoising. Performance evaluation on the EEGdenoiseNet dataset shows that the proposed model achieves optimal results according to both temporal and spectral metrics. The model is found to remove physiological artifacts from input EEG data, thus achieving effective EEG denoising. Indeed, the model performance either matches or outperforms that achieved by benchmark models, proving to effectively remove both muscle and ocular artifacts without the need to perform any training on the particular type of artifact.
    摘要 脑电(EEG)信号是神经科学研究和临床应用(如脑机接口与神经疾病诊断)的基础。这些信号通常是神经活动与噪声的混合,噪声来自多种来源,包括眼动和肌动等生理伪迹。在这一设定下,我们要解决的难题是把神经活动与噪声来源区分开。我们开发了一种在频域工作的 EEG 去噪模型,利用噪声频谱特征的先验知识来自适应地计算最优卷积滤波器,以分离噪声。模型通过学习经验关系,将噪声及带噪信号的频谱特征映射到一个实现信号去噪的非线性变换。在 EEGdenoiseNet 数据集上的性能评估表明,所提模型在时间域和频域度量上均取得最优结果,能够有效去除输入 EEG 数据中的生理伪迹。事实上,模型的表现与基准模型相当或更好,无需针对特定类型伪迹进行训练即可同时去除肌电与眼动伪迹。
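To illustrate the frequency-domain filtering idea (per-bin weights applied to the noisy spectrum), here is a minimal NumPy sketch. The weights are supplied as an input here; in the paper they are computed adaptively by the network from the noise spectral features, so this only shows the filtering step.

```python
import numpy as np

def apply_frequency_filter(noisy_segment, bin_weights):
    """Attenuate frequency bins of a 1-D EEG segment with weights in [0, 1].
    bin_weights must have length len(noisy_segment) // 2 + 1 (rfft bins)."""
    spectrum = np.fft.rfft(noisy_segment)
    return np.fft.irfft(spectrum * bin_weights, n=len(noisy_segment))

fs = 256
t = np.arange(512) / fs
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)  # 10 Hz signal + 50 Hz artifact
freqs = np.fft.rfftfreq(eeg.size, d=1 / fs)
weights = np.where(freqs > 30, 0.0, 1.0)      # crude low-pass standing in for learned weights
clean = apply_frequency_filter(eeg, weights)
```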

On Forecast Stability

  • paper_url: http://arxiv.org/abs/2310.17332
  • repo_url: https://github.com/KshitijK1999/The-Impact-of-Macroeconomic-and-Oil-Shocks-on-India-s-Non-Ferrous-Metal-Prices-An-SVAR-Approach-
  • paper_authors: Rakshitha Godahewa, Christoph Bergmeir, Zeynep Erkin Baz, Chengjun Zhu, Zhangdi Song, Salvador García, Dario Benavides
  • for: 本研究旨在提高预测稳定性,以便在决策过程中可以更好地使用预测结果。
  • methods: 本研究使用了一种简单的线性插值方法,可以在任何基模型基础上进行稳定预测。
  • results: 对四个公共数据集进行评估,提议的框架可以达到较高的稳定性和准确性,与一些参考方法相比。
    Abstract Forecasts are typically not produced in a vacuum but in a business context, where forecasts are generated on a regular basis and interact with each other. For decisions, it may be important that forecasts do not change arbitrarily, and are stable in some sense. However, this area has received only limited attention in the forecasting literature. In this paper, we explore two types of forecast stability that we call vertical stability and horizontal stability. The existing works in the literature are only applicable to certain base models and extending these frameworks to be compatible with any base model is not straightforward. Furthermore, these frameworks can only stabilise the forecasts vertically. To fill this gap, we propose a simple linear-interpolation-based approach that is applicable to stabilise the forecasts provided by any base model vertically and horizontally. The approach can produce both accurate and stable forecasts. Using N-BEATS, Pooled Regression and LightGBM as the base models, in our evaluation on four publicly available datasets, the proposed framework is able to achieve significantly higher stability and/or accuracy compared to a set of benchmarks including a state-of-the-art forecast stabilisation method across three error metrics and six stability metrics.
    摘要 预测通常不是在真空中产生的,而是在业务场景中定期生成,并且彼此相互影响。在决策中,预测不应随意变化,而应具有某种意义上的稳定性;然而,这一问题在预测文献中只得到了有限的关注。本文探讨两类预测稳定性,分别称为垂直稳定性和水平稳定性。现有工作仅适用于某些基础模型,而将这些框架推广到任意基础模型并不容易;此外,它们只能在垂直方向上稳定预测。为填补这一空白,我们提出一种简单的基于线性插值的方法,可对任意基础模型给出的预测同时进行垂直和水平方向的稳定化,并能兼顾准确性与稳定性。以 N-BEATS、Pooled Regression 和 LightGBM 为基础模型,在四个公开数据集上的评估中,所提框架在三个误差度量和六个稳定性度量上,相比包括最新预测稳定化方法在内的一组基准,取得了显著更高的稳定性和/或准确性。
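A minimal sketch of the linear-interpolation idea: blend the base model's new forecast with the forecast produced at the previous origin over the overlapping horizon. The blending weight and the exact vertical/horizontal scheme used in the paper may differ; this only illustrates the mechanism.

```python
import numpy as np

def stabilise(new_forecast, previous_forecast, w=0.7):
    """Convex combination of the current and previously published forecasts for the
    overlapping horizon; w -> 1 favours accuracy, w -> 0 favours stability."""
    new_forecast = np.asarray(new_forecast, dtype=float)
    previous_forecast = np.asarray(previous_forecast, dtype=float)
    return w * new_forecast + (1.0 - w) * previous_forecast

print(stabilise([105, 110, 120], [100, 100, 100], w=0.5))   # [102.5 105. 110.]
```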

Feature Extraction and Classification from Planetary Science Datasets enabled by Machine Learning

  • paper_url: http://arxiv.org/abs/2310.17681
  • repo_url: None
  • paper_authors: Conor Nixon, Zachary Yahn, Ethan Duncan, Ian Neidel, Alyssa Mills, Benoît Seignovert, Andrew Larsen, Kathryn Gansler, Charles Liles, Catherine Walker, Douglas Trent, John Santerre
  • for: 本研究使用机器学习神经网络对外太阳系行星任务的图像数据集进行特征识别。
  • methods: 我们采用迁移学习方法,在业界标准的 Mask R-CNN(基于区域的卷积神经网络)上添加并训练新的网络层,以识别训练数据集中标注的冰块,随后在新数据集上测试,达到 68% 的精度。在另一个应用中,我们用同样的方法识别土卫六(Titan)上的云,在 369 张图像上达到 95% 的精度。
  • results: 我们评估了这些技术的相对成效,并提出了进一步改进训练与识别的途径。这些方法可推广到其他行星(包括地球)上的类似识别任务;对外行星图像而言,还可以通过在轨识别最有价值的图像子集、或只回传差异数据(发生变化的图像),大幅减少回传数据量并提升最终数据流的信息含量。
    Abstract In this paper we present two examples of recent investigations that we have undertaken, applying Machine Learning (ML) neural networks (NN) to image datasets from outer planet missions to achieve feature recognition. Our first investigation was to recognize ice blocks (also known as rafts, plates, polygons) in the chaos regions of fractured ice on Europa. We used a transfer learning approach, adding and training new layers to an industry-standard Mask R-CNN (Region-based Convolutional Neural Network) to recognize labeled blocks in a training dataset. Subsequently, the updated model was tested against a new dataset, achieving 68% precision. In a different application, we applied the Mask R-CNN to recognize clouds on Titan, again through updated training followed by testing against new data, with a precision of 95% over 369 images. We evaluate the relative successes of our techniques and suggest how training and recognition could be further improved. The new approaches we have used for planetary datasets can further be applied to similar recognition tasks on other planets, including Earth. For imagery of outer planets in particular, the technique holds the possibility of greatly reducing the volume of returned data, via onboard identification of the most interesting image subsets, or by returning only differential data (images where changes have occurred) greatly enhancing the information content of the final data stream.
    摘要 在这篇论文中,我们介绍了最近开展的两项研究,将机器学习(ML)神经网络(NN)应用于外行星任务的图像数据集以实现特征识别。第一项研究是在木卫二(Europa)破碎冰层的混沌区域中识别冰块(也称 rafts、plates、多边形)。我们采用迁移学习方法,在业界标准的 Mask R-CNN(基于区域的卷积神经网络)上添加并训练新层来识别训练数据集中标注的冰块;随后在新数据集上测试,达到 68% 的精度。在另一项应用中,我们将 Mask R-CNN 用于识别土卫六(Titan)上的云,同样经过再训练后在新数据上测试,在 369 张图像上达到 95% 的精度。我们评估了这些技术的相对成效,并讨论了如何进一步改进训练与识别。这些用于行星数据集的新方法还可应用于其他行星(包括地球)上的类似识别任务。特别是对外行星图像而言,该技术有望大幅减少回传数据量:既可以在轨识别最有价值的图像子集,也可以只回传差异数据(发生变化的图像),从而显著提升最终数据流的信息含量。
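For readers who want to reproduce a transfer-learning setup of this kind, the snippet below follows the standard torchvision recipe for swapping the box and mask heads of a pretrained Mask R-CNN so it can be fine-tuned on newly labelled classes (e.g. background + ice block). It is the usual torchvision fine-tuning pattern, not the authors' exact pipeline.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_finetune_model(num_classes=2):      # e.g. background + "ice block"
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # replace the box classification head with one sized for the new classes
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    # replace the mask prediction head likewise
    in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
    return model

model = build_finetune_model()   # then train only the new heads / fine-tune on labelled blocks
```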

Demonstration-Regularized RL

  • paper_url: http://arxiv.org/abs/2310.17303
  • repo_url: None
  • paper_authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard
  • for: 本文研究利用专家示例提高强化学习(RL)样本效率的效果。
  • methods: 本文以 KL 正则化的方式利用专家示例:先用行为克隆从示例中学得策略,再以该策略为参照对 RL 策略进行正则化。
  • results: 本文发现,利用 $N^{\mathrm{E}}$ 个专家示例,在有限 MDP 中可以以 $\widetilde{\mathcal{O}}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$ 的样本复杂度识别出最优策略,而在线性 Markov 决策过程中样本复杂度为 $\widetilde{\mathcal{O}}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$。此外,本文还为行为克隆过程给出了紧致的收敛保证,并证明了 demonstration-regularized 方法在 RLHF 中的有效性。
    Abstract Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning. Our findings reveal that using $N^{\mathrm{E}}$ expert demonstrations enables the identification of an optimal policy at a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{Poly}(S,A,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in finite and $\widetilde{\mathcal{O}}(\mathrm{Poly}(d,H)/(\varepsilon^2 N^{\mathrm{E}}))$ in linear Markov decision processes, where $\varepsilon$ is the target precision, $H$ the horizon, $A$ the number of action, $S$ the number of states in the finite case and $d$ the dimension of the feature space in the linear case. As a by-product, we provide tight convergence guarantees for the behaviour cloning procedure under general assumptions on the policy classes. Additionally, we establish that demonstration-regularized methods are provably efficient for reinforcement learning from human feedback (RLHF). In this respect, we provide theoretical evidence showing the benefits of KL-regularization for RLHF in tabular and linear MDPs. Interestingly, we avoid pessimism injection by employing computationally feasible regularization to handle reward estimation uncertainty, thus setting our approach apart from the prior works.
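The sketch below illustrates the core idea of demonstration regularization for a discrete-action actor: the usual policy-improvement term is penalized by the KL divergence toward a frozen behaviour-cloned policy fitted on the expert demonstrations. It shows only the regularizer, not the paper's full algorithm or its RLHF variant; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def demo_regularized_actor_loss(policy_logits, q_values, bc_logits, kl_coef=0.1):
    """-E_pi[Q] + kl_coef * KL(pi || pi_BC), averaged over a batch of states."""
    log_pi = F.log_softmax(policy_logits, dim=-1)            # current policy
    log_pi_bc = F.log_softmax(bc_logits, dim=-1).detach()    # frozen behaviour-cloned policy
    pi = log_pi.exp()
    expected_q = (pi * q_values).sum(dim=-1)
    kl_to_bc = (pi * (log_pi - log_pi_bc)).sum(dim=-1)
    return (-expected_q + kl_coef * kl_to_bc).mean()

loss = demo_regularized_actor_loss(torch.randn(32, 4), torch.randn(32, 4), torch.randn(32, 4))
```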

Looping in the Human: Collaborative and Explainable Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2310.17273
  • repo_url: None
  • paper_authors: Masaki Adachi, Brady Planden, David A. Howey, Krikamol Muandet, Michael A. Osborne, Siu Lun Chau
  • for: 这篇论文旨在开发一种协作且可解释的贝叶斯优化框架,增强优化算法与用户之间的合作,以改进优化过程并建立用户信任。
  • methods: 这篇论文提出了一个协作和解释式算法框架,名为CoExBO,它使用偏好学习来融合人工智能和用户的意见,并在每一轮的选择过程中进行说明,以增强用户的信任。
  • results: 这篇论文透过人工智能和用户之间的协作实验,展示了CoExBO框架的效果,并证明了它的优化效果和安全性。
    Abstract Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborative and Explainable Bayesian Optimization (CoExBO) framework. Instead of explicitly requiring a user to provide a knowledge model, CoExBO employs preference learning to seamlessly integrate human insights into the optimization, resulting in algorithmic suggestions that resonate with user preference. CoExBO explains its candidate selection every iteration to foster trust, empowering users with a clearer grasp of the optimization. Furthermore, CoExBO offers a no-harm guarantee, allowing users to make mistakes; even with extreme adversarial interventions, the algorithm converges asymptotically to a vanilla Bayesian optimization. We validate CoExBO's efficacy through human-AI teaming experiments in lithium-ion battery design, highlighting substantial improvements over conventional methods.

Variance of ML-based software fault predictors: are we really improving fault prediction?

  • paper_url: http://arxiv.org/abs/2310.17264
  • repo_url: https://github.com/plubplub1/bountyfarm
  • paper_authors: Xhulja Shahini, Domenic Bubel, Andreas Metzger
  • for: 这个论文主要研究的是如何减少机器学习 fault prediction 模型中的变异,以提高模型在实际应用中的可重复性。
  • methods: 该论文使用了一种现有的 fault prediction 方法,并通过实验分析了这种方法中的变异原因。
  • results: 实验结果表明,这种 fault prediction 方法中的变异可以归因于机器学习模型中的随机因素(NI factors),并且这些变异可以导致模型在实际应用中的性能下降。
    Abstract Software quality assurance activities become increasingly difficult as software systems become more and more complex and continuously grow in size. Moreover, testing becomes even more expensive when dealing with large-scale systems. Thus, to effectively allocate quality assurance resources, researchers have proposed fault prediction (FP) which utilizes machine learning (ML) to predict fault-prone code areas. However, ML algorithms typically make use of stochastic elements to increase the prediction models' generalizability and efficiency of the training process. These stochastic elements, also known as nondeterminism-introducing (NI) factors, lead to variance in the training process and as a result, lead to variance in prediction accuracy and training time. This variance poses a challenge for reproducibility in research. More importantly, while fault prediction models may have shown good performance in the lab (e.g., often-times involving multiple runs and averaging outcomes), high variance of results can pose the risk that these models show low performance when applied in practice. In this work, we experimentally analyze the variance of a state-of-the-art fault prediction approach. Our experimental results indicate that NI factors can indeed cause considerable variance in the fault prediction models' accuracy. We observed a maximum variance of 10.10% in terms of the per-class accuracy metric. We thus, also discuss how to deal with such variance.
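As a small illustration of how NI-factor variance can be measured in practice, the sketch below retrains the same classifier under different random seeds on a fixed split and reports the spread of per-class accuracy across runs. It uses a generic scikit-learn classifier and synthetic data purely for illustration; the paper studies a specific ML-based fault predictor.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

def per_class_accuracy_spread(X, y, n_runs=20):
    """Max - min per-class accuracy (recall) across repeated trainings that differ
    only in the learner's random seed, i.e. variance caused by NI factors."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    scores = []
    for seed in range(n_runs):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
        scores.append(recall_score(y_te, clf.predict(X_te), average=None))
    scores = np.asarray(scores)
    return scores.max(axis=0) - scores.min(axis=0)

X, y = make_classification(n_samples=600, n_informative=5, random_state=0)
print(per_class_accuracy_spread(X, y))   # observed spread per class
```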

fairret: a Framework for Differentiable Fairness Regularization Terms

  • paper_url: http://arxiv.org/abs/2310.17256
  • repo_url: None
  • paper_authors: Maarten Buyl, MaryBeth Defrance, Tijl De Bie
  • for: 该论文主要用于提出一种基于自动微分库的公平正则化框架,以便更好地考虑机器学习模型的公平性问题。
  • methods: 该论文使用了一种基于线性分数统计的公平正则化方法,可以快速计算出各种公平性指标,并且可以与自动微分库集成。
  • results: 实验表明,该方法可以准确地评估机器学习模型的公平性问题,同时保持模型的预测能力。
    Abstract Current tools for machine learning fairness only admit a limited range of fairness definitions and have seen little integration with automatic differentiation libraries, despite the central role these libraries play in modern machine learning pipelines. We introduce a framework of fairness regularization terms (fairrets) which quantify bias as modular objectives that are easily integrated in automatic differentiation pipelines. By employing a general definition of fairness in terms of linear-fractional statistics, a wide class of fairrets can be computed efficiently. Experiments show the behavior of their gradients and their utility in enforcing fairness with minimal loss of predictive power compared to baselines. Our contribution includes a PyTorch implementation of the fairret framework.
    摘要 当前的机器学习公平性工具只支持有限的公平定义,而且很少与自动微分库集成,尽管这些库在现代机器学习流水线中扮演着核心角色。我们提出公平正则化项(fairrets)框架,将偏差量化为模块化的目标,可高效地集成到自动微分流水线中。通过以线性分式统计量来定义公平性,可以高效计算出一大类 fairret。实验展示了其梯度的行为,以及在强制公平性的同时相对基线只损失很少预测能力。我们的贡献包括 fairret 框架的 PyTorch 实现。
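To give a flavour of what a differentiable fairness regularization term looks like, here is a minimal PyTorch sketch of a demographic-parity-style penalty added to the task loss. It is a generic illustration and does not reproduce the fairret library's actual API or its linear-fractional formulation.

```python
import torch
import torch.nn.functional as F

def demographic_parity_penalty(logits, sensitive):
    """Sum of squared gaps between each group's mean predicted positive rate
    and the overall rate; differentiable, so it can be added to any loss."""
    p = torch.sigmoid(logits)
    overall = p.mean()
    penalty = logits.new_zeros(())
    for g in torch.unique(sensitive):
        penalty = penalty + (p[sensitive == g].mean() - overall).pow(2)
    return penalty

logits = torch.randn(64, requires_grad=True)
sensitive = torch.randint(0, 2, (64,))
labels = torch.randint(0, 2, (64,)).float()
loss = F.binary_cross_entropy_with_logits(logits, labels) \
       + 0.5 * demographic_parity_penalty(logits, sensitive)
loss.backward()
```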

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

  • paper_url: http://arxiv.org/abs/2310.17247
  • repo_url: None
  • paper_authors: Jack Miller, Charles O’Neill, Thang Bui
  • for: 本研究旨在探讨神经网络中的“顿悟”现象(grokking)是否仅限于神经网络本身,还是一种更为普遍的现象。
  • methods: 本研究使用了神经网络、Gaussian process(GP)分类、GP回归和线性回归等不同的算法来研究grokking现象。
  • results: 研究发现,grokking 不仅存在于神经网络中,也出现在 GP 分类、GP 回归等其他算法中。此外,研究还给出了一种在算法数据集上诱发 grokking 的方法:添加包含虚假(spurious)信息的维度。这些发现为解释 grokking 现象提供了更广泛的理论基础。
    Abstract In some settings neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
    摘要 在某些设置下,神经网络会表现出一种称为“顿悟”(grokking)的现象:在训练集上早已达到完美或接近完美的准确率之后很久,验证集上的准确率才达到同样水平。在这篇论文中,我们发现顿悟并不限于神经网络,也出现在高斯过程(GP)分类、GP 回归和线性回归等其他设置中。我们还发现,通过向算法数据集添加包含虚假信息的维度,可以诱发顿悟现象。该现象在非神经网络架构中的存在表明,顿悟并非 SGD 或权重范数正则化所特有;相反,只要解的搜索由复杂度和误差共同引导,顿悟就可能发生。基于这一见解,以及贝叶斯神经网络(BNN)和 GP 回归模型训练轨迹中的进一步趋势,我们朝着更一般的顿悟理论迈进:具体而言,我们假设该现象由误差景观与复杂度景观中某些区域的可达性所支配。

miditok: A Python package for MIDI file tokenization

  • paper_url: http://arxiv.org/abs/2310.17202
  • repo_url: https://github.com/Natooz/MidiTok
  • paper_authors: Nathan Fradet, Jean-Pierre Briot, Fabien Chhel, Amal El Fallah Seghrouchni, Nicolas Gutowski
  • for: 这篇论文主要针对使用自然语言处理技术进行符号音乐处理,包括音乐生成、建模或转写等任务,并达到最先进的性能水平。
  • methods: 论文使用了语言模型,如转换器,与符号音乐结合进行各种任务,并在生产产品中使用。为了编码和解码音乐,需要依赖于Tokenizer,它将音乐序列化为不同元素的序列。
  • results: 论文介绍了一个名为 MidiTok 的开源库,用于符号音乐的 tokenization,具有高度的可定制性和可扩展性;它在统一的 API 下支持最流行的音乐 tokenization 方法(一个假定的用法示意见下文)。
    Abstract Recent progress in natural language processing has been adapted to the symbolic music modality. Language models, such as Transformers, have been used with symbolic music for a variety of tasks among which music generation, modeling or transcription, with state-of-the-art performances. These models are beginning to be used in production products. To encode and decode music for the backbone model, they need to rely on tokenizers, whose role is to serialize music into sequences of distinct elements called tokens. MidiTok is an open-source library allowing to tokenize symbolic music with great flexibility and extended features. It features the most popular music tokenizations, under a unified API. It is made to be easily used and extensible for everyone.
    摘要 自然语言处理的最新进展已被迁移到符号音乐模态。Transformer 等语言模型已与符号音乐结合用于音乐生成、建模或转写等多种任务,并达到最先进的性能,这些模型正开始进入生产产品。为了给主干模型编码和解码音乐,它们需要依赖 tokenizer,其作用是将音乐序列化为由离散元素(token)组成的序列。MidiTok 是一个开源库,能够以极高的灵活性和丰富的扩展功能对符号音乐进行 tokenization;它在统一的 API 下提供最流行的音乐 tokenization 方法,并且易于使用和扩展。
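A small usage sketch follows. The class and call names are written from memory of recent MidiTok releases and should be treated as assumptions; consult https://github.com/Natooz/MidiTok for the exact API.

```python
# Assumed API (recent MidiTok versions); exact names/signatures may differ.
from miditok import REMI, TokenizerConfig

tokenizer = REMI(TokenizerConfig())        # REMI is one of the supported tokenizations
tokens = tokenizer("path/to/file.mid")     # serialize a MIDI file into token sequences
decoded = tokenizer(tokens)                # convert tokens back into a MIDI/score object
decoded.dump_midi("roundtrip.mid")         # hypothetical save call; check the docs
```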

Adaptive importance sampling for Deep Ritz

  • paper_url: http://arxiv.org/abs/2310.17185
  • repo_url: None
  • paper_authors: Xiaoliang Wan, Tao Zhou, Yuancheng Zhou
  • for: 为用深度 Ritz 方法求解偏微分方程(PDEs)提出一种自适应采样方法。
  • methods: 使用两个深度神经网络:一个用于逼近 PDE 的解,另一个是深度生成模型,用于生成新的配点(collocation points)以精化训练集。自适应采样过程包括两个主要步骤:首先,用深度 Ritz 方法最小化在训练集配点上离散化的变分损失来求解 PDE;其次,生成新的训练集及其对应的概率密度(PDF)值,并通过重要性采样更准确地估计变分损失。
  • results: 相比原始的深度 Ritz 方法,所提出的自适应方法可以提高精度,尤其是在低正则性和高维的问题上。一系列数值实验验证了新方法的有效性(变分损失的重要性采样估计示意见下文)。
    Abstract We introduce an adaptive sampling method for the Deep Ritz method aimed at solving partial differential equations (PDEs). Two deep neural networks are used. One network is employed to approximate the solution of PDEs, while the other one is a deep generative model used to generate new collocation points to refine the training set. The adaptive sampling procedure consists of two main steps. The first step is solving the PDEs using the Deep Ritz method by minimizing an associated variational loss discretized by the collocation points in the training set. The second step involves generating a new training set, which is then used in subsequent computations to further improve the accuracy of the current approximate solution. We treat the integrand in the variational loss as an unnormalized probability density function (PDF) and approximate it using a deep generative model called bounded KRnet. The new samples and their associated PDF values are obtained from the bounded KRnet. With these new samples and their associated PDF values, the variational loss can be approximated more accurately by importance sampling. Compared to the original Deep Ritz method, the proposed adaptive method improves accuracy, especially for problems characterized by low regularity and high dimensionality. We demonstrate the effectiveness of our new method through a series of numerical experiments.
    摘要 我们提出一种用于深度 Ritz 方法求解偏微分方程(PDEs)的自适应采样方法。该方法使用两个深度神经网络:一个用于逼近 PDE 的解,另一个是深度生成模型,用于生成新的配点以精化训练集。自适应采样过程包括两个主要步骤:第一步,用深度 Ritz 方法最小化由训练集配点离散化的变分损失来求解 PDE;第二步,生成新的训练集,并在后续计算中使用它进一步提高当前近似解的精度。我们将变分损失中的被积函数视为未归一化的概率密度函数(PDF),并用名为 bounded KRnet 的深度生成模型来逼近它;新的样本及其对应的 PDF 值均由 bounded KRnet 给出。有了这些新样本及其 PDF 值,就可以通过重要性采样更准确地估计变分损失。与原始深度 Ritz 方法相比,所提出的自适应方法提高了精度,尤其适用于低正则性和高维的问题。我们通过一系列数值实验验证了新方法的有效性。
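The sketch below shows how the variational (Ritz) energy of a Poisson-type problem can be estimated by importance sampling when the collocation points come from a learned proposal density (bounded KRnet in the paper; here the density values are simply given). Boundary terms are omitted for brevity, so this illustrates the estimator rather than a full Deep Ritz implementation.

```python
import torch

def ritz_loss_importance(u_net, x, pdf_x, f):
    """Monte-Carlo estimate of E[u] = \int (0.5*|grad u|^2 - f*u) dx with samples x ~ pdf_x,
    corrected by importance weights 1/pdf_x (interior term only)."""
    x = x.clone().requires_grad_(True)
    u = u_net(x)                                             # (N, 1)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    integrand = 0.5 * grad_u.pow(2).sum(dim=1, keepdim=True) - f(x) * u
    weights = 1.0 / pdf_x.detach().clamp_min(1e-8)           # (N, 1) importance weights
    return (weights * integrand).mean()

u_net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x = torch.rand(1024, 2)                                      # stand-in for samples from the generative model
pdf_x = torch.full((1024, 1), 1.0)                           # their densities (uniform proposal here)
loss = ritz_loss_importance(u_net, x, pdf_x, lambda x: torch.ones(len(x), 1))
loss.backward()
```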

DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic

  • paper_url: http://arxiv.org/abs/2310.17173
  • repo_url: None
  • paper_authors: Dexter Neo, Tsuhan Chen
  • for: 这个论文是为了提高 Soft Actor-Critic (SAC) 算法的性能而设计的。
  • methods: 论文使用 Maximum Entropy Principle 和一个附加的统计约束来提高 discrete SAC 算法。
  • results: 论文的实验结果表明,这些约束可以提供额外的鲁棒性,使得在实际世界中部署的机器学习代理人更加安全。
    Abstract We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate critic policy. Furthermore, our findings suggests that these constraints provide an added robustness against potential domain shifts, which are essential for safe deployment of reinforcement learning agents in the real-world. We provide theoretical analysis and show empirical results on low data regimes for both in-distribution and out-of-distribution variants of Atari 2600 games.
    摘要 我们提出了对 Soft Actor-Critic(SAC)算法家族的一种新扩展。我们认为,基于最大熵原理,离散 SAC 可以通过由替代评论家策略导出的额外统计约束得到进一步改进。此外,我们的结果表明,这些约束带来了对潜在域偏移(domain shift)的额外鲁棒性,而这对于强化学习智能体在真实世界中的安全部署至关重要。我们给出了理论分析,并在 Atari 2600 游戏的分布内与分布外变体上、低数据量条件下进行了实验研究。

Learning an Inventory Control Policy with General Inventory Arrival Dynamics

  • paper_url: http://arxiv.org/abs/2310.17168
  • repo_url: None
  • paper_authors: Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade
  • for: 这篇论文研究在一般到货动态下学习并回测库存控制策略的问题。
  • methods: 使用深度生成模型来模拟到货量过程,并将问题形式化为一个外生决策过程,从而可以应用 Madeka et al. (2022) 的结果将其归约为监督学习。
  • results: 仿真研究表明,使用 Gen-QOT 可以带来统计上显著的、相对于生产基线的盈利提升;利用正在进行的真实 A/B 测试数据,研究还表明 Gen-QOT 能很好地泛化到离策略(off-policy)数据。
    Abstract In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term as a quantity-over-time arrivals model (QOT). We also allow for order quantities to be modified as a post-processing step to meet vendor constraints such as order minimum and batch size constraints -- a common practice in real supply chains. To the best of our knowledge this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022) we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problem. In our case, we incorporate a deep generative model for the arrivals process as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Finally, we show via simulation studies that this approach yields statistically significant improvements in profitability over production baselines. Using data from an ongoing real-world A/B test, we show that Gen-QOT generalizes well to off-policy data.
    摘要 本文研究在一般到货动态下学习并回测库存控制策略的问题,我们将这种到货动态称为 quantity-over-time 到货模型(QOT)。我们还允许订购量在后处理步骤中被修改,以满足供应商的最小订购量和批量等约束,这是真实供应链中的常见做法。据我们所知,这是首个能够处理任意到货动态或任意下游订购量后处理的工作。基于 Madeka et al. (2022) 的近期工作,我们同样将周期性盘点库存控制问题形式化为外生决策过程,其中大部分状态不受智能体控制。Madeka et al. (2022) 展示了如何构建一个重放历史数据的模拟器来求解此类问题;在我们的设定中,我们把到货过程的深度生成模型纳入历史重放。通过将问题形式化为外生决策过程,我们可以应用 Madeka et al. (2022) 的结果,将其归约为监督学习。最后,仿真研究表明,该方法相对生产基线带来统计显著的盈利提升;利用正在进行的真实 A/B 测试数据,我们还展示了 Gen-QOT 能很好地泛化到离策略数据。

MaxEnt Loss: Constrained Maximum Entropy for Calibration under Out-of-Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.17159
  • repo_url: None
  • paper_authors: Dexter Neo, Stefan Winkler, Tsuhan Chen
  • for: 解决模型在分布外(out-of-distribution,OOD)情形下的校准问题。
  • methods: 基于最大熵原理,在训练中引入有用的统计约束,从而在不牺牲准确性的情况下改进模型校准。
  • results: 理论分析和实验结果表明,该方法在实践中能很好地校准模型,在合成与真实世界基准上均达到最先进的校准性能。
    Abstract We present a new loss function that addresses the out-of-distribution (OOD) calibration problem. While many objective functions have been proposed to effectively calibrate models in-distribution, our findings show that they do not always fare well OOD. Based on the Principle of Maximum Entropy, we incorporate helpful statistical constraints observed during training, delivering better model calibration without sacrificing accuracy. We provide theoretical analysis and show empirically that our method works well in practice, achieving state-of-the-art calibration on both synthetic and real-world benchmarks.
    摘要 我们提出一种新的损失函数,用于解决分布外(OOD)情形下的校准问题。尽管已有许多目标函数能够在分布内有效地校准模型,但我们的结果表明它们在 OOD 下并不总是表现良好。基于最大熵原理,我们引入训练过程中观察到的有用统计约束,在不牺牲准确性的情况下带来更好的模型校准。我们给出了理论分析,并通过实验证明该方法在实践中行之有效,在合成和真实世界基准上均达到了最先进的校准性能。

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

  • paper_url: http://arxiv.org/abs/2310.17157
  • repo_url: https://github.com/fminference/dejavu
  • paper_authors: Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen
  • for: 降低大语言模型(LLM)的执行时间开销
  • methods: 利用上下文稀缺性(contextual sparsity)来降低LLM的执行时间开销,并不会牺牲LLM的质量或学习能力
  • results: 对于OPT-175B模型,DejaVu系统可以将执行时间开销降低至至少2倍,并且在比较常用的Hugging Face实现中降低至至少6倍,无需增加计算成本。
    Abstract Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware. We hypothesize that contextual sparsity, which are small, input-dependent sets of attention heads and MLP parameters that yield approximately the same output as the dense model for a given input, can address these issues. We show that contextual sparsity exists, that it can be accurately predicted, and that we can exploit it to speed up LLM inference in wall-clock time without compromising LLM's quality or in-context learning ability. Based on these insights, we propose DejaVu, a system that uses a low-cost algorithm to predict contextual sparsity on the fly given inputs to each layer, along with an asynchronous and hardware-aware implementation that speeds up LLM inference. We validate that DejaVu can reduce the inference latency of OPT-175B by over 2X compared to the state-of-the-art FasterTransformer, and over 6X compared to the widely used Hugging Face implementation, without compromising model quality. The code is available at https://github.com/FMInference/DejaVu.
    摘要 大型语言模型(LLM)有百十亿个参数,启动了一新波的聪明AI应用。然而,它们在推论时需要大量的计算资源。简约是一种自然的方法来减少这些成本,但现有的方法可能需要重新训练,或者牺牲LLM的内部学习能力,或者不会在现代硬件上获得实时时钟速度优化。我们 hypothesis 认为,contextual sparsity,即对于每个输入的小型、受输入影响的注意头和多层感知(MLP)参数,可以解决这些问题。我们证明了contextual sparsity存在,可以准确预测,并且可以利用它来快速化LLM推论。基于这些见解,我们提出了DejaVu,一个使用低成本的算法来预测contextual sparsity的系统,以及一个ynchronized和硬件对应的实现,可以在实时时钟上快速化LLM推论。我们验证了DejaVu可以对OPT-175B减少推论延迟时间比state-of-the-art FasterTransformer高出2倍,并且高于广泛使用的Hugging Face实现高出6倍,而不对模型质量造成妥协。代码可以在https://github.com/FMInference/DejaVu 中找到。

Spatio-Temporal Meta Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.17678
  • repo_url: https://github.com/hkuds/cl4st
  • paper_authors: Jiabin Tang, Lianghao Xia, Jie Hu, Chao Huang
  • for: 这个研究的目的是提高公共交通和安全管理,通过预测交通和犯罪活动的趋势。
  • methods: 这个研究使用了一种新的对照学习框架(CL4ST),将这些框架应用到交通和犯罪预测中。这个框架包括一个自动生成的节点和边扩展观,以及两个分支的图对照学习 paradigm。
  • results: 这个研究的结果显示,CL4ST在交通和犯罪预测中表现出色,较以往的基eline模型表现更好。
    Abstract Spatio-temporal prediction is crucial in numerous real-world applications, including traffic forecasting and crime prediction, which aim to improve public transportation and safety management. Many state-of-the-art models demonstrate the strong capability of spatio-temporal graph neural networks (STGNN) to capture complex spatio-temporal correlations. However, despite their effectiveness, existing approaches do not adequately address several key challenges. Data quality issues, such as data scarcity and sparsity, lead to data noise and a lack of supervised signals, which significantly limit the performance of STGNN. Although recent STGNN models with contrastive learning aim to address these challenges, most of them use pre-defined augmentation strategies that heavily depend on manual design and cannot be customized for different Spatio-Temporal Graph (STG) scenarios. To tackle these challenges, we propose a new spatio-temporal contrastive learning (CL4ST) framework to encode robust and generalizable STG representations via the STG augmentation paradigm. Specifically, we design the meta view generator to automatically construct node and edge augmentation views for each disentangled spatial and temporal graph in a data-driven manner. The meta view generator employs meta networks with parameterized generative model to customize the augmentations for each input. This personalizes the augmentation strategies for every STG and endows the learning framework with spatio-temporal-aware information. Additionally, we integrate a unified spatio-temporal graph attention network with the proposed meta view generator and two-branch graph contrastive learning paradigms. Extensive experiments demonstrate that our CL4ST significantly improves performance over various state-of-the-art baselines in traffic and crime prediction.
    摘要 “空间时间预测是现实世界中许多应用程序的关键,包括交通预测和犯罪预测,以提高公共交通和安全管理。许多当今最佳模型表明了 Space-Time Graph Neural Network(STGNN)的强大能力,用于捕捉复杂的空间时间相关性。然而,现有的方法并不充分解决一些关键挑战。数据质量问题,如数据缺乏和稀疏性,导致数据噪音和缺乏监督信号,这些限制了 STGNN 的性能。虽然最近的 STGNN 模型采用了对比学习,但大多数其中使用手动定义的扩展策略,这些策略无法适应不同的 Space-Time Graph(STG)场景。为了解决这些挑战,我们提出了一个新的空间时间对比学习(CL4ST)框架,用于生成Robust和通用的 STG 表示。具体来说,我们设计了元视图生成器,用于自动构建节点和边扩展视图 для每个分离的空间和时间图。元视图生成器使用元网络和参数化生成模型来自动定制扩展策略 для每个输入。这种个性化的扩展策略使得学习框架具备空间时间感知信息。此外,我们将统一的空间时间图注意力网络与我们的元视图生成器和两个分支图对比学习 paradigms相结合。广泛的实验表明,我们的 CL4ST 可以在交通预测和犯罪预测等领域sigificantly提高性能。”

Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration

  • paper_url: http://arxiv.org/abs/2310.17153
  • repo_url: https://github.com/longinyu/hsivi
  • paper_authors: Longlin Yu, Tianyu Xie, Yu Zhu, Tong Yang, Xiangyu Zhang, Cheng Zhang
  • for: 这篇论文旨在以层次方式定义半隐式分布,从而扩展可解析的变分族。
  • methods: 该方法用半隐式分布进行变分推断,通过逐层匹配辅助分布来训练层次结构。
  • results: 该方法可以在若干贝叶斯推断问题中提升半隐式分布的表达能力,并且可以借助预训练的得分网络加速扩散模型的采样过程。
    Abstract Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variational inference, called HSIVI, which generalizes SIVI to allow more expressive multi-layer construction of semi-implicit distributions. By introducing auxiliary distributions that interpolate between a simple base distribution and the target distribution, the conditional layers can be trained by progressively matching these auxiliary distributions one layer after another. Moreover, given pre-trained score networks, HSIVI can be used to accelerate the sampling process of diffusion models with the score matching objective. We show that HSIVI significantly enhances the expressiveness of SIVI on several Bayesian inference problems with complicated target distributions. When used for diffusion model acceleration, we show that HSIVI can produce high quality samples comparable to or better than the existing fast diffusion model based samplers with a small number of function evaluations on various datasets.
    摘要 semi-implicit variational inference (SIVI) 已经引入以扩展分析性的变量家族。然而,目标 posterior 的复杂结构可能会使单层架构成为不够。在这篇论文中,我们提议使用层次 semi-implicit variational inference (HSIVI),可以扩展 SIVI 以允许更表达性的多层建构。通过引入 auxiliary distribution,我们可以在层次上进行逐层匹配,从而训练 conditioning layer。此外,我们可以使用 pre-trained score network 加速 diffusion model 的采样过程,使用 score matching 目标。我们发现 HSIVI 可以在几个 Bayesian inference 问题上提高 SIVI 的表达能力。当用于 diffusion model 加速时,HSIVI 可以生成高质量样本,与现有的快速 diffusion model 基于 samplers 相比,只需少量的函数评估。

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.17139
  • repo_url: https://github.com/zanghyu/offline_bisimulation
  • paper_authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche
  • for: 这篇论文旨在解释为RL任务中使用bisimulation方法时,在线和离线任务之间的性能差异的原因,以及如何通过expectile算子和合适的奖励缩放策略来改进性能。
  • methods: 这篇论文使用了bisimulation方法,以及expectile算子和奖励缩放策略来解决RL任务中的问题。
  • results: 研究发现,在bisimulation方法中缺失的转移会导致优化器过度适应 incomplete data,而导致性能下降。此外,研究还发现,对于RL任务,奖励缩放策略可以有效地避免特征塌陷。通过应用expectile算子和合适的奖励缩放策略,研究在D4RL和Visual D4RL两个benchmark suite上实现了性能提升。
    Abstract While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at \url{https://github.com/zanghyu/Offline_Bisimulation}.
    摘要 尽管基于互模拟(bisimulation)的方法有望为强化学习(RL)任务学习鲁棒的状态表示,但它们在离线 RL 任务中的效果并不理想,在某些情况下甚至明显逊于其他方法。我们希望理解 bisimulation 方法为何在在线设置中成功、却在离线任务中失效。我们的分析表明,数据集中缺失的转移对 bisimulation 原则尤其有害,会导致无效的估计。我们还阐明了奖励缩放在约束 bisimulation 度量的尺度及其诱发的价值误差方面的关键作用。基于这些发现,我们提议在离线 RL 设置中使用 expectile 算子进行表示学习,以避免对不完整数据的过拟合;同时,通过引入合适的奖励缩放策略,避免表示空间中的特征坍缩。我们将这些建议应用到 MICo 和 SimSR 这两个最先进的基于 bisimulation 的算法上,并在 D4RL 和 Visual D4RL 两个基准套件上展示了性能提升。代码见 \url{https://github.com/zanghyu/Offline_Bisimulation}。
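For reference, the expectile operator mentioned above boils down to an asymmetric squared loss; a minimal sketch is given below. How the paper plugs it into bisimulation-based representation learning is more involved; this only shows the operator itself.

```python
import torch

def expectile_loss(pred, target, tau=0.7):
    """Asymmetric squared loss: residuals where target > pred are weighted by tau,
    the others by (1 - tau); tau = 0.5 recovers ordinary mean-squared error."""
    diff = target - pred
    weight = torch.where(diff > 0, torch.full_like(diff, tau), torch.full_like(diff, 1 - tau))
    return (weight * diff.pow(2)).mean()

print(expectile_loss(torch.zeros(4), torch.tensor([1.0, -1.0, 2.0, -2.0]), tau=0.9))
```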

Large-Scale Gaussian Processes via Alternating Projection

  • paper_url: http://arxiv.org/abs/2310.17137
  • repo_url: None
  • paper_authors: Kaiwen Wu, Jonathan Wenger, Haydn Jones, Geoff Pleiss, Jacob R. Gardner
  • for: This paper aims to improve the efficiency of Gaussian process (GP) hyperparameter optimization for large datasets.
  • methods: The proposed method uses an iterative approach that only accesses subblocks of the kernel matrix, enabling mini-batching and reducing the time and space complexity to $\mathcal{O}(n)$.
  • results: The proposed method accelerates training by a factor of 2 to 27 compared to conjugate gradients (CG) on large-scale benchmark datasets with up to four million datapoints, and enjoys linear convergence and robustness to ill-conditioning.
    Abstract Gaussian process (GP) hyperparameter optimization requires repeatedly solving linear systems with $n \times n$ kernel matrices. To address the prohibitive $\mathcal{O}(n^3)$ time complexity, recent work has employed fast iterative numerical methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the corresponding kernel matrices become increasingly ill-conditioned and still require $\mathcal{O}(n^2)$ space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling \emph{mini-batching}. Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove our method enjoys linear convergence and empirically we demonstrate its robustness to ill-conditioning. On large-scale benchmark datasets up to four million datapoints our approach accelerates training by a factor of 2$\times$ to 27$\times$ compared to CG.
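A rough sketch of the "only touch sub-blocks of the kernel matrix" idea is given below, written as a block Gauss-Seidel / block-coordinate solve of K x = b for a symmetric positive-definite kernel matrix. The paper's alternating-projection updates and convergence analysis are more refined; this only conveys the access pattern.

```python
import numpy as np

def blockwise_solve(K, b, block_size=256, n_epochs=50, jitter=1e-6):
    """Approximately solve K x = b by sweeping over coordinate blocks; each update
    reads one block-row of K (O(n * block_size) memory traffic per step)."""
    n = b.shape[0]
    x = np.zeros(n)
    for _ in range(n_epochs):
        for start in range(0, n, block_size):
            idx = slice(start, min(start + block_size, n))
            residual = b[idx] - K[idx, :] @ x
            block = K[idx, idx] + jitter * np.eye(residual.shape[0])
            x[idx] += np.linalg.solve(block, residual)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
K = A @ A.T + 500 * np.eye(500)     # SPD stand-in for a kernel matrix
b = rng.standard_normal(500)
x = blockwise_solve(K, b, block_size=100)
print(np.linalg.norm(K @ x - b))    # residual shrinks with more sweeps
```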

On the Convergence of CART under Sufficient Impurity Decrease Condition

  • paper_url: http://arxiv.org/abs/2310.17114
  • repo_url: None
  • paper_authors: Rahul Mazumder, Haoyue Wang
  • for: 这篇论文主要研究了CART算法在回归设置下的收敛速率。
  • methods: 作者首先在 SID 条件下给出了 CART 预测误差的上界;此外,还提供了一组易于验证的充分条件来保证 SID 条件成立,并以非参数估计中的几类常见函数为例加以说明。
  • results: 作者得到了一个改进的 CART 预测误差上界,并证明在某些函数类上该上界至多只能再改进一个常数或对数因子;此外,还讨论了若干实用的函数类,便于在非参数估计中应用 CART 算法。
    Abstract The decision tree is a flexible machine learning model that finds its success in numerous applications. It is usually fitted in a recursively greedy manner using CART. In this paper, we investigate the convergence rate of CART under a regression setting. First, we establish an upper bound on the prediction error of CART under a sufficient impurity decrease (SID) condition \cite{chi2022asymptotic} -- our result improves upon the known result by \cite{chi2022asymptotic} under a similar assumption. Furthermore, we provide examples that demonstrate the error bound cannot be further improved by more than a constant or a logarithmic factor. Second, we introduce a set of easily verifiable sufficient conditions for the SID condition. Specifically, we demonstrate that the SID condition can be satisfied in the case of an additive model, provided that the component functions adhere to a ``locally reverse Poincar{\'e} inequality". We discuss several well-known function classes in non-parametric estimation to illustrate the practical utility of this concept.
    摘要 决策树是一种灵活的机器学习模型,在许多应用中取得成功,通常用 CART 以递归贪心方式拟合。本文研究回归设定下 CART 的收敛速率。首先,我们在充分不纯度下降(SID)条件下给出了 CART 预测误差的上界,改进了类似假设下的已知结果;我们还给出示例说明,该误差界至多只能再改进一个常数或对数因子。其次,我们提出一组易于验证的充分条件来保证 SID 条件成立。特别地,我们证明在可加模型中,只要各分量函数满足一种“局部反向 Poincaré 不等式”,SID 条件即可满足。我们还讨论了非参数估计中若干常见的函数类,以说明这一概念的实用价值。

LLM4DyG: Can Large Language Models Solve Problems on Dynamic Graphs?

  • paper_url: http://arxiv.org/abs/2310.17110
  • repo_url: None
  • paper_authors: Zeyang Zhang, Xin Wang, Ziwei Zhang, Haoyang Li, Yijian Qin, Simin Wu, Wenwu Zhu
  • for: 这篇论文旨在评估语言模型在动态图上的空间-时间理解能力,尤其是在面对动态图数据时。
  • methods: 作者提出了一个名为LLM4DyG的benchmark,用于评估语言模型在动态图上的空间-时间理解能力。他们还进行了广泛的实验研究,以分析不同的数据生成器、数据统计方法、提示技术和语言模型在模型性能上的影响。
  • results: 研究发现,语言模型在动态图上有先验空间-时间理解能力,但是随着图像大小和密度增加,模型的性能显著下降。此外,提出了一种名为Disentangled Spatial-Temporal Thoughts(DST2)的提示方法,可以帮助提高语言模型在动态图上的空间-时间理解能力。
    Abstract In an era marked by the increasing adoption of Large Language Models (LLMs) for various tasks, there is a growing focus on exploring LLMs' capabilities in handling web data, particularly graph data. Dynamic graphs, which capture temporal network evolution patterns, are ubiquitous in real-world web data. Evaluating LLMs' competence in understanding spatial-temporal information on dynamic graphs is essential for their adoption in web applications, which remains unexplored in the literature. In this paper, we bridge the gap via proposing to evaluate LLMs' spatial-temporal understanding abilities on dynamic graphs, to the best of our knowledge, for the first time. Specifically, we propose the LLM4DyG benchmark, which includes nine specially designed tasks considering the capability evaluation of LLMs from both temporal and spatial dimensions. Then, we conduct extensive experiments to analyze the impacts of different data generators, data statistics, prompting techniques, and LLMs on the model performance. Finally, we propose Disentangled Spatial-Temporal Thoughts (DST2) for LLMs on dynamic graphs to enhance LLMs' spatial-temporal understanding abilities. Our main observations are: 1) LLMs have preliminary spatial-temporal understanding abilities on dynamic graphs, 2) Dynamic graph tasks show increasing difficulties for LLMs as the graph size and density increase, while not sensitive to the time span and data generation mechanism, 3) the proposed DST2 prompting method can help to improve LLMs' spatial-temporal understanding abilities on dynamic graphs for most tasks. The data and codes will be open-sourced at publication time.
    摘要 在大语言模型(LLM)被广泛用于各类任务的时代,关注 LLM 处理网络数据(尤其是图数据)能力的研究日益增多。动态图刻画了网络随时间演化的模式,在真实世界的网络数据中无处不在。评估 LLM 在动态图上理解时空信息的能力,是其应用于网络场景的关键,而这一问题在已有文献中尚未被研究。在这篇论文中,我们填补了这一空白:我们提出了 LLM4DyG 基准,包含 9 个从时间和空间两个维度专门设计的任务,用于评估 LLM 在动态图上的时空理解能力。然后,我们进行了广泛的实验,分析不同的数据生成器、数据统计特性、提示技术和 LLM 对模型性能的影响。最后,我们提出了面向动态图的分离时空思维(DST2)提示方法,以增强 LLM 的时空理解能力。我们的主要观察结果是:1)LLM 在动态图上已具备初步的时空理解能力;2)动态图任务的难度随图的规模和密度增大而增加,而对时间跨度和数据生成机制不敏感;3)我们提出的 DST2 提示方法可以在大多数任务上帮助提升 LLM 的时空理解能力。数据和代码将在论文发表时开源。

MIM-GAN-based Anomaly Detection for Multivariate Time Series Data

  • paper_url: http://arxiv.org/abs/2310.18257
  • repo_url: https://github.com/explorerlu1024/mimad-gan
  • paper_authors: Shan Lu, Zhicheng Dong, Donghong Cai, Fang Fang, Dongcai Zhao
  • for: 本研究提出了一种基于生成对抗网络(GAN)的多时序数据异常检测算法,用于检测时序数据中的异常点。
  • methods: 本算法使用基于长短期记忆网络(LSTM)的生成器和判别器,并在 GAN 的损失函数中引入指数信息度量,以避免局部最优解和模型坍塌。
  • results: 实验结果表明,所提出的基于 MIM-GAN 的异常检测算法在精确率、召回率和 F1 分数等方面表现出色,优于传统方法。
    Abstract The loss function of Generative adversarial network(GAN) is an important factor that affects the quality and diversity of the generated samples for anomaly detection. In this paper, we propose an unsupervised multiple time series anomaly detection algorithm based on the GAN with message importance measure(MIM-GAN). In particular, the time series data is divided into subsequences using a sliding window. Then a generator and a discriminator designed based on the Long Short-Term Memory (LSTM) are employed to capture the temporal correlations of the time series data. To avoid the local optimal solution of loss function and the model collapse, we introduce an exponential information measure into the loss function of GAN. Additionally, a discriminant reconstruction score consisting on discrimination and reconstruction loss is taken into account. The global optimal solution for the loss function is derived and the model collapse is proved to be avoided in our proposed MIM-GAN-based anomaly detection algorithm. Experimental results show that the proposed MIM-GAN-based anomaly detection algorithm has superior performance in terms of precision, recall, and F1 score.
    摘要 生成对抗网络(GAN)的损失函数是影响异常检测中生成样本质量与多样性的重要因素。本文提出一种基于带消息重要性度量的 GAN(MIM-GAN)的无监督多元时间序列异常检测算法。具体而言,先用滑动窗口将时间序列数据划分为子序列,再用基于长短期记忆网络(LSTM)的生成器和判别器来捕捉时间序列数据的时间相关性。为避免损失函数陷入局部最优解以及模型坍塌,我们在 GAN 的损失函数中引入指数信息度量,并同时考虑由判别损失和重构损失组成的判别重构得分。我们推导了损失函数的全局最优解,并证明了所提出的基于 MIM-GAN 的异常检测算法可以避免模型坍塌。实验结果表明,该算法在精确率、召回率和 F1 分数方面表现优异。

Network Design through Graph Neural Networks: Identifying Challenges and Improving Performance

  • paper_url: http://arxiv.org/abs/2310.17100
  • repo_url: None
  • paper_authors: Donald Loveland, Rajmonda Caceres
  • for: 研究 Graph Neural Network (GNN) 的修改策略,以提高网络设计。
  • methods: 分析 previous works 中的 gradient 计算,揭示影响 editing 的因素,并提出一种 iterative editing 方法(ORE),可以更好地避免基于结构性质的错误编辑。
  • results: 通过一系列设计任务和外部验证方法,证明 ORE 可以提高 editing 效果,比前一代方法提高至 50%。
    Abstract Graph Neural Network (GNN) research has produced strategies to modify a graph's edges using gradients from a trained GNN, with the goal of network design. However, the factors which govern gradient-based editing are understudied, obscuring why edges are chosen and if edits are grounded in an edge's importance. Thus, we begin by analyzing the gradient computation in previous works, elucidating the factors that influence edits and highlighting the potential over-reliance on structural properties. Specifically, we find that edges can achieve high gradients due to structural biases, rather than importance, leading to erroneous edits when the factors are unrelated to the design task. To improve editing, we propose ORE, an iterative editing method that (a) edits the highest scoring edges and (b) re-embeds the edited graph to refresh gradients, leading to less biased edge choices. We empirically study ORE through a set of proposed design tasks, each with an external validation method, demonstrating that ORE improves upon previous methods by up to 50%.
    摘要 GRAPH Neural Network (GNN) 研究已经制定了使用训练GNN的梯度来修改图的边,以达到网络设计的目标。然而,影响梯度基于编辑的因素尚未得到足够的研究,导致选择边和编辑是否与设计任务相关的问题未得到解释。因此,我们开始分析过去的研究中的梯度计算,抛出影响编辑的因素,并指出梯度基于结构性质可能导致不准确的编辑。为了改进编辑,我们提出了ORE,一种迭代编辑方法,它包括(a)编辑梯度最高的边,并(b)重新嵌入编辑后的图以重新计算梯度,从而避免结构性质导致的不准确编辑。我们通过一系列的设计任务和外部验证方法来实验ORE,发现它比前一代方法提高了50%。

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

  • paper_url: http://arxiv.org/abs/2310.17087
  • repo_url: None
  • paper_authors: Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
  • for: This paper aims to understand the implicit biases that arise when using large learning rates for nonconvex optimization, and to develop a new global convergence theory for this setting.
  • methods: The paper uses a combination of theoretical and experimental techniques to study the behavior of large learning rate gradient descent for nonconvex optimization. The authors develop a new global convergence theory for this setting, and validate their results with experiments on neural networks.
  • results: The paper shows that large learning rates can lead to various implicit biases, including the edge of stability, balancing, and catapult phenomena. The authors also establish that these biases are a result of the combination of a provable preference of large learning rate gradient descent for moving toward flatter regions, and the good regularity of the objective function. Additionally, the paper provides the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions.
    Abstract Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Though significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions would they occur. This paper provides an initial step in answering this question, namely that these implicit biases are in fact various tips of the same iceberg. They occur when the objective function of optimization has some good regularity, which, in combination with a provable preference of large learning rate gradient descent for moving toward flatter regions, results in these nontrivial dynamical phenomena. To establish this result, we develop a new global convergence theory under large learning rates, for a family of nonconvex functions without globally Lipschitz continuous gradient, which was typically assumed in existing convergence analysis. A byproduct is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions. We also validate our theory with experiments on neural networks, where different losses, activation functions, and batch normalization all can significantly affect regularity and lead to very different training dynamics.
    摘要 将大学习率用于非凸优化的梯度下降时,会产生多种隐式偏置,包括稳定性边缘(Cohen et al., 2021)、平衡(Wang et al., 2022)和弹射(Lewkowycz et al., 2020)。这些现象难以用经典优化理论解释。尽管在理解这些隐式偏置方面已取得重要的理论进展,但它们会在哪些目标函数上出现仍不清楚。本文朝回答这一问题迈出了第一步:这些隐式偏置实际上是同一座冰山的不同侧面。当优化的目标函数具有某种良好的正则性时,结合大学习率梯度下降可证明的“趋向更平坦区域”的偏好,便会产生这些非平凡的动力学现象。为证明这一结果,我们为一族不具有全局 Lipschitz 连续梯度的非凸函数(现有收敛分析通常假设该条件成立)建立了大学习率下的新的全局收敛理论,其副产品是首个针对大学习率梯度下降优化非凸函数的非渐近收敛速率界。我们还在神经网络上用实验验证了理论:不同的损失函数、激活函数和批归一化都会显著影响正则性,从而导致截然不同的训练动力学。

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

  • paper_url: http://arxiv.org/abs/2310.17074
  • repo_url: None
  • paper_authors: Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou
  • for: 这项研究旨在考察使用大学习率 SGD 算法训练神经网络(NN)时的泛化性能。
  • methods: 这个研究使用了大学习率SGD算法来训练NN,并发现在这种训练环境下,NN的振荡可以提高NN的泛化性能。
  • results: 研究发现,使用大学习率SGD算法训练NN可以更好地学习弱特征(weak features),而不是使用小学习率SGD算法训练NN所能学习的强特征(strong features)。这种现象被称为“有利振荡”。
    Abstract In this work, we theoretically investigate the generalization properties of neural networks (NN) trained by stochastic gradient descent (SGD) algorithm with large learning rates. Under such a training regime, our finding is that, the oscillation of the NN weights caused by the large learning rate SGD training turns out to be beneficial to the generalization of the NN, which potentially improves over the same NN trained by SGD with small learning rates that converges more smoothly. In view of this finding, we call such a phenomenon "benign oscillation". Our theory towards demystifying such a phenomenon builds upon the feature learning perspective of deep learning. Specifically, we consider a feature-noise data generation model that consists of (i) weak features which have a small $\ell_2$-norm and appear in each data point; (ii) strong features which have a larger $\ell_2$-norm but only appear in a certain fraction of all data points; and (iii) noise. We prove that NNs trained by oscillating SGD with a large learning rate can effectively learn the weak features in the presence of those strong features. In contrast, NNs trained by SGD with a small learning rate can only learn the strong features but makes little progress in learning the weak features. Consequently, when it comes to the new testing data which consist of only weak features, the NN trained by oscillating SGD with a large learning rate could still make correct predictions consistently, while the NN trained by small learning rate SGD fails. Our theory sheds light on how large learning rate training benefits the generalization of NNs. Experimental results demonstrate our finding on "benign oscillation".
    摘要 在这个研究中,我们研究了神经网络(NN)由权重学习率很大的梯度下降(SGD)算法进行训练的泛化性能。在这种训练 regime 中,我们发现,NN的权重往复引起的SGD训练中的振荡实际上对NN的泛化有利,可能超过与SGD学习率很小的NN训练,该训练更平滑。根据这种现象,我们称之为“有利的振荡”。我们的理论基于深度学习的特征学习视角。我们考虑了一种特征-噪声数据生成模型,包括(i)弱特征,它们具有小$\ell_2$范数并出现在每个数据点中;(ii)强特征,它们具有更大的$\ell_2$范数,仅出现在一定比例的所有数据点中;以及(iii)噪声。我们证明,由振荡SGD训练的大学习率NN可以有效地在强特征的存在下学习弱特征。相比之下,SGD训练的小学习率NN只能学习强特征,对弱特征的学习做不到进展。因此,当新的测试数据只包含弱特征时,由振荡SGD训练的NN可以在新的测试数据上做出正确预测,而SGD训练的NN则失败。我们的理论解释了大学习率训练如何提高NN的泛化性能。实验结果证明了我们的发现。