Deep Learning as a Method for Inversion of NMR Signals
- paper_url: http://arxiv.org/abs/2311.13722
- repo_url: None
- paper_authors: Julian B. B. Beckmann, Mick D. Mantle, Andrew J. Sederman, Lynn F. Gladden
- for: Inversion of NMR signals.
- methods: Deep learning is used to invert NMR signals; specifically, the inversion is cast as an image-to-image regression problem and solved with a convolutional neural network.
- results: Compared with the commonly used Tikhonov and modified total generalized variation (MTGV) regularization methods, the deep learning inversion is significantly faster and outperforms both regularization techniques in nearly all instances.
Abstract
The concept of deep learning is employed for the inversion of NMR signals and it is shown that NMR signal inversion can be considered as an image-to-image regression problem, which can be treated with a convolutional neural net. It is further outlined that inversion through deep learning provides a clear efficiency and usability advantage compared to regularization techniques such as Tikhonov and modified total generalized variation (MTGV), because no hyperparameter selection prior to reconstruction is necessary. The inversion network is applied to simulated NMR signals and the results compared with Tikhonov- and MTGV-regularization. The comparison shows that inversion via deep learning is significantly faster than the latter regularization methods and also outperforms both regularization techniques in nearly all instances.
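As a rough illustration of the image-to-image regression framing described above (not the authors' architecture; channel counts, depths, and the Softplus output are placeholder choices), a minimal encoder-decoder CNN in PyTorch could look like this:

```python
import torch
import torch.nn as nn

class InversionCNN(nn.Module):
    """Toy image-to-image regression net: maps a 2D NMR signal "image"
    (e.g. a discretized relaxation/diffusion measurement) to a 2D distribution."""
    def __init__(self, in_ch=1, out_ch=1, width=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, out_ch, 3, padding=1),
            nn.Softplus(),  # the reconstructed distribution is non-negative
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = InversionCNN()
signal = torch.randn(8, 1, 64, 64)   # batch of simulated signals
target = torch.rand(8, 1, 64, 64)    # corresponding ground-truth distributions
loss = nn.functional.mse_loss(model(signal), target)
loss.backward()
```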
Bayes-xG: Player and Position Correction on Expected Goals (xG) using Bayesian Hierarchical Approach
- paper_url: http://arxiv.org/abs/2311.13707
- repo_url: None
- paper_authors: Alexander Scholtes, Oktay Karakuş
- for: Uses Bayesian methods to explore the factors that influence the expected goals (xG) metric, i.e., the probability that a shot results in a goal.
- methods: Using publicly available StatsBomb data, Bayesian hierarchical logistic regression models are built on approximately 10,000 English Premier League shots to determine whether positional or player-level factors affect xG.
- results: Positional effects appear in a basic model that uses only distance to goal and shot angle as predictors, with strikers and attacking midfielders showing a higher likelihood of scoring; these effects diminish as more informative predictors are introduced. Even with additional predictors, player-level effects persist, indicating that certain players carry notable positive or negative xG adjustments that influence their likelihood of scoring a given chance. Extending the analysis to Spain's La Liga and Germany's Bundesliga yields comparable results.
Abstract
This study employs Bayesian methodologies to explore the influence of player or positional factors in predicting the probability of a shot resulting in a goal, measured by the expected goals (xG) metric. Utilising publicly available data from StatsBomb, Bayesian hierarchical logistic regressions are constructed, analysing approximately 10,000 shots from the English Premier League to ascertain whether positional or player-level effects impact xG. The findings reveal positional effects in a basic model that includes only distance to goal and shot angle as predictors, highlighting that strikers and attacking midfielders exhibit a higher likelihood of scoring. However, these effects diminish when more informative predictors are introduced. Nevertheless, even with additional predictors, player-level effects persist, indicating that certain players possess notable positive or negative xG adjustments, influencing their likelihood of scoring a given chance. The study extends its analysis to data from Spain's La Liga and Germany's Bundesliga, yielding comparable results. Additionally, the paper assesses the impact of prior distribution choices on outcomes, concluding that the priors employed in the models provide sound results but could be refined to enhance sampling efficiency for constructing more complex and extensive models feasibly.
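A minimal sketch of a hierarchical logistic regression of this kind in PyMC is shown below; the feature names, shapes, and priors are illustrative stand-ins, not the paper's exact model or data.

```python
import numpy as np
import pymc as pm

# Toy stand-ins for StatsBomb-derived features (hypothetical shapes and names).
n_shots, n_players = 1000, 50
dist = np.random.rand(n_shots)           # distance to goal (scaled)
angle = np.random.rand(n_shots)          # shot angle (scaled)
player = np.random.randint(0, n_players, n_shots)
goal = np.random.binomial(1, 0.1, n_shots)

with pm.Model() as xg_model:
    intercept = pm.Normal("intercept", 0.0, 2.0)
    b_dist = pm.Normal("b_dist", 0.0, 1.0)
    b_angle = pm.Normal("b_angle", 0.0, 1.0)
    # Hierarchical player-level adjustments (partial pooling across players).
    sigma_player = pm.HalfNormal("sigma_player", 1.0)
    player_eff = pm.Normal("player_eff", 0.0, sigma_player, shape=n_players)

    logit_p = intercept + b_dist * dist + b_angle * angle + player_eff[player]
    pm.Bernoulli("is_goal", logit_p=logit_p, observed=goal)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```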
BackboneLearn: A Library for Scaling Mixed-Integer Optimization-Based Machine Learning
- paper_url: http://arxiv.org/abs/2311.13695
- repo_url: https://github.com/chziakas/backbone_learn
- paper_authors: Vassilis Digalakis Jr, Christos Ziakas
- for: Presents an open-source software package and framework for scaling mixed-integer optimization (MIO) problems.
- methods: Uses backbone algorithms to solve MIO problems with indicator variables, making high-dimensional problems tractable.
- results: Experiments show the approach solves these problems faster than exact methods and more accurately than commonly used heuristics.
Abstract
We present BackboneLearn: an open-source software package and framework for scaling mixed-integer optimization (MIO) problems with indicator variables to high-dimensional problems. This optimization paradigm can naturally be used to formulate fundamental problems in interpretable supervised learning (e.g., sparse regression and decision trees), in unsupervised learning (e.g., clustering), and beyond; BackboneLearn solves the aforementioned problems faster than exact methods and with higher accuracy than commonly used heuristics. The package is built in Python and is user-friendly and easily extensible: users can directly implement a backbone algorithm for their MIO problem at hand. The source code of BackboneLearn is available on GitHub (link: https://github.com/chziakas/backbone_learn).
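BackboneLearn's own API is not reproduced here; the following sketch only illustrates the general backbone idea for sparse regression under stated assumptions (screen a small "backbone" of promising features from cheap subproblems, then solve a reduced problem restricted to that set), using scikit-learn as a stand-in for the exact MIO solver that would be used on the backbone.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2000))             # high-dimensional design matrix
beta = np.zeros(2000); beta[:5] = 3.0        # sparse ground truth
y = X @ beta + rng.normal(size=500)

# Backbone construction: union of supports from cheap, low-dimensional subproblems.
backbone = set()
for cols in np.array_split(rng.permutation(2000), 10):
    sub = Lasso(alpha=0.1).fit(X[:, cols], y)
    backbone.update(cols[np.abs(sub.coef_) > 1e-6])
backbone = sorted(backbone)

# Solve the (much smaller) reduced problem on the backbone features only;
# in BackboneLearn's setting an exact MIO formulation would be solved here instead.
reduced = LinearRegression().fit(X[:, backbone], y)
print(len(backbone), "features kept out of 2000")
```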
Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts
- paper_url: http://arxiv.org/abs/2311.13687
- repo_url: https://github.com/stet-stet/goct_ismir2023
- paper_authors: Jayeon Yi, Sungho Lee, Kyogu Lee
- for: Chart generation for rhythm games, i.e., generating the directives that players must follow in sync with a piece of music.
- methods: A Transformer is trained on a large dataset, with tempo-informed preprocessing and training procedures, some of which are suggested to be integral to successful training.
- results: The model outperforms the baselines on a large dataset and also benefits from pretraining and finetuning.
Abstract
In the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives to be given to players. We newly formulate chart generation as a sequence generation task and train a Transformer using a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which are suggested to be integral for a successful training. Our model is found to outperform the baselines on a large dataset, and is also found to benefit from pretraining and finetuning.
A Joint Gradient and Loss Based Clustered Federated Learning Design
- paper_url: http://arxiv.org/abs/2311.13665
- repo_url: None
- paper_authors: Licheng Lin, Mingzhe Chen, Zhaohui Yang, Yusen Wu, Yuchen Liu
- for: Proposes a clustered federated learning (FL) framework in which distributed edge devices with non-IID data independently form several clusters in a distributed manner and run FL training within each cluster.
- methods: The clustered FL algorithm must overcome two challenges. First, the server has limited FL training information (the parameter server can only obtain each device's FL model) and limited computational power for distinguishing among a large number of devices. Second, each device has no access to other devices' data and can only use the global FL model parameters received from the server together with its own data to determine its cluster identity, which makes device clustering harder. To address both challenges, a joint gradient- and loss-based distributed clustering method is proposed in which each device determines its cluster identity from gradient similarity and training loss.
- results: The proposed clustered FL algorithm reduces clustering iterations by up to 99% compared with the existing baseline.
Abstract
In this paper, a novel clustered FL framework that enables distributed edge devices with non-IID data to independently form several clusters in a distributed manner and implement FL training within each cluster is proposed. In particular, our designed clustered FL algorithm must overcome two challenges associated with FL training. First, the server has limited FL training information (i.e., the parameter server can only obtain the FL model information of each device) and limited computational power for finding the differences among a large amount of devices. Second, each device does not have the data information of other devices for device clustering and can only use global FL model parameters received from the server and its data information to determine its cluster identity, which will increase the difficulty of device clustering. To overcome these two challenges, we propose a joint gradient and loss based distributed clustering method in which each device determines its cluster identity considering the gradient similarity and training loss. The proposed clustering method not only considers how a local FL model of one device contributes to each cluster but also the direction of gradient descent thus improving clustering speed. By delegating clustering decisions to edge devices, each device can fully leverage its private data information to determine its own cluster identity, thereby reducing clustering overhead and improving overall clustering performance. Simulation results demonstrate that our proposed clustered FL algorithm can reduce clustering iterations by up to 99% compared to the existing baseline.
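A toy sketch of the device-side clustering rule suggested by the abstract is given below; the scoring of gradient similarity against local loss, the weighting, and all names are hypothetical, not the paper's exact criterion.

```python
import numpy as np

def choose_cluster(local_grads, local_losses, cluster_grads, alpha=0.5):
    """Hypothetical device-side rule: join the cluster whose global model
    (i) produces a local gradient most aligned with that cluster's average
    gradient direction and (ii) gives a low training loss on the local data.
    local_grads[c]  : flattened gradient of the device under cluster c's model
    local_losses[c] : loss of cluster c's global model on the device's data
    cluster_grads[c]: flattened average gradient direction of cluster c"""
    scores = []
    for g_l, loss_c, g_c in zip(local_grads, local_losses, cluster_grads):
        cos = np.dot(g_l, g_c) / (np.linalg.norm(g_l) * np.linalg.norm(g_c) + 1e-12)
        scores.append(alpha * cos - (1 - alpha) * loss_c)  # high similarity, low loss
    return int(np.argmax(scores))

# Toy usage with 3 clusters and a 10-dimensional flattened gradient.
rng = np.random.default_rng(0)
cluster_grads = [rng.normal(size=10) for _ in range(3)]
local_grads = [rng.normal(size=10) for _ in range(3)]
local_losses = [0.9, 0.4, 0.7]
print("device joins cluster", choose_cluster(local_grads, local_losses, cluster_grads))
```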
Evaluating Pretrained models for Deployable Lifelong Learning
- paper_url: http://arxiv.org/abs/2311.13648
- repo_url: None
- paper_authors: Kiran Lekkala, Eshan Bhargava, Laurent Itti
- for: Proposes a deployable lifelong learning system for visual reinforcement learning (RL) that retains previously learned knowledge while performing new tasks.
- methods: Combines a task-mapper based on few-shot class-incremental learning (FSCIL) with an encoder/backbone and policy parameters trained entirely on a curated pretraining dataset.
- results: Experiments on the DeLL (Deployment for Lifelong Learning) benchmark on Atari games show the system scales to a large number of tasks with a small memory footprint and modest computational resources.
Abstract
We create a novel benchmark for evaluating a Deployable Lifelong Learning system for Visual Reinforcement Learning (RL) that is pretrained on a curated dataset, and propose a novel Scalable Lifelong Learning system capable of retaining knowledge from the previously learnt RL tasks. Our benchmark measures the efficacy of a deployable Lifelong Learning system that is evaluated on scalability, performance and resource utilization. Our proposed system, once pretrained on the dataset, can be deployed to perform continual learning on unseen tasks. Our proposed method consists of a Few Shot Class Incremental Learning (FSCIL) based task-mapper and an encoder/backbone trained entirely using the pretrain dataset. The policy parameters corresponding to the recognized task are then loaded to perform the task. We show that this system can be scaled to incorporate a large number of tasks due to the small memory footprint and fewer computational resources. We perform experiments on our DeLL (Deployment for Lifelong Learning) benchmark on the Atari games to determine the efficacy of the system.
Covariance alignment: from maximum likelihood estimation to Gromov-Wasserstein
- paper_url: http://arxiv.org/abs/2311.13595
- repo_url: None
- paper_authors: Yanjun Han, Philippe Rigollet, George Stepaniants
- for: Studies feature alignment methods, used across scientific disciplines for data pooling, annotation, and comparison; as a permutation learning problem, feature alignment poses significant statistical and computational challenges.
- methods: Proposes the covariance alignment model to study and compare alignment methods and establishes a minimax lower bound for covariance alignment. The bound is minimax optimal and is achieved by a natural quasi-MLE; however, that estimator requires a search over all permutations, which is computationally infeasible even for moderately sized problems.
- results: Shows that the Gromov-Wasserstein algorithm from optimal transport, which admits fast implementations even on large-scale problems, is also minimax optimal, giving the first statistical justification for its use in feature alignment practice.
Abstract
Feature alignment methods are used in many scientific disciplines for data pooling, annotation, and comparison. As an instance of a permutation learning problem, feature alignment presents significant statistical and computational challenges. In this work, we propose the covariance alignment model to study and compare various alignment methods and establish a minimax lower bound for covariance alignment that has a non-standard dimension scaling because of the presence of a nuisance parameter. This lower bound is in fact minimax optimal and is achieved by a natural quasi MLE. However, this estimator involves a search over all permutations which is computationally infeasible even when the problem has moderate size. To overcome this limitation, we show that the celebrated Gromov-Wasserstein algorithm from optimal transport which is more amenable to fast implementation even on large-scale problems is also minimax optimal. These results give the first statistical justification for the deployment of the Gromov-Wasserstein algorithm in practice.
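The Gromov-Wasserstein coupling discussed above can be computed with the POT library; a minimal sketch with random data standing in for the two spaces being aligned:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))       # samples from space 1
Y = rng.normal(size=(40, 7))       # samples from space 2 (different dimension)

C1 = ot.dist(X, X)                 # intra-domain (structure) cost matrices
C2 = ot.dist(Y, Y)
C1 /= C1.max()
C2 /= C2.max()
p = ot.unif(len(X))                # uniform marginals
q = ot.unif(len(Y))

# Soft correspondence (coupling) between the two sets, matching structure to structure.
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss')
print(T.shape)                     # (40, 40) coupling matrix
```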
Risk-sensitive Markov Decision Process and Learning under General Utility Functions
- paper_url: http://arxiv.org/abs/2311.13589
- repo_url: None
- paper_authors: Zhengqi Wu, Renyuan Xu
- for: Risk-sensitive learning in Markov decision processes (MDPs) for decision-makers with heterogeneous risk preferences.
- methods: Enlarges the state space to track the cumulative reward, introduces a discretized approximation scheme for the resulting MDP, and proposes a modified value iteration algorithm built on an epsilon-covering of the cumulative-reward space.
- results: The method provably learns a near-optimal policy under a general utility function, with guaranteed sample complexity when a simulator is available and a guaranteed regret bound otherwise, matching the theoretical lower bounds for the risk-neutral setting.
Abstract
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendations, decision-makers often persist in heterogeneous risk preferences subject to outcome uncertainties, which can not be well-captured by the risk-neutral framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To facilitate the Dynamic Programming Principle and Bellman equation, we enlarge the state space with an additional dimension that accounts for the cumulative reward. We propose a discretized approximation scheme to the MDP under enlarged state space, which is tractable and key for algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, we match the theoretical lower bounds for the risk-neutral setting.
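A toy sketch of the state-enlargement idea is given below: track a discretized cumulative reward alongside the state and apply the utility only at the horizon. The transition model, reward grid, and utility function are illustrative; this is planning with known dynamics, not the paper's learning algorithm.

```python
import numpy as np

# Toy finite-horizon MDP: S states, A actions, known transitions P and rewards R.
S, A, H = 4, 2, 5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a] is a distribution over next states
R = rng.integers(0, 2, size=(S, A))             # rewards in {0, 1} keep cumulative sums on a grid

utility = lambda w: 1.0 - np.exp(-0.5 * w)      # example risk-averse utility of the total reward
W = np.arange(H + 1)                             # reachable cumulative rewards 0..H

# V[h][s, w] = optimal expected utility from step h, state s, accumulated reward w.
V = np.zeros((H + 1, S, len(W)))
V[H] = utility(W)[None, :]                       # terminal value is the utility of the total reward
for h in range(H - 1, -1, -1):
    for s in range(S):
        for wi in range(len(W)):
            q = []
            for a in range(A):
                nwi = min(wi + R[s, a], H)       # cumulative reward after taking action a
                q.append(P[s, a] @ V[h + 1, :, nwi])
            V[h, s, wi] = max(q)

print("optimal expected utility from each start state:", V[0, :, 0])
```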
A Survey of Serverless Machine Learning Model Inference
- paper_url: http://arxiv.org/abs/2311.13587
- repo_url: None
- paper_authors: Kamil Kojs
- for: Surveys the development and optimization of large-scale deep learning serving systems.
- methods: Provides a novel taxonomy that summarizes and categorizes the emerging challenges and optimization opportunities for serverless machine learning inference.
- results: By analyzing and summarizing recent trends and techniques, the survey offers new optimization perspectives and motivates further research on large-scale deep learning serving systems.
Abstract
Recent developments in Generative AI, Computer Vision, and Natural Language Processing have led to an increased integration of AI models into various products. This widespread adoption of AI requires significant efforts in deploying these models in production environments. When hosting machine learning models for real-time predictions, it is important to meet defined Service Level Objectives (SLOs), ensuring reliability, minimal downtime, and optimizing operational costs of the underlying infrastructure. Large machine learning models often demand GPU resources for efficient inference to meet SLOs. In the context of these trends, there is growing interest in hosting AI models in a serverless architecture while still providing GPU access for inference tasks. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. By providing a novel taxonomy and summarizing recent trends, we hope that this survey could shed light on new optimization perspectives and motivate novel works in large-scale deep learning serving systems.
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates
- paper_url: http://arxiv.org/abs/2311.13584
- repo_url: None
- paper_authors: Stefano Bruno, Ying Zhang, Dong-Young Lim, Ömer Deniz Akyildiz, Sotirios Sabanis
- for: Provides full theoretical guarantees for the convergence of diffusion-based generative models under the assumption of strongly log-concave data distributions.
- methods: Uses Lipschitz-continuous functions as the approximating class for score estimation, together with an $L^2$-accurate score estimation assumption formed under an expectation over the stochastic optimizer and a novel auxiliary process that uses only known information.
- results: Obtains the best known upper bounds, in terms of key quantities such as the dimension and rates of convergence, on the Wasserstein-2 distance between the data distribution and the sampling algorithm, illustrated on the motivating example of sampling from a Gaussian distribution with unknown mean.
Abstract
We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly logconcave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of our approach. In this case, explicit estimates are provided for the associated optimization problem, i.e. score approximation, while these are combined with the corresponding sampling estimates. As a result, we obtain the best known upper bound estimates in terms of key quantities of interest, such as the dimension and rates of convergence, for the Wasserstein-2 distance between the data distribution (Gaussian with unknown mean) and our sampling algorithm. Beyond the motivating example and in order to allow for the use of a diverse range of stochastic optimizers, we present our results using an $L^2$-accurate score estimation assumption, which crucially is formed under an expectation with respect to the stochastic optimizer and our novel auxiliary process that uses only known information. This approach yields the best known convergence rate for our sampling algorithm.
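For orientation, the $L^2$-accurate score estimation assumption referred to above is typically of the following schematic form (the paper's precise version is stated under an expectation with respect to the stochastic optimizer and its auxiliary process):

$$\mathbb{E}\big[\|s_\theta(x,t) - \nabla_x \log p_t(x)\|^2\big] \le \varepsilon^2,$$

where $s_\theta$ is the learned score and $p_t$ is the law of the forward (noising) process at time $t$.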
Adaptive Sampling for Deep Learning via Efficient Nonparametric Proxies
- paper_url: http://arxiv.org/abs/2311.13583
- repo_url: None
- paper_authors: Shabnam Daghaghi, Benjamin Coleman, Benito Geordie, Anshumali Shrivastava
- for: Data sampling methods that improve the training speed of neural networks.
- methods: A sampling distribution based on nonparametric kernel regression that learns an effective importance score as the network trains, accelerated by an efficient sketch-based approximation to the Nadaraya-Watson estimator built on techniques from high-dimensional statistics and randomized algorithms.
- results: The proposed sampling algorithm outperforms the baseline in wall-clock time and accuracy on four datasets.
Abstract
Data sampling is an effective method to improve the training speed of neural networks, with recent results demonstrating that it can even break the neural scaling laws. These results critically rely on high-quality scores to estimate the importance of an input to the network. We observe that there are two dominant strategies: static sampling, where the scores are determined before training, and dynamic sampling, where the scores can depend on the model weights. Static algorithms are computationally inexpensive but less effective than their dynamic counterparts, which can cause end-to-end slowdown due to their need to explicitly compute losses. To address this problem, we propose a novel sampling distribution based on nonparametric kernel regression that learns an effective importance score as the neural network trains. However, nonparametric regression models are too computationally expensive to accelerate end-to-end training. Therefore, we develop an efficient sketch-based approximation to the Nadaraya-Watson estimator. Using recent techniques from high-dimensional statistics and randomized algorithms, we prove that our Nadaraya-Watson sketch approximates the estimator with exponential convergence guarantees. Our sampling algorithm outperforms the baseline in terms of wall-clock time and accuracy on four datasets.
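As a rough illustration of the Nadaraya-Watson idea behind the proposed sampler (the exact-sum version, without the sketch-based acceleration; the feature and loss inputs are hypothetical), importance scores can be estimated as a kernel-weighted average of recently observed per-example losses:

```python
import numpy as np

def nadaraya_watson_scores(query_feats, ref_feats, ref_losses, bandwidth=1.0):
    """Predict an importance score for each query example as the kernel-weighted
    mean of losses observed on reference examples (Nadaraya-Watson regression).
    This exact sum is O(n*m); the paper replaces it with a sketch-based
    approximation to keep end-to-end training fast."""
    d2 = ((query_feats[:, None, :] - ref_feats[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))           # Gaussian kernel weights
    return (w @ ref_losses) / (w.sum(axis=1) + 1e-12)

rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(256, 16))               # features of recently seen examples
ref_losses = rng.random(256)                         # their observed training losses
query_feats = rng.normal(size=(1024, 16))            # candidate pool for the next batch

scores = nadaraya_watson_scores(query_feats, ref_feats, ref_losses)
probs = scores / scores.sum()                        # sample the next batch proportionally
batch_idx = rng.choice(len(query_feats), size=32, replace=False, p=probs)
```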
A Unified Framework for Trace-induced Quantum Kernels
- paper_url: http://arxiv.org/abs/2311.13552
- repo_url: None
- paper_authors: Beng Yee Gan, Daniel Leykam, Supanut Thanasilp
- for: Quantum kernel methods as candidates for achieving a practical quantum advantage on certain machine learning tasks.
- methods: Unifies all trace-induced quantum kernels, including the commonly used global fidelity and local projected quantum kernels, into a common framework built from fundamental "Lego" kernels, and relates expressive power and generalization ability to the number of non-zero-weight Lego kernels.
- results: Numerical results show that models based on local projected kernels can match the performance of the global fidelity quantum kernel while requiring fewer quantum gates and measurement shots; the framework also gives a systematic way to increase the complexity of a quantum kernel model and to compare existing quantum kernels.
Abstract
Quantum kernel methods are promising candidates for achieving a practical quantum advantage for certain machine learning tasks. Similar to classical machine learning, an exact form of a quantum kernel is expected to have a great impact on the model performance. In this work we combine all trace-induced quantum kernels, including the commonly-used global fidelity and local projected quantum kernels, into a common framework. We show how generalized trace-induced quantum kernels can be constructed as combinations of the fundamental building blocks we coin "Lego" kernels, which impose an inductive bias on the resulting quantum models. We relate the expressive power and generalization ability to the number of non-zero weight Lego kernels and propose a systematic approach to increase the complexity of a quantum kernel model, leading to a new form of the local projected kernels that require fewer quantum resources in terms of the number of quantum gates and measurement shots. We show numerically that models based on local projected kernels can achieve comparable performance to the global fidelity quantum kernel. Our work unifies existing quantum kernels and provides a systematic framework to compare their properties.
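For reference, the global fidelity quantum kernel that this framework generalizes is the trace overlap of the two data-encoding states,

$$k_{\mathrm{fid}}(x, x') = \mathrm{Tr}\big[\rho(x)\,\rho(x')\big] = \big|\langle 0 | U^\dagger(x)\, U(x') | 0 \rangle\big|^2, \qquad \rho(x) = U(x)|0\rangle\langle 0|U^\dagger(x),$$

while local projected kernels instead use reduced density matrices on subsets of qubits. This is background notation only, not the paper's generalized "Lego" construction.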
Efficient Numerical Integration in Reproducing Kernel Hilbert Spaces via Leverage Scores Sampling
- paper_url: http://arxiv.org/abs/2311.13548
- repo_url: https://gitlab.com/achatali/efficient-numerical-integration-in-rkhs-via-ls-sampling
- paper_authors: Antoine Chatalic, Nicolas Schreuder, Ernesto De Vito, Lorenzo Rosasco
- for: Numerical integration, i.e., approximating integrals with respect to a target probability measure using only pointwise evaluations of the integrand.
- methods: Proposes an efficient procedure that exploits a small i.i.d. random subset of the available observations, applicable when the integrand belongs to a reproducing kernel Hilbert space, thereby avoiding a large number of function evaluations.
- results: The main result is an upper bound on the approximation error that decreases with the subsample size $m$ and matches known optimal rates in Sobolev spaces; numerical experiments illustrate the practicality and accuracy of the method.
Abstract
In this work we consider the problem of numerical integration, i.e., approximating integrals with respect to a target probability measure using only pointwise evaluations of the integrand. We focus on the setting in which the target distribution is only accessible through a set of $n$ i.i.d. observations, and the integrand belongs to a reproducing kernel Hilbert space. We propose an efficient procedure which exploits a small i.i.d. random subset of $m$
Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression
- paper_url: http://arxiv.org/abs/2311.13539
- repo_url: None
- paper_authors: Tam Thuc Do, Philip A. Chou, Gene Cheung
- for: 3D point cloud attribute compression.
- methods: A volumetric approach based on B-spline basis decomposition with low-pass and high-pass coefficient coding.
- results: Proposes a nonlinear predictor for RAHT(1) based on a polynomial of a bilateral filter, together with an efficient computation of the critically sampled high-pass coefficients; experiments show an 11-12% bit-rate reduction over the MPEG G-PCC predictor.
Abstract
We study 3D point cloud attribute compression via a volumetric approach: assuming point cloud geometry is known at both encoder and decoder, parameters $\theta$ of a continuous attribute function $f: \mathbb{R}^3 \mapsto \mathbb{R}$ are quantized to $\hat{\theta}$ and encoded, so that discrete samples $f_{\hat{\theta}}(\mathbf{x}_i)$ can be recovered at known 3D points $\mathbf{x}_i \in \mathbb{R}^3$ at the decoder. Specifically, we consider a nested sequence of function subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_L$, where $\mathcal{F}_l^{(p)}$ is a family of functions spanned by B-spline basis functions of order $p$, $f_l^*$ is the projection of $f$ on $\mathcal{F}_l^{(p)}$ and encoded as low-pass coefficients $F_l^*$, and $g_l^*$ is the residual function in orthogonal subspace $\mathcal{G}_l^{(p)}$ (where $\mathcal{G}_l^{(p)} \oplus \mathcal{F}_l^{(p)} = \mathcal{F}_{l+1}^{(p)}$) and encoded as high-pass coefficients $G_l^*$. In this paper, to improve coding performance over [1], we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and encoding of $G_l^*$ for the $p=1$ case (RAHT($1$)). For the prediction, we formalize RAHT(1) linear prediction in MPEG-PCC in a theoretical framework, and propose a new nonlinear predictor using a polynomial of bilateral filter. We derive equations to efficiently compute the critically sampled high-pass coefficients $G_l^*$ amenable to encoding. We optimize parameters in our resulting feed-forward network on a large training set of point clouds by minimizing a rate-distortion Lagrangian. Experimental results show that our improved framework outperformed the MPEG G-PCC predictor by $11$ to $12\%$ in bit rate reduction.
Naturalness of Attention: Revisiting Attention in Code Language Models
- paper_url: http://arxiv.org/abs/2311.13508
- repo_url: None
- paper_authors: Mootez Saad, Tushar Sharma
- for: Explains which factors of the attention mechanism in the CodeBERT model capture code structure.
- methods: An initial empirical interpretability study of CodeBERT that analyzes both attention distributions and transformed representations.
- results: Across Java and Python, the scaled transformation norms of the input capture syntactic structure better than attention weights alone, characterizing how CodeBERT embeds syntactic code properties.
Abstract
Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties. Recent attention analysis studies provide initial interpretability insights by focusing solely on attention weights rather than considering the wider context modeling of Transformers. This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights. We conduct an initial empirical study analyzing both attention distributions and transformed representations in CodeBERT. Across two programming languages, Java and Python, we find that the scaled transformation norms of the input better capture syntactic structure compared to attention weights alone. Our analysis reveals characterization of how CodeBERT embeds syntactic code properties. The findings demonstrate the importance of incorporating factors beyond just attention weights for rigorously understanding neural code models. This lays the groundwork for developing more interpretable models and effective uses of attention mechanisms in program analysis.
Applying Dimensionality Reduction as Precursor to LSTM-CNN Models for Classifying Imagery and Motor Signals in ECoG-Based BCIs
- paper_url: http://arxiv.org/abs/2311.13507
- repo_url: https://github.com/bafanas/dim-reduction-with-cnn-lstm
- paper_authors: Soham Bafana
- for: Improving motor rehabilitation outcomes by optimizing motor imagery classification algorithms within Brain-Computer Interfaces (BCIs).
- methods: Uses unsupervised techniques, namely Uniform Manifold Approximation and Projection (UMAP) coupled with K-Nearest Neighbors (KNN), to evaluate the necessity of supervised methods such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs) for classification tasks.
- results: Participants who exhibited high KNN scores after UMAP dimensionality reduction also achieved high accuracy in supervised deep learning (DL) models; given individualized model requirements and massive neural training data, dimensionality reduction becomes an effective preprocessing step that minimizes the need for extensive data labeling and supervised deep learning techniques.
Abstract
Motor impairments, frequently caused by neurological incidents like strokes or traumatic brain injuries, present substantial obstacles in rehabilitation therapy. This research aims to elevate the field by optimizing motor imagery classification algorithms within Brain-Computer Interfaces (BCIs). By improving the efficiency of BCIs, we offer a novel approach that holds significant promise for enhancing motor rehabilitation outcomes. Utilizing unsupervised techniques for dimensionality reduction, namely Uniform Manifold Approximation and Projection (UMAP) coupled with K-Nearest Neighbors (KNN), we evaluate the necessity of employing supervised methods such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs) for classification tasks. Importantly, participants who exhibited high KNN scores following UMAP dimensionality reduction also achieved high accuracy in supervised deep learning (DL) models. Due to individualized model requirements and massive neural training data, dimensionality reduction becomes an effective preprocessing step that minimizes the need for extensive data labeling and supervised deep learning techniques. This approach has significant implications not only for targeted therapies in motor dysfunction but also for addressing regulatory, safety, and reliability concerns in the rapidly evolving BCI field.
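A minimal sketch of the UMAP-plus-KNN pipeline described above, using umap-learn and scikit-learn with placeholder ECoG features and labels:

```python
import numpy as np
import umap                                    # umap-learn package
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder ECoG features: trials x features, with imagery/motor class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 300))
y = rng.integers(0, 2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

reducer = umap.UMAP(n_components=3, random_state=0)
Z_tr = reducer.fit_transform(X_tr)             # unsupervised dimensionality reduction
Z_te = reducer.transform(X_te)

knn = KNeighborsClassifier(n_neighbors=5).fit(Z_tr, y_tr)
print("KNN accuracy in UMAP space:", knn.score(Z_te, y_te))
# A high score here suggests the classes separate without resorting to LSTM/CNN models.
```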
Grad-Shafranov equilibria via data-free physics informed neural networks
- paper_url: http://arxiv.org/abs/2311.13491
- repo_url: None
- paper_authors: Byoungchan Jang, Alan A. Kaptanoglu, Rahul Gaur, Shaowu Pan, Matt Landreman, William Dorland
- for: Solving the Grad-Shafranov equation.
- methods: Physics-Informed Neural Networks (PINNs) are used to solve the Grad-Shafranov equation by directly minimizing the PDE residual as the loss function.
- results: PINNs solve the Grad-Shafranov equation accurately and efficiently under several different boundary conditions. The authors also explore the parameter space to map trade-offs such as reconstruction error versus computational speed, and introduce a parameterized PINN framework that handles a broader range of plasma scenarios within a single network.
Abstract
A large number of magnetohydrodynamic (MHD) equilibrium calculations are often required for uncertainty quantification, optimization, and real-time diagnostic information, making MHD equilibrium codes vital to the field of plasma physics. In this paper, we explore a method for solving the Grad-Shafranov equation by using Physics-Informed Neural Networks (PINNs). For PINNs, we optimize neural networks by directly minimizing the residual of the PDE as a loss function. We show that PINNs can accurately and effectively solve the Grad-Shafranov equation with several different boundary conditions. We also explore the parameter space by varying the size of the model, the learning rate, and boundary conditions to map various trade-offs such as between reconstruction error and computational speed. Additionally, we introduce a parameterized PINN framework, expanding the input space to include variables such as pressure, aspect ratio, elongation, and triangularity in order to handle a broader range of plasma scenarios within a single network. Parametrized PINNs could be used in future work to solve inverse problems such as shape optimization.
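As a hedged sketch of the PINN approach (generic network size, a placeholder right-hand side rather than physical pressure and current profiles, and no boundary-condition terms), the Grad-Shafranov residual can be formed with automatic differentiation in PyTorch:

```python
import torch
import torch.nn as nn

# Small fully connected network psi(R, Z); width and depth are illustrative.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def gs_residual(coords, rhs_fn):
    """Residual of the Grad-Shafranov operator
    Delta* psi = psi_RR - (1/R) psi_R + psi_ZZ, matched to a given right-hand side.
    rhs_fn(R, Z, psi) stands in for -mu0 R^2 p'(psi) - F F'(psi) (problem-specific)."""
    coords = coords.clone().requires_grad_(True)
    psi = net(coords)
    grads = torch.autograd.grad(psi.sum(), coords, create_graph=True)[0]
    psi_R, psi_Z = grads[:, 0:1], grads[:, 1:2]
    psi_RR = torch.autograd.grad(psi_R.sum(), coords, create_graph=True)[0][:, 0:1]
    psi_ZZ = torch.autograd.grad(psi_Z.sum(), coords, create_graph=True)[0][:, 1:2]
    R = coords[:, 0:1]
    return psi_RR - psi_R / R + psi_ZZ - rhs_fn(R, coords[:, 1:2], psi)

# Toy right-hand side (placeholder, not a physical profile) and one optimization step.
rhs = lambda R, Z, psi: R ** 2
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
interior = torch.rand(512, 2) + torch.tensor([0.5, -0.5])   # sample (R, Z) away from R = 0
loss = gs_residual(interior, rhs).pow(2).mean()              # plus boundary terms in practice
loss.backward()
opt.step()
```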
Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning
- paper_url: http://arxiv.org/abs/2311.13490
- repo_url: None
- paper_authors: Bhavya Mehta, Kush Kothari, Reshmika Nambiar, Seema Shrawne
- for: Improving toxicity prediction of molecules for drug discovery and environmental risk assessment.
- methods: Applies Graph Isomorphic Networks, multi-headed attention, and free large-scale adversarial augmentation separately on graphs to precisely capture the structural data of molecules and their toxicological properties, and incorporates few-shot learning to improve generalization with limited annotated samples.
- results: On a diverse toxicology dataset the approach achieves a state-of-the-art AUC-ROC of 0.816, surpassing the baseline GCN model by 11.4%, highlighting the value of the proposed methodology and few-shot learning for toxic molecule classification.
Abstract
Traditional methods like Graph Convolutional Networks (GCNs) face challenges with limited data and class imbalance, leading to suboptimal performance in graph classification tasks during toxicity prediction of molecules as a whole. To address these issues, we harness the power of Graph Isomorphic Networks, Multi Headed Attention and Free Large-scale Adversarial Augmentation separately on Graphs for precisely capturing the structural data of molecules and their toxicological properties. Additionally, we incorporate Few-Shot Learning to improve the model's generalization with limited annotated samples. Extensive experiments on a diverse toxicology dataset demonstrate that our method achieves an impressive state-of-art AUC-ROC value of 0.816, surpassing the baseline GCN model by 11.4%. This highlights the significance of our proposed methodology and Few Shot Learning in advancing Toxic Molecular Classification, with the potential to enhance drug discovery and environmental risk assessment processes.
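A minimal Graph Isomorphism Network classifier in PyTorch Geometric is sketched below; it shows only the GIN component with placeholder feature sizes, omitting the attention, adversarial-augmentation, and few-shot pieces described in the abstract:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool
from torch_geometric.data import Data, Batch

class GINToxicity(nn.Module):
    def __init__(self, in_dim=9, hidden=64, n_classes=2):
        super().__init__()
        mlp1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        mlp2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.conv1, self.conv2 = GINConv(mlp1), GINConv(mlp2)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        return self.head(global_add_pool(h, batch))   # graph-level toxicity logits

# Toy 4-atom molecule graph with 9-dimensional atom features.
x = torch.randn(4, 9)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
batch = Batch.from_data_list([Data(x=x, edge_index=edge_index)])
logits = GINToxicity()(batch.x, batch.edge_index, batch.batch)
```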
Comparative Analysis of Linear Regression, Gaussian Elimination, and LU Decomposition for CT Real Estate Purchase Decisions
- paper_url: http://arxiv.org/abs/2311.13471
- repo_url: None
- paper_authors: Xilin Cheng
- for: Evaluates three computational algorithms applied to the decision-making process of real estate purchases.
- methods: Compares Linear Regression (from scikit-learn), Gaussian elimination with partial pivoting, and LU decomposition.
- results: Linear Regression and LU decomposition provide the most reliable predictions, while Gaussian elimination shows limitations in stability and performance.
Abstract
This paper presents a comprehensive evaluation of three distinct computational algorithms applied to the decision-making process of real estate purchases. Specifically, we analyze the efficacy of Linear Regression from Scikit-learn library, Gaussian Elimination with partial pivoting, and LU Decomposition in predicting the advisability of buying a house in the State of Connecticut based on a set of financial and market-related parameters. The algorithms' performances were compared using a dataset encompassing town-specific details, yearly data, interest rates, and median sale ratios. Our results demonstrate significant differences in predictive accuracy, with Linear Regression and LU Decomposition providing the most reliable recommendations and Gaussian Elimination showing limitations in stability and performance. The study's findings emphasize the importance of algorithm selection in predictive analytics and offer insights into the practical applications of computational methods in real estate investment strategies. By evaluating model efficacy through metrics such as R-squared scores and Mean Squared Error, we provide a nuanced understanding of each method's strengths and weaknesses, contributing valuable knowledge to the fields of real estate analysis and predictive modeling.
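All three methods in the comparison ultimately solve a least-squares problem; a compact illustration with SciPy and scikit-learn, using synthetic stand-ins for the Connecticut housing features:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # e.g. interest rate, median sale ratio, year, town index
y = X @ np.array([1.5, -2.0, 0.3, 0.7]) + rng.normal(scale=0.1, size=200)

# 1) Scikit-learn linear regression.
coef_sklearn = LinearRegression(fit_intercept=False).fit(X, y).coef_

# 2) LU decomposition applied to the normal equations X^T X b = X^T y.
A, b = X.T @ X, X.T @ y
coef_lu = lu_solve(lu_factor(A), b)

# 3) Gaussian elimination with partial pivoting solves the same system directly
#    (np.linalg.solve stands in for a hand-rolled elimination here).
coef_ge = np.linalg.solve(A, b)
print(np.allclose(coef_sklearn, coef_lu), np.allclose(coef_lu, coef_ge))
```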
Span-Based Optimal Sample Complexity for Average Reward MDPs
- paper_url: http://arxiv.org/abs/2311.13469
- repo_url: None
- paper_authors: Matthew Zurek, Yudong Chen
- for: Learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model.
- methods: Reduces the average-reward MDP to a discounted MDP.
- results: Establishes the sample complexity bound $\widetilde{O}\left(SA\frac{H}{\varepsilon^2}\right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. The result is minimax optimal (up to log factors) in all parameters $S$, $A$, $H$ and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters.
Abstract
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model. We establish the complexity bound $\widetilde{O}\left(SA\frac{H}{\varepsilon^2} \right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space. Our result is the first that is minimax optimal (up to log factors) in all parameters $S,A,H$ and $\varepsilon$, improving on existing work that either assumes uniformly bounded mixing times for all policies or has suboptimal dependence on the parameters. Our result is based on reducing the average-reward MDP to a discounted MDP. To establish the optimality of this reduction, we develop improved bounds for $\gamma$-discounted MDPs, showing that $\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2} \right)$ samples suffice to learn a $\varepsilon$-optimal policy in weakly communicating MDPs under the regime that $\gamma \geq 1 - \frac{1}{H}$, circumventing the well-known lower bound of $\widetilde{\Omega}\left(SA\frac{1}{(1-\gamma)^3\varepsilon^2} \right)$ for general $\gamma$-discounted MDPs. Our analysis develops upper bounds on certain instance-dependent variance parameters in terms of the span parameter. These bounds are tighter than those based on the mixing time or diameter of the MDP and may be of broader use.
Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure
- paper_url: http://arxiv.org/abs/2311.13466
- repo_url: https://github.com/dunni3/keypoint-diffusion
- paper_authors: Ian Dunn, David Ryan Koes
- for: Proposes a molecular structure generative model based on a Graph Neural Network (GNN), aimed at structure-based drug design and problems in structural biology.
- methods: The GNN learns latent representations of molecular structure and is trained together with a diffusion model; unlike coarse-grained protein representations, the learned representation preserves information about molecular interactions.
- results: Compared with an all-atom protein representation, the model achieves comparable performance in de novo ligand design while reducing inference time by a factor of three.
Abstract
Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.
Multi-Objective Bayesian Optimization with Active Preference Learning
- paper_url: http://arxiv.org/abs/2311.13460
- repo_url: None
- paper_authors: Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shinya Suzuki, Shion Takeno, Ichiro Takeuchi, Masayuki Karasuyama
- for: Multi-objective optimization problems where finding the whole Pareto front is prohibitively expensive, while in practice the decision maker (DM) typically needs only one specific solution among the Pareto-optimal set.
- methods: A Bayesian optimization (BO) approach for multi-objective problems with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated in an interactive manner from two types of supervision, pairwise preferences and improvement requests; an acquisition function incorporates uncertainty in both the objective functions and the DM preference, and an active learning strategy reduces the cost of interacting with the DM during preference estimation.
- results: Effectiveness is demonstrated empirically on benchmark function optimization and on hyper-parameter optimization problems for machine learning models.
Abstract
There are a lot of real-world black-box optimization problems that need to optimize multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires the prohibitive search cost, while in many practical scenarios, the decision maker (DM) only needs a specific solution among the set of the Pareto optimal solutions. We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in the MOO with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated by an interactive manner based on the two types of supervisions called the pairwise preference and improvement request. To explore the most preferred solution, we define an acquisition function in which the uncertainty both in the objective functions and the DM preference is incorporated. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of our proposed method through the benchmark function optimization and the hyper-parameter optimization problems for machine learning models.
The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs
- paper_url: http://arxiv.org/abs/2311.13459
- repo_url: None
- paper_authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
- for: Studies Tempered Exponential Measures (TEMs), a parametric generalization of the exponential family that maximizes the tempered entropy function among positive measures subject to a probability normalization of their power densities.
Abstract
Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities. Calculus on TEMs relies on a deformed algebra of arithmetic operators induced by the deformed logarithms used to define the tempered entropy. In this work, we introduce three different parameterizations of finite discrete TEMs via Legendre functions of the negative tempered entropy function. In particular, we establish an isometry between such parameterizations in terms of a generalization of the Hilbert log cross-ratio simplex distance to a tempered Hilbert co-simplex distance. Similar to the Hilbert geometry, the tempered Hilbert distance is characterized as a $t$-symmetrization of the oriented tempered Funk distance. We motivate our construction by introducing the notion of $t$-lengths of smooth curves in a tautological Finsler manifold. We then demonstrate the properties of our generalized structure in different settings and numerically examine the quality of its differentiable approximations for optimization in machine learning settings.摘要
温和指数测度 (Tempered Exponential Measures, TEMs) 是指数族分布的一种参数化推广,它在幂密度满足概率归一化约束的正测度中最大化温和熵函数。TEMs 上的微积分依赖于由定义温和熵的变形对数所诱导的变形算术运算。在这项工作中,我们通过负温和熵函数的 Legendre 函数,给出了有限离散 TEMs 的三种不同参数化。特别地,我们建立了这些参数化之间的等距关系,将 Hilbert log cross-ratio 单纯形距离推广为温和 Hilbert 余单纯形距离。与 Hilbert 几何类似,温和 Hilbert 距离可以刻画为有向温和 Funk 距离的 $t$-对称化。我们通过引入 tautological Finsler 流形中光滑曲线的 $t$-长度概念来阐述这一构造的动机。随后,我们在不同设置下展示了该推广结构的性质,并在机器学习优化场景中数值考察了其可微近似的质量。
Explaining high-dimensional text classifiers
- paper_url: http://arxiv.org/abs/2311.13454
- repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
- paper_authors: Odelia Melamed, Rich Caruana
- for: 这个论文旨在提供一种新的解释性方法,用于高维输入和神经网络分类器。
- methods: 该方法利用神经网络中 theoretically proven 的高维性质,以提高解释性。
- results: 在 IMDB 审查 dataset 上进行 sentiment 分析任务中,以及 PowerShell 脚本 dataset 上的 malware 检测任务中,该方法都有出色的表现。
Abstract
Explainability has become a valuable tool in the last few years, helping humans better understand AI-guided decisions. However, the classic explainability tools are sometimes quite limited when considering high-dimensional inputs and neural network classifiers. We present a new explainability method using theoretically proven high-dimensional properties in neural network classifiers. We present two usages of it: 1) On the classical sentiment analysis task for the IMDB reviews dataset, and 2) our Malware-Detection task for our PowerShell scripts dataset.摘要
过去几年,可解释性已成为一种有价值的工具,帮助人类更好地理解由 AI 引导的决策。然而,在面对高维输入和神经网络分类器时,经典的可解释工具往往相当有限。我们提出了一种新的可解释方法,利用神经网络分类器中已被理论证明的高维性质,并展示了它的两个应用:1)IMDB 评论数据集上的经典情感分析任务;2)我们的 PowerShell 脚本数据集上的恶意软件检测任务。
Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates
- paper_url: http://arxiv.org/abs/2311.13447
- repo_url: None
- paper_authors: Michael Menart, Enayat Ullah, Raman Arora, Raef Bassily, Cristóbal Guzmán
- for: The paper is written for studying private empirical risk minimization (ERM) problems under the constraint of $\rho$ zero-concentrated differential privacy (zCDP).
- methods: The paper proposes new algorithms based on variance reduced gradient descent and proximal point method for solving private ERM problems, which achieve near-optimal convergence rates.
- results: The paper shows that the proposed algorithms can achieve convergence rates of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ and $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$ for private ERM problems, which are near-optimal and improve upon existing results.
Abstract
We study private empirical risk minimization (ERM) problem for losses satisfying the $(\gamma,\kappa)$-Kurdyka-{\L}ojasiewicz (KL) condition. The Polyak-{\L}ojasiewicz (PL) condition is a special case of this condition when $\kappa=2$. Specifically, we study this problem under the constraint of $\rho$ zero-concentrated differential privacy (zCDP). When $\kappa\in[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance reduced gradient descent that achieves the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $\kappa \geq 2$ and the loss is instead Lipschitz and weakly convex, we show it is possible to achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2\kappa}{4-\kappa}}\big)$ adaptively, which is nearly optimal when $\kappa = 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives with rate as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.摘要
我们研究私有经验风险最小化 (ERM) 问题,其损失函数满足 $(\gamma,\kappa)$-Kurdyka-{\L}ojasiewicz (KL) 条件;当 $\kappa=2$ 时,该条件退化为 Polyak-{\L}ojasiewicz (PL) 条件。具体地,我们在 $\rho$ zero-concentrated differential privacy (zCDP) 约束下研究这一问题。当 $\kappa \in [1,2]$ 且损失函数在足够大的区域内 Lipschitz 且光滑时,我们提出了一种基于方差缩减梯度下降的新算法,其超额经验风险达到 $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$,其中 $n$ 为数据集大小,$d$ 为维度;我们进一步证明该速率近似最优。当 $\kappa \geq 2$ 且损失函数 Lipschitz 且弱凸时,我们证明可以通过近端点法的私有实现达到 $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ 的速率。当 KL 参数未知时,我们给出带噪梯度下降算法的一种新的修改与分析,证明其可以自适应地达到 $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2\kappa}{4-\kappa}}\big)$ 的速率,该速率在 $\kappa=2$ 时近似最优。此外,我们还证明在不假设 KL 条件的情况下,只要算法运行期间梯度保持足够大,同样的梯度下降算法可以快速收敛到驻点:对于 Lipschitz、光滑(可能非凸)的目标函数,其逼近驻点的速率可快至 $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$,且不劣于 $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$;后者与不依赖方差缩减的方法的已知最优速率相匹配。
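As a rough illustration of the noisy gradient descent primitive discussed above (not the authors' algorithm or its adaptive analysis), the sketch below runs gradient descent with per-sample gradient clipping and Gaussian noise whose scale is set from a zCDP-style budget split across iterations; the clipping bound, step size and noise calibration are illustrative assumptions.

```python
import numpy as np

def noisy_gradient_descent(grad_fn, w0, n, d, rho, clip=1.0, lr=0.1, steps=200, seed=0):
    """Toy DP-style noisy GD: grad_fn(w) returns the average clipped per-sample gradient.

    Gaussian noise with std ~ clip * sqrt(steps / (2 * rho)) / n is one common way to
    split a rho-zCDP budget across `steps` iterations (illustrative, not a proof).
    """
    rng = np.random.default_rng(seed)
    sigma = clip * np.sqrt(steps / (2.0 * rho)) / n
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        g = grad_fn(w)                       # assumed already averaged & clipped to norm <= clip
        w = w - lr * (g + rng.normal(0.0, sigma, size=d))
    return w

# Usage on a least-squares toy problem (X, y play the role of the private data).
rng = np.random.default_rng(1)
n, d = 500, 10
X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def avg_clipped_grad(w, clip=1.0):
    per_sample = (X @ w - y)[:, None] * X                     # per-sample gradients
    norms = np.maximum(np.linalg.norm(per_sample, axis=1), 1e-12)
    per_sample *= np.minimum(1.0, clip / norms)[:, None]      # clip each to norm <= clip
    return per_sample.mean(axis=0)

w_hat = noisy_gradient_descent(avg_clipped_grad, np.zeros(d), n, d, rho=1.0)
```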
Transfer Attacks and Defenses for Large Language Models on Coding Tasks
- paper_url: http://arxiv.org/abs/2311.13445
- repo_url: None
- paper_authors: Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu
- for: This paper aims to investigate the effect of adversarial perturbations on coding tasks with large language models (LLMs).
- methods: The paper uses white-box attacks to generate adversarial examples for smaller code models, and then studies the transferability of these examples to LLMs. The paper also proposes prompt-based defenses to improve the resilience of LLMs against adversarial perturbations.
- results: The paper shows that adversarial examples obtained with a smaller code model are transferable to LLMs, weakening their performance. The proposed defenses show promise in improving the model’s resilience, paving the way to more robust defensive solutions for LLMs in code-related applications.
Abstract
Modern large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities for coding tasks including writing and reasoning about code. They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities. However, these previous code models were shown vulnerable to adversarial examples, i.e. small syntactic perturbations that do not change the program's semantics, such as the inclusion of "dead code" through false conditions or the addition of inconsequential print statements, designed to "fool" the models. LLMs can also be vulnerable to the same adversarial perturbations but a detailed study on this concern has been lacking so far. In this paper we aim to investigate the effect of adversarial perturbations on coding tasks with LLMs. In particular, we study the transferability of adversarial examples, generated through white-box attacks on smaller code models, to LLMs. Furthermore, to make the LLMs more robust against such adversaries without incurring the cost of retraining, we propose prompt-based defenses that involve modifying the prompt to include additional information such as examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations. Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance. The proposed defenses show promise in improving the model's resilience, paving the way to more robust defensive solutions for LLMs in code-related applications.摘要
现代大语言模型 (LLM),如 ChatGPT,已经在编程任务中展现出了出色的能力,包括编写代码和对代码进行推理。它们超越了此前的代码神经网络模型,如 code2seq 或 seq2seq,后者在代码摘要和代码漏洞识别等任务上已取得有竞争力的结果。然而,这些早期的代码模型被证明容易受到对抗样本的影响,即不改变程序语义的微小语法扰动,例如通过恒假条件引入"死代码"或添加无关紧要的打印语句,以"欺骗"模型。LLM 也可能受到同样的对抗扰动影响,但迄今为止仍缺乏对这一问题的系统研究。在本文中,我们研究对抗扰动对 LLM 编程任务的影响。特别地,我们研究通过对较小代码模型进行白盒攻击所生成的对抗样本向 LLM 的可迁移性。此外,为了在不付出重新训练代价的前提下提高 LLM 对此类攻击的鲁棒性,我们提出了基于提示的防御方法,即在提示中加入额外信息,例如被对抗扰动过的代码示例以及用于撤销对抗扰动的明确指令。实验表明,利用较小代码模型获得的对抗样本确实可以迁移,从而削弱 LLM 的性能;所提出的防御方法在提升模型鲁棒性方面显示出前景,为代码相关应用中更为稳健的 LLM 防御方案铺平了道路。
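A minimal sketch of the prompt-based defense idea: wrap the (possibly adversarially perturbed) code in a prompt that shows an example of a perturbation and explicitly instructs the model to ignore dead code. The wording and example are hypothetical, not the authors' exact prompts, and `query_llm` is a placeholder for whatever LLM client is used.

```python
# Hypothetical prompt-wrapping defense; the template text is illustrative.
DEFENSE_TEMPLATE = """You are a code analysis assistant.
Adversarially perturbed code may contain dead code (branches under false conditions)
or irrelevant print statements. Mentally remove such no-op statements before answering.

Example of a perturbation to ignore:
    if False:
        print("unreachable")

Task: {task}

Code:
{code}
"""

def defended_prompt(task: str, code: str) -> str:
    """Build a defensive prompt around the user's task and code snippet."""
    return DEFENSE_TEMPLATE.format(task=task, code=code)

# Usage (query_llm is a placeholder, not a real API):
# answer = query_llm(defended_prompt("Summarize what this function does.", snippet))
```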
Recurrent neural networks and transfer learning for elasto-plasticity in woven composites
- paper_url: http://arxiv.org/abs/2311.13434
- repo_url: None
- paper_authors: Ehsan Ghane, Martin Fagerström, Mohsen Mirkhalaf
- for: 用于模拟织物复合材料的 meso 级 simulation 的代理模型。
- methods: 使用 Recurrent Neural Network (RNN) 模型,利用转移学习来解决初始化挑战和稀缺数据问题,生成了一个广泛的数据集表示塑性行为。
- results: 以随机游走作为源任务进行训练后,实现了对循环加载条件(目标任务)下应力的准确预测,并通过引入子尺度特性增强了 RNN 的通用性。
Abstract
As a surrogate for computationally intensive meso-scale simulation of woven composites, this article presents Recurrent Neural Network (RNN) models. Leveraging the power of transfer learning, the initialization challenges and sparse data issues inherent in cyclic shear strain loads are addressed in the RNN models. A mean-field model generates a comprehensive data set representing elasto-plastic behavior. In simulations, arbitrary six-dimensional strain histories are used to predict stresses under random walking as the source task and cyclic loading conditions as the target task. Incorporating sub-scale properties enhances RNN versatility. In order to achieve accurate predictions, the model uses a grid search method to tune network architecture and hyper-parameter configurations. The results of this study demonstrate that transfer learning can be used to effectively adapt the RNN to varying strain conditions, which establishes its potential as a useful tool for modeling path-dependent responses in woven composites.摘要
本文提出循环神经网络(RNN)模型,作为织物复合材料计算密集的细观尺度模拟的代理模型。借助迁移学习,本文解决了循环剪切应变加载中固有的初始化难题和数据稀缺问题。一个平均场模型生成了表示弹塑性行为的全面数据集。在模拟中,使用任意的六维应变历史,以随机游走作为源任务、循环加载条件作为目标任务来预测应力。引入子尺度特性增强了RNN的通用性。为了实现准确预测,模型使用网格搜索方法来调整网络架构和超参数配置。研究结果表明,迁移学习可以有效地使RNN适应不同的应变条件,这确立了其作为模拟织物复合材料路径依赖响应的有用工具的潜力。
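A minimal PyTorch sketch of the surrogate-plus-transfer-learning idea: a GRU maps 6-D strain-increment histories to 6-D stresses, is pre-trained on random-walk paths (source task), and is then fine-tuned on cyclic-loading paths (target task) with the recurrent layers frozen. The network size, freezing choice and data loaders are illustrative assumptions, not the paper's tuned configuration.

```python
import torch
import torch.nn as nn

class StressRNN(nn.Module):
    """GRU surrogate: sequence of 6-D strain increments -> sequence of 6-D stresses."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=6, hidden_size=hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)

    def forward(self, strain):          # strain: (batch, time, 6)
        h, _ = self.rnn(strain)
        return self.head(h)             # stress: (batch, time, 6)

def train(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for strain, stress in loader:
            opt.zero_grad()
            loss = loss_fn(model(strain), stress)
            loss.backward()
            opt.step()

# Source task: random-walk strain histories (pre-training).
model = StressRNN()
# train(model, random_walk_loader)               # hypothetical DataLoader of (strain, stress)

# Target task: cyclic loading; transfer by freezing the recurrent layers and
# fine-tuning only the output head on the sparse cyclic data.
for p in model.rnn.parameters():
    p.requires_grad = False
# train(model, cyclic_loader, epochs=5, lr=1e-4) # hypothetical DataLoader
```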
Extracting individual variable information for their decoupling, direct mutual information and multi-feature Granger causality
- paper_url: http://arxiv.org/abs/2311.13431
- repo_url: None
- paper_authors: Jarek Duda
- for: 这篇论文旨在提出一种方法,用于分离多变量之间的复杂依赖关系,以提高变量之间的独立性。
- methods: 该方法使用 conditional probability distributions 的减少方法,将多变量转化为独立的变量,并且可以用于评估直接相关性和 causality 方向。
- results: 该方法可以帮助分离多变量之间的复杂依赖关系,提高变量之间的独立性,并且可以用于评估直接相关性和 causality 方向。
Abstract
Working with multiple variables they usually contain difficult to control complex dependencies. This article proposes extraction of their individual information, e.g. $\overline{X|Y}$ as random variable containing information from $X$, but with removed information about $Y$, by using $(x,y) \leftrightarrow (\bar{x}=\textrm{CDF}_{X|Y=y}(x),y)$ reversible normalization. One application can be decoupling of individual information of variables: reversibly transform $(X_1,\ldots,X_n)\leftrightarrow(\tilde{X}_1,\ldots \tilde{X}_n)$ together containing the same information, but being independent: $\forall_{i\neq j} \tilde{X}_i\perp \tilde{X}_j, \tilde{X}_i\perp X_j$. It requires detailed models of complex conditional probability distributions - it is generally a difficult task, but here can be done through multiple dependency reducing iterations, using imperfect methods (here HCR: Hierarchical Correlation Reconstruction). It could be also used for direct mutual information - evaluating direct information transfer: without use of intermediate variables. For causality direction there is discussed multi-feature Granger causality, e.g. to trace various types of individual information transfers between such decoupled variables, including propagation time (delay).摘要
在处理多个变量时,它们通常包含难以控制的复杂依赖关系。本文提议提取各变量的个体信息,例如将 $\overline{X|Y}$ 作为一个随机变量,它保留来自 $X$ 的信息但去除了关于 $Y$ 的信息,这通过可逆归一化 $(x,y) \leftrightarrow (\bar{x}=\textrm{CDF}_{X|Y=y}(x),y)$ 来实现。其一个应用是对各变量的个体信息进行解耦:可逆地变换 $(X_1,\ldots,X_n)\leftrightarrow(\tilde{X}_1,\ldots,\tilde{X}_n)$,使二者包含相同的信息但相互独立:$\forall_{i\neq j}\ \tilde{X}_i\perp \tilde{X}_j,\ \tilde{X}_i\perp X_j$。这需要对复杂条件概率分布进行细致建模,这通常是一项困难的任务,但在此可以通过多次降低依赖的迭代、并借助并不完美的方法(这里为 HCR:Hierarchical Correlation Reconstruction)来完成。该方法还可用于直接互信息的评估,即在不借助中间变量的情况下衡量直接的信息传递。对于因果方向,文中讨论了多特征 Granger 因果,例如用于追踪此类解耦变量之间各种类型的个体信息传递,包括传播时间(延迟)。
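A minimal sketch of the reversible normalization $(x,y) \leftrightarrow (\bar{x}=\mathrm{CDF}_{X|Y=y}(x), y)$: here the conditional distribution is modelled with a simple Gaussian linear regression standing in for the paper's HCR-based conditional models, so the conditional model itself is an assumption.

```python
import numpy as np
from scipy.stats import norm

def fit_gaussian_conditional(x, y):
    """Fit a simple model X | Y=y ~ N(a*y + b, s^2) by least squares.

    This stands in for the richer conditional models (e.g. HCR) used in the paper.
    """
    a, b = np.polyfit(y, x, 1)
    resid = x - (a * y + b)
    return a, b, resid.std()

def decouple(x, y, params):
    """Forward map: x_bar = CDF_{X|Y=y}(x); approximately uniform and independent of Y."""
    a, b, s = params
    return norm.cdf(x, loc=a * y + b, scale=s)

def recouple(x_bar, y, params):
    """Inverse map recovers x from (x_bar, y): the normalization is reversible."""
    a, b, s = params
    return norm.ppf(x_bar, loc=a * y + b, scale=s)

# Usage on synthetic dependent data.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
x = 2.0 * y + rng.normal(scale=0.5, size=10_000)   # X depends on Y
params = fit_gaussian_conditional(x, y)
x_bar = decouple(x, y, params)
print(np.corrcoef(x_bar, y)[0, 1])                 # close to 0: information about Y removed
print(np.allclose(recouple(x_bar, y, params), x))  # True: transformation is reversible
```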
Bayesian inference of a new Mallows model for characterising symptom sequences applied in primary progressive aphasia
- paper_url: http://arxiv.org/abs/2311.13411
- repo_url: None
- paper_authors: Beatrice Taylor, Cameron Shand, Chris J. D. Hardy, Neil Oxtoby
- for: 这个研究用于Characterising symptom sequences using Bayesian inference, 以便更好地理解疾病经验和确保医疗公平。
- methods: 这项研究对Mallows模型进行了扩展,以处理部分排序和右删失数据,并采用自定义的MCMC方法进行拟合。
- results: 研究发现,这种方法能够揭示平均排序并估计排序方差,具有潜在的临床应用价值;不过,研究也受到模型可扩展性和小数据集规模的限制。
Abstract
Machine learning models offer the potential to understand diverse datasets in a data-driven way, powering insights into individual disease experiences and ensuring equitable healthcare. In this study, we explore Bayesian inference for characterising symptom sequences, and the associated modelling challenges. We adapted the Mallows model to account for partial rankings and right-censored data, employing custom MCMC fitting. Our evaluation, encompassing synthetic data and a primary progressive aphasia dataset, highlights the model's efficacy in revealing mean orderings and estimating ranking variance. This holds the potential to enhance clinical comprehension of symptom occurrence. However, our work encounters limitations concerning model scalability and small dataset sizes.摘要
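A minimal sketch of Bayesian inference for a Mallows model: complete rankings, Kendall-tau distance, a fixed central ordering, and a random-walk Metropolis sampler over the dispersion parameter. The paper's handling of partial rankings, right-censoring and its custom MCMC are not reproduced; the prior and toy data below are illustrative.

```python
import numpy as np
from itertools import combinations

def kendall_tau(r1, r2):
    """Number of discordant item pairs between two rankings (lists of item indices)."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum((pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0 for a, b in combinations(r1, 2))

def log_psi(alpha, n):
    """Log normalizing constant of the Mallows model under the Kendall-tau distance."""
    return sum(np.log(np.sum(np.exp(-alpha * np.arange(j)))) for j in range(1, n + 1))

def log_posterior(alpha, rankings, rho, prior_rate=1.0):
    if alpha <= 0:
        return -np.inf
    n, m = len(rho), len(rankings)
    loglik = -alpha * sum(kendall_tau(r, rho) for r in rankings) - m * log_psi(alpha, n)
    return loglik - prior_rate * alpha            # Exp(prior_rate) prior on alpha

def mh_sample(rankings, rho, iters=2000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    alpha, samples = 1.0, []
    lp = log_posterior(alpha, rankings, rho)
    for _ in range(iters):
        prop = alpha + step * rng.normal()
        lp_prop = log_posterior(prop, rankings, rho)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            alpha, lp = prop, lp_prop
        samples.append(alpha)
    return np.array(samples)

# Usage: toy symptom-order data as complete rankings of 4 symptoms around a central ordering rho.
rho = [0, 1, 2, 3]
data = [[0, 1, 2, 3], [0, 2, 1, 3], [1, 0, 2, 3], [0, 1, 3, 2]]
print(mh_sample(data, rho)[-500:].mean())          # posterior mean of the dispersion alpha
```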
An Empirical Study of Uncertainty Estimation Techniques for Detecting Drift in Data Streams
- paper_url: http://arxiv.org/abs/2311.13374
- repo_url: None
- paper_authors: Anton Winter, Nicolas Jourdan, Tristan Wirth, Volker Knauthe, Arjan Kuijper
- for: 提高机器学习模型的可靠性在安全关键领域,如自动驾驶和医疗诊断。
- methods: 使用uncertainty值作为代替错误率的检测方法,以减少在部署后获取真实标签的需求。
- results: 通过在七个实际数据集上将五种不确定性估计方法与ADWIN检测器结合进行比较,发现SWAG方法的校准性更好,但不确定性估计方法的选择对漂移检测的整体准确率影响不大,即使最基础的方法也能达到有竞争力的性能。这些发现为基于不确定性的漂移检测在实际安全关键应用中的可行性提供了有价值的参考。
Abstract
In safety-critical domains such as autonomous driving and medical diagnosis, the reliability of machine learning models is crucial. One significant challenge to reliability is concept drift, which can cause model deterioration over time. Traditionally, drift detectors rely on true labels, which are often scarce and costly. This study conducts a comprehensive empirical evaluation of using uncertainty values as substitutes for error rates in detecting drifts, aiming to alleviate the reliance on labeled post-deployment data. We examine five uncertainty estimation methods in conjunction with the ADWIN detector across seven real-world datasets. Our results reveal that while the SWAG method exhibits superior calibration, the overall accuracy in detecting drifts is not notably impacted by the choice of uncertainty estimation method, with even the most basic method demonstrating competitive performance. These findings offer valuable insights into the practical applicability of uncertainty-based drift detection in real-world, safety-critical applications.摘要
在安全关键领域如自动驾驶和医疗诊断中,机器学习模型的可靠性是关键。一个重要挑战是概念漂移,可以导致模型逐渐异常。传统的漂移探测器通常依赖真实标签,然而这些标签经常匮乏和昂贵。本研究对使用不确定性值作为错误率的替代品进行了全面的实验评估,以降低在部署后获取标签数据的依赖。我们在七个实际数据集上对五种不确定性估计方法与ADWIN探测器进行了比较,结果表明,虽然SWAG方法具有更好的准确性,但选择不确定性估计方法对总体漂移探测的准确率没有显著影响,即使使用最简单的方法也能够实现竞争力强的表现。这些发现对实际应用中的实用性提供了有价值的信息。
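A minimal sketch of the idea of monitoring predictive uncertainty instead of the error rate: the predictive entropy of each prediction is fed to a drift detector that needs no true labels. A simple two-window mean comparison stands in for ADWIN here, and the model, threshold and window size are illustrative assumptions.

```python
import numpy as np

def predictive_entropy(probs):
    """Uncertainty of a single softmax prediction; requires no true label."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

class TwoWindowDriftDetector:
    """Simplified stand-in for ADWIN: flag drift when the mean uncertainty of the
    recent half of a sliding window departs from the older half by `threshold`."""
    def __init__(self, window=200, threshold=0.15):
        self.window, self.threshold, self.buf = window, threshold, []

    def update(self, value):
        self.buf.append(value)
        if len(self.buf) > self.window:
            self.buf.pop(0)
        if len(self.buf) < self.window:
            return False
        half = self.window // 2
        old, new = np.mean(self.buf[:half]), np.mean(self.buf[half:])
        return abs(new - old) > self.threshold

# Usage: monitor a deployed classifier's uncertainty instead of its (unavailable) error rate.
# detector = TwoWindowDriftDetector()
# for x in stream:                                  # hypothetical data stream and model
#     probs = model.predict_proba([x])[0]
#     if detector.update(predictive_entropy(probs)):
#         print("possible concept drift detected")
```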
REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
- paper_url: http://arxiv.org/abs/2311.13349
- repo_url: https://github.com/fracorti/reds
- paper_authors: Francesco Corti, Balz Maag, Joachim Schauer, Ulrich Pferschy, Olga Saukh
- for: 这篇论文旨在解决Edge设备上深度模型的资源不稳定性问题,以提高模型在不同硬件平台上的可扩展性和效率。
- methods: 这篇论文提出了一种名为Resource-Efficient Deep Subnetworks(REDS)的新方法,它建设性地利用结构化稀疏来实现模型对可变资源的适配,并通过一个新的迭代背包(knapsack)优化器来跳过可省略的计算块。
- results: 在六个 benchmark 架构上进行了训练,并在四个商业化的手持式和嵌入式硬件平台上进行了评估,结果显示了REDS 可以实现高度的可扩展性和效率,并且在 Arduino Nano 33 BLE Sense 上显示了一个具有2层全连接层的简单深度模型在40微秒内适应资源范围内的适应时间。
Abstract
Deep models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models, not capable to adapt at runtime. In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS use structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on six benchmark architectures trained on the Google Speech Commands, FMNIST and CIFAR10 datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS outstanding performance in terms of submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40$\mu$s, utilizing a 2-layer fully-connected network on Arduino Nano 33 BLE Sense.摘要
部署在边缘设备上的深度模型经常遇到资源变化,这些变化可能来自能量水平的波动、时间约束或系统内其他关键任务的优先级。当前最先进的机器学习流水线生成的是与资源无关的模型,无法在运行时自适应。在这项工作中,我们提出资源高效深度子网络(REDS),以解决模型对可变资源的适配问题。与现有技术不同,REDS建设性地利用结构化稀疏,并利用神经元的置换不变性来实现面向特定硬件的优化。具体来说,REDS通过(1)跳过由新的迭代背包优化器识别出的顺序计算块,以及(2)利用简单的数学重新排列REDS计算图中的操作顺序以利用数据缓存,来提高计算效率。REDS支持边缘上常见部署的传统深度网络,即使对小型、简单的网络也能带来计算上的收益。我们在Google Speech Commands、FMNIST和CIFAR10数据集上训练了六个基准架构,并在四个现成的移动和嵌入式硬件平台上进行了测试。我们给出了理论结果和实验证据,表明REDS的子模型在测试集准确率方面表现出色,并在Arduino Nano 33 BLE Sense上使用两层全连接网络验证了其对动态资源约束的适应时间小于40微秒。
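A minimal sketch of the block-selection idea: given per-block costs and importance scores, a 0/1 knapsack chooses which sequential blocks to keep under a resource budget and which to skip. This is a simplified stand-in for REDS' iterative knapsack optimizer, and the costs and importances below are illustrative.

```python
def knapsack_select(costs, importances, budget):
    """0/1 knapsack: choose blocks maximizing total importance with total cost <= budget.

    costs: integer per-block costs (e.g. MACs in some unit); importances: per-block scores.
    Returns the block indices to keep; the remaining blocks are skipped at inference time.
    """
    n = len(costs)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        c, v = costs[i - 1], importances[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]
            if c <= b:
                best[i][b] = max(best[i][b], best[i - 1][b - c] + v)
    # Backtrack to recover the chosen blocks.
    keep, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            keep.append(i - 1)
            b -= costs[i - 1]
    return sorted(keep)

# Usage: 6 blocks, a tight budget triggered by a drop in available resources.
print(knapsack_select(costs=[4, 3, 3, 2, 2, 1],
                      importances=[9.0, 7.5, 6.0, 4.0, 3.5, 1.0],
                      budget=8))
```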
MergeSFL: Split Federated Learning with Feature Merging and Batch Size Regulation
- paper_url: http://arxiv.org/abs/2311.13348
- repo_url: None
- paper_authors: Yunming Liao, Yang Xu, Hongli Xu, Lun Wang, Zhiwei Yao, Chunming Qiao
- for: 这篇论文旨在解决edge AI系统中的 federated learning(FL)中的计算和通信负担过重和模型隐私问题,通过结合数据和模型并行的 split federated learning(SFL)技术。
- methods: 该论文提出了一种新的SFL框架,称为MergeSFL,它通过结合特征合并和批处理大小调节来解决EC系统中的统计差异和系统差异问题,并且同时优化这两种策略的关系以提高SFL的性能。
- results: 实验结果表明,MergeSFL可以提高最终模型准确率由5.82%到26.22%,并且可以提高训练效率,比基eline上的性能提高1.74倍到4.14倍。
Abstract
Recently, federated learning (FL) has emerged as a popular technique for edge AI to mine valuable knowledge in edge computing (EC) systems. To mitigate the computing/communication burden on resource-constrained workers and protect model privacy, split federated learning (SFL) has been released by integrating both data and model parallelism. Despite resource limitations, SFL still faces two other critical challenges in EC, i.e., statistical heterogeneity and system heterogeneity. To address these challenges, we propose a novel SFL framework, termed MergeSFL, by incorporating feature merging and batch size regulation in SFL. Concretely, feature merging aims to merge the features from workers into a mixed feature sequence, which is approximately equivalent to the features derived from IID data and is employed to promote model accuracy. While batch size regulation aims to assign diverse and suitable batch sizes for heterogeneous workers to improve training efficiency. Moreover, MergeSFL explores to jointly optimize these two strategies upon their coupled relationship to better enhance the performance of SFL. Extensive experiments are conducted on a physical platform with 80 NVIDIA Jetson edge devices, and the experimental results show that MergeSFL can improve the final model accuracy by 5.82% to 26.22%, with a speedup by about 1.74x to 4.14x, compared to the baselines.摘要
最近,联邦学习(FL)已成为边缘人工智能在边缘计算(EC)系统中挖掘有价值知识的流行技术。为了减轻资源受限工作节点的计算/通信负担并保护模型隐私,分割联邦学习(SFL)被提出,它结合了数据并行与模型并行。除资源限制外,SFL 在 EC 中仍面临另外两个关键挑战,即统计异构性和系统异构性。为了解决这些挑战,我们提出了一个新的 SFL 框架,称为 MergeSFL,在 SFL 中引入特征合并与批大小调节。具体来说,特征合并旨在将各工作节点的特征合并为一个混合特征序列,其近似等价于从 IID 数据中得到的特征,用于提升模型精度;而批大小调节则为异构的工作节点分配多样且合适的批大小,以提高训练效率。此外,MergeSFL 还基于这两种策略之间的耦合关系对其进行联合优化,以进一步提升 SFL 的性能。我们在由 80 台 NVIDIA Jetson 边缘设备组成的物理平台上进行了大量实验,结果显示,与基线相比,MergeSFL 可以将最终模型准确率提高 5.82% 到 26.22%,并带来约 1.74 倍到 4.14 倍的加速。
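A minimal PyTorch sketch of the feature-merging idea: smashed features uploaded by non-IID workers are concatenated and reshuffled into mixed batches whose label distribution is closer to IID before the server-side sub-model trains on them. The model split, batch-size regulation and their joint optimization are simplified away, and the shapes below are illustrative.

```python
import torch

def merge_features(worker_features, worker_labels, batch_size=64, seed=0):
    """Concatenate per-worker smashed features, shuffle them jointly, and yield
    mixed batches whose label mix is closer to IID than any single worker's."""
    feats = torch.cat(worker_features, dim=0)       # (sum_i n_i, feature_dim)
    labels = torch.cat(worker_labels, dim=0)
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(feats.size(0), generator=g)
    feats, labels = feats[perm], labels[perm]
    for start in range(0, feats.size(0), batch_size):
        yield feats[start:start + batch_size], labels[start:start + batch_size]

# Usage: three workers with highly non-IID label distributions upload 128-d features.
worker_features = [torch.randn(100, 128), torch.randn(40, 128), torch.randn(200, 128)]
worker_labels = [torch.zeros(100, dtype=torch.long),        # worker 0 only has class 0
                 torch.ones(40, dtype=torch.long),          # worker 1 only has class 1
                 torch.full((200,), 2, dtype=torch.long)]   # worker 2 only has class 2
for x, y in merge_features(worker_features, worker_labels):
    pass  # the server-side sub-model would run forward/backward on each mixed batch here
```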
The Influence of Neural Networks on Hydropower Plant Management in Agriculture: Addressing Challenges and Exploring Untapped Opportunities
- paper_url: http://arxiv.org/abs/2311.13293
- repo_url: None
- paper_authors: C. Coelho, M. Fernanda P. Costa, L. L. Ferrás
for:这项研究的目的是提高水力发电厂的管理方法,以确保稳定的可再生能源和可持续的农业发展。methods:该研究使用人工神经网络技术,以便评估当前水力发电厂管理软件的现状,并提出了一种农业conscious的发电厂管理框架,以最大化电力生产的同时,保证农业稳定的水源供应。results:研究发现,当前的水力发电厂管理软件可能会导致农业水需求不满足,特别是在旱期时。提出了一种农业conscious的发电厂管理框架,可以最大化电力生产的同时,保证农业稳定的水源供应。此外,还提出了一些政策措施,以便提高模型的透明度和可靠性,以保障农业从不当的压力中免受损害。Abstract
Hydropower plants are crucial for stable renewable energy and serve as vital water sources for sustainable agriculture. However, it is essential to assess the current water management practices associated with hydropower plant management software. A key concern is the potential conflict between electricity generation and agricultural water needs. Prioritising water for electricity generation can reduce irrigation availability in agriculture during crucial periods like droughts, impacting crop yields and regional food security. Coordination between electricity and agricultural water allocation is necessary to ensure optimal and environmentally sound practices. Neural networks have become valuable tools for hydropower plant management, but their black-box nature raises concerns about transparency in decision making. Additionally, current approaches often do not take advantage of their potential to create a system that effectively balances water allocation. This work is a call for attention and highlights the potential risks of deploying neural network-based hydropower plant management software without proper scrutiny and control. To address these concerns, we propose the adoption of the Agriculture Conscious Hydropower Plant Management framework, aiming to maximise electricity production while prioritising stable irrigation for agriculture. We also advocate reevaluating government-imposed minimum water guidelines for irrigation to ensure flexibility and effective water allocation. Additionally, we suggest a set of regulatory measures to promote model transparency and robustness, certifying software that makes conscious and intelligent water allocation decisions, ultimately safeguarding agriculture from undue strain during droughts.摘要
水力发电厂对稳定的可再生能源和可持续农业都至关重要,但有必要评估与水电厂管理软件相关的现有水资源管理做法。一个关键问题是发电与农业用水需求之间可能存在冲突:优先将水用于发电,可能导致干旱等关键时期农业灌溉水量不足,影响农作物产量和地区粮食安全。因此需要在发电与农业用水分配之间进行协调,以确保做法既优化又符合环境要求。神经网络已成为水电厂管理的有价值工具,但其黑箱特性引发了决策透明度方面的担忧;此外,现有方法往往没有利用其潜力去构建一个能够有效平衡水资源分配的系统。本文旨在提醒注意,并强调在缺乏充分审查与管控的情况下部署基于神经网络的水电厂管理软件所带来的潜在风险。为应对这些问题,我们提议采用"农业意识水电厂管理"框架,在优先保障农业灌溉稳定的同时最大化发电量。我们还主张重新评估政府规定的最低灌溉用水标准,以确保水资源分配的灵活性和有效性。此外,我们建议一系列监管措施,以促进模型的透明度和稳健性,对能够做出有意识、智能的水资源分配决策的软件进行认证,最终保护农业在干旱期间免受不当压力。
Improving performance of heart rate time series classification by grouping subjects
- paper_url: http://arxiv.org/abs/2311.13285
- repo_url: None
- paper_authors: Michael Beekhuizen, Arman Naseri, David Tax, Ivo van der Bilt, Marcel Reinders
- for: 用于预测活动的 Classification 任务
- methods: 使用了normalization、 groupingSubjects和手动编写的特征,并将其作为深度学习(DL)网络的输入
- results: 发现 Heart rate time series 可以用于 Classification 任务,但需要谨慎选择 normalization 或 grouping techniques 以降低subject variability 的问题
Abstract
Unlike the more commonly analyzed ECG or PPG data for activity classification, heart rate time series data is less detailed, often noisier and can contain missing data points. Using the BigIdeasLab_STEP dataset, which includes heart rate time series annotated with specific tasks performed by individuals, we sought to determine if general classification was achievable. Our analyses showed that the accuracy is sensitive to the choice of window/stride size. Moreover, we found variable classification performances between subjects due to differences in the physical structure of their hearts. Various techniques were used to minimize this variability. First of all, normalization proved to be a crucial step and significantly improved the performance. Secondly, grouping subjects and performing classification inside a group helped to improve performance and decrease inter-subject variability. Finally, we show that including handcrafted features as input to a deep learning (DL) network improves the classification performance further. Together, these findings indicate that heart rate time series can be utilized for classification tasks like predicting activity. However, normalization or grouping techniques need to be chosen carefully to minimize the issue of subject variability.摘要
与更常被分析的ECG或PPG数据不同,心率时间序列数据细节较少、噪声往往更大,且可能包含缺失数据点。我们使用BigIdeasLab_STEP数据集(其中的心率时间序列标注了个体所执行的具体任务),希望确定是否可以实现通用的分类。我们的分析表明,准确率对窗口/步长大小的选择较为敏感。此外,由于个体之间心脏物理结构的差异,不同受试者之间的分类性能存在差异。我们采用了多种技术来减少这种差异。首先,归一化被证明是关键的一步,可以显著提高性能。其次,将受试者分组并在组内进行分类,有助于提高性能并降低个体间差异。最后,我们发现将手工特征作为深度学习(DL)网络的输入可以进一步提高分类性能。综上,这些发现表明心率时间序列可以用于诸如活动预测之类的分类任务,但需要谨慎选择归一化或分组策略,以尽量减少个体差异带来的问题。
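A minimal sketch of the preprocessing discussed above: per-subject z-normalization, sliding windows, and a few handcrafted per-window features that could be concatenated to a DL network's input. The window/stride sizes and the feature set are illustrative, not the paper's tuned choices.

```python
import numpy as np

def normalize_per_subject(hr):
    """Per-subject z-normalization reduces inter-subject variability in resting heart rate."""
    return (hr - hr.mean()) / (hr.std() + 1e-8)

def sliding_windows(hr, window=60, stride=30):
    return np.stack([hr[s:s + window] for s in range(0, len(hr) - window + 1, stride)])

def handcrafted_features(windows):
    """Simple per-window features: mean, std, linear slope, RMS of successive differences."""
    diffs = np.diff(windows, axis=1)
    slope = np.array([np.polyfit(np.arange(w.size), w, 1)[0] for w in windows])
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        slope,
        np.sqrt((diffs ** 2).mean(axis=1)),
    ])

# Usage: one subject's heart-rate series (beats per minute, 1 Hz) -> windows + features.
hr = 70 + 10 * np.sin(np.linspace(0, 20, 600)) + np.random.default_rng(0).normal(0, 2, 600)
w = sliding_windows(normalize_per_subject(hr))
feats = handcrafted_features(w)          # could be concatenated to the DL network's input
print(w.shape, feats.shape)
```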
A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer
- paper_url: http://arxiv.org/abs/2311.13624
- repo_url: None
- paper_authors: Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang
- for: 本研究旨在探讨Gradient Leakage攻击的特点和防范策略,以便更好地保护对私人数据的访问。
- methods: 本研究使用Transformer模型进行探讨,并通过仔细分析Gradient Leakage攻击的方法,证明可以准确地从Gradient中提取敏感数据。
- results: 本研究发现,Gradient Leakage攻击可以准确地提取Transformer模型中的敏感数据,并且可以在某些情况下进行攻击。此外,本研究还证明了在Gradient Leakage攻击下,SGD算法可以保持稳定的性能。
Abstract
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients. This approach poses a substantial threat to the privacy of individuals and organizations alike. This research presents a comprehensive analysis of the gradient leakage method when applied specifically to transformer-based models. Through meticulous examination, we showcase the capability to accurately recover data solely from gradients and rigorously investigate the conditions under which gradient attacks can be executed, providing compelling evidence. Furthermore, we reevaluate the approach of introducing additional noise on gradients as a protective measure against gradient attacks. To address this, we outline a theoretical proof that analyzes the associated privacy costs within the framework of differential privacy. Additionally, we affirm the convergence of the Stochastic Gradient Descent (SGD) algorithm under perturbed gradients. The primary objective of this study is to augment the understanding of gradient leakage attack and defense strategies while actively contributing to the development of privacy-preserving techniques specifically tailored for transformer-based models. By shedding light on the vulnerabilities and countermeasures associated with gradient leakage, this research aims to foster advancements in safeguarding sensitive data and upholding privacy in the context of transformer-based models.摘要
深度梯度泄露(DLG)攻击已成为通过检查交换的梯度来提取敏感训练数据的一种普遍且高效的方法,对个人和组织的隐私都构成了重大威胁。本研究针对基于Transformer的模型,对梯度泄露方法进行了全面分析。通过细致的考察,我们展示了仅凭梯度即可准确恢复数据的能力,并严格研究了梯度攻击可以执行的条件,提供了有说服力的证据。此外,我们重新评估了在梯度上添加额外噪声作为抵御梯度攻击的防护措施:为此,我们在差分隐私框架下给出了对相应隐私代价的理论分析,并证明了随机梯度下降(SGD)算法在受扰梯度下的收敛性。本研究的主要目标是增进对梯度泄露攻击与防御策略的理解,并积极推动面向基于Transformer模型的隐私保护技术的发展。通过揭示梯度泄露相关的脆弱性与对策,本研究旨在促进对敏感数据的保护,并在基于Transformer的模型场景下维护隐私。
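A minimal PyTorch sketch of the gradient-leakage principle on a tiny linear model: the attacker optimizes dummy inputs and soft labels so that their gradients match the gradients shared by the victim. This is the generic DLG recipe, not the paper's transformer-specific analysis, and convergence on realistic models typically needs more care.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(20, 2)                     # tiny stand-in for a real network
loss_fn = torch.nn.CrossEntropyLoss()
params = list(model.parameters())

# "Victim" computes gradients on private data and (in FL) shares them.
x_true = torch.randn(1, 20)
y_true = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), params)
true_grads = [g.detach() for g in true_grads]

# Attacker optimizes dummy data so that its gradients match the shared ones.
x_dummy = torch.randn(1, 20, requires_grad=True)
y_logits = torch.randn(1, 2, requires_grad=True)   # soft dummy label
opt = torch.optim.Adam([x_dummy, y_logits], lr=0.05)

for step in range(500):
    opt.zero_grad()
    pred = model(x_dummy)
    dummy_loss = torch.sum(-torch.softmax(y_logits, -1) * torch.log_softmax(pred, -1))
    dummy_grads = torch.autograd.grad(dummy_loss, params, create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

print(torch.norm(x_dummy.detach() - x_true))       # small norm => private input reconstructed
```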
Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective
- paper_url: http://arxiv.org/abs/2311.13279
- repo_url: None
- paper_authors: Hao Yuan, Yajiong Liu, Yanfeng Zhang, Xin Ai, Qiange Wang, Chaoyi Chen, Yu Gu, Ge Yu
- for: 本研究旨在从数据管理角度对GNNS的训练进行复杂分析和评估,并提供了一系列实验结果和实践建议,帮助未来GNNS训练系统的设计。
- methods: 本文使用了多种代表性方法来评估GNNS训练的数据管理方法,包括数据分区、批处理准备、数据传输等方面的优化。
- results: 本文通过对多个benchmark数据集的广泛实验,显示了许多有趣和有价值的结果,并提供了一些实践建议,帮助未来GNNS训练系统的设计。
Abstract
Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN training in data management, such as data partitioning, batch preparation for mini-batch training, and data transferring between CPUs and GPUs. These factors, which take up a large proportion of training time, make data management in GNN training more significant. This paper reviews GNN training from a data management perspective and provides a comprehensive analysis and evaluation of the representative approaches. We conduct extensive experiments on various benchmark datasets and show many interesting and valuable results. We also provide some practical tips learned from these experiments, which are helpful for designing GNN training systems in the future.摘要
多种图神经网络(GNN)训练系统在最近出现,以支持高效的GNN训练。由于GNN包含复杂的训练样本之间的数据依赖关系,因此GNN训练需要解决不同于深度神经网络(DNN)训练的数据管理问题,如数据分区、批处理 для小批训练和CPU和GPU之间数据传输。这些因素占用了训练时间的大部分,因此数据管理在GNN训练中更加重要。本文从数据管理角度对GNN训练进行了全面的分析和评估,并在各种标准数据集上进行了广泛的实验。我们通过这些实验获得了许多有趣和有价值的结果,并提供了一些实践的建议,可以帮助未来设计GNN训练系统。
Hard Label Black Box Node Injection Attack on Graph Neural Networks
- paper_url: http://arxiv.org/abs/2311.13244
- repo_url: https://github.com/bryanzhou008/hard_label_black_box_attack_gnn
- paper_authors: Yu Zhou, Zihao Dong, Guofeng Zhang, Jingchen Tang
- for: 本研究旨在攻击图像神经网络(Graph Neural Network,GNN),具体来说是一种非目标性的硬标签黑盒节点插入攻击(Hard Label Black Box Node Injection Attack)。
- methods: 本研究基于现有的边扰动攻击,通过限制优化过程来实现节点插入攻击。
- results: 研究使用了三个数据集(COIL-DEL、IMDB-BINARY、NCI1)进行评估攻击性能。
Abstract
While graph neural networks have achieved state-of-the-art performances in many real-world tasks including graph classification and node classification, recent works have demonstrated they are also extremely vulnerable to adversarial attacks. Most previous works have focused on attacking node classification networks under impractical white-box scenarios. In this work, we will propose a non-targeted Hard Label Black Box Node Injection Attack on Graph Neural Networks, which to the best of our knowledge, is the first of its kind. Under this setting, more real world tasks can be studied because our attack assumes no prior knowledge about (1): the model architecture of the GNN we are attacking; (2): the model's gradients; (3): the output logits of the target GNN model. Our attack is based on an existing edge perturbation attack, from which we restrict the optimization process to formulate a node injection attack. In the work, we will evaluate the performance of the attack using three datasets, COIL-DEL, IMDB-BINARY, and NCI1.摘要
图神经网络在包括图分类和节点分类在内的许多实际任务中已经取得了最先进的性能,但近期工作表明它们也极易受到对抗攻击。此前的大多数工作集中在不切实际的白盒场景下攻击节点分类网络。在本工作中,我们将提出一种针对图神经网络的非目标性硬标签黑盒节点注入攻击,据我们所知,这是此类攻击的首例。在该设定下可以研究更多贴近真实场景的任务,因为我们的攻击不假设任何先验知识,包括:(1)被攻击GNN的模型架构;(2)模型的梯度;(3)目标GNN模型的输出logits。我们的攻击基于一个已有的边扰动攻击,并通过约束其优化过程来构造节点注入攻击。在本工作中,我们将使用COIL-DEL、IMDB-BINARY和NCI1三个数据集来评估攻击的性能。
NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments
- paper_url: http://arxiv.org/abs/2311.13225
- repo_url: None
- paper_authors: Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu
- for: 提高 GPU 训练 GNN 模型的效率
- methods: 使用层基task调度策略,将底层训练任务下载到 CPU,并且只有很少的顶层 vertices 进行 CPU 处理,以避免不效率的 CPU 处理
- results: 与当前最佳 GNN 系统进行比较,NeutronOrch 可以实现高达 4.61 倍的性能提升
Abstract
Graph Neural Networks (GNNs) have demonstrated outstanding performance in various applications. Existing frameworks utilize CPU-GPU heterogeneous environments to train GNN models and integrate mini-batch and sampling techniques to overcome the GPU memory limitation. In CPU-GPU heterogeneous environments, we can divide sample-based GNN training into three steps: sample, gather, and train. Existing GNN systems use different task orchestrating methods to employ each step on CPU or GPU. After extensive experiments and analysis, we find that existing task orchestrating methods fail to fully utilize the heterogeneous resources, limited by inefficient CPU processing or GPU resource contention. In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes down the training task of the bottom layer to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch only offloads the training of frequently accessed vertices to the CPU and lets GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestrating method, fully overlapping different tasks on heterogeneous resources while strictly guaranteeing bounded staleness. The experimental results show that compared with the state-of-the-art GNN systems, NeutronOrch can achieve up to 4.61x performance speedup.摘要
图神经网络(GNN)在多种应用中展现出了出色的性能。现有框架利用CPU-GPU异构环境来训练GNN模型,并结合小批量与采样技术来克服GPU内存限制。在CPU-GPU异构环境中,基于采样的GNN训练可以分为三个步骤:采样、聚合和训练。现有GNN系统使用不同的任务编排方法,将各步骤安排在CPU或GPU上执行。经过大量实验和分析,我们发现现有的任务编排方法无法充分利用异构资源,受限于低效的CPU处理或GPU资源竞争。在本文中,我们提出NeutronOrch,一个采用基于层的任务编排方法、并确保CPU与GPU负载均衡的基于采样的GNN训练系统。NeutronOrch按层对训练过程进行解耦,并将底层的训练任务下推到CPU,从而显著降低GPU训练的计算负载和内存占用。为避免低效的CPU处理,NeutronOrch仅将被频繁访问的顶点的训练任务卸载到CPU,并让GPU在有界陈旧度的前提下复用它们的嵌入。此外,NeutronOrch为基于层的任务编排方法提供了细粒度的流水线设计,使不同任务在异构资源上充分重叠,同时严格保证有界陈旧度。实验结果显示,与最先进的GNN系统相比,NeutronOrch可以实现最高4.61倍的性能提升。
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks
- paper_url: http://arxiv.org/abs/2311.13180
- repo_url: None
- paper_authors: Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye
- for: 这 paper Focuses on high-dimensional multi-armed contextual bandits with batched feedback, where the rewards are revealed only at the end of each batch.
- methods: The paper proposes a provably sample-efficient algorithm that achieves a $ \mathcal{\tilde O}(s_0^2 \log^2 T)$ regret in the sparse case and $ \mathcal{\tilde O} ( r ^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}( \log T)$ batches.
- results: The algorithm achieves regret bounds comparable to those in fully sequential settings with only $\mathcal{O}( \log T)$ batches, and features a novel batch allocation method that adjusts the batch sizes according to the estimation accuracy within each batch and cumulative regret.
Abstract
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches. In specific, each batch collects data according to a policy that depends on previous batches and the rewards are revealed only at the end of the batch. Such a feedback structure is popular in applications such as personalized medicine and online advertisement, where the online data often do not arrive in a fully serial manner. We consider high-dimensional and linear settings where the reward function of the bandit model admits either a sparse or low-rank structure and ask how small a number of batches are needed for a comparable performance with fully dynamic data in which $L = T$. For these settings, we design a provably sample-efficient algorithm which achieves a $ \mathcal{\tilde O}(s_0^2 \log^2 T)$ regret in the sparse case and $ \mathcal{\tilde O} ( r ^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}( \log T)$ batches. Here $s_0$ and $r$ are the sparsity and rank of the reward parameter in sparse and low-rank cases, respectively, and $ \mathcal{\tilde O}(\cdot)$ omits logarithmic factors involving the feature dimensions. In other words, our algorithm achieves regret bounds comparable to those in fully sequential setting with only $\mathcal{O}( \log T)$ batches. Our algorithm features a novel batch allocation method that adjusts the batch sizes according to the estimation accuracy within each batch and cumulative regret. Furthermore, we also conduct experiments with synthetic and real-world data to validate our theory.摘要
我们研究带有批量反馈的高维多臂上下文赌博机问题,其中 $T$ 步在线交互被划分为 $L$ 个批次。具体来说,每个批次依据一个依赖于之前批次的策略收集数据,奖励仅在该批次结束时才被揭示。这种反馈结构在个性化医疗和在线广告等应用中非常常见,因为在线数据往往并非完全串行到达。我们考虑高维线性设定,其中赌博机模型的奖励函数具有稀疏或低秩结构,并探讨需要多少个批次才能达到与完全动态数据($L=T$)相当的性能。针对这些设定,我们设计了一个可证样本高效的算法:仅用 $L=\mathcal{O}(\log T)$ 个批次,其在稀疏情形下达到 $\mathcal{\tilde O}(s_0^2 \log^2 T)$ 的悔值,在低秩情形下达到 $\mathcal{\tilde O}(r^2 \log^2 T)$ 的悔值。其中 $s_0$ 和 $r$ 分别为稀疏与低秩情形下奖励参数的稀疏度和秩,$\mathcal{\tilde O}(\cdot)$ 省略了与特征维度相关的对数因子。换言之,我们的算法仅用 $\mathcal{O}(\log T)$ 个批次即可获得与完全串行设定相当的悔值上界。我们的算法包含一种新颖的批次分配方法,它根据每个批次内的估计精度与累计悔值来调整批次大小。此外,我们还在合成数据和真实数据上进行了实验,以验证我们的理论。
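A rough sketch of the batched-feedback pattern only: a geometric grid of $L=\mathcal{O}(\log T)$ batch end-points, with rewards revealed and a ridge estimate updated only at batch boundaries. The sparse/low-rank estimators, exploration scheme and the paper's regret-driven batch allocation are not reproduced, and the greedy arm choice is a simplification.

```python
import numpy as np

def geometric_batch_grid(T, L):
    """L batch end-points growing geometrically, so L = O(log T) suffices to cover T steps."""
    ends = np.unique(np.geomspace(2, T, num=L).astype(int))
    ends[-1] = T
    return ends

def batched_linear_bandit(T=2000, d=5, K=10, L=11, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=d)
    A, b = lam * np.eye(d), np.zeros(d)        # ridge regression statistics
    theta_hat = np.zeros(d)
    buffer, regret, t = [], 0.0, 0
    for end in geometric_batch_grid(T, L):
        while t < end:
            X = rng.normal(size=(K, d))        # contexts for K arms
            arm = int(np.argmax(X @ theta_hat))    # greedy choice (simplification)
            reward = X[arm] @ theta_true + 0.1 * rng.normal()
            regret += (X @ theta_true).max() - X[arm] @ theta_true
            buffer.append((X[arm], reward))
            t += 1
        # Rewards are revealed only at the end of the batch; update the estimate now.
        for x, r in buffer:
            A += np.outer(x, x)
            b += r * x
        theta_hat = np.linalg.solve(A, b)
        buffer = []
    return regret

print(batched_linear_bandit())
```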
SecureCut: Federated Gradient Boosting Decision Trees with Efficient Machine Unlearning
- paper_url: http://arxiv.org/abs/2311.13174
- repo_url: None
- paper_authors: Jian Zhang, Bowen Li Jie Li, Chentao Wu
for: 本研究旨在 Addressing the challenge of data removal in Vertical Federated Learning (VFL) scenarios, where multiple parties provide private features for model training, and honoring the “right to be forgotten” legislation.methods: 我们提出了一个 novel Gradient Boosting Decision Tree (GBDT) framework, which enables both instance unlearning and feature unlearning without the need for retraining from scratch.results: 我们的方法可以实现有效的数据删除,同时减少模型性能下降。实验结果显示,我们的方法在各个受测数据集上实现了比 state-of-the-art 方法更高的模型用途和遗忘性。Abstract
In response to legislation mandating companies to honor the \textit{right to be forgotten} by erasing user data, it has become imperative to enable data removal in Vertical Federated Learning (VFL) where multiple parties provide private features for model training. In VFL, data removal, i.e., \textit{machine unlearning}, often requires removing specific features across all samples under privacy guarentee in federated learning. To address this challenge, we propose \methname, a novel Gradient Boosting Decision Tree (GBDT) framework that effectively enables both \textit{instance unlearning} and \textit{feature unlearning} without the need for retraining from scratch. Leveraging a robust GBDT structure, we enable effective data deletion while reducing degradation of model performance. Extensive experimental results on popular datasets demonstrate that our method achieves superior model utility and forgetfulness compared to \textit{state-of-the-art} methods. To our best knowledge, this is the first work that investigates machine unlearning in VFL scenarios.摘要
鉴于法律规定公司必须尊重用户的“忘记权”,即在Vertically Federated Learning(VFL)中多方提供的私有特征用于模型训练中删除用户数据已成为必要。在VFL中,数据删除,即“机器学习忘记”,经常需要在隐私保证下对所有样本中删除特定的特征。为解决这个挑战,我们提议了一种新的Gradient Boosting Decision Tree(GBDT)框架,可以有效地实现实例忘记和特征忘记,无需重新训练。利用GBDT的稳定结构,我们可以有效地删除数据,同时降低模型性能下降。我们的方法在popular datasets上进行了广泛的实验,结果表明,我们的方法可以在VFL场景中实现更高的模型用用和忘记性,比较于当前的方法。据我们所知,这是第一篇investigates机器忘记在VFL场景中的研究。
AdaptiveFL: Adaptive Heterogeneous Federated Learning for Resource-Constrained AIoT Systems
- paper_url: http://arxiv.org/abs/2311.13166
- repo_url: None
- paper_authors: Chentao Jia, Ming Hu, Zekai Chen, Yanxin Yang, Xiaofei Xie, Yang Liu, Mingsong Chen
- for: 本研究旨在解决 Federated Learning (FL) 在 Artificial Intelligence of Things (AIoT) 设备之间的协同学习中存在的分类性能低下问题,以及各种设备之间的差异性因素 (如计算能力、存储大小) 的影响。
- methods: 本研究提出了一种有效的 Federated Learning 方法,名为 AdaptiveFL,该方法基于一种新的细致化宽度模型剔除策略。该策略可以生成各种异构的本地模型,以适应各种 AIoT 设备的差异性。此外,我们还提出了一种基于 reinforcement learning 的设备选择机制,可以在 fly 中根据设备的可用资源选择适合的异构模型进行本地训练。
- results: 对比 state-of-the-art 方法,AdaptiveFL 可以在 IID 和非 IID 场景中实现最高达 16.83% 的推理改进。
Abstract
Although Federated Learning (FL) is promising to enable collaborative learning among Artificial Intelligence of Things (AIoT) devices, it suffers from the problem of low classification performance due to various heterogeneity factors (e.g., computing capacity, memory size) of devices and uncertain operating environments. To address these issues, this paper introduces an effective FL approach named AdaptiveFL based on a novel fine-grained width-wise model pruning strategy, which can generate various heterogeneous local models for heterogeneous AIoT devices. By using our proposed reinforcement learning-based device selection mechanism, AdaptiveFL can adaptively dispatch suitable heterogeneous models to corresponding AIoT devices on the fly based on their available resources for local training. Experimental results show that, compared to state-of-the-art methods, AdaptiveFL can achieve up to 16.83% inference improvements for both IID and non-IID scenarios.摘要
尽管联邦学习(FL)有望实现人工智能物联网(AIoT)设备之间的协同学习,但由于设备在计算能力、内存大小等方面的各种异构因素以及不确定的运行环境,它存在分类性能较低的问题。为解决这些问题,本文提出了一种有效的FL方法,名为AdaptiveFL,它基于一种新颖的细粒度宽度方向模型剪枝策略,可以为异构的AIoT设备生成多样的异构本地模型。借助我们提出的基于强化学习的设备选择机制,AdaptiveFL可以根据各AIoT设备的可用资源,在运行时自适应地向其分发合适的异构模型进行本地训练。实验结果表明,与最先进的方法相比,AdaptiveFL在IID和非IID场景下均可实现最高16.83%的推理精度提升。
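A minimal PyTorch sketch of width-wise sub-model generation: a narrower local model reuses the first fraction of the global model's hidden units so that its parameters stay aligned with the global model for aggregation. AdaptiveFL's fine-grained pruning positions and its RL-based device selection are omitted; the MLP and width ratios are illustrative.

```python
import torch
import torch.nn as nn

def make_width_scaled_mlp(global_model, width_ratio):
    """Build a narrower sub-model reusing the first `width_ratio` fraction of the
    global hidden units, keeping parameter indices aligned for later aggregation."""
    w1, b1 = global_model[0].weight.data, global_model[0].bias.data
    w2, b2 = global_model[2].weight.data, global_model[2].bias.data
    hidden = max(1, int(w1.size(0) * width_ratio))
    sub = nn.Sequential(
        nn.Linear(w1.size(1), hidden),
        nn.ReLU(),
        nn.Linear(hidden, w2.size(0)),
    )
    sub[0].weight.data.copy_(w1[:hidden])
    sub[0].bias.data.copy_(b1[:hidden])
    sub[2].weight.data.copy_(w2[:, :hidden])
    sub[2].bias.data.copy_(b2)
    return sub

# Usage: dispatch differently sized sub-models to devices with different resources.
global_model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
small = make_width_scaled_mlp(global_model, width_ratio=0.25)   # weak device
large = make_width_scaled_mlp(global_model, width_ratio=1.00)   # strong device
print(small, sum(p.numel() for p in small.parameters()))
```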
Have Your Cake and Eat It Too: Toward Efficient and Accurate Split Federated Learning
- paper_url: http://arxiv.org/abs/2311.13163
- repo_url: None
- paper_authors: Dengke Yan, Ming Hu, Zeke Xia, Yanxin Yang, Jun Xia, Xiaofei Xie, Mingsong Chen
- for: This paper aims to address the challenges of low inference accuracy and low efficiency in Split Federated Learning (SFL) for AIoT systems.
- methods: The proposed approach, named Sliding Split Federated Learning (S$^2$FL), adopts an adaptive sliding model split strategy and a data balance-based training mechanism to alleviate the issues of stragglers and data heterogeneity.
- results: Compared to conventional SFL, S$^2$FL can achieve up to 16.5% inference accuracy improvement and 3.54X training acceleration, as demonstrated by experimental results.
Abstract
Due to its advantages in resource constraint scenarios, Split Federated Learning (SFL) is promising in AIoT systems. However, due to data heterogeneity and stragglers, SFL suffers from the challenges of low inference accuracy and low efficiency. To address these issues, this paper presents a novel SFL approach, named Sliding Split Federated Learning (S$^2$FL), which adopts an adaptive sliding model split strategy and a data balance-based training mechanism. By dynamically dispatching different model portions to AIoT devices according to their computing capability, S$^2$FL can alleviate the low training efficiency caused by stragglers. By combining features uploaded by devices with different data distributions to generate multiple larger batches with a uniform distribution for back-propagation, S$^2$FL can alleviate the performance degradation caused by data heterogeneity. Experimental results demonstrate that, compared to conventional SFL, S$^2$FL can achieve up to 16.5\% inference accuracy improvement and 3.54X training acceleration.摘要
因其在资源受限enario中的优势,Split Federated Learning(SFL)在AIoT系统中表现承诺。然而,由于数据不同性和延迟器,SFL受到低推理精度和低效率的挑战。为解决这些问题,本文提出了一种新的SFL方法,即Sliding Split Federated Learning(S$^2$FL),该方法采用了适应式分割模型拟合策略和数据均衡式训练机制。通过在不同计算能力的AIoT设备上动态派发不同模型部分,S$^2$FL可以减轻由延迟器引起的训练效率低下。通过将具有不同数据分布的设备上传的特征组合成多个大批次,并将其用于反射传播,S$^2$FL可以减轻由数据不同性引起的性能下降。实验结果表明,相比传统SFL,S$^2$FL可以提高推理精度达16.5%,并提高训练速度3.54倍。
Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow
- paper_url: http://arxiv.org/abs/2311.13159
- repo_url: None
- paper_authors: Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal
- for: 该研究旨在提出一种基于分子动力学 simulations的多目标优化(MOO)方法,用于优化多个可能存在冲突的目标。
- methods: 该方法结合 overdamped Langevin 与 birth-death 动力学,并引入"dominance potential"以引导粒子走向全局 Pareto 最优。
- results: 与此前的方法相比,该方法能够更好地处理几何形状复杂的 Pareto front,大量实验证明其在合成数据和实际数据上均表现出色。
Abstract
Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives with widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.摘要
多目标优化(MOO)旨在同时优化多个可能相互冲突的目标,具有广泛的应用。我们提出了一种受分子动力学模拟启发的新型交互粒子方法用于MOO。该方法结合了过阻尼Langevin动力学与生灭(birth-death)动力学,并引入"dominance potential"来引导粒子走向全局Pareto最优。与此前的方法不同,我们的方法能够重新安置被支配的粒子,因而特别擅长处理几何形状复杂的Pareto前沿。我们的方法在理论上也有依据,可以视为具有收敛保证的Wasserstein-Fisher-Rao梯度流。大量实验证实,我们的方法在具有挑战性的合成数据和真实数据集上优于现有最先进方法。
Testing Closeness of Multivariate Distributions via Ramsey Theory
- paper_url: http://arxiv.org/abs/2311.13154
- repo_url: None
- paper_authors: Ilias Diakonikolas, Daniel M. Kane, Sihan Liu
- for: This paper is written for the statistical task of closeness testing for multidimensional distributions.
- methods: The paper uses a computationally efficient closeness tester with sub-learning sample complexity in any fixed dimension, and establishes a qualitatively matching sample complexity lower bound. The method makes essential use of tools from Ramsey theory.
- results: The paper obtains a sample complexity bound of $O\left((k^{6/7}/\text{poly}_d(\epsilon)) \log^d(k)\right)$ for the closeness testing problem, and a lower bound of $\Omega(k^{6/7}/\text{poly}(\epsilon))$, even for $d=2$. The results show a substantial increase in sample complexity when moving from one to two dimensions, while increases beyond that do not.
Abstract
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions $\mathbf p, \mathbf q$ on $\mathbb R^d$, we want to distinguish between the case that $\mathbf p=\mathbf q$ versus $\|\mathbf p-\mathbf q\|_{A_k} > \epsilon$, where $\|\mathbf p-\mathbf q\|_{A_k}$ denotes the generalized ${A}_k$ distance between $\mathbf p$ and $\mathbf q$ -- measuring the maximum discrepancy between the distributions over any collection of $k$ disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with {\em sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. In more detail, we provide a computationally efficient closeness tester with sample complexity $O\left((k^{6/7}/ \mathrm{poly}_d(\epsilon)) \log^d(k)\right)$. On the lower bound side, we establish a qualitatively matching sample complexity lower bound of $\Omega(k^{6/7}/\mathrm{poly}(\epsilon))$, even for $d=2$. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is $\Theta(k^{4/5}/\mathrm{poly}(\epsilon))$. This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not. As a corollary of our general $A_k$ tester, we obtain $d_{\mathrm TV}$-closeness testers for pairs of $k$-histograms on $\mathbb R^d$ over a common unknown partition, and pairs of uniform distributions supported on the union of $k$ unknown disjoint axis-aligned rectangles. Both our algorithm and our lower bound make essential use of tools from Ramsey theory.摘要
我们研究多维分布的接近性(或等价性)检验这一统计任务。具体而言,给定对 $\mathbb R^d$ 上两个未知分布 $\mathbf p, \mathbf q$ 的采样访问,我们希望区分 $\mathbf p=\mathbf q$ 与 $\|\mathbf p-\mathbf q\|_{A_k} > \epsilon$ 这两种情形,其中 $\|\mathbf p-\mathbf q\|_{A_k}$ 表示 $\mathbf p$ 与 $\mathbf q$ 之间的广义 $A_k$ 距离,即在任意 $k$ 个互不相交、与坐标轴对齐的矩形集合上两分布之间的最大差异。我们的主要结果是该问题在任意固定维度下首个具有亚学习样本复杂度的接近性检验器,以及在数量级上匹配的样本复杂度下界。更具体地,我们给出了一个计算高效的接近性检验器,其样本复杂度为 $O\left((k^{6/7}/\mathrm{poly}_d(\epsilon)) \log^d(k)\right)$;在下界方面,我们证明了即使在 $d=2$ 时也成立的 $\Omega(k^{6/7}/\mathrm{poly}(\epsilon))$ 样本复杂度下界。这些界是出人意料的,因为该问题在一维情形下的样本复杂度为 $\Theta(k^{4/5}/\mathrm{poly}(\epsilon))$。一个有趣的推论是:从一维到二维会带来样本复杂度的显著增长,而此后维度的增加则不会。作为一般 $A_k$ 检验器的推论,我们得到了针对 $\mathbb R^d$ 上定义在共同未知划分上的两个 $k$-直方图的 $d_{\mathrm{TV}}$ 接近性检验器,以及针对支撑在 $k$ 个未知互不相交坐标轴对齐矩形之并上的两个均匀分布的检验器。我们的算法与下界都在本质上使用了Ramsey理论中的工具。
Optimal Transport with Cyclic Symmetry
- paper_url: http://arxiv.org/abs/2311.13147
- repo_url: None
- paper_authors: Shoichiro Takeda, Yasunori Akagi, Naoki Marumo, Kenta Niwa
- for: 本文旨在提出一种新的最优传输 (OT) 快速算法,利用输入数据中的循环对称结构。
- methods: 这 paper 使用了循环对称结构和各种优化技术,将 OT 减少到一个较小的优化问题,并通过解这个问题来获得优化解决方案。
- results: experiments 表明,这些算法在具有约束对称结构的 synthetic/real-world 数据上具有效果,并且比直接解决 OT 更快。
Abstract
We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the well-known entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT in synthetic/real-world data that has a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.摘要
我们提出了一种新的快速算法,用于优化运输(OT),利用输入数据的循环Symmetry结构。这种OT WITH cyclic symmetry在各种实际应用中出现,包括图像处理、城市规划和图表处理。我们的主要想法是通过利用循环Symmetry和各种优化技术,将OT减少到一个较小的优化问题,并通过解决这个小问题来得到优化的解。这使得我们的算法可以更快地解决OT问题,并且可以保证解决的结果是最优的。在这篇论文中,我们关注了两种关键的OT形式:线性Programming OT(LOT)和强 convex-regulated OT,包括知名的Entropy-regulated OT(EROT)。实验表明,我们的算法在 sintetic/实际数据中,具有约束/approximate cyclic symmetry结构时,对LOT和EROT具有显著的效果。通过理论和实验结果,这篇论文成功地将Symmetry概念引入了OT研究领域中,并为OT问题提供了一种新的解决方案。
Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian
- paper_url: http://arxiv.org/abs/2311.13094
- repo_url: None
- paper_authors: Chuan He, Zhaosong Lu
- for: 本文研究一个非凸无约束优化问题,目标是最小化一个具有 Hölder 连续 Hessian 的二次可微目标函数。
- methods: 我们首先提出了一种 Newton-conjugate gradient (Newton-CG) 方法,用于在假设相关 Hölder 参数已知的情况下求解该问题的近似一阶驻点 (FOSP);随后我们开发了一种无需任何先验参数的无参数 Newton-CG 方法。据我们所知,该方法是第一个在求解该问题近似 FOSP 时达到已知最优迭代与运算复杂度的无参数二阶方法。
- results: 我们提出了一种 Newton-CG 方法,以高概率求解该问题的近似二阶驻点 (SOSP),并给出了其迭代与运算复杂度。此外,我们还给出了初步的数值结果,展示了我们的无参数 Newton-CG 方法相比一种知名的正则化 Newton 方法具有更优的实际性能。
Abstract
In this paper we consider a nonconvex unconstrained optimization problem minimizing a twice differentiable objective function with H\"older continuous Hessian. Specifically, we first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first-order stationary point (FOSP) of this problem, assuming the associated the H\"older parameters are explicitly known. Then we develop a parameter-free Newton-CG method without requiring any prior knowledge of these parameters. To the best of our knowledge, this method is the first parameter-free second-order method achieving the best-known iteration and operation complexity for finding an approximate FOSP of this problem. Furthermore, we propose a Newton-CG method for finding an approximate second-order stationary point (SOSP) of the considered problem with high probability and establish its iteration and operation complexity. Finally, we present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method over a well-known regularized Newton method.摘要
在本文中,我们考虑一个非凸无约束优化问题,即最小化一个 Hessian 满足 Hölder 连续性的二次可微目标函数。具体地,我们首先提出一种 Newton-conjugate gradient (Newton-CG) 方法,在假设相关 Hölder 参数已知的情况下求解该问题的近似一阶驻点 (FOSP)。随后,我们开发了一种无需任何先验参数的无参数 Newton-CG 方法。据我们所知,这是第一个在求解该问题近似 FOSP 时达到已知最优迭代与运算复杂度的无参数二阶方法。此外,我们还提出一种 Newton-CG 方法,以高概率求解该问题的近似二阶驻点 (SOSP),并给出了其迭代与运算复杂度。最后,我们给出初步的数值结果,展示了我们的无参数 Newton-CG 方法相比一种知名的正则化 Newton 方法具有更优的实际性能。
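A minimal sketch of a Newton-CG iteration on a smooth nonconvex function: conjugate gradient approximately solves the Newton system using only Hessian-vector products and bails out when it detects non-positive curvature, in which case the method falls back to a gradient step. The Hölder-exponent-dependent safeguards and complexity guarantees of the paper are not reproduced; step sizes and tolerances are illustrative.

```python
import numpy as np

def cg_solve(hvp, g, tol=1e-8, max_iter=50):
    """Plain CG on H x = -g using only Hessian-vector products; stops early if
    non-positive curvature is encountered (the nonconvex case)."""
    x, r = np.zeros_like(g), -g.copy()
    p, rs = r.copy(), r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        curv = p @ Hp
        if curv <= 1e-12 * (p @ p):
            return x, False                  # negative/zero curvature detected
        alpha = rs / curv
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, True

def newton_cg(grad, hvp, x0, iters=50, lr_fallback=0.1):
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-8:
            break
        d, ok = cg_solve(lambda v: hvp(x, v), g)
        x = x + d if ok else x - lr_fallback * g   # fall back to a gradient step
    return x

# Usage on a nonconvex 2-D test function f(x) = x0^4/4 - x0^2/2 + x1^2/2.
grad = lambda x: np.array([x[0] ** 3 - x[0], x[1]])
hvp = lambda x, v: np.array([(3 * x[0] ** 2 - 1) * v[0], v[1]])
print(newton_cg(grad, hvp, np.array([0.3, 1.0])))   # converges to a stationary point near (1, 0)
```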