cs.LG - 2023-10-28

World Model Based Sim2Real Transfer for Visual Navigation

  • paper_url: http://arxiv.org/abs/2310.18847
  • repo_url: None
  • paper_authors: Chen Liu, Kiran Lekkala, Laurent Itti
  • For: Developing a robot navigation system that transfers from an inexpensive simulator to the real world.
  • Methods: The system fuses components of a traditional World Model into a robust system trained entirely in simulation. To facilitate transfer, an intermediate representation based on Bird's Eye View (BEV) images is used, learned by translating First-Person View (FPV) RGB images into BEV representations.
  • Results: The model is trained on data collected with the CARLA simulator and its effectiveness is demonstrated; a complete codebase, dataset, and models are released to the public.
    Abstract Sim2Real transfer has gained popularity because it helps transfer from inexpensive simulators to the real world. This paper presents a novel system that fuses components in a traditional \textit{World Model} into a robust system, trained entirely within a simulator, that \textit{Zero-Shot} transfers to the real world. To facilitate transfer, we use an intermediary representation that is based on \textit{Bird's Eye View (BEV)} images. Thus, our robot learns to navigate in a simulator by first learning to translate from complex \textit{First-Person View (FPV)} based RGB images to BEV representations, then learning to navigate using those representations. Later, when tested in the real world, the robot uses the perception model that translates FPV-based RGB images to embeddings that are used by the downstream policy. The incorporation of state-checking modules using \textit{Anchor images} and \textit{Mixture Density LSTM} not only interpolates uncertain and missing observations but also enhances the robustness of the model when exposed to the real-world environment. We trained the model using data collected using a \textit{Differential drive} robot in the CARLA simulator. Our methodology's effectiveness is shown through the deployment of trained models onto a \textit{Real world Differential drive} robot. Lastly, we release a comprehensive codebase, dataset and models for training and deployment that are available to the public.

A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees

  • paper_url: http://arxiv.org/abs/2310.18841
  • repo_url: None
  • paper_authors: Shuyao Li, Stephen J. Wright
  • for: Minimizing a smooth nonconvex function with inexact oracle access to the gradient and Hessian (but not the function value) to reach ($\epsilon_{g}, \epsilon_{H}$)-approximate second-order optimality.
  • methods: A novel method that, when an approximate negative-curvature direction is used as the step, chooses its sense to be positive or negative with equal probability; it uses relative inexactness measures on the gradient and Hessian and relaxes the coupling between the first- and second-order tolerances.
  • results: Achieves ($\epsilon_{g}, \epsilon_{H}$)-approximate second-order optimality, with a convergence analysis based on martingale arguments and concentration inequalities, and applies to empirical risk minimization problems to obtain gradient sample complexity.
    Abstract We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (but not the function value) to achieve $(\epsilon_{g}, \epsilon_{H})$-approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, we choose its sense to be positive or negative with equal probability. We also use relative inexactness measures on gradient and Hessian and relax the coupling between the first- and second-order tolerances $\epsilon_{g}$ and $\epsilon_{H}$. Our convergence analysis includes both an expectation bound based on martingale analysis and a high-probability bound based on concentration inequalities. We apply our algorithm to empirical risk minimization problems and obtain gradient sample complexity.
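Below is a minimal NumPy sketch (not the authors' code) of the step rule the abstract describes: when an approximate negative-curvature direction is used as the step, its sense is chosen positive or negative with equal probability; otherwise a gradient step is taken. The step size `alpha`, the tolerances, and the toy saddle-point example are illustrative assumptions.

```python
import numpy as np

def randomized_step(grad, hess, eps_g, eps_H, alpha=0.1, rng=np.random.default_rng(0)):
    """One step of a simplified inexact second-order method (illustrative).

    grad : inexact gradient estimate, shape (n,)
    hess : inexact Hessian estimate, shape (n, n)
    eps_g, eps_H : first- and second-order tolerances
    """
    if np.linalg.norm(grad) > eps_g:
        return -alpha * grad                      # ordinary (inexact) gradient step
    eigvals, eigvecs = np.linalg.eigh(hess)
    if eigvals[0] < -eps_H:
        # Approximate negative-curvature direction: choose its sense
        # (+d or -d) with equal probability, as in the paper.
        d = eigvecs[:, 0]
        return alpha * rng.choice([-1.0, 1.0]) * d
    return np.zeros_like(grad)                    # approximately second-order stationary

# Toy example at the saddle point of f(x) = x0^2 - x1^2.
g = np.array([0.0, 0.0])                          # gradient at the saddle
H = np.array([[2.0, 0.0], [0.0, -2.0]])           # Hessian with negative curvature
print(randomized_step(g, H, eps_g=1e-3, eps_H=1e-3))
```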

Intrinsic Gaussian Vector Fields on Manifolds

  • paper_url: http://arxiv.org/abs/2310.18824
  • repo_url: None
  • paper_authors: Daniel Robert-Nicoud, Andreas Krause, Viacheslav Borovitskiy
  • for: Modeling vector-valued signals on non-Euclidean domains (manifolds), particularly when uncertainty quantification is needed.
  • methods: Proposes novel, intrinsically defined Gaussian process models for vector-valued signals on manifolds, the Hodge-Matérn Gaussian vector fields, which account for the geometry of the space in consideration.
  • results: Provides the computational primitives to deploy Hodge-Matérn Gaussian vector fields on the two-dimensional sphere and hypertori, highlights generalizations to discrete meshes and "ideal" manifolds, and shows that these fields constitute considerably more refined inductive biases than previously proposed extrinsic fields.
    Abstract Various applications ranging from robotics to climate science require modeling signals on non-Euclidean domains, such as the sphere. Gaussian process models on manifolds have recently been proposed for such tasks, in particular when uncertainty quantification is needed. In the manifold setting, vector-valued signals can behave very differently from scalar-valued ones, with much of the progress so far focused on modeling the latter. The former, however, are crucial for many applications, such as modeling wind speeds or force fields of unknown dynamical systems. In this paper, we propose novel Gaussian process models for vector-valued signals on manifolds that are intrinsically defined and account for the geometry of the space in consideration. We provide computational primitives needed to deploy the resulting Hodge-Mat\'ern Gaussian vector fields on the two-dimensional sphere and the hypertori. Further, we highlight two generalization directions: discrete two-dimensional meshes and "ideal" manifolds like hyperspheres, Lie groups, and homogeneous spaces. Finally, we show that our Gaussian vector fields constitute considerably more refined inductive biases than the extrinsic fields proposed before.

Successfully Applying Lottery Ticket Hypothesis to Diffusion Model

  • paper_url: http://arxiv.org/abs/2310.18823
  • repo_url: https://github.com/osier0524/lottery-ticket-to-ddpm
  • paper_authors: Chao Jiang, Bo Hui, Bohan Liu, Da Yan
  • for: Applying the Lottery Ticket Hypothesis (LTH) to diffusion models.
  • methods: Uses LTH to find sparse subnetworks ("winning tickets") of a diffusion model, allowing the sparsity level to vary across layers, in order to reduce computation.
  • results: Experiments show the method finds sparser sub-models that require less memory and fewer FLOPs without compromising performance; code is available at https://github.com/osier0524/Lottery-Ticket-to-DDPM.
    Abstract Despite the success of diffusion models, the training and inference of diffusion models are notoriously expensive due to the long chain of the reverse process. In parallel, the Lottery Ticket Hypothesis (LTH) claims that there exist winning tickets (i.e., a properly pruned sub-network together with original weight initialization) that can achieve performance competitive to the original dense neural network when trained in isolation. In this work, we for the first time apply LTH to diffusion models. We empirically find subnetworks at sparsity 90%-99% without compromising performance for denoising diffusion probabilistic models on benchmarks (CIFAR-10, CIFAR-100, MNIST). Moreover, existing LTH works identify the subnetworks with a unified sparsity along different layers. We observe that the similarity between two winning tickets of a model varies from block to block. Specifically, the upstream layers from two winning tickets for a model tend to be more similar than the downstream layers. Therefore, we propose to find the winning ticket with varying sparsity along different layers in the model. Experimental results demonstrate that our method can find sparser sub-models that require less memory for storage and reduce the necessary number of FLOPs. Codes are available at https://github.com/osier0524/Lottery-Ticket-to-DDPM.
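A hedged PyTorch sketch of the generic lottery-ticket procedure the paper builds on, with per-layer sparsity levels passed explicitly (the paper's observation is that sparsity should vary across layers). The model, the per-layer sparsity values, and the helper names are illustrative assumptions; the released code at the repo above is the authoritative implementation.

```python
import torch

def magnitude_masks(model, sparsity_per_layer):
    """Per-layer magnitude-pruning masks; sparsity_per_layer maps parameter
    names to the fraction of weights to remove in that layer."""
    masks = {}
    for name, p in model.named_parameters():
        s = sparsity_per_layer.get(name, 0.0)
        if s <= 0 or p.dim() < 2:                       # leave biases / unlisted layers dense
            masks[name] = torch.ones_like(p)
            continue
        k = max(1, int(s * p.numel()))
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_(masks[name])

# One prune-and-rewind round: train, prune with layer-wise sparsity,
# rewind the surviving weights to their original initialization.
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
init_state = {k: v.clone() for k, v in model.state_dict().items()}
# ... train `model` here (for the paper, on a denoising diffusion objective) ...
layer_sparsity = {"0.weight": 0.9, "2.weight": 0.7}     # hypothetical per-layer levels
masks = magnitude_masks(model, layer_sparsity)
model.load_state_dict(init_state)                       # rewind to initialization
apply_masks(model, masks)                               # candidate "winning ticket"
```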

Adaptive Test-Time Personalization for Federated Learning

  • paper_url: http://arxiv.org/abs/2310.18816
  • repo_url: https://github.com/baowenxuan/atp
  • paper_authors: Wenxuan Bao, Tianxin Wei, Haohan Wang, Jingrui He
  • for: Proposes test-time personalized federated learning (TTPFL), in which clients locally adapt a global model at test time to handle distribution shifts across source clients, without any labeled data.
  • methods: An algorithm named ATP that adaptively learns an adaptation rate for each module of the model from the distribution shifts among source domains, enabling unsupervised local adaptation at test time.
  • results: ATP handles various distribution shifts, including label shift, image corruptions, and domain shift, outperforming existing TTA methods across multiple datasets and model architectures.
    Abstract Personalized federated learning algorithms have shown promising results in adapting models to various distribution shifts. However, most of these methods require labeled data on testing clients for personalization, which is usually unavailable in real-world scenarios. In this paper, we introduce a novel setting called test-time personalized federated learning (TTPFL), where clients locally adapt a global model in an unsupervised way without relying on any labeled data during test-time. While traditional test-time adaptation (TTA) can be used in this scenario, most of them inherently assume training data come from a single domain, while they come from multiple clients (source domains) with different distributions. Overlooking these domain interrelationships can result in suboptimal generalization. Moreover, most TTA algorithms are designed for a specific kind of distribution shift and lack the flexibility to handle multiple kinds of distribution shifts in FL. In this paper, we find that this lack of flexibility partially results from their pre-defining which modules to adapt in the model. To tackle this challenge, we propose a novel algorithm called ATP to adaptively learns the adaptation rates for each module in the model from distribution shifts among source domains. Theoretical analysis proves the strong generalization of ATP. Extensive experiments demonstrate its superiority in handling various distribution shifts including label shift, image corruptions, and domain shift, outperforming existing TTA methods across multiple datasets and model architectures. Our code is available at https://github.com/baowenxuan/ATP .
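A minimal sketch of the idea behind ATP as described in the abstract: unsupervised test-time adaptation in which each module of the model is adapted at its own rate. Here entropy minimization stands in for the unsupervised objective and the per-module rates are hypothetical constants; in ATP these rates are learned from distribution shifts among source clients.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, x_unlabeled, module_rates, steps=1):
    """Unsupervised test-time adaptation with a separate adaptation rate per
    module (entropy minimization is a stand-in for the unsupervised loss)."""
    groups = [{"params": m.parameters(), "lr": module_rates[name]}
              for name, m in model.named_children() if module_rates.get(name, 0.0) > 0]
    optimizer = torch.optim.SGD(groups, lr=1e-3)        # per-group lr overrides the default
    for _ in range(steps):
        probs = F.softmax(model(x_unlabeled), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return model

# Toy classifier; the per-module rates here are hypothetical constants,
# whereas ATP learns them from shifts among the source clients.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4))
x = torch.randn(64, 16)
test_time_adapt(model, x, module_rates={"0": 1e-3, "2": 1e-4})
```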

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

  • paper_url: http://arxiv.org/abs/2310.18814
  • repo_url: None
  • paper_authors: Yan Wang, Huaiqing Wu, Dan Nettleton
  • for: Studies the stability of random forests and uses it to construct prediction intervals with coverage guarantees.
  • methods: Analyzes the practical implementation of random forests (as in popular packages such as randomForest in R) using tools from mathematical statistics.
  • results: Shows that random forests are stable under mild conditions and can provide both sound point predictions and prediction intervals whose coverage probability is bounded from below (and, under a further mild condition, from above).
    Abstract We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with its stability property, is an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.
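A small scikit-learn/NumPy sketch of the kind of out-of-bag prediction interval whose coverage the paper analyzes: quantiles of the OOB residuals shift the point predictions. The data, quantile levels, and forest settings are illustrative; the paper's exact interval construction may differ in detail.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.5, size=500)

# Fit a forest with out-of-bag predictions enabled.
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

# Quantiles of the out-of-bag residuals give the interval shifts.
oob_resid = y - rf.oob_prediction_
alpha = 0.1
lo, hi = np.quantile(oob_resid, [alpha / 2, 1 - alpha / 2])

# Nominal (1 - alpha) prediction intervals for new points: point prediction
# shifted by the OOB residual quantiles.
X_new = rng.normal(size=(5, 5))
pred = rf.predict(X_new)
print(np.column_stack([pred + lo, pred + hi]))
```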

The Synergy of Speculative Decoding and Batching in Serving Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18813
  • repo_url: None
  • paper_authors: Qidong Su, Christina Giannoula, Gennady Pekhimenko
  • for: Studies the synergy of batching and speculative decoding to improve GPU hardware utilization in large language model (LLM) inference.
  • methods: Implements a prototype combining batching and speculative decoding and performs an extensive characterization across various LLM models and GPU architectures.
  • results: Finds that the optimal speculation length depends on the batch size, and proposes an adaptive speculative decoding strategy that matches or outperforms fixed-length speculation schemes.
    Abstract Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines. However, LLM execution is inherently sequential, since they only produce one token at a time, thus incurring low hardware utilization on modern GPUs. Batching and speculative decoding are two techniques to improve GPU hardware utilization in LLM inference. To study their synergy, we implement a prototype implementation and perform an extensive characterization analysis on various LLM models and GPU architectures. We observe that the optimal speculation length depends on the batch size used. We analyze the key observation and build a quantitative model to explain it. Based on our analysis, we propose a new adaptive speculative decoding strategy that chooses the optimal speculation length for different batch sizes. Our evaluations show that our proposed method can achieve equal or better performance than the state-of-the-art speculation decoding schemes with fixed speculation length.
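A toy sketch of a speculative-decoding step in which the speculation length is chosen as a function of batch size, the knob the paper's adaptive strategy tunes. The draft/target "models" are placeholder callables and the batch-size-to-length table is hypothetical; the paper derives the optimal length from a quantitative performance model instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch-size -> speculation-length table; the paper's adaptive
# strategy derives the optimal length from a quantitative model instead.
def speculation_length(batch_size):
    return 8 if batch_size <= 2 else 4 if batch_size <= 8 else 2

def draft_next(prefix):                 # placeholder for a cheap draft model
    return int(rng.integers(0, 100))

def target_accepts(prefix, token):      # placeholder for verification by the target model
    return rng.random() < 0.7           # pretend ~70% of draft tokens are accepted

def speculative_step(prefix, batch_size):
    """One speculative step: draft k tokens, have the target model verify
    them, and keep the longest accepted prefix (plus one corrected token)."""
    k = speculation_length(batch_size)
    out = list(prefix)
    for _ in range(k):
        token = draft_next(out)
        if target_accepts(out, token):
            out.append(token)
        else:
            out.append(draft_next(out))  # stand-in for the target model's own sample
            break
    return out

print(speculative_step(prefix=[1, 2, 3], batch_size=4))
```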

Inverse distance weighting attention

  • paper_url: http://arxiv.org/abs/2310.18805
  • repo_url: https://github.com/calvinmccarter/idw-attention
  • paper_authors: Calvin McCarter
  • for: Studies the effect of replacing scaled dot-product attention (within the softmax) with the negative log of the Euclidean distance.
  • methods: This form of attention simplifies to inverse distance weighting interpolation; it is used in simple one-hidden-layer networks trained with vanilla cross-entropy loss on classification problems.
  • results: The resulting interpretable networks tend to produce a key matrix containing prototypes and a value matrix with corresponding logits, and can be augmented with manually constructed prototypes for low-impact handling of special cases.
    Abstract We report the effects of replacing the scaled dot-product (within softmax) attention with the negative-log of Euclidean distance. This form of attention simplifies to inverse distance weighting interpolation. Used in simple one hidden layer networks and trained with vanilla cross-entropy loss on classification problems, it tends to produce a key matrix containing prototypes and a value matrix with corresponding logits. We also show that the resulting interpretable networks can be augmented with manually-constructed prototypes to perform low-impact handling of special cases.
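A short NumPy sketch of the attention rule described above: scoring keys by the negative log of the Euclidean distance and applying a softmax reduces exactly to inverse-distance-weighting interpolation (the assertion below checks the equivalence). The toy keys and values are illustrative.

```python
import numpy as np

def idw_attention(query, keys, values, eps=1e-8):
    """Attention with scores = -log(Euclidean distance): after the softmax
    this is exactly inverse-distance-weighting interpolation."""
    d = np.linalg.norm(keys - query, axis=1) + eps       # distances to each key
    w_softmax = np.exp(-np.log(d))
    w_softmax /= w_softmax.sum()
    w_idw = (1.0 / d) / (1.0 / d).sum()                  # classic IDW weights
    assert np.allclose(w_softmax, w_idw)                 # the two coincide
    return w_idw @ values

# Toy example: keys act as prototypes, values as their logits.
keys = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
values = np.array([[2.0, -1.0], [0.0, 1.0], [-3.0, 0.5]])
print(idw_attention(np.array([0.9, 1.1]), keys, values))
```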

Weakly Coupled Deep Q-Networks

  • paper_url: http://arxiv.org/abs/2310.18803
  • repo_url: None
  • paper_authors: Ibrahim El Shar, Daniel R. Jiang
  • for: Improving deep reinforcement learning performance on weakly coupled Markov decision processes (WCMDPs), which consist of multiple independent subproblems linked by an action-space constraint.
  • methods: Uses a single network to train multiple DQN "subagents", one per subproblem, and combines their solutions into an upper bound on the optimal action value that guides the main DQN agent toward optimality.
  • results: Shows faster convergence than DQN and related techniques in settings with as many as 10 subproblems, $3^{10}$ total actions, and a continuous state space.
    Abstract We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDP). WCMDPs consist of multiple independent subproblems connected by an action space constraint, which is a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combine their solutions to establish an upper bound on the optimal action value. This guides the main DQN agent towards optimality. We show that the tabular version, weakly coupled Q-learning (WCQL), converges almost surely to the optimal action value. Numerical experiments show faster convergence compared to DQN and related techniques in settings with as many as 10 subproblems, $3^{10}$ total actions, and a continuous state space.

A Competitive Algorithm for Agnostic Active Learning

  • paper_url: http://arxiv.org/abs/2310.18786
  • repo_url: None
  • paper_authors: Eric Price, Yihan Zhou
  • for: Agnostic active learning for any binary hypothesis class $H$ and distribution $D_X$ over $X$.
  • methods: Takes a splitting-based approach, in the vein of Dasgupta [2004], rather than relying on the disagreement coefficient as existing algorithms do.
  • results: If any algorithm can achieve $O(\eta)$ error with $m^*$ queries, the proposed algorithm achieves $O(\eta)$ error with $O(m^* \log |H|)$ queries, and it is NP-hard to improve on this $O(\log |H|)$ overhead in general.
    Abstract For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

  • paper_url: http://arxiv.org/abs/2310.18784
  • repo_url: None
  • paper_authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
  • for: Studies the high-probability convergence of a broad class of nonlinear SGD methods.
  • methods: Establishes high-probability convergence bounds covering most practical nonlinear SGD variants, such as clipping, normalization, and quantization, under heavy-tailed noise.
  • results: For strongly convex losses with Lipschitz continuous gradients, proves a logarithmic dependence on the failure probability even when the noise is heavy-tailed, and relaxes the noise moment assumptions of prior work (allowing $\eta = 1$).
    Abstract Several recent works have studied the convergence \textit{in high probability} of stochastic gradient descent (SGD) and its clipped variant. Compared to vanilla SGD, clipped SGD is practically more stable and has the additional theoretical benefit of logarithmic dependence on the failure probability. However, the convergence of other practical nonlinear variants of SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved communication efficiency or accelerated convergence is much less understood. In this work, we study the convergence bounds \textit{in high probability} of a broad class of nonlinear SGD methods. For strongly convex loss functions with Lipschitz continuous gradients, we prove a logarithmic dependence on the failure probability, even when the noise is heavy-tailed. Strictly more general than the results for clipped SGD, our results hold for any nonlinearity with bounded (component-wise or joint) outputs, such as clipping, normalization, and quantization. Further, existing results with heavy-tailed noise assume bounded $\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined analysis works even for $\eta=1$, strictly relaxing the noise moment assumptions in the literature.
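A NumPy sketch of the family of methods the analysis covers: plain SGD in which the stochastic gradient is passed through a bounded nonlinearity such as clipping, normalization, or sign. The toy strongly convex objective and the Student-t noise (finite mean, infinite variance) are illustrative assumptions.

```python
import numpy as np

def clip(g, tau=1.0):
    """Joint (norm) clipping: bounded output, as the analysis requires."""
    n = np.linalg.norm(g)
    return g if n <= tau else tau * g / n

def normalize(g, eps=1e-12):
    return g / (np.linalg.norm(g) + eps)

def sign(g):
    """Component-wise sign nonlinearity (signSGD)."""
    return np.sign(g)

def nonlinear_sgd(x0, stoch_grad, nonlinearity, lr=0.05, steps=500):
    """Plain nonlinear SGD: the raw stochastic gradient is passed through a
    bounded nonlinearity before the update."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * nonlinearity(stoch_grad(x))
    return x

# Strongly convex toy problem f(x) = 0.5 * ||x||^2 with heavy-tailed noise
# (Student-t with df=1.5: finite mean but infinite variance).
rng = np.random.default_rng(0)
stoch_grad = lambda x: x + rng.standard_t(df=1.5, size=x.shape)
print(nonlinear_sgd(np.ones(5), stoch_grad, clip))
print(nonlinear_sgd(np.ones(5), stoch_grad, sign))
```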

A Data-driven Recommendation Framework for Optimal Walker Designs

  • paper_url: http://arxiv.org/abs/2310.18772
  • repo_url: None
  • paper_authors: Advaith Narayanan
  • for: Optimizing a medical walker, an integral part of gait rehabilitation and physiological therapy of the lower extremities.
  • methods: Trains an automated machine learning model with a stacked-ensemble approach to predict walker performance, releases a dataset of more than 5,000 parametric walker designs with performance values for training such predictive models, and searches the 16-dimensional design space with MultiObjective Counterfactuals for Design (MCD), a genetic-based optimization algorithm.
  • results: Presents potential walker designs demonstrating up to a 30% mass reduction while increasing structural stability and integrity.
    Abstract The rapidly advancing fields of statistical modeling and machine learning have significantly enhanced data-driven design and optimization. This paper focuses on leveraging these design algorithms to optimize a medical walker, an integral part of gait rehabilitation and physiological therapy of the lower extremities. To achieve the desirable qualities of a walker, we train a predictive machine-learning model to identify trade-offs between performance objectives, thus enabling the use of efficient optimization algorithms. To do this, we use an Automated Machine Learning model utilizing a stacked-ensemble approach shown to outperform traditional ML models. However, training a predictive model requires vast amounts of data for accuracy. Due to limited publicly available walker designs, this paper presents a dataset of more than 5,000 parametric walker designs with performance values to assess mass, structural integrity, and stability. These performance values include displacement vectors for the given load case, stress coefficients, mass, and other physical properties. We also introduce a novel method of systematically calculating the stability index of a walker. We use MultiObjective Counterfactuals for Design (MCD), a novel genetic-based optimization algorithm, to explore the diverse 16-dimensional design space and search for high-performing designs based on numerous objectives. This paper presents potential walker designs that demonstrate up to a 30% mass reduction while increasing structural stability and integrity. This work takes a step toward the improved development of assistive mobility devices.

Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

  • paper_url: http://arxiv.org/abs/2310.18765
  • repo_url: https://github.com/yanliang3612/revar
  • paper_authors: Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang
  • for: Addressing the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data.
  • methods: Integrates imbalanced node classification and Bias-Variance Decomposition, leverages graph augmentation technique to estimate the variance, and designs a regularization term to alleviate the impact of imbalance.
  • results: Outperforms state-of-the-art methods in various imbalanced scenarios, providing a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.
    Abstract This paper introduces a new approach to address the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage graph augmentation technique to estimate the variance, and design a regularization term to alleviate the impact of imbalance. Exhaustive tests are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.

Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness

  • paper_url: http://arxiv.org/abs/2310.18762
  • repo_url: None
  • paper_authors: Boya Zhang, Weijian Luo, Zhihua Zhang
  • for: Defending neural network classifiers against adversarial attacks, an important problem for AI safety.
  • methods: Diffusion purification, improved along three axes: an improved diffusion model, advanced numerical simulation techniques, and optimal control of randomness.
  • results: The resulting Purify++ algorithm improves defense against several adversarial attacks and is the state-of-the-art purification method.
    Abstract Adversarial attacks can mislead neural network classifiers. The defense against adversarial attacks is important for AI safety. Adversarial purification is a family of approaches that defend adversarial attacks with suitable pre-processing. Diffusion models have been shown to be effective for adversarial purification. Despite their success, many aspects of diffusion purification still remain unexplored. In this paper, we investigate and improve upon three limiting designs of diffusion purification: the use of an improved diffusion model, advanced numerical simulation techniques, and optimal control of randomness. Based on our findings, we propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks. Our work presents a systematic exploration of the limits of diffusion purification methods.
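A hedged PyTorch sketch of the generic diffusion-purification skeleton that Purify++ improves on: diffuse the (possibly attacked) input forward to an intermediate time with a VP-SDE, then integrate the reverse SDE back to time zero before classification. The noise schedule, the Euler-Maruyama solver, and the placeholder score model are illustrative assumptions; Purify++'s specific diffusion model, numerical simulation, and randomness control are described in the paper.

```python
import math
import torch

@torch.no_grad()
def diffusion_purify(x, score_model, t_star=0.3, n_steps=30):
    """Diffuse the input to time t_star under a VP-SDE, then integrate the
    reverse SDE back to t = 0 with Euler-Maruyama. `score_model(x, t)` is
    assumed to return the score of the noised data distribution."""
    beta = lambda t: 0.1 + t * (20.0 - 0.1)              # linear noise schedule
    int_beta = 0.1 * t_star + 0.5 * (20.0 - 0.1) * t_star ** 2
    # Forward perturbation to t_star (closed form for the VP-SDE).
    xt = math.exp(-0.5 * int_beta) * x \
         + math.sqrt(1.0 - math.exp(-int_beta)) * torch.randn_like(x)
    # Reverse-time SDE, integrated from t_star back to 0.
    dt = t_star / n_steps
    t = t_star
    for _ in range(n_steps):
        b = beta(t)
        drift = -0.5 * b * xt - b * score_model(xt, t)
        xt = xt - drift * dt + math.sqrt(b * dt) * torch.randn_like(xt)
        t -= dt
    return xt

# Usage with a placeholder score model (a real one is a trained network);
# the purified batch would then be passed to the classifier.
score_model = lambda x, t: -x        # score of a standard normal, for illustration
x_adv = torch.randn(4, 3, 32, 32)    # batch of (possibly attacked) images
x_purified = diffusion_purify(x_adv, score_model)
```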

Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

  • paper_url: http://arxiv.org/abs/2310.18743
  • repo_url: None
  • paper_authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat
  • for: Estimation and optimization of utility-based shortfall risk (UBSR), a popular risk measure in finance.
  • methods: Uses the classical sample average approximation (SAA) to estimate UBSR and derives a non-asymptotic bound on its mean-squared error; for optimization, derives an expression for the UBSR gradient as a ratio of two expectations (both involving UBSR) and approximates numerator and denominator with SAA to obtain a biased gradient estimator.
  • results: Derives non-asymptotic bounds showing that the gradient estimator is asymptotically unbiased, and non-asymptotic bounds quantifying the convergence rate of the resulting stochastic gradient (SG) algorithm for UBSR optimization.
    Abstract We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.
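A NumPy sketch of the SAA estimator of UBSR referenced above: replace the expectation in the defining inequality with a sample mean and solve the resulting monotone root-finding problem by bisection. The exponential loss and unit threshold are illustrative choices, not the paper's.

```python
import numpy as np

def ubsr_saa(samples, loss=np.exp, threshold=1.0, lo=-50.0, hi=50.0, tol=1e-8):
    """SAA estimate of utility-based shortfall risk: the smallest t with
    mean_i loss(-X_i - t) <= threshold. For an increasing loss the map
    t -> mean loss(-X - t) is nonincreasing, so bisection finds the root."""
    samples = np.asarray(samples, dtype=float)
    excess = lambda t: loss(-samples - t).mean() - threshold
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid                  # constraint violated: need a larger t
        else:
            hi = mid
    return hi

# Toy example: Gaussian P&L with exponential loss and threshold 1, for which
# the exact UBSR is log E[exp(-X)] = -mu + sigma^2 / 2 = 0.4.
rng = np.random.default_rng(0)
pnl = rng.normal(loc=0.1, scale=1.0, size=100_000)
print(ubsr_saa(pnl))
```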

Curriculum Learning for Graph Neural Networks: Which Edges Should We Learn First

  • paper_url: http://arxiv.org/abs/2310.18735
  • repo_url: https://github.com/rollingstonezz/curriculum_learning_for_gnns
  • paper_authors: Zheng Zhang, Junxiang Wang, Liang Zhao
  • For: Proposes a novel curriculum learning strategy for graph neural networks that gradually incorporates edges of the graph into training, from easy to hard, to improve the generalization ability and robustness of learned representations.
  • Methods: Measures the difficulty of an edge by how well it is expected given the model's training status, and incrementally adds edges to training in order of increasing difficulty.
  • Results: Extensive experiments on nine synthetic datasets and nine real-world datasets demonstrate the strength of the proposed method in improving generalization ability and robustness.
    Abstract Graph Neural Networks (GNNs) have achieved great success in representing data with dependencies by recursively propagating and aggregating messages along the edges. However, edges in real-world graphs often have varying degrees of difficulty, and some edges may even be noisy to the downstream tasks. Therefore, existing GNNs may lead to suboptimal learned representations because they usually treat every edge in the graph equally. On the other hand, Curriculum Learning (CL), which mimics the human learning principle of learning data samples in a meaningful order, has been shown to be effective in improving the generalization ability and robustness of representation learners by gradually proceeding from easy to more difficult samples during training. Unfortunately, existing CL strategies are designed for independent data samples and cannot trivially generalize to handle data dependencies. To address these issues, we propose a novel CL strategy to gradually incorporate more edges into training according to their difficulty from easy to hard, where the degree of difficulty is measured by how well the edges are expected given the model training status. We demonstrate the strength of our proposed method in improving the generalization ability and robustness of learned representations through extensive experiments on nine synthetic datasets and nine real-world datasets. The code for our proposed method is available at https://github.com/rollingstonezz/Curriculum_learning_for_GNNs.

Latent class analysis by regularized spectral clustering

  • paper_url: http://arxiv.org/abs/2310.18727
  • repo_url: None
  • paper_authors: Huan Qing
  • for: Proposes two new algorithms to estimate the latent class model for categorical data.
  • methods: Both algorithms are built on a newly defined regularized Laplacian matrix computed from the response matrix; theoretical convergence rates are provided under a sparsity parameter, showing that the algorithms stably yield consistent latent class analysis under mild conditions.
  • results: Extensive simulations verify the efficiency and accuracy of the algorithms, and applications to real-world categorical data give promising results; a metric capturing the strength of latent class analysis, with procedures based on it, is also proposed to infer how many latent classes to use.
    Abstract The latent class model is a powerful tool for identifying latent classes within populations that share common characteristics for categorical data in social, psychological, and behavioral sciences. In this article, we propose two new algorithms to estimate a latent class model for categorical data. Our algorithms are developed by using a newly defined regularized Laplacian matrix calculated from the response matrix. We provide theoretical convergence rates of our algorithms by considering a sparsity parameter and show that our algorithms stably yield consistent latent class analysis under mild conditions. Additionally, we propose a metric to capture the strength of latent class analysis and several procedures designed based on this metric to infer how many latent classes one should use for real-world categorical data. The efficiency and accuracy of our algorithms are verified by extensive simulated experiments, and we further apply our algorithms to real-world categorical data with promising results.
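A generic NumPy/scikit-learn sketch of regularized spectral clustering applied to a binary response matrix, the family of methods the paper's algorithms belong to: build a regularized (bipartite) Laplacian from the response matrix, embed subjects with its leading singular vectors, and cluster with k-means. The specific regularizer and the toy two-class data are illustrative; the paper's regularized Laplacian and algorithms may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def regularized_spectral_lca(R, n_classes, tau=None, seed=0):
    """Latent-class estimation from an N x J binary response matrix R via
    regularized spectral clustering (a generic construction)."""
    R = np.asarray(R, dtype=float)
    row_deg, col_deg = R.sum(axis=1), R.sum(axis=0)
    if tau is None:
        tau = row_deg.mean()                             # common default regularizer
    # Regularized (bipartite) Laplacian built from the response matrix.
    L = np.diag(1.0 / np.sqrt(row_deg + tau)) @ R @ np.diag(1.0 / np.sqrt(col_deg + tau))
    # Embed subjects with the leading left singular vectors, then cluster.
    U = np.linalg.svd(L, full_matrices=False)[0][:, :n_classes]
    return KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit_predict(U)

# Toy data: two latent classes with different item-response probabilities.
rng = np.random.default_rng(0)
item_probs = np.array([[0.9, 0.8, 0.2, 0.1],
                       [0.2, 0.1, 0.9, 0.8]])
z = rng.integers(0, 2, size=300)                         # true latent classes
R = rng.binomial(1, item_probs[z])                       # 300 x 4 responses
print(adjusted_rand_score(z, regularized_spectral_lca(R, n_classes=2)))
```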

On the Accuracy of Hotelling-Type Asymmetric Tensor Deflation: A Random Tensor Analysis

  • paper_url: http://arxiv.org/abs/2310.18717
  • repo_url: None
  • paper_authors: Mohamed El Amine Seddik, Maxime Guillaud, Alexis Decurninge, José Henrique de Morais Goulart
  • For: This paper studies the asymptotic behavior of Hotelling-type tensor deflation in the presence of noise, specifically in the regime of large tensor dimensions.
  • Methods: The paper uses recent advances in random tensor theory to analytically characterize the estimated singular values and the alignment of estimated and true singular vectors at each step of the deflation procedure.
  • Results: The paper shows that these characterizations can be used to construct estimators of the signal-to-noise ratios and of the alignments between the estimated and true rank-1 signal components.
    Abstract This work introduces an asymptotic study of Hotelling-type tensor deflation in the presence of noise, in the regime of large tensor dimensions. Specifically, we consider a low-rank asymmetric tensor model of the form $\sum_{i=1}^r \beta_i \mathcal{A}_i + \mathcal{W}$ where $\beta_i \geq 0$ and the $\mathcal{A}_i$'s are unit-norm rank-one tensors such that $\left| \langle \mathcal{A}_i, \mathcal{A}_j \rangle \right| \in [0, 1]$ for $i \neq j$ and $\mathcal{W}$ is an additive noise term. Assuming that the dominant components are successively estimated from the noisy observation and subsequently subtracted, we leverage recent advances in random tensor theory in the regime of asymptotically large tensor dimensions to analytically characterize the estimated singular values and the alignment of estimated and true singular vectors at each step of the deflation procedure. Furthermore, this result can be used to construct estimators of the signal-to-noise ratios $\beta_i$ and the alignments between the estimated and true rank-1 signal components.
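A NumPy sketch of Hotelling-type deflation for a third-order tensor: estimate the dominant rank-1 component (here with alternating power iteration), subtract it, and repeat. The paper analyzes the accuracy of this kind of procedure under additive noise; the solver and the toy rank-2 example below are illustrative.

```python
import numpy as np

def rank1_power_iteration(T, n_iter=100, seed=0):
    """Dominant rank-1 component (sigma, u, v, w) of a third-order tensor
    via alternating power iteration."""
    rng = np.random.default_rng(seed)
    u, v, w = (rng.normal(size=n) for n in T.shape)
    u, v, w = u / np.linalg.norm(u), v / np.linalg.norm(v), w / np.linalg.norm(w)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    return np.einsum('ijk,i,j,k->', T, u, v, w), u, v, w

def hotelling_deflation(T, r):
    """Estimate the dominant component, subtract it, and repeat r times."""
    components, residual = [], T.copy()
    for _ in range(r):
        sigma, u, v, w = rank1_power_iteration(residual)
        components.append((sigma, u, v, w))
        residual = residual - sigma * np.einsum('i,j,k->ijk', u, v, w)
    return components

# Toy rank-2 signal plus Gaussian noise.
rng = np.random.default_rng(1)
def unit(n):
    x = rng.normal(size=n)
    return x / np.linalg.norm(x)
T = 5.0 * np.einsum('i,j,k->ijk', unit(30), unit(40), unit(50)) \
    + 3.0 * np.einsum('i,j,k->ijk', unit(30), unit(40), unit(50)) \
    + 0.01 * rng.normal(size=(30, 40, 50))
strengths = sorted(round(float(s), 2) for s, *_ in hotelling_deflation(T, r=2))
print(strengths)          # recovered component strengths, roughly [3.0, 5.0]
```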

Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding

  • paper_url: http://arxiv.org/abs/2310.18716
  • repo_url: https://github.com/pku-ml/laplaciancanonization
  • paper_authors: Jiangyan Ma, Yifei Wang, Yisen Wang
  • for: Improving the effectiveness of spectral embeddings for Graph Transformers by addressing their loss of sign and basis invariance.
  • methods: Laplacian Canonization (LC), a lightweight pre-processing method that directly finds canonical directions for the Laplacian eigenvectors and can be applied to any existing GNN; an efficient algorithm, Maximal Axis Projection (MAP), handles both sign and basis invariance.
  • results: MAP successfully canonizes more than 90% of all eigenvectors and consistently outperforms existing methods on ZINC, MOLTOX21, and MOLPCBA while adding minimal computational overhead.
    Abstract Spectral embedding is a powerful graph embedding technique that has received a lot of attention recently due to its effectiveness on Graph Transformers. However, from a theoretical perspective, the universal expressive power of spectral embedding comes at the price of losing two important invariance properties of graphs, sign and basis invariance, which also limits its effectiveness on graph data. To remedy this issue, many previous methods developed costly approaches to learn new invariants and suffer from high computation complexity. In this work, we explore a minimal approach that resolves the ambiguity issues by directly finding canonical directions for the eigenvectors, named Laplacian Canonization (LC). As a pure pre-processing method, LC is light-weighted and can be applied to any existing GNNs. We provide a thorough investigation, from theory to algorithm, on this approach, and discover an efficient algorithm named Maximal Axis Projection (MAP) that works for both sign and basis invariance and successfully canonizes more than 90% of all eigenvectors. Experiments on real-world benchmark datasets like ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing methods while bringing minimal computation overhead. Code is available at https://github.com/PKU-ML/LaplacianCanonization.

Episodic Multi-Task Learning with Heterogeneous Neural Processes

  • paper_url: http://arxiv.org/abs/2310.18713
  • repo_url: https://github.com/autumn9999/hnps
  • paper_authors: Jiayi Shen, Xiantong Zhen, Qi, Wang, Marcel Worring
  • for: Tackling data insufficiency in multi-task learning within an episodic training setup by exploiting heterogeneous information across tasks and meta-knowledge across episodes, so that each task can be handled effectively with limited data.
  • methods: Develops Heterogeneous Neural Processes (HNPs), which, within a hierarchical Bayes framework, capitalize on prior experience as meta-knowledge and capture task-relatedness among heterogeneous tasks to mitigate data insufficiency; transformer-structured inference modules enable efficient inference of meta-knowledge and task-relatedness.
  • results: Experiments show that the proposed HNPs outperform typical baselines, and ablation studies verify the effectiveness of the designed inference modules.
    Abstract This paper focuses on the data-insufficiency problem in multi-task learning within an episodic training setup. Specifically, we explore the potential of heterogeneous information across tasks and meta-knowledge among episodes to effectively tackle each task with limited data. Existing meta-learning methods often fail to take advantage of crucial heterogeneous information in a single episode, while multi-task learning models neglect reusing experience from earlier episodes. To address the problem of insufficient data, we develop Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within the framework of hierarchical Bayes, HNPs effectively capitalize on prior experiences as meta-knowledge and capture task-relatedness among heterogeneous tasks, mitigating data-insufficiency. Meanwhile, transformer-structured inference modules are designed to enable efficient inferences toward meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful functional priors for adapting to novel heterogeneous tasks in each meta-test episode. Experimental results show the superior performance of the proposed HNPs over typical baselines, and ablation studies verify the effectiveness of the designed inference modules.

ALERTA-Net: A Temporal Distance-Aware Recurrent Networks for Stock Movement and Volatility Prediction

  • paper_url: http://arxiv.org/abs/2310.18706
  • repo_url: https://github.com/hao1zhao/alerta-net
  • paper_authors: Shengkun Wang, YangXiao Bai, Kaiqun Fu, Linhan Wang, Chang-Tien Lu, Taoran Ji
  • for: Forecasting stock market movement and volatility, an important indicator of economic well-being for both investors and policymakers.
  • methods: Integrates sentiment analysis of social media data, macroeconomic indicators, search engine data, and historical prices within a multi-attention deep learning model.
  • results: Achieves state-of-the-art performance on a dataset specifically curated for predicting stock market movements and volatility.
    Abstract For both investors and policymakers, forecasting the stock market is essential as it serves as an indicator of economic well-being. To this end, we harness the power of social media data, a rich source of public sentiment, to enhance the accuracy of stock market predictions. Diverging from conventional methods, we pioneer an approach that integrates sentiment analysis, macroeconomic indicators, search engine data, and historical prices within a multi-attention deep learning model, masterfully decoding the complex patterns inherent in the data. We showcase the state-of-the-art performance of our proposed model using a dataset, specifically curated by us, for predicting stock market movements and volatility.

Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning

  • paper_url: http://arxiv.org/abs/2310.19831
  • repo_url: https://github.com/alihanhyk/interpole
  • paper_authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
  • for: Building interpretable models of human decision-making behavior from observed data, to improve transparency and accountability in decision-making.
  • methods: Proposes Interpole, a model-based Bayesian method for interpretable policy learning that jointly estimates a decision-maker's (possibly biased) belief-update process and their (possibly suboptimal) belief-action mapping, operating completely offline and under partial observability.
  • results: Experiments on simulated and real-world data for Alzheimer's disease diagnosis illustrate the approach's potential as an investigative device for auditing, quantifying, and understanding human decision-making behavior.
    Abstract Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging -- with no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire learning a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline. To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning ("Interpole") that jointly estimates an agent's (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping. Through experiments on both simulated and real-world data for the problem of Alzheimer's disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

Towards Combinatorial Generalization for Catalysts: A Kohn-Sham Charge-Density Approach

  • paper_url: http://arxiv.org/abs/2310.18702
  • repo_url: https://github.com/ppope/rho-learn
  • paper_authors: Phillip Pope, David Jacobs
  • for: Investigates pointwise learning of the Kohn-Sham charge density as a route to predicting and designing new catalyst materials.
  • methods: Learns charge densities of bulk catalysts pointwise, using a new dataset of bulk catalysts with charge densities, and evaluates generalization to new structures containing combinations of elements not seen at training time (combinatorial generalization).
  • results: Density models exhibit combinatorial generalization; over 80% of binary and ternary test cases converge faster than standard Density Functional Theory baselines, with an average 13% reduction in the number of iterations to convergence, which may be of independent interest.
    Abstract The Kohn-Sham equations underlie many important applications such as the discovery of new catalysts. Recent machine learning work on catalyst modeling has focused on prediction of the energy, but has so far not yet demonstrated significant out-of-distribution generalization. Here we investigate another approach based on the pointwise learning of the Kohn-Sham charge-density. On a new dataset of bulk catalysts with charge densities, we show density models can generalize to new structures with combinations of elements not seen at train time, a form of combinatorial generalization. We show that over 80% of binary and ternary test cases achieve faster convergence than standard baselines in Density Functional Theory, amounting to an average reduction of 13% in the number of iterations required to reach convergence, which may be of independent interest. Our results suggest that density learning is a viable alternative, trading greater inference costs for a step towards combinatorial generalization, a key property for applications.

Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

  • paper_url: http://arxiv.org/abs/2310.18701
  • repo_url: None
  • paper_authors: Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang
  • For: Investigate the problem of generalized linear bandits with heavy-tailed rewards, and propose two novel algorithms based on truncation and mean of medians to address this issue.
  • Methods: Propose two algorithms, one based on truncation and the other based on mean of medians, that achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon}})$ with online learning support and lower computational complexity.
  • Results: Improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$, and confirm the merits of the proposed algorithms through numerical experimental results.
    Abstract This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose $(1+\epsilon)$-th moment is bounded for some $\epsilon\in (0,1]$. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only $O(\log T)$ rewards and one estimator per epoch, making it more practical. Moreover, our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$. Numerical experimental results confirm the merits of our algorithms.
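A NumPy sketch of the two robust reward-estimation primitives named in the abstract, truncation and mean of medians, under one plausible reading of the names; the paper's exact truncation levels, grouping, and their use inside the bandit algorithms may differ. The symmetric heavy-tailed Student-t rewards are illustrative.

```python
import numpy as np

def truncated_mean(rewards, level):
    """Truncation-based estimate: clip each reward at +/- level, then average
    (in the paper the truncation level grows with time)."""
    return float(np.clip(np.asarray(rewards, dtype=float), -level, level).mean())

def mean_of_medians(rewards, n_groups):
    """Mean-of-medians estimate: split the rewards into groups, take each
    group's median, and average the medians."""
    groups = np.array_split(np.asarray(rewards, dtype=float), n_groups)
    return float(np.mean([np.median(g) for g in groups]))

# Symmetric heavy-tailed rewards: Student-t noise with df = 1.3 has a finite
# mean but infinite variance (only low-order moments are bounded).
rng = np.random.default_rng(0)
rewards = 1.0 + rng.standard_t(df=1.3, size=2_000)
print(np.mean(rewards), truncated_mean(rewards, level=20.0),
      mean_of_medians(rewards, n_groups=40))
```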

Clairvoyance: A Pipeline Toolkit for Medical Time Series

  • paper_url: http://arxiv.org/abs/2310.18688
  • repo_url: https://github.com/vanderschaarlab/clairvoyance
  • paper_authors: Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar
  • for: Providing a unified, end-to-end, AutoML-friendly pipeline for data-driven clinical decision support with medical time series, covering personalized prediction, treatment-effect estimation, and information acquisition.
  • methods: A software toolkit, empirical standard, and optimization interface that orchestrates components for data preprocessing, missing-value imputation, feature selection, prediction, uncertainty estimation, and model interpretation.
  • results: Illustrative examples on real-world data in outpatient, general-ward, and intensive-care settings demonstrate the applicability of the pipeline paradigm to core tasks in the healthcare journey.
    Abstract Time-series learning is the bread and butter of data-driven *clinical decision support*, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly *composite* nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize). Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition. Through illustrative examples on real-world data in outpatient, general wards, and intensive-care settings, we illustrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.

DySurv: Dynamic Deep Learning Model for Survival Prediction in the ICU

  • paper_url: http://arxiv.org/abs/2310.18681
  • repo_url: None
  • paper_authors: Munib Mesinovic, Peter Watkinson, Tingting Zhu
  • for: Dynamic mortality risk prediction in the ICU via deep-learning-based survival analysis.
  • methods: DySurv, a novel conditional variational autoencoder-based method that uses a combination of static and time-series measurements from patient electronic health records to estimate the risk of death dynamically.
  • results: DySurv outperforms most existing methods, including other deep learning methods, on standard benchmarks and is further evaluated on the real-world MIMIC-IV patient database; its predictive capacity is consistent and the survival estimates remain disentangled across datasets, supporting the robustness of dynamic deep learning models for survival analysis.
    Abstract Survival analysis helps approximate underlying distributions of time-to-events which in the case of critical care like in the ICU can be a powerful tool for dynamic mortality risk prediction. Extending beyond the classical Cox model, deep learning techniques have been leveraged over the last years relaxing the many constraints of their counterparts from statistical methods. In this work, we propose a novel conditional variational autoencoder-based method called DySurv which uses a combination of static and time-series measurements from patient electronic health records in estimating risk of death dynamically in the ICU. DySurv has been tested on standard benchmarks where it outperforms most existing methods including other deep learning methods and we evaluate it on a real-world patient database from MIMIC-IV. The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets supporting the idea that dynamic deep learning models based on conditional variational inference in multi-task cases can be robust models for survival analysis.
    摘要 生存分析可以近似时间到事件的分布,在 ICU 等重症监护场景中可以成为动态死亡风险预测的有力工具。近年来,深度学习技术被用于生存分析,突破了传统统计方法的诸多限制。在这项工作中,我们提出了一种名为 DySurv 的基于条件变分自编码器的新方法,利用患者电子病历中的静态和时间序列测量来动态估计 ICU 中的死亡风险。DySurv 已在标准基准上测试,优于包括其他深度学习方法在内的大多数现有方法,并在来自 MIMIC-IV 的真实患者数据库上进行了评估。DySurv 的预测能力保持一致,生存估计在不同数据集之间保持分离,支持基于条件变分推断的多任务动态深度学习模型可以成为生存分析的稳健模型。

Energy-Based Models for Anomaly Detection: A Manifold Diffusion Recovery Approach

  • paper_url: http://arxiv.org/abs/2310.18677
  • repo_url: None
  • paper_authors: Sangwoong Yoon, Young-Uk Jin, Yung-Kyun Noh, Frank C. Park
  • for: 这篇论文是用于侦测异常(Anomaly Detection)的新方法。
  • methods: 这篇论文的方法是先沿着近似训练资料的低维流形扰动资料点,再训练 EBM 以最大化还原原始资料的机率,并以所得的能量函数进行异常侦测。
  • results: 实验结果显示,这篇论文的方法可以在不同的资料类型和侦测任务中具有优秀的表现。
    Abstract We present a new method of training energy-based models (EBMs) for anomaly detection that leverages low-dimensional structures within data. The proposed algorithm, Manifold Projection-Diffusion Recovery (MPDR), first perturbs a data point along a low-dimensional manifold that approximates the training dataset. Then, EBM is trained to maximize the probability of recovering the original data. The training involves the generation of negative samples via MCMC, as in conventional EBM training, but from a different distribution concentrated near the manifold. The resulting near-manifold negative samples are highly informative, reflecting relevant modes of variation in data. An energy function of MPDR effectively learns accurate boundaries of the training data distribution and excels at detecting out-of-distribution samples. Experimental results show that MPDR exhibits strong performance across various anomaly detection tasks involving diverse data types, such as images, vectors, and acoustic signals.
    摘要 我们提出了一种利用数据中低维结构来训练能量基模型(EBM)进行异常检测的新方法。所提出的算法——流形投影-扩散恢复(MPDR)——首先沿着近似训练数据集的低维流形对数据点进行扰动,然后训练 EBM 以最大化恢复原始数据的概率。与常规 EBM 训练一样,训练过程通过 MCMC 生成负样本,但采样分布集中在流形附近。由此得到的近流形负样本信息量很高,反映了数据中相关的变化模式。MPDR 的能量函数能够有效学习训练数据分布的准确边界,并擅长检测分布外样本。实验结果表明,MPDR 在涉及图像、向量和声学信号等多种数据类型的各类异常检测任务中表现出色。
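As a rough illustration of the near-manifold negative sampling idea, here is a minimal PyTorch sketch. It is a simplified stand-in, not the paper's algorithm: a small autoencoder (assumed to have been fitted to the data beforehand) plays the role of the low-dimensional manifold, and a single latent-space perturbation replaces MPDR's MCMC sampling; all layer sizes and the stabiliser weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_in, d_latent = 32, 4
# enc/dec are assumed pretrained on the training data; random weights are used here only as a placeholder.
enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_latent))
dec = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(), nn.Linear(64, d_in))
energy = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

def near_manifold_negatives(x, sigma=0.5):
    # Perturb along the learned manifold: encode, add latent noise, decode.
    with torch.no_grad():
        z = enc(x)
        return dec(z + sigma * torch.randn_like(z))

for step in range(100):
    x = torch.randn(128, d_in)                      # stand-in for a training batch
    x_neg = near_manifold_negatives(x)
    e_pos, e_neg = energy(x), energy(x_neg)
    # Contrastive objective: low energy on data, high energy on near-manifold negatives.
    loss = e_pos.mean() - e_neg.mean() + 0.1 * (e_pos ** 2 + e_neg ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

scores = energy(torch.randn(8, d_in)).squeeze(-1)   # high energy flags out-of-distribution inputs
```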

Maximum Independent Set: Self-Training through Dynamic Programming

  • paper_url: http://arxiv.org/abs/2310.18672
  • repo_url: None
  • paper_authors: Lorenzo Brusca, Lars C. P. M. Quaedvlieg, Stratis Skoulakis, Grigorios G Chrysos, Volkan Cevher
  • for: 本文提出了一种受动态规划(DP)启发、基于图神经网络(GNN)的最大独立集(MIS)问题求解方案。
  • methods: Specifically, the authors propose a DP-like recursive algorithm based on GNNs that first constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call.
  • results: The authors provide numerical evidence showing the superiority of their method compared to prior methods on multiple synthetic and real-world datasets.
    Abstract This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that firstly constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require annotated comparisons of different graphs concerning their MIS size. Annotating the comparisons with the output of our algorithm leads to a self-training process that results in more accurate self-annotation of the comparisons and vice versa. We provide numerical evidence showing the superiority of our method vs prior methods in multiple synthetic and real-world datasets.
    摘要 本文提出了一个受动态规划(DP)启发、用于求解最大独立集(MIS)问题的图神经网络(GNN)框架。具体而言,给定一个图,我们提出一种基于 GNN 的类 DP 递归算法:先构造两个较小的子图,预测其中 MIS 较大的一个,再在下一次递归调用中使用它。训练该算法需要对不同图的 MIS 大小进行标注比较;用算法自身的输出来标注这些比较,形成一个自训练过程,使比较的自标注越来越准确,反之亦然。我们在多个合成与真实数据集上提供了数值证据,表明该方法优于先前的方法。
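A minimal sketch of the DP-style recursion described above is given below. The trained GNN comparator is replaced by a simple degree-based heuristic, which is purely an illustrative assumption; the recursion itself (branch on a vertex, compare the include/exclude sub-graphs, keep the branch predicted to have the larger MIS) follows the paper's description.

```python
import networkx as nx

def heuristic_mis_score(G: nx.Graph) -> float:
    # Placeholder for the GNN that predicts which sub-graph has the larger MIS.
    return sum(1.0 / (1 + d) for _, d in G.degree())

def recursive_mis(G: nx.Graph) -> set:
    if G.number_of_nodes() == 0:
        return set()
    if G.number_of_edges() == 0:
        return set(G.nodes())                                   # all remaining nodes are independent
    v = max(G.degree, key=lambda nd: nd[1])[0]                  # branch on a high-degree node
    G_exclude = G.subgraph([u for u in G if u != v]).copy()                       # drop v
    G_include = G.subgraph([u for u in G if u != v and u not in G[v]]).copy()     # drop v and its neighbours
    # The comparator decides which branch promises the larger independent set.
    if 1 + heuristic_mis_score(G_include) >= heuristic_mis_score(G_exclude):
        return {v} | recursive_mis(G_include)
    return recursive_mis(G_exclude)

G = nx.erdos_renyi_graph(30, 0.2, seed=0)
mis = recursive_mis(G)
assert all(not G.has_edge(u, w) for u in mis for w in mis if u != w)   # result is an independent set
```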

Causal discovery in a complex industrial system: A time series benchmark

  • paper_url: http://arxiv.org/abs/2310.18654
  • repo_url: None
  • paper_authors: Søren Wengel Mogensen, Karin Rathsman, Per Nilsson
  • for: 这篇论文是用来描述如何从观测数据中推断 causal structure的。
  • methods: 这篇论文使用了一种时间序列数据的 causal discovery 方法,并对真实的工业系统进行了测试。
  • results: 论文提供了一个industrial subsystem的 causal graph,并通过专家知识来构建了这个图。这个测试环境可以帮助开发 causal discovery 方法。
    Abstract Causal discovery outputs a causal structure, represented by a graph, from observed data. For time series data, there is a variety of methods, however, it is difficult to evaluate these on real data as realistic use cases very rarely come with a known causal graph to which output can be compared. In this paper, we present a dataset from an industrial subsystem at the European Spallation Source along with its causal graph which has been constructed from expert knowledge. This provides a testbed for causal discovery from time series observations of complex systems, and we believe this can help inform the development of causal discovery methodology.
    摘要 因果发现从观测数据中输出以图表示的因果结构。对于时间序列数据已有多种方法,但由于现实用例很少附带可供比较的已知因果图,这些方法难以在真实数据上评估。本文给出了来自欧洲散裂源(European Spallation Source)某工业子系统的数据集,以及根据专家知识构建的因果图。这为复杂系统时间序列观测的因果发现提供了一个测试平台,我们相信它有助于因果发现方法的发展。

SSL Framework for Causal Inconsistency between Structures and Representations

  • paper_url: http://arxiv.org/abs/2310.18634
  • repo_url: None
  • paper_authors: Hang Chen, Xinyu Yang, Keqing Du
  • for: 这篇论文旨在探讨深度学习与因果发现的交叉领域,以揭示图像、视频、文本等非统计数据中的因果关系。
  • methods: 本文针对此类“不定数据”在理论上发展了适用的干预策略并推导了因果一致性条件(CCC),并设计了一个将干预视为“视图”、CCC 视为“理念”的自监督学习(SSL)框架,分别在监督专用模型(SSMs)和大型语言模型(LLMs)上给出实现示例。
  • results: 该文通过大量实验证明了方法的有效性,并在首个高质量因果对话数据集 Causalogue 以及三个下游任务上进行了评估。
    Abstract The cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text. Such data, often being named `indefinite data', exhibit unique challenges-inconsistency between causal structure and representation, which are not common in conventional data forms. To tackle this issue, we theoretically develop intervention strategies suitable for indefinite data and derive causal consistency condition (CCC). Moreover, we design a self-supervised learning (SSL) framework that considers interventions as `views' and CCC as a `philosophy' with two implement examples on Supervised Specialized Models (SSMs) and Large Language Models (LLMs), respectively. To evaluate pure inconsistency manifestations, we have prepared the first high-quality causal dialogue dataset-Causalogue. Evaluations are also performed on three other downstream tasks. Extensive experimentation has substantiated the efficacy of our methodology, illuminating how CCC could potentially play an influential role in various fields.
    摘要 将深度学习和 causal discovery 融合,激发了一个蓬勃的研究,旨在揭示非统计数据中的 causal 关系。这类数据,常被称为 "不定数据",具有独特的挑战 - causal 结构和表示之间的不一致。为解决这个问题,我们提出了适应于不定数据的干预策略和 causal 一致性条件(CCC)的理论发展。此外,我们还设计了一个自我监督学习(SSL)框架,在该框架中,干预被视为 "视图",CCC 被视为 "哲学",并在 Supervised Specialized Models (SSMs) 和 Large Language Models (LLMs) 中进行了两个实现例子。为了评估纯净的不一致现象,我们准备了首个高质量 causal 对话集 - Causalogue。此外,我们还在三个下游任务上进行了评估。广泛的实验证明了我们的方法的有效性,揭示了 CCC 在不同领域可能发挥的重要作用。

Explainable Modeling for Wind Power Forecasting: A Glass-Box Approach with Exceptional Accuracy

  • paper_url: http://arxiv.org/abs/2310.18629
  • repo_url: None
  • paper_authors: Wenlong Liao, Fernando Porté-Agel, Jiannong Fang, Birgitte Bak-Jensen, Guangchun Ruan, Zhe Yang
  • for: 这篇论文旨在提出一个可解释性高的风电预测模型,并实现高精度的风电预测。
  • methods: 本论文使用先进的人工智能技术(例如梯度提升)在预测模型中构造形状函数,有效刻画风电出力与输入特征之间复杂的非线性关系;此外,预测模型还引入交互项,以捕捉输入特征之间的相互依赖与协同作用。
  • results: 仿真结果显示,所提出的玻璃箱方法能够从全局和单个实例两个角度解释风电预测结果,并优于大多数基准模型,性能与表现最佳的神经网络相当,因此是可靠风电预测的有吸引力选择。
    Abstract Machine learning models (e.g., neural networks) achieve high accuracy in wind power forecasting, but they are usually regarded as black boxes that lack interpretability. To address this issue, the paper proposes a glass-box approach that combines exceptional accuracy with transparency for wind power forecasting. Specifically, advanced artificial intelligence methods (e.g., gradient boosting) are innovatively employed to create shape functions within the forecasting model. These functions effectively map the intricate non-linear relationships between wind power output and input features. Furthermore, the forecasting model is enriched by incorporating interaction terms that adeptly capture interdependencies and synergies among the input features. Simulation results show that the proposed glass-box approach effectively interprets the results of wind power forecasting from both global and instance perspectives. Besides, it outperforms most benchmark models and exhibits comparable performance to the best-performing neural networks. This dual strength of transparency and high accuracy positions the proposed glass-box approach as a compelling choice for reliable wind power forecasting.
    摘要 机器学习模型(如神经网络)可以实现高精度的风电预测,但通常被视为缺乏可解释性的黑盒模型。为解决这一问题,本文提出了一种兼具卓越精度与透明性的玻璃盒风电预测方法。具体而言,本文创新性地利用先进的人工智能方法(如梯度提升)在预测模型中构造形状函数,有效刻画风电出力与输入特征之间复杂的非线性关系;此外,预测模型还引入交互项,以精准捕捉输入特征之间的相互依赖与协同作用。仿真结果表明,所提出的玻璃盒方法能够从全局和单个实例两个角度解释风电预测结果,并优于大多数基准模型,性能与表现最佳的神经网络相当。透明性与高精度的双重优势使该玻璃盒方法成为可靠风电预测的有吸引力的选择。
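The shape-function idea can be sketched as a small additive model in which each feature gets its own gradient-boosted shape function, fitted to residuals in a round-robin backfitting loop. This is a hedged illustration of the general glass-box recipe, not the paper's exact training procedure; the interaction terms it mentions are omitted, and the data, depths, and number of rounds are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                              # stand-in for wind-power input features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)

shape_fns = [GradientBoostingRegressor(max_depth=2, n_estimators=50) for _ in range(X.shape[1])]
contribs = [np.zeros_like(y) for _ in shape_fns]
pred = np.zeros_like(y)

for _ in range(3):                                          # backfitting rounds
    for j, model in enumerate(shape_fns):
        residual = y - (pred - contribs[j])                 # target for this feature's shape function
        model.fit(X[:, [j]], residual)
        contribs[j] = model.predict(X[:, [j]])
        pred = sum(contribs)                                # additive prediction over all features

# Each shape_fns[j] can now be plotted against feature j to read the model directly.
```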

Pessimistic Off-Policy Multi-Objective Optimization

  • paper_url: http://arxiv.org/abs/2310.18617
  • repo_url: None
  • paper_authors: Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu
  • for: 这篇论文研究多目标优化问题中,如何利用现有策略收集的数据进行离线的多目标策略优化。
  • methods: 该论文提出了一种基于逆倾向评分(IPS)的悲观估计器,用于估计多目标策略的价值;该估计器在理论与实验上均优于朴素的 IPS 估计器,并可直接代入现有的超体积(hypervolume)计算公式进行优化。
  • results: 该论文的分析具有一般性,可应用于不同的 IPS 估计器和优化方法。悲观估计器可以通过策略梯度进行优化,在所有实验中均表现良好。
    Abstract Multi-objective optimization is a type of decision making problems where multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them. The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.
    摘要 多目标优化是一类需要同时优化多个相互冲突目标的决策问题。我们研究利用现有策略收集的数据对多目标策略进行离线优化。我们提出了一种悲观估计器来估计多目标策略的价值,它可以方便地代入现有的超体积计算公式并进行优化。该估计器基于逆倾向评分(IPS),在理论与实验上均优于朴素的 IPS 估计器。我们的分析具有一般性,适用范围超出了我们的 IPS 估计器及其优化方法。悲观估计器可以通过策略梯度进行优化,并在我们所有的实验中表现良好。
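A toy sketch of a pessimistic IPS value estimate per objective is shown below; the resulting vector of lower bounds is what would be plugged into a hypervolume computation. The Hoeffding-style confidence width used here is an illustrative assumption, not the paper's estimator, and the logged data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_actions, n_obj = 5000, 4, 2

actions = rng.integers(0, n_actions, size=n)           # logged actions from the behaviour policy
logging_probs = np.full(n, 1.0 / n_actions)            # behaviour-policy propensities
rewards = rng.uniform(size=(n, n_obj))                 # one reward column per objective

def pessimistic_ips_value(target_probs, delta=0.05):
    # target_probs[i]: probability the target policy takes the logged action on context i
    w = target_probs / logging_probs                   # importance weights
    values = (w[:, None] * rewards).mean(axis=0)       # plain IPS estimate per objective
    width = w.max() * np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # Hoeffding-style width (assumption)
    return values - width                              # pessimistic lower bound per objective

target_probs = np.where(actions == 0, 0.7, 0.1)        # toy target policy favouring action 0
print(pessimistic_ips_value(target_probs))             # this vector feeds the hypervolume computation
```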

Temporally Disentangled Representation Learning under Unknown Nonstationarity

  • paper_url: http://arxiv.org/abs/2310.18615
  • repo_url: https://github.com/xiangchensong/nctrl
  • paper_authors: Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, Kun Zhang
  • for: 研究者们希望解决非平稳时序数据中的因果表示学习问题,即在不具备辅助变量(如类别标签和/或领域标识符)的情况下,准确分离存在时延因果关系的隐变量。
  • methods: 研究者们在这篇论文中提出了一种名为 NCTRL 的原则性估计框架,可以在非平稳设置下,仅基于观测到的序列数据,重建时延因果隐变量并识别它们之间的关系。
  • results: 实验证明,NCTRL 方法能够可靠地识别时延因果隐变量,并且在不具备辅助变量的情况下明显优于未能充分利用非平稳性的现有基线方法。
    Abstract In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally-related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary setting, existing work only partially addressed the problem by either utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side information or assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios. In this study, we further explored the Markov Assumption under time-delayed causally related process in nonstationary setting and showed that under mild conditions, the independent latent components can be recovered from their nonlinear mixture up to a permutation and a component-wise transformation, without the observation of auxiliary variables. We then introduce NCTRL, a principled estimation framework, to reconstruct time-delayed latent causal variables and identify their relations from measured sequential data only. Empirical evaluations demonstrated the reliable identification of time-delayed latent causal influences, with our methodology substantially outperforming existing baselines that fail to exploit the nonstationarity adequately and then, consequently, cannot distinguish distribution shifts.
    摘要 在针对具有时延隐性因果影响的序列数据的无监督因果表示学习中,已有工作利用时间结构在平稳设置下建立了分离因果相关隐变量的强可识别性结果。然而在非平稳设置下,现有工作仅部分解决了该问题:要么利用观测到的辅助变量(如类别标签和/或领域标识符)作为辅助信息,要么假设简化的隐因果动力学,二者都将方法限制在有限的场景中。在本研究中,我们进一步探讨了非平稳设置下时延因果相关过程的马尔可夫假设,并证明在温和条件下,独立的隐成分可以从其非线性混合中恢复(至多相差一个置换和逐分量变换),而无需观测辅助变量。随后我们提出了 NCTRL,一个原则性的估计框架,仅从观测到的序列数据中重建时延因果隐变量并识别它们之间的关系。实验评估表明,我们的方法能够可靠地识别时延隐性因果影响,并大幅优于那些未能充分利用非平稳性、因而无法区分分布偏移的现有基线。

Efficient kernel surrogates for neural network-based regression

  • paper_url: http://arxiv.org/abs/2310.18612
  • repo_url: None
  • paper_authors: Saad Qadeer, Andrew Engel, Adam Tsou, Max Vargas, Panos Stinis, Tony Chiang
  • for: 这篇论文的目的是为了解释深度神经网络(DNN)的效果和局限性,并提供一种低成本的估计方法。
  • methods: 这篇论文使用了 Randomly initialized DNNs 和 Conjugate Kernel(CK)来研究 DNN 的性能。
  • results: 论文表明,CK 可以作为 NTK 的低成本估计方法,并且在某些情况下可以超越 NTK 的性能。此外,论文还提供了一种改进 DNN 准确率的简单方法。
    Abstract Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the effectiveness and limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to assess their precise dependence on the training data and to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For the regression problem of smooth functions and classification using logistic regression, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework provides insights into understanding the robustness of the various approximants and suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription.
    摘要 尽管深度神经网络(DNN)在许多学习任务上展现出巨大潜力,但其有效性与局限性在理论上仍未被实践者充分理解。部分原因在于无法确定所学函数的闭式形式,这使得难以准确评估其对训练数据的依赖,也难以研究其在未见数据集上的泛化性质。近期研究表明,在无限宽度极限下,随机初始化的 DNN 收敛到依赖于神经正切核(NTK)的核机器,且该核具有已知的闭式形式。这些结果表明(并有实验证据支持),经验核机器也可以作为有限宽度 DNN 的替代模型。然而,组装完整 NTK 的计算成本过高,使该方法在实践中不可行,因此需要低成本的近似。在本工作中,我们研究了共轭核(CK)的性能,CK 是 NTK 的一种高效近似,已被观察到能给出相当接近的结果。对于光滑函数的回归问题以及使用逻辑回归的分类问题,我们证明 CK 的性能仅略逊于 NTK,在某些情况下甚至更优。特别地,我们给出了相对测试损失的界,用数值实验加以验证,并指出核的正则性是决定性能的关键因素。除了为使用 CK 替代 NTK 提供理论依据之外,我们的框架也有助于理解各种近似的稳健性,并给出了一种低成本提高 DNN 准确率的方案。我们在基础模型 GPT-2 上进行了演示,比较了传统方法与我们的方案在一个分类任务上的表现。
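The conjugate kernel itself is easy to compute empirically: it is the inner product of the (fixed, random) hidden-layer features of an untrained network. The sketch below uses it for kernel ridge regression on a smooth target; the width, ridge parameter, and scaling are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 5, 2048, 200

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1]                        # smooth target function

W = rng.normal(size=(d, width)) / np.sqrt(d)               # random first-layer weights, never trained
phi = lambda Z: np.maximum(Z @ W, 0.0) / np.sqrt(width)    # ReLU features with 1/sqrt(width) scaling

K = phi(X) @ phi(X).T                                      # empirical conjugate kernel
alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y)           # kernel ridge regression

X_test = rng.normal(size=(50, d))
y_pred = phi(X_test) @ phi(X).T @ alpha
print(np.mean((y_pred - (np.sin(X_test[:, 0]) + 0.3 * X_test[:, 1])) ** 2))   # test MSE
```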

Where have you been? A Study of Privacy Risk for Point-of-Interest Recommendation

  • paper_url: http://arxiv.org/abs/2310.18606
  • repo_url: None
  • paper_authors: Kunlin Cai, Jinghuai Zhang, Will Shand, Zhiqing Hong, Guang Wang, Desheng Zhang, Jianfeng Chi, Yuan Tian
  • For: This paper aims to evaluate the privacy risks of mobility data-based machine learning models, specifically point-of-interest recommendation models, by designing a privacy attack suite and conducting experimental evaluations.
  • Methods: The paper uses a privacy attack suite that includes data extraction and membership inference attacks to evaluate the privacy risks of POI recommendation models. The attacks assume different adversary knowledge and aim to extract different types of sensitive information from mobility data.
  • Results: The experimental evaluation using two real-world mobility datasets demonstrates that current POI recommendation models are vulnerable to the attacks in the privacy attack suite. The paper also presents unique findings on what types of mobility data are more susceptible to privacy attacks.
    Abstract As location-based services (LBS) have grown in popularity, the collection of human mobility data has become increasingly extensive to build machine learning (ML) models offering enhanced convenience to LBS users. However, the convenience comes with the risk of privacy leakage since this type of data might contain sensitive information related to user identities, such as home/work locations. Prior work focuses on protecting mobility data privacy during transmission or prior to release, lacking the privacy risk evaluation of mobility data-based ML models. To better understand and quantify the privacy leakage in mobility data-based ML models, we design a privacy attack suite containing data extraction and membership inference attacks tailored for point-of-interest (POI) recommendation models, one of the most widely used mobility data-based ML models. These attacks in our attack suite assume different adversary knowledge and aim to extract different types of sensitive information from mobility data, providing a holistic privacy risk assessment for POI recommendation models. Our experimental evaluation using two real-world mobility datasets demonstrates that current POI recommendation models are vulnerable to our attacks. We also present unique findings to understand what types of mobility data are more susceptible to privacy attacks. Finally, we evaluate defenses against these attacks and highlight future directions and challenges.
    摘要 随着基于位置的服务(LBS)日益普及,人类移动数据的收集规模不断扩大,用于构建为 LBS 用户提供更多便利的机器学习(ML)模型。然而,这种便利伴随着隐私泄露的风险,因为此类数据可能包含与用户身份相关的敏感信息,例如家庭或工作地点。先前的工作主要关注在传输或发布前保护移动数据的隐私,缺乏对基于移动数据的 ML 模型隐私风险的评估。为了更好地理解并量化基于移动数据的 ML 模型的隐私泄露,我们针对兴趣点(POI)推荐模型——最广泛使用的基于移动数据的 ML 模型之一——设计了一个包含数据提取攻击和成员推断攻击的隐私攻击套件。套件中的攻击假设不同的攻击者知识,旨在从移动数据中提取不同类型的敏感信息,从而对 POI 推荐模型进行整体的隐私风险评估。我们在两个真实移动数据集上的实验评估表明,当前的 POI 推荐模型在我们的攻击面前十分脆弱。我们还给出了独特的发现,说明哪些类型的移动数据更容易受到隐私攻击。最后,我们评估了针对这些攻击的防御措施,并指出了未来的方向与挑战。
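As a generic illustration of the membership-inference family of attacks evaluated in such a suite (not the paper's specific attacks), the sketch below thresholds a POI recommender's loss on a candidate visit record; the stand-in model architecture and the threshold value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_pois = 50
# Stand-in POI recommender: a bag-of-visits embedding followed by a linear head.
model = nn.Sequential(nn.EmbeddingBag(n_pois, 32), nn.Linear(32, n_pois))

def membership_score(visit_history, next_poi):
    # Lower loss on (visit history -> next POI) suggests the record was seen in training,
    # so the negative loss serves as a membership score.
    with torch.no_grad():
        logits = model(visit_history.unsqueeze(0))
        return -F.cross_entropy(logits, next_poi.view(1)).item()

history = torch.randint(0, n_pois, (10,))           # a candidate user's visit sequence
score = membership_score(history, torch.tensor(3))
print(score > -3.9)                                  # threshold would be calibrated on shadow data in practice
```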

TorchDEQ: A Library for Deep Equilibrium Models

  • paper_url: http://arxiv.org/abs/2310.18605
  • repo_url: https://github.com/locuslab/torchdeq
  • paper_authors: Zhengyang Geng, J. Zico Kolter
  • for: This paper is written to provide a systematic and comprehensive framework for training and applying Deep Equilibrium (DEQ) models, which are a class of implicit models that map inputs to fixed points of neural networks.
  • methods: The paper presents TorchDEQ, an open-source PyTorch-based library that allows users to define, train, and infer using DEQs over multiple domains with minimal code and best practices.
  • results: The paper reports that by developing a joint framework that incorporates the best practices across all models, the performance, training stability, and efficiency of DEQs have been substantially improved on ten datasets across all six projects in the “DEQ Zoo”.
    Abstract Deep Equilibrium (DEQ) Models, an emerging class of implicit models that maps inputs to fixed points of neural networks, are of growing interest in the deep learning community. However, training and applying DEQ models is currently done in an ad-hoc fashion, with various techniques spread across the literature. In this work, we systematically revisit DEQs and present TorchDEQ, an out-of-the-box PyTorch-based library that allows users to define, train, and infer using DEQs over multiple domains with minimal code and best practices. Using TorchDEQ, we build a ``DEQ Zoo'' that supports six published implicit models across different domains. By developing a joint framework that incorporates the best practices across all models, we have substantially improved the performance, training stability, and efficiency of DEQs on ten datasets across all six projects in the DEQ Zoo. TorchDEQ and DEQ Zoo are released as \href{https://github.com/locuslab/torchdeq}{open source}.
    摘要 深度平衡(DEQ)模型是一类新兴的隐式模型,它将输入映射到神经网络的不动点上,在深度学习社区中受到越来越多的关注。然而,目前 DEQ 模型的训练与应用仍以临时方式进行,各类技术散见于文献之中。在这项工作中,我们系统地回顾了 DEQ,并提出了 TorchDEQ——一个开箱即用、基于 PyTorch 的库,允许用户以极少的代码和最佳实践在多个领域中定义、训练和推理 DEQ。基于 TorchDEQ,我们构建了一个“DEQ Zoo”,支持六种已发表的、来自不同领域的隐式模型。通过开发一个融合所有模型最佳实践的统一框架,我们在 DEQ Zoo 全部六个项目的十个数据集上显著提升了 DEQ 的性能、训练稳定性和效率。TorchDEQ 和 DEQ Zoo 已作为开源项目发布。
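Independent of the library, the core DEQ computation can be sketched in a few lines of plain PyTorch: iterate z = f(z, x) to an approximate fixed point without tracking gradients, then backpropagate through one final step. This cheap one-step backward is only a common approximation used here for illustration; TorchDEQ itself provides proper fixed-point solvers and implicit gradients, and the layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class TinyDEQ(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin_z = nn.Linear(dim, dim)
        self.lin_x = nn.Linear(dim, dim)

    def f(self, z, x):
        # The implicit layer whose fixed point defines the output.
        return torch.tanh(self.lin_z(z) + self.lin_x(x))

    def forward(self, x, n_iter=30):
        z = torch.zeros_like(x)
        with torch.no_grad():                        # find the fixed point without tracking gradients
            for _ in range(n_iter):
                z = self.f(z, x)
        return self.f(z, x)                          # one differentiable step for the backward pass

layer = TinyDEQ(16)
x = torch.randn(8, 16, requires_grad=True)
layer(x).sum().backward()                            # gradients flow through the last step only
```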

Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

  • paper_url: http://arxiv.org/abs/2310.18603
  • repo_url: None
  • paper_authors: Wencong You, Zayd Hammoudeh, Daniel Lowd
  • for: 这篇论文研究针对文本分类器的干净标签后门攻击,即通过在训练与测试数据中插入触发器来操纵模型预测。
  • methods: 这篇论文利用大型语言模型自动将多种风格化触发器插入文本中,并提出一种毒样本选择技术以提升攻击效果。
  • results: 论文表明,LLMBkd 攻击方法可以在多种风格下以极少的工作量、且无需模型训练,达到很高的攻击成功率。
    Abstract Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled. Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts. We also propose a poison selection technique to improve the effectiveness of both LLMBkd as well as existing textual backdoor attacks. Lastly, we describe REACT, a baseline defense to mitigate backdoor attacks via antidote training examples. Our evaluations demonstrate LLMBkd's effectiveness and efficiency, where we consistently achieve high attack success rates across a wide range of styles with little effort and no model training.
    摘要 后门攻击通过在训练和测试数据中插入看似无害的触发器来操纵模型预测。我们关注更现实也更具挑战性的干净标签攻击,其中对抗性训练样本的标签是正确的。我们的攻击 LLMBkd 利用语言模型自动将多种基于风格的触发器插入文本中。我们还提出了一种毒样本选择技术,以提升 LLMBkd 及现有文本后门攻击的效果。最后,我们描述了 REACT,一种通过解毒训练样本来缓解后门攻击的基线防御。评估结果表明,LLMBkd 既有效又高效:我们在多种风格下以极少的工作量、且无需模型训练,稳定地取得了很高的攻击成功率。

Online Decision Mediation

  • paper_url: http://arxiv.org/abs/2310.18601
  • repo_url: https://github.com/uvhw/Bitcoin-Foundation
  • paper_authors: Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
  • For: The paper aims to serve as an intermediary between expert behavior and human behavior in decision-making, with the goal of striking a balance between purely prescriptive and purely descriptive approaches.
  • Methods: The paper proposes a solution that trades off immediate loss terms against future improvements in generalization error, and identifies why conventional bandit algorithms may fail.
  • Results: The paper demonstrates consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole, through experiments and sensitivities on a variety of datasets.
    Abstract Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior: At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to *accept* that agent's decision, *intervene* with an alternative, or *request* the expert's opinion. For instance, in clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances, thus real-world decision support is often limited to monitoring and forecasting. Instead, such an intermediary would strike a prudent balance between the former (purely prescriptive) and latter (purely descriptive) approaches, while providing an efficient interface between human mistakes and expert feedback. In this work, we first formalize the sequential problem of *online decision mediation* -- that is, of simultaneously learning and evaluating mediator policies from scratch with *abstentive feedback*: In each round, deferring to the oracle obviates the risk of error, but incurs an upfront penalty, and reveals the otherwise hidden expert action as a new training data point. Second, we motivate and propose a solution that seeks to trade off (immediate) loss terms against (future) improvements in generalization error; in doing so, we identify why conventional bandit algorithms may fail. Finally, through experiments and sensitivities on a variety of datasets, we illustrate consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole.
    摘要 考虑学习一个决策支持助手,作为(先知)专家行为与(不完美)人类行为之间的中间人:在每个时刻,算法观察一个易出错的代理人所选择的行动,并决定是接受该代理人的决定、以替代方案进行干预,还是请求专家的意见。例如在临床诊断中,完全自主的机器行为往往超出伦理许可范围,因此现实中的决策支持通常仅限于监测与预测。这样的中间人则能在纯规范式与纯描述式两种方法之间取得审慎的平衡,同时为人类错误与专家反馈之间提供高效的接口。在这项工作中,我们首先形式化了在线决策调解这一序贯问题,即在弃权式反馈下从零开始同时学习和评估调解者策略:在每一轮中,请求专家可以避免出错的风险,但需要付出先期代价,并将原本隐藏的专家行动作为新的训练数据点揭示出来。其次,我们提出了一种在(即时)损失项与(未来)泛化误差改进之间进行权衡的解决方案,并在此过程中指出了传统多臂赌博机算法为何可能失效。最后,通过在多种数据集上的实验和敏感性分析,我们展示了在调解者策略、所学模型以及整个决策系统的各项性能指标上,相对于适用基准的一致收益。
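A toy version of the accept / intervene / defer loop is sketched below. The confidence thresholds, deferral cost, synthetic expert/agent, and the logistic-regression mediator are all illustrative assumptions; the paper's method chooses these trade-offs to balance immediate loss against future generalization gains.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
expert = lambda x: int(x[0] + x[1] > 0)                                  # hidden oracle policy
agent = lambda x: expert(x) if rng.random() < 0.7 else 1 - expert(x)     # fallible human agent

X_train, y_train, defer_cost, total_cost = [], [], 0.2, 0.0
mediator = None

for t in range(500):
    x = rng.normal(size=2)
    human_action = agent(x)
    conf = mediator.predict_proba([x])[0].max() if mediator is not None else 0.0
    if conf > 0.9:
        action = int(mediator.predict([x])[0])        # intervene with the learned policy
    elif conf > 0.6:
        action = human_action                          # accept the human's decision
    else:
        action = expert(x)                             # defer: pay a penalty, gain a label
        total_cost += defer_cost
        X_train.append(x); y_train.append(action)
        if len(set(y_train)) > 1:                      # need both classes before fitting
            mediator = LogisticRegression().fit(X_train, y_train)
    total_cost += float(action != expert(x))           # mistakes also cost

print(round(total_cost, 2))
```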

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

  • paper_url: http://arxiv.org/abs/2310.18593
  • repo_url: None
  • paper_authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun
  • for: 这篇论文的目标是实现公平的主成分分析(PCA),使得在敏感属性条件下的投影分布彼此匹配。
  • methods: 这篇论文提出了一个名为“可能近似公平且最优”(PAFO)可学习性的新概念,并针对实际应用提出了“公平流式 PCA”这一新设定,以及一种内存高效的算法“公平含噪幂法”(FNPM)。
  • results: 这篇论文给出了该算法在 PAFO 可学习性意义下的“统计”保证,这在公平 PCA 文献中尚属首次;此外还在真实数据上验证了算法的有效性与内存效率。
    Abstract Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called \emph{probably approximately fair and optimal} (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called \emph{fair streaming PCA} along with a memory-efficient algorithm, fair noisy power method (FNPM). We then provide its {\it statistical} guarantee in terms of PAFO-learnability, which is the first of its kind in fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
    摘要 公平主成分分析(PCA)是这样一个问题设定:我们希望在进行 PCA 的同时使所得表示满足公平性,即在敏感属性条件下的投影分布彼此匹配。然而,现有的公平 PCA 方法存在两个主要问题:在理论上,公平 PCA 尚缺乏关于可学习性的统计基础;在实践中,有限的内存使我们无法使用现有方法,因为它们明确依赖于对全部数据的完整访问。在理论方面,我们利用一个新的概念——“可能近似公平且最优”(PAFO)可学习性——对公平 PCA 进行了严格的形式化。在实践方面,受近期用于应对内存限制的流式算法进展的启发,我们提出了名为“公平流式 PCA”的新设定,以及一种内存高效的算法——公平含噪幂法(FNPM)。随后,我们给出了该算法在 PAFO 可学习性意义下的统计保证,这在公平 PCA 文献中尚属首次。最后,我们在真实数据集上验证了算法的有效性与内存效率。

Inverse Decision Modeling: Learning Interpretable Representations of Behavior

  • paper_url: http://arxiv.org/abs/2310.18591
  • repo_url: https://github.com/danieljarrett/Inverse-Bounded-Rational-Control
  • paper_authors: Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
  • for: 对决策过程进行建模与改进
  • methods: 提出一个学习序贯决策行为参数化表示的统一框架,将正向问题(规范标准)与逆向问题(描述模型)统一起来
  • results: 能够学习(可解释的)有界理性表示,自然地刻画次优行动、有偏信念以及对环境的不完全认知等直观概念
    Abstract Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place. In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior. First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior. Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning -- while opening up a much broader class of research problems in behavior representation. Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality -- while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.
    摘要 Decision analysis deals with modeling and enhancing decision processes, and a principal challenge in improving behavior is first obtaining a transparent description of existing behavior. First, we define the forward problem, which includes common classes of control behavior. Then, we use this framework to formalize the inverse problem, which generalizes existing work on imitation and reward learning. This approach opens up a broader range of research problems in behavior representation. Finally, we provide an example of inverse bounded rational control, which demonstrates how this structure enables the learning of interpretable representations of rationality while naturally capturing suboptimal actions, biased beliefs, and imperfect knowledge of environments.

Optimal Transport for Kernel Gaussian Mixture Models

  • paper_url: http://arxiv.org/abs/2310.18586
  • repo_url: None
  • paper_authors: Jung Hun Oh, Rena Elkin, Anish Kumar Simhal, Jiening Zhu, Joseph O Deasy, Allen Tannenbaum
  • for: 本研究使用 Wasserstein 距离来衡量两个 Gaussian mixture 的距离,并利用 kernel trick 避免直接将输入数据映射到高维特征空间。
  • methods: 本研究使用 kernel Gaussian mixture models 来计算两个 Gaussian mixture 的 Wasserstein 距离。
  • results: 本研究提出了一种基于 RKHS 的 Wasserstein-type metric,可以帮助更好地模型复杂多模 density 的实际数据。
    Abstract The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored the OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in a RKHS via the kernel trick, i.e., kernel Gaussian mixture models.
    摘要 源自最优质量传输(OMT)的 Wasserstein 距离是一个应用广泛的强大数学工具,为两个概率分布之间的距离提供了自然的度量。为增强对真实数据集中复杂多峰密度的建模能力,已有多种方法将 OMT 融入常用的概率模型(如高斯或高斯混合模型)。然而,很少有研究在再生核希尔伯特空间(RKHS)中探讨 OMT 问题;在 RKHS 中可以利用核技巧,避免将输入数据显式映射到高维特征空间。在本研究中,我们提出了一种 Wasserstein 型度量,借助核技巧计算 RKHS 中两个高斯混合之间的距离,即核高斯混合模型。
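The building block behind such mixture distances is the closed-form 2-Wasserstein distance between two Gaussians, combined through a small discrete OT problem over mixture components. The sketch below assumes equal-weight mixtures with the same number of components (so the discrete OT reduces to an assignment problem), and replaces the paper's kernelized covariances with ordinary ones; the toy means and covariances are assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linear_sum_assignment

def w2_gaussian_sq(m1, S1, m2, S2):
    # W2^2(N(m1,S1), N(m2,S2)) = ||m1-m2||^2 + tr(S1 + S2 - 2 (S2^1/2 S1 S2^1/2)^1/2)
    r2 = sqrtm(S2)
    cross = sqrtm(r2 @ S1 @ r2).real
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))

means1 = [np.zeros(2), np.ones(2)]
covs1 = [np.eye(2), 0.5 * np.eye(2)]
means2 = [np.array([0.2, -0.1]), np.array([1.5, 0.8])]
covs2 = [1.2 * np.eye(2), 0.4 * np.eye(2)]

# Pairwise component costs, then discrete OT (here: an assignment) over components.
cost = np.array([[w2_gaussian_sq(m1, S1, m2, S2) for m2, S2 in zip(means2, covs2)]
                 for m1, S1 in zip(means1, covs1)])
rows, cols = linear_sum_assignment(cost)
mixture_w2 = np.sqrt(cost[rows, cols].mean())        # equal-weight mixture-level distance
print(mixture_w2)
```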

Group Robust Classification Without Any Group Information

  • paper_url: http://arxiv.org/abs/2310.18555
  • repo_url: https://github.com/tsirif/ula
  • paper_authors: Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, Aaron Courville
  • for: 这项研究旨在解决经验风险最小化(ERM)容易受训练数据中虚假相关性影响的问题,以提升群组稳健性,便于在高风险应用中部署系统。
  • methods: 这项研究提出了一种完全不依赖偏差标注的方法,利用预训练的自监督模型可靠地提取偏差信息,并将 logit 调整训练损失与我们的验证准则相结合。
  • results: 我们的方法在合成与真实任务上克服了上述挑战并持续提升稳健准确率,包括在 MPI3D 数据集上的系统性泛化任务中——当部分属性取值组合在训练中缺失时,现有方法均告失败——表现优于或可与依赖偏差标注进行验证的最新方法相当。
    Abstract Empirical risk minimization (ERM) is sensitive to spurious correlations in the training data, which poses a significant risk when deploying systems trained under this paradigm in high-stake applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. Firstly, these methods implicitly assume that all group combinations are represented during training. To illustrate this, we introduce a systematic generalization task on the MPI3D dataset and discover that current algorithms fail to improve the ERM baseline when combinations of observed attribute values are missing. Secondly, bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. To address these limitations, we propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner. We achieve this by employing pretrained self-supervised models to reliably extract bias information, which enables the integration of a logit adjustment training loss with our validation criterion. Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance which is competitive with or outperforms that of state-of-the-art methods, which, conversely, rely on bias labels for validation.
    摘要 经验风险最小化(ERM)对训练数据中的虚假相关性十分敏感,这给在高风险应用中部署以该范式训练的系统带来了显著风险。现有文献主要关注最大化群组均衡准确率或最差群组准确率,但这些准确率的估计依赖于代价高昂的偏差标注。本研究指出,当前不依赖偏差标注的群组稳健性方法实际上仍依赖群组信息才能取得最优性能。首先,这些方法隐含假设所有群组组合在训练时都有出现;为说明这一点,我们在 MPI3D 数据集上引入一个系统性泛化任务,并发现当观测到的属性取值组合缺失时,现有算法无法改进 ERM 基线。其次,偏差标注对有效的模型选择仍然至关重要,限制了这些方法在真实场景中的实用性。为解决这些局限,我们提出了一套完全不依赖偏差标注的去偏模型训练与验证方法:利用预训练的自监督模型可靠地提取偏差信息,从而将 logit 调整训练损失与我们的验证准则结合起来。在合成与真实任务上的实证分析表明,我们的方法克服了上述挑战并持续提升稳健准确率,其性能可与依赖偏差标注进行验证的最新方法相当甚至更优。
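One concrete way to realize bias-unsupervised logit adjustment is sketched below: pseudo-bias labels (e.g., cluster assignments computed on frozen self-supervised features) define (class, bias) groups, and the log-frequency of each group is added to the logits before the cross-entropy so that majority groups are down-weighted. The clustering step, the per-(class, bias) adjustment, and the temperature value are illustrative assumptions rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, y, pseudo_bias, n_classes, n_bias, tau=1.0):
    # Empirical (class, bias)-group frequencies act as the adjustment prior.
    counts = torch.zeros(n_classes, n_bias)
    counts.index_put_((y, pseudo_bias), torch.ones_like(y, dtype=torch.float), accumulate=True)
    prior = (counts / counts.sum()).clamp_min(1e-8)
    adjustment = tau * torch.log(prior[:, pseudo_bias].T)   # shape (batch, n_classes)
    return F.cross_entropy(logits + adjustment, y)

logits = torch.randn(32, 3)
y = torch.randint(0, 3, (32,))
pseudo_bias = torch.randint(0, 2, (32,))                    # e.g. k-means labels on SSL features
print(logit_adjusted_loss(logits, y, pseudo_bias, n_classes=3, n_bias=2))
```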

Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

  • paper_url: http://arxiv.org/abs/2310.18554
  • repo_url: None
  • paper_authors: Junghyun Lee, Se-Young Yun, Kwang-Sung Jun
  • for: 这篇论文的目的是改善 logistic bandit 中遗憾界对 $S$ 的依赖关系,尤其是在 $S$ 较大(如 $S \geq d$)时。
  • methods: 这篇论文提出了一种称为“遗憾到置信集转换”(regret-to-confidence-set conversion, R2CS)的新方法,用于改进 logistic bandit 的遗憾界。
  • results: 通过 R2CS,这篇论文得到了对 $S$ 依赖关系更优的遗憾界,同时保持计算可行性以及对其他因素($d$ 和 $T$)的依赖关系。
    Abstract Logistic bandit is a ubiquitous framework of modeling users' choices, e.g., click vs. no click for advertisement recommender system. We observe that the prior works overlook or neglect dependencies in $S \geq \lVert \theta_\star \rVert_2$, where $\theta_\star \in \mathbb{R}^d$ is the unknown parameter vector, which is particularly problematic when $S$ is large, e.g., $S \geq d$. In this work, we improve the dependency on $S$ via a novel approach called {\it regret-to-confidence set conversion (R2CS)}, which allows us to construct a convex confidence set based on only the \textit{existence} of an online learning algorithm with a regret guarantee. Using R2CS, we obtain a strict improvement in the regret bound w.r.t. $S$ in logistic bandits while retaining computational feasibility and the dependence on other factors such as $d$ and $T$. We apply our new confidence set to the regret analyses of logistic bandits with a new martingale concentration step that circumvents an additional factor of $S$. We then extend this analysis to multinomial logistic bandits and obtain similar improvements in the regret, showing the efficacy of R2CS. While we applied R2CS to the (multinomial) logistic model, R2CS is a generic approach for developing confidence sets that can be used for various models, which can be of independent interest.
    摘要 Logistic bandit 是一个建模用户选择(例如广告推荐系统中的点击与不点击)的常见框架。我们注意到,已有工作忽视或忽略了对 $S \geq \lVert \theta_\star \rVert_2$ 的依赖(其中 $\theta_\star \in \mathbb{R}^d$ 是未知参数向量),当 $S$ 较大(如 $S \geq d$)时这一问题尤为突出。在这项工作中,我们通过一种称为“遗憾到置信集转换”(R2CS)的新方法改善了对 $S$ 的依赖:它允许我们仅凭一个具有遗憾保证的在线学习算法的存在,就构造出一个凸置信集。利用 R2CS,我们在 logistic bandit 中得到了关于 $S$ 的严格更优的遗憾界,同时保持计算可行性以及对 $d$ 和 $T$ 等其他因素的依赖。我们将新的置信集应用于 logistic bandit 的遗憾分析,并借助一个新的鞅集中步骤避免了额外的 $S$ 因子。随后我们将该分析推广到多项 logistic bandit,并得到了类似的遗憾改进,显示了 R2CS 的有效性。尽管我们将 R2CS 应用于(多项)logistic 模型,但 R2CS 是一种可用于多种模型的通用置信集构造方法,这本身也具有独立的研究价值。

The Role of Reference Points in Machine-Learned Atomistic Simulation Models

  • paper_url: http://arxiv.org/abs/2310.18552
  • repo_url: None
  • paper_authors: Xiangyun Lei, Weike Ye, Joseph Montoya, Tim Mueller, Linda Hung, Jens Hummelshoej
  • for: 本研究提出了一种新的化学环境建模理论(CEMT),旨在克服传统以原子为中心的机器学习力场(MLFF)模型的固有局限;此类模型被广泛用于化学系统的原子尺度模拟。
  • methods: 本研究使用高斯多极(GMP)特征化函数,测试了基于不同参考点集(包括有限差分网格中心和键中心)的多种模型,以分析建立在不同参考点上的模型在能量预测精度、推理速度和学习效率上的差异。
  • results: 研究发现,非原子中心的参考点在力训练中具有灵活性与适应性方面的潜力。此外,本研究还建立了 CEMT 与实空间无轨道有限元密度泛函理论(FE-DFT)之间的联系,其意义包括提升数据效率与稳健性。
    Abstract This paper introduces the Chemical Environment Modeling Theory (CEMT), a novel, generalized framework designed to overcome the limitations inherent in traditional atom-centered Machine Learning Force Field (MLFF) models, widely used in atomistic simulations of chemical systems. CEMT demonstrated enhanced flexibility and adaptability by allowing reference points to exist anywhere within the modeled domain and thus, enabling the study of various model architectures. Utilizing Gaussian Multipole (GMP) featurization functions, several models with different reference point sets, including finite difference grid-centered and bond-centered models, were tested to analyze the variance in capabilities intrinsic to models built on distinct reference points. The results underscore the potential of non-atom-centered reference points in force training, revealing variations in prediction accuracy, inference speed and learning efficiency. Finally, a unique connection between CEMT and real-space orbital-free finite element Density Functional Theory (FE-DFT) is established, and the implications include the enhancement of data efficiency and robustness. It allows the leveraging of spatially-resolved energy densities and charge densities from FE-DFT calculations, as well as serving as a pivotal step towards integrating known quantum-mechanical laws into the architecture of ML models.
    摘要 本文提出了化学环境建模理论(CEMT),这是一种新颖的通用框架,旨在克服在化学系统原子尺度模拟中广泛使用的传统以原子为中心的机器学习力场(MLFF)模型的固有局限。CEMT 允许参考点位于建模区域内的任意位置,从而展现出更强的灵活性与适应性,并使研究各类模型架构成为可能。利用高斯多极(GMP)特征化函数,我们测试了基于不同参考点集(包括有限差分网格中心模型和键中心模型)的多种模型,以分析建立在不同参考点上的模型在能力上的内在差异。结果强调了非原子中心参考点在力训练中的潜力,揭示了预测精度、推理速度和学习效率上的差异。最后,本文建立了 CEMT 与实空间无轨道有限元密度泛函理论(FE-DFT)之间的独特联系,其意义包括提升数据效率与稳健性:它既能利用 FE-DFT 计算得到的空间分辨能量密度与电荷密度,也是将已知量子力学规律融入机器学习模型架构的关键一步。

Punica: Multi-Tenant LoRA Serving

  • paper_url: http://arxiv.org/abs/2310.18547
  • repo_url: https://github.com/punica-ai/punica
  • paper_authors: Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
  • for: 这个论文是为了提出一个名为Punica的系统,用于在共享GPU集群中服务多个低阶适应(LoRA)模型。
  • methods: 这个系统使用了一个新的CUDA核心设计,允许不同LoRA模型的批处理操作在GPU上混合进行,这使得GPU只需要储存一个基础预训练模型,从而大幅提高GPU的内存和计算效率。
  • results: 评估结果显示,在固定规模的 GPU 集群上服务多个 LoRA 模型时,Punica 相比现有的 LLM 服务系统可实现 12 倍的吞吐量提升,而每个 token 仅增加 2 毫秒延迟。Punica 的源代码可在 https://github.com/punica-ai/punica 获取。
    Abstract Low-rank adaptation (LoRA) has become an important and popular method to adapt pre-trained models to specific domains. We present Punica, a system to serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA kernel design that allows batching of GPU operations for different LoRA models. This allows a GPU to hold only a single copy of the underlying pre-trained model when serving multiple, different LoRA models, significantly enhancing GPU efficiency in terms of both memory and computation. Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster. With a fixed-sized GPU cluster, our evaluations show that Punica achieves 12x higher throughput in serving multiple LoRA models compared to state-of-the-art LLM serving systems while only adding 2ms latency per token. Punica is open source at https://github.com/punica-ai/punica .
    摘要 低秩适配(LoRA)已成为将预训练模型适配到特定领域的一种重要且流行的方法。我们提出了 Punica,一个在共享 GPU 集群中服务多个 LoRA 模型的系统。Punica 包含一种新的 CUDA 内核设计,允许对不同 LoRA 模型的 GPU 操作进行批处理。这使得 GPU 在服务多个不同的 LoRA 模型时只需保存一份底层预训练模型,从而在内存和计算两方面显著提升 GPU 效率。我们的调度器将多租户的 LoRA 服务负载整合到共享 GPU 集群中。在固定规模的 GPU 集群上,我们的评估表明,与最先进的 LLM 服务系统相比,Punica 在服务多个 LoRA 模型时可实现 12 倍的吞吐量,而每个 token 仅增加 2 毫秒延迟。Punica 已在 https://github.com/punica-ai/punica 开源。
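The batching idea can be written down in a few lines: one shared base weight serves every request in the batch, while each request applies its own low-rank adapter selected by an index. Punica fuses this into a CUDA kernel; the einsum below is only a readable PyTorch stand-in for that computation, with made-up sizes.

```python
import torch

batch, d_in, d_out, rank, n_adapters = 8, 512, 512, 16, 3

W_base = torch.randn(d_in, d_out) / d_in ** 0.5              # single shared pretrained weight
A = torch.randn(n_adapters, d_in, rank) * 0.01               # per-tenant LoRA down-projections
B = torch.randn(n_adapters, rank, d_out) * 0.01              # per-tenant LoRA up-projections

x = torch.randn(batch, d_in)                                  # one token per request
adapter_id = torch.randint(0, n_adapters, (batch,))           # which LoRA each request uses

base_out = x @ W_base                                          # shared base computation for the whole batch
lora_out = torch.einsum('bd,bdr,bro->bo', x, A[adapter_id], B[adapter_id])   # per-request adapters
y = base_out + lora_out                                        # mixed-tenant batch served in one pass
```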

End-to-end Feature Selection Approach for Learning Skinny Trees

  • paper_url: http://arxiv.org/abs/2310.18542
  • repo_url: None
  • paper_authors: Shibal Ibrahim, Kayhan Behdin, Rahul Mazumder
  • for: 这篇论文的目的是提出一种同时进行特征选择和树集成学习的工具包,以提高树集成模型的性能和可解释性。
  • methods: 这篇论文使用了一种端到端优化方法,将特征选择与可微树的集成学习结合起来,并使用分组 L0-L2 正则化来实现特征选择。
  • results: 这篇论文在 15 个合成和真实世界数据集上实现了 1.5 倍至 620 倍的特征压缩率,并且在某些情况下可以达到 10 倍的推理速度提升,而不损失性能。此外,这篇论文的特征选择方法也优于许多现有工具包:在 25% 的特征预算下,Skinny Trees 的 AUC 性能比 LightGBM 高 10.2%(最高达 37.7%),比随机森林高 3%(最高达 12.5%)。
    Abstract Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature selection post-training based on feature importances, which are known to be misleading, and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection and tree ensemble learning occurs simultaneously. It is based on an end-to-end optimization approach that considers feature selection in differentiable trees with Group $\ell_0 - \ell_2$ regularization. We optimize with a first-order proximal method and present convergence guarantees for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ - $620\times$ feature compression rates, leading up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to superior feature selection than many existing toolkits e.g., in terms of AUC performance for $25\%$ feature budget, Skinny Trees outperforms LightGBM by $10.2\%$ (up to $37.7\%$), and Random Forests by $3\%$ (up to $12.5\%$).
    摘要 同时进行特征选择和树集成学习是一项具有挑战性的任务。常用的树集成工具包(如梯度提升树和随机森林)支持在训练后基于特征重要性进行特征选择,而这种重要性已知具有误导性,并可能显著损害性能。我们提出了 Skinny Trees:一个在树集成中进行特征选择的工具包,使特征选择与树集成学习同时进行。它基于一种端到端优化方法,在可微树中通过分组 $\ell_0$-$\ell_2$ 正则化考虑特征选择。我们使用一阶邻近方法进行优化,并针对非凸、非光滑目标给出了收敛保证。有趣的是,由稠密到稀疏的正则化调度能够得到比朴素邻近方法更具表达力、更稀疏的树集成。在 15 个合成与真实世界数据集上,Skinny Trees 可实现 1.5 倍至 620 倍的特征压缩率,使推理速度比稠密树快至多 10 倍,而不损失任何性能。与许多现有工具包相比,Skinny Trees 的特征选择效果更佳:例如在 25% 特征预算下的 AUC 性能方面,Skinny Trees 比 LightGBM 高 10.2%(最高 37.7%),比随机森林高 3%(最高 12.5%)。
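A small sketch of the group l0-l2 proximal step that drives this kind of feature selection is given below: each feature's parameters across the ensemble form one group, which is shrunk by the l2 term and then kept or zeroed depending on whether its norm clears the l0 threshold. The closed form follows the standard l0-l2 prox derivation and is meant as an illustration under assumed hyperparameters, not the paper's exact update.

```python
import torch

def group_l0_l2_prox(W, step, lam0, lam2):
    # W: (n_features, n_params_per_feature) -- one row (group) per input feature.
    shrunk = W / (1.0 + 2.0 * step * lam2)                                   # l2 shrinkage
    keep = (W ** 2).sum(dim=1) > 2.0 * step * lam0 * (1.0 + 2.0 * step * lam2)  # l0 group test
    return shrunk * keep.unsqueeze(1).float()                                # hard group selection

W = torch.randn(20, 8)                                   # e.g. 20 features, 8 tree parameters each
W_new = group_l0_l2_prox(W, step=0.1, lam0=0.5, lam2=0.01)
print((W_new.abs().sum(dim=1) > 0).sum().item(), "features kept")
```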