cs.LG - 2023-10-02

Transformers are efficient hierarchical chemical graph learners

  • paper_url: http://arxiv.org/abs/2310.01704
  • repo_url: None
  • paper_authors: Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R. Dinner
  • for: This paper introduces SubFormer, a graph representation learning method adapted from natural language processing, to address the computational challenges that arise when contemporary graph transformers treat nodes or edges as separate tokens.
  • methods: SubFormer operates on subgraphs and aggregates information through a message-passing mechanism, which reduces the number of tokens and improves the learning of long-range interactions (a toy sketch of this idea follows this entry).
  • results: On molecular property prediction benchmarks, SubFormer is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times of only a few minutes on a consumer-grade graphics card; the authors also interpret the attention weights in terms of chemical structures and show that SubFormer exhibits limited over-smoothing and avoids over-squashing.
    Abstract Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to computational challenges for even moderately-sized graphs due to the quadratic scaling of self-attention complexity with token count. In this paper, we introduce SubFormer, a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism. This approach reduces the number of tokens and enhances learning long-range interactions. We demonstrate SubFormer on benchmarks for predicting molecular properties from chemical structures and show that it is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card. We interpret the attention weights in terms of chemical structures. We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
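
A rough illustration of the subgraph-token idea (not the authors' implementation): the NumPy sketch below pools node features into subgraph tokens after a round of message passing and then applies single-head self-attention over the tokens. The toy graph, feature dimensions, and node-to-subgraph assignment are all invented for the example; SubFormer derives its subgraphs from chemically meaningful substructures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecular graph: 6 nodes, adjacency matrix and node features.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(6, 8))                      # node features
subgraph_of = np.array([0, 0, 0, 1, 1, 1])       # made-up node-to-subgraph assignment

# One round of mean-aggregation message passing over the full graph.
deg = A.sum(axis=1, keepdims=True)
H = np.maximum((A @ X) / deg, 0.0)               # ReLU(mean of neighbour features)

# Pool node states into one token per subgraph (far fewer tokens than nodes).
n_sub = subgraph_of.max() + 1
T = np.stack([H[subgraph_of == s].mean(axis=0) for s in range(n_sub)])

# Single-head self-attention over the subgraph tokens.
d = T.shape[1]
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = T @ Wq, T @ Wk, T @ Wv
scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
print("token representations:", (attn @ V).shape)  # (n_sub, d)
```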

Robustifying State-space Models for Long Sequences via Approximate Diagonalization

  • paper_url: http://arxiv.org/abs/2310.01698
  • repo_url: None
  • paper_authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson
  • for: This work proposes a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology for the ill-posed problem of diagonalizing the non-normal matrices that arise in machine learning, in particular in state-space models.
  • methods: The authors develop an approximate diagonalization of non-normal matrices based on the pseudospectral theory of non-normal operators and apply it to build the S4-PTD and S5-PTD models (a toy sketch of perturb-then-diagonalize follows this entry).
  • results: A transfer-function analysis of different initialization schemes shows that the S4-PTD/S5-PTD initialization converges strongly to the HiPPO framework, whereas the S4D/S5 initialization achieves only weak convergence; in addition, the S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, indicating that the PTD methodology improves the accuracy of deep learning models.
    Abstract State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have considered a purely diagonal structure. This choice simplifies the implementation, improves computational efficiency, and allows channel communication. However, diagonalizing the HiPPO framework is itself an ill-posed problem. In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning. We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models. Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergences. As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. In addition to improved robustness, our S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.
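
The core perturb-then-diagonalize step can be shown in a few lines: a highly non-normal matrix receives a small random perturbation before eigendecomposition, which makes the eigenvector basis far better conditioned at the price of a controlled error. This is only a schematic of the general idea with an invented matrix and perturbation scale, not the S4-PTD/S5-PTD initialization itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# A non-normal matrix whose eigendecomposition is badly conditioned
# (a single Jordan-like block; the SSM setting uses HiPPO matrices instead).
A = np.diag(-np.ones(n)) + np.diag(np.ones(n - 1), k=1)

def diagonalize(M):
    """Return eigenvalues, eigenvectors, and the conditioning of the eigenbasis."""
    lam, V = np.linalg.eig(M)
    return lam, V, np.linalg.cond(V)

# Naive diagonalization: the eigenvector matrix is numerically close to singular.
_, _, cond_naive = diagonalize(A)

# Perturb-then-diagonalize: add a tiny random perturbation first.
eps = 1e-6
E = rng.normal(size=(n, n))
E *= eps / np.linalg.norm(E, 2)
lam, V, cond_ptd = diagonalize(A + E)

# The perturbed eigenbasis is far better conditioned, and V diag(lam) V^{-1}
# still reproduces A up to an error on the order of the perturbation.
recon_err = np.linalg.norm(V @ np.diag(lam) @ np.linalg.inv(V) - A, 2)
print(f"cond(V) naive: {cond_naive:.2e}, after PTD: {cond_ptd:.2e}")
print(f"reconstruction error of A: {recon_err:.2e}")
```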

DANI: Fast Diffusion Aware Network Inference with Preserving Topological Structure Property

  • paper_url: http://arxiv.org/abs/2310.01696
  • repo_url: https://github.com/aryanahadinia/dani
  • paper_authors: Maryam Ramezani, Aryan Ahadinia, Erfan Farhadi, Hamid R. Rabiee
  • for: Inferring the underlying structure of social networks
  • methods: Based on the Markov transition matrix derived from time series cascades and node-node similarity from a structural perspective
  • results: High accuracy and low running time, preserving structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients.
    Abstract The fast growth of social networks and their data access limitations in recent years has led to increasing difficulty in obtaining the complete topology of these networks. However, diffusion information over these networks is available, and many algorithms have been proposed to infer the underlying networks using this information. The previously proposed algorithms only focus on inferring more links and ignore preserving the critical topological characteristics of the underlying social networks. In this paper, we propose a novel method called DANI to infer the underlying network while preserving its structural properties. It is based on the Markov transition matrix derived from time series cascades, as well as the node-node similarity that can be observed in the cascade behavior from a structural point of view. In addition, the presented method has linear time complexity (increases linearly with the number of nodes, number of cascades, and square of the average length of cascades), and its distributed version in the MapReduce framework is also scalable. We applied the proposed approach to both real and synthetic networks. The experimental results showed that DANI has higher accuracy and lower run time while maintaining structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients, than well-known network inference methods.

Forecasting Tropical Cyclones with Cascaded Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01690
  • repo_url: https://github.com/nathzi1505/forecast-diffmodels
  • paper_authors: Pritthijit Nath, Pancham Shukla, César Quilodrán-Casas
  • for: Forecasting cyclone trajectories and precipitation patterns
  • methods: Leveraging diffusion models to integrate satellite imaging, remote sensing, and atmospheric data, using a cascaded approach that includes forecasting, super-resolution, and precipitation modeling, with training on a dataset of 51 cyclones from six major basins.
  • results: Experiments show that the final forecasts from the cascaded models can accurately predict cyclone trajectories and precipitation patterns up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks.
    Abstract As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at \url{https://github.com/nathzi1505/forecast-diffmodels}.

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

  • paper_url: http://arxiv.org/abs/2310.01687
  • repo_url: None
  • paper_authors: Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla
  • for: investigate the dynamics of gradient descent using large-order constant step-sizes in quadratic regression models.
  • methods: use a specific cubic map to encapsulate the dynamics, and conduct a fine-grained bifurcation analysis concerning the step-size parameter.
  • results: identify five distinct training phases, including monotonic, catapult, periodic, chaotic, and divergent phases, and observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
    Abstract We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. Within this framework, we reveal that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis concerning the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks employing quadratic activation functions and constant outer-layers, utilizing orthogonal training data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
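
To see the kind of step-size-dependent behaviour the paper analyzes, the toy sketch below runs gradient descent on the one-dimensional loss l(w) = 0.25*(w^2 - 1)^2 (a caricature of quadratic regression), whose update w <- w - eta*w*(w^2 - 1) is a cubic map in w parameterized by the step size. The starting point, step sizes, and crude phase labels are illustrative only; they are not the paper's bifurcation analysis.

```python
import numpy as np

def classify_trajectory(eta, w0=1.25, steps=2000):
    """Crude label for GD on the toy quadratic-regression loss l(w) = 0.25*(w**2 - 1)**2."""
    w = w0
    losses = [0.25 * (w**2 - 1.0) ** 2]
    for _ in range(steps):
        w = w - eta * w * (w**2 - 1.0)          # the update is a cubic map in w
        if abs(w) > 1e6:
            return "divergent"
        losses.append(0.25 * (w**2 - 1.0) ** 2)
    losses = np.array(losses)
    if losses[-200:].max() < 1e-10:             # converged to a global minimum
        return "monotonic" if (np.diff(losses) <= 1e-12).all() else "catapult (non-monotonic)"
    return "bounded but non-convergent (periodic/chaotic)"

for eta in [0.5, 0.9, 1.2, 3.0]:                # illustrative step sizes only
    print(f"step size {eta:.1f}: {classify_trajectory(eta)}")
```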

A Framework for Interpretability in Machine Learning for Medical Imaging

  • paper_url: http://arxiv.org/abs/2310.01685
  • repo_url: None
  • paper_authors: Alan Q. Wang, Batuhan K. Karaman, Heejong Kim, Jacob Rosenthal, Rachit Saluja, Sean I. Young, Mert R. Sabuncu
  • for: This work aims to clarify and formalize interpretability for machine learning models in medical imaging (MLMI).
  • methods: The authors formalize interpretability needs by reasoning about real-world tasks and goals in medical image analysis and its intersection with machine learning, and identify four core elements of interpretability: localization, visual recognizability, physical attribution, and transparency.
  • results: The paper provides practical and didactic guidance for model designers and practitioners in MLMI, encourages developers to reason more deeply about what interpretability is meant to achieve, and suggests future directions for interpretability research.
    Abstract Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify four core elements of interpretability: localization, visual recognizability, physical attribution, and transparency. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.

Commutative Width and Depth Scaling in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01683
  • repo_url: None
  • paper_authors: Soufiane Hayou
  • for: This is the second paper in the Commutative Scaling of Width and Depth (WD) series; it aims to understand the behaviour of neural functions as width and depth go to infinity and to identify settings in which commutativity of the two limits holds.
  • methods: Using new proof techniques that are accessible without the stochastic calculus required in earlier work, the authors study the neural covariance kernel of deep networks with skip connections whose branches are suitably scaled to avoid exploding behaviour.
  • results: Taking width and depth to infinity in such networks yields the same covariance structure no matter how the limit is taken; this extends the previous results of [55] and has a number of theoretical and practical implications discussed in the paper.
    Abstract This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework, and discuss its implications on neural network design and scaling. We study commutativity for the neural covariance kernel which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, result in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I))).

Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features

  • paper_url: http://arxiv.org/abs/2310.01679
  • repo_url: None
  • paper_authors: Hadi Elzayn, Emily Black, Patrick Vossler, Nathanael Jo, Jacob Goldin, Daniel E. Ho
  • for: The goal is to develop methods for measuring and reducing fairness violations when protected attribute labels are available only for a small subset of the data.
  • methods: The authors use probabilistic estimates of protected attributes (e.g., via Bayesian Improved Surname Geocoding) to compute bounds on common fairness metrics, exploit contextual information, namely the relationships between a model's predictions and the probabilistic predictions of protected attributes, to tighten those bounds, and train models with limited fairness violations by solving a constrained non-convex optimization problem (a simplified estimator is sketched after this entry).
  • results: On voting data, the measurement method bounds the true disparity up to 5.5x tighter than previous methods, and the training technique reduces disparity with smaller fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.
    Abstract The vast majority of techniques to train fair models require access to the protected attribute (e.g., race, gender), either at train time or in production. However, in many important applications this protected attribute is largely unavailable. In this paper, we develop methods for measuring and reducing fairness violations in a setting with limited access to protected attribute labels. Specifically, we assume access to protected attribute labels on a small subset of the dataset of interest, but only probabilistic estimates of protected attribute labels (e.g., via Bayesian Improved Surname Geocoding) for the rest of the dataset. With this setting in mind, we propose a method to estimate bounds on common fairness metrics for an existing model, as well as a method for training a model to limit fairness violations by solving a constrained non-convex optimization problem. Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity. We provide an empirical illustration of our methods using voting data. First, we show our measurement method can bound the true disparity up to 5.5x tighter than previous methods in these applications. Then, we demonstrate that our training technique effectively reduces disparity while incurring lesser fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.
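
The sketch below shows the simplest version of the setting: each individual carries only a probability of belonging to the protected group (e.g., from surname geocoding), and group-wise positive-prediction rates are estimated by weighting predictions with those probabilities. The data are synthetic, and this naive weighted point estimator is not the paper's bound construction; in this toy example it understates the true gap, which is one reason the paper works with bounds rather than point estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Made-up data: true (unobserved) group membership, a probabilistic proxy for it,
# and binary model predictions whose positive rate differs across groups.
group = rng.binomial(1, 0.3, size=n)                         # 1 = protected group
p_group = 0.6 * group + 0.2 + 0.2 * rng.random(n)            # noisy estimate of P(group = 1)
yhat = rng.binomial(1, np.where(group == 1, 0.35, 0.55))     # model's positive predictions

# Probability-weighted estimates of each group's positive-prediction rate.
rate_protected = np.sum(p_group * yhat) / np.sum(p_group)
rate_other = np.sum((1 - p_group) * yhat) / np.sum(1 - p_group)
est_gap = rate_other - rate_protected

# Ground-truth demographic-parity gap, known here only because the data is synthetic.
true_gap = yhat[group == 0].mean() - yhat[group == 1].mean()
print(f"probability-weighted gap estimate: {est_gap:.3f}, true gap: {true_gap:.3f}")
```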

Score dynamics: scaling molecular dynamics with picosecond timesteps via conditional diffusion model

  • paper_url: http://arxiv.org/abs/2310.01678
  • repo_url: None
  • paper_authors: Tim Hsu, Babak Sadigh, Vasily Bulatov, Fei Zhou
  • for: The goal is to learn effective evolution operators for atomistic and coarse-grained dynamics from molecular-dynamics (MD) simulation data.
  • methods: The authors introduce score dynamics (SD), which uses scores (derivatives of the transition log-probability with respect to the dynamical degrees of freedom) in a conditional denoising diffusion model, and build graph-neural-network-based SD models of realistic molecular systems evolved with 1 ps timesteps.
  • results: Case studies on alanine dipeptide and short alkanes in aqueous solution show equilibrium and kinetic predictions in good agreement with MD at roughly an 8-18 fold wall-clock speedup.
    Abstract We propose score dynamics (SD), a general framework for learning effective evolution operators for atomistic as well as coarse-grained dynamics from molecular-dynamics (MD) simulations. SD is centered around scores, or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to generate discrete transitions of the dynamical variables in an SD timestep, which can be orders of magnitude larger than a typical MD timestep. In this work, we construct graph neural network based score dynamics models of realistic molecular systems that are evolved with 1~ps timesteps. We demonstrate the efficacy of score dynamics with case studies of alanine dipeptide and short alkanes in aqueous solution. Both equilibrium predictions derived from the stationary distributions of the conditional probability and kinetic predictions for the transition rates and transition paths are in good agreement with MD at about 8-18 fold wall-clock speedup. Open challenges and possible future remedies to improve score dynamics are also discussed.

Locality-Aware Graph-Rewiring in GNNs

  • paper_url: http://arxiv.org/abs/2310.01668
  • repo_url: None
  • paper_authors: Federico Barbero, Ameya Velingker, Amin Saberi, Michael Bronstein, Francesco Di Giovanni
  • for: The paper aims to improve information flow in graph neural networks (GNNs) by rewiring the graph connectivity.
  • methods: The authors identify three desiderata for graph rewiring, namely reducing over-squashing, respecting the locality of the graph, and preserving its sparsity, and propose a rewiring framework based on a locality-aware sequence of rewiring operations that satisfies all three.
  • results: Experiments on several real-world benchmarks show that the proposed framework matches or significantly outperforms existing rewiring approaches.
    Abstract Graph Neural Networks (GNNs) are popular models for machine learning on graphs that typically follow the message-passing paradigm, whereby the feature of a node is updated recursively upon aggregating information over its neighbors. While exchanging messages over the input graph endows GNNs with a strong inductive bias, it can also make GNNs susceptible to over-squashing, thereby preventing them from capturing long-range interactions in the given graph. To rectify this issue, graph rewiring techniques have been proposed as a means of improving information flow by altering the graph connectivity. In this work, we identify three desiderata for graph-rewiring: (i) reduce over-squashing, (ii) respect the locality of the graph, and (iii) preserve the sparsity of the graph. We highlight fundamental trade-offs that occur between spatial and spectral rewiring techniques; while the former often satisfy (i) and (ii) but not (iii), the latter generally satisfy (i) and (iii) at the expense of (ii). We propose a novel rewiring framework that satisfies all of (i)--(iii) through a locality-aware sequence of rewiring operations. We then discuss a specific instance of such rewiring framework and validate its effectiveness on several real-world benchmarks, showing that it either matches or significantly outperforms existing rewiring approaches.

Home Electricity Data Generator (HEDGE): An open-access tool for the generation of electric vehicle, residential demand, and PV generation profiles

  • paper_url: http://arxiv.org/abs/2310.01661
  • repo_url: None
  • paper_authors: Flora Charbonnier, Thomas Morstyn, Malcolm McCulloch
  • for: The paper presents the Home Electricity Data Generator (HEDGE), an open-access tool for the random generation of realistic residential energy data.
  • methods: Raw UK datasets are pre-processed (filling in incomplete data sequences and clustering profiles into behaviour clusters), and generative adversarial networks (GANs) are trained to produce realistic synthetic data for each behaviour cluster.
  • results: HEDGE fills gaps in existing data banks and generates consistent multi-day profiles of residential PV generation, household electric loads, and electric vehicle consumption and at-home availability, supporting research on characterising and coordinating residential distributed energy resources.
    Abstract In this paper, we present the Home Electricity Data Generator (HEDGE), an open-access tool for the random generation of realistic residential energy data. HEDGE generates realistic daily profiles of residential PV generation, household electric loads, and electric vehicle consumption and at-home availability, based on real-life UK datasets. The lack of usable data is a major hurdle for research on residential distributed energy resources characterisation and coordination, especially when using data-driven methods such as machine learning-based forecasting and reinforcement learning-based control. A key issue is that while large data banks are available, they are not in a usable format, and numerous subsequent days of data for a given single home are unavailable. We fill these gaps with the open-access HEDGE tool which generates data sequences of energy data for several days in a way that is consistent for single homes, both in terms of profile magnitude and behavioural clusters. From raw datasets, pre-processing steps are conducted, including filling in incomplete data sequences and clustering profiles into behaviour clusters. Generative adversarial networks (GANs) are then trained to generate realistic synthetic data representative of each behaviour groups consistent with real-life behavioural and physical patterns.

REMEDI: REinforcement learning-driven adaptive MEtabolism modeling of primary sclerosing cholangitis DIsease progression

  • paper_url: http://arxiv.org/abs/2310.01426
  • repo_url: None
  • paper_authors: Chang Hu, Krishnakant V. Saboo, Ahmad H. Ali, Brian D. Juran, Konstantinos N. Lazaridis, Ravishankar K. Iyer
  • for: This paper introduces REMEDI, a framework that can assist in exploring treatments for primary sclerosing cholangitis (PSC) by capturing bile acid dynamics and the body's adaptive response during PSC progression.
  • methods: REMEDI combines a differential equation (DE)-based mechanistic model of bile acid metabolism with reinforcement learning (RL) to emulate the body's adaptations to PSC continuously, treating homeostasis as the reward signal and adjustments of the DE parameters as the corresponding actions.
  • results: On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings, and supported discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment.
    Abstract Primary sclerosing cholangitis (PSC) is a rare disease wherein altered bile acid metabolism contributes to sustained liver injury. This paper introduces REMEDI, a framework that captures bile acid dynamics and the body's adaptive response during PSC progression that can assist in exploring treatments. REMEDI merges a differential equation (DE)-based mechanistic model that describes bile acid metabolism with reinforcement learning (RL) to emulate the body's adaptations to PSC continuously. An objective of adaptation is to maintain homeostasis by regulating enzymes involved in bile acid metabolism. These enzymes correspond to the parameters of the DEs. REMEDI leverages RL to approximate adaptations in PSC, treating homeostasis as a reward signal and the adjustment of the DE parameters as the corresponding actions. On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings. Also, our results support discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment.

PolySketchFormer: Fast Transformers via Sketches for Polynomial Kernels

  • paper_url: http://arxiv.org/abs/2310.01655
  • repo_url: None
  • paper_authors: Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
  • for: This paper aims to improve the efficiency of transformer architectures for language modeling by replacing softmax attention with a polynomial function and using polynomial sketching.
  • methods: The paper proposes a new attention mechanism called polynomial attention, which uses sketches for Polynomial Kernel from the randomized numerical linear algebra literature to approximate the attention output. The paper also introduces an efficient block-based algorithm to apply the causal mask to the attention matrix without explicitly realizing the $n \times n$ attention matrix.
  • results: The paper shows that the proposed polynomial attention mechanism leads to a significantly faster attention mechanism without assuming any sparse structure for the attention matrix, and the block-based algorithm gives significant speedups over the cumulative sum algorithm used by Performer. The paper also validates the design empirically by training language models with long context lengths and shows that the eval perplexities of the models are comparable to those of models trained with softmax attention, and the training times are significantly faster than FlashAttention.
    Abstract The quadratic complexity of attention in transformer architectures remains a big bottleneck in scaling up large foundation models for long context. In fact, recent theoretical results show the hardness of approximating the output of softmax attention mechanism in sub-quadratic time assuming Strong Exponential Time Hypothesis. In this paper, we show how to break this theoretical barrier by replacing softmax with a polynomial function and polynomial sketching. In particular we show that sketches for Polynomial Kernel from the randomized numerical linear algebra literature can be used to approximate the polynomial attention which leads to a significantly faster attention mechanism without assuming any sparse structure for the attention matrix that has been done in many previous works. In addition, we propose an efficient block-based algorithm that lets us apply the causal mask to the attention matrix without explicitly realizing the $n \times n$ attention matrix and compute the output of the polynomial attention mechanism in time linear in the context length. The block-based algorithm gives significant speedups over the \emph{cumulative sum} algorithm used by Performer to apply the causal mask to the attention matrix. These observations help us design \emph{PolySketchFormer}, a practical linear-time transformer architecture for language modeling with provable guarantees. We validate our design empirically by training language models with long context lengths. We first show that the eval perplexities of our models are comparable to that of models trained with softmax attention. We then show that for large context lengths our training times are significantly faster than FlashAttention.
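
To illustrate why replacing softmax with a polynomial helps, the sketch below computes degree-2 polynomial attention exactly with an explicit feature map, so attention costs O(n d^2) instead of O(n^2 d). PolySketchFormer itself approximates higher-degree kernels with polynomial sketches and applies the causal mask with a block-based algorithm; this toy non-causal example does neither, and its shapes and data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 16                       # sequence length and head dimension (arbitrary)
Q, K, V = (rng.normal(size=(n, d)) / d**0.25 for _ in range(3))

def phi(X):
    """Explicit feature map for the degree-2 polynomial kernel: phi(x) . phi(y) = (x . y)**2."""
    return np.einsum("ni,nj->nij", X, X).reshape(len(X), -1)

# Quadratic-time reference: attention weights proportional to (q . k)^2.
S = (Q @ K.T) ** 2
ref = (S / S.sum(axis=1, keepdims=True)) @ V

# Linear-time version: aggregate keys/values once, then one product per query.
PK = phi(K)                          # (n, d*d)
KV = PK.T @ V                        # (d*d, d)
Ksum = PK.sum(axis=0)                # (d*d,)
PQ = phi(Q)
out = (PQ @ KV) / (PQ @ Ksum)[:, None]

print("max abs difference vs. reference:", np.abs(out - ref).max())
```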

Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

  • paper_url: http://arxiv.org/abs/2310.01651
  • repo_url: https://github.com/ys-zong/foolyourvllms
  • paper_authors: Yongshuo Zong, Tingyang Yu, Bingchen Zhao, Ruchika Chavhan, Timothy Hospedales
  • for: The paper examines a robustness weakness of popular language and vision-language models: sensitivity to the ordering of the answer set in multiple-choice question answering (MCQA).
  • methods: The authors probe this sensitivity by adversarially permuting the answer options in multiple-choice prompts and analysing how model performance changes (a toy evaluation harness is sketched after this entry).
  • results: Popular models are vulnerable to adversarial permutations of the answer set, with performance dropping even though models should ideally be as invariant to prompt permutation as humans are; the vulnerability persists across model sizes and in very recent language and vision-language models.
    Abstract Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code is available at \url{https://github.com/ys-zong/FoolyourVLLMs}.
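
A minimal version of the evaluation: for every permutation of the answer options, rebuild the option list, ask the model to pick one, and check whether the choice still maps to the correct answer. The score_options function below is a placeholder stand-in for a real model API with a deliberate position bias, and the question is invented; only the permutation loop reflects the paper's setup.

```python
from itertools import permutations

def score_options(question, options):
    """Placeholder 'model': returns the index of the chosen option.
    A real evaluation would query an LLM/VLM here; this stand-in only finds the
    correct answer when it appears early in the list, mimicking position bias."""
    for i, opt in enumerate(options[:2]):
        if "Paris" in opt:
            return i
    return 0

question = "What is the capital of France?"          # invented example
options = ["Paris", "Rome", "Berlin", "Madrid"]      # first option is correct
correct = "Paris"

n_correct = 0
perms = list(permutations(range(len(options))))
for perm in perms:
    shuffled = [options[i] for i in perm]
    choice = score_options(question, shuffled)
    n_correct += shuffled[choice] == correct

print(f"accuracy over all {len(perms)} option orderings: {n_correct / len(perms):.2f}")
```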

Equivariant Adaptation of Large Pretrained Models

  • paper_url: http://arxiv.org/abs/2310.01647
  • repo_url: None
  • paper_authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sékou-Oumar Kaba, Sai Rajeswar, Siamak Ravanbakhsh
  • for: The goal is to make large pretrained models equivariant, yielding consistent behaviour under input transformations (and hence better sample efficiency and more robust predictions) without redesigning every architectural component or paying a heavy computational cost at training and inference time.
  • methods: A simple canonicalization network transforms the input to a canonical form before feeding it to an unconstrained prediction network (a toy canonicalization example follows this entry).
  • results: Using dataset-dependent priors to inform the canonicalization function, large pretrained models can be made equivariant while maintaining their performance, significantly improving robustness to deterministic transformations of the data such as rotations.
    Abstract Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
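
The sketch below shows the canonicalization idea in its simplest form: a 2-D point cloud is rotated into a canonical orientation (here via PCA axes with a sign convention) before being handed to an arbitrary, non-equivariant feature extractor, making the whole pipeline invariant to input rotations. The point cloud, the canonicalization rule, and the downstream predictor are all made up; the paper instead learns a canonicalization network and aligns it with dataset-dependent priors.

```python
import numpy as np

rng = np.random.default_rng(0)

def canonicalize(points):
    """Rotate a centred 2-D point cloud into a canonical frame using PCA axes,
    with a sign convention (positive third moment) to remove the reflection ambiguity."""
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    canon = centred @ vt.T
    signs = np.sign(np.sum(canon**3, axis=0))
    signs[signs == 0] = 1.0
    return canon * signs

def predictor(points):
    """Stand-in for an unconstrained pretrained model: orientation-sensitive features."""
    return np.concatenate([points.max(axis=0), points.min(axis=0)])

points = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.3], [0.1, 0.7]])  # toy input
theta = 1.1                                                              # arbitrary rotation
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

out_original = predictor(canonicalize(points))
out_rotated = predictor(canonicalize(points @ R.T))
print("prediction changes under rotation by at most:",
      np.abs(out_original - out_rotated).max())   # ~0 thanks to canonicalization
```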

Deep Insights into Noisy Pseudo Labeling on Graph Data

  • paper_url: http://arxiv.org/abs/2310.01634
  • repo_url: None
  • paper_authors: Botao Wang, Jia Li, Yang Liu, Jiashun Cheng, Yu Rong, Wenjia Wang, Fugee Tsung
  • for: The paper provides a deep analysis of pseudo labeling (PL) strategies for graph learning models and proposes a cautious PL method to improve the graph learning process.
  • methods: The authors derive an error analysis showing that the PL error is bounded by the confidence of the PL threshold and the consistency of multi-view predictions, and propose pseudo labeling only the samples with the highest confidence and multi-view consistency (a toy selection rule is sketched after this entry).
  • results: Extensive experiments show that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
    Abstract Pseudo labeling (PL) is a wide-applied strategy to enlarge the labeled dataset by self-annotating the potential samples during the training process. Several works have shown that it can improve the graph learning model performance in general. However, we notice that the incorrect labels can be fatal to the graph training process. Inappropriate PL may result in the performance degrading, especially on graph data where the noise can propagate. Surprisingly, the corresponding error is seldom theoretically analyzed in the literature. In this paper, we aim to give deep insights of PL on graph learning models. We first present the error analysis of PL strategy by showing that the error is bounded by the confidence of PL threshold and consistency of multi-view prediction. Then, we theoretically illustrate the effect of PL on convergence property. Based on the analysis, we propose a cautious pseudo labeling methodology in which we pseudo label the samples with highest confidence and multi-view consistency. Finally, extensive experiments demonstrate that the proposed strategy improves graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
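
The selection rule itself is easy to state: given class-probability predictions from two views of the unlabeled nodes (e.g., two augmentations), pseudo label only those whose most likely class exceeds a confidence threshold in both views and agrees across views. The probabilities below are random placeholders and the threshold is arbitrary; this is not the paper's full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n_unlabeled, n_classes = 1000, 5
tau = 0.9                                # confidence threshold (arbitrary)

def random_probs(n, k):
    """Placeholder class-probability predictions (rows sum to 1)."""
    logits = rng.normal(scale=2.0, size=(n, k))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs_view_a = random_probs(n_unlabeled, n_classes)   # e.g., predictions on view A
probs_view_b = random_probs(n_unlabeled, n_classes)   # e.g., predictions on view B

pred_a, conf_a = probs_view_a.argmax(1), probs_view_a.max(1)
pred_b, conf_b = probs_view_b.argmax(1), probs_view_b.max(1)

# Cautious rule: high confidence in both views AND the two views agree.
selected = (conf_a >= tau) & (conf_b >= tau) & (pred_a == pred_b)
pseudo_labels = pred_a[selected]

print(f"pseudo labeled {selected.sum()} / {n_unlabeled} nodes "
      f"({selected.mean():.1%}); the rest stay unlabeled this round")
```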

Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods

  • paper_url: http://arxiv.org/abs/2310.01618
  • repo_url: None
  • paper_authors: Emanuele Zappala, Daniel Levine, Sizhuang He, Syed Rizvi, Sacha Levy, David van Dijk
  • for: The paper aims to give deep learning firmer theoretical foundations by drawing parallels with classical numerical analysis.
  • methods: Neural networks are framed as operators whose fixed points represent desired solutions, with convergence proofs based on fixed point theory for iterative operator equations; the authors also introduce PIGN, an iterative graph neural network (a toy fixed-point iteration is sketched after this entry).
  • results: Popular architectures such as diffusion models and AlphaFold are shown to inherently employ iterative operator learning, and empirical assessments indicate that performing iterations through network operators improves performance.
    Abstract Deep neural networks, despite their success in numerous applications, often function without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators with fixed points representing desired solutions, we develop a theoretical framework grounded in iterative methods for operator equations. Under defined conditions, we present convergence proofs based on fixed point theory. We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning. Empirical assessments highlight that performing iterations through network operators improves performance. We also introduce an iterative graph neural network, PIGN, that further demonstrates benefits of iterations. Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis, potentially guiding the design of future networks with clearer theoretical underpinnings and improved performance.
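
The numerical-analysis picture behind the paper is a fixed-point iteration x_{k+1} = f(x_k): if the operator f is a contraction, the iterates converge to a unique fixed point regardless of the starting guess (Banach fixed-point theorem). The sketch below uses a tiny random "layer" whose linear part is rescaled to be contractive; it illustrates the iteration the framework builds on, not any specific architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# A toy operator f(x) = tanh(W x + b). Rescaling W so ||W||_2 < 1 makes f a
# contraction (tanh is 1-Lipschitz), so iterating f converges to a fixed point.
W = rng.normal(size=(d, d))
W *= 0.9 / np.linalg.norm(W, 2)
b = rng.normal(size=d)
f = lambda x: np.tanh(W @ x + b)

def fixed_point(f, x0, tol=1e-10, max_iter=500):
    x = x0
    for k in range(max_iter):
        x_next = f(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next, k + 1
        x = x_next
    return x, max_iter

x_star_a, iters_a = fixed_point(f, np.zeros(d))
x_star_b, iters_b = fixed_point(f, rng.normal(size=d))   # different initialization

print(f"converged in {iters_a} and {iters_b} iterations; "
      f"fixed points agree to {np.linalg.norm(x_star_a - x_star_b):.1e}")
```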

Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

  • paper_url: http://arxiv.org/abs/2310.01611
  • repo_url: https://github.com/armanbolatov/hardness_of_learning
  • paper_authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov
  • for: The paper investigates the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order.
  • methods: The analysis relies on the Boas-Bellman inequality in inner product spaces and establishes approximate orthogonality of the discrete logarithm's parity-bit functions through the spectral norm of certain matrices; neural-network experiments complement the theory.
  • results: The gradient of the loss concentrates around a fixed point independent of the logarithm's base, which restricts the ability to learn the parity bit efficiently with gradient-based methods regardless of the network architecture, and the empirical success rate of predicting the parity bit decreases as the group order increases.
    Abstract The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentration of the gradient of the loss function around a fixed point, independent of the logarithm's base used. This concentration property leads to a restricted ability to learn the parity bit efficiently using gradient-based methods, irrespective of the complexity of the network architecture being trained. Our proof relies on Boas-Bellman inequality in inner product spaces and it involves establishing approximate orthogonality of discrete logarithm's parity bit functions through the spectral norm of certain matrices. Empirical experiments using a neural network-based approach further verify the limitations of gradient-based learning, demonstrating the decreasing success rate in predicting the parity bit as the group order increases.

Adversarial Contextual Bandits Go Kernelized

  • paper_url: http://arxiv.org/abs/2310.01609
  • repo_url: None
  • paper_authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili
  • for: The paper studies a generalization of adversarial linear contextual bandits in which the loss functions belong to a reproducing kernel Hilbert space, allowing more flexible modeling of complex decision-making scenarios.
  • methods: The authors propose a computationally efficient algorithm based on a new optimistically biased estimator of the loss functions, and analyse its regret under a variety of eigenvalue-decay assumptions on the underlying kernel.
  • results: Under polynomial eigendecay with exponent $c>1$, the algorithm achieves near-optimal regret $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ is the number of rounds and $K$ the number of actions; under exponential eigendecay the bound tightens to $\widetilde{O}(\sqrt{T})$. These rates match the known lower bounds in all special cases where lower bounds exist and match the best known upper bounds for the more well-studied stochastic counterpart of the problem.
    Abstract We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.

Pool-Based Active Learning with Proper Topological Regions

  • paper_url: http://arxiv.org/abs/2310.01597
  • repo_url: https://github.com/Lies0zeta/PALPTR-
  • paper_authors: Lies Hadjadj, Emilie Devijver, Remi Molinier, Massih-Reza Amini
  • for: The paper proposes a pool-based active learning strategy for multi-class classification tasks, aimed at improving model performance when labeled data is scarce.
  • methods: A meta-approach based on Proper Topological Regions (PTR), derived from topological data analysis (TDA), is used to sample cold-start points and to select the most relevant unlabeled data within the active learning scheme.
  • results: Empirical results on various benchmark datasets show that the method is competitive with classical approaches from the literature.
    Abstract Machine learning methods usually rely on large sample size to have good performance, while it is difficult to provide labeled set in many applications. Pool-based active learning methods are there to detect, among a set of unlabeled data, the ones that are the most relevant for the training. We propose in this paper a meta-approach for pool-based active learning strategies in the context of multi-class classification tasks based on Proper Topological Regions. PTR, based on topological data analysis (TDA), are relevant regions used to sample cold-start points or within the active learning scheme. The proposed method is illustrated empirically on various benchmark datasets, being competitive to the classical methods from the literature.

An Investigation of Representation and Allocation Harms in Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.01583
  • repo_url: https://github.com/smaityumich/cl-representation-harm
  • paper_authors: Subha Maity, Mayank Agarwal, Mikhail Yurochkin, Yuekai Sun
  • for: The paper investigates how underrepresentation harms minority groups in self-supervised learning (SSL), focusing on contrastive learning (CL).
  • methods: Using image and text datasets with popular CL methods, the authors demonstrate representation harm, whereby representations of minority groups collapse onto those of certain majority groups, and perform a causal mediation analysis of allocation harm on a downstream classification task.
  • results: Representation harm is shown to be partly responsible for allocation harm in downstream classification, and a stochastic block model provides a theoretical explanation for the representational collapse in the contrastive learning setting.
    Abstract The effect of underrepresentation on the performance of minority groups is known to be a serious problem in supervised learning settings; however, it has been underexplored so far in the context of self-supervised learning (SSL). In this paper, we demonstrate that contrastive learning (CL), a popular variant of SSL, tends to collapse representations of minority groups with certain majority groups. We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods. Furthermore, our causal mediation analysis of allocation harm on a downstream classification task reveals that representation harm is partly responsible for it, thus emphasizing the importance of studying and mitigating representation harm. Finally, we provide a theoretical explanation for representation harm using a stochastic block model that leads to a representational neural collapse in a contrastive learning setting.

Contraction Properties of the Global Workspace Primitive

  • paper_url: http://arxiv.org/abs/2310.01571
  • repo_url: None
  • paper_authors: Michaela Ennis, Leo Kozachkov, Jean-Jacques Slotine
  • for: The paper advances the study of multi-area recurrent neural networks (RNNs), building on the provably stable "RNNs of RNNs" introduced by Kozachkov et al.
  • methods: The authors extend the stability conditions theoretically and empirically, proving relaxed stability conditions for salient special cases of the architecture, most notably a global workspace modular structure.
  • results: Global Workspace Sparse Combo Nets with a small number of trainable parameters achieve strong overall test performance and greater resilience to the removal of individual subnetworks, and exploring sparsity in the connectivity between subnetwork modules improves the state of the art for stable RNNs on benchmark sequence processing tasks.
    Abstract To push forward the important emerging research field surrounding multi-area recurrent neural networks (RNNs), we expand theoretically and empirically on the provably stable RNNs of RNNs introduced by Kozachkov et al. in "RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks". We prove relaxed stability conditions for salient special cases of this architecture, most notably for a global workspace modular structure. We then demonstrate empirical success for Global Workspace Sparse Combo Nets with a small number of trainable parameters, not only through strong overall test performance but also greater resilience to removal of individual subnetworks. These empirical results for the global workspace inter-area topology are contingent on stability preservation, highlighting the relevance of our theoretical work for enabling modular RNN success. Further, by exploring sparsity in the connectivity structure between different subnetwork modules more broadly, we improve the state of the art performance for stable RNNs on benchmark sequence processing tasks, thus underscoring the general utility of specialized graph structures for multi-area RNNs.

Causality-informed Rapid Post-hurricane Building Damage Detection in Large Scale from InSAR Imagery

  • paper_url: http://arxiv.org/abs/2310.01565
  • repo_url: None
  • paper_authors: Chenguang Wang, Yepeng Liu, Xiaojian Zhang, Xuechun Li, Vladimir Paramygin, Arthriya Subgranon, Peter Sheng, Xilei Zhao, Susu Xu
  • for: This paper aims to enable rapid, large-scale assessment of hurricane-induced building damage.
  • methods: The approach uses remote sensing data, specifically Interferometric Synthetic Aperture Radar (InSAR) imagery acquired immediately after a disaster, together with a holistic causal Bayesian network that encodes the complex causal dependencies among wind, flood, building damage, and the InSAR signal; the unobserved building damage is then jointly inferred by fusing the InSAR imagery with prior physical models of flood and wind, without ground truth labels.
  • results: Validated against annotated building damage ground truth from Lee County, Florida after the 2022 Hurricane Ian, the method achieves rapid and accurate detection of building damage with significantly reduced processing time compared to traditional manual inspection.
    Abstract Timely and accurate assessment of hurricane-induced building damage is crucial for effective post-hurricane response and recovery efforts. Recently, remote sensing technologies provide large-scale optical or Interferometric Synthetic Aperture Radar (InSAR) imagery data immediately after a disastrous event, which can be readily used to conduct rapid building damage assessment. Compared to optical satellite imageries, the Synthetic Aperture Radar can penetrate cloud cover and provide more complete spatial coverage of damaged zones in various weather conditions. However, these InSAR imageries often contain highly noisy and mixed signals induced by co-occurring or co-located building damage, flood, flood/wind-induced vegetation changes, as well as anthropogenic activities, making it challenging to extract accurate building damage information. In this paper, we introduced an approach for rapid post-hurricane building damage detection from InSAR imagery. This approach encoded complex causal dependencies among wind, flood, building damage, and InSAR imagery using a holistic causal Bayesian network. Based on the causal Bayesian network, we further jointly inferred the large-scale unobserved building damage by fusing the information from InSAR imagery with prior physical models of flood and wind, without the need for ground truth labels. Furthermore, we validated our estimation results in a real-world devastating hurricane -- the 2022 Hurricane Ian. We gathered and annotated building damage ground truth data in Lee County, Florida, and compared the introduced method's estimation results with the ground truth and benchmarked it against state-of-the-art models to assess the effectiveness of our proposed method. Results show that our method achieves rapid and accurate detection of building damage, with significantly reduced processing time compared to traditional manual inspection methods.

On the near-optimality of betting confidence sets for bounded means

  • paper_url: http://arxiv.org/abs/2310.01547
  • repo_url: None
  • paper_authors: Shubhanshu Shekhar, Aaditya Ramdas
  • for: The paper provides theoretical justification for betting-based nonasymptotic confidence intervals (CIs) for the mean of a bounded distribution, and for their time-uniform variants, confidence sequences (CSs).
  • methods: The analysis compares the betting CI of Waudby-Smith and Ramdas (2023) with classical empirical Bernstein (EB) constructions via first-order asymptotic widths and information-theoretic lower bounds (a simplified betting-CI sketch follows this entry).
  • results: The betting CI has a smaller limiting width than existing EB-CIs, two lower bounds characterize the minimum width achievable by any method for constructing CIs/CSs in terms of inverse information projections, and the betting CI and CS match these fundamental limits up to an additive logarithmic term and a multiplicative constant, in both the asymptotic and finite-sample regimes.
    Abstract Constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical nonparametric approach proceeds by inverting standard concentration bounds, such as Hoeffding's or Bernstein's inequalities. Recently, an alternative betting-based approach for defining CIs and their time-uniform variants called confidence sequences (CSs), has been shown to be empirically superior to the classical methods. In this paper, we provide theoretical justification for this improved empirical performance of betting CIs and CSs. Our main contributions are as follows: (i) We first compare CIs using the values of their first-order asymptotic widths (scaled by $\sqrt{n}$), and show that the betting CI of Waudby-Smith and Ramdas (2023) has a smaller limiting width than existing empirical Bernstein (EB)-CIs. (ii) Next, we establish two lower bounds that characterize the minimum width achievable by any method for constructing CIs/CSs in terms of certain inverse information projections. (iii) Finally, we show that the betting CI and CS match the fundamental limits, modulo an additive logarithmic term and a multiplicative constant. Overall these results imply that the betting CI~(and CS) admit stronger theoretical guarantees than the existing state-of-the-art EB-CI~(and CS); both in the asymptotic and finite-sample regimes.
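The following is a minimal sketch of the betting idea behind such CIs: a candidate mean is rejected once a nonnegative capital process exceeds 1/alpha (Ville's inequality), and the CI is the set of non-rejected candidates. The fixed bet size and the grid search are illustrative simplifications; the estimator of Waudby-Smith and Ramdas (2023) uses adaptive, predictable bets.

```python
import numpy as np

def betting_ci(x, alpha=0.05, lam=0.5, grid_size=1000):
    """Sketch of a two-sided betting CI for the mean of [0, 1]-valued data."""
    grid = np.linspace(1e-3, 1 - 1e-3, grid_size)
    kept = []
    for m in grid:
        cap_plus, cap_minus = 1.0, 1.0
        rejected = False
        for xi in x:
            cap_plus *= 1.0 + lam * (xi - m)   # bets that the true mean is > m
            cap_minus *= 1.0 - lam * (xi - m)  # bets that the true mean is < m
            if 0.5 * (cap_plus + cap_minus) >= 1.0 / alpha:
                rejected = True                # m rejected by Ville's inequality
                break
        if not rejected:
            kept.append(m)
    return (min(kept), max(kept)) if kept else (None, None)

rng = np.random.default_rng(0)
data = rng.beta(2, 5, size=500)  # true mean = 2/7, roughly 0.286
print(betting_ci(data))
```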

Fusing Models with Complementary Expertise

  • paper_url: http://arxiv.org/abs/2310.01542
  • repo_url: None
  • paper_authors: Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin
  • for: Training AI models that generalize across tasks and domains, so that the diverse data distributions encountered at test time can be handled better than by any single expert.
  • methods: Fusion of Experts (FoE): the outputs of expert models with complementary knowledge of the data distribution are fused by a model trained with supervised learning; the approach applies to both discriminative and generative tasks.
  • results: Significant gains on image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text; in the "frugal" setting, the number of expert evaluations at test time is reduced.
    Abstract Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution and formulate it as an instance of supervised learning. Our method is applicable to both discriminative and generative tasks and leads to significant performance improvements in image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text. We also extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
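A minimal sketch of the supervised-fusion idea for classification, assuming each expert emits class probabilities and a small model is trained on their concatenation; the simulated data, experts, and logistic-regression fuser are illustrative, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical Fusion-of-Experts style fuser: train a supervised model on the
# concatenated outputs of experts that are each reliable on part of the labels.
rng = np.random.default_rng(0)
n, n_classes, n_experts = 2000, 5, 3
y = rng.integers(0, n_classes, size=n)

expert_probs = []
for e in range(n_experts):
    logits = rng.normal(0, 1, size=(n, n_classes))
    mask = (y % n_experts) == e                     # this expert's "specialty"
    logits[np.arange(n), y] += np.where(mask, 4.0, 0.5)
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    expert_probs.append(p)

X = np.concatenate(expert_probs, axis=1)            # fuser input: all expert outputs
fuser = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
print("fused accuracy:", fuser.score(X[1500:], y[1500:]))
```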

Adversarial Client Detection via Non-parametric Subspace Monitoring in the Internet of Federated Things

  • paper_url: http://arxiv.org/abs/2310.01537
  • repo_url: None
  • paper_authors: Xianjian Xie, Xiaochen Xian, Dan Li, Andi Wang
  • for: Proposing an effective non-parametric method, FedRR, to address adversarial attacks in federated learning networks.
  • methods: The method leverages the low-rank features of the transmitted parameter updates, accurately detecting adversarial clients while controlling the false alarm rate.
  • results: Experiments on digit recognition with the MNIST dataset validate the advantages of the method.
    Abstract The Internet of Federated Things (IoFT) represents a network of interconnected systems with federated learning as the backbone, facilitating collaborative knowledge acquisition while ensuring data privacy for individual systems. The wide adoption of IoFT, however, is hindered by security concerns, particularly the susceptibility of federated learning networks to adversarial attacks. In this paper, we propose an effective non-parametric approach FedRR, which leverages the low-rank features of the transmitted parameter updates generated by federated learning to address the adversarial attack problem. Besides, our proposed method is capable of accurately detecting adversarial clients and controlling the false alarm rate under the scenario with no attack occurring. Experiments based on digit recognition using the MNIST datasets validated the advantages of our approach.
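A minimal sketch of monitoring client updates through a low-rank subspace: updates whose residual outside the dominant subspace is large are flagged. The statistic, threshold, and dimensions are illustrative; FedRR's actual procedure and false-alarm control differ.

```python
import numpy as np

# Hypothetical low-rank subspace monitoring of federated parameter updates.
rng = np.random.default_rng(0)
d, n_benign, rank = 200, 40, 5

basis = rng.normal(size=(d, rank))
benign = basis @ rng.normal(size=(rank, n_benign)) + 0.05 * rng.normal(size=(d, n_benign))
adversarial = rng.normal(size=(d, 5))               # updates with no low-rank structure

U, _, _ = np.linalg.svd(benign, full_matrices=False)
P = U[:, :rank]                                     # estimated benign subspace

def residual_score(update):
    """Norm of the component of an update orthogonal to the benign subspace."""
    return np.linalg.norm(update - P @ (P.T @ update))

threshold = np.quantile([residual_score(benign[:, i]) for i in range(n_benign)], 0.99)
for j in range(adversarial.shape[1]):
    score = residual_score(adversarial[:, j])
    print(f"client {j}: score={score:.2f} flagged={score > threshold}")
```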

Nowcasting day-ahead marginal emissions using multi-headed CNNs and deep generative models

  • paper_url: http://arxiv.org/abs/2310.01524
  • repo_url: None
  • paper_authors: Dhruv Suri, Anela Arifi, Ines Azevedo
  • for: Nowcasting day-ahead marginal emissions factors so that energy can be managed better in power systems with high flexibility and penetration of distributed energy resources.
  • methods: Multi-headed convolutional neural networks (CNNs) generate day-ahead forecasts of marginal emissions, clarifying how an independent system operator's dispatch decisions affect emissions.
  • results: The day-ahead forecasts make the emissions implications of a given dispatch schedule explicit, supporting better management of the energy system.
    Abstract Nowcasting day-ahead marginal emissions factors is increasingly important for power systems with high flexibility and penetration of distributed energy resources. With a significant share of firm generation from natural gas and coal power plants, forecasting day-ahead emissions in the current energy system has been widely studied. In contrast, as we shift to an energy system characterized by flexible power markets, dispatchable sources, and competing low-cost generation such as large-scale battery or hydrogen storage, system operators will be able to choose from a mix of different generation as well as emission pathways. To fully develop the emissions implications of a given dispatch schedule, we need a near real-time workflow with two layers. The first layer is a market model that continuously solves a security-constrained economic dispatch model. The second layer determines the marginal emissions based on the output of the market model, which is the subject of this paper. We propose using multi-headed convolutional neural networks to generate day-ahead forecasts of marginal and average emissions for a given independent system operator.
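A minimal sketch of a multi-headed CNN of the kind described above, with a shared convolutional trunk and separate heads for marginal and average emissions; the layer sizes, input features, and forecast horizon are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiHeadEmissionsCNN(nn.Module):
    """Shared 1D-conv trunk over the past day of grid features, two output heads."""
    def __init__(self, n_features: int, horizon: int = 24):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.marginal_head = nn.Linear(64, horizon)   # day-ahead marginal emissions
        self.average_head = nn.Linear(64, horizon)    # day-ahead average emissions

    def forward(self, x):                              # x: (batch, n_features, 24)
        h = self.trunk(x)
        return self.marginal_head(h), self.average_head(h)

model = MultiHeadEmissionsCNN(n_features=8)
marginal, average = model(torch.randn(4, 8, 24))
print(marginal.shape, average.shape)                   # torch.Size([4, 24]) twice
```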

The Benefit of Noise-Injection for Dynamic Gray-Box Model Creation

  • paper_url: http://arxiv.org/abs/2310.01517
  • repo_url: None
  • paper_authors: Mohamed Kandil, J. J. McArthur
  • for: This paper aims to improve the performance of gray-box models for equipment emulator development by addressing uncertainties in the model creation process.
  • methods: The paper proposes injecting noise into the training dataset to enrich the data and provide a measure of robustness against uncertainties.
  • results: The approach was tested on a water-to-water heat exchanger using real devices with live data streaming, resulting in a significant reduction in modeling error (root mean square error) compared to the unprocessed signal data. The improvement amounted to 60% on the training set, and 50% and 45% on the test and validation sets, respectively.
    Abstract Gray-box models offer significant benefit over black-box approaches for equipment emulator development for equipment since their integration of physics provides more confidence in the model outside of the training domain. However, challenges such as model nonlinearity, unmodeled dynamics, and local minima introduce uncertainties into grey-box creation that contemporary approaches have failed to overcome, leading to their under-performance compared with black-box models. This paper seeks to address these uncertainties by injecting noise into the training dataset. This noise injection enriches the dataset and provides a measure of robustness against such uncertainties. A dynamic model for a water-to-water heat exchanger has been used as a demonstration case for this approach and tested using a pair of real devices with live data streaming. Compared to the unprocessed signal data, the application of noise injection resulted in a significant reduction in modeling error (root mean square error), decreasing from 0.68 to 0.27{\deg}C. This improvement amounts to a 60% enhancement when assessed on the training set, and improvements of 50% and 45% when validated against the test and validation sets, respectively.
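A minimal sketch of the noise-injection step, assuming the training signals are simply replicated with additive Gaussian noise before model fitting; the noise level, number of copies, and the choice to leave targets unperturbed are illustrative, not the paper's exact protocol.

```python
import numpy as np

def inject_noise(inputs, targets, n_copies=5, sigma=0.05, seed=0):
    """Return the original samples plus n_copies noise-perturbed replicas."""
    rng = np.random.default_rng(seed)
    aug_x = [inputs] + [inputs + rng.normal(0, sigma, inputs.shape) for _ in range(n_copies)]
    aug_y = [targets] * (n_copies + 1)        # targets are left unperturbed here
    return np.concatenate(aug_x), np.concatenate(aug_y)

# Example: 1000 time steps of heat-exchanger measurements with 4 channels.
x = np.random.rand(1000, 4)
y = np.random.rand(1000, 1)
x_aug, y_aug = inject_noise(x, y)
print(x_aug.shape, y_aug.shape)               # (6000, 4) (6000, 1)
```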

Tensor Ring Optimized Quantum-Enhanced Tensor Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01515
  • repo_url: https://github.com/konar1987/tr-qnet
  • paper_authors: Debanjan Konar, Dheeraj Peddireddy, Vaneet Aggarwal, Bijaya K. Panigrahi
  • for: The paper is written for researchers in the field of quantum machine learning, specifically those interested in incorporating tensor networks into deep neural networks and variational optimization.
  • methods: The paper proposes a multi-layer design of a Tensor Ring optimized variational Quantum learning classifier (Quan-TR), which consists of cascading entangling gates replacing the fully connected layers of a tensor network. The parameters of the TR-QNet are optimized through a stochastic gradient descent algorithm on qubit measurements.
  • results: The proposed TR-QNet achieves promising accuracy on three distinct datasets, namely Iris, MNIST, and CIFAR-10, with accuracy of 94.5%, 86.16%, and 83.54%, respectively, on quantum simulations. The paper also conducts benchmark studies on state-of-the-art quantum and classical implementations of tensor network models to demonstrate the efficacy of the proposed TR-QNet. Additionally, the scalability of TR-QNet highlights its potential for deep learning applications on a large scale.
    Abstract Quantum machine learning researchers often rely on incorporating Tensor Networks (TN) into Deep Neural Networks (DNN) and variational optimization. However, the standard optimization techniques used for training the contracted trainable weights of each model layer suffer from the correlations and entanglement structure between the model parameters on classical implementations. To address this issue, a multi-layer design of a Tensor Ring optimized variational Quantum learning classifier (Quan-TR) comprising cascading entangling gates replacing the fully connected (dense) layers of a TN is proposed, and it is referred to as Tensor Ring optimized Quantum-enhanced tensor neural Networks (TR-QNet). TR-QNet parameters are optimized through the stochastic gradient descent algorithm on qubit measurements. The proposed TR-QNet is assessed on three distinct datasets, namely Iris, MNIST, and CIFAR-10, to demonstrate the enhanced precision achieved for binary classification. On quantum simulations, the proposed TR-QNet achieves promising accuracy of $94.5\%$, $86.16\%$, and $83.54\%$ on the Iris, MNIST, and CIFAR-10 datasets, respectively. Benchmark studies have been conducted on state-of-the-art quantum and classical implementations of TN models to show the efficacy of the proposed TR-QNet. Moreover, the scalability of TR-QNet highlights its potential for exhibiting in deep learning applications on a large scale. The PyTorch implementation of TR-QNet is available on Github:https://github.com/konar1987/TR-QNet/

CODA: Temporal Domain Generalization via Concept Drift Simulator

  • paper_url: http://arxiv.org/abs/2310.01508
  • repo_url: None
  • paper_authors: Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou
  • for: Addressing concept drift so that machine learning models generalize across different time points.
  • methods: CODA (COncept Drift simulAtor), a framework that uses predicted feature correlations to generate future data for model training.
  • results: Experiments show that using CODA-generated data as training input achieves temporal domain generalization and is applicable across different model architectures.
    Abstract In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures.
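A minimal sketch of the data-simulation idea, assuming future samples are drawn from a Gaussian with a predicted feature correlation matrix; the correlation values and the Gaussian assumption are illustrative, and CODA's actual generator is more involved.

```python
import numpy as np

def simulate_future_data(predicted_corr, feature_means, feature_stds, n_samples, seed=0):
    """Draw synthetic samples with a prescribed correlation structure (Cholesky sampling)."""
    rng = np.random.default_rng(seed)
    cov = predicted_corr * np.outer(feature_stds, feature_stds)
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(cov)))   # jitter for stability
    z = rng.standard_normal((n_samples, len(cov)))
    return feature_means + z @ L.T

# Example: three features whose predicted correlation strengthens over time.
corr_future = np.array([[1.0, 0.6, 0.1],
                        [0.6, 1.0, 0.4],
                        [0.1, 0.4, 1.0]])
x_future = simulate_future_data(corr_future, np.zeros(3), np.ones(3), 5000)
print(np.round(np.corrcoef(x_future, rowvar=False), 2))
```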

A Learning Based Scheme for Fair Timeliness in Sparse Gossip Networks

  • paper_url: http://arxiv.org/abs/2310.01396
  • repo_url: None
  • paper_authors: Purbesh Mitra, Sennur Ulukus
  • for: Studying a gossip network with sparse, irregular connectivity in which a source, updated by a Poisson arrival process, disseminates information to the nodes; because the network is asymmetric, nodes differ in timeliness, so the rate-constrained update rate must be distributed fairly to minimize the overall worst-case performance.
  • methods: The problem is formulated as a continuum-armed bandit over the continuous rate-allocation space, and Gaussian-process-based Bayesian optimization is used to balance exploration and exploitation sequentially.
  • results: The Gaussian-process-based Bayesian optimization approach achieves fair timeliness across different network structures while minimizing the overall worst-case performance.
    Abstract We consider a gossip network, consisting of $n$ nodes, which tracks the information at a source. The source updates its information with a Poisson arrival process and also sends updates to the nodes in the network. The nodes themselves can exchange information among themselves to become as timely as possible. However, the network structure is sparse and irregular, i.e., not every node is connected to every other node in the network, rather, the order of connectivity is low, and varies across different nodes. This asymmetry of the network implies that the nodes in the network do not perform equally in terms of timelines. Due to the gossiping nature of the network, some nodes are able to track the source very timely, whereas, some nodes fall behind versions quite often. In this work, we investigate how the rate-constrained source should distribute its update rate across the network to maintain fairness regarding timeliness, i.e., the overall worst case performance of the network can be minimized. Due to the continuous search space for optimum rate allocation, we formulate this problem as a continuum-armed bandit problem and employ Gaussian process based Bayesian optimization to meet a trade-off between exploration and exploitation sequentially.
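A minimal sketch of Gaussian-process-based Bayesian optimization over a one-dimensional rate split, standing in for the continuum-armed bandit above; the black-box objective, kernel, and expected-improvement acquisition are illustrative assumptions rather than the paper's formulation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def worst_case_age(split):                       # illustrative stand-in objective
    return (split - 0.63) ** 2 + 0.05 * np.sin(25 * split)

def expected_improvement(gp, x, best, xi=0.01):
    mu, sigma = gp.predict(x.reshape(-1, 1), return_std=True)
    imp = best - mu - xi                         # minimization
    z = imp / np.maximum(sigma, 1e-9)
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=5)                    # initial random allocations
y = np.array([worst_case_age(x) for x in X])
grid = np.linspace(0, 1, 500)

for _ in range(20):                              # sequential explore/exploit loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X.reshape(-1, 1), y)
    x_next = grid[np.argmax(expected_improvement(gp, grid, y.min()))]
    X, y = np.append(X, x_next), np.append(y, worst_case_age(x_next))

print("best split found:", X[np.argmin(y)], "age:", y.min())
```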

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01380
  • repo_url: None
  • paper_authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
  • for: Proposing an oracle-efficient algorithm for offline reinforcement learning (RL) with non-linear function approximation.
  • methods: The algorithm has three novel components: (1) a variance-based weighted regression scheme applicable to a wide range of function classes; (2) a subroutine for variance estimation; (3) a planning phase that uses pessimistic value iteration.
  • results: The algorithm achieves a regret bound with a tight dependence on the function-class complexity and attains minimax-optimal instance-dependent regret when specialized to linear function approximation.
    Abstract Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function approximation. However, limited works on offline RL with non-linear function approximation have instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithmic design comprises three innovative components: (1) a variance-based weighted regression scheme that can be applied to a wide range of function classes, (2) a subroutine for variance estimation, and (3) a planning phase that utilizes a pessimistic value iteration approach. Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation. Our work extends the previous instance-dependent results within simpler function classes, such as linear and differentiable function to a more general framework.

Window-based Model Averaging Improves Generalization in Heterogeneous Federated Learning

  • paper_url: http://arxiv.org/abs/2310.01366
  • repo_url: None
  • paper_authors: Debora Caldarola, Barbara Caputo, Marco Ciccone
  • for: Addressing heterogeneous data distributions in Federated Learning (FL) while protecting user privacy.
  • methods: WIMA (Window-based Model Averaging) aggregates global models from different rounds with a window-based approach, effectively capturing knowledge from multiple users and reducing the bias towards the last-seen clients' data.
  • results: WIMA yields smoother and more stable learning trends under distribution shifts and bad client sampling, without adding client-side computation or communication overhead.
    Abstract Federated Learning (FL) aims to learn a global model from distributed users while protecting their privacy. However, when data are distributed heterogeneously the learning process becomes noisy, unstable, and biased towards the last seen clients' data, slowing down convergence. To address these issues and improve the robustness and generalization capabilities of the global model, we propose WIMA (Window-based Model Averaging). WIMA aggregates global models from different rounds using a window-based approach, effectively capturing knowledge from multiple users and reducing the bias from the last ones. By adopting a windowed view on the rounds, WIMA can be applied from the initial stages of training. Importantly, our method introduces no additional communication or client-side computation overhead. Our experiments demonstrate the robustness of WIMA against distribution shifts and bad client sampling, resulting in smoother and more stable learning trends. Additionally, WIMA can be easily integrated with state-of-the-art algorithms. We extensively evaluate our approach on standard FL benchmarks, demonstrating its effectiveness.
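A minimal sketch of window-based averaging on the server side, assuming the last W aggregated global models are kept and their parameter-wise mean is used; the window size and plain unweighted mean are illustrative choices, not necessarily WIMA's exact weighting.

```python
from collections import deque
import torch

class WindowAverager:
    """Keep the last W global models and expose their parameter-wise average."""
    def __init__(self, window_size: int = 5):
        self.window = deque(maxlen=window_size)

    def update(self, global_state_dict):
        """Store a snapshot of the aggregated global model for this round."""
        self.window.append({k: v.detach().clone() for k, v in global_state_dict.items()})

    def averaged_state_dict(self):
        """Parameter-wise mean over the models currently in the window."""
        keys = self.window[0].keys()
        return {k: torch.stack([m[k].float() for m in self.window]).mean(dim=0) for k in keys}

# Usage with any torch model after each FedAvg round:
model = torch.nn.Linear(10, 2)
averager = WindowAverager(window_size=3)
for _ in range(3):
    averager.update(model.state_dict())   # in practice: the newly aggregated model
model.load_state_dict(averager.averaged_state_dict())
```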

Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use

  • paper_url: http://arxiv.org/abs/2310.01362
  • repo_url: https://github.com/liruiw/fleet-tools
  • paper_authors: Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake
  • for: Investigating how fleet-level robot learning can be achieved without transmitting or centralizing fleet-scale data.
  • methods: A distributed learning strategy, fleet-merge, that efficiently merges policies parameterized by recurrent neural networks (RNNs) in a distributed setting.
  • results: Fleet-merge consolidates policies trained on 50 tasks in the Meta-World environment with good merged performance; a new robotic tool-use benchmark, fleet-tools, is also introduced for evaluating fleet policy learning in compositional and contact-rich robot manipulation tasks.
    Abstract Fleets of robots ingest massive amounts of streaming data generated by interacting with their environments, far more than those that can be stored or transmitted with ease. At the same time, we hope that teams of robots can co-acquire diverse skills through their experiences in varied settings. How can we enable such fleet-level learning without having to transmit or centralize fleet-scale data? In this paper, we investigate distributed learning of policies as a potential solution. To efficiently merge policies in the distributed setting, we propose fleet-merge, an instantiation of distributed learning that accounts for the symmetries that can arise in learning policies that are parameterized by recurrent neural networks. We show that fleet-merge consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with the merged policy achieving good performance on nearly all training tasks at test time. Moreover, we introduce a novel robotic tool-use benchmark, fleet-tools, for fleet policy learning in compositional and contact-rich robot manipulation tasks, which might be of broader interest, and validate the efficacy of fleet-merge on the benchmark.

A peridynamic-informed deep learning model for brittle damage prediction

  • paper_url: http://arxiv.org/abs/2310.01350
  • repo_url: None
  • paper_authors: Roozbeh Eghbalpoor, Azadeh Sheidaei
  • for: Predicting quasi-static damage and crack propagation in brittle materials
  • methods: Combines peridynamic (PD) theory with the Physics-Informed Neural Network (PINN) approach
  • results: Accurately and efficiently predicts damage and crack propagation in brittle materials. A more detailed explanation of each point follows:
  • for: The paper is written to predict the quasi-static damage and crack propagation in brittle materials using a novel approach that combines the principles of peridynamic theory with PINN.
  • methods: The proposed approach uses the linearized PD governing equation to enforce the PD principles in the PINN’s residual-based loss function, allowing the model to learn and capture intricate displacement patterns associated with different geometrical parameters. The paper also proposes several enhancements, such as cyclical annealing schedule and deformation gradient aware optimization technique, to ensure the model’s convergence and accuracy.
  • results: The paper’s results show that the proposed PD-INN approach can accurately predict damage and crack propagation in brittle materials, and it is more efficient than traditional methods such as PD direct numerical method and Extended-Finite Element Method. The paper provides several benchmark cases to validate the accuracy of the proposed approach.
    Abstract In this study, a novel approach that combines the principles of peridynamic (PD) theory with PINN is presented to predict quasi-static damage and crack propagation in brittle materials. To achieve high prediction accuracy and convergence rate, the linearized PD governing equation is enforced in the PINN's residual-based loss function. The proposed PD-INN is able to learn and capture intricate displacement patterns associated with different geometrical parameters, such as pre-crack position and length. Several enhancements like cyclical annealing schedule and deformation gradient aware optimization technique are proposed to ensure the model would not get stuck in its trivial solution. The model's performance assessment is conducted by monitoring the behavior of loss function throughout the training process. The PD-INN predictions are also validated through several benchmark cases with the results obtained from high-fidelity techniques such as PD direct numerical method and Extended-Finite Element Method. Our results show the ability of the nonlocal PD-INN to predict damage and crack propagation accurately and efficiently.

The Optimal use of Segmentation for Sampling Calorimeters

  • paper_url: http://arxiv.org/abs/2310.04442
  • repo_url: https://github.com/eiccodesign/regressiononly
  • paper_authors: Fernando Torales Acosta, Bishnu Karki, Piyush Karande, Aaron Angerami, Miguel Arratia, Kenneth Barish, Ryan Milton, Sebastián Morán, Benjamin Nachman, Anshuman Sinha
  • for: Studying energy-reconstruction methods for sampling calorimeters.
  • methods: Deep neural networks that represent the calorimeter as a point cloud and use all available information for energy reconstruction.
  • results: For isolated charged pion showers, relatively fine longitudinal segmentation is key to good energy reconstruction; the results serve as a benchmark for ongoing EIC detector optimization and for studies of high-granularity calorimeters at other experiments and facilities.
    Abstract One of the key design choices of any sampling calorimeter is how fine to make the longitudinal and transverse segmentation. To inform this choice, we study the impact of calorimeter segmentation on energy reconstruction. To ensure that the trends are due entirely to hardware and not to a sub-optimal use of segmentation, we deploy deep neural networks to perform the reconstruction. These networks make use of all available information by representing the calorimeter as a point cloud. To demonstrate our approach, we simulate a detector similar to the forward calorimeter system intended for use in the ePIC detector, which will operate at the upcoming Electron Ion Collider. We find that for the energy estimation of isolated charged pion showers, relatively fine longitudinal segmentation is key to achieving an energy resolution that is better than 10% across the full phase space. These results provide a valuable benchmark for ongoing EIC detector optimizations and may also inform future studies involving high-granularity calorimeters in other experiments at various facilities.

Optimal Estimator for Linear Regression with Shuffled Labels

  • paper_url: http://arxiv.org/abs/2310.01326
  • repo_url: None
  • paper_authors: Hang Zhang, Ping Li
  • for: Linear regression with shuffled labels, specifically reconstructing the permutation matrix and signal of interest from the sensing results.
  • methods: One-step estimator with a computational complexity of $O(n^3 + np^2m)$, which is comparable to the maximum complexity of linear assignment and least square algorithms.
  • results: Sufficient conditions for correct permutation recovery under different regimes of signal-to-noise ratio (SNR), including an easy regime, a medium regime, and a hard regime. Numerical experiments confirm the theoretical claims.
    Abstract This paper considers the task of linear regression with shuffled labels, i.e., $\mathbf Y = \mathbf \Pi \mathbf X \mathbf B + \mathbf W$, where $\mathbf Y \in \mathbb R^{n\times m}, \mathbf \Pi \in \mathbb R^{n\times n}, \mathbf X\in \mathbb R^{n\times p}, \mathbf B \in \mathbb R^{p\times m}$, and $\mathbf W\in \mathbb R^{n\times m}$, respectively, represent the sensing results, (unknown or missing) corresponding information, sensing matrix, signal of interest, and additive sensing noise. Given the observation $\mathbf Y$ and sensing matrix $\mathbf X$, we propose a one-step estimator to reconstruct $(\mathbf \Pi, \mathbf B)$. From the computational perspective, our estimator's complexity is $O(n^3 + np^2m)$, which is no greater than the maximum complexity of a linear assignment algorithm (e.g., $O(n^3)$) and a least square algorithm (e.g., $O(np^2 m)$). From the statistical perspective, we divide the minimum $snr$ requirement into four regimes, e.g., unknown, hard, medium, and easy regimes; and present sufficient conditions for the correct permutation recovery under each regime: $(i)$ $snr \geq \Omega(1)$ in the easy regime; $(ii)$ $snr \geq \Omega(\log n)$ in the medium regime; and $(iii)$ $snr \geq \Omega((\log n)^{c_0}\cdot n^{c_1}/{srank(\mathbf B)})$ in the hard regime ($c_0, c_1$ are some positive constants and $srank(\mathbf B)$ denotes the stable rank of $\mathbf B$). In the end, we also provide numerical experiments to confirm the above claims.
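A minimal sketch of the two ingredients the complexity discussion refers to (a least-squares fit of B and a linear-assignment fit of the permutation), run as a simple alternating baseline; this is not the paper's one-step estimator, and the sizes, noise level, and partially shuffled labels are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, p, m = 100, 5, 20
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, m))

perm_true = np.arange(n)
idx = rng.choice(n, size=30, replace=False)
perm_true[idx] = rng.permutation(idx)                       # shuffle 30% of the rows
Y = X[perm_true] @ B_true + 0.01 * rng.normal(size=(n, m))  # Y = Pi X B + W

perm = np.arange(n)                                         # identity warm start
for _ in range(5):
    B_hat, *_ = np.linalg.lstsq(X[perm], Y, rcond=None)     # fit B given the permutation
    rows, perm = linear_sum_assignment(cdist(Y, X @ B_hat)) # re-match Y rows to X rows

print("fraction of rows matched correctly:", np.mean(perm == perm_true))
```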

Coupling public and private gradient provably helps optimization

  • paper_url: http://arxiv.org/abs/2310.01304
  • repo_url: None
  • paper_authors: Ruixuan Liu, Zhiqi Bu, Yu-xiang Wang, Sheng Zha, George Karypis
  • for: Improving the optimization of large neural networks by combining private and public data.
  • methods: The gradients computed on private and public data are coupled via a weighted linear combination, with the optimal weight derived in the convex setting shown to be hyperparameter-dependent.
  • results: Theory and experiments on language and vision benchmarks show that gradient coupling accelerates the convergence of non-convex losses, and that hyperparameters such as the privacy budget, number of iterations, batch size, and model size affect the choice of the weighting coefficient.
    Abstract The success of large neural networks is crucially determined by the availability of data. It has been observed that training only on a small amount of public data, or privately on the abundant private data can lead to undesirable degradation of accuracy. In this work, we leverage both private and public data to improve the optimization, by coupling their gradients via a weighted linear combination. We formulate an optimal solution for the optimal weight in the convex setting to indicate that the weighting coefficient should be hyperparameter-dependent. Then, we prove the acceleration in the convergence of non-convex loss and the effects of hyper-parameters such as privacy budget, number of iterations, batch size, and model size on the choice of the weighting coefficient. We support our analysis with empirical experiments across language and vision benchmarks, and provide a guideline for choosing the optimal weight of the gradient coupling.
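A minimal sketch of the weighted linear combination of a DP-style (clipped, noised) private gradient with a public gradient; the weight, clipping norm, and noise multiplier are illustrative hyperparameters, and per-example clipping is omitted for brevity.

```python
import torch

def coupled_gradient(private_grad, public_grad, w=0.7, clip_norm=1.0, noise_mult=1.0):
    """Weighted combination of a privatized gradient and a public gradient."""
    # DP-style processing of the private gradient: clip, then add Gaussian noise.
    scale = min(1.0, clip_norm / (private_grad.norm() + 1e-12))
    noisy_private = private_grad * scale + noise_mult * clip_norm * torch.randn_like(private_grad)
    # Weighted linear combination of the two gradient estimates.
    return w * noisy_private + (1.0 - w) * public_grad

g_private = torch.randn(1000)
g_public = torch.randn(1000)
g = coupled_gradient(g_private, g_public)
print(g.shape)
```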

Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering

  • paper_url: http://arxiv.org/abs/2310.01285
  • repo_url: None
  • paper_authors: Qinmeng Luan, James Hamp
  • for: Identifying regimes in time series data using Wasserstein k-means clustering.
  • methods: The Wasserstein k-means algorithm is first studied on synthetic one-dimensional time series, including the effect of different initializations; the method is then extended to multidimensional time series by approximating the multidimensional Wasserstein distance with a sliced Wasserstein distance, yielding sliced Wasserstein k-means (sWk-means).
  • results: A case study on publicly available foreign exchange spot rate data shows that sWk-means identifies distinct market regimes; some limitations and possible complementary or alternative approaches are also discussed.
    Abstract Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to identify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying different hyperparameters impacts the performance of the clustering algorithm for different random initialisations. We compute simple metrics that we find are useful in identifying high-quality clusterings. Then, we extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call `sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
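A minimal sketch of the sliced Wasserstein distance used as the clustering metric above: project both samples onto random unit directions and average the one-dimensional Wasserstein distances, which reduce to sorted differences; the sample sizes and number of projections are illustrative.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Sliced W_p distance between two equal-size empirical samples in R^d."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)       # unit directions
    total = 0.0
    for u in dirs:
        xp, yp = np.sort(x @ u), np.sort(y @ u)                # 1D projections
        total += np.mean(np.abs(xp - yp) ** p)                 # 1D W_p^p via sorting
    return (total / n_projections) ** (1.0 / p)

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(500, 3))                        # e.g., returns in one window
b = rng.normal(0.5, 2.0, size=(500, 3))                        # e.g., returns in another window
print(sliced_wasserstein(a, a[::-1].copy()), sliced_wasserstein(a, b))
```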

Non-Exchangeable Conformal Risk Control

  • paper_url: http://arxiv.org/abs/2310.01262
  • repo_url: https://github.com/deep-spin/non-exchangeable-crc
  • paper_authors: António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins
  • for: Providing formally guaranteed uncertainty sets or intervals for black-box neural predictions, ensuring a predefined probability of containing the ground truth.
  • methods: An extension to non-exchangeable data, together with statistical guarantees for broader objectives such as bounding the best F1-score or the expected false negative rate.
  • results: Non-exchangeable conformal risk control, validated on synthetic and real-world data, controls the expected value of any monotone loss function without exchangeability assumptions and can weight the data by its statistical similarity to the test examples.
    Abstract Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is often the case in many real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best F1-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its statistical similarity with the test examples; a careful choice of weights may result on tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real world data show the usefulness of our method.
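A minimal sketch of a non-exchangeable (similarity-weighted) split-conformal interval, where the quantile of calibration residuals is weighted by a kernel between calibration points and the test point; the kernel, weights, and absolute-residual score are illustrative assumptions, and the paper's framework controls general monotone risks rather than only coverage.

```python
import numpy as np

def weighted_conformal_interval(x_cal, resid_cal, x_test, predict, alpha=0.1, bandwidth=1.0):
    """Interval of the form prediction +/- weighted (1 - alpha)-quantile of residuals."""
    # Similarity-based weights between each calibration point and the test point.
    w = np.exp(-np.sum((x_cal - x_test) ** 2, axis=1) / (2 * bandwidth ** 2))
    w = np.append(w, 1.0)                               # weight on the test point itself
    w /= w.sum()
    scores = np.append(resid_cal, np.inf)               # conformity scores plus +inf slot
    order = np.argsort(scores)
    cum = np.cumsum(w[order])
    q = scores[order][np.searchsorted(cum, 1 - alpha)]  # weighted (1 - alpha)-quantile
    center = predict(x_test)
    return center - q, center + q

# Toy usage with a constant predictor and synthetic calibration data.
rng = np.random.default_rng(0)
x_cal = rng.normal(size=(200, 2))
resid = np.abs(rng.normal(size=200))
lo, hi = weighted_conformal_interval(x_cal, resid, np.zeros(2), predict=lambda x: 0.0)
print(lo, hi)
```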

Self-supervised Learning for Anomaly Detection in Computational Workflows

  • paper_url: http://arxiv.org/abs/2310.01247
  • repo_url: None
  • paper_authors: Hongwei Jin, Krishnan Raghavan, George Papadimitriou, Cong Wang, Anirban Mandal, Ewa Deelman, Prasanna Balaprakash
  • for: Anomaly detection in computational workflows, which has wide implications in domains such as cybersecurity, finance, and social networks.
  • methods: An autoencoder-driven self-supervised learning (SSL) approach that learns a summary statistic from unlabeled workflow data and estimates the normal behavior of the computational workflow in the latent space, combining generative and contrastive objectives.
  • results: By estimating the distribution of normal behavior in the latent space, the method outperforms state-of-the-art anomaly detection methods on the benchmark datasets.
    Abstract Anomaly detection is the task of identifying abnormal behavior of a system. Anomaly detection in computational workflows is of special interest because of its wide implications in various domains such as cybersecurity, finance, and social networks. However, anomaly detection in computational workflows~(often modeled as graphs) is a relatively unexplored problem and poses distinct challenges. For instance, when anomaly detection is performed on graph data, the complex interdependency of nodes and edges, the heterogeneity of node attributes, and edge types must be accounted for. Although the use of graph neural networks can help capture complex inter-dependencies, the scarcity of labeled anomalous examples from workflow executions is still a significant challenge. To address this problem, we introduce an autoencoder-driven self-supervised learning~(SSL) approach that learns a summary statistic from unlabeled workflow data and estimates the normal behavior of the computational workflow in the latent space. In this approach, we combine generative and contrastive learning objectives to detect outliers in the summary statistics. We demonstrate that by estimating the distribution of normal behavior in the latent space, we can outperform state-of-the-art anomaly detection methods on our benchmark datasets.

Modality-aware Transformer for Time series Forecasting

  • paper_url: http://arxiv.org/abs/2310.01232
  • repo_url: None
  • paper_authors: Hajar Emami, Xuan-Hong Dang, Yousaf Shah, Petros Zerfos
  • for: Multimodal time series forecasting, particularly in finance, where the future behavior of a time series is often intricately linked to information derived from textual reports and numerous economic indicators.
  • methods: The Modality-aware Transformer, a multimodal transformer-based model with feature-level attention layers that focus on the most relevant features within each data modality; feature and temporal attention are combined in intra-modal, inter-modal, and modality-target multi-head attention, fostering cross-modal understanding.
  • results: Experiments on financial datasets show that the Modality-aware Transformer outperforms existing methods, providing a new and practical solution for multimodal time series forecasting.
    Abstract Time series forecasting presents a significant challenge, particularly when its accuracy relies on external data sources rather than solely on historical values. This issue is prevalent in the financial sector, where the future behavior of time series is often intricately linked to information derived from various textual reports and a multitude of economic indicators. In practice, the key challenge lies in constructing a reliable time series forecasting model capable of harnessing data from diverse sources and extracting valuable insights to predict the target time series accurately. In this work, we tackle this challenging problem and introduce a novel multimodal transformer-based model named the Modality-aware Transformer. Our model excels in exploring the power of both categorical text and numerical timeseries to forecast the target time series effectively while providing insights through its neural attention mechanism. To achieve this, we develop feature-level attention layers that encourage the model to focus on the most relevant features within each data modality. By incorporating the proposed feature-level attention, we develop a novel Intra-modal multi-head attention (MHA), Inter-modal MHA and Modality-target MHA in a way that both feature and temporal attentions are incorporated in MHAs. This enables the MHAs to generate temporal attentions with consideration of modality and feature importance which leads to more informative embeddings. The proposed modality-aware structure enables the model to effectively exploit information within each modality as well as foster cross-modal understanding. Our extensive experiments on financial datasets demonstrate that Modality-aware Transformer outperforms existing methods, offering a novel and practical solution to the complex challenges of multi-modality time series forecasting.

Reconstructing Atmospheric Parameters of Exoplanets Using Deep Learning

  • paper_url: http://arxiv.org/abs/2310.01227
  • repo_url: None
  • paper_authors: Flavio Giobergia, Alkis Koudounas, Elena Baralis
  • for: Studying the properties of exoplanet atmospheres via a multi-target probabilistic regression approach based on deep learning and inverse modeling.
  • methods: The method combines deep learning and inverse modeling techniques within a multimodal architecture to estimate atmospheric parameters.
  • results: The approach overcomes computational limitations, outperforms previous approaches, and enables efficient analysis of exoplanetary atmospheres, offering valuable insights for future studies.
    Abstract Exploring exoplanets has transformed our understanding of the universe by revealing many planetary systems that defy our current understanding. To study their atmospheres, spectroscopic observations are used to infer essential atmospheric properties that are not directly measurable. Estimating atmospheric parameters that best fit the observed spectrum within a specified atmospheric model is a complex problem that is difficult to model. In this paper, we present a multi-target probabilistic regression approach that combines deep learning and inverse modeling techniques within a multimodal architecture to extract atmospheric parameters from exoplanets. Our methodology overcomes computational limitations and outperforms previous approaches, enabling efficient analysis of exoplanetary atmospheres. This research contributes to advancements in the field of exoplanet research and offers valuable insights for future studies.

A path-norm toolkit for modern networks: consequences, promises and challenges

  • paper_url: http://arxiv.org/abs/2310.01225
  • repo_url: None
  • paper_authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval
  • for: This paper introduces a toolkit for generalization bounds of modern neural networks, specifically for DAG ReLU networks with biases, skip connections, and any operation based on the extraction of order statistics.
  • methods: The toolkit uses path-norms, which are a type of complexity measure that is easy to compute, invariant under network symmetries, and improves sharpness on feedforward networks.
  • results: The paper establishes generalization bounds for modern neural networks that are the most widely applicable and recover or beat the sharpest known bounds of this type. The toolkit is also used to numerically evaluate the sharpest known bounds for ResNets on ImageNet.
    Abstract This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
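A minimal sketch of the classical L1 path-norm for a plain feedforward ReLU network, computed by propagating an all-ones input through absolute-valued weights (biases acting as extra inputs); the paper's toolkit extends this to general DAGs with skip connections and order-statistic operations such as max pooling, which this sketch does not cover.

```python
import torch
import torch.nn as nn

def l1_path_norm(model: nn.Sequential) -> float:
    """L1 path-norm of an MLP: sum over paths of products of absolute weights."""
    x = None
    for layer in model:
        if isinstance(layer, nn.Linear):
            if x is None:
                x = torch.ones(1, layer.in_features)
            x = x @ layer.weight.abs().T
            if layer.bias is not None:
                x = x + layer.bias.abs()          # a bias opens new paths from this neuron
        # ReLU layers are positively homogeneous and contribute no factor here.
    return x.sum().item()

mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
print("L1 path-norm:", l1_path_norm(mlp))
```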

Revisiting Mobility Modeling with Graph: A Graph Transformer Model for Next Point-of-Interest Recommendation

  • paper_url: http://arxiv.org/abs/2310.01224
  • repo_url: https://github.com/yukayo/mobgt
  • paper_authors: Xiaohang Xu, Toyotaro Suzumura, Jiawei Yong, Masatoshi Hanai, Chuang Yang, Hiroki Kanezashi, Renhe Jiang, Shintaro Fukushima
  • for: A next point-of-interest (POI) recommendation model that fully leverages graphs to capture both the spatial and temporal features in users' mobility data.
  • methods: Built on graph neural networks (GNNs), the model combines individual spatial and temporal graph encoders with a global user-location relation encoder, and adds a Graph Transformer based mobility encoder to extract higher-order relations between POIs; a novel Tail Loss addresses the long-tailed problem in spatial-temporal data.
  • results: Experiments show that MobGT outperforms state-of-the-art models across datasets and metrics, with a 24% improvement on average. Code is available at \url{https://github.com/Yukayo/MobGT}.
    Abstract Next Point-of-Interest (POI) recommendation plays a crucial role in urban mobility applications. Recently, POI recommendation models based on Graph Neural Networks (GNN) have been extensively studied and achieved, however, the effective incorporation of both spatial and temporal information into such GNN-based models remains challenging. Extracting distinct fine-grained features unique to each piece of information is difficult since temporal information often includes spatial information, as users tend to visit nearby POIs. To address the challenge, we propose \textbf{\underline{Mob}ility \textbf{\underline{G}raph \textbf{\underline{T}ransformer (MobGT) that enables us to fully leverage graphs to capture both the spatial and temporal features in users' mobility patterns. MobGT combines individual spatial and temporal graph encoders to capture unique features and global user-location relations. Additionally, it incorporates a mobility encoder based on Graph Transformer to extract higher-order information between POIs. To address the long-tailed problem in spatial-temporal data, MobGT introduces a novel loss function, Tail Loss. Experimental results demonstrate that MobGT outperforms state-of-the-art models on various datasets and metrics, achieving 24\% improvement on average. Our codes are available at \url{https://github.com/Yukayo/MobGT}.

From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication

  • paper_url: http://arxiv.org/abs/2310.01211
  • repo_url: None
  • paper_authors: Irene Cannistraci, Luca Moschella, Marco Fumero, Valentino Maiorca, Emanuele Rodolà
  • for: Improving the reuse, merging, and stitching of neural network modules.
  • methods: Directly incorporating a set of invariances into the latent representations by constructing a product space of invariant components, without prior knowledge of the optimal invariance to infuse.
  • results: Consistent latent similarity and downstream performance improvements are observed in a zero-shot stitching setting, on classification and reconstruction tasks.
    Abstract It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weights initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundational models, eight benchmarks, and several architectures trained from scratch.

Unified Uncertainty Calibration

  • paper_url: http://arxiv.org/abs/2310.01202
  • repo_url: None
  • paper_authors: Kamalika Chaudhuri, David Lopez-Paz
  • for: Building robust, fair, and safe AI systems whose classifiers can say "I don't know" on test examples that are difficult or fall outside the training classes.
  • methods: Unified uncertainty calibration (U2C), a holistic framework that combines aleatoric and epistemic uncertainties and enables a clean learning-theoretical analysis of uncertainty estimation.
  • results: U2C outperforms the reject-or-classify rule across a variety of ImageNet benchmarks, better capturing uncertainty from different sources.
    Abstract To build robust, fair, and safe AI systems, we would like our classifiers to say ``I don't know'' when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic \emph{reject-or-classify} rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and does not allow correcting for misspecifications in our uncertainty estimates. To address these three issues, we introduce \emph{unified uncertainty calibration (U2C)}, a holistic framework to combine aleatoric and epistemic uncertainties. U2C enables a clean learning-theoretical analysis of uncertainty estimation, and outperforms reject-or-classify across a variety of ImageNet benchmarks.

SWoTTeD: An Extension of Tensor Decomposition to Temporal Phenotyping

  • paper_url: http://arxiv.org/abs/2310.01201
  • repo_url: None
  • paper_authors: Hana Sebia, Thomas Guyet, Etienne Audureau
  • for: Analyzing individual trajectories in electronic health record (EHR) data; SWoTTeD (Sliding Window for Temporal Tensor Decomposition) is proposed to discover hidden temporal patterns as temporal phenotypes.
  • methods: SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes.
  • results: Validation on synthetic and real-world data shows that SWoTTeD matches or outperforms recent tensor decomposition models and extracts temporal phenotypes that are meaningful to clinicians.
    Abstract Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records (EHR). However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and it proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original usecase using data from the Greater Paris University Hospital. The results show that SWoTTeD achieves at least as accurate reconstruction as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.

Federated K-means Clustering

  • paper_url: http://arxiv.org/abs/2310.01195
  • repo_url: https://github.com/ourownstory/federated_kmeans
  • paper_authors: Swier Garst, Marcel Reinders
  • for: This work proposes a K-means clustering algorithm based on federated learning, so that distributed datasets can be clustered while preserving data privacy and ownership.
  • methods: The algorithm builds on federated averaging and adopts a new aggregation strategy to address a varying number of clusters across centers and convergence on less separable datasets.
  • results: Experiments show that the algorithm clusters accurately under different data distributions across centers and remains robust and stable on less separable datasets.
    Abstract Federated learning is a technique that enables the use of distributed datasets for machine learning purposes without requiring data to be pooled, thereby better preserving privacy and ownership of the data. While supervised FL research has grown substantially over the last years, unsupervised FL methods remain scarce. This work introduces an algorithm which implements K-means clustering in a federated manner, addressing the challenges of varying number of clusters between centers, as well as convergence on less separable datasets.
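A minimal sketch of the federated-averaging flavour of K-means described here (the weighting of local centroids by cluster counts is an assumption; the paper's aggregation strategy also handles a varying number of clusters across centers, which this sketch does not):

```python
import numpy as np
from sklearn.cluster import KMeans

def federated_kmeans_round(local_datasets, global_centers):
    """One communication round: each center fits K-means locally, initialised
    at the current global centers; the server averages the local centroids
    weighted by how many points each local cluster holds."""
    k = global_centers.shape[0]
    sums = np.zeros_like(global_centers)
    counts = np.zeros(k)
    for X in local_datasets:                        # runs at each data center
        km = KMeans(n_clusters=k, init=global_centers, n_init=1).fit(X)
        for j in range(k):
            members = X[km.labels_ == j]
            sums[j] += members.sum(axis=0)
            counts[j] += len(members)
    return sums / np.maximum(counts, 1)[:, None]    # new global centers

rng = np.random.default_rng(0)
datasets = [rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])]
centers = rng.normal(size=(3, 2))
for _ in range(5):
    centers = federated_kmeans_round(datasets, centers)
```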

If there is no underfitting, there is no Cold Posterior Effect

  • paper_url: http://arxiv.org/abs/2310.01189
  • repo_url: None
  • paper_authors: Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa
  • for: This paper studies the cold posterior effect (CPE) in Bayesian deep learning, where a posterior tempered at $T<1$ can yield a better posterior predictive than the Bayesian posterior ($T=1$).
  • methods: Within Bayesian deep learning, the paper analyzes whether the CPE should be understood as a model misspecification problem arising from the prior and/or the likelihood.
  • results: The paper finds that misspecification leads to the CPE only when the resulting Bayesian posterior underfits; if there is no underfitting, there is no CPE.
    Abstract The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE.
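For reference, the tempered (cold) posterior studied in this line of work is commonly written as

$$ p_T(\theta \mid \mathcal{D}) \;\propto\; \big(p(\mathcal{D}\mid\theta)\, p(\theta)\big)^{1/T}, $$

so that $T=1$ recovers the standard Bayesian posterior and $T<1$ sharpens the posterior around high-likelihood regions.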

Light Schrödinger Bridge

  • paper_url: http://arxiv.org/abs/2310.01174
  • repo_url: https://github.com/ngushchin/lightsb
  • paper_authors: Alexander Korotin, Nikita Gushchin, Evgeny Burnaev
  • for: This paper aims to address the issue of heavy-weighted and complex optimization of existing Schrodinger Bridges (SB) solvers, and proposes a novel fast and simple SB solver.
  • methods: The proposed LightSB solver combines two ideas from the field: parameterizing the Schrodinger potentials with sum-exp quadratic functions, and viewing the log-Schrodinger potentials as energy functions. The optimization objective is simple and straightforward, and the solver is lightweight, simulation-free, and theoretically justified.
  • results: The LightSB solver is able to solve SB in moderate dimensions in a matter of minutes on CPU without painful hyperparameter selection, and is proven to be a universal approximator of SBs. The code for the LightSB solver is available at https://github.com/ngushchin/LightSB.
    Abstract Despite the recent advances in the field of computational Schrodinger Bridges (SB), most existing SB solvers are still heavyweight and require complex optimization of several neural networks. It turns out that there is no principal solver which plays the role of a simple-yet-effective baseline for SB just like, e.g., the $k$-means method in clustering, logistic regression in classification or the Sinkhorn algorithm in discrete optimal transport. We address this issue and propose a novel fast and simple SB solver. Our development is a smart combination of two ideas which recently appeared in the field: (a) parameterization of the Schrodinger potentials with sum-exp quadratic functions and (b) viewing the log-Schrodinger potentials as the energy functions. We show that, combined together, these ideas yield a lightweight, simulation-free and theoretically justified SB solver with a simple, straightforward optimization objective. As a result, it allows solving SB in moderate dimensions in a matter of minutes on CPU without a painful hyperparameter selection. Our light solver resembles the Gaussian mixture model which is widely used for density estimation. Inspired by this similarity, we also prove an important theoretical result showing that our light solver is a universal approximator of SBs. The code for the LightSB solver can be found at https://github.com/ngushchin/LightSB
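To illustrate idea (a), a sum-exp of quadratics gives a log-potential that behaves like an unnormalised Gaussian-mixture log-density. The sketch below is a simplified isotropic version with hypothetical shapes and names, not the LightSB implementation:

```python
import torch

class SumExpQuadraticPotential(torch.nn.Module):
    """phi(x) = log sum_k exp(-0.5 * a_k * ||x - mu_k||^2 + c_k): a smooth,
    mixture-like parameterisation of a (log-)Schrodinger potential."""
    def __init__(self, dim: int, n_components: int = 16):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(n_components, dim))
        self.log_a = torch.nn.Parameter(torch.zeros(n_components))
        self.c = torch.nn.Parameter(torch.zeros(n_components))

    def forward(self, x: torch.Tensor) -> torch.Tensor:               # x: (batch, dim)
        sq = ((x[:, None, :] - self.mu[None, :, :]) ** 2).sum(-1)     # (batch, K)
        quad = -0.5 * self.log_a.exp() * sq + self.c
        return torch.logsumexp(quad, dim=-1)                          # (batch,)

phi = SumExpQuadraticPotential(dim=2)
print(phi(torch.randn(5, 2)).shape)   # torch.Size([5])
```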

RRR-Net: Reusing, Reducing, and Recycling a Deep Backbone Network

  • paper_url: http://arxiv.org/abs/2310.01157
  • repo_url: None
  • paper_authors: Haozhe Sun, Isabelle Guyon, Felix Mohr, Hedi Tabia
  • for: The goal of this paper is to build a smaller and faster model while keeping performance similar to that of a large backbone network.
  • methods: A pre-trained large backbone (ResNet152) is reduced to 5 blocks, and the model is then split into several branches to form an ensemble of sub-networks that improves performance.
  • results: Experiments show that these techniques yield a smaller and faster model whose performance matches that of classical backbone fine-tuning.
    Abstract It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pre-trained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine of sorts; the newly-added classification head and (optionally) deeper layers are fine-tuned on a new task. Due to its strong performance and simplicity, a common pre-trained backbone network is ResNet152. However, ResNet152 is relatively large and induces inference latency. In many cases, a compact and efficient backbone with similar performance would be preferable over a larger, slower one. This paper investigates techniques to reuse a pre-trained backbone with the objective of creating a smaller and faster model. Starting from a large ResNet152 backbone pre-trained on ImageNet, we first reduce it from 51 blocks to 5 blocks, reducing its number of parameters and FLOPs by more than 6 times, without significant performance degradation. Then, we split the model after 3 blocks into several branches, while preserving the same number of parameters and FLOPs, to create an ensemble of sub-networks to improve performance. Our experiments on a large benchmark of $40$ image classification datasets from various domains suggest that our techniques match the performance (if not better) of ``classical backbone fine-tuning'' while achieving a smaller model size and faster inference speed.
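One plausible way to carve a much smaller backbone out of ResNet152, in the spirit of the reuse/reduce step (the exact blocks kept and the new head below are illustrative assumptions, not the paper's recipe):

```python
import torch.nn as nn
from torchvision import models

resnet = models.resnet152(weights=models.ResNet152_Weights.DEFAULT)

# Keep the stem plus only the first (down-sampling) bottleneck block of each
# stage, then attach a fresh classification head for the new task.
small_backbone = nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1[0],          # 256-channel output
    resnet.layer2[0],          # 512
    resnet.layer3[0],          # 1024
    resnet.layer4[0],          # 2048
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(2048, 40),       # e.g. a hypothetical 40-class downstream task
)
```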

Modularity in Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.01154
  • repo_url: https://github.com/ckg-DeepLearning/CoverTypeClassification
  • paper_authors: Haozhe Sun, Isabelle Guyon
  • For: This paper reviews the concept of modularity in deep learning, focusing on three axes: data, task, and model.
  • Methods: The paper discusses different instantiations of the modularity principle in deep learning, including data modularity, task modularity, and model modularity.
  • Results: The paper provides a comprehensive overview of the advantages of modularity in deep learning, including ease of conceptualization, interpretability, scalability, module combinability, and module reusability.
    Abstract Modularity is a general principle present in many fields. It offers attractive advantages, including, among others, ease of conceptualization, interpretability, scalability, module combinability, and module reusability. The deep learning community has long sought to take inspiration from the modularity principle, either implicitly or explicitly. This interest has been increasing over recent years. We review the notion of modularity in deep learning around three axes: data, task, and model, which characterize the life cycle of deep learning. Data modularity refers to the observation or creation of data groups for various purposes. Task modularity refers to the decomposition of tasks into sub-tasks. Model modularity means that the architecture of a neural network system can be decomposed into identifiable modules. We describe different instantiations of the modularity principle, and we contextualize their advantages in different deep learning sub-fields. Finally, we conclude the paper with a discussion of the definition of modularity and directions for future research.

Cryptocurrency Portfolio Optimization by Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01148
  • repo_url: None
  • paper_authors: Quoc Minh Nguyen, Dat Thanh Tran, Juho Kanniainen, Alexandros Iosifidis, Moncef Gabbouj
  • for: Proposes a neural-network-based algorithm that exploits modern cryptocurrency derivative products for hedging or speculation.
  • methods: A deep neural network outputs the allocation weight of each asset at every time interval and is trained to maximize the Sharpe ratio, with a novel loss term that curbs the network's bias towards any single asset.
  • results: Backtests over 19 months of data show that the algorithm produces neural networks able to make profits in different market situations.
    Abstract Many cryptocurrency brokers nowadays offer a variety of derivative assets that allow traders to perform hedging or speculation. This paper proposes an effective algorithm based on neural networks to take advantage of these investment products. The proposed algorithm constructs a portfolio that contains a pair of negatively correlated assets. A deep neural network, which outputs the allocation weight of each asset at a time interval, is trained to maximize the Sharpe ratio. A novel loss term is proposed to regulate the network's bias towards a specific asset, thus enforcing the network to learn an allocation strategy that is close to a minimum variance strategy. Extensive experiments were conducted using data collected from Binance spanning 19 months to evaluate the effectiveness of our approach. The backtest results show that the proposed algorithm can produce neural networks that are able to make profits in different market situations.
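A minimal sketch of training an allocation network by maximising the Sharpe ratio over a window of returns (the layer sizes, the softmax output over two assets, and the balance penalty are assumptions; the paper's exact bias-regulating loss term is not reproduced here):

```python
import torch
import torch.nn as nn

class AllocationNet(nn.Module):
    def __init__(self, n_features: int, n_assets: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                 nn.Linear(64, n_assets))
    def forward(self, x):                        # x: (T, n_features)
        return torch.softmax(self.net(x), -1)    # weights sum to 1 per step

def sharpe_loss(weights, asset_returns, bias_penalty=0.1):
    """Negative Sharpe ratio of the portfolio, plus a soft penalty that
    discourages leaning entirely on one asset (stand-in for the paper's term)."""
    port = (weights * asset_returns).sum(-1)                 # portfolio returns (T,)
    sharpe = port.mean() / (port.std() + 1e-8)
    balance = ((weights.mean(0) - 1.0 / weights.shape[-1]) ** 2).sum()
    return -sharpe + bias_penalty * balance

model = AllocationNet(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
features, returns = torch.randn(256, 8), 0.001 * torch.randn(256, 2)  # toy data
for _ in range(100):
    opt.zero_grad()
    loss = sharpe_loss(model(features), returns)
    loss.backward()
    opt.step()
```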

Parallel-in-Time Probabilistic Numerical ODE Solvers

  • paper_url: http://arxiv.org/abs/2310.01145
  • repo_url: https://github.com/nathanaelbosch/parallel-in-time-ode-filters
  • paper_authors: Nathanael Bosch, Adrien Corenflos, Fatemeh Yaghoobi, Filip Tronarp, Philipp Hennig, Simo Särkkä
  • for: numerical simulation of dynamical systems as problems of Bayesian state estimation
  • methods: time-parallel formulation of iterated extended Kalman smoothers
  • results: reduced span cost from linear to logarithmic in the number of time steps
    Abstract Probabilistic numerical solvers for ordinary differential equations (ODEs) treat the numerical simulation of dynamical systems as problems of Bayesian state estimation. Aside from producing posterior distributions over ODE solutions and thereby quantifying the numerical approximation error of the method itself, one less-often noted advantage of this formalism is the algorithmic flexibility gained by formulating numerical simulation in the framework of Bayesian filtering and smoothing. In this paper, we leverage this flexibility and build on the time-parallel formulation of iterated extended Kalman smoothers to formulate a parallel-in-time probabilistic numerical ODE solver. Instead of simulating the dynamical system sequentially in time, as done by current probabilistic solvers, the proposed method processes all time steps in parallel and thereby reduces the span cost from linear to logarithmic in the number of time steps. We demonstrate the effectiveness of our approach on a variety of ODEs and compare it to a range of both classic and probabilistic numerical ODE solvers.

The Map Equation Goes Neural

  • paper_url: http://arxiv.org/abs/2310.01144
  • repo_url: None
  • paper_authors: Christopher Blöcker, Chester Tan, Ingo Scholtes
  • for: This work aims to bridge deep learning and network science by proposing a deep-learning-based community detection method that automatically uncovers the high-level organisation of networks.
  • methods: The map equation, an information-theoretic objective for community detection, is expressed in a fully differentiable tensor form and optimised by gradient descent; it is compatible with any graph neural network architecture, enabling flexible clustering and graph pooling, automatically finding an optimal number of clusters, and naturally detecting overlapping communities.
  • results: Experiments show the approach is competitive with baselines, automatically finds an appropriate number of clusters, avoids over-partitioning sparse graphs, and detects overlapping communities without hand-crafted auxiliary features.
    Abstract Community detection and graph clustering are essential for unsupervised data exploration and understanding the high-level organisation of networked systems. Recently, graph clustering has been highlighted as an under-explored primary task for graph neural networks. While hierarchical graph pooling has been shown to improve performance in graph and node classification tasks, it performs poorly in identifying meaningful clusters. Community detection has a long history in network science, but typically relies on optimising objective functions with custom-tailored search algorithms, not leveraging recent advances in deep learning, particularly from graph neural networks. In this paper, we narrow this gap between the deep learning and network science communities. We consider the map equation, an information-theoretic objective function for community detection. Expressing it in a fully differentiable tensor form that produces soft cluster assignments, we optimise the map equation with deep learning through gradient descent. More specifically, the reformulated map equation is a loss function compatible with any graph neural network architecture, enabling flexible clustering and graph pooling that clusters both graph structure and data features in an end-to-end way, automatically finding an optimum number of clusters without explicit regularisation. We evaluate our approach experimentally using different neural network architectures for unsupervised clustering in synthetic and real data. Our results show that our approach achieves competitive performance against baselines, naturally detects overlapping communities, and avoids over-partitioning sparse graphs.
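For reference, the two-level map equation that the paper makes differentiable is, in its standard form,

$$ L(\mathsf{M}) \;=\; q_{\curvearrowright}\, H(\mathcal{Q}) \;+\; \sum_{m=1}^{M} p_{\circlearrowright}^{m}\, H(\mathcal{P}^{m}), $$

where $q_{\curvearrowright}$ is the rate at which a random walker switches between modules, $H(\mathcal{Q})$ is the entropy of the module-index codebook, and $p_{\circlearrowright}^{m} H(\mathcal{P}^{m})$ accounts for the codelength of movements within module $m$.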

CommIN: Semantic Image Communications as an Inverse Problem with INN-Guided Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01130
  • repo_url: None
  • paper_authors: Jiakang Chen, Di You, Deniz Gündüz, Pier Luigi Dragotti
  • for: Improving the perceptual quality of wireless image transmission
  • methods: Combining invertible neural networks (INNs) with diffusion models
  • results: Significantly improves perceptual quality under extreme conditions such as low bandwidth and low signal-to-noise ratio
    Abstract Joint source-channel coding schemes based on deep neural networks (DeepJSCC) have recently achieved remarkable performance for wireless image transmission. However, these methods usually focus only on the distortion of the reconstructed signal at the receiver side with respect to the source at the transmitter side, rather than the perceptual quality of the reconstruction which carries more semantic information. As a result, severe perceptual distortion can be introduced under extreme conditions such as low bandwidth and low signal-to-noise ratio. In this work, we propose CommIN, which views the recovery of high-quality source images from degraded reconstructions as an inverse problem. To address this, CommIN combines Invertible Neural Networks (INN) with diffusion models, aiming for superior perceptual quality. Through experiments, we show that our CommIN significantly improves the perceptual quality compared to DeepJSCC under extreme conditions and outperforms other inverse problem approaches used in DeepJSCC.

Predicting emergence of crystals from amorphous matter with deep learning

  • paper_url: http://arxiv.org/abs/2310.01117
  • repo_url: None
  • paper_authors: Muratahan Aykol, Amil Merchant, Simon Batzner, Jennifer N. Wei, Ekin Dogus Cubuk
  • for: Predicting the outcome of phase transitions in inorganic materials, enabling new research directions in material synthesis and development.
  • methods: The paper uses universal deep learning potentials to sample the crystallization pathways of local structural motifs at the atomistic level, allowing the crystal structures of polymorphs to be predicted from amorphous precursors with high accuracy.
  • results: The results show that Ostwald's rule of stages can be exploited mechanistically at the molecular level to predictably access new metastable crystals from the amorphous phase, across a diverse set of material systems including polymorphic oxides, nitrides, carbides, fluorides, chlorides, chalcogenides, and metal alloys.
    Abstract Crystallization of the amorphous phases into metastable crystals plays a fundamental role in the formation of new matter, from geological to biological processes in nature to synthesis and development of new materials in the laboratory. Predicting the outcome of such phase transitions reliably would enable new research directions in these areas, but has remained beyond reach with molecular modeling or ab-initio methods. Here, we show that crystallization products of amorphous phases can be predicted in any inorganic chemistry by sampling the crystallization pathways of their local structural motifs at the atomistic level using universal deep learning potentials. We show that this approach identifies the crystal structures of polymorphs that initially nucleate from amorphous precursors with high accuracy across a diverse set of material systems, including polymorphic oxides, nitrides, carbides, fluorides, chlorides, chalcogenides, and metal alloys. Our results demonstrate that Ostwald's rule of stages can be exploited mechanistically at the molecular level to predictably access new metastable crystals from the amorphous phase in material synthesis.

R-divergence for Estimating Model-oriented Distribution Discrepancy

  • paper_url: http://arxiv.org/abs/2310.01109
  • repo_url: https://github.com/lawliet-zzl/r-div
  • paper_authors: Zhilin Zhao, Longbing Cao
  • for: This paper addresses whether the probability distributions of two given datasets can be considered identical, which matters whenever models are trained under differing distributions.
  • methods: It proposes R-divergence, which learns an optimal (minimum) hypothesis on the mixed data and then evaluates the empirical risk difference between the two datasets to assess model-oriented distribution discrepancy.
  • results: Experiments show that R-divergence accurately detects discrepancies between datasets drawn from different distributions and achieves state-of-the-art test power across tasks; the paper also applies R-divergence to train robust neural networks on samples with noisy labels.
    Abstract Real-life data are often non-IID due to complex distributions and interactions, and the sensitivity to the distribution of samples can differ among learning models. Accordingly, a key question for any supervised or unsupervised model is whether the probability distributions of two given datasets can be considered identical. To address this question, we introduce R-divergence, designed to assess model-oriented distribution discrepancies. The core insight is that two distributions are likely identical if their optimal hypothesis yields the same expected risk for each distribution. To estimate the distribution discrepancy between two datasets, R-divergence learns a minimum hypothesis on the mixed data and then gauges the empirical risk difference between them. We evaluate the test power across various unsupervised and supervised tasks and find that R-divergence achieves state-of-the-art performance. To demonstrate the practicality of R-divergence, we employ R-divergence to train robust neural networks on samples with noisy labels.
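A minimal sketch of the two-step recipe described in the abstract (the choice of logistic regression as the hypothesis class and of the log-loss as the risk are assumptions for illustration, not the paper's exact setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def r_divergence(X1, y1, X2, y2):
    """Fit a single 'minimum hypothesis' on the mixed data, then return the
    absolute difference of its empirical risks on the two datasets."""
    X_mix, y_mix = np.vstack([X1, X2]), np.concatenate([y1, y2])
    h = LogisticRegression(max_iter=1000).fit(X_mix, y_mix)
    r1 = log_loss(y1, h.predict_proba(X1), labels=h.classes_)
    r2 = log_loss(y2, h.predict_proba(X2), labels=h.classes_)
    return abs(r1 - r2)

rng = np.random.default_rng(0)
Xa, ya = rng.normal(0.0, 1, (500, 5)), rng.integers(0, 2, 500)
Xb, yb = rng.normal(0.5, 1, (500, 5)), rng.integers(0, 2, 500)
print(r_divergence(Xa, ya, Xb, yb))
```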

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs

  • paper_url: http://arxiv.org/abs/2310.01105
  • repo_url: None
  • paper_authors: Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Evgeny Burnaev, Alexander Korotin
  • for: averaging probability distributions while capturing their geometric properties
  • methods: novel algorithm based on weak OT and dual reformulation, with quality bounds and interconnectivity with Energy-Based Models
  • results: validated on low-dimensional scenarios and image-space setups, with practical applications in learning barycenter on image manifolds generated by pretrained generative models
    Abstract Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure, enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.

Seismogram Transformer: A generic deep learning backbone network for multiple earthquake monitoring tasks

  • paper_url: http://arxiv.org/abs/2310.01037
  • repo_url: https://github.com/senli1073/seist
  • paper_authors: Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu
  • for: This paper applies deep learning to seismogram processing to improve the accuracy and efficiency of earthquake research and monitoring.
  • methods: It proposes Seismogram Transformer (SeisT), a backbone neural network composed of several kinds of foundational blocks, for earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, and back-azimuth estimation.
  • results: Experiments show that SeisT matches or even outperforms state-of-the-art models on these tasks, particularly in out-of-distribution generalization; its stacked foundational blocks extract multi-level features of seismograms, from low-level to complex high-level representations such as frequency, phase, and time-frequency relationships.
    Abstract Seismic records, known as seismograms, are crucial records of ground motion resulting from seismic events, constituting the backbone of earthquake research and monitoring. The latest advancements in deep learning have significantly facilitated various seismic signal processing tasks. This paper introduces a novel backbone neural network model designed for various seismic monitoring tasks, named Seismogram Transformer (SeisT). Thanks to its efficient network architecture, SeisT matches or even outperforms the state-of-the-art models in earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, and back-azimuth estimation tasks, particularly in terms of out-of-distribution generalization performance. SeisT consists of multiple network layers composed of different foundational blocks, which help the model understand multi-level feature representations of seismograms from low-level to high-level complex features, effectively extracting features such as frequency, phase, and time-frequency relationships from input seismograms. Three different-sized models were customized based on these diverse foundational modules. Through extensive experiments and performance evaluations, this study showcases the capabilities and potential of SeisT in advancing seismic signal processing and earthquake research.

A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation

  • paper_url: http://arxiv.org/abs/2310.01034
  • repo_url: None
  • paper_authors: Ibrahim Yazici, Emre Gures
  • for: 5G wireless communication networks in high-speed train systems, with the aim of improving the quality of service for mobile users.
  • methods: Machine learning with a nested cross-validation scheme, which prevents information leakage from model evaluation into parameter tuning, thereby avoiding overfitting and yielding a better generalization error.
  • results: Several machine learning methods, including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR), are applied; with nested cross-validation, the boosting methods (AdaBoost, CBR, GBR) perform best, while SVR, KNNR, KRR, and ANN produce promising predictions for some KPIs.
    Abstract Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including high-speed trains. However, the dense deployment of 5G millimeter wave (mmWave) base stations (BSs) and the high speed of moving trains lead to frequent handovers (HOs), which can adversely affect the Quality-of-Service (QoS) of mobile users. As a result, HO optimization and resource allocation are essential considerations for managing mobility in high-speed train systems. In this paper, we model the system performance of a high-speed train system with a machine learning (ML) approach that uses a nested cross-validation scheme, which prevents information leakage from model evaluation into model parameter tuning, thereby avoiding overfitting and yielding a better generalization error. To this end, we employ ML methods for the high-speed train system scenario. Handover Margin (HOM) and Time-to-Trigger (TTT) values are used as features, several KPIs are used as outputs, and several ML methods, including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR), are employed for the problem. Finally, the cross-validation schemes are compared across these methods in terms of mean absolute error (MAE) and mean square error (MSE). According to the results, the boosting methods (AdaBoost, CBR, GBR) with the nested cross-validation scheme clearly outperform the same methods under the conventional cross-validation scheme. In addition, SVR, KNNR, KRR, and ANN with the nested scheme produce promising predictions for some KPIs relative to their conventional-scheme counterparts.

The Fisher-Rao geometry of CES distributions

  • paper_url: http://arxiv.org/abs/2310.01032
  • repo_url: None
  • paper_authors: Florent Bouchard, Arnaud Breloy, Antoine Collas, Alexandre Renaux, Guillaume Ginolhac
  • for: This paper concerns parametric statistical models whose parameter space, endowed with the Fisher information metric, naturally carries a Riemannian structure, and shows how tools from differential geometry can be applied through this structure.
  • methods: The Fisher information metric induces the Fisher-Rao geometry on the parameter space; the paper uses this structure for covariance matrix estimation via Riemannian optimization, intrinsic Cramér-Rao bounds, and classification with Riemannian distances.
  • results: The paper shows that, with the Riemannian structure and tools from differential geometry, practical problems for elliptical distributions such as covariance matrix estimation, intrinsic Cramér-Rao bounds, and classification can be addressed.
    Abstract When dealing with a parametric statistical model, a Riemannian manifold can naturally appear by endowing the parameter space with the Fisher information metric. The geometry induced on the parameters by this metric is then referred to as the Fisher-Rao information geometry. Interestingly, this yields a point of view that allows for leveraging many tools from differential geometry. After a brief introduction about these concepts, we will present some practical uses of these geometric tools in the framework of elliptical distributions. This second part of the exposition is divided into three main axes: Riemannian optimization for covariance matrix estimation, Intrinsic Cram\'er-Rao bounds, and classification using Riemannian distances.
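For reference, the Fisher information metric that induces this geometry on the parameter space is

$$ g_{ij}(\theta) \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[ \frac{\partial \log p_\theta(x)}{\partial \theta_i}\, \frac{\partial \log p_\theta(x)}{\partial \theta_j} \right], $$

and geodesic distances under $g$ give the Fisher-Rao distance used, for example, when classifying covariance matrices.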

A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation

  • paper_url: http://arxiv.org/abs/2310.01030
  • repo_url: None
  • paper_authors: Ibrahim Yazıcı, Emre Gures
  • for: This paper aims to improve the accuracy of path loss prediction in 5G wireless networks using machine learning (ML) methods, which can facilitate more accurate network planning, resource optimization, and performance improvement.
  • methods: The paper utilizes a nested cross validation scheme and five different ML methods (Support Vector Regression, CatBoost Regression, eXtreme Gradient Boosting Regression, Artificial Neural Network, and Random Forest) to predict path loss in a 5G network system, and compares the prediction results in terms of Mean Absolute Error and Mean Square Error.
  • results: The results show that XGBR outperforms the other methods, with a slight performance difference of 0.4% and 1% in terms of MAE and MSE, respectively, compared to CBR. The rest of the methods are outperformed by XGBR with clear performance differences.
    Abstract The design and deployment of fifth-generation (5G) wireless networks pose significant challenges due to the increasing number of wireless devices. Path loss has a landmark importance in network performance optimization, and accurate prediction of the path loss, which characterizes the attenuation of signal power during transmission, is critical for effective network planning, coverage estimation, and optimization. In this sense, we utilize machine learning (ML) methods, which overcome the drawbacks of conventional path loss prediction models, for path loss prediction in a 5G network system to facilitate more accurate network planning, resource optimization, and performance improvement in wireless communication systems. To this end, we use a nested cross-validation scheme with ML to prevent overfitting, thereby obtaining a better generalization error and stable results for ML deployment. First, we acquire a publicly available dataset obtained through a comprehensive measurement campaign conducted in an urban macro-cell scenario located in Beijing, China. The dataset includes crucial information such as longitude, latitude, elevation, altitude, clutter height, and distance, which are utilized as essential features to predict the path loss in the 5G network system. We deploy Support Vector Regression (SVR), CatBoost Regression (CBR), eXtreme Gradient Boosting Regression (XGBR), Artificial Neural Network (ANN), and Random Forest (RF) methods to predict the path loss, and compare the prediction results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE). According to the results, XGBR outperforms the other methods: it exceeds CBR by slight margins of 0.4% and 1% in terms of the MAE and MSE metrics, respectively, and outperforms the remaining methods by clear margins.
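A minimal sketch of the nested cross-validation scheme used here, with scikit-learn (the estimator, hyperparameter grid, fold counts, and placeholder data are illustrative assumptions; the paper evaluates SVR, CatBoost, XGBoost, an ANN, and Random Forest on the Beijing measurement data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# X: features such as longitude, latitude, elevation, altitude, clutter height,
# distance; y: measured path loss in dB. Random placeholders stand in here.
X, y = np.random.rand(200, 6), np.random.rand(200) * 50 + 80

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)   # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)   # generalization estimate

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)
# Tuning happens only inside each outer training fold, so no test information
# leaks into model selection.
nested_mae = -cross_val_score(search, X, y, cv=outer_cv,
                              scoring="neg_mean_absolute_error")
print(nested_mae.mean())
```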

Conflict-Aware Active Automata Learning

  • paper_url: http://arxiv.org/abs/2310.01003
  • repo_url: None
  • paper_authors: Tiago Ferreira, Léo Henry, Raquel Fernandes da Silva, Alexandra Silva
  • for: This paper addresses the problem of conflicts in observation data (different outputs observed for the same inputs) faced by active automata learning algorithms.
  • methods: It proposes the Conflict-Aware Active Automata Learning (C3AL) framework, which treats the observation tree as a first-class citizen so that conflicting information can be handled during the learning process.
  • results: Extensive experiments show that C3AL handles noise and system mutations better than existing approaches and can be used with any existing learner.
    Abstract Active automata learning algorithms cannot easily handle conflict in the observation data (different outputs observed for the same inputs). This inherent inability to recover after a conflict impairs their effective applicability in scenarios where noise is present or the system under learning is mutating. We propose the Conflict-Aware Active Automata Learning (C3AL) framework to enable handling conflicting information during the learning process. The core idea is to consider the so-called observation tree as a first-class citizen in the learning process. Though this idea is explored in recent work, we take it to its full effect by enabling its use with any existing learner and minimizing the number of tests performed on the system under learning, especially in the face of conflicts. We evaluate C3AL in a large set of benchmarks, covering over 30 different realistic targets, and over 18,000 different scenarios. The results of the evaluation show that C3AL is a suitable alternative framework for closed-box learning that can better handle noise and mutations.

A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

  • paper_url: http://arxiv.org/abs/2310.00987
  • repo_url: None
  • paper_authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius
  • for: This paper provides sharp learning guarantees for finite-rank kernel ridge regression (KRR) with arbitrary finite-rank kernels.
  • methods: It derives sharp non-asymptotic upper and lower bounds for the test error of any finite-rank KRR.
  • results: The bounds are tighter than previously derived bounds on finite-rank KRR and, unlike comparable results, remain valid for any regularization parameter.
    Abstract Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g.\ when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for the KRR test error of any finite-rank KRR. Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.
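For context, the KRR estimator whose test error is being bounded solves

$$ \hat f \;=\; \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{n}\sum_{i=1}^{n} \big(f(x_i) - y_i\big)^2 + \lambda \|f\|_{\mathcal{H}_K}^2, \qquad \hat f(x) = k(x, X)\,(K + n\lambda I)^{-1} y, $$

where $K \in \mathbb{R}^{n\times n}$ is the kernel Gram matrix. A finite-rank kernel is one whose feature map spans a fixed finite-dimensional space, as happens, for instance, with the linear kernel on the last-layer features of a pre-trained network during transfer learning.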

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

  • paper_url: http://arxiv.org/abs/2310.00968
  • repo_url: None
  • paper_authors: Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
  • for: This paper proposes a contextual dueling bandit algorithm for decision-making problems with preferential feedback whose pairwise comparisons are inherently uncertain.
  • methods: The algorithm is a new SupLinUCB-type method that is computationally efficient and enjoys a variance-aware regret bound.
  • results: Experiments on synthetic data show that the algorithm outperforms previous variance-agnostic algorithms, especially when the pairwise comparisons have low variance.
    Abstract Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$, where $\sigma_t$ is the variance of the pairwise comparison in round $t$, $d$ is the dimension of the context vectors, and $T$ is the time horizon. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms.

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

  • paper_url: http://arxiv.org/abs/2310.00967
  • repo_url: https://github.com/kljp/micro
  • paper_authors: Daegun Yoon, Sangyoon Oh
  • for: Scaling and accelerating distributed deep neural network (DNN) training by improving its efficiency and scalability.
  • methods: A new gradient sparsification method, MiCRO, partitions the gradient vector and assigns each partition to a corresponding worker; each worker selects gradients only from its own partition, which avoids gradient build-up, and the selection threshold is estimated to keep communication traffic at the user-specified level by minimising the compression ratio error, enabling near-zero-cost sparsification.
  • results: In extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
    Abstract Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
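A minimal sketch of the partition-and-select idea (the equal-size partitioning, the top-k-style threshold, and the density parameter are assumptions for illustration; MiCRO's actual threshold estimation minimises the compression-ratio error):

```python
import torch

def sparsify_local_partition(grad: torch.Tensor, rank: int, world_size: int,
                             density: float = 0.01):
    """Each worker sparsifies only its own slice of the flattened gradient, so
    selected coordinates never overlap across workers and no gradient build-up
    occurs. The (values, indices) pairs would then be exchanged, e.g. via an
    all-gather, and applied to the corresponding slice of the global gradient."""
    flat = grad.flatten()
    chunk = flat.tensor_split(world_size)[rank]       # this worker's partition
    k = max(1, int(density * chunk.numel()))
    # k-th largest magnitude in the partition serves as this round's threshold.
    threshold = chunk.abs().kthvalue(chunk.numel() - k + 1).values
    mask = chunk.abs() >= threshold
    values, indices = chunk[mask], mask.nonzero(as_tuple=True)[0]
    return values, indices

vals, idx = sparsify_local_partition(torch.randn(10_000), rank=0, world_size=4)
print(vals.numel(), idx.numel())
```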

Effective Learning with Node Perturbation in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00965
  • repo_url: None
  • paper_authors: Sander Dalm, Marcel van Gerven, Nasir Ahmad
  • for: Training the parameters of deep neural network models as an alternative to standard backpropagation.
  • methods: Node perturbation (NP) learns by injecting noise into the network activations and measuring the induced change in the loss.
  • results: Aligning NP more closely with directional derivatives and decorrelating the inputs of every layer significantly improves NP learning, making it competitive with backpropagation.
    Abstract Backpropagation (BP) is the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into the network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided, noise-based, activity search. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives, and induction of decorrelation of inputs at every layer significantly enhances performance of NP learning making it competitive with BP.
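A minimal sketch of vanilla node perturbation on a single linear layer (the noise scale, layer shape, and use of the global loss difference are illustrative; the paper's improved variants additionally align the update with directional derivatives and decorrelate layer inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 5))          # one linear layer
x, y = rng.normal(size=(32, 5)), rng.normal(size=(32, 10))
loss = lambda a: ((a - y) ** 2).mean()

lr, sigma = 0.05, 1e-3
for _ in range(500):
    a_clean = x @ W.T                            # clean forward (inference) pass
    noise = sigma * rng.normal(size=a_clean.shape)
    a_noisy = a_clean + noise                    # noise-perturbed forward pass
    delta = loss(a_noisy) - loss(a_clean)        # induced loss change
    # Credit assignment without derivatives: correlate the injected noise
    # with the resulting loss change and move weights against it.
    W -= lr * (delta / sigma**2) * (noise.T @ x) / len(x)
```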

A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation

  • paper_url: http://arxiv.org/abs/2310.00953
  • repo_url: None
  • paper_authors: Marco Arazzi, Serena Nicolazzo, Antonino Nocera
  • for: To give Internet of Things devices a way to evaluate the trustworthiness of an object in the network before interacting with it, helping to mitigate data hacking and loss.
  • methods: A fully distributed trust model based on behavioral fingerprints, a distributed consensus mechanism, and Blockchain technology, together with an associated security model and tests to evaluate its correctness and performance.
  • results: The study shows that the approach helps IoT devices assess the trustworthiness of network objects, thereby reducing the risk of data hacking and loss, and that it can be applied across heterogeneous devices and network environments.
    Abstract With the number of connected smart devices expected to constantly grow in the next years, Internet of Things (IoT) solutions are experiencing a booming demand to make data collection and processing easier. The ability of IoT appliances to provide pervasive and better support to everyday tasks, in most cases transparently to humans, is also achieved through the high degree of autonomy of such devices. However, the higher the number of new capabilities and services provided in an autonomous way, the wider the attack surface that exposes users to data hacking and loss. In this scenario, many critical challenges arise also because IoT devices have heterogeneous computational capabilities (i.e., in the same network there might be simple sensors/actuators as well as more complex and smart nodes). In this paper, we try to provide a contribution in this setting, tackling the non-trivial issues of equipping smart things with a strategy to evaluate, also through their neighbors, the trustworthiness of an object in the network before interacting with it. To do so, we design a novel and fully distributed trust model exploiting devices' behavioral fingerprints, a distributed consensus mechanism and the Blockchain technology. Beyond the detailed description of our framework, we also illustrate the security model associated with it and the tests carried out to evaluate its correctness and performance.

Improved Variational Bayesian Phylogenetic Inference using Mixtures

  • paper_url: http://arxiv.org/abs/2310.00941
  • repo_url: https://github.com/lagergren-lab/vbpi-mixtures
  • paper_authors: Oskar Kviman, Ricky Molén, Jens Lagergren
  • for: Improving the accuracy of phylogenetic posterior approximations, particularly for tree topologies and branch lengths.
  • methods: The approach builds on the Variational Bayesian Phylogenetic Inference (VBPI) framework, which has been combined with modern deep learning techniques such as normalizing flows and graph neural networks, to approximate tree topologies and branch lengths.
  • results: The method achieves state-of-the-art performance on difficult density estimation tasks across several real phylogenetic datasets.
    Abstract We present VBPI-Mixtures, an algorithm designed to enhance the accuracy of phylogenetic posterior distributions, particularly for tree-topology and branch-length approximations. Despite the Variational Bayesian Phylogenetic Inference (VBPI), a leading-edge black-box variational inference (BBVI) framework, achieving remarkable approximations of these distributions, the multimodality of the tree-topology posterior presents a formidable challenge to sampling-based learning techniques such as BBVI. Advanced deep learning methodologies such as normalizing flows and graph neural networks have been explored to refine the branch-length posterior approximation, yet efforts to ameliorate the posterior approximation over tree topologies have been lacking. Our novel VBPI-Mixtures algorithm bridges this gap by harnessing the latest breakthroughs in mixture learning within the BBVI domain. As a result, VBPI-Mixtures is capable of capturing distributions over tree-topologies that VBPI fails to model. We deliver state-of-the-art performance on difficult density estimation tasks across numerous real phylogenetic datasets.

Integration of Graph Neural Network and Neural-ODEs for Tumor Dynamic Prediction

  • paper_url: http://arxiv.org/abs/2310.00926
  • repo_url: None
  • paper_authors: Omid Bazgir, Zichen Wang, Marc Hafner, James Lu
  • for: This work supports anti-cancer drug development by disentangling the complex relationships between high-dimensional genomics data, the tumor's organ of origin, drug targets, and treatment response.
  • methods: The paper proposes a heterogeneous graph encoder that combines a bipartite graph convolutional network (GCN) with neural ordinary differential equations (Neural-ODEs) for personalized tumor dynamic prediction.
  • results: The study finds that the approach improves personalized tumor dynamic predictions and effectively exploits multimodal data to enhance them.
    Abstract In anti-cancer drug development, a major scientific challenge is disentangling the complex relationships between high-dimensional genomics data from patient tumor samples, the corresponding tumor's organ of origin, the drug targets associated with given treatments and the resulting treatment response. Furthermore, to realize the aspirations of precision medicine in identifying and adjusting treatments for patients depending on the therapeutic response, there is a need for building tumor dynamic models that can integrate both longitudinal tumor size as well as multimodal, high-content data. In this work, we take a step towards enhancing personalized tumor dynamic predictions by proposing a heterogeneous graph encoder that utilizes a bipartite Graph Convolutional Neural network (GCN) combined with Neural Ordinary Differential Equations (Neural-ODEs). We applied the methodology to a large collection of patient-derived xenograft (PDX) data, spanning a wide variety of treatments (as well as their combinations) on tumors that originated from a number of different organs. We first show that the methodology is able to discover a tumor dynamic model that significantly improves upon an empirical model which is in current use. Additionally, we show that the graph encoder is able to effectively utilize multimodal data to enhance tumor predictions. Our findings indicate that the methodology holds significant promise and offers potential applications in pre-clinical settings.

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.00902
  • repo_url: https://github.com/ykwon0407/datainf
  • paper_authors: Yongchan Kwon, Eric Wu, Kevin Wu, James Zou
  • for: This paper aims to improve the transparency of machine learning models and the understanding of their outputs, and to help identify mislabeled data points.
  • methods: It proposes DataInf, an efficient influence approximation method based on an easy-to-compute closed-form expression, which makes influence computation practical for large-scale generative AI models.
  • results: Systematic experiments show that DataInf accurately approximates influence scores while being much faster and more memory-efficient than existing methods; applied to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5, it identifies the most influential fine-tuning examples better than other approximate influence scores and helps flag mislabeled data points.
    Abstract Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline. The influence function is a principled and popular data attribution method, but its computational cost often makes it challenging to use. This issue becomes more pronounced in the setting of large language models and text-to-image models. In this work, we propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. Leveraging an easy-to-compute closed-form expression, DataInf outperforms existing influence computation algorithms in terms of computational and memory efficiency. Our theoretical analysis shows that DataInf is particularly well-suited for parameter-efficient fine-tuning techniques such as LoRA. Through systematic empirical evaluations, we show that DataInf accurately approximates influence scores and is orders of magnitude faster than existing methods. In applications to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5 models, DataInf effectively identifies the most influential fine-tuning examples better than other approximate influence scores. Moreover, it can help to identify which data points are mislabeled.
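The abstract only states that DataInf relies on an easy-to-compute closed-form expression. The numpy sketch below illustrates the general idea of a swapped-order, rank-one (Sherman-Morrison style) approximation that avoids explicit Hessian inversion for one layer's parameters; the exact scaling and per-layer aggregation used by DataInf may differ, so treat this as a hypothetical illustration of the technique rather than the paper's formula.

```python
import numpy as np

def influence_scores(train_grads, val_grad, lam=0.1):
    """
    Approximate influence of each training example on a validation loss for one
    layer's parameters, using a rank-one closed form instead of an explicit
    Hessian inverse.

    train_grads: [n, d] per-example gradients for this layer
    val_grad:    [d]    gradient of the validation loss for this layer
    lam:         damping term added to the (approximate) Hessian
    """
    # Swap the average and the inverse, then invert each rank-one term in closed form:
    # (g g^T + lam I)^{-1} v = (v - g * (g.v) / (lam + g.g)) / lam
    gv = train_grads @ val_grad                            # [n], g_i . v
    gg = np.einsum("nd,nd->n", train_grads, train_grads)   # [n], ||g_i||^2
    hinv_v = (val_grad[None, :] - train_grads * (gv / (lam + gg))[:, None]) / lam
    hinv_v = hinv_v.mean(axis=0)                           # averaged rank-one inverses applied to v
    # Influence of example k is approximately - grad_val^T H^{-1} grad_k.
    return -(train_grads @ hinv_v)

rng = np.random.default_rng(0)
G = rng.normal(size=(200, 64))       # 200 training examples, 64 parameters in this layer
v = rng.normal(size=64)
scores = influence_scores(G, v)
print(scores.shape, int(np.abs(scores).argmax()))   # index of the most influential example by magnitude
```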

Organized Event Participant Prediction Enhanced by Social Media Retweeting Data

  • paper_url: http://arxiv.org/abs/2310.00896
  • repo_url: None
  • paper_authors: Yihong Zhang, Takahiro Hara
  • for: The paper aims to predict participants of organized events hosted on Web platforms.
  • methods: It enhances participant-prediction models with social media retweeting data, constructing a joint knowledge graph that bridges social media and the target domain (assuming event descriptions and tweets are written in the same language), and proposes a learning model that exploits retweeting information more effectively for target-domain prediction.
  • results: Extensive experiments with real-world data in two scenarios, each with several training-data sizes and both warm and cold test cases, show that the approach consistently outperforms baseline models, especially for warm test cases and when target-domain data is limited.
    Abstract Nowadays, many platforms on the Web offer organized events, allowing users to be organizers or participants. For such platforms, it is beneficial to predict potential event participants. Existing work on this problem tends to borrow recommendation techniques. However, compared to e-commerce items and purchases, events and participation are usually of a much smaller frequency, and the data may be insufficient to learn an accurate model. In this paper, we propose to utilize social media retweeting activity data to enhance the learning of event participant prediction models. We create a joint knowledge graph to bridge the social media and the target domain, assuming that event descriptions and tweets are written in the same language. Furthermore, we propose a learning model that utilizes retweeting information for the target domain prediction more effectively. We conduct comprehensive experiments in two scenarios with real-world data. In each scenario, we set up training data of different sizes, as well as warm and cold test cases. The evaluation results show that our approach consistently outperforms several baseline models, especially with the warm test cases, and when target domain data is limited.

Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss

  • paper_url: http://arxiv.org/abs/2310.00893
  • repo_url: None
  • paper_authors: Jaidev Gill, Vala Vakilian, Christos Thrampoulidis
  • for: The paper studies supervised-contrastive loss (SCL), an alternative to cross-entropy (CE) for classification that exploits similarities in the embedding space to allow richer representations.
  • methods: It proposes modifying the contrastive loss to engineer the geometry of the learnt feature embeddings, and shows empirically that including fixed prototype embeddings in every batch induces the learnt embedding geometry to align with that of the prototypes; a limiting analysis in which the prototypes far outnumber the batch connects the approach to CE loss with a fixed classifier and normalized embeddings.
  • results: A series of experiments with deep neural networks on benchmark vision datasets validates these findings.
    Abstract Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks that makes use of similarities in the embedding space to allow for richer representations. In this work, we propose methods to engineer the geometry of these learnt feature embeddings by modifying the contrastive loss. In pursuit of adjusting the geometry we explore the impact of prototypes, fixed embeddings included during training to alter the final feature geometry. Specifically, through empirical findings, we demonstrate that the inclusion of prototypes in every batch induces the geometry of the learnt embeddings to align with that of the prototypes. We gain further insights by considering a limiting scenario where the number of prototypes far outnumber the original batch size. Through this, we establish a connection to cross-entropy (CE) loss with a fixed classifier and normalized embeddings. We validate our findings by conducting a series of experiments with deep neural networks on benchmark vision datasets.
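For reference, a minimal supervised-contrastive loss in which fixed per-class prototype embeddings are appended to every batch, as the abstract describes, might look as follows; the temperature, prototype initialization, and normalization details are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def supcon_with_prototypes(feats, labels, prototypes, proto_labels, tau=0.1):
    """Supervised-contrastive loss over a batch augmented with fixed prototypes.

    feats:        [B, d] encoder embeddings
    labels:       [B]    class labels
    prototypes:   [K, d] fixed embeddings appended to every batch
    proto_labels: [K]    one class label per prototype
    """
    z = F.normalize(torch.cat([feats, prototypes], dim=0), dim=1)    # [B+K, d]
    y = torch.cat([labels, proto_labels], dim=0)                     # [B+K]
    sim = z @ z.T / tau
    n = z.shape[0]
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = ((y[:, None] == y[None, :]) & ~self_mask).float()     # positives share a label

    denom_logits = sim.masked_fill(self_mask, float("-inf"))         # exclude self from the denominator
    log_prob = sim - torch.logsumexp(denom_logits, dim=1, keepdim=True)

    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)
    loss_per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss_per_anchor.mean()

# Toy usage: 3 classes with one fixed prototype per class.
feats = torch.randn(8, 16, requires_grad=True)
labels = torch.randint(0, 3, (8,))
prototypes = torch.randn(3, 16)          # held fixed during training
proto_labels = torch.arange(3)
loss = supcon_with_prototypes(feats, labels, prototypes, proto_labels)
loss.backward()
print(float(loss))
```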

Deep Neural Networks Tend To Extrapolate Predictably

  • paper_url: http://arxiv.org/abs/2310.00873
  • repo_url: https://github.com/katiekang1998/cautious_extrapolation
  • paper_authors: Katie Kang, Amrith Setlur, Claire Tomlin, Sergey Levine
  • for: The study examines how neural network predictions behave on out-of-distribution (OOD) inputs and how this behavior can be exploited for risk-sensitive decision-making.
  • methods: It analyzes prediction behavior across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross-entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers), observing how outputs change as inputs move further from the training distribution.
  • results: As inputs become increasingly OOD, predictions tend toward a constant value that often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average training loss without observing the input.
    Abstract Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). Furthermore, we present an explanation for this behavior, which we first validate empirically and then study theoretically in a simplified setting involving deep homogeneous networks with ReLU activations. Finally, we show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
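For cross-entropy, the optimal constant solution has a simple closed form: the constant prediction that minimizes the average training loss is the empirical marginal over class labels. The sketch below computes this OCS and a KL-based distance to it, which is one plausible way (an assumption, not the paper's exact metric) to check whether predictions revert toward the OCS on OOD inputs.

```python
import numpy as np

def optimal_constant_solution(train_labels, num_classes):
    """OCS for cross-entropy: the empirical marginal distribution over classes."""
    counts = np.bincount(train_labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def mean_kl_to_ocs(pred_probs, ocs, eps=1e-12):
    """Average KL(OCS || prediction); small values mean predictions sit near the OCS."""
    p = np.clip(ocs, eps, 1.0)
    q = np.clip(pred_probs, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

rng = np.random.default_rng(0)
train_labels = rng.integers(0, 10, size=5000)
ocs = optimal_constant_solution(train_labels, num_classes=10)

# Confident one-hot-like predictions sit far from the OCS ...
confident = np.eye(10)[rng.integers(0, 10, size=100)] * 0.99 + 0.001
# ... while predictions that have collapsed toward the OCS sit close to it.
near_ocs = np.tile(ocs, (100, 1)) * 0.9 + 0.01
print(mean_kl_to_ocs(confident, ocs))   # large
print(mean_kl_to_ocs(near_ocs, ocs))    # small
```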

COMPOSER: Scalable and Robust Modular Policies for Snake Robots

  • paper_url: http://arxiv.org/abs/2310.00871
  • repo_url: None
  • paper_authors: Yuyou Zhang, Yaru Niu, Xingyu Liu, Ding Zhao
  • for: The paper develops a control policy for snake robots that leverages their hyper-redundancy and flexibility to enhance robustness and generalizability.
  • methods: It formulates snake-robot control as a cooperative Multi-Agent Reinforcement Learning (MARL) problem in which each segment acts as an individual agent, incorporates a self-attention mechanism to enhance cooperative behavior between agents, and proposes a high-level imagination policy that provides additional rewards to guide the low-level control policy.
  • results: The proposed method, COMPOSER, achieves the highest success rate across five snake-robot tasks (goal reaching, wall climbing, shape formation, tube crossing, and block pushing) compared to a centralized baseline and four modular policy baselines, and shows enhanced robustness against module corruption and significantly superior zero-shot generalizability.
    Abstract Snake robots have showcased remarkable compliance and adaptability in their interaction with environments, mirroring the traits of their natural counterparts. While their hyper-redundant and high-dimensional characteristics add to this adaptability, they also pose great challenges to robot control. Instead of perceiving the hyper-redundancy and flexibility of snake robots as mere challenges, there lies an unexplored potential in leveraging these traits to enhance robustness and generalizability at the control policy level. We seek to develop a control policy that effectively breaks down the high dimensionality of snake robots while harnessing their redundancy. In this work, we consider the snake robot as a modular robot and formulate the control of the snake robot as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Each segment of the snake robot functions as an individual agent. Specifically, we incorporate a self-attention mechanism to enhance the cooperative behavior between agents. A high-level imagination policy is proposed to provide additional rewards to guide the low-level control policy. We validate the proposed method COMPOSER with five snake robot tasks, including goal reaching, wall climbing, shape formation, tube crossing, and block pushing. COMPOSER achieves the highest success rate across all tasks when compared to a centralized baseline and four modular policy baselines. Additionally, we show enhanced robustness against module corruption and significantly superior zero-shot generalizability in our proposed method. The videos of this work are available on our project page: https://sites.google.com/view/composer-snake/.
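A minimal sketch of the modular-policy idea, where each segment is an agent and self-attention mixes information across segments before per-segment action heads, is given below; the dimensions, the shared action head, and the omission of the imagination policy are simplifications rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class SegmentAttentionPolicy(nn.Module):
    """Each snake segment is an agent; attention lets segments coordinate."""
    def __init__(self, obs_dim, act_dim, embed_dim=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(),
                                  nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):                  # obs: [batch, num_segments, obs_dim]
        h = self.embed(obs)
        h, _ = self.attn(h, h, h)            # self-attention across segments
        return self.head(h)                  # [batch, num_segments, act_dim]

policy = SegmentAttentionPolicy(obs_dim=12, act_dim=2)
obs = torch.randn(4, 9, 12)                  # 4 environment copies, 9 segments
actions = policy(obs)
print(actions.shape)                         # torch.Size([4, 9, 2])
```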

Drug Discovery with Dynamic Goal-aware Fragments

  • paper_url: http://arxiv.org/abs/2310.00841
  • repo_url: None
  • paper_authors: Seul Lee, Seanie Lee, Sung Ju Hwang
  • for: The paper targets fragment-based drug discovery, i.e., finding new drug candidates in the vast chemical space.
  • methods: It proposes GEAM (Goal-aware fragment Extraction, Assembly, and Modification): a fragment-extraction module uses the information bottleneck principle to identify fragments that contribute to the desired target properties and assembles them into a goal-aware fragment vocabulary; a fragment-modification module then explores beyond the initial vocabulary, which is dynamically updated with newly discovered goal-aware fragments.
  • results: Through the generative cycle of the three modules, GEAM effectively discovers promising drug candidates across various drug-discovery tasks.
    Abstract Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments that contribute to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks.

Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models

  • paper_url: http://arxiv.org/abs/2310.00839
  • repo_url: https://github.com/jichao1/wgan-gp
  • paper_authors: Jichao Bao, Hongkyu Yoon, Jonghyun Lee
  • for: This paper aims to accurately and efficiently estimate spatially distributed properties such as hydraulic conductivity (K) from sparse measurements using a deep generative model and an ensemble-based inversion method.
  • methods: The proposed method combines a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and Ensemble Smoother with Multiple Data Assimilation (ES-MDA): WGAN-GP generates high-dimensional K fields from a low-dimensional latent space, and ES-MDA updates the latent variables by assimilating the available measurements.
  • results: The method accurately characterizes the main features of the unknown K fields with reliable uncertainty quantification and outperforms a widely used variational inversion approach, especially for channelized and fractured field examples; the ensemble-based approach smooths out the complex objective-function surface during minimization, which accounts for the improved performance.
    Abstract Estimating spatially distributed properties such as hydraulic conductivity (K) from available sparse measurements is a great challenge in subsurface characterization. However, the use of inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets. In this paper, we combine Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a deep generative model that can accurately capture complex subsurface structure, and Ensemble Smoother with Multiple Data Assimilation (ES-MDA), an ensemble-based inversion method, for accurate and accelerated subsurface characterization. WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space and ES-MDA then updates the latent variables by assimilating available measurements. Several subsurface examples are used to evaluate the accuracy and efficiency of the proposed method and the main features of the unknown K fields are characterized accurately with reliable uncertainty quantification. Furthermore, the estimation performance is compared with a widely-used variational, i.e., optimization-based, inversion approach, and the proposed approach outperforms the variational inversion method, especially for the channelized and fractured field examples. We explain such superior performance by visualizing the objective function in the latent space: because of nonlinear and aggressive dimension reduction via generative modeling, the objective function surface becomes extremely complex while the ensemble approximation can smooth out the multi-modal surface during the minimization. This suggests that the ensemble-based approach works well over the variational approach when combined with deep generative models at the cost of forward model runs unless convergence-ensuring modifications are implemented in the variational inversion.
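The ES-MDA update itself is standard, so a compact latent-space version can be sketched directly; the generator and flow simulator are replaced by a placeholder callable and equal inflation coefficients are assumed, so this is an illustration of the assimilation step rather than the full workflow.

```python
import numpy as np

def es_mda_latent(z_ens, forward, d_obs, obs_std, n_assim=4, rng=None):
    """
    z_ens:   [n_ens, n_latent] ensemble of latent vectors (inputs to the generator)
    forward: callable mapping latent vectors to predicted data [n_ens, n_obs]
             (here standing in for generator -> K field -> flow simulator -> observations)
    d_obs:   [n_obs] observed data
    obs_std: observation noise standard deviation
    """
    rng = rng or np.random.default_rng(0)
    n_ens = z_ens.shape[0]
    alphas = [float(n_assim)] * n_assim            # equal inflation; sum of 1/alpha = 1
    C_e = (obs_std ** 2) * np.eye(d_obs.size)

    for alpha in alphas:
        d_pred = forward(z_ens)                                    # [n_ens, n_obs]
        dz = z_ens - z_ens.mean(axis=0)
        dd = d_pred - d_pred.mean(axis=0)
        C_zd = dz.T @ dd / (n_ens - 1)                             # latent-data cross-covariance
        C_dd = dd.T @ dd / (n_ens - 1)
        K = C_zd @ np.linalg.inv(C_dd + alpha * C_e)               # Kalman-like gain
        d_pert = d_obs + np.sqrt(alpha) * obs_std * rng.standard_normal((n_ens, d_obs.size))
        z_ens = z_ens + (d_pert - d_pred) @ K.T
    return z_ens

# Toy usage with a linear stand-in for the generator and simulator.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 8))
z_true = rng.normal(size=8)
d_obs = A @ z_true + 0.05 * rng.normal(size=20)
z0 = rng.normal(size=(100, 8))
z_post = es_mda_latent(z0, lambda z: z @ A.T, d_obs, obs_std=0.05, rng=rng)
print(np.linalg.norm(z_post.mean(axis=0) - z_true))   # should be small relative to the prior error
```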

Online Sensitivity Optimization in Differentially Private Learning

  • paper_url: http://arxiv.org/abs/2310.00829
  • repo_url: None
  • paper_authors: Filippo Galli, Catuscia Palamidessi, Tommaso Cucinotta
  • for: The goal is to train machine learning models with differential privacy guarantees, which requires bounding each individual's contribution to the optimization process.
  • methods: Per-sample gradients are clipped in $2$-norm at a threshold before averaging and batch sanitization; instead of fixing this threshold by grid search, the paper treats it as an additional learnable parameter and optimizes it with gradient descent, with minimal repercussions on the overall privacy analysis.
  • results: Evaluated against fixed and adaptive clipping strategies across diverse datasets, tasks, model sizes, and privacy levels, the dynamically optimized threshold achieves comparable or superior performance under the same privacy requirements.
    Abstract Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the $2$-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. This selection adversely influences optimization in two opposing ways: it either exacerbates the bias due to excessive clipping at lower values, or augments sanitization noise at higher values. The choice significantly hinges on factors such as the dataset, model architecture, and even varies within the same optimization, demanding meticulous tuning usually accomplished through a grid search. In order to circumvent the privacy expenses incurred in hyperparameter tuning, we present a novel approach to dynamically optimize the clipping threshold. We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function. This allows us to optimize the former with gradient descent, with minimal repercussions on the overall privacy analysis. Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels. Our results demonstrate its comparable or superior performance in all evaluated scenarios, given the same privacy requirements.
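The abstract does not spell out how the threshold is parameterized, so the sketch below only shows the standard DP-SGD mechanics that the threshold controls (per-example clipping, averaging, Gaussian noise scaled by the threshold), with the threshold passed in as a variable; the paper's actual gradient-based update of the threshold is not reproduced here.

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_C, lr=0.1, noise_mult=1.0, rng=None):
    """
    One DP-SGD step for logistic regression: per-example gradients are clipped
    to 2-norm clip_C, summed, sanitized with Gaussian noise of std noise_mult * clip_C,
    and averaged. The paper's contribution (updating clip_C itself by gradient
    descent as an extra learnable parameter) would plug in around this step and
    is omitted here.
    """
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_ex_grads = (p - y)[:, None] * X                        # [n, d]
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads * np.minimum(1.0, clip_C / np.maximum(norms, 1e-12))
    noisy_mean = (clipped.sum(axis=0) +
                  noise_mult * clip_C * rng.standard_normal(w.shape)) / n
    return w - lr * noisy_mean

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(10)
for _ in range(50):
    w = dp_sgd_step(w, X, y, clip_C=1.0, rng=rng)
print(np.round(w[:3], 3))
```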