cs.LG - 2023-10-20

Competitive Advantage Attacks to Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.13862
  • repo_url: None
  • paper_authors: Yuqi Jia, Minghong Fang, Neil Zhenqiang Gong
  • for: This paper is written for researchers and practitioners in the field of federated learning, particularly those interested in understanding and mitigating attacks on decentralized federated learning (DFL) systems.
  • methods: The paper proposes a new family of attacks called SelfishAttack, which aims to achieve competitive advantages over non-selfish clients in DFL systems. The authors formulate finding such local models as an optimization problem and propose methods to solve it when DFL uses different aggregation rules.
  • results: The authors show that their proposed methods successfully increase the accuracy gap between the final learnt local models of selfish clients and those of non-selfish ones. Moreover, SelfishAttack achieves larger accuracy gaps than poisoning attacks when extended to increase competitive advantages.
    Abstract Decentralized federated learning (DFL) enables clients (e.g., hospitals and banks) to jointly train machine learning models without a central orchestration server. In each global training round, each client trains a local model on its own training data and then they exchange local models for aggregation. In this work, we propose SelfishAttack, a new family of attacks to DFL. In SelfishAttack, a set of selfish clients aim to achieve competitive advantages over the remaining non-selfish ones, i.e., the final learnt local models of the selfish clients are more accurate than those of the non-selfish ones. Towards this goal, the selfish clients send carefully crafted local models to each remaining non-selfish one in each global training round. We formulate finding such local models as an optimization problem and propose methods to solve it when DFL uses different aggregation rules. Theoretically, we show that our methods find the optimal solutions to the optimization problem. Empirically, we show that SelfishAttack successfully increases the accuracy gap (i.e., competitive advantage) between the final learnt local models of selfish clients and those of non-selfish ones. Moreover, SelfishAttack achieves larger accuracy gaps than poisoning attacks when extended to increase competitive advantages.
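
A toy sketch of the setting (not the paper's optimization-based attack): each client holds a local model, exchanges models with peers, and aggregates. Here the selfish clients simply ship a noise-corrupted model to non-selfish peers while exchanging honest models among themselves; the client count, dimensions, and perturbation rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 6, 10
selfish = {0, 1}  # indices of colluding selfish clients (illustrative)

# Local models after one round of local training (stand-ins for real weights).
local = [rng.normal(size=dim) for _ in range(n_clients)]

def sent_model(sender, receiver):
    """Model that `sender` ships to `receiver` this round.

    Honest clients always send their true local model. In this toy version,
    selfish clients send their true model to fellow selfish clients but a
    noise-corrupted one to non-selfish clients (a crude stand-in for the
    paper's carefully optimized crafted models)."""
    w = local[sender]
    if sender in selfish and receiver not in selfish:
        return w + rng.normal(scale=0.5, size=dim)  # degrade the receiver's aggregate
    return w

# FedAvg-style aggregation: each client averages its model with all received ones.
aggregated = []
for i in range(n_clients):
    received = [sent_model(j, i) for j in range(n_clients) if j != i]
    aggregated.append(np.mean([local[i]] + received, axis=0))
# Selfish clients' aggregates stay clean; non-selfish ones absorb the perturbation.
```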

A Sparse Bayesian Learning for Diagnosis of Nonstationary and Spatially Correlated Faults with Application to Multistation Assembly Systems

  • paper_url: http://arxiv.org/abs/2310.16058
  • repo_url: None
  • paper_authors: Jihoon Chung, Zhenyu Kong
  • for: This paper proposes a novel fault diagnosis method that addresses the challenges of limited sensor numbers, nonstationary process faults, and correlated fault information in manufacturing systems.
  • methods: The method rests on the practical assumption that process faults are sparse. A hierarchical structure with several parameterized prior distributions addresses the challenges above, and Variational Bayes inference is used to derive approximate posterior distributions of the process faults.
  • results: Numerical and real-world case studies on an actual autobody assembly system demonstrate the efficacy of the proposed method, as well as its generalizability to other domains, including communication and healthcare systems.
    Abstract Sensor technology developments provide a basis for effective fault diagnosis in manufacturing systems. However, the limited number of sensors due to physical constraints or undue costs hinders accurate diagnosis in the actual process. In addition, time-varying operational conditions generate nonstationary process faults, and the correlation information in the process must be considered for accurate fault diagnosis in manufacturing systems. This article proposes a novel fault diagnosis method: clustering spatially correlated sparse Bayesian learning (CSSBL), and explicitly demonstrates its applicability in a multistation assembly system that is vulnerable to the above challenges. Specifically, the method is based on the practical assumption that the process will likely have only a few faults (sparsity). In addition, the hierarchical structure of CSSBL has several parameterized prior distributions to address the above challenges. As the posterior distributions of process faults do not have a closed form, this paper derives approximate posterior distributions through Variational Bayes inference. The proposed method's efficacy is demonstrated through numerical and real-world case studies utilizing an actual autobody assembly system. The generalizability of the proposed method allows the technique to be applied to fault diagnosis in other domains, including communication and healthcare systems.
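
CSSBL itself adds a hierarchical prior with clustering of spatially correlated faults and a variational Bayes posterior; the sketch below only illustrates the underlying sparsity idea with a standard automatic-relevance-determination (ARD) sparse Bayesian regression, where most estimated fault coefficients are driven to zero. The sensing-matrix setup is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(1)
n_sensors, n_fault_sources = 30, 100  # fewer sensors than potential fault sources

# Linear fault model: sensor readings y = Phi @ f + noise, with sparse faults f.
Phi = rng.normal(size=(n_sensors, n_fault_sources))  # fault-to-sensor mapping (assumed known)
f_true = np.zeros(n_fault_sources)
f_true[[7, 42, 85]] = [2.0, -1.5, 1.0]               # only a few active faults (sparsity)
y = Phi @ f_true + 0.05 * rng.normal(size=n_sensors)

# ARD places an individual Gaussian prior precision on each coefficient;
# irrelevant fault sources get large precisions and shrink to ~0.
model = ARDRegression().fit(Phi, y)
top = np.argsort(np.abs(model.coef_))[-3:]
print("recovered fault sources:", sorted(top))  # ideally {7, 42, 85}
```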

Towards Subject Agnostic Affective Emotion Recognition

  • paper_url: http://arxiv.org/abs/2310.15189
  • repo_url: None
  • paper_authors: Amit Kumar Jaiswal, Haiming Liu, Prayag Tiwari
  • for: This work targets subject-agnostic affective emotion recognition from EEG signals for affective brain-computer interfaces (aBCIs), i.e., recognition that does not rely on subject-specific information.
  • methods: Domain generalisation and domain adaptation are used to address the distributional shift in EEG signals; specifically, a meta-learning based augmented domain adaptation framework is proposed.
  • results: In experiments, the proposed method achieves performance comparable to state-of-the-art domain adaptation methods without requiring additional computational resources.
    Abstract This paper focuses on affective emotion recognition, aiming to perform in the subject-agnostic paradigm based on EEG signals. However, EEG signals manifest subject instability in subject-agnostic affective brain-computer interfaces (aBCIs), which leads to the problem of distributional shift. This problem is typically alleviated by approaches such as domain generalisation and domain adaptation. Methods based on domain adaptation usually confer comparatively better results than domain generalisation methods but demand more computational resources given new subjects. We propose a novel framework, meta-learning based augmented domain adaptation, for subject-agnostic aBCIs. Our domain adaptation approach is augmented through meta-learning, which consists of a recurrent neural network, a classifier, and a distributional shift controller based on a sum-decomposable function. We also show that a neural network expressing a sum-decomposable function can effectively estimate the divergence between varied domains. The network setting for augmented domain adaptation follows meta-learning and adversarial learning, where the controller promptly adapts to new domains employing the target data via a few self-adaptation steps in the test phase. Our proposed approach is shown to be effective in experiments on a public aBCIs dataset and achieves similar performance to state-of-the-art domain adaptation methods while avoiding the use of additional computational resources.

Exponential weight averaging as damped harmonic motion

  • paper_url: http://arxiv.org/abs/2310.13854
  • repo_url: None
  • paper_authors: Jonathan Patsenker, Henry Li, Yuval Kluger
  • for: To provide stable estimates of stochastic quantities during deep learning optimization.
  • methods: Computes an exponential moving average (EMA) of the model weights, averaging them during and after training to improve the stability of the inference model.
  • results: Proposes an improved training algorithm, BELAY, derived from a physical analogy between EMA and damped harmonic motion, and demonstrates theoretically and empirically several advantages of BELAY over standard EMA, including improved stability and performance.
    Abstract The exponential moving average (EMA) is a commonly used statistic for providing stable estimates of stochastic quantities in deep learning optimization. Recently, EMA has seen considerable use in generative models, where it is computed with respect to the model weights, and significantly improves the stability of the inference model during and after training. While the practice of weight averaging at the end of training is well-studied and known to improve estimates of local optima, the benefits of EMA over the course of training are less understood. In this paper, we derive an explicit connection between EMA and a damped harmonic system between two particles, where one particle (the EMA weights) is drawn to the other (the model weights) via an idealized zero-length spring. We then leverage this physical analogy to analyze the effectiveness of EMA, and propose an improved training algorithm, which we call BELAY. Finally, we demonstrate theoretically and empirically several advantages enjoyed by BELAY over standard EMA.
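
As a concrete reference, the standard weight-EMA update the abstract refers to looks like the sketch below; in the paper's analogy, the EMA weights are a particle pulled toward the (moving) model weights by a zero-length spring, with the decay playing the role of damping. BELAY's modification is not specified in the abstract, so only plain EMA is shown; the model and training objective are dummies.

```python
import copy
import torch

def ema_update(ema_model, model, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * model."""
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(16, 4)
ema_model = copy.deepcopy(model)   # initialize the EMA at the current weights
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):               # training loop with a dummy objective
    loss = model(torch.randn(8, 16)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(ema_model, model)   # EMA tracks the weights at every step
```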

Gradual Domain Adaptation: Theory and Algorithms

  • paper_url: http://arxiv.org/abs/2310.13852
  • repo_url: https://github.com/yifei-he/goat
  • paper_authors: Yifei He, Haoxiang Wang, Bo Li, Han Zhao
  • for: This paper focuses on gradual domain adaptation (GDA): adapting a model from a labeled source domain to an unlabeled target domain through a sequence of intermediate domains.
  • methods: Theoretically analyzes gradual self-training (GST), a popular GDA algorithm, providing a generalization bound that significantly improves on Kumar et al. (2020), and proposes the GOAT framework, which generates intermediate domains along the Wasserstein geodesic between source and target.
  • results: Experiments show that the GOAT framework improves the performance of standard GDA when the given intermediate domains are scarce, broadening the real-world application scenarios of GDA.
    Abstract Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/yifei-he/GOAT.
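
A minimal sketch of the geodesic idea using the POT library: compute an optimal coupling between source and target samples, then form an intermediate domain at time t by displacement interpolation of matched pairs. GOAT does this in a learned feature space and follows it with gradual self-training, neither of which is shown; the Gaussian data is purely illustrative.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
Xs = rng.normal(loc=0.0, size=(200, 2))  # source-domain features (illustrative)
Xt = rng.normal(loc=4.0, size=(200, 2))  # target-domain features

# Optimal coupling under squared Euclidean cost with uniform weights.
a = np.full(len(Xs), 1 / len(Xs))
b = np.full(len(Xt), 1 / len(Xt))
G = ot.emd(a, b, ot.dist(Xs, Xt))        # n_s x n_t transport plan

def intermediate_domain(t, n_samples=200):
    """Sample an intermediate domain at position t in [0, 1] along the
    Wasserstein geodesic: draw matched pairs (i, j) from the coupling and
    return the displacement interpolation (1 - t) * x_i + t * x_j."""
    flat = G.ravel() / G.sum()
    idx = rng.choice(G.size, size=n_samples, p=flat)
    i, j = np.unravel_index(idx, G.shape)
    return (1 - t) * Xs[i] + t * Xt[j]

# Uniformly spaced intermediate domains, as the paper's theory suggests.
domains = [intermediate_domain(t) for t in np.linspace(0, 1, 5)]
```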

Augment with Care: Enhancing Graph Contrastive Learning with Selective Spectrum Perturbation

  • paper_url: http://arxiv.org/abs/2310.13845
  • repo_url: None
  • paper_authors: Kaiqi Yang, Haoyu Han, Wei Jin, Hui Liu
  • for: This paper proposes spectrally informed augmentation views to improve the effectiveness of graph contrastive learning.
  • methods: Uses spectral-hint-guided edge perturbation, selectively posing tailored perturbations on specific frequency bands of graph structures to obtain adaptive and controllable augmentation views.
  • results: Extensive experiments and theoretical analysis show that GASSER produces adaptive and controllable augmentation views that heuristically fit the homophily ratios and spectra of graph structures.
    Abstract In recent years, Graph Contrastive Learning (GCL) has shown remarkable effectiveness in learning representations on graphs. As a component of GCL, good augmentation views are supposed to be invariant to the important information while discarding the unimportant part. Existing augmentation views with perturbed graph structures are usually based on random topology corruption in the spatial domain; however, from perspectives of the spectral domain, this approach may be ineffective as it fails to pose tailored impacts on the information of different frequencies, thus weakening the agreement between the augmentation views. By a preliminary experiment, we show that the impacts caused by spatial random perturbation are approximately evenly distributed among frequency bands, which may harm the invariance of augmentations required by contrastive learning frameworks. To address this issue, we argue that the perturbation should be selectively posed on the information concerning different frequencies. In this paper, we propose GASSER which poses tailored perturbation on the specific frequencies of graph structures in spectral domain, and the edge perturbation is selectively guided by the spectral hints. As shown by extensive experiments and theoretical analysis, the augmentation views are adaptive and controllable, as well as heuristically fitting the homophily ratios and spectrum of graph structures.
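
GASSER itself perturbs edges guided by spectral hints, and its exact rule is not given in the abstract; the sketch below shows only the generic mechanism of spectrum-selective structure perturbation: eigendecompose the normalized Laplacian, perturb the eigenvalues in a chosen frequency band, and rebuild a weighted augmentation view. The band, scale, and random graph are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_band_perturbation(A, band=(0.7, 1.0), scale=0.1):
    """Perturb only a chosen frequency band of a symmetric adjacency matrix.

    Eigenvectors of the normalized Laplacian are ordered from low frequency
    (smooth over the graph) to high frequency; `band` selects the fraction of
    the spectrum to perturb, e.g. (0.7, 1.0) hits the top 30% (high-frequency)
    components while leaving the rest untouched."""
    deg = A.sum(1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)                  # ascending frequency order
    lo, hi = (int(b * len(evals)) for b in band)
    evals[lo:hi] += scale * rng.normal(size=hi - lo)  # tailored band perturbation
    L_aug = (evecs * evals) @ evecs.T
    # Map back to a (dense, weighted) adjacency view for contrastive learning.
    return np.sqrt(deg)[:, None] * (np.eye(len(A)) - L_aug) * np.sqrt(deg)[None, :]

A = (rng.random((20, 20)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                        # random undirected graph
A_view = spectral_band_perturbation(A)
```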

Fast hyperboloid decision tree algorithms

  • paper_url: http://arxiv.org/abs/2310.13841
  • repo_url: None
  • paper_authors: Philippe Chlenski, Ethan Turok, Antonio Moretti, Itsik Pe’er
  • for: This paper extends decision tree algorithms to hyperbolic geometry, addressing the computational challenges hyperbolic classifiers typically face in machine learning.
  • methods: Adapts Euclidean decision tree algorithms to hyperbolic space by leveraging inner products, eliminating the need for computationally intensive Riemannian optimization and numerically unstable exponential and logarithmic maps.
  • results: Extensive benchmarking across diverse datasets shows that the method provides a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis.
    Abstract Hyperbolic geometry is gaining traction in machine learning for its effectiveness at capturing hierarchical structures in real-world data. Hyperbolic spaces, where neighborhoods grow exponentially, offer substantial advantages and consistently deliver state-of-the-art results across diverse applications. However, hyperbolic classifiers often grapple with computational challenges. Methods reliant on Riemannian optimization frequently exhibit sluggishness, stemming from the increased computational demands of operations on Riemannian manifolds. In response to these challenges, we present hyperDT, a novel extension of decision tree algorithms into hyperbolic space. Crucially, hyperDT eliminates the need for computationally intensive Riemannian optimization, numerically unstable exponential and logarithmic maps, or pairwise comparisons between points by leveraging inner products to adapt Euclidean decision tree algorithms to hyperbolic space. Our approach is conceptually straightforward and maintains constant-time decision complexity while mitigating the scalability issues inherent in high-dimensional Euclidean spaces. Building upon hyperDT we introduce hyperRF, a hyperbolic random forest model. Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis.
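
A sketch of the inner-product trick on the hyperboloid (Lorentz) model: points satisfy <x, x>_L = -1 with positive first coordinate, and a candidate split is a homogeneous hyperplane tested with a single Minkowski inner product per point, with no exp/log maps or pairwise distances. How hyperDT actually chooses the split direction w at each tree node is omitted; the data and w below are illustrative.

```python
import numpy as np

def minkowski_ip(u, v):
    """Lorentzian inner product <u, v>_L = -u0*v0 + u1*v1 + ... + ud*vd."""
    return -u[..., 0] * v[..., 0] + (u[..., 1:] * v[..., 1:]).sum(-1)

def lift_to_hyperboloid(x):
    """Lift Euclidean coordinates x in R^d onto the hyperboloid in R^{d+1},
    choosing x0 so that <x, x>_L = -1."""
    x0 = np.sqrt(1.0 + (x ** 2).sum(-1, keepdims=True))
    return np.concatenate([x0, x], axis=-1)

def split(points, w):
    """Constant-time decision per point: which side of the geodesic
    hyperplane {x : <x, w>_L = 0} does each point fall on?"""
    return minkowski_ip(points, w) > 0

rng = np.random.default_rng(0)
X = lift_to_hyperboloid(rng.normal(size=(100, 2)))  # 100 points on the 2D hyperboloid
w = np.array([0.0, 1.0, 0.0])                       # candidate split direction (illustrative)
left_mask = split(X, w)                             # boolean partition, one inner product each
```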

Universal Representation of Permutation-Invariant Functions on Vectors and Tensors

  • paper_url: http://arxiv.org/abs/2310.13829
  • repo_url: None
  • paper_authors: Puoya Tabaghi, Yusu Wang
  • for: The main object of study is multiset functions, i.e., permutation-invariant functions over inputs of varying sizes.
  • methods: Builds on the Deep Sets sum-decomposable model, which provides a universal representation for continuous multiset functions on scalars and a universal approximation for multisets of $D$-dimensional vectors requiring a latent space dimension of $O(N^D)$.
  • results: Proves that a latent dimension of $O(N^D)$ guarantees universal representation for both continuous and discontinuous multiset functions; for identifiable multisets (e.g., finite-precision vectors), shows that a sum-decomposable model for general continuous multiset functions only requires a latent dimension of $2DN$, with both encoder and decoder continuous, and extends the results to permutation-invariant tensor functions via special sum-decomposition structures.
    Abstract A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes. Deep Sets, proposed by Zaheer et al. (2017), provides a universal representation for continuous multiset functions on scalars via a sum-decomposable model. Restricting the domain of the functions to finite multisets of $D$-dimensional vectors, Deep Sets also provides a universal approximation that requires a latent space dimension of $O(N^D)$ -- where $N$ is an upper bound on the size of input multisets. In this paper, we strengthen this result by proving that universal representation is guaranteed for continuous and discontinuous multiset functions through a latent space dimension of $O(N^D)$. We then introduce identifiable multisets, for which we can uniquely label their elements using an identifier function; namely, finite-precision vectors are identifiable. Using our analysis of identifiable multisets, we prove that a sum-decomposable model for general continuous multiset functions only requires a latent dimension of $2DN$. We further show that both the encoder and decoder functions of the model are continuous -- our main contribution over existing work, which lacks such a guarantee. This also provides a significant improvement over the aforementioned $O(N^D)$ bound, which was derived for universal representation of continuous and discontinuous multiset functions. We then extend our results and provide special sum-decomposition structures to universally represent permutation-invariant tensor functions on identifiable tensors. These families of sum-decomposition models enable us to design deep network architectures and deploy them on a variety of learning tasks on sequences, images, and graphs.
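
The sum-decomposable form discussed here is f(X) = rho(sum over x in X of phi(x)); a minimal PyTorch sketch follows, with the latent dimension exposed as a parameter (the paper's result is that $2DN$ suffices for continuous multiset functions on identifiable multisets). The layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """Sum-decomposable multiset model: f(X) = rho(sum_i phi(x_i)).

    Summing phi(x_i) makes f permutation-invariant by construction and
    handles multisets of varying sizes; `latent_dim` is the dimension whose
    required size the paper bounds."""
    def __init__(self, in_dim=3, latent_dim=64, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
        self.rho = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):             # x: (batch, set_size, in_dim)
        return self.rho(self.phi(x).sum(dim=1))

model = DeepSets()
X = torch.randn(4, 10, 3)             # 4 multisets of 10 vectors in R^3
perm = torch.randperm(10)
assert torch.allclose(model(X), model(X[:, perm]), atol=1e-5)  # permutation-invariant
```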

Adversarial Attacks on Fairness of Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.13822
  • repo_url: https://github.com/zhangbinchi/g-fairattack
  • paper_authors: Binchi Zhang, Yushun Dong, Chen Chen, Yada Zhu, Minnan Luo, Jundong Li
  • for: This paper investigates adversarial attacks on the fairness of graph neural networks (GNNs), covering various types of fairness-aware GNNs.
  • methods: Proposes G-FairAttack, a general framework for attacking the fairness of GNNs with an unnoticeable effect on prediction utility, together with a fast computation technique that reduces the attack's time complexity.
  • results: Experiments show that G-FairAttack successfully corrupts the fairness of different kinds of GNNs while keeping the attack unnoticeable, exposing potential vulnerabilities of fairness-aware GNNs and motivating further research on the fairness robustness of GNNs.
    Abstract Fairness-aware graph neural networks (GNNs) have gained a surge of attention as they can reduce the bias of predictions on any demographic group (e.g., female) in graph-based applications. Although these methods greatly improve the algorithmic fairness of GNNs, the fairness can be easily corrupted by carefully designed adversarial attacks. In this paper, we investigate the problem of adversarial attacks on fairness of GNNs and propose G-FairAttack, a general framework for attacking various types of fairness-aware GNNs in terms of fairness with an unnoticeable effect on prediction utility. In addition, we propose a fast computation technique to reduce the time complexity of G-FairAttack. The experimental study demonstrates that G-FairAttack successfully corrupts the fairness of different types of GNNs while keeping the attack unnoticeable. Our study on fairness attacks sheds light on potential vulnerabilities in fairness-aware GNNs and guides further research on the robustness of GNNs in terms of fairness. The open-source code is available at https://github.com/zhangbinchi/G-FairAttack.
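
The abstract does not pin down which fairness notion is attacked; group fairness metrics such as statistical (demographic) parity and equal opportunity are the usual targets in fair GNN work, and the attack then seeks structure perturbations that widen these gaps while leaving accuracy essentially unchanged. A minimal sketch of the metrics themselves, on placeholder predictions:

```python
import numpy as np

def fairness_gaps(y_pred, y_true, group):
    """Group fairness metrics commonly targeted in fair GNN work.

    statistical parity:  |P(yhat=1 | g=0) - P(yhat=1 | g=1)|
    equal opportunity:   |P(yhat=1 | y=1, g=0) - P(yhat=1 | y=1, g=1)|"""
    g0, g1 = group == 0, group == 1
    sp = abs(y_pred[g0].mean() - y_pred[g1].mean())
    eo = abs(y_pred[g0 & (y_true == 1)].mean() - y_pred[g1 & (y_true == 1)].mean())
    return sp, eo

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)    # sensitive attribute
y_pred = rng.integers(0, 2, 1000)   # stand-in for GNN predictions
print(fairness_gaps(y_pred, y_true, group))
```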

Geometric Learning with Positively Decomposable Kernels

  • paper_url: http://arxiv.org/abs/2310.13821
  • repo_url: None
  • paper_authors: Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said
  • for: This paper targets machine learning on non-Euclidean data spaces, where positive-definite kernels are difficult to come by.
  • methods: Proposes methods based on reproducing kernel Krein spaces (RKKS), which require only kernels that admit a positive decomposition, without needing access to the decomposition itself.
  • results: Shows that invariant kernels on homogeneous spaces admit a positive decomposition under tractable regularity assumptions, making such kernels much easier to construct than positive-definite ones and providing theoretical foundations for RKKS-based methods in general.
    Abstract Kernel methods are powerful tools in machine learning. Classical kernel methods are based on positive-definite kernels, which map data spaces into reproducing kernel Hilbert spaces (RKHS). For non-Euclidean data spaces, positive-definite kernels are difficult to come by. In this case, we propose the use of reproducing kernel Krein space (RKKS) based methods, which require only kernels that admit a positive decomposition. We show that one does not need to access this decomposition in order to learn in RKKS. We then investigate the conditions under which a kernel is positively decomposable. We show that invariant kernels admit a positive decomposition on homogeneous spaces under tractable regularity assumptions. This makes them much easier to construct than positive-definite kernels, providing a route for learning with kernels for non-Euclidean data. By the same token, this provides theoretical foundations for RKKS-based methods in general.
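
For reference, the positive decomposition assumed here means the kernel splits into a difference of two positive-definite kernels,

$$k(x, y) = k_+(x, y) - k_-(x, y), \qquad k_\pm \ \text{positive definite},$$

and the reproducing kernel Krein space is built from the pair of RKHSs associated with $k_+$ and $k_-$; the paper's point is that learning in the RKKS does not require access to this decomposition.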

A Better Match for Drivers and Riders: Reinforcement Learning at Lyft

  • paper_url: http://arxiv.org/abs/2310.13810
  • repo_url: None
  • paper_authors: Xabi Azagirre, Akshay Balwally, Guillaume Candeli, Nicholas Chamandy, Benjamin Han, Alona King, Hyungjun Lee, Martin Loncaric, Sébastien Martin, Vijay Narasiman, Zhiwei, Qin, Baptiste Richard, Sara Smoot, Sean Taylor, Garrett van Ryzin, Di Wu, Fei Yu, Alex Zamoshchin
  • for: To better match riders and drivers in ridesharing by accounting for drivers' estimated future earnings.
  • methods: Uses a novel online reinforcement learning approach that estimates drivers' future earnings in real time and uses this information to find more efficient matches.
  • results: The revised matching algorithm enabled drivers to serve millions of additional riders each year, generating more than $30 million per year in incremental revenue; Lyft rolled it out globally in 2021.
    Abstract To better match drivers to riders in our ridesharing application, we revised Lyft's core matching algorithm. We use a novel online reinforcement learning approach that estimates the future earnings of drivers in real time and use this information to find more efficient matches. This change was the first documented implementation of a ridesharing matching algorithm that can learn and improve in real time. We evaluated the new approach during weeks of switchback experimentation in most Lyft markets, and estimated how it benefited drivers, riders, and the platform. In particular, it enabled our drivers to serve millions of additional riders each year, leading to more than $30 million per year in incremental revenue. Lyft rolled out the algorithm globally in 2021.
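
A toy sketch of the core idea: instead of matching drivers to riders on immediate cost alone (e.g., pickup time), add each pairing's estimated effect on the driver's future earnings, then solve the assignment problem. The cost and value matrices below are random placeholders; in the paper the value estimates come from an online reinforcement learning estimator.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_drivers, n_riders = 5, 5

pickup_cost = rng.uniform(1, 10, size=(n_drivers, n_riders))  # e.g. ETA in minutes
# Estimated change in the driver's future earnings for each possible pairing
# (produced by the RL value estimator in the paper; random here).
future_value = rng.uniform(0, 5, size=(n_drivers, n_riders))

# Myopic matching: minimize immediate pickup cost only.
myopic = linear_sum_assignment(pickup_cost)

# RL-informed matching: trade off immediate cost against estimated future value.
rl_informed = linear_sum_assignment(pickup_cost - future_value)

print("myopic pairs:     ", list(zip(*myopic)))
print("RL-informed pairs:", list(zip(*rl_informed)))
```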

Learning to (Learn at Test Time)

  • paper_url: http://arxiv.org/abs/2310.13807
  • repo_url: https://github.com/test-time-training/mttt
  • paper_authors: Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
  • for: This paper reformulates supervised learning as learning to learn with two nested loops: the inner loop learns on each individual instance with self-supervision before the final prediction, and the outer loop learns the self-supervised task used by the inner loop so that the final prediction improves.
  • methods: The inner loop is equivalent to linear attention when the inner-loop learner is a linear model, and to self-attention when it is a kernel estimator; replacing these layers in a transformer with inner loops makes the outer loop equivalent to training the architecture.
  • results: With neural-network inner-loop learners, the approach vastly outperforms transformers with linear attention on ImageNet from 224x224 raw pixels in both accuracy and FLOPs, a regime where regular transformers cannot run.
    Abstract We reformulate the problem of supervised learning as learning to learn with two nested loops (i.e. learning problems). The inner loop learns on each individual instance with self-supervision before final prediction. The outer loop learns the self-supervised task used by the inner loop, such that its final prediction improves. Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator. For practical comparison with linear or self-attention layers, we replace each of them in a transformer with an inner loop, so our outer loop is equivalent to training the architecture. When each inner-loop learner is a neural network, our approach vastly outperforms transformers with linear attention on ImageNet from 224 x 224 raw pixels in both accuracy and FLOPs, while (regular) transformers cannot run.

Comparative Analysis of Machine Learning Algorithms for Solar Irradiance Forecasting in Smart Grids

  • paper_url: http://arxiv.org/abs/2310.13791
  • repo_url: None
  • paper_authors: Saman Soleymani, Shima Mohammadzadeh
  • for: To forecast solar irradiance in order to optimize the utilization of photovoltaic (PV) systems in smart grids and homes.
  • methods: Compares next-generation machine learning algorithms, including random forests, Extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (LightGBM) ensembles, CatBoost, and multilayer perceptron artificial neural networks (MLP-ANNs), with Bayesian optimization applied to hyperparameter tuning.
  • results: The performance of the MLP-ANNs improves when feature selection is applied, and the random forest outperforms the other learning algorithms.
    Abstract The increasing global demand for clean and environmentally friendly energy resources has caused increased interest in harnessing solar power through photovoltaic (PV) systems for smart grids and homes. However, the inherent unpredictability of PV generation poses problems associated with smart grid planning and management, energy trading and market participation, demand response, reliability, etc. Therefore, solar irradiance forecasting is essential for optimizing PV system utilization. This study proposes the next-generation machine learning algorithms such as random forests, Extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (lightGBM) ensemble, CatBoost, and Multilayer Perceptron Artificial Neural Networks (MLP-ANNs) to forecast solar irradiance. Besides, Bayesian optimization is applied to hyperparameter tuning. Unlike tree-based ensemble algorithms that select the features intrinsically, MLP-ANN needs feature selection as a separate step. The simulation results indicate that the performance of the MLP-ANNs improves when feature selection is applied. Besides, the random forest outperforms the other learning algorithms.

Graph AI in Medicine

  • paper_url: http://arxiv.org/abs/2310.13767
  • repo_url: None
  • paper_authors: Ruth Johnson, Michelle M. Li, Ayush Noori, Owen Queen, Marinka Zitnik
  • for: This paper surveys graph representation learning in clinical AI, particularly how graph neural networks (GNNs) capture intricate relationships within structured clinical data.
  • methods: GNNs process diverse clinical data holistically by viewing modalities as nodes interconnected by their relationships; knowledge graphs align model-driven insights with medical knowledge to enhance interpretability.
  • results: Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters or with minimal retraining; human-centered design and model interpretability nevertheless remain indispensable for clinical decision-making.
    Abstract In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks (GNNs), stands out for its capability to capture intricate relationships within structured clinical datasets. With diverse data -- from patient records to imaging -- GNNs process data holistically by viewing modalities as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters or minimal re-training. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on graph relationships, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph models integrate diverse data modalities through pre-training, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way to clinically meaningful predictions.

Learning Interatomic Potentials at Multiple Scales

  • paper_url: http://arxiv.org/abs/2310.13756
  • repo_url: None
  • paper_authors: Xiang Fu, Albert Musaelian, Anders Johansson, Tommi Jaakkola, Boris Kozinsky
  • for: To speed up molecular dynamics (MD) simulations driven by machine learning interatomic potentials (MLIPs).
  • methods: Uses a multiple-time-step (MTS) integrator that evaluates slowly varying potential energy terms less frequently. Since MLIPs lack the term-by-term scale separation that classical potentials' simple analytic forms provide, the paper learns one by co-training a small, efficient model on short-time-scale interactions and a large, expressive model on the remaining interactions.
  • results: Compared to a conventionally trained MLIP, the approach achieves a significant speedup (~3x in the experiments) without a loss of accuracy on the potential energy or simulation-derived quantities.
    Abstract The need to use a short time step is a key limit on the speed of molecular dynamics (MD) simulations. Simulations governed by classical potentials are often accelerated by using a multiple-time-step (MTS) integrator that evaluates certain potential energy terms that vary more slowly than others less frequently. This approach is enabled by the simple but limiting analytic forms of classical potentials. Machine learning interatomic potentials (MLIPs), in particular recent equivariant neural networks, are much more broadly applicable than classical potentials and can faithfully reproduce the expensive but accurate reference electronic structure calculations used to train them. They still, however, require the use of a single short time step, as they lack the inherent term-by-term scale separation of classical potentials. This work introduces a method to learn a scale separation in complex interatomic interactions by co-training two MLIPs. Initially, a small and efficient model is trained to reproduce short-time-scale interactions. Subsequently, a large and expressive model is trained jointly to capture the remaining interactions not captured by the small model. When running MD, the MTS integrator then evaluates the smaller model for every time step and the larger model less frequently, accelerating simulation. Compared to a conventionally trained MLIP, our approach can achieve a significant speedup (~3x in our experiments) without a loss of accuracy on the potential energy or simulation-derived quantities.
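
The integration mechanism is the classic r-RESPA pattern: a cheap "fast" force every inner step, an expensive "slow" force only every k steps. Below, the two co-trained MLIPs are stood in for by two analytic force functions; the co-training procedure itself is not shown, and all constants are illustrative.

```python
import numpy as np

def mts_step(x, v, f_fast, f_slow, dt, k):
    """One outer step of a multiple-time-step (r-RESPA style) integrator.

    `f_slow` (the large, expressive model) is evaluated once per outer step
    of length k*dt; `f_fast` (the small model) is evaluated every inner step
    of length dt via velocity Verlet."""
    v = v + 0.5 * (k * dt) * f_slow(x)        # half kick from the slow force
    for _ in range(k):                        # k inner steps with the fast force
        v = v + 0.5 * dt * f_fast(x)
        x = x + dt * v
        v = v + 0.5 * dt * f_fast(x)
    v = v + 0.5 * (k * dt) * f_slow(x)        # closing half kick
    return x, v

# Toy stand-ins for the co-trained MLIPs: a stiff short-range force (fast)
# and a soft slowly varying force (slow).
f_fast = lambda x: -100.0 * x
f_slow = lambda x: -0.1 * np.sin(x)

x, v = np.ones(3), np.zeros(3)
for _ in range(1000):
    x, v = mts_step(x, v, f_fast, f_slow, dt=1e-3, k=4)  # slow model hit 4x less often
```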

FairBranch: Fairness Conflict Correction on Task-group Branches for Fair Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2310.13746
  • repo_url: https://github.com/arjunroyihrpa/fairbranch-open-source-intelligence
  • paper_authors: Arjun Roy, Christos Koutlis, Symeon Papadopoulos, Eirini Ntoutsi
  • for: To improve both the fairness and the accuracy of multi-task learning (MTL) models.
  • methods: Proposes FairBranch, which branches the MTL model by assessing the similarity of learned parameters, grouping related tasks to mitigate negative transfer, and corrects fairness loss gradient conflicts between adjoining task-group branches to address bias transfer.
  • results: On tabular and visual MTL problems, FairBranch surpasses state-of-the-art MTL methods in terms of both fairness and accuracy.
    Abstract The generalization capacity of Multi-Task Learning (MTL) becomes limited when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients, resulting in negative transfer and a reduction in MTL accuracy compared to single-task learning (STL). Recently, there has been an increasing focus on the fairness of MTL models, necessitating the optimization of both accuracy and fairness for individual tasks. Similarly to how negative transfer affects accuracy, task-specific fairness considerations can adversely influence the fairness of other tasks when there is a conflict of fairness loss gradients among jointly learned tasks, termed bias transfer. To address both negative and bias transfer in MTL, we introduce a novel method called FairBranch. FairBranch branches the MTL model by assessing the similarity of learned parameters, grouping related tasks to mitigate negative transfer. Additionally, it incorporates fairness loss gradient conflict correction between adjoining task-group branches to address bias transfer within these task groups. Our experiments in tabular and visual MTL problems demonstrate that FairBranch surpasses state-of-the-art MTL methods in terms of both fairness and accuracy.
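
The abstract does not spell out the correction rule; a standard way to resolve conflicting gradients (here, the fairness loss gradients of adjoining task-group branches) is a PCGrad-style projection: when two gradients have negative inner product, project one onto the normal plane of the other. A minimal sketch of that generic idea, with the caveat that FairBranch's exact rule may differ:

```python
import numpy as np

def correct_conflict(g_a, g_b):
    """If the fairness gradients of two adjoining task-group branches
    conflict (negative inner product), project g_a onto the normal plane
    of g_b so the update no longer undoes branch b's fairness progress."""
    dot = g_a @ g_b
    if dot < 0:
        g_a = g_a - (dot / (g_b @ g_b)) * g_b
    return g_a

g_branch1 = np.array([1.0, -2.0, 0.5])   # fairness gradient, branch 1 (illustrative)
g_branch2 = np.array([-1.0, 1.0, 0.0])   # fairness gradient, branch 2

g1_fixed = correct_conflict(g_branch1, g_branch2)
assert g1_fixed @ g_branch2 >= -1e-12     # conflict removed
```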

CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages

  • paper_url: http://arxiv.org/abs/2310.13683
  • repo_url: https://github.com/hiaac-nlp/capivara
  • paper_authors: Gabriel Oliveira dos Santos, Diego A. B. Moreira, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini, Sandra Avila
  • for: To improve the performance of multilingual CLIP models in low-resource languages.
  • methods: Uses image captioning and machine translation to generate multiple synthetic captions in low-resource languages, and optimizes the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost.
  • results: Achieves state-of-the-art performance in zero-shot tasks involving images and Portuguese texts, and shows potential for significant improvements in other low-resource languages after fine-tuning the pre-trained multilingual CLIP with CAPIVARA on a single GPU for 2 hours.
    Abstract This work introduces CAPIVARA, a cost-efficient framework designed to enhance the performance of multilingual CLIP models in low-resource languages. While CLIP has excelled in zero-shot vision-language tasks, the resource-intensive nature of model training remains challenging. Many datasets lack linguistic diversity, featuring solely English descriptions for images. CAPIVARA addresses this by augmenting text data using image captioning and machine translation to generate multiple synthetic captions in low-resource languages. We optimize the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost. Through extensive experiments, CAPIVARA emerges as state of the art in zero-shot tasks involving images and Portuguese texts. We show the potential for significant improvements in other low-resource languages, achieved by fine-tuning the pre-trained multilingual CLIP using CAPIVARA on a single GPU for 2 hours. Our model and code is available at https://github.com/hiaac-nlp/CAPIVARA.

RealFM: A Realistic Mechanism to Incentivize Data Contribution and Device Participation

  • paper_url: http://arxiv.org/abs/2310.13681
  • repo_url: None
  • paper_authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang
  • for: To bring federated learning (FL) to realistic settings and address the free-rider problem that arises in existing frameworks.
  • methods: Proposes RealFM, the first truly federated mechanism that (1) realistically models device utility, (2) incentivizes data contribution and device participation, and (3) provably removes the free-rider phenomenon. RealFM requires no data sharing and allows a non-linear relationship between model accuracy and utility, improving the utility gained by the server and participating devices over non-participating devices and devices in other FL mechanisms.
  • results: On real-world data, RealFM improves device and server utility, as well as data contribution, by up to 3 orders of magnitude and 7x respectively compared to baseline mechanisms.
    Abstract Edge device participation in federating learning (FL) has been typically studied under the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in real-world settings, with many encountering the free-rider problem. In a step to push FL towards realistic settings, we propose RealFM: the first truly federated mechanism which (1) realistically models device utility, (2) incentivizes data contribution and device participation, and (3) provably removes the free-rider phenomena. RealFM does not require data sharing and allows for a non-linear relationship between model accuracy and utility, which improves the utility gained by the server and participating devices compared to non-participating devices as well as devices participating in other FL mechanisms. On real-world data, RealFM improves device and server utility, as well as data contribution, by up to 3 magnitudes and 7x respectively compared to baseline mechanisms.

Optimal Transport for Measures with Noisy Tree Metric

  • paper_url: http://arxiv.org/abs/2310.13653
  • repo_url: None
  • paper_authors: Tam Le, Truyen Nguyen, Kenji Fukumizu
  • for: This paper studies the optimal transport (OT) problem between probability measures supported on a tree metric space.
  • methods: Follows the max-min robust OT approach, which considers the maximal possible distance between two input measures over an uncertainty set of tree metrics, to mitigate the effect of noisy or adversarial measurements.
  • results: Proposes novel uncertainty sets of tree metrics based on edge deletion/addition and, by exploiting the tree structure over supports, obtains a closed-form expression for fast computation. Further shows that the max-min robust OT satisfies the metric property and is negative definite, yielding positive definite kernels that are tested on document classification and topological data analysis with noisy tree metrics.
    Abstract We study the optimal transport (OT) problem for probability measures supported on a tree metric space. It is known that this OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but depends fundamentally on the underlying tree structure over the supports of the input measures. In practice, however, the given tree structure may be perturbed due to noisy or adversarial measurements. In order to mitigate this issue, we follow the max-min robust OT approach, which considers the maximal possible distance between two input measures over an uncertainty set of tree metrics. In general, this approach is hard to compute, even for measures supported in $1$-dimensional space, due to its non-convexity and non-smoothness, which hinder its practical applications, especially in large-scale settings. In this work, we propose novel uncertainty sets of tree metrics from the lens of edge deletion/addition, which cover a diversity of tree structures in an elegant framework. Consequently, by building upon the proposed uncertainty sets and leveraging the tree structure over supports, we show that the max-min robust OT also admits a closed-form expression for fast computation, like its standard OT counterpart (i.e., TW). Furthermore, we demonstrate that the max-min robust OT satisfies the metric property and is negative definite. We then exploit its negative definiteness to propose positive definite kernels and test them in several simulations on various real-world datasets for document classification and topological data analysis on measures with noisy tree metrics.
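
For context, the closed form referred to here is the standard tree-Wasserstein distance: $\mathrm{TW}(\mu,\nu)=\sum_e w_e\,|\mu(\Gamma_e)-\nu(\Gamma_e)|$, where $w_e$ is the weight of edge $e$ and $\Gamma_e$ is the set of nodes on the far side of $e$. A small sketch of that formula follows; the paper's contribution, the max-min robust version over uncertainty sets of tree metrics, is not shown, and the tiny tree is illustrative.

```python
import numpy as np

def tree_wasserstein(children, weights, mu, nu, root=0):
    """Closed-form tree-Wasserstein distance.

    `children[v]` lists the children of node v; `weights[v]` is the weight of
    the edge from v to its parent (ignored for the root). Each edge
    contributes w_e * |mu(subtree) - nu(subtree)|."""
    total = 0.0
    def subtree_mass(v):
        nonlocal total
        m = mu[v] - nu[v]
        for c in children.get(v, []):
            m += subtree_mass(c)
        if v != root:
            total += weights[v] * abs(m)
        return m
    subtree_mass(root)
    return total

# Tiny tree:      0
#               /   \
#              1     2
#             / \
#            3   4
children = {0: [1, 2], 1: [3, 4]}
weights = {1: 1.0, 2: 2.0, 3: 0.5, 4: 0.5}
mu = np.array([0.0, 0.0, 0.5, 0.5, 0.0])
nu = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(tree_wasserstein(children, weights, mu, nu))  # 2.25
```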

Analyzing the contribution of different passively collected data to predict Stress and Depression

  • paper_url: http://arxiv.org/abs/2310.13607
  • repo_url: None
  • paper_authors: Irene Bonafonte, Cristina Bustos, Abraham Larrazolo, Gilberto Lorenzo Martinez Luna, Adolfo Guzman Arenas, Xavier Baro, Isaac Tourgeman, Mercedes Balcells, Agata Lapedriza
  • for: This paper aims to assess mental health from passively collected data.
  • methods: Uses different types of passively collected sensor data (WiFi, GPS, social interaction, phone logs, physical activity, audio, and academic features) to predict daily self-reported stress and PHQ-9 depression scores, computing 125 mid-level features from the raw data and comparing neural network models trained on all features against models trained on specific feature groups.
  • results: WiFi features (which encode mobility patterns) and phone log features (which correlate with sleep patterns) provide significant information for stress and depression prediction.
    Abstract The possibility of recognizing diverse aspects of human behavior and environmental context from passively captured data motivates its use for mental health assessment. In this paper, we analyze the contribution of different passively collected sensor data types (WiFi, GPS, Social interaction, Phone Log, Physical Activity, Audio, and Academic features) to predict daily selfreport stress and PHQ-9 depression score. First, we compute 125 mid-level features from the original raw data. These 125 features include groups of features from the different sensor data types. Then, we evaluate the contribution of each feature type by comparing the performance of Neural Network models trained with all features against Neural Network models trained with specific feature groups. Our results show that WiFi features (which encode mobility patterns) and Phone Log features (which encode information correlated with sleep patterns), provide significative information for stress and depression prediction.

Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space

  • paper_url: http://arxiv.org/abs/2310.13572
  • repo_url: https://github.com/yufei-gu-451/double_descent_inference
  • paper_authors: Yufei Gu, Xiaoqing Zheng, Tomaso Aste
  • for: This paper studies the double descent phenomenon in deep learning and how it relates to noisy training data.
  • methods: Conducts a comprehensive analysis of the feature space of learned representations across various models and tasks to probe when double descent arises.
  • results: Double descent arises in imperfect models trained with noisy data: the model first learns the noisy data until interpolation, then over-parameterization adds implicit regularization that separates the information from the noise. The authors postulate that double descent should never occur in well-regularized models.
    Abstract Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, an accepted theory to account for its occurrence in deep learning remains yet to be established. In this study, we revisit the phenomenon of double descent and demonstrate that its occurrence is strongly influenced by the presence of noisy data. Through conducting a comprehensive analysis of the feature space of learned representations, we unveil that double descent arises in imperfect models trained with noisy data. We argue that double descent is a consequence of the model first learning the noisy data until interpolation and then adding implicit regularization via over-parameterization acquiring therefore capability to separate the information from the noise. We postulate that double descent should never occur in well-regularized models.

On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery

  • paper_url: http://arxiv.org/abs/2310.13553
  • repo_url: None
  • paper_authors: Fateme Jamshidi, Luca Ganassali, Negar Kiyavash
  • for: This paper proposes a nonparametric Von Mises estimator, built on a kernel density estimator, for the entropy of multivariate distributions, motivated by conditional independence testing in constraint-based causal discovery.
  • methods: Establishes an exponential concentration inequality for the estimator and designs a conditional independence (CI) test based on it, called VM-CI, which achieves optimal parametric rates under smoothness assumptions.
  • results: The concentration result yields a tight upper bound on VM-CI's overall error, which characterizes the sample complexity of any constraint-based causal discovery algorithm using VM-CI for CI tests; to the authors' knowledge, this is the first sample complexity guarantee for causal discovery with continuous variables. Empirically, VM-CI outperforms other popular CI tests in time and/or sample complexity, which translates to better structure learning.
    Abstract Motivated by conditional independence testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric Von Mises estimator for the entropy of multivariate distributions built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. We design a test for conditional independence (CI) based on our estimator, called VM-CI, which achieves optimal parametric rates under smoothness assumptions. Leveraging the exponential concentration, we prove a tight upper bound for the overall error of VM-CI. This, in turn, allows us to characterize the sample complexity of any constraint-based causal discovery algorithm that uses VM-CI for CI tests. To the best of our knowledge, this is the first sample complexity guarantee for causal discovery for continuous variables. Furthermore, we empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity (or both), which translates to a better performance in structure learning as well.
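
For intuition, a plug-in kernel density entropy estimate of the sort the Von Mises estimator builds on looks like the sketch below. The Von Mises construction adds a correction term and enjoys the exponential concentration proved in the paper; only the naive resubstitution estimate is shown, on illustrative Gaussian data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 2000))      # 2-D samples; scipy expects shape (dim, n)

# Resubstitution (plug-in) entropy estimate: H ~ -(1/n) * sum(log p_hat(x_i)),
# with p_hat a kernel density estimator fit on the same sample.
kde = gaussian_kde(X)
H_hat = -np.mean(np.log(kde(X)))

# True differential entropy of a 2-D standard normal: (d/2) * log(2*pi*e).
H_true = 0.5 * 2 * np.log(2 * np.pi * np.e)
print(f"estimate {H_hat:.3f} vs truth {H_true:.3f}")
```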

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

  • paper_url: http://arxiv.org/abs/2310.13550
  • repo_url: None
  • paper_authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
  • for: Multi-task reinforcement learning (RL) under Markov decision processes (MDPs) is known to gain sample efficiency from shared latent structure; this paper asks whether the benefit extends to more general sequential decision making problems such as partially observable MDPs (POMDPs) and general predictive state representations (PSRs).
  • methods: Posits a joint model class for tasks and uses the $\eta$-bracketing number to quantify its complexity; this number also measures task similarity and hence determines the benefit of multi-task over single-task RL.
  • results: Proposes a provably efficient algorithm, UMT-PSR, for finding near-optimal policies for all PSRs, and shows the multi-task advantage manifests when the joint model class has a smaller $\eta$-bracketing number than individual single-task learning; provides example multi-task PSRs with small $\eta$-bracketing numbers, and develops a sample-efficient downstream algorithm that provably finds a near-optimal policy for a new target task sharing commonalities with the upstream tasks.
    Abstract In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures among multiple MDPs has been shown to yield significant benefits to the sample efficiency compared to single-task RL. In this paper, we investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs). The main challenge here is that the large and complex model space makes it hard to identify what types of common latent structure of multi-task PSRs can reduce the model complexity and improve sample efficiency. To this end, we posit a joint model class for tasks and use the notion of $\eta$-bracketing number to quantify its complexity; this number also serves as a general metric to capture the similarity of tasks and thus determines the benefit of multi-task over single-task RL. We first study upstream multi-task learning over PSRs, in which all tasks share the same observation and action spaces. We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs has a smaller $\eta$-bracketing number compared to that of individual single-task learning. We also provide several example multi-task PSRs with small $\eta$-bracketing numbers, which reap the benefits of multi-task learning. We further investigate downstream learning, in which the agent needs to learn a new target task that shares some commonalities with the upstream tasks via a similarity constraint. By exploiting the learned PSRs from the upstream, we develop a sample-efficient algorithm that provably finds a near-optimal policy.

Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification

  • paper_url: http://arxiv.org/abs/2310.13490
  • repo_url: None
  • paper_authors: Mateus Roder, Leandro Aparecido Passos, João Paulo Papa, André Luis Debiaso Rossi
  • for: This paper proposes a viable machine learning approach to wood board quality classification in the sawmill industry, offering an affordable alternative to assessment by human operators.
  • methods: Trains artificial neural networks (ANNs) while simultaneously tuning their hyperparameters and selecting the feature subset that best describes wood board quality.
  • results: Predictive performance varies considerably across feature sets and hyperparameter configurations; the best balanced accuracy (0.80) was achieved either by performing feature selection alone or by performing both tasks concomitantly.
    Abstract Quality classification of wood boards is an essential task in the sawmill industry, which is still usually performed by human operators in small to median companies in developing countries. Machine learning algorithms have been successfully employed to investigate the problem, offering a more affordable alternative compared to other solutions. However, such approaches usually present some drawbacks regarding the proper selection of their hyperparameters. Moreover, the models are susceptible to the features extracted from wood board images, which influence the induction of the model and, consequently, its generalization power. Therefore, in this paper, we investigate the problem of simultaneously tuning the hyperparameters of an artificial neural network (ANN) as well as selecting a subset of characteristics that better describes the wood board quality. Experiments were conducted over a private dataset composed of images obtained from a sawmill industry and described using different feature descriptors. The predictive performance of the model was compared against five baseline methods as well as a random search, performing either ANN hyperparameter tuning and feature selection. Experimental results suggest that hyperparameters should be adjusted according to the feature set, or the features should be selected considering the hyperparameter values. In summary, the best predictive performance, i.e., a balanced accuracy of $0.80$, was achieved in two distinct scenarios: (i) performing only feature selection, and (ii) performing both tasks concomitantly. Thus, we suggest that at least one of the two approaches should be considered in the context of industrial applications.
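
The paper's point, tuning hyperparameters and selecting features together rather than separately, maps directly onto a pipeline searched jointly. A minimal scikit-learn sketch on synthetic data (the dataset, grid, and sizes are illustrative stand-ins for the private wood board dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=0)   # stand-in for wood board features

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),      # feature selection step
    ("ann", MLPClassifier(max_iter=1000, random_state=0)),
])

# Search the feature-subset size and the ANN hyperparameters *simultaneously*,
# so the chosen features and hyperparameters are adjusted to each other.
search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "select__k": [5, 10, 20, 30],
        "ann__hidden_layer_sizes": [(16,), (32,), (64, 32)],
        "ann__alpha": [1e-4, 1e-3, 1e-2],
    },
    n_iter=20, scoring="balanced_accuracy", cv=5, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```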

Personalized identification, prediction, and stimulation of neural oscillations via data-driven models of epileptic network dynamics

  • paper_url: http://arxiv.org/abs/2310.13480
  • repo_url: None
  • paper_authors: Tena Dubcek, Debora Ledergerber, Jana Thomann, Giovanna Aiello, Marc Serra-Garcia, Lukas Imbach, Rafael Polania
  • for: To provide personalized, EEG-derived predictive models of epileptic network dynamics for assessing and forecasting the effect of brain stimulation.
  • methods: Extracts individualized predictive models directly from EEG data, based on the dominant coherent oscillations and their dynamical coupling.
  • results: Applied to EEG recordings of patients in status epilepticus (a brain state of perpetual seizure activity), the framework yields a model-driven predictive analysis of the therapeutic performance of periodic brain stimulation, suggesting that periodic stimulation can drive pathological states of epileptic network dynamics towards a healthy functional brain state.
    Abstract Neural oscillations are considered to be brain-specific signatures of information processing and communication in the brain. They also reflect pathological brain activity in neurological disorders, thus offering a basis for diagnoses and forecasting. Epilepsy is one of the most common neurological disorders, characterized by abnormal synchronization and desynchronization of the oscillations in the brain. About one third of epilepsy cases are pharmacoresistant, and as such emphasize the need for novel therapy approaches, where brain stimulation appears to be a promising therapeutic option. The development of brain stimulation paradigms, however, is often based on generalized assumptions about brain dynamics, although it is known that significant differences occur between patients and brain states. We developed a framework to extract individualized predictive models of epileptic network dynamics directly from EEG data. The models are based on the dominant coherent oscillations and their dynamical coupling, thus combining an established interpretation of dynamics through neural oscillations, with accurate patient-specific features. We show that it is possible to build a direct correspondence between the models of brain-network dynamics under periodic driving, and the mechanism of neural entrainment via periodic stimulation. When our framework is applied to EEG recordings of patients in status epilepticus (a brain state of perpetual seizure activity), it yields a model-driven predictive analysis of the therapeutic performance of periodic brain stimulation. This suggests that periodic brain stimulation can drive pathological states of epileptic network dynamics towards a healthy functional brain state.

An Analysis of $D^\alpha$ seeding for $k$-means

  • paper_url: http://arxiv.org/abs/2310.13474
  • repo_url: None
  • paper_authors: Etienne Bamas, Sai Ganesh Nagarajan, Ola Svensson
  • for: The paper aims to provide a rigorous understanding of the $D^\alpha$ seeding algorithm (also known as $k$-means++ when $\alpha=2$) and to prove its approximation guarantees with respect to the standard $k$-means cost.
  • methods: The paper analyzes the $D^\alpha$ seeding algorithm theoretically, complemented by lower bounds and experiments.
  • results: The paper proves that, for any $\alpha>2$, $D^\alpha$ seeding guarantees in expectation an approximation factor of $O_\alpha\left((g_\alpha)^{2/\alpha}\cdot \left(\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{min}}}\right)^{2-4/\alpha}\cdot (\min\{\ell,\log k\})^{2/\alpha}\right)$ on the standard $k$-means cost. It also provides lower bounds showing that the dependencies in this factor are tight, and experiments confirming that $\alpha>2$ can improve the $k$-means cost, an advantage that persists even after running Lloyd's algorithm.
    Abstract One of the most popular clustering algorithms is the celebrated $D^\alpha$ seeding algorithm (also known as $k$-means++ when $\alpha=2$) by Arthur and Vassilvitskii (2007), who showed that it guarantees in expectation an $O(2^{2\alpha}\cdot \log k)$-approximate solution to the ($k$,$\alpha$)-means cost (where euclidean distances are raised to the power $\alpha$) for any $\alpha\ge 1$. More recently, Balcan, Dick, and White (2018) observed experimentally that using $D^\alpha$ seeding with $\alpha>2$ can lead to a better solution with respect to the standard $k$-means objective (i.e. the $(k,2)$-means cost). In this paper, we provide a rigorous understanding of this phenomenon. For any $\alpha>2$, we show that $D^\alpha$ seeding guarantees in expectation an approximation factor of $$ O_\alpha \left((g_\alpha)^{2/\alpha}\cdot \left(\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{min}}}\right)^{2-4/\alpha}\cdot (\min\{\ell,\log k\})^{2/\alpha}\right)$$ with respect to the standard $k$-means cost of any underlying clustering; where $g_\alpha$ is a parameter capturing the concentration of the points in each cluster, $\sigma_{\mathrm{max}}$ and $\sigma_{\mathrm{min}}$ are the maximum and minimum standard deviation of the clusters around their means, and $\ell$ is the number of distinct mixing weights in the underlying clustering (after rounding them to the nearest power of $2$). We complement these results by some lower bounds showing that the dependency on $g_\alpha$ and $\sigma_{\mathrm{max}}/\sigma_{\mathrm{min}}$ is tight. Finally, we provide an experimental confirmation of the effects of the aforementioned parameters when using $D^\alpha$ seeding. Further, we corroborate the observation that $\alpha>2$ can indeed improve the $k$-means cost compared to $D^2$ seeding, and that this advantage remains even if we run Lloyd's algorithm after the seeding.
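For concreteness, here is a minimal NumPy sketch of the $D^\alpha$ seeding procedure analyzed above (illustrative only; the function name and defaults are ours): the first center is drawn uniformly, and each subsequent center is drawn with probability proportional to $D(x)^\alpha$, where $D(x)$ is the distance to the nearest center chosen so far, so $\alpha=2$ recovers $k$-means++.

```python
import numpy as np

def d_alpha_seeding(X, k, alpha=2.0, rng=None):
    """Sample k initial centers from the rows of X via D^alpha seeding."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                     # first center: uniform
    dist = np.linalg.norm(X - centers[0], axis=1)      # distance to nearest center
    for _ in range(k - 1):
        probs = dist ** alpha
        probs /= probs.sum()
        centers.append(X[rng.choice(n, p=probs)])      # P(x) proportional to D(x)^alpha
        dist = np.minimum(dist, np.linalg.norm(X - centers[-1], axis=1))
    return np.stack(centers)
```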

Stable Nonconvex-Nonconcave Training via Linear Interpolation

  • paper_url: http://arxiv.org/abs/2310.13459
  • repo_url: None
  • paper_authors: Thomas Pethick, Wanyun Xie, Volkan Cevher
  • for: This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. It argues that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and shows how linear interpolation can help by leveraging the theory of nonexpansive operators.
  • methods: The paper proposes a new optimization scheme, relaxed approximate proximal point (RAPP), the first explicit method to achieve last-iterate convergence rates for the full range of cohypomonotone problems, and extends it to constrained and regularized settings. By replacing the inner optimizer in RAPP, it rediscovers the family of Lookahead algorithms and establishes their convergence in cohypomonotone problems even when the base optimizer is gradient descent ascent.
  • results: Experiments on generative adversarial networks demonstrate the benefits of the linear interpolation present in both RAPP and Lookahead.
    Abstract This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme called relaxed approximate proximal point (RAPP), which is the first explicit method to achieve last iterate convergence rates for the full range of cohypomonotone problems. The construction extends to constrained and regularized settings. By replacing the inner optimizer in RAPP we rediscover the family of Lookahead algorithms for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent. The range of cohypomonotone problems in which Lookahead converges is further expanded by exploiting that Lookahead inherits the properties of the base optimizer. We corroborate the results with experiments on generative adversarial networks which demonstrates the benefits of the linear interpolation present in both RAPP and Lookahead.
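As a concrete reference for the interpolation step discussed above, here is a minimal sketch of the Lookahead wrapper (a simplified, illustrative training loop, not the paper's RAPP implementation): every $k$ inner steps the slow weights are linearly interpolated toward the fast weights, and the fast weights are reset to the result.

```python
import torch

def lookahead_train(model, optimizer, loss_fn, data_loader, k=5, alpha=0.5):
    """Lookahead: slow <- slow + alpha * (fast - slow) every k base-optimizer steps."""
    slow = [p.detach().clone() for p in model.parameters()]
    for step, batch in enumerate(data_loader, start=1):
        optimizer.zero_grad()
        loss_fn(model, batch).backward()
        optimizer.step()                         # fast-weight update by the base optimizer
        if step % k == 0:
            with torch.no_grad():
                for p, s in zip(model.parameters(), slow):
                    s.add_(p - s, alpha=alpha)   # linear interpolation of slow weights
                    p.copy_(s)                   # reset fast weights to the slow weights
    return model
```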

Correspondence learning between morphologically different robots through task demonstrations

  • paper_url: http://arxiv.org/abs/2310.13458
  • repo_url: None
  • paper_authors: Hakan Aktas, Yukie Nagai, Minoru Asada, Erhan Oztop, Emre Ugur
  • for: Learning correspondences between robots with significantly different morphologies, so that a skill learned by one robot can be transferred more directly to another.
  • methods: Both robots are first given demonstrations that achieve the same tasks, and a common latent representation is formed while the corresponding policies are learned. After this initial stage, observing one robot execute a new task is sufficient to generate the latent-space representation the other robot needs to achieve the same task.
  • results: Experiments show that the method successfully learns correspondences between two simulated robots when they follow the same paths, when they follow different trajectories, and when the complexities of their required sensorimotor trajectories differ; a proof-of-concept realization between a real manipulator robot and a simulated mobile robot is also provided.
    Abstract We observe a large variety of robots in terms of their bodies, sensors, and actuators. Given the commonalities in the skill sets, teaching each skill to each different robot independently is inefficient and not scalable when the large variety in the robotic landscape is considered. If we can learn the correspondences between the sensorimotor spaces of different robots, we can expect a skill that is learned in one robot can be more directly and easily transferred to the other robots. In this paper, we propose a method to learn correspondences between robots that have significant differences in their morphologies: a fixed-based manipulator robot with joint control and a differential drive mobile robot. For this, both robots are first given demonstrations that achieve the same tasks. A common latent representation is formed while learning the corresponding policies. After this initial learning stage, the observation of a new task execution by one robot becomes sufficient to generate a latent space representation pertaining to the other robot to achieve the same task. We verified our system in a set of experiments where the correspondence between two simulated robots is learned (1) when the robots need to follow the same paths to achieve the same task, (2) when the robots need to follow different trajectories to achieve the same task, and (3) when complexities of the required sensorimotor trajectories are different for the robots considered. We also provide a proof-of-the-concept realization of correspondence learning between a real manipulator robot and a simulated mobile robot.
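A minimal sketch of the shared-latent idea (our simplified reading of the setup, not the authors' exact architecture): each robot has its own encoder and decoder, tied through a common latent space by training on paired states recorded while both robots perform the same task.

```python
import torch
import torch.nn as nn

class SharedLatentCorrespondence(nn.Module):
    """Two robot-specific autoencoders tied through one latent space."""
    def __init__(self, dim_a, dim_b, dim_z=16):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.dec_a = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_a))
        self.enc_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.dec_b = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, dim_b))

    def loss(self, xa, xb):
        """xa, xb: time-aligned states of robots A and B on the same task."""
        za, zb = self.enc_a(xa), self.enc_b(xb)
        recon = ((self.dec_a(za) - xa) ** 2).mean() + ((self.dec_b(zb) - xb) ** 2).mean()
        align = ((za - zb) ** 2).mean()   # paired states should share one latent code
        return recon + align

# Transfer: observe robot A on a new task, then dec_b(enc_a(xa)) yields robot B's states.
```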

Y-Diagonal Couplings: Approximating Posteriors with Conditional Wasserstein Distances

  • paper_url: http://arxiv.org/abs/2310.13433
  • repo_url: None
  • paper_authors: Jannis Chemseddine, Paul Hagemann, Christian Wald
  • for: This work studies the use of a conditional Wasserstein distance to approximate posterior distributions in inverse problems.
  • methods: It introduces a conditional Wasserstein distance defined via a set of restricted couplings that equals the expected Wasserstein distance between the posteriors, and derives its dual to rigorously motivate the loss of conditional Wasserstein GANs.
  • results: The paper outlines conditions under which the vanilla and conditional Wasserstein distances coincide, and shows numerical examples where training with the conditional Wasserstein distance yields favorable properties for posterior sampling.
    Abstract In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback Leibler divergence, it does not hold true for the Wasserstein distance. We will introduce a conditional Wasserstein distance with a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. By deriving its dual, we find a rigorous way to motivate the loss of conditional Wasserstein GANs. We outline conditions under which the vanilla and the conditional Wasserstein distance coincide. Furthermore, we will show numerical examples where training with the conditional Wasserstein distance yields favorable properties for posterior sampling.
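In symbols (our paraphrase of the construction described in the abstract; notation is ours), the restricted set consists of couplings that are diagonal in the conditioning variable $y$,
$$\Gamma_Y = \{\gamma \in \Pi(P_{Y,X}, Q_{Y,X}) : y_1 = y_2 \ \ \gamma\text{-a.s.}\},$$
and the resulting conditional Wasserstein distance satisfies
$$\inf_{\gamma \in \Gamma_Y} \int \|x_1 - x_2\|^p \, \mathrm{d}\gamma \;=\; \mathbb{E}_{y \sim P_Y}\left[ W_p^p\big(P_{X\mid y},\, Q_{X\mid y}\big) \right],$$
i.e., it equals the expected Wasserstein distance of the posteriors, which is exactly the quantity one wants to control for posterior sampling.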

HRTF Interpolation using a Spherical Neural Process Meta-Learner

  • paper_url: http://arxiv.org/abs/2310.13430
  • repo_url: None
  • paper_authors: Etienne Thuillier, Craig Jin, Vesa Välimäki
  • for: Estimating a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs.
  • methods: The paper proposes a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation, which includes a Spherical Convolutional Neural Network component and exploits potential symmetries between the HRTF's left and right channels.
  • results: The proposed model achieves up to 3 dB relative error reduction compared to state-of-the-art interpolation methods, reducing the number of data points required for comparable accuracy from 50 to 28, and provides well-calibrated uncertainty estimates.
    Abstract Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs. There exists a need for adaptively correcting the estimation error committed by such methods using a few data point samples from the subject's HRTF, acquired using acoustic measurements or perceptual feedback. To this end, we introduce a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation. In particular, the model includes a Spherical Convolutional Neural Network component to accommodate the spherical geometry of HRTF data. It also exploits potential symmetries between the HRTF's left and right channels about the median axis. In this work, we evaluate the proposed model's performance purely on time-aligned spectrum interpolation grounds under a simplified setup where a generic population-mean HRTF forms the initial estimates prior to corrections instead of individualized ones. The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art interpolation methods despite being trained using only 85 subjects. This improvement translates up to nearly a halving of the data point count required to achieve comparable accuracy, in particular from 50 to 28 points to reach an average of -20 dB relative error per interpolated feature. Moreover, we show that the trained model provides well-calibrated uncertainty estimates. Accordingly, such estimates can inform the sequential decision problem of acquiring as few correcting HRTF data points as needed to meet a desired level of HRTF individualization accuracy.

BRFL: A Blockchain-based Byzantine-Robust Federated Learning Model

  • paper_url: http://arxiv.org/abs/2310.13403
  • repo_url: None
  • paper_authors: Yang Li, Chunhe Xia, Chang Li, Tianbo Wang
  • for: This paper proposes a blockchain-based federated learning model that improves robustness against malicious (Byzantine) local models.
  • methods: The model combines federated learning with blockchain technology, enabling traceability of malicious models and providing incentives for locally trained clients. The aggregation node is selected based on Pearson's correlation coefficient; spectral clustering is performed and the average gradient is computed within each cluster, with its accuracy validated on the local dataset of the aggregation node.
  • results: Experiments on public datasets show superior Byzantine robustness compared to other baseline robust aggregation methods, and demonstrate the model's effectiveness in addressing the resource-consumption problem of federated learning.
    Abstract With the increasing importance of machine learning, the privacy and security of training data have become critical. Federated learning, which stores data in distributed nodes and shares only model parameters, has gained significant attention for addressing this concern. However, a challenge arises in federated learning due to the Byzantine Attack Problem, where malicious local models can compromise the global model's performance during aggregation. This article proposes the Blockchain-based Byzantine-Robust Federated Learning (BRFL) model, which combines federated learning with blockchain technology. This integration enables traceability of malicious models and provides incentives for locally trained clients. Our approach involves selecting the aggregation node based on Pearson's correlation coefficient; we perform spectral clustering and calculate the average gradient within each cluster, validating its accuracy using the local datasets of the aggregation nodes. Experimental results on public datasets demonstrate the superior Byzantine robustness of our secure aggregation algorithm compared to other baseline Byzantine-robust aggregation methods, and prove our proposed model's effectiveness in addressing the resource consumption problem.
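A rough sketch of the aggregation logic described above (our simplified reading; the number of clusters and the use of the dominant cluster as a stand-in for dataset-based validation are assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def robust_aggregate(updates, n_clusters=2):
    """updates: array of shape (n_clients, n_params), flattened local updates.
    Clusters clients via spectral clustering on pairwise Pearson similarity,
    then averages the gradients within the dominant cluster."""
    sim = np.corrcoef(updates)               # pairwise Pearson correlation coefficients
    affinity = (sim + 1.0) / 2.0             # map [-1, 1] to a nonnegative affinity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    dominant = np.bincount(labels).argmax()  # proxy for validation on a local dataset
    return updates[labels == dominant].mean(axis=0)
```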

Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability

  • paper_url: http://arxiv.org/abs/2310.13402
  • repo_url: https://github.com/dmml-geneva/calibrated-posterior
  • paper_authors: Maciej Falkiewicz, Naoya Takeishi, Imahn Shekhzadeh, Antoine Wehenkel, Arnaud Delaunoy, Gilles Louppe, Alexandros Kalousis
  • for: This paper proposes calibrating neural simulation-based inference (SBI) algorithms so that the uncertainty of the resulting posterior beliefs is accurately quantified.
  • methods: It incorporates a differentiable calibration term directly into the training objective of selected amortized SBI techniques, based on a relaxation of the classical calibration error that enables end-to-end backpropagation.
  • results: Experiments on six benchmark problems show that the algorithm achieves coverage and expected posterior density competitive with or better than previously existing approaches.
    Abstract Bayesian inference allows expressing the uncertainty of posterior belief under a probabilistic model given prior information and the likelihood of the evidence. Predominantly, the likelihood function is only implicitly established by a simulator posing the need for simulation-based inference (SBI). However, the existing algorithms can yield overconfident posteriors (Hermans *et al.*, 2022) defeating the whole purpose of credibility if the uncertainty quantification is inaccurate. We propose to include a calibration term directly into the training objective of the neural model in selected amortized SBI techniques. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. The proposed method is not tied to any particular neural model and brings moderate computational overhead compared to the profits it introduces. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference. We empirically show on six benchmark problems that the proposed method achieves competitive or better results in terms of coverage and expected posterior density than the previously existing approaches.
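One way such a relaxed, differentiable calibration term can look (a sketch of the general idea under our own simplifications, not the authors' exact formulation): replace the hard coverage indicator with a sigmoid so that the deviation of empirical coverage from the nominal level can be backpropagated into the neural posterior.

```python
import torch

def soft_calibration_penalty(logq_true, logq_samples, levels=(0.5, 0.8, 0.95), tau=0.05):
    """logq_true: (B,) log posterior density at the ground-truth parameters.
    logq_samples: (B, S) log density at S samples drawn from the amortized posterior.
    A calibrated posterior makes the density rank of the truth uniform on [0, 1]."""
    # soft rank of the truth among posterior samples (sigmoid relaxes the indicator)
    ranks = torch.sigmoid((logq_samples - logq_true.unsqueeze(1)) / tau).mean(dim=1)
    penalty = logq_true.new_zeros(())
    for alpha in levels:
        soft_cov = torch.sigmoid((alpha - ranks) / tau).mean()  # soft P(rank <= alpha)
        penalty = penalty + (soft_cov - alpha) ** 2
    return penalty
```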

Equivariant Deep Weight Space Alignment

  • paper_url: http://arxiv.org/abs/2310.13397
  • repo_url: None
  • paper_authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron
  • for: Permutation symmetries of deep networks make model averaging and similarity estimation difficult; solving these problems requires aligning the weights of deep networks.
  • methods: The paper proposes a new framework, Deep-Align, for learning to solve the weight alignment problem. It first shows that weight alignment adheres to two fundamental symmetries and then proposes a deep architecture that respects these symmetries, requiring no labeled data.
  • results: Experiments indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to current optimization algorithms, and that its alignments can be used to initialize other methods to obtain even better solutions with a significant speedup in convergence.
    Abstract Permutation symmetries of deep networks make simple operations like model averaging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. More generally, weight alignment is essential for a wide range of applications, from model merging, through exploring the optimization landscape of deep neural networks, to defining meaningful distance functions between neural networks. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first demonstrate that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an initialization for other methods to gain even better solutions with a significant speedup in convergence.
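For context, the classical (non-learned) approach is to cast per-layer alignment as a linear assignment problem, solvable with the Hungarian algorithm; the sketch below matches the hidden units of two one-hidden-layer MLPs (a standard weight-matching baseline of the kind Deep-Align is compared against, not Deep-Align itself).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_units(W1_a, W2_a, W1_b, W2_b):
    """Find the permutation P of network B's hidden units minimizing
    ||W1_a - P @ W1_b||_F^2 + ||W2_a - W2_b @ P.T||_F^2."""
    similarity = W1_a @ W1_b.T + W2_a.T @ W2_b       # (hidden, hidden)
    rows, cols = linear_sum_assignment(-similarity)  # Hungarian: maximize similarity
    P = np.zeros_like(similarity)
    P[rows, cols] = 1.0
    return P   # aligned copy of B: (P @ W1_b, W2_b @ P.T)
```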

RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup

  • paper_url: http://arxiv.org/abs/2310.13396
  • repo_url: https://github.com/nico-bohlinger/rl-x
  • paper_authors: Nico Bohlinger, Klaus Dorer
  • for: This paper presents RL-X, a new deep reinforcement learning (DRL) library, and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks.
  • methods: RL-X provides a flexible, easy-to-extend codebase with self-contained single-directory algorithms and fast JAX-based implementations.
  • results: RL-X reaches up to 4.5x speedups compared to well-known frameworks such as Stable-Baselines3.
    Abstract This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.

Optimal Best Arm Identification with Fixed Confidence in Restless Bandits

  • paper_url: http://arxiv.org/abs/2310.13393
  • repo_url: None
  • paper_authors: P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer
  • for: Best arm identification with fixed confidence in a restless multi-armed bandit setting with finitely many arms.
  • methods: Each arm is modeled as a homogeneous Markov chain whose state transitions are captured by an ergodic transition probability matrix (TPM) belonging to a single-parameter exponential family of TPMs, with unknown real-valued parameters.
  • results: The paper establishes a lower bound on the growth rate of the expected stopping time as the error probability vanishes, and proposes a policy whose expected stopping time provably matches this lower bound asymptotically.
    Abstract We study best arm identification in a restless multi-armed bandit setting with finitely many arms. The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space. The state transitions in each arm are captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs. The real-valued parameters of the arm TPMs are unknown and belong to a given space. Given a function $f$ defined on the common state space of the arms, the goal is to identify the best arm -- the arm with the largest average value of $f$ evaluated under the arm's stationary distribution -- with the fewest number of samples, subject to an upper bound on the decision's error probability (i.e., the fixed-confidence regime). A lower bound on the growth rate of the expected stopping time is established in the asymptote of a vanishing error probability. Furthermore, a policy for best arm identification is proposed, and its expected stopping time is proved to have an asymptotic growth rate that matches the lower bound. It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds. It is shown that under every policy, the state-action visitation proportions satisfy a specific approximate flow conservation constraint and that these proportions match the optimal proportions dictated by the lower bound under any asymptotically optimal policy. The prior studies on best arm identification in restless bandits focus on independent observations from the arms, rested Markov arms, and restless Markov arms with known arm TPMs. In contrast, this work is the first to study best arm identification in restless bandits with unknown arm TPMs.

Music Augmentation and Denoising For Peak-Based Audio Fingerprinting

  • paper_url: http://arxiv.org/abs/2310.13388
  • repo_url: https://github.com/deezer/musicFPaugment
  • paper_authors: Kamil Akesbi, Dorian Desblancs, Benjamin Martin
  • for: Improving the accuracy and robustness of audio identification systems, particularly in noisy environments.
  • methods: The paper introduces a new audio augmentation pipeline that adds noise to music snippets in a realistic way by stochastically mimicking real-world scenarios, and proposes a deep learning model that removes noisy components from spectrograms to improve the accuracy of peak-based fingerprinting systems.
  • results: Experiments show that adding the denoising model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
    Abstract Audio fingerprinting is a well-established solution for song identification from short recording excerpts. Popular methods rely on the extraction of sparse representations, generally spectral peaks, and have proven to be accurate, fast, and scalable to large collections. However, real-world applications of audio identification often happen in noisy environments, which can cause these systems to fail. In this work, we tackle this problem by introducing and releasing a new audio augmentation pipeline that adds noise to music snippets in a realistic way, by stochastically mimicking real-world scenarios. We then propose and release a deep learning model that removes noisy components from spectrograms in order to improve peak-based fingerprinting systems' accuracy. We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
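A minimal example of the kind of additive-noise augmentation described above (generic mixing at a target signal-to-noise ratio; the SNR range below is illustrative, not the paper's):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Additively mix a noise excerpt into a music snippet at a target SNR in dB."""
    if len(noise) < len(clean):                      # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = np.random.randint(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

# e.g. simulate a noisy venue recording:
# noisy = mix_at_snr(music, crowd_noise, snr_db=np.random.uniform(-5, 10))
```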

Assumption violations in causal discovery and the robustness of score matching

  • paper_url: http://arxiv.org/abs/2310.13387
  • repo_url: None
  • paper_authors: Francesco Montagna, Atalanti A. Mastakouri, Elias Eulig, Nicoletta Noceti, Lorenzo Rosasco, Dominik Janzing, Bryon Aragam, Francesco Locatello
  • for: This paper aims to evaluate the empirical performance of recent causal discovery methods on observational i.i.d. data with different background conditions, allowing for violations of the critical assumptions required by each selected approach.
  • methods: The paper uses score matching-based methods to recover the causal structure, which demonstrate surprising performance in the false positive and false negative rate of the inferred graph in challenging scenarios.
  • results: The paper provides theoretical insights into the performance of these methods and is the first effort to benchmark the stability of causal discovery algorithms with respect to the values of their hyperparameters.
    Abstract When domain knowledge is limited and experimentation is restricted by ethical, financial, or time constraints, practitioners turn to observational causal discovery methods to recover the causal structure, exploiting the statistical properties of their data. Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational i.i.d. data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach. Our experimental findings show that score matching-based methods demonstrate surprising performance in the false positive and false negative rate of the inferred graph in these challenging scenarios, and we provide theoretical insights into their performance. This work is also the first effort to benchmark the stability of causal discovery algorithms with respect to the values of their hyperparameters. Finally, we hope this paper will set a new standard for the evaluation of causal discovery methods and can serve as an accessible entry point for practitioners interested in the field, highlighting the empirical implications of different algorithm choices.

Salted Inference: Enhancing Privacy while Maintaining Efficiency of Split Inference in Mobile Computing

  • paper_url: http://arxiv.org/abs/2310.13384
  • repo_url: https://github.com/dr-bell/salted-dnns
  • paper_authors: Mohammad Malekzadeh, Fahim Kawsar
  • for: This work aims to let clients control the semantic interpretation of a deep neural network's (DNN) output at inference time while maintaining accuracy and efficiency, meeting the input-privacy and compute-efficiency requirements of split inference.
  • methods: The paper introduces "Salted DNNs", in which a salted layer placed in the network allows the client to control the semantic interpretation of the DNN output at inference time without degrading accuracy or efficiency.
  • results: Experiments on both image and sensor data show that Salted DNNs achieve classification accuracy very close to standard DNNs, particularly when the salted layer is positioned within the early part of the network, as split inference requires.
    Abstract Split inference partitions a deep neural network (DNN) to run the early part at the edge and the later part in the cloud. This meets two key requirements for on-device machine learning: input privacy and compute efficiency. Still, an open question in split inference is output privacy, given that the output of a DNN is visible to the cloud. While encrypted computing can protect output privacy, it mandates extensive computation and communication resources. In this paper, we introduce "Salted DNNs": a novel method that lets clients control the semantic interpretation of DNN output at inference time while maintaining accuracy and efficiency very close to that of a standard DNN. Experimental evaluations conducted on both image and sensor data show that Salted DNNs achieve classification accuracy very close to standard DNNs, particularly when the salted layer is positioned within the early part to meet the requirements of split inference. Our method is general and can be applied to various DNNs. We open-source our code and results, as a benchmark for future studies.
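A conceptual sketch of split inference with a client-held salt (heavily simplified and hypothetical: in the paper the salted layer is trained so that the output order follows the salt, which we merely assume here):

```python
import torch

def make_salt(num_classes, seed):
    """Client-held secret permutation of the class labels."""
    g = torch.Generator().manual_seed(seed)
    return torch.randperm(num_classes, generator=g)

def salted_split_inference(edge_net, cloud_net, x, salt):
    z = edge_net(x, salt)          # on-device: raw input and salt never leave the client
    salted_logits = cloud_net(z)   # cloud sees only features and permuted logits
    return salted_logits[:, salt]  # client restores the true class order locally

# Assumption: (edge_net, cloud_net) were trained so that, given salt s,
# the logit of true class c is emitted at output position s[c].
```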

Accelerated sparse Kernel Spectral Clustering for large scale data clustering problems

  • paper_url: http://arxiv.org/abs/2310.13381
  • repo_url: None
  • paper_authors: Mihaly Novak, Rocco Langone, Carlos Alzate, Johan Suykens
  • for: This paper presents an improved sparse multiway kernel spectral clustering (KSC) algorithm for large-scale data clustering problems.
  • methods: The algorithm is derived from weighted kernel principal component analysis (KPCA) formulated within the primal-dual least-squares support vector machine (LS-SVM) framework, with sparsity achieved by combining an incomplete Cholesky decomposition (ICD) based low-rank approximation of the kernel matrix with the reduced set method.
  • results: The modifications drastically improve computational efficiency, solving within seconds clustering problems that previously required hours, without altering the results. Sparsity is also improved significantly, yielding a more compact model representation and further increasing both computational efficiency and descriptive power.
    Abstract An improved version of the sparse multiway kernel spectral clustering (KSC) is presented in this brief. The original algorithm is derived from weighted kernel principal component (KPCA) analysis formulated within the primal-dual least-squares support vector machine (LS-SVM) framework. Sparsity is then achieved by combining the incomplete Cholesky decomposition (ICD) based low-rank approximation of the kernel matrix with the so-called reduced set method. The original ICD-based sparse KSC algorithm was reported to be computationally far too demanding, especially when applied to the large-scale data clustering problems it was actually designed for, which has so far prevented it from gaining more than theoretical relevance. This is altered by the modifications reported in this brief, which drastically improve the computational characteristics. Solving the alternative, symmetrized version of the computationally most demanding core eigenvalue problem eliminates the need to form and compute the SVD of large matrices during model construction. As a result, clustering problems that were reported to require hours can now be solved within seconds without altering the results. Furthermore, sparsity is also improved significantly, leading to a more compact model representation and further increasing not only the computational efficiency but also the descriptive power. These improvements make the original, previously only theoretically relevant ICD-based sparse KSC algorithm applicable to large-scale practical clustering problems. The theoretical results and improvements are demonstrated by computational experiments on carefully selected synthetic data as well as on real-life problems such as image segmentation.

Physics-Informed Graph Convolutional Networks: Towards a generalized framework for complex geometries

  • paper_url: http://arxiv.org/abs/2310.14948
  • repo_url: None
  • paper_authors: Marien Chenaud, José Alves, Frédéric Magoulès
  • for: Solving partial differential equation (PDE) problems with deep learning models.
  • methods: The paper uses graph neural networks (GNNs), motivated by the similarity between these architectures and the meshes used in traditional numerical techniques for solving PDEs.
  • results: It proposes a procedure that combines classical numerical solvers with the physics-informed framework, and validates the approach experimentally on a three-dimensional problem on an irregular geometry.
    Abstract Since the seminal work of [9] and their Physics-Informed Neural Networks (PINNs), many efforts have been conducted towards solving partial differential equations (PDEs) with deep learning models. However, some challenges remain, for instance the extension of such models to complex three-dimensional geometries, and the question of how such approaches could be combined with classical numerical solvers. In this work, we justify the use of graph neural networks for these problems, based on the similarity between these architectures and the meshes used in traditional numerical techniques for solving partial differential equations. After exposing an issue with the physics-informed framework for complex geometries that arises during the computation of PDE residuals, we propose an alternative procedure that combines classical numerical solvers with the physics-informed framework. Finally, we propose an implementation of this approach, which we test on a three-dimensional problem on an irregular geometry.
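For reference, the physics-informed residual idea underlying this line of work, in its minimal one-dimensional form (a generic PINN-style loss, not the paper's graph-based architecture): for the Poisson problem $u''(x) = f(x)$, penalize the squared PDE residual at collocation points using automatic differentiation.

```python
import torch

def pde_residual_loss(model, x, f):
    """model: network mapping (N, 1) points to (N, 1) values of u.
    Returns the mean squared residual of u''(x) - f(x) at the points x."""
    x = x.clone().requires_grad_(True)
    u = model(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'(x)
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''(x)
    return ((d2u - f(x)) ** 2).mean()
```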

SigFormer: Signature Transformers for Deep Hedging

  • paper_url: http://arxiv.org/abs/2310.13369
  • repo_url: https://github.com/anh-tong/sigformer
  • paper_authors: Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Toan Tran, Jaesik Choi
  • for: The paper proposes a new deep learning model for deep hedging, aiming to improve the accuracy and efficiency of financial risk management.
  • methods: It introduces SigFormer, a model that combines the power of path signatures and transformers to handle sequential data, particularly in cases with irregularities.
  • results: On synthetic data, SigFormer learns faster and is more robust than existing methods, especially in the presence of irregular underlying price data; a real-world backtest on hedging the SP 500 index also shows positive outcomes.
    Abstract Deep hedging is a promising direction in quantitative finance, incorporating models and techniques from deep learning research. While giving excellent hedging strategies, models inherently requires careful treatment in designing architectures for neural networks. To mitigate such difficulties, we introduce SigFormer, a novel deep learning model that combines the power of path signatures and transformers to handle sequential data, particularly in cases with irregularities. Path signatures effectively capture complex data patterns, while transformers provide superior sequential attention. Our proposed model is empirically compared to existing methods on synthetic data, showcasing faster learning and enhanced robustness, especially in the presence of irregular underlying price data. Additionally, we validate our model performance through a real-world backtest on hedging the SP 500 index, demonstrating positive outcomes.

Dissecting Causal Biases

  • paper_url: http://arxiv.org/abs/2310.13364
  • repo_url: None
  • paper_authors: Rūta Binkytė, Sami Zhioua, Yassine Turki
  • for: This paper addresses the problem of accurately measuring discrimination in machine-learning-based automated decision systems.
  • methods: It uses tools from the field of causality to formally define and analyze a class of biases, called causal biases, originating in the way training data is generated or collected; four sources are considered: confounding, selection, measurement, and interaction.
  • results: The paper provides, for each source of bias, a closed-form expression in terms of the model parameters, making it possible to analyze the behavior of each source, in particular when it is absent and when it is maximized.
    Abstract Accurately measuring discrimination in machine learning-based automated decision systems is required to address the vital issue of fairness between subpopulations and/or individuals. Any bias in measuring discrimination can lead to either amplification or underestimation of the true value of discrimination. This paper focuses on a class of bias originating in the way training data is generated and/or collected. We call such class causal biases and use tools from the field of causality to formally define and analyze such biases. Four sources of bias are considered, namely, confounding, selection, measurement, and interaction. The main contribution of this paper is to provide, for each source of bias, a closed-form expression in terms of the model parameters. This makes it possible to analyze the behavior of each source of bias, in particular, in which cases they are absent and in which other cases they are maximized. We hope that the provided characterizations help the community better understand the sources of bias in machine learning applications.
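As a concrete (textbook) special case of the confounding source, not the paper's own closed-form expressions: in the linear model $Y = \beta A + \gamma C + \varepsilon$ with confounder $C$ and $A = \delta C + \eta$, regressing $Y$ on $A$ alone converges to
$$\hat{\beta} \;\to\; \beta + \gamma\,\frac{\mathrm{Cov}(A, C)}{\mathrm{Var}(A)},$$
so the measured discrimination is biased whenever $\gamma \neq 0$ and $\mathrm{Cov}(A, C) \neq 0$, and the bias vanishes when either term does; this is exactly the kind of absent/maximized analysis that closed-form expressions enable.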

Learning Recurrent Models with Temporally Local Rules

  • paper_url: http://arxiv.org/abs/2310.13284
  • repo_url: None
  • paper_authors: Azwar Abdulsalam, Joseph G. Makin
  • for: This paper explores how to fit generative models to sequential data while avoiding the computational cost of the backward pass through time.
  • methods: It proposes requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities.
  • results: Experiments on toy datasets show that different architectures employing this principle can learn aspects of the data that typically require the backward pass.
    Abstract Fitting generative models to sequential data typically involves two recursive computations through time, one forward and one backward. The latter could be a computation of the loss gradient (as in backpropagation through time), or an inference algorithm (as in the RTS/Kalman smoother). The backward pass in particular is computationally expensive (since it is inherently serial and cannot exploit GPUs), and difficult to map onto biological processes. Work-arounds have been proposed; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data typically requiring the backward pass.

FedLoRA: Model-Heterogeneous Personalized Federated Learning with LoRA Tuning

  • paper_url: http://arxiv.org/abs/2310.13283
  • repo_url: None
  • paper_authors: Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu
  • for: This paper proposes FedLoRA, a computation- and communication-efficient model-heterogeneous personalized federated learning framework, to address the challenges of model heterogeneity, system heterogeneity, and statistical heterogeneity.
  • methods: The framework equips each client's heterogeneous local model with a homogeneous small LoRA adapter. Both are trained via the proposed iterative global-local knowledge-exchange scheme, and only the small homogeneous adapters are sent to the server to be aggregated into a global adapter, so clients can train heterogeneous local models without high computation and communication overhead.
  • results: Extensive experiments on two real-world datasets show that FedLoRA outperforms six state-of-the-art baselines, beating the best approach by 1.35% in test accuracy while reducing computation overhead by 11.81 times and communication cost by 7.41 times.
    Abstract Federated learning (FL) is an emerging machine learning paradigm in which a central server coordinates multiple participants (a.k.a. FL clients) to train a model collaboratively on decentralized data with privacy protection. This paradigm constrains that all clients have to train models with the same structures (homogeneous). In practice, FL often faces statistical heterogeneity, system heterogeneity and model heterogeneity challenges. These challenging issues inspire the field of Model-Heterogeneous Personalized Federated Learning (MHPFL) which aims to train a personalized and heterogeneous local model for each FL client. Existing MHPFL approaches cannot achieve satisfactory model performance, acceptable computational overhead and efficient communication simultaneously. To bridge this gap, we propose a novel computation- and communication-efficient model-heterogeneous personalized Federated learning framework based on LoRA tuning (FedLoRA). It is designed to incorporate a homogeneous small adapter for each client's heterogeneous local model. Both models are trained following the proposed iterative training for global-local knowledge exchange. The homogeneous small local adapters are sent to the FL server to be aggregated into a global adapter. In this way, FL clients can train heterogeneous local models without incurring high computation and communication costs. We theoretically prove the non-convex convergence rate of FedLoRA. Extensive experiments on two real-world datasets demonstrate that FedLoRA outperforms six state-of-the-art baselines, beating the best approach by 1.35% in terms of test accuracy, 11.81 times computation overhead reduction and 7.41 times communication cost saving.
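A minimal sketch of the homogeneous low-rank adapter each client would train and transmit (standard LoRA; the rank and scaling defaults are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer W plus a trainable low-rank update (B @ A).
    Only A and B, the small homogeneous adapter, need to be communicated."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # the heterogeneous local weights stay put
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```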

An Event based Prediction Suffix Tree

  • paper_url: http://arxiv.org/abs/2310.14944
  • repo_url: None
  • paper_authors: Evie Andrew, Travis Monk, André van Schaik
  • for: This paper introduces the Event-based Prediction Suffix Tree (EPST), a biologically inspired, event-based prediction algorithm that learns a model online from the statistics of event-based input and can make predictions over multiple overlapping patterns.
  • methods: The EPST uses a representation specific to event-based data, defined as a portion of the power set of event subsequences within a short context window. It is explainable and possesses fault tolerance, resistance to event noise, and the capability for one-shot learning.
  • results: In a synthetic data prediction task with additive event noise, event jitter, and dropout, the EPST outputs predicted projections for the near-term future of the signal, which may be applied to tasks such as event-based anomaly detection or pattern recognition.
    Abstract This article introduces the Event based Prediction Suffix Tree (EPST), a biologically inspired, event-based prediction algorithm. The EPST learns a model online based on the statistics of an event based input and can make predictions over multiple overlapping patterns. The EPST uses a representation specific to event based data, defined as a portion of the power set of event subsequences within a short context window. It is explainable, and possesses many promising properties such as fault tolerance, resistance to event noise, as well as the capability for one-shot learning. The computational features of the EPST are examined in a synthetic data prediction task with additive event noise, event jitter, and dropout. The resulting algorithm outputs predicted projections for the near term future of the signal, which may be applied to tasks such as event based anomaly detection or pattern recognition.

DIG-MILP: a Deep Instance Generator for Mixed-Integer Linear Programming with Feasibility Guarantee

  • paper_url: http://arxiv.org/abs/2310.13261
  • repo_url: https://github.com/graph-com/dig_milp
  • paper_authors: Haoyu Wang, Jialin Liu, Xiaohan Chen, Xinshang Wang, Pan Li, Wotao Yin
  • for: Producing extensive, diverse, and representative mixed-integer linear programming (MILP) instances to support algorithm development, solver tuning, and the training of machine learning models for MILP resolution.
  • methods: DIG-MILP is a deep generative framework based on a variational auto-encoder (VAE) that extracts deep-level structural features from highly limited MILP data and generates instances closely mirroring the target data; by leveraging MILP duality, it guarantees a correct and complete generation space as well as the boundedness and feasibility of the generated instances.
  • results: Two downstream tasks demonstrate the quality and novelty of the generated instances: (S1) data sharing, where solver solution times correlate highly positively between original and DIG-MILP-generated instances, enabling solver tuning without publishing the original data; and (S2) data augmentation, where the generated instances improve the generalization of machine learning models for solving MILP problems.
    Abstract Mixed-integer linear programming (MILP) stands as a notable NP-hard problem pivotal to numerous crucial industrial applications. The development of effective algorithms, the tuning of solvers, and the training of machine learning models for MILP resolution all hinge on access to extensive, diverse, and representative data. Yet compared to the abundant naturally occurring data in image and text realms, MILP is markedly data deficient, underscoring the vital role of synthetic MILP generation. We present DIG-MILP, a deep generative framework based on variational auto-encoder (VAE), adept at extracting deep-level structural features from highly limited MILP data and producing instances that closely mirror the target data. Notably, by leveraging the MILP duality, DIG-MILP guarantees a correct and complete generation space as well as ensures the boundedness and feasibility of the generated instances. Our empirical study highlights the novelty and quality of the instances generated by DIG-MILP through two distinct downstream tasks: (S1) Data sharing, where solver solution times correlate highly positive between original and DIG-MILP-generated instances, allowing data sharing for solver tuning without publishing the original data; (S2) Data Augmentation, wherein the DIG-MILP-generated instances bolster the generalization performance of machine learning models tasked with resolving MILP problems.

Knowledge Graph Context-Enhanced Diversified Recommendation

  • paper_url: http://arxiv.org/abs/2310.13253
  • repo_url: https://github.com/anonym844/kg-diverse
  • paper_authors: Xiaolong Liu, Liangwei Yang, Zhiwei Liu, Mingdai Yang, Chen Wang, Hao Peng, Philip S. Yu
  • for: The paper aims to enhance recommendation diversity within the context of knowledge graphs (KG) by incorporating contextual information and preserving contextual integrity.
  • methods: The paper introduces innovative metrics, Entity Coverage and Relation Coverage, to quantify diversity within the KG domain; proposes Diversified Embedding Learning (DEL) to formulate user representations with an innate awareness of diversity; and introduces Conditional Alignment and Uniformity (CAU) to encode KG item embeddings while preserving contextual integrity.
  • results: Together, these contributions signify a substantial stride towards augmenting recommendation diversity within KG-informed RecSys paradigms.
    Abstract The field of Recommender Systems (RecSys) has been extensively studied to enhance accuracy by leveraging users' historical interactions. Nonetheless, this persistent pursuit of accuracy frequently engenders diminished diversity, culminating in the well-recognized "echo chamber" phenomenon. Diversified RecSys has emerged as a countermeasure, placing diversity on par with accuracy and garnering noteworthy attention from academic circles and industry practitioners. This research explores the realm of diversified RecSys within the intricate context of knowledge graphs (KG). These KGs act as repositories of interconnected information concerning entities and items, offering a propitious avenue to amplify recommendation diversity through the incorporation of insightful contextual information. Our contributions include introducing an innovative metric, Entity Coverage, and Relation Coverage, which effectively quantifies diversity within the KG domain. Additionally, we introduce the Diversified Embedding Learning (DEL) module, meticulously designed to formulate user representations that possess an innate awareness of diversity. In tandem with this, we introduce a novel technique named Conditional Alignment and Uniformity (CAU). It adeptly encodes KG item embeddings while preserving contextual integrity. Collectively, our contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within the realm of KG-informed RecSys paradigms.
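The coverage idea is straightforward to operationalize; below is a sketch under our assumed definitions (fraction of distinct KG entities or relations touched by the recommended items; the paper's exact metrics may differ):

```python
def entity_coverage(recommended, item2entities, all_entities):
    """Fraction of distinct KG entities linked to at least one recommended item."""
    covered = set()
    for item in recommended:
        covered |= item2entities.get(item, set())
    return len(covered) / len(all_entities)

def relation_coverage(recommended, item2relations, all_relations):
    """Fraction of distinct KG relation types incident to the recommended items."""
    covered = set()
    for item in recommended:
        covered |= item2relations.get(item, set())
    return len(covered) / len(all_relations)
```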

Transparency challenges in policy evaluation with causal machine learning – improving usability and accountability

  • paper_url: http://arxiv.org/abs/2310.13240
  • repo_url: None
  • paper_authors: Patrick Rehill, Nicholas Biddle
  • for: This paper examines the use of causal machine learning tools in real-world policy evaluation tasks and the transparency problems these methods pose for policy evaluation.
  • methods: It applies a causal forest model to estimate conditional average treatment effects for a hypothetical change in the school leaving age in Australia, and explores how transparency problems might be addressed through explainable AI tools and by simplifying models in line with interpretable AI principles.
  • results: The study finds that existing tools for understanding black-box predictive models are poorly suited to causal machine learning, and that simplifying the model to make it interpretable leads to an unacceptable increase in error in this application.
    Abstract Causal machine learning tools are beginning to see use in real-world policy evaluation tasks to flexibly estimate treatment effects. One issue with these methods is that the machine learning models used are generally black boxes, i.e., there is no globally interpretable way to understand how a model makes estimates. This is a clear problem in policy evaluation applications, particularly in government, because it is difficult to understand whether such models are functioning in ways that are fair, based on the correct interpretation of evidence and transparent enough to allow for accountability if things go wrong. However, there has been little discussion of transparency problems in the causal machine learning literature and how these might be overcome. This paper explores why transparency issues are a problem for causal machine learning in public policy evaluation applications and considers ways these problems might be addressed through explainable AI tools and by simplifying models in line with interpretable AI principles. It then applies these ideas to a case-study using a causal forest model to estimate conditional average treatment effects for a hypothetical change in the school leaving age in Australia. It shows that existing tools for understanding black-box predictive models are poorly suited to causal machine learning and that simplifying the model to make it interpretable leads to an unacceptable increase in error (in this application). It concludes that new tools are needed to properly understand causal machine learning models and the algorithms that fit them.

Training A Semantic Communication System with Federated Learning

  • paper_url: http://arxiv.org/abs/2310.13236
  • repo_url: None
  • paper_authors: Loc X. Nguyen, Huy Q. Le, Ye Lin Tun, Pyae Sone Aung, Yan Kyaw Tun, Zhu Han, Choong Seon Hong
  • for: This work aims to improve the performance of semantic communication systems by addressing the problem of limited training data.
  • methods: It trains the semantic communication system in a federated learning (FL) setting, exploiting user data without leaking privacy, and proposes a mechanism, FedLol, to aggregate the global model from the clients while reducing the quantity of information delivered in each global round, saving bandwidth for resource-limited devices and reducing overall network traffic.
  • results: Extensive simulation results demonstrate the efficacy of the proposed technique compared to baseline methods.
    Abstract Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built using advanced deep learning models whose performance heavily depends on data availability. These studies assume that an abundance of training data is available, which is unrealistic. In practice, data is mainly created on the user side. Due to privacy and security concerns, the transmission of data is restricted, which is necessary for conventional centralized training schemes. To address this challenge, we explore semantic communication in federated learning (FL) setting that utilizes user data without leaking privacy. Additionally, we design our system to tackle the communication overhead by reducing the quantity of information delivered in each global round. In this way, we can save significant bandwidth for resource-limited devices and reduce overall network traffic. Finally, we propose a mechanism to aggregate the global model from the clients, called FedLol. Extensive simulation results demonstrate the efficacy of our proposed technique compared to baseline methods.

Equivariant Transformer is all you need

  • paper_url: http://arxiv.org/abs/2310.13222
  • repo_url: None
  • paper_authors: Akio Tomiya, Yuki Nagai
  • for: This paper advances the application of machine learning and deep learning to computational physics, where they are used to simulate systems on a lattice.
  • methods: It introduces symmetry-equivariant attention into self-learning Monte Carlo (SLMC).
  • results: Applied to a spin-fermion model on a two-dimensional lattice, the equivariant attention overcomes the poor acceptance rates of linear models, and a scaling law of the acceptance rate, analogous to that of large language models with Transformers, is observed.
    Abstract Machine learning, particularly deep learning, has been accelerating computational physics, where it is used to simulate systems on a lattice. Equivariance is essential to simulate a physical system because it imposes a strong inductive bias on the probability distribution described by a machine learning model. This reduces the risk of erroneous extrapolation that deviates from data symmetries and physical laws. However, imposing symmetry on the model sometimes leads to a poor acceptance rate in self-learning Monte Carlo (SLMC). On the other hand, the attention mechanism used in Transformers such as GPT realizes a large model capacity. We introduce symmetry-equivariant attention to SLMC. To evaluate our architecture, we apply it to a spin-fermion model on a two-dimensional lattice. We find that it overcomes the poor acceptance rates of linear models, and we observe a scaling law of the acceptance rate similar to that of large language models with Transformers.

In-context Learning with Transformer Is Really Equivalent to a Contrastive Learning Pattern

  • paper_url: http://arxiv.org/abs/2310.13220
  • repo_url: None
  • paper_authors: Ruifeng Ren, Yong Liu
  • for: This work aims to understand the mechanism of in-context learning (ICL) in Transformer-based pre-trained large language models.
  • methods: Leveraging kernel methods, the paper establishes the relationship between gradient descent and the self-attention mechanism under the commonly used softmax attention setting (rather than linear attention), analyzes the corresponding gradient descent process of ICL from the perspective of contrastive learning without negative samples, and discusses possible modifications of the self-attention layer based on this pattern.
  • results: The paper shows that the inference process of ICL can be interpreted as a gradient descent process in a contrastive learning pattern, and designs experiments that support this view.
    Abstract Pre-trained large language models based on Transformers have demonstrated amazing in-context learning (ICL) abilities. Given several demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we interpret the inference process of ICL as a gradient descent process in a contrastive learning pattern. Firstly, leveraging kernel methods, we establish the relationship between gradient descent and self-attention mechanism under generally used softmax attention setting instead of linear attention setting. Then, we analyze the corresponding gradient descent process of ICL from the perspective of contrastive learning without negative samples and discuss possible improvements of this contrastive learning pattern, based on which the self-attention layer can be further modified. Finally, we design experiments to support our opinions. To the best of our knowledge, our work is the first to provide the understanding of ICL from the perspective of contrastive learning and has the potential to facilitate future model design by referring to related works on contrastive learning.
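For intuition, the linear-attention version of this correspondence established in prior work (the paper itself extends the analysis to the softmax attention setting): one step of gradient descent with step size $\eta$ on the in-context least-squares loss $L(W) = \tfrac{1}{2}\sum_i \|W x_i - y_i\|^2$ changes the prediction on a query $x_{\mathrm{test}}$ by
$$\Delta \hat{y}_{\mathrm{test}} = -\eta \sum_i (W x_i - y_i)\, x_i^{\top} x_{\mathrm{test}},$$
which is exactly the form of a linear self-attention layer reading the demonstration pairs $(x_i, y_i)$, with the inner products $x_i^{\top} x_{\mathrm{test}}$ playing the role of attention scores.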