cs.LG - 2023-10-09

Fair Classifiers that Abstain without Harm

  • paper_url: http://arxiv.org/abs/2310.06205
  • repo_url: None
  • paper_authors: Tongxin Yin, Jean-François Ton, Ruocheng Guo, Yuanshun Yao, Mingyan Liu, Yang Liu
  • for: The paper develops a post-hoc method that lets existing classifiers selectively abstain from predicting certain samples, achieving group fairness while maintaining the original accuracy.
  • methods: Integer programming assigns abstention decisions to each training sample, and a surrogate model is trained to generalize the abstaining decisions to test samples.
  • results: The proposed method outperforms existing methods in fairness disparity without sacrificing accuracy at similar abstention rates, and provides theoretical results on the feasibility of the IP procedure and the abstention rate required for different levels of unfairness tolerance and accuracy constraint.
    Abstract In critical applications, it is vital for classifiers to defer decision-making to humans. We propose a post-hoc method that makes existing classifiers selectively abstain from predicting certain samples. Our abstaining classifier is incentivized to maintain the original accuracy for each sub-population (i.e., no harm) while achieving a set of group fairness definitions to a user-specified degree. To this end, we design an Integer Programming (IP) procedure that assigns abstention decisions for each training sample to satisfy a set of constraints. To generalize the abstaining decisions to test samples, we then train a surrogate model to learn the abstaining decisions based on the IP solutions in an end-to-end manner. We analyze the feasibility of the IP procedure to determine the possible abstention rate for different levels of unfairness tolerance and accuracy constraint for achieving no harm. To the best of our knowledge, this work is the first to identify the theoretical relationships between the constraint parameters and the required abstention rate. Our theoretical results are important since a high abstention rate is often infeasible in practice due to a lack of human resources. Our framework outperforms existing methods in terms of fairness disparity without sacrificing accuracy at similar abstention rates.
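Below is a minimal, illustrative sketch (not the paper's exact formulation) of assigning binary abstention decisions with an integer program: the fairness and no-harm constraints are collapsed into a simple per-group budget on retained errors, and the group labels, correctness flags, and budgets are toy assumptions.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
n = 40
group = rng.integers(0, 2, size=n)            # two sub-populations
correct = rng.random(n) < 0.8                 # base classifier correctness per sample

c = np.ones(n)                                # minimize total abstentions sum_i a_i
constraints = []
for g in (0, 1):
    errors = (group == g) & (~correct)        # erroneous samples in group g
    row = np.zeros(n)
    row[errors] = -1.0
    budget = 2                                # allow at most 2 retained errors per group
    # retained errors: sum_{i in errors_g} (1 - a_i) <= budget
    # rearranged:      -sum_{i in errors_g} a_i <= budget - |errors_g|
    constraints.append(LinearConstraint(row, -np.inf, budget - errors.sum()))

res = milp(c, constraints=constraints,
           integrality=np.ones(n),            # a_i in {0, 1}
           bounds=Bounds(0, 1))
abstain = res.x.round().astype(bool)
print("abstention rate:", abstain.mean())
```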

PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization

  • paper_url: http://arxiv.org/abs/2310.06182
  • repo_url: None
  • paper_authors: Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
  • for: This paper focuses on establishing theoretical guarantees for the robust generalization of deep neural networks (DNNs) against adversarial attacks.
  • methods: The paper uses a PAC-Bayes approach (Neyshabur et al., 2017) and provides a spectrally-normalized robust generalization bound for DNNs, addressing the challenge of extending the key ingredient to robust settings without relying on additional strong assumptions.
  • results: The paper shows that the mismatch terms between standard and robust generalization bounds are solely due to mathematical issues, and provides a different perspective on understanding robust generalization. Additionally, the paper extends the main result to adversarial robustness against general non-$\ell_p$ attacks and other neural network architectures.
    Abstract Deep neural networks (DNNs) are vulnerable to adversarial attacks. It is found empirically that adversarially robust generalization is crucial in establishing defense algorithms against adversarial attacks. Therefore, it is interesting to study the theoretical guarantee of robust generalization. This paper focuses on norm-based complexity, based on a PAC-Bayes approach (Neyshabur et al., 2017). The main challenge lies in extending the key ingredient, which is a weight perturbation bound in standard settings, to the robust settings. Existing attempts heavily rely on additional strong assumptions, leading to loose bounds. In this paper, we address this issue and provide a spectrally-normalized robust generalization bound for DNNs. Compared to existing bounds, our bound offers two significant advantages: Firstly, it does not depend on additional assumptions. Secondly, it is considerably tighter, aligning with the bounds of standard generalization. Therefore, our result provides a different perspective on understanding robust generalization: The mismatch terms between standard and robust generalization bounds shown in previous studies do not contribute to the poor robust generalization. Instead, these disparities are solely due to mathematical issues. Finally, we extend the main result to adversarial robustness against general non-$\ell_p$ attacks and other neural network architectures.

Automatic Integration for Spatiotemporal Neural Point Processes

  • paper_url: http://arxiv.org/abs/2310.06179
  • repo_url: None
  • paper_authors: Zihao Zhou, Rose Yu
  • for: This paper addresses how to efficiently learn continuous-time point processes, particularly spatiotemporal point processes (STPPs), where computing the likelihood requires integrating the intensity over space and time.
  • methods: It proposes AutoSTPP (Automatic Integration for Spatiotemporal Neural Point Processes), an extension of the AutoInt approach that handles 3D STPPs efficiently.
  • results: AutoSTPP is validated on synthetic data and real-world datasets, demonstrating a significant advantage in recovering complex intensity functions.
    Abstract Learning continuous-time point processes is essential to many discrete event forecasting tasks. However, integration poses a major challenge, particularly for spatiotemporal point processes (STPPs), as it involves calculating the likelihood through triple integrals over space and time. Existing methods for integrating STPPs either assume a parametric form of the intensity function, which lacks flexibility, or approximate the intensity with Monte Carlo sampling, which introduces numerical errors. Recent work by Omi et al. [2019] proposes a dual network or AutoInt approach for efficient integration of flexible intensity functions. However, the method only focuses on the 1D temporal point process. In this paper, we introduce a novel paradigm: AutoSTPP (Automatic Integration for Spatiotemporal Neural Point Processes) that extends the AutoInt approach to 3D STPP. We show that direct extension of the previous work overly constrains the intensity function, leading to poor performance. We prove consistency of AutoSTPP and validate it on synthetic data and benchmark real world datasets, showcasing its significant advantage in recovering complex intensity functions from irregular spatiotemporal events, particularly when the intensity is sharply localized.
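A minimal 1D temporal sketch of the AutoInt idea that AutoSTPP extends to 3D: parameterize the antiderivative F with a network whose derivative is nonnegative by construction, recover the intensity by autograd, and obtain the likelihood integral exactly as F(T) - F(0). The positive-weight architecture below is a toy assumption, not the paper's design (whose naive 3D extension the authors show is overly constraining).

```python
import torch

class MonotoneF(torch.nn.Module):
    """Antiderivative network: positive weights + increasing activation
    make F nondecreasing, so lambda(t) = F'(t) >= 0 by construction."""
    def __init__(self, h=32):
        super().__init__()
        self.w1 = torch.nn.Parameter(torch.randn(h, 1))
        self.b1 = torch.nn.Parameter(torch.zeros(h))
        self.w2 = torch.nn.Parameter(torch.randn(1, h))
    def forward(self, t):
        hidden = torch.tanh(t @ self.w1.exp().T + self.b1)
        return hidden @ self.w2.exp().T

F = MonotoneF()

def intensity(t):
    t = t.clone().requires_grad_(True)
    (lam,) = torch.autograd.grad(F(t).sum(), t, create_graph=True)
    return lam

events = torch.tensor([[0.3], [0.7], [1.2]])       # toy event times on [0, T]
T = torch.tensor([[2.0]])
# point-process NLL: -sum_i log lambda(t_i) + int_0^T lambda dt,
# where the integral is exact because lambda is the derivative of F
nll = -(intensity(events) + 1e-8).log().sum() + (F(T) - F(torch.zeros(1, 1))).sum()
nll.backward()                                      # gradients reach F's parameters
print(float(nll))
```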

DockGame: Cooperative Games for Multimeric Rigid Protein Docking

  • paper_url: http://arxiv.org/abs/2310.06177
  • repo_url: https://github.com/vsomnath/dockgame
  • paper_authors: Vignesh Ram Somnath, Pier Giuseppe Sessa, Maria Rodriguez Martinez, Andreas Krause
  • for: Predicting the structure of multi-protein complexes from their constituents, i.e., the protein docking problem.
  • methods: A game-theoretic docking framework that views protein docking as a cooperative game between proteins and computes stable equilibria via simultaneous gradient updates; alternatively, a diffusion generative model over the proteins' action spaces is learned to sample from the Gibbs distribution of the true underlying potential.
  • results: On the Docking Benchmark 5.5 (DB5.5) dataset, DockGame runs much faster than traditional docking methods, generates multiple plausible assembly structures, and performs comparably to existing binary docking baselines.
    Abstract Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or generative modeling paradigm. In this paper, we focus on the less-studied multimeric (i.e., two or more proteins) docking problem. We introduce DockGame, a novel game-theoretic framework for docking -- we view protein docking as a cooperative game between proteins, where the final assembly structure(s) constitute stable equilibria w.r.t. the underlying game potential. Since we do not have access to the true potential, we consider two approaches - i) learning a surrogate game potential guided by physics-based energy functions and computing equilibria by simultaneous gradient updates, and ii) sampling from the Gibbs distribution of the true potential by learning a diffusion generative model over the action spaces (rotations and translations) of all proteins. Empirically, on the Docking Benchmark 5.5 (DB5.5) dataset, DockGame has much faster runtimes than traditional docking methods, can generate multiple plausible assembly structures, and achieves comparable performance to existing binary docking baselines, despite solving the harder task of coordinating multiple protein chains.
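A toy sketch of the cooperative-game view, under assumed stand-ins: each "protein" controls its own 2D translation, all players share a hand-written surrogate potential (the paper learns one guided by physics-based energy functions over rotations and translations), and simultaneous gradient updates drive the players toward a stable equilibrium.

```python
import torch

torch.manual_seed(0)
players = [torch.randn(2, requires_grad=True) for _ in range(3)]  # 3 toy "proteins"

def potential(pos):
    # hand-written surrogate energy: pairwise springs toward unit separation
    e = torch.zeros(())
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            e = e + ((pos[i] - pos[j]).norm() - 1.0) ** 2
    return e

opt = torch.optim.SGD(players, lr=0.05)
for _ in range(200):              # simultaneous gradient updates for all players
    opt.zero_grad()
    potential(players).backward()
    opt.step()
print(float(potential(players)))  # near 0 at a stable equilibrium of the game
```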

Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

  • paper_url: http://arxiv.org/abs/2310.06161
  • repo_url: https://github.com/estija/cmid
  • paper_authors: Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan
  • for: addressing simplicity bias in neural networks and improving OOD generalization, subgroup robustness, and fairness
  • methods: regularizing the conditional mutual information of a simple model to obtain a more diverse set of features for making predictions
  • results: effective in various problem settings and real-world applications, leading to more diverse feature usage, enhanced OOD generalization, improved subgroup robustness, and fairness, with theoretical analyses of the effectiveness and OOD generalization properties.
    Abstract Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead to the model making biased predictions which have poor out-of-distribution (OOD) generalization. To address this, we propose a framework that encourages the model to use a more diverse set of features to make predictions. We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model. We demonstrate the effectiveness of this framework in various problem settings and real-world applications, showing that it effectively addresses simplicity bias and leads to more features being used, enhances OOD generalization, and improves subgroup robustness and fairness. We complement these results with theoretical analyses of the effect of the regularization and its OOD generalization properties.

Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization

  • paper_url: http://arxiv.org/abs/2310.06159
  • repo_url: None
  • paper_authors: Cong Ma, Xingyu Xu, Tian Tong, Yuejie Chi
  • for: Estimating low-rank objects (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements.
  • methods: Simple iterative methods such as gradient descent (GD) are used to recover the low-rank factors directly, with small memory and computation footprints.
  • results: The ScaledGD algorithm converges linearly at a constant rate independent of the condition number of the low-rank object, and achieves fast global convergence across tasks including sensing, robust PCA, and matrix completion.
    Abstract Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor factorization, one of the most popular approaches is to employ simple iterative algorithms such as gradient descent (GD) to recover the low-rank factors directly, which allow for small memory and computation footprints. However, the convergence rate of GD depends linearly, and sometimes even quadratically, on the condition number of the low-rank object, and therefore, GD slows down painstakingly when the problem is ill-conditioned. This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that provably converges linearly at a constant rate independent of the condition number of the low-rank object, while maintaining the low per-iteration cost of gradient descent for a variety of tasks including sensing, robust principal component analysis and completion. In addition, ScaledGD continues to admit fast global convergence to the minimax-optimal solution, again almost independent of the condition number, from a small random initialization when the rank is over-specified in the presence of Gaussian noise. In total, ScaledGD highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the symmetry in low-rank factorization without hurting generalization.
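A NumPy sketch of the ScaledGD update on a toy, fully observed low-rank factorization (the chapter covers sensing, robust PCA, and completion): each factor's gradient is preconditioned by the inverse Gram matrix of the other factor, which removes the condition-number dependence of plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 60, 50, 3
# rank-3 target with condition number ~100
Y = rng.standard_normal((n, r)) @ np.diag([100.0, 10.0, 1.0]) @ rng.standard_normal((r, m))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)       # spectral initialization
L = U[:, :r] * np.sqrt(s[:r])
R = Vt[:r].T * np.sqrt(s[:r])
L = L + 0.1 * rng.standard_normal(L.shape)             # perturb away from the optimum

eta = 0.5
for _ in range(100):
    E = L @ R.T - Y                                    # residual
    L_new = L - eta * E @ R @ np.linalg.inv(R.T @ R)   # preconditioned updates
    R_new = R - eta * E.T @ L @ np.linalg.inv(L.T @ L)
    L, R = L_new, R_new
print(np.linalg.norm(L @ R.T - Y) / np.linalg.norm(Y))  # tiny, despite ill-conditioning
```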

Manifold-augmented Eikonal Equations: Geodesic Distances and Flows on Differentiable Manifolds

  • paper_url: http://arxiv.org/abs/2310.06157
  • repo_url: None
  • paper_authors: Daniel Kelshaw, Luca Magri
  • for: This work proposes a model-based parameterisation of distance fields and geodesic flows on manifolds, enabling statistics and reduced-order modelling on differentiable manifolds.
  • methods: Distance fields and geodesic flows are parameterised using solutions of a manifold-augmented Eikonal equation.
  • results: The geometry of the manifold is shown to impact the distance field, and the geodesic flow yields globally length-minimising curves directly; this opens opportunities for statistics and reduced-order modelling on differentiable manifolds.
    Abstract Manifolds discovered by machine learning models provide a compact representation of the underlying data. Geodesics on these manifolds define locally length-minimising curves and provide a notion of distance, which are key for reduced-order modelling, statistical inference, and interpolation. In this work, we propose a model-based parameterisation for distance fields and geodesic flows on manifolds, exploiting solutions of a manifold-augmented Eikonal equation. We demonstrate how the geometry of the manifold impacts the distance field, and exploit the geodesic flow to obtain globally length-minimising curves directly. This work opens opportunities for statistics and reduced-order modelling on differentiable manifolds.
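A small sketch of the model-based idea in a toy 2D Euclidean setting: a network parameterises the distance field and is trained on the Eikonal residual ||grad d|| = 1 plus a source boundary condition. The manifold-augmented version in the paper additionally accounts for the manifold geometry; that part is omitted here.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
source = torch.zeros(1, 2)                    # distances are measured from here
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = 2 * torch.rand(256, 2) - 1            # collocation points in [-1, 1]^2
    x.requires_grad_(True)
    (g,) = torch.autograd.grad(net(x).sum(), x, create_graph=True)
    eikonal = ((g.norm(dim=1) - 1.0) ** 2).mean()   # ||grad d|| = 1 residual
    boundary = net(source).pow(2).mean()            # d(source) = 0
    loss = eikonal + 10.0 * boundary
    opt.zero_grad(); loss.backward(); opt.step()

print(net(torch.tensor([[0.5, 0.0]])).item())  # roughly +/-0.5 (a signed distance)
```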

Latent Diffusion Model for DNA Sequence Generation

  • paper_url: http://arxiv.org/abs/2310.06150
  • repo_url: None
  • paper_authors: Zehui Li, Yuhao Ni, Tim August B. Huygelen, Akashaditya Das, Guoxuan Xia, Guy-Bart Stan, Yiren Zhao
  • for: This work proposes a latent diffusion model, DiscDiff, for discrete DNA sequence generation.
  • methods: An autoencoder embeds discrete DNA sequences into a continuous latent space, so the strong generative abilities of continuous diffusion models can be leveraged to generate discrete data.
  • results: DiscDiff generates synthetic DNA sequences that closely match real DNA in motif distribution, latent embedding distribution (measured by the proposed Fréchet Reconstruction Distance, FReD), and chromatin profiles; the work also contributes a cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics.
    Abstract The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models that are not burdened with these problems, enabling them to reach the state-of-the-art in domains such as image generation. In light of this, we propose a novel latent diffusion model, DiscDiff, tailored for discrete DNA sequence generation. By simply embedding discrete DNA sequences into a continuous latent space using an autoencoder, we are able to leverage the powerful generative abilities of continuous diffusion models for the generation of discrete data. Additionally, we introduce Fréchet Reconstruction Distance (FReD) as a new metric to measure the sample quality of DNA sequence generations. Our DiscDiff model demonstrates an ability to generate synthetic DNA sequences that align closely with real DNA in terms of Motif Distribution, Latent Embedding Distribution (FReD), and Chromatin Profiles. Additionally, we contribute a comprehensive cross-species dataset of 150K unique promoter-gene sequences from 15 species, enriching resources for future generative modelling in genomics. We will make our code public upon publication.
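A compact sketch, with assumed toy dimensions and architectures, of the two DiscDiff ingredients: an autoencoder embedding one-hot DNA into a continuous latent space, and the standard forward diffusion q(z_t | z_0) that a latent diffusion model would then be trained to invert.

```python
import torch

torch.manual_seed(0)
L_SEQ, VOCAB, D = 16, 4, 8                    # sequence length, {A,C,G,T}, latent dim
enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(L_SEQ * VOCAB, D))
dec = torch.nn.Sequential(torch.nn.Linear(D, L_SEQ * VOCAB),
                          torch.nn.Unflatten(1, (L_SEQ, VOCAB)))

seq = torch.randint(0, VOCAB, (32, L_SEQ))    # a toy batch of DNA sequences
x = torch.nn.functional.one_hot(seq, VOCAB).float()
z0 = enc(x)                                   # continuous latent embedding

betas = torch.linspace(1e-4, 0.02, 1000)      # standard DDPM noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
t = torch.randint(0, 1000, (32,))
noise = torch.randn_like(z0)
zt = alpha_bar[t, None].sqrt() * z0 + (1 - alpha_bar[t, None]).sqrt() * noise  # q(z_t|z_0)

recon = dec(z0)                               # autoencoder reconstruction branch
loss = torch.nn.functional.cross_entropy(recon.reshape(-1, VOCAB), seq.reshape(-1))
print(zt.shape, float(loss))
```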

On the Correlation between Random Variables and their Principal Components

  • paper_url: http://arxiv.org/abs/2310.06139
  • repo_url: None
  • paper_authors: Zenon Gniazdowski
  • for: This study derives an algebraic formula for the correlation coefficients between random variables and the principal components that represent them.
  • methods: Starting from selected statistics of individual random variables, their equivalents for a set of random variables are expressed in the language of linear algebra using vectors and matrices, which allows the expected formula to be derived in subsequent steps.
  • results: The derived formula is identical to the formula used in factor analysis to calculate factor loadings, and can be applied to optimize the number of principal components in PCA as well as the number of factors in factor analysis.
    Abstract The article attempts to find an algebraic formula describing the correlation coefficients between random variables and the principal components representing them. As a result of the analysis, starting from selected statistics relating to individual random variables, the equivalents of these statistics relating to a set of random variables were presented in the language of linear algebra, using the concepts of vector and matrix. This made it possible, in subsequent steps, to derive the expected formula. The formula found is identical to the formula used in Factor Analysis to calculate factor loadings. The discussion showed that it is possible to apply this formula to optimize the number of principal components in Principal Component Analysis, as well as to optimize the number of factors in Factor Analysis.
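The formula in question is the classical factor-loading identity: for standardized variables, corr(X_j, PC_k) = v_{jk} * sqrt(lambda_k), where v_k and lambda_k are the k-th eigenvector and eigenvalue of the correlation matrix. A quick numerical check on toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # correlated toy data
Z = (X - X.mean(0)) / X.std(0)                  # standardize the variables
C = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(C)
order = eigval.argsort()[::-1]                  # sort eigenpairs by variance
eigval, eigvec = eigval[order], eigvec[:, order]

PC = Z @ eigvec                                 # principal components
loadings = eigvec * np.sqrt(eigval)             # the algebraic formula
empirical = np.array([[np.corrcoef(Z[:, j], PC[:, k])[0, 1]
                       for k in range(4)] for j in range(4)])
print(np.allclose(loadings, empirical))         # True: formula matches correlations
```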

Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach

  • paper_url: http://arxiv.org/abs/2310.06112
  • repo_url: https://github.com/fshp971/adv-ntk
  • paper_authors: Shaopeng Fu, Di Wang
  • for: This paper provides a theoretical explanation of robust overfitting in adversarial training (AT) of deep neural networks (DNNs).
  • methods: It non-trivially extends neural tangent kernel (NTK) theory to AT and proves that an adversarially trained wide DNN can be well approximated by a linearized DNN.
  • results: Experiments show that the proposed Adv-NTK algorithm helps infinite-width DNNs achieve robustness comparable to their finite-width counterparts, supporting the theoretical findings.
    Abstract Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies empirically demonstrated that it suffers from robust overfitting, i.e., a long time AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveals a new AT degeneration phenomenon: a long-term AT will result in a wide DNN degenerates to that obtained without AT and thus cause robust overfitting. Based on our theoretical results, we further design a method namely Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs enhance comparable robustness to that of their finite-width counterparts, which in turn justifies our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
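A minimal sketch of the linearized-network approximation that NTK-style analyses (including this paper's extension to adversarial training) rely on: f_lin(x; theta) = f(x; theta0) + J(x; theta0)(theta - theta0), computed here with torch.func on a toy network; the architecture and perturbation scale are arbitrary choices for illustration.

```python
import torch
from torch.func import functional_call, jvp

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
theta0 = {k: v.detach().clone() for k, v in net.named_parameters()}

def f(params, x):
    return functional_call(net, params, (x,))

def f_lin(params, x):
    # first-order Taylor expansion of the network around theta0
    delta = {k: params[k] - theta0[k] for k in params}
    out0, jvp_out = jvp(lambda p: f(p, x), (theta0,), (delta,))
    return out0 + jvp_out

x = torch.randn(5, 2)
params = {k: v + 0.01 * torch.randn_like(v) for k, v in theta0.items()}
print((f(params, x) - f_lin(params, x)).abs().max())  # small for small delta
```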

Grokking as the Transition from Lazy to Rich Training Dynamics

  • paper_url: http://arxiv.org/abs/2310.06110
  • repo_url: None
  • paper_authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
  • for: This paper studies the grokking phenomenon, where a neural network's train loss decreases much earlier than its test loss, proposing that it arises as the network transitions from lazy training dynamics to a rich, feature-learning regime.
  • methods: The authors study vanilla gradient descent on a polynomial regression problem with a two-layer neural network, which exhibits grokking without regularization in a way that existing theories cannot explain; they identify sufficient statistics for the test loss and track them over training, showing that grokking arises when the network first fits a kernel regression solution with its initial features and only later, after train loss is already low, learns features that generalize.
  • results: The key determinants of grokking are the rate of feature learning, which can be controlled precisely by parameters that scale the network output, and the alignment of the initial features with the target function $y(x)$; delayed generalization arises when the dataset is large enough for the network to eventually generalize, but not so large that train loss perfectly tracks test loss at all epochs.
    Abstract We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhibits grokking without regularization in a way that cannot be explained by existing theories. We identify sufficient statistics for the test loss of such a network, and tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning where a generalizing solution is identified after train loss is already low. We find that the key determinants of grokking are the rate of feature learning -- which can be controlled precisely by parameters that scale the network output -- and the alignment of the initial features with the target function $y(x)$. We argue this delayed generalization arises when (1) the top eigenvectors of the initial neural tangent kernel and the task labels $y(x)$ are misaligned, but (2) the dataset size is large enough so that it is possible for the network to generalize eventually, but not so large that train loss perfectly tracks test loss at all epochs, and (3) the network begins training in the lazy regime so does not learn features immediately. We conclude with evidence that this transition from lazy (linear model) to rich training (feature learning) can control grokking in more general settings, like on MNIST, one-layer Transformers, and student-teacher networks.

Quantifying Uncertainty in Deep Learning Classification with Noise in Discrete Inputs for Risk-Based Decision Making

  • paper_url: http://arxiv.org/abs/2310.06105
  • repo_url: None
  • paper_authors: Maryam Kheirandish, Shengfan Zhang, Donald G. Catanzaro, Valeriu Crudu
  • for: This paper provides a framework for quantifying prediction uncertainty of deep neural network models on tabular data with categorical and discrete feature variables.
  • methods: Building on Bayesian deep learning, the proposed framework models prediction uncertainty arising from errors in predictors that follow known finite discrete distributions, and is compared against Monte Carlo dropout.
  • results: The proposed framework identifies risk-sensitive cases that are prone to misclassification due to errors in predictors, and is more aware of misclassification cases than Monte Carlo dropout.
    Abstract The use of Deep Neural Network (DNN) models in risk-based decision-making has attracted extensive attention with broad applications in medical, finance, manufacturing, and quality control. To mitigate prediction-related risks in decision making, prediction confidence or uncertainty should be assessed alongside the overall performance of algorithms. Recent studies on Bayesian deep learning help quantify prediction uncertainty arising from input noise and model parameters. However, the normality assumption of input noise in these models limits their applicability to problems involving categorical and discrete feature variables in tabular datasets. In this paper, we propose a mathematical framework to quantify prediction uncertainty for DNN models. The prediction uncertainty arises from errors in predictors that follow some known finite discrete distribution. We then conducted a case study using the framework to predict treatment outcome for tuberculosis patients during their course of treatment. The results demonstrate that under a certain level of risk, we can identify risk-sensitive cases, which are prone to be misclassified due to error in predictors. Compared to the Monte Carlo dropout method, our proposed framework is more aware of misclassification cases. Our proposed framework for uncertainty quantification in deep learning can support risk-based decision making in applications when discrete errors in predictors are present.
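A schematic sketch, with a made-up error model and classifier, of the core setting: propagate known discrete predictor noise through a fixed model by resampling inputs, and flag cases whose prediction distribution is spread out as risk-sensitive. This illustrates the problem setup, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def classifier(x):                     # stand-in for a trained DNN
    return (x.sum(axis=1) > 1.5).astype(int)

x_obs = np.array([1, 0, 1])            # observed binary predictors
flip_prob = np.array([0.1, 0.05, 0.2]) # known per-feature error rates

draws = rng.random((5000, 3)) < flip_prob
x_samples = np.where(draws, 1 - x_obs, x_obs)   # resample inputs under the error model
preds = classifier(x_samples)
p_hat = preds.mean()
print(f"P(class 1) = {p_hat:.3f}; flagged risk-sensitive: {0.2 < p_hat < 0.8}")
```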

Transformers and Large Language Models for Chemistry and Drug Discovery

  • paper_url: http://arxiv.org/abs/2310.06083
  • repo_url: None
  • paper_authors: Andres M Bran, Philippe Schwaller
  • for: This chapter explores how the Transformer architecture can address important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration.
  • methods: Transformer architectures are applied to different types of data, such as linearised molecular graphs, spectra from analytical instruments, synthesis actions, and human language.
  • results: The chapter describes a new wave of models that leverage the flexibility of natural language to solve generic tasks in chemistry, positioning machine learning for an even more integral role in future scientific discovery.
    Abstract Language modeling has seen impressive progress over the last years, mainly prompted by the invention of the Transformer architecture, sparking a revolution in many fields of machine learning, with breakthroughs in chemistry and biology. In this chapter, we explore how analogies between chemical and natural language have inspired the use of Transformers to tackle important bottlenecks in the drug discovery process, such as retrosynthetic planning and chemical space exploration. The revolution started with models able to perform particular tasks with a single type of data, like linearised molecular graphs, which then evolved to include other types of data, like spectra from analytical instruments, synthesis actions, and human language. A new trend leverages recent developments in large language models, giving rise to a wave of models capable of solving generic tasks in chemistry, all facilitated by the flexibility of natural language. As we continue to explore and harness these capabilities, we can look forward to a future where machine learning plays an even more integral role in accelerating scientific discovery.

Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

  • paper_url: http://arxiv.org/abs/2310.06081
  • repo_url: None
  • paper_authors: Aleksei Ustimenko, Aleksandr Beznosikov
  • for: This work studies a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some stochastic differential equation.
  • methods: The chain allows almost arbitrary isotropic and state-dependent noise, instead of the normal, state-independent noise used in most related papers; moreover, its drift and diffusion coefficients can be inexact, covering a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, and Stochastic Gradient Boosting.
  • results: An upper bound is proven on the $W_{2}$-distance between the laws of the Ito chain and the corresponding stochastic differential equation; these results improve or cover most known estimates, and for some particular cases the analysis is the first.
    Abstract This work considers a rather general and broad class of Markov chains, Ito chains that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of the normal and state-independent one, as in most related papers. Moreover, our chain's drift and diffusion coefficients can be inexact to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, or Stochastic Gradient Boosting. We prove an upper bound for the $W_{2}$-distance between laws of the Ito chain and the corresponding Stochastic Differential Equation. These results improve or cover most of the known estimates. Moreover, for some particular cases, our analysis is the first.
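A minimal concrete instance of the chain class in question, an Euler-Maruyama-style recursion with (potentially) state-dependent noise; with the toy choices below it reduces to SGLD targeting N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
eta, steps = 0.01, 10_000
grad_f = lambda x: x                 # f(x) = x^2 / 2, so the target is N(0, 1)
sigma = lambda x: 1.0                # state-dependent in general; constant here

x, xs = 0.0, []
for _ in range(steps):
    # x_{k+1} = x_k - eta * grad f(x_k) + sqrt(2*eta) * sigma(x_k) * xi_k
    x = x - eta * grad_f(x) + np.sqrt(2 * eta) * sigma(x) * rng.standard_normal()
    xs.append(x)
print(np.mean(xs), np.var(xs))       # close to the moments of N(0, 1)
```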

Optimal Exploration is no harder than Thompson Sampling

  • paper_url: http://arxiv.org/abs/2310.06069
  • repo_url: None
  • paper_authors: Zhaoqi Li, Kevin Jamieson, Lalit Jain
  • for: This paper aims to solve the pure exploration linear bandit problem with high probability through noisy measurements of $x^{\top}\theta_{\ast}$.
  • methods: The paper proposes an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate with the optimal exponent among all possible allocations asymptotically.
  • results: The algorithm proposed in the paper can be easily implemented and performs as well empirically as existing asymptotically optimal methods.
    Abstract Given a set of arms $\mathcal{Z}\subset \mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure exploration linear bandit problem aims to return $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$, with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in \mathcal{X}\subset \mathbb{R}^d$. Existing (asymptotically) optimal methods require either a) potentially costly projections for each arm $z\in \mathcal{Z}$ or b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which just requires access to a posterior sampling and argmax oracle, and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being the optimal among all possible allocations asymptotically. In addition, we show that our algorithm can be easily implemented and performs as well empirically as existing asymptotically optimal methods.

Early Warning via tipping-preserving latent stochastic dynamical system and meta label correcting

  • paper_url: http://arxiv.org/abs/2310.06059
  • repo_url: None
  • paper_authors: Peng Zhang, Ting Gao, Jin Guo, Jinqiao Duan
  • for: Predicting seizures in epilepsy patients in advance, to improve their safety and well-being.
  • methods: A meta-learning framework built on patients' EEG data that uses a meta label correcting method, fusing information from the real data and augmented data from a latent stochastic differential equation (SDE); the latent dynamical system is selected optimally via the distribution of transition times between the real data and that of the latent SDE.
  • results: Experiments validate the method, showing a surprising increase in prediction accuracy.
    Abstract Early warning for epilepsy patients is crucial for their safety and well-being, in terms of preventing or minimizing the severity of seizures. Through the patients' EEG data, we propose a meta learning framework for improving prediction on early ictal signals. To better utilize the meta label corrector method, we fuse the information from both the real data and the augmented data from the latent Stochastic differential equation (SDE). Besides, we also optimally select the latent dynamical system via the distribution of transition times between the real data and that of the latent SDE. In this way, the extracted tipping dynamical feature is also integrated into the meta network to better label the noisy data. To validate our method, LSTM is implemented as the baseline model. We conduct a series of experiments to predict seizures over various long-term windows from 1-2 second input data and find a surprising increase in prediction accuracy.

Knowledge Distillation for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.06047
  • repo_url: https://github.com/HibikiJie/Multiresolution-Knowledge-Distillation-for-Anomaly-Detection
  • paper_authors: Adrian Alan Pol, Ekaterina Govorkova, Sonja Gronroos, Nadezda Chernyavskaya, Philip Harris, Maurizio Pierini, Isobel Ojalvo, Peter Elmer
  • for: Compressing unsupervised deep learning models so they can be deployed on resource-constrained devices.
  • methods: Knowledge distillation is used to compress an unsupervised anomaly detection model into a supervised, deployable one, together with a set of techniques to improve detection sensitivity.
  • results: Compressed models perform comparably to their larger counterparts while significantly reducing size and memory footprint.
    Abstract Unsupervised deep learning techniques are widely used to identify anomalous behaviour. The performance of such methods is a product of the amount of training data and the model size. However, the size is often a limiting factor for the deployment on resource-constrained devices. We present a novel procedure based on knowledge distillation for compressing an unsupervised anomaly detection model into a supervised deployable one and we suggest a set of techniques to improve the detection sensitivity. Compressed models perform comparably to their larger counterparts while significantly reducing the size and memory footprint.
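A minimal sketch of the distillation setup, with assumed toy architectures: a small student regresses the frozen teacher's anomaly scores on unlabeled data, turning the unsupervised detector into a supervised, deployable regressor.

```python
import torch

torch.manual_seed(0)
teacher = torch.nn.Sequential(torch.nn.Linear(16, 128), torch.nn.ReLU(),
                              torch.nn.Linear(128, 1))    # large pretrained scorer
student = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(),
                              torch.nn.Linear(8, 1))      # small deployable model
teacher.eval()

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(500):
    x = torch.randn(64, 16)                               # unlabeled batch
    with torch.no_grad():
        target = teacher(x)                               # teacher anomaly score
    loss = torch.nn.functional.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))                                        # student tracks the teacher
```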

Conformal Decision Theory: Safe Autonomous Decisions from Imperfect Predictions

  • paper_url: http://arxiv.org/abs/2310.05921
  • repo_url: None
  • paper_authors: Jordan Lekeufack, Anastasios N. Angelopoulos, Andrea Bajcsy, Michael I. Jordan, Jitendra Malik
  • for: This paper provides a framework for producing safe autonomous decisions despite imperfect machine learning predictions.
  • methods: It extends conformal prediction to calibrate decisions directly, without requiring any assumptions on the world model.
  • results: Experiments demonstrate the utility of the approach in robot motion planning around humans, automated stock trading, and robot manufacturing.
    Abstract We introduce Conformal Decision Theory, a framework for producing safe autonomous decisions despite imperfect machine learning predictions. Examples of such decisions are ubiquitous, from robot planning algorithms that rely on pedestrian predictions, to calibrating autonomous manufacturing to exhibit high throughput and low error, to the choice of trusting a nominal policy versus switching to a safe backup policy at run-time. The decisions produced by our algorithms are safe in the sense that they come with provable statistical guarantees of having low risk without any assumptions on the world model whatsoever; the observations need not be I.I.D. and can even be adversarial. The theory extends results from conformal prediction to calibrate decisions directly, without requiring the construction of prediction sets. Experiments demonstrate the utility of our approach in robot motion planning around humans, automated stock trading, and robot manufacturing.
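A sketch of the kind of online threshold calibration such decision-calibration methods build on (an adaptive, ACI-style update; the toy loss model and parameter names are illustrative, not the paper's algorithm): a decision parameter lambda is nudged each step so the long-run loss rate tracks the target risk epsilon, with no distributional assumptions on the stream.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon, eta, lam = 0.1, 0.05, 0.5   # target risk, step size, initial threshold
losses = []
for t in range(5000):
    score = rng.random()                          # imperfect prediction's risk score
    cautious = score > lam                        # decision induced by lambda
    loss = float((not cautious) and rng.random() < score)  # toy 0/1 loss model
    lam -= eta * (loss - epsilon)                 # losses lower lambda -> more caution
    losses.append(loss)
print(np.mean(losses))                            # hovers near epsilon = 0.1
```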

Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network

  • paper_url: http://arxiv.org/abs/2310.05900
  • repo_url: None
  • paper_authors: Johannes Bausch, Andrew W Senior, Francisco J H Heras, Thomas Edlich, Alex Davies, Michael Newman, Cody Jones, Kevin Satzinger, Murphy Yuezhen Niu, Sam Blackwell, George Holland, Dvir Kafri, Juan Atalaya, Craig Gidney, Demis Hassabis, Sergio Boixo, Hartmut Neven, Pushmeet Kohli
  • for: Improving the reliability of quantum computation by using machine learning to decode quantum error-correction codes.
  • methods: A recurrent, transformer-based neural network learns to decode the surface code directly from data.
  • results: The decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance-3 and distance-5 surface codes, maintains its advantage on simulated data with realistic noise (including cross-talk, leakage, and analog readout signals) up to distance 11, and sustains its accuracy far beyond the 25 cycles it was trained on.
    Abstract Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On distances up to 11, the decoder maintains its advantage on simulated data with realistic noise including cross-talk, leakage, and analog readout signals, and sustains its accuracy far beyond the 25 cycles it was trained on. Our work illustrates the ability of machine learning to go beyond human-designed algorithms by learning from data directly, highlighting machine learning as a strong contender for decoding in quantum computers.

A Generalization Bound of Deep Neural Networks for Dependent Data

  • paper_url: http://arxiv.org/abs/2310.05892
  • repo_url: https://github.com/umd-huang-lab/neural-net-generalization-via-tensor
  • paper_authors: Quan Huu Do, Binh T. Nguyen, Lam Si Tung Ho
  • for: This work establishes a generalization bound for deep neural networks on non-stationary $\phi$-mixing data.
  • methods: A new generalization bound is derived for feed-forward neural networks without the usual iid assumption.
  • results: The bound applies to non-stationary $\phi$-mixing data, covering real-life settings where existing iid-based bounds do not hold.
    Abstract Existing generalization bounds for deep neural networks require data to be independent and identically distributed (iid). This assumption may not hold in real-life applications such as evolutionary biology, infectious disease epidemiology, and stock price prediction. This work establishes a generalization bound of feed-forward neural networks for non-stationary $\phi$-mixing data.

A Machine Learning Approach to Predicting Single Event Upsets

  • paper_url: http://arxiv.org/abs/2310.05878
  • repo_url: https://github.com/architg1/CREMER
  • paper_authors: Archit Gupta, Chong Yock Eng, Deon Lim Meng Wee, Rashna Analia Ahmed, See Min Sim
  • for: Predicting single event upsets (SEUs) in advance to improve the reliability of semiconductor devices.
  • methods: Machine learning is used to predict SEU occurrence from positional data only, making the model robust, inexpensive, and scalable.
  • results: Improved reliability of memory devices, creating a digitally safer environment onboard space vehicles.
    Abstract A single event upset (SEU) is a critical soft error that occurs in semiconductor devices on exposure to ionising particles from space environments. SEUs cause bit flips in the memory component of semiconductors. This creates a multitude of safety hazards as stored information becomes less reliable. Currently, SEUs are only detected several hours after their occurrence. CREMER, the model presented in this paper, predicts SEUs in advance using machine learning. CREMER uses only positional data to predict SEU occurrence, making it robust, inexpensive and scalable. Upon implementation, the improved reliability of memory devices will create a digitally safer environment onboard space vehicles.

Bio-inspired computational memory model of the Hippocampus: an approach to a neuromorphic spike-based Content-Addressable Memory

  • paper_url: http://arxiv.org/abs/2310.05868
  • repo_url: None
  • paper_authors: Daniel Casanueva-Morato, Alvaro Ayuso-Martinez, Juan P. Dominguez-Morales, Angel Jimenez-Fernandez, Gabriel Jimenez-Moreno
  • for: This work develops a bio-inspired memory system based on the CA3 region of the hippocampus, capable of learning, forgetting, and recalling memories from any fragment of them.
  • methods: The model is implemented with Spiking Neural Networks (SNNs) on the SpiNNaker hardware platform, and evaluated through functional, stress, and applicability tests.
  • results: The model can learn, forget, and recall both orthogonal and non-orthogonal memories from any fragment of them; this is the first hardware implementation of a fully-functional bio-inspired spiking hippocampal content-addressable memory, paving the way for more complex neuromorphic systems.
    Abstract The brain has computational capabilities that surpass those of modern systems, being able to solve complex problems efficiently in a simple way. Neuromorphic engineering aims to mimic biology in order to develop new systems capable of incorporating such capabilities. Bio-inspired learning systems continue to be a challenge that must be solved, and much work needs to be done in this regard. Among all brain regions, the hippocampus stands out as an autoassociative short-term memory with the capacity to learn and recall memories from any fragment of them. These characteristics make the hippocampus an ideal candidate for developing bio-inspired learning systems that, in addition, resemble content-addressable memories. Therefore, in this work we propose a bio-inspired spiking content-addressable memory model based on the CA3 region of the hippocampus with the ability to learn, forget and recall memories, both orthogonal and non-orthogonal, from any fragment of them. The model was implemented on the SpiNNaker hardware platform using Spiking Neural Networks. A set of experiments based on functional, stress and applicability tests were performed to demonstrate its correct functioning. This work presents the first hardware implementation of a fully-functional bio-inspired spiking hippocampal content-addressable memory model, paving the way for the development of future more complex neuromorphic systems.

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

  • paper_url: http://arxiv.org/abs/2310.05858
  • repo_url: https://github.com/jingliang-duan/dsac-t
  • paper_authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li
  • for: Improving model-free RL methods by addressing the well-known overestimation issue.
  • methods: The distributional soft actor-critic (DSAC) algorithm is refined in three ways: critic gradient adjusting, twin value distribution learning, and variance-based target return clipping.
  • results: Across a range of environments, DSAC-T surpasses mainstream model-free RL algorithms including SAC, TD3, DDPG, TRPO, and PPO, while ensuring a highly stable learning process and delivering similar performance across varying reward scales.
    Abstract Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and the need for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of critic gradient adjusting, twin value distribution learning, and variance-based target return clipping. The modified RL algorithm is named as DSAC with three refinements (DSAC-T or DSAC-v2), and its performances are systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
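A sketch of the third refinement named above, variance-based target return clipping, with illustrative shapes and an assumed bound multiplier b (the paper's exact boundary is not reproduced here): the sampled target return is kept within a band around the expected value whose width scales with the learned return standard deviation.

```python
import torch

def clip_target_return(target, q_mean, q_std, b=10.0):
    # keep sampled targets within [q_mean - b*q_std, q_mean + b*q_std]
    return torch.clamp(target, q_mean - b * q_std, q_mean + b * q_std)

print(clip_target_return(torch.tensor([4.0, 5.2, 30.0]), q_mean=5.0, q_std=0.5))
# -> tensor([ 4.0000,  5.2000, 10.0000]); the outlier target is pulled back
```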

Improved Communication Efficiency in Federated Natural Policy Gradient via ADMM-based Gradient Updates

  • paper_url: http://arxiv.org/abs/2310.19807
  • repo_url: None
  • paper_authors: Guangchen Lan, Han Wang, James Anderson, Christopher Brinton, Vaneet Aggarwal
  • for: This paper addresses the high communication overhead in federated reinforcement learning (FedRL), particularly for second-order natural policy gradient (NPG) methods, to improve training efficiency.
  • methods: The proposed FedNPG-ADMM framework leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently.
  • results: ADMM-based gradient updates are shown theoretically to reduce communication complexity from ${O}(d^{2})$ to ${O}(d)$ per iteration, where $d$ is the number of model parameters, while maintaining the same convergence rate as standard FedNPG; evaluation in MuJoCo environments shows that FedNPG-ADMM preserves the reward performance of standard FedNPG and that its convergence rate improves as the number of federated agents increases.
    Abstract Federated reinforcement learning (FedRL) enables agents to collaboratively train a global policy without sharing their individual data. However, high communication overhead remains a critical bottleneck, particularly for natural policy gradient (NPG) methods, which are second-order. To address this issue, we propose the FedNPG-ADMM framework, which leverages the alternating direction method of multipliers (ADMM) to approximate global NPG directions efficiently. We theoretically demonstrate that using ADMM-based gradient updates reduces communication complexity from ${O}(d^{2})$ to ${O}(d)$ at each iteration, where $d$ is the number of model parameters. Furthermore, we show that achieving an $\epsilon$-error stationary convergence requires ${O}(\frac{1}{(1-\gamma)^{2}\epsilon})$ iterations for discount factor $\gamma$, demonstrating that FedNPG-ADMM maintains the same convergence rate as the standard FedNPG. Through evaluation of the proposed algorithms in MuJoCo environments, we demonstrate that FedNPG-ADMM maintains the reward performance of standard FedNPG, and that its convergence rate improves when the number of federated agents increases.

Robust Angular Synchronization via Directed Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05842
  • repo_url: None
  • paper_authors: Yixuan He, Gesine Reinert, David Wipf, Mihai Cucuringu
  • for: angular synchronization problem and its heterogeneous extension (sensor network localization, phase retrieval, and distributed clock synchronization)
  • methods: directed graph neural networks and new loss functions
  • results: competitive and often superior performance against a comprehensive set of baselines, validating the robustness of GNNSync even at high noise levels.
    Abstract The angular synchronization problem aims to accurately estimate (up to a constant additive phase) a set of unknown angles $\theta_1, \dots, \theta_n\in[0, 2\pi)$ from $m$ noisy measurements of their offsets $\theta_i-\theta_j \;\mbox{mod} \; 2\pi.$ Applications include, for example, sensor network localization, phase retrieval, and distributed clock synchronization. An extension of the problem to the heterogeneous setting (dubbed $k$-synchronization) is to estimate $k$ groups of angles simultaneously, given noisy observations (with unknown group assignment) from each group. Existing methods for angular synchronization usually perform poorly in high-noise regimes, which are common in applications. In this paper, we leverage neural networks for the angular synchronization problem, and its heterogeneous extension, by proposing GNNSync, a theoretically-grounded end-to-end trainable framework using directed graph neural networks. In addition, new loss functions are devised to encode synchronization objectives. Experimental results on extensive data sets demonstrate that GNNSync attains competitive, and often superior, performance against a comprehensive set of baselines for the angular synchronization problem and its extension, validating the robustness of GNNSync even at high noise levels.
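For context, a sketch of the classical eigenvector method for angular synchronization, one of the non-learning approaches in this literature (not GNNSync itself): the angles are recovered, up to a global shift, from the top eigenvector of the Hermitian matrix of pairwise offset measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
theta = rng.uniform(0, 2 * np.pi, n)
H = np.exp(1j * (theta[:, None] - theta[None, :]))     # pairwise offsets e^{i(ti-tj)}
H += 0.5 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
H = (H + H.conj().T) / 2                               # Hermitize the noisy matrix

w, V = np.linalg.eigh(H)
est = np.angle(V[:, -1])                               # phases of the top eigenvector
align = np.exp(1j * (est - theta))
align /= align.mean() / np.abs(align.mean())           # remove the global phase shift
print(np.abs(np.angle(align)).mean())                  # small mean angular error
```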

A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models

  • paper_url: http://arxiv.org/abs/2310.05833
  • repo_url: None
  • paper_authors: Sebastian G. Gruber, Florian Buettner
  • for: to provide a theoretical framework for assessing the generalization behavior and uncertainty of generative models.
  • methods: a bias-variance-covariance decomposition of kernel scores, together with unbiased and consistent estimators for each quantity that require only generated samples, not the underlying model.
  • results: a generalization evaluation of diffusion models revealing mode collapse of minority groups as a phenomenon contrary to overfitting, and validation of variance and predictive kernel entropy as uncertainty measures for image, audio, and language generation.
    Abstract Generative models, like large language models, are becoming increasingly relevant in our daily lives, yet a theoretical framework to assess their generalization behavior and uncertainty does not exist. Particularly, the problem of uncertainty estimation is commonly solved in an ad-hoc manner and task dependent. For example, natural language approaches cannot be transferred to image generation. In this paper we introduce the first bias-variance-covariance decomposition for kernel scores and their associated entropy. We propose unbiased and consistent estimators for each quantity which only require generated samples but not the underlying model itself. As an application, we offer a generalization evaluation of diffusion models and discover how mode collapse of minority groups is a contrary phenomenon to overfitting. Further, we demonstrate that variance and predictive kernel entropy are viable measures of uncertainty for image, audio, and language generation. Specifically, our approach for uncertainty estimation is more predictive of performance on CoQA and TriviaQA question answering datasets than existing baselines and can also be applied to closed-source models.
    摘要 生成模型(如大型语言模型)在我们的日常生活中变得越来越重要,然而目前仍缺乏评估其泛化行为与不确定性的理论框架。特别是,不确定性估计问题通常以临时且依赖于具体任务的方式解决,例如自然语言领域的方法无法迁移到图像生成。在本文中,我们提出了首个针对核分数及其相关熵的偏差-方差-协方差分解。我们为每个量提出了无偏且一致的估计器,它们只需要生成的样本,而不需要底层模型本身。作为应用,我们对扩散模型进行了泛化评估,并发现少数群体的模式坍缩是一种与过拟合相反的现象。此外,我们证明方差和预测核熵是图像、音频和语言生成中可行的不确定性度量。具体而言,我们的不确定性估计方法在 CoQA 和 TriviaQA 问答数据集上对性能的预测能力优于现有基线,并且还可应用于闭源模型。
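
The "samples only, no model needed" property is easy to see in code. Below is a U-statistic estimate of an entropy-like kernel diversity term computed purely from generated samples; the exact definition of the paper's predictive kernel entropy is an assumption here, and the RBF kernel and `gamma` are placeholders.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """RBF kernel matrix between two sample sets."""
    d = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d)

def kernel_entropy(samples, gamma=1.0):
    """U-statistic estimate of -E[k(X, X')] over generated samples.

    Needs only samples, not the generative model. Whether this matches the
    paper's exact predictive kernel entropy is an assumption; it captures
    the same idea: more diverse generations -> larger value.
    """
    K = rbf(samples, samples, gamma)
    n = len(samples)
    off_diag = (K.sum() - np.trace(K)) / (n * (n - 1))  # unbiased for E[k(X,X')]
    return -off_diag

gen = np.random.randn(200, 5)   # stand-in for generated samples
print(kernel_entropy(gen))
```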

Pre-trained Spatial Priors on Multichannel NMF for Music Source Separation

  • paper_url: http://arxiv.org/abs/2310.05821
  • repo_url: None
  • paper_authors: Pablo Cabanas-Molero, Antonio J. Munoz-Montoro, Julio Carabias-Orti, Pedro Vera-Candeas
  • for: a sound source separation method that leverages spatial information from the recording setup, applicable to many existing orchestral recordings.
  • methods: training a spatial mixing filter on solo passages to capture the room impulse response and transducer response at each sensor location, then integrating this pre-trained filter into a multichannel non-negative matrix factorization (MNMF) scheme to better capture the variances of the different sources.
  • results: experiments on polyphonic ensembles show more effective separation of individual sources, improving on conventional MNMF methods.
    Abstract This paper presents a novel approach to sound source separation that leverages spatial information obtained during the recording setup. Our method trains a spatial mixing filter using solo passages to capture information about the room impulse response and transducer response at each sensor location. This pre-trained filter is then integrated into a multichannel non-negative matrix factorization (MNMF) scheme to better capture the variances of different sound sources. The recording setup used in our experiments is the typical setup for orchestra recordings, with a main microphone and a close "cardioid" or "supercardioid" microphone for each section of the orchestra. This makes the proposed method applicable to many existing recordings. Experiments on polyphonic ensembles demonstrate the effectiveness of the proposed framework in separating individual sound sources, improving performance compared to conventional MNMF methods.
    摘要 本文提出了一种新的声源分离方法,利用录音布置过程中获得的空间信息。我们的方法使用独奏段落训练一个空间混合滤波器,以捕捉每个传感器位置上的房间冲激响应和换能器响应信息。随后,将该预训练滤波器整合进多通道非负矩阵分解(MNMF)框架,以更好地刻画不同声源的方差。我们实验中使用的录音布置是管弦乐录音的典型配置,即一个主麦克风加上为每个乐器声部配置的近距离"心形"或"超心形"指向麦克风,因此该方法适用于许多现有录音。对复调合奏的实验表明,所提框架能够有效分离各个声源,性能优于传统的 MNMF 方法。

Sharing Information Between Machine Tools to Improve Surface Finish Forecasting

  • paper_url: http://arxiv.org/abs/2310.05807
  • repo_url: None
  • paper_authors: Daniel R. Clarkson, Lawrence A. Bull, Tina A. Dardeno, Chandula T. Wickramarachchi, Elizabeth J. Cross, Timothy J. Rogers, Keith Worden, Nikolaos Dervilis, Aidan J. Hughes
  • for: forecasting surface quality (surface roughness) in machining processes
  • methods: a Bayesian hierarchical model, compared against multiple independent Bayesian linear regression models
  • results: improved prediction accuracy and uncertainty quantification via partial pooling
    Abstract At present, most surface-quality prediction methods can only perform single-task prediction which results in under-utilised datasets, repetitive work and increased experimental costs. To counter this, the authors propose a Bayesian hierarchical model to predict surface-roughness measurements for a turning machining process. The hierarchical model is compared to multiple independent Bayesian linear regression models to showcase the benefits of partial pooling in a machining setting with respect to prediction accuracy and uncertainty quantification.
    摘要 目前,大多数表面质量预测方法只能进行单任务预测,这导致数据集利用不足、工作重复以及实验成本增加。为此,作者提出了一种贝叶斯分层模型,用于预测车削加工过程中的表面粗糙度测量值。该分层模型与多个相互独立的贝叶斯线性回归模型进行了比较,以展示部分汇集(partial pooling)在机加工场景中对预测精度和不确定性量化的好处。
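
The benefit of partial pooling is easiest to see in the closed-form Gaussian case: each machine's estimate is shrunk toward a shared mean, with shrinkage strength set by how much data the machine has. This is a textbook sketch of the mechanism the paper exploits, not its exact model; the hyperparameters below are placeholders.

```python
import numpy as np

def partial_pool(group_means, group_sizes, sigma2, tau2, mu):
    """Posterior means under a Gaussian hierarchical model.

    Model: theta_j ~ N(mu, tau2), y_ij ~ N(theta_j, sigma2).
    Machines with little data are shrunk strongly toward the shared
    mean mu; data-rich machines keep their own estimate.
    """
    group_means = np.asarray(group_means, float)
    n = np.asarray(group_sizes, float)
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)   # data weight per machine
    return w * group_means + (1.0 - w) * mu

# a machine with 2 samples shrinks much harder than one with 50 samples
print(partial_pool([1.2, 0.4], [2, 50], sigma2=0.5, tau2=0.1, mu=0.8))
```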

Boosted Control Functions

  • paper_url: http://arxiv.org/abs/2310.05805
  • repo_url: https://github.com/zszszszsz/.config
  • paper_authors: Nicola Gnecco, Jonas Peters, Sebastian Engelke, Niklas Pfister
  • for: bridging the gap between existing prediction methods and the presence of hidden confounding, especially when the training and testing distributions differ.
  • methods: combining distribution generalization from machine learning with simultaneous equation models and control functions from econometrics, via new simultaneous equation models for distribution generalization (SIMDGs) that describe the data-generating process under distributional shifts.
  • results: the boosted control function (BCF), which predicts successfully even under interventions on the underlying SIMDG, together with necessary and sufficient conditions for identifying it and a worst-case optimality guarantee.
    Abstract Modern machine learning methods and the availability of large-scale data opened the door to accurately predict target quantities from large sets of covariates. However, existing prediction methods can perform poorly when the training and testing data are different, especially in the presence of hidden confounding. While hidden confounding is well studied for causal effect estimation (e.g., instrumental variables), this is not the case for prediction tasks. This work aims to bridge this gap by addressing predictions under different training and testing distributions in the presence of unobserved confounding. In particular, we establish a novel connection between the field of distribution generalization from machine learning, and simultaneous equation models and control function from econometrics. Central to our contribution are simultaneous equation models for distribution generalization (SIMDGs) which describe the data-generating process under a set of distributional shifts. Within this framework, we propose a strong notion of invariance for a predictive model and compare it with existing (weaker) versions. Building on the control function approach from instrumental variable regression, we propose the boosted control function (BCF) as a target of inference and prove its ability to successfully predict even in intervened versions of the underlying SIMDG. We provide necessary and sufficient conditions for identifying the BCF and show that it is worst-case optimal. We introduce the ControlTwicing algorithm to estimate the BCF and analyze its predictive performance on simulated and real world data.
    摘要 现代机器学习方法与大规模数据的可得性,使得从大量协变量中准确预测目标量成为可能。然而,当训练数据与测试数据分布不同、尤其是存在隐藏混杂时,现有的预测方法可能表现不佳。隐藏混杂在因果效应估计中已得到充分研究(例如工具变量),但在预测任务中并非如此。本文旨在弥合这一差距,研究存在未观测混杂、且训练与测试分布不同情形下的预测问题。特别地,我们在机器学习中的分布泛化与计量经济学中的联立方程模型及控制函数之间建立了一种新的联系。我们贡献的核心是用于分布泛化的联立方程模型(SIMDGs),它刻画了一组分布偏移下的数据生成过程。在该框架内,我们为预测模型提出了一种强不变性概念,并与现有的(较弱的)版本进行比较。基于工具变量回归中的控制函数方法,我们提出了增强控制函数(BCF)作为推断目标,并证明其即便在底层 SIMDG 被干预的版本中也能成功预测。我们给出了识别 BCF 的充要条件,并证明其为最坏情况下最优。最后,我们提出 ControlTwicing 算法来估计 BCF,并在仿真与真实数据上分析其预测性能。
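
For readers unfamiliar with the control-function idea that the BCF builds on, here is the classical two-stage estimator: the first-stage residual proxies the hidden confounder and is added as a regressor in the second stage. This is only the baseline the paper starts from, not the boosted version it proposes.

```python
import numpy as np

def control_function_fit(Z, X, Y):
    """Classical control function estimator (two stages).

    Stage 1: regress the endogenous X on the instrument Z; keep the
    residual V-hat, a proxy for the hidden confounder.
    Stage 2: regress Y on X and V-hat jointly.
    """
    def ols(A, b):
        return np.linalg.lstsq(A, b, rcond=None)[0]

    Z1 = np.column_stack([np.ones(len(Z)), Z])
    V = X - Z1 @ ols(Z1, X)                      # stage 1 residuals
    X1 = np.column_stack([np.ones(len(X)), X, V])
    return ols(X1, Y)                            # [intercept, beta_X, beta_V]

rng = np.random.default_rng(1)
n = 5000
Z = rng.normal(size=n)                           # instrument
H = rng.normal(size=n)                           # hidden confounder
X = Z + H + 0.1 * rng.normal(size=n)
Y = 2.0 * X + H + 0.1 * rng.normal(size=n)
print(control_function_fit(Z, X, Y))             # beta_X recovered near 2
```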

An operator preconditioning perspective on training in physics-informed machine learning

  • paper_url: http://arxiv.org/abs/2310.05801
  • repo_url: None
  • paper_authors: Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac
  • for: investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs
  • methods: employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies for preconditioning a critical differential operator
  • results: the difficulty in training these models is closely related to the conditioning of a specific differential operator, and preconditioning this operator is crucial for improving training
    Abstract In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
    摘要 在本文中,我们研究了梯度下降算法在 PINNs 等物理信息机器学习方法中的行为,这类方法通过最小化与偏微分方程(PDE)相关的残差进行训练。我们的关键结果是:训练这类模型的困难程度与一个特定微分算子的条件数密切相关,而该算子正与底层 PDE 微分算子的 Hermitian 平方相关联。如果该算子病态,训练就会缓慢甚至不可行,因此对其进行预条件处理至关重要。我们结合严格的数学分析与实验评估来研究多种策略,解释它们如何改善这一关键算子的条件数,从而改进训练。
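
A tiny numerical example of the conditioning issue, using the 1D Laplacian as a stand-in for the paper's operators: squaring the operator (the analogue of what residual losses "see") squares its condition number, and an ideal preconditioner collapses it back to 1.

```python
import numpy as np

# Condition number of the 1D Laplacian (Dirichlet BCs) and of a
# preconditioned version. Toy sketch of the paper's point: training
# difficulty tracks the conditioning of the Hermitian square A^T A,
# and preconditioning the operator fixes it.
n = 64
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2            # -u'' finite-difference stencil

hermitian_square = A.T @ A                            # what the residual loss "sees"
print("cond(A^T A):", np.linalg.cond(hermitian_square))       # huge, ~O(n^4)

P = np.linalg.inv(A)                                  # ideal (impractical) preconditioner
M = (P @ A).T @ (P @ A)
print("cond after preconditioning:", np.linalg.cond(M))       # ~1
```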

The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

  • paper_url: http://arxiv.org/abs/2310.05799
  • repo_url: https://github.com/claritychallenge/clarity/tree/main/recipes/cad1/task1
  • paper_authors: Gerardo Roa Dabike, Scott Bannister, Jennifer Firth, Simone Graetzer, Rebecca Vos, Michael A. Akeroyd, Jon Barker, Trevor J. Cox, Bruno Fazenda, Alinka Greasley, William Whitmer
  • for: improving the audio quality of music for people with a hearing loss
  • methods: a signal processing challenge covering personalized demixing/remixing over headphones and music enhancement over car noise with a hearing aid
  • results: audio quality evaluated objectively with the Hearing Aid Audio Quality Index (HAAQI) and subjectively by a panel of listeners with hearing loss
    Abstract The Cadenza project aims to improve the audio quality of music for those who have a hearing loss. This is being done through a series of signal processing challenges, to foster better and more inclusive technologies. In the first round, two common listening scenarios are considered: listening to music over headphones, and with a hearing aid in a car. The first scenario is cast as a demixing-remixing problem, where the music is decomposed into vocals, bass, drums and other components. These can then be intelligently remixed in a personalized way, to increase the audio quality for a person who has a hearing loss. In the second scenario, music is coming from car loudspeakers, and the music has to be enhanced to overcome the masking effect of the car noise. This is done by taking into account the music, the hearing ability of the listener, the hearing aid and the speed of the car. The audio quality of the submissions will be evaluated using the Hearing Aid Audio Quality Index (HAAQI) for objective assessment and by a panel of people with hearing loss for subjective evaluation.
    摘要 Cadenza 项目旨在为听力受损人群提升音乐的音频质量,具体通过一系列信号处理挑战赛来推动更好、更具包容性的技术。第一轮比赛考虑两种常见的聆听场景:通过耳机听音乐,以及在汽车中通过助听器听音乐。第一种场景被建模为"分离-重混"问题:将音乐分解为人声、低音、鼓及其他成分,再以个性化方式智能重混,以提升听力受损者的音频质量。第二种场景中,音乐来自车载扬声器,需要综合考虑音乐本身、听者的听力状况、助听器以及车速,对音乐进行增强以克服车内噪声的掩蔽效应。参赛作品的音频质量将采用助听器音频质量指数(HAAQI)进行客观评估,并由听力受损人士组成的评审团进行主观评估。

Efficient Hybrid Oversampling and Intelligent Undersampling for Imbalanced Big Data Classification

  • paper_url: http://arxiv.org/abs/2310.05789
  • repo_url: None
  • paper_authors: Carla Vairetti, José Luis Assadi, Sebastián Maldonado
  • for: solves the issue of imbalanced classification in real-world applications
  • methods: combines intelligent undersampling and oversampling using a MapReduce framework
  • results: outperforms alternative resampling techniques for small- and medium-sized datasets, and achieves positive results on large datasets with reduced running times.
    Abstract Imbalanced classification is a well-known challenge faced by many real-world applications. This issue occurs when the distribution of the target variable is skewed, leading to a prediction bias toward the majority class. With the arrival of the Big Data era, there is a pressing need for efficient solutions to solve this problem. In this work, we present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework. Both procedures are performed on the same pass over the data, conferring efficiency to the technique. The SMOTENN method is complemented with an efficient implementation of the neighborhoods related to the minority samples. Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets while achieving positive results on large datasets with reduced running times.
    摘要 不均衡分类是许多现实应用面临的著名挑战。当目标变量的分布偏斜时,预测会偏向多数类。随着大数据时代的到来,迫切需要高效的解决方案。在本文中,我们提出了一种名为 SMOTENN 的新型重采样方法,它在 MapReduce 框架下将智能欠采样与过采样结合起来,两种过程在对数据的同一次遍历中完成,从而保证了方法的效率。SMOTENN 还配有针对少数类样本邻域的高效实现。实验结果显示了该方法的优点:在小型和中型数据集上优于其他重采样技术,同时在大型数据集上以更短的运行时间取得了良好效果。
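
For context, the oversampling half of such hybrid methods is SMOTE-style interpolation between minority samples and their nearest minority neighbours. The sketch below shows only that half; SMOTENN's intelligent undersampling and single-pass MapReduce organization are not reproduced here.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a minority sample and one
    of its k nearest minority neighbours.
    """
    rng = rng or np.random.default_rng(0)
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                      # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]                # k nearest neighbours
    base = rng.integers(0, len(X_min), n_new)        # random base samples
    neigh = nn[base, rng.integers(0, k, n_new)]      # random neighbour each
    lam = rng.uniform(0, 1, (n_new, 1))              # interpolation weights
    return X_min[base] + lam * (X_min[neigh] - X_min[base])

X_minority = np.random.randn(20, 3)
print(smote_oversample(X_minority, n_new=10).shape)  # -> (10, 3)
```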

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

  • paper_url: http://arxiv.org/abs/2310.05779
  • repo_url: https://github.com/copenlu/wiki-stance
  • paper_authors: Lucie-Aimée Kaffee, Arnav Arora, Isabelle Augenstein
  • For: improving the transparency of content moderation on online platforms, specifically Wikipedia, by constructing a novel multilingual dataset of editor discussions and their reasoning.
  • Methods: a machine learning approach that jointly predicts each editor's stance and the stated reason (content moderation policy) behind every edit decision, adding transparency to the decision-making process.
  • Results: stance and the corresponding reason (policy) can be predicted jointly with a high degree of accuracy, providing a more transparent approach to content moderation.
    Abstract The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, this discussion is carried out publicly and the editors are encouraged to use the content moderation policies as explanations for making moderation decisions. Currently, only a few comments explicitly mention those policies -- 20% of the English ones, but as few as 2% of the German and Turkish comments. To aid in this process of understanding how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions along with their reasoning in three languages. The dataset contains the stances of the editors (keep, delete, merge, comment), along with the stated reason, and a content moderation policy, for each edit decision. We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process. We release both our joint prediction models and the multilingual content moderation dataset for further research on automated transparent content moderation.
    摘要 在线平台上的内容审核通常是不透明的。然而在 Wikipedia 上,这类讨论是公开进行的,并且鼓励编辑者引用内容审核政策来解释其审核决定。目前,只有少数评论明确提及这些政策:英语评论中约有 20%,而德语和土耳其语评论中仅有约 2%。为帮助理解内容是如何被审核的,我们构建了一个新的多语言数据集,包含三种语言的 Wikipedia 编辑者讨论及其理由。数据集为每个编辑决定记录了编辑者的立场(保留、删除、合并、评论)、给出的理由以及对应的内容审核政策。我们证明,立场及其对应理由(政策)可以被联合预测且准确率很高,从而为决策过程增加透明度。我们公开发布了联合预测模型和该多语言内容审核数据集,以供自动化透明内容审核的后续研究。

Foundation Models Meet Visualizations: Challenges and Opportunities

  • paper_url: http://arxiv.org/abs/2310.05771
  • repo_url: None
  • paper_authors: Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu
  • for: This paper explores the intersection of visualization techniques and foundation models like BERT and GPT, and how they can be used to improve transparency, explainability, fairness, and robustness in AI systems.
  • methods: The paper divides the intersections of visualization techniques and foundation models into two main areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS).
  • results: The paper highlights the challenges and opportunities that arise from the confluence of foundation models and visualizations, and provides a starting point for continued exploration of this promising avenue.
    Abstract Recent studies have indicated that foundation models, such as BERT and GPT, excel in adapting to a variety of downstream tasks. This adaptability has established them as the dominant force in building artificial intelligence (AI) systems. As visualization techniques intersect with these models, a new research paradigm emerges. This paper divides these intersections into two main areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS). In VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate models. This addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, within FM4VIS, we highlight how foundation models can be utilized to advance the visualization field itself. The confluence of foundation models and visualizations holds great promise, but it also comes with its own set of challenges. By highlighting these challenges and the growing opportunities, this paper seeks to provide a starting point for continued exploration in this promising avenue.
    摘要 近期研究表明,BERT 和 GPT 等基础模型在适应各类下游任务方面表现出色,这种适应能力使其成为构建人工智能(AI)系统的主导力量。当可视化技术与这些模型交汇时,一种新的研究范式随之出现。本文将这些交汇点划分为两大领域:面向基础模型的可视化(VIS4FM)与面向可视化的基础模型(FM4VIS)。在 VIS4FM 中,我们探讨可视化在理解、改进和评估这些复杂模型中的核心作用,以回应对透明度、可解释性、公平性和鲁棒性的迫切需求。反过来,在 FM4VIS 中,我们强调基础模型如何被用于推动可视化领域自身的发展。基础模型与可视化的交汇蕴含着巨大的前景,但也伴随着一系列挑战。通过指出这些挑战与不断增长的机遇,本文希望为这一充满希望的方向的持续探索提供一个起点。

LCOT: Linear circular optimal transport

  • paper_url: http://arxiv.org/abs/2310.06002
  • repo_url: None
  • paper_authors: Rocio Diaz Martin, Ivan Medri, Yikun Bai, Xinran Liu, Kangbai Yan, Gustavo K. Rohde, Soheil Kolouri
  • for: circular probability measures (measures supported on the unit circle) and a new computationally efficient metric for them, Linear Circular Optimal Transport (LCOT).
  • methods: an explicit linear embedding that lets machine learning algorithms operate on the embedded measures while seamlessly switching the underlying metric to LCOT; the metric is the linearization of Circular Optimal Transport (COT) with respect to a fixed reference measure.
  • results: numerical experiments demonstrating the benefits of LCOT for learning representations of circular measures, outperforming the COT metric.
    Abstract The optimal transport problem for measures supported on non-Euclidean spaces has recently gained ample interest in diverse applications involving representation learning. In this paper, we focus on circular probability measures, i.e., probability measures supported on the unit circle, and introduce a new computationally efficient metric for these measures, denoted as Linear Circular Optimal Transport (LCOT). The proposed metric comes with an explicit linear embedding that allows one to apply Machine Learning (ML) algorithms to the embedded measures and seamlessly modify the underlying metric for the ML algorithm to LCOT. We show that the proposed metric is rooted in the Circular Optimal Transport (COT) and can be considered the linearization of the COT metric with respect to a fixed reference measure. We provide a theoretical analysis of the proposed metric and derive the computational complexities for pairwise comparison of circular probability measures. Lastly, through a set of numerical experiments, we demonstrate the benefits of LCOT in learning representations of circular measures.
    摘要 支撑在非欧几里得空间上的测度的最优传输问题,近来在涉及表示学习的多种应用中引起了广泛关注。本文关注圆形概率测度,即支撑在单位圆上的概率测度,并为这类测度提出了一种新的、计算高效的度量,称为线性圆形最优传输(LCOT)。所提度量带有一个显式的线性嵌入,使得人们可以对嵌入后的测度应用机器学习(ML)算法,并将 ML 算法底层的度量无缝替换为 LCOT。我们证明该度量根植于圆形最优传输(COT),可以视为 COT 度量相对于一个固定参考测度的线性化。我们对所提度量进行了理论分析,并推导了圆形概率测度两两比较的计算复杂度。最后,通过一系列数值实验,我们展示了 LCOT 在学习圆形测度表示方面的优势。

Nonlinear Correct and Smooth for Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.05757
  • repo_url: None
  • paper_authors: Yuanhang Shao, Xiuwen Liu
  • for: improving the prediction performance of graph-based semi-supervised learning (GSSL).
  • methods: combining Label Propagation (LP) and Graph Neural Networks (GNNs), incorporating non-linearity and higher-order representations into the residual propagation.
  • results: remarkable average improvements of 13.71% over base predictions and 2.16% over the state-of-the-art post-processing method on six commonly used datasets.
    Abstract Graph-based semi-supervised learning (GSSL) has been used successfully in various applications. Existing methods leverage the graph structure and labeled samples for classification. Label Propagation (LP) and Graph Neural Networks (GNNs) both iteratively pass messages on graphs, where LP propagates node labels through edges and GNN aggregates node features from the neighborhood. Recently, combining LP and GNN has led to improved performance. However, utilizing labels and features jointly in higher-order graphs has not been explored. Therefore, we propose Nonlinear Correct and Smooth (NLCS), which improves the existing post-processing approach by incorporating non-linearity and higher-order representation into the residual propagation to handle intricate node relationships effectively. Systematic evaluations show that our method achieves remarkable average improvements of 13.71% over base prediction and 2.16% over the state-of-the-art post-processing method on six commonly used datasets. Comparisons and analyses show our method effectively utilizes labels and features jointly in higher-order graphs to resolve challenging graph relationships.
    摘要 基于图的半监督学习(GSSL)已被成功应用于多个领域,现有方法利用图结构和带标签样本进行分类。标签传播(LP)和图神经网络(GNN)都在图上迭代传递消息:LP 沿边传播节点标签,GNN 则从邻域聚合节点特征。最近,将 LP 与 GNN 结合已带来性能提升,然而在高阶图上联合利用标签与特征尚未被探索。为此,我们提出了非线性校正与平滑(NLCS)方法,它通过在残差传播中引入非线性和高阶表示来改进现有的后处理方法,从而有效处理复杂的节点关系。系统性评估表明,我们的方法在六个常用数据集上相对基础预测平均提升 13.71%,相对最先进的后处理方法平均提升 2.16%。对比与分析表明,我们的方法能够在高阶图上有效地联合利用标签与特征,解决具有挑战性的图关系。
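
For reference, here is the linear residual-propagation step of Correct & Smooth that NLCS extends: errors on labelled nodes are diffused over the graph and used to correct the base predictions. NLCS additionally injects non-linearities and higher-order structure, which this sketch does not reproduce.

```python
import numpy as np

def correct_and_smooth(S, soft_preds, y_train, train_idx, alpha=0.8, iters=50):
    """Residual propagation as in Correct & Smooth (the baseline NLCS extends).

    S: row-normalised propagation matrix of the graph.
    Residuals on labelled nodes are diffused to unlabelled nodes and then
    added back onto the base soft predictions.
    """
    E = np.zeros_like(soft_preds)
    E[train_idx] = y_train - soft_preds[train_idx]      # residuals on train nodes
    for _ in range(iters):
        E = alpha * (S @ E)                             # diffuse residuals
        E[train_idx] = y_train - soft_preds[train_idx]  # clamp known residuals
    return np.clip(soft_preds + E, 0.0, 1.0)

# toy 4-node chain graph with binary soft predictions
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
S = A / A.sum(1, keepdims=True)
preds = np.array([[.9, .1], [.6, .4], [.5, .5], [.2, .8]])
print(correct_and_smooth(S, preds, np.array([[1., 0.]]), np.array([0])))
```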

Deep Concept Removal

  • paper_url: http://arxiv.org/abs/2310.05755
  • repo_url: https://github.com/aman432/Spam-Classifier
  • paper_authors: Yegor Klochkov, Jean-Francois Ton, Ruocheng Guo, Yang Liu, Hang Li
  • for: concept removal in deep neural networks, i.e., learning representations that do not encode certain specified concepts (e.g., gender).
  • methods: adversarial linear classifiers trained on a concept dataset, with adversarial probing classifiers at various layers of the network and an implicit gradient-based technique for the adversarial training, removing the targeted attribute while maintaining model performance and addressing concept entanglement.
  • results: effective concept removal with preserved model performance, evaluated on popular distributionally robust optimization (DRO) benchmarks with spurious correlations and on out-of-distribution (OOD) generalization tasks.
    Abstract We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender etc.) We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps to remove the targeted attribute while maintaining model performance. Our approach Deep Concept Removal incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as out-of-distribution (OOD) generalization tasks.
    摘要 我们研究深度神经网络中的概念移除问题,旨在学习不编码特定指定概念(如性别等)的表示。我们提出了一种基于在概念数据集上训练的对抗线性分类器的新方法,它能够在保持模型性能的同时移除目标属性。我们的方法"深度概念移除"在网络的不同层引入对抗探测分类器,有效缓解概念纠缠并改善分布外(OOD)泛化。此外,我们还提出了一种基于隐式梯度的技术,以应对使用线性分类器进行对抗训练所带来的挑战。我们在一组带有虚假相关的流行分布鲁棒优化(DRO)基准以及 OOD 泛化任务上评估了概念移除的能力。
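
The general adversarial-probe pattern behind such methods looks roughly like the alternating loop below: a linear probe tries to read the concept out of the representation, and the encoder is trained so it cannot. This is a generic sketch of the idea, not the paper's implicit-gradient procedure, and all architectures and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
probe = nn.Linear(8, 2)                  # adversarial linear concept probe
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
opt_probe = torch.optim.Adam(probe.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(64, 16)                  # toy inputs
concept = torch.randint(0, 2, (64,))     # concept labels (e.g., gender)

for _ in range(100):
    # 1) train the probe to predict the concept from frozen features
    opt_probe.zero_grad()
    ce(probe(encoder(x).detach()), concept).backward()
    opt_probe.step()
    # 2) train the encoder to maximise the probe's loss (remove the concept)
    opt_enc.zero_grad()
    (-ce(probe(encoder(x)), concept)).backward()
    opt_enc.step()
```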

Estimating Shape Distances on Neural Representations with Limited Samples

  • paper_url: http://arxiv.org/abs/2310.05742
  • repo_url: None
  • paper_authors: Dean A. Pospisil, Brett W. Larsen, Sarah E. Harvey, Alex H. Williams
  • for: an effective way to measure geometric similarity between high-dimensional network representations, with a systematic analysis of estimator uncertainty in data-limited regimes.
  • methods: upper and lower bounds on the worst-case convergence of standard shape-distance estimators, plus a novel method-of-moments estimator with a tunable bias-variance tradeoff.
  • results: standard estimators struggle in high-dimensional feature spaces, whereas the new method-of-moments estimator achieves superior performance in simulation and on neural data, particularly in high-dimensional settings.
    Abstract Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning. Although many methods have been proposed, only a few works have rigorously analyzed their statistical efficiency or quantified estimator uncertainty in data-limited regimes. Here, we derive upper and lower bounds on the worst-case convergence of standard estimators of shape distance -- a measure of representational dissimilarity proposed by Williams et al. (2021). These bounds reveal the challenging nature of the problem in high-dimensional feature spaces. To overcome these challenges, we introduce a new method-of-moments estimator with a tunable bias-variance tradeoff. We show that this estimator achieves superior performance to standard estimators in simulation and on neural data, particularly in high-dimensional settings. Thus, we lay the foundation for a rigorous statistical theory for high-dimensional shape analysis, and we contribute a new estimation method that is well-suited to practical scientific settings.
    摘要 衡量高维网络表示之间的几何相似性是神经科学与深度学习领域长期关注的课题。尽管已有许多方法被提出,但只有少数工作严格分析了它们的统计效率,或在数据有限的情形下量化了估计器的不确定性。本文推导了 Williams 等人(2021)提出的形状距离这一表示差异度量的标准估计器在最坏情况下收敛的上界和下界,揭示了该问题在高维特征空间中的困难本质。为克服这些挑战,我们引入了一种带有可调偏差-方差权衡的新的矩估计方法。我们在仿真和神经数据上表明,该估计器优于标准估计器,在高维情形下尤为明显。由此,我们为高维形状分析奠定了严格统计理论的基础,并贡献了一种适用于实际科研场景的新估计方法。

Post-hoc Bias Scoring Is Optimal For Fair Classification

  • paper_url: http://arxiv.org/abs/2310.05725
  • repo_url: None
  • paper_authors: Wenlong Chen, Yegor Klochkov, Yang Liu
  • For: binary classification under group fairness constraints, namely Demographic Parity (DP), Equalized Opportunity (EOp), or Equalized Odds (EO).
  • Methods: an explicit characterization of the Bayes optimal classifier under fairness constraints, based on a novel instance-level bias score; the modification rule thresholds a single bias score for DP and EOp, and fits a two-parameter linear rule for EO.
  • Results: competitive or better accuracy-fairness trade-offs than in-processing and post-processing methods on the Adult, COMPAS, and CelebA datasets, without requiring access to sensitive attributes at inference time.
    Abstract We consider a binary classification problem under group fairness constraints, which can be one of Demographic Parity (DP), Equalized Opportunity (EOp), or Equalized Odds (EO). We propose an explicit characterization of Bayes optimal classifier under the fairness constraints, which turns out to be a simple modification rule of the unconstrained classifier. Namely, we introduce a novel instance-level measure of bias, which we call bias score, and the modification rule is a simple linear rule on top of the finite amount of bias scores. Based on this characterization, we develop a post-hoc approach that allows us to adapt to fairness constraints while maintaining high accuracy. In the case of DP and EOp constraints, the modification rule is thresholding a single bias score, while in the case of EO constraints we are required to fit a linear modification rule with 2 parameters. The method can also be applied for composite group-fairness criteria, such as ones involving several sensitive attributes. We achieve competitive or better performance compared to both in-processing and post-processing methods across three datasets: Adult, COMPAS, and CelebA. Unlike most post-processing methods, we do not require access to sensitive attributes during the inference time.
    摘要 我们考虑带有群体公平约束的二分类问题,约束可以是人口均等(DP)、机会均等(EOp)或几率均等(EO)之一。我们给出了公平约束下贝叶斯最优分类器的显式刻画,结果表明它只是对无约束分类器的一条简单修改规则:我们引入了一种新的实例级偏差度量,称为偏差得分(bias score),修改规则则是建立在有限个偏差得分之上的简单线性规则。基于这一刻画,我们开发了一种后处理方法,能够在保持高准确率的同时满足公平约束。在 DP 和 EOp 约束下,修改规则是对单个偏差得分进行阈值处理;在 EO 约束下,则需要拟合一个带两个参数的线性修改规则。该方法还可用于复合群体公平准则,例如涉及多个敏感属性的准则。我们在 Adult、COMPAS 和 CelebA 三个数据集上取得了与处理中(in-processing)和后处理方法相当或更优的性能。与大多数后处理方法不同,我们在推断阶段不需要访问敏感属性。
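
To illustrate the post-hoc "threshold one scalar per instance" structure, here is a standard demographic-parity post-processing stand-in: one threshold per group, chosen so acceptance rates match. The paper instead thresholds its novel bias score and needs no group membership at inference time, so this sketch is only a structural analogue, not the proposed method.

```python
import numpy as np

def equalize_acceptance(scores, groups, target_rate):
    """Pick one threshold per group so that each group's acceptance rate
    matches target_rate (demographic parity). Standard post-processing
    baseline sharing the paper's thresholding structure."""
    decisions = np.zeros(len(scores), dtype=int)
    for g in np.unique(groups):
        m = groups == g
        thr = np.quantile(scores[m], 1.0 - target_rate)  # group threshold
        decisions[m] = (scores[m] >= thr).astype(int)
    return decisions

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
groups = rng.integers(0, 2, 1000)
d = equalize_acceptance(scores, groups, target_rate=0.3)
print([d[groups == g].mean() for g in (0, 1)])   # both close to 0.3
```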

Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05723
  • repo_url: None
  • paper_authors: Trevor McInroe, Stefano V. Albrecht, Amos Storkey
  • for: finding the best-performing policy within a limited budget of online interactions (offline-to-online reinforcement learning).
  • methods: framing offline-to-online RL as an exploration problem and planning to go out-of-distribution (PTGOOD) so as to maximize the benefit of online data collection.
  • results: PTGOOD significantly improves agent returns during online fine-tuning, finds the optimal policy in as few as 10k online steps in Walker and 50k in complex control tasks like Humanoid, and avoids the suboptimal policy convergence that many baselines exhibit.
    Abstract Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm that is well matched to a real-world RL deployment process: in few real settings would one deploy an offline policy with no test runs and tuning. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but this unnecessarily limits policy performance if the behavior policy is far from optimal. Instead, we forgo policy constraints and frame OtO RL as an exploration problem: we must maximize the benefit of the online data-collection. We study major online RL exploration paradigms, adapting them to work well with the OtO setting. These adapted methods contribute several strong baselines. Also, we introduce an algorithm for planning to go out of distribution (PTGOOD), which targets online exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy. In that way the limited interaction budget is used effectively. We show that PTGOOD significantly improves agent returns during online fine-tuning and finds the optimal policy in as few as 10k online steps in Walker and in as few as 50k in complex control tasks like Humanoid. Also, we find that PTGOOD avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.
    摘要 先用静态数据集离线预训练、再进行在线微调(offline-to-online,OtO)的范式与现实世界中强化学习(RL)的部署流程十分契合:几乎没有哪个真实场景会在没有任何测试与调优的情况下直接部署离线策略。在这一场景中,我们的目标是在有限的在线交互预算内找到性能最佳的策略。以往 OtO 设定下的工作主要关注纠正离线 RL 算法中策略约束机制引入的偏差:这类约束使学到的策略贴近收集数据的行为策略,但当行为策略远非最优时,这会不必要地限制策略性能。因此,我们摒弃策略约束,将 OtO RL 视为一个探索问题:必须最大化在线数据收集的收益。我们研究了主要的在线 RL 探索范式,并将其适配到 OtO 设定,由此得到若干强基线。此外,我们提出了"规划走出分布"(PTGOOD)算法,其目标是在行为策略不太可能访问、而奖励相对较高的状态-动作空间区域进行在线探索。借助条件熵瓶颈(Conditional Entropy Bottleneck)的思想,PTGOOD 促使在线收集的数据为改进最终部署策略提供新的相关信息,从而高效利用有限的交互预算。实验表明,PTGOOD 显著提升了在线微调期间的智能体回报,在 Walker 中仅需 1 万步在线交互、在 Humanoid 等复杂控制任务中仅需 5 万步即可找到最优策略;同时,PTGOOD 避免了我们的许多基线在多个环境中表现出的次优策略收敛。

Transformer Fusion with Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.05719
  • repo_url: https://github.com/Yahnnosh/Exploring-Model-Fusion-with-Optimal-Transport-on-Transformers
  • paper_authors: Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh
  • for: fusing two or more transformer-based networks to combine their capabilities.
  • methods: Optimal Transport to (soft-)align architectural components such as multi-head self-attention, layer normalization, and residual connections, via an abstraction for layer alignment that in principle generalizes to arbitrary architectures and supports heterogeneous fusion of models of different sizes.
  • results: consistently outperforms vanilla fusion on Vision Transformer image classification and BERT language modeling and, after a surprisingly short finetuning, also outperforms the individual converged parent models; the analysis reveals the significant role of soft alignment.
    Abstract Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. In this paper, we present a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components. We flesh out an abstraction for layer alignment, that can generalize to arbitrary architectures -- in principle -- and we apply this to the key ingredients of Transformers such as multi-head self-attention, layer-normalization, and residual connections, and we discuss how to handle them via various ablation studies. Furthermore, our method allows the fusion of models of different sizes (heterogeneous fusion), providing a new and efficient way for compression of Transformers. The proposed approach is evaluated on both image classification tasks via Vision Transformer and natural language modeling tasks using BERT. Our approach consistently outperforms vanilla fusion, and, after a surprisingly short finetuning, also outperforms the individual converged parent models. In our analysis, we uncover intriguing insights about the significant role of soft alignment in the case of Transformers. Our results showcase the potential of fusing multiple Transformers, thus compounding their expertise, in the budding paradigm of model fusion and recombination.
    摘要 融合(fusion)是一种将多个独立训练的神经网络合并以整合其能力的技术。以往的尝试局限于全连接网络、卷积网络和残差网络。本文提出了一种系统性方法,利用最优传输(Optimal Transport)对各个架构组件进行(软)对齐,从而融合两个或多个基于 Transformer 的网络。我们给出了一个层对齐的抽象,原则上可推广到任意架构,并将其应用于 Transformer 的关键组成部分,如多头自注意力、层归一化和残差连接,同时通过多组消融实验讨论了它们的处理方式。此外,我们的方法支持不同规模模型的融合(异构融合),为 Transformer 压缩提供了一种新的高效途径。我们在基于 Vision Transformer 的图像分类任务和基于 BERT 的自然语言建模任务上评估了所提方法。我们的方法始终优于朴素融合,并且经过出乎意料地短暂的微调后,还超过了各自收敛的父模型。在分析中,我们发现了关于软对齐在 Transformer 中重要作用的有趣洞见。我们的结果展示了在模型融合与重组这一新兴范式下,融合多个 Transformer 以叠加其专长的潜力。
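
The flavour of OT-based soft alignment is easy to sketch for a single weight matrix: compute a cost between neurons, solve entropic OT (Sinkhorn), map one model's neurons onto the other's, then average. Real fusion must also propagate the alignment through subsequent layers and transformer-specific components, which this sketch omits.

```python
import numpy as np

def sinkhorn(C, reg=0.1, iters=200):
    """Entropic OT plan between uniform marginals, cost matrix C."""
    K = np.exp(-C / reg)
    a = np.full(C.shape[0], 1.0 / C.shape[0])
    b = np.full(C.shape[1], 1.0 / C.shape[1])
    v = np.ones(C.shape[1])
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)              # soft transport plan

def fuse_layers(W1, W2, reg=0.1):
    """Soft-align the neurons (rows) of W2 to those of W1, then average.

    Cost = squared distance between incoming-weight vectors; the aligned
    W2 is a barycentric projection under the transport plan.
    """
    C = ((W1[:, None, :] - W2[None, :, :]) ** 2).sum(-1)
    T = sinkhorn(C, reg)
    W2_aligned = (T / T.sum(1, keepdims=True)) @ W2
    return 0.5 * (W1 + W2_aligned)

W1, W2 = np.random.randn(6, 4), np.random.randn(6, 4)
print(fuse_layers(W1, W2).shape)                    # -> (6, 4)
```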

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

  • paper_url: http://arxiv.org/abs/2310.05712
  • repo_url: None
  • paper_authors: Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu
  • for: proposing imitator learning (ItorL), which derives an imitator module that reconstructs imitation policies on the fly from very limited expert demonstrations for different unseen tasks, without extra adjustment, while adapting to unexpected environmental changes.
  • methods: Demo-Attention Actor-Critic (DAAC), based on a single expert demonstration, which integrates imitation learning into a reinforcement-learning paradigm to regularize the policy's behavior in unexpected situations, together with a demonstration-based attention architecture that adaptively traces suitable states in the demonstration to output imitated actions.
  • results: on a new navigation benchmark and a robot environment, DAAC outperforms previous imitation methods with large margins on both seen and unseen tasks, e.g., by 24.3% on seen tasks and 110.8% on unseen tasks.
    Abstract Imitation learning (IL) enables agents to mimic expert behaviors. Most previous IL techniques focus on precisely imitating one policy through mass demonstrations. However, in many applications, what humans require is the ability to perform various tasks directly through a few demonstrations of corresponding tasks, where the agent would meet many unexpected changes when deployed. In this scenario, the agent is expected to not only imitate the demonstration but also adapt to unforeseen environmental changes. This motivates us to propose a new topic called imitator learning (ItorL), which aims to derive an imitator module that can on-the-fly reconstruct the imitation policies based on very limited expert demonstrations for different unseen tasks, without any extra adjustment. In this work, we focus on imitator learning based on only one expert demonstration. To solve ItorL, we propose Demo-Attention Actor-Critic (DAAC), which integrates IL into a reinforcement-learning paradigm that can regularize policies' behaviors in unexpected situations. Besides, for autonomous imitation policy building, we design a demonstration-based attention architecture for imitator policy that can effectively output imitated actions by adaptively tracing the suitable states in demonstrations. We develop a new navigation benchmark and a robot environment for \topic~and show that DAAC~outperforms previous imitation methods \textit{with large margins} both on seen and unseen tasks.
    摘要 模仿学习(IL)使智能体能够模仿专家行为。以往的 IL 技术大多着眼于通过海量示范精确模仿单一策略。然而在许多应用中,人们真正需要的是仅凭少量对应任务的示范就能直接执行多种任务的能力,而且智能体在部署后还会遇到许多意料之外的变化。在这种情形下,智能体不仅要模仿示范,还要适应不可预见的环境变化。这促使我们提出一个新的课题,称为模仿器学习(imitator learning,ItorL):其目标是得到一个模仿器模块,能够基于极其有限的专家示范,即时为不同的未见任务重建模仿策略,而无需任何额外调整。本文聚焦于仅基于一条专家示范的模仿器学习。为求解 ItorL,我们提出了示范注意力演员-评论家(Demo-Attention Actor-Critic,DAAC),它将 IL 融入强化学习范式,以规范策略在意外情形下的行为。此外,为了自主构建模仿策略,我们为模仿器策略设计了一种基于示范的注意力架构,通过自适应地追踪示范中合适的状态来有效输出模仿动作。我们构建了一个新的导航基准和一个机器人环境,结果表明 DAAC 在已见与未见任务上均以较大优势超越以往的模仿方法。

Protecting Sensitive Data through Federated Co-Training

  • paper_url: http://arxiv.org/abs/2310.05696
  • repo_url: None
  • paper_authors: Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp
  • for: protecting sensitive data in collaborative training without publicly revealing local training data.
  • methods: federated co-training: clients share hard labels on a public unlabeled dataset, which are aggregated into a consensus label usable for local training by any supervised machine learning model.
  • results: model quality comparable to federated learning and distributed distillation, with improved privacy against membership inference attacks.
    Abstract In many critical applications, sensitive data is inherently distributed. Federated learning trains a model collaboratively by aggregating the parameters of locally trained models. This avoids exposing sensitive local data. It is possible, though, to infer upon the sensitive data from the shared model parameters. At the same time, many types of machine learning models do not lend themselves to parameter aggregation, such as decision trees, or rule ensembles. It has been observed that in many applications, in particular healthcare, large unlabeled datasets are publicly available. They can be used to exchange information between clients by distributed distillation, i.e., co-regularizing local training via the discrepancy between the soft predictions of each local client on the unlabeled dataset. This, however, still discloses private information and restricts the types of models to those trainable via gradient-based methods. We propose to go one step further and use a form of federated co-training, where local hard labels on the public unlabeled datasets are shared and aggregated into a consensus label. This consensus label can be used for local training by any supervised machine learning model. We show that this federated co-training approach achieves a model quality comparable to both federated learning and distributed distillation on a set of benchmark datasets and real-world medical datasets. It improves privacy over both approaches, protecting against common membership inference attacks to the highest degree. Furthermore, we show that federated co-training can collaboratively train interpretable models, such as decision trees and rule ensembles, achieving a model quality comparable to centralized training.
    摘要 在许多关键应用中,敏感数据天然是分布式的。联邦学习通过聚合本地训练模型的参数来协同训练模型,从而避免暴露敏感的本地数据。然而,从共享的模型参数中仍有可能推断出敏感数据。同时,许多类型的机器学习模型并不适合参数聚合,例如决策树或规则集成。人们注意到,在许多应用(尤其是医疗领域)中存在公开可用的大规模无标签数据集,可以通过分布式蒸馏在客户端之间交换信息,即利用各本地客户端在无标签数据集上软预测之间的差异来共同正则化本地训练。然而,这仍会泄露隐私信息,并把可用的模型限制为能通过基于梯度的方法训练的类型。我们提议更进一步,采用一种联邦协同训练(federated co-training)的形式:在公共无标签数据集上共享各方的硬标签,并将其聚合为一个共识标签,该共识标签可供任何监督式机器学习模型用于本地训练。我们证明,这种联邦协同训练方法在一组基准数据集和真实医疗数据集上达到了与联邦学习和分布式蒸馏相当的模型质量,同时在隐私性上优于两者,能够最大程度地抵御常见的成员推断攻击。此外,我们还展示了联邦协同训练可以协同训练可解释模型(如决策树和规则集成),其模型质量可与集中式训练相媲美。
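
The aggregation step is simple enough to sketch end to end: each client labels the public unlabeled set, and the server combines the hard labels into consensus pseudo-labels. Majority voting is our assumption for the aggregation rule; the paper's server may aggregate differently.

```python
import numpy as np

def consensus_labels(client_predictions):
    """Aggregate clients' hard labels on a public unlabeled set by
    majority vote. The resulting pseudo-labels can then train *any*
    supervised model locally (trees, rule ensembles, ...)."""
    P = np.stack(client_predictions)           # (n_clients, n_samples)
    n_classes = int(P.max()) + 1
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, P)
    return votes.argmax(axis=0)                # consensus label per sample

clients = [np.array([0, 1, 1, 0]),
           np.array([0, 1, 0, 0]),
           np.array([1, 1, 1, 0])]
print(consensus_labels(clients))               # -> [0 1 1 0]
```

Note that only hard labels on public data ever leave a client, which is what limits the attack surface relative to sharing parameters or soft predictions.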

Hierarchical Reinforcement Learning for Temporal Pattern Prediction

  • paper_url: http://arxiv.org/abs/2310.05695
  • repo_url: None
  • paper_authors: Faith Johnson, Kristin Dana
  • for: exploring hierarchical reinforcement learning (HRL) for temporal sequence prediction.
  • methods: combining deep learning and HRL: a stock agent that predicts price sequences from historical stock data, and a vehicle agent that predicts steering angles from first-person dash-cam images.
  • results: feudal reinforcement learning provides significant improvements in training speed, stability, and prediction accuracy over standard RL, attributed to the temporal and spatial abstraction introduced by the multi-resolution hierarchy.
    Abstract In this work, we explore the use of hierarchical reinforcement learning (HRL) for the task of temporal sequence prediction. Using a combination of deep learning and HRL, we develop a stock agent to predict temporal price sequences from historical stock price data and a vehicle agent to predict steering angles from first person, dash cam images. Our results in both domains indicate that a type of HRL, called feudal reinforcement learning, provides significant improvements to training speed and stability and prediction accuracy over standard RL. A key component to this success is the multi-resolution structure that introduces both temporal and spatial abstraction into the network hierarchy.
    摘要 在这项工作中,我们探索将分层强化学习(HRL)用于时间序列预测任务。通过结合深度学习与 HRL,我们开发了一个股票智能体,用于从历史股价数据中预测价格时间序列;以及一个车辆智能体,用于从第一人称行车记录仪图像中预测转向角。两个领域的结果都表明,一种称为封建式强化学习(feudal reinforcement learning)的 HRL 方法,相比标准 RL 能显著提升训练速度、稳定性和预测精度。这一成功的关键在于多分辨率结构,它在网络层级中同时引入了时间与空间上的抽象。

Multi-timestep models for Model-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05672
  • repo_url: None
  • paper_authors: Abdelhakim Benechehab, Giuseppe Paolo, Albert Thomas, Maurizio Filippone, Balázs Kégl
  • for: This paper aims to improve the performance of model-based reinforcement learning (MBRL) algorithms by using a multi-timestep objective to train one-step models.
  • methods: The authors use a weighted sum of loss functions at various future horizons as their objective, with exponentially decaying weights, to improve the long-horizon performance of their models.
  • results: The authors find that their multi-timestep models outperform or match standard one-step models in both pure batch reinforcement learning (RL) and iterated batch RL scenarios, particularly in noisy environments.
    Abstract In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weights profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models were evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we found that our multi-timestep models outperform or match standard one-step models. This was especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
    摘要 在基于模型的强化学习(MBRL)中,大多数算法依赖于从数据中学到的单步动力学模型来模拟轨迹。这种做法的一个关键挑战是:随着轨迹长度的增长,单步预测误差会不断累积。本文通过使用多时间步目标来训练单步模型以应对这一问题:我们的目标函数是不同未来时域上损失函数(例如负对数似然)的加权和。我们探索并测试了多种权重配置,发现指数衰减的权重能显著提升模型的长时域 R2 分数,这一改进在带噪数据上评估时尤为明显。最后,我们在纯批量强化学习(RL)和迭代批量 RL 场景中使用软演员-评论家(SAC)智能体,发现多时间步模型优于或不逊于标准单步模型;这一点在所考察环境的带噪变体中尤其明显,凸显了该方法在真实应用中的潜力。
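
The multi-timestep objective is compact enough to write out: roll the one-step model forward h steps, compare with the true state, and weight horizon h by beta**h. MSE stands in for the paper's likelihood loss, and the toy dynamics are placeholders.

```python
import numpy as np

def multi_timestep_loss(model, states, actions, horizon, beta=0.5):
    """Weighted multi-horizon objective for a one-step dynamics model.

    Horizon-h rollouts are compared to the true states h steps ahead,
    with exponentially decaying weights beta**h (the profile the paper
    found best). states: (T, d) trajectory; actions: (T, k).
    """
    total, norm = 0.0, 0.0
    for h in range(1, horizon + 1):
        pred = states[:-h]
        for step in range(h):                          # h-step rollout
            pred = model(pred, actions[step:len(states) - h + step])
        w = beta ** h
        total += w * np.mean((pred - states[h:]) ** 2)
        norm += w
    return total / norm

# toy linear dynamics and a slightly mis-specified linear "model"
A_true, A_model = 0.9, 0.88
s = np.cumprod(np.full(50, A_true))[:, None]           # (50, 1) trajectory
a = np.zeros((50, 1))                                  # actions unused here
model = lambda x, u: A_model * x
print(multi_timestep_loss(model, s, a, horizon=4))
```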

LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.05668
  • repo_url: None
  • paper_authors: Feiyi Chen, Zhen Qing, Yingying Zhang, Shuiguang Deng, Yi Xiao, Guansong Pang, Qingsong Wen
  • for: a Light and Anti-overfitting Retraining Approach (LARA) for deep variational autoencoder (VAE) based time-series anomaly detection.
  • methods: a retraining process formulated as a convex problem that converges quickly while preventing overfitting, together with a ruminate block that leverages historical data without having to store it.
  • results: retraining LARA with as few as 43 time slots of data from a new distribution yields an F1 score competitive with state-of-the-art anomaly detection models trained on sufficient data, at a light retraining overhead.
    Abstract Most of current anomaly detection models assume that the normal pattern remains same all the time. However, the normal patterns of Web services change dramatically and frequently. The model trained on old-distribution data is outdated after such changes. Retraining the whole model every time is expensive. Besides, at the beginning of normal pattern changes, there is not enough observation data from the new distribution. Retraining a large neural network model with limited data is vulnerable to overfitting. Thus, we propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder based time series anomaly detection methods (VAEs). This work aims to make three novel contributions: 1) the retraining process is formulated as a convex problem and can converge at a fast rate as well as prevent overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store them; 3) mathematically proving that when fine-tuning the latent vector and reconstructed data, the linear formations can achieve the least adjusting errors between the ground truths and the fine-tuned ones. Moreover, we have performed many experiments to verify that retraining LARA with even 43 time slots of data from new distribution can result in its competitive F1 Score in comparison with the state-of-the-art anomaly detection models trained with sufficient data. Besides, we verify its light overhead.
    摘要 目前大多数异常检测模型假设正常模式始终不变。然而,Web 服务的正常模式会频繁而剧烈地变化,在旧分布数据上训练的模型在这类变化之后便会过时,而每次都重新训练整个模型代价高昂。此外,在正常模式刚发生变化时,来自新分布的观测数据还不充足,用有限的数据重新训练大型神经网络模型容易过拟合。因此,我们为基于深度变分自编码器(VAE)的时间序列异常检测方法提出了一种轻量且抗过拟合的重训练方法(LARA)。本工作有三点新贡献:1)将重训练过程形式化为一个凸问题,既能快速收敛又能防止过拟合;2)设计了一个 ruminate 模块,无需存储历史数据即可利用其信息;3)从数学上证明,在微调隐变量和重构数据时,线性形式能使微调结果与真实值之间的调整误差最小。此外,大量实验验证:仅用新分布中 43 个时间槽的数据重训练 LARA,其 F1 分数即可与使用充足数据训练的最先进异常检测模型相竞争;同时我们也验证了其开销轻微。

Binary Classification with Confidence Difference

  • paper_url: http://arxiv.org/abs/2310.05632
  • repo_url: https://github.com/wwangwitsel/ConfDiff
  • paper_authors: Wei Wang, Lei Feng, Yuchen Jiang, Gang Niu, Min-Ling Zhang, Masashi Sugiyama
  • for: binary classification from confidence differences (ConfDiff): unlabeled data pairs annotated only with the difference in their probabilities of being positive, instead of pointwise labeling confidence.
  • methods: a risk-consistent approach whose estimation error bound achieves the optimal convergence rate, plus a risk-correction approach that mitigates overfitting, with proven consistency and convergence rate.
  • results: extensive experiments on benchmark datasets and a real-world recommender-system dataset validate the effectiveness of the proposed approaches.
    Abstract Recently, learning with soft labels has been shown to achieve better performance than learning with hard labels in terms of model generalization, calibration, and robustness. However, collecting pointwise labeling confidence for all training examples can be challenging and time-consuming in real-world scenarios. This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification. Instead of pointwise labeling confidence, we are given only unlabeled data pairs with confidence difference that specifies the difference in the probabilities of being positive. We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate. We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven. Extensive experiments on benchmark data sets and a real-world recommender system data set validate the effectiveness of our proposed approaches in exploiting the supervision information of the confidence difference.
    摘要 近期研究表明,使用软标签学习在模型泛化、校准和鲁棒性方面往往优于使用硬标签学习。然而,在现实场景中为所有训练样本收集逐点的标注置信度既困难又耗时。本文研究一种新的弱监督二分类问题,称为置信度差(ConfDiff)分类:我们仅获得无标签的数据对及其置信度差,即两个样本为正类概率之差。我们提出了一种风险一致的方法来解决该问题,并证明其估计误差界达到最优收敛速率。我们还引入了一种风险校正方法来缓解过拟合问题,其一致性与收敛速率同样得到了证明。在基准数据集和一个真实推荐系统数据集上的大量实验验证了所提方法在利用置信度差监督信息方面的有效性。

Cost-sensitive probabilistic predictions for support vector machines

  • paper_url: http://arxiv.org/abs/2310.05997
  • repo_url: None
  • paper_authors: Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo
  • for: generating probabilistic outputs for SVMs that can handle imbalanced datasets and exploit the valuable information produced during regularization-parameter tuning.
  • methods: cost-sensitive SVMs embedded in an ensemble method, with probabilities estimated via bootstrap rather than parametric models.
  • results: numerical tests on a wide range of datasets show the advantages of the approach over benchmark procedures.
    Abstract Support vector machines (SVMs) are widely used and constitute one of the best examined and used machine learning models for two-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information that are not fully exploited, not being used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The new method has the following three properties. First, it is designed to be cost-sensitive, and thus the different importance of sensitivity (or true positive rate, TPR) and specificity (true negative rate, TNR) is readily accommodated in the model. As a result, the model can deal with imbalanced datasets which are common in operational business problems as churn prediction or credit scoring. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameters tuning process. Finally, the probabilities estimation is done via bootstrap estimates, avoiding the use of parametric models as competing approaches. Numerical tests on a wide range of datasets show the advantages of our approach over benchmark procedures.
    摘要 支持向量机(SVM)应用广泛,是二分类问题中研究与使用最充分的机器学习模型之一。SVM 的分类基于一个打分过程,得到的是确定性的分类规则;虽然它可以被转换为概率性规则(现成的 SVM 库中已有实现),但其本质并非概率性的。另一方面,SVM 正则化参数的调优以高计算代价著称,而调优过程产生的信息并未被充分利用,没有用于构建概率分类规则。本文提出了一种为 SVM 生成概率输出的新方法,它具有以下三个特性。第一,该方法是成本敏感的,可以方便地在模型中体现敏感度(真阳性率,TPR)与特异度(真阴性率,TNR)的不同重要性,因而能够处理客户流失预测或信用评分等运营业务问题中常见的不均衡数据集。第二,SVM 被嵌入一个集成方法中以提升性能,充分利用参数调优过程中产生的有价值信息。第三,概率估计通过自助法(bootstrap)估计完成,避免了竞争方法所采用的参数化模型。在大量数据集上的数值测试表明,我们的方法优于基准流程。
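
A minimal sketch of the recipe's shape: an ensemble of cost-sensitive SVMs on bootstrap resamples, with the predicted probability taken as the fraction of members voting positive (a bootstrap estimate, no Platt-style parametric model). All hyperparameters below are placeholders, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

def bootstrap_svm_proba(X, y, n_boot=25, w_pos=5.0, seed=0):
    """Probability estimates from an ensemble of cost-sensitive SVMs.

    Each SVM trains on a bootstrap resample with class_weight making
    errors on the minority (positive) class w_pos times costlier.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X))
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))          # bootstrap resample
        clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: w_pos})
        clf.fit(X[idx], y[idx])
        votes += clf.predict(X)
    return votes / n_boot                              # vote fraction = probability

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
print(bootstrap_svm_proba(X, y)[:5])
```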

On Prediction-Modelers and Decision-Makers: Why Fairness Requires More Than a Fair Prediction Model

  • paper_url: http://arxiv.org/abs/2310.05598
  • repo_url: None
  • paper_authors: Teresa Scantamburlo, Joachim Baumann, Christoph Heitz
  • for: clarifying fairness in prediction-based decision-making and proposing a framework to help implement it.
  • methods: a conceptual separation of prediction and decision into two distinct steps, with distinct roles (the 'prediction-modeler' and the 'decision-maker') and the information required from each.
  • results: a framework that helps structure prediction-based decision problems with respect to fairness, derive responsibilities for each role, and implement fairness governance mechanisms in real-world scenarios.
    Abstract An implicit ambiguity in the field of prediction-based decision-making regards the relation between the concepts of prediction and decision. Much of the literature in the field tends to blur the boundaries between the two concepts and often simply speaks of 'fair prediction.' In this paper, we point out that a differentiation of these concepts is helpful when implementing algorithmic fairness. Even if fairness properties are related to the features of the used prediction model, what is more properly called 'fair' or 'unfair' is a decision system, not a prediction model. This is because fairness is about the consequences on human lives, created by a decision, not by a prediction. We clarify the distinction between the concepts of prediction and decision and show the different ways in which these two elements influence the final fairness properties of a prediction-based decision system. In addition to exploring this relationship conceptually and practically, we propose a framework that enables a better understanding and reasoning of the conceptual logic of creating fairness in prediction-based decision-making. In our framework, we specify different roles, namely the 'prediction-modeler' and the 'decision-maker,' and the information required from each of them for being able to implement fairness of the system. Our framework allows for deriving distinct responsibilities for both roles and discussing some insights related to ethical and legal requirements. Our contribution is twofold. First, we shift the focus from abstract algorithmic fairness to context-dependent decision-making, recognizing diverse actors with unique objectives and independent actions. Second, we provide a conceptual framework that can help structure prediction-based decision problems with respect to fairness issues, identify responsibilities, and implement fairness governance mechanisms in real-world scenarios.
    摘要 在基于预测的决策领域中,预测与决策这两个概念之间的关系存在一种隐含的模糊性。该领域的许多文献倾向于模糊二者的边界,常常笼统地谈论"公平预测"。本文指出,在实现算法公平时,区分这两个概念是有益的:即便公平性质与所用预测模型的特征相关,真正可以被称为"公平"或"不公平"的是决策系统,而不是预测模型。这是因为公平关乎决策(而非预测)对人类生活造成的后果。我们厘清了预测与决策概念之间的区别,并展示了这两个要素以不同方式影响基于预测的决策系统最终的公平性质。除了在概念与实践层面探讨这一关系外,我们还提出了一个框架,帮助人们更好地理解与推演在基于预测的决策中实现公平的概念逻辑。在该框架中,我们区分了不同的角色,即"预测建模者"与"决策者",并明确了实现系统公平所需从各自获取的信息。该框架使我们能够为两种角色推导出各自的责任,并讨论与伦理及法律要求相关的一些洞见。我们的贡献有两方面:其一,将关注点从抽象的算法公平转向依赖情境的决策,承认存在目标各异、行动独立的多元参与者;其二,提供了一个概念框架,有助于围绕公平问题组织基于预测的决策问题、明确责任,并在现实场景中落实公平治理机制。

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

  • paper_url: http://arxiv.org/abs/2310.05573
  • repo_url: https://github.com/sdascoli/odeformer
  • paper_authors: Stéphane d’Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, Niki Kilbertus
  • for: ODEFormer, a model that infers multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory.
  • methods: a transformer trained to output the symbolic form of ODE systems.
  • results: on the existing "Strogatz" dataset and the newly curated ODEBench (one- to four-dimensional systems), ODEFormer consistently outperforms existing methods, with substantially improved robustness to noisy and irregularly sampled observations and faster inference.
    Abstract We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory. We perform extensive evaluations on two datasets: (i) the existing "Strogatz" dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference. We release our code, model and benchmark dataset publicly.
    摘要 我们提出 ODEFormer,这是首个能够仅凭单条解轨迹的观测、以符号形式推断多维常微分方程(ODE)系统的 Transformer。我们在两个数据集上进行了广泛评估:(一)现有的 "Strogatz" 数据集,包含二维系统;(二)ODEBench,一个由我们从文献中精心整理的一至四维系统集合,旨在提供更全面的基准。ODEFormer 始终优于现有方法,并在带噪声与不规则采样的观测下表现出显著更强的鲁棒性,推断速度也更快。我们公开发布了代码、模型与基准数据集。

A New Transformation Approach for Uplift Modeling with Binary Outcome

  • paper_url: http://arxiv.org/abs/2310.05549
  • repo_url: None
  • paper_authors: Kun Li, Jiang Tian, Xiaojia Xiang
  • for: better customer targeting in marketing by predicting the incremental effect (uplift) of a campaign or treatment.
  • methods: a new transformation approach that redefines the target variable with the original treatment indicator for binary outcomes, unlocking the full value of samples with zero outcome.
  • results: experiments on synthetic and real-world datasets show that the new approach outperforms the traditional transformation; it has already been applied to precision marketing in a China nation-wide financial holdings group.
    Abstract Uplift modeling has been used effectively in fields such as marketing and customer retention, to target those customers who are more likely to respond due to the campaign or treatment. Essentially, it is a machine learning technique that predicts the gain from performing some action with respect to not taking it. A popular class of uplift models is the transformation approach that redefines the target variable with the original treatment indicator. These transformation approaches only need to train and predict the difference in outcomes directly. The main drawback of these approaches is that in general it does not use the information in the treatment indicator beyond the construction of the transformed outcome and usually is not efficient. In this paper, we design a novel transformed outcome for the case of the binary target variable and unlock the full value of the samples with zero outcome. From a practical perspective, our new approach is flexible and easy to use. Experimental results on synthetic and real-world datasets obviously show that our new approach outperforms the traditional one. At present, our new approach has already been applied to precision marketing in a China nation-wide financial holdings group.
    摘要 提升模型（uplift modeling）已在市场营销和客户保留等领域得到有效应用，用于定位那些因营销活动或干预而更可能产生响应的客户。概括来说，它是一种机器学习技术，预测采取某一行动相对于不采取该行动所带来的增益。一类流行的提升模型是转换方法，即利用原始的处理指标重新定义目标变量。这类转换方法只需直接训练和预测结果的差异。其主要缺点在于，除了构建转换后的结果之外，通常不再利用处理指标中的信息，因而效率不高。在本文中，我们针对二元目标变量的情形设计了一种新的转换结果，充分挖掘零结果样本的价值。从实践角度看，我们的新方法灵活且易于使用。在合成数据和真实数据集上的实验结果清楚地表明，新方法优于传统方法。目前，该方法已应用于一家中国全国性金融控股集团的精准营销。
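
For context, the sketch below implements the traditional class-variable transformation that the abstract contrasts against, not the paper's new transformed outcome (which the abstract does not specify). It assumes a 50/50 randomized treatment, under which the uplift equals 2*P(Z=1|x) - 1; the synthetic data is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic example: y is the binary outcome, t the 0/1 treatment indicator.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
t = rng.integers(0, 2, size=4000)          # assumes P(T=1) = 1/2
p = 0.3 + 0.2 * t * (X[:, 0] > 0)          # treatment helps only when x0 > 0
y = rng.binomial(1, p)

# Class-variable transformation: z = 1 iff (treated and responded) or
# (untreated and did not respond); then uplift(x) = 2*P(z=1|x) - 1.
z = y * t + (1 - y) * (1 - t)

model = GradientBoostingClassifier().fit(X, z)
uplift = 2.0 * model.predict_proba(X)[:, 1] - 1.0
print(uplift[:5].round(3))
```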

NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification

  • paper_url: http://arxiv.org/abs/2310.05530
  • repo_url: https://github.com/koumajos/classification_by_nettisa_flow
  • paper_authors: Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka
  • for: 这篇论文旨在提出一种基于流量记录的网络流量监测方法,以便在各种网络基础设施上部署,包括承载数百万人的大型IPS网络。
  • methods: 该方法基于流量记录的时间序列分析，提出了一种新的扩展 IP 流记录（NetTiSA），并在 25 种网络分类任务上进行了广泛测试，以证明 NetTiSA 的广泛适用性和高实用性。
  • results: 测试结果表明,NetTiSA可以高度精准地分类网络流量,并且在计算流量扩展时对性能的影响较小。此外,NetTiSA可以在100Gbps级别的高速ISP网络上进行实际部署,因此可以提供广泛的网络安全保护。
    Abstract Network traffic monitoring based on IP Flows is a standard monitoring approach that can be deployed to various network infrastructures, even the large IPS-based networks connecting millions of people. Since flow records traditionally contain only limited information (addresses, transport ports, and amount of exchanged data), they are also commonly extended for additional features that enable network traffic analysis with high accuracy. Nevertheless, the flow extensions are often too large or hard to compute, which limits their deployment only to smaller-sized networks. This paper proposes a novel extended IP flow called NetTiSA (Network Time Series Analysed), which is based on the analysis of the time series of packet sizes. By thoroughly testing 25 different network classification tasks, we show the broad applicability and high usability of NetTiSA, which often outperforms the best-performing related works. For practical deployment, we also consider the sizes of flows extended for NetTiSA and evaluate the performance impacts of its computation in the flow exporter. The novel feature set proved universal and deployable to high-speed ISP networks with 100\,Gbps lines; thus, it enables accurate and widespread network security protection.
    摘要 基于 IP 流的网络流量监测是一种标准的监测方法，可部署到各类网络基础设施，包括连接数百万用户的大型网络。由于流记录传统上只包含有限的信息（地址、传输端口和交换的数据量），通常会为其扩展额外特征，以实现高精度的网络流量分析。然而，这些流扩展往往体积过大或计算过于复杂，使其只能部署在较小规模的网络中。本文提出了一种基于数据包大小时间序列分析的新型扩展 IP 流，称为 NetTiSA（Network Time Series Analysed）。通过对 25 种不同的网络分类任务进行全面测试，我们展示了 NetTiSA 的广泛适用性和高实用性，其表现常常优于相关工作中的最佳方法。面向实际部署，我们还考察了 NetTiSA 扩展流的大小，并评估了在流导出器中计算这些特征对性能的影响。结果表明，这一新特征集具有普适性，可部署到 100 Gbps 级别的高速 ISP 网络，从而支持准确且大范围的网络安全防护。
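
As a rough illustration of the idea (the actual NetTiSA feature definitions are in the paper and the linked repository, not reproduced here), the sketch below computes a few cheap statistics over a flow's packet-size time series, the kind of lightweight computation a flow exporter can afford:

```python
import numpy as np

def flow_time_series_features(packet_sizes, timestamps):
    """Illustrative statistics over one flow's packet-size time series."""
    sizes = np.asarray(packet_sizes, dtype=float)
    times = np.asarray(timestamps, dtype=float)
    gaps = np.diff(times) if len(times) > 1 else np.zeros(1)
    return {
        "mean_size": sizes.mean(),
        "std_size": sizes.std(),
        "min_size": sizes.min(),
        "max_size": sizes.max(),
        "burstiness": sizes.std() / sizes.mean() if sizes.mean() else 0.0,
        "mean_inter_arrival": gaps.mean(),
        "duration": times[-1] - times[0],
    }

print(flow_time_series_features([60, 1500, 1500, 60], [0.00, 0.01, 0.02, 0.30]))
```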

A novel Network Science Algorithm for Improving Triage of Patients

  • paper_url: http://arxiv.org/abs/2310.05996
  • repo_url: None
  • paper_authors: Pietro Hiram Guzzi, Annamaria De Filippo, Pierangelo Veltri
  • for: This paper aims to develop a novel algorithm for triaging patients based on the analysis of patient data, with the goal of improving the efficiency, accuracy, and consistency of patient prioritization.
  • methods: The algorithm is based on rigorous preprocessing and feature engineering of a comprehensive data set containing relevant patient information, such as vital signs, symptoms, and medical history.
  • results: The experimental results demonstrate that the algorithm achieved high accuracy and performance, outperforming traditional triage methods.
    Abstract Patient triage plays a crucial role in healthcare, ensuring timely and appropriate care based on the urgency of patient conditions. Traditional triage methods heavily rely on human judgment, which can be subjective and prone to errors. Recently, a growing interest has been in leveraging artificial intelligence (AI) to develop algorithms for triaging patients. This paper presents the development of a novel algorithm for triaging patients. It is based on the analysis of patient data to produce decisions regarding their prioritization. The algorithm was trained on a comprehensive data set containing relevant patient information, such as vital signs, symptoms, and medical history. The algorithm was designed to accurately classify patients into triage categories through rigorous preprocessing and feature engineering. Experimental results demonstrate that our algorithm achieved high accuracy and performance, outperforming traditional triage methods. By incorporating computer science into the triage process, healthcare professionals can benefit from improved efficiency, accuracy, and consistency, prioritizing patients effectively and optimizing resource allocation. Although further research is needed to address challenges such as biases in training data and model interpretability, the development of AI-based algorithms for triaging patients shows great promise in enhancing healthcare delivery and patient outcomes.
    摘要 患者分诊在医疗保健中起着关键作用，确保患者根据病情的紧急程度及时获得适当的护理。传统的分诊方法严重依赖人工判断，可能带有主观性并容易出错。近年来，利用人工智能（AI）开发患者分诊算法受到越来越多的关注。本文介绍了一种新的患者分诊算法的开发，该算法基于对患者数据的分析来产生优先级决策。算法在一个包含生命体征、症状和病史等相关患者信息的综合数据集上训练，并通过严格的预处理和特征工程将患者准确地划分到各个分诊类别。实验结果表明，我们的算法达到了较高的准确率和性能，优于传统的分诊方法。通过将计算机科学引入分诊流程，医疗专业人员可以获得更高的效率、准确性和一致性，有效地为患者排序并优化资源分配。尽管仍需进一步研究以解决训练数据偏差和模型可解释性等挑战，但基于 AI 的患者分诊算法在改善医疗服务和患者结局方面展现出巨大潜力。

Projecting infinite time series graphs to finite marginal graphs using number theory

  • paper_url: http://arxiv.org/abs/2310.05526
  • repo_url: None
  • paper_authors: Andreas Gerhardus, Jonas Wahl, Sofia Faltenbacher, Urmi Ninad, Jakob Runge
  • for: 本文旨在将 causal-graphical-model 框架的方法与应用推广到时间序列场景。
  • methods: 本文提出了一种将具有重复边的无穷时间序列图投影为有限时间窗口上的边缘图模型的方法，从而解决针对无穷图的 $m$-separation 查询问题。
  • results: 本文给出了执行该投影的算法，并论证这些边缘图可用于时间序列中的因果发现与因果效应估计。
    Abstract In recent years, a growing number of method and application works have adapted and applied the causal-graphical-model framework to time series data. Many of these works employ time-resolved causal graphs that extend infinitely into the past and future and whose edges are repetitive in time, thereby reflecting the assumption of stationary causal relationships. However, most results and algorithms from the causal-graphical-model framework are not designed for infinite graphs. In this work, we develop a method for projecting infinite time series graphs with repetitive edges to marginal graphical models on a finite time window. These finite marginal graphs provide the answers to $m$-separation queries with respect to the infinite graph, a task that was previously unresolved. Moreover, we argue that these marginal graphs are useful for causal discovery and causal effect estimation in time series, effectively enabling to apply results developed for finite graphs to the infinite graphs. The projection procedure relies on finding common ancestors in the to-be-projected graph and is, by itself, not new. However, the projection procedure has not yet been algorithmically implemented for time series graphs since in these infinite graphs there can be infinite sets of paths that might give rise to common ancestors. We solve the search over these possibly infinite sets of paths by an intriguing combination of path-finding techniques for finite directed graphs and solution theory for linear Diophantine equations. By providing an algorithm that carries out the projection, our paper makes an important step towards a theoretically-grounded and method-agnostic generalization of a range of causal inference methods and results to time series.
    摘要 近年来，越来越多的方法和应用工作将因果图模型（causal-graphical-model）框架适配并应用于时间序列数据。其中许多工作使用在时间上分辨的因果图，这类图向过去和未来无限延伸，且边在时间上重复出现，从而体现了因果关系平稳的假设。然而，因果图模型框架中的多数结果和算法并非为无穷图设计。在本工作中，我们提出了一种方法，将具有重复边的无穷时间序列图投影为有限时间窗口上的边缘图模型。这些有限的边缘图能够回答关于无穷图的 $m$-separation 查询，而这一问题此前尚未得到解决。此外，我们论证这些边缘图对时间序列中的因果发现和因果效应估计十分有用，使得为有限图建立的结果可以应用于无穷图。该投影过程依赖于在待投影图中寻找共同祖先，其本身并不新颖；然而，此前尚无针对时间序列图的算法实现，因为在这类无穷图中，可能产生共同祖先的路径集合可能是无穷的。我们通过将有限有向图的路径搜索技术与线性丢番图（Diophantine）方程的求解理论巧妙结合，解决了在这些可能无穷的路径集合上的搜索问题。通过给出执行该投影的算法，本文朝着将一系列因果推断方法和结果以理论严谨且与方法无关的方式推广到时间序列迈出了重要一步。
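
To give a feel for why linear Diophantine equations enter the picture, here is a toy simplification of ours, not the paper's algorithm: if two backward walks in a repetitive time series graph move by fixed lags a and b per repetition, whether they can meet at a common ancestor bridging a time offset c reduces to the solvability of a*x + b*y = c over the integers, which holds exactly when gcd(a, b) divides c.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def solve_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y = c, or None if unsolvable."""
    g, x, y = extended_gcd(a, b)
    if c % g != 0:
        return None            # the walks can never meet at this offset
    k = c // g
    return x * k, y * k

# Walks that move 6 and 10 steps per repetition can bridge an offset of 8
# (gcd(6, 10) = 2 divides 8) but never an offset of 7.
print(solve_diophantine(6, 10, 8))
print(solve_diophantine(6, 10, 7))
```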

WeatherGNN: Exploiting Complicated Relationships in Numerical Weather Prediction Bias Correction

  • paper_url: http://arxiv.org/abs/2310.05517
  • repo_url: https://github.com/water-wbq/WeatherGNN
  • paper_authors: Binqing Wu, Weiqi Chen, Wengwei Wang, Bingqing Peng, Liang Sun, Ling Chen
  • for: correcting numerical weather prediction (NWP) bias
  • methods: Graph Neural Networks (GNNs): a factor-wise GNN and a fast hierarchical GNN
  • results: superior performance compared to other state-of-the-art (SOTA) methods, with an average improvement of 40.50% on RMSE over the original NWP.
    Abstract Numerical weather prediction (NWP) may be inaccurate or biased due to incomplete atmospheric physical processes, insufficient spatial-temporal resolution, and inherent uncertainty of weather. Previous studies have attempted to correct biases by using handcrafted features and domain knowledge, or by applying general machine learning models naively. They do not fully explore the complicated meteorologic interactions and spatial dependencies in the atmosphere dynamically, which limits their applicability in NWP bias-correction. Specifically, weather factors interact with each other in complex ways, and these interactions can vary regionally. In addition, the interactions between weather factors are further complicated by the spatial dependencies between regions, which are influenced by varied terrain and atmospheric motions. To address these issues, we propose WeatherGNN, an NWP bias-correction method that utilizes Graph Neural Networks (GNN) to learn meteorologic and geographic relationships in a unified framework. Our approach includes a factor-wise GNN that captures meteorological interactions within each grid (a specific location) adaptively, and a fast hierarchical GNN that captures spatial dependencies between grids dynamically. Notably, the fast hierarchical GNN achieves linear complexity with respect to the number of grids, enhancing model efficiency and scalability. Our experimental results on two real-world datasets demonstrate the superiority of WeatherGNN in comparison with other SOTA methods, with an average improvement of 40.50\% on RMSE compared to the original NWP.
    摘要 数值天气预报（NWP）可能存在误差或偏差，原因包括大气物理过程刻画不完整、时空分辨率不足，以及天气本身固有的不确定性。先前的研究尝试利用手工特征和领域知识来纠正偏差，或者直接套用通用的机器学习模型。但这些方法并未充分挖掘大气中复杂的气象相互作用和动态的空间依赖关系，限制了它们在 NWP 偏差纠正中的适用性。具体来说，天气因素之间以复杂的方式相互作用，且这些相互作用可能因地域而异；此外，区域之间的空间依赖（受不同地形和大气运动影响）使天气因素间的相互作用进一步复杂化。为解决这些问题，我们提出了 WeatherGNN，一种基于图神经网络（GNN）的 NWP 偏差纠正方法，在统一框架中学习气象与地理关系。我们的方法包括一个按因素建模的 GNN，自适应地捕捉每个网格（特定位置）内的气象相互作用，以及一个快速的层次 GNN，动态捕捉网格之间的空间依赖。值得注意的是，快速层次 GNN 的复杂度相对于网格数量是线性的，提升了模型的效率和可扩展性。在两个真实数据集上的实验结果表明，WeatherGNN 优于其他 SOTA 方法，相对原始 NWP 在 RMSE 上平均提升 40.50%。

A Neural Tangent Kernel View on Federated Averaging for Deep Linear Neural Network

  • paper_url: http://arxiv.org/abs/2310.05495
  • repo_url: None
  • paper_authors: Xin Liu, Dazhi Zhan, Wei Tao, Xin Ma, Yu Pan, Yu Ding, Zhisong Pan
  • for: 这篇论文的目的是提供 FedAvg 在训练神经网络时的全球收敛性保证。
  • methods: 这篇论文使用 NTK 理论来研究 FedAvg 在训练神经网络时的收敛性。
  • results: 这篇论文提供了 FedAvg 在训练深度线性神经网络时的全球收敛性保证,并且通过实验验证了理论结论。
    Abstract Federated averaging (FedAvg) is a widely employed paradigm for collaboratively training models from distributed clients without sharing data. Nowadays, the neural network has achieved remarkable success due to its extraordinary performance, which makes it a preferred choice as the model in FedAvg. However, the optimization problem of the neural network is often non-convex even non-smooth. Furthermore, FedAvg always involves multiple clients and local updates, which results in an inaccurate updating direction. These properties bring difficulties in analyzing the convergence of FedAvg in training neural networks. Recently, neural tangent kernel (NTK) theory has been proposed towards understanding the convergence of first-order methods in tackling the non-convex problem of neural networks. The deep linear neural network is a classical model in theoretical subject due to its simple formulation. Nevertheless, there exists no theoretical result for the convergence of FedAvg in training the deep linear neural network. By applying NTK theory, we make a further step to provide the first theoretical guarantee for the global convergence of FedAvg in training deep linear neural networks. Specifically, we prove FedAvg converges to the global minimum at a linear rate $\mathcal{O}\big((1-\eta K /N)^t\big)$, where $t$ is the number of iterations, $\eta$ is the learning rate, $N$ is the number of clients and $K$ is the number of local updates. Finally, experimental evaluations on two benchmark datasets are conducted to empirically validate the correctness of our theoretical findings.
    摘要 联邦平均（FedAvg）是一种广泛使用的范式，用于在不共享数据的情况下由分布式客户端协同训练模型。如今，神经网络凭借卓越的性能取得了巨大成功，成为 FedAvg 中的首选模型。然而，神经网络的优化问题往往是非凸甚至非光滑的。此外，FedAvg 总是涉及多个客户端和多步本地更新，导致更新方向不精确。这些特性给分析 FedAvg 训练神经网络的收敛性带来了困难。近年来，神经正切核（NTK）理论被提出，用于理解一阶方法在处理神经网络非凸问题时的收敛性。深度线性神经网络因其形式简洁而成为理论研究中的经典模型，然而，关于 FedAvg 训练深度线性神经网络的收敛性此前尚无理论结果。通过应用 NTK 理论，我们更进一步，首次为 FedAvg 训练深度线性神经网络的全局收敛提供了理论保证。具体来说，我们证明 FedAvg 以线性速率 $\mathcal{O}\big((1-\eta K/N)^t\big)$ 收敛到全局最小值，其中 $t$ 是迭代次数，$\eta$ 是学习率，$N$ 是客户端数量，$K$ 是本地更新步数。最后，我们在两个基准数据集上进行了实验评估，以验证理论发现的正确性。
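
The numpy sketch below simulates the protocol the paper analyzes: N clients each run K local gradient steps on a two-layer deep linear network before the server averages their parameters. Data, hyperparameters, and the IID setup are illustrative choices of ours, not the paper's experiments; the proven rate is $\mathcal{O}\big((1-\eta K/N)^t\big)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, K, eta, T = 5, 4, 3, 0.05, 200

# Ground-truth linear map and per-client data.
W_true = rng.normal(size=(d, d))
data = []
for _ in range(N):
    X = rng.normal(size=(100, d))
    data.append((X, X @ W_true.T))

W1, W2 = np.eye(d), np.eye(d)          # shared initialization of both layers
for t in range(T):
    updates = []
    for X, Y in data:                  # each client: K local gradient steps
        A, B = W1.copy(), W2.copy()
        for _ in range(K):
            R = X @ (B @ A).T - Y      # residual of the deep linear net
            gA = B.T @ R.T @ X / len(X)
            gB = R.T @ X @ A.T / len(X)
            A, B = A - eta * gA, B - eta * gB
        updates.append((A, B))
    W1 = sum(u[0] for u in updates) / N    # server-side FedAvg step
    W2 = sum(u[1] for u in updates) / N

print(np.linalg.norm(W2 @ W1 - W_true))    # shrinks over communication rounds
```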

Integration-free Training for Spatio-temporal Multimodal Covariate Deep Kernel Point Processes

  • paper_url: http://arxiv.org/abs/2310.05485
  • repo_url: None
  • paper_authors: Yixuan Zhang, Quyu Kong, Feng Zhou
  • for: 本研究提出了一种新的深度时空点过程模型，即深度核混合点过程（DKMPP），该模型能够利用多模态协变量信息。
  • methods: DKMPP 使用更灵活的深度核来建模事件与协变量数据之间的复杂关系，从而提高模型的表达能力。
  • results: 实验表明，DKMPP 及其相应的基于得分匹配的估计器优于基线模型，展示了结合协变量信息、深度核与基于得分的估计器的优势。
    Abstract In this study, we propose a novel deep spatio-temporal point process model, Deep Kernel Mixture Point Processes (DKMPP), that incorporates multimodal covariate information. DKMPP is an enhanced version of Deep Mixture Point Processes (DMPP), which uses a more flexible deep kernel to model complex relationships between events and covariate data, improving the model's expressiveness. To address the intractable training procedure of DKMPP due to the non-integrable deep kernel, we utilize an integration-free method based on score matching, and further improve efficiency by adopting a scalable denoising score matching method. Our experiments demonstrate that DKMPP and its corresponding score-based estimators outperform baseline models, showcasing the advantages of incorporating covariate information, utilizing a deep kernel, and employing score-based estimators.
    摘要 在本研究中，我们提出了一种新的深度时空点过程模型，深度核混合点过程（DKMPP），该模型利用多模态协变量信息。DKMPP 是深度混合点过程（DMPP）的增强版本，它使用更灵活的深度核来建模事件与协变量数据之间的复杂关系，提升了模型的表达能力。由于深度核不可积分，DKMPP 的训练过程难以直接处理；为此，我们采用一种基于得分匹配的无需积分的方法，并进一步通过可扩展的去噪得分匹配方法提升效率。实验表明，DKMPP 及其相应的基于得分的估计器优于基线模型，展示了结合协变量信息、使用深度核以及采用基于得分的估计器的优势。
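
To illustrate why score matching makes training "integration-free", here is a generic denoising score matching loop on one-dimensional toy data (a sketch of the principle only, not the paper's point process model): the regression target is the score of the Gaussian smoothing kernel, so no normalizing-constant integral is ever evaluated.

```python
import torch

# Toy bimodal data whose smoothed score we want to learn.
torch.manual_seed(0)
data = torch.cat([torch.randn(500) - 2.0, torch.randn(500) + 2.0]).unsqueeze(1)

score_net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
sigma = 0.5

for step in range(2000):
    noise = torch.randn_like(data)
    x_noisy = data + sigma * noise
    target = -noise / sigma        # score of the Gaussian smoothing kernel
    loss = ((score_net(x_noisy) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())                 # trained without any intractable integral
```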

Vibroacoustic Frequency Response Prediction with Query-based Operator Networks

  • paper_url: http://arxiv.org/abs/2310.05469
  • repo_url: https://github.com/ecker-lab/FQ-Operator
  • paper_authors: Jan van Delden, Julius Schultz, Christopher Blech, Sabine C. Langer, Timo Lüddecke
  • for: 本研究旨在提高机械结构如飞机、汽车和房屋等的震动声波传播的理解,以确保其用户的健康和舒适性。
  • methods: 本研究使用数据驱动模型来加速 numerical simulation,以便进行设计优化、不确定性评估和设计空间探索等任务。特别是,我们提出了一种新的频率查询运算符模型,该模型可以将板体几何特征映射到频率响应函数。
  • results: 我们在一个包含12,000个板体几何特征的全面性 benchmark 上评估了我们的方法,并发现它比 DeepONets、Fourier Neural Operators 和传统神经网络架构更高效。
    Abstract Understanding vibroacoustic wave propagation in mechanical structures like airplanes, cars and houses is crucial to ensure health and comfort of their users. To analyze such systems, designers and engineers primarily consider the dynamic response in the frequency domain, which is computed through expensive numerical simulations like the finite element method. In contrast, data-driven surrogate models offer the promise of speeding up these simulations, thereby facilitating tasks like design optimization, uncertainty quantification, and design space exploration. We present a structured benchmark for a representative vibroacoustic problem: Predicting the frequency response for vibrating plates with varying forms of beadings. The benchmark features a total of 12,000 plate geometries with an associated numerical solution and introduces evaluation metrics to quantify the prediction quality. To address the frequency response prediction task, we propose a novel frequency query operator model, which is trained to map plate geometries to frequency response functions. By integrating principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of resonance peaks of frequency responses. We evaluate the method on our vibrating-plates benchmark and find that it outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures. The code and dataset are available from https://eckerlab.org/code/delden2023_plate.

ExIFFI and EIF+: Interpretability and Enhanced Generalizability to Extend the Extended Isolation Forest

  • paper_url: http://arxiv.org/abs/2310.05468
  • repo_url: https://github.com/alessioarcudi/exiffi
  • paper_authors: Alessio Arcudi, Davide Frizzo, Chiara Masiero, Gian Antonio Susto
  • for: 本研究旨在提出一种可解释的异常检测方法,以帮助用户更好地理解模型的预测结果并进行根本分析。
  • methods: 本研究使用了一种加强版的扩展隔离林(EIF),并提出了一种新的可解释方法ExIFFI,该方法通过特征排名来提供异常检测结果的解释。
  • results: 实验结果显示,ExIFFI在异常检测和特征选择方面具有较高的效果和可解释性。此外,研究还提供了一些实际数据集的评估结果,以便进一步研究和复现。
    Abstract Anomaly detection, an essential unsupervised machine learning task, involves identifying unusual behaviors within complex datasets and systems. While Machine Learning algorithms and decision support systems (DSSs) offer effective solutions for this task, simply pinpointing anomalies often falls short in real-world applications. Users of these systems often require insight into the underlying reasons behind predictions to facilitate Root Cause Analysis and foster trust in the model. However, due to the unsupervised nature of anomaly detection, creating interpretable tools is challenging. This work introduces EIF+, an enhanced variant of Extended Isolation Forest (EIF), designed to enhance generalization capabilities. Additionally, we present ExIFFI, a novel approach that equips Extended Isolation Forest with interpretability features, specifically feature rankings. Experimental results provide a comprehensive comparative analysis of Isolation-based approaches for Anomaly Detection, including synthetic and real dataset evaluations that demonstrate ExIFFI's effectiveness in providing explanations. We also illustrate how ExIFFI serves as a valid feature selection technique in unsupervised settings. To facilitate further research and reproducibility, we also provide open-source code to replicate the results.
    摘要 异常检测是一项重要的无监督机器学习任务，旨在复杂的数据集和系统中识别异常行为。虽然机器学习算法和决策支持系统（DSS）能为该任务提供有效的解决方案，但在实际应用中，仅仅指出异常点往往是不够的：用户通常需要了解预测背后的原因，以便进行根因分析并建立对模型的信任。然而，由于异常检测的无监督性质，构建可解释的工具颇具挑战。本工作提出了 EIF+，一种扩展隔离森林（EIF）的增强变体，旨在提升泛化能力。此外，我们提出了 ExIFFI，一种为扩展隔离森林赋予可解释性（特别是特征排名）的新方法。实验部分对基于隔离的异常检测方法进行了全面的对比分析，包括在合成数据和真实数据集上的评估，展示了 ExIFFI 在提供解释方面的有效性。我们还说明了 ExIFFI 可以作为无监督场景下一种有效的特征选择技术。为了便于后续研究和结果复现，我们公开了源代码。
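
For orientation, a plain isolation forest baseline runs in a few lines with scikit-learn, as below. ExIFFI's contribution, the per-prediction feature rankings, is not part of scikit-learn and lives in the authors' linked repository, so this sketch covers only the detection side.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))                      # nominal data
X_test = np.vstack([rng.normal(size=(10, 4)),            # nominal samples
                    rng.normal(6.0, 1.0, size=(5, 4))])  # obvious anomalies

clf = IsolationForest(random_state=0).fit(X_train)
scores = clf.score_samples(X_test)    # lower score = more anomalous
print(scores.round(2))
```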

Temporal Convolutional Explorer Helps Understand 1D-CNN’s Learning Behavior in Time Series Classification from Frequency Domain

  • paper_url: http://arxiv.org/abs/2310.05467
  • repo_url: https://github.com/jrzhang33/tce
  • paper_authors: Junru Zhang, Lang Feng, Yang He, Yuhan Wu, Yabo Dong
  • for: 提高一维卷积神经网络（1D-CNN）在时间序列分类任务中的表现，并解释其在应用中可能出现的不良结果。
  • methods: 提出了一种 Temporal Convolutional Explorer（TCE），从频域角度实证探究 1D-CNN 的学习行为。
  • results: 在广泛使用的 UCR、UEA 和 UCI 基准数据集上的大量实验显示了两点：1）TCE 能深入揭示 1D-CNN 的学习行为；2）我们的调控框架能使现有 1D-CNN 以更少的内存和计算开销获得更好的表现。
    Abstract While one-dimensional convolutional neural networks (1D-CNNs) have been empirically proven effective in time series classification tasks, we find that there remain undesirable outcomes that could arise in their application, motivating us to further investigate and understand their underlying mechanisms. In this work, we propose a Temporal Convolutional Explorer (TCE) to empirically explore the learning behavior of 1D-CNNs from the perspective of the frequency domain. Our TCE analysis highlights that deeper 1D-CNNs tend to distract the focus from the low-frequency components leading to the accuracy degradation phenomenon, and the disturbing convolution is the driving factor. Then, we leverage our findings to the practical application and propose a regulatory framework, which can easily be integrated into existing 1D-CNNs. It aims to rectify the suboptimal learning behavior by enabling the network to selectively bypass the specified disturbing convolutions. Finally, through comprehensive experiments on widely-used UCR, UEA, and UCI benchmarks, we demonstrate that 1) TCE's insight into 1D-CNN's learning behavior; 2) our regulatory framework enables state-of-the-art 1D-CNNs to get improved performances with less consumption of memory and computational overhead.
    摘要 虽然一维卷积神经网络（1D-CNN）已被实证证明在时间序列分类任务中有效，但我们发现其应用中仍可能出现不理想的结果，这促使我们进一步研究和理解其内在机制。在本工作中，我们提出了 Temporal Convolutional Explorer（TCE），从频域角度实证探究 1D-CNN 的学习行为。TCE 分析表明，更深的 1D-CNN 往往会将注意力从低频成分上移开，导致精度下降现象，而起干扰作用的卷积是其驱动因素。基于这些发现，我们面向实际应用提出了一种调控框架，可以方便地集成到现有的 1D-CNN 中。它通过让网络选择性地绕过指定的干扰卷积来纠正次优的学习行为。最后，通过在广泛使用的 UCR、UEA 和 UCI 基准上进行的全面实验，我们证明了：1）TCE 能深入揭示 1D-CNN 的学习行为；2）我们的调控框架能使最先进的 1D-CNN 在更少的内存和计算开销下获得更好的性能。

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05422
  • repo_url: https://github.com/sw-packages/a498e1142fb23106c12b054225864aab1156087a5ab634a1d88227024ecb1626
  • paper_authors: Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
  • for: 这项研究旨在提升离线强化学习中动力学模型的泛化能力与实用性。
  • methods: 研究者发现了一个名为“动力学奖励”（dynamics reward）的隐藏因素，它在不同转移之间保持一致，并据此提出了奖励一致的动力学模型方法 MOREC。
  • results: 在合成任务上，MOREC 表现出强大的泛化能力，甚至能出人意料地恢复一些相距很远的未见转移；在 21 个离线任务上，MOREC 超越了先前的最佳性能，在 D4RL 与 NeoRL 任务上分别提升 4.6% 与 25.9%；MOREC 还是第一个能在 12 个 D4RL 任务中的 6 个以及 9 个 NeoRL 任务中的 3 个上达到 95% 以上在线 RL 性能的方法。
    Abstract Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model generates a batch of transitions and selects the one with the highest dynamics reward value. On a synthetic task, we visualize that MOREC has a strong generalization ability and can surprisingly recover some distant unseen transitions. On 21 offline tasks in D4RL and NeoRL benchmarks, MOREC improves the previous state-of-the-art performance by a significant margin, i.e., 4.6% on D4RL tasks and 25.9% on NeoRL tasks. Notably, MOREC is the first method that can achieve above 95% online RL performance in 6 out of 12 D4RL tasks and 3 out of 9 NeoRL tasks.
    摘要 学习精确的动力学模型对离线强化学习至关重要，但遗憾的是，这被发现是相当困难的。通过拟合历史转移学习得到的动力学模型往往难以泛化到未见过的转移。在本研究中，我们发现了一个隐藏但关键的因素，即动力学奖励（dynamics reward），它在不同转移之间保持一致，为更好的泛化提供了途径。因此，我们提出了奖励一致的动力学模型这一思想：动力学模型生成的任何轨迹都应最大化从数据中推导出的动力学奖励。我们将这一思想实现为 MOREC（Model-based Offline reinforcement learning with Reward Consistency）方法，它可以无缝集成到先前的离线基于模型的强化学习（MBRL）方法中。MOREC 从离线数据中学习一个可泛化的动力学奖励函数，随后将其用作任意离线 MBRL 方法中的转移过滤器：在生成转移时，动力学模型生成一批候选转移，并选择动力学奖励值最高的那个。在一个合成任务上，我们直观展示了 MOREC 具有很强的泛化能力，甚至能出人意料地恢复一些相距很远的未见转移。在 D4RL 和 NeoRL 基准的 21 个离线任务上，MOREC 显著超越了先前的最佳性能，在 D4RL 任务上提升 4.6%，在 NeoRL 任务上提升 25.9%。值得注意的是，MOREC 是第一个能在 12 个 D4RL 任务中的 6 个以及 9 个 NeoRL 任务中的 3 个上达到 95% 以上在线 RL 性能的方法。
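
A minimal sketch of the transition-filter idea from the abstract, with toy stand-ins (the dynamics model and reward function below are illustrative assumptions, not the paper's learned models): sample a batch of candidate transitions and keep the one with the highest dynamics reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def filtered_rollout_step(dynamics_model, reward_fn, s, a, n_candidates=8):
    """Sample candidate next states, keep the most reward-consistent one.

    dynamics_model: callable (s, a) -> one sampled next state
    reward_fn: learned dynamics reward r(s, a, s') used as the filter
    """
    candidates = [dynamics_model(s, a) for _ in range(n_candidates)]
    scores = [reward_fn(s, a, s2) for s2 in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins for the learned components:
dyn = lambda s, a: s + a + rng.normal(scale=0.1, size=s.shape)
rew = lambda s, a, s2: -np.linalg.norm(s2 - (s + a))   # favors consistency

s, a = np.zeros(3), np.ones(3)
print(filtered_rollout_step(dyn, rew, s, a))
```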

On sparse regression, Lp-regularization, and automated model discovery

  • paper_url: http://arxiv.org/abs/2310.06872
  • repo_url: None
  • paper_authors: Jeremy A. McCulloch, Skyler R. St. Pierre, Kevin Linka, Ellen Kuhl
  • For: automatic model discovery and inducing sparsity in nonlinear regression for material modeling
  • Methods: a hybrid approach combining regularization and physical constraints; Lp regularization; constitutive neural networks; L2, L1, and L0 regularization
  • Results: discovery of interpretable models and physically meaningful parameters; a demonstration that Lp-regularized constitutive neural networks can simultaneously achieve interpretability and predictability; and potential applications in generative material design and the discovery of new materials with user-defined properties.
    Abstract Sparse regression and feature extraction are the cornerstones of knowledge discovery from massive data. Their goal is to discover interpretable and predictive models that provide simple relationships among scientific variables. While the statistical tools for model discovery are well established in the context of linear regression, their generalization to nonlinear regression in material modeling is highly problem-specific and insufficiently understood. Here we explore the potential of neural networks for automatic model discovery and induce sparsity by a hybrid approach that combines two strategies: regularization and physical constraints. We integrate the concept of Lp regularization for subset selection with constitutive neural networks that leverage our domain knowledge in kinematics and thermodynamics. We train our networks with both, synthetic and real data, and perform several thousand discovery runs to infer common guidelines and trends: L2 regularization or ridge regression is unsuitable for model discovery; L1 regularization or lasso promotes sparsity, but induces strong bias; only L0 regularization allows us to transparently fine-tune the trade-off between interpretability and predictability, simplicity and accuracy, and bias and variance. With these insights, we demonstrate that Lp regularized constitutive neural networks can simultaneously discover both, interpretable models and physically meaningful parameters. We anticipate that our findings will generalize to alternative discovery techniques such as sparse and symbolic regression, and to other domains such as biology, chemistry, or medicine. Our ability to automatically discover material models from data could have tremendous applications in generative material design and open new opportunities to manipulate matter, alter properties of existing materials, and discover new materials with user-defined properties.
    摘要 稀疏回归与特征提取是从海量数据中发现知识的基石。它们的目标是发现可解释且具有预测能力的模型，给出科学变量之间的简单关系。虽然在线性回归的语境下模型发现的统计工具已相当成熟，但将其推广到材料建模中的非线性回归时往往高度依赖具体问题，且人们对其理解尚不充分。在这里，我们探索利用神经网络进行自动模型发现，并通过结合两种策略（正则化与物理约束）的混合方法来诱导稀疏性。我们将用于子集选择的 Lp 正则化与利用运动学和热力学领域知识的本构神经网络相结合。我们用合成数据和真实数据训练网络，并进行了数千次模型发现运行，以总结共性规律与趋势：L2 正则化（岭回归）不适合模型发现；L1 正则化（lasso）促进稀疏性，但引入较强的偏差；只有 L0 正则化允许我们透明地微调可解释性与预测性、简单性与准确性、偏差与方差之间的权衡。基于这些洞见，我们证明了 Lp 正则化的本构神经网络能够同时发现可解释的模型和具有物理意义的参数。我们预计这些发现可以推广到稀疏回归、符号回归等其他发现技术，以及生物、化学或医学等其他领域。从数据中自动发现材料模型的能力可能在生成式材料设计中具有巨大的应用价值，为操控物质、改变现有材料的性质以及发现具有用户自定义属性的新材料开辟新的机会。
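
The L2/L1/L0 contrast drawn in the abstract can be reproduced in miniature on a synthetic sparse regression (an illustrative sketch, not the paper's constitutive neural networks): ridge keeps every coefficient, lasso is sparse but shrinks the survivors, and an L0-style iterative hard thresholding keeps exactly k coefficients without shrinkage.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.8]          # 3 active features out of 20
y = X @ w_true + 0.05 * rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y).coef_       # L2: dense, shrunk
lasso = Lasso(alpha=0.1).fit(X, y).coef_       # L1: sparse but biased

def iht(X, y, k, lr=0.01, steps=500):
    """L0-style iterative hard thresholding: keep the k largest weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w += lr * X.T @ (y - X @ w) / len(y)   # gradient step
        keep = np.argsort(np.abs(w))[-k:]      # hard-threshold to support k
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        w *= mask
    return w

l0 = iht(X, y, k=3)
print([(np.abs(c) > 1e-3).sum() for c in (ridge, lasso, l0)])  # e.g. [20, 3, 3]
```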

Entropy-MCMC: Sampling from Flat Basins with Ease

  • paper_url: http://arxiv.org/abs/2310.05401
  • repo_url: None
  • paper_authors: Bolian Li, Ruqi Zhang
  • for: 这个论文的目的是提出一种偏置采样方法,以优化深度学习模型的 posterior 采样。
  • methods: 该方法基于一个辅助变量,使 MCMC 采样器偏向平坦区域,从而提高采样效率和准确性。
  • results: 实验结果表明，该方法可以成功采样到深度学习模型后验的平坦盆地，并在分类、校准和分布外检测等多个基准测试上优于所有对比基线。
    Abstract Bayesian deep learning counts on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, sampling from the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection.

Find Your Optimal Assignments On-the-fly: A Holistic Framework for Clustered Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05397
  • repo_url: None
  • paper_authors: Yongxin Guo, Xiaoying Tang, Tao Lin
  • for: 这个论文旨在探讨现有的分布式机器学习方法中,如何处理客户端数据不同性,以提高模型在所有客户端上的表现。
  • methods: 该论文使用了聚类技术来解决客户端数据不同性的问题,并提出了一种四层框架,称为HCFL,以涵盖和扩展现有的方法。
  • results: 该论文通过广泛的数值评估表明,使用提出的聚类方法可以提高模型在客户端数据不同性下的表现,并且提出了进一步改进的聚类方法。
    Abstract Federated Learning (FL) is an emerging distributed machine learning approach that preserves client privacy by storing data on edge devices. However, data heterogeneity among clients presents challenges in training models that perform well on all local distributions. Recent studies have proposed clustering as a solution to tackle client heterogeneity in FL by grouping clients with distribution shifts into different clusters. However, the diverse learning frameworks used in current clustered FL methods make it challenging to integrate various clustered FL methods, gather their benefits, and make further improvements. To this end, this paper presents a comprehensive investigation into current clustered FL methods and proposes a four-tier framework, namely HCFL, to encompass and extend existing approaches. Based on the HCFL, we identify the remaining challenges associated with current clustering methods in each tier and propose an enhanced clustering method called HCFL+ to address these challenges. Through extensive numerical evaluations, we showcase the effectiveness of our clustering framework and the improved components. Our code will be publicly available.

Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning

  • paper_url: http://arxiv.org/abs/2310.05395
  • repo_url: None
  • paper_authors: Agnibh Dasgupta, Xin Zhong
  • for: 本文提出一种鲁棒的图像水印方法，在载体图像中嵌入并提取水印，并借助深度学习增强泛化能力与鲁棒性。
  • methods: 现有方法多使用卷积与拼接实现水印嵌入，并在训练中融入可能的增强；本文改用多头交叉注意力机制并学习不变域表示。
  • results: 本文带来两项重要的新进展：其一，设计了利用多头交叉注意力机制的水印嵌入技术，使载体图像与水印之间交换信息，以识别语义上合适的嵌入位置；其二，提出学习同时刻画水印语义信息与噪声不变信息的不变域表示，为改进图像水印技术指明了有价值的方向。
    Abstract Image watermarking involves embedding and extracting watermarks within a cover image, with deep learning approaches emerging to bolster generalization and robustness. Predominantly, current methods employ convolution and concatenation for watermark embedding, while also integrating conceivable augmentation in the training process. This paper explores a robust image watermarking methodology by harnessing cross-attention and invariant domain learning, marking two novel, significant advancements. First, we design a watermark embedding technique utilizing a multi-head cross attention mechanism, enabling information exchange between the cover image and watermark to identify semantically suitable embedding locations. Second, we advocate for learning an invariant domain representation that encapsulates both semantic and noise-invariant information concerning the watermark, shedding light on promising avenues for enhancing image watermarking techniques.
    摘要 图像水印技术是在载体图像中嵌入并提取水印，深度学习方法的出现增强了其泛化能力和鲁棒性。当前方法主要使用卷积和拼接进行水印嵌入，同时在训练过程中融入可能的增强操作。本文借助交叉注意力和不变域学习，探讨一种鲁棒的图像水印方法，带来两项重要的新进展。首先，我们设计了一种利用多头交叉注意力机制的水印嵌入技术，使载体图像与水印之间能够交换信息，从而识别语义上合适的嵌入位置。其次，我们提倡学习一种不变域表示，同时刻画水印的语义信息与噪声不变信息，为改进图像水印技术指明了有价值的方向。
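
As a hedged sketch of the embedding mechanism described above (the dimensions, token counts, and the use of watermark bits as queries are assumptions of ours, not the paper's exact architecture), multi-head cross-attention between watermark tokens and cover-image patch embeddings can be written directly with PyTorch's built-in module:

```python
import torch
import torch.nn as nn

# Cross-attention: watermark tokens query the cover image's patch tokens,
# so embedding strength can concentrate on semantically suitable locations.
embed_dim, num_heads = 64, 4
cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

batch = 2
patches = torch.randn(batch, 196, embed_dim)    # e.g. 14x14 patch embeddings
wm_tokens = torch.randn(batch, 32, embed_dim)   # 32 watermark-bit tokens

attended, weights = cross_attn(query=wm_tokens, key=patches, value=patches)
print(attended.shape, weights.shape)            # (2, 32, 64) and (2, 32, 196)
```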

Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

  • paper_url: http://arxiv.org/abs/2310.05387
  • repo_url: None
  • paper_authors: Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney
  • For: discovering governing equations from data, which is important in many scientific and engineering applications.
  • Methods: a novel equation discovery method based on kernel learning and Bayesian spike-and-slab priors (KBASS), which combines kernel regression with a Bayesian spike-and-slab prior for effective operator selection and uncertainty quantification.
  • Results: significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks, demonstrating its ability to overcome data sparsity and noise issues while providing uncertainty quantification.
    Abstract Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
    摘要 从数据中发现支配方程对许多科学和工程应用都非常重要。尽管现有方法已取得可喜的成功，但它们仍然受到数据稀疏和噪声问题的挑战，而这两者在实践中无处不在。此外，最先进的方法缺乏不确定性量化，且/或训练代价高昂。为克服这些限制，我们提出了一种基于核学习与贝叶斯尖峰-平板先验（Bayesian spike-and-slab priors）的新型方程发现方法（KBASS）。我们使用核回归来估计目标函数，它灵活、表达力强，且对数据稀疏和噪声更为鲁棒。我们将其与贝叶斯尖峰-平板先验（一种理想的贝叶斯稀疏分布）相结合，以实现有效的算子选择和不确定性量化。我们开发了一种期望传播-期望最大化（EP-EM）算法，用于高效的后验推断和函数估计。为克服核回归的计算挑战，我们将函数值置于网格之上，诱导出 Kronecker 积结构，并利用张量代数方法实现高效的计算与优化。我们在一系列基准 ODE 和 PDE 发现任务上展示了 KBASS 的显著优势。

Augmented Embeddings for Custom Retrievals

  • paper_url: http://arxiv.org/abs/2310.05380
  • repo_url: None
  • paper_authors: Anirudh Khatry, Yasharth Bajpai, Priyanshu Gupta, Sumit Gulwani, Ashish Tiwari
  • For: 研究如何使用稠密检索（dense retrieval）技术改进异质且严格的检索，以便为大语言模型（LLM）准备提示，使其完成特定任务。
  • Methods: 提出一种名为 Adapted Dense Retrieval 的机制，通过学习预训练黑盒嵌入的低秩残差适配，改进面向特定任务的异质、严格检索。
  • Results: 实验表明，Adapted Dense Retrieval 优于基于通用预训练嵌入的最新基线方法。
    Abstract Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed as homogeneous and relaxed, where queries and corpus elements are both natural language (NL) utterances (homogeneous) and the goal is to pick most relevant elements from the corpus in the Top-K, where K is large, such as 10, 25, 50 or even 100 (relaxed). Recently, retrieval is being used extensively in preparing prompts for large language models (LLMs) to enable LLMs to perform targeted tasks. These new applications of retrieval are often heterogeneous and strict -- the queries and the corpus contain different kinds of entities, such as NL and code, and there is a need for improving retrieval at Top-K for small values of K, such as K=1 or 3 or 5. Current dense retrieval techniques based on pretrained embeddings provide a general-purpose and powerful approach for retrieval, but they are oblivious to task-specific notions of similarity of heterogeneous artifacts. We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding. We empirically validate our approach by showing improvements over the state-of-the-art general-purpose embeddings-based baseline.
    摘要 信息检索是从语料库中选出与给定查询最相关的条目。经典应用中的检索通常是同质且宽松的：查询和语料条目都是自然语言（NL）表达（同质），目标是从语料库中选出 Top-K 中最相关的条目，其中 K 较大，例如 10、25、50 甚至 100（宽松）。近来，检索被广泛用于为大语言模型（LLM）准备提示，使 LLM 能够执行特定任务。这些新的检索应用往往是异质且严格的：查询与语料库包含不同类型的实体（如自然语言和代码），并且需要在 K 很小（如 K=1、3 或 5）时改进 Top-K 检索。当前基于预训练嵌入的稠密检索技术提供了一种通用而强大的检索方案，但它们无法感知异质条目之间特定于任务的相似性概念。我们提出 Adapted Dense Retrieval，一种变换嵌入的机制，以改进特定任务下的异质、严格检索。Adapted Dense Retrieval 通过学习预训练黑盒嵌入的低秩残差适配来实现。我们通过实验验证了该方法，展示了其相对于最先进的通用嵌入基线的改进。
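
One plausible reading of "learning a low-rank residual adaptation of the pretrained black-box embedding" is sketched below; the parameterization, zero initialization, and output normalization are assumptions of ours, and the abstract does not specify the training objective (typically a task-specific contrastive loss over the adapted embeddings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankResidualAdapter(nn.Module):
    """Adapt frozen black-box embeddings via a learned low-rank residual:
    e_adapted = normalize(e + B(A(e))); only A and B are trained."""

    def __init__(self, dim=768, rank=16):
        super().__init__()
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.B.weight)   # start as the identity mapping

    def forward(self, e):
        return F.normalize(e + self.B(self.A(e)), dim=-1)

adapter = LowRankResidualAdapter()
frozen = torch.randn(8, 768)            # embeddings from any pretrained encoder
adapted = adapter(frozen)
print(adapted.shape)                    # torch.Size([8, 768])
```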

Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

  • paper_url: http://arxiv.org/abs/2310.05350
  • repo_url: None
  • paper_authors: Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker, Priyanka Ranade, Torstein Collett, Grant Hodgson Perez, Christopher Krieger
  • for: 这个论文主要针对AI加速器处理能力和内存限制的问题,旨在探讨如何在可接受时间内执行机器学习任务(如训练和推理)。
  • methods: 这篇论文使用了分布式算法和电路优化技术来进行多节点环境中的模型扩展,提高模型训练和预处理的效率,并尝试将更多参数存储在有限的资源中。
  • results: 研究项目中对5个encoder-decoder LLMS进行了并行和分布式机器学习算法开发,并进行了细化的研究以量化三种ML并行方法(包括Microsoft DeepSpeed Zero Redundancy Optimizer(ZeRO)阶段)的关系。
    Abstract AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art, transformer-based model today requires use of GPU-accelerated high performance computers with high-speed interconnects. As datasets and models continue to increase in size, computational requirements and memory demands for AI also continue to grow. These challenges have inspired the development of distributed algorithm and circuit-based optimization techniques that enable the ability to progressively scale models in multi-node environments, efficiently minimize neural network cost functions for faster convergence, and store more parameters into a set number of available resources. In our research project, we focus on parallel and distributed machine learning algorithm development, specifically for optimizing the data processing and pre-training of a set of 5 encoder-decoder LLMs, ranging from 580 million parameters to 13 billion parameters. We performed a fine-grained study to quantify the relationships between three ML parallelism methods, specifically exploring Microsoft DeepSpeed Zero Redundancy Optimizer (ZeRO) stages.
    摘要 AI 加速器的处理能力和内存限制在很大程度上决定了机器学习工作负载（例如训练和推理）能够在理想时间内完成的规模。如今，训练最先进的基于 Transformer 的模型需要使用配备高速互连的 GPU 加速高性能计算机。随着数据集和模型规模的不断增长，AI 的计算需求和内存需求也在持续攀升。这些挑战推动了分布式算法和基于电路的优化技术的发展，使得模型能够在多节点环境中逐步扩展、高效地最小化神经网络代价函数以加快收敛，并在给定资源中存放更多参数。在我们的研究项目中，我们专注于并行与分布式机器学习算法的开发，特别是优化 5 个编码器-解码器 LLM（参数规模从 5.8 亿到 130 亿）的数据处理和预训练。我们进行了细粒度研究，以量化三种机器学习并行方法之间的关系，特别是探究 Microsoft DeepSpeed Zero Redundancy Optimizer（ZeRO）的各个阶段。

DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.05333
  • repo_url: https://github.com/felix-thu/DiffCPS
  • paper_authors: Longxiang He, Linrui Zhang, Junbo Tan, Xueqian Wang
  • For: 解决离线强化学习中的受限策略搜索问题，提出一种基于扩散模型的受限策略搜索方法（DiffCPS），以高表达能力的扩散策略替代先前的 AWR 范式。
  • Methods: 利用扩散模型的动作分布消除受限策略搜索中的策略分布约束，再利用基于扩散模型策略的证据下界（ELBO）来近似 KL 约束。
  • Results: 在 D4RL 基准上进行了广泛实验，证明 DiffCPS 取得了优于或至少不逊于传统基于 AWR 的基线以及近期基于扩散模型的离线 RL 方法的性能。代码可在 $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$ 获取。
    Abstract Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning, which is generally solved by advantage weighted regression (AWR). However, previous methods may still encounter out-of-distribution actions due to the limited expressivity of Gaussian-based policies. On the other hand, directly applying the state-of-the-art models with distribution expression capabilities (i.e., diffusion models) in the AWR framework is insufficient since AWR requires exact policy probability densities, which is intractable in diffusion models. In this paper, we propose a novel approach called $\textbf{Diffusion Model based Constrained Policy Search (DiffCPS)}$, which tackles the diffusion-based constrained policy search without resorting to AWR. The theoretical analysis reveals our key insights by leveraging the action distribution of the diffusion model to eliminate the policy distribution constraint in the CPS and then utilizing the Evidence Lower Bound (ELBO) of diffusion-based policy to approximate the KL constraint. Consequently, DiffCPS admits the high expressivity of diffusion models while circumventing the cumbersome density calculation brought by AWR. Extensive experimental results based on the D4RL benchmark demonstrate the efficacy of our approach. We empirically show that DiffCPS achieves better or at least competitive performance compared to traditional AWR-based baselines as well as recent diffusion-based offline RL methods. The code is now available at $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$.
    摘要 受限策略搜索（Constrained Policy Search，CPS）是离线强化学习中的基本问题，通常通过优势加权回归（Advantage Weighted Regression，AWR）求解。然而，由于基于高斯的策略表达能力有限，先前的方法仍可能遇到分布外动作。另一方面，在 AWR 框架中直接套用具有分布表达能力的最新模型（即扩散模型）也并不可行，因为 AWR 需要精确的策略概率密度，而这在扩散模型中难以计算。在本文中，我们提出了一种名为“基于扩散模型的受限策略搜索”（DiffCPS）的新方法，无需借助 AWR 即可处理基于扩散模型的受限策略搜索。理论分析揭示了我们的关键洞见：利用扩散模型的动作分布消除 CPS 中的策略分布约束，再利用基于扩散模型策略的证据下界（ELBO）来近似 KL 约束。因此，DiffCPS 既保留了扩散模型的高表达能力，又避开了 AWR 带来的繁琐密度计算。基于 D4RL 基准的大量实验结果证明了我们方法的有效性：DiffCPS 取得了优于或至少不逊于传统基于 AWR 的基线以及近期基于扩散模型的离线 RL 方法的性能。代码现已在 $\href{https://github.com/felix-thu/DiffCPS}{https://github.com/felix-thu/DiffCPS}$ 公开。

Unlearning with Fisher Masking

  • paper_url: http://arxiv.org/abs/2310.05331
  • repo_url: https://github.com/shivank21/Unlearning-with-Fisher-Masking
  • paper_authors: Yufang Liu, Changzhi Sun, Yuanbin Wu, Aimin Zhou
  • for: Machine unlearning aims to revoke some training data after learning in response to requests from users, model developers, and administrators.
  • methods: The proposed method uses a new masking strategy tailored to unlearning based on Fisher information.
  • results: The proposed method can unlearn almost completely while maintaining most of the performance on the remaining data, and exhibits stronger stability compared to other unlearning baselines.
    Abstract Machine unlearning aims to revoke some training data after learning in response to requests from users, model developers, and administrators. Most previous methods are based on direct fine-tuning, which may neither remove data completely nor retain full performances on the remain data. In this work, we find that, by first masking some important parameters before fine-tuning, the performances of unlearning could be significantly improved. We propose a new masking strategy tailored to unlearning based on Fisher information. Experiments on various datasets and network structures show the effectiveness of the method: without any fine-tuning, the proposed Fisher masking could unlearn almost completely while maintaining most of the performance on the remain data. It also exhibits stronger stability compared to other unlearning baselines
    摘要 机器遗忘（machine unlearning）旨在根据用户、模型开发者和管理员的请求，在训练完成后撤销部分训练数据的影响。先前的大多数方法基于直接微调，这既可能无法彻底消除数据的影响，也可能无法在剩余数据上保持完整的性能。在本工作中，我们发现，先遮蔽部分重要参数再进行微调，可以显著提升遗忘的效果。我们提出了一种基于 Fisher 信息、专为遗忘设计的新遮蔽策略。在多种数据集和网络结构上的实验展示了该方法的有效性：即使不进行任何微调，所提出的 Fisher 遮蔽也能几乎完全实现遗忘，同时保持剩余数据上的大部分性能。与其他遗忘基线相比，它还表现出更强的稳定性。
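
A minimal sketch of the two steps the abstract describes, under our own assumptions about granularity (per-weight masking with a fixed global fraction): estimate the diagonal Fisher information from gradients on the forget set, zero out the most informative weights, then fine-tune on the remaining data (fine-tuning not shown).

```python
import torch

def diagonal_fisher(model, loss_fn, batches):
    """Diagonal Fisher estimate: average squared gradients over batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

def mask_top_fisher(model, fisher, fraction=0.1):
    """Zero out the `fraction` of weights most informative on the forget set."""
    scores = torch.cat([f.flatten() for f in fisher.values()])
    threshold = torch.quantile(scores, 1.0 - fraction)
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.masked_fill_(fisher[n] >= threshold, 0.0)

# Toy demo with a synthetic "forget" batch (illustrative only).
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
forget = [(torch.randn(16, 10), torch.randint(0, 2, (16,)))]
fisher = diagonal_fisher(model, torch.nn.functional.cross_entropy, forget)
mask_top_fisher(model, fisher, fraction=0.1)
```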

Provable Compositional Generalization for Object-Centric Learning

  • paper_url: http://arxiv.org/abs/2310.05327
  • repo_url: None
  • paper_authors: Thaddäus Wiedemer, Jack Brady, Alexander Panfilov, Attila Juhos, Matthias Bethge, Wieland Brendel
  • for: bridging the gap between human and machine perception
  • methods: learning object-centric representations, using autoencoders with structural assumptions and enforcing encoder-decoder consistency
  • results: provable compositional generalization of object-centric representations through identifiability theory, validated through experiments on synthetic image data.
    Abstract Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception. One prominent effort is learning object-centric representations, which are widely conjectured to enable compositional generalization. Yet, it remains unclear when this conjecture will be true, as a principled theoretical or empirical understanding of compositional generalization is lacking. In this work, we investigate when compositional generalization is guaranteed for object-centric representations through the lens of identifiability theory. We show that autoencoders that satisfy structural assumptions on the decoder and enforce encoder-decoder consistency will learn object-centric representations that provably generalize compositionally. We validate our theoretical result and highlight the practical relevance of our assumptions through experiments on synthetic image data.
    摘要 学习能够泛化到已知概念的新组合的表示，对弥合人类与机器感知之间的差距至关重要。其中一项重要努力是学习以对象为中心的表示，人们普遍猜想这类表示能够实现组合泛化（compositional generalization）。然而，由于缺乏对组合泛化的系统性理论或实证理解，这一猜想何时成立仍不明确。在本工作中，我们借助可识别性理论，研究对象中心表示在何种条件下可保证组合泛化。我们证明：满足解码器结构假设并强制编码器-解码器一致性的自编码器，将学习到可被证明能够组合泛化的对象中心表示。我们在合成图像数据上的实验验证了这一理论结果，并突显了所用假设的实际相关性。

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

  • paper_url: http://arxiv.org/abs/2310.05324
  • repo_url: https://github.com/acstarnes/wain23-policy-regularization
  • paper_authors: Andrew Starnes, Anton Dereventsov, Clayton Webster
  • For: 本研究考察正则化对由策略梯度训练的强化学习智能体所生成策略的动作多样性的影响。
  • Methods: 本文在策略的优化目标函数中加入由多种 $\varphi$-散度和最大均值差异（Maximum Mean Discrepancy）构造的正则项，以促进策略的多样性。
  • Results: 数值实验表明，促进多样性的策略正则化可以在不牺牲准确性的前提下，提升多种个性化任务的性能。
    Abstract In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldomly, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy which encourages current policies to follow different state visitation and/or action choice distribution than previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and that its use on gradient-based approaches have significantly improved performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
    摘要 在这项工作中，我们考察正则化对由策略梯度训练的强化学习智能体所生成策略的动作多样性的影响。策略梯度智能体容易出现熵塌缩（entropy collapse），即某些动作很少甚至从不被选择。我们在策略的优化目标函数中加入由多种 $\varphi$-散度和最大均值差异（Maximum Mean Discrepancy）构造的正则项，鼓励当前策略采用与先前得到的策略不同的状态访问和/或动作选择分布。我们在 MNIST、CIFAR10 和 Spotify 数据集上进行了数值实验。结果展示了促进多样性的策略正则化的优势：将其用于基于梯度的方法，可在多种个性化任务上显著提升性能。此外，数值证据还表明，策略正则化能够在不损失准确性的情况下提升表现。
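
For a concrete anchor, the sketch below adds the simplest diversity-promoting term, an entropy bonus, to a vanilla policy-gradient loss. The paper's actual regularizers are $\varphi$-divergences and Maximum Mean Discrepancy between the current policy's and previously computed policies' distributions, which this stand-in does not implement.

```python
import torch

def regularized_pg_loss(logits, actions, advantages, beta=0.01):
    """Policy-gradient loss plus an entropy bonus against entropy collapse."""
    dist = torch.distributions.Categorical(logits=logits)
    pg = -(dist.log_prob(actions) * advantages).mean()
    return pg - beta * dist.entropy().mean()

# Toy batch: 16 transitions of a 4-action discrete policy.
logits = torch.randn(16, 4, requires_grad=True)
actions = torch.randint(0, 4, (16,))
advantages = torch.randn(16)

loss = regularized_pg_loss(logits, actions, advantages)
loss.backward()
print(loss.item())
```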