methods: This paper proposes a new method called Federated Orthogonal Training (FOT), which extracts the global input subspace of each layer to avoid global forgetting and modifies the aggregated updates for new tasks so that they are orthogonal to the global principal directions of old tasks.
results: Experiments show that FOT outperforms state-of-the-art continual learning methods in the CFL setting, achieving an accuracy gain of up to 15% with 27% lower forgetting, while incurring only minimal computation and communication cost and without violating privacy principles.
Abstract
Federated Learning (FL) has gained significant attraction due to its ability to enable privacy-preserving training over decentralized data. Current literature in FL mostly focuses on single-task learning. However, over time, new tasks may appear in the clients and the global model should learn these tasks without forgetting previous tasks. This real-world scenario is known as Continual Federated Learning (CFL). The main challenge of CFL is Global Catastrophic Forgetting, which corresponds to the fact that when the global model is trained on new tasks, its performance on old tasks decreases. There have been a few recent works on CFL to propose methods that aim to address the global catastrophic forgetting problem. However, these works either have unrealistic assumptions on the availability of past data samples or violate the privacy principles of FL. We propose a novel method, Federated Orthogonal Training (FOT), to overcome these drawbacks and address the global catastrophic forgetting in CFL. Our algorithm extracts the global input subspace of each layer for old tasks and modifies the aggregated updates of new tasks such that they are orthogonal to the global principal subspace of old tasks for each layer. This decreases the interference between tasks, which is the main cause for forgetting. We empirically show that FOT outperforms state-of-the-art continual learning methods in the CFL setting, achieving an average accuracy gain of up to 15% with 27% lower forgetting while only incurring a minimal computation and communication cost.
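The abstract describes projecting the aggregated update onto the orthogonal complement of the old tasks' global input subspace. A minimal sketch of that projection step is below; the helper names (`principal_subspace`, `energy_threshold`) and the choice of an energy cutoff are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def principal_subspace(old_activations: np.ndarray, energy_threshold: float = 0.95) -> np.ndarray:
    """Return an orthonormal basis U of a layer's global input subspace.

    old_activations: (n_samples, d) matrix of layer inputs gathered from old tasks.
    """
    # Right singular vectors of the activation matrix span the input subspace.
    _, s, vt = np.linalg.svd(old_activations, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, energy_threshold) + 1)
    return vt[:k].T  # (d, k)

def project_update(aggregated_update: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of the aggregated update lying in the old-task subspace,
    so the applied update is orthogonal to the old tasks' principal directions."""
    return aggregated_update - basis @ (basis.T @ aggregated_update)

# Toy usage: a (d_out, d_in) weight update projected against a d_in-dimensional input subspace.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 32))    # stand-in for old-task layer inputs
delta_w = rng.normal(size=(64, 32))   # stand-in for the aggregated update
u = principal_subspace(acts)
delta_w_orth = project_update(delta_w.T, u).T
```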
A Comparative Evaluation of FedAvg and Per-FedAvg Algorithms for Dirichlet Distributed Heterogeneous Data
paper_authors: Hamza Reguieg, Mohammed El Hanjri, Mohamed El Kamili, Abdellatif Kobbane
for: investigate Federated Learning (FL) and compare two strategies within this paradigm: Federated Averaging (FedAvg) and Personalized Federated Averaging (Per-FedAvg)
methods: use Non-Identically and Independently Distributed (Non-IID) data to evaluate the performance of both strategies
results: Per-FedAvg shows superior robustness in conditions of high data heterogeneity, and our results provide insights into the development of more effective and efficient machine learning strategies in a decentralized setting.
Abstract
In this paper, we investigate Federated Learning (FL), a paradigm of machine learning that allows for decentralized model training on devices without sharing raw data, thereby preserving data privacy. In particular, we compare two strategies within this paradigm: Federated Averaging (FedAvg) and Personalized Federated Averaging (Per-FedAvg), focusing on their performance with Non-Identically and Independently Distributed (Non-IID) data. Our analysis shows that the level of data heterogeneity, modeled using a Dirichlet distribution, significantly affects the performance of both strategies, with Per-FedAvg showing superior robustness in conditions of high heterogeneity. Our results provide insights into the development of more effective and efficient machine learning strategies in a decentralized setting.
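A common way to model the heterogeneity described here is to partition class labels across clients with a Dirichlet distribution and then aggregate with FedAvg. The sketch below shows both steps; the parameter values (e.g. `alpha`) and the size-weighted averaging are standard conventions assumed for illustration, not details taken from the paper.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, n_clients: int, alpha: float, seed: int = 0):
    """Split sample indices across clients: each class's samples are divided according
    to a Dirichlet(alpha) proportion vector; smaller alpha -> more heterogeneity."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in zip(client_indices, np.split(idx, cuts)):
            client.extend(shard.tolist())
    return client_indices

def fedavg(client_models, client_sizes):
    """FedAvg aggregation: size-weighted average of client parameter dictionaries."""
    total = sum(client_sizes)
    keys = client_models[0].keys()
    return {k: sum(w * m[k] for w, m in zip(client_sizes, client_models)) / total for k in keys}

# Toy usage: 10 classes, 5 clients, strong heterogeneity (alpha = 0.1).
labels = np.random.default_rng(1).integers(0, 10, size=2000)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
print([len(p) for p in parts])
```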
Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments
paper_authors: M. Soheil Shamaee, S. Fathi Hafshejani
for: enhance the performance of the stochastic gradient descent (SGD) algorithm
methods: use a modified decay step size based on $\frac{1}{\sqrt{t}}$ that incorporates a logarithmic term, leading to the selection of smaller step sizes in the final iterations
results: achieves a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions without the Polyak-{\L}ojasiewicz condition; numerical experiments on image classification tasks show accuracy improvements over the traditional $\frac{1}{\sqrt{t}}$ step size
Abstract
This paper introduces a novel approach to enhance the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$. The proposed step size integrates a logarithmic term, leading to the selection of smaller values in the final iterations. Our analysis establishes a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions without the Polyak-{\L}ojasiewicz condition. To evaluate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST and CIFAR10 datasets, and the results demonstrate significant improvements in accuracy, with enhancements of $0.5\%$ and $1.4\%$ observed, respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size. The source code can be found at \url{https://github.com/Shamaeem/LNSQRTStepSize}.
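The abstract only says the schedule is "$\frac{1}{\sqrt{t}}$ with a logarithmic term" that yields smaller late-iteration values, so the concrete formula below (eta0 / (sqrt(t) * log(t + 1))) is an assumption chosen to match that description, not the paper's exact schedule.

```python
import numpy as np

def modified_step_size(t: int, eta0: float = 0.1) -> float:
    """Illustrative decay schedule in the 1/sqrt(t) family with a logarithmic term.
    NOTE: assumed form for illustration; it decays faster than eta0/sqrt(t) late on."""
    return eta0 / (np.sqrt(t) * np.log(t + 1.0))

def sgd(grad, x0: np.ndarray, n_iters: int = 1000, eta0: float = 0.1) -> np.ndarray:
    """Plain SGD loop driven by the schedule above."""
    x = x0.copy()
    for t in range(1, n_iters + 1):
        x -= modified_step_size(t, eta0) * grad(x)
    return x

# Toy usage: minimize a smooth non-convex function with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda x: 2 * x + 1.5 * np.cos(3 * x) + 0.01 * rng.normal(size=x.shape)
x_final = sgd(grad, x0=np.ones(5))
```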
Privacy-Utility Tradeoff of OLS with Random Projections
paper_authors: Yun Lu, Malik Magdon-Ismail, Yu Wei, Vassilis Zikas
for: studies the differential privacy (DP) of the linear ordinary least squares (OLS) problem.
methods: analyzes the approximate LS algorithm (ALS) of Sarlos (2006) and the standard Gaussian mechanism of Dwork et al. (2014), and develops a new tight DP analysis together with tools that may be of independent interest.
results: ALS preserves privacy without modification or additional noising; the paper gives the first tight DP analysis for ALS and improved DP-analysis tools. It also shows that restrictions on the input database are necessary, so computing the DP level of such mechanisms can be infeasible for large datasets, motivating blackbox DP estimators that empirically estimate data-dependent privacy.
Abstract
We study the differential privacy (DP) of a core ML problem, linear ordinary least squares (OLS), a.k.a. $\ell_2$-regression. Our key result is that the approximate LS algorithm (ALS) (Sarlos, 2006), a randomized solution to the OLS problem primarily used to improve performance on large datasets, also preserves privacy. ALS achieves a better privacy/utility tradeoff, without modifications or further noising, when compared to alternative private OLS algorithms which modify and/or noise OLS. We give the first {\em tight} DP-analysis for the ALS algorithm and the standard Gaussian mechanism (Dwork et al., 2014) applied to OLS. Our methodology directly improves the privacy analysis of (Blocki et al., 2012) and (Sheffet, 2019) and introduces new tools which may be of independent interest: (1) the exact spectrum of $(\epsilon, \delta)$-DP parameters (``DP spectrum'') for mechanisms whose output is a $d$-dimensional Gaussian, and (2) an improved DP spectrum for random projection (compared to (Blocki et al., 2012) and (Sheffet, 2019)). All methods for private OLS (including ours) assume, often implicitly, restrictions on the input database, such as bounds on leverage and residuals. We prove that such restrictions are necessary. Hence, computing the privacy of mechanisms such as ALS must estimate these database parameters, which can be infeasible in big datasets. For more complex ML models, DP bounds may not even be tractable. There is a need for blackbox DP-estimators (Lu et al., 2022) which empirically estimate a data-dependent privacy. We demonstrate the effectiveness of such a DP-estimator by empirically recovering a DP-spectrum that matches our theory for OLS. This validates the DP-estimator in a nontrivial ML application, opening the door to its use in more complex nonlinear ML settings where theory is unavailable.
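The approximate LS algorithm referenced here (Sarlos, 2006) solves OLS on a randomly projected version of the data. A minimal sketch-and-solve example is below; the Gaussian sketch and the sketch size `m` are illustrative choices, and no claim is made about the resulting $(\epsilon, \delta)$ values, which the paper computes exactly.

```python
import numpy as np

def sketched_ols(A: np.ndarray, b: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Approximate least squares via random projection (sketch-and-solve):
    solve min_x ||S A x - S b||_2 with a Gaussian sketch S of shape (m, n)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(m, n)) / np.sqrt(m)  # Gaussian random projection
    x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_hat

# Toy usage: compare the sketched solution with exact OLS.
rng = np.random.default_rng(1)
A = rng.normal(size=(5000, 20))
x_true = rng.normal(size=20)
b = A @ x_true + 0.1 * rng.normal(size=5000)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_ols(A, b, m=500)
print(np.linalg.norm(x_sketch - x_exact))
```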
lfads-torch: A modular and extensible implementation of latent factor analysis via dynamical systems
paper_authors: Andrew R. Sedler, Chethan Pandarinath
for: denoising high-dimensional neural activity for downstream applications in science and engineering.
methods: uses Latent Factor Analysis via Dynamical Systems (LFADS), an RNN-based variational sequential autoencoder, to address the denoising problem; lfads-torch reimplements and unifies existing variants on modern Python libraries.
results: the architecture achieves state-of-the-art performance and applies to a wide variety of problems in neuroscience.
Abstract
Latent factor analysis via dynamical systems (LFADS) is an RNN-based variational sequential autoencoder that achieves state-of-the-art performance in denoising high-dimensional neural activity for downstream applications in science and engineering. Recently introduced variants and extensions continue to demonstrate the applicability of the architecture to a wide variety of problems in neuroscience. Since the development of the original implementation of LFADS, new technologies have emerged that use dynamic computation graphs, minimize boilerplate code, compose model configuration files, and simplify large-scale training. Building on these modern Python libraries, we introduce lfads-torch -- a new open-source implementation of LFADS that unifies existing variants and is designed to be easier to understand, configure, and extend. Documentation, source code, and issue tracking are available at https://github.com/arsedler9/lfads-torch .
Implicit regularization of deep residual networks towards neural ODEs
paper_authors: Pierre Marion, Yu-Han Wu, Michael E. Sander, Gérard Biau
for: establish a mathematical foundation for the link between deep residual networks and neural ordinary differential equations (ODEs), in the form of an implicit regularization of residual networks towards neural ODEs.
methods: trains nonlinear networks with gradient flow and proves that if a network is initialized as a discretization of a neural ODE, this discretization is preserved throughout training.
results: if the network satisfies a Polyak-Lojasiewicz condition, gradient flow converges to a global minimum; the condition holds for a family of residual networks whose residuals are two-layer perceptrons with only linear overparameterization in width. Numerical experiments illustrate the results.
Abstract
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Lojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
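The initialization described here, a residual network set up as an Euler discretization of a neural ODE, can be written in a few lines. The sketch below uses residual blocks of the form $x_{k+1} = x_k + \frac{1}{L} f(x_k; \theta_k)$ with two-layer perceptron residuals; the layer sizes and the constant-in-depth initialization are illustrative assumptions, one concrete instance of the family the paper studies.

```python
import torch
import torch.nn as nn

class ODEResNet(nn.Module):
    """Residual network x_{k+1} = x_k + (1/L) * f(x_k; theta_k), i.e. an Euler
    discretization with step 1/L of the neural ODE dx/dt = f(x; theta(t))."""

    def __init__(self, dim: int, width: int, depth: int):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, dim))
             for _ in range(depth)]
        )
        # Initialize all blocks identically so the weights discretize a
        # (here, constant-in-time) smooth vector field.
        with torch.no_grad():
            for block in self.blocks[1:]:
                for p_ref, p in zip(self.blocks[0].parameters(), block.parameters()):
                    p.copy_(p_ref)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = x + block(x) / self.depth  # Euler step of size 1/L
        return x

# Toy usage: a depth-64 network on 8-dimensional inputs.
net = ODEResNet(dim=8, width=128, depth=64)
y = net(torch.randn(16, 8))
```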
Symbolically integrating tensor networks over various random tensors by the second version of Python RTNI
for: introduces PyRTNI2, the second version of the Python RTNI library, which symbolically integrates tensor networks over Haar-distributed unitary matrices.
methods: interprets the element-wise moment calculus of the random matrices and tensors in terms of tensor network diagrams, relating delta functions in the calculus to edges in the diagrams.
results: PyRTNI2 can also handle Haar-distributed orthogonal matrices and real and complex normal Gaussian tensors, and can export tensor networks in the TensorNetwork format for further calculations with concrete tensors, including low dimensions where the Weingarten functions differ from the high-dimensional ones.
Abstract
We are upgrading the Python-version of RTNI, which symbolically integrates tensor networks over the Haar-distributed unitary matrices. Now, PyRTNI2 can treat the Haar-distributed orthogonal matrices and the real and complex normal Gaussian tensors as well. Moreover, it can export tensor networks in the format of TensorNetwork so that one can make further calculations with concrete tensors, even for low dimensions, where the Weingarten functions differ from the ones for high dimensions. The tutorial notebooks are found at GitHub: https://github.com/MotohisaFukuda/PyRTNI2. In this paper, we explain maths behind the program and show what kind of tensor network calculations can be made with it. For the former, we interpret the element-wise moment calculus of the above random matrices and tensors in terms of tensor network diagrams, and argue that the view is natural, relating delta functions in the calculus to edges in tensor network diagrams.
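The element-wise moment calculus mentioned here can be checked numerically in the simplest case: for a Haar-random $d \times d$ unitary $U$, $\mathbb{E}[U_{ij}\overline{U_{kl}}] = \delta_{ik}\delta_{jl}/d$. The Monte Carlo check below deliberately does not call PyRTNI2 (so as not to guess its API); it only illustrates the kind of identity the package evaluates symbolically.

```python
import numpy as np

def haar_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a Haar-distributed d x d unitary via QR of a complex Ginibre matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases to get Haar measure

# Monte Carlo estimate of E[U_ij * conj(U_kl)]; theory: delta_ik * delta_jl / d.
d, n_samples = 4, 20000
rng = np.random.default_rng(0)
acc = np.zeros((d, d, d, d), dtype=complex)
for _ in range(n_samples):
    u = haar_unitary(d, rng)
    acc += np.einsum("ij,kl->ijkl", u, u.conj())
moment = acc / n_samples

theory = np.einsum("ik,jl->ijkl", np.eye(d), np.eye(d)) / d
print(np.max(np.abs(moment - theory)))  # small, up to Monte Carlo error
```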
Noise robust speech emotion recognition with signal-to-noise ratio adapting speech enhancement
results: experimental results show that NRSER effectively improves the noise robustness of the SER system, including preventing the system from performing emotion recognition on signals consisting solely of background noise; the proposed SNR-level detection structure can also be used on its own for tasks such as data selection.
Abstract
Speech emotion recognition (SER) often experiences reduced performance due to background noise. In addition, making a prediction on signals with only background noise could undermine user trust in the system. In this study, we propose a Noise Robust Speech Emotion Recognition system, NRSER. NRSER employs speech enhancement (SE) to effectively reduce the noise in input signals. Then, the signal-to-noise-ratio (SNR)-level detection structure and waveform reconstitution strategy are introduced to reduce the negative impact of SE on speech signals with no or little background noise. Our experimental results show that NRSER can effectively improve the noise robustness of the SER system, including preventing the system from making emotion recognition on signals consisting solely of background noise. Moreover, the proposed SNR-level detection structure can be used individually for tasks such as data selection.
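The abstract does not spell out the SNR-level detection or the waveform reconstitution rule, so the sketch below is only a plausible illustration of the idea: estimate a global SNR from the enhanced signal and the removed residual, then blend the original waveform back in when the input is already clean, so speech enhancement does not distort low-noise signals. All thresholds and the blending rule are assumptions.

```python
import numpy as np

def estimate_snr_db(original: np.ndarray, enhanced: np.ndarray) -> float:
    """Crude global SNR estimate: treat the enhanced signal as 'speech' and the
    difference between original and enhanced as 'noise'."""
    noise = original - enhanced
    eps = 1e-10
    return 10.0 * np.log10((np.mean(enhanced**2) + eps) / (np.mean(noise**2) + eps))

def reconstitute(original: np.ndarray, enhanced: np.ndarray,
                 low_db: float = 10.0, high_db: float = 30.0) -> np.ndarray:
    """Blend enhanced and original waveforms by estimated SNR:
    noisy input -> keep the enhanced signal; clean input -> keep the original."""
    snr = estimate_snr_db(original, enhanced)
    w = np.clip((snr - low_db) / (high_db - low_db), 0.0, 1.0)  # weight on the original
    return w * original + (1.0 - w) * enhanced

# Toy usage with a synthetic tone plus noise and a stand-in enhancer output.
t = np.linspace(0, 1, 16000)
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.shape)
enhanced = clean + 0.05 * np.random.default_rng(1).normal(size=t.shape)
out = reconstitute(noisy, enhanced)
```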
An Accurate Graph Generative Model with Tunable Features
results: experimental results show that the new feedback error mechanism allows graph features of graphs generated by GraphTune to be tuned more accurately than with conventional models.
Abstract
A graph is a very common and powerful data structure used for modeling communication and social networks. Models that generate graphs with arbitrary features are important basic technologies in repeated simulations of networks and prediction of topology changes. Although existing generative models for graphs are useful for providing graphs similar to real-world graphs, graph generation models with tunable features have been less explored in the field. Previously, we have proposed GraphTune, a generative model for graphs that continuously tune specific graph features of generated graphs while maintaining most of the features of a given graph dataset. However, the tuning accuracy of graph features in GraphTune has not been sufficient for practical applications. In this paper, we propose a method to improve the accuracy of GraphTune by adding a new mechanism to feed back errors of graph features of generated graphs and by training them alternately and independently. Experiments on a real-world graph dataset showed that the features in the generated graphs are accurately tuned compared with conventional models.
Advances in machine-learning-based sampling motivated by lattice quantum chromodynamics
results: machine-learning models can effectively sample field configurations in lattice quantum field theory, capturing the structure and interactions of matter and enabling first-principles physics calculations.
Abstract
Sampling from known probability distributions is a ubiquitous task in computational science, underlying calculations in domains from linguistics to biology and physics. Generative machine-learning (ML) models have emerged as a promising tool in this space, building on the success of this approach in applications such as image, text, and audio generation. Often, however, generative tasks in scientific domains have unique structures and features -- such as complex symmetries and the requirement of exactness guarantees -- that present both challenges and opportunities for ML. This Perspective outlines the advances in ML-based sampling motivated by lattice quantum field theory, in particular for the theory of quantum chromodynamics. Enabling calculations of the structure and interactions of matter from our most fundamental understanding of particle physics, lattice quantum chromodynamics is one of the main consumers of open-science supercomputing worldwide. The design of ML algorithms for this application faces profound challenges, including the necessity of scaling custom ML architectures to the largest supercomputers, but also promises immense benefits, and is spurring a wave of development in ML-based sampling more broadly. In lattice field theory, if this approach can realize its early promise it will be a transformative step towards first-principles physics calculations in particle, nuclear and condensed matter physics that are intractable with traditional approaches.
for: users who want to perform machine learning tasks but do not have deep domain knowledge; the paper presents AutoML-GPT, a framework that simplifies the machine learning pipeline and reduces the time and effort these tasks require.
methods: a conversational interface lets users specify their requirements, constraints, and evaluation metrics; the system employs advanced techniques for hyperparameter optimization and model selection so that the resulting model achieves optimal performance.
results: experimental results on diverse datasets show that AutoML-GPT significantly reduces the time and effort required for machine learning tasks; by leveraging the vast knowledge encoded in large language models, it provides valuable insights, identifies potential pitfalls, and suggests effective solutions to common challenges faced during model training.
Abstract
With the emerging trend of GPT models, we have established a framework called AutoML-GPT that integrates a comprehensive set of tools and libraries. This framework grants users access to a wide range of data preprocessing techniques, feature engineering methods, and model selection algorithms. Through a conversational interface, users can specify their requirements, constraints, and evaluation metrics. Throughout the process, AutoML-GPT employs advanced techniques for hyperparameter optimization and model selection, ensuring that the resulting model achieves optimal performance. The system effectively manages the complexity of the machine learning pipeline, guiding users towards the best choices without requiring deep domain knowledge. Through our experimental results on diverse datasets, we have demonstrated that AutoML-GPT significantly reduces the time and effort required for machine learning tasks. Its ability to leverage the vast knowledge encoded in large language models enables it to provide valuable insights, identify potential pitfalls, and suggest effective solutions to common challenges faced during model training.
results: the review maps the most commonly used data sources, evaluation metrics, and method availability for machine-learning-based tools and frameworks in B-cell immunotherapy design, assesses their significance and limitations, and discusses the main challenges ahead.
Abstract
Antibodies, a prominent class of approved biologics, play a crucial role in detecting foreign antigens. The effectiveness of antigen neutralisation and elimination hinges upon the strength, sensitivity, and specificity of the paratope-epitope interaction, which demands resource-intensive experimental techniques for characterisation. In recent years, artificial intelligence and machine learning methods have made significant strides, revolutionising the prediction of protein structures and their complexes. The past decade has also witnessed the evolution of computational approaches aiming to support immunotherapy design. This review focuses on the progress of machine learning-based tools and their frameworks in the domain of B-cell immunotherapy design, encompassing linear and conformational epitope prediction, paratope prediction, and antibody design. We mapped the most commonly used data sources, evaluation metrics, and method availability and thoroughly assessed their significance and limitations, discussing the main challenges ahead.
Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation
results: the study shows that double clipping reduces the bias of the estimator while maintaining the variance-reduction properties of the original clipped estimator.
Abstract
"Clipping" (a.k.a. importance weight truncation) is a widely used variance-reduction technique for counterfactual off-policy estimators. Like other variance-reduction techniques, clipping reduces variance at the cost of increased bias. However, unlike other techniques, the bias introduced by clipping is always a downward bias (assuming non-negative rewards), yielding a lower bound on the true expected reward. In this work we propose a simple extension, called $\textit{double clipping}$, which aims to compensate this downward bias and thus reduce the overall bias, while maintaining the variance reduction properties of the original estimator.
Carbon Emission Prediction and Clean Industry Transformation Based on Machine Learning: A Case Study of Sichuan Province
for: preprocesses 2000-2019 energy consumption data for 46 key industries in Sichuan Province using matrix normalization, and groups industries objectively with DBSCAN clustering.
methods: DBSCAN clustering for objective industry grouping, together with penalized regression models chosen for their control of overfitting, handling of high-dimensional data, and feature selection, which suit the complex energy data.
results: the coal-centered second cluster had the highest emissions, driven by production needs; the gasoline-focused and coke-focused clusters also showed significant emissions. Based on these findings, the suggested reduction measures include clean coal technologies, transportation management, replacing coal with electricity in the steel industry, and industry standardization.
Abstract
This study preprocessed 2000-2019 energy consumption data for 46 key Sichuan industries using matrix normalization. DBSCAN clustering identified 16 feature classes to objectively group industries. Penalized regression models were then applied for their advantages in overfitting control, high-dimensional data processing, and feature selection - well-suited for the complex energy data. Results showed the second cluster around coal had highest emissions due to production needs. Emissions from gasoline-focused and coke-focused clusters were also significant. Based on this, emission reduction suggestions included clean coal technologies, transportation management, coal-electricity replacement in steel, and industry standardization. The research introduced unsupervised learning to objectively select factors and aimed to explore new emission reduction avenues. In summary, the study identified industry groupings, assessed emissions drivers, and proposed scientific reduction strategies to better inform decision-making using algorithms like DBSCAN and penalized regression models.
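A minimal version of the pipeline described in this abstract (normalize, cluster industries with DBSCAN, then fit a penalized regression) can be sketched with scikit-learn. The synthetic data, the Lasso choice of penalty, and the hyperparameter values below are assumptions for illustration only, not the study's configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Stand-in for the 46-industry x 20-year energy-consumption matrix (synthetic,
# with two loose groups so the clustering step has something to find).
group_a = rng.normal(loc=0.0, size=(30, 20))
group_b = rng.normal(loc=4.0, size=(16, 20))
X = np.vstack([group_a, group_b])
emissions = X @ rng.normal(size=20) + 0.5 * rng.normal(size=46)  # synthetic target

# 1) Matrix normalization of the industry features.
X_norm = StandardScaler().fit_transform(X)

# 2) DBSCAN clustering to group industries objectively (eps/min_samples are illustrative).
clusters = DBSCAN(eps=3.0, min_samples=3).fit_predict(X_norm)
print("cluster labels:", np.unique(clusters))

# 3) Penalized regression (here Lasso) for overfitting control and feature selection.
model = Lasso(alpha=0.1).fit(X_norm, emissions)
print("selected feature indices:", np.flatnonzero(model.coef_))
```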
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
results: in both seen and unseen conditions, pre-trained SSL representations markedly improve articulatory prediction for dysarthric speech compared with MFCCs; in the fine-tuned scheme, DeCoAR yields relative improvements in the Pearson correlation coefficient (CC) of ${\sim}1.81\%$ and ${\sim}4.56\%$ for healthy controls and patients, respectively.
Abstract
Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic space to the articulatory space. Signal-processing features like the MFCCs have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ${\sim}$1.81\% and ${\sim}$4.56\% for healthy controls and patients, respectively, over MFCCs. In the unseen case, we observe similar average trends for different SSL features. Overall, SSL networks like wav2vec, APC, and DeCoAR, which are trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.
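The model described here, a BLSTM mapping frame-level SSL features (conditioned on an utterance-level x-vector) to articulatory trajectories, can be sketched as follows. The feature dimensions, the concatenation-based conditioning, and the layer sizes are assumptions for illustration rather than the paper's exact configuration; the Pearson CC is the evaluation metric quoted in the abstract.

```python
import torch
import torch.nn as nn

class AAIBLSTM(nn.Module):
    """BLSTM regressor from SSL speech features to articulatory trajectories,
    conditioned on a per-utterance speaker x-vector (concatenated to every frame)."""

    def __init__(self, ssl_dim: int = 768, xvec_dim: int = 512,
                 hidden: int = 256, n_articulators: int = 12):
        super().__init__()
        self.blstm = nn.LSTM(ssl_dim + xvec_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_articulators)

    def forward(self, ssl_feats: torch.Tensor, xvec: torch.Tensor) -> torch.Tensor:
        # ssl_feats: (batch, frames, ssl_dim); xvec: (batch, xvec_dim)
        xvec_tiled = xvec.unsqueeze(1).expand(-1, ssl_feats.size(1), -1)
        h, _ = self.blstm(torch.cat([ssl_feats, xvec_tiled], dim=-1))
        return self.head(h)  # (batch, frames, n_articulators)

def pearson_cc(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pearson correlation coefficient between predicted and true trajectories."""
    p = pred - pred.mean()
    t = target - target.mean()
    return (p * t).sum() / (p.norm() * t.norm() + 1e-8)

# Toy usage on random tensors.
model = AAIBLSTM()
feats, xvec = torch.randn(4, 200, 768), torch.randn(4, 512)
traj = model(feats, xvec)
print(traj.shape, pearson_cc(traj, torch.randn_like(traj)).item())
```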
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
results: numerical experiments show that the method learns a more robust and less conservative policy than one obtained under traditional rectangular uncertainty.
Abstract
In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally independently structured for each state. This so-called rectangularity condition is solely motivated by computational concerns. As a result, it lacks a practical incentive and may lead to overly conservative behavior. In this work, we study coupled reward RMDPs where the transition kernel is fixed, but the reward function lies within an $\alpha$-radius from a nominal one. We draw a direct connection between this type of non-rectangular reward-RMDPs and applying policy visitation frequency regularization. We introduce a policy-gradient method, and prove its convergence. Numerical experiments illustrate the learned policy's robustness and its less conservative behavior when compared to rectangular uncertainty.
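The connection stated here between a norm-ball reward uncertainty set and visitation-frequency regularization can be made explicit in one line. The derivation below assumes an $\ell_2$ ball for concreteness; if the paper's $\alpha$-radius set uses a different norm, the dual norm of the visitation frequencies appears instead.

```latex
% Worst case over an l2 reward ball reduces to visitation-frequency regularization.
% d_pi denotes the visitation frequencies of policy pi, so J(pi, r) = <d_pi, r>.
\begin{align*}
  \min_{\|r - r_0\|_2 \le \alpha} \langle d_\pi, r \rangle
  &= \langle d_\pi, r_0 \rangle
     + \min_{\|\delta\|_2 \le \alpha} \langle d_\pi, \delta \rangle \\
  &= \langle d_\pi, r_0 \rangle - \alpha \,\| d_\pi \|_2 .
\end{align*}
% Maximizing the robust return over pi is therefore the nominal objective
% penalized by the norm of the policy's visitation frequencies.
```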
Tropical Geometric Tools for Machine Learning: the TML package
methods: uses a Hit and Run Markov chain Monte Carlo sampler together with the tropical metric under the max-plus algebra for statistical inference; the package also provides tropical supervised and unsupervised learning methods, including tropical principal component analysis, tropical logistic regression, and tropical kernel density estimation.
results: the tropical HAR sampler enables effective statistical inference and supports a range of supervised and unsupervised learning applications, such as tropical principal component analysis and tropical kernel density estimation.
Abstract
In the last decade, developments in tropical geometry have provided a number of uses directly applicable to problems in statistical learning. The TML package is the first R package which contains a comprehensive set of tools and methods used for basic computations related to tropical convexity, visualization of tropically convex sets, as well as supervised and unsupervised learning models using the tropical metric under the max-plus algebra over the tropical projective torus. Primarily, the TML package employs a Hit and Run Markov chain Monte Carlo sampler in conjunction with the tropical metric as its main tool for statistical inference. In addition to basic computation and various applications of the tropical HAR sampler, we also focus on several supervised and unsupervised methods incorporated in the TML package including tropical principal component analysis, tropical logistic regression and tropical kernel density estimation.
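The tropical metric on the tropical projective torus referenced here has a simple closed form, $d_{\mathrm{tr}}(x, y) = \max_i (x_i - y_i) - \min_i (x_i - y_i)$. A small sketch is below; it is written in Python and does not use the TML package itself, whose R interface is not reproduced here.

```python
import numpy as np

def tropical_metric(x: np.ndarray, y: np.ndarray) -> float:
    """Tropical metric on the tropical projective torus:
    d_tr(x, y) = max_i (x_i - y_i) - min_i (x_i - y_i).
    It is invariant under adding a constant to all coordinates (tropical scaling)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff.max() - diff.min())

# Toy usage: the metric ignores the all-ones direction.
x = np.array([0.0, 2.0, 5.0])
y = np.array([1.0, 1.0, 4.0])
print(tropical_metric(x, y))        # 2.0
print(tropical_metric(x + 7.0, y))  # still 2.0: shifting x by a constant changes nothing
```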
Federated Few-shot Learning for Cough Classification with Edge Devices
results: F2LCough achieves an average F1-score of 86% on the COVID-19 Thermal Face & Cough dataset, higher than other approaches; this shows that few-shot learning combined with federated learning can build a cough classification model in data-scarce situations while preserving privacy.
Abstract
Automatically classifying cough sounds is one of the most critical tasks for the diagnosis and treatment of respiratory diseases. However, collecting a huge amount of labeled cough dataset is challenging mainly due to high laborious expenses, data scarcity, and privacy concerns. In this work, our aim is to develop a framework that can effectively perform cough classification even in situations when enormous cough data is not available, while also addressing privacy concerns. Specifically, we formulate a new problem to tackle these challenges and adopt few-shot learning and federated learning to design a novel framework, termed F2LCough, for solving the newly formulated problem. We illustrate the superiority of our method compared with other approaches on COVID-19 Thermal Face & Cough dataset, in which F2LCough achieves an average F1-Score of 86%. Our results show the feasibility of few-shot learning combined with federated learning to build a classification model of cough sounds. This new methodology is able to classify cough sounds in data-scarce situations and maintain privacy properties. The outcomes of this work can be a fundamental framework for building support systems for the detection and diagnosis of cough-related diseases.
Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models
results: experiments show that the proposed method reduces the parameter count and computational complexity while achieving inference performance comparable to existing methods.
Abstract
The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating its ability to achieve similar inference performance compared to existing methods. Code is available at \url{https://github.com/zhidilin/gpssmProj}.
MQENet: A Mesh Quality Evaluation Neural Network Based on Dynamic Graph Attention
results: experiments show that MQENet effectively evaluates the quality of structured meshes and achieves high evaluation accuracy on the NACA-Market benchmark dataset.
Abstract
With the development of computational fluid dynamics, the requirements for the fluid simulation accuracy in industrial applications have also increased. The quality of the generated mesh directly affects the simulation accuracy. However, previous mesh quality metrics and models cannot evaluate meshes comprehensively and objectively. To this end, we propose MQENet, a structured mesh quality evaluation neural network based on dynamic graph attention. MQENet treats the mesh evaluation task as a graph classification task for classifying the quality of the input structured mesh. To make graphs generated from structured meshes more informative, MQENet introduces two novel structured mesh preprocessing algorithms. These two algorithms can also improve the conversion efficiency of structured mesh data. Experimental results on the benchmark structured mesh dataset NACA-Market show the effectiveness of MQENet in the mesh quality evaluation task.
Distribution learning via neural differential equations: a nonparametric statistical perspective
results: the paper establishes a general nonparametric statistical convergence analysis and obtains nearly minimax-optimal convergence rates for $C^k$-smooth target densities with both $C^k$ and neural-network velocity field classes.
Abstract
Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs. Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes. The results also provide theoretical insight on how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.
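The likelihood-maximization training referred to here rests on the standard flow-map change-of-variables identity for ODE models (continuous normalizing flows); writing it out clarifies what fitting a velocity field $v \in \mathcal{F}$ by likelihood maximization means. The identity is standard and not specific to this paper's proofs.

```latex
% Flow map of the ODE model and the exact log-likelihood it induces
% (instantaneous change of variables, as used in continuous normalizing flows).
\begin{align*}
  \frac{\mathrm{d} z(t)}{\mathrm{d} t} &= v\bigl(z(t), t\bigr),
  \qquad z(0) \sim p_0, \qquad x = z(1), \\
  \log p_X(x) &= \log p_0\bigl(z(0)\bigr)
    - \int_0^1 \nabla \!\cdot v\bigl(z(t), t\bigr)\, \mathrm{d} t, \\
  \widehat{v} &\in \operatorname*{arg\,max}_{v \in \mathcal{F}}\;
    \frac{1}{n} \sum_{i=1}^{n} \log p_X^{(v)}(x_i).
\end{align*}
% Training maximizes this exact likelihood over the class F; the paper's rates
% balance the approximation error of F against its C^1-metric entropy.
```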