results: Experiments on two synthetic datasets and one real-world dataset show that the RecAD framework can effectively detect anomalies and recommend recourse actions.
Abstract
Anomaly detection in multivariate time series has received extensive study due to the wide spectrum of applications. An anomaly in multivariate time series usually indicates a critical event, such as a system fault or an external attack. Therefore, besides being effective in anomaly detection, recommending anomaly mitigation actions is also important in practice yet under-investigated. In this work, we focus on algorithmic recourse in time series anomaly detection, which is to recommend fixing actions on abnormal time series with a minimum cost so that domain experts can understand how to fix the abnormal behavior. To this end, we propose an algorithmic recourse framework, called RecAD, which can recommend recourse actions to flip the abnormal time steps. Experiments on two synthetic and one real-world datasets show the effectiveness of our framework.
The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing
results: Experiments show that the method significantly improves the certified radius of the classifier and that the certification can be applied to pre-trained models in a zero-shot manner.
Abstract
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius is in this context a crucial indicator of the robustness of models. However how to design an efficient classifier with a sufficient certified radius? Randomized smoothing provides a promising framework by relying on noise injection in inputs to obtain a smoothed and more robust classifier. In this paper, we first show that the variance introduced by randomized smoothing closely interacts with two other important properties of the classifier, i.e. its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. Moreover, to increase the certified robust radius, we introduce a different simplex projection technique for the base classifier to leverage the variance-margin trade-off thanks to Bernstein's concentration inequality, along with an enhanced Lipschitz bound. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models that are used with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
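For readers unfamiliar with the certification procedure the abstract builds on, the sketch below shows the standard Monte Carlo certificate of Cohen et al. for a Gaussian-smoothed classifier. It is not the paper's enhanced procedure, and the base classifier `f`, noise level `sigma`, and sample sizes are illustrative assumptions.

```python
# Hedged sketch of standard randomized-smoothing certification (Cohen et al.),
# the baseline that the paper's Lipschitz-variance-margin analysis improves on.
import numpy as np
from scipy.stats import norm, binomtest

def certify(f, x, sigma=0.25, n0=100, n=10_000, alpha=0.001, num_classes=10):
    """f: base classifier mapping a 1-D input to an integer label (assumed).
    Returns (predicted class, certified L2 radius), or (None, 0.0) on abstention."""
    def counts(num):
        noisy = x[None, :] + sigma * np.random.randn(num, x.size)
        return np.bincount([f(z) for z in noisy], minlength=num_classes)
    guess = counts(n0).argmax()                               # selection round
    k = counts(n)[guess]                                      # estimation round
    p_a = binomtest(int(k), n).proportion_ci(1 - alpha).low   # lower bound on p_A
    if p_a <= 0.5:
        return None, 0.0
    return guess, sigma * norm.ppf(p_a)                       # certified radius
```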
Message Propagation Through Time: An Algorithm for Sequence Dependency Retention in Time Series Modeling
methods: The Message Propagation Through Time (MPTT) algorithm uses two memory modules to asynchronously manage the initial hidden states of RNNs, enabling information exchange across mini-batches. MPTT further implements three policies to filter outdated information and preserve essential information, providing informative initial hidden states for the RNNs.
results: Experiments on four climate datasets show that MPTT outperforms seven competing strategies.
Abstract
Time series modeling, a crucial area in science, often encounters challenges when training Machine Learning (ML) models like Recurrent Neural Networks (RNNs) using the conventional mini-batch training strategy that assumes independent and identically distributed (IID) samples and initializes RNNs with zero hidden states. The IID assumption ignores temporal dependencies among samples, resulting in poor performance. This paper proposes the Message Propagation Through Time (MPTT) algorithm to effectively incorporate long temporal dependencies while preserving faster training times relative to the stateful solutions. MPTT utilizes two memory modules to asynchronously manage initial hidden states for RNNs, fostering seamless information exchange between samples and allowing diverse mini-batches throughout epochs. MPTT further implements three policies to filter outdated and preserve essential information in the hidden states to generate informative initial hidden states for RNNs, facilitating robust training. Experimental results demonstrate that MPTT outperforms seven strategies on four climate datasets with varying levels of temporal dependencies.
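Since the abstract's key idea is replacing zero initial hidden states with states propagated across mini-batches, a simplified illustration of that idea is given below; it is not the MPTT algorithm itself (no filtering policies are shown), and the memory layout is an assumption.

```python
# Simplified sketch: reuse each sequence's last hidden state (stored in a memory
# keyed by sequence id) as the initial state of the next window, instead of zeros.
import torch
import torch.nn as nn

class StatefulGRURunner:
    def __init__(self, input_size=8, hidden_size=32):
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)
        self.hidden_size = hidden_size
        self.memory = {}                                   # sequence id -> hidden state

    def forward_batch(self, batch_x, seq_ids):
        # batch_x: (batch, seq_len, input_size); seq_ids: list of hashable ids.
        h0 = torch.stack([self.memory.get(sid, torch.zeros(self.hidden_size))
                          for sid in seq_ids]).unsqueeze(0)
        out, hn = self.rnn(batch_x, h0)
        for i, sid in enumerate(seq_ids):                  # detach: no gradient across batches
            self.memory[sid] = hn[0, i].detach()
        return out
```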
Sharp Generalization of Transductive Learning: A Transductive Local Rademacher Complexity Approach
for: This paper introduces a new tool, Transductive Local Rademacher Complexity (TLRC), for analyzing the generalization performance of transductive learning methods.
methods: The paper adapts and extends Local Rademacher Complexity (LRC), a localization-based analysis tool, to the transductive learning setting.
results: The paper obtains sharp generalization bounds under proper conditions for a range of transductive learning problems, and designs TLRC-based low-rank methods for Graph Transductive Learning (GTL) and Transductive Nonparametric Kernel Regression (TNKR) whose generalization bounds are sharper than those achievable by existing learning theory methods.
Abstract
We introduce a new tool, Transductive Local Rademacher Complexity (TLRC), to analyze the generalization performance of transductive learning methods and motivate new transductive learning algorithms. Our work extends the idea of the popular Local Rademacher Complexity (LRC) to the transductive setting with considerable changes compared to the analysis of typical LRC methods in the inductive setting. We present a localized version of the Rademacher complexity based tool which can be applied to various transductive learning problems and gain sharp bounds under proper conditions. Similar to the development of LRC, we build TLRC by starting from a sharp concentration inequality for independent variables with variance information. The prediction function class of a transductive learning model is then divided into pieces with a sub-root function being the upper bound for the Rademacher complexity of each piece, and the variance of all the functions in each piece is limited. A carefully designed variance operator is used to ensure that the bound for the test loss on unlabeled test data in the transductive setting enjoys a remarkable similarity to that of the classical LRC bound in the inductive setting. We use the new TLRC tool to analyze the Transductive Kernel Learning (TKL) model, where the labels of test data are generated by a kernel function. The result of TKL lays the foundation for generalization bounds for two types of transductive learning tasks, Graph Transductive Learning (GTL) and Transductive Nonparametric Kernel Regression (TNKR). When the target function is low-dimensional or approximately low-dimensional, we design low rank methods for both GTL and TNKR, which enjoy particularly sharper generalization bounds by TLRC which cannot be achieved by existing learning theory methods, to the best of our knowledge.
Applications of Federated Learning in IoT for Hyper Personalisation
results: The approach enables ultra-personalization, with models trained across multiple clients without bringing the data to a central server.
Abstract
Billions of IoT devices are being deployed, taking advantage of faster internet and the opportunity to access more endpoints. Vast quantities of data are constantly generated by these devices but are not effectively utilised. Federated learning (FL) allows machine learning models to be trained over these multiple clients without having to bring the data to a central server. We explore how to use such models to implement levels of personalization not possible before.
Optimal Nonlinearities Improve Generalization Performance of Random Features
methods: The paper studies a random feature model with a nonlinear activation function and analyzes its equivalent Gaussian model to characterize the important role played by the activation function.
results: Experiments show that optimizing the nonlinearity improves generalization performance and mitigates the double descent phenomenon. The paper also provides example classes of optimized nonlinearities, such as second-order polynomials and piecewise linear functions, applicable to both regression and classification problems.
Abstract
Random feature model with a nonlinear activation function has been shown to perform asymptotically equivalent to a Gaussian model in terms of training and generalization errors. Analysis of the equivalent model reveals an important yet not fully understood role played by the activation function. To address this issue, we study the "parameters" of the equivalent model to achieve improved generalization performance for a given supervised learning problem. We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities. We provide two example classes from this set, e.g., second-order polynomial and piecewise linear functions. These functions are optimized to improve generalization performance regardless of the actual form. We experiment with regression and classification problems, including synthetic and real (e.g., CIFAR10) data. Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU. Furthermore, we illustrate that the proposed nonlinearities also mitigate the so-called double descent phenomenon, which is known as the non-monotonic generalization performance regarding the sample size and the model size.
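As a concrete illustration of the setting, the sketch below fits random-feature ridge regression with a swappable activation, for example ReLU versus a second-order polynomial of the kind the abstract identifies as an optimal nonlinearity class. The polynomial coefficients here are placeholders, not the optimized values from the paper.

```python
# Minimal random-feature ridge regression with a pluggable nonlinearity.
import numpy as np

def random_feature_regression(X_tr, y_tr, X_te, act, n_features=512, reg=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X_tr.shape[1], n_features)) / np.sqrt(X_tr.shape[1])
    Z_tr, Z_te = act(X_tr @ W), act(X_te @ W)               # random features
    coef = np.linalg.solve(Z_tr.T @ Z_tr + reg * np.eye(n_features), Z_tr.T @ y_tr)
    return Z_te @ coef

relu  = lambda u: np.maximum(u, 0.0)
poly2 = lambda u: 0.2 * u**2 + u      # placeholder second-order polynomial
```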
Constant Approximation for Individual Preference Stable Clustering
paper_authors: Anders Aamand, Justin Y. Chen, Allen Liu, Sandeep Silwal, Pattara Sukprasert, Ali Vakilian, Fred Zhang
For: This paper studies Individual Preference (IP) stability, a natural clustering objective motivated by stability and fairness constraints, and asks whether approximately IP-stable clusterings can be computed efficiently.
Methods: The paper analyzes the IP stability objective and, using new techniques, gives an efficient algorithm that outputs an $O(1)$-IP stable clustering.
Results: For general metrics, an $O(1)$-IP stable clustering always exists and can be computed efficiently. The paper also introduces generalizations of IP stability beyond average distance and provides efficient, near-optimal algorithms for them.
Abstract
Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is $\alpha$-IP stable if the average distance of every data point to its own cluster is at most $\alpha$ times the average distance to any other cluster. Unfortunately, determining if a dataset admits a $1$-IP stable clustering is NP-Hard. Moreover, before this work, it was unknown if an $o(n)$-IP stable clustering always \emph{exists}, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in understanding and show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering. We also introduce generalizations of IP stability beyond average distance and give efficient, near-optimal algorithms in the cases where we consider the maximum and minimum distances within and between clusters.
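The definition in the abstract translates directly into a check; the helper below computes the smallest alpha for which a given clustering is alpha-IP stable under Euclidean distance. It is a definition-level utility, not the paper's $O(1)$-approximation algorithm.

```python
# Compute the IP-stability factor of a clustering.
import numpy as np

def ip_stability_alpha(X, labels):
    """X: (n, d) points; labels: (n,) integer cluster assignments."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    alpha = 0.0
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                                   # exclude the point itself
        if not own.any():
            continue                                     # singleton cluster: trivially stable
        d_own = D[i, own].mean()
        for c in np.unique(labels):
            if c == labels[i]:
                continue
            alpha = max(alpha, d_own / D[i, labels == c].mean())
    return alpha                                         # clustering is alpha-IP stable
```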
An analysis of the derivative-free loss method for solving PDEs
results: The study analyzes the effect of the time interval and the walker size on computational efficiency, trainability, and sampling errors, and provides analytic results supported by numerical tests.
Abstract
This study analyzes the derivative-free loss method to solve a certain class of elliptic PDEs using neural networks. The derivative-free loss method uses the Feynman-Kac formulation, incorporating stochastic walkers and their corresponding average values. We investigate the effect of the time interval related to the Feynman-Kac formulation and the walker size in the context of computational efficiency, trainability, and sampling errors. Our analysis shows that the training loss bias is proportional to the time interval and the spatial gradient of the neural network while inversely proportional to the walker size. We also show that the time interval must be sufficiently long to train the network. These analytic results tell that we can choose the walker size as small as possible based on the optimal lower bound of the time interval. We also provide numerical tests supporting our analysis.
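To make the walker-based loss concrete, here is a minimal sketch for the Laplace equation, where the Feynman-Kac (martingale) property says a solution equals the average of short Brownian walkers started at the point. The network, time interval `dt`, and walker count are illustrative assumptions, and boundary terms are omitted.

```python
# Derivative-free residual: compare u(x) with the average of u at walker positions.
import torch
import torch.nn as nn

def derivative_free_loss(u_net, x, dt=0.01, n_walkers=16):
    # x: (batch, dim) interior collocation points
    noise = torch.randn(n_walkers, *x.shape)
    x_walk = x.unsqueeze(0) + (dt ** 0.5) * noise        # one Euler step of Brownian motion
    u_walk = u_net(x_walk.reshape(-1, x.shape[-1])).reshape(n_walkers, -1)
    return ((u_net(x).squeeze(-1) - u_walk.mean(0)) ** 2).mean()

u_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # example network
loss = derivative_free_loss(u_net, torch.rand(128, 2))
```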
Post-Training Overfitting Mitigation in DNN Classifiers
results: Experiments on the CIFAR-10 and CIFAR-100 datasets show that post-training MM-based regularization mitigates backdoor attacks while also improving clean generalization accuracy.
Abstract
Well-known (non-malicious) sources of overfitting in deep neural net (DNN) classifiers include: i) large class imbalances; ii) insufficient training-set diversity; and iii) over-training. In recent work, it was shown that backdoor data-poisoning also induces overfitting, with unusually large classification margins to the attacker's target class, mediated particularly by (unbounded) ReLU activations that allow large signals to propagate in the DNN. Thus, an effective post-training (with no knowledge of the training set or training process) mitigation approach against backdoors was proposed, leveraging a small clean dataset, based on bounding neural activations. Improving upon that work, we threshold activations specifically to limit maximum margins (MMs), which yields performance gains in backdoor mitigation. We also provide some analytical support for this mitigation approach. Most importantly, we show that post-training MM-based regularization substantially mitigates non-malicious overfitting due to class imbalances and overtraining. Thus, unlike adversarial training, which provides some resilience against attacks but which harms clean (attack-free) generalization, we demonstrate an approach originating from adversarial learning that helps clean generalization accuracy. Experiments on CIFAR-10 and CIFAR-100, in comparison with peer methods, demonstrate strong performance of our methods.
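Because the mitigation hinges on bounding unbounded ReLU activations after training, the sketch below shows the general mechanism of swapping ReLUs for clamped ones. The clamp value is a fixed placeholder here, whereas the paper tunes bounds on a small clean set specifically to limit maximum margins.

```python
# Replace every nn.ReLU in a trained model with a clamped (bounded) version.
import torch
import torch.nn as nn

class BoundedReLU(nn.Module):
    def __init__(self, bound: float):
        super().__init__()
        self.bound = bound

    def forward(self, x):
        return torch.clamp(x, min=0.0, max=self.bound)

def bound_relus(model: nn.Module, bound: float = 3.0) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, BoundedReLU(bound))      # swap in place
        else:
            bound_relus(child, bound)                     # recurse into submodules
    return model
```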
FENDA-FL: Personalized Federated Learning on Heterogeneous Clinical Datasets
methods: The study proposes an extension of the FENDA method (Kim et al., 2016) to the FL setting and evaluates it on the FLamby benchmarks (du Terrail et al., 2022a) and the GEMINI datasets (Verma et al., 2017), showing that the approach is robust to heterogeneous clinical data.
results: Experiments show marked improvements over existing global and personalized FL techniques, and the FLamby benchmarks are expanded to include evaluation of personalized FL methods. The study also advocates a comprehensive checkpointing and evaluation framework that better reflects practical settings and provides multiple baselines for comparison.
Abstract
Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, an extension of the FENDA method (Kim et al., 2016) to the FL setting is proposed. Experiments conducted on the FLamby benchmarks (du Terrail et al., 2022a) and GEMINI datasets (Verma et al., 2017) show that the approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques. Further, the experimental results represent substantive improvements over the original FLamby benchmarks and expand such benchmarks to include evaluation of personalized FL methods. Finally, we advocate for a comprehensive checkpointing and evaluation framework for FL to better reflect practical settings and provide multiple baselines for comparison.
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
results: Improved prediction accuracy and generalization; the network handles noise in the data and errors in the symbolic representation, including noisy numerical values, model misspecification, and erroneous addition or deletion of terms.
Abstract
Approximating nonlinear differential equations using a neural network provides a robust and efficient tool for various scientific computing tasks, including real-time predictions, inverse problems, optimal controls, and surrogate modeling. Previous works have focused on embedding dynamical systems into networks through two approaches: learning a single solution operator (i.e., the mapping from input parametrized functions to solutions) or learning the governing system of equations (i.e., the constitutive model relative to the state variables). Both of these approaches yield different representations for the same underlying data or function. Additionally, observing that families of differential equations often share key characteristics, we seek one network representation across a wide range of equations. Our method, called Predicting Operators and Symbolic Expressions (PROSE), learns maps from multimodal inputs to multimodal outputs, capable of generating both numerical predictions and mathematical equations. By using a transformer structure and a feature fusion approach, our network can simultaneously embed sets of solution operators for various parametric differential equations using a single trained network. Detailed experiments demonstrate that the network benefits from its multimodal nature, resulting in improved prediction accuracy and better generalization. The network is shown to be able to handle noise in the data and errors in the symbolic representation, including noisy numerical values, model misspecification, and erroneous addition or deletion of terms. PROSE provides a new neural network framework for differential equations which allows for more flexibility and generality in learning operators and governing equations from data.
GraB-sampler: Optimal Permutation-based SGD Data Sampler for PyTorch
results: Experiments show that the GraB-sampler library reproduces the training loss and test accuracy results with only 8.7% training time overhead and 0.85% peak GPU memory usage overhead.
Abstract
The online Gradient Balancing (GraB) algorithm, which greedily chooses the example ordering by solving a herding problem on per-sample gradients, is provably the theoretically optimal solution and is guaranteed to outperform Random Reshuffling. However, there is currently no efficient implementation of GraB for the community to easily use. This work presents an efficient Python library, $\textit{GraB-sampler}$, that allows the community to easily use GraB algorithms and proposes 5 variants of the GraB algorithm. The best-performing variant of the GraB-sampler reproduces the training loss and test accuracy results at the cost of only 8.7% training time overhead and 0.85% peak GPU memory usage overhead.
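The library's own API is not reproduced here; instead, the sketch below illustrates the herding-style balancing idea behind GraB in a simplified form: per-sample gradients from the previous epoch are centered, and each example is greedily placed at the front or back of the next epoch's ordering so that the running gradient sum stays small.

```python
# Simplified gradient-balancing (herding) ordering, illustrative only.
import numpy as np

def balanced_order(per_sample_grads):
    g = per_sample_grads - per_sample_grads.mean(axis=0)   # centre the gradients
    running = np.zeros(g.shape[1])
    front, back = [], []
    for i, gi in enumerate(g):
        if np.linalg.norm(running + gi) <= np.linalg.norm(running - gi):
            running += gi
            front.append(i)          # "+1" examples go to the front
        else:
            running -= gi
            back.append(i)           # "-1" examples go to the back, reversed
    return front + back[::-1]

# Usage sketch: iterate the dataset in `balanced_order(prev_epoch_grads)` next epoch.
```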
HyperPPO: A scalable method for finding small policies for robotic control
results: Experiments show that HyperPPO efficiently trains multiple small neural architectures simultaneously and provides the user with a choice of high-performing policies.
Abstract
Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well - more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures
paper_authors: Christian Pedersen, Tiberiu Tesileanu, Tinghui Wu, Siavash Golkar, Miles Cranmer, Zijun Zhang, Shirley Ho
for: This study examines a biologically informed deep neural network (P-NET) for prostate cancer discovery and verifies the reproducibility of the original work.
methods: The study reproduces P-NET, a feedforward neural network that integrates biological information through sparse connections, and explores alternative architectures, including graph neural networks, for incorporating biological information.
results: Network sparsification by Reactome biological pathways is confirmed to be important to P-NET's superior performance, and different neural architectures are found to make persistent, architecture-specific incorrect predictions for individual patients.
Abstract
In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase, and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patient that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.
Robust Offline Reinforcement Learning – Certify the Confidence Interval
results: Experiments on different environments confirm that the algorithm effectively certifies the robustness of a given policy.
Abstract
Currently, reinforcement learning (RL), especially deep RL, has received more and more attention in the research area. However, the security of RL has become an obvious problem as attack methods have matured. In order to defend against such adversarial attacks, several practical approaches have been developed, such as adversarial training, data filtering, etc. However, these methods are mostly based on empirical algorithms and experiments, without rigorous theoretical analysis of the robustness of the algorithms. In this paper, we develop an algorithm to certify the robustness of a given policy offline with random smoothing, which can be proven and conducted as efficiently as approaches without random smoothing. Experiments on different environments confirm the correctness of our algorithm.
results: Empirical results confirm the effectiveness of the technique, and the estimate of the VC dimension of LAD models supports the theoretical justification for the absence of overfitting.
Abstract
The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
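For concreteness, the sketch below evaluates the kind of hypothesis discussed in the abstract: a DNF over Boolean features whose terms are cubic monomials (conjunctions of at most three literals). The particular terms are made-up examples, not rules produced by a LAD solver.

```python
# Evaluate a DNF of small conjunctions on 0/1 feature vectors.
import numpy as np

def dnf_classify(X, terms):
    """X: (n, d) binary matrix; terms: list of conjunctions,
    each a list of (feature_index, required_value) pairs."""
    out = np.zeros(len(X), dtype=bool)
    for term in terms:
        satisfied = np.ones(len(X), dtype=bool)
        for idx, val in term:
            satisfied &= (X[:, idx] == val)
        out |= satisfied
    return out.astype(int)

# Example DNF with cubic monomials: (x0 AND NOT x3 AND x7) OR (x1 AND x2 AND NOT x5)
terms = [[(0, 1), (3, 0), (7, 1)], [(1, 1), (2, 1), (5, 0)]]
```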
Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance
for: comparing graphs with both node and edge attributes
methods: using Gromov-Wasserstein distances with novel algorithms for distance and barycenter computation
results: effective in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction
Abstract
Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.
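To ground the discussion, the sketch below evaluates the standard fused Gromov-Wasserstein cost of a given coupling between two node-attributed graphs, which is the quantity the paper generalizes; the edge-feature extension that is the paper's contribution is not shown here.

```python
# Fused Gromov-Wasserstein cost of a coupling T between two graphs.
import numpy as np

def fgw_cost(T, M, C1, C2, alpha=0.5):
    """T: (n1, n2) coupling; M: (n1, n2) node-feature distance matrix;
    C1, C2: intra-graph structure matrices (e.g. shortest-path distances)."""
    w_part = np.sum(M * T)                                 # Wasserstein (feature) term
    p, q = T.sum(1), T.sum(0)                              # marginals of the coupling
    # sum_{i,j,k,l} (C1[i,k] - C2[j,l])**2 T[i,j] T[k,l], expanded into matrix products
    gw_part = (C1 ** 2 @ p) @ p + q @ (C2 ** 2 @ q) - 2.0 * np.sum((C1 @ T @ C2.T) * T)
    return (1 - alpha) * w_part + alpha * gw_part
```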
Deep Learning Based Uplink Multi-User SIMO Beamforming Design
paper_authors: Cemil Vahapoglu, Timothy J. O’Shea, Tamoghna Roy, Sennur Ulukus
For: Improving data rates, coverage, latency, and energy efficiency of 5G wireless communication networks while adapting to dynamic conditions.
Methods: A deep learning approach, specifically a novel unsupervised deep learning framework called NNBF, for the design of uplink receive multi-user single input multiple output (MU-SIMO) beamforming.
Results: NNBF outperforms the baseline methods (ZFBF and the MMSE equalizer) and scales with the number of single-antenna user equipments (UEs), whereas the baselines incur a significant computational burden due to the matrix pseudo-inverse operation.
Abstract
The advancement of fifth generation (5G) wireless communication networks has created a greater demand for wireless resource management solutions that offer high data rates, extensive coverage, minimal latency and energy-efficient performance. Nonetheless, traditional approaches have shortcomings when it comes to computational complexity and their ability to adapt to dynamic conditions, creating a gap between theoretical analysis and the practical execution of algorithmic solutions for managing wireless resources. Deep learning-based techniques offer promising solutions for bridging this gap with their substantial representation capabilities. We propose a novel unsupervised deep learning framework, which is called NNBF, for the design of uplink receive multi-user single input multiple output (MU-SIMO) beamforming. The primary objective is to enhance the throughput by focusing on maximizing the sum-rate while also offering computationally efficient solution, in contrast to established conventional methods. We conduct experiments for several antenna configurations. Our experimental results demonstrate that NNBF exhibits superior performance compared to our baseline methods, namely, zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) equalizer. Additionally, NNBF is scalable to the number of single-antenna user equipments (UEs) while baseline methods have significant computational burden due to matrix pseudo-inverse operation.
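Since the abstract measures NNBF against zero-forcing and MMSE receivers, a minimal sketch of those two classical baselines is given below; shapes and the unit-power symbol assumption are illustrative.

```python
# Classical uplink MU-SIMO receive beamformers used as baselines.
import numpy as np

def zf_beamformer(H):
    """H: (num_rx_antennas, num_users) channel matrix. W @ y separates the users."""
    return np.linalg.pinv(H)

def mmse_beamformer(H, noise_var):
    """Linear MMSE receive combiner for unit-power user symbols."""
    n_users = H.shape[1]
    A = H.conj().T @ H + noise_var * np.eye(n_users)
    return np.linalg.solve(A, H.conj().T)

# Usage sketch: x_hat = mmse_beamformer(H, 0.1) @ y
```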
results: By imputing missing labels with machine learning and applying a debiasing step to remedy prediction inaccuracies, the method achieves valid inferences that are more powerful than those using labeled data alone, with more stable confidence intervals.
Abstract
While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
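The core debiasing step can be illustrated for mean estimation: predictions on the unlabeled set are corrected by the average prediction error measured on the labeled set. This is in the spirit of the prediction-powered estimator the abstract compares against; the actual cross-prediction method additionally cross-fits the model and provides confidence intervals.

```python
# Debiased mean estimate from a small labeled set plus predictions on unlabeled data.
import numpy as np

def debiased_mean(y_labeled, preds_labeled, preds_unlabeled):
    rectifier = np.mean(preds_labeled - y_labeled)    # estimated prediction bias
    return np.mean(preds_unlabeled) - rectifier

# Usage sketch:
# theta_hat = debiased_mean(y_lab, model.predict(X_lab), model.predict(X_unlab))
```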
A Design Toolbox for the Development of Collaborative Distributed Machine Learning Systems
paper_authors: David Jin, Niclas Kannengießer, Sascha Rank, Ali Sunyaev
for: The paper is written for developers who want to design collaborative distributed machine learning (CDML) systems that meet specific use case requirements.
methods: The paper presents a CDML design toolbox that guides the development of CDML systems and introduces CDML system archetypes with distinct key traits.
results: The paper provides a systematic approach to designing CDML systems that meet use case requirements, together with a set of CDML system archetypes that support such designs.
Abstract
To leverage data for the sufficient training of machine learning (ML) models from multiple parties in a confidentiality-preserving way, various collaborative distributed ML (CDML) system designs have been developed, for example, to perform assisted learning, federated learning, and split learning. CDML system designs show different traits, including high agent autonomy, ML model confidentiality, and fault tolerance. Facing a wide variety of CDML system designs with different traits, it is difficult for developers to design CDML systems with traits that match use case requirements in a targeted way. However, inappropriate CDML system designs may result in CDML systems failing their envisioned purposes. We developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements.
M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning
results: M-OFDFT achieves accuracy comparable to Kohn-Sham DFT on a wide range of molecular systems and extrapolates well to molecules much larger than those in training, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
Abstract
Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials
paper_authors: Mohammad Karimzadeh, Aleksandar Vakanski, Fei Xu, Xinchang Zhang
for: This review examines the application of machine learning techniques in Directed Energy Deposition (DED) and their effectiveness for the fabrication of Functionally Graded Materials (FGMs).
methods: The review surveys machine learning techniques used to optimize DED processing parameters, improve product quality, and detect manufacturing defects.
results: The review indicates that machine learning can effectively optimize DED processing parameters and improve the performance and properties of FGMs.
Abstract
Additive manufacturing has revolutionized the manufacturing of complex parts by enabling direct material joining and offers several advantages such as cost-effective manufacturing of complex parts, reducing manufacturing waste, and opening new possibilities for manufacturing automation. One group of materials for which additive manufacturing holds great potential for enhancing component performance and properties is Functionally Graded Materials (FGMs). FGMs are advanced composite materials that exhibit smoothly varying properties making them desirable for applications in aerospace, automobile, biomedical, and defense industries. Such composition differs from traditional composite materials, since the location-dependent composition changes gradually in FGMs, leading to enhanced properties. Recently, machine learning techniques have emerged as a promising means for fabrication of FGMs through optimizing processing parameters, improving product quality, and detecting manufacturing defects. This paper first provides a brief literature review of works related to FGM fabrication, followed by reviewing works on employing machine learning in additive manufacturing, Afterward, we provide an overview of published works in the literature related to the application of machine learning methods in Directed Energy Deposition and for fabrication of FGMs.
CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption
for: investigate the regret-minimization problem in a multi-armed bandit setting with arbitrary corruptions
methods: introduce CRIMED algorithm, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance, and provide a finite-sample analysis of CRIMED’s regret performance
results: establish a problem-dependent lower bound on regret, show that CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$, and develop a tight concentration result for medians in the presence of arbitrary corruptions.
Abstract
We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.
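A small simulation illustrates why medians, which the abstract's concentration result concerns, are the natural statistic under this corruption model; the corruption distribution below is an arbitrary illustrative choice, not one from the paper.

```python
# With probability eps an observation comes from an arbitrary corruption distribution.
import numpy as np

rng = np.random.default_rng(0)

def corrupted_pulls(mean, eps, n, corrupt_value=100.0):
    clean = rng.normal(mean, 1.0, size=n)                 # Gaussian arm, known variance
    mask = rng.random(n) < eps
    return np.where(mask, corrupt_value, clean)

samples = corrupted_pulls(mean=0.5, eps=0.1, n=10_000)
print("sample mean  :", samples.mean())      # badly biased by the corruption
print("sample median:", np.median(samples))  # only mildly biased despite corruption
```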
Implicit Gaussian process representation of vector fields over arbitrary latent manifolds
paper_authors: Robert L. Peach, Matteo Vinao-Carl, Nir Grossman, Michael David, Emma Mallas, David Sharp, Paresh A. Malhotra, Pierre Vandergheynst, Adam Gosztolai
results: RVGP achieves global regularity over the manifold, enabling super-resolution and inpainting of vector fields while preserving singularities, and is used to reconstruct high-density neural dynamics from low-density EEG recordings, where vector-field singularities serve as disease markers.
Abstract
Gaussian processes (GPs) are popular nonparametric statistical models for learning unknown functions and quantifying the spatiotemporal uncertainty in data. Recent works have extended GPs to model scalar and vector quantities distributed over non-Euclidean domains, including smooth manifolds appearing in numerous fields such as computer vision, dynamical systems, and neuroscience. However, these approaches assume that the manifold underlying the data is known, limiting their practical utility. We introduce RVGP, a generalisation of GPs for learning vector signals over latent Riemannian manifolds. Our method uses positional encoding with eigenfunctions of the connection Laplacian, associated with the tangent bundle, readily derived from common graph-based approximation of data. We demonstrate that RVGP possesses global regularity over the manifold, which allows it to super-resolve and inpaint vector fields while preserving singularities. Furthermore, we use RVGP to reconstruct high-density neural dynamics derived from low-density EEG recordings in healthy individuals and Alzheimer's patients. We show that vector field singularities are important disease markers and that their reconstruction leads to a comparable classification accuracy of disease states to high-density recordings. Thus, our method overcomes a significant practical limitation in experimental and clinical applications.
Correcting for heterogeneity in real-time epidemiological indicators
paper_authors: Aaron Rumack, Roni Rosenfeld, F. William Townes
for: To correct spatial and temporal heterogeneity in auxiliary data sources so that they can be used more reliably for epidemiological surveillance.
methods: The method uses a "guiding" signal to correct spatial and temporal biases and produce a more reliable signal for modeling and forecasting, assuming that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time.
results: The method reduces heterogeneity in auxiliary data sources, greatly increasing their utility for modeling and forecasting epidemics; in the absence of ground truth, maps and plots are used to argue that heterogeneity is indeed reduced.
Abstract
Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in these signals derived from these data sources, where spatial and/or temporal biases are present. We present a method to use a ``guiding'' signal to correct for these biases and produce a more reliable signal that can be used for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters representing the matrix rank and degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity from auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
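Under the abstract's low-rank assumption, a minimal version of the correction can be sketched as subtracting a truncated-SVD approximation of the location-by-time discrepancy between the auxiliary signal and the guiding signal. The temporal-smoothness constraint and the hyperparameter selection described in the abstract are omitted here.

```python
# Low-rank bias correction of an auxiliary signal against a guiding signal.
import numpy as np

def lowrank_correction(aux, guide, rank=2):
    """aux, guide: (locations, time) arrays; returns the corrected auxiliary signal."""
    residual = aux - guide                                  # apparent heterogeneity
    U, s, Vt = np.linalg.svd(residual, full_matrices=False)
    bias = (U[:, :rank] * s[:rank]) @ Vt[:rank]             # rank-`rank` approximation
    return aux - bias
```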
Efficient Training of One Class Classification-SVMs
results: We extensively validate our approach on real-world datasets and demonstrate that our strategy obtains statistically significant results.
Abstract
This study examines the use of a highly effective training method to conduct one-class classification. The existence of both positive and negative examples in the training data is necessary to develop an effective classifier in common binary classification scenarios. Unfortunately, this criteria is not met in many domains. Here, there is just one class of examples. Classification algorithms that learn from solely positive input have been created to deal with this setting. In this paper, an effective algorithm for dual soft-margin one-class SVM training is presented. Our approach makes use of the Augmented Lagrangian (AL-FPGM), a variant of the Fast Projected Gradient Method. The FPGM requires only first derivatives, which for the dual soft margin OCC-SVM means computing mainly a matrix-vector product. Therefore, AL-FPGM, being computationally inexpensive, may complement existing quadratic programming solvers for training large SVMs. We extensively validate our approach over real-world datasets and demonstrate that our strategy obtains statistically significant results.
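To show why each iteration reduces mainly to a matrix-vector product, here is a hedged sketch of the optimization structure the abstract describes: projected gradient steps on the box constraint of the one-class SVM dual, with the equality constraint handled through an augmented-Lagrangian term. Step sizes, penalty values, and iteration counts are illustrative, and this plain projected-gradient loop is a stand-in for, not a reproduction of, the authors' fast projected gradient method.

```python
# Augmented-Lagrangian projected gradient for the one-class SVM dual:
#   min 0.5 * a' K a   s.t.  0 <= a_i <= 1/(nu*n),  sum(a) = 1
import numpy as np

def al_pgd_ocsvm(K, nu=0.1, rho=10.0, outer=20, inner=200):
    n = K.shape[0]
    upper = 1.0 / (nu * n)
    a, lam = np.full(n, 1.0 / n), 0.0                   # feasible start, multiplier
    lr = 1.0 / (np.linalg.norm(K, 2) + rho * n)         # safe step size
    for _ in range(outer):
        for _ in range(inner):
            grad = K @ a + lam + rho * (a.sum() - 1.0)  # one matrix-vector product
            a = np.clip(a - lr * grad, 0.0, upper)      # project onto the box
        lam += rho * (a.sum() - 1.0)                    # multiplier update
    return a
```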
Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models
results: Experiments demonstrate a framework that generates personalized treatment strategies adapted to a patient's individual history and predicts future blood glucose levels.
Abstract
We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction
AtomSurf : Surface Representation for Learning on Protein Structures
results: The surface representation alone is not competitive, but combining it with graph-based methods yields state-of-the-art results across all tested tasks. Code and data are available at https://github.com/Vincentx15/atom2D.
Abstract
Recent advancements in Cryo-EM and protein structure prediction algorithms have made large-scale protein structures accessible, paving the way for machine learning-based functional annotations.The field of geometric deep learning focuses on creating methods working on geometric data. An essential aspect of learning from protein structures is representing these structures as a geometric object (be it a grid, graph, or surface) and applying a learning method tailored to this representation. The performance of a given approach will then depend on both the representation and its corresponding learning method. In this paper, we investigate representing proteins as $\textit{3D mesh surfaces}$ and incorporate them into an established representation benchmark. Our first finding is that despite promising preliminary results, the surface representation alone does not seem competitive with 3D grids. Building on this, we introduce a synergistic approach, combining surface representations with graph-based methods, resulting in a general framework that incorporates both representations in learning. We show that using this combination, we are able to obtain state-of-the-art results across $\textit{all tested tasks}$. Our code and data can be found online: https://github.com/Vincentx15/atom2D .
paper_authors: Tianci Liu, Haoyu Wang, Feijie Wu, Hengtong Zhang, Pan Li, Lu Su, Jing Gao
for: Mitigating model prediction bias against certain demographic subgroups, such as the elderly and women.
methods: A data poisoning attack on fair representation learning (FRL) models, which induces the model to output unfair representations containing as much demographic information as possible by injecting carefully crafted poisoning samples into the training data.
results: Superiority of the proposed attack on benchmark fairness datasets and state-of-the-art fair representation learning models, along with a theoretical analysis of the number of poisoning samples needed to defend against the attack.
Abstract
Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as elder and female. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attack, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data poisoning framework attacking FRL. We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data. This attack entails a prohibitive bilevel optimization, wherefore an effective approximated solution is proposed. A theoretical analysis on the needed number of poisoning samples is derived and sheds light on defending against the attack. Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
Predicting Long-term Renal Impairment in Post-COVID-19 Patients with Machine Learning Algorithms
results: 研究发现,年龄、血压、血糖和肥胖等因素均与长期肾功能损害有关,并且通过使用机器学习算法,可以准确预测患者是否会发生肾功能损害。Abstract
The COVID-19 pandemic has had far-reaching implications for global public health. As we continue to grapple with its consequences, it becomes increasingly clear that post-COVID-19 complications are a significant concern. Among these complications, renal impairment has garnered particular attention due to its potential long-term health impacts. This study, conducted with a cohort of 821 post-COVID-19 patients from diverse regions of Iraq across the years 2021, 2022, and 2023, endeavors to predict the risk of long-term renal impairment using advanced machine learning algorithms. Our findings have the potential to revolutionize post-COVID-19 patient care by enabling early identification and intervention for those at risk of renal impairment, ultimately improving clinical outcomes. This research encompasses comprehensive data collection and preprocessing, feature selection, and the development of predictive models using various machine learning algorithms. The study's objectives are to assess the incidence of long-term renal impairment in post-COVID-19 patients, identify associated risk factors, create predictive models, and evaluate their accuracy. We anticipate that our machine learning models, drawing from a rich dataset, will provide valuable insights into the risk of renal impairment, ultimately enhancing patient care and quality of life. In conclusion, the research presented herein offers a critical contribution to the field of post-COVID-19 care. By harnessing the power of machine learning, we aim to predict long-term renal impairment risk accurately. These predictions have the potential to inform healthcare professionals, enabling them to take proactive measures and provide targeted interventions for post-COVID-19 patients at risk of renal complications, thus minimizing the impact of this serious health concern.
摘要
COVID-19 大流行对全球公共卫生造成了深远的影响。我们继续面临这些后果,越来越清楚地认识到Post-COVID-19 合并症是一个重要的担忧。其中,肾功能障碍受到了特别关注,因为它可能会对长期健康造成深远的影响。本研究使用了821名来自伊拉克不同地区的 Post-COVID-19 患者的队列,时间跨越2021年至2023年,通过先进的机器学习算法,预测患者在长期内发生肾功能障碍的风险。我们的发现有助于改善患者的临床结果,因为它们可以让医生对存在肾功能障碍风险的患者及早采取干预措施,从而改善患者的生活质量。本研究包括了完整的数据收集和预处理、特征选择以及使用不同的机器学习算法来建立预测模型。研究的目标是评估Post-COVID-19 患者长期肾功能障碍的发生率、识别相关风险因素、建立预测模型并评估其准确性。我们预计,基于丰富的数据集,我们的机器学习模型将提供有价值的预测,帮助医生更好地识别患者发生肾功能障碍的风险,并采取相应的措施,从而最大化患者的生活质量。因此,本研究对Post-COVID-19 患者的护理做出了重要贡献。通过利用机器学习的力量,我们可以准确预测患者在长期内可能发生的肾功能障碍风险,以便医生能够对存在风险的患者及早采取干预措施,最终减少这种严重健康问题对患者的影响。
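A minimal sketch of the kind of pipeline the study describes (preprocessing, feature selection, model fitting, evaluation). The feature names and synthetic data below are illustrative placeholders, not the actual cohort data.

```python
# Hedged sketch of a risk-prediction pipeline for long-term renal impairment.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, classification_report

rng = np.random.default_rng(42)
n = 821
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "systolic_bp": rng.normal(130, 20, n),
    "fasting_glucose": rng.normal(110, 30, n),
    "bmi": rng.normal(28, 5, n),
    "baseline_creatinine": rng.normal(1.0, 0.3, n),
})
# Synthetic label loosely tied to the listed risk factors, for illustration only.
logit = 0.03 * (df.age - 50) + 0.02 * (df.systolic_bp - 130) + 0.15 * (df.bmi - 28)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=4)),       # simple feature selection step
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
print("test ROC-AUC:", round(roc_auc_score(y_test, proba), 3))
print(classification_report(y_test, model.predict(X_test)))
```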
High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality
paper_authors: Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, Gabriele Sicuro
for: 这篇论文研究了在协变量和响应噪声均受重尾污染的情况下,高维稳健回归估计器的性能。
methods: 论文分析了 M-估计器和 ridge 回归估计器,并刻画了它们在高维情形下的渐近行为。
results: 研究发现,即使位置参数经过最优调节,Huber 损失在重尾噪声的高维情形下仍是次优的,说明需要进一步正则化才能达到最优性能;研究还揭示了该参数随样本复杂度和污染程度变化的一个有趣转变现象,并推导了 ridge 回归超额风险的衰减速率,以及这些结果在更丰富的模型和数据分布下的推广。Abstract
We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
摘要
我们研究在协变量和响应函数均受重尾污染的情况下,稳健回归估计器的高维性质。特别地,我们针对一族椭圆分布的协变量和噪声数据(包括二阶及更高阶矩不存在的情形),给出了 M-估计器的精确渐近刻画。我们发现,尽管具有一致性,带有最优调节位置参数 $\delta$ 的 Huber 损失在重尾噪声的高维情形下仍是次优的,这说明需要进一步的正则化才能达到最优性能。这一结果还揭示了 $\delta$ 随样本复杂度和污染程度变化的一个有趣转变。此外,我们推导了 ridge 回归超额风险的衰减速率:当噪声分布具有有限二阶矩时,它既是最优的也是普适的;而当协变量的二阶矩不存在时,其衰减速率会明显更快。最后,我们证明这些公式可以直接推广到更丰富的模型和数据分布,例如在混合模型上训练、带任意凸正则化的广义线性估计。
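A small illustration of the estimators discussed above: ridge regression versus a Huber M-estimator on data with heavy-tailed (Student-t) noise. This only reproduces the qualitative setup with standard scikit-learn estimators; it does not reproduce the paper's high-dimensional asymptotic analysis.

```python
# Hedged sketch: ridge vs. Huber regression under heavy-tailed noise.
import numpy as np
from sklearn.linear_model import Ridge, HuberRegressor

rng = np.random.default_rng(0)
n, d = 2000, 50
theta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))                      # light-tailed covariates
noise = rng.standard_t(df=1.5, size=n)           # heavy-tailed noise (no 2nd moment)
y = X @ theta + noise

ridge = Ridge(alpha=1.0).fit(X, y)
huber = HuberRegressor(epsilon=1.35, alpha=1e-3, max_iter=500).fit(X, y)

def estimation_error(est):
    return float(np.sum((est.coef_ - theta) ** 2))

print("ridge ||theta_hat - theta||^2:", round(estimation_error(ridge), 4))
print("huber ||theta_hat - theta||^2:", round(estimation_error(huber), 4))
```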
Compositional Program Generation for Systematic Generalization
paper_authors: Tim Klinger, Luke Liu, Soham Dan, Maxwell Crouse, Parikshit Ram, Alexander Gray
for: The paper is written to explore the ability of a neuro-symbolic architecture called the Compositional Program Generator (CPG) to generalize on new concepts in a few-shot manner and to achieve perfect generalization on sequence-to-sequence language tasks.
methods: The paper uses a grammar of the input domain and a parser to generate a type hierarchy, and learns parameters for the semantic modules incrementally.
results: The paper achieves perfect generalization on the SCAN and COGS benchmarks in both standard and extreme few-shot settings.
for: 本文是用来探究一种叫做 Compositional Program Generator (CPG) 的 neuromorphic架构在几个示例下能够泛化到新的概念。
results: CPG 在 SCAN 和 COGS 测试集上在标准和极端几个示例下达到了完美泛化。Abstract
Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures, motivates this study on a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, type abstraction, and recursive composition, that enable it to generalize both systematically to new concepts in a few-shot manner, as well as productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns parameters for the semantic modules and is able to learn the semantics for new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.
摘要
人类 possess a crucial ability called "compositional generalization", which enables us to learn new concepts from just a few examples. However, current machine learning models, including transformers, struggle to generalize in this way and often require thousands of examples to generalize meaningfully. This difference in ability motivates the development of a neuro-symbolic architecture called the Compositional Program Generator (CPG).CPG has three key features: modularity, type abstraction, and recursive composition. These features enable CPG to generalize both systematically to new concepts in a few-shot manner and productively on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy, where each grammar rule is assigned its own unique semantic module. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs.CPG learns parameters for the semantic modules and can learn the semantics for new types incrementally. With a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, both in standard and extreme few-shot settings.
A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces
for: 这篇论文提出一种新的元启发式(metaheuristic)方法,用于(生物)物理系统动力学模型的参数推断,以应对梯度难以计算、参数空间维度高以及模型函数非线性等问题。
methods: DR-FFIT 实现了一种高效的采样策略,通过特征引导的变换(feature-informed transformations)进行降维,并使用人工神经网络获得模型感兴趣特征的可微代理,从而在采样区域内估计模型的局部活跃子空间。
results: 测试数据显示,DR-FFIT 可以在可控的运行时间成本内提升 random-search 和 simulated-annealing 等元启发式方法的性能,并改善模型的拟合优度。Abstract
Parameter inference for dynamical models of (bio)physical systems remains a challenging problem. Intractable gradients, high-dimensional spaces, and non-linear model functions are typically problematic without large computational budgets. A recent body of work in that area has focused on Bayesian inference methods, which consider parameters under their statistical distributions and therefore, do not derive point estimates of optimal parameter values. Here we propose a new metaheuristic that drives dimensionality reductions from feature-informed transformations (DR-FFIT) to address these bottlenecks. DR-FFIT implements an efficient sampling strategy that facilitates a gradient-free parameter search in high-dimensional spaces. We use artificial neural networks to obtain differentiable proxies for the model's features of interest. The resulting gradients enable the estimation of a local active subspace of the model within a defined sampling region. This approach enables efficient dimensionality reductions of highly non-linear search spaces at a low computational cost. Our test data show that DR-FFIT boosts the performances of random-search and simulated-annealing against well-established metaheuristics, and improves the goodness-of-fit of the model, all within contained run-time costs.
摘要
(生物)物理系统动力学模型的参数推断仍然是一个挑战。梯度难以计算、参数空间维度高以及模型函数非线性,通常意味着没有大量计算预算就难以求解。近年来,该领域的一类工作集中在贝叶斯推断方法上,它们在统计分布意义下考虑参数,因此并不给出最优参数值的点估计。我们提出了一种新的元启发式方法,通过特征引导的变换实现降维(DR-FFIT)。DR-FFIT 实现了一种高效的采样策略,可以在高维空间中进行无梯度的参数搜索。我们使用人工神经网络获得模型感兴趣特征的可微代理,由此得到的梯度可用于在给定采样区域内估计模型的局部活跃子空间。这种方法能够以较低的计算成本,高效地对高度非线性的搜索空间进行降维。我们的测试数据显示,在可控的运行时间成本内,DR-FFIT 能够提升 random-search 和 simulated-annealing 等成熟元启发式方法的性能,并改善模型的拟合优度。
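A rough sketch of the core DR-FFIT idea as described above: fit a differentiable surrogate for a model feature, use its input-gradients to estimate a local active subspace, then run a cheap gradient-free search inside that subspace. The target function, network sizes, and subspace dimension are toy placeholders, not the paper's configuration.

```python
# Hedged sketch: surrogate gradients -> active subspace -> gradient-free search.
import torch

torch.manual_seed(0)
D = 20                                            # full parameter dimension

def feature_of_interest(theta):                   # expensive model feature (toy stand-in)
    return torch.sin(theta[..., 0]) + 0.5 * theta[..., 1] ** 2 + 0.01 * theta.pow(2).sum(-1)

# 1) Fit a small MLP surrogate on random samples from the sampling region.
surrogate = torch.nn.Sequential(torch.nn.Linear(D, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
theta_train = torch.randn(2000, D)
y_train = feature_of_interest(theta_train).unsqueeze(-1)
for _ in range(500):
    opt.zero_grad()
    torch.nn.functional.mse_loss(surrogate(theta_train), y_train).backward()
    opt.step()

# 2) Estimate a local active subspace from surrogate input-gradients (SVD of gradients).
probe = torch.randn(256, D, requires_grad=True)
surrogate(probe).sum().backward()
_, _, Vh = torch.linalg.svd(probe.grad, full_matrices=False)
basis = Vh[:3]                                    # top-3 active directions (3 x D)

# 3) Gradient-free random search restricted to the active subspace.
best_val, best_theta = float("inf"), None
for _ in range(2000):
    z = torch.randn(3)
    theta = z @ basis                             # lift low-dim proposal back to R^D
    val = feature_of_interest(theta).item()
    if val < best_val:
        best_val, best_theta = val, theta
print("best feature value found:", round(best_val, 4))
```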
Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects
results: 研究实现了Up to 16.6%的零尝试预测精度,与使用个体睡眠数据的表现相当。 fine-tuning USD on test subjects可以提高预测精度至25.9%,与基线的6.7%预测精度有显著差异。Abstract
Decoding memory content from brain activity during sleep has long been a goal in neuroscience. While spontaneous reactivation of memories during sleep in rodents is known to support memory consolidation and offline learning, capturing memory replay in humans is challenging due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 52 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed the Universal Sleep Decoder (USD) to align neural representations between wakefulness and sleep across subjects. Our model achieves up to 16.6% top-1 zero-shot accuracy on unseen subjects, comparable to decoding performances using individual sleep data. Furthermore, fine-tuning USD on test subjects enhances decoding accuracy to 25.9% top-1 accuracy, a substantial improvement over the baseline chance of 6.7%. Model comparison and ablation analyses reveal that our design choices, including the use of (i) an additional contrastive objective to integrate awake and sleep neural signals and (ii) the pretrain-finetune paradigm to incorporate different subjects, significantly contribute to these performances. Collectively, our findings and methodologies represent a significant advancement in the field of sleep decoding.
摘要
从睡眠期间的大脑活动中解码记忆内容一直是神经科学的目标。虽然已知啮齿类动物在睡眠中对记忆的自发重激活有助于记忆巩固和离线学习,但由于缺乏标注完善的睡眠数据集,且清醒与睡眠状态下的神经模式差异巨大,在人类中捕捉记忆回放十分困难。为了解决这些挑战,我们设计了一个新的认知神经科学实验,并采集了52名被试在清醒和睡眠状态下的、标注完善的脑电(EEG)数据集。基于这一基准数据集,我们开发了 Universal Sleep Decoder(USD),用于在被试之间对齐清醒与睡眠状态下的神经表征。我们的模型在未见过的被试上可以达到16.6%的 top-1 零样本精度,与使用个体睡眠数据进行解码的性能相当。此外,在测试被试上对 USD 进行微调可将解码精度提升至25.9%的 top-1 精度,显著高于6.7%的随机基线。模型比较和消融分析表明,我们的设计选择——包括(i)使用额外的对比目标来整合清醒和睡眠的神经信号,以及(ii)采用预训练-微调范式来纳入不同被试——对这些性能有重要贡献。总体而言,我们的发现和方法代表了睡眠解码领域的一项重要进展。
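A minimal sketch of the contrastive objective mentioned above: an InfoNCE-style loss that pulls together wake and sleep embeddings of paired trials and pushes apart mismatched ones. The encoders, tensor shapes, and temperature are illustrative assumptions, not the USD architecture.

```python
# Hedged sketch: InfoNCE-style alignment of wake and sleep EEG representations.
import torch
import torch.nn.functional as F

def info_nce(wake_emb, sleep_emb, temperature=0.1):
    """wake_emb, sleep_emb: (batch, dim) embeddings of paired wake/sleep trials."""
    wake = F.normalize(wake_emb, dim=-1)
    sleep = F.normalize(sleep_emb, dim=-1)
    logits = wake @ sleep.T / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(wake.size(0))           # i-th wake matches i-th sleep trial
    # Symmetrized cross-entropy: wake->sleep and sleep->wake retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Toy usage with two per-state encoders over EEG segments (channels x time).
wake_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 200, 128))
sleep_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 200, 128))
wake_eeg = torch.randn(32, 64, 200)                # batch of wake EEG segments
sleep_eeg = torch.randn(32, 64, 200)               # matched sleep EEG segments
loss = info_nce(wake_encoder(wake_eeg), sleep_encoder(sleep_eeg))
loss.backward()
print("contrastive loss:", loss.item())
```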
Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective
results: 对于五个真实的 dataset,Snowball 与当前的防御技术进行比较,显示其在防止后门攻击方面更高效,并且对全球模型精度的影响较小。Abstract
Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models. The former negatively impacts model accuracy, while the latter usually relies on globally clear boundaries between benign and infected model updates. However, model updates are easy to be mixed and scattered throughout in reality due to the diverse distributions of local data. This work focuses on excluding infected models in FL. Unlike previous perspectives from a global view, we propose Snowball, a novel anti-backdoor FL framework through bidirectional elections from an individual perspective inspired by one principle deduced by us and two principles in FL and deep learning. It is characterized by a) bottom-up election, where each candidate model update votes to several peer ones such that a few model updates are elected as selectees for aggregation; and b) top-down election, where selectees progressively enlarge themselves through picking up from the candidates. We compare Snowball with state-of-the-art defenses to backdoor attacks in FL on five real-world datasets, demonstrating its superior resistance to backdoor attacks and slight impact on the accuracy of the global model.
摘要
现有方法在联邦学习(FL)中防御后门攻击主要通过:一、减轻受感染模型的影响;或二、排除受感染模型。前者会损害模型精度,而后者通常依赖于良性与受感染模型更新之间全局清晰的边界。然而,由于各客户端本地数据分布多样,模型更新在实际中容易相互混杂、散布,这使得上述方法具有局限性。本工作关注于在 FL 中排除受感染模型。与以往的全局视角不同,我们提出了 Snowball,一个新的抗后门 FL 框架,它从个体视角出发,通过双向选举实现,其灵感来自我们推导出的一条原理以及 FL 与深度学习中的两条原理。它的特点包括:a)自底向上选举,每个候选模型更新向若干同伴模型更新投票,从而选出少量模型更新作为被选者参与聚合;b)自顶向下选举,被选者通过从候选者中继续吸纳成员逐步扩大自身。我们在五个真实数据集上与最新的防御技术进行比较,结果显示 Snowball 对后门攻击具有更强的抵抗力,并且对全局模型精度的影响很小。
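A toy sketch of the bottom-up election step described above: each client update votes for its most similar peers, and the most-voted updates become "selectees" for aggregation. The similarity metric, vote count, and number of selectees are illustrative choices, not Snowball's exact rules.

```python
# Hedged sketch: bottom-up election of model updates before aggregation.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 10, 1000
updates = rng.normal(size=(n_clients, dim))
updates[7] += 5.0                                  # a crudely "poisoned" outlier update

def bottom_up_election(updates, votes_per_client=3, n_selectees=4):
    normed = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    sim = normed @ normed.T                        # cosine similarity between updates
    np.fill_diagonal(sim, -np.inf)                 # a client cannot vote for itself
    votes = np.zeros(len(updates), dtype=int)
    for i in range(len(updates)):
        for j in np.argsort(sim[i])[-votes_per_client:]:
            votes[j] += 1                          # vote for the most similar peers
    return np.argsort(votes)[-n_selectees:]        # indices of the most-voted updates

selectees = bottom_up_election(updates)
print("selected for aggregation:", sorted(selectees.tolist()))
aggregated = updates[selectees].mean(axis=0)       # simple FedAvg over the selectees
```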
On the Trade-offs between Adversarial Robustness and Actionable Explanations
methods: 这 paper 使用了现有的 state-of-the-art 算法来生成可解释性和抗击攻击性的模型。
results: 这 paper 的结果表明,逐渐增加模型的抗击攻击性会导致可解释性减退,而且在某些情况下,可能导致可解释性和抗击攻击性之间存在负相关性。Abstract
As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations
摘要
machine learning模型在不同的高风险场景中越来越常被应用,因此确保这些模型的预测结果不仅抗击性强,而且可以快速地解释给关键参与者变得非常重要。然而,是否可以同时实现这两个目标,或者存在这两个目标之间的负担,这是一个未知的问题。在这种情况下,我们在研究抗击性模型对可行的解释的影响,这些解释可以提供结束用户一种纠正的机会。我们在理论和实际上分析了使用当前的算法生成的纠正措施的成本(实施的容易度)和有效性(获得正确模型预测结果的概率)。我们在理论上 deriv了对抗性模型和非抗击性模型的线性和非线性模型的分析结果,并通过多个实际数据集的实验 validate我们的理论结果,以验证模型的抗击性对纠正措施的影响。我们的分析结果表明,抗击性模型会增加纠正措施的成本和降低纠正措施的有效性,从而揭示了这两个目标之间的内在负担。
A parsimonious, computationally efficient machine learning method for spatial regression
paper_authors: Milan Žukovič, Dionissios T. Hristopulos
for: The paper is written for researchers and practitioners who are interested in spatial/temporal regression and machine learning methods.
methods: The paper introduces the modified planar rotator method (MPRS), a non-parametric machine learning method that incorporates spatial or temporal correlations via short-range, distance-dependent "interactions" without assuming a specific form for the underlying probability distribution. The method uses equilibrium conditional Monte Carlo simulations to make predictions.
results: The paper reports tests on various synthetic and real-world data in one, two, and three dimensions that demonstrate the competitiveness of MPRS prediction performance with standard interpolation methods such as ordinary kriging and inverse distance weighting, especially for rough and non-Gaussian data. The method also shows superior computational efficiency and scalability for large samples, allowing for the processing of massive data sets involving millions of nodes in a few seconds on a standard personal computer.Abstract
We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-word data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
摘要
我们引入改进的平面转子方法(MPRS),这是一种受物理启发的机器学习方法,用于空间/时间回归。MPRS 是一种非参数模型,通过短程、依赖距离的"相互作用"来刻画空间或时间相关性,无需假设底层概率分布的具体形式。预测由完全自主的学习算法给出,该算法使用平衡条件蒙特卡洛仿真。MPRS 能够处理散点数据和任意空间维度。我们在一维、二维和三维的合成及真实数据上进行了多种测试,结果显示,在无需参数调整的情况下,MPRS 的预测性能可与 ordinary kriging 和 inverse distance weighting 等标准插值方法相媲美。特别是对于粗糙和非高斯数据(如日降水时间序列),MPRS 是一种非常有效的缺失填补方法。MPRS 在大样本下表现出更优的计算效率和可扩展性:包含数百万个节点的海量数据集在标准个人电脑上只需几秒即可处理完毕。
Nonlinear MPC design for incrementally ISS systems with application to GRU networks
results: 这种控制方法在 GRU 网络上应用,并提供了一种适应性较高的状态观察器,并且有确定的收敛保证。 tested on a benchmark system, demonstrating its good control performances and efficient applicability.Abstract
This brief addresses the design of a Nonlinear Model Predictive Control (NMPC) strategy for exponentially incremental Input-to-State Stable (ISS) systems. In particular, a novel formulation is devised, which does not necessitate the onerous computation of terminal ingredients, but rather relies on the explicit definition of a minimum prediction horizon ensuring closed-loop stability. The designed methodology is particularly suited for the control of systems learned by Recurrent Neural Networks (RNNs), which are known for their enhanced modeling capabilities and for which the incremental ISS properties can be studied thanks to simple algebraic conditions. The approach is applied to Gated Recurrent Unit (GRU) networks, providing also a method for the design of a tailored state observer with convergence guarantees. The resulting control architecture is tested on a benchmark system, demonstrating its good control performances and efficient applicability.
results: 研究提供了非 asymptotic 的 risk bound,并证明了不同的收敛 режи。实验结果在 simulated 和实际数据上都有示例。Abstract
Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
摘要
允许弃权的预测(即选择性预测)对错误敏感的机器学习应用而言是一个重要问题。在分类设置下,选择性方法已经得到了充分研究,但在回归设置下的发展要少得多。在这项工作中,我们考虑非参数异方差回归问题,并通过检验给定点处条件方差取值的假设,构建了一种弃权程序。与现有方法不同,我们的方法不仅考虑条件方差本身的取值,还考虑相应方差预测器的不确定性。我们证明了所得估计器风险的非渐近界,并证明存在多种不同的收敛区间。理论分析通过一系列模拟和真实数据实验加以说明。
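A minimal sketch of variance-based abstention in regression: fit a mean model and a conditional variance model, then abstain at points whose predicted variance exceeds a threshold. This plug-in heuristic only illustrates the setup; it is not the paper's hypothesis test, which also accounts for the uncertainty of the variance predictor.

```python
# Hedged sketch: selective (abstaining) heteroskedastic regression.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 1))
sigma = 0.1 + 0.8 * (X[:, 0] > 1.0)                # heteroskedastic noise level
y = np.sin(X[:, 0]) + sigma * rng.normal(size=3000)

mean_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
residual_sq = (y - mean_model.predict(X)) ** 2
var_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, residual_sq)

X_test = rng.uniform(-3, 3, size=(1000, 1))
var_hat = var_model.predict(X_test)
threshold = 0.25                                    # abstain when predicted variance is large
accept = var_hat <= threshold
y_pred = mean_model.predict(X_test[accept])         # predictions only where we do not abstain
print(f"coverage (fraction predicted on): {accept.mean():.2f}")
```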
Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption
methods: 使用模拟控制方法构建目标人口的合成治疗组,通过对源人口治疗组的权重进行最小化 conditional maximum mean discrepancy 来Estimate weights。
results: 我们建立了合成治疗组估计器的非 Parametric 性质,并通过实验证明了我们的方法可以作为 mean exchangeability assumption 被违反时的新的 complementary approach。Abstract
The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.
摘要
本研究的目的是将多个随机对照试验的信息迁移到仅有对照组数据的目标人群。先前的工作在很大程度上依赖均值可交换性(mean exchangeability)假设;然而,正如许多近期研究指出的,该假设可能被违反。受合成控制方法的启发,我们通过对源人群各治疗组进行加权混合,为目标人群构建一个合成治疗组。权重通过最小化源人群加权对照组与目标人群之间的条件最大均值差异(conditional maximum mean discrepancy)来估计。我们基于筛法半参数理论,建立了合成治疗组估计器的渐近正态性。当均值可交换性假设不成立时,我们的方法可以作为一种新的补充手段。我们在合成数据和真实世界数据上进行了实验,以证明方法的有效性。
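A simplified sketch of the weighting step described above: choose simplex weights over source control groups so that the weighted mixture matches the target control group in MMD with an RBF kernel. The paper minimizes a conditional MMD; the unconditional version below only illustrates the optimization structure, and the data and kernel bandwidth are toy choices.

```python
# Hedged sketch: weights by minimizing an (unconditional) MMD to the target population.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sources = [rng.normal(loc=m, size=(200, 3)) for m in (-1.0, 0.0, 2.0)]  # source control groups
target = rng.normal(loc=0.5, size=(200, 3))                             # target control group

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K_ss = np.array([[rbf(si, sj).mean() for sj in sources] for si in sources])
k_st = np.array([rbf(si, target).mean() for si in sources])
c_tt = rbf(target, target).mean()

def mmd_sq(w):                                      # MMD^2 of weighted mixture vs. target
    return float(w @ K_ss @ w - 2.0 * w @ k_st + c_tt)

cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
res = minimize(mmd_sq, x0=np.full(3, 1 / 3), bounds=[(0, 1)] * 3, constraints=cons)
print("estimated mixture weights:", np.round(res.x, 3))
```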
VAE-based latent-space classification of RNO-G data
results: 该方法可以自动检测和分类不同的噪声类型,包括物理风吹引起的信号和人工噪声。这些结果可以用来识别和分类不同的事件类型。Abstract
The Radio Neutrino Observatory in Greenland (RNO-G) is a radio-based ultra-high energy neutrino detector located at Summit Station, Greenland. It is still being constructed, with 7 stations currently operational. Neutrino detection works by measuring Askaryan radiation produced by neutrino-nucleon interactions. A neutrino candidate must be found amidst other backgrounds which are recorded at much higher rates -- including cosmic-rays and anthropogenic noise -- the origins of which are sometimes unknown. Here we describe a method to classify different noise classes using the latent space of a variational autoencoder. The latent space forms a compact representation that makes classification tractable. We analyze data from a noisy and a silent station. The method automatically detects and allows us to qualitatively separate multiple event classes, including physical wind-induced signals, for both the noisy and the quiet station.
摘要
格陵兰射电中微子观测站(RNO-G)是一个位于格陵兰 Summit Station 的基于射电的超高能中微子探测器。它仍在建设中,目前已有7个站点投入运行。中微子探测通过测量中微子-核子相互作用产生的 Askaryan 辐射来实现。中微子候选事件必须从记录速率高得多的其他背景中被找出——包括宇宙射线和人为噪声——它们的来源有时并不清楚。我们在此介绍一种利用变分自编码器的潜在空间对不同噪声类别进行分类的方法。潜在空间构成了一个紧凑的表示,使分类变得可行。我们分析了一个噪声较大和一个安静站点的数据。该方法能够自动检测并允许我们对多个事件类别进行定性区分,包括由风引起的物理信号,对噪声站点和安静站点均适用。
Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey
results: 该论文提供了一个全面的国家概述,概括了隐私中央化深度学习领域的最新进展和挑战,并对未来发展预测提出了一些建议。Abstract
Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: auditing and evaluation methods for private models, improvements of privacy-utility trade-offs, protection against a broad range of threats and attacks, differentially private generative models, and emerging application domains.
摘要
差分隐私(Differential Privacy)已成为机器学习中广泛使用的数据保护方法,特别是因为它允许给出严格的数学隐私保证。本综述概述了中心化深度学习中差分隐私的最新进展,对近期成果和开放问题进行了系统分析,并讨论了该领域的潜在未来发展。基于系统性文献综述,本文讨论了以下主题:针对隐私模型的审计与评估方法、隐私-效用权衡的改进、对多种威胁和攻击的防护、差分隐私生成模型,以及新兴应用领域。
paper_authors: Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis
for: 提高$k$-means clustering问题的解决方案质量
methods: 使用$k$-means++采样分布进行$O(k \log \log k)$次本地搜索,并通过同时交换多个中心来提高解决方案质量
results: 实现了$9 + \varepsilon$的近似比率,并在多个数据集上显示了重要的实践改进。Abstract
The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods hence allowing to swap multiple centers at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly we show that our approach yields substantial practical improvements, we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
摘要
Arthur和Vassilvitskii(SODA 2007)的$k$-means++算法是优化流行的$k$-means聚类目标时从业者的首选算法,已知其在期望意义下可达到$O(\log k)$-近似。为了获得更高质量的解,Lattanzi和Sohler(ICML 2019)提出利用$k$-means++采样分布为$k$-means++增加$O(k \log \log k)$步本地搜索,从而对$k$-means聚类问题获得$c$-近似,其中$c$是一个较大的绝对常数。在本文中,我们推广并扩展了他们的本地搜索算法,考虑更大、更复杂的本地搜索邻域,从而允许同时交换多个中心。我们的算法实现了$9 + \varepsilon$的近似比率,这是本地搜索所能达到的最优值。重要的是,我们的方法带来了显著的实践改进:在多个数据集上,其解的质量明显优于Lattanzi和Sohler(ICML 2019)的方法。
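A compact sketch of the overall recipe: k-means++ ($D^2$) seeding followed by local-search steps in which a new candidate center, sampled from the same $D^2$ distribution, may replace an existing center if it lowers the cost. For brevity this performs single-center swaps; the paper's contribution is swapping multiple centers simultaneously.

```python
# Hedged sketch: k-means++ seeding plus D^2-sampled local-search swaps.
import numpy as np

rng = np.random.default_rng(0)

def d2(X, centers):
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)

def cost(X, centers):
    return d2(X, centers).sum()

def kmeanspp_seed(X, k):
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        p = d2(X, np.array(centers))
        centers.append(X[rng.choice(len(X), p=p / p.sum())])
    return np.array(centers)

def local_search(X, centers, steps):
    centers = centers.copy()
    for _ in range(steps):
        p = d2(X, centers)
        candidate = X[rng.choice(len(X), p=p / p.sum())]   # sample via the D^2 distribution
        best_cost, best_idx = cost(X, centers), None
        for i in range(len(centers)):                      # try swapping out each center
            trial = centers.copy()
            trial[i] = candidate
            c = cost(X, trial)
            if c < best_cost:
                best_cost, best_idx = c, i
        if best_idx is not None:
            centers[best_idx] = candidate
    return centers

X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0, 3, 6, 9)])
k = 4
init = kmeanspp_seed(X, k)
refined = local_search(X, init, steps=20)
print("cost after seeding:", round(cost(X, init), 2),
      "| after local search:", round(cost(X, refined), 2))
```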
MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network
results: 在多种物料性质预测任务中显示出搭配MHG-GNN的扩展性和可靠性Abstract
Property prediction plays an important role in material discovery. As an initial step to eventually develop a foundation model for material science, we introduce a new autoencoder called the MHG-GNN, which combines graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.
摘要
性质预测在材料发现中扮演着重要角色。作为最终构建材料科学基础模型的第一步,我们介绍了一种新的自编码器 MHG-GNN,它将图神经网络(GNN)与分子超图文法(Molecular Hypergraph Grammar, MHG)相结合。在多种不同材料的性质预测任务上的结果表明,MHG-GNN 展现出良好的前景。
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
results: 结果表明,锐度较高的损失函数极 minimum tend to have better generalization performance,尤其是对于来自新设备的Out-of-domain data。此外,选择优化器的选择是主要驱动锐度的变化。Abstract
The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on twodimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima -even more so for out-of-domain data, recorded from previously unseen devices-, thus adding to the dispute about better generalisation capabilities of flat minima. We further find that, in particular, the choice of optimisers is a main driver of the sharpness of minima and we discuss resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
摘要
深度神经网络中损失极小点的锐度与泛化之间的关系长期以来一直存在争论。我们针对DCASE2020挑战赛数据的声学场景分类任务研究了这一问题,分析基于二维滤波器归一化的可视化和由此派生的锐度度量。我们的探索性分析表明,较锐的极小点往往比平坦极小点具有更好的泛化性能——对于来自此前未见设备的域外数据尤其如此,这为平坦极小点泛化能力更强的争论增添了新的内容。此外,我们发现优化器的选择是极小点锐度的主要驱动因素,并讨论了由此带来的可比较性方面的局限。我们的代码、训练好的模型状态和损失地形可视化均已公开。
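A minimal sketch of the filter-normalised visualisation technique referenced above (in the spirit of Li et al.'s loss-landscape method): perturb the weights along a random direction whose filters are rescaled to match the corresponding weight-filter norms, and record the loss along that slice. The model, data, and ranges are toy placeholders, not the acoustic scene classifiers studied in the paper.

```python
# Hedged sketch: 1-D filter-normalised loss landscape slice around a trained model.
import copy
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Conv1d(1, 8, 5), torch.nn.ReLU(),
                            torch.nn.Flatten(), torch.nn.Linear(8 * 96, 10))
x, y = torch.randn(64, 1, 100), torch.randint(0, 10, (64,))
loss_fn = torch.nn.CrossEntropyLoss()

def filter_normalized_direction(model):
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:                                   # per-filter rescaling
            for i in range(p.size(0)):
                d[i] *= p[i].norm() / (d[i].norm() + 1e-10)
        else:
            d.zero_()                                     # ignore biases, as is common
        direction.append(d)
    return direction

def loss_along(model, direction, alphas):
    losses = []
    for a in alphas:
        probe = copy.deepcopy(model)
        with torch.no_grad():
            for p, d in zip(probe.parameters(), direction):
                p.add_(a * d)
            losses.append(loss_fn(probe(x), y).item())
    return losses

alphas = torch.linspace(-1.0, 1.0, 21)
profile = loss_along(model, filter_normalized_direction(model), alphas)
print("loss at alpha=0:", round(profile[10], 3), "| max along slice:", round(max(profile), 3))
```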
Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs
methods: 本研究使用了语言模型的参数知识,将文本和时间信息分别处理,并将其相互融合以生成可能性分数。与之前的方法不同,本研究能够 Capture time dependencies and perform inductive inference on unseen entities.
results: 在不同的时间间隔预测和 triple classification 任务中,TEMT 与当前状态域的方法相当竞争,并且在 inductive Settings中表现出 excel。Abstract
Most knowledge graph completion (KGC) methods learn latent representations of entities and relations of a given graph by mapping them into a vector space. Although the majority of these methods focus on static knowledge graphs, a large number of publicly available KGs contain temporal information stating the time instant/period over which a certain fact has been true. Such graphs are often known as temporal knowledge graphs. Furthermore, knowledge graphs may also contain textual descriptions of entities and relations. Both temporal information and textual descriptions are not taken into account during representation learning by static KGC methods, and only structural information of the graph is leveraged. Recently, some studies have used temporal information to improve link prediction, yet they do not exploit textual descriptions and do not support inductive inference (prediction on entities that have not been seen in training). We propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for text-enhanced temporal knowledge graph completion. The knowledge stored in the parameters of a PLM allows TEMT to produce rich semantic representations of facts and to generalize on previously unseen entities. TEMT leverages textual and temporal information available in a KG, treats them separately, and fuses them to get plausibility scores of facts. Unlike previous approaches, TEMT effectively captures dependencies across different time points and enables predictions on unseen entities. To assess the performance of TEMT, we carried out several experiments including time interval prediction, both in transductive and inductive settings, and triple classification. The experimental results show that TEMT is competitive with the state-of-the-art.
摘要
大多数知识图完成(KGC)方法将实体和关系映射到一个维度空间中,以便学习缓存的表示。尽管大多数这些方法专注于静态知识图,但许多公共可用的KG具有时间信息,表示一个特定事实在某个时间点/时间段内是真。这些图被称为temporal知识图。此外,知识图也可能包含实体和关系的文本描述。 static KGC方法不会考虑时间信息和文本描述,只是利用结构信息来学习表示。最近,一些研究使用了时间信息来改进链接预测,但是它们不会利用文本描述,并且不支持推导推理(预测已经在训练中没有看到的实体)。我们提出了一种新的框架called TEMT,它利用预训练语言模型(PLM)来提高文本扩展 temporal knowledge graph completion。TEMT可以利用知识图中的文本和时间信息,将它们分 separetely处理,并将它们融合以获得可能性分数。与先前的方法不同,TEMT可以有效地捕捉不同时间点之间的依赖关系,并允许预测未经训练的实体。为评估TEMT的性能,我们进行了多个实验,包括时间间隔预测、满意度预测和 triple classification。实验结果表明,TEMT与状态之前的方法竞争。
ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging
paper_authors: Ali Ismail-Fawaz, Hassan Ismail Fawaz, François Petitjean, Maxime Devanne, Jonathan Weber, Stefano Berretti, Geoffrey I. Webb, Germain Forestier
results: 根据UCR数据集库中的123个数据集,与k-means减少算法相结合,ShapeDTW Barycentric Average可以达到新的州OF-THE-ARTResultsin Adjusted Rand Index。Abstract
Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data. Our approach uses a new form of time series average, the ShapeDTW Barycentric Average. We therefore turn our attention to accurately generating time series prototypes with a novel approach. The existing time series prototyping approaches rely on the Dynamic Time Warping (DTW) similarity measure such as DTW Barycentering Average (DBA) and SoftDBA. These last approaches suffer from a common problem of generating out-of-distribution artifacts in their prototypes. This is mostly caused by the DTW variant used and its incapability of detecting neighborhood similarities, instead it detects absolute similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, that overcomes this issue. We chose time series clustering, a popular form of time series analysis to evaluate the outcome of ShapeDBA compared to the other prototyping approaches. Coupled with the k-means clustering algorithm, and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach is able to achieve new state-of-the-art results in terms of Adjusted Rand Index.
摘要
时间序列数据几乎存在于所有领域,从医疗到制造和无线通信。生成真实且有用的示例和原型是一项基础的数据分析任务。在这篇论文中,我们研究了一种为时间序列数据生成真实且有用的示例和原型的新方法。我们的方法使用了一种新的时间序列平均形式,即ShapeDTW重心平均。现有的时间序列原型生成方法依赖于动态时间规整(DTW)相似度度量,如DTW重心平均(DBA)和SoftDBA。这些方法存在一个共同问题:其原型中会产生偏离数据分布的伪影,这主要是因为所用的DTW变体无法检测邻域相似性,而只能检测绝对相似性。我们提出的方法ShapeDBA使用ShapeDTW这一DTW变体,克服了这一问题。我们选择时间序列聚类这一流行的时间序列分析形式,来评估ShapeDBA与其他原型生成方法的效果。结合k-means聚类算法,在UCR存档的共123个数据集上进行评估,我们提出的平均方法在调整兰德指数(Adjusted Rand Index)上取得了新的最优结果。
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite
results: 论文 introduces physical metrics like kinetic energy MSE and Sinkhorn distance to measure the performance of learned surrogates, and provides baseline results using GNNs like GNS and SEGNN.Abstract
Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contribution is: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) efficient JAX-based API with various recent training strategies and three neighbor search routines, and (c) JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution. Our codebase is available at https://github.com/tumaer/lagrangebench .
摘要
机器学习已在多个科学领域成功应用于基于网格的偏微分方程(PDE)建模。然而,基于拉格朗日粒子离散化的学习型 PDE 求解器——这类方法更适合具有自由表面或复杂物理的问题——在很大程度上仍未被探索。我们介绍了 LagrangeBench,首个面向拉格朗日粒子问题、关注时间粗粒化的基准测试套件。具体来说,我们的贡献包括:(a) 七个新的流体力学数据集(四个 2D、三个 3D),由 Smoothed Particle Hydrodynamics (SPH) 方法生成,包括 Taylor-Green 涡、顶盖驱动方腔、反向 Poiseuille 流和溃坝,每个数据集都包含不同的物理特性,如固体壁面相互作用或自由表面;(b) 高效的基于 JAX 的 API,包含多种最新的训练策略和三种邻居搜索例程;(c) GNS 和 SEGNN 等既有图神经网络(GNN)的 JAX 实现及基线结果。最后,为了衡量学习代理模型的性能,我们不止于传统的位置误差,还引入了动能 MSE 和粒子分布的 Sinkhorn 距离等物理指标。我们的代码库可在 https://github.com/tumaer/lagrangebench 获取。
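A small sketch of the kind of physical metric mentioned above: a kinetic-energy MSE between predicted and ground-truth particle velocities across frames. The array shapes and uniform particle mass are illustrative; this is not the LagrangeBench API itself.

```python
# Hedged sketch: kinetic-energy MSE as a physical evaluation metric for particle rollouts.
import numpy as np

def kinetic_energy(vel, mass=1.0):
    """vel: (time, n_particles, dim) velocities -> per-frame kinetic energy."""
    return 0.5 * mass * (vel ** 2).sum(axis=(1, 2))

def kinetic_energy_mse(vel_pred, vel_true, mass=1.0):
    return float(((kinetic_energy(vel_pred, mass) - kinetic_energy(vel_true, mass)) ** 2).mean())

rng = np.random.default_rng(0)
vel_true = rng.normal(size=(50, 1000, 2))            # 50 frames, 1000 particles, 2D
vel_pred = vel_true + 0.05 * rng.normal(size=vel_true.shape)
print("kinetic energy MSE:", round(kinetic_energy_mse(vel_pred, vel_true), 4))
```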
EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect
for: The paper aims to address the issue of unequal representation and bias in federated learning (FL) when dealing with heterogeneous datasets from multiple clients.
methods: The proposed Egalitarian Fairness Federated Learning (EFFL) method uses a constrained multi-objective optimization approach to optimize the egalitarian fairness and performance of the global model.
results: The proposed EFFL algorithm outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
results: 提议的 EFFL 算法在实现高性能全局模型的同时,也能够提高所有客户端的 egalitarian fairness。Abstract
Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness refers to the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
摘要
To address this issue, we propose Egalitarian Fairness Federated Learning (EFFL), which aims to achieve egalitarian fairness among clients in two aspects:1. Equal accuracy among clients.2. Equal decision bias among clients.Besides egalitarian fairness, EFFL also pursues performance optimality by minimizing the empirical risk loss and bias for each client. This is essential for any ML model training, whether centralized or decentralized.We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. To obtain the Pareto optimal solutions within the constraint space, we propose a gradient-based three-stage algorithm.Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
DeepPCR: Parallelizing Sequential Operations in Neural Networks
results: 在多层感知器中实现了高达30倍的前向传播速度增加和高达200倍的反向传播速度增加,以及在 diffusion 模型中实现了更快的训练和生成速度Abstract
Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of computing the sequential operations from $\mathcal{O}(L)$ to $\mathcal{O}(\log_2L)$, thus yielding a speedup for large $L$. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to $30\times$ for the forward and $200\times$ for the backward pass. We additionally showcase the flexibility of DeepPCR by parallelizing training of ResNets with as many as 1024 layers, and generation in diffusion models, enabling up to $7\times$ faster training and $11\times$ faster generation, respectively, when compared to the sequential approach.
摘要
深度学习模型的推理和训练速度加速技术已经广泛应用。然而,许多操作仍然以序列方式进行,例如层次推理和反向传播。这种序列方式会导致计算成本与操作步骤数直线关系,从而带来计算成本增加的潜在瓶颈。在这项工作中,我们介绍了深度PCR算法,它可以将通常以序列方式进行的操作并行化,以加速深度学习模型的推理和训练。深度PCR基于解释一系列$L$步骤为特定系统方程的解,我们使用并行循环减少算法来解决这些方程。这将计算序列操作的复杂度从 $\mathcal{O}(L)$ 降低到 $\mathcal{O}(\log_2L)$,从而实现大量$L$的速度增加。为了证明算法的理论下界复杂度,以及哪些情况下可以获得加速,我们在多层感知器的前向和反向传播中测试了深度PCR的效果,并达到了最多$30\times$的加速。此外,我们还示cases了深度PCR的灵活性,可以并行训练具有1024层的ResNet模型,以及在Diffusion模型中的生成过程,各自实现了$7\times$快的训练和$11\times$快的生成。
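A toy sketch of the underlying idea: a length-$L$ sequential recurrence $x_t = a_t x_{t-1} + b_t$ can be solved in $O(\log L)$ parallel composition rounds, because affine maps compose associatively. The inclusive-scan below illustrates why casting the sequential steps as a joint system admits logarithmic-depth solvers; it is not the paper's Parallel Cyclic Reduction implementation.

```python
# Hedged sketch: sequential vs. log-depth solution of a linear recurrence.
import numpy as np

def sequential_solve(a, b, x0=0.0):
    x, out = x0, []
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return np.array(out)

def parallel_scan_solve(a, b, x0=0.0):
    # Represent step t as the affine map f_t(x) = a_t * x + b_t and compose prefixes.
    A, B = a.copy(), b.copy()
    L, shift = len(a), 1
    while shift < L:                                   # O(log L) rounds
        A_new, B_new = A.copy(), B.copy()
        A_new[shift:] = A[shift:] * A[:-shift]         # compose with map `shift` steps back
        B_new[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A, B, shift = A_new, B_new, shift * 2
    return A * x0 + B                                  # prefix maps applied to x0

rng = np.random.default_rng(0)
L = 1024
a, b = rng.uniform(0.9, 1.1, L), rng.normal(size=L)
assert np.allclose(sequential_solve(a, b), parallel_scan_solve(a, b))
print("sequential and log-depth solutions agree for L =", L)
```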
Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models
results: 研究发现,在训练数据丰富的情况下,Astroconformer可以实现Surface gravity(log g)的估计,其RMSE为0.017 dex,而在训练数据稀缺的情况下,RMSE可以达0.1 dex。此外,这个模型也超越了传统的K-nearest neighbor-based模型和现有的CNN模型。Abstract
Light curves of stars encapsulate a wealth of information about stellar oscillations and granulation, thereby offering key insights into the internal structure and evolutionary state of stars. Conventional asteroseismic techniques have been largely confined to power spectral analysis, neglecting the valuable phase information contained within light curves. While recent machine learning applications in asteroseismology utilizing Convolutional Neural Networks (CNNs) have successfully inferred stellar attributes from light curves, they are often limited by the local feature extraction inherent in convolutional operations. To circumvent these constraints, we present $\textit{Astroconformer}$, a Transformer-based deep learning framework designed to capture long-range dependencies in stellar light curves. Our empirical analysis, which focuses on estimating surface gravity ($\log g$), is grounded in a carefully curated dataset derived from $\textit{Kepler}$ light curves. These light curves feature asteroseismic $\log g$ values spanning from 0.2 to 4.4. Our results underscore that, in the regime where the training data is abundant, $\textit{Astroconformer}$ attains a root-mean-square-error (RMSE) of 0.017 dex around $\log g \approx 3 $. Even in regions where training data are sparse, the RMSE can reach 0.1 dex. It outperforms not only the K-nearest neighbor-based model ($\textit{The SWAN}$) but also state-of-the-art CNNs. Ablation studies confirm that the efficacy of the models in this particular task is strongly influenced by the size of their receptive fields, with larger receptive fields correlating with enhanced performance. Moreover, we find that the attention mechanisms within $\textit{Astroconformer}$ are well-aligned with the inherent characteristics of stellar oscillations and granulation present in the light curves.
摘要
恒星的光变曲线蕴含着关于恒星振荡和米粒组织的大量信息,从而为了解恒星的内部结构和演化状态提供了关键线索。传统的星震学技术大多局限于功率谱分析,忽略了光变曲线中宝贵的相位信息。尽管近期在星震学中应用卷积神经网络(CNN)的机器学习方法已成功地从光变曲线中推断恒星属性,但它们往往受限于卷积运算固有的局部特征提取。为克服这些限制,我们提出了 $\textit{Astroconformer}$,一种基于Transformer的深度学习框架,用于捕捉恒星光变曲线中的长程依赖关系。我们的实证分析聚焦于表面重力($\log g$)的估计,基于从 $\textit{Kepler}$ 光变曲线精心构建的数据集,其星震学 $\log g$ 值覆盖0.2至4.4。结果表明,在训练数据充足的区间,$\textit{Astroconformer}$ 在 $\log g \approx 3$ 附近可达到0.017 dex的均方根误差(RMSE);即使在训练数据稀疏的区域,RMSE 也可达0.1 dex。它不仅优于基于K近邻的模型($\textit{The SWAN}$),也优于最先进的CNN。消融研究证实,模型在这一任务上的效果在很大程度上取决于其感受野的大小,更大的感受野对应更好的性能。此外,我们发现 $\textit{Astroconformer}$ 中的注意力机制与光变曲线中恒星振荡和米粒组织的固有特征高度吻合。
A Primer on Bayesian Neural Networks: Review and Debates
results: 本文提供了一种系统性的介绍,涵盖了贝叶斯统计与神经网络之间的协同结合,以及在实际应用中的考虑因素。此外,论文还探讨了BNN研究中的高级主题,并讨论了该领域中仍在进行的辩论与争议。Abstract
Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
摘要
神经网络已经在各类问题领域取得了出色的表现,但其广泛应用受到一些内在局限的制约,例如预测过度自信、缺乏可解释性以及易受对抗攻击。为了应对这些挑战,贝叶斯神经网络(BNN)作为传统神经网络的一种有吸引力的扩展应运而生,将不确定性估计融入其预测能力之中。这份全面的入门指南系统地介绍了神经网络和贝叶斯推断的基本概念,并阐明二者如何协同结合以构建BNN。目标读者既包括可能具有贝叶斯方法背景但缺乏深度学习经验的统计学家,也包括精通深度神经网络但对贝叶斯统计接触有限的机器学习研究者。我们概述了常用的先验,并分析其对模型行为和性能的影响;此外,我们还讨论了BNN在训练和推断中的实际考量,并探讨了BNN研究中的高级主题,包括仍在进行的辩论与争议。通过介绍前沿进展,这份指南不仅为研究者和从业者打下BNN的坚实基础,也展示了这一活跃领域的潜在应用,从而促进对BNN的理解及其进一步发展。
3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information
results: 在7个基准测试上将3D-Mol与多种最先进(SOTA)基线进行了比较,并在其中5个基准上表现出色。Abstract
Molecular property prediction offers an effective and efficient approach for early screening and optimization of drug candidates. Although deep learning based methods have made notable progress, most existing works still do not fully utilize 3D spatial information. This can lead to a single molecular representation representing multiple actual molecules. To address these issues, we propose a novel 3D structure-based molecular modeling method named 3D-Mol. In order to accurately represent complete spatial structure, we design a novel encoder to extract 3D features by deconstructing the molecules into three geometric graphs. In addition, we use 20M unlabeled data to pretrain our model by contrastive learning. We consider conformations with the same topological structure as positive pairs and the opposites as negative pairs, while the weight is determined by the dissimilarity between the conformations. We compare 3D-Mol with various state-of-the-art (SOTA) baselines on 7 benchmarks and demonstrate our outstanding performance in 5 benchmarks.
CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture
results: 实验结果表明,与其他方法相比,CasIL在多种长时程任务中的机器人技能模仿能力具有竞争力和可靠性。Abstract
Enabling robots to effectively imitate expert skills in longhorizon tasks such as locomotion, manipulation, and more, poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture by introducing intuitive human cognitive priors, and propose a novel skill IL framework through human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent to effectively cognize and imitate the critical skills from raw visual demonstrations. CasIL enables both cognition and action imitation, while high-level skill cognition explicitly guides low-level primitive actions, providing robustness and reliability to the entire skill IL process. We evaluated our method on MuJoCo and RLBench benchmarks, as well as on the obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that our CasIL consistently achieves competitive and robust skill imitation capability compared to other counterparts in a variety of long-horizon robotic tasks.
A framework for paired-sample hypothesis testing for high-dimensional data
paper_authors: Ioannis Bargiotas, Argyris Kalogeratos, Nicolas Vayatis
for: This paper proposes a new approach to multidimensional paired-sample testing, which can handle numerous features and provide accurate results.
methods: The proposed approach uses scoring functions produced by decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. The optimal scoring function is obtained by the pseudomedian of those rules, which is estimated using the Hodges-Lehmann estimator.
results: The proposed approach is shown to have substantial performance gains in testing accuracy compared to traditional multivariate and multiple testing methods, while also providing estimates of each feature’s contribution to the final result.Abstract
The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimates each feature's contribution to the final result.
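A simplified sketch of the two-step procedure, assuming the aggregated rule is obtained as a Hodges-Lehmann-style pseudomedian of the hyperplane parameters; the paper's exact aggregation and scoring may differ:

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.stats import wilcoxon

def paired_hyperplane_test(X, Y):
    """Two-step paired-sample test (simplified sketch of the described idea).

    X, Y : (n, d) paired samples. Step 1 builds, for each pair, the
    perpendicular bisecting hyperplane of the segment x_i -> y_i, aggregates
    the rules with a Hodges-Lehmann-style pseudomedian of their parameters,
    and scores all points. Step 2 runs a Wilcoxon signed-rank test on the
    paired score differences.
    """
    W = Y - X                                    # hyperplane normals
    M = (X + Y) / 2.0                            # hyperplanes pass through midpoints
    b = -np.einsum("ij,ij->i", W, M)             # offsets so that w_i . m_i + b_i = 0
    params = np.hstack([W, b[:, None]])          # one rule per pair

    # Hodges-Lehmann pseudomedian: median of pairwise averages of the rules
    pairs = list(combinations_with_replacement(range(len(params)), 2))
    averages = np.array([(params[i] + params[j]) / 2.0 for i, j in pairs])
    agg = np.median(averages, axis=0)
    w_hat, b_hat = agg[:-1], agg[-1]

    score = lambda Z: Z @ w_hat + b_hat          # unidimensional representation
    stat, p_value = wilcoxon(score(X) - score(Y))
    return stat, p_value

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
Y = X + rng.normal(0.3, 1.0, size=(40, 10))      # small paired shift -> should be detected
print(paired_hyperplane_test(X, Y))
```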
Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation
results: Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF provides sufficiently accurate analytics and faster analytics provision time than the conventional NWDAF.Abstract
5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
for: investigate how well context alone may be used to predict tweet engagement likelihood
methods: employ the Spark engine on TU Wien’s Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines, and manually create just under 200 additional features to describe tweet context
results: features describing users’ prior engagement history and the popularity of hashtags and links in the tweet were the most informative, and factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results, with context-based models underperforming in terms of the RCE score compared to content-only models and models developed by the Challenge winners.Abstract
Twitter is currently one of the biggest social media platforms. Its users may share, read, and engage with short posts called tweets. For the ACM Recommender Systems Conference 2020, Twitter published a dataset around 70 GB in size for the annual RecSys Challenge. In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations. The submitted models predicting like, reply, retweet, and quote engagements were evaluated based on two metrics: area under the precision-recall curve (PRAUC) and relative cross-entropy (RCE). In this diploma thesis, we used the RecSys 2020 Challenge dataset and evaluation procedure to investigate how well context alone may be used to predict tweet engagement likelihood. In doing so, we employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines. We manually created just under 200 additional features to describe tweet context. The results indicate that features describing users' prior engagement history and the popularity of hashtags and links in the tweet were the most informative. We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results. After comparing the best results of our context-only prediction models with content-only models and with models developed by the Challenge winners, we identified that the context-based models underperformed in terms of the RCE score. This work thus concludes by situating this discrepancy and proposing potential improvements to our implementation, which is shared in a public git repository.
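A small sketch of the two evaluation metrics, assuming the usual RecSys 2020 Challenge reading of RCE as the relative improvement in cross-entropy over a naive predictor that always outputs the observed positive rate (the exact baseline definition comes from the Challenge, not from this thesis text):

```python
import numpy as np
from sklearn.metrics import average_precision_score, log_loss

def prauc(y_true, y_pred):
    """Area under the precision-recall curve (average precision)."""
    return average_precision_score(y_true, y_pred)

def rce(y_true, y_pred):
    """Relative cross-entropy: improvement of the model's log loss over a
    naive predictor that always outputs the observed positive rate
    (sketch of the RecSys 2020 Challenge metric)."""
    ctr = np.mean(y_true)
    ce_model = log_loss(y_true, y_pred, labels=[0, 1])
    ce_naive = log_loss(y_true, np.full_like(y_pred, ctr), labels=[0, 1])
    return (1.0 - ce_model / ce_naive) * 100.0

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.1, size=1000)                         # ~10% engagement rate
p = np.clip(0.1 + 0.3 * y + rng.normal(0, 0.05, 1000), 1e-6, 1 - 1e-6)
print(f"PRAUC={prauc(y, p):.3f}  RCE={rce(y, p):.2f}")
```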
results: Experiments show that mSMI performs strongly on independence testing, multi-view representation learning, algorithmic fairness, and generative modeling, with little-to-no computational overhead.Abstract
Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
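A sketch of mSMI in the jointly Gaussian case, where the optimal one-dimensional projections coincide with the top canonical directions, so mSMI reduces to a function of the top canonical correlation; the general, non-Gaussian case would use the neural estimator described above:

```python
import numpy as np

def msmi_gaussian(X, Y):
    """Max-sliced mutual information for (approximately) jointly Gaussian data.

    In the Gaussian case the optimal 1-D projections are the top canonical
    directions, so mSMI reduces to -0.5 * log(1 - rho^2) with rho the top
    canonical correlation.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    # Whitened cross-covariance; its top singular value is the top canonical corr.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    rho = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)[0]
    rho = min(rho, 1 - 1e-12)
    return -0.5 * np.log(1 - rho ** 2)

rng = np.random.default_rng(3)
Z = rng.normal(size=(5000, 1))                                # shared latent factor
X = np.hstack([Z + 0.5 * rng.normal(size=(5000, 1)), rng.normal(size=(5000, 3))])
Y = np.hstack([Z + 0.5 * rng.normal(size=(5000, 1)), rng.normal(size=(5000, 2))])
print(msmi_gaussian(X, Y))    # positive: X and Y share the latent Z
```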
results: Provides instance-dependent regret bounds with general function approximation, showing that StackelbergLearner can learn a best-effort policy that competes against any comparator policy covered by the batch data, without data-coverage or strong function-approximation conditions. Comprehensive experiments show that the algorithm performs as well as or better than state-of-the-art methods on batch RL benchmarks and real-world datasets.Abstract
Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which shows that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well or better as compared to state-of-the-art methods in batch RL benchmark and real-world datasets.
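A toy illustration of the leader-follower update rule on a simple quadratic game (not the batch RL algorithm itself): the leader follows the total derivative of its objective through the follower's best response, whereas simultaneous individual-gradient play settles at a different point. The game, learning rates, and finite-difference stand-in for implicit differentiation are all illustrative assumptions:

```python
import numpy as np

# Leader loss   f(theta, phi) = (theta - 1)^2 + theta * phi
# Follower loss g(theta, phi) = (phi - theta)^2  ->  best response phi*(theta) = theta

def follower_best_response(theta, phi, steps=50, lr=0.2):
    for _ in range(steps):
        phi -= lr * 2 * (phi - theta)            # follower's individual update
    return phi

def leader_total_grad(theta, phi, eps=1e-4):
    # d f(theta, phi*(theta)) / d theta, estimated by re-solving the follower
    # at perturbed theta (a stand-in for implicit differentiation).
    f = lambda t: (t - 1) ** 2 + t * follower_best_response(t, phi)
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta, phi = 0.0, 0.0
for _ in range(200):
    phi = follower_best_response(theta, phi)
    theta -= 0.05 * leader_total_grad(theta, phi)
print(theta)   # -> 0.5, the Stackelberg solution of min_theta (theta-1)^2 + theta^2

# For comparison, simultaneous individual-gradient play converges elsewhere:
t, p = 0.0, 0.0
for _ in range(2000):
    t, p = t - 0.05 * (2 * (t - 1) + p), p - 0.05 * 2 * (p - t)
print(t)       # -> 2/3, not the Stackelberg solution
```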
Systematic Sampling and Validation of Machine Learning-Parameterizations in Climate Models
results: Online model performance improves when incorporating memory, a relative-humidity input feature transformation, and additional input variables. The study also reveals substantial variation in online error and inconsistencies between online and offline error statistics, implying that hundreds of candidate ML models need to be evaluated online to detect the effects of parameterization design choices.Abstract
Progress in hybrid physics-machine learning (ML) climate simulations has been limited by the difficulty of obtaining performant coupled (i.e. online) simulations. While evaluating hundreds of ML parameterizations of subgrid closures (here of convection and radiation) offline is straightforward, online evaluation at the same scale is technically challenging. Our software automation achieves an order-of-magnitude larger sampling of online modeling errors than has previously been examined. Using this, we evaluate the hybrid climate model performance and define strategies to improve it. We show that model online performance improves when incorporating memory, a relative humidity input feature transformation, and additional input variables. We also reveal substantial variation in online error and inconsistencies between offline vs. online error statistics. The implication is that hundreds of candidate ML models should be evaluated online to detect the effects of parameterization design choices. This is considerably more sampling than tends to be reported in the current literature.
Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation
for: delete information from pre-trained graph neural network (GNN) to comply with data protection regulations and reduce carbon footprint
methods: knowledge distillation, model-agnostic approach, dividing and marking complete graph knowledge for retention and deletion, using response-based soft targets and feature-based node embedding, minimizing KL divergence
results: Surpasses existing methods on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks, with better efficiency, better removal of target elements, preserved performance on retained elements, and zero overhead costs; surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves the membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass, and is up to $\mathbf{3.2}\times$ faster.Abstract
Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to address these challenges in graph unlearning through knowledge distillation, as it distills to delete in GNN (D2DGN). It is a model-agnostic distillation framework where the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embedding while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass and up to $\mathbf{3.2}\times$ faster.
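One plausible instantiation of a distill-to-delete objective in the spirit described above, combining response-based and feature-based distillation on retained nodes with a term that pushes deleted nodes toward an uninformative response; the paper's exact loss, weighting, and treatment of deleted elements may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-12):
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1).mean()

def unlearning_distillation_loss(teacher_logits, student_logits,
                                 teacher_emb, student_emb, delete_mask,
                                 alpha=1.0, beta=1.0):
    """Sketch of an unlearning objective (not the paper's exact loss).

    Retained nodes: match the teacher's soft targets and node embeddings.
    Deleted nodes: push the student's responses toward an uninformative
    (uniform) distribution so the deleted information is no longer encoded.
    """
    retain = ~delete_mask
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    uniform = np.full_like(p_s, 1.0 / p_s.shape[-1])

    loss_retain = kl(p_t[retain], p_s[retain])                            # response-based
    loss_emb = np.mean((teacher_emb[retain] - student_emb[retain]) ** 2)  # feature-based
    loss_delete = kl(uniform[delete_mask], p_s[delete_mask])
    return loss_retain + alpha * loss_emb + beta * loss_delete

rng = np.random.default_rng(4)
n, c, d = 100, 7, 16
mask = rng.random(n) < 0.1                      # ~10% of nodes marked for deletion
loss = unlearning_distillation_loss(
    rng.normal(size=(n, c)), rng.normal(size=(n, c)),
    rng.normal(size=(n, d)), rng.normal(size=(n, d)), mask)
print(loss)
```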
A Spectral Approach for Learning Spatiotemporal Neural Differential Equations
paper_authors: Mingtao Xia, Xiangting Li, Qijing Shen, Tom Chou
for: Computationally reconstructing differential equations (DEs) from observational data to gain insights into underlying causative mechanisms.
methods: Using spectral expansions in space to learn spatiotemporal DEs, without relying on spatial discretization, allowing for long-range, nonlocal spatial interactions on unbounded domains.
results: The proposed spectral neural DE learning approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains, and can be applied to a larger class of problems including unbounded DEs and integro-differential equations.Abstract
Rapidly developing machine learning methods has stimulated research interest in computationally reconstructing differential equations (DEs) from observational data which may provide additional insight into underlying causative mechanisms. In this paper, we propose a novel neural-ODE based method that uses spectral expansions in space to learn spatiotemporal DEs. The major advantage of our spectral neural DE learning approach is that it does not rely on spatial discretization, thus allowing the target spatiotemporal equations to contain long range, nonlocal spatial interactions that act on unbounded spatial domains. Our spectral approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains. By developing a spectral framework for learning both PDEs and integro-differential equations, we extend machine learning methods to apply to unbounded DEs and a larger class of problems.
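A rough sketch of the spectral idea on a synthetic 1-D field: project snapshots onto an orthonormal Hermite-function basis (a natural choice on unbounded domains) and learn the dynamics of the coefficients. Here a linear least-squares fit stands in for the neural ODE, and the field, grid, and mode count are illustrative assumptions:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi

def hermite_functions(x, n_modes):
    """Orthonormal Hermite functions psi_n(x) = H_n(x) e^{-x^2/2} / norm,
    a spectral basis suited to the unbounded real line."""
    Phi = np.zeros((len(x), n_modes))
    for n in range(n_modes):
        c = np.zeros(n + 1); c[n] = 1.0
        norm = np.sqrt(2.0 ** n * factorial(n) * np.sqrt(pi))
        Phi[:, n] = hermval(x, c) * np.exp(-x ** 2 / 2) / norm
    return Phi

# Synthetic spatiotemporal field u(x, t) observed on a grid.
x = np.linspace(-6, 6, 200)
t = np.linspace(0, 1, 50)
u = np.exp(-x[None, :] ** 2 / 2) * np.cos(2 * t[:, None] + x[None, :])

# Project each snapshot onto the spectral basis: u(., t) ~ Phi @ a(t).
Phi = hermite_functions(x, n_modes=12)
A = np.linalg.lstsq(Phi, u.T, rcond=None)[0].T        # (n_t, n_modes) coefficients

# Learn coefficient dynamics da/dt = F(a); here F is a linear least-squares fit,
# standing in for the neural ODE used in the paper.
dA = np.gradient(A, t, axis=0)
F = np.linalg.lstsq(A, dA, rcond=None)[0]
print("coefficient dynamics operator shape:", F.shape)
```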
Compositional Sculpting of Iterative Generative Processes
results: Experiments demonstrate that compositional sculpting yields effective compositions of generative models on image and molecular generation tasks.Abstract
High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations, the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
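A toy illustration of the two binary operations on discrete densities over a grid. The harmonic-mean form follows the abstract, while the contrast formula used here ($p_1^2/(p_1+p_2)$) is an assumed reading rather than the paper's exact definition; actual sampling from such compositions would rely on classifier guidance inside the iterative generative process:

```python
import numpy as np

def normalize(p):
    return p / p.sum()

def harmonic_mean_composition(p1, p2):
    """Harmonic-mean composition: p(x) proportional to p1(x) p2(x) / (p1(x) + p2(x)).
    Concentrates mass where *both* base models assign probability."""
    return normalize(p1 * p2 / (p1 + p2 + 1e-12))

def contrast_composition(p1, p2):
    """Contrast-style composition, assumed here as p1(x)^2 / (p1(x) + p2(x)):
    emphasizes regions p1 covers but p2 does not (see the paper for the exact
    definition used with classifier guidance)."""
    return normalize(p1 ** 2 / (p1 + p2 + 1e-12))

# Two 1-D Gaussian-like base distributions on a grid.
x = np.linspace(-5, 5, 400)
p1 = normalize(np.exp(-(x + 1) ** 2 / 0.5))
p2 = normalize(np.exp(-(x - 1) ** 2 / 0.5))

hm, ct = harmonic_mean_composition(p1, p2), contrast_composition(p1, p2)
print("harmonic-mean mode:", x[np.argmax(hm)])   # between the two base modes
print("contrast mode:", x[np.argmax(ct)])        # near p1's mode, away from p2
```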
Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration
results: Active learning strategies driven by Gaussian processes converge to an accurate model more quickly and propose trajectories of shorter distance, while Bayesian neural networks model the environment more accurately in the large-data regime but require more computation.Abstract
Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.
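A minimal sketch of a variance-driven Gaussian-process active learning loop with a trajectory constraint on the next sample; the kernel, step limit, and scalar field below are illustrative assumptions rather than the paper's experimental setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def true_field(p):                        # hidden variable of interest
    return np.sin(p[:, 0]) * np.cos(p[:, 1])

grid = np.array([[i, j] for i in np.linspace(0, 5, 26) for j in np.linspace(0, 5, 26)])
pos = np.array([0.0, 0.0])
X, y = [pos.copy()], [true_field(pos[None])[0]]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
for step in range(15):
    gp.fit(np.array(X), np.array(y))
    _, std = gp.predict(grid, return_std=True)
    reachable = np.linalg.norm(grid - pos, axis=1) <= 1.0     # trajectory constraint
    std[~reachable] = -np.inf                                 # only nearby candidates
    pos = grid[np.argmax(std)]                                # most uncertain reachable point
    X.append(pos.copy()); y.append(true_field(pos[None])[0])

pred = gp.predict(grid)
print("RMS error:", np.sqrt(np.mean((pred - true_field(grid)) ** 2)))
```

A Bayesian-neural-network variant would replace the GP posterior standard deviation with a Monte Carlo estimate of predictive variance, at a higher training cost per iteration.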
Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics
results: Two positive views generated through data augmentation are pulled together by an attraction force in the representation space, while negative examples are pushed away by a repulsive force. Comparing contrastive and non-contrastive learning shows that feature normalization plays an important role in the stability of the learning dynamics.Abstract
Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
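A small sketch contrasting the raw L2 loss with the cosine loss (the L2 loss applied after feature normalization, up to an additive constant), which is the distinction the analysis above hinges on:

```python
import numpy as np

def l2_loss(p, z):
    """Plain L2 loss between online predictions p and target projections z."""
    return np.mean(np.sum((p - z) ** 2, axis=1))

def cosine_loss(p, z):
    """Negative cosine similarity, i.e. the L2 loss after feature normalization
    (up to an additive constant); this is the loss whose dynamics the paper
    analyzes (sixth-order, versus third-order for the raw L2 loss)."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))

rng = np.random.default_rng(5)
p = rng.normal(size=(8, 64))                   # online-branch predictions for one batch
z = p + 0.1 * rng.normal(size=(8, 64))         # stop-gradient target projections
print(l2_loss(p, z), cosine_loss(p, z))

# Note: ||p/|p| - z/|z|||^2 = 2 - 2 cos(p, z), so minimizing the normalized L2
# loss and maximizing cosine similarity are equivalent.
```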
Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise
results: Derives a tight privacy-accuracy trade-off characterization based on an intricately layered noise distribution, achieving privacy and accuracy even when honest nodes number fewer than 2t+1.Abstract
We consider the problem of private distributed multi-party multiplication. It is well-established that Shamir secret-sharing coding strategies can enable perfect information-theoretic privacy in distributed computation via the celebrated algorithm of Ben Or, Goldwasser and Wigderson (the "BGW algorithm"). However, perfect privacy and accuracy require an honest majority, that is, $N \geq 2t+1$ compute nodes are required to ensure privacy against any $t$ colluding adversarial nodes. By allowing for some controlled amount of information leakage and approximate multiplication instead of exact multiplication, we study coding schemes for the setting where the number of honest nodes can be a minority, that is $N< 2t+1.$ We develop a tight characterization privacy-accuracy trade-off for cases where $N < 2t+1$ by measuring information leakage using differential privacy instead of perfect privacy, and using the mean squared error metric for accuracy. A novel technical aspect is an intricately layered noise distribution that merges ideas from differential privacy and Shamir secret-sharing at different layers.
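A minimal sketch of the ingredients: standard Shamir secret sharing over a prime field (perfect privacy against up to t colluding parties when reconstruction uses t+1 shares), plus the noise-for-approximation idea used when honest nodes are a minority. The noise scale below is purely illustrative and not the paper's calibrated mechanism:

```python
import numpy as np

PRIME = 2_147_483_647          # field for the Shamir shares (Mersenne prime 2^31 - 1)

def shamir_share(secret, n_parties, t, rng):
    """Split `secret` into n shares; any t+1 shares reconstruct it, while any t
    shares reveal nothing (standard Shamir sharing over a prime field)."""
    coeffs = [secret] + [int(c) for c in rng.integers(0, PRIME, size=t)]
    return [(i, sum(c * pow(i, k, PRIME) for k, c in enumerate(coeffs)) % PRIME)
            for i in range(1, n_parties + 1)]

def shamir_reconstruct(shares):
    """Lagrange interpolation of the sharing polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

rng = np.random.default_rng(6)
a, b = 1234, 5678
shares_a = shamir_share(a, n_parties=5, t=2, rng=rng)
print(shamir_reconstruct(shares_a[:3]) == a)     # any t+1 = 3 shares suffice

# With fewer than 2t+1 honest parties, exact multiplication with perfect privacy
# is impossible; the paper instead adds calibrated noise so that leakage is
# bounded in the differential-privacy sense and the product is approximate.
noisy_product = a * b + rng.normal(0, 50.0)      # illustrative noise scale only
print(noisy_product, a * b)
```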
Task-Oriented Koopman-Based Control with Contrastive Encoder
paper_authors: Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, Mo Chen
for: This paper aims to simultaneously learn the Koopman latent embedding, operator, and associated linear controller within an iterative loop, enabling control of high-dimensional, complex nonlinear systems.
methods: The paper uses end-to-end reinforcement learning and a contrastive encoder to learn the Koopman latent embedding, operator, and associated linear controller.
results: By prioritizing the task cost as the main objective for controller learning, the paper reduces the reliance of controller design on a well-identified model, extending Koopman control to high-dimensional, complex nonlinear systems, including pixel-based scenarios.Abstract
We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and contrastive encoder to simultaneously learn the Koopman latent embedding, operator and associated linear controller within an iterative loop. By prioritizing the task cost as main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which extends Koopman control beyond low-dimensional systems to high-dimensional, complex nonlinear systems, including pixel-based scenarios.
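A rough sketch of the conventional Koopman-based control pipeline the paper builds on: lift states with an encoder, fit a linear latent model $z' = Az + Bu$ by least squares, and design an LQR controller on it. Here a fixed polynomial lift stands in for the learned contrastive encoder, and the toy system, cost weights, and horizon are assumptions; the paper instead learns the embedding, operator, and controller jointly with the task cost as the main objective:

```python
import numpy as np

def encode(x):                         # assumed lift; the paper learns this encoder
    return np.array([x[0], x[1], np.sin(x[0]), x[0] * x[1]])

def step(x, u):                        # toy damped nonlinear system
    return x + 0.1 * np.array([x[1], -0.5 * np.sin(x[0]) - 0.3 * x[1] + u])

rng = np.random.default_rng(7)
Z, U, Znext = [], [], []
x = rng.normal(size=2)
for _ in range(500):
    u = rng.normal()
    xn = step(x, u)
    Z.append(encode(x)); U.append([u]); Znext.append(encode(xn))
    x = xn if np.all(np.abs(xn) < 5) else rng.normal(size=2)
Z, U, Znext = map(np.array, (Z, U, Znext))

# Least-squares Koopman regression:  Znext ~ [Z U] @ [A B]^T
K = np.linalg.lstsq(np.hstack([Z, U]), Znext, rcond=None)[0].T
A, B = K[:, :4], K[:, 4:]

# Finite-horizon LQR on the latent model via backward Riccati recursion.
Q, R = np.eye(4), np.eye(1)
P = Q.copy()
for _ in range(100):
    Kgain = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ Kgain)
print("LQR gain on the Koopman latent state:", Kgain.round(3))   # control u = -Kgain @ encode(x)
```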
Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning
methods: This work uses Learning from Demonstrations (LfD) to transfer expert policies to the robot and applies state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solve bipedal locomotion problems over complex terrains.
results: Learning the expert's reward functions improves the robot's walking performance and maintains stable locomotion on unseen terrains, highlighting the adaptability offered by reward learning.Abstract
Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
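A minimal feature-matching IRL sketch on a toy chain MDP, standing in for the far more complex bipedal-locomotion setting; the MDP, demonstrations, and hyperparameters are illustrative assumptions:

```python
import numpy as np

n_states, n_actions, gamma, horizon = 5, 2, 0.95, 30
# action 0 = left, action 1 = right, deterministic chain transitions
next_state = np.array([[max(s - 1, 0), min(s + 1, n_states - 1)] for s in range(n_states)])
features = np.eye(n_states)                       # one-hot state features

def soft_policy(w):
    """Soft (maximum-entropy) optimal policy under reward r(s) = w . phi(s)."""
    r = features @ w
    V = np.zeros(n_states)
    for _ in range(100):
        Q = r[:, None] + gamma * V[next_state]    # (S, A)
        m = Q.max(axis=1, keepdims=True)
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
    return np.exp(Q - V[:, None])                 # pi(a|s)

def feature_expectations(pi):
    d = np.zeros(n_states); d[0] = 1.0            # start in state 0
    mu = np.zeros(n_states)
    for _ in range(horizon):
        mu += d @ features
        d_next = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                d_next[next_state[s, a]] += d[s] * pi[s, a]
        d = d_next
    return mu / horizon

# "Expert" demonstrations always move right toward the last state.
expert_pi = np.tile([0.0, 1.0], (n_states, 1))
mu_expert = feature_expectations(expert_pi)

# Reward learning: match the policy's feature expectations to the expert's.
w = np.zeros(n_states)
for it in range(200):
    w += 0.1 * (mu_expert - feature_expectations(soft_policy(w)))
print("learned state rewards:", w.round(2))       # highest reward at the rightmost state
```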