results: The paper shows that the adversarially optimal algorithm performs suboptimally in the stochastic setting, whereas the optimal stochastic online algorithm, which the paper characterizes, achieves near-optimal performance there. The study provides a best-of-both-worlds algorithm that simultaneously achieves near-optimal stochastic performance and robust adversarial performance.
Abstract
Convex function chasing (CFC) is an online optimization problem in which during each round $t$, a player plays an action $x_t$ in response to a hitting cost $f_t(x_t)$ and an additional cost of $c(x_t,x_{t-1})$ for switching actions. We study the CFC problem in stochastic and adversarial environments, giving algorithms that achieve performance guarantees simultaneously in both settings. Specifically, we consider the squared $\ell_2$-norm switching costs and a broad class of quadratic hitting costs for which the sequence of minimizers either forms a martingale or is chosen adversarially. This is the first work that studies the CFC problem using a stochastic framework. We provide a characterization of the optimal stochastic online algorithm and, drawing a comparison between the stochastic and adversarial scenarios, we demonstrate that the adversarial-optimal algorithm exhibits suboptimal performance in the stochastic context. Motivated by this, we provide a best-of-both-worlds algorithm that obtains robust adversarial performance while simultaneously achieving near-optimal stochastic performance.
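To make the cost structure concrete, below is a minimal sketch (Python/NumPy) of how a CFC action sequence is scored under quadratic hitting costs and the squared $\ell_2$ switching cost from the abstract. The hitting-cost scale, the origin as starting point, the martingale minimizer sequence, and the "greedy" player are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def cfc_total_cost(actions, minimizers, alpha=1.0):
    """Score actions x_1..x_T under hitting costs f_t(x) = alpha/2 * ||x - v_t||^2
    and switching cost c(x, x') = ||x - x'||^2."""
    x_prev = np.zeros_like(actions[0])  # assume the player starts at the origin
    total = 0.0
    for x_t, v_t in zip(actions, minimizers):
        total += 0.5 * alpha * np.sum((x_t - v_t) ** 2)  # hitting cost
        total += np.sum((x_t - x_prev) ** 2)             # squared l2 switching cost
        x_prev = x_t
    return total

# Toy stochastic setting: the sequence of minimizers forms a martingale, and a
# naive "greedy" player always jumps to the current minimizer.
rng = np.random.default_rng(0)
T, d = 50, 3
minimizers = np.cumsum(rng.normal(scale=0.1, size=(T, d)), axis=0)
print("greedy total cost:", cfc_total_cost(minimizers.copy(), minimizers))
```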
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
for: The paper aims to address the issue of "objective mismatch" in reinforcement learning from human feedback (RLHF) for large language models (LLMs), which can lead to unexpected behaviors and suboptimal performance.
methods: The paper reviews relevant literature from model-based reinforcement learning and discusses potential solutions to the objective mismatch issue in RLHF, including the use of multi-objective reward shaping and the integration of multiple training objectives.
results: The paper argues that by solving the objective mismatch issue in RLHF, LLMs of the future will be more precisely aligned to user instructions for both safety and helpfulness. However, the paper does not present any specific experimental results.
Abstract
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to prompt and more capable in complex settings. RLHF at its core is providing a new toolkit to optimize LLMs other than next-token prediction, enabling the integration of qualitative training goals. The attempted match between user preferences and downstream performance, which happens in a learned reward model, results in an optimization landscape where training and evaluation metrics can appear correlated. The apparent correlation can lead to unexpected behaviors and stories of "too much RLHF." In RLHF, challenges emerge because the following sub-modules are not consistent with each other: the reward model training, the policy model training, and the policy model evaluation. This mismatch results in models that sometimes avoid user requests for false safety flags, are difficult to steer to an intended characteristic, or always answer in a specific style. As chat model evaluation becomes increasingly nuanced, the reliance on a perceived link between reward model score and downstream performance drives the objective mismatch issue. In this paper, we illustrate the cause of this issue, reviewing relevant literature from model-based reinforcement learning, and discuss relevant solutions to encourage further research. By solving objective mismatch in RLHF, the LLMs of the future will be more precisely aligned to user instructions for both safety and helpfulness.
Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis
paper_authors: Abhinav Nippani, Dongyue Li, Haotian Ju, Haris N. Koutsopoulos, Hongyang R. Zhang
for: The paper analyzes traffic accidents on road networks and evaluates the accuracy of existing deep-learning methods.
methods: The paper uses graph neural networks (GraphSAGE) and multitask learning to predict the occurrence of traffic accidents on road networks.
results: The main finding is that GraphSAGE can accurately predict the number of accidents on roads (mean absolute error below 22%) and whether an accident will occur or not (AUROC above 87%).
Abstract
We consider the problem of traffic accident analysis on a road network based on road network connections and traffic volume. Previous works have designed various deep-learning methods using historical records to predict traffic accident occurrences. However, there is a lack of consensus on how accurate existing methods are, and a fundamental issue is the lack of public accident datasets for comprehensive evaluations. This paper constructs a large-scale, unified dataset of traffic accident records from official reports of various states in the US, totaling 9 million records, accompanied by road networks and traffic volume reports. Using this new dataset, we evaluate existing deep-learning methods for predicting the occurrence of accidents on road networks. Our main finding is that graph neural networks such as GraphSAGE can accurately predict the number of accidents on roads with less than 22% mean absolute error (relative to the actual count) and whether an accident will occur or not with over 87% AUROC, averaged over states. We achieve these results by using multitask learning to account for cross-state variabilities (e.g., availability of accident labels) and transfer learning to combine traffic volume with accident prediction. Ablation studies highlight the importance of road graph-structural features, amongst other features. Lastly, we discuss the implications of the analysis and develop a package for easily using our new dataset.
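As a rough illustration of the modeling setup described above, the sketch below builds a two-layer GraphSAGE encoder with a regression head for accident counts and a classification head for accident occurrence. It assumes PyTorch and PyTorch Geometric are installed; the layer widths, the two task heads, and the simple summed loss are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class AccidentGNN(torch.nn.Module):
    """Two-layer GraphSAGE encoder with two task heads (multitask learning)."""
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        self.count_head = torch.nn.Linear(hidden_dim, 1)  # accident count (regression)
        self.occur_head = torch.nn.Linear(hidden_dim, 1)  # occurrence logit (classification)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.count_head(h).squeeze(-1), self.occur_head(h).squeeze(-1)

# Toy graph: 4 road segments (nodes) with 5-dim features and a ring of connections.
x = torch.randn(4, 5)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
model = AccidentGNN(in_dim=5)
count_pred, occur_logit = model(x, edge_index)
count_target = torch.tensor([2.0, 0.0, 1.0, 3.0])
occur_target = (count_target > 0).float()
loss = F.mse_loss(count_pred, count_target) \
     + F.binary_cross_entropy_with_logits(occur_logit, occur_target)
loss.backward()
```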
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
results: After training Neuroformer on simulated datasets, it accurately predicted neuronal circuit activity and inferred the underlying neural circuit connectivity, and it predicted animal behavior with only few-shot fine-tuning. These results indicate that Neuroformer can analyze neural data and its emergent properties, offering new methods and theory for neuroscience research.
Abstract
State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis. Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data into an autoregressive spatiotemporal generation problem. Neuroformer is a multimodal, multitask generative pretrained transformer (GPT) model that is specifically designed to handle the intricacies of data in systems neuroscience. It scales linearly with feature size, can process an arbitrary number of modalities, and is adaptable to downstream tasks, such as predicting behavior. We first trained Neuroformer on simulated datasets, and found that it both accurately predicted simulated neuronal circuit activity, and also intrinsically inferred the underlying neural circuit connectivity, including direction. When pretrained to decode neural responses, the model predicted the behavior of a mouse with only few-shot fine-tuning, suggesting that the model begins learning how to do so directly from the neural representations themselves, without any explicit supervision. We used an ablation study to show that joint training on neuronal responses and behavior boosted performance, highlighting the model's ability to associate behavioral and neural representations in an unsupervised manner. These findings show that Neuroformer can analyze neural datasets and their emergent properties, informing the development of models and hypotheses associated with the brain.
Extracting the Multiscale Causal Backbone of Brain Dynamics
results: Experiments show that, compared with a baseline based on functional connectivity networks, the method performs better on synthetic data. When applied to resting-state fMRI data, sparse MCBs are found for both the left and right brain hemispheres. At low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions, whereas at higher frequencies, nodes related to sensory processing play a crucial role. Finally, the analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, supporting existing research on brain connectivity fingerprinting from a causal perspective.
Abstract
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it. Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fitting and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting from a causal perspective the existing extensive research in brain connectivity fingerprinting.
EXTRACT: Explainable Transparent Control of Bias in Embeddings
paper_authors: Zhijin Guo, Zhaozhen Xu, Martha Lewis, Nello Cristianini
for: The paper aims to address the issue of bias in knowledge graph embeddings, specifically the implicit presence of protected information, and to propose a suite of Explainable and Transparent methods to control bias.
methods: The paper uses Canonical Correlation Analysis (CCA) to investigate the presence, extent, and origins of information leaks during training, and decomposes embeddings into a sum of their private attributes by solving a linear system.
results: The paper shows that a range of personal attributes can be inferred from a user’s viewing behavior and preferences, and that information about the conference in which a paper was published can be inferred from the citation network of that article. The paper also proposes four transparent methods to maintain the capability of the embedding to make the intended predictions without retaining unwanted information.
Abstract
Knowledge Graphs are a widely used method to represent relations between entities in various AI applications, and Graph Embedding has rapidly become a standard technique to represent Knowledge Graphs in such a way as to facilitate inferences and decisions. As this representation is obtained from behavioural data, and is not in a form readable by humans, there is a concern that it might incorporate unintended information that could lead to biases. We propose EXTRACT: a suite of Explainable and Transparent methods to ConTrol bias in knowledge graph embeddings, so as to assess and decrease the implicit presence of protected information. Our method uses Canonical Correlation Analysis (CCA) to investigate the presence, extent and origins of information leaks during training, then decomposes embeddings into a sum of their private attributes by solving a linear system. Our experiments, performed on the MovieLens1M dataset, show that a range of personal attributes can be inferred from a user's viewing behaviour and preferences, including gender, age, and occupation. Further experiments, performed on the KG20C citation dataset, show that the information about the conference in which a paper was published can be inferred from the citation network of that article. We propose four transparent methods to maintain the capability of the embedding to make the intended predictions without retaining unwanted information. A trade-off between these two goals is observed.
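The sketch below illustrates the two ingredients named in the abstract: using CCA to quantify how strongly a protected attribute is encoded in learned embeddings, and solving a linear system to decompose out the attribute component. It assumes scikit-learn and NumPy; the synthetic data, the binary attribute, and the single least-squares removal step are illustrative assumptions, not the paper's four EXTRACT methods.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d = 500, 16
gender = rng.integers(0, 2, size=n).astype(float)  # protected attribute (toy, binary)
E = rng.normal(size=(n, d))
E[:, 0] += 2.0 * gender                             # inject a leak into dimension 0

def leakage(embeddings, attribute):
    """First canonical correlation between embeddings and the protected attribute."""
    e_scores, a_scores = CCA(n_components=1).fit_transform(embeddings, attribute.reshape(-1, 1))
    return abs(np.corrcoef(e_scores[:, 0], a_scores[:, 0])[0, 1])

print("leakage before:", leakage(E, gender))

# Decompose: solve the linear system E ~= A @ B (A = [1, attribute]) and
# subtract the attribute-explained component from the embeddings.
A = np.stack([np.ones(n), gender], axis=1)
B, *_ = np.linalg.lstsq(A, E, rcond=None)
E_clean = E - A @ B

print("leakage after:", leakage(E_clean, gender))  # should drop to roughly zero
```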
FairWASP: Fast and Optimal Fair Wasserstein Pre-processing
paper_authors: Zikai Xiong, Niccolò Dalmasso, Alan Mishler, Vamsi K. Potluru, Tucker Balch, Manuela Veloso
for: The study aims to reduce disparities in classification models, particularly when training data is reused across multiple downstream applications.
methods: The study proposes FairWASP, a novel pre-processing method that reduces disparities in classification datasets. FairWASP reweights the training data with sample-level weights to mitigate disparate impact.
results: The results show that FairWASP optimizes classification datasets to reduce disparities. Experiments show that its optimization algorithm outperforms commercial solvers on the resulting large-scale mixed-integer programs, and that FairWASP preserves accuracy in downstream classification tasks while reducing disparities.
Abstract
Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments on synthetic datasets demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.
results: The method reduces the number of seismic records used during training while preserving 3D reconstruction quality comparable to that obtained with the entire dataset.
Abstract
We consider the problem of 3D seismic inversion from pre-stack data using a very small number of seismic sources. The proposed solution is based on a combination of compressed-sensing and machine learning frameworks, known as compressed-learning. The solution jointly optimizes a dimensionality reduction operator and a 3D inversion encoder-decoder implemented by a deep convolutional neural network (DCNN). Dimensionality reduction is achieved by learning a sparse binary sensing layer that selects a small subset of the available sources, then the selected data is fed to a DCNN to complete the regression task. The end-to-end learning process provides a reduction by an order-of-magnitude in the number of seismic records used during training, while preserving the 3D reconstruction quality comparable to that obtained by using the entire dataset.
Seeking Truth and Beauty in Flavor Physics with Machine Learning
results: By optimizing the proposed loss functions, the researchers construct true and beautiful models of the Yukawa quark sector that fit the existing experimental data and satisfy abstract theorists' criteria.
Abstract
The discovery process of building new theoretical physics models involves the dual aspect of both fitting to the existing experimental data and satisfying abstract theorists' criteria like beauty, naturalness, etc. We design loss functions for performing both of those tasks with machine learning techniques. We use the Yukawa quark sector as a toy example to demonstrate that the optimization of these loss functions results in true and beautiful models.
Ensemble models outperform single model uncertainties and predictions for operator-learning of hypersonic flows
results: Comparing three uncertainty quantification methods, ensembling performs best at minimizing error and calibrating uncertainty, excelling in both interpolative and extrapolative regimes.
Abstract
High-fidelity computational simulations and physical experiments of hypersonic flows are resource intensive. Training scientific machine learning (SciML) models on limited high-fidelity data offers one approach to rapidly predict behaviors for situations that have not been seen before. However, high-fidelity data is itself in limited quantity to validate all outputs of the SciML model in unexplored input space. As such, an uncertainty-aware SciML model is desired. The SciML model's output uncertainties could then be used to assess the reliability and confidence of the model's predictions. In this study, we extend a DeepONet using three different uncertainty quantification mechanisms: mean-variance estimation, evidential uncertainty, and ensembling. The uncertainty aware DeepONet models are trained and evaluated on the hypersonic flow around a blunt cone object with data generated via computational fluid dynamics over a wide range of Mach numbers and altitudes. We find that ensembling outperforms the other two uncertainty models in terms of minimizing error and calibrating uncertainty in both interpolative and extrapolative regimes.
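For context, ensembling-based uncertainty quantification of the kind compared above typically reduces to training K independently initialized models and reporting the mean and spread of their predictions. The sketch below (plain NumPy, with stand-in "models" rather than actual DeepONets) illustrates only that aggregation step; the stand-in members and the target function are illustrative assumptions.

```python
import numpy as np

def ensemble_predict(models, x):
    """Aggregate K member predictions into a mean and a spread (epistemic uncertainty)."""
    preds = np.stack([m(x) for m in models], axis=0)   # shape (K, n_points)
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Stand-in ensemble members: slightly different fits of the same unknown function.
rng = np.random.default_rng(0)
members = [lambda x, a=rng.normal(1.0, 0.05), b=rng.normal(0.0, 0.05): a * np.sin(x) + b
           for _ in range(5)]

x = np.linspace(0.0, 2 * np.pi, 100)
mean, std = ensemble_predict(members, x)
print("mean abs error vs. sin(x):", np.abs(mean - np.sin(x)).mean())
print("average predictive std:  ", std.mean())
```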
Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation
methods: The study proposes Tabular data Pre-Training via Meta-representation (TabPTM), which standardizes heterogeneous tabular datasets using a fixed number of prototypes and then trains a deep neural network to associate these meta-representations with dataset-specific classification confidences.
results: Experiments show that TabPTM achieves promising performance on new tabular datasets, even under few-shot scenarios.
Abstract
Tabular data is prevalent across various machine learning domains. Yet, the inherent heterogeneities in attribute and class spaces across different tabular datasets hinder the effective sharing of knowledge, limiting a tabular model to benefit from other datasets. In this paper, we propose Tabular data Pre-Training via Meta-representation (TabPTM), which allows one tabular model pre-training on a set of heterogeneous datasets. Then, this pre-trained model can be directly applied to unseen datasets that have diverse attributes and classes without additional training. Specifically, TabPTM represents an instance through its distance to a fixed number of prototypes, thereby standardizing heterogeneous tabular datasets. A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences, endowing TabPTM with the ability of training-free generalization. Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
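A rough sketch of the meta-representation idea described above (NumPy only): each instance is mapped to its distances to a fixed number of prototypes, so tables with different numbers of raw attributes all end up with a representation of the same size. Using class-mean prototypes and Euclidean distances is an illustrative assumption; the paper's prototype construction may differ.

```python
import numpy as np

def meta_representation(X, prototypes):
    """Map each row of X (n, d) to its distances to K prototypes (K, d) -> (n, K)."""
    return np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=-1)

def class_prototypes(X, y, n_classes):
    """Illustrative choice: one prototype per class, the class mean."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

rng = np.random.default_rng(0)
# Two heterogeneous "datasets" with different attribute counts ...
X_a, y_a = rng.normal(size=(100, 7)), rng.integers(0, 3, 100)
X_b, y_b = rng.normal(size=(80, 23)), rng.integers(0, 3, 80)
# ... both reduce to a fixed-size (n, K) meta-representation with K = 3 classes.
Z_a = meta_representation(X_a, class_prototypes(X_a, y_a, 3))
Z_b = meta_representation(X_b, class_prototypes(X_b, y_b, 3))
print(Z_a.shape, Z_b.shape)   # (100, 3) (80, 3)
```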
results: The study shows that the Kolmogorov two-hidden-layer neural network model, with a continuous, discontinuous bounded, or unbounded activation function in the second hidden layer, can exactly represent continuous, discontinuous bounded, and all unbounded multivariate functions, respectively.
Abstract
In this paper, we show that the Kolmogorov two hidden layer neural network model with a continuous, discontinuous bounded or unbounded activation function in the second hidden layer can precisely represent continuous, discontinuous bounded and all unbounded multivariate functions, respectively.
Unexpected Improvements to Expected Improvement for Bayesian Optimization
paper_authors: Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy
for: The paper is written for optimizing acquisition functions in Bayesian optimization, specifically addressing the challenges of numerical optimization in existing methods.
methods: The paper proposes a new family of acquisition functions called LogEI, which includes reformulations of classic EI and its variants to address numerical pathologies and improve optimization performance.
results: The paper demonstrates the effectiveness of LogEI through empirical results, showing that members of the LogEI family substantially improve optimization performance compared to their canonical counterparts and are on par with or exceed the performance of recent state-of-the-art acquisition functions.
Abstract
Expected Improvement (EI) is arguably the most popular acquisition function in Bayesian optimization and has found countless successful applications, but its performance is often exceeded by that of more recent methods. Notably, EI and its variants, including for the parallel and multi-objective settings, are challenging to optimize because their acquisition values vanish numerically in many regions. This difficulty generally increases as the number of observations, dimensionality of the search space, or the number of constraints grow, resulting in performance that is inconsistent across the literature and most often sub-optimal. Herein, we propose LogEI, a new family of acquisition functions whose members either have identical or approximately equal optima as their canonical counterparts, but are substantially easier to optimize numerically. We demonstrate that numerical pathologies manifest themselves in "classic" analytic EI, Expected Hypervolume Improvement (EHVI), as well as their constrained, noisy, and parallel variants, and propose corresponding reformulations that remedy these pathologies. Our empirical results show that members of the LogEI family of acquisition functions substantially improve on the optimization performance of their canonical counterparts and surprisingly, are on par with or exceed the performance of recent state-of-the-art acquisition functions, highlighting the understated role of numerical optimization in the literature.
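To make the numerical issue concrete, the sketch below (NumPy/SciPy) compares naive EI with a simple log-space evaluation for a Gaussian posterior. The two-branch log computation is an illustration of the general idea only; it does not reproduce the paper's LogEI family, whose members handle the extreme tail as well as constrained, noisy, and parallel variants far more carefully.

```python
import numpy as np
from scipy.special import log_ndtr
from scipy.stats import norm

def naive_ei(mu, sigma, best):
    z = (mu - best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def log_ei(mu, sigma, best):
    """log EI = log sigma + log(z * Phi(z) + phi(z)), evaluated in log space."""
    z = (mu - best) / sigma
    log_h = np.where(
        z > 0,
        # z*Phi(z) + phi(z) is a sum of two positive terms: combine them stably.
        np.logaddexp(np.log(np.maximum(z, 1e-300)) + log_ndtr(z), norm.logpdf(z)),
        # z <= 0: factor out phi(z); 1 + z*Phi(z)/phi(z) stays in (0, 1].
        norm.logpdf(z) + np.log1p(z * np.exp(log_ndtr(z) - norm.logpdf(z))),
    )
    return np.log(sigma) + log_h

mu, sigma, best = np.array([0.0]), np.array([1.0]), np.array([40.0])
print(naive_ei(mu, sigma, best))  # underflows to 0: a numerically flat acquisition surface
print(log_ei(mu, sigma, best))    # finite log-value, still informative for optimization
```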
Farthest Greedy Path Sampling for Two-shot Recommender Search
results: Evaluations on three Click-Through Rate (CTR) prediction benchmarks show that the method consistently outperforms manually designed models and most NAS-based models.
Abstract
Weight-sharing Neural Architecture Search (WS-NAS) provides an efficient mechanism for developing end-to-end deep recommender models. However, in complex search spaces, distinguishing between superior and inferior architectures (or paths) is challenging. This challenge is compounded by the limited coverage of the supernet and the co-adaptation of subnet weights, which restricts the exploration and exploitation capabilities inherent to weight-sharing mechanisms. To address these challenges, we introduce Farthest Greedy Path Sampling (FGPS), a new path sampling strategy that balances path quality and diversity. FGPS enhances path diversity to facilitate more comprehensive supernet exploration, while emphasizing path quality to ensure the effective identification and utilization of promising architectures. By incorporating FGPS into a Two-shot NAS (TS-NAS) framework, we derive high-performance architectures. Evaluations on three Click-Through Rate (CTR) prediction benchmarks demonstrate that our approach consistently achieves superior results, outperforming both manually designed and most NAS-based models.
Bayesian Multistate Bennett Acceptance Ratio Methods
results: With a uniform prior distribution, BayesMBAR recovers the MBAR result while providing more accurate uncertainty estimates. Moreover, when prior knowledge about the free energies is available, BayesMBAR can incorporate it into the estimation procedure and provide more accurate estimates.
Abstract
The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.
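For background, the standard MBAR estimator that BayesMBAR generalizes solves a self-consistent set of equations for the free energies; the sketch below (NumPy/SciPy) shows that fixed-point iteration on toy data. The Bayesian layer (placing a prior over these free energies and computing a posterior) is not shown, and the harmonic toy states are purely illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def mbar_free_energies(u_kn, N_k, n_iter=200):
    """Self-consistent MBAR iteration.
    u_kn[k, n]: reduced potential of pooled sample n evaluated in state k.
    N_k[k]:     number of samples drawn from state k."""
    K = u_kn.shape[0]
    f = np.zeros(K)
    for _ in range(n_iter):
        # log of the mixture denominator for each pooled sample n
        log_denom = logsumexp(np.log(N_k)[:, None] + f[:, None] - u_kn, axis=0)
        f_new = -logsumexp(-u_kn - log_denom[None, :], axis=1)
        f = f_new - f_new[0]          # fix the arbitrary offset: f_0 = 0
    return f

# Toy example: two harmonic states u_k(x) = 0.5 * (x - mu_k)^2, sampled exactly.
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.5])
N_k = np.array([2000, 2000])
samples = np.concatenate([rng.normal(m, 1.0, n) for m, n in zip(mu, N_k)])
u_kn = 0.5 * (samples[None, :] - mu[:, None]) ** 2
print(mbar_free_energies(u_kn, N_k))  # equal-width harmonic wells -> roughly [0, 0]
```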
Compression with Exact Error Distribution for Federated Learning
results: The proposed compression and aggregation schemes can recover and improve standard FL schemes with Gaussian perturbations, such as Langevin dynamics and randomized smoothing.
Abstract
Compression schemes have been extensively used in Federated Learning (FL) to reduce the communication cost of distributed learning. While most approaches rely on a bounded variance assumption of the noise produced by the compressor, this paper investigates the use of compression and aggregation schemes that produce a specific error distribution, e.g., Gaussian or Laplace, on the aggregated data. We present and analyze different aggregation schemes based on layered quantizers achieving exact error distribution. We provide different methods to leverage the proposed compression schemes to obtain compression-for-free in differential privacy applications. Our general compression methods can recover and improve standard FL schemes with Gaussian perturbations such as Langevin dynamics and randomized smoothing.
Latent Field Discovery In Interacting Dynamical Systems With Neural Fields
results: Experiments show that the approach accurately discovers the underlying fields and uses them to forecast future trajectories.
Abstract
Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away from such effects, and assume that systems evolve in a vacuum. In this work, we focus on discovering these fields, and infer them from the observed dynamics alone, without directly observing them. We theorize the presence of latent force fields, and propose neural fields to learn them. Since the observed dynamics constitute the net effect of local object interactions and global field effects, recently popularized equivariant networks are inapplicable, as they fail to capture global information. To address this, we propose to disentangle local object interactions -- which are $\mathrm{SE}(n)$ equivariant and depend on relative states -- from external global field effects -- which depend on absolute states. We model interactions with equivariant graph networks, and combine them with neural fields in a novel graph network that integrates field forces. Our experiments show that we can accurately discover the underlying fields in charged particles settings, traffic scenes, and gravitational n-body problems, and effectively use them to learn the system and forecast future trajectories.
Balancing Act: Constraining Disparate Impact in Sparse Models
results: Sparse models achieve performance comparable to their dense counterparts at the level of the entire dataset but can exhibit severe accuracy drops for some data sub-groups. The proposed constrained approach directly addresses this disparate impact of pruning and scales reliably to problems involving large models and hundreds of protected sub-groups.
Abstract
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that $\textit{directly addresses the disparate impact of pruning}$: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
Density Matrix Emulation of Quantum Recurrent Neural Networks for Multivariate Time Series Prediction
paper_authors: José Daniel Viqueira, Daniel Faílde, Mariamo M. Juane, Andrés Gómez, David Mera
for: Quantum recurrent neural networks (QRNNs) for modeling and predicting future values of multivariate time series.
methods: A dedicated emulation method is designed using the density matrix formalism to reduce computational cost.
results: QRNNs can accurately predict future values, capturing non-trivial patterns of input time series with different complexities and thereby modeling nonlinear relationships.
Abstract
Quantum Recurrent Neural Networks (QRNNs) are robust candidates to model and predict future values in multivariate time series. However, the effective implementation of some QRNN models is limited by the need of mid-circuit measurements. Those increase the requirements for quantum hardware, which in the current NISQ era does not allow reliable computations. Emulation arises as the main near-term alternative to explore the potential of QRNNs, but existing quantum emulators are not dedicated to circuits with multiple intermediate measurements. In this context, we design a specific emulation method that relies on density matrix formalism. The mathematical development is explicitly provided as a compact formulation by using tensor notation. It allows us to show how the present and past information from a time series is transmitted through the circuit, and how to reduce the computational cost in every time step of the emulated network. In addition, we derive the analytical gradient and the Hessian of the network outputs with respect to its trainable parameters, with an eye on gradient-based training and noisy outputs that would appear when using real quantum processors. We finally test the presented methods using a novel hardware-efficient ansatz and three diverse datasets that include univariate and multivariate time series. Our results show how QRNNs can make accurate predictions of future values by capturing non-trivial patterns of input series with different complexities.
Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes
results: The study finds that LCPN+F consistently performs best across various datasets and scenarios while maintaining runtime performance comparable to Flat Classification (FC). It also emphasizes that selecting the right hierarchy exploitation scheme is key to maximizing classification performance.
Abstract
Hierarchical classification (HC) plays a pivotal role in multi-class classification tasks, where objects are organized into a hierarchical structure. This study explores the performance of HC through a comprehensive analysis that encompasses both hierarchy generation and hierarchy exploitation. This analysis is particularly relevant in scenarios where a predefined hierarchy structure is not readily accessible. Notably, two novel hierarchy exploitation schemes, LCPN+ and LCPN+F, which extend the capabilities of LCPN and combine the strengths of global and local classification, have been introduced and evaluated alongside existing methods. The findings reveal the consistent superiority of LCPN+F, which outperforms other schemes across various datasets and scenarios. Moreover, this research emphasizes not only effectiveness but also efficiency, as LCPN+ and LCPN+F maintain runtime performance comparable to Flat Classification (FC). Additionally, this study underscores the importance of selecting the right hierarchy exploitation scheme to maximize classification performance. This work extends our understanding of HC and establishes a benchmark for future research, fostering advancements in multi-class classification methodologies.
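As background on the hierarchy exploitation schemes being compared, a plain Local Classifier Per Parent Node (LCPN) trains one classifier at each internal node of the hierarchy and routes a test instance top-down. The sketch below (scikit-learn, with a tiny hypothetical two-level hierarchy and synthetic data) illustrates only this baseline, not the LCPN+ and LCPN+F extensions introduced in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hierarchy: root -> {animal, vehicle}; animal -> {cat, dog}; vehicle -> {car, bus}
hierarchy = {"root": ["animal", "vehicle"], "animal": ["cat", "dog"], "vehicle": ["car", "bus"]}
parent_of = {child: parent for parent, kids in hierarchy.items() for child in kids}

def ancestors(leaf):
    """Leaf label -> list of nodes on its path, leaf first, root last."""
    path = [leaf]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

def fit_lcpn(X, y_leaf):
    """One local classifier per parent node, trained on the samples routed through it."""
    models = {}
    for parent, children in hierarchy.items():
        rows, targets = [], []
        for i, leaf in enumerate(y_leaf):
            path = ancestors(leaf)
            if parent in path:
                rows.append(i)
                targets.append(next(node for node in path if node in children))
        models[parent] = LogisticRegression(max_iter=1000).fit(X[rows], targets)
    return models

def predict_lcpn(models, x):
    node = "root"
    while node in hierarchy:                 # descend until a leaf is reached
        node = models[node].predict(x.reshape(1, -1))[0]
    return node

rng = np.random.default_rng(0)
leaves = ["cat", "dog", "car", "bus"]
centers = {leaf: rng.normal(scale=3.0, size=4) for leaf in leaves}
y = rng.choice(leaves, size=400)
X = np.stack([centers[l] for l in y]) + rng.normal(size=(400, 4))
models = fit_lcpn(X, y)
print("predicted:", predict_lcpn(models, X[0]), "true:", y[0])
```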
Projecting basis functions with tensor networks for Gaussian process regression
methods: We use a linear combination of basis functions, where the accuracy of the approximation depends on the total number of basis functions M. We develop an approach that allows an exponential number of basis functions without the corresponding exponential computational complexity; the key idea is to use low-rank TNs. We first find a low-dimensional subspace from the data, infer the model weights in this subspace by solving a Bayesian inference problem, and finally project the resulting weights back to the original space to make GP predictions.
results: We run an experiment on an 18-dimensional benchmark dataset for an inverse dynamics problem. The method reduces computational complexity while maintaining the accuracy of the GP predictions.
Abstract
This paper presents a method for approximate Gaussian process (GP) regression with tensor networks (TNs). A parametric approximation of a GP uses a linear combination of basis functions, where the accuracy of the approximation depends on the total number of basis functions $M$. We develop an approach that allows us to use an exponential amount of basis functions without the corresponding exponential computational complexity. The key idea to enable this is using low-rank TNs. We first find a suitable low-dimensional subspace from the data, described by a low-rank TN. In this low-dimensional subspace, we then infer the weights of our model by solving a Bayesian inference problem. Finally, we project the resulting weights back to the original space to make GP predictions. The benefit of our approach comes from the projection to a smaller subspace: It modifies the shape of the basis functions in a way that it sees fit based on the given data, and it allows for efficient computations in the smaller subspace. In an experiment with an 18-dimensional benchmark data set, we show the applicability of our method to an inverse dynamics problem.
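For reference, the parametric starting point mentioned above, approximating a GP with M basis functions, is ordinary Bayesian linear regression on those features. The sketch below (NumPy, with random cosine features as an illustrative basis and a unit Gaussian weight prior) shows only that baseline, without the paper's tensor-network machinery for handling exponentially many basis functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, W, b):
    """M illustrative cosine basis functions evaluated at inputs x of shape (n, d)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(x @ W.T + b)

n, d, M, noise = 200, 2, 50, 0.1
W, b = rng.normal(size=(M, d)), rng.uniform(0, 2 * np.pi, M)
X = rng.uniform(-2, 2, size=(n, d))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + noise * rng.normal(size=n)

# Bayesian linear regression on the basis functions (weight prior N(0, I)):
Phi = features(X, W, b)                                # (n, M)
S_inv = Phi.T @ Phi / noise**2 + np.eye(M)             # posterior precision of the weights
mu_w = np.linalg.solve(S_inv, Phi.T @ y / noise**2)    # posterior mean weights

X_test = rng.uniform(-2, 2, size=(5, d))
Phi_test = features(X_test, W, b)
mean = Phi_test @ mu_w
var = np.sum(Phi_test * np.linalg.solve(S_inv, Phi_test.T).T, axis=1) + noise**2
print(mean, var)
```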
Graph Matching via convex relaxation to the simplex
results: Under the correlated Gaussian Wigner model, the simplex relaxation admits a unique solution with high probability, and in the noiseless case it exactly recovers the ground-truth permutation. A new sufficiency condition on the input matrix is also established, which is less restrictive than diagonal dominance and yields significantly improved conditions for the GRAMPA algorithm.
Abstract
This paper addresses the Graph Matching problem, which consists of finding the best possible alignment between two input graphs, and has many applications in computer vision, network deanonymization and protein alignment. A common approach to tackle this problem is through convex relaxations of the NP-hard \emph{Quadratic Assignment Problem} (QAP). Here, we introduce a new convex relaxation onto the unit simplex and develop an efficient mirror descent scheme with closed-form iterations for solving this problem. Under the correlated Gaussian Wigner model, we show that the simplex relaxation admits a unique solution with high probability. In the noiseless case, this is shown to imply exact recovery of the ground truth permutation. Additionally, we establish a novel sufficiency condition for the input matrix in standard greedy rounding methods, which is less restrictive than the commonly used `diagonal dominance' condition. We use this condition to show exact one-step recovery of the ground truth (holding almost surely) via the mirror descent scheme, in the noiseless setting. We also use this condition to obtain significantly improved conditions for the GRAMPA algorithm [Fan et al. 2019] in the noiseless setting.
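The optimization workhorse named above, mirror descent over the unit simplex with closed-form (entropic, exponentiated-gradient) iterations, can be illustrated on a generic smooth objective. The sketch below (NumPy) shows only that update rule; the toy quadratic objective is an assumption and does not reproduce the paper's graph-matching relaxation or its guarantees.

```python
import numpy as np

def mirror_descent_simplex(grad_fn, dim, steps=500, eta=0.1):
    """Entropic mirror descent: multiplicative updates that stay on the simplex."""
    x = np.full(dim, 1.0 / dim)
    for _ in range(steps):
        g = grad_fn(x)
        x = x * np.exp(-eta * g)
        x /= x.sum()              # closed-form "projection" via normalization
    return x

# Toy objective on the simplex: min_x 0.5 * ||x - target||^2
target = np.array([0.7, 0.2, 0.1])
x_star = mirror_descent_simplex(lambda x: x - target, dim=3)
print(x_star)  # converges to the target, which already lies on the simplex
```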
Online Conversion with Switching Costs: Robust and Learning-Augmented Algorithms
results: The paper evaluates the proposed algorithms through a carbon-aware EV charging case study and shows that they substantially improve on baseline methods.
Abstract
We introduce and study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability. In this problem, an online player attempts to purchase (alternatively, sell) fractional shares of an asset during a fixed time horizon with length $T$. At each time step, a cost function (alternatively, price function) is revealed, and the player must irrevocably decide an amount of asset to convert. The player also incurs a switching cost whenever their decision changes in consecutive time steps, i.e., when they increase or decrease their purchasing amount. We introduce competitive (robust) threshold-based algorithms for both the minimization and maximization variants of this problem, and show they are optimal among deterministic online algorithms. We then propose learning-augmented algorithms that take advantage of untrusted black-box advice (such as predictions from a machine learning model) to achieve significantly better average-case performance without sacrificing worst-case competitive guarantees. Finally, we empirically evaluate our proposed algorithms using a carbon-aware EV charging case study, showing that our algorithms substantially improve on baseline methods for this problem.
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
paper_authors: Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu
for: The paper aims to address the challenge of offline reinforcement learning (RL) in real-world scenarios where data collection is costly and risky. The authors propose a general framework called LaMo, which leverages pre-trained Language Models (LMs) to improve offline RL performance.
methods: The LaMo framework initializes Decision Transformers with sequentially pre-trained LMs and employs the LoRA fine-tuning method to combine pre-trained knowledge and in-domain knowledge effectively. It uses a non-linear MLP transformation to generate embeddings and integrates an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages.
results: The LaMo framework achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. The method demonstrates superior performance in scenarios with limited data samples.
Abstract
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate $\textbf{LaMo}$ achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io
Stochastic Gradient Descent for Gaussian Processes Done Right
paper_authors: Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz
for: The paper focuses on the optimization problem associated with Gaussian process regression using the squared loss.
results: Experiments show that the proposed stochastic dual gradient descent algorithm solves the Gaussian process regression optimization problem efficiently and outperforms alternatives such as preconditioned conjugate gradients and variational Gaussian process approximations. On a molecular binding affinity prediction task, the method places Gaussian process regression on par with state-of-the-art graph neural networks.
Abstract
We study the optimisation problem associated with Gaussian process regression using squared loss. The most common approach to this problem is to apply an exact solver, such as conjugate gradient descent, either directly, or to a reduced-order version of the problem. Recently, driven by successes in deep learning, stochastic gradient descent has gained traction as an alternative. In this paper, we show that when done right$\unicode{x2014}$by which we mean using specific insights from the optimisation and kernel communities$\unicode{x2014}$this approach is highly effective. We thus introduce a particular stochastic dual gradient descent algorithm, that may be implemented with a few lines of code using any deep learning framework. We explain our design decisions by illustrating their advantage against alternatives with ablation studies and show that the new method is highly competitive. Our evaluations on standard regression benchmarks and a Bayesian optimisation task set our approach apart from preconditioned conjugate gradients, variational Gaussian process approximations, and a previous version of stochastic gradient descent for Gaussian processes. On a molecular binding affinity prediction task, our method places Gaussian process regression on par in terms of performance with state-of-the-art graph neural networks.
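As a rough illustration of what stochastic optimization of the GP regression objective can look like, the sketch below (NumPy) runs randomized block-coordinate gradient steps on the dual (kernel-ridge) objective $\tfrac12\alpha^\top(K+\sigma^2 I)\alpha - y^\top\alpha$. The RBF kernel, the conservative step size, and the absence of momentum, iterate averaging, and tailored sampling are simplifying assumptions; this is not the paper's stochastic dual gradient descent recipe, whose design choices are precisely what make the approach fast in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 200, 0.25
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.normal(size=n)

K = np.exp(-0.5 * (X - X.T) ** 2)        # RBF kernel matrix
A = K + sigma2 * np.eye(n)               # dual objective: 0.5 a^T A a - y^T a
exact = np.linalg.solve(A, y)            # the solution an exact solver would return

alpha = np.zeros(n)
eta = 1.0 / np.linalg.eigvalsh(A).max()  # conservative step size (guarantees descent)
for _ in range(20000):
    idx = rng.choice(n, size=50, replace=False)
    alpha[idx] -= eta * (A[idx] @ alpha - y[idx])   # partial gradient on a random block

print("relative distance to exact dual solution:",
      np.linalg.norm(alpha - exact) / np.linalg.norm(exact))
```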
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks
results: The analysis reveals a direct relation between the per-layer variance of the initialization distribution and the privacy loss, and shows a complex interplay between depth and privacy loss under different initialization distributions. Excess empirical risk bounds under a fixed KL privacy budget are also proved.
Abstract
We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training. Notably, for the special setting of linearized network, our analysis indicates that the squared gradient norm (and therefore the escalation of privacy loss) is tied directly to the per-layer variance of the initialization distribution. By using this analysis, we demonstrate that privacy bound improves with increasing depth under certain initializations (LeCun and Xavier), while degrades with increasing depth under other initializations (He and NTK). Our work reveals a complex interplay between privacy and depth that depends on the chosen initialization distribution. We further prove excess empirical risk bounds under a fixed KL privacy budget, and show that the interplay between privacy utility trade-off and depth is similarly affected by the initialization.
Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization
results: We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.
Abstract
Stochastic gradient-based optimization is crucial to optimize neural networks. While popular approaches heuristically adapt the step size and direction by rescaling gradients, a more principled approach to improve optimizers requires second-order information. Such methods precondition the gradient using the objective's Hessian. Yet, computing the Hessian is usually expensive and effectively using second-order information in the stochastic gradient setting is non-trivial. We propose using Information-Theoretic Trust Region Optimization (arTuRO) for improved updates with uncertain second-order information. By modeling the network parameters as a Gaussian distribution and using a Kullback-Leibler divergence-based trust region, our approach takes bounded steps accounting for the objective's curvature and uncertainty in the parameters. Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process. We approximate the diagonal elements of the Hessian from stochastic gradients using a simple recursive least squares approach, constructing a model of the expected Hessian over time using only first-order information. We show that arTuRO combines the fast convergence of adaptive moment-based optimization with the generalization capabilities of SGD.
One-shot backpropagation for multi-step prediction in physics-based system identification
results: As a case study, the proposed approach is tested on estimating the inertia matrix of a space debris object from state observations, achieving high accuracy.
Abstract
The aim of this paper is to present a novel general framework for the identification of possibly interconnected systems, while preserving their physical properties and providing accuracy in multi-step prediction. An analytical and recursive algorithm for the gradient computation of the multi-step loss function based on backpropagation is introduced, providing physical and structural insight directly into the learning algorithm. As a case study, the proposed approach is tested for estimating the inertia matrix of a space debris starting from state observations.
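A minimal autograd counterpart of the idea described above is sketched below (PyTorch; the simple linear state-space model, horizon, and optimizer are illustrative assumptions). The model is rolled out over the full multi-step horizon and a single backward pass propagates the multi-step loss through every step, which is what the paper's analytical recursive gradient computes explicitly.

```python
import torch

# Illustrative parametric dynamics x_{k+1} = A x_k, with A the parameter to identify.
true_A = torch.tensor([[0.9, 0.1], [-0.1, 0.9]])
x0 = torch.tensor([1.0, 0.0])
horizon = 20
with torch.no_grad():
    traj = [x0]
    for _ in range(horizon):
        traj.append(true_A @ traj[-1])
    observations = torch.stack(traj[1:])

A_hat = torch.nn.Parameter(torch.eye(2))
opt = torch.optim.Adam([A_hat], lr=0.05)
for epoch in range(1000):
    x, preds = x0, []
    for _ in range(horizon):          # multi-step rollout of the current model
        x = A_hat @ x
        preds.append(x)
    loss = torch.mean((torch.stack(preds) - observations) ** 2)
    opt.zero_grad()
    loss.backward()                   # one backward pass through all rollout steps
    opt.step()
print(A_hat.detach())                 # should end up close to true_A
```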
Privacy-preserving design of graph neural networks with applications to vertical federated learning
results: Experimental results show that VESPER can train high-performance GNN models on both public and industry datasets while providing privacy guarantees under reasonable privacy budgets.
Abstract
The paradigm of vertical federated learning (VFL), where institutions collaboratively train machine learning models via combining each other's local feature or label information, has achieved great success in applications to financial risk management (FRM). The surging developments of graph representation learning (GRL) have opened up new opportunities for FRM applications under FL via efficiently utilizing the graph-structured data generated from underlying transaction networks. Meanwhile, transaction information is often considered highly sensitive. To prevent data leakage during training, it is critical to develop FL protocols with formal privacy guarantees. In this paper, we present an end-to-end GRL framework in the VFL setting called VESPER, which is built upon a general privatization scheme termed perturbed message passing (PMP) that allows the privatization of many popular graph neural architectures.Based on PMP, we discuss the strengths and weaknesses of specific design choices of concrete graph neural architectures and provide solutions and improvements for both dense and sparse graphs. Extensive empirical evaluations over both public datasets and an industry dataset demonstrate that VESPER is capable of training high-performance GNN models over both sparse and dense graphs under reasonable privacy budgets.
Multi-task learning of convex combinations of forecasting models
results: Experiments on a large set of series from the M4 competition dataset show that the proposed approach enhances point forecast accuracy compared with previous methods.
Abstract
Forecast combination involves using multiple forecasts to create a single, more accurate prediction. Recently, feature-based forecasting has been employed to either select the most appropriate forecasting models or to learn the weights of their convex combination. In this paper, we present a multi-task learning methodology that simultaneously addresses both problems. This approach is implemented through a deep neural network with two branches: the regression branch, which learns the weights of various forecasting methods by minimizing the error of combined forecasts, and the classification branch, which selects forecasting methods with an emphasis on their diversity. To generate training labels for the classification task, we introduce an optimization-driven approach that identifies the most appropriate methods for a given time series. The proposed approach elicits the essential role of diversity in feature-based forecasting and highlights the interplay between model combination and model selection when learning forecasting ensembles. Experimental results on a large set of series from the M4 competition dataset show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.
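The two-branch structure can be sketched as follows; this is an illustrative PyTorch skeleton under assumed input shapes and an assumed label format for the classification branch, not the paper's architecture.

```python
# Minimal sketch: shared encoder over series features, a regression branch that
# outputs convex combination weights, and a classification branch that scores methods.
import torch
import torch.nn as nn

class TwoBranchCombiner(nn.Module):
    def __init__(self, n_features, n_methods, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.reg_head = nn.Linear(hidden, n_methods)   # combination weights
        self.cls_head = nn.Linear(hidden, n_methods)   # method-selection logits

    def forward(self, feats, base_forecasts):
        z = self.encoder(feats)
        weights = torch.softmax(self.reg_head(z), dim=-1)      # convex weights
        combined = (weights * base_forecasts).sum(dim=-1)      # weighted forecast
        return combined, self.cls_head(z)

model = TwoBranchCombiner(n_features=10, n_methods=4)
feats = torch.randn(8, 10)                     # per-series features
base = torch.randn(8, 4)                       # point forecasts of 4 base methods
y = torch.randn(8)                             # ground truth
labels = torch.randint(0, 2, (8, 4)).float()   # optimization-driven labels (assumed format)
combined, logits = model(feats, base)
loss = nn.functional.mse_loss(combined, y) + \
       nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```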
Group-Feature (Sensor) Selection With Controlled Redundancy Using Neural Networks
results: Experiments on several benchmark datasets demonstrate promising performance of the proposed method for both feature selection and group-feature selection, compared with some state-of-the-art methods.
Abstract
In this paper, we present a novel embedded feature selection method based on a Multi-layer Perceptron (MLP) network and generalize it for group-feature or sensor selection problems, which can control the level of redundancy among the selected features or groups. Additionally, we have generalized the group lasso penalty for feature selection to encompass a mechanism for selecting valuable group features while simultaneously maintaining a control over redundancy. We establish the monotonicity and convergence of the proposed algorithm, with a smoothed version of the penalty terms, under suitable assumptions. Experimental results on several benchmark datasets demonstrate the promising performance of the proposed methodology for both feature selection and group feature selection over some state-of-the-art methods.
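A minimal sketch of a group-lasso-style penalty with an added redundancy term is given below; the exact form of the redundancy control here is an assumption for illustration, not the paper's penalty.

```python
# Minimal sketch: group lasso over the first-layer weights of an MLP plus a
# term that discourages jointly selecting highly correlated groups.
import numpy as np

def group_lasso_with_redundancy(W, groups, corr, alpha=1.0, beta=0.5):
    """W: (d_in, d_hidden) first-layer weights; groups: list of index arrays;
    corr: (G, G) absolute correlation between groups."""
    norms = np.array([np.linalg.norm(W[g, :]) for g in groups])   # group norms
    sparsity = alpha * norms.sum()                                # group lasso
    # Penalize pairs of correlated groups that are both kept active.
    redundancy = beta * np.sum(np.triu(corr, k=1) * np.outer(norms, norms))
    return sparsity + redundancy

d_in, d_hidden = 6, 4
W = np.random.randn(d_in, d_hidden)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
corr = np.abs(np.corrcoef(np.random.randn(3, 50)))   # placeholder group correlations
print(group_lasso_with_redundancy(W, groups, corr))
```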
results: The study shows that the extended Demographic Parity criterion and the proposed parametric method improve the fairness of machine learning models while allowing expert knowledge to be incorporated, avoiding the limitations of traditional fairness metrics.
Abstract
Algorithmic fairness has gained prominence due to societal and regulatory concerns about biases in Machine Learning models. Common group fairness metrics like Equalized Odds for classification or Demographic Parity for both classification and regression are widely used and a host of computationally advantageous post-processing methods have been developed around them. However, these metrics often limit users from incorporating domain knowledge. Despite meeting traditional fairness criteria, they can obscure issues related to intersectional fairness and even replicate unwanted intra-group biases in the resulting fair solution. To avoid this narrow perspective, we extend the concept of Demographic Parity to incorporate distributional properties in the predictions, allowing expert knowledge to be used in the fair solution. We illustrate the use of this new metric through a practical example of wages, and develop a parametric method that efficiently addresses practical challenges like limited training data and constraints on total spending, offering a robust solution for real-life applications.
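The contrast between classical Demographic Parity and a distribution-aware variant can be illustrated with a small example; the Wasserstein-based formalization below is one possible reading of "incorporating distributional properties" and is not taken from the paper.

```python
# Minimal sketch: mean-level demographic parity gap versus a distribution-level
# gap measured with the 1-Wasserstein distance between group prediction distributions.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
pred_a = rng.normal(30000, 5000, size=1000)     # predicted wages, group A (toy data)
pred_b = rng.normal(32000, 9000, size=1000)     # predicted wages, group B (toy data)

dp_gap = abs(pred_a.mean() - pred_b.mean())          # compares means only
w1_gap = wasserstein_distance(pred_a, pred_b)        # compares whole distributions
print(f"mean gap: {dp_gap:.1f}, Wasserstein gap: {w1_gap:.1f}")
```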
Generative Learning of Continuous Data by Tensor Networks
paper_authors: Alex Meiburg, Jing Chen, Jacob Miller, Raphaëlle Tihon, Guillaume Rabusseau, Alejandro Perdomo-Ortiz
for: solves machine learning problems, especially unsupervised generative learning
methods: uses tensor network generative models for continuous data, with a new family of models based on matrix product states
results: can approximate any reasonably smooth probability density function with arbitrary precision, and performs well on synthetic and real-world datasets with both continuous and discrete variables.
Abstract
Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting their utility in real-world modeling problems. We overcome this by introducing a new family of tensor network generative models for continuous data, which are capable of learning from distributions containing continuous random variables. We develop our method in the setting of matrix product states, first deriving a universal expressivity theorem proving the ability of this model family to approximate any reasonably smooth probability density function with arbitrary precision. We then benchmark the performance of this model on several synthetic and real-world datasets, finding that the model learns and generalizes well on distributions of continuous and discrete variables. We develop methods for modeling different data domains, and introduce a trainable compression layer which is found to increase model performance given limited memory or computational resources. Overall, our methods give important theoretical and empirical evidence of the efficacy of quantum-inspired methods for the rapidly growing field of generative learning.
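One way such a model can be set up is sketched below: each continuous variable is embedded with a small feature map and the squared contraction of a matrix product state gives an unnormalized density. The feature map and core shapes are assumptions for illustration, not the paper's construction.

```python
# Minimal sketch: an MPS over continuous variables with a small Fourier feature
# map; the squared contraction gives an unnormalized density value.
import numpy as np

def feature_map(x, p=4):
    """Map a scalar to p features: [1, cos(x), cos(2x), sin(x), ...][:p]."""
    k = np.arange(1, (p + 1) // 2 + 1)
    return np.concatenate([[1.0], np.cos(k * x), np.sin(k * x)])[:p]

def mps_amplitude(cores, x):
    """cores[i]: (D_left, p, D_right); boundary bond dimensions are 1."""
    v = np.ones((1,))
    for core, xi in zip(cores, x):
        phi = feature_map(xi, core.shape[1])
        v = v @ np.einsum('lpr,p->lr', core, phi)   # contract the physical index
    return float(v.squeeze())

rng = np.random.default_rng(0)
D, p, n_vars = 3, 4, 5
cores = [rng.normal(size=(1 if i == 0 else D, p, 1 if i == n_vars - 1 else D))
         for i in range(n_vars)]
x = rng.normal(size=n_vars)
print(mps_amplitude(cores, x) ** 2)   # unnormalized density at x
```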
BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
results: Experiments show that BasisFormer outperforms previous state-of-the-art methods by 11.04% and 15.78% for univariate and multivariate forecasting tasks, respectively.
Abstract
Bases have become an integral part of modern deep learning-based models for time series forecasting due to their ability to act as feature extractors or future references. To be effective, a basis must be tailored to the specific set of time series data and exhibit distinct correlation with each time series within the set. However, current state-of-the-art methods are limited in their ability to satisfy both of these requirements simultaneously. To address this challenge, we propose BasisFormer, an end-to-end time series forecasting architecture that leverages learnable and interpretable bases. This architecture comprises three components: First, we acquire bases through adaptive self-supervised learning, which treats the historical and future sections of the time series as two distinct views and employs contrastive learning. Next, we design a Coef module that calculates the similarity coefficients between the time series and bases in the historical view via bidirectional cross-attention. Finally, we present a Forecast module that selects and consolidates the bases in the future view based on the similarity coefficients, resulting in accurate future predictions. Through extensive experiments on six datasets, we demonstrate that BasisFormer outperforms previous state-of-the-art methods by 11.04\% and 15.78\% respectively for univariate and multivariate forecasting tasks. Code is available at: \url{https://github.com/nzl5116190/Basisformer}
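A minimal sketch of the similarity-coefficient idea is shown below, using a single cross-attention-style score matrix normalized in both directions; shapes and names are assumed for illustration and do not reproduce the released code at the repository above.

```python
# Minimal sketch: similarity coefficients between series embeddings and learnable
# bases, normalized series-to-basis and basis-to-series, then used to consolidate
# future-view bases into a forecast.
import torch
import torch.nn.functional as F

def coef_module(series_emb, basis_emb):
    """series_emb: (B, d); basis_emb: (N, d) learnable bases."""
    scores = series_emb @ basis_emb.T / basis_emb.shape[-1] ** 0.5   # (B, N)
    series_to_basis = F.softmax(scores, dim=-1)   # how much each basis explains a series
    basis_to_series = F.softmax(scores, dim=0)    # how much each series supports a basis
    return series_to_basis, basis_to_series

B, N, d = 8, 6, 16
series_emb = torch.randn(B, d)                   # embedding of the historical window
basis_emb = torch.nn.Parameter(torch.randn(N, d))
c_sb, c_bs = coef_module(series_emb, basis_emb)
forecast_basis = torch.randn(N, 24)              # future-view bases (horizon 24)
prediction = c_sb @ forecast_basis               # (B, 24) consolidated forecast
print(prediction.shape)
```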
Requirement falsification for cyber-physical systems using generative models
results: State-of-the-art CPS falsification efficiency and effectiveness.
Abstract
We present the OGAN algorithm for automatic requirement falsification of cyber-physical systems. System inputs and output are represented as piecewise constant signals over time while requirements are expressed in signal temporal logic. OGAN can find inputs that are counterexamples for the safety of a system revealing design, software, or hardware defects before the system is taken into operation. The OGAN algorithm works by training a generative machine learning model to produce such counterexamples. It executes tests atomically and does not require any previous model of the system under test. We evaluate OGAN using the ARCH-COMP benchmark problems, and the experimental results show that generative models are a viable method for requirement falsification. OGAN can be applied to new systems with little effort, has few requirements for the system under test, and exhibits state-of-the-art CPS falsification efficiency and effectiveness.
Log-based Anomaly Detection of Enterprise Software: An Empirical Study
results: The study finds that while all models can detect anomalies, certain models are better suited to smaller, loosely structured datasets. Model effectiveness changes when a common data leak associated with a random train-test split in prior work is removed. A qualitative study of the defect characteristics identified by developers further reveals the strengths and weaknesses of the models in detecting different types of anomalies. Finally, the effect of gradually increasing the training set size on model effectiveness is examined.
Abstract
Most enterprise applications use logging as a mechanism to diagnose anomalies, which could help with reducing system downtime. Anomaly detection using software execution logs has been explored in several prior studies, using both classical and deep neural network-based machine learning models. In recent years, the research has largely focused in using variations of sequence-based deep neural networks (e.g., Long-Short Term Memory and Transformer-based models) for log-based anomaly detection on open-source data. However, they have not been applied in industrial datasets, as often. In addition, the studied open-source datasets are typically very large in size with logging statements that do not change much over time, which may not be the case with a dataset from an industrial service that is relatively new. In this paper, we evaluate several state-of-the-art anomaly detection models on an industrial dataset from our research partner, which is much smaller and loosely structured than most large scale open-source benchmark datasets. Results show that while all models are capable of detecting anomalies, certain models are better suited for less-structured datasets. We also see that model effectiveness changes when a common data leak associated with a random train-test split in some prior work is removed. A qualitative study of the defects' characteristics identified by the developers on the industrial dataset further shows strengths and weaknesses of the models in detecting different types of anomalies. Finally, we explore the effect of limited training data by gradually increasing the training set size, to evaluate if the model effectiveness does depend on the training set size.
Exploring Practitioner Perspectives On Training Data Attribution Explanations
results: The study finds that training data quality is often the most important factor for high model performance in practice, and that model developers mainly rely on their own experience to curate data. TDA explanations are not well known and are therefore not used. The authors urge the community to focus on the utility of TDA techniques from a human-machine collaboration perspective and to broaden TDA evaluation to reflect common use cases in practice.
Abstract
Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for high model performance in practice and model developers mainly rely on their own experience to curate data. End-users expect explanations to enhance their interaction with the model and do not necessarily prioritise but are open to training data as a means of explanation. Within our participants, we found that TDA explanations are not well-known and therefore not used. We urge the community to focus on the utility of TDA techniques from the human-machine collaboration perspective and broaden the TDA evaluation to reflect common use cases in practice.
Amoeba: Circumventing ML-supported Network Censorship via Adversarial Reinforcement Learning
results: Experiments show that Amoeba achieves a high evasion success rate (94% on average) against a range of ML algorithms, and the adversarial flows are robust across different network environments and transferable across ML models.
Abstract
Embedding covert streams into a cover channel is a common approach to circumventing Internet censorship, due to censors' inability to examine encrypted information in otherwise permitted protocols (Skype, HTTPS, etc.). However, recent advances in machine learning (ML) enable detecting a range of anti-censorship systems by learning distinct statistical patterns hidden in traffic flows. Therefore, designing obfuscation solutions able to generate traffic that is statistically similar to innocuous network activity, in order to deceive ML-based classifiers at line speed, is difficult. In this paper, we formulate a practical adversarial attack strategy against flow classifiers as a method for circumventing censorship. Specifically, we cast the problem of finding adversarial flows that will be misclassified as a sequence generation task, which we solve with Amoeba, a novel reinforcement learning algorithm that we design. Amoeba works by interacting with censoring classifiers without any knowledge of their model structure, but by crafting packets and observing the classifiers' decisions, in order to guide the sequence generation process. Our experiments using data collected from two popular anti-censorship systems demonstrate that Amoeba can effectively shape adversarial flows that have on average 94% attack success rate against a range of ML algorithms. In addition, we show that these adversarial flows are robust in different network environments and possess transferability across various ML models, meaning that once trained against one, our agent can subvert other censoring classifiers without retraining.
paper_authors: Tom Coates, Alexander M. Kasprzyk, Sara Veneziale
for: This work uses machine learning to help classify 8-dimensional positively curved algebraic varieties.
methods: A neural network classifier predicts whether an 8-dimensional positively curved algebraic variety with toric symmetry and Picard rank 2 is a Q-Fano variety.
results: The classifier predicts with high accuracy whether such a variety is Q-Fano, yielding a first sketch of the landscape of Q-Fano varieties in dimension 8.
Abstract
Algebraic varieties are the geometric shapes defined by systems of polynomial equations; they are ubiquitous across mathematics and science. Amongst these algebraic varieties are Q-Fano varieties: positively curved shapes which have Q-factorial terminal singularities. Q-Fano varieties are of fundamental importance in geometry as they are "atomic pieces" of more complex shapes - the process of breaking a shape into simpler pieces in this sense is called the Minimal Model Programme. Despite their importance, the classification of Q-Fano varieties remains unknown. In this paper we demonstrate that machine learning can be used to understand this classification. We focus on 8-dimensional positively-curved algebraic varieties that have toric symmetry and Picard rank 2, and develop a neural network classifier that predicts with 95% accuracy whether or not such an algebraic variety is Q-Fano. We use this to give a first sketch of the landscape of Q-Fanos in dimension 8. How the neural network is able to detect Q-Fano varieties with such accuracy remains mysterious, and hints at some deep mathematical theory waiting to be uncovered. Furthermore, when visualised using the quantum period, an invariant that has played an important role in recent theoretical developments, we observe that the classification as revealed by ML appears to fall within a bounded region, and is stratified by the Fano index. This suggests that it may be possible to state and prove conjectures on completeness in the future. Inspired by the ML analysis, we formulate and prove a new global combinatorial criterion for a positively curved toric variety of Picard rank 2 to have terminal singularities. Together with the first sketch of the landscape of Q-Fanos in higher dimensions, this gives new evidence that machine learning can be an essential tool in developing mathematical conjectures and accelerating theoretical discovery.
FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments
results: On the CIFAR-100 dataset, a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. Extended to the federated learning setting, FlexTrain outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100.
Abstract
As deep learning models become increasingly large, they pose significant challenges in heterogeneous devices environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models, while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100 datasets.
The Phase Transition Phenomenon of Shuffled Regression
for: This paper investigates the phase transition phenomenon in the shuffled (permuted) regression problem, which has many applications in databases, privacy, data analysis, etc.
methods: The paper uses message passing (MP) techniques to precisely identify the phase transition points. The authors first transform the permutation recovery problem into a probabilistic graphical model, and then use analytical tools rooted in the MP algorithm to derive an equation that tracks its convergence.
results: The impact of the signal-to-noise ratio ($\snr$) on permutation recovery is characterized by linking this equation to the branching random walk process. In the oracle case, the method can fairly accurately predict the phase-transition $\snr$; in the non-oracle case, the algorithm predicts the maximum allowed number of permuted rows and uncovers its dependency on the sample number.
Abstract
We study the phase transition phenomenon inherent in the shuffled (permuted) regression problem, which has found numerous applications in databases, privacy, data analysis, etc. In this study, we aim to precisely identify the locations of the phase transition points by leveraging techniques from message passing (MP). In our analysis, we first transform the permutation recovery problem into a probabilistic graphical model. We then leverage the analytical tools rooted in the message passing (MP) algorithm and derive an equation to track the convergence of the MP algorithm. By linking this equation to the branching random walk process, we are able to characterize the impact of the signal-to-noise-ratio ($\snr$) on the permutation recovery. Depending on whether the signal is given or not, we separately investigate the oracle case and the non-oracle case. The bottleneck in identifying the phase transition regimes lies in deriving closed-form formulas for the corresponding critical points, but only in rare scenarios can one obtain such precise expressions. To tackle this technical challenge, this study proposes the Gaussian approximation method, which allows us to obtain the closed-form formulas in almost all scenarios. In the oracle case, our method can fairly accurately predict the phase transition $\snr$. In the non-oracle case, our algorithm can predict the maximum allowed number of permuted rows and uncover its dependency on the sample number.
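The phase-transition behaviour can be observed empirically with a small simulation of the oracle case, where the regression coefficients are known and the permutation is recovered by optimal matching; this toy experiment only illustrates the phenomenon the paper analyzes via message passing.

```python
# Minimal sketch: fraction of correctly recovered rows in shuffled regression
# y = (X beta)[perm] + noise as a function of SNR, oracle case (beta known).
import numpy as np
from scipy.optimize import linear_sum_assignment

def recovery_rate(n=50, d=5, snr=10.0, trials=50, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0.0
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        beta = rng.normal(size=d)
        perm = rng.permutation(n)
        sigma = np.linalg.norm(X @ beta) / np.sqrt(n * snr)
        y = (X @ beta)[perm] + sigma * rng.normal(size=n)
        # Oracle case: beta known; recover the permutation by optimal matching.
        cost = (y[:, None] - (X @ beta)[None, :]) ** 2
        _, est = linear_sum_assignment(cost)
        hits += np.mean(est == perm)
    return hits / trials

for snr in [0.5, 2, 8, 32, 128]:
    print(snr, round(recovery_rate(snr=snr), 3))   # sharp rise marks the transition
```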
Discussing the Spectra of Physics-Enhanced Machine Learning via a Survey on Structural Mechanics Applications
results: Through a survey of recent applications and a set of case studies, the paper demonstrates the use and advantages of physics-enhanced machine learning methods on complex problems. The code of the working examples is provided alongside the paper so that readers can follow and experiment.
Abstract
The intersection of physics and machine learning has given rise to a paradigm that we refer to here as physics-enhanced machine learning (PEML), aiming to improve the capabilities and reduce the individual shortcomings of data- or physics-only methods. In this paper, the spectrum of physics-enhanced machine learning methods, expressed across the defining axes of physics and data, is discussed by engaging in a comprehensive exploration of its characteristics, usage, and motivations. In doing so, this paper offers a survey of recent applications and developments of PEML techniques, revealing the potency of PEML in addressing complex challenges. We further demonstrate application of select such schemes on the simple working example of a single-degree-of-freedom Duffing oscillator, which allows to highlight the individual characteristics and motivations of different `genres' of PEML approaches. To promote collaboration and transparency, and to provide practical examples for the reader, the code of these working examples is provided alongside this paper. As a foundational contribution, this paper underscores the significance of PEML in pushing the boundaries of scientific and engineering research, underpinned by the synergy of physical insights and machine learning capabilities.
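Since the survey's working example is a single-degree-of-freedom Duffing oscillator, a minimal numerical simulation of that system (the "physics" ingredient any PEML scheme would combine with data) looks roughly as follows; parameter values are illustrative, not taken from the paper.

```python
# Minimal sketch: single-degree-of-freedom Duffing oscillator
#   m x'' + c x' + k x + k3 x^3 = F cos(w t), integrated numerically.
import numpy as np
from scipy.integrate import solve_ivp

m, c, k, k3 = 1.0, 0.1, 1.0, 0.5        # mass, damping, linear/cubic stiffness (illustrative)
F, w = 0.3, 1.2                          # forcing amplitude and frequency (illustrative)

def duffing(t, state):
    x, v = state
    return [v, (F * np.cos(w * t) - c * v - k * x - k3 * x ** 3) / m]

sol = solve_ivp(duffing, (0.0, 100.0), [0.1, 0.0], max_step=0.01)
print(sol.y.shape)   # (2, n_steps): displacement and velocity time histories
```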
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
paper_authors: Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng Zhao
for: The paper aims to improve processing-in-memory (PIM) performance, in particular by reducing the cost of data movement.
methods: It proposes DDC-PIM, an efficient algorithm/architecture co-design that doubles the equivalent data capacity of SRAM-based PIM. At the algorithmic level, a filter-wise complementary correlation (FCC) algorithm obtains bitwise complementary pairs; at the architecture level, the intrinsic cross-coupled structure of the 6T SRAM cell stores each bitwise complementary pair in its complementary states ($Q/\bar{Q}$), maximizing the data capacity of each cell.
results: Evaluations show that DDC-PIM yields about 2.84x speedup on MobileNetV2 and 2.69x on EfficientNet-B0 with negligible accuracy loss compared with a PIM baseline, and achieves up to 8.41x higher weight density and 2.75x higher area efficiency than state-of-the-art SRAM-based PIM macros.
Abstract
Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states ($Q/\overline{Q}$), thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
Coalitional Manipulations and Immunity of the Shapley Value
results: Replacing additivity by these requirements yields a new foundation of the Shapley value: it is the unique efficient and symmetric allocation rule that awards nothing to a null player and is immune to coalitional manipulations. For efficient allocation rules, reallocation-proofness is shown to be equivalent to constrained marginality, a weaker variant of Young's marginality axiom, which improves upon Young's characterization by weakening the independence requirement intrinsic to marginality.
Abstract
We consider manipulations in the context of coalitional games, where a coalition aims to increase the total payoff of its members. An allocation rule is immune to coalitional manipulation if no coalition can benefit from internal reallocation of worth on the level of its subcoalitions (reallocation-proofness), and if no coalition benefits from a lower worth while all else remains the same (weak coalitional monotonicity). Replacing additivity in Shapley's original characterization by these requirements yields a new foundation of the Shapley value, i.e., it is the unique efficient and symmetric allocation rule that awards nothing to a null player and is immune to coalitional manipulations. We further find that for efficient allocation rules, reallocation-proofness is equivalent to constrained marginality, a weaker variant of Young's marginality axiom. Our second characterization improves upon Young's characterization by weakening the independence requirement intrinsic to marginality.
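For reference, the Shapley value itself can be computed directly from its permutation definition on a small game, as in the sketch below; the characteristic function is an arbitrary example, not one from the paper.

```python
# Minimal sketch: Shapley value of a 3-player coalitional game via the
# permutation (average marginal contribution) definition.
import itertools, math

def shapley_values(players, v):
    """v: dict mapping frozenset coalitions to their worth; must contain frozenset()."""
    phi = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += v[coalition | {p}] - v[coalition]   # marginal contribution
            coalition = coalition | {p}
    n_fact = math.factorial(len(players))
    return {p: phi[p] / n_fact for p in players}

players = ["a", "b", "c"]
v = {frozenset(): 0, frozenset("a"): 1, frozenset("b"): 1, frozenset("c"): 2,
     frozenset("ab"): 3, frozenset("ac"): 4, frozenset("bc"): 4, frozenset("abc"): 6}
print(shapley_values(players, v))   # efficient: the values sum to v(grand coalition) = 6
```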
A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks
paper_authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai
for: The paper studies the use of artificial neural networks (ANNs) to accelerate the numerical integration of planetary systems.
methods: Hamiltonian Neural Networks and Deep Neural Networks are used to replace computationally expensive parts of the numerical simulation.
results: A hybrid integrator increases the reliability of the method and prevents large energy errors; when the number of asteroids exceeds 70, using neural networks yields faster simulations.
Abstract
Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the last few years, although few attempts have been made to use them to speed up the simulation of the motion of celestial bodies. We study the advantages and limitations of using Hamiltonian Neural Networks to replace computationally expensive parts of the numerical simulation. We compare the results of the numerical integration of a planetary system with asteroids with those obtained by a Hamiltonian Neural Network and a conventional Deep Neural Network, with special attention to understanding the challenges of this problem. Due to the non-linear nature of the gravitational equations of motion, errors in the integration propagate. To increase the robustness of a method that uses neural networks, we propose a hybrid integrator that evaluates the prediction of the network and replaces it with the numerical solution if considered inaccurate. Hamiltonian Neural Networks can make predictions that resemble the behavior of symplectic integrators but are challenging to train and in our case fail when the inputs differ ~7 orders of magnitude. In contrast, Deep Neural Networks are easy to train but fail to conserve energy, leading to fast divergence from the reference solution. The hybrid integrator designed to include the neural networks increases the reliability of the method and prevents large energy errors without increasing the computing cost significantly. For this problem, the use of neural networks results in faster simulations when the number of asteroids is >70.
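The hybrid-integrator logic can be sketched as an accept/fall-back rule based on energy conservation; the tolerance, surrogate, and toy system below are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: accept the network prediction only if the relative energy
# error stays below a tolerance, otherwise fall back to the numerical step.
import numpy as np

def hybrid_step(state, nn_predict, numerical_step, energy, tol=1e-3):
    e0 = energy(state)
    candidate = nn_predict(state)
    if abs(energy(candidate) - e0) / abs(e0) < tol:
        return candidate                  # NN prediction is consistent enough
    return numerical_step(state)          # fall back to the expensive solver

# Toy harmonic "system" standing in for the N-body problem.
energy = lambda s: 0.5 * (s[0] ** 2 + s[1] ** 2)
numerical_step = lambda s, dt=0.01: np.array([s[0] + dt * s[1], s[1] - dt * s[0]])
nn_predict = lambda s: numerical_step(s) + 1e-3 * np.random.randn(2)  # imperfect surrogate

state = np.array([1.0, 0.0])
for _ in range(100):
    state = hybrid_step(state, nn_predict, numerical_step, energy)
print(state, energy(state))   # energy stays close to its initial value
```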
Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods
results: Comparative experiments with PPO in the Atari 2600 environment show that D-PPO achieves significant performance improvements and effectively limits the growth of the surrogate objective variance caused by importance sampling.
Abstract
Policy-based reinforcement learning algorithms are widely used in various fields. Among them, mainstream policy optimization algorithms such as TRPO and PPO introduce importance sampling into policy iteration, which allows the reuse of historical data. However, this can also lead to high variance of the surrogate objective and indirectly affects the stability and convergence of the algorithm. In this paper, we first derived an upper bound of the surrogate objective variance, which can grow quadratically with the increase of the surrogate objective. Next, we proposed a dropout technique to avoid the excessive increase of the surrogate objective variance caused by importance sampling. Then, we introduced a general reinforcement learning framework applicable to mainstream policy optimization methods, and applied the dropout technique to the PPO algorithm to obtain the D-PPO variant. Finally, we conduct comparative experiments between D-PPO and PPO algorithms in the Atari 2600 environment, results show that D-PPO achieved significant performance improvements compared to PPO, and effectively limited the excessive increase of the surrogate objective variance during training.
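One plausible instantiation of a dropout rule on the importance-sampled surrogate is sketched below; the specific criterion for dropping samples is an assumption and may differ from the paper's D-PPO.

```python
# Minimal sketch (assumed mechanism): a PPO-style clipped surrogate in which
# samples with extreme importance ratios are randomly dropped to limit the
# variance contributed by importance sampling.
import torch

def dropout_ppo_surrogate(logp_new, logp_old, adv, clip=0.2, ratio_band=2.0, p_drop=0.8):
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip)
    surrogate = torch.min(ratio * adv, clipped * adv)
    extreme = (ratio > ratio_band) | (ratio < 1.0 / ratio_band)
    drop = extreme & (torch.rand_like(ratio) < p_drop)     # drop most extreme samples
    keep = (~drop).float()
    return -(surrogate * keep).sum() / keep.sum().clamp(min=1.0)

logp_new = torch.randn(256, requires_grad=True)
logp_old = logp_new.detach() + 0.5 * torch.randn(256)
adv = torch.randn(256)
loss = dropout_ppo_surrogate(logp_new, logp_old, adv)
loss.backward()
```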
Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
paper_authors: Miaoxi Zhu, Li Shen, Bo Du, Dacheng Tao
for: investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm and analyze the impact of different topologies on the generalization bound.
methods: use the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings to refine the algorithmic stability of D-SGDA and demonstrate that the decentralized structure does not destroy the stability and generalization of D-SGDA.
results: obtain the optimal population risk of D-SGDA in the convex-concave setting by balancing the optimization error with the generalization gap, and validate the theoretical findings through several numerical experiments.
Abstract
The growing size of available data has attracted increasing interest in solving minimax problems in a decentralized manner for various machine learning tasks. Previous theoretical research has primarily focused on the convergence rate and communication complexity of decentralized minimax algorithms, with little attention given to their generalization. In this paper, we investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm using the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings. Our theory refines the algorithmic stability in a decentralized manner and demonstrates that the decentralized structure does not destroy the stability and generalization of D-SGDA, implying that it can generalize as well as the vanilla SGDA in certain situations. Our results analyze the impact of different topologies on the generalization bound of the D-SGDA algorithm beyond trivial factors such as sample sizes, learning rates, and iterations. We also evaluate the optimization error and balance it with the generalization gap to obtain the optimal population risk of D-SGDA in the convex-concave setting. Additionally, we perform several numerical experiments which validate our theoretical findings.
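A minimal sketch of decentralized SGDA on a toy saddle-point problem, with one gossip-averaging step per iteration over a ring topology, is given below; the local objectives and mixing matrix are illustrative.

```python
# Minimal sketch: decentralized gradient descent ascent on local saddle problems
#   f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2, with ring-graph gossip averaging.
import numpy as np

m, T, lr = 5, 2000, 0.05
rng = np.random.default_rng(0)
a, b, c = rng.uniform(1, 2, m), rng.uniform(-1, 1, m), rng.uniform(1, 2, m)
x, y = rng.normal(size=m), rng.normal(size=m)

# Doubly stochastic mixing matrix of a ring graph.
W = np.zeros((m, m))
for i in range(m):
    W[i, i], W[i, (i - 1) % m], W[i, (i + 1) % m] = 0.5, 0.25, 0.25

for _ in range(T):
    gx = a * x + b * y           # local df/dx (stochastic noise omitted for brevity)
    gy = b * x - c * y           # local df/dy
    x = W @ (x - lr * gx)        # descent step followed by gossip averaging
    y = W @ (y + lr * gy)        # ascent step followed by gossip averaging

print(x.mean(), y.mean())        # both iterates approach the saddle point (0, 0)
```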
Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data?
paper_authors: Guopeng Li, Victor L. Knoop, J. W. C. van Lint
for: This study examines how to assess the true information content of large loop detector datasets in order to improve the accuracy and reliability of network-level traffic forecasting.
methods: An uncertainty-aware framework is proposed that combines traffic flow theory with graph neural networks and uses evidential learning to quantify different sources of uncertainty in a single pass.
results: Results show that more than 80% of the data during daytime can be removed, and the remaining 20% of samples have equal prediction power for training models. This suggests that large traffic datasets can be subdivided into significantly smaller but equally informative datasets.
Abstract
Network-level traffic condition forecasting has been intensively studied for decades. Although prediction accuracy has been continuously improved with emerging deep learning models and ever-expanding traffic data, traffic forecasting still faces many challenges in practice. These challenges include the robustness of data-driven models, the inherent unpredictability of traffic dynamics, and whether further improvement of traffic forecasting requires more sensor data. In this paper, we focus on this latter question and particularly on data from loop detectors. To answer this, we propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. Firstly, the model design combines traffic flow theory with graph neural networks, ensuring the robustness of prediction and uncertainty quantification. Secondly, evidential learning is employed to quantify different sources of uncertainty in a single pass. The estimated uncertainty is used to "distil" the essence of the dataset that sufficiently covers the information content. Results from a case study of a highway network around Amsterdam show that, from 2018 to 2021, more than 80\% of the data during daytime can be removed. The remaining 20\% samples have equal prediction power for training models. This result suggests that indeed large traffic datasets can be subdivided into significantly smaller but equally informative datasets. From these findings, we conclude that the proposed methodology proves valuable in evaluating large traffic datasets' true information content. Further extensions, such as extracting smaller, spatially non-redundant datasets, are possible with this method.
paper_authors: Adam Dejl, Hamed Ayoobi, Matthew Williams, Francesca Toni
for: The work proposes a new feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), for explaining the outputs of neural network models.
methods: CAFE addresses three limitations of existing feature attribution methods: their disregard for conflicting input features, their lack of consideration for the influence of bias terms, and an overly high sensitivity to local variations in the underlying activation functions. It provides safeguards against overestimating the effects of neuron inputs and separately traces the positive and negative influences of input features and biases, improving robustness and the ability to surface feature conflicts.
results: Experiments show that CAFE better identifies conflicting features on synthetic tabular data and exhibits the best overall fidelity on several real-world tabular datasets, while being highly computationally efficient.
Abstract
Feature attribution methods are widely used to explain neural models by determining the influence of individual input features on the models' outputs. We propose a novel feature attribution method, CAFE (Conflict-Aware Feature-wise Explanations), that addresses three limitations of the existing methods: their disregard for the impact of conflicting features, their lack of consideration for the influence of bias terms, and an overly high sensitivity to local variations in the underpinning activation functions. Unlike other methods, CAFE provides safeguards against overestimating the effects of neuron inputs and separately traces positive and negative influences of input features and biases, resulting in enhanced robustness and increased ability to surface feature conflicts. We show experimentally that CAFE is better able to identify conflicting features on synthetic tabular data and exhibits the best overall fidelity on several real-world tabular datasets, while being highly computationally efficient.
Verification of Neural Networks Local Differential Classification Privacy
for: The paper is focused on protecting the privacy of individuals participating in the training set of a neural network.
methods: The proposed method, called Sphynx, predicts an abstract network from a small set of trained networks, using a KDE-based distribution over network parameters, and verifies local differential classification privacy (LDCP) directly on the abstract network with a MILP verifier.
results: By training only 7% of the networks, Sphynx predicts an abstract network that achieves 93% verification accuracy and reduces the analysis time by a factor of $1.7\cdot10^4$.
Abstract
Neural networks are susceptible to privacy attacks. To date, no verifier can reason about the privacy of individuals participating in the training set. We propose a new privacy property, called local differential classification privacy (LDCP), extending local robustness to a differential privacy setting suitable for black-box classifiers. Given a neighborhood of inputs, a classifier is LDCP if it classifies all inputs the same regardless of whether it is trained with the full dataset or whether any single entry is omitted. A naive algorithm is highly impractical because it involves training a very large number of networks and verifying local robustness of the given neighborhood separately for every network. We propose Sphynx, an algorithm that computes an abstraction of all networks, with a high probability, from a small set of networks, and verifies LDCP directly on the abstract network. The challenge is twofold: network parameters do not adhere to a known distribution probability, making it difficult to predict an abstraction, and predicting too large abstraction harms the verification. Our key idea is to transform the parameters into a distribution given by KDE, allowing to keep the over-approximation error small. To verify LDCP, we extend a MILP verifier to analyze an abstract network. Experimental results show that by training only 7% of the networks, Sphynx predicts an abstract network obtaining 93% verification accuracy and reducing the analysis time by $1.7\cdot10^4$x.
Accelerating Generalized Linear Models by Trading off Computation for Uncertainty
results: The proposed family of iterative methods substantially accelerates GLM training by explicitly trading off reduced computation for increased uncertainty, enables efficient inference on large-scale data, and propagates the resulting approximation error into the reported predictive uncertainty.
Abstract
Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to parallel modern computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.
Advancing Bayesian Optimization via Learning Correlated Latent Space
results: By introducing Lipschitz regularization, loss weighting, and trust region recoordination, the method reduces the inherent gap and achieves high performance on several discrete-data optimization tasks within a small evaluation budget.
Abstract
Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it leads to an inherent gap that results in potentially suboptimal solutions. To alleviate the discrepancy, we propose Correlated latent space Bayesian Optimization (CoBO), which focuses on learning correlated latent spaces characterized by a strong correlation between the distances in the latent space and the distances within the objective function. Specifically, our method introduces Lipschitz regularization, loss weighting, and trust region recoordination to minimize the inherent gap around the promising areas. We demonstrate the effectiveness of our approach on several optimization tasks in discrete data, such as molecule design and arithmetic expression fitting, and achieve high performance within a small budget.
STDA-Meta: A Meta-Learning Framework for Few-Shot Traffic Prediction
results: Experiments show that, compared with baseline models, the proposed model improves prediction accuracy by 7% on the MAE and RMSE metrics.
Abstract
As the development of cities, traffic congestion becomes an increasingly pressing issue, and traffic prediction is a classic method to relieve that issue. Traffic prediction is one specific application of spatio-temporal prediction learning, like taxi scheduling, weather prediction, and ship trajectory prediction. Against these problems, classical spatio-temporal prediction learning methods including deep learning, require large amounts of training data. In reality, some newly developed cities with insufficient sensors would not hold that assumption, and the data scarcity makes predictive performance worse. In such situation, the learning method on insufficient data is known as few-shot learning (FSL), and the FSL of traffic prediction remains challenges. On the one hand, graph structures' irregularity and dynamic nature of graphs cannot hold the performance of spatio-temporal learning method. On the other hand, conventional domain adaptation methods cannot work well on insufficient training data, when transferring knowledge from different domains to the intended target domain.To address these challenges, we propose a novel spatio-temporal domain adaptation (STDA) method that learns transferable spatio-temporal meta-knowledge from data-sufficient cities in an adversarial manner. This learned meta-knowledge can improve the prediction performance of data-scarce cities. Specifically, we train the STDA model using a Model-Agnostic Meta-Learning (MAML) based episode learning process, which is a model-agnostic meta-learning framework that enables the model to solve new learning tasks using only a small number of training samples. We conduct numerous experiments on four traffic prediction datasets, and our results show that the prediction performance of our model has improved by 7\% compared to baseline models on the two metrics of MAE and RMSE.
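The MAML-style episode underlying this kind of meta-learning can be sketched as an inner adaptation step on a support set followed by an outer update from the query loss; the linear model and random data below are placeholders, not the STDA-Meta architecture.

```python
# Minimal sketch: one MAML episode with a single inner gradient step (second-order).
import torch

def maml_episode(model, support, query, inner_lr=0.01):
    xs, ys = support
    xq, yq = query
    loss_s = torch.nn.functional.mse_loss(model(xs), ys)
    grads = torch.autograd.grad(loss_s, list(model.parameters()), create_graph=True)
    # Fast weights after one inner adaptation step.
    fast = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
    pred_q = torch.nn.functional.linear(xq, fast[0], fast[1])
    return torch.nn.functional.mse_loss(pred_q, yq)   # outer (meta) loss

model = torch.nn.Linear(8, 1)                     # placeholder predictor
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
support = (torch.randn(32, 8), torch.randn(32, 1))   # support set from a data-rich city
query = (torch.randn(32, 8), torch.randn(32, 1))     # query set for the meta update
meta_loss = maml_episode(model, support, query)
meta_opt.zero_grad()
meta_loss.backward()
meta_opt.step()
```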
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics
results: Experiments show that using these metrics as regularizers improves calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods that rely solely on post-hoc recalibration.
Abstract
Calibration ensures that probabilistic forecasts meaningfully capture uncertainty by requiring that predicted probabilities align with empirical frequencies. However, many existing calibration methods are specialized for post-hoc recalibration, which can worsen the sharpness of forecasts. Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. Furthermore, we provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions. Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration.
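One simple member of this kind of kernel-based family, usable directly as a differentiable regularizer for a binary classifier, is sketched below; the specific estimator and bandwidth are illustrative and not necessarily those proposed in the paper.

```python
# Minimal sketch: a differentiable kernel-based calibration estimate (RBF kernel on
# predicted probabilities, U-statistic over residual products) added to the loss.
import torch

def kernel_calibration_error(probs, labels, bandwidth=0.1):
    resid = labels - probs                                   # (n,)
    diff = probs.unsqueeze(0) - probs.unsqueeze(1)           # (n, n)
    K = torch.exp(-diff ** 2 / (2 * bandwidth ** 2))
    outer = resid.unsqueeze(0) * resid.unsqueeze(1)
    n = probs.shape[0]
    off_diag = K * outer * (1 - torch.eye(n))                # drop the diagonal terms
    return off_diag.sum() / (n * (n - 1))

logits = torch.randn(128, requires_grad=True)
labels = torch.randint(0, 2, (128,)).float()
probs = torch.sigmoid(logits)
loss = torch.nn.functional.binary_cross_entropy(probs, labels) \
       + 10.0 * kernel_calibration_error(probs, labels)      # calibration as a regularizer
loss.backward()
```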
Network Contention-Aware Cluster Scheduling with Reinforcement Learning
results: Compared with widely used scheduling policies, the proposed approach reduces average job completion time by up to 18.2% and cuts tail job completion time by up to 20.7%, while allowing a preferable trade-off between average job completion time and resource utilization.
Abstract
With continuous advances in deep learning, distributed training is becoming common in GPU clusters. Specifically, for emerging workloads with diverse amounts, ratios, and patterns of communication, we observe that network contention can significantly degrade training throughput. However, widely used scheduling policies often face limitations as they are agnostic to network contention between jobs. In this paper, we present a new approach to mitigate network contention in GPU clusters using reinforcement learning. We formulate GPU cluster scheduling as a reinforcement learning problem and opt to learn a network contention-aware scheduling policy that efficiently captures contention sensitivities and dynamically adapts scheduling decisions through continuous evaluation and improvement. We show that compared to widely used scheduling policies, our approach reduces average job completion time by up to 18.2\% and effectively cuts the tail job completion time by up to 20.7\% while allowing a preferable trade-off between average job completion time and resource utilization.
Importance Estimation with Random Gradient for Neural Network Pruning
results: Compared with existing methods, the proposed approach performs better on ResNet and VGG architectures on the CIFAR-100 and STL-10 datasets, and it also complements existing methods, improving their performance when combined with them.
Abstract
Global Neuron Importance Estimation is used to prune neural networks for efficiency reasons. To determine the global importance of each neuron or convolutional kernel, most of the existing methods either use activation or gradient information or both, which demands abundant labelled examples. In this work, we use heuristics to derive importance estimation similar to Taylor First Order (TaylorFO) approximation based methods. We name our methods TaylorFO-abs and TaylorFO-sq. We propose two additional methods to improve these importance estimation methods. Firstly, we propagate random gradients from the last layer of a network, thus avoiding the need for labelled examples. Secondly, we normalize the gradient magnitude of the last layer output before propagating, which allows all examples to contribute similarly to the importance score. Our methods with additional techniques perform better than previous methods when tested on ResNet and VGG architectures on CIFAR-100 and STL-10 datasets. Furthermore, our method also complements the existing methods and improves their performances when combined with them.
FedRec+: Enhancing Privacy and Addressing Heterogeneity in Federated Recommendation Systems
results: Experimental results show that FedRec+ achieves state-of-the-art performance across various reference datasets.
Abstract
Preserving privacy and reducing communication costs for edge users pose significant challenges in recommendation systems. Although federated learning has proven effective in protecting privacy by avoiding data exchange between clients and servers, it has been shown that the server can infer user ratings based on updated non-zero gradients obtained from two consecutive rounds of user-uploaded gradients. Moreover, federated recommendation systems (FRS) face the challenge of heterogeneity, leading to decreased recommendation performance. In this paper, we propose FedRec+, an ensemble framework for FRS that enhances privacy while addressing the heterogeneity challenge. FedRec+ employs optimal subset selection based on feature similarity to generate near-optimal virtual ratings for pseudo items, utilizing only the user's local information. This approach reduces noise without incurring additional communication costs. Furthermore, we utilize the Wasserstein distance to estimate the heterogeneity and contribution of each client, and derive optimal aggregation weights by solving a defined optimization problem. Experimental results demonstrate the state-of-the-art performance of FedRec+ across various reference datasets.
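As a rough illustration of generating virtual ratings for pseudo items from local information only, the snippet below fills each unrated item with a similarity-weighted average over the user's most feature-similar rated items; the cosine similarity, the top-k selection, and all names are assumptions for illustration and not the paper's optimal subset selection procedure.

```python
import numpy as np

def virtual_ratings(item_feats, rated_idx, ratings, pseudo_idx, k=5, eps=1e-8):
    """Hypothetical sketch: assign each pseudo item a rating derived from the k most
    feature-similar items the user has actually rated (local information only)."""
    rated_f = item_feats[rated_idx]                                   # (r, d)
    rated_norms = np.linalg.norm(rated_f, axis=1)
    virt = {}
    for j in pseudo_idx:
        sims = rated_f @ item_feats[j] / (rated_norms * np.linalg.norm(item_feats[j]) + eps)
        top = np.argsort(-sims)[:k]
        w = np.clip(sims[top], 0.0, None)                             # keep non-negative weights
        virt[j] = float(np.dot(w, np.asarray(ratings)[top]) / (w.sum() + eps))
    return virt
```

The point of such virtual ratings is to mask which items the user actually interacted with, without adding communication cost, since everything is computed client-side.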
Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer
for: This work aims to address the data-processing difficulty in space-based gravitational wave detection, specifically the accurate prediction of compact binary system waveforms.
methods: The study proposes an interpretable large pre-trained model, named CBS-GPT, to predict compact binary system waveforms.
results: The model achieves prediction accuracies of 98%, 91%, and 99% for massive black hole binary (MBHB), extreme mass-ratio inspiral (EMRI), and galactic binary (GB) waveforms, respectively. CBS-GPT exhibits notable interpretability: its hidden parameters effectively capture the intricate information of the waveforms, including the instrument response, across a wide parameter range.
Abstract
Space-based gravitational wave detection is one of the most anticipated gravitational wave (GW) detection projects in the next decade, which will detect abundant compact binary systems. However, the precise prediction of space GW waveforms remains unexplored. To solve the data processing difficulty in the increasing waveform complexity caused by detectors' response and second-generation time-delay interferometry (TDI 2.0), an interpretable pre-trained large model named CBS-GPT (Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer) is proposed. For compact binary system waveforms, three models were trained to predict the waveforms of massive black hole binary (MBHB), extreme mass-ratio inspirals (EMRIs), and galactic binary (GB), achieving prediction accuracies of 98%, 91%, and 99%, respectively. The CBS-GPT model exhibits notable interpretability, with its hidden parameters effectively capturing the intricate information of waveforms, even with complex instrument response and a wide parameter range. Our research demonstrates the potential of large pre-trained models in gravitational wave data processing, opening up new opportunities for future tasks such as gap completion, GW signal detection, and signal noise reduction.
Understanding and Visualizing Droplet Distributions in Simulations of Shallow Clouds
paper_authors: Justus C. Will, Andrea M. Jenney, Kara D. Lamb, Michael S. Pritchard, Colleen Kaul, Po-Lun Ma, Kyle Pressel, Jacob Shpund, Marcus van Lier-Walqui, Stephan Mandt
results: Using variational autoencoders (VAEs), the researchers produce novel, intuitive visualizations of the organization of droplet sizes and their evolution over time. These visualizations improve interpretation and allow simulations with different aerosol concentrations to be contrasted, shedding light on aerosol-cloud interactions.
Abstract
Thorough analysis of local droplet-level interactions is crucial to better understand the microphysical processes in clouds and their effect on the global climate. High-accuracy simulations of relevant droplet size distributions from Large Eddy Simulations (LES) of bin microphysics challenge current analysis techniques due to their high dimensionality involving three spatial dimensions, time, and a continuous range of droplet sizes. Utilizing the compact latent representations from Variational Autoencoders (VAEs), we produce novel and intuitive visualizations for the organization of droplet sizes and their evolution over time beyond what is possible with clustering techniques. This greatly improves interpretation and allows us to examine aerosol-cloud interactions by contrasting simulations with different aerosol concentrations. We find that the evolution of the droplet spectrum is similar across aerosol levels but occurs at different paces. This similarity suggests that precipitation initiation processes are alike despite variations in onset times.
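A minimal sketch of the underlying idea, compressing each binned droplet-size spectrum into a low-dimensional VAE latent that can then be plotted over time, is given below; the bin count, network sizes, and two-dimensional latent are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistVAE(nn.Module):
    """Tiny VAE that embeds binned droplet-size spectra into a 2-D latent space
    (a sketch of the general idea; bin count and layer sizes are made up)."""
    def __init__(self, n_bins=33, latent_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, latent_dim), nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_bins))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```

After training, encoding every (grid cell, time step) spectrum and plotting the latent means, colored by simulation time or aerosol concentration, gives the flavor of the visualizations described above.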
Efficient Robust Bayesian Optimization for Arbitrary Uncertain inputs
paper_authors: Lin Yang, Junlong Lyu, Wenlong Lyu, Zhitang Chen
for: This work aims to develop an algorithm for robust Bayesian optimization under uncertain inputs, ensuring a stable and efficient optimum.
methods: The algorithm models uncertain inputs directly by empowering the Gaussian process with the Maximum Mean Discrepancy (MMD) and accelerates posterior inference via Nystrom approximation.
results: Experiments show that the method handles various input uncertainties and achieves state-of-the-art performance; a rigorous theoretical regret bound is also established.
Abstract
Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models the uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nystrom approximation. Rigorous theoretical regret bound is established under MMD estimation error and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
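One way to picture "empowering the Gaussian process with MMD" is to treat each uncertain input as a bag of samples and to compare bags with an MMD-based covariance; the Gaussian base kernel, bandwidths, and the exponentiated form below are assumptions for illustration, not necessarily the kernel used in AIRBO.

```python
import torch

def mmd_sq(x, y, bandwidth=1.0):
    """Biased sample estimate of squared MMD between two sets of input samples,
    using a Gaussian base kernel."""
    def gram(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

def mmd_kernel(x, y, lengthscale=1.0, bandwidth=1.0):
    """A GP covariance between two *uncertain inputs*, each represented by samples,
    obtained by plugging MMD into an RBF-style kernel (one plausible construction)."""
    return torch.exp(-mmd_sq(x, y, bandwidth) / (2 * lengthscale ** 2))
```

A Nystrom-style approximation would then be layered on top to keep posterior inference fast, as the abstract notes.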
results: Provides sharper guarantees that improve upon existing information-theoretic bounds in various learning scenarios.
Abstract
We present new information-theoretic generalization guarantees through the a novel construction of the "neighboring-hypothesis" matrix and a new family of stability notions termed sample-conditioned hypothesis (SCH) stability. Our approach yields sharper bounds that improve upon previous information-theoretic bounds in various learning scenarios. Notably, these bounds address the limitations of existing information-theoretic bounds in the context of stochastic convex optimization (SCO) problems, as explored in the recent work by Haghifam et al. (2023).
Robust Learning for Smoothed Online Convex Optimization with Feedback Delay
results: The paper proves that RCL guarantees $(1+\lambda)$-competitiveness against any given expert, and it significantly improves average performance even with multi-step switching costs and feedback delay.
Abstract
We study a challenging form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step nonlinear switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee $(1+\lambda)$-competitiveness against any given expert for any $\lambda>0$, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.
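The core mechanism, playing the ML advice only after projecting it into a constraint set anchored at a trusted expert, can be sketched as below; the Euclidean ball and its radius stand in for RCL's actual cost-based robustness constraint, which depends on $\lambda$, the switching costs, and the delayed feedback.

```python
import numpy as np

def rcl_action(ml_action, expert_action, radius):
    """Sketch of robustness-constrained projection: follow the (untrusted) ML advice,
    but only within a ball around the trusted expert's action. The real RCL constraint
    set is cost-based; a simple Euclidean ball is used here only to show the mechanism."""
    diff = ml_action - expert_action
    dist = np.linalg.norm(diff)
    if dist <= radius:
        return ml_action
    return expert_action + diff * (radius / dist)
```

If the constraint set is chosen so that the projected action can never incur more than $(1+\lambda)$ times the expert's cost, the expert's competitive guarantee carries over while the ML model remains free to improve the average case.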
Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows
For: The paper bridges the gap between variational inference and Wasserstein gradient flows, demonstrating that the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow, and offering an alternative perspective on the path-derivative gradient estimator.
* Methods: The paper uses the path-derivative gradient estimator to generate the vector field of the gradient flow, and demonstrates how distillations can be extended to encompass $f$-divergences and non-Gaussian variational families.
* Results: The paper offers a new gradient estimator for $f$-divergences, readily implementable using contemporary machine learning libraries like PyTorch or TensorFlow.
Abstract
Variational inference is a technique that approximates a target distribution by optimizing within the parameter space of variational families. On the other hand, Wasserstein gradient flows describe optimization within the space of probability measures where they do not necessarily admit a parametric density function. In this paper, we bridge the gap between these two methods. We demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow where its forward Euler scheme is the standard black-box variational inference algorithm. Specifically, the vector field of the gradient flow is generated via the path-derivative gradient estimator. We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow. Distillations can be extended to encompass $f$-divergences and non-Gaussian variational families. This extension yields a new gradient estimator for $f$-divergences, readily implementable using contemporary machine learning libraries like PyTorch or TensorFlow.
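For concreteness, here is a sketch of the path-derivative gradient estimator for a diagonal Gaussian variational family, where the score-function term is removed by detaching the variational parameters inside $\log q$; the target `log_p`, the sample count, and the Gaussian family are assumptions for illustration.

```python
import math
import torch

def path_derivative_loss(mu, log_std, log_p, n_samples=64):
    """Path-derivative estimator sketch for a diagonal Gaussian variational family:
    the score-function term is dropped by detaching the variational parameters inside
    log q. `log_p` is a user-supplied (unnormalized) log target density."""
    eps = torch.randn(n_samples, mu.shape[-1])
    z = mu + log_std.exp() * eps                      # reparameterized samples
    mu_d, log_std_d = mu.detach(), log_std.detach()   # stop-gradient on q's parameters
    log_q = (-0.5 * ((z - mu_d) / log_std_d.exp()) ** 2
             - log_std_d - 0.5 * math.log(2 * math.pi)).sum(-1)
    return (log_q - log_p(z)).mean()                  # negative ELBO estimate
```

Calling `backward()` on the returned loss yields gradients with respect to `mu` and `log_std` that flow only through the reparameterized samples, which is the path-derivative vector field discussed above.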