results: Improves the training efficiency and accuracy of generative models, without requiring reversible neural networks or computation of the Jacobian matrix.
Abstract
We present a supervised learning framework for training generative models for density estimation. Generative models, including generative adversarial networks, normalizing flows, and variational auto-encoders, are usually considered unsupervised learning models, because labeled data are usually unavailable for training. Despite the success of generative models, there are several issues with unsupervised training, e.g., the requirement of reversible architectures, vanishing gradients, and training instability. To enable supervised learning in generative models, we utilize the score-based diffusion model to generate labeled data. Unlike existing diffusion models that train neural networks to learn the score function, we develop a training-free score estimation method. This approach uses mini-batch-based Monte Carlo estimators to directly approximate the score function at any spatial-temporal location when solving an ordinary differential equation (ODE) corresponding to the reverse-time stochastic differential equation (SDE). This approach offers both high accuracy and substantial time savings in neural network training. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner. Compared with existing normalizing flow models, our method does not require the use of reversible neural networks and avoids the computation of the Jacobian matrix. Compared with existing diffusion models, our method does not need to solve the reverse-time SDE to generate new samples. As a result, the sampling efficiency is significantly improved. We demonstrate the performance of our method by applying it to a set of 2D datasets as well as real data from the UCI repository.
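A minimal sketch of such a mini-batch Monte Carlo score estimator, assuming a variance-preserving forward process $x_t = \alpha(t)\,x_0 + \sigma(t)\,\varepsilon$; the self-normalized weighting and the defaults below are illustrative, not necessarily the authors' exact scheme.

```python
import numpy as np

def mc_score(x, t, data, alpha, sigma, batch_size=256, rng=None):
    """Self-normalized Monte Carlo estimate of the score grad_x log p_t(x)
    for the forward process x_t = alpha(t) * x_0 + sigma(t) * eps.
    Weights are a softmax of the Gaussian kernel over a data mini-batch."""
    rng = np.random.default_rng() if rng is None else rng
    batch = data[rng.choice(len(data), size=batch_size, replace=False)]
    a, s = alpha(t), sigma(t)
    logw = -np.sum((x - a * batch) ** 2, axis=1) / (2.0 * s ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                   # normalize the weights
    return (w[:, None] * (a * batch - x)).sum(axis=0) / s ** 2
```

An estimator of this form can be queried at any $(x, t)$ inside an off-the-shelf ODE solver, which is what removes the score-network training step.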
URegM: a unified prediction model of resource consumption for refactoring software smells in open source cloud
results: Experimental results show that URegM can accurately predict the cloud resource usage caused by code smell refactoring, helping cloud service providers better plan resource provisioning and code refactoring.
Abstract
The low cost and rapid provisioning capabilities have made the cloud a desirable platform to launch complex scientific applications. However, resource utilization optimization is a significant challenge for cloud service providers, since prior work has focused on optimizing resources for the applications that run on the cloud, with little emphasis on optimizing the resource utilization of the cloud's internal processes. Code refactoring has been associated with improving the maintenance and understanding of software code. However, the impact of refactoring the cloud's source code on cloud resource usage requires further analysis. In this paper, we propose a framework called Unified Regression Modelling (URegM) which predicts the impact of code smell refactoring on cloud resource usage. We conduct our experiments in a real-life cloud environment using a complex scientific application as a workload. Results show that URegM is capable of accurately predicting resource consumption due to code smell refactoring. This provides cloud service providers with advance knowledge of the impact of refactoring code smells on resource consumption, allowing them to plan their resource provisioning and code refactoring more effectively.
paper_authors: Mingyang Wu, Xiaohui Chen, Li-Ping Liu
for: This work aims to improve existing diffusion-based methods for large graph generation, in both efficiency and generative quality.
methods: Two enhancements to the EDGE model: a degree-specific noise schedule that optimizes the number of active nodes and reduces memory consumption, and an improved sampling scheme that gives finer control over the similarity between generated and real graphs.
results: Experiments show that the proposed modifications improve both the efficiency and the quality of the generated graphs, offering a reliable and scalable solution for large graph generation tasks.
Abstract
Recently developed deep neural models like NetGAN, CELL, and Variational Graph Autoencoders have made progress but face limitations in replicating key graph statistics when generating large graphs. Diffusion-based methods have emerged as promising alternatives; however, most of them present challenges in computational efficiency and generative performance. EDGE is effective at modeling large networks, but its current denoising approach can be inefficient, often leading to wasted computational resources and potential mismatches in its generation process. In this paper, we propose enhancements to the EDGE model to address these issues. Specifically, we introduce a degree-specific noise schedule that optimizes the number of active nodes at each timestep, significantly reducing memory consumption. Additionally, we present an improved sampling scheme that fine-tunes the generative process, allowing for better control over the similarity between the synthesized and the true network. Our experimental results demonstrate that the proposed modifications not only improve the efficiency but also enhance the accuracy of the generated graphs, offering a robust and scalable solution for graph generation tasks.
results: Experiments show that the proposed method achieves better fairness measures while retaining similar utility, compared with state-of-the-art fairness-aware baselines.
Abstract
Graphs are mathematical tools that can be used to represent complex real-world interconnected systems, such as financial markets and social networks. Hence, machine learning (ML) over graphs has attracted significant attention recently. However, it has been demonstrated that ML over graphs amplifies the already existing bias towards certain under-represented groups in various decision-making problems due to the information aggregation over biased graph structures. Faced with this challenge, here we take a fresh look at the problem of bias mitigation in graph-based learning by borrowing insights from graph signal processing. Our idea is to introduce predesigned graph filters within an ML pipeline to reduce a novel unsupervised bias measure, namely the correlation between sensitive attributes and the underlying graph connectivity. We show that the optimal design of said filters can be cast as a convex problem in the graph spectral domain. We also formulate a linear programming (LP) problem informed by a theoretical bias analysis, which attains a closed-form solution and leads to a more efficient fairness-aware graph filter. Finally, for a design whose degrees of freedom are independent of the input graph size, we minimize the bias metric over the family of polynomial graph convolutional filters. Our optimal filter designs offer complementary strengths to explore favorable fairness-utility-complexity tradeoffs. For performance evaluation, we conduct extensive and reproducible node classification experiments over real-world networks. Our results show that the proposed framework leads to better fairness measures together with similar utility compared to state-of-the-art fairness-aware baselines.
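As a concrete illustration of the filter family the paper optimizes over, the sketch below applies a fixed polynomial graph convolutional filter $H(L)x = \sum_k h_k L^k x$; choosing the coefficients by the paper's convex or LP designs is not reproduced here, so the coefficients are treated as given.

```python
import numpy as np

def polynomial_graph_filter(L, x, h):
    """Apply a polynomial graph filter H(L) x = sum_k h[k] * L^k x.
    L: graph Laplacian [n, n]; x: node signals [n, d]; h: coefficients."""
    out = h[0] * x
    Lx = x
    for hk in h[1:]:
        Lx = L @ Lx           # next Laplacian power applied to the signal
        out = out + hk * Lx
    return out
```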
Clustering Students Based on Gamification User Types and Learning Styles
for: clustering students according to their gamification user types and learning styles
methods: K-means algorithm and Gamification User Type Hexad Scale, Grasha-Riechmann Student Learning Style Scale
results: neutral results with a Silhouette coefficient of 0.12, indicating that the clustering is not satisfactory.
Abstract
The aim of this study is to cluster students according to their gamification user types and learning styles, with the purpose of giving instructors a new perspective for grouping students; such clustering cannot be done by hand when the data contain multiple scales. The data consist of 251 students enrolled at a Turkish state university. The K-means algorithm was used for clustering. To determine the students' gamification user types and learning styles, the Gamification User Type Hexad Scale and the Grasha-Riechmann Student Learning Style Scale were used, respectively. The Silhouette coefficient was used as the clustering quality measure. After fitting the algorithm in several ways, the highest Silhouette coefficient obtained was 0.12, meaning that the results are neutral but not satisfactory. All statistical operations and data visualizations were made using the Python programming language.
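A sketch of the pipeline's skeleton with scikit-learn, pairing K-means with the Silhouette coefficient; the feature matrix is a random placeholder standing in for the Hexad and Grasha-Riechmann scale scores, and the number of columns is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder: one row per student, columns are the Hexad user-type and
# Grasha-Riechmann learning-style scale scores (dimensions assumed).
X = np.random.rand(251, 12)
X = StandardScaler().fit_transform(X)

best_score, best_k = max(
    (silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                random_state=0).fit_predict(X)), k)
    for k in range(2, 10)
)
print(f"best silhouette {best_score:.2f} at k={best_k}")
```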
A Quadratic Synchronization Rule for Distributed Deep Learning
paper_authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
for: This paper addresses the problem of gradient synchronization in distributed deep learning, where communication overhead grows large when many nodes jointly train large models.
methods: A theory-grounded method named the Quadratic Synchronization Rule (QSR), which dynamically sets H as the learning rate decays, to improve generalization.
results: Experiments with ResNet and ViT on ImageNet show that local gradient methods with QSR improve test accuracy over other synchronization strategies, and reduce training time on 16 or 64 GPUs while achieving higher validation accuracy.
Abstract
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While $H$ has been viewed as a hyperparameter to trade optimization efficiency for communication cost, recent research indicates that setting a proper $H$ value can lead to generalization improvement. Yet, selecting a proper $H$ is elusive. This work proposes a theory-grounded method for determining $H$, named the Quadratic Synchronization Rule (QSR), which recommends dynamically setting $H$ in proportion to $\frac{1}{\eta^2}$ as the learning rate $\eta$ decays over time. Extensive ImageNet experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies. Compared with the standard data parallel training, QSR enables Local AdamW on ViT-B to cut the training time on 16 or 64 GPUs down from 26.7 to 20.2 hours or from 8.6 to 5.5 hours and, at the same time, achieve $1.16\%$ or $0.84\%$ higher top-1 validation accuracy.
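The rule itself is compact enough to sketch: synchronize every $H \propto 1/\eta^2$ local steps as the learning rate $\eta$ decays. The constant and the clipping bounds below are hypothetical tuning knobs, not the paper's values.

```python
def qsr_sync_interval(lr, c=1e-4, h_min=2, h_max=512):
    """Quadratic Synchronization Rule sketch: sync every H ~ c / lr^2
    local steps, clipped to a practical range. c, h_min, h_max are
    illustrative tuning knobs rather than the paper's settings."""
    return int(min(max(c / lr ** 2, h_min), h_max))
```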
Data Augmentation: a Combined Inductive-Deductive Approach featuring Answer Set Programming
results: A hybrid inductive-deductive framework that starts from a limited set of real labeled images and uses logic programs to declaratively specify the structure of new images, guaranteeing that they comply with domain-knowledge constraints and specific desiderata.
Abstract
Although the availability of a large amount of data is usually taken for granted, there are relevant scenarios where this is not the case; for instance, in the biomedical/healthcare domain, some applications require building huge datasets of proper images, but the acquisition of such images is often hard for different reasons (e.g., accessibility, costs, pathology-related variability), resulting in limited and usually imbalanced datasets. Hence, the need for synthesizing photo-realistic images via advanced Data Augmentation techniques is crucial. In this paper we propose a hybrid inductive-deductive approach to the problem; in particular, starting from a limited set of real labeled images, the proposed framework makes use of logic programs for declaratively specifying the structure of new images, which is guaranteed to comply with both a set of constraints coming from the domain knowledge and some specific desiderata. The resulting labeled images undergo a dedicated Deep Learning process in charge of creating photo-realistic images that comply with the generated label.
Universal representation by Boltzmann machines with Regularised Axons
paper_authors: Przemysław R. Grzybowski, Antoni Jankiewicz, Eloy Piñol, David Cirauqui, Dorota H. Grzybowska, Paweł M. Petrykowski, Miguel Ángel García-March, Maciej Lewenstein, Gorka Muñoz-Gil, Alejandro Pozas-Kerstjens
results: Proves that regularised Boltzmann machines preserve the ability to represent arbitrary distributions while controlling the number of energy local minima, enabling guided sampling and training. Furthermore, regularised Boltzmann machines can store exponentially many arbitrarily correlated visible patterns with perfect retrieval.
Abstract
It is widely known that Boltzmann machines are capable of representing arbitrary probability distributions over the values of their visible neurons, given enough hidden ones. However, sampling -- and thus training -- these models can be numerically hard. Recently we proposed a regularisation of the connections of Boltzmann machines, in order to control the energy landscape of the model, paving the way for efficient sampling and training. Here we formally prove that such regularised Boltzmann machines preserve the ability to represent arbitrary distributions. This comes in conjunction with control over the number of energy local minima, thus enabling easy \emph{guided} sampling and training. Furthermore, we explicitly show that regularised Boltzmann machines can store exponentially many arbitrarily correlated visible patterns with perfect retrieval, and we connect them to Dense Associative Memory networks.
A global product of fine-scale urban building height based on spaceborne lidar
paper_authors: Xiao Ma, Guang Zheng, Chi Xu, L. Monika Moskal, Peng Gong, Qinghua Guo, Huabing Huang, Xuecao Li, Yong Pang, Cheng Wang, Huan Xie, Bailang Yu, Bo Zhao, Yuyu Zhou
for: This paper aims to provide a global product of urban building heights with fine spatial resolution and global coverage, which is essential for achieving the UN's Sustainable Development Goals (SDGs) and supporting future urban studies.
methods: The authors combined the spaceborne lidar instrument GEDI with multi-sourced data, including remotely sensed images (Landsat-8, Sentinel-2, and Sentinel-1) and topographic data, to produce a global product of urban building heights on a fine grid size of 150 m around 2020.
results: The method for estimating building height samples based on the GEDI data was effective, with a Pearson's r of 0.78 and an RMSE of 3.67 m against the reference data. The mapping product also demonstrated good performance, with a Pearson's r of 0.71 and an RMSE of 4.60 m. The global urban building height map provides a higher spatial resolution (150 m) with greater inherent detail about spatial heterogeneity and the flexibility of updating using the GEDI samples as inputs.
Abstract
Characterizing urban environments with broad coverage and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs), as half of the world's population lives in cities. Urban building height, as a fundamental 3D urban structural feature, has far-reaching applications. However, producing readily available datasets of recent urban building heights with fine spatial resolution and global coverage remains a challenging task. Here, we provide an up-to-date global product of urban building heights on a fine grid size of 150 m around 2020 by combining the spaceborne lidar instrument GEDI with multi-sourced data, including remotely sensed images (i.e., Landsat-8, Sentinel-2, and Sentinel-1) and topographic data. Our results revealed that the method for estimating building height samples based on the GEDI data was effective, with a Pearson's r of 0.78 and an RMSE of 3.67 m in comparison to the reference data. The mapping product also demonstrated good performance, as indicated by its strong correlation with the reference data (Pearson's r = 0.71, RMSE = 4.60 m). Compared with currently existing products, our global urban building height map provides a higher spatial resolution (150 m) with a great level of inherent detail about spatial heterogeneity, together with the flexibility of updating using the GEDI samples as inputs. This work will boost future urban studies across many fields, including climate, environmental, ecological, and social sciences.
Can strong structural encoding reduce the importance of Message Passing?
results: Results show that tensor-product-based encodings allow the message-passing layers to be reduced or removed entirely on some tasks without degrading model performance, indicating that the importance of message passing is limited when the model can construct strong structural encodings.
Abstract
The most prevalent class of neural networks operating on graphs are message passing neural networks (MPNNs), in which the representation of a node is updated iteratively by aggregating information in the 1-hop neighborhood. Since this paradigm for computing node embeddings may prevent the model from learning coarse topological structures, the initial features are often augmented with structural information of the graph, typically in the form of Laplacian eigenvectors or Random Walk transition probabilities. In this work, we explore the contribution of message passing when strong structural encodings are provided. We introduce a novel way of modeling the interaction between feature and structural information based on their tensor product rather than the standard concatenation. The choice of interaction is compared in common scenarios and in settings where the capacity of the message-passing layer is severely reduced and ultimately the message-passing phase is removed altogether. Our results indicate that using tensor-based encodings is always at least on par with the concatenation-based encoding and that it makes the model much more robust when the message passing layers are removed, on some tasks incurring almost no drop in performance. This suggests that the importance of message passing is limited when the model can construct strong structural encodings.
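A minimal sketch of the tensor-product interaction, assuming node-wise feature and structural encodings; the flattened outer product replaces the usual concatenation before the (possibly removed) message-passing layers.

```python
import torch

def tensor_interaction(feat, struct):
    """Combine node features with structural encodings via their tensor
    (outer) product instead of concatenation; the flattened product is a
    plain vector that downstream layers can consume.
    feat: [num_nodes, d_f], struct: [num_nodes, d_s]."""
    prod = torch.einsum('nf,ns->nfs', feat, struct)
    return prod.reshape(feat.size(0), -1)    # [num_nodes, d_f * d_s]
```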
Pyramidal Hidden Markov Model For Multivariate Time Series Forecasting
results: Experiments show that the proposed PHMM outperforms its competitive peers on multivariate time series datasets, handling non-stationary and noisy data better and producing more accurate and comprehensive forecasts.
Abstract
The Hidden Markov Model (HMM) can predict the future value of a time series based on its current and previous values, making it a powerful algorithm for handling various types of time series. Numerous studies have explored the improvement of HMM using advanced techniques, leading to the development of several variations of HMM. Despite these studies indicating the increased competitiveness of HMM compared to other advanced algorithms, few have recognized the significance and impact of incorporating multistep stochastic states into its performance. In this work, we propose a Pyramidal Hidden Markov Model (PHMM) that can capture multiple multistep stochastic states. Initially, a multistep HMM is designed for extracting short multistep stochastic states. Next, a novel time series forecasting structure is proposed based on PHMM, which utilizes pyramid-like stacking to adaptively identify long multistep stochastic states. By employing these two schemes, our model can effectively handle non-stationary and noisy data, while also establishing long-term dependencies for more accurate and comprehensive forecasting. The experimental results on diverse multivariate time series datasets convincingly demonstrate the superior performance of our proposed PHMM compared to its competitive peers in time series forecasting.
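As a reference point for the plain (non-pyramidal) HMM forecaster, here is a one-step forecast with hmmlearn, computing $\mathbb{E}[x_{T+1}] = \sum_s P(s_{T+1}=s \mid x_{1:T})\,\mu_s$; the PHMM's multistep states and pyramid stacking are not reproduced here.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def hmm_one_step_forecast(X, n_states=4):
    """One-step forecast with a plain Gaussian HMM baseline:
    E[x_{T+1}] = sum_s P(s_{T+1}=s | x_{1:T}) * mu_s.
    X: observations of shape [T, n_features]."""
    model = GaussianHMM(n_components=n_states, n_iter=200).fit(X)
    posterior = model.predict_proba(X)[-1]         # P(s_T | x_{1:T})
    next_state_dist = posterior @ model.transmat_  # P(s_{T+1} | x_{1:T})
    return next_state_dist @ model.means_          # expected next value
```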
PPFL: A Personalized Federated Learning Framework for Heterogeneous Population
results: PPFL provides substantial insights into client characteristics, which existing personalized federated learning methods lack, and demonstrates advantages over them.
Abstract
Personalization aims to characterize individual preferences and is widely applied across many fields. However, conventional personalized methods operate in a centralized manner and potentially expose the raw data when pooling individual information. In this paper, with privacy considerations, we develop a flexible and interpretable personalized framework within the paradigm of Federated Learning, called PPFL (Population Personalized Federated Learning). By leveraging canonical models to capture fundamental characteristics among the heterogeneous population and employing membership vectors to reveal clients' preferences, it models the heterogeneity as clients' varying preferences for these characteristics and provides substantial insights into client characteristics, which is lacking in existing Personalized Federated Learning (PFL) methods. Furthermore, we explore the relationship between our method and three main branches of PFL methods: multi-task PFL, clustered FL, and decoupling PFL, and demonstrate the advantages of PPFL. To solve PPFL (a non-convex constrained optimization problem), we propose a novel random block coordinate descent algorithm and present the convergence property. We conduct experiments on both pathological and practical datasets, and the results validate the effectiveness of PPFL.
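A hedged sketch of one reading of the PPFL setup: each client's model is a membership-weighted combination of shared canonical models. The softmax simplex constraint is our assumption, not necessarily the paper's parameterization.

```python
import torch

def client_model_params(canonical, membership):
    """PPFL-style personalization sketch: a client's parameters are a
    preference-weighted combination of shared canonical models,
    theta_i = sum_k m_ik * theta_k (one plausible reading of the setup).
    canonical: [K, P] stacked canonical parameters; membership: [K]."""
    m = torch.softmax(membership, dim=0)   # assumed simplex constraint
    return m @ canonical
```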
Finite-Sample Analysis of the Temporal Difference Learning
results: Provides near-optimal variance and bias terms together with the corresponding sample complexity bounds. The proof technique is based on refined error bounds for linear stochastic approximation and a novel stability result for products of random matrices arising from the TD-type recurrence.
Abstract
In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear functional approximation for policy evaluation in discounted Markov Decision Processes. We show that a simple algorithm with a universal and instance-independent step size together with Polyak-Ruppert tail averaging is sufficient to obtain near-optimal variance and bias terms. We also provide the respective sample complexity bounds. Our proof technique is based on refined error bounds for linear stochastic approximation together with the novel stability result for the product of random matrices that arise from the TD-type recurrence.
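The algorithm analyzed is simple enough to sketch: constant-step-size TD(0) with linear function approximation and Polyak-Ruppert tail averaging. The transition interface and feature map below are illustrative.

```python
import numpy as np

def td_polyak_ruppert(env_steps, phi, d, alpha, gamma, tail_frac=0.5):
    """TD(0) with linear function approximation and Polyak-Ruppert tail
    averaging: run a constant-step-size TD recursion and average the
    iterates over the last fraction of the trajectory.
    env_steps: list of (s, r, s_next) transitions; phi: feature map -> R^d."""
    theta = np.zeros(d)
    tail_start = int((1 - tail_frac) * len(env_steps))
    avg, count = np.zeros(d), 0
    for t, (s, r, s_next) in enumerate(env_steps):
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta = theta + alpha * td_error * phi(s)
        if t >= tail_start:
            avg, count = avg + theta, count + 1
    return avg / max(count, 1)
```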
Robust Visual Imitation Learning with Inverse Dynamics Representations
methods: Develops an inverse dynamics state representation learning objective to align the expert environment and the learning environment
results: Achieves near-expert performance under various visual perturbations and in diverse visual control tasks, significantly outperforming state-of-the-art visual IL methods and robust IL methods.
Abstract
Imitation learning (IL) has achieved considerable success in solving complex sequential decision-making problems. However, current IL methods mainly assume that the environment for learning policies is the same as the environment for collecting expert datasets. Therefore, these methods may fail to work when there are slight differences between the learning and expert environments, especially for challenging problems with high-dimensional image observations. In real-world scenarios, however, it is rare to have the chance to collect expert trajectories precisely in the target learning environment. To address this challenge, we propose a novel robust imitation learning approach, where we develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment. With the abstract state representation, we design an effective reward function that thoroughly measures the similarity between behavior data and expert data, not only element-wise but also at the trajectory level. We conduct extensive experiments to evaluate the proposed approach under various visual perturbations and in diverse visual control tasks. Our approach can achieve near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.
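A hedged sketch of an inverse-dynamics representation objective of the kind described: an encoder is trained so the action is predictable from consecutive state embeddings, encouraging features invariant to visual nuisances. The layer sizes and the vector-observation interface are illustrative (the paper works with image observations).

```python
import torch
import torch.nn as nn

class InverseDynamicsEncoder(nn.Module):
    """Encoder trained with an inverse-dynamics objective: recover the
    action from embeddings of consecutive states. Sizes are illustrative."""
    def __init__(self, obs_dim, act_dim, z_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, z_dim))
        self.inv = nn.Sequential(nn.Linear(2 * z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, act_dim))

    def loss(self, s, s_next, a):
        z = torch.cat([self.enc(s), self.enc(s_next)], dim=-1)
        return ((self.inv(z) - a) ** 2).mean()
```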
Shortcuts for causal discovery of nonlinear models by score matching
methods: Experiments on simulated data, with analysis and comparison of nonlinear models.
results: ScoreSort is more statistically efficient than prior score-matching-based methods on nonlinear models, and the most common synthetic benchmarks are shown to be score-sortable. The findings also highlight that the lack of diversity in the data is an important limitation when evaluating nonlinear causal discovery methods, and that thoroughly testing different settings and analyzing statistical properties are important considerations in causal discovery research.
Abstract
The use of simulated data in the field of causal discovery is ubiquitous due to the scarcity of annotated real data. Recently, Reisach et al., 2021 highlighted the emergence of patterns in simulated linear data, which display increasing marginal variance in the causal direction. As an ablation in their experiments, Montagna et al., 2023 found that similar patterns may emerge in nonlinear models for the variance of the score vector $\nabla \log p_{\mathbf{X}}$, and introduced the ScoreSort algorithm. In this work, we formally define and characterize this score-sortability pattern of nonlinear additive noise models. We find that it defines a class of identifiable (bivariate) causal models overlapping with nonlinear additive noise models. We theoretically demonstrate the advantages of ScoreSort in terms of statistical efficiency compared to prior state-of-the-art score matching-based methods, and empirically show the score-sortability of the most common synthetic benchmarks in the literature. Our findings highlight (1) the lack of diversity in the data as an important limitation in the evaluation of nonlinear causal discovery approaches, (2) the importance of thoroughly testing different settings within a problem class, and (3) the importance of analyzing statistical properties in causal discovery, where research is often limited to defining identifiability conditions of the model.
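A hedged sketch of the score-sortability idea: order variables by the variance of the corresponding score entries. Whether the consistent ordering is ascending or descending depends on the model class; ascending is shown here as an assumption, and the score estimator itself (e.g., from score matching) is taken as given.

```python
import numpy as np

def score_sort(score_fn, X):
    """Sketch: estimate Var_x[(d/dx_i) log p(x)] for each variable and
    order variables by it; under score-sortable models this ordering is
    aligned with the causal order. score_fn(X) must return the score
    matrix with one row per sample, one column per variable."""
    S = score_fn(X)                      # shape [n_samples, n_vars]
    return np.argsort(S.var(axis=0))     # candidate causal order
```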
Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective
results: Experiments show highly uncertain detection performance across independent modes, and that mode ensembling reduces this variance and improves the reliability of OoD detection.
Abstract
Existing Out-of-Distribution (OoD) detection methods aim to detect OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits and gradients in Deep Neural Networks (DNNs). In this work, we propose a new perspective based on the loss landscape and mode ensembles to investigate OoD detection. In the optimization of DNNs, there exist many local optima in the parameter space, or modes. Interestingly, we observe that these independent modes, which all reach low-loss regions with InD data (training and test data), yield significantly different loss landscapes with OoD data. Such an observation provides a novel view for investigating OoD detection from the loss landscape, and further suggests significantly fluctuating OoD detection performance across these modes. For instance, FPR values of the RankFeat method can range from 46.58% to 84.70% among 5 modes, showing uncertain detection performance evaluations across independent modes. Motivated by such diversity in the OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensembles, leading to improved performance and benefiting the OoD detector with reduced variances. Extensive experiments covering varied OoD detectors and network structures illustrate high variances across modes and also validate the superiority of mode ensembles in boosting OoD detection. We hope this work draws attention to independent modes in the OoD loss landscape and to more reliable evaluations of OoD detectors.
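A minimal sketch of mode ensembling for OoD scoring, using maximum softmax probability as the underlying detector (one of many detectors the paper evaluates); the ensemble simply averages predictions across independently trained modes.

```python
import torch
import torch.nn.functional as F

def mode_ensemble_ood_score(models, x):
    """Average the softmax predictions of independently trained modes and
    use negative maximum softmax probability as the OoD score, reducing
    the variance seen across individual modes."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in models]).mean(0)
    return -probs.max(dim=-1).values    # higher => more likely OoD
```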
SUT: Active Defects Probing for Transcompiler Models
results: Even powerful models such as ChatGPT still make mistakes on these basic unit tests; compared with previous program translation evaluation datasets, the pass rate drops by 26.15%. The evaluation harness also reveals the syntactic elements on which these models fall short.
Abstract
Automatic program translation has enormous application value and hence has been attracting significant interest from AI researchers. However, we observe that current program translation models still make elementary syntax errors, particularly when the target language lacks syntax elements present in the source language. Metrics like BLEU, CodeBLEU and computation accuracy may not expose these issues. In this paper we introduce new metrics for programming language translation that address these basic syntax errors. We develop a novel active defects probing suite called Syntactic Unit Tests (SUT), which includes a highly interpretable evaluation harness for accuracy and test scoring. Experiments have shown that even powerful models like ChatGPT still make mistakes on these basic unit tests. Specifically, compared to previous program translation task evaluation datasets, the pass rate on our unit tests has decreased by 26.15%. Furthermore, our evaluation harness reveals the syntactic elements on which these models exhibit deficiencies.
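To illustrate the spirit of a syntactic unit test, here is a toy harness: translate a snippet that isolates one syntax element, then check that the output parses and behaves correctly. The names, the Python-as-target assumption, and the `f` entry-point convention are all illustrative, not the paper's API.

```python
def run_syntactic_unit_test(translate, source_snippet, reference_io):
    """Toy harness in the spirit of SUT: a test passes only if the
    translated snippet parses and reproduces the reference behavior.
    translate: callable mapping source code to target code;
    reference_io: list of (args, expected_output) pairs for function f."""
    translated = translate(source_snippet)
    namespace = {}
    try:
        exec(translated, namespace)   # assumes the target language is Python
    except SyntaxError:
        return False                  # elementary syntax error detected
    fn = namespace.get("f")
    if fn is None:
        return False
    return all(fn(*args) == out for args, out in reference_io)
```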
Prompt Engineering Through the Lens of Optimal Control
results: Proposes an optimal control framework that systematizes and enables rigorous improvement of multi-round PE, and extends it to PE via ensemble methods and multi-agent collaboration; these methods improve the efficiency and effectiveness of human-machine interaction and offer better interpretability.
Abstract
Prompt Engineering (PE) has emerged as a critical technique for guiding Large Language Models (LLMs) in solving intricate tasks. Its importance is highlighted by its potential to significantly enhance the efficiency and effectiveness of human-machine interaction. As tasks grow increasingly complex, recent advanced PE methods have extended beyond the limitations of single-round interactions to embrace multi-round interactions, which allows for a deeper and more nuanced engagement with LLMs. In this paper, we propose an optimal control framework tailored for multi-round interactions with LLMs. This framework provides a unified mathematical structure that not only systematizes the existing PE methods but also sets the stage for rigorous analytical improvements. Furthermore, we extend this framework to include PE via ensemble methods and multi-agent collaboration, thereby enlarging the scope of applicability. By adopting an optimal control perspective, we offer fresh insights into existing PE methods and highlight theoretical challenges that warrant future research. Besides, our work lays a foundation for the development of more effective and interpretable PE methods.
Improved Techniques for Training Consistency Models
paper_authors: Yang Song, Prafulla Dhariwal
for: This paper focuses on improving the quality of consistency models, a type of generative model that can sample high-quality data in one step without the need for adversarial training.
methods: The authors present several improved techniques for consistency training, including eliminating the Exponential Moving Average from the teacher consistency model, adopting Pseudo-Huber losses, and introducing a lognormal noise schedule.
results: The authors achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet $64\times 64$ respectively in a single sampling step, marking a 3.5$\times$ and 4$\times$ improvement over prior consistency training approaches. Two-step sampling further reduces the FID scores to 2.24 and 2.77 on these two datasets, surpassing those obtained via distillation in both one-step and two-step settings.
Abstract
Consistency models are a nascent family of generative models that can sample high quality data in one step without the need for adversarial training. Current consistency models achieve optimal sample quality by distilling from pre-trained diffusion models and employing learned metrics such as LPIPS. However, distillation limits the quality of consistency models to that of the pre-trained diffusion model, and LPIPS causes undesirable bias in evaluation. To tackle these challenges, we present improved techniques for consistency training, where consistency models learn directly from data without distillation. We delve into the theory behind consistency training and identify a previously overlooked flaw, which we address by eliminating Exponential Moving Average from the teacher consistency model. To replace learned metrics like LPIPS, we adopt Pseudo-Huber losses from robust statistics. Additionally, we introduce a lognormal noise schedule for the consistency training objective, and propose to double total discretization steps every set number of training iterations. Combined with better hyperparameter tuning, these modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet $64\times 64$ respectively in a single sampling step. These scores mark a 3.5$\times$ and 4$\times$ improvement compared to prior consistency training approaches. Through two-step sampling, we further reduce FID scores to 2.24 and 2.77 on these two datasets, surpassing those obtained via distillation in both one-step and two-step settings, while narrowing the gap between consistency models and other state-of-the-art generative models.
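Two of the proposed ingredients are compact enough to sketch directly: the Pseudo-Huber loss $\sqrt{\lVert x - y\rVert^2 + c^2} - c$ and a lognormal noise schedule. The constants below are illustrative; the paper ties $c$ to the data dimension.

```python
import torch

def pseudo_huber(x, y, c=0.03):
    """Pseudo-Huber metric sqrt(||x - y||^2 + c^2) - c, a robust
    replacement for learned metrics like LPIPS; expects batched inputs
    of shape [B, ...] and returns a per-sample loss. c is illustrative."""
    d2 = ((x - y) ** 2).flatten(1).sum(dim=-1)
    return (d2 + c ** 2).sqrt() - c

def lognormal_sigmas(n, p_mean=-1.1, p_std=2.0):
    """Sample noise levels from a lognormal schedule,
    sigma = exp(p_mean + p_std * N(0, 1)); parameters are illustrative."""
    return torch.exp(p_mean + p_std * torch.randn(n))
```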
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
results: Establishes the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model, showing that when part of the expert parameters vanish these rates are slower than polynomial rates, and that a modified class of softmax gating functions significantly improves the parameter estimation rates.
Abstract
Mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to comprehend the behavior of that model under the regression settings through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, such analysis under the setting of a classification problem has remained missing in the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential equations. To address this issue, we propose using a novel class of modified softmax gating functions which transform the input value before delivering them to the gating functions. As a result, the previous interaction disappears and the parameter estimation rates are significantly improved.
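For concreteness, a minimal softmax-gating multinomial logistic MoE forward pass, $p(y \mid x) = \sum_k \mathrm{softmax}_k(g(x))\,\mathrm{softmax}(e_k(x))_y$; the modified gating functions the paper proposes would transform $x$ before the gate, which is omitted here.

```python
import torch
import torch.nn as nn

class SoftmaxGatedMoE(nn.Module):
    """Softmax-gating multinomial logistic mixture of experts:
    the output is a gate-weighted mixture of per-expert class probabilities."""
    def __init__(self, d, n_classes, n_experts):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d, n_classes)
                                     for _ in range(n_experts))

    def forward(self, x):
        g = torch.softmax(self.gate(x), dim=-1)          # [B, K]
        p = torch.stack([torch.softmax(e(x), dim=-1)
                         for e in self.experts], dim=1)  # [B, K, C]
        return (g.unsqueeze(-1) * p).sum(dim=1)          # mixture p(y|x)
```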
Learning Invariant Molecular Representation in Latent Discrete Space
results: Extensive experiments on 18 real-world molecular datasets show that the model generalizes better under distribution shifts than state-of-the-art baselines. Code is available at https://github.com/HICAI-ZJU/iMoLD.
Abstract
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called ``first-encoding-then-separation'' to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates the over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which enables our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
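A hedged sketch of a residual vector quantization module with a straight-through gradient estimator; the codebook size and the exact use of the residual follow common VQ practice rather than the paper's precise design.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Quantize an encoding against a codebook and also return the
    residual; gradients flow through the quantization via the
    straight-through estimator. Details are illustrative."""
    def __init__(self, dim, codebook_size=512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):
        d = torch.cdist(z, self.codebook.weight)   # [B, K] distances
        code = self.codebook(d.argmin(dim=-1))     # nearest codeword
        z_q = z + (code - z).detach()              # straight-through
        return z_q, z - code.detach()              # quantized, residual
```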
results: Ensemble learning enhances the ability of GNNs to analyze complex graph-structured data, improving overall accuracy, reducing bias and variance, and mitigating the impact of noisy data.
Abstract
Graph Neural Networks (GNNs) have shown success in various fields for learning from graph-structured data. This paper investigates the application of ensemble learning techniques to improve the performance and robustness of Graph Neural Networks (GNNs). By training multiple GNN models with diverse initializations or architectures, we create an ensemble model named ELGNN that captures various aspects of the data and uses the Tree-Structured Parzen Estimator algorithm to determine the ensemble weights. Combining the predictions of these models enhances overall accuracy, reduces bias and variance, and mitigates the impact of noisy data. Our findings demonstrate the efficacy of ensemble learning in enhancing GNN capabilities for analyzing complex graph-structured data. The code is public at https://github.com/wongzhenhao/ELGNN.
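A sketch of the weight-search step with Optuna's TPE sampler, assuming per-model validation probabilities are available; maximizing validation accuracy is our assumed tuning objective.

```python
import numpy as np
import optuna

def fit_ensemble_weights(val_probs, y_val, n_trials=200):
    """Search convex combination weights for per-model predicted
    probabilities with the Tree-structured Parzen Estimator.
    val_probs: list of [n_val, n_classes] arrays, one per GNN."""
    def objective(trial):
        w = np.array([trial.suggest_float(f"w{i}", 0.0, 1.0)
                      for i in range(len(val_probs))])
        w = w / max(w.sum(), 1e-12)
        blended = sum(wi * p for wi, p in zip(w, val_probs))
        return (blended.argmax(1) == y_val).mean()
    study = optuna.create_study(direction="maximize",
                                sampler=optuna.samplers.TPESampler(seed=0))
    study.optimize(objective, n_trials=n_trials)
    w = np.array([study.best_params[f"w{i}"] for i in range(len(val_probs))])
    return w / max(w.sum(), 1e-12)
```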
methods: The authors propose an efficient algorithm that guarantees an approximately sublinear regret in the full-information and bandit feedback settings.
results: The algorithm guarantees an approximately sublinear regret in the adversarial setting, helping avoid the echo-chamber effect and comply with regulatory requirements.
Abstract
Contextual bandit algorithms are at the core of many applications, including recommender systems, clinical trials, and optimal portfolio selection. One of the most popular problems studied in the contextual bandit literature is to maximize the sum of the rewards in each round by ensuring a sublinear regret against the best-fixed context-dependent policy. However, in many applications, the cumulative reward is not the right objective - the bandit algorithm must be fair in order to avoid the echo-chamber effect and comply with the regulatory requirements. In this paper, we consider the $\alpha$-Fair Contextual Bandits problem, where the objective is to maximize the global $\alpha$-fair utility function - a non-decreasing concave function of the cumulative rewards in the adversarial setting. The problem is challenging due to the non-separability of the objective across rounds. We design an efficient algorithm that guarantees an approximately sublinear regret in the full-information and bandit feedback settings.
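For reference, the standard $\alpha$-fair utility applied to cumulative rewards (the paper's exact normalization may differ): $u_\alpha(x) = x^{1-\alpha}/(1-\alpha)$ for $\alpha \neq 1$ and $\log x$ for $\alpha = 1$, summed over arms.

```python
import numpy as np

def alpha_fair_utility(rewards, alpha):
    """Global alpha-fair utility of cumulative per-arm rewards: a
    non-decreasing concave function, u(x) = x^(1-alpha)/(1-alpha) for
    alpha != 1 and log(x) for alpha == 1, summed over arms."""
    x = np.asarray(rewards, dtype=float)
    if alpha == 1.0:
        return np.log(x).sum()
    return (x ** (1.0 - alpha) / (1.0 - alpha)).sum()
```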
Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
results: Experiments show that by producing various augmented instances, AdaSolver improves the efficiency of MILP solvers and achieves remarkable improvements across various distributions.
Abstract
Machine learning has been successfully applied to improve the efficiency of Mixed-Integer Linear Programming (MILP) solvers. However, the learning-based solvers often suffer from severe performance degradation on unseen MILP instances -- especially on large-scale instances from a perturbed environment -- due to the limited diversity of training distributions. To tackle this problem, we propose a novel approach, called Adversarial Instance Augmentation, which does not require knowing the problem type of new instances, to promote data diversity for learning-based branching modules in branch-and-bound (B&B) solvers (AdaSolver). We use bipartite graph representations for MILP instances and obtain various perturbed instances to regularize the solver by augmenting the graph structures with a learned augmentation policy. The major technical contribution of AdaSolver is that we formulate the non-differentiable instance augmentation as a contextual bandit problem and adversarially train the learning-based solver and the augmentation policy, enabling efficient gradient-based training of the augmentation policy. To the best of our knowledge, AdaSolver is the first general and effective framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) B&B solvers. Extensive experiments demonstrate that by producing various augmented instances, AdaSolver leads to a remarkable efficiency improvement across various distributions.