cs.LG - 2023-10-18

No-Regret Learning in Bilateral Trade via Global Budget Balance

  • paper_url: http://arxiv.org/abs/2310.12370
  • repo_url: None
  • paper_authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Federico Fusco
  • for: Studies online bilateral trade, where a seller and a buyer both hold private valuations and the learner must set prices without any knowledge of those valuations.
  • methods: Introduces the notion of global budget balance, which only requires the learner to be budget balanced over the entire time horizon; this relaxation enables the first no-regret algorithms for bilateral trade with adversarial inputs under various feedback models.
  • results: In the full-feedback model the learner guarantees $\tilde{O}(\sqrt{T})$ regret against the best fixed prices in hindsight, which is order-wise optimal; with partial (one-bit) feedback, an algorithm achieving a $\tilde{O}(T^{3/4})$ regret upper bound is given, complemented by a nearly-matching lower bound.
    Abstract Bilateral trade revolves around the challenge of facilitating transactions between two strategic agents -- a seller and a buyer -- both of whom have a private valuation for the item. We study the online version of the problem, in which at each time step a new seller and buyer arrive. The learner's task is to set a price for each agent, without any knowledge about their valuations. The sequence of sellers and buyers is chosen by an oblivious adversary. In this setting, known negative results rule out the possibility of designing algorithms with sublinear regret when the learner has to guarantee budget balance for each iteration. In this paper, we introduce the notion of global budget balance, which requires the agent to be budget balanced only over the entire time horizon. By requiring global budget balance, we provide the first no-regret algorithms for bilateral trade with adversarial inputs under various feedback models. First, we show that in the full-feedback model the learner can guarantee $\tilde{O}(\sqrt{T})$ regret against the best fixed prices in hindsight, which is order-wise optimal. Then, in the case of partial feedback models, we provide an algorithm guaranteeing a $\tilde{O}(T^{3/4})$ regret upper bound with one-bit feedback, which we complement with a nearly-matching lower bound. Finally, we investigate how these results vary when measuring regret using an alternative benchmark.
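As a point of reference for the full-feedback setting described above, the sketch below runs a generic exponential-weights (Hedge) learner over a discretized grid of posted prices. It is not the paper's globally budget-balanced algorithm; the grid size, learning rate, and gain-from-trade reward are illustrative assumptions.

```python
import numpy as np

def hedge_prices(valuations, grid_size=20, eta=0.1, seed=0):
    """Generic exponential-weights (Hedge) learner posting a single price to both
    agents each round, under full feedback (both valuations revealed afterwards).
    Not the paper's algorithm: global budget balance is not modeled here.
    """
    rng = np.random.default_rng(seed)
    prices = np.linspace(0, 1, grid_size)       # candidate prices on [0, 1]
    weights = np.ones(grid_size)
    total_gain = 0.0
    for s, b in valuations:                     # s = seller value, b = buyer value
        probs = weights / weights.sum()
        p = prices[rng.choice(grid_size, p=probs)]
        total_gain += (b - s) if s <= p <= b else 0.0   # gain from trade when both accept
        # full feedback: the counterfactual gain of every grid price is observable
        gains = (b - s) * ((prices >= s) & (prices <= b))
        weights *= np.exp(eta * gains)
        weights /= weights.sum()                # keep weights normalized
    return total_gain

rng = np.random.default_rng(1)
vals = np.sort(rng.uniform(size=(1000, 2)), axis=1)   # ensure seller value <= buyer value
print(hedge_prices(vals))
```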

MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits

  • paper_url: http://arxiv.org/abs/2310.12359
  • repo_url: None
  • paper_authors: Yuhang Zhang, Marcos Quinones-Grueiro, Zhiyao Zhang, Yanbing Wang, William Barbour, Gautam Biswas, Daniel Work
  • for: Proposes MARVEL, a multi-agent reinforcement learning (MARL) framework for large-scale variable speed limit (VSL) control aimed at enhancing traffic safety and mobility.
  • methods: The MARL framework relies only on commonly available data; agents learn through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, enabling coordination, and parameter sharing among all VSL agents lets the framework scale to corridors with many gantries.
  • results: Trained in microsimulation on an 8-gantry, 7-mile stretch and tested on 34 gantries spanning 17 miles of I-24 near Nashville, MARVEL improves traffic safety by 63.4% over the no-control scenario and mobility by 14.6% over a deployed state-of-the-practice algorithm; an explainability analysis examines the learned policy under different traffic conditions, and the policy is also run on real I-24 input data to illustrate deployability.
    Abstract Variable speed limit (VSL) control is a promising traffic management strategy for enhancing safety and mobility. This work introduces MARVEL, a multi-agent reinforcement learning (MARL) framework for implementing large-scale VSL control on freeway corridors using only commonly available data. The agents learn through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, enabling coordination among the agents. The proposed framework scales to cover corridors with many gantries thanks to parameter sharing among all VSL agents. The agents are trained in a microsimulation environment based on a short freeway stretch with 8 gantries spanning 7 miles and tested with 34 gantries spanning 17 miles of I-24 near Nashville, TN. MARVEL improves traffic safety by 63.4% compared to the no control scenario and enhances traffic mobility by 14.6% compared to a state-of-the-practice algorithm that has been deployed on I-24. An explainability analysis is undertaken to explore the learned policy under different traffic conditions and the results provide insights into the decision-making process of agents. Finally, we test the policy learned from the simulation-based experiments on real input data from I-24 to illustrate the potential deployment capability of the learned policy.

Networkwide Traffic State Forecasting Using Exogenous Information: A Multi-Dimensional Graph Attention-Based Approach

  • paper_url: http://arxiv.org/abs/2310.12353
  • repo_url: None
  • paper_authors: Syed Islam, Monika Filipovska
  • for: Addresses network-wide traffic state forecasting, which underpins traffic management and control strategies as well as user- and system-level decision making.
  • methods: Proposes a multi-dimensional spatio-temporal graph attention-based traffic prediction approach (M-STGAT) that predicts traffic from past observations of speed, lane closure events, temperature, and visibility, while also learning from the structure of the transportation network on which these variables are observed.
  • results: Using traffic speed and lane closure data from the Caltrans Performance Measurement System (PeMS) together with NOAA Automated Surface Observing Systems (ASOS) weather data, M-STGAT outperforms three alternative models at 30-, 45-, and 60-minute prediction horizons in terms of MAE, RMSE, and MAPE; its transferability to other data sets may require further investigation.
    Abstract Traffic state forecasting is crucial for traffic management and control strategies, as well as user- and system-level decision making in the transportation network. While traffic forecasting has been approached with a variety of techniques over the last couple of decades, most approaches simply rely on endogenous traffic variables for state prediction, despite the evidence that exogenous factors can significantly impact traffic conditions. This paper proposes a multi-dimensional spatio-temporal graph attention-based traffic prediction approach (M-STGAT), which predicts traffic based on past observations of speed, along with lane closure events, temperature, and visibility across the transportation network. The approach is based on a graph attention network architecture, which also learns based on the structure of the transportation network on which these variables are observed. Numerical experiments are performed using traffic speed and lane closure data from the California Department of Transportation (Caltrans) Performance Measurement System (PeMS). The corresponding weather data were downloaded from the National Oceanic and Atmospheric Administration (NOAA) Automated Surface Observing Systems (ASOS). For comparison, the numerical experiments implement three alternative models which do not allow for the multi-dimensional input. The M-STGAT is shown to outperform the three alternative models, when performing tests using our primary data set for prediction with a 30-, 45-, and 60-minute prediction horizon, in terms of three error measures: Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). However, the model's transferability can vary for different transfer data sets and this aspect may require further investigation.
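To make the attention component concrete, here is a minimal single-head graph-attention layer in plain numpy, in the spirit of the GAT mechanism that M-STGAT builds on. The shapes, the LeakyReLU slope, and the toy ring graph are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def graph_attention_layer(X, A, W, a_src, a_dst, alpha=0.2):
    """Single-head graph-attention layer (GAT-style) in plain numpy.

    X : (n, d_in) node features          A : (n, n) adjacency, 1 where an edge exists
    W : (d_in, d_out) projection         a_src, a_dst : (d_out,) attention vectors
    """
    H = X @ W                                             # projected node features
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]       # raw score e_ij
    e = np.where(e > 0, e, alpha * e)                     # LeakyReLU
    e = np.where(A > 0, e, -1e9)                          # mask non-neighbours
    attn = np.exp(e - e.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)               # row-wise softmax over neighbours
    return attn @ H                                       # attention-weighted aggregation

# toy example: 4 nodes in a ring, 3 input features, 2 output features
rng = np.random.default_rng(0)
A = np.eye(4) + np.roll(np.eye(4), 1, axis=1) + np.roll(np.eye(4), -1, axis=1)
out = graph_attention_layer(rng.normal(size=(4, 3)), A,
                            rng.normal(size=(3, 2)),
                            rng.normal(size=2), rng.normal(size=2))
print(out.shape)  # (4, 2)
```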

Equipping Federated Graph Neural Networks with Structure-aware Group Fairness

  • paper_url: http://arxiv.org/abs/2310.12350
  • repo_url: https://github.com/yuening-lab/f2gnn
  • paper_authors: Nan Cui, Xiuling Wang, Wendy Hui Wang, Violet Chen, Yue Ning
  • for: Aims to mitigate bias in federated Graph Neural Networks (GNNs) so that group fairness is preserved when GNNs are trained in a distributed setting.
  • methods: The method rests on two key components: a fairness-aware local model update scheme on the client side, and a fairness-weighted global model update scheme that takes both the fairness metrics of local models and data bias into account during aggregation.
  • results: Outperforms a number of baseline methods, with clear gains in both fairness and model accuracy.
    Abstract Graph Neural Networks (GNNs) have been widely used for various types of graph data processing and analytical tasks in different domains. Training GNNs over centralized graph data can be infeasible due to privacy concerns and regulatory restrictions. Thus, federated learning (FL) becomes a trending solution to address this challenge in a distributed learning paradigm. However, as GNNs may inherit historical bias from training data and lead to discriminatory predictions, the bias of local models can be easily propagated to the global model in distributed settings. This poses a new challenge in mitigating bias in federated GNNs. To address this challenge, we propose $\text{F}^2$GNN, a Fair Federated Graph Neural Network, that enhances group fairness of federated GNNs. As bias can be sourced from both data and learning algorithms, $\text{F}^2$GNN aims to mitigate both types of bias under federated settings. First, we provide theoretical insights on the connection between data bias in a training graph and statistical fairness metrics of the trained GNN models. Based on the theoretical analysis, we design $\text{F}^2$GNN which contains two key components: a fairness-aware local model update scheme that enhances group fairness of the local models on the client side, and a fairness-weighted global model update scheme that takes both data bias and fairness metrics of local models into consideration in the aggregation process. We evaluate $\text{F}^2$GNN empirically versus a number of baseline methods, and demonstrate that $\text{F}^2$GNN outperforms these baselines in terms of both fairness and model accuracy.
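As a rough illustration of the fairness-weighted aggregation idea, the sketch below down-weights clients with larger group-fairness gaps during a FedAvg-style update. The exponential penalty, the `beta` parameter, and the single-layer toy model are assumptions for illustration, not the paper's exact weighting rule.

```python
import numpy as np

def fairness_weighted_aggregate(client_params, fairness_gaps, data_sizes, beta=5.0):
    """Aggregate client model parameters, down-weighting clients with larger
    group-fairness gaps (e.g. demographic-parity difference).

    client_params : list of dicts {layer_name: np.ndarray}
    fairness_gaps : per-client fairness gap in [0, 1] (smaller = fairer)
    data_sizes    : per-client number of training nodes/samples
    """
    fairness_gaps = np.asarray(fairness_gaps, dtype=float)
    data_sizes = np.asarray(data_sizes, dtype=float)
    # combine the usual FedAvg size weighting with an exponential fairness penalty
    w = data_sizes * np.exp(-beta * fairness_gaps)
    w /= w.sum()
    agg = {}
    for name in client_params[0]:
        agg[name] = sum(wi * params[name] for wi, params in zip(w, client_params))
    return agg

# toy example with two clients and a single-layer model
clients = [{"W": np.ones((2, 2))}, {"W": 3 * np.ones((2, 2))}]
print(fairness_weighted_aggregate(clients, fairness_gaps=[0.05, 0.30], data_sizes=[100, 100])["W"])
```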

Tracking electricity losses and their perceived causes using nighttime light and social media

  • paper_url: http://arxiv.org/abs/2310.12346
  • repo_url: None
  • paper_authors: Samuel W Kerber, Nicholas A Duncan, Guillaume F LHer, Morgan Bazilian, Chris Elvidge, Mark R Deinert
  • for: Aims to monitor power outages and their publicly perceived causes using satellite imagery, social media, and information extraction.
  • methods: Uses nighttime light data (March 2019, Caracas, Venezuela) to indicate blackout regions, and Twitter data, statistical analysis, and topic modeling to gauge public sentiment and the causes the public attributes to the outages.
  • results: Finds an inverse relationship between nighttime light intensity and blackout regions; tweets mentioning the Venezuelan President show heightened negativity and a greater prevalence of blame-related terms, suggesting a perception of government accountability for the outages.
    Abstract Urban environments are intricate systems where the breakdown of critical infrastructure can impact both the economic and social well-being of communities. Electricity systems hold particular significance, as they are essential for other infrastructure, and disruptions can trigger widespread consequences. Typically, assessing electricity availability requires ground-level data, a challenge in conflict zones and regions with limited access. This study shows how satellite imagery, social media, and information extraction can monitor blackouts and their perceived causes. Night-time light data (in March 2019 for Caracas, Venezuela) is used to indicate blackout regions. Twitter data is used to determine sentiment and topic trends, while statistical analysis and topic modeling delved into public perceptions regarding blackout causes. The findings show an inverse relationship between nighttime light intensity and blackout regions. Tweets mentioning the Venezuelan President displayed heightened negativity and a greater prevalence of blame-related terms, suggesting a perception of government accountability for the outages.

Open-Set Multivariate Time-Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.12294
  • repo_url: None
  • paper_authors: Thomas Lai, Thi Kieu Khanh Ho, Narges Armanfard
  • for: Proposes a new approach to open-set time-series anomaly detection (TSAD), in which only a small number of labeled anomalies from a limited set of classes are available during training and unseen anomaly classes must also be detected.
  • methods: The method consists of three main modules: a Feature Extractor for meaningful time-series features; a Multi-head Network with generative, deviation, and contrastive heads for capturing both seen and unseen anomaly classes; and an Anomaly Scoring module.
  • results: Extensive experiments on three real-world datasets show that the method surpasses existing approaches under various settings, establishing a new state-of-the-art in TSAD.
    Abstract Numerous time series anomaly detection (TSAD) methods have emerged in recent years. Most existing methods are unsupervised and assume the availability of normal training samples only, while few supervised methods have shown superior performance by incorporating labeled anomalous samples in the training phase. However, certain anomaly types are inherently challenging for unsupervised methods to differentiate from normal data, while supervised methods are constrained to detecting anomalies resembling those present during training, failing to generalize to unseen anomaly classes. This paper is the first attempt in providing a novel approach for the open-set TSAD problem, in which a small number of labeled anomalies from a limited class of anomalies are visible in the training phase, with the objective of detecting both seen and unseen anomaly classes in the test phase. The proposed method, called Multivariate Open-Set timeseries Anomaly Detection (MOSAD) consists of three primary modules: a Feature Extractor to extract meaningful time-series features; a Multi-head Network consisting of Generative-, Deviation-, and Contrastive heads for capturing both seen and unseen anomaly classes; and an Anomaly Scoring module leveraging the insights of the three heads to detect anomalies. Extensive experiments on three real-world datasets consistently show that our approach surpasses existing methods under various experimental settings, thus establishing a new state-of-the-art performance in the TSAD field.

A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs

  • paper_url: http://arxiv.org/abs/2310.12248
  • repo_url: None
  • paper_authors: Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
  • for: Expressing non-Markovian objectives in reinforcement learning using linear temporal logic (LTL) and omega-regular objectives, a superset of LTL.
  • methods: A model-based probably approximately correct (PAC) learning algorithm that learns from sampled trajectories of the system and requires no prior knowledge of the system's topology.
  • results: A PAC learning algorithm for omega-regular objectives in Markov decision processes.
    Abstract Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.

Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows

  • paper_url: http://arxiv.org/abs/2310.12209
  • repo_url: None
  • paper_authors: David Shih, Marat Freytsis, Stephen R. Taylor, Jeff A. Dror, Nolan Smyth
  • for: Aims to make Bayesian posterior inference for pulsar timing arrays (PTAs) dramatically more efficient.
  • methods: Trains conditional normalizing flows on simulated data to estimate the posterior distribution of the stochastic gravitational wave background (SGWB) quickly and accurately.
  • results: Reduces the time needed to produce SGWB posteriors from days or weeks with MCMC to a matter of seconds.
    Abstract Pulsar timing arrays (PTAs) perform Bayesian posterior inference with expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing residuals each, producing a posterior distribution for the stochastic gravitational wave background (SGWB) can take days to a week. The computational bottleneck arises because the likelihood evaluation required for MCMC is extremely costly when considering the dimensionality of the search space. Fortunately, generating simulated data is fast, so modern simulation-based inference techniques can be brought to bear on the problem. In this paper, we demonstrate how conditional normalizing flows trained on simulated data can be used for extremely fast and accurate estimation of the SGWB posteriors, reducing the sampling time from weeks to a matter of seconds.

Dynamic financial processes identification using sparse regressive reservoir computers

  • paper_url: http://arxiv.org/abs/2310.12144
  • repo_url: https://github.com/fredyvides/dynet-cnbs
  • paper_authors: Fredy Vides, Idelfonso B. R. Nogueira, Lendy Banegas, Evelyn Flores
  • For: Presents key findings in structured matrix approximation theory, applied to regressive representations of dynamic financial processes.
  • Methods: Uses generic nonlinear time-delay embedding of the time-series data together with sparse least-squares and structured matrix approximation to identify approximate representations of the output coupling matrices.
  • Results: The resulting prototypical algorithms enable approximate identification and predictive simulation of dynamic financial and economic processes, including scenarios that may or may not exhibit chaotic behavior.
    Abstract In this document, we present key findings in structured matrix approximation theory, with applications to the regressive representation of dynamic financial processes. Initially, we explore a comprehensive approach involving generic nonlinear time delay embedding for time series data extracted from a financial or economic system under examination. Subsequently, we employ sparse least-squares and structured matrix approximation methods to discern approximate representations of the output coupling matrices. These representations play a pivotal role in establishing the regressive models corresponding to the recursive structures inherent in a given financial system. The document further introduces prototypical algorithms that leverage the aforementioned techniques. These algorithms are demonstrated through applications in approximate identification and predictive simulation of dynamic financial and economic processes, encompassing scenarios that may or may not exhibit chaotic behavior.
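A minimal sketch of the pipeline described above (nonlinear time-delay embedding followed by a sparse least-squares fit), using a synthetic AR(2) series as a stand-in for financial data. The lag count and Lasso penalty are illustrative choices, and this is not the code in the linked repository.

```python
import numpy as np
from sklearn.linear_model import Lasso

def delay_embed(z, lags):
    """Time-delay embedding: row t is [z_{t+lags-1}, z_{t+lags-2}, ..., z_t]."""
    n = len(z) - lags + 1
    return np.column_stack([z[lags - 1 - k : lags - 1 - k + n] for k in range(lags)])

# synthetic "financial" series: a noisy AR(2) process stands in for real data
rng = np.random.default_rng(1)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + 0.5 * rng.normal()

lags = 8
X = delay_embed(x[:-1], lags)          # delayed states ...
y = x[lags:]                           # ... predict the next value
model = Lasso(alpha=0.01).fit(X, y)    # sparse least-squares surrogate for the coupling row
print("recovered sparse coefficients:", np.round(model.coef_, 3))
```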

Automatic prediction of mortality in patients with mental illness using electronic health records

  • paper_url: http://arxiv.org/abs/2310.12121
  • repo_url: None
  • paper_authors: Sean Kim, Samuel Kim
  • for: Predicting 30-day mortality in patients with mental illness.
  • methods: Predictive machine-learning models built on electronic health records (EHR) from the MIMIC-III dataset, comparing four algorithms: Logistic Regression, Random Forest, Support Vector Machine, and K-Nearest Neighbors.
  • results: The Random Forest and Support Vector Machine models performed best, with AUC scores of 0.911; feature importance analysis showed that drug prescriptions, particularly Morphine Sulfate, are strongly predictive.
    Abstract Mental disorders impact the lives of millions of people globally, not only impeding their day-to-day lives but also markedly reducing life expectancy. This paper addresses the persistent challenge of predicting mortality in patients with mental diagnoses using predictive machine-learning models with electronic health records (EHR). Data from patients with mental disease diagnoses were extracted from the well-known clinical MIMIC-III data set utilizing demographic, prescription, and procedural information. Four machine learning algorithms (Logistic Regression, Random Forest, Support Vector Machine, and K-Nearest Neighbors) were used, with results indicating that Random Forest and Support Vector Machine models outperformed others, with AUC scores of 0.911. Feature importance analysis revealed that drug prescriptions, particularly Morphine Sulfate, play a pivotal role in prediction. We applied a variety of machine learning algorithms to predict 30-day mortality followed by feature importance analysis. This study can be used to assist hospital workers in identifying at-risk patients to reduce excess mortality.
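A small stand-in for the modeling pipeline described above: a Random Forest trained on synthetic, imbalanced tabular data, evaluated with AUC and followed by a feature-importance ranking. The synthetic features are a placeholder for the MIMIC-III demographic, prescription, and procedural variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# synthetic, imbalanced "30-day mortality" data standing in for MIMIC-III features
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, proba), 3))

# feature-importance ranking analogous to the paper's analysis
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("top features:", top)
```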

MMD-based Variable Importance for Distributional Random Forest

  • paper_url: http://arxiv.org/abs/2310.12115
  • repo_url: None
  • paper_authors: Clément Bénard, Jeffrey Näf, Julie Josse
  • for: Proposes a variable importance measure for Distributional Random Forests, which estimate the full conditional distribution of a multivariate output of interest given input variables.
  • methods: The measure is based on the well-established drop-and-relearn principle and the MMD distance, so it detects variables that affect the output distribution in general rather than only the output mean.
  • results: The introduced importance measure is consistent, shows high empirical performance on real and simulated data, and outperforms competitors; in particular, it efficiently selects variables through recursive feature elimination, yielding small variable sets that still give accurate estimates of conditional output distributions.
    Abstract Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for DRFs, based on the well-established drop and relearn principle and MMD distance. While traditional importance measures only detect variables with an influence on the output mean, our algorithm detects variables impacting the output distribution more generally. We show that the introduced importance measure is consistent, exhibits high empirical performance on both real and simulated data, and outperforms competitors. In particular, our algorithm is highly efficient to select variables through recursive feature elimination, and can therefore provide small sets of variables to build accurate estimates of conditional output distributions.
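To make the drop-and-relearn-plus-MMD idea concrete, here is a simplified sketch that permutes one input variable at a time, refits a stand-in predictor, and scores the change in the predicted outputs with an RBF-kernel MMD. The permutation trick, the k-NN predictor, and the kernel bandwidth are illustrative assumptions rather than the paper's DRF-specific procedure.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def mmd_rbf(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y, RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def drop_and_relearn_importance(fit_predict, X, Y, X_test, seed=0):
    """Importance of variable j = MMD between reference predictions and predictions
    obtained after breaking the link between variable j and the output (by permutation)."""
    rng = np.random.default_rng(seed)
    ref = fit_predict(X, Y, X_test)
    scores = []
    for j in range(X.shape[1]):
        Xj = X.copy()
        Xj[:, j] = rng.permutation(Xj[:, j])
        scores.append(mmd_rbf(ref, fit_predict(Xj, Y, X_test)))
    return np.array(scores)

# toy usage with a k-NN regressor standing in for the Distributional Random Forest
def knn_fit_predict(Xtr, Ytr, Xte):
    return KNeighborsRegressor(n_neighbors=5).fit(Xtr, Ytr).predict(Xte)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
Y = np.column_stack([X[:, 0] ** 2, X[:, 1]]) + 0.1 * rng.normal(size=(300, 2))
print(drop_and_relearn_importance(knn_fit_predict, X, Y, X[:50]).round(3))
```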

Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

  • paper_url: http://arxiv.org/abs/2310.12109
  • repo_url: https://github.com/HazyResearch/m2
  • paper_authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré
  • for: Asks whether there are performant architectures that scale sub-quadratically in both sequence length and model dimension.
  • methods: Proposes Monarch Mixer (M2), an architecture that uses the same sub-quadratic primitive along both axes: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically.
  • results: As a proof of concept, M2 is evaluated in three domains: for non-causal BERT-style language modeling it matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters and up to 9.1x higher throughput at sequence length 4K; on ImageNet it outperforms ViT-b by 1% in accuracy with only half the parameters; and a new causal parameterization, based on multivariate polynomial evaluation and interpolation, lets M2 match GPT-style Transformers at 360M parameters in pretraining perplexity on The PILE.
    Abstract Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically. As a proof of concept, we explore the performance of M2 in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling. For non-causal BERT-style modeling, M2 matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters, and achieves up to 9.1$\times$ higher throughput at sequence length 4K. On ImageNet, M2 outperforms ViT-b by 1% in accuracy, with only half the parameters. Causal GPT-style models introduce a technical challenge: enforcing causality via masking introduces a quadratic bottleneck. To alleviate this bottleneck, we develop a novel theoretical view of Monarch matrices based on multivariate polynomial evaluation and interpolation, which lets us parameterize M2 to be causal while remaining sub-quadratic. Using this parameterization, M2 matches GPT-style Transformers at 360M parameters in pretraining perplexity on The PILE--showing for the first time that it may be possible to match Transformer quality without attention or MLPs.
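A toy, square-case sketch of the Monarch primitive: multiplying by two block-diagonal factors interleaved with a reshape-transpose permutation, which costs O(n^1.5) instead of O(n^2). The block layout and names are simplified assumptions; this is not the implementation in the linked repository.

```python
import numpy as np
from scipy.linalg import block_diag

def monarch_matvec(L_blocks, R_blocks, x):
    """Multiply x by a simplified Monarch-style matrix M = P L P R, where L and R are
    block-diagonal with b blocks of size b x b (so n = b*b) and P is the reshape/transpose
    permutation (its own inverse). Cost is O(n^1.5) instead of O(n^2).
    """
    b = L_blocks.shape[0]                         # L_blocks, R_blocks: (b, b, b)
    X = x.reshape(b, b)
    X = np.einsum("bij,bj->bi", R_blocks, X)      # apply block-diagonal R block-wise
    X = X.T                                       # permutation P (transpose of the b x b grid)
    X = np.einsum("bij,bj->bi", L_blocks, X)      # apply block-diagonal L
    return X.T.reshape(-1)                        # apply P again to undo the reordering

# sanity check against the dense matrix it represents
b = 4; n = b * b
rng = np.random.default_rng(0)
L = rng.normal(size=(b, b, b)); R = rng.normal(size=(b, b, b)); x = rng.normal(size=n)

P = np.eye(n)[np.arange(n).reshape(b, b).T.reshape(-1)]       # reshape-transpose permutation
M = P @ block_diag(*L) @ P @ block_diag(*R)                   # dense equivalent
print(np.allclose(M @ x, monarch_matvec(L, R, x)))            # True
```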

An Online Learning Theory of Brokerage

  • paper_url: http://arxiv.org/abs/2310.12107
  • repo_url: None
  • paper_authors: Nataša Bolić, Tommaso Cesari, Roberto Colomboni
  • For: Investigates brokerage between traders from an online learning perspective, focusing on the case where there are no designated buyer and seller roles.
  • Methods: Uses online learning techniques to achieve low regret in the brokerage problem, providing algorithms with regret $M \log T$ and $\sqrt{M T}$ under different assumptions about the agents' valuations.
  • Results: Shows that the optimal regret rate is $M \log T$ when the agents' valuations are revealed after each interaction, and $\sqrt{M T}$ when only their willingness to sell or buy at the proposed price is revealed; dropping the bounded density assumption degrades the optimal rate to $\sqrt{T}$ in the first case and makes the problem unlearnable in the second.
    Abstract We investigate brokerage between traders from an online learning perspective. At any round $t$, two traders arrive with their private valuations, and the broker proposes a trading price. Unlike other bilateral trade problems already studied in the online learning literature, we focus on the case where there are no designated buyer and seller roles: each trader will attempt to either buy or sell depending on the current price of the good. We assume the agents' valuations are drawn i.i.d. from a fixed but unknown distribution. If the distribution admits a density bounded by some constant $M$, then, for any time horizon $T$: $\bullet$ If the agents' valuations are revealed after each interaction, we provide an algorithm achieving regret $M \log T$ and show this rate is optimal, up to constant factors. $\bullet$ If only their willingness to sell or buy at the proposed price is revealed after each interaction, we provide an algorithm achieving regret $\sqrt{M T}$ and show this rate is optimal, up to constant factors. Finally, if we drop the bounded density assumption, we show that the optimal rate degrades to $\sqrt{T}$ in the first case, and the problem becomes unlearnable in the second.

On the latent dimension of deep autoencoders for reduced order modeling of PDEs parametrized by random fields

  • paper_url: http://arxiv.org/abs/2310.12095
  • repo_url: None
  • paper_authors: Nicola Rares Franco, Daniel Fraulin, Andrea Manzoni, Paolo Zunino
  • for: Provides a theoretical analysis of Deep Learning-based Reduced Order Models (DL-ROMs) for partial differential equations parametrized by random fields.
  • methods: Uses deep autoencoders as the core dimensionality-reduction tool of the ROM, exploiting the nonlinear approximation capabilities of neural networks.
  • results: Derives explicit error bounds that can guide practitioners in choosing the latent dimension of deep autoencoders; numerical experiments show that this analysis can significantly impact the performance of DL-ROMs.
    Abstract Deep Learning is having a remarkable impact on the design of Reduced Order Models (ROMs) for Partial Differential Equations (PDEs), where it is exploited as a powerful tool for tackling complex problems for which classical methods might fail. In this respect, deep autoencoders play a fundamental role, as they provide an extremely flexible tool for reducing the dimensionality of a given problem by leveraging on the nonlinear capabilities of neural networks. Indeed, starting from this paradigm, several successful approaches have already been developed, which are here referred to as Deep Learning-based ROMs (DL-ROMs). Nevertheless, when it comes to stochastic problems parameterized by random fields, the current understanding of DL-ROMs is mostly based on empirical evidence: in fact, their theoretical analysis is currently limited to the case of PDEs depending on a finite number of (deterministic) parameters. The purpose of this work is to extend the existing literature by providing some theoretical insights about the use of DL-ROMs in the presence of stochasticity generated by random fields. In particular, we derive explicit error bounds that can guide domain practitioners when choosing the latent dimension of deep autoencoders. We evaluate the practical usefulness of our theory by means of numerical experiments, showing how our analysis can significantly impact the performance of DL-ROMs.

Contributing Components of Metabolic Energy Models to Metabolic Cost Estimations in Gait

  • paper_url: http://arxiv.org/abs/2310.12083
  • repo_url: None
  • paper_authors: Markus Gambietz, Marlies Nitschke, Jörg Miehling, Anne Koelewijn
  • for: Seeks a deeper understanding of metabolic energy expenditure models in human gait in order to estimate metabolic cost more accurately.
  • methods: Performs a Monte Carlo sensitivity analysis over the parameters of four metabolic energy expenditure models, then analyzes the parameters by their sensitivity indices, physiological context, and resulting metabolic rates during the gait cycle, selecting a quasi-optimized parameter combination; in a second step, the importance of input parameters and variables is investigated by training neural networks with different input features.
  • results: Power-related parameters were the most influential in both the sensitivity analysis and the neural-network-based feature selection; however, the quasi-optimized models produced negative metabolic rates, contradicting muscle physiology, and the neural-network-based estimates, while promising, did not match the accuracy of traditional metabolic energy expenditure models.
    Abstract Objective: As metabolic cost is a primary factor influencing humans' gait, we want to deepen our understanding of metabolic energy expenditure models. Therefore, this paper identifies the parameters and input variables, such as muscle or joint states, that contribute to accurate metabolic cost estimations. Methods: We explored the parameters of four metabolic energy expenditure models in a Monte Carlo sensitivity analysis. Then, we analysed the model parameters by their calculated sensitivity indices, physiological context, and the resulting metabolic rates during the gait cycle. The parameter combination with the highest accuracy in the Monte Carlo simulations represented a quasi-optimized model. In the second step, we investigated the importance of input parameters and variables by analysing the accuracy of neural networks trained with different input features. Results: Power-related parameters were most influential in the sensitivity analysis and the neural network-based feature selection. We observed that the quasi-optimized models produced negative metabolic rates, contradicting muscle physiology. Neural network-based models showed promising abilities but have been unable to match the accuracy of traditional metabolic energy expenditure models. Conclusion: We showed that power-related metabolic energy expenditure model parameters and inputs are most influential during gait. Furthermore, our results suggest that neural network-based metabolic energy expenditure models are viable. However, bigger datasets are required to achieve better accuracy. Significance: As there is a need for more accurate metabolic energy expenditure models, we explored which musculoskeletal parameters are essential when developing a model to estimate metabolic energy.
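A crude illustration of a Monte Carlo sensitivity analysis of the kind described above: parameters are sampled within their bounds and each parameter's squared correlation with the model output is reported. The toy model, the bounds, and the correlation-based index are stand-ins for the actual metabolic models and for more refined (e.g. Sobol-style) indices.

```python
import numpy as np

def monte_carlo_sensitivity(model, bounds, n_samples=5000, seed=0):
    """Crude Monte Carlo sensitivity analysis: sample parameters uniformly within
    their bounds and report each parameter's squared correlation with the output.

    model(params) -> scalar output (e.g. estimated metabolic rate for one gait cycle)
    bounds        -> (n_params, 2) array of lower/upper bounds
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    samples = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_samples, len(bounds)))
    outputs = np.array([model(p) for p in samples])
    corr = [np.corrcoef(samples[:, j], outputs)[0, 1] for j in range(len(bounds))]
    return np.array(corr) ** 2   # crude share of output variability per parameter

# toy stand-in model: metabolic rate dominated by a power-related term
toy = lambda p: 4.0 * p[0] + 0.5 * p[1] + 0.1 * p[2] ** 2
print(monte_carlo_sensitivity(toy, bounds=[[0, 1]] * 3).round(3))
```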

Differential Equation Scaling Limits of Shaped and Unshaped Neural Networks

  • paper_url: http://arxiv.org/abs/2310.12079
  • repo_url: None
  • paper_authors: Mufan Bill Li, Mihai Nica
  • for: Studies the behavior of neural networks with unshaped activations (the activation is unchanged as the network grows), in particular a fully connected ResNet and a multilayer perceptron (MLP), complementing existing results for shaped activations.
  • methods: Derives differential-equation-based asymptotic characterizations of these unshaped architectures, including an analysis of the layerwise correlations at initialization.
  • results: Shows that a fully connected ResNet with a $d^{-1/2}$ factor on the residual branch and an MLP with depth $d \ll$ width $n$ and ReLU shaped at rate $d^{-1/2}$ converge to the same infinite-depth-and-width limit at initialization, and derives the first-order asymptotic correction to the layerwise correlation of an unshaped MLP; these results connect shaped and unshaped architectures and open the study of how normalization methods relate to shaping activation functions.
    Abstract Recent analyses of neural networks with shaped activations (i.e. the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything about "ordinary" unshaped networks, where the activation is unchanged as the network size grows. In this article, we find similar differential equation based asymptotic characterization for two types of unshaped networks. Firstly, we show that the following two architectures converge to the same infinite-depth-and-width limit at initialization: (i) a fully connected ResNet with a $d^{-1/2}$ factor on the residual branch, where $d$ is the network depth. (ii) a multilayer perceptron (MLP) with depth $d \ll$ width $n$ and shaped ReLU activation at rate $d^{-1/2}$. Secondly, for an unshaped MLP at initialization, we derive the first order asymptotic correction to the layerwise correlation. In particular, if $\rho_\ell$ is the correlation at layer $\ell$, then $q_t = \ell^2 (1 - \rho_\ell)$ with $t = \frac{\ell}{n}$ converges to an SDE with a singularity at $t=0$. These results together provide a connection between shaped and unshaped network architectures, and opens up the possibility of studying the effect of normalization methods and how it connects with shaping activation functions.

Transformers for scientific data: a pedagogical review for astronomers

  • paper_url: http://arxiv.org/abs/2310.12069
  • repo_url: None
  • paper_authors: Dimitrios Tanoglidis, Bhuvnesh Jain, Helen Qu
  • for: Introduces the transformer deep learning architecture behind ChatGPT and related generative AI products to scientists, with astronomers as the primary audience.
  • methods: Reviews the self-attention mechanism and the original transformer architecture, and surveys applications of transformers in astronomy.
  • results: Covers the mathematics underlying the attention mechanism and applications to time series and imaging data, and includes a Frequently Asked Questions section for readers curious about generative AI or interested in getting started with transformers.
    Abstract The deep learning architecture associated with ChatGPT and related generative AI products is known as transformers. Initially applied to Natural Language Processing, transformers and the self-attention mechanism they exploit have gained widespread interest across the natural sciences. The goal of this pedagogical and informal review is to introduce transformers to scientists. The review includes the mathematics underlying the attention mechanism, a description of the original transformer architecture, and a section on applications to time series and imaging data in astronomy. We include a Frequently Asked Questions section for readers who are curious about generative AI or interested in getting started with transformers for their research problem.
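Since the review centers on the self-attention mechanism, here is a minimal single-head scaled dot-product attention in numpy; the toy "light curve" input and the dimensions are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X : (seq_len, d_model) token/epoch embeddings
    Wq, Wk, Wv : (d_model, d_head) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # similarity between every pair of positions
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over keys
    return attn @ V                                 # weighted mixture of values

# toy example: a "light curve" of 16 time steps with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
W = [rng.normal(size=(8, 4)) for _ in range(3)]
print(self_attention(X, *W).shape)   # (16, 4)
```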

Learning Gradient Fields for Scalable and Generalizable Irregular Packing

  • paper_url: http://arxiv.org/abs/2310.19814
  • repo_url: None
  • paper_authors: Tianyang Xue, Mingdong Wu, Lin Lu, Haoxuan Wang, Hao Dong, Baoquan Chen
  • for: solves the packing problem with irregularly shaped pieces, minimizing waste and avoiding overlap, using machine learning and conditional generative modeling.
  • methods: employs the score-based diffusion model to learn gradient fields that encode constraint satisfaction and spatial relationships, and uses a coarse-to-fine refinement mechanism to generate packing solutions.
  • results: demonstrates spatial utilization rates comparable to or surpassing those achieved by the teacher algorithm, and exhibits some level of generalization to shape variations.
    Abstract The packing problem, also known as cutting or nesting, has diverse applications in logistics, manufacturing, layout design, and atlas generation. It involves arranging irregularly shaped pieces to minimize waste while avoiding overlap. Recent advances in machine learning, particularly reinforcement learning, have shown promise in addressing the packing problem. In this work, we delve deeper into a novel machine learning-based approach that formulates the packing problem as conditional generative modeling. To tackle the challenges of irregular packing, including object validity constraints and collision avoidance, our method employs the score-based diffusion model to learn a series of gradient fields. These gradient fields encode the correlations between constraint satisfaction and the spatial relationships of polygons, learned from teacher examples. During the testing phase, packing solutions are generated using a coarse-to-fine refinement mechanism guided by the learned gradient fields. To enhance packing feasibility and optimality, we introduce two key architectural designs: multi-scale feature extraction and coarse-to-fine relation extraction. We conduct experiments on two typical industrial packing domains, considering translations only. Empirically, our approach demonstrates spatial utilization rates comparable to, or even surpassing, those achieved by the teacher algorithm responsible for training data generation. Additionally, it exhibits some level of generalization to shape variations. We are hopeful that this method could pave the way for new possibilities in solving the packing problem.

Understanding Reward Ambiguity Through Optimal Transport Theory in Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.12055
  • repo_url: None
  • paper_authors: Ali Baheri
  • for: In inverse reinforcement learning (IRL), the central goal is to infer the reward functions underlying observed expert behavior in a way that not only explains the data but also generalizes to unseen scenarios.
  • methods: Uses optimal transport (OT) theory, and in particular the Wasserstein distance, to provide a fresh geometric perspective on reward ambiguity and high-dimensional problems.
  • results: Shows how to quantify reward ambiguity with the Wasserstein distance and how to identify a central representation, or centroid, of reward functions, paving the way for robust IRL methods anchored in geometric interpretations in high-dimensional settings.
    Abstract In inverse reinforcement learning (IRL), the central objective is to infer underlying reward functions from observed expert behaviors in a way that not only explains the given data but also generalizes to unseen scenarios. This ensures robustness against reward ambiguity where multiple reward functions can equally explain the same expert behaviors. While significant efforts have been made in addressing this issue, current methods often face challenges with high-dimensional problems and lack a geometric foundation. This paper harnesses the optimal transport (OT) theory to provide a fresh perspective on these challenges. By utilizing the Wasserstein distance from OT, we establish a geometric framework that allows for quantifying reward ambiguity and identifying a central representation or centroid of reward functions. These insights pave the way for robust IRL methodologies anchored in geometric interpretations, offering a structured approach to tackle reward ambiguity in high-dimensional settings.
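A very small sketch in the spirit of the geometric view above: pairwise 1-D Wasserstein distances between candidate reward functions give a scalar ambiguity measure and a medoid-style central candidate. The medoid choice and the treatment of each reward as a 1-D sample over visited states are simplifying assumptions, not the paper's centroid construction.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def reward_centroid(reward_samples):
    """Pick the candidate reward whose total 1-D Wasserstein distance to the other
    candidates is smallest (a simple "medoid" notion of a central reward).

    reward_samples : (n_candidates, n_states) array; row i holds the rewards that
    candidate i assigns over a set of visited states.
    """
    n = len(reward_samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_distance(reward_samples[i], reward_samples[j])
    ambiguity = D.mean()                          # a crude scalar measure of reward ambiguity
    return int(np.argmin(D.sum(axis=1))), ambiguity

# toy example: five candidate rewards over 100 states, all "explaining" the same behaviour
rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 100)) + np.linspace(0, 1, 5)[:, None]
centroid_idx, ambiguity = reward_centroid(candidates)
print(centroid_idx, round(ambiguity, 3))
```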

Applications of ML-Based Surrogates in Bayesian Approaches to Inverse Problems

  • paper_url: http://arxiv.org/abs/2310.12046
  • repo_url: None
  • paper_authors: Pelin Ersin, Emma Hayes, Peter Matthews, Paramjyoti Mohapatra, Elisa Negrini, Karl Schulz
  • for: Solves the inverse problem of inferring the location of a wave source on a square domain, given a noisy solution to the 2-D acoustic wave equation.
  • methods: Uses a neural network as a surrogate for the forward simulation, which makes repeated likelihood evaluations computationally feasible so that Markov Chain Monte Carlo methods can be used to evaluate the posterior distribution of the source location.
  • results: The approach accurately infers source locations from noisy data.
    Abstract Neural networks have become a powerful tool as surrogate models to provide numerical solutions for scientific problems with increased computational efficiency. This efficiency can be advantageous for numerically challenging problems where time to solution is important or when evaluation of many similar analysis scenarios is required. One particular area of scientific interest is the setting of inverse problems, where one knows the forward dynamics of a system are described by a partial differential equation and the task is to infer properties of the system given (potentially noisy) observations of these dynamics. We consider the inverse problem of inferring the location of a wave source on a square domain, given a noisy solution to the 2-D acoustic wave equation. Under the assumption of Gaussian noise, a likelihood function for source location can be formulated, which requires one forward simulation of the system per evaluation. Using a standard neural network as a surrogate model makes it computationally feasible to evaluate this likelihood several times, and so Markov Chain Monte Carlo methods can be used to evaluate the posterior distribution of the source location. We demonstrate that this method can accurately infer source-locations from noisy data.
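The sketch below shows the overall recipe: a cheap surrogate replaces the forward wave simulation inside a Gaussian likelihood, and random-walk Metropolis samples the posterior over the source location on the unit square. The distance-attenuation "surrogate", the sensor layout, and the step size are illustrative placeholders for the trained neural network and the true forward model.

```python
import numpy as np

def metropolis_source_posterior(surrogate, observed, noise_sigma, n_steps=20000, step=0.05, seed=0):
    """Random-walk Metropolis over a source location theta in the unit square,
    using a cheap surrogate for the forward simulation.

    surrogate(theta) -> predicted observations (same shape as `observed`)
    """
    rng = np.random.default_rng(seed)
    def log_lik(theta):
        resid = observed - surrogate(theta)
        return -0.5 * np.sum(resid ** 2) / noise_sigma ** 2   # Gaussian noise assumption
    theta = np.array([0.5, 0.5])
    ll = log_lik(theta)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=2)
        if np.all((prop >= 0) & (prop <= 1)):                 # uniform prior on the unit square
            ll_prop = log_lik(prop)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        samples.append(theta.copy())
    return np.array(samples)

# toy "surrogate": sensors record distance-based attenuation from the source
sensors = np.array([[0.1, 0.1], [0.9, 0.2], [0.5, 0.9]])
surrogate = lambda th: 1.0 / (0.1 + np.linalg.norm(sensors - th, axis=1))
true_theta = np.array([0.3, 0.7])
obs = surrogate(true_theta) + 0.05 * np.random.default_rng(1).normal(size=3)
post = metropolis_source_posterior(surrogate, obs, noise_sigma=0.05)
print(post[5000:].mean(axis=0))   # posterior mean should be close to (0.3, 0.7)
```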

Conformal Drug Property Prediction with Density Estimation under Covariate Shift

  • paper_url: http://arxiv.org/abs/2310.12033
  • repo_url: https://github.com/siddharthal/CoDrug
  • paper_authors: Siddhartha Laghuvarapu, Zhen Lin, Jimeng Sun
  • for: This paper aims to address the challenge of obtaining reliable uncertainty estimates in drug discovery tasks using Conformal Prediction (CP) and to provide valid prediction sets for molecular properties with a coverage guarantee.
  • methods: The proposed method, CoDrug, employs an energy-based model leveraging both training data and unlabelled data, and Kernel Density Estimation (KDE) to assess the densities of a molecule set. The estimated densities are then used to weigh the molecule samples while building prediction sets and rectifying for distribution shift.
  • results: In extensive experiments involving realistic distribution drifts in various small-molecule drug discovery tasks, CoDrug was shown to provide valid prediction sets and to reduce the coverage gap by over 35% when compared to conformal prediction sets not adjusted for covariate shift.
    Abstract In drug discovery, it is vital to confirm the predictions of pharmaceutical properties from computational models using costly wet-lab experiments. Hence, obtaining reliable uncertainty estimates is crucial for prioritizing drug molecules for subsequent experimental validation. Conformal Prediction (CP) is a promising tool for creating such prediction sets for molecular properties with a coverage guarantee. However, the exchangeability assumption of CP is often challenged with covariate shift in drug discovery tasks: Most datasets contain limited labeled data, which may not be representative of the vast chemical space from which molecules are drawn. To address this limitation, we propose a method called CoDrug that employs an energy-based model leveraging both training data and unlabelled data, and Kernel Density Estimation (KDE) to assess the densities of a molecule set. The estimated densities are then used to weigh the molecule samples while building prediction sets and rectifying for distribution shift. In extensive experiments involving realistic distribution drifts in various small-molecule drug discovery tasks, we demonstrate the ability of CoDrug to provide valid prediction sets and its utility in addressing the distribution shift arising from de novo drug design models. On average, using CoDrug can reduce the coverage gap by over 35% when compared to conformal prediction sets not adjusted for covariate shift.
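A simplified sketch of density-weighted split conformal prediction under covariate shift, in the spirit of CoDrug: KDE density ratios reweight the calibration scores before taking the (1 - alpha) weighted quantile. The plain KDE on raw features, the bandwidth, and the fallback to the largest calibration score stand in for CoDrug's energy-based density estimation and are not the code in the linked repository.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def weighted_conformal_radius(x_test, cal_X, cal_scores, test_X_pool, alpha=0.1, bandwidth=0.5):
    """Half-width of a split-conformal interval at x_test with likelihood-ratio
    weights w(x) = p_test(x) / p_cal(x) estimated by KDE (covariate-shift correction).
    cal_scores are nonconformity scores, e.g. |y - f(x)|, on a calibration set.
    """
    kde_cal = KernelDensity(bandwidth=bandwidth).fit(cal_X)
    kde_test = KernelDensity(bandwidth=bandwidth).fit(test_X_pool)
    w = lambda X: np.exp(kde_test.score_samples(X) - kde_cal.score_samples(X))
    weights = np.append(w(cal_X), w(x_test[None, :]))      # calibration weights + test point
    weights /= weights.sum()
    order = np.argsort(cal_scores)
    cum = np.cumsum(weights[:-1][order])
    idx = np.searchsorted(cum, 1 - alpha)                  # weighted (1 - alpha) quantile
    # the exact method returns an infinite interval if cum never reaches 1 - alpha;
    # here we simply fall back to the largest calibration score
    return cal_scores[order][min(idx, len(cal_scores) - 1)]

# toy usage: calibration drawn near 0, test pool shifted, scores grow with |x|
rng = np.random.default_rng(0)
cal_X = rng.normal(0, 1, size=(500, 1))
test_X = rng.normal(1, 1, size=(500, 1))
cal_scores = np.abs(cal_X[:, 0]) * 0.5 + 0.1 * rng.uniform(size=500)
print(weighted_conformal_radius(test_X[0], cal_X, cal_scores, test_X))
```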

Exact and efficient solutions of the LMC Multitask Gaussian Process model

  • paper_url: http://arxiv.org/abs/2310.12032
  • repo_url: https://github.com/qwerty6191/projected-lmc
  • paper_authors: Olivier Truffinet, Karim Ammar, Jean-Philippe Argaud, Bertrand Bouriquet
  • for: Concerns the Linear Model of Co-regionalization (LMC), a very general multitask Gaussian process model for regression or classification; its expressivity and conceptual simplicity are appealing, but naive implementations have cubic complexity in the number of data points and tasks, so approximations are usually mandatory. Recent work showed that under some conditions the latent processes can be decoupled, giving complexity linear in the number of latent processes.
  • methods: Extends these results, showing from the most general assumptions that the only condition necessary for efficient exact computation of the LMC is a mild hypothesis on the noise model; introduces a full parametrization of the resulting projected LMC model and an expression of the marginal likelihood enabling efficient optimization.
  • results: A parametric study on synthetic data shows excellent performance compared to an unrestricted exact LMC and approximations of the latter; overall, the projected LMC appears as a credible and simpler alternative to state-of-the-art models, greatly facilitating computations such as leave-one-out cross-validation and fantasization.
    Abstract The Linear Model of Co-regionalization (LMC) is a very general model of multitask gaussian process for regression or classification. While its expressivity and conceptual simplicity are appealing, naive implementations have cubic complexity in the number of datapoints and number of tasks, making approximations mandatory for most applications. However, recent work has shown that under some conditions the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of said processes. We here extend these results, showing from the most general assumptions that the only condition necessary to an efficient exact computation of the LMC is a mild hypothesis on the noise model. We introduce a full parametrization of the resulting \emph{projected LMC} model, and an expression of the marginal likelihood enabling efficient optimization. We perform a parametric study on synthetic data to show the excellent performance of our approach, compared to an unrestricted exact LMC and approximations of the latter. Overall, the projected LMC appears as a credible and simpler alternative to state-of-the art models, which greatly facilitates some computations such as leave-one-out cross-validation and fantasization.

Nonparametric Discrete Choice Experiments with Machine Learning Guided Adaptive Design

  • paper_url: http://arxiv.org/abs/2310.12026
  • repo_url: None
  • paper_authors: Mingzhang Yin, Ruijiang Gao, Weiran Lin, Steven M. Shugan
  • For: Designing products that meet consumers' preferences, which is essential for a business's success.
  • Methods: Proposes the Gradient-based Survey (GBS), a nonparametric discrete choice experiment for multiattribute product design that elicits consumer preferences through a sequence of paired comparisons of partial profiles, constructed adaptively from the respondent's previous choices.
  • Results: In simulations, GBS outperforms existing parametric and nonparametric methods in both accuracy and sample efficiency.
    Abstract Designing products to meet consumers' preferences is essential for a business's success. We propose the Gradient-based Survey (GBS), a discrete choice experiment for multiattribute product design. The experiment elicits consumer preferences through a sequence of paired comparisons for partial profiles. GBS adaptively constructs paired comparison questions based on the respondents' previous choices. Unlike the traditional random utility maximization paradigm, GBS is robust to model misspecification by not requiring a parametric utility model. Cross-pollinating the machine learning and experiment design, GBS is scalable to products with hundreds of attributes and can design personalized products for heterogeneous consumers. We demonstrate the advantage of GBS in accuracy and sample efficiency compared to the existing parametric and nonparametric methods in simulations.

Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models

  • paper_url: http://arxiv.org/abs/2310.12000
  • repo_url: https://github.com/fabsig/GPBoost
  • paper_authors: Pascal Kündig, Fabio Sigrist
  • for: Concerns accurate and scalable inference for latent Gaussian process (GP) models by combining Vecchia-Laplace approximations with iterative methods.
  • methods: Presents several iterative methods for inference with Vecchia-Laplace approximations that make computations considerably faster than Cholesky-based calculations.
  • results: Achieves a speed-up of an order of magnitude compared to Cholesky-based inference and a threefold increase in prediction accuracy in terms of the continuous ranked probability score compared to a state-of-the-art method on a large satellite data set; all methods are implemented in a free C++ software library with high-level Python and R packages.
    Abstract Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations which make computations considerably faster compared to Cholesky-based calculations. We analyze our proposed methods theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based inference and a threefold increase in prediction accuracy in terms of the continuous ranked probability score compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.
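To illustrate why iterative methods help here, the snippet below solves the same symmetric positive-definite linear system once with a Cholesky factorization and once with conjugate gradients, which only needs matrix-vector products. The random low-rank-plus-diagonal matrix is a toy stand-in for a Vecchia-Laplace system matrix; this is not the GPBoost implementation.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg

# a structured SPD matrix stands in for the Vecchia-Laplace system matrix
rng = np.random.default_rng(0)
n = 2000
B = rng.normal(size=(n, 5))
A = B @ B.T * 1e-3 + np.diag(1.0 + rng.uniform(size=n))   # SPD, diagonally dominant
b = rng.normal(size=n)

# direct solve: O(n^3) Cholesky factorization
x_chol = cho_solve(cho_factor(A), b)

# iterative solve: conjugate gradients only needs matrix-vector products, which is
# where Vecchia-type sparsity/structure pays off
x_cg, info = cg(A, b)
print(info, np.max(np.abs(x_cg - x_chol)))   # info == 0 means CG converged
```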

Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

  • paper_url: http://arxiv.org/abs/2310.11991
  • repo_url: None
  • paper_authors: Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks
  • for: Aims to improve out-of-distribution generalization of neural networks by removing spurious concepts from their representations without harming the main task.
  • methods: Proposes an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation.
  • results: Evaluated on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), the algorithm outperforms existing concept-removal methods.
    Abstract Out-of-distribution generalization in neural networks is often hampered by spurious correlations. A common strategy is to mitigate this by removing spurious concepts from the neural network representation of the data. Existing concept-removal methods tend to be overzealous by inadvertently eliminating features associated with the main task of the model, thereby harming model performance. We propose an iterative algorithm that separates spurious from main-task concepts by jointly identifying two low-dimensional orthogonal subspaces in the neural network representation. We evaluate the algorithm on benchmark datasets for computer vision (Waterbirds, CelebA) and natural language processing (MultiNLI), and show that it outperforms existing concept removal methods.
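To make the idea concrete, here is a toy, heavily simplified sketch of alternating between estimating a low-dimensional "spurious" subspace and an orthogonal "main-task" subspace from labelled embeddings and then projecting the spurious one out. It uses simple class-mean-difference directions and synthetic data; it is not the paper's joint estimation algorithm.

```python
import numpy as np

def mean_diff_direction(Z, labels):
    """Unit vector from the negative-class mean to the positive-class mean."""
    d = Z[labels == 1].mean(axis=0) - Z[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

def project_out(Z, U):
    """Remove the span of the orthonormal columns of U from Z."""
    return Z - (Z @ U) @ U.T

rng = np.random.default_rng(0)
n, d = 4000, 32
# Synthetic embeddings: one coordinate carries the main-task label, another the
# spurious attribute, the rest is noise (a toy stand-in for a network layer).
y_main = rng.integers(0, 2, n)
y_spur = rng.integers(0, 2, n)
Z = rng.standard_normal((n, d))
Z[:, 0] += 2.0 * y_main
Z[:, 1] += 2.0 * y_spur

# Alternate between estimating the two one-dimensional subspaces, each on data
# with the other subspace projected out, which keeps them orthogonal.
u_spur = mean_diff_direction(Z, y_spur)[:, None]
for _ in range(5):
    u_main = mean_diff_direction(project_out(Z, u_spur), y_main)[:, None]
    u_spur = mean_diff_direction(project_out(Z, u_main), y_spur)[:, None]

Z_clean = project_out(Z, u_spur)   # spurious concept removed, main task preserved
print("corr with spurious attribute after removal:",
      round(float(np.corrcoef(Z_clean[:, 1], y_spur)[0, 1]), 3))
print("corr with main label after removal:",
      round(float(np.corrcoef(Z_clean[:, 0], y_main)[0, 1]), 3))
```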

Image Clustering with External Guidance

  • paper_url: http://arxiv.org/abs/2310.11989
  • repo_url: None
  • paper_authors: Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng
  • for: Improving image clustering by using external knowledge (textual semantics) as an additional supervision signal.
  • methods: Text-Aided Clustering (TAC) selects and retrieves WordNet nouns that best distinguish images to enhance feature discriminability, then mutually distills cross-modal neighborhood information between the text and image modalities.
  • results: State-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.
    Abstract The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from data. Nevertheless, the abundant external knowledge such as semantic descriptions, which naturally conduces to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering, even though it seems irrelevant to the given data. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves WordNet nouns that best distinguish images to enhance the feature discriminability. Then, to improve image clustering performance, TAC collaborates text and image modalities by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.

A Finite-Horizon Approach to Active Level Set Estimation

  • paper_url: http://arxiv.org/abs/2310.11985
  • repo_url: None
  • paper_authors: Phillip Kearns, Bruno Jedynak, John Lipor
  • for: Active learning for level set estimation (LSE): localizing all regions where a function lies above/below a given threshold while balancing final estimation error against distance traveled.
  • methods: A finite-horizon search procedure for one-dimensional LSE, with a tuning parameter trading off estimation accuracy against travel distance; the resulting optimization has a closed-form solution and the approach extends to higher dimensions under a Gaussian process model.
  • results: On synthetic data, treating distance non-myopically yields significant gains as the cost of travel increases; on real air quality data, the method achieves roughly one fifth of the estimation error at less than half the cost of competing algorithms.
    Abstract We consider the problem of active learning in the context of spatial sampling for level set estimation (LSE), where the goal is to localize all regions where a function of interest lies above/below a given threshold as quickly as possible. We present a finite-horizon search procedure to perform LSE in one dimension while optimally balancing both the final estimation error and the distance traveled for a fixed number of samples. A tuning parameter is used to trade off between the estimation accuracy and distance traveled. We show that the resulting optimization problem can be solved in closed form and that the resulting policy generalizes existing approaches to this problem. We then show how this approach can be used to perform level set estimation in higher dimensions under the popular Gaussian process model. Empirical results on synthetic data indicate that as the cost of travel increases, our method's ability to treat distance nonmyopically allows it to significantly improve on the state of the art. On real air quality data, our approach achieves roughly one fifth the estimation error at less than half the cost of competing algorithms.
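The closed-form finite-horizon policy from the paper is not reproduced here, but the trade-off it optimizes is easy to illustrate: a GP-based level-set acquisition score (straddle-style) penalized by the travel distance from the current location, with a tuning parameter beta controlling the balance. Everything below (the function, kernel, and beta) is my own toy choice, not the paper's algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                        # unknown 1-D field; the threshold defines the level set
    return np.sin(3 * x) + 0.5 * x

threshold, beta = 0.5, 0.5       # beta trades level-set informativeness vs. travel cost
grid = np.linspace(0, 3, 300)[:, None]

X = np.array([[1.5]])            # starting location of the mobile sensor
y = f(X).ravel()
pos = X[-1, 0]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4, optimizer=None)
for _ in range(15):
    gp.fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    lse_score = 1.96 * sd - np.abs(mu - threshold)   # high where the threshold is uncertain
    travel = np.abs(grid.ravel() - pos)              # distance from the current position
    x_next = grid[np.argmax(lse_score - beta * travel)]
    X = np.vstack([X, x_next[None, :]])
    y = np.append(y, f(x_next)[0])
    pos = x_next[0]

gp.fit(X, y)
mu_final = gp.predict(grid)
print("fraction of domain estimated above the threshold:",
      round(float(np.mean(mu_final > threshold)), 2))
print("total distance traveled:", round(float(np.sum(np.abs(np.diff(X.ravel())))), 2))
```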

Can bin-wise scaling improve consistency and adaptivity of prediction uncertainty for machine learning regression ?

  • paper_url: http://arxiv.org/abs/2310.11978
  • repo_url: https://github.com/ppernot/2023_bvs
  • paper_authors: Pascal Pernot
  • for: Post hoc recalibration of prediction uncertainties in machine learning regression, beyond uniform variance (temperature) scaling.
  • methods: Binwise Variance Scaling (BVS) with uncertainty-based binning targets consistency (calibration conditional on uncertainty); variants with alternative loss functions and a binning scheme based on an input feature (X) are explored to improve adaptivity (calibration conditional on X).
  • results: BVS and its variants are tested on a benchmark dataset for the prediction of atomization energies and compared with isotonic regression.
    Abstract Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for prediction uncertainties of machine learning regression problems that is able of more efficient corrections than uniform variance (or temperature) scaling. The original version of BVS uses uncertainty-based binning, which is aimed to improve calibration conditionally on uncertainty, i.e. consistency. I explore here several adaptations of BVS, in particular with alternative loss functions and a binning scheme based on an input-feature (X) in order to improve adaptivity, i.e. calibration conditional on X. The performances of BVS and its proposed variants are tested on a benchmark dataset for the prediction of atomization energies and compared to the results of isotonic regression.
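Binwise variance scaling is simple enough to sketch end to end: on a calibration split, group predictions into bins by their predicted uncertainty and rescale each bin's uncertainties so that the standardized errors have unit variance within the bin. The snippet below is a minimal version under my own assumptions (equal-count bins, variance-matching scale factors) and is not the reference code from the linked repository.

```python
import numpy as np

def fit_bvs(sigma, error, n_bins=10):
    """Fit binwise variance scaling on a calibration set.

    sigma : predicted standard deviations; error : residuals (y - mu).
    Returns quantile bin edges on sigma and one multiplicative scale per bin.
    """
    edges = np.quantile(sigma, np.linspace(0, 1, n_bins + 1))
    idx = np.digitize(sigma, edges[1:-1])            # bin index 0..n_bins-1
    z2 = (error / sigma) ** 2                        # squared standardized errors
    scales = np.array([np.sqrt(z2[idx == b].mean()) for b in range(n_bins)])
    return edges, scales

def apply_bvs(sigma, edges, scales):
    return sigma * scales[np.digitize(sigma, edges[1:-1])]

# Toy data: predicted sigma is mis-scaled differently for small vs. large
# uncertainties, so a single (temperature) factor cannot fix calibration.
rng = np.random.default_rng(1)
sigma_pred = rng.uniform(0.1, 2.0, 20000)
true_sigma = np.where(sigma_pred < 1.0, 0.5 * sigma_pred, 1.5 * sigma_pred)
errors = rng.normal(0.0, true_sigma)

edges, scales = fit_bvs(sigma_pred[:10000], errors[:10000])
sigma_cal = apply_bvs(sigma_pred[10000:], edges, scales)
z = errors[10000:] / sigma_cal
print("variance of standardized errors after BVS:", round(float(z.var()), 3))  # ~1.0
```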

Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews

  • paper_url: http://arxiv.org/abs/2310.11967
  • repo_url: https://github.com/bandas-center/atrain
  • paper_authors: Armin Haberl, Jürgen Fleiß, Dominik Kowald, Stefan Thalmann
  • for: An open-source, offline tool for transcribing speech recordings in multiple languages, aimed at researchers working with qualitative interview data; it requires no programming skills, runs on most computers, needs no internet connection, and uploads no data to any server.
  • methods: Combines OpenAI's Whisper model with speaker recognition, integrates with the qualitative data analysis tools MAXQDA and ATLAS.ti, and offers an easy-to-use graphical interface distributed as a Windows app through the Microsoft Store.
  • results: On current mobile CPUs, transcription with the highest-accuracy models takes about 2-3 times the duration of the audio file; with an entry-level graphics card, transcription time drops to about 20% of the audio duration.
    Abstract aTrain is an open-source and offline tool for transcribing audio data in multiple languages with CPU and NVIDIA GPU support. It is specifically designed for researchers using qualitative data generated from various forms of speech interactions with research participants. aTrain requires no programming skills, runs on most computers, does not require an internet connection, and was verified not to upload data to any server. aTrain combines OpenAI's Whisper model with speaker recognition to provide output that integrates with the popular qualitative data analysis software tools MAXQDA and ATLAS.ti. It has an easy-to-use graphical interface and is provided as a Windows-App through the Microsoft Store allowing for simple installation by researchers. The source code is freely available on GitHub. Having developed aTrain with a focus on speed on local computers, we show that the transcription time on current mobile CPUs is around 2 to 3 times the duration of the audio file using the highest-accuracy transcription models. If an entry-level graphics card is available, the transcription speed increases to 20% of the audio duration.
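aTrain itself is a packaged GUI application, but the transcription step it builds on (OpenAI's Whisper) can be illustrated independently with the open-source `openai-whisper` package. The snippet below is not aTrain's code, omits the speaker-recognition step, and uses a placeholder file name.

```python
# pip install openai-whisper   (also requires ffmpeg installed on the system)
import whisper

model = whisper.load_model("small")              # larger models are slower but more accurate
result = model.transcribe("interview_01.mp3")    # placeholder path to an interview recording

# Write a simple timestamped transcript, roughly the kind of segment-level
# output that qualitative data analysis tools can import.
with open("interview_01.txt", "w", encoding="utf-8") as out:
    for seg in result["segments"]:
        out.write(f"[{seg['start']:8.2f} - {seg['end']:8.2f}] {seg['text'].strip()}\n")

print("detected language:", result["language"], "| segments:", len(result["segments"]))
```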

Flexible Payload Configuration for Satellites using Machine Learning

  • paper_url: http://arxiv.org/abs/2310.11966
  • repo_url: None
  • paper_authors: Marcele O. K. Mendonca, Flor G. Ortiz-Gomez, Jorge Querol, Eva Lagunas, Juan A. Vásquez Peralvo, Victor Monzon Baeza, Symeon Chatzinotas, Bjorn Ottersten
  • for: Improving the efficiency of satellite communication systems under heterogeneous traffic, where uniformly distributing power and bandwidth across beams is inefficient.
  • methods: Radio Resource Management (RRM) is treated as a regression ML problem, with the RRM objectives and constraints integrated into the loss function that the ML algorithm minimizes.
  • results: A context-aware ML metric is introduced that evaluates the model not only on predictive performance but also on the impact of its resource allocation decisions on overall communication system performance.
    Abstract Satellite communications, essential for modern connectivity, extend access to maritime, aeronautical, and remote areas where terrestrial networks are unfeasible. Current GEO systems distribute power and bandwidth uniformly across beams using multi-beam footprints with fractional frequency reuse. However, recent research reveals the limitations of this approach in heterogeneous traffic scenarios, leading to inefficiencies. To address this, this paper presents a machine learning (ML)-based approach to Radio Resource Management (RRM). We treat the RRM task as a regression ML problem, integrating RRM objectives and constraints into the loss function that the ML algorithm aims at minimizing. Moreover, we introduce a context-aware ML metric that evaluates the ML model's performance but also considers the impact of its resource allocation decisions on the overall performance of the communication system.

Recasting Continual Learning as Sequence Modeling

  • paper_url: http://arxiv.org/abs/2310.11952
  • repo_url: https://github.com/soochan-lee/cl-as-seq
  • paper_authors: Soochan Lee, Jaehyeon Son, Gunhee Kim
  • for: Connecting continual learning and sequence modeling: continual learning is formulated as a sequence modeling problem so that advanced sequence models can be used, and the continual learning process becomes the forward pass of a sequence model.
  • methods: Within the meta-continual learning (MCL) framework, the sequence model is trained at the meta-level over multiple continual learning episodes; Transformers and their efficient variants are demonstrated as concrete MCL methods.
  • results: Experiments on seven benchmarks covering both classification and regression show that sequence models are an attractive solution for general MCL.
    Abstract In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a sequence model. By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level, on multiple continual learning episodes. As a specific example of our new formulation, we demonstrate the application of Transformers and their efficient variants as MCL methods. Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.
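The recasting is essentially in-context learning: a continual-learning episode is flattened into a sequence of (input, previous label) tokens and a causal sequence model is meta-trained, over many episodes, to predict each label from the tokens seen so far. Below is a minimal sketch of that data flow with toy sine-regression episodes and a small causal Transformer; all architecture sizes and the episode distribution are arbitrary assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class SeqMCL(nn.Module):
    """Tiny causal Transformer reading [x_t, y_(t-1)] tokens and predicting y_t."""
    def __init__(self, x_dim=1, d_model=64):
        super().__init__()
        self.embed = nn.Linear(x_dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, y):
        # Shift labels so the token at step t contains x_t and y_(t-1).
        y_prev = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
        tok = self.embed(torch.cat([x, y_prev], dim=-1))
        T = x.shape[1]
        causal = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        return self.head(self.encoder(tok, mask=causal))

def sample_episode(batch=32, T=50):
    """Each 'continual-learning episode' here is a stream from a random sine task."""
    amp = torch.rand(batch, 1, 1) * 2 + 0.5
    phase = torch.rand(batch, 1, 1) * 3.14
    x = torch.rand(batch, T, 1) * 10 - 5
    return x, amp * torch.sin(x + phase)

model = SeqMCL()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):                  # meta-training: the outer loop over episodes
    x, y = sample_episode()
    loss = ((model(x, y) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final meta-training loss:", float(loss))
```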

Interpretable Spectral Variational AutoEncoder (ISVAE) for time series clustering

  • paper_url: http://arxiv.org/abs/2310.11940
  • repo_url: None
  • paper_authors: Óscar Jiménez Rama, Fernando Moreno-Pino, David Ramírez, Pablo M. Olmos
  • for: A variational autoencoder (VAE) with an interpretable bottleneck, the Filter Bank (FB), placed at its input so that a more interpretable and clusterable latent encoding is learned.
  • methods: The FB compels the VAE to attend to the most informative segments of the input signal, yielding a novel encoding f_0 that is discernible, separable, and of reduced dimensionality; a decoder symmetrically aligned with the FB architecture handles intricate data configurations.
  • results: ISVAE compares favorably with state-of-the-art clustering results on real-world datasets, and the evolutionary learning trajectory of f_0 forms a dynamic hierarchical tree that offers insight into cluster similarities.
    Abstract The best encoding is the one that is interpretable in nature. In this work, we introduce a novel model that incorporates an interpretable bottleneck-termed the Filter Bank (FB)-at the outset of a Variational Autoencoder (VAE). This arrangement compels the VAE to attend on the most informative segments of the input signal, fostering the learning of a novel encoding ${f_0}$ which boasts enhanced interpretability and clusterability over traditional latent spaces. By deliberately constraining the VAE with this FB, we intentionally constrict its capacity to access broad input domain information, promoting the development of an encoding that is discernible, separable, and of reduced dimensionality. The evolutionary learning trajectory of ${f_0}$ further manifests as a dynamic hierarchical tree, offering profound insights into cluster similarities. Additionally, for handling intricate data configurations, we propose a tailored decoder structure that is symmetrically aligned with FB's architecture. Empirical evaluations highlight the superior efficacy of ISVAE, which compares favorably to state-of-the-art results in clustering metrics across real-world datasets.

Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.11897
  • repo_url: https://github.com/nycu-rl-bandits-lab/apg
  • paper_authors: Yen-Ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh
  • for: Showing that Nesterov momentum can accelerate policy gradient methods in reinforcement learning.
  • methods: Nesterov's accelerated gradient (NAG) method is adapted to RL with softmax policy parametrization, yielding Accelerated Policy Gradient (APG).
  • results: With the true gradient, APG converges to an optimal policy at an $\tilde{O}(1/t^2)$ rate, the first characterization of the global convergence rate of NAG in RL; numerical validation confirms the rate and shows improved convergence over standard policy gradient.
    Abstract Policy gradient methods have recently been shown to enjoy global convergence at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting. Accordingly, one important research question is whether this convergence rate can be further improved, with only first-order updates. In this paper, we answer the above question from the perspective of momentum by adapting the celebrated Nesterov's accelerated gradient (NAG) method to reinforcement learning (RL), termed \textit{Accelerated Policy Gradient} (APG). To demonstrate the potential of APG in achieving faster global convergence, we formally show that with the true gradient, APG with softmax policy parametrization converges to an optimal policy at a $\tilde{O}(1/t^2)$ rate. To the best of our knowledge, this is the first characterization of the global convergence rate of NAG in the context of RL. Notably, our analysis relies on one interesting finding: Regardless of the initialization, APG could end up reaching a locally nearly-concave regime, where APG could benefit significantly from the momentum, within finite iterations. By means of numerical validation, we confirm that APG exhibits $\tilde{O}(1/t^2)$ rate as well as show that APG could significantly improve the convergence behavior over the standard policy gradient.
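The update itself is just Nesterov's accelerated gradient applied to softmax policy parameters with the true policy gradient. Here is a toy illustration on a 3-armed bandit, where the exact gradient of the expected reward has a closed form; the rewards and step size are my own choices, and this is not the authors' code from the linked repository.

```python
import numpy as np

rewards = np.array([1.0, 0.8, 0.2])      # true arm rewards (toy numbers)
eta = 0.2                                # step size

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def grad_J(theta):
    """Exact gradient of J(theta) = E_{a ~ pi_theta}[r(a)] for a softmax policy."""
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

def vanilla_pg(T):
    theta = np.zeros(3)
    for _ in range(T):
        theta = theta + eta * grad_J(theta)
    return softmax(theta) @ rewards

def accelerated_pg(T):
    """Nesterov momentum: the gradient is evaluated at a look-ahead point."""
    theta = np.zeros(3)
    theta_prev = theta.copy()
    for t in range(1, T + 1):
        look_ahead = theta + (t - 1) / (t + 2) * (theta - theta_prev)
        theta_prev = theta
        theta = look_ahead + eta * grad_J(look_ahead)
    return softmax(theta) @ rewards

for T in (50, 200, 1000):                # the optimal value is 1.0
    print(f"T={T:5d}  vanilla J={vanilla_pg(T):.4f}  accelerated J={accelerated_pg(T):.4f}")
```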

A Hyperparameter Study for Quantum Kernel Methods

  • paper_url: http://arxiv.org/abs/2310.11891
  • repo_url: None
  • paper_authors: Sebastian Egginger, Alona Sakhnenko, Jeanette Miriam Lorenz
  • for: Investigating how hyperparameter choices affect model performance and the gap between classical and quantum kernels, and whether the geometric difference is useful beyond screening for a potential quantum advantage.
  • methods: Quantum and classical kernel methods are evaluated across 11 datasets; hyperparameters such as input-data scaling and the number of qubits to trace out before computing a projected quantum kernel are studied, comparing cross-validation with selection based on the geometric difference.
  • results: The study identifies regularities that can be exploited when examining a new dataset and contributes to a better understanding of the applicability of the geometric difference.
    Abstract Quantum kernel methods are a promising method in quantum machine learning thanks to the guarantees connected to them. Their accessibility for analytic considerations also opens up the possibility of prescreening datasets based on their potential for a quantum advantage. To do so, earlier works developed the geometric difference, which can be understood as a closeness measure between two kernel-based machine learning approaches, most importantly between a quantum kernel and classical kernel. This metric links the quantum and classical model complexities. Therefore, it raises the question of whether the geometric difference, based on its relation to model complexity, can be a useful tool in evaluations other than for the potential for quantum advantage. In this work, we investigate the effects of hyperparameter choice on the model performance and the generalization gap between classical and quantum kernels. The importance of hyperparameter optimization is well known also for classical machine learning. Especially for the quantum Hamiltonian evolution feature map, the scaling of the input data has been shown to be crucial. However, there are additional parameters left to be optimized, like the best number of qubits to trace out before computing a projected quantum kernel. We investigate the influence of these hyperparameters and compare the classically reliable method of cross validation with the method of choosing based on the geometric difference. Based on the thorough investigation of the hyperparameters across 11 datasets we identified commodities that can be exploited when examining a new dataset. In addition, our findings contribute to better understanding of the applicability of the geometric difference.
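For reference, the geometric difference this line of work builds on (Huang et al., "Power of data in quantum machine learning") compares two kernel Gram matrices; a commonly quoted form is g(K_c || K_q) = sqrt(|| sqrt(K_q) K_c^{-1} sqrt(K_q) ||) with the spectral norm. The snippet below computes that quantity for two synthetic Gram matrices; the exact regularized variant used in the paper may differ, so treat the details as an assumption rather than the paper's definition.

```python
import numpy as np
from scipy.linalg import sqrtm

def geometric_difference(K_c, K_q, jitter=1e-6):
    """g(K_c || K_q) = sqrt( || sqrt(K_q) K_c^{-1} sqrt(K_q) ||_spectral ).

    A small jitter stabilizes the inverse; Gram matrices are assumed to be
    normalized to a comparable trace.
    """
    n = K_c.shape[0]
    root_q = np.real(sqrtm(K_q))
    M = root_q @ np.linalg.inv(K_c + jitter * np.eye(n)) @ root_q
    return float(np.sqrt(np.linalg.norm(M, 2)))      # matrix 2-norm = spectral norm

# Toy Gram matrices on the same 40 points, standing in for a classical RBF
# kernel and a (projected) quantum kernel built from some feature map.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
K_classical = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
Phi = np.tanh(X @ rng.standard_normal((5, 8)))       # arbitrary stand-in feature map
K_quantum = Phi @ Phi.T
K_quantum *= 40 / np.trace(K_quantum)                # normalize traces to n
K_classical *= 40 / np.trace(K_classical)

print("geometric difference g(K_c || K_q):",
      round(geometric_difference(K_classical, K_quantum), 3))
```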

Building a Graph-based Deep Learning network model from captured traffic traces

  • paper_url: http://arxiv.org/abs/2310.11889
  • repo_url: None
  • paper_authors: Carlos Güemes-Palau, Miquel Ferriol Galmés, Albert Cabellos-Aparicio, Pere Barlet-Ros
  • for: Building a network model that better captures the complexity of real network scenarios, trained on captured traffic traces rather than simulated data.
  • methods: A Graph Neural Network (GNN)-based solution with a novel encoding of the sequence of captured packets and an improved message-passing algorithm that better represents the dependencies present in physical networks.
  • results: Experiments show that the proposed solution learns and generalizes to unseen captured network scenarios.
    Abstract Currently the state of the art network models are based or depend on Discrete Event Simulation (DES). While DES is highly accurate, it is also computationally costly and cumbersome to parallelize, making it unpractical to simulate high performance networks. Additionally, simulated scenarios fail to capture all of the complexities present in real network scenarios. While there exists network models based on Machine Learning (ML) techniques to minimize these issues, these models are also trained with simulated data and hence vulnerable to the same pitfalls. Consequently, the Graph Neural Networking Challenge 2023 introduces a dataset of captured traffic traces that can be used to build a ML-based network model without these limitations. In this paper we propose a Graph Neural Network (GNN)-based solution specifically designed to better capture the complexities of real network scenarios. This is done through a novel encoding method to capture information from the sequence of captured packets, and an improved message passing algorithm to better represent the dependencies present in physical networks. We show that the proposed solution it is able to learn and generalize to unseen captured network scenarios.

Online Convex Optimization with Switching Cost and Delayed Gradients

  • paper_url: http://arxiv.org/abs/2310.11880
  • repo_url: None
  • paper_authors: Spandan Senapati, Rahul Vaze
  • for: Online convex optimization (OCO) with quadratic and linear switching costs in the limited-information setting, where only gradient information about the previous objective function is available.
  • methods: An online multiple gradient descent (OMGD) algorithm for $L$-smooth, $\mu$-strongly convex objectives.
  • results: For quadratic switching cost, OMGD's competitive ratio is at most $4(L + 5) + \frac{16(L + 5)}{\mu}$, which is order-wise tight, and any online algorithm has competitive ratio at least $\max\{\Omega(L), \Omega(L/\sqrt{\mu})\}$; OMGD also achieves order-wise optimal dynamic regret, and for linear switching cost its (order-wise optimal) competitive ratio additionally depends on the path length and squared path length of the problem instance.
    Abstract We consider the online convex optimization (OCO) problem with quadratic and linear switching cost in the limited information setting, where an online algorithm can choose its action using only gradient information about the previous objective function. For $L$-smooth and $\mu$-strongly convex objective functions, we propose an online multiple gradient descent (OMGD) algorithm and show that its competitive ratio for the OCO problem with quadratic switching cost is at most $4(L + 5) + \frac{16(L + 5)}{\mu}$. The competitive ratio upper bound for OMGD is also shown to be order-wise tight in terms of $L,\mu$. In addition, we show that the competitive ratio of any online algorithm is $\max\{\Omega(L), \Omega(\frac{L}{\sqrt{\mu})\}$ in the limited information setting when the switching cost is quadratic. We also show that the OMGD algorithm achieves the optimal (order-wise) dynamic regret in the limited information setting. For the linear switching cost, the competitive ratio upper bound of the OMGD algorithm is shown to depend on both the path length and the squared path length of the problem instance, in addition to $L, \mu$, and is shown to be order-wise, the best competitive ratio any online algorithm can achieve. Consequently, we conclude that the optimal competitive ratio for the quadratic and linear switching costs are fundamentally different in the limited information setting.
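To see the setting concretely: at each round the learner pays f_t(x_t) plus a quadratic switching cost (m/2)*||x_t - x_{t-1}||^2, and in the limited-information setting only gradients of the previous loss are available. The sketch below contrasts a single gradient step with several gradient steps on the most recently revealed loss before committing the next action, which is the flavor of OMGD; the number of inner steps, step size, and loss family are my own choices, not the paper's tuned constants.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, m = 5, 200, 1.0                  # dimension, horizon, switching-cost weight
mu, L = 1.0, 10.0                      # strong convexity / smoothness of each loss

def make_loss():
    A = np.diag(rng.uniform(mu, L, d))
    b = rng.standard_normal(d)
    return (lambda x, A=A, b=b: 0.5 * x @ A @ x - b @ x,    # value
            lambda x, A=A, b=b: A @ x - b)                   # gradient

losses = [make_loss() for _ in range(T)]

def run(inner_steps):
    """inner_steps=1 is plain online gradient descent; >1 gives the OMGD flavor."""
    x_prev = np.zeros(d)
    total = 0.0
    for t in range(T):
        x = x_prev.copy()
        if t > 0:                      # only gradients of the *previous* loss are revealed
            _, g_prev = losses[t - 1]
            for _ in range(inner_steps):
                x = x - (1.0 / L) * g_prev(x)
        f_t, _ = losses[t]
        total += f_t(x) + 0.5 * m * np.sum((x - x_prev) ** 2)
        x_prev = x
    return total

print("cumulative cost with 1 inner gradient step :", round(run(1), 2))
print("cumulative cost with 5 inner gradient steps:", round(run(5), 2))
```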

SQ Lower Bounds for Learning Mixtures of Linear Classifiers

  • paper_url: http://arxiv.org/abs/2310.11876
  • repo_url: None
  • paper_authors: Ilias Diakonikolas, Daniel M. Kane, Yuxin Sun
  • for: Learning mixtures of linear classifiers under Gaussian covariates, in total variation distance.
  • methods: A Statistical Query (SQ) lower-bound argument built on a new construction of spherical designs, which may be of independent interest.
  • results: The complexity of any SQ algorithm for the problem is $n^{\mathrm{poly}(1/\Delta) \log(r)}$, where $\Delta$ is a lower bound on the pairwise $\ell_2$-separation between the $\mathbf{v}_\ell$'s, suggesting that known algorithms are essentially best possible, even for the special case of uniform mixtures.
    Abstract We study the problem of learning mixtures of linear classifiers under Gaussian covariates. Given sample access to a mixture of $r$ distributions on $\mathbb{R}^n$ of the form $(\mathbf{x},y_{\ell})$, $\ell\in [r]$, where $\mathbf{x}\sim\mathcal{N}(0,\mathbf{I}_n)$ and $y_\ell=\mathrm{sign}(\langle\mathbf{v}_\ell,\mathbf{x}\rangle)$ for an unknown unit vector $\mathbf{v}_\ell$, the goal is to learn the underlying distribution in total variation distance. Our main result is a Statistical Query (SQ) lower bound suggesting that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures. In particular, we show that the complexity of any SQ algorithm for the problem is $n^{\mathrm{poly}(1/\Delta) \log(r)}$, where $\Delta$ is a lower bound on the pairwise $\ell_2$-separation between the $\mathbf{v}_\ell$'s. The key technical ingredient underlying our result is a new construction of spherical designs that may be of independent interest.

Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function

  • paper_url: http://arxiv.org/abs/2310.11866
  • repo_url: None
  • paper_authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao
  • for: Making trust-region (TR) and adaptive cubic regularization (ARC) methods for non-convex optimization practical when only inexact computations are available.
  • methods: A family of stochastic TR and ARC methods that simultaneously allow inexact function values, gradients, and Hessian matrices when choosing the next search direction and adjusting parameters, requiring much less per-iteration overhead than exact TR and ARC.
  • results: The iteration complexity to reach $\epsilon$-approximate second-order optimality matches that of exact computations; the mild inexactness conditions can be met via random sampling in finite-sum minimization, and experiments on a non-convex problem show lower computational overhead per iteration than current second-order methods.
    Abstract Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for non-convex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations help largely reduce the computational cost, it is challenging to theoretically guarantee the convergence rate. In this paper, we explore a family of stochastic TR and ARC methods that can simultaneously provide inexact computations of the Hessian matrix, gradient, and function values. Our algorithms require much fewer propagations overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as the exact computations demonstrated in previous studies. Additionally, the mild conditions on inexactness can be met by leveraging a random sampling technology in the finite-sum minimization problem. Numerical experiments with a non-convex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.

Effective and Efficient Federated Tree Learning on Hybrid Data

  • paper_url: http://arxiv.org/abs/2310.11865
  • repo_url: None
  • paper_authors: Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song
  • for: Federated learning on hybrid data, where data from different parties may differ in both features and samples.
  • methods: HybridTree, a federated tree-learning approach that exploits the existence of consistent split rules so that the parties' knowledge can be incorporated into the lower layers of a tree, yielding a layer-level training scheme that avoids frequent communication.
  • results: HybridTree achieves accuracy comparable to the centralized setting with low computational and communication overhead, and up to 8x speedup over the other baselines.
    Abstract Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either horizontal or vertical data settings, where the data of different parties are assumed to be from the same feature or sample space. In practice, a common scenario is the hybrid data setting, where data from different parties may differ both in the features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that does not need frequent communication traffic to train a tree. Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead. HybridTree can achieve up to 8 times speedup compared with the other baselines.

Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.11845
  • repo_url: None
  • paper_authors: Yufei Kuang, Xijun Li, Jie Wang, Fangzhou Zhu, Meng Lu, Zhihai Wang, Jia Zeng, Houqiang Li, Yongdong Zhang, Feng Wu
  • for: Accelerating presolve in large-scale linear programming (LP) solvers, where designing high-quality presolve routines (which presolvers to select, in what order to execute them, and when to stop) requires extensive expert knowledge.
  • methods: The routine-design task is formulated as a Markov decision process and solved with a reinforcement learning framework (RL4Presolve) that uses adaptive action sequences to generate high-quality presolve routines efficiently.
  • results: On two solvers (open-source and commercial) and eight benchmarks (real-world and synthetic), RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs, especially those from industry; rules extracted from the learned policies allow simple deployment to Huawei's supply chain, showing encouraging economic and academic potential.
    Abstract Large-scale LP problems from industry usually contain much redundancy that severely hurts the efficiency and reliability of solving LPs, making presolve (i.e., the problem simplification module) one of the most critical components in modern LP solvers. However, how to design high-quality presolve routines -- that is, the program determining (P1) which presolvers to select, (P2) in what order to execute, and (P3) when to stop -- remains a highly challenging task due to the extensive requirements on expert knowledge and the large search space. Due to the sequential decision property of the task and the lack of expert demonstrations, we propose a simple and efficient reinforcement learning (RL) framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously. Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently. Note that adaptive action sequences help learn complex behaviors efficiently and adapt to various benchmarks. Experiments on two solvers (open-source and commercial) and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs, especially on benchmarks from industry. Furthermore, we optimize the hard-coded presolve routines in LP solvers by extracting rules from learned policies for simple and efficient deployment to Huawei's supply chain. The results show encouraging economic and academic potential for incorporating machine learning to modern solvers.

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.11840
  • repo_url: None
  • paper_authors: Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Skalse
  • for: Comparing the expressivity of objective-specification formalisms in reinforcement learning (RL).
  • methods: Seventeen formalisms, including Linear Temporal Logic and Multi-Objective Reinforcement Learning, are placed in a preorder based on their expressive power and presented as a Hasse diagram.
  • results: Each formalism has limitations, and none is both dominantly expressive and straightforward to optimize with current techniques; for example, Regularised RL, Outer Nonlinear Markov Rewards, Reward Machines, Linear Temporal Logic, and Limit Average Rewards can each express an objective that the others cannot. These findings matter both for specifying objectives in practice and for adapting reward learning to formalisms beyond Markovian rewards.
    Abstract To solve a task with reinforcement learning (RL), it is necessary to formally specify the goal of that task. Although most RL algorithms require that the goal is formalised as a Markovian reward function, alternatives have been developed (such as Linear Temporal Logic and Multi-Objective Reinforcement Learning). Moreover, it is well known that some of these formalisms are able to express certain tasks that other formalisms cannot express. However, there has not yet been any thorough analysis of how these formalisms relate to each other in terms of expressivity. In this work, we fill this gap in the existing literature by providing a comprehensive comparison of the expressivities of 17 objective-specification formalisms in RL. We place these formalisms in a preorder based on their expressive power, and present this preorder as a Hasse diagram. We find a variety of limitations for the different formalisms, and that no formalism is both dominantly expressive and straightforward to optimise with current techniques. For example, we prove that each of Regularised RL, Outer Nonlinear Markov Rewards, Reward Machines, Linear Temporal Logic, and Limit Average Rewards can express an objective that the others cannot. Our findings have implications for both policy optimisation and reward learning. Firstly, we identify expressivity limitations which are important to consider when specifying objectives in practice. Secondly, our results highlight the need for future research which adapts reward learning to work with a variety of formalisms, since many existing reward learning methods implicitly assume that desired objectives can be expressed with Markovian rewards. Our work contributes towards a more cohesive understanding of the costs and benefits of different RL objective-specification formalisms.

Equivariant Bootstrapping for Uncertainty Quantification in Imaging Inverse Problems

  • paper_url: http://arxiv.org/abs/2310.11838
  • repo_url: https://github.com/tachella/equivariant_bootstrap
  • paper_authors: Julian Tachella, Marcelo Pereyra
  • for: This paper aims to accurately quantify the uncertainty in solutions to severely ill-posed scientific imaging problems, which is critical for interpreting experimental results and using reconstructed images as scientific evidence.
  • methods: The proposed uncertainty quantification methodology is based on an equivariant formulation of the parametric bootstrap algorithm, which leverages symmetries and invariance properties commonly encountered in imaging problems. The method is general and can be applied with any image reconstruction technique, including unsupervised training strategies.
  • results: The proposed method delivers remarkably accurate high-dimensional confidence regions and outperforms alternative uncertainty quantification strategies in terms of estimation accuracy, uncertainty quantification accuracy, and computing time. The method is demonstrated through a series of numerical experiments.
    Abstract Scientific imaging problems are often severely ill-posed, and hence have significant intrinsic uncertainty. Accurately quantifying the uncertainty in the solutions to such problems is therefore critical for the rigorous interpretation of experimental results as well as for reliably using the reconstructed images as scientific evidence. Unfortunately, existing imaging methods are unable to quantify the uncertainty in the reconstructed images in a manner that is robust to experiment replications. This paper presents a new uncertainty quantification methodology based on an equivariant formulation of the parametric bootstrap algorithm that leverages symmetries and invariance properties commonly encountered in imaging problems. Additionally, the proposed methodology is general and can be easily applied with any image reconstruction technique, including unsupervised training strategies that can be trained from observed data alone, thus enabling uncertainty quantification in situations where there is no ground truth data available. We demonstrate the proposed approach with a series of numerical experiments and through comparisons with alternative uncertainty quantification strategies from the state-of-the-art, such as Bayesian strategies involving score-based diffusion models and Langevin samplers. In all our experiments, the proposed method delivers remarkably accurate high-dimensional confidence regions and outperforms the competing approaches in terms of estimation accuracy, uncertainty quantification accuracy, and computing time.
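The equivariant bootstrap can be sketched in a few lines for a toy, shift-invariant inverse problem: apply a random group transformation (here a cyclic shift) to the current reconstruction, re-simulate measurements, reconstruct, map back with the inverse transformation, and read confidence intervals off the spread of the resulting reconstructions. The deconvolution setup and Tikhonov reconstruction below are my own stand-ins, not the experiments from the paper or its repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 128, 0.05

# Toy 1-D inverse problem: circular blur (a shift-equivariant forward operator) + noise.
x_true = np.zeros(n)
x_true[40:60] = 1.0
x_true[80:85] = 2.0
kernel = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
kernel /= kernel.sum()
H = np.fft.fft(kernel, n)

def forward(x):                        # y = A x, circular convolution
    return np.real(np.fft.ifft(np.fft.fft(x) * H))

def reconstruct(y, lam=1e-2):          # Tikhonov-regularized deconvolution
    return np.real(np.fft.ifft(np.conj(H) * np.fft.fft(y) / (np.abs(H) ** 2 + lam)))

y = forward(x_true) + sigma * rng.standard_normal(n)
x_hat = reconstruct(y)

# Equivariant parametric bootstrap: random shift g, simulate, reconstruct, shift back.
boot = []
for _ in range(300):
    g = int(rng.integers(n))
    y_b = forward(np.roll(x_hat, g)) + sigma * rng.standard_normal(n)
    boot.append(np.roll(reconstruct(y_b), -g))
boot = np.array(boot)

lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
covered = np.mean((x_true >= lo) & (x_true <= hi))
print("fraction of pixels whose 95% percentile interval covers the truth:",
      round(float(covered), 2))
```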

Optimising Distributions with Natural Gradient Surrogates

  • paper_url: http://arxiv.org/abs/2310.11837
  • repo_url: None
  • paper_authors: Jonathan So, Richard E. Turner
  • for: Optimizing the parameters of probability distributions with natural gradient methods.
  • methods: The optimization is reframed with respect to the parameters of a surrogate distribution for which computing the natural gradient is easy; several existing methods are shown to be instances of this technique, and a new method is proposed that applies it to a wide variety of problems.
  • results: The approach expands the set of distributions that can be efficiently targeted with natural gradients; it is fast, easy to understand, simple to implement with standard autodiff software, and requires no lengthy model-specific derivations, as demonstrated on maximum likelihood estimation and variational inference tasks.
    Abstract Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient has a number of challenges. In this work we propose a novel technique for tackling such issues, which involves reframing the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.
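One way to read the trick: if the target distribution's parameters are a function of the parameters of a surrogate whose Fisher information is available in closed form, natural-gradient steps can be taken in the surrogate's parameter space. The toy below fits a log-normal by maximum likelihood, using the underlying Gaussian in (mu, log sigma) as the surrogate with its known Fisher; it is my own minimal illustration, not the paper's general method or software.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.2, sigma=0.7, size=5000)

# Log-normal log-likelihood gradient, parameterized via the surrogate Gaussian's (mu, log_s).
def loglik_grad(mu, log_s):
    s = np.exp(log_s)
    z = (np.log(data) - mu) / s
    d_mu = np.mean(z / s)
    d_log_s = np.mean(z ** 2 - 1.0)          # chain rule through s = exp(log_s)
    return np.array([d_mu, d_log_s])

# Fisher information of a Gaussian in (mu, log_s) is diag(1/s^2, 2), so its
# inverse is available in closed form.
def fisher_inv(log_s):
    return np.diag([np.exp(2 * log_s), 0.5])

theta = np.array([0.0, 0.0])                 # (mu, log_s)
for _ in range(100):
    theta = theta + 0.5 * fisher_inv(theta[1]) @ loglik_grad(*theta)   # natural-gradient ascent

print("estimated mu, sigma:", round(float(theta[0]), 3),
      round(float(np.exp(theta[1])), 3))     # true values are 1.2 and 0.7
```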

CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition

  • paper_url: http://arxiv.org/abs/2310.11830
  • repo_url: https://github.com/knoriy/CLARA
  • paper_authors: Kari A Noriy, Xiaosong Yang, Marcin Budka, Jian Jun Zhang
  • for: A multilingual contrastive framework for speech and sound representation learning, addressing the lack of sizeable labelled datasets across languages.
  • methods: Encoders are trained on a large corpus of multilingual audio with data augmentation; the contrastive objective maximizes agreement between positive pairs and minimizes it between negative pairs, so shared, expressive representations (including emotive dimensions) are learned self-supervised and transfer across languages with limited target-language data.
  • results: State-of-the-art performance on emotion recognition, audio classification, and retrieval benchmarks under zero-shot and few-shot conditions.
    Abstract This paper proposes a novel framework for multilingual speech and sound representation learning using contrastive learning. The lack of sizeable labelled datasets hinders speech-processing research across languages. Recent advances in contrastive learning provide self-supervised techniques to learn from unlabelled data. Motivated by reducing data dependence and improving generalisation across diverse languages and conditions, we develop a multilingual contrastive framework. This framework enables models to acquire shared representations across languages, facilitating cross-lingual transfer with limited target language data. Additionally, capturing emotional cues within speech is challenging due to subjective perceptual assessments. By learning expressive representations from diverse, multilingual data in a self-supervised manner, our approach aims to develop speech representations that encode emotive dimensions. Our method trains encoders on a large corpus of multi-lingual audio data. Data augmentation techniques are employed to expand the dataset. The contrastive learning approach trains the model to maximise agreement between positive pairs and minimise agreement between negative pairs. Extensive experiments demonstrate state-of-the-art performance of the proposed model on emotion recognition, audio classification, and retrieval benchmarks under zero-shot and few-shot conditions. This provides an effective approach for acquiring shared and generalised speech representations across languages and acoustic conditions while encoding latent emotional dimensions.
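The training signal is a standard bidirectional contrastive (InfoNCE / CLIP-style) objective between paired embeddings. The sketch below shows only that loss, with random tensors standing in for the audio and text encoder outputs; the encoders, projection heads, augmentations, and temperature schedule of the actual model are omitted or assumed.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """CLIP-style loss: matching (audio_i, text_i) pairs are positives and all
    other pairs in the batch are negatives, in both retrieval directions."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature           # (batch, batch) similarity matrix
    targets = torch.arange(a.shape[0], device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Random stand-ins for the outputs of an audio encoder and a text encoder on a
# batch of 16 paired (utterance, transcription/caption) examples.
audio_emb = torch.randn(16, 512, requires_grad=True)
text_emb = torch.randn(16, 512, requires_grad=True)

loss = symmetric_contrastive_loss(audio_emb, text_emb)
loss.backward()                               # gradients flow to both encoders
print("contrastive loss on random embeddings:", float(loss))   # ~log(16) = 2.77
```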

Towards Graph Foundation Models: A Survey and Beyond

  • paper_url: http://arxiv.org/abs/2310.11829
  • repo_url: None
  • paper_authors: Jiawei Liu, Cheng Yang, Zhiyuan Lu, Junze Chen, Yibo Li, Mengmei Zhang, Ting Bai, Yuan Fang, Lichao Sun, Philip S. Yu, Chuan Shi
  • for: Surveying graph foundation models (GFMs) and their potential across diverse artificial intelligence applications.
  • methods: Proposes the concept of GFMs, provides the first comprehensive elucidation of their key characteristics and technologies, and categorizes existing work into three groups based on reliance on graph neural networks and large language models.
  • results: A comprehensive overview of the current landscape of graph foundation models, together with a discussion of potential research directions for this evolving field.
    Abstract Emerging as fundamental building blocks for diverse artificial intelligence applications, foundation models have achieved notable success across natural language processing and many other domains. Parallelly, graph machine learning has witnessed a transformative shift, with shallow methods giving way to deep learning approaches. The emergence and homogenization capabilities of foundation models have piqued the interest of graph machine learning researchers, sparking discussions about developing the next graph learning paradigm that is pre-trained on broad graph data and can be adapted to a wide range of downstream graph tasks. However, there is currently no clear definition and systematic analysis for this type of work. In this article, we propose the concept of graph foundation models (GFMs), and provide the first comprehensive elucidation on their key characteristics and technologies. Following that, we categorize existing works towards GFMs into three categories based on their reliance on graph neural networks and large language models. Beyond providing a comprehensive overview of the current landscape of graph foundation models, this article also discusses potential research directions for this evolving field.

A Historical Context for Data Streams

  • paper_url: http://arxiv.org/abs/2310.19811
  • repo_url: None
  • paper_authors: Indre Zliobaite, Jesse Read
  • for: Machine learning from data streams, an active and growing research area.
  • methods: Reviews the historical context of data streams research, including the strict assumptions tied to computational resource constraints, such as inspecting each instance at most once and being ready to give a prediction at any time.
  • results: The common assumptions used in machine learning over data streams are placed in their historical context.
    Abstract Machine learning from data streams is an active and growing research area. Research on learning from streaming data typically makes strict assumptions linked to computational resource constraints, including requirements for stream mining algorithms to inspect each instance not more than once and be ready to give a prediction at any time. Here we review the historical context of data streams research placing the common assumptions used in machine learning over data streams in their historical context.

De novo protein design using geometric vector field networks

  • paper_url: http://arxiv.org/abs/2310.11802
  • repo_url: None
  • paper_authors: Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen
  • for: De novo protein design, where advances such as protein diffusion depend on encoders that model residue backbone frames in which no atoms exist.
  • methods: The Vector Field Network (VFN), whose layers perform learnable vector computations between coordinates of frame-anchored virtual atoms; the resulting feature vectors update residue representations and virtual atom coordinates via attention aggregation, and since real atoms can be treated as virtual atoms, VFN can model both frames and atoms.
  • results: In protein diffusion (frame modeling), VFN outperforms IPA in designability (67.04% vs. 53.58%) and diversity (66.54% vs. 51.98%); in inverse folding (frame and atom modeling), it surpasses PiFold on sequence recovery rate (54.7% vs. 51.66%), and equipping VFN with the ESM model substantially exceeds the previous ESM-based SoTA, LM-Design (62.67% vs. 55.65%).
    Abstract Innovations like protein diffusion have enabled significant progress in de novo protein design, which is a vital topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. Thus far, only several simple encoders, such as IPA, have been proposed for this scenario, exposing the frame modeling as a bottleneck. In this work, we proffer the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. The vector computation operates in a manner similar to a linear layer, with each input channel receiving 3D virtual atom coordinates instead of scalar values. The multiple feature vectors output by the vector computation are then used to update the residue representations and virtual atom coordinates via attention aggregation. Remarkably, VFN also excels in modeling both frames and atoms, as the real atoms can be treated as the virtual atoms for modeling, positioning VFN as a potential universal encoder. In protein diffusion (frame modeling), VFN exhibits an impressive performance advantage over IPA, excelling in terms of both designability (67.04% vs. 53.58%) and diversity (66.54% vs. 51.98%). In inverse folding (frame and atom modeling), VFN outperforms the previous SoTA model, PiFold (54.7% vs. 51.66%), on sequence recovery rate. We also propose a method of equipping VFN with the ESM model, which significantly surpasses the previous ESM-based SoTA (62.67% vs. 55.65%), LM-Design, by a substantial margin.

Adversarial Training for Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2310.11789
  • repo_url: https://github.com/yaoli90/at-pinn
  • paper_authors: Yao Li, Shengzhu Shi, Zhichang Guo, Boying Wu
  • for: Addressing the lack of robustness of vanilla physics-informed neural networks (PINNs) on complex PDEs, especially those with multi-scale behaviors or sharp/oscillatory solutions.
  • methods: AT-PINNs, an adversarial training strategy based on projected gradient descent (PGD) adversarial attacks: the model is fine-tuned with adversarial samples that accurately identify failure locations and drive training to focus on those regions; inference with temporal causality is supported by selecting initial collocation points around temporal initial values.
  • results: On an elliptic equation with multi-scale coefficients, a Poisson equation with multi-peak solutions, the Burgers equation with sharp solutions, and the Allen-Cahn equation, AT-PINNs effectively locate and reduce failure regions; because locating failure regions through adversarial attacks does not depend on their size or the complexity of their distribution, the approach suits complex PDEs.
    Abstract Physics-informed neural networks have shown great promise in solving partial differential equations. However, due to insufficient robustness, vanilla PINNs often face challenges when solving complex PDEs, especially those involving multi-scale behaviors or solutions with sharp or oscillatory characteristics. To address these issues, based on the projected gradient descent adversarial attack, we proposed an adversarial training strategy for PINNs termed by AT-PINNs. AT-PINNs enhance the robustness of PINNs by fine-tuning the model with adversarial samples, which can accurately identify model failure locations and drive the model to focus on those regions during training. AT-PINNs can also perform inference with temporal causality by selecting the initial collocation points around temporal initial values. We implement AT-PINNs to the elliptic equation with multi-scale coefficients, Poisson equation with multi-peak solutions, Burgers equation with sharp solutions and the Allen-Cahn equation. The results demonstrate that AT-PINNs can effectively locate and reduce failure regions. Moreover, AT-PINNs are suitable for solving complex PDEs, since locating failure regions through adversarial attacks is independent of the size of failure regions or the complexity of the distribution.
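The adversarial ingredient is easy to illustrate: run a few signed-gradient (PGD-style) updates that move collocation points toward locations of large PDE residual, then include those points in the next training batch. The sketch below does this for a 1-D Poisson problem u'' + pi^2 sin(pi x) = 0 with zero boundary values; it follows the general PGD-on-collocation-points idea under my own hyperparameter choices, not the authors' exact training schedule or code.

```python
import math
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))

def residual(x):
    """PDE residual of u'' + pi^2 sin(pi x) = 0 at points x (x must require grad)."""
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u + math.pi ** 2 * torch.sin(math.pi * x)

def pgd_collocation(x0, steps=5, step_size=0.02):
    """Move collocation points toward high-residual locations, projected back to [0, 1]."""
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        g = torch.autograd.grad(residual(x).pow(2).mean(), x)[0]
        x = (x.detach() + step_size * g.sign()).clamp(0.0, 1.0)
    return x

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
xb = torch.tensor([[0.0], [1.0]])                       # boundary points, u = 0
for it in range(2000):
    x = torch.rand(128, 1)
    if it > 500:                                        # adversarial fine-tuning phase
        x = torch.cat([x, pgd_collocation(torch.rand(64, 1))])
    x.requires_grad_(True)
    loss = residual(x).pow(2).mean() + net(xb).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

x_test = torch.linspace(0, 1, 101).unsqueeze(1)
err = (net(x_test) - torch.sin(math.pi * x_test)).abs().max()
print("max abs error vs. the exact solution sin(pi x):", float(err))
```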

NeuroCUT: A Neural Approach for Robust Graph Partitioning

  • paper_url: http://arxiv.org/abs/2310.11787
  • repo_url: None
  • paper_authors: Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sourav Medya, Sayan Ranu
  • for: Graph partitioning: dividing a graph into k disjoint subsets while optimizing a specific partitioning objective.
  • methods: NeuroCUT, a neural framework that is inductive to both graph topology and the partition count (provided at query time) and uses a reinforcement learning formulation over node representations derived from a graph neural network, so it can accommodate any optimization objective, including non-differentiable ones.
  • results: NeuroCUT identifies high-quality partitions, generalizes across a wide spectrum of partitioning objectives, and is resilient to topological modifications.
    Abstract Graph partitioning aims to divide a graph into $k$ disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. As a result, conventional approximation algorithms rely on heuristic methods, sometimes with approximation guarantees and sometimes without. Unfortunately, traditional approaches are tailored for specific partitioning objectives and do not generalize well across other known partitioning objectives from the literature. To overcome this limitation, and learn heuristics from the data directly, neural approaches have emerged, demonstrating promising outcomes. In this study, we extend this line of work through a novel framework, NeuroCut. NeuroCut introduces two key innovations over prevailing methodologies. First, it is inductive to both graph topology and the partition count, which is provided at query time. Second, by leveraging a reinforcement learning based framework over node representations derived from a graph neural network, NeuroCut can accommodate any optimization objective, even those encompassing non-differentiable functions. Through empirical evaluation, we demonstrate that NeuroCut excels in identifying high-quality partitions, showcases strong generalization across a wide spectrum of partitioning objectives, and exhibits resilience to topological modifications.

A Quasi-Wasserstein Loss for Learning Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.11762
  • repo_url: None
  • paper_authors: Minjie Cheng, Hongteng Xu
  • for: Improving Graph Neural Network (GNN) performance on node-level prediction tasks, where existing loss functions are applied to each node independently even though node embeddings and labels are non-i.i.d. because of the graph structure.
  • methods: A novel Quasi-Wasserstein (QW) loss based on optimal transport defined on graphs, leading to new GNN learning and prediction paradigms; the loss measures a quasi-Wasserstein distance between observed node labels and their estimates by optimizing a label transport defined on graph edges.
  • results: Experiments show that the QW loss applies to various GNN models and improves their performance on node-level classification and regression tasks, while also providing a new transductive prediction paradigm.
    Abstract When learning graph neural networks (GNNs) in node-level prediction tasks, most existing loss functions are applied for each node independently, even if node embeddings and their labels are non-i.i.d. because of their graph structures. To eliminate such inconsistency, in this study we propose a novel Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on graphs, leading to new learning and prediction paradigms of GNNs. In particular, we design a "Quasi-Wasserstein" distance between the observed multi-dimensional node labels and their estimations, optimizing the label transport defined on graph edges. The estimations are parameterized by a GNN in which the optimal label transport may determine the graph edge weights optionally. By reformulating the strict constraint of the label transport to a Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein loss associated with two efficient solvers learning the GNN together with optimal label transport. When predicting node labels, our model combines the output of the GNN with the residual component provided by the optimal label transport, leading to a new transductive prediction paradigm. Experiments show that the proposed QW loss applies to various GNNs and helps to improve their performance in node-level classification and regression tasks.
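
The following is a loose, simplified sketch of the prediction paradigm described above rather than the paper's exact Quasi-Wasserstein formulation: a per-edge label-transport variable is learned jointly with the model, predictions combine the model output with the net transport into each node, and a squared-norm (Bregman-style) regularizer stands in for the strict transport constraint. The MLP used in place of a GNN, the random graph, and the regularization weight are placeholder assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, num_classes, feat_dim = 100, 4, 16
x = torch.randn(num_nodes, feat_dim)
edges = torch.randint(0, num_nodes, (2, 300))              # directed edge list (src, dst)
y = torch.randint(0, num_classes, (num_nodes,))
train_mask = torch.rand(num_nodes) < 0.5

# Stand-in for a GNN encoder (a real GNN would also use the graph structure).
gnn = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))
transport = torch.zeros(edges.size(1), num_classes, requires_grad=True)   # per-edge label flow
opt = torch.optim.Adam(list(gnn.parameters()) + [transport], lr=1e-2)

def node_scores():
    base = gnn(x)                                          # model component
    inflow = torch.zeros_like(base).index_add(0, edges[1], transport)
    outflow = torch.zeros_like(base).index_add(0, edges[0], transport)
    return base + inflow - outflow                         # model output + transport residual

for step in range(200):
    scores = node_scores()
    loss = nn.functional.cross_entropy(scores[train_mask], y[train_mask])
    loss = loss + 1e-3 * transport.pow(2).sum()            # Bregman-style regularizer on the transport
    opt.zero_grad()
    loss.backward()
    opt.step()

pred = node_scores().argmax(dim=1)                         # transductive predictions for all nodes
```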

Unintended Memorization in Large ASR Models, and How to Mitigate It

  • paper_url: http://arxiv.org/abs/2310.11739
  • repo_url: None
  • paper_authors: Lun Wang, Om Thakkar, Rajiv Mathews
  • for: Auditing unintended memorization in large automatic speech recognition (ASR) models to protect privacy.
  • methods: A simple auditing method that speeds up randomly generated utterances, creating a mapping between vocal and text information that is hard to learn from typical training examples, so that accurate predictions on these sped-up examples serve as clear evidence of memorization.
  • results: Memorization is exposed in state-of-the-art ASR models and mitigated with gradient clipping: clipping each example's gradient handles sped-up examples repeated up to 16 times in the training set, and in large-scale distributed training, clipping the average gradient on each compute core keeps model quality and compute cost neutral while providing strong privacy protection.
    Abstract It is well-known that neural networks can unintentionally memorize their training examples, causing privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging due to the high compute cost of existing methods such as hardness calibration. In this work, we design a simple auditing method to measure memorization in large ASR models without the extra compute overhead. Concretely, we speed up randomly-generated utterances to create a mapping between vocal and text information that is difficult to learn from typical training examples. Hence, accurate predictions only for sped-up training examples can serve as clear evidence for memorization, and the corresponding accuracy can be used to measure memorization. Using the proposed method, we showcase memorization in the state-of-the-art ASR models. To mitigate memorization, we tried gradient clipping during training to bound the influence of any individual example on the final model. We empirically show that clipping each example's gradient can mitigate memorization for sped-up training examples with up to 16 repetitions in the training set. Furthermore, we show that in large-scale distributed training, clipping the average gradient on each compute core maintains neutral model quality and compute cost while providing strong privacy protection.
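
As an assumed illustration of the mitigation described above, the sketch below implements per-example gradient clipping: each example's gradient is computed separately, clipped to a fixed L2 norm so that no single example dominates the update, and the clipped gradients are averaged before the optimizer step. The linear model, clipping bound, and toy batch are placeholders standing in for a large ASR model and its training pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(40, 10)                      # stand-in for a large ASR model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
clip_norm = 1.0

def clipped_step(batch_x, batch_y):
    accum = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(batch_x, batch_y):         # one example at a time
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total + 1e-12), max=1.0)
        for a, g in zip(accum, grads):
            a += scale * g                     # clipped contribution of this example
    model.zero_grad()
    for p, a in zip(model.parameters(), accum):
        p.grad = a / len(batch_x)              # average of clipped per-example gradients
    opt.step()

batch_x = torch.randn(8, 40)
batch_y = torch.randint(0, 10, (8,))
clipped_step(batch_x, batch_y)
```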

On the Evaluation of Generative Models in Distributed Learning Tasks

  • paper_url: http://arxiv.org/abs/2310.11714
  • repo_url: None
  • paper_authors: Zixiao Wang, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu
  • for: Evaluating deep generative models, including generative adversarial networks (GANs) and diffusion models, in distributed learning settings such as federated learning.
  • methods: Aggregate evaluation scores based on the Fréchet inception distance (FID) and the kernel inception distance (KID), comparing client-averaged scores (FID-avg, KID-avg) with scores computed against the pooled dataset of all clients (FID-all, KID-all).
  • results: FID-avg and FID-all can rank generative models inconsistently in distributed learning tasks, whereas KID-avg and KID-all always induce the same ranking; numerical experiments on standard image datasets support these findings.
    Abstract The evaluation of deep generative models including generative adversarial networks (GANs) and diffusion models has been extensively studied in the literature. While the existing evaluation methods mainly target a centralized learning problem with training data stored by a single client, many applications of generative models concern distributed learning settings, e.g. the federated learning scenario, where training data are collected by and distributed among several clients. In this paper, we study the evaluation of generative models in distributed learning tasks with heterogeneous data distributions. First, we focus on the Fréchet inception distance (FID) and consider the following FID-based aggregate scores over the clients: 1) FID-avg as the mean of clients' individual FID scores, 2) FID-all as the FID distance of the trained model to the collective dataset containing all clients' data. We prove that the model rankings according to the FID-all and FID-avg scores could be inconsistent, which can lead to different optimal generative models according to the two aggregate scores. Next, we consider the kernel inception distance (KID) and similarly define the KID-avg and KID-all aggregations. Unlike the FID case, we prove that KID-all and KID-avg result in the same rankings of generative models. We perform several numerical experiments on standard image datasets and training schemes to support our theoretical findings on the evaluation of generative models in distributed learning problems.
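
To make the two aggregate scores concrete, the sketch below (an assumption about implementation details, working from Gaussian feature statistics already extracted per client and for the generator) computes FID-avg as the mean of per-client FID values and FID-all as the FID against the pooled data of all clients. The toy heterogeneous clients and feature dimension are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, cov1, mu2, cov2):
    # Frechet distance between two Gaussians fitted to feature sets.
    covmean = sqrtm(cov1 @ cov2)
    covmean = covmean.real if np.iscomplexobj(covmean) else covmean
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * covmean))

def aggregate_fids(gen_feats, client_feats):
    mu_g, cov_g = gen_feats.mean(0), np.cov(gen_feats, rowvar=False)
    per_client = [fid(mu_g, cov_g, f.mean(0), np.cov(f, rowvar=False)) for f in client_feats]
    pooled = np.concatenate(client_feats, axis=0)
    fid_all = fid(mu_g, cov_g, pooled.mean(0), np.cov(pooled, rowvar=False))
    return float(np.mean(per_client)), fid_all            # (FID-avg, FID-all)

rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, size=(500, 8)) for c in (-2.0, 2.0)]  # heterogeneous clients
generator = rng.normal(loc=0.0, size=(500, 8))                     # samples from a trained model
print(aggregate_fids(generator, clients))                          # the two scores generally differ
```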

Learning under Label Proportions for Text Classification

  • paper_url: http://arxiv.org/abs/2310.11707
  • repo_url: None
  • paper_authors: Jatin Chauhan, Xiaoxuan Wang, Wei Wang
  • for: Text classification under the challenging Learning from Label Proportions (LLP) setup, where data are provided in aggregate bags and only the proportion of samples in each class is available as ground truth.
  • methods: A novel, robust formulation motivated by irregularities of the widely used DLLP baseline, a learnability result providing a generalization bound under LLP, and a self-supervised objective.
  • results: The method outperforms baselines in almost 87% of experimental configurations, which include large-scale models for both long- and short-range texts, across multiple metrics.
    Abstract We present one of the preliminary NLP works under the challenging setup of Learning from Label Proportions (LLP), where the data is provided in an aggregate form called bags and only the proportion of samples in each class is available as the ground truth. This setup is in line with the desired characteristics of training models under privacy settings and weak supervision. By characterizing some irregularities of the most widely used baseline technique DLLP, we propose a novel formulation that is also robust. This is accompanied by a learnability result that provides a generalization bound under LLP. Combining this formulation with a self-supervised objective, our method achieves better results as compared to the baselines in almost 87% of the experimental configurations, which include large scale models for both long and short range texts, across multiple metrics.
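
For context, the sketch below shows the standard bag-level proportion-matching objective that DLLP-style baselines use (not the paper's novel formulation): the mean predicted class distribution of each bag is pushed toward the bag's known label proportions via a KL term. The small MLP and placeholder features stand in for a text encoder.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, feat_dim = 3, 32
model = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Each bag: (instance features, ground-truth class proportions of the bag).
bags = [(torch.randn(20, feat_dim), torch.tensor([0.5, 0.3, 0.2])),
        (torch.randn(15, feat_dim), torch.tensor([0.1, 0.6, 0.3]))]

for epoch in range(100):
    total = 0.0
    for feats, proportions in bags:
        probs = torch.softmax(model(feats), dim=1).mean(dim=0)      # bag-level prediction
        # KL(bag proportions || mean predicted distribution)
        total = total + torch.sum(proportions * (proportions.clamp_min(1e-8).log()
                                                 - probs.clamp_min(1e-8).log()))
    opt.zero_grad()
    total.backward()
    opt.step()
```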

AUC-mixup: Deep AUC Maximization with Mixup

  • paper_url: http://arxiv.org/abs/2310.11693
  • repo_url: None
  • paper_authors: Jianzhi Xv, Gang Li, Tianbao Yang
  • for: Improving the generalization of deep AUC maximization (DAM), which can overfit severely on small datasets despite its success on imbalanced medical tasks.
  • methods: Mixup data augmentation combined with the AUC margin loss and soft labels, so the model can learn effectively from mixup-generated data; the resulting objective is termed the AUC-mixup loss.
  • results: On imbalanced benchmark and medical image datasets, the proposed AUC-mixup methods generalize better than standard DAM training.
    Abstract While deep AUC maximization (DAM) has shown remarkable success on imbalanced medical tasks, e.g., chest X-rays classification and skin lesions classification, it could suffer from severe overfitting when applied to small datasets due to its aggressive nature of pushing prediction scores of positive data away from that of negative data. This paper studies how to improve generalization of DAM by mixup data augmentation -- an approach that is widely used for improving generalization of the cross-entropy loss based deep learning methods. For overfitting issues arising from limited data, the common approach is to employ mixup data augmentation to boost the models' generalization performance by enriching the training data. However, AUC is defined over positive and negative pairs, which makes it challenging to incorporate mixup data augmentation into DAM algorithms. To tackle this challenge, we employ the AUC margin loss and incorporate soft labels into the formulation to effectively learn from data generated by mixup augmentation, which is referred to as the AUC-mixup loss. Our experimental results demonstrate the effectiveness of the proposed AUC-mixup methods on imbalanced benchmark and medical image datasets compared to standard DAM training methods.
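
Since the exact AUC-mixup loss is not given here, the sketch below is a simplified stand-in: mixup produces soft labels, and a pairwise squared-hinge AUC surrogate weights every pair of examples by how positive one is and how negative the other is under those soft labels. The margin, model, and imbalanced toy data are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
margin = 1.0

def mixup(x, y, alpha=0.5):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]   # soft labels

def soft_auc_margin_loss(scores, soft_y):
    # Weight each ordered pair (i, j) by how positive i is and how negative j is
    # under the soft labels, and penalize pairs whose score gap misses the margin.
    w = soft_y.unsqueeze(1) * (1 - soft_y).unsqueeze(0)
    gap = scores.unsqueeze(1) - scores.unsqueeze(0)                     # s_i - s_j
    return (w * torch.clamp(margin - gap, min=0).pow(2)).sum() / (w.sum() + 1e-8)

x = torch.randn(64, 16)
y = (torch.rand(64) < 0.1).float()                                      # imbalanced labels
for step in range(200):
    xm, ym = mixup(x, y)
    loss = soft_auc_margin_loss(model(xm).squeeze(1), ym)
    opt.zero_grad()
    loss.backward()
    opt.step()
```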

Deep learning based on Transformer architecture for power system short-term voltage stability assessment with class imbalance

  • paper_url: http://arxiv.org/abs/2310.11690
  • repo_url: None
  • paper_authors: Yang Li, Jiting Cao, Yan Xu, Lipeng Zhu, Zhao Yang Dong
  • for: Addressing class imbalance in data-driven short-term voltage stability assessment (STVSA) to improve the accuracy and reliability of real-time assessment.
  • methods: A stability assessment Transformer (StaaT) built on the basic Transformer architecture, a conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) to generate synthetic data and create a balanced, representative training set, and semi-supervised clustering learning to improve clustering quality given the lack of a unified quantitative criterion for short-term voltage stability.
  • results: Numerical tests show robust performance under class imbalances up to 100:1 and in noisy environments, with consistent effectiveness as renewable penetration increases; the CWGAN-GP generates more balanced datasets than traditional oversampling methods, and the StaaT outperforms other deep learning algorithms, making the approach well suited to real-world STVSA applications facing class imbalance and data noise.
    Abstract Most existing data-driven power system short-term voltage stability assessment (STVSA) approaches presume class-balanced input data. However, in practical applications, the occurrence of short-term voltage instability following a disturbance is minimal, leading to a significant class imbalance problem and a consequent decline in classifier performance. This work proposes a Transformer-based STVSA method to address this challenge. By utilizing the basic Transformer architecture, a stability assessment Transformer (StaaT) is developed as a classification model to reflect the correlation between the operational states of the system and the resulting stability outcomes. To combat the negative impact of imbalanced datasets, this work employs a conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) for synthetic data generation, aiding in the creation of a balanced, representative training set for the classifier. Semi-supervised clustering learning is implemented to enhance clustering quality, addressing the lack of a unified quantitative criterion for short-term voltage stability. Numerical tests on the IEEE 39-bus test system extensively demonstrate that the proposed method exhibits robust performance under class imbalances up to 100:1 and noisy environments, and maintains consistent effectiveness even with an increased penetration of renewable energy. Comparative results reveal that the CWGAN-GP generates more balanced datasets than traditional oversampling methods and that the StaaT outperforms other deep learning algorithms. This study presents a compelling solution for real-world STVSA applications that often face class imbalance and data noise challenges.
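
The rebalancing component can be sketched as a conditional WGAN-GP: both the generator and critic receive the class label, the critic is regularized with a gradient penalty, and after training the generator is sampled to enrich the minority (unstable) class. Network sizes, the penalty weight of 10, and the feature dimension below are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feat_dim, noise_dim, n_classes = 24, 16, 2
G = nn.Sequential(nn.Linear(noise_dim + n_classes, 64), nn.ReLU(), nn.Linear(64, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim + n_classes, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.9))

def gradient_penalty(real, fake, cond):
    # Penalize the critic's gradient norm on random interpolates of real and fake samples.
    eps = torch.rand(real.size(0), 1)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(torch.cat([mix, cond], dim=1)).sum(), mix, create_graph=True)[0]
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

def train_step(real, labels):
    cond = nn.functional.one_hot(labels, n_classes).float()
    # Critic update: Wasserstein loss plus gradient penalty.
    fake = G(torch.cat([torch.randn(real.size(0), noise_dim), cond], dim=1)).detach()
    d_loss = (D(torch.cat([fake, cond], dim=1)).mean()
              - D(torch.cat([real, cond], dim=1)).mean()
              + 10.0 * gradient_penalty(real, fake, cond))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator update: make conditional fakes look real to the critic.
    fake = G(torch.cat([torch.randn(real.size(0), noise_dim), cond], dim=1))
    g_loss = -D(torch.cat([fake, cond], dim=1)).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# Train on minority-class (unstable) samples, then sample G to rebalance the training set.
real_minority = torch.randn(32, feat_dim)
train_step(real_minority, torch.ones(32, dtype=torch.long))
```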

Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features

  • paper_url: http://arxiv.org/abs/2310.11654
  • repo_url: None
  • paper_authors: Hangbin Lee, Il Do Ha, Changha Hwang, Youngjo Lee
  • for: Improving prediction performance and learning efficiency for clustered count data with high-cardinality categorical features.
  • methods: A hierarchical likelihood (h-likelihood) learning framework that introduces gamma random effects into a Poisson DNN, simultaneously yielding maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects, so the model captures both nonlinear effects of input variables and subject-specific cluster effects.
  • results: Experimental studies and real data analyses confirm the advantages of the proposed method, including improved prediction performance and learning efficiency on clustered count data with high-cardinality categorical features.
    Abstract There is a growing interest in subject-specific predictions using deep neural networks (DNNs) because real-world data often exhibit correlations, which has been typically overlooked in traditional DNN frameworks. In this paper, we propose a novel hierarchical likelihood learning framework for introducing gamma random effects into the Poisson DNN, so as to improve the prediction performance by capturing both nonlinear effects of input variables and subject-specific cluster effects. The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects by optimizing a single objective function. This approach enables a fast end-to-end algorithm for handling clustered count data, which often involve high-cardinality categorical features. Furthermore, state-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework. As an example, we introduce a multi-head attention layer and a sparsemax function, which allows feature selection in high-dimensional settings. To enhance practical performance and learning efficiency, we present an adjustment procedure for prediction of random parameters and a method-of-moments estimator for pretraining of the variance component. Various experimental studies and real data analyses confirm the advantages of our proposed methods.
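
A simplified, assumed sketch of the model family described above: a Poisson DNN whose conditional mean is scaled by a subject-specific gamma random effect, trained by jointly maximizing the Poisson log-likelihood and the gamma log-density of the random effects (an h-likelihood-style objective). The variance component phi, the toy data, and all sizes are placeholders; the paper's adjustment procedure and method-of-moments pretraining are not reproduced.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_subjects, feat_dim, phi = 20, 8, 0.5
net = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1))
log_u = nn.Parameter(torch.zeros(n_subjects))          # log of gamma random effects
opt = torch.optim.Adam(list(net.parameters()) + [log_u], lr=1e-2)

x = torch.randn(500, feat_dim)
subject = torch.randint(0, n_subjects, (500,))
y = torch.poisson(torch.exp(0.3 * x[:, 0] + 0.1 * torch.randn(500)))  # toy counts

for step in range(300):
    u = torch.exp(log_u)                               # subject-specific effects, u > 0
    rate = torch.exp(net(x).squeeze(1)) * u[subject]   # conditional Poisson mean
    poisson_nll = (rate - y * torch.log(rate + 1e-8)).sum()
    # Gamma(1/phi, 1/phi) log-density of the mean-one random effects, up to constants.
    gamma_logpdf = ((1.0 / phi - 1.0) * log_u - u / phi).sum()
    loss = poisson_nll - gamma_logpdf                  # negative h-likelihood (up to constants)
    opt.zero_grad()
    loss.backward()
    opt.step()
```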

Free-text Keystroke Authentication using Transformers: A Comparative Study of Architectures and Loss Functions

  • paper_url: http://arxiv.org/abs/2310.11640
  • repo_url: None
  • paper_authors: Saleh Momeni, Bagher BabaAli
  • for: A Transformer-based network for more accurate keystroke-based user identification and verification.
  • methods: Self-attention extracts informative features from keystroke sequences; two architectures, bi-encoder and cross-encoder, are compared, along with different loss functions (triplet, batch-all triplet, WDCL) and distance metrics (Euclidean, Manhattan, cosine).
  • results: The bi-encoder architecture with batch-all triplet loss and cosine distance performs best, reaching an Equal Error Rate of 0.0186%; alternative similarity-scoring methods are also explored, with a one-class Support Vector Machine reducing the Equal Error Rate to 0.0163%.
    Abstract Keystroke biometrics is a promising approach for user identification and verification, leveraging the unique patterns in individuals' typing behavior. In this paper, we propose a Transformer-based network that employs self-attention to extract informative features from keystroke sequences, surpassing the performance of traditional Recurrent Neural Networks. We explore two distinct architectures, namely bi-encoder and cross-encoder, and compare their effectiveness in keystroke authentication. Furthermore, we investigate different loss functions, including triplet, batch-all triplet, and WDCL loss, along with various distance metrics such as Euclidean, Manhattan, and cosine distances. These experiments allow us to optimize the training process and enhance the performance of our model. To evaluate our proposed model, we employ the Aalto desktop keystroke dataset. The results demonstrate that the bi-encoder architecture with batch-all triplet loss and cosine distance achieves the best performance, yielding an exceptional Equal Error Rate of 0.0186%. Furthermore, alternative algorithms for calculating similarity scores are explored to enhance accuracy. Notably, the utilization of a one-class Support Vector Machine reduces the Equal Error Rate to an impressive 0.0163%. The outcomes of this study indicate that our model surpasses the previous state-of-the-art in free-text keystroke authentication. These findings contribute to advancing the field of keystroke authentication and offer practical implications for secure user verification systems.
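
The best-performing configuration reported above (bi-encoder, batch-all triplet loss, cosine distance) can be sketched as follows; the encoder layout, margin, timing-feature dimension, and batch construction are assumptions, not the authors' setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class KeystrokeEncoder(nn.Module):
    def __init__(self, feat_dim=4, d_model=64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seq):                      # seq: (batch, keystrokes, timing features)
        h = self.encoder(self.proj(seq))
        return nn.functional.normalize(h.mean(dim=1), dim=1)    # unit-norm embedding

def batch_all_triplet_loss(emb, labels, margin=0.3):
    dist = 1.0 - emb @ emb.T                     # pairwise cosine distance
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)
    # All (anchor, positive, negative) combinations in the batch: d(a,p) - d(a,n) + margin.
    triplet = dist.unsqueeze(2) - dist.unsqueeze(1) + margin
    valid = pos.unsqueeze(2) & (~pos).unsqueeze(1) & ~torch.eye(len(emb), dtype=torch.bool).unsqueeze(2)
    losses = torch.clamp(triplet[valid], min=0)
    active = losses > 0
    return losses[active].mean() if active.any() else losses.sum()

enc = KeystrokeEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
seqs = torch.randn(16, 50, 4)                    # 16 keystroke sequences of 50 events each
users = torch.randint(0, 4, (16,))               # user identities within the batch
for step in range(10):
    loss = batch_all_triplet_loss(enc(seqs), users)
    opt.zero_grad()
    loss.backward()
    opt.step()
```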