results: DPpack provides user-friendly, privacy-preserving versions of logistic regression, SVM, and linear regression, as well as differentially private hyperparameter tuning for these models. These implementations of privacy-preserving statistics and machine learning techniques make it easy to apply differential privacy principles in commonly performed statistical analyses.
Abstract
Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy for individuals when releasing aggregated statistics or building statistical/machine learning models from data. We develop the open-source R package DPpack that provides a large toolkit of differentially private analysis. The current version of DPpack implements three popular mechanisms for ensuring DP: Laplace, Gaussian, and exponential. Beyond that, DPpack provides a large toolkit of easily accessible privacy-preserving descriptive statistics functions. These include mean, variance, covariance, and quantiles, as well as histograms and contingency tables. Finally, DPpack provides user-friendly implementation of privacy-preserving versions of logistic regression, SVM, and linear regression, as well as differentially private hyperparameter tuning for each of these models. This extensive collection of implemented differentially private statistics and models permits hassle-free utilization of differential privacy principles in commonly performed statistical analysis. We plan to continue developing DPpack and make it more comprehensive by including more differentially private machine learning techniques, statistical modeling and inference in the future.
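As a rough illustration of the Laplace mechanism that underlies much of this kind of toolkit (this is not DPpack's R API, just a minimal Python sketch with an assumed clamping range):

```python
import numpy as np

def dp_mean(x, lower, upper, epsilon, rng=None):
    """Differentially private mean via the Laplace mechanism.

    Each value is clamped to [lower, upper], so the L1 sensitivity of the
    mean over n records is (upper - lower) / n.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(x)
    return x.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: an epsilon = 1 private mean of ages assumed to lie in [0, 100]
ages = np.array([23, 35, 41, 58, 62, 29, 47])
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0))
```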
Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
paper_authors: Andrea Angiuli, Jean-Pierre Fouque, Ruimeng Hu, Alan Raydan
for: The paper is written to develop and analyze a reinforcement learning algorithm for solving continuous-space mean field game and mean field control problems in a unified manner.
methods: The proposed algorithm uses the actor-critic paradigm with a parameterized score function to represent the mean field distribution, and updates the AC agent and the score function iteratively to converge to the MFG equilibrium or the MFC optimum. Langevin dynamics are used to obtain samples from the resulting distribution.
results: The performance of the algorithm is evaluated using linear-quadratic benchmarks in the asymptotic infinite horizon framework, and the results show that the algorithm is able to find the optimal solution to the mean field control problem.
Abstract
We present the development and analysis of a reinforcement learning (RL) algorithm designed to solve continuous-space mean field game (MFG) and mean field control (MFC) problems in a unified manner. The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function, which can be efficiently updated in an online fashion, and uses Langevin dynamics to obtain samples from the resulting distribution. The AC agent and the score function are updated iteratively to converge, either to the MFG equilibrium or the MFC optimum for a given mean field problem, depending on the choice of learning rates. A straightforward modification of the algorithm allows us to solve mixed mean field control games (MFCGs). The performance of our algorithm is evaluated using linear-quadratic benchmarks in the asymptotic infinite horizon framework.
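A minimal sketch of the Langevin sampling step described above, assuming only a generic callable `score_fn` standing in for the parameterized score of the mean field distribution (not the paper's full actor-critic loop):

```python
import numpy as np

def langevin_sample(score_fn, x0, step=1e-2, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2*step) * noise.

    `score_fn` is any callable returning grad log p(x); in the paper's setting it
    would be the parameterized score representing the mean field distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Sanity check with a known score: standard Gaussian, score(x) = -x
samples = np.stack([langevin_sample(lambda x: -x, x0=np.zeros(1)) for _ in range(1000)])
print(samples.mean(), samples.std())  # approximately 0 and 1
```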
methods: We introduce distribution shifts into the test sets of standard speech-classification tasks and explore how TTT can adapt to them; the experiments cover shifts introduced by background noise and by natural variations in speech such as gender and age.
results: We identify several key challenges with TTT, including sensitivity to optimization hyperparameters (e.g., the number of optimization steps and the subset of parameters chosen for TTT) and scalability (each example requires its own set of parameters). We propose using BitFit, a parameter-efficient fine-tuning algorithm, and show that it is consistently more stable for TTT than fine-tuning all model parameters.
Abstract
In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications. In particular, we introduce distribution-shifts to the test datasets of standard speech-classification tasks -- for example, speaker-identification and emotion-detection -- and explore how Test-Time Training (TTT) can help adjust to the distribution-shift. In our experiments that include distribution shifts due to background noise and natural variations in speech such as gender and age, we identify some key-challenges with TTT including sensitivity to optimization hyperparameters (e.g., number of optimization steps and subset of parameters chosen for TTT) and scalability (e.g., as each example gets its own set of parameters, TTT is not scalable). Finally, we propose using BitFit -- a parameter-efficient fine-tuning algorithm proposed for text applications that only considers the bias parameters for fine-tuning -- as a solution to the aforementioned challenges and demonstrate that it is consistently more stable than fine-tuning all the parameters of the model.
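A small PyTorch sketch of BitFit-style fine-tuning as described above (freezing everything except bias terms); the toy model and optimizer choice are illustrative assumptions:

```python
import torch

def apply_bitfit(model: torch.nn.Module):
    """Freeze every parameter except bias terms, as in BitFit-style fine-tuning."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith(".bias") or name == "bias"
    return [p for p in model.parameters() if p.requires_grad]

# Example: only the biases of this small classifier receive gradient updates
model = torch.nn.Sequential(torch.nn.Linear(40, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))
trainable = apply_bitfit(model)
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(sum(p.numel() for p in trainable), "trainable parameters")
```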
Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds
paper_authors: Paul Rosa, Viacheslav Borovitskiy, Alexander Terenin, Judith Rousseau
for: This study investigates whether intrinsic geometric Gaussian processes (GPs) can lead to better performance compared to embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean GP.
methods: The study proves optimal contraction rates for intrinsic Matérn GPs defined on compact Riemannian manifolds, using trace and extension theorems between manifold and ambient Sobolev spaces.
results: The study finds that, with appropriately matched smoothness parameters, intrinsic GPs can achieve better performance than embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean GP. This result is demonstrated empirically on a number of examples.
Abstract
Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold, have been developed. This raises the question: can these intrinsic models be shown theoretically to lead to better performance, compared to simply embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean Gaussian process? To study this, we prove optimal contraction rates for intrinsic Mat\'ern Gaussian processes defined on compact Riemannian manifolds. We also prove analogous rates for extrinsic processes using trace and extension theorems between manifold and ambient Sobolev spaces: somewhat surprisingly, the rates obtained turn out to coincide with those of the intrinsic processes, provided that their smoothness parameters are matched appropriately. We illustrate these rates empirically on a number of examples, which, mirroring prior work, show that intrinsic processes can achieve better performance in practice. Therefore, our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency of geometric Gaussian processes, particularly in settings which involve small data set sizes and non-asymptotic behavior.
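For reference, the Euclidean Matérn kernel that the intrinsic construction generalizes (our notation; on a manifold the Euclidean distance is replaced by constructions based on the Laplace-Beltrami operator) is
$$ k_\nu(x,x') = \sigma^2\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{2\nu}\,\frac{\lVert x-x'\rVert}{\kappa}\right)^{\nu} K_\nu\!\left(\sqrt{2\nu}\,\frac{\lVert x-x'\rVert}{\kappa}\right), $$
where $\nu$ controls smoothness, $\kappa$ is the length scale, $\sigma^2$ the variance, and $K_\nu$ is the modified Bessel function of the second kind.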
Crypto’Graph: Leveraging Privacy-Preserving Distributed Link Prediction for Robust Graph Learning
For: Privacy protection and sensitive-data sharing for distributed graphs.
Methods: Uses cryptographic primitives so that the likelihood of new links can be computed without revealing the private graph structure held by each party.
Results: Achieves high prediction accuracy and defends against graph poisoning attacks.
Abstract
Graphs are a widely used data structure for collecting and analyzing relational data. However, when the graph structure is distributed across several parties, its analysis is particularly challenging. In particular, due to the sensitivity of the data each party might want to keep their partial knowledge of the graph private, while still willing to collaborate with the other parties for tasks of mutual benefit, such as data curation or the removal of poisoned data. To address this challenge, we propose Crypto'Graph, an efficient protocol for privacy-preserving link prediction on distributed graphs. More precisely, it allows parties partially sharing a graph with distributed links to infer the likelihood of formation of new links in the future. Through the use of cryptographic primitives, Crypto'Graph is able to compute the likelihood of these new links on the joint network without revealing the structure of the private individual graph of each party, even though they know the number of nodes they have, since they share the same graph but not the same links. Crypto'Graph improves on previous works by enabling the computation of a certain number of similarity metrics without any additional cost. The use of Crypto'Graph is illustrated for defense against graph poisoning attacks, in which it is possible to identify potential adversarial links without compromising the privacy of the graphs of individual parties. The effectiveness of Crypto'Graph in mitigating graph poisoning attacks and achieving high prediction accuracy on a graph neural network node classification task is demonstrated through extensive experimentation on a real-world dataset.
Dynamical Tests of a Deep-Learning Weather Prediction Model
results: The researchers find that the model exhibits realistic physical behavior across the different experiments, including the response to tropical heating and the formation of extratropical cyclones and polar lows.
Abstract
Global deep-learning weather prediction models have recently been shown to produce forecasts that rival those from physics-based models run at operational centers. It is unclear whether these models have encoded atmospheric dynamics, or simply pattern matching that produces the smallest forecast error. Answering this question is crucial to establishing the utility of these models as tools for basic science. Here we subject one such model, Pangu-weather, to a set of four classical dynamical experiments that do not resemble the model training data. Localized perturbations to the model output and the initial conditions are added to steady time-averaged conditions, to assess the propagation speed and structural evolution of signals away from the local source. Perturbing the model physics by adding a steady tropical heat source results in a classical Matsuno--Gill response near the heating, and planetary waves that radiate into the extratropics. A localized disturbance on the winter-averaged North Pacific jet stream produces realistic extratropical cyclones and fronts, including the spontaneous emergence of polar lows. Perturbing the 500hPa height field alone yields adjustment from a state of rest to one of wind--pressure balance over ~6 hours. Localized subtropical low pressure systems produce Atlantic hurricanes, provided the initial amplitude exceeds about 5 hPa, and setting the initial humidity to zero eliminates hurricane development. We conclude that the model encodes realistic physics in all experiments, and suggest it can be used as a tool for rapidly testing ideas before using expensive physics-based models.
$O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds
results: Multiple experiments show that the PSC algorithm can effectively reduce the dimensionality of high-dimensional data and improve the efficiency of subsequent analysis, and they also demonstrate how PSC performs across different datasets.
Abstract
Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and benefit from projection onto lower-dimensional Stiefel (respectively, Grassmannian) manifolds. In this work, we propose an algorithm called Principal Stiefel Coordinates (PSC) to reduce data dimensionality from $ V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an $O(k)$-equivariant manner ($k \leq n \ll N$). We begin by observing that each element $\alpha \in V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into $V_k(\mathbb{R}^N)$. Next, we optimize for such an embedding map that minimizes data fit error by warm-starting with the output of principal component analysis (PCA) and applying gradient descent. Then, we define a continuous and $O(k)$-equivariant map $\pi_\alpha$ that acts as a ``closest point operator'' to project the data onto the image of $V_k(\mathbb{R}^n)$ in $V_k(\mathbb{R}^N)$ under the embedding determined by $\alpha$, while minimizing distortion. Because this dimensionality reduction is $O(k)$-equivariant, these results extend to Grassmannian manifolds as well. Lastly, we show that the PCA output globally minimizes projection error in a noiseless setting, but that our algorithm achieves a meaningfully different and improved outcome when the data does not lie exactly on the image of a linearly embedded lower-dimensional Stiefel manifold as above. Multiple numerical experiments using synthetic and real-world data are performed.
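A standard building block behind such "closest point" maps is the polar projection onto the Stiefel manifold via a thin SVD; the sketch below is this generic projection, not the paper's $\pi_\alpha$ operator itself:

```python
import numpy as np

def project_to_stiefel(A):
    """Nearest matrix with orthonormal columns in Frobenius norm (polar factor via thin SVD)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

# Example: perturb a random Stiefel point and project back
rng = np.random.default_rng(0)
X = project_to_stiefel(rng.standard_normal((50, 3)))        # a point on V_3(R^50)
noisy = X + 0.1 * rng.standard_normal(X.shape)
X_proj = project_to_stiefel(noisy)
print(np.allclose(X_proj.T @ X_proj, np.eye(3), atol=1e-8))  # True: columns are orthonormal
```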
Semi-supervised Domain Adaptation in Graph Transfer Learning
results: Extensive experiments on a range of publicly accessible datasets validate the effectiveness of the proposed SGDA under different experimental settings.
Abstract
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embedding, thus the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph via measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
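A simplified sketch of centroid-based pseudo-labeling in the spirit described above (the confidence rule and threshold are illustrative assumptions, not the paper's exact posterior-influence measure):

```python
import torch
import torch.nn.functional as F

def centroid_pseudo_labels(src_emb, src_labels, tgt_emb, num_classes, threshold=0.8):
    """Assign pseudo-labels to target embeddings from their distance to source class centroids.

    Confidence is a softmax over negative distances; only confident assignments are kept.
    """
    centroids = torch.stack([src_emb[src_labels == c].mean(dim=0) for c in range(num_classes)])
    dists = torch.cdist(tgt_emb, centroids)          # (n_tgt, num_classes)
    probs = F.softmax(-dists, dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf > threshold
    return pseudo[mask], mask

# Usage: confidently pseudo-labeled target nodes can then be added to the classification loss.
```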
Improving Opioid Use Disorder Risk Modelling through Behavioral and Genetic Feature Integration
results: The results show that integrating behavioral and genetic features improves risk modelling, with behavioral features having more influence on OUD risk, although the genetic contribution is also significant, particularly in linear models. Concerns regarding privacy, security, bias, and generalizability must still be evaluated in clinical trials before the approach can be used in practice.
Abstract
Opioids are an effective analgesic for acute and chronic pain, but also carry a considerable risk of addiction leading to millions of opioid use disorder (OUD) cases and tens of thousands of premature deaths in the United States yearly. Estimating OUD risk prior to prescription could improve the efficacy of treatment regimens, monitoring programs, and intervention strategies, but risk estimation is typically based on self-reported data or questionnaires. We develop an experimental design and computational methods that combines genetic variants associated with OUD with behavioral features extracted from GPS and Wi-Fi spatiotemporal coordinates to assess OUD risk. Since both OUD mobility and genetic data do not exist for the same cohort, we develop algorithms to (1) generate mobility features from empirical distributions and (2) synthesize mobility and genetic samples assuming a level of comorbidity and relative risks. We show that integrating genetic and mobility modalities improves risk modelling using classification accuracy, area under the precision-recall and receiver operator characteristic curves, and $F_1$ score. Interpreting the fitted models suggests that mobility features have more influence on OUD risk, although the genetic contribution was significant, particularly in linear models. While there exists concerns with respect to privacy, security, bias, and generalizability that must be evaluated in clinical trials before being implemented in practice, our framework provides preliminary evidence that behavioral and genetic features may improve OUD risk estimation to assist with personalized clinical decision-making.
Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
results: Experiments on the AudioCaps dataset show that the consistency models retain the high generation quality and diversity of diffusion models while reducing the number of queries by a factor of 400.
Abstract
Diffusion models power a vast majority of text-to-audio (TTA) generation methods. Unfortunately, these models suffer from slow inference speed due to iterative queries to the underlying denoising network, thus unsuitable for scenarios with inference time or computational constraints. This work modifies the recently proposed consistency distillation framework to train TTA models that require only a single neural network query. In addition to incorporating classifier-free guidance into the distillation process, we leverage the availability of generated audio during distillation training to fine-tune the consistency TTA model with novel loss functions in the audio space, such as the CLAP score. Our objective and subjective evaluation results on the AudioCaps dataset show that consistency models retain diffusion models' high generation quality and diversity while reducing the number of queries by a factor of 400.
Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation
results: The paper proposes a new approach to learning from mixtures of sources that efficiently handles multiple target distributions and enables computationally efficient learning in both offline and online settings.
Abstract
We consider the problem of learning a model from multiple heterogeneous sources with the goal of performing well on a new target distribution. The goal of learner is to mix these data sources in a target-distribution aware way and simultaneously minimize the empirical risk on the mixed source. The literature has made some tangible advancements in establishing theory of learning on mixture domain. However, there are still two unsolved problems. Firstly, how to estimate the optimal mixture of sources, given a target domain; Secondly, when there are numerous target domains, how to solve empirical risk minimization (ERM) for each target using possibly unique mixture of data sources in a computationally efficient manner. In this paper we address both problems efficiently and with guarantees. We cast the first problem, mixture weight estimation, as a convex-nonconcave compositional minimax problem, and propose an efficient stochastic algorithm with provable stationarity guarantees. Next, for the second problem, we identify that for certain regimes, solving ERM for each target domain individually can be avoided, and instead parameters for a target optimal model can be viewed as a non-linear function on a space of the mixture coefficients. Building upon this, we show that in the offline setting, a GD-trained overparameterized neural network can provably learn such function to predict the model of target domain instead of solving a designated ERM problem. Finally, we also consider an online setting and propose a label efficient online algorithm, which predicts parameters for new targets given an arbitrary sequence of mixing coefficients, while enjoying regret guarantees.
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
results: The study finds that LLMs can help generate high-quality AI accelerator designs and can enable people without deep hardware expertise to design efficient accelerators. This is the first work to demonstrate LLM-powered automated AI accelerator design.
Abstract
The remarkable capabilities and intricate nature of Artificial Intelligence (AI) have dramatically escalated the imperative for specialized AI accelerators. Nonetheless, designing these accelerators for various AI workloads remains both labor- and time-intensive. While existing design exploration and automation tools can partially alleviate the need for extensive human involvement, they still demand substantial hardware expertise, posing a barrier to non-experts and stifling AI accelerator development. Motivated by the astonishing potential of large language models (LLMs) for generating high-quality content in response to human language instructions, we embark on this work to examine the possibility of harnessing LLMs to automate AI accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework intended to democratize AI accelerator design by leveraging human natural languages instead of domain-specific languages. Specifically, we first perform an in-depth investigation into LLMs' limitations and capabilities for AI accelerator design, thus aiding our understanding of our current position and garnering insights into LLM-powered automated AI accelerator design. Furthermore, drawing inspiration from the above insights, we develop a framework called GPT4AIGChip, which features an automated demo-augmented prompt-generation pipeline utilizing in-context learning to guide LLMs towards creating high-quality AI accelerator design. To our knowledge, this work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation. Accordingly, we anticipate that our insights and framework can serve as a catalyst for innovations in next-generation LLM-powered design automation tools.
On the different regimes of Stochastic Gradient Descent
For:
+ The paper is written to understand the dynamics of stochastic gradient descent (SGD) in training deep neural networks.
+ The authors aim to resolve the central challenges of understanding the cross-overs between SGD and gradient descent (GD) at large batch sizes.
+ The paper focuses on a teacher-student perceptron classification model, and the results are expected to apply to deep networks.
Methods:
+ The authors use a phase diagram in the $B$-$\eta$ plane to separate three dynamical phases: noise-dominated SGD, large-first-step-dominated SGD, and GD.
+ The analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
Results:
+ The authors obtain empirical results that support their key predictions and show the applicability of their analysis to deep networks.
+ The phase diagram provides a framework for understanding the different regimes of generalization error in SGD.
Abstract
Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also corresponds to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scale with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
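A minimal sketch of how the phase diagram's definitions translate into a regime check, with `B_star` and `T_c` as problem-dependent thresholds that are assumed inputs here:

```python
def sgd_regime(eta, B, B_star, T_c):
    """Classify an (eta, B) configuration using the temperature T = eta / B.

    B_star and T_c are problem-dependent thresholds supplied by the user;
    the paper finds that B_star scales with the training set size P.
    """
    T = eta / B
    if T < T_c:
        return "(iii) gradient-descent-like"
    if B < B_star:
        return "(i) noise-dominated SGD, governed by temperature"
    return "(ii) large-first-step-dominated SGD"

print(sgd_regime(eta=0.1, B=32, B_star=1024, T_c=1e-4))
```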
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach
results: The method achieves an approximate optimal solution using only $O\left(\log\left(1/\epsilon\right)^{\beta}\right)$ two-point cost information, for $\beta \in (0,1)$.
Abstract
We investigate the problem of learning an $\epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient methods have proven to converge linearly to the optimal solution of the model-free LQR problem, the substantial requirement for two-point cost queries in gradient estimations may be intractable, particularly in applications where obtaining cost function evaluations at two distinct control input configurations is exceptionally costly. To this end, we propose an oracle-efficient approach. Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm. It achieves an approximate optimal solution with only $O\left(\log\left(1/\epsilon\right)^{\beta}\right)$ two-point cost information for $\beta \in (0,1)$.
Implementing a new fully stepwise decomposition-based sampling technique for the hybrid water level forecasting model in real-world application
results: With the FSDB sampling technique and VMD, the Nash-Sutcliffe Efficiency (NSE) coefficient for water level forecasting increases by 6.4%, 28.8%, and 7.0% at the three stations, respectively, compared with the currently most advanced sampling technique. With SSA, the NSE increases by 3.2%, 3.1%, and 1.1%, respectively.
Abstract
Various time variant non-stationary signals need to be pre-processed properly in hydrological time series forecasting in real world, for example, predictions of water level. Decomposition method is a good candidate and widely used in such a pre-processing problem. However, decomposition methods with an inappropriate sampling technique may introduce future data which is not available in practical applications, and result in incorrect decomposition-based forecasting models. In this work, a novel Fully Stepwise Decomposition-Based (FSDB) sampling technique is well designed for the decomposition-based forecasting model, strictly avoiding introducing future information. This sampling technique with decomposition methods, such as Variational Mode Decomposition (VMD) and Singular spectrum analysis (SSA), is applied to predict water level time series in three different stations of Guoyang and Chaohu basins in China. Results of VMD-based hybrid model using FSDB sampling technique show that Nash-Sutcliffe Efficiency (NSE) coefficient is increased by 6.4%, 28.8% and 7.0% in three stations respectively, compared with those obtained from the currently most advanced sampling technique. In the meantime, for series of SSA-based experiments, NSE is increased by 3.2%, 3.1% and 1.1% respectively. We conclude that the newly developed FSDB sampling technique can be used to enhance the performance of decomposition-based hybrid model in water level time series forecasting in real world.
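A minimal sketch of the fully stepwise (walk-forward) idea, assuming placeholder `decompose` and `fit_predict` callables standing in for VMD/SSA and a per-mode forecaster; this illustrates the no-future-leakage sampling, not the full FSDB procedure:

```python
import numpy as np

def stepwise_decompose_forecast(series, decompose, fit_predict, start):
    """Walk-forward scheme: at each step, decompose only the data observed so far,
    so no future samples leak into the decomposition used for the next prediction.
    """
    preds = []
    for t in range(start, len(series)):
        history = series[:t]                              # strictly past data
        modes = decompose(history)                        # list of component sub-series
        preds.append(sum(fit_predict(m) for m in modes))  # one-step-ahead forecast
    return np.array(preds)
```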
results: ASRL is evaluated in a multi-robot system and a competitive multi-agent racing scenario against learning-based and control-theoretic approaches. The experiments show that ASRL effectively enhances safety and long-term performance and can adapt to a range of out-of-distribution scenarios. Code and supplementary material are publicly available online.
Abstract
Ensuring safety in dynamic multi-agent systems is challenging due to limited information about the other agents. Control Barrier Functions (CBFs) are showing promise for safety assurance but current methods make strong assumptions about other agents and often rely on manual tuning to balance safety, feasibility, and performance. In this work, we delve into the problem of adaptive safe learning for multi-agent systems with CBF. We show how emergent behavior can be profoundly influenced by the CBF configuration, highlighting the necessity for a responsive and dynamic approach to CBF design. We present ASRL, a novel adaptive safe RL framework, to fully automate the optimization of policy and CBF coefficients, to enhance safety and long-term performance through reinforcement learning. By directly interacting with the other agents, ASRL learns to cope with diverse agent behaviours and maintains the cost violations below a desired limit. We evaluate ASRL in a multi-robot system and a competitive multi-agent racing scenario, against learning-based and control-theoretic approaches. We empirically demonstrate the efficacy and flexibility of ASRL, and assess generalization and scalability to out-of-distribution scenarios. Code and supplementary material are public online.
A spectrum of physics-informed Gaussian processes for regression in engineering
paper_authors: Elizabeth J Cross, Timothy J Rogers, Daniel J Pitchforth, Samuel J Gibson, Matthew R Jones
for: To improve the ability to build predictive models from limited data
methods: Combines machine learning techniques with physics-based reasoning to improve the reliability and interpretability of predictive models
results: By explicitly linking physics-based understanding with data-driven regression, reliance on data collection can be significantly reduced while the interpretability of the model is increased.
Abstract
Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach. The vast data and resources available to capture human activity are unmatched in our engineered world, and, even in cases where data could be referred to as ``big,'' they will rarely hold information across operational windows or life spans. This paper pursues the combination of machine learning technology and physics-based reasoning to enhance our ability to make predictive models with limited data. By explicitly linking the physics-based view of stochastic processes with a data-based regression approach, a spectrum of possible Gaussian process models are introduced that enable the incorporation of different levels of expert knowledge of a system. Examples illustrate how these approaches can significantly reduce reliance on data collection whilst also increasing the interpretability of the model, another important consideration in this context.
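One end of the spectrum described here is a GP whose prior mean comes from a simple physics-based model, so the GP only has to learn the residual from sparse data; below is a minimal from-scratch sketch with an assumed linear "physics" mean and an RBF kernel:

```python
import numpy as np

def rbf(a, b, length=1.0, variance=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_test, mean_fn, noise=1e-2, **kern):
    """GP regression with a physics-informed prior mean mean_fn; the GP models the residual."""
    K = rbf(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test, **kern)
    Kss = rbf(x_test, x_test, **kern)
    alpha = np.linalg.solve(K, y_train - mean_fn(x_train))
    mu = mean_fn(x_test) + Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 0, None))

# Toy example: assumed linear "physics" mean plus an unknown oscillatory residual
x = np.linspace(0, 10, 15)
y = 0.5 * x + np.sin(x) + 0.05 * np.random.default_rng(0).normal(size=x.size)
xs = np.linspace(0, 10, 200)
mu, sd = gp_posterior(x, y, xs, mean_fn=lambda t: 0.5 * t, length=1.0)
```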
An Extendable Python Implementation of Robust Optimisation Monte Carlo
paper_authors: Vasilis Gkolemis, Michael Gutmann, Henri Pesonen
For: To propose a Monte Carlo-based likelihood-free inference (LFI) method and implement it in the Python package ELFI.
Methods: The paper implements the LFI method Robust Optimisation Monte Carlo (ROMC), a novel, highly parallelizable LFI framework that provides accurate weighted samples from the posterior.
Results: The implementation can be used in two ways: first, a scientist can use it as an out-of-the-box LFI algorithm; second, a researcher can split ROMC into isolated components that run in a fully parallelized manner, supporting further extension and improvement.
Abstract
Performing inference in statistical models with an intractable likelihood is challenging, therefore, most likelihood-free inference (LFI) methods encounter accuracy and efficiency limitations. In this paper, we present the implementation of the LFI method Robust Optimisation Monte Carlo (ROMC) in the Python package ELFI. ROMC is a novel and efficient (highly-parallelizable) LFI framework that provides accurate weighted samples from the posterior. Our implementation can be used in two ways. First, a scientist may use it as an out-of-the-box LFI algorithm; we provide an easy-to-use API harmonized with the principles of ELFI, enabling effortless comparisons with the rest of the methods included in the package. Additionally, we have carefully split ROMC into isolated components for supporting extensibility. A researcher may experiment with novel method(s) for solving part(s) of ROMC without reimplementing everything from scratch. In both scenarios, the ROMC parts can run in a fully-parallelized manner, exploiting all CPU cores. We also provide helpful functionalities for (i) inspecting the inference process and (ii) evaluating the obtained samples. Finally, we test the robustness of our implementation on some typical LFI examples.
Asteroids co-orbital motion classification based on Machine Learning
results: Our results show that the machine learning algorithms are able to identify and classify the time series correctly, with a high degree of performance.
Abstract
In this work, we explore how to classify asteroids in co-orbital motion with a given planet using Machine Learning. We consider four different kinds of motion in mean motion resonance with the planet, nominally Tadpole, Horseshoe and Quasi-satellite, building 3 datasets defined as Real (taking the ephemerides of real asteroids from the JPL Horizons system), Ideal and Perturbed (both simulated, obtained by propagating initial conditions considering two different dynamical systems) for training and testing the Machine Learning algorithms in different conditions. The time series of the variable theta (angle related to the resonance) are studied with a data analysis pipeline defined ad hoc for the problem and composed by: data creation and annotation, time series features extraction thanks to the tsfresh package (potentially followed by selection and standardization) and the application of Machine Learning algorithms for Dimensionality Reduction and Classification. Such approach, based on features extracted from the time series, allows to work with a smaller number of data with respect to Deep Learning algorithms, also allowing to define a ranking of the importance of the features. Physical Interpretability of the features is another key point of this approach. In addition, we introduce the SHapley Additive exPlanations for Explainability technique. Different training and test sets are used, in order to understand the power and the limits of our approach. The results show how the algorithms are able to identify and classify correctly the time series, with a high degree of performance.
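A minimal sketch of the feature-based pipeline described above, using tsfresh for time-series feature extraction on a synthetic stand-in for the resonant-angle series (the toy labels, series shapes, and downstream classifier are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic stand-in for the resonant-angle series theta (one "id" per asteroid).
rng = np.random.default_rng(0)
rows, labels = [], []
for i in range(20):
    librating = i % 2 == 0                       # hypothetical two-class toy labels
    t = np.arange(200)
    theta = 60 + 30 * np.sin(0.05 * t) if librating else (5.0 * t) % 360
    rows.append(pd.DataFrame({"id": i, "time": t, "theta": theta + rng.normal(0, 1, t.size)}))
    labels.append(int(librating))
df = pd.concat(rows, ignore_index=True)

# Extract time-series features, drop failed ones, reduce dimensionality, and classify.
features = extract_features(df, column_id="id", column_sort="time")
features = features.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
X = PCA(n_components=5).fit_transform(features.values)
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.score(X, labels))
```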
Motif-Centric Representation Learning for Symbolic Music
results: Experimental results show that the two methods complement each other, improving the area under the precision-recall curve on a motif retrieval task by 12.6%. Finally, we visualize the learned motif representations, offering a more intuitive understanding of the overall structure of a music piece. To our knowledge, this is a notable step forward in the computational modelling of music motifs, laying the foundations for future applications of motifs in automatic music composition and music information retrieval.
Abstract
Music motif, as a conceptual building block of composition, is crucial for music structure analysis and automatic composition. While human listeners can identify motifs easily, existing computational models fall short in representing motifs and their developments. The reason is that the nature of motifs is implicit, and the diversity of motif variations extends beyond simple repetitions and modulations. In this study, we aim to learn the implicit relationship between motifs and their variations via representation learning, using the Siamese network architecture and a pretraining and fine-tuning pipeline. A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning. Experimental results on a retrieval-based task show that these two methods complement each other, yielding an improvement of 12.6% in the area under the precision-recall curve. Lastly, we visualize the acquired motif representations, offering an intuitive comprehension of the overall structure of a music piece. As far as we know, this work marks a noteworthy step forward in computational modeling of music motifs. We believe that this work lays the foundations for future applications of motifs in automatic music composition and music information retrieval.
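A minimal PyTorch sketch of the VICReg objective used for pretraining, applied to two batches of Siamese-encoder embeddings (the default weights follow common VICReg settings and are assumptions here):

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    """VICReg objective on two batches of embeddings (e.g. a motif and one of its variations).

    invariance: MSE between the two views; variance: hinge keeping per-dimension std
    above gamma; covariance: penalize off-diagonal entries of the covariance matrix.
    """
    n, d = z_a.shape
    inv = F.mse_loss(z_a, z_b)

    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = torch.mean(F.relu(gamma - std_a)) + torch.mean(F.relu(gamma - std_b))

    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off = cov - torch.diag(torch.diag(cov))
        return (off ** 2).sum() / d

    cov = off_diag_cov(z_a) + off_diag_cov(z_b)
    return sim_w * inv + var_w * var + cov_w * cov

# z_a, z_b would be Siamese-encoder outputs for a motif and one of its variations.
```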
Task Graph offloading via Deep Reinforcement Learning in Mobile Edge Computing
results: Compared with existing strategies, SATA-DRL is better at reducing the average makespan and the deadline violation rate.
Abstract
Various mobile applications that comprise dependent tasks are gaining widespread popularity and are increasingly complex. These applications often have low-latency requirements, resulting in a significant surge in demand for computing resources. With the emergence of mobile edge computing (MEC), it becomes the most significant issue to offload the application tasks onto small-scale devices deployed at the edge of the mobile network for obtaining a high-quality user experience. However, since the environment of MEC is dynamic, most existing works focusing on task graph offloading, which rely heavily on expert knowledge or accurate analytical models, fail to fully adapt to such environmental changes, resulting in the reduction of user experience. This paper investigates the task graph offloading in MEC, considering the time-varying computation capabilities of edge computing devices. To adapt to environmental changes, we model the task graph scheduling for computation offloading as a Markov Decision Process (MDP). Then, we design a deep reinforcement learning algorithm (SATA-DRL) to learn the task scheduling strategy from the interaction with the environment, to improve user experience. Extensive simulations validate that SATA-DRL is superior to existing strategies in terms of reducing average makespan and deadline violation.
A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents
results: We explore the adaptability of large language models (GPT-Neo and GPT-J) to legal documents and their intra-domain (legal) transfer learning capacity. We also compare their performance with MESc and study the impact of combining embeddings from their last layers. For hierarchical models, we further propose an explanation extraction algorithm named ORSE (Occlusion sensitivity-based Relevant Sentence Extractor).
Abstract
Automatic legal judgment prediction and its explanation suffer from the problem of long case documents exceeding tens of thousands of words, in general, and having a non-uniform structure. Predicting judgments from such documents and extracting their explanation becomes a challenging task, more so on documents with no structural annotation. We define this problem as "scarce annotated legal documents" and explore their lack of structural information and their long lengths with a deep learning-based classification framework which we call MESc; "Multi-stage Encoder-based Supervised with-clustering"; for judgment prediction. Specifically, we divide a document into parts to extract their embeddings from the last four layers of a custom fine-tuned Large Language Model, and try to approximate their structure through unsupervised clustering. Which we use in another set of transformer encoder layers to learn the inter-chunk representations. We explore the adaptability of LLMs with multi-billion parameters (GPT-Neo, and GPT-J) to legal texts and their intra-domain(legal) transfer learning capacity. Alongside this, we compare their performance with MESc and the impact of combining embeddings from their last layers. For such hierarchical models, we also propose an explanation extraction algorithm named ORSE; Occlusion sensitivity-based Relevant Sentence Extractor;
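A rough sketch of the chunk-embedding stage described above (splitting a long document, averaging the last four hidden layers, and clustering the chunk embeddings); the model name, chunk size, and cluster count are stand-in assumptions, not the paper's fine-tuned setup:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

name = "bert-base-uncased"            # stand-in; the paper fine-tunes much larger LLMs
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def chunk_embeddings(text, chunk_words=200):
    """Split a long document into word chunks and embed each chunk with the mean
    of the model's last four hidden layers at the first token position."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    embs = []
    with torch.no_grad():
        for c in chunks:
            inputs = tok(c, return_tensors="pt", truncation=True, max_length=512)
            hidden = model(**inputs).hidden_states          # tuple of all layer outputs
            last4 = torch.stack(hidden[-4:]).mean(dim=0)    # average the last four layers
            embs.append(last4[:, 0].squeeze(0))             # first-token vector per chunk
    return torch.stack(embs)

embs = chunk_embeddings("some long legal judgment text " * 300)
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(embs.numpy())  # unsupervised structure proxy
```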
Hybrid State Space-based Learning for Sequential Data Prediction with Joint Optimization
paper_authors: Mustafa E. Aydın, Arda Fazla, Suleyman S. Kozat
for: The paper investigates nonlinear prediction/regression in an online setting and introduces a hybrid model that effectively mitigates the domain-specific feature engineering issues of conventional nonlinear prediction models while achieving an efficient mix of nonlinear and linear components.
methods: The paper uses recurrent structures to extract features from raw sequential data and a traditional linear time series model to handle the peculiarities of time series data, such as seasonality and trends. Unlike existing ensemble or hybrid models, it jointly optimizes, in a single pass, an enhanced recurrent neural network (LSTM) for automatic feature extraction and an ARMA-family time series model (SARIMAX) to effectively handle the time series data.
results: The approach yields significant improvements on widely publicized real-life competition datasets, and the code is openly shared for further research and replicability of the results.
Abstract
We investigate nonlinear prediction/regression in an online setting and introduce a hybrid model that effectively mitigates, via a joint mechanism through a state space formulation, the need for domain-specific feature engineering issues of conventional nonlinear prediction models and achieves an efficient mix of nonlinear and linear components. In particular, we use recursive structures to extract features from raw sequential sequences and a traditional linear time series model to deal with the intricacies of the sequential data, e.g., seasonality, trends. The state-of-the-art ensemble or hybrid models typically train the base models in a disjoint manner, which is not only time consuming but also sub-optimal due to the separation of modeling or independent training. In contrast, as the first time in the literature, we jointly optimize an enhanced recurrent neural network (LSTM) for automatic feature extraction from raw data and an ARMA-family time series model (SARIMAX) for effectively addressing peculiarities associated with time series data. We achieve this by introducing novel state space representations for the base models, which are then combined to provide a full state space representation of the hybrid or the ensemble. Hence, we are able to jointly optimize both models in a single pass via particle filtering, for which we also provide the update equations. The introduced architecture is generic so that one can use other recurrent architectures, e.g., GRUs, traditional time series-specific models, e.g., ETS or other optimization methods, e.g., EKF, UKF. Due to such novel combination and joint optimization, we demonstrate significant improvements in widely publicized real life competition datasets. We also openly share our code for further research and replicability of our results.
Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption
results: The results show that training on homomorphically encrypted data in the U-shaped split learning setting reduces accuracy by only 2.65%, while the privacy of the raw training data is preserved.
Abstract
Split learning (SL) is a new collaborative learning technique that allows participants, e.g. a client and a server, to train machine learning models without the client sharing raw data. In this setting, the client initially applies its part of the machine learning model on the raw data to generate activation maps and then sends them to the server to continue the training process. Previous works in the field demonstrated that reconstructing activation maps could result in privacy leakage of client data. In addition to that, existing mitigation techniques that overcome the privacy leakage of SL prove to be significantly worse in terms of accuracy. In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data. More precisely, in our approach, the client applies homomorphic encryption on the activation maps before sending them to the server, thus protecting user privacy. This is an important improvement that reduces privacy leakage in comparison to other SL-based works. Finally, our results show that, with the optimum set of parameters, training with HE data in the U-shaped SL setting only reduces accuracy by 2.65% compared to training on plaintext. In addition, raw training data privacy is preserved.
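A single-process sketch of the U-shaped split (client head and tail, server middle) where only activations and their gradients cross the boundary; here they are exchanged in plaintext, whereas the paper additionally applies homomorphic encryption to the activation maps:

```python
import torch
import torch.nn as nn

# U-shaped split: the client holds the input head and the output tail; the server holds
# the middle. Raw data and labels never leave the client in this arrangement.
client_head = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
server_body = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
client_tail = nn.Linear(128, 10)

params = list(client_head.parameters()) + list(server_body.parameters()) + list(client_tail.parameters())
opt = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
act = client_head(x)          # client -> server: activation maps (encrypted in the paper)
mid = server_body(act)        # server computes the shared middle
logits = client_tail(mid)     # server -> client: features; the client computes the loss with its labels
loss = loss_fn(logits, y)
opt.zero_grad(); loss.backward(); opt.step()
```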
Learning End-to-End Channel Coding with Diffusion Models
results: Experimental results show that the diffusion models learn the channel distribution accurately and achieve near-optimal end-to-end symbol error rates under various channel models. In addition, the diffusion models exhibit strong robustness and generalization.
Abstract
The training of neural encoders via deep learning necessitates a differentiable channel model due to the backpropagation algorithm. This requirement can be sidestepped by approximating either the channel distribution or its gradient through pilot signals in real-world scenarios. The initial approach draws upon the latest advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models, which have demonstrated high sample quality in image generation. We offer an end-to-end channel coding framework underpinned by diffusion models and propose an efficient training algorithm. Our simulations with various channel models establish that our diffusion models learn the channel distribution accurately, thereby achieving near-optimal end-to-end symbol error rates (SERs). We also note a significant advantage of diffusion models: A robust generalization capability in high signal-to-noise ratio regions, in contrast to GAN variants that suffer from error floor. Furthermore, we examine the trade-off between sample quality and sampling speed, when an accelerated sampling algorithm is deployed, and investigate the effect of the noise scheduling on this trade-off. With an apt choice of noise scheduling, sampling time can be significantly reduced with a minor increase in SER.
A comparative study of Grid and Natural sentences effects on Normal-to-Lombard conversion
paper_authors: Hongyang Chen, Yuhong Yang, Qingmu Liu, Baifeng Li, Weiping Tu, Song Lin
for: This paper investigates whether Normal-to-Lombard models trained on grid sentences are sufficient for improving natural speech intelligibility in real-world applications, using a parallel Lombard corpus (Lombard Chinese TIMIT, LCT) and a comparison with a standard grid sentence corpus (Enhanced MAndarin Lombard Grid, EMALG).
methods: The paper uses a parallel Lombard corpus (LCT) and a comparison with a standard grid sentence corpus (EMALG) to evaluate the Lombard effect and Normal-to-Lombard conversion in natural and grid sentences. The authors also use a subjective intelligibility assessment across genders and Signal-to-Noise Ratios to evaluate the performance of a StarGAN model trained on EMALG.
results: The paper finds that both natural and grid sentences exhibit similar changes in parameters as the noise level increases, but grid sentences show a greater increase in the alpha ratio. The StarGAN model trained on EMALG consistently outperforms the model trained on LCT in terms of improving intelligibility, which may be attributed to EMALG's larger alpha ratio increase from normal to Lombard speech. Abstract
Grid sentence is commonly used for studying the Lombard effect and Normal-to-Lombard conversion. However, it's unclear if Normal-to-Lombard models trained on grid sentences are sufficient for improving natural speech intelligibility in real-world applications. This paper presents the recording of a parallel Lombard corpus (called Lombard Chinese TIMIT, LCT) extracting natural sentences from Chinese TIMIT. Then We compare natural and grid sentences in terms of Lombard effect and Normal-to-Lombard conversion using LCT and Enhanced MAndarin Lombard Grid corpus (EMALG). Through a parametric analysis of the Lombard effect, We find that as the noise level increases, both natural sentences and grid sentences exhibit similar changes in parameters, but in terms of the increase of the alpha ratio, grid sentences show a greater increase. Following a subjective intelligibility assessment across genders and Signal-to-Noise Ratios, the StarGAN model trained on EMALG consistently outperforms the model trained on LCT in terms of improving intelligibility. This superior performance may be attributed to EMALG's larger alpha ratio increase from normal to Lombard speech.
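The alpha ratio used in the parametric analysis is commonly computed as the ratio of spectral energy in a high band to that in a low band; the exact band edges are not stated here, so the 50 Hz–1 kHz and 1–5 kHz split in the sketch below is an assumption, not necessarily the paper's choice.

```python
# Rough sketch of an alpha-ratio computation for a speech frame, assuming the
# common definition: high-band energy divided by low-band energy, in dB.
import numpy as np

def alpha_ratio(signal, sr, low=(50, 1000), high=(1000, 5000)):
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    e_low = spec[(freqs >= low[0]) & (freqs < low[1])].sum()
    e_high = spec[(freqs >= high[0]) & (freqs < high[1])].sum()
    return 10 * np.log10(e_high / (e_low + 1e-12))

sr = 16000
t = np.arange(sr) / sr
demo = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 2500 * t)  # synthetic stand-in
print(alpha_ratio(demo, sr))   # Lombard speech should yield a higher value than normal speech
```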
Ad-load Balancing via Off-policy Learning in a Content Marketplace
For: Balancing the ad load of online advertising systems on social media platforms to maximize user engagement and ads revenue while maintaining a satisfactory user experience.
Methods: Off-policy learning and evaluation from logged bandit feedback are used to estimate and optimize the ad load.
Results: In large-scale A/B experiments, off-policy learning with unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) estimation yields statistically significant improvements in both user satisfaction metrics and ads revenue. Abstract
Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.
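The two estimators named in the abstract have standard forms. The sketch below evaluates a hypothetical deterministic target ad-load policy from synthetic logged (context, action, propensity, reward) tuples; the reward model `q_hat`, the uniform logging policy, and the reward simulation are all stand-ins, not the production system.

```python
# Standard IPS and DR off-policy value estimates for a target ad-load policy,
# computed on synthetic logged bandit feedback.
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 10000, 4                                   # actions = candidate ad loads
contexts = rng.normal(size=(n, 3))
logged_propensity = np.full(n, 1.0 / n_actions)           # uniform logging policy
actions = rng.integers(0, n_actions, size=n)
rewards = rng.normal(loc=actions * 0.1, scale=1.0)        # synthetic engagement signal

def target_policy(context):                               # deterministic target policy (illustrative)
    return np.full_like(context[:, 0], 2, dtype=int)

def q_hat(context, action):                               # hypothetical outcome model for DR
    return action * 0.1

pi_a = target_policy(contexts)
w = (actions == pi_a).astype(float) / logged_propensity   # importance weights

ips = np.mean(w * rewards)
dr = np.mean(q_hat(contexts, pi_a) + w * (rewards - q_hat(contexts, actions)))
print(f"IPS estimate: {ips:.3f}  DR estimate: {dr:.3f}")
```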
Coreset selection can accelerate quantum machine learning models with provable generalization
results: Systematic numerical simulations demonstrate the potential of coreset selection across a variety of quantum machine learning tasks, including synthetic data classification, identification of quantum correlations, and quantum compiling. Abstract
Quantum neural networks (QNNs) and quantum kernels stand as prominent figures in the realm of quantum machine learning, poised to leverage the nascent capabilities of near-term quantum computers to surmount classical machine learning challenges. Nonetheless, the training efficiency challenge poses a limitation on both QNNs and quantum kernels, curbing their efficacy when applied to extensive datasets. To confront this concern, we present a unified approach: coreset selection, aimed at expediting the training of QNNs and quantum kernels by distilling a judicious subset from the original training dataset. Furthermore, we analyze the generalization error bounds of QNNs and quantum kernels when trained on such coresets, unveiling the comparable performance with those training on the complete original dataset. Through systematic numerical simulations, we illuminate the potential of coreset selection in expediting tasks encompassing synthetic data classification, identification of quantum correlations, and quantum compiling. Our work offers a useful way to improve diverse quantum machine learning models with a theoretical guarantee while reducing the training cost.
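The abstract does not pin down a single selection rule, so the sketch below uses greedy k-center selection in feature space as one simple, common way to distill a judicious subset; the random feature matrix stands in for whatever embedding of the training data is used in practice.

```python
# Greedy k-center coreset selection: repeatedly add the point farthest from the
# current coreset. Features here are random placeholders.
import numpy as np

def k_center_coreset(X, k, seed=0):
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    dists = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        idx.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)

X = np.random.default_rng(1).normal(size=(1000, 8))
coreset_idx = k_center_coreset(X, k=50)
print(coreset_idx[:10])       # indices of the distilled training subset
```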
Graph Neural Networks for Dynamic Modeling of Roller Bearing
results: Tests on bearing configurations that deviate from those used for training show that the GNN learns and generalizes well, accurately predicting the dynamics of rolling element bearings and highlighting its potential for real-time health monitoring of rotating machinery. Abstract
In the presented work, we propose to apply the framework of graph neural networks (GNNs) to predict the dynamics of a rolling element bearing. This approach offers generalizability and interpretability, having the potential for scalable use in real-time operational digital twin systems for monitoring the health state of rotating machines. By representing the bearing's components as nodes in a graph, the GNN can effectively model the complex relationships and interactions among them. We utilize a dynamic spring-mass-damper model of a bearing to generate the training data for the GNN. In this model, discrete masses represent bearing components such as rolling elements, inner raceways, and outer raceways, while a Hertzian contact model is employed to calculate the forces between these components. We evaluate the learning and generalization capabilities of the proposed GNN framework by testing different bearing configurations that deviate from the training configurations. Through this approach, we demonstrate the effectiveness of the GNN-based method in accurately predicting the dynamics of rolling element bearings, highlighting its potential for real-time health monitoring of rotating machinery.
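A minimal sketch of generating training trajectories from a lumped spring-mass-damper chain is given below. The masses, stiffnesses, damping, and integration settings are arbitrary placeholders, and the bearing-specific component layout and Hertzian contact forces used in the paper are omitted.

```python
# Generate toy training trajectories from a small spring-mass-damper chain via
# semi-implicit Euler integration; nodes of such a chain would map to graph
# nodes in the GNN.
import numpy as np

n, k, c, m, dt = 5, 1e4, 5.0, 0.1, 1e-4
x = np.zeros(n); v = np.zeros(n)
x[0] = 1e-3                                   # initial displacement of the first mass
traj = []
for _ in range(5000):
    # spring/damper forces from left and right neighbours (fixed walls at the ends)
    xl = np.concatenate(([0.0], x[:-1])); xr = np.concatenate((x[1:], [0.0]))
    vl = np.concatenate(([0.0], v[:-1])); vr = np.concatenate((v[1:], [0.0]))
    f = k * (xl - x) + k * (xr - x) + c * (vl - v) + c * (vr - v)
    v += dt * f / m
    x += dt * v
    traj.append(x.copy())
traj = np.array(traj)                         # (time, node) array: GNN training targets
print(traj.shape)
```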
A Variational Auto-Encoder Enabled Multi-Band Channel Prediction Scheme for Indoor Localization
paper_authors: Ruihao Yuan, Kaixuan Huang, Pan Yang, Shunqing Zhang
for: Improving indoor localization accuracy for cutting-edge applications such as virtual/augmented reality and smart homes.
methods: Predicting channel state information (CSI) values of another transmitting channel in the frequency domain and splicing the multi-band information together to improve indoor localization accuracy.
results: The proposed scheme is tested on COST 2100 simulation data and real-time orthogonal frequency division multiplexing (OFDM) WiFi data collected from an office scenario, yielding more precise indoor localization results. Abstract
Indoor localization is getting increasing demands for various cutting-edged technologies, like Virtual/Augmented reality and smart home. Traditional model-based localization suffers from significant computational overhead, so fingerprint localization is getting increasing attention, which needs lower computation cost after the fingerprint database is built. However, the accuracy of indoor localization is limited by the complicated indoor environment which brings the multipath signal refraction. In this paper, we provided a scheme to improve the accuracy of indoor fingerprint localization from the frequency domain by predicting the channel state information (CSI) values from another transmitting channel and spliced the multi-band information together to get more precise localization results. We tested our proposed scheme on COST 2100 simulation data and real time orthogonal frequency division multiplexing (OFDM) WiFi data collected from an office scenario.
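The prediction idea can be illustrated with a small variational auto-encoder that encodes CSI from one band and reconstructs CSI for another band. The feature dimensions, latent size, KL weight, and random dummy CSI below are assumptions meant only to show the mechanics, not the paper's architecture.

```python
# Minimal VAE-style sketch: encode CSI observed on band A and predict CSI on band B.
import torch
import torch.nn as nn

class CSIVAE(nn.Module):
    def __init__(self, d_in=64, d_out=64, d_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, d_z), nn.Linear(128, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_out))
    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation trick
        return self.dec(z), mu, logvar

model = CSIVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
csi_band_a = torch.randn(128, 64)            # dummy CSI features on band A
csi_band_b = torch.randn(128, 64)            # dummy CSI on band B (prediction target)
for _ in range(200):
    pred, mu, logvar = model(csi_band_a)
    recon = ((pred - csi_band_b) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()
```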
Minimum width for universal approximation using ReLU networks on compact domain
methods: The authors characterize the minimum width $w_{\min}$ enabling the universal approximation property; previous attempts at this characterization found the exact values only in a few cases.
results: The authors prove that the minimum width for universally approximating $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ for ReLU-like activations, which is lower than the known result $w_{\min}=\max\{d_x+1,d_y\}$ on $\mathbb R^{d_x}$. In addition, they prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le 2d_x$. Abstract
The universal approximation property of width-bounded networks has been studied as a dual of the classical universal approximation theorem for depth-bounded ones. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for the universal approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared to the known result $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb R^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb R^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le 2d_x$.
Differentiable Quantum Architecture Search for Quantum Reinforcement Learning
results: Experiments show that DQAS can design quantum circuits automatically and efficiently, and the automatically constructed circuits outperform the manually designed baseline; their performance depends on how well the super-circuit is learned during training. Abstract
Differentiable quantum architecture search (DQAS) is a gradient-based framework to design quantum circuits automatically in the NISQ era. It was motivated by such as low fidelity of quantum hardware, low flexibility of circuit architecture, high circuit design cost, barren plateau (BP) problem, and periodicity of weights. People used it to address error mitigation, unitary decomposition, and quantum approximation optimization problems based on fixed datasets. Quantum reinforcement learning (QRL) is a part of quantum machine learning and often has various data. QRL usually uses a manually designed circuit. However, the pre-defined circuit needs more flexibility for different tasks, and the circuit design based on various datasets could become intractable in the case of a large circuit. The problem of whether DQAS can be applied to quantum deep Q-learning with various datasets is still open. The main target of this work is to discover the capability of DQAS to solve quantum deep Q-learning problems. We apply a gradient-based framework DQAS on reinforcement learning tasks and evaluate it in two different environments - cart pole and frozen lake. It contains input- and output weights, progressive search, and other new features. The experiments conclude that DQAS can design quantum circuits automatically and efficiently. The evaluation results show significant outperformance compared to the manually designed circuit. Furthermore, the performance of the automatically created circuit depends on whether the super-circuit learned well during the training process. This work is the first to show that gradient-based quantum architecture search is applicable to QRL tasks.
Testable Likelihoods for Beyond-the-Standard Model Fits
results: The paper studies a particular form of normalizing flow, applies it to a multi-modal and non-Gaussian example, and quantifies the accuracy of the likelihood function and its test statistic. Abstract
Studying potential BSM effects at the precision frontier requires accurate transfer of information from low-energy measurements to high-energy BSM models. We propose to use normalising flows to construct likelihood functions that achieve this transfer. Likelihood functions constructed in this way provide the means to generate additional samples and admit a ``trivial'' goodness-of-fit test in form of a $\chi^2$ test statistic. Here, we study a particular form of normalising flow, apply it to a multi-modal and non-Gaussian example, and quantify the accuracy of the likelihood function and its test statistic.
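The "trivial" goodness-of-fit test mentioned above follows from the fact that if a normalising flow maps the data to a standard-normal base distribution, the squared norm of the latent representation is $\chi^2$-distributed. The sketch below uses an identity "flow" on already-Gaussian toy data purely to show the mechanics; a trained flow and real low-energy observables would replace both placeholders.

```python
# Chi-squared goodness-of-fit check on the latent side of a normalising flow:
# z = flow(x) should be standard normal, so ||z||^2 ~ chi^2_d per sample.
import numpy as np
from scipy import stats

d, n = 5, 2000
x = np.random.default_rng(0).normal(size=(n, d))   # stand-in for observables

def flow_to_base(x):        # placeholder: a trained flow would be applied here
    return x

z = flow_to_base(x)
chi2_stat = (z ** 2).sum(axis=1)                    # one chi^2_d draw per sample
ks = stats.kstest(chi2_stat, "chi2", args=(d,))     # compare to the chi^2_d reference
print(ks.statistic, ks.pvalue)
```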
Striking a Balance: An Optimal Mechanism Design for Heterogenous Differentially Private Data Acquisition for Logistic Regression
results: The paper provides asymptotic results characterizing the buyer's test error and payments as the number of sellers becomes large, and applies the proposed ideas to a real healthcare data set to demonstrate the method in practice. Abstract
We investigate the problem of performing logistic regression on data collected from privacy-sensitive sellers. Since the data is private, sellers must be incentivized through payments to provide their data. Thus, the goal is to design a mechanism that optimizes a weighted combination of test loss, seller privacy, and payment, i.e., strikes a balance between multiple objectives of interest. We solve the problem by combining ideas from game theory, statistical learning theory, and differential privacy. The buyer's objective function can be highly non-convex. However, we show that, under certain conditions on the problem parameters, the problem can be convexified by using a change of variables. We also provide asymptotic results characterizing the buyer's test error and payments when the number of sellers becomes large. Finally, we demonstrate our ideas by applying them to a real healthcare data set.
Computational Approaches for App-to-App Retrieval and Design Consistency Check
results: Experiments show that the proposed methods not only outperform previous retrieval models but also enable multiple new applications. Abstract
Extracting semantic representations from mobile user interfaces (UI) and using the representations for designers' decision-making processes have shown the potential to be effective computational design support tools. Current approaches rely on machine learning models trained on small-sized mobile UI datasets to extract semantic vectors and use screenshot-to-screenshot comparison to retrieve similar-looking UIs given query screenshots. However, the usability of these methods is limited because they are often not open-sourced and have complex training pipelines for practitioners to follow, and are unable to perform screenshot set-to-set (i.e., app-to-app) retrieval. To this end, we (1) employ visual models trained with large web-scale images and test whether they could extract a UI representation in a zero-shot way and outperform existing specialized models, and (2) use mathematically founded methods to enable app-to-app retrieval and design consistency analysis. Our experiments show that our methods not only improve upon previous retrieval models but also enable multiple new applications.
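Once per-screenshot embeddings from a zero-shot visual model have been computed, app-to-app retrieval reduces to a set-to-set distance over those embeddings. The symmetric Chamfer-style cosine distance below is one simple, mathematically founded choice and is an assumption for illustration, not necessarily the authors' exact formulation.

```python
# Set-to-set (app-to-app) retrieval over precomputed screenshot embeddings.
import numpy as np

def normalize(E):
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def app_distance(A, B):
    A, B = normalize(A), normalize(B)
    cos_dist = 1.0 - A @ B.T                         # pairwise cosine distances
    return 0.5 * (cos_dist.min(axis=1).mean() + cos_dist.min(axis=0).mean())

rng = np.random.default_rng(0)
# each "app" is a set of UI-screenshot vectors (random placeholders here)
apps = {name: rng.normal(size=(rng.integers(5, 15), 512)) for name in "ABCD"}
query = apps["A"]
ranking = sorted(apps, key=lambda name: app_distance(query, apps[name]))
print(ranking)                                       # most similar apps first
```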
TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions
methods: The algorithm builds on three key ideas. First, it integrates a recurrent neural network into Tensor-Train Decomposition (Neural Tensor-Train Decomposition, NTTD) to enhance its expressive power and relax the limitations imposed by the low-rank assumption. Second, it folds the input tensor into a higher-order tensor to reduce the space required by NTTD. Third, it reorders the mode indices of the input tensor to reveal patterns that NTTD can exploit for improved approximation.
results: The algorithm achieves three goals: (a) concise: it gives up to 7.38x more compact compression than the best competitor with similar reconstruction error; (b) accurate: given the same budget for compressed size, it yields up to 3.33x more accurate reconstruction than the best competitor; (c) scalable: its empirical compression time is linear in the number of tensor entries, and it reconstructs each entry in logarithmic time. Abstract
Many real-world datasets are represented as tensors, i.e., multi-dimensional arrays of numerical values. Storing them without compression often requires substantial space, which grows exponentially with the order. While many tensor compression algorithms are available, many of them rely on strong data assumptions regarding its order, sparsity, rank, and smoothness. In this work, we propose TENSORCODEC, a lossy compression algorithm for general tensors that do not necessarily adhere to strong input data assumptions. TENSORCODEC incorporates three key ideas. The first idea is Neural Tensor-Train Decomposition (NTTD) where we integrate a recurrent neural network into Tensor-Train Decomposition to enhance its expressive power and alleviate the limitations imposed by the low-rank assumption. Another idea is to fold the input tensor into a higher-order tensor to reduce the space required by NTTD. Finally, the mode indices of the input tensor are reordered to reveal patterns that can be exploited by NTTD for improved approximation. Our analysis and experiments on 8 real-world datasets demonstrate that TENSORCODEC is (a) Concise: it gives up to 7.38x more compact compression than the best competitor with similar reconstruction error, (b) Accurate: given the same budget for compressed size, it yields up to 3.33x more accurate reconstruction than the best competitor, (c) Scalable: its empirical compression time is linear in the number of tensor entries, and it reconstructs each entry in logarithmic time. Our code and datasets are available at https://github.com/kbrother/TensorCodec.
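The Tensor-Train part of NTTD can be illustrated by reconstructing a single entry of a third-order tensor from its TT cores via a chain of small matrix products. Core sizes and ranks below are arbitrary; in NTTD the core slices would be produced by a recurrent neural network rather than stored directly, which is the enhancement the paper introduces.

```python
# Reconstruct one entry of a 3rd-order tensor from Tensor-Train cores.
import numpy as np

rng = np.random.default_rng(0)
dims, ranks = (4, 5, 6), (1, 3, 3, 1)                # TT-ranks with r0 = r3 = 1
cores = [rng.normal(size=(ranks[k], dims[k], ranks[k + 1])) for k in range(3)]

def tt_entry(cores, index):
    v = np.ones((1, 1))
    for core, i in zip(cores, index):
        v = v @ core[:, i, :]                        # (1, r_k) @ (r_k, r_{k+1})
    return float(v[0, 0])

print(tt_entry(cores, (2, 0, 5)))                    # approximate value of X[2, 0, 5]
```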
Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms
paper_authors: Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu
For: The paper focuses on the assumption of conditionally invariant components (CICs) in domain adaptation (DA) and explores their role in providing target risk guarantees.
Methods: The paper proposes a new algorithm called importance-weighted conditional invariant penalty (IW-CIP) based on CICs, which has target risk guarantees beyond simple settings such as covariate shift and label shift. Additionally, the paper shows that CICs help identify large discrepancies between source and target risks of other DA algorithms.
Results: The paper demonstrates the effectiveness of the proposed algorithm and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, and Camelyon17 datasets. Specifically, the paper shows that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. Abstract
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated considerable empirical success, blindly applying these algorithms can often lead to worse performance on new datasets. To address this, it is crucial to clarify the assumptions under which a DA algorithm has good target performance. In this work, we focus on the assumption of the presence of conditionally invariant components (CICs), which are relevant for prediction and remain conditionally invariant across the source and target data. We demonstrate that CICs, which can be estimated through conditional invariant penalty (CIP), play three prominent roles in providing target risk guarantees in DA. First, we propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings such as covariate shift and label shift. Second, we show that CICs help identify large discrepancies between source and target risks of other DA algorithms. Finally, we demonstrate that incorporating CICs into the domain invariant projection (DIP) algorithm can address its failure scenario caused by label-flipping features. We support our new algorithms and theoretical findings via numerical experiments on synthetic data, MNIST, CelebA, and Camelyon17 datasets.
Learning Orbitally Stable Systems for Diagrammatically Teaching
paper_authors: Weiming Zhi, Kangni Liu, Tianyi Zhang, Matthew Johnson-Roberson
for: Teaching robots novel skills.
methods: Using user-provided 2D sketches over images of the scene to shape the robot's motion.
results: Complex cyclic motion patterns can be diagrammatically taught with a high degree of accuracy, both in simulation and on a quadruped with a mounted 6-DOF manipulator. Abstract
Diagrammatic Teaching is a paradigm for robots to acquire novel skills, whereby the user provides 2D sketches over images of the scene to shape the robot's motion. In this work, we tackle the problem of teaching a robot to approach a surface and then follow cyclic motion on it, where the cycle of the motion can be arbitrarily specified by a single user-provided sketch over an image from the robot's camera. Accordingly, we introduce the \emph{Stable Diffeomorphic Diagrammatic Teaching} (SDDT) framework. SDDT models the robot's motion as an \emph{Orbitally Asymptotically Stable} (O.A.S.) dynamical system that learns to follow the user-specified sketch. This is achieved by applying a \emph{diffeomorphism}, i.e. a differentiable and invertible function, to morph a known O.A.S. system. The parameterised diffeomorphism is then optimised with respect to the Hausdorff distance between the limit cycle of our modelled system and the sketch, to produce the desired robot motion. We provide theoretical insight into the behaviour of the optimised system and also empirically evaluate SDDT, both in simulation and on a quadruped with a mounted 6-DOF manipulator. Results show that we can diagrammatically teach complex cyclic motion patterns with a high degree of accuracy.
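The quantity the diffeomorphism is optimised against is the Hausdorff distance between the modelled system's limit cycle and the user sketch. The snippet below computes that distance between two sampled planar curves with SciPy; the circle and ellipse are synthetic stand-ins for the limit cycle and the 2D sketch.

```python
# Hausdorff distance between a sampled limit cycle and a user-provided sketch,
# both represented as planar point sets.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

t = np.linspace(0, 2 * np.pi, 200)
limit_cycle = np.c_[np.cos(t), np.sin(t)]                    # current model's cycle
sketch = np.c_[1.2 * np.cos(t), 0.8 * np.sin(t) + 0.05]      # user sketch (an ellipse)

d = max(directed_hausdorff(limit_cycle, sketch)[0],
        directed_hausdorff(sketch, limit_cycle)[0])
print(f"Hausdorff distance: {d:.3f}")   # the diffeomorphism's parameters are tuned to shrink this
```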
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
results: At the SpMM kernel level, Flash-LLM significantly outperforms the state-of-the-art libraries Sputnik and SparTA (by 2.9x and 1.5x on average); at the end-to-end framework level on OPT-30B/66B/175B models, it achieves up to 3.8x and 3.6x more tokens per GPU-second than DeepSpeed and FasterTransformer, respectively, with significantly lower inference cost. Abstract
With the fast growth of parameter size, it becomes increasingly challenging to deploy large generative models as they typically require large GPU memory consumption and massive computation. Unstructured model pruning has been a common approach to reduce both GPU memory footprint and the overall computation while retaining good model accuracy. However, the existing solutions do not provide a highly-efficient support for handling unstructured sparsity on modern GPUs, especially on the highly-structured Tensor Core hardware. Therefore, we propose Flash-LLM for enabling low-cost and highly-efficient large generative model inference with the sophisticated support of unstructured sparsity on high-performance but highly restrictive Tensor Cores. Based on our key observation that the main bottleneck of generative model inference is the several skinny matrix multiplications for which Tensor Cores would be significantly under-utilized due to low computational intensity, we propose a general Load-as-Sparse and Compute-as-Dense methodology for unstructured sparse matrix multiplication. The basic insight is to address the significant memory bandwidth bottleneck while tolerating redundant computations that are not critical for end-to-end performance on Tensor Cores. Based on this, we design an effective software framework for Tensor Core based unstructured SpMM, leveraging on-chip resources for efficient sparse data extraction and computation/memory-access overlapping. At SpMM kernel level, Flash-LLM significantly outperforms the state-of-the-art library, i.e., Sputnik and SparTA by an average of 2.9x and 1.5x, respectively. At end-to-end framework level on OPT-30B/66B/175B models, for tokens per GPU-second, Flash-LLM achieves up to 3.8x and 3.6x improvement over DeepSpeed and FasterTransformer, respectively, with significantly lower inference cost.
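The load-as-sparse, compute-as-dense idea can be mimicked on the CPU: keep the pruned weight matrix in a compressed sparse format (small memory footprint and transfer cost), densify one tile right before the multiply, and run the multiply densely. This is a conceptual sketch with NumPy/SciPy under assumed sizes and sparsity, not the Tensor Core kernel itself.

```python
# Conceptual "load-as-sparse, compute-as-dense" sketch for a skinny matmul.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 1024))
W[rng.random(W.shape) < 0.8] = 0.0               # 80% unstructured sparsity
W_csr = sparse.csr_matrix(W)                     # "load as sparse"

x = rng.normal(size=(1024, 8))                   # skinny activation matrix

tile = 256
y = np.zeros((1024, 8))
for r in range(0, 1024, tile):
    dense_tile = W_csr[r:r + tile, :].toarray()  # densify one tile on the fly
    y[r:r + tile] = dense_tile @ x               # "compute as dense"

assert np.allclose(y, W @ x)
csr_bytes = W_csr.data.nbytes + W_csr.indices.nbytes + W_csr.indptr.nbytes
print("memory (dense vs CSR bytes):", W.nbytes, csr_bytes)
```

The redundant multiplications by the zeros inside each densified tile are exactly the tolerated extra compute that the paper trades for relieving the memory-bandwidth bottleneck.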
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
paper_authors: Forsad Al Hossain, Tanjid Hasan Tonmoy, Andrew A. Lover, George A. Corey, Mohammad Arif Ul Alam, Tauhidur Rahman
For: This work proposes a non-speech audio-based approach to crowd analytics that enhances smart building operation and management across a wide range of scenarios while upholding individual privacy expectations.
Methods: A transformer-based model operating on non-speech audio is proposed, together with extensive experiments and comparative analysis showing that non-speech audio alone can support accurate crowd analysis.
Results: Experiments show that the non-speech audio-based method estimates occupancy with high accuracy, outperforming a thermal camera-based model and all other baselines; a further analysis using differential privacy techniques provides additional privacy guarantees. Abstract
Privacy-preserving crowd density analysis finds application across a wide range of scenarios, substantially enhancing smart building operation and management while upholding privacy expectations in various spaces. We propose a non-speech audio-based approach for crowd analytics, leveraging a transformer-based model. Our results demonstrate that non-speech audio alone can be used to conduct such analysis with remarkable accuracy. To the best of our knowledge, this is the first time when non-speech audio signals are proposed for predicting occupancy. As far as we know, there has been no other similar approach of its kind prior to this. To accomplish this, we deployed our sensor-based platform in the waiting room of a large hospital with IRB approval over a period of several months to capture non-speech audio and thermal images for the training and evaluation of our models. The proposed non-speech-based approach outperformed the thermal camera-based model and all other baselines. In addition to demonstrating superior performance without utilizing speech audio, we conduct further analysis using differential privacy techniques to provide additional privacy guarantees. Overall, our work demonstrates the viability of employing non-speech audio data for accurate occupancy estimation, while also ensuring the exclusion of speech-related content and providing robust privacy protections through differential privacy guarantees.
results: Numerical results show that both methods generate transition paths effectively in both the data-rich and data-scarce regimes. Abstract
In this work, we seek to simulate rare transitions between metastable states using score-based generative models. An efficient method for generating high-quality transition paths is valuable for the study of molecular systems since data is often difficult to obtain. We develop two novel methods for path generation in this paper: a chain-based approach and a midpoint-based approach. The first biases the original dynamics to facilitate transitions, while the second mirrors splitting techniques and breaks down the original transition into smaller transitions. Numerical results of generated transition paths for the M\"uller potential and for Alanine dipeptide demonstrate the effectiveness of these approaches in both the data-rich and data-scarce regimes.
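The Müller(-Brown) potential used as a test case has a standard closed form, and unbiased overdamped Langevin dynamics on it rarely cross between wells, which is what motivates the chain- and midpoint-based constructions above. The parameter values below are the commonly quoted ones and the step size, temperature, and starting point are illustrative assumptions, so treat this as a sketch of the baseline dynamics rather than the authors' exact setup.

```python
# Mueller-Brown potential (commonly used parameter values) and a plain
# overdamped Langevin sampler; the paper's methods bias or split these
# dynamics to obtain transition paths between the metastable wells.
import numpy as np

A  = np.array([-200.0, -100.0, -170.0, 15.0])
a  = np.array([-1.0, -1.0, -6.5, 0.7])
b  = np.array([0.0, 0.0, 11.0, 0.6])
c  = np.array([-10.0, -10.0, -6.5, 0.7])
x0 = np.array([1.0, 0.0, -0.5, -1.0])
y0 = np.array([0.0, 0.5, 1.5, 1.0])

def potential(p):
    dx, dy = p[0] - x0, p[1] - y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2))

def grad(p, eps=1e-5):                     # simple finite-difference gradient
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = eps
        g[i] = (potential(p + e) - potential(p - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
p, dt, beta = np.array([-0.55, 1.45]), 1e-5, 0.1     # start near one minimum
path = [p.copy()]
for _ in range(20000):
    p = p - dt * grad(p) + np.sqrt(2 * dt / beta) * rng.normal(size=2)
    path.append(p.copy())
print(np.array(path).shape)
```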
Multi-fidelity climate model parameterization for better generalization and extrapolation
results: The study finds that the multi-fidelity approach yields more accurate climate projections without requiring a major increase in computational resources; it also provides trustworthy uncertainty quantification and more skillful predictions across a wide range of scenarios. Abstract
Machine-learning-based parameterizations (i.e. representation of sub-grid processes) of global climate models or turbulent simulations have recently been proposed as a powerful alternative to physical, but empirical, representations, offering a lower computational cost and higher accuracy. Yet, those approaches still suffer from a lack of generalization and extrapolation beyond the training data, which is however critical to projecting climate change or unobserved regimes of turbulence. Here we show that a multi-fidelity approach, which integrates datasets of different accuracy and abundance, can provide the best of both worlds: the capacity to extrapolate leveraging the physically-based parameterization and a higher accuracy using the machine-learning-based parameterizations. In an application to climate modeling, the multi-fidelity framework yields more accurate climate projections without requiring major increase in computational resources. Our multi-fidelity randomized prior networks (MF-RPNs) combine physical parameterization data as low-fidelity and storm-resolving historical run's data as high-fidelity. To extrapolate beyond the training data, the MF-RPNs are tested on high-fidelity warming scenarios, $+4K$, data. We show the MF-RPN's capacity to return much more skillful predictions compared to either low- or high-fidelity (historical data) simulations trained only on one regime while providing trustworthy uncertainty quantification across a wide range of scenarios. Our approach paves the way for the use of machine-learning based methods that can optimally leverage historical observations or high-fidelity simulations and extrapolate to unseen regimes such as climate change.
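The randomized-prior-network building block follows the standard construction in which each ensemble member is the sum of a trainable network and a frozen, randomly initialised "prior" network, and the ensemble spread serves as the uncertainty estimate. The network sizes, training loop, and random data below are assumptions, and the pairing of low-fidelity (parameterization) and high-fidelity (storm-resolving) samples is indicated only in comments.

```python
# Randomized-prior-network ensemble member: prediction = trainable net + fixed prior net.
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_out))

class RPNMember(nn.Module):
    def __init__(self, d_in=8, d_out=1, prior_scale=1.0):
        super().__init__()
        self.trainable = mlp(d_in, d_out)
        self.prior = mlp(d_in, d_out)
        for p in self.prior.parameters():       # the prior stays fixed
            p.requires_grad_(False)
        self.prior_scale = prior_scale
    def forward(self, x):
        return self.trainable(x) + self.prior_scale * self.prior(x)

ensemble = [RPNMember() for _ in range(5)]
x = torch.randn(256, 8)                         # e.g. coarse-model state (low-fidelity inputs)
y = torch.randn(256, 1)                         # e.g. sub-grid tendency targets
for member in ensemble:
    opt = torch.optim.Adam([p for p in member.parameters() if p.requires_grad], lr=1e-3)
    for _ in range(200):
        loss = ((member(x) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    preds = torch.stack([m(x) for m in ensemble])
print(preds.std(dim=0).mean())                  # ensemble spread = uncertainty estimate
```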