cs.LG - 2023-12-05

CaloQVAE : Simulating high-energy particle-calorimeter interactions using hybrid quantum-classical generative models

  • paper_url: http://arxiv.org/abs/2312.03179
  • repo_url: None
  • paper_authors: Sehmimul Hoque, Hao Jia, Abhishek Abhishek, Mojde Fadaie, J. Quetzalcoatl Toledo-Marín, Tiago Vale, Roger G. Melko, Maximilian Swiatlowski, Wojciech T. Fedorko
  • for: This paper addresses the computational challenges of the Large Hadron Collider's high-luminosity era, where large amounts of Monte Carlo (MC) simulation are needed to analyse collision events.
  • methods: It combines recent advancements in generative models with quantum annealing to simulate the propagation of high-energy particles through the calorimeter section of the detector quickly and efficiently.
  • results: The result is a fast and efficient MC simulation technique that helps constrain the statistical uncertainties of simulated datasets below those of the experimental data.
    Abstract The Large Hadron Collider's high luminosity era presents major computational challenges in the analysis of collision events. Large amounts of Monte Carlo (MC) simulation will be required to constrain the statistical uncertainties of the simulated datasets below those of the experimental data. Modelling of high-energy particles propagating through the calorimeter section of the detector is the most computationally intensive MC simulation task. We introduce a technique combining recent advancements in generative models and quantum annealing for fast and efficient simulation of high-energy particle-calorimeter interactions.

Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes

  • paper_url: http://arxiv.org/abs/2312.03176
  • repo_url: None
  • paper_authors: Hao Zhao, Rong Pan
  • for: This work proposes a method for effectively detecting abrupt shifts (change points) in data, supporting decision-making and resource allocation across various domains.
  • methods: The Derivative-Aware Change Detection (DACD) method leverages the derivative process of a Gaussian process (GP) for active learning (AL) to pinpoint change-point locations. DACD balances exploitation and exploration through multiple data acquisition functions (AFs), improving algorithmic efficiency while ensuring accurate results.
  • results: Experiments across diverse scenarios show that DACD outperforms other active-learning change-point detection approaches.
    Abstract Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data acquisition in CPD, we introduce the Derivative-Aware Change Detection (DACD) method. It leverages the derivative process of a Gaussian process (GP) for Active Learning (AL), aiming to pinpoint change-point locations effectively. DACD balances the exploitation and exploration of derivative processes through multiple data acquisition functions (AFs). By utilizing GP derivative mean and variance as criteria, DACD sequentially selects the next sampling data point, thus enhancing algorithmic efficiency and ensuring reliable and accurate results. We investigate the effectiveness of DACD method in diverse scenarios and show it outperforms other active learning change-point detection approaches.
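A minimal sketch of the idea behind derivative-aware active learning for change-point detection, not the authors' DACD implementation: an RBF-kernel GP is refit as points are acquired, a finite-difference derivative of the posterior mean stands in for the GP derivative process, and a UCB-style acquisition |dmu/dx| + beta*sigma stands in for the paper's acquisition functions.
```python
# Minimal sketch of derivative-aware active learning for change-point detection.
# Assumptions (not from the paper): an RBF-kernel GP, a finite-difference proxy for
# the derivative process, and a UCB-style acquisition |dmu/dx| + beta * sigma.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def f(x):                                   # ground truth with an abrupt shift at x = 0.6
    return np.where(x < 0.6, 0.0, 2.0) + 0.05 * np.random.randn(*x.shape)

rng = np.random.default_rng(0)
X_grid = np.linspace(0, 1, 401)
X = rng.uniform(0, 1, 5)                    # small initial design
y = f(X)

for it in range(15):
    gp = GaussianProcessRegressor(RBF(0.1) + WhiteKernel(1e-2), normalize_y=True)
    gp.fit(X[:, None], y)
    mu, sigma = gp.predict(X_grid[:, None], return_std=True)
    dmu = np.gradient(mu, X_grid)           # finite-difference derivative of the posterior mean
    acq = np.abs(dmu) + 2.0 * sigma         # exploit large slope, explore high uncertainty
    x_next = X_grid[np.argmax(acq)]
    X, y = np.append(X, x_next), np.append(y, f(np.array([x_next])))

print("estimated change point:", X_grid[np.argmax(np.abs(dmu))])
```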

Adaptive spectral graph wavelets for collaborative filtering

  • paper_url: http://arxiv.org/abs/2312.03167
  • repo_url: None
  • paper_authors: Osama Alshareet, A. Ben Hamza
  • for: To provide personalized item suggestions to potential users and alleviate the cold-start problem, in which new users lack sufficient behavioral data.
  • methods: A spectral graph wavelet collaborative filtering framework that represents users, items, and their interactions as a bipartite graph. An adaptive transfer function based on a power transform stabilizes the variance of graph frequencies in the spectral domain, and a deep recommendation model learns low-dimensional embeddings of users and items with spectral graph wavelets in an end-to-end fashion.
  • results: Extensive experiments on real-world benchmark datasets show that the proposed model achieves better recommendation performance than strong baseline methods.
    Abstract Collaborative filtering is a popular approach in recommender systems, whose objective is to provide personalized item suggestions to potential users based on their purchase or browsing history. However, personalized recommendations require considerable amount of behavioral data on users, which is usually unavailable for new users, giving rise to the cold-start problem. To help alleviate this challenging problem, we introduce a spectral graph wavelet collaborative filtering framework for implicit feedback data, where users, items and their interactions are represented as a bipartite graph. Specifically, we first propose an adaptive transfer function by leveraging a power transform with the goal of stabilizing the variance of graph frequencies in the spectral domain. Then, we design a deep recommendation model for efficient learning of low-dimensional embeddings of users and items using spectral graph wavelets in an end-to-end fashion. In addition to capturing the graph's local and global structures, our approach yields localization of graph signals in both spatial and spectral domains, and hence not only learns discriminative representations of users and items, but also promotes the recommendation quality. The effectiveness of our proposed model is demonstrated through extensive experiments on real-world benchmark datasets, achieving better recommendation performance compared with strong baseline methods.
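For illustration only, a tiny numpy sketch of spectral filtering on a bipartite user-item graph with a heat-kernel wavelet g(s*lambda) = exp(-s*lambda); the paper's adaptive power-transform transfer function and end-to-end deep model are not reproduced here.
```python
# Minimal sketch: spectral graph wavelet filtering on a toy user-item bipartite graph.
# The adaptive power-transform transfer function and the deep recommendation model from
# the paper are not reproduced; a heat-kernel wavelet g(s*lambda) = exp(-s*lambda) stands in.
import numpy as np

R = np.array([[1, 1, 0, 0],                 # 3 users x 4 items implicit-feedback matrix
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
n_u, n_i = R.shape
A = np.zeros((n_u + n_i, n_u + n_i))        # bipartite adjacency
A[:n_u, n_u:] = R
A[n_u:, :n_u] = R.T
d = A.sum(1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
L = np.eye(n_u + n_i) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian

lam, U = np.linalg.eigh(L)                  # graph Fourier basis
s = 1.0                                     # wavelet scale
H = U @ np.diag(np.exp(-s * lam)) @ U.T     # wavelet filter applied in the spectral domain

X = np.eye(n_u + n_i)                       # one-hot node signals
Z = H @ X                                   # smoothed node representations
scores = Z[:n_u] @ Z[n_u:].T                # user-item affinity from filtered embeddings
print(np.round(scores, 3))
```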

Deep Learning for Fast Inference of Mechanistic Models’ Parameters

  • paper_url: http://arxiv.org/abs/2312.03166
  • repo_url: None
  • paper_authors: Maxim Borisyak, Stefan Born, Peter Neubauer, Mariano Nicolas Cruz-Bournazou
  • for: This work proposes using deep neural networks (NNs) to directly predict the parameters of mechanistic models, improving the efficiency and accuracy of parameter estimation in bioprocess engineering.
  • methods: A training procedure that combines neural networks with mechanistic models, predicting model parameters directly from experimental observations.
  • results: Neural-network estimates are measurably better than a conventional gradient-based fitting procedure alone and orders of magnitude faster to obtain; further fitting improves them only slightly.
    Abstract Inferring parameters of macro-kinetic growth models, typically represented by Ordinary Differential Equations (ODE), from the experimental data is a crucial step in bioprocess engineering. Conventionally, estimates of the parameters are obtained by fitting the mechanistic model to observations. Fitting, however, requires a significant computational power. Specifically, during the development of new bioprocesses that use previously unknown organisms or strains, efficient, robust, and computationally cheap methods for parameter estimation are of great value. In this work, we propose using Deep Neural Networks (NN) for directly predicting parameters of mechanistic models given observations. The approach requires spending computational resources for training a NN, nonetheless, once trained, such a network can provide parameter estimates orders of magnitude faster than conventional methods. We consider a training procedure that combines Neural Networks and mechanistic models. We demonstrate the performance of the proposed algorithms on data sampled from several mechanistic models used in bioengineering describing a typical industrial batch process and compare the proposed method, a typical gradient-based fitting procedure, and the combination of the two. We find that, while Neural Network estimates are slightly improved by further fitting, these estimates are measurably better than the fitting procedure alone.
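A minimal sketch of the core idea of amortized parameter inference: simulate trajectories from a mechanistic ODE over sampled parameters, then train a network to map observed trajectories back to the parameters. A logistic growth model and an sklearn MLP stand in for the paper's bioprocess models and training procedure.
```python
# Minimal sketch: train a neural network to map simulated ODE trajectories directly to
# model parameters. A logistic growth model stands in for the paper's bioprocess models.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

t_eval = np.linspace(0, 10, 30)

def simulate(mu_max, K):
    rhs = lambda t, x: mu_max * x * (1 - x / K)
    sol = solve_ivp(rhs, (0, 10), [0.1], t_eval=t_eval)
    return sol.y[0]

rng = np.random.default_rng(1)
thetas = np.column_stack([rng.uniform(0.3, 1.5, 2000),    # mu_max
                          rng.uniform(1.0, 5.0, 2000)])   # K
X = np.array([simulate(*th) for th in thetas])
X += 0.01 * rng.standard_normal(X.shape)                  # measurement noise

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X[:1800], thetas[:1800])                          # train: trajectory -> parameters

pred = net.predict(X[1800:])
print("mean abs. parameter error:", np.abs(pred - thetas[1800:]).mean(axis=0))
```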

Multitask Learning Can Improve Worst-Group Outcomes

  • paper_url: http://arxiv.org/abs/2312.03151
  • repo_url: https://github.com/atharvajk98/mtl-group-robustness
  • paper_authors: Atharva Kulkarni, Lucio Dery, Amrith Setlur, Aditi Raghunathan, Ameet Talwalkar, Graham Neubig
  • for: This work investigates the impact of multitask learning (MTL) on worst-group accuracy and explores MTL's potential to address the challenge of group-wise fairness.
  • methods: The authors fine-tune pre-trained models while multitasking the end task with a pre-training objective constructed from the end-task data itself. In the absence of group annotations, multitasking often achieves better worst-group accuracy than Just-Train-Twice (JTT). They further modify standard MTL by regularizing the joint multitask representation space.
  • results: Extensive fine-tuning experiments show that the regularized MTL approach consistently outperforms JTT on both worst-group and average-group outcomes. Code is available at https://github.com/atharvajk98/MTL-group-robustness.
    Abstract In order to create machine learning systems that serve a variety of users well, it is vital to not only achieve high average performance but also ensure equitable outcomes across diverse groups. However, most machine learning methods are designed to improve a model's average performance on a chosen end task without consideration for their impact on worst group error. Multitask learning (MTL) is one such widely used technique. In this paper, we seek not only to understand the impact of MTL on worst-group accuracy but also to explore its potential as a tool to address the challenge of group-wise fairness. We primarily consider the common setting of fine-tuning a pre-trained model, where, following recent work (Gururangan et al., 2020; Dery et al., 2023), we multitask the end task with the pre-training objective constructed from the end task data itself. In settings with few or no group annotations, we find that multitasking often, but not always, achieves better worst-group accuracy than Just-Train-Twice (JTT; Liu et al. (2021)) -- a representative distributionally robust optimization (DRO) method. Leveraging insights from synthetic data experiments, we propose to modify standard MTL by regularizing the joint multitask representation space. We run a large number of fine-tuning experiments across computer vision and natural language and find that our regularized MTL approach consistently outperforms JTT on both worst and average group outcomes. Our official code can be found here: https://github.com/atharvajk98/MTL-group-robustness.
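A conceptual PyTorch sketch of the multitask objective described above: the end-task loss is combined with an auxiliary pre-training-style objective built from the end-task inputs, plus a penalty on the shared representation. The reconstruction auxiliary task and the simple L2 penalty are stand-ins, not the paper's exact objectives.
```python
# Conceptual sketch (PyTorch) of a multitask objective: end-task loss + auxiliary
# pre-training-style loss + a penalty on the shared representation. The L2 penalty is a
# simple stand-in for the paper's regularization of the joint multitask space.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
task_head = nn.Linear(64, 2)          # end task (e.g., binary classification)
aux_head = nn.Linear(64, 128)         # auxiliary objective (e.g., input reconstruction)

x = torch.randn(32, 128)
y = torch.randint(0, 2, (32,))

z = encoder(x)
loss = (nn.functional.cross_entropy(task_head(z), y)
        + 0.5 * nn.functional.mse_loss(aux_head(z), x)     # multitasked auxiliary objective
        + 0.01 * z.pow(2).mean())                          # representation-space penalty
loss.backward()
```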

Neural parameter calibration and uncertainty quantification for epidemic forecasting

  • paper_url: http://arxiv.org/abs/2312.03147
  • repo_url: None
  • paper_authors: Thomas Gaskin, Tim Conrad, Grigorios A. Pavliotis, Christof Schütte
  • for: This paper aims to accurately forecast contagion dynamics and provide uncertainty quantification for pandemic projections.
  • methods: The paper uses a novel computational method that combines a neural network with an ODE model to learn probability densities on contagion parameters and provide uncertainty quantification.
  • results: The paper achieves a significantly more accurate calibration and prediction than Markov-Chain Monte Carlo (MCMC)-based sampling schemes, with meaningful confidence intervals on infection figures and hospitalisation rates. The method is also shown to converge to the true posterior on a simplified SIR model of epidemics and can learn complex models from a small number of compartments.
    Abstract The recent COVID-19 pandemic has thrown the importance of accurately forecasting contagion dynamics and learning infection parameters into sharp focus. At the same time, effective policy-making requires knowledge of the uncertainty on such predictions, in order, for instance, to be able to ready hospitals and intensive care units for a worst-case scenario without needlessly wasting resources. In this work, we apply a novel and powerful computational method to the problem of learning probability densities on contagion parameters and providing uncertainty quantification for pandemic projections. Using a neural network, we calibrate an ODE model to data of the spread of COVID-19 in Berlin in 2020, achieving both a significantly more accurate calibration and prediction than Markov-Chain Monte Carlo (MCMC)-based sampling schemes. The uncertainties on our predictions provide meaningful confidence intervals e.g. on infection figures and hospitalisation rates, while training and running the neural scheme takes minutes where MCMC takes hours. We show convergence of our method to the true posterior on a simplified SIR model of epidemics, and also demonstrate our method's learning capabilities on a reduced dataset, where a complex model is learned from a small number of compartments for which data is available.
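A minimal PyTorch sketch of gradient-based calibration of an SIR model against observed infection curves. This is a point-estimate illustration only; the paper's neural scheme learns full probability densities over the contagion parameters and hence provides uncertainty quantification.
```python
# Minimal sketch (PyTorch): gradient-based calibration of SIR parameters against observed
# infection counts. The paper goes further and learns full probability densities over the
# parameters; this is only a point-estimate illustration of ODE calibration.
import torch

def sir(beta, gamma, s0=0.99, i0=0.01, steps=100, dt=0.1):
    s, i, traj = s0, i0, []
    for _ in range(steps):
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i = s + dt * ds, i + dt * di     # forward-Euler integration
        traj.append(i)
    return torch.stack(traj)

true_i = sir(torch.tensor(0.5), torch.tensor(0.1)) + 0.002 * torch.randn(100)

raw = torch.nn.Parameter(torch.tensor([0.0, 0.0]))           # unconstrained parameters
opt = torch.optim.Adam([raw], lr=0.05)
for _ in range(300):
    beta, gamma = torch.nn.functional.softplus(raw)          # keep rates positive
    loss = ((sir(beta, gamma) - true_i) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("calibrated beta, gamma:", torch.nn.functional.softplus(raw).detach())
```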

A Hardware Evaluation Framework for Large Language Model Inference

  • paper_url: http://arxiv.org/abs/2312.03134
  • repo_url: None
  • paper_authors: Hengrui Zhang, August Ning, Rohan Prabhakar, David Wentzlaff
  • for: LLMCompass is a hardware evaluation framework for Large Language Model (LLM) inference workloads, aiming to evaluate different hardware designs and optimize their performance.
  • methods: LLMCompass includes a mapper to automatically find performance-optimal mapping and scheduling, as well as an area-based cost model to help architects reason about their design choices.
  • results: Compared to real-world hardware, LLMCompass' estimated latency achieves an average 10.4% error rate across various operators with various input sizes and an average 4.1% error rate for LLM inference. With LLMCompass, simulating a 4-NVIDIA A100 GPU node running GPT-3 175B inference can be done within 16 minutes on commodity hardware, including 26,400 rounds of the mapper's parameter search. The framework also explores new cost-effective hardware designs that can achieve as much as 3.41x improvement in performance/cost compared to an NVIDIA A100.
    Abstract The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large hardware needed to simply run LLM inference, evaluating different hardware designs becomes a new bottleneck. This work introduces LLMCompass, a hardware evaluation framework for LLM inference workloads. LLMCompass is fast, accurate, versatile, and able to describe and evaluate different hardware designs. LLMCompass includes a mapper to automatically find performance-optimal mapping and scheduling. It also incorporates an area-based cost model to help architects reason about their design choices. Compared to real-world hardware, LLMCompass' estimated latency achieves an average 10.4% error rate across various operators with various input sizes and an average 4.1% error rate for LLM inference. With LLMCompass, simulating a 4-NVIDIA A100 GPU node running GPT-3 175B inference can be done within 16 minutes on commodity hardware, including 26,400 rounds of the mapper's parameter search. With the aid of LLMCompass, this work draws architectural implications and explores new cost-effective hardware designs. By reducing the compute capability or replacing High Bandwidth Memory (HBM) with traditional DRAM, these new designs can achieve as much as 3.41x improvement in performance/cost compared to an NVIDIA A100, making them promising choices for democratizing LLMs. LLMCompass is planned to be fully open-source.
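A back-of-envelope example of the kind of analytical latency modelling such a framework automates, not LLMCompass itself: a roofline-style estimate for a single transformer FFN matmul, with illustrative A100-like peak-FLOP and bandwidth figures and no efficiency factors.
```python
# Back-of-envelope roofline-style latency estimate for one transformer FFN matmul.
# This is NOT LLMCompass; it only illustrates the style of analytical hardware modelling
# such a framework automates. Peak-FLOP and bandwidth numbers are illustrative A100-like
# figures, and efficiency factors are ignored.
peak_flops = 312e12          # FP16 tensor-core peak, FLOP/s (illustrative)
hbm_bw = 2.0e12              # memory bandwidth, bytes/s (illustrative)

def matmul_latency(m, k, n, bytes_per_elem=2):
    flops = 2 * m * k * n
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    compute_t = flops / peak_flops
    memory_t = traffic / hbm_bw
    return max(compute_t, memory_t), flops / traffic   # latency bound, arithmetic intensity

# GPT-3-scale FFN projection (d_model=12288 -> 4*d_model) for a batch of 2048 tokens
lat, intensity = matmul_latency(2048, 12288, 4 * 12288)
print(f"latency ~{lat*1e3:.2f} ms, arithmetic intensity ~{intensity:.0f} FLOP/byte")
```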

Advantage of Quantum Machine Learning from General Computational Advantages

  • paper_url: http://arxiv.org/abs/2312.03057
  • repo_url: None
  • paper_authors: Hayata Yamasaki, Natsuto Isogai, Mio Murao
  • for: To demonstrate the advantage of quantum machine learning (QML) in supervised learning with classical data, extending provable advantages to a broader family of learning tasks.
  • methods: The authors construct an unprecedentedly broad family of supervised learning tasks whose advantage rests on general quantum computational advantages, going beyond Shor's algorithms, and prove that no polynomial-time classical learning method can achieve these tasks.
  • results: The paper establishes the provable advantage of QML on this broader family of tasks and clarifies protocols for preparing the classical data needed to demonstrate the advantage experimentally.
    Abstract An overarching milestone of quantum machine learning (QML) is to demonstrate the advantage of QML over all possible classical learning methods in accelerating a common type of learning task as represented by supervised learning with classical data. However, the provable advantages of QML in supervised learning have been known so far only for the learning tasks designed for using the advantage of specific quantum algorithms, i.e., Shor's algorithms. Here we explicitly construct an unprecedentedly broader family of supervised learning tasks with classical data to offer the provable advantage of QML based on general quantum computational advantages, progressing beyond Shor's algorithms. Our learning task is feasibly achievable by executing a general class of functions that can be computed efficiently in polynomial time for a large fraction of inputs by arbitrary quantum algorithms but not by any classical algorithm. We prove the hardness of achieving this learning task for any possible polynomial-time classical learning method. We also clarify protocols for preparing the classical data to demonstrate this learning task in experiments. These results open routes to exploit a variety of quantum advantages in computing functions for the experimental demonstration of the advantage of QML.

Learning High-Dimensional Differential Graphs From Multi-Attribute Data

  • paper_url: http://arxiv.org/abs/2312.03761
  • repo_url: None
  • paper_authors: Jitendra K Tugnait
  • for: To estimate the difference between two Gaussian graphical models (GGMs) that are known to have similar structure.
  • methods: A group lasso penalized D-trace loss approach for learning differential graphs from multi-attribute data, optimized with an alternating direction method of multipliers (ADMM) algorithm.
  • results: Theoretical analysis establishes consistency in support recovery and estimation in high-dimensional settings, and numerical results on synthetic and real data demonstrate good performance.
    Abstract We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure. The GGM structure is encoded in its precision (inverse covariance) matrix. In many applications one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data. Existing methods for differential graph estimation are based on single-attribute (SA) models where one associates a scalar random variable with each node. In multi-attribute (MA) graphical models, each node represents a random vector. In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. Theoretical analysis establishing consistency in support recovery and estimation in high-dimensional settings is provided. Numerical results based on synthetic as well as real data are presented.
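For background, a lasso-penalized D-trace objective commonly used for differential precision-matrix estimation in the single-attribute setting is shown below; the paper's multi-attribute formulation replaces the element-wise penalty with a group lasso over node-pair blocks, so this is context rather than the paper's exact objective.
```latex
% Lasso-penalized D-trace objective for the differential precision matrix
% (single-attribute background form; the paper uses a group-lasso variant):
\hat{\Delta} \in \arg\min_{\Delta}\;
  \tfrac{1}{2}\,\mathrm{tr}\!\left(\Delta \hat{\Sigma}_X \Delta \hat{\Sigma}_Y\right)
  - \mathrm{tr}\!\left(\Delta \big(\hat{\Sigma}_X - \hat{\Sigma}_Y\big)\right)
  + \lambda \lVert \Delta \rVert_{1},
% whose unpenalized population minimizer is \Delta = \Sigma_Y^{-1} - \Sigma_X^{-1},
% i.e., the difference of the two precision matrices.
```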

Detecting algorithmic bias in medical AI-models

  • paper_url: http://arxiv.org/abs/2312.02959
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Jeffrey Smith, Andre Holder, Rishikesan Kamaleswaran, Yao Xie
  • for: To ensure that machine learning and AI-based medical decision support systems provide fair and equitable patient outcomes.
  • methods: An innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems, applied to sepsis prediction, built on the Classification and Regression Trees (CART) algorithm and validated on synthetic data as well as real electronic medical records.
  • results: Synthetic-data experiments show the method precisely estimates areas of bias in controlled settings, and experiments on electronic medical records from Grady Memorial Hospital in Atlanta demonstrate its practical use as a fairness-auditing tool in a clinical environment.
    Abstract With the growing prevalence of machine learning and artificial intelligence-based medical decision support systems, it is equally important to ensure that these systems provide patient outcomes in a fair and equitable fashion. This paper presents an innovative framework for detecting areas of algorithmic bias in medical-AI decision support systems. Our approach efficiently identifies potential biases in medical-AI models, specifically in the context of sepsis prediction, by employing the Classification and Regression Trees (CART) algorithm. We verify our methodology by conducting a series of synthetic data experiments, showcasing its ability to estimate areas of bias in controlled settings precisely. The effectiveness of the concept is further validated by experiments using electronic medical records from Grady Memorial Hospital in Atlanta, Georgia. These tests demonstrate the practical implementation of our strategy in a clinical environment, where it can function as a vital instrument for guaranteeing fairness and equity in AI-based medical decisions.
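A minimal sklearn sketch of the general idea of a tree-based bias audit, not the paper's framework: fit a CART tree to a deployed model's per-patient error indicator and read off leaves where errors concentrate. All feature names and the data-generating process are invented for illustration.
```python
# Minimal sketch: use a CART tree to surface subgroups where a trained model's errors
# concentrate. This is an illustration of the general idea, not the paper's framework.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                     # e.g., age, lab value, comorbidity score
group = (rng.random(n) < 0.2).astype(int)       # a protected / minority subgroup
# outcome depends on the group, but the deployed model never sees the group feature
y = (X[:, 0] + 1.5 * group * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

model = LogisticRegression().fit(X, y)          # stand-in for the medical-AI model
err = (model.predict(X) != y).astype(float)     # per-patient error indicator

audit_features = np.column_stack([X, group])    # the auditor may use extra attributes
audit = DecisionTreeRegressor(max_depth=3, min_samples_leaf=200).fit(audit_features, err)
print(export_text(audit, feature_names=["x0", "x1", "x2", "group"]))
# leaves with high predicted error rates flag candidate regions of algorithmic bias
```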

Attention-enhanced neural differential equations for physics-informed deep learning of ion transport

  • paper_url: http://arxiv.org/abs/2312.02871
  • repo_url: None
  • paper_authors: Danyal Rehman, John H. Lienhard
  • for: Modelling ion transport through nanoporous membrane systems.
  • methods: A machine learning approach built on attention-enhanced neural differential equations that incorporate electroneutrality-based inductive biases to improve generalization over conventional PDE-based methods.
  • results: The physics-informed deep learning solutions can outperform their classical PDE-based counterparts and offer a promising route for modelling complex transport phenomena.
    Abstract Species transport models typically combine partial differential equations (PDEs) with relations from hindered transport theory to quantify electromigrative, convective, and diffusive transport through complex nanoporous systems; however, these formulations are frequently substantial simplifications of the governing dynamics, leading to the poor generalization performance of PDE-based models. Given the growing interest in deep learning methods for the physical sciences, we develop a machine learning-based approach to characterize ion transport across nanoporous membranes. Our proposed framework centers around attention-enhanced neural differential equations that incorporate electroneutrality-based inductive biases to improve generalization performance relative to conventional PDE-based methods. In addition, we study the role of the attention mechanism in illuminating physically-meaningful ion-pairing relationships across diverse mixture compositions. Further, we investigate the importance of pre-training on simulated data from PDE-based models, as well as the performance benefits from hard vs. soft inductive biases. Our results indicate that physics-informed deep learning solutions can outperform their classical PDE-based counterparts and provide promising avenues for modelling complex transport phenomena across diverse applications.

REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training

  • paper_url: http://arxiv.org/abs/2312.03044
  • repo_url: https://github.com/zhao1402072392/rest
  • paper_authors: Jiaxu Zhao, Lu Yin, Shiwei Liu, Meng Fang, Mykola Pechenizkiy
  • for: To improve the performance of deep neural networks (DNNs) on biased data, where models tend to perform poorly on minority groups despite strong average performance.
  • methods: A reweighted sparse training framework (REST) that reduces reliance on spuriously correlated, bias-aligned features while improving computation and memory efficiency.
  • results: Experiments on three datasets show that REST reduces reliance on spurious correlations and improves performance across a wider range of data groups with fewer training and inference resources. Code is released at https://github.com/zhao1402072392/REST.
    Abstract The deep neural network (DNN) has been proven effective in various domains. However, they often struggle to perform well on certain minority groups during inference, despite showing strong performance on the majority of data groups. This is because over-parameterized models learned \textit{bias attributes} from a large number of \textit{bias-aligned} training samples. These bias attributes are strongly spuriously correlated with the target variable, causing the models to be biased towards spurious correlations (i.e., \textit{bias-conflicting}). To tackle this issue, we propose a novel \textbf{re}weighted \textbf{s}parse \textbf{t}raining framework, dubbed as \textit{\textbf{REST}, which aims to enhance the performance of biased data while improving computation and memory efficiency. Our proposed REST framework has been experimentally validated on three datasets, demonstrating its effectiveness in exploring unbiased subnetworks. We found that REST reduces the reliance on spuriously correlated features, leading to better performance across a wider range of data groups with fewer training and inference resources. We highlight that the \textit{REST} framework represents a promising approach for improving the performance of DNNs on biased data, while simultaneously improving computation and memory efficiency. By reducing the reliance on spurious correlations, REST has the potential to enhance the robustness of DNNs and improve their generalization capabilities. Code is released at \url{https://github.com/zhao1402072392/REST}

Semi-Supervised Health Index Monitoring with Feature Generation and Fusion

  • paper_url: http://arxiv.org/abs/2312.02867
  • repo_url: None
  • paper_authors: Gaëtan Frusque, Ismail Nejjar, Majid Nabavi, Olga Fink
  • for: To provide reliable and cost-effective Health Index (HI) estimation for anomaly detection and remaining-useful-life prediction in systems demanding high safety and reliability.
  • methods: The Deep Semi-supervised Anomaly Detection (DeepSAD) method is adapted for HI construction, with the DeepSAD embedding used as condition indicators to address interpretability challenges and sensitivity to system-specific factors. A diversity loss enriches the condition indicators, and an alternating projection algorithm with isotonic constraints transforms the embedding into a normalized HI with an increasing trend.
  • results: Validation on the PHME 2010 milling dataset, a recognized benchmark with ground-truth HIs, yields meaningful HI estimates; the method is then applied to monitoring the wear state of thermal spray coatings from high-frequency voltage measurements, offering more accessible and reliable HI estimation where ground-truth labels are unavailable.
    Abstract The Health Index (HI) is crucial for evaluating system health, aiding tasks like anomaly detection and predicting remaining useful life for systems demanding high safety and reliability. Tight monitoring is crucial for achieving high precision at a lower cost, with applications such as spray coating. Obtaining HI labels in real-world applications is often cost-prohibitive, requiring continuous, precise health measurements. Therefore, it is more convenient to leverage run-to failure datasets that may provide potential indications of machine wear condition, making it necessary to apply semi-supervised tools for HI construction. In this study, we adapt the Deep Semi-supervised Anomaly Detection (DeepSAD) method for HI construction. We use the DeepSAD embedding as a condition indicators to address interpretability challenges and sensitivity to system-specific factors. Then, we introduce a diversity loss to enrich condition indicators. We employ an alternating projection algorithm with isotonic constraints to transform the DeepSAD embedding into a normalized HI with an increasing trend. Validation on the PHME 2010 milling dataset, a recognized benchmark with ground truth HIs demonstrates meaningful HIs estimations. Our methodology is then applied to monitor wear states of thermal spray coatings using high-frequency voltage. Our contributions create opportunities for more accessible and reliable HI estimation, particularly in cases where obtaining ground truth HI labels is unfeasible.
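A minimal sketch of turning an anomaly-score trajectory into a normalized, monotonically increasing health index. A PCA distance-to-centre score stands in for the DeepSAD embedding, and sklearn's isotonic regression stands in for the paper's alternating projection with isotonic constraints.
```python
# Minimal sketch: convert an anomaly-score trajectory into a normalized, monotonically
# increasing Health Index. A distance-to-centre score on a PCA embedding stands in for the
# DeepSAD embedding; isotonic regression stands in for the alternating projection step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
T = 300                                          # run-to-failure sequence length
t = np.arange(T)
features = rng.normal(size=(T, 8)) + 0.01 * t[:, None] * rng.normal(size=(1, 8))

emb = PCA(n_components=3).fit_transform(features)
center = emb[:30].mean(axis=0)                   # early-life samples define "healthy"
score = np.linalg.norm(emb - center, axis=1)     # raw condition indicator

hi = IsotonicRegression(increasing=True).fit_transform(t, score)
hi = (hi - hi.min()) / (hi.max() - hi.min() + 1e-12)   # normalized Health Index in [0, 1]
print("HI starts at %.2f and ends at %.2f" % (hi[0], hi[-1]))
```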

Lessons from Usable ML Deployments and Application to Wind Turbine Monitoring

  • paper_url: http://arxiv.org/abs/2312.02859
  • repo_url: None
  • paper_authors: Alexandra Zytek, Wei-En Wang, Sofia Koukoura, Kalyan Veeramachaneni
  • for: This paper shares lessons learned from deploying usable machine learning (usable ML) in real-world domains.
  • methods: The authors highlight the role of "bridges", people who connect ML developers with domain experts, in developing usable ML applications, and propose a configurable system that enables easy iteration on usable ML interfaces during collaborations with bridges, along with continuous in-deployment evaluation.
  • results: The lessons are applied to wind turbine monitoring, where turbine engineers and data analysts must decide whether to perform costly in-person investigations to prevent potential brakepad failures; well-tuned usable ML interfaces can aid this decision-making, demonstrating the potential real-world impact of usable ML in the renewable energy domain.
    Abstract Through past experiences deploying what we call usable ML (one step beyond explainable ML, including both explanations and other augmenting information) to real-world domains, we have learned three key lessons. First, many organizations are beginning to hire people who we call ``bridges'' because they bridge the gap between ML developers and domain experts, and these people fill a valuable role in developing usable ML applications. Second, a configurable system that enables easily iterating on usable ML interfaces during collaborations with bridges is key. Finally, there is a need for continuous, in-deployment evaluations to quantify the real-world impact of usable ML. Throughout this paper, we apply these lessons to the task of wind turbine monitoring, an essential task in the renewable energy domain. Turbine engineers and data analysts must decide whether to perform costly in-person investigations on turbines to prevent potential cases of brakepad failure, and well-tuned usable ML interfaces can aid with this decision-making process. Through the applications of our lessons to this task, we hope to demonstrate the potential real-world impact of usable ML in the renewable energy domain.

Expert-guided Bayesian Optimisation for Human-in-the-loop Experimental Design of Known Systems

  • paper_url: http://arxiv.org/abs/2312.02852
  • repo_url: https://github.com/trsav/hitl-bo
  • paper_authors: Tom Savage, Ehecatl Antonio del Rio Chanona
  • for: To let domain experts influence the selection of optimal experiments by combining high-throughput (batch) Bayesian optimisation with anthropological decision theory.
  • methods: The method exploits the hypothesis that humans are better at making discrete choices than continuous ones, allowing experts to shape critical early decisions. At each iteration, an augmented multi-objective optimisation problem is solved across a number of alternate solutions, maximising both the sum of their utility function values and the determinant of their covariance matrix (their total variability). Taking the solution at the knee point of the Pareto front yields a set of alternate solutions that have high utility and are reasonably distinct, from which the expert selects one for evaluation.
  • results: Even with an uninformed practitioner, the algorithm recovers the regret of standard Bayesian optimisation.
    Abstract Domain experts often possess valuable physical insights that are overlooked in fully automated decision-making processes such as Bayesian optimisation. In this article we apply high-throughput (batch) Bayesian optimisation alongside anthropological decision theory to enable domain experts to influence the selection of optimal experiments. Our methodology exploits the hypothesis that humans are better at making discrete choices than continuous ones and enables experts to influence critical early decisions. At each iteration we solve an augmented multi-objective optimisation problem across a number of alternate solutions, maximising both the sum of their utility function values and the determinant of their covariance matrix, equivalent to their total variability. By taking the solution at the knee point of the Pareto front, we return a set of alternate solutions at each iteration that have both high utility values and are reasonably distinct, from which the expert selects one for evaluation. We demonstrate that even in the case of an uninformed practitioner, our algorithm recovers the regret of standard Bayesian optimisation.
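A rough sketch of the expert-in-the-loop idea: offer the human a small batch of high-utility yet mutually distinct candidates. A greedy utility-plus-log-determinant rule is used here as a simple stand-in for the paper's augmented multi-objective problem and knee-point selection.
```python
# Minimal sketch: propose a small set of high-utility, mutually distinct candidates for a
# human expert to choose from. A greedy utility-plus-log-det rule stands in for the
# paper's augmented multi-objective problem and knee-point selection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 1, (6, 1))
y_obs = np.sin(6 * X_obs[:, 0]) + 0.1 * rng.normal(size=6)

gp = GaussianProcessRegressor(RBF(0.2), normalize_y=True).fit(X_obs, y_obs)
cand = np.linspace(0, 1, 200)[:, None]
mu, sd = gp.predict(cand, return_std=True)
utility = mu + 2.0 * sd                              # UCB acquisition as the utility

chosen = [int(np.argmax(utility))]
K = gp.kernel_(cand)                                 # candidate covariance for diversity
for _ in range(3):                                   # build a batch of 4 alternatives
    gains = [utility[j] + np.linalg.slogdet(K[np.ix_(chosen + [j], chosen + [j])])[1]
             for j in range(len(cand))]
    gains = np.where(np.isin(np.arange(len(cand)), chosen), -np.inf, gains)
    chosen.append(int(np.argmax(gains)))
print("alternatives offered to the expert:", cand[chosen].ravel())
```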

A Kernel-Based Neural Network Test for High-dimensional Sequencing Data Analysis

  • paper_url: http://arxiv.org/abs/2312.02850
  • repo_url: None
  • paper_authors: Tingting Hou, Chang Jiang, Qing Lu
  • for: To bring deep neural network techniques to high-dimensional sequencing data analysis, where challenges such as overfitting and unknown limiting distributions have so far limited their use.
  • methods: A kernel-based neural network (KNN) test that uses random effects to model the overall effects of high-dimensional genetic data and kernel-based neural network structures to model complex genotype-phenotype relationships; a Wald-type test then evaluates the joint association with a disease phenotype, accounting for non-linear and non-additive (e.g., interaction) effects.
  • results: Simulations show higher power than the sequence kernel association test (SKAT), especially in the presence of non-linear and interaction effects; applied to whole genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the method identifies new genes associated with hippocampal volume change over time.
    Abstract The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has been rarely used in sequencing data analysis due to challenges brought by high-dimensional sequencing data (e.g., overfitting). Moreover, due to the complexity of neural networks and their unknown limiting distributions, building association tests on neural networks for genetic association analysis remains a great challenge. To address these challenges and fill the important gap of using AI in high-dimensional sequencing data analysis, we introduce a new kernel-based neural network (KNN) test for complex association analysis of sequencing data. The test is built on our previously developed KNN framework, which uses random effects to model the overall effects of high-dimensional genetic data and adopts kernel-based neural network structures to model complex genotype-phenotype relationships. Based on KNN, a Wald-type test is then introduced to evaluate the joint association of high-dimensional genetic data with a disease phenotype of interest, considering non-linear and non-additive effects (e.g., interaction effects). Through simulations, we demonstrated that our proposed method attained higher power compared to the sequence kernel association test (SKAT), especially in the presence of non-linear and interaction effects. Finally, we apply the methods to the whole genome sequencing (WGS) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, investigating new genes associated with the hippocampal volume change over time.

Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space

  • paper_url: http://arxiv.org/abs/2312.02849
  • repo_url: None
  • paper_authors: Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian
  • for: The paper optimizes functionals over finite-dimensional polyhedral subsets of the Wasserstein space, with a main application in mean-field variational inference.
  • methods: The paper uses first-order methods for optimization over these polyhedral subsets, and provides approximation rates and an algorithm for minimizing the KL divergence over these sets.
  • results: The paper obtains accelerated convergence with a complexity of $O(\sqrt \kappa \log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of the distribution being optimized.
    Abstract We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a \emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ with accelerated complexity $O(\sqrt \kappa \log(\kappa d/\varepsilon^2))$, where $\kappa$ is the condition number of $\pi$.

Transformer-Based Deep Learning Model for Bored Pile Load-Deformation Prediction in Bangkok Subsoil

  • paper_url: http://arxiv.org/abs/2312.03041
  • repo_url: None
  • paper_authors: Sompote Youwai, Chissanupong Thongnoo
  • for: To predict the load-deformation behaviour of large bored piles in Bangkok subsoil.
  • methods: A transformer-based deep learning model that encodes the soil profile and pile features as tokenized input, generates the load-deformation curve as output, and incorporates previous sequential load-deformation data into the decoder to improve prediction accuracy.
  • results: The model shows satisfactory accuracy and generalization ability, with a mean absolute error of 5.72% on the test data, and can be used for parametric analysis and design optimization of piles under different soil and pile conditions, cross sections, lengths, and pile types.
    Abstract This paper presents a novel deep learning model based on the transformer architecture to predict the load-deformation behavior of large bored piles in Bangkok subsoil. The model encodes the soil profile and pile features as tokenization input, and generates the load-deformation curve as output. The model also incorporates the previous sequential data of load-deformation curve into the decoder to improve the prediction accuracy. The model shows a satisfactory accuracy and generalization ability for the load-deformation curve prediction, with a mean absolute error of 5.72% for the test data. The model could also be used for parametric analysis and design optimization of piles under different soil and pile conditions, pile cross section, pile length and type of pile.

Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications

  • paper_url: http://arxiv.org/abs/2312.02828
  • repo_url: None
  • paper_authors: Rajeeva L. Karandikar, M. Vidyasagar
  • for: This paper studies the behaviour of the Stochastic Approximation (SA) algorithm in applications such as nonconvex optimization and reinforcement learning (RL).
  • methods: SA theory is extended to cover error terms with nonzero conditional mean and/or unbounded conditional variance, as well as asynchronous SA.
  • results: The paper derives estimates of the rate of convergence, computes the "optimal step size sequences" that maximize the estimated rate, and proves that SA converges in nonconvex optimization and Markovian SA settings.
    Abstract The Stochastic Approximation (SA) algorithm introduced by Robbins and Monro in 1951 has been a standard method for solving equations of the form $\mathbf{f}(\boldsymbol{\theta}) = \mathbf{0}$, when only noisy measurements of $\mathbf{f}(\cdot)$ are available. If $\mathbf{f}(\boldsymbol{\theta}) = \nabla J(\boldsymbol{\theta})$ for some function $J(\cdot)$, then SA can also be used to find a stationary point of $J(\cdot)$. In much of the literature, it is assumed that the error term $\boldsymbol{\xi}_{t+1}$ has zero conditional mean, and that its conditional variance is bounded as a function of $t$ (though not necessarily with respect to $\boldsymbol{\theta}_t$). Also, for the most part, the emphasis has been on ``synchronous'' SA, whereby, at each time $t$, \textit{every} component of $\boldsymbol{\theta}_t$ is updated. Over the years, SA has been applied to a variety of areas, out of which two are the focus in this paper: Convex and nonconvex optimization, and Reinforcement Learning (RL). As it turns out, in these applications, the above-mentioned assumptions do not always hold. In zero-order methods, the error neither has zero mean nor bounded conditional variance. In the present paper, we extend SA theory to encompass errors with nonzero conditional mean and/or unbounded conditional variance, and also asynchronous SA. In addition, we derive estimates for the rate of convergence of the algorithm. Then we apply the new results to problems in nonconvex optimization, and to Markovian SA, a recently emerging area in RL. We prove that SA converges in these situations, and compute the ``optimal step size sequences'' to maximize the estimated rate of convergence.
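For readers new to the topic, the classical SA recursion and the Robbins-Monro step-size conditions referred to above are given below (standard background, not the paper's relaxed assumptions):
```latex
% Classical SA background (standard material, not the paper's new conditions):
% the recursion with noisy measurements of f, and the Robbins-Monro step-size rules.
\theta_{t+1} = \theta_t + \alpha_t \left[ \mathbf{f}(\theta_t) + \xi_{t+1} \right],
\qquad
\sum_{t} \alpha_t = \infty, \qquad \sum_{t} \alpha_t^2 < \infty .
% With f = -\nabla J this is noisy gradient descent on J.
```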

Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems

  • paper_url: http://arxiv.org/abs/2312.02804
  • repo_url: None
  • paper_authors: Céline Comte, Matthieu Jonckheere, Jaron Sanders, Albert Senen-Cerda
  • for: To address Markov decision processes (MDPs) with large state and action spaces and nonconvex objective functions, which hinder the convergence of many reinforcement learning (RL) algorithms.
  • methods: A new family of gradient estimators, score-aware gradient estimators (SAGEs), which estimate the policy gradient without relying on value-function estimation whenever the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters.
  • results: On two common control problems arising in stochastic networks and queueing systems, a SAGE-based policy-gradient method finds close-to-optimal policies more rapidly than an actor-critic algorithm; under local nondegeneracy and Lyapunov assumptions, the policy converges to an optimal policy with high probability when started sufficiently close to it, even with a nonconvex objective and multiple maximizers.
    Abstract Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this paper, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, contrary to classical policy-gradient methods like actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product-form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.
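The exponential-family structure that SAGEs exploit can be illustrated with the standard score identity below (general background; the paper's estimator is not reproduced here): when the stationary distribution has the form $\pi_\theta(x) \propto \exp(\eta(\theta)^\top T(x))$, its score depends only on sufficient statistics, not on value functions.
```latex
% Standard exponential-family identity (not the paper's exact estimator): if the
% stationary distribution is \pi_\theta(x) \propto \exp(\eta(\theta)^\top T(x)), then
\nabla_\theta \log \pi_\theta(x)
  = \big(\nabla_\theta \eta(\theta)\big)^{\!\top}
    \Big( T(x) - \mathbb{E}_{X \sim \pi_\theta}\!\left[ T(X) \right] \Big),
% so the score can be evaluated from the sufficient statistics T(x) alone.
```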

Materials Expert-Artificial Intelligence for Materials Discovery

  • paper_url: http://arxiv.org/abs/2312.02796
  • repo_url: None
  • paper_authors: Yanjun Liu, Milena Jovanovic, Krishnanand Mallayya, Wesley J. Maddox, Andrew Gordon Wilson, Sebastian Klemenz, Leslie M. Schoop, Eun-Ah Kim
  • for: This paper aims to develop a machine learning approach to uncover predictive descriptors for emergent material properties from vast data space, with a focus on topological semimetals (TSMs) among square-net materials.
  • methods: The authors use a machine learning approach called “Materials Expert-Artificial Intelligence” (ME-AI) to encapsulate and articulate human intuition, which is based on experimental data whenever possible. They use Dirichlet-based Gaussian process regression with a specialized kernel to reveal composite descriptors for square-net TSMs.
  • results: The ME-AI learned descriptors independently reproduce expert intuition and expand upon it, pointing to hypervalency as a critical chemical feature predicting TSM within square-net compounds. The success of the approach on a carefully defined problem suggests that it is promising for machine learning-aided material discovery.
    Abstract The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences that are rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors using expertly curated measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step towards such a program, we focus on the topological semimetal (TSM) among square-net materials as the property inspired by the expert-identified descriptor based on structural information: the tolerance factor. We start by curating a dataset encompassing 12 primary features of 879 square-net materials, using experimental data whenever possible. We then use Dirichlet-based Gaussian process regression using a specialized kernel to reveal composite descriptors for square-net topological semimetals. The ME-AI learned descriptors independently reproduce expert intuition and expand upon it. Specifically, new descriptors point to hypervalency as a critical chemical feature predicting TSM within square-net compounds. Our success with a carefully defined problem points to the "machine bottling human insight" approach as promising for machine learning-aided material discovery.

Machine Learning Driven Sensitivity Analysis of E3SM Land Model Parameters for Wetland Methane Emissions

  • paper_url: http://arxiv.org/abs/2312.02786
  • repo_url: None
  • paper_authors: Sandeep Chinta, Xiang Gao, Qing Zhu
  • for: This study aims to identify critical parameters for methane emission in the Energy Exascale Earth System Model (E3SM) land model (ELM) and to reduce biases and uncertainties in future projections using sensitivity analysis (SA) and machine learning (ML) algorithms.
  • methods: The study uses SA to examine the impact of 19 selected parameters responsible for critical biogeochemical processes in the methane module of ELM on various CH4 fluxes at 14 FLUXNET-CH4 sites with diverse vegetation types. The study also employs an ML algorithm to emulate the complex behavior of ELM methane biogeochemistry and to reduce computational costs.
  • results: The study found that parameters linked to CH4 production and diffusion generally present the highest sensitivities despite apparent seasonal variation. Comparing simulated emissions from perturbed parameter sets against FLUXNET-CH4 observations revealed that better performances can be achieved at each site compared to the default parameter values, indicating a scope for further improving simulated emissions using parameter calibration with advanced optimization techniques like Bayesian optimization.
    Abstract Methane (CH4) is the second most critical greenhouse gas after carbon dioxide, contributing to 16-25% of the observed atmospheric warming. Wetlands are the primary natural source of methane emissions globally. However, wetland methane emission estimates from biogeochemistry models contain considerable uncertainty. One of the main sources of this uncertainty arises from the numerous uncertain model parameters within various physical, biological, and chemical processes that influence methane production, oxidation, and transport. Sensitivity Analysis (SA) can help identify critical parameters for methane emission and achieve reduced biases and uncertainties in future projections. This study performs SA for 19 selected parameters responsible for critical biogeochemical processes in the methane module of the Energy Exascale Earth System Model (E3SM) land model (ELM). The impact of these parameters on various CH4 fluxes is examined at 14 FLUXNET- CH4 sites with diverse vegetation types. Given the extensive number of model simulations needed for global variance-based SA, we employ a machine learning (ML) algorithm to emulate the complex behavior of ELM methane biogeochemistry. ML enables the computational time to be shortened significantly from 6 CPU hours to 0.72 milliseconds, achieving reduced computational costs. We found that parameters linked to CH4 production and diffusion generally present the highest sensitivities despite apparent seasonal variation. Comparing simulated emissions from perturbed parameter sets against FLUXNET-CH4 observations revealed that better performances can be achieved at each site compared to the default parameter values. This presents a scope for further improving simulated emissions using parameter calibration with advanced optimization techniques like Bayesian optimization.
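A minimal sketch of the emulator-based workflow: sample parameters, run the simulator, fit an ML emulator, and rank parameter sensitivities. A toy simulator and permutation importance stand in for ELM runs and the variance-based Sobol analysis used in the study; the parameter names are invented.
```python
# Minimal sketch: fit an ML emulator to (parameter sample -> simulated CH4 flux) pairs and
# rank parameter sensitivities. Permutation importance stands in for the variance-based
# Sobol analysis used in the study; the toy "simulator" below is purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
names = ["q10_ch4prod", "k_diffusion", "f_oxidation", "rootdepth", "ph_factor"]
theta = rng.uniform(0, 1, size=(800, len(names)))            # scaled parameter samples

def toy_simulator(p):                                        # stand-in for ELM runs
    return 3 * p[:, 0] + 2 * p[:, 1] ** 2 + 0.3 * p[:, 2] + 0.05 * rng.normal(size=len(p))

flux = toy_simulator(theta)
emulator = RandomForestRegressor(n_estimators=300, random_state=0).fit(theta[:600], flux[:600])

imp = permutation_importance(emulator, theta[600:], flux[600:], n_repeats=20, random_state=0)
for name, score in sorted(zip(names, imp.importances_mean), key=lambda z: -z[1]):
    print(f"{name:14s} {score:.3f}")
```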
    摘要 氨 (CH4) 是大气中第二重要的绿色气体,占据大气暖化的 16-25%。湿地是全球主要的自然氨发生源。然而,湿地氨发生估计从生物地球化学模型中含有较大的不确定性。这种不确定性的主要来源是生物地球化学过程中的多个不确定参数。敏感分析 (SA) 可以帮助标识氨发生中关键的参数,以便在未来预测中减少偏差和不确定性。这个研究在 E3SM terrestrial model (ELM) 中的氨模块中进行了 19 个参数的敏感分析。这些参数影响 CH4 的多种流向,并在 14 个 FLUXNET-CH4 站点上进行了多种植被类型的 исследование。由于需要进行全球差异基于的 SA,我们使用机器学习 (ML) 算法来模拟 ELM 氨生物地球化学的复杂行为。 ML 使得计算时间从原来的 6 CPU 小时缩短到 0.72 毫秒,实现了计算成本的减少。我们发现,与 CH4 生产和扩散直接相关的参数通常具有最高敏感性,尽管显示季节性变化。对比推测参数集中的释放与 FLUXNET-CH4 观测数据表示,可以在每个站点上实现更好的表现,比 default 参数值更好。这表明可以通过参数调整和进一步的优化技术,如 Bayesian 优化,进一步提高预测的释放。
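As a rough illustration of the emulator-based sensitivity analysis described above, the sketch below trains a cheap surrogate on a synthetic perturbed-parameter ensemble and ranks parameters by permutation importance. The 19-parameter setup, the synthetic response, and the use of permutation importance (rather than the variance-based Sobol indices used in the study) are simplifying assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical ensemble: 19 perturbed ELM parameters -> simulated CH4 flux at one site.
n_samples, n_params = 2000, 19
X = rng.uniform(0.0, 1.0, size=(n_samples, n_params))            # scaled parameter values
y = (3.0 * X[:, 0] + 2.0 * X[:, 3] ** 2                           # stand-in for the land-model output
     + 0.5 * X[:, 0] * X[:, 7] + 0.1 * rng.normal(size=n_samples))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Cheap emulator replacing the expensive land-model run.
emulator = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("emulator R^2 on held-out runs:", emulator.score(X_te, y_te))

# Sensitivity ranking via permutation importance on the emulator
# (a proxy for the variance-based sensitivity indices used in the study).
imp = permutation_importance(emulator, X_te, y_te, n_repeats=20, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
for i in ranking[:5]:
    print(f"parameter {i:2d}  importance = {imp.importances_mean[i]:.3f}")
```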

Learning “Look-Ahead” Nonlocal Traffic Dynamics in a Ring Road

  • paper_url: http://arxiv.org/abs/2312.02770
  • repo_url: None
  • paper_authors: Chenguang Zhao, Huan Yu
  • for: 这个研究旨在探讨带有“前视(look-ahead)”动力学的非局部偏微分方程(PDE)交通流模型的应用,以改进车流速度的预测与管理。
  • methods: 本研究使用环形道路实验的交通轨迹数据,设计了物理信息神经网络(PINN)来学习基本图(fundamental diagram)和前视核函数,并通过最小化结合数据误差与非局部模型误差的损失函数,构建了一个数据增强的非局部 LWR 模型。
  • results: 研究结果显示,使用 PINN 学习得到的非局部 LWR 模型能够更精确地预测车流波的传播,在停走振荡、拥堵和自由流三种情况下都有更好的预测效果。此外,研究也用真实交通数据确认了“前视”效应的存在,并发现最佳非局部核函数的长度约为 35-50 米,而前 5 米内的核函数权重占了非局部效应的大部分。
    Abstract The macroscopic traffic flow model is widely used for traffic control and management. To incorporate drivers' anticipative behaviors and to remove impractical speed discontinuity inherent in the classic Lighthill-Whitham-Richards (LWR) traffic model, nonlocal partial differential equation (PDE) models with ``look-ahead" dynamics have been proposed, which assume that the speed is a function of weighted downstream traffic density. However, it lacks data validation on two important questions: whether there exist nonlocal dynamics, and how the length and weight of the ``look-ahead" window affect the spatial temporal propagation of traffic densities. In this paper, we adopt traffic trajectory data from a ring-road experiment and design a physics-informed neural network to learn the fundamental diagram and look-ahead kernel that best fit the data, and reinvent a data-enhanced nonlocal LWR model via minimizing the loss function combining the data discrepancy and the nonlocal model discrepancy. Results show that the learned nonlocal LWR yields a more accurate prediction of traffic wave propagation in three different scenarios: stop-and-go oscillations, congested, and free traffic. We first demonstrate the existence of ``look-ahead" effect with real traffic data. The optimal nonlocal kernel is found out to take a length of around 35 to 50 meters, and the kernel weight within 5 meters accounts for the majority of the nonlocal effect. Our results also underscore the importance of choosing a priori physics in machine learning models.
    摘要 宽泛交通流模型广泛用于交通控制和管理。为了包括 drivers 的预测行为并消除类别 Lighthill-Whitham-Richards (LWR) 流体模型中的不实际速度缺失,非本地partial differential equation (PDE) 模型 WITH "look-ahead" 动力学被提议,它假设速度为下游交通密度的加权函数。然而,它缺乏数据验证两个重要问题:是否存在非本地动力学,以及"look-ahead" 窗口的长度和重量如何影响空间时间层流密度的传播。在这篇论文中,我们采用环路实验的交通轨迹数据,并设计了physics-informed neural network来学习基本图ogram和look-ahead kernel,并通过最小化损失函数来恢复数据增强的非本地LWR模型。结果表明学习的非本地LWR模型可以更准确地预测交通波的传播在三种不同的情况下:停止-和-跑动、堵塞和自由交通。我们首先证明了实际交通数据中的"look-ahead"效应的存在。最佳的非本地kernel长度为35-50米,而在5米内的kernel重量占了非本地效应的大多数。我们的结果也强调了在机器学习模型中采用先验法的重要性。
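A minimal sketch of the idea behind the data-enhanced nonlocal LWR model: a learnable look-ahead kernel (softmax-normalized weights over a window of downstream cells) and a small network for the fundamental diagram are fitted by minimizing a data-discrepancy term plus a finite-difference conservation-law residual. All data, window sizes, and loss weights below are synthetic placeholders, not the authors' trajectory data or architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, X, W = 60, 100, 10          # time steps, road cells, look-ahead window (cells)
dt, dx = 0.1, 5.0              # hypothetical discretisation (s, m)

rho = torch.rand(T, X)         # stand-in for density estimated from trajectories
v_obs = 1.0 - rho              # stand-in for observed speeds

V = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Sigmoid())  # fundamental diagram
kernel_logits = nn.Parameter(torch.zeros(W))      # learnable look-ahead kernel weights

def nonlocal_density(r):
    w = torch.softmax(kernel_logits, dim=0)       # weights sum to one
    cols = [torch.roll(r, shifts=-k, dims=1) for k in range(W)]   # downstream cells
    return sum(wk * ck for wk, ck in zip(w, cols))

opt = torch.optim.Adam(list(V.parameters()) + [kernel_logits], lr=1e-2)
for step in range(200):
    r_bar = nonlocal_density(rho)
    v = V(r_bar.reshape(-1, 1)).reshape(T, X)     # speed from weighted downstream density
    data_loss = ((v - v_obs) ** 2).mean()
    # conservation-law residual: d(rho)/dt + d(rho*v)/dx ~ 0 (finite differences)
    q = rho * v
    resid = (rho[1:, :-1] - rho[:-1, :-1]) / dt + (q[:-1, 1:] - q[:-1, :-1]) / dx
    loss = data_loss + 0.1 * (resid ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("learned look-ahead weights:", torch.softmax(kernel_logits, dim=0).detach().numpy().round(3))
```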

LExCI: A Framework for Reinforcement Learning with Embedded Systems

  • paper_url: http://arxiv.org/abs/2312.02739
  • repo_url: https://github.com/mechatronics-rwth/lexci-2
  • paper_authors: Kevin Badalian, Lucas Koch, Tobias Brinkmann, Mario Picerno, Marius Wegener, Sung-Yong Lee, Jakob Andert
  • for: 这篇论文是关于控制工程中的人工智能应用,具体来说是一种名为强化学习(Reinforcement Learning,RL)的方法,用于让代理人在环境中自由地互动,以找到最佳策略。
  • methods: 本论文使用的方法是一种名为LExCI(Learning and Experiencing Cycle Interface)的框架,它可以将RLlib开源库与特定的嵌入式设备集成,以便在这些设备上训练RL代理人。
  • results: 本论文的结果表明,LExCI框架可以帮助训练RL代理人,并且可以与现有的工具链集成。两种状态前瞻RL算法和快速控制概念验证系统都被用来演示LExCI的可操作性。
    Abstract Advances in artificial intelligence (AI) have led to its application in many areas of everyday life. In the context of control engineering, reinforcement learning (RL) represents a particularly promising approach as it is centred around the idea of allowing an agent to freely interact with its environment to find an optimal strategy. One of the challenges professionals face when training and deploying RL agents is that the latter often have to run on dedicated embedded devices. This could be to integrate them into an existing toolchain or to satisfy certain performance criteria like real-time constraints. Conventional RL libraries, however, cannot be easily utilised in conjunction with that kind of hardware. In this paper, we present a framework named LExCI, the Learning and Experiencing Cycle Interface, which bridges this gap and provides end-users with a free and open-source tool for training agents on embedded systems using the open-source library RLlib. Its operability is demonstrated with two state-of-the-art RL-algorithms and a rapid control prototyping system.
    摘要 人工智能(AI)的进步已经应用到了我们日常生活中的各个领域。在控制工程中,回归学习(RL)是一种特别有把握的方法,因为它将代理人允许自由地与环境互动,以找到最佳策略。然而,训练和部署RL代理人时,专业人员常遇到的挑战是RL代理人通常需要运行在专门的嵌入式设备上。这可能是为了结合现有的工具链,或者满足certain性能标准,如实时约束。 conventioanl RL库不能方便地在这种硬件上使用。在这篇论文中,我们提出了一个名为LExCI的框架,即学习和体验循环界面。LExCI bridges this gap and provides end-users with a free and open-source tool for training agents on embedded systems using the open-source library RLlib。我们的框架可以与两种现状最佳RL算法和快速控制原型系统进行运练。
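A schematic of the learning-and-experiencing cycle that frameworks of this kind orchestrate: the embedded target runs the current policy and returns experiences, a workstation-side learner updates the policy, and the new weights are redeployed. The helper functions below are illustrative stubs only and do not reflect the LExCI or RLlib APIs.

```python
import random

# Hypothetical stand-ins for the embedded-device and trainer sides of the cycle;
# these are NOT LExCI or RLlib functions, just placeholders for illustration.
def run_policy_on_device(weights, n_steps=200):
    """Pretend the embedded controller executes the current policy and logs transitions."""
    return [{"obs": [random.random()], "action": random.choice([0, 1]),
             "reward": random.random(), "done": False} for _ in range(n_steps)]

def update_policy(weights, experiences):
    """Pretend the workstation-side learner improves the policy from the collected batch."""
    return [w + 0.01 for w in weights]              # placeholder "gradient step"

weights = [0.0] * 8                                  # placeholder policy parameters
for cycle in range(10):                              # the learning-and-experiencing loop
    experiences = run_policy_on_device(weights)      # 1) deploy policy, collect data on target hardware
    weights = update_policy(weights, experiences)    # 2) train off-device with the RL library
    avg_r = sum(e["reward"] for e in experiences) / len(experiences)
    print(f"cycle {cycle}: mean reward {avg_r:.3f}") # 3) monitor, then redeploy updated weights
```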

(Provable) Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More

  • paper_url: http://arxiv.org/abs/2312.02708
  • repo_url: None
  • paper_authors: Jan Schuchardt, Yan Scholten, Stephan Günnemann
  • for: 本研究旨在提供一种具有任务对称性的鲁棒性定义,并证明可以通过选择具有任务对称性的模型和进行 tradicional adversarial robustness 证明来实现可靠的鲁棒性。
  • methods: 本研究使用了 equivariance-preserving randomized smoothing 框架和architecture-specific graph edit distance certificates来证明模型的鲁棒性。
  • results: 本研究发现了一些鲁棒性证明方法,包括 choosing a model that matches the task’s equivariances 和 certifying traditional adversarial robustness,可以为未来在鲁棒机器学习和几何机器学习之间的工作提供基础。
    Abstract A machine learning model is traditionally considered robust if its prediction remains (almost) constant under input perturbations with small norm. However, real-world tasks like molecular property prediction or point cloud segmentation have inherent equivariances, such as rotation or permutation equivariance. In such tasks, even perturbations with large norm do not necessarily change an input's semantic content. Furthermore, there are perturbations for which a model's prediction explicitly needs to change. For the first time, we propose a sound notion of adversarial robustness that accounts for task equivariance. We then demonstrate that provable robustness can be achieved by (1) choosing a model that matches the task's equivariances (2) certifying traditional adversarial robustness. Certification methods are, however, unavailable for many models, such as those with continuous equivariances. We close this gap by developing the framework of equivariance-preserving randomized smoothing, which enables architecture-agnostic certification. We additionally derive the first architecture-specific graph edit distance certificates, i.e. sound robustness guarantees for isomorphism equivariant tasks like node classification. Overall, a sound notion of robustness is an important prerequisite for future work at the intersection of robust and geometric machine learning.
    摘要 We demonstrate that provable robustness can be achieved by (1) selecting a model that matches the task's equivariances and (2) certifying traditional adversarial robustness. However, certification methods are not available for many models, such as those with continuous equivariances. To address this gap, we develop the framework of equivariance-preserving randomized smoothing, which enables architecture-agnostic certification. Additionally, we derive the first architecture-specific graph edit distance certificates, which provide sound robustness guarantees for isomorphism equivariant tasks like node classification.Overall, a sound notion of robustness is crucial for future work at the intersection of robust and geometric machine learning.
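For context on the certification side, the sketch below implements plain randomized smoothing (Monte Carlo class estimate, Clopper-Pearson lower bound, certified L2 radius); the equivariance-preserving variant proposed in the paper builds on this idea but is not reproduced here. The toy classifier and sample sizes are illustrative.

```python
import numpy as np
from scipy.stats import norm, beta

def clopper_pearson_lower(k, n, alpha=0.001):
    """One-sided lower confidence bound on a binomial proportion."""
    if k == 0:
        return 0.0
    return beta.ppf(alpha, k, n - k + 1)

def certify(base_classifier, x, sigma=0.25, n0=100, n=10_000, alpha=0.001, num_classes=10, rng=None):
    """Standard randomized-smoothing certification: returns (predicted class,
    certified L2 radius), or (None, 0) if the smoothed classifier abstains."""
    rng = rng or np.random.default_rng(0)
    def sample_counts(m):
        counts = np.zeros(num_classes, dtype=int)
        for _ in range(m):
            counts[base_classifier(x + sigma * rng.normal(size=x.shape))] += 1
        return counts
    guess = int(np.argmax(sample_counts(n0)))        # cheap guess of the top class
    k = sample_counts(n)[guess]                      # larger sample to lower-bound its probability
    p_lower = clopper_pearson_lower(k, n, alpha)
    if p_lower <= 0.5:
        return None, 0.0                             # abstain
    return guess, sigma * norm.ppf(p_lower)          # certified radius

# Toy base classifier: the class depends on the sign of the first coordinate.
clf = lambda z: int(z[0] > 0)
print(certify(clf, np.array([0.8, -0.2]), num_classes=2))
```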

Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler

  • paper_url: http://arxiv.org/abs/2312.02683
  • repo_url: None
  • paper_authors: Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May
  • for: 这个论文主要针对的是speech enhancement的泛化性能,以及 diffusion models在这个领域的应用。
  • methods: 这个论文使用了diffusion models,并在多个语音、噪声和binaru room impulse response(BRIR)数据库中进行了训练,以测试其在不同的噪声和音响环境下的泛化性能。
  • results: 论文表明,使用多个数据库进行训练可以提高 diffusion-based speech enhancement 模型的泛化性能,并且在 matched 和 mismatched 条件下都表现出优于当前领先的泛化模型。此外,使用 Heun-based 采样器也可以在更小的计算成本下提高泛化性能。
    Abstract Diffusion models are a new class of generative models that have recently been applied to speech enhancement successfully. Previous works have demonstrated their superior performance in mismatched conditions compared to state-of-the art discriminative models. However, this was investigated with a single database for training and another one for testing, which makes the results highly dependent on the particular databases. Moreover, recent developments from the image generation literature remain largely unexplored for speech enhancement. These include several design aspects of diffusion models, such as the noise schedule or the reverse sampler. In this work, we systematically assess the generalization performance of a diffusion-based speech enhancement model by using multiple speech, noise and binaural room impulse response (BRIR) databases to simulate mismatched acoustic conditions. We also experiment with a noise schedule and a sampler that have not been applied to speech enhancement before. We show that the proposed system substantially benefits from using multiple databases for training, and achieves superior performance compared to state-of-the-art discriminative models in both matched and mismatched conditions. We also show that a Heun-based sampler achieves superior performance at a smaller computational cost compared to a sampler commonly used for speech enhancement.
    摘要 Diffusion models 是一种新的生成模型,最近在语音提升中得到了成功应用。之前的研究表明,Diffusion models 在不同的匹配条件下比现有的描述性模型表现更出色。然而,这些研究通常使用一个训练数据集和一个测试数据集,这使得结果受到特定数据集的限制。此外,图像生成领域的最新发展还没有得到过语音提升的应用。这些包括Diffusion models中的噪声程度或反向抽象等设计方面。在这项工作中,我们系统地评估了一种基于Diffusion models的语音提升模型,使用多个语音、噪声和双耳室音响响应(BRIR)数据集来模拟不同的匹配条件。我们还尝试了一种没有用于语音提升之前的噪声程度和抽象方法。我们发现,提案的系统在使用多个数据集进行训练时得到了明显的改善,并在匹配和不匹配条件下都与现有的描述性模型相比表现出色。此外,我们还发现了一种基于Heun的抽象方法在计算成本更小的情况下表现更好。
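A minimal sketch of a Heun-based (second-order) sampler of the kind evaluated in the paper, written for a generic denoiser over a decreasing noise schedule; the toy denoiser and schedule below are placeholders, not the speech-enhancement score model.

```python
import torch

def heun_sampler(denoiser, x_T, sigmas):
    """Deterministic Heun (2nd-order) sampler over a decreasing noise schedule,
    in the style popularised for diffusion models by Karras et al.
    `denoiser(x, sigma)` is assumed to return an estimate of the clean signal."""
    x = x_T
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma              # probability-flow ODE derivative
        x_euler = x + (sigma_next - sigma) * d            # Euler predictor
        if sigma_next > 0:                                # Heun corrector (skip at the last step)
            d_next = (x_euler - denoiser(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x

# Toy "denoiser" that shrinks towards zero; a real one would be a trained score model
# conditioned on the noisy speech signal.
toy_denoiser = lambda x, sigma: x / (1.0 + sigma ** 2)
sigmas = torch.linspace(10.0, 0.0, steps=30)
sample = heun_sampler(toy_denoiser, torch.randn(1, 16_000), sigmas)
print(sample.shape)
```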

Learning a Sparse Representation of Barron Functions with the Inverse Scale Space Flow

  • paper_url: http://arxiv.org/abs/2312.02671
  • repo_url: None
  • paper_authors: Tjeerd Jan Heeringa, Tim Roith, Christoph Brune, Martin Burger
  • for: 这 paper 是用来找到 Barron 函数的稀疏表示方法。
  • methods: 这 paper 使用 inverse scale space flow 来找到一个稀疏测度 $\mu$,使得 Barron 函数相关于测度 $\mu$ 和函数 $f$ 之间的 $L^2$ 距离最小化。
  • results: 这 paper 分析了这种方法在理想情况下和干扰情况下的收敛性质。在理想情况下,目标函数会逐渐减少,直到到达最小值,并且收敛速率为 $\mathcal{O}(1/t)$。在干扰情况下,最优解可能会受到多余或加法常数的影响。这种收敛性保持在分析参数空间的离散化上,并且在不断细化参数空间上的最小化点会 converges 到全参数空间上的最优解。
    Abstract This paper presents a method for finding a sparse representation of Barron functions. Specifically, given an $L^2$ function $f$, the inverse scale space flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between the Barron function associated to the measure $\mu$ and the function $f$. The convergence properties of this method are analysed in an ideal setting and in the cases of measurement noise and sampling bias. In an ideal setting the objective decreases strictly monotone in time to a minimizer with $\mathcal{O}(1/t)$, and in the case of measurement noise or sampling bias the optimum is achieved up to a multiplicative or additive constant. This convergence is preserved on discretization of the parameter space, and the minimizers on increasingly fine discretizations converge to the optimum on the full parameter space.
    摘要 本文提出了一种寻找巴朗(Barron)函数稀疏表示的方法。具体而言,给定一个 $L^2$ 函数 $f$,利用反尺度空间流寻找一个稀疏测度 $\mu$,使得与测度 $\mu$ 相关联的巴朗函数与函数 $f$ 之间的 $L^2$ 损失最小。我们在理想情形以及存在测量噪声和采样偏差的情形下分析了该方法的收敛性质:在理想情形下,目标函数随时间严格单调下降至最小值,收敛速率为 $\mathcal{O}(1/t)$;在存在测量噪声或采样偏差时,最优解可达到一个乘性或加性常数以内的精度。对参数空间进行离散化后,这种收敛性依然保持,并且在越来越精细的离散化上得到的最小值点收敛于全参数空间上的最优解。
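To make the flavor of the method concrete, the sketch below runs a linearized Bregman iteration, a simple discretization closely related to the inverse scale space flow, on a finite dictionary standing in for the Barron-function parameter space. Dictionary size, step sizes, and sparsity level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite surrogate: columns of A play the role of neurons evaluated on data,
# and we look for a sparse coefficient vector mu with A @ mu ~ f.
n_data, n_atoms = 200, 500
A = rng.normal(size=(n_data, n_atoms)) / np.sqrt(n_data)
mu_true = np.zeros(n_atoms)
mu_true[rng.choice(n_atoms, 5, replace=False)] = 3.0 * rng.normal(size=5)
f = A @ mu_true

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Linearized Bregman iteration, a discretisation related to the inverse scale
# space flow: the dual variable v grows until coordinates "activate" one by one.
v = np.zeros(n_atoms)
mu = np.zeros(n_atoms)
tau, delta, lam = 0.2, 1.0, 1.0
for t in range(5000):
    v += tau * A.T @ (f - A @ mu)
    mu = delta * soft_threshold(v, lam)

print("nonzeros recovered:", np.flatnonzero(np.abs(mu) > 1e-3))
print("true support      :", np.sort(np.flatnonzero(mu_true)))
print("residual ||A mu - f||:", np.linalg.norm(A @ mu - f))
```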

A Self-Commissioning Edge Computing Method for Data-Driven Anomaly Detection in Power Electronic Systems

  • paper_url: http://arxiv.org/abs/2312.02661
  • repo_url: None
  • paper_authors: Pere Izquierdo Gomez, Miguel E. Lopez Gajardo, Nenad Mijatovic, Tomislav Dragicevic
  • for: 验证电子转换器可靠性的重要性,数据驱动监测技术在这方面扮演着越来越重要的角色。
  • methods: 本文提出了一种边缘计算方法,通过优先存储训练样本的大小偏差来mitigate lab数据有限样本的困难,以提高训练过程的稳定性和预测性。
  • results: 实验数据显示,该方法可以提高预测精度和训练速度,比 tradicional online学习方法无需该数据选择过程更好。
    Abstract Ensuring the reliability of power electronic converters is a matter of great importance, and data-driven condition monitoring techniques are cementing themselves as an important tool for this purpose. However, translating methods that work well in controlled lab environments to field applications presents significant challenges, notably because of the limited diversity and accuracy of the lab training data. By enabling the use of field data, online machine learning can be a powerful tool to overcome this problem, but it introduces additional challenges in ensuring the stability and predictability of the training processes. This work presents an edge computing method that mitigates these shortcomings with minimal additional memory usage, by employing an autonomous algorithm that prioritizes the storage of training samples with larger prediction errors. The method is demonstrated on the use case of a self-commissioning condition monitoring system, in the form of a thermal anomaly detection scheme for a variable frequency motor drive, where the algorithm self-learned to distinguish normal and anomalous operation with minimal prior knowledge. The obtained results, based on experimental data, show a significant improvement in prediction accuracy and training speed, when compared to equivalent models trained online without the proposed data selection process.
    摘要 This work proposes an edge computing method that mitigates these shortcomings with minimal additional memory usage. The method employs an autonomous algorithm that prioritizes the storage of training samples with larger prediction errors. The approach is demonstrated on the use case of a self-commissioning condition monitoring system, in the form of a thermal anomaly detection scheme for a variable frequency motor drive. The algorithm self-learned to distinguish normal and anomalous operation with minimal prior knowledge, and the obtained results, based on experimental data, show a significant improvement in prediction accuracy and training speed compared to equivalent models trained online without the proposed data selection process.
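A minimal sketch of the data-selection idea: a fixed-size buffer that keeps the field samples with the largest prediction errors, so online training revisits informative samples without storing everything. The buffer size, toy model, and data stream are assumptions for illustration.

```python
import heapq
import random

class ErrorPrioritizedBuffer:
    """Fixed-size store that keeps the training samples whose prediction error was largest,
    so an online model revisits the 'hard' field data without storing everything."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self._heap = []            # min-heap of (error, counter, sample); smallest error evicted first
        self._counter = 0          # tie-breaker so samples never get compared directly

    def add(self, sample, error):
        item = (error, self._counter, sample)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif error > self._heap[0][0]:             # replace the least-informative stored sample
            heapq.heapreplace(self._heap, item)

    def samples(self):
        return [s for _, _, s in self._heap]

# Toy usage: stream of (x, y) pairs with the current model's absolute error as priority.
buf = ErrorPrioritizedBuffer(capacity=5)
model = lambda x: 0.5 * x                           # placeholder online model
for _ in range(100):
    x = random.uniform(-1, 1); y = x ** 2           # placeholder thermal measurement
    buf.add((x, y), error=abs(model(x) - y))
print("retained errors:", sorted(round(abs(model(x) - y), 3) for x, y in buf.samples()))
```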

Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán

  • paper_url: http://arxiv.org/abs/2312.02658
  • repo_url: None
  • paper_authors: Andrew J. Charlton-Perez, Helen F. Dacre, Simon Driscoll, Suzanne L. Gray, Ben Harvey, Natalie J. Harvey, Kieran M. R. Hunt, Robert W. Lee, Ranjini Swaminathan, Remy Vandaele, Ambrogio Volonté
  • for: 这个研究的目的是对现代机器学习模型在 simulate 高impact 天气事件方面的性能进行了评估。
  • methods: 这个研究使用了四种机器学习模型(FourCastNet、Pangu-Weather、GraphCast和FourCastNet-v2)来预测欧洲风暴雨灾事件 Storm Ciar'an。
  • results: 研究发现这些机器学习模型能够准确地捕捉风暴的大规模结构,包括云头的位置、温带的形状和热带湍流的位置,以及风暴的发展驱动因素。但是,它们在发布气象警报所需的更细节结构方面的表现更为杂mix。
    Abstract There has been huge recent interest in the potential of making operational weather forecasts using machine learning techniques. As they become a part of the weather forecasting toolbox, there is a pressing need to understand how well current machine learning models can simulate high-impactweather events. We compare forecasts of Storm Ciar\'an, a European windstorm that caused sixteen deaths and extensive damage in Northern Europe, made by machine learning and numericalweather prediction models. The four machine learning models considered (FourCastNet, Pangu-Weather, GraphCast and FourCastNet-v2) produce forecasts that accurately capture the synoptic-scale structure of the cyclone including the position of the cloud head, shape of the warm sector and location of warm conveyor belt jet, and the large-scale dynamical drivers important for the rapid storm development such as the position of the storm relative to the upper-level jet exit. However, their ability to resolve the more detailed structures important for issuing weather warnings is more mixed. All of the machine learning models underestimate the peak amplitude of winds associated with the storm, only some machine learning models resolve the warm core seclusion and none of the machine learning models capture the sharp bent-back warm frontal gradient. Our study shows there is a great deal about the performance and properties of machine learning weather forecasts that can be derived from case studies of high-impact weather events such as Storm Ciar\'an.
    摘要 近来,利用机器学习技术进行业务化天气预报引起了极大的兴趣。随着这些模型逐渐成为天气预报工具箱的一部分,亟需了解当前机器学习模型对高影响天气事件的模拟能力。我们比较了机器学习模型与数值天气预报模型对欧洲风暴 Ciarán(造成十六人死亡并在北欧造成大范围破坏)的预报。所考察的四个机器学习模型(FourCastNet、Pangu-Weather、GraphCast 和 FourCastNet-v2)都能准确再现该气旋的天气尺度结构,包括云头位置、暖区形状、暖输送带急流的位置,以及对风暴快速发展至关重要的大尺度动力驱动因素(如风暴相对于高空急流出口区的位置)。然而,它们对发布天气警报所需的更精细结构的刻画参差不齐:所有机器学习模型都低估了与风暴相关的最大风速,只有部分模型再现了暖核隔离结构,而没有任何模型能刻画尖锐的后弯暖锋梯度。我们的研究表明,通过对 Storm Ciarán 这类高影响天气事件的个例研究,可以获得大量关于机器学习天气预报性能与特性的认识。

What Machine Learning Can Do for Focusing Aerogel Detectors

  • paper_url: http://arxiv.org/abs/2312.02652
  • repo_url: None
  • paper_authors: Foma Shipilov, Alexander Barnyakov, Vladimir Bobrovnikov, Sergey Kononov, Fedor Ratnikov
  • for: 这项研究用于提高Super Charm-Tau工厂实验中的粒子识别率。
  • methods: 这项研究使用了计算机视觉技术的多种措施来筛选信号射击。
  • results: 这些措施可以有效地减少数据流量和提高粒子速度分辨率。
    Abstract Particle identification at the Super Charm-Tau factory experiment will be provided by a Focusing Aerogel Ring Imaging CHerenkov detector (FARICH). The specifics of detector location make proper cooling difficult, therefore a significant number of ambient background hits are captured. They must be mitigated to reduce the data flow and improve particle velocity resolution. In this work we present several approaches to filtering signal hits, inspired by machine learning techniques from computer vision.
    摘要 超粲-τ(Super Charm-Tau)工厂实验中的粒子识别将由聚焦气凝胶环形成像切伦科夫探测器(FARICH)提供。由于探测器所处位置难以进行充分冷却,系统会记录到大量环境背景击中,必须加以抑制,以减少数据流量并提高粒子速度分辨率。在本工作中,我们提出了若干受计算机视觉机器学习技术启发的信号击中筛选方法。

A Q-learning approach to the continuous control problem of robot inverted pendulum balancing

  • paper_url: http://arxiv.org/abs/2312.02649
  • repo_url: None
  • paper_authors: Mohammad Safeea, Pedro Neto
  • for: 这个研究是用于评估抽象动作空间强化学习方法(Q学习)在Robot倒立拐杆平衡控制中的应用。
  • methods: 这种方法使用了在实际系统上进行学习阶段的数据拟合,以加速学习过程和缓解实际系统上的技术困难。
  • results: 该方法在实际系统上成功应用,并在一个真实世界Robot上学习平衡倒立拐杆。这个研究也证明了在实际世界中使用抽象动作空间算法控制连续动作的重要性,并且用于加速学习过程。
    Abstract This study evaluates the application of a discrete action space reinforcement learning method (Q-learning) to the continuous control problem of robot inverted pendulum balancing. To speed up the learning process and to overcome technical difficulties related to the direct learning on the real robotic system, the learning phase is performed in simulation environment. A mathematical model of the system dynamics is implemented, deduced by curve fitting on data acquired from the real system. The proposed approach demonstrated feasible, featuring its application on a real world robot that learned to balance an inverted pendulum. This study also reinforces and demonstrates the importance of an accurate representation of the physical world in simulation to achieve a more efficient implementation of reinforcement learning algorithms in real world, even when using a discrete action space algorithm to control a continuous action.
    摘要 本研究评估了将离散动作空间强化学习方法(Q-learning)应用于机器人倒立摆平衡这一连续控制问题。为了加快学习过程并克服直接在真实机器人系统上学习的技术困难,学习阶段在仿真环境中进行;系统动力学的数学模型通过对真实系统采集的数据进行曲线拟合得到。所提方法被证明是可行的,并成功应用于一台真实机器人,使其学会了平衡倒立摆。本研究也再次表明:即便使用离散动作空间算法来控制连续动作,在仿真中准确刻画物理世界对于在真实世界中更高效地实现强化学习算法仍然至关重要。
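A toy version of the approach: tabular Q-learning with a discretized state space applied to a simulated inverted pendulum. The dynamics below are a generic textbook model, not the curve-fitted model of the authors' robot, and all gains and bin counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple inverted-pendulum dynamics (not the authors' fitted robot model):
# state = (angle from upright, angular velocity), action = torque in {-2, 0, +2} N*m.
g, l, m, dt = 9.81, 1.0, 1.0, 0.02
actions = np.array([-2.0, 0.0, 2.0])

def step(theta, omega, torque):
    alpha = (g / l) * np.sin(theta) + torque / (m * l ** 2)
    omega = np.clip(omega + alpha * dt, -8.0, 8.0)
    theta = theta + omega * dt
    reward = -(theta ** 2 + 0.1 * omega ** 2)           # keep the pole upright and still
    return theta, omega, reward

# Discretise the continuous state so a tabular Q-function applies.
theta_bins = np.linspace(-np.pi / 3, np.pi / 3, 25)
omega_bins = np.linspace(-8.0, 8.0, 25)
def s_idx(theta, omega):
    return np.digitize(theta, theta_bins), np.digitize(omega, omega_bins)

Q = np.zeros((len(theta_bins) + 1, len(omega_bins) + 1, len(actions)))
alpha_lr, gamma, eps = 0.1, 0.99, 0.1

for episode in range(1000):
    theta, omega = rng.uniform(-0.1, 0.1), 0.0
    for t in range(300):
        s = s_idx(theta, omega)
        a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        theta, omega, r = step(theta, omega, actions[a])
        s2 = s_idx(theta, omega)
        Q[s][a] += alpha_lr * (r + gamma * np.max(Q[s2]) - Q[s][a])
        if abs(theta) > np.pi / 3:                       # pole fell over; end the episode
            break

print("greedy action near upright:", actions[int(np.argmax(Q[s_idx(0.0, 0.0)]))])
```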

Rethinking and Simplifying Bootstrapped Graph Latents

  • paper_url: http://arxiv.org/abs/2312.02619
  • repo_url: https://github.com/zszszs25/sgcl
  • paper_authors: Wangbin Sun, Jintang Li, Liang Chen, Bingzhe Wu, Yatao Bian, Zibin Zheng
  • for: 提高图像自supervised learning中模型的可分解性和性能。
  • methods: 利用两次循环输出作为正样本,取消负样本。
  • results: 与传统GCL方法相比,SGCL可以实现竞争性的性能,同时具有更少的参数、更低的时间和空间成本,以及显著的速度提升。
    Abstract Graph contrastive learning (GCL) has emerged as a representative paradigm in graph self-supervised learning, where negative samples are commonly regarded as the key to preventing model collapse and producing distinguishable representations. Recent studies have shown that GCL without negative samples can achieve state-of-the-art performance as well as scalability improvement, with bootstrapped graph latent (BGRL) as a prominent step forward. However, BGRL relies on a complex architecture to maintain the ability to scatter representations, and the underlying mechanisms enabling the success remain largely unexplored. In this paper, we introduce an instance-level decorrelation perspective to tackle the aforementioned issue and leverage it as a springboard to reveal the potential unnecessary model complexity within BGRL. Based on our findings, we present SGCL, a simple yet effective GCL framework that utilizes the outputs from two consecutive iterations as positive pairs, eliminating the negative samples. SGCL only requires a single graph augmentation and a single graph encoder without additional parameters. Extensive experiments conducted on various graph benchmarks demonstrate that SGCL can achieve competitive performance with fewer parameters, lower time and space costs, and significant convergence speedup.
    摘要 图对比学习(GCL)已成为图自监督学习中的代表性范式,其中负样本通常被视为防止模型坍塌、生成可区分表示的关键。近期研究表明,不使用负样本的 GCL 同样可以取得最先进的性能并改善可扩展性,其中自举图隐表示(BGRL)是一个突出的进展。然而,BGRL 依赖复杂的架构来维持表示的分散能力,其成功背后的机制在很大程度上仍未被探索。本文引入实例级去相关的视角来解决上述问题,并以此为切入点揭示 BGRL 中可能存在的不必要的模型复杂度。基于我们的发现,我们提出了 SGCL——一个简单而有效的 GCL 框架,它将连续两次迭代的输出作为正样本对,从而完全去除负样本。SGCL 只需要一次图增强和一个图编码器,无需额外参数。在多个图基准上的大量实验表明,SGCL 能以更少的参数、更低的时间和空间开销以及显著更快的收敛速度取得有竞争力的性能。
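A compact sketch of the core SGCL idea (outputs of two consecutive iterations as the positive pair, no negative samples), using a dense toy graph and a one-layer encoder; the real method's encoder, augmentation, and any additional stabilizing details are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, F_in, F_out = 100, 16, 32
X = torch.randn(N, F_in)
A = (torch.rand(N, N) < 0.05).float()
A_hat = (A + A.T).clamp(max=1) + torch.eye(N)            # symmetrised adjacency + self-loops
A_norm = A_hat / A_hat.sum(dim=1, keepdim=True)          # simple row normalisation

class TinyGCN(nn.Module):                                # dense one-layer graph encoder
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(F_in, F_out)
    def forward(self, x, a):
        return F.relu(a @ self.lin(x))

def augment(x, a, drop=0.2):                             # single, cheap feature-mask augmentation
    return x * (torch.rand_like(x) > drop).float(), a

encoder = TinyGCN()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
prev_z = None
for epoch in range(50):
    z = encoder(*augment(X, A_norm))
    if prev_z is not None:
        # Outputs of two consecutive iterations act as the positive pair; no negatives needed.
        loss = (2 - 2 * F.cosine_similarity(z, prev_z, dim=-1)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    prev_z = z.detach()                                  # target for the next iteration
print("embedding shape:", prev_z.shape)
```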

Privacy-Aware Data Acquisition under Data Similarity in Regression Markets

  • paper_url: http://arxiv.org/abs/2312.02611
  • repo_url: None
  • paper_authors: Shashi Raj Pandey, Pierre Pinson, Petar Popovski
  • for: 该论文旨在设计数据市场,考虑数据所有者的隐私偏好和数据相似性的影响。
  • methods: 该论文提出了一种基于本地均分隐私协议的查询-回复协议,用于实现两方数据交换机制。
  • results: 该论文通过分析参与者之间的策略交互,分析了隐私意识的影响于价格和隐私因子。 Additionally, the paper shows that data similarity affects market participation and traded data value.
    Abstract Data markets facilitate decentralized data exchange for applications such as prediction, learning, or inference. The design of these markets is challenged by varying privacy preferences as well as data similarity among data owners. Related works have often overlooked how data similarity impacts pricing and data value through statistical information leakage. We demonstrate that data similarity and privacy preferences are integral to market design and propose a query-response protocol using local differential privacy for a two-party data acquisition mechanism. In our regression data market model, we analyze strategic interactions between privacy-aware owners and the learner as a Stackelberg game over the asked price and privacy factor. Finally, we numerically evaluate how data similarity affects market participation and traded data value.
    摘要 数据市场促进了分布式数据交换,用于预测、学习或推理等应用。市场设计面临着数据所有者不同的隐私偏好以及数据之间相似性的挑战。相关研究常常忽略数据相似性如何通过统计信息泄露影响定价与数据价值。我们证明了数据相似性和隐私偏好是市场设计不可或缺的组成部分,并为两方数据获取机制提出了基于本地差分隐私的查询-响应协议。在我们的回归数据市场模型中,我们将具有隐私意识的数据所有者与学习者之间的策略交互建模为关于报价与隐私因子的 Stackelberg 博弈并加以分析。最后,我们通过数值实验评估了数据相似性对市场参与度和交易数据价值的影响。
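A minimal sketch of the local-differential-privacy query-response step assumed in the market model: each owner perturbs a bounded value with Laplace noise scaled by their privacy factor before replying. The value range and the epsilon values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(value, epsilon, value_range=(0.0, 1.0)):
    """Local-DP reply: the owner adds Laplace noise scaled to the query's sensitivity,
    so the learner never sees the raw measurement."""
    lo, hi = value_range
    sensitivity = hi - lo
    noisy = value + rng.laplace(scale=sensitivity / epsilon)
    return float(np.clip(noisy, lo, hi))

# Query-response exchange: the learner asks each owner for a feature value; each owner
# answers through the mechanism with their own privacy factor epsilon.
true_values = rng.uniform(0, 1, size=1000)
for eps in (0.5, 2.0, 8.0):
    replies = np.array([randomized_response(v, eps) for v in true_values])
    print(f"epsilon={eps:>3}: mean abs distortion = {np.abs(replies - true_values).mean():.3f}")
```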

TSVR+: Twin support vector regression with privileged information

  • paper_url: http://arxiv.org/abs/2312.02596
  • repo_url: None
  • paper_authors: Anuradha Kumari, M. Tanveer
  • for: 提高机器学习模型的训练速度和准确性
  • methods: combining twin support vector regression (TSVR) with learning using privileged information (LUPI) and using successive overrelaxation (SOR) technique to solve the optimization problem
  • results: 在 UCI、股票和时间序列数据集上进行了数值实验,并证明了提案的模型的优越性
    Abstract In the realm of machine learning, the data may contain additional attributes, known as privileged information (PI). The main purpose of PI is to assist in the training of the model and then utilize the acquired knowledge to make predictions for unseen samples. Support vector regression (SVR) is an effective regression model, however, it has a low learning speed due to solving a convex quadratic problem (QP) subject to a pair of constraints. In contrast, twin support vector regression (TSVR) is more efficient than SVR as it solves two QPs each subject to one set of constraints. However, TSVR and its variants are trained only on regular features and do not use privileged features for training. To fill this gap, we introduce a fusion of TSVR with learning using privileged information (LUPI) and propose a novel approach called twin support vector regression with privileged information (TSVR+). The regularization terms in the proposed TSVR+ capture the essence of statistical learning theory and implement the structural risk minimization principle. We use the successive overrelaxation (SOR) technique to solve the optimization problem of the proposed TSVR+, which enhances the training efficiency. As far as our knowledge extends, the integration of the LUPI concept into twin variants of regression models is a novel advancement. The numerical experiments conducted on UCI, stock and time series data collectively demonstrate the superiority of the proposed model.
    摘要 在机器学习领域中,数据可能包含附加的特征,称为特权信息(PI)。PI的主要目的是帮助模型训练并使用所获知ledge来预测未经见过的样本。支持向量回归(SVR)是一种有效的回归模型,但它的学习速度较低,因为它解决了一个几何 quadratic problem(QP),并且受到一对约束的限制。相比之下,双支持向量回归(TSVR)比SVR更高效,因为它解决了两个QP,每个QP受到一个集合约束。然而,TSVR和其变种只在常见特征上训练,并不使用特权特征进行训练。为了填补这个空隙,我们提出了将TSVR与特权信息学习(LUPI)融合,并提出了一种新的方法called twin support vector regression with privileged information(TSVR+)。TSVR+的正则化项捕捉了统计学学习理论的核心,并实现了结构风险最小化原则。我们使用successive overrelaxation(SOR)技术解决TSVR+优化问题,这有助于提高训练效率。在我们所知道的范围内,将LUPI概念integrated into twin variants of regression models是一种新的进展。在UCIC、股票和时间序列数据上进行的数字实验结果表明,提议的模型具有superiority。
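Since the dual problems of (twin) SVR are box-constrained QPs, the sketch below shows a generic projected SOR solver of the kind referred to in the paper; the toy QP and relaxation factor are illustrative, and the TSVR+-specific matrices are not constructed here.

```python
import numpy as np

def projected_sor(Q, p, C, omega=1.3, iters=500):
    """Successive over-relaxation with projection for
        min_a 0.5 * a^T Q a + p^T a   s.t.  0 <= a <= C,
    the form taken by the dual problems of (twin) support vector regression."""
    n = Q.shape[0]
    a = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            grad_i = Q[i] @ a + p[i]
            a[i] = np.clip(a[i] - omega * grad_i / Q[i, i], 0.0, C)
    return a

# Toy symmetric positive-definite QP, as arises in the SVR dual.
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 30))
Q = M @ M.T + 1e-1 * np.eye(30)
p = -np.ones(30)
a = projected_sor(Q, p, C=10.0)
print("natural (KKT) residual:", np.linalg.norm(np.clip(a - (Q @ a + p), 0.0, 10.0) - a))
```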

FRAPPÉ: A Post-Processing Framework for Group Fairness Regularization

  • paper_url: http://arxiv.org/abs/2312.02592
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Alexandru Ţifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, Flavien Prost
  • for: 提高群体公平性,减少偏袋性和欺诈性
  • methods: 将任何内部处理方法转换为后处理方法,并使用罚 penalty 函数来解决敏感特征知ledge的问题
  • results: 经过批处理可以达到与内部处理方法相同的公平性-错误负担协议,并且在实际数据上表现出较好的性能
    Abstract Post-processing mitigation techniques for group fairness generally adjust the decision threshold of a base model in order to improve fairness. Methods in this family exhibit several advantages that make them appealing in practice: post-processing requires no access to the model training pipeline, is agnostic to the base model architecture, and offers a reduced computation cost compared to in-processing. Despite these benefits, existing methods face other challenges that limit their applicability: they require knowledge of the sensitive attributes at inference time and are oftentimes outperformed by in-processing. In this paper, we propose a general framework to transform any in-processing method with a penalized objective into a post-processing procedure. The resulting method is specifically designed to overcome the aforementioned shortcomings of prior post-processing approaches. Furthermore, we show theoretically and through extensive experiments on real-world data that the resulting post-processing method matches or even surpasses the fairness-error trade-off offered by the in-processing counterpart.
    摘要 对于群体公平性,后处理mitigation技术通常是调整基本模型的决策阈值,以改进公平性。这些方法具有许多优点,使其在实践中吸引人:后处理不需要对模型训练管道有任何Access,不受模型架构的限制,计算成本较低。然而,现有方法存在其他挑战,包括需要掌握敏感特征的知识在推理时,并且经常被内部处理方法所超越。在这篇论文中,我们提出一种普适的框架,可以将任何内部处理方法转化为后处理过程。得到的方法能够超越先前后处理方法的缺点,并且我们在理论和实验中展示了这种后处理方法与内部处理方法的公平性-错误负担trade-off匹配或甚至超越。
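A rough sketch of the post-processing idea: freeze the base model's scores and train a small additive correction with the task loss plus a group-fairness penalty (here a demographic-parity gap). The data, the penalty choice, and the correction head are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 2000, 10
x = torch.randn(n, d)
group = (torch.rand(n) < 0.5).long()                     # sensitive attribute (training time only)
y = ((x[:, 0] + 0.8 * group.float() + 0.3 * torch.randn(n)) > 0).float()

base = nn.Linear(d, 1)                                   # stand-in for a frozen, pre-trained scorer
with torch.no_grad():
    base.weight[:] = 0; base.weight[0, 0] = 2.0; base.bias[:] = 0
base_logits = base(x).squeeze(1).detach()                # post-processing never updates the base model

head = nn.Linear(d, 1)                                   # small additive correction on top of the scores
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
lam = 2.0                                                # weight of the group-fairness regularizer
for step in range(300):
    logits = base_logits + head(x).squeeze(1)
    task = F.binary_cross_entropy_with_logits(logits, y)
    probs = torch.sigmoid(logits)
    gap = (probs[group == 0].mean() - probs[group == 1].mean()).abs()   # demographic-parity gap
    loss = task + lam * gap
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    p = torch.sigmoid(base_logits + head(x).squeeze(1))
    print("post-hoc DP gap:", (p[group == 0].mean() - p[group == 1].mean()).abs().item())
```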

On Optimal Consistency-Robustness Trade-Off for Learning-Augmented Multi-Option Ski Rental

  • paper_url: http://arxiv.org/abs/2312.02547
  • repo_url: None
  • paper_authors: Yongho Shin, Changyeol Lee, Hyung-Chan An
  • for: 这个论文主要针对的问题是学习增强的多选 ski 租赁问题,它将经典 ski 租赁问题扩展到了两个方面:首先,算法被提供了预测天气情况的数据,其次,租赁选项现在包括多个租赁期和价格选择。
  • methods: 这个论文使用了学习增强的方法,并且对于不同的租赁期和价格,提供了多种不同的策略。
  • results: 这个论文提出了一个最佳的算法,它可以与已知的下界匹配,并且对于随机化策略,提供了首次的下界,并且提出了一个改进的随机化策略,该策略在稳定性和多样性之间取得了最佳的平衡。
    Abstract The learning-augmented multi-option ski rental problem generalizes the classical ski rental problem in two ways: the algorithm is provided with a prediction on the number of days we can ski, and the ski rental options now come with a variety of rental periods and prices to choose from, unlike the classical two-option setting. Subsequent to the initial study of the multi-option ski rental problem (without learning augmentation) due to Zhang, Poon, and Xu, significant progress has been made for this problem recently in particular. The problem is very well understood when we relinquish one of the two generalizations -- for the learning-augmented classical ski rental problem, algorithms giving best-possible trade-off between consistency and robustness exist; for the multi-option ski rental problem without learning augmentation, deterministic/randomized algorithms giving the best-possible competitiveness have been found. However, in presence of both generalizations, there remained a huge gap between the algorithmic and impossibility results. In fact, for randomized algorithms, we did not have any nontrivial lower bounds on the consistency-robustness trade-off before. This paper bridges this gap for both deterministic and randomized algorithms. For deterministic algorithms, we present a best-possible algorithm that completely matches the known lower bound. For randomized algorithms, we show the first nontrivial lower bound on the consistency-robustness trade-off, and also present an improved randomized algorithm. Our algorithm matches our lower bound on robustness within a factor of e/2 when the consistency is at most 1.086.
    摘要 学习增强的多选项滑雪租赁问题从两个方面推广了经典滑雪租赁问题:算法会获得一个关于可滑雪天数的预测,且租赁选项不再是经典的两选项设定,而是包含多种租期与价格。自 Zhang、Poon 和 Xu 首次研究(无学习增强的)多选项滑雪租赁问题以来,该问题近来取得了显著进展。当放弃其中一个推广时,该问题已被很好地理解:对于学习增强的经典滑雪租赁问题,已存在能给出最优一致性-鲁棒性权衡的算法;对于无学习增强的多选项滑雪租赁问题,也已找到具有最优竞争比的确定性/随机算法。然而,当两个推广同时存在时,算法结果与不可能性结果之间仍存在巨大差距;事实上,对于随机算法,此前甚至不存在任何非平凡的一致性-鲁棒性权衡下界。本文针对确定性算法与随机算法同时弥合了这一差距:对于确定性算法,我们给出了与已知下界完全匹配的最优算法;对于随机算法,我们给出了首个非平凡的一致性-鲁棒性权衡下界,并提出了改进的随机算法。当一致性至多为 1.086 时,我们的算法在鲁棒性上与我们的下界之间的差距不超过 e/2 倍。

Characterization of Locality in Spin States and Forced Moves for Optimizations

  • paper_url: http://arxiv.org/abs/2312.02544
  • repo_url: None
  • paper_authors: Yoshiki Sato, Makiko Konoshima, Hirotaka Tamura, Jun Ohkubo
  • for: 解决 combinatorial optimization 问题中的本地极小点问题
  • methods: 利用特殊硬件和一种新的算法技术
  • results: 提出一种高效的、无拒绝的算法,可以快速离开本地极小点
    Abstract Ising formulations are widely utilized to solve combinatorial optimization problems, and a variety of quantum or semiconductor-based hardware has recently been made available. In combinatorial optimization problems, the existence of local minima in energy landscapes is problematic to use to seek the global minimum. We note that the aim of the optimization is not to obtain exact samplings from the Boltzmann distribution, and there is thus no need to satisfy detailed balance conditions. In light of this fact, we develop an algorithm to get out of the local minima efficiently while it does not yield the exact samplings. For this purpose, we utilize a feature that characterizes locality in the current state, which is easy to obtain with a type of specialized hardware. Furthermore, as the proposed algorithm is based on a rejection-free algorithm, the computational cost is low. In this work, after presenting the details of the proposed algorithm, we report the results of numerical experiments that demonstrate the effectiveness of the proposed feature and algorithm.
    摘要 伊辛(Ising)形式被广泛用于求解组合优化问题,近来也出现了多种基于量子或半导体的专用硬件。在组合优化问题中,能量地形中局部极小值的存在使得寻找全局最小值变得困难。我们注意到,优化的目标并不是从玻尔兹曼分布中获得精确采样,因此无需满足细致平衡条件。基于这一事实,我们开发了一种无需产生精确采样、却能高效逃离局部极小值的算法。为此,我们利用了一个刻画当前状态局部性的特征量,该特征量可借助某类专用硬件方便地获得。此外,由于所提算法基于无拒绝(rejection-free)算法,其计算开销很低。本文在介绍所提算法的细节之后,报告了数值实验结果,验证了所提特征量与算法的有效性。
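A small sketch of a rejection-free update of the kind the paper builds on: every step selects some spin flip with probability proportional to its acceptance weight, so the state keeps moving even inside a local minimum. The random couplings, temperature, and weight choice are illustrative and do not use the paper's locality feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
J = rng.normal(size=(n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)   # random Ising couplings
s = rng.choice([-1, 1], size=n)

def energy(spins):
    return -0.5 * spins @ J @ spins

def rejection_free_step(spins, T=0.5):
    """Pick a spin flip with probability proportional to its (bounded) acceptance weight,
    so some move is always taken -- even in a local minimum where every flip raises the energy."""
    dE = 2 * spins * (J @ spins)                       # energy change of flipping each spin
    w = np.minimum(1.0, np.exp(-dE / T))               # Metropolis-style weights
    i = rng.choice(len(spins), p=w / w.sum())          # normalised: no rejected proposals
    spins[i] *= -1
    return spins

for _ in range(2000):
    s = rejection_free_step(s)
print("final energy:", energy(s))
```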

Asymmetric leader-laggard cluster synchronization for collective decision-making with laser network

  • paper_url: http://arxiv.org/abs/2312.02537
  • repo_url: None
  • paper_authors: Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki, Makoto Naruse
  • for: 这个论文是为了研究光学加速器在信息处理中的应用,特别是通过使用激光网络来解决竞争多臂弓兵(CMAB)问题。
  • methods: 该论文使用了光学连接的激光器来实现集体决策,利用激光网络中的异步和同步动力来解决CMAB问题。
  • results: 研究人员通过稳定性分析对集体决策的必要网络结构进行了评估,并发现了玩家偏好的偏好性,从而扩展了CMAB问题的应用范围。
    Abstract Photonic accelerators have recently attracted soaring interest, harnessing the ultimate nature of light for information processing. Collective decision-making with a laser network, employing the chaotic and synchronous dynamics of optically interconnected lasers to address the competitive multi-armed bandit (CMAB) problem, is a highly compelling approach due to its scalability and experimental feasibility. We investigated essential network structures for collective decision-making through quantitative stability analysis. Moreover, we demonstrated the asymmetric preferences of players in the CMAB problem, extending its functionality to more practical applications. Our study highlights the capability and significance of machine learning built upon chaotic lasers and photonic devices.
    摘要 光子加速器近来受到了极大关注,它利用光的本质特性进行信息处理。利用光学互连激光器的混沌与同步动力学、以激光网络进行集体决策来求解竞争性多臂赌博机(CMAB)问题,因其可扩展性与实验可行性而是一种极具吸引力的方法。我们通过定量稳定性分析研究了集体决策所需的关键网络结构;此外,我们还展示了 CMAB 问题中玩家的非对称偏好,将其功能扩展到更实际的应用场景。本研究凸显了基于混沌激光与光子器件的机器学习的能力与意义。

Pseudo Replay-based Class Continual Learning for Online New Category Anomaly Detection in Additive Manufacturing

  • paper_url: http://arxiv.org/abs/2312.02491
  • repo_url: None
  • paper_authors: Zhangyue Shi, Tianxin Xie, Chenang Liu, Yuxuan Li
  • for: 这个研究旨在提高现代生产过程中的质量监控,使用先进的感应器和机器学习技术进行数据驱动的实时监控。
  • methods: 本研究使用了内存基础的不断学习,并通过增加级别学习和样本增加的方法来解决资料储存容量的限制。
  • results: 实验结果显示,提案的方法能够实现高质量的数据生成,并在新的类别偏差出现时进行incremental learning,不需要储存所有数据。此外,这些方法还能够提高监控性能,并增加模型架构的 flexibility。
    Abstract The incorporation of advanced sensors and machine learning techniques has enabled modern manufacturing enterprises to perform data-driven in-situ quality monitoring based on the sensor data collected in manufacturing processes. However, one critical challenge is that newly presented defect category may manifest as the manufacturing process continues, resulting in monitoring performance deterioration of previously trained machine learning models. Hence, there is an increasing need for empowering machine learning model to learn continually. Among all continual learning methods, memory-based continual learning has the best performance but faces the constraints of data storage capacity. To address this issue, this paper develops a novel pseudo replay-based continual learning by integrating class incremental learning and oversampling-based data generation. Without storing all the data, the developed framework could generate high-quality data representing previous classes to train machine learning model incrementally when new category anomaly occurs. In addition, it could even enhance the monitoring performance since it also effectively improves the data quality. The effectiveness of the proposed framework is validated in an additive manufacturing process, which leverages supervised classification problem for anomaly detection. The experimental results show that the developed method is very promising in detecting novel anomaly while maintaining a good performance on the previous task and brings up more flexibility in model architecture.
    摘要 现代制造企业通过具有先进感测器和机器学习技术的数据驱动 situational quality monitoring 实现了基于感测器数据收集的制造过程中的质量监测。然而,一个重要挑战是新的缺陷类型可能在制造过程继续时出现,导致先前训练的机器学习模型的监测性能下降。因此,有一个增加需要 empowering 机器学习模型进行不断学习。在所有的不断学习方法中,记忆基本的不断学习具有最好的表现,但面临数据存储容量的限制。为解决这个问题,本文开发了一种新的 Pseudo replay-based 不断学习方法,通过将类增量学习和扩sampling-based 数据生成相结合。不需要存储所有数据,开发的框架可以在新类异常出现时逐步培训机器学习模型,并且可以提高监测性能。此外,它还可以增强监测性能,因为它还可以提高数据质量。本文在使用超过 classification 问题进行杂合制造过程中的异常检测,实验结果表明,提出的方法是非常有前途的,能够检测新的异常,保持好的前任任务性能,并增加模型架构的灵活性。
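A minimal sketch of pseudo-replay class-incremental learning: instead of storing old raw data, per-class statistics are kept and used to generate surrogate samples of previous classes when a new anomaly category appears. The Gaussian generator and the toy 2-D data stand in for the paper's oversampling-based generation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_class(center, n=200):
    return rng.normal(loc=center, scale=0.5, size=(n, 2))

def pseudo_replay(memory_stats, n_per_class=200):
    """Generate surrogate samples of previously seen classes from stored class statistics
    (a cheap stand-in for the paper's oversampling-based generator), so old raw data
    need not be kept."""
    X, y = [], []
    for label, (mean, cov) in memory_stats.items():
        X.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

# Task 1: normal (0) vs. a first anomaly type (1); store only per-class mean/covariance.
X1 = np.vstack([make_class([0, 0]), make_class([3, 0])])
y1 = np.array([0] * 200 + [1] * 200)
stats = {c: (X1[y1 == c].mean(0), np.cov(X1[y1 == c].T)) for c in (0, 1)}
clf = LogisticRegression().fit(X1, y1)

# Task 2: a new anomaly category (2) appears; retrain on new data + pseudo-replayed old classes.
X_new, y_new = make_class([0, 3]), np.full(200, 2)
X_old, y_old = pseudo_replay(stats)
clf = LogisticRegression(max_iter=500).fit(np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]))
print("accuracy on original task 1 data after learning class 2:", round(clf.score(X1, y1), 3))
```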

Constrained Twin Variational Auto-Encoder for Intrusion Detection in IoT Systems

  • paper_url: http://arxiv.org/abs/2312.02490
  • repo_url: None
  • paper_authors: Phai Vu Dinh, Quang Uy Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Son Pham Bao, Eryk Dutkiewicz
  • for: 保护互联网物联网设备免受恶意攻击
  • methods: 使用受限的双质量变换自动编码器(CTVAE)帮助攻击检测系统获得更可分离和低维度的数据表示
  • results: 比对11个最受欢迎的互联网物联网恶意蜂灾数据集,CTVAE可以提高约1%的准确率和分数比,而运行时间为攻击检测下降至2E-6秒,模型大小低于1MB。
    Abstract Intrusion detection systems (IDSs) play a critical role in protecting billions of IoT devices from malicious attacks. However, the IDSs for IoT devices face inherent challenges of IoT systems, including the heterogeneity of IoT data/devices, the high dimensionality of training data, and the imbalanced data. Moreover, the deployment of IDSs on IoT systems is challenging, and sometimes impossible, due to the limited resources such as memory/storage and computing capability of typical IoT devices. To tackle these challenges, this article proposes a novel deep neural network/architecture called Constrained Twin Variational Auto-Encoder (CTVAE) that can feed classifiers of IDSs with more separable/distinguishable and lower-dimensional representation data. Additionally, in comparison to the state-of-the-art neural networks used in IDSs, CTVAE requires less memory/storage and computing power, hence making it more suitable for IoT IDS systems. Extensive experiments with the 11 most popular IoT botnet datasets show that CTVAE can boost around 1% in terms of accuracy and Fscore in detection attack compared to the state-of-the-art machine learning and representation learning methods, whilst the running time for attack detection is lower than 2E-6 seconds and the model size is lower than 1 MB. We also further investigate various characteristics of CTVAE in the latent space and in the reconstruction representation to demonstrate its efficacy compared with current well-known methods.
    摘要 入侵检测系统(IDS)在保护数十亿物联网设备免受恶意攻击方面起着关键作用。然而,面向物联网设备的 IDS 面临物联网系统固有的挑战,包括物联网数据与设备的异构性、训练数据的高维度以及数据不均衡。此外,由于典型物联网设备的内存/存储与计算能力有限,在物联网系统上部署 IDS 十分困难,有时甚至不可能。为了应对这些挑战,本文提出了一种新的深度神经网络架构——受限孪生变分自编码器(CTVAE),它能够为 IDS 的分类器提供更可分、更低维的数据表示。此外,与 IDS 中使用的最先进神经网络相比,CTVAE 需要更少的内存/存储与计算能力,因而更适合物联网 IDS 系统。在 11 个最常用的物联网僵尸网络数据集上的大量实验表明,与最先进的机器学习和表示学习方法相比,CTVAE 在攻击检测的准确率和 F 值上提升约 1%,攻击检测的运行时间低于 2E-6 秒,模型大小低于 1 MB。我们还进一步分析了 CTVAE 在隐空间和重建表示中的多种特性,以说明其相对于当前知名方法的有效性。

RL-Based Cargo-UAV Trajectory Planning and Cell Association for Minimum Handoffs, Disconnectivity, and Energy Consumption

  • paper_url: http://arxiv.org/abs/2312.02478
  • repo_url: None
  • paper_authors: Nesrine Cherif, Wael Jaafar, Halim Yanikomeroglu, Abbas Yongacoglu
  • For: 这个论文的目的是提高无人机货物交付的可靠性和能效性。* Methods: 这篇论文使用了强化学习(RL)技术来联合货物无人机的路径规划和Cell Association。* Results: 实验结果表明,与比较方法相比,这种方法可以降低手动交换事件,降低离线事件,并提高能源消耗。
    Abstract Unmanned aerial vehicle (UAV) is a promising technology for last-mile cargo delivery. However, the limited on-board battery capacity, cellular unreliability, and frequent handoffs in the airspace are the main obstacles to unleash its full potential. Given that existing cellular networks were primarily designed to service ground users, re-utilizing the same architecture for highly mobile aerial users, e.g., cargo-UAVs, is deemed challenging. Indeed, to ensure a safe delivery using cargo-UAVs, it is crucial to utilize the available energy efficiently, while guaranteeing reliable connectivity for command-and-control and avoiding frequent handoff. To achieve this goal, we propose a novel approach for joint cargo-UAV trajectory planning and cell association. Specifically, we formulate the cargo-UAV mission as a multi-objective problem aiming to 1) minimize energy consumption, 2) reduce handoff events, and 3) guarantee cellular reliability along the trajectory. We leverage reinforcement learning (RL) to jointly optimize the cargo-UAV's trajectory and cell association. Simulation results demonstrate a performance improvement of our proposed method, in terms of handoffs, disconnectivity, and energy consumption, compared to benchmarks.
    摘要 无人飞行器(UAV)是一种有前途的科技,用于最后一英里的货物交付。然而,有限的机体内置电池容量、无线电不可靠、空中交换频繁等因素,使得UAV的潜力受到限制。由于现有的无线网络主要为地面用户设计,对高度移动的空中用户,如货物UAV,进行再利用很困难。为确保货物UAV安全交付,必须有效利用可用能量,同时保证命令控制的可靠连接,避免频繁交换。为达到这个目标,我们提出了一种新的方法,即货物UAV轨迹规划和Cells关联优化。具体来说,我们将货物UAV的任务视为一个多目标问题,即1)最小化能量消耗,2)减少交换事件,3)保证无线连接可靠性。我们利用了强化学习(RL)来联合优化货物UAV的轨迹和Cells关联。实验结果显示,我们的提议方法可以比准 benchmark 更好地改善交换、离线和能量消耗等指标。

NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams

  • paper_url: http://arxiv.org/abs/2312.02473
  • repo_url: None
  • paper_authors: Chaoyi Chen, Dechao Gao, Yanfeng Zhang, Qiange Wang, Zhenbo Fu, Xuecang Zhang, Junhua Zhu, Yu Gu, Ge Yu
  • for: 本文旨在提供一个用于训练动态图 neural network(GNN)模型的框架,以便开发者更方便地创建性能强的 GNN 实现。
  • methods: 本文使用了一种称为 NeutronStream 的框架,它将输入动态图转换为一个按时间顺序更新的事件流,并使用优化的滑动窗口来逐步捕捉事件的空间-时间相关性。 NeutronStream 还提供了一个并行执行引擎,以解决事件处理的并发挑战,并实现高性能。
  • results: 对比州际端的动态 GNN 实现,NeutronStream 在速度方面实现了提升 ranges from 1.48X to 5.87X,并在平均准确率方面实现了3.97%的提升。
    Abstract Existing Graph Neural Network (GNN) training frameworks have been designed to help developers easily create performant GNN implementations. However, most existing GNN frameworks assume that the input graphs are static, but ignore that most real-world graphs are constantly evolving. Though many dynamic GNN models have emerged to learn from evolving graphs, the training process of these dynamic GNNs is dramatically different from traditional GNNs in that it captures both the spatial and temporal dependencies of graph updates. This poses new challenges for designing dynamic GNN training frameworks. First, the traditional batched training method fails to capture real-time structural evolution information. Second, the time-dependent nature makes parallel training hard to design. Third, it lacks system supports for users to efficiently implement dynamic GNNs. In this paper, we present NeutronStream, a framework for training dynamic GNN models. NeutronStream abstracts the input dynamic graph into a chronologically updated stream of events and processes the stream with an optimized sliding window to incrementally capture the spatial-temporal dependencies of events. Furthermore, NeutronStream provides a parallel execution engine to tackle the sequential event processing challenge to achieve high performance. NeutronStream also integrates a built-in graph storage structure that supports dynamic updates and provides a set of easy-to-use APIs that allow users to express their dynamic GNNs. Our experimental results demonstrate that, compared to state-of-the-art dynamic GNN implementations, NeutronStream achieves speedups ranging from 1.48X to 5.87X and an average accuracy improvement of 3.97%.
    摘要 现有的图 нейрон网络(GNN)训练框架已经被设计便于开发者快速创建高性能的 GNN 实现。然而,大多数现有的 GNN 框架假设输入图为静止的,忽略了实际世界中大多数图是不断更新的。虽然许多动态 GNN 模型已经出现以学习发展中的图,但是这些动态 GNN 的训练过程与传统 GNN 的训练过程有很大差异。这些差异带来了设计动态 GNN 训练框架的新挑战。首先,传统的批处理训练方法无法捕捉实时结构发展信息。其次,时间依赖性使得并行训练变得困难。最后,缺乏对用户进行高效实现动态 GNN 的系统支持。在本文中,我们提出了 NeutronStream,一个用于训练动态 GNN 模型的框架。NeutronStream 将输入动态图转化为一个时间顺序更新的事件流,并使用优化的滑动窗口来逐步捕捉事件流中的空间-时间相关性。此外,NeutronStream 提供了并行执行引擎来解决事件处理挑战,以实现高性能。NeutronStream 还集成了一个支持动态更新的图存储结构,并提供了一组易于使用的 API,allowing users to easily express their dynamic GNNs。我们的实验结果表明,相比于当前的动态 GNN 实现,NeutronStream 在性能和准确率方面具有1.48X-5.87X的加速和3.97%的均值提升。

Congestion-aware Distributed Task Offloading in Wireless Multi-hop Networks Using Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2312.02471
  • repo_url: None
  • paper_authors: Zhongyuan Zhao, Jake Perazzone, Gunjan Verma, Santiago Segarra
  • for: 这个研究旨在提高边缘智能设备中的处理能力,特别是在无线多跳网络中具有多个移动设备的情况下。
  • methods: 本研究使用了分布式排阵法和图像学习来实现低负载、干扰确认的分布式任务卸载方案。
  • results: 在实验中,我们的方法能够降低无线多跳网络中任务卸载所导致的网络填充和不稳 queue,同时提高了本地处理的执行时间。
    Abstract Computational offloading has become an enabling component for edge intelligence in mobile and smart devices. Existing offloading schemes mainly focus on mobile devices and servers, while ignoring the potential network congestion caused by tasks from multiple mobile devices, especially in wireless multi-hop networks. To fill this gap, we propose a low-overhead, congestion-aware distributed task offloading scheme by augmenting a distributed greedy framework with graph-based machine learning. In simulated wireless multi-hop networks with 20-110 nodes and a resource allocation scheme based on shortest path routing and contention-based link scheduling, our approach is demonstrated to be effective in reducing congestion or unstable queues under the context-agnostic baseline, while improving the execution latency over local computing.
    摘要 computational offloading已成为移动设备和智能设备的核心组件。现有的卸载方案主要关注于移动设备和服务器,而忽略了多个移动设备任务之间的网络压力。为填补这一空白,我们提议一种低开销、压力感知分布式任务卸载方案,通过对分布式满积框架进行图像学习增强。在模拟无线多跳网络中,我们的方法可以降低压力或不稳定队列,比基线下降低执行延迟,而且在不同上下文中具有改善性。

Dimensionality Reduction and Dynamical Mode Recognition of Circular Arrays of Flame Oscillators Using Deep Neural Network

  • paper_url: http://arxiv.org/abs/2312.02462
  • repo_url: None
  • paper_authors: Weiming Xu, Tao Yang, Peng Zhang
  • for: 本研究旨在减少高维空间时间数据,并实现不同振荡模式的分类。
  • methods: 该研究使用了一种基于Bi-LSTM-VAE和WDC的方法,包括使用Bi-LSTM-VAE进行维度减少,并使用WDC进行模式分类。
  • results: 研究结果表明,该方法可以生成不 overlap的分布,并且在分类中表现出优于VAE和PCA。
    Abstract Oscillatory combustion in aero engines and modern gas turbines often has significant adverse effects on their operation, and accurately recognizing various oscillation modes is the prerequisite for understanding and controlling combustion instability. However, the high-dimensional spatial-temporal data of a complex combustion system typically poses considerable challenges to the dynamical mode recognition. Based on a two-layer bidirectional long short-term memory variational autoencoder (Bi-LSTM-VAE) dimensionality reduction model and a two-dimensional Wasserstein distance-based classifier (WDC), this study proposes a promising method (Bi-LSTM-VAE-WDC) for recognizing dynamical modes in oscillatory combustion systems. Specifically, the Bi-LSTM-VAE dimension reduction model was introduced to reduce the high-dimensional spatial-temporal data of the combustion system to a low-dimensional phase space; Gaussian kernel density estimates (GKDE) were computed based on the distribution of phase points in a grid; two-dimensional WD values were calculated from the GKDE maps to recognize the oscillation modes. The time-series data used in this study were obtained from numerical simulations of circular arrays of laminar flame oscillators. The results show that the novel Bi-LSTM-VAE method can produce a non-overlapping distribution of phase points, indicating an effective unsupervised mode recognition and classification. Furthermore, the present method exhibits a more prominent performance than VAE and PCA (principal component analysis) for distinguishing dynamical modes in complex flame systems, implying its potential in studying turbulent combustion.
    摘要 oscillatory combustion in 发动机和现代液体发动机经常会有显著的不良影响,并且正确地识别不同的振荡模式是理解和控制燃燃不稳定的必要前提。然而,复杂的燃燃系统的高维度空间时间数据通常会对动态模式识别提出很大挑战。本研究基于二层双向长短期记忆自适应网络(Bi-LSTM-VAE)维度减少模型和二维 Wasserstein距离基于分类器(WDC),提出了一种有 promise的方法(Bi-LSTM-VAE-WDC)用于识别动态模式。具体来说,Bi-LSTM-VAE 维度减少模型将高维度空间时间数据转化为低维度的相位空间,然后基于相位点的分布在网格中计算Gaussian核密度估计(GKDE),从GKDE 图表中计算二维 Wasserstein距离,用于识别振荡模式。这些时间序列数据由数字 simulate circular array of laminar flame oscillators 得到。结果表明,新的Bi-LSTM-VAE方法可以生成不重叠的相位点分布,表明有效的无监督模式识别和分类。此外, presente 方法在复杂的燃燃系统中比VAE和PCA(主成分分析)表现更出色,implying its potential in studying turbulent combustion。

GIT-Net: Generalized Integral Transform for Operator Learning

  • paper_url: http://arxiv.org/abs/2312.02450
  • repo_url: https://github.com/chaow-mat/general_integral_transform_neural_network
  • paper_authors: Chao Wang, Alexandre Hoang Thiery
  • for: 用于解决部分偏微分方程(PDE)Operator的深度神经网络架构。
  • methods: 使用深度神经网络参数化自适应的广义积分变换,以近似在特定函数基(如傅里叶基)中往往可被简洁表示的偏微分方程算子。
  • results: 比较其他最新的方案更有利的计算和内存需求,适用于复杂 geometries 上的 PDE 问题,并在许多 PDE 问题上表现出小测试错误和低评价。
    Abstract This article introduces GIT-Net, a deep neural network architecture for approximating Partial Differential Equation (PDE) operators, inspired by integral transform operators. GIT-NET harnesses the fact that differential operators commonly used for defining PDEs can often be represented parsimoniously when expressed in specialized functional bases (e.g., Fourier basis). Unlike rigid integral transforms, GIT-Net parametrizes adaptive generalized integral transforms with deep neural networks. When compared to several recently proposed alternatives, GIT-Net's computational and memory requirements scale gracefully with mesh discretizations, facilitating its application to PDE problems on complex geometries. Numerical experiments demonstrate that GIT-Net is a competitive neural network operator, exhibiting small test errors and low evaluations across a range of PDE problems. This stands in contrast to existing neural network operators, which typically excel in just one of these areas.
    摘要 这篇文章介绍了 GIT-Net,一种深度神经网络架构,用于近似 diferencial equation(PDE)算子。GIT-Net 灵感来自积分 transform 算子,利用了 differential 算子通常用于定义 PDE 的特殊函数基(例如 fourier 基)来表示。与固定积分 transform 不同,GIT-Net 使用深度神经网络来 Parametrize 自适应总积分 transform。与其他最近提出的 altenativas 相比,GIT-Net 的计算和存储需求随着网格精度的增加而减少,使其适用于复杂 geometry 上的 PDE 问题。数字实验表明,GIT-Net 是一个竞争力强的神经网络算子,在多种 PDE 问题中表现出小误差和低评价。这与现有的神经网络算子不同,通常只在一个这些领域中具有优势。

Adaptive Instrument Design for Indirect Experiments

  • paper_url: http://arxiv.org/abs/2312.02438
  • repo_url: https://github.com/yashchandak/IndirectExpDesign
  • paper_authors: Yash Chandak, Shiv Shankar, Vasilis Syrgkanis, Emma Brunskill
  • for: 估计干预效果,尤其是在实施Randomized Control Trials (RCTs) 是不现实或不道德的情况下。
  • methods: 利用(条件)工具变量,通过奖励和推荐而不是严格的治疗分配来估计干预效果。
  • results: 通过自适应实验设计来提高 indirect experiment 的样本效率,并通过Influence Functions来搜索最佳数据收集策略,最小化欲要的(非线性)估计器的均方差误差。
    Abstract Indirect experiments provide a valuable framework for estimating treatment effects in situations where conducting randomized control trials (RCTs) is impractical or unethical. Unlike RCTs, indirect experiments estimate treatment effects by leveraging (conditional) instrumental variables, enabling estimation through encouragement and recommendation rather than strict treatment assignment. However, the sample efficiency of such estimators depends not only on the inherent variability in outcomes but also on the varying compliance levels of users with the instrumental variables and the choice of estimator being used, especially when dealing with numerous instrumental variables. While adaptive experiment design has a rich literature for direct experiments, in this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy over instrumental variables. Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy, minimizing the mean-squared error of the desired (non-linear) estimator. Through experiments conducted in various domains inspired by real-world applications, we showcase how our method can significantly improve the sample efficiency of indirect experiments.
    摘要 间接实验为在开展随机对照试验(RCT)不切实际或不符合伦理的情形下估计干预效果提供了一个有价值的框架。与 RCT 不同,间接实验通过利用(条件)工具变量来估计干预效果,从而可以通过鼓励与推荐而非严格的干预分配来完成估计。然而,此类估计量的样本效率不仅取决于结果本身的变异性,还取决于用户对工具变量的依从程度以及所用估计量的选择,尤其是在工具变量数量众多时。虽然自适应实验设计在直接实验中已有丰富文献,本文迈出了通过对工具变量自适应地设计数据收集策略来提升间接实验样本效率的第一步。我们的主要贡献是一个实用的计算流程,它利用影响函数搜索最优的数据收集策略,以最小化目标(非线性)估计量的均方误差。通过在多个受真实应用启发的领域中开展实验,我们展示了该方法能显著提升间接实验的样本效率。

PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models

  • paper_url: http://arxiv.org/abs/2312.02429
  • repo_url: https://github.com/amzn/pecos
  • paper_authors: Wei-Cheng Chang, Jyun-Yu Jiang, Jiong Zhang, Mutasem Al-Darabsah, Choon Hui Teo, Cho-Jui Hsieh, Hsiang-Fu Yu, S. V. N. Vishwanathan
  • for: 这个研究的目的是提出一种 ParamEter-Free Adapters (PEFA) 框架,用于快速调参大规模文本检索问题中的嵌入式模型 (ERM)。
  • methods: PEFA 框架使用非参数式 k-最近邻 (kNN) 组件来 equip ERM,并在推理阶段使用 convex combination 的方式将 ERM 和 kNN 两个得分函数相结合。
  • results: 在两个检索应用中,PEFA 实际上达到了显著的提升,包括对 Trivia-QA 和 NQ-320K 进行了预训练和微调 ERM 的改进。对于文档检索,PEFA 在 Recall@100 指标上提高了预训练 ERM 的平均提升率为 13.2%,而微调 ERM 的平均提升率为 5.5%。对于产品搜索,PEFA 在微调 ERM 上提高了 Recall@100 的平均提升率为 5.3%和 14.5%。
    Abstract Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose the PEFA framework, namely ParamEter-Free Adapters, for fast tuning of ERMs without any backward pass in the optimization. At index building stage, PEFA equips the ERM with a non-parametric k-nearest neighbor (kNN) component. At inference stage, PEFA performs a convex combination of two scoring functions, one from the ERM and the other from the kNN. Based on the neighborhood definition, PEFA framework induces two realizations, namely PEFA-XL (i.e., extra large) using double ANN indices and PEFA-XS (i.e., extra small) using a single ANN index. Empirically, PEFA achieves significant improvement on two retrieval applications. For document retrieval, regarding Recall@100 metric, PEFA improves not only pre-trained ERMs on Trivia-QA by an average of 13.2%, but also fine-tuned ERMs on NQ-320K by an average of 5.5%, respectively. For product search, PEFA improves the Recall@100 of the fine-tuned ERMs by an average of 5.3% and 14.5%, for PEFA-XS and PEFA-XL, respectively. Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/pefa-wsdm24.
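The inference-time scoring rule described above, a convex combination of the ERM similarity and a non-parametric kNN score, can be sketched in a few lines. The sketch below is an illustration under assumptions (a dense query-document label matrix, min-max normalization of both components, and a hypothetical `pefa_score` helper), not the released implementation in the linked repo, where PEFA-XS and PEFA-XL differ in how the approximate-nearest-neighbor index is built.

```python
# Sketch of a PEFA-style convex combination of ERM and kNN scores.
import numpy as np

def pefa_score(q_emb, doc_embs, train_q_embs, train_labels, lam=0.5, k=8):
    """Score every document for one query embedding.

    q_emb:        (d,)   query embedding from the ERM
    doc_embs:     (D, d) document embeddings from the ERM
    train_q_embs: (N, d) embeddings of training queries (the kNN "index")
    train_labels: (N, D) 0/1 relevance of each training query to each document
    """
    # ERM component: plain inner-product relevance.
    erm = doc_embs @ q_emb                                      # (D,)

    # kNN component: aggregate the relevance labels of the k most similar
    # training queries, weighted by query-query similarity.
    sims = train_q_embs @ q_emb                                 # (N,)
    nbrs = np.argsort(-sims)[:k]
    knn = (sims[nbrs, None] * train_labels[nbrs]).sum(axis=0)   # (D,)

    # Min-max normalize both components so the mixing weight is meaningful.
    def norm(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return lam * norm(erm) + (1.0 - lam) * norm(knn)

# Toy usage with random embeddings and labels.
rng = np.random.default_rng(0)
d, D, N = 16, 100, 500
scores = pefa_score(rng.normal(size=d), rng.normal(size=(D, d)),
                    rng.normal(size=(N, d)), rng.integers(0, 2, size=(N, D)))
print("top-5 documents:", np.argsort(-scores)[:5])
```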

AI-driven emergence of frequency information non-uniform distribution via THz metasurface spectrum prediction

  • paper_url: http://arxiv.org/abs/2312.03017
  • repo_url: https://github.com/Ufere/Assingment_1
  • paper_authors: Xiaohua Xing, Yuqi Ren, Die Zou, Qiankun Zhang, Bingxuan Mao, Jianquan Yao, Deyi Xiong, Shuang Zhang, Liang Wu
  • for: Predicting the terahertz spectral modulation effects of metasurfaces.
  • methods: An AI-based spectrum-prediction approach in which supplementary multi-frequency inputs are added to the existing dataset to improve predictive accuracy.
  • results: Achieves high-accuracy prediction of terahertz spectral modulation effects and opens up applications of AI in chemistry, composite-material design, biomedicine, and other fields.
    Abstract Recently, artificial intelligence has been extensively deployed across various scientific disciplines, optimizing and guiding the progression of experiments through the integration of abundant datasets, whilst continuously probing the vast theoretical space encapsulated within the data. In particular, deep learning models, due to their end-to-end adaptive learning capabilities, can autonomously learn intrinsic data features, thereby transcending the limitations of traditional experience to a certain extent. Here, we unveil previously unreported information characteristics pertaining to different frequencies that emerged during our work on predicting the terahertz spectral modulation effects of metasurfaces using AI-based prediction. Moreover, we have substantiated that simply adding supplementary multi-frequency inputs to the existing dataset during the target spectral prediction process can significantly enhance the predictive accuracy of the network. This approach effectively optimizes the utilization of existing datasets and paves the way for interdisciplinary research and applications in artificial intelligence, chemistry, composite material design, biomedicine, and other fields.
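As a rough illustration of the multi-frequency-input idea, the toy comparison below trains a spectrum-prediction network with and without responses at a few auxiliary frequency bins appended to the input. Everything here is an assumption for illustration: the synthetic spectra, the choice of auxiliary bins (which, for simplicity, are read off the same spectra used as targets), and the network size; it is not the authors' metasurface dataset or model.

```python
# Sketch: does appending a few auxiliary-frequency responses to the input
# improve full-spectrum prediction? Synthetic data only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_geom, n_freq = 2000, 4, 64
aux_idx = [8, 24, 40, 56]                        # auxiliary frequency bins (assumed)

geom = rng.uniform(-1, 1, (n_samples, n_geom))   # metasurface design parameters (toy)
freqs = np.linspace(0.1, 3.0, n_freq)            # THz band (toy)
spectra = np.tanh(geom @ rng.normal(size=(n_geom, n_freq))
                  + np.sin(2 * np.pi * freqs)[None, :])   # smooth synthetic spectra

X_base = geom                                    # geometry only
X_aug = np.hstack([geom, spectra[:, aux_idx]])   # geometry + multi-frequency inputs

for name, X in [("geometry only", X_base), ("with multi-frequency inputs", X_aug)]:
    Xtr, Xte, ytr, yte = train_test_split(X, spectra, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
    model.fit(Xtr, ytr)
    mse = np.mean((model.predict(Xte) - yte) ** 2)
    print(f"{name}: test MSE = {mse:.4f}")
```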

Robust Clustering using Hyperdimensional Computing

  • paper_url: http://arxiv.org/abs/2312.02407
  • repo_url: None
  • paper_authors: Lulu Ge, Keshab K. Parhi
  • for: This paper aims to improve clustering performance in the hyperdimensional computing (HDC) domain by proposing four HDC-based clustering algorithms.
  • methods: The proposed algorithms use similarity-based k-means, equal bin-width histogram, equal bin-height histogram, and similarity-based affinity propagation to assign the initial cluster hypervectors and improve on HDCluster.
  • results: The proposed algorithms achieve better accuracy, more robust performance, fewer iterations, and less execution time than the existing HDCluster. In particular, similarity-based affinity propagation outperforms the other three algorithms on eight datasets by 2-38% in clustering accuracy. The proposed algorithms also provide more robust clustering accuracy than HDCluster even for one-pass clustering, and traditional clustering remains preferable to HDC when the number of clusters is large.
    Abstract This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is not robust. The performance of HDCluster is degraded as the hypervectors for the clusters are chosen at random during the initialization step. To overcome this bottleneck, we assign the initial cluster hypervectors by exploring the similarity of the encoded data, referred to as query hypervectors. Intra-cluster hypervectors have a higher similarity than inter-cluster hypervectors. Harnessing the similarity results among query hypervectors, this paper proposes four HDC-based clustering algorithms: similarity-based k-means, equal bin-width histogram, equal bin-height histogram, and similarity-based affinity propagation. Experimental results illustrate that: (i) Compared to the existing HDCluster, our proposed HDC-based clustering algorithms can achieve better accuracy, more robust performance, fewer iterations, and less execution time. Similarity-based affinity propagation outperforms the other three HDC-based clustering algorithms on eight datasets by 2-38% in clustering accuracy. (ii) Even for one-pass clustering, i.e., without any iterative update of the cluster hypervectors, our proposed algorithms can provide more robust clustering accuracy than HDCluster. (iii) Over eight datasets, five out of eight can achieve higher or comparable accuracy when projected onto the hyperdimensional space. Traditional clustering is more desirable than HDC when the number of clusters, k, is large.
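A minimal sketch of the clustering flow described above: samples are encoded into bipolar hypervectors by a random projection, the initial cluster hypervectors are seeded from the encoded (query) hypervectors using their mutual similarity rather than at random, and clusters are refined with a k-means-style loop that bundles members by taking the sign of their sum. The encoder and the particular seeding rule (most central vector first, then the vector least similar to the seeds chosen so far) are illustrative assumptions, not the paper's exact algorithms.

```python
# Sketch of similarity-seeded k-means clustering in hyperdimensional space.
import numpy as np

rng = np.random.default_rng(0)

def encode(X, dim=4096):
    """Random-projection encoder: each sample becomes a +/-1 hypervector."""
    proj = rng.normal(size=(X.shape[1], dim))
    return np.sign(X @ proj)

def hdc_kmeans(H, k, n_iter=20):
    sim = H @ H.T / H.shape[1]                 # pairwise similarity of query hypervectors
    # Similarity-based seeding instead of random initialization.
    seeds = [int(np.argmax(sim.sum(axis=1)))]  # most "central" hypervector
    while len(seeds) < k:
        seeds.append(int(np.argmin(sim[:, seeds].max(axis=1))))  # least similar to seeds
    centers = H[seeds].copy()

    for _ in range(n_iter):
        labels = np.argmax(H @ centers.T, axis=1)        # assign by similarity
        for c in range(k):                               # bundle: sign of the sum
            if np.any(labels == c):
                centers[c] = np.sign(H[labels == c].sum(axis=0))
        centers[centers == 0] = 1                        # break ties in the bundling
    return labels

# Toy usage: three Gaussian blobs in 10-D.
X = np.vstack([rng.normal(loc=m, size=(100, 10)) for m in (-3, 0, 3)])
labels = hdc_kmeans(encode(X), k=3)
print("cluster sizes:", np.bincount(labels))
```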

Harmonizing Global Voices: Culturally-Aware Models for Enhanced Content Moderation

  • paper_url: http://arxiv.org/abs/2312.02401
  • repo_url: None
  • paper_authors: Alex J. Chan, José Luis Redondo García, Fabrizio Silvestri, Colm O’Donnel, Konstantina Palla
  • for: This study investigates how content moderation systems can account for regional cultural differences so that offensive content is recognized and handled more appropriately.
  • methods: Large language models are trained on extensive datasets of media news and articles to build regionally attuned models that capture differences in communication styles across geographies, and to generate explanations for content violations so that policy guidelines can be interpreted within different cultural and societal contexts.
  • results: Training on extensive media datasets successfully induces cultural awareness, improves region-specific handling of content violations, and yields explanations that align with local cultural and societal norms.
    Abstract Content moderation at scale faces the challenge of considering local cultural distinctions when assessing content. While global policies aim to maintain decision-making consistency and prevent arbitrary rule enforcement, they often overlook regional variations in interpreting natural language as expressed in content. In this study, we are looking into how moderation systems can tackle this issue by adapting to local comprehension nuances. We train large language models on extensive datasets of media news and articles to create culturally attuned models. The latter aim to capture the nuances of communication across geographies with the goal of recognizing cultural and societal variations in what is considered offensive content. We further explore the capability of these models to generate explanations for instances of content violation, aiming to shed light on how policy guidelines are perceived when cultural and societal contexts change. We find that training on extensive media datasets successfully induced cultural awareness and resulted in improvements in handling content violations on a regional basis. Additionally, these advancements include the ability to provide explanations that align with the specific local norms and nuances as evidenced by the annotators' preference in our conducted study. This multifaceted success reinforces the critical role of an adaptable content moderation approach in keeping pace with the ever-evolving nature of the content it oversees.

Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation

  • paper_url: http://arxiv.org/abs/2312.02400
  • repo_url: None
  • paper_authors: Sai Venkatesh Chilukoti, Md Imran Hossen, Liqun Shan, Vijay Srinivas Tida, Xiai Hei
  • for: DP-SGD is widely used to protect personally identifiable information in deep learning applications; this work aims to improve its privacy-utility trade-off.
  • methods: Proposes Auto DP-SGD, which estimates the clipping threshold automatically from the model's gradient norms, scales per-sample gradients without discarding gradient information, and decays the noise multiplier after every epoch using closed-form expressions derived with a tCDP accountant.
  • results: Auto DP-SGD improves both privacy and accuracy over existing DP-SGD methods across benchmark datasets; lowering the scale factor and using learning-rate schedulers further reduces the privacy budget without significantly reducing accuracy.
    Abstract DP-SGD has emerged as a popular method to protect personally identifiable information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility. To enhance the model's utility, researchers proposed various adaptive DP-SGD methods. However, we examine and discover that these techniques result in greater privacy leakage or lower accuracy than the traditional DP-SGD method, or a lack of evaluation on a complex data set such as CIFAR100. To address these limitations, we propose an Auto DP-SGD. Our method automates clipping threshold estimation based on the DL model's gradient norm and scales the gradients of each training sample without losing gradient information. This helps to improve the algorithm's utility while using a less privacy budget. To further improve accuracy, we introduce automatic noise multiplier decay mechanisms to decrease the noise multiplier after every epoch. Finally, we develop closed-form mathematical expressions using tCDP accountant for automatic noise multiplier and automatic clipping threshold estimation. Through extensive experimentation, we demonstrate that Auto DP-SGD outperforms existing SOTA DP-SGD methods in privacy and accuracy on various benchmark datasets. We also show that privacy can be improved by lowering the scale factor and using learning rate schedulers without significantly reducing accuracy. Specifically, Auto DP-SGD, when used with a step noise multiplier, improves accuracy by 3.20, 1.57, 6.73, and 1.42 for the MNIST, CIFAR10, CIFAR100, and AG News Corpus datasets, respectively. Furthermore, it obtains a substantial reduction in the privacy budget of 94.9, 79.16, 67.36, and 53.37 for the corresponding data sets.
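To illustrate the two automatic mechanisms, the sketch below runs DP-SGD on a toy logistic-regression problem, setting the clipping threshold from the observed per-sample gradient norms (here, their median) and decaying the noise multiplier by a fixed factor after every epoch. The median rule, the step decay factor, and the omission of the tCDP accountant are simplifying assumptions, not the paper's exact procedure.

```python
# Sketch: DP-SGD with an automatic clipping threshold and noise-multiplier decay.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
lr, sigma, decay, batch = 0.5, 1.0, 0.9, 100     # hyperparameters (assumed)

for epoch in range(15):
    for start in range(0, n, batch):
        xb, yb = X[start:start + batch], y[start:start + batch]
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        per_sample_grads = (p - yb)[:, None] * xb                 # (B, d) logistic grads

        # Automatic clipping threshold from the batch's gradient-norm statistics.
        norms = np.linalg.norm(per_sample_grads, axis=1)
        C = np.median(norms) + 1e-12

        # Clip each sample's gradient, average, and add calibrated Gaussian noise.
        clipped = per_sample_grads * np.minimum(1.0, C / (norms + 1e-12))[:, None]
        noisy_grad = clipped.mean(axis=0) + rng.normal(0.0, sigma * C / len(xb), d)
        w -= lr * noisy_grad

    sigma *= decay   # automatic noise-multiplier decay after every epoch

print(f"final noise multiplier = {sigma:.3f}, "
      f"train accuracy = {np.mean(((X @ w) > 0) == y):.3f}")
```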