cs.LG - 2023-09-22

The LHCb ultra-fast simulation option, Lamarr: design and validation

  • paper_url: http://arxiv.org/abs/2309.13213
  • repo_url: None
  • paper_authors: Lucio Anderlini, Matteo Barbetti, Simone Capelli, Gloria Corti, Adam Davis, Denis Derkach, Nikita Kazeev, Artem Maevskiy, Maurizio Martinelli, Sergei Mokonenko, Benedetto Gianluca Siddi, Zehua Xu
  • for: Meeting LHCb's growing demand for simulated samples in Run 3 by providing an ultra-fast alternative to detailed detector simulation.
  • methods: A Gaudi-based pipeline of modules that parameterizes both the detector response and the reconstruction algorithms of the LHCb experiment, using deep generative models and gradient-boosted decision trees.
  • results: Compared with detailed simulation, Lamarr achieves a two-order-of-magnitude speed-up of the simulation phase while key reconstructed quantities remain in good agreement.
    Abstract Detailed detector simulation is the major consumer of CPU resources at LHCb, having used more than 90% of the total computing budget during Run 2 of the Large Hadron Collider at CERN. As data is collected by the upgraded LHCb detector during Run 3 of the LHC, larger requests for simulated data samples are necessary, and will far exceed the pledged resources of the experiment, even with existing fast simulation options. An evolution of technologies and techniques to produce simulated samples is mandatory to meet the upcoming needs of analysis to interpret signal versus background and measure efficiencies. In this context, we propose Lamarr, a Gaudi-based framework designed to offer the fastest solution for the simulation of the LHCb detector. Lamarr consists of a pipeline of modules parameterizing both the detector response and the reconstruction algorithms of the LHCb experiment. Most of the parameterizations are made of Deep Generative Models and Gradient Boosted Decision Trees trained on simulated samples or alternatively, where possible, on real data. Embedding Lamarr in the general LHCb Gauss Simulation framework allows combining its execution with any of the available generators in a seamless way. Lamarr has been validated by comparing key reconstructed quantities with Detailed Simulation. Good agreement of the simulated distributions is obtained with two-order-of-magnitude speed-up of the simulation phase.

Evidential Deep Learning: Enhancing Predictive Uncertainty Estimation for Earth System Science Applications

  • paper_url: http://arxiv.org/abs/2309.13207
  • repo_url: https://github.com/AI2ES/miles-guess
  • paper_authors: John S. Schreck, David John Gagne II, Charlie Becker, William E. Chapman, Kim Elmore, Gabrielle Gantos, Eliot Kim, Dhamma Kimpara, Thomas Martin, Maria J. Molina, Vanessa M. Pryzbylo, Jacob Radford, Belen Saavedra, Justin Willson, Christopher Wirz
  • for: Providing a reliable and practical deep learning approach to quantifying the uncertainty of weather and climate predictions.
  • methods: Parametric deep learning and evidential deep learning; the latter extends parametric approaches to higher-order distributions so that both aleatoric and epistemic uncertainty are captured with a single model (a minimal sketch follows the abstract).
  • results: Evidential neural networks attain predictive accuracy rivaling ensemble methods while rigorously quantifying both sources of uncertainty.
    Abstract Robust quantification of predictive uncertainty is critical for understanding factors that drive weather and climate outcomes. Ensembles provide predictive uncertainty estimates and can be decomposed physically, but both physics and machine learning ensembles are computationally expensive. Parametric deep learning can estimate uncertainty with one model by predicting the parameters of a probability distribution but does not account for epistemic uncertainty. Evidential deep learning, a technique that extends parametric deep learning to higher-order distributions, can account for both aleatoric and epistemic uncertainty with one model. This study compares the uncertainty derived from evidential neural networks to that obtained from ensembles. Through applications of classification of winter precipitation type and regression of surface layer fluxes, we show evidential deep learning models attaining predictive accuracy rivaling standard methods, while robustly quantifying both sources of uncertainty. We evaluate the uncertainty in terms of how well the predictions are calibrated and how well the uncertainty correlates with prediction error. Analyses of uncertainty in the context of the inputs reveal sensitivities to underlying meteorological processes, facilitating interpretation of the models. The conceptual simplicity, interpretability, and computational efficiency of evidential neural networks make them highly extensible, offering a promising approach for reliable and practical uncertainty quantification in Earth system science modeling. In order to encourage broader adoption of evidential deep learning in Earth System Science, we have developed a new Python package, MILES-GUESS (https://github.com/ai2es/miles-guess), that enables users to train and evaluate both evidential and ensemble deep learning.
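
A minimal sketch of the evidential-regression idea (not the MILES-GUESS implementation): a Normal-Inverse-Gamma output head in the style of deep evidential regression, with the standard split of predictive uncertainty into aleatoric and epistemic parts. The layer names and head design are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialRegressionHead(nn.Module):
    """Predicts the Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta)."""
    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 4)

    def forward(self, features):
        gamma, log_nu, log_alpha, log_beta = self.linear(features).chunk(4, dim=-1)
        nu = F.softplus(log_nu)              # nu > 0
        alpha = F.softplus(log_alpha) + 1.0  # alpha > 1 keeps the variance finite
        beta = F.softplus(log_beta)          # beta > 0
        return gamma, nu, alpha, beta

def evidential_uncertainty(nu, alpha, beta):
    """Split predictive uncertainty into aleatoric and epistemic components."""
    aleatoric = beta / (alpha - 1.0)         # expected data noise, E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))  # uncertainty about the mean, Var[mu]
    return aleatoric, epistemic
```

The point prediction is gamma; a single forward pass yields both uncertainty terms, which is what makes this cheaper than running an ensemble.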

Federated Short-Term Load Forecasting with Personalization Layers for Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2309.13194
  • repo_url: None
  • paper_authors: Shourya Bose, Kibaek Kim
  • for: Improving the accuracy of federated learning (FL) for short-term load forecasting while addressing data-privacy concerns.
  • methods: A personalized federated learning algorithm (PL-FL) in which designated personalization layers of the forecasting model are trained exclusively on each client's own data, implemented with the Argonne Privacy-Preserving Federated Learning package (see the aggregation sketch after the abstract).
  • results: Experiments on the NREL ComStock dataset, which contains heterogeneous energy consumption data from multiple commercial buildings, show that PL-FL improves forecast performance and enables classical FL algorithms to handle clients with heterogeneous data.
    Abstract The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting (STLF) models. In response to privacy concerns, federated learning (FL) has been proposed as a privacy-preserving approach for training, but the quality of trained models degrades as client data becomes heterogeneous. In this paper we alleviate this drawback using personalization layers, wherein certain layers of an STLF model in an FL framework are trained exclusively on the clients' own data. To that end, we propose a personalized FL algorithm (PL-FL) enabling FL to handle personalization layers. The PL-FL algorithm is implemented by using the Argonne Privacy-Preserving Federated Learning package. We test the forecast performance of models trained on the NREL ComStock dataset, which contains heterogeneous energy consumption data of multiple commercial buildings. Superior performance of models trained with PL-FL demonstrates that personalization layers enable classical FL algorithms to handle clients with heterogeneous data.
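
A minimal sketch of the personalization-layer idea, assuming personalization layers can be recognized by a substring in their parameter names; the paper's implementation uses the Argonne Privacy-Preserving Federated Learning package, which is not reproduced here.

```python
from copy import deepcopy
import torch

def is_personalization_layer(name, personal_keys=("personal",)):
    # Assumption: personalization layers are identified by a naming convention;
    # the actual criterion is implementation-specific.
    return any(key in name for key in personal_keys)

@torch.no_grad()
def aggregate_shared_layers(client_states, global_state):
    """FedAvg-style aggregation that leaves personalization layers untouched."""
    new_state = deepcopy(global_state)
    for name in global_state:
        if is_personalization_layer(name):
            continue  # kept local: each client trains these on its own data only
        stacked = torch.stack([cs[name].float() for cs in client_states], dim=0)
        new_state[name] = stacked.mean(dim=0)
    return new_state
```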

Visualizing Topological Importance: A Class-Driven Approach

  • paper_url: http://arxiv.org/abs/2309.13185
  • repo_url: None
  • paper_authors: Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa
  • for: A first approach to visualizing which topological features are important in defining the classes of a dataset, supporting analysis and understanding of its structure.
  • methods: Proven explainable deep learning techniques adapted to topological classification: a learned metric classifier takes a density estimate of persistence-diagram points as input and learns to reweigh that density so that classification accuracy is high.
  • results: Extracting the learned weights yields an importance field over persistent point density, visualized directly on persistence diagrams and, for sublevel set filtrations, on the images themselves; real-world examples cover graph, 3D shape, and medical image data.
    Abstract This paper presents the first approach to visualize the importance of topological features that define classes of data. Topological features, with their ability to abstract the fundamental structure of complex data, are an integral component of visualization and analysis pipelines. However, not all topological features present in data are of equal importance. To date, the default definition of feature importance is often assumed and fixed. This work shows how proven explainable deep learning approaches can be adapted for use in topological classification. In doing so, it provides the first technique that illuminates what topological structures are important in each dataset with regard to their class label. In particular, the approach uses a learned metric classifier with a density estimator of the points of a persistence diagram as input. This metric learns how to reweigh this density such that classification accuracy is high. By extracting this weight, an importance field on persistent point density can be created. This provides an intuitive representation of persistence point importance that can be used to drive new visualizations. This work provides two examples: Visualization on each diagram directly and, in the case of sublevel set filtrations on images, directly on the images themselves. This work highlights real-world examples of this approach visualizing the important topological features in graph, 3D shape, and medical image data.

Enhancing Multi-Objective Optimization through Machine Learning-Supported Multiphysics Simulation

  • paper_url: http://arxiv.org/abs/2309.13179
  • repo_url: None
  • paper_authors: Diego Botache, Jens Decke, Winfried Ripken, Abhinay Dornipati, Franz Götz-Hahn, Mohamed Ayeb, Bernhard Sick
  • for: A methodological framework for training, self-optimizing, and self-organizing surrogate models that approximate and speed up multiphysics simulations in multi-objective optimization.
  • methods: Four machine learning and deep learning algorithms combined with two optimization algorithms in a complete training and optimization pipeline, evaluated with a comprehensive strategy (a simplified surrogate-plus-Pareto sketch follows the abstract).
  • results: Surrogate models trained on relatively small amounts of data (two publicly released real-world tabular datasets) approximate the underlying simulations accurately; the generated Pareto-optimal results are verified against the ground-truth simulations, and explainable AI with feature preselection identifies the most relevant features and critical partial dependencies.
    Abstract Multiphysics simulations that involve multiple coupled physical phenomena quickly become computationally expensive. This imposes challenges for practitioners aiming to find optimal configurations for these problems satisfying multiple objectives, as optimization algorithms often require querying the simulation many times. This paper presents a methodological framework for training, self-optimizing, and self-organizing surrogate models to approximate and speed up Multiphysics simulations. We generate two real-world tabular datasets, which we make publicly available, and show that surrogate models can be trained on relatively small amounts of data to approximate the underlying simulations accurately. We conduct extensive experiments combining four machine learning and deep learning algorithms with two optimization algorithms and a comprehensive evaluation strategy. Finally, we evaluate the performance of our combined training and optimization pipeline by verifying the generated Pareto-optimal results using the ground truth simulations. We also employ explainable AI techniques to analyse our surrogates and conduct a preselection strategy to determine the most relevant features in our real-world examples. This approach lets us understand the underlying problem and identify critical partial dependencies.
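
A simplified sketch of the surrogate-assisted loop under stated assumptions: gradient-boosted regressors stand in for the paper's four ML/DL surrogates, every objective is minimized, and a plain non-dominated filter stands in for the two optimization algorithms.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_surrogates(X, Y):
    """Fit one surrogate per objective; Y has shape [n_samples, n_objectives]."""
    return [GradientBoostingRegressor().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def pareto_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every objective is minimized."""
    n = objectives.shape[0]
    efficient = np.ones(n, dtype=bool)
    for i in range(n):
        if efficient[i]:
            dominated = (np.all(objectives >= objectives[i], axis=1) &
                         np.any(objectives > objectives[i], axis=1))
            efficient[dominated] = False
    return efficient

def pareto_candidates(surrogates, candidates):
    """Score a pool of candidate designs cheaply with the surrogates and keep the
    non-dominated ones for verification with the ground-truth simulation."""
    preds = np.column_stack([m.predict(candidates) for m in surrogates])
    return candidates[pareto_mask(preds)]
```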

Invisible Watermarking for Audio Generation Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.13166
  • repo_url: https://github.com/mikiyaxi/watermark-audio-diffusion
  • paper_authors: Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei
  • for: Protecting the integrity of audio diffusion models and establishing data copyright.
  • methods: A watermarking technique for audio diffusion models trained on mel-spectrograms.
  • results: An invisible watermark trigger mechanism enables model verification and ownership identification while preserving high utility in benign audio generation.
    Abstract Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, achieving state-of-the-art performance in various tasks in both image and audio domains. In the rapidly evolving field of audio-based machine learning, safeguarding model integrity and establishing data copyright are of paramount importance. This paper presents the first watermarking technique applied to audio diffusion models trained on mel-spectrograms. This offers a novel approach to the aforementioned challenges. Our model excels not only in benign audio generation, but also incorporates an invisible watermarking trigger mechanism for model verification. This watermark trigger serves as a protective layer, enabling the identification of model ownership and ensuring its integrity. Through extensive experiments, we demonstrate that invisible watermark triggers can effectively protect against unauthorized modifications while maintaining high utility in benign audio generation tasks.

Forecasting Response to Treatment with Global Deep Learning and Patient-Specific Pharmacokinetic Priors

  • paper_url: http://arxiv.org/abs/2309.13135
  • repo_url: None
  • paper_authors: Willa Potosnak, Cristian Challu, Kin G. Olivares, Artur Dubrawski
  • for: Forecasting healthcare time series to enable early detection of adverse outcomes and patient monitoring.
  • methods: A novel hybrid global-local architecture and a pharmacokinetic encoder that inform deep learning models of patient-specific treatment effects (a toy pharmacokinetic feature sketch follows the abstract).
  • results: The global-local architecture improves over patient-specific models by 9.2-14.6%; the pharmacokinetic encoder improves over alternative encoding techniques by 4.4% on simulated data and 2.1% on real-world data.
    Abstract Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local architecture and a pharmacokinetic encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains for a blood glucose forecasting task using both realistically simulated and real-world data. Our global-local architecture improves over patient-specific models by 9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The proposed approach can have multiple beneficial applications in clinical practice, such as issuing early warnings about unexpected treatment responses, or helping to characterize patient-specific treatment effects in terms of drug absorption and elimination characteristics.
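
The abstract does not detail the pharmacokinetic encoder; as a toy illustration of the kind of patient-specific prior it could encode, the sketch below superposes one-compartment (Bateman) absorption/elimination curves over past doses and exposes them as an extra covariate. All parameter names and values here are assumptions.

```python
import numpy as np

def one_compartment_concentration(t, dose, ka, ke, volume=1.0, bioavailability=1.0):
    """Bateman equation: concentration after a single oral dose given at t = 0.

    ka: absorption rate constant, ke: elimination rate constant (ka != ke)."""
    scale = bioavailability * dose * ka / (volume * (ka - ke))
    return scale * (np.exp(-ke * t) - np.exp(-ka * t))

def pk_features(timestamps, dose_times, doses, ka, ke):
    """Superpose the curves of all past doses at each timestamp, giving a
    treatment-effect covariate a deep forecaster can condition on."""
    timestamps = np.asarray(timestamps, dtype=float)
    conc = np.zeros_like(timestamps)
    for t0, dose in zip(dose_times, doses):
        elapsed = np.clip(timestamps - t0, 0.0, None)  # a dose has no effect before t0
        conc += one_compartment_concentration(elapsed, dose, ka, ke)
    return conc
```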

AntiBARTy Diffusion for Property Guided Antibody Design

  • paper_url: http://arxiv.org/abs/2309.13129
  • repo_url: None
  • paper_authors: Jordan Venderley
  • for: Exploring machine learning approaches to in-silico antibody design and engineering.
  • methods: An antibody-specific language model, AntiBARTy, based on BART, whose latent space is used to train a property-conditional diffusion model for guided IgG de novo design.
  • results: The approach generates novel antibodies with improved in-silico solubility while maintaining antibody validity and controlling sequence diversity.
    Abstract Over the past decade, antibodies have steadily grown in therapeutic importance thanks to their high specificity and low risk of adverse effects compared to other drug modalities. While traditional antibody discovery is primarily wet lab driven, the rapid improvement of ML-based generative modeling has made in-silico approaches an increasingly viable route for discovery and engineering. To this end, we train an antibody-specific language model, AntiBARTy, based on BART (Bidirectional and Auto-Regressive Transformer) and use its latent space to train a property-conditional diffusion model for guided IgG de novo design. As a test case, we show that we can effectively generate novel antibodies with improved in-silico solubility while maintaining antibody validity and controlling sequence diversity.

Data is often loadable in short depth: Quantum circuits from tensor networks for finance, images, fluids, and proteins

  • paper_url: http://arxiv.org/abs/2309.13108
  • repo_url: None
  • paper_authors: Raghav Jumade, Nicolas PD Sawaya
  • for: Addressing the "input problem" of loading classical data into a quantum computer, which has been an obstacle to achieving quantum advantage.
  • methods: A circuit compilation method based on tensor network (TN) theory, called AMLET (Automatic Multi-layer Loader Exploiting TNs), which can be tailored to arbitrary circuit depths.
  • results: Numerical experiments on real-world classical data from four distinct areas show that the required circuit depths are often several orders of magnitude lower than the exponentially-scaling general loading algorithm would require, demonstrating that many classical datasets can be loaded into a quantum computer in much shorter depth than previously expected, with positive implications for speeding up classical workloads on quantum computers.
    Abstract Though there has been substantial progress in developing quantum algorithms to study classical datasets, the cost of simply loading classical data is an obstacle to quantum advantage. When the amplitude encoding is used, loading an arbitrary classical vector requires up to exponential circuit depths with respect to the number of qubits. Here, we address this ``input problem'' with two contributions. First, we introduce a circuit compilation method based on tensor network (TN) theory. Our method -- AMLET (Automatic Multi-layer Loader Exploiting TNs) -- proceeds via careful construction of a specific TN topology and can be tailored to arbitrary circuit depths. Second, we perform numerical experiments on real-world classical data from four distinct areas: finance, images, fluid mechanics, and proteins. To the best of our knowledge, this is the broadest numerical analysis to date of loading classical data into a quantum computer. Consistent with other recent work in this area, the required circuit depths are often several orders of magnitude lower than the exponentially-scaling general loading algorithm would require. Besides introducing a more efficient loading algorithm, this work demonstrates that many classical datasets are loadable in depths that are much shorter than previously expected, which has positive implications for speeding up classical workloads on quantum computers.

Graph Neural Network for Stress Predictions in Stiffened Panels Under Uniform Loading

  • paper_url: http://arxiv.org/abs/2309.13022
  • repo_url: None
  • paper_authors: Yuecheng Cai, Jasmin Jelovica
  • for: A novel graph embedding technique for efficiently representing 3D stiffened panels and predicting their stress distributions.
  • methods: Graph Sampling and Aggregation (GraphSAGE) applied to a graph embedding that treats separate plate domains as vertices, compared against a finite-element-vertex graph representation (a mean-aggregation sketch follows the abstract).
  • results: The proposed embedding predicts stress distributions in stiffened panels with varying geometries more effectively, and a comprehensive parametric study examines how structural geometry affects prediction performance.
    Abstract Machine learning (ML) and deep learning (DL) techniques have gained significant attention as reduced order models (ROMs) to computationally expensive structural analysis methods, such as finite element analysis (FEA). Graph neural network (GNN) is a particular type of neural network which processes data that can be represented as graphs. This allows for efficient representation of complex geometries that can change during conceptual design of a structure or a product. In this study, we propose a novel graph embedding technique for efficient representation of 3D stiffened panels by considering separate plate domains as vertices. This approach is considered using Graph Sampling and Aggregation (GraphSAGE) to predict stress distributions in stiffened panels with varying geometries. A comparison between a finite-element-vertex graph representation is conducted to demonstrate the effectiveness of the proposed approach. A comprehensive parametric study is performed to examine the effect of structural geometry on the prediction performance. Our results demonstrate the immense potential of graph neural networks with the proposed graph embedding method as robust reduced-order models for 3D structures.
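
A minimal sketch of GraphSAGE-style mean aggregation; in the paper each vertex would represent a plate domain of the stiffened panel rather than a finite-element node, and the full architecture and stress-regression head are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGEMeanLayer(nn.Module):
    """One GraphSAGE layer with mean aggregation over each node's neighbours."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, neighbours):
        # x: [num_nodes, in_dim]; neighbours: list of index tensors, one per node
        agg = torch.stack([
            x[idx].mean(dim=0) if len(idx) > 0 else torch.zeros_like(x[0])
            for idx in neighbours
        ])
        h = self.lin_self(x) + self.lin_neigh(agg)
        return F.normalize(F.relu(h), dim=-1)  # L2-normalise as in GraphSAGE
```

Stacking a few such layers and adding a small per-vertex regression head would yield the per-domain stress predictions described above.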

Brain Age Revisited: Investigating the State vs. Trait Hypotheses of EEG-derived Brain-Age Dynamics with Deep Learning

  • paper_url: http://arxiv.org/abs/2310.07029
  • repo_url: https://github.com/gemeinl/eeg-brain-age
  • paper_authors: Lukas AW Gemein, Robin T Schirrmeister, Joschka Boedecker, Tonio Ball
  • for: Investigating whether EEG-derived brain age reflects brain pathology (state hypothesis) or a stable individual characteristic (trait hypothesis), using clinical EEG recordings.
  • methods: A state-of-the-art Temporal Convolutional Network (TCN) for age regression, trained on recordings from the Temple University Hospital EEG Corpus (TUEG) explicitly labeled as non-pathological and evaluated on non-pathological and pathological recordings.
  • results: The TCN achieves state-of-the-art age decoding with a mean absolute error of 6.6 years; the brain age gap biomarker is not indicative of pathological EEG, and the model significantly underestimates the age of both non-pathological and pathological subjects (the gap statistic is sketched after the abstract).
    Abstract The brain's biological age has been considered as a promising candidate for a neurologically significant biomarker. However, recent results based on longitudinal magnetic resonance imaging data have raised questions on its interpretation. A central question is whether an increased biological age of the brain is indicative of brain pathology and if changes in brain age correlate with diagnosed pathology (state hypothesis). Alternatively, could the discrepancy in brain age be a stable characteristic unique to each individual (trait hypothesis)? To address this question, we present a comprehensive study on brain aging based on clinical EEG, which is complementary to previous MRI-based investigations. We apply a state-of-the-art Temporal Convolutional Network (TCN) to the task of age regression. We train on recordings of the Temple University Hospital EEG Corpus (TUEG) explicitly labeled as non-pathological and evaluate on recordings of subjects with non-pathological as well as pathological recordings, both with examinations at a single point in time and repeated examinations over time. Therefore, we created four novel subsets of TUEG that include subjects with multiple recordings: I) all labeled non-pathological; II) all labeled pathological; III) at least one recording labeled non-pathological followed by at least one recording labeled pathological; IV) similar to III) but with opposing transition (first pathological then non-pathological). The results show that our TCN reaches state-of-the-art performance in age decoding with a mean absolute error of 6.6 years. Our extensive analyses demonstrate that the model significantly underestimates the age of non-pathological and pathological subjects (-1 and -5 years, paired t-test, p <= 0.18 and p <= 0.0066). Furthermore, the brain age gap biomarker is not indicative of pathological EEG.
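
The brain-age-gap statistics quoted above (mean absolute error and the paired t-test for systematic under-estimation) reduce to a few lines; this sketch reproduces only the evaluation arithmetic, not the TCN itself.

```python
import numpy as np
from scipy import stats

def brain_age_gap(predicted_age, chronological_age):
    """Brain-age gap: EEG-decoded age minus chronological age."""
    predicted_age = np.asarray(predicted_age, dtype=float)
    chronological_age = np.asarray(chronological_age, dtype=float)
    gap = predicted_age - chronological_age
    mae = np.mean(np.abs(gap))  # e.g. 6.6 years in the paper
    # Paired t-test of decoded vs. chronological age, as reported in the abstract:
    t_stat, p_value = stats.ttest_rel(predicted_age, chronological_age)
    return gap, mae, t_stat, p_value
```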

Understanding Deep Gradient Leakage via Inversion Influence Functions

  • paper_url: http://arxiv.org/abs/2309.13016
  • repo_url: https://github.com/illidanlab/inversion-influence-function
  • paper_authors: Haobo Zhang, Junyuan Hong, Yuyang Deng, Mehrdad Mahdavi, Jiayu Zhou
  • for: Understanding when and how privacy leakage happens in distributed learning, where clients holding sensitive data must share gradients.
  • methods: A novel Inversion Influence Function (I$^2$F) that establishes a closed-form connection between recovered images and private gradients by implicitly solving the Deep Gradient Leakage (DGL) problem, requiring only oracle access to gradients and Jacobian-vector products (a sketch of the underlying gradient-matching attack follows the abstract).
  • results: I$^2$F effectively approximates DGL across different model architectures, datasets, attack implementations, and noise-based defenses, yielding insights into effective gradient perturbation directions, the unfairness of privacy protection, and privacy-preferred model initialization; code is provided at https://github.com/illidanlab/inversion-influence-function.
    Abstract Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors. This attack casts significant privacy challenges on distributed learning from clients with sensitive data, where clients are required to share gradients. Defending against such attacks requires but lacks an understanding of when and how privacy leakage happens, mostly because of the black-box nature of deep networks. In this paper, we propose a novel Inversion Influence Function (I$^2$F) that establishes a closed-form connection between the recovered images and the private gradients by implicitly solving the DGL problem. Compared to directly solving DGL, I$^2$F is scalable for analyzing deep networks, requiring only oracle access to gradients and Jacobian-vector products. We empirically demonstrate that I$^2$F effectively approximated the DGL generally on different model architectures, datasets, attack implementations, and noise-based defenses. With this novel tool, we provide insights into effective gradient perturbation directions, the unfairness of privacy protection, and privacy-preferred model initialization. Our codes are provided in https://github.com/illidanlab/inversion-influence-function.
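
For context, the sketch below shows the classic gradient-matching attack that Deep Gradient Leakage refers to: dummy data is optimized until its gradients match the shared ones. It is not the paper's I$^2$F tool, which analyzes this attack in closed form via Jacobian-vector products; the hyperparameters here are illustrative.

```python
import torch
import torch.nn.functional as F

def gradient_matching_attack(model, shared_grads, input_shape, num_classes,
                             steps=30, lr=0.1):
    """Recover training data from shared gradients by gradient matching (DGL-style)."""
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)  # soft label
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y], lr=lr)

    def closure():
        optimizer.zero_grad()
        # Soft-label cross entropy needs a recent PyTorch (>= 1.10).
        loss = F.cross_entropy(model(dummy_x), F.softmax(dummy_y, dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        match = sum(((g - sg) ** 2).sum() for g, sg in zip(grads, shared_grads))
        match.backward()
        return match

    for _ in range(steps):
        optimizer.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```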

Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

  • paper_url: http://arxiv.org/abs/2309.13102
  • repo_url: None
  • paper_authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan “Honza” Silovsky
  • for: Training end-to-end automatic speech recognition (ASR) models with federated learning (FL) and examining how to minimize the word-error-rate gap between FL-trained models and their centrally trained counterparts.
  • methods: A study of adaptive optimizers, loss characteristics via the Connectionist Temporal Classification (CTC) weight, model initialization through seed start, carrying over centralized training choices (e.g., pre-layer or post-layer normalization), and FL-specific hyperparameters such as the number of local epochs, client sampling size, and learning rate scheduler, for ASR under heterogeneous data distributions.
  • results: Some optimizers work better than others by inducing smoothness; the paper also summarizes algorithms, trends, and best practices from prior FL work as they apply to end-to-end ASR.
    Abstract In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characteristics via altering Connectionist Temporal Classification (CTC) weight, (iii) model initialization through seed start, (iv) carrying over modeling setup from experiences in centralized training to FL, e.g., pre-layer or post-layer normalization, and (v) FL-specific hyperparameters, such as number of local epochs, client sampling size, and learning rate scheduler, specifically for ASR under heterogeneous data distribution. We shed light on how some optimizers work better than others via inducing smoothness. We also summarize the applicability of algorithms, trends, and propose best practices from prior works in FL (in general) toward End-to-End ASR models.

Expressive variational quantum circuits provide inherent privacy in federated learning

  • paper_url: http://arxiv.org/abs/2309.13002
  • repo_url: None
  • paper_authors: Niraj Kumar, Jamie Heredge, Changhao Li, Shaltiel Eloul, Shree Hari Sureshbabu, Marco Pistoia
  • for: A federated learning approach built on quantum machine learning models that inherently protects data privacy.
  • methods: Variational quantum circuit models with expressive encoding maps and overparameterized ansätze; the privacy argument rests on the difficulty of solving the system of high-degree multivariate Chebyshev polynomials generated by the circuit gradients.
  • results: Expressive maps provide inherent privacy against gradient inversion attacks while overparameterization preserves trainability; underparameterization of the expressive map in the attack model swamps the loss landscape with exponentially many spurious local minima, making a successful attack extremely hard.
    Abstract Federated learning has emerged as a viable distributed solution to train machine learning models without the actual need to share data with the central aggregator. However, standard neural network-based federated learning models have been shown to be susceptible to data leakage from the gradients shared with the server. In this work, we introduce federated learning with variational quantum circuit model built using expressive encoding maps coupled with overparameterized ans\"atze. We show that expressive maps lead to inherent privacy against gradient inversion attacks, while overparameterization ensures model trainability. Our privacy framework centers on the complexity of solving the system of high-degree multivariate Chebyshev polynomials generated by the gradients of quantum circuit. We present compelling arguments highlighting the inherent difficulty in solving these equations, both in exact and approximate scenarios. Additionally, we delve into machine learning-based attack strategies and establish a direct connection between overparameterization in the original federated learning model and underparameterization in the attack model. Furthermore, we provide numerical scaling arguments showcasing that underparameterization of the expressive map in the attack model leads to the loss landscape being swamped with exponentially many spurious local minima points, thus making it extremely hard to realize a successful attack. This provides a strong claim, for the first time, that the nature of quantum machine learning models inherently helps prevent data leakage in federated learning.

Deep learning probability flows and entropy production rates in active matter

  • paper_url: http://arxiv.org/abs/2309.12991
  • repo_url: None
  • paper_authors: Nicholas M. Boffi, Eric Vanden-Eijnden
  • for: Understanding the nature of nonequilibrium states in active matter systems, from self-propelled colloids to motile bacteria.
  • methods: A deep learning framework that estimates the score of the system's probability density, using a novel spatially local transformer-based architecture that learns high-order particle interactions while respecting permutation symmetry.
  • results: Combined with the microscopic equations of motion, the learned score gives direct access to the entropy production rate and the probability current, decomposed into local contributions from individual particles, spatial regions, and degrees of freedom; a network trained on 4096 particles generalizes to systems with as many as 32768 particles.
    Abstract Active matter systems, from self-propelled colloids to motile bacteria, are characterized by the conversion of free energy into useful work at the microscopic scale. These systems generically involve physics beyond the reach of equilibrium statistical mechanics, and a persistent challenge has been to understand the nature of their nonequilibrium states. The entropy production rate and the magnitude of the steady-state probability current provide quantitative ways to do so by measuring the breakdown of time-reversal symmetry and the strength of nonequilibrium transport of measure. Yet, their efficient computation has remained elusive, as they depend on the system's unknown and high-dimensional probability density. Here, building upon recent advances in generative modeling, we develop a deep learning framework that estimates the score of this density. We show that the score, together with the microscopic equations of motion, gives direct access to the entropy production rate, the probability current, and their decomposition into local contributions from individual particles, spatial regions, and degrees of freedom. To represent the score, we introduce a novel, spatially-local transformer-based network architecture that learns high-order interactions between particles while respecting their underlying permutation symmetry. We demonstrate the broad utility and scalability of the method by applying it to several high-dimensional systems of interacting active particles undergoing motility-induced phase separation (MIPS). We show that a single instance of our network trained on a system of 4096 particles at one packing fraction can generalize to other regions of the phase diagram, including systems with as many as 32768 particles. We use this observation to quantify the spatial structure of the departure from equilibrium in MIPS as a function of the number of particles and the packing fraction.

BayesDLL: Bayesian Deep Learning Library

  • paper_url: http://arxiv.org/abs/2309.12928
  • repo_url: https://github.com/samsunglabs/bayesdll
  • paper_authors: Minyoung Kim, Timothy Hospedales
  • for: A Bayesian neural network library for PyTorch aimed at large-scale deep networks.
  • methods: Implementations of mainstream approximate Bayesian inference algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and the Laplace approximation (a generic MC-dropout sketch follows the abstract).
  • results: Unlike other existing Bayesian neural network libraries, this one handles very large networks including Vision Transformers (ViTs), requires virtually no modification of existing backbone definition code, and allows pre-trained model weights to serve as the prior mean, which is useful for Bayesian inference with large foundation models such as ViTs that are hard to optimize from scratch on downstream data alone.
    Abstract We release a new Bayesian neural network library for PyTorch for large-scale deep networks. Our library implements mainstream approximate Bayesian inference algorithms: variational inference, MC-dropout, stochastic-gradient MCMC, and Laplace approximation. The main differences from other existing Bayesian neural network libraries are as follows: 1) Our library can deal with very large-scale deep networks including Vision Transformers (ViTs). 2) We need virtually zero code modifications for users (e.g., the backbone network definition codes do not need to be modified at all). 3) Our library also allows the pre-trained model weights to serve as a prior mean, which is very useful for performing Bayesian inference with the large-scale foundation models like ViTs that are hard to optimise from scratch with the downstream data alone. Our code is publicly available at: \url{https://github.com/SamsungLabs/BayesDLL}\footnote{A mirror repository is also available at: \url{https://github.com/minyoungkim21/BayesDLL}.}.
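
BayesDLL's own interface is documented in its repository and is not reproduced here; the sketch below is a generic MC-dropout predictive loop of the kind the library implements, usable with an unmodified classification backbone.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, num_samples=30):
    """Monte Carlo dropout: keep dropout active at test time and average."""
    model.train()  # enables dropout layers (a generic trick, not BayesDLL's API)
    preds = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(num_samples)])
    mean = preds.mean(dim=0)                                     # predictive mean
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)  # predictive uncertainty
    return mean, entropy
```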

Topological Data Mapping of Online Hate Speech, Misinformation, and General Mental Health: A Large Language Model Based Study

  • paper_url: http://arxiv.org/abs/2309.13098
  • repo_url: None
  • paper_authors: Andrew Alexander, Hongbin Wang
  • for: Understanding how posting hate speech and misinformation online relates to the overall psychological wellbeing of posters.
  • methods: Embeddings of thousands of Reddit posts derived with OpenAI's GPT-3, machine-learning classification on these embeddings to understand the role of hate speech/misinformation in various communities, and topological data analysis (TDA) of the embedding space.
  • results: A visual map connecting online hate speech, misinformation, various psychiatric disorders, and general mental health, obtained from the TDA of the post embeddings.
    Abstract The advent of social media has led to an increased concern over its potential to propagate hate speech and misinformation, which, in addition to contributing to prejudice and discrimination, has been suspected of playing a role in increasing social violence and crimes in the United States. While literature has shown the existence of an association between posting hate speech and misinformation online and certain personality traits of posters, the general relationship and relevance of online hate speech/misinformation in the context of overall psychological wellbeing of posters remain elusive. One difficulty lies in the lack of adequate data analytics tools capable of adequately analyzing the massive amount of social media posts to uncover the underlying hidden links. Recent progresses in machine learning and large language models such as ChatGPT have made such an analysis possible. In this study, we collected thousands of posts from carefully selected communities on the social media site Reddit. We then utilized OpenAI's GPT3 to derive embeddings of these posts, which are high-dimensional real-numbered vectors that presumably represent the hidden semantics of posts. We then performed various machine-learning classifications based on these embeddings in order to understand the role of hate speech/misinformation in various communities. Finally, a topological data analysis (TDA) was applied to the embeddings to obtain a visual map connecting online hate speech, misinformation, various psychiatric disorders, and general mental health.

FairComp: Workshop on Fairness and Robustness in Machine Learning for Ubiquitous Computing

  • paper_url: http://arxiv.org/abs/2309.12877
  • repo_url: None
  • paper_authors: Sofia Yfantidou, Dimitris Spathis, Marios Constantinides, Tong Xia, Niels van Berkel
  • for: Discussing fairness in UbiComp research and its social, technical, and legal implications.
  • methods: From a social perspective, examining the relationship between fairness and UbiComp research and identifying pathways so that ubiquitous technologies do not cause harm or infringe on individual rights; from a technical perspective, initiating a discussion on data practices and bias mitigation approaches tailored to UbiComp; from a legal perspective, examining how new policies shape the community's work and future research.
  • results: The workshop aims to foster a vibrant community centered on responsible UbiComp while charting a clear path for future research in this field.
    Abstract How can we ensure that Ubiquitous Computing (UbiComp) research outcomes are both ethical and fair? While fairness in machine learning (ML) has gained traction in recent years, fairness in UbiComp remains unexplored. This workshop aims to discuss fairness in UbiComp research and its social, technical, and legal implications. From a social perspective, we will examine the relationship between fairness and UbiComp research and identify pathways to ensure that ubiquitous technologies do not cause harm or infringe on individual rights. From a technical perspective, we will initiate a discussion on data practices to develop bias mitigation approaches tailored to UbiComp research. From a legal perspective, we will examine how new policies shape our community's work and future research. We aim to foster a vibrant community centered around the topic of responsible UbiComp, while also charting a clear path for future research endeavours in this field.

Robotic Handling of Compliant Food Objects by Robust Learning from Demonstration

  • paper_url: http://arxiv.org/abs/2309.12856
  • repo_url: None
  • paper_authors: Ekrem Misimi, Alexander Olofsson, Aleksander Eilertsen, Elling Ruud Øye, John Reidar Mathiassen
  • for: Robotic grasping of compliant food objects, improving the consistency of robot learning despite the variability of human operators' demonstrations.
  • methods: A Learning from Demonstration (LfD) approach that merges RGB-D images and tactile data to estimate the gripper pose, finger configuration, and forces needed for effective robot handling.
  • results: The proposed approach automatically removes inconsistent demonstrations and estimates the teacher's intended policy, with performance validated on fragile and compliant food objects with complex 3D shapes.
    Abstract The robotic handling of compliant and deformable food raw materials, characterized by high biological variation, complex geometrical 3D shapes, and mechanical structures and texture, is currently in huge demand in the ocean space, agricultural, and food industries. Many tasks in these industries are performed manually by human operators who, due to the laborious and tedious nature of their tasks, exhibit high variability in execution, with variable outcomes. The introduction of robotic automation for most complex processing tasks has been challenging due to current robot learning policies. A more consistent learning policy involving skilled operators is desired. In this paper, we address the problem of robot learning when presented with inconsistent demonstrations. To this end, we propose a robust learning policy based on Learning from Demonstration (LfD) for robotic grasping of food compliant objects. The approach uses a merging of RGB-D images and tactile data in order to estimate the necessary pose of the gripper, gripper finger configuration and forces exerted on the object in order to achieve effective robot handling. During LfD training, the gripper pose, finger configurations and tactile values for the fingers, as well as RGB-D images are saved. We present an LfD learning policy that automatically removes inconsistent demonstrations, and estimates the teacher's intended policy. The performance of our approach is validated and demonstrated for fragile and compliant food objects with complex 3D shapes. The proposed approach has a vast range of potential applications in the aforementioned industry sectors.

DeepOPF-U: A Unified Deep Neural Network to Solve AC Optimal Power Flow in Multiple Networks

  • paper_url: http://arxiv.org/abs/2309.12849
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Heng Liang, Changhong Zhao
  • for: Solving AC optimal power flow (OPF) problems across multiple power networks with differing and growing topologies.
  • methods: A single unified deep neural network (DNN), DeepOPF-U, with elastic input and output layers that accommodate load and OPF solution vectors of varying lengths in different networks (one possible realization is sketched after the abstract).
  • results: Simulations on IEEE 57/118/300-bus test systems and a network growing from 73 to 118 buses show improved performance over existing DNN-based solution methods while handling different and growing numbers of buses, lines, loads, and DERs.
    Abstract The traditional machine learning models to solve optimal power flow (OPF) are mostly trained for a given power network and lack generalizability to today's power networks with varying topologies and growing plug-and-play distributed energy resources (DERs). In this paper, we propose DeepOPF-U, which uses one unified deep neural network (DNN) to solve alternating-current (AC) OPF problems in different power networks, including a set of power networks that is successively expanding. Specifically, we design elastic input and output layers for the vectors of given loads and OPF solutions with varying lengths in different networks. The proposed method, using a single unified DNN, can deal with different and growing numbers of buses, lines, loads, and DERs. Simulations of IEEE 57/118/300-bus test systems and a network growing from 73 to 118 buses verify the improved performance of DeepOPF-U compared to existing DNN-based solution methods.
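
The abstract does not spell out how the elastic input and output layers are built; the sketch below assumes one simple realization, zero-padding each network's load vector to a common maximum size and truncating the output to that network's solution length.

```python
import torch
import torch.nn as nn

class UnifiedOPFNet(nn.Module):
    """One DNN serving several power networks with different input/output sizes.

    Assumption: 'elastic' layers are emulated by padding loads to max_in and
    keeping only the first out_dim entries of the max_out-sized output."""
    def __init__(self, max_in, max_out, hidden=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(max_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, max_out),
        )

    def forward(self, loads, out_dim):
        padded = torch.zeros(loads.shape[0], self.body[0].in_features,
                             device=loads.device)
        padded[:, :loads.shape[1]] = loads
        return self.body(padded)[:, :out_dim]  # this network's OPF solution entries
```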

Multiple Independent DE Optimizations to Tackle Uncertainty and Variability in Demand in Inventory Management

  • paper_url: http://arxiv.org/abs/2309.13095
  • repo_url: None
  • paper_authors: Sarit Maitra, Sukanya Kundu, Vivek Mishra
  • for: Determining the effectiveness of metaheuristic Differential Evolution optimization for inventory management (IM) under stochastic demand, with the goal of minimizing inventory costs.
  • methods: A continuous review of IM policies combined with Monte Carlo Simulation (MCS), comparing multiple metaheuristic algorithms and tuning parameters with the Latin Hypercube Sampling (LHS) statistical method.
  • results: Differential Evolution (DE) outperforms the other algorithms, and combining the outcomes of multiple independent DE optimizations, each started from different random initial conditions, offers potential gains in performance and cost efficiency under uncertain demand patterns (see the sketch after the abstract).
    Abstract To determine the effectiveness of metaheuristic Differential Evolution optimization strategy for inventory management (IM) in the context of stochastic demand, this empirical study undertakes a thorough investigation. The primary objective is to discern the most effective strategy for minimizing inventory costs within the context of uncertain demand patterns. Inventory costs refer to the expenses associated with holding and managing inventory within a business. The approach combines a continuous review of IM policies with a Monte Carlo Simulation (MCS). To find the optimal solution, the study focuses on meta-heuristic approaches and compares multiple algorithms. The outcomes reveal that the Differential Evolution (DE) algorithm outperforms its counterparts in optimizing IM. To fine-tune the parameters, the study employs the Latin Hypercube Sampling (LHS) statistical method. To determine the final solution, a method is employed in this study which combines the outcomes of multiple independent DE optimizations, each initiated with different random initial conditions. This approach introduces a novel and promising dimension to the field of inventory management, offering potential enhancements in performance and cost efficiency, especially in the presence of stochastic demand patterns.
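
A minimal sketch of the multiple-independent-runs idea using SciPy's differential_evolution: several DE optimizations are started from different seeds and the best result is kept. The inventory cost below is a stand-in for the paper's Monte Carlo-simulated cost under stochastic demand.

```python
import numpy as np
from scipy.optimize import differential_evolution

def combined_de_optimization(cost_fn, bounds, n_runs=5, seed0=0):
    """Run several independent DE optimizations and keep the best solution."""
    results = [differential_evolution(cost_fn, bounds, seed=seed0 + i, polish=True)
               for i in range(n_runs)]
    best = min(results, key=lambda r: r.fun)
    return best.x, best.fun, [r.fun for r in results]

if __name__ == "__main__":
    def inventory_cost(params):
        # Placeholder: reorder point s and order quantity q; in the paper this
        # value would come from a Monte Carlo simulation of stochastic demand.
        s, q = params
        return (s - 40.0) ** 2 + 0.5 * (q - 120.0) ** 2

    x_best, cost_best, all_costs = combined_de_optimization(
        inventory_cost, bounds=[(0.0, 100.0), (1.0, 300.0)])
    print(x_best, cost_best, all_costs)
```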

Reward Function Design for Crowd Simulation via Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.12841
  • repo_url: None
  • paper_authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani
  • for: Exploring reinforcement learning-based crowd simulation and how to design reward functions that yield effective and efficient results.
  • methods: Theoretical analysis of the validity of candidate reward functions based on their analytical properties, plus empirical evaluation across a range of scenarios using energy efficiency as the metric.
  • results: Directly minimizing energy usage is a viable strategy provided it is paired with an appropriately scaled guiding potential (illustrated after the abstract); the experiments also show how the different reward components shape the behavior of the simulated crowd.
    Abstract Crowd simulation is important for video-games design, since it enables to populate virtual worlds with autonomous avatars that navigate in a human-like manner. Reinforcement learning has shown great potential in simulating virtual crowds, but the design of the reward function is critical to achieving effective and efficient results. In this work, we explore the design of reward functions for reinforcement learning-based crowd simulation. We provide theoretical insights on the validity of certain reward functions according to their analytical properties, and evaluate them empirically using a range of scenarios, using the energy efficiency as the metric. Our experiments show that directly minimizing the energy usage is a viable strategy as long as it is paired with an appropriately scaled guiding potential, and enable us to study the impact of the different reward components on the behavior of the simulated crowd. Our findings can inform the development of new crowd simulation techniques, and contribute to the wider study of human-like navigation.
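
One concrete reading of the finding, written as a per-step reward: negative energy expenditure plus a scaled, potential-based guiding term toward the goal. The energy constants and the scale are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def crowd_reward(pos, prev_pos, goal, speed, dt,
                 potential_scale=1.0, e_static=2.23, e_move=1.26):
    """Reward = -energy used this step + scaled guiding potential toward the goal."""
    energy = (e_static + e_move * speed ** 2) * dt  # simple locomotion energy model
    guiding = np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal)
    return -energy + potential_scale * guiding
```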

Doubly Robust Proximal Causal Learning for Continuous Treatments

  • paper_url: http://arxiv.org/abs/2309.12819
  • repo_url: None
  • paper_authors: Yong Wu, Yanwei Fu, Shouyan Wang, Xinwei Sun
  • for: Extending the proximal causal learning framework to continuous treatments so that causal effects can be estimated in the presence of unmeasured confounders.
  • methods: A kernel-based doubly robust (DR) estimator that handles continuous treatments, removing the delta function of the original DR estimator, together with a new approach for efficiently solving the nuisance functions.
  • results: A convergence analysis in terms of mean squared error, plus evaluations on synthetic datasets and real-world applications, demonstrate the utility of the estimator.
    Abstract Proximal causal learning is a promising framework for identifying the causal effect under the existence of unmeasured confounders. Within this framework, the doubly robust (DR) estimator was derived and has shown its effectiveness in estimation, especially when the model assumption is violated. However, the current form of the DR estimator is restricted to binary treatments, while the treatment can be continuous in many real-world applications. The primary obstacle to continuous treatments resides in the delta function present in the original DR estimator, making it infeasible in causal effect estimation and introducing a heavy computational burden in nuisance function estimation. To address these challenges, we propose a kernel-based DR estimator that can well handle continuous treatments. Equipped with its smoothness, we show that its oracle form is a consistent approximation of the influence function. Further, we propose a new approach to efficiently solve the nuisance functions. We then provide a comprehensive convergence analysis in terms of the mean square error. We demonstrate the utility of our estimator on synthetic datasets and real-world applications.

Improving Generalization in Game Agents with Data Augmentation in Imitation Learning

  • paper_url: http://arxiv.org/abs/2309.12815
  • repo_url: None
  • paper_authors: Derek Yadgaroff, Alessandro Sestini, Konrad Tollmar, Linus Gisslén
  • for: Improving the generalization of game-playing agents trained with imitation learning.
  • methods: Data augmentation of the training observations so that the distribution of states and actions in the dataset better represents the real state-action distribution (a minimal augmentation sketch follows the abstract).
  • results: Data augmentation effectively improves the generalization of imitation learning agents, with a performance benchmark of the augmentations across several 3D environments.
    Abstract Imitation learning is an effective approach for training game-playing agents and, consequently, for efficient game production. However, generalization - the ability to perform well in related but unseen scenarios - is an essential requirement that remains an unsolved challenge for game AI. Generalization is difficult for imitation learning agents because it requires the algorithm to take meaningful actions outside of the training distribution. In this paper we propose a solution to this challenge. Inspired by the success of data augmentation in supervised learning, we augment the training data so the distribution of states and actions in the dataset better represents the real state-action distribution. This study evaluates methods for combining and applying data augmentations to observations, to improve generalization of imitation learning agents. It also provides a performance benchmark of these augmentations across several 3D environments. These results demonstrate that data augmentation is a promising framework for improving generalization in imitation learning agents.
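
A minimal sketch of observation-level augmentation applied to demonstration data before behavior-cloning updates; the specific transforms (random scaling and Gaussian noise) are illustrative, while the paper studies how to combine and apply such augmentations.

```python
import numpy as np

def augment_observation(obs, rng, noise_std=0.01, scale_range=(0.95, 1.05)):
    """Perturb an observation so training covers states near, but not exactly on,
    the demonstrated trajectories."""
    scale = rng.uniform(*scale_range)
    noisy = obs * scale + rng.normal(0.0, noise_std, size=obs.shape)
    return noisy.astype(obs.dtype)

def augmented_batch(observations, actions, rng, copies=2):
    """Return the original batch plus `copies` augmented variants (actions unchanged)."""
    obs_out = [observations] + [
        np.stack([augment_observation(o, rng) for o in observations])
        for _ in range(copies)
    ]
    act_out = [actions] * (copies + 1)
    return np.concatenate(obs_out), np.concatenate(act_out)
```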

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

  • paper_url: http://arxiv.org/abs/2309.12802
  • repo_url: None
  • paper_authors: Alexandre R. Ferreira, Cláudio E. C. Campelo
  • for: Training robust speech-to-text transcription models, which requires a large and diverse labeled dataset that is costly to obtain, especially for languages less popular than English.
  • methods: A data augmentation framework based on deepfake audio, using a voice cloner and an English-language dataset produced by Indian speakers to ensure a single accent.
  • results: The augmented data was used to train speech-to-text models in various scenarios to validate the proposed framework.
    Abstract To train transcriptor models that produce robust results, a large and diverse labeled dataset is required. Finding such data with the necessary characteristics is a challenging task, especially for languages less popular than English. Moreover, producing such data requires significant effort and often money. Therefore, a strategy to mitigate this problem is the use of data augmentation techniques. In this work, we propose a framework that approaches data augmentation based on deepfake audio. To validate the produced framework, experiments were conducted using existing deepfake and transcription models. A voice cloner and a dataset produced by Indians (in English) were selected, ensuring the presence of a single accent in the dataset. Subsequently, the augmented data was used to train speech to text models in various scenarios.
    摘要 要训练能产生稳健结果的语音转写模型，需要大规模且多样化的标注数据集。找到具备所需特性的数据十分困难，对英语以外的语言尤其如此；而制作这类数据往往需要大量人力乃至资金。因此，缓解该问题的一种策略是使用数据增强技术。在本工作中，我们提出了一个基于深度伪造（deepfake）音频的数据增强框架。为验证该框架，我们基于现有的深度伪造模型和转写模型开展实验：选用了一个语音克隆器，以及由印度人录制的英语数据集，以保证数据集中只包含单一口音。随后，我们利用增强后的数据在多种场景下训练语音转文本模型。

An Intelligent Approach to Detecting Novel Fault Classes for Centrifugal Pumps Based on Deep CNNs and Unsupervised Methods

  • paper_url: http://arxiv.org/abs/2309.12765
  • repo_url: None
  • paper_authors: Mahdi Abdollah Chalaki, Daniyal Maroufi, Mahdi Robati, Mohammad Javad Karimi, Ali Sadighi
  • for: 本研究旨在应对旋转机械数据驱动故障诊断面临的挑战，特别是现场各类故障信息不足的问题。
  • methods: 本 paper 使用了 convolutional neural network (CNN) 和 t-SNE 方法,以检测 novel faults. 首先,使用受限的系统故障信息进行训练,然后使用 clustering 技术进行检测。 如果检测到新的故障,则使用新数据进行网络的扩展。
  • results: 实验结果表明,这种 two-stage 方法在一台 centrifugal pump 上得到了高精度的 novel fault 检测结果。
    Abstract Despite the recent success in data-driven fault diagnosis of rotating machines, there are still remaining challenges in this field. Among the issues to be addressed, is the lack of information about variety of faults the system may encounter in the field. In this paper, we assume a partial knowledge of the system faults and use the corresponding data to train a convolutional neural network. A combination of t-SNE method and clustering techniques is then employed to detect novel faults. Upon detection, the network is augmented using the new data. Finally, a test setup is used to validate this two-stage methodology on a centrifugal pump and experimental results show high accuracy in detecting novel faults.
    摘要 尽管数据驱动的旋转机械故障诊断近年来取得了一定成功，但该领域仍存在一些挑战，其中之一是缺乏关于系统在现场可能遇到的各类故障的信息。在这篇论文中，我们假设仅掌握系统故障的部分知识，并使用相应的数据训练卷积神经网络。随后，结合 t-SNE（t 分布随机邻域嵌入）降维方法与聚类技术来检测新型故障；一旦检测到新故障，就利用新数据对网络进行扩充。最后，我们在一台离心泵的试验台上验证了这一两阶段方法，实验结果表明其能够高精度地检测新型故障。
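
A minimal scikit-learn sketch of the second stage (novel-fault detection from learned features) is shown below: t-SNE projects stand-in CNN embeddings to 2-D, and DBSCAN clusters are screened for groups that contain no known-fault samples. The synthetic features, the eps value, and the choice of DBSCAN are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# stand-in for CNN embeddings: 3 known fault classes + one unseen (novel) fault
known = [rng.normal(loc=c, scale=0.3, size=(60, 16)) for c in (0.0, 2.0, 4.0)]
novel = rng.normal(loc=8.0, scale=0.3, size=(30, 16))
features = np.vstack(known + [novel])
known_labels = np.repeat([0, 1, 2, -1], [60, 60, 60, 30])   # -1 = truly novel

# 1) project high-dimensional CNN features to 2-D with t-SNE
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

# 2) cluster the embedding; clusters containing no labelled training
#    samples are flagged as candidate novel fault classes
clusters = DBSCAN(eps=3.0, min_samples=10).fit_predict(embedded)
for c in np.unique(clusters):
    if c == -1:
        continue
    members = known_labels[clusters == c]
    if np.all(members == -1):
        print(f"cluster {c}: candidate novel fault ({len(members)} samples)")
```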

Prototype-Enhanced Hypergraph Learning for Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2309.13092
  • repo_url: None
  • paper_authors: Shuai Wang, Jiayi Shen, Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring
  • for: 本研究旨在提出一种基于超图学习的节点分类方法，用于处理多媒体数据中关系多样且复杂的异构信息网络（HINs）。
  • methods: 本方法使用超图而非普通图，以捕捉节点之间的高阶关系，且无需依赖预定义的元路径。它还利用原型（prototype）来提高超图学习过程的鲁棒性，并有可能为底层网络结构提供人类可解释的洞察。
  • results: 在三个真实 HINs 上的实验表明了本方法的有效性。
    Abstract The variety and complexity of relations in multimedia data lead to Heterogeneous Information Networks (HINs). Capturing the semantics from such networks requires approaches capable of utilizing the full richness of the HINs. Existing methods for modeling HINs employ techniques originally designed for graph neural networks, and HINs decomposition analysis, like using manually predefined metapaths. In this paper, we introduce a novel prototype-enhanced hypergraph learning approach for node classification in HINs. Using hypergraphs instead of graphs, our method captures higher-order relationships among nodes and extracts semantic information without relying on metapaths. Our method leverages the power of prototypes to improve the robustness of the hypergraph learning process and creates the potential to provide human-interpretable insights into the underlying network structure. Extensive experiments on three real-world HINs demonstrate the effectiveness of our method.
    摘要 多媒体数据中关系的多样性与复杂性催生了异构信息网络（HINs）。要捕捉此类网络中的语义，需要能够充分利用 HINs 全部丰富信息的方法。现有的 HINs 建模方法沿用了为图神经网络设计的技术，并依赖人工预定义的元路径进行分解分析。本文提出了一种原型增强的超图学习方法，用于 HINs 中的节点分类。通过使用超图而非普通图，我们的方法能够捕捉节点之间的高阶关系，并在不依赖元路径的情况下提取语义信息。我们的方法借助原型来提高超图学习过程的鲁棒性，并有望为底层网络结构提供人类可解释的洞察。我们在三个真实 HINs 上进行了大量实验，验证了方法的有效性。
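
For readers unfamiliar with hypergraph convolution, the numpy sketch below performs one standard propagation step, X' = D_v^{-1} H W D_e^{-1} H^T X Θ, over an incidence matrix, followed by a simple class-prototype readout. The incidence matrix and labels are toy data, and the paper's prototype-enhancement and HIN-specific details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_edges, d_in, d_out = 6, 3, 8, 4

# incidence matrix H: H[v, e] = 1 if node v belongs to hyperedge e
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
], dtype=float)
X = rng.normal(size=(n_nodes, d_in))        # node features
W = np.eye(n_edges)                          # hyperedge weights
Theta = rng.normal(size=(d_in, d_out))       # learnable projection

Dv = np.diag(1.0 / (H @ W @ np.ones(n_edges)))   # inverse node degrees
De = np.diag(1.0 / H.sum(axis=0))                 # inverse hyperedge degrees

# one hypergraph convolution: aggregate nodes -> hyperedges -> nodes
X_out = Dv @ H @ W @ De @ H.T @ X @ Theta

# simple class prototypes as mean node embeddings per (assumed) label
labels = np.array([0, 0, 1, 1, 2, 2])
prototypes = np.stack([X_out[labels == c].mean(axis=0) for c in range(3)])
print(X_out.shape, prototypes.shape)   # (6, 4) (3, 4)
```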

Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2309.12742
  • repo_url: https://github.com/yue-zhongqi/icon
  • paper_authors: Zhongqi Yue, Hanwang Zhang, Qianru Sun
  • for: 本研究旨在解决无监督域自适应（Unsupervised Domain Adaptation, UDA）中的虚假相关问题，即域不变特征（如类别）与域特定特征（如环境）之间的虚假相关，它无法泛化到目标域，导致模型在目标域上表现不佳。
  • methods: 我们提出了一种名为“不变一致性学习”（Invariant CONsistency learning, ICON）的方法：学习一个不变分类器，使其预测同时与源域标签和目标域聚类保持一致，从而消除在目标域中不成立的虚假相关。
  • results: 实验表明，ICON 在经典的 UDA 基准上取得了最先进的性能，并在具有挑战性的 WILDS 2.0 基准上超越了所有传统方法。
    Abstract Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment) that does not generalize to the target domain. Unfortunately, even enriched with additional unsupervised target domains, existing Unsupervised DA (UDA) methods still suffer from it. This is because the source domain supervision only considers the target domain samples as auxiliary data (e.g., by pseudo-labeling), yet the inherent distribution in the target domain -- where the valuable de-correlation clues hide -- is disregarded. We propose to make the U in UDA matter by giving equal status to the two domains. Specifically, we learn an invariant classifier whose prediction is simultaneously consistent with the labels in the source domain and clusters in the target domain, hence the spurious correlation inconsistent in the target domain is removed. We dub our approach "Invariant CONsistency learning" (ICON). Extensive experiments show that ICON achieves the state-of-the-art performance on the classic UDA benchmarks: Office-Home and VisDA-2017, and outperforms all the conventional methods on the challenging WILDS 2.0 benchmark. Codes are in https://github.com/yue-zhongqi/ICON.
    摘要 域 adaptation (DA) 总是面临着域特异特征(例如类标识)和域特定特征(例如环境)之间的假设相关性,这种相关性不能泛化到目标域。尽管使用额外的无监督目标域数据,现有的无监督DA(UDA)方法仍然受到这种挑战。这是因为源域监督只考虑目标域样本为辅助数据(例如 pseudo-labeling),忽略了目标域的自然分布,其中包含了价值的分解准则。我们提议使得U在UDA中变得重要,即在源域和目标域之间学习一个不变的分类器,其预测结果同时与源域中的标签和目标域中的团集一致,因此在目标域中排除了假设相关性。我们称之为“不变CONsistency学习”(ICON)。我们进行了广泛的实验,ICON在经典的UDABenchmark上取得了state-of-the-art性能,并在挑战性的WILDS 2.0 Benchmark上超过了所有传统方法。代码在https://github.com/yue-zhongqi/ICON。

Optimal Dynamic Fees for Blockchain Resources

  • paper_url: http://arxiv.org/abs/2309.12735
  • repo_url: None
  • paper_authors: Davide Crapis, Ciamac C. Moallemi, Shouqiao Wang
  • For: 本研究针对多种区块链资源的动态费用机制的最优设计问题，开发了一个通用且实用的框架。
  • Methods: 我们的框架可以计算最优费用策略，在应对持续性需求变化与对所观测区块需求的局部噪声保持鲁棒之间取得最优权衡。在多资源的一般情形下，我们的最优策略能正确处理资源需求之间的交叉效应（互补与替代）。
  • Results: 该框架可用于改进或指导启发式费用更新规则（如 EIP-1559 或 EIP-4844）的设计，我们通过两个案例研究说明了这一点。我们还使用以太坊区块链的真实市场数据估计了模型的一维版本，并将 EIP-1559 的实际表现与我们的最优策略进行了实证比较。
    Abstract We develop a general and practical framework to address the problem of the optimal design of dynamic fee mechanisms for multiple blockchain resources. Our framework allows to compute policies that optimally trade-off between adjusting resource prices to handle persistent demand shifts versus being robust to local noise in the observed block demand. In the general case with more than one resource, our optimal policies correctly handle cross-effects (complementarity and substitutability) in resource demands. We also show how these cross-effects can be used to inform resource design, i.e. combining resources into bundles that have low demand-side cross-effects can yield simpler and more efficient price-update rules. Our framework is also practical, we demonstrate how it can be used to refine or inform the design of heuristic fee update rules such as EIP-1559 or EIP-4844 with two case studies. We then estimate a uni-dimensional version of our model using real market data from the Ethereum blockchain and empirically compare the performance of our optimal policies to EIP-1559.
    摘要 我们开发了一个通用且实用的框架，用于解决多种区块链资源动态费用机制的最优设计问题。该框架可以计算最优的费用更新策略，在调整资源价格以应对持续性需求变化与对观测区块需求中的局部噪声保持鲁棒之间取得最优权衡。在多资源的一般情形下，我们的最优策略能正确处理资源需求之间的交叉效应（互补与替代）。我们还展示了如何利用这些交叉效应来指导资源设计：把需求侧交叉效应较低的资源打包成捆绑，可以得到更简单、更高效的价格更新规则。该框架同样具有实用性，我们通过两个案例研究展示了如何用它改进或指导 EIP-1559、EIP-4844 等启发式费用更新规则的设计。随后，我们利用以太坊区块链的真实市场数据估计了模型的一维版本，并将我们的最优策略与 EIP-1559 的性能进行了实证比较。
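
The flavor of such a dynamic fee rule can be seen in the short sketch below: an EIP-1559-style exponential base-fee update generalized to several resources through a cross-effect matrix. All coefficients (targets, learning rates, cross-effects) are illustrative placeholders rather than the paper's fitted or optimal values.

```python
import numpy as np

def update_base_fees(fees, usage, targets, learning_rate, cross_effects):
    """One multi-resource fee update.

    fees, usage, targets : (k,) current base fees, observed usage, target usage
    learning_rate        : (k,) per-resource adjustment speed
    cross_effects        : (k, k) matrix coupling resources (identity = EIP-1559
                           applied independently to each resource)
    """
    deviation = cross_effects @ ((usage - targets) / targets)
    return fees * np.exp(learning_rate * deviation)

# two resources, e.g. execution gas and blob data (illustrative numbers)
fees = np.array([10.0, 1.0])           # gwei
targets = np.array([15e6, 3.0])        # target gas used / target blob count
lr = np.array([0.125, 0.125])          # EIP-1559 uses a 1/8 adjustment quotient
C = np.array([[1.0, 0.1],              # small assumed substitutability term
              [0.1, 1.0]])

for usage in ([30e6, 2.0], [20e6, 6.0], [10e6, 1.0]):
    fees = update_base_fees(fees, np.array(usage), targets, lr, C)
    print(np.round(fees, 4))
```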

Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.12714
  • repo_url: None
  • paper_authors: Amirali Soltani Tehrani, Niloufar Faridani, Ramin Toosi
  • for: 这个研究旨在提高人机交互的深度理解,通过感知人类情感状态,提高人机交互的效果和同情性。
  • methods: 该研究提出了一种新的方法,即结合自我指导特征提取和指导分类来实现情绪识别。在预处理步骤中,我们使用基于Wav2Vec模型的自我指导特征提取器,从audio数据中捕捉音频特征。然后,输出特征图的前一步结果被传递给自定义的卷积神经网络模型进行情绪分类。
  • results: 在 ShEMO 数据集上测试时，该方法优于两个基准方法，即支持向量机分类器和基于预训练 CNN 的迁移学习模型。与当前最先进（state-of-the-art）的方法相比，该方法表现更为出色，能提供更高的情感识别水平，为人机交互带来更具同理心且更有效的交流。
    Abstract Speech Emotion Recognition (SER) plays a pivotal role in enhancing human-computer interaction by enabling a deeper understanding of emotional states across a wide range of applications, contributing to more empathetic and effective communication. This study proposes an innovative approach that integrates self-supervised feature extraction with supervised classification for emotion recognition from small audio segments. In the preprocessing step, to eliminate the need of crafting audio features, we employed a self-supervised feature extractor, based on the Wav2Vec model, to capture acoustic features from audio data. Then, the output featuremaps of the preprocessing step are fed to a custom designed Convolutional Neural Network (CNN)-based model to perform emotion classification. Utilizing the ShEMO dataset as our testing ground, the proposed method surpasses two baseline methods, i.e. support vector machine classifier and transfer learning of a pretrained CNN. comparing the propose method to the state-of-the-art methods in SER task indicates the superiority of the proposed method. Our findings underscore the pivotal role of deep unsupervised feature learning in elevating the landscape of SER, offering enhanced emotional comprehension in the realm of human-computer interactions.
    摘要 语音情感识别（SER）在人机交互中发挥着关键作用：它帮助系统更深入地理解用户的情感状态，适用于广泛的应用场景，从而带来更具同理心、更有效的沟通。本研究提出了一种创新方法，将自监督特征提取与有监督分类相结合用于情感识别。在预处理步骤中，我们采用基于 Wav2Vec 模型的自监督特征提取器，从音频数据中捕获声学特征，免去了手工设计特征的需要；随后，预处理输出的特征图被送入自定义设计的卷积神经网络（CNN）模型进行情感分类。在 ShEMO 数据集上的测试表明，所提方法优于支持向量机分类器和基于预训练 CNN 的迁移学习这两个基准方法，与 SER 任务的最先进方法相比也表现出色。这些结果强调了深度无监督特征学习在提升 SER 性能、增进人机交互中情感理解方面的重要作用。

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

  • paper_url: http://arxiv.org/abs/2309.12712
  • repo_url: https://github.com/hugomalard/big-model-only-for-hard-audios
  • paper_authors: Hugo Malard, Salah Zaiem, Robin Algayres
  • for: 这个研究的目的是提出一个可以在不同的模型大小下选择最佳的决策模组,以便在不同的内存和硬件环境下进行自动语音识别(ASR)。
  • methods: 作者使用了两个不同大小的 Whisper 模型,并将它们联合使用以构建一个决策模组。他们还使用了一些计算效率的技巧来降低决策模组的计算成本。
  • results: 作者的实验结果显示,使用这个决策模组可以实现substantial的计算成本减少,同时保持transcription的性能水平。具体来说,在两个 Whisper 模型中,使用决策模组可以降低了模型的计算成本,并且对于大多数的测试数据进行了好的调整。
    Abstract Recent progress in Automatic Speech Recognition (ASR) has been coupled with a substantial increase in the model sizes, which may now contain billions of parameters, leading to slow inferences even with adapted hardware. In this context, several ASR models exist in various sizes, with different inference costs leading to different performance levels. Based on the observation that smaller models perform optimally on large parts of testing corpora, we propose to train a decision module, that would allow, given an audio sample, to use the smallest sufficient model leading to a good transcription. We apply our approach to two Whisper models with different sizes. By keeping the decision process computationally efficient, we build a decision module that allows substantial computational savings with reduced performance drops.
    摘要 近年来自动语音识别（ASR）的进展伴随着模型规模的大幅增长，参数量可达数十亿级，即便配备相应硬件推理速度也很慢。在这一背景下，存在多种不同规模的 ASR 模型，推理开销与性能各不相同。基于“较小的模型在测试语料的大部分样本上已能达到最优表现”这一观察，我们提出训练一个决策模块：给定一段音频，选择能够产生良好转写结果的最小够用模型。我们将该方法应用于两个不同规模的 Whisper 模型。通过让决策过程本身保持计算高效，该决策模块在性能仅有小幅下降的情况下带来了可观的计算量节省。

Discovering the Interpretability-Performance Pareto Front of Decision Trees with Dynamic Programming

  • paper_url: http://arxiv.org/abs/2309.12701
  • repo_url: None
  • paper_authors: Hector Kohler, Riad Akrour, Philippe Preux
  • for: 本文提出了一种新的Markov Decision Problem(MDP)形式,用于找到最佳决策树。
  • methods: 该方法只需求解一个动态规划问题，即可计算出对应多种解释性-性能权衡的最优决策树。
  • results: 实验表明，该方法在精度和运行时间上与最先进算法相当，同时返回位于解释性-性能帕累托前沿上的一组决策树，供用户事后选择最符合需求的那棵树。
    Abstract Decision trees are known to be intrinsically interpretable as they can be inspected and interpreted by humans. Furthermore, recent hardware advances have rekindled an interest for optimal decision tree algorithms, that produce more accurate trees than the usual greedy approaches. However, these optimal algorithms return a single tree optimizing a hand defined interpretability-performance trade-off, obtained by specifying a maximum number of decision nodes, giving no further insights about the quality of this trade-off. In this paper, we propose a new Markov Decision Problem (MDP) formulation for finding optimal decision trees. The main interest of this formulation is that we can compute the optimal decision trees for several interpretability-performance trade-offs by solving a single dynamic program, letting the user choose a posteriori the tree that best suits their needs. Empirically, we show that our method is competitive with state-of-the-art algorithms in terms of accuracy and runtime while returning a whole set of trees on the interpretability-performance Pareto front.
    摘要 In this paper, we propose a new Markov Decision Problem (MDP) formulation for finding optimal decision trees. Our approach allows us to compute the optimal decision trees for multiple interpretability-performance trade-offs by solving a single dynamic program. This enables the user to choose the tree that best suits their needs after the fact. Empirical results show that our method is competitive with state-of-the-art algorithms in terms of accuracy and runtime, while providing a set of trees on the interpretability-performance Pareto front.
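
To make the interpretability-performance Pareto front concrete, the sketch below builds candidate trees with greedy CART at increasing leaf budgets and keeps the Pareto-optimal (size, accuracy) pairs. Note that this uses scikit-learn's greedy trees as a stand-in; the paper instead obtains the whole front from a single dynamic program over its MDP formulation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# candidate trees of increasing size (interpretability decreases with leaves)
candidates = []
for n_leaves in range(2, 33):
    tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0).fit(Xtr, ytr)
    candidates.append((n_leaves, tree.score(Xte, yte), tree))

# keep only Pareto-optimal (size, accuracy) pairs: no smaller tree is as accurate
pareto, best_acc = [], -1.0
for n_leaves, acc, tree in candidates:           # already sorted by size
    if acc > best_acc:
        pareto.append((n_leaves, acc, tree))
        best_acc = acc

for n_leaves, acc, _ in pareto:
    print(f"{n_leaves:2d} leaves -> test accuracy {acc:.3f}")
```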

Recurrent Temporal Revision Graph Networks

  • paper_url: http://arxiv.org/abs/2309.12694
  • repo_url: None
  • paper_authors: Yizhou Chen, Anxiang Zeng, Guangda Huzhang, Qingtao Yu, Kerui Zhang, Cao Yuanpeng, Kangle Wu, Han Yu, Zhiming Zhou
  • for: 本研究旨在提供一种更准确地模型 temporal graph 的方法,具体来说是一种基于 recurrent neural network (RNN) 的 temporal neighbor aggregation 方法,以便更好地捕捉 temporal graph 中 node 之间的关系。
  • methods: 本研究使用 RNN WITH node-wise hidden states 来集成所有历史邻居信息,从而提供更完整的邻居信息。这种方法可以在实际应用中提高 averaged precision 约 9.6% compared to existing methods。
  • results: 本研究的实际应用result 显示,使用本研究提出的方法可以在 Ecommerce dataset 中提高 averaged precision 约 9.6% compared to existing methods。这表明本研究的方法可以更好地捕捉 temporal graph 中 node 之间的关系,从而提高模型的准确性。
    Abstract Temporal graphs offer more accurate modeling of many real-world scenarios than static graphs. However, neighbor aggregation, a critical building block of graph networks, for temporal graphs, is currently straightforwardly extended from that of static graphs. It can be computationally expensive when involving all historical neighbors during such aggregation. In practice, typically only a subset of the most recent neighbors are involved. However, such subsampling leads to incomplete and biased neighbor information. To address this limitation, we propose a novel framework for temporal neighbor aggregation that uses the recurrent neural network with node-wise hidden states to integrate information from all historical neighbors for each node to acquire the complete neighbor information. We demonstrate the superior theoretical expressiveness of the proposed framework as well as its state-of-the-art performance in real-world applications. Notably, it achieves a significant +9.6% improvement on averaged precision in a real-world Ecommerce dataset over existing methods on 2-layer models.
    摘要 与静态图相比，时序图能够更准确地刻画许多真实场景。然而，作为图网络关键构件的邻居聚合，目前只是从静态图的做法直接推广到时序图：若在聚合时考虑所有历史邻居，计算代价会非常高，因此实践中通常只采样最近的一部分邻居，但这种子采样会导致邻居信息不完整且有偏。为了解决这一局限，我们提出了一个新的时序邻居聚合框架：利用带有节点级隐藏状态的循环神经网络，为每个节点整合其全部历史邻居的信息，从而获得完整的邻居信息。我们证明了该框架具有更强的理论表达能力，并在真实应用中取得了最先进的性能；特别是在一个真实电商数据集上，两层模型的平均精度比现有方法显著提升 9.6%。
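
A minimal PyTorch sketch of the core mechanism is given below: every node keeps a GRU hidden state that is updated whenever a timestamped neighbor event arrives, so all historical neighbors contribute instead of a recent-neighbor subsample. The dimensions, message features, and update rule are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentNeighborAggregator(nn.Module):
    """Node-wise recurrent state that integrates every incoming neighbor event."""

    def __init__(self, num_nodes, feat_dim, hidden_dim):
        super().__init__()
        self.cell = nn.GRUCell(2 * hidden_dim + feat_dim, hidden_dim)
        self.embed = nn.Embedding(num_nodes, hidden_dim)
        self.register_buffer("state", torch.zeros(num_nodes, hidden_dim))

    def forward(self, src, dst, edge_feat):
        """Process a batch of timestamped edges (src -> dst), assumed in time order."""
        msg = torch.cat([self.embed(src), self.embed(dst), edge_feat], dim=-1)
        # update the destination node's hidden state with the new event
        new_state = self.cell(msg, self.state[dst])
        self.state[dst] = new_state.detach()   # persist states across batches
        return new_state

agg = RecurrentNeighborAggregator(num_nodes=100, feat_dim=8, hidden_dim=16)
src = torch.randint(0, 100, (32,))
dst = torch.randint(0, 100, (32,))
feat = torch.randn(32, 8)
print(agg(src, dst, feat).shape)   # torch.Size([32, 16])
```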

OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

  • paper_url: http://arxiv.org/abs/2309.12659
  • repo_url: https://github.com/yfzhang114/onenet
  • paper_authors: Yi-Fan Zhang, Qingsong Wen, Xue Wang, Weiqi Chen, Liang Sun, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan
  • For: 本研究旨在提出一种能够高效地更新时间序列预测模型,以Addressing the concept drifting problem。* Methods: 本文提出了一种基于online convex programming框架的强化学习方法,可以动态地更新和组合两个模型,其中一个模型专注于时间维度上的关系,另一个模型则是跨变量关系。* Results: 实验结果显示,OneNet可以在线预测错误下降超过50%,至比State-Of-The-Art方法更高。
    Abstract Online updating of time series forecasting models aims to address the concept drifting problem by efficiently updating forecasting models based on streaming data. Many algorithms are designed for online time series forecasting, with some exploiting cross-variable dependency while others assume independence among variables. Given every data assumption has its own pros and cons in online time series modeling, we propose \textbf{On}line \textbf{e}nsembling \textbf{Net}work (OneNet). It dynamically updates and combines two models, with one focusing on modeling the dependency across the time dimension and the other on cross-variate dependency. Our method incorporates a reinforcement learning-based approach into the traditional online convex programming framework, allowing for the linear combination of the two models with dynamically adjusted weights. OneNet addresses the main shortcoming of classical online learning methods that tend to be slow in adapting to the concept drift. Empirical results show that OneNet reduces online forecasting error by more than $\mathbf{50\%}$ compared to the State-Of-The-Art (SOTA) method. The code is available at \url{https://github.com/yfzhang114/OneNet}.
    摘要 在线更新时间序列预测模型目的是解决概念漂移问题,通过基于流入数据的高效更新预测模型。许多算法已经为在线时间序列预测设计,其中一些利用时间维度之间的依赖关系,而其他们假设变量之间是独立的。每个数据假设都有其自己的优缺点,在在线时间序列预测中。我们提出了《Online Ensembling Network(OneNet)》,它在实时更新和组合两个模型,其中一个专门关注时间维度之间的依赖关系,另一个则关注变量之间的交叉依赖关系。我们的方法将经验学学习基于的逻辑添加到传统的在线凸programming框架中,允许在线动态调整模型之间的线性组合。OneNet可以快速适应概念漂移,并且实际结果表明,与现状技术(SOTA)方法相比,OneNet可以降低在线预测错误率高于50%。代码可以在 \url{https://github.com/yfzhang114/OneNet} 中找到。
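
The online-ensembling idea can be sketched with a much simpler combiner than the paper's reinforcement-learning-based one: below, two fixed forecasters are mixed with multiplicative (exponentiated-gradient) weight updates driven by their online losses, which lets the mixture adapt after a simulated concept drift. The learning rate and toy series are illustrative.

```python
import numpy as np

def combine_online(preds_a, preds_b, targets, eta=2.0):
    """Online convex combination of two forecasters with multiplicative updates."""
    w = np.array([0.5, 0.5])
    combined = []
    for pa, pb, y in zip(preds_a, preds_b, targets):
        yhat = w[0] * pa + w[1] * pb
        combined.append(yhat)
        losses = np.array([(pa - y) ** 2, (pb - y) ** 2])
        w = w * np.exp(-eta * losses)          # down-weight the worse model
        w = w / w.sum()
    return np.array(combined), w

rng = np.random.default_rng(0)
t = np.arange(300)
y = np.sin(0.1 * t) + 0.1 * rng.normal(size=t.size)
model_a = np.sin(0.1 * t)                       # good before the drift
model_b = np.sin(0.1 * t + 0.5)                 # good after the drift
y[150:] = np.sin(0.1 * t[150:] + 0.5)           # concept drift at t = 150

yhat, w = combine_online(model_a, model_b, y)
print("final weights:", np.round(w, 3))
print("MSE:", np.mean((yhat - y) ** 2))
```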

Neural Operator Variational Inference based on Regularized Stein Discrepancy for Deep Gaussian Processes

  • paper_url: http://arxiv.org/abs/2309.12658
  • repo_url: None
  • paper_authors: Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng
  • for: 这个研究旨在提出一种基于神经网络的 Variational Inference 方法,用于深度 Gaussian Process (DGP) 模型中的 Bayesian 推断。
  • methods: 方法使用神经网络生成器,实现了对真 posterior 的过程独立推断,并使用 Monte Carlo 估计和抽样数据点估计技术来解决问题。
  • results: 实验结果显示,提出的方法可以实现高精度和高速度的推断,并在许多数据集上实现了比 SOTA Gaussian process 方法更高的分类精度。此外,方法可以 theoretically 控制预测误差,并在各种数据集上展示了优异的表现。
    Abstract Deep Gaussian Process (DGP) models offer a powerful nonparametric approach for Bayesian inference, but exact inference is typically intractable, motivating the use of various approximations. However, existing approaches, such as mean-field Gaussian assumptions, limit the expressiveness and efficacy of DGP models, while stochastic approximation can be computationally expensive. To tackle these challenges, we introduce Neural Operator Variational Inference (NOVI) for Deep Gaussian Processes. NOVI uses a neural generator to obtain a sampler and minimizes the Regularized Stein Discrepancy in L2 space between the generated distribution and true posterior. We solve the minimax problem using Monte Carlo estimation and subsampling stochastic optimization techniques. We demonstrate that the bias introduced by our method can be controlled by multiplying the Fisher divergence with a constant, which leads to robust error control and ensures the stability and precision of the algorithm. Our experiments on datasets ranging from hundreds to tens of thousands demonstrate the effectiveness and the faster convergence rate of the proposed method. We achieve a classification accuracy of 93.56 on the CIFAR10 dataset, outperforming SOTA Gaussian process methods. Furthermore, our method guarantees theoretically controlled prediction error for DGP models and demonstrates remarkable performance on various datasets. We are optimistic that NOVI has the potential to enhance the performance of deep Bayesian nonparametric models and could have significant implications for various practical applications
    摘要 深度高斯过程（DGP）模型为贝叶斯推断提供了一种强大的非参数方法，但其精确推断通常难以实现，因此需要各种近似。然而，现有方法（如均值场高斯假设）限制了 DGP 模型的表达能力与有效性，而随机近似的计算代价又可能很高。为了应对这些挑战，我们为深度高斯过程提出了神经算子变分推断（NOVI）。NOVI 使用神经生成器获得采样器，并在 L2 空间中最小化生成分布与真实后验之间的正则化 Stein 差异，利用蒙特卡洛估计和子采样随机优化技术求解该极小极大问题。我们证明，方法引入的偏差可以通过将 Fisher 散度乘以一个常数来控制，从而实现稳健的误差控制，保证算法的稳定性和精度。在从数百到数万规模不等的数据集上的实验展示了该方法的有效性和更快的收敛速度：在 CIFAR10 数据集上取得了 93.56% 的分类精度，超过了最先进的高斯过程方法。此外，该方法能为 DGP 模型提供理论上可控的预测误差，并在多个数据集上表现出色。我们相信 NOVI 有潜力提升深度贝叶斯非参数模型的性能，并对多种实际应用具有重要意义。

Sequential Action-Induced Invariant Representation for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.12628
  • repo_url: https://github.com/dmu-xmu/sar
  • paper_authors: Dayang Liang, Qihang Chen, Yunlong Liu
  • for: 提高visual reinforcement learning中 task-relevant state representation的学习精度,并在受到视觉干扰的环境中实现更好的性能。
  • methods: 基于bisimulation metric、prediction、contrast和重建等方法,提出Sequential Action–induced invariant Representation(SAR)方法,通过控制信号驱动encoder的优化,使代理人能够学习对干扰免疫的表示。
  • results: 在DeepMind Control suite任务上实现了最佳baseline的性能,并在实际的CARLA自动驾驶中证明了方法的有效性。 Code和示例视频可以在https://github.com/DMU-XMU/SAR.git中找到。
    Abstract How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability for task-relevant information extraction. However, due to the lack of appropriate mechanisms for the extraction of task information in the prediction, contrast, and reconstruction-related approaches and the limitations of bisimulation-related methods in domains with sparse rewards, it is still difficult for these methods to be effectively extended to environments with distractions. To alleviate these problems, in the paper, the action sequences, which contain task-intensive signals, are incorporated into representation learning. Specifically, we propose a Sequential Action--induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions, so the agent can be induced to learn the robust representation against distractions. We conduct extensive experiments on the DeepMind Control suite tasks with distractions while achieving the best performance over strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by deploying SAR to real-world CARLA-based autonomous driving with natural distractions. Finally, we provide the analysis results of generalization drawn from the generalization decay and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.
    摘要 如何在存在视觉干扰的情况下，从高维观测中准确学习与任务相关的状态表示，是视觉强化学习中一个现实且具有挑战性的问题。近来，基于互模拟（bisimulation）度量、对比、预测和重构的无监督表示学习方法已展现出提取任务相关信息的能力。然而，预测、对比和重构类方法缺乏提取任务信息的恰当机制，互模拟类方法又在稀疏奖励的环境中存在局限，因此这些方法仍难以有效推广到带干扰的环境。为缓解这些问题，本文将蕴含密集任务信号的动作序列引入表示学习，提出了序列动作诱导的不变表示（SAR）方法：编码器由一个辅助学习器优化，只保留跟随序列动作控制信号的成分，从而促使智能体学习对干扰鲁棒的表示。我们在带干扰的 DeepMind Control suite 任务上进行了大量实验，取得了超越强基线的最佳表现；并通过将 SAR 部署到带有自然干扰的 CARLA 真实自动驾驶场景，证明了其忽略任务无关信息的能力。最后，我们给出了基于泛化衰减和 t-SNE 可视化的泛化分析结果。代码和演示视频见 https://github.com/DMU-XMU/SAR.git。

Data-driven Preference Learning Methods for Multiple Criteria Sorting with Temporal Criteria

  • paper_url: http://arxiv.org/abs/2309.12620
  • repo_url: None
  • paper_authors: Li Yijun, Guo Mengzhuo, Zhang Qingpeng
  • for: 本研究旨在提出新的偏好学习方法,用于多 criterion 排序问题中的时间序列数据处理。
  • methods: 本研究首先构建了一个带固定时间折扣因子、处于正则化框架内的凸二次规划模型；并提出了一种集成学习算法，可将多个可能较弱的优化器的输出整合起来，且能通过并行计算高效执行。此外，本研究还提出了一种新的单调循环神经网络（mRNN），用于捕捉偏好随时间演化的动态，同时保持多准则排序问题的关键性质，如准则单调性、偏好独立性以及类别的自然顺序。
  • results: 对于 synthetic 数据和一个实际案例(关于分类用户在 mobil 游戏中的历史行为序列),实验结果表明,提出的模型在与基准方法(包括机器学习、深度学习和传统多 criterion 排序方法)进行比较时,表现出了显著的性能改进。
    Abstract The advent of predictive methodologies has catalyzed the emergence of data-driven decision support across various domains. However, developing models capable of effectively handling input time series data presents an enduring challenge. This study presents novel preference learning approaches to multiple criteria sorting problems in the presence of temporal criteria. We first formulate a convex quadratic programming model characterized by fixed time discount factors, operating within a regularization framework. Additionally, we propose an ensemble learning algorithm designed to consolidate the outputs of multiple, potentially weaker, optimizers, a process executed efficiently through parallel computation. To enhance scalability and accommodate learnable time discount factors, we introduce a novel monotonic Recurrent Neural Network (mRNN). It is designed to capture the evolving dynamics of preferences over time while upholding critical properties inherent to MCS problems, including criteria monotonicity, preference independence, and the natural ordering of classes. The proposed mRNN can describe the preference dynamics by depicting marginal value functions and personalized time discount factors along with time, effectively amalgamating the interpretability of traditional MCS methods with the predictive potential offered by deep preference learning models. Comprehensive assessments of the proposed models are conducted, encompassing synthetic data scenarios and a real-case study centered on classifying valuable users within a mobile gaming app based on their historical in-app behavioral sequences. Empirical findings underscore the notable performance improvements achieved by the proposed models when compared to a spectrum of baseline methods, spanning machine learning, deep learning, and conventional multiple criteria sorting approaches.
    摘要 “预测方法的出现刺激了不同领域的数据驱动决策。然而,处理时间序列资料的模型建立仍然是一个持续的挑战。本研究提出了一些新的偏好学习方法,用于多个条件中的排序问题,包括时间条件。我们首先建立了一个固定时间折冲因子的对称quadratic programming模型,并在一个调整框架下进行运算。此外,我们提出了一个ensemble学习算法,用于结合多个、可能的弱来调整器的output,这个过程通过平行计算进行高效执行。为了增强可扩展性和可学习时间折冲因子,我们引入了一个新的对称复环神经网络(mRNN)。这个mRNN可以捕捉时间的演进 Dynamics 的偏好,同时维持多个条件问题的核心性质,包括条件单调性、偏好独立性和时间条件下的自然顺序。提出的mRNN可以描述偏好动态,包括时间条件下的贡献值函数和对个人时间折冲因子的描述,实现了传统多个条件排序方法的解释性和深度偏好学习模型的预测能力。实验结果显示,提出的模型在 synthetic 数据enario 和一个实际的移动游戏APP用户评分案例中均表现出色,与一系列基准方法相比,包括机器学习、深度学习和传统多个条件排序方法。”

Zero-Regret Performative Prediction Under Inequality Constraints

  • paper_url: http://arxiv.org/abs/2309.12618
  • repo_url: None
  • paper_authors: Wenjing Yan, Xuanyu Cao
  • for: 本文研究了受约束的performative预测问题,即预测结果会影响未来数据分布的问题。
  • methods: 本文提出了一种robust预测框架,可以在约束条件下实现高效的预测。此外,本文还提出了一种适应预测算法,可以在各种场景下实现优化的预测。
  • results: 本文的分析表明，所提出的自适应 primal-dual 算法仅使用 $\sqrt{T} + 2T$ 个样本，即可实现 $\mathcal{O}(\sqrt{T})$ 的遗憾（regret）和约束违反量。据我们所知，这是首次对不等式约束下 performative 预测问题的最优性进行研究和分析。
    Abstract Performative prediction is a recently proposed framework where predictions guide decision-making and hence influence future data distributions. Such performative phenomena are ubiquitous in various areas, such as transportation, finance, public policy, and recommendation systems. To date, work on performative prediction has only focused on unconstrained scenarios, neglecting the fact that many real-world learning problems are subject to constraints. This paper bridges this gap by studying performative prediction under inequality constraints. Unlike most existing work that provides only performative stable points, we aim to find the optimal solutions. Anticipating performative gradients is a challenging task, due to the agnostic performative effect on data distributions. To address this issue, we first develop a robust primal-dual framework that requires only approximate gradients up to a certain accuracy, yet delivers the same order of performance as the stochastic primal-dual algorithm without performativity. Based on this framework, we then propose an adaptive primal-dual algorithm for location families. Our analysis demonstrates that the proposed adaptive primal-dual algorithm attains $\mathcal{O}(\sqrt{T})$ regret and constraint violations, using only $\sqrt{T} + 2T$ samples, where $T$ is the time horizon. To our best knowledge, this is the first study and analysis on the optimality of the performative prediction problem under inequality constraints. Finally, we validate the effectiveness of our algorithm and theoretical results through numerical simulations.
    摘要 Performative 预测是近来提出的一种框架：预测结果指导决策，进而影响未来的数据分布。这类 performative 现象在交通、金融、公共政策和推荐系统等领域十分普遍。然而，现有关于 performative 预测的工作都只考虑无约束场景，忽略了现实中许多学习问题受约束限制这一事实。本文通过研究不等式约束下的 performative 预测来填补这一空白。不同于大多数仅给出 performative 稳定点的现有工作，我们的目标是找到最优解。由于 performative 效应对数据分布的影响未知，预估 performative 梯度十分困难。为此，我们首先提出一个稳健的 primal-dual 框架：它只需要达到一定精度的近似梯度，却能获得与无 performative 效应时的随机 primal-dual 算法同阶的性能。在此框架基础上，我们进一步针对位置族分布提出了一种自适应 primal-dual 算法。分析表明，该算法仅使用 $\sqrt{T} + 2T$ 个样本即可达到 $\mathcal{O}(\sqrt{T})$ 的遗憾和约束违反量，其中 $T$ 为时间范围。据我们所知，这是首次针对不等式约束下 performative 预测问题的最优性进行的研究与分析。最后，我们通过数值仿真验证了算法及理论结果的有效性。

ARRQP: Anomaly Resilient Real-time QoS Prediction Framework with Graph Convolution

  • paper_url: http://arxiv.org/abs/2310.02269
  • repo_url: None
  • paper_authors: Suraj Kumar, Soumi Chattopadhyay
  • for: 本研究旨在提高现代面向服务架构中服务质量（QoS）的预测精度，使用户能够依据预测结果做出明智的决策。
  • methods: 所提出的预测框架（名为 ARRQP）利用图卷积技术捕捉用户与服务之间的复杂关系和依赖，即使数据有限或稀疏也能奏效；ARRQP 同时融合上下文信息与协同信息，以全面理解用户-服务交互。
  • results: 在 WS-DREAM 基准数据集上的实验表明，该框架能够准确且及时地预测 QoS，并在多种异常情况下保持较强的鲁棒性。
    Abstract In the realm of modern service-oriented architecture, ensuring Quality of Service (QoS) is of paramount importance. The ability to predict QoS values in advance empowers users to make informed decisions. However, achieving accurate QoS predictions in the presence of various issues and anomalies, including outliers, data sparsity, grey-sheep instances, and cold-start scenarios, remains a challenge. Current state-of-the-art methods often fall short when addressing these issues simultaneously, resulting in performance degradation. In this paper, we introduce a real-time QoS prediction framework (called ARRQP) with a specific emphasis on improving resilience to anomalies in the data. ARRQP utilizes the power of graph convolution techniques to capture intricate relationships and dependencies among users and services, even when the data is limited or sparse. ARRQP integrates both contextual information and collaborative insights, enabling a comprehensive understanding of user-service interactions. By utilizing robust loss functions, ARRQP effectively reduces the impact of outliers during the model training. Additionally, we introduce a sparsity-resilient grey-sheep detection method, which is subsequently treated separately for QoS prediction. Furthermore, we address the cold-start problem by emphasizing contextual features over collaborative features. Experimental results on the benchmark WS-DREAM dataset demonstrate the framework's effectiveness in achieving accurate and timely QoS predictions.
    摘要 在现代服务套件架构中,保证服务质量(QoS)的重要性不言而喻。预测QoS值的能力使用户做出了 Informed 决策。然而,在面临各种问题和异常情况,包括异常值、数据稀缺、灰羊实例和冷启动场景时,实现准确的QoS预测仍然是一大挑战。当前的状态艺术方法经常在同时处理这些问题时表现不佳,导致性能下降。在这篇论文中,我们提出了一个实时QoS预测框架(叫做ARRQP),强调改进数据中异常现象的抗逆性。ARRQP利用图 convolution 技术捕捉用户和服务之间的复杂关系和依赖关系,即使数据稀缺或异常。ARRQP结合了上下文信息和协同知识,使得用户-服务交互的全面理解。通过使用robust 损失函数,ARRQP减少了模型训练中异常值的影响。此外,我们还提出了稀缺灰羊检测方法,并将其与QoS预测分开处理。此外,我们解决冷启动问题,强调上下文特征而不是协同特征。实验结果表明,ARRQP在WS-DREAM 数据集上实现了准确和时间性的QoS预测。
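
The outlier-robustness ingredient can be isolated in a small sketch: a user/service embedding model trained with a Huber loss, so extreme QoS observations contribute bounded gradients. The graph convolution, contextual features, and grey-sheep handling of ARRQP are omitted, and the sizes and data below are synthetic placeholders.

```python
import torch
import torch.nn as nn

class QoSModel(nn.Module):
    def __init__(self, n_users, n_services, dim=16):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.service = nn.Embedding(n_services, dim)
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u, s):
        return (self.user(u) * self.service(s)).sum(-1) + self.bias

n_users, n_services = 300, 500
model = QoSModel(n_users, n_services)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.HuberLoss(delta=1.0)    # bounded gradient for outlier observations

# toy interactions: (user, service, observed response time), some with outliers
u = torch.randint(0, n_users, (4096,))
s = torch.randint(0, n_services, (4096,))
qos = torch.rand(4096) * 2.0
qos[::100] += 30.0                   # injected outliers

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(u, s), qos)
    loss.backward()
    opt.step()
print(float(loss))
```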

Multiply Robust Federated Estimation of Targeted Average Treatment Effects

  • paper_url: http://arxiv.org/abs/2309.12600
  • repo_url: None
  • paper_authors: Larry Han, Zhu Shen, Jose Zubizarreta
  • for: 本文旨在利用多站点数据，为目标人群得出有效的因果推断。
  • methods: 本文提出了一种新颖的联邦方法：通过构建多重稳健且保护隐私的干扰函数估计来校正站点间的协变量偏移与协变量不匹配，并利用迁移学习估计集成权重，以融合来自各源站点的信息。
  • results: 研究结果表明，该方法所学得的权重在不同情形下是高效且最优的；在有限样本下，其效率和稳健性均优于现有方法。
    Abstract Federated or multi-site studies have distinct advantages over single-site studies, including increased generalizability, the ability to study underrepresented populations, and the opportunity to study rare exposures and outcomes. However, these studies are challenging due to the need to preserve the privacy of each individual's data and the heterogeneity in their covariate distributions. We propose a novel federated approach to derive valid causal inferences for a target population using multi-site data. We adjust for covariate shift and covariate mismatch between sites by developing multiply-robust and privacy-preserving nuisance function estimation. Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites. We show that these learned weights are efficient and optimal under different scenarios. We showcase the finite sample advantages of our approach in terms of efficiency and robustness compared to existing approaches.
    摘要 与单站点研究相比，联邦（多站点）研究具有明显优势，包括更强的可推广性、能够研究代表性不足的人群，以及有机会研究罕见的暴露与结局。然而，此类研究需要保护每位个体数据的隐私，且各站点的协变量分布存在异质性，因而颇具挑战。我们提出了一种新的联邦方法，利用多站点数据为目标人群得出有效的因果推断。我们通过构建多重稳健且保护隐私的干扰函数估计，来校正站点间的协变量偏移与协变量不匹配；方法还利用迁移学习估计集成权重，以融合来自各源站点的信息。我们证明了这些学得的权重在不同情形下是高效且最优的，并展示了该方法在有限样本下相较现有方法在效率和稳健性方面的优势。

Learning algorithms for identification of whisky using portable Raman spectroscopy

  • paper_url: http://arxiv.org/abs/2309.13087
  • repo_url: None
  • paper_authors: Kwang Jun Lee, Alexander C. Trowbridge, Graham D. Bruce, George O. Dwapanyin, Kylie R. Dunning, Kishan Dholakia, Erik P. Schartner
  • for: 鉴定高值饮料的可靠性是一个日益重要的领域,因为问题如品牌替换(即虚假产品)和质量控制对行业是关键的。
  • methods: 我们检查了一系列机器学习算法,并将其直接与可携带式拉曼谱仪device进行了交互,以 both identify和 characterize commercial whisky samples的 ethanol/methanol浓度。
  • results: 我们示出了机器学习模型可以在二十八个商业样本中实现超过99%的品牌认定率。此外,我们还使用了同样的样本和算法来量化 ethanol浓度,以及在杂入 whisky 样本中测量 methanol 水平。我们的机器学习技术然后与通过瓶装置进行spectral analysis和标识,不需要样本从原始容器中抽取,这表明了这种方法在检测假冒或杂入饮料和其他高值液体样本中的实际潜力。
    Abstract Reliable identification of high-value products such as whisky is an increasingly important area, as issues such as brand substitution (i.e. fraudulent products) and quality control are critical to the industry. We have examined a range of machine learning algorithms and interfaced them directly with a portable Raman spectroscopy device to both identify and characterize the ethanol/methanol concentrations of commercial whisky samples. We demonstrate that machine learning models can achieve over 99% accuracy in brand identification across twenty-eight commercial samples. To demonstrate the flexibility of this approach we utilised the same samples and algorithms to quantify ethanol concentrations, as well as measuring methanol levels in spiked whisky samples. Our machine learning techniques are then combined with a through-the-bottle method to perform spectral analysis and identification without requiring the sample to be decanted from the original container, showing the practical potential of this approach to the detection of counterfeit or adulterated spirits and other high value liquid samples.
    摘要 可靠地鉴别威士忌等高价值产品正变得日益重要，因为品牌替换（即假冒产品）和质量控制等问题对该行业至关重要。我们考察了一系列机器学习算法，并将其直接与便携式拉曼光谱设备对接，用于识别商业威士忌样本并表征其乙醇/甲醇浓度。结果表明，机器学习模型在 28 个商业样本上可实现超过 99% 的品牌识别准确率。为了展示该方法的灵活性，我们使用相同的样本和算法来量化乙醇浓度，并测量掺假威士忌样本中的甲醇含量。随后，我们将机器学习技术与“隔瓶”测量方法结合，无需把样本从原始容器中倒出即可完成光谱分析与识别，显示了该方法在检测假冒或掺假烈酒及其他高价值液体样本方面的实际潜力。
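
A minimal scikit-learn sketch of the classification stage is shown below: a standardize, PCA, SVM pipeline cross-validated on labelled spectra. The synthetic Gaussian-peak spectra merely stand in for real Raman measurements, and the pipeline is an assumed baseline rather than the authors' exact model.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_brands, per_brand, n_wavenumbers = 28, 20, 600

# synthetic spectra: each brand has a characteristic peak pattern plus noise
grid = np.linspace(0, 1, n_wavenumbers)
spectra, labels = [], []
for b in range(n_brands):
    centers = rng.uniform(0.1, 0.9, size=3)
    template = sum(np.exp(-((grid - c) ** 2) / 2e-4) for c in centers)
    spectra.append(template + 0.05 * rng.normal(size=(per_brand, n_wavenumbers)))
    labels.append(np.full(per_brand, b))
X = np.vstack(spectra)
y = np.concatenate(labels)

clf = make_pipeline(StandardScaler(), PCA(n_components=40), SVC(C=10.0))
scores = cross_val_score(clf, X, y, cv=5)
print("brand identification accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```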

Sampling-Frequency-Independent Universal Sound Separation

  • paper_url: http://arxiv.org/abs/2309.12581
  • repo_url: None
  • paper_authors: Tomohiko Nakamura, Kohei Yatabe
  • for: 这个论文提出了一种能够处理未经训练的采样频率(SF)的通用声音分离(USS)方法,用于分离不同类型的源 signal。
  • methods: 该方法使用了我们之前提出的SF独立(SFI)扩展,使用SFI convolutional layers来处理不同SF。
  • results: 实验表明,信号重采样可能会降低USS性能,而我们提出的方法在不同SF下表现更一致。
    Abstract This paper proposes a universal sound separation (USS) method capable of handling untrained sampling frequencies (SFs). The USS aims at separating arbitrary sources of different types and can be the key technique to realize a source separator that can be universally used as a preprocessor for any downstream tasks. To realize a universal source separator, there are two essential properties: universalities with respect to source types and recording conditions. The former property has been studied in the USS literature, which has greatly increased the number of source types that can be handled by a single neural network. However, the latter property (e.g., SF) has received less attention despite its necessity. Since the SF varies widely depending on the downstream tasks, the universal source separator must handle a wide variety of SFs. In this paper, to encompass the two properties, we propose an SF-independent (SFI) extension of a computationally efficient USS network, SuDoRM-RF. The proposed network uses our previously proposed SFI convolutional layers, which can handle various SFs by generating convolutional kernels in accordance with an input SF. Experiments show that signal resampling can degrade the USS performance and the proposed method works more consistently than signal-resampling-based methods for various SFs.
    摘要 本文提出了一种能够处理未经训练的采样频率（SF）的通用声源分离（USS）方法。USS 的目标是分离任意类型的声源，可成为适用于任何下游任务的通用前处理分离器的关键技术。要实现这样的通用分离器，需要两方面的通用性：对声源类型的通用性和对录音条件的通用性。前者已在 USS 文献中得到大量研究，但后者（例如 SF）尽管十分必要，却较少受到关注。由于下游任务所需的 SF 差异很大，通用的声源分离器必须能处理多种 SF。为此，本文对计算高效的 USS 网络 SuDoRM-RF 提出了一种 SF 无关（SFI）的扩展：利用我们此前提出的 SFI 卷积层，根据输入 SF 生成相应的卷积核，从而处理各种 SF。实验表明，信号重采样可能会降低 USS 性能，而所提方法在不同 SF 下表现更加稳定。
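
The sampling-frequency-independent idea can be sketched as follows: instead of storing a digital kernel directly, the layer keeps continuous-time (analog) parameters and samples them on a time grid defined by whatever SF the input arrives with. The Gaussian prototype below is an illustrative stand-in for the learned analog filter parameterization used in the SFI literature, not the authors' exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFIConv1d(nn.Module):
    """Conv layer whose kernel is sampled from a continuous-time prototype."""

    def __init__(self, channels, support_sec=0.005):
        super().__init__()
        # learnable analog parameters: per-channel center (s) and log-width (s)
        self.center = nn.Parameter(torch.zeros(channels))
        self.log_width = nn.Parameter(torch.full((channels,), -7.0))
        self.support_sec = support_sec
        self.channels = channels

    def forward(self, x, sample_rate):
        # sample the analog prototype on a time grid defined by the input SF
        n_taps = int(self.support_sec * sample_rate) | 1          # odd length
        t = (torch.arange(n_taps) - n_taps // 2) / sample_rate     # seconds
        width = self.log_width.exp().unsqueeze(1)
        kernel = torch.exp(-0.5 * ((t - self.center.unsqueeze(1)) / width) ** 2)
        kernel = kernel / kernel.sum(dim=1, keepdim=True)          # unit DC gain
        kernel = kernel.unsqueeze(1)                                # (C, 1, K)
        return F.conv1d(x, kernel, padding=n_taps // 2, groups=self.channels)

layer = SFIConv1d(channels=4)
for sf in (8000, 16000, 44100):
    x = torch.randn(1, 4, sf // 10)        # 100 ms of audio at this SF
    print(sf, layer(x, sf).shape)
```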

SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling

  • paper_url: http://arxiv.org/abs/2309.12578
  • repo_url: None
  • paper_authors: Bokyeong Yoon, Yoonsang Han, Gordon Euhyun Moon
  • for: 这篇论文旨在提高Transformer模型的训练效率和内存压缩,以提高模型的运算效率和评估质量。
  • methods: 本论文提出了一种新的 Transformer 稀疏化方案，结合卷积滤波器与洪水填充（flood filling）方法，高效地捕捉注意力操作中逐层的稀疏模式。
  • results: 实验结果显示，该方法能在训练阶段降低 Transformer 的计算复杂度与内存占用，同时保持评估质量；在 GPU 上实现的逐层稀疏注意力算法，相比现有最先进的稀疏 Transformer 模型可实现最高 3.08 倍的加速。
    Abstract Sparsifying the Transformer has garnered considerable interest, as training the Transformer is very computationally demanding. Prior efforts to sparsify the Transformer have either used a fixed pattern or data-driven approach to reduce the number of operations involving the computation of multi-head attention, which is the main bottleneck of the Transformer. However, existing methods suffer from inevitable problems, such as the potential loss of essential sequence features due to the uniform fixed pattern applied across all layers, and an increase in the model size resulting from the use of additional parameters to learn sparsity patterns in attention operations. In this paper, we propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method to efficiently capture the layer-wise sparse pattern in attention operations. Our sparsification approach reduces the computational complexity and memory footprint of the Transformer during training. Efficient implementations of the layer-wise sparsified attention algorithm on GPUs are developed, demonstrating a new SPION that achieves up to 3.08X speedup over existing state-of-the-art sparse Transformer models, with better evaluation quality.
    摘要 减少Transformer的计算复杂性得到了广泛关注,因为训练Transformer很计算昂贵。先前的减少方法包括使用固定模式或数据驱动方法来减少多头注意力计算的数量,但现有方法受到不可避免的问题,如所有层都应用 uniform 固定模式,导致可能丢失重要的序列特征,并且使用更多参数来学习注意力操作的缺省模式。在这篇论文中,我们提出了一种新的减少方案,将 convolution 筛选器和淹水填充方法结合使用,以高效地捕捉层 wise sparse 模式在注意力操作中。我们的减少方法可以在训练过程中降低Transformer的计算复杂性和内存占用。我们实现了层 wise 减少的注意力算法在GPU上,并达到了3.08倍的速度提升,与评价质量相对较好的现有 sparse Transformer 模型相比。
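
To illustrate the flood-filling idea, the sketch below thresholds an attention score map, seeds from its strongest entries, and grows connected regions with a BFS flood fill to produce a sparsity mask. The seeding rule and thresholds are illustrative assumptions, not SPION's exact layer-wise procedure.

```python
from collections import deque
import numpy as np

def flood_fill_mask(scores, seed_quantile=0.99, grow_threshold=0.4):
    """Grow a sparse attention mask around strong entries of a score map."""
    L = scores.shape[0]
    mask = np.zeros_like(scores, dtype=bool)
    seeds = np.argwhere(scores >= np.quantile(scores, seed_quantile))
    queue = deque(map(tuple, seeds))
    mask[tuple(seeds.T)] = True
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < L and 0 <= nj < L and not mask[ni, nj] \
                    and scores[ni, nj] >= grow_threshold * scores[i, j]:
                mask[ni, nj] = True
                queue.append((ni, nj))
    return mask

rng = np.random.default_rng(0)
L = 64
scores = rng.random((L, L)) * 0.1
scores[:8, :8] += 0.8                  # a dense local attention block
scores += np.eye(L)                    # strong (near-)diagonal attention
mask = flood_fill_mask(scores)
print("kept fraction of attention entries:", mask.mean())
```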

Enhancing Network Resilience through Machine Learning-powered Graph Combinatorial Optimization: Applications in Cyber Defense and Information Diffusion

  • paper_url: http://arxiv.org/abs/2310.10667
  • repo_url: None
  • paper_authors: Diksha Goel
  • For: This paper focuses on developing effective approaches for enhancing network resilience in cyber defense and information diffusion application domains.* Methods: The paper transforms the problems of discovering bottleneck edges and structural hole spanner nodes into graph-combinatorial optimization problems and designs machine learning-based approaches to discover bottleneck points vital for network resilience.* Results: The paper aims to provide effective, efficient, and scalable techniques for enhancing network resilience in specific application domains.Here is the simplified Chinese version of the three key information points:
  • for: 这篇论文关注于在网络防御和信息传播应用领域中提高网络可恢复性。
  • methods: 论文将瓶颈边缘和结构孔挫节点的问题转化为图谱-组合优化问题,并采用机器学习方法来找出网络中瓶颈点。
  • results: 论文目标是为特定应用领域提供有效、高效和可扩展的网络可恢复性提高方法。
    Abstract With the burgeoning advancements of computing and network communication technologies, network infrastructures and their application environments have become increasingly complex. Due to the increased complexity, networks are more prone to hardware faults and highly susceptible to cyber-attacks. Therefore, for rapidly growing network-centric applications, network resilience is essential to minimize the impact of attacks and to ensure that the network provides an acceptable level of services during attacks, faults or disruptions. In this regard, this thesis focuses on developing effective approaches for enhancing network resilience. Existing approaches for enhancing network resilience emphasize on determining bottleneck nodes and edges in the network and designing proactive responses to safeguard the network against attacks. However, existing solutions generally consider broader application domains and possess limited applicability when applied to specific application areas such as cyber defense and information diffusion, which are highly popular application domains among cyber attackers. This thesis aims to design effective, efficient and scalable techniques for discovering bottleneck nodes and edges in the network to enhance network resilience in cyber defense and information diffusion application domains. We first investigate a cyber defense graph optimization problem, i.e., hardening active directory systems by discovering bottleneck edges in the network. We then study the problem of identifying bottleneck structural hole spanner nodes, which are crucial for information diffusion in the network. We transform both problems into graph-combinatorial optimization problems and design machine learning based approaches for discovering bottleneck points vital for enhancing network resilience.
    摘要 随着计算与网络通信技术的蓬勃发展，网络基础设施及其应用环境变得日益复杂，因而更容易出现硬件故障，也更易遭受网络攻击。对于快速增长的以网络为中心的应用而言，网络韧性（network resilience）至关重要：它能将攻击的影响降到最低，并确保网络在遭受攻击、故障或中断时仍能提供可接受的服务水平。为此，本论文致力于开发有效的网络韧性提升方法。现有方法通常侧重于确定网络中的瓶颈节点和瓶颈边，并设计主动响应以抵御攻击；然而这些方案多面向较宽泛的应用领域，当应用到网络防御和信息传播等深受攻击者青睐的具体领域时适用性有限。本论文旨在为网络防御和信息传播应用领域设计有效、高效且可扩展的瓶颈节点与瓶颈边发现技术，以提升网络韧性。我们首先研究了一个网络防御图优化问题，即通过发现网络中的瓶颈边来加固活动目录（Active Directory）系统；随后研究了识别瓶颈结构洞跨越（structural hole spanner）节点的问题，这类节点对网络中的信息传播至关重要。我们将这两个问题转化为图组合优化问题，并设计基于机器学习的方法来发现对网络韧性至关重要的瓶颈点。
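
As a point of reference for what the learned models are trying to predict, the networkx sketch below computes the classical combinatorial quantities on a toy graph: edge betweenness centrality to surface bottleneck edges, and node betweenness as a rough screen for structural hole spanner candidates. Exact betweenness is what becomes too expensive at scale, which motivates the ML-based approaches.

```python
import networkx as nx

# two dense communities joined by a single bridge edge (the bottleneck)
G = nx.barbell_graph(10, 0)

edge_bc = nx.edge_betweenness_centrality(G)
top_edges = sorted(edge_bc.items(), key=lambda kv: kv[1], reverse=True)[:3]
for (u, v), score in top_edges:
    print(f"edge ({u}, {v}) betweenness = {score:.3f}")

# structural hole spanners can be screened similarly with node betweenness
node_bc = nx.betweenness_centrality(G)
print("top spanner candidate:", max(node_bc, key=node_bc.get))
```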

A Simple Illustration of Interleaved Learning using Kalman Filter for Linear Least Squares

  • paper_url: http://arxiv.org/abs/2310.03751
  • repo_url: None
  • paper_authors: Majnu John, Yihren Wu
  • for: 以基于卡尔曼滤波的线性最小二乘为例，演示机器学习算法中的交错学习（interleaved learning）机制。
  • methods: 在一个简单的统计与优化框架中，使用卡尔曼滤波求解线性最小二乘问题，并以交错方式送入数据来展示该机制。
  • results: 通过一个简单的示例说明了交错学习机制的工作方式。
    Abstract Interleaved learning in machine learning algorithms is a biologically inspired training method with promising results. In this short note, we illustrate the interleaving mechanism via a simple statistical and optimization framework based on Kalman Filter for Linear Least Squares.
    摘要 机器学习算法中的交错学习（interleaved learning）是一种受生物启发的训练方法，已展现出可观的效果。本短文通过一个基于卡尔曼滤波求解线性最小二乘的简单统计与优化框架，来演示交错机制。
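
A minimal numpy sketch of the note's setting is given below: recursive least squares implemented as Kalman-filter updates, with mini-batches from two synthetic "tasks" interleaved so that information from earlier batches is retained in the posterior covariance. The interleaving schedule, noise levels, and data are illustrative.

```python
import numpy as np

def kalman_ls_update(theta, P, X, y, noise_var=0.25):
    """One Kalman-filter update of linear regression weights on a mini-batch."""
    S = X @ P @ X.T + noise_var * np.eye(len(y))     # innovation covariance
    K = P @ X.T @ np.linalg.inv(S)                   # Kalman gain
    theta = theta + K @ (y - X @ theta)
    P = P - K @ X @ P
    return theta, P

rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])
X_a = rng.normal(size=(200, 3)); y_a = X_a @ true_theta + 0.5 * rng.normal(size=200)
X_b = rng.normal(size=(200, 3)) + 3.0                # second "task": shifted inputs
y_b = X_b @ true_theta + 0.5 * rng.normal(size=200)

theta = np.zeros(3)
P = 10.0 * np.eye(3)                                 # prior covariance
for step in range(40):                               # interleave tasks A and B
    Xs, ys = (X_a, y_a) if step % 2 == 0 else (X_b, y_b)
    idx = rng.choice(len(ys), size=10, replace=False)
    theta, P = kalman_ls_update(theta, P, Xs[idx], ys[idx])

print("estimate:", np.round(theta, 3), "true:", true_theta)
```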