cs.LG - 2023-10-25

Strategizing EV Charging and Renewable Integration in Texas

  • paper_url: http://arxiv.org/abs/2310.17056
  • repo_url: None
  • paper_authors: Mohammad Mohammadi, Jesse Thornburg
  • for: This study explores the integration of electric vehicles (EVs), renewable energy, and smart grid technologies in Texas, addressing the challenges that hinder widespread EV adoption.
  • methods: Dynamic time warping (DTW) clustering and k-means clustering are used to categorize days by total load and net load, characterizing daily electricity consumption and renewable energy generation. Building on these load characteristics, the study proposes a method for setting optimal charging and vehicle-to-grid (V2G) windows to support strategic decisions on energy consumption and renewable integration.
  • results: DTW and k-means clustering distinguish distinct daily consumption and renewable generation profiles, and charging and V2G windows can be tailored to the load characteristics of each profile, improving grid stability and renewable utilization. These findings are significant for achieving a sustainable and resilient energy future.
    Abstract Exploring the convergence of electric vehicles (EVs), renewable energy, and smart grid technologies in the context of Texas, this study addresses challenges hindering the widespread adoption of EVs. Acknowledging their environmental benefits, the research focuses on grid stability concerns, uncoordinated charging patterns, and the complicated relationship between EVs and renewable energy sources. Dynamic time warping (DTW) clustering and k-means clustering methodologies categorize days based on total load and net load, offering nuanced insights into daily electricity consumption and renewable energy generation patterns. By establishing optimal charging and vehicle-to-grid (V2G) windows tailored to specific load characteristics, the study provides a sophisticated methodology for strategic decision-making in energy consumption and renewable integration. The findings contribute to the ongoing discourse on achieving a sustainable and resilient energy future through the seamless integration of EVs into smart grids.
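    As a rough illustration of the day-clustering step described above, the sketch below groups hypothetical daily net-load profiles with k-means and implements a plain DTW distance by hand; the data, cluster count, and window lengths are placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D profiles."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Hypothetical data: one year of hourly net load, reshaped into daily profiles.
rng = np.random.default_rng(0)
hourly_net_load = rng.normal(50, 10, size=365 * 24)   # MW, placeholder values
daily_profiles = hourly_net_load.reshape(365, 24)

# k-means on the daily profiles (Euclidean); a DTW distance could instead
# drive a distance-based method such as k-medoids.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(daily_profiles)

centroids = np.vstack([daily_profiles[labels == k].mean(axis=0) for k in range(4)])
for k, c in enumerate(centroids):
    charge_hours = np.argsort(c)[:4]    # lowest-net-load hours: candidate charging window
    v2g_hours = np.argsort(c)[-4:]      # highest-net-load hours: candidate V2G window
    print(f"cluster {k}: charge at hours {sorted(charge_hours)}, V2G at {sorted(v2g_hours)}")

# Example DTW use: distance between the first two cluster centroids.
print("DTW(centroid 0, centroid 1) =", round(dtw_distance(centroids[0], centroids[1]), 2))
```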

Early Detection of Tuberculosis with Machine Learning Cough Audio Analysis: Towards More Accessible Global Triaging Usage

  • paper_url: http://arxiv.org/abs/2310.17675
  • repo_url: None
  • paper_authors: Chandra Suda
  • for: This study aims to improve tuberculosis (TB) diagnosis by developing a fast, reliable, and accessible triaging tool.
  • methods: A novel machine learning architecture analyzes cough audio recorded by smartphone microphones together with demographic data to detect TB.
  • results: The model achieves an AUROC of 88%, surpassing the WHO's requirements for screening tests, and returns results within 15 seconds.
    Abstract Tuberculosis (TB), a bacterial disease mainly affecting the lungs, is one of the leading infectious causes of mortality worldwide. To prevent TB from spreading within the body, which causes life-threatening complications, timely and effective anti-TB treatment is crucial. Cough, an objective biomarker for TB, is a triage tool that monitors treatment response and regresses with successful therapy. Current gold standards for TB diagnosis are slow or inaccessible, especially in rural areas where TB is most prevalent. In addition, current machine learning (ML) diagnosis research, like utilizing chest radiographs, is ineffective and does not monitor treatment progression. To enable effective diagnosis, an ensemble model was developed that analyzes, using a novel ML architecture, coughs' acoustic epidemiologies from smartphones' microphones to detect TB. The architecture includes a 2D-CNN and XGBoost that was trained on 724,964 cough audio samples and demographics from 7 countries. After feature extraction (Mel-spectrograms) and data augmentation (IR-convolution), the model achieved AUROC (area under the receiver operating characteristic) of 88%, surpassing WHO's requirements for screening tests. The results are available within 15 seconds and can easily be accessed via a mobile app. This research helps to improve TB diagnosis through a promising accurate, quick, and accessible triaging tool.
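    The following is a minimal, hypothetical sketch of the kind of 2D-CNN branch the abstract describes, operating on a Mel-spectrogram plus a demographic vector; the layer sizes are invented, and the linear head merely stands in for the XGBoost stage of the paper's ensemble.

```python
import torch
import torch.nn as nn

class CoughCNN(nn.Module):
    """Small 2D-CNN mapping a Mel-spectrogram to an embedding.
    In an ensemble like the paper's, this embedding (plus demographics)
    would feed an XGBoost classifier; a linear head stands in here."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.embed = nn.Linear(32 * 4 * 4, embed_dim)
        self.head = nn.Linear(embed_dim + 3, 1)   # + 3 demographic features (assumed)

    def forward(self, mel, demographics):
        z = self.features(mel).flatten(1)
        z = torch.relu(self.embed(z))
        return self.head(torch.cat([z, demographics], dim=1))  # TB logit

# Toy batch: 8 spectrograms (1 channel, 64 mel bins, 200 frames) + demographics.
mel = torch.randn(8, 1, 64, 200)
demo = torch.randn(8, 3)
print(CoughCNN()(mel, demo).shape)   # torch.Size([8, 1])
```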

Learning to Rank for Active Learning via Multi-Task Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2310.17044
  • repo_url: None
  • paper_authors: Zixin Ding, Si Chen, Ruoxi Jia, Yuxin Chen
  • for: Improve the efficiency and practicality of active learning and reduce labeling costs.
  • methods: A novel active learning approach selects batches of unlabeled instances through a learned surrogate model for data acquisition.
  • results: Experiments show that the method improves the efficiency and effectiveness of active learning when labeling is costly.
    Abstract Active learning is a promising paradigm to reduce the labeling cost by strategically requesting labels to improve model performance. However, existing active learning methods often rely on expensive acquisition function to compute, extensive modeling retraining and multiple rounds of interaction with annotators. To address these limitations, we propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time. Our novel algorithmic contribution is a bilevel multi-task bilevel optimization framework that predicts the relative utility -- measured by the validation accuracy -- of different training sets, and ensures the learned acquisition function generalizes effectively. For cases where validation accuracy is expensive to evaluate, we introduce efficient interpolation-based surrogate models to estimate the utility function, reducing the evaluation cost. We demonstrate the performance of our approach through extensive experiments on standard active classification benchmarks. By employing our learned utility function, we show significant improvements over traditional techniques, paving the way for more efficient and effective utility maximization in active learning applications.

Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting

  • paper_url: http://arxiv.org/abs/2310.17032
  • repo_url: None
  • paper_authors: Saad Zafar Khan, Nazeefa Muzammil, Syed Mohammad Hassan Zaidi, Abdulah Jeza Aljohani, Haibat Khan, Salman Ghafoor
  • for: Accurate forecasting of solar power generation is crucial for modern renewable energy systems. This study compares Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting.
  • methods: Controlled experiments compare QLSTM against classical LSTM, revealing accelerated training convergence and substantially reduced test loss within the initial epochs for QLSTM.
  • results: Within the initial epochs, QLSTM reaches a markedly lower test loss than the classical LSTM, indicating a potential advantage in assimilating complex time-series relationships.
    Abstract Accurately forecasting solar power generation is crucial in the global progression towards sustainable energy systems. In this study, we conduct a meticulous comparison between Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. Our controlled experiments reveal promising advantages of QLSTMs, including accelerated training convergence and substantially reduced test loss within the initial epoch compared to classical LSTMs. These empirical findings demonstrate QLSTM's potential to swiftly assimilate complex time series relationships, enabled by quantum phenomena like superposition. However, realizing QLSTM's full capabilities necessitates further research into model validation across diverse conditions, systematic hyperparameter optimization, hardware noise resilience, and applications to correlated renewable forecasting problems. With continued progress, quantum machine learning can offer a paradigm shift in renewable energy time series prediction. This pioneering work provides initial evidence substantiating quantum advantages over classical LSTM, while acknowledging present limitations. Through rigorous benchmarking grounded in real-world data, our study elucidates a promising trajectory for quantum learning in renewable forecasting. Additional research and development can further actualize this potential to achieve unprecedented accuracy and reliability in predicting solar power generation worldwide.
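    For readers unfamiliar with the classical baseline, the sketch below trains a plain LSTM one-step-ahead forecaster on a toy solar-like series; the QLSTM variant, which replaces the recurrent cell's internal transformations with variational quantum circuits, is not reproduced here, and all data and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, 1)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])          # next-step power

# Toy diurnal "solar" series and sliding windows.
t = torch.arange(0, 200, dtype=torch.float32)
series = torch.clamp(torch.sin(2 * torch.pi * t / 24), min=0.0)
window = 24
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("final MSE:", loss.item())
```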

On the Identifiability and Interpretability of Gaussian Process Models

  • paper_url: http://arxiv.org/abs/2310.17023
  • repo_url: https://github.com/jiawenchenn/gp_mixture_kernel
  • paper_authors: Jiawen Chen, Wancen Mu, Yun Li, Didong Li
  • for: This paper examines the prevalent practice of using additive mixtures of Matérn kernels in single-output Gaussian process (GP) models and studies the properties of such mixture kernels.
  • methods: The authors derive theoretical results for both single-output and multi-output GP models and support them with simulations and real applications.
  • results: In the single-output case, the smoothness of a Matérn mixture is determined by its least smooth component, and a GP with such a kernel is effectively equivalent to that component; moreover, the mixing weights and the parameters of individual components are not identifiable. In the multi-output case, the covariance matrix $A$ in the multiplicative kernel is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited to multi-output tasks. These conclusions are supported by extensive simulations and real applications.
    Abstract In this paper, we critically examine the prevalent practice of using additive mixtures of Mat\'ern kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Mat\'ern kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Mat\'ern kernels is determined by the least smooth component and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Mat\'ern. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.
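    A small scikit-learn sketch of the additive Matérn mixture studied in the single-output case; the data are synthetic, and the point is only to show the kernel construction whose smoothness, per the paper, is governed by the roughest (nu=0.5) component.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Additive mixture of a rough (nu=0.5) and a smooth (nu=2.5) Matern kernel.
# Per the paper, sample-path smoothness of the mixture is governed by the
# least smooth component, and the mixing weights are not identifiable.
mixture = 0.5 * Matern(length_scale=1.0, nu=0.5) + 0.5 * Matern(length_scale=1.0, nu=2.5)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

gp = GaussianProcessRegressor(kernel=mixture, alpha=1e-2, random_state=0).fit(X, y)
print(gp.kernel_)                          # fitted weights and length-scales
print(gp.log_marginal_likelihood_value_)
```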

Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

  • paper_url: http://arxiv.org/abs/2310.17021
  • repo_url: https://github.com/xuangu-fang/streaming-factor-trajectory-learning
  • paper_authors: Shikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, Shandian Zhe
  • for: This paper addresses streaming tensor data in real-world applications, namely how to effectively capture the temporal evolution of objects' representations from streaming data.
  • methods: The proposed Streaming Factor Trajectory Learning (SFTL) models factor trajectories with Gaussian processes (GPs) and converts the GPs into a state-space prior (an equivalent stochastic differential equation) to handle the computational challenges of streaming data.
  • results: Synthetic tasks and real-world applications demonstrate the advantages of SFTL: it effectively captures the temporal evolution of objects' representations in streaming data and supports standard Rauch-Tung-Striebel smoothing during streaming processing.
    Abstract Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning (SFTL) for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.

Simulation based stacking

  • paper_url: http://arxiv.org/abs/2310.17009
  • repo_url: https://github.com/bregaldo/simulation_based_stacking
  • paper_authors: Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke
  • for: This paper proposes a general stacking framework that combines multiple posterior approximations to improve the precision and reliability of Bayesian computation.
  • methods: The framework draws on posterior approximations produced by different inference algorithms and architectures, as well as by the randomness of initialization and stochastic gradients.
  • results: The paper establishes an asymptotic guarantee for the framework and demonstrates the benefit of combining posterior approximations on several benchmark simulations and a challenging cosmological inference task.
    Abstract Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a provable asymptotic guarantee, we present a general stacking framework to make use of all available posterior approximations. Our stacking method is able to combine densities, simulation draws, confidence intervals, and moments, and address the overall precision, calibration, coverage, and bias at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
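    The sketch below illustrates the core stacking step under simplifying assumptions: the predictive densities of several (toy Gaussian) posterior approximations are combined with simplex-constrained weights chosen to maximize the held-out log score. The paper's framework additionally stacks draws, intervals, and moments.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
y_val = rng.normal(1.0, 1.0, size=200)        # held-out data (toy)

# Three competing posterior predictive densities (e.g., from different
# inference algorithms); here simple Gaussians with different parameters.
approximations = [norm(0.8, 1.2), norm(1.3, 0.9), norm(0.0, 2.0)]
densities = np.column_stack([a.pdf(y_val) for a in approximations])   # (n, K)

def neg_log_score(w):
    w = np.abs(w) / np.abs(w).sum()           # crude projection onto the simplex
    return -np.mean(np.log(densities @ w + 1e-300))

res = minimize(neg_log_score, x0=np.ones(3) / 3, method="Nelder-Mead")
weights = np.abs(res.x) / np.abs(res.x).sum()
print("stacking weights:", np.round(weights, 3))
```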

Faster Recalibration of an Online Predictor via Approachability

  • paper_url: http://arxiv.org/abs/2310.17002
  • repo_url: None
  • paper_authors: Princewill Okoroafor, Robert Kleinberg, Wen Sun
  • for: Improve the reliability and trustworthiness of online predictive models, particularly when the outcome sequence may be generated adversarially.
  • methods: Blackwell's approachability theorem is used to transform a possibly miscalibrated online predictor into a calibrated one with little increase in the loss of the original model.
  • results: The proposed algorithm achieves calibration and accuracy at a faster rate than existing techniques in the online setting and offers a flexible tradeoff between calibration error and accuracy.
    Abstract Predictive models in ML need to be trustworthy and reliable, which often at the very least means outputting calibrated probabilities. This can be particularly difficult to guarantee in the online prediction setting when the outcome sequence can be generated adversarially. In this paper we introduce a technique using Blackwell's approachability theorem for taking an online predictive model which might not be calibrated and transforming its predictions to calibrated predictions without much increase to the loss of the original model. Our proposed algorithm achieves calibration and accuracy at a faster rate than existing techniques arXiv:1607.03594 and is the first algorithm to offer a flexible tradeoff between calibration error and accuracy in the online setting. We demonstrate this by characterizing the space of jointly achievable calibration and regret using our technique.

Towards Continually Learning Application Performance Models

  • paper_url: http://arxiv.org/abs/2310.16996
  • repo_url: None
  • paper_authors: Ray A. O. Sinurat, Anurag Daram, Haryadi S. Gunawi, Robert B. Ross, Sandeep Madireddy
  • for: This work develops machine learning performance models that account for shifts in data distribution, supporting critical job scheduling and application optimization decisions on production HPC systems.
  • methods: The models are learned continually as new samples arrive, counteracting the effect of distribution drift on the performance models, and techniques are employed to alleviate catastrophic forgetting during training.
  • results: The best model retains accuracy even when it must learn new data distributions caused by system changes, and improves prediction accuracy over the whole data sequence by 2x compared with the naive approach.
    Abstract Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are collected over time. However, owing to the complexity and heterogeneity of production HPC systems, they are susceptible to hardware degradation, replacement, and/or software patches, which can lead to drift in the data distribution that can adversely affect the performance models. To this end, we develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability. Our best model was able to retain accuracy, regardless of having to learn the new distribution of data inflicted by system changes, while demonstrating a 2x improvement in the prediction accuracy of the whole data sequence in comparison to the naive approach.
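    The abstract does not spell out the continual-learning mechanism, so the sketch below shows one generic ingredient, a small replay buffer mixed into each update to alleviate catastrophic forgetting under distribution drift; the model, features, and drift are all placeholders and not the paper's design.

```python
import random
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
replay: list[tuple[torch.Tensor, torch.Tensor]] = []   # (features, runtime) chunks

def update(x_new, y_new, replay_size=256, replay_frac=0.5):
    """One continual-learning step: train on fresh samples plus a replay batch."""
    batch = [(x_new, y_new)]
    if replay:
        k = max(1, int(replay_frac * len(x_new)))
        batch += random.sample(replay, min(k, len(replay)))
    x = torch.cat([b[0] for b in batch])
    y = torch.cat([b[1] for b in batch])
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    replay.append((x_new, y_new))
    del replay[:-replay_size]               # keep only the most recent chunks
    return loss.item()

# Simulated stream whose distribution drifts (e.g., after a hardware change).
for step in range(100):
    shift = 0.0 if step < 50 else 2.0
    x = torch.randn(16, 8) + shift
    y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(16, 1)
    update(x, y)
print("done")
```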

Probabilistic Integral Circuits

  • paper_url: http://arxiv.org/abs/2310.16986
  • repo_url: None
  • paper_authors: Gennaro Gala, Cassio de Campos, Robert Peharz, Antonio Vergari, Erik Quaeghebeur
  • for: This paper explores bridging continuous latent variable (LV) models and probabilistic circuits (PCs), and the tractability of the resulting models.
  • methods: The paper introduces probabilistic integral circuits (PICs), a new language of computational graphs that extends PCs with integral units representing continuous LVs. PICs are symbolic computational graphs that admit analytical integration in simple cases; in practice they are parameterized with lightweight neural nets and can be approximated arbitrarily well by large PCs using numerical quadrature.
  • results: On several distribution estimation benchmarks, PIC-approximating PCs systematically outperform PCs commonly learned via expectation-maximization or SGD, indicating that PICs bring the expressiveness of continuous LV models while retaining PC-style tractability.
    Abstract Continuous latent variables (LVs) are a key ingredient of many generative models, as they allow modelling expressive mixtures with an uncountable number of components. In contrast, probabilistic circuits (PCs) are hierarchical discrete mixtures represented as computational graphs composed of input, sum and product units. Unlike continuous LV models, PCs provide tractable inference but are limited to discrete LVs with categorical (i.e. unordered) states. We bridge these model classes by introducing probabilistic integral circuits (PICs), a new language of computational graphs that extends PCs with integral units representing continuous LVs. In the first place, PICs are symbolic computational graphs and are fully tractable in simple cases where analytical integration is possible. In practice, we parameterise PICs with light-weight neural nets delivering an intractable hierarchical continuous mixture that can be approximated arbitrarily well with large PCs using numerical quadrature. On several distribution estimation benchmarks, we show that such PIC-approximating PCs systematically outperform PCs commonly learned via expectation-maximization or SGD.
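    A toy numerical illustration of the quadrature idea: an integral unit over a Gaussian latent variable is replaced by a finite weighted sum (a PC-style mixture) using Gauss-Hermite nodes and compared against the known exact marginal. The PIC parameterization with neural nets is not shown; the distributions are chosen only so the exact answer is available.

```python
import numpy as np
from scipy.stats import norm

# Integral unit: p(x) = \int p(x | z) p(z) dz with z ~ N(0, 1) and
# x | z ~ N(z, 0.5^2).  Exact marginal: N(0, 1 + 0.25).
def p_x_given_z(x, z):
    return norm(loc=z, scale=0.5).pdf(x)

# Gauss-Hermite quadrature turns the integral into a weighted finite mixture,
# i.e. a PC sum unit whose size controls the approximation accuracy.
nodes, weights = np.polynomial.hermite_e.hermegauss(16)   # probabilists' Hermite
z_k = nodes                          # quadrature points for N(0, 1)
w_k = weights / np.sqrt(2 * np.pi)

x = np.linspace(-4, 4, 9)
approx = sum(w * p_x_given_z(x, z) for w, z in zip(w_k, z_k))
exact = norm(scale=np.sqrt(1.25)).pdf(x)
print(np.max(np.abs(approx - exact)))    # small quadrature error
```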

Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

  • paper_url: http://arxiv.org/abs/2310.16981
  • repo_url: https://github.com/vanderschaarlab/data-centric-synthetic-data
  • paper_authors: Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic
  • for: Improve the quality and effectiveness of training data for machine learning models.
  • methods: Data-centric AI techniques are integrated to profile the data and guide the synthetic data generation process so that the generated data reflects the complex characteristics of real data.
  • results: Experiments with five state-of-the-art generators on eleven distinct tabular datasets expose limitations of current generation methods and yield practical recommendations for improving the quality and effectiveness of synthetic data.
    Abstract Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper addresses this issue by exploring the potential of integrating data-centric AI techniques which profile the data to guide the synthetic data generation process. Moreover, we shed light on the often ignored consequences of neglecting these data profiles during synthetic data generation -- despite seemingly high statistical fidelity. Subsequently, we propose a novel framework to evaluate the integration of data profiles to guide the creation of more representative synthetic data. In an empirical study, we evaluate the performance of five state-of-the-art models for tabular data generation on eleven distinct tabular datasets. The findings offer critical insights into the successes and limitations of current synthetic data generation techniques. Finally, we provide practical recommendations for integrating data-centric insights into the synthetic data generation process, with a specific focus on classification performance, model selection, and feature selection. This study aims to reevaluate conventional approaches to synthetic data generation and promote the application of data-centric AI techniques in improving the quality and effectiveness of synthetic data.

Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

  • paper_url: http://arxiv.org/abs/2310.16975
  • repo_url: https://github.com/emorymlip/pcp-map
  • paper_authors: Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, Deepanshu Verma
  • for: Two neural network approaches are presented for solving static and dynamic conditional optimal transport (COT) problems, enabling sampling and density estimation of conditional probability distributions, which are core tasks in Bayesian inference.
  • methods: Both approaches follow the measure transport framework, representing the target conditional distribution as a transformation of a tractable reference distribution. COT maps are a canonical choice within this framework, with desirable properties such as uniqueness and monotonicity, but the associated COT problems are computationally challenging even in moderate dimensions; to improve scalability, the numerical algorithms parameterize COT maps with neural networks.
  • results: The methods are more efficient and accurate than state-of-the-art alternatives on benchmark datasets and Bayesian inverse problems. PCP-Map models the conditional transport map as the gradient of a partially input convex neural network (PICNN) with a novel numerical implementation for computational efficiency, while COT-Flow models conditional transport via the flow of a regularized neural ODE; it is slower to train but offers faster sampling.
    Abstract We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems, respectively. Both approaches enable sampling and density estimation of conditional probability distributions, which are core tasks in Bayesian inference. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. COT maps are a canonical choice within this framework, with desirable properties such as uniqueness and monotonicity. However, the associated COT problems are computationally challenging, even in moderate dimensions. To improve the scalability, our numerical algorithms leverage neural networks to parameterize COT maps. Our methods exploit the structure of the static and dynamic formulations of the COT problem. PCP-Map models conditional transport maps as the gradient of a partially input convex neural network (PICNN) and uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. COT-Flow models conditional transports via the flow of a regularized neural ODE; it is slower to train but offers faster sampling. We demonstrate their effectiveness and efficiency by comparing them with state-of-the-art approaches using benchmark datasets and Bayesian inverse problems.

Privately Aligning Language Models with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16960
  • repo_url: None
  • paper_authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim
  • for: This work studies aligning large language models (LLMs) with differential privacy (DP) combined with reinforcement learning (RL), improving instruction following while protecting privacy.
  • methods: Following Ziegler et al. (2020), two dominant paradigms are studied: (i) alignment via RL without a human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). A new DP framework is given for achieving alignment via RL, together with a proof of its correctness.
  • results: Experimental results validate the effectiveness of the approach, offering competitive utility while ensuring strong privacy protections.
    Abstract Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

  • paper_url: http://arxiv.org/abs/2310.16959
  • repo_url: None
  • paper_authors: Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel
  • for: This paper examines how to build classifiers that detect violations of newly emerging safety issues and policies for LLMs when only a few example violations have been observed.
  • methods: The paper studies domain-generalized few-shot learning for LLM-based text safety classifiers, combining parameter-efficient fine-tuning (PEFT) with data augmentation based on similar examples from prior existing rules.
  • results: Compared with baselines that rely on neither data augmentation nor PEFT, similarity-based data augmentation plus prompt tuning (DAPT) improves performance on new rules by 7-17% F1 on the Social Chemistry moral judgement task and 9-13% AUC on toxicity detection, even when the new rule is only loosely correlated with existing ones.
    Abstract As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a classifier to detect violations? In this paper, we study the novel setting of domain-generalized few-shot learning for LLM-based text safety classifiers. Unlike prior few-shot work, these new safety issues can be hard to uncover and we do not get to choose the few examples. We demonstrate that existing few-shot techniques do not perform well in this setting, and rather we propose to do parameter-efficient fine-tuning (PEFT) combined with augmenting training data based on similar examples in prior existing rules. We empirically show that our approach of similarity-based data-augmentation + prompt-tuning (DAPT) consistently outperforms baselines that either do not rely on data augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule is loosely correlated with existing ones.
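    A hypothetical sketch of the similarity-based augmentation idea: the few observed violations of a new rule are used to retrieve the closest examples from prior rules' data, which would then feed parameter-efficient fine-tuning (e.g., prompt tuning). TF-IDF nearest neighbours stand in for whatever representation the paper actually uses, and the example texts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# A handful of observed violations of a *new* safety rule (invented examples).
few_shot = [
    "you should never help your friend move, just ignore them",
    "skipping your shift without telling anyone is fine",
]

# Labeled examples from existing rules (hypothetical placeholder corpus).
prior_examples = [
    "it is rude to cancel plans at the last minute",
    "helping a neighbour carry groceries is kind",
    "lying to your manager about being sick is dishonest",
    "watering your plants daily keeps them healthy",
]

vec = TfidfVectorizer().fit(few_shot + prior_examples)
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vec.transform(prior_examples))
_, idx = knn.kneighbors(vec.transform(few_shot))

augmented = {prior_examples[j] for row in idx for j in row}
print("retrieved augmentation examples:")
for text in augmented:
    print(" -", text)
# `few_shot` plus `augmented` would then be used to prompt-tune the classifier.
```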

Transferring a molecular foundation model for polymer property predictions

  • paper_url: http://arxiv.org/abs/2310.16958
  • repo_url: None
  • paper_authors: Pei Zhang, Logan Kearney, Debsindhu Bhowmik, Zachary Fox, Amit K. Naskar, John Gounley
  • for: Accelerate design optimization in applications such as drug development and materials discovery.
  • methods: Transformer-based language models pretrained on small molecules are fine-tuned to predict polymer properties.
  • results: This approach achieves accuracy comparable to models trained on augmented polymer datasets, without the computational cost of data augmentation.
    Abstract Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and materials discovery. Self-supervised pretraining of transformer models requires large-scale datasets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incurs extra computational costs. In contrast, large-scale open-source datasets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieve comparable accuracy to those trained on augmented polymer datasets for a series of benchmark prediction tasks.

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

  • paper_url: http://arxiv.org/abs/2310.16955
  • repo_url: None
  • paper_authors: Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel
  • for: The goal of this paper is to improve the robustness of natural language processing systems against human adversaries.
  • methods: A limited set of human adversarial examples is used in adversarial training to generate more useful adversarial examples at scale.
  • results: On the ANLI and hate speech detection benchmark datasets, training on the synthetic adversarial examples in addition to the observed human attacks improves model robustness to future rounds of human attacks, compared with training on observed human attacks alone.
    Abstract Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbations - such as word-substitution - does not actually improve robustness to human adversaries. In this paper, we propose an adversarial training framework that uses limited human adversarial examples to generate more useful adversarial examples at scale. We demonstrate the advantages of this system on the ANLI and hate speech detection benchmark datasets - both collected via an iterative, adversarial human-and-model-in-the-loop procedure. Compared to training only on observed human attacks, also training on our synthetic adversarial examples improves model robustness to future rounds. In ANLI, we see accuracy gains on the current set of attacks (44.1%$\,\to\,$50.1%) and on two future unseen rounds of human generated attacks (32.5%$\,\to\,$43.4%, and 29.4%$\,\to\,$40.2%). In hate speech detection, we see AUC gains on current attacks (0.76 $\to$ 0.84) and a future round (0.77 $\to$ 0.79). Attacks from methods that do not learn the distribution of existing human adversaries, meanwhile, degrade robustness.

Causal Q-Aggregation for CATE Model Selection

  • paper_url: http://arxiv.org/abs/2310.16945
  • repo_url: None
  • paper_authors: Hui Lan, Vasilis Syrgkanis
  • for: The paper proposes a new model selection method for conditional average treatment effect (CATE) estimation, in support of personalized decision making.
  • methods: The paper builds on proxy loss metrics with doubly robust properties and model ensembling, and proposes a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss.
  • results: The main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$, up to higher-order estimation error terms involving products of errors in the nuisance functions, without requiring any candidate CATE model to be close to the truth.
    Abstract Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.
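    In formulas, and with notation that may differ in detail from the paper, the doubly robust pseudo-outcome and a generic Q-aggregation objective look as follows:

```latex
% Doubly robust pseudo-outcome built from nuisance estimates
% \hat\mu_t(x) = E[Y \mid X=x, T=t] and propensity \hat e(x) = P(T=1 \mid X=x):
\hat Y^{DR} \;=\; \hat\mu_1(X) - \hat\mu_0(X)
  \;+\; \frac{T - \hat e(X)}{\hat e(X)\,\bigl(1-\hat e(X)\bigr)}\,
        \bigl(Y - \hat\mu_T(X)\bigr).

% Generic Q-aggregation over candidate CATE models \tau_1,\dots,\tau_M: choose
% simplex weights \theta penalizing both the loss of the mixture and of the components,
\hat\theta \;=\; \arg\min_{\theta \in \Delta_M}\;
  (1-\nu)\,\mathbb{E}_n\!\Bigl[\bigl(\hat Y^{DR} - \textstyle\sum_m \theta_m \tau_m(X)\bigr)^2\Bigr]
  \;+\; \nu \sum_m \theta_m\, \mathbb{E}_n\!\Bigl[\bigl(\hat Y^{DR} - \tau_m(X)\bigr)^2\Bigr],
\qquad \nu \in (0,1),

% which attains model-selection regret of order \log(M)/n up to higher-order
% products of nuisance estimation errors.
```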

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

  • paper_url: http://arxiv.org/abs/2310.16941
  • repo_url: None
  • paper_authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown
  • for: This work explores the emergent behaviors that are possible in a functionally heterogeneous swarm of robots with limited capabilities.
  • methods: Novelty search and clustering are used to discover new emergent behaviors.
  • results: Prior methods fail to discover many interesting behaviors, whereas an iterative human-in-the-loop discovery process finds more behaviors than random search, swarm chemistry, and automated behavior discovery. In total, 23 emergent behaviors were found, 18 of which are novel discoveries and, to the authors' knowledge, the first known emergent behaviors for heterogeneous swarms of computation-free agents.
    Abstract We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods
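    A minimal sketch of the novelty-search-plus-clustering loop referenced above, assuming each controller's emergent behaviour has already been summarized as a fixed-length descriptor (the hard representation question the paper actually studies); the thresholds and dimensions are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

def novelty(candidate, archive, k=5):
    """Novelty = mean distance to the k nearest behaviours seen so far."""
    if len(archive) == 0:
        return np.inf
    d = np.linalg.norm(np.asarray(archive) - candidate, axis=1)
    return float(np.mean(np.sort(d)[:k]))

rng = np.random.default_rng(0)
archive = []          # behaviour descriptors of previously evaluated controllers
threshold = 0.6

for _ in range(500):
    behaviour = rng.uniform(0, 1, size=4)   # stand-in for a measured swarm behaviour
    if novelty(behaviour, archive) > threshold:
        archive.append(behaviour)

# Cluster the archived behaviours into a small taxonomy for human inspection.
n_clusters = min(6, len(archive))
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(np.array(archive))
print(f"{len(archive)} novel behaviours grouped into {len(set(labels))} clusters")
```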

MimicTouch: Learning Human’s Control Strategy with Multi-Modal Tactile Feedback

  • paper_url: http://arxiv.org/abs/2310.16917
  • repo_url: None
  • paper_authors: Kelin Yu, Yunhai Han, Matthew Zhu, Ye Zhao
  • for: The goal of this paper is to develop a robot control framework that learns human tactile-guided control strategies.
  • methods: Multi-modal tactile datasets are collected from human demonstrators, imitation learning is applied to multi-modal sensor data and retargeted human motions, and online residual reinforcement learning on the physical robot further narrows the embodiment gap.
  • results: Experiments show that MimicTouch can safely transfer a human tactile-guided control strategy, learned as a latent policy through imitation learning, to the robot.
    Abstract In robotics and artificial intelligence, the integration of tactile processing is becoming increasingly pivotal, especially in learning to execute intricate tasks like alignment and insertion. However, existing works focusing on tactile methods for insertion tasks predominantly rely on robot teleoperation data and reinforcement learning, which do not utilize the rich insights provided by human's control strategy guided by tactile feedback. For utilizing human sensations, methodologies related to learning from humans predominantly leverage visual feedback, often overlooking the invaluable tactile feedback that humans inherently employ to finish complex manipulations. Addressing this gap, we introduce "MimicTouch", a novel framework that mimics human's tactile-guided control strategy. In this framework, we initially collect multi-modal tactile datasets from human demonstrators, incorporating human tactile-guided control strategies for task completion. The subsequent step involves instructing robots through imitation learning using multi-modal sensor data and retargeted human motions. To further mitigate the embodiment gap between humans and robots, we employ online residual reinforcement learning on the physical robot. Through comprehensive experiments, we validate the safety of MimicTouch in transferring a latent policy learned through imitation learning from human to robot. This ongoing work will pave the way for a broader spectrum of tactile-guided robotic applications.

Transformer-based Atmospheric Density Forecasting

  • paper_url: http://arxiv.org/abs/2310.16912
  • repo_url: None
  • paper_authors: Julia Briden, Peng Mun Siew, Victor Rodriguez-Fernandez, Richard Linares
  • for: Forecast atmospheric density to improve space situational awareness.
  • methods: A transformer-based deep learning model is used to capture long-term dependencies in atmospheric density data.
  • results: Forecasts with the empirical NRLMSISE-00 and JB2008 models and the physics-based TIEGCM model are compared using DMDc and the transformer-based propagator, with the transformer-based propagator showing improved predictive performance.
    Abstract As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric density, deep learning-based forecasting has the ability to capture nonlinearities in data. By learning multiple layer weights from historical atmospheric density data, long-term dependencies in the dataset are captured in the mapping between the current atmospheric density state and control input to the atmospheric density state at the next timestep. This work improves upon previous linear propagation methods for atmospheric density forecasting, by developing a nonlinear transformer-based architecture for atmospheric density forecasting. Empirical NRLMSISE-00 and JB2008, as well as physics-based TIEGCM atmospheric density models are compared for forecasting with DMDc and with the transformer-based propagator.

Deep machine learning for meteor monitoring: advances with transfer learning and gradient-weighted class activation mapping

  • paper_url: http://arxiv.org/abs/2310.16826
  • repo_url: None
  • paper_authors: Eloy Peña-Asensio, Josep M. Trigo-Rodríguez, Pau Grèbol-Tomàs, David Regordosa-Avellana, Albert Rimola
  • for: The main goal of this paper is to propose a fully automated meteor detection pipeline to support meteor science.
  • methods: Convolutional neural networks (CNNs) automatically classify candidate meteor detections, and Gradient-weighted Class Activation Mapping (Grad-CAM) is used to pinpoint the meteor's location within each frame.
  • results: Trained and evaluated on a large dataset from the Spanish Meteor Network (SPMN), the method achieves a precision of 98%. This new approach helps reduce the workload of meteor scientists and station operators while improving the accuracy of meteor tracking and classification.
    Abstract In recent decades, the use of optical detection systems for meteor studies has increased dramatically, resulting in huge amounts of data being analyzed. Automated meteor detection tools are essential for studying the continuous meteoroid incoming flux, recovering fresh meteorites, and achieving a better understanding of our Solar System. Concerning meteor detection, distinguishing false positives between meteor and non-meteor images has traditionally been performed by hand, which is significantly time-consuming. To address this issue, we developed a fully automated pipeline that uses Convolutional Neural Networks (CNNs) to classify candidate meteor detections. Our new method is able to detect meteors even in images that contain static elements such as clouds, the Moon, and buildings. To accurately locate the meteor within each frame, we employ the Gradient-weighted Class Activation Mapping (Grad-CAM) technique. This method facilitates the identification of the region of interest by multiplying the activations from the last convolutional layer with the average of the gradients across the feature map of that layer. By combining these findings with the activation map derived from the first convolutional layer, we effectively pinpoint the most probable pixel location of the meteor. We trained and evaluated our model on a large dataset collected by the Spanish Meteor Network (SPMN) and achieved a precision of 98\%. Our new methodology presented here has the potential to reduce the workload of meteor scientists and station operators and improve the accuracy of meteor tracking and classification.
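    A compact, self-contained Grad-CAM sketch over a toy CNN, showing how last-layer activations weighted by channel-averaged gradients yield a localization map; the production network, the IR-convolution augmentation, and the fusion with the first-layer activation map described in the abstract are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, 3, padding=1)
        self.fc = nn.Linear(16, 2)                 # meteor / non-meteor

    def forward(self, x):
        x = F.relu(self.conv1(x))
        self.last_act = F.relu(self.conv2(x))      # keep activations for Grad-CAM
        pooled = F.adaptive_avg_pool2d(self.last_act, 1).flatten(1)
        return self.fc(pooled)

def grad_cam(model, image, target_class=0):
    model.zero_grad()
    score = model(image)[0, target_class]
    grads = torch.autograd.grad(score, model.last_act)[0]   # d score / d activations
    weights = grads.mean(dim=(2, 3), keepdim=True)          # channel-wise averaged gradients
    cam = F.relu((weights * model.last_act).sum(dim=1))     # weighted activation map
    return (cam / (cam.max() + 1e-8))[0].detach()

frame = torch.randn(1, 1, 64, 64)                # a candidate detection frame (toy)
heatmap = grad_cam(TinyDetector(), frame)
print("most probable meteor pixel:", divmod(int(heatmap.argmax()), heatmap.shape[1]))
```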

CATE Lasso: Conditional Average Treatment Effect Estimation with High-Dimensional Linear Regression

  • paper_url: http://arxiv.org/abs/2310.16819
  • repo_url: None
  • paper_authors: Masahiro Kato, Masaaki Imaizumi
  • for: This paper is written to study the estimation of Conditional Average Treatment Effects (CATEs) in causal inference, specifically in the presence of high-dimensional and non-sparse parameters.
  • methods: The paper proposes a method for consistently estimating CATEs using Lasso regression, which is specialized for CATE estimation and leverages the assumption of implicit sparsity.
  • results: The paper demonstrates the consistency of the proposed method through simulation studies, and shows that desirable theoretical properties such as consistency remain attainable even without assuming sparsity explicitly.
    Abstract In causal inference about two treatments, Conditional Average Treatment Effects (CATEs) play an important role as a quantity representing an individualized causal effect, defined as a difference between the expected outcomes of the two treatments conditioned on covariates. This study assumes two linear regression models between a potential outcome and covariates of the two treatments and defines CATEs as a difference between the linear regression models. Then, we propose a method for consistently estimating CATEs even under high-dimensional and non-sparse parameters. In our study, we demonstrate that desirable theoretical properties, such as consistency, remain attainable even without assuming sparsity explicitly if we assume a weaker assumption called implicit sparsity originating from the definition of CATEs. In this assumption, we suppose that parameters of linear models in potential outcomes can be divided into treatment-specific and common parameters, where the treatment-specific parameters take difference values between each linear regression model, while the common parameters remain identical. Thus, in a difference between two linear regression models, the common parameters disappear, leaving only differences in the treatment-specific parameters. Consequently, the non-zero parameters in CATEs correspond to the differences in the treatment-specific parameters. Leveraging this assumption, we develop a Lasso regression method specialized for CATE estimation and present that the estimator is consistent. Finally, we confirm the soundness of the proposed method by simulation studies.
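    A simplified sketch of the implicit-sparsity idea under assumed synthetic data: the outcome is regressed with a Lasso on covariates, treatment, and treatment-covariate interactions, so the interaction coefficients (the treatment-specific differences) carry the sparse CATE parameters even though the common parameters are dense. The paper's estimator is specialized beyond this plain construction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.normal(size=(n, p))
T = rng.integers(0, 2, size=n)

# Dense common parameters, sparse treatment-specific difference (implicit sparsity).
beta_common = rng.normal(size=p)
delta = np.zeros(p)
delta[:3] = [1.0, -2.0, 0.5]
Y = X @ beta_common + T * (X @ delta) + rng.normal(size=n)

# Lasso on [X, T, T*X]: the coefficients on T*X estimate the CATE parameters.
design = np.hstack([X, T[:, None], T[:, None] * X])
fit = Lasso(alpha=0.05).fit(design, Y)
delta_hat = fit.coef_[p + 1:]
print("true nonzero indices:", np.nonzero(delta)[0])
print("estimated nonzeros  :", np.nonzero(np.abs(delta_hat) > 0.05)[0])   # approximate recovery

cate = lambda x: x @ delta_hat + fit.coef_[p]    # estimated CATE at covariates x
print("CATE at X[0]:", cate(X[0]))
```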

Learning COVID-19 Regional Transmission Using Universal Differential Equations in a SIR model

  • paper_url: http://arxiv.org/abs/2310.16804
  • repo_url: https://github.com/adrocampos/udes_in_sir_regional_transmision
  • paper_authors: Adrian Rojas-Campos, Lukas Stelz, Pascal Nieters
  • for: The spread of COVID-19 is hard to model in highly interconnected societies. Single-region SIR models cannot account for incoming forces of infection from other regions, and extending them to a large number of interacting regions requires many assumptions that do not hold in the real world.
  • methods: The paper proposes using Universal Differential Equations (UDEs), differential equations totally or partially defined by deep neural networks (DNNs), combined with an SIR model. An additive term learned by a DNN captures the incoming force of infection from neighboring regions, and learning is performed with automatic differentiation and gradient descent so the model better tracks how neighboring regions drive the target system.
  • results: Compared against a single-region SIR model and a fully data-driven DNN-only model on a simulated COVID-19 outbreak, the proposed UDE+SIR model captures the outbreak dynamics more accurately, although its performance decays in the final stages of the outbreak. The single-region SIR model and the fully data-driven model do not capture the dynamics correctly.
    Abstract Highly-interconnected societies difficult to model the spread of infectious diseases such as COVID-19. Single-region SIR models fail to account for incoming forces of infection and expanding them to a large number of interacting regions involves many assumptions that do not hold in the real world. We propose using Universal Differential Equations (UDEs) to capture the influence of neighboring regions and improve the model's predictions in a combined SIR+UDE model. UDEs are differential equations totally or partially defined by a deep neural network (DNN). We include an additive term to the SIR equations composed by a DNN that learns the incoming force of infection from the other regions. The learning is performed using automatic differentiation and gradient descent to approach the change in the target system caused by the state of the neighboring regions. We compared the proposed model using a simulated COVID-19 outbreak against a single-region SIR and a fully data-driven model composed only of a DNN. The proposed UDE+SIR model generates predictions that capture the outbreak dynamic more accurately, but a decay in performance is observed at the last stages of the outbreak. The single-area SIR and the fully data-driven approach do not capture the proper dynamics accurately. Once the predictions were obtained, we employed the SINDy algorithm to substitute the DNN with a regression, removing the black box element of the model with no considerable increase in the error levels.
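    The paper's own UDE implementation is not reproduced here; the PyTorch sketch below only illustrates the core idea with a hand-rolled Euler integrator: an SIR model whose force of infection gains an additive neural-network term driven by neighbouring regions' infection levels, trained end to end by automatic differentiation. All data are synthetic placeholders.

```python
import torch
import torch.nn as nn

nn_force = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1), nn.Softplus())

def simulate(beta, gamma, neighbor_I, days=60, dt=1.0, N=1.0):
    """UDE+SIR: dS/dt = -(beta*I/N + f_nn(neighbours)) * S, Euler-integrated."""
    S, I, R = torch.tensor(0.99), torch.tensor(0.01), torch.tensor(0.0)
    traj = []
    for t in range(days):
        incoming = nn_force(neighbor_I[t]).squeeze()   # learned external force of infection
        new_inf = (beta * I / N + incoming) * S * dt
        new_rec = gamma * I * dt
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        traj.append(I)
    return torch.stack(traj)

# Toy target: infections in the region of interest, plus two neighbours' curves.
days = 60
neighbor_I = torch.rand(days, 2) * 0.05
target = torch.sigmoid(torch.linspace(-4, 2, days)) * 0.3   # placeholder epidemic curve

beta = torch.tensor(0.3, requires_grad=True)
gamma = torch.tensor(0.1, requires_grad=True)
opt = torch.optim.Adam(list(nn_force.parameters()) + [beta, gamma], lr=1e-2)
for epoch in range(200):
    opt.zero_grad()
    loss = torch.mean((simulate(beta, gamma, neighbor_I) - target) ** 2)
    loss.backward()
    opt.step()
print("fitted beta, gamma:", beta.item(), gamma.item(), "loss:", loss.item())
```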

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

  • paper_url: http://arxiv.org/abs/2310.16802
  • repo_url: None
  • paper_authors: Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, Brandon M. Wood
  • for: Improve atomic property prediction across chemical domains.
  • methods: Supervised pretraining is performed simultaneously on multiple datasets from different chemical domains, treating each dataset as a separate pretraining task within a multi-task framework.
  • results: Compared with training from scratch, fine-tuning the pretrained model improves performance by an average of 59% and matches or sets the state of the art on 34 of 40 downstream tasks.
    Abstract Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of $\sim$120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and matches or sets state-of-the-art on 34 out of 40 tasks. Our work highlights the potential of pre-training strategies that utilize diverse data to advance property prediction across chemical domains, especially for low-data tasks.

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

  • paper_url: http://arxiv.org/abs/2310.16795
  • repo_url: https://github.com/ist-daslab/qmoe
  • paper_authors: Elias Frantar, Dan Alistarh
  • for: This paper addresses the high inference cost of large language models (LLMs): Mixture-of-Experts (MoE) architectures use sparse routing to deliver faster and more accurate models, but at the cost of massive parameter counts.
  • methods: The paper introduces QMoE, a new compression and execution framework consisting of a scalable algorithm that accurately compresses trillion-parameter MoEs to less than 1 bit per parameter, in a custom format co-designed with bespoke GPU decoding kernels for efficient end-to-end compressed inference.
  • results: QMoE compresses the 1.6-trillion-parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) with only minor accuracy loss, in less than a day on a single GPU. This enables, for the first time, execution of a trillion-parameter model on affordable commodity hardware, such as a single server with 4x NVIDIA A6000 or 8x NVIDIA 3090 GPUs, at less than 5% runtime overhead relative to ideal uncompressed inference.
    Abstract Mixture-of-Experts (MoE) architectures offer a general solution to the high inference costs of large language models (LLMs) via sparse routing, bringing faster and more accurate models, at the cost of massive parameter counts. For example, the SwitchTransformer-c2048 model has 1.6 trillion parameters, requiring 3.2TB of accelerator memory to run efficiently, which makes practical deployment challenging and expensive. In this paper, we present a solution to this memory problem, in form of a new compression and execution framework called QMoE. Specifically, QMoE consists of a scalable algorithm which accurately compresses trillion-parameter MoEs to less than 1 bit per parameter, in a custom format co-designed with bespoke GPU decoding kernels to facilitate efficient end-to-end compressed inference, with minor runtime overheads relative to uncompressed execution. Concretely, QMoE can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss, in less than a day on a single GPU. This enables, for the first time, the execution of a trillion-parameter model on affordable commodity hardware, like a single server with 4x NVIDIA A6000 or 8x NVIDIA 3090 GPUs, at less than 5% runtime overhead relative to ideal uncompressed inference. The source code and compressed models are available at github.com/IST-DASLab/qmoe.
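    A back-of-the-envelope illustration (not the QMoE algorithm) of why sub-1-bit storage is plausible: after aggressive ternary quantization most expert weights are zero, so the empirical entropy per weight falls well below one bit, which a dictionary or entropy coder such as QMoE's custom format can approach.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1_000_000)        # stand-in for one expert's weights

# Ternary quantization with a threshold: most values round to zero.
scale = np.mean(np.abs(weights))
q = np.where(weights > 2 * scale, 1, np.where(weights < -2 * scale, -1, 0))

values, counts = np.unique(q, return_counts=True)
p = counts / counts.sum()
entropy_bits = -(p * np.log2(p)).sum()
print(dict(zip(values.tolist(), np.round(p, 3).tolist())))
print(f"empirical entropy: {entropy_bits:.3f} bits per parameter")   # well below 1
```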
    摘要 大量语言模型(LLM)的高推理成本可以通过杂合扩展(MoE)架构解决,通过稀疏路由实现更快速和更准确的模型,但是需要巨量的参数数量。例如,SwitchTransformer-c2048模型有1.6万亿个参数,需要3.2TB的加速器内存来运行高效,这使得实际部署成为困难和昂贵的问题。在这篇论文中,我们提出了一种解决这个内存问题的解决方案,即新的压缩和执行框架 called QMoE。具体来说,QMoE包括一个可扩展的算法,可以高精度地压缩大量参数的MoE,以 less than 1比特/参数的形式,并与特制GPU解码器一起实现高效的压缩执行,具有较少的运行时间开销。例如,QMoE可以将1.6万亿参数的SwitchTransformer-c2048模型压缩到less than 160GB(20倍压缩,0.8比特/参数),在单个GPU上完成,只需要一天时间,并且只有较少的精度损失。这使得,对于首次执行一万亿参数模型,可以使用可靠的商用硬件,如单个服务器上的4个NVIDIA A6000或8个NVIDIA 3090 GPU,并且在5%的运行时间开销下完成。源代码和压缩模型可以在github.com/IST-DASLab/qmoe中下载。
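To make the sub-1-bit claim above concrete, the sketch below quantizes a toy sparse "expert" weight matrix to ternary values and reports the Shannon entropy of the result, a lower bound on the bits per parameter an entropy coder could reach. This only illustrates why highly sparse, low-precision MoE weights can be stored below one bit per parameter; QMoE's actual pipeline (data-dependent quantization, a bespoke storage format, and GPU decoding kernels) is not reproduced here, and the sizes and sparsity level are made-up assumptions.

```python
import numpy as np

def ternary_quantize(w, scale=None):
    """Round weights to {-1, 0, +1} times a single scale, a crude stand-in
    for the data-dependent quantization applied to MoE experts."""
    if scale is None:
        scale = np.mean(np.abs(w)) + 1e-12
    return np.clip(np.round(w / scale), -1, 1).astype(np.int8), scale

def entropy_bits_per_param(q):
    """Shannon entropy of the quantized weights: a lower bound on the average
    bits/parameter achievable with entropy coding."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Toy "expert" weight matrix with ~90% zeros, standing in for a very sparse expert.
w = rng.normal(scale=0.02, size=(4096, 1024)) * (rng.random((4096, 1024)) < 0.1)
q, _ = ternary_quantize(w)
print(f"entropy-coded size ~ {entropy_bits_per_param(q):.2f} bits per parameter")
```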

Learning Independent Program and Architecture Representations for Generalizable Performance Modeling

  • paper_url: http://arxiv.org/abs/2310.16792
  • repo_url: None
  • paper_authors: Lingda Li, Thomas Flynn, Adolfy Hoisie
  • for: 这篇论文提出了一种基于深度学习的性能模型框架,可以学习高维独立/正交的程序和微架构表示。
  • methods: 该框架使用深度学习来学习程序和微架构表示,并可以将程序表示应用于任何微架构,以及将微架构表示应用于任何程序的性能预测。
  • results: 评估表明,PerfVec比前一个方法更通用、高效和准确。
    Abstract This paper proposes PerfVec, a novel deep learning-based performance modeling framework that learns high-dimensional, independent/orthogonal program and microarchitecture representations. Once learned, a program representation can be used to predict its performance on any microarchitecture, and likewise, a microarchitecture representation can be applied in the performance prediction of any program. Additionally, PerfVec yields a foundation model that captures the performance essence of instructions, which can be directly used by developers in numerous performance modeling related tasks without incurring its training cost. The evaluation demonstrates that PerfVec is more general, efficient, and accurate than previous approaches.
    摘要 这篇论文提出了 PerfVec,一种基于深度学习的性能建模框架,可以学习高维、相互独立/正交的程序表示和微架构表示。一旦学习完成,程序表示可用于预测其在任何微架构上的性能;同样,微架构表示也可用于任何程序的性能预测。此外,PerfVec 还产出一个捕捉指令性能本质的基础模型,开发者可以在许多性能建模相关任务中直接使用,而无需承担其训练成本。评估表明,PerfVec 比以往方法更通用、高效和准确。
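A minimal sketch of the "independent program and architecture representations" idea: two separate towers embed program features and microarchitecture parameters, and a bilinear head combines them into a performance prediction, so either embedding can be reused against unseen counterparts. The feature dimensions, layer sizes, and bilinear head are placeholders of my own choosing; PerfVec itself learns instruction-level representations, which this toy does not attempt.

```python
import torch
import torch.nn as nn

class TwoTowerPerfModel(nn.Module):
    """Toy two-tower predictor: one tower embeds a program feature vector,
    the other embeds a microarchitecture configuration, and a bilinear head
    maps the pair to a predicted runtime."""
    def __init__(self, prog_dim, arch_dim, emb_dim=64):
        super().__init__()
        self.prog_tower = nn.Sequential(nn.Linear(prog_dim, 128), nn.ReLU(),
                                        nn.Linear(128, emb_dim))
        self.arch_tower = nn.Sequential(nn.Linear(arch_dim, 128), nn.ReLU(),
                                        nn.Linear(128, emb_dim))
        self.head = nn.Bilinear(emb_dim, emb_dim, 1)

    def forward(self, prog_feats, arch_feats):
        return self.head(self.prog_tower(prog_feats),
                         self.arch_tower(arch_feats)).squeeze(-1)

model = TwoTowerPerfModel(prog_dim=32, arch_dim=16)
pred = model(torch.randn(8, 32), torch.randn(8, 16))  # 8 (program, architecture) pairs
print(pred.shape)  # torch.Size([8])
```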

Covert Planning against Imperfect Observers

  • paper_url: http://arxiv.org/abs/2310.16791
  • repo_url: None
  • paper_authors: Haoxiang Ma, Chongyang Shi, Shuo Han, Michael R. Dorothy, Jie Fu
  • for: 本文研究了如何通过抽象维度和察视者的不准确观测来实现隐蔽计划,以达到最大化任务性能而不被探测。
  • methods: 本文使用了Markov决策过程来模型智能机器和其随机环境之间的互动,并使用偏见函数来捕捉察视者对于隐蔽计划的泄露信息。 我们假设察视者使用假设测试来检测到是否存在异常情况,隐蔽计划的目标是 maximize 折扣回报,同时保持察视者探测到的概率Below a given threshold。
  • results: 我们证明了finite-memory策略比Markovian策略更有力量在隐蔽计划中,然后我们开发了一种基于 primal-dual proximal policy gradient 的方法来计算一个(本地)最优隐蔽策略。我们通过一个柔性网格世界示例来证明我们的方法的有效性,实验结果表明我们的方法可以计算一个不violate detection constraint的策略,同时 empirically 示出了环境噪声对隐蔽策略的影响。
    Abstract Covert planning refers to a class of constrained planning problems where an agent aims to accomplish a task with minimal information leaked to a passive observer to avoid detection. However, existing methods of covert planning often consider deterministic environments or do not exploit the observer's imperfect information. This paper studies how covert planning can leverage the coupling of stochastic dynamics and the observer's imperfect observation to achieve optimal task performance without being detected. Specifically, we employ a Markov decision process to model the interaction between the agent and its stochastic environment, and a partial observation function to capture the leaked information to a passive observer. Assuming the observer employs hypothesis testing to detect if the observation deviates from a nominal policy, the covert planning agent aims to maximize the total discounted reward while keeping the probability of being detected as an adversary below a given threshold. We prove that finite-memory policies are more powerful than Markovian policies in covert planning. Then, we develop a primal-dual proximal policy gradient method with a two-time-scale update to compute a (locally) optimal covert policy. We demonstrate the effectiveness of our methods using a stochastic gridworld example. Our experimental results illustrate that the proposed method computes a policy that maximizes the adversary's expected reward without violating the detection constraint, and empirically demonstrates how the environmental noises can influence the performance of the covert policies.
    摘要 隐蔽规划(covert planning)是一类受约束的规划问题:智能体试图完成任务,同时尽量少地向被动观察者泄露信息以避免被探测。然而,现有的隐蔽规划方法通常假设确定性环境,或者没有利用观察者观测不完全这一事实。本文研究如何结合随机动力学与观察者的不完全观测,在不被探测的前提下实现最优的任务性能。
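The constrained objective described above (maximize discounted return while keeping the detection probability below a threshold) can be handled with a primal-dual scheme, sketched below under the assumption that policy-gradient estimates of both the return and the detection probability are available. This is a schematic single-time-scale update, not the paper's two-time-scale primal-dual proximal policy gradient method.

```python
import numpy as np

def primal_dual_step(theta, lam, grad_return, grad_detect, detect_prob,
                     eps=0.1, lr_theta=1e-2, lr_lam=1e-1):
    """Ascend the Lagrangian E[return] - lam * (P(detected) - eps) in the
    policy parameters, then take a dual ascent step on the multiplier,
    which grows while the detection constraint is violated."""
    theta = theta + lr_theta * (grad_return - lam * grad_detect)
    lam = max(0.0, lam + lr_lam * (detect_prob - eps))
    return theta, lam

theta, lam = np.zeros(4), 0.0
theta, lam = primal_dual_step(theta, lam,
                              grad_return=np.full(4, 0.5),
                              grad_detect=np.full(4, 0.2),
                              detect_prob=0.15)
```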

The Simplest Inflationary Potentials

  • paper_url: http://arxiv.org/abs/2310.16786
  • repo_url: None
  • paper_authors: Tomás Sousa, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira
  • for: 这个论文是为了研究早期宇宙的Inflation理论,并且与现有的cosmic microwave background和大规模结构观察相Compatible。
  • methods: 这个论文使用了一种新的符号重回归法,生成所有可能的简单演算符潜在性。
  • results: 这个论文通过使用信息理论度量(“最小描述长度”)评估这些模型是否能够压缩Planck数据中的信息,并explored两种不同的假设空间中的参数。
    Abstract Inflation is a highly favoured theory for the early Universe. It is compatible with current observations of the cosmic microwave background and large scale structure and is a driver in the quest to detect primordial gravitational waves. It is also, given the current quality of the data, highly under-determined with a large number of candidate implementations. We use a new method in symbolic regression to generate all possible simple scalar field potentials for one of two possible basis sets of operators. Treating these as single-field, slow-roll inflationary models we then score them with an information-theoretic metric ("minimum description length") that quantifies their efficiency in compressing the information in the Planck data. We explore two possible priors on the parameter space of potentials, one related to the functions' structural complexity and one that uses a Katz back-off language model to prefer functions that may be theoretically motivated. This enables us to identify the inflaton potentials that optimally balance simplicity with accuracy at explaining the Planck data, which may subsequently find theoretical motivation. Our exploratory study opens the door to extraction of fundamental physics directly from data, and may be augmented with more refined theoretical priors in the quest for a complete understanding of the early Universe.
    摘要 inflation是 Early Universe 的非常受欢迎理论。它与 cosmic microwave background 和 large scale structure 的现有观察结果相Compatible,并且是探测primordial gravitational waves的 Driver。然而,由于数据质量的限制,这个理论目前处于高度不确定的状态,有许多候选的实现方式。我们使用新的 symbolic regression 方法生成所有可能的简单Scalar field potentials,然后使用信息理论度量("最小描述长度")对这些模型进行评分。我们使用两种不同的 prior 在 potential space 中,一种是函数的结构复杂度,另一种是使用 Katz back-off language model 来 preference functions ,这些函数可能具有理论导向性。这些 inflaton potentials 可以最优地平衡简洁性和准确性,从而描述 Planck 数据,并可能找到理论上的支持。我们的探索性研究可以直接从数据中提取基本物理学,并可能与更加精细的理论假设相结合,以完全理解 Early Universe。
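The "minimum description length" ranking mentioned above trades data fit against model complexity: a candidate potential is scored by the bits needed to encode the data residuals plus the bits needed to encode the functional form and its fitted constants. The sketch below is a generic two-part codelength of that shape, with an arbitrary encoding for structure and parameters; the paper's exact codelength and priors (including the Katz back-off language-model prior) differ.

```python
import numpy as np

def description_length(neg_log_likelihood, param_values, n_operators, n_data):
    """Generic two-part MDL score: data codelength plus a BIC-style parameter
    cost and a crude structure cost. Lower is better."""
    data_bits = neg_log_likelihood / np.log(2.0)          # -log2 L(data | model)
    param_bits = 0.5 * len(param_values) * np.log2(n_data)
    structure_bits = n_operators * np.log2(8)             # assume ~8 possible operators
    return data_bits + param_bits + structure_bits

# Example: a 2-parameter potential built from 10 operators vs a 5-parameter one from 25.
print(description_length(1500.0, [0.1, 2.3], n_operators=10, n_data=2000))
print(description_length(1450.0, [0.1, 2.3, -1.0, 4.2, 0.7], n_operators=25, n_data=2000))
```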

Simple, Scalable and Effective Clustering via One-Dimensional Projections

  • paper_url: http://arxiv.org/abs/2310.16752
  • repo_url: https://github.com/boredoms/prone
  • paper_authors: Moses Charikar, Monika Henzinger, Lunjia Hu, Maxmilian Vötsch, Erik Waingarten
  • for: 这个论文的目的是提出一种基于随机sampling的归一化算法,以提高 clustering 算法的运行时间和准确性。
  • methods: 该算法使用了随机扫描和简单的排序算法,并且使用了一种新的评价函数来评价每个分区的质量。
  • results: 研究人员通过 theoretical 分析和实验 validate 了该算法的正确性和精度,并且发现该算法可以提供一个新的平衡点 между运行时间和分区质量。
    Abstract Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$. Here $\mathrm{nnz}(X)$ is the total number of non-zero entries in the input dataset $X$, which is upper bounded by $nd$ and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio $\smash{\widetilde{O}(k^4)}$ on any input dataset for the $k$-means objective. We also believe that our theoretical analysis is of independent interest, as we show that the approximation ratio of a $k$-means algorithm is approximately preserved under a class of projections and that $k$-means++ seeding can be implemented in expected $O(n \log n)$ time in one dimension. Finally, we show experimentally that our clustering algorithm gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks.
    摘要 “集群是机器学习中的基本问题,具有许多应用在数据分析中。受欢迎的集群算法如条 Lloyd 算法和 $k$-means++ 可能需要 $\Omega(ndk)$ 时间来将 $n$ 个点在 $d$-dimensional 空间中集结成 $k$ 个集群。在实际应用中,$k$ 的乘积因子可能会很昂贵。我们介绍了一个简单的随机集群算法,其预期时间复杂度为 $O(\text{nnz}(X) + n\log n)$,其中 $\text{nnz}(X)$ 是输入数据集 $X$ 中非零元素的总数,最多等于 $nd$,并且可能较小 для稀疏数据集。我们证明了我们的算法对任何输入数据集都有 $\smash{\widetilde{O}(k^4)}$ 的近似比率,并且我们还证明了这个近似比率在某些投影下保持不变。此外,我们还显示了在一维中可以实现 $k$-means++ 种子生成的预期时间为 $O(n \log n)$。最后,我们通过实验显示了我们的集群算法对前state-of-the-art方法的新的时间负载与集群质量之间的交换。”
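A simplified sketch of the one-dimensional-projection idea behind the algorithm above: project the points onto a single random direction, run k-means++-style D^2 seeding on the resulting scalar values, and assign each point to its nearest seed. The naive seeding loop here is O(nk) rather than the O(n log n) one-dimensional implementation the paper analyses, and none of the approximation guarantees are reproduced; it only illustrates how much clustering signal a single projection can retain.

```python
import numpy as np

def one_dim_kmeanspp(X, k, seed=0):
    """Project onto one random direction, pick k seeds by D^2 sampling on the
    1-D projections, and assign points to the nearest seed."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=X.shape[1])
    proj = X @ (v / np.linalg.norm(v))            # one scalar per point
    seeds = [proj[rng.integers(len(proj))]]
    for _ in range(k - 1):
        d2 = np.min([(proj - s) ** 2 for s in seeds], axis=0)
        seeds.append(proj[rng.choice(len(proj), p=d2 / d2.sum())])
    labels = np.argmin(np.abs(proj[:, None] - np.asarray(seeds)[None, :]), axis=1)
    return np.asarray(seeds), labels

X = np.random.default_rng(1).normal(size=(500, 20))
seeds, labels = one_dim_kmeanspp(X, k=5)
```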

Stochastic Latent Transformer: Efficient Modelling of Stochastically Forced Zonal Jets

  • paper_url: http://arxiv.org/abs/2310.16741
  • repo_url: https://github.com/ira-shokar/stochastic_latent_transformer
  • paper_authors: Ira J. S. Shokar, Rich R. Kerswell, Peter H. Haynes
  • for: 用于生成大量 ensemble,以便研究流体动力系统中的统计问题,如自发转换事件的概率。
  • methods: 使用 ‘Stochastic Latent Transformer’ 深度学习模型,其包括带有随机冲击的 transformer 和 translate-equivariant autoencoder,可以复制系统动力 across various integration periods。
  • results: 对于已知的zonal jet系统,使用这种模型可以 achieved a five-order-of-magnitude speedup compared to numerical integration,以便生成大量 ensemble,用于研究流体动力系统中的统计问题。
    Abstract We introduce the 'Stochastic Latent Transformer', a probabilistic deep learning approach for efficient reduced-order modelling of stochastic partial differential equations (SPDEs). Despite recent advances in deep learning for fluid mechanics, limited research has explored modelling stochastically driven flows - which play a crucial role in understanding a broad spectrum of phenomena, from jets on giant planets to ocean circulation and the variability of midlatitude weather. The model architecture consists of a stochastically-forced transformer, paired with a translation-equivariant autoencoder, that we demonstrate is capable of reproducing system dynamics across various integration periods. We demonstrate its effectiveness applied to a well-researched zonal jet system, with the neural network achieving a five-order-of-magnitude speedup compared to numerical integration. This facilitates the cost-effective generation of large ensembles, enabling the exploration of statistical questions concerning probabilities of spontaneous transition events.

MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16730
  • repo_url: None
  • paper_authors: Dong-Ki Kim, Sungryull Sohn, Lajanugen Logeswaran, Dongsub Shim, Honglak Lee
  • for: 这篇论文主要针对Automated prompt optimization based on reinforcement learning (RL) 的问题,即如何使用RL来优化提示,以生成可读的提示和黑盒基础模型兼容。
  • methods: 这篇论文提出了一种新的MultiPrompter框架,视提示优化为合作游戏,在提示组合过程中,多个提示工作 вместе,以降低问题大小,并帮助提示学习优化提示。
  • results: 测试在文本到图像任务中,MultiPrompter方法可以生成高质量的图像,比基eline表现更好。
    Abstract Recently, there has been an increasing interest in automated prompt optimization based on reinforcement learning (RL). This approach offers important advantages, such as generating interpretable prompts and being compatible with black-box foundation models. However, the substantial prompt space size poses challenges for RL-based methods, often leading to suboptimal policy convergence. This paper introduces MultiPrompter, a new framework that views prompt optimization as a cooperative game between prompters which take turns composing a prompt together. Our cooperative prompt optimization effectively reduces the problem size and helps prompters learn optimal prompts. We test our method on the text-to-image task and show its ability to generate higher-quality images than baselines.
    摘要 近期,基于强化学习(RL)的自动化提示优化受到越来越多的关注。这种方法具有许多优点,例如生成可解释的提示,并且与黑盒基础模型兼容。然而,巨大的提示空间给基于 RL 的方法带来了挑战,常常导致策略收敛到次优解。本文提出了 MultiPrompter,一个将提示优化视为多个提示者轮流共同撰写提示的合作博弈的新框架。我们的合作提示优化方法可以有效减小问题规模,并帮助提示者学习最优提示。我们在文本生成图像任务上测试了该方法,结果显示它能生成比基线质量更高的图像。

AI Hazard Management: A framework for the systematic management of root causes for AI risks

  • paper_url: http://arxiv.org/abs/2310.16727
  • repo_url: None
  • paper_authors: Ronald Schnitzer, Andreas Hapfelmeier, Sven Gaube, Sonja Zillner
  • for: This paper aims to provide a structured process for identifying, assessing, and treating risks associated with Artificial Intelligence (AI) systems, called AI Hazard Management (AIHM) framework.
  • methods: The proposed AIHM framework is based on a comprehensive state-of-the-art analysis of AI hazards and provides a systematic approach to identify, assess, and treat AI hazards in parallel with the development of AI systems.
  • results: The proposed framework can increase the overall quality of AI systems by systematically reducing the impact of identified hazards to an acceptable level, and provides a taxonomy to support the optimal treatment of identified AI hazards. Additionally, the framework ensures the auditability of AI systems by systematically documenting evidence that the potential impact of identified AI hazards could be reduced to a tolerable level.
    Abstract Recent advancements in the field of Artificial Intelligence (AI) establish the basis to address challenging tasks. However, with the integration of AI, new risks arise. Therefore, to benefit from its advantages, it is essential to adequately handle the risks associated with AI. Existing risk management processes in related fields, such as software systems, need to sufficiently consider the specifics of AI. A key challenge is to systematically and transparently identify and address AI risks' root causes - also called AI hazards. This paper introduces the AI Hazard Management (AIHM) framework, which provides a structured process to systematically identify, assess, and treat AI hazards. The proposed process is conducted in parallel with the development to ensure that any AI hazard is captured at the earliest possible stage of the AI system's life cycle. In addition, to ensure the AI system's auditability, the proposed framework systematically documents evidence that the potential impact of identified AI hazards could be reduced to a tolerable level. The framework builds upon an AI hazard list from a comprehensive state-of-the-art analysis. Also, we provide a taxonomy that supports the optimal treatment of the identified AI hazards. Additionally, we illustrate how the AIHM framework can increase the overall quality of a power grid AI use case by systematically reducing the impact of identified hazards to an acceptable level.
    摘要 最新的人工智能(AI)技术发展提供了解决复杂问题的基础。然而,通过AI的应用,新的风险也出现了。因此,为了获得其优势,需要有效地处理AI中的风险。相关领域的现有风险管理过程,如软件系统,需要足够考虑AI的特点。关键问题是系统地和透明地识别和解决AI风险的根本原因,也称为AI危险。本文介绍了AI危险管理(AIHM)框架,该框架提供了一种结构化的过程,系统地识别、评估和治理AI风险。提议的过程与开发同步进行,以确保在AI系统的生命周期中最早 posible时间捕捉到任何AI风险。此外,为确保AI系统的审核性,提议的框架系统地记录证明已经识别的AI风险可以降低到可接受水平的证据。框架基于AI风险列表,从全面的现状分析中综合获得。此外,我们还提供了一种支持优化治理认知的分类。此外,我们示例了如何通过AIHM框架在电力网络AI应用中系统地减少识别的风险影响,使其达到可接受的水平。

Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference

  • paper_url: http://arxiv.org/abs/2310.16705
  • repo_url: None
  • paper_authors: Dai Hai Nguyen, Tetsuya Sakurai, Hiroshi Mamitsuka
  • for: 这个论文的目的是提出一种基于 Wasserstein 梯度下降的变分推断(VI)优化方法,用于减少变分推断的优化问题中的梯度下降过程中的计算复杂度。
  • methods: 本论文使用的方法包括变分推断(VI)和 Wasserstein 梯度下降(WGD),以及一些实用的数值方法来解决变分推断中的离散梯度流问题。
  • results: 经验测试和理论分析表明,提出的方法可以有效地提高变分推断的优化效果,并且可以视为现有的黑盒子变分推断(VB)和自然变分推断(NVB)的特例。
    Abstract Variational inference (VI) can be cast as an optimization problem in which the variational parameters are tuned to closely align a variational distribution with the true posterior. The optimization task can be approached through vanilla gradient descent in black-box VI or natural-gradient descent in natural-gradient VI. In this work, we reframe VI as the optimization of an objective that concerns probability distributions defined over a \textit{variational parameter space}. Subsequently, we propose Wasserstein gradient descent for tackling this optimization problem. Notably, the optimization techniques, namely black-box VI and natural-gradient VI, can be reinterpreted as specific instances of the proposed Wasserstein gradient descent. To enhance the efficiency of optimization, we develop practical methods for numerically solving the discrete gradient flows. We validate the effectiveness of the proposed methods through empirical experiments on a synthetic dataset, supplemented by theoretical analyses.
    摘要 变分推断(Variational Inference, VI)可以看作一个优化问题:调整变分参数,使变分分布尽可能接近真实后验。这个优化任务可以通过黑盒 VI 中的普通梯度下降或自然梯度 VI 中的自然梯度下降来求解。在本文中,我们将 VI 重新表述为定义在变分参数空间上的概率分布的优化问题,并提出用 Wasserstein 梯度下降来求解。值得注意的是,黑盒 VI 和自然梯度 VI 这两类优化技术都可以被视为所提出的 Wasserstein 梯度下降的特例。为了提高优化效率,我们开发了数值求解离散梯度流的实用方法。我们通过在合成数据集上的实验以及理论分析验证了所提方法的有效性。
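As a point of reference for the special case mentioned above, the sketch below performs plain black-box-VI steps for a diagonal-Gaussian variational family using the reparameterisation trick. The paper's contribution is to view such updates as a Wasserstein gradient flow over distributions on the variational-parameter space; that flow itself is not implemented here, and the toy target posterior is an assumption of mine.

```python
import torch

def elbo_grad_step(mu, log_sigma, log_joint, lr=1e-2, n_samples=16):
    """One reparameterised black-box-VI ascent step for q(z) = N(mu, sigma^2)."""
    mu = mu.clone().requires_grad_(True)
    log_sigma = log_sigma.clone().requires_grad_(True)
    eps = torch.randn(n_samples, mu.shape[0])
    z = mu + log_sigma.exp() * eps                      # reparameterisation trick
    entropy = log_sigma.sum()                           # Gaussian entropy up to a constant
    elbo = log_joint(z).mean() + entropy
    elbo.backward()
    with torch.no_grad():
        mu += lr * mu.grad
        log_sigma += lr * log_sigma.grad
    return mu.detach(), log_sigma.detach()

target_log_joint = lambda z: -0.5 * (z ** 2).sum(-1)    # toy standard-normal posterior
mu, log_sigma = torch.ones(2), torch.zeros(2)
for _ in range(200):
    mu, log_sigma = elbo_grad_step(mu, log_sigma, target_log_joint)
```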

Interpretable time series neural representation for classification purposes

  • paper_url: http://arxiv.org/abs/2310.16696
  • repo_url: None
  • paper_authors: Etienne Le Naour, Ghislain Agoua, Nicolas Baskiotis, Vincent Guigue
  • for: 本研究旨在提出一种可解释性强的神经网络模型,用于解决现有的时间序列数据表示方法缺乏可解释性的问题。
  • methods: 本研究提出了一组可解释性神经网络模型的需求,并提出了一种新的无监督神经网络架构,可以满足这些需求。该模型通过独立学习无下渠任务来学习,以确保其robustness。
  • results: 在使用UCRC存档 datasets进行分类任务的实验中,提出的模型比其他可解释性模型和现有神经网络表示学习模型得到更好的结果,而且在多个 dataset 上得到了平均更好的结果。此外,我们还进行了质量实验来评估该方法的可解释性。
    Abstract Deep learning has made significant advances in creating efficient representations of time series data by automatically identifying complex patterns. However, these approaches lack interpretability, as the time series is transformed into a latent vector that is not easily interpretable. On the other hand, Symbolic Aggregate approximation (SAX) methods allow the creation of symbolic representations that can be interpreted but do not capture complex patterns effectively. In this work, we propose a set of requirements for a neural representation of univariate time series to be interpretable. We propose a new unsupervised neural architecture that meets these requirements. The proposed model produces consistent, discrete, interpretable, and visualizable representations. The model is learned independently of any downstream tasks in an unsupervised setting to ensure robustness. As a demonstration of the effectiveness of the proposed model, we propose experiments on classification tasks using UCR archive datasets. The obtained results are extensively compared to other interpretable models and state-of-the-art neural representation learning models. The experiments show that the proposed model yields, on average better results than other interpretable approaches on multiple datasets. We also present qualitative experiments to asses the interpretability of the approach.

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

  • paper_url: http://arxiv.org/abs/2310.17816
  • repo_url: None
  • paper_authors: Jacqueline Maasch, Weishen Pan, Shantanu Gupta, Volodymyr Kuleshov, Kyra Gan, Fei Wang
  • for: 该研究是为了解决自动选择 covariate 的问题,即在具有限制的先验知识的情况下。
  • methods: 该研究使用了 Local Discovery by Partitioning(LDP)算法,该算法可以将变量集 Z partitioned 到与 exposure-outcome 对 {X,Y} 相关的不同 subset 中。该算法是基于有效地调整集的标准化,但不需要先做很多先验知识。
  • results: 该研究提供了对于任何 Z 的有效的 adjustment set,并且可以确保这些 adjustment set 是有效的。在更强的条件下,研究表明了 partition label 是 asymptotically correct。total independence tests 的时间复杂度是 quadratic 的,但是在实际测试中,可以观察到 quadratic 的时间复杂度。与基eline 相比,LDP 可以更好地回归 confounder 和更准确地估计 average treatment effect。
    Abstract This work addresses the problem of automated covariate selection under limited prior knowledge. Given an exposure-outcome pair {X,Y} and a variable set Z of unknown causal structure, the Local Discovery by Partitioning (LDP) algorithm partitions Z into subsets defined by their relation to {X,Y}. We enumerate eight exhaustive and mutually exclusive partitions of any arbitrary Z and leverage this taxonomy to differentiate confounders from other variable types. LDP is motivated by valid adjustment set identification, but avoids the pretreatment assumption commonly made by automated covariate selection methods. We provide theoretical guarantees that LDP returns a valid adjustment set for any Z that meets sufficient graphical conditions. Under stronger conditions, we prove that partition labels are asymptotically correct. Total independence tests is worst-case quadratic in |Z|, with sub-quadratic runtimes observed empirically. We numerically validate our theoretical guarantees on synthetic and semi-synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baselines, with LDP outperforming on confounder recall, test count, and runtime for valid adjustment set discovery.
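A rough illustration of the partitioning idea: each candidate variable in Z is labelled by how it relates to the exposure-outcome pair {X, Y}, here using only marginal Pearson-correlation tests. LDP's real taxonomy has eight mutually exclusive subsets defined through conditional independence tests and comes with validity guarantees; the coarse four-way labelling below is my own simplification for intuition.

```python
import numpy as np
from scipy import stats

def marginal_dependence(a, b, alpha=0.05):
    """Pearson-correlation surrogate for a marginal (in)dependence test."""
    _, p = stats.pearsonr(a, b)
    return p < alpha

def coarse_partition(Z, x, y, alpha=0.05):
    """Assign each candidate covariate a coarse label from its marginal
    dependence pattern with the exposure x and outcome y."""
    labels = []
    for j in range(Z.shape[1]):
        dep_x = marginal_dependence(Z[:, j], x, alpha)
        dep_y = marginal_dependence(Z[:, j], y, alpha)
        if dep_x and dep_y:
            labels.append("possible confounder")
        elif dep_x:
            labels.append("exposure-only")
        elif dep_y:
            labels.append("outcome-only (precision variable)")
        else:
            labels.append("likely irrelevant")
    return labels

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)
Z = np.column_stack([x + rng.normal(size=500), rng.normal(size=500)])
print(coarse_partition(Z, x, y))
```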

Learning-based adaption of robotic friction models

  • paper_url: http://arxiv.org/abs/2310.16688
  • repo_url: None
  • paper_authors: Philipp Scholl, Maged Iskandar, Sebastian Wolf, Jinoh Lee, Aras Bacho, Alexander Dietrich, Alin Albu-Schäffer, Gitta Kutyniok
  • for: This paper aims to address the challenge of modeling friction torque in robotic joints, which is a longstanding problem due to the lack of a good mathematical description.
  • methods: The authors propose a novel approach based on residual learning to adapt an existing friction model to new dynamics using as little data as possible. They use a base neural network to learn an accurate relation between velocity and friction torque, and then train a second network to predict the residual of the initial network’s output.
  • results: The authors demonstrate that their proposed estimator outperforms the conventional model-based approach and the base neural network significantly, with an approximately 60-70% improvement in trajectory tracking accuracy. They also show that their method can adapt to diverse scenarios based on prior knowledge about friction in different settings, using only 43 seconds of robot movement data.
    Abstract In the Fourth Industrial Revolution, wherein artificial intelligence and the automation of machines occupy a central role, the deployment of robots is indispensable. However, the manufacturing process using robots, especially in collaboration with humans, is highly intricate. In particular, modeling the friction torque in robotic joints is a longstanding problem due to the lack of a good mathematical description. This motivates the usage of data-driven methods in recent works. However, model-based and data-driven models often exhibit limitations in their ability to generalize beyond the specific dynamics they were trained on, as we demonstrate in this paper. To address this challenge, we introduce a novel approach based on residual learning, which aims to adapt an existing friction model to new dynamics using as little data as possible. We validate our approach by training a base neural network on a symmetric friction data set to learn an accurate relation between the velocity and the friction torque. Subsequently, to adapt to more complex asymmetric settings, we train a second network on a small dataset, focusing on predicting the residual of the initial network's output. By combining the output of both networks in a suitable manner, our proposed estimator outperforms the conventional model-based approach and the base neural network significantly. Furthermore, we evaluate our method on trajectories involving external loads and still observe a substantial improvement, approximately 60-70\%, over the conventional approach. Our method does not rely on data with external load during training, eliminating the need for external torque sensors. This demonstrates the generalization capability of our approach, even with a small amount of data-only 43 seconds of a robot movement-enabling adaptation to diverse scenarios based on prior knowledge about friction in different settings.
    摘要 在第四次工业革命中,人工智能和机器自动化占据核心地位,机器人的部署不可或缺。然而,使用机器人的生产过程,尤其是与人协作时,非常复杂。特别是机器人关节摩擦力矩的建模是一个长期存在的难题,因为缺乏良好的数学描述,这促使近期工作采用数据驱动方法。然而,正如我们在本文中所展示的,基于模型的方法和数据驱动模型在泛化到训练所用动力学之外时往往存在局限。为了解决这一挑战,我们提出了一种基于残差学习的新方法,旨在用尽可能少的数据使已有摩擦模型适应新的动力学。我们首先在对称摩擦数据集上训练一个基础神经网络,学习速度与摩擦力矩之间的准确关系;随后,为了适应更复杂的非对称情形,我们在一个小数据集上训练第二个网络,专门预测初始网络输出的残差。通过合理组合两个网络的输出,所提出的估计器显著优于传统的基于模型的方法和基础神经网络。此外,我们在包含外部负载的轨迹上评估了该方法,仍观察到约 60-70% 的提升。我们的方法在训练时不依赖带外部负载的数据,因而无需外部力矩传感器。这表明即使只用 43 秒的机器人运动数据,该方法也能凭借对不同场景下摩擦的先验知识实现泛化与适应。
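A minimal sketch of the residual-learning recipe described above: a base velocity-to-friction-torque network is kept frozen, and a small second network is fitted on a short burst of data from the new regime to predict only the residual of the base model's output. The architectures, the synthetic "measured" torque, and the training loop are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Base velocity -> friction-torque model (would normally be pre-trained) and a
# small residual network that adapts it to a new regime from little data.
base = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
residual = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))

def friction_torque(velocity):
    return base(velocity) + residual(velocity)   # combined estimator

for p in base.parameters():                      # freeze the base model
    p.requires_grad_(False)

opt = torch.optim.Adam(residual.parameters(), lr=1e-3)
v = torch.linspace(-1, 1, 200).unsqueeze(1)
tau_measured = 0.3 * torch.sign(v) + 0.1 * v + 0.05 * torch.randn_like(v)  # fake new-regime data
for _ in range(200):
    opt.zero_grad()
    loss = ((friction_torque(v) - tau_measured) ** 2).mean()
    loss.backward()
    opt.step()
```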

Robust and Actively Secure Serverless Collaborative Learning

  • paper_url: http://arxiv.org/abs/2310.16678
  • repo_url: None
  • paper_authors: Olive Franzese, Adam Dziedzic, Christopher A. Choquette-Choo, Mark R. Thomas, Muhammad Ahmad Kaleem, Stephan Rabanser, Congyu Fang, Somesh Jha, Nicolas Papernot, Xiao Wang
  • for: 本研究旨在提供一种安全和可靠的分布式机器学习(Collaborative Machine Learning)方法,以保护客户端数据点免受服务器或客户端的攻击。
  • methods: 本研究使用了分布式机器学习(Distributed Machine Learning)的方法,并提出了一种安全和可靠的peer-to-peer(P2P)学习方案,可以防止服务器和客户端的不可靠行为。
  • results: 研究表明,该方法可以在1000个参与者的情况下,对1000万参数的模型进行训练,并且可以防止服务器和客户端的攻击。
    Abstract Collaborative machine learning (ML) is widely used to enable institutions to learn better models from distributed data. While collaborative approaches to learning intuitively protect user data, they remain vulnerable to either the server, the clients, or both, deviating from the protocol. Indeed, because the protocol is asymmetric, a malicious server can abuse its power to reconstruct client data points. Conversely, malicious clients can corrupt learning with malicious updates. Thus, both clients and servers require a guarantee when the other cannot be trusted to fully cooperate. In this work, we propose a peer-to-peer (P2P) learning scheme that is secure against malicious servers and robust to malicious clients. Our core contribution is a generic framework that transforms any (compatible) algorithm for robust aggregation of model updates to the setting where servers and clients can act maliciously. Finally, we demonstrate the computational efficiency of our approach even with 1-million parameter models trained by 100s of peers on standard datasets.
    摘要 共同机器学习(ML)广泛应用于各institution以获得更好的模型,从分布式数据中学习。而共同approach to learning Intuitively保护用户数据,但它们仍然易受到服务器、客户端或两者都 deviation from the protocol。实际上,因为协议是非对称的,一个恶意服务器可以利用其权力重construct客户端数据点。相反,恶意客户端可以腐化学习 mediante malicious updates。因此,客户端和服务器都需要一个保证,当另一方不能完全合作时。在这项工作中,我们提议一种分布式学习方案,安全于恶意服务器和对客户端腐化。我们的核心贡献是一种可generic framework,将任何(相容)算法 дляRobust Aggregation of model updates transform into setting where servers and clients can act maliciously。最后,我们证明了我们的方法的计算效率, même avec 1000000参数模型由100个同仁在标准 datasets上训练。

UAV Pathfinding in Dynamic Obstacle Avoidance with Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16659
  • repo_url: None
  • paper_authors: Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lv
  • for: Solve the dynamic obstacle avoidance problem online for multi-agent systems.
  • methods: Centralized training with decentralized execution based on multi-agent reinforcement learning, with improved model predictive control for efficiency and sample utilization.
  • results: Experimental results in simulation, indoor, and outdoor environments validate the effectiveness of the proposed method; video available at https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf
    Abstract Multi-agent reinforcement learning based methods are significant for online planning of feasible and safe paths for agents in dynamic and uncertain scenarios. Although some methods like fully centralized and fully decentralized methods achieve a certain measure of success, they also encounter problems such as dimension explosion and poor convergence, respectively. In this paper, we propose a novel centralized training with decentralized execution method based on multi-agent reinforcement learning to solve the dynamic obstacle avoidance problem online. In this approach, each agent communicates only with the central planner or only with its neighbors, respectively, to plan feasible and safe paths online. We improve our methods based on the idea of model predictive control to increase the training efficiency and sample utilization of agents. The experimental results in both simulation, indoor, and outdoor environments validate the effectiveness of our method. The video is available at https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf
    摘要 多智能体学习基于方法在线规划可行安全的路径 для智能体在动态不确定的enario中是非常重要的。尽管有些方法,如完全中央化和完全分布式方法,在某种程度上达到了成功,但它们也遇到了维度爆发和优化问题,分别。在这篇论文中,我们提出了一种新的中央训练与分布式执行方法,基于多智能体学习来解决动态障碍避免问题在线。在这种方法中,每个智能体只与中央规划器或只与其他邻居进行交流,以在线规划可行安全的路径。我们通过模型预测控制的想法来提高我们的方法的训练效率和智能体的样本利用率。实验结果表明,我们的方法在 simulate、indoor和outdoor环境中具有效果。视频可以在https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf找到。

Towards Control-Centric Representations in Reinforcement Learning from Images

  • paper_url: http://arxiv.org/abs/2310.16655
  • repo_url: None
  • paper_authors: Chen Liu, Hongyu Zang, Xin Li, Yong Heng, Yifei Wang, Zhen Fang, Yisen Wang, Mingzhong Wang
  • for: 解决图像基于奖励学习中的实用性和挑战性问题
  • methods: integrate reward-free control information和奖励特定知识,使用 transformer 架构模型动力学,并使用块级卷积来消除时空重复信息
  • results: 在 Atari 游戏和 DeepMind Control Suit 两大标准 bencmark 上表现出色,比现有方法有更高的性能,证明其有效性
    Abstract Image-based Reinforcement Learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle exhibit the potential in learning state representations to address this issue, they still grapple with the limited expressive capacity of latent dynamics and the inadaptability to sparse reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines bisimulation-based loss with asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, including Atari games and DeepMind Control Suit, demonstrate that ReBis has superior performance compared to existing methods, proving its effectiveness.
    摘要 图像基于的再强化学习是一项实用又挑战性的任务。主要难点在于提取控制中心的表示,而不考虑无关的信息。尽管遵循 bisimulation 原理的方法表现出了学习状态表示的潜在力,但它们仍然面临着卷积动力的有限表达能力和缺乏适应环境中的稀薄奖励的问题。为解决这些限制,我们介绍了 ReBis,它通过结合奖励free控制信息和奖励特有的知识来捕捉控制中心的信息。ReBis 使用 transformer 架构来隐式地模型动力学,并在块级别屏蔽中除掉空间时间的重复性。此外,ReBis 结合 bisimulation 基于的损失与不对称重建loss 来避免特征塌沦的问题。实验研究在 Atari 游戏和 DeepMind Control Suit 两个大 benchmark 上表明,ReBis 与现有方法相比有较高的性能,证明其效果。

  • paper_url: http://arxiv.org/abs/2310.16652
  • repo_url: None
  • paper_authors: Linping Qu, Shenghui Song, Chi-Ying Tsui, Yuyi Mao
  • for: 这篇论文探讨了 federated learning(FL)在无线网络上的稳定性,具体来说,它研究了 FL 对于上行和下行通信错误的抗预测性。
  • methods: 该论文使用了理论分析方法,探讨了 FL 中两个关键参数(客户端数量和模型参数范围)对稳定性的影响,并提出了一个公式来量化上行和下行通信错误之间的差异。
  • results: 研究发现,FL 在上行通信错误情况下可以忍受更高的比特错误率(BER),并且与客户端数量和模型参数范围有关。这些结论得到了实验 validate。
    Abstract Because of its privacy-preserving capability, federated learning (FL) has attracted significant attention from both academia and industry. However, when being implemented over wireless networks, it is not clear how much communication error can be tolerated by FL. This paper investigates the robustness of FL to the uplink and downlink communication error. Our theoretical analysis reveals that the robustness depends on two critical parameters, namely the number of clients and the numerical range of model parameters. It is also shown that the uplink communication in FL can tolerate a higher bit error rate (BER) than downlink communication, and this difference is quantified by a proposed formula. The findings and theoretical analyses are further validated by extensive experiments.
    摘要 因其隐私保护能力,联邦学习(FL)已经吸引了学术界和产业界的广泛关注。然而,在无线网络上实现 FL 时,尚不清楚 FL 能容忍多大的通信错误。本文研究了 FL 对上行和下行通信错误的鲁棒性。我们的理论分析表明,这种鲁棒性取决于两个关键参数:客户端数量和模型参数的数值范围。此外,我们还发现 FL 的上行通信可以容忍比下行通信更高的比特错误率(BER),并提出了一个公式来量化这一差异。大量实验进一步验证了这些发现和理论分析。

Posterior Consistency for Missing Data in Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2310.16648
  • repo_url: None
  • paper_authors: Timur Sudak, Sebastian Tschiatschek
  • for: 学习带有缺失数据的变量自动机(VAEs),以提高变量自动机的权重平衡和数据填充能力。
  • methods: 提出了一种 posterior consistency 定义和规范,以及一种基于这种定义的激活函数正则化方法,以促进变量自动机的后验分布的一致性。
  • results: 通过实验表明,该正则化方法可以提高缺失数据下的变量自动机的表现,包括增加了数据重建质量和下游任务中使用uncertainty的能力。此外,该方法可以在不同类型的 VAEs 中实现改进表现,包括 VAEs 携带流形。
    Abstract We consider the problem of learning Variational Autoencoders (VAEs), i.e., a type of deep generative model, from data with missing values. Such data is omnipresent in real-world applications of machine learning because complete data is often impossible or too costly to obtain. We particularly focus on improving a VAE's amortized posterior inference, i.e., the encoder, which in the case of missing data can be susceptible to learning inconsistent posterior distributions regarding the missingness. To this end, we provide a formal definition of posterior consistency and propose an approach for regularizing an encoder's posterior distribution which promotes this consistency. We observe that the proposed regularization suggests a different training objective than that typically considered in the literature when facing missing values. Furthermore, we empirically demonstrate that our regularization leads to improved performance in missing value settings in terms of reconstruction quality and downstream tasks utilizing uncertainty in the latent space. This improved performance can be observed for many classes of VAEs including VAEs equipped with normalizing flows.
    摘要 我团队考虑了使用 Variational Autoencoders (VAEs) 学习,即深度生成模型,从数据中缺失值处理。这种数据在实际机器学习应用中很普遍,因为完整的数据往往是不可逾或者太Costly来获得的。我们特别关注在缺失数据中改进 VAE 的权重平均推理,即encoder,因为在缺失数据中,encoder可能会学习不一致的 posterior 分布。为此,我们提出了 posterior 一致性的正式定义,并提议一种对 encoder 的 posterior 分布进行规范化,以便促进一致性。我们发现,我们的规范化建议一个与常见在缺失值情况下考虑的培训目标不同的训练目标。此外,我们在实验中观察到,我们的规范化可以在缺失值情况下提高 VAE 的表现,包括减少缺失值的重建质量和在隐藏空间中使用不确定性来进行下游任务。这种提高表现可以观察到许多类型的 VAEs,包括 VAEs 配置有流形函数。
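One way to read the proposed regulariser is as a penalty on disagreement between the encoder's posteriors for the same sample under different missingness patterns. The sketch below computes a symmetrised KL between two diagonal-Gaussian posteriors obtained from two masks of the same input; the `encode` callable, the masking convention, and the exact divergence are assumptions for illustration rather than the paper's definition.

```python
import torch

def posterior_consistency_penalty(encode, x, mask_a, mask_b):
    """Encourage consistent posteriors for one sample under two missingness
    masks. `encode` maps a (masked input, mask) pair to the mean and log
    variance of a diagonal-Gaussian posterior; the penalty is a symmetrised
    KL between the two posteriors."""
    mu_a, logvar_a = encode(x * mask_a, mask_a)
    mu_b, logvar_b = encode(x * mask_b, mask_b)

    def kl(mu1, lv1, mu2, lv2):
        # KL( N(mu1, e^lv1) || N(mu2, e^lv2) ) for diagonal Gaussians
        return 0.5 * (lv2 - lv1 + (lv1.exp() + (mu1 - mu2) ** 2) / lv2.exp() - 1).sum(-1)

    return (kl(mu_a, logvar_a, mu_b, logvar_b) +
            kl(mu_b, logvar_b, mu_a, logvar_a)).mean()
```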

Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach

  • paper_url: http://arxiv.org/abs/2310.16647
  • repo_url: None
  • paper_authors: Diogo Lavado, Cláudia Soares, Alessandra Micheletti
  • for: 这篇论文的目的是提高深度神经网络(DNNs)的泛化能力并防止过拟合。
  • methods: 我们提出了一种新的 DNN 正则化方法,将训练过程表述为一个带约束的优化问题,其中数据拟合项为优化目标、正则化项作为约束,并使用随机增广拉格朗日(Stochastic Augmented Lagrangian, SAL)方法实现更灵活、高效的正则化机制。
  • results: 在 MNIST、CIFAR10 和 CIFAR100 等基于图像的分类任务上的实验结果显示,SAL 方法可以取得更高的准确率,同时更好地满足约束,展示了其在受约束设定下优化 DNNs 的潜力。
    Abstract Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting. Fixed penalty methods, though common, lack adaptability and suffer from hyperparameter sensitivity. In this paper, we propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem. Where the data fidelity term is the minimization objective and the regularization terms serve as constraints. Then, we employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism. Our approach extends beyond black-box regularization, demonstrating significant improvements in white-box models, where weights are often subject to hard constraints to ensure interpretability. Experimental results on image-based classification on MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our approach. SAL consistently achieves higher Accuracy while also achieving better constraint satisfaction, thus showcasing its potential for optimizing DNNs under constrained settings.
    摘要 对深度神经网络(DNNs)进行正则化是提高泛化能力、防止过拟合的关键。固定罚项方法虽然常用,但缺乏自适应性,并且对超参数十分敏感。在这篇论文中,我们提出了一种新的 DNN 正则化方法,将训练过程表述为一个约束优化问题:数据拟合项作为最小化目标,而正则化项作为约束。然后,我们使用随机增广拉格朗日(Stochastic Augmented Lagrangian, SAL)方法来实现更灵活、高效的正则化机制。我们的方法不仅适用于黑盒正则化,还在白盒模型上带来显著提升;在白盒模型中,权重通常需要满足硬约束以保证可解释性。我们在 MNIST、CIFAR10 和 CIFAR100 等图像分类数据集上的实验验证了方法的有效性:SAL 在取得更高准确率的同时也更好地满足约束,展示了其在受约束设定下优化 DNNs 的潜力。
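The constrained view of training described above can be made concrete with a stochastic augmented-Lagrangian step: each minibatch minimises the data loss plus a multiplier-weighted and quadratically penalised constraint term, and the multiplier is then updated by dual ascent. The sketch below is schematic; the constraint form, penalty shape, and update schedule are placeholders rather than the paper's exact SAL algorithm.

```python
import torch

def sal_step(params, lam, rho, data_loss, constraint, lr=1e-3, lr_lam=1e-2):
    """One stochastic augmented-Lagrangian step for
        min_theta E[data_loss(theta)]  s.t.  constraint(theta) <= 0.
    `data_loss` and `constraint` are zero-argument closures over tensors in
    `params`; returns the updated multiplier."""
    loss = data_loss()
    c = constraint()
    aug = loss + lam * c + 0.5 * rho * torch.relu(c) ** 2   # augmented Lagrangian
    grads = torch.autograd.grad(aug, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g                                     # primal descent
        lam = max(0.0, lam + lr_lam * float(c))             # dual ascent
    return lam
```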

Model predictive control-based value estimation for efficient reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.16646
  • repo_url: None
  • paper_authors: Qizhen Wu, Kexin Liu, Lei Chen
  • for: 提高 reinforcement learning 在实际应用中的效率,因为现实环境中需要大量交互。
  • methods: 基于模型预测控制的改进 reinforcement learning 方法,通过数据驱动的环境模型来预测值函数和优化策略。
  • results: 方法在经典数据库和无人机避险场景中实验 validate,显示了更高的学习效率,更快的策略倾向于优化值,以及更少的经验回放缓存空间需求。
    Abstract Reinforcement learning suffers from limitations in real practices primarily due to the numbers of required interactions with virtual environments. It results in a challenging problem that we are implausible to obtain an optimal strategy only with a few attempts for many learning method. Hereby, we design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach. Based on learned environmental model, it performs multi-step prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergent speed of strategies tending to the optimal value, and fewer sample capacity space required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle avoidance scenario for unmanned aerial vehicle, validate the proposed approaches.
    摘要 强化学习在实际应用中受到限制,主要原因是它需要与虚拟环境进行大量交互;这导致一个具有挑战性的问题:对许多学习方法而言,仅凭少量尝试很难获得最优策略。为此,我们设计了一种基于模型预测控制的改进强化学习方法,通过数据驱动的方式对环境建模。基于学习到的环境模型,该方法进行多步预测来估计值函数并优化策略。该方法表现出更高的学习效率、策略向最优值收敛更快,以及经验回放缓冲区所需的样本容量更小。在经典基准环境和无人机动态避障场景中的实验结果验证了所提方法的有效性。
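The core idea above, using a learned data-driven environment model for multi-step prediction when estimating values, can be sketched as a short model rollout under the current policy with a bootstrapped critic at the horizon. The function below assumes user-supplied `policy`, `model`, `reward_fn`, and `value_fn` callables and shows only the general model-predictive flavour of value estimation, not the paper's specific algorithm.

```python
def mpc_value_estimate(state, policy, model, reward_fn, value_fn, horizon=5, gamma=0.99):
    """Estimate V(s) by rolling a learned dynamics model forward for a short
    horizon under the current policy and bootstrapping with the critic at
    the end. `model(s, a)` returns the predicted next state."""
    total, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        total += discount * reward_fn(s, a)
        s = model(s, a)
        discount *= gamma
    return total + discount * value_fn(s)
```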

Robust Covariate Shift Adaptation for Density-Ratio Estimation

  • paper_url: http://arxiv.org/abs/2310.16638
  • repo_url: None
  • paper_authors: Masahiro Kato
  • For: 预测测试数据中缺失的结果。* Methods: 使用重要性权重法和双重机器学习技术来适应covariate shift,并提出了一种双重可靠估计器来减少density ratio估计误差所带来的偏差。* Results: 通过实验研究表明,提出的方法可以减少density ratio估计误差所带来的偏差,并且方法 remains consistent if either the density ratio estimator or the regression function is consistent。
    Abstract Consider a scenario where we have access to train data with both covariates and outcomes while test data only contains covariates. In this scenario, our primary aim is to predict the missing outcomes of the test data. With this objective in mind, we train parametric regression models under a covariate shift, where covariate distributions are different between the train and test data. For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio. This approach averages the train data losses, each weighted by an estimated ratio of the covariate densities between the train and test data, to approximate the test-data risk. Although it allows us to obtain a test-data risk minimizer, its performance heavily relies on the accuracy of the density ratio estimation. Moreover, even if the density ratio can be consistently estimated, the estimation errors of the density ratio also yield bias in the estimators of the regression model's parameters of interest. To mitigate these challenges, we introduce a doubly robust estimator for covariate shift adaptation via importance weighting, which incorporates an additional estimator for the regression function. Leveraging double machine learning techniques, our estimator reduces the bias arising from the density ratio estimation errors. We demonstrate the asymptotic distribution of the regression parameter estimator. Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent, showcasing its robustness against potential errors in density ratio estimation. Finally, we confirm the soundness of our proposed method via simulation studies.
    摘要 假设我们拥有同时包含协变量和结果的训练数据,而测试数据只包含协变量。在这种情况下,我们的主要目标是预测测试数据中缺失的结果。为此,我们在协变量偏移(covariate shift)下训练参数化回归模型,即训练数据与测试数据的协变量分布不同。针对这一问题,已有研究提出了基于密度比的重要性加权来进行协变量偏移适应:将每个训练样本的损失按训练与测试协变量密度之比的估计值加权后取平均,从而近似测试数据的风险。尽管这样可以得到测试风险的最小化解,其性能严重依赖于密度比估计的准确性;即使密度比可以被一致地估计,其估计误差仍会给回归模型关注参数的估计带来偏差。为缓解这些问题,我们提出了一种结合回归函数估计的双重稳健(doubly robust)重要性加权估计器,借助双重机器学习技术降低密度比估计误差带来的偏差。我们给出了回归参数估计量的渐近分布;只要密度比估计或回归函数两者之一是一致的,该估计量就保持一致,显示了其对密度比估计误差的稳健性。最后,我们通过模拟实验验证了所提方法的可靠性。
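The doubly robust construction described above combines a fitted conditional-loss model with an importance-weighted correction of its residuals, so the risk estimate stays consistent if either ingredient is consistent. The sketch below shows that general recipe with user-supplied `density_ratio` and `cond_loss_model` callables; the paper's estimator additionally uses double machine learning with cross-fitting, which is omitted here.

```python
import numpy as np

def doubly_robust_risk(x_test, x_train, train_losses, density_ratio, cond_loss_model):
    """Doubly robust estimate of the test risk under covariate shift.

    train_losses: per-sample losses of the prediction model on the labelled training data.
    density_ratio(x): estimated p_test(x) / p_train(x).
    cond_loss_model(x): estimated E[loss | x].
    """
    plug_in = cond_loss_model(x_test).mean()                      # model-based term
    w = density_ratio(x_train)
    correction = np.mean(w * (train_losses - cond_loss_model(x_train)))  # weighted residuals
    return plug_in + correction
```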

Photometric Redshifts with Copula Entropy

  • paper_url: http://arxiv.org/abs/2310.16633
  • repo_url: https://github.com/majianthu/quasar
  • paper_authors: Jian Ma
  • for: 应用 copula entropy (CE) 来提高光度谱的准确性。
  • methods: 使用 CE 测量光度谱和光度测量之间的相关性,并选择高 CE 值的测量来预测红移。
  • results: 实验结果表明,使用选择的测量(包括 luminosity 磁场、U 频率带的标准差和其他四个频率带的磁场)可以提高光度谱的准确性,特别是对高红移样本的预测。
    Abstract In this paper we propose to apply copula entropy (CE) to photometric redshifts. CE is used to measure the correlations between photometric measurements and redshifts and then the measurements associated with high CEs are selected for predicting redshifts. We verified the proposed method on the SDSS quasar data. Experimental results show that the accuracy of photometric redshifts is improved with the selected measurements compared to the results with all the measurements used in the experiments, especially for the samples with high redshifts. The measurements selected with CE include luminosity magnitude, the brightness in ultraviolet band with standard deviation, and the brightness of the other four bands. Since CE is a rigorously defined mathematical concept, the models such derived is interpretable.
    摘要 在这篇论文中,我们提议使用 copula entropy (CE) 来应用于光度谱。 CE 用于测量光度谱和光度测量之间的相关性,然后选择具有高 CE 的测量来预测谱。我们对 SDSS квазар数据进行验证。实验结果表明,使用选择的测量比使用所有测量来预测谱的准确性更高,特别是高红shift 样本中。选择 CE 的测量包括照度大小、标准差 ultraviolet 频谱亮度和其他四个频谱的亮度。由于 CE 是一种严格定义的数学概念,所 derivated 的模型是可解释的。
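Copula entropy equals the negative mutual information and is invariant to monotone transforms, so a simple estimator rank-transforms each variable to its empirical copula and applies a k-nearest-neighbour mutual-information estimate. The sketch below uses scikit-learn's estimator as a stand-in; the paper's own CE estimator and the exact ranking procedure for selecting photometric measurements may differ.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_regression

def copula_entropy(x, y):
    """Rough copula-entropy estimate between a photometric measurement x and
    the redshift y: rank-transform to the empirical copula, estimate mutual
    information with a k-NN estimator, and negate it (CE = -MI)."""
    u = rankdata(x) / (len(x) + 1.0)
    v = rankdata(y) / (len(y) + 1.0)
    mi = mutual_info_regression(u.reshape(-1, 1), v, n_neighbors=5)[0]
    return -mi

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
print(copula_entropy(z + 0.3 * rng.normal(size=2000), z))   # informative measurement: very negative
print(copula_entropy(rng.normal(size=2000), z))             # uninformative measurement: near zero
```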

Free-form Flows: Make Any Architecture a Normalizing Flow

  • paper_url: http://arxiv.org/abs/2310.16624
  • repo_url: https://github.com/vislearn/fff
  • paper_authors: Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe
  • for: 本研究旨在提高Normalizing Flows的设计领域,使其能够更加灵活地适应具体任务。
  • methods: 本研究使用一种高效的梯度估计器,允许任意维度保持的神经网络作为生成模型进行最大化 posterior probability 训练。
  • results: 研究人员在分子生成和反问题 benchmark 中获得了优秀的结果,并在使用存在权重 ResNet 架构的情况下与比较势力竞争。
    Abstract Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.
    摘要 正常化流是一种生成模型,直接 maximize 可能性。在过去,normalizing flow的设计受限于需要分析逆变换的需要。我们通过一种高效的梯度估计器来缓解这个限制,使任何维度保持的神经网络可以作为生成模型通过最大化可能性来训练。我们的方法允许在任务上加入适合的启发性权重,并在分子生成数据集上达到极高的性能。此外,我们的方法在反问题数据集上也具有竞争力,只使用商业化 ResNet 架构。
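For intuition, the sketch below trains an arbitrary dimension-preserving network by maximum likelihood through the change-of-variables formula, computing the log-determinant of the Jacobian exactly with autograd. That exact computation is only feasible in toy dimensions; the paper's contribution is an efficient estimator of this gradient (together with a learned inverse and reconstruction term), which this sketch deliberately does not reproduce.

```python
import torch
import torch.nn as nn

dim = 2
net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim))  # any dim-preserving net
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def nll(x):
    """Exact change-of-variables negative log-likelihood under a standard-normal base."""
    z = net(x)
    jac = torch.stack([torch.autograd.functional.jacobian(net, xi, create_graph=True)
                       for xi in x])                       # (batch, dim, dim), tiny dims only
    logdet = torch.linalg.slogdet(jac)[1]
    log_pz = -0.5 * (z ** 2).sum(-1) - 0.5 * dim * torch.log(torch.tensor(2 * torch.pi))
    return -(log_pz + logdet).mean()

x = torch.randn(64, dim) * 0.5 + 1.0                       # toy data
for _ in range(10):
    opt.zero_grad()
    loss = nll(x)
    loss.backward()
    opt.step()
```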

SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence

  • paper_url: http://arxiv.org/abs/2310.16620
  • repo_url: https://github.com/fangwei123456/spikingjelly
  • paper_authors: Wei Fang, Yanqi Chen, Jianhao Ding, Zhaofei Yu, Timothée Masquelier, Ding Chen, Liwei Huang, Huihui Zhou, Guoqi Li, Yonghong Tian
  • for: 这个论文旨在实现基于神经元逻辑芯片的高效能脑机智能,通过引入神经动力和射频性质。
  • methods: 这篇论文提出了名为SpikingJelly的全栈工具箱,用于预处理神经动力数据集、建立深度神经网络、优化参数和部署神经动力网络在神经元逻辑芯片上。相比现有方法,SpikingJelly可以加速深度神经网络的训练 $11\times$。
  • results: SpikingJelly提供了高度可扩展和灵活的工具箱,可以帮助用户在低成本下加速自定义模型,通过多层继承和自动代码生成。SpikingJelly开创了高效能神经动力网络基于机器智能系统的 synthesis 领域,这将扩展神经元计算生态系统。
    Abstract Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integration of processing neuromorphic datasets and deployment. In this work, we present the SpikingJelly framework to address the aforementioned dilemma. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated $11\times$, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low costs through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing.
    摘要 聚凝神经网络(SNN)目标实现基于神经元模拟芯片的高效能智能,通过引入神经动力学和冲击特性。随着emerging spiking deep learning paradigm吸引越来越多的关注,传统的编程框架无法满足自动微分、并行计算加速和高集成处理神经元数据的需求。在这个工作中,我们提出SpikingJelly框架,以解决上述困境。我们提供了全栈工具箱,用于预处理神经元数据、建立深度SNN、优化参数和部署SNN在神经元模拟芯片上。与现有方法相比,SpikingJelly可以加速深度SNN的训练$11\times$,并且其出色的可扩展性和灵活性使得用户可以在低成本下加速自定义模型,通过多级继承和semiautomatic code generation。SpikingJelly开创了真正能效的SNN-基于机器智能系统的合成,这将丰富神经元计算生态。

Performative Prediction: Past and Future

  • paper_url: http://arxiv.org/abs/2310.16608
  • repo_url: https://github.com/salmansust/Machine-Learning-TSF-Petroleum-Production
  • paper_authors: Moritz Hardt, Celestine Mendler-Dünner
  • for: 这篇论文主要关注的是机器学习预测的表现力和其对数据生成分布的影响。
  • methods: 该论文使用了定义和概念框架来研究机器学习中的表现力,并提出了一种自然平衡概念和学习与导航两种机制的分类。
  • results: 该论文发现了机器学习预测可能会导致数据生成分布的变化,并提出了一种新的优化挑战。同时,论文还探讨了数字市场中平台对参与者的导航问题。
    Abstract Predictions in the social world generally influence the target of prediction, a phenomenon known as performativity. Self-fulfilling and self-negating predictions are examples of performativity. Of fundamental importance to economics, finance, and the social sciences, the notion has been absent from the development of machine learning. In machine learning applications, performativity often surfaces as distribution shift. A predictive model deployed on a digital platform, for example, influences consumption and thereby changes the data-generating distribution. We survey the recently founded area of performative prediction that provides a definition and conceptual framework to study performativity in machine learning. A consequence of performative prediction is a natural equilibrium notion that gives rise to new optimization challenges. Another consequence is a distinction between learning and steering, two mechanisms at play in performative prediction. The notion of steering is in turn intimately related to questions of power in digital markets. We review the notion of performative power that gives an answer to the question how much a platform can steer participants through its predictions. We end on a discussion of future directions, such as the role that performativity plays in contesting algorithmic systems.
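A classic toy example of the performative feedback loop surveyed above: the deployed prediction shifts the data-generating distribution, and repeatedly refitting on the induced data converges to a performatively stable point rather than to the parameter of the original distribution. The linear mean-shift model and step sizes below are assumptions chosen so the fixed point has a closed form.

```python
import numpy as np

# Repeated risk minimisation in a toy performative setting: deploying theta
# shifts the data-generating mean by eps * theta, and each round refits on
# data drawn from the induced distribution. With |eps| < 1 the iterates
# converge to the performatively stable point theta* = mu / (1 - eps).
rng = np.random.default_rng(0)
mu, eps, theta = 2.0, 0.5, 0.0
for t in range(20):
    data = rng.normal(loc=mu + eps * theta, scale=1.0, size=5000)
    theta = data.mean()                 # squared-loss risk minimiser on D(theta)
print(theta, mu / (1 - eps))            # both approximately 4.0
```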

AirFL-Mem: Improving Communication-Learning Trade-Off by Long-Term Memory

  • paper_url: http://arxiv.org/abs/2310.16606
  • repo_url: None
  • paper_authors: Haifeng Wen, Hong Xing, Osvaldo Simeone
  • for: 提高 federated learning(FL)中的通信瓶颈问题,随空间FL(AirFL)已经出现为一种有前途的解决方案,但是它受到深层折射条件的妨碍。
  • methods: 本文提出了 AirFL-Mem,一种利用长期记忆机制来减轻深层折射的影响的新方案。提供了对于普通非对称目标函数的收敛界限,包括长期记忆和现有AirFL变体的短期记忆。
  • results: 理论分析表明,AirFL-Mem 能够与理想的通信情况下的 FedAvg 达到同样的收敛速率,而现有方案通常受到错误底值的限制。同时,提出了一种基于几何programming的 convex 优化策略来调整 truncation 阈值以实现功率控制在RAYLEIGH 折射通道上。实验结果证明了理论分析的正确性,并证明了长期记忆机制对深层折射的减轻的优势。
    Abstract Addressing the communication bottleneck inherent in federated learning (FL), over-the-air FL (AirFL) has emerged as a promising solution, which is, however, hampered by deep fading conditions. In this paper, we propose AirFL-Mem, a novel scheme designed to mitigate the impact of deep fading by implementing a \emph{long-term} memory mechanism. Convergence bounds are provided that account for long-term memory, as well as for existing AirFL variants with short-term memory, for general non-convex objectives. The theory demonstrates that AirFL-Mem exhibits the same convergence rate of federated averaging (FedAvg) with ideal communication, while the performance of existing schemes is generally limited by error floors. The theoretical results are also leveraged to propose a novel convex optimization strategy for the truncation threshold used for power control in the presence of Rayleigh fading channels. Experimental results validate the analysis, confirming the advantages of a long-term memory mechanism for the mitigation of deep fading.

Parcel loss prediction in last-mile delivery: deep and non-deep approaches with insights from Explainable AI

  • paper_url: http://arxiv.org/abs/2310.16602
  • repo_url: None
  • paper_authors: Jan de Leeuw, Zaharah Bukhsh, Yingqian Zhang
  • for: 降低电子商务最后一公里配送阶段的包裹丢失是行业中的一个重要目标。
  • methods: 本文提出了两种机器学习方法,即 Data Balance with Supervised Learning (DBSL) 和 Deep Hybrid Ensemble Learning (DHEL),以准确预测包裹丢失。
  • results: 我们对一年的比利时包裹运输数据进行了全面评估,发现将前馈自编码器与随机森林相结合的 DHEL 模型实现了最高的分类性能。
    Abstract Within the domain of e-commerce retail, an important objective is the reduction of parcel loss during the last-mile delivery phase. The ever-increasing availability of data, including product, customer, and order information, has made it possible for the application of machine learning in parcel loss prediction. However, a significant challenge arises from the inherent imbalance in the data, i.e., only a very low percentage of parcels are lost. In this paper, we propose two machine learning approaches, namely, Data Balance with Supervised Learning (DBSL) and Deep Hybrid Ensemble Learning (DHEL), to accurately predict parcel loss. The practical implication of such predictions is their value in aiding e-commerce retailers in optimizing insurance-related decision-making policies. We conduct a comprehensive evaluation of the proposed machine learning models using one year data from Belgian shipments. The findings show that the DHEL model, which combines a feed-forward autoencoder with a random forest, achieves the highest classification performance. Furthermore, we use the techniques from Explainable AI (XAI) to illustrate how prediction models can be used in enhancing business processes and augmenting the overall value proposition for e-commerce retailers in the last mile delivery.
    摘要 在电商零售领域,一个重要的目标是减少最后一英里配送阶段的包裹丢失。随着数据(包括产品、顾客和订单信息)的日益丰富,可以通过机器学习来预测包裹丢失。然而,数据固有的类别不平衡是一个主要挑战,即只有非常低比例的包裹会发生丢失。在这篇论文中,我们提出了两种机器学习方法,即 Data Balance with Supervised Learning (DBSL) 和 Deep Hybrid Ensemble Learning (DHEL),以准确预测包裹丢失。这些预测结果对电商零售商具有很大的实用价值,可以帮助他们优化与保险相关的决策政策。我们使用一年的比利时包裹运输数据进行了全面评估,发现将前馈自编码器与随机森林相结合的 DHEL 模型实现了最高的分类性能。此外,我们使用可解释人工智能(XAI)技术,说明如何利用预测模型改进业务流程,并提升电商零售商在最后一英里配送中的整体价值。
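The abstract does not spell out the DHEL architecture in detail, so the following scikit-learn sketch only illustrates the general recipe under stated assumptions: a small feed-forward autoencoder is fit on the (highly imbalanced) tabular features, and a random forest is trained on the raw features augmented with the autoencoder's hidden code and reconstruction error. The synthetic data, layer sizes and the `class_weight='balanced'` choice are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Imbalanced stand-in for parcel-shipment features (about 2% positive = "lost").
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Feed-forward autoencoder: reconstruct the inputs through a narrow hidden layer.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=0).fit(X_tr, X_tr)

def hybrid_features(Z):
    latent = np.maximum(Z @ ae.coefs_[0] + ae.intercepts_[0], 0)     # hidden-layer code (ReLU)
    err = np.linalg.norm(ae.predict(Z) - Z, axis=1, keepdims=True)   # reconstruction error
    return np.hstack([Z, latent, err])

# Random forest on raw features plus autoencoder-derived features.
rf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
rf.fit(hybrid_features(X_tr), y_tr)
proba = rf.predict_proba(hybrid_features(X_te))[:, 1]
print("average precision:", average_precision_score(y_te, proba))
```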

Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes

  • paper_url: http://arxiv.org/abs/2310.16597
  • repo_url: None
  • paper_authors: Thiziri Nait-Saada, Alireza Naderi, Jared Tanner
  • for: 本文借助无限宽神经网络这一数学模型来理解深度学习中的诸多现象,例如随机深度网络向高斯过程的收敛,以及激活函数与网络权重的选择对训练动力学的影响。
  • methods: 作者将 Matthews et al. (2018) 的经典证明推广到更大的一类初始权重分布(称为 PSEUDO-IID),其中既包括已有的 IID 和正交权重,也涵盖因计算加速优势而受到关注的低秩与结构化稀疏设置。
  • results: 作者证明,以 PSEUDO-IID 分布初始化的全连接网络和卷积网络在方差意义上实质等价;利用这一结果,可以为更广泛的一类神经网络确定“混沌边缘”(Edge-of-Chaos),并将网络调至临界状态以增强训练。
    Abstract The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
    摘要 无限宽神经网络已被证明是一个有用且易于处理的数学模型,有助于理解深度学习中的许多现象。一个例子是随机深度网络向高斯过程的收敛,这使得我们可以严谨地分析激活函数和网络权重的选择如何影响训练动力学。在这篇文章中,我们将 Matthews et al. (2018) 的经典证明扩展到更大的一类初始权重分布(我们称之为 PSEUDO-IID),其中既包括已有的 IID 和正交权重设定,也包括因计算加速优势而受到关注的低秩与结构化稀疏设定。我们证明,使用 PSEUDO-IID 分布初始化的全连接网络和卷积网络在方差意义上实质等价。利用我们的结果,可以为更广泛的一类神经网络确定“混沌边缘”,并将其调至临界状态以增强训练。
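A small numerical illustration (not the paper's proof) of the claim above: pre-activations of a wide layer initialised with low-rank weights, one member of the PSEUDO-IID family, behave like those of a standard IID Gaussian initialisation and look increasingly Gaussian as the width grows. The rank fraction and widths are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)            # a fixed input of dimension 500
x /= np.linalg.norm(x)

def preactivations(width, low_rank=False):
    d = x.size
    if low_rank:
        r = width // 8              # low-rank factorisation W = A @ B with rank width/8
        W = (rng.normal(size=(width, r)) @ rng.normal(size=(r, d))) / np.sqrt(r * d)
    else:
        W = rng.normal(size=(width, d)) / np.sqrt(d)   # standard IID initialisation
    return W @ x

for width in (64, 512, 4096):
    z = preactivations(width, low_rank=True)
    z = (z - z.mean()) / z.std()
    # Kolmogorov-Smirnov distance to a standard normal shrinks as the width grows.
    print(width, stats.kstest(z, "norm").statistic)
```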

Over-the-air Federated Policy Gradient

  • paper_url: http://arxiv.org/abs/2310.16592
  • repo_url: None
  • paper_authors: Huiwen Yang, Lingying Huang, Subhrakanti Dey, Ling Shi
  • for: 本文研究大规模分布式学习、优化与感知中的空中聚合(over-the-air aggregation)技术。
  • methods: 本文提出了空中联邦策略梯度算法:所有智能体同时通过公共无线信道广播携带本地信息的模拟信号,中央控制器利用接收到的叠加波形更新策略参数。
  • results: 文章分析了噪声和信道畸变对所提算法收敛性的影响,给出了求得 $\epsilon$-近似驻点所需的通信与采样复杂度,并通过仿真结果验证了算法的有效性。
    Abstract In recent years, over-the-air aggregation has been widely considered in large-scale distributed learning, optimization, and sensing. In this paper, we propose the over-the-air federated policy gradient algorithm, where all agents simultaneously broadcast an analog signal carrying local information to a common wireless channel, and a central controller uses the received aggregated waveform to update the policy parameters. We investigate the effect of noise and channel distortion on the convergence of the proposed algorithm, and establish the complexities of communication and sampling for finding an $\epsilon$-approximate stationary point. Finally, we present some simulation results to show the effectiveness of the algorithm.
    摘要 近年来,空中聚合在大规模分布式学习、优化和感知中受到广泛关注。本文提出了空中联邦策略梯度算法:所有智能体同时通过公共无线信道广播携带本地信息的模拟信号,中央控制器利用接收到的叠加波形更新策略参数。我们研究了噪声和信道畸变对所提算法收敛性的影响,并给出了求得 $\epsilon$-近似驻点所需的通信与采样复杂度。最后,我们通过仿真结果展示了算法的有效性。
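To make the scheme concrete, here is a minimal numpy sketch under simplifying assumptions (a one-step decision problem, a shared softmax policy, additive Gaussian channel noise): every agent broadcasts an analog signal proportional to its local REINFORCE gradient, the channel sums the waveforms, and the central controller applies the noisy aggregate.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_actions, rounds, lr, noise_std = 20, 5, 1000, 0.1, 0.05
local_rewards = rng.uniform(0, 1, size=(n_agents, n_actions))  # each agent's one-step task
theta = np.zeros(n_actions)                                    # shared softmax policy

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

for _ in range(rounds):
    pi = softmax(theta)
    signals = []
    for k in range(n_agents):
        a = rng.choice(n_actions, p=pi)
        r = local_rewards[k, a] + 0.05 * rng.normal()
        grad_logp = -pi
        grad_logp[a] += 1.0                       # REINFORCE: gradient of log pi(a)
        signals.append(r * grad_logp)             # analog signal carrying local information
    # Over-the-air aggregation: the wireless channel sums the waveforms and adds noise.
    received = np.sum(signals, axis=0) + noise_std * rng.normal(size=n_actions)
    theta += lr * received / n_agents             # central controller's policy update

print("learned policy:", np.round(softmax(theta), 3))
print("action with best average reward:", local_rewards.mean(axis=0).argmax())
```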

Multi-parallel-task Time-delay Reservoir Computing combining a Silicon Microring with WDM

  • paper_url: http://arxiv.org/abs/2310.16588
  • repo_url: None
  • paper_authors: Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros
  • for: 本文同时求解三类任务:时间序列预测、分类和无线信道均衡。
  • methods: 论文采用基于硅微环谐振器的时间延迟储备池计算(time-delay reservoir computing)方案,并结合波分复用(WDM),针对每个任务优化功率和频率失谐。
  • results: 在各自的波分复用信道上执行的每个任务均达到了最先进(state-of-the-art)的性能。
    Abstract We numerically demonstrate a microring-based time-delay reservoir computing scheme that simultaneously solves three tasks involving time-series prediction, classification, and wireless channel equalization. Each task performed on a wavelength-multiplexed channel achieves state-of-the-art performance with optimized power and frequency detuning.
    摘要 我们通过数值仿真展示了一种基于微环谐振器的时间延迟储备池计算方案,可同时求解时间序列预测、分类和无线信道均衡三个任务。每个任务在一个波分复用信道上执行,并通过优化功率和频率失谐,均达到了最先进的性能。
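A generic, software-only sketch of time-delay reservoir computing with a ridge-regression readout, shown here for the time-series prediction task; the silicon microring, WDM channels, and the physical nonlinearity are not modelled, and the tanh virtual node, input mask, and feedback gain are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Task: one-step-ahead prediction of a noisy sine (stand-in for a benchmark series).
t = np.arange(2000)
u = np.sin(0.07 * t) + 0.05 * rng.normal(size=t.size)

n_virtual = 50                          # virtual nodes along the delay line
mask = rng.uniform(-1, 1, size=n_virtual)
states = np.zeros((u.size, n_virtual))
x = np.zeros(n_virtual)
for i, ui in enumerate(u):
    for j in range(n_virtual):
        # Each virtual node mixes the masked input with the delayed feedback.
        x[j] = np.tanh(0.9 * x[j - 1] + mask[j] * ui)
    states[i] = x

# Linear (ridge) readout trained to predict the next sample.
X_tr, y_tr = states[200:1500], u[201:1501]
X_te, y_te = states[1500:-1], u[1501:]
ridge = np.linalg.solve(X_tr.T @ X_tr + 1e-6 * np.eye(n_virtual), X_tr.T @ y_tr)
print("test NMSE:", np.mean((X_te @ ridge - y_te) ** 2) / np.var(y_te))
```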

Mapping the magnetic field using a magnetometer array with noisy input Gaussian process regression

  • paper_url: http://arxiv.org/abs/2310.16577
  • repo_url: None
  • paper_authors: Thomas Edridge, Manon Kok
  • for: 室内环境中磁性材料会导致环境磁场的干扰,这些磁场干扰地图可以用于室内定位。
  • methods: 我们使用了 Gaussian 过程来学习磁场的空间变化大小,使用磁计的测量值和磁计的位置信息。
  • results: 我们的方法可以提高磁场地图的质量,并且通过实验数据示出了这种方法的有效性。
    Abstract Ferromagnetic materials in indoor environments give rise to disturbances in the ambient magnetic field. Maps of these magnetic disturbances can be used for indoor localisation. A Gaussian process can be used to learn the spatially varying magnitude of the magnetic field using magnetometer measurements and information about the position of the magnetometer. The position of the magnetometer, however, is frequently only approximately known. This negatively affects the quality of the magnetic field map. In this paper, we investigate how an array of magnetometers can be used to improve the quality of the magnetic field map. The position of the array is approximately known, but the relative locations of the magnetometers on the array are known. We include this information in a novel method to make a map of the ambient magnetic field. We study the properties of our method in simulation and show that our method improves the map quality. We also demonstrate the efficacy of our method with experimental data for the mapping of the magnetic field using an array of 30 magnetometers.
    摘要 室内环境中的铁磁性材料会对环境磁场造成扰动,而这些磁场扰动的地图可用于室内定位。利用磁力计的测量值及其位置信息,可以通过高斯过程学习磁场强度的空间变化。然而,磁力计的位置往往只能近似获知,这会降低磁场地图的质量。在这篇论文中,我们研究如何利用磁力计阵列来提升磁场地图的质量:阵列的位置只是近似已知,但阵列上各磁力计之间的相对位置是已知的。我们将这一信息纳入一种新的环境磁场建图方法中。我们通过仿真研究了该方法的性质,表明其能够提升地图质量,并利用由 30 个磁力计组成的阵列的实验数据验证了该方法的有效性。

Large-scale magnetic field maps using structured kernel interpolation for Gaussian process regression

  • paper_url: http://arxiv.org/abs/2310.16574
  • repo_url: None
  • paper_authors: Clara Menzen, Marnix Fetter, Manon Kok
  • for: 本文旨在利用近似高斯过程回归计算室内环境中的大规模磁场地图。
  • methods: 本文基于结构化核插值(SKI)框架,将带导数的 SKI(D-SKI)引入磁场建模的标量势模型,并利用高效的 Krylov 子空间方法加速推理,使预测均值与协方差的计算复杂度随数据点数线性增长。
  • results: 仿真表明,随着建图区域的扩大,该方法的精度优于现有最先进方法;在大规模实验中,可在标准笔记本电脑上用不到两分钟的时间由 40000 个三维磁场测量数据构建磁场地图。
    Abstract We present a mapping algorithm to compute large-scale magnetic field maps in indoor environments with approximate Gaussian process (GP) regression. Mapping the spatial variations in the ambient magnetic field can be used for localization algorithms in indoor areas. To compute such a map, GP regression is a suitable tool because it provides predictions of the magnetic field at new locations along with uncertainty quantification. Because full GP regression has a complexity that grows cubically with the number of data points, approximations for GPs have been extensively studied. In this paper, we build on the structured kernel interpolation (SKI) framework, speeding up inference by exploiting efficient Krylov subspace methods. More specifically, we incorporate SKI with derivatives (D-SKI) into the scalar potential model for magnetic field modeling and compute both predictive mean and covariance with a complexity that is linear in the data points. In our simulations, we show that our method achieves better accuracy than current state-of-the-art methods on magnetic field maps with a growing mapping area. In our large-scale experiments, we construct magnetic field maps from up to 40000 three-dimensional magnetic field measurements in less than two minutes on a standard laptop.
    摘要 我们提出了一种基于近似高斯过程(GP)回归的建图算法,用于计算室内环境中大规模的磁场地图,此类地图可用于室内定位。GP 回归能够在预测新位置磁场的同时量化预测的不确定性,是完成这一任务的合适工具。然而,完整 GP 回归的复杂度随数据点数呈三次方增长,因此 GP 的近似方法已被广泛研究。在这篇论文中,我们基于结构化核插值(SKI)框架,利用高效的 Krylov 子空间方法加速推理。具体来说,我们将带导数的 SKI(D-SKI)引入磁场建模的标量势模型,使预测均值与协方差的计算复杂度随数据点数线性增长。仿真表明,随着建图区域的扩大,我们的方法精度优于当前最先进方法。在大规模实验中,我们在标准笔记本电脑上用不到两分钟的时间,由 40000 个三维磁场测量数据构建了磁场地图。

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2310.16566
  • repo_url: None
  • paper_authors: Chengpeng Li, Zhengyi Yang, Jizhi Zhang, Jiancan Wu, Dingxian Wang, Xiangnan He, Xiang Wang
  • for: 提高推荐系统的长期用户满意度,通过Markov决策过程(MDP)来形式化推荐,并使用了强化学习(RL)方法优化。
  • methods: 提出了一种名为模型强化对比强化学习(MCRL)的新的RL推荐器,通过同时学习价值函数和保守价值学习机制来缓解过度估计问题,并使用对比学习来利用MDP的内部结构信息来模型奖励函数和状态转移函数。
  • results: 实验结果表明,相比现有的离线 RL 和自监督 RL 方法,MCRL 在两个真实世界数据集上取得了显著的提升。
    Abstract Reinforcement learning (RL) has been widely applied in recommendation systems due to its potential in optimizing the long-term engagement of users. From the perspective of RL, recommendation can be formulated as a Markov decision process (MDP), where recommendation system (agent) can interact with users (environment) and acquire feedback (reward signals).However, it is impractical to conduct online interactions with the concern on user experience and implementation complexity, and we can only train RL recommenders with offline datasets containing limited reward signals and state transitions. Therefore, the data sparsity issue of reward signals and state transitions is very severe, while it has long been overlooked by existing RL recommenders.Worse still, RL methods learn through the trial-and-error mode, but negative feedback cannot be obtained in implicit feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommender. To address these challenges, we propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL). On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.On the other hand, we construct some positive and negative state-action pairs to model the reward function and state transition function with contrastive learning to exploit the internal structure information of MDP. Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods with different representative backbone networks on two real-world datasets.
    摘要 强化学习(RL)因其在优化用户长期参与度方面的潜力而被广泛应用于推荐系统。从 RL 的角度来看,推荐可以被形式化为一个马尔可夫决策过程(MDP):推荐系统(智能体)与用户(环境)交互并获得反馈(奖励信号)。然而,出于用户体验和实现复杂度的考虑,在线交互并不现实,我们只能利用奖励信号和状态转移都很有限的离线数据来训练 RL 推荐器。因此,奖励信号与状态转移的数据稀疏问题十分严重,却长期被现有的 RL 推荐器所忽视。更糟糕的是,RL 方法依赖试错方式进行学习,而在隐式反馈推荐任务中无法获得负反馈,这进一步加剧了离线 RL 推荐器的过估计问题。为了解决这些挑战,我们提出了一种新的 RL 推荐器,名为模型增强对比强化学习(MCRL)。一方面,我们学习一个价值函数来估计用户的长期参与度,并引入保守的价值学习机制以缓解过估计问题;另一方面,我们构造正负状态-动作对,利用对比学习对奖励函数和状态转移函数建模,从而利用 MDP 的内部结构信息。实验表明,所提方法在两个真实数据集上、基于多种代表性骨干网络,均显著优于现有的离线 RL 和自监督 RL 方法。

DECWA : Density-Based Clustering using Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.16552
  • repo_url: https://github.com/nabilem/decwa
  • paper_authors: Nabil El Malki, Robin Cugny, Olivier Teste, Franck Ravat
  • for: 本文提出了一种新的簇刻画方式和一种基于空间密度与概率方法的聚类算法,以解决现有基于密度的聚类方法在低密度簇、密度相近的相邻簇以及高维数据上的不足。
  • methods: 该方法首先利用点对距离的概率密度函数($p.d.f$)构建子簇,然后结合子簇的密度($p.d.f$)与空间距离,使用 Wasserstein 度量聚合相似的子簇。
  • results: 论文表明,该方法在多种 datasets 上表现出色,超过了现有的state-of-the-art density-based clustering方法。
    Abstract Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they suffer to find low-density clusters, near clusters with similar densities, and high-dimensional data. Our proposals are a new characterization of clusters and a new clustering algorithm based on spatial density and probabilistic approach. First of all, sub-clusters are built using spatial density represented as probability density function ($p.d.f$) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters by using both their density ($p.d.f$) and their spatial distance. The key idea we propose is to use the Wasserstein metric, a powerful tool to measure the distance between $p.d.f$ of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
    摘要 聚类是一种通过发现数据中的组(簇)来提取知识的数据分析方法。现有最先进的基于密度的聚类方法虽然对任意形状的簇很有效,但在低密度簇、密度相近的相邻簇以及高维数据上仍存在困难。我们提出了一种新的簇刻画方式和一种基于空间密度与概率方法的聚类算法:首先,利用点对距离的概率密度函数($p.d.f$)构建子簇;然后,结合子簇的密度($p.d.f$)及其空间距离,将相似的子簇进行聚合。我们的关键想法是使用 Wasserstein 度量,这是一种衡量子簇 $p.d.f$ 之间距离的有力工具。实验表明,我们的方法在多种数据集上优于其他最先进的基于密度的聚类方法。
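A hedged sketch of the recipe described above, not the authors' exact algorithm: the data are over-segmented into small sub-clusters, each sub-cluster is represented by the empirical distribution of its pairwise point distances, and sub-clusters are merged when those distributions are close in Wasserstein distance and the sub-clusters are spatially near. The KMeans pre-segmentation and both thresholds are illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import wasserstein_distance
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=600, noise=0.05, random_state=0)

# Step 1: over-segment the data into many small sub-clusters.
k_sub = 30
sub_labels = KMeans(n_clusters=k_sub, n_init=10, random_state=0).fit_predict(X)
members = [X[sub_labels == i] for i in range(k_sub)]
dists = [pdist(m) if len(m) > 1 else np.zeros(1) for m in members]  # pairwise-distance samples
centers = np.array([m.mean(axis=0) for m in members])

# Step 2: merge sub-clusters whose distance distributions are close in Wasserstein
# distance and whose centers are spatially near (both thresholds are illustrative).
parent = list(range(k_sub))
def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

for i in range(k_sub):
    for j in range(i + 1, k_sub):
        near_in_space = np.linalg.norm(centers[i] - centers[j]) < 0.4
        similar_density = wasserstein_distance(dists[i], dists[j]) < 0.15
        if near_in_space and similar_density:
            parent[find(i)] = find(j)

final_labels = np.array([find(l) for l in sub_labels])
print("clusters found:", len(np.unique(final_labels)))
```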

Cyclic Directed Probabilistic Graphical Model: A Proposal Based on Structured Outcomes

  • paper_url: http://arxiv.org/abs/2310.16525
  • repo_url: None
  • paper_authors: Oleksii Sirotkin
  • for: probabilistic graphical model 建模 (structural learning)
  • methods: 使用 probablistic relation network (PRN) 直接捕捉方向相关性 (directional cyclic dependencies)
  • results: 支持从观测数据学习 (learning from observed data), 满足 probabilistic inference (probabilistic inference),可用于数据分析和专家设计等应用 (data analysis and expert design applications)
    Abstract In the process of building (structural learning) a probabilistic graphical model from a set of observed data, the directional, cyclic dependencies between the random variables of the model are often found. Existing graphical models such as Bayesian and Markov networks can reflect such dependencies. However, this requires complicating those models, such as adding additional variables or dividing the model graph into separate subgraphs. Herein, we describe a probabilistic graphical model - probabilistic relation network - that allows the direct capture of directional cyclic dependencies during structural learning. This model is based on the simple idea that each sample of the observed data can be represented by an arbitrary graph (structured outcome), which reflects the structure of the dependencies of the variables included in the sample. Each of the outcomes contains only a part of the graphical model structure; however, a complete graph of the probabilistic model is obtained by combining different outcomes. Such a graph, unlike Bayesian and Markov networks, can be directed and can have cycles. We explored the full joint distribution and conditional distribution and conditional independence properties of variables in the proposed model. We defined the algorithms for constructing of the model from the dataset and for calculating the conditional and full joint distributions. We also performed a numerical comparison with Bayesian and Markov networks. This model does not violate the probability axioms, and it supports learning from observed data. Notably, it supports probabilistic inference, making it a prospective tool in data analysis and in expert and design-making applications.
    摘要 在从观测数据构建(结构学习)概率图模型的过程中,常常会发现随机变量之间存在有向的循环依赖关系。现有的图模型(如贝叶斯网络和马尔可夫网络)可以反映这类依赖,但这需要使模型复杂化,例如添加额外变量或将模型图划分为多个子图。本文描述了一种概率图模型——概率关系网络(probabilistic relation network)——可以在结构学习过程中直接捕捉有向循环依赖关系。该模型基于一个简单的想法:每个观测数据样本都可以用一个任意图(结构化结果)来表示,该图反映了样本中所含变量之间的依赖结构。每个结果只包含图模型结构的一部分,但通过组合不同的结果即可得到完整的概率模型图。与贝叶斯网络和马尔可夫网络不同,这种图可以是有向的,并且可以包含环。我们研究了该模型中变量的全联合分布、条件分布以及条件独立性质,定义了从数据集构建模型以及计算条件分布和全联合分布的算法,并与贝叶斯网络和马尔可夫网络进行了数值比较。该模型不违背概率公理,支持从观测数据学习,并且支持概率推理,使其有望成为数据分析以及专家与决策应用中的工具。

Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data

  • paper_url: http://arxiv.org/abs/2310.16524
  • repo_url: https://github.com/vanderschaarlab/3S-Testing
  • paper_authors: Boris van Breugel, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar
  • for: This paper aims to address the challenges of accurately assessing the performance of machine learning models on diverse and underrepresented subgroups.
  • methods: The paper proposes a deep generative modeling framework called 3S Testing, which generates synthetic test sets for small subgroups and simulates distributional shifts.
  • results: The authors demonstrate that 3S Testing outperforms traditional baselines in estimating model performance on minority subgroups and under plausible distributional shifts, and provides intervals around its performance estimates with superior coverage of the ground truth compared to existing approaches.
    Abstract Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance becomes challenging due to two main issues: (1) a scarcity of test data, especially for small subgroups, and (2) possible distributional shifts in the model's deployment setting, which may not align with the available test data. In this work, we introduce 3S Testing, a deep generative modeling framework to facilitate model evaluation by generating synthetic test sets for small subgroups and simulating distributional shifts. Our experiments demonstrate that 3S Testing outperforms traditional baselines -- including real test data alone -- in estimating model performance on minority subgroups and under plausible distributional shifts. In addition, 3S offers intervals around its performance estimates, exhibiting superior coverage of the ground truth compared to existing approaches. Overall, these results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.
    摘要 在多样化且代表性不足的子群体上评估机器学习模型的性能,对于保证其在真实应用中的公平性与可靠性至关重要。然而,准确评估模型性能面临两个主要困难:(1)测试数据稀缺,尤其是对于小规模子群体;(2)模型部署环境可能存在分布偏移,与现有测试数据不一致。在这项工作中,我们提出了 3S Testing,一种深度生成建模框架,通过为小规模子群体生成合成测试集并模拟分布偏移来辅助模型评估。实验表明,在估计模型于少数子群体以及合理分布偏移下的性能时,3S Testing 优于包括仅使用真实测试数据在内的传统基线。此外,3S 还为其性能估计给出了区间,对真实值的覆盖率优于现有方法。总体而言,这些结果提出了一个问题:我们是否需要从有限的真实测试数据转向合成测试数据的范式转变。
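The idea can be illustrated with a much simpler generative model than the one used in the paper: fit a per-(subgroup, label) Gaussian mixture on the scarce real test examples of a minority subgroup, sample a larger synthetic test set, and use it to estimate the classifier's subgroup performance. The dataset, subgroup definition, and single-component mixture below are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
group = (X[:, 0] > 1.2).astype(int)              # an artificial, under-represented subgroup
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Scarce real test points for the minority subgroup give a noisy performance estimate.
X_min, y_min = X_te[g_te == 1], y_te[g_te == 1]
print("real minority test size:", len(y_min), " accuracy:", round(clf.score(X_min, y_min), 3))

# Fit one simple generative model per (subgroup, label) and sample a larger synthetic set.
synth_X, synth_y = [], []
for label in (0, 1):
    pool = X_min[y_min == label]
    if len(pool) > 5:
        gm = GaussianMixture(n_components=1, random_state=0).fit(pool)
        samples, _ = gm.sample(2000)
        synth_X.append(samples)
        synth_y.append(np.full(len(samples), label))
synth_X, synth_y = np.vstack(synth_X), np.concatenate(synth_y)
print("synthetic-test accuracy estimate:", round(clf.score(synth_X, synth_y), 3))
```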

Towards Self-Interpretable Graph-Level Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.16520
  • repo_url: https://github.com/yixinliu233/signet
  • paper_authors: Yixin Liu, Kaize Ding, Qinghua Lu, Fuyi Li, Leo Yu Zhang, Shirui Pan
  • for: 本研究旨在提出一种可自解释的图级异常检测模型(SIGNET),能够在检测异常图的同时生成相应的解释。
  • methods: 本研究提出了多视图子图信息瓶颈(MSIB)框架,以此为基础实现自解释的图级异常检测。
  • results: 广泛的实验表明,SIGNET具有优秀的异常检测能力和自我解释能力。
    Abstract Graph-level anomaly detection (GLAD) aims to identify graphs that exhibit notable dissimilarity compared to the majority in a collection. However, current works primarily focus on evaluating graph-level abnormality while failing to provide meaningful explanations for the predictions, which largely limits their reliability and application scope. In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions. To address this challenging problem, we propose a Self-Interpretable Graph aNomaly dETection model (SIGNET for short) that detects anomalous graphs as well as generates informative explanations simultaneously. Specifically, we first introduce the multi-view subgraph information bottleneck (MSIB) framework, serving as the design basis of our self-interpretable GLAD approach. This way SIGNET is able to not only measure the abnormality of each graph based on cross-view mutual information but also provide informative graph rationales by extracting bottleneck subgraphs from the input graph and its dual hypergraph in a self-supervised way. Extensive experiments on 16 datasets demonstrate the anomaly detection capability and self-interpretability of SIGNET.
    摘要 图级异常检测(GLAD)旨在识别与集合中大多数图存在显著差异的图。然而,现有工作主要关注于评估图级别的异常程度,却未能为预测结果提供有意义的解释,这在很大程度上限制了其可靠性与应用范围。本文研究一个新的、具有挑战性的问题——可解释的 GLAD:学习目标是在预测每个图样本异常程度的同时给出相应的解释,即导致该预测的关键子图。为了解决这一问题,我们提出了自解释图异常检测模型(SIGNET),能够在检测异常图的同时生成有信息量的解释。具体而言,我们首先引入多视图子图信息瓶颈(MSIB)框架,作为自解释 GLAD 方法的设计基础。由此,SIGNET 不仅可以基于跨视图互信息来度量每个图的异常程度,还能以自监督的方式从输入图及其对偶超图中提取瓶颈子图,给出有信息量的图解释依据。在 16 个数据集上的大量实验证明了 SIGNET 的异常检测能力和自解释能力。

Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

  • paper_url: http://arxiv.org/abs/2310.16516
  • repo_url: https://github.com/alexczh1/gwg
  • paper_authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang
  • for: 本文研究基于粒子的变分推断方法(ParVIs),具体而言是对 Stein 变分梯度下降(SVGD)这类方法的推广。
  • methods: SVGD 等方法基于核化的 Wasserstein 梯度流来更新粒子,但核函数的设计往往并不容易,并可能限制方法的灵活性。本文提出了一种基于广义 Wasserstein 梯度流的函数梯度方法(GWG),其正则项可由更广泛的一类凸函数诱导。
  • results: 本文证明了广义 Wasserstein 梯度下降(GWG)方法具有较强的收敛保证,并提供了一个可自动选择 Wasserstein 度量以加速收敛的自适应版本。实验表明,该方法在模拟数据和真实数据问题上均表现出良好的效果和效率。
    Abstract Particle-based variational inference methods (ParVIs) such as Stein variational gradient descent (SVGD) update the particles based on the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. However, the design of kernels is often non-trivial and can be restrictive for the flexibility of the method. Recent works show that functional gradient flow approximations with quadratic form regularization terms can improve performance. In this paper, we propose a ParVI framework, called generalized Wasserstein gradient descent (GWG), based on a generalized Wasserstein gradient flow of the KL divergence, which can be viewed as a functional gradient method with a broader class of regularizers induced by convex functions. We show that GWG exhibits strong convergence guarantees. We also provide an adaptive version that automatically chooses Wasserstein metric to accelerate convergence. In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.
    摘要 Stein 变分梯度下降(SVGD)等基于粒子的变分推断方法(ParVIs)依据 KL 散度的核化 Wasserstein 梯度流来更新粒子。然而,核函数的设计往往并不容易,并可能限制方法的灵活性。近期的工作表明,带二次型正则项的函数梯度流近似可以提升性能。本文提出了一种 ParVI 框架,称为广义 Wasserstein 梯度下降(GWG),它基于 KL 散度的广义 Wasserstein 梯度流,可以看作一种由凸函数诱导的、拥有更广泛正则项类别的函数梯度方法。我们证明 GWG 具有较强的收敛保证,并给出了一个能自动选择 Wasserstein 度量以加速收敛的自适应版本。实验表明,所提框架在模拟数据和真实数据问题上均表现出良好的效果和高效性。
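For orientation, the numpy sketch below implements plain Stein variational gradient descent with an RBF kernel on a 2-D Gaussian target — the kernelized special case that GWG generalizes — rather than GWG itself; the target, median-bandwidth heuristic, and step size are standard illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])
cov_inv = np.linalg.inv(np.array([[1.0, 0.6], [0.6, 1.0]]))

def grad_log_p(x):                       # score function of the Gaussian target
    return -(x - mu) @ cov_inv

particles = rng.normal(size=(100, 2))    # initial particles
for _ in range(500):
    diff = particles[:, None, :] - particles[None, :, :]        # (n, n, 2)
    sq = (diff ** 2).sum(-1)
    h = np.median(sq) / np.log(len(particles) + 1)               # median bandwidth heuristic
    K = np.exp(-sq / h)                                          # RBF kernel matrix
    grad_K = -2.0 / h * diff * K[:, :, None]                     # gradient of k wrt its 1st arg
    score = grad_log_p(particles)                                # (n, 2)
    # SVGD direction: attraction (kernel-weighted scores) + repulsion (kernel gradients).
    phi = (K @ score + grad_K.sum(axis=0)) / len(particles)
    particles += 0.05 * phi

print("particle mean:", particles.mean(axis=0), " target mean:", mu)
```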

Data Optimization in Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.16499
  • repo_url: https://github.com/yaorujing/data-optimization
  • paper_authors: Ou Wu, Rujing Yao
  • for: 这篇论文的目的是为了提供一个包容性强的分类法,以便更好地理解现有的数据优化技术,并且探讨未来研究的可能性。
  • methods: 这篇论文使用了大量的文献研究,对现有的数据优化技术进行了分类和总结,并建立了一个包容性强的分类法。
  • results: 这篇论文通过对现有数据优化技术的分类和总结,提供了一个全面的视角,并探讨了未来研究的可能性。
    Abstract Large-scale, high-quality data are considered an essential factor for the successful application of many deep learning techniques. Meanwhile, numerous real-world deep learning tasks still have to contend with the lack of sufficient amounts of high-quality data. Additionally, issues such as model robustness, fairness, and trustworthiness are also closely related to training data. Consequently, a huge number of studies in the existing literature have focused on the data aspect in deep learning tasks. Some typical data optimization techniques include data augmentation, logit perturbation, sample weighting, and data condensation. These techniques usually come from different deep learning divisions and their theoretical inspirations or heuristic motivations may seem unrelated to each other. This study aims to organize a wide range of existing data optimization methodologies for deep learning from the previous literature, and makes the effort to construct a comprehensive taxonomy for them. The constructed taxonomy considers the diversity of split dimensions, and deep sub-taxonomies are constructed for each dimension. On the basis of the taxonomy, connections among the extensive data optimization methods for deep learning are built in terms of four aspects. We probe into rendering several promising and interesting future directions. The constructed taxonomy and the revealed connections will enlighten the better understanding of existing methods and the design of novel data optimization techniques. Furthermore, our aspiration for this survey is to promote data optimization as an independent subdivision of deep learning. A curated, up-to-date list of resources related to data optimization in deep learning is available at \url{https://github.com/YaoRujing/Data-Optimization}.
    摘要 大规模、高质量数据被认为是许多深度学习技术成功应用的关键因素。然而,许多现实世界的深度学习任务仍面临高质量数据不足的问题。此外,模型的鲁棒性、公平性和可信性也与训练数据密切相关。因此,现有文献中有大量研究聚焦于深度学习任务中的数据层面,典型的数据优化技术包括数据增强、logit 扰动、样本加权和数据浓缩等。这些技术通常来自不同的深度学习分支,其理论来源或启发动机看似互不相关。本研究旨在梳理已有文献中面向深度学习的各类数据优化方法,并尝试为其构建一个全面的分类体系。该分类体系考虑了划分维度的多样性,并为每个维度构建了深层的子分类。在该分类体系的基础上,我们从四个方面建立了各类数据优化方法之间的联系,并探讨了若干有前景且有趣的未来方向。所构建的分类体系及揭示的联系将有助于更好地理解现有方法并设计新的数据优化技术。此外,我们希望通过这篇综述推动数据优化成为深度学习的一个独立分支。一个持续更新的深度学习数据优化相关资源列表可在 \url{https://github.com/YaoRujing/Data-Optimization} 获取。

Citizen participation: crowd-sensed sustainable indoor location services

  • paper_url: http://arxiv.org/abs/2310.16496
  • repo_url: None
  • paper_authors: Ioannis Nasios, Konstantinos Vogklis, Avleen Malhi, Anastasia Vayona, Panos Chatziadam, Vasilis Katos
  • for: 在无需投入额外专用硬件的情况下提供室内定位能力,助力向智能基础设施的转型。
  • methods: 采用通用的机器学习方法,利用访客智能手机与现有 WiFi 基础设施的交互信息来估计室内位置。
  • results: 实验结果显示,所提方法的定位误差低于 2 米,并且在相当数量的 BSSID 缺失时模型依然保持鲁棒。
    Abstract In the present era of sustainable innovation, the circular economy paradigm dictates the optimal use and exploitation of existing finite resources. At the same time, the transition to smart infrastructures requires considerable investment in capital, resources and people. In this work, we present a general machine learning approach for offering indoor location awareness without the need to invest in additional and specialised hardware. We explore use cases where visitors equipped with their smart phone would interact with the available WiFi infrastructure to estimate their location, since the indoor requirement poses a limitation to standard GPS solutions. Results have shown that the proposed approach achieves a less than 2m accuracy and the model is resilient even in the case where a substantial number of BSSIDs are dropped.
    摘要 现在可持续创新时代,循环经济模式提倡最佳利用现有的有限资源。同时,转移到智能基础设施需要较大的投资资源人力。在这项工作中,我们提出了一种通用机器学习方法,以实现无需特殊硬件投资的室内位置意识。我们探讨访问者通过使用可用的 WiFi 基础设施与其智能手机进行互动,以估算室内位置,因为标准 GPS 解决方案在室内环境下存在限制。结果表明,我们的方法可以实现准确率低于 2m,并且模型在大量 BSSID 被drop 时仍能保持可靠性。
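A small scikit-learn sketch of the fingerprinting idea behind such crowd-sensed indoor localisation: RSSI readings from the visible BSSIDs form the feature vector (unheard BSSIDs filled with a floor value) and a random-forest regressor predicts the (x, y) position. The access-point layout and log-distance path-loss model are simulation assumptions, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
aps = rng.uniform(0, 30, size=(8, 2))                 # 8 access points on a 30x30 m floor
positions = rng.uniform(0, 30, size=(1500, 2))        # crowd-sensed measurement locations

def rssi(pos):
    d = np.linalg.norm(pos[:, None, :] - aps[None, :, :], axis=-1)
    signal = -40 - 20 * np.log10(d + 1) + rng.normal(0, 2, size=d.shape)  # log-distance path loss
    signal[d > 20] = -100                             # BSSID not heard beyond ~20 m
    return signal

X = rssi(positions)
X_tr, X_te, y_tr, y_te = train_test_split(X, positions, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
err = np.linalg.norm(model.predict(X_te) - y_te, axis=1)
print("median positioning error (m):", np.median(err))
```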

TSONN: Time-stepping-oriented neural network for solving partial differential equations

  • paper_url: http://arxiv.org/abs/2310.16491
  • repo_url: None
  • paper_authors: Wenbo Cao, Weiwei Zhang
  • for: 求解由偏微分方程(PDE)描述的正问题和反问题,特别是利用物理信息神经网络(PINN)来求解这类问题。
  • methods: 将时间步进方法与深度学习相结合,把原本病态的优化问题转化为在给定伪时间区间上的一系列良态子问题,从而显著改善模型训练的收敛性。
  • results: 所提方法只需对损失函数做简单修改,即可在许多标准 PINN 无法求解的问题上实现稳定训练并得到正确结果。此外,与传统基于网格的数值方法相比,我们还展示了时间步进方法在基于神经网络的优化框架中的若干新性质和优势,例如显式格式允许更大的时间步长,而隐式格式的实现与显式格式同样直接。
    Abstract Deep neural networks (DNNs), especially physics-informed neural networks (PINNs), have recently become a new popular method for solving forward and inverse problems governed by partial differential equations (PDEs). However, these methods still face challenges in achieving stable training and obtaining correct results in many problems, since minimizing PDE residuals with PDE-based soft constraint make the problem ill-conditioned. Different from all existing methods that directly minimize PDE residuals, this work integrates time-stepping method with deep learning, and transforms the original ill-conditioned optimization problem into a series of well-conditioned sub-problems over given pseudo time intervals. The convergence of model training is significantly improved by following the trajectory of the pseudo time-stepping process, yielding a robust optimization-based PDE solver. Our results show that the proposed method achieves stable training and correct results in many problems that standard PINNs fail to solve, requiring only a simple modification on the loss function. In addition, we demonstrate several novel properties and advantages of time-stepping methods within the framework of neural network-based optimization approach, in comparison to traditional grid-based numerical method. Specifically, explicit scheme allows significantly larger time step, while implicit scheme can be implemented as straightforwardly as explicit scheme.
    摘要 深度神经网络(DNN),特别是物理学信息泛化神经网络(PINN),最近成为解决部分导数方程(PDE)的前向和反向问题的新方法。然而,这些方法仍然面临困难在实现稳定训练和正确结果的多个问题中,因为将PDE residuals minimized with PDE-based soft constraint会导致问题变得不稳定。与所有直接将PDE residuals minimized的方法不同,这项工作将时间步骤法与深度学习结合,将原始不稳定优化问题转化为一系列稳定优化问题。通过跟踪pseudo时间步骤过程的路径,提高了模型训练的 converges。我们的结果表明,提案的方法可以在许多问题中实现稳定的训练和正确的结果,只需要对损失函数进行一个简单的修改。此外,我们还展示了时间步骤法在神经网络基于优化方法框架中的一些新特性和优势,比如:Explicit scheme可以允许更大的时间步骤,而Implicit scheme可以被实现为Explicit scheme一样直接。

Hyperparameter Optimization for Multi-Objective Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16487
  • repo_url: https://github.com/lucasalegre/morl-baselines
  • paper_authors: Florian Felten, Daniel Gareev, El-Ghazali Talbi, Grégoire Danoy
  • for: 本研究旨在解决多目标算法中的超参数优化问题,以提高多目标算法的性能。
  • methods: 本研究提出了一种系统的方法来解决超参数优化问题,包括精心设计的搜索策略和优化目标函数。
  • results: 实验结果表明,提出的方法可以有效地提高多目标算法的性能,并且标识了未来研究的可能性。
    Abstract Reinforcement learning (RL) has emerged as a powerful approach for tackling complex problems. The recent introduction of multi-objective reinforcement learning (MORL) has further expanded the scope of RL by enabling agents to make trade-offs among multiple objectives. This advancement not only has broadened the range of problems that can be tackled but also created numerous opportunities for exploration and advancement. Yet, the effectiveness of RL agents heavily relies on appropriately setting their hyperparameters. In practice, this task often proves to be challenging, leading to unsuccessful deployments of these techniques in various instances. Hence, prior research has explored hyperparameter optimization in RL to address this concern. This paper presents an initial investigation into the challenge of hyperparameter optimization specifically for MORL. We formalize the problem, highlight its distinctive challenges, and propose a systematic methodology to address it. The proposed methodology is applied to a well-known environment using a state-of-the-art MORL algorithm, and preliminary results are reported. Our findings indicate that the proposed methodology can effectively provide hyperparameter configurations that significantly enhance the performance of MORL agents. Furthermore, this study identifies various future research opportunities to further advance the field of hyperparameter optimization for MORL.
    摘要 本文对MORL中的超参数优化进行了初步的研究。我们正式定义了问题,抛光了其独特的挑战,并提出了系统的方法ology。我们应用了一种现状最佳的MORL算法在一个知名环境中,并发布了初步的结果。我们的发现表明,我们的方法可以有效地为MORL代理人提供优化超参数的配置,从而提高其性能。此外,本研究还标识了MORL中超参数优化的未来研究的多种可能性。

A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data and Information Retrieval in NLP

  • paper_url: http://arxiv.org/abs/2310.16485
  • repo_url: None
  • paper_authors: Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André
    for: 这份研究目的是为了开发一个基于深度学习的时间序列资料事件探测方法,以便在不同领域中进行更有效的事件探测和预测。methods: 这个方法使用了四个重要的新特性,包括:首先,它是一个回归型的方法,而不是一个双值分类方法。其次,它不需要标注化的数据集,而是仅需要提供参考事件,例如时间点或时间间隔。第三,它使用了一个堆叠 ensemble 学习元模型,融合了多种深度学习模型,包括预设的 feed-forward neural networks (FFNs) 到现有的架构如 transformers。最后,为了实用实现,我们已经开发了一个 Python 套件,名为 eventdetector-ts,可以通过 Python Package Index (PyPI) 进行安装。results: 在不同的实验中,我们显示了这个方法的多元性和有效性,包括自然语言处理 (NLP) 到金融安全领域的实验。
    Abstract Event detection in time series data is crucial in various domains, including finance, healthcare, cybersecurity, and science. Accurately identifying events in time series data is vital for making informed decisions, detecting anomalies, and predicting future trends. Despite extensive research exploring diverse methods for event detection in time series, with deep learning approaches being among the most advanced, there is still room for improvement and innovation in this field. In this paper, we present a new deep learning supervised method for detecting events in multivariate time series data. Our method combines four distinct novelties compared to existing deep-learning supervised methods. Firstly, it is based on regression instead of binary classification. Secondly, it does not require labeled datasets where each point is labeled; instead, it only requires reference events defined as time points or intervals of time. Thirdly, it is designed to be robust by using a stacked ensemble learning meta-model that combines deep learning models, ranging from classic feed-forward neural networks (FFNs) to state-of-the-art architectures like transformers. This ensemble approach can mitigate individual model weaknesses and biases, resulting in more robust predictions. Finally, to facilitate practical implementation, we have developed a Python package to accompany our proposed method. The package, called eventdetector-ts, can be installed through the Python Package Index (PyPI). In this paper, we present our method and provide a comprehensive guide on the usage of the package. We showcase its versatility and effectiveness through different real-world use cases from natural language processing (NLP) to financial security domains.
    摘要 时序数据中的事件检测在金融、医疗、网络安全和科学等各个领域都非常重要。准确地在时序数据中检测事件,对于做出明智的决策、检测异常以及预测未来趋势至关重要。尽管已有大量研究探讨了各类事件检测方法(其中深度学习方法最为先进),该领域仍有很大的改进与创新空间。在这篇论文中,我们提出了一种新的有监督深度学习方法,用于在多元时序数据中检测事件。与现有的有监督深度学习方法相比,我们的方法具有四个新特点:1. 基于回归而非二分类;2. 不需要逐点标注的数据集,只需提供以时间点或时间区间形式定义的参考事件;3. 使用堆叠集成学习元模型,融合从经典前馈神经网络(FFN)到 transformer 等先进架构的多种深度学习模型,这种集成方式可以缓解单个模型的弱点和偏差,从而获得更加稳健的预测;4. 为便于实际使用,我们开发了一个可在 PyPI 上安装的 Python 软件包 eventdetector-ts。本文介绍了该方法,并提供了该软件包的完整使用指南。我们通过从自然语言处理(NLP)到金融安全等不同领域的实际案例,展示了该方法的多样性和有效性。
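To make the regression-based formulation concrete — this is a generic sketch of the idea, not the eventdetector-ts API — reference events given as time intervals are turned into a 0/1 target signal, a feed-forward regressor is fit on sliding windows of the multivariate series, and detections are read back as test-time indices whose predicted score exceeds a threshold.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
T, window = 3000, 20
series = rng.normal(size=(T, 3))                       # multivariate time series
events = [(500, 540), (1500, 1560), (2400, 2430)]      # reference events as time intervals
for a, b in events:
    series[a:b] += 2.0                                 # events leave a signature in the data

# Regression target: 1 inside a reference interval, 0 elsewhere (no per-point labels needed).
target = np.zeros(T)
for a, b in events:
    target[a:b] = 1.0

# Sliding-window features and a feed-forward regressor.
Xw = np.stack([series[i - window:i].ravel() for i in range(window, T)])
yw = target[window:T]
split = 2000
reg = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=0).fit(Xw[:split], yw[:split])

# Read detected events back as times where the predicted score exceeds a threshold.
score = reg.predict(Xw[split:])
detected = np.flatnonzero(score > 0.5) + window + split
print("detected time indices (first few):", detected[:5], "... (true event starts at 2400)")
```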

Symphony of experts: orchestration with adversarial insights in reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.16473
  • repo_url: None
  • paper_authors: Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz
  • for: 本文旨在探讨结构化强化学习如何在探索困难的情况下取得更好的性能,特别是通过“编排”(orchestration)的概念,即由一小组专家策略来引导决策。
  • methods: 本文对编排进行建模,将对抗设定下的悔恨界结果迁移到该场景,并对自然策略梯度的分析进行扩展和推广。
  • results: 本文给出了表格(tabular)设定下基于值函数的悔恨界,并将其推广到任意对抗聚合策略;此外,本文的证明相比现有方法更为透明,并给出了一个随机匹配玩具模型的仿真结果。
    Abstract Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; the modeling thereof constitutes our first contribution. We then establish value-functions regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and high probability. A key point of our approach lies in its arguably more transparent proofs compared to existing methods. Finally, we present simulations for a stochastic matching toy model.
    摘要 结构化强化学习利用具有有利性质的策略来取得更好的性能,尤其是在探索存在困难的场景中。我们通过“编排”(orchestration)的概念来探索这一领域:由一小组专家策略来引导决策,对其建模构成了我们的第一个贡献。随后,我们通过迁移对抗设定下的悔恨界结果,在表格设定中建立了编排的值函数悔恨界。我们将 Agarwal 等人对自然策略梯度的分析推广到任意对抗聚合策略,并进一步扩展到使用估计优势函数的情形,给出了期望意义和高概率意义下的样本复杂度分析。与现有方法相比,我们的证明可以说更加透明。最后,我们给出了一个随机匹配玩具模型的仿真结果。

Learning Continuous Network Emerging Dynamics from Scarce Observations via Data-Adaptive Stochastic Processes

  • paper_url: http://arxiv.org/abs/2310.16466
  • repo_url: https://github.com/csjtx1021/neural_ode_processes_for_network_dynamics-master
  • paper_authors: Jiaxu Cui, Bingyi Sun, Jiming Liu, Bo Yang
  • for: 从经验结构和时空观测数据中学习网络动力学,是揭示复杂网络在多个领域中相互作用机制的重要基础。
  • methods: 我们提出了一种新的网络动力学神经 ODE 过程(Neural ODE Processes for Network Dynamics,NDP4ND),这是一类由随机的、数据自适应的网络动力学所支配的随机过程。
  • results: 在多种网络动力学上的大量实验表明,NDP4ND 具有出色的数据适应性和计算效率,能够适应未见过的网络涌现动力学,在将所需观测数据比例降至约 6% 的同时,将学习新动力学的速度提高约三个数量级。
    Abstract Learning network dynamics from the empirical structure and spatio-temporal observation data is crucial to revealing the interaction mechanisms of complex networks in a wide range of domains. However, most existing methods only aim at learning network dynamic behaviors generated by a specific ordinary differential equation instance, resulting in ineffectiveness for new ones, and generally require dense observations. The observed data, especially from network emerging dynamics, are usually difficult to obtain, which brings trouble to model learning. Therefore, how to learn accurate network dynamics with sparse, irregularly-sampled, partial, and noisy observations remains a fundamental challenge. We introduce Neural ODE Processes for Network Dynamics (NDP4ND), a new class of stochastic processes governed by stochastic data-adaptive network dynamics, to overcome the challenge and learn continuous network dynamics from scarce observations. Intensive experiments conducted on various network dynamics in ecological population evolution, phototaxis movement, brain activity, epidemic spreading, and real-world empirical systems, demonstrate that the proposed method has excellent data adaptability and computational efficiency, and can adapt to unseen network emerging dynamics, producing accurate interpolation and extrapolation with reducing the ratio of required observation data to only about 6\% and improving the learning speed for new dynamics by three orders of magnitude.
    摘要 从经验结构和时空观测数据中学习网络动力学,是揭示复杂网络在众多领域中相互作用机制的关键。然而,现有方法大多只针对某个特定常微分方程实例所生成的网络动力学行为进行学习,难以泛化到新的动力学,并且通常需要密集观测。观测数据(特别是来自网络涌现动力学的数据)往往难以获取,这给模型学习带来了困难。因此,如何在稀疏、不规则采样、部分且带噪的观测下学习准确的网络动力学,仍是一个基本挑战。我们提出了网络动力学神经 ODE 过程(NDP4ND),这是一类由随机的、数据自适应的网络动力学所支配的新型随机过程。在生态种群演化、趋光运动、脑活动、流行病传播以及真实经验系统等多种网络动力学上的大量实验表明,该方法具有出色的数据适应性和计算效率,能够适应未见过的网络涌现动力学,生成准确的插值与外推结果,在将所需观测数据比例降至约 6% 的同时,将学习新动力学的速度提高约三个数量级。

Unknown Health States Recognition With Collective Decision Based Deep Learning Networks In Predictive Maintenance Applications

  • paper_url: http://arxiv.org/abs/2310.17670
  • repo_url: None
  • paper_authors: Chuyue Lou, M. Amine Atoui
  • for: 这个研究的目的是为了提出一个集体决策框架,以便不同的卷积神经网络可以同时进行不同的健康状态分类。
  • methods: 这个研究使用了多个卷积神经网络,包括进步的卷积神经网络、多尺度卷积神经网络和差异卷积神经网络。这些神经网络可以从工业数据中学习有效的健康状态表示。
  • results: 根据TEP公共数据集的验证结果显示,提出的卷积神经网络集体决策框架可以优化不知道的健康状态标本的识别能力,同时维持知道的健康状态标本的准确率。这些结果显示了该深度学习框架的优越性,并且基于余差和多尺度学习的神经网络表现最佳。
    Abstract At present, decision making solutions developed based on deep learning (DL) models have received extensive attention in predictive maintenance (PM) applications along with the rapid improvement of computing power. Relying on the superior properties of shared weights and spatial pooling, Convolutional Neural Network (CNN) can learn effective representations of health states from industrial data. Many developed CNN-based schemes, such as advanced CNNs that introduce residual learning and multi-scale learning, have shown good performance in health state recognition tasks under the assumption that all the classes are known. However, these schemes have no ability to deal with new abnormal samples that belong to state classes not part of the training set. In this paper, a collective decision framework for different CNNs is proposed. It is based on a One-vs-Rest network (OVRN) to simultaneously achieve classification of known and unknown health states. OVRN learn state-specific discriminative features and enhance the ability to reject new abnormal samples incorporated to different CNNs. According to the validation results on the public dataset of Tennessee Eastman Process (TEP), the proposed CNN-based decision schemes incorporating OVRN have outstanding recognition ability for samples of unknown heath states, while maintaining satisfactory accuracy on known states. The results show that the new DL framework outperforms conventional CNNs, and the one based on residual and multi-scale learning has the best overall performance.
    摘要 当前,基于深度学习(DL)模型的决策支持技术在预测维护(PM)应用中得到了广泛的关注,随着计算能力的快速提升。利用深度学习模型的共享权重和空间 pooling 特性,卷积神经网络(CNN)可以从工业数据中学习有效的健康状态表示。一些已经发展出来的 CNN 基本 schemes,如增强 CNN 和多级学习,在健康状态识别任务中表现良好,假设所有类别都是已知的。然而,这些 schemes 无法处理新的异常样本,它们不在训练集中。在这篇论文中,一种集成多个 CNN 的决策框架是提出的。它基于一个对抗网络(OVRN),同时实现已知和未知健康状态的分类。OVRN 学习特定状态的抗性特征,提高了对新异常样本的拒绝能力。根据公共数据集 TEP 的验证结果,提出的 CNN 基本 schemes incorporating OVRN 在未知健康状态样本的识别能力方面表现出色,同时保持知道状态的准确率。结果表明,新的 DL 框架超越传统 CNNs,而基于增强和多级学习的 CNN 在整体性能方面表现最佳。
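The collective decision rule can be sketched in a few lines: score each known health state with its own state-specific model and reject a sample as an unknown state when no score is high enough. Here a per-state Gaussian density stands in for the one-vs-rest network outputs, and the blobs, threshold, and unseen fourth state are illustrative assumptions rather than the paper's TEP setup.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three known health states are seen in training; a fourth state appears only at test time.
centers = [[0, 0], [6, 0], [0, 6], [8, 8]]
X, y = make_blobs(n_samples=800, centers=centers, cluster_std=1.0, random_state=0)
X_tr, y_tr = X[y < 3], y[y < 3]

# One state-specific score model per known state (stand-in for the OVRN outputs).
state_models = [GaussianMixture(n_components=1, random_state=0).fit(X_tr[y_tr == c]) for c in range(3)]

def collective_decision(x, threshold=-8.0):
    scores = np.array([m.score_samples(x.reshape(1, -1))[0] for m in state_models])
    if scores.max() < threshold:      # no known state claims the sample -> unknown state
        return "unknown"
    return int(scores.argmax())

for label in (0, 3):                  # one known state and the unseen fourth state
    sample = X[y == label][0]
    print("true state:", label, "-> decision:", collective_decision(sample))
```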

ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training

  • paper_url: http://arxiv.org/abs/2310.16453
  • repo_url: None
  • paper_authors: Torsten Krauß, Jasper Stang, Alexandra Dmitrienko
  • for: 提供一种可读性好的深度神经网络(DNN)水印方法,以便人类可以直观地判断水印是否存在。
  • methods: 使用一种名为ClearMark的方法,该方法在DNN模型中嵌入可见的水印,并且不需要复杂的验证算法或强制性阈值。
  • results: ClearMark方法可以在不同的数据集和模型上实现高度的可读性和抗性能,并且可以承受模型修改和黑客攻击。
    Abstract Due to costly efforts during data acquisition and model training, Deep Neural Networks (DNNs) belong to the intellectual property of the model creator. Hence, unauthorized use, theft, or modification may lead to legal repercussions. Existing DNN watermarking methods for ownership proof are often non-intuitive, embed human-invisible marks, require trust in algorithmic assessment that lacks human-understandable attributes, and rely on rigid thresholds, making it susceptible to failure in cases of partial watermark erasure. This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment. ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds while allowing technology-assisted evaluations. ClearMark defines a transposed model architecture allowing to use of the model in a backward fashion to interwove the watermark with the main task within all model parameters. Compared to existing watermarking methods, ClearMark produces visual watermarks that are easy for humans to understand without requiring complex verification algorithms or strict thresholds. The watermark is embedded within all model parameters and entangled with the main task, exhibiting superior robustness. It shows an 8,544-bit watermark capacity comparable to the strongest existing work. Crucially, ClearMark's effectiveness is model and dataset-agnostic, and resilient against adversarial model manipulations, as demonstrated in a comprehensive study performed with four datasets and seven architectures.
    摘要 由于数据收集和模型训练的成本高昂,深度神经网络(DNN)通常属于模型创建者的知识产权。因此,未经授权的使用、窃取或修改可能会导致法律后果。现有用于所有权证明的 DNN 水印方法存在一些缺点:往往不直观、嵌入人眼不可见的标记、需要信任缺乏人类可理解属性的算法评估,并且依赖固定阈值,这使其在水印被部分擦除时容易失效。本文介绍了 ClearMark,首个面向直观人工评估而设计的 DNN 水印方法。ClearMark 嵌入可见水印,使人类无需固定阈值即可做出判断,同时也支持技术辅助的评估。ClearMark 定义了一种转置的模型架构,使模型能够以反向方式使用,从而将水印与主任务交织在所有模型参数之中。与现有水印方法相比,ClearMark 生成的视觉水印易于人类理解,无需复杂的验证算法或严格的阈值。水印嵌入在所有模型参数中并与主任务纠缠在一起,表现出更强的鲁棒性,其水印容量达 8,544 比特,可与现有最强工作相当。更重要的是,ClearMark 的效果与模型和数据集无关,并能抵抗对模型的对抗性操纵,这一点已在四个数据集和七种架构上的全面研究中得到验证。

Grokking in Linear Estimators – A Solvable Model that Groks without Understanding

  • paper_url: http://arxiv.org/abs/2310.16441
  • repo_url: None
  • paper_authors: Noam Levi, Alon Beck, Yohai Bar-Sinai
  • for: 本文探讨 grokking 现象,即模型在拟合训练数据很久之后才学会泛化。
  • methods: 作者在高斯输入的教师-学生设定下,研究了执行线性任务的线性网络,并以训练与泛化数据的协方差矩阵为基础推导出完整的训练动力学。
  • results: 研究给出了 grokking 时间如何依赖于输入输出维度、训练样本量、正则化和网络初始化的精确预测,并指出泛化精度的骤增未必意味着从“记忆”到“理解”的转变,而可能只是精度度量造成的假象。
    Abstract Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a simple teacher-student setup with Gaussian inputs. In this setting, the full training dynamics is derived in terms of the training and generalization data covariance matrix. We present exact predictions on how the grokking time depends on input and output dimensionality, train sample size, regularization, and network initialization. We demonstrate that the sharp increase in generalization accuracy may not imply a transition from "memorization" to "understanding", but can simply be an artifact of the accuracy measure. We provide empirical verification for our calculations, along with preliminary results indicating that some predictions also hold for deeper networks, with non-linear activations.
    摘要 它(Grokking)是一种吸引人的现象,在学习过程中,模型会在训练数据之后仍然能够泛化。我们通过分析和数值方法表明,在线性网络中进行线性任务时,grokking可以意外地发生。在这种设置中,我们计算了全面的训练动态,并通过训练和泛化数据协方差矩阵来表示。我们提出了具体预测,包括输入和输出维度、训练样本大小、规范、网络初始化等因素,grokking时间如何随变化。我们还证明了,尽管sharply increase in generalization accuracy不一定意味着从"记忆"转移到"理解",但可能只是精度度量的假象。我们提供了实验证明,以及初步结果表明,一些预测也适用于更深的网络和非线性活化。
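Because the solvable model above is a linear teacher-student problem with Gaussian inputs, one way to reproduce the flavour of the effect is a short numpy simulation (the paper derives the dynamics analytically rather than simulating them, so this is only an illustration): gradient descent with weight decay from a large initialisation fits the few training samples early, while the test error falls only much later as weight decay slowly removes the components unconstrained by the data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 200, 100, 1000
w_true = rng.normal(size=d) / np.sqrt(d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = X_tr @ w_true, X_te @ w_true

w = 5.0 * rng.normal(size=d) / np.sqrt(d)   # large initialisation
lr, wd = 0.5 / d, 2e-2                      # small weight decay drives the late drop

for step in range(1, 100001):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train + wd * w
    w -= lr * grad
    if step in (100, 1000, 5000, 10000, 30000, 100000):
        train_mse = np.mean((X_tr @ w - y_tr) ** 2)
        test_mse = np.mean((X_te @ w - y_te) ** 2)
        print(f"step {step:>6}: train MSE {train_mse:.2e}   test MSE {test_mse:.2e}")
```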

An Approach for Efficient Neural Architecture Search Space Definition

  • paper_url: http://arxiv.org/abs/2310.17669
  • repo_url: None
  • paper_authors: Léo Pouy, Fouad Khenfri, Patrick Leserf, Chokri Mraidha, Cherif Larouci
  • for: 本研究旨在提出一种新的自动Machine Learning(AutoML)方法和工具,帮助用户在选择神经网络架构时快速寻找最佳策略。
  • methods: 本研究使用了一种新的细胞结构搜索空间,易于理解和操作,并且可以涵盖大多数当前领先的卷积神经网络架构。
  • results: 研究人员通过实验和分析表明,提出的方法可以快速找到最佳策略,并且可以涵盖大多数当前领先的卷积神经网络架构。
    Abstract As we advance in the fast-growing era of Machine Learning, various new and more complex neural architectures are arising to tackle problem more efficiently. On the one hand their efficient usage requires advanced knowledge and expertise, which is most of the time difficult to find on the labor market. On the other hand, searching for an optimized neural architecture is a time-consuming task when it is performed manually using a trial and error approach. Hence, a method and a tool support is needed to assist users of neural architectures, leading to an eagerness in the field of Automatic Machine Learning (AutoML). When it comes to Deep Learning, an important part of AutoML is the Neural Architecture Search (NAS). In this paper, we propose a novel cell-based hierarchical search space, easy to comprehend and manipulate. The objectives of the proposed approach are to optimize the search-time and to be general enough to handle most of state of the art Convolutional Neural Networks (CNN) architectures.
    摘要 随着机器学习领域的快速发展,不断涌现出新的、更复杂的神经网络架构,以更高效地解决问题。一方面,高效使用这些架构需要高级知识和专业技能,而这类人才在劳动力市场上往往难以找到;另一方面,通过人工试错来搜索优化的神经网络架构是一项非常耗时的任务。因此,需要相应的方法和工具来辅助神经网络架构的使用者,这也推动了自动机器学习(AutoML)领域的发展。在深度学习方面,神经架构搜索(NAS)是 AutoML 的重要组成部分。本文提出了一种新的基于单元(cell)的分层搜索空间,易于理解和操作。该方法的目标是优化搜索时间,并具备足够的通用性以涵盖大多数当前最先进的卷积神经网络(CNN)架构。

Non-isotropic Persistent Homology: Leveraging the Metric Dependency of PH

  • paper_url: http://arxiv.org/abs/2310.16437
  • repo_url: None
  • paper_authors: Vincent P. Grande, Michael T. Schaub
  • for: 本文旨在提出一种新的点云数据分析方法,从持续同调(persistent homology)分析中提取额外的拓扑与几何信息。
  • methods: 该方法基于改变底层空间距离函数的思想,通过分析持续图(persistence diagram)随距离函数变化而产生的偏移来提取额外信息。
  • results: 数值实验表明,该方法能够较准确地提取随机生成点云的朝向、朝向方差和缩放信息,并可应用于真实数据。
    Abstract Persistent Homology is a widely used topological data analysis tool that creates a concise description of the topological properties of a point cloud based on a specified filtration. Most filtrations used for persistent homology depend (implicitly) on a chosen metric, which is typically agnostically chosen as the standard Euclidean metric on $\mathbb{R}^n$. Recent work has tried to uncover the 'true' metric on the point cloud using distance-to-measure functions, in order to obtain more meaningful persistent homology results. Here we propose an alternative look at this problem: we posit that information on the point cloud is lost when restricting persistent homology to a single (correct) distance function. Instead, we show how by varying the distance function on the underlying space and analysing the corresponding shifts in the persistence diagrams, we can extract additional topological and geometrical information. Finally, we numerically show that non-isotropic persistent homology can extract information on orientation, orientational variance, and scaling of randomly generated point clouds with good accuracy and conduct some experiments on real-world data.
    摘要 持续同调(persistent homology)是一种广泛使用的拓扑数据分析工具,能够基于指定的过滤(filtration)对点云的拓扑性质给出精炼的描述。大多数用于持续同调的过滤都(隐式地)依赖于所选的度量,而该度量通常被不加区分地取为 $\mathbb{R}^n$ 上的标准欧几里得度量。近期的工作尝试利用“到测度的距离”函数来发掘点云上的“真实”度量,以获得更有意义的持续同调结果。本文提出了另一种看待该问题的思路:我们认为,将持续同调限制在单一(正确)的距离函数上会导致点云信息的丢失。相反,我们展示了通过改变底层空间上的距离函数,并分析持续图相应的变化,可以提取额外的拓扑与几何信息。最后,我们通过数值实验表明,非各向同性的持续同调能够较准确地提取随机生成点云的朝向、朝向方差和缩放信息,并在真实数据上进行了一些实验。

FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.16412
  • repo_url: https://github.com/zhuohuangai/FlatMatch
  • paper_authors: Zhuo Huang, Li Shen, Jun Yu, Bo Han, Tongliang Liu
  • for: 这个论文的目的是提出一种新的 semi-supervised learning (SSL) 方法,以便将充沛的无标数数据与罕见的标数数据结合起来,以提高 SSL 的性能。
  • methods: 这个方法基于一个新的测度名为“cross-sharpness”,它测量了两个不同变数的关系。这个测度可以确保学习过程中的模型在标数数据和无标数数据上的学习性能是一致的。
  • results: 这个方法可以在许多 SSL Setting中取得最佳的结果,并且可以将无标数数据中的学习性能与标数数据中的学习性能连接起来,以提高 SSL 的性能。
    Abstract Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data. However, most SSL methods are commonly based on instance-wise consistency between different data transformations. Therefore, the label guidance on labeled data is hard to be propagated to unlabeled data. Consequently, the learning process on labeled data is much faster than on unlabeled data which is likely to fall into a local minima that does not favor unlabeled data, leading to sub-optimal generalization performance. In this paper, we propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets. Specifically, we increase the empirical risk on labeled data to obtain a worst-case model which is a failure case that needs to be enhanced. Then, by leveraging the richness of unlabeled data, we penalize the prediction difference (i.e., cross-sharpness) between the worst-case model and the original model so that the learning direction is beneficial to generalization on unlabeled data. Therefore, we can calibrate the learning process without being limited to insufficient label information. As a result, the mismatched learning performance can be mitigated, further enabling the effective exploitation of unlabeled data and improving SSL performance. Through comprehensive validation, we show FlatMatch achieves state-of-the-art results in many SSL settings.
    摘要 半监督学习(SSL)是一种有效利用大量无标签数据与极少标签数据的方法。然而,大多数 SSL 方法基于不同数据变换之间的实例级一致性,标签数据上的监督信息难以传递到无标签数据。这导致模型在标签数据上的学习远快于在无标签数据上的学习,容易陷入不利于无标签数据的局部极小值,从而损害泛化性能。在这篇论文中,我们提出了 FlatMatch,它通过最小化一种跨锐度(cross-sharpness)度量来保证模型在两类数据上学习表现的一致性。具体来说,我们先提高标签数据上的经验风险以获得一个最坏情况模型,即一个需要被改进的失败案例;然后,借助无标签数据的丰富性,我们惩罚最坏情况模型与原始模型之间的预测差异(即跨锐度),使学习方向有利于在无标签数据上的泛化。这样,我们就可以在不受限于不充分标签信息的情况下校准学习过程,缓解两类数据学习表现不匹配的问题,进而更有效地利用无标签数据、提升 SSL 性能。通过全面的验证,我们表明 FlatMatch 在许多 SSL 设置中取得了最先进的结果。

Multiple Key-value Strategy in Recommendation Systems Incorporating Large Language Model

  • paper_url: http://arxiv.org/abs/2310.16409
  • repo_url: None
  • paper_authors: Dui Wang, Xiangyu Hou, Xiaohui Yang, Bo Zhang, Renbing Chen, Daiyue Xue
  • for: 这篇论文旨在提出一种基于多重键值数据的序列推荐方法,以帮助在实际应用中处理多键值数据。
  • methods: 该方法将推荐系统与大型语言模型(LLM)相结合,通过指令微调向预训练 LLM 注入推荐领域知识;此外,还提出了一种创新的洗牌与掩码(shuffle and mask)策略,以解决 LLM 难以在多个键值之间充分学习的问题。
  • results: 在包含多重键值的 MovieLens 数据集上进行的大量实验验证表明,该方法能够有效地完成基于多重键值数据的序列推荐任务。
    Abstract Recommendation system (RS) plays significant roles in matching users information needs for Internet applications, and it usually utilizes the vanilla neural network as the backbone to handle embedding details. Recently, the large language model (LLM) has exhibited emergent abilities and achieved great breakthroughs both in the CV and NLP communities. Thus, it is logical to incorporate RS with LLM better, which has become an emerging research direction. Although some existing works have made their contributions to this issue, they mainly consider the single key situation (e.g. historical interactions), especially in sequential recommendation. The situation of multiple key-value data is simply neglected. This significant scenario is mainstream in real practical applications, where the information of users (e.g. age, occupation, etc) and items (e.g. title, category, etc) has more than one key. Therefore, we aim to implement sequential recommendations based on multiple key-value data by incorporating RS with LLM. In particular, we instruct tuning a prevalent open-source LLM (Llama 7B) in order to inject domain knowledge of RS into the pre-trained LLM. Since we adopt multiple key-value strategies, LLM is hard to learn well among these keys. Thus the general and innovative shuffle and mask strategies, as an innovative manner of data argument, are designed. To demonstrate the effectiveness of our approach, extensive experiments are conducted on the popular and suitable dataset MovieLens which contains multiple keys-value. The experimental results demonstrate that our approach can nicely and effectively complete this challenging issue.
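As an illustration of the shuffle and mask augmentation for multi key-value prompts, here is a minimal sketch. The keys, mask token, and masking probability are assumed for illustration; the paper's exact strategy may differ.

```python
import random

def shuffle_and_mask(kv_pairs, mask_prob=0.15, mask_token="[MASK]"):
    """Illustrative shuffle-and-mask augmentation for multi key-value prompts.

    kv_pairs: list of (key, value) tuples, e.g. [("title", "Heat"), ("category", "Crime")].
    Randomizing key order and hiding some values prevents the LLM from relying on a
    fixed template; the concrete strategy in the paper may differ.
    """
    pairs = kv_pairs[:]            # copy, then randomize the key order
    random.shuffle(pairs)
    augmented = []
    for key, value in pairs:
        if random.random() < mask_prob:
            value = mask_token     # hide this value
        augmented.append(f"{key}: {value}")
    return "; ".join(augmented)

# Example: build an instruction-tuning prompt from user and item attributes.
user = [("age", "25"), ("occupation", "engineer")]
item = [("title", "Heat"), ("category", "Action|Crime")]
print(shuffle_and_mask(user + item))
```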

Information-Theoretic Generalization Analysis for Topology-aware Heterogeneous Federated Edge Learning over Noisy Channels

  • paper_url: http://arxiv.org/abs/2310.16407
  • repo_url: None
  • paper_authors: Zheshun Wu, Zenglin Xu, Hongfang Yu, Jie Liu
  • for: Studies the generalization of Federated Edge Learning (FEEL), where mobile devices transmit model parameters over noisy channels and collect data in diverse environments, and where the communication topology among devices also affects how well the trained models generalize.
  • methods: Presents an information-theoretic generalization analysis for topology-aware FEEL that accounts for data heterogeneity and noisy channels, and proposes a novel regularization method called Federated Global Mutual Information Reduction (FedGMIR) to enhance model performance.
  • results: Numerical results validate the theoretical findings and provide evidence for the effectiveness of the proposed method.
    Abstract With the rapid growth of edge intelligence, the deployment of federated learning (FL) over wireless networks has garnered increasing attention, which is called Federated Edge Learning (FEEL). In FEEL, both mobile devices transmitting model parameters over noisy channels and collecting data in diverse environments pose challenges to the generalization of trained models. Moreover, devices can engage in decentralized FL via Device-to-Device communication while the communication topology of connected devices also impacts the generalization of models. Most recent theoretical studies overlook the incorporation of all these effects into FEEL when developing generalization analyses. In contrast, our work presents an information-theoretic generalization analysis for topology-aware FEEL in the presence of data heterogeneity and noisy channels. Additionally, we propose a novel regularization method called Federated Global Mutual Information Reduction (FedGMIR) to enhance the performance of models based on our analysis. Numerical results validate our theoretical findings and provide evidence for the effectiveness of the proposed method.

Graph Neural Networks with a Distribution of Parametrized Graphs

  • paper_url: http://arxiv.org/abs/2310.16401
  • repo_url: None
  • paper_authors: See Hian Lee, Feng Ji, Kelin Xia, Wee Peng Tay
  • for: Improves graph neural network performance on node classification and graph regression by incorporating information from multiple graphs rather than a single observed graph.
  • methods: Introduces latent variables to parameterize and generate multiple graphs, and obtains the maximum likelihood estimate of the network parameters via an Expectation-Maximization framework with a Markov Chain Monte Carlo method, incorporating principles from PAC-Bayesian theory.
  • results: Demonstrates performance improvements over baseline models on node classification for heterogeneous graphs and on graph regression for chemistry datasets.
    Abstract Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.

Learning Efficient Surrogate Dynamic Models with Graph Spline Networks

  • paper_url: http://arxiv.org/abs/2310.16397
  • repo_url: https://github.com/kaist-silab/graphsplinenets
  • paper_authors: Chuanbo Hua, Federico Berto, Michael Poli, Stefano Massaroli, Jinkyoo Park
  • for: Aims to speed up deep-learning forecasting of physical systems by reducing the grid size and the number of iteration steps of deep surrogate models.
  • methods: Proposes GraphSplineNets, which use two differentiable orthogonal spline collocation methods to efficiently predict responses at any location in time and space, together with an adaptive collocation strategy that prioritizes sampling in the most important regions.
  • results: Improves the accuracy-speedup tradeoff when forecasting dynamical systems of increasing complexity, including the heat equation, damped wave propagation, the Navier-Stokes equations, and real-world ocean currents on both regular and irregular domains.
    Abstract While complex simulations of physical systems have been widely used in engineering and scientific computing, lowering their often prohibitive computational requirements has only recently been tackled by deep learning approaches. In this paper, we present GraphSplineNets, a novel deep-learning method to speed up the forecasting of physical systems by reducing the grid size and number of iteration steps of deep surrogate models. Our method uses two differentiable orthogonal spline collocation methods to efficiently predict response at any location in time and space. Additionally, we introduce an adaptive collocation strategy in space to prioritize sampling from the most important regions. GraphSplineNets improve the accuracy-speedup tradeoff in forecasting various dynamical systems with increasing complexity, including the heat equation, damped wave propagation, Navier-Stokes equations, and real-world ocean currents in both regular and irregular domains.

Distributed Uncertainty Quantification of Kernel Interpolation on Spheres

  • paper_url: http://arxiv.org/abs/2310.16384
  • repo_url: None
  • paper_authors: Shao-Bo Lin, Xingping Sun, Di Wang
  • for: Studies radial basis function (RBF) kernel interpolation of scattered data on spheres, where Schaback's "uncertainty relation" (approximation error and condition number cannot both be made small) makes interpolation of noisy data of non-negligible magnitude difficult.
  • methods: Proposes a distributed interpolation method to manage and quantify the uncertainty introduced by interpolating noisy spherical data.
  • results: Numerical simulations show the method is practical and robust when handling noisy data from challenging computing environments.
    Abstract For radial basis function (RBF) kernel interpolation of scattered data, Schaback in 1995 proved that the attainable approximation error and the condition number of the underlying interpolation matrix cannot be made small simultaneously. He referred to this finding as an "uncertainty relation", an undesirable consequence of which is that RBF kernel interpolation is susceptible to noisy data. In this paper, we propose and study a distributed interpolation method to manage and quantify the uncertainty brought on by interpolating noisy spherical data of non-negligible magnitude. We also present numerical simulation results showing that our method is practical and robust in terms of handling noisy data from challenging computing environments.
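A small numerical example makes Schaback's uncertainty relation tangible: flattening the Gaussian RBF kernel reduces the approximation error but inflates the condition number of the interpolation matrix, which is why noisy data is problematic. This sketch is generic 1-D RBF interpolation, not the paper's distributed spherical method; the shape parameter and node count are arbitrary.

```python
import numpy as np

def rbf_interpolate(x_train, y_train, x_eval, eps=3.0):
    """Gaussian RBF interpolation on scattered 1-D data (illustrative only).

    Smaller eps (flatter kernels) lowers the approximation error but blows up the
    condition number of the interpolation matrix, Schaback's "uncertainty relation".
    """
    def kernel(a, b):
        return np.exp(-(eps * (a[:, None] - b[None, :])) ** 2)

    A = kernel(x_train, x_train)
    print("condition number:", np.linalg.cond(A))
    coeffs = np.linalg.solve(A, y_train)
    return kernel(x_eval, x_train) @ coeffs

x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.05 * np.random.randn(20)   # noisy samples
print(rbf_interpolate(x, y, np.array([0.25, 0.5, 0.75])))
```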

A model for multi-attack classification to improve intrusion detection performance using deep learning approaches

  • paper_url: http://arxiv.org/abs/2310.16380
  • repo_url: None
  • paper_authors: Arun Kumar Silivery, Ram Mohan Rao Kovvur
  • for: Aims to develop a reliable intrusion detection mechanism that helps identify malicious attacks.
  • methods: Proposes a deep learning framework with three approaches: a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) evaluated with seven optimizers (adamax, SGD, adagrad, adam, RMSprop, nadam, adadelta), a Recurrent Neural Network (RNN), and a Deep Neural Network (DNN).
  • results: On the NSL-KDD dataset, the LSTM-RNN with the adamax optimizer outperforms existing shallow machine learning and deep learning models in accuracy, detection rate, and false alarm rate; the multi-model approach also delivers considerable performance on the KDD99, NSL-KDD, and UNSW-NB15 datasets.
    Abstract The proposed model introduces novel deep learning methodologies. The objective is to create a reliable intrusion detection mechanism that helps identify malicious attacks. A deep learning based solution framework is developed consisting of three approaches. The first approach is a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) evaluated with seven optimizer functions: adamax, SGD, adagrad, adam, RMSprop, nadam, and adadelta. The model is evaluated on the NSL-KDD dataset for multi-attack classification and performs best with the adamax optimizer in terms of accuracy, detection rate, and false alarm rate. The results of the LSTM-RNN with the adamax optimizer are compared with existing shallow machine learning and deep learning models on the same metrics. A multi-model methodology consisting of a Recurrent Neural Network (RNN), an LSTM-RNN, and a Deep Neural Network (DNN) is also developed; the models self-learn the features, perform multi-attack classification, and are evaluated on the benchmark datasets KDD99, NSL-KDD, and UNSW-NB15. The RNN and LSTM-RNN models provide considerable performance compared to other existing methods on the KDD99 and NSL-KDD datasets.
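For readers unfamiliar with the basic setup, a minimal Keras sketch of an LSTM-RNN classifier compiled with the adamax optimizer is shown below; the layer sizes, feature count, and dummy data are assumptions, not the paper's exact architecture or preprocessing.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Illustrative LSTM-RNN for multi-attack classification; 41 features (a common
# NSL-KDD encoding) and the layer sizes are assumptions, not the paper's values.
n_features, n_classes = 41, 5
model = Sequential([
    LSTM(64, input_shape=(1, n_features)),   # each record treated as a length-1 sequence
    Dropout(0.2),
    Dense(32, activation="relu"),
    Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adamax",             # the optimizer the paper reports as best
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data with the expected shapes (replace with preprocessed NSL-KDD records).
X = np.random.rand(128, 1, n_features).astype("float32")
y = np.eye(n_classes)[np.random.randint(0, n_classes, 128)]
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```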

DyExplainer: Explainable Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.16375
  • repo_url: None
  • paper_authors: Tianchun Wang, Dongsheng Luo, Wei Cheng, Haifeng Chen, Xiang Zhang
  • for: Aims to explain dynamic graph neural networks (GNNs) so these models can be better understood and trusted, broadening their use in mission-critical applications.
  • methods: Proposes DyExplainer, which trains a dynamic GNN backbone to extract a representation of the graph at each snapshot while exploring structural relationships and temporal dependencies with a sparse attention technique; contrastive-learning-based regularization preserves structural consistency and temporal continuity, and a buffer-based live-updating scheme models longer-term dependencies.
  • results: Extensive experiments on various datasets show that DyExplainer provides faithful explanations of model predictions and also significantly improves prediction accuracy, as evidenced in the link prediction task.
    Abstract Graph Neural Networks (GNNs) resurge as a trending research subject owing to their impressive ability to capture representations from graph-structured data. However, the black-box nature of GNNs presents a significant challenge in terms of comprehending and trusting these models, thereby limiting their practical applications in mission-critical scenarios. Although there has been substantial progress in the field of explaining GNNs in recent years, the majority of these studies are centered on static graphs, leaving the explanation of dynamic GNNs largely unexplored. Dynamic GNNs, with their ever-evolving graph structures, pose a unique challenge and require additional efforts to effectively capture temporal dependencies and structural relationships. To address this challenge, we present DyExplainer, a novel approach to explaining dynamic GNNs on the fly. DyExplainer trains a dynamic GNN backbone to extract representations of the graph at each snapshot, while simultaneously exploring structural relationships and temporal dependencies through a sparse attention technique. To preserve the desired properties of the explanation, such as structural consistency and temporal continuity, we augment our approach with contrastive learning techniques to provide priori-guided regularization. To model longer-term temporal dependencies, we develop a buffer-based live-updating scheme for training. The results of our extensive experiments on various datasets demonstrate the superiority of DyExplainer, not only providing faithful explainability of the model predictions but also significantly improving the model prediction accuracy, as evidenced in the link prediction task.

Joint Distributional Learning via Cramer-Wold Distance

  • paper_url: http://arxiv.org/abs/2310.16374
  • repo_url: None
  • paper_authors: Seunghwan An, Jong-June Jeon
  • for: Addresses the limitations of the conditional independence assumption among observed variables when modeling high-dimensional datasets or complex correlation structures.
  • methods: Introduces a Cramer-Wold distance regularization that can be computed in closed form to enable joint distributional learning for high-dimensional data, together with a two-step learning method for flexible prior modeling and better alignment between the aggregated posterior and the prior distribution.
  • results: Experiments on high-dimensional datasets with multiple categorical variables demonstrate the effectiveness of the proposed method for synthetic data generation, a setting common in readily available datasets and data science applications.
    Abstract The assumption of conditional independence among observed variables, primarily used in the Variational Autoencoder (VAE) decoder modeling, has limitations when dealing with high-dimensional datasets or complex correlation structures among observed variables. To address this issue, we introduced the Cramer-Wold distance regularization, which can be computed in a closed-form, to facilitate joint distributional learning for high-dimensional datasets. Additionally, we introduced a two-step learning method to enable flexible prior modeling and improve the alignment between the aggregated posterior and the prior distribution. Furthermore, we provide theoretical distinctions from existing methods within this category. To evaluate the synthetic data generation performance of our proposed approach, we conducted experiments on high-dimensional datasets with multiple categorical variables. Given that many readily available datasets and data science applications involve such datasets, our experiments demonstrate the effectiveness of our proposed methodology.
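The Cramer-Wold distance compares Gaussian-smoothed one-dimensional projections of two samples. The paper uses a closed-form expression; the Monte Carlo slicing sketch below only illustrates the idea, and the number of projections and smoothing bandwidth `gamma` are arbitrary choices.

```python
import torch

def sliced_cw_distance(x, y, n_projections=50, gamma=1.0):
    """Monte Carlo sketch of a Cramer-Wold-style distance between two samples.

    Both samples are projected onto random unit directions, and the Gaussian-smoothed
    1-D projections are compared with an MMD-style squared distance. This is an
    illustration, not the closed-form expression used in the paper.
    """
    d = x.shape[1]
    dirs = torch.randn(n_projections, d)
    dirs = dirs / dirs.norm(dim=1, keepdim=True)        # random directions on the sphere
    px, py = x @ dirs.T, y @ dirs.T                      # (n_samples, n_projections)

    def smoothed_l2(a, b):
        # E k(a,a') + E k(b,b') - 2 E k(a,b) with a Gaussian kernel of bandwidth ~gamma,
        # i.e. a squared L2 distance between kernel-smoothed projected densities.
        k = lambda u, v: torch.exp(-((u[:, None] - v[None, :]) ** 2) / (4.0 * gamma))
        return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

    return torch.stack([smoothed_l2(px[:, i], py[:, i])
                        for i in range(n_projections)]).mean()

x = torch.randn(256, 10)
y = torch.randn(256, 10) + 0.5
print(float(sliced_cw_distance(x, x)), float(sliced_cw_distance(x, y)))  # second value is larger
```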

Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms

  • paper_url: http://arxiv.org/abs/2310.16363
  • repo_url: None
  • paper_authors: Prashansa Panda, Shalabh Bhatnagar
  • for: Studies actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes (C-MDPs) with inequality constraints, particularly when the state-action space is large.
  • methods: Handles the inequality constraints with the Lagrange multiplier method under the long-run average cost criterion, and carries out a non-asymptotic analysis of both algorithms in a non-i.i.d. (Markovian) setting.
  • results: Proves that both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) find a first-order stationary point of the Lagrangian, i.e. $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$, with sample complexity $\tilde{\mathcal{O}}(\epsilon^{-2.5})$; grid-world experiments show good empirical performance, with C-NAC slightly better on large grids and C-AC slightly better on small ones.
    Abstract Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on a few different grid world settings and observe good empirical performance using both of these algorithms. In particular, for large grid sizes, Constrained Natural Actor Critic shows slightly better results than Constrained Actor Critic while the latter is slightly better for a small grid size.
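The Lagrange-multiplier treatment of the inequality constraint boils down to a primal-dual update: the policy parameters descend the Lagrangian while the multiplier ascends on the observed constraint violation. The toy problem and step sizes below are illustrative; the paper estimates these gradients with (natural) actor-critic updates rather than exact derivatives.

```python
def constrained_ac_lagrangian_step(theta, gamma, grad_obj, grad_con, con_violation,
                                   lr_theta=1e-3, lr_gamma=1e-2):
    """One primal-dual update of the Lagrange-multiplier scheme used for C-MDPs.

    The actor descends L(theta, gamma) = J_obj(theta) + gamma * J_con(theta) (long-run
    average costs in the paper), while the multiplier ascends on the constraint
    violation and stays non-negative. Signs and step sizes are illustrative.
    """
    theta = theta - lr_theta * (grad_obj + gamma * grad_con)   # primal (policy) step
    gamma = max(0.0, gamma + lr_gamma * con_violation)         # dual ascent on the multiplier
    return theta, gamma

# Toy usage: minimize theta^2 subject to (1 - theta) <= 0, i.e. theta >= 1.
theta, gamma = 0.0, 0.0
for _ in range(20000):
    theta, gamma = constrained_ac_lagrangian_step(
        theta, gamma,
        grad_obj=2 * theta,         # d/dtheta of theta^2
        grad_con=-1.0,              # d/dtheta of (1 - theta)
        con_violation=1.0 - theta,  # positive when the constraint is violated
        lr_theta=1e-3, lr_gamma=1e-3)
print(round(theta, 3), round(gamma, 3))   # theta should approach 1
```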

Neural Potential Field for Obstacle-Aware Local Motion Planning

  • paper_url: http://arxiv.org/abs/2310.16362
  • repo_url: https://github.com/cog-isa/npfield
  • paper_authors: Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, Aleksandr Panov
  • for: Provides MPC-based local motion planning for mobile robotic platforms when both the obstacle map and the robot footprint are arbitrary.
  • methods: Proposes a Neural Potential Field, a neural network that returns a differentiable collision cost from the robot pose, obstacle map, and robot footprint; neural image encoders embed the obstacle map and footprint, reducing the problem dimensionality by two orders of magnitude, and reference data for training are generated from an algorithmic signed distance function.
  • results: Comparative experiments show the approach is on par with existing local planners, producing trajectories with better smoothness, comparable path length, and a safe distance from obstacles; experiments on a Husky UGV mobile robot demonstrate real-time and safe local planning.
    Abstract Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprint. The differentiability of our model allows its usage within the MPC solver. It is computationally hard to solve problems with a very high number of parameters. Therefore, our architecture includes neural image encoders, which transform obstacle maps and robot footprints into embeddings, which reduce problem dimensionality by two orders of magnitude. The reference data for network training are generated based on algorithmic calculation of a signed distance function. Comparative experiments showed that the proposed approach is comparable with existing local planners: it provides trajectories with outperforming smoothness, comparable path length, and safe distance from obstacles. Experiment on Husky UGV mobile robot showed that our approach allows real-time and safe local planning. The code for our approach is presented at https://github.com/cog-isa/NPField together with demo video.

Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

  • paper_url: http://arxiv.org/abs/2310.16355
  • repo_url: https://github.com/tanyuqian/redco
  • paper_authors: Bowen Tan, Yun Zhu, Lijuan Liu, Hongyi Wang, Yonghao Zhuang, Jindong Chen, Eric Xing, Zhiting Hu
  • for: Provides a lightweight, user-friendly tool that automates distributed training and inference for large language models (LLMs) and simplifies ML pipeline development.
  • methods: Uses two straightforward rules to generate tensor parallel strategies for any given LLM, so distributed training and inference require no extra coding or complex configuration, and offers a mechanism that lets diverse ML pipelines be customized by defining merely three functions.
  • results: Demonstrated on LLM architectures such as GPT-J, LLaMA, T5, and OPT up to 66B parameters, with implementations requiring far fewer lines of code than their official counterparts.
    Abstract The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration efforts with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa. These tools require users' expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without MLSys background. In this work, we present Redco, a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. Firstly, to automate model parallelism, our study identifies two straightforward rules to generate tensor parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need for additional coding or complex configurations. We demonstrate the effectiveness by applying Redco on a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to the size of 66B. Secondly, we propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions, eliminating redundant and formulaic code like multi-host related processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. Consequently, Redco implementations exhibit much fewer code lines compared to their official counterparts.

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

  • paper_url: http://arxiv.org/abs/2310.16336
  • repo_url: https://github.com/zichongli5/smurf-thp
  • paper_authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha
  • for: Transformer Hawkes process models for event sequence data are typically trained by maximizing the likelihood, which involves an intractable integral, and existing methods provide no uncertainty quantification for predictions such as confidence intervals on an event's arrival time.
  • methods: SMURF-THP learns the score function of events' arrival times with a score-matching objective that avoids the intractable integral, and samples arrival times from the predictive distribution so that confidence intervals can be computed over the generated samples.
  • results: In event type prediction and arrival-time uncertainty quantification, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy.
    Abstract Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy.
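Because the model learns a score function of the arrival time, sampling (and hence confidence intervals) can be done with Langevin-type updates. The sketch below uses a toy analytic score in place of a trained network; the step size, initialization, and clamping at zero are assumptions for illustration, not the paper's sampler.

```python
import torch

def sample_arrival_times(score_fn, history_emb, n_samples=500, n_steps=100, step=1e-2):
    """Langevin-style sampling of the next arrival time from a learned score function
    s(t | history) = d/dt log p(t | history). Confidence intervals come from quantiles
    of the samples. `score_fn` and `history_emb` are placeholders for a trained model."""
    t = torch.rand(n_samples, 1)                     # crude initialization of arrival times
    for _ in range(n_steps):
        noise = torch.randn_like(t)
        t = t + step * score_fn(t, history_emb) + (2 * step) ** 0.5 * noise
        t = t.clamp(min=0.0)                         # arrival times are non-negative
    lo, hi = torch.quantile(t, torch.tensor([0.05, 0.95]))
    return t, (lo.item(), hi.item())                 # samples and a 90% confidence interval

# Toy check with a truncated-Gaussian density centred at t = 2 (score = 2 - t).
toy_score = lambda t, h: 2.0 - t
samples, ci = sample_arrival_times(toy_score, history_emb=None, n_steps=2000)
print(float(samples.mean()), ci)   # mean close to 2, 90% interval roughly (0.4, 3.6)
```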

Defense Against Model Extraction Attacks on Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.16335
  • repo_url: None
  • paper_authors: Sixiao Zhang, Hongzhi Yin, Hongxu Chen, Cheng Long
  • for: Aims to improve the robustness of recommender systems, particularly against model extraction attacks.
  • methods: Proposes Gradient-based Ranking Optimization (GRO), which casts the defense as an optimization problem: minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model. Since top-k ranking lists are non-differentiable, they are transformed into differentiable swap matrices that feed a student model emulating the surrogate, and the resulting swap loss is maximized.
  • results: Experiments on three benchmark datasets demonstrate GRO's superior effectiveness in defending against model extraction attacks.
    Abstract The robustness of recommender systems has become a prominent topic within the research community. Numerous adversarial attacks have been proposed, but most of them rely on extensive prior knowledge, such as all the white-box attacks or most of the black-box attacks which assume that certain external knowledge is available. Among these attacks, the model extraction attack stands out as a promising and practical method, involving training a surrogate model by repeatedly querying the target model. However, there is a significant gap in the existing literature when it comes to defending against model extraction attacks on recommender systems. In this paper, we introduce Gradient-based Ranking Optimization (GRO), which is the first defense strategy designed to counter such attacks. We formalize the defense as an optimization problem, aiming to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model. Since top-k ranking lists are non-differentiable, we transform them into swap matrices which are instead differentiable. These swap matrices serve as input to a student model that emulates the surrogate model's behavior. By back-propagating the loss of the student model, we obtain gradients for the swap matrices. These gradients are used to compute a swap loss, which maximizes the loss of the student model. We conducted experiments on three benchmark datasets to evaluate the performance of GRO, and the results demonstrate its superior effectiveness in defending against model extraction attacks.

Corrupting Neuron Explanations of Deep Visual Features

  • paper_url: http://arxiv.org/abs/2310.16332
  • repo_url: None
  • paper_authors: Divyansh Srivastava, Tuomas Oikarinen, Tsui-Wei Weng
  • for: This paper aims to investigate the robustness of Neuron Explanation Methods (NEMs) in deep neural networks.
  • methods: The authors use a unified pipeline to evaluate the robustness of NEMs under different types of corruptions, including random noises and well-designed perturbations.
  • results: The authors find that even small amounts of noise can significantly corrupt the explanations provided by NEMs, and that their proposed corruption algorithm can manipulate the explanations of more than 80% of neurons by poisoning less than 10% of the probing data. This raises concerns about the trustworthiness of NEMs in real-life applications.
    Abstract The inability of DNNs to explain their black-box behavior has led to a recent surge of explainability methods. However, there are growing concerns that these explainability methods are not robust and trustworthy. In this work, we perform the first robustness analysis of Neuron Explanation Methods under a unified pipeline and show that these explanations can be significantly corrupted by random noises and well-designed perturbations added to their probing data. We find that even adding small random noise with a standard deviation of 0.02 can already change the assigned concepts of up to 28% neurons in the deeper layers. Furthermore, we devise a novel corruption algorithm and show that our algorithm can manipulate the explanation of more than 80% neurons by poisoning less than 10% of probing data. This raises the concern of trusting Neuron Explanation Methods in real-life safety and fairness critical applications.
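A sketch of the corruption protocol: add small Gaussian noise (the paper reports that a standard deviation of 0.02 already changes many assigned concepts) to the probing data and measure how many neurons receive a different explanation. The `toy_explain` stand-in is purely illustrative; a real neuron explanation method would replace it.

```python
import torch

def corrupt_probing_data(probing_images, sigma=0.02):
    """Add small Gaussian noise (std 0.02, as in the paper's finding) to the probing
    dataset used by a neuron-explanation method."""
    return probing_images + sigma * torch.randn_like(probing_images)

def concept_shift_rate(explain_fn, probing_images, sigma=0.02):
    """Fraction of neurons whose assigned concept changes after corruption.
    `explain_fn` maps a probing set to one concept label per neuron (an assumption
    made for illustration)."""
    clean = explain_fn(probing_images)
    noisy = explain_fn(corrupt_probing_data(probing_images, sigma))
    changed = sum(c != n for c, n in zip(clean, noisy))
    return changed / len(clean)

# Toy usage: the "concept" of each neuron is the index of its top-activating image.
conv = torch.nn.Conv2d(3, 8, 3)
def toy_explain(images):
    with torch.no_grad():
        acts = conv(images).mean(dim=(2, 3))   # (n_images, n_neurons) mean activations
    return acts.argmax(dim=0).tolist()

probing = torch.rand(32, 3, 16, 16)
print("fraction of neurons whose 'concept' changed:", concept_shift_rate(toy_explain, probing))
```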

Brain-Inspired Reservoir Computing Using Memristors with Tunable Dynamics and Short-Term Plasticity

  • paper_url: http://arxiv.org/abs/2310.16331
  • repo_url: None
  • paper_authors: Nicholas X. Armendarez, Ahmed S. Mohamed, Anurag Dhungel, Md Razuan Hossain, Md Sakib Hasan, Joseph S. Najem
  • for: Aims to provide analog devices for temporal classification and prediction tasks, promising faster information processing with lower energy consumption and a smaller area footprint.
  • methods: Uses ion-channel-based memristors whose voltage-dependent dynamics can be controllably and predictively tuned, via the applied voltage or the ion channel concentration, to exhibit diverse dynamic properties within a single reservoir layer.
  • results: Experiments and simulations show that reservoir layers built from a small number of distinct memristors achieve high accuracy with a single data encoding: a normalized mean square error of 0.0015 on a second-order nonlinear dynamical system prediction task with five distinct memristors, and 96.5% accuracy on a neural activity classification task with three.
    Abstract Recent advancements in reservoir computing research have created a demand for analog devices with dynamics that can facilitate the physical implementation of reservoirs, promising faster information processing while consuming less energy and occupying a smaller area footprint. Studies have demonstrated that dynamic memristors, with nonlinear and short-term memory dynamics, are excellent candidates as information-processing devices or reservoirs for temporal classification and prediction tasks. Previous implementations relied on nominally identical memristors that applied the same nonlinear transformation to the input data, which is not enough to achieve a rich state space. To address this limitation, researchers either diversified the data encoding across multiple memristors or harnessed the stochastic device-to-device variability among the memristors. However, this approach requires additional pre-processing steps and leads to synchronization issues. Instead, it is preferable to encode the data once and pass it through a reservoir layer consisting of memristors with distinct dynamics. Here, we demonstrate that ion-channel-based memristors with voltage-dependent dynamics can be controllably and predictively tuned through voltage or adjustment of the ion channel concentration to exhibit diverse dynamic properties. We show, through experiments and simulations, that reservoir layers constructed with a small number of distinct memristors exhibit significantly higher predictive and classification accuracies with a single data encoding. We found that for a second-order nonlinear dynamical system prediction task, the varied memristor reservoir experimentally achieved a normalized mean square error of 0.0015 using only five distinct memristors. Moreover, in a neural activity classification task, a reservoir of just three distinct memristors experimentally attained an accuracy of 96.5%.
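To illustrate why a few nodes with distinct dynamics can form a useful reservoir, here is a toy software reservoir with heterogeneous leaky nonlinear nodes and a trained linear readout. It is a conceptual stand-in, not a model of the ion-channel devices; the node dynamics, decay/gain values, and task are assumptions.

```python
import numpy as np

def heterogeneous_reservoir(u, decays, gains):
    """Toy reservoir: each node is a leaky nonlinear integrator with its own decay and
    gain, standing in for memristors with distinct, tunable short-term dynamics."""
    states = np.zeros((len(u), len(decays)))
    x = np.zeros(len(decays))
    for t, ut in enumerate(u):
        x = (1 - decays) * x + gains * np.tanh(ut + x)   # short-term memory + nonlinearity
        states[t] = x
    return states

rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
target = np.roll(u, -1)                                   # one-step-ahead prediction task
S = heterogeneous_reservoir(u,
                            decays=np.array([0.1, 0.3, 0.5, 0.7, 0.9]),
                            gains=np.array([0.5, 0.8, 1.0, 1.2, 1.5]))
W = np.linalg.lstsq(S[:-1], target[:-1], rcond=None)[0]   # linear readout (only trained part)
pred = S[:-1] @ W
print("NMSE:", np.mean((pred - target[:-1]) ** 2) / np.var(target[:-1]))
```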

Reinforcement Learning for SBM Graphon Games with Re-Sampling

  • paper_url: http://arxiv.org/abs/2310.16326
  • repo_url: None
  • paper_authors: Peihan Huo, Oscar Peralta, Junyu Guo, Qiaomin Xie, Andreea Minca
  • for: Addresses the limitations of the mean-field approximation for large population dynamics, whose assumptions of homogeneity and universal connections among agents motivate the Multi-Population Mean-Field Game (MP-MFG) model.
  • methods: Shows that a Policy Mirror Ascent algorithm finds the MP-MFG Nash equilibrium when the underlying Stochastic Block Model is known, and, for the more realistic case where the block model is unknown, proposes a Graphon Game with Re-Sampling (GGR-S) model that integrates re-sampling from a graphon with the finite N-player MP-MFG.
  • results: Analyzes the GGR-S dynamics, establishes their convergence to the MP-MFG dynamics, and proposes an efficient sample-based N-player reinforcement learning algorithm for GGR-S without population manipulation, with a rigorous convergence analysis and finite-sample guarantees.
    Abstract The Mean-Field approximation is a tractable approach for studying large population dynamics. However, its assumption on homogeneity and universal connections among all agents limits its applicability in many real-world scenarios. Multi-Population Mean-Field Game (MP-MFG) models have been introduced in the literature to address these limitations. When the underlying Stochastic Block Model is known, we show that a Policy Mirror Ascent algorithm finds the MP-MFG Nash Equilibrium. In more realistic scenarios where the block model is unknown, we propose a re-sampling scheme from a graphon integrated with the finite N-player MP-MFG model. We develop a novel learning framework based on a Graphon Game with Re-Sampling (GGR-S) model, which captures the complex network structures of agents' connections. We analyze GGR-S dynamics and establish the convergence to dynamics of MP-MFG. Leveraging this result, we propose an efficient sample-based N-player Reinforcement Learning algorithm for GGR-S without population manipulation, and provide a rigorous convergence analysis with finite sample guarantee.

Personalized Federated X-armed Bandit

  • paper_url: http://arxiv.org/abs/2310.16323
  • repo_url: None
  • paper_authors: Wenjie Li, Qifan Song, Jean Honorio
  • for: Studies the personalized federated $\mathcal{X}$-armed bandit problem, in which the heterogeneous local objectives of the clients are optimized simultaneously within the federated learning paradigm.
  • methods: Proposes the \texttt{PF-PNE} algorithm with a unique double elimination strategy that safely eliminates non-optimal regions while encouraging federated collaboration through biased but effective evaluations of the local objectives.
  • results: The theoretical analysis shows the benefit over single-client algorithms for local objectives with arbitrary levels of heterogeneity, the limited communication protects the confidentiality of client-wise reward data, and experiments show \texttt{PF-PNE} outperforms multiple baselines on both synthetic and real-life datasets.
    Abstract In this work, we study the personalized federated $\mathcal{X}$-armed bandit problem, where the heterogeneous local objectives of the clients are optimized simultaneously in the federated learning paradigm. We propose the \texttt{PF-PNE} algorithm with a unique double elimination strategy, which safely eliminates the non-optimal regions while encouraging federated collaboration through biased but effective evaluations of the local objectives. The proposed \texttt{PF-PNE} algorithm is able to optimize local objectives with arbitrary levels of heterogeneity, and its limited communications protects the confidentiality of the client-wise reward data. Our theoretical analysis shows the benefit of the proposed algorithm over single-client algorithms. Experimentally, \texttt{PF-PNE} outperforms multiple baselines on both synthetic and real life datasets.

Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

  • paper_url: http://arxiv.org/abs/2310.16320
  • repo_url: None
  • paper_authors: Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang
  • for: Studies low-precision training, a low-cost technique for improving the training efficiency of deep neural networks without sacrificing much accuracy, and its Bayesian counterpart, which additionally offers uncertainty quantification and improved generalization.
  • methods: Investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators, for both strongly log-concave and non-log-concave distributions.
  • results: To achieve $\epsilon$-error in 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves a quadratic improvement ($\widetilde{\mathbf{O}}(\epsilon^{-2}{\mu^*}^{-2}\log^2(\epsilon^{-1}))$) over the state-of-the-art low-precision sampler SGLD ($\widetilde{\mathbf{O}}(\epsilon^{-4}{\lambda^*}^{-1}\log^5(\epsilon^{-1}))$), and is more robust to quantization error thanks to the momentum-based update; experiments on synthetic data and on MNIST, CIFAR-10, and CIFAR-100 validate the theory.
    Abstract Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves quadratic improvement ($\widetilde{\mathbf{O}}\left(\epsilon^{-2}{\mu^*}^{-2}\log^2\left(\epsilon^{-1}\right)\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left(\epsilon^{-4}{\lambda^*}^{-1}\log^5\left(\epsilon^{-1}\right)\right)$). Moreover, we prove that low-precision SGHMC is more robust to the quantization error compared to low-precision SGLD due to the robustness of the momentum-based update w.r.t. gradient noise. Empirically, we conduct experiments on synthetic data and the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
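A minimal sketch of low-precision SGHMC on a toy 1-D Gaussian target: the parameters are stored in a uniform fixed-point format while the momentum update stays in full precision. The quantizer, bit width, friction, and step size are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def quantize(x, n_bits=8, scale=4.0):
    """Uniform fixed-point quantizer onto [-scale, scale]; an illustrative stand-in
    for the paper's low-precision number format."""
    levels = 2 ** (n_bits - 1)
    return np.clip(np.round(x / scale * levels), -levels, levels - 1) * scale / levels

def low_precision_sghmc(grad_fn, theta0, n_steps=5000, lr=1e-3, friction=0.1, n_bits=8):
    """SGHMC whose parameters are stored in low precision while the momentum update is
    kept in full precision: v <- (1 - friction) v - lr * grad + noise, theta <- Q(theta + v)."""
    rng = np.random.default_rng(0)
    theta = quantize(np.asarray(theta0, dtype=float), n_bits)
    v = np.zeros_like(theta)
    samples = []
    for _ in range(n_steps):
        g = grad_fn(theta)                                   # stochastic gradient of -log p(theta)
        v = (1 - friction) * v - lr * g \
            + np.sqrt(2 * friction * lr) * rng.standard_normal(theta.shape)
        theta = quantize(theta + v, n_bits)                  # parameters live in low precision
        samples.append(theta.copy())
    return np.array(samples)

# Toy target: a standard 1-D Gaussian, for which the gradient of -log p is simply theta.
s = low_precision_sghmc(lambda th: th, [3.0])
print("mean/std after burn-in:", s[1000:].mean(), s[1000:].std())   # roughly 0 and 1
```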

Understanding Code Semantics: An Evaluation of Transformer Models in Summarization

  • paper_url: http://arxiv.org/abs/2310.16314
  • repo_url: https://github.com/Demon702/robust_code_summary
  • paper_authors: Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari
  • for: Examines how well advanced transformer-based language models perform code summarization and whether they truly understand code semantics.
  • methods: Alters function and variable names to test whether models rely on code semantics or merely on textual cues, and introduces adversaries such as dead code and commented code across three programming languages (Python, JavaScript, and Java).
  • results: The study offers insights into the inner workings of transformer-based LMs, aiming to improve their code understanding and contribute to more efficient software development and maintenance workflows.
    Abstract This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore whether models truly understand code semantics or merely rely on textual cues. We have also introduced adversaries like dead code and commented code across three programming languages (Python, Javascript, and Java) to further scrutinize the model's understanding. Ultimately, our research aims to offer valuable insights into the inner workings of transformer-based LMs, enhancing their ability to understand code and contributing to more efficient software development practices and maintenance workflows.
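The name-perturbation probe can be illustrated in a few lines: rewrite identifiers to semantically meaningless tokens and feed the perturbed code to the summarizer. A real pipeline would use an AST or tree-sitter rather than regex; this is only a sketch, and the example snippet and mapping are invented for illustration.

```python
import re

def rename_identifiers(source, mapping):
    """Rename functions/variables in a code snippet to semantically meaningless tokens,
    the kind of perturbation used to test whether a summarization model relies on names
    rather than code semantics. Whole-word regex substitution is enough to illustrate."""
    for old, new in mapping.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source

snippet = """def compute_average(values):
    total = sum(values)
    return total / len(values)
"""
print(rename_identifiers(snippet, {"compute_average": "f1", "values": "v1", "total": "v2"}))
```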

Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

  • paper_url: http://arxiv.org/abs/2310.16310
  • repo_url: None
  • paper_authors: Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha
  • for: Aims to provide a reliable way to learn marked spatio-temporal point processes (STPPs) for events with both temporal and spatial features, and to quantify the uncertainty of their predictions.
  • methods: Proposes SMASH, a score-matching-based pseudolikelihood estimator with a normalization-free objective, which quantifies uncertainty for the predicted event time, location, and mark by computing confidence regions over generated samples.
  • results: Extensive experiments demonstrate superior performance in both event prediction and uncertainty quantification compared with existing approaches.
    Abstract Spatio-temporal point processes (STPPs) are potent mathematical tools for modeling and predicting events with both temporal and spatial features. Despite their versatility, most existing methods for learning STPPs either assume a restricted form of the spatio-temporal distribution, or suffer from inaccurate approximations of the intractable integral in the likelihood training objective. These issues typically arise from the normalization term of the probability density function. Moreover, current techniques fail to provide uncertainty quantification for model predictions, such as confidence intervals for the predicted event's arrival time and confidence regions for the event's location, which is crucial given the considerable randomness of the data. To tackle these challenges, we introduce SMASH: a Score MAtching-based pSeudolikeliHood estimator for learning marked STPPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.

Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks

  • paper_url: http://arxiv.org/abs/2310.16302
  • repo_url: None
  • paper_authors: Xiucheng Wang, Nan Cheng, Longfei Ma, Zhisheng Yin, Tom. Luan, Ning Lu
  • for: Aims to optimize the performance of multi-UAV networks trained with deep reinforcement learning (DRL) while reducing the energy and hardware costs of practical training.
  • methods: Introduces an imperfect digital twin (DT) model with deviations to assist training, proposes a mixed deployment of natural and virtually generated UAVs, and uses two cascade neural networks, trained with unsupervised and reinforcement learning (both low-cost, label-free methods), to jointly optimize the number of virtual UAVs, the DT construction cost, and the network performance.
  • results: Simulation results show the training cost can be significantly reduced while guaranteeing training performance, implying that efficient decisions can be made with imperfect DTs in multi-UAV networks.
    Abstract Deep Reinforcement Learning (DRL) is widely used to optimize the performance of multi-UAV networks. However, the training of DRL relies on the frequent interactions between the UAVs and the environment, which consumes lots of energy due to the flying and communication of UAVs in practical experiments. Inspired by the growing digital twin (DT) technology, which can simulate the performance of algorithms in the digital space constructed by coping features of the physical space, the DT is introduced to reduce the costs of practical training, e.g., energy and hardware purchases. Different from previous DT-assisted works with an assumption of perfect reflecting real physics by virtual digital, we consider an imperfect DT model with deviations for assisting the training of multi-UAV networks. Remarkably, to trade off the training cost, DT construction cost, and the impact of deviations of DT on training, the natural and virtually generated UAV mixing deployment method is proposed. Two cascade neural networks (NN) are used to optimize the joint number of virtually generated UAVs, the DT construction cost, and the performance of multi-UAV networks. These two NNs are trained by unsupervised and reinforcement learning, both low-cost label-free training methods. Simulation results show the training cost can significantly decrease while guaranteeing the training performance. This implies that an efficient decision can be made with imperfect DTs in multi-UAV networks.

FuXi-Extreme: Improving extreme rainfall and wind forecasts with diffusion model

  • paper_url: http://arxiv.org/abs/2310.19822
  • repo_url: None
  • paper_authors: Xiaohui Zhong, Lei Chen, Jun Liu, Chensen Lin, Yuan Qi, Hao Li
  • for: Aims to improve the accuracy of ML-based weather forecasts, in particular for extreme weather events.
  • methods: Develops FuXi-Extreme, which uses a denoising diffusion probabilistic model (DDPM) to restore finer-scale details in the surface forecast data generated by the FuXi model in 5-day forecasts.
  • results: Evaluations of extreme total precipitation, 10-meter wind speed, and 2-meter temperature show FuXi-Extreme outperforms both FuXi and HRES; on IBTrACS-based tropical cyclone forecasts, FuXi and FuXi-Extreme outperform HRES on track but are inferior on intensity.
    Abstract Significant advancements in the development of machine learning (ML) models for weather forecasting have produced remarkable results. State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, ML models face a common challenge: as forecast lead times increase, they tend to generate increasingly smooth predictions, leading to an underestimation of the intensity of extreme weather events. To address this challenge, we developed the FuXi-Extreme model, which employs a denoising diffusion probabilistic model (DDPM) to restore finer-scale details in the surface forecast data generated by the FuXi model in 5-day forecasts. An evaluation of extreme total precipitation ($\textrm{TP}$), 10-meter wind speed ($\textrm{WS10}$), and 2-meter temperature ($\textrm{T2M}$) illustrates the superior performance of FuXi-Extreme over both FuXi and HRES. Moreover, when evaluating tropical cyclone (TC) forecasts based on International Best Track Archive for Climate Stewardship (IBTrACS) dataset, both FuXi and FuXi-Extreme shows superior performance in TC track forecasts compared to HRES, but they show inferior performance in TC intensity forecasts in comparison to HRES.

Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification

  • paper_url: http://arxiv.org/abs/2310.16293
  • repo_url: None
  • paper_authors: Mohammad S. Majdi, Jeffrey J. Rodriguez
  • for: Proposes a label aggregation method for crowdsourced and ensemble learning classification tasks that improves performance and computational efficiency across different numbers of annotators and a variety of datasets.
  • methods: Uses each annotator's consistency with a trained classifier to determine a reliability score, and leverages predicted probabilities so trained classifiers can be reused on future data, eliminating the recurrent simulation processes of existing methods; also introduces a variation of two existing confidence-score measurement techniques, evaluated with Expected Calibration Error (ECE) and Brier score loss.
  • results: Across ten datasets and against ten existing techniques (Tao, Sheng, KOS, MACE, MajorityVote, MMSR, Wawa, Zero-Based Skill, GLAD, and Dawid Skene), Crowd-Certain delivers higher average accuracy, F1 scores, and AUC rates in nearly all scenarios, and achieves higher Brier scores and lower ECE on the majority of datasets, suggesting better-calibrated results.
    Abstract Crowdsourcing systems have been used to accumulate massive amounts of labeled data for applications such as computer vision and natural language processing. However, because crowdsourced labeling is inherently dynamic and uncertain, developing a technique that can work in most situations is extremely challenging. In this paper, we introduce Crowd-Certain, a novel approach for label aggregation in crowdsourced and ensemble learning classification tasks that offers improved performance and computational efficiency for different numbers of annotators and a variety of datasets. The proposed method uses the consistency of the annotators versus a trained classifier to determine a reliability score for each annotator. Furthermore, Crowd-Certain leverages predicted probabilities, enabling the reuse of trained classifiers on future sample data, thereby eliminating the need for recurrent simulation processes inherent in existing methods. We extensively evaluated our approach against ten existing techniques across ten different datasets, each labeled by varying numbers of annotators. The findings demonstrate that Crowd-Certain outperforms the existing methods (Tao, Sheng, KOS, MACE, MajorityVote, MMSR, Wawa, Zero-Based Skill, GLAD, and Dawid Skene), in nearly all scenarios, delivering higher average accuracy, F1 scores, and AUC rates. Additionally, we introduce a variation of two existing confidence score measurement techniques. Finally we evaluate these two confidence score techniques using two evaluation metrics: Expected Calibration Error (ECE) and Brier Score Loss. Our results show that Crowd-Certain achieves higher Brier Score, and lower ECE across the majority of the examined datasets, suggesting better calibrated results.
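A minimal sketch of the aggregation idea, under stated assumptions: each annotator's reliability weight here is simply their agreement rate with a classifier trained on majority-vote labels, and the final label fuses the classifier's predicted probability with the reliability-weighted vote. Crowd-Certain's actual weighting scheme differs; `LogisticRegression` and the 50/50 fusion are stand-ins for illustration.

```python
# Illustrative label aggregation with annotator reliability weights (binary labels).
import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_labels(X, annotations):
    """X: (n_samples, n_features); annotations: (n_samples, n_annotators), values in {0, 1}."""
    majority = (annotations.mean(axis=1) >= 0.5).astype(int)    # bootstrap labels
    clf = LogisticRegression(max_iter=1000).fit(X, majority)    # classifier trained on crowd labels
    proba = clf.predict_proba(X)[:, 1]                          # reusable predicted probabilities
    # Reliability: how often each annotator agrees with the classifier's predictions.
    reliability = (annotations == clf.predict(X)[:, None]).mean(axis=0)
    weights = reliability / reliability.sum()
    weighted_vote = annotations @ weights                       # reliability-weighted vote in [0, 1]
    score = 0.5 * proba + 0.5 * weighted_vote                   # simple fusion (assumption)
    return (score >= 0.5).astype(int), reliability
```

Because the classifier and reliability weights are retained, new samples can be aggregated without re-running any simulation over annotators.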

Removing Dust from CMB Observations with Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.16285
  • repo_url: None
  • paper_authors: David Heurtel-Depeiges, Blakesley Burkhart, Ruben Ohana, Bruno Régaldo-Saint Blancard
  • for: Investigates diffusion-based modeling of the Galactic dust foreground and its benefits for component separation in CMB observations.
  • methods: Trains diffusion models on examples of dust emission maps so that, under the assumption of a Gaussian CMB with known cosmology (covariance matrix), their sampling process coincides with posterior sampling for component separation (an illustrative guided-sampling sketch follows this entry).
  • results: Diffusion-based component separation recovers common summary statistics of the components (power spectrum, Minkowski functionals) well, and a model conditioned on the CMB cosmology outperforms models trained with a single cosmology.
    Abstract In cosmology, the quest for primordial $B$-modes in cosmic microwave background (CMB) observations has highlighted the critical need for a refined model of the Galactic dust foreground. We investigate diffusion-based modeling of the dust foreground and its interest for component separation. Under the assumption of a Gaussian CMB with known cosmology (or covariance matrix), we show that diffusion models can be trained on examples of dust emission maps such that their sampling process directly coincides with posterior sampling in the context of component separation. We illustrate this on simulated mixtures of dust emission and CMB. We show that common summary statistics (power spectrum, Minkowski functionals) of the components are well recovered by this process. We also introduce a model conditioned by the CMB cosmology that outperforms models trained using a single cosmology on component separation. Such a model will be used in future work for diffusion-based cosmological inference.
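The central point is that, with a Gaussian CMB of known covariance, sampling a diffusion model of the dust can coincide with posterior sampling for component separation. The sketch below illustrates the general idea with a generic likelihood-guided reverse-diffusion loop rather than the authors' exact construction; `dust_denoiser`, the linear schedule, the per-pixel CMB variance `cmb_var`, and the `guidance` weight are all assumptions.

```python
# Illustrative likelihood-guided reverse diffusion for y = dust + CMB,
# with the CMB treated as Gaussian noise of known per-pixel variance.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def separate_dust(dust_denoiser, y, cmb_var, guidance=1.0):
    """Draw an approximate posterior sample of the dust map given the mixture y."""
    x = torch.randn_like(y)
    for t in reversed(range(T)):
        x = x.detach().requires_grad_(True)
        eps = dust_denoiser(x, t)                               # noise prediction under the dust prior
        # Estimate of the clean dust map from the current noisy state.
        x0_hat = (x - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
        # Gaussian CMB likelihood: the residual y - dust is N(0, cmb_var) per pixel.
        log_lik = -0.5 * ((y - x0_hat) ** 2 / cmb_var).sum()
        grad = torch.autograd.grad(log_lik, x)[0]
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps).detach() / torch.sqrt(alphas[t])
        x = mean + guidance * grad
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.detach()
```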

Improvement in Alzheimer’s Disease MRI Images Analysis by Convolutional Neural Networks Via Topological Optimization

  • paper_url: http://arxiv.org/abs/2310.16857
  • repo_url: None
  • paper_authors: Peiwen Tan
  • for: Improving the precision of Alzheimer's Disease classification from MRI images.
  • methods: Applies Fourier topological optimization to refine MRI images, prioritizing boundary enhancement, contrast and brightness adjustment, and overall image clarity before CNN classification (a hedged enhancement sketch follows this entry).
  • results: On the optimized images, the CNN architectures VGG16, ResNet50, InceptionV3, and Xception all show markedly improved classification performance.
    Abstract This research underscores the efficacy of Fourier topological optimization in refining MRI imagery, thereby bolstering the classification precision of Alzheimer's Disease through convolutional neural networks. Recognizing that MRI scans are indispensable for neurological assessments, but frequently grapple with issues like blurriness and contrast irregularities, the deployment of Fourier topological optimization offered enhanced delineation of brain structures, ameliorated noise, and superior contrast. The applied techniques prioritized boundary enhancement, contrast and brightness adjustments, and overall image lucidity. Employing CNN architectures VGG16, ResNet50, InceptionV3, and Xception, the post-optimization analysis revealed a marked elevation in performance. Conclusively, the amalgamation of Fourier topological optimization with CNNs delineates a promising trajectory for the nuanced classification of Alzheimer's Disease, portending a transformative impact on its diagnostic paradigms.
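The abstract does not spell out the exact filter, so the sketch below is only an assumed illustration of Fourier-domain refinement: boost high spatial frequencies to sharpen anatomical boundaries, then stretch contrast, before passing the image to a CNN such as VGG16 or ResNet50. The `boost` and `cutoff` parameters are hypothetical.

```python
# Assumed Fourier-domain sharpening + contrast stretch for a 2-D MRI slice in [0, 1].
import numpy as np

def fourier_enhance(img, boost=0.6, cutoff=0.1):
    """Return an edge-enhanced, contrast-stretched copy of `img`."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)        # normalized spatial frequency
    detail = np.real(np.fft.ifft2(np.fft.ifftshift(F * (radius > cutoff))))
    sharpened = img + boost * detail                        # unsharp-masking-style boundary boost
    lo, hi = np.percentile(sharpened, [1, 99])              # simple contrast/brightness adjustment
    return np.clip((sharpened - lo) / (hi - lo + 1e-8), 0.0, 1.0)
```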

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

  • paper_url: http://arxiv.org/abs/2310.16252
  • repo_url: https://github.com/aistats2024-noisy-psne/midsearch
  • paper_authors: Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff
  • for: Studies the sample complexity of identifying a pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noisy observations.
  • methods: Considers a stochastic model in which the learner can sample any entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$, where $\eta$ is zero-mean 1-sub-Gaussian noise; the goal is to identify the PSNE of $A$, whenever it exists, with high probability using as few samples as possible (a toy sketch of this sampling model follows this entry).
  • results: Presents an instance-dependent lower bound that depends only on the entries of the row and column containing the PSNE, together with a near-optimal algorithm whose sample complexity matches the lower bound up to log factors. The problem generalizes pure exploration in stochastic multi-armed bandits and dueling bandits, and the results match the optimal bounds in both settings up to log factors.
    Abstract We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where $\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.
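A toy illustration of the sampling model, not the paper's near-optimal algorithm: each query returns one noisy observation of a matrix entry, and a candidate $(i, j)$ is checked as a PSNE (assuming the row player maximizes and the column player minimizes) by averaging repeated queries over its row and column. The fixed per-entry budget is an assumption made for simplicity; the paper's algorithm adapts its sampling to the instance.

```python
# Naive PSNE verification under noisy entry queries (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def query(A, i, j):
    return A[i, j] + rng.normal(0.0, 1.0)        # zero-mean, 1-sub-Gaussian observation noise

def is_psne(A, i, j, samples_per_entry=2000):
    """Check that (i, j) is a column max (row player's best response) and a row min."""
    n, m = A.shape
    col_means = [np.mean([query(A, r, j) for _ in range(samples_per_entry)]) for r in range(n)]
    row_means = [np.mean([query(A, i, c) for _ in range(samples_per_entry)]) for c in range(m)]
    return int(np.argmax(col_means)) == i and int(np.argmin(row_means)) == j

A = np.array([[0.2, -0.5, 0.1],
              [0.6,  0.0, 0.4]])                 # (1, 1) is a saddle point / PSNE
print(is_psne(A, 1, 1))                          # True with high probability
```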