cs.LG - 2023-09-11

Reaction coordinate flows for model reduction of molecular kinetics

  • paper_url: http://arxiv.org/abs/2309.05878
  • repo_url: None
  • paper_authors: Hao Wu, Frank Noé
  • for: This work introduces a flow-based machine learning approach, reaction coordinate (RC) flow, for discovering low-dimensional kinetic models of molecular systems.
  • methods: A normalizing flow designs the coordinate transformation and a Brownian dynamics model approximates the kinetics of the RC; all model parameters can be estimated in a data-driven manner (a minimal sketch of these two ingredients follows this entry).
  • results: Numerical experiments show that the proposed method efficiently extracts low-dimensional, interpretable representations of the state space from simulation data.
    Abstract In this work, we introduce a flow based machine learning approach, called reaction coordinate (RC) flow, for discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
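
The two ingredients named in the abstract, an invertible normalizing flow for the coordinate transformation and overdamped (Brownian) dynamics on the resulting reaction coordinate, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the affine coupling flow, the toy quadratic potential, and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible coupling layer: transforms x2 conditioned on x1."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * (dim - self.d)))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)

    def inverse(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        s, t = self.net(z1).chunk(2, dim=-1)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=-1)

def brownian_step(rc, potential, dt=1e-3, kT=1.0):
    """One Euler-Maruyama step of overdamped Langevin (Brownian) dynamics on the RC."""
    rc = rc.detach().requires_grad_(True)
    force = -torch.autograd.grad(potential(rc).sum(), rc)[0]
    return rc + force * dt + torch.sqrt(torch.tensor(2.0 * kT * dt)) * torch.randn_like(rc)

# Toy usage: map 4-D configurations to a space whose first coordinate serves as the RC.
flow = AffineCoupling(dim=4)
x = torch.randn(8, 4)                       # full-state samples
z = flow(x)                                 # invertible transform
rc = z[:, :1]                               # low-dimensional reaction coordinate
rc_next = brownian_step(rc, lambda r: 0.5 * (r ** 2).sum(dim=-1))  # assumed toy potential
x_back = flow.inverse(z)                    # invertibility lets us return to full space
```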

Force-directed graph embedding with hops distance

  • paper_url: http://arxiv.org/abs/2309.05865
  • repo_url: None
  • paper_authors: Hamidreza Lotfalizadeh, Mohammad Al Hasan
  • for: This paper proposes a force-directed method for embedding graph nodes in a low-dimensional space for downstream tasks such as node classification.
  • methods: Customized attractive and repulsive forces between node pairs, defined with respect to their hop distance, are plugged into Newton's second law (via the steady acceleration kinetic formula) to update node positions while preserving graph topology and structural features (see the sketch after this entry).
  • results: Evaluated on several graph analysis tasks, the method achieves performance competitive with state-of-the-art unsupervised embedding techniques.
    Abstract Graph embedding has become an increasingly important technique for analyzing graph-structured data. By representing nodes in a graph as vectors in a low-dimensional space, graph embedding enables efficient graph processing and analysis tasks like node classification, link prediction, and visualization. In this paper, we propose a novel force-directed graph embedding method that utilizes the steady acceleration kinetic formula to embed nodes in a way that preserves graph topology and structural features. Our method simulates a set of customized attractive and repulsive forces between all node pairs with respect to their hop distance. These forces are then used in Newton's second law to obtain the acceleration of each node. The method is intuitive, parallelizable, and highly scalable. We evaluate our method on several graph analysis tasks and show that it achieves competitive performance compared to state-of-the-art unsupervised embedding techniques.
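
A bare-bones version of the force-directed update described above, using made-up force laws (attraction scaled by inverse hop distance, inverse-square repulsion) and a toy 5-node graph; the paper's actual force definitions and hyperparameters differ.

```python
import numpy as np
from collections import deque

def hop_distances(adj):
    """All-pairs hop distance by BFS on an adjacency-list graph."""
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s], q = 0, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def force_directed_embedding(adj, dim=2, steps=200, dt=0.05):
    n = len(adj)
    hops = hop_distances(adj)
    pos = np.random.randn(n, dim)
    vel = np.zeros((n, dim))
    for _ in range(steps):
        force = np.zeros((n, dim))
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                delta = pos[j] - pos[i]
                r = np.linalg.norm(delta) + 1e-9
                # assumed force law: attraction weakens with hop distance,
                # repulsion decays with Euclidean distance
                attract = (1.0 / hops[i, j]) * delta
                repel = -(1.0 / r ** 2) * (delta / r)
                force[i] += attract + repel
        vel = 0.9 * vel + dt * force      # Newton's 2nd law with unit mass plus damping
        pos += dt * vel
    return pos

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}  # toy graph
emb = force_directed_embedding(adj)
print(emb.shape)  # (5, 2)
```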

Energy Preservation and Stability of Random Filterbanks

  • paper_url: http://arxiv.org/abs/2309.05855
  • repo_url: https://github.com/danedane-haider/random-filterbanks
  • paper_authors: Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs
  • for: This paper asks why waveform-based deep learning, in particular convnet-based filterbank design, is so hard.
  • methods: The statistical properties of simple convnets used for filterbank design are analyzed from the mathematical perspective of random convolutional operators.
  • results: FIR filterbanks with random Gaussian weights are ill-conditioned for large filters and locally periodic input signals, both typical in audio applications; expected energy preservation of a random filterbank is shown to be insufficient for numerical stability, and theoretical bounds on its expected frame bounds are derived (an empirical check is sketched after this entry).
    Abstract What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. This is all the more surprising because these baselines are linear time-invariant systems: as such, their transfer functions could be accurately represented by a convnet with a large receptive field. In this article, we elaborate on the statistical properties of simple convnets from the mathematical perspective of random convolutional operators. We find that FIR filterbanks with random Gaussian weights are ill-conditioned for large filters and locally periodic input signals, which both are typical in audio signal processing applications. Furthermore, we observe that expected energy preservation of a random filterbank is not sufficient for numerical stability and derive theoretical bounds for its expected frame bounds.
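
As a quick empirical illustration of the stability issue, the sketch below estimates the frame bounds of a random Gaussian FIR filterbank under circular convolution, where they equal the minimum and maximum over frequencies of the summed squared filter responses. The filter length, number of filters, and normalization are arbitrary choices for the sketch, not the paper's setup.

```python
import numpy as np

def random_filterbank_frame_bounds(num_filters=40, filter_len=1024, sig_len=4096, seed=0):
    """Frame bounds A, B of a random Gaussian FIR filterbank under circular convolution.

    With circular convolution, sum_k ||x * h_k||^2 = sum_w |X(w)|^2 * S(w) / sig_len,
    where S(w) = sum_k |H_k(w)|^2, so A = min_w S(w) and B = max_w S(w).
    """
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian taps, scaled so the expected total filterbank energy is 1
    h = rng.standard_normal((num_filters, filter_len)) / np.sqrt(num_filters * filter_len)
    H = np.fft.rfft(h, n=sig_len, axis=1)        # zero-padded frequency responses
    S = np.sum(np.abs(H) ** 2, axis=0)           # Littlewood-Paley-type sum
    A, B = S.min(), S.max()
    return A, B, B / A                           # lower/upper frame bound, condition number

for L in (16, 128, 1024):
    A, B, kappa = random_filterbank_frame_bounds(filter_len=L)
    print(f"filter_len={L:5d}  A={A:.3e}  B={B:.3e}  condition={kappa:.1f}")
```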

ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

  • paper_url: http://arxiv.org/abs/2309.05853
  • repo_url: None
  • paper_authors: Gregory W. Kyro, Anton Morgunov, Rafael I. Brent, Victor S. Batista
  • for: The paper develops a novel and efficient semi-supervised active learning methodology for fine-tuning generative artificial intelligence models, specifically in the context of targeted molecular generation.
  • methods: A GPT-based molecular generator is fine-tuned by strategically operating within a chemical space proxy built from a constructed representation of the sample space, maximizing attractive interactions between generated molecules and a protein target; the approach does not require individually evaluating all data points used for fine-tuning, which enables the incorporation of computationally expensive metrics (a schematic selection loop is sketched after this entry).
  • results: The authors demonstrate fine-tuning of a GPT-based molecular generator with respect to an attractive interaction-based scoring function, resulting in maximized attractive interactions between the generated molecules and a protein target.
    Abstract The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. It is therefore of tremendous interest to develop methodologies that enhance the abilities and applicability of these powerful tools. In this work, we present a novel and efficient semi-supervised active learning methodology that allows for the fine-tuning of a generative model with respect to an objective function by strategically operating within a constructed representation of the sample space. In the context of targeted molecular generation, we demonstrate the ability to fine-tune a GPT-based molecular generator with respect to an attractive interaction-based scoring function by strategically operating within a chemical space proxy, thereby maximizing attractive interactions between the generated molecules and a protein target. Importantly, our approach does not require the individual evaluation of all data points that are used for fine-tuning, enabling the incorporation of computationally expensive metrics. We are hopeful that the inherent generality of this methodology ensures that it will remain applicable as this exciting field evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
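
The core idea, scoring only representatives of a chemical-space proxy rather than every generated molecule, can be illustrated with a generic active-learning selection loop. Everything below (random feature vectors standing in for molecular descriptors, KMeans as the space proxy, the toy scoring function, and the helper names) is a hypothetical sketch, not the ChemSpaceAL package API.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def generate_candidates(n=500, dim=32):
    """Stand-in for sampling molecules from a generative model (random descriptors here)."""
    return rng.standard_normal((n, dim))

def expensive_score(x):
    """Stand-in for a costly interaction-based scoring function (toy quadratic)."""
    return -np.sum((x - 1.0) ** 2, axis=-1)

def select_for_finetuning(candidates, n_clusters=20, per_cluster=5):
    """Cluster the sample-space proxy, score one representative per cluster,
    and keep all members of the best-scoring clusters for fine-tuning."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(candidates)
    rep_scores = expensive_score(km.cluster_centers_)        # only n_clusters evaluations
    best_clusters = np.argsort(rep_scores)[-per_cluster:]
    mask = np.isin(km.labels_, best_clusters)
    return candidates[mask]

pool = generate_candidates()
finetune_set = select_for_finetuning(pool)
print(f"selected {len(finetune_set)} of {len(pool)} candidates with only 20 score calls")
```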

  • paper_url: http://arxiv.org/abs/2309.05843
  • repo_url: None
  • paper_authors: Louis Blankemeier, Sebastien Baur, Wei-Hung Weng, Jake Garrison, Yossi Matias, Shruthi Prabhakara, Diego Ardila, Zaid Nabulsi
  • for: This paper proposes a self-supervised framework for contrastive learning of health-related acoustic signals such as cough and breathing sounds.
  • methods: It uses the SimCLR framework with a Slowfast NFNet backbone and conducts an in-depth analysis of audio augmentations to optimize the Slowfast NFNet audio encoder for health acoustic tasks (a generic contrastive-loss sketch follows this entry).
  • results: An appropriate augmentation strategy improves performance across a diverse set of health acoustic tasks, and combined augmentations produce synergistic effects that exceed the benefits of each applied individually.
    Abstract Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNet for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
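
A minimal sketch of the contrastive setup described in the abstract: two random waveform augmentations per clip and the standard SimCLR (NT-Xent) loss. The tiny linear encoder, the two augmentations shown, and the temperature are placeholders; the paper uses a Slowfast NFNet and a curated augmentation search.

```python
import torch
import torch.nn.functional as F

def augment(wave):
    """Two simple waveform augmentations: additive noise and a random circular time shift."""
    noisy = wave + 0.01 * torch.randn_like(wave)
    shift = int(torch.randint(0, wave.shape[-1], (1,)))
    return torch.roll(noisy, shifts=shift, dims=-1)

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR NT-Xent loss over a batch of paired embeddings."""
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    sim = z @ z.t() / temperature                               # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * B, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Toy usage with a linear "encoder" on raw waveforms (placeholder for a Slowfast NFNet).
encoder = torch.nn.Linear(16000, 128)
batch = torch.randn(8, 16000)                                   # 8 one-second clips at 16 kHz
z1, z2 = encoder(augment(batch)), encoder(augment(batch))
loss = nt_xent(z1, z2)
loss.backward()
```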

The Safety Filter: A Unified View of Safety-Critical Control in Autonomous Systems

  • paper_url: http://arxiv.org/abs/2309.05837
  • repo_url: None
  • paper_authors: Kai-Chieh Hsu, Haimin Hu, Jaime Fernández Fisac
  • for: Ensuring safe operation of increasingly capable autonomous robots as they enter new deployment domains.
  • methods: The article reviews and compares existing safety filter approaches and proposes a unified technical framework to understand, compare, and combine them (a toy least-restrictive filter is sketched after this entry).
  • results: The unified view exposes a shared modular structure across seemingly disparate safety filter classes and suggests directions for more scalable synthesis, robust monitoring, and efficient intervention.
    Abstract Recent years have seen significant progress in the realm of robot autonomy, accompanied by the expanding reach of robotic technologies. However, the emergence of new deployment domains brings unprecedented challenges in ensuring safe operation of these systems, which remains as crucial as ever. While traditional model-based safe control methods struggle with generalizability and scalability, emerging data-driven approaches tend to lack well-understood guarantees, which can result in unpredictable catastrophic failures. Successful deployment of the next generation of autonomous robots will require integrating the strengths of both paradigms. This article provides a review of safety filter approaches, highlighting important connections between existing techniques and proposing a unified technical framework to understand, compare, and combine them. The new unified view exposes a shared modular structure across a range of seemingly disparate safety filter classes and naturally suggests directions for future progress towards more scalable synthesis, robust monitoring, and efficient intervention.
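
To make the "safety filter" pattern concrete, here is a toy least-restrictive filter for a 1-D double integrator: the nominal action passes through unless a simple monitor predicts a constraint violation, in which case a braking fallback overrides it. The dynamics, monitor, and fallback are illustrative assumptions, not a method from the article.

```python
def safety_filter(state, nominal_action, dt=0.1, x_max=10.0, a_max=2.0):
    """Least-restrictive safety filter for a 1-D double integrator x'' = a.

    Passes the nominal action through unless the monitor predicts that, even
    under maximal braking afterwards, the position bound x <= x_max is violated.
    """
    x, v = state

    def violates(action):
        # one step under the candidate action, then stop with maximal braking
        x1 = x + v * dt + 0.5 * action * dt ** 2
        v1 = v + action * dt
        stop_dist = v1 ** 2 / (2.0 * a_max) if v1 > 0 else 0.0
        return x1 + stop_dist > x_max

    if not violates(nominal_action):
        return nominal_action, False          # safe: no intervention
    return -a_max, True                       # unsafe: fallback (maximal braking)

# Usage: a nominal controller that always accelerates toward the boundary.
state, log = (0.0, 0.0), []
for _ in range(60):
    a, intervened = safety_filter(state, nominal_action=2.0)
    x, v = state
    state = (x + v * 0.1 + 0.5 * a * 0.01, v + a * 0.1)
    log.append(intervened)
print(f"position stayed at {state[0]:.2f} <= 10.0, filter intervened {sum(log)} times")
```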

Ensemble-based modeling abstractions for modern self-optimizing systems

  • paper_url: http://arxiv.org/abs/2309.05823
  • repo_url: https://github.com/smartarch/ml-deeco-security-isola
  • paper_authors: Michal Töpfer, Milad Abdullah, Tomáš Bureš, Petr Hnětynka, Martin Kruliš
  • for: This paper extends the ensemble-based component model DEECo with the ability to use machine learning and optimization heuristics when establishing and reconfiguring autonomic component ensembles.
  • methods: Machine learning and optimization heuristics for forming and reconfiguring component ensembles are captured at the model level.
  • results: The approach is illustrated on an access-control-related problem in an Industry 4.0 setting, and the authors argue that learning and optimizing behavior at runtime is a key feature of modern smart systems facing environmental uncertainty.
    Abstract In this paper, we extend our ensemble-based component model DEECo with the capability to use machine-learning and optimization heuristics in establishing and reconfiguration of autonomic component ensembles. We show how to capture these concepts on the model level and give an example of how such a model can be beneficially used for modeling access-control related problem in the Industry 4.0 settings. We argue that incorporating machine-learning and optimization heuristics is a key feature for modern smart systems which are to learn over the time and optimize their behavior at runtime to deal with uncertainty in their environment.

Interpretable learning of effective dynamics for multiscale systems

  • paper_url: http://arxiv.org/abs/2309.05812
  • repo_url: None
  • paper_authors: Emmanuel Menier, Sebastian Kaltenbach, Mouadh Yagoubi, Marc Schoenauer, Petros Koumoutsakos
  • for: This paper proposes an interpretable framework for learning the effective dynamics of high-dimensional multiscale systems, improving over black-box model reduction.
  • methods: The framework combines deep recurrent networks with structure motivated by Mori-Zwanzig and Koopman operator theory (a minimal Koopman-style latent propagator is sketched after this entry).
  • results: Simulations on three benchmark multiscale systems show that iLED generates accurate predictions while yielding interpretable dynamics.
    Abstract The modeling and simulation of high-dimensional multiscale systems is a critical challenge across all areas of science and engineering. It is broadly believed that even with today's computer advances resolving all spatiotemporal scales described by the governing equations remains a remote target. This realization has prompted intense efforts to develop model order reduction techniques. In recent years, techniques based on deep recurrent neural networks have produced promising results for the modeling and simulation of complex spatiotemporal systems and offer large flexibility in model development as they can incorporate experimental and computational data. However, neural networks lack interpretability, which limits their utility and generalizability across complex systems. Here we propose a novel framework of Interpretable Learning Effective Dynamics (iLED) that offers comparable accuracy to state-of-the-art recurrent neural network-based approaches while providing the added benefit of interpretability. The iLED framework is motivated by Mori-Zwanzig and Koopman operator theory, which justifies the choice of the specific architecture. We demonstrate the effectiveness of the proposed framework in simulations of three benchmark multiscale systems. Our results show that the iLED framework can generate accurate predictions and obtain interpretable dynamics, making it a promising approach for solving high-dimensional multiscale systems.
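
The Koopman-operator ingredient mentioned above can be illustrated with the simplest data-driven surrogate: fit a linear propagator on (possibly lifted) observables by least squares, as in dynamic mode decomposition. This is a generic sketch of that idea, not the iLED architecture; the toy linear system and the plain least-squares fit are assumptions.

```python
import numpy as np

def fit_koopman_operator(snapshots):
    """Least-squares fit of a linear propagator K such that z_{t+1} ~ K z_t (DMD-style)."""
    Z0, Z1 = snapshots[:, :-1], snapshots[:, 1:]
    return Z1 @ np.linalg.pinv(Z0)

def rollout(K, z0, steps):
    traj = [z0]
    for _ in range(steps):
        traj.append(K @ traj[-1])
    return np.stack(traj, axis=1)

# Toy data: a damped rotation observed directly (stand-in for learned observables).
rng = np.random.default_rng(0)
theta = 0.1
A_true = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
z = np.empty((2, 200))
z[:, 0] = rng.standard_normal(2)
for t in range(199):
    z[:, t + 1] = A_true @ z[:, t]

K = fit_koopman_operator(z)
print("eigenvalues of learned operator:", np.linalg.eigvals(K))   # ~ 0.98 exp(+-i 0.1)
pred = rollout(K, z[:, 0], steps=199)
print("max rollout error:", np.abs(pred - z).max())
```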

Predicting the Radiation Field of Molecular Clouds using Denoising Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2309.05811
  • repo_url: None
  • paper_authors: Duo Xu, Stella Offner, Robert Gutermuth, Michael Grudic, David Guszejnov, Philip Hopkins
  • for: Quantifying the impact of radiation feedback in star formation by predicting the interstellar radiation field (ISRF) strength from dust emission observations.
  • methods: Denoising diffusion probabilistic models (DDPMs) are trained on synthetic three-band dust emission maps (4.5, 24, and 250 micron) generated from STARFORGE magnetohydrodynamic simulations and matched to the Monoceros R2 (MonR2) star-forming region.
  • results: The dispersion between predictions and true ISRF values is within a factor of 0.1 on the test set; on out-of-distribution simulations the model still constrains the relative intensity to within a factor of 2, and applied to MonR2 it recovers the expected correspondence between intense ISRF, bright sources, and high dust emission.
    Abstract Accurately quantifying the impact of radiation feedback in star formation is challenging. To address this complex problem, we employ deep learning techniques, denoising diffusion probabilistic models (DDPMs), to predict the interstellar radiation field (ISRF) strength based on three-band dust emission at 4.5 \um, 24 \um, and 250 \um. We adopt magnetohydrodynamic simulations from the STARFORGE (STAR FORmation in Gaseous Environments) project that model star formation and giant molecular cloud (GMC) evolution. We generate synthetic dust emission maps matching observed spectral energy distributions in the Monoceros R2 (MonR2) GMC. We train DDPMs to estimate the ISRF using synthetic three-band dust emission. The dispersion between the predictions and true values is within a factor of 0.1 for the test set. We extended our assessment of the diffusion model to include new simulations with varying physical parameters. While there is a consistent offset observed in these out-of-distribution simulations, the model effectively constrains the relative intensity to within a factor of 2. Meanwhile, our analysis reveals weak correlation between the ISRF solely derived from dust temperature and the actual ISRF. We apply our trained model to predict the ISRF in MonR2, revealing a correspondence between intense ISRF, bright sources, and high dust emission, confirming the model's ability to capture ISRF variations. Our model robustly predicts radiation feedback distribution, even in complex, poorly constrained ISRF environments like those influenced by nearby star clusters. However, precise ISRF predictions require an accurate training dataset mirroring the target molecular cloud's unique physical conditions.

Online ML Self-adaptation in Face of Traps

  • paper_url: http://arxiv.org/abs/2309.05805
  • repo_url: None
  • paper_authors: Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka, Martin Kruliš, Danny Weyns
  • for: This paper examines the pitfalls ("traps") encountered when applying online machine learning for self-adaptation, based on experience with a smart farming scenario.
  • methods: The authors analyze traps related to the specification and online training of ML-based estimators, their impact on self-adaptation, and the approach used to evaluate the estimators.
  • results: The resulting overview yields a list of lessons learned that can guide other researchers and practitioners applying online ML for self-adaptation.
    Abstract Online machine learning (ML) is often used in self-adaptive systems to strengthen the adaptation mechanism and improve the system utility. Despite such benefits, applying online ML for self-adaptation can be challenging, and not many papers report its limitations. Recently, we experimented with applying online ML for self-adaptation of a smart farming scenario and we had faced several unexpected difficulties -- traps -- that, to our knowledge, are not discussed enough in the community. In this paper, we report our experience with these traps. Specifically, we discuss several traps that relate to the specification and online training of the ML-based estimators, their impact on self-adaptation, and the approach used to evaluate the estimators. Our overview of these traps provides a list of lessons learned, which can serve as guidance for other researchers and practitioners when applying online ML for self-adaptation.

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

  • paper_url: http://arxiv.org/abs/2309.05803
  • repo_url: None
  • paper_authors: Sumeet Singh, Stephen Tu, Vikas Sindhwani
  • for: This work revisits energy-based models (EBMs) as a policy representation for robot learning.
  • methods: A practical training objective and algorithm combine (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training; the objective is shown to be asymptotically consistent with a quantified limiting variance (a generic ranking-NCE-style loss is sketched after this entry).
  • results: Energy-based policies trained this way compete with, and even outperform, diffusion models and other state-of-the-art approaches on several challenging multimodal benchmarks, including obstacle-avoidance path planning and contact-rich block pushing.
    Abstract A crucial design decision for any robot learning pipeline is the choice of policy representation: what type of model should be used to generate the next set of robot actions? Owing to the inherent multi-modal nature of many robotic tasks, combined with the recent successes in generative modeling, researchers have turned to state-of-the-art probabilistic models such as diffusion models for policy representation. In this work, we revisit the choice of energy-based models (EBM) as a policy class. We show that the prevailing folklore -- that energy models in high dimensional continuous spaces are impractical to train -- is false. We develop a practical training objective and algorithm for energy models which combines several key ingredients: (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training. We prove that our proposed objective function is asymptotically consistent and quantify its limiting variance. On the other hand, we show that the Implicit Behavior Cloning (IBC) objective is actually biased even at the population level, providing a mathematical explanation for the poor performance of IBC trained energy policies in several independent follow-up works. We further extend our algorithm to learn a continuous stochastic process that bridges noise and data, modeling this process with a family of EBMs indexed by scale variable. In doing so, we demonstrate that the core idea behind recent progress in generative modeling is actually compatible with EBMs. Altogether, our proposed training algorithms enable us to train energy-based models as policies which compete with -- and even outperform -- diffusion models and other state-of-the-art approaches in several challenging multi-modal benchmarks: obstacle avoidance path planning and contact-rich block pushing.
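
A hedged sketch of what a ranking-NCE-style objective for an energy-based policy can look like: the demonstrated action is classified against negatives drawn from a learnable sampler, with the usual NCE importance correction by the sampler's log-density. The toy MLP energy, the Gaussian sampler, and the exact form of the correction are assumptions and may differ in detail from the paper's R-NCE objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnergyNet(nn.Module):
    """E(s, a): scalar energy of an action given a state (toy MLP)."""
    def __init__(self, state_dim=4, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def ranking_nce_loss(energy, state, pos_action, neg_sampler, num_neg=16):
    """Classify the positive action against negatives from a (learnable) sampler.

    Logits are -E(s, a) - log q(a | s), the usual importance correction in NCE.
    """
    B = state.shape[0]
    neg_actions, neg_logq = neg_sampler.sample(state, num_neg)               # (B, K, A), (B, K)
    pos_logq = neg_sampler.log_prob(state, pos_action)                       # (B,)
    all_actions = torch.cat([pos_action.unsqueeze(1), neg_actions], dim=1)   # (B, K+1, A)
    all_logq = torch.cat([pos_logq.unsqueeze(1), neg_logq], dim=1)           # (B, K+1)
    state_rep = state.unsqueeze(1).expand(-1, num_neg + 1, -1)
    logits = -energy(state_rep, all_actions) - all_logq                      # (B, K+1)
    return F.cross_entropy(logits, torch.zeros(B, dtype=torch.long))         # positive is index 0

class GaussianSampler(nn.Module):
    """Placeholder learnable negative sampler: state-independent diagonal Gaussian."""
    def __init__(self, action_dim=2):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self):
        return torch.distributions.Normal(self.mu, self.log_std.exp())

    def sample(self, state, k):
        a = self.dist().rsample((state.shape[0], k))                         # (B, K, A)
        return a, self.dist().log_prob(a).sum(-1)

    def log_prob(self, state, action):
        return self.dist().log_prob(action).sum(-1)

energy, sampler = EnergyNet(), GaussianSampler()
state, demo_action = torch.randn(8, 4), torch.randn(8, 2)
loss = ranking_nce_loss(energy, state, demo_action, sampler)
loss.backward()
```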

Enhancing Hyperedge Prediction with Context-Aware Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.05798
  • repo_url: https://github.com/yy-ko/cash
  • paper_authors: Yunyong Ko, Hanghang Tong, Sang-Wook Kim
  • for: This paper addresses hyperedge prediction, i.e., predicting future or unobserved hyperedges (group-wise relations) in a hypergraph.
  • methods: The proposed framework, CASH, combines context-aware node aggregation with self-supervised contrastive learning, including a hyperedge-aware augmentation with dual node-level and group-level contrasts, to improve hypergraph representations and prediction accuracy.
  • results: On six real-world hypergraphs, CASH consistently outperforms all competing methods in hyperedge prediction accuracy, and each proposed strategy contributes to the improvement.
    Abstract Hypergraphs can naturally model group-wise relations (e.g., a group of users who co-purchase an item) as hyperedges. Hyperedge prediction is to predict future or unobserved hyperedges, which is a fundamental task in many real-world applications (e.g., group recommendation). Despite the recent breakthrough of hyperedge prediction methods, the following challenges have been rarely studied: (C1) How to aggregate the nodes in each hyperedge candidate for accurate hyperedge prediction? and (C2) How to mitigate the inherent data sparsity problem in hyperedge prediction? To tackle both challenges together, in this paper, we propose a novel hyperedge prediction framework (CASH) that employs (1) context-aware node aggregation to precisely capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2). Furthermore, as for (C2), we propose a hyperedge-aware augmentation method to fully exploit the latent semantics behind the original hypergraph and consider both node-level and group-level contrasts (i.e., dual contrasts) for better node and hyperedge representations. Extensive experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of the accuracy in hyperedge prediction and each of the proposed strategies is effective in improving the model accuracy of CASH. For the detailed information of CASH, we provide the code and datasets at: https://github.com/yy-ko/cash.

On the Fine-Grained Hardness of Inverting Generative Models

  • paper_url: http://arxiv.org/abs/2309.05795
  • repo_url: None
  • paper_authors: Feyza Duman Keles, Chinmay Hegde
  • for: This paper gives a fine-grained view of the computational hardness of generative model inversion: finding a size-$n$ latent vector whose generator output closely matches a given target.
  • methods: Hardness lower bounds for exact and approximate inversion are established via fine-grained reductions from $k$-SAT, the closest vector problem (CVP), Half-Clique, and Vertex-Cover, under the (strong) exponential time hypothesis.
  • results: Exact inversion requires $\Omega(2^n)$ time under SETH; approximate inversion in the $\ell_p$-norm requires $\Omega(2^n)$ time for positive odd $p$ under SETH and $2^{\Omega(n)}$ time for even $p$ under ETH.
    Abstract The objective of generative model inversion is to identify a size-$n$ latent vector that produces a generative model output that closely matches a given target. This operation is a core computational primitive in numerous modern applications involving computer vision and NLP. However, the problem is known to be computationally challenging and NP-hard in the worst case. This paper aims to provide a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this is a strengthening of known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vectors problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega (n)}$ via a reduction from Half-Clique and Vertex-Cover.

Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning

  • paper_url: http://arxiv.org/abs/2309.05777
  • repo_url: None
  • paper_authors: Yasunori Yamada, Kaoru Shinkawa, Masatomo Kobayashi, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai
  • for: Early detection of subtle deficits in cognitively relevant everyday functioning is important for neurodegenerative diseases, particularly Alzheimer's disease, yet current assessments rely on subjective ratings; speech may provide objective markers.
  • methods: A smartwatch application collected voice data from 54 older adults during cognitive tasks and daily conversation, together with a measure of everyday functioning, and machine-learning models were trained on the extracted acoustic features.
  • results: Models using acoustic features detected individuals with deficits in everyday functioning with up to 77.8% accuracy, higher than the 68.5% obtained with standard neuropsychological tests, and common acoustic features discriminated deficits robustly across both types of voice data.
    Abstract Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with cognition-relevant everyday functioning remains uninvestigated. In this study, we demonstrate the feasibility of using a smartwatch-based application to collect acoustic features as objective markers for detecting deficits in everyday functioning. We collected voice data during the performance of cognitive tasks and daily conversation, as possible application scenarios, from 54 older adults, along with a measure of everyday functioning. Machine learning models using acoustic features could detect individuals with deficits in everyday functioning with up to 77.8% accuracy, which was higher than the 68.5% accuracy with standard neuropsychological tests. We also identified common acoustic features for robustly discriminating deficits in everyday functioning across both types of voice data (cognitive tasks and daily conversation). Our results suggest that common acoustic features extracted from different types of voice data can be used as markers for deficits in everyday functioning.

The Effect of Intrinsic Dimension on Metric Learning under Compression

  • paper_url: http://arxiv.org/abs/2309.05751
  • repo_url: None
  • paper_authors: Efstratios Palias, Ata Kabán
  • for: This work studies metric learning, which seeks a suitable distance metric over the input space to improve distance-based learning algorithms, in high-dimensional settings.
  • methods: Instead of training a low-rank metric on the high-dimensional data, a full-rank metric is trained on a randomly compressed version of the data, and error bounds for distance-based metric learning under this random compression are derived.
  • results: The bounds do not depend on the ambient dimension, assume only i.i.d. data with bounded support, and automatically tighten when benign geometrical structures are present; experiments on synthetic and real data support the theory in high-dimensional settings.
    Abstract Metric learning aims at finding a suitable distance metric over the input space, to improve the performance of distance-based learning algorithms. In high-dimensional settings, metric learning can also play the role of dimensionality reduction, by imposing a low-rank restriction to the learnt metric. In this paper, instead of training a low-rank metric on high-dimensional data, we consider a randomly compressed version of the data, and train a full-rank metric there. We give theoretical guarantees on the error of distance-based metric learning, with respect to the random compression, which do not depend on the ambient dimension. Our bounds do not make any explicit assumptions, aside from i.i.d. data from a bounded support, and automatically tighten when benign geometrical structures are present. Experimental results on both synthetic and real data sets support our theoretical findings in high-dimensional settings.

CaloClouds II: Ultra-Fast Geometry-Independent Highly-Granular Calorimeter Simulation

  • paper_url: http://arxiv.org/abs/2309.05704
  • repo_url: https://github.com/flc-qu-hep/caloclouds-2
  • paper_authors: Erik Buhmann, Frank Gaede, Gregor Kasieczka, Anatolii Korol, William Korcari, Katja Krüger, Peter McKeown
  • for: Fast simulation of energy depositions in highly granular calorimeters is needed for future collider experiments with ever-increasing luminosities.
  • methods: Generative machine-learning (ML) models accelerate and augment the traditional simulation chain; CaloClouds II is a geometry-independent, continuous-time score-based diffusion model that generates calorimeter showers as point clouds, and it is further distilled into a consistency model for single-step sampling.
  • results: CaloClouds II reaches fidelity comparable to CaloClouds with 25 sampling steps at a $6\times$ speed-up over Geant4 on a single CPU ($5\times$ over CaloClouds); the consistency-distilled version samples accurately in a single step, giving a $46\times$ ($37\times$) speed-up.
    Abstract Fast simulation of the energy depositions in high-granular detectors is needed for future collider experiments with ever increasing luminosities. Generative machine learning (ML) models have been shown to speed up and augment the traditional simulation chain in physics analysis. However, the majority of previous efforts were limited to models relying on fixed, regular detector readout geometries. A major advancement is the recently introduced CaloClouds model, a geometry-independent diffusion model, which generates calorimeter showers as point clouds for the electromagnetic calorimeter of the envisioned International Large Detector (ILD). In this work, we introduce CaloClouds II which features a number of key improvements. This includes continuous time score-based modelling, which allows for a 25 step sampling with comparable fidelity to CaloClouds while yielding a $6\times$ speed-up over Geant4 on a single CPU ($5\times$ over CaloClouds). We further distill the diffusion model into a consistency model allowing for accurate sampling in a single step and resulting in a $46\times$ ($37\times$) speed-up. This constitutes the first application of consistency distillation for the generation of calorimeter showers.

Unsupervised Machine Learning Techniques for Exploring Tropical Coamoeba, Brane Tilings and Seiberg Duality

  • paper_url: http://arxiv.org/abs/2309.05702
  • repo_url: None
  • paper_authors: Rak-Kyeong Seong
  • for: This work uses unsupervised machine learning to identify toric phases of 4d N=1 supersymmetric gauge theories corresponding to the same toric Calabi-Yau 3-fold, realized as worldvolume theories of D3-branes described by brane tilings.
  • methods: Techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) project the space of coamoebae, labelled by complex structure moduli of the mirror Calabi-Yau 3-fold, down to a lower-dimensional phase space whose phase boundaries correspond to Seiberg duality (a generic PCA/t-SNE pipeline is sketched after this entry).
  • results: The technique is illustrated by obtaining a 2-dimensional phase diagram for brane tilings corresponding to the cone over the zeroth Hirzebruch surface F0.
    Abstract We introduce unsupervised machine learning techniques in order to identify toric phases of 4d N=1 supersymmetric gauge theories corresponding to the same toric Calabi-Yau 3-fold. These 4d N=1 supersymmetric gauge theories are worldvolume theories of a D3-brane probing a toric Calabi-Yau 3-fold and are realized in terms of a Type IIB brane configuration known as a brane tiling. It corresponds to the skeleton graph of the coamoeba projection of the mirror curve associated to the toric Calabi-Yau 3-fold. When we vary the complex structure moduli of the mirror Calabi-Yau 3-fold, the coamoeba and the corresponding brane tilings change their shape, giving rise to different toric phases related by Seiberg duality. We illustrate that by employing techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), we can project the space of coamoeba labelled by complex structure moduli down to a lower dimensional phase space with phase boundaries corresponding to Seiberg duality. In this work, we illustrate this technique by obtaining a 2-dimensional phase diagram for brane tilings corresponding to the cone over the zeroth Hirzebruch surface F0.
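
The dimensionality-reduction step itself is standard and can be sketched with scikit-learn; the random vectors below stand in for whatever feature representation of the coamoeba (e.g., a binarized image of its shape) one chooses, which is an assumption of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in features: 500 "coamoeba samples" from two synthetic clusters,
# mimicking two toric phases labelled by complex structure moduli.
phase_a = rng.normal(loc=0.0, scale=1.0, size=(250, 100))
phase_b = rng.normal(loc=2.0, scale=1.0, size=(250, 100))
features = np.vstack([phase_a, phase_b])

# PCA first to denoise and compress, then t-SNE down to a 2-D "phase diagram".
pca = PCA(n_components=20, random_state=0)
reduced = pca.fit_transform(features)
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(reduced)

print(embedding.shape)                     # (500, 2): clusters suggest distinct phases
print(pca.explained_variance_ratio_[:3])
```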

On the quality of randomized approximations of Tukey’s depth

  • paper_url: http://arxiv.org/abs/2309.05657
  • repo_url: None
  • paper_authors: Simon Briend, Gábor Lugosi, Roberto Imbuzeiro Oliveira
  • for: This paper studies how well randomized algorithms approximate Tukey's (halfspace) depth, whose exact computation is hard in high dimensions.
  • methods: The quality of randomized approximations is analyzed for data sampled from a log-concave isotropic distribution, under the constraint that the algorithm runs in time polynomial in the dimension (the standard random-direction approximation is sketched after this entry).
  • results: Under that constraint, randomized algorithms correctly approximate the maximal depth 1/2 and depths close to zero, but for any point of intermediate depth, any good approximation requires exponential complexity.
    Abstract Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.
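
For reference, the usual randomized approximation replaces the minimum over all halfspaces by a minimum over finitely many random directions, which gives an upper bound on the true depth. The sketch below is a generic version of that estimator on an isotropic Gaussian sample; the number of directions and the test points are arbitrary choices.

```python
import numpy as np

def randomized_tukey_depth(point, data, num_directions=1000, rng=None):
    """Approximate Tukey (halfspace) depth of `point` with respect to `data`.

    For each random unit direction u, count the smaller of the fractions of data
    points on either side of the hyperplane through `point` orthogonal to u;
    minimizing over sampled directions gives an upper bound used as the estimate.
    """
    rng = rng or np.random.default_rng(0)
    n, d = data.shape
    u = rng.standard_normal((num_directions, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj_data = data @ u.T                      # (n, num_directions)
    proj_point = point @ u.T                    # (num_directions,)
    below = (proj_data <= proj_point).mean(axis=0)
    above = (proj_data >= proj_point).mean(axis=0)
    return float(np.minimum(below, above).min())

rng = np.random.default_rng(1)
data = rng.standard_normal((2000, 5))           # isotropic Gaussian: log-concave, isotropic
print(randomized_tukey_depth(np.zeros(5), data, rng=rng))      # near the center: close to 1/2
print(randomized_tukey_depth(4 * np.ones(5), data, rng=rng))   # far outside: close to 0
```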

Dynamic Handover: Throw and Catch with Bimanual Hands

  • paper_url: http://arxiv.org/abs/2309.05655
  • repo_url: None
  • paper_authors: Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang
  • for: This paper tackles bimanual throwing and catching with robot hands, which requires high-speed dynamic motion, precise coordination, and interaction with diverse objects.
  • methods: A system of two multi-finger hands mounted on robot arms is trained with multi-agent reinforcement learning in simulation and transferred to real robots; to bridge the Sim2Real gap, the authors introduce several algorithm designs, including a learned object-trajectory prediction model that gives the catcher a real-time estimate of where the object is heading so it can react accordingly.
  • results: Experiments with multiple objects on the real-world system show significant improvements over multiple baselines. Project page: https://binghao-huang.github.io/dynamic_handover/.
    Abstract Humans throw and catch objects all the time. However, such a seemingly common skill introduces a lot of challenges for robots to achieve: The robots need to operate such dynamic actions at high-speed, collaborate precisely, and interact with diverse objects. In this paper, we design a system with two multi-finger hands attached to robot arms to solve this problem. We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots. To overcome the Sim2Real gap, we provide multiple novel algorithm designs including learning a trajectory prediction model for the object. Such a model can help the robot catcher has a real-time estimation of where the object will be heading, and then react accordingly. We conduct our experiments with multiple objects in the real-world system, and show significant improvements over multiple baselines. Our project page is available at \url{https://binghao-huang.github.io/dynamic_handover/}.

Data efficiency, dimensionality reduction, and the generalized symmetric information bottleneck

  • paper_url: http://arxiv.org/abs/2309.05649
  • repo_url: None
  • paper_authors: K. Michael Martini, Ilya Nemenman
  • for: simultaneous compression of two random variables to preserve information
  • methods: Generalized Symmetric Information Bottleneck (GSIB) with different functional forms of the cost
  • results: qualitatively less data required for simultaneous compression compared to compressing variables one at a time, demonstrating the principle of simultaneous compression being more data efficient.
    Abstract The Symmetric Information Bottleneck (SIB), an extension of the more familiar Information Bottleneck, is a dimensionality reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the dataset size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that, in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
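
For orientation, the symmetric information bottleneck trades off compression of each variable against the information shared by the two compressed representations $Z_X$ and $Z_Y$. One common way to write such an objective, with the generalized version replacing the compression terms by another functional form $F$ of the cost, is sketched below; the precise form used in the paper may differ.

```latex
\mathcal{L}_{\mathrm{SIB}}
  = I(X;Z_X) + I(Y;Z_Y) - \beta\, I(Z_X;Z_Y),
\qquad
\mathcal{L}_{\mathrm{GSIB}}
  = F\!\big(I(X;Z_X),\, I(Y;Z_Y)\big) - \beta\, I(Z_X;Z_Y).
```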

A Novel Supervised Deep Learning Solution to Detect Distributed Denial of Service (DDoS) attacks on Edge Systems using Convolutional Neural Networks (CNN)

  • paper_url: http://arxiv.org/abs/2309.05646
  • repo_url: https://github.com/VedanthR5/A-Novel-Deep-Learning-Solution-to-detect-DDoS-attacks-using-Neural-Networks
  • paper_authors: Vedanth Ramanathan, Krish Mahadevan, Sejal Dua
  • for: This study develops a deep-learning-based approach for detecting DDoS attacks in network traffic, one of the most harmful threats to the availability of internet services.
  • methods: A convolutional neural network (CNN) and common deep-learning techniques classify benign versus malicious traffic: packet flows are extracted and normalized to a fixed length, then fed into a custom architecture with dropout, normalization, and a sigmoid output for binary classification (a minimal version is sketched after this entry).
  • results: The proposed detector reaches an accuracy of 0.9883 on 2000 unseen flows (trained on the University of New Brunswick DDoS evaluation dataset) and is scalable to any network environment.
    Abstract Cybersecurity attacks are becoming increasingly sophisticated and pose a growing threat to individuals, and private and public sectors. Distributed Denial of Service attacks are one of the most harmful of these threats in today's internet, disrupting the availability of essential services. This project presents a novel deep learning-based approach for detecting DDoS attacks in network traffic using the industry-recognized DDoS evaluation dataset from the University of New Brunswick, which contains packet captures from real-time DDoS attacks, creating a broader and more applicable model for the real world. The algorithm employed in this study exploits the properties of Convolutional Neural Networks (CNN) and common deep learning algorithms to build a novel mitigation technique that classifies benign and malicious traffic. The proposed model preprocesses the data by extracting packet flows and normalizing them to a fixed length which is fed into a custom architecture containing layers regulating node dropout, normalization, and a sigmoid activation function to out a binary classification. This allows for the model to process the flows effectively and look for the nodes that contribute to DDoS attacks while dropping the "noise" or the distractors. The results of this study demonstrate the effectiveness of the proposed algorithm in detecting DDOS attacks, achieving an accuracy of .9883 on 2000 unseen flows in network traffic, while being scalable for any network environment.
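
A minimal stand-in for the kind of flow classifier described above: flows padded or truncated to a fixed length, then passed through a small 1-D CNN with batch normalization, dropout, and a sigmoid output. The layer sizes, flow length, and synthetic data are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

FLOW_LEN = 128   # assumed fixed length each packet flow is normalized to

class FlowCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.BatchNorm1d(16), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.BatchNorm1d(32), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Dropout(0.3),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * (FLOW_LEN // 4), 1), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, 1, FLOW_LEN)
        return self.head(self.features(x)).squeeze(-1)

def pad_or_truncate(flow, length=FLOW_LEN):
    """Normalize a variable-length flow (1-D tensor of packet features) to a fixed length."""
    out = torch.zeros(length)
    n = min(len(flow), length)
    out[:n] = flow[:n]
    return out

# Toy training step on synthetic flows (benign = label 0, malicious = label 1).
model, loss_fn = FlowCNN(), nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
flows = torch.stack([pad_or_truncate(torch.rand(torch.randint(20, 300, (1,)).item()))
                     for _ in range(32)]).unsqueeze(1)         # (32, 1, FLOW_LEN)
labels = torch.randint(0, 2, (32,)).float()
opt.zero_grad()
loss = loss_fn(model(flows), labels)
loss.backward()
opt.step()
```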

Desenvolvimento de modelo para predição de cotações de ação baseada em análise de sentimentos de tweets (Development of a model for predicting stock quotes based on sentiment analysis of tweets)

  • paper_url: http://arxiv.org/abs/2309.06538
  • repo_url: None
  • paper_authors: Mario Mitsuo Akita, Everton Josue da Silva
  • for: Predicting stock market share prices.
  • methods: The iFeel 2.0 platform extracts 19 sentiment features from Twitter posts mentioning the company Petrobras; these features are then used to train XBoot models to predict the company's future share prices.
  • results: Simulated trading of Petrobras shares based on the model's outputs yielded a net gain of R$88.82 over a 250-day period compared with the average performance of 100 random models.
    Abstract Training machine learning models for predicting stock market share prices is an active area of research since the automatization of trading such papers was available in real time. While most of the work in this field of research is done by training Neural networks based on past prices of stock shares, in this work, we use iFeel 2.0 platform to extract 19 sentiment features from posts obtained from microblog platform Twitter that mention the company Petrobras. Then, we used those features to train XBoot models to predict future stock prices for the referred company. Later, we simulated the trading of Petrobras' shares based on the model's outputs and determined the gain of R$88,82 (net) in a 250-day period when compared to a 100 random models' average performance.

Boundary Peeling: Outlier Detection Method Using One-Class Peeling

  • paper_url: http://arxiv.org/abs/2309.05630
  • repo_url: None
  • paper_authors: Sheikh Arafat, Na Sun, Maria L. Weese, Waldyn G. Martinez
  • for: This work proposes an unsupervised outlier detection algorithm, requiring no labelled data, for the crucial outlier-detection phase of data analysis.
  • methods: One-Class Boundary Peeling scores points by the average signed distance from iteratively peeled, flexible boundaries generated by one-class support vector machines; it has robust hyperparameter settings and can be cast as an ensemble method (an iterative-peeling sketch follows this entry).
  • results: In synthetic simulations it outperforms all state-of-the-art methods when no outliers are present and matches or exceeds benchmark methods when outliers are present, while remaining competitive in correct classification, AUC, and processing time on common benchmark data sets.
    Abstract Unsupervised outlier detection constitutes a crucial phase within data analysis and remains a dynamic realm of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and perform consistently well across diverse underlying data distributions. We introduce One-Class Boundary Peeling, an unsupervised outlier detection algorithm. One-class Boundary Peeling uses the average signed distance from iteratively-peeled, flexible boundaries generated by one-class support vector machines. One-class Boundary Peeling has robust hyperparameter settings and, for increased flexibility, can be cast as an ensemble method. In synthetic data simulations One-Class Boundary Peeling outperforms all state of the art methods when no outliers are present while maintaining comparable or superior performance in the presence of outliers, as compared to benchmark methods. One-Class Boundary Peeling performs competitively in terms of correct classification, AUC, and processing time using common benchmark data sets.
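
The peeling idea can be sketched with scikit-learn's OneClassSVM: repeatedly fit a flexible boundary, record each point's signed distance to it, peel off the points on or outside the boundary, and refit; the final outlier score is the average signed distance across iterations. The number of peels, the kernel, and the stopping rule are assumptions of this sketch, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def boundary_peeling_scores(X, num_peels=5, nu=0.1):
    """Average signed distance to iteratively peeled one-class SVM boundaries.

    Lower (more negative) scores indicate more outlying points.
    """
    scores = np.zeros((num_peels, len(X)))
    remaining = np.ones(len(X), dtype=bool)
    for k in range(num_peels):
        if remaining.sum() < 10:          # stop if too few points remain to fit
            scores = scores[:k]
            break
        svm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X[remaining])
        df = svm.decision_function(X)                    # signed distance for all points
        scores[k] = df
        # peel: drop the points the current boundary places on or outside it
        remaining = remaining & ~(remaining & (df <= 0))
    return scores.mean(axis=0)

rng = np.random.default_rng(0)
inliers = rng.standard_normal((300, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([inliers, outliers])

avg_scores = boundary_peeling_scores(X)
flagged = np.argsort(avg_scores)[:10]                    # 10 most outlying points
print("fraction of true outliers among flagged:", np.mean(flagged >= 300))
```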

Privacy Side Channels in Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.05610
  • repo_url: None
  • paper_authors: Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
  • for: This work studies privacy in machine learning systems as deployed, showing that system-level components surrounding a model can be exploited to extract private information.
  • methods: Four categories of privacy side channels spanning the ML lifecycle are proposed and analyzed: training data filtering, input preprocessing, output post-processing, and query filtering.
  • results: The side channels enable enhanced membership inference attacks and novel threats such as extracting users' test queries; for example, deduplicating training data before differentially private training invalidates provable privacy guarantees, and systems that block regeneration of training data can be exploited to exactly reconstruct private keys contained in the training set, even if the model did not memorize them.
    Abstract Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models. We propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and allow for either enhanced membership inference attacks or even novel threats such as extracting users' test queries. For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees. Moreover, we show that systems which block language models from regenerating training data can be exploited to allow exact reconstruction of private keys contained in the training set -- even if the model did not memorize these keys. Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning.

Quantitative Analysis of Forecasting Models:In the Aspect of Online Political Bias

  • paper_url: http://arxiv.org/abs/2309.05589
  • repo_url: None
  • paper_authors: Srinath Sai Tripuraneni, Sadia Kamal, Arunkumar Bagavathi
  • for: This study aims to characterize and forecast political bias (political leaning over time) on online social media platforms.
  • methods: A heuristic approach classifies social media posts into five political leaning categories, and existing time-series forecasting models are evaluated on two social media datasets with different political ideologies, Twitter and Gab, to identify which best forecasts political-leaning time series.
  • results: Through our experiments and analyses, we aim to shed light on the challenges and opportunities in forecasting political bias in social media platforms, and ultimately pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.
    Abstract Understanding and mitigating political bias in online social media platforms are crucial tasks to combat misinformation and echo chamber effects. However, characterizing political bias temporally using computational methods presents challenges due to the high frequency of noise in social media datasets. While existing research has explored various approaches to political bias characterization, the ability to forecast political bias and anticipate how political conversations might evolve in the near future has not been extensively studied. In this paper, we propose a heuristic approach to classify social media posts into five distinct political leaning categories. Since there is a lack of prior work on forecasting political bias, we conduct an in-depth analysis of existing baseline models to identify which model best fits to forecast political leaning time series. Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies, specifically Twitter and Gab. Through our experiments and analyses, we seek to shed light on the challenges and opportunities in forecasting political bias in social media platforms. Ultimately, our work aims to pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.
    摘要 In this paper, we propose a heuristic approach to classify social media posts into five distinct political leaning categories. Since there is a lack of prior work on forecasting political bias, we conduct an in-depth analysis of existing baseline models to identify which model best fits to forecast political leaning time series. Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies, specifically Twitter and Gab.Through our experiments and analyses, we seek to shed light on the challenges and opportunities in forecasting political bias in social media platforms. Our work aims to pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.

Anisotropic Diffusion Stencils: From Simple Derivations over Stability Estimates to ResNet Implementations

  • paper_url: http://arxiv.org/abs/2309.05575
  • repo_url: None
  • paper_authors: Karl Schrader, Joachim Weickert, Michael Krause
  • for: This paper is written for studying the numerical approximation of anisotropic diffusion processes with a diffusion tensor, and deriving a large family of finite difference discretizations on a 3x3 stencil.
  • methods: The paper uses a directional splitting method to derive a stencil class that covers a wide range of existing discretizations, and establishes a bound on the spectral norm of the matrix corresponding to the stencil to guarantee stability of an explicit scheme in the Euclidean norm.
  • results: The paper shows that the resulting stencil class involves one free parameter and covers a wide range of existing discretizations, and that the two parameters in the stencil of Weickert et al. (2013) contain redundancy. Additionally, the paper demonstrates a natural translation of the explicit scheme into ResNet blocks, which enables simple and highly efficient parallel implementations on GPUs.
    Abstract Anisotropic diffusion processes with a diffusion tensor are important in image analysis, physics, and engineering. However, their numerical approximation has a strong impact on dissipative artefacts and deviations from rotation invariance. In this work, we study a large family of finite difference discretisations on a 3 x 3 stencil. We derive it by splitting 2-D anisotropic diffusion into four 1-D diffusions. The resulting stencil class involves one free parameter and covers a wide range of existing discretisations. It comprises the full stencil family of Weickert et al. (2013) and shows that their two parameters contain redundancy. Furthermore, we establish a bound on the spectral norm of the matrix corresponding to the stencil. This gives time step size limits that guarantee stability of an explicit scheme in the Euclidean norm. Our directional splitting also allows a very natural translation of the explicit scheme into ResNet blocks. Employing neural network libraries enables simple and highly efficient parallel implementations on GPUs.
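
The paper's observation that an explicit finite-difference diffusion step has the form of a residual update, u_{k+1} = u_k + tau * (stencil applied to u_k), translates directly into a ResNet-style block. The sketch below does this only for the simple isotropic (Laplacian) special case with a fixed 3x3 stencil; the anisotropic scheme in the paper uses space-variant stencil weights derived from the diffusion tensor, which this toy does not implement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExplicitDiffusionBlock(nn.Module):
    """One explicit diffusion step u_{k+1} = u_k + tau * (L u_k) as a residual block.

    L is the standard 5-point Laplacian embedded in a fixed 3x3 stencil; the
    anisotropic case would use space-variant 3x3 weights from the diffusion tensor.
    """
    def __init__(self, tau=0.2):
        super().__init__()
        assert tau <= 0.25, "explicit 2-D Laplacian step is stable for tau <= 1/4"
        self.tau = tau
        stencil = torch.tensor([[0.0, 1.0, 0.0],
                                [1.0, -4.0, 1.0],
                                [0.0, 1.0, 0.0]])
        self.register_buffer("kernel", stencil.view(1, 1, 3, 3))

    def forward(self, u):                        # u: (batch, 1, H, W)
        u_pad = F.pad(u, (1, 1, 1, 1), mode="replicate")    # zero-flux (Neumann) boundary
        return u + self.tau * F.conv2d(u_pad, self.kernel)  # residual (ResNet-style) update

# Usage: stack T blocks to integrate the diffusion PDE for time T * tau.
blocks = nn.Sequential(*[ExplicitDiffusionBlock(tau=0.2) for _ in range(50)])
u0 = torch.zeros(1, 1, 64, 64)
u0[:, :, 28:36, 28:36] = 1.0                     # a bright square that diffuses outward
uT = blocks(u0)
print(float(u0.sum()), float(uT.sum()))          # total grey value is preserved
```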

Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis

  • paper_url: http://arxiv.org/abs/2309.05525
  • repo_url: https://github.com/chendiqian/GNN4FL
  • paper_authors: Wenxuan Ye, Chendi Qian, Xueli An, Xueqiang Yan, Georg Carle
  • for: 提高6G网络中的人工智能支持,使其更加安全和可靠。
  • methods: 使用分布式记录技术和图 neural network,包括预处理层使用同质加密、图StructuredNN用于异常模型检测、和分布式系统选择机制。
  • results: 通过仿真验证了所提架构的可行性,在异常模型检测和全局模型精度方面均优于相关基线。
    Abstract Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL, which utilizes Distributed Ledger Technology (DLT) and Graph Neural Network (GNN), including three key features. First, a pre-processing layer employing homomorphic encryption is incorporated to securely aggregate local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, GNN is leveraged to identify abnormal local models, enhancing system security. Third, DLT is utilized to decentralize the system by selecting one of the candidates to perform the central server's functions. Additionally, DLT ensures reliable data management by recording data exchanges in an immutable and transparent ledger. The feasibility of the novel architecture is validated through simulations, demonstrating improved performance in anomalous model detection and global model accuracy compared to relevant baselines.

Re-formalization of Individual Fairness

  • paper_url: http://arxiv.org/abs/2309.05521
  • repo_url: None
  • paper_authors: Toshihiro Kamishima
  • for: 本研究旨在重新定义个人公平,通过统计独立条件确定个人。
  • methods: 本研究使用了Dwork等人的形式化方法,将类似数据在不公平空间映射到公平空间中相似的位置。
  • results: 本研究提出了一种新的公平定义,可以与等式公平和统计平衡结合使用,并且可以应用于预处理、进程处理和后处理阶段。
    Abstract The notion of individual fairness is a formalization of an ethical principle, "Treating like cases alike," which has been argued such as by Aristotle. In a fairness-aware machine learning context, Dwork et al. firstly formalized the notion. In their formalization, a similar pair of data in an unfair space should be mapped to similar positions in a fair space. We propose to re-formalize individual fairness by the statistical independence conditioned by individuals. This re-formalization has the following merits. First, our formalization is compatible with that of Dwork et al. Second, our formalization enables to combine individual fairness with the fairness notion, equalized odds or sufficiency, as well as statistical parity. Third, though their formalization implicitly assumes a pre-process approach for making fair prediction, our formalization is applicable to an in-process or post-process approach.
    摘要 “个人公平”是一种形式化的道德原则, Aristotle 已经提出过,而 Dwork 等人首先将其形式化。在公平意识的机器学习上下,他们的形式化是指将不公平的空间中的相似资料映射到公平的空间中相似的位置。我们提议将个人公平重新形式化为统计独立的条件。这种重新形式化具有以下优点:首先,我们的形式化与 Dwork 等人的形式化相容。其次,我们的形式化可以与 equalized odds 或 sufficiency 的公平观念结合。最后,他们的形式化预设了先processing 的公平预测方法,而我们的形式化则可以应用于进程或后process approach。

Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning

  • paper_url: http://arxiv.org/abs/2309.05505
  • repo_url: https://github.com/shenzebang/centaur-privacy-federated-representation-learning
  • paper_authors: Zebang Shen, Jiayuan Ye, Anmin Kang, Hamed Hassani, Reza Shokri
  • for: 这个论文目的是提出一种基于分布式学习的隐私保护 federated representation learning 方法,以保持数据隐私而提高模型性能。
  • methods: 该方法使用了现代差分隐私算法,并使用了一种新的可变权重策略来保证 differential privacy 的承诺,同时允许本地个性化。
  • results: 在线性表示设定下,尽管目标函数非凸,新算法 \DPFEDREP\ 仍以线性速率收敛到以全局最优解为中心的球内,且球半径与隐私预算成反比。该效用分析将此问题的效用-隐私权衡较既有最优结果改进了 $\sqrt{d}$ 倍,其中 $d$ 为输入维度。
    Abstract Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating the risk of this information leakage, using state of the art differentially private algorithms, also does not come for free. Randomized mechanisms can prevent convergence of models on learning even the useful representation functions, especially if there is more disagreement between local models on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model, with differential privacy guarantees, while separately allowing sufficient freedom for local personalization (without releasing it). We prove that in the linear representation setting, while the objective is non-convex, our proposed new algorithm \DPFEDREP\ converges to a ball centered around the \emph{global optimal} solution at a linear rate, and the radius of the ball is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. We empirically evaluate our method with the image classification task on CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over the prior work under the same small privacy budget. The code can be found in this link: https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.
    摘要 “重复的参数共享在联合学习中会导致敏感数据信息泄露,这会背离联合学习的主要目的——数据隐私。为了解决这种信息泄露风险,使用当前最佳的权限隐私算法也不是免费的。随机机制可以防止模型学习到本地模型之间的分布不同的情况下的有用表示函数,尤其是当数据不同时。在这篇论文中,我们考虑了一种联合学习目标,它鼓励不同党派共同修改共识部分的模型,同时保证隐私保证。我们证明在线性表示设定下,虽然目标函数不对称,但我们提出的新算法\DPFEDREP\在线性速率下 converge到一个球心在全球优致解的解,球半径与隐私预算reciprocal成正比。通过这种新的用户分析,我们提高了这个问题的状态艺术-隐私质量比,提高了状态艺术质量比$\sqrt{d}$,其中$d$是输入维度。我们通过对图像分类任务进行实验,观察到在相同的小隐私预算下,我们的方法与先前的成果相比有显著的性能提升。代码可以在以下链接中找到:https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning。”
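
The sketch below illustrates the general recipe described in the abstract rather than the paper's exact \DPFEDREP\ algorithm: clients train local heads privately and only release clipped, noise-perturbed updates of a shared linear representation, so the released quantity goes through a Gaussian-mechanism style differential-privacy step. The model shapes, clipping norm, noise multiplier, and synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_clients = 20, 5, 10                  # input dim, representation dim, clients
clip_norm, noise_mult, lr = 1.0, 0.8, 0.1    # placeholder DP hyper-parameters

W_shared = rng.normal(size=(d, k)) * 0.1                       # shared representation
heads = [rng.normal(size=(k, 1)) * 0.1 for _ in range(n_clients)]   # private local heads
data = [(rng.normal(size=(32, d)), rng.normal(size=(32, 1))) for _ in range(n_clients)]

for rnd in range(50):
    updates = []
    for i, (X, y) in enumerate(data):
        Z = X @ W_shared                      # representation of the local batch
        err = Z @ heads[i] - y                # linear-regression residual
        heads[i] -= lr * (Z.T @ err / len(X))  # local head update, never released
        g_shared = X.T @ (err @ heads[i].T) / len(X)   # gradient w.r.t. W_shared
        # clip each client's shared-representation update before release
        norm = np.linalg.norm(g_shared)
        updates.append(g_shared * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(updates, axis=0)
    # Gaussian mechanism on the released average update
    noisy = avg + rng.normal(scale=noise_mult * clip_norm / n_clients, size=avg.shape)
    W_shared -= lr * noisy

print("final training losses:",
      [float(np.mean((X @ W_shared @ h - y) ** 2)) for (X, y), h in zip(data, heads)])
```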

Systematic Review of Experimental Paradigms and Deep Neural Networks for Electroencephalography-Based Cognitive Workload Detection

  • paper_url: http://arxiv.org/abs/2309.07163
  • repo_url: None
  • paper_authors: Vishnu KN, Cota Navin Gupta
  • for: 这种系统性文献综述旨在探讨基于电энце法成功的认知工作负荷(CWL)估计方法。
  • methods: 这些研究使用了多种实验方法来刺激人类的认知工作负荷,并使用了深度神经网络(DNNs)进行信号分类。
  • results: 研究发现,只有一些研究使用了在线或 pseudo-在线的分类策略来实时估计认知工作负荷,而大多数研究使用了黑盒模型。 综述还表明,DNNs 是可以有效地分类 EEG 信号的工具,但是现有方法受到非站态信号的限制。
    Abstract This article summarizes a systematic review of the electroencephalography (EEG)-based cognitive workload (CWL) estimation. The focus of the article is twofold: identify the disparate experimental paradigms used for reliably eliciting discreet and quantifiable levels of cognitive load and the specific nature and representational structure of the commonly used input formulations in deep neural networks (DNNs) used for signal classification. The analysis revealed a number of studies using EEG signals in its native representation of a two-dimensional matrix for offline classification of CWL. However, only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Further, only a couple of interpretable DNNs and a single generative model were employed for cognitive load detection till date during this review. More often than not, researchers were using DNNs as black-box type models. In conclusion, DNNs prove to be valuable tools for classifying EEG signals, primarily due to the substantial modeling power provided by the depth of their network architecture. It is further suggested that interpretable and explainable DNN models must be employed for cognitive workload estimation since existing methods are limited in the face of the non-stationary nature of the signal.

Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

  • paper_url: http://arxiv.org/abs/2309.05477
  • repo_url: https://github.com/timsey/npal
  • paper_authors: Tim Bakker, Herke van Hoof, Max Welling
  • for: 本文提出一种基于池式主动学习(pool-based active learning)的分类方法,以提高机器学习模型的数据效率。
  • methods: 使用 Attentive Conditional Neural Process 模型,利用主动学习问题的对称性与独立性来学习主动学习策略本身。
  • results: 在不同的数据集和训练设置下优于多种基线方法,并在数据集变化时表现出更好的稳定性。
    Abstract Pool-based active learning (AL) is a promising technology for increasing data-efficiency of machine learning models. However, surveys show that performance of recent AL methods is very sensitive to the choice of dataset and training setting, making them unsuitable for general application. In order to tackle this problem, the field Learning Active Learning (LAL) suggests to learn the active learning strategy itself, allowing it to adapt to the given setting. In this work, we propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem with an Attentive Conditional Neural Process model. Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives, such as those that do not equally weight the error on all data points. We experimentally verify that our Neural Process model outperforms a variety of baselines in these settings. Finally, our experiments show that our model exhibits a tendency towards improved stability to changing datasets. However, performance is sensitive to choice of classifier and more work is necessary to reduce the performance the gap with the myopic oracle and to improve scalability. We present our work as a proof-of-concept for LAL on nonstandard objectives and hope our analysis and modelling considerations inspire future LAL work.
    摘要 池度基于的活动学习(AL)是一种可靠的技术,可以提高机器学习模型的数据效率。然而,评估表明,现有的AL方法在不同的数据集和训练环境下表现非常敏感,使其不适用于通用应用。为了解决这个问题,场景学习活动学习(LAL)建议学习活动学习策略自身,以适应给定的环境。在这个工作中,我们提出了一种基于归一化神经过程模型的LAL方法 для分类。我们的方法基于学习偏向oracle,让我们的模型能够适应非标准目标函数,如不均衡所有数据点的错误。我们实验表明,我们的神经过程模型在这些设置下超过了多种基准。然而,我们的模型表现受到选择类фика器的影响,需要更多的工作来减少与偏向oracle的性能差距,并提高可扩展性。我们的工作作为LAL非标准目标的证明,希望我们的分析和模型考虑能够激励未来LAL工作。
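
For readers unfamiliar with the pool-based setting, the following sketch shows the standard acquisition loop with an uncertainty-sampling heuristic of the kind such learned strategies are compared against; it is not the paper's attentive conditional neural process, and the dataset, classifier, and query budget are placeholder choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Seed set: five labelled examples per class; the rest of the pool is unlabelled.
labelled = [int(np.where(y_pool == c)[0][k]) for c in (0, 1) for k in range(5)]
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

for step in range(20):                        # 20 acquisition rounds
    clf = LogisticRegression(max_iter=1000).fit(X_pool[labelled], y_pool[labelled])
    probs = clf.predict_proba(X_pool[unlabelled])
    # uncertainty sampling: query the point whose top-class probability is lowest
    query = unlabelled[int(np.argmin(probs.max(axis=1)))]
    labelled.append(query)
    unlabelled.remove(query)
    print(f"round {step:02d}  labelled={len(labelled):3d}  "
          f"test acc={accuracy_score(y_test, clf.predict(X_test)):.3f}")
```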

Machine learning the dimension of a Fano variety

  • paper_url: http://arxiv.org/abs/2309.05473
  • repo_url: https://bitbucket.org/fanosearch/mldim
  • paper_authors: Tom Coates, Alexander M. Kasprzyk, Sara Veneziale
  • for: 本研究探讨了 whether the quantum period of a Fano variety determines its dimension.
  • methods: 使用 machine learning 技术,特别是 feed-forward neural network,来解决这个问题。
  • results: 研究发现,一个简单的 feed-forward neural network 可以准确地确定 Fano variety 的维度,准确率达到 98%。此外,研究还提出了对 Fano variety 的几何期的准确误差分析,并证明了这些误差可以用来确定 Fano variety 的维度。
    Abstract Fano varieties are basic building blocks in geometry - they are `atomic pieces' of mathematical shapes. Recent progress in the classification of Fano varieties involves analysing an invariant called the quantum period. This is a sequence of integers which gives a numerical fingerprint for a Fano variety. It is conjectured that a Fano variety is uniquely determined by its quantum period. If this is true, one should be able to recover geometric properties of a Fano variety directly from its quantum period. We apply machine learning to the question: does the quantum period of X know the dimension of X? Note that there is as yet no theoretical understanding of this. We show that a simple feed-forward neural network can determine the dimension of X with 98% accuracy. Building on this, we establish rigorous asymptotics for the quantum periods of a class of Fano varieties. These asymptotics determine the dimension of X from its quantum period. Our results demonstrate that machine learning can pick out structure from complex mathematical data in situations where we lack theoretical understanding. They also give positive evidence for the conjecture that the quantum period of a Fano variety determines that variety.
    摘要 To explore whether the quantum period of a Fano variety determines its dimension, we applied machine learning to the question. While there is currently no theoretical understanding of this, we found that a simple feed-forward neural network can determine the dimension of a Fano variety with 98% accuracy. Building on this result, we established rigorous asymptotics for the quantum periods of a class of Fano varieties, which allow us to determine the dimension of a Fano variety from its quantum period. Our findings demonstrate that machine learning can extract structure from complex mathematical data, even in situations where there is no theoretical understanding, and they provide positive evidence for the conjecture that the quantum period of a Fano variety determines that variety.

Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review

  • paper_url: http://arxiv.org/abs/2309.05457
  • repo_url: None
  • paper_authors: Liang Niu, Nian Xue, Christina Pöpper
  • for: This paper aims to evaluate the performance of AI in reviewing academic security conferences, specifically by comparing the results obtained from human reviewers and machine-learning models.
  • methods: The paper uses a comprehensive dataset of thousands of papers from computer science conferences and the arXiv preprint website, and evaluates the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers.
  • results: The experimental evaluation of review outcome prediction using the Doc2Vec-based approach achieves an accuracy of over 90%, significantly outperforming ChatGPT. The paper also identifies the potential advantages and limitations of the tested ML models and explores areas within the paper-reviewing process that can benefit from automated support approaches.
    Abstract Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, the practice of double-blind peer review is the de-facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers. Our experimental evaluation of review outcome prediction using the Doc2Vec-based approach performs significantly better than the ChatGPT and achieves an accuracy of over 90%. While analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support approaches, while also recognizing the irreplaceable role of human intellect in certain aspects that cannot be matched by state-of-the-art AI techniques.
    摘要 Peer review 是科学共识社区用来评估研究进步的方法。在网络安全领域,双盲审核是标准做法。这篇论文探讨审核评审的圣杯之物,旨在探讨人工智能在学术安全会议审核中的表现。我们专门 investigate 审核结果的预测性,比较人工审核员和机器学习模型 obtiain 的结果。为了进行这项研究,我们构建了包括了数千篇计算机科学会议和arXiv预印站点上的论文的全面数据集。基于收集到的数据,我们评估 Doc2Vec 模型和两个阶段分类器的预测能力。我们的实验测试结果显示,使用 Doc2Vec 模型和两个阶段分类器可以达到高于 90% 的准确率。在分析实验结果时,我们发现了机器学习模型的优点和局限性,并探讨了可以通过自动支持方法帮助审核过程的部分,同时也认可人工智能在某些方面无法被现代 AI 技术所代替的不可或缺的作用。
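
A minimal sketch of the two-stage pipeline the abstract describes, i.e. Doc2Vec embeddings followed by a conventional classifier, is given below. The toy "abstracts" and accept/reject labels are placeholders standing in for the authors' conference and arXiv dataset, and the hyper-parameters are illustrative.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Placeholder corpus: short texts with accept/reject labels standing in for real papers.
papers = [
    ("we present a novel attack on tls session resumption", 1),
    ("a survey of blockchain consensus protocols", 0),
    ("formal verification of a secure enclave runtime", 1),
    ("notes on password hygiene for end users", 0),
] * 25   # repeat to give the toy models something to fit
docs = [TaggedDocument(words=t.split(), tags=[i]) for i, (t, _) in enumerate(papers)]

# Stage 1: unsupervised document embeddings.
d2v = Doc2Vec(vector_size=32, min_count=1, epochs=40, seed=0)
d2v.build_vocab(docs)
d2v.train(docs, total_examples=d2v.corpus_count, epochs=d2v.epochs)

X = [d2v.infer_vector(t.split()) for t, _ in papers]
y = [label for _, label in papers]

# Stage 2: a standard classifier on top of the embeddings.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("prediction for a new abstract:",
      clf.predict([d2v.infer_vector("fuzzing kernel drivers for memory safety bugs".split())]))
```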

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

  • paper_url: http://arxiv.org/abs/2309.05455
  • repo_url: None
  • paper_authors: Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
  • for: 本研究是为了开发一个可以生成人类样式的合体动作系统,用于GENEA(生成和评估非语言行为 для具有身体的代理)挑战2023。
  • methods: 本研究基于现有的扩散基于动作合成模型,并提出了一个对比性语音和姿势预训练(CSMP)模块,用于学习语音和姿势的semantic coupling。CSMP模块的输出用作diffusion-based gesture synthesis模型的conditioning signal,以实现semantically-aware co-speech gesture generation。
  • results: 根据提交的入场点,我们的系统在人类化度和语音合适度方面得到了最高分,表明我们的系统是一种可靠的实现人类样式的合口动作生成的方法。
    Abstract This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved highest human-likeness and highest speech appropriateness rating among the submitted entries. This indicates that our system is a promising approach to achieve human-like co-speech gestures in agents that carry semantic meaning.
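
The sketch below shows what a contrastive speech-and-motion pretraining objective can look like in general: two small encoders map paired speech and motion windows into a joint space and are trained with a symmetric InfoNCE loss so that matching pairs score higher than mismatched ones. The encoder sizes, feature dimensions, and synthetic batches are placeholders; this is not the paper's CSMP module, whose output would then condition the diffusion-based gesture model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny MLP encoder mapping a flattened window to an L2-normalised embedding."""
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE: the i-th speech and i-th motion window form a positive pair."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(len(z_a))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

speech_enc = Encoder(in_dim=80 * 20)   # e.g. 20 frames of 80-dim mel features (placeholder)
motion_enc = Encoder(in_dim=57 * 20)   # e.g. 20 frames of 57 joint channels (placeholder)
opt = torch.optim.Adam(list(speech_enc.parameters()) + list(motion_enc.parameters()), lr=1e-3)

for step in range(200):
    # random tensors only exercise the loop; with genuinely paired data the loss decreases
    speech = torch.randn(32, 80 * 20)
    motion = torch.randn(32, 57 * 20)
    loss = info_nce(speech_enc(speech), motion_enc(motion))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  contrastive loss {loss.item():.3f}")
```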

Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models

  • paper_url: http://arxiv.org/abs/2309.05436
  • repo_url: https://github.com/neuripsANON2023/QFF
  • paper_authors: Frederiek Wesel, Kim Batselier
  • for: 提高高维数据集中模型的泛化能力和精度
  • methods: 使用几何和傅ри特特征进行非线性扩展,并将模型参数约化为几何网络
  • results: 在大规模 regression 任务中实现了状态最佳的结果,并且通过实验证明了这种方法可以增强模型的泛化能力和精度
    Abstract In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning unfeasible, the exponential increase of model parameters in the dimensionality of the data caused by their tensor-product structure prohibits to tackle high-dimensional problems. One of the possible approaches to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension as opposed to their non-quantized counterparts, at no additional computational cost while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data and how it provides models with increased generalization capabilities. We finally benchmark our approach on large regression task, achieving state-of-the-art results on a laptop computer.
    摘要 在内核机器学中,多项式和傅里叶特征通常用于提供非线性扩展,将数据映射到更高维的空间。如果不考虑学习问题的 dual 形式,那么因为特征的维度乘积结构而导致的模型参数的几率增长会使大规模学习变得不可行。为了缓解这种几率增长,可以利用特征中的维度结构,将模型参数约束为减参数化的tensor网络。在这篇论文中,我们对多项式和傅里叶特征进行量化,并将相关的模型参数量化。我们发现,对于同样的参数数量,量化模型具有更高的VC-维度上限,而无需额外的计算成本,而且在学习同样的特征时,可以减少特征的繁殖。我们通过实验表明,这种额外的维度regularizes学习问题,使模型具有更好的泛化能力。最后,我们对大规模回归任务进行了 benchmark,在笔记计算机上实现了状态级Result。
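
As a rough illustration of combining per-dimension Fourier features with a tensor-network-constrained weight, the sketch below keeps the weight tensor in CP (rank-R) format rather than the quantized tensor-train format studied in the paper; the feature count, rank, and synthetic regression target are placeholder choices.

```python
import torch

torch.manual_seed(0)
D, M, R = 3, 9, 4          # input dims, features per dim (1 + 4 cos/sin pairs), CP rank

def fourier_features(x):
    """Per-dimension features [1, cos(kx), sin(kx)] for k = 1..4 -> shape (batch, D, M)."""
    k = torch.arange(1, 5, dtype=x.dtype)
    ang = x.unsqueeze(-1) * k                 # (batch, D, 4)
    ones = torch.ones(x.shape + (1,))
    return torch.cat([ones, torch.cos(ang), torch.sin(ang)], dim=-1)

# CP-format weights: one M x R factor matrix per input dimension.
factors = [torch.nn.Parameter(0.1 * torch.randn(M, R)) for _ in range(D)]
opt = torch.optim.Adam(factors, lr=5e-3)

def predict(x):
    phi = fourier_features(x)                                            # (batch, D, M)
    scores = torch.stack([phi[:, d] @ factors[d] for d in range(D)], dim=1)  # (batch, D, R)
    return scores.prod(dim=1).sum(dim=1)      # contract over dims, then sum over rank

X = torch.rand(512, D) * 2 * torch.pi          # synthetic inputs
# target is exactly representable at rank 2, so the loss should shrink toward zero
y = torch.sin(X[:, 0]) * torch.cos(X[:, 1]) + 0.5 * torch.sin(2 * X[:, 2])

for step in range(2000):
    loss = torch.mean((predict(X) - y) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  mse {loss.item():.4f}")
```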

  • paper_url: http://arxiv.org/abs/2309.05434
  • repo_url: None
  • paper_authors: Haohui Lu, Shahadat Uddin
  • for: The paper is written for researchers and practitioners working on link prediction in graph machine learning, with applications in disease prediction, social network recommendations, and drug discovery.
  • methods: The paper proposes a novel method called Node Centrality and Similarity Based Parameterised Model (NCSM), which integrates node centrality and similarity measures as edge features in a customized Graph Neural Network (GNN) layer.
  • results: The proposed model outperforms existing state-of-the-art models like Graph Convolutional Networks and Variational Graph Autoencoder across various metrics and datasets, demonstrating its superiority in link prediction tasks.
    Abstract Link prediction is a key aspect of graph machine learning, with applications as diverse as disease prediction, social network recommendations, and drug discovery. It involves predicting new links that may form between network nodes. Despite the clear importance of link prediction, existing models have significant shortcomings. Graph Convolutional Networks, for instance, have been proven to be highly efficient for link prediction on a variety of datasets. However, they encounter severe limitations when applied to short-path networks and ego networks, resulting in poor performance. This presents a critical problem space that this work aims to address. In this paper, we present the Node Centrality and Similarity Based Parameterised Model (NCSM), a novel method for link prediction tasks. NCSM uniquely integrates node centrality and similarity measures as edge features in a customised Graph Neural Network (GNN) layer, effectively leveraging the topological information of large networks. This model represents the first parameterised GNN-based link prediction model that considers topological information. The proposed model was evaluated on five benchmark graph datasets, each comprising thousands of nodes and edges. Experimental results highlight NCSM's superiority over existing state-of-the-art models like Graph Convolutional Networks and Variational Graph Autoencoder, as it outperforms them across various metrics and datasets. This exceptional performance can be attributed to NCSM's innovative integration of node centrality, similarity measures, and its efficient use of topological information.
    摘要 链接预测是图机器学习中关键的一环,在疾病预测、社交网络推荐和药物发现等多个应用领域中发挥着重要作用,其目标是预测图中可能会形成的新链接。
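
A small sketch of the feature idea behind NCSM is given below: candidate node pairs are described by centrality- and similarity-based edge features, which are then fed to a classifier. A logistic regression stands in for the paper's customised GNN layer, the random graph is a placeholder dataset, and (as noted in the comments) a rigorous evaluation would hide test edges before computing the features.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(300, 3, seed=0)

# Positive examples: existing edges. Negative examples: sampled non-edges.
pos = list(G.edges())
non_edges = list(nx.non_edges(G))
neg = [non_edges[i] for i in rng.choice(len(non_edges), size=len(pos), replace=False)]

deg_c = nx.degree_centrality(G)

def pair_features(u, v):
    # centrality of both endpoints plus two classic similarity scores
    cn = len(list(nx.common_neighbors(G, u, v)))
    jac = next(nx.jaccard_coefficient(G, [(u, v)]))[2]
    return [deg_c[u], deg_c[v], cn, jac]

# NOTE: for brevity the features are computed on the full graph; a rigorous setup
# would remove held-out test edges from G before computing the similarities.
X = np.array([pair_features(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("link-prediction AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```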

Neuromorphic Auditory Perception by Neural Spiketrum

  • paper_url: http://arxiv.org/abs/2309.05430
  • repo_url: None
  • paper_authors: Huajin Tang, Pengjie Gu, Jayawan Wijekoon, MHD Anas Alsakkal, Ziming Wang, Jiangrong Shen, Rui Yan
  • for: This paper aims to develop a neural spike coding model called “spiketrum” to efficiently process auditory signals and enable brain-like intelligence in neuromorphic computing.
  • methods: The paper proposes a spiketrum model that transforms time-varying analog signals into spatiotemporal spike patterns, minimizing information loss and providing informational robustness to neural fluctuations and spike losses.
  • results: The paper demonstrates the effectiveness of the spiketrum model through a neuromorphic cochlear prototype, showing that it can provide a systematic solution for spike-based artificial intelligence by fully exploiting the advantages of spike-based computation.
    Abstract Neuromorphic computing holds the promise to achieve the energy efficiency and robust learning performance of biological neural systems. To realize the promised brain-like intelligence, it needs to solve the challenges of the neuromorphic hardware architecture design of biological neural substrate and the hardware amicable algorithms with spike-based encoding and learning. Here we introduce a neural spike coding model termed spiketrum, to characterize and transform the time-varying analog signals, typically auditory signals, into computationally efficient spatiotemporal spike patterns. It minimizes the information loss occurring at the analog-to-spike transformation and possesses informational robustness to neural fluctuations and spike losses. The model provides a sparse and efficient coding scheme with precisely controllable spike rate that facilitates training of spiking neural networks in various auditory perception tasks. We further investigate the algorithm-hardware co-designs through a neuromorphic cochlear prototype which demonstrates that our approach can provide a systematic solution for spike-based artificial intelligence by fully exploiting its advantages with spike-based computation.

Temporal Patience: Efficient Adaptive Deep Learning for Embedded Radar Data Processing

  • paper_url: http://arxiv.org/abs/2309.05686
  • repo_url: None
  • paper_authors: Max Sponner, Julius Ott, Lorenzo Servadei, Bernd Waschneck, Robert Wille, Akash Kumar
  • for: 这篇论文旨在提高深度学习推理的能效性,使其在具有限制的附加设备上进行实时处理。
  • methods: 该论文提出了一种使用流动 radar 数据的时间相关性来增强深度学习推理的效率。这些方法包括在架构中添加额外的分类支路,以实现在推理过程中提前终止。
  • results: 该论文的实验结果表明,使用该方法可以在推理过程中减少计算成本,同时保持准确性的最小损失。相比单 Exit 网络,该方法可以节省至 26% 的操作数量。此外,该方法可以与传统优化结合使用,使其在有限的附加设备上可用。
    Abstract Radar sensors offer power-efficient solutions for always-on smart devices, but processing the data streams on resource-constrained embedded platforms remains challenging. This paper presents novel techniques that leverage the temporal correlation present in streaming radar data to enhance the efficiency of Early Exit Neural Networks for Deep Learning inference on embedded devices. These networks add additional classifier branches between the architecture's hidden layers that allow for an early termination of the inference if their result is deemed sufficient enough by an at-runtime decision mechanism. Our methods enable more informed decisions on when to terminate the inference, reducing computational costs while maintaining a minimal loss of accuracy. Our results demonstrate that our techniques save up to 26% of operations per inference over a Single Exit Network and 12% over a confidence-based Early Exit version. Our proposed techniques work on commodity hardware and can be combined with traditional optimizations, making them accessible for resource-constrained embedded platforms commonly used in smart devices. Such efficiency gains enable real-time radar data processing on resource-constrained platforms, allowing for new applications in the context of smart homes, Internet-of-Things, and human-computer interaction.
    摘要 雷达感知器提供了功率高的解决方案,但是处理流处理数据流在具有限制的嵌入式平台上仍然是一个挑战。这篇论文提出了新的技术,利用雷达数据流中的时间相关性来增强深度学习的早期终止网络(Early Exit Neural Networks)的效率在嵌入式设备上。这些网络添加了在架构中隐藏层之间的额外分支,以实现在运行时决策机制的基础上,提前终止推断。我们的方法可以更好地决定终止推断的时间,从而降低计算成本,保持最小的准确性损失。我们的结果表明,我们的技术可以在嵌入式设备上实现26%的操作数减少,相比单exit网络。此外,我们的技术还可以与传统优化相结合,使其在资源有限的嵌入式设备上可用。这些效率提升使得雷达数据流处理在资源有限的平台上实现了实时处理,开 up了新的应用场景,如智能家居、物联网和人机交互。
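
The sketch below illustrates the general idea rather than the paper's exact decision mechanism: a network with an auxiliary early classifier stops at the early exit when its prediction is confident and agrees with the previous frame of the stream, and otherwise falls through to the final exit. The architecture, confidence threshold, and synthetic "radar" stream are placeholders, and the untrained model is shown only to make the control flow concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoExitNet(nn.Module):
    def __init__(self, in_dim=64, n_classes=4):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.exit1 = nn.Linear(64, n_classes)          # early (auxiliary) classifier
        self.block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.exit2 = nn.Linear(64, n_classes)          # final classifier
    def forward(self, x):
        h = self.block1(x)
        return self.exit1(h), self.exit2(self.block2(h))

@torch.no_grad()
def stream_inference(model, frames, conf_thresh=0.8):
    """Early-exit inference over a stream, exploiting temporal agreement."""
    prev_label, early_exits = None, 0
    for x in frames:
        h = model.block1(x.unsqueeze(0))
        conf, label = F.softmax(model.exit1(h), dim=-1).max(dim=-1)
        # exit early only if confident AND consistent with the previous frame
        if prev_label is not None and label.item() == prev_label and conf.item() > conf_thresh:
            early_exits += 1
        else:
            label = model.exit2(model.block2(h)).argmax(dim=-1)
        prev_label = label.item()
    return early_exits

model = TwoExitNet()
frames = torch.randn(200, 64)          # synthetic stand-in for a radar feature stream
print("early exits taken:", stream_inference(model, frames), "of", len(frames))
```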

Learning noise-induced transitions by multi-scaling reservoir computing

  • paper_url: http://arxiv.org/abs/2309.05413
  • repo_url: None
  • paper_authors: Zequn Lin, Zhaofan Lu, Zengru Di, Ying Tang
  • for: 本研究旨在使用机器学习模型,具体是循环神经网络,捕捉时间序列中的随机过渡。
  • methods: 本研究使用了循环神经网络,并对其中的一个关键参数进行了优化。
  • results: 研究发现,使用这种方法可以准确地计算过渡时间和过渡次数的统计数据。此外,该方法还可以capture多稳态系统中的过渡,包括蛋白质折叠过渡。
    Abstract Noise is usually regarded as adversarial to extract the effective dynamics from time series, such that the conventional data-driven approaches usually aim at learning the dynamics by mitigating the noisy effect. However, noise can have a functional role of driving transitions between stable states underlying many natural and engineered stochastic dynamics. To capture such stochastic transitions from data, we find that leveraging a machine learning model, reservoir computing as a type of recurrent neural network, can learn noise-induced transitions. We develop a concise training protocol for tuning hyperparameters, with a focus on a pivotal hyperparameter controlling the time scale of the reservoir dynamics. The trained model generates accurate statistics of transition time and the number of transitions. The approach is applicable to a wide class of systems, including a bistable system under a double-well potential, with either white noise or colored noise. It is also aware of the asymmetry of the double-well potential, the rotational dynamics caused by non-detailed balance, and transitions in multi-stable systems. For the experimental data of protein folding, it learns the transition time between folded states, providing a possibility of predicting transition statistics from a small dataset. The results demonstrate the capability of machine-learning methods in capturing noise-induced phenomena.
    摘要 噪声通常被视为时间序列数据驱动方法中的障碍物,以便学习时间序列中的动力学。然而,噪声可以扮演一种驱动稳定状态之间的转移的功能性角色。为了从数据中捕捉这些随机转移,我们发现可以通过利用机器学习模型,即复合 нейрон网络中的液体计算,来学习噪声引起的转移。我们开发了一种简洁的训练协议,以控制模型中的时间尺度,并将注重这个关键参数。训练后,模型可以准确地预测转移时间和转移次数。这种方法可以应用于广泛的系统中,包括下降 double-well 潜伏 potential 中的 биста布系统,以及白噪声或某些颜色噪声。此外,它还能够捕捉非详细平衡引起的旋转动力学,以及多稳态系统中的转移。对蛋白质折叠的实验数据,它可以学习转移时间 между折叠态,从而提供预测转移统计的可能性。结果表明机器学习方法可以捕捉噪声引起的现象。
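
A compact echo-state-network sketch in the spirit of the abstract follows: a random leaky reservoir is driven by a noisy double-well trajectory and a linear readout is fitted by ridge regression for one-step prediction, with the leak rate exposed as the hyper-parameter controlling the reservoir time scale. All constants are illustrative, and reproducing the paper's transition statistics would additionally require iterating the trained model in closed loop with injected noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- noisy double-well trajectory: dx = (x - x^3) dt + sigma dW ---
dt, sigma, T = 0.01, 0.35, 20000
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = x[t] + (x[t] - x[t] ** 3) * dt + sigma * np.sqrt(dt) * rng.normal()

# --- echo state network with leaky integration (leak rate sets the time scale) ---
N, leak, rho = 200, 0.2, 0.9
W_in = rng.uniform(-0.5, 0.5, size=N)
W = rng.normal(size=(N, N))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))      # rescale spectral radius

def run_reservoir(u):
    states, r = np.zeros((len(u), N)), np.zeros(N)
    for t, ut in enumerate(u):
        r = (1 - leak) * r + leak * np.tanh(W @ r + W_in * ut)
        states[t] = r
    return states

S = run_reservoir(x[:-1])
washout = 500
# ridge-regression readout for one-step-ahead prediction
A, y = S[washout:], x[1:][washout:]
W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(N), A.T @ y)

pred = S @ W_out
print("one-step RMSE:", np.sqrt(np.mean((pred[washout:] - y) ** 2)))
```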

Physics-informed reinforcement learning via probabilistic co-adjustment functions

  • paper_url: http://arxiv.org/abs/2309.05404
  • repo_url: None
  • paper_authors: Nat Wannawas, A. Aldo Faisal
  • for: The paper is written for training reinforcement learning systems in real-world tasks, which are typically data-inefficient and rely on simulation-based modeling.
  • methods: The paper introduces two novel approaches called co-kriging adjustments (CKA) and ridge regression adjustment (RRA) that combine the advantages of using individual system dynamics and simulation models. These methods use an auto-regressive AR1 co-kriging model integrated with Gaussian process priors to improve uncertainty quantification of the entire system's dynamics.
  • results: The paper demonstrates the efficiency of co-kriging adjustment with an interpretable reinforcement learning control example, learning to control a biomechanical human arm using only a two-link arm simulation model and CKA derived from a small amount of interaction data. The results show that the method provides more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods.
    Abstract Reinforcement learning of real-world tasks is very data inefficient, and extensive simulation-based modelling has become the dominant approach for training systems. However, in human-robot interaction and many other real-world settings, there is no appropriate one-model-for-all due to differences in individual instances of the system (e.g. different people) or necessary oversimplifications in the simulation models. This requires two approaches: 1. either learning the individual system's dynamics approximately from data which requires data-intensive training or 2. using a complete digital twin of the instances, which may not be realisable in many cases. We introduce two approaches: co-kriging adjustments (CKA) and ridge regression adjustment (RRA) as novel ways to combine the advantages of both approaches. Our adjustment methods are based on an auto-regressive AR1 co-kriging model that we integrate with GP priors. This yield a data- and simulation-efficient way of using simplistic simulation models (e.g., simple two-link model) and rapidly adapting them to individual instances (e.g., biomechanics of individual people). Using CKA and RRA, we obtain more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods. We demonstrate the efficiency of co-kriging adjustment with an interpretable reinforcement learning control example, learning to control a biomechanical human arm using only a two-link arm simulation model (offline part) and CKA derived from a small amount of interaction data (on-the-fly online). Our method unlocks an efficient and uncertainty-aware way to implement reinforcement learning methods in real world complex systems for which only imperfect simulation models exist.
    摘要 现实世界中的许多任务的强化学习很数据不fficient,而且广泛采用了基于模拟的模型训练系统。然而,在人机交互和许多实际场景中,没有适合一个模型所有的情况,因为实际系统的差异(例如,不同的人)或者模拟模型的必要简化。这需要两种方法:1. either学习个体系统的动态约束从数据中,需要大量的训练数据;2.使用实例的完整数字双方,可能在许多情况下不可能实现。我们介绍了两种新的方法:协同拟合调整(CKA)和ridge regression调整(RRA),这两种方法可以结合模拟模型和GP prior的优点。我们的调整方法基于一个自适应AR1拟合模型,并将其与GP prior相结合。这提供了数据和模拟效率的方式,使用简单的模拟模型(例如,两连接模型),并快速地适应个体实例(例如,人体生物力学)。使用CKA和RRA,我们可以获得更高精度的整体系统动态uncertainty量化,比GP和AR1方法更好。我们通过一个可解释的强化学习控制示例,使用只有两连接臂模型(线上部分)和CKA从小量的互动数据(在线部分)来学习控制人体生物力学。我们的方法可以快速、不确定性感知地实现强化学习方法在实际世界复杂系统中。

Practical Homomorphic Aggregation for Byzantine ML

  • paper_url: http://arxiv.org/abs/2309.05395
  • repo_url: None
  • paper_authors: Antoine Choffrut, Rachid Guerraoui, Rafael Pinot, Renaud Sirdey, John Stephan, Martin Zuber
  • for: 这篇论文研究分布式学习中的安全与隐私问题。
  • methods: 论文提出一种新的明文编码方法,使稳健聚合器可以在批处理友好的 BGV 同态加密方案上实现,并加速了现有的同态排序。
  • results: 实验结果表明,该算法可以达到实用的执行时间,并与非隐私版本的算法具有相同的机器学习性能。
    Abstract Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold standard security primitive, has extensively been studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
    摘要 Translated into Simplified Chinese:由于大规模数据的可用性,机器学习(ML)算法在分布式架构中被部署,不同的节点共同训练ML模型以其各自的数据交换模型相关信息(例如,梯度)与中央服务器。然而,分布式学习方案受到两种威胁。首先,Byzantine节点可以单方面地损害学习,通过向服务器发送错误信息(例如,错误的梯度)。标准的应对方法是使用非线性的robust合计方法。其次,服务器可以违反节点的隐私。最近的攻击表明,不加加密的梯度交换可以让curious服务器恢复整个节点的数据。使用 homomorphic encryption(HE),一种金标准安全 primitives,广泛研究了分布式学习的隐私保护方案。然而,由于 HE 的计算成本,特别是高维度 ML 模型,还没有任何尝试设计纯 homomorphic 操作符。在这项工作中,我们介绍 SABLE,首个完全 homomorphic 和 Byzantine 抗性的分布式学习算法。 SABLE 基于一种新的普通文本编码方法,允许我们实现robust合计器在批处理友好的 BGV 上。此外,这种编码方案还加速了当前的 homomorphic 排序,提供更大的安全优势和更小的 Ciphertext 大小。我们对图像分类任务进行了广泛的实验,并证明了我们的算法可以实现实用的执行时间,与非隐私counterpart匹配 ML 性能。

Career Path Recommendations for Long-term Income Maximization: A Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2309.05391
  • repo_url: None
  • paper_authors: Spyros Avlonitis, Dor Lavi, Masoud Mansoury, David Graus
  • for: 增强职业规划过程,提高员工长期收入水平
  • methods: 利用Markov决策过程(MDP)和机器学习算法,如Sarsa、Q-学习和A2C,学习优化职业发展路径
  • results: 实验结果显示,RL模型,特别是Q-学习和Sarsa,可以提高员工的收入趋势,平均提高5%,对职业规划过程具有有效性。
    Abstract This study explores the potential of reinforcement learning algorithms to enhance career planning processes. Leveraging data from Randstad The Netherlands, the study simulates the Dutch job market and develops strategies to optimize employees' long-term income. By formulating career planning as a Markov Decision Process (MDP) and utilizing machine learning algorithms such as Sarsa, Q-Learning, and A2C, we learn optimal policies that recommend career paths with high-income occupations and industries. The results demonstrate significant improvements in employees' income trajectories, with RL models, particularly Q-Learning and Sarsa, achieving an average increase of 5% compared to observed career paths. The study acknowledges limitations, including narrow job filtering, simplifications in the environment formulation, and assumptions regarding employment continuity and zero application costs. Future research can explore additional objectives beyond income optimization and address these limitations to further enhance career planning processes.
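
A minimal tabular Q-learning sketch on a toy career MDP follows: states are job levels, actions are "stay" or "apply for a move up", and the reward is the yearly income. The transition probability and salary table are invented placeholders, not Randstad data or the paper's environment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                 # 5 job levels; actions: 0 = stay, 1 = apply upward
salary = np.array([30, 38, 48, 62, 80])    # placeholder yearly incomes (k euro)
p_promotion = 0.4                          # placeholder success probability of a move

def step(s, a):
    """Reward is this year's salary; applying for a move succeeds with some probability."""
    if a == 1 and s < n_states - 1 and rng.random() < p_promotion:
        s_next = s + 1
    else:
        s_next = s
    return s_next, salary[s_next]

Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    s = 0
    for year in range(40):                 # a 40-year career horizon
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy policy per level (0=stay, 1=apply):", np.argmax(Q, axis=1))
```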

Data-Driven Model Reduction and Nonlinear Model Predictive Control of an Air Separation Unit by Applied Koopman Theory

  • paper_url: http://arxiv.org/abs/2309.05386
  • repo_url: None
  • paper_authors: Jan C. Schulze, Danimir T. Doncevic, Nils Erwes, Alexander Mitsos
  • for: 实现实时能力是非线性预测控制(NMPC)的industrial应用前提。数据驱动模型减少提供了一种获取低阶控制模型的方法,并且该方法需要 minimal expert knowledge of the particular process and its model.
  • methods: 我们使用了 Schulze et al. (2022)提出的数据驱动减少策略,基于Koopman理论,生成了一个低阶控制模型,并使用了机器学习来构建。
  • results: 我们的减少策略和适应NMPC实现使得ASU的实时NMPC可以实现,而且相比原始模型,CPU时间减少了98%。
    Abstract Achieving real-time capability is an essential prerequisite for the industrial implementation of nonlinear model predictive control (NMPC). Data-driven model reduction offers a way to obtain low-order control models from complex digital twins. In particular, data-driven approaches require little expert knowledge of the particular process and its model, and provide reduced models of a well-defined generic structure. Herein, we apply our recently proposed data-driven reduction strategy based on Koopman theory [Schulze et al. (2022), Comput. Chem. Eng.] to generate a low-order control model of an air separation unit (ASU). The reduced Koopman model combines autoencoders and linear latent dynamics and is constructed using machine learning. Further, we present an NMPC implementation that uses derivative computation tailored to the fixed block structure of reduced Koopman models. Our reduction approach with tailored NMPC implementation enables real-time NMPC of an ASU at an average CPU time decrease by 98 %.
    摘要 实时能力是非线性预测控制(NMPC)的industrial实现的必要前提。数据驱动模型减少提供了从复杂数字响应器中获得低阶控制模型的方法。特别是数据驱动方法不需要特定过程和模型的专家知识,并提供了具有well-defined结构的减少模型。在这里,我们使用我们最近提出的基于Koopman理论的数据驱动减少策略来生成一个气体分离机(ASU)的低阶控制模型。减少的Koopman模型结合了自动编码器和线性潜在动力,通过机器学习构建。此外,我们提出了适应fixed block结构的减少Koopman模型的derivative计算,以实现NMPC实现。我们的减少方法和适应NMPC实现使得ASU的实时NMPC实现时间减少了98%。
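
The sketch below illustrates the underlying Koopman idea with a plain extended-DMD construction: observed states are lifted by a fixed dictionary, a linear operator is fitted in the lifted space by least squares, and multi-step prediction is carried out entirely in that linear space. The toy system and polynomial dictionary are placeholders; the paper instead learns the lifting with machine learning and applies it to an air separation unit within an NMPC loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear plant standing in for simulation/plant data (e.g. a digital twin).
def f(x):
    return np.array([0.9 * x[0], 0.8 * x[1] + 0.2 * x[0] ** 2])

def lift(x):
    """Fixed polynomial dictionary: (x1, x2, x1^2, x1*x2, x2^2, 1)."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1, x2, x1 ** 2, x1 * x2, x2 ** 2, np.ones_like(x1)], axis=-1)

# Snapshot pairs (x_k, x_{k+1}) sampled over the operating region.
X_now = rng.uniform(-1, 1, size=(2000, 2))
X_next = np.array([f(x) for x in X_now])
Psi_now, Psi_next = lift(X_now), lift(X_next)

# Least-squares Koopman matrix K such that Psi_next ~= Psi_now @ K.
K, *_ = np.linalg.lstsq(Psi_now, Psi_next, rcond=None)

# Multi-step prediction carried out entirely in the lifted, linear space.
x0 = np.array([0.8, -0.5])
z, x_true = lift(x0), x0.copy()
for _ in range(20):
    z, x_true = z @ K, f(x_true)
print("Koopman prediction after 20 steps:", z[:2])
print("true state after 20 steps:        ", x_true)
```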

EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection

  • paper_url: http://arxiv.org/abs/2309.05357
  • repo_url: https://github.com/edac-ml4h/edac-ml4h
  • paper_authors: Andrej Jovanović, Mario Mihaly, Lennon Donaldson
  • for: 本研究旨在开发一种可靠、可行的 COVID-19 检测方法,以帮助预防和控制疫情的蔓延。
  • methods: 本研究使用机器学习方法,利用 CT 扫描图像和喊喊声音信号作为输入特征,通过深度神经网络架构实现 COVID-19 的检测。
  • results: 研究人员通过网络剪辑和量化技术来压缩两个模型,实现了模型文件尺寸的压缩和检测性能的维持。Specifically, 研究人员可以实现模型文件尺寸的压缩105.76倍和19.34倍,并对两个模型的检测时间进行了1.37倍和1.71倍的压缩。
    Abstract The global spread of COVID-19 had severe consequences for public health and the world economy. The quick onset of the pandemic highlighted the potential benefits of cheap and deployable pre-screening methods to monitor the prevalence of the disease in a population. Various researchers made use of machine learning methods in an attempt to detect COVID-19. The solutions leverage various input features, such as CT scans or cough audio signals, with state-of-the-art results arising from deep neural network architectures. However, larger models require more compute; a pertinent consideration when deploying to the edge. To address this, we first recreated two models that use cough audio recordings to detect COVID-19. Through applying network pruning and quantisation, we were able to compress these two architectures without reducing the model's predictive performance. Specifically, we were able to achieve an 105.76x and an 19.34x reduction in the compressed model file size with corresponding 1.37x and 1.71x reductions in the inference times of the two models.
    摘要 COVID-19 的全球蔓延引发了公共卫生和世界经济的严重后果。快速的疫情爆发表明了可能利用便宜并可部署的预屏检测方法来监测人口中疫苗的存在。各种研究人员利用机器学习方法尝试检测 COVID-19。这些解决方案利用了不同的输入特征,如 CT 扫描或喊嚔音信号,并使用了当前的神经网络架构获得了 state-of-the-art 的结果。然而,更大的模型需要更多的计算资源,这是在部署到边缘时必须考虑的。为解决这个问题,我们首先重新创建了两个使用喊嚔音记录检测 COVID-19 的模型。通过应用网络剪辑和量化,我们能够压缩这两个架构,而无需降低模型的预测性能。具体来说,我们能够实现一个 105.76x 和一个 19.34x 的压缩模型文件大小减少,相应的执行时间也降低了 1.37x 和 1.71x。
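
The compression recipe mentioned above (pruning followed by quantisation) is sketched below on a toy model using standard PyTorch utilities; the architecture, sparsity level, and input size are placeholders and the cough-audio models themselves are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy classifier standing in for a cough-audio COVID-19 detector.
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

# 1) Magnitude pruning: zero out 70% of the weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.7)
        prune.remove(module, "weight")       # make the pruning permanent

zeros = sum((m.weight == 0).sum().item() for m in model if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"overall weight sparsity: {zeros / total:.1%}")

# 2) Post-training dynamic quantisation of the Linear layers to int8.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print("float output:    ", model(x).detach().numpy())
print("quantized output:", quantized(x).detach().numpy())
```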

Neural Discovery of Permutation Subgroups

  • paper_url: http://arxiv.org/abs/2309.05352
  • repo_url: None
  • paper_authors: Pavan Karjol, Rohan Kashyap, Prathosh A P
  • for: 找到 permutation group $S_{n}$ 中的子群 $H$
  • methods: 使用 $S_{n}$-invariant function 和线性变换来发现 underlying subgroup
  • results: 可以发现任何类型 $S_{k} (k \leq n)$ 的子群,并且证明了类似结果对循环和对称群也成立。
    Abstract We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear transformation. We also prove similar results for cyclic and dihedral subgroups. Finally, we provide a general theorem that can be extended to discover other subgroups of $S_{n}$. We also demonstrate the applicability of our results through numerical experiments on image-digit sum and symmetric polynomial regression tasks.
    摘要 我团队考虑了找到 permutation group $S_{n}$ 中的子群 $H$ 的问题。不同于传统的 $H$-invariant 网络,我们提出了一种方法,可以在 $H$ 满足 certain conditions 时找到其下面的子群。我们的结果表明,可以通过学习 $S_{n}$- invariant 函数和一个线性变换来找到任何类型 $S_{k} (k \leq n)$ 的子群。此外,我们还证明了这些结果的类型也适用于圆柱体和二面体 subgroup。最后,我们提出了一个通用的定理,可以扩展到其他 $S_{n}$ 中的子群。我们还通过数值实验证明了我们的结果,在 image-digit sum 和 symmetric polynomial regression 任务中。

A DRL-based Reflection Enhancement Method for RIS-assisted Multi-receiver Communications

  • paper_url: http://arxiv.org/abs/2309.05343
  • repo_url: None
  • paper_authors: Wei Wang, Peizheng Li, Angela Doufexi, Mark A Beach
  • for: 本研究旨在优化具有 periodical 单反射profile 的 RIS 助持 wireless 通信系统中的 pointing 精度和反射强度。
  • methods: 本文提出了一种基于深度优化学习(DRL)的优化方法,用于解决 periodic 单反射profile 的杂合导致的 amplitude/phase 干扰问题。
  • results: 对比Random Search和枚举Search两种方法,DRL 方法在优化时间短化方面表现出了明显的优势,并且实现了无硬件修改的 1.2 dB 增强和更宽的抛射束。
    Abstract In reconfigurable intelligent surface (RIS)-assisted wireless communication systems, the pointing accuracy and intensity of reflections depend crucially on the 'profile,' representing the amplitude/phase state information of all elements in a RIS array. The superposition of multiple single-reflection profiles enables multi-reflection for distributed users. However, the optimization challenges from periodic element arrangements in single-reflection and multi-reflection profiles are understudied. The combination of periodical single-reflection profiles leads to amplitude/phase counteractions, affecting the performance of each reflection beam. This paper focuses on a dual-reflection optimization scenario and investigates the far-field performance deterioration caused by the misalignment of overlapped profiles. To address this issue, we introduce a novel deep reinforcement learning (DRL)-based optimization method. Comparative experiments against random and exhaustive searches demonstrate that our proposed DRL method outperforms both alternatives, achieving the shortest optimization time. Remarkably, our approach achieves a 1.2 dB gain in the reflection peak gain and a broader beam without any hardware modifications.
    摘要 在带有智能表面(RIS)的无线通信系统中,投射精度和反射强度受到 Profile(所有数组元素的振荡状态信息)的影响。多个单投射 Profile 的叠加可以实现分布式用户的多投射。然而,单投射 Profile 的周期性配置与多投射 Profile 带来的优化挑战尚未得到足够研究。在这种双投射优化场景中,我们发现 Profile 重叠错位会导致远场性能弱化。为解决这一问题,我们提出了一种基于深度强化学习(DRL)的优化方法。与随机搜索和枚举搜索相比,我们提出的 DRL 方法在优化时间上表现出明显优势,并在不作任何硬件修改的情况下实现了 1.2 dB 的反射峰增益提升和更宽的波束。

Stochastic Gradient Descent-like relaxation is equivalent to Glauber dynamics in discrete optimization and inference problems

  • paper_url: http://arxiv.org/abs/2309.05337
  • repo_url: None
  • paper_authors: Maria Chiara Angelini, Angelo Giorgio Cavaliere, Raffaele Marino, Federico Ricci-Tersenghi
  • for: 研究随机梯度下降(SGD)与 Glauber 动力学是否存在本质差异,并理解两种算法之间的关系。
  • methods: 在离散优化与推断问题中,比较类 SGD 算法与 Metropolis 蒙特卡罗算法的动力学。
  • results: 研究发现,在离散优化和推理问题中,SGD-like algorithm的动力学与Metropolis Monte Carlo algorithm具有很高的相似性,即使这两种算法在详细平衡不满足的情况下。这种相似性使得我们可以使用关于 Monte Carlo 算法的性能和限制来优化SGD-like algorithm的 mini-batch 大小,并使其在困难的推理问题中效率地恢复信号。
    Abstract Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g.\ SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
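
For reference, a minimal Glauber-dynamics sampler for an Ising-type cost function, the object the abstract compares SGD against, is sketched below; the temperature parameter plays the role that, according to the paper, is effectively set by the mini-batch size in the SGD-like algorithm. The couplings, system size, and temperature are random placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 0.8                      # number of spins, temperature (placeholder values)
J = rng.normal(scale=1 / np.sqrt(n), size=(n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

def energy(s):
    return -0.5 * s @ J @ s

s = rng.choice([-1.0, 1.0], size=n)
for sweep in range(500):
    for i in rng.permutation(n):
        local_field = J[i] @ s
        # Glauber (heat-bath) rule: set spin i to +1 with probability sigmoid(2 h_i / T)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * local_field / T))
        s[i] = 1.0 if rng.random() < p_up else -1.0
    if sweep % 100 == 0:
        print(f"sweep {sweep:3d}  energy per spin {energy(s) / n:.3f}")
```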

A Strong and Simple Deep Learning Baseline for BCI MI Decoding

  • paper_url: http://arxiv.org/abs/2309.07159
  • repo_url: https://github.com/elouayas/eegsimpleconv
  • paper_authors: Yassine El Ouahidi, Vincent Gripon, Bastien Pasdeloup, Ghaith Bouallegue, Nicolas Farrugia, Giulia Lioi
  • for: 本文提出一种简单的 1D 卷积神经网络(EEG-SimpleConv),用于脑机接口(BCI)中运动想象信号的解码。
  • methods: 仅使用文献中常见的标准组件(1D 卷积神经网络)和简单的训练策略。
  • results: EEG-SimpleConv 在四个 EEG 运动想象数据集上表现至少与其他方法相当甚至更好,具有较强的跨被试知识迁移能力,且推理时间较低。
    Abstract We propose EEG-SimpleConv, a straightforward 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to propose a very simple baseline to compare to, using only very standard ingredients from the literature. We evaluate its performance on four EEG Motor Imagery datasets, including simulated online setups, and compare it to recent Deep Learning and Machine Learning approaches. EEG-SimpleConv is at least as good or far more efficient than other approaches, showing strong knowledge-transfer capabilities across subjects, at the cost of a low inference time. We advocate that using off-the-shelf ingredients rather than coming with ad-hoc solutions can significantly help the adoption of Deep Learning approaches for BCI. We make the code of the models and the experiments accessible.
    摘要 我们提出EEG-SimpleConv,一个简单的1D卷积神经网络,用于肌电意念识别 BCIs。我们的主要动机是提出一个非常简单的基准,使用文献中的标准元素。我们在四个EEG肌电意念数据集上评估了EEG-SimpleConv的性能,并与最近的深度学习和机器学习方法进行比较。EEG-SimpleConv至少和其他方法一样好,甚至更高效,在不同主题之间具有强大的知识传递能力,但没有高的推断时间。我们认为使用商业可用的元素而不是创建特殊解决方案可以帮助深度学习方法在 BCIs 的采纳。我们将模型和实验的代码公开。
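
A bare-bones 1D convolutional classifier in the spirit of the baseline described above is sketched below; the channel counts, kernel sizes, and synthetic EEG batch are placeholders rather than the released EEG-SimpleConv configuration (see the linked repository for the actual model and training recipe).

```python
import torch
import torch.nn as nn

class Simple1DConvNet(nn.Module):
    """Minimal 1D CNN for multi-channel EEG windows: conv blocks + global pooling."""
    def __init__(self, n_channels=22, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=9, padding=4), nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=9, padding=4), nn.BatchNorm1d(128), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(128, 128, kernel_size=9, padding=4), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                       # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

model = Simple1DConvNet()
x = torch.randn(8, 22, 1000)                    # 8 synthetic trials, 22 channels, 1000 samples
y = torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print("logits shape:", model(x).shape, " dummy loss:", float(loss))
```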

Neural Koopman prior for data assimilation

  • paper_url: http://arxiv.org/abs/2309.05317
  • repo_url: https://github.com/anthony-frion/sentinel2ts
  • paper_authors: Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa El Bey
  • for: 这个论文是用来描述如何使用神经网络模型来描述动态系统的。
  • methods: 该论文使用了 Koopman 算子理论来嵌入动态系统在隐藏空间中,以便在这个空间中描述动态系统的动态。它还介绍了一些方法来训练这种模型,包括自我监督学习和变量数据整合。
  • results: 该论文的实验结果表明,使用这种神经网络模型可以在不精确时间序列数据的情况下进行长期不间断的重建,并且可以在难以预测的情况下进行自动适应。此外,论文还示出了使用训练过的动态模型作为Variational数据整合的先验。
    Abstract With the increasing availability of large scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that enable to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.
    摘要 In this paper, we use a neural network architecture that leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, allowing for several appealing features. We introduce methods that enable training such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. Additionally, we demonstrate the potential for self-supervised learning by showing how trained dynamical models can be used as priors for variational data assimilation techniques, with applications to time series interpolation and forecasting.

Balance Measures Derived from Insole Sensor Differentiate Prodromal Dementia with Lewy Bodies

  • paper_url: http://arxiv.org/abs/2309.08623
  • repo_url: None
  • paper_authors: Masatomo Kobayashi, Yasunori Yamada, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai
  • for: 本研究旨在提供一种自动识别路易体痴呆前驱期(MCI-LB)的机器学习流程,以便在前驱阶段提供适当的护理。
  • methods: 研究使用基于机器学习的自动识别方法,利用鞋垫传感器在 30 秒站立任务中采集的平衡测量数据。
  • results: 所得模型可以将 MCI-LB 参与者与其他组区分开来,准确率达 78.0%(AUC:0.681),比基于人口学和临床神经心理测量的对照模型高 6.8%。
    Abstract Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage$-$i.e., mild cognitive impairment due to Lewy bodies (MCI-LB)$-$is important for providing appropriate care. However, MCI-LB is often underrecognized because of its diversity in clinical manifestations and similarities with other conditions such as mild cognitive impairment due to Alzheimer's disease (MCI-AD). In this study, we propose a machine learning-based automatic pipeline that helps identify MCI-LB by exploiting balance measures acquired with an insole sensor during a 30-s standing task. An experiment with 98 participants (14 MCI-LB, 38 MCI-AD, 46 cognitively normal) showed that the resultant models could discriminate MCI-LB from the other groups with up to 78.0% accuracy (AUC: 0.681), which was 6.8% better than the accuracy of a reference model based on demographic and clinical neuropsychological measures. Our findings may open up a new approach for timely identification of MCI-LB, enabling better care for patients.
    摘要 路易体痴呆(Dementia with Lewy bodies)是第二常见的神经退行性痴呆,在前驱阶段,即路易体所致轻度认知障碍(MCI-LB)阶段进行识别,对提供适当照顾十分重要。但 MCI-LB 的临床表现多样,且与阿尔茨海默病所致轻度认知障碍(MCI-AD)等情况相似,因此常常未被充分识别。在这项研究中,我们提出了一个基于机器学习的自动化流程,利用鞋垫传感器在 30 秒站立任务中采集的平衡测量来帮助识别 MCI-LB。对 98 名参与者(14 名 MCI-LB、38 名 MCI-AD、46 名认知正常)的实验显示,所得模型能以最高 78.0% 的准确率(AUC:0.681)将 MCI-LB 与其他组区分开来,比基于人口学和临床神经心理测量的参考模型高 6.8%。我们的发现可能开启一种及时识别 MCI-LB 的新途径,使患者获得更好的照顾。

Fully-Connected Spatial-Temporal Graph for Multivariate Time Series Data

  • paper_url: http://arxiv.org/abs/2309.05305
  • repo_url: None
  • paper_authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
  • for: This paper aims to provide an effective modelling approach for Multivariate Time-Series (MTS) data, specifically by using Graph Neural Networks (GNNs) to handle the Spatial-Temporal (ST) dependencies within MTS data.
  • methods: The paper proposes a new method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), comprising two key components: FC graph construction and FC graph convolution. FC graph construction uses a decay graph to connect sensors across timestamps according to their temporal distances, so that ST dependencies are modelled in full, while FC graph convolution uses a moving-pooling GNN layer to capture these dependencies effectively.
  • results: Extensive experiments on multiple MTS datasets show that FC-STGNN outperforms SOTA methods, demonstrating the effectiveness of the proposed method in handling MTS data with ST dependencies.
    Abstract Multivariate Time-Series (MTS) data is crucial in various application fields. With its sequential and multi-source (multiple sensors) properties, MTS data inherently exhibits Spatial-Temporal (ST) dependencies, involving temporal correlations between timestamps and spatial correlations between sensors in each timestamp. To effectively leverage this information, Graph Neural Network-based methods (GNNs) have been widely adopted. However, existing approaches separately capture spatial dependency and temporal dependency and fail to capture the correlations between Different sEnsors at Different Timestamps (DEDT). Overlooking such correlations hinders the comprehensive modelling of ST dependencies within MTS data, thus restricting existing GNNs from learning effective representations. To address this limitation, we propose a novel method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), including two key components namely FC graph construction and FC graph convolution. For graph construction, we design a decay graph to connect sensors across all timestamps based on their temporal distances, enabling us to fully model the ST dependencies by considering the correlations between DEDT. Further, we devise FC graph convolution with a moving-pooling GNN layer to effectively capture the ST dependencies for learning effective representations. Extensive experiments show the effectiveness of FC-STGNN on multiple MTS datasets compared to SOTA methods.
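A minimal sketch of the decay-graph construction idea follows: every (sensor, timestamp) node is connected to every other, with edge weights decaying in temporal distance. The exponential decay and the node layout are assumptions for illustration; the paper's exact decay function and the FC graph convolution are not reproduced here.

```python
import numpy as np

def fc_decay_graph(n_sensors, n_timestamps, alpha=1.0):
    """Fully-connected spatio-temporal adjacency over (sensor, timestamp) nodes,
    weighted by a decay in temporal distance (assumed exponential here)."""
    t_idx = np.repeat(np.arange(n_timestamps), n_sensors)   # time index of each node
    dt = np.abs(t_idx[:, None] - t_idx[None, :])             # temporal distance matrix
    A = np.exp(-alpha * dt)                                   # decayed full connectivity
    np.fill_diagonal(A, 0.0)
    return A

A = fc_decay_graph(n_sensors=4, n_timestamps=6)
print(A.shape)   # (24, 24): every sensor connected to every sensor at every timestamp
```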

The fine print on tempered posteriors

  • paper_url: http://arxiv.org/abs/2309.05292
  • repo_url: None
  • paper_authors: Konstantinos Pitas, Julyan Arbel
  • for: This work examines the fine print of tempered posteriors and uncovers several important but previously undiscussed points.
  • methods: Using realistic models and datasets, and the tightly controlled case of the Laplace approximation to the posterior, the authors show that in practice stochasticity does not in general improve test accuracy.
  • results: Stochasticity in Bayesian models can come at the cost of degraded test accuracy, and targeting Frequentist metrics explains the need for a temperature parameter $\lambda$ in the optimization objective; a PAC-Bayesian analysis shows that $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.
    Abstract We conduct a detailed investigation of tempered posteriors and uncover a number of crucial and previously undiscussed points. Contrary to previous results, we first show that for realistic models and datasets and the tightly controlled case of the Laplace approximation to the posterior, stochasticity does not in general improve test accuracy. The coldest temperature is often optimal. One might think that Bayesian models with some stochasticity can at least obtain improvements in terms of calibration. However, we show empirically that when gains are obtained this comes at the cost of degradation in test accuracy. We then discuss how targeting Frequentist metrics using Bayesian models provides a simple explanation of the need for a temperature parameter $\lambda$ in the optimization objective. Contrary to prior works, we finally show through a PAC-Bayesian analysis that the temperature $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.
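For reference, under one common convention the tempered posterior that the temperature parameter $\lambda$ defines is the following (this is the standard form, not a claim about the paper's exact parameterization):

```latex
% Tempered (fractional) posterior with temperature parameter \lambda:
% \lambda = 1 recovers the usual Bayesian posterior, while \lambda > 1 ("colder")
% upweights the likelihood relative to the prior and concentrates the posterior.
p_{\lambda}(\theta \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid \theta)^{\lambda}\, p(\theta)
```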

Efficient Finite Initialization for Tensorized Neural Networks

  • paper_url: http://arxiv.org/abs/2309.06577
  • repo_url: https://github.com/i3bquantumteam/q4real
  • paper_authors: Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta
  • for: This work develops a new method for initializing layers of tensorized neural networks that avoids the explosion of the parameters of the matrix they emulate. The method targets layers with a high number of nodes in which all or most nodes are connected to the input or output.
  • methods: The method uses the Frobenius norm of the layer in an iterative partial form, so that the norm is finite and lies within a certain range. This norm is efficient to compute, fully or partially, in most cases of interest.
  • results: The method is applied to different layers and its performance is evaluated. A Python function that runs on an arbitrary layer is provided on GitHub: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb
    Abstract We present a novel method for initializing layers of tensorized neural networks in a way that avoids the explosion of the parameters of the matrix it emulates. The method is intended for layers with a high number of nodes in which there is a connection to the input or output of all or most of the nodes. The core of this method is the use of the Frobenius norm of this layer in an iterative partial form, so that it has to be finite and within a certain range. This norm is efficient to compute, fully or partially for most cases of interest. We apply the method to different layers and check its performance. We create a Python function to run it on an arbitrary layer, available in a Jupyter Notebook in the i3BQuantum repository: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb
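The following sketch illustrates the general idea of norm-controlled initialization for a tensorized layer under simplifying assumptions: a hypothetical two-core (tensor-train-style) layer is initialized randomly, the Frobenius norm of the matrix it emulates is computed, and the cores are rescaled so that the norm lands on a finite target. The paper's iterative partial-norm procedure and layer structure may differ; see the linked notebook for the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tensorized layer: an (m1*m2) x (n1*n2) matrix emulated by two cores
# contracted over a bond dimension r (a simple tensor-train / MPO-style factorization).
m1, n1, m2, n2, r = 8, 8, 8, 8, 4
G1 = rng.normal(size=(m1, n1, r))
G2 = rng.normal(size=(r, m2, n2))

def emulated_matrix(G1, G2):
    # Contract the cores and reshape to the full matrix the layer emulates.
    T = np.einsum('abr,rcd->acbd', G1, G2)           # (m1, m2, n1, n2)
    return T.reshape(m1 * m2, n1 * n2)

# Rescale the cores so the Frobenius norm of the emulated matrix hits a finite target,
# splitting the correction evenly across cores to keep them numerically balanced.
target = 1.0
norm = np.linalg.norm(emulated_matrix(G1, G2))       # Frobenius norm
scale = (target / norm) ** 0.5                       # two cores -> split the factor
G1 *= scale
G2 *= scale
print(np.linalg.norm(emulated_matrix(G1, G2)))       # ~= target
```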

Compressed Real Numbers for AI: a case-study using a RISC-V CPU

  • paper_url: http://arxiv.org/abs/2309.07158
  • repo_url: None
  • paper_authors: Federico Rossi, Marco Cococcioni, Roger Ferrer Ibàñez, Jesùs Labarta, Filippo Mantovani, Marc Casas, Emanuele Ruffaldi, Sergio Saponara
  • for: This work aims to improve the computational efficiency of Deep Neural Networks (DNNs) by using lower-precision numbers.
  • methods: The paper focuses on two compressed formats that have already achieved interesting results in machine learning applications: bfloat and posit.
  • results: The paper proposes decompressing a tensor of bfloat/posits just before computation, after the compressed operands have been loaded into the vector registers, in order to save bandwidth and improve cache efficiency.
    Abstract As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed formats have attracted considerable attention. In this paper, we focus on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications without appreciable degradation of accuracy: bfloat and posit. Even though 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded into the vector registers of a vector-capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.
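The bfloat part of the scheme is easy to illustrate, since bfloat16 is just the upper half of a binary32 word. The sketch below compresses by truncation and decompresses by zero-padding just before computation; real hardware typically rounds to nearest even, and posit conversion (not shown) needs a dedicated library. This illustrates the format only, not the paper's RISC-V vector implementation.

```python
import numpy as np

def compress_bf16(x32: np.ndarray) -> np.ndarray:
    """Keep only the top 16 bits of each binary32 value (bfloat16 by truncation;
    hardware usually rounds to nearest even instead)."""
    bits = x32.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def decompress_bf16(x16: np.ndarray) -> np.ndarray:
    """Expand the stored 16 bits back into binary32 just before computation."""
    return (x16.astype(np.uint32) << 16).view(np.float32)

w = np.random.randn(4).astype(np.float32)
w_c = compress_bf16(w)          # half the storage / bandwidth of binary32
w_d = decompress_bf16(w_c)      # what the 32-bit FPU actually computes with
print(w, w_d, np.max(np.abs(w - w_d)))   # error bounded by bfloat16's 7-bit mantissa
```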

Beamforming in Wireless Coded-Caching Systems

  • paper_url: http://arxiv.org/abs/2309.05276
  • repo_url: None
  • paper_authors: Sneha Madhusudan, Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson
  • for: Increased capacity in the access network poses capacity challenges on the transport network, but user data demands exhibit spatial and temporal correlations that can potentially be exploited.
  • methods: The paper investigates a wireless transport network architecture that integrates beamforming and coded caching, in which a multi-antenna server broadcasts content to cache nodes responsible for serving users, and develops a genetic algorithm-based scheme for beam optimization.
  • results: The design achieves gains in multicast opportunities, interference mitigation, and reduced peak backhaul traffic. A comparative analysis shows clear advantages over traditional, uncoded caching schemes, and proper beamforming further enhances the effectiveness of the coded-caching technique, yielding a significant reduction in peak backhaul traffic.
    Abstract Increased capacity in the access network poses capacity challenges on the transport network due to the aggregated traffic. However, there are spatial and time correlation in the user data demands that could potentially be utilized. To that end, we investigate a wireless transport network architecture that integrates beamforming and coded-caching strategies. Especially, our proposed design entails a server with multiple antennas that broadcasts content to cache nodes responsible for serving users. Traditional caching methods face the limitation of relying on the individual memory with additional overhead. Hence, we develop an efficient genetic algorithm-based scheme for beam optimization in the coded-caching system. By exploiting the advantages of beamforming and coded-caching, the architecture achieves gains in terms of multicast opportunities, interference mitigation, and reduced peak backhaul traffic. A comparative analysis of this joint design with traditional, un-coded caching schemes is also conducted to assess the benefits of the proposed approach. Additionally, we examine the impact of various buffering and decoding methods on the performance of the coded-caching scheme. Our findings suggest that proper beamforming is useful in enhancing the effectiveness of the coded-caching technique, resulting in significant reduction in peak backhaul traffic.
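As a toy illustration of genetic-algorithm-based beam optimization, the sketch below evolves a complex beamforming vector to maximize the worst user's multicast gain under a unit-power constraint. The channel model, fitness function, and GA hyperparameters are assumptions for illustration; the paper's objective additionally accounts for coded-caching delivery and backhaul traffic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ant, n_users, pop, gens = 8, 4, 60, 200
# Hypothetical Rayleigh channel matrix (users x antennas).
H = (rng.normal(size=(n_users, n_ant)) + 1j * rng.normal(size=(n_users, n_ant))) / np.sqrt(2)

def fitness(w):
    w = w / np.linalg.norm(w)                 # unit transmit power
    return np.min(np.abs(H @ w) ** 2)         # worst-user multicast beamforming gain

# Initial population of complex beamforming vectors.
P = rng.normal(size=(pop, n_ant)) + 1j * rng.normal(size=(pop, n_ant))
for _ in range(gens):
    f = np.array([fitness(w) for w in P])
    parents = P[np.argsort(f)[-pop // 2:]]                    # selection: keep best half
    kids = []
    for _ in range(pop - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = np.where(rng.random(n_ant) < 0.5, a, b)       # uniform crossover
        child = child + 0.05 * (rng.normal(size=n_ant) + 1j * rng.normal(size=n_ant))  # mutation
        kids.append(child)
    P = np.concatenate([parents, np.array(kids)])

best = max(P, key=fitness)
print("worst-user gain:", fitness(best))
```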

Generalized Graphon Process: Convergence of Graph Frequencies in Stretched Cut Distance

  • paper_url: http://arxiv.org/abs/2309.05260
  • repo_url: None
  • paper_authors: Xingchao Jian, Feng Ji, Wee Peng Tay
  • for: This paper studies the convergence of sparse graph sequences, using generalized graphons and the stretched cut distance to describe this convergence.
  • methods: The paper models growing sparse graphs with a random graph process generated from a generalized graphon, which converges to the generalized graphon in stretched cut distance, and proves convergence of the adjacency matrices' eigenvalues.
  • results: Sparse graph sequences defined via the stretched cut distance converge to a generalized graphon, suggesting the possibility of transfer learning between sparse graphs; the findings are supported by experimental validation.
    Abstract Graphons have traditionally served as limit objects for dense graph sequences, with the cut distance serving as the metric for convergence. However, sparse graph sequences converge to the trivial graphon under the conventional definition of cut distance, which make this framework inadequate for many practical applications. In this paper, we utilize the concepts of generalized graphons and stretched cut distance to describe the convergence of sparse graph sequences. Specifically, we consider a random graph process generated from a generalized graphon. This random graph process converges to the generalized graphon in stretched cut distance. We use this random graph process to model the growing sparse graph, and prove the convergence of the adjacency matrices' eigenvalues. We supplement our findings with experimental validation. Our results indicate the possibility of transfer learning between sparse graphs.
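The flavor of the eigenvalue result can be reproduced numerically: sample sparse W-random graphs of growing size from a fixed graphon with a sparsity scaling, and watch the suitably normalized leading adjacency eigenvalues stabilize. The graphon, the sparsity scaling, and the normalization below are illustrative choices, not the paper's generalized-graphon construction.

```python
import numpy as np

rng = np.random.default_rng(0)
W = lambda x, y: x * y            # a simple graphon on [0,1]^2 (hypothetical choice)

def sample_graph(n, rho):
    """Sparse W-random graph: include edge (i, j) with probability rho * W(u_i, u_j)."""
    u = rng.uniform(size=n)
    p = rho * W(u[:, None], u[None, :])
    A = (rng.uniform(size=(n, n)) < p).astype(float)
    A = np.triu(A, 1)
    return A + A.T

for n in [200, 400, 800, 1600]:
    rho = 1.0 / np.sqrt(n)                            # one common sparsity scaling
    A = sample_graph(n, rho)
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    print(n, (lam[:3] / (n * rho)).round(3))          # normalized leading eigenvalues stabilize
```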

A physics-informed and attention-based graph learning approach for regional electric vehicle charging demand prediction

  • paper_url: http://arxiv.org/abs/2309.05259
  • repo_url: None
  • paper_authors: Haohao Qu, Haoxuan Kuang, Jun Li, Linlin You
  • for: Predicting electric vehicle (EV) charging demand in order to optimize the use of EV charging space and alleviate the load on urban intelligent transportation systems.
  • methods: The approach integrates graph and temporal attention mechanisms for feature extraction and uses physics-informed meta-learning to pre-train the model for knowledge transfer.
  • results: Evaluated on a dataset of 18,013 EV charging piles in Shenzhen, China, the proposed approach (PAG) achieves state-of-the-art forecasting performance and captures the adaptive changes in charging demand caused by price fluctuations.
    Abstract Along with the proliferation of electric vehicles (EVs), optimizing the use of EV charging space can significantly alleviate the growing load on intelligent transportation systems. As the foundation to achieve such an optimization, a spatiotemporal method for EV charging demand prediction in urban areas is required. Although several solutions have been proposed by using data-driven deep learning methods, these performance-oriented methods may misinterpret the reverse relationship between charging demands and prices. To tackle the emerging challenges of training an accurate and interpretable prediction model, this paper proposes a novel approach that enables the integration of graph and temporal attention mechanisms for feature extraction and the usage of physics-informed meta-learning in the model pre-training step for knowledge transfer. Evaluation results on a dataset of 18,013 EV charging piles in Shenzhen, China, show that the proposed approach, named PAG, can achieve state-of-the-art forecasting performance and the ability to understand the adaptive changes in charging demands caused by price fluctuations.
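The attention building block used for feature extraction can be sketched as plain scaled dot-product attention over per-station features, as below; the paper's actual graph and temporal attention layers, and the physics-informed meta-learning pre-training, are not reproduced, and the feature names are hypothetical.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention (the building block behind graph/temporal
    attention feature extraction; the paper's exact layers differ)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_stations, d = 6, 8
X = rng.normal(size=(n_stations, d))   # hypothetical per-station features (occupancy, price, ...)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = attention(X @ Wq, X @ Wk, X @ Wv)  # each station attends to all others ("graph" attention)
print(H.shape)                          # (6, 8)
```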

Examining the Effect of Pre-training on Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.05256
  • repo_url: None
  • paper_authors: Jiashu Pu, Shiwei Zhao, Ling Cheng, Yongzhu Chang, Runze Wu, Tangjie Lv, Rongsheng Zhang
  • for: This study investigates the effect of the unsupervised pre-training followed by fine-tuning paradigm on a new modality: time series.
  • methods: The study conducts a thorough examination of the paradigm on 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks.
  • results: Pre-training only helps the optimization process for models that fit the data poorly, does not act as a regularizer given sufficient training time, and can speed up convergence but does not improve generalization. While both the pre-training task and the model structure affect the paradigm's effectiveness on a given dataset, the model structure plays the more significant role.
    Abstract Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text and image data lack consensus. To delve deeper into the unsupervised pre-training followed by fine-tuning paradigm, we have extended previous research to a new modality: time series. In this study, we conducted a thorough examination of 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis reveals several key conclusions. (i) Pre-training can only help improve the optimization process for models that fit the data poorly, rather than those that fit the data well. (ii) Pre-training does not exhibit the effect of regularization when given sufficient training time. (iii) Pre-training can only speed up convergence if the model has sufficient ability to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume, such as faster convergence. (v) While both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, the model structure plays a more significant role.

A quantum tug of war between randomness and symmetries on homogeneous spaces

  • paper_url: http://arxiv.org/abs/2309.05253
  • repo_url: None
  • paper_authors: Rahul Arvind, Kishor Bharti, Jun Yong Khoo, Dax Enshan Koh, Jian Feng Kong
  • for: To study the interplay between symmetry and randomness in quantum information.
  • methods: Adopting a geometric approach, states related by a symmetry transformation in the group $H$ are treated as $H$-equivalent, and the Haar measure on the homogeneous space $\mathbb{U}/H$ is introduced to characterize true randomness for such systems.
  • results: The paper develops notions of randomness on homogeneous spaces, studies approximate randomness ($t$-wise independence, $t$-designs) and pseudorandomness in this setting, and applies them to the expressibility of quantum machine learning ansatze.
    Abstract We explore the interplay between symmetry and randomness in quantum information. Adopting a geometric approach, we consider states as $H$-equivalent if related by a symmetry transformation characterized by the group $H$. We then introduce the Haar measure on the homogeneous space $\mathbb{U}/H$, characterizing true randomness for $H$-equivalent systems. While this mathematical machinery is well-studied by mathematicians, it has seen limited application in quantum information: we believe our work to be the first instance of utilizing homogeneous spaces to characterize symmetry in quantum information. This is followed by a discussion of approximations of true randomness, commencing with $t$-wise independent approximations and defining $t$-designs on $\mathbb{U}/H$ and $H$-equivalent states. Transitioning further, we explore pseudorandomness, defining pseudorandom unitaries and states within homogeneous spaces. Finally, as a practical demonstration of our findings, we study the expressibility of quantum machine learning ansatze in homogeneous spaces. Our work provides a fresh perspective on the relationship between randomness and symmetry in the quantum world.
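One concrete instance of a Haar measure on a homogeneous space is the uniform measure on pure states, obtained by pushing Haar-random unitaries (QR of a Ginibre matrix with the standard phase fix) through a fixed reference state, i.e., sampling from $\mathbb{U}(d)/H$ with $H$ the stabilizer of that state. The sketch below is a numerical illustration of this standard construction, not of the paper's specific symmetry groups.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    """Haar-random unitary via QR of a complex Ginibre matrix (standard recipe)."""
    Z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))   # fix column phases so Q is Haar-distributed

# Haar-random pure states = orbit of a fixed reference state under Haar-random unitaries,
# i.e., samples from the homogeneous space U(d)/H with H the stabilizer of |0>.
d = 4
ref = np.zeros(d); ref[0] = 1.0
states = np.array([haar_unitary(d) @ ref for _ in range(2000)])

# Sanity check: the average projector over Haar-random states is I/d.
avg = np.einsum('ni,nj->ij', states, states.conj()) / len(states)
print(np.round(avg, 2))    # approximately identity / d
```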

Graph Contextual Contrasting for Multivariate Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.05202
  • repo_url: None
  • paper_authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
  • for: This paper proposes a new contrastive learning method for Multivariate Time-Series (MTS) classification that ensures consistency across different views of unlabeled samples and learns effective representations.
  • methods: The method uses graph augmentations (node and edge augmentations) to preserve the stability of sensors and their correlations, together with node- and graph-level contrasting to extract robust sensor- and global-level features, plus multi-window temporal contrasting for temporal consistency.
  • results: Experiments show that the proposed GCC achieves state-of-the-art performance on various MTS classification tasks.
    Abstract Contrastive learning, as a self-supervised learning paradigm, becomes popular for Multivariate Time-Series (MTS) classification. It ensures the consistency across different views of unlabeled samples and then learns effective representations for these samples. Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques, aiming to preserve temporal patterns against perturbations for MTS data. However, they overlook spatial consistency that requires the stability of individual sensors and their correlations. As MTS data typically originate from multiple sensors, ensuring spatial consistency becomes essential for the overall performance of contrastive learning on MTS data. Thus, we propose Graph Contextual Contrasting (GCC) for spatial consistency across MTS data. Specifically, we propose graph augmentations including node and edge augmentations to preserve the stability of sensors and their correlations, followed by graph contrasting with both node- and graph-level contrasting to extract robust sensor- and global-level features. We further introduce multi-window temporal contrasting to ensure temporal consistency in the data for each sensor. Extensive experiments demonstrate that our proposed GCC achieves state-of-the-art performance on various MTS classification tasks.
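A minimal sketch of the augment-and-contrast idea follows: node augmentation masks feature dimensions, edge augmentation drops edges, and a toy one-layer aggregation produces graph-level embeddings whose agreement a contrastive loss would maximize. The augmentation rates, the stand-in encoder, and the similarity used here are illustrative assumptions; GCC's node-level, graph-level, and multi-window temporal contrasting are richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(X, A, feat_mask=0.2, edge_drop=0.2):
    """Node augmentation: mask random feature dims; edge augmentation: drop random edges."""
    Xa = X * (rng.random(X.shape[1]) > feat_mask)          # mask whole feature dimensions
    Aa = A * (rng.random(A.shape) > edge_drop)
    return Xa, np.maximum(Aa, Aa.T)                        # keep adjacency symmetric

def embed(X, A):
    """One-layer mean-aggregation 'GNN' stand-in producing a graph-level vector."""
    H = (A + np.eye(len(A))) @ X
    g = H.mean(axis=0)
    return g / (np.linalg.norm(g) + 1e-9)

# Toy sensor graph of one MTS sample: 5 sensors, 8 features each.
X = rng.normal(size=(5, 8))
A = (rng.random((5, 5)) > 0.5).astype(float); A = np.maximum(A, A.T); np.fill_diagonal(A, 0)

z1 = embed(*augment(X, A))      # view 1
z2 = embed(*augment(X, A))      # view 2
print("agreement between the two augmented views:", float(z1 @ z2))   # a contrastive loss pushes this up
```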

CARE: Confidence-rich Autonomous Robot Exploration using Bayesian Kernel Inference and Optimization

  • paper_url: http://arxiv.org/abs/2309.05200
  • repo_url: https://github.com/shepherd-gregory/bkio-exploration
  • paper_authors: Yang Xu, Ronghao Zheng, Senlin Zhang, Meiqin Liu, Shoudong Huang
  • for: To improve the efficiency of information-based autonomous robot exploration in unknown and complex environments.
  • methods: Gaussian process (GP) regression is first used to learn a surrogate model that infers the confidence-rich mutual information (CRMI) of querying control actions; Bayesian optimization (BO) is then conducted with an objective combining predicted CRMI values and prediction uncertainties, i.e., GP-based BO (GPBO), realizing the trade-off between exploitation (highest CRMI) and exploration (high prediction variance).
  • results: A novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO) achieves approximate logarithmic complexity without training; BKIO infers the CRMI and selects the best action with bounded cumulative regret, matching GPBO's accuracy at much higher efficiency. Extensive numerical and real-world experiments in unstructured, cluttered environments demonstrate the effectiveness of the proposed methods; the open-source implementation is available at https://github.com/Shepherd-Gregory/BKIO-Exploration
    Abstract In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). The trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration) can be realized. To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving an approximate logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures its comparable accuracy to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments. We also provide our open-source implementation code at https://github.com/Shepherd-Gregory/BKIO-Exploration.
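The GPBO step can be sketched with scikit-learn: fit a GP surrogate on previously evaluated actions and pick the next action with an acquisition that trades off predicted value (exploitation) against predictive uncertainty (exploration). The 1-D action space, the stand-in objective, and the UCB acquisition below are assumptions for illustration; the paper's CRMI computation and BKIO inference are not reproduced.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical 1-D action space (e.g., a candidate viewpoint parameter) and a stand-in
# for the measured information gain of actions already tried; CRMI itself is not modeled here.
actions_tried = rng.uniform(0, 1, size=(6, 1))
gains_observed = np.sin(6 * actions_tried[:, 0]) + 0.1 * rng.normal(size=6)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
gp.fit(actions_tried, gains_observed)

# UCB-style acquisition: mean (exploitation) + kappa * std (exploration).
candidates = np.linspace(0, 1, 200).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
kappa = 2.0
best = candidates[np.argmax(mu + kappa * sigma)]
print("next action to query:", best)
```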